Cangjie Programming Language overview

This article is a quick overview of the new programming language created by Huawei. As recently as last year it had only limited availability, and by now download bundles for Windows, Linux and Darwin are already provided on the web-site.

https://cangjie-lang.cn/en/download

We'll walk briefly through downloading and running the compiler, and then browse various language features, comparing them with other popular industrial languages. If you have experience with languages like Java and Go (and to a lesser extent C++), it will feel somewhat familiar - and really the language doesn't try to surprise the user with mind-blowing innovations. Seemingly the aim was to provide a natively compiled language to which Java and C# programmers can easily migrate.

There were talks and rumors that the language would use hieroglyphs, or that it is dedicated to usage with AI - you'll see neither of these in this article. Code is written with typical English keywords, and AI integration, if present, is not on the level of the language itself.


Installation and First Try

The web-site provides bundles for downloading (I picked Linux, the most typical choice for a backend programmer), and it also has useful links: Documentation and Playground. I will browse the former when talking about the language, and use either the local installation or the playground to figure out and demonstrate the syntax.

After downloading I simply unpacked the bundle to a suitable folder (e.g. ~/utils/cangjie) and found there a subfolder bin with the cjc executable - obviously the compiler. Let's create the file test.cj with a simple program like this:

[Image: hello world in the Cangjie programming language]

and try to compile it (even without further setup), e.g. ~/utils/cangjie/bin/cjc test.cj - it works fine, and several files are created, among them executable with the name main. But it won't run:

$ ./main

./main: error while loading shared libraries: libcangjie-runtime.so:
    cannot open shared object file: No such file or directory

So we obviously need to set some environment variables. Luckily there is a script envsetup.sh in the root of the bundled files:

$ source ~/utils/cangjie/envsetup.sh

This command will amend both PATH (so that cjc could be accessed without specifying the path) and LD_LIBRARY_PATH (so that executables will find the dynamically linked library). Now that's better:

$ ./main 

Nihao, I'm a fine language :)
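For reference, since the screenshot of test.cj is not available here in text form: judging by the output above, the program can be as small as the following sketch. The exact original source is an assumption; only the printed line is taken from the article.

```cangjie
// test.cj - minimal sketch, assumed to match the screenshot only in its output
main() {
    println("Nihao, I'm a fine language :)")
}
```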

As for the library: running cjc --help shows that there are options for building with static libraries, but I haven't succeeded in using them to create a fully independent executable.


Basic Syntax and Features

The things discussed further, which are not about compiling, you can try in the "playground" mentioned above. Sorry for the screenshots of the code, but as there is no ready markup highlighter for Cangjie I will use images for short snippets (while larger fragments will be included as text).

Identifiers

Nothing unusual here: identifiers start with a letter and may include digits. Underscores are also accepted. Letters should be understood in the Unicode sense, so national characters are valid too. There is also a feature of surrounding an identifier with backticks (`...`), which allows, for example, using reserved words as identifiers (perhaps not a very useful feature).

Variables declaration

The keywords let and var are used; the difference is that the former creates an immutable variable (in the given scope). This is supposed to help avoid mistakes (and may also help in concurrent programming). The type of a variable is usually inferred but can be specified explicitly. There is also const, which may look similar to let but differs in representation and possible optimization.

var a = "thirteen"                               // mutable, type String is inferred
let b: Int64                                     // immutable, explicit type; a local like this must be assigned before first use
let c: Array<String> = ["one", "two", "three"]   // immutable, explicit type with initializer

Function definition

If the example above made you think that functions are declared without any keyword, that is not quite correct. Only the main function is declared without a keyword; all others need func in front of the name. Let's have a look at an example:

[Image: function declarations in the Cangjie programming language]
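As the screenshot is not available in text form, here is a rough sketch of what function declarations look like. The function names square and greet are made up for illustration and are not taken from the original image.

```cangjie
// Ordinary functions need the func keyword; types follow the names after a colon.
func square(x: Int64): Int64 {
    return x * x
}

// The return type is omitted here: nothing meaningful is returned, so it is Unit.
func greet(name: String) {
    println("Hello, ${name}!")
}

// Only main goes without the func keyword.
main() {
    greet("Cangjie")
    println(square(7))
}
```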

From here we can learn several things:

Basic Types

Here basic means not the "simplest" but belonging to the language core. These types fall in several categories:

Operations on those types are also very typical, so you can mostly apply your C++, Go or Java experience. Minor subtleties can be found though (e.g. ++ exists in suffix form only).

Arrays

Arrays are declared with the type name Array and a type parameter in angle brackets, e.g. Array<String>, though the type parameter can often be omitted. In other regards everything is quite usual; arrays are of fixed size.

[Image: array demonstration in the Cangjie programming language]

Here the declaration could be let words: Array<String> = ["...", "..."] or alternatively let words = Array<String>(["...", "..."]), but we use "syntactic sugar" to shorten it. The type parameter (along with the angle brackets) can be omitted if the compiler can detect it from context, i.e. let words = Array([...]) will also work.
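The declaration forms mentioned above can be put side by side in a small sketch. The variable names and the element values are only illustrative.

```cangjie
main() {
    // Three ways to declare the same array, as described in the text.
    let a: Array<String> = ["one", "two", "three"]
    let b = Array<String>(["one", "two", "three"])
    let c = ["one", "two", "three"]      // shortest form, type inferred from the literal

    println(a.size)      // number of elements; it stays fixed after creation
    println(b[0])        // indexing starts at zero
    for (w in c) {       // arrays can be iterated directly
        println(w)
    }
}
```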

Ranges

It may seem odd to have a dedicated built-in type for a "sequence of values" (in other words, a combination of a few values - start, end and step), but ranges can be used as Iterable, for example in a for loop.

[Image: for loop over a range in the Cangjie programming language]

Actually it is the only form of for loop in the language. There is no "classic" form with three parts, as found in C or Java. So here 1..10 is not just another piece of "syntax sugar" but an ordinary expression of the Range<Int64> type. Python programmers may find it more familiar.
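A short sketch of ranges used as ordinary values may help here. The variable name evens is only illustrative.

```cangjie
main() {
    // 1..5 is half-open: the end value is not included.
    for (i in 1..5) {
        println(i)            // 1, 2, 3, 4
    }

    // A range is a value of type Range<Int64> and can be stored in a variable.
    let evens = 0..10 : 2     // start..end : step
    for (i in evens) {
        println(i)            // 0, 2, 4, 6, 8
    }

    // ..= makes the end inclusive.
    for (i in 1..=3) {
        println(i)            // 1, 2, 3
    }
}
```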

Tuples

Like Python and C#, the language includes "tuples" - immutable pairs, triplets etc. of values, possibly of different types. The syntax is quite familiar; they can be packed and unpacked at will:

[Image: tuple unpacking in the Cangjie programming language]

This I think is very convenient - Java and Go do not have tuples and sometimes it is annoying that they don't.
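A minimal sketch of packing and unpacking follows. The helper divmod is a hypothetical function invented for this example, and its return type is deliberately left to inference.

```cangjie
// Returns two results at once as a tuple.
func divmod(a: Int64, b: Int64) {
    (a / b, a % b)
}

main() {
    let pair = ("answer", 42)        // elements may have different types
    println(pair[0])                 // elements are accessed by a constant index
    let (label, value) = pair        // unpacking into two new variables
    println("${label} = ${value}")

    let (q, r) = divmod(17, 5)       // a typical use: returning several values
    println("${q} remainder ${r}")
}
```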

Unit and Nothing types

Here come two very amusing types. The first of them, Unit, has only one value (one fewer than Bool), which is written as empty parentheses (). As you can see, it is used in place of void. The second, Nothing, has one value fewer still, i.e. none at all. It is the type of expressions like break and continue - a somewhat "technical" type which cannot be assigned directly. Unit, meanwhile, can be assigned and compared, though this is of little use.

Enums

Types which may have user-defined set of values. Not necessary, but convenient. They have dedicated "matching" expressions and syntax to facilitate usage. One important case is mentioned further.
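As a small sketch, an enum and a match expression over it could look like this. The Color type and the describe function are hypothetical names chosen for the example.

```cangjie
enum Color {
    | Red | Green | Blue
}

func describe(c: Color): String {
    // match is an expression; here it covers every constructor of the enum
    match (c) {
        case Red => "warm"
        case Green => "fresh"
        case Blue => "cold"
    }
}

main() {
    println(describe(Color.Green))
}
```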

Nulls and Pointers

There are no null values or "pointers" (except in "foreign" language interoperability modules). Instead of null there is an Option enum type, which can have either a Some<...> value or None. This concept is often seen in "functional" languages, for example in Scala and Haskell.

I personally don't find handling them very convenient, but it is supposed to give better protection against null-pointer errors and similar issues. Though, as programmers are lazy, this protection doesn't always work as intended.
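To give an impression of what handling Option looks like in practice, here is a hedged sketch. The function findIndex is hypothetical and written just for this example; the ?? operator supplies a default when the value is None.

```cangjie
// Returns the position of target, or None when it is absent.
func findIndex(words: Array<String>, target: String): Option<Int64> {
    for (i in 0..words.size) {
        if (words[i] == target) {
            return Some(i)
        }
    }
    return None
}

main() {
    let words = ["one", "two", "three"]

    // Both cases have to be handled explicitly.
    match (findIndex(words, "two")) {
        case Some(i) => println("found at ${i}")
        case None => println("not found")
    }

    // The lazy way: fall back to a default value.
    let idx = findIndex(words, "ten") ?? -1
    println(idx)
}
```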

Expressions

Most programming languages separate "statements" from "expressions". For example println("Hi, people!") is a statement while the string inside parentheses is expression. Statements could be nested and statements may contain expressions. Actually println is also expression which is calculated as a reference to function.

However Cangjie tries to simplify the matter, and everything in it is an expression. This leads to some useful consequences: for example, the conditional operator is also an expression and can be used as a "ternary operator":

println(if (true) { 5 } else { 8 })

At the same time result of the for loop is a value of type Unit - why not Nothing? I don't see useful logic here.

Loops

We have already seen the for-in loop - the only loop of this kind, which can iterate over everything implementing the Iterable interface: ranges, arrays or the collections which we'll mention further. There is an additional feature, the where clause, somewhat like "list comprehensions" in Python - it allows skipping some iterations of the for loop:

for (i in 0..8 where i % 2 == 1)

Though of course this example could be expressed by the range step itself, e.g.

for (i in 1..8:2)

The language also has while and do-while loops, properly following the pattern of good old C and its derivatives.


About Performance

Now as we know some basic syntax and can write small programs, let's check the performance of the compiled programs.

We know that interpreted languages (like PHP and Python) are generally 5 to 20 times slower than "natively compiled" ones, of which C/C++ are among the fastest (as they don't deal with automatic memory management). Go, as an example of another natively compiled language, is only slightly slower. Java takes an intermediate place: it is compiled to byte-code executed by a virtual machine, i.e. not into "native code", but it has "just in time" compilation to speed up the code at runtime and is normally only 2-3 times slower than C, though this depends on the application (again due to memory management rather than pure instruction execution speed).

The Cangjie language seems to be natively compiled, so we'll try to implement a couple of programs which I have in other languages in the dedicated repository Languages Benchmark.

The first uses the Collatz Conjecture - an unsolved problem of math, in which we build sequences of numbers from a given starting value, and they (supposedly) always end at 1, but after a number of iterations which is hard to predict. We calculate these sequences, or rather their lengths, for all numbers up to maxN - and thus the program is just a double nested loop with scalar variables. There is no memory management, so it is mostly about pure instruction execution.

import std.os.*;
import std.convert.*;

main() {
  var maxn = Int.parse(if (let Some(v) <- getEnv("MAXN")) { v } else { "10" })
  var sum = 0
  for (i in 1..(maxn+1)) {
    sum += collatz(i)
  }
  println("sum=${sum}")
}

func collatz(n: Int): Int {
  var cnt = 0
  var res = n
  while (res > 1) {
    res = if (res % 2 > 0) {
      res * 3 + 1
    } else {
      res / 2
    }
    cnt++
  }
  return cnt
}

In this example we also try some functions of "standard library" - you see "imports" above - for Int.parse(...) and getEnv(...) functions. They will allow us to pass maxN as environment variable. Additionally you see my clumsy attempt to deal with Option<String> value returned by getEnv(...). Some other languages would prefer to simply return empty string if no variable is set - or provide extra argument for default value (I believe this is the best idea).

Execution of the C equivalent (from the repo mentioned above) takes about 0.73 seconds for maxN of 3 million, while Cangjie takes something like 1.03 seconds, which is very close to the Go result.

Note however that we need to compile with -O2 optimization switch, otherwise it will work much slower.

The second problem just generates a list of primes of size maxN, by going over all integers and testing them with "trial division" until a sufficient number of primes is found. Let this source be represented as an image to preserve highlighting.

[Image: generating a list of primes in the Cangjie programming language]

Here the results are surprisingly nice - the code runs 2 seconds in Cangjie against 4 seconds in Go. Of course this now has much to do with the implementation of arrays (the Cangjie version uses the ArrayList collection). We don't use C for comparison as it doesn't have automatically growing lists/arrays out of the box.


Object-Oriented Programming

We'll only briefly touch this as mostly implementation of OOP in Cangjie is close to that in Java, particularly:

There are also some differences, though - let's see them.

Structs vs Classes

Besides class definitions there are also struct definitions - they also may have member functions and fields. Key difference is that former are reference types while latter are value types - as hinted by colleagues, such a feature exists in C# and Swift, while Go for example allows using structs both ways with explicit handling by reference or by value. There are some more subtle things about them (mutability of fields, lack of inheritance etc) which we won't cover now.
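The reference-versus-value difference can be seen in a sketch like the one below. The type names CounterClass and CounterStruct are hypothetical, and the sketch assumes that a type whose fields all have default values can be created without arguments.

```cangjie
class CounterClass {
    var n = 0
}

struct CounterStruct {
    var n = 0
}

main() {
    // Class: assignment copies the reference, both names point to one object.
    let a = CounterClass()
    let b = a
    b.n = 5
    println(a.n)      // 5

    // Struct: assignment copies the value, the original stays untouched.
    var s1 = CounterStruct()
    var s2 = s1
    s2.n = 5
    println(s1.n)     // 0
}
```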

Operators overloading (redefinition)

Like in C++ it is possible to define member functions which are invoked by applying what syntactically are operators - i.e. for class Vector we can define operators +, - etc. For channels - operators which look like "pipelining", i.e. >> and so on. This of course is not necessary but sometimes may be convenient.
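A minimal sketch of such an operator definition might look as follows. The class Vec2 is a hypothetical two-dimensional vector written only for this example.

```cangjie
class Vec2 {
    let x: Int64
    let y: Int64

    public init(x: Int64, y: Int64) {
        this.x = x
        this.y = y
    }

    // Invoked for the expression a + b
    public operator func +(rhs: Vec2): Vec2 {
        Vec2(this.x + rhs.x, this.y + rhs.y)
    }
}

main() {
    let v = Vec2(1, 2) + Vec2(10, 20)
    println("(${v.x}, ${v.y})")     // (11, 22)
}
```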

Extensions

This feature is more like OOP implementation in Go - we can add methods to existing classes, though only in the scope of the given package. Example demonstrates adding a method printSize to the predefined type String:

[Image: class extensions in the Cangjie programming language]
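In text form, an extension adding printSize to String could be sketched roughly like this. The body of the method and the printed message are assumptions, as the original screenshot is not reproduced here.

```cangjie
// Adds a new method to the predefined String type.
extend String {
    public func printSize() {
        println("the size is ${this.size}")
    }
}

main() {
    let s = "123"
    s.printSize()     // the size is 3
}
```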


Functional Programming

On the language level the main FP-feature is the syntax for defining anonymous functions, particularly in form of lambda-expressions and ability to pass functions as parameters, assign them etc.

Other features, for processing data in a "functional" style, are mainly implemented as functions in the std.collection package. This is somewhat similar to how it came with Java 8, but with a rather specific syntax employing "currying". For example, the map function doesn't take the collection to be mapped itself. Instead it accepts a transformation function as a parameter - and the result is the real "mapping function", which one can in turn apply to the collection itself:

[Image: functional features for processing a collection in the Cangjie programming language]

It will print [3, 3, 5, 4, 4] of course. You see, after the map function there follows the "parentheses-less" form of passing a lambda - curly braces with the => arrow inside. This results in the mapping function, which is then called with proper parentheses on the arr argument. The result is an Iterator, which can be turned into a suitable collection by using one of the collectX functions. It works but may feel somewhat clumsy compared to Python, for example.
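Reconstructed from the description above, the snippet might look roughly like this. The input words are an assumption, picked so that their lengths give the printed result; collectArray is used here as one of the collectX functions.

```cangjie
import std.collection.*

main() {
    // Assumed input: five words with lengths 3, 3, 5, 4, 4
    let arr = ["one", "two", "three", "four", "five"]

    // map { ... } yields the mapping function, which is then applied to arr;
    // collectArray turns the resulting Iterator into an Array.
    let sizes = collectArray(map { s: String => s.size }(arr))
    println(sizes)
}
```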


Exceptions

They exist in the language (unlike in Go, for example). But there are no "obligatory-style" (checked) exceptions, which I think are found only in Java. The keywords try, catch and finally, and the try-with-resource form, are all pretty familiar to Java developers, so I feel we need not say more on the topic. They exist, and that is the important point :)
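Just to show the familiar shape, here is a small sketch. The function requirePositive is hypothetical and written only for this example.

```cangjie
// Throws when the argument is negative, otherwise returns it unchanged.
func requirePositive(n: Int64): Int64 {
    if (n < 0) {
        throw IllegalArgumentException("negative value: ${n}")
    }
    return n
}

main() {
    try {
        println(requirePositive(-1))
    } catch (e: IllegalArgumentException) {
        println("caught: ${e.message}")
    } finally {
        println("finally runs in either case")
    }
}
```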


Collections

Again for Java-developer this part looks pretty familiar. Package std.collection provides us with ArrayList and LinkedList, also with HashMap, HashSet and TreeMap. Of course there are subtle details about equality and ordering for the latter ones, but this you'll easily find in the corresponding API documentation.

Another useful package is std.collection.concurrent which provides BlockingQueue, ArrayBlockingQueue, ConcurrentHashMap and NonBlockingQueue - the latter is like "channels" in Go, but without obligatory size limit.

The aforementioned Array type, though it looks syntactically similar, is rather different and shouldn't be confused with these collections, being a "built-in" type. Overall, in handling "basic" or "built-in" types the language feels more "unified", like C# rather than Java.


Multithreading and Concurrency

A typical "preemptive" model is implemented, so that all simultaneously running pieces of code get some chunk of processor time. For the programmer it looks like the typical threads model, with the threads implemented in the language itself and mapped to OS threads, so that one OS thread can execute more than one language thread. This allows much faster "context switching" between threads and better control over giving a thread its execution time.

This approach is quite similar to one in Go - just say "thread" instead of "goroutine" and use the keyword spawn instead of go:

[Image: spawning a new thread in the Cangjie programming language]

We remember that curly braces with the "arrow" inside are the syntax of lambda-expression, i.e. function definition - not something specific for spawning the thread.
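A minimal sketch of spawning a thread and waiting for its result could look like this. The messages and the returned value are made up, and the order of the first two printed lines is not guaranteed.

```cangjie
main() {
    // spawn takes a lambda and returns a future for its result
    let fut = spawn { =>
        println("hello from the new thread")
        42
    }

    println("main thread continues")

    // get() blocks until the spawned thread has finished
    println("result: ${fut.get()}")
}
```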

For concurrency and synchronization the language has the familiar atomics (data types which are safe to read and modify from multiple threads), as well as mutexes and the concurrent collection types mentioned above. There is also a ThreadLocal class - a convenient feature found in Java - which is storage for values bound to threads. The keyword synchronized is how mutexes are used, resembling Java, but without a "monitor" at each object.
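As an illustration of atomics, here is a sketch with two threads incrementing a shared counter. It assumes the AtomicInt64 type from the std.sync package; the loop counts are arbitrary.

```cangjie
import std.sync.*

main() {
    let counter = AtomicInt64(0)

    // Two threads increment the same counter without any explicit locking.
    let f1 = spawn { =>
        for (_ in 0..1000) {
            counter.fetchAdd(1)
        }
    }
    let f2 = spawn { =>
        for (_ in 0..1000) {
            counter.fetchAdd(1)
        }
    }

    // Wait for both threads before reading the final value.
    f1.get()
    f2.get()
    println(counter.load())     // 2000
}
```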


Reflection, Annotations and Macros

The language is rich in these "metaprogramming" features, which allow you to examine or modify the code structure itself at runtime or compile time. We won't dive into them right now, as such features usually become interesting in larger projects or some specific frameworks. As a sidenote, it's a bit unusual to meet both reflection and macros in one language.

A kind of conditional compilation is supported, somewhat resembling Go build-tags, but these are available above any declaration and may look like this:

@When[os == "Linux"]

But not as flexible (and horrible) as #ifdefs in C/C++.


Basic Input and Output

This paragraph is only here to mention that currently corresponding part of the "development guide" contains three blank pages :)

However there are API descriptions about packages std.io and std.fs and even std.socket so one should be able to work with these features anyway.


Tools

Modern languages generally are somewhat "incomplete" without useful tools accompanying development. Out of the box Cangjie provides:

A plugin for the VSCode IDE is also provided. I don't use this IDE, so I haven't tried it. For vim, which I use, there exists at least a syntax highlighter provided by some enthusiast - and seemingly some plugin for IntelliJ IDEA. However, it seems the language server is not ready yet, which somewhat limits these third-party tools. Below is an image of how that "colorizer" for vim represents the code:

[Image: syntax highlighting scheme used for the Cangjie language in the vim editor]


Conclusion

The Cangjie language seems to be aimed at replacing Java in corporate development, as Java nowadays is a bit of an overkill with its compilation to "run everywhere" bytecode, while server backend software is generally executed in a pretty specific environment (often in docker and kubernetes), so it could be better to compile it natively. At the same time the language is somewhat richer than both Java and Go.

On the other hand the language doesn't bring forward any unusual "killer-feature" (like "garbage-less" memory management in Rust or the strict type system in Haskell) - there is no specific "innovation" at all. This may be good, as such innovations often lead to a steep learning curve and premature language demise :)