This article contains a quick overview of Cangjie, the new programming language created by Huawei. It was already available in a limited way a year earlier,
and by now Huawei provides download bundles for Windows, Linux and Darwin on its web-site:
https://cangjie-lang.cn/en/download
We'll briefly walk through downloading and running the compiler, and then browse various language features, comparing them with other
popular industrial languages. If you have experience with languages like Java and Go (and to a lesser extent C++), you'll find it somewhat
familiar. The language doesn't try to surprise the user with brain-blowing innovations; seemingly the aim was to provide a
natively-compiled language to which Java and C# programmers can easily migrate.
There were talks and rumors that the language would use hieroglyphs, or that it is dedicated to usage with AI; you'll see neither in this article. Code is written with typical English keywords, and AI integration, if present at all, is not part of the language itself.
So the web-site provides bundles for downloading (I picked Linux, the most typical choice for a backend programmer), and there are also two useful links: Documentation and Playground. I will browse the first when talking about the language, and use either the local installation or the playground to figure out and demonstrate the syntax.
After downloading I simply unpacked the bundle to a suitable folder (e.g. ~/utils/cangjie) and found there a bin subfolder with the cjc
executable, obviously the compiler. Let's create a file test.cj with a simple program.

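The original snippet has not survived here. Judging by the output shown further, a minimal test.cj would look roughly like the sketch below (a reconstruction, not necessarily the author's exact code):

```cangjie
// test.cj: a reconstruction of the first program
// the entry point is main, declared without the func keyword
main() {
    println("Nihao, I'm a fine language :)")
}
```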
Then we try to compile it (even without further setup), e.g. ~/utils/cangjie/bin/cjc test.cj. It works fine, and several files are created,
among them an executable named main. But it won't run:
```
$ ./main
./main: error while loading shared libraries: libcangjie-runtime.so:
cannot open shared object file: No such file or directory
```
So we obviously need to set some environment variables. Luckily there is a script envsetup.sh in the root of the bundled files:
```
$ source ~/utils/cangjie/envsetup.sh
```
This command will amend both PATH (so that cjc could be accessed without specifying the path) and LD_LIBRARY_PATH (so that executables
will find the dynamically linked library). Now that's better:
```
$ ./main
Nihao, I'm a fine language :)
```
As for the library: running cjc --help shows that there are options for building with static libraries, but I haven't
succeeded in using them to create a fully self-contained executable.
The things discussed further, which are not about compiling, you can try in the "playground" mentioned above. Sorry for the screenshots of the code, but as there is no ready markup highlighter for Cangjie I will use images for short snippets (while larger fragments will be included as text).
Identifiers are very usual: they start with a letter and may include digits; underscores are also allowed. Letters are understood in the Unicode
sense, so national characters are also valid. There is also a feature of surrounding an identifier with backticks (as in `` `if` ``), which allows,
for example, using reserved words as identifiers (perhaps not a very useful feature).
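A small sketch of both identifier features, assuming a compiler version with Unicode identifier support as described above; the names are made up for illustration:

```cangjie
main() {
    // a reserved word used as an identifier by wrapping it in backticks;
    // the backticks are needed every time the name is referenced
    let `if` = 5
    // an identifier made of national (Unicode) characters
    let 数量 = 3
    let total = `if` + 数量
    println("total=${total}")
}
```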
Variables are declared with the keywords let and var; the difference is that the former creates an immutable variable (in the given scope). This is supposed to help
avoid mistakes (and may help in concurrent programming as well). The type of a variable is usually inferred but can be specified explicitly.
There is also const, which may look similar to let but differs in representation and possible optimization, as its value is computed at compile time.
```cangjie
var a = "thirteen"
let b: Int64
let c: Array<String> = ["one", "two", "three"]
```
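The sketch below puts the three kinds of declarations side by side; LIMIT, counter, name and doubled are hypothetical names. It also shows that a local let declared without a value, like b above, may be initialized once later:

```cangjie
// const values are evaluated at compile time
const LIMIT: Int64 = 3

main() {
    var counter = 0          // mutable, type inferred as Int64
    let name = "cangjie"     // immutable, type inferred as String
    let doubled: Int64       // declared with an explicit type...
    doubled = LIMIT * 2      // ...and initialized exactly once later
    while (counter < LIMIT) {
        counter++
    }
    // name = "java" would not compile: let variables are immutable
    println("${name}: counter=${counter}, doubled=${doubled}")
}
```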
From the example above you may have thought that functions are declared without any keyword, but that is only partly correct. Only the main function
is declared without a keyword; all others must use func in front of the name. Let's have a look at an example:

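The original example is not reproduced here; the following is a sketch in the same spirit, with a hypothetical helper joinFirstTwo, illustrating the points listed next:

```cangjie
// ordinary functions need the func keyword (joinFirstTwo is a made-up example)
func joinFirstTwo(arr: Array<String>): String {
    return arr[0] + " " + arr[1]   // indexing is zero-based
}

// main needs no keyword; the args parameter is optional
main(args: Array<String>): Int64 {
    let words = ["hello", "world"]
    println(joinFirstTwo(words))          // println takes a single argument
    println("args given: ${args.size}")
    return 0                              // becomes the process exit code
}
```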
From here we can learn several things:
- command-line arguments are passed to main in the form of an array of strings, but this parameter is optional
- array elements are accessed with square brackets (arr[i]), and indexing is zero-based
- the println function only accepts a single parameter
- the return type of main can be omitted or specified as Unit (not uint!); alternatively it can be Int64 to return an exit code

Now about the basic types; "basic" here means not the "simplest" but belonging to the language core. These types fall into several categories:
- integers, signed and unsigned, of 8, 16, 32 or 64 bits, including "native" types whose size depends on the current hardware architecture
- floating-point types Float32 and Float64, plus the shorter Float16 (I learnt with amusement that it is included in the IEEE 754 standard, though I doubt its usability given its short precision of 2-3 digits)
- boolean (Bool), character (Rune) and string (String) types, also very typical; strings are immutable (again quite usual) and support variable interpolation with syntax similar to that of PHP (e.g. "I am ${person.age} years old")

Operations on these types are also very typical, so you can mostly apply your C++, Go or Java experience. Minor subtleties can be found,
though (e.g. ++ exists in postfix form only).
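A quick tour of these types as a sketch; the variable names are arbitrary, and the Rune literal uses the r'...' form:

```cangjie
main() {
    let small: Int8 = 127             // 8-bit signed integer
    let big: UInt32 = 4000000000      // 32-bit unsigned integer
    let native: IntNative = 42        // size depends on the platform
    let half: Float16 = 0.5           // IEEE 754 half precision
    let flag: Bool = small > 100
    let letter: Rune = r'A'           // a single Unicode character
    var n = 41
    n++                               // only the postfix form exists
    let age = 30
    println("I am ${age} years old, n=${n}, flag=${flag}")
}
```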
Arrays are declared with the type name Array and a type parameter in angle brackets, e.g. Array<String>, though this can also be omitted.
In other regards everything is quite usual; arrays are of fixed size.

Here the declaration could be let words: Array<String> = ["...", "..."] or alternatively let words = Array<String>(["...", "..."]), but
we use "syntactic sugar" for brevity. The type parameter (along with the angle brackets) can be omitted if the compiler can infer it from context,
i.e. let words = Array([...]) will work as well.
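The original snippet is lost; this sketch shows the short literal form with indexing, the size property and iteration (the word list is made up):

```cangjie
main() {
    // full form: let words: Array<String> = ["alpha", "beta", "gamma"]
    let words = ["alpha", "beta", "gamma"]   // type inferred as Array<String>
    println("first=${words[0]}, size=${words.size}")
    for (w in words) {                        // arrays are Iterable
        println(w)
    }
}
```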
It may seem odd to have a dedicated built-in type for ranges, i.e. "sequences of values" (in fact a combination of a few values: start, end
and step), but they can be used as Iterable, for example in a for loop.

Actually this is the only form of the for loop in the language; there is no "classic" three-part form found in C or Java. So here
1..10 is not just another piece of "syntactic sugar" but an expression of type Range<Int64>. Python programmers may find this more
familiar.
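A sketch of ranges as ordinary values, with an exclusive end, an inclusive end (..=) and a step; the sums are only there to make the iterated values visible:

```cangjie
main() {
    let r: Range<Int64> = 1..10   // a range is an ordinary value
    var sum = 0
    for (i in r) {                // 1, 2, ..., 9: the end is exclusive
        sum += i
    }
    var evens = 0
    for (i in 0..=10 : 2) {       // inclusive end with step 2: 0, 2, ..., 10
        evens += i
    }
    println("sum=${sum}, evens=${evens}")
}
```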
Like Python and C#, the language includes tuples: immutable pairs, triplets, etc. of values, possibly of different types. The syntax is
quite familiar; tuples can be packed and unpacked at will:

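Since the original snippet is missing, here is a sketch with a hypothetical minMax function that packs two values into a tuple, followed by access by index and unpacking in a let declaration:

```cangjie
// a made-up function returning two values at once as a tuple
func minMax(a: Int64, b: Int64): (Int64, Int64) {
    return if (a < b) { (a, b) } else { (b, a) }
}

main() {
    let pair = minMax(7, 3)        // packing: (3, 7)
    println("first=${pair[0]}")    // access by index
    let (lo, hi) = minMax(7, 3)    // unpacking into two variables
    println("lo=${lo}, hi=${hi}")
}
```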
I think this is very convenient: Java and Go do not have tuples, and sometimes it is annoying that they don't.
Here come two rather amusing types. The first, Unit, has only one value (one less than Bool), written as empty parentheses ().
As you have seen, it is used in place of void. The second, Nothing, has yet one value less, i.e. no values at all. It is the type of expressions like
break and continue: a somewhat "technical" type which cannot be assigned directly. Meanwhile Unit values can be assigned and compared, though
this is a bit useless.
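A tiny sketch of Unit as a value; greet is a hypothetical function:

```cangjie
// a function with no meaningful result returns Unit, written ()
func greet(): Unit {
    println("hi")
}

main() {
    let u = greet()          // u holds the only Unit value, ()
    println("${u == ()}")    // Unit values can be compared
}
```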
Enums are types which may have a user-defined set of values. Not necessary, but convenient. They come with dedicated "matching" expressions and syntax to facilitate their usage. One important case is mentioned further.
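A sketch of an enum whose constructors carry values, consumed with a match expression; Shape, Square, Rect and area are hypothetical names:

```cangjie
// an enum whose constructors carry values (a made-up example)
enum Shape {
    | Square(Int64)
    | Rect(Int64, Int64)
}

func area(s: Shape): Int64 {
    // match is an expression and must cover every constructor
    return match (s) {
        case Square(side) => side * side
        case Rect(w, h) => w * h
    }
}

main() {
    println("${area(Square(3))} ${area(Rect(2, 5))}")
}
```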
There are no null values or "pointers" (except in "foreign" language interoperability modules). Instead of null there is the Option enum type,
whose value is either Some(...) or None. This concept is often seen in "functional" languages, Scala and Haskell for example.
I personally don't find handling them very convenient, but it is supposed to provide better protection against null-pointer errors and similar issues. Though as programmers are lazy, this protection doesn't always work as intended.
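A sketch of the two common ways to consume an Option: matching on Some/None, and the ?? operator, which supplies a default for None. findIndex is a hypothetical helper:

```cangjie
// a made-up lookup that returns None instead of null when nothing is found
func findIndex(words: Array<String>, target: String): Option<Int64> {
    for (i in 0..words.size) {
        if (words[i] == target) {
            return Some(i)
        }
    }
    return None
}

main() {
    let words = ["red", "green", "blue"]
    match (findIndex(words, "green")) {
        case Some(i) => println("found at ${i}")
        case None => println("not found")
    }
    let idx = findIndex(words, "black") ?? -1   // default value for None
    println("black -> ${idx}")
}
```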
Most programming languages separate "statements" from "expressions". For example, println("Hi, people!") is a statement, while the string inside
the parentheses is an expression. Statements can be nested and may contain expressions. Strictly speaking, println is also an expression, which
evaluates to a reference to the function.
However, Cangjie simplifies the matter: everything in it is an expression. This has some useful consequences; for example,
the conditional operator is also an expression and can be used as a "ternary operator":
```cangjie
println(if (true) { 5 } else { 8 })
```
At the same time, the result of a for loop is a value of type Unit. Why not Nothing? I don't see useful logic here.
We have already seen the for-in loop, the only loop of its kind, which can iterate over anything implementing the Iterable interface: ranges,
arrays or the collections mentioned further. It has an additional feature, the where clause, somewhat like "list comprehensions" in
Python, which allows skipping some iterations of the loop:
```cangjie
for (i in 0..8 where i % 2 == 1)
```
Though of course this example could be expressed by the range step itself, e.g.
```cangjie
for (i in 1..8:2)
```
The language also has while and do-while loops, properly following the pattern of good old C and its derivatives.
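All three loop forms in one sketch: for-in with a where filter, while, and do-while (the computations are arbitrary):

```cangjie
main() {
    var odd = 0
    for (i in 0..8 where i % 2 == 1) {   // visits 1, 3, 5, 7
        odd += i
    }
    var n = 5
    var fact = 1
    while (n > 1) {                      // computes 5!
        fact *= n
        n--
    }
    var k = 0
    do {                                 // the body runs at least once
        k++
    } while (k < 3)
    println("odd=${odd}, fact=${fact}, k=${k}")
}
```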
Now as we know some basic syntax and can write small programs, let's check the performance of the compiled programs.
We know that interpreted languages (like PHP and Python) are generally 5 to 20 times slower than "natively compiled" ones, of which
C/C++ is among the fastest (as it doesn't deal with automatic memory management). Go, as an example of another natively compiled language,
is only slightly slower, while Java takes an intermediate place: it is compiled to byte-code executed by a virtual machine, i.e. not
into "native code", but it has "just in time" compilation to speed up the code at runtime and is normally only 2-3 times slower than C, though
this may depend on the application (again due to memory management rather than pure instruction execution speed).
The Cangjie language seems to be natively compiled, so we'll try to implement a couple of programs which I already have in other languages in the
dedicated repository Languages Benchmark.
The first uses the Collatz conjecture, an unsolved problem of mathematics,
in which we build sequences of numbers from a given starting value; they (supposedly) always end at 1, but after a number of iterations
which is hard to predict. We calculate these sequences, or rather their lengths, for all numbers up to maxN, so the program is just a
double nested loop with scalar variables. There is no memory management involved, so it is mostly about pure instruction execution.
```cangjie
import std.os.*;
import std.convert.*;

main() {
    // read maxN from the MAXN environment variable, defaulting to "10"
    var maxn = Int.parse(if (let Some(v) <- getEnv("MAXN")) { v } else { "10" })
    var sum = 0
    for (i in 1..(maxn+1)) {
        sum += collatz(i)
    }
    println("sum=${sum}")
}

// number of Collatz steps needed to get from n down to 1
func collatz(n: Int): Int {
    var cnt = 0
    var res = n
    while (res > 1) {
        res = if (res % 2 > 0) {
            res * 3 + 1
        } else {
            res / 2
        }
        cnt++
    }
    return cnt
}
```
In this example we also try some functions of the "standard library" (note the imports above): Int.parse(...) and getEnv(...).
They allow us to pass maxN as an environment variable. Additionally, you can see my clumsy attempt to deal with the Option<String> value
returned by getEnv(...). Some other languages would simply return an empty string if the variable is not set, or provide an extra argument
for a default value (I believe this is the best idea).
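Something close to the "default value" approach is available via the ?? operator on Option. A sketch, assuming getEnv lives in std.os as in the code above (other SDK versions may place environment access in a different package):

```cangjie
import std.os.*
import std.convert.*

main() {
    // ?? unwraps the Option, falling back to "10" when MAXN is not set
    let maxn = Int64.parse(getEnv("MAXN") ?? "10")
    println("maxn=${maxn}")
}
```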
Execution of the C equivalent (from the repository mentioned above) takes about 0.73 seconds for 3 million (the value of maxN), while Cangjie takes
about 1.03 seconds, which is very close to the Go result.
Note, however, that we need to compile with the -O2 optimization switch; otherwise the program runs much slower.
The second problem simply generates a list of primes of size maxN, going over all integers and testing them with "trial division"
until enough primes are found. Let this source be represented as an image to preserve highlighting.

Here the results are surprisingly nice: the code runs in 2 seconds in Cangjie against 4 seconds in Go. Of course this now has much to
do with the implementation of arrays (the Cangjie version uses the ArrayList collection). We don't use C for comparison as it doesn't have
automatically growing lists/arrays out of the box.
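Since the author's source is only available as an image, here is an independent sketch of the same trial-division idea. Unlike the author's version it preallocates a fixed-size Array instead of using ArrayList, and firstPrimes is a hypothetical name:

```cangjie
// Trial division: test each candidate against the primes found so far,
// stopping as soon as p * p exceeds the candidate.
func firstPrimes(n: Int64): Array<Int64> {
    var primes = Array<Int64>(n, { i => 0 })   // preallocated, zero-filled
    var count = 0
    var candidate = 2
    while (count < n) {
        var isPrime = true
        for (j in 0..count) {
            let p = primes[j]
            if (p * p > candidate) {
                break
            }
            if (candidate % p == 0) {
                isPrime = false
                break
            }
        }
        if (isPrime) {
            primes[count] = candidate
            count++
        }
        candidate++
    }
    return primes
}

main() {
    let primes = firstPrimes(10)
    println("10th prime = ${primes[9]}")
}
```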
We'll only briefly touch on OOP, as its implementation in Cangjie is mostly close to that in Java, particularly:
- classes support single inheritance only (while C++ and Python use multiple inheritance, and Go instead uses aggregation)
- interfaces are implemented explicitly (unlike Go, where any class implements an interface if it has suitable methods)
- visibility is controlled with access modifiers such as public and private (unlike Go, where two levels, public and private, are distinguished by the first letter of the identifier)

There are also some differences, though; let's see them.
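Before turning to the differences, the Java-like parts listed above can be sketched in a few lines; HasArea, Named and Rect are hypothetical names:

```cangjie
// interfaces are implemented explicitly with <: (no duck typing as in Go)
interface HasArea {
    func area(): Int64
}

// a class must be marked open to allow (single) inheritance
open class Named {
    protected let name: String
    public init(name: String) {
        this.name = name
    }
    public func describe(): String {
        return "I am ${name}"
    }
}

// one parent class plus any number of interfaces
class Rect <: Named & HasArea {
    private let w: Int64
    private let h: Int64
    public init(w: Int64, h: Int64) {
        super("rect")
        this.w = w
        this.h = h
    }
    public func area(): Int64 {
        return w * h
    }
}

main() {
    let r = Rect(2, 3)
    println("${r.describe()}, area=${r.area()}")
}
```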
Structs vs Classes
Besides class definitions there are also struct definitions, which can also have member functions and fields. The key difference is that the
former are reference types while the latter are value types. As colleagues hinted, such a feature exists in C# and Swift, while Go,
for example, allows using structs both ways, with explicit handling by reference or by value.
There are some more subtle things about them (mutability of fields, lack of inheritance etc.) which we won't cover now.
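The practical effect of the value/reference difference shows up on assignment. A sketch with two otherwise identical hypothetical types:

```cangjie
struct PointS {            // value type
    var x: Int64
    public init(x: Int64) {
        this.x = x
    }
}

class PointC {             // reference type
    var x: Int64
    public init(x: Int64) {
        this.x = x
    }
}

main() {
    let s1 = PointS(1)
    var s2 = s1            // copies the whole value
    s2.x = 10              // s1 is unaffected
    let c1 = PointC(1)
    let c2 = c1            // copies only the reference
    c2.x = 10              // c1 sees the change
    println("s1.x=${s1.x}, c1.x=${c1.x}")
}
```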
Operators overloading (redefinition)
As in C++, it is possible to define member functions which are invoked by applying what syntactically are operators, i.e. for a class Vector
we can define operators +, - etc. For channels, operators which look like "pipelining", i.e. >> and so on, could be defined in the same way.
This of course is not necessary but can sometimes be convenient.
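A sketch of the Vector case mentioned above, written as a struct so that sums are plain values:

```cangjie
struct Vector {
    let x: Int64
    let y: Int64
    public init(x: Int64, y: Int64) {
        this.x = x
        this.y = y
    }
    // invoked when writing a + b with Vector operands
    public operator func +(other: Vector): Vector {
        return Vector(x + other.x, y + other.y)
    }
    // invoked when writing a - b with Vector operands
    public operator func -(other: Vector): Vector {
        return Vector(x - other.x, y - other.y)
    }
}

main() {
    let v = Vector(1, 2) + Vector(3, 4) - Vector(1, 1)
    println("(${v.x}, ${v.y})")
}
```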
Extensions
This feature is more like the OOP implementation in Go: we can add methods to existing types, though only within the scope of the given package.
The example demonstrates adding a method printSize to the predefined type String:

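The original snippet is lost; a sketch of such an extension could look like this (String's size property counts bytes, which equals the character count for ASCII text):

```cangjie
// add a method to the built-in String type
extend String {
    public func printSize() {
        println("the size is ${this.size}")
    }
}

main() {
    "hello".printSize()
}
```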
At the language level, the main FP feature is the syntax for defining anonymous functions, particularly in the form of lambda expressions, and the
ability to pass functions as parameters, assign them etc.
Other features, for processing data in "functional" style, are mainly implemented as functions in the std.collection package. This is somewhat similar
to what came with Java 8, but with a rather specific syntax employing "currying". For example, the map function doesn't take the collection to be
mapped. Instead it accepts a transformation function as a parameter, and the result is the real "mapping function", which one can in turn
apply to the collection itself:

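The original snippet is lost; a sketch consistent with the output quoted next, assuming arr holds five number words and that collectArray from std.collection is used to materialize the result:

```cangjie
import std.collection.*

main() {
    let arr = ["one", "two", "three", "four", "five"]
    // map takes only the transformation (passed as a trailing lambda)
    // and returns a function, which is then applied to arr
    let sizes = map { s: String => s.size }(arr)
    println(collectArray(sizes))
}
```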
It will print [3, 3, 5, 4, 4], of course. Note that after the map function there follows the "parentheses-less" form of passing a lambda: curly
braces with the => arrow inside. This results in the mapping function, which is then called with proper parentheses on the arr argument.
The result is an Iterator, which can be turned into a suitable collection using one of the collectX functions. It works but may feel somewhat
clumsy compared to Python, for example.
Exceptions exist in the language (unlike in Go, for example). But there are no "obligatory-style" (checked) exceptions, which I think are found only in Java.
The keywords try, catch and finally, as well as a form of try-with-resources, are all pretty familiar to a Java developer, so I feel we need not
say more on the topic. They exist, and that is the important point :)
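For completeness, a sketch of the familiar shape; checkedDivide is a hypothetical function, and note that it declares nothing about the exception it may throw:

```cangjie
// a made-up function that throws on invalid input
func checkedDivide(a: Int64, b: Int64): Int64 {
    if (b == 0) {
        throw IllegalArgumentException("division by zero")
    }
    return a / b
}

main() {
    try {
        let q = checkedDivide(10, 0)
        println("q=${q}")
    } catch (e: IllegalArgumentException) {
        println("caught: ${e.message}")
    } finally {
        println("done")
    }
}
```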
As for collections, again for a Java developer this part looks pretty familiar. The std.collection package provides ArrayList and LinkedList, as well as
HashMap, HashSet and TreeMap. Of course there are subtle details about equality and ordering for the latter ones, but you'll easily
find them in the corresponding API documentation.
Another useful package is std.collection.concurrent, which provides BlockingQueue, ArrayBlockingQueue, ConcurrentHashMap and NonBlockingQueue;
the latter is like "channels" in Go, but without a mandatory size limit.
The aforementioned Array type, though syntactically similar, is rather different, being a "built-in" type, and shouldn't be confused with these collections. Overall,
in handling "basic" or "built-in" types the language feels more "unified", like C# rather than Java.
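A short HashMap sketch in the Java spirit, with made-up keys; it uses the index operator for insertion and lookup, and get, which returns an Option rather than null:

```cangjie
import std.collection.*

main() {
    let ages = HashMap<String, Int64>()
    ages["alice"] = 31                   // insert or update via the index operator
    ages["bob"] = 27
    let alice = ages["alice"]            // throws if the key is missing
    let carol = ages.get("carol") ?? 0   // get returns an Option instead
    println("alice=${alice}, carol=${carol}, size=${ages.size}")
}
```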
As for concurrency, a typical "preemptive" model is implemented, so that all simultaneously running pieces of code get some chunk of processor time.
For the programmer it looks like the typical threads model, with threads implemented in the language itself and mapped to OS threads, so that
one OS thread can execute more than one language thread. This allows much faster "context switching" between threads and better control over
giving threads execution time.
This approach is quite similar to Go's: just say "thread" instead of "goroutine" and use the keyword spawn instead of go:

Remember that curly braces with the "arrow" inside are the syntax of a lambda expression, i.e. a function definition, not something specific to spawning the thread.
For concurrency and synchronization the language has familiar atomics (data types which are safe to read and modify from multiple threads),
mutexes and the concurrent collection types mentioned above. There is also a ThreadLocal class, a convenient feature found in Java:
storage for values bound to threads. The keyword synchronized is how mutexes are used, resembling Java, but without a "monitor" in each object.
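A sketch combining spawn with an atomic counter: two threads increment a shared AtomicInt64 from std.sync, and get() on each returned Future waits for completion. The counter and loop sizes are arbitrary:

```cangjie
import std.sync.*

main() {
    let counter = AtomicInt64(0)
    // spawn runs the lambda in a new lightweight thread and returns a Future
    let f1 = spawn { =>
        for (i in 0..1000) {
            counter.fetchAdd(1)
        }
    }
    let f2 = spawn { =>
        for (i in 0..1000) {
            counter.fetchAdd(1)
        }
    }
    f1.get()   // wait for both threads to finish
    f2.get()
    println("counter=${counter.load()}")
}
```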
The language is rich in "metaprogramming" features, which allow you to examine or modify the code structure itself at runtime or compile time. We won't dive into them right now, as such features usually become interesting in larger projects or specific frameworks. As a sidenote, it's a bit unusual to meet both reflection and macros in one language.
A kind of conditional compilation is supported, somewhat resembling Go build tags, but the conditions can be placed above any declaration and
may look like this:
```cangjie
@When[os == "Linux"]
```
It is not as flexible (nor as horrible) as #ifdefs in C/C++, though.
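A sketch of how such annotations select between two declarations; platformName is a hypothetical function, and only one of its two definitions is compiled for a given target OS:

```cangjie
// compiled only when targeting Linux
@When[os == "Linux"]
func platformName(): String {
    return "Linux"
}

// compiled for every other OS
@When[os != "Linux"]
func platformName(): String {
    return "not Linux"
}

main() {
    println(platformName())
}
```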
As for input/output, this paragraph is only here to mention that currently the corresponding part of the "development guide" contains three blank pages :)
However, there are API descriptions for the packages std.io, std.fs and even std.socket, so one should be able to work with these
features anyway.
Modern languages are generally somewhat "incomplete" without useful tools accompanying development. Out of the box Cangjie provides:
- cjpm, the project and package manager
- cjdb, the debugger
- cjprof, the profiler
- cjfmt, the code formatter
- cjcov, the code coverage tool

A plugin for the VSCode IDE is also provided. I don't use this IDE, so I haven't tried it. For vim, which I use, there is at least a syntax highlighter
provided by some enthusiast, and seemingly a plugin for IntelliJ IDEA as well. However, it seems the language server is not ready yet,
which somewhat limits these third-party tools. Below is an image of how that "colorizer" for vim renders the code:

The Cangjie language seems to be aimed at replacing Java in corporate development, as Java nowadays is a bit of an overkill with its
compilation to "run everywhere" bytecode, while server backend software is generally executed in a pretty specific environment (often in
Docker and Kubernetes), so it could be better to compile it natively. At the same time, it is somewhat richer than both Java and Go.
On the other hand, the language doesn't bring forward an unusual "killer feature" (like "garbage-less" memory management in Rust or the strict
type system in Haskell), i.e. no particular "innovation". This may be good, as such innovations often lead to a steep learning curve and
premature language demise :)