This document aims to explain to non-Chinese speakers how Chinese input methods work.
It can be a simple reference for developers of Operating Systems outside of China who want to enable Chinese users to input their own language.
It should be considered a work in progress and might be incomplete or contain some mistakes. Feel free to contact me if you want to add or fix something.
Unfortunately, there are too few FOSS developers in China, and it is hard for non-Chinese to understand how to properly implement Chinese input methods without some minimum knowledge. Hopefully this document will help fill this gap. And who knows, that might even help get more Chinese FOSS developers. :)
There are two main written forms of Chinese: Simplified and Traditional Chinese.
Mainland China writes in Simplified Chinese, while Hong Kong, Macau and Taiwan write in Traditional Chinese.
There are two big classes of Chinese input methods. Each one will be detailed in the next two sections.
Users of phonetic input methods type the romanization of a Chinese character, i.e. how its sound could be written in the Latin alphabet.
For example, 我 (the pronoun “I” or “me”) is pronounced something like “wo”. So a Pinyin user will type those two characters, “w” then “o”, and one of the suggestions will be 我.
Examples of these include Pinyin and Bopomofo.
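The lookup described above can be sketched as a simple table from romanized syllables to candidate characters. This is only an illustration: `PINYIN_TABLE` and `candidates` are hypothetical names, the table holds a handful of sample entries, and a real input method would also rank candidates and handle multi-character words.

```python
# Illustrative table only: a real input method ships a far larger dictionary.
# Tones are ignored, since the user types plain Latin letters.
PINYIN_TABLE = {
    "wo": ["我", "握", "沃"],
    "ni": ["你", "尼", "泥"],
}


def candidates(typed: str) -> list[str]:
    """Return the characters suggested for a romanized syllable."""
    return PINYIN_TABLE.get(typed.lower(), [])


print(candidates("wo"))
```

Typing “w” then “o” returns a list that includes 我, and the user picks it from the suggestions.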
Stroke-based input methods rely on the strokes necessary to write a character.
For example, with a pen and paper, to write 三 (the number 3), one needs to draw three horizontal strokes, from top to bottom.
This is much like writing a “p”: one starts by “drawing” the vertical bar, then adds the round part.
In a stroke-based input method, each type of stroke (vertical, horizontal, curved, ...) is associated with a letter of the Latin alphabet on the keyboard.
The user then types, in the right order, the keys corresponding to the strokes necessary to write the full character.
In the above (trivial) example of the number 3, the horizontal stroke corresponds to the “m” key in the Cangjie (version 3) input method. So to input the number 3, the user would press the “m” key three times.
Cangjie, Quick and handwriting (either with pen and paper or with a touch screen device) are all examples of stroke-based input methods.
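The 三 example can be sketched in a few lines. This is a minimal illustration, not how any real implementation stores its data: `STROKE_TO_KEY` and `keys_for` are hypothetical names, and only the horizontal stroke to “m” pairing comes from the text.

```python
# Only the horizontal stroke -> "m" pairing comes from the text;
# a real Cangjie table covers many more shapes.
STROKE_TO_KEY = {"horizontal": "m"}


def keys_for(strokes: list[str]) -> str:
    """Translate strokes, given in writing order, into the keys to press."""
    return "".join(STROKE_TO_KEY[stroke] for stroke in strokes)


# 三 is written as three horizontal strokes, from top to bottom.
SAN = ["horizontal", "horizontal", "horizontal"]
print(keys_for(SAN))  # mmm
```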
This document was written and reviewed primarily by people in Hong Kong. If we made any mistakes regarding the other regions using Chinese input methods, please let us know.
The most used Chinese input methods are the following:
Cangjie is a very classic stroke-based input method, as explained above. Every word is represented by a combination of up to 5 keys.
Quick is based on Cangjie, with one simple change that makes it easier and reduces the number of keys needed before getting suggestions to only 2: the user only types the first and last keys, corresponding to the first and last strokes in Cangjie.
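The relationship between the two methods can be sketched as follows. The names `CANGJIE`, `quick_code` and `build_quick_index` are hypothetical, and the sample codes are a tiny illustrative subset that should be checked against a real Cangjie table before being relied on.

```python
from collections import defaultdict

# Small sample of Cangjie codes, for illustration only; not a complete table.
CANGJIE = {"一": "m", "二": "mm", "三": "mmm", "林": "dd", "森": "ddd"}


def quick_code(cangjie_code: str) -> str:
    """Keep only the first and last key of a Cangjie code."""
    if len(cangjie_code) <= 2:
        return cangjie_code
    return cangjie_code[0] + cangjie_code[-1]


def build_quick_index(table: dict[str, str]) -> dict[str, list[str]]:
    """Group characters by their Quick code."""
    index = defaultdict(list)
    for char, code in table.items():
        index[quick_code(code)].append(char)
    return dict(index)


QUICK = build_quick_index(CANGJIE)
print(QUICK["mm"])  # ['二', '三']
```

Because several Cangjie codes collapse into the same two keys, a Quick user gets more candidates to choose from for a given input, which is the price paid for having less to memorize.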
Cangjie (and Quick, as it is based on Cangjie) was designed to input the characters of 3 different languages:
Its design is clever enough to limit “collisions” (i.e. a given combination of 4 keys returning candidates in more than one language) to a minimum. When collisions do happen, they are usually limited to rarely used characters or slow-to-type combinations.
As such, most of the time, a Cangjie user will only be presented with candidates in the language they expect based on their input (unless they are not using the version they think they are).
The Cangjie input method (not its implementation in a given Operating System) was first published in 1976.
Since then, a few different versions have been published, each slightly incompatible with the others.
For example, the word “面” (face, surface) is input differently in each version:
These incompatibilities mean that users will have to spend some time learning a new version, almost as if it were a different input method.
Schools teach Cangjie version 3.
This has a lot to do with inertia: schools teach Cangjie 3 because it is the default on Microsoft Windows, which in turn defaults to version 3 because it’s what is taught at school.
This is because Cangjie has a much steeper learning curve than Quick, which is much easier to use.
However, many people stick to Cangjie because, once they have made the effort to learn it properly, it allows them to type much faster.
Stroke 5 is an input method which was created for the elderly and people with reduced hand mobility.
However, to allow typing with few fingers and with relatively few movements, only 5 keys are used (from a US keyboard layout) :
So for example, to write the word 中 (“middle”), one must first write the leftmost vertical stroke, then the top horizontal line and the rightmost vertical line as one stroke, then the bottom horizontal stroke, and finally the long middle vertical stroke.
As such, a user of the Stroke 5 input method would input the “/nm/” combination of keys.
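The 中 example can be sketched as a lookup over key sequences. This is only an illustration under stated assumptions: the three key assignments are the ones implied by the “/nm/” example, the `STROKES` mini-dictionary and `suggest` function are hypothetical, and the idea of also offering characters whose sequence merely starts with the typed keys is an assumption, not something the text specifies.

```python
# Keys implied by the 中 example; the remaining Stroke 5 keys are left out.
VERTICAL, TURNING, HORIZONTAL = "/", "n", "m"

# Hypothetical mini-dictionary: character -> keys, in writing order.
STROKES = {
    "口": VERTICAL + TURNING + HORIZONTAL,
    "日": VERTICAL + TURNING + HORIZONTAL + HORIZONTAL,
    "中": VERTICAL + TURNING + HORIZONTAL + VERTICAL,
}


def suggest(typed: str) -> list[str]:
    """Exact matches first, then characters whose keys start with `typed`."""
    exact = [c for c, keys in STROKES.items() if keys == typed]
    longer = [
        c for c, keys in STROKES.items()
        if keys != typed and keys.startswith(typed)
    ]
    return exact + longer


print(suggest("/nm/"))  # ['中']
print(suggest("/nm"))   # ['口', '日', '中']
```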
In Hong Kong, some groups are showing tremendous results with Stroke 5, giving access to electronic devices and the Internet to people who previously couldn’t input their own language on a keyboard.
Microsoft Windows provides both Cangjie and Quick, both in version 3.
Microsoft Windows is used by virtually everybody in Hong Kong, at home, at school and at work.
Since Windows 7, it can optionally include the results of the respective version 5, but only in addition to the results of version 3.
Mac OS X provides Cangjie and Quick, in a version that “is somewhat like Version 3 and somewhat like Version 5.” [Wikipedia]
Most Mac users of Cangjie in Hong Kong will install the Yahoo input method framework instead of using the default system one, as it allows them to use Cangjie 3 as they are used to.
Quick users tend not to bother, because, given the design of Quick, very few things changed between versions 3 and 5.
This is pretty much a work in progress at the moment; hopefully things should land with GNOME 3.8.
For both Cangjie and Quick, versions 3 and 5 are available, and version 3 is the default.