Chinese Input Methods

Introduction

This document aims to explain to non-Chinese speakers how Chinese input methods work.

It can be a simple reference for developers of Operating Systems outside of China who want to enable Chinese users to input their own language.

It should be considered a work in progress and might be incomplete or contain some mistakes. Feel free to contact me if you want to add or fix something.

I started it because I wanted to help GNOME developers get a better understanding of the needs of the Hong Kong community while IBus was being integrated into GNOME 3.6.

Unfortunately, there are too few FOSS developers in China, and it is hard for non-Chinese speakers to implement Chinese input methods properly without some minimum knowledge. Hopefully this document will help fill this gap. And who knows, it might even help get more Chinese FOSS developers. :)

Written Chinese languages

There are two main written Chinese languages: Simplified and Traditional Chinese.

Mainland China writes in Simplified Chinese, while Hong Kong, Macau and Taiwan write in Traditional Chinese.

Types of Chinese input methods

There are two big classes of Chinese input methods. Each one will be detailed in the next two sections.

IM based on the sounds of words

A user of these input methods types the romanization of the Chinese character, i.e. the Latin-alphabet spelling that approximates the sound of that character.

For example, 我 (the pronoun “I” or “me”) is pronounced something like “wo”. So a Pinyin user will type those two characters, “w” then “o”, and one of the suggestions will be 我.

Examples of these include Pinyin and Bopomofo.
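The lookup described above can be sketched in a few lines of Python. This is only a toy illustration: the `CANDIDATES` table, its entries and the `suggest` function are assumptions made for this example, not data or API from any real input method.

```python
# Toy candidate table: romanization -> candidate characters.
# The entries and their order are illustrative assumptions only.
CANDIDATES = {
    "wo": ["我", "握", "沃"],
    "ni": ["你", "泥"],
}


def suggest(typed: str) -> list[str]:
    """Return the candidate characters for a typed romanization."""
    return CANDIDATES.get(typed.lower(), [])


# The user types "w" then "o" and gets a list of suggestions to pick from.
print(suggest("wo"))
```

A real sound-based input method would use a much larger dictionary and rank candidates by frequency or context, which this sketch does not attempt.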

IM based on the strokes necessary to write a word

These are based on the strokes necessary to write a character.

For example, with a pen and paper, to write 三 (the number 3), one needs to:

  • first write the top horizontal stroke,
  • then add the middle one,
  • and finish with the bottom stroke.

This is much like writing a “p”: one starts by “drawing” the vertical bar, then adds the round part.

In a stroke-based input method, each type of stroke (vertical, horizontal, curved, ...) is associated with a letter of the Latin alphabet on the keyboard.

The user then types, in the right order, the series of letters corresponding to the series of strokes necessary to write the full character.

In the above (trivial) example of the number 3, the horizontal stroke corresponds to the “m” key in the Cangjie (version 3) input method. So to input the number 3, the user would press the “m” key three times.
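As a rough sketch of that idea in Python, the snippet below turns an ordered list of strokes into the keys to press. The stroke name `"horizontal"` and the helper `keys_for` are hypothetical; only the horizontal-stroke-to-“m” association comes from the text above.

```python
# Stroke type -> key. Only the association given in the text is listed;
# a real table would cover every stroke type of the input method.
STROKE_KEYS = {
    "horizontal": "m",
}


def keys_for(strokes: list[str]) -> str:
    """Map an ordered list of strokes to the sequence of keys to press."""
    return "".join(STROKE_KEYS[stroke] for stroke in strokes)


# 三 is written with three horizontal strokes, from top to bottom.
THREE = ["horizontal", "horizontal", "horizontal"]
print(keys_for(THREE))
```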

Cangjie, Quick and handwriting (either with pen and paper or with a touch screen device) are all examples of stroke-based input methods.

Most used Chinese input methods

Note

This document was written and reviewed primarily by people in Hong Kong. If we made any mistakes about the other regions using Chinese input methods, please let us know.

The most used Chinese input methods are the following:

The situation in Hong Kong

Cangjie and Quick

Cangjie is a classic stroke-based input method, as explained above. Every word is represented by a combination of up to 5 keys.

Quick is based on Cangjie, with one simple change that makes it easier to use: the user only types the first and last keys of the Cangjie combination, corresponding to the first and last strokes. This reduces the number of keys needed before getting suggestions to only 2.
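The relationship between the two methods can be sketched as a small Python function. The name `quick_code` is made up for this example, and returning one-key combinations unchanged is an assumption, since the text only describes the first-and-last-key rule.

```python
def quick_code(cangjie_code: str) -> str:
    """Derive the Quick keys from a full Cangjie combination."""
    if len(cangjie_code) <= 1:
        # Assumption: a one-key combination is kept as it is.
        return cangjie_code
    # Keep only the first and the last key.
    return cangjie_code[0] + cangjie_code[-1]


# "mmm" is the Cangjie 3 combination for 三 mentioned earlier.
print(quick_code("mmm"))
```

Because many Cangjie combinations share the same first and last keys, a Quick user gets a list of suggestions to choose from rather than a single character.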

Multiple languages

Cangjie (and Quick, as it is based on Cangjie) was designed to input the characters of 3 different languages:

  • Traditional Chinese
  • Simplified Chinese
  • Japanese

Its design is clever enough to keep “collisions” (i.e. a given combination of 4 keys returning candidates in more than one language) to a minimum. When collisions do happen, they are usually limited to rarely used characters or slow-to-type combinations.

As such, most of the time, a Cangjie user will only be presented with candidates in the language they expect based on their input (unless they are not using the version they think they are).

Different versions

The Cangjie input method (not its implementation in a given Operating System) was first published in 1976.

Since then, a few different versions have been published, each slightly incompatible with the others.

For example, the word “面” (face, surface) is input differently in each version:

  • “mwyl” in Cangjie 3
  • “mwsl” in Cangjie 5

These incompatibilities mean that users will have to spend some time learning a new version, almost as if it were a different input method.
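For an implementation, this suggests keeping one table per version rather than a single shared one. The sketch below is a minimal illustration in Python: the `CODES` structure and the `code_for` and `differs_between_versions` helpers are hypothetical, and the only data they contain are the two combinations for 面 listed above.

```python
# One table per Cangjie version: character -> key combination.
# Only the example from the text is included.
CODES = {
    3: {"面": "mwyl"},
    5: {"面": "mwsl"},
}


def code_for(char: str, version: int = 3) -> str:
    """Return the key combination for a character in a given version."""
    return CODES[version][char]


def differs_between_versions(char: str) -> bool:
    """Tell whether the known versions disagree on a character."""
    combos = {table[char] for table in CODES.values() if char in table}
    return len(combos) > 1


print(code_for("面", 3), code_for("面", 5))
```

Defaulting `version` to 3 here is a choice made for the example only.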

Schools and education

Schools teach Cangjie version 3.

This has a lot to do with inertia: schools teach Cangjie 3 because it is the default on Microsoft Windows, which in turn defaults to version 3 because it’s what is taught at school.

What people use

After learning at school, most people will move from Cangjie to Quick.

This is because the former has a much steeper learning curve than the latter, which is much easier to use.

However, many people stick to Cangjie because, once they have made the effort to learn it properly, it allows them to type much faster.

In any case, the overwhelming majority uses version 3 of their input method of choice, with the rest mostly using version 5.

Stroke 5 for a11y

Stroke 5 is an input method which was created for the elderly and people with reduced hand mobility.

It is stroke-based, just like Cangjie and Quick.

However, to allow typing with few fingers and with relatively few movements, only 5 keys are used (from a US keyboard layout):

  • “n” for the “curved” strokes
  • “m” for the “left to right horizontal” strokes
  • “,” for the “right-top to left-bottom diagonal” strokes
  • “.” for the “left-top to right-bottom diagonal” strokes (and punctuation marks)
  • “/” for the “top to bottom vertical” strokes

So for example, to write the word 中 (“middle”), one must first write the leftmost vertical stroke, then the top horizontal line and the rightmost vertical line as one stroke, then the bottom horizontal stroke, and finally the long middle vertical stroke.

As such, a user of the Stroke 5 input method would input the “/nm/” combination of keys.
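The same encoding can be written as a short Python sketch. The stroke names used as dictionary keys (`"curved"`, `"down_left"`, and so on) and the `encode` helper are invented for this example; the five keys and the stroke order of 中 are the ones described above.

```python
# Stroke 5 layout on a US keyboard: stroke type -> key.
STROKE5_KEYS = {
    "curved": "n",
    "horizontal": "m",   # left to right
    "down_left": ",",    # right-top to left-bottom diagonal
    "down_right": ".",   # left-top to right-bottom diagonal
    "vertical": "/",     # top to bottom
}


def encode(strokes: list[str]) -> str:
    """Map an ordered list of strokes to the Stroke 5 keys to press."""
    return "".join(STROKE5_KEYS[stroke] for stroke in strokes)


# 中: left vertical, top horizontal plus right vertical written as one
# (curved) stroke, bottom horizontal, then the long middle vertical.
ZHONG = ["vertical", "curved", "horizontal", "vertical"]
print(encode(ZHONG))
```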

In Hong Kong, some groups are showing tremendous results with Stroke 5, giving access to electronic devices and the Internet to people who previously couldn’t input their own language on a keyboard.

Authors

This document was written by Mathieu Bridon (bochecha). You can contact me by email.

I have to thank Wan Leung Wong for his patience and the time he took to explain all these things to me. This document wouldn’t exist without him.

This document is distributed under the Creative Commons Attribution Share-Alike 3.0 Unported license (CC-By-SA).

Sources are available on GitHub.