How to make your application support input methods on Linux

As a Linux application developer, you might not be aware that some effort may be required to support input methods (also called Input Method Editors, usually abbreviated IME) on Linux.

What is an input method and why should I care about it?

Even if you are not aware of it, you probably already use one every day. For example, the virtual keyboard on your smartphone is a form of input method. You may have noticed that it lets you type something and then offers a list of words based on what you have partially typed. That is a very simple use case of an input method. But for CJKV (Chinese, Japanese, Korean, Vietnamese) users, an input method is necessary to type their own language properly. Just imagine: you only have 26 English letter keys, so how could you type thousands of different Chinese characters on a physical keyboard with such a limited set of keys? The answer is a mapping from sequences of keys to characters. To make it easy to memorize, such a mapping is usually similar to transliteration, or directly uses an existing romanization system.

For example, the most popular way to type Chinese is Hanyu Pinyin.

In the screenshot above, the user just types “d e s h i j i e”, and the input method offers a list of candidates. Modern input methods always try to get smarter at predicting the word the user most likely wants to type. The user can then press a digit key to select a candidate. That candidate may cover the whole input or only part of it.
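To make the key-sequence-to-character mapping concrete, here is a toy Python sketch. The syllable set, the lexicon and the greedy longest-match segmentation are all made up for illustration. A real engine ranks candidates with a language model instead. The sketch only shows how a digit key can select a candidate that covers either all of the input or just a prefix of it. In the prefix case, the rest keeps composing.

```python
# Toy pinyin input sketch. SYLLABLES and LEXICON are hypothetical examples,
# and greedy longest-match segmentation stands in for the real decoding.
SYLLABLES = {"de", "shi", "jie"}

LEXICON = {
    ("de", "shi", "jie"): ["的世界"],
    ("de",): ["的", "得", "德"],
    ("shi", "jie"): ["世界", "视界"],
    ("shi",): ["是", "时"],
    ("jie",): ["接", "节"],
}


def segment(keys):
    """Split a key string into known syllables, longest match first."""
    syllables, i = [], 0
    while i < len(keys):
        for j in range(len(keys), i, -1):
            if keys[i:j] in SYLLABLES:
                syllables.append(keys[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot segment {keys[i:]!r}")
    return syllables


def candidates(syllables):
    """List (word, syllables_consumed); whole-input matches come first."""
    result = []
    for n in range(len(syllables), 0, -1):
        for word in LEXICON.get(tuple(syllables[:n]), []):
            result.append((word, n))
    return result


def select(syllables, digit):
    """Pick the candidate labeled `digit` (1-based).

    Returns the committed word and the syllables still left to convert."""
    word, consumed = candidates(syllables)[digit - 1]
    return word, syllables[consumed:]


syllables = segment("deshijie")         # ['de', 'shi', 'jie']
print(candidates(syllables)[:3])
committed, rest = select(syllables, 2)  # picks "的"; "shi jie" keeps composing
print(committed, rest)
```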

What do I need to do to support input methods?

State-of-the-art input method frameworks on Linux are all client-server based. The client is your application, and the server is the input method server. Usually there is also a third process, a daemon that acts as a broker and relays messages between the application and the input method server.

1. Which GUI toolkit to use?

Gtk & Qt

If you are using Gtk or Qt, there is good news for you: usually you do not need to do anything to support input methods. These toolkits provide a generic abstraction that hides all the complexity of the communication between the input method server and the application. Sometimes they even provide an extensible plugin system, as Gtk and Qt both do.

The built-in widgets provided by Gtk and Qt already handle everything needed for input methods. Unless you are implementing your own fully custom widget, you do not need to use any input method API. If you do need a custom widget, which sometimes happens, you can use the API provided by the toolkit to implement it.

Here are some pointers to the toolkit API:

Gtk: gtk_im_multicontext_new, GtkIMContext


The best documentation on how to use these APIs is the implementation of the built-in widgets.

SDL & winit

You might be using SDL or Rust’s winit instead. Both have some sort of input method support but no built-in widgets, though there might be third-party libraries built on them that I don’t know of. In that case you will need to do some manual work with their IME APIs, and their demos are a good reference.

Refer to their official documentation and examples:

Xlib & XCB

Xlib has built-in XIM protocol support, which you may access via Xlib APIs. I found a good article about how to add input method support with Xlib at:

As for XCB, you will need a third-party library. I wrote one for XCB that implements both the server side and the client side of XIM. If you need a demo, you can find one at:

Someone also wrote a Rust binding for it, which is used in wezterm, a real-world project. Some demo code can be found at:


Wayland

If you are writing a native Wayland application from scratch with wayland-client, you will want to pick a client-side input method protocol first. The only one that is commonly well supported is text-input-v3. GNOME, KWin, wlroots and others support it, but Weston does not, just FYI:

2. How to implement it with the APIs above?

If you use a toolkit whose widgets already support input methods well, you can skip this part and call it a day. But read on if you need low-level interaction with the input method, or are just interested in how this works. It usually involves the following steps:

  1. Create a connection to the input method service.
  2. Tell the input method that you want to communicate with it.
  3. Keyboard events are forwarded to the input method.
  4. The input method decides how each key event is handled.
  5. Receive input method events that carry text you need to show, or to commit to the application.
  6. Tell the input method you are done with text input.
  7. Close the connection when your application exits or the relevant widget is destroyed.

The first step sometimes consists of two parts:

a. Create the connection.
b. Create a server-side object that represents a “micro focus” within your application.

That object is usually referred to as an “Input Context”. The toolkit may hide this complexity behind its own API.
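The seven steps can be sketched as a toy, single-process Python model. Every class and method name here (ToyServer, ToyInputContext, process_key and so on) is hypothetical and does not correspond to any real framework's API. A real client talks to a separate process over XIM, D-Bus or a Wayland protocol, and the toolkit usually does all of this for you.

```python
class ToyInputContext:
    """Hypothetical server-side state for one focusable text field."""

    def __init__(self, server, ic_id, client):
        self.server, self.id, self.client = server, ic_id, client
        self.focused = False
        self.buffer = ""

    def focus_in(self):                      # step 2
        self.focused = True

    def focus_out(self):                     # step 6
        self.focused = False
        self.buffer = ""

    def process_key(self, key):              # steps 3-5
        """Return True if the input method consumed the key."""
        if not self.focused:
            return False
        if key.isalpha():
            self.buffer += key
            self.client.on_preedit(self.buffer)
            return True
        if key == " " and self.buffer:
            self.client.on_commit(self.buffer.upper())  # toy "conversion"
            self.buffer = ""
            self.client.on_preedit("")
            return True
        return False

    def destroy(self):                       # step 7
        del self.server.contexts[self.id]


class ToyServer:
    """Hypothetical stand-in for the IM server plus broker."""

    def __init__(self):
        self.contexts, self.next_id = {}, 1

    def create_input_context(self, client):  # step 1b
        ic = ToyInputContext(self, self.next_id, client)
        self.contexts[ic.id] = ic
        self.next_id += 1
        return ic


class ToyClient:
    """The application side: shows the preedit and appends committed text."""

    def __init__(self):
        self.preedit, self.text, self.log = "", "", []

    def on_preedit(self, s):
        self.preedit = s
        self.log.append(("preedit", s))

    def on_commit(self, s):
        self.text += s
        self.log.append(("commit", s))


server = ToyServer()                         # step 1a: "connect"
client = ToyClient()
ic = server.create_input_context(client)
ic.focus_in()
for key in ["n", "i", " ", "1"]:
    if not ic.process_key(key):              # the IM decides (step 4)
        client.text += key                   # unhandled keys reach the app as usual
ic.focus_out()
ic.destroy()
print(client.text, client.log)
```

The input method decides each key separately (step 4). Letters are swallowed into the composition, while "1" comes back unhandled and the application inserts it itself.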

Take Xlib as an example:

  1. Create the connection: XOpenIM
  2. Create the input context: XCreateIC
  3. Tell the input method that your application wants text input through it: XSetICFocus
  4. Forward key events to the input method: XFilterEvent
  5. Get committed text with Xutf8LookupString (or XmbLookupString); plain XLookupString does not take an input context
  6. When your widget or window loses focus: XUnsetICFocus
  7. Clean up: XDestroyIC, XCloseIM.

Take wayland-client + text-input-v3 as an example:

  1. Bind the global singleton from the registry: zwp_text_input_manager_v3
  2. Call zwp_text_input_manager_v3.get_text_input
  3. Call zwp_text_input_v3.enable, followed by zwp_text_input_v3.commit to apply it
  4. The compositor forwards key events to the input method; nothing keyboard-related needs to be done on the client side.
  5. Get committed text from the zwp_text_input_v3.commit_string event
  6. Call zwp_text_input_v3.disable, followed by zwp_text_input_v3.commit
  7. Destroy the relevant Wayland proxy objects.

As always, read the examples provided by the toolkit to get a better idea.

3. Other concepts besides committing text

Supporting input methods is not only about forwarding key events and getting text from the input method. Some additional interactions between the application and the input method are important for a good user experience.


Preedit

Preedit is a piece of text displayed by the application that represents the composing state. In the screenshot at the beginning of this article, the underlined text is the preedit. A preedit contains the text and, optionally, formatting information (such as underline or highlight) for a richer display.
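A preedit is easiest to think of as a list of text pieces, each with its own format flags, plus a cursor position. The sketch below uses a hypothetical representation, not any particular protocol's wire format. It measures the cursor in code points, whereas some protocols use UTF-8 byte offsets (text-input-v3's preedit_string, for example).

```python
from dataclasses import dataclass
from enum import Flag


class Format(Flag):
    """Hypothetical per-segment formatting flags."""
    NONE = 0
    UNDERLINE = 1
    HIGHLIGHT = 2


@dataclass
class Segment:
    text: str
    fmt: Format


def to_spans(segments):
    """Flatten segments into (start, end, format) ranges over the joined text."""
    spans, pos = [], 0
    for seg in segments:
        spans.append((pos, pos + len(seg.text), seg.fmt))
        pos += len(seg.text)
    return spans


def with_cursor(segments, cursor):
    """Joined preedit text with '|' inserted at the cursor (code-point offset)."""
    text = "".join(seg.text for seg in segments)
    return text[:cursor] + "|" + text[cursor:]


preedit = [
    Segment("的", Format.UNDERLINE | Format.HIGHLIGHT),  # part already converted
    Segment("shijie", Format.UNDERLINE),                 # raw pinyin still composing
]
print(to_spans(preedit))
print(with_cursor(preedit, 7))
```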

Surrounding Text

Surrounding text is optional information that the application can provide to the input method. It contains the text around the cursor, plus the positions of the cursor and the user's selection. The input method may use this information to make better predictions. For example, suppose your text box contains “I love |” (| is the cursor). With surrounding text, the input method knows that “I love ” is already in the box and may predict your next word as “you”. You then don’t need to type “y-o-u” and can just select it from the predictions.

Surrounding text is not supported by XIM. Also, not every application can provide valid surrounding text; terminal applications are one example.
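One detail that is easy to get wrong is the unit of the offsets. In text-input-v3, set_surrounding_text takes the cursor and anchor as byte offsets into the UTF-8 text, while widgets usually track positions in characters. The helper below is a minimal sketch that assumes the widget stores code-point offsets, as Python strings do. A toolkit that indexes UTF-16 units, such as Qt's QString, would need a different conversion. The function name is made up for illustration.

```python
from typing import Optional, Tuple


def surrounding_text_args(text: str, cursor: int,
                          anchor: Optional[int] = None) -> Tuple[str, int, int]:
    """Hypothetical helper: convert code-point offsets into UTF-8 byte offsets.

    `anchor` is the other end of the selection; with no selection it equals
    `cursor`."""
    if anchor is None:
        anchor = cursor

    def byte_offset(index: int) -> int:
        return len(text[:index].encode("utf-8"))

    return text, byte_offset(cursor), byte_offset(anchor)


print(surrounding_text_args("I love ", 7))    # ASCII: bytes == characters
print(surrounding_text_args("我爱你", 2, 0))  # "我爱" selected; 3 bytes per character
```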

Reporting cursor position on the window

Many input method engines need to show a popup window to display information such as the candidate list. To let the input method place that window right at the (blinking) text cursor, the application needs to tell the input method where the cursor is.
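On the application side, the main work is usually a coordinate transform. The caret is known in the widget's own text layout, but the input method needs it relative to the window or surface (text-input-v3's set_cursor_rectangle, for instance, takes a surface-local rectangle). Below is a minimal sketch with hypothetical names, assuming fixed-height lines and ignoring HiDPI scaling.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int
    y: int
    width: int
    height: int


def cursor_rect_in_window(widget_origin, scroll, caret_x, line, line_height,
                          caret_width=1):
    """Hypothetical helper: map the caret from text layout to window coordinates.

    widget_origin: (x, y) of the widget's content area inside the window.
    scroll:        (dx, dy) the content is currently scrolled by.
    caret_x:       caret position within its line, in pixels.
    line:          zero-based line index; each line is `line_height` pixels tall.
    """
    ox, oy = widget_origin
    dx, dy = scroll
    return Rect(ox + caret_x - dx, oy + line * line_height - dy,
                caret_width, line_height)


rect = cursor_rect_in_window((10, 40), (0, 20), caret_x=35, line=3, line_height=18)
print(rect)
```

Reporting the whole caret rectangle rather than a single point gives the input method enough information to place its popup, for example just below the line, without covering the text being typed.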

Notifying the input method of state changes on the application side

For example, a user in the middle of composing something may still click somewhere else in the text box. The application may also change the text content programmatically through its own logic. When such things happen, the application needs to tell the input method that its state needs a “reset”. The relevant API is usually called “reset” as well.

Posted in Linux

A Strange Journey Through Debian Environment Variables

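Correction: the startx behavior of unsetting DBUS_SESSION_BUS_ADDRESS comes from upstream, not from Debian ( ). In the past, the reason for it was probably mainly to isolate startx from an existing session bus. Now that the systemd user session bus has become mainstream, however, this behavior causes problems instead. Arch merely got the fix earlier; the line was not added by a Debian-specific patch.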

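This happened about a month ago. Someone came to the Fcitx Telegram group saying they could not type in Chromium on Debian with LXQt. After they posted Chromium's terminal output, things started to get strange.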

$ [12752:12787:1013/] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[12752:12787:1013/] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")

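This means Chromium failed to connect to D-Bus. These days, apart from a few oddball anti-systemd distributions, the D-Bus environment variables should be set correctly as long as you use systemd. For that alone, I have to thank systemd. About 11 years ago, I wrote a blog post about weird problems caused by a misconfigured D-Bus when you use neither a mainstream desktop nor a display manager. But nowadays I had never seen a system that used systemd and still failed to set up D-Bus automatically. What followed took me down a rabbit hole I had never explored before.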

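First, systemd sets the DBUS_SESSION_BUS_ADDRESS environment variable for you through a PAM module. PAM is a set of modules that Linux invokes automatically when you log in. For example, pam_env reads /etc/environment to set environment variables, and pam_kwalletd5 passes the password you typed to kwalletd to unlock KWallet directly.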

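pam_systemd is precisely the module responsible for setting a series of standard XDG and D-Bus environment variables. That is why I said that as long as you use systemd, D-Bus setup problems should not occur. As luck would have it, their distribution was not plain Debian but a Debian derivative, omv, whose default installation happened not to include libpam-systemd. Who would have thought? But when I then installed LXQt in a virtual machine, libpam-systemd was pulled in as some kind of dependency (my guess is a Recommends).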

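So we were back to square one. Next, we found that they started the session with startx, and DBUS_SESSION_BUS_ADDRESS was present in the environment before running startx. In other words, something unset DBUS_SESSION_BUS_ADDRESS somewhere after startx! If you had asked me to guess, I could never have worked out what it was, no matter how hard I thought. When I ran startx directly, it launched /etc/alternatives/x-session-manager, which also brought me into LXQt. An LXQt session started that way worked perfectly.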

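The only way forward was to reproduce it myself. I copied their setup exactly by putting exec startlxqt into ~/.xinitrc, and indeed DBUS_SESSION_BUS_ADDRESS was gone afterwards. At this point I had no clue why, so on a whim I grepped every file on the disk for DBUS_SESSION_BUS_ADDRESS. And there it was, right on the first line of startx: unset DBUS_SESSION_BUS_ADDRESS?! WTF?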

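The startx on my own system has no such line; in fact, upstream only changed this recently. I spent a long time reading Debian's startx-related scripts, and here is how the Debian mechanism works: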

1. First, after startx, Debian looks for ~/.xinitrc or the system-wide /etc/X11/xinit/xinitrc. The system-wide /etc/X11/xinit/xinitrc contains ". /etc/X11/Xsession", which runs Debian's own /etc/X11/Xsession. In short, /etc/X11/Xsession sources the scripts under /etc/X11/Xsession.d in order and then launches whatever x-session-manager points to. After startx unsets DBUS_SESSION_BUS_ADDRESS, one of the scripts in /etc/X11/Xsession.d sets it back, which gives the same result as on other systems. When you use your own ~/.xinitrc, the whole /etc/X11/Xsession.d chain is skipped, so DBUS_SESSION_BUS_ADDRESS never gets set.
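To see what the skipped step amounts to, here is a rough Python model of it. It assumes the systemd user bus listens on $XDG_RUNTIME_DIR/bus, the usual location when the bus is started by systemd --user. It only approximates what the Debian Xsession.d hook does, and the function names are made up. Inside a session started from ~/.xinitrc, running it should show the variable as unset but restorable.

```python
import os
import stat


def _is_socket(path):
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def restore_session_bus(env, is_socket=_is_socket):
    """Hypothetical helper: return a copy of `env` with DBUS_SESSION_BUS_ADDRESS
    filled in from the systemd user bus socket when it is missing.

    Assumes the socket path needs no D-Bus address escaping."""
    env = dict(env)
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if "DBUS_SESSION_BUS_ADDRESS" not in env and runtime_dir:
        bus = os.path.join(runtime_dir, "bus")
        if is_socket(bus):
            env["DBUS_SESSION_BUS_ADDRESS"] = "unix:path=" + bus
    return env


if __name__ == "__main__":
    # Run inside the X session to compare what it inherited with what is available.
    print("inherited: ", os.environ.get("DBUS_SESSION_BUS_ADDRESS", "<unset>"))
    print("restorable:",
          restore_session_bus(os.environ).get("DBUS_SESSION_BUS_ADDRESS", "<unset>"))
```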

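So on Debian, if you want to use startx, it is best to avoid ~/.xinitrc and use the Debian-specific ~/.xsession instead. That way Debian's own /etc/X11/xinit/xinitrc sources the many environment setup scripts under /etc/X11/Xsession.d before entering the desktop.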

On other distributions (such as Arch), you cannot use ~/.xsession and must use ~/.xinitrc. This is because ~/.xsession is handled by /etc/X11/Xsession, the Debian-specific set of scripts.

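This odd unset in upstream startx made some sense back when everyone had to run dbus-launch manually or rely on X11 autolaunch. Now that D-Bus is started by the systemd user instance, it no longer makes sense. Debian effectively has its own mechanism to restore the variable, but it only takes effect via .xsession; ~/.xinitrc does not benefit from it.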

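When I finally found the cause, I was speechless. You could never discover this problem without actually looking inside a Debian system. In the early days, Linux distributions each went their own way and built many things of their own. For creating boot images alone, I can name at least three tools: dracut, mkinitramfs and mkinitcpio. Since then, systemd has pushed everyone a long way toward unification, forced or not. But historical leftovers like this still cause all kinds of inconsistent behavior. Does Debian's setup work well? Of course it does. In particular, many tools that Debian packages itself actually depend on this whole set of behaviors. So even if you asked Debian to delete it today and switch to the vanilla version, it would not be a simple matter.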

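This also brings me back to one of the reasons I use Arch. Its packages are close to vanilla (that is, upstream's original code without modifications), so I get an experience as close as possible to the developers' own.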

Posted in Linux

Get event order right (Try 2!)

TL;DR: this is not considered a user-facing change.

In a previous post, we discussed how a blocking D-Bus call breaks input method event order. To put it simply, an input method may produce several different outcomes from a single key press, such as committing text or setting the preedit. The key press reaches Fcitx from the application as an inter-process communication (IPC) call.

If this IPC is blocking, the events generated by the input method can only be delivered after the call is done. Previously, we always used asynchronous calls to ensure those events reach the application before the reply. This approach no longer works for the Gtk 4 GtkIMContext. Gtk 4 hides too much API compared to Gtk 3, which prevents us from doing many things, such as re-injecting the key event into the application.

In the old async mode, the Fcitx IM context always filters the key event, then re-injects it into the application once the result of event handling comes back. At that point, the Fcitx IM context copies the GdkKeyEvent back into the application with a special flag set in the modifier state. The flag prevents the Fcitx IM context from handling the event again.

In Gtk 4, there is no API to create a synthetic key event. This is also a problem for some other features Fcitx supports, but we will not discuss that here. It means we must use the synchronous mode anyway.

Well, not strictly “must”. I did find some API that allows a hacky asynchronous implementation: remember the pointer address of the GdkEvent and use gdk_put_event to re-inject the event. However, that doesn’t work for Chromium, because it doesn’t use GDK for event handling.

So what can we do here? The answer is a new version of the ProcessKeyEvent API: ProcessKeyEventBatch.

In the old synchronous mode, the root cause of the wrong event order is that the application can only handle events sent by the input method “after” the synchronous call ends. That is not what we want.

With the new ProcessKeyEventBatch, we do not send events from the input method to the application immediately. Instead, the input method side holds them back until the reply is sent. When the reply is finally sent, Fcitx puts all the events that the application needs to handle into it.

Say the input method wants to commit some text before the application handles the key event. In the new mode, calling commitString() on the input method side does not emit the CommitString D-Bus signal. Instead, we wait and bundle such events into the return value of ProcessKeyEventBatch. Upon receiving the reply, FcitxIMContext first decodes it, checks whether any events are piggybacked on it, and handles those first. This makes the event order consistent on both the input method side and the application side.
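The difference between the two modes can be simulated in a few lines of Python. This is a deliberately simplified, single-process model with no D-Bus or Gtk; ToyIM and type_keys are made-up names. The user composes "ni" and then types a comma, which commits the composition and is also passed through to the application. With the old synchronous ProcessKeyEvent, the application applies the "not handled" reply before the CommitString signal. With a batch-style reply, it applies the piggybacked commit first.

```python
from collections import deque


class ToyIM:
    """Hypothetical IM side. Letters are composed; any other key commits the
    composition and is then returned to the application unhandled."""

    def __init__(self, batch):
        self.batch = batch
        self.buffer = ""
        self.pending = []        # batch mode: events held until the reply
        self.signals = deque()   # old mode: signals the client sees later

    def _send(self, event):
        (self.pending if self.batch else self.signals).append(event)

    def process_key(self, key):
        if key.isalpha():
            self.buffer += key
            handled = True
        else:
            if self.buffer:
                self._send(("commit", self.buffer.upper()))
                self.buffer = ""
            handled = False
        if self.batch:           # ProcessKeyEventBatch-style reply
            events, self.pending = self.pending, []
            return handled, events
        return handled           # ProcessKeyEvent-style reply


def type_keys(keys, batch):
    """Client side: one synchronous call per key, then apply the results."""
    im = ToyIM(batch)
    text = ""
    for key in keys:
        reply = im.process_key(key)
        if batch:
            handled, events = reply
            for _kind, payload in events:        # piggybacked events go first
                text += payload
        else:
            handled = reply
        if not handled:
            text += key                          # the app inserts the key itself
        while im.signals:                        # old mode: signals are dispatched
            _kind, payload = im.signals.popleft()  # only after the call returned
            text += payload
    return text


print(type_keys("ni,", batch=False))  # the commit lands after the comma
print(type_keys("ni,", batch=True))   # the commit lands before the comma
```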

Posted in fcitx development

Redmi AirDots 3 Pro Genshin Impact Edition: Impressions, Plus My Ten Years of Headphones





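Let's turn the clock back 10 years. The first headphones I bought after graduating from the 10-yuan tier were... the Panasonic RP-HTX7, purely because they looked great in Hanasaku Iroha. It was my first time buying something in the hundred-yuan range, and I really did notice that the sound quality was different. The only problem was that I'm really hard on headphones. I often kept them on even while lying on my side, so the ear pads wore out very quickly.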

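At that point I was thinking about in-ear Bluetooth headphones, since wired ones really had started to feel like a hassle. The next pair was the Avantree Sacool Bluetooth headset. I had never heard of the brand, and after buying it I just didn't like it much. Instead, I fell down the rabbit hole of Comply memory-foam ear tips. Honestly, I have particularly oily earwax, so I go through Comply tips far too fast. That got me thinking about trying bone conduction.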

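So then I tried bone conduction with the AfterShokz TREKZ Titanium. In the end, I wore these out too... It also fell short of my expectations for comfort, and the way it pressed against the side of my face just felt odd.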

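After that I tried the Bose SoundSport, along with other ear tips, but no matter how I wore them they never felt right. So I decided I was probably not suited to in-ear headphones. Remembering that the HTX7 was over-ear, I was somehow tempted into buying the wired AKG Q 701. Of all these, the AKG Q 701 is probably the pair I regret most, because the damn thing clamps down on your head and the top of my head hurts after wearing it. I spent a long time trying all sorts of positions. I even used nylon straps to force some spots to take less pressure, but nothing really helped.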

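Then I heard about active noise cancellation... I really do want to try every new thing I hear about... This time I had learned my lesson and bought a second-hand Sony MDRNC500D. The noise cancellation was pretty good and it was comfortable enough, but in the end I broke the headband connector...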

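Now for the main topic: the headphones I still use. First is the Bose QuietComfort 35, which I bought five years ago. I have replaced the ear pads three times so far, which is fairly easy, but the shell is so worn that it's almost beyond saving. These days I'm too lazy to charge it, so most of the time I use it wired.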

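Then there are the second-generation AirPods, of which I actually bought two pairs... For someone who had come to doubt in-ear headphones, AirPods are the rare pair I've found over all these years that doesn't hurt to wear. Honestly, headphones that don't make my ears hurt feel amazing... The AirPods Pro, on the other hand, took me straight back to the old painful in-ear experience.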

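Until this Redmi collaboration set arrived, I had basically been using AirPods all the time, with the occasional Bose QC35. First, the big advantage of AirPods: you can split them and use one at a time, which doubles the battery life, and they charge fast. The charging case is also very convenient. Even if one occasionally runs out of battery, you can put it back in the case and use the other.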

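The Redmi earbuds follow a similar design. A pleasant surprise is that although they are in-ear like the AirPods Pro, they somehow don't hurt my ears, which is amazing... For Klee's voice prompts alone, I plan to keep trying them for a while. I didn't compare battery life carefully, but it feels a bit shorter than on the AirPods; still acceptable. Besides double tap and long press, they also support triple tap, which gives one more control action than AirPods, and I'm quite happy about that. They can connect to two devices at once without switching. That is slightly more convenient than AirPods, where I have to switch manually (I don't use a Mac, so I don't get features like automatic switching to the computer).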

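Klee's backpack and the Dodoco Tales booklet are both nicely made and feel very collectible. Charging is via USB-C. The colors and the logo on the case nicely incorporate the four-leaf clover and the Pyro element symbol. As for sound quality, I can't tell the difference anyway. I've only had them a few days, so it's hard to give a more detailed review. Still, the fact that they don't hurt my ears is a huge plus, and for that alone I'm willing to keep using them.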

Posted in 日志 (Journal)

How libime Works (Part 2)

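The first part mainly covered beam search and input segmentation, as well as some of the basic data structures libime provides. This part focuses on the implementation details of UserLanguageModel and HistoryBigram, which were not covered last time.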


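As mentioned in the previous part, the core algorithm of our input method is N-gram plus beam search. Input methods using this kind of algorithm usually take N = 3 or 2, which balances quality against memory use. In this respect we inherit some of Sunpinyin's spirit, since our very first data was the Open-Gram data used by Sunpinyin. By the way, in the latest version we retrained the language model on entirely new data, but it is still a trigram model like Sunpinyin's.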

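HistoryBigram, as the name suggests, is a bigram model that stores the user's input. What it does is actually very simple: it records the sentences the user typed, word by word. In memory, it is stored in a DATrie. You may wonder how this works. Viewed abstractly, a DATrie is a mapping from strings to 4-byte values, so how does it store something like a bigram table, whose keys have two levels?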

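The answer is actually simple: encode the key specially so that it carries more information. Just concatenate the two strings with a separator in between. The formulas for the bigram and unigram scores also come from Sunpinyin. Taken at face value, the bigram probability is P(B|A) = Freq(AB) / Freq(A), that is, the number of times AB appears divided by the number of times A appears. If AB has never been seen, this simply gives 0, which is not good for a model. Language models therefore use smoothing to get meaningful results even when data is sparse, and the user's input history is very sparse. Here, the standalone frequency of B is also mixed in through a weighted average, to compensate for data that is missing simply because of limited sampling.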
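Here is a minimal sketch of both ideas: the separator-encoded key and the interpolated probability. A Python Counter stands in for the DATrie. The separator character, the class name and the 0.7 weight are assumptions for illustration. The formula is plain linear interpolation and is not necessarily the exact formula libime or Sunpinyin uses.

```python
from collections import Counter

SEP = "|"  # hypothetical separator; assumes no word ever contains it


class ToyHistoryBigram:
    def __init__(self, bigram_weight=0.7):  # hypothetical weight
        self.counts = Counter()  # stand-in for the DATrie: str -> int
        self.total = 0           # total number of words seen
        self.w = bigram_weight

    def add_sentence(self, words):
        for word in words:
            self.counts[word] += 1           # unigram key: the word itself
            self.total += 1
        for a, b in zip(words, words[1:]):
            self.counts[a + SEP + b] += 1    # bigram key: "A|B"

    def prob(self, a, b):
        """Interpolate P(B|A) = Freq(AB) / Freq(A) with the unigram P(B)."""
        unigram = self.counts[b] / self.total if self.total else 0.0
        freq_a = self.counts[a]
        bigram = self.counts[a + SEP + b] / freq_a if freq_a else 0.0
        return self.w * bigram + (1 - self.w) * unigram


history = ToyHistoryBigram()
history.add_sentence(["我", "爱", "你"])
history.add_sentence(["我", "爱", "猫"])
print(history.prob("爱", "你"))  # seen bigram
print(history.prob("我", "猫"))  # unseen bigram still gets a non-zero score
```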


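The first problem is how to combine the system model with the user model. Although it is not exact, our approach is a weighted average of the two probabilities. Let me also briefly mention Sunpinyin's implementation. When I first implemented libime, I tried to follow Sunpinyin's approach for many specific details, but I also found many "strange designs" in it. For example, the common practice is to work with log probabilities to reduce floating-point computation, since multiplying probabilities then becomes simple addition, which is much more efficient. But Sunpinyin computes the weighted combination as an arithmetic mean of the log probabilities. That amounts to a weighted geometric mean of the probabilities, which is very odd and not mathematically sound. libime instead takes the arithmetic mean of the raw probabilities, which at least makes more sense in principle. There is a small problem, though: given log(P_sys) and log(P_user), how do you compute log(w * P_sys + (1 - w) * P_user)?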

If you compute this directly, substituting the values gives log10(w * pow(10, log10(P_sys)) + (1 - w) * pow(10, log10(P_user))), which takes three math library calls in total. Can we do with fewer? In fact, we can.

Suppose we need to compute log10(exp10(a) + exp10(b)):
log10(exp10(a) + exp10(b))
   = log10(exp10(b) * (1 + exp10(a - b)))
   = b + log10(1 + exp10(a - b))
   = b + log1p(exp10(a - b)) / log(10)

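Why go out of our way to reach the log1p(exp10(a - b)) form? Because the C math library provides log1p, which computes log(1 + x) accurately even when x is very small. For better precision, choose between a - b and b - a according to which of a and b is larger, so that the exponent is never positive. Constants such as log(10) (natural logarithm) can be precomputed, which cuts the math library calls down to two: exp10 and log1p.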
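The same trick in Python, as a sketch: Python's math module has log1p but no exp10, so 10.0 ** x stands in for it. The function factors out the larger argument so the exponent is never positive, and it short-circuits a -inf (zero-probability) argument. The name log10_add is made up.

```python
import math

LN10 = math.log(10.0)  # natural log of 10, computed once


def log10_add(a, b):
    """Return log10(10**a + 10**b) for log10-probabilities a and b."""
    if a > b:
        a, b = b, a          # ensure a <= b so that a - b <= 0
    if a == -math.inf:       # adding a zero probability changes nothing
        return b
    return b + math.log1p(10.0 ** (a - b)) / LN10


print(log10_add(math.log10(0.2), math.log10(0.3)))  # same as log10(0.5)
print(log10_add(-401, -400))
```

Besides saving a call, this form also avoids underflow. The naive math.log10(10 ** -400 + 10 ** -401) fails because both terms round to 0.0, while log10_add(-401, -400) returns about -399.96.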

You may ask where the weights w and (1 - w) from the original formula went. That is simple too: just convert them to logarithms first.

log10(w * exp10(log_sys) + (1-w) * exp10(log_user))
   = log10(exp10(log10(w)) * exp10(log_sys) + exp10(log10(1-w)) * exp10(log_user))
   = log10(exp10(log10(w) + log_sys) + exp10(log10(1-w) + log_user))

This plugs straight into the previous formula. For the input method, log10(w) and log10(1 - w) are constants as well and can be precomputed.
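With the weights added, here is a sketch of the whole interpolation in log space. The Mixer class name and the example weight are hypothetical. The point is that log10(w) and log10(1 - w) are computed once in the constructor, so each mix costs one power of 10 and one log1p.

```python
import math

LN10 = math.log(10.0)


def log10_add(a, b):
    """Return log10(10**a + 10**b), factoring out the larger argument."""
    if a > b:
        a, b = b, a
    if a == -math.inf:
        return b
    return b + math.log1p(10.0 ** (a - b)) / LN10


class Mixer:
    """Hypothetical helper: log10(w * P_sys + (1 - w) * P_user) from log10 inputs."""

    def __init__(self, w):
        # Requires 0 < w < 1; both constants depend only on w.
        self.log_w = math.log10(w)
        self.log_1mw = math.log10(1.0 - w)

    def mix(self, log_sys, log_user):
        return log10_add(self.log_w + log_sys, self.log_1mw + log_user)


mixer = Mixer(0.8)
print(mixer.mix(math.log10(0.01), math.log10(0.05)))  # equals log10(0.018)
```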

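The other problem is how to make sure that what the user has "newly" typed rises quickly to the top of the results. I took some inspiration from Sunpinyin's approach, but it did not seem entirely reasonable to me, so I adapted it. Sunpinyin keeps only 1024 words in its history, and when a new word enters the history, old entries are evicted. This design keeps user input from drifting too far from the system model, but it also limits how much of the user's history is remembered. Sunpinyin then multiplies the frequencies of recently typed words by a factor, which effectively boosts the weight of new input. But this window is actually very small, so such reordering of results may be forgotten quickly.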

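libime takes a similar approach with some improvements of its own. First, it records user input by sentence. Second, it splits the user's sentences into three pools of different sizes (128, 8192 and 65536), and gives each pool a different probability weight. The specific numbers are purely heuristic. This way libime remembers a relatively large history (65536 sentences vs. 1024 words), while the weight of what you typed still decays over time.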
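A sketch of the pool idea follows. The pool sizes come from the description above. How sentences move between pools and the per-pool weights (1.0, 0.5, 0.25) are assumptions made for illustration, not libime's actual values. In this model, new sentences enter the smallest pool, and the oldest sentence of a full pool moves down to the next one. Whatever falls out of the last pool is forgotten.

```python
from collections import Counter, deque


class ToyPooledHistory:
    """Hypothetical cascading pools; each pool keeps its own word counts."""

    def __init__(self, sizes=(128, 8192, 65536), weights=(1.0, 0.5, 0.25)):
        self.sizes = sizes        # pool capacities, in sentences
        self.weights = weights    # hypothetical per-pool weights
        self.pools = [deque() for _ in sizes]
        self.counts = [Counter() for _ in sizes]

    def add(self, sentence):
        item = tuple(sentence)
        for pool, count, size in zip(self.pools, self.counts, self.sizes):
            pool.appendleft(item)       # newest sentence at the front
            count.update(item)
            if len(pool) <= size:
                return
            item = pool.pop()           # oldest sentence leaves this pool...
            count.subtract(item)        # ...and moves on to the next one
        # Reaching here means `item` fell out of the last pool and is forgotten.

    def weighted_freq(self, word):
        return sum(w * c[word] for w, c in zip(self.weights, self.counts))


history = ToyPooledHistory(sizes=(1, 2, 1))  # tiny pools to show the cascade
for sentence in (["a"], ["b"], ["c"], ["d"], ["e"]):
    history.add(sentence)
print([history.weighted_freq(w) for w in "abcde"])
```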

Posted in fcitx development