Key repetition and key event handling issue with Wayland input method protocols

I have a lot of complaints about the current Wayland input method protocols. Some of them are just about missing features, but this issue is the one where I think the design has been flawed from the beginning.

Let’s first review how the keyboard event is handled with input method on Wayland and X11.

XFilterEvent uses XClientMessage to transport the key event to the input method, which actually introduces one more message to the X server that is omitted in the graph above. Besides XClientMessage, other transports may also be used, including a raw socket, or DBus, which is what fcitx and ibus use.

On Wayland, things are different.

The input method first places a keyboard grab, so that the compositor sends every key event to the input method server first. Then, depending on the result of the key event (filtered or not), the input method server may forward it back to the compositor, which in turn forwards it to the application. This happens when the input method server finds that the key is not relevant to the input method engine’s logic.

This may look fine so far, but once you take key repetition into account, you’ll find more issues with this design.

Imagine the following scenario:

1. The user is typing in an editor, and the application already contains some text.
Let’s just say there’s some existing Chinese text: 你好.

Literally this means “hello” in Chinese 🙂

2. The user types some new text, and that text (“shi jie” in this example) is held in the input method’s buffer, waiting to be converted to another language.

Hello, world!

3. The user decides that all of the text is unwanted, so the user presses backspace and holds it, expecting key repetition to remove the whole line, including both “shi jie” and the already committed 你好.

This is where it becomes problematic that Wayland combines a keyboard grab for the input method with client-side key repetition.

In X11, key repetition is done on the X server side, so the client doesn’t need to generate repeated keys itself. The client simply receives multiple key press events (the release events in between are optional, depending on the “detectable auto-repeat” option) until the key is physically released.

In Wayland, key repetition is done on the application (client) side. The common way to implement it is that when the client gets a wl_keyboard.key press, it starts a timer and generates new key events on its own.
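As a rough illustration of that timer logic, here is a minimal Python sketch. It is not taken from any real toolkit: the function name is made up, and the default delay and rate are just example values (a real client would get them from the wl_keyboard.repeat_info event).

```python
def client_repeat_times(press_ms, release_ms, delay_ms=600, rate_per_s=25):
    """Timestamps (ms) at which a client would synthesize repeated key events."""
    interval_ms = 1000 // rate_per_s  # 25/s -> one event every 40 ms
    times = []
    t = press_ms + delay_ms  # nothing is repeated before the initial delay
    while t < release_ms:    # the timer is cancelled by the key release
        times.append(t)
        t += interval_ms
    return times


# Key held from t=0 to t=700 ms.
print(client_repeat_times(0, 700))  # [600, 640, 680]
```

The important point is that the timer only starts if the client actually sees the wl_keyboard.key press.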

When you put an input method into this picture, you will notice that the very first “Backspace” is forwarded to the input method and is invisible to the client, so the client cannot start its key repetition logic. That means that if the key needs to be filtered by the input method, the input method server has to do the key repetition on its own.

In the case above, since there is text in the buffer (shi jie), the first backspace deletes “e” from the buffer, the next one “i”, then “j”, and so on.

When the last character in the buffer, “s”, is deleted, the buffer becomes empty, which means the next “repeated” backspace event needs to be forwarded to the application. This can still be done via zwp_virtual_keyboard_v1 or zwp_input_method_v1, depending on which version of the protocol you are using.
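The decision the input method makes for each backspace can be sketched like this. It is a simplified model, not fcitx5 code: the function name is hypothetical and the buffer is a plain string without the space.

```python
def handle_backspace(buffer):
    """Return (new_buffer, forwarded) for one backspace seen by the input method."""
    if buffer:
        # There is still pending text: consume the key and shrink the buffer.
        return buffer[:-1], False
    # Empty buffer: the key is not relevant to the engine, forward it to the app.
    return buffer, True


buffer = "shijie"
forwarded = 0
for _ in range(8):  # the physical press plus seven repeated events
    buffer, was_forwarded = handle_backspace(buffer)
    forwarded += was_forwarded

print(buffer == "", forwarded)  # True 2
```

Six events are consumed by the buffer, and only the remaining two ever reach the application.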

Expected backspace behavior

But the problem is: what to do next?

Let’s suppose the key repetition setting is “initial delay 600ms, repeat rate 25/s”. The re-injected backspace can only trigger the client’s own key repetition after 600ms, while the user expects the key to already be in the repeat phase, which generates a backspace every 40ms. So the input method has to keep generating key presses, since the application does not know that key repetition already started in the past.

But once the first fake repeated key from the input method is re-injected into the application, the client-side key repetition logic is triggered as well. If the input method doesn’t do anything to prevent it, the client will start repeating the key on its own after 600ms, and we will see both the input method and the client generating repeated keys at the same time. To prevent this, fcitx5 uses a workaround: it always sends a fake key release immediately after each repeated key it sends from the input method side, in order to stop the client-side key repetition timer that has just been started.
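To make the timing concrete, here is a small simulation of the situation. It is only a model under stated assumptions: the function name is made up, the buffer holds 6 characters, the key is held for 2 seconds, and the client is assumed to start its timer on the first injected press and not restart it on later ones.

```python
def held_backspace(buffer_len, hold_ms, fake_release, delay_ms=600, rate_per_s=25):
    """Count backspaces reaching the app: (injected by the IM, generated by the client)."""
    interval_ms = 1000 // rate_per_s  # 40 ms at 25/s
    # Events handled by the input method: the physical press, then its own repeats.
    im_events = [0] + list(range(delay_ms, hold_ms, interval_ms))
    # The first buffer_len events only shrink the buffer; the rest are re-injected.
    injected = im_events[buffer_len:]
    client_own = []
    if injected and not fake_release:
        # Without the fake release, the client's own timer fires delay_ms after
        # the first injected press and keeps running until the key is let go.
        first_own = injected[0] + delay_ms
        client_own = list(range(first_own, hold_ms, interval_ms))
    return len(injected), len(client_own)


print(held_backspace(6, 2000, fake_release=False))  # (30, 15)
print(held_backspace(6, 2000, fake_release=True))   # (30, 0)
```

In this model, without the fake release the application deletes 45 characters instead of 30 over the same two seconds, because both sides are repeating.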

This seems very hacky and unreliable to me, since we are trying to “take over” key repetition from the client side instead of handing it over.

Let’s consider another scenario where it is totally broken.

Imagine an input method that can dynamically convert the text around the cursor into preedit and show alternative text for the word around the cursor. This is very common on mobile phones: you can tap on a word, the word becomes “underlined”, and alternative candidates are shown on the on-screen keyboard.

1. Imagine the user has the text “Hello, world |” (| represents the cursor location in the application).

2. The user starts to press backspace.

3. The first backspace press is ignored by the input method, since there’s no word around the cursor.

4. Client-side key repetition kicks in. Please note that keys repeated on the client side are not forwarded to the input method under the current version of the protocol.

5. The text becomes “Hello, world”, and the input method will try to consume “world”, convert it into preedit text and put “world” in the buffer on the input method server. This means that from this point on, any new backspace event should be handled by the input method.

Consuming the word “world” and converting it to preedit is not a feature currently supported by fcitx, but we do want to implement such things in the future. Actually, fcitx5-unikey is already able to do something similar, see the video below.

fcitx5-unikey’s feature to consume existing text and re-edit it. It’s triggered by “e” rather than “backspace” in this case, but you get the idea.

But if you remember how client-side key repetition works, you will notice that the repeated keys are never forwarded to the input method. The backspace is thus “leaked” past the input method into the application and causes unexpected behavior.

My proposed solution is to just go back to the old X11 model of forwarding events to the input method. The procedure would look like:

  1. wl_keyboard.key is sent to the application.
  2. text_input.key sends the key to the input method through the compositor. This covers all key events, including the repeated key events generated on the client side.
  3. The input method server forwards it back with the old interface.
  4. The application gets the event forwarded back by the input method via a new event, text_input.forward_key, instead of from wl_keyboard.key.
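The steps above can be sketched as a toy simulation of the “Hello, world |” scenario. Everything here is hypothetical: text_input.key and text_input.forward_key are only the proposed names and do not exist in any current protocol, the function names are made up, and the way the input method pulls the word before the cursor into preedit is heavily simplified.

```python
def trailing_word(text):
    """Return the run of letters directly before the cursor (end of text)."""
    i = len(text)
    while i > 0 and text[i - 1].isalpha():
        i -= 1
    return text[i:]


def route_backspaces(app_text, n_events):
    """Send n backspace events (press plus client repeats) through the input method."""
    preedit = ""
    handled_by = []
    for _ in range(n_events):
        # Step 2: every event, including client-generated repeats, goes to the IM.
        if preedit:
            preedit = preedit[:-1]  # the IM consumes the key
            handled_by.append("im")
            continue
        # Steps 3 and 4: the IM forwards the key back and the app applies it.
        app_text = app_text[:-1]
        handled_by.append("app")
        # The IM may now turn the word before the cursor into preedit.
        word = trailing_word(app_text)
        if word:
            app_text = app_text[: -len(word)]
            preedit = word
    return app_text, preedit, handled_by


print(route_backspaces("Hello, world ", 3))
# ('Hello, ', 'wor', ['app', 'im', 'im'])
```

Because the repeated events also pass through the input method, the second and third backspace edit the preedit instead of leaking into the application.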

This introduces more round trips between the compositor and the application, but it solves the whole issue in a much cleaner way than the other solutions I can think of. This new interface can even help with other issues, like the chicken-and-egg problem of type-to-search, and it may also make browsers happier by allowing them to stick more closely to the JavaScript key/IME event standard.

If one wants to stick to the keyboard grab model, one may have to add a lot of tricky new events like “hand over ongoing key repetition”, which from my point of view would introduce much more complexity and be easier to get wrong.

Posted in fcitx development, Linux

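A demo of the virtual keyboard developed together with openKylin

Support will be added in the next version of Fcitx.

The UI code is at: https://gitee.com/openkylin/kylin-virtual-keyboard

Everyone is welcome to test it. Feature support is still very basic, but some simple testing is already possible (under X11; using it on Wayland still needs some more work).

(The flickering in the video is mainly a recording problem.)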

Posted in fcitx development

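A hibernation that cost me my Linux boot entry

Recently, after updating Arch Linux, I went to reboot and found that the systemd-boot boot entry was gone. At first I assumed an update had deleted the config file, but once I got into the system I found something I did not expect.

My system had hibernated halfway through the kernel update! It stopped at exactly the moment when the old kernel image had already been deleted and the new kernel had not been installed yet. Suspend is broken on this machine, so it is configured to hibernate automatically, but I never expected it to hibernate right in the middle of a kernel update. Hibernation does write the state of memory to disk, but the next boot still needs a kernel to initialize before the resume from hibernation can happen.

I have to say this was quite a coincidence. It also gave me a real appreciation of what the familiar Windows update prompt, “Configuring updates, do not turn off your computer”, is actually for.

With an update scheme like Arch Linux’s, which does not keep the old kernel, it is better to reboot soon after updating, because resuming from a different kernel version also causes problems.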

Posted in Linux

Switch fcitx theme based on system color

With the next version, fcitx will be able to switch to an alternative dark theme when the system dark/light theme changes.

The feature relies on an xdg desktop portal implementation that supports this value.
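As a rough sketch of the selection logic, assuming the value in question is the color-scheme key of the org.freedesktop.appearance namespace in the portal’s Settings interface (0 means no preference, 1 means prefer dark, 2 means prefer light). The function and the theme names below are placeholders, not fcitx’s actual code or theme names.

```python
NO_PREFERENCE, PREFER_DARK, PREFER_LIGHT = 0, 1, 2


def pick_theme(color_scheme, light_theme="my-theme", dark_theme="my-theme-dark"):
    """Map the portal's color-scheme value to a theme name (names are placeholders)."""
    if color_scheme == PREFER_DARK:
        return dark_theme
    # No preference, prefer light, or an unknown value: stay on the regular theme.
    return light_theme


print(pick_theme(PREFER_DARK))    # my-theme-dark
print(pick_theme(NO_PREFERENCE))  # my-theme
```

Falling back to the regular theme for unknown values is a design choice of this sketch, so that a portal reporting something unexpected does not change the user’s theme.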

Hopefully accent-color in xdg portal can be merged soon so we could also support that.

Demo:

Posted in fcitx development

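A response to “fcitx5 depends on boost and KDE: exploring the feasibility of continuing to use fcitx4”

Original post: https://forum.suse.org.cn/t/topic/15817/7

First, I want to rebut several mistaken claims made there.

1. Does Fcitx 5 depend on KDE and Boost?

This is wrong. It is a highly modular project: the core library and server, the input method engines, and the configuration UI all live in separate repositories.

The core is actually much leaner than before, because the gtk and qt im modules have become independent projects. In fact, if you like, you can build an fcitx that has nothing to do with any GUI toolkit, which is also what made it possible to port fcitx5 to android.

As for the input method engines, the new pinyin engine uses a very small part of boost, most of it header only. Only a few io-related libraries need boost’s shared libraries. If your distribution splits its packages finely, this pulls in only about 500k of dependencies.

The configuration UI is where there may be real questions. It is in fact split into two implementations inside the same repository: one depends only on Qt and a small number of KF5 libraries, and the other integrates with KDE System Settings, i.e. it is the equivalent of fcitx4’s kcm-fcitx. The QtWidgets-based implementation pulls in only 1.6M of dependencies beyond Qt. Compared with the 19M total of the Qt libraries it depends on, that is a drop in the bucket.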

People also often have a misconception about the disk footprint of Gtk and Qt: Gtk is C, so it must be “light”, and the compiled code must be much smaller. In fact, you have to add up all the graphics, font, io and dbus related libraries before you get the equivalent of those few Qt libraries. As a non-rigorous comparison, these are the dependencies Gtk needs on my system:

992K    /usr/lib/libgdk-3.so.0.2405.32
7.8M    /usr/lib/libgtk-3.so.0.2405.32
1.3M    /usr/lib/libglib-2.0.so.0.7600.1
1.2M    /usr/lib/libcairo.so.2.11708.0
416K    /usr/lib/libpango-1.0.so.0.5000.14
64K     /usr/lib/libpangocairo-1.0.so.0.5000.14
96K     /usr/lib/libpangoft2-1.0.so.0.5000.14
1.9M    /usr/lib/libgio-2.0.so.0.7600.1
14M     total

And these are the ones on the Qt side:

228K    /usr/lib/libKF5ItemViews.so.5.104.0
1.4M    /usr/lib/libKF5WidgetsAddons.so.5.104.0
5.2M    /usr/lib/libQt5Core.so.5.15.8
508K    /usr/lib/libQt5DBus.so.5.15.8
6.4M    /usr/lib/libQt5Gui.so.5.15.8
6.8M    /usr/lib/libQt5Widgets.so.5.15.8
24K     /usr/lib/libQt5X11Extras.so.5.15.8
21M     total
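The totals above, as well as the 1.6M and 19M figures quoted earlier, can be re-derived from the listed sizes. This is only a sanity check on the arithmetic. It assumes K and M are the 1024-based units that du -h prints, and the helper name is made up.

```python
def to_mib(size):
    """Convert a du-style size such as '992K' or '7.8M' to MiB."""
    value, unit = float(size[:-1]), size[-1]
    return value / 1024 if unit == "K" else value


gtk = ["992K", "7.8M", "1.3M", "1.2M", "416K", "64K", "96K", "1.9M"]
kf5 = ["228K", "1.4M"]                         # the two KF5 libraries
qt = ["5.2M", "508K", "6.4M", "6.8M", "24K"]   # the Qt libraries

gtk_total = sum(map(to_mib, gtk))
kf5_total = sum(map(to_mib, kf5))
qt_total = sum(map(to_mib, qt))

print(round(gtk_total))             # 14
print(round(kf5_total, 1))          # 1.6
print(round(qt_total))              # 19
print(round(kf5_total + qt_total))  # 21
```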

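Removing fcitx-config-gtk in Fcitx 5 saved me a great deal of maintenance work, because fcitx5-config-qt and kcm-fcitx5 can share a lot of code. Anyone with some basic knowledge also knows how painful it is to write this kind of code in pure C. And if you want to point out that Gtk has js / python bindings, doesn’t that just add a dependency on yet another language?

Right now, if you feel like it, nothing stops you from writing an ncurses-based configuration UI for fcitx, and if you want to revive fcitx-config-gtk I have no objection either. But please don’t expect me to write it, because I don’t have the time or energy.

2. Does Fcitx 5 support fewer pinyin engines?

On the surface, yes. There used to be the built-in pinyin, libgooglepinyin, sunpinyin and libpinyin, which looked like a lively scene with a hundred flowers blooming. But if you actually know them:

The built-in pinyin used a forward maximum matching algorithm that was hardly better than Zhineng ABC (智能 ABC) from 20 years ago. At most it benefited from cloud pinyin and a larger dictionary.

libgooglepinyin was ported from the pinyin library of an ancient android version and has a known problem where certain inputs crash it. The algorithm itself is unigram, so it has no ability to predict from context either.

libpinyin uses a bigram model, but my experience with it over several years was this: it casually broke ABI many times, and its early low-quality data produced suggestions with lots of wrong characters as the default choice.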

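sunpinyin is the spiritual origin of the design of libime, which fcitx5 uses: it is a trigram pinyin input method. If you know a little about ngram models, the higher the order, the more accurate the prediction of a whole sentence from context. I already wrote a comparison with sunpinyin in an earlier blog post. Sunpinyin forgets input history quickly and has no support for multiple dictionaries. libime uses the same algorithmic principle as sunpinyin, but adopts kenlm, which is better on the storage side, as the binary format for the language model, so it uses less memory than sunpinyin for the same result.

In early versions of libime, the language model and dictionary were the same open-gram data that sunpinyin uses, and with the same data and the same algorithm the computed results are in fact exactly the same. With that in mind, there was simply no need to implement sunpinyin support. In terms of the usability of the library itself, libime can easily load any number of additional dictionary files. Its pinyin parsing is inherited from fcitx 4’s pinyin with additional improvements, and it also supports inner fuzzy segmentation (xi’an vs. xian) and a range of other new features that fcitx 4 never had.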
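To illustrate why the order of an ngram model matters, here is a toy count-based model in Python. It is not how libime, sunpinyin or kenlm are implemented (real models use smoothing, backoff and far more data). The tokens are abstract placeholders, and the corpus is made up so that only the trigram context separates the two candidates.

```python
from collections import Counter, defaultdict


def train(corpus, order):
    """Count how often each word follows its (order - 1) preceding words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            context = tuple(words[max(0, i - (order - 1)):i])
            counts[context][word] += 1
    return counts


def predict(counts, history, order, candidates):
    """Pick the candidate seen most often after the last (order - 1) words."""
    context = tuple(history[-(order - 1):]) if order > 1 else ()
    return max(candidates, key=lambda word: counts[context][word])


# "X" and "Y" stand for two candidates for the same pinyin input.
corpus = ["a b X", "a b X", "c b Y", "c b Y", "c b Y", "d Y"]
history = ["a", "b"]

for order in (1, 2, 3):
    print(order, predict(train(corpus, order), history, order, ["X", "Y"]))
# 1 Y
# 2 Y
# 3 X
```

With order 1 or 2 the model only knows that “Y” is more frequent overall, or more frequent after “b”. Only the order 3 model sees that “a b” has always been followed by “X”.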

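In fact, on top of a new code base we are free to introduce many new features. For example, the language model and dictionary are no longer the original open-gram but have been retrained on newer data. The dictionary has also gained many new words, and shuangpin (double pinyin) now supports fully custom syllables. sunpinyin has had no updates at all in recent years: even leaving feature updates aside, its data has not been updated either. libime, which is controlled by the fcitx project, can update its data and add new features much more freely. Typing kaomoji directly with pinyin, character decomposition input (拆字) and other such features only exist in this new pinyin.

Besides, these are all open source projects. If you feel like it, the code is right there and you can port fcitx4’s pinyin over at any time, but I am not going to pick that outdated code base back up.

3. What was fcitx’s original aspiration?

I cannot speak for yuking. In fact, in the ten-plus years since I took over as the main maintainer, the code base has gone through a long evolution. More than ten years ago, because of certain behavior from GNOME, I was genuinely disappointed and lost about these things (opensource, whether to continue fcitx), and I wrote quite a few blog posts discussing it at the time.

As I see it, fcitx’s strength lies in its modular architecture. There are many features that, if you tried to implement them in another framework, you would find cannot be added as a simple extension. Examples are quick phrase / unicode input, which exist independently of any input method, spell checking shared between different input methods (English input in keyboard / pinyin is provided by yet another module), clipboard access, and so on.

Controlling the code base from top to bottom also makes for better support of new features and new platforms. ibus’s qt5 input method module still has many known bugs and cannot be used fully normally under wayland. fcitx 5 is in fact the one input method with the broadest, most complete and most usable compositor support under wayland. The high degree of modularity also means fcitx 5 can now even run inside a flatpak sandbox.

I don’t think anyone has forgotten the original aspiration. If anything, it has been kept up rather well. Please don’t imagine an aspiration of your own and force it on other people.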

Posted in fcitx development