Catastrophic multilingual support for Chinese or Japanese #519

Open
2 tasks done
cxw620 opened this issue Apr 25, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@cxw620
cxw620 commented Apr 25, 2025

System information

zola 0.20.0, latest tabi

Expected behaviour

...

Actual behaviour

...

Steps to reproduce

...

Additional context

Multilingual support in Zola and tabi is currently quite poor. Chinese or Japanese users may run into the following problems:

  • With the default official template and the official multilingual guide, setting default_language to zh-Hans makes the content build fail. It does not work out of the box.

    This is because Zola does not include Chinese or Japanese support by default.

    Solution (this is actually covered in the official documentation): compile and install Zola yourself with tokenization support enabled: cargo install --git https://github.com/getzola/zola.git --features indexing-zh,indexing-ja zola.

Unfortunately, for Chinese users (Simplified or Traditional), the approach above is still not enough, because the tokenization crate that Zola uses does not accept tags such as zh-Hans:

https://github.com/mattico/elasticlunr-rs/blob/4db7fac70fa4d6281bf527d9fae07f5a2169f252/src/lang/mod.rs#L85-L105

impl_language! {
    (English, en),
    (Arabic, ar, #[cfg(feature = "ar")]),
    (Chinese, zh, #[cfg(feature = "zh")]),
    // ...
}
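
To make the mismatch concrete, here is a minimal sketch of reducing a tag such as zh-Hans to its primary language subtag, which is the form the list above uses (en, ar, zh). The function name primary_subtag is hypothetical, and this is an illustration only, not code from Zola or elasticlunr-rs.

```rust
/// Hypothetical helper: return the primary language subtag of a
/// BCP 47-style tag, lowercased (for example "zh-Hans" becomes "zh").
/// Illustration only; not part of Zola or elasticlunr-rs.
fn primary_subtag(tag: &str) -> String {
    // Accept both '-' and '_' as separators and keep only the first piece.
    tag.split(|c: char| c == '-' || c == '_')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase()
}
```

Note that this drops the script subtag, so zh-Hans and zh-Hant both collapse to zh and can no longer be told apart.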

For Chinese search tokenization, see the temporary workaround in my comment at getzola/zola#2800 (comment). It is not perfect, though, and I have since found one serious problem:

  • The giscus comment widget stops working.

    The reason is that giscus's server does not accept zh as a language code: the key API, https://giscus.app/{lang}/widget, returns 404 when lang is zh.

    The temporary workaround is to set lang to a value that giscus officially supports, such as zh-CN, instead of following the page language.
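
The workaround amounts to a small lookup from the page language to the giscus lang value. Here is a minimal sketch of that logic in Rust, assuming a hypothetical function giscus_lang. Only the zh to zh-CN case comes from this report; every other code is passed through unchanged. A theme would more likely express this in its templates; this is not tabi's actual code.

```rust
/// Hypothetical mapping from a site language code to the `lang` value
/// passed to giscus. Only the "zh" -> "zh-CN" case is taken from the
/// report; all other codes are returned as-is.
fn giscus_lang(page_lang: &str) -> &str {
    match page_lang {
        // giscus returns 404 for "zh", so substitute a supported value.
        "zh" => "zh-CN",
        other => other,
    }
}
```

With this mapping, a page whose language is zh would request https://giscus.app/zh-CN/widget rather than the zh path that returns 404.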


Since this problem has wide-reaching implications, I am opening this issue to discuss a more appropriate solution.

Final checklist

  • I've checked that the issue isn't already reported.
  • I've tested with the latest version of tabi to check if the issue has already been fixed.
@cxw620 cxw620 added the bug Something isn't working label Apr 25, 2025
@welpo
Owner

welpo commented Apr 26, 2025

Hi @cxw620! Thank you for the detailed report!

I feel like the language tag issue should be solved in https://github.com/mattico/elasticlunr-rs. tabi uses the ISO standard.

How can tabi improve the situation? Perhaps detect Chinese/Japanese language and do something so Giscus can load? Any other ideas?

Open to suggestions!

@cxw620
Author

cxw620 commented Apr 27, 2025

> Hi @cxw620! Thank you for the detailed report!
>
> I feel like the language tag issue should be solved in https://github.com/mattico/elasticlunr-rs. tabi uses the ISO standard.
>
> How can tabi improve the situation? Perhaps detect Chinese/Japanese language and do something so Giscus can load? Any other ideas?
>
> Open to suggestions!

I can confirm that giscus accepts language codes like zh-Hans or zh-Hant; the core problem is that zola doesn't accept them. In the elasticlunr-rs crate, Chinese support is provided by jieba-rs, which actually only supports zh-Hans, not zh-Hant (see messense/jieba-rs#112).

The solution may be:


Ahh, this issue can be moved to a discussion. tabi can actually do little here, since the fix belongs upstream in zola. I'm just writing this down as a reminder for other Chinese users.
