
Short English & Chinese Example fails #464

Open
C-Loftus opened this issue Mar 29, 2025 · 0 comments

When detecting the languages of a trivial mixed-language text, in this case English and Chinese, lingua sometimes detects both languages correctly and sometimes does not. I expected lingua to be able to use a rule-based approach here, since Hanzi characters always indicate a non-English language.

Is this expected behavior? In my testing with other languages that don't use the Latin script, such as Russian, I haven't run into this issue.
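For illustration, here is a minimal sketch of the kind of rule-based pre-pass described above, using only the Rust standard library. The helpers `is_han` and `split_han_runs` are hypothetical and not part of lingua's API. The code point ranges cover only the main CJK Unified Ideographs blocks (plus Extensions A and B and the compatibility block), which is an assumption rather than a complete definition of Hanzi. The function splits the text into contiguous Han and non-Han runs and returns byte offsets, so a caller could label the Han runs as Chinese directly and pass only the remaining runs to the detector.

```rust
/// Hypothetical helper: true if `c` lies in one of the main Han blocks.
/// Assumption: these ranges cover common Hanzi, not every Han code point.
fn is_han(c: char) -> bool {
    matches!(
        c,
        '\u{4E00}'..='\u{9FFF}'       // CJK Unified Ideographs
            | '\u{3400}'..='\u{4DBF}'   // Extension A
            | '\u{F900}'..='\u{FAFF}'   // CJK Compatibility Ideographs
            | '\u{20000}'..='\u{2A6DF}' // Extension B
    )
}

/// Hypothetical helper: splits `text` into contiguous runs of Han and
/// non-Han characters, returning (start_byte, end_byte, is_han) per run.
/// Whitespace and punctuation never start a new run; they stay attached
/// to the run in which they occur.
fn split_han_runs(text: &str) -> Vec<(usize, usize, bool)> {
    let mut runs = Vec::new();
    let mut current: Option<(usize, bool)> = None;
    for (i, c) in text.char_indices() {
        let neutral = !c.is_alphanumeric();
        let han = is_han(c);
        match current {
            // The first character opens the first run.
            None => current = Some((i, han)),
            // A non-neutral character of the other kind closes the run.
            Some((start, run_han)) if !neutral && han != run_han => {
                runs.push((start, i, run_han));
                current = Some((i, han));
            }
            _ => {}
        }
    }
    if let Some((start, run_han)) = current {
        runs.push((start, text.len(), run_han));
    }
    runs
}

fn main() {
    let sentence = "Hello my name is bob. 你好世界";
    for (start, end, han) in split_han_runs(sentence) {
        println!("{:?} han={}", &sentence[start..end], han);
    }
}
```

This shortcut only holds for a candidate set like the one in this issue: if Japanese were also among the detector's languages, kanji are Han characters too, so a Han run could no longer be labeled Chinese without further checks.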

Example to reproduce

```rust
use lingua::DetectionResult;
use lingua::Language::{English, Chinese};
use lingua::LanguageDetectorBuilder;

fn main() {
    // Restrict the detector to the two languages that occur in the input.
    let languages = vec![English, Chinese];
    let detector = LanguageDetectorBuilder::from_languages(&languages).build();

    // Works as expected: two sections (English and Chinese) are returned.
    let sentence = "Hello world. 你好世界";
    let results: Vec<DetectionResult> = detector.detect_multiple_languages_of(sentence);
    assert_eq!(results.len(), 2);

    // Unexpected: only a single section is returned, even though the
    // sentence still ends with Chinese characters.
    let sentence2 = "Hello my name is bob. 你好世界";
    let results2: Vec<DetectionResult> = detector.detect_multiple_languages_of(sentence2);
    assert_eq!(results2.len(), 1);
}
```

Related to #463

Environment

I am running lingua 1.7.1 (`lingua = "1.7.1"` in `Cargo.toml`).

```
host@computer ~/g/lingua-test (master)> cargo --version
cargo 1.85.0 (d73d2caf9 2024-12-31)
host@computer ~/g/lingua-test (master)> rustc --version
rustc 1.85.0 (4d91de4e4 2025-02-17)
```