Jaydar

A Rust library for finding Japanese homophones and ranking them by frequency using the JMDict dictionary, with optional NHK pitch accent data to distinguish true homophones from "fake" homophones (same reading, different pitch).

Features

Find all homophones (words with the same pronunciation) for a given Japanese word
Rank homophones by frequency based on multiple corpora
Support for kanji, hiragana, and katakana input (for example, ソーセージ and 双生児 are treated as homophones)
Identify common vs uncommon words
Distinguish true homophones from "fake" homophones using NHK pitch accent data
Automatically determine the input type (a specific word or a reading)
Automatic katakana-to-hiragana conversion for searches

Usage

Basic usage (without pitch accent)

use jaydar::find;

let homophones = find("かう");
for word in homophones {
    println!("{} ({}) - frequency: {}, common: {}", 
        word.text, 
        word.reading, 
        word.frequency_score,
        word.is_common
    );
}
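As a follow-up to the loop above, a caller might want only the highest-scoring entry that is marked common. The sketch below is self-contained: `WordFrequency` is a local stand-in that mirrors the fields documented in the API Reference, and `top_common` is a hypothetical helper, not a function exported by jaydar.

```rust
// Local stand-in mirroring the documented `WordFrequency` fields,
// so this sketch compiles without the crate.
#[derive(Debug, Clone, PartialEq)]
pub struct WordFrequency {
    pub text: String,
    pub reading: String,
    pub frequency_score: u32,
    pub is_common: bool,
}

/// Hypothetical helper (not part of jaydar): the common word with the
/// highest frequency score, or `None` if no entry is marked common.
pub fn top_common(words: &[WordFrequency]) -> Option<&WordFrequency> {
    words
        .iter()
        .filter(|w| w.is_common)
        .max_by_key(|w| w.frequency_score)
}
```

With the real crate, the stand-in struct would be replaced by `jaydar::WordFrequency`, and the slice would come from binding the result of `find("かう")` to a variable first.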

With NHK pitch accent data

use jaydar::{find_with_nhk, FindWithNhkResult};

let result = find_with_nhk("構成");
match result {
    FindWithNhkResult::NoHomophones => {
        println!("This word has no homophones");
    }
    FindWithNhkResult::UniqueMatch { true_homophones, different_pitch_homophones } => {
        println!("True homophones (same pitch):");
        for word in &true_homophones {
            println!("  {} - pitch: {:?}", word.text, word.pitch_accent);
        }
        
        println!("Fake homophones (different pitch):");
        for word in &different_pitch_homophones {
            println!("  {} - pitch: {:?}", word.text, word.pitch_accent);
        }
    }
    // Returned when the input is a reading (typically hiragana) that matches multiple words
    FindWithNhkResult::MultipleMatches { homophones } => {
        println!("Homophones:");
        for word in &homophones {
            println!("  {} - pitch: {:?}", word.text, word.pitch_accent);
        }
    }
}

When searching for "構成" (pitch 0), words like "後世" (pitch 1) are placed in different_pitch_homophones.

Example with multiple pitch accents:

use jaydar::find_with_nhk;

let result = find_with_nhk("ていど");
// 程度 will have pitch_accent: vec![1, 0] (both pronunciations are accepted; 1 is preferred)
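Because the pitch list is ordered by preference, the first element can be read as the preferred accent and the rest as accepted alternatives. A minimal sketch of that reading, using a hypothetical helper `split_pitch` that works on a plain slice rather than on jaydar's types:

```rust
/// Hypothetical helper (not part of jaydar): split a preference-ordered
/// pitch list into the preferred accent and the remaining alternatives.
/// Returns `None` when no pitch data is available.
pub fn split_pitch(pitch_accent: &[u8]) -> Option<(u8, &[u8])> {
    pitch_accent
        .split_first()
        .map(|(first, rest)| (*first, rest))
}
```

For 程度 with `vec![1, 0]`, this yields 1 as the preferred accent and `[0]` as the alternatives.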

Katakana support

use jaydar::find;

// Searching with katakana automatically finds hiragana/kanji homophones
let homophones = find("カイ");
// Returns: 買い, 下位, 階, 回, 貝, 会, etc.

// Works with any katakana input
let homophones = find("コウセイ");
// Returns: 構成, 攻勢, 公正, 厚生, 後世, etc.
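The katakana-to-hiragana step can be illustrated with a common Unicode technique: the katakana block sits exactly 0x60 code points above the matching hiragana. The function below is a sketch of that idea under the name `katakana_to_hiragana`, which is hypothetical; it is not claimed to be how jaydar implements the conversion.

```rust
/// Hypothetical helper (not part of jaydar's public API): convert katakana
/// to hiragana by shifting code points down by 0x60.
pub fn katakana_to_hiragana(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // ァ (U+30A1) ..= ヶ (U+30F6) map onto ぁ (U+3041) ..= ゖ (U+3096)
            '\u{30A1}'..='\u{30F6}' => char::from_u32(c as u32 - 0x60).unwrap_or(c),
            // Kanji, hiragana, and the long-vowel mark ー (U+30FC) pass through unchanged
            _ => c,
        })
        .collect()
}
```

Note that this sketch leaves the long-vowel mark ー untouched, so ソーセージ becomes そーせーじ rather than そうせいじ. Since the library treats ソーセージ and 双生児 as homophones, it presumably applies further normalization that is not shown here.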

Frequency Scoring

The frequency score is calculated based on:

Frequency bucket (1-48): Words in bucket 1 are the 500 most common words
News corpus: Appearance in Mainichi Shimbun newspaper
Ichimango: Presence in the "10,000 word vocabulary classification" book
Loanwords: Common loanword status
Additional: Other common word indicators

Higher scores indicate more common words.
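To make the combination of signals concrete, here is a rough sketch of how such a score could be assembled. Everything in it is an assumption for illustration: the `PrioritySignals` struct, the `sketch_score` function, and both weights are invented, and jaydar's actual formula may differ.

```rust
/// Hypothetical summary of the signals listed above for one dictionary entry.
pub struct PrioritySignals {
    /// Frequency bucket, 1 (most common) through 48, if the word has one.
    pub frequency_bucket: Option<u8>,
    pub in_news_corpus: bool,
    pub in_ichimango: bool,
    pub common_loanword: bool,
    pub other_common_marker: bool,
}

/// Illustrative scoring only: lower buckets and more flags give a higher score.
pub fn sketch_score(s: &PrioritySignals) -> u32 {
    // Made-up weights; the real weights are not documented in this README.
    const BUCKET_WEIGHT: u32 = 1_000;
    const FLAG_WEIGHT: u32 = 500;

    // Bucket 1 contributes the most, bucket 48 the least; out-of-range buckets add nothing.
    let bucket_part = match s.frequency_bucket {
        Some(b) if (1..=48).contains(&b) => (49 - u32::from(b)) * BUCKET_WEIGHT,
        _ => 0,
    };
    let flags = [
        s.in_news_corpus,
        s.in_ichimango,
        s.common_loanword,
        s.other_common_marker,
    ];
    let flag_part = flags.iter().filter(|&&f| f).count() as u32 * FLAG_WEIGHT;
    bucket_part + flag_part
}
```

The only property this sketch is meant to show is monotonicity: a word in a lower-numbered bucket, or with more corpus flags set, never scores lower than one without them.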

Key Concepts

Result Categories

The find_with_nhk function returns one of three result variants:

NoHomophones: The word exists but has no other words with the same reading
- Example: 中国語, タピオカ, 前置き
UniqueMatch: A specific word was searched (kanji/katakana), showing:
- true_homophones: Words with the same reading AND pitch accent (e.g., 構成[0] and 公正[0])
- different_pitch_homophones: Words with the same reading but different pitch accent (e.g., 構成[0] and 後世[1])
MultipleMatches: A reading was searched (typically hiragana), returning all words with that reading
- Example: Searching for "こうせい" returns all words pronounced that way

Note: Many Japanese words have multiple accepted pitch accents. For example, 程度 can be pronounced with either pitch accent 1 or 0. The library stores all accepted pitch accents in order of preference (most mainstream first).
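The split between true and different-pitch homophones can be sketched as a partition over candidate words. The README does not state how words with several accepted accents are compared, so the rule used here (two words match if they share at least one accepted pitch) is an assumption; comparing only the preferred accent would be another plausible rule. The struct is a local stand-in for the documented type, and `shares_pitch` and `split_by_pitch` are hypothetical names.

```rust
// Local stand-in mirroring the documented `WordFrequencyWithPitch` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct WordFrequencyWithPitch {
    pub text: String,
    pub reading: String,
    pub frequency_score: u32,
    pub is_common: bool,
    pub pitch_accent: Vec<u8>,
}

/// Assumed rule: two pitch lists match if they have any accent in common.
pub fn shares_pitch(a: &[u8], b: &[u8]) -> bool {
    a.iter().any(|p| b.contains(p))
}

/// Hypothetical helper: returns (same pitch, different pitch) relative to
/// the searched word's accepted accents.
pub fn split_by_pitch(
    target_pitch: &[u8],
    candidates: Vec<WordFrequencyWithPitch>,
) -> (Vec<WordFrequencyWithPitch>, Vec<WordFrequencyWithPitch>) {
    candidates
        .into_iter()
        .partition(|w| shares_pitch(target_pitch, &w.pitch_accent))
}
```

With a target pitch of `[0]` (as for 構成), a candidate such as 公正 with `[0]` lands in the first vector and 後世 with `[1]` in the second.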

Frequency Ranking

The library ranks common words above less common ones. Example scores:

構成 (48,000) > 後世 (36,500)
家庭 (50,000) > 課程 (41,000)
橋 (46,000) > 箸 (32,000)
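If results need to be re-sorted on the caller's side, a descending sort on the score is enough. The sketch below uses a reduced stand-in type, `Ranked`, and a hypothetical helper, `rank_descending`; neither is part of jaydar, and the README does not say whether `find` already returns results in this order.

```rust
// Reduced stand-in with only the fields needed for ranking.
#[derive(Debug, Clone, PartialEq)]
pub struct Ranked {
    pub text: &'static str,
    pub frequency_score: u32,
}

/// Hypothetical helper: sort so that the highest score comes first.
/// `sort_by` is stable, so words with equal scores keep their input order.
pub fn rank_descending(mut words: Vec<Ranked>) -> Vec<Ranked> {
    words.sort_by(|a, b| b.frequency_score.cmp(&a.frequency_score));
    words
}
```

Given 後世 (36,500) and 構成 (48,000) in that order, the sorted result starts with 構成.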

Examples

Running the basic demo:

cargo run --example demo

Running the demo with pitch accent:

cargo run --example demo_with_pitch

Running the katakana demo:

API Reference

Types

pub struct WordFrequency {
    pub text: String,           // The word (kanji/kana)
    pub reading: String,        // Reading in hiragana
    pub frequency_score: u32,   // Higher = more common
    pub is_common: bool,        // Marked as common in JMDict
}

pub struct WordFrequencyWithPitch {
    pub text: String,
    pub reading: String,
    pub frequency_score: u32,
    pub is_common: bool,
    pub pitch_accent: Vec<u8>,  // Multiple pitch accents in order of preference
}

pub enum FindWithNhkResult {
    NoHomophones,                        // Word has no homophones
    UniqueMatch {                        // Specific word was searched
        true_homophones: Vec<WordFrequencyWithPitch>,      // Same pitch
        different_pitch_homophones: Vec<WordFrequencyWithPitch>, // Different pitch
    },
    MultipleMatches {                    // Reading was searched
        homophones: Vec<WordFrequencyWithPitch>,
    },
}

Functions

// Find homophones without pitch accent data
pub fn find(word: &str) -> Vec<WordFrequency>

// Find homophones with pitch accent data
pub fn find_with_nhk(word: &str) -> FindWithNhkResult
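Callers that do not care about the distinction between variants can collapse the result into a single list. The sketch below redefines the documented types locally so that it compiles on its own; `into_all_homophones` is a hypothetical helper, not a function provided by jaydar.

```rust
// Local stand-ins mirroring the documented types.
#[derive(Debug, Clone, PartialEq)]
pub struct WordFrequencyWithPitch {
    pub text: String,
    pub reading: String,
    pub frequency_score: u32,
    pub is_common: bool,
    pub pitch_accent: Vec<u8>,
}

pub enum FindWithNhkResult {
    NoHomophones,
    UniqueMatch {
        true_homophones: Vec<WordFrequencyWithPitch>,
        different_pitch_homophones: Vec<WordFrequencyWithPitch>,
    },
    MultipleMatches {
        homophones: Vec<WordFrequencyWithPitch>,
    },
}

/// Hypothetical helper: flatten any variant into one list, with
/// same-pitch homophones listed before different-pitch ones.
pub fn into_all_homophones(result: FindWithNhkResult) -> Vec<WordFrequencyWithPitch> {
    match result {
        FindWithNhkResult::NoHomophones => Vec::new(),
        FindWithNhkResult::UniqueMatch {
            mut true_homophones,
            different_pitch_homophones,
        } => {
            true_homophones.extend(different_pitch_homophones);
            true_homophones
        }
        FindWithNhkResult::MultipleMatches { homophones } => homophones,
    }
}
```

Matching exhaustively on the enum, rather than using a wildcard arm, means the compiler will flag this helper if a new variant is ever added.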

License

Note: This library uses the JMDict database, which is licensed by the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications using this library must display the appropriate copyright notices.

The NHK pitch accent data is derived from NHK日本語発音アクセント辞典.
