A Rust library for finding Japanese homophones and ranking them by frequency using the JMDict dictionary, with optional NHK pitch accent data to distinguish true homophones from "fake" homophones (same reading, different pitch).

- Find all homophones (words with the same pronunciation) for a given Japanese word
- Rank homophones by frequency based on multiple corpora
- Support for kanji, hiragana, and katakana input (ソーセージ and 双生児 are homophones)
- Identify common vs uncommon words
- Distinguish true homophones from "fake" homophones using NHK pitch accent data
- Automatically determine input type (unique word, reading)
- Automatic katakana-to-hiragana conversion for searches
use jaydar::find;
let homophones = find("かう");
for word in homophones {
println!("{} ({}) - frequency: {}, common: {}",
word.text,
word.reading,
word.frequency_score,
word.is_common
);
}
use jaydar::{find_with_nhk, FindWithNhkResult};
let result = find_with_nhk("構成");
match result {
FindWithNhkResult::NoHomophones => {
println!("This word has no homophones");
}
FindWithNhkResult::UniqueMatch { true_homophones, different_pitch_homophones } => {
println!("True homophones (same pitch):");
for word in &true_homophones {
println!(" {} - pitch: {:?}", word.text, word.pitch_accent);
}
println!("Fake homophones (different pitch):");
for word in &different_pitch_homophones {
println!(" {} - pitch: {:?}", word.text, word.pitch_accent);
}
}
// when the input is hiragana with multiple kanji matches
FindWithNhkResult::MultipleMatches { homophones } => {
println!("homopones:");
for word in &homophones {
println!(" {} - pitch: {:?}", word.text, word.pitch_accent);
}
}
}
When searching for "構成" (pitch 0), words like "後世" (pitch 1) will be categorized as different pitch homophones.
Example with multiple pitch accents:
let result = find_with_nhk("ていど");
// 程度 will have pitch_accent: vec![1, 0] - both pronunciations are valid
use jaydar::find;
// Searching with katakana automatically finds hiragana/kanji homophones
let homophones = find("カイ");
// Returns: 買い, 下位, 階, 回, 貝, 会, etc.
// Works with any katakana input
let homophones = find("コウセイ");
// Returns: 構成, 攻勢, 公正, 厚生, 後世, etc.
The frequency score is calculated based on:
- Frequency bucket (1-48): Words in bucket 1 are the 500 most common words
- News corpus: Appearance in Mainichi Shimbun newspaper
- Ichimango: Presence in the "10,000 word vocabulary classification" book
- Loanwords: Common loanword status
- Additional: Other common word indicators
Higher scores indicate more common words.
The find_with_nhk
function returns three possible result types:
-
NoHomophones: The word exists but has no other words with the same reading
- Example: 中国語, タピオカ, 前置き
-
UniqueMatch: A specific word was searched (kanji/katakana), showing:
- true_homophones: Words with the same reading AND pitch accent (e.g., 構成[0] and 公正[0])
- different_pitch_homophones: Words with the same reading but different pitch accent (e.g., 構成[0] and 後世[1])
-
MultipleMatches: A reading was searched (typically hiragana), returning all words with that reading
- Example: Searching for "こうせい" returns all words pronounced that way
Note: Many Japanese words have multiple accepted pitch accents. For example, 程度 can be pronounced with either pitch accent 1 or 0. The library stores all accepted pitch accents in order of preference (most mainstream first).
The library correctly ranks common words higher than uncommon ones:
- 構成 (48,000) > 後世 (36,500)
- 家庭 (50,000) > 課程 (41,000)
- 橋 (46,000) > 箸 (32,000)
Running the basic demo:
cargo run --example demo
Running the demo with pitch accent:
cargo run --example demo_with_pitch
Running the katakana demo:
pub struct WordFrequency {
pub text: String, // The word (kanji/kana)
pub reading: String, // Reading in hiragana
pub frequency_score: u32, // Higher = more common
pub is_common: bool, // Marked as common in JMDict
}
pub struct WordFrequencyWithPitch {
pub text: String,
pub reading: String,
pub frequency_score: u32,
pub is_common: bool,
pub pitch_accent: Vec<u8>, // Multiple pitch accents in order of preference
}
pub enum FindWithNhkResult {
NoHomophones, // Word has no homophones
UniqueMatch { // Specific word was searched
true_homophones: Vec<WordFrequencyWithPitch>, // Same pitch
different_pitch_homophones: Vec<WordFrequencyWithPitch>, // Different pitch
},
MultipleMatches { // Reading was searched
homophones: Vec<WordFrequencyWithPitch>,
},
}
// Find homophones without pitch accent data
pub fn find(word: &str) -> Vec<WordFrequency>
// Find homophones with pitch accent data
pub fn find_with_nhk(word: &str) -> FindWithNhkResult
Note: This library uses the JMDict database which is licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications using this library must display appropriate copyright notices.
The NHK pitch accent data is derived from NHK日本語発音アクセント辞典.