Using derived rules to inflect nouns #112

nciric · 2025-05-01T22:54:06Z

Feature request

We should have access to the inferred rules from C++ code and be able to match an incoming lemma against them (a potential implementation could be a RuleInflector invoked after the DictionaryInflector). Words that match the rules but are not in the dictionary could then be inflected automatically. Additional processing may be necessary to fix corner cases, but that would likely be easier than developing a full set of rules from scratch.
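A minimal sketch of the proposed ordering, assuming a simplified interface that maps a lemma to an inflection-group identifier. `PatternSource`, `DictionarySource`, `SuffixRuleSource` and `ChainedSource` are hypothetical names used only for illustration; they are not part of the library, and a real RuleInflector would presumably return inflection patterns (as the sample code later in this thread does) rather than plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical interface: returns an inflection-group id for a lemma, if known.
class PatternSource {
public:
    virtual ~PatternSource() = default;
    virtual std::optional<std::string> find(std::u16string_view lemma) const = 0;
};

// Exact lookup, standing in for the dictionary-based inflector.
class DictionarySource : public PatternSource {
public:
    explicit DictionarySource(std::map<std::u16string, std::string> entries)
        : entries_(std::move(entries)) {}
    std::optional<std::string> find(std::u16string_view lemma) const override {
        auto it = entries_.find(std::u16string(lemma));
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::u16string, std::string> entries_;
};

// Longest-suffix match, standing in for the proposed RuleInflector.
class SuffixRuleSource : public PatternSource {
public:
    explicit SuffixRuleSource(std::map<std::u16string, std::string> suffixToGroup)
        : suffixToGroup_(std::move(suffixToGroup)) {}
    std::optional<std::string> find(std::u16string_view lemma) const override {
        // Try the longest suffix first so that more specific rules win.
        for (std::size_t len = lemma.size(); len > 0; --len) {
            auto it = suffixToGroup_.find(std::u16string(lemma.substr(lemma.size() - len)));
            if (it != suffixToGroup_.end()) return it->second;
        }
        return std::nullopt;
    }
private:
    std::map<std::u16string, std::string> suffixToGroup_;
};

// Asks each source in registration order; the first answer wins.
class ChainedSource : public PatternSource {
public:
    void add(std::unique_ptr<PatternSource> source) {
        assert(source != nullptr);
        sources_.push_back(std::move(source));
    }
    std::optional<std::string> find(std::u16string_view lemma) const override {
        for (const auto& source : sources_) {
            if (auto group = source->find(lemma)) return group;
        }
        return std::nullopt;
    }
private:
    std::vector<std::unique_ptr<PatternSource>> sources_;
};
```

Registering the dictionary source first means a dictionary entry always takes precedence, and the suffix rules only decide for words such as уранак that the dictionary lacks.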

Current use case:

The words уранак and пашњак are not in the dictionary, but they belong to the same inflection group (see below) as пропланак, which is.
Applying the rules for group f would produce correct results for both of them without the need for extra logic.
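One way such rules could be derived from existing data, sketched under the assumption that each dictionary word is tagged with its inflection-group id: index every word by its last `suffixLength` characters and keep only the suffixes that point to a single group, so an ambiguous ending is never guessed. `deriveSuffixRules` and `guessGroup` are hypothetical helpers, and every group id other than f is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Builds suffix -> group rules from (word, group) pairs.
// A suffix shared by words of different groups is dropped as ambiguous.
std::map<std::u16string, std::string> deriveSuffixRules(
        const std::vector<std::pair<std::u16string, std::string>>& dictionary,
        std::size_t suffixLength) {
    assert(suffixLength > 0);
    std::map<std::u16string, std::set<std::string>> candidates;
    for (const auto& [word, group] : dictionary) {
        if (word.size() < suffixLength) continue;  // too short to carry the suffix
        candidates[word.substr(word.size() - suffixLength)].insert(group);
    }
    std::map<std::u16string, std::string> rules;
    for (const auto& [suffix, groups] : candidates) {
        if (groups.size() == 1) rules.emplace(suffix, *groups.begin());
    }
    return rules;
}

// Applies the derived rules to a word that is not in the dictionary.
std::optional<std::string> guessGroup(const std::map<std::u16string, std::string>& rules,
                                      const std::u16string& word, std::size_t suffixLength) {
    if (word.size() < suffixLength) return std::nullopt;
    auto it = rules.find(word.substr(word.size() - suffixLength));
    if (it == rules.end()) return std::nullopt;
    return it->second;
}
```

With a two-character suffix, пропланак yields the rule ак → f, which then covers уранак and пашњак. Masculine мост and feminine кост both end in ст but decline differently, so that suffix is dropped; this is why a fixed-length suffix alone cannot replace the dictionary.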

NOTE(George): Groups c and f encode the same rules (the suffixes are k and ak). Is that a bug in dictionary-parser?
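If a group is stored as a suffix to strip plus a list of endings to append, two encodings can describe the same forms while looking different: for a lemma ending in ак, stripping ак and appending аX gives the same result as stripping к and appending X. A sketch of a check for this, assuming that representation (the `GroupEncoding` struct and the endings in the example are hypothetical, not the dictionary-parser's actual format): canonicalize each encoding by moving leading characters shared by the stripped suffix and every ending out of both, then compare.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical encoding: forms = (lemma without `strip`) + each ending, in slot order.
struct GroupEncoding {
    std::u16string strip;
    std::vector<std::u16string> endings;
};

// Removes characters that the stripped suffix and every ending share at the front.
// Each step keeps the generated forms unchanged for lemmas ending in the original suffix.
GroupEncoding canonicalize(GroupEncoding group) {
    while (!group.strip.empty() &&
           std::all_of(group.endings.begin(), group.endings.end(),
                       [&](const std::u16string& e) {
                           return !e.empty() && e.front() == group.strip.front();
                       })) {
        group.strip.erase(0, 1);
        for (auto& ending : group.endings) ending.erase(0, 1);
    }
    return group;
}

// True when both encodings produce the same forms for every lemma they both apply to.
bool sameRules(const GroupEncoding& a, const GroupEncoding& b) {
    const GroupEncoding ca = canonicalize(a);
    const GroupEncoding cb = canonicalize(b);
    return ca.strip == cb.strip && ca.endings == cb.endings;
}
```

The check ignores which lemmas each group applies to, so two groups that compare equal may still be intended for different word sets; a duplicate found this way is a candidate for review rather than proof of a parser bug.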

Reasoning

In our initial discussion of how to implement the inflection library, we mentioned a couple of solutions:

Dictionary based
Rule based
ML based

These options can also be mixed into a hybrid solution, e.g. a combined rule and dictionary approach with a fallback to ML for more complex languages.

Our current implementation is based on dictionary lookup, with language-specific tailorings written in C++, e.g. the English guessSingularInflection. During Wikidata ingestion we also produce inflection rules for various parts of speech, including nouns, proper nouns and adjectives, but those rules are only applied to words that are already in the dictionary.

I think many language-specific tailorings could be avoided, reducing the complexity and time needed to implement language support, by reusing those rules for words outside the dictionary that follow the rule patterns.

Benefits of having a rule based inflector:

The dictionary can be sparse, which keeps it small
We can launch more languages with sparse Wikidata (see language status)

Writing inflection rules by hand is hard. Take a look at this relatively simple list of rules for Serbian:

Masculine nouns ending in -∅, -о and -е, and neuter nouns ending in -о and -е, where the stem stays the same.
Neuter nouns ending in -е, where the stem is expanded with the consonant н or т in most cases.
Nouns whose stem ends in -а (both masculine and feminine).
Feminine nouns ending in -∅, where an adjacent adjective also takes the feminine form.

Implementing that in Pynini, a system optimized for fast rule matching, is not trivial; doing it in C++ is even harder and doesn't scale as well.
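To make the difficulty concrete, here is a sketch of only the genitive singular for the four classes listed above. It assumes the caller already knows the noun's gender and, for the second class, its extension consonant; `Gender`, `NounClassInfo` and `genitiveSingular` are hypothetical names, and irregular stems and sound changes are deliberately out of scope.

```cpp
#include <cassert>
#include <string>

enum class Gender { Masculine, Feminine, Neuter };

// Hypothetical per-word information that cannot be read off the lemma.
struct NounClassInfo {
    Gender gender;
    std::u16string stemExtension;  // u"н" for име, u"т" for дете; empty otherwise
};

// Genitive singular following the four rule classes in the list (sketch only).
std::u16string genitiveSingular(const std::u16string& lemma, const NounClassInfo& info) {
    assert(!lemma.empty());
    const char16_t last = lemma.back();
    // Third rule: stem ends in -а (masculine or feminine), -а becomes -е.
    if (last == u'а') {
        return lemma.substr(0, lemma.size() - 1) + u"е";
    }
    // Second rule: neuter -е with an expanded stem, add the consonant and then -а.
    if (info.gender == Gender::Neuter && last == u'е' && !info.stemExtension.empty()) {
        return lemma + info.stemExtension + u"а";
    }
    // First rule: -о or -е with an unchanged stem, the vowel becomes -а.
    if (last == u'о' || last == u'е') {
        return lemma.substr(0, lemma.size() - 1) + u"а";
    }
    // Fourth rule: feminine -∅ takes -и.
    if (info.gender == Gender::Feminine) {
        return lemma + u"и";
    }
    // First rule: masculine -∅ takes -а.
    return lemma + u"а";
}
```

Even this small function needs gender and stem information that the lemma does not carry (мост and кост share an ending but take -а and -и), and it still fails on a fleeting а: for пропланак it returns пропланака, whereas the correct genitive singular is пропланка. Rules derived from dictionary data could capture such cases without hand-written logic.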

grhoten · 2025-05-13T17:02:27Z

Here's some sample code that may help.

    // Needs <iostream>, <map>, <string>, <string_view> and <vector>, plus the
    // inflection headers declaring Inflector, LocaleUtils and StringViewUtils.
    const auto& inflector(::inflection::dictionary::Inflector::getInflector(::inflection::util::LocaleUtils::SERBIAN()));
    ::std::vector<inflection::dictionary::Inflector_InflectionPattern> inflectionPatterns;
    // Maps the last two characters of each sample word to its first inflection pattern.
    std::map<std::u16string, inflection::dictionary::Inflector_InflectionPattern> suffixToPattern;
    for (const auto str : {u"кафана", u"мост"}) {
        std::u16string_view word(str);
        inflectionPatterns.clear();
        inflector.getInflectionPatternsForWord(word, inflectionPatterns);
        // Skip unknown words and words too short to have a two-character suffix.
        if (!inflectionPatterns.empty() && word.length() >= 2) {
            std::u16string suffix(word.substr(word.length() - 2));
            suffixToPattern.emplace(suffix, inflectionPatterns.front());
        }
    }
    // Print each suffix with the identifier of the pattern it was taken from.
    for (const auto& [suffix, inflectionPattern] : suffixToPattern) {
        std::cout << inflection::util::StringViewUtils::to_string(suffix) << ": "
                  << inflection::util::StringViewUtils::to_string(inflectionPattern.getIdentifier()) << std::endl;
    }

Here are the results:

на: 2
ст: 11

You can use the inflection pattern directly on the relevant words, and you don't need to look up the patterns by name.

nciric added the enhancement (New feature or request) label on May 1, 2025

nciric self-assigned this May 1, 2025
