Description
This is a GSoC Project idea.
Difficulty/Size: Medium
The Unicode Inflection project currently supports Arabic, Danish, German, English, Spanish, French, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Dutch, Portuguese, Russian, Swedish, Thai, Turkish, and Chinese. The goal of this project is to add support for more languages.
Background material on Unicode Inflection concepts:
- UTW 2023 Automatic Grammar Agreement in Message Formatting
- S12T1 Authoring Grammatically Correct Conversational Templates for Siri
- Let's Come To An Agreement About Our Words :: IMUG 2017.02.16
Expected Outcomes
- Unicode Inflection code will be able to inflect nouns and personal pronouns for the language being supported. Examples include:
  - object + plural → objects
  - city + plural,genitive → cities’
- Optionally inflect articles, prepositions, adjectives and verbs as necessary for a given language.
- All tests of supported functionality should pass.
- Support a language that isn’t already supported and that has sufficient lexical data in Wikidata. Examples include:
  - Estonian, Malayalam, Greek, Czech, Norwegian (Nynorsk), Slovak, Ukrainian, Bangla, Punjabi, Polish, Urdu, or Finnish
  - Perhaps others, but the required data would first need to be added to Wikidata.
- The lexical data will be derived from Wikidata. There is an existing tool to generate appropriate lexical dictionaries for each language, and there are examples of other supported languages.
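
To make the inflection examples above concrete (object + plural → objects, city + plural,genitive → cities’), here is a minimal toy sketch in C++. It is not the Unicode Inflection API: the function `inflectEnglishNoun`, the constraint strings, and the hard-coded English suffix rules are all assumptions made for illustration. The real project derives its lexical data from Wikidata rather than from rules like these, and this sketch emits an ASCII apostrophe rather than the typographic ’.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy illustration only. Nothing here is part of Unicode Inflection;
// every name and rule below is hypothetical.

namespace {

bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

bool isVowel(char c) {
    return std::string("aeiou").find(c) != std::string::npos;
}

// Regular English plural rules. Irregular nouns (child, mouse, ...)
// are not handled; they would need a lexicon lookup.
std::string pluralize(const std::string& word) {
    if (word.size() >= 2 && word.back() == 'y' &&
        !isVowel(word[word.size() - 2])) {
        return word.substr(0, word.size() - 1) + "ies";  // city -> cities
    }
    if (endsWith(word, "s") || endsWith(word, "x") || endsWith(word, "z") ||
        endsWith(word, "ch") || endsWith(word, "sh")) {
        return word + "es";  // box -> boxes
    }
    return word + "s";  // object -> objects
}

}  // namespace

// Applies the requested grammatical constraints to an English lemma.
// Assumed constraint names: "plural" and "genitive".
std::string inflectEnglishNoun(const std::string& lemma,
                               const std::set<std::string>& constraints) {
    const bool plural = constraints.count("plural") > 0;
    std::string result = plural ? pluralize(lemma) : lemma;
    if (constraints.count("genitive") > 0) {
        // Plurals ending in "s" take a bare apostrophe; everything else takes 's.
        result += (plural && endsWith(result, "s")) ? "'" : "'s";
    }
    return result;
}

// Checks the two examples listed in the expected outcomes.
void checkExamples() {
    assert(inflectEnglishNoun("object", {"plural"}) == "objects");
    assert(inflectEnglishNoun("city", {"plural", "genitive"}) == "cities'");
}
```

Suffix rules like these break down quickly for morphologically richer languages such as Finnish or Polish, which is why a new language needs both language knowledge and sufficient lexical data.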
Skills
- Required: Working proficiency in English
- Required: Understanding of a language that is not already supported in Unicode Inflection
- Required: Experience with writing software on Windows, Linux, or macOS
- Required: Experience with C or C++
- Preferred: Experience with CMake
- Preferred: Ability to edit XML
- Preferred: Ability to edit data in Wikidata