diff --git a/examples/cargo/README.md b/examples/cargo/README.md index d7d573e219f..9bf95039ce7 100644 --- a/examples/cargo/README.md +++ b/examples/cargo/README.md @@ -50,7 +50,7 @@ icu = { version = "2.0.0", features = ["serde"] } icu_provider_blob = {version = "2.0.0", features = ["alloc"] } ``` -To learn about building ICU4X data, including whether to check in the data blob file to your repository, see [data-management.md](./data-management.md). +To learn about building ICU4X data, including whether to check in the data blob file to your repository, see [data-slimming.md](../tutorials/data-slimming.md) and [data-packs.md](../tutorials/data-packs.md). [« Fully Working Example »](buffer) diff --git a/provider/baked/src/export.rs b/provider/baked/src/export.rs index 2a9b23c8482..3745aca4d5c 100644 --- a/provider/baked/src/export.rs +++ b/provider/baked/src/export.rs @@ -6,7 +6,7 @@ //! //! This module can be used as a target for the `icu_provider_export` crate. //! -//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers. +//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers. //! //! # Examples //! diff --git a/provider/blob/src/export/mod.rs b/provider/blob/src/export/mod.rs index 7618a7f8f28..29d40ff356e 100644 --- a/provider/blob/src/export/mod.rs +++ b/provider/blob/src/export/mod.rs @@ -6,7 +6,7 @@ //! //! This module can be used as a target for the `icu_provider_export` crate. //! -//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers. +//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers. //! //! # Examples //! 
diff --git a/provider/export/README.md b/provider/export/README.md index 27b6170614f..3bf4303865d 100644 --- a/provider/export/README.md +++ b/provider/export/README.md @@ -6,7 +6,7 @@ For command-line usage, see the [`icu4x-datagen` binary](https://crates.io/crate/icu4x-datagen). -Also see our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md). +See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers. ## Examples diff --git a/provider/export/src/lib.rs b/provider/export/src/lib.rs index e2f8c7ca5d8..408ee137819 100644 --- a/provider/export/src/lib.rs +++ b/provider/export/src/lib.rs @@ -7,7 +7,7 @@ //! //! For command-line usage, see the [`icu4x-datagen` binary](https://crates.io/crate/icu4x-datagen). //! -//! Also see our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md). +//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers. //! //! # Examples //! diff --git a/provider/fs/src/export/mod.rs b/provider/fs/src/export/mod.rs index 4e9788ad0b0..9eca6f3f1cf 100644 --- a/provider/fs/src/export/mod.rs +++ b/provider/fs/src/export/mod.rs @@ -6,7 +6,7 @@ //! //! This module can be used as a target for the `icu_provider_export` crate. //! -//! See our [datagen tutorial](https://github.com/unicode-org/icu4x/blob/main/tutorials/data-management.md) for more information about different data providers. +//! See our [tutorials](https://github.com/unicode-org/icu4x/blob/main/tutorials) for more information about different data providers. //! //! # Examples //! 
diff --git a/tools/md-tests/src/lib.rs b/tools/md-tests/src/lib.rs index 0ae497f89e0..c707d95244f 100644 --- a/tools/md-tests/src/lib.rs +++ b/tools/md-tests/src/lib.rs @@ -10,12 +10,12 @@ mod readme {} mod tutorials { #[doc = include_str!("../../../tutorials/quickstart.md")] mod quickstart_md {} - #[doc = include_str!("../../../tutorials/date-picker.md")] - mod date_picker_md {} + #[doc = include_str!("../../../tutorials/data-packs.md")] + mod data_packs_md {} #[doc = include_str!("../../../tutorials/data-provider-runtime.md")] mod data_provider_runtime_md {} - #[doc = include_str!("../../../tutorials/data-management.md")] - mod data_management_md {} + #[doc = include_str!("../../../tutorials/data-slimming.md")] + mod data_slimming_md {} } mod documents { diff --git a/tutorials/README.md b/tutorials/README.md index c9bf44aa379..e79f00d627c 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -8,7 +8,7 @@ Welcome! We're glad you want to try out ICU4X! This page serves as a landing pag If new to ICU4X, we recommend reading through [the introduction tutorial](quickstart.md): it walks through the process of using ICU4X as a Rust dependency, and some of the basics common to most ICU4X components. -It leads in to the [Data management tutorial](data-management.md), which covers how internationalization data can be generated and loaded into ICU4X. Users needing more control over their flow of locale data can then read [the data provider tutorial](data-provider-runtime.md). +It leads in to the [data slimming](data-slimming.md) and [data packs](data-packs.md) tutorials, which cover how internationalization data can be generated and loaded into ICU4X. Users needing more control over their flow of locale data can then read [the runtime data provider tutorial](data-provider-runtime.md). 
After going through that, you can take a look at [the ICU4X root docs][icu-crate-docs] and check out the various components, each of which covers some area of internationalization and has usage docs for doing so. diff --git a/tutorials/date-picker-data.md b/tutorials/data-packs.md similarity index 73% rename from tutorials/date-picker-data.md rename to tutorials/data-packs.md index 778defc3e7d..8dfd6479742 100644 --- a/tutorials/date-picker-data.md +++ b/tutorials/data-packs.md @@ -1,23 +1,27 @@ -# Interactive Date Picker - Custom Data +# Introduction to ICU4X - Data packs + +If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to include additional locales, do runtime data loading, or build your own complex data pipelines, this tutorial is for you. In this tutorial, we will add additional locale data to your app. ICU4X compiled data contains data for hundreds of languages, but there are languages that have data in CLDR that are not included (generally because they don't have comprehensive coverage). For example, if you try using the locale `ccp` (Chakma) in your app, you will get output like `2023 M11 7`. Believe it or not, but this is not actually correct output for Chakma. Instead ICU4X fell back to the "root locale", which tries to be as neutral as possible. Note how it avoided calling the month by name by using `M11`, even though we requested a format with a non-numeric month name. So, let's add some data for Chakma. -## 1. Installing `icu4x-datagen` +## 1. Prerequisites + +This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code. Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases. 
-Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/). +Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/). -```console +```shell cargo --version # cargo 1.86.0 (adf9b6ad1 2025-02-28) ``` Now you can run -```console +```shell cargo install icu4x-datagen ``` @@ -25,40 +29,41 @@ cargo install icu4x-datagen We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed. -```console +```shell icu4x-datagen --markers all --locales ccp --format blob --out ccp.blob ``` This will generate a `ccp.blob` file containing data for Chakma. +`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data. + 💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob). ## 3. Using the data pack -### Rust Part 3 +
+Rust To use blob data, we will need to add the `icu_provider_blob` crate to our project: -```console +```shell cargo add icu_provider_blob --features alloc ``` We also need to enable the `serde` feature on the `icu` crate to enable deserialization support: -```console +```shell cargo add icu --features serde ``` Now, update the instantiation of the datetime formatter to load data from the blob if the locale is Chakma: -```rust -// At the top of the file: +```rust, ignore use icu::locale::locale; use icu_provider_blob::BlobDataProvider; -// replace the date_formatter creation let date_formatter = if locale == locale!("ccp") { println!("Using buffer provider"); @@ -78,9 +83,10 @@ let date_formatter = if locale == locale!("ccp") { }; ``` -Try using `ccp` now! +
-### JavaScript Part 3 +
+JavaScript Update the formatting logic to load data from the blob if the locale is Chakma. Note that this code uses a callback, as it does an HTTP request: @@ -101,7 +107,7 @@ function load_blob(url, callback) { if (localeStr == "ccp") { load_blob("https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob", (blob) => { let dateTimeFormatter = DateTimeFormatter.createYmdtWithProvider( - DataProvider.createFromBlob(blob), + DataProvider.fromByteSlice(blob), locale, DateTimeLength.Long, ); @@ -116,6 +122,8 @@ if (localeStr == "ccp") { } ``` +
+ Try using `ccp` now! ## 4. Slimming the data pack @@ -124,7 +132,7 @@ Note: the following steps are currently only possible in Rust. 🤷 When we ran `icu4x-datagen`, we passed `--markers all`, which make it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which markers are needed: -```console +```shell cargo build --release icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smaller.blob ``` @@ -135,7 +143,7 @@ This should generate a lot fewer markers! Let's look at the sizes: -```console +```shell wc -c *.blob # 5448603 ccp.blob # 13711 ccp_smaller.blob @@ -149,32 +157,29 @@ The last datagen invocation still produced a lot of markers, as you saw in its o Replace the `DateTimeFormatter::try_new` calls with `FixedCalendarDateTimeFormatter::try_new`, and change the `format` invocation to convert the input to the Gregorian calendar: -```rust +```rust,ignore println!("Date: {}", date_formatter.format(&iso_date.to_calendar(Gregorian))); ``` -The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which now has type `&Date` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter`. +The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which has type `&Date` now. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter`. -Now we can run datagen with `--markers-for-bin` again: +Now we can run datagen with `--markers-for-bin` again and the output should be much shorter: -```console +```shell cargo build --release icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smallest.blob +# ... 
+# 2025-05-14T14:26:52.306Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesMonthGregorianV1 +# 2025-05-14T14:26:52.308Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesYearGregorianV1 +# 2025-05-14T14:26:52.312Z INFO [icu_provider_export::export_impl] Generated marker DatetimePatternsDateGregorianV1 +# 2025-05-14T14:26:52.324Z INFO [icu_provider_export::export_impl] Generated marker DecimalDigitsV1 +# 2025-05-14T14:26:52.325Z INFO [icu_provider_export::export_impl] Generated marker DecimalSymbolsV1 +# ... ``` -The output will be much shorter: - -```console -2025-05-14T14:26:52.306Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesMonthGregorianV1 -2025-05-14T14:26:52.308Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesYearGregorianV1 -2025-05-14T14:26:52.312Z INFO [icu_provider_export::export_impl] Generated marker DatetimePatternsDateGregorianV1 -2025-05-14T14:26:52.324Z INFO [icu_provider_export::export_impl] Generated marker DecimalDigitsV1 -2025-05-14T14:26:52.325Z INFO [icu_provider_export::export_impl] Generated marker DecimalSymbolsV1 -``` - -And the blob will also be much smaller at the sizes: +The blob should also be even smaller: -```console +```shell wc -c *.blob # 5448603 ccp.blob # 13711 ccp_smaller.blob diff --git a/tutorials/data-management.md b/tutorials/data-slimming.md similarity index 71% rename from tutorials/data-management.md rename to tutorials/data-slimming.md index ef59bcb8b65..05df7c21c88 100644 --- a/tutorials/data-management.md +++ b/tutorials/data-slimming.md @@ -1,36 +1,42 @@ -# Data management - -This tutorial introduces data providers as well as the `icu4x-datagen` tool. +# Introduction to ICU4X - Data slimming If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. 
If you want to reduce code size, do runtime data loading, or build your own complex data pipelines, this tutorial is for you. +In this tutorial, we will remove unneeded locale data from our app. ICU4X compiled data contains data for hundreds of languages, but not all locales might be required at runtime. Usually there is a fixed set that a user can choose from, which in our example is going to be Japanese and English (`ja` and `en`). + ## 1. Prerequisites -This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of code for `myapp`. +This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code. -## 2. Generating data +Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases. -Data generation is done using the `icu4x-datagen` tool, which pulls in data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases to generate `ICU4X` data. +Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/). -First we will need to install the binary: +```shell +cargo --version +# cargo 1.86.0 (adf9b6ad1 2025-02-28) +``` -```console +Now you can run + +```shell cargo install icu4x-datagen ``` -Get a coffee, this might take a while ☕. +## 2. 
Generating custom data Once installed, run: -```console -icu4x-datagen --markers all --locales ja --format baked --pretty --out my_data +```shell +icu4x-datagen --markers all --locales ja en --format baked --pretty --out my_data ``` -This will generate a `my_data` directory containing the data for all components in the `ja` locale. +This will generate a `my_data` directory containing the data for all components in the `ja` and `en` locales. `icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data. -### Should you check in data to your repository? +
+Aside: Should you check in data to your repository? You can check in the generated data to your version control system, or you can add it to a build script. There are pros and cons of both approaches. @@ -46,11 +52,15 @@ You should generate it automatically at build time if: If you check in the generated data, it is recommended that you configure a job in continuous integration that verifies that the data in your repository reflects the latest CLDR/Unicode releases; otherwise, your app may drift out of date. +
+ ## 3. Using the generated data +Note: this section is currently only possible in Rust. 🤷 + Once we have generated the data, we need to instruct `ICU4X` to use it. To do this, set the `ICU4X_DATA_DIR` during the compilation of your app: -```console +```shell ICU4X_DATA_DIR=$(pwd)/my_data cargo run ``` @@ -79,46 +89,31 @@ Because of these two data provider types, every `ICU4X` API has three constructo ## 5. Using the generated data explicitly +Note: this section is currently only possible in Rust. 🤷 + The data we generated in section 2 is actually just Rust code defining `DataProvider` implementations for all markers using hardcoded data (go take a look!). So far we've used it through the default `try_new` constructor by using the environment variable to replace the built-in data. However, we can also directly access the `DataProvider` implementations if we want, for example to combine it with other providers. We include the generated code with the `include!` macro. The `impl_data_provider!` macro adds the generated implementations to any type. 
-```rust,compile_fail -extern crate alloc; // required as my-data is written for #[no_std] -use icu::locale::{locale, Locale}; -use icu::calendar::Date; -use icu::datetime::{DateTimeFormatter, fieldsets::YMD}; +Replace your `date_time_formatter` construction with the following code: -const LOCALE: Locale = locale!("ja"); - -struct MyDataProvider; +```rust,compile_fail +extern crate alloc; // required as my_data is written for #[no_std] include!("../my_data/mod.rs"); +struct MyDataProvider; impl_data_provider!(MyDataProvider); -fn main() { - let baked_provider = MyDataProvider; - - let dtf = DateTimeFormatter::try_new_unstable( - &baked_provider, - LOCALE.into(), - YMD::long() - ) - .expect("ja data should be available"); - - let date = Date::try_new_iso(2020, 10, 14) - .expect("date should be valid"); - - let formatted_date = dtf.format(&date); - - println!("📅: {}", formatted_date); -} +// Create and use an ICU4X date formatter: +let date_formatter = DateTimeFormatter::try_new_unstable(MyDataProvider, locale.into(), YMDT::medium()) + .expect("should have data for specified locale"); +println!("📅: {}", date_formatter.format(&iso_date_time)); ``` The `impl_data_provider!` code will require additional crates, see its documentation for a list. -```console +```shell cargo add icu_locale_core cargo add icu_pattern cargo add icu_provider --features baked @@ -136,7 +131,7 @@ To use `BufferProvider`s, the Cargo feature `"serde"` needs to be enabled on `ic Let's update our `Cargo.toml`: -```console +```shell cargo add icu --features serde cargo add icu_provider_blob --features alloc cargo add icu_provider_adapters @@ -144,7 +139,7 @@ cargo add icu_provider_adapters We can generate data for it using the `--format blob` flag: -```console +```shell icu4x-datagen --markers all --locales ja --format blob --out my_data_blob.postcard ``` @@ -152,55 +147,56 @@ This will generate a `my_data_blob.postcard` file containing the serialized data ### Locale Fallbacking +
+Rust + Unlike `BakedDataProvider`, `BlobDataProvider` (and `FsDataProvider`) does not perform locale fallbacking. For example, if `en-US` is requested but only `en` data is available, then the data request will fail. To enable fallback, we can wrap the provider in a `LocaleFallbackProvider`. Note that fallback comes at a cost, as fallbacking code and data has to be included and executed on every request. If you don't need fallback (disclaimer: you probably do), you can use the `BlobDataProvider` directly (for baked data, see [`Options::skip_internal_fallback`](https://docs.rs/icu_provider_baked/latest/icu_provider_baked/export/struct.Options.html)). We can then use the provider in our code: -```rust,no_run -use icu::locale::{locale, Locale, fallback::LocaleFallbacker}; -use icu::calendar::Date; -use icu::datetime::{DateTimeFormatter, fieldsets::YMD}; +```rust,ignore +use icu::locale::fallback::LocaleFallbacker; use icu_provider_adapters::fallback::LocaleFallbackProvider; use icu_provider_blob::BlobDataProvider; -const LOCALE: Locale = locale!("ja"); +let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file"); +let buffer_provider = + BlobDataProvider::try_new_from_blob(blob.into_boxed_slice()) + .expect("blob should be valid"); -fn main() { - let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file"); - let buffer_provider = - BlobDataProvider::try_new_from_blob(blob.into_boxed_slice()) - .expect("blob should be valid"); +let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider) + .expect("Provider should contain fallback rules"); - let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider) - .expect("Provider should contain fallback rules"); +let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker); - let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker); +// Create and use an ICU4X date formatter: +let date_formatter = 
DateTimeFormatter::try_new_with_buffer_provider(&buffer_provider, locale.into(), YMDT::medium()) + .expect("should have data for specified locale"); - let dtf = DateTimeFormatter::try_new_with_buffer_provider( - &buffer_provider, - LOCALE.into(), - YMD::long() - ) - .expect("blob should contain required markers and `ja` data"); +println!("📅: {}", date_formatter.format(&iso_date_time)); +``` - let date = Date::try_new_iso(2020, 10, 14) - .expect("date should be valid"); +As you can see in the second `expect` message, it's not possible to statically tell whether the correct data markers are included. While `BakedDataProvider` would result in a compile error for missing `DataProvider` implementations, `BlobDataProvider` returns runtime errors if markers are missing. - let formatted_date = dtf.format(&date); +
- println!("📅: {}", formatted_date); -} -``` +
+JavaScript + +TODO + +
-As you can see in the second `expect` message, it's not possible to statically tell whether the correct data markers are included. While `BakedDataProvider` would result in a compile error for missing `DataProvider` implementations, `BlobDataProvider` returns runtime errors if markers are missing. ## 7. Data slicing +Note: this section is currently only possible in Rust. 🤷 + You might have noticed that the blob we generated is a hefty 5MB. This is no surprise, as we used `--markers all`. However, our binary only uses date formatting data in Japanese. There's room for optimization: -```console +```shell cargo build --release && icu4x-datagen --markers-for-bin target/release/myapp --locales ja --format blob --out my_data_blob.postcard --overwrite ``` @@ -210,40 +206,15 @@ But there is more to optimize. You might have noticed this in the output of the We can instead use `FixedCalendarDateTimeFormatter`, which only supports formatting `Date`s: -```rust,no_run -use icu::locale::{locale, Locale, fallback::LocaleFallbacker}; -use icu::calendar::{Date, Gregorian}; -use icu::datetime::{FixedCalendarDateTimeFormatter, fieldsets::YMD}; -use icu_provider_adapters::fallback::LocaleFallbackProvider; -use icu_provider_blob::BlobDataProvider; - -const LOCALE: Locale = locale!("ja"); - -fn main() { - let blob = std::fs::read("my_data_blob.postcard").expect("Failed to read file"); - let buffer_provider = - BlobDataProvider::try_new_from_blob(blob.into_boxed_slice()) - .expect("blob should be valid"); - - let fallbacker = LocaleFallbacker::try_new_with_buffer_provider(&buffer_provider) - .expect("Provider should contain fallback rules"); - - let buffer_provider = LocaleFallbackProvider::new(buffer_provider, fallbacker); - - let dtf = FixedCalendarDateTimeFormatter::::try_new_with_buffer_provider( - &buffer_provider, - LOCALE.into(), - YMD::long(), - ) - .expect("blob should contain required data"); - - let date = Date::try_new_gregorian(2020, 10, 14) - .expect("date should be 
valid"); +```rust,ignore +use icu::datetime::FixedCalendarDateTimeFormatter; +use icu::calendar::cal::Gregorian; - let formatted_date = dtf.format(&date); +// Create and use an ICU4X date formatter: +let date_formatter = FixedCalendarDateTimeFormatter::try_new(locale.into(), YMDT::medium()) + .expect("should have data for specified locale"); - println!("📅: {}", formatted_date); -} +println!("📅: {}", date_formatter.format(&iso_date_time.to_calendar(Gregorian))); ``` This has two advantages: it reduces our code size, as `DateTimeFormatter` might include code for calendar conversions (such as from ISO to Gregorian in this case), and it reduces our data size, as `--markers-for-bin` can now determine that we need even fewer markers. The data size improvement could have also been achieved by manually listing the data markers we think we'll need (using the `--markers` flag), but we risk a runtime error if we're wrong. @@ -260,6 +231,6 @@ These API-level optimizations also apply to compiled data (there's no need to us We have learned how to generate data and load it into our programs, optimize data size, and gotten to know the different data providers that are part of `ICU4X`. -For a deeper dive into configuring your data providers in code, see [data-provider-runtime.md]. +For a deeper dive into configuring your data providers in code, see [the runtime data provider tutorial](data-provider-runtime.md). You can learn more about datagen, including the Rust API which we have not used in this tutorial, by reading [the docs](https://docs.rs/icu_provider_export/latest/). diff --git a/tutorials/date-picker.md b/tutorials/date-picker.md deleted file mode 100644 index be7662db791..00000000000 --- a/tutorials/date-picker.md +++ /dev/null @@ -1,233 +0,0 @@ -# Interactive Date Picker - -In this tutorial, you will learn how to build an end-to-end application using ICU4X to format a date and time with some default locales and additional locales loaded dynamically. 
- -This tutorial is written in parallel between **Rust** and **JavaScript** in a web browser. - -## 1. Installing ICU4X - -Installing dependencies is always your first step. - -### Rust Part 1 - -Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/). - -```console -cargo --version -# cargo 1.86.0 (adf9b6ad1 2025-02-28) -``` - -Create a new Rust binary crate with icu4x as a dependency: - -```console -cargo new --bin tutorial -cd tutorial -cargo add icu -``` - -### JavaScript Part 1 - -We recommend using [CodePen](https://codepen.io/pen/?editors=1011) to follow along. To load ICU4X into CodePen, you can use this snippet in the JavaScript editor: - -```javascript -import { Locale, DateFormatter, IsoDate, DateTimeLength } from "https://unpkg.com/icu@2.0.0"; -``` - -This loads the full ICU4X WebAssembly file. Since it may take a few seconds to load on slow connections, we'll create a loading div. Add this to your HTML: - -```html -
Loading…
- - -``` - -And in JavaScript, add these lines after the import statement: - -```javascript -document.getElementById("loading").style.display = "none"; -document.getElementById("inputoutput").style.display = "block"; -``` - -## 2. Parsing an Input Locale - -Here, we will accept a locale string from the user and parse it into an ICU4X Locale. - -### Rust Part 2 - -First, we will use Rust APIs to accept a string from user input on the command line. Then we can parse the input string as an ICU4X `Locale`. Add the following to your `fn main()`: - -```rust,no_run -// At the top of the file: -use icu::locale::Locale; - -// In the main() function: -print!("Enter your locale: "); -std::io::Write::flush(&mut std::io::stdout()).unwrap(); -let locale_str = { - let mut buf = String::new(); - std::io::stdin().read_line(&mut buf).unwrap(); - buf -}; - -// Since the string contains whitespace, we must call `.trim()`: -let locale = match locale_str.trim().parse::() { - Ok(locale) => { - println!("You entered: {locale}"); - locale - } - Err(e) => { - panic!("Error parsing locale! {e}"); - } -}; -``` - -Try inputting locales in non-canonical syntax and see them normalized! - -```bash -cargo run -Enter your locale: DE-CH -You entered: de-CH -``` - -### JavaScript Part 2 - -In the HTML, create an input element for accepting a locale string input, and an output element to echo it back to the user. Add this inside of the `inputoutput` div: - -```html - -

-

Output:

-``` - -And in JavaScript: - -```javascript -// Create a function that updates the UI: -function update() { - try { - let localeStr = document.getElementById("localeinput").value; - - let locale = Locale.fromString(localeStr); - let output = locale.toString(); - - document.getElementById("output").innerText = output; - } catch(e) { - document.getElementById("output").innerText = e; - } -} - -// Run the function whenever the locale input changes: -document.getElementById("localeinput").addEventListener("keyup", update, false); - -// Also run the function right now to initialize the UI: -update(); -``` - -Try inputting locales in non-canonical syntax and see them normalized! - -> Locale: ES-419 -> Output: es-419 - -## 3. Formatting a Date - -Now we will use built-in locale data to produce a formatted date. - -### Rust Part 3 - -We would like to format today's date. We will get this from the `time` crate, which you need to add: - -```console -cargo add time --features local-offset -``` - -Now we can write the Rust code: - -```rust -// At the top of the file: -use icu::datetime::{DateTimeFormatter, fieldsets::YMD, input::Date}; - -let locale = icu::locale::Locale::UNKNOWN; // to make this example compile - -// Put the following in the main() function: -let iso_date = { - let current_offset_date_time = time::OffsetDateTime::now_local().unwrap(); - Date::try_new_iso( - current_offset_date_time.year(), - current_offset_date_time.month() as u8, - current_offset_date_time.day(), - ) - .unwrap() -}; - -// Create and use an ICU4X date formatter: -let date_formatter = DateTimeFormatter::try_new( - locale.into(), - YMD::medium(), -) -.expect("should have data for specified locale"); -println!( - "Date: {}", - date_formatter.format(&iso_date) -); -``` - -Try this in several locales, like `en` (English), `en-GB` (British English), and `th` (Thai). Observe how differently dates are represented in locales around the world! 
You can explicitly specify arbitrary calendar systems using the `u-ca` Unicode extension keyword in the locale. Try `en-u-ca-hebrew`! - -### JavaScript Part 3 - -In JavaScript, we will create a datetime input field. - -Add this to the HTML: - -```html - -

-``` - -And this to JavaScript: - -```javascript - -// Run the function whenever the date input changes: -document.getElementById("dateinput").addEventListener("input", update, false); - -// Put the following in the update() function, inside the try block: -let dateStr = document.getElementById("dateinput").value; - -let dateObj = dateStr ? new Date(dateStr) : new Date(); -let isoDate = new IsoDate(dateObj.getFullYear(), dateObj.getMonth() + 1, dateObj.getDate()); -let dateFormatter = DateFormatter.createYmd(locale, DateTimeLength.Long); -let output = dateFormatter.formatIso(isoDate); - -document.getElementById("output").innerText = output; -``` - -Try this in several locales, like `en` (English), `en-GB` (British English), and `th` (Thai). Observe how differently dates are represented in locales around the world! You can explicitly specify arbitrary calendar systems using the `u-ca` Unicode extension keyword in the locale. Try `en-u-ca-hebrew`! - -## 4. Formatting date and time - -Now we would also like to format the current time. - -### Rust Part 4 - -Use the API documentation for [`icu::time::DateTime`](https://docs.rs/icu/latest/icu/time/struct.DateTime.html) and [`icu::datetime::fieldsets`](https://docs.rs/icu/latest/icu/datetime/fieldsets/index.html) to expand your app to format both date and time. - -### JavaScript Part 4 - -Use the API documentation for [`Time`](https://icu4x.unicode.org/2_0/tsdoc/classes/Time.html) and [`DateTimeFormatter`](https://icu4x.unicode.org/2_0/tsdoc/classes/DateTimeFormatter.html) to expand your app to format both a date and a time. - -Hint: You can create an HTML time picker with - -```html -

-``` - -Hint: You can create a `Date` from `dateStr` and `timeStr` with - -```javascript -let dateObj = dateStr && timeStr ? new Date(dateStr + " " + timeStr) : new Date(); -``` - -Note that Dates constructed this way will be in UTC. \ No newline at end of file diff --git a/tutorials/quickstart.md b/tutorials/quickstart.md index f4220af77f9..4663fc48f15 100644 --- a/tutorials/quickstart.md +++ b/tutorials/quickstart.md @@ -1,6 +1,6 @@ -# Introduction to ICU4X for Rust +# Introduction to ICU4X -`ICU4X` is an implementation of [Internationalization Components of Unicode](http://site.icu-project.org/) (ICU) intended to be modular, performant and flexible. +`ICU4X` is an implementation of [Internationalization Components of Unicode](https://icu.unicode.org/) (ICU) intended to be modular, performant and flexible. The library provides a layer of APIs for all software to enable internationalization capabilities. @@ -8,8 +8,12 @@ To use `ICU4X` in the Rust ecosystem one can either add dependencies on selected In this tutorial we are going to build up to writing an app that uses the `icu::datetime` component to format a date and time, covering various topics in the process. +This tutorial is written in parallel between **Rust** and **JavaScript** in a web browser. + ## 1. Requirements +
+Rust For this tutorial we assume the user has basic Rust knowledge. If acquiring it is necessary, the [Rust Book](https://doc.rust-lang.org/book/) provides an excellent introduction. We also assume that the user is familiar with a terminal and has `rust` and `cargo` installed. @@ -19,14 +23,24 @@ To verify that, open a terminal and check that the results are similar to: cargo --version # cargo 1.86.0 (adf9b6ad1 2025-02-28) ``` +
+ +
+JavaScript + +For this tutorial we assume the user has basic JavaScript knowledge. We recommend using [CodePen](https://codepen.io/pen/?editors=1011) to follow along. +
## 2. Creating an app with ICU4X as a dependency +
+Rust + Use `cargo` to initialize a binary application: ```console -cargo new --bin myapp -cd myapp +cargo new --bin tutorial +cd tutorial ``` Then add a dependency on `ICU4X`'s main crate, `icu`: @@ -35,113 +49,269 @@ Then add a dependency on `ICU4X`'s main crate, `icu`: cargo add icu ``` +Run your application with `cargo run`: + +```shell +cargo run +# Hello, world! +``` + +*Notice:* By default, `cargo run` builds and runs a `debug` mode of the binary. If you want to evaluate performance, memory or binary size, use `cargo run --release`. + +
+ +
+JavaScript + +To load ICU4X into CodePen, you can use this snippet in the JavaScript editor: + +```javascript +import { Locale, DateFormatter, IsoDate, DateTimeLength } from "https://unpkg.com/icu@2.0.0"; +``` + +This loads the full development ICU4X WebAssembly file. Since it may take some time to load on slow connections, we'll create a loading div. In future tutorials you will learn how to build an optimized WebAssembly file, reducing the size of the WASM file by 99% or more. Add this to your HTML: + +```html +
Loading…
+ + +``` + +And in JavaScript, add these lines after the import statement: + +```javascript +document.getElementById("loading").style.display = "none"; +document.getElementById("inputoutput").style.display = "block"; +``` +
+ ## 3. Locales `ICU4X` comes with a variety of components for managing various facets of software internationalization. Most of those features depend on the selection of a `Locale`, which is a particular combination of language, script, and region, with optional variants. Examples of such locales are `en-US` (American English), `sr-Cyrl` (Serbian with Cyrillic script) or `ar-EG-u-nu-latn` (Egyptian Arabic with ASCII numerals). -In `ICU4X` `Locale` is a part of the `locale_core` component. If the user needs just this one feature, they can use `icu_locale_core` crate as a dependency, but since here we already added a dependency on `icu`, we can refer to it via `icu::locale`. +In `ICU4X`, `Locale` is a part of the `locale` component[^1]. Let's use this in our application. + +[^1]: If the user needs just this one feature, they can use the `icu_locale_core` crate as a dependency, but since here we already added a dependency on `icu`, we can refer to it via `icu::locale`. -Let's use this in our application. +
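To make the parts of a locale concrete, here is a minimal sketch that takes the example locales above apart. It assumes the browser's built-in `Intl.Locale` as a stand-in, not the ICU4X `Locale` used in the rest of this section, and `localeParts` is a hypothetical helper name used only for illustration.

```javascript
// Assumption: the built-in Intl.Locale is used here only to illustrate
// locale structure; it is not the ICU4X Locale type.
function localeParts(tag) {
  const locale = new Intl.Locale(tag);
  return {
    canonical: locale.toString(),            // canonical casing of the tag
    language: locale.language,               // e.g. "sr"
    script: locale.script,                   // e.g. "Cyrl", or undefined if absent
    region: locale.region,                   // e.g. "EG", or undefined if absent
    numberingSystem: locale.numberingSystem, // from the `u-nu` extension keyword
  };
}

// The example locales from the paragraph above, plus one in non-canonical casing:
for (const tag of ["en-US", "sr-Cyrl", "ar-EG-u-nu-latn", "ES-419"]) {
  console.log(tag, localeParts(tag));
}
```

Fields that the tag does not mention, such as the script of `en-US`, come back as `undefined` rather than being guessed.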
+Rust -Open `src/main.rs` and edit it to: +Open `src/main.rs` and add the following code inside `fn main`: -```rust +```rust,no_run use icu::locale::Locale; -fn main() { - let loc: Locale = "ES-AR".parse() - .expect("should be a valid locale"); +// Pass a locale string on the command line +let locale_str = std::env::args().nth(1).unwrap(); - if loc.id.language.as_str() == "es" { - println!("¡Hola!"); - } +// Since the string contains whitespace, we must call `.trim()`: +let locale = locale_str.trim().parse::<Locale>().unwrap(); - println!("You are using: {}", loc); +if locale.id.language.as_str() == "es" { + println!("¡Hola!"); } + +println!("Your locale: {locale}"); ``` After saving it, call `cargo run` and it should display: -```text -¡Hola! -You are using: es-AR +```shell +cargo run -- DE-CH +# Your locale: de-CH + +cargo run -- es-419 +# ¡Hola! +# Your locale: es-419 ``` -*Notice:* Here, `ICU4X` canonicalized the locales's syntax which uses lowercase letters for the language portion. +### Convenience macro -Congratulations! `ICU4X` has been used to semantically operate on a locale! +The scenario of working with statically declared `Locale`s (and subtags) is common. It's a bit unergonomic to have to parse them at runtime and handle a parser error (or to degrade to string operations), so ICU4X provides macros one can use to parse at compilation time: -### Convenience macro +```rust,ignore +use icu::locale::subtags::language; -The scenario of working with statically declared `Locale`s is common. +if locale.id.language == language!("es") { + println!("¡Hola!"); +} +``` -It's a bit unergonomic to have to parse them at runtime and handle a parser error in such case. +Try using a malformed string, like "spanish" and call `cargo check`. +
-For that purpose, ICU4X provides a macro one can use to parse it at compilation time: +
+JavaScript -```rust -use icu::locale::{Locale, locale, subtags::language}; +In the HTML, create an input element for accepting a locale string input, and an output element to echo it back to the user. Add this inside of the `inputoutput` div: -const LOCALE: Locale = locale!("ES-AR"); +```html + +

+

Output:

+``` -fn main() { - if LOCALE.id.language == language!("es") { - println!("¡Hola!"); - } +And in JavaScript: - println!("You are using: {}", LOCALE); +```javascript +// Create a function that updates the UI: +function update() { + try { + let localeStr = document.getElementById("localeinput").value; + + let locale = Locale.fromString(localeStr); + if (locale.language == "es") { + console.log("¡Hola!"); + } + let output = locale.toString(); + + document.getElementById("output").innerText = output; + } catch(e) { + document.getElementById("output").innerText = e; + } } + +// Run the function whenever the locale input changes: +document.getElementById("localeinput").addEventListener("keyup", update, false); + +// Also run the function right now to initialize the UI: +update(); ``` -In this case, the parsing is performed at compilation time, so we don't need to handle an error case. Try passing an malformed identifier, like "foo-bar" and call `cargo check`. +Try inputting locales in non-canonical syntax and see them normalized! + +> Locale: ES-419 +> +> Output: es-419 -Next, let's add some more complex functionality. +
+ +*Notice:* Here, `ICU4X` canonicalized the locale's syntax, which uses lowercase letters for the language portion. + +Congratulations! `ICU4X` has been used to semantically operate on a locale! ## 4. Using an ICU4X component -We're going to extend our app to use the `icu::datetime` component to format a date and time. This component requires data; we will look at custom data generation later and for now use the default included data, -which is exposed through constructors such as `try_new`. +We're going to extend our app to use the `icu::datetime` component to format a date. This component requires data; we will look at custom data generation later and for now use the default included data, +which is exposed through standard constructors (`try_new` in Rust, `create` in JavaScript). + +
+Rust -```rust -use icu::locale::{Locale, locale}; -use icu::calendar::Date; -use icu::datetime::{DateTimeFormatter, fieldsets::YMD}; +We will get the current date from the `time` crate, which you need to add + +```console +cargo add time --features local-offset +``` -const LOCALE: Locale = locale!("ja"); // let's try some other language +Now add the following code to `fn main`: -fn main() { +```rust,ignore +use icu::datetime::{DateTimeFormatter, fieldsets::YMD, input::Date}; - let dtf = DateTimeFormatter::try_new( - LOCALE.into(), - YMD::long(), +let iso_date = { + let current_offset_date_time = time::OffsetDateTime::now_local().unwrap(); + Date::try_new_iso( + current_offset_date_time.year(), + current_offset_date_time.month() as u8, + current_offset_date_time.day(), ) - .expect("ja data should be available"); + .unwrap() +}; + +// Create and use an ICU4X date formatter: +let date_formatter = DateTimeFormatter::try_new( + locale.into(), + YMD::medium(), +) +.expect("should have data for specified locale"); +println!( + "📅: {}", + date_formatter.format(&iso_date) +); +``` - let date = Date::try_new_iso(2020, 10, 14) - .expect("date should be valid"); +If all went well, running the app with `cargo run` should display: - let formatted_date = dtf.format(&date); +```console +cargo run -- de-CH +Your locale: de-CH +📅: 15.05.2025 +``` +
- println!("📅: {}", formatted_date); -} +
+JavaScript + +In JavaScript, we will create a datetime input field. + +Add this to the HTML: + +```html + +

``` -If all went well, running the app with `cargo run` should display: +And this to JavaScript: + +```javascript + +// Run the function whenever the date input changes: +document.getElementById("dateinput").addEventListener("input", update, false); + +// Put the following in the update() function, inside the try block: +let dateStr = document.getElementById("dateinput").value; + +let dateObj = dateStr ? new Date(dateStr) : new Date(); +let isoDate = new IsoDate(dateObj.getFullYear(), dateObj.getMonth() + 1, dateObj.getDate()); +let dateFormatter = DateFormatter.createYmd(locale, DateTimeLength.Long); +let output = dateFormatter.formatIso(isoDate); + +document.getElementById("output").innerText = output; +``` +
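One thing to watch for when turning the date input into numbers: per the ECMAScript specification, a date-only string such as `2025-05-15` is parsed as UTC midnight, so the local accessors (`getDate()` and friends) can report the previous day in time zones west of UTC. Below is a minimal sketch of two ways around that, assuming the input value is a `YYYY-MM-DD` string; `parseDateInput` is a hypothetical helper, not an ICU4X API.

```javascript
// Hypothetical helper (not part of ICU4X): split "YYYY-MM-DD" into numeric
// fields without going through a Date object at all.
function parseDateInput(dateStr) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(dateStr);
  if (!match) {
    throw new RangeError(`not a YYYY-MM-DD date: ${dateStr}`);
  }
  const [, year, month, day] = match;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

// Option 1: read the fields straight from the string.
const fields = parseDateInput("2025-05-15");
console.log(fields.year, fields.month, fields.day);

// Option 2: keep the Date object, but use the UTC accessors, which match
// the UTC midnight that a date-only string is parsed to.
const parsed = new Date("2025-05-15");
console.log(parsed.getUTCFullYear(), parsed.getUTCMonth() + 1, parsed.getUTCDate());
```

Either set of year, month, and day values could then be passed to the `IsoDate` constructor in place of the values read with the local accessors.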
+ +Try this in several locales, like `en` (English), `en-GB` (British English), and `th` (Thai). Observe how differently dates are represented in locales around the world! You can explicitly specify arbitrary calendar systems using the `u-ca` Unicode extension keyword in the locale. Try `en-u-ca-hebrew`! + +## 5. Formatting date and time -```text -📅: 2020年10月14日 +Now we would also like to format the current time. + +
+Rust + +Use the API documentation for [`icu::time::DateTime`](https://docs.rs/icu/latest/icu/time/struct.DateTime.html) and [`icu::datetime::fieldsets`](https://docs.rs/icu/latest/icu/datetime/fieldsets/index.html) to expand your app to format both date and time. + +
+ +
+JavaScript + +Use the API documentation for [`Time`](https://icu4x.unicode.org/2_0/tsdoc/classes/Time.html) and [`DateTimeFormatter`](https://icu4x.unicode.org/2_0/tsdoc/classes/DateTimeFormatter.html) to expand your app to format both a date and a time. + +Hint: You can create an HTML time picker with + +```html +

``` -Here's an internationalized date! +Hint: You can create a `Date` from `dateStr` and `timeStr` with + +```javascript +let dateObj = dateStr && timeStr ? new Date(dateStr + " " + timeStr) : new Date(); +``` -*Notice:* By default, `cargo run` builds and runs a `debug` mode of the binary. If you want to evaluate performance, memory or size of this example, use `cargo run --release`. +Note that `Date` parses a date-only string such as `2025-05-15` as UTC, while a string that also contains a time is generally interpreted in the local time zone. +
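Here is a hedged sketch of combining the two picker values. The ECMAScript date-time format uses a `T` separator, and a date-time string without a UTC offset is interpreted as local time, so the local accessors return what was typed in. A space separator is not part of that format, and its handling is left to the engine. `combineDateAndTime` is a hypothetical helper name, and the `YYYY-MM-DD` and `HH:mm` input shapes are assumptions.

```javascript
// Hypothetical helper: join a "YYYY-MM-DD" date and an "HH:mm" time into one
// Date, using the standard "T" separator instead of a space.
function combineDateAndTime(dateStr, timeStr) {
  const combined = new Date(`${dateStr}T${timeStr}`);
  if (Number.isNaN(combined.getTime())) {
    throw new RangeError(`could not parse ${dateStr}T${timeStr}`);
  }
  return combined;
}

const dateObj = combineDateAndTime("2025-05-15", "16:09");

// Without an offset the string is read as local time, so the local
// accessors give back the values from the two inputs.
console.log(
  dateObj.getFullYear(),
  dateObj.getMonth() + 1,
  dateObj.getDate(),
  dateObj.getHours(),
  dateObj.getMinutes()
);
```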
-## 5. Data Management +## 6. Data Management -While the locale API is purely algorithmic, many internationalization APIs like the date formatting API require more complex data to work. You've seen this in the previous example where we had to call `.expect("ja data should be available")` after the constructor. +While the locale API is purely algorithmic, many internationalization APIs like the date formatting API require more complex data to work. You've seen this in the previous example where we had to call `.expect("should have data for specified locale")` after the constructor. Data management is a complex and non-trivial area which often requires customizations for particular environments and integrations into a project's ecosystem. @@ -149,17 +319,17 @@ The way `ICU4X` handles data is one of its novelties, aimed at making the data m `ICU4X` by default contains data for a wide range of CLDR locales[^2], meaning that for most languages, the constructors can be considered infallible and you can `expect` or `unwrap` them, as we did above. -However, shipping the library with all locales will have a size impact on your binary. It also requires you to update your binary whenever CLDR data changes, which happens twice a year. To learn how to solve these problems, see our [data management](data-management.md) tutorial. +However, shipping the library with all locales will have a size impact on your binary. It also requires you to update your binary whenever CLDR data changes, which happens twice a year. To learn how to solve these problems, see our [data packs](data-packs.md) and [data slimming](data-slimming.md) tutorials. [^2]: All locales with coverage level `basic`, `moderate`, or `modern` in [`CLDR`](https://github.com/unicode-org/cldr-json/blob/main/cldr-json/cldr-core/coverageLevels.json) -## 6. Summary +## 7. Summary This concludes this introduction tutorial. 
With the help of `Locale` and `DateTimeFormatter` we formatted a date for a locale chosen at runtime, but that's just the start. Internationalization is a broad domain and there are many more components in `ICU4X`. -Next, learn how to [generate optimized data for your binary](data-management.md), [configure your Cargo.toml file](../examples/cargo), or continue exploring by reading [the docs](https://docs.rs/icu/latest/). +Next, learn how to [generate optimized data for your binary](data-slimming.md), [create language packs](data-packs.md), [configure your Cargo.toml file](../examples/cargo), or continue exploring by reading [the docs](https://docs.rs/icu/latest/).