Commit 23cb5d7

committed
aSD
1 parent ebdead3 commit 23cb5d7

5 files changed

+332
-417
lines changed

tools/md-tests/src/lib.rs

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -10,12 +10,12 @@ mod readme {}
1010
mod tutorials {
1111
#[doc = include_str!("../../../tutorials/quickstart.md")]
1212
mod quickstart_md {}
13-
#[doc = include_str!("../../../tutorials/date-picker.md")]
14-
mod date_picker_md {}
13+
#[doc = include_str!("../../../tutorials/data-packs.md")]
14+
mod data_packs_md {}
1515
#[doc = include_str!("../../../tutorials/data-provider-runtime.md")]
1616
mod data_provider_runtime_md {}
17-
#[doc = include_str!("../../../tutorials/data-management.md")]
18-
mod data_management_md {}
17+
#[doc = include_str!("../../../tutorials/data-slimming.md")]
18+
mod data_slimming_md {}
1919
}
2020

2121
mod documents {

tutorials/date-picker-data.md renamed to tutorials/data-packs.md

Lines changed: 28 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -1,64 +1,69 @@
1-
# Interactive Date Picker - Custom Data
1+
# Introduction to ICU4X - Data packs
2+
3+
If you're happy shipping your app with the recommended set of locales included in `ICU4X`, you can stop reading now. If you want to include additional locales, do runtime data loading, or build your own complex data pipelines, this tutorial is for you.
24

35
In this tutorial, we will add additional locale data to your app. ICU4X compiled data contains data for hundreds of languages, but some languages with data in CLDR are not included (generally because they don't have comprehensive coverage). For example, if you try using the locale `ccp` (Chakma) in your app, you will get output like `2023 M11 7`. Believe it or not, this is not actually correct output for Chakma. Instead, ICU4X fell back to the "root locale", which tries to be as neutral as possible. Note how it avoided calling the month by name by using `M11`, even though we requested a format with a non-numeric month name.
46

57
So, let's add some data for Chakma.
68

7-
## 1. Installing `icu4x-datagen`
9+
## 1. Prerequisites
10+
11+
This tutorial assumes you have finished the [introductory tutorial](quickstart.md) and continues where that tutorial left off. In particular, you should still have the latest version of your code.
812

913
Data generation is done using the `icu4x-datagen` tool, which pulls data from [Unicode's *Common Locale Data Repository* (*CLDR*)](http://cldr.unicode.org/index/downloads) and from `ICU4C` releases.
1014

11-
Verify that Rust is installed. If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
15+
Verify that Rust is installed (even if you're following the JavaScript tutorial). If it's not, you can install it in a few seconds from [https://rustup.rs/](https://rustup.rs/).
1216

13-
```console
17+
```shell
1418
cargo --version
1519
# cargo 1.86.0 (adf9b6ad1 2025-02-28)
1620
```
1721

1822
Now you can run
1923

20-
```console
24+
```shell
2125
cargo install icu4x-datagen
2226
```
2327

2428
## 2. Generating the data pack
2529

2630
We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed.
2731

28-
```console
32+
```shell
2933
icu4x-datagen --markers all --locales ccp --format blob --out ccp.blob
3034
```
3135

3236
This will generate a `ccp.blob` file containing data for Chakma.
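
If you want to confirm from Rust that the file was written before wiring it into the app, a minimal standard-library sketch could look like the following. The helper name `blob_size` is hypothetical, and the path assumes `ccp.blob` sits in the current working directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the size in bytes of the file at `path`,
/// or an I/O error if the file is missing or its metadata cannot be read.
fn blob_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // Assumes the datagen command above was run in the current directory.
    match blob_size(Path::new("ccp.blob")) {
        Ok(len) => println!("ccp.blob: {len} bytes"),
        Err(err) => eprintln!("could not inspect ccp.blob: {err}"),
    }
}
```

This only checks that the file exists and reports its length; it does not validate the blob's contents.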
3337

38+
`icu4x-datagen` has many options, some of which we'll discover below. The default options should work for most purposes, but check out `icu4x-datagen --help` to learn more about fine-tuning your data.
39+
3440
💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
3541

3642

3743
## 3. Using the data pack
3844

39-
### Rust Part 3
45+
<details>
46+
<summary>Rust</summary>
4047

4148
To use blob data, we will need to add the `icu_provider_blob` crate to our project:
4249

43-
```console
50+
```shell
4451
cargo add icu_provider_blob --features alloc
4552
```
4653

4754
We also need to enable the `serde` feature on the `icu` crate to enable deserialization support:
4855

49-
```console
56+
```shell
5057
cargo add icu --features serde
5158
```
5259

5360
Now, update the instantiation of the datetime formatter to load data from the blob if the
5461
locale is Chakma:
5562

56-
```rust
57-
// At the top of the file:
63+
```rust, ignore
5864
use icu::locale::locale;
5965
use icu_provider_blob::BlobDataProvider;
6066
61-
// replace the date_formatter creation
6267
let date_formatter = if locale == locale!("ccp") {
6368
println!("Using buffer provider");
6469
@@ -78,9 +83,10 @@ let date_formatter = if locale == locale!("ccp") {
7883
};
7984
```
8085

81-
Try using `ccp` now!
86+
</details>
8287

83-
### JavaScript Part 3
88+
<details>
89+
<summary>JavaScript</summary>
8490

8591
Update the formatting logic to load data from the blob if the locale is Chakma. Note that this code uses a callback, as it does an HTTP request:
8692

@@ -116,6 +122,8 @@ if (localeStr == "ccp") {
116122
}
117123
```
118124

125+
</details>
126+
119127
Try using `ccp` now!
120128

121129
## 4. Slimming the data pack
@@ -124,7 +132,7 @@ Note: the following steps are currently only possible in Rust. 🤷
124132

125133
When we ran `icu4x-datagen`, we passed `--markers all`, which makes it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which markers are needed:
126134

127-
```console
135+
```shell
128136
cargo build --release
129137
icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smaller.blob
130138
```
@@ -135,7 +143,7 @@ This should generate a lot fewer markers!
135143

136144
Let's look at the sizes:
137145

138-
```console
146+
```shell
139147
wc -c *.blob
140148
# 5448603 ccp.blob
141149
# 13711 ccp_smaller.blob
@@ -149,22 +157,22 @@ The last datagen invocation still produced a lot of markers, as you saw in its o
149157

150158
Replace the `DateTimeFormatter::try_new` calls with `FixedCalendarDateTimeFormatter::try_new`, and change the `format` invocation to convert the input to the Gregorian calendar:
151159

152-
```rust
160+
```rust,ignore
153161
println!("Date: {}", date_formatter.format(&iso_date.to_calendar(Gregorian)));
154162
```
155163

156164
The generic type of `FixedCalendarDateTimeFormatter` will be inferred from the input, which now has type `&Date<Gregorian>`. Unlike `DateTimeFormatter`, `FixedCalendarDateTimeFormatter` never applies calendar conversions on its input, so it will be a `FixedCalendarDateTimeFormatter<Gregorian, ...>`.
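
To illustrate how a formatter's calendar parameter can be inferred from the value passed to `format`, here is a toy model using only the standard library. The types `Date`, `Gregorian`, and `FixedFormatter` below are hypothetical stand-ins written for this sketch. They are not the ICU4X types and do not use ICU4X data.

```rust
use std::marker::PhantomData;

// Hypothetical calendar marker; it is only ever used as a type parameter.
#[allow(dead_code)]
struct Gregorian;

// Hypothetical date type tagged with its calendar at compile time.
struct Date<C> {
    year: i32,
    month: u8,
    day: u8,
    _calendar: PhantomData<C>,
}

impl<C> Date<C> {
    fn new(year: i32, month: u8, day: u8) -> Self {
        Date { year, month, day, _calendar: PhantomData }
    }
}

// Hypothetical formatter that accepts dates of exactly one calendar type.
struct FixedFormatter<C> {
    _calendar: PhantomData<C>,
}

impl<C> FixedFormatter<C> {
    fn new() -> Self {
        FixedFormatter { _calendar: PhantomData }
    }

    // No conversion happens here: the input must already be a `Date<C>`.
    fn format(&self, date: &Date<C>) -> String {
        format!("{:04}-{:02}-{:02}", date.year, date.month, date.day)
    }
}

fn main() {
    // No annotation: the calendar parameter is still undetermined here.
    let formatter = FixedFormatter::new();
    let date: Date<Gregorian> = Date::new(2023, 11, 7);
    // Passing a `&Date<Gregorian>` fixes the parameter to `Gregorian`.
    println!("{}", formatter.format(&date));
}
```

In this model, passing a date tagged with a different calendar to the same `formatter` would be rejected at compile time, which mirrors the "never applies calendar conversions" behaviour described above.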
157165

158166
Now we can run datagen with `--markers-for-bin` again:
159167

160-
```console
168+
```shell
161169
cargo build --release
162170
icu4x-datagen --markers-for-bin target/release/tutorial --locales ccp --format blob --out ccp_smallest.blob
163171
```
164172

165173
The output will be much shorter:
166174

167-
```console
175+
```shell
168176
2025-05-14T14:26:52.306Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesMonthGregorianV1
169177
2025-05-14T14:26:52.308Z INFO [icu_provider_export::export_impl] Generated marker DatetimeNamesYearGregorianV1
170178
2025-05-14T14:26:52.312Z INFO [icu_provider_export::export_impl] Generated marker DatetimePatternsDateGregorianV1
@@ -174,7 +182,7 @@ The output will be much shorter:
174182

175183
And the blob will also be much smaller:
176184

177-
```console
185+
```shell
178186
wc -c *.blob
179187
# 5448603 ccp.blob
180188
# 13711 ccp_smaller.blob
