Description
Is your feature request related to a problem? Please describe.
The i18n WG is developing recommendations for referring to one or more characters in markup (see https://w3c.github.io/bp-i18n-specdev/#char_ref_template).
The most basic template for the expanded markup is:
<span class="codepoint" translate="no"><bdi lang="xx">&#xXXXX;</bdi><span class="uname">U+XXXX UNICODE_CHARACTER_NAME</span></span>
This is not complicated, but it's a bit lengthy and fiddly for authors to type in full, especially if a sequence of characters is involved. We'd therefore like to propose a macro that can be used in ReSpec documents to automatically create the full markup from a more concise base.
Describe the solution you'd like
We propose the following expansions, where
- the textContent can be a code point value, eg. 00E9, or a sequence of space-separated values, eg. 0928 093F;
- the textContent can be a character, eg. é, or a sequence of characters, eg. नि
- the lang attribute is strongly recommended, and has a BCP47 language code as its value - there is no limit on the number of values provided
- hex and character values can't be mixed – the former can be requested using class="hx", and the latter using class="ch"
- the character name(s) are automatically inserted by ReSpec
Examples:
[1]
<span class="hx" lang="fr">00E9</span>
OR
<span class="ch" lang="fr">é</span>
--->
<span class="codepoint" translate="no"><bdi lang="fr">é</bdi><span class="uname">U+00E9 LATIN SMALL LETTER E WITH ACUTE</span></span>
[2]
<span class="hx" lang="hi">0928 093F</span>
OR
<span class="ch" lang="hi">नि</span>
--->
<span class="codepoint" translate="no"><bdi lang="hi">नि</bdi><span class="uname">U+0928 DEVANAGARI LETTER NA</span> + <span class="uname">U+093F DEVANAGARI VOWEL SIGN I</span></span>
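To make the proposed expansion concrete, here is a minimal sketch of it as a pure string transformation. It is only an illustration under stated assumptions, not ReSpec code: the function names (parseCodePoints, expandCodepoint), the Mode type and the tiny NAMES table are all hypothetical, and a real implementation would need a complete source of Unicode character names.

```typescript
// Hypothetical name table for illustration only; a real implementation
// would need a complete source of Unicode character names.
const NAMES: Record<number, string> = {
  0x00e9: "LATIN SMALL LETTER E WITH ACUTE",
  0x0928: "DEVANAGARI LETTER NA",
  0x093f: "DEVANAGARI VOWEL SIGN I",
};

type Mode = "hx" | "ch";

// Escape text so that characters such as "<" or "&" stay literal in HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convert the shorthand's textContent into a list of code points.
function parseCodePoints(text: string, mode: Mode): number[] {
  if (mode === "hx") {
    // Space-separated hex values, e.g. "0928 093F".
    return text.trim().split(/\s+/).map((hex) => {
      const cp = /^[0-9A-Fa-f]{1,6}$/.test(hex) ? parseInt(hex, 16) : NaN;
      if (isNaN(cp) || cp > 0x10ffff) {
        throw new Error("Not a valid hex code point: " + hex);
      }
      return cp;
    });
  }
  // Literal characters. The text is not trimmed, because the character of
  // interest may itself be white space. Array.from splits by code point.
  return Array.from(text).map((ch) => ch.codePointAt(0) as number);
}

// Format a code point as U+XXXX, zero-padded to at least four digits.
function uPlus(cp: number): string {
  const hex = cp.toString(16).toUpperCase();
  return "U+" + (hex.length < 4 ? ("000" + hex).slice(-4) : hex);
}

// Build the expanded markup from the shorthand's text, mode and lang.
function expandCodepoint(text: string, mode: Mode, lang: string): string {
  const cps = parseCodePoints(text, mode);
  const chars = cps.map((cp) => String.fromCodePoint(cp)).join("");
  const names = cps
    .map((cp) => {
      const name = NAMES[cp];
      // Fall back to the bare U+XXXX value if the name is not in the table.
      const label = name ? uPlus(cp) + " " + name : uPlus(cp);
      return '<span class="uname">' + label + "</span>";
    })
    .join(" + ");
  return (
    '<span class="codepoint" translate="no">' +
    '<bdi lang="' + escapeHtml(lang) + '">' + escapeHtml(chars) + "</bdi>" +
    names +
    "</span>"
  );
}

console.log(expandCodepoint("0928 093F", "hx", "hi"));
```

With the table above, the hx and ch forms of each example should yield the same string, matching the expanded markup shown in examples [1] and [2]. Note that the ch form depends on how the author typed the character: a decomposed é (e followed by U+0301) would be reported as two code points.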
It may also be useful to have a way of indicating that no bdi element is wanted (although much of the time an image would be useful as a replacement). Maybe something like:
<span class="hx nobdi" lang="en">00A0</span>
For invisible characters or tricky-to-display characters (such as certain combining marks), a more complete solution would allow for an image in the expanded markup. For example:
<span class="codepoint" translate="no"><img src="mypath/2003.png" alt=" "><span class="uname">U+2003: EM SPACE</span></span>
If it's possible to standardise or accept user input wrt the image location, this could be achieved with a shorthand such as the following, where an additional class name of img or svg is used.
<span class="hx img" lang="ja">2003</span>
(Btw, I can provide a set of images for invisible characters.)
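As a rough sketch of how the class names proposed above might be interpreted, the following reads hx/ch, nobdi and img/svg from a class attribute value and builds an image path in the style of mypath/2003.png. Everything here is an assumption made for illustration: the names parseShorthandClasses, ShorthandOptions and imageSrc are hypothetical, as are the file naming by code point value, the .svg extension, and the rule that img and svg are mutually exclusive.

```typescript
// Hypothetical result of reading the shorthand's class attribute.
interface ShorthandOptions {
  mode: "hx" | "ch";          // hex values or literal characters
  noBdi: boolean;             // class "nobdi": no bdi element wanted
  image: "img" | "svg" | null; // class "img" or "svg": use an image
}

// Read the proposed class names from a class attribute value.
function parseShorthandClasses(classAttr: string): ShorthandOptions {
  const classes = classAttr.split(/\s+/).filter((c) => c.length > 0);
  const has = (name: string): boolean => classes.indexOf(name) !== -1;
  // Hex and character values can't be mixed, so exactly one is required.
  if (has("hx") === has("ch")) {
    throw new Error('Exactly one of "hx" or "ch" is required');
  }
  // Assumption: only one image type may be requested at a time.
  if (has("img") && has("svg")) {
    throw new Error('Use either "img" or "svg", not both');
  }
  return {
    mode: has("hx") ? "hx" : "ch",
    noBdi: has("nobdi"),
    image: has("img") ? "img" : has("svg") ? "svg" : null,
  };
}

// Build an image path such as "mypath/2003.png" from a base location.
// Assumption: files are named after the code point value in upper-case hex.
function imageSrc(base: string, cp: number, kind: "img" | "svg"): string {
  const hex = cp.toString(16).toUpperCase();
  const padded = hex.length < 4 ? ("000" + hex).slice(-4) : hex;
  const ext = kind === "svg" ? "svg" : "png";
  return base.replace(/\/+$/, "") + "/" + padded + "." + ext;
}

const opts = parseShorthandClasses("hx img");
if (opts.image !== null) {
  console.log(imageSrc("mypath", 0x2003, opts.image));
}
```

Keeping the class parsing separate from the markup generation would let the image location come from either a standardised default or author-supplied configuration, whichever is eventually chosen.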
Additional context
Note that there is intentionally no space between </bdi><span>. The gap will be provided by styling (which avoids problems with variable space widths and makes it possible to reduce the gap or change it at scale if needed).