Transform HTML entities to unicode #15

@johnridesabike

Description

The Babel JSX compiler converts HTML entities into the unicode characters they represent. In ReScript, we have to use the actual unicode characters directly. For some characters, like &nbsp;, this isn't always convenient.

Would it be possible for the React PPX to do the same transformation that Babel does? See their entity map for reference: https://github.com/babel/babel/blob/b3e2bcda73dea7d68b4c82bfabb92acb11b1ed90/packages/babel-parser/src/plugins/jsx/xhtml.js

Babel example

Input

let f = () => <div>&nbsp;</div>;

Output

let f = () => /*#__PURE__*/React.createElement("div", null, "\xA0");

ReScript example

Input

let f = () => <div> {React.string("&nbsp;")} </div>

Output

function f(param) {
  return React.createElement("div", undefined, "&nbsp;");
}

Note that "&nbsp;" will render as-is, not as an actual nonbreaking space character.
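To make the difference concrete, here is a minimal JavaScript sketch of the kind of entity decoding described above. The names `ENTITIES` and `decodeEntities` are hypothetical and the table holds only three entries; Babel's actual entity map (linked above) is far larger, and its parser may handle edge cases differently.

```javascript
// Hypothetical, tiny subset of an entity table for illustration only.
const ENTITIES = {
  nbsp: "\u00A0", // no-break space
  amp: "&",
  copy: "\u00A9", // copyright sign
};

// Replace named entities (&name;) and numeric ones (&#160; or &#xA0;)
// with the characters they represent. Unknown names are left untouched.
function decodeEntities(text) {
  const pattern = /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    if (body.startsWith("#x")) {
      return String.fromCodePoint(parseInt(body.slice(2), 16));
    }
    if (body.startsWith("#")) {
      return String.fromCodePoint(parseInt(body.slice(1), 10));
    }
    return Object.prototype.hasOwnProperty.call(ENTITIES, body)
      ? ENTITIES[body]
      : match;
  });
}
```

Under these assumptions, `decodeEntities("&nbsp;")` yields the single character U+00A0, whereas the undecoded string `"&nbsp;"` is six literal characters, which is what the ReScript output above passes to React.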

Activity

ryyppy (Member) commented on Mar 24, 2021

Will need to investigate what's possible!

johnridesabike (Author) commented on Mar 24, 2021

Even if it isn't possible to transform strings via PPX, another possibility would be to include the entities as values, like let nbsp = `\xa0`, which could be used like <div> {React.string(`foo${React.nbsp}bar`)} </div>.

Still a bit clunky, but it seems preferable to having to write the unicode by hand.
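The constants-based workaround can be sketched in plain JavaScript. The `HtmlEntity` object and its three fields are hypothetical names chosen for illustration; a real module would presumably cover the full entity map.

```javascript
// Hypothetical entity constants; field names mirror the HTML entity names.
const HtmlEntity = Object.freeze({
  nbsp: "\u00A0", // no-break space
  mdash: "\u2014", // em dash
  hellip: "\u2026", // horizontal ellipsis
});

// Interpolate a constant instead of typing the raw character by hand.
const label = `foo${HtmlEntity.nbsp}bar`;
```

Here `label` contains "foo" and "bar" joined by a real U+00A0 character, so it renders as a nonbreaking space rather than as literal entity text.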

ryyppy (Member) commented on Mar 24, 2021

@johnridesabike I just asked Ricky and Maxim if it makes sense to first-class this into the syntax / ppx, and it's hard to say if this particular feature would be justifiable from a complexity POV... e.g. if it mixes well with compiler internal escaping / if it makes the syntax logic way more complicated than necessary.

So, would it be an option to create some independent HtmlEntity module (maybe within an example/ directory) that implements all the unicode characters mentioned above, and add an extra section to the rescript-react docs that points to that particular file for copy / paste, so ppl can quickly use it in the manner you just described with {React.string(`foo${HtmlEntity.nbsp}bar`)}? With @inline, this would actually be zero-cost even.

I personally created my own entity bindings in user-space, but having more guidance on the topic in the docs would be great.

johnridesabike (Author) commented on Mar 24, 2021

That seems like a reasonable compromise. IMO the fact that there is no documentation on the topic, combined with people being used to HTML entities "just working" in Babel JSX, is the main issue. Even a unicode table in the docs that people can copy/paste from would be useful.

For the hypothetical HtmlEntity module, are you thinking of putting that in an example/ directory within the rescript-react repository, or making a separate repo (or even just a gist) that people can use to copy/paste from?

johnridesabike (Author) commented on Mar 31, 2021

For anyone interested in copying entities into their own project, I just pushed a commit with an HtmlEntity module here: https://github.com/johnridesabike/coronate/blob/e42688a04f34eccf0003b428e0f13054dd80a9b2/src/HtmlEntities.res

ryyppy (Member) commented on Apr 1, 2021

Thanks Jon!
We could put your file in an "extra" folder in this project and link to it?

johnridesabike (Author) commented on Apr 1, 2021

Yes! It was very easy to adapt from the Babel file, so feel free to link, copy, or do whatever with it.

cknitt (Member) commented on Oct 7, 2022

@ryyppy Actually why put it into an "extra folder" and not make it a direct part of @rescript/react? Or maybe even the compiler (Dom.HtmlEntity or whatever)?

/cc @cristianoc

johnridesabike (Author) commented on Oct 19, 2022

I don't see a problem with including it in Dom or Js or something similar. There's nothing React-specific in the file I made, even though it's based on the Babel-React source. It could be used with any kind of JavaScript output.

      Transform HTML entities to unicode · Issue #15 · rescript-lang/rescript-react