Closed
Description
When writing a string value that contains a supplementary Unicode code point, UTF8JsonGenerator encodes the supplementary character as a pair of \uNNNN escapes, one for each half of the surrogate pair that represents the code point in UTF-16, instead of using the correct multi-byte UTF-8 encoding of the character. The following Groovy script demonstrates the behaviour:
@Grab(group='com.fasterxml.jackson.core', module='jackson-core', version='2.6.2')
import com.fasterxml.jackson.core.JsonFactory
def factory = new JsonFactory()
def bytes1 = new ByteArrayOutputStream()
def gen1 = factory.createGenerator(bytes1) // UTF8JsonGenerator
gen1.writeStartObject()
gen1.writeStringField("test", new String(Character.toChars(0x1F602)))
gen1.writeEndObject()
gen1.close()
System.out.write(bytes1.toByteArray())
println ""
// prints {"test":"\uD83D\uDE02"}
def bytes2 = new ByteArrayOutputStream()
new OutputStreamWriter(bytes2, "UTF-8").withWriter { w ->
    def gen2 = factory.createGenerator(w) // WriterBasedJsonGenerator
    gen2.writeStartObject()
    gen2.writeStringField("test", new String(Character.toChars(0x1F602)))
    gen2.writeEndObject()
    gen2.close()
}
System.out.write(bytes2.toByteArray())
println ""
// prints {"test":"😂"}
When generating to a Writer rather than an OutputStream (and letting Java handle the UTF-8 byte conversion), the supplementary character U+1F602 is encoded as the correct four-byte UTF-8 sequence f0 9f 98 82.
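For reference, both representations of U+1F602 can be reproduced with the JDK alone. This is only a minimal sketch and does not use Jackson's code; the class and helper names (SupplementaryEncoding, utf8Hex, surrogateEscapes) are hypothetical and exist purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Jackson.
class SupplementaryEncoding {

    // Hex-encode bytes as space-separated lowercase pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Raw UTF-8 bytes of a single code point, as produced by the JDK encoder.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // One \\uXXXX escape per UTF-16 code unit; a supplementary code point
    // yields two escapes (high surrogate, then low surrogate).
    static String surrogateEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x1F602;
        System.out.println(utf8Hex(codePoint));          // f0 9f 98 82
        System.out.println(surrogateEscapes(codePoint)); // \uD83D\uDE02
    }
}
```

The first line of output matches what the Writer-based generator emits inside the string, and the second matches the escaped form emitted by UTF8JsonGenerator in the script above.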