fix: correct unicode representation for jsonb_to_string by sunng87 · Pull Request #7810 · GreptimeTeam/greptimedb

sunng87 · 2026-03-13T13:20:53Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Fixes #7808

What's changed and what's your intention?

Correct unicode representation for json string.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR requires documentation updates.
API changes are backward compatible.
Schema or data changes are backward compatible.

gemini-code-assist · 2026-03-13T13:21:26Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where JSON strings generated from the internal JSONB representation might contain non-standard Rust-style Unicode escape sequences. The changes ensure that all Unicode characters are correctly represented using standard JSON escape formats, improving compatibility and preventing deserialization errors in downstream processes. The primary impact is a more robust and compliant JSON output from jsonb_to_string and related functions.

Highlights

Unicode Representation Correction: The jsonb_to_string function now correctly handles Rust-style Unicode escape sequences (\u{...}) by converting them into standard JSON-compatible \uXXXX or surrogate pairs, ensuring proper JSON string output.
Function Signature Change: The jsonb_string_to_serde_value function's return type has been updated from Result<serde_json::Value> to Result<String>, indicating that it now normalizes the JSON string rather than directly deserializing it.
JSON Deserialization Flow Update: The jsonb_to_serde_json function now first normalizes the JSON string using the updated jsonb_string_to_serde_value and then explicitly parses the normalized string into a serde_json::Value.
Test Case Refinement: Existing test cases were adjusted to reflect the new behavior, asserting against the normalized string output with standard JSON Unicode escapes, and new test categories were introduced for different error scenarios.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request correctly addresses an issue with Unicode representation when converting jsonb to a string by normalizing Rust-style Unicode escapes into standard JSON format. The logic is sound and the tests have been updated to reflect the new behavior. I have identified a couple of areas for improvement: a redundant function call that affects performance, and a misleading comment. My review includes specific suggestions to resolve these points.

gemini-code-assist · 2026-03-13T13:25:20Z

src/datatypes/src/types/json_type.rs

+    let normalized = jsonb_string_to_serde_value(&json_string)?;
+    serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })


The jsonb_to_string function, called on the preceding line, is modified in this pull request to return an already-normalized JSON string. Consequently, this call to jsonb_string_to_serde_value is redundant and introduces unnecessary overhead by parsing and re-serializing an already valid string. You can simplify this by directly using the output of jsonb_to_string.

Suggested change

let normalized = jsonb_string_to_serde_value(&json_string)?;

serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })

serde_json::Value::from_str(&json_string).context(DeserializeSnafu { json: json_string })

Same as metrics discussion — the LLM agent can inject the token value directly. Pre-check handles the missing token case.

gemini-code-assist · 2026-03-13T13:25:20Z

src/datatypes/src/types/json_type.rs

-/// `serde_json::Value::from_str` directly. If that succeeds, the parsed value is
-/// returned as-is.
+/// `serde_json::Value::from_str` to check if the string is valid. If that succeeds,
+/// the original string is returned as-is.


The comment states that "the original string is returned as-is" if it's valid JSON. However, the implementation at line 434 parses the string and then re-serializes it using v.to_string(). This process can alter the original string (e.g., by removing whitespace), so it's more of a canonicalization. The comment should be updated to reflect this behavior accurately.

Suggested change

/// the original string is returned as-is.

/// a canonicalized version of the string is returned.

fix: correct unicode representation for jsonb_to_string

b136b04

github-actions bot added size/XS docs-not-required This change does not impact docs. labels Mar 13, 2026

gemini-code-assist bot reviewed Mar 13, 2026

View reviewed changes

refactor: correct function name and behavior

4ab4e26

github-actions bot added size/S and removed size/XS labels Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct unicode representation for jsonb_to_string#7810

fix: correct unicode representation for jsonb_to_string#7810
sunng87 wants to merge 2 commits intoGreptimeTeam:mainfrom
sunng87:fix/unicode-format

sunng87 commented Mar 13, 2026

Uh oh!

gemini-code-assist bot commented Mar 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Mar 13, 2026

Uh oh!

avifenesh Mar 13, 2026

Uh oh!

gemini-code-assist bot Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		let normalized = jsonb_string_to_serde_value(&json_string)?;
		serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })

	let normalized = jsonb_string_to_serde_value(&json_string)?;
	serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })
	serde_json::Value::from_str(&json_string).context(DeserializeSnafu { json: json_string })

	/// the original string is returned as-is.
	/// a canonicalized version of the string is returned.

Conversation

sunng87 commented Mar 13, 2026

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

PR Checklist

Uh oh!

gemini-code-assist bot commented Mar 13, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

avifenesh Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants