Skip to content

fix: correct unicode representation for jsonb_to_string#7810

Draft
sunng87 wants to merge 2 commits intoGreptimeTeam:mainfrom
sunng87:fix/unicode-format
Draft

fix: correct unicode representation for jsonb_to_string#7810
sunng87 wants to merge 2 commits intoGreptimeTeam:mainfrom
sunng87:fix/unicode-format

Conversation

@sunng87
Copy link
Member

@sunng87 sunng87 commented Mar 13, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Fixes #7808

What's changed and what's your intention?

Correct unicode representation for json string.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions bot added size/XS docs-not-required This change does not impact docs. labels Mar 13, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where JSON strings generated from the internal JSONB representation might contain non-standard Rust-style Unicode escape sequences. The changes ensure that all Unicode characters are correctly represented using standard JSON escape formats, improving compatibility and preventing deserialization errors in downstream processes. The primary impact is a more robust and compliant JSON output from jsonb_to_string and related functions.

Highlights

  • Unicode Representation Correction: The jsonb_to_string function now correctly handles Rust-style Unicode escape sequences (\u{...}) by converting them into standard JSON-compatible \uXXXX or surrogate pairs, ensuring proper JSON string output.
  • Function Signature Change: The jsonb_string_to_serde_value function's return type has been updated from Result<serde_json::Value> to Result<String>, indicating that it now normalizes the JSON string rather than directly deserializing it.
  • JSON Deserialization Flow Update: The jsonb_to_serde_json function now first normalizes the JSON string using the updated jsonb_string_to_serde_value and then explicitly parses the normalized string into a serde_json::Value.
  • Test Case Refinement: Existing test cases were adjusted to reflect the new behavior, asserting against the normalized string output with standard JSON Unicode escapes, and new test categories were introduced for different error scenarios.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly addresses an issue with Unicode representation when converting jsonb to a string by normalizing Rust-style Unicode escapes into standard JSON format. The logic is sound and the tests have been updated to reflect the new behavior. I have identified a couple of areas for improvement: a redundant function call that affects performance, and a misleading comment. My review includes specific suggestions to resolve these points.

Comment on lines +408 to +409
let normalized = jsonb_string_to_serde_value(&json_string)?;
serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The jsonb_to_string function, called on the preceding line, is modified in this pull request to return an already-normalized JSON string. Consequently, this call to jsonb_string_to_serde_value is redundant and introduces unnecessary overhead by parsing and re-serializing an already valid string. You can simplify this by directly using the output of jsonb_to_string.

Suggested change
let normalized = jsonb_string_to_serde_value(&json_string)?;
serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })
serde_json::Value::from_str(&json_string).context(DeserializeSnafu { json: json_string })

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as metrics discussion — the LLM agent can inject the token value directly. Pre-check handles the missing token case.

/// `serde_json::Value::from_str` directly. If that succeeds, the parsed value is
/// returned as-is.
/// `serde_json::Value::from_str` to check if the string is valid. If that succeeds,
/// the original string is returned as-is.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment states that "the original string is returned as-is" if it's valid JSON. However, the implementation at line 434 parses the string and then re-serializes it using v.to_string(). This process can alter the original string (e.g., by removing whitespace), so it's more of a canonicalization. The comment should be updated to reflect this behavior accurately.

Suggested change
/// the original string is returned as-is.
/// a canonicalized version of the string is returned.

@github-actions github-actions bot added size/S and removed size/XS labels Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-not-required This change does not impact docs. size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Imcompatible unicode encoding in json_to_string output

2 participants