fix: correct unicode representation for jsonb_to_string #7810
sunng87 wants to merge 2 commits into GreptimeTeam:main
Summary of Changes (Gemini Code Assist)
This pull request addresses an issue where JSON strings generated from the internal JSONB representation might contain non-standard, Rust-style Unicode escape sequences. The changes ensure that all Unicode characters are represented using standard JSON escape formats, which improves compatibility and prevents deserialization errors in downstream processes. The primary impact is more robust and compliant JSON output.
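To make the difference between the two escape styles concrete, here is a minimal std-only sketch. It assumes the Rust-style escapes in question have the form `\u{1f600}` (as produced by Rust's default character escaping) and rewrites them as JSON `\uXXXX` escapes, using a surrogate pair for code points above U+FFFF. The function names `rust_escapes_to_json` and `parse_rust_escape` are hypothetical and are not taken from this PR, which may normalize the string differently.

```rust
/// Rewrites Rust-style `\u{...}` escapes as JSON `\uXXXX` escapes.
/// Hypothetical helper for illustration only; not the code from this PR.
fn rust_escapes_to_json(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '\\' {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        if let Some((c, next)) = parse_rust_escape(&chars, i) {
            // JSON escapes are UTF-16 code units, so characters outside
            // the Basic Multilingual Plane become a surrogate pair.
            let mut units = [0u16; 2];
            for unit in c.encode_utf16(&mut units).iter() {
                out.push_str(&format!("\\u{:04x}", *unit));
            }
            i = next;
        } else {
            // Any other escape (`\\`, `\n`, `\u0041`, ...): copy the
            // backslash and the following character unchanged.
            out.push('\\');
            if let Some(&n) = chars.get(i + 1) {
                out.push(n);
            }
            i += 2;
        }
    }
    out
}

/// Parses `\u{HEX}` starting at `start` (which points at the backslash).
/// Returns the decoded char and the index just past the closing brace.
fn parse_rust_escape(chars: &[char], start: usize) -> Option<(char, usize)> {
    if chars.get(start + 1) != Some(&'u') || chars.get(start + 2) != Some(&'{') {
        return None;
    }
    let close = chars[start + 3..].iter().position(|&c| c == '}')? + start + 3;
    let hex: String = chars[start + 3..close].iter().collect();
    let code = u32::from_str_radix(&hex, 16).ok()?;
    let c = char::from_u32(code)?;
    Some((c, close + 1))
}
```

For example, the input `"\u{1f600}"` becomes `"\ud83d\ude00"`, while an already standard escape such as `\u0041` and an escaped backslash followed by `u{41}` pass through untouched.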
Code Review
This pull request correctly addresses an issue with Unicode representation when converting jsonb to a string by normalizing Rust-style Unicode escapes into standard JSON format. The logic is sound and the tests have been updated to reflect the new behavior. I have identified a couple of areas for improvement: a redundant function call that affects performance, and a misleading comment. My review includes specific suggestions to resolve these points.
src/datatypes/src/types/json_type.rs
Outdated
    let normalized = jsonb_string_to_serde_value(&json_string)?;
    serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })
The jsonb_to_string function, called on the preceding line, is modified in this pull request to return an already-normalized JSON string. Consequently, this call to jsonb_string_to_serde_value is redundant and introduces unnecessary overhead by parsing and re-serializing an already valid string. You can simplify this by directly using the output of jsonb_to_string.
Suggested change:
-   let normalized = jsonb_string_to_serde_value(&json_string)?;
-   serde_json::Value::from_str(&normalized).context(DeserializeSnafu { json: normalized })
+   serde_json::Value::from_str(&json_string).context(DeserializeSnafu { json: json_string })
Same as metrics discussion — the LLM agent can inject the token value directly. Pre-check handles the missing token case.
-   /// `serde_json::Value::from_str` directly. If that succeeds, the parsed value is
-   /// returned as-is.
+   /// `serde_json::Value::from_str` to check if the string is valid. If that succeeds,
+   /// the original string is returned as-is.
The comment states that "the original string is returned as-is" if it's valid JSON. However, the implementation at line 434 parses the string and then re-serializes it using v.to_string(). This process can alter the original string (e.g., by removing whitespace), so it's more of a canonicalization. The comment should be updated to reflect this behavior accurately.
Suggested change:
-   /// the original string is returned as-is.
+   /// a canonicalized version of the string is returned.
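The point about canonicalization can be illustrated without any JSON library. The sketch below is a hypothetical std-only helper, `strip_insignificant_whitespace`, that imitates just one effect of a parse-then-reserialize round trip: whitespace outside string literals is dropped. It is not what `serde_json` does internally (a real round trip fully parses the value), and it assumes the input is already valid JSON.

```rust
/// Removes whitespace that appears outside JSON string literals.
/// Hypothetical illustration of why a round trip is not "as-is";
/// assumes `json` is valid JSON and does not validate it.
fn strip_insignificant_whitespace(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for c in json.chars() {
        if in_string {
            // Inside a string literal every character is significant.
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
            out.push(c);
        } else if !matches!(c, ' ' | '\t' | '\n' | '\r') {
            // Outside strings, only non-whitespace characters are kept.
            out.push(c);
        }
    }
    out
}
```

Given `{ "a" : 1 }`, the result is `{"a":1}`: the same JSON value, but a different string, which is why "canonicalized" describes the behavior more accurately than "returned as-is".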
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
Fixes #7808
What's changed and what's your intention?
Correct the Unicode representation of JSON strings produced by jsonb_to_string.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.