fix(editor): Fix garbled Chinese characters in the contents of json, html, csv, md, txt, and css files #16118
base: master
…aligned with the icon on the left when a validation error occurred
…txt, and css files contain garbled Chinese characters
luka seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
cubic found 1 issue across 3 files. Review it in cubic.dev
```
@@ -27,6 +27,7 @@ export {
	isObjectEmpty,
	deepCopy,
	jsonParse,
	base64DecodeUTF8,
```
The `base64DecodeUTF8` function uses the deprecated `escape()` function in its fallback path; `escape()` is a legacy feature that is no longer recommended and may be removed from JavaScript.
`atob` produces garbled characters when decoding Base64 that encodes Chinese or other non-ASCII text, because it was designed around Latin-1 (ISO-8859-1): it returns a "binary string" in which each character holds exactly one byte (0 to 255). Chinese characters are multi-byte in UTF-8 (common Chinese characters take 3 bytes; UTF-8 uses 1 to 4 bytes per character). When you decode such a string directly with `atob`, every byte becomes a separate character. Because JavaScript strings are UTF-16, each byte is then read as a standalone Latin-1 character instead of being reassembled into the original multi-byte character, and the result is garbled text.
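To make the byte-level explanation concrete, here is a minimal TypeScript sketch. It assumes a runtime where `atob` and `TextDecoder` are globals (modern browsers, Node.js 16+). `decodeBase64Utf8` is a hypothetical name used only for illustration; it is not the PR's `base64DecodeUTF8`, whose body is not shown here. Rather than relying on `escape()`, it copies the bytes returned by `atob` into a `Uint8Array` and lets `TextDecoder` reassemble the UTF-8 sequences.

```typescript
// Hypothetical helper for illustration; not the PR's base64DecodeUTF8.
// Assumes atob and TextDecoder are available as globals.

/** Decode a Base64 string whose payload is UTF-8 text, without escape(). */
function decodeBase64Utf8(base64: string): string {
	// atob() returns a "binary string": one character per decoded byte (0 to 255).
	const binary = atob(base64);
	// Copy each byte value into a real byte array.
	const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
	// TextDecoder groups the multi-byte UTF-8 sequences back into characters.
	return new TextDecoder('utf-8').decode(bytes);
}

const encoded = '5L2g5aW9'; // Base64 of the UTF-8 bytes of "你好"

// Naive decoding: each of the 6 UTF-8 bytes becomes its own Latin-1 character.
const naive = atob(encoded);
console.log(naive.length); // 6
console.log(decodeBase64Utf8(encoded)); // 你好
```

The naive `atob` result has six characters for two Chinese characters, one per UTF-8 byte, which is exactly the garbling described in the comment.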
❌ Why does the first version cause garbled text?
`'data:' + mimeType + ';base64,' + binaryData`
This omits the charset. Chinese characters like "你好" are multi-byte in UTF-8. If you don't specify the charset explicitly, some environments may fall back to another encoding (such as ISO-8859-1) and misinterpret those bytes.

✅ Why does the second version not cause garbled text?
`data:${mimeType};charset=utf-8;base64,${binaryData}`
This includes charset=utf-8. When decoding Base64 data, the browser needs to know which character encoding (such as UTF-8) to use to interpret the text content, especially for non-ASCII characters like Chinese. Specifying it explicitly avoids relying on an environment-dependent default.
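A minimal TypeScript sketch of producing the charset-qualified data URL (the second version) from UTF-8 text. Both helper names are hypothetical, and the sketch assumes `TextEncoder` and `btoa` are available as globals. `btoa` on its own throws for characters above U+00FF, so the text is first encoded to UTF-8 bytes and then turned into a binary string.

```typescript
// Hypothetical helpers for illustration; assumes TextEncoder and btoa are globals.

/** Base64-encode text as UTF-8 (btoa() alone rejects characters above U+00FF). */
function encodeUtf8ToBase64(text: string): string {
	const bytes = new TextEncoder().encode(text);
	let binary = '';
	// Build the binary string byte by byte, avoiding a huge spread into fromCharCode.
	for (let i = 0; i < bytes.length; i++) {
		binary += String.fromCharCode(bytes[i]);
	}
	return btoa(binary);
}

/** Build a data URL that declares charset=utf-8, as in the second version. */
function buildDataUrl(mimeType: string, binaryData: string): string {
	return `data:${mimeType};charset=utf-8;base64,${binaryData}`;
}

const url = buildDataUrl('text/csv', encodeUtf8ToBase64('名称,数量\n你好,1'));
console.log(url);
```

The explicit `charset=utf-8` parameter tells the consumer of the URL how to interpret the decoded bytes, instead of leaving that to a default.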
Hey @luka-mimi, thanks for the PR. We have created "GHC-2405" as the internal reference to get this reviewed. One of us will be in touch if any changes are needed; in most cases this is within a couple of weeks, but it depends on the current workload of the team.
Summary
fix(editor): Fix garbled Chinese characters in the contents of json, html, csv, md, txt, and css files
Related Linear tickets, GitHub issues, and Community forum posts
#15041
Review / Merge checklist
release/backport (if the PR is an urgent fix that needs to be backported)