fix(editor): Fix the issue that the contents of json, html, csv, md, txt, and css files contain garbled Chinese characters #16118

Open
wants to merge 7 commits into
base: master
from

Conversation

luka-mimi
Contributor

@luka-mimi luka-mimi commented Jun 7, 2025

Summary

fix(editor): Fix garbled Chinese characters in the contents of json, html, csv, md, txt, and css files

Related Linear tickets, GitHub issues, and Community forum posts

#15041

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ luka-mimi
❌ luka


luka does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

cubic found 1 issue across 3 files. Review it in cubic.dev

React with 👍 or 👎 to teach cubic. Tag @cubic-dev-ai to give specific feedback.

@@ -27,6 +27,7 @@ export {
isObjectEmpty,
deepCopy,
jsonParse,
base64DecodeUTF8,
Contributor

The base64DecodeUTF8 function calls escape() in its fallback path. escape() is deprecated and should not be used in new code.
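
For illustration, here is one way such a fallback could avoid escape(). This is a minimal sketch, not the code in this PR: it assumes the fallback resembles the common decodeURIComponent(escape(atob(...))) idiom, and the name base64ToUtf8WithoutEscape is hypothetical. It percent-encodes each byte by hand and lets decodeURIComponent do the UTF-8 decoding.

```typescript
// Hypothetical replacement for an escape()-based fallback (not the PR's code).
function base64ToUtf8WithoutEscape(base64: string): string {
	// atob returns one character per decoded byte (char codes 0-255).
	const binary = atob(base64);
	let percentEncoded = '';
	for (let i = 0; i < binary.length; i++) {
		// Turn each byte into a two-digit %XX escape sequence.
		const hex = binary.charCodeAt(i).toString(16);
		percentEncoded += '%' + ('0' + hex).slice(-2);
	}
	// decodeURIComponent interprets the %XX sequences as UTF-8 bytes.
	return decodeURIComponent(percentEncoded);
}

// '5L2g5aW9' is the Base64 form of the UTF-8 bytes of "你好".
const fallbackResult = base64ToUtf8WithoutEscape('5L2g5aW9');
console.log(fallbackResult);
```

Note that decodeURIComponent throws a URIError when the bytes are not valid UTF-8, so a caller would need to decide how to handle binary or differently encoded input.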

@luka-mimi
Contributor Author

atob produces garbled characters when the Base64 payload contains Chinese or other non-ASCII text, because it returns a binary string: each character in the result stands for exactly one byte (0~255), which matches the Latin1 (ISO-8859-1) range.

In UTF-8, a Chinese character is encoded as multiple bytes (typically 3, sometimes 4). Decoding the Base64 string directly with atob therefore returns one character per byte instead of one per original character. If that result is used as text as-is, each byte is treated as its own UTF-16 code unit (JavaScript's internal string representation), and the output is garbled.
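
The behaviour described above can be reproduced with a short sketch. The helper name decodeBase64Utf8 is hypothetical and is not the base64DecodeUTF8 implementation from this PR; it only assumes that atob and TextDecoder are available as globals (browsers and recent Node.js versions).

```typescript
// Hypothetical helper (not the PR's implementation): decode Base64 that holds UTF-8 text.
function decodeBase64Utf8(base64: string): string {
	// atob yields a "binary string": one character per byte.
	const binary = atob(base64);
	// Copy each character code back into a real byte array.
	const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
	// Interpret the bytes as UTF-8 instead of one character per byte.
	return new TextDecoder('utf-8').decode(bytes);
}

// '5L2g5aW9' is the Base64 form of the UTF-8 bytes of "你好" (E4 BD A0 E5 A5 BD).
const encoded = '5L2g5aW9';
const naive = atob(encoded); // six single-byte characters, shown as mojibake
const decoded = decodeBase64Utf8(encoded); // the original two characters

console.log(naive.length, decoded);
```

The naive result has six characters because the two Chinese characters occupy three bytes each; decoding the same bytes as UTF-8 restores the original text.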

@luka-mimi
Contributor Author

luka-mimi commented Jun 7, 2025

'data:' + mimeType + ';base64,' + binaryData

`data:${mimeType};charset=utf-8;base64,${binaryData}`;

This happens because, once the Base64 data has been decoded into bytes, the browser still needs to know which character encoding (such as UTF-8) to use to turn those bytes into text. This matters most for non-ASCII characters such as Chinese. If the character set isn't explicitly specified (charset=utf-8), the browser falls back to a default encoding, which may lead to garbled text (mojibake).


✅ Why does the second version not cause garbled text?

`data:${mimeType};charset=utf-8;base64,${binaryData}`

This includes charset=utf-8, which tells the browser or parser:

After decoding the Base64 data, interpret it as UTF-8 encoded text.

Chinese characters like "你好" are multi-byte in UTF-8 (three bytes each in this case). If you don’t explicitly specify the charset, some environments may default to another encoding (like ISO-8859-1) and interpret those bytes incorrectly.


❌ Why does the first version cause garbled text?

'data:' + mimeType + ';base64,' + binaryData

This omits the charset, so the browser falls back to a default encoding for the given mimeType:

  • For text/plain, the fallback may be a single-byte encoding such as ISO-8859-1 rather than UTF-8.
  • Chinese characters are multi-byte in UTF-8, but ISO-8859-1 maps every byte to its own character, so the output is garbled.
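
The difference between the two versions can be sketched as follows. The helper toTextDataUrl is a hypothetical name used only for illustration, and the single-byte interpretation is simulated by mapping each byte to one character, which is what a Latin-1 style fallback amounts to for these bytes.

```typescript
// Hypothetical helper: build a data URL that states its charset explicitly.
function toTextDataUrl(mimeType: string, base64Data: string): string {
	return `data:${mimeType};charset=utf-8;base64,${base64Data}`;
}

// The UTF-8 bytes of "你好": E4 BD A0 E5 A5 BD.
const utf8Bytes = new TextEncoder().encode('你好');

// Without a charset, a single-byte fallback maps every byte to its own character.
const asSingleByte = Array.from(utf8Bytes, (b) => String.fromCharCode(b)).join('');

// With charset=utf-8, the same bytes are read as multi-byte sequences.
const asUtf8 = new TextDecoder('utf-8').decode(utf8Bytes);

const url = toTextDataUrl('text/plain', '5L2g5aW9');
console.log(asSingleByte.length, asUtf8, url);
```

The bytes are identical in both cases; only the declared interpretation changes, which is why adding charset=utf-8 to the data URL is enough to avoid the garbled output.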

@n8n-assistant n8n-assistant bot added the labels community (Authored by a community member), core (Enhancement outside /nodes-base and /editor-ui), and in linear (Issue or PR has been created in Linear for internal review) on Jun 7, 2025
@Joffcom
Member

Joffcom commented Jun 7, 2025

Hey @luka-mimi,

Thanks for the PR. We have created "GHC-2405" as the internal reference to get this reviewed.

One of us will be in touch if any changes are needed. In most cases this happens within a couple of weeks, but it depends on the team's current workload.
