Describe the bug
When the input text contains full-width characters, the error location indicator (^) points at the wrong character. The padding in front of the indicator is built from half-width spaces (U+0020) only, so it is narrower than the text above it. The padding should match the widths of the input characters and use full-width spaces (U+3000) under full-width characters.
To Reproduce
Use any CJK characters, or the full-width forms of Latin characters, in the input and intentionally introduce a grammar error. The indicator (^) will point at the wrong location.
Current and expected behavior shown here:
raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches '古' in the current parser context, at line 2 col 6
1.菝葀:古代一種象徵祥瑞的草。《廣韻.入聲.末韻》:「菝:菝葀,瑞草。」
^ // using U+20 (current)
^ // using U+20 and U+3000 (expected)
Expected one of:
* LPAREN
* I_LQUOTE
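A possible direction for a fix, as a minimal sketch rather than Lark's actual code: build the padding character by character and emit U+3000 under any character whose East Asian Width is Wide or Fullwidth. The helper names `caret_padding` and `caret_line` are hypothetical and not part of Lark; only the standard-library `unicodedata.east_asian_width` is used. The sketch assumes a monospace font in which full-width characters are exactly two cells wide, treats ambiguous-width characters as narrow, and does not handle zero-width combining characters.

```python
import unicodedata

FULLWIDTH_SPACE = "\u3000"  # IDEOGRAPHIC SPACE


def caret_padding(prefix: str) -> str:
    """Return padding that occupies the same display width as `prefix`.

    Hypothetical helper, not part of Lark.
    """
    pad = []
    for ch in prefix:
        if ch == "\t":
            # Keep tabs so the indicator line expands the same way as the text.
            pad.append("\t")
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or Fullwidth character: pad with a full-width space.
            pad.append(FULLWIDTH_SPACE)
        else:
            # Everything else (including ambiguous width) is treated as narrow.
            pad.append(" ")
    return "".join(pad)


def caret_line(line: str, column: int) -> str:
    """Build the indicator line for a 1-based `column` in `line`."""
    return caret_padding(line[: column - 1]) + "^"


if __name__ == "__main__":
    # Assumption: the colon after the headword is the full-width form U+FF1A.
    line = "1.\u83dd\u8440\uff1a\u53e4\u4ee3"
    print(line)
    print(caret_line(line, 6))
```

For the input above, column 6 is preceded by two half-width characters and three full-width ones, so the padding becomes two U+0020 followed by three U+3000, which places the indicator under '古'.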