
Misaligned error location when input text contains full-width characters #1530

Open
@johan456789


Describe the bug

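When the input text contains full-width characters, the error location indicator (^) points at the wrong character. The padding before the caret is built only from half-width spaces (U+0020), one per preceding character. Each full-width character occupies two columns when displayed, so the caret ends up too far to the left. The padding should instead match the width of each preceding input character and use a full-width space (U+3000) wherever the input has a full-width character.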
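Below is a minimal sketch of the width-aware padding described above. It assumes Python's standard `unicodedata.east_asian_width` classification is a good enough proxy for display width. Characters classified as wide (`W`) or full-width (`F`) are padded with U+3000, tabs are copied through unchanged, and everything else gets U+0020. The helper name `caret_line` is hypothetical and is not part of Lark's API. Ambiguous-width (`A`) characters and zero-width combining marks are treated as narrow, so they may still misalign in some terminals.

```python
import unicodedata


def caret_line(before: str) -> str:
    """Build the '^' indicator line for an error that follows `before` on the same line.

    Hypothetical helper (not Lark's API): each preceding character is replaced
    by a space of matching width, so the caret lines up when wide/full-width
    characters are rendered as two columns.
    """
    padding = []
    for ch in before:
        if ch == "\t":
            # Copy tabs so they expand exactly as they do in the text line above.
            padding.append("\t")
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or full-width character: pad with IDEOGRAPHIC SPACE (U+3000).
            padding.append("\u3000")
        else:
            # Narrow, half-width, neutral or ambiguous: pad with an ASCII space.
            padding.append(" ")
    return "".join(padding) + "^"


# Usage on the line from this report: the error is at 1-based column 6 ('古').
line = "1.菝葀:古代一種象徵祥瑞的草。"
column = 6
print(line)
print(caret_line(line[:column - 1]))
```

Copying tabs through, instead of expanding them to a fixed number of spaces, leaves tab expansion to the terminal. That keeps the caret aligned whatever tab width the reader uses.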

To Reproduce

Use any CJK characters or full-width forms of Latin characters in the input, and intentionally introduce a syntax error after them. The indicator (^) will point at the wrong location.
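To build full-width Latin test input, you can convert ASCII text. The printable ASCII range U+0021 to U+007E maps onto the Halfwidth and Fullwidth Forms block U+FF01 to U+FF5E by a fixed offset of 0xFEE0, and the ideographic space U+3000 stands in for U+0020. The `to_fullwidth` helper below is hypothetical. For the misalignment to show, the syntax error must come after some of the converted characters, not at the first one.

```python
def to_fullwidth(text: str) -> str:
    """Convert printable ASCII to its full-width form (hypothetical test helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            # '!'..'~' map to U+FF01..U+FF5E by a constant offset.
            out.append(chr(code + 0xFEE0))
        elif ch == " ":
            # ASCII space becomes IDEOGRAPHIC SPACE (U+3000).
            out.append("\u3000")
        else:
            # Leave everything else (newlines, non-ASCII) unchanged.
            out.append(ch)
    return "".join(out)


print(to_fullwidth("x = 1;"))  # ｘ　＝　１；
```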

Current and expected behavior:

    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches '古' in the current parser context, at line 2 col 6
1.菝葀:古代一種象徵祥瑞的草。《廣韻.入聲.末韻》:「菝:菝葀,瑞草。」
     ^  // using U+20 (current)
     ^  // using U+20 and U+3000 (expected)
Expected one of: 
	* LPAREN
	* I_LQUOTE
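The following sketch measures the shift in the output above. It assumes wide and full-width characters occupy two terminal columns and every other character occupies one. `display_width` is a hypothetical helper. The sketch compares the display width of the text before '古', as transcribed above, with the width of the current all-U+0020 padding.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal width: 2 columns for wide/full-width characters, 1 otherwise."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in s)


before = "1.菝葀:"           # text preceding '古' on the failing line
padding = " " * len(before)  # current behavior: one U+0020 per character
print(display_width(before), display_width(padding))  # 7 5
```

The two-column shortfall comes from the two wide characters 菝葀 in that prefix. Because of it, the current caret lands under 葀 instead of 古.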
