Improve Type-1 font parsing #20715
Open
Branch updated from 981ef67 to 90b5889.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request on Jul 22, 2021:
With this I can produce smaller PDF files with usetex in some small tests, but this obviously needs more extensive testing, thus marking as draft. On top of matplotlib#20634 and matplotlib#20715. Closes matplotlib#127.
Branch updated from 90b5889 to e35728b.
Move Type1Font._tokens into a top-level function _tokenize that is a coroutine. The parsing stage consuming the tokens can instruct the tokenizer to return a binary token. This is necessary when decrypting the CharStrings and Subrs arrays, since the preceding context determines which parts of the data need to be decrypted. The function now also parses the encrypted portion of the font file.

To support usage as a coroutine, move the whitespace filtering into the function, since passing the information about binary tokens would not easily work through a filter.

The function now yields tokens as instances of subclasses of a new _Token class, which carry the position and value of the token and can have token-specific helper methods. The position data will be needed when modifying the file, as the font is transformed or subsetted.

A new helper function _expression can be used to consume tokens that form a balanced subexpression delimited by [] or {}. This helps fix a bug in UniqueID removal: if the font includes PostScript code that checks whether UniqueID is set in the current dictionary, the previous code broke that code instead of removing the UniqueID definition. Fonts can include UniqueID in the encrypted portion as well as in the cleartext one, and removal is now done in both portions.

Fix a bug related to font weight: the key is title-cased (Weight), not lower-cased, so font.prop['weight'] should not exist.
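The coroutine design described above can be sketched in Python roughly as follows. This is a hypothetical, simplified illustration rather than the code in this PR: `Token`, its subclasses, `tokenize` and `expression` are stand-ins for `_Token`, `_tokenize` and `_expression`. Only a subset of PostScript syntax is handled (no radix numbers or `//` names), decryption is left out, and the sketch assumes, as the Type 1 format prescribes, that exactly one space separates an `RD`-style token from the binary data that follows it.

```python
import re
from dataclasses import dataclass
from typing import Generator, Iterator, List, Optional


@dataclass
class Token:
    """A token plus its byte offset in the font data (hypothetical class)."""
    pos: int
    raw: bytes


class NameToken(Token): pass     # literal name such as /FontName
class KeywordToken(Token): pass  # executable name such as def, RD or <<
class NumberToken(Token): pass   # integer or real number
class DelimToken(Token): pass    # one of [ ] { }
class StringToken(Token): pass   # (...) or <...> string
class BinaryToken(Token): pass   # raw bytes requested with send()


_WS = re.compile(rb'(?:[\0\t\n\f\r ]|%[^\r\n]*)+')      # whitespace, comments
_REGULAR = re.compile(rb'[^\0\t\n\f\r ()<>\[\]{}/%]+')   # regular characters
_NUMBER = re.compile(rb'[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\Z')
_HEX = re.compile(rb'<[0-9A-Fa-f\0\t\n\f\r ]*>')


def _string_end(data: bytes, pos: int) -> int:
    """Return the offset just past the ')' that closes the string at pos."""
    depth, i = 0, pos
    while i < len(data):
        c = data[i]
        if c == 0x5C:            # backslash: skip the escaped byte
            i += 1
        elif c == 0x28:          # '(' nests
            depth += 1
        elif c == 0x29:          # ')'
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError(f'unterminated string at {pos}')


def tokenize(data: bytes) -> Generator[Token, Optional[int], None]:
    """Yield tokens, skipping whitespace and comments.

    After receiving a token, the consumer may call send(n) to receive a
    BinaryToken with the n bytes that follow the single separating space.
    """
    pos, nbytes = 0, None
    while True:
        if nbytes is not None:
            tok = BinaryToken(pos + 1, data[pos + 1:pos + 1 + nbytes])
        else:
            m = _WS.match(data, pos)
            pos = m.end() if m else pos
            if pos >= len(data):
                return
            c = data[pos:pos + 1]
            if data.startswith((b'<<', b'>>'), pos):
                tok = KeywordToken(pos, data[pos:pos + 2])
            elif c in b'[]{}':
                tok = DelimToken(pos, c)
            elif c == b'(':
                tok = StringToken(pos, data[pos:_string_end(data, pos)])
            elif c == b'<':
                m = _HEX.match(data, pos)
                if not m:
                    raise ValueError(f'bad hex string at {pos}')
                tok = StringToken(pos, m.group())
            elif c == b'/':
                m = _REGULAR.match(data, pos + 1)
                tok = NameToken(pos, data[pos:m.end() if m else pos + 1])
            else:
                m = _REGULAR.match(data, pos)
                if not m:
                    raise ValueError(f'unexpected byte {c!r} at {pos}')
                kind = NumberToken if _NUMBER.match(m.group()) else KeywordToken
                tok = kind(pos, m.group())
        pos = tok.pos + len(tok.raw)
        nbytes = yield tok


def expression(first: Token, tokens: Iterator[Token]) -> List[Token]:
    """Return first (a [ or { token) and the tokens up to its matching close."""
    closers = {b'[': b']', b'{': b'}'}
    stack, result = [closers[first.raw]], [first]
    for tok in tokens:
        result.append(tok)
        if isinstance(tok, DelimToken):
            if tok.raw in closers:
                stack.append(closers[tok.raw])
            elif tok.raw != stack.pop():
                raise ValueError(f'mismatched {tok.raw!r} at {tok.pos}')
            if not stack:
                return result
    raise ValueError('unterminated expression')
```

A consumer that has just seen `3 RD` calls `gen.send(3)` and gets the three raw bytes back as one token, even if they happen to contain `}` or whitespace. Because `send()` both resumes the generator and returns its next value, this only works when whitespace skipping happens inside the generator; a separate filtering generator would not forward the sent value, which matches the reasoning in the commit message.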
Type-1 fonts are required to define procedures with specific contents, but their names may vary. They are usually ND, NP and RD, but names like | and |- appear too.
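Because of this, a parser cannot simply look for the keyword `RD` before binary data; it first has to learn which names the font uses. The following is a minimal sketch with a hypothetical helper `find_procedure_names`. It assumes the procedure bodies have the canonical forms from Adobe's Type 1 specification (`{string currentfile exch readstring pop}`, `{noaccess def}`, `{noaccess put}`) and, as an extra assumption, also accepts `readonly` in place of `noaccess`. For brevity it scans the decrypted Private dictionary text with regular expressions instead of using a tokenizer.

```python
import re
from typing import Dict

# Canonical bodies of the three required procedures (from the Type 1
# spec); also accepting "readonly" instead of "noaccess" is an assumption.
_BODIES = {
    'RD': rb'\{\s*string\s+currentfile\s+exch\s+readstring\s+pop\s*\}',
    'ND': rb'\{\s*(?:(?:noaccess|readonly)\s+)?def\s*\}',
    'NP': rb'\{\s*(?:(?:noaccess|readonly)\s+)?put\s*\}',
}
# A literal name: '/' followed by PostScript regular characters.
_NAME = rb'/([^\s()<>\[\]{}/%]+)\s*'


def find_procedure_names(private: bytes) -> Dict[str, bytes]:
    """Map each role (RD, ND, NP) to the name the font defines for it."""
    names = {}
    for role, body in _BODIES.items():
        m = re.search(_NAME + body, private)
        # Fall back to the conventional name if no definition is found.
        names[role] = m.group(1) if m else role.encode('ascii')
    return names
```

A tokenizer that yields executable names can then compare each one against `names['RD']` to decide when the following bytes are binary.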
Branch updated from e35728b to 9418b35.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request on Jul 22, 2021:
With this I can produce smaller PDF files with usetex in some small tests, but this obviously needs more extensive testing, thus marking as draft. On top of matplotlib#20715. Closes matplotlib#127.
PR Summary
Parse font properties from the encrypted part of the file as well, and reimplement the parsing so that it understands more of PostScript's syntax. This fixes a bug where `Type1Font.transform` would not remove the `UniqueID` key but would instead break some PostScript code referring to `UniqueID`.

Incidentally, fix the bug where every font had a `weight` property with value `'Normal'`: the correct property is spelled `Weight`, with a capital letter.

This is on top of #20634, so merging that one will bring the diff size down a little. This is another prerequisite for subsetting Type-1 fonts (#127).
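Reading properties from the encrypted part of the file requires undoing the eexec encryption first. The sketch below follows the algorithm in Adobe's Type 1 Font Format specification: initial key 55665 for the eexec portion and 4330 for individual charstrings, constants 52845 and 22719, and four leading random bytes that are discarded after decryption. The function names are hypothetical, and the sketch assumes the encrypted portion has already been extracted as raw binary (a PFA file stores it hex-encoded).

```python
EEXEC_KEY = 55665       # initial key for the eexec-encrypted portion
CHARSTRING_KEY = 4330   # initial key for each charstring / Subrs entry
_C1, _C2 = 52845, 22719


def decrypt(ciphertext: bytes, key: int = EEXEC_KEY, ndiscard: int = 4) -> bytes:
    """Undo Type 1 encryption and drop the leading random bytes."""
    r = key
    plain = bytearray()
    for c in ciphertext:
        plain.append(c ^ (r >> 8))
        # The key evolves from the *ciphertext* byte.
        r = ((c + r) * _C1 + _C2) & 0xFFFF
    return bytes(plain[ndiscard:])


def encrypt(plaintext: bytes, key: int = EEXEC_KEY,
            prefix: bytes = b'\0\0\0\0') -> bytes:
    """Encrypt prefix + plaintext; prefix stands in for the random bytes."""
    r = key
    cipher = bytearray()
    for p in prefix + plaintext:
        c = p ^ (r >> 8)
        cipher.append(c)
        r = ((c + r) * _C1 + _C2) & 0xFFFF
    return bytes(cipher)
```

For charstrings the number of discarded bytes is given by the font's `lenIV` entry (4 by default), which is why `ndiscard` is a parameter here. Using four zero bytes as the random prefix is a simplification for illustration.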
PR Checklist
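- Has pytest style unit tests (and `pytest` passes).
- Is Flake 8 compliant (run `flake8` on changed files to check).
- Conforms to Matplotlib style conventions (install `flake8-docstrings` and run `flake8 --docstring-convention=all`).
- New features have an entry in `doc/users/next_whats_new/` (follow instructions in README.rst there).
- API changes documented in `doc/api/next_api_changes/` (follow instructions in README.rst there).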