Improve Type-1 font parsing #20715

jkseppan · 2021-07-22T12:13:40Z

PR Summary

Parse font properties from the encrypted part of the file as well, and reimplement the parsing so that it understands more of PostScript's syntax. This fixes a bug where Type1Font.transform did not remove the UniqueID key and instead broke some PostScript code that referred to UniqueID.
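The encrypted part is protected with the eexec cipher described in Adobe's Type 1 Font Format specification. It uses a 16-bit running key that starts at 55665 for the eexec section and at 4330 for individual charstrings, is updated with the constants 52845 and 22719, and discards the first four decrypted bytes. The following is a minimal sketch that assumes the binary (PFB-style) form of the data rather than the hex form. `decrypt` and `encrypt` are hypothetical helper names, not matplotlib's API.

```python
C1, C2 = 52845, 22719          # cipher constants from the Type 1 specification
EEXEC_KEY, CHARSTRING_KEY = 55665, 4330


def decrypt(ciphertext: bytes, key: int = EEXEC_KEY, ndiscard: int = 4) -> bytes:
    """Decrypt eexec or charstring data and drop the leading random bytes.

    For charstrings, ndiscard corresponds to the font's lenIV (default 4).
    """
    r = key
    plain = bytearray()
    for c in ciphertext:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF  # the key update uses the cipher byte
    return bytes(plain[ndiscard:])


def encrypt(plaintext: bytes, key: int = EEXEC_KEY, ndiscard: int = 4) -> bytes:
    """Inverse of decrypt; prepends ndiscard zero bytes (fonts use random ones)."""
    r = key
    cipher = bytearray()
    for p in bytes(ndiscard) + plaintext:
        c = p ^ (r >> 8)
        cipher.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(cipher)


msg = b"/Weight (Bold) readonly def"
assert decrypt(encrypt(msg)) == msg
```

Round-tripping through `encrypt` is a cheap sanity check. The specification describes the leading bytes as random, but any values decrypt correctly because they are discarded.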

Incidentally, fix the bug where every font had a weight property with the value 'Normal'. The correct property is spelled Weight, with a capital letter.

This is on top of #20634, so merging that one first will bring the diff size down a little. This is another prerequisite for subsetting Type-1 fonts (#127).

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

With this I can produce smaller PDF files with usetex in some small tests, but this obviously needs more extensive testing, so I am marking it as a draft. On top of matplotlib#20634 and matplotlib#20715. Closes matplotlib#127.

Move Type1Font._tokens into a top-level function _tokenize that is a coroutine. The parsing stage that consumes the tokens can instruct the tokenizer to return a binary token. This is necessary when decrypting the CharStrings and Subrs arrays, because the preceding context determines which parts of the data need to be decrypted. The function now also parses the encrypted portion of the font file.

To support usage as a coroutine, move the whitespace filtering into the function, since passing the information about binary tokens would not work easily through a filter.

The function now returns tokens as subclasses of a new _Token class. Each token carries its position and value and can have token-specific helper methods. The position data will be needed when modifying the file, as the font is transformed or subsetted.

A new helper function _expression can be used to consume tokens that form a balanced subexpression delimited by [] or {}. This helps fix a bug in UniqueID removal: if the font includes PostScript code that checks whether UniqueID is set in the current dictionary, the previous code broke that code instead of removing the UniqueID definition. Fonts can include UniqueID in the encrypted portion as well as the cleartext one, and removal is now done in both portions.

Fix a bug related to font weight: the key is title-cased (Weight), not lower-cased, so font.prop['weight'] should not exist.
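The coroutine design described in the commit message can be sketched with a Python generator. The consumer drives the generator with `send()`. Sending a byte count asks for the next token to be that many raw bytes, which is what is needed after `RD` in the CharStrings and Subrs arrays.

This is a minimal illustration, not matplotlib's `_tokenize`, `_Token` or `_expression`:

- The names `tokenize`, `Token`, `expression` and `collect_tokens` are hypothetical.
- Tokens use a single class with a `kind` field rather than subclasses.
- Sending the byte count is an assumed protocol.
- The lexer ignores nested parentheses in strings and the `<<`/`>>` delimiters.

```python
import re
from dataclasses import dataclass
from typing import Generator, Iterator, List, Optional

# Bytes that may appear in a PostScript name or number (simplified).
_REGULAR = rb'[^\0\t\n\f\r ()<>\[\]{}/%]'
_SKIP = re.compile(rb'[\0\t\n\f\r ]+|%[^\r\n]*')  # whitespace or a comment
_TOKEN = re.compile(b'|'.join([
    rb'(?P<delimiter>[\[\]{}])',
    rb'(?P<string>\((?:[^()\\]|\\.)*\))',  # no nested parentheses
    rb'(?P<hexstring><[0-9A-Fa-f\0\t\n\f\r ]*>)',
    rb'(?P<literal>/' + _REGULAR + rb'*)',
    rb'(?P<word>' + _REGULAR + rb'+)',
]), re.DOTALL)


@dataclass
class Token:
    kind: str   # delimiter, string, hexstring, literal, word or binary
    pos: int    # byte offset in the input, useful for editing the file later
    raw: bytes  # the token exactly as it appears in the input


def tokenize(data: bytes) -> Generator[Token, Optional[int], None]:
    """Yield tokens; if the consumer sends n, the next token is n raw bytes."""
    pos = 0
    n_binary: Optional[int] = None
    while pos < len(data):
        if n_binary is not None:
            start = pos + 1  # skip the single space that follows RD
            tok = Token('binary', start, data[start:start + n_binary])
            pos = start + n_binary
        else:
            skip = _SKIP.match(data, pos)
            if skip:
                pos = skip.end()
                continue
            m = _TOKEN.match(data, pos)
            if m is None:
                raise ValueError(f'cannot tokenize {data[pos:pos + 1]!r} at {pos}')
            tok = Token(m.lastgroup, pos, m.group())
            pos = m.end()
        n_binary = yield tok


def expression(opening: Token, tokens: Iterator[Token]) -> List[Token]:
    """Consume tokens up to the ] or } that matches `opening`."""
    closers = {b'[': b']', b'{': b'}'}
    stack = [closers[opening.raw]]
    out = [opening]
    while stack:
        tok = next(tokens)
        out.append(tok)
        if tok.kind == 'delimiter':
            if tok.raw in closers:
                stack.append(closers[tok.raw])
            elif tok.raw == stack[-1]:
                stack.pop()
            else:
                raise ValueError(f'unbalanced {tok.raw!r} at {tok.pos}')
    return out


def collect_tokens(data: bytes) -> List[Token]:
    """Toy consumer: after `<n> RD`, ask the tokenizer for n binary bytes."""
    gen = tokenize(data)
    out: List[Token] = []
    prev: Optional[Token] = None
    tok = next(gen)
    while True:
        out.append(tok)
        request = None
        if tok.kind == 'word' and tok.raw == b'RD' and prev is not None:
            request = int(prev.raw)
        prev = tok
        try:
            tok = gen.send(request)
        except StopIteration:
            break
    return out
```

For example, `collect_tokens(b"dup 5 4 RD a]{b NP")` returns the bytes `a]{b` as a single binary token. A tokenizer without this feedback would have read them as a word, a closing bracket, an opening brace and another word. Similarly, `expression` applied to `{ currentdict /UniqueID known { pop } if }` consumes the whole procedure as one unit. This lets code that merely tests for UniqueID be left intact.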

Type-1 fonts are required to have subroutines with specific contents, but their names may vary. They are usually ND, NP and RD, but names like | and |- appear too.
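Only the behaviour of these procedures is fixed, so a parser can recognize them by their bodies. The Type 1 specification defines RD as `{string currentfile exch readstring pop}`, ND as `{noaccess def}` and NP as `{noaccess put}`.

The following is a minimal regex-based sketch, not this PR's implementation. It assumes that each definition has the common form `/name {body} executeonly def` in the decrypted Private dictionary. `find_abbreviations` and `STANDARD_BODIES` are hypothetical names.

```python
import re
from typing import Dict

# Whitespace-normalized procedure bodies that identify each role.
STANDARD_BODIES = {
    'RD': b'string currentfile exch readstring pop',
    'ND': b'noaccess def',
    'NP': b'noaccess put',
}

# Matches: /name {body} [executeonly|readonly|noaccess] def
_DEFINITION = re.compile(
    rb'/([^\s{}\[\]()<>/%]+)\s*\{([^{}]*)\}\s*'
    rb'(?:(?:executeonly|readonly|noaccess)\s+)?def')


def find_abbreviations(private: bytes) -> Dict[str, bytes]:
    """Map each role (RD, ND, NP) to the name this font uses for it."""
    found: Dict[str, bytes] = {}
    for m in _DEFINITION.finditer(private):
        body = b' '.join(m.group(2).split())  # normalize whitespace
        for role, expected in STANDARD_BODIES.items():
            if body == expected and role not in found:
                found[role] = m.group(1)
    return found


sample = b"""/-|{string currentfile exch readstring pop}executeonly def
/|-{noaccess def}executeonly def
/|{noaccess put}executeonly def"""
print(find_abbreviations(sample))  # {'RD': b'-|', 'ND': b'|-', 'NP': b'|'}
```

Once the font's name for RD is known, a token consumer knows which word (RD or -|, for example) announces that binary data follows.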


jkseppan force-pushed the jkseppan:type1-improved-parsing branch 4 times, most recently from 981ef67 to 90b5889 Jul 22, 2021

jkseppan mentioned this pull request Jul 22, 2021: Type-1 font subsetting #20716 (draft, 7 tasks)

jklymak added topic: text/fonts status: waiting for other PR labels Jul 22, 2021

jkseppan force-pushed the jkseppan:type1-improved-parsing branch from 90b5889 to e35728b Jul 22, 2021

jkseppan added 2 commits Jul 13, 2021

Recognize abbreviations of PostScript code

9418b35


jkseppan force-pushed the jkseppan:type1-improved-parsing branch from e35728b to 9418b35 Jul 22, 2021

jklymak removed the status: waiting for other PR label Jul 22, 2021
