The Wayback Machine - https://web.archive.org/web/20240218131239/https://github.com/matplotlib/matplotlib/pull/20866

Remove ttconv and implement Type-42 embedding using fontTools #20866

Draft

jkseppan wants to merge 10 commits into matplotlib:main from jkseppan:remove-ttconv

Member

jkseppan commented Aug 20, 2021 •

edited

PR Summary

ttconv is now only being used for outputting fonts in Type-42 format in PostScript files, and with fontTools this turns out to be quite doable in pure Python.

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
[N/A] New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
[N/A] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

jkseppan added backend: ps topic: text/fonts labels

jkseppan force-pushed the remove-ttconv branch from 447fd1b to a826065 Compare

August 20, 2021 12:42

QuLogic added this to the v3.6.0 milestone

jkseppan force-pushed the remove-ttconv branch from 8c44828 to 2535508 Compare

August 21, 2021 09:55

Member Author

jkseppan commented Aug 21, 2021

Now this works with almost all TrueType fonts on my Mac. The exceptions are CFF fonts (#20870), and Apple Color Emoji.ttc and NISC18030.ttf, both of which cause a crash in FT2Font (#7305).

jkseppan force-pushed the remove-ttconv branch from de16cf0 to daee4e5 Compare

August 21, 2021 15:57

jkseppan marked this pull request as ready for review

August 21, 2021 16:21

jkseppan force-pushed the remove-ttconv branch 2 times, most recently from 016713a to 7cea2d4 Compare

August 22, 2021 14:20

QuLogic added the status: needs rebase label

QuLogic reviewed

lib/matplotlib/backends/_backend_pdf_ps.py

@@ -24,28 +25,59 @@ def get_glyphs_subset(fontfile, characters):
                   Subset a TTF font
                   Reads the named fontfile and restricts the font to the characters.
-                  Returns a serialization of the subset font as file-like object.
+                  Returns a TTFont object.

Member

QuLogic Dec 24, 2021

Just remove the line; you have a Returns section.

lib/matplotlib/backends/_backend_pdf_ps.py

+                  if fontfile.endswith('ttc'):
+                      # fix this once we support loading more fonts from a collection
+                      # https://github.com/matplotlib/matplotlib/issues/3135#issuecomment-571085541
+                      options.font_number = 0

Member

QuLogic Dec 24, 2021

I think we test a ttc in CI already, so it could be used to test this line?

lib/matplotlib/backends/backend_ps.py

+                      The Type-42 formatted font
+                  """
+                  version, breakpoints = _version_and_breakpoints(font.get('loca'), fontdata)
+                  post, name = font['post'], font['name']

Member

QuLogic Dec 24, 2021

Are these tables guaranteed to exist?

lib/matplotlib/backends/backend_ps.py

Comment on lines +296 to +301

		Read the version number of the font and determine sfnts breakpoints.
		When a TrueType font file is written as a Type 42 font, it has to be

Member

QuLogic Dec 24, 2021

Suggested change

      
-                  Read the version number of the font and determine sfnts breakpoints.
-                  When a TrueType font file is written as a Type 42 font, it has to be
+                  Read the version number of the font and determine sfnts breakpoints.
+
+                  When a TrueType font file is written as a Type 42 font, it has to be

lib/matplotlib/backends/backend_ps.py

Comment on lines +314 to +322

+                  tuple
+                      ((v1, v2), breakpoints) where v1 is the major version number,
+                      v2 is the minor version number and breakpoints is a sorted list
+                      of offsets into fontdata; if loca is not available, just the table
+                      boundaries

Member

QuLogic Dec 24, 2021

For multiple return, you should list the entries separately, like Parameters.
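
For reference, numpydoc lists each returned value as its own entry, the same way Parameters are listed. A minimal sketch of that layout follows. It uses a hypothetical helper, `read_sfnt_version`, not the PR's `_version_and_breakpoints`, and assumes only that a font file starts with a 4-byte version field, read here as two big-endian 16-bit integers:

```python
import struct


def read_sfnt_version(fontdata):
    """
    Read the version number from the start of a font file.

    Parameters
    ----------
    fontdata : bytes
        The raw contents of the font file.

    Returns
    -------
    major : int
        The major version number.
    minor : int
        The minor version number.
    """
    # Hypothetical helper for illustration: unpack the first four bytes
    # as two unsigned big-endian 16-bit integers.
    major, minor = struct.unpack('>HH', fontdata[:4])
    return major, minor
```

A TrueType file whose first four bytes are `00 01 00 00` would give `(1, 0)` here.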

lib/matplotlib/backends/backend_ps.py

Comment on lines +336 to +342

+                  breakpoints = sorted(
+                      set(tables.values()) | glyf_breakpoints | {len(fontdata)}
+                  )

Member

QuLogic Dec 24, 2021

Suggested change

      
-                  breakpoints = sorted(
-                      set(tables.values()) | glyf_breakpoints | {len(fontdata)}
-                  )
+                  breakpoints = sorted({*tables.values(), *glyf_breakpoints, len(fontdata)})
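
The two spellings should produce the same list, since both build one set and sort it. A quick check with made-up values (the table offsets, glyph offsets and data length below are hypothetical, not taken from a real font):

```python
# Hypothetical stand-ins for the variables used in the PR.
tables = {'glyf': 0, 'head': 120, 'loca': 200}  # table tag -> offset
glyf_breakpoints = {0, 40, 80}                  # glyph boundaries
fontdata = bytes(260)                           # dummy font data

# Original form: explicit set unions.
union_form = sorted(
    set(tables.values()) | glyf_breakpoints | {len(fontdata)}
)

# Suggested form: a single set display with unpacking.
unpack_form = sorted({*tables.values(), *glyf_breakpoints, len(fontdata)})

print(union_form == unpack_form)
```

Duplicates such as the offset 0 collapse in both forms, and the end of the data is always the last breakpoint.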

lib/matplotlib/backends/backend_ps.py

Comment on lines +384 to +394

+                  s = StringIO()
+                  go = font.getGlyphOrder()
+                  s.write(f'/CharStrings {len(go)} dict dup begin\n')
+                  for i, name in enumerate(go):
+                      s.write(f'/{name} {i} def\n')
+                  s.write('end readonly def')
+                  return s.getvalue()

Member

QuLogic Dec 24, 2021

I don't understand why you need StringIO for this, as there is no file IO necessary.

Suggested change

      
-                  s = StringIO()
-                  go = font.getGlyphOrder()
-                  s.write(f'/CharStrings {len(go)} dict dup begin\n')
-                  for i, name in enumerate(go):
-                      s.write(f'/{name} {i} def\n')
-                  s.write('end readonly def')
-                  return s.getvalue()
+                  go = font.getGlyphOrder()
+                  s = f'/CharStrings {len(go)} dict dup begin\n'
+                  for i, name in enumerate(go):
+                      s += f'/{name} {i} def\n'
+                  s += 'end readonly def'
+                  return s
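
A third option, shown only as a sketch, is to collect the lines in a list and join them once. The function name `charstrings_dict` is hypothetical, and it takes the glyph order as a plain list so that it runs without fontTools:

```python
def charstrings_dict(glyph_order):
    """Build the /CharStrings dictionary text from a list of glyph names."""
    lines = [f'/CharStrings {len(glyph_order)} dict dup begin']
    # Each glyph name maps to its index in the glyph order.
    lines += [f'/{name} {i} def' for i, name in enumerate(glyph_order)]
    lines.append('end readonly def')
    return '\n'.join(lines)


print(charstrings_dict(['.notdef', 'A', 'B']))
```

The output has the same layout as both versions above: one `def` per glyph and no trailing newline.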

lib/matplotlib/backends/backend_ps.py

Comment on lines +414 to +432

+                  b = BytesIO()
+                  b.write(b'/sfnts[')
+                  pos = 0
+                  while pos < len(fontdata):
+                      i = bisect.bisect_left(breakpoints, pos + 65534)
+                      newpos = breakpoints[i-1]
+                      if newpos <= pos:
+                          # have to accept a larger string
+                          newpos = breakpoints[-1]
+                      b.write(b'<')
+                      b.write(binascii.hexlify(fontdata[pos:newpos]))
+                      b.write(b'00>')  # need an extra zero byte on every string
+                      pos = newpos
+                  b.write(b']def')
+                  s = b.getvalue().decode('ascii')

Member

QuLogic Dec 24, 2021

Also doesn't appear to need BytesIO:

Suggested change

      
-                  b = BytesIO()
-                  b.write(b'/sfnts[')
-                  pos = 0
-                  while pos < len(fontdata):
-                      i = bisect.bisect_left(breakpoints, pos + 65534)
-                      newpos = breakpoints[i-1]
-                      if newpos <= pos:
-                          # have to accept a larger string
-                          newpos = breakpoints[-1]
-                      b.write(b'<')
-                      b.write(binascii.hexlify(fontdata[pos:newpos]))
-                      b.write(b'00>')  # need an extra zero byte on every string
-                      pos = newpos
-                  b.write(b']def')
-                  s = b.getvalue().decode('ascii')
+                  s = '/sfnts['
+                  pos = 0
+                  while pos < len(fontdata):
+                      i = bisect.bisect_left(breakpoints, pos + 65534)
+                      newpos = breakpoints[i-1]
+                      if newpos <= pos:
+                          # have to accept a larger string
+                          newpos = breakpoints[-1]
+                      s += f'<{fontdata[pos:newpos].hex()}00>'  # Always NULL terminate.
+                      pos = newpos
+                  s += ']def'
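
To make the chunking easier to follow, here is a standalone sketch of the same idea with the size limit as a parameter. It is a variation, not the PR's code: the name `sfnts_array` is hypothetical, `breakpoints` is assumed to be sorted and to end with `len(fontdata)`, and when no breakpoint fits within the limit it advances to the next breakpoint rather than to the last one.

```python
import bisect


def sfnts_array(fontdata, breakpoints, limit=65534):
    """Hex-encode fontdata as an /sfnts array, splitting at breakpoints."""
    parts = []
    pos = 0
    while pos < len(fontdata):
        # Last breakpoint strictly below pos + limit, as in the loop above.
        i = bisect.bisect_left(breakpoints, pos + limit)
        newpos = breakpoints[i - 1] if i > 0 else pos
        if newpos <= pos:
            # No breakpoint fits: accept an oversized string that ends at
            # the next breakpoint, or at the end of the data.
            j = bisect.bisect_right(breakpoints, pos)
            newpos = breakpoints[j] if j < len(breakpoints) else len(fontdata)
        # Every string gets an extra zero byte appended.
        parts.append(f'<{fontdata[pos:newpos].hex()}00>')
        pos = newpos
    return '/sfnts[' + ''.join(parts) + ']def'


print(sfnts_array(b'\x01\x02\x03', [0, 2, 3], limit=3))
```

Because `newpos` is always greater than `pos`, the loop terminates even when two breakpoints are further apart than the limit.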

lib/matplotlib/tests/test_backend_ps.py

Comment on lines +240 to +261

		fontfiles = (font_manager.findfont(fp) for fp in fps)
		if len(set(fontfiles)) < 6:

Member

QuLogic Dec 24, 2021

Suggested change

      
-		fontfiles = (font_manager.findfont(fp) for fp in fps)
-		if len(set(fontfiles)) < 6:
+		fontfiles = {font_manager.findfont(fp) for fp in fps}
+		if len(fontfiles) < 6:
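
One likely reason to prefer the set comprehension (an assumption, since the comment does not say so) is that a generator can be consumed only once: after `set(fontfiles)` nothing is left for later code to iterate over. A small illustration with a hypothetical stand-in for `font_manager.findfont`:

```python
def find(fp):
    # Hypothetical stand-in for font_manager.findfont.
    return f'/fonts/{fp}'


fps = ['a.ttf', 'b.ttf', 'a.ttf']

gen = (find(fp) for fp in fps)
n_unique = len(set(gen))   # iterating here exhausts the generator
leftover = list(gen)       # nothing remains for later use

fontfiles = {find(fp) for fp in fps}  # a set can be reused freely
print(n_unique, leftover, sorted(fontfiles))
```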

Member

tacaswell commented Dec 24, 2021

Wow, I missed that this came in. 💯 on dropping an extension module!

jklymak added the status: needs revision label

jklymak marked this pull request as draft

February 1, 2022 11:52

Member

jklymak commented Feb 1, 2022

Dropping to draft until the comments and rebase are addressed. Anyone should feel free to move back to active...

QuLogic mentioned this pull request

[MNT]: Replace str(n)cpy etc with safe versions (C++) #22603

Closed

Member

QuLogic commented Mar 15, 2022

Ping @jkseppan?

oscargus reviewed

lib/matplotlib/backends/_backend_pdf_ps.py

+                  Returns
+                  -------
+                  fontTools.ttLib.ttFont.TTFont

Contributor

oscargus Mar 15, 2022 •

edited

Long term, it could make sense to link to the documentation? (Although fontTools is not in the intersphinx_mapping yet.)

QuLogic modified the milestones: v3.6.0, v3.7.0

tacaswell modified the milestones: v3.7.0, v3.8.0

Member

tacaswell commented Dec 9, 2022

We want to try to do the mpl 3.7 feature freeze around Jan 1, so I think it is unlikely we will have the bandwidth to get this rebased and reviewed by then. Pushing to 3.8.

jkseppan added 6 commits

December 9, 2022 15:35


          Remove ttconv

af47e06

Move the test in lib/matplotlib/tests/test_ttconv.py to
lib/matplotlib/tests/test_backend_pdf.py. Actually it has not been
a test of ttconv for a while, since type-3 conversion has not been
using ttconv.


          Type-42 fonts using fontTools

79c9186

Split _backend_pdf_ps.get_glyphs_subset into two functions:
get_glyphs_subset returns the font object (which needs to be
closed after using) and font_as_file serializes this object
into a BytesIO file-like object.


          Add several more droppable tables

5074a89

fontTools cannot subset these tables and drops them, so list them as droppable to avoid the warning.


          Select the first font from ttc files

f82c9cc


          Don't crash if the loca table is missing

cc58d0d

Also don't end up in an infinite loop if there is a larger gap
between breakpoints than we would like.


          Don't crash if glyph bounds cannot be found

ca60c94

jkseppan added 4 commits

December 9, 2022 15:39


          Log the name of unsupported fonts

b146d14

FT2Font throws RuntimeError for some fonts


          Get names from the original font

7d0c84e

Not the subsetted one, since in some cases fontTools.subset
does not produce a good name table.

Import functions from _backend_pdf_ps to help keep lines
shorter.


          A test that exercises the type-42 embedding

438f61e


          Document ttconv removal

a1b41fe

tacaswell force-pushed the remove-ttconv branch from 7cea2d4 to a1b41fe Compare

December 9, 2022 20:44

github-actions bot removed the status: needs rebase label

github-actions bot added the status: needs rebase label

anntzer mentioned this pull request

Use pybind11 in ttconv module #25253

Merged

Member

jklymak commented Feb 21, 2023

Do we have someone who can pick this up?

QuLogic mentioned this pull request

replace ttconv for ps/pdf #12418

Open

QuLogic linked an issue

that may be closed by this pull request

replace ttconv for ps/pdf #12418

Open

ksunden modified the milestones: v3.8.0, future releases
