The Wayback Machine - https://web.archive.org/web/20220505222511/https://github.com/topics/document-processing
Here are 14 public repositories matching this topic...
Generic framework for historical document processing
Updated
Jul 9, 2021
Python
An include filter for Pandoc
Updated
Dec 6, 2020
Haskell
Semantic extraction from conference proceedings.
Updated
Jul 26, 2020
Python
Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org).
tokyo, a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
Updated
Jun 13, 2020
Clojure
A module for creating stopword lists for any language, based on a set of documents.
Updated
Apr 15, 2022
JavaScript
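The idea of deriving a stopword list from a set of documents can be illustrated with a minimal Python sketch. This is not the listed module's API or method (its internals are not described here); it assumes one simple heuristic, document frequency, where a word is treated as a stopword if it appears in a large share of the documents. The function name `build_stopword_list` and the parameter `min_doc_ratio` are hypothetical.

```python
import re
from collections import Counter


def build_stopword_list(documents, min_doc_ratio=0.8):
    """Return the words that occur in at least min_doc_ratio of the documents.

    Hypothetical helper for illustration only; the threshold heuristic is an
    assumption, not the approach of any repository listed on this page.
    """
    doc_freq = Counter()
    for text in documents:
        # Use a set so each word is counted once per document.
        words = set(re.findall(r"\w+", text.lower()))
        doc_freq.update(words)
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, count in doc_freq.items() if count >= threshold)


docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A bird sang in the tree.",
]
print(build_stopword_list(docs, min_doc_ratio=0.6))  # ['cat', 'the']
```

Because `\w+` matches Unicode word characters in Python 3, the same sketch works for many languages without a language-specific tokenizer, though languages written without spaces between words would need a proper segmenter.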
This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified and returned. Tables are retrieved formatted as a CSV.
Updated
Sep 11, 2020
Python
A Python command-line utility for automating some copyediting tasks in documents. It allows editing zipped, XML-based files (e.g. docx, odt, or epub) through XSLT stylesheets, and can be extended fairly easily with your own custom XSL stylesheets.
Updated
Jul 17, 2018
XSLT
A document preprocessor that works in conjunction with tools like groff/troff & refer.
Updated
Jan 17, 2022
Jupyter Notebook
School/College Stationery List OCR and Parsing
Convert scans of handwritten notes to PDF.
An implementation of basic IR techniques from scratch.
Updated
May 24, 2019
Python
Apply keyword procedures in a given Racket namespace using X-expressions.
Updated
May 5, 2020
Racket