⚡️ Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets In Minutes ⚡️
Updated Jun 12, 2025 - TypeScript
Extract keywords from sentences or replace keywords in sentences.
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful for extracting the content of a table in a PDF file, for instance. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
ContextGem: Effortless LLM extraction from documents
Lightweight library for scraping websites with LLMs
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
📰 Let ChatGPT Summarize Hacker News for You
🚜 Parse text and tables from PDF files.
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Benchmarking PDF libraries
Undetected web-scraping & seamless HTML parsing in Python!
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Wikipedia information extraction library
A Python client for the Sypht API
This repository provides usage examples for the Python module Newspaper3k.
A Python utility to digitize plots.
Accurate, private and configurable document retrieval LLM