The Wayback Machine - https://web.archive.org/web/20210824200749/https://github.com/topics/web-scraping

web-scraping

Here are 2,426 public repositories matching this topic...

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

Updated Aug 10, 2021
Makefile

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

python crawler machine-learning scraper automation ai scraping artificial-intelligence web-scraping scrape webscraping webautomation

Updated Feb 3, 2021
Python

apify / apify-js

Open

Update main examples to include DOM manipulation

1

mtrunkat commented Sep 17, 2019

The main examples on the Apify SDK webpage, GitHub repo, and CLI templates should demonstrate how to manipulate the DOM and retrieve data from it.

Also add one example of scraping with Apify SDK + jQuery to https://sdk.apify.com/docs/examples/basiccrawler

Feedback from: https://medium.com/better-programming/do-i-need-python-scrapy-to-build-a-web-scraper-7cc7cac2081d

I lost an hour trying to make

good first issue

Open

dataset.delete() throws unfriendly error when empty

4

Open

Improve error messages

1

php-curl-class / php-curl-class

PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs

Updated Aug 20, 2021
PHP

mherrmann / selenium-python-helium

Selenium-python but lighter: Helium is the best Python library for web automation.

python firefox chrome webdriver selenium python3 web-scraping helium web-automation selenium-python

Updated May 24, 2021
Python

lorien / grab

Web Scraping Framework

python framework spider asynchronous network http-client web-scraping pycurl urllib3

Updated Feb 22, 2021
Python

go-rod / rod

A Devtools driver for web automation and scraping

testing go golang scraper automation web chrome-devtools headless devtools web-scraping cdp chrome-headless rod chrome-devtools-protocol devtools-protocol gorod

Updated Aug 11, 2021
Go

codingforentrepreneurs / 30-Days-of-Python

Learn Python for the next 30 (or so) Days.

python api flask automation tutorial csv jupyter rest-api selenium pandas python3 web-scraping selenium-webdriver fastapi

Updated Aug 23, 2021
HTML

justmarkham / DAT8

General Assembly's 2015 Data Science course in Washington, DC

python data-science machine-learning natural-language-processing course clustering naive-bayes linear-regression scikit-learn jupyter-notebook pandas data-visualization web-scraping data-analysis ensemble-learning logistic-regression decision-trees regular-expressions data-cleaning model-evaluation

Updated Apr 18, 2016
Jupyter Notebook

tidyverse / rvest

Simple web scraping for R

html r web-scraping

Updated Jul 30, 2021
R

snooppr / snoop

Snoop: an open-source intelligence gathering tool (OSINT world)

Updated Aug 22, 2021
Python

vprusso / youtube_tutorials

Collection of scripts corresponding to LucidProgramming YouTube tutorials

python python3 web-scraping youtube-tutorial python-tutorial ctci-solutions lucidprogramming python3-tutorial technical-interview

Updated Feb 10, 2021
Python

x4nth055 / pythoncode-tutorials

The Python Code Tutorials

python python-tutorials machine-learning natural-language-processing computer-vision text-classification tutorials python3 web-scraping face-detection scapy network-analysis network-programming programming-tutorial ethical-hacking network-security socket-programming scapy-tutorials

Updated Aug 21, 2021
Jupyter Notebook

DataHenHQ / till

DataHen Till is a standalone tool that instantly makes your existing web scraper scalable, maintainable, and more unblockable, with minimal code changes on your scraper.

crawler scraper scraping mitm proxy-server web-scraping man-in-the-middle

Updated Aug 18, 2021
Go

juancarlospaco / faster-than-requests

Faster requests on Python 3

Updated Jun 23, 2021
Nim

postmodern / spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

ruby crawler scraper web spider web-crawler web-scraper web-scraping web-spider spider-links

Updated Jun 23, 2021
Ruby

dinubs / coolqlcool

Nextjs server to query websites with GraphQL

javascript graphql schema nextjs web-scraping

Updated Aug 13, 2021
JavaScript

alecxe / scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent

python web-scraping scrapy

Updated Sep 17, 2020
Python

intoli / user-agents

A JavaScript library for generating random user agents with data that's updated daily.

javascript user-agent random randomization navigator web-scraping browsers browser-automation user-agent-spoofer

Updated Aug 24, 2021
JavaScript

A9T9 / RPA

UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++

opencv automation webassembly web-scraping autohotkey browser-extension imacros selenium-ide browser-automation visual-recognition sikulix web-automation ui-tests uipath data-driven-tests

Updated Jun 9, 2021
JavaScript

AlexMathew / scrapple

A framework for creating semi-automatic web content extractors

python crawler tutorial extractor scraping web-scraper selector css-selector web-scraping scrapy scrapers beautifulsoup xpath-expression lxml selector-expression

Updated Oct 24, 2020
Python

rushter / selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

css python parser html5 web-scraping modest-engine

Updated Aug 23, 2021
Cython

VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.

web-crawler web-scraping web-spider focused-crawler domain-specific-search web-search

Updated Jul 17, 2021
Java

jaebradley / basketball_reference_web_scraper

NBA Stats API via Basketball Reference

python nba web-scraper web-scraping basketball-reference

Updated Aug 10, 2021
HTML

austinoboyle / scrape-linkedin-selenium

Open

Certifications return empty []

2

anntdiv commented Jun 10, 2021

Hello,
Thanks for the new update to the personal_info section.
I found that the 'certifications' attribute returns an empty list [].
Test url: https://www.linkedin.com/in/an-nguyen-9b3248122/
Results:
`{'personal_info': {'name': 'An Nguyen',
'headline': 'Data Scientist/Machine Learning Engineer',
'company': 'PERSOL PROCESS & TECHNOLOGY CO., LTD.',
'school': 'National Chiao Tung University',

help wanted good first issue

Open

Companyscraper doesn't work and returns error 'NoneType'

3

Open

Scrape linkedin posts

3

infinitbyte / gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

lightweight elasticsearch crawler spider web-crawler scraping crawling web-scraping web-spider

Updated May 19, 2021
Go

csu / quora-api

An unofficial API for Quora.

python api flask rest-api web-api web-scraping quora quora-api

Updated Oct 9, 2016
Python

sangaline / wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

python web-scraping command-line-tool wayback-machine wayback-archiver archive-dot-org

Updated Feb 15, 2021
Python

hailoc12 / docbao

A tool for crawling and analyzing keywords from Vietnamese online news sites

python3 web-scraping newspaper-crawler facebook-crawler made-in-vietnam

Updated Aug 22, 2021
Python

adbar / trafilatura

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

nlp crawler text-mining scraper news scraping web-scraper text-extraction web-scraping readability tei tei-xml news-articles html2text news-crawler article-extractor news-scraper text-cleaning text-preprocessing

Updated Aug 23, 2021
Python
