Updated Aug 10, 2021 - Makefile
#web-scraping
Here are 2,426 public repositories matching this topic...
List of libraries, tools and APIs for web scraping and data processing.
javascript
ruby
python
go
golang
php
awesome
proxy
proxy-server
web-scraping
awesome-list
proxy-list
proxylist
data-processing
captcha-solving
captcha-breaking
captcha-solver
anti-captcha
captcha-recognition
proxyserver
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
python
crawler
machine-learning
scraper
automation
ai
scraping
artificial-intelligence
web-scraping
scrape
webscraping
webautomation
Updated Feb 3, 2021 - Python
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
api
php
http
client
json
framework
curl
xml
proxy
restful
class
http-client
http-proxy
api-client
web-scraper
requests
web-scraping
php-curl
web-service
php-curl-library
Updated Aug 20, 2021 - PHP
Selenium-python but lighter: Helium is the best Python library for web automation.
Updated May 24, 2021 - Python
Web Scraping Framework
Updated Feb 22, 2021 - Python
A Devtools driver for web automation and scraping
testing
go
golang
scraper
automation
web
chrome-devtools
headless
devtools
web-scraping
cdp
chrome-headless
rod
chrome-devtools-protocol
devtools-protocol
gorod
Updated Aug 11, 2021 - Go
Learn Python for the next 30 (or so) Days.
python
api
flask
automation
tutorial
csv
jupyter
rest-api
selenium
pandas
python3
web-scraping
selenium-webdriver
fastapi
Updated Aug 23, 2021 - HTML
General Assembly's 2015 Data Science course in Washington, DC
python
data-science
machine-learning
natural-language-processing
course
clustering
naive-bayes
linear-regression
scikit-learn
jupyter-notebook
pandas
data-visualization
web-scraping
data-analysis
ensemble-learning
logistic-regression
decision-trees
regular-expressions
data-cleaning
model-evaluation
Updated Apr 18, 2016 - Jupyter Notebook
Snoop is a reconnaissance tool based on open data (OSINT world)
security
parser
osint
scanner
geo
geolocation
scraping
web-scraping
ip
geocoder
police
infosec
ctf
termux
pentest
nickname
blueteam
redteam
username-checker
username-search
Updated Aug 22, 2021 - Python
Collection of scripts corresponding to LucidProgramming YouTube tutorials
python
python3
web-scraping
youtube-tutorial
python-tutorial
ctci-solutions
lucidprogramming
python3-tutorial
technical-interview
Updated Feb 10, 2021 - Python
The Python Code Tutorials
python
python-tutorials
machine-learning
natural-language-processing
computer-vision
text-classification
tutorials
python3
web-scraping
face-detection
scapy
network-analysis
network-programming
programming-tutorial
ethical-hacking
network-security
socket-programming
scapy-tutorials
Updated Aug 21, 2021 - Jupyter Notebook
DataHen Till is a standalone tool that instantly makes your existing web scraper scalable, maintainable, and more unblockable, with minimal code changes on your scraper.
Updated Aug 18, 2021 - Go
Faster requests on Python 3
python
curl
high-performance
cython
python-library
web-scraper
python3
speed
open-data
http-requests
web-scraping
scrapy
ndjson
python-requests
urllib
download-file
urllib3
faster-than-requests
requests3
requests-toolbelt
Updated Jun 23, 2021 - Nim
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Updated Jun 23, 2021 - Ruby
Nextjs server to query websites with GraphQL
Updated Aug 13, 2021 - JavaScript
Random User-Agent middleware based on fake-useragent
Updated Sep 17, 2020 - Python
A JavaScript library for generating random user agents with data that's updated daily.
javascript
user-agent
random
randomization
navigator
web-scraping
browsers
browser-automation
user-agent-spoofer
Updated Aug 24, 2021 - JavaScript
UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++
opencv
automation
webassembly
web-scraping
autohotkey
browser-extension
imacros
selenium-ide
browser-automation
visual-recognition
sikulix
web-automation
ui-tests
uipath
data-driven-tests
Updated Jun 9, 2021 - JavaScript
A framework for creating semi-automatic web content extractors
python
crawler
tutorial
extractor
scraping
web-scraper
selector
css-selector
web-scraping
scrapy
scrapers
beautifulsoup
xpath-expression
lxml
selector-expression
Updated Oct 24, 2020 - Python
Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
Updated Aug 23, 2021 - Cython
ACHE is a web crawler for domain-specific search.
Updated Jul 17, 2021 - Java
NBA Stats API via Basketball Reference
Updated Aug 10, 2021 - HTML
anntdiv commented on Jun 10, 2021
Hello,
Thanks for the new update to the personal_info section.
I found that the 'certifications' attribute returns an empty list [].
Test url: https://www.linkedin.com/in/an-nguyen-9b3248122/
Results:
`{'personal_info': {'name': 'An Nguyen',
'headline': 'Data Scientist/Machine Learning Engineer',
'company': 'PERSOL PROCESS & TECHNOLOGY CO., LTD.',
'school': 'National Chiao Tung University',
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Updated May 19, 2021 - Go
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Updated Feb 15, 2021 - Python
A tool for crawling and analyzing keywords from Vietnamese online news sites
Updated Aug 22, 2021 - Python
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
nlp
crawler
text-mining
scraper
news
scraping
web-scraper
text-extraction
web-scraping
readability
tei
tei-xml
news-articles
html2text
news-crawler
article-extractor
news-scraper
text-cleaning
text-preprocessing
Updated Aug 23, 2021 - Python
The main examples on the Apify SDK webpage, GitHub repo, and CLI templates should demonstrate how to manipulate the DOM and retrieve data from it.
Also add one example of scraping with Apify SDK + jQuery to https://sdk.apify.com/docs/examples/basiccrawler
Feedback from: https://medium.com/better-programming/do-i-need-python-scrapy-to-build-a-web-scraper-7cc7cac2081d
I lost an hour trying to make