Python
news-please - an integrated web crawler and information extractor for news that just works - GitHub - fhamborg/news-please: news-please - an integrated web crawler and information extractor for new...
News, full-text, and article metadata extraction in Python 3. Advanced docs: - GitHub - codelucas/newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
A service daemon to run Scrapy spiders. Contribute to scrapy/scrapyd development by creating an account on GitHub.
(Proxy UI + Scrape)
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era - GitHub - imWildCat/scylla: Intelligent proxy pool for Humans™ t...
Every web site provides APIs. Contribute to elliotgao2/toapi development by creating an account on GitHub.
Incredibly fast crawler designed for OSINT. Contribute to s0md3v/Photon development by creating an account on GitHub.
admin ui for scrapy/open source scrapinghub. Contribute to DormyMo/SpiderKeeper development by creating an account on GitHub.
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js - GitHub - Gerapy/Gerapy: Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO :point_right: - GitHub - my8100/scrapydweb: We...
Golang
Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架 - GitHub - crawlab-team/crawlab: Distributed web crawler admin platform for...
SDK for Crawlab, including SDK for different programming languages such as Python, Node.js and Java, and a CLI Tool written in Python. - GitHub - crawlab-team/crawlab-sdk: SDK for Crawlab, includin...
Elegant Scraper and Crawler Framework for Golang. Contribute to gocolly/colly development by creating an account on GitHub.
Declarative web scraping. Contribute to MontFerret/ferret development by creating an account on GitHub.
Extract structured data from web sites. Web sites scraping. - GitHub - slotix/dataflowkit: Extract structured data from web sites. Web sites scraping.
A standalone and scriptable web scraper in Go. Contribute to philippta/flyscrape development by creating an account on GitHub.
Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application - GitHub - hakluke/hakrawler: Simple, fast web crawler designed for easy, quick discover...
Takes a list of URLs and returns their HTTP response codes - GitHub - hakluke/hakcheckurl: Takes a list of URLs and returns their HTTP response codes
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering. - GitHub - geziyor/geziyor: Geziyor, blazing fast web crawling & scraping framework for Go. Supports J...
Rust
The fastest web crawler written in Rust. Contribute to spider-rs/spider development by creating an account on GitHub.
Javascript
Crawlee—A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast. - GitHub - apify/crawlee: Crawlee—A web scraping and browser automation library for N...
Web Crawler/Spider for NodeJS + server-side jQuery ;-) - GitHub - bda-research/node-crawler: Web Crawler/Spider for NodeJS + server-side jQuery ;-)
Lightweight scraper for Google News. Contribute to lewisdonovan/google-news-scraper development by creating an account on GitHub.
Web service for web page to Markdown conversion. Contribute to macsplit/urltomarkdown development by creating an account on GitHub.