lassie

Content scraper

Library for retrieving basic content from websites

Web Content Retrieval for Humans™
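
To make "basic content" concrete, here is a minimal standard-library sketch of the kind of data such a library might pull from a page: the title, the meta description, and Open Graph tags. It is not lassie's implementation or API. The names `BasicContentParser`, `extract_basic_content`, and `SAMPLE_HTML` are hypothetical and exist only for illustration, and the sample parses an inline HTML string instead of fetching a live URL.

```python
from html.parser import HTMLParser


class BasicContentParser(HTMLParser):
    """Hypothetical helper: collects title, meta description, and Open Graph tags."""

    def __init__(self):
        super().__init__()
        self.content = {"title": None, "description": None, "open_graph": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            value = attrs.get("content")
            if value is None:
                return
            prop = attrs.get("property") or ""
            if attrs.get("name") == "description":
                self.content["description"] = value
            elif prop.startswith("og:"):
                # Store "og:image" under the key "image", and so on.
                self.content["open_graph"][prop[3:]] = value

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so append each one.
        if self._in_title:
            self.content["title"] = (self.content["title"] or "") + data


def extract_basic_content(html):
    """Parse an HTML string and return the collected content as a dict."""
    parser = BasicContentParser()
    parser.feed(html)
    parser.close()
    return parser.content


# Assumed sample page, used in place of a real HTTP request.
SAMPLE_HTML = (
    "<html><head><title>Example page</title>"
    '<meta name="description" content="A short summary.">'
    '<meta property="og:title" content="Example OG title">'
    '<meta property="og:image" content="https://example.com/cover.png">'
    "</head><body><p>Hello</p></body></html>"
)

if __name__ == "__main__":
    print(extract_basic_content(SAMPLE_HTML))
```

In practice the HTML would come from an HTTP client such as requests, and a real library would also handle oEmbed endpoints, relative image URLs, and malformed markup, none of which this sketch attempts.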

GitHub

615 stars
22 watching
49 forks
Language: HTML
Last commit: over 2 years ago
Linked from 2 awesome lists

content, meta, oembed, python, requests


Related projects:

Repository | Description | Stars
kennethreitz/requests-html | A Pythonic HTML parsing library providing intuitive and asynchronous web scraping capabilities. | 304
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine. | 297
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services. | 856
laramies/metagoofil | Extracts metadata from public documents found on websites, useful for brute-force attacks. | 1,050
aantron/lambdasoup | A functional HTML scraping and manipulation library in OCaml. | 384
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior. | 205
needmorecowbell/giggity | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 127
scrapy/scrapely | A pure-Python library for extracting structured data from HTML pages. | 1,865
mdsecactivebreach/linkedint | A Python-based tool for extracting and analyzing LinkedIn data for reconnaissance purposes during adversary simulation. | 478
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 293
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104
propublica/upton | A web scraping framework that simplifies the process by handling repetitive tasks and provides options for efficient data retrieval. | 1,612
localvoid/ndx | A lightweight library for full-text indexing and searching. | 153
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 690