lassie

Content scraper

Library for retrieving basic content from websites

Web Content Retrieval for Humans™

615 stars

22 watching

49 forks

Language: HTML

last commit: about 4 years ago

Linked from 2 awesome lists

content, meta, oembed, python, requests

lassie.readthedocs.org

Related projects:

Repository	Description	Stars
kennethreitz/requests-html	A Pythonic HTML parsing library providing intuitive and asynchronous web scraping capabilities.	304
meilisearch/docs-scraper	Automates scraping and indexing of documentation content into a search engine	297
felipecsl/wombat	A Ruby-based web crawler and data extraction tool with an elegant DSL.	1,315
malfrats/xeuledoc	A tool to fetch information about public Google documents from various services	856
laramies/metagoofil	Extracts metadata from public documents found on websites, useful for brute-force attacks.	1,050
aantron/lambdasoup	A functional HTML scraping and manipulation library in OCaml	384
the-markup/blacklight-collector	A tool for scraping website content and analyzing browser behavior	205
needmorecowbell/giggity	A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories.	127
scrapy/scrapely	A pure-python library for extracting structured data from HTML pages.	1,865
mdsecactivebreach/linkedint	A Python-based tool for extracting and analyzing LinkedIn data for reconnaissance purposes during adversary simulation.	478
medialab/minet	A command line tool and Python library for extracting data from various web sources.	293
miyagawa/web-scraper	A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface.	104
propublica/upton	A web scraping framework that simplifies the process by handling repetitive tasks and provides options for efficient data retrieval	1,612
localvoid/ndx	A lightweight library for full-text indexing and searching	153
benibela/xidel	A tool to extract data from web pages using various query languages and selectors.	690