lassie

Content scraper

Library for retrieving basic content from websites

Web Content Retrieval for Humans™

GitHub

613 stars
22 watching
49 forks
Language: HTML
last commit: over 2 years ago
Linked from 2 awesome lists

content, meta, oembed, python, requests

Related projects:

Repository | Description | Stars
kennethreitz/requests-html | A Pythonic HTML parsing library providing intuitive and asynchronous web scraping capabilities. | 303
meilisearch/docs-scraper | Automates scraping and indexing of documentation content into a search engine | 290
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315
malfrats/xeuledoc | A tool to fetch information about public Google documents from various services | 846
laramies/metagoofil | Extracts metadata from public documents available on websites | 1,028
aantron/lambdasoup | A functional HTML scraping and manipulation library | 383
the-markup/blacklight-collector | A tool for scraping website content and analyzing browser behavior | 202
needmorecowbell/giggity | A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories. | 126
scrapy/scrapely | A pure-python library for extracting structured data from HTML pages. | 1,863
mdsecactivebreach/linkedint | A Python-based tool for extracting and analyzing LinkedIn data for reconnaissance purposes during adversary simulation. | 476
medialab/minet | A command line tool and Python library for extracting data from various web sources. | 286
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104
propublica/upton | A web scraping framework that simplifies the process by handling repetitive tasks and provides options for efficient data retrieval | 1,613
localvoid/ndx | A lightweight library for full-text indexing and searching | 153
benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 681