gv-crawl
Text aligner
Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages.
Global Voices bitext crawler
9 stars
1 watching
4 forks
Language: Python
Last commit: about 10 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
gregorut/vgchartzscrape | A Python script that captures data from vgchartz.com and saves it to a CSV file | 79 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks | 187 |
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
x-plug/cvalues | Evaluates and aligns the values of Chinese large language models with safety and responsibility standards | 477 |
0xvavaldi/gramify | Analyzes text data to extract patterns of words or characters for password cracking and analysis purposes. | 28 |
vchitect/vbench | A tool for evaluating and benchmarking video generative models in computer vision and artificial intelligence | 576 |
kahunalu/pwnbin | Searches public pastebins for specified keywords and returns matching results | 427 |
machinalis/yalign | Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 |
a11ywatch/crawler | Performs web page crawling at high performance. | 49 |
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web | 454 |
gentlegiantjgc/pymctranslate | Enables data translation between Minecraft versions and platforms via an intermediate format. | 27 |
jwvhewitt/dmeternal | A dungeon crawler game written in Python, featuring procedurally generated content and turn-based gameplay. | 57 |