gv-crawl
Text aligner
Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages.
Global Voices bitext crawler
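As a rough illustration of what aligning an article with its translation can involve, the sketch below pairs paragraphs by position. It is not taken from gv-crawl: the function names `split_paragraphs` and `align_by_position` are hypothetical, and it assumes both texts are already available as plain text with blank lines between paragraphs.

```python
from typing import List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split plain text into non-empty paragraphs separated by blank lines."""
    blocks = text.replace("\r\n", "\n").split("\n\n")
    # Collapse internal whitespace so line wrapping does not affect the output.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def align_by_position(source: str, target: str) -> List[Tuple[str, str]]:
    """Pair the i-th source paragraph with the i-th target paragraph.

    Assumption for this sketch: a translation that keeps the paragraph
    structure of the original has the same paragraph count. If the counts
    differ, the pair is treated as unreliable and nothing is returned.
    """
    src = split_paragraphs(source)
    tgt = split_paragraphs(target)
    if len(src) != len(tgt):
        return []
    return list(zip(src, tgt))


if __name__ == "__main__":
    original = "First paragraph.\n\nSecond paragraph."
    translation = "Premier paragraphe.\n\nDeuxième paragraphe."
    for src_par, tgt_par in align_by_position(original, translation):
        print(f"{src_par}\t{tgt_par}")
```

Positional pairing is the simplest possible heuristic. Real bitext pipelines usually add sentence splitting and a length-based or lexical alignment step on top of it.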
9 stars
1 watching
4 forks
Language: Python
Last commit: about 11 years ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| A Python script that captures data from vgchartz.com and saves it to a CSV file. | 80 |
| A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
| A Python-based web crawler for scraping GitHub user and repository data. | 264 |
| A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 188 |
| A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 188 |
| A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036 |
| Evaluates and aligns the values of Chinese large language models with safety and responsibility standards. | 481 |
| Analyzes text data to extract patterns of words or characters for password cracking and analysis purposes. | 28 |
| A benchmark suite for evaluating the performance of video generative models. | 643 |
| Searches public pastebins for specified keywords and returns matching results. | 428 |
| Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation. | 127 |
| Performs web page crawling at high performance. | 51 |
| A web crawler designed to efficiently collect and prioritize relevant content from the web. | 459 |
| Enables data translation between Minecraft versions and platforms via an intermediate format. | 27 |
| A dungeon crawler game written in Python, featuring procedurally generated content and turn-based gameplay. | 57 |