gv-crawl
Text aligner
Automates text extraction and alignment from Global Voices articles to create parallel corpora for low-resource languages.
Global Voices bitext crawler
9 stars
1 watching
4 forks
Language: Python
Last commit: over 10 years ago
Linked from 1 awesome list
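This listing does not say how gv-crawl aligns the extracted article text. As a generic illustration of the alignment step behind building a parallel corpus, here is a minimal sketch of a simplified length-based sentence aligner in the spirit of Gale and Church. It is not gv-crawl's method. The bead types, the penalty values in `BEAD_PENALTY`, and the cost in `length_cost` are hypothetical choices made only for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical penalties for each bead type (source sentences, target sentences).
# Non-1:1 beads cost extra so the aligner prefers one-to-one matches.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Penalise beads whose character lengths differ a lot (simplified, not Gale-Church's exact model)."""
    return (src_len - tgt_len) ** 2 / (src_len + tgt_len + 1)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Align two sentence lists with dynamic programming over bead types."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every bead moves forward, so row-major order visits each cell after all its predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the cheapest bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    english = ["Hello world.", "How are you?"]
    spanish = ["Hola mundo.", "¿Cómo estás?"]
    for s, t in align(english, spanish):
        print(s, "<->", t)
```

Because the 1-0 and 0-1 beads can always absorb leftover sentences, every input pair yields a complete alignment. A real crawler would typically add a lexical or translation-based score on top of raw lengths.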
Related projects:
| Description | Stars |
|---|---:|
| A Python script that captures data from vgchartz.com and saves it to a CSV file | 80 |
| A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
| A Python-based web crawler for scraping Github user and repository data. | 264 |
| A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 188 |
| A versatile web crawler built with modern tools and concurrency to handle various crawl tasks | 188 |
| A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036 |
| Evaluates and aligns the values of Chinese large language models with safety and responsibility standards | 481 |
| Analyzes text data to extract patterns of words or characters for password cracking and analysis purposes. | 28 |
| A benchmark suite for evaluating the performance of video generative models | 643 |
| Searches public pastebins for specified keywords and returns matching results | 428 |
| Automates the process of extracting parallel sentences from comparable corpora to aid in statistical machine translation | 127 |
| Crawls web pages at high speed. | 51 |
| A web crawler designed to efficiently collect and prioritize relevant content from the web | 459 |
| Enables data translation between Minecraft versions and platforms via an intermediate format. | 27 |
| A dungeon crawler game written in Python, featuring procedurally generated content and turn-based gameplay. | 57 |