Tatoeba-Challenge
Translation dataset pack
A collection of machine translation datasets and tools to support real-world low-resource scenarios
811 stars
23 watching
90 forks
Language: Makefile
Last commit: 5 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
helsinki-nlp/xed | A multilingual dataset for sentiment analysis and emotion detection from movie subtitles. | 56 |
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation. | 30 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
alexa/massive | A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset. | 541 |
nlpodyssey/cybertron | A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation. | 293 |
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 57 |
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets. | 243 |
gopherdata/resources | A collection of Go-based resources and tools for data science tasks. | 879 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks. | 8 |
enginbozkurt/carla-training-data | Generates training data from the CARLA driving simulator in the KITTI dataset format for autonomous vehicle development. | 108 |
mikahama/uralicnlp | An NLP library providing morphological analysis and language modeling tools for Uralic languages and others. | 71 |
peleiden/daluke | A language model trained on Danish Wikipedia data for named entity recognition and masked language modeling. | 9 |
kimtaro/ve | A linguistic framework for natural language processing tasks. | 216 |
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 328 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms. | 1 |