Tatoeba-Challenge
Translation dataset pack
A collection of machine translation datasets and tools to support real-world low-resource scenarios
804 stars
23 watching
91 forks
Language: Makefile
Last commit: 3 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
helsinki-nlp/xed | A multilingual dataset for sentiment analysis and emotion detection from movie subtitles. | 56 |
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation. | 30 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
alexa/massive | A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset. | 538 |
nlpodyssey/cybertron | A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation. | 286 |
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 56 |
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets. | 243 |
gopherdata/resources | A collection of Go-based resources and tools for data science tasks. | 876 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks. | 9 |
enginbozkurt/carla-training-data | Generates training data from the Carla driving simulator in the KITTI dataset format for autonomous vehicle development. | 108 |
mikahama/uralicnlp | An NLP library providing morphological analyses and lemmatization tools for various languages, including Uralic and some European languages. | 70 |
peleiden/daluke | A language model trained on Danish Wikipedia data for named entity recognition and masked language modeling. | 9 |
kimtaro/ve | A linguistic framework for natural language processing tasks. | 216 |
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain. | 323 |
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms. | 1 |