Tatoeba-Challenge

Translation dataset pack

A collection of machine translation datasets and tools to support real-world low-resource scenarios


804 stars
23 watching
91 forks
Language: Makefile
Last commit: 3 months ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
helsinki-nlp/xed | A multilingual dataset for sentiment analysis and emotion detection from movie subtitles | 56
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation | 30
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models | 919
alexa/massive | A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset | 538
nlpodyssey/cybertron | A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation | 286
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian | 56
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 876
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 9
enginbozkurt/carla-training-data | Generates training data from the Carla driving simulator in the KITTI dataset format for autonomous vehicle development | 108
mikahama/uralicnlp | An NLP library providing morphological analyses and lemmatization tools for various languages, including Uralic and some European languages | 70
peleiden/daluke | A language model trained on Danish Wikipedia data for named entity recognition and masked language modeling | 9
kimtaro/ve | A linguistic framework for natural language processing tasks | 216
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 323
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1