Tatoeba-Challenge

Translation dataset pack

A collection of machine translation datasets and tools to support real-world low-resource scenarios

GitHub

811 stars
23 watching
90 forks
Language: Makefile
last commit: 5 months ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
helsinki-nlp/xed | A multilingual dataset for sentiment analysis and emotion detection from movie subtitles. | 56
helsinki-nlp/ukrainianlt | A collection of Ukrainian language tools and resources for machine translation, natural language processing, and text translation. | 30
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919
alexa/massive | A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset | 541
nlpodyssey/cybertron | A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation. | 293
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 57
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243
gopherdata/resources | A collection of Go-based resources and tools for data science tasks | 879
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 8
enginbozkurt/carla-training-data | Generates training data from the Carla driving simulator in the KITTI dataset format for autonomous vehicle development | 108
mikahama/uralicnlp | An NLP library providing morphological analysis and language modeling tools for Uralic languages and others. | 71
peleiden/daluke | A language model trained on Danish Wikipedia data for named entity recognition and masked language modeling | 9
kimtaro/ve | A linguistic framework for natural language processing tasks. | 216
michael-wzhu/promptcblue | A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain | 328
nytud/happ | A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms | 1