Tatoeba-Challenge

Translation dataset pack

A collection of machine translation datasets and tools to support real-world low-resource scenarios.

811 stars

23 watching

90 forks

Language: Makefile

last commit: almost 2 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

osyvokon/awesome-ukrainian-nlp

Related projects:

Repository	Description	Stars
helsinki-nlp/xed	A multilingual dataset for sentiment analysis and emotion detection from movie subtitles.	56
helsinki-nlp/ukrainianlt	A collection of Ukrainian language tools and resources for machine translation and natural language processing.	30
karthikncode/nlp-datasets	A curated list of Natural Language Processing datasets used to train and evaluate NLP models.	919
alexa/massive	A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset.	541
nlpodyssey/cybertron	A Go package providing an easy interface to use pre-trained NLP models from the HuggingFace repository for tasks like text classification and machine translation.	293
fido-ai/ua-datasets	Provides a collection of datasets for natural language processing in Ukrainian.	57
chakki-works/chazutsu	A tool that simplifies preparing and manipulating natural language processing datasets.	243
gopherdata/resources	A collection of Go-based resources and tools for data science tasks.	879
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks.	8
enginbozkurt/carla-training-data	Generates training data from the CARLA driving simulator in the KITTI dataset format for autonomous vehicle development.	108
mikahama/uralicnlp	An NLP library providing morphological analysis and language modeling tools for Uralic languages and others.	71
peleiden/daluke	A language model trained on Danish Wikipedia data for named entity recognition and masked language modeling.	9
kimtaro/ve	A linguistic framework for natural language processing tasks.	216
michael-wzhu/promptcblue	A large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain.	328
nytud/happ	A dataset of Hungarian translations of human-language examples to test anaphora resolution algorithms.	1