massive

Multilingual NLU dataset toolkit

A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset

Tools and Modeling Code for the MASSIVE dataset

GitHub

541 stars

17 watching

57 forks

Language: Python

Last commit: over 3 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

oroszgy/awesome-hungarian-nlp

Related projects:

Repository	Description	Stars
eleutherai/polyglot	Large language models designed to perform well in multiple languages and address performance issues with current multilingual models.	476
bilibili/index-1.9b	A lightweight, multilingual language model with a long context length	920
microsoft/unicoder	This repository provides pre-trained models and code for understanding and generation tasks in multiple languages.	89
fido-ai/ua-datasets	Provides a collection of datasets for natural language processing in Ukrainian.	57
citiususc/linguakit	A multilingual NLP toolkit providing various natural language processing tasks	65
nytud/hulu	A collection of linguistic datasets and benchmarks for natural language understanding tasks	8
01-ai/yi	A series of large language models trained from scratch to excel in multiple NLP tasks	7,743
vhellendoorn/code-lms	A guide to using pre-trained large language models in source code analysis and generation	1,789
jd-aig/nlp_baai	A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI.	254
goru001/inltk	A comprehensive toolkit for Natural Language Processing tasks in Indic languages, providing pre-trained models and datasets.	825
kimtaro/ve	A linguistic framework for natural language processing tasks.	216
chakki-works/chazutsu	A tool that simplifies the process of preparing and manipulating natural language processing datasets	243
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture.	37
poio-nlp/poio-corpus	A collection of language resources extracted from publicly available sources.	7
felixgithub2017/mmcu	A benchmark that measures massive multitask Chinese understanding in large language models	87