massive

Multilingual NLU dataset toolkit

A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset

Tools and Modeling Code for the MASSIVE dataset


541 stars
17 watching
57 forks
Language: Python
last commit: about 2 years ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
eleutherai/polyglot | Large language models designed to perform well in multiple languages and to address the performance shortcomings of existing multilingual models | 476
bilibili/index-1.9b | A lightweight multilingual language model with a long context length | 920
microsoft/unicoder | Pre-trained models and code for multilingual understanding and generation tasks | 89
fido-ai/ua-datasets | A collection of natural language processing datasets for Ukrainian | 57
citiususc/linguakit | A multilingual NLP toolkit covering a range of natural language processing tasks | 65
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 8
01-ai/yi | A series of large language models trained from scratch to excel at multiple NLP tasks | 7,743
vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation | 1,789
jd-aig/nlp_baai | A collection of natural language processing models and tools developed in a joint project between BAAI and JDAI | 254
goru001/inltk | A comprehensive toolkit for natural language processing in Indic languages, providing pre-trained models and datasets | 825
kimtaro/ve | A linguistic framework for natural language processing tasks | 216
chakki-works/chazutsu | A tool that simplifies preparing and manipulating natural language processing datasets | 243
xverse-ai/xverse-moe-a36b | Large multilingual language models built on a mixture-of-experts (MoE) architecture | 37
poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources | 7
felixgithub2017/mmcu | Measures the massive multitask Chinese understanding of large language models | 87