massive

Multilingual NLU dataset toolkit

A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset

Tools and Modeling Code for the MASSIVE dataset


538 stars
17 watching
57 forks
Language: Python
last commit: almost 2 years ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475
bilibili/index-1.9b | A lightweight, multilingual language model with a long context length. | 904
microsoft/unicoder | This repository provides pre-trained models and code for understanding and generation tasks in multiple languages. | 88
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 55
citiususc/linguakit | A multilingual NLP toolkit providing various natural language processing tasks. | 65
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks. | 9
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks. | 7,699
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation. | 1,782
jd-aig/nlp_baai | A collection of natural language processing models and tools from a joint project between BAAI and JDAI. | 252
goru001/inltk | A comprehensive toolkit for Natural Language Processing tasks in Indic languages, providing pre-trained models and datasets. | 822
kimtaro/ve | A linguistic framework for natural language processing tasks. | 216
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets. | 243
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts (MoE) architecture. | 36
poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources. | 7
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87