massive
Multilingual NLU dataset toolkit
A collection of tools and modeling code for a large multilingual Natural Language Understanding dataset
Tools and Modeling Code for the MASSIVE dataset
538 stars
17 watching
57 forks
Language: Python
last commit: almost 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475 |
bilibili/index-1.9b | A lightweight, multilingual language model with a long context length | 904 |
microsoft/unicoder | This repository provides pre-trained models and code for understanding and generation tasks in multiple languages. | 88 |
fido-ai/ua-datasets | Provides a collection of datasets for natural language processing in Ukrainian. | 55 |
citiususc/linguakit | A multilingual NLP toolkit providing various natural language processing tasks | 65 |
nytud/hulu | A collection of linguistic datasets and benchmarks for natural language understanding tasks | 9 |
01-ai/yi | A series of large language models trained from scratch to excel in multiple NLP tasks | 7,699 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
jd-aig/nlp_baai | A collection of natural language processing models and tools for collaboration on a joint project between BAAI and JDAI. | 252 |
goru001/inltk | A comprehensive toolkit for Natural Language Processing tasks in Indic languages, providing pre-trained models and datasets. | 822 |
kimtaro/ve | A linguistic framework for natural language processing tasks. | 216 |
chakki-works/chazutsu | A tool that simplifies the process of preparing and manipulating natural language processing datasets | 243 |
xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with a mixture-of-experts architecture. | 36 |
poio-nlp/poio-corpus | A collection of language resources extracted from publicly available sources. | 7 |
felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multitask dataset. | 87 |