GlotLID
Language identifier
A language identification model that supports over 2000 languages and can be used for various NLP tasks.
Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
90 stars
5 watching
7 forks
Language: Python
last commit: 22 days ago
Linked from 1 awesome list
glot, glotcc, glotlid, langid, language-classification, language-detection, language-detection-lib, language-detection-library, language-detector, language-identification, language-identification-toolkit, language-identifier, language-recognition, lid, low-resource-languages, low-resource-nlp, multlingual
Related projects:
Repository | Description | Stars |
---|---|---|
alvations/sugarlike | A tool that identifies languages in text by comparing them to a reference set of patterns. | 1 |
twerkmeister/ilid | A deep learning-based system for identifying spoken language in audio files. | 90 |
pld-linux/aspell-gl | A Galician language dictionary for use in spell-checking software. | 1 |
hashwin/scylla | A Ruby-based language detection tool that uses N-Gram based text categorization to identify the language of given text. | 36 |
karthikncode/nlp-datasets | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
alvations/sugali | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources. | 2 |
pemistahl/lingua-go | A library that accurately detects the language of short to long text inputs without requiring external APIs or configuration. | 1,190 |
abadojack/whatlanggo | A library for detecting and identifying languages in text. | 643 |
richardlitt/lrl | Developing tools and scripts to extract data from low-resource languages, focusing on language processing and machine learning applications. | 2 |
cltk/cltk | A Python library offering natural language processing capabilities for pre-modern languages. | 839 |
microgit-com/linguist.cr | An implementation of GitHub's Linguist for syntax highlighting and language detection in the Crystal programming language. | 8 |
pemistahl/lingua | An accurate language detection library for Java and the JVM suitable for both short and long text inputs. | 707 |
ydli-ai/csl | A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research. | 568 |
hyphenliu/cnminlangwebcollect | Detects languages of Chinese minority websites and collects them into a dataset. | 1 |
greyblake/whatlang-rs | A Rust library for detecting the language of text, including script recognition and reliability estimation. | 970 |