GlotLID

Language identifier

A language identification model that supports over 2000 languages and can be used for various NLP tasks.

Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
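
A language identifier of this kind typically returns a ranked list of labels with confidence scores. The sketch below shows one way to post-process such output. It assumes fastText-style labels of the form `__label__eng_Latn` (an ISO 639-3 language code plus an ISO 15924 script code); the names `parse_label`, `top_prediction`, `LangPrediction`, and the `min_confidence` threshold are hypothetical helpers for illustration, not part of the GlotLID API.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

# Assumption: labels carry fastText's default "__label__" prefix.
LABEL_PREFIX = "__label__"


class LangPrediction(NamedTuple):
    language: str      # e.g. "eng" (assumed ISO 639-3 code)
    script: str        # e.g. "Latn" (assumed ISO 15924 code)
    confidence: float


def parse_label(label: str) -> Tuple[str, str]:
    """Split a label such as '__label__eng_Latn' into ('eng', 'Latn')."""
    code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    language, _, script = code.partition("_")
    return language, script


def top_prediction(
    labels: Sequence[str],
    scores: Sequence[float],
    min_confidence: float = 0.5,  # hypothetical cut-off, tune per use case
) -> Optional[LangPrediction]:
    """Return the highest-scoring prediction, or None if it is too uncertain."""
    if not labels:
        return None
    best = max(range(len(labels)), key=lambda i: scores[i])
    if scores[best] < min_confidence:
        return None
    language, script = parse_label(labels[best])
    return LangPrediction(language, script, float(scores[best]))


if __name__ == "__main__":
    # Made-up scores standing in for a model's output.
    labels = ("__label__eng_Latn", "__label__deu_Latn")
    scores = (0.93, 0.04)
    print(top_prediction(labels, scores))
```

Rejecting low-confidence predictions matters more as the label set grows: with over 2000 candidate languages, closely related varieties are easy to confuse on short inputs, so returning `None` is often safer than a weak guess.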

GitHub

106 stars
5 watching
7 forks
Language: Python
last commit: 19 days ago
Linked from 1 awesome list

glot, glotcc, glotlid, langid, language-classification, language-detection, language-detection-lib, language-detection-library, language-detector, language-identification, language-identification-toolkit, language-identifier, language-recognition, lid, low-resource-languages, low-resource-nlp, multlingual

Related projects:

Repository | Description | Stars
alvations/sugarlike | A tool that identifies languages in text by comparing them to a reference set of patterns. | 1
twerkmeister/ilid | A deep learning-based system for identifying spoken language in audio files. | 90
pld-linux/aspell-gl | A Galician language dictionary for use in spell-checking software. | 1
hashwin/scylla | A Ruby-based language detection tool that uses n-gram-based text categorization to identify the language of a given text. | 36
karthikncode/nlp-datasets | A curated list of natural language processing datasets used to train and evaluate NLP models. | 919
alvations/sugali | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources. | 2
pemistahl/lingua-go | A library that accurately detects the language of short to long text inputs without requiring external APIs or configuration. | 1,192
abadojack/whatlanggo | A library for detecting and identifying languages in text. | 644
richardlitt/lrl | Tools and scripts for extracting data from low-resource languages, focusing on language processing and machine learning applications. | 2
cltk/cltk | A Python library offering natural language processing capabilities for pre-modern languages. | 843
microgit-com/linguist.cr | An implementation of GitHub's Linguist for syntax highlighting and language detection in the Crystal programming language. | 8
pemistahl/lingua | An accurate language detection library for Java and the JVM, suitable for both short and long text inputs. | 716
ydli-ai/csl | A large-scale dataset of Chinese scientific literature, providing tools and benchmarks for NLP research. | 582
hyphenliu/cnminlangwebcollect | Detects the languages of Chinese minority websites and collects them into a dataset. | 1
greyblake/whatlang-rs | A Rust library for detecting the language of text, including script recognition and reliability estimation. | 980