GlotLID
Language identifier
A language identification model that supports over 2000 languages and can be used for various NLP tasks.
Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
106 stars
5 watching
7 forks
Language: Python
last commit: 3 months ago
Linked from 1 awesome list
glot, glotcc, glotlid, langid, language-classification, language-detection, language-detection-lib, language-detection-library, language-detector, language-identification, language-identification-toolkit, language-identifier, language-recognition, lid, low-resource-languages, low-resource-nlp, multlingual
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A tool that identifies languages in text by comparing them to a reference set of patterns. | 1 |
| | A deep learning-based system for identifying spoken language in audio files. | 90 |
| | A Galician language dictionary for use in spell-checking software. | 1 |
| | A Ruby-based language detection tool that uses N-Gram based text categorization to identify the language of given text. | 36 |
| | A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources. | 2 |
| | A library that accurately detects the language of short to long text inputs without requiring external APIs or configuration. | 1,192 |
| | A library for detecting and identifying languages in text. | 644 |
| | Developing tools and scripts to extract data from low-resource languages, focusing on language processing and machine learning applications. | 2 |
| | A Python library offering natural language processing capabilities for pre-modern languages. | 843 |
| | An implementation of GitHub's Linguist for syntax highlighting and language detection in Crystal programming language. | 8 |
| | An accurate language detection library for Java and the JVM suitable for both short and long text inputs. | 716 |
| | A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research. | 582 |
| | Detects languages of Chinese minority websites and collects them into a dataset. | 1 |
| | A Rust library for detecting the language of text, including script recognition and reliability estimation. | 980 |