cnminlangwebcollect

Language detection and website collection tool

Detects the languages used on Chinese minority websites and collects those websites into a dataset.

Language detection for Chinese minority websites, and collection of those websites

GitHub

1 star
2 watching
8 forks
Language: Python
Last commit: about 4 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| pemistahl/lingua | An accurate language detection library for Java and the JVM suitable for both short and long text inputs. | 707 |
| hashwin/scylla | A Ruby-based language detection tool that uses N-Gram based text categorization to identify the language of given text. | 36 |
| vseloved/wiki-lang-detect | Uses Wikipedia data to identify the language of unstructured text | 31 |
| unlyed/universal-language-detector | Detects and resolves the language used in user requests | 95 |
| hanzhenlei767/nlp_learn | A comprehensive collection of NLP-related code snippets and notes on various models and techniques, including pre-trained language models and Chinese text processing methods. | 25 |
| pemistahl/lingua-go | A library that accurately detects the language of short to long text inputs without requiring external APIs or configuration. | 1,190 |
| minibikini/paasaa | Tools for detecting the language of unstructured text in Elixir applications | 115 |
| jingzhang617/cod-rank-localize-and-segment | Develops a system to detect, segment, and rank camouflaged objects in images. | 74 |
| greyblake/whatlang-rs | A Rust library for detecting the language of text, including script recognition and reliability estimation. | 970 |
| olivomarco/lc4j | An open-source Java library implementing text categorization and language detection using N-grams. | 5 |
| abadojack/whatlanggo | A library for detecting and identifying languages in text | 643 |
| alvations/sugali | A system designed to identify the language of an arbitrary text string using machine learning and multiple data sources. | 2 |
| hemangsk/capacitor-mlkit-language | An Android and iOS plugin using ML Kit for language identification on device | 3 |
| detectlanguage/detectlanguage-go | A Go client for detecting the language of given text and interacting with the Detect Language API | 25 |
| cisnlp/glotlid | A language identification model that supports over 2000 languages and can be used for various NLP tasks. | 90 |