Multilingual-Latent-Dirichlet-Allocation-LDA
Clustering tool
An LDA-based text clustering pipeline for multiple languages
A multilingual Latent Dirichlet Allocation (LDA) pipeline with stop-word removal, n-gram features, and inverse stemming, written in Python.
82 stars
10 watching
29 forks
Language: Python
Last commit: 4 months ago
Linked from 2 awesome lists
clustering, english, french, latent-dirichlet-allocation, lda, machine-learning, multilingual, natural-language-processing
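The preprocessing steps named in the description (stop-word removal, n-gram features, inverse stemming) can be illustrated with a minimal, standard-library-only sketch. This is not the repository's API: `STOP_WORDS`, `toy_stem`, `preprocess`, and `inverse_stem` are hypothetical names, the stop-word lists are deliberately tiny, and the suffix stripper only stands in for a real per-language stemmer. The resulting n-gram lists are the kind of bag-of-words input an LDA implementation would typically be fit on.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word lists; a real pipeline would use full per-language lists.
STOP_WORDS = {
    "en": {"the", "a", "of", "and", "is", "are", "in"},
    "fr": {"le", "la", "les", "de", "et", "est", "un", "une"},
}


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(docs, language="en", max_n=2):
    """Return per-document n-grams over stems, plus stem -> surface-form counts."""
    stop = STOP_WORDS[language]
    surface_forms = defaultdict(Counter)
    features = []
    for doc in docs:
        # Lowercase, split on whitespace, drop non-alphabetic tokens and stop words.
        tokens = [t for t in doc.lower().split() if t.isalpha() and t not in stop]
        stems = []
        for tok in tokens:
            stem = toy_stem(tok)
            surface_forms[stem][tok] += 1  # remember which word produced the stem
            stems.append(stem)
        # Unigrams first, then bigrams, up to max_n.
        grams = [
            " ".join(stems[i : i + n])
            for n in range(1, max_n + 1)
            for i in range(len(stems) - n + 1)
        ]
        features.append(grams)
    return features, surface_forms


def inverse_stem(ngram, surface_forms):
    """Map each stem in an n-gram back to its most frequent original word."""
    return " ".join(surface_forms[s].most_common(1)[0][0] for s in ngram.split())


docs = ["The cats are chasing the cat toys", "Cats and dogs chasing toys"]
features, forms = preprocess(docs, language="en")
print(features[0])
print(inverse_stem("chas toy", forms))
```

Inverse stemming matters mainly for readability: topics are learned over stems such as `chas toy`, but are reported with the most common surface form of each stem, here `chasing toys`.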
Related projects:
Repository | Description | Stars |
---|---|---|
ealdent/lda-ruby | A Ruby wrapper around an existing C implementation of Latent Dirichlet Allocation (LDA) for topic modeling in natural language processing. | 133 |
primaryobjects/lda | A JavaScript library that uses Latent Dirichlet Allocation (LDA) to model topics in text data. | 291 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,861 |
james-bowman/nlp | This project provides a set of algorithms and implementations for natural language processing in Go. | 450 |
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model. | 508 |
mmihaltz/trendminer-hunlp | An NLP processing pipeline designed to handle the unique characteristics of social media text data in Hungarian. | 5 |
richardlitt/lrl | Developing tools and scripts to extract data from low-resource languages, focusing on language processing and machine learning applications. | 2 |
ldmt-muri/morpholm | This project develops language models that incorporate morphological knowledge to improve their understanding of linguistic structures and relationships. | 3 |
adbar/simplemma | A lemmatization tool for natural language processing. | 145 |
lowresourcelanguages/hltdi-morphology | Provides morphological analysis tools for various languages, including verb and noun generation, based on archived web pages. | 5 |
scoder/lupa | A wrapper around Lua or LuaJIT that enables fast and efficient integration of dynamic languages into Python applications. | 1,018 |
eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475 |
slycoder/topicmodels.jl | A Julia package implementing Bayesian topic modeling with the Latent Dirichlet Allocation (LDA) model. | 38 |
zaibacu/rita-dsl | A DSL for building custom NLP patterns from manual language rules. | 65 |
ldmt-muri/alignment-with-openfst | An implementation of a CRF autoencoder framework for aligning text data. | 21 |