spanish-corpora
Spanish Corpus
A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks.
Unannotated Spanish 3 Billion Words Corpora
92 stars
4 watching
10 forks
Language: Python
last commit: over 2 years ago
Linked from 1 awesome list
Topics: corpora, linguistics, natural-language-processing, nlp, spanish, spanish-language
Related projects:
Repository | Description | Stars |
---|---|---|
| A collection of linguistic resources and trained word embeddings for the Spanish language. | 45 |
| A collection of Galician language data in JSON format. | 2 |
| A collection of small datasets from various languages to test and evaluate NLP scripts | 3 |
| A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 354 |
| This project generates Spanish word embeddings using fastText on large corpora. | 9 |
| This is a collection of annotated text data for the Galician language. | 1 |
| A BERT-architecture language model pre-trained on Spanish text data | 490 |
| A multilingual parallel corpus created from translations of the Bible. | 177 |
| A Python library to access and manipulate linguistically annotated corpus files in various formats. | 16 |
| A dataset and annotation scheme for Hungarian causal reasoning tasks. | 1 |
| This project trains a machine learning model to generate sentence embeddings from Spanish text data using the sent2vec algorithm. | 4 |
| A large dataset of news articles with labeled categories to train fake news recognition algorithms | 385 |
| A collection of linguistic and text resources for Latin America | 6 |
| A manually annotated corpus for training and testing machine learning models for Aspect Based Sentiment Analysis (ABSA) in Hungarian. | 0 |
| A large annotated corpus of Hungarian text with various linguistic annotations, split into development and test datasets for natural language processing tasks. | 2 |