sbwce
Spanish Corpus
A collection of linguistic resources and trained word embeddings for the Spanish language.
Spanish Billion Word Corpus and Embeddings
45 stars
4 watching
8 forks
Language: Python
Last commit: almost 2 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
josecannete/spanish-corpora | A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks. | 92 |
dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 356 |
bertez/corpora | A collection of Galician language data in JSON format. | 2 |
dccuchile/beto | A BERT-architecture NLP model pre-trained on Spanish text data. | 492 |
botcenter/spanishwordembeddings | This project generates Spanish word embeddings using fastText on large corpora. | 9 |
bigredt/vico | Multi-sense word embeddings learned from visual co-occurrences. | 25 |
zhangxiangxiao/crepe | A toolkit for building character-level convolutional networks for text classification using Torch 7. | 848 |
hslcy/vcwe | This project provides code and corpora for creating word embeddings by considering the visual characteristics of words. | 15 |
cidles/pyannotation | A Python library to access and manipulate linguistically annotated corpus files in various formats. | 16 |
universaldependencies/ud_galician-ctg | This is a collection of annotated text data for the Galician language. | 1 |
qhungngo/evbcorpus | A large-scale bilingual corpus collection for language technology and NLP tasks, containing English-Vietnamese translations and bitexts. | 42 |
phonologicalcorpustools/corpustools | A collection of tools and libraries for analyzing and processing phonological data in various languages. | 113 |
ainfosec/crema | A compiler and runtime system for executing a minimalist programming language in sub-Turing Complete space. | 64 |
christos-c/bible-corpus | A multilingual parallel corpus created from translations of the Bible. | 176 |
kscanne/chichewa | A collection of NLP resources for a Bantu language, including a basic lexicon and a script for morphological generation. | 9 |