sbwce

Spanish Corpus

A collection of linguistic resources and trained word embeddings for the Spanish language.

Spanish Billion Word Corpus and Embeddings

GitHub

45 stars
4 watching
8 forks
Language: Python
Last commit: almost 2 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| josecannete/spanish-corpora | A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks. | 92 |
| dccuchile/spanish-word-embeddings | A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods. | 356 |
| bertez/corpora | A collection of Galician language data in JSON format. | 2 |
| dccuchile/beto | A BERT-architecture NLP model pre-trained on Spanish text data. | 492 |
| botcenter/spanishwordembeddings | Generates Spanish word embeddings using fastText on large corpora. | 9 |
| bigredt/vico | Multi-sense word embeddings learned from visual co-occurrences. | 25 |
| zhangxiangxiao/crepe | A toolkit for building character-level convolutional networks for text classification using Torch 7. | 848 |
| hslcy/vcwe | Code and corpora for creating word embeddings that take the visual characteristics of words into account. | 15 |
| cidles/pyannotation | A Python library to access and manipulate linguistically annotated corpus files in various formats. | 16 |
| universaldependencies/ud_galician-ctg | A collection of annotated text data for the Galician language. | 1 |
| qhungngo/evbcorpus | A large-scale bilingual corpus collection for language technology and NLP tasks, containing English-Vietnamese translations and bitexts. | 42 |
| phonologicalcorpustools/corpustools | A collection of tools and libraries for analyzing and processing phonological data in various languages. | 113 |
| ainfosec/crema | A compiler and runtime system for executing a minimalist programming language in sub-Turing Complete space. | 64 |
| christos-c/bible-corpus | A multilingual parallel corpus created from translations of the Bible. | 176 |
| kscanne/chichewa | A collection of NLP resources for a Bantu language, including a basic lexicon and script for morphological generation. | 9 |