sbwce

Spanish Corpus

A collection of linguistic resources and trained word embeddings for the Spanish language.

Spanish Billion Word Corpus and Embeddings

45 stars

4 watching

8 forks

Language: Python

last commit: over 3 years ago

Linked from 1 awesome list

crscardellino.github.io/SBWCE

Backlinks from these awesome lists:

keon/awesome-nlp

Related projects:

Repository	Description	Stars
josecannete/spanish-corpora	A collection of unannotated Spanish text data, compiled from various sources and processed for natural language processing tasks.	92
dccuchile/spanish-word-embeddings	A collection of precomputed word embeddings for the Spanish language, derived from different corpora and computational methods.	354
bertez/corpora	A collection of Galician language data in JSON format.	2
dccuchile/beto	A BERT-architecture model pre-trained on Spanish text data.	490
botcenter/spanishwordembeddings	This project generates Spanish word embeddings using fastText on large corpora.	9
bigredt/vico	Multi-sense word embeddings learned from visual cooccurrences	25
zhangxiangxiao/crepe	A toolkit for building character-level convolutional networks for text classification using Torch 7.	848
hslcy/vcwe	This project provides code and corpora for creating word embeddings by considering the visual characteristics of words.	15
cidles/pyannotation	A Python library to access and manipulate linguistically annotated corpus files in various formats.	16
universaldependencies/ud_galician-ctg	A collection of annotated text data for the Galician language.	1
qhungngo/evbcorpus	A large-scale bilingual corpus collection for language technology and NLP tasks, containing English-Vietnamese translations and bitexts.	42
phonologicalcorpustools/corpustools	A collection of tools and libraries for analyzing and processing phonological data in various languages	115
ainfosec/crema	A compiler and runtime system for executing a minimalist programming language in sub-Turing Complete space.	64
christos-c/bible-corpus	A multilingual parallel corpus created from translations of the Bible.	177
kscanne/chichewa	A collection of NLP resources for a Bantu language, including a basic lexicon and script for morphological generation.	9