awesome-spanish-nlp

Spanish NLP Resources

A curated collection of linguistic resources and datasets for natural language processing and computational linguistics in Spanish.

Clustering

Multilingual Latent Dirichlet Allocation (LDA)

Speech

Mexican Spanish Speech Recognition DB - 150 Speakers
Mexican Spanish Speech Recognition DB - 299 Speakers
Phonetic Transcriptions of Spanish Pronunciation Lexicon
Sphinx Speech Recognition Models

Part of Speech Taggers (POS Taggers)

TreeTagger - POSTagger
Stanford - POSTagger
Freeling
ixa-pipe-pos
Ruby Snowball Implementation
Spaghetti POSTagger (based on NLTK + CESS corpus)

Multiword Expressions Extractors (MWE)

Freeling

Named Entity Recognition (NER)

OpenNLP - Person/Place/Organization models
DBpedia Spotlight
CitiusTagger - Spanish NER and POSTagger

Corpora / Shared tasks

Exploiting Parallel Texts for Statistical Machine Translation - NAACL 2006 in New York City
CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages
Quality Estimation (Spanish - English) WMT13
ACL 2010 in Uppsala - Shared Task: Machine Translation for European Languages
TASS - 2014 (Sentiment Analysis focused on Spanish)
SemEval-2 2010 Coreference Resolution in Multiple Languages
SAB Corpus (Spanish Corpus for Sentiment Analysis towards Brands)

Corpora

Multilingual Aligned Annotated Corpus (CRATER)
UAM Treebank - 1,500 syntactically annotated sentences extracted from newspapers (El País Digital and Compra Maestra)
POSTagged/syntactic dependencies - European Corpus Initiative Multilingual Corpus I
The Corpus of Contemporary Spanish (POS tags, lemmas)
Lemmas Dictionary
esTenTen Spanish (POSTagged)
Europarl Corpus (Parallel Corpus English-Spanish)
Colombian Political Speeches
South American Slang Expressions/MWE
Syntax and Semantic Annotations (Subset Ancora Corpus)
Plurilingual Specific Corpus on Economics, Medicine, Computer Science
Copenhagen Treebank (Dependency Parsing)
Reuters Corpora RCV2 - News Corpora
MolinoLabs Corpus - News Corpora from Spain, Argentina and Mexico
PANACEA - Legislation Corpus
PANACEA - Legislation Ngram Corpus
PANACEA - Dependency Parsed Corpus
PANACEA - Monolingual Lexica (MWE, Frames, Semantic Classes)
Opinion Mining - User reviews on Cars, Hotels, Washing machines, Books, Cell phones, Music, etc.
Cross Lingual Textual Entailment (CLTE) Corpus (English-Spanish)
Ngram Frequencies out of Colombia News Corpora
Sagan Textual Entailment Test Suite
Portuguese and Spanish biographical relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, CJO2013. doi:10.1017/S1351324913000314.)
Portuguese, Spanish and Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)
COW (Corpora from the Web) Ngram/Annotated People's Name Corpora
Wikicorpus - Portion of the 2006 Wikipedia annotated with WordNet synsets and POS tags
Spanish Billion Words Corpus with word2vec Embeddings
OSCAR (Open Super-large Crawled ALMAnaCH coRpus) - Spanish subset

Misc

Word2Vec vectors for Wikipedia Spanish Articles
DBpedia Spanish Entities Titles
DBpedia Spanish Abstracts
Conshuga - Galician Verb conjugator
