awesome-linguistics
Language toolkit
A curated collection of resources and tools for linguistics and natural language processing
A curated list of anything remotely related to linguistics
371 stars
27 watching
29 forks
last commit: 12 days ago
Linked from 3 awesome lists
awesome-list, language, linguistics, resources
Platforms and toolkits | |||
CLARIN-D web tools | Tools for Analysing Research Data | ||
CorpusExplorer | Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface | ||
Haxe-linguistics | 26 | over 3 years ago | Early-stage linguistic analysis and natural language processing library for Haxe |
Natural | 10,625 | 3 months ago | General natural language tools for Node.js |
Natural Language ToolKit (NLTK) | The most complete platform for building Python programs to work with human language data | ||
Snowball | Snowball is a language in which stemming algorithms can be easily represented | ||
spaCy | Industrial-strength Natural Language Processing in Python | ||
Mate Tools | , webservice via | ||
UBIAI | Easy-to-use text annotation tool for teams with comprehensive auto-annotation features. Supports NER, relation extraction, and document classification, as well as OCR annotation for invoice labeling | ||
textblob-de | 104 | over 3 years ago | An alternative to spaCy (see above) |
UralicNLP | 70 | 16 days ago | An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages |
Algorithms | |||
Stemming algorithms for various European languages | Various stemming algorithms from Snowball | ||
The Porter Stemmer Algorithm | The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter | ||
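To give a feel for what the stemming algorithms linked above do, here is a minimal sketch of step 1a of the Porter stemmer, which handles plural suffixes. It covers only that single step, not the full algorithm, and the function name `porter_step_1a` is a hypothetical one chosen for this illustration. For real work, use one of the maintained implementations listed above.

```python
def porter_step_1a(word):
    """Apply only step 1a (plural suffixes) of the Porter stemmer.

    Rules are tried in order and the first matching suffix wins:
    SSES -> SS, IES -> I, SS -> SS, S -> (removed).
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


for example in ["caresses", "ponies", "caress", "cats"]:
    print(example, "->", porter_step_1a(example))
```

Note that a stem such as `poni` is not a dictionary word; stemmers aim to conflate related word forms, not to produce lemmas.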
Data sets | |||
EuroRomCom Data | 20 | about 7 years ago | JSON formatted Pan-Romance word lists |
Araneum Germanicum | |||
CEHugeWebCorpus | German corpus based on CommonCrawl | ||
Digitales Wörterbuch der deutschen Sprache (DWDS) | |||
GC4 Corpus | (CommonCrawl) | ||
IDS Corpora | German Reference Corpus | ||
Leipzig Corpora Collection | Sampled sentences in different languages | ||
SdeWaC | Large German web corpus | ||
C-WEP | |||
DysList (list of dyslexic errors) | 5 | almost 6 years ago | |
Falko | |||
Litkey | |||
OpinionSpam | 2 | about 7 years ago | |
Resources | |||
Low Resource Languages | 390 | 7 months ago | A list of resources for conservation, development, and documentation of low resource (human) languages |
Language Science Press | Language Science Press is a born-digital scholar-led open access publisher in linguistics | ||
Deep learning models and transformers | |||
dbmdz BERT models | 155 | almost 2 years ago | |
Deepset German BERT model | |||
Evaluating German Transformer Language Models with Syntactic Agreement Tests | 7 | over 1 year ago | |
German ELMo Model | 28 | almost 5 years ago | |
german-transformer-training | 23 | over 3 years ago | |
GermLM | 14 | over 5 years ago | (NER exploration) |
GerPT2 | 20 | over 2 years ago | |
Sentence Transformers | 15,329 | 6 days ago | |
On Wikipedia | |||
Bag of words model | |||
Document classification | |||
Language models | |||
Naive Bayes classification | |||
Natural language processing | |||
Outline of natural language processing | |||
Part-of-speech tagging | |||
Sentiment analysis | |||
Term frequency - inverse document frequency | |||
Vector space model | |||
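Several of the articles above (bag-of-words model, term frequency, inverse document frequency, vector space model) fit together in a few lines of code. The sketch below is one possible illustration using only the standard library. It assumes the simplest textbook weighting, with tf as the relative term count and idf as `log(N / df)`; libraries commonly use smoothed variants, so their numbers will differ. The function name `tf_idf` and the toy documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, w)
```

Under this weighting, a term that occurs in every document (here `the`) gets weight 0, while a term confined to one document (`dog`) scores higher than one shared by two (`cat`).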
On YouTube | |||
Computational Linguistics Lecture Playlist (YouTube) | Lectures for a University of Maryland class on computational linguistics | ||
The Virtual Linguistics Campus | CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name | ||
Books | |||
Essentials of Linguistics, 2nd edition | An introductory textbook | ||
Introduction to Linguistics | |||
Natural Language Processing with Python | The book accompanying the NLTK package | ||
Text Mining with R | |||
Foundations of Computational Linguistics | |||
Foundations of Statistical Natural Language Processing | |||
Semisupervised Learning for Computational Linguistics | |||
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition | |||
The Oxford Handbook of Computational Linguistics | |||
Standards | |||
DTA Basisformat | |||
ISO TC 37 SC 4 | |||
UIMA | |||
Lists | |||
15 most popular books on Goodreads | |||
corpus-linguistics | GitHub topics & | ||
nlp-datasets | 5,775 | almost 2 years ago | |
NLP-progress | 22,715 | 4 months ago | |
/r/LanguageTechnology/ | |||
awesome-nlp | 16,768 | about 1 year ago | |
Awesome Community-Curated NLP List | 196 | over 2 years ago | |
awesome-chinese-nlp | 7,808 | over 1 year ago | |
awesome-danish | 165 | 17 days ago | |
awesome-hungarian-nlp | 224 | about 1 year ago | |
awesome Information Retrieval | 1,069 | over 1 year ago | |
Indonesian NLP | 279 | almost 3 years ago | |
Norwegian NLP resources | 177 | over 3 years ago | |
German NLP resources | 451 | 22 days ago | |
awesome-nlp-polish | 294 | over 3 years ago | |
awesome-spanish-nlp | 330 | 11 months ago | |
M. Weisser's list of NLP/Computational Linguistics Resources | |||
Communities | |||
Linguistics Stack Exchange | |||
Untranslatable.co | Multilingual urban dictionary | ||