awesome-linguistics
Language toolkit
A curated collection of resources and tools for linguistics and natural language processing
A curated list of anything remotely related to linguistics
377 stars
27 watching
29 forks
last commit: 12 months ago
Linked from 3 awesome lists
awesome-list, language, linguistics, resources
Platforms and toolkits | |||
| CLARIN-D web tools | Tools for Analysing Research Data | ||
| CorpusExplorer | Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface | ||
| Haxe-linguistics | 26 | over 4 years ago | Early-stage linguistic analysis and natural language processing library for Haxe |
| Natural | 10,670 | over 1 year ago | General natural language tools for Node.js |
| Natural Language ToolKit (NLTK) | The most complete platform for building Python programs to work with human language data | ||
| Snowball | Snowball is a language in which stemming algorithms can be easily represented | ||
| spaCy | Industrial-strength Natural Language Processing in Python | ||
| Mate Tools | , webservice via | ||
| UBIAI | Text annotation tool for teams with auto-annotation features. Supports NER, relation extraction, and document classification, as well as OCR annotation for invoice labeling | ||
| textblob-de | 104 | over 4 years ago | German language support for TextBlob; an alternative to spaCy (see above) |
| UralicNLP | 71 | 11 months ago | An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages |
Algorithms | |||
| Stemming algorithms for various European languages | Various stemming algorithms from Snowball | ||
| The Porter Stemmer Algorithm | The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter | ||
Data sets | |||
| EuroRomCom Data | 20 | about 8 years ago | JSON formatted Pan-Romance word lists |
| Araneum Germanicum | |||
| CEHugeWebCorpus | German corpus based on CommonCrawl | ||
| Digitales Wörterbuch der deutschen Sprache (DWDS) | |||
| GC4 Corpus | (CommonCrawl) | ||
| IDS Corpora | German Reference Corpus | ||
| Leipzig Corpora Collection | Sampled sentences in different languages | ||
| SdeWaC | Large German web corpus | ||
| C-WEP | |||
| DysList (list of dyslexic errors) | 5 | almost 7 years ago | |
| Falko | |||
| Litkey | |||
| OpinionSpam | 2 | about 8 years ago | |
Resources | |||
| Low Resource Languages | 393 | over 1 year ago | A list of resources for conservation, development, and documentation of low resource (human) languages |
| Language Science Press | Language Science Press is a born-digital scholar-led open access publisher in linguistics | ||
Deep learning models and transformers | |||
| dbmdz BERT models | 155 | almost 3 years ago | |
| Deepset German BERT model | |||
| Evaluating German Transformer Language Models with Syntactic Agreement Tests | 7 | over 2 years ago | |
| German ELMo Model | 28 | almost 6 years ago | |
| german-transformer-training | 23 | over 4 years ago | |
| GermLM | 14 | over 6 years ago | (NER exploration) |
| GerPT2 | 20 | over 3 years ago | |
| Sentence Transformers | 15,556 | 11 months ago | |
On Wikipedia | |||
| Bag of words model | |||
| Document classification | |||
| Language models | |||
| Naive Bayes classification | |||
| Natural language processing | |||
| Outline of natural language processing | |||
| Part-of-speech tagging | |||
| Sentiment analysis | |||
| Term frequency - inverse document frequency | |||
| Vector space model | |||
On YouTube | |||
| Computational Linguistics Lecture Playlist (YouTube) | Lectures for University of Maryland class on computational linguistics | ||
| The Virtual Linguistics Campus | CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name | ||
Books | |||
| Essentials of Linguistics, 2nd edition | An introductory textbook | ||
| Introduction to Linguistics | |||
| Natural Language Processing with Python | The book accompanying the NLTK package | ||
| Text Mining with R | |||
| Foundations of Computational Linguistics | |||
| Foundations of Statistical Natural Language Processing | |||
| Semisupervised Learning for Computational Linguistics | |||
| Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition | |||
| The Oxford Handbook of Computational Linguistics | |||
Standards | |||
| DTA Basisformat | |||
| ISO TC 37 SC 4 | |||
| UIMA | |||
Lists | |||
| 15 most popular books on Goodreads | |||
| corpus-linguistics | GitHub topics & | ||
| nlp-datasets | 5,802 | over 2 years ago | |
| NLP-progress | 22,742 | over 1 year ago | |
| /r/LanguageTechnology/ | |||
| awesome-nlp | 16,830 | almost 2 years ago | |
| Awesome Community-Curated NLP List | 197 | over 3 years ago | |
| awesome-chinese-nlp | 7,827 | over 2 years ago | |
| awesome-danish | 168 | 11 months ago | |
| awesome-hungarian-nlp | 227 | about 2 years ago | |
| awesome Information Retrieval | 1,076 | over 2 years ago | |
| Indonesian NLP | 279 | almost 4 years ago | |
| Norwegian NLP resources | 178 | over 4 years ago | |
| German NLP resources | 453 | about 1 year ago | |
| awesome-nlp-polish | 293 | over 4 years ago | |
| awesome-spanish-nlp | 332 | almost 2 years ago | |
| M. Weisser's list of NLP/Computational Linguistics Resources | |||
Communities | |||
| Linguistics Stack Exchange | |||
| Untranslatable.co, Multilingual urban dictionary | |||