spark-nlp

NLP toolkit

Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark.

State of the Art Natural Language Processing

GitHub

4k stars
100 watching
712 forks
Language: Scala
last commit: 6 days ago
Linked from 4 awesome lists

Topics: bert, entity-extraction, language-detection, lemmatizer, llamacpp, llm, machine-translation, named-entity-recognition, natural-language-processing, nlp, onnx, part-of-speech-tagger, pyspark, question-answering, sentiment-analysis, spark, spell-checker, tensorflow, text-classification, transformers

Related projects:

Repository | Description | Stars
tyson925/magyarlanc_spark | A Spark-based tool for processing Hungarian text data with Magyarlanc language processing features and optional integration with ElasticSearch. | 4
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422
explosion/spacy | Industrial-strength NLP library for Python and Cython | 30,230
sebastianruder/nlp-progress | A comprehensive repository tracking progress in NLP tasks and their corresponding datasets. | 22,715
axa-group/nlp.js | A comprehensive NLP library for building conversational AI systems with entity extraction, sentiment analysis, language identification, and more. | 6,283
dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference. | 339
spark-notebook/spark-notebook | An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools | 3,151
microsoft/synapseml | A library for building scalable machine learning pipelines on distributed computing frameworks like Apache Spark | 5,068
stanfordnlp/corenlp | A Java-based suite of tools for natural language processing and analysis | 9,704
bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts to enable large language models to generalize to new tasks. | 2,696
joblib/joblib-spark | Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library. | 242
explosion/spacy-stanza | Wraps the Stanza NLP library to use Stanford models with spaCy | 725
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916
curiosity-ai/catalyst | A C# Natural Language Processing library with pre-trained models and tools for building custom models | 739
nlpodyssey/spago | A Go-based machine learning library designed to support neural architectures in natural language processing | 1,752