spark-nlp
NLP toolkit
Provides a set of pre-trained models and libraries for natural language processing tasks on top of Apache Spark
State of the Art Natural Language Processing
4k stars
100 watching
712 forks
Language: Scala
last commit: 6 days ago
Linked from 4 awesome lists
Topics: bert, entity-extraction, language-detection, lemmatizer, llamacpp, llm, machine-translation, named-entity-recognition, natural-language-processing, nlp, onnx, part-of-speech-tagger, pyspark, question-answering, sentiment-analysis, spark, spell-checker, tensorflow, text-classification, transformers
Related projects:
Repository | Description | Stars |
---|---|---|
tyson925/magyarlanc_spark | A Spark-based tool for processing Hungarian text with magyarlanc language-processing features, with optional Elasticsearch integration. | 4 |
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks. | 422 |
explosion/spacy | Industrial-strength NLP library for Python and Cython. | 30,230 |
sebastianruder/nlp-progress | A comprehensive repository tracking progress in NLP tasks and their corresponding datasets. | 22,715 |
axa-group/nlp.js | A comprehensive NLP library for building conversational AI systems, with entity extraction, sentiment analysis, language identification, and more. | 6,283 |
dmmiller612/sparktorch | Distributed training and inference of PyTorch models on Apache Spark. | 339 |
spark-notebook/spark-notebook | An interactive web-based editor for exploring and analyzing large datasets using Scala, Apache Spark, and other data science tools. | 3,151 |
microsoft/synapseml | A library for building scalable machine learning pipelines on distributed computing frameworks such as Apache Spark. | 5,068 |
stanfordnlp/corenlp | A Java-based suite of tools for natural language processing and analysis. | 9,704 |
bigscience-workshop/promptsource | A toolkit for creating and using natural language prompts that enable large language models to generalize to new tasks. | 2,696 |
joblib/joblib-spark | Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library. | 242 |
explosion/spacy-stanza | Wraps the Stanza NLP library to use Stanford models with spaCy. | 725 |
apache/spark | An analytics engine designed to handle large-scale data processing and analysis. | 39,916 |
curiosity-ai/catalyst | A C# natural language processing library with pre-trained models and tools for building custom models. | 739 |
nlpodyssey/spago | A Go-based machine learning library designed to support neural architectures in natural language processing. | 1,752 |