udpipe

Text parser

A trainable pipeline for tokenization, tagging, lemmatization, and parsing of annotated text data

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
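
For readers unfamiliar with the CoNLL-U format named in the description, the following is a minimal Python sketch of reading such a file. It does not call UDPipe. The function name `parse_conllu`, the `FIELDS` list, and the sample sentence are hypothetical and hand-written for illustration. The sketch assumes the ten tab-separated columns defined by CoNLL-U, treats lines starting with `#` as comments, and treats a blank line as a sentence boundary. Multiword-token and empty-node rows are kept as ordinary rows.

```python
# Column names of the CoNLL-U format (ten tab-separated fields per token line).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.split("\n"):
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...": skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample for illustration only (not real UDPipe output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["deprel"])
```

Each token ends up as a plain dictionary, so the lemma, part-of-speech tag, and dependency relation can be read by column name.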

GitHub

364 stars
28 watching
77 forks
Language: C++
Last commit: 8 days ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| itunlp/dapipe | A tool for processing and analyzing Danish text data using a pre-trained language model | 7 |
| languagemachines/ucto | A tokeniser for natural language text that separates words from punctuation and supports basic preprocessing steps such as case changing | 65 |
| kenavolic/pipet | A lightweight C++ library for building compile-time processing pipelines with customizable filters and branches | 67 |
| pdpipe/pdpipe | A tool for creating and managing data pipelines with pandas DataFrames | 716 |
| udellgroup/oboe | Automated machine learning system for selecting promising models or pipelines for new datasets | 82 |
| dmulyalin/ttp | A template-based text parsing library | 349 |
| tomaskoutek/logstash-pipeline-parser | Parser for Logstash pipeline configuration files | 3 |
| dbuenzli/uutf | A non-blocking streaming codec for Unicode encoding schemes | 32 |
| joboccara/pipes | A header-only C++14 library for building expressive data pipelines using a chainable interface | 803 |
| ixa-ehu/ixa-pipe-pos | Provides tools for part of speech tagging and lemmatization across multiple languages using machine learning models | 17 |
| dbuenzli/uuseg | An OCaml library for segmenting Unicode text into grapheme clusters, words, and sentences | 23 |
| tpolecat/atto | A compact, incremental text parsing library for Scala that enables efficient and functional processing of structured data | 359 |
| ypares/porcupine | A tool that enables data manipulation and analysis pipelines to be flexible, reusable, and reproducible in different environments | 89 |
| uni-algo/uni-algo | A C/C++ library that provides secure and efficient Unicode algorithms for text processing | 280 |
| ada-url/ada | A fast and spec-compliant URL parser written in C++ | 1,358 |