colibri-core

Pattern extractor

A C++ and Python library for efficiently counting and extracting patterns from large corpus data

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool colibri-patternmodeller, which allows you to build, view, manipulate and query pattern models.
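To make the terms concrete, here is a minimal pure-Python sketch of what counting n-grams and fixed-size skipgrams means. It does not use Colibri Core or its API and makes no attempt at its memory efficiency; the names `ngrams`, `skipgrams`, `count_patterns`, the `{*}` gap marker and the `min_count` threshold are all assumptions made for illustration. Dynamic-size gaps are not covered.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"  # hypothetical marker for a one-token gap (illustration only)


def ngrams(tokens, n):
    """Yield every contiguous window of n tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n):
    """Yield fixed-size skipgrams of length n.

    Each skipgram is an n-gram in which one or more inner tokens are
    replaced by GAP; the first and last token are always kept.
    """
    inner = range(1, n - 1)
    for gram in ngrams(tokens, n):
        for k in range(1, len(inner) + 1):
            for gaps in combinations(inner, k):
                yield tuple(GAP if i in gaps else tok
                            for i, tok in enumerate(gram))


def count_patterns(tokens, max_n=3, min_count=2):
    """Count all n-grams and skipgrams up to max_n tokens long and
    keep only the patterns seen at least min_count times."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
        counts.update(skipgrams(tokens, n))
    return {p: c for p, c in counts.items() if c >= min_count}


tokens = "the big dog and the small dog".split()
model = count_patterns(tokens)
for pattern, count in model.items():
    print(" ".join(pattern), count)
```

In this toy corpus the only multi-token pattern that recurs is the skipgram `the {*} dog`, which matches both "the big dog" and "the small dog"; no plain bigram or trigram occurs twice. This is the kind of generalisation that skipgrams capture and contiguous n-grams miss.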

GitHub

124 stars
12 watching
20 forks
Language: C++
last commit: about 1 year ago
Linked from 2 awesome lists

Topics: c-plus-plus, computational-linguistics, corpus, library, linguistics, ngram, ngrams, nlp, pattern-recognition, python, skipgram, text-processing

Related projects:

Repository | Description | Stars
proycon/python-frog | A Python binding to a C++ NLP tool for Dutch language processing tasks | 47
proycon/pynlpl | A Python library for natural language processing tasks, including text manipulation and analysis | 479
zaibacu/rita-dsl | A DSL for building custom NLP patterns from manual language rules | 65
pymorphy2/pymorphy2 | A morphological analyzer and generator for Russian and Ukrainian languages | 1,123
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336
patois/hexraystoolbox | A toolset for analyzing and identifying patterns in compiled code from various architectures | 438
ppke-nlpg/anagramma-parser | An implementation of a computational model for linguistic analysis based on cognitive inspiration | 1
joakim-brannstrom/dextool | A set of tooling plugins built on top of the LLVM/Clang compiler infrastructure to analyze and improve C/C++ code quality | 101
patterns-ai-core/langchainrb | A Ruby library providing an interface to Large Language Model (LLM) providers for text generation and embedding | 1,415
cidles/poio-analyzer | A collection of software tools for linguists to manage and analyze linguistic data | 13
flo-compbio/monet | An open-source Python package for analyzing scRNA-Seq data using PCA-based latent spaces | 39
nccgroup/pybeacon | A collection of Python scripts for analyzing and interacting with Cobalt Strike beacons | 167
proycon/foliapy | A comprehensive Python library for parsing and processing FoLiA documents used in Natural Language Processing | 18
cytomining/pycytominer | A Python package for processing high-dimensional data from microscopy imaging experiments | 80
jkkummerfeld/berkeley-coreference-analyser | Analyze and classify errors in coreference resolution output | 29