colibri-core
Pattern extractor
A C++ and Python library for efficiently counting and extracting patterns from large corpus data
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool colibri-patternmodeller, which allows you to build, view, manipulate and query pattern models.
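To make the terms concrete, here is a minimal pure-Python sketch of what counting n-grams and fixed-size skipgrams means. It does not use the Colibri Core API and ignores the memory-efficient encoding the library provides; the helper names `ngrams` and `skipgrams` and the `{*}` gap marker are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, gap="{*}"):
    """Return fixed-size skipgrams of length n.

    Each skipgram is an n-gram in which one or more inner tokens are
    replaced by a gap marker; the first and last token are always kept.
    """
    results = []
    inner = range(1, n - 1)  # positions that may become gaps
    for gram in ngrams(tokens, n):
        for k in range(1, len(inner) + 1):
            for positions in combinations(inner, k):
                results.append(
                    tuple(gap if i in positions else tok
                          for i, tok in enumerate(gram))
                )
    return results


tokens = "to be or not to be".split()

# Count how often each bigram and each 3-token skipgram occurs.
bigram_counts = Counter(ngrams(tokens, 2))
skipgram_counts = Counter(skipgrams(tokens, 3))
```

In this toy corpus the bigram `("to", "be")` occurs twice, and `("to", "{*}", "or")` is one of the skipgrams of length three. A naive approach like this stores every pattern as a tuple of strings, which is exactly the overhead a dedicated pattern model is meant to avoid on large corpora.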
124 stars
12 watching
20 forks
Language: C++
last commit: about 1 year ago
Linked from 2 awesome lists
Topics: c-plus-plus, computational-linguistics, corpus, library, linguistics, ngram, ngrams, nlp, pattern-recognition, python, skipgram, text-processing
Related projects:
Repository | Description | Stars |
---|---|---|
proycon/python-frog | A Python binding to a C++ NLP tool for Dutch language processing tasks | 47 |
proycon/pynlpl | A Python library for natural language processing tasks, including text manipulation and analysis. | 479 |
zaibacu/rita-dsl | A DSL for building custom NLP patterns from manual language rules | 65 |
pymorphy2/pymorphy2 | A morphological analyzer and generator for Russian and Ukrainian languages | 1,123 |
sergioburdisso/pyss3 | A Python package implementing an interpretable machine learning model for text classification with visualization tools | 336 |
patois/hexraystoolbox | A toolset for analyzing and identifying patterns in compiled code from various architectures. | 438 |
ppke-nlpg/anagramma-parser | An implementation of a computational model for linguistic analysis based on cognitive inspiration | 1 |
joakim-brannstrom/dextool | A set of tooling plugins built on top of the LLVM/Clang compiler infrastructure to analyze and improve C/C++ code quality. | 101 |
patterns-ai-core/langchainrb | A Ruby library providing an interface to Large Language Model (LLM) providers for text generation and embedding | 1,415 |
cidles/poio-analyzer | A collection of software tools for linguists to manage and analyze linguistic data | 13 |
flo-compbio/monet | An open-source Python package for analyzing scRNA-Seq data using PCA-based latent spaces | 39 |
nccgroup/pybeacon | A collection of Python scripts for analyzing and interacting with Cobalt Strike beacons. | 167 |
proycon/foliapy | A comprehensive Python library for parsing and processing FoLiA documents used in Natural Language Processing. | 18 |
cytomining/pycytominer | A Python package for processing high-dimensional data from microscopy imaging experiments | 80 |
jkkummerfeld/berkeley-coreference-analyser | Analyze and classify errors in coreference resolution output | 29 |