galeXtra

Term extractor

A multilingual term extractor that uses morphosyntactic (part-of-speech) tagging and filtering to identify multi-word terms in plain-text input.

Multiword Extractor for Portuguese, English, Spanish, Galician, French
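
To make the "tagging and filtering" idea concrete, here is a minimal sketch of pattern-based filtering over tokens that have already been tagged. It is not galeXtra's implementation: the function name, the coarse tag names (ADJ, NOUN, and so on), the hand-tagged sample sentence, and the English-style pattern (a run of adjectives and nouns ending in a noun) are all assumptions made for illustration. A real extractor would get its tags from a tagger and would need different patterns for Romance languages, where adjectives often follow the noun.

```python
from typing import List, Tuple

# (token, coarse part-of-speech tag); the tag names are illustrative assumptions
Tagged = Tuple[str, str]


def extract_multiword_terms(tagged: List[Tagged], max_len: int = 4) -> List[str]:
    """Return runs of ADJ/NOUN tokens that end in a NOUN and span 2..max_len tokens."""
    terms: List[str] = []
    run: List[Tagged] = []
    # A sentinel with a non-matching tag flushes the final run.
    for token, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((token, tag))
            continue
        # The run has ended: drop trailing adjectives so the candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        # Keep only multi-word candidates within the length limit.
        if 2 <= len(run) <= max_len:
            terms.append(" ".join(tok for tok, _ in run))
        run = []
    return terms


# Hand-tagged sample input (hypothetical; a real pipeline would run a tagger first)
sentence = [
    ("The", "DET"), ("statistical", "ADJ"), ("machine", "NOUN"),
    ("translation", "NOUN"), ("system", "NOUN"), ("uses", "VERB"),
    ("a", "DET"), ("large", "ADJ"), ("corpus", "NOUN"), (".", "PUNCT"),
]
print(extract_multiword_terms(sentence))
```

Single nouns are filtered out on purpose, since the goal here is multi-word terms only; raising or lowering max_len trades recall against noisier, longer candidates.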

GitHub

2 stars
2 watching
1 fork
Language: Shell
Last commit: over 8 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| gamallo/citiussentiment | A Perl-based sentiment analysis tool for analyzing text in multiple languages. | 7 |
| dwisiswant0/galer | A tool to extract URLs from HTML attributes by crawling in and evaluating JavaScript | 255 |
| zoomio/tagify | An application that extracts keywords from text sources | 38 |
| theacharya/markersextractor | Tools and library to extract metadata from Final Cut Pro's FCPXML data export format | 39 |
| recrm/archivetools | A collection of tools for extracting and analyzing data from web archives | 71 |
| darccio/pipar | A tool for extracting and processing data from political parties' registries | 3 |
| eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9 |
| aymericbeaumet/squeeze | A tool to extract relevant information from text | 17 |
| limiu82214/gojmapr | A library to extract specific properties from complex JSON structures into Go structs with minimal code changes. | 22 |
| eyurtsev/kor | An open-source wrapper around LLMs to extract structured data from text | 1,638 |
| apertium/apertium-glg | A software package providing linguistic data and tools for analyzing and generating text in the Galician language. | 0 |
| gmarty/xgettext | Tools for extracting translatable strings from source code written in template languages. | 77 |
| ftramer/lm_memorization | A tool to extract memorized content from large language models like GPT-2 by analyzing their training data | 179 |
| gamallo/deppattern | A Perl-based dependency parsing system for multiple Romance languages, including grammar compiler and parser generator. | 10 |
| pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text | 120 |