galeXtra
Term extractor
A multi-language term extractor that applies morphosyntactic (part-of-speech) tagging and filtering to identify multi-word terms in plain-text input.
Multiword extractor for Portuguese, English, Spanish, Galician, and French
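The tag-and-filter idea can be illustrated with a minimal sketch. This is not galeXtra's implementation (the project itself is written in Shell); it assumes the text has already been tagged, uses a hypothetical coarse tagset (`ADJ`, `NOUN`, and so on), and applies a single English-style pattern (adjectives and nouns ending in a noun). The function name `extract_candidates` and the sample sentence are made up for illustration, and Romance languages would likely need other patterns, such as noun followed by adjective.

```python
from typing import List, Tuple

# (token, coarse POS tag); the tagset is an assumption for this sketch
Tagged = Tuple[str, str]


def extract_candidates(tagged: List[Tagged], min_len: int = 2) -> List[str]:
    """Return runs of ADJ/NOUN tokens that end in a NOUN and span >= min_len tokens."""
    candidates: List[str] = []
    run: List[Tagged] = []
    # A sentinel with a non-matching tag flushes the final run.
    for token, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((token, tag))
            continue
        # Trim trailing adjectives so the candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= min_len:
            candidates.append(" ".join(tok for tok, _ in run))
        run = []
    return candidates


# Hand-tagged sample input (hypothetical); a real pipeline would call a tagger.
sentence = [
    ("the", "DET"), ("statistical", "ADJ"), ("machine", "NOUN"),
    ("translation", "NOUN"), ("system", "NOUN"), ("uses", "VERB"),
    ("a", "DET"), ("large", "ADJ"), ("corpus", "NOUN"),
]
print(extract_candidates(sentence))
# → ['statistical machine translation system', 'large corpus']
```

Single-word runs are discarded by `min_len`, which is what restricts the output to multi-word candidates.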
2 stars
2 watching
1 fork
Language: Shell
last commit: over 8 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
gamallo/citiussentiment | A Perl-based sentiment analysis tool for analyzing text in multiple languages. | 7 |
dwisiswant0/galer | A tool to extract URLs from HTML attributes by crawling pages and evaluating JavaScript | 255 |
zoomio/tagify | An application that extracts keywords from text sources | 38 |
theacharya/markersextractor | Tools and library to extract metadata from Final Cut Pro's FCPXML data export format | 39 |
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives | 71 |
darccio/pipar | A tool for extracting and processing data from political parties' registries | 3 |
eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9 |
aymericbeaumet/squeeze | A tool to extract relevant information from text | 17 |
limiu82214/gojmapr | A library to extract specific properties from complex JSON structures into Go structs with minimal code changes. | 22 |
eyurtsev/kor | An open-source wrapper around LLMs to extract structured data from text | 1,638 |
apertium/apertium-glg | A software package providing linguistic data and tools for analyzing and generating text in the Galician language. | 0 |
gmarty/xgettext | Tools for extracting translatable strings from source code written in template languages. | 77 |
ftramer/lm_memorization | A tool to extract memorized content from large language models like GPT-2 by analyzing their training data | 179 |
gamallo/deppattern | A Perl-based dependency parsing system for multiple Romance languages, including grammar compiler and parser generator. | 10 |
pxyup/fitter | A utility for extracting and processing data from various sources, including APIs, websites, and static text | 120 |