hundict

Bilingual dictionary extractor

A Python tool that extracts bilingual dictionaries from parallel corpora, that is, collections of texts paired with their translations.
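
To make the idea concrete, here is a minimal sketch of dictionary extraction from a sentence-aligned corpus. It is not hundict's code or API: the function name `extract_dictionary`, the `min_score` parameter, the toy Hungarian-English corpus, and the choice of the Dice coefficient as the association score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def extract_dictionary(sentence_pairs, min_score=0.5):
    """Map each source word to its best-scoring target word.

    Hypothetical helper: scores word pairs with the Dice coefficient,
    2 * cooccurrences / (source_count + target_count), counted per sentence.
    """
    src_counts = Counter()
    tgt_counts = Counter()
    pair_counts = Counter()

    for src_sentence, tgt_sentence in sentence_pairs:
        # Use sets so each word counts at most once per sentence.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        pair_counts.update(product(src_words, tgt_words))

    best = {}
    for (src, tgt), together in pair_counts.items():
        score = 2 * together / (src_counts[src] + tgt_counts[tgt])
        # Keep only the highest-scoring candidate above the threshold.
        if score >= min_score and score > best.get(src, ("", 0.0))[1]:
            best[src] = (tgt, score)
    return best


# Toy sentence-aligned corpus (invented for this example).
corpus = [
    ("a kutya ugat", "the dog barks"),
    ("a macska alszik", "the cat sleeps"),
    ("a kutya alszik", "the dog sleeps"),
]
dictionary = extract_dictionary(corpus)
print(dictionary["kutya"])
```

On a corpus this small every word finds a clean match; real corpora are noisy, so a practical extractor would also need tokenization, frequency cutoffs, and some way to handle words with several translations.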

GitHub

22 stars
5 watching
2 forks
Language: Python
Last commit: over 10 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| juditacs/wikt2dict | Tool to parse and process Wiktionary translation data for dictionary creation | 53 |
| eyurtsev/kor | An open-source wrapper around LLMs to extract structured data from text | 1,638 |
| danieljdufour/date-extractor | A Python library that extracts dates from plain text | 65 |
| szegedai/hun-date-parser | A Python package for extracting datetime intervals from Hungarian sentences and converting date objects to text | 8 |
| xyntopia/pydoxtools | A Python library for extracting information from unstructured documents using AI techniques and customizable pipelines | 78 |
| gamallo/galextra | A multi-language term extractor that uses morphosyntax tagging and filtering to identify multi-word terms from plain text input | 2 |
| tchayintr/best2010_cooker | Extracts segmented words from Thai BEST2010 corpus | 2 |
| fox-it/dissect.target | Provides a programming API and command line tools to access various data sources inside disk images or file collections | 48 |
| thunlp/thulac-python | An efficient Chinese lexical analyzer with morphological analysis capabilities | 2,032 |
| belgianbiodiversityplatform/python-dwca-reader | A tool to parse and retrieve biodiversity data from archived files | 45 |
| danburzo/hred | Extracts data from HTML or XML documents to JSON using a CSS selector-like query language | 70 |
| zaataylor/wikiref | An extension that extracts and edits Wikipedia references with ease | 2 |
| 51j0/android-storage-extractor | A tool to extract local data storage of an Android application in one click | 16 |
| csababarta/ntdsxtract | A Python-based tool for extracting and analyzing data from Windows domain controllers to aid in Active Directory forensic investigations | 321 |
| eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9 |