MultilingualCorporaExtractor

Corpora extractor

Extracts multilingual corpora from international Bibles and formats them as XML, JSON, and HTML files for analysis.

Node.io spider for extracting multilingual corpora

GitHub

0 stars

3 watching

0 forks

Language: JavaScript

Last commit: about 13 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

richardlitt/low-resource-languages

Related projects:

Repository	Description	Stars
fielddb/lexiconwebservicesample	A Node.js web server implementing a lexicon API for the Drag and Drop FieldLinguistics project	1
fielddb/dictionarychromeextension	Provides a Chrome extension and associated server for accessing definitions from Wiktionary	6
fielddb/lucenerevolution-2013	Demos and examples for utilizing linguistics in natural language processing with Lucene and Solr	0
danburzo/hred	Extracts data from HTML or XML documents to JSON using a CSS selector-like query language	70
fielddb/corpuswebservice	Enables CORS requests to connect to CouchDB from other domains	0
fielddb/fielddb	An app for managing and sharing text and audio data in various contexts, adaptable to users' terminology and I-Language	79
nissl-lab/toxy	A .NET framework for extracting text from various document formats across multiple platforms	362
fielddb/lex4all	Tool for automating pronunciation lexicon creation for low-resource languages using speech recognition and machine learning algorithms	1
fielddb/fielddblexicon	A web-based interface for browsing and editing lexical data in FieldDB databases	0
lastcallmedia/composerextrafiles	Allows dependencies to be downloaded with specific files extracted and installed during package installation	0
mainmatter/ember-intl-analyzer	Identifies unused translations in Ember.js projects to help maintain consistency and accuracy of internationalization	48
fielddb/lexiconwebservice	A Node.js service that uses a morphological analyzer to generate morphemes and glosses for words	0
fielddb/languageclassdashboard	A web application dashboard for tracking language learning metrics and statistics	0
knowitall/reverb	Extracts binary relationships from English sentences at scale	543
fielddb/octothorpe	A CouchDB-powered wiki application with a jQuery interface	0