MultilingualCorporaExtractor
Corpora extractor
Extracts multilingual corpora from international Bibles and formats them as XML, JSON, and HTML files for analysis.
A node.io spider for extracting multilingual corpora
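As a rough illustration of the formatting step described above, the sketch below serializes a handful of verse records to JSON, XML, and HTML. It is not taken from the project: the record shape (`lang`, `book`, `chapter`, `verse`, `text`), the function names, and the `<corpus>`/`<verse>` element names are all assumptions made for this example.

```javascript
// Hypothetical record shape: { lang, book, chapter, verse, text }.
// None of these names come from the project itself.

// Escape characters that are unsafe in XML/HTML text and attribute values.
function escapeMarkup(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// JSON: the records serialize directly.
function toJson(verses) {
  return JSON.stringify(verses, null, 2);
}

// XML: one <verse> element per record inside a <corpus> root.
function toXml(verses) {
  const items = verses.map((v) =>
    `  <verse lang="${escapeMarkup(v.lang)}" book="${escapeMarkup(v.book)}" ` +
    `chapter="${escapeMarkup(v.chapter)}" verse="${escapeMarkup(v.verse)}">` +
    `${escapeMarkup(v.text)}</verse>`
  );
  return ["<corpus>", ...items, "</corpus>"].join("\n");
}

// HTML: one table row per record (language, reference, text).
function toHtml(verses) {
  const rows = verses.map((v) =>
    `  <tr><td>${escapeMarkup(v.lang)}</td>` +
    `<td>${escapeMarkup(v.book)} ${escapeMarkup(v.chapter)}:${escapeMarkup(v.verse)}</td>` +
    `<td>${escapeMarkup(v.text)}</td></tr>`
  );
  return ["<table>", ...rows, "</table>"].join("\n");
}

// Illustrative sample data, not real extractor output.
const sample = [
  { lang: "en", book: "John", chapter: 1, verse: 1, text: "In the beginning was the Word" },
  { lang: "fr", book: "Jean", chapter: 1, verse: 1, text: "Au commencement était la Parole" },
];

console.log(toJson(sample));
console.log(toXml(sample));
console.log(toHtml(sample));
```

Keeping the three serializers separate from whatever fetches the pages means the same records can be written to any of the output formats without re-crawling.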
0 stars
3 watching
0 forks
Language: JavaScript
Last commit: over 11 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
| A Node.js web server implementing a lexicon API for the Drag and Drop FieldLinguistics project | 1 |
| Provides a Chrome extension and associated server for accessing definitions from Wiktionary | 6 |
| Demos and examples for utilizing linguistics in natural language processing with Lucene and Solr | 0 |
| Extracts data from HTML or XML documents to JSON using a CSS selector-like query language | 70 |
| Enables CORS requests to connect to CouchDB from other domains | 0 |
| An app for managing and sharing text and audio data in various contexts, adaptable to users' terminology and I-Language. | 79 |
| A .NET framework for extracting text from various document formats across multiple platforms. | 362 |
| Tool for automating pronunciation lexicon creation for low-resource languages using speech recognition and machine learning algorithms. | 1 |
| A web-based interface for browsing and editing lexical data in FieldDB databases | 0 |
| Allows dependencies to be downloaded with specific files extracted and installed during package installation | 0 |
| Identifies unused translations in Ember.js projects to help maintain consistency and accuracy of internationalization. | 48 |
| A Node.js service that uses a morphological analyzer to generate morphemes and glosses for words | 0 |
| A web application dashboard for tracking language learning metrics and statistics. | 0 |
| Extracts binary relationships from English sentences at scale | 543 |
| A CouchDB-powered wiki application with a jQuery interface. | 0 |