MathOCR

Document analyzer

A software project that enables the recognition and analysis of printed scientific documents, with a particular focus on mathematical expressions.

A scientific document recognition system

GitHub

168 stars

11 watching

41 forks

Language: Java

last commit: over 3 years ago

Linked from 1 awesome list

latex, optical-character-recognition, scientific-document-recognition

Backlinks from these awesome lists:

kba/awesome-ocr

Related projects:

Repository	Description	Stars
jsv4/opencontracts	A document analytics platform providing features for managing documents, extracting layout information and vector embeddings, annotating documents, and querying them using LlamaIndex.	728
bobld/documentlayoutanalysis	Develops tools and algorithms for analyzing the layout and structure of documents in PDF format	591
icij/datashare	An application that helps investigative journalists analyze and search documents, using natural language processing and entity recognition techniques.	601
mingyuan-xia/patdroid	An Android-specific toolkit for analyzing and understanding APK files	118
open-korean-text/elasticsearch-analysis-openkoreantext	An Elasticsearch analyzer plugin for analyzing Korean text using the Open-Korean Text module.	127
tingxueronghua/chartllama-code	A multimodal LLM for understanding and generating charts in various formats.	202
wangqianwen0418/discrilens	A tool for analyzing and visualizing discrimination in machine learning models	6
ddmcdonald/sparser	A model-driven, rule-based natural language text analysis system for extracting information from large volumes of text	57
runem/web-component-analyzer	Analyzes web components and emits documentation in various formats	509
uglytoad/pdfpig	A C# library for extracting and analyzing text from PDF files	1,794
ohjeongwook/darungrim	Analyzes software patches to identify vulnerabilities and weaknesses	359
johannesbuchner/languagecheck	A tool to analyze and improve the language of scientific papers before submission.	98
tylabs/qs_old	A tool to analyze and extract malicious content from Office documents and executables	126
dlang-community/d-scanner	Analyzes D source code for syntax, style, and security issues	242
ckorzen/pdf-text-extraction-benchmark	Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles	65