aws-pdf-textract-pipeline
PDF extractor
A data pipeline for extracting structured data from PDFs using AWS Textract and cloud-based services
Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK and TypeScript.
164 stars
3 watching
18 forks
Language: TypeScript
Last commit: 9 months ago
Linked from 1 awesome list
Tags: aws, aws-cdk, aws-textract, cdk, cloudformation, data-pipeline, dynamodb, jest, lambda, pdf, puppeteer, s3, serverless, sns, textract, typescript, webscraping
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities | 28 |
| | A Django package that enhances Wagtail's document search with text extraction capabilities using Tesseract and Textract libraries | 33 |
| | A tool to extract text from PDFs and add a searchable layer to them | 279 |
| | Evaluates PDF extraction tools' ability to extract meaningful text from scientific articles | 65 |
| | A C# library for extracting and analyzing text from PDF files | 1,794 |
| | A tool to extract relevant information from text | 17 |
| | A multi-language term extractor that uses morphosyntax tagging and filtering to identify multi-word terms from plain text input | 2 |
| | Extracts tables from PDF files using Java | 1,859 |
| | A tool to fetch information about public Google documents from various services | 856 |
| | A .NET library that uses the PDFtk binary to manipulate and process PDF files | 37 |
| | Provides reusable templates and tools for deploying AWS CDK applications | 119 |
| | Extracts structured variables from Sass files and makes them available in JavaScript for use in styles or dynamic content | 186 |
| | A package that extracts and works with Go struct fields as values, including type information | 6 |
| | A Quarkus-based microservice to extract text from PDF files | 24 |
| | Automates extraction of key circuit information from PDF datasheets/documents to build a database of commercial off-the-shelf IP | 51 |