pdf-structured-data-extractor

This project demonstrates how to extract structured information from PDF documents using LangChain, OpenAI models, and the Docling library. It provides a framework for parsing PDFs and using LLMs to identify key data points and return them in a structured format.
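
A minimal sketch of the extraction step, assuming the PDF has already been converted to text. The `llm` argument stands in for an OpenAI chat model called through LangChain. It is passed in as a plain function so the sketch runs offline. The field names (`invoice_number`, `invoice_date`, `total_amount`), the prompt wording, and the helpers `build_prompt`, `parse_response`, and `extract` are hypothetical and are not taken from this repository.

```python
import json
from typing import Callable

# Hypothetical field names for an invoice-like PDF; replace with the data points you need.
FIELDS = ["invoice_number", "invoice_date", "total_amount"]


def build_prompt(document_text: str, fields: list[str]) -> str:
    """Hypothetical helper: ask the model for a single JSON object with the requested keys."""
    keys = ", ".join(fields)
    return (
        "Extract the following fields from the document below and reply with "
        f"a single JSON object containing exactly these keys: {keys}. "
        "Use null for any field that is not present.\n\n"
        f"Document:\n{document_text}"
    )


def parse_response(raw: str, fields: list[str]) -> dict:
    """Hypothetical helper: parse the reply, tolerating a ```json fence, and keep only known keys."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and everything from the closing fence on.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(text)
    # Missing keys become None; keys the model invented are discarded.
    return {field: data.get(field) for field in fields}


def extract(document_text: str, llm: Callable[[str], str], fields: list[str] = FIELDS) -> dict:
    """Hypothetical pipeline: document text -> prompt -> LLM reply -> filtered dict."""
    return parse_response(llm(build_prompt(document_text, fields)), fields)


# Usage with a stand-in model so the sketch runs without an API key.
def fake_llm(prompt: str) -> str:
    return '```json\n{"invoice_number": "INV-001", "total_amount": "120.00", "extra": 1}\n```'


print(extract("Invoice INV-001 ... Total: 120.00", fake_llm))
# {'invoice_number': 'INV-001', 'invoice_date': None, 'total_amount': '120.00'}
```

Passing the field list into the parser keeps the output shape fixed regardless of what the model returns. LangChain chat models also offer `with_structured_output`, which can take the place of the hand-written JSON parsing shown here.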

Key Technologies:

LangChain: Orchestrates the data extraction process and handles interaction with the LLMs.
OpenAI Models: Provide the large language model capabilities for identifying and structuring information.
Docling: A library for converting documents such as PDFs into a structured representation that can be passed to an LLM.
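
Docling can export a converted PDF as Markdown (in Docling 2.x, `DocumentConverter().convert(path).document.export_to_markdown()`). Long documents often need to be split before they are sent to the model. The sketch below is a hypothetical helper, not this repository's code: it splits Markdown at headings and packs whole sections into chunks under a character budget. The 4000-character default is an arbitrary assumption, and Docling also ships its own chunking utilities that may be a better fit.

```python
import re

# Markdown ATX headings (# to ######) at the start of a line.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_sections(markdown: str) -> list[str]:
    """Hypothetical helper: split Markdown into sections, each starting at a heading."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that appears before the first heading
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def chunk_markdown(markdown: str, max_chars: int = 4000) -> list[str]:
    """Hypothetical helper: greedily pack whole sections into chunks of at most max_chars.

    A single section longer than max_chars becomes its own oversized chunk
    instead of being cut in the middle.
    """
    chunks: list[str] = []
    current = ""
    for section in split_sections(markdown):
        candidate = f"{current}\n\n{section}" if current else section
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = section
    if current:
        chunks.append(current)
    return chunks


# Usage on a small Markdown string shaped like a converted invoice.
md = "# Invoice\nNumber: INV-001\n\n## Items\nWidget x2\n\n## Totals\nTotal: 120.00\n"
for chunk in chunk_markdown(md, max_chars=50):
    print(repr(chunk))
```

Splitting at headings rather than at fixed offsets keeps each table or section intact, so a field and its label are less likely to land in different LLM calls.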

Repository contents:

app/
data/
README.md
data_extraction_llms.ipynb
docling_document_processing.py
requirements.txt

Repository: Danitilahun/Document-processing-Pdf-Structured-Data-Extractor