Danitilahun/Document-processing-Pdf-Structured-Data-Extractor

pdf-structured-data-extractor

This project demonstrates how to extract structured information from PDF documents using LangChain, OpenAI models, and the Docling library. It provides a framework for parsing PDFs and using LLMs to identify and format key data points.

Key Technologies:

  • LangChain: Orchestrates the data extraction process and handles interaction with the LLM.
  • OpenAI Models: Provide the large language model capabilities for identifying and structuring information.
  • Docling: Parses PDFs into a text representation (for example Markdown) that can be passed to the LLM.
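
The way these three pieces fit together can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the invoice schema, the helper names (`pdf_to_markdown`, `build_prompt`, `missing_fields`, `extract_fields`), the `gpt-4o-mini` model name, and the 12,000-character cap are all assumptions made for the example. It assumes `docling` and `langchain-openai` are installed and that `OPENAI_API_KEY` is set; the third-party imports are kept inside the functions that need them so the pure helpers can be used on their own.

```python
from typing import Any

# Hypothetical target schema (JSON Schema). Replace the fields with the
# data points you actually want to pull out of your PDFs.
INVOICE_SCHEMA: dict[str, Any] = {
    "title": "Invoice",
    "description": "Key fields extracted from an invoice PDF.",
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "invoice_number", "total"],
}


def pdf_to_markdown(pdf_path: str) -> str:
    """Parse a PDF with Docling and return its content as Markdown."""
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return result.document.export_to_markdown()


def build_prompt(document_text: str, max_chars: int = 12000) -> str:
    """Build the extraction prompt, truncating very long documents.

    The character cap is an arbitrary guard against oversized prompts.
    """
    snippet = document_text[:max_chars]
    return (
        "Extract the requested fields from the document below. "
        "Use only information that appears in the document.\n\n" + snippet
    )


def missing_fields(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return required fields that are absent or empty in the LLM output."""
    return [key for key in schema["required"] if record.get(key) in (None, "")]


def extract_fields(pdf_path: str, model_name: str = "gpt-4o-mini") -> dict[str, Any]:
    """Run the full pipeline: PDF -> Markdown -> structured dict."""
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model_name, temperature=0)
    # Passing a JSON Schema dict makes the model return a plain dict.
    structured_llm = llm.with_structured_output(INVOICE_SCHEMA)
    return structured_llm.invoke(build_prompt(pdf_to_markdown(pdf_path)))
```

Calling `extract_fields("invoice.pdf")` (a placeholder path) would return a dictionary shaped like the schema, and `missing_fields(record, INVOICE_SCHEMA)` gives a quick check for required fields the model left out. Exact behavior of `with_structured_output` can vary between `langchain-openai` versions, so treat this as a starting point.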
