LangChain + Docugami: Chat with Your Own Business Documents
At Docugami, we recognize that systems like ChatGPT that are built with general-purpose Large Language Models (LLMs) offer fantastic capabilities. However, they don't enable you to chat with your own business documents and can also be inaccurate for many business, financial, legal, and scientific scenarios because they are trained on the public internet, which introduces a wide range of low-quality source materials.
Today, we are thrilled to announce an initial integration of LangChain with Docugami. LangChain is one of the most popular frameworks for simplifying the creation of applications using LLMs. Coupling LangChain with Docugami’s unique ability to generate a Document XML Knowledge Graph Representation of long-form Business Documents opens the door for LangChain developers to build the most accurate applications that can enable users to chat with their own Business Documents, without being limited by document size or context window restrictions.
To get started, follow the quick start guide here. Tag us @docugami on Twitter to share your results and experience. We welcome your technical questions; please post them on our new Docugami Discord.
We are excited to see what you will build with this integration!
How Docugami Enhances Document Understanding
We believe that systems aiming to understand the content of documents, such as retrieval and question-answering, can greatly benefit from Docugami's semantic Document XML Knowledge Graph Representation. Our unique approach to document chunking allows for better understanding and processing of your documents:
- Intelligent Chunking: Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.
- Structured Representation: In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.
- Semantic Annotations: Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in a set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.
- Additional Metadata: If a user has already been working with their documents in Docugami, chunks are also annotated with additional metadata. This metadata can be used for high-accuracy Document QA without context window restrictions. See the detailed code walk-through in this notebook.
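To make the ideas in the list above concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not Docugami's actual output or API: the XML documents, the tag names (Lease, Landlord, Tenant, TerminationClause, Exception, RenewalDate), the structure and jurisdiction attributes, and the helper functions text_of, chunks, and find_tag are all hypothetical. The sketch shows two things: walking a semantic XML tree to produce chunks of varying sizes, and running the same tag-based query across two differently structured documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic XML for two leases. Tag and attribute names are
# illustrative assumptions only, not Docugami's real schema.
LEASE_A = """
<Lease>
  <Parties structure="p">
    This lease is between <Landlord>Acme Properties LLC</Landlord> and
    <Tenant>Jane Doe</Tenant>.
  </Parties>
  <TerminationClause structure="h1">
    <Heading>Termination</Heading>
    <Exception jurisdiction="WA">Washington tenants receive 60 days notice.</Exception>
  </TerminationClause>
  <RenewalDate>2025-01-01</RenewalDate>
</Lease>
"""

LEASE_B = """
<Agreement>
  <Section structure="h1">
    <Tenant>Bob Smith</Tenant> rents from <Landlord>Zenith Holdings</Landlord>.
  </Section>
  <RenewalDate>2024-06-30</RenewalDate>
</Agreement>
"""


def text_of(elem):
    """Return all text under an element with whitespace normalized."""
    return " ".join("".join(elem.itertext()).split())


def chunks(root):
    """Yield (path, text) for every element in the tree.

    Each element is one chunk, so chunk sizes range from a single value
    (a leaf such as Tenant) up to the whole document (the root).
    """
    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        yield here, text_of(elem)
        for child in elem:
            yield from walk(child, here)

    yield from walk(root, "")


def find_tag(xml_docs, tag):
    """For each document, list the text of every element with the given tag."""
    return [[text_of(e) for e in ET.fromstring(doc).iter(tag)] for doc in xml_docs]


def has_jurisdiction_exception(xml_doc, jurisdiction):
    """Check for an Exception for this jurisdiction inside a TerminationClause."""
    root = ET.fromstring(xml_doc)
    query = f".//TerminationClause/Exception[@jurisdiction='{jurisdiction}']"
    return root.find(query) is not None


if __name__ == "__main__":
    for path, text in chunks(ET.fromstring(LEASE_A)):
        print(path, "->", text)
    print(find_tag([LEASE_A, LEASE_B], "Tenant"))
    print(has_jurisdiction_exception(LEASE_A, "WA"))
```

Because the two hypothetical leases share tag names even though their layouts differ, the same find_tag call works on both. That is the property the Semantic Annotations point describes; a real application would take the XML from Docugami rather than from inline strings.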
With this Docugami integration, LangChain users can now unlock the full potential of their own Business Documents, and we can't wait to see the innovative solutions you create.
We LOVE the LangChain community and are extending our standard 14-day free trial for LangChain users: start a free Docugami trial and file a support ticket mentioning LangChain to receive an extended 30-day trial with an upgraded 2000-page limit.