Technical
September 19, 2024

RAG Pipeline Diagram: How to Augment LLMs With Your Data

This quick sketch of the RAG pipeline diagram will help you understand how you can enhance your AI by leveraging your organization's unstructured documents.
šŸ–ļø We created an entire series on making your data AI-ready. Our approach leverages Unstructured AI, a specialized ETL layer for processing unstructured document formats.


The Retrieval Augmented Generation (RAG) pipeline is essential for boosting your large language model (LLM) with additional data sources.

A RAG pipeline helps the LLM retrieve relevant information in real time and provide context-specific responses.

Here's a quick sketch of a RAG pipeline diagram that shows how you can augment LLMs with your own data:

RAG pipeline diagram

Letā€™s break it down component by component.


Key Takeaways

  • RAG pipeline enhances large language models by integrating real-time external data retrieval, enabling context-specific responses rather than relying on static, pre-trained information.
  • Key components of our RAG pipeline include unstructured documents, Unstructured AI, intelligent chunking in JSON format, embedding models, vector and conventional databases, an orchestrator, and the LLM.
  • Unstructured AI processes complex documents into structured outputs, converting semi-structured reports into usable formats and creating numerical embeddings for efficient retrieval.
  • The orchestrator manages the entire RAG pipeline by handling user queries, triggering data retrieval processes, filtering and ranking results, and ensuring the LLM receives the most relevant information to generate contextually appropriate responses.

What Is a RAG Pipeline?

A RAG pipeline combines two key components:

  • Retrieval - The pipeline retrieves relevant information from external sources (relevant documents, databases, or APIs) based on the user's query.
  • Generation - The retrieved data is passed to the LLM, which generates contextually enriched and accurate responses.

Such a pipeline is designed to help LLMs overcome the limitation of relying on static, pre-trained data. It enables them to provide up-to-date and domain-specific responses.
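
In code, these two stages reduce to a short retrieve-then-generate loop. Below is a minimal sketch, where `embed`, `vector_store`, and `llm` are hypothetical stand-ins for whichever embedding model, vector database, and LLM client you use:

```python
# Minimal retrieve-then-generate sketch. `embed`, `vector_store`,
# and `llm` are hypothetical stand-ins, not a specific library's API.

def answer(query: str, embed, vector_store, llm, k: int = 5) -> str:
    # Retrieval: find the k chunks most similar to the query.
    query_vector = embed(query)
    chunks = vector_store.search(query_vector, top_k=k)

    # Generation: pass the retrieved context to the LLM.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```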

For example, a RAG pipeline can streamline claims processing by enabling the retrieval of policy documents, prior claim history, and relevant regulations. When an agent receives a new claim, they can instantly access similar past cases, review documentation, and generate an initial assessment for the customer.

For bankers, a RAG pipeline can improve mortgage processing by enabling real-time retrieval of historical loan data, credit reports, and market analyses. When a loan officer evaluates a mortgage application, the system can provide insights into similar loans' performance and relevant underwriting guidelines. This allows faster and more informed decision-making.

RAG Pipeline Diagram: Key Components

Key components of a RAG pipeline diagram

Our RAG pipeline consists of the following key components:

  • Documents
  • Unstructured AI
  • JSON with intelligent chunking
  • Embedding model
  • Vector and conventional databases
  • Orchestrator
  • LLM

Documents

In the RAG pipeline, documents refer to unstructured (or semi-structured) external data sources that provide valuable information that needs to be retrieved. Such documents usually come in formats like PDF, PPT, DOC, CSV, PNG, and more.

These documents contain domain-specific knowledge that the LLM does not inherently possess. To integrate them into the pipeline, they must be processed through intelligent chunking or segmentation, which enables more efficient and targeted retrieval.

Because accessing the information in these documents enhances the LLM's outputs, structuring and integrating this data is crucial for grounding the LLM's responses in factual, up-to-date, and relevant information.

Unstructured AI

Unstructured AI serves as a sophisticated Extract, Transform, Load (ETL) layer designed to process complex and unstructured documents for RAG architectures, i.e., to convert them into structured outputs.

Unstructured AI works in the following way:

  1. Semi-structured reports - Unstructured AI receives semi-structured reports in various formats, like PDF and DOCX.
  2. Table and chart extraction - Tables are converted into CSV or Excel formats, while charts are extracted as PNGs with semantic descriptions.
  3. Intelligent chunking - Unstructured AI organizes text semantically while retaining hierarchical structure.
  4. Create and store embeddings - Numerical embeddings are generated and stored in a vector database.
  5. Downstream RAG or GenAI - Unstructured AI queries the vector database to retrieve relevant information.
Unstructured AI process
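
As a rough illustration of these five steps, here is a hedged Python sketch. Every helper name in it (`load_report`, `extract_tables_and_charts`, `chunk_semantically`) is hypothetical, not Unstructured AI's actual API:

```python
from typing import Callable

# Hypothetical sketch of the five-step ETL flow described above.
# The callables are stand-ins you would replace with real components.
def process_report(path: str,
                   load_report: Callable,
                   extract_tables_and_charts: Callable,
                   chunk_semantically: Callable,
                   embed: Callable,
                   vector_db) -> None:
    report = load_report(path)                            # 1. ingest a PDF/DOCX report
    tables, charts = extract_tables_and_charts(report)    # 2. tables -> CSV, charts -> PNG + description
    chunks = chunk_semantically(report)                   # 3. semantic chunks, hierarchy preserved
    vectors = embed([chunk["text"] for chunk in chunks])  # 4. numerical embeddings
    vector_db.add(list(zip(vectors, chunks)))             # 5. stored for downstream RAG queries
```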

JSON With Intelligent Chunking

Text is broken down into meaningful segments and is represented in a structured JSON format. This method combines intelligent chunking with JSONā€™s flexibility. The key features include:

  • Semantic segmentation - Breaking text into coherent units (paragraphs and sections).
  • Hierarchical structure - Preserves relationships between headers, subheaders, and body.
  • Metadata retention - Each chunk retains relevant data (positions, timestamps, etc.).
  • Formatting preservation - Retains elements like lists and tables.
  • Size optimization - Balances chunk sizes for efficient processing.
  • Cross-referencing - Maintains connections between related chunks for context-aware searches.

Such a structure provides efficient processing and nuanced retrieval by embedding models.
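
For illustration, a single chunk in this format might look like the following; the field names are hypothetical, not a fixed schema:

```python
import json

# One illustrative chunk. Field names and values are invented
# to show the features listed above, not a required schema.
chunk = {
    "id": "doc-042/section-3/para-2",
    "text": "Borrowers must provide two years of tax returns...",
    "hierarchy": {                      # preserved header structure
        "header": "Underwriting Guidelines",
        "subheader": "Self-Employed Applicants",
    },
    "metadata": {                       # retained per-chunk metadata
        "source": "policy_manual.pdf",
        "page": 17,
        "modified": "2024-06-01",
    },
    "related_chunks": ["doc-042/section-3/para-1"],  # cross-references
}
print(json.dumps(chunk, indent=2))
```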

Embedding Model

An embedding model in a RAG pipeline converts textual chunks of data into dense vector representations (embeddings). These embeddings capture semantic relationships within the text in a way that allows for mathematical comparison.

This enables the system to perform similarity searches and retrieve relevant information based on the semantic content rather than just keyword matching.

"Semantic content" implies that the model captures the meaning or context of the text rather than just individual words, enabling it to recognize relationships between phrases with similar meanings, even if they use different words.

This allows for a more accurate retrieval of relevant information by comparing the underlying concepts rather than exact word matches.

An embedding model helps employees quickly find and retrieve relevant documents.

For example, if a bank employee searches for "mortgage underwriting guidelines for self-employed individuals", a traditional keyword system may only show exact phrase matches.

An embedding model, however, understands the context and can also retrieve related results like "home loan policies for freelancers".

This capability improves efficiency by providing faster access to relevant information and speeding up decision-making.
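
A minimal sketch of this semantic matching, assuming the open-source sentence-transformers library and the all-MiniLM-L6-v2 model as one concrete choice:

```python
from sentence_transformers import SentenceTransformer, util

# One concrete embedding model; any model that maps text to
# dense vectors works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "mortgage underwriting guidelines for self-employed individuals"
docs = [
    "Home loan policies for freelancers",
    "Branch opening hours and holiday schedule",
]

# Encode query and documents, then compare them semantically:
# the freelancer policy scores higher even though it shares
# almost no keywords with the query.
query_vec = model.encode(query)
doc_vecs = model.encode(docs)
print(util.cos_sim(query_vec, doc_vecs))  # higher score = closer meaning
```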

Vector and Conventional Databases

Vector and conventional databases are a crucial part of the RAG pipeline because they allow for the storage and retrieval of information.

A vector database stores and indexes the embeddings generated by the embedding model. These embeddings represent document chunks, which makes similarity search very efficient.

When the user submits a query, the query's embedding is compared to those stored in the vector database, and the most relevant chunks are retrieved via semantic search.

Conventional databases store structured data, such as document links, metadata, and other relevant information that might not require semantic embedding. Therefore, a conventional database handles relational data and allows for traditional query operations like filtering by specific fields and attributes.
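
To make the division of labor concrete, here is a toy sketch using NumPy as a stand-in for the vector database and SQLite as the conventional database; the table layout and data are invented for illustration:

```python
import sqlite3
import numpy as np

# Toy stand-ins: a NumPy matrix as the "vector database" and an
# in-memory SQLite table as the conventional database for metadata.
embeddings = np.random.rand(100, 384)  # one embedding per chunk (fake data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER, doc_link TEXT, doc_type TEXT)")
conn.executemany("INSERT INTO chunks VALUES (?, ?, ?)",
                 [(i, f"doc_{i}.pdf", "policy") for i in range(100)])

def retrieve(query_vec, k=5):
    # Vector side: cosine similarity against every stored embedding.
    sims = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec))
    top_ids = np.argsort(sims)[::-1][:k].tolist()
    # Conventional side: fetch metadata for the top-scoring chunks.
    placeholders = ",".join("?" * len(top_ids))
    return conn.execute(
        f"SELECT * FROM chunks WHERE id IN ({placeholders})", top_ids).fetchall()

print(retrieve(np.random.rand(384)))
```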

User Query

A user query represents the input or request made by a user seeking specific information or assistance.

The query is usually formatted in natural language and can include questions, prompts, or commands.

The RAG pipeline processes the query by retrieving relevant information from an external knowledge base, which the LLM then combines with its generative capabilities to produce an accurate, coherent, and contextually appropriate response.

Orchestrator

RAG orchestrator

The orchestrator manages the flow of data and interactions between various parts of the RAG pipeline.

It acts as a control center to ensure each step in the pipeline functions efficiently.

The key tasks of the orchestrator include:

  • Query management - After the user submits a request, the orchestrator routes it to the appropriate component, such as the embedding model or vector database, to ensure the correct sequence of actions.
  • Data retrieval - The orchestrator triggers the retrieval process by sending the query to the vector database to retrieve the most relevant chunks of information from the embeddings.
  • Processing and ranking - The orchestrator can also handle post-retrieval processing to filter results, rank them by relevance, and merge outputs from multiple data sources to ensure the best possible response.
  • Interaction with the LLM - The orchestrator passes the most relevant chunks to the LLM for final response generation, enriching the model's answer with up-to-date data.

The orchestrator's coordination of these complex interactions keeps retrieval, processing, and generation running smoothly, ensuring accurate and timely results.
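
A minimal sketch of these four tasks as a single class, where `embedder`, `vector_db`, and `llm` are hypothetical stand-ins for the surrounding components:

```python
class Orchestrator:
    """Hypothetical control center for the RAG pipeline sketch."""

    def __init__(self, embedder, vector_db, llm):
        self.embedder, self.vector_db, self.llm = embedder, vector_db, llm

    def handle(self, query: str, k: int = 10, keep: int = 3) -> str:
        # 1. Query management: embed the incoming request.
        query_vec = self.embedder(query)
        # 2. Data retrieval: fetch candidate chunks from the vector DB.
        candidates = self.vector_db.search(query_vec, top_k=k)
        # 3. Processing and ranking: keep only the best-scoring chunks.
        best = sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]
        # 4. Interaction with the LLM: generate the final answer.
        context = "\n\n".join(c.text for c in best)
        return self.llm(f"Context:\n{context}\n\nQuestion: {query}")
```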

LLM

The LLM is the final and critical component of the RAG pipeline, responsible for generating human-like, context-aware responses based on both the retrieved data and its own pre-trained knowledge.

It is responsible for:

  • Contextual understanding
  • Response generation
  • Data integration
  • Natural language output

The LLM interprets the input query, understands the user's intent, and combines the retrieved data with its internal understanding to generate a well-informed, coherent, and contextually enriched response.

That's why the LLM is commonly referred to as the "face" of the system, as it transforms the retrieved data into a user-friendly, intelligent response.
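
As one concrete example of this final step, here is a sketch using the OpenAI Python SDK; the model name and prompt format are illustrative choices, not requirements of a RAG pipeline:

```python
from openai import OpenAI  # one possible client; any LLM API works here

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_answer(query: str, retrieved_chunks: list[str]) -> str:
    # Ground the model by placing retrieved chunks in the prompt,
    # letting it combine them with its pre-trained knowledge.
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer using the provided context where relevant."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```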

Augment LLMs Using Your Data With RAG Pipeline

Need help getting more value from your data? Please schedule a free 30-minute call with our experts. We can discuss your needs and show you how Unstructured AI works live.
