Minimal errors in data extraction
Time for an AI Agent to get the correct answer
Reduction in response time
This client is a third-party loan partner in mortgage processing. They struggled with manual data extraction from various tax forms and schedules, which was time-consuming and error-prone. To resolve these issues, they partnered with us to develop a custom AI-driven solution.
Our Generative AI solution uses document embedding models and a vector database to classify documents and extract data efficiently and accurately. The system makes document extraction 10x faster and reduces analyst involvement by 90%, while achieving near-perfect extraction accuracy.
This customized AI solution now enhances their document processing capabilities, letting them handle tasks faster and with less human involvement. By automating data classification and extraction with AI, they can process documents faster than with other available methods.
Mortgage processing involves handling vast amounts of customer data essential for making lending decisions. This data comes from various documents submitted by customers, such as bank statements, tax returns, schedules, and more.
U.S. tax returns detail income, expenses, and other financial information over a fiscal year. These documents often include structured elements like tables, forms, headers, footers, and labels, each requiring precise data extraction.
Traditionally, classifying these documents and extracting their data required extensive manual input. Manual extraction from tax forms was slow, exhausting, and prone to mistakes, which delayed further processing steps and decision-making.
The reliance on non-AI systems that lacked contextual understanding and advanced learning capabilities made accurate document processing nearly impossible. The mortgage industry's strict regulations demand high accuracy and reliability, which off-the-shelf automation systems often fail to meet.
Faced with these issues, the client needed a custom Generative AI solution that could handle this processing complexity. They partnered with us to create an application to automate their document-handling workflow.
We built an API that combines an Optical Character Recognition (OCR) system with a geometric extraction technique. The API accepts a PDF file as input, which may contain one or more tax returns. It identifies each page of the tax return and performs geometric extraction of predefined lines. Because pages are identified individually, our application can even process incomplete tax forms.
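As a rough illustration of the page-level handling described above, the sketch below groups already-classified pages of one PDF into per-form segments. The `PageLabel` type, the `group_pages_by_form` function, and the rule of skipping unrecognized pages are assumptions made for this example, not the production logic.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class PageLabel:
    """Hypothetical result of classifying one PDF page."""
    page_index: int           # zero-based position in the uploaded PDF
    form: Optional[str]       # e.g. "1040", or None if the page was not recognized
    form_page: Optional[int]  # page number within the form, if known


def group_pages_by_form(pages: list[PageLabel]) -> list[dict]:
    """Group recognized pages into runs that share the same form label.

    Unrecognized pages are dropped first, so a return with missing or
    unrelated pages still produces usable segments.
    """
    recognized = [p for p in pages if p.form is not None]
    segments = []
    for form, run in groupby(recognized, key=lambda p: p.form):
        run_pages = list(run)
        segments.append({
            "form": form,
            "pdf_pages": [p.page_index for p in run_pages],
            "form_pages": [p.form_page for p in run_pages],
        })
    return segments
```

Each segment can then be passed to extraction on its own, which is one way an incomplete form could still be processed page by page.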
To tackle the specific challenges faced by the client, we created a system to automate the classification and extraction process with the following workflow:
For training, we used completed tax forms (1040, 1065, 1120, 1120-S, 4562, 8825) and schedules (1040 Schedule C, 1040 Schedule D, 1040 Schedule E, 1040 Schedule F, 1065 Schedule K-1, 1120 Schedule K-1) provided by the client.
We first extracted the text using OCR and then calculated its embedding. This embedding is compared to pre-calculated embeddings to classify the page. We chose a well-known closed-source vector database to store calculated embeddings to speed up this process.
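The comparison step can be sketched as a nearest-neighbor lookup. The example below is a minimal in-memory version that assumes cosine similarity as the metric and a hypothetical `min_similarity` cutoff. In the actual system a vector database performs this search, and the embedding model, metric, and threshold are not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_page(page_embedding, reference_embeddings, min_similarity=0.8):
    """Return (label, score) for the closest reference embedding.

    reference_embeddings maps a form label to a pre-calculated embedding.
    The label is None when no reference is similar enough
    (min_similarity is an assumed cutoff for this sketch).
    """
    best_label, best_score = None, -1.0
    for label, reference in reference_embeddings.items():
        score = cosine_similarity(page_embedding, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score
```

A linear scan like this is fine for a handful of form types. A vector database becomes useful once many reference embeddings have to be searched quickly.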
Extracted line item data includes, for example, name, address, and customer number.
During the training process, we found it difficult to properly identify the correct information to extract from the page, so we created a complex geometric algorithm to identify and extract relevant information accurately.
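To give a sense of what geometric extraction can look like, here is a deliberately simplified sketch. It assumes OCR returns words with bounding boxes in normalized page coordinates and that each field has a predefined rectangular region covering a single line of text. The `Word` and `Region` types and the `extract_field` function are hypothetical, and the real algorithm is more involved than this.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box (normalized 0..1 coordinates)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Region:
    """Predefined rectangle where a field's value is expected on the page."""
    x0: float
    y0: float
    x1: float
    y1: float


def extract_field(words, region):
    """Join the words whose box center falls inside the region.

    Words are ordered left to right, which assumes the region covers
    a single line of text.
    """
    inside = [
        w for w in words
        if region.x0 <= (w.x0 + w.x1) / 2 <= region.x1
        and region.y0 <= (w.y0 + w.y1) / 2 <= region.y1
    ]
    inside.sort(key=lambda w: w.x0)
    return " ".join(w.text for w in inside)
```

Testing the center of each box rather than full containment tolerates small OCR box errors at the region edges. This is a design choice for the sketch, not a description of the deployed system.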
Extracted line items were returned in the JSON output as key-value pairs (e.g., the key is “first name” and the value is “John”).
This process required extensive customization to satisfy the customer’s specific needs. We opted for JSON, an open standard format that employs human-readable text to store and transmit data. This choice enabled us to effectively structure the extracted information, ensuring straightforward storage, analysis, and conversion.
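As a small illustration of the key-value output, the sketch below serializes extracted fields for one page to JSON. The record layout (`form`, `page`, `fields`) and the `to_json` helper are assumptions for this example. The client's actual schema is customized and not shown here.

```python
import json


def to_json(form, page, fields):
    """Serialize extracted key-value pairs for one page as a JSON string.

    The surrounding record structure is a hypothetical layout.
    """
    record = {"form": form, "page": page, "fields": fields}
    return json.dumps(record, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_json("1040", 1, {"first name": "John"}))
```

Because the output is plain JSON, downstream systems can load it back with any standard parser for storage, analysis, or conversion.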
We developed a custom Generative AI solution that simplifies data interaction, enhances processing accuracy, and reduces response times. Our advanced geometric algorithm precisely identifies and extracts relevant information, achieving nearly 100% accuracy in data extraction.
Document processing is now 10x faster, outpacing off-the-shelf solutions and saving significant time. Analyst involvement has been reduced by 90%, which dramatically improves operational efficiency and frees analysts to focus on other tasks.
As the client completes final testing, we are confident the impact on mortgage processing industry practices will become more evident. Using custom AI solutions to meet our clients' needs demonstrates how automation plays a game-changing role in mortgage processing.