AWS Intelligent Document Processing | Amazon Web Services

Summary notes created by Deciphr AI

John Sowell, a solutions architect with AWS, outlines the benefits of AWS Intelligent Document Processing (IDP) for modernizing document workflows. He emphasizes IDP's use of OCR and machine learning to extract and analyze data from various document types, such as forms and invoices, without requiring ML expertise. Sowell explains that IDP can lead to cost savings by reducing human intervention and improving accuracy, thus speeding up customer service. He also discusses IDP's application across industries—finance, medical, and legal—and details the IDP pipeline, from capturing documents to processing with Amazon Textract and Comprehend. The talk includes insights on domain-specific APIs, the importance of document classification, and the integration of generative AI to enhance IDP capabilities, including summarization and data normalization. Sowell concludes by highlighting AWS's commitment to security and compliance, showcasing IDP's potential to streamline insurance claim processing and other document-intensive tasks.

Summary Notes

Modernizing Document Workflows with AWS Intelligent Document Processing (IDP)

AWS IDP uses machine learning (ML) to enhance document workflows.
OCR technology is the foundation, with industry-leading accuracy.
ML extracts information from forms, tables, invoices, and contracts.
IDP classifies documents and recognizes entities unique to businesses.
Transitioning to IDP typically raises software costs compared to legacy solutions, but overall costs decrease.
IDP reduces human input, improves accuracy, and prevents bad decisions.
Scalability of IDP helps handle varying workloads and supports business growth.
IDP is applicable in finance, medical, and legal industries.

"At AWS, we built IDP to allow customers to take advantage of industry leading machine learning technologies in their document workflow without the need for ML experience."

AWS IDP is designed to be user-friendly and does not require users to have prior machine learning experience.

"One thing to keep in mind when moving to IDP is that the software costs typically go up compared to legacy solutions, but the overall costs decrease as less human input is needed per document and accuracy improves, resulting in fewer bad decisions caused by inaccurate information."

Transitioning to IDP can be cost-effective in the long run due to reduced labor and increased accuracy.

"Overall, you will be able to serve customers faster since IDP's scalability removes document processing as a bottleneck for your business's growth and it is much easier to handle spiky workloads while not paying for excess capacity during lower periods of demand."

IDP enhances customer service by eliminating document processing delays and efficiently managing workload fluctuations.

Typical IDP Pipeline

Documents are captured in Amazon S3, a durable and scalable object store.
Amazon Textract extracts text from documents.
Amazon Comprehend classifies documents into customer-defined categories.
Classified documents are routed for further processing based on business rules.
Human intervention may be included for verification and corrections.
Document sources can be diverse, including physical scans, desktops, and mobile devices.

"Let's see what a typical IDP pipeline looks like. Going from left to right, it starts with capturing documents in S3, our highly durable and scalable object store."

The IDP pipeline begins with document storage in Amazon S3, emphasizing its durability and scalability.

"Humans may be part of the document processing pipeline also. For example, humans may check any low confidence extraction of text and make corrections if needed."

Human oversight is incorporated within the IDP pipeline to ensure accuracy where ML confidence is low.
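The stages above can be sketched as a small function that chains extraction, classification, and routing. This is a minimal sketch, not AWS's implementation: `run_pipeline`, the route names, and the stand-in callables are all hypothetical, and in a real deployment the `extract_text` and `classify` arguments would wrap calls to Amazon Textract and Amazon Comprehend.

```python
from typing import Callable, Dict, Tuple


def run_pipeline(
    s3_key: str,
    extract_text: Callable[[str], str],
    classify: Callable[[str], str],
    routes: Dict[str, str],
    default_route: str = "human-review",
) -> Tuple[str, str]:
    """Return (document_class, next_step) for one captured document."""
    text = extract_text(s3_key)   # text extraction (Textract's role)
    doc_class = classify(text)    # classification (Comprehend's role)
    # Business-rule routing; classes without a rule fall back to a person.
    return doc_class, routes.get(doc_class, default_route)


# Stand-ins so the sketch runs without AWS access (hypothetical data).
fake_bucket = {"claims/001.pdf": "INVOICE total due 120.00"}
result = run_pipeline(
    "claims/001.pdf",
    extract_text=fake_bucket.__getitem__,
    classify=lambda text: "invoice" if "INVOICE" in text else "unknown",
    routes={"invoice": "expense-extraction"},
)
print(result)
```

Passing the stages in as callables keeps the routing logic testable on its own, which matters when the business rules change more often than the extraction step.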

Amazon Textract and Comprehend Capabilities

Textract supports OCR for both printed and handwritten text.
It can extract complex structures like tables, key-value pairs, and signatures.
Textract has specialized APIs for domain-specific use cases.
Analyze Expense API processes invoices and receipts.
Analyze ID API extracts data from identity documents.
Analyze Lending API processes mortgage-related documents.
Textract Queries allows for flexible data extraction across various document formats.

"You may be already familiar with Textract's text and OCR capabilities, but we also offer support for handwritten text, as well as the ability to extract more complex structures such as tables, key value pair information, and signatures."

Textract is versatile and can handle various document elements, including handwritten text and complex structures.

"We have three dedicated APIs to tackle important domain specific use cases."

AWS has developed specialized APIs within Textract to address specific industry needs and document types.

Specialized APIs for Domain-Specific Use Cases

Analyze Expense API automates accounts payable and expense management.
Analyze ID API verifies identities and supports online registration without templates.
Analyze Lending API enhances the efficiency of processing mortgage documents.
Textract Queries provides AI-driven flexibility for data extraction needs.

"The first, Analyze Expense API, is dedicated to the processing of invoices and receipts, documents that help customers automate the processing of their accounts payable invoicing, expense management, and other workflows that have been historically very difficult to automate."

The Analyze Expense API specifically aids in automating traditionally challenging financial workflows.

"The second is Analyze ID API, which is dedicated to data extraction from identity documents such as passports, driver's licenses, and state IDs issued by the United States without the need for any templates or configuration."

The Analyze ID API simplifies the extraction of data from identity documents, streamlining identity verification processes.

"The third is Analyze Lending API, which enables our customers to efficiently process mortgage documents such as mortgage statements, bank statements, and payoff statements."

The Analyze Lending API is targeted at improving the processing of mortgage-related documents for the lending industry.

Intelligent Document Processing (IDP) Capabilities

IDP technologies facilitate the extraction and processing of information from documents.
They can handle both typed and handwritten text, recognizing checkboxes and table content.
Advanced forms capability can discern specific selections, like job fair participation.
Tables capability retains the relationships between columns and rows within identified tables.

"The advanced forms capability is able to identify that the job fair checkbox was checked but the website was not checked."

This quote explains that the IDP can distinguish between different selections within a form, such as whether certain checkboxes were marked or not.
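To make the checkbox example concrete, here is a minimal sketch of reading form fields out of a Textract forms result. It assumes the `Blocks` list returned by `AnalyzeDocument` with the FORMS feature, where KEY blocks point to VALUE blocks and checkboxes appear as SELECTION_ELEMENT blocks. The helper names and the sample data are hypothetical, and the sample is trimmed to the few keys the code reads.

```python
def _child_text(block, by_id):
    """Join the words (or checkbox states) that a block points to."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def form_fields(blocks):
    """Map each form key to its value text or checkbox state."""
    by_id = {b["Id"]: b for b in blocks}
    fields = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value = " ".join(_child_text(by_id[i], by_id) for i in rel["Ids"])
        fields[_child_text(block, by_id)] = value
    return fields


# Hypothetical, trimmed-down blocks for a two-checkbox form.
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Job"},
    {"Id": "w2", "BlockType": "WORD", "Text": "fair"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                       {"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s2"]}]},
    {"Id": "w3", "BlockType": "WORD", "Text": "Website"},
    {"Id": "s2", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
]
checkboxes = form_fields(sample_blocks)
print(checkboxes)
```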

"And finally, our tables capability is able to identify the information within the tables while maintaining the key relationship between the columns and the rows."

This quote highlights the IDP's ability to understand and preserve the structure of data within tables.
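The row and column relationship can be rebuilt from the extraction result. The sketch below assumes Textract's table output, in which a TABLE block lists CELL children and each cell carries a 1-based `RowIndex` and `ColumnIndex`. The function name and sample data are hypothetical, and merged cells are ignored for brevity.

```python
def table_rows(blocks):
    """Rebuild each detected table as a list of rows of cell text."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for table in blocks:
        if table["BlockType"] != "TABLE":
            continue
        cells = {}
        for rel in table.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for cell_id in rel["Ids"]:
                cell = by_id[cell_id]
                if cell["BlockType"] != "CELL":
                    continue
                words = [
                    by_id[word_id]["Text"]
                    for cell_rel in cell.get("Relationships", [])
                    if cell_rel["Type"] == "CHILD"
                    for word_id in cell_rel["Ids"]
                    if by_id[word_id]["BlockType"] == "WORD"
                ]
                cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            continue
        n_rows = max(row for row, _ in cells)
        n_cols = max(col for _, col in cells)
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables


# Hypothetical 2x2 table, trimmed to the keys the code reads.
sample_table_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Date"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount"},
    {"Id": "w3", "BlockType": "WORD", "Text": "2023-03-05"},
    {"Id": "w4", "BlockType": "WORD", "Text": "120.00"},
]
rebuilt = table_rows(sample_table_blocks)
print(rebuilt)
```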

Document Classification

Different documents require distinct extraction criteria; bank statements and pay stubs are given as examples.
Amazon Comprehend uses custom classifiers to categorize documents.
Users provide examples and categories for each document type during the training phase.
Comprehend selects the most accurate model algorithm automatically, simplifying the process for users.
Models can be referenced by name for real-time or asynchronous classification, with asynchronous results stored in an S3 bucket.

"For example, from a bank statement you want to extract the date, the amount, and whether it was a credit or debit for each transaction, whereas from a pay stub you want to extract the gross pay, the deductions, any taxes, and finally the net pay."

This quote illustrates the specific information that needs to be extracted from different types of documents, demonstrating the need for document classification.

"Comprehend uses multiple algorithms in the training process and picks the model that delivers highest accuracy for the training data."

This quote explains how Amazon Comprehend optimizes for accuracy by choosing the best algorithm for the data during the training process.
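Once a custom classifier is trained, using it comes down to reading the returned class scores. A minimal sketch, assuming real-time classification through a Comprehend endpoint with the boto3 `classify_document` call and a response containing a `Classes` list of `Name` and `Score` entries. The endpoint ARN, the helper names, and the sample response are placeholders.

```python
def top_class(response):
    """Return (name, score) of the highest-scoring class, or None."""
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"], best["Score"]


def classify_text(text, endpoint_arn):
    """Call a deployed custom classifier; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parsing helper runs without it

    client = boto3.client("comprehend")
    return top_class(client.classify_document(Text=text, EndpointArn=endpoint_arn))


# Hypothetical response for a bank statement.
sample_classification = {
    "Classes": [
        {"Name": "bank_statement", "Score": 0.91},
        {"Name": "pay_stub", "Score": 0.07},
    ]
}
print(top_class(sample_classification))
```

The returned class name can then select which fields to extract, for example transactions for a bank statement and gross pay, deductions, taxes, and net pay for a pay stub.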

Use Cases for Sensitive Document Processing

Amazon Comprehend can handle sensitive documents containing Personally Identifiable Information (PII).
It can detect and optionally redact PII in documents.
Comprehend can also identify common entities like organizations, dates, and titles.
Users can train Comprehend to recognize custom entities specific to their business needs.

"Comprehend can detect and optionally redact PII data in the document."

This quote indicates that Amazon Comprehend has the capability to identify and protect sensitive personal information in documents.

"Customers can train Comprehend to detect custom entities such as insurance policy numbers, product codes, or occupation."

This quote shows that Comprehend can be tailored to recognize specialized terms and identifiers relevant to a particular industry or business.
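Redaction from detected PII spans can be sketched in a few lines. This assumes entities shaped like the output of Comprehend's `detect_pii_entities` call, each with `Type`, `Score`, `BeginOffset`, and `EndOffset`. The `redact` helper, the 0.5 score cutoff, and the sample text are hypothetical.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected PII span with its entity type in brackets."""
    redacted = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= min_score:
            redacted = (
                redacted[: ent["BeginOffset"]]
                + "[" + ent["Type"] + "]"
                + redacted[ent["EndOffset"]:]
            )
    return redacted


# Hypothetical text and detections; offsets are computed, not hand-counted.
text = "Patient Jane Doe, SSN 123-45-6789."
name_at = text.index("Jane Doe")
ssn_at = text.index("123-45-6789")
sample_entities = [
    {"Type": "NAME", "Score": 0.99,
     "BeginOffset": name_at, "EndOffset": name_at + len("Jane Doe")},
    {"Type": "SSN", "Score": 0.98,
     "BeginOffset": ssn_at, "EndOffset": ssn_at + len("123-45-6789")},
]
redacted_text = redact(text, sample_entities)
print(redacted_text)
```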

Reducing Human Input in Document Processing

The goal of IDP is to minimize the need for human review and input.
Prior OCR technology often required extensive human intervention.
IDP allows for significant reduction in human input, which can lead to labor savings.
A phased approach to automation is recommended for businesses, aiming for quick wins and long-term goals.

"With IDP, you are able to reduce the amount of human input for your document processes."

This quote emphasizes the efficiency gain from using IDP, which reduces the need for human interaction with documents.

"Many customers plan a multi-step journey to implement the people, process, and technology change in their document processing workflows."

This quote suggests a strategic approach to integrating IDP, indicating that it often involves a gradual implementation process.

Augmenting IDP with Generative AI (FMs)

Generative AI and Foundation Models (FMs) trained on vast data can enhance IDP services.
FMs can summarize text, normalize data formats, and assist with low confidence results.
They can be integrated into various stages of the document workflow, such as classification, extraction, and human review.
FMs can also provide natural language interfaces for interacting with document data.

"For example, the IDP text output may be summarized by FMs to help readers focus on important points, or the text data may feed a corpus of knowledge that FMs may use to answer questions through a chatbot interface."

This quote describes how FMs can summarize information and provide conversational interfaces to engage with document content.

"FMs may also be used to normalize data formats, correct punctuation and grammar, or handle low confidence results that would normally be handled by a human."

This quote details additional functionalities of FMs, including data formatting and correction, as well as addressing uncertain results without human intervention.

Security and Compliance in AWS

AWS prioritizes security, access control, data confidentiality, and compliance.
IAM service manages access control; data is encrypted in transit and at rest.
Amazon VPCs provide isolated networks for workloads.
AWS services help customers meet governance, risk, and compliance requirements.
Customers can verify the compliance status of services like Amazon Textract.

"At AWS, security is top priority. Access control is managed through the AWS IAM service, and data confidentiality is maintained through encryption both at rest and in transit."

This quote confirms AWS's commitment to security and outlines the measures in place, such as IAM for access control and encryption for data protection.

"AWS customers may check the compliance status of each service to determine if it meets their compliance needs."

This quote highlights the transparency and assurance AWS provides to customers regarding compliance with standards and regulations.

IDP in Insurance Claims Processing

IDP can streamline insurance claims processing, improving customer experience and reducing costs.
The demo will showcase an IDP flow aimed at expediting insurance claim handling.

"In the following demo, we will see how IDP can help optimize insurance claims."

This quote introduces a practical application of IDP, indicating that a demonstration will illustrate its use in the context of insurance claims.

Automated Document Processing (ADP) and Intelligent Document Processing (IDP)

ADP and IDP are used to process claims automatically at high volumes.
Dashboards and human judgment are employed as necessary according to business rules.
These processes can be integrated into existing workflows and agent tools.
IDP can handle simple "yes" or "no" decisions based on predefined business rules.
Complex decisions may require human judgment or more advanced decision systems.
Over time, automation's role in decision-making can be increased to improve efficiency.
A major challenge in document processing is ensuring the inclusion of all required documents for a claim.

"One of the benefits of IDP is the ability to handle easy yes or easy no decisions based on the customer's business rules."

This quote highlights the benefit of IDP in making straightforward decisions based on preset rules.

"Over time, the share of decisions made by automation can be increased."

This quote emphasizes the potential for increased automation in decision-making processes over time.
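The "easy yes, easy no" idea can be expressed as a small rule function. The rules below are invented for illustration: the required document set, the auto-approve limit, and the `decide` function are hypothetical and are not taken from the talk.

```python
# Hypothetical business rules for a claim.
REQUIRED_DOCS = {"claim_form", "insurance_id", "invoice"}


def decide(doc_types, claim_amount, auto_approve_limit=500.0):
    """Return (decision, missing_documents) for one claim."""
    missing = sorted(REQUIRED_DOCS - set(doc_types))
    if missing:
        return "reject", missing            # easy no: claim is incomplete
    if claim_amount <= auto_approve_limit:
        return "approve", []                # easy yes: complete and small
    return "human-review", []               # everything else goes to a person


print(decide(["claim_form", "invoice"], 120.0))
print(decide(["claim_form", "insurance_id", "invoice"], 120.0))
print(decide(["claim_form", "insurance_id", "invoice"], 9000.0))
```

Raising `auto_approve_limit` over time is one way to increase the share of decisions made by automation.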

Claim Processing and Document Collection

Claims can be submitted through various channels such as mail, email, portals, or smartphones.
The initial step involves collecting and sorting documents, often done by human staff.
Errors in this process can lead to delays or incorrect denials of claims.
The demonstration shows a claim with all necessary documents, a contrast to a previously mentioned auto-rejected claim.

"Claims documents can come in from multiple channels... and figuring out what you have is the first step."

This quote describes the initial stage in the claims process, which is identifying and sorting incoming documents.

"If they make an error, a claim could be delayed or incorrectly denied."

This quote explains the consequences of human error in the document collection and sorting process.

AI Recognition and Human Review

AI is used to recognize and label documents within a claim.
Documents with a confidence score below a certain threshold, such as 85%, require human review.
Human in the loop review is crucial for documents that AI cannot confidently classify.
An example is given where a scanned image of an insurance ID needs human verification due to a low confidence score.

"Our AI was able to recognize and label all ten of the documents, but the confidence score of the third one did not meet the business requirements that require human review of scores with a less than 85% confidence level."

This quote explains how AI classifies documents and identifies those that need human review based on confidence scores.

"It is the insurance ID and we can see this is the scanned image of the insurance ID and in this example, this would typically be sent to a human in the loop review."

This quote illustrates a practical example of how documents with low confidence scores are flagged for human review.
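The 85% rule from the demo amounts to a threshold filter over the classification results. In this minimal sketch the labels and scores are invented, and only the 85% threshold comes from the talk.

```python
def needs_review(labels, threshold=85.0):
    """Return 1-based positions of documents scoring below the threshold."""
    return [
        position
        for position, (_, confidence) in enumerate(labels, start=1)
        if confidence < threshold
    ]


# Hypothetical (label, confidence in percent) pairs for a ten-document claim.
claim_labels = [
    ("claim_form", 99.1), ("cms_1500", 97.4), ("insurance_id", 62.0),
    ("invoice", 95.2), ("receipt", 93.8), ("explanation_of_benefits", 96.0),
    ("drivers_license", 98.3), ("prescription", 90.5),
    ("medical_transcription", 91.7), ("insurance_report", 88.9),
]
print(needs_review(claim_labels))
```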

AI Classification and Large Language Models (LLMs)

Generative AI and LLMs assist in classifying documents that are not easily identifiable.
LLMs can determine the type of document, such as an insurance report, even with low initial confidence scores.
Human verification can confirm the classification made by the LLM.

"Here we can see how using generative AI's foundation models can help with classifying this so we can launch an LLM classifier."

This quote discusses the use of generative AI and LLMs to classify ambiguous documents.

"The LLM is used to classify the document. We agree with the decision and we select and the indication is that this was manually classified via an LLM."

This quote describes the process of using an LLM for document classification and the subsequent human confirmation of the classification.

AI Extraction of Key Value Pairs and Structured Data

AI analyzes forms such as the CMS 1500 to extract key value pairs and structure the results.
The structured data can be easily used by downstream business systems or databases.
AI can detect tables, columns, and rows within documents, maintaining their relationships.
Queries can be made against documents to find specific information without knowing the exact field names.

"The AI is able to read the key value pair and structure the results so they can be easily used by your downstream business systems or databases."

This quote highlights the AI's capability to extract and structure data for use in other systems.

"We are also able to provide a query against the document and this would be useful when you do not know what the fields are or if the fields vary between documents."

This quote explains the flexibility of AI in querying documents for information without needing predefined field names.
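A query-based extraction has two parts: the questions sent with the request and the answers read back from the result. The sketch below assumes Textract's Queries feature, where `analyze_document` accepts a `QueriesConfig` and the response links QUERY blocks to QUERY_RESULT blocks through an ANSWER relationship. The helper names, aliases, and sample blocks are hypothetical.

```python
def build_queries_config(questions):
    """Turn {alias: question} into the QueriesConfig request parameter."""
    return {
        "Queries": [
            {"Text": question, "Alias": alias}
            for alias, question in questions.items()
        ]
    }


def query_answers(blocks):
    """Map each query alias to (answer_text, confidence)."""
    by_id = {b["Id"]: b for b in blocks}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for result_id in rel["Ids"]:
                result = by_id[result_id]
                answers[alias] = (result["Text"], result["Confidence"])
    return answers


config = build_queries_config({"PATIENT_NAME": "What is the patient's name?"})

# Hypothetical, trimmed-down response blocks.
sample_query_blocks = [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": "What is the patient's name?", "Alias": "PATIENT_NAME"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
    {"Id": "r1", "BlockType": "QUERY_RESULT", "Text": "Jane Doe", "Confidence": 97.0},
]
print(query_answers(sample_query_blocks))
```

With boto3, the config would be passed as `QueriesConfig=config` together with `FeatureTypes=["QUERIES"]`.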

Human in the Loop Review and Augmented AI

Uncertain cases are referred to human reviewers through an augmented AI feature.
Human review is essential for verifying AI decisions and ensuring accuracy.
The process allows for a combination of AI efficiency and human oversight.

"Now for those cases where there is a low level of certainty, we can send this through a human in the loop review via our augmented AI feature."

This quote discusses the use of human reviewers to handle cases where AI certainty is low, ensuring the accuracy of the process.
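Sending a low-certainty result to reviewers could look roughly like the sketch below. It assumes the Amazon Augmented AI runtime call `start_human_loop`, which takes a loop name, a flow definition ARN, and a JSON string as input content. The ARN, the name prefix, and the payload layout are placeholders, and the payload must match whatever the reviewer task template expects.

```python
import json
import uuid


def human_loop_request(flow_definition_arn, document_key, fields):
    """Build keyword arguments for an A2I start_human_loop call."""
    payload = {"document": document_key, "fields": fields}
    return {
        # Loop names should be unique; a random suffix is one simple way.
        "HumanLoopName": "idp-review-" + uuid.uuid4().hex[:12],
        "FlowDefinitionArn": flow_definition_arn,
        "HumanLoopInput": {"InputContent": json.dumps(payload)},
    }


request = human_loop_request(
    "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/example",  # placeholder
    "claims/001/insurance-id.png",
    {"LAST_NAME": {"value": "DOE", "confidence": 62.0}},
)
# With credentials configured, the request could be submitted as:
#   boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)
print(request["HumanLoopName"])
```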

Document-Specific AI Capabilities

AI can capture signatures and collect raw text from documents.
The same form capabilities are used to process various types of documents, such as explanations of benefits.
Key value pairs are extracted from different document types, such as insurance cards.

"We can also collect the raw text of the information."

This quote indicates AI's ability to extract unstructured text from documents.

"Here we see an explanation of benefits and the same form capability is used to capture the form information as well as a table which is contained here."

This quote exemplifies AI's versatility in processing different forms and extracting structured data.

AI Processing of Implied Fields in Documents

AI can recognize and interpret implied fields in documents, such as an address without a labeled field.
AI normalizes field names across different ID documents, for example, interpreting 'ln' as 'last name'.
AI prevents misrecognition of names, ensuring accurate identification.

"Many documents have implied fields. Our AI understands that the person lives at 123 Any Street in this example, even though the field isn't labeled."

This quote explains that the AI is capable of understanding context to identify information such as addresses even when not explicitly labeled.

"Our technology also normalizes field names across ID documents, such as last name, which in this example is listed as LN, and on other IDs, it may not be listed at all, or it may be listed as last name."

This quote highlights the AI's ability to standardize field names for consistency across various forms of identification.

"This also prevents a mistake of recognizing the person's name as lndo."

The quote emphasizes the importance of AI in preventing errors in name recognition.
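Because the field names arrive already normalized, downstream code can look them up by a fixed key regardless of how the ID labels them. A minimal sketch, assuming the Textract `AnalyzeID` response layout of `IdentityDocuments`, each with `IdentityDocumentFields` that have a `Type` and a `ValueDetection`. The `id_fields` helper and the sample values are hypothetical.

```python
def id_fields(response):
    """Return one {normalized_field_name: value} dict per identity document."""
    documents = []
    for doc in response.get("IdentityDocuments", []):
        fields = {}
        for field in doc.get("IdentityDocumentFields", []):
            value = field.get("ValueDetection", {}).get("Text", "")
            if value:  # skip fields that are absent on this ID
                fields[field["Type"]["Text"]] = value
        documents.append(fields)
    return documents


# Hypothetical response for an ID that prints the last name under "LN".
sample_id_response = {
    "IdentityDocuments": [{
        "DocumentIndex": 1,
        "IdentityDocumentFields": [
            {"Type": {"Text": "LAST_NAME"},
             "ValueDetection": {"Text": "DOE", "Confidence": 98.7}},
            {"Type": {"Text": "ADDRESS"},
             "ValueDetection": {"Text": "123 ANY STREET", "Confidence": 97.2}},
            {"Type": {"Text": "MIDDLE_NAME"},
             "ValueDetection": {"Text": "", "Confidence": 99.0}},
        ],
    }]
}
parsed_ids = id_fields(sample_id_response)
print(parsed_ids)
```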

AI Understanding of Invoices and Receipts

AI can interpret invoices and receipts by recognizing fields like subtotal, discount, tax, and total.
AI is capable of extracting line item details from invoices, similar to form tables.

"Our AI also has advanced capabilities to understand invoices and receipts using implied context and standardized field names to reduce the amount of processing effort."

The quote indicates that AI uses context and standardization to efficiently process financial documents.
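The standardized field names make invoice results straightforward to flatten. The sketch below assumes the Textract `AnalyzeExpense` response layout, with `ExpenseDocuments` containing `SummaryFields` and `LineItemGroups`. The `expense_summary` helper and all sample values are hypothetical.

```python
def expense_summary(response):
    """Flatten each expense document into summary fields and line items."""
    results = []
    for doc in response.get("ExpenseDocuments", []):
        summary = {
            field["Type"]["Text"]: field["ValueDetection"]["Text"]
            for field in doc.get("SummaryFields", [])
            if "ValueDetection" in field
        }
        line_items = [
            {
                field["Type"]["Text"]: field["ValueDetection"]["Text"]
                for field in item.get("LineItemExpenseFields", [])
                if "ValueDetection" in field
            }
            for group in doc.get("LineItemGroups", [])
            for item in group.get("LineItems", [])
        ]
        results.append({"summary": summary, "line_items": line_items})
    return results


# Hypothetical response for a one-line invoice.
sample_expense_response = {
    "ExpenseDocuments": [{
        "SummaryFields": [
            {"Type": {"Text": "SUBTOTAL"}, "ValueDetection": {"Text": "100.00"}},
            {"Type": {"Text": "TAX"}, "ValueDetection": {"Text": "10.00"}},
            {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "110.00"}},
        ],
        "LineItemGroups": [{
            "LineItems": [{
                "LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"}, "ValueDetection": {"Text": "Widget"}},
                    {"Type": {"Text": "PRICE"}, "ValueDetection": {"Text": "100.00"}},
                ]
            }]
        }],
    }]
}
parsed_expenses = expense_summary(sample_expense_response)
print(parsed_expenses)
```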

AI Extraction and Training for Unique Business Needs

AI can be trained to recognize specific entities in dense text documents, such as patient names or IDs in medical documents.
AI predictions come with confidence scores to assist human verification.
Training AI for entity recognition is also applicable to legal contracts and other complex documents.

"Many businesses need to find information that is unique to their use case within dense text documents, you can train our AI to recognize entities such as the patient's name, the date they were admitted to the hospital, or their patient id."

This quote explains that AI can be customized to identify business-specific information in detailed documents.

Generative AI in Document Processing

Generative AI can summarize information in documents, like insurance reports.
AI can answer questions about document content, improving interactivity and accessibility.
Generative AI can be integrated into chatbot systems for dynamic information retrieval.

"In this insurance report, we see how an LLM, which is hosted on SageMaker, can be used to summarize the information contained in the insurance report."

The quote describes how generative AI, specifically a Large Language Model (LLM), can summarize complex documents.

"We also can use the generative AI to ask questions to find information about the document."

This quote suggests that generative AI can be used for querying documents to extract specific information.

AI Accommodation for Skewed Scans and Medical Transcriptions

AI can accurately extract information from documents scanned at an angle.
Health AI identifies relationships among extracted health information and links it to medical ontologies like ICD-10 or SNOMED.

"Here we see that the transcription document was scanned askew, but the IDP solution was able to accommodate this and extract out the information correctly."

The quote demonstrates the AI's ability to handle imperfectly scanned documents and still accurately extract information.

AI Extraction from Medical Prescriptions

AI can extract information from challenging images, such as glossy or difficult-to-read medical prescriptions.
Extracted information is linked to medical ontologies like RxNorm.

"Here we see the image. It's kind of glossy, somewhat difficult to read, but the IDP solution extracted out the information correctly and then linked it to the RxNorm medical ontology."

This quote shows the AI's capability to interpret and contextualize information from suboptimal images of medical prescriptions.
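Ontology linking can be pictured as choosing the best-scoring concept for each extracted medication. This sketch assumes a response shaped like Amazon Comprehend Medical's `infer_rx_norm` output, where each entity carries a list of `RxNormConcepts` with `Code`, `Description`, and `Score`. The talk does not name the service behind the demo, so that choice is an assumption, and the codes in the sample are placeholders, not real RxNorm identifiers.

```python
def top_rxnorm_concepts(response):
    """Return (entity_text, concept_code, description) for each linked entity."""
    linked = []
    for entity in response.get("Entities", []):
        concepts = entity.get("RxNormConcepts", [])
        if not concepts:
            continue  # nothing to link for this entity
        best = max(concepts, key=lambda concept: concept["Score"])
        linked.append((entity["Text"], best["Code"], best["Description"]))
    return linked


# Hypothetical response; the codes are placeholders.
sample_rx_response = {
    "Entities": [{
        "Text": "amoxicillin 500 mg",
        "Category": "MEDICATION",
        "Score": 0.97,
        "RxNormConcepts": [
            {"Code": "PLACEHOLDER-1", "Score": 0.92,
             "Description": "amoxicillin 500 MG Oral Capsule"},
            {"Code": "PLACEHOLDER-2", "Score": 0.41,
             "Description": "amoxicillin 500 MG Oral Tablet"},
        ],
    }]
}
linked_concepts = top_rxnorm_concepts(sample_rx_response)
print(linked_concepts)
```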

AI Identification and Redaction of Sensitive Information

AI identifies and redacts Personally Identifiable Information (PII) and Protected Health Information (PHI), including names, IDs, and dates.

"In this example, the names of the doctor and the patient were identified as PII."

The quote illustrates the AI's ability to detect and protect sensitive personal information within documents.

Generative AI Normalization of Information

Generative AI normalizes dates and names into standard formats.
AI corrects extracted text from handwritten notes, including grammar and spelling.

"GenAI may be used to normalize information into a standard format. So in this case it is taking the dates and normalizing them in a standard month-month, date, and year-year format."

This quote describes how generative AI can standardize date formats for consistency.

"We can send this information into the LLM, which will correct the text, including grammar as well as spelling."

The quote explains how generative AI can refine and correct textual information, improving the accuracy of document processing.
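Whatever model performs the normalization, its output can be checked with ordinary code. The sketch below is not the generative approach from the talk but a deterministic parser that could validate or back up a model's answer. The accepted input formats and the MM/DD/YYYY target are assumptions, since the transcript does not pin down the exact format, and month names are parsed in the default English locale.

```python
from datetime import datetime
from typing import Optional

# Input formats this sketch accepts (an assumption, extend as needed).
_INPUT_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y", "%d %b %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as MM/DD/YYYY, or None if no known format matches."""
    for fmt in _INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None


for raw in ["2023-03-05", "March 5, 2023", "5 Mar 2023", "sometime soon"]:
    print(raw, "->", normalize_date(raw))
```

A `None` result is a natural signal to fall back to a model or to a human reviewer.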

Review and Validation Stage in AI Document Processing

AI processes documents, extracts key fields with appropriate confidence levels, and cross-validates information across documents.
Verification ensures consistency, such as matching insurance IDs and patient names across different forms.

"Here we can see that we have collected and processed all the documents within the claim, that we have extracted out the requisite key fields with the correct confidence level, and then we have cross validated the information from documents."

The quote indicates a comprehensive review process where AI checks for consistency and accuracy in the extracted information.
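Cross-validation of this kind reduces to comparing the same field across documents after light normalization. In this minimal sketch the `cross_validate` function, the field names, and the sample values are hypothetical.

```python
def cross_validate(documents, fields):
    """Return {field: {document: value}} for fields that disagree."""
    mismatches = {}
    for field in fields:
        values = {
            name: doc[field].strip().upper()   # ignore case and padding
            for name, doc in documents.items()
            if field in doc
        }
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


# Hypothetical extracted fields from two documents in one claim.
claim_documents = {
    "claim_form": {"patient_name": "Jane Doe", "insurance_id": "ABC123"},
    "insurance_card": {"patient_name": "JANE DOE", "insurance_id": "ABC128"},
}
problems = cross_validate(claim_documents, ["patient_name", "insurance_id"])
print(problems)
```

An empty result means the checked fields agree, and anything else names the documents a reviewer should compare.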

Conclusion of IDP Presentation

The presentation summarized Intelligent Document Processing (IDP) at AWS, its optimization of document workflows, and augmentation with generative AI.
IDP and generative AI create summaries, normalize outputs, and provide a natural language interface for document data interaction.

"At AWS, we learned what IDP is, how it helps customers optimize their document processing workflows, and how using generative AI can augment IDP by creating summaries, normalizing outputs, and providing a natural language interface to interact with the document data."

This concluding quote summarizes the main points of the presentation, highlighting the benefits of IDP and generative AI in document processing.
