From helping us take the best smartphone shots and aiding our shopping choices, to influencing our entertainment decisions, AI has been present in our lives for years. However, until recently, its presence was subtle, and most everyday users still considered it a future technology.

The turning point came in 2022 when a series of groundbreaking generative AI launches, including DALL-E 2, Midjourney, Stable Diffusion, and notably ChatGPT, captivated mainstream audiences and the tech industry with their text-to-image and text-generation capabilities.

Today, McKinsey estimates that productivity boosts created by generative AI could add up to $4.4 trillion per year to the global economy. Much of that value stems from the automation and enhancement of language-based tasks; therefore, this article will delve into generative AI document understanding, examining its potential to streamline paperwork processes across diverse industries.

What Is Document Understanding?

Document understanding, often referred to as intelligent document processing (IDP), is the use of AI, RPA (Robotic Process Automation), NLP (Natural Language Processing), ML (Machine Learning), OCR (Optical Character Recognition), and other technologies to let computers understand and process documents. It covers both structured data (billing codes, databases, spreadsheets, etc.) and unstructured data (medical notes, emails, call transcriptions, etc.).

Why is it such a big deal? Because traditional, manual document processing is highly time-consuming. A recent study by Zapier sheds light on the administrative burden workers face daily.

[Figure: The biggest administrative time sinks. Source: Zapier report, How office workers spend time]

Worse still, administrative work can stand in the way of other, more critical tasks. For instance, physicians spend up to 19 hours a week on paperwork—nearly half of the work week they could spend tending to patients.

Another issue is the error rate. Even the most meticulous back-office workers will eventually make mistakes, and the risk grows with the number of documents that need to be processed.

To expedite and improve the accuracy of document-centered tasks, intelligent document processing uses several components that each serve a specific purpose:


Pre-processing

Converting the document file into content machines can read, enabling further processing.

Document classification

Document understanding tools can automatically discern the type of documents and group them into predefined categories or topics based on their content for easier retrieval.

Text extraction

Capturing relevant data from the categorized documents using OCR.

Sentiment analysis

The process of “understanding” emotions and opinions expressed in the document to determine its general tone.

Human validation

Knowledge workers double-check the extracted and categorized output to ensure it’s 100% correct.

Continuous learning

ML capabilities allow document processing tools to use the output to learn and increase accuracy.
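To make the flow of these components more concrete, here is a minimal Python sketch of how the stages might be chained. It is an illustration under simplifying assumptions, not how any particular IDP product works: the `Document` class, the keyword rules in `CATEGORY_KEYWORDS`, and the function names are all hypothetical, the input string stands in for OCR output, and a real tool would use trained models instead of keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    raw_text: str                       # stands in for the OCR output
    category: str = "unknown"
    fields: dict = field(default_factory=dict)
    needs_review: bool = True


# Hypothetical keyword rules; a real IDP tool would use a trained classifier.
CATEGORY_KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "claim": ("claim", "policy number"),
}


def classify(doc):
    """Assign the first category whose keywords appear in the text."""
    text = doc.raw_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            doc.category = category
            break
    return doc


def extract_fields(doc):
    """Collect simple 'Key: value' lines as extracted fields."""
    for line in doc.raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            doc.fields[key.strip().lower()] = value.strip()
    return doc


def flag_for_review(doc, required=("total",)):
    """Send the document to a human if it is unclassified or incomplete."""
    doc.needs_review = doc.category == "unknown" or any(
        name not in doc.fields for name in required
    )
    return doc


def process(raw_text):
    """Run the stages in order: classify, extract, decide on human review."""
    doc = Document(raw_text=raw_text)
    for step in (classify, extract_fields, flag_for_review):
        doc = step(doc)
    return doc


sample = "Invoice 1042\nVendor: Acme Supplies\nTotal: 1250.00"
result = process(sample)
print(result.category, result.fields, result.needs_review)
```

Documents that come back with `needs_review` set would go to a knowledge worker, and the corrected output could then be fed back as training data for the learning step.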

The Role of Generative AI in Document Understanding

Document understanding tools are already extremely useful in eliminating manual paperwork. With generative AI, they can achieve even more profound document process automation.

What are large language models (LLMs)?

Large language models are a type of generative AI algorithm designed specifically to understand, process, and generate textual content. For this purpose, LLMs use machine learning and massive volumes of training data to self-learn how to interpret and create text. Notable examples of LLMs and LLM-based products include:

  • ChatGPT
  • BERT
  • Bard
  • LaMDA

Text generation

Generative AI’s primary use is content generation, but it has multiple applications in document processing. Generative AI can write reports based on processed documents and offer instant suggestions for writing tasks. It can also quickly convert structured data into reports, narratives, and marketing content.

Document summarization

IDP tools equipped with gen AI capabilities can quickly go through lengthy texts, extract the most relevant information, and condense it into concise summaries. Summaries provide an overview of the document’s content, allowing for easier access to information and faster decision-making.

Content augmentation

IDP’s machine learning functionalities depend on training data to improve. However, in some cases, real-world data may be unavailable or insufficient. This is where Gen AI comes in: it can generate large volumes of synthetic documents that share the essential features of their real-world counterparts, like structure and format, and fill them with diversified content.
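The idea of synthetic documents can be illustrated with a short Python sketch. Note the assumption: instead of a generative model, it fills fixed templates with random values, which is only a simple stand-in for what a Gen AI tool would do. The vendor names, layouts, and the `synthetic_invoices` function are made up for this example.

```python
import random

# Made-up vendor names and layouts, used purely for illustration.
VENDORS = ["Acme Supplies", "Northwind Traders", "Globex"]
LAYOUTS = [
    "INVOICE {number}\nVendor: {vendor}\nDate: {date}\nTotal: {total:.2f}",
    "{vendor}\nInvoice no. {number} issued {date}\nAmount due: {total:.2f}",
]


def synthetic_invoices(count, seed=0):
    """Return `count` fake invoices that vary in layout and content."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same set
    documents = []
    for _ in range(count):
        layout = rng.choice(LAYOUTS)
        documents.append(
            layout.format(
                number=rng.randint(1000, 9999),
                vendor=rng.choice(VENDORS),
                date=f"2023-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
                total=rng.uniform(10, 5000),
            )
        )
    return documents


training_set = synthetic_invoices(5, seed=42)
print(training_set[0])
```

Each generated document keeps the structure of a real invoice while the content varies, which is the property that makes synthetic data useful for training.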

Fraud prevention

Generative AI can be applied similarly to improve data security. Historical fraud attempt datasets are limited, and fraud techniques constantly change, which leads to a shortage of reliable training materials for ML models. Gen AI can provide ample training datasets to create robust and adaptable fraud detection models by producing large volumes of realistic, diversified synthetic documents.

Translation and multilingual processing

Compared to traditional automated translation tools, generative AI is much more sensitive to linguistic nuances. This ability, combined with Gen AI’s text-generation capacities, makes it a perfect solution for fast, accurate, and contextually correct document translation. A properly trained and verified model will streamline cooperation in multilingual companies and improve customer communication in businesses that operate in global markets.

Conversational search

Thanks to its deep understanding of natural language, Gen AI can replace keyword-based search methods with conversational prompts. AI models can find links between documents, assess their tone, and interpret context without relying on keywords. By describing what they need in natural-language phrases, users can formulate more specific prompts and retrieve the necessary document faster and more accurately.
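The retrieval step behind conversational search can be sketched in Python as "turn the question and each document into a vector, then rank by similarity". The sketch below is an assumption-heavy illustration: `toy_embed` simply counts words, so it is still keyword-based, and a real Gen AI system would plug in a language model's embeddings through the `embed` parameter to capture meaning and context. The file names and texts are invented.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy stand-in for a model embedding: a bag of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, documents, embed=toy_embed):
    """Return document names ranked by similarity to the query."""
    query_vector = embed(query)
    scored = [
        (cosine(query_vector, embed(text)), name)
        for name, text in documents.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


docs = {
    "lease.txt": "Lease agreement for the office on Main Street",
    "claim.txt": "Insurance claim for water damage in the kitchen",
    "invoice.txt": "Invoice for office supplies and printer paper",
}
hits = search("which claim mentions water damage", docs)
print(hits)
```

The ranking logic stays the same when the toy function is swapped for real embeddings; only the quality of the match changes.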

[Figure: The benefits of generative AI in document processing]

Generative AI Document Understanding across Industries

The above use cases apply to virtually any document-based task, so a wide range of industries can take advantage of Gen AI’s capabilities. Here are a few examples.

Generative AI in Finance

The finance industry has two characteristics that favor generative AI document understanding: it’s receptive to new technologies and relies heavily on paperwork. As such, generative AI has the potential to revolutionize many document-based financial functions and processes, including:

  • Research, reporting, and forecasting,
  • Risk assessment,
  • Credit scoring,
  • Fraud prevention,
  • Generating market insights,
  • Loan underwriting and mortgage approval.

Given these opportunities, it’s no surprise that leading financial institutions have already started adopting generative AI. J.P. Morgan applies LLMs to payment validation screening, to improve processing, and to provide its clients with automated insights. Mastercard uses generative AI to enhance customer experience, treasury management, and product testing, to personalize communication, and to reduce bias in credit decisions.

Generative AI in Healthcare

In 2022, the US healthcare system spent $60 billion on administrative tasks, about $18 billion more than the year before. With numbers this high, even minor improvements could likely save medical institutions millions.

Generative AI can assist healthcare providers with a plethora of documentation tasks. Claim processing, health insurance prior authorization, and benefits verification are all relatively simple but laborious processes that could be expedited with generative AI document understanding. Gen AI could be integrated with existing EHR systems for a more significant effect.

Even more human-centric functions, such as customer service, can benefit significantly from Gen AI for easier, real-time document retrieval and detailed patient data search. Gen AI could also easily take over physician paperwork tasks: writing discharge summaries and patient instructions, summarizing lab results, or creating checklists.

Another use case for generative AI document understanding in healthcare is invoice processing. Thermo Fisher Scientific, a global provider of lab and pharmaceutical equipment and supplies, diagnostic services, and medical software products, decided to automate its payment processes with the help of UiPath Document Understanding. The results were stunning: the solution processed 840,000 invoices annually with 85% accuracy, reducing the time necessary by 70%.

Generative AI in Insurance

According to a World Economic Forum survey, insurance jobs are among the top candidates for Gen AI-based augmentation and automation. 34% of work time dedicated to insurance appraisal can be automated, and 66% can be augmented with LLMs. And for underwriting, 100% of work time can be augmented!

Let’s start with the latter. Generative AI can make risk predictions that factor in variables from applicants’ documents, such as age, health history, and occupation. Risk calculation is performed automatically and much faster than with traditional manual methods. Additionally, Gen AI-powered insurance document processing can significantly reduce error rates and bias.

Regarding claims processing, generative AI can compile data from all necessary documents, such as receipts, medical records, and claim forms. Simple cases can be automated entirely, eliminating manual data entry, while more complex ones are flagged for human verification. Claim tracking can also be automated, allowing for improved visibility.
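The split between fully automated simple cases and complex ones flagged for human verification can be expressed as a small routing rule. The Python sketch below is only an illustration: the required document types, the auto-approval limit, the confidence threshold, and the `Claim` fields are all assumed values, not figures from any insurer.

```python
from dataclasses import dataclass

# Assumed business rules, chosen only for this example.
REQUIRED_DOCS = {"claim_form", "receipt"}
AUTO_APPROVE_LIMIT = 1000.0


@dataclass
class Claim:
    claim_id: str
    amount: float
    documents: set                 # document types found in the submission
    extraction_confidence: float   # 0.0 to 1.0, reported by the extraction step


def route(claim, min_confidence=0.9):
    """Return 'automated' for simple claims, 'human_review' otherwise."""
    if REQUIRED_DOCS - claim.documents:
        return "human_review"      # something is missing from the file
    if claim.extraction_confidence < min_confidence:
        return "human_review"      # extracted data may be wrong
    if claim.amount > AUTO_APPROVE_LIMIT:
        return "human_review"      # too large to settle automatically
    return "automated"


simple = Claim("C-1", 240.0, {"claim_form", "receipt"}, 0.97)
complex_case = Claim("C-2", 8200.0, {"claim_form", "receipt", "medical_record"}, 0.95)
print(route(simple), route(complex_case))
```

Keeping the thresholds as named constants makes it easy for the business to tighten or relax how much is handled without a human in the loop.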

Data privacy is a serious concern in insurance that generative AI can also help address. The algorithm can analyze past fraud attempts and produce synthetic training data for machine learning-based protection. Prior authorization of health insurance also proves challenging, especially in complex cases involving several policies and document types. To help its customer service reps solve such issues, New York Life Insurance Co. is developing its own Gen AI-based tool. The AI assistant will facilitate data retrieval, allowing reps to provide faster support without putting customers on hold.

Generative AI in Real Estate

McKinsey estimates that generative AI could bring $110 billion to $180 billion in value for the real estate industry. This value is largely generated through content creation and unstructured data analytics.

Gen AI can quickly create property descriptions based on provided information such as location, number of rooms, area, etc. Automating this process could save dozens of working hours per week and provide input for listings at scale. Models capable of processing images can even produce descriptions based on photos or floor plans.

Using generative AI document understanding, asset managers can expedite the collection and analysis of property data, leading to improved budgeting and forecasting. Reporting, forecasting, risk analysis, finding acquisition candidates, and customer care can all be facilitated with Gen AI or automated completely. Generative models can also analyze data on a property’s energy efficiency to provide insights into potential cost savings, benefitting buyers and sellers.

Leading real estate investment platforms like Keyway use machine learning and generative AI to increase efficiency and improve investor experience. Keyway’s Keypilot assistant manages several tasks throughout the investment process, from property search and selection to drafting investment memorandums, valuation prediction, and contract analysis.

Bottom line

Given the opportunities Gen AI presents today, it’s easy to forget that it’s still a relatively new tool. As generative AI advances and more businesses find new ways to leverage its power for intelligent document processing, the future of productivity couldn’t be more exciting.

At Flobotics, we’re part of this future, offering a combination of tried and true intelligent automation solutions, proven RPA tools, and generative AI to boost workflow efficiency to its limits and beyond. Want to join us? Get in touch, and let’s talk about what we can do for your business.


Karl Mielnicki, CTO of Flobotics

RPA (Robotic Process Automation) expert and enthusiast with over 5 years of IT experience working for consulting companies and tech startups. UiPath consultant and accredited Blue Prism developer.