About the client
| Parameter | Value |
|---|---|
| Industry | Financial services |
| Size | Mid-size company |
| Sponsor | Sales division leadership in collaboration with IT |
| Document volume | Several hundred per month |
| Document types | Dealer network offers (PDF, Word, scanned images) in multiple languages |
Starting point
The client operates in a segment where fast response to enquiries is critical. Offers arrive from a large dealer network, and each dealer sends documents in its own format: structured PDFs, Word forms, exports from internal systems, scanned images, smartphone photos. Page counts and visual layouts vary, and the documents come in several languages.
The processing workflow was entirely manual. An operator would open the document, read it, identify the key parameters (commercial terms, technical specifications, price, timelines) and re-key them into the internal system. A single document typically took 10–15 minutes, sometimes more for complex cases. With hundreds of documents arriving per month, the process had become the bottleneck of the sales operation.
Management was considering two paths: hire more operators, or automate. Specialist OCR tools struggled with the variability of incoming documents: low-resolution scans, photographs, different layouts, foreign languages. The client was looking for an approach that could handle this variability without requiring every document type to be configured separately.
What the problem was
- Input variety — from structured PDFs to smartphone photos, multiple languages, different layouts
- Time cost — 10–15 minutes per document at hundreds per month meant significant administrative capacity demands
- Slower response to enquiries — until an offer was processed, the company couldn’t react
- Pressure to hire — the alternative was taking on more people, which was expensive and slow
- Specialist OCR tools weren’t enough — configuring for every input type would mean an endless list of exceptions
- Data security — in the financial sector it is essential to know exactly where data goes and who can access it
What I did
The engagement ran in five phases. The resulting solution is built on an LLM with custom orchestration, an approach that copes with input variability in a way that specialist OCR tools could not.
1. Process mapping and document typology (1 week)
I went through the actual processing workflow with the sales and administrative team and documented:
- All typical document formats and their frequency
- What data is extracted from them (17 key fields across several categories)
- What validation rules operators apply, often intuitively
- Where edge cases and error states occur — unclear formats, missing data, foreign languages
2. Technical options assessment (1 week)
I compared four approaches:
- Specialist OCR+AI tools (Rossum, ABBYY, Hypatos)
- Azure AI Document Intelligence / AWS Textract
- LLM with custom orchestration (ChatGPT, Claude, Gemini)
- Hybrid solution
For this use case, the LLM approach proved most suitable, precisely because modern LLMs can handle visually heterogeneous inputs without each layout being configured separately.
3. Security framework and legal review (2 weeks)
In the financial sector, the security framework is a critical phase. Together with internal IT, compliance, and the legal team, we worked through:
- Where to process data — selecting enterprise LLM services with guarantees that data is not used for training
- Data flows — the document’s journey from receipt, through processing, to archiving
- Encryption and protection — in-transit and at-rest
- Audit trail — a record of what was processed when and by which model
- GDPR and banking secrecy — explicit assessment of all personal and sensitive data
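As an illustration of the audit trail item, here is a minimal sketch of one possible audit record. It is not the record format used at the client: the field names, the `outcome` values, and the model identifier are assumptions made for the example. The sketch stores a SHA-256 fingerprint of the document rather than its content, so the log itself holds no personal data.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    # All field names are hypothetical, chosen for this sketch.
    document_sha256: str  # fingerprint of the document, not its content
    processed_at: str     # UTC timestamp in ISO 8601 format
    model: str            # identifier of the model that handled the document
    outcome: str          # assumed values: "auto_accepted" or "sent_to_review"


def make_audit_record(document_bytes: bytes, model: str, outcome: str) -> AuditRecord:
    """Build one record answering: what was processed, when, and by which model."""
    return AuditRecord(
        document_sha256=hashlib.sha256(document_bytes).hexdigest(),
        processed_at=datetime.now(timezone.utc).isoformat(),
        model=model,
        outcome=outcome,
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialise as one JSON object per line, which is easy to append and search."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_audit_record(b"example offer", model="example-model-v1", outcome="sent_to_review")
print(to_log_line(record))
```

Where such a log is stored, and for how long, would follow from the archiving and data-flow decisions listed above.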
4. Pilot implementation (3 weeks)
The pilot was built as an orchestrator that:
- Accepted a document from the incoming channel
- Identified the format and OCR’d scans or photographs where needed
- Passed the content to the LLM with a structured prompt for extracting the 17 key fields
- Applied validation rules (data types, ranges, consistency)
- Flagged unclear cases for human review
- Wrote the data to the downstream system
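The steps above can be sketched roughly as follows. This is a simplified illustration, not the production orchestrator: the OCR step, the LLM call, and the downstream write are passed in as plain functions, the four field names stand in for the 17 real fields (which are not listed here), and the plausibility limits are invented. Holding flagged records back from the downstream system until a person has reviewed them is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical subset of the extracted fields, for illustration only.
REQUIRED_FIELDS = ("dealer_name", "price", "currency", "delivery_weeks")


@dataclass
class Result:
    fields: dict
    issues: list = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        return bool(self.issues)


def validate(fields: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    issues = [f"missing: {name}" for name in REQUIRED_FIELDS
              if fields.get(name) in (None, "")]
    price = fields.get("price")
    if price is not None and not (isinstance(price, (int, float)) and price > 0):
        issues.append("price must be a positive number")
    weeks = fields.get("delivery_weeks")
    if weeks is not None and not (isinstance(weeks, int) and 0 < weeks <= 104):
        issues.append("delivery_weeks outside the plausible range")
    return issues


def process(document: bytes, is_image: bool,
            ocr: Callable[[bytes], str],
            extract: Callable[[str], dict],
            write: Callable[[dict], None]) -> Result:
    """Run one document through OCR (if needed), extraction, validation, and write."""
    text = ocr(document) if is_image else document.decode("utf-8", errors="replace")
    fields = extract(text)  # in practice: an LLM call with a structured prompt
    result = Result(fields=fields, issues=validate(fields))
    if not result.needs_review:
        write(fields)  # only clean records go straight to the downstream system
    return result


# Usage with stand-in functions instead of real OCR, LLM, and system calls.
written = []
stub_extract = lambda text: {"dealer_name": "Example Dealer", "price": 1250.0,
                             "currency": "EUR", "delivery_weeks": 6}
outcome = process(b"offer text", False, ocr=lambda b: "",
                  extract=stub_extract, write=written.append)
print(outcome.needs_review, len(written))
```

Passing the external steps in as functions keeps the orchestration logic testable without a live model or OCR service.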
The pilot ran in parallel with the manual process for 3 weeks. Results were compared daily.
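One simple way to run such a daily comparison is field-level agreement between the operator's record and the automated one. The sketch below assumes both records are plain dictionaries keyed by field name; the function names and sample values are hypothetical, and the actual comparison method used in the pilot is not described here.

```python
def field_agreement(manual: dict, automated: dict) -> dict:
    """Per-field match flags for one document, checked on the manual record's keys."""
    return {name: automated.get(name) == value for name, value in manual.items()}


def agreement_rate(pairs: list) -> float:
    """Share of all compared fields on which the manual and automated records agree."""
    flags = [ok
             for manual, automated in pairs
             for ok in field_agreement(manual, automated).values()]
    return sum(flags) / len(flags) if flags else 0.0


# Invented sample: two documents, one price mismatch out of four fields.
pairs = [
    ({"price": 100, "currency": "EUR"}, {"price": 100, "currency": "EUR"}),
    ({"price": 200, "currency": "EUR"}, {"price": 250, "currency": "EUR"}),
]
print(agreement_rate(pairs))  # 0.75
```

Exact equality is the strictest possible check. A real comparison would likely normalise values first (number formats, whitespace, date notation) so that cosmetic differences are not counted as errors.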
5. Go-live and handover (2 weeks)
After refining the prompt, validation rules, and orchestration, the process moved into production. Human oversight was retained, but for most documents it was reduced to validation rather than full processing.
What the client received
- Technical options assessment — structured analysis of four approaches with a recommendation
- Security framework — what to send where, how to archive, how to audit
- Process map of the new workflow — roles, checks, escalation paths for errors
- Working orchestration solution built on an LLM, with full documentation
- Prompts and validation rules for the 17 key extracted fields
- Operational runbook — how to handle errors, quality monitoring, updates when document formats change
- Tracking metrics — extraction quality, operator intervention rate, average processing time
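The three tracking metrics can be computed from a simple per-document log. The sketch below is one possible definition, not the client's: it assumes extraction quality is the share of fields an operator did not have to correct, and the record layout and sample numbers are invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProcessedDocument:
    seconds: float         # end-to-end processing time for this document
    fields_total: int      # number of fields extracted
    fields_corrected: int  # fields an operator changed during review
    reviewed: bool         # True if an operator had to intervene


def summarise(docs: list) -> dict:
    """Aggregate extraction quality, intervention rate, and average processing time."""
    if not docs:
        raise ValueError("no documents to summarise")
    total_fields = sum(d.fields_total for d in docs)
    corrected = sum(d.fields_corrected for d in docs)
    return {
        "extraction_quality": 1 - corrected / total_fields,
        "intervention_rate": sum(d.reviewed for d in docs) / len(docs),
        "avg_seconds": mean(d.seconds for d in docs),
    }


# Invented sample data, for illustration only.
sample = [
    ProcessedDocument(seconds=55.0, fields_total=17, fields_corrected=0, reviewed=False),
    ProcessedDocument(seconds=70.0, fields_total=17, fields_corrected=2, reviewed=True),
]
print(summarise(sample))
```

Tracking these per week or per dealer, rather than as one overall figure, makes it easier to spot when a dealer changes its document format.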
What the impact was
Average processing time per document fell from the original 10–15 minutes to approximately 1 minute, making the process roughly 10–15× faster while maintaining quality and keeping human validation for unclear cases. The company did not need to hire additional administrative staff, even as document volumes continued to grow.
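To put the time saving in monthly terms, here is a small worked calculation. The per-document times come from the figures above. The volume of 300 documents per month is an assumption chosen only to make the arithmetic concrete, since the actual volume is given only as several hundred per month.

```python
def monthly_hours(documents_per_month: int, minutes_per_document: float) -> float:
    """Operator hours per month at a given volume and per-document time."""
    return documents_per_month * minutes_per_document / 60


VOLUME = 300  # assumed for illustration; not the client's actual volume

before_low = monthly_hours(VOLUME, 10)   # manual, lower bound
before_high = monthly_hours(VOLUME, 15)  # manual, upper bound
after = monthly_hours(VOLUME, 1)         # automated, approximately 1 minute

print(f"before: {before_low:.0f}-{before_high:.0f} h/month, after: {after:.0f} h/month")
# before: 50-75 h/month, after: 5 h/month
```

At that assumed volume, the difference is on the order of 45 to 70 operator hours per month.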
Beyond the direct speed improvement, the solution delivered three less obvious but equally important effects.
Faster response to enquiries. The sales team could respond more quickly. In an environment where deals are decided within days, that directly affects conversion rates and deal volume.
Freed administrative capacity. With no need to hire more people, existing capacity shifted toward higher-value work: complex cases, individual clients, strategic sales support.
Standardised output data. Despite the wide variety of inputs, the output of the orchestration process has a uniform structure, which made downstream analysis and reporting significantly easier.
A note from practice
On projects like this, the choice of technical approach is critical. Had I recommended a specialist OCR solution, we would have run into an endless series of exceptions it could not handle: smartphone photos, different languages, unstructured offers. LLMs with vision capabilities cope with this variability far better, because they work from the content of a document rather than from a pre-configured layout. At the same time, the LLM approach is more demanding on security and governance, which is why, at this financial-sector client, the security framework took almost as long as the pilot implementation (2 weeks against 3). Taking shortcuts on the security framework would have been unacceptable in this segment.
Related service
This project combined consulting on an AI use case, comparison of vendor and technology options, a security framework, and the design of a working orchestration process. In the service catalogue, it sits closest to AI for business.