Document AI: Modern finance and operations teams work with PDF invoices and logistics documents every day. These files contain rich information – header details, nested line items, shipment summaries – but that information is locked in an unstructured format.
In this post, we’ll walk through an example that combines:
- Snowflake Document AI to extract structured JSON from PDF invoices
- A lateral FLATTEN + array indexing pattern to handle nested line items cleanly
- Dynamic Tables to materialize header- and line-level facts continuously
The pipeline is designed to handle a variable number of line items per invoice and the incremental ingestion of new PDFs.
The Use Case: Logistics Invoice with Nested Line Items
Our scenario is a logistics company (Acme Logistics Ltd) generating invoices for a retail customer. Each invoice is a PDF with:
- Header-level fields
  - Invoice number: INV-2045
  - Invoice date: 16-May-2025
  - Vendor name: ACME LOGISTICS LTD
  - PO number: PO-4500123
  - Currency: INR
- Nested line items – one line per service, e.g.
  - SRV-ZONEA Line Haul Charges – Zone A
  - SRV-WH-MAY Warehouse Storage – May 2025
  - SRV-FUEL Fuel Surcharge & Handling
- Quantities and prices per line, and invoice-level totals.
Technical:
We use Snowflake Document AI to build a model that extracts invoice metadata and returns a JSON payload like:
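The exact payload depends on the values defined in the model, so the snippet below is only a sketch. It assumes the shape Document AI typically returns, where each field maps to a list of entries carrying a value (real entries also carry a confidence score, omitted here rather than invented). The field names (`invoice_number`, `description`, and so on) and the helper `first_value` are hypothetical; the values come from the sample invoice above.

```python
import json

# Hypothetical payload: field names depend on how the model's values were defined.
payload_text = """
{
  "invoice_number": [{"value": "INV-2045"}],
  "invoice_date":   [{"value": "16-May-2025"}],
  "vendor_name":    [{"value": "ACME LOGISTICS LTD"}],
  "po_number":      [{"value": "PO-4500123"}],
  "currency":       [{"value": "INR"}],
  "description": [
    {"value": "Line Haul Charges – Zone A"},
    {"value": "Warehouse Storage – May 2025"},
    {"value": "Fuel Surcharge & Handling"}
  ]
}
"""


def first_value(payload, field):
    """Return the first extracted value for a field, or None if it is absent."""
    entries = payload.get(field) or []
    return entries[0].get("value") if entries else None


payload = json.loads(payload_text)

# Header fields are single-valued, so the first entry is the one we want.
header_fields = ("invoice_number", "invoice_date", "vendor_name", "po_number", "currency")
header = {name: first_value(payload, name) for name in header_fields}

print(header)
# A line-level field holds one entry per line item on the invoice.
print(len(payload["description"]), "line items")
```

Header fields and line-level fields share the same list-of-entries shape. The only difference is how many entries each list holds, which is why the line items need separate handling.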
Step 1: We developed the model using Document AI.
Note that Description is returned as a nested array. Our goal is to convert nested arrays like this into a clean relational structure.
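To make the target shape concrete, here is a minimal Python sketch of the idea that the FLATTEN + array indexing pattern expresses in SQL: walk one array to get a position per line item, then use that position to read the sibling arrays. The field names (`service_code`, `description`) and the function `flatten_line_items` are hypothetical and only illustrate the logic; the values come from the sample invoice.

```python
def flatten_line_items(payload, driver, siblings):
    """Emit one row per entry of the `driver` array.

    Sibling arrays are read at the same index. A missing entry becomes None,
    so invoices with a variable number of line items still produce rows.
    """
    rows = []
    for index, entry in enumerate(payload.get(driver, [])):
        row = {"line_no": index + 1, driver: entry.get("value")}
        for name in siblings:
            values = payload.get(name, [])
            row[name] = values[index].get("value") if index < len(values) else None
        rows.append(row)
    return rows


# Hypothetical extraction output, reduced to the line-level fields.
payload = {
    "service_code": [
        {"value": "SRV-ZONEA"},
        {"value": "SRV-WH-MAY"},
        {"value": "SRV-FUEL"},
    ],
    "description": [
        {"value": "Line Haul Charges – Zone A"},
        {"value": "Warehouse Storage – May 2025"},
        {"value": "Fuel Surcharge & Handling"},
    ],
}

line_rows = flatten_line_items(payload, driver="description", siblings=["service_code"])
for row in line_rows:
    print(row)
```

Driving the loop from a single array and indexing into the others keeps each line item's fields aligned on one row, regardless of how many line items an invoice contains.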