Ajith's Tech Tips

Build.Break.Improve.Repeat..!!!!

Document AI: Modern finance and operations teams work with PDF invoices and logistics documents every day. These files contain rich information – header details, nested line items, shipment summaries – but that information is locked in an unstructured format.

In this post, we’ll walk through an example that combines:

  • Snowflake Document AI to extract structured JSON from PDF invoices
  • A LATERAL FLATTEN + array indexing pattern to handle nested line items cleanly
  • Dynamic Tables to materialize header- and line-level facts continuously

The approach is designed to handle a variable number of line items per invoice and incremental ingestion of new PDFs.

The Use Case: Logistics Invoice with Nested Line Items

Our scenario is a logistics company (Acme Logistics Ltd) that issues invoices to a retail customer. Each invoice is a PDF with:

  • Header-level fields
    • Invoice number: INV-2045
    • Invoice date: 16-May-2025
    • Vendor name: ACME LOGISTICS LTD
    • PO number: PO-4500123
    • Currency: INR
  • Nested line items – one line per service, e.g.
    • SRV-ZONEA Line Haul Charges – Zone A
    • SRV-WH-MAY Warehouse Storage – May 2025
    • SRV-FUEL Fuel Surcharge & Handling
  • Quantities and prices per line, and invoice-level totals.

Technical:

We use Snowflake Document AI to build a model that extracts invoice metadata and returns it as a JSON payload.

Step 1: We developed the model using Document AI.

Observe the description "Nested Array": the line-item values come back as nested arrays, and our goal is to convert these nested arrays into a clean relational structure.
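To make the "nested arrays to relational rows" goal concrete, here is a minimal Python sketch of the flatten + array indexing idea. It is not the Snowflake implementation, and the payload shape is an assumption: each extracted field is taken to be a list of {"value": ...} entries, and the field names (item_code, item_description and so on) are hypothetical. The values themselves come from the sample invoice above.

```python
from typing import Any, Optional

# Assumed payload shape (hypothetical field names): every field is a list of
# {"value": ...} entries. Header fields have one entry, line fields one per line.
payload = {
    "invoice_number": [{"value": "INV-2045"}],
    "currency": [{"value": "INR"}],
    "item_code": [
        {"value": "SRV-ZONEA"},
        {"value": "SRV-WH-MAY"},
        {"value": "SRV-FUEL"},
    ],
    "item_description": [
        {"value": "Line Haul Charges – Zone A"},
        {"value": "Warehouse Storage – May 2025"},
        {"value": "Fuel Surcharge & Handling"},
    ],
}


def value_at(doc: dict, field: str, index: int) -> Optional[Any]:
    """Return the value at `index` in a field's array, or None if it is missing."""
    entries = doc.get(field) or []
    return entries[index]["value"] if index < len(entries) else None


def flatten_line_items(doc: dict, driver: str, line_fields: list,
                       header_fields: list) -> list:
    """Emit one row per element of the `driver` array.

    The element's position is used to index the sibling line-level arrays,
    and the header values are repeated on every row.
    """
    rows = []
    for index, entry in enumerate(doc.get(driver) or []):
        row = {name: value_at(doc, name, 0) for name in header_fields}
        row["line_index"] = index
        row[driver] = entry["value"]
        for name in line_fields:
            row[name] = value_at(doc, name, index)
        rows.append(row)
    return rows


rows = flatten_line_items(
    payload,
    driver="item_code",
    line_fields=["item_description"],
    header_fields=["invoice_number", "currency"],
)
for row in rows:
    print(row)
```

In Snowflake, the loop would correspond to a LATERAL FLATTEN over one line-item array, with the INDEX column of the FLATTEN output used to pick the matching element from each sibling array. Because the number of rows follows the length of the driver array, invoices with any number of line items are handled the same way.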
