Every month, invoices land in my Gmail.
And each one has a different format: → some have the data buried inside a PDF attachment, → another puts it in the email body but uses the same sender address for two different utility types, → other doesn't even attach a PDF, so it needs an account login to download it.
I mapped sender's email pattern and built a Google Apps Script that extracts amounts and due dates, saves PDFs to Drive, logs everything to a spreadsheet, and archives the emails. All triggered automatically.
https://www.loom.com/share/2808ef50bcc3411fba205207dd16fb38
Before writing the script, I analyzed invoice emails from each sender and asked: Where exactly is the amount? Where is the due date? Is there a PDF attached? What format are the numbers in?
This is the actual work. Observing the pattern:
| Sender | Where's the data? | PDF situation | Extraction method |
|---|---|---|---|
| Type 1 | Buried inside the PDF | Attached | Send PDF to Gemini, get back JSON |
| Type 2 | Amount + date in email body | Attached, but same sender for two utility types | Regex for data, Gemini classifies utility type from PDF |
| Type 3 | Amount + date in email body | NOT attached → behind a portal login | Regex for data, auto-download PDF via authenticated session |
| Type 4 | Amount + date in email body | Attached or none | Regex only, no AI needed |
The key insight: there is no single method that works for all senders. The product decision is to match each sender to the right strategy.