At UMass Amherst Dining, reporting across 37 locations used to eat about two hours a day. Someone pulled numbers from each location, pasted them into a master sheet, reconciled the inevitable mismatches, and only then could anyone ask an actual question of the data. By the time the report was ready, it was already stale. None of that was a modeling problem. It was an ETL problem (extract, transform, load), done by hand. So I automated it.
The win was not a clever model. It was deleting two hours of manual reporting a day and making the numbers trustworthy enough to forecast on.
01 The two-hour problem
Thirty-seven locations, one master sheet, and a person stitching it together by hand every morning. The work was slow, easy to get wrong, and stale by the time it landed. Worst of all, nobody trusted the numbers enough to plan on them. The fix was not a fancier dashboard. It was making the boring part disappear.
02 The pipeline at a glance
Three stages, each with one job. The messy reality of 37 locations goes in one end; a single trustworthy reporting layer comes out the other.
03 Extract: meet the data where it lives
Thirty-seven locations means thirty-seven slightly different habits. Different column orders, different names for the same thing, the occasional missing day. The first job was not transformation, it was building extraction that tolerates that mess instead of breaking on it.
04 Transform: make it boring and repeatable
The transform layer normalizes everything into one shape: consistent keys, consistent units, consistent location IDs. The rule I held onto was idempotency. Running the pipeline twice on the same inputs produces exactly the same output. That is what lets you re-run it without fear when a location submits a correction at 4pm.
- Validate at the edges
Row counts, value ranges, and a "does every location actually show up today?" check that fails loudly instead of producing a quietly wrong number.
- Keep an audit trail
What ran, when, and what it touched, so the next person (or future me) can actually read what happened.
- Write it so a teammate can follow
A pipeline only one person understands is a liability, not an asset.
05 Load: one source of truth
Everything lands in a single reporting layer, so leadership stops reconciling spreadsheets and starts asking questions. The two hours of manual reporting dropped to about twelve minutes, and most of that is now review, not assembly.
06 The real payoff: forecasting on data you trust
Once the numbers were consistent and current, they were good enough to forecast on. That opened the door to data-driven inventory forecasting across locations, anticipating demand instead of reacting to last week's stockout. You can't forecast on data you don't trust, and you can't trust data that's reassembled by hand every morning.
The lesson I keep relearning: the highest-leverage data work is often the least glamorous. Remove the toil, make the inputs trustworthy, and the interesting analysis becomes possible.
07 The live tool it became
The same kitchen operation ran another job on a hand-built 39-tab spreadsheet: turning the day's order report into production sheets. I rebuilt that as a tool you can actually use, the same "kill the toil" idea pushed one step further.
CPK Production Planner
Paste a commissary kitchen's daily order report and get the make list and per-station pick sheets across 36 locations, instantly, entirely in your browser. Nothing is sent anywhere.
Try the CPK Production Planner →