Touchskyer's Thinking Wall

Silicon Workforce S1E06: $92 Bought a Product

The first time I realized this session was different was when I saw the family calendar’s main interface screenshot.

It wasn’t a Google Calendar clone. It was an AI agent feed — a personalized greeting at the top, empty bento stats cards below, and an input box at the bottom saying “Tell me something new…” Less than 24 hours since we wrote the first line of code.

$92. That was the entire development cost of this product.

The Bill

23 hours 9 minutes. 44 subagents dispatched. 35 git commits. 103 tests all passing.

Let me break down this $92.

Total token consumption: 115 million. But 95% of that (109 million) was cache reads. Prompt caching is the lifeline of long sessions. Without caching, the same workload would cost roughly 3-5x as much, around $300-$400.

This number reveals something: AI development cost isn’t about how expensive the model is — it’s about whether you can reuse context. Each tick needs to reload the project’s code and state — if that context can be read from cache, it costs pennies; if recalculated every time, it costs dollars.
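
The arithmetic behind the bill can be checked with a few lines. This is a minimal sketch that uses only the figures quoted above; the function names are illustrative, and the 3-5x multiplier is the article’s own estimate rather than a published pricing rule.

```python
# Figures quoted in the article.
TOTAL_TOKENS_M = 115       # total tokens consumed, in millions
CACHE_READ_TOKENS_M = 109  # tokens served from the prompt cache, in millions
ACTUAL_COST_USD = 92       # what the session actually cost


def cache_read_share(cache_read_m, total_m):
    """Fraction of all tokens that were cache reads."""
    return cache_read_m / total_m


def uncached_cost_range(actual_cost, low_multiplier=3, high_multiplier=5):
    """Apply the article's rough 3-5x estimate for a session without caching."""
    return actual_cost * low_multiplier, actual_cost * high_multiplier


share = cache_read_share(CACHE_READ_TOKENS_M, TOTAL_TOKENS_M)
low, high = uncached_cost_range(ACTUAL_COST_USD)

print(f"cache-read share: {share:.1%}")       # cache-read share: 94.8%
print(f"uncached estimate: ${low}-${high}")   # uncached estimate: $276-$460
```

The strict 3-5x range works out to $276-$460, so the “$300-$400” figure is best read as a rounded midpoint rather than an exact bound.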

Changing Direction Mid-Course

The first half of this project was actually headed in the wrong direction.

The original spec was a Google Calendar replica: month view, week view, click a cell to add events. The previous session ran 21 iterations, building out login, data storage, intelligent parsing, and push notifications — a complete feature set.

Then I opened the page, took one look, and knew it was wrong.

The core problem of a family calendar isn’t “where events are stored” — it’s “why would family members open it every day.” A calendar you have to actively check isn’t much better than a paper wall calendar. The real value is an agent that proactively remembers, reminds, and notifies you across channels.

So the direction pivoted: the main interface changed from Calendar to agent feed. Calendar was demoted to a secondary page.

This decision was made outside the loop. OPC loops execute plans; they don’t generate them. If the direction is wrong, the loop will march very efficiently in the wrong direction. Direction must be figured out before the loop starts.

The Bigger Bills

$92 was a small project. What about bigger ones?

Pi-Math, a math education platform, cost $347: 47 hours, 76 subagents, 3 context blowouts. What does “blowout” mean? The AI’s working memory (its context window) fills up, the system force-compresses it, and the AI loses a large amount of working context. It’s like having a 500-page thesis suddenly compressed to a 2-page summary: you still know what you’re doing, but all the details are gone.

After each blowout, the AI needs several ticks to re-understand the project state. That’s why $347 doesn’t scale linearly from $92: Pi-Math ran about twice as long but cost nearly four times as much. The repeated work from blowouts is a hidden cost.

[Figure: Development cost comparison across three projects]

Two project costs side by side:

| Project | Duration | Cost | Subagents | Tests |
|---|---|---|---|---|
| Family Calendar | 23h | $92 | 44 | 103 |
| Pi-Math | 47h | $347 | 76 | 200+ |
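
The two rows above can be turned into an hourly rate to see how much faster cost grew than duration. This is a small sketch using the rounded hours from the table; the helper name `cost_per_hour` is made up for illustration.

```python
# Figures from the table: (duration in hours, cost in USD).
projects = {
    "Family Calendar": (23, 92),
    "Pi-Math": (47, 347),
}


def cost_per_hour(hours, cost_usd):
    """Average spend per hour of loop time."""
    return cost_usd / hours


rates = {name: cost_per_hour(h, c) for name, (h, c) in projects.items()}
for name, rate in rates.items():
    print(f"{name}: ${rate:.2f}/hour")
# Family Calendar: $4.00/hour
# Pi-Math: $7.38/hour

duration_ratio = projects["Pi-Math"][0] / projects["Family Calendar"][0]
cost_ratio = projects["Pi-Math"][1] / projects["Family Calendar"][1]
print(f"duration ratio: {duration_ratio:.2f}x, cost ratio: {cost_ratio:.2f}x")
# duration ratio: 2.04x, cost ratio: 3.77x
```

The hourly rate nearly doubles on the larger project, which is consistent with the blowout explanation, though the numbers alone can’t prove that blowouts are the only cause.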

A senior engineer’s daily rate is roughly $800. $92 bought a functionally complete product with test coverage. Done by hand, the same work might take one to two weeks (5 to 10 working days), or $4,000-$8,000 in labor.
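
The labor comparison is simple multiplication, sketched here with the article’s figures. The one assumption is that “one to two weeks” means 5 to 10 working days.

```python
DAILY_RATE_USD = 800    # senior engineer day rate quoted in the article
LOOP_COST_USD = 92      # cost of the family calendar session
WORKING_DAYS = (5, 10)  # assumption: one to two weeks of 5 working days each

labor_low, labor_high = (DAILY_RATE_USD * days for days in WORKING_DAYS)
ratio_low = labor_low / LOOP_COST_USD
ratio_high = labor_high / LOOP_COST_USD

print(f"labor cost: ${labor_low}-${labor_high}")                 # labor cost: $4000-$8000
print(f"labor vs loop: {ratio_low:.0f}x to {ratio_high:.0f}x")   # labor vs loop: 43x to 87x
```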

But this comparison has a prerequisite: the direction must be right. If the direction is wrong, $92 is also wasted. The loop won’t tell you the direction is wrong — it’ll just run in whatever direction you gave it.

What Loops Are Good For

After running two products, the loop’s applicability boundaries became very clear.

Good for implementation tasks with clear specs. “Write these tests,” “implement LLM Pool’s three-way fallback,” “change the homepage from Calendar to Agent Feed” — these all have specific completion criteria that loops can converge toward step by step.
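
To show what a “clear spec” looks like in practice, here is one possible reading of a three-way fallback: try three providers in order and return the first answer that succeeds. The article doesn’t describe how LLM Pool actually works, so everything here (the `Provider` type, `complete_with_fallback`, the stand-in providers) is hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical provider shape: takes a prompt, returns a completion.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the pool has failed."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first successful completion."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # a real pool would catch narrower error types
            errors.append(exc)
    raise AllProvidersFailed(errors)


# Stand-in providers for illustration only; none of these call a real API.
def primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def secondary(prompt: str) -> str:
    raise ConnectionError("secondary unreachable")


def tertiary(prompt: str) -> str:
    return f"tertiary handled: {prompt}"


result = complete_with_fallback("Add dentist on Friday", [primary, secondary, tertiary])
print(result)  # tertiary handled: Add dentist on Friday
```

The completion criterion is easy to state as a test: when the first two providers fail, the third one’s answer comes back; when all fail, a single error is raised. That kind of checkable endpoint is what lets a loop converge.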

Good for tasks needing review-implement cycles. Review finds problem -> implement fixes -> re-review confirms — this cycle is the loop’s sweet spot. In the $92 family calendar, the review agent discovered E2E tests were calling real LLMs (meaning test stability depended on whether the API key was valid, the network was up, and the LLM was in a good mood), then the implement agent switched to mocks. The testing agent would never have discovered this problem on its own.
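
The fix the review agent triggered (replacing real LLM calls with mocks) can be sketched as follows. The project’s real code isn’t shown in this post, so `parse_event` and the client’s `complete` method are hypothetical names; only `unittest.mock.Mock` is a real standard-library API.

```python
import json
from unittest.mock import Mock


def parse_event(text, llm_client):
    """Hypothetical parser: asks an LLM client to extract an event as JSON."""
    raw = llm_client.complete(f"Extract the event as JSON: {text}")
    return json.loads(raw)


def test_parse_event_uses_mocked_llm():
    # The mock returns a fixed response, so the test needs no API key,
    # no network, and no cooperation from a real model.
    fake_llm = Mock()
    fake_llm.complete.return_value = '{"title": "Dentist", "day": "Friday"}'

    event = parse_event("Dentist on Friday", fake_llm)

    assert event == {"title": "Dentist", "day": "Friday"}
    fake_llm.complete.assert_called_once()


test_parse_event_uses_mocked_llm()
```

With the model stubbed out, a failing test points at the application code rather than at an expired key or a flaky connection.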

Not good for tasks needing real-time UI preview. “Adjust this button’s position” — you need to see the result, and loops can’t give you instant feedback.

Not good for steps requiring external account operations. Creating a Vercel database, configuring Google OAuth — these UI operations can’t be automated.

The most important rule: Loop ROI is highest in the first two hours, then diminishes. The first two hours build the skeleton and get the main flow working; after that it’s polishing details. The marginal returns of polishing decrease until hitting a ceiling — the 94% wall from the last episode.

$92 isn’t free. But if you know when to start a loop and when to stop and look for yourself, it’s currently the most cost-effective way to develop.

