Discussion about this post

Sumeet Maniar

Ravi,

Excellent analysis.

I’ve been arguing that the winning architecture for most enterprise AI systems will be a hybrid of deterministic and non-deterministic approaches. LLMs are incredible for discovery, ideation, ambiguity, and rapid prototyping, but many production workflows eventually benefit from being refactored into deterministic systems where possible. That improves cost, reliability, explainability, and scalability. In other words, use the speed of the non-deterministic approach to get to the deterministic one effectively.

In our product and AI labs since last year, we have often followed a pattern:

• Use frontier models to rapidly explore, prototype, and discover solutions.

• Identify where reasoning is actually required versus where deterministic logic, traditional ML, or conventional software can perform the task more efficiently. Also think through which tokens are plain vanilla tokens and which are thinking tokens. Bifurcate the branches. Even count the tokens used across all the turns it takes to achieve the outcome.

• Re-architect the workflow accordingly.
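To make the token-counting step concrete, here is a minimal sketch of a per-turn ledger. Everything in it is hypothetical: the `Turn` fields, the `needs_reasoning` flag (a reviewer's judgment call, not something measured), and the numbers, which are illustrative and not from a real run.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    step: str              # hypothetical label for what the turn did
    plain_tokens: int      # ordinary ("plain vanilla") output tokens
    thinking_tokens: int   # reasoning ("thinking") tokens
    needs_reasoning: bool  # reviewer's judgment: does this step need an LLM?


def summarize(turns: list[Turn]) -> dict:
    """Total the tokens and flag steps that look refactorable to plain code."""
    total = sum(t.plain_tokens + t.thinking_tokens for t in turns)
    thinking = sum(t.thinking_tokens for t in turns)
    candidates = [t for t in turns if not t.needs_reasoning]
    return {
        "turns": len(turns),
        "total_tokens": total,
        "thinking_tokens": thinking,
        "tokens_per_turn": total / len(turns) if turns else 0.0,
        "refactor_candidates": [t.step for t in candidates],
        "refactorable_tokens": sum(
            t.plain_tokens + t.thinking_tokens for t in candidates
        ),
    }


# Illustrative numbers only, not measurements.
turns = [
    Turn("parse request", 120, 0, needs_reasoning=False),
    Turn("choose strategy", 200, 800, needs_reasoning=True),
    Turn("format output", 150, 300, needs_reasoning=False),
]
report = summarize(turns)
```

The steps listed under `refactor_candidates` are the branches worth moving to deterministic code first, and `refactorable_tokens` gives a rough ceiling on what that refactor could save per outcome.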

Healthcare is a great example. Much of clinical decision-making ultimately maps to evidence-based rules, protocols, and structured logic. The challenge isn’t replacing everything with LLMs; it’s understanding where language reasoning adds value and where it doesn’t.

In another use case, it took me 47 versions to try to get parallel rather than sequential API calling working for a map/geo solution I was iterating on. The frontier models could not do it, whereas if I had coded it myself, it would have taken 10 minutes. A lot of tokens and two hours wasted.
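For readers who have not hit this, the parallel-versus-sequential change is small when written by hand. The sketch below is not the code from that project: `fetch_geo` is a hypothetical stand-in that sleeps instead of calling a real map/geo API, and the concurrency cap of 5 is an assumed value.

```python
import asyncio
import time


async def fetch_geo(place: str, delay: float = 0.05) -> dict:
    """Hypothetical stand-in for one map/geo API request."""
    await asyncio.sleep(delay)  # simulates network latency
    return {"place": place, "ok": True}


async def fetch_sequential(places: list[str]) -> list[dict]:
    # Each request waits for the previous one to finish.
    return [await fetch_geo(p) for p in places]


async def fetch_parallel(places: list[str], max_concurrency: int = 5) -> list[dict]:
    # The semaphore caps in-flight requests (assumed limit, tune per API).
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(place: str) -> dict:
        async with sem:
            return await fetch_geo(place)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in places))


def timed(coro_fn, places: list[str]):
    """Run one of the fetchers and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(places))
    return results, time.perf_counter() - start
```

Calling `timed(fetch_sequential, places)` and `timed(fetch_parallel, places)` on the same list should return identical results, with the parallel version finishing in roughly the time of the slowest single call rather than the sum of all of them.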

Your point on “skills” resonated. I’ve wondered whether we’re sometimes just relocating context-window complexity rather than reducing it. Hidden prompt bloat, excessive guardrails, and agent-to-agent chatter can create significant token overhead with diminishing returns. These are separate but related topics that add to “token maxxing.” I still want to try dynamic inference calls at agent run time, routed to either closed or open models, to see how this performs.

We all know a solution by now: it’s likely a tightly orchestrated system of specialized components, each with a “very” narrow responsibility, a small context window, and clear evaluation criteria. That architecture tends to be more efficient, more reliable, and less prone to cascading hallucinations. But if a component is too narrow, does it not just become a function call?

For PMs, the real “craft” is crafting systems that consistently reach 90–95%+ of human-level performance through rigorous evals and iteration. Though sometimes my mind gets numbed or bored by the sub-optimal time spent tweaking. Once that threshold is achieved, the hard work shifts to industrializing the solution: governance, monitoring, reliability, and engineering inference at scale and in parallel.

Great perspective on the skills layer and on when skills actually get called up. On another note, Anthropic did a great session on optimizing guardrails via evals late last week (one can Google the accompanying video). Workshop name: agent decomposition. Repo: https://github.com/anthropics/cwc-workshops and X video: https://x.com/0x_rody/status/2061019244595233135?s=20

Armughan

I especially appreciate the part you shared about building skill libraries and how much context is dragged along when invoking them.
