The current state of artificial intelligence is defined by a paradoxical “value gap.” Despite the breathless pace of innovation, widespread reports from MIT and Forbes suggest that up to 95% of AI pilots fail to reach production. Sriram Raghavan argues that this failure stems from a lack of formal understanding of AI as a new computing element. As enterprises rush from simple assistants to autonomous agents in a matter of months, they are finding that traditional software development practices are insufficient for the stochastic nature of Large Language Models (LLMs).
2.1 The Three-Layered Journey of Governance
Enterprise AI cannot exist without a disciplined, end-to-end governance strategy. Raghavan describes this as a series of “concentric circles” that have evolved over time. In classical machine learning, governance focused on distributional fairness and adversarial robustness. Generative AI added a second layer addressing hallucinations and source attribution. Now, the “Agentic” layer introduces risks regarding tool-calling and autonomous actions.
To navigate this, IBM developed the Risk Atlas, a comprehensive taxonomy of risks, and the Risk Atlas Nexus, an AI-based navigation advisor that guides developers from intent to specific mitigation measures. A critical focus here is tool-calling hallucinations. Raghavan distinguishes between “syntactic” errors (which fail harmlessly) and “semantic” errors—where an agent provides a correctly formatted but factually wrong command to a database.
The “So What?” Layer: The strategic implication is clear: organizations cannot treat governance as a piecemeal checklist. Because these risks are additive, a failure in the foundational layer (fairness) compromises the outer agentic layer (tool-calling). Reliability requires a platform approach where governance is baked into the entire lifecycle, from the Risk Atlas Nexus during development to runtime monitoring in production.
2.2 The Efficiency Frontier—Why Small Models “Hunt”
Raghavan dismisses the idea of a single “magic model.” Instead, he predicts a market polarization where the “middle bucket” of models (100B–200B parameters) disappears. These models are being squeezed: small models (under 10B) are now hitting performance benchmarks that previously required massive hardware, while Frontier APIs handle the most complex reasoning.
IBM’s Granite 3.0 series exemplifies this shift. Built on a Hybrid Mamba 2 architecture—which combines Transformers with state space models—these models provide enormous efficiency in memory footprint. The Granite 3.0 family is also the first open-source model to be ISO 42001 certified, providing a verified audit of data sanitization and governance.
The “So What?” Layer: For the enterprise, “small models hunt” because they provide 98% of a frontier model’s performance at 1/50th of the cost. By leveraging WebGPU to run these models directly in a browser or on a laptop, organizations can eliminate massive infrastructure overhead and keep sensitive data local, gaining a massive competitive advantage in operational efficiency.
2.3 Defining Generative Computing
The most provocative shift in the keynote is the move from “brutal” natural language prompting to “Generative Programming.” Raghavan argues that prompting is a trial-and-error process that is brittle, unportable, and insecure. To solve this, IBM introduced the Melia toolkit, which treats LLMs as computational agents rather than human-like entities.
Melia represents a return to CS 101 principles by enforcing the separation of instructions from data. It replaces English-language “praying” with structured Python-based control flow, moving the logic (if/then/retry) outside the stochastic model and into the deterministic code.
The “So What?” Layer: Treating an LLM as a software component rather than a conversational partner is the only path to reliability. By moving control flow into Python, developers can create modular, testable applications that don’t break when a model is upgraded from one version to the next.
Conclusion: These elements converge to form the Agentic Development Life Cycle (ADLC). This new framework, jointly authored with Anthropic, replaces the traditional SDLC and requires developers to master new muscles in evaluation and stochastic testing to build the next generation of enterprise software.