TL;DR

A recent whitepaper from Google emphasizes that in AI-assisted software development, the actual AI model is only about 10% of system behavior. The majority depends on how developers configure, verify, and engineer the surrounding infrastructure and context.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the AI model accounts for only about 10% of the behavior in AI-driven software systems. The real leverage, it argues, lies in the harness and context engineering, which account for the remaining roughly 90%. This moves the focus from model selection to configuration, verification, and environment design, and changes how organizations should approach AI development.

The whitepaper argues that the common industry narrative overemphasizes the importance of the AI model itself. Instead, it presents evidence, such as benchmark improvements achieved solely through harness changes, that configuration, tools, prompts, and context management are the dominant factors shaping AI system behavior. For example, a coding agent moved from outside the Top 30 into the Top 5 on Terminal Bench 2.0 by changing only the harness, with the underlying model left unchanged.

It also introduces the concept of ‘agentic engineering,’ where AI is integrated within a structured framework of rules, tests, and guardrails, contrasting with ‘vibe coding,’ which relies on minimal oversight. The paper stresses that the costs of AI-assisted development are driven more by the design and maintenance of these configurations than by the model’s raw capabilities. Those costs include token consumption, security remediation, and ongoing maintenance.

At a glance
When: published May 2026
The development: The new Google whitepaper highlights that the key to effective AI coding is not the model itself but the harness and context engineering, shifting focus in SDLC strategies.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
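
To make the tests-versus-evals distinction concrete, here is a minimal sketch of a CI gate that requires both. Everything in it is an assumption for illustration: the `slugify` function under test, the substring-based eval rubric, the 0.9 pass-rate threshold, and the stub agent that stands in for a real model call. The whitepaper does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # hypothetical rubric: a substring the answer must include


def slugify(title: str) -> str:
    """Deterministic code under test: same input, same output, every time."""
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Tests verify the deterministic part with an exact assertion."""
    return slugify("The New SDLC") == "the-new-sdlc"


def run_evals(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Evals score non-deterministic output; the result is a pass rate."""
    passed = sum(1 for c in cases if c.must_contain in agent(c.prompt).lower())
    return passed / len(cases)


def ci_gate(agent: Callable[[str], str], cases: Sequence[EvalCase],
            min_pass_rate: float = 0.9) -> bool:
    """Allow a merge only if tests pass AND the eval pass rate clears the bar."""
    return run_tests() and run_evals(agent, cases) >= min_pass_rate


def stub_agent(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return "Use a parameterized query to avoid SQL injection."


cases = [
    EvalCase("How do I query the database safely?", "parameterized"),
    EvalCase("What does that protect against?", "injection"),
]
print(ci_gate(stub_agent, cases))
```

The design point is that the gate returns a single yes/no, but the two halves are scored differently: the test is an exact comparison, while the eval is a threshold over many sampled cases.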
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability). This is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
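
The "Agent = Model + Harness" framing can be sketched as a small data structure. This is an illustrative sketch, not the paper's or any framework's API: the `Harness` and `Agent` classes, their fields, and the `echo_model` stand-in are all hypothetical names chosen for the example. It shows the model as a swappable callable while the prompt, tools, and context live in configuration that you own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Any provider's completion function fits this shape (assumption for the sketch).
Model = Callable[[str], str]


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        """Assemble everything the model sees: rules, tool names, context, task."""
        tool_list = ", ".join(sorted(self.tools)) or "none"
        lines = [self.system_prompt, f"Tools: {tool_list}", *self.context, f"Task: {task}"]
        return "\n".join(lines)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        return self.model(self.harness.build_prompt(task))


def echo_model(prompt: str) -> str:
    # Stand-in model: reports how many lines of prompt it received.
    return f"saw {len(prompt.splitlines())} lines"


bare = Harness("You are a coding agent.")
tuned = Harness(
    "You are a coding agent.",
    tools={"run_tests": lambda arg: "ok"},
    context=["Repo uses pytest.", "Never edit generated files."],
)

print(Agent(echo_model, bare).run("Fix the failing test"))   # saw 3 lines
print(Agent(echo_model, tuned).run("Fix the failing test"))  # saw 5 lines
```

Both agents use the same model; only the harness differs, which is the variable the Terminal Bench 2.0 result above is attributed to.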
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: low CapEx · high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).
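
The model-routing lever mentioned above can be illustrated with a minimal sketch. The tier names, prices, task categories, and the two-file cutoff are all invented for the example; a real router would presumably use richer signals, and the paper does not specify a routing rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real quote


# Hypothetical tiers; the names do not refer to real products.
CHEAP = ModelTier("small-model", 0.25)
STRONG = ModelTier("large-model", 2.0)

# Assumed set of task kinds that a cheap model handles well.
EASY_TASKS = {"rename", "format", "docstring"}


def route(task_kind: str, files_touched: int) -> ModelTier:
    """Send small, mechanical edits to the cheap tier; everything else to the strong one."""
    if task_kind in EASY_TASKS and files_touched <= 2:
        return CHEAP
    return STRONG


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Token cost for one task at the tier's per-1k price."""
    return tier.cost_per_1k_tokens * tokens / 1000


for kind, files in [("format", 1), ("refactor", 1), ("rename", 5)]:
    tier = route(kind, files)
    print(kind, files, tier.name, estimated_cost(tier, 4000))
```

Under these assumed prices, a 4,000-token formatting task costs 1.0 on the cheap tier versus 8.0 on the strong one, which is the kind of gap routing is meant to capture.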
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development and Investment

This revelation shifts the strategic focus for organizations adopting AI: investing in robust harness design, context management, and verification processes offers far greater returns than chasing the latest model upgrades. It also suggests that competitive advantage in AI is less about the model’s raw power and more about the infrastructure surrounding it, including tools, prompts, and guardrails. This understanding influences budget allocation, team skills development, and long-term planning, emphasizing durability over fleeting model improvements.

Evolution of AI in Software Engineering

The whitepaper builds on recent trends where AI-generated code now constitutes roughly 41% of new code, with 85% of developers using AI tools regularly. Prior to this, the industry often equated model sophistication with system quality. However, as models become more capable and commoditized, the focus has shifted toward how these models are integrated and controlled. Past efforts centered on selecting the best model, but emerging evidence favors optimizing the surrounding infrastructure, which is more controllable and cost-effective.

This perspective aligns with broader industry movements towards ‘agentic engineering,’ where AI operates within a structured environment of rules, tests, and dynamic context loading, rather than relying solely on prompt engineering. The whitepaper emphasizes that this approach not only improves performance but also reduces long-term costs and security risks.

“The biggest shift in software engineering isn’t a new language or framework; it’s moving from writing code to expressing intent and trusting machines to interpret that intent.”

— Addy Osmani

Remaining Questions About Implementation and Scope

While the whitepaper provides strong evidence that harness and context engineering dominate system behavior, it does not specify how universally this applies across all AI applications or the precise methods for optimal harness design. The long-term impacts on AI cost structures and security protocols are still being studied, and industry adoption of these insights may vary.

Next Steps for Developers and Organizations

Organizations should reevaluate their AI development strategies, prioritizing the design of robust harnesses, context management, and verification processes. Future research and case studies are expected to refine best practices for scalable, cost-effective AI systems, with a focus on tooling, automation, and security. Industry standards may evolve to emphasize infrastructure quality over model selection alone.

Key Questions

Why is the model only 10% of the system behavior?

The whitepaper shows that the surrounding infrastructure—prompts, rules, tools, and context management—has a much greater influence on how AI systems perform than the model itself.

What is ‘agentic engineering’?

It refers to integrating AI within a structured framework of rules, tests, and guardrails, as opposed to minimal prompt-based approaches, to improve reliability and cost-efficiency.

How does this shift affect AI project costs?

Focusing on harness and context engineering can reduce long-term costs by minimizing token usage, security vulnerabilities, and maintenance efforts, despite higher initial investment.
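
The trade-off between higher initial investment and lower running cost can be worked through as simple break-even arithmetic. The sketch below assumes a linear cost model (a fixed upfront cost plus a constant cost per feature) and made-up cost units; only the 3× per-feature multiple is taken from the low end of the 3–10× range the whitepaper reports.

```python
import math
from typing import Optional


def cumulative_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Total spend after shipping `features` features under a linear cost model."""
    return capex + opex_per_feature * features


def break_even_features(agentic_capex: float, agentic_opex: float,
                        vibe_capex: float, vibe_opex: float) -> Optional[int]:
    """Smallest feature count at which the agentic approach is no more expensive.

    Returns None if vibe coding is never more expensive per feature.
    """
    if vibe_opex <= agentic_opex:
        return None
    gap = (agentic_capex - vibe_capex) / (vibe_opex - agentic_opex)
    return max(0, math.ceil(gap))


# Illustrative units: agentic setup costs 40, then 1 per feature;
# vibe coding costs nothing upfront but 3 per feature (the 3x low end).
n = break_even_features(agentic_capex=40, agentic_opex=1, vibe_capex=0, vibe_opex=3)
print(n)                          # 20
print(cumulative_cost(40, 1, n))  # 60
print(cumulative_cost(0, 3, n))   # 60
```

With these assumed numbers the two approaches cost the same at 20 features, and every feature after that is cheaper under the agentic approach. At a 10× multiple the break-even point would arrive sooner.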

Will this change how AI models are developed?

While model improvements remain valuable, the whitepaper suggests that most of the performance gains and system robustness come from better infrastructure and configuration, shifting development priorities.

What should organizations do now?

They should invest in developing strong harnesses, testing frameworks, and context management practices, viewing these as strategic assets for AI success.

Source: ThorstenMeyerAI.com
