AI-Native Infrastructure One-Page Reference Architecture: Three Planes + One Loop
The real value of a reference architecture is that it lets an organization reach consensus on a complex system within five minutes, not that it adds yet another technology stack.
Industry-leading vendors emphasize different aspects: Cisco focuses on AI-native infrastructure and reference designs such as Cisco Validated Designs; HPE emphasizes an open, full-stack AI-native architecture across the full AI lifecycle; NVIDIA explicitly proposes a new AI-native infrastructure tier for inference context reuse in long-context and agentic workloads. This chapter converges these perspectives into a verifiable architecture framework: Three Planes + One Loop.
One-Page Architecture Overview
The reference architecture diagram below lays out the three planes and the closed loop of AI-native infrastructure, helping readers quickly build an overall picture:

[Figure: AI-native infrastructure reference architecture — the Intent, Execution, and Governance Planes connected by the governance closed loop]

The diagram can be read as follows: the new control plane of AI-native infrastructure, the Intent Plane, must be constrained by the Governance Plane and must produce measurable resource consequences in the Execution Plane.
This is also the common ground behind different vendor narratives: Cisco uses reference designs and delivery frameworks to make infrastructure capabilities scalable and replicable; HPE uses open/full-stack to cover lifecycle delivery; NVIDIA elevates the reuse of “context state assets” to an independent infrastructure layer. All three point to the same issue: incorporating AI resource consequences into governable system boundaries.
Core Capabilities of the Three Planes
This section details the core capabilities of each of the three planes to help clarify focus areas during architecture reviews.
Intent Plane
The Intent Plane is responsible for expressing “what I want,” including the following capabilities:
- Inference/Training APIs (entry points and contracts)
- MCP/Tool calling protocols and tool catalog (standardizing tool access as "declarable capability boundaries")
- Agent/Workflow (breaking down tasks into executable steps)
- Policy as Intent: priorities, budgets, quotas, compliance/security constraints (front-loaded in the form of “intent”)
Key point: the Intent Plane is not valuable in itself; what matters is whether intent can be translated into executable, governable plans. Otherwise, Agents/MCP will only amplify uncertainty: more tools, longer chains, larger state spaces, and less controllable resource consumption.
During architecture reviews, focus on these questions; the sketch after the list makes them concrete:
- Is intent declarable (contract) and rejectable (admission)?
- Does intent carry budget/priority/compliance constraints (policy as intent)?
- Is the translation from intent to execution traceable?
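As a concrete illustration, here is a minimal Python sketch of a declarable, rejectable intent contract that carries policy as intent. All names (`InferenceIntent`, `admit`, the compliance tags) are hypothetical and vendor-neutral; the point is only the shape: an explicit contract plus an admission check that can say no before any resources are consumed.

```python
from dataclasses import dataclass, field

@dataclass
class InferenceIntent:
    """A declarable intent: what the caller wants, plus policy constraints."""
    task: str                     # e.g. "summarize", "agent-plan"
    model: str                    # requested model family
    max_tokens: int               # explicit resource ceiling
    priority: int                 # 0 (batch) .. 9 (interactive)
    budget_usd: float             # spend ceiling for this request
    compliance_tags: list[str] = field(default_factory=list)  # e.g. ["pii"]

@dataclass
class AdmissionResult:
    admitted: bool
    reason: str                   # rejections must be explainable

def admit(intent: InferenceIntent, remaining_budget_usd: float) -> AdmissionResult:
    """Rejectable at the entry point: enforce budget and compliance before execution."""
    if intent.budget_usd > remaining_budget_usd:
        return AdmissionResult(False, "insufficient tenant budget")
    if "pii" in intent.compliance_tags and intent.model.endswith("-external"):
        return AdmissionResult(False, "PII workloads may not use external models")
    return AdmissionResult(True, "admitted")

intent = InferenceIntent(task="summarize", model="llm-internal",
                         max_tokens=2048, priority=5, budget_usd=0.50,
                         compliance_tags=["pii"])
print(admit(intent, remaining_budget_usd=12.0))
```

Because the contract is explicit, the translation step can log which plan each admitted intent was mapped to, which is exactly what makes the intent-to-execution path traceable.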
Execution Plane
The Execution Plane is responsible for turning intent into actual execution, and mainly includes:
- Training, fine-tuning, inference serving, batch processing, agentic runtime
- “State and Context” services: cache/KV/vector/context memory, etc., for carrying inference context, retrieval results, and session state
- Full-chain observability hooks: token metering, GPU time, GPU memory, network traffic, storage I/O, queue wait times, etc.
An industry trend worth emphasizing: as long-context and agentic workloads become widespread, "context" itself becomes a critical state asset and may even rise to an independent infrastructure layer. NVIDIA explicitly proposes inference context memory storage in the Rubin platform, establishing an AI-native infrastructure tier that provides shared, low-latency inference context at the pod level to support reuse across long-context and agentic workloads.
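To make "governable state" concrete, the following minimal sketch shows a shared context store with an explicit lifecycle (TTL) and a tenant-scoped reuse boundary. This is an illustrative design under assumed semantics, not NVIDIA's Rubin implementation; the names (`ContextStore`, `ContextEntry`) are hypothetical.

```python
import time
from dataclasses import dataclass

@dataclass
class ContextEntry:
    tenant: str        # isolation boundary: entries are only reusable within a tenant
    payload: bytes     # e.g. serialized KV-cache blocks or retrieval results
    expires_at: float  # lifecycle: an explicit TTL instead of "cache forever"

class ContextStore:
    """Illustrative shared context store: reuse within a tenant, bounded lifetime."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._entries: dict[str, ContextEntry] = {}

    def put(self, tenant: str, key: str, payload: bytes) -> None:
        # keys are namespaced by tenant, so one tenant can never hit another's context
        self._entries[f"{tenant}:{key}"] = ContextEntry(
            tenant, payload, time.time() + self.ttl)

    def get(self, tenant: str, key: str) -> bytes | None:
        entry = self._entries.get(f"{tenant}:{key}")
        if entry is None or entry.expires_at < time.time():
            return None            # expired or missing: the caller must recompute
        return entry.payload       # reuse hit: recomputing the context is skipped
```

The answers to the review questions below fall directly out of such a design: the lifecycle is the TTL, the reuse boundary is the tenant namespace, and isolation is enforced at the key level.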
Review points focus on three things, illustrated by the sketch that follows the list:
- Is execution measurable: Can attribution be done across token/GPU/network/storage dimensions?
- Is state governable: What are the lifecycle, reuse boundaries, and isolation strategies for context and cache?
- Is observability closed-loop oriented: Observability is not for “seeing,” but for “enabling governance to correct deviations.”
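Here is a minimal sketch of what "measurable execution" could emit, assuming a hypothetical `UsageRecord` schema: one attributable record per request or agent step, carrying all four attribution dimensions plus the identifiers needed to tie consumption back to intent. The pricing function is a toy, illustrative only.

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """One attributable unit of consumption, emitted per request or agent step."""
    tenant: str
    project: str
    agent_task_id: str         # ties consumption back to the originating intent
    tokens_in: int
    tokens_out: int
    gpu_seconds: float
    gpu_mem_gb_seconds: float  # GPU memory occupancy integrated over time
    network_gb: float
    storage_io_gb: float
    queue_wait_ms: float       # surfaced so scheduling problems show up in cost reviews

def attribute_cost(r: UsageRecord,
                   price_per_gpu_second: float = 0.002,
                   price_per_1k_tokens: float = 0.001) -> float:
    """Toy cost model: a real system would price every dimension separately."""
    return (r.gpu_seconds * price_per_gpu_second
            + (r.tokens_in + r.tokens_out) / 1000 * price_per_1k_tokens)

record = UsageRecord("team-a", "search", "task-42", 1200, 600,
                     3.2, 51.2, 0.8, 0.1, 40.0)
print(f"${attribute_cost(record):.4f}")  # $0.0082
```

What matters is not the exact schema but that every record carries both the consumption and the attribution keys, so governance can answer "who spent what, on which intent."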
Governance Plane
The Governance Plane is the “core differentiator” of AI-native infrastructure, responsible for transforming resource scarcity and uncertainty into a controllable system:
- Budget/quotas/billing: governing consumption across teams, tenants, projects, models, and agent tasks
- Isolation and sharing strategies: same-card sharing, GPU memory isolation, preemption, priorities, fairness
- Topology-aware scheduling: incorporating GPU, interconnect, network, and storage topology into placement (especially in training and high-throughput inference)
- Risk and compliance control: audits, policy enforcement points, sensitive data and access control
- Integration with FinOps/SRE/SecOps: incorporating cost, reliability, and risk into a single operational mechanism
From a vendor narrative perspective, this layer typically corresponds to “reference architecture + full-stack delivery”: Cisco emphasizes accelerating and scaling delivery in AI infrastructure through “fully integrated systems + Cisco Validated Designs”; HPE emphasizes end-to-end delivery with “open, full-stack AI-native architecture” to support model development and deployment.
The baseline question for Governance Plane reviews is: Can you make explainable resource allocation and degradation decisions under budget/risk constraints?
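Below is a minimal sketch of one such explainable decision function, assuming a hypothetical budget-burn policy: every verdict comes with a human-readable reason, and the responses escalate (degrade, then throttle, then preempt) rather than jumping straight to hard failure. Thresholds and actions are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str  # "allow" | "degrade" | "throttle" | "preempt"
    reason: str  # every decision must be explainable, not merely enforced

def govern(spent_usd: float, budget_usd: float, priority: int) -> Decision:
    """Illustrative policy: escalating responses as budget burn increases."""
    burn = spent_usd / budget_usd if budget_usd > 0 else float("inf")
    if burn < 0.8:
        return Decision("allow", f"burn {burn:.0%} is below the 80% threshold")
    if burn < 1.0:
        # degrade first: e.g. route to a smaller model or shrink the context window
        return Decision("degrade", f"burn {burn:.0%}: switch to a cheaper serving tier")
    if priority >= 8:
        return Decision("throttle",
                        "over budget, but interactive traffic is rate-limited, not killed")
    return Decision("preempt", "over budget: batch/low-priority work is preempted")

print(govern(spent_usd=0.95, budget_usd=1.00, priority=3))  # degrade, with a reason
```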
Closed-Loop Mechanism Explained
This section introduces the core workflow of the closed-loop mechanism to help understand the essential difference between AI-native and AI-ready.
The minimal implementation of the closed loop includes four steps:
- Admission: Bind intent with policy at the entry point (budget, priority, compliance)
- Translation: Translate intent into executable plans (select runtime, resource specifications, topology preferences)
- Metering: End-to-end metering and attribution across tokens/GPU/network/storage
- Enforcement: Budget triggers degradation/rate limiting/preemption; risk triggers isolation/audits; SLO triggers scaling/routing
In other words: The closed loop is not a “monitoring dashboard,” but a “governance-driven real-time correction mechanism.” If there is no closed loop for “intent → consumption → cost/risk outcomes,” systems can easily spin out of control across cost, risk, quality, and other dimensions.
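The four steps can be wired together in a few dozen lines. The skeleton below is a deliberately simplified sketch: each stage is a stub standing in for a real subsystem (gateway, planner, observability pipeline, policy engine), and the names and values are invented for illustration.

```python
# Minimal closed-loop skeleton: admission -> translation -> metering -> enforcement.

def admission(intent: dict) -> bool:
    # rejectable entry point: intent is bound to budget before anything runs
    return intent["budget_usd"] <= intent["remaining_budget_usd"]

def translation(intent: dict) -> dict:
    # choose a runtime and resource spec; a real planner would also weigh topology
    return {"runtime": "inference-pool-a", "gpus": 1,
            "max_tokens": intent["max_tokens"]}

def metering(plan: dict) -> dict:
    # in production this comes from observability hooks, not constants
    return {"gpu_seconds": 3.2, "tokens": 1800, "cost_usd": 0.04}

def enforcement(intent: dict, usage: dict) -> str:
    # the write-back step: consumption outcomes become the next policy action
    spent = intent["spent_usd"] + usage["cost_usd"]
    if spent > intent["budget_usd"]:
        return "degrade"  # budget breach -> cheaper tier / rate limiting / preemption
    return "steady"

intent = {"max_tokens": 2048, "budget_usd": 0.10,
          "remaining_budget_usd": 5.00, "spent_usd": 0.08}
if admission(intent):
    plan = translation(intent)
    usage = metering(plan)
    print(enforcement(intent, usage))  # "degrade": the outcome corrects the next step
```

The loop property lives in the last line: the enforcement verdict is not a dashboard alert but an input to the next admission and translation decision.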
This is also why “AI-native” is often accompanied by changes in operating model: when system execution speed and resource consumption are amplified by models/agents, organizations must front-load governance mechanisms and institutionalize them. LF Networking also explicitly points out: becoming AI-native is not just a technical migration, but a redefinition of the operating model.
Practical Usage of the One-Page Architecture
In subsequent chapters, this “one-page architecture” can be repeatedly reused as a review template:
- Discussing MCP/Agent: Position them in the Intent Plane and constrain with the closed loop (admission/translation) to avoid “intent proliferation”
- Discussing runtime and platforms: Place in the Execution Plane, focusing on observable, attributable, governable state assets (context/cache/KV/vector)
- Discussing GPUs, scheduling, costs: Ground in the Governance Plane, using budget/isolation/topology/metering as leverage points
- Discussing enterprise implementation: Use the closed loop to examine if it’s “truly AI-native” (whether cost/risk outcomes can be written back as executable policies)
If you can only remember one sentence: the test of AI-native is not "how many AI components are used," but "whether there exists an executable governance closed loop that constrains intent to controllable resource consequences and economic/risk outcomes."
Summary
The one-page reference architecture provides a unified systems language and review framework for AI-native infrastructure. Through the three planes of Intent, Execution, and Governance, combined with the closed-loop mechanism, organizations can achieve efficient collaboration in architecture design, resource governance, and risk control. Looking ahead, as AI-native capabilities continue to mature, the governance closed loop will become a core competitive advantage for enterprises implementing AI.
References
- NVIDIA AI Enterprise Reference Architecture - nvidia.com
- Google Cloud AI Infrastructure - cloud.google.com
- AWS Well-Architected Framework - aws.amazon.com