AI-Native Infrastructure One-Page Reference Architecture: Three Planes + One Loop
The real value of a reference architecture is that it lets an organization reach consensus on a complex system within five minutes, not that it adds yet another technology stack.
Industry-leading vendors emphasize different aspects: Cisco focuses on AI-native infrastructure and reference designs such as Cisco Validated Designs; HPE emphasizes an open, full-stack AI-native architecture across the full AI lifecycle; NVIDIA explicitly proposes a new AI-native infrastructure tier for inference context reuse in long-context and agentic workloads. This chapter converges these perspectives into a verifiable architecture framework: Three Planes + One Loop.
One-Page Architecture Overview
The reference architecture diagram below lays out the three planes and the closed loop of AI-native infrastructure, helping readers quickly build an overall picture:

[Figure: AI-native infrastructure reference architecture — the Intent, Execution, and Governance Planes connected by the governance closed loop]

The diagram can be read as follows: the new control plane of AI-native infrastructure, the Intent Plane, must be constrained by the Governance Plane and must produce measurable resource consequences in the Execution Plane.
This is also the common ground behind different vendor narratives: Cisco uses reference designs and delivery frameworks to make infrastructure capabilities scalable and replicable; HPE uses open/full-stack to cover lifecycle delivery; NVIDIA elevates the reuse of “context state assets” to an independent infrastructure layer. All three point to the same issue: incorporating AI resource consequences into governable system boundaries.
Core Capabilities of the Three Planes
This section details the core capabilities of each of the three planes to help clarify focus areas during architecture reviews.
Intent Plane
The Intent Plane is responsible for expressing “what I want,” including the following capabilities:
- Inference/Training APIs (entry points and contracts)
- MCP/Tool calling protocols and tool catalog (standardizing tool access as "declarable capability boundaries")
- Agent/Workflow (breaking down tasks into executable steps)
- Policy as Intent: priorities, budgets, quotas, compliance/security constraints (front-loaded in the form of “intent”)
Key point: the Intent Plane is not valuable in itself; what matters is whether intent can be translated into executable, governable plans. Otherwise, Agents/MCP will only amplify uncertainty: more tools, longer chains, larger state spaces, and less controllable resource consumption.
During architecture reviews, focus on these questions; the sketch after the list makes them concrete:
- Is intent declarable (contract) and rejectable (admission)?
- Does intent carry budget/priority/compliance constraints (policy as intent)?
- Is the translation from intent to execution traceable?
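As a concrete illustration, here is a minimal Python sketch of a declarable, rejectable intent contract that carries policy as intent. All names (`InferenceIntent`, `admit`, the compliance tags) are hypothetical and vendor-neutral; the point is only the shape: an explicit contract plus an admission check that can say no before any resources are consumed.

```python
from dataclasses import dataclass, field

@dataclass
class InferenceIntent:
    """A declarable intent: what the caller wants, plus policy constraints."""
    task: str                     # e.g. "summarize", "agent-plan"
    model: str                    # requested model family
    max_tokens: int               # explicit resource ceiling
    priority: int                 # 0 (batch) .. 9 (interactive)
    budget_usd: float             # spend ceiling for this request
    compliance_tags: list[str] = field(default_factory=list)  # e.g. ["pii"]

@dataclass
class AdmissionResult:
    admitted: bool
    reason: str                   # rejections must be explainable

def admit(intent: InferenceIntent, remaining_budget_usd: float) -> AdmissionResult:
    """Rejectable at the entry point: enforce budget and compliance before execution."""
    if intent.budget_usd > remaining_budget_usd:
        return AdmissionResult(False, "insufficient tenant budget")
    if "pii" in intent.compliance_tags and intent.model.endswith("-external"):
        return AdmissionResult(False, "PII workloads may not use external models")
    return AdmissionResult(True, "admitted")

intent = InferenceIntent(task="summarize", model="llm-internal",
                         max_tokens=2048, priority=5, budget_usd=0.50,
                         compliance_tags=["pii"])
print(admit(intent, remaining_budget_usd=12.0))
```

Because the contract is explicit, the translation step can log which plan each admitted intent was mapped to, which is exactly what makes the intent-to-execution path traceable.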
Execution Plane
The Execution Plane is responsible for turning intent into actual execution, and mainly includes:
- Training, fine-tuning, inference serving, batch processing, agentic runtime
- “State and Context” services: cache/KV/vector/context memory, etc., for carrying inference context, retrieval results, and session state
- Full-chain observability hooks: token metering, GPU time, GPU memory, network traffic, storage I/O, queue wait times, etc.
An industry trend worth emphasizing: as long-context and agentic workloads become widespread, "context" itself becomes a critical state asset and may even rise to an independent infrastructure layer. NVIDIA explicitly proposes inference context memory storage in the Rubin platform, establishing an AI-native infrastructure tier that provides shared, low-latency inference context at the pod level to support reuse across long-context and agentic workloads.
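To make "governable state" concrete, the following minimal sketch shows a shared context store with an explicit lifecycle (TTL) and a tenant-scoped reuse boundary. This is an illustrative design under assumed semantics, not NVIDIA's Rubin implementation; the names (`ContextStore`, `ContextEntry`) are hypothetical.

```python
import time
from dataclasses import dataclass

@dataclass
class ContextEntry:
    tenant: str        # isolation boundary: entries are only reusable within a tenant
    payload: bytes     # e.g. serialized KV-cache blocks or retrieval results
    expires_at: float  # lifecycle: an explicit TTL instead of "cache forever"

class ContextStore:
    """Illustrative shared context store: reuse within a tenant, bounded lifetime."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._entries: dict[str, ContextEntry] = {}

    def put(self, tenant: str, key: str, payload: bytes) -> None:
        # keys are namespaced by tenant, so one tenant can never hit another's context
        self._entries[f"{tenant}:{key}"] = ContextEntry(
            tenant, payload, time.time() + self.ttl)

    def get(self, tenant: str, key: str) -> bytes | None:
        entry = self._entries.get(f"{tenant}:{key}")
        if entry is None or entry.expires_at < time.time():
            return None            # expired or missing: the caller must recompute
        return entry.payload       # reuse hit: recomputing the context is skipped
```

The answers to the review questions below fall directly out of such a design: the lifecycle is the TTL, the reuse boundary is the tenant namespace, and isolation is enforced at the key level.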
Review points focus on three things, illustrated by the sketch that follows the list:
- Is execution measurable: Can attribution be done across token/GPU/network/storage dimensions?
- Is state governable: What are the lifecycle, reuse boundaries, and isolation strategies for context and cache?
- Is observability closed-loop oriented: Observability is not for “seeing,” but for “enabling governance to correct deviations.”
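Here is a minimal sketch of what "measurable execution" could emit, assuming a hypothetical `UsageRecord` schema: one attributable record per request or agent step, carrying all four attribution dimensions plus the identifiers needed to tie consumption back to intent. The pricing function is a toy, illustrative only.

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """One attributable unit of consumption, emitted per request or agent step."""
    tenant: str
    project: str
    agent_task_id: str         # ties consumption back to the originating intent
    tokens_in: int
    tokens_out: int
    gpu_seconds: float
    gpu_mem_gb_seconds: float  # GPU memory occupancy integrated over time
    network_gb: float
    storage_io_gb: float
    queue_wait_ms: float       # surfaced so scheduling problems show up in cost reviews

def attribute_cost(r: UsageRecord,
                   price_per_gpu_second: float = 0.002,
                   price_per_1k_tokens: float = 0.001) -> float:
    """Toy cost model: a real system would price every dimension separately."""
    return (r.gpu_seconds * price_per_gpu_second
            + (r.tokens_in + r.tokens_out) / 1000 * price_per_1k_tokens)

record = UsageRecord("team-a", "search", "task-42", 1200, 600,
                     3.2, 51.2, 0.8, 0.1, 40.0)
print(f"${attribute_cost(record):.4f}")  # $0.0082
```

What matters is not the exact schema but that every record carries both the consumption and the attribution keys, so governance can answer "who spent what, on which intent."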
Governance Plane
The Governance Plane is the “core differentiator” of AI-native infrastructure, responsible for transforming resource scarcity and uncertainty into a controllable system:
- Budget/quotas/billing: governing consumption across teams, tenants, projects, models, and agent tasks
- Isolation and sharing strategies: same-card sharing, GPU memory isolation, preemption, priorities, fairness
- Topology-aware scheduling: incorporating GPU, interconnect, network, and storage topology into placement (especially in training and high-throughput inference)
- Risk and compliance control: audits, policy enforcement points, sensitive data and access control
- Integration with FinOps/SRE/SecOps: incorporating cost, reliability, and risk into a single operational mechanism
From a vendor narrative perspective, this layer typically corresponds to “reference architecture + full-stack delivery”: Cisco emphasizes accelerating and scaling delivery in AI infrastructure through “fully integrated systems + Cisco Validated Designs”; HPE emphasizes end-to-end delivery with “open, full-stack AI-native architecture” to support model development and deployment.
The baseline question for Governance Plane reviews is: Can you make explainable resource allocation and degradation decisions under budget/risk constraints?
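Below is a minimal sketch of one such explainable decision function, assuming a hypothetical budget-burn policy: every verdict comes with a human-readable reason, and the responses escalate (degrade, then throttle, then preempt) rather than jumping straight to hard failure. Thresholds and actions are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str  # "allow" | "degrade" | "throttle" | "preempt"
    reason: str  # every decision must be explainable, not merely enforced

def govern(spent_usd: float, budget_usd: float, priority: int) -> Decision:
    """Illustrative policy: escalating responses as budget burn increases."""
    burn = spent_usd / budget_usd if budget_usd > 0 else float("inf")
    if burn < 0.8:
        return Decision("allow", f"burn {burn:.0%} is below the 80% threshold")
    if burn < 1.0:
        # degrade first: e.g. route to a smaller model or shrink the context window
        return Decision("degrade", f"burn {burn:.0%}: switch to a cheaper serving tier")
    if priority >= 8:
        return Decision("throttle",
                        "over budget, but interactive traffic is rate-limited, not killed")
    return Decision("preempt", "over budget: batch/low-priority work is preempted")

print(govern(spent_usd=0.95, budget_usd=1.00, priority=3))  # degrade, with a reason
```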
Closed-Loop Mechanism Explained
This section introduces the core workflow of the closed-loop mechanism to help understand the essential difference between AI-native and AI-ready.
The minimal implementation of the closed loop includes four steps:
- Admission: Bind intent with policy at the entry point (budget, priority, compliance)
- Translation: Translate intent into executable plans (select runtime, resource specifications, topology preferences)
- Metering: End-to-end metering and attribution across tokens/GPU/network/storage
- Enforcement: Budget triggers degradation/rate limiting/preemption; risk triggers isolation/audits; SLO triggers scaling/routing
In other words: The closed loop is not a “monitoring dashboard,” but a “governance-driven real-time correction mechanism.” If there is no closed loop for “intent → consumption → cost/risk outcomes,” systems can easily spin out of control across cost, risk, quality, and other dimensions.
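The four steps can be wired together in a few dozen lines. The skeleton below is a deliberately simplified sketch: each stage is a stub standing in for a real subsystem (gateway, planner, observability pipeline, policy engine), and the names and values are invented for illustration.

```python
# Minimal closed-loop skeleton: admission -> translation -> metering -> enforcement.

def admission(intent: dict) -> bool:
    # rejectable entry point: intent is bound to budget before anything runs
    return intent["budget_usd"] <= intent["remaining_budget_usd"]

def translation(intent: dict) -> dict:
    # choose a runtime and resource spec; a real planner would also weigh topology
    return {"runtime": "inference-pool-a", "gpus": 1,
            "max_tokens": intent["max_tokens"]}

def metering(plan: dict) -> dict:
    # in production this comes from observability hooks, not constants
    return {"gpu_seconds": 3.2, "tokens": 1800, "cost_usd": 0.04}

def enforcement(intent: dict, usage: dict) -> str:
    # the write-back step: consumption outcomes become the next policy action
    spent = intent["spent_usd"] + usage["cost_usd"]
    if spent > intent["budget_usd"]:
        return "degrade"  # budget breach -> cheaper tier / rate limiting / preemption
    return "steady"

intent = {"max_tokens": 2048, "budget_usd": 0.10,
          "remaining_budget_usd": 5.00, "spent_usd": 0.08}
if admission(intent):
    plan = translation(intent)
    usage = metering(plan)
    print(enforcement(intent, usage))  # "degrade": the outcome corrects the next step
```

The loop property lives in the last line: the enforcement verdict is not a dashboard alert but an input to the next admission and translation decision.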
This is also why “AI-native” is often accompanied by changes in operating model: when system execution speed and resource consumption are amplified by models/agents, organizations must front-load governance mechanisms and institutionalize them. LF Networking also explicitly points out: becoming AI-native is not just a technical migration, but a redefinition of the operating model.
Practical Usage of the One-Page Architecture
In subsequent chapters, this “one-page architecture” can be repeatedly reused as a review template:
- Discussing MCP/Agent: Position them in the Intent Plane and constrain with the closed loop (admission/translation) to avoid “intent proliferation”
- Discussing runtime and platforms: Place in the Execution Plane, focusing on observable, attributable, governable state assets (context/cache/KV/vector)
- Discussing GPUs, scheduling, costs: Ground in the Governance Plane, using budget/isolation/topology/metering as leverage points
- Discussing enterprise implementation: Use the closed loop to examine if it’s “truly AI-native” (whether cost/risk outcomes can be written back as executable policies)
If you can only remember one sentence: the test of AI-native is not "how many AI components are used," but "whether there exists an executable governance closed loop that constrains intent to controllable resource consequences and economic/risk outcomes."
Summary
The one-page reference architecture provides a unified systems language and review framework for AI-native infrastructure. Through the three planes of Intent, Execution, and Governance, combined with the closed-loop mechanism, organizations can achieve efficient collaboration in architecture design, resource governance, and risk control. Looking ahead, as AI-native capabilities continue to mature, the governance closed loop will become a core competitive advantage for enterprises implementing AI.
References
- NVIDIA AI Enterprise Reference Architecture - nvidia.com
- Google Cloud AI Infrastructure - cloud.google.com
- AWS Well-Architected Framework - aws.amazon.com