Which AI coding tool has the strongest sandboxing?

Codex CLI. It enforces restrictions at the OS kernel layer, using Seatbelt on macOS and Landlock plus seccomp on Linux, so application-layer tricks cannot get an agent past a correctly written profile. Claude Code relies on application-layer hooks and permission prompts, which is fine for most cases but is fundamentally a softer boundary. Cursor's sandboxing is the lightest of the three; it relies on VS Code's process model and each plugin's own discipline.

What is AGENTS.md and is it the same as CLAUDE.md?

AGENTS.md is an open standard for project-level agent instructions, supported by Codex CLI and several other agents. CLAUDE.md is Anthropic's project-level instructions file, used by Claude Code. The formats are similar (markdown with conventions), but they are distinct files: an agent that supports AGENTS.md reads AGENTS.md, an agent that supports CLAUDE.md reads CLAUDE.md, and a few agents (including Cursor) read both.

Can I use the same MCP servers across Codex, Claude Code, and Cursor?

Yes. The Model Context Protocol is the open standard all three implement, so a server you build or install for one works in the others with minimal configuration changes. Each tool has its own configuration file (~/.codex/config.toml for Codex, .mcp.json for Claude Code's project-scoped servers, .cursor/mcp.json for Cursor), but the server itself is portable.

Which tool is best for a security-conscious team?

Codex CLI for kernel-level isolation when running untrusted code. Claude Code if you need programmable governance hooks (custom permission prompts, audit logs, policy gates) without owning a VM strategy. Cursor is the weakest of the three on the security axis -- it is built around the trust model of a code editor on a developer's machine, not around running untrusted agents.

Codex CLI vs Claude Code vs Cursor: 2026 Architecture Deep-Dive (Sandboxing, Context, Plugins, Scheduling)

The 2026 AI coding tool conversation usually defaults to "which one has the smartest model". That is the wrong question. Codex CLI, Claude Code, and Cursor mostly use the same handful of frontier models -- Claude Opus 4.7, GPT-5.4, Gemini 3.1 Pro -- and the model is increasingly a parameter you choose, not a property of the tool. The real differences are architectural: how each tool sandboxes the agent, how it manages context, how it composes with other tools, and how it lets a team govern what the agent can do.

This guide unpacks those architectural differences and gives a decision framework for which fits your shape of work. If you want a feature-by-feature surface comparison instead, our Cursor vs Claude Code piece covers that for those two; this guide goes deeper on the architecture and adds Codex CLI to the mix.

Key Takeaways

Sandboxing: Codex CLI is kernel-level (Seatbelt on macOS, Landlock+seccomp on Linux), Claude Code is application-level via hooks, Cursor relies on the editor's own permission model.
Configuration: Codex reads AGENTS.md, Claude Code reads CLAUDE.md, Cursor reads .cursorrules and (newer) Cursor Skills.
Composition: Claude Code's subagents and Codex's thread automations are the strongest patterns for parallel/long-running work; Cursor's Background Agents and Canvases solve different problems.
MCP support: All three implement MCP. The same server runs across them with minor config tweaks.
Best fit by team shape: Solo developer leaning visual → Cursor. Solo developer leaning terminal → Claude Code. Org/team prioritizing security and audit → Codex CLI.

Why Architecture Beats Model Choice in 2026

The April 2026 model landscape is convergent. Claude Opus 4.7 (released April 16) is a notable improvement on Opus 4.6 in advanced software engineering. GPT-5.4 holds the multimodal lead. Gemini 3.1 Pro is unmatched on long context. None of them is a category killer. All three are accessible inside all three tools (Codex defaults to OpenAI models but supports BYOK, Claude Code defaults to Anthropic, Cursor lets you pick per session).

What is divergent is what the tool does around the model. Sandboxing decides whether an agent can wreck your machine. The configuration system decides how a team encodes house rules. The plugin system decides what the agent can reach for. The composition primitives decide how multi-step work actually runs. These are the choices that lock you in for the next year of work, and they are the ones the model-vs-model framing skips entirely.

The Three Architectures at a Glance

	Codex CLI	Claude Code	Cursor
Built by	OpenAI	Anthropic	Cursor (Anysphere)
Surface	Terminal, also bundled in Codex App	Terminal + Claude Desktop + Code Web	VS Code-fork desktop app
Language	Rust	Node + TypeScript	Electron (TypeScript) + native components
Sandboxing	OS kernel (Seatbelt / Landlock / seccomp)	App-layer hooks + permission prompts	Editor-level + plugin discipline
Project config	`AGENTS.md`, `SKILL.md`	`CLAUDE.md`	`.cursorrules`, Cursor Skills
Plugin system	MCP, OpenAI Apps	MCP, Skills, Hooks	Cursor Marketplace, MCP, Plugins
Composition	Subprocesses, thread automations	Subagents, Routines	Background Agents, Remote Agents, Canvases
Pricing model	Bundled with ChatGPT Plus/Pro/Business/Enterprise	Pro $20, Max $100/$200, Team, Enterprise	Subscription tier ($20-40/mo) + usage
Open source	Yes (Apache 2.0)	Partially (CLI client)	Closed
Best fit	Security-conscious teams, batch/cloud work	Interactive agentic work, large refactors	Visual editing, in-IDE workflows

Sandboxing: The Architectural Choice You Cannot Easily Reverse

Sandboxing is the single most consequential architectural decision in a coding agent's design. It determines what happens when the agent goes off the rails -- a hallucinated rm -rf, a runaway shell loop, a chmod that silently widens permissions. Once you roll the agent out to your team, changing the sandboxing model is a migration, not a setting.

Codex CLI: kernel-level

Codex CLI enforces sandboxing at the OS kernel layer. On macOS it uses Seatbelt profiles to constrain filesystem and network access; on Linux it composes Landlock for path-based access control with seccomp for syscall filtering. The agent gets a kernel-enforced bubble: it can read what you let it read, write where you let it write, and the rest is a syscall failure regardless of what the model decides to try.

The practical effect: a Codex agent running an untrusted script with --sandbox enabled cannot reach into your home directory or hit the network if the profile says no, whatever commands the model generates. For teams that run Codex on shared machines, against untrusted PRs, or as part of CI, this is the highest-confidence boundary of the three. The trade-off is that the sandbox profile is the source of truth -- if the profile is too tight, the agent fails; if it is too loose, the kernel-level guarantee is hollow.
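To make the "profile is the source of truth" point concrete, here is a minimal user-space model in Python of how a path-based allowlist decides reads and writes. It is not Codex's implementation and it does not call Landlock or Seatbelt; the `SandboxProfile` class and its field names are hypothetical, and real enforcement happens in the kernel rather than in Python.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxProfile:
    """Hypothetical, simplified stand-in for a kernel sandbox profile."""
    readable: tuple = ()
    writable: tuple = ()
    allow_network: bool = False  # default-deny, as a restrictive profile would be

    def _under(self, path: str, roots: tuple) -> bool:
        target = PurePosixPath(path)
        # This model does no path resolution, so refuse any ".." outright.
        if ".." in target.parts:
            return False
        # A path is covered when it equals a root or sits beneath one.
        return any(
            target == PurePosixPath(root) or PurePosixPath(root) in target.parents
            for root in roots
        )

    def can_read(self, path: str) -> bool:
        # In this sketch, writable roots are treated as readable too.
        return self._under(path, self.readable + self.writable)

    def can_write(self, path: str) -> bool:
        return self._under(path, self.writable)


profile = SandboxProfile(
    readable=("/usr", "/work/repo"),
    writable=("/work/repo/build",),
)
print(profile.can_write("/work/repo/build/out.o"))   # inside a writable root
print(profile.can_write("/home/me/.ssh/id_rsa"))     # outside every root
print(profile.can_read("/work/repo/src/main.rs"))    # inside a readable root
```

The too-tight and too-loose failure modes both show up here: drop `/work/repo` from `readable` and the agent cannot read its own sources; add `/` to `writable` and every check passes, so the boundary no longer constrains anything.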

Claude Code: application-layer hooks

Claude Code does not run a kernel sandbox. Instead it uses an application-layer hook system and a permission-prompt pattern. Every potentially destructive action -- writing a file, running a shell command, hitting a URL -- can fire a hook. Hooks are scripts you write that decide whether the action should be allowed, logged, modified, or blocked. When the model wants to take such an action, Claude Code asks for permission inline; you confirm or deny at the prompt, or a hook decides for you.

The practical effect: programmable governance. A team can write a hook that blocks any shell command containing rm -rf, requires approval before any HTTP call, or logs every file write to a SIEM. The boundary is softer than a kernel sandbox -- a model determined to break out has more avenues to try -- but the policy expressivity is much higher. This is the right shape for teams that want to encode their own house rules in code.
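A hook that blocks rm -rf might look roughly like the Python sketch below. It assumes the hook receives a JSON object with `tool_name` and `tool_input.command` fields, that the shell tool is named `Bash`, and that exit code 2 signals "block"; that is my understanding of Claude Code's pre-tool-use hook contract, so check it against the current documentation. The function names are hypothetical.

```python
import io
import json
import os
import shlex
import sys


def is_recursive_force_rm(command: str) -> bool:
    """True when the command appears to call rm with recursive and force flags."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: fail closed rather than guess
    for i, token in enumerate(tokens):
        if os.path.basename(token) != "rm":
            continue
        rest = tokens[i + 1:]
        # Merge short flags such as -r -f or -rf into one string of letters.
        short = "".join(t[1:] for t in rest if t.startswith("-") and not t.startswith("--"))
        recursive = "r" in short.lower() or "--recursive" in rest
        force = "f" in short or "--force" in rest
        if recursive and force:
            return True
    return False


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message); 0 allows, 2 is assumed to mean block."""
    if payload.get("tool_name") != "Bash":  # assumed name of the shell tool
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if is_recursive_force_rm(command):
        return 2, f"Blocked by policy: recursive force delete in {command!r}"
    return 0, ""


def run_hook(stdin, stderr) -> int:
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code


# Demo with an in-memory payload instead of real stdin.
demo = io.StringIO(json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}))
print(run_hook(demo, sys.stderr))  # prints 2
```

Wired up as a real hook, the script would end with `sys.exit(run_hook(sys.stdin, sys.stderr))`. The check is token-based and errs toward blocking, which also illustrates the softer-boundary point: a pattern match like this can be sidestepped by an equivalent command it does not recognize.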

Cursor: editor-level

Cursor's sandboxing model is the lightest of the three. The agent runs as part of the Cursor editor process, with the file system access of the user who launched Cursor. Permissions are gated at action prompts (Cursor asks before running a command or editing a file outside the open project), and there are some configuration options for restricting tool access, but there is no kernel-level isolation and no programmable hook system.

This is not a flaw -- it is a design choice. Cursor is a code editor first; the trust model is "the developer is at the keyboard, the agent is helping". For interactive in-IDE work that is the right model. For unattended runs or untrusted code review it is the wrong tool.

What this means for your choice

The decision framework:

Running untrusted code, doing PR review on outside contributions, or operating in a regulated industry: Codex CLI for the kernel-level guarantee.
Building your own governance policies, integrating with internal logging/auth systems, or running in a high-control enterprise: Claude Code for the hook system.
Day-to-day visual editing on your own laptop with full trust in the codebase: Cursor is fine.
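Read as code, the framework above is a short chain of checks. The sketch below is one hedged way to encode it in Python; the function name, the two boolean inputs, and the rule that isolation needs are checked before governance needs are my assumptions, not something the three vendors publish.

```python
def recommend_tool(runs_untrusted_code: bool, needs_custom_policy: bool) -> str:
    """Map the two risk questions from the framework to a tool name."""
    # Assumed priority: a need for isolation outranks a need for custom policy.
    if runs_untrusted_code:
        return "Codex CLI"
    if needs_custom_policy:
        return "Claude Code"
    # Trusted codebase, developer at the keyboard.
    return "Cursor"


print(recommend_tool(runs_untrusted_code=True, needs_custom_policy=True))
print(recommend_tool(runs_untrusted_code=False, needs_custom_policy=True))
print(recommend_tool(runs_untrusted_code=False, needs_custom_policy=False))
```

A team that answers "yes" to both questions may reasonably run both tools rather than accept the first match.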

Configuration: The File Where House Rules Live

Every coding agent needs a way to encode "how this project works" -- conventions, build commands, preferred libraries, things to never touch. The three tools each ship a different file, and the differences matter for portability.

CLAUDE.md (Claude Code)

CLAUDE.md is the most opinionated of the three. It supports a layered system: ~/.claude/CLAUDE.md for global rules, <project>/CLAUDE.md for project-level rules, and per-directory CLAUDE.md files that override the parent. Claude Code reads them in that order on every session, and you can reference the same patterns in subagents. The format is plain markdown with light conventions (the # Commands section is widely used for "here are the commands you can run").
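The layering can be pictured as a walk from the most general file to the most specific one. The Python sketch below models that order on a temporary directory tree; `collect_instruction_files` is a hypothetical helper, and how Claude Code actually merges or prioritizes the files it finds is not something this example claims to reproduce.

```python
import tempfile
from pathlib import Path


def collect_instruction_files(global_dir: Path, project_root: Path, workdir: Path,
                              name: str = "CLAUDE.md") -> list[Path]:
    """List instruction files from most general to most specific.

    Raises ValueError if workdir is not inside project_root.
    """
    project_root, workdir = project_root.resolve(), workdir.resolve()
    candidates = [global_dir / name, project_root / name]
    # Walk from the project root down to the working directory.
    current = project_root
    for part in workdir.relative_to(project_root).parts:
        current = current / part
        candidates.append(current / name)
    # Directories without an instruction file are simply skipped.
    return [path for path in candidates if path.is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    home, repo = root / "home" / ".claude", root / "repo"
    api_dir = repo / "services" / "api"
    home.mkdir(parents=True)
    api_dir.mkdir(parents=True)
    for directory in (home, repo, api_dir):
        (directory / "CLAUDE.md").write_text(f"# rules for {directory.name}\n")
    found = collect_instruction_files(home, repo, api_dir)
    layers = [path.relative_to(root).as_posix() for path in found]

print(layers)
```

Here `repo/services` has no file of its own, so the result contains three layers: global, project root, and `services/api`.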

The strength is the layered model -- it scales from "I write what I always want Claude to do" to "this team has 200 conventions across 50 directories" without restructuring. The weakness is portability: only Claude Code reads CLAUDE.md natively, though some other tools (Cursor included) can be configured to also load it.

For more on Claude Code's configuration patterns, see our Claude Code context management guide.

AGENTS.md (Codex CLI)

AGENTS.md is an emerging open standard, supported by Codex CLI and a growing list of other tools. The format is intentionally simple: markdown with sections for instructions, conventions, file patterns, and commands. The bet is that a single shared file is more valuable than a tool-specific one, especially for open-source projects where contributors may use different agents.

Codex CLI also supports SKILL.md files for reusable skill definitions -- a skill is a prompt + tool wiring that the agent can compose into other tasks. Skills are increasingly the unit of "how I do X" across the agent ecosystem.

The strength is portability and the open-standard bet. The weakness is that CLAUDE.md is more entrenched in real teams; if your existing project already has a battle-tested CLAUDE.md, switching to AGENTS.md is migration work for marginal gain.

.cursorrules + Cursor Skills (Cursor)

Cursor reads .cursorrules from the project root, with the same layered concept (a .cursorrules in a subdirectory overrides the parent). The format is plain text with light conventions. Cursor 3.x added Cursor Skills, which are reusable prompt+tool bundles installable from the Cursor Marketplace -- the closest analogue to Claude Code's skills.

The strength is integration with the marketplace; many third-party Skills exist for common workflows (code review, refactoring, test generation). The weakness is that the rules format itself is less expressive than CLAUDE.md or AGENTS.md, and the marketplace's quality bar varies.

Multi-tool teams: which file should you maintain?

For a team that uses more than one tool, the pragmatic answer in April 2026 is: maintain CLAUDE.md as the canonical file (it is the most expressive), then symlink or sync it to AGENTS.md and .cursorrules. The cost is a few minutes of maintenance; the upside is that whichever agent a team member is using, they get the same conventions.
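The symlink approach can be scripted in a few lines. This Python sketch assumes the three tools accept the same markdown content unchanged, which is worth verifying for your own rules; `link_instruction_files` is a hypothetical helper, and creating symlinks may need extra privileges on Windows.

```python
import os
import tempfile
from pathlib import Path


def link_instruction_files(project_root: Path, canonical: str = "CLAUDE.md",
                           aliases: tuple = ("AGENTS.md", ".cursorrules")) -> list[Path]:
    """Point each alias at the canonical file; return the links created."""
    source = project_root / canonical
    if not source.is_file():
        raise FileNotFoundError(f"canonical file missing: {source}")
    created = []
    for alias in aliases:
        link = project_root / alias
        if link.is_symlink():
            link.unlink()   # refresh a stale link
        elif link.exists():
            continue        # never clobber a hand-written file
        # A relative target keeps the link valid when the repo is moved or cloned.
        os.symlink(canonical, link)
        created.append(link)
    return created


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "CLAUDE.md").write_text("# Commands\nmake test\n")
    links = [path.name for path in link_instruction_files(repo)]
    agents_text = (repo / "AGENTS.md").read_text()

print(links)
```

If any tool-specific syntax creeps into the canonical file, a sync step that filters it out is safer than a plain symlink.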

Composition: How Multi-Step Work Actually Runs

The interesting question in 2026 is not "can the agent write code"; it is "can the agent run a four-hour project". Each tool ships different primitives for composing multi-step work.

Claude Code: subagents and Routines

Claude Code's two composition primitives are subagents and Routines. Subagents are child Claude Code sessions with their own clean context window; the parent delegates a task ("read this file and summarize the auth flow") and only the final report comes back. Subagents are the right tool for any work that would otherwise pollute the parent context with noise.

Routines are saved Claude Code configurations that run on Anthropic's cloud on a schedule, an HTTP trigger, or a GitHub event. They turn Claude Code from "an agent you start" into "an agent that starts itself". For background work -- nightly sweeps, PR triage, scheduled docs sync -- Routines are the cleanest pattern in the ecosystem.

The combination is powerful: a Routine fires at 3am, spawns subagents to investigate three repos in parallel, and posts a summary to Slack before standup. No human at the keyboard.
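The fan-out shape of that 3am run can be sketched in plain Python. Nothing here calls Claude Code: `run_subagent` is a stub standing in for a subagent session, and the repo names are made up. The point is only the data flow, where each worker keeps its noisy intermediate state to itself and hands back a single short report.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(repo: str) -> str:
    """Stub for a subagent: does noisy work privately, returns a short report."""
    scratch = [f"{repo}: inspected file {i}" for i in range(1000)]  # never leaves this function
    return f"{repo}: {len(scratch)} files inspected, no blocking issues"


def nightly_sweep(repos: list[str]) -> str:
    # Each subagent runs in its own worker; only final reports reach the parent.
    with ThreadPoolExecutor(max_workers=max(1, len(repos))) as pool:
        reports = list(pool.map(run_subagent, repos))  # map preserves input order
    return "\n".join(reports)


summary = nightly_sweep(["api", "web", "infra"])
print(summary)
```

The parent ends up holding three lines of text instead of three thousand, which is the context-hygiene benefit the subagent pattern is meant to deliver.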

Codex CLI: subprocesses and thread automations

Codex CLI's composition is shell-native. The agent can spawn subprocesses (other Codex sessions, other shell commands), and as of the April 16 update, Codex thread automations let a thread run continuously and react to external triggers. The pattern is similar to Claude Code Routines but lives inside the Codex App rather than as a separate cloud product.

Codex's strength is the unix-y composition story: any command-line tool composes naturally because the agent is one. Pipe the agent's output into another tool, redirect into a file, run in background, schedule via cron -- all of it works the way a developer expects.
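Because the agent behaves like any other command-line program, wrapping it in a script needs nothing beyond the standard library. The Python sketch below runs one step, redirects its output to a log file, and returns the exit code. It uses the Python interpreter as a stand-in command so it runs anywhere; for Codex you would pass your own argv, for example something like `["codex", "exec", "<prompt>"]`, assuming `codex exec` is the non-interactive entry point in your installed version.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_agent_step(argv: list[str], log_path: Path, timeout_s: int = 600) -> int:
    """Run one CLI step, send stdout and stderr to a log file, return the exit code."""
    with log_path.open("w") as log:
        result = subprocess.run(
            argv,
            stdout=log,
            stderr=subprocess.STDOUT,  # interleave errors with normal output
            timeout=timeout_s,         # raises TimeoutExpired if the step hangs
            check=False,               # let the caller inspect the exit code
        )
    return result.returncode


# Stand-in command so the sketch runs anywhere; swap in your agent CLI's argv.
stand_in = [sys.executable, "-c", "print('review complete')"]
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "agent.log"
    exit_code = run_agent_step(stand_in, log_file)
    log_text = log_file.read_text().strip()

print(exit_code, log_text)
```

The same function drops into a cron job or a CI step unchanged, which is the practical meaning of "shell-native" here.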

Cursor: Background Agents, Remote Agents, Canvases, Automations

Cursor's composition primitives are surface-driven. Background Agents run in-editor while you keep coding in the foreground (Cursor's strongest feature). Remote Agents let Cursor drive any machine you have SSH access to. Canvases are interactive visualizations the agent can build; Automations are Cursor's scheduled-trigger product (covered in our scheduled agents comparison).

Cursor's strength is in-editor parallelism -- nothing else lets you have three agents working on three branches in three panes while you edit a fourth. The weakness is that the composition primitives are tied to the Cursor surface; you cannot easily script Cursor from outside Cursor.

When composition matters most

For teams shipping a real volume of work through agents:

High-volume, scheduled, hands-off: Claude Code Routines or Codex thread automations.
Concurrent in-editor work: Cursor Background Agents.
Programmatic, scriptable: Codex CLI (the only one with a clean shell-native story).
Mixed (some scheduled, some in-editor, some across tools): the right answer is often "use more than one", with Claude Code or Codex for scheduled work and Cursor for in-editor work.

Plugin Systems and the MCP Layer

All three tools implement the Model Context Protocol -- the open standard for connecting agents to external tools. This is genuinely important: an MCP server you build for one works across all three with minor config tweaks.
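As a rough illustration of what "minor config tweaks" means, the Python sketch below renders one server definition in two shapes: a JSON object of the kind used by `.cursor/mcp.json`, and a TOML table of the kind a Codex config file would hold. The key names `mcpServers` and `mcp_servers` reflect my understanding of those layouts and should be checked against each tool's current documentation; the server name `docs` and the package `example-docs-mcp` are hypothetical.

```python
import json


def to_json_config(name: str, command: str, args: list[str]) -> str:
    """Render the JSON shape assumed for `.cursor/mcp.json`-style files."""
    return json.dumps({"mcpServers": {name: {"command": command, "args": args}}}, indent=2)


def to_toml_config(name: str, command: str, args: list[str]) -> str:
    """Render the TOML table shape assumed for a Codex-style config file.

    `name` must be a bare TOML key (letters, digits, underscores, hyphens).
    """
    # json.dumps yields double-quoted strings and bracketed arrays, which are
    # also valid TOML for plain values like these.
    return (
        f"[mcp_servers.{name}]\n"
        f"command = {json.dumps(command)}\n"
        f"args = {json.dumps(args)}\n"
    )


server = ("docs", "npx", ["-y", "example-docs-mcp"])  # hypothetical server
json_text = to_json_config(*server)
toml_text = to_toml_config(*server)
print(json_text)
print(toml_text)
```

The launch command and arguments are identical in both outputs; only the wrapper syntax and the file location differ between tools.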

But each tool also has its own plugin layer on top:

Cursor Marketplace is the most consumer-friendly. It has the most third-party plugins (Skills, Canvases, integrations), browseable in the editor, one-click install. The quality bar is mixed but the volume is high.
Claude Code Skills + Hooks is the most programmable. A Skill is a reusable prompt+tool bundle; a Hook is a script that runs on agent events. Both are file-based and version-controllable, which is the right shape for a team.
Codex Apps + MCP is the most open. Codex is open-source under Apache 2.0, the App ecosystem is built on standard MCP servers, and there is no marketplace gate. This is great for teams that want full control; less great for users who want curated discovery.

Practical rule: if you want to install other people's stuff easily, use Cursor. If you want to write your own and ship it across the team, use Claude Code or Codex.

Pricing and Lock-In

	Codex CLI	Claude Code	Cursor
Free tier	None (requires ChatGPT plan)	None	Limited free tier
Entry plan	$20/mo (ChatGPT Plus)	$20/mo (Claude Pro)	$20/mo (Cursor Pro)
Most-used plan	$200/mo (ChatGPT Pro)	$100-200/mo (Claude Max)	$40/mo (Cursor Business)
Team plan	$25/user (ChatGPT Business)	$25/user (Claude Team)	$40/user (Cursor Business)
Enterprise	Available	Available	Available
Per-token usage	No (bundled)	No (bundled, with caps)	Yes (above limits)
Open source	Yes (Apache 2.0)	Partially	No

The pricing surface is similar but the lock-in profiles differ:

Codex CLI is the most portable. Open-source CLI, supports BYOK, your AGENTS.md/SKILL.md files work elsewhere. Easiest to migrate away from if needed.
Claude Code has medium lock-in. The CLAUDE.md format is portable, but Routines and most of the orchestration features are Anthropic-cloud-only.
Cursor has the most lock-in. The editor itself, the marketplace, the Canvases, the Background Agents -- nearly all are Cursor-specific. Strong product, but committing to Cursor is committing to Cursor.

Choose Codex CLI If...

You are running an agent on a shared or untrusted machine and need kernel-enforced sandboxing.
Your team values open source, BYOK, and the AGENTS.md standard.
Your composition needs are shell-native (cron jobs, pipelines, CI integration).
You already pay for ChatGPT Plus, Pro, or Business and want the agent that ships with it.
You operate in a security-conscious or regulated industry.

Choose Claude Code If...

Your work is interactive and agentic -- large refactors, multi-file investigations, long-running sessions.
You want programmable governance (hooks, custom permissions, audit logs) without owning a VM strategy.
You want the strongest composition story for unattended work via Routines.
You value Anthropic's safety posture and instruction-following profile (especially with Opus 4.7).
You are willing to commit to Anthropic's cloud for the orchestration layer.

Choose Cursor If...

Most of your day is visual editing with AI assist.
You want the lowest friction onboarding for developers coming from VS Code.
Your team values the marketplace's depth of third-party plugins and Skills.
You need Canvases, Background Agents in-editor, or Remote Agents driving other machines.
Your trust model is "developer at the keyboard with full repo access" rather than "untrusted agent in a sandbox".

Or Use More Than One

The most productive teams in April 2026 are not picking one. The pattern that has emerged:

Cursor for daily coding. Visual editing, in-editor parallelism, the lowest-friction surface for the work that fills most of the day.
Claude Code for heavy lifting. Big refactors, debugging sessions that span many files, anything that needs the agent to plan and execute multi-step work.
Codex CLI for sandboxed batch work. Untrusted code review, CI integration, scheduled jobs against PRs.

The integration tax is small (the same MCP servers work across all three, the same models can run in any of them) and the upside is that each tool is doing what it is best at. As of April 2026 this is the dominant pattern at most teams that take AI coding tools seriously, and it is likely to remain the right shape until one of the three vendors absorbs the others' strengths -- which has not happened yet.

The architectural decisions matter more than the model choice. Pick the architectures that match how your team actually works, and let the model improve underneath them.