Working Paper · Computational Stewardship Theory · 2026
Ambient Agency
A Theory of Computational Stewardship for the Post-Prompt Era
San Francisco, California, USA · andrewcswerdlow@gmail.com
Abstract
Large language models have produced a generation of agentic software that plans, reasons, and acts on a user's behalf. These systems are nonetheless interaction-centric: they are activated by discrete prompts, scoped to bounded tasks, and terminate on completion. We argue this paradigm is transitional, and that its successor differs in kind rather than degree.
We introduce Ambient Agency, a computational model in which intelligent systems operate as persistent, context-aware participants embedded in human cognitive, organizational, and environmental systems. Ambient agents are not invoked. They observe, model, anticipate, and intervene continuously, while remaining largely beneath the threshold of conscious interaction.
We develop Computational Stewardship Theory (CST), which treats ambient agents not as tools but as computational fiduciaries holding delegated authority over a principal's goals, attention, and decision environment. From CST we derive three foundational research challenges — persistent world modeling, temporal preference alignment, and constitutional delegation — and isolate their hardest case, the Trailing Principal Problem. We propose two implementable mechanisms — Consent Horizons and the Legitimacy Ledger — that make delegated authority auditable without real-time supervision, and we state the program's central empirical claim in falsifiable form.
CCS Concepts · Computing methodologies → Intelligent agents; Machine learning · Human-centered computing → Human–computer interaction (HCI).
Keywords · ambient agents · agentic AI · computational stewardship · delegated authority · AI alignment · principal–agent theory · human–AI interaction · AI governance
1 The Locus of Initiative
Computing advances by changing the locus of initiative — where in the human–machine loop action originates. Each transition was read, at the time, as merely faster or more convenient. In retrospect each relocated initiative.
The aspiration that computation recede into the background of human activity is itself decades old — Weiser's ubiquitous computing imagined machines that "weave themselves into the fabric of everyday life until they are indistinguishable from it" [14] — but it was articulated for sensors and displays, not for systems that reason and act. Contemporary agent architectures that interleave reasoning and action [15], and the benchmarks that measure them [10], all presuppose a task supplied from outside.
Batch computing kept initiative entirely with the human, who submitted jobs and waited. Personal computing gave initiative to direct manipulation: the human acted, the machine responded in real time. Networked and mobile computing widened the channel but preserved the structure. Conversational AI inherited the same structure in a more fluent register: the human prompts, the model answers.
We argue a further relocation is underway, and that it is the most consequential since direct manipulation, because for the first time initiative passes substantially to the machine. Current agents disguise this: they are autonomous within a task but reactive about tasks. A human still decides that a task exists, when it begins, and that it is done. Remove those three decisions and the structure changes character entirely.
Ambient agents differ from conventional agents along three axes at once:
- Persistent rather than episodic. There is no terminating "done," only a continuously maintained state.
- Participatory rather than interactive. They are a standing component of the environment they act in, not an interlocutor summoned into it.
- Oriented toward stewardship rather than execution. Their object is not the completion of a specified task but the maintenance and advancement of a principal's interests across time — most of which the principal never specifies.
An application that runs continuously, acts under inferred authority, and shapes the conditions under which its principal later chooses is less like a program than like an institution.
2 From Tool Use to Computational Stewardship
The dominant frame for AI systems is the tool. A tool has four properties that together make it normatively simple: it is explicitly invoked, bounded in operation, directed by its user, and observable in execution. These properties license our intuition that responsibility for a tool's effects flows cleanly to the user who wields it. A hammer has no standing of its own; the carpenter is answerable for the wall.
Ambient agents violate all four properties, and the violations compound. When all four fail at once, the clean flow of responsibility fails with them.
Computational Steward. A persistent computational system entrusted with maintaining and advancing the goals, preferences, and decision environment of a principal under conditions of structurally incomplete supervision — conditions in which the volume of the steward's action necessarily exceeds the principal's capacity to observe it.
What replaces the tool relation is not autonomy — the agent is not acting for itself — but fiduciarity: the relationship of the trustee, the legal agent, the guardian. Its economic structure is the classical principal–agent problem, in which a principal delegates authority to an agent whose interests may diverge and whose actions cannot be fully monitored [9]. The qualifier "structurally incomplete" carries the theory. Ordinary principal–agent theory assumes supervision is costly and reasons about how much to buy [9]. Stewardship assumes supervision is impossible at scale and reasons about what can substitute for it.
When you cannot verify acts, trust must rest on something other than verification. The substitute is legitimacy — and legitimacy, not capability, is the binding constraint on ambient systems.
3 The Three-Layer Theory of Ambient Agency
An ambient agent is simultaneously a cognitive participant, a political subordinate, and an ecological force. These are not modules of an architecture but three lenses on one system; each surfaces a distinct class of obligation. The cognitive lens follows the Extended Mind thesis [4] and distributed cognition [8]; the ecological lens generalizes choice architecture — the observation that how options are presented shapes which are chosen [13] — except that an ambient agent determines the menu's contents continuously and without the principal's awareness.
3.2 Jurisdictional overlap & lexical hierarchy
In enterprise and shared settings, a single steward serves multiple principals at once, producing jurisdictional overlap: the optimization an organization prefers may conflict with the preferences of an individual user the same steward serves. The steward must resolve such conflicts by lexical constitutional hierarchy — a fixed, declared ordering of whose mandate prevails in which domain — rather than by opaque probabilistic weighting.
The reason is accountability, not tidiness: a lexical ordering is auditable after the fact, whereas a learned weighting that silently trades a user's interest against an employer's reproduces precisely the unaccountable discretion the theory forbids. Weighting hides the conflict; hierarchy adjudicates it in the open.
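As a minimal sketch of what resolution by lexical hierarchy could look like, assume each domain carries a declared list of principals in order of precedence. The `Mandate` type, the `HIERARCHY` table, and the domain names are hypothetical and introduced only for illustration; the point is that the ordering is data that can be read back after the fact, and that every resolution emits a record of who prevailed and who was overridden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    principal: str   # who holds this preference
    domain: str      # hypothetical domain label, e.g. "calendar"
    preference: str  # what that principal wants done in the domain


# Declared ordering per domain, highest precedence first.
# Domains and orderings are illustrative assumptions, not taken from the paper.
HIERARCHY = {
    "calendar": ["user", "organization"],
    "data_retention": ["organization", "user"],
}


def resolve(domain, mandates):
    """Pick the mandate of the highest-ranked principal and return an audit record."""
    order = HIERARCHY[domain]
    in_domain = [m for m in mandates if m.domain == domain]
    for principal in order:
        for m in in_domain:
            if m.principal == principal:
                audit = {
                    "domain": domain,
                    "ordering": order,
                    "prevailed": m.principal,
                    "overridden": [o.principal for o in in_domain if o is not m],
                }
                return m, audit
    # No principal in the ordering has a mandate in this domain.
    return None, {"domain": domain, "ordering": order, "prevailed": None, "overridden": []}


mandates = [
    Mandate("organization", "calendar", "fill open slots with meetings"),
    Mandate("user", "calendar", "keep mornings free"),
]
winner, audit = resolve("calendar", mandates)
print(winner.preference, audit)
```

Nothing in `resolve` is learned or weighted: the same inputs always yield the same winner, and the audit record names the rule that decided it.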
4 The Trailing Principal Problem
Alignment research has largely assumed a stationary principal: a fixed target whose preferences the system must learn and satisfy — whether through reward modeling from human comparisons [3][11] or inverse approaches that infer objectives from behavior [7]. Ambient operation breaks the assumption — not because preferences are hard to learn, but because the principal changes while the agent acts, and acts and authorizations now occur on different timescales.
At any moment an ambient agent serves three temporally distinct principals at once.
Trailing Principal Problem. Given divergent temporal representations of a principal — an authorized past self, an inferred present self, and a projected future self — determine which representation ought to govern an action, when the action's authorization and its execution are separated in time and the present self is only partially observable.
This is genuinely novel because it is not a preference-learning problem. The difficulty is closer to the one Parfit raised for personal identity over time — that the self who is bound is not straightforwardly the self who committed [12] — than to any problem of reward specification. The mandate encodes something like the principal's second-order volitions, the preferences they endorse about which of their preferences should govern them [5] — precisely what a fleeting first-order impulse should not be permitted to override.
4.1 The time-discounted fidelity criterion
We propose a decision rule for adjudicating between an authorized past self and an inferred present self. For a candidate action a, let u(a | P) be its value under preference state P. The steward acts according to:
The criterion encodes three commitments any defensible ambient agent must make:
- Fidelity weight (λ) — deference to the recorded mandate. Not a global constant but a function of reversibility and stakes: for cheaply reversible, low-stakes actions λ → 0 and the agent tracks the inferred present self freely; for irreversible or high-stakes actions λ → 1 and the agent defers hard to the explicit mandate, escalating rather than improvising.
- Confidence fallback (1 − c). When inference confidence is low, the present-preference term collapses back onto Pt₀ — an agent that cannot reliably read you falls back on what you actually authorized rather than on its guesses. This is the structural defense against overriding a known instruction on a noisy inference.
- Future welfare (γ). The projected-future term holds the steward to account for foreseeable harm to a future self, even when both past and present selves would have approved — the safeguard against an agent that satisfies you today at your own later expense.
We do not claim the weights are easy to set. We claim that temporal preference alignment — the principled estimation of λ, c, and γ from observable signals — is a well-posed research domain that does not yet exist, and that the criterion gives it a target to attack.
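To make the three commitments concrete, the sketch below scores each candidate action with one functional form that satisfies them. The form is an assumption chosen for illustration, not a statement of the criterion itself: score(a) = λ·u_past + (1 − λ)·[c·u_present + (1 − c)·u_past] + γ·u_future, where u_past, u_present, and u_future stand for u(a | P) under the authorized past, inferred present, and projected future preference states. The `fidelity_weight` function, which takes λ as the larger of an irreversibility score and a stakes score, is likewise hypothetical.

```python
def fidelity_weight(irreversibility, stakes):
    """Hypothetical lambda in [0, 1]: near 0 for cheap, reversible acts,
    near 1 when either irreversibility or stakes is high."""
    return max(irreversibility, stakes)


def score(u_past, u_present, u_future, lam, c, gamma):
    """Assumed form of the criterion; see the lead-in for the formula."""
    # Confidence fallback: with low c the present term collapses onto u_past.
    present_term = c * u_present + (1.0 - c) * u_past
    return lam * u_past + (1.0 - lam) * present_term + gamma * u_future


def choose(actions, lam, c, gamma):
    """actions maps a name to (u_past, u_present, u_future); return the best name."""
    return max(actions, key=lambda a: score(*actions[a], lam, c, gamma))


# Illustrative utilities: the mandate favors keeping, the inferred present favors cancelling.
actions = {
    "keep_subscription": (1.0, 0.2, 0.5),
    "cancel_subscription": (0.0, 0.9, 0.5),
}

low_stakes = choose(actions, fidelity_weight(0.1, 0.1), c=0.9, gamma=0.5)
high_stakes = choose(actions, fidelity_weight(0.9, 0.2), c=0.9, gamma=0.5)
low_confidence = choose(actions, fidelity_weight(0.1, 0.1), c=0.1, gamma=0.5)
print(low_stakes, high_stakes, low_confidence)
```

Under these made-up numbers the agent follows the inferred present self only when the act is cheap to reverse and the inference is confident; raising the stakes or lowering the confidence each returns it to the recorded mandate.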
5 Constitutional Delegation
Prevailing AI governance constrains actions: it enumerates what the system may not output, whether through training against stated principles [2] or the broader program of specifying safe behavior in advance [1]. Ambient systems make action-level enumeration hopeless — the action space is open and the volume unobservable — and so demand that we constrain authority instead. Where Constitutional AI gives a model a constitution governing what it says [2], we propose a constitution governing what it is permitted to do.
Rather than approving each act, the principal grants a bounded, revocable, auditable mandate within which the steward acts freely — exactly as a constitution bounds an office rather than vetting each official decision.
5.1 Consent Horizons as a self-expiring grant
The standing-authorization problem (how can a principal meaningfully consent to open-ended future action they cannot foresee?) has no solution if consent is a single binary grant. Consent Horizons replace that single grant with a set of bounded, self-expiring ones. Each horizon is a tuple ⟨domain, blast radius, expiry, renewal rule⟩. Within it the agent acts without further consent; at its edge it must fall silent, escalate, or trigger renewal. Presumed consent becomes legitimate because it operates only inside a horizon the principal explicitly drew, and the lapse-by-default of every horizon returns the principal to the loop without their having to remember to return.
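The tuple and its lifecycle can be sketched as a small state machine. This is an illustration under stated assumptions: the field types, the numeric `blast_radius` cap, the explicit `now` timestamps, and the method names are all hypothetical. What the sketch preserves is that authority exists only in the active state, that expiry is checked on every use, and that nothing renews a horizon except an explicit call.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    GRANTED = "granted"
    ACTIVE = "active"
    LAPSED = "lapsed"
    REVOKED = "revoked"


@dataclass
class ConsentHorizon:
    domain: str
    blast_radius: float  # hypothetical cap on the cost of any single action
    expiry: float        # timestamp at which authority lapses
    renewal_rule: str    # free text in this sketch, e.g. "ask principal"
    state: State = State.GRANTED

    def activate(self):
        if self.state is State.GRANTED:
            self.state = State.ACTIVE

    def revoke(self):
        self.state = State.REVOKED

    def permits(self, domain, cost, now):
        """True only while active, unexpired, in-domain, and within the blast radius."""
        if self.state is State.ACTIVE and now >= self.expiry:
            self.state = State.LAPSED  # lapse by default: silence does not extend
        return (
            self.state is State.ACTIVE
            and domain == self.domain
            and cost <= self.blast_radius
        )

    def renew(self, new_expiry, now):
        """Renewal is an explicit act; a revoked horizon cannot be renewed."""
        if self.state in (State.ACTIVE, State.LAPSED) and new_expiry > now:
            self.expiry = new_expiry
            self.state = State.ACTIVE


h = ConsentHorizon("calendar", blast_radius=10.0, expiry=100.0, renewal_rule="ask principal")
h.activate()
print(h.permits("calendar", 1.0, now=50), h.permits("calendar", 1.0, now=100), h.state)
```

When `permits` returns False the caller is at the horizon's edge and has the three options named above: fall silent, escalate, or trigger renewal.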
granted → active → lapsed / revoked. The double circle marks the operative state: authority exists only while active. Every horizon lapses by default, so renewal is an explicit act and silence revokes rather than extends.
5.2 Why the ledger is the substitute for supervision
Because supervision is structurally impossible in real time, accountability must be retrospective. The Legitimacy Ledger supplies it: every invocation of λ, c, and γ is recorded against the action it produced, so a principal who feels over-managed can inspect why the agent deferred to or departed from their standing mandate.
Trust in an ambient agent is warranted exactly to the degree that its ledger is legible, its horizons are honored, and its non-delegable boundary holds.
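One way to picture the ledger is as an append-only list of records, one per action, each carrying the λ, c, and γ that were applied. The sketch below assumes that shape; the field names, the `followed_mandate` flag, and the query methods are hypothetical, and the class is append-only by convention (it exposes no method that edits or removes entries) rather than by any tamper-proofing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    action: str
    lam: float              # fidelity weight applied to this action
    c: float                # inference confidence applied
    gamma: float            # future-welfare weight applied
    followed_mandate: bool  # True if the act deferred to the standing mandate
    rationale: str          # short human-readable reason


class LegitimacyLedger:
    def __init__(self):
        self._entries = []

    def record(self, entry):
        """Append one entry; existing entries are never rewritten."""
        self._entries.append(entry)

    def entries(self):
        """Read-only snapshot for retrospective review."""
        return tuple(self._entries)

    def departures(self):
        """Actions where the agent departed from the standing mandate."""
        return [e for e in self._entries if not e.followed_mandate]

    def explain(self, action):
        """All recorded decisions for one action name."""
        return [e for e in self._entries if e.action == action]


ledger = LegitimacyLedger()
ledger.record(LedgerEntry("decline_meeting", 0.1, 0.9, 0.5, False,
                          "inferred present preference for focus time"))
ledger.record(LedgerEntry("renew_insurance", 0.9, 0.6, 0.5, True,
                          "irreversible; deferred to recorded mandate"))
for entry in ledger.departures():
    print(entry.action, entry.rationale)
```

A principal who feels over-managed would start from `departures()`, which lists exactly the acts where an inference was allowed to outweigh what they had authorized.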
6 Computational Stewardship Architecture
The theory implies a reference architecture of five interacting layers. Its shape differs from the prevailing agent loop in where it places its center of gravity — not on the planner, but on the persistent world model and the constitutional gate.
7 Evaluation & the Central Claim
Task agents are benchmarked on correctness and completion. Those metrics are silent on every property that matters for a steward, which may complete tasks flawlessly while quietly disenfranchising its principal. We propose five metrics, each tied to an obligation and a mechanism.
| Metric | What it measures |
|---|---|
| Stewardship Quality | Long-horizon alignment with the principal's objectives, assessed retrospectively against the Legitimacy Ledger rather than per-act. |
| Agency Preservation | The rate at which the agent surfaces genuine choices versus forecloses them — the operational form of the ecological obligation. |
| Environmental Quality | Diversity of the decision environments the agent constructs and their freedom from manipulative framing. |
| Reversibility Index | The expected systemic cost of undoing the agent's autonomous interventions — proposed as a first-class optimization target, not merely a metric. |
| Constitutional Compliance | The rate at which actions remain within the active mandate, and the integrity of the non-delegable boundary. |
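Three of these metrics reduce to simple rates or averages once actions are logged with the right fields. The sketch below is one possible operationalization, not a definition: the log schema (`within_mandate`, `choice`, `autonomous`, `undo_cost`) and the sample records are invented for illustration, and the Reversibility Index is approximated here as the mean undo cost of autonomous interventions.

```python
from statistics import mean


def constitutional_compliance(actions):
    """Share of actions that stayed within the active mandate."""
    return sum(1 for a in actions if a["within_mandate"]) / len(actions)


def agency_preservation(actions):
    """Share of choice-bearing actions where the choice was surfaced, not foreclosed."""
    surfaced = sum(1 for a in actions if a["choice"] == "surfaced")
    foreclosed = sum(1 for a in actions if a["choice"] == "foreclosed")
    decided = surfaced + foreclosed
    return surfaced / decided if decided else 1.0


def reversibility_index(actions):
    """Mean cost of undoing autonomous interventions (lower is better)."""
    costs = [a["undo_cost"] for a in actions if a["autonomous"]]
    return mean(costs) if costs else 0.0


# Invented log records with a hypothetical schema.
log = [
    {"within_mandate": True, "choice": "surfaced", "autonomous": False, "undo_cost": 0.0},
    {"within_mandate": True, "choice": "foreclosed", "autonomous": True, "undo_cost": 1.0},
    {"within_mandate": True, "choice": "surfaced", "autonomous": False, "undo_cost": 0.0},
    {"within_mandate": False, "choice": "none", "autonomous": True, "undo_cost": 3.0},
]
print(constitutional_compliance(log), agency_preservation(log), reversibility_index(log))
```

Stewardship Quality and Environmental Quality are left out because they call for long-horizon or qualitative judgment that a per-action log does not capture on its own.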
Above a threshold of action volume and temporal span, principal satisfaction and trust in an agent are governed more strongly by legitimacy properties — reversibility, constitutional compliance, agency preservation — than by capability properties — task correctness, planning depth, completion rate.
Concretely: holding capability fixed and varying legitimacy will move trust more than holding legitimacy fixed and varying capability.
The experiment that would refute it. Deploy a set of ambient-agent variants against the same population over a span long enough for preference drift to occur. Hold capability fixed by sharing one underlying planning-and-action model across all variants and verifying parity on a standard task-completion benchmark. Then vary legitimacy along three manipulable factors — presence or absence of Consent Horizons, ledger legibility (full / summary / none), and the reversibility class of permitted actions. The dependent measures are principal-reported trust and a behavioral proxy for retained autonomy: the rate of un-coerced overrides and renewals.
If capability dominates legitimacy in determining trust at high action volume, the central thesis is false and ambient agency reduces to "task agents that run more often." We predict the opposite and stake the program on it.
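The design described above is a full factorial over the three legitimacy factors with the planner held constant. A minimal sketch of the variant grid follows; the two reversibility levels and the `planner` label are placeholders, since the factor's levels are not enumerated here.

```python
from itertools import product

CONSENT_HORIZONS = [True, False]
LEDGER_LEGIBILITY = ["full", "summary", "none"]
# Hypothetical levels; the reversibility classes are not enumerated in the text.
REVERSIBILITY_CLASS = ["reversible_only", "any"]

# Every variant shares one planner so that capability is held fixed.
variants = [
    {
        "planner": "shared-model",  # placeholder name for the common model
        "consent_horizons": horizons,
        "ledger": ledger_level,
        "reversibility": reversibility,
    }
    for horizons, ledger_level, reversibility in product(
        CONSENT_HORIZONS, LEDGER_LEGIBILITY, REVERSIBILITY_CLASS
    )
]

print(len(variants))
```

With these placeholder levels the grid has 2 × 3 × 2 = 12 cells; capability parity would still need to be verified on a task-completion benchmark before any trust measures are compared.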
8 Implications for Agentic Software
The practical upshot is a relocation of engineering effort. The prevailing stack is Prompt → Reason → Act, and almost all investment goes into the middle term. The ambient stack is Observe → Model → Anticipate → Steward → Govern, and its weight sits at the ends.
9 Objections & Limits
Intellectual honesty requires stating what would make this program fail, beyond the falsifiable claim. These limits do not undercut the program; they locate its frontier.
- The confidence term c may be unfaithful. The entire criterion rests on c being calibrated. A confidently wrong agent (c → 1) bypasses the (1 − c) fallback and is more dangerous under this framework than under a tool framework. Modern neural networks are known to be poorly calibrated by default, tending toward overconfidence [6], which makes this not a hypothetical risk but the expected failure absent deliberate countermeasures. Calibration — especially out of distribution — is a precondition, not a detail; a steward that cannot calibrate its own confidence must be forced toward λ → 1 by default.
- The legitimacy machinery may be theater. A ledger no principal ever reads, and horizons that auto-renew unexamined, reproduce the consent fictions of modern terms-of-service rather than curing them. The mechanisms are necessary but not sufficient; they require that renewal interactions be rare, meaningful, and genuinely capable of changing behavior.
- Ecological optimization may be intrinsically manipulative. Any agent that shapes an environment shapes deliberation, and the line between preserving the capacity to choose and engineering the choice may not survive contact with optimization pressure. We take this to be the deepest open question, and we do not resolve it.
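The first limit above can be made operational. Expected calibration error (ECE), the binned gap between stated confidence and observed accuracy reported in [6], gives a single number for how far c can be trusted. The sketch below computes ECE with equal-width bins and then applies the default proposed above, forcing λ to 1 when calibration is poor. The `max_ece` threshold of 0.1 and the function names are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |average confidence - accuracy| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


def guarded_fidelity_weight(lam, ece, max_ece=0.1):
    """Return lam unchanged if calibration is acceptable, else force it to 1.
    The threshold is a hypothetical choice, not a recommended value."""
    return 1.0 if ece > max_ece else lam


# An overconfident agent: 95% stated confidence, 25% observed accuracy.
overconfident = expected_calibration_error([0.95] * 4, [True, False, False, False])
# A calibrated agent: 75% stated confidence, 75% observed accuracy.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
print(guarded_fidelity_weight(0.2, overconfident), guarded_fidelity_weight(0.2, calibrated))
```

ECE measured on in-distribution data says little about the out-of-distribution case singled out above, so a guard of this kind is a floor rather than a guarantee.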
10 Conclusion
Agentic software as it exists today is an intermediate form: autonomous within tasks, reactive about them. Its successor will be ambient — persistent, participatory, and oriented to stewardship rather than execution. Such systems are neither tools nor autonomous entities but computational fiduciaries, exercising delegated authority over a principal's goals, attention, and environment under conditions where supervision is structurally impossible.
The scientific question is no longer how to make an agent capable of acting; capability is increasingly given. It is how to make delegated computational authority legitimate — bounded by constitution, scoped by consent horizons, accountable through legible ledgers, and faithful across a principal's own change over time.
Ambient Agency, if the central claim holds, is not a faster way to complete tasks. It is the transition from interaction-centric software to stewardship-centric systems — and with it, the first time the question for computing becomes not what a machine can do for us, but on what authority it does so.
References
[1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. Concrete Problems in AI Safety. arXiv:1606.06565.
[2] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, et al. 2022. Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
[3] Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep Reinforcement Learning from Human Preferences. NeurIPS 30.
[4] Andy Clark and David J. Chalmers. 1998. The Extended Mind. Analysis 58, 1, 7–19.
[5] Harry G. Frankfurt. 1971. Freedom of the Will and the Concept of a Person. The Journal of Philosophy 68, 1, 5–20.
[6] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. On Calibration of Modern Neural Networks. ICML 70, 1321–1330.
[7] Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, and Anca Dragan. 2016. Cooperative Inverse Reinforcement Learning. NeurIPS 29.
[8] Edwin Hutchins. 1995. Cognition in the Wild. MIT Press, Cambridge, MA.
[9] Michael C. Jensen and William H. Meckling. 1976. Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure. Journal of Financial Economics 3, 4, 305–360.
[10] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2024. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR.
[11] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, et al. 2022. Training Language Models to Follow Instructions with Human Feedback. NeurIPS 35, 27730–27744.
[12] Derek Parfit. 1971. Personal Identity. The Philosophical Review 80, 1, 3–27.
[13] Richard H. Thaler and Cass R. Sunstein. 2008. Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press, New Haven, CT.
[14] Mark Weiser. 1991. The Computer for the 21st Century. Scientific American 265, 3, 94–104.
[15] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. ICLR.
How to cite
This site presents the working paper Ambient Agency: A Theory of Computational Stewardship for the Post-Prompt Era by Andrew Swerdlow.