Your AI agent works perfectly in a demo.
Then it reaches production — and suddenly it takes 47 steps to do a 5-step job.
Nothing is broken.
The model is strong.
The prompts are fine.
Yet no one can answer the most important question:
Why did the agent take this execution path?
This is the core failure mode of modern agentic AI systems: not a lack of intelligence, but a lack of structural control. In a previous post, Centralized vs. Distributed Intelligence: Designing Multi-Agent AI Systems That Scale, we examined how architectural decisions shape agent behavior at scale. This article builds directly on that foundation by focusing on execution control, determinism, and reliability.
Most agents today are built as free-form execution loops: think, act, observe, repeat. These loops are powerful, but they introduce unbounded non-determinism at the system level. The same input can lead to different plans, different tool calls, different step counts — and no reliable way to reproduce or audit the behavior.
For experimentation, this is acceptable.
For production systems — especially enterprise and regulated environments — it is not.
This article argues that non-determinism in agentic AI is fundamentally an architecture problem, not a prompting problem. Sampling controls reduce variance, but they do not constrain execution.
The solution is workflow-based agent architectures: deterministic control flow on the outside, probabilistic intelligence on the inside. LLMs operate at well-defined nodes; execution between them is explicit, bounded, and replayable.
In the sections that follow, we’ll examine where non-determinism comes from, why free-form agents amplify it, and how workflow-based patterns like DAG pipelines, state machines, and plan-and-execute systems enable production-grade agentic AI.
Before we dive in, if you'd prefer to watch rather than read, we've put together a video walkthrough — you can check it out here.
Before controlling non-determinism, we must be precise about which determinism matters. In agentic AI, multiple forms are often conflated, leading to ineffective fixes.
In classical software engineering, determinism is a property of the program, not the hardware or compiler.
Formally, a function f is deterministic if:
∀x : f(x) = y, for a single y determined entirely by x
Given the same input, the output — and the execution path — are invariant.
In practice, this implies that identical inputs produce identical outputs, take identical branches, and leave identical traces.
Example (deterministic):
```python
def approve(amount):
    if amount < 10_000:
        return "AUTO_APPROVED"
    else:
        return "MANUAL_REVIEW"
```
Same input → same branch → same output → same trace.
Production software enforces determinism using structure, not intelligence:
Example: a simple workflow as a control-flow graph
[Validate] → [Enrich] → [Score] → [Decide]
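As a minimal sketch (the stage functions here are invented for illustration), that graph is just ordinary code with a fixed execution order:

```python
# Each stage is a plain function; the pipeline order is fixed in code,
# so every run visits the same stages in the same order.
def validate(record):
    if "amount" not in record:
        raise ValueError("missing amount")
    return record

def enrich(record):
    return {**record, "currency": record.get("currency", "USD")}

def score(record):
    return {**record, "risk": 0.1 if record["amount"] < 10_000 else 0.8}

def decide(record):
    return "AUTO_APPROVED" if record["risk"] < 0.5 else "MANUAL_REVIEW"

def run(record):
    for stage in (validate, enrich, score):
        record = stage(record)
    return decide(record)
```

Because the pipeline order lives in the code, not in a model's output, two runs with the same record cannot take different paths.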
Large language models are probabilistic programs. Even a single inference call is not strictly deterministic:
P(token | context) ≠ 1.0
Typical decoding introduces randomness:
```python
response = model.generate(
    prompt,
    temperature=0.7,
    top_p=0.9,
)
```
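To make the sampling step concrete, here is a toy sampler, not any real provider API: temperature rescales the logits before sampling, and at temperature zero the choice collapses to a deterministic argmax.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Toy temperature sampling over a dict of token -> logit."""
    # temperature -> 0 collapses the distribution to greedy argmax
    if temperature == 0.0:
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    # softmax weights, shifted by the max logit for numerical stability
    weights = {tok: math.exp(l - m) for tok, l in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for tok, w in weights.items():
        acc += w
        if r <= acc:
            return tok
    return tok  # defensive fallback
```

With a fixed seed the sampled sequence is reproducible; without one, each run draws a different sequence even though the logits never change.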
Even with temperature = 0.0, you still face non-determinism from floating-point arithmetic on parallel hardware, dynamic batching on the provider side, and silent model or infrastructure updates.
But this is not the real problem.
A single probabilistic function is manageable.
An agent is not a function — it is a program that writes its own next instruction.
Typical free-form agent loop:
```python
while not done:
    thought = llm.think(state)
    action = llm.decide(thought)
    observation = execute(action)
    state = update(state, observation)
```
What becomes non-deterministic: the number of iterations, which tools are invoked, the order of invocation, and when the loop decides it is done.
This creates execution path entropy, not just output variance.
Two runs, same input:
```
Run A: 8 steps  → tool_x → tool_y → done
Run B: 21 steps → tool_y → reflect → tool_x → retry → done
```
Both are “correct.”
Neither is predictable.
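This divergence is easy to simulate. The toy policy below (tool names and stopping rule are invented) picks actions at random with a stochastic stopping decision; across 100 "identical" runs it produces dozens of distinct execution paths:

```python
import random

def run_agent(seed, tools=("tool_x", "tool_y", "reflect"), max_steps=25):
    """Simulate a free-form loop: random action choice, random stop."""
    rng = random.Random(seed)
    path = []
    while len(path) < max_steps:
        path.append(rng.choice(tools))
        if rng.random() < 0.2:  # stochastic "am I done?" decision
            break
    return tuple(path)

# Same "input" (none at all), yet many distinct execution paths.
paths = {run_agent(seed) for seed in range(100)}
```

Each per-step decision is cheap, but their product is path entropy: the set of observed traces grows with every stochastic branch.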
Determinism is not binary; it exists on a spectrum, from fully deterministic code, through seeded and bounded randomness, to unbounded, self-directed execution.
Non-determinism exists in all probabilistic systems. What makes agentic AI dangerous in production is not randomness itself, but the fact that agents are long-lived, stateful, and self-directing. Small stochastic decisions compound over time into large, irreversible execution variance.
A single LLM call behaves like a mostly pure function: one bounded input, one bounded output, and no memory across calls.
An agent, by contrast, is a program: it accumulates state, chooses its own next instruction, and runs until it decides to stop.
This turns variance from a local concern into a system-wide property.
Most agent frameworks use a ReAct-style loop, but the issue is what the loop does not specify. There is no fixed upper bound on the number of iterations, the number of tool calls, the tokens consumed, or the wall-clock time of a single run.
In classical software, unbounded loops are bugs. In agentic systems, they are often the default.
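A minimal fix is to make the bound explicit. The sketch below assumes a hypothetical llm object with a decide method; the point is the hard cap, which turns a runaway loop into a detectable failure:

```python
class StepBudgetExceeded(Exception):
    """Raised when an agent run exceeds its step budget."""

def run_bounded(llm, state, execute, max_steps=10):
    # The loop can still reason freely, but never for more than max_steps.
    for step in range(max_steps):
        action = llm.decide(state)
        if action == "done":
            return state
        observation = execute(action)
        state = {**state, "last": observation, "steps": step + 1}
    raise StepBudgetExceeded(f"agent exceeded {max_steps} steps")
```

The same idea extends to per-tool retry caps and token budgets: any resource the loop can consume should have an explicit ceiling.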
Each step introduces a branching decision, so over multiple steps the number of possible execution paths grows exponentially. The result is combinatorial path explosion: two runs with identical inputs can diverge completely, not just in output, but in how the system behaves.
Non-determinism in agentic AI is not a single failure mode. It emerges from multiple layers of the system, each contributing its own form of variability. These layers interact and compound, which is why non-determinism becomes difficult to control once agents move beyond simple, single-step tasks.
Below is a concise taxonomy of the five primary sources of non-determinism in agentic systems.
1. Model-level non-determinism: originates inside the model and inference stack.
2. Control-flow non-determinism: occurs when execution structure is decided at runtime.
3. Context non-determinism: agents are highly sensitive to unstable context.
4. Environmental non-determinism: introduced when agents interact with external systems.
5. Coordination non-determinism: emerges from agent-to-agent interaction.
Workflow-based agent architectures are designed to solve the core problem outlined so far: unbounded, opaque execution in agentic systems. Rather than allowing agents to dynamically decide their next action at every step, workflows move execution control into an explicit, deterministic structure.
A workflow-based agent is an agent whose behavior is governed by a predefined execution graph, rather than a free-form loop.
Key characteristics: control flow is defined before execution begins, LLM calls occur only at fixed, well-defined nodes, and every run is bounded and replayable.
In this model, the agent does not decide what to do next — it decides how to perform the current step.
Workflow-based agents exhibit system-level properties that free-form agents cannot guarantee.
Core properties: deterministic control flow, bounded step counts, step-level observability, and replayable runs.
These properties shift agent behavior from emergent to engineered.
The value of workflow-based agents becomes obvious at production scale.
Operational advantages: predictable latency and cost, reproducible incident forensics, simpler SLA enforcement, and auditable execution traces.
Workflow-based architectures do not eliminate intelligence — they discipline it. By constraining how agents act, they make it possible to trust what agents do.
In enterprise environments, the need for workflow-based agent architectures is not theoretical — it is operational. At Mactores, we consistently see agentic systems fail not at the reasoning layer, but at the execution layer: unclear control flow, irreproducible behavior, and the inability to explain why a system behaved a certain way.
Across regulated industries and large-scale internal platforms, free-form agent loops rarely survive first contact with requirements like auditability, SLA enforcement, or incident forensics. Workflow-based architectures — DAGs, state machines, and plan-and-execute systems — provide the structural guarantees these environments demand without eliminating the benefits of LLM-driven intelligence.
DAG-based architectures are often the cleanest and most approachable way to introduce structure into agentic systems. They replace free-form loops with a mathematically well-understood execution model: the directed acyclic graph.
At scale, this pattern behaves less like an “agent” and more like a distributed workflow engine with intelligent nodes.
A DAG-based agent is defined as a directed acyclic graph:
G = (V, E)

Where:
- V is the set of nodes: each node is an executable step, possibly LLM-powered
- E is the set of directed edges: each edge is an explicit data or control dependency
Key properties of the model:
This formalism alone eliminates an entire class of agent failures.
Execution in a DAG-based agent is governed by deterministic rules.
Core semantics
This is fundamentally different from agent loops, where control flow is decided dynamically at runtime.
At its core, a DAG agent executes like a workflow engine:
```python
for node in topological_sort(graph):
    if dependencies_satisfied(node):
        execute(node)
```
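Fleshing that loop out into a runnable sketch (node names are illustrative), Kahn's algorithm yields a deterministic execution order and rejects cyclic graphs outright:

```python
from collections import deque

def topological_order(deps):
    """deps maps node -> set of nodes it depends on. Returns a fixed order."""
    indegree = {n: len(d) for n, d in deps.items()}
    dependents = {n: [] for n in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)
    # sorted() gives deterministic tie-breaking among ready nodes
    queue = deque(sorted(n for n, k in indegree.items() if k == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(dependents[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(deps):
        raise ValueError("graph has a cycle")
    return order

deps = {
    "validate": set(),
    "enrich": {"validate"},
    "score": {"enrich"},
    "decide": {"score"},
}
```

Note that the order is a property of the graph alone: no model output can change which node runs next, only what each node produces.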
Important implications of this structure:
LLMs influence what happens inside a node, not which node runs next.
To preserve determinism, LLMs inside DAG nodes must behave like pure, bounded functions.
Node design principles
This containment is critical. If an LLM is allowed to emit free-form instructions, it effectively breaks out of the DAG and reintroduces control-flow non-determinism.
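One way to enforce that containment, sketched here with a hypothetical call_llm function and an invented schema, is to validate the model's output against a fixed contract and retry a bounded number of times; anything outside the schema is rejected rather than executed:

```python
ALLOWED_KEYS = {"label", "confidence"}

def run_node(call_llm, prompt, max_attempts=3):
    """Treat the LLM as a bounded function: fixed input, schema-checked output."""
    for attempt in range(max_attempts):
        raw = call_llm(prompt)
        # Only structurally valid outputs pass downstream; free-form
        # instructions never leave the node.
        if (
            isinstance(raw, dict)
            and set(raw) == ALLOWED_KEYS
            and raw["label"] in {"approve", "review"}
        ):
            return raw
    raise ValueError(f"node output invalid after {max_attempts} attempts")
```

The model can phrase its reasoning however it likes internally, but the only thing it can hand back to the workflow is a value the schema permits.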
DAG architectures dramatically reduce chaos, but they are not failure-proof.
Common failure modes
If DAG pipelines bring structure to what runs and when, state machine architectures bring structure to what the system is allowed to be. This pattern is the most restrictive of the workflow-based approaches — and for many production environments, that is precisely its strength.
State machines replace flexible execution graphs with explicit system states and sanctioned transitions. Nothing happens unless it is allowed.
In a state machine–driven agent, the system is always in exactly one well-defined state.
States as invariants
Transitions as contracts
This forces agent behavior to conform to a predefined lifecycle, rather than inventing one at runtime.
Formally, a finite state machine can be defined as:
S = {s1, s2, ..., sn}
T ⊆ S × S, e.g. T = {(s1 → s2), (s2 → s3)}

Where:
- S is the finite set of allowed system states
- T is the set of sanctioned transitions between them
Anything not represented in T is impossible by design.
This sharply limits execution entropy.
State machine–based agents provide strong, system-level guarantees that free-form agents cannot.
Core guarantees
The agent’s “reasoning” happens inside a state — never across states.
At runtime, state transitions are enforced mechanically, not inferred.
```python
if state == REVIEW and approved:
    state = EXECUTE
```
Important characteristics of this pattern:
This makes state machines hostile to creativity — and extremely friendly to reliability.
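That enforcement can be sketched as a plain transition table; the state and event names below are illustrative. Any transition absent from the table raises instead of executing:

```python
# Sanctioned transitions: (current state, event) -> next state.
# Anything not listed here is impossible by design.
TRANSITIONS = {
    ("RECEIVED", "validated"): "REVIEW",
    ("REVIEW", "approved"): "EXECUTE",
    ("REVIEW", "rejected"): "CLOSED",
    ("EXECUTE", "completed"): "CLOSED",
}

class IllegalTransition(Exception):
    """Raised when an agent attempts a transition outside the table."""

def transition(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise IllegalTransition(f"{event!r} not allowed in state {state!r}")
```

An agent embedded in this lifecycle can propose events, but it cannot invent states or skip steps: the table, not the model, decides what is reachable.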
Plan-and-execute architectures address a core weakness of free-form agent loops: local decision-making without global awareness. Instead of deciding the next action step-by-step, the agent first constructs an explicit plan, then executes it under deterministic control. Intelligence is front-loaded; execution is disciplined.
This pattern is especially effective for long-horizon, multi-step tasks where coherence, ordering, and predictability matter more than improvisation.
The defining feature of this architecture is a dedicated planning phase that runs before any action is taken.
What happens during planning
Execution Phase
Once a plan exists, execution becomes a controlled process.
Execution characteristics
During execution, the agent is no longer deciding what to do next. It is simply carrying out a predefined contract.
This sharply reduces execution entropy while preserving intelligent reasoning where it matters most.
Replanning as a Controlled Exception
Real-world systems encounter surprises. Plan-and-execute architectures handle this through explicit replanning, not ad-hoc adaptation.
Replanning mechanics
Crucially, replanning is an exceptional path, not a continuous loop.
Pseudo-Code Example
At a high level, the control flow is straightforward:
```python
plan = planner(task)
for step in plan:
    execute(step)
    if failure:
        plan = replan(context)
```
The key property of this structure is the separation of concerns: planning is the only open-ended reasoning step, while execution and replanning remain bounded, observable operations.
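A runnable sketch of the same control flow (the planner, step names, and retry strategy are all invented for illustration) adds a hard cap on replans so that recovery itself stays bounded:

```python
def planner(task):
    # In a real system this would be a single LLM call producing a full plan.
    return ["fetch", "transform", "load"]

def plan_and_execute(task, execute, max_replans=2):
    plan = planner(task)
    replans = 0
    done = []
    while plan:
        step, plan = plan[0], plan[1:]
        if execute(step):
            done.append(step)
        elif replans < max_replans:
            replans += 1
            # Simplest replan: retry the failed step plus the remaining plan.
            plan = [step] + plan
        else:
            raise RuntimeError(f"plan failed at {step!r} after {replans} replans")
    return done
```

Replanning here is an exceptional, counted event, never a continuous loop: after max_replans failures the run terminates with an explicit error instead of improvising.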
ReAct-style agents decide one step at a time. This works well for short, exploratory tasks — but breaks down as complexity increases.
Advantages of plan-and-execute
For long-horizon, multi-step tasks of this kind, plan-and-execute trades improvisation for coherence, ordering, and predictability.
Real-world agentic systems rarely fit cleanly into a single architectural pattern. Production requirements often demand both flexibility and control, which is where hybrid and composable architectures become essential. The key idea is simple: compose intelligence inside structure, never the other way around.
One common pattern is to use a DAG as the outer control structure, while allowing individual nodes to host free-form agents internally.
```python
def dag_node(input):
    return react_agent.run(input)
```
In this design: the DAG fixes which nodes run and in what order, while the embedded ReAct agent may reason, retry, and call tools, but only within the boundary of its own node.
Another powerful composition is embedding agents inside state machine states.
```python
if state == ANALYZE:
    result = agent.analyze(context)
    state = NEXT_STATE
```
Here: the state machine owns the lifecycle, the agent contributes intelligence only inside the ANALYZE state, and the transition itself remains mechanical.
This pattern is especially effective when regulatory or audit constraints require explicit state progression but still benefit from intelligent decision-making within each phase.
Complex systems often require workflows inside workflows.
Examples include a DAG node whose internal logic is its own state machine, or a plan step that expands into a nested DAG:

```python
result = sub_workflow.execute(input)
```
Nested workflows allow teams to scale complexity hierarchically without flattening everything into a single, unmanageable graph.
Hybrid architectures let you selectively loosen constraints where it’s safe, while preserving deterministic guarantees where it matters most. In the next section, we’ll look at how these structured systems can be tested, replayed, and debugged in production.
Deterministic replay is the cornerstone of reliable agent operations.
What must be captured: every model input and output, every tool call and its result, all sampling seeds and parameters, and the exact workflow definition in effect at run time.
With these artifacts, teams can replay an agent run step-by-step, answering not just what happened, but why it happened.
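A minimal record-and-replay harness, using nothing beyond the standard library, captures every non-deterministic call so a run can be re-executed exactly, and so any divergence from the recording is detected immediately:

```python
import json

class Recorder:
    """Wraps a non-deterministic function; records (input, output) events."""
    def __init__(self, fn):
        self.fn = fn
        self.events = []
    def __call__(self, *args):
        out = self.fn(*args)
        self.events.append({"args": list(args), "out": out})
        return out
    def dump(self):
        return json.dumps(self.events)

class Replayer:
    """Re-serves recorded outputs instead of calling the real function."""
    def __init__(self, dumped):
        self.events = json.loads(dumped)
        self.i = 0
    def __call__(self, *args):
        event = self.events[self.i]
        # If the replayed run asks a different question, the trace diverged.
        assert list(args) == event["args"], "replay diverged from recording"
        self.i += 1
        return event["out"]
```

In production the same pattern typically wraps LLM calls and tool invocations, with events written to durable storage rather than an in-memory list.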
Traditional unit tests are insufficient for agentic systems. Instead, teams adopt behavioral testing at the workflow level.
Effective strategies
This approach acknowledges probabilistic behavior while enforcing structural guarantees.
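In practice this means asserting invariants that must hold on every run, rather than exact outputs. The trace format and sanctioned tool list below are invented for illustration:

```python
def check_trace(trace, max_steps=10, terminal_states=("CLOSED", "FAILED")):
    """Assert structural invariants on a run trace, not exact content."""
    assert trace, "empty trace"
    assert len(trace) <= max_steps, "step budget exceeded"
    assert trace[-1]["state"] in terminal_states, "run did not terminate cleanly"
    for step in trace:
        assert step["tool"] in {"fetch", "score", "notify"}, "unsanctioned tool call"
    return True
```

Content may vary between runs; the structure may not. A test suite built this way stays green across model upgrades while still catching runaway loops and unsanctioned actions.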
Observability must be designed into the agent architecture, not added afterward.
Critical signals
When agents are observable at the step level, failures become diagnosable events — not mysteries.
Workflow-based architectures dramatically reduce non-determinism, but they do not eliminate all risk. Some failure modes persist — and ignoring them leads to fragile systems.
In production systems, agentic AI rarely fails because a model “wasn’t smart enough.” It fails because the system surrounding the model could not bound, observe, or explain its behavior.
At Mactores, we encounter this pattern repeatedly when organizations attempt to operationalize agents across enterprise workflows. Early prototypes often rely on free-form agent loops, which work well in isolation but collapse under real-world constraints: compliance requirements, cost controls, latency budgets, and multi-team ownership.
Workflow-based agent architectures emerge as the practical solution. DAG pipelines provide predictable parallelism. State machines enforce explicit lifecycle guarantees. Plan-and-execute systems enable long-horizon reasoning without unbounded execution. In practice, these patterns are rarely used in isolation — they are composed, nested, and selectively relaxed where risk permits.
What matters most is not the specific pattern, but the principle: structure must exist outside the model. When control flow is explicit, and intelligence is localized, agentic systems become debuggable systems — not probabilistic experiments.
This shift is what enables agentic AI to function as infrastructure, not novelty.
Agentic AI systems fail in production not because they lack intelligence, but because they lack structure. Non-determinism is inevitable in probabilistic models — unbounded execution is not. By separating intelligence from control and adopting workflow-based architectures, teams can build agents that are observable, replayable, and production-safe without sacrificing capability.
DAGs, state machines, and plan-and-execute patterns all enforce the same principle: deterministic control flow with localized intelligence. This is how agentic systems evolve from experiments into infrastructure.
The open question is no longer whether agents can reason, but how much freedom an intelligent system should have to decide its own execution path in production. In the next article, we’ll extend this discussion by examining AI Agent Safety: The Missing Layer in Most Enterprise Deployments, and why structural control is a prerequisite, but not a substitute, for building agents that are truly safe, governable, and enterprise-ready.