The runtime pipeline, package architecture, type system, and integration points for the Agentic Legibility Stack.
This document describes the target operating model for government service delivery through AI agents. For concept definitions see the Glossary; for product design patterns see the Product & Service Design Guide.
The stack is organised as a set of focused packages with strict dependency boundaries. Every package owns one concern; cross-cutting behaviour is mediated through well-typed interfaces.
The dependency tree is rooted at the Shared Schema Layer, which has zero internal dependencies. All other components declare their internal dependencies explicitly. External SDK usage is confined to two components.
Three hard rules that govern the architecture:
The citizen-facing experience. Hosts the conversational agent, task cards, consent flows, and receipt history. Exposes an evidence API that the Legibility Studio reads from.
The department-facing admin tool. Displays service coverage, gap analysis, trace inspection, and artefact authoring. Consumes evidence over HTTP — never imports the Evidence Plane directly.
Structured artefacts define each service: a capability manifest, policy ruleset, state model, consent model, and state-specific instructions. These artefacts are the canonical description of what a service does, who can use it, and how it progresses.
The evidence store is append-only. Every significant event — state transitions, consent decisions, policy evaluations, credential presentations — is captured as an immutable trace event. The store supports full replay: given a trace ID, the system can reconstruct the exact conversation state at any point.
Personal data is stored with field-level source attribution. Every value carries metadata indicating which department sourced it, whether it is verified or citizen-submitted, and which consent grant authorised its use.
Each service in the store is described by up to five artefact files. Together, these form the complete machine-readable description of a government service.
| Artefact | Purpose | Required |
|---|---|---|
| manifest | Service identity, description, department, input schema, output schema, and lifecycle metadata. The canonical reference for what the service does. | Yes |
| policy | Eligibility rules expressed as typed conditions. Each rule declares a field, operator, expected value, and human-readable failure reason. | Yes |
| state-model | The finite-state machine defining the journey. States, transitions, guards, terminal markers, and receipt-emitting flags. | Yes |
| consent | Required and optional consent grants. Each grant declares scope (which data), purpose (why), and whether it is mandatory. | No |
| state-instructions | Per-state behavioural rules for the agent. DO/DO NOT lists that constrain the LLM’s responses at each point in the journey. | No |
Artefact authoring is the integration point for departments. A department does not need to build an API or write code. It publishes structured artefacts describing its services, and the platform handles orchestration, consent, evidence, and agent behaviour. The Legibility Studio provides a guided authoring experience for department teams.
Each component in the system owns a single, well-defined concern. This table summarises what each component does and what it must never do.
| Component | Owns | Must Never |
|---|---|---|
| Shared Schema Layer | Shared types, schema validation, type guards | Import any other component |
| LLM Adapter Layer | LLM calls, MCP client, tool dispatch | Make policy or state decisions |
| Evidence Plane | Trace events, receipts, case ledger, replay | Import LLM integration or make API calls |
| Identity Layer | User profiles, authentication, session management | Access evidence store directly |
| Legibility Engine | PolicyEvaluator, StateMachine, ConsentManager, FieldCollector, ArtefactStore | Call the LLM or make network requests |
| MCP Server | MCP server, tool generation, resource exposure | Use MCP client (that belongs to the adapter layer) |
| Personal Data Layer | Citizen data model, field sourcing, relationship store | Bypass consent checks |
| Orchestration Layer | Orchestrator, pipeline, strategy dispatch | Import LLM integration directly (must go through adapter layer) |
| Service Graph | GOV.UK service graph integration, node resolution | Store state or evidence |
| Service Store | Service storage, tiered resolution, gap analysis | Make eligibility decisions (that belongs to the Legibility Engine) |
The centrepiece of the architecture. Every citizen interaction passes through this 14-step orchestrator pipeline. Deterministic code runs before and after the language model; the platform always has the final word.
The LLM runs at step 7 out of 14. Everything before it is deterministic setup; everything after it is validation and override. The language model proposes; the platform disposes. This is the central architectural guarantee.
Every pipeline run produces a single typed output. This is the contract between the runtime and the citizen application.
```typescript
interface OrchestratorOutput {
  response: string;
  reasoning: string;
  toolsUsed: string[];
  conversationTitle: string | null;
  tasks: TaskEntry[];
  policyResult?: { eligible: boolean; explanation: string };
  handoff?: { triggered: boolean; reason: string };
  serviceState?: { currentState: string; stateHistory: string[] };
  consentRequests?: ConsentRequest[];
  extractedFields?: FieldExtraction[];
  outcomeHints?: Record<string, unknown>;
  serviceProposal?: { serviceId: string; serviceName: string; reason: string };
  needProposal?: { need: string; services: string[] };
  serviceCompletions?: Array<{ serviceId: string; status: string }>;
  versionMetadata: {
    promptHash: string;
    rulesetVersion: string;
    stateModelVersion: string;
  };
  pipelineTrace: {
    traceId: string;
    steps: PipelineStep[];
    totalDurationMs: number;
  };
}
```
The orchestrator does not know how a service is implemented. It delegates through a pluggable ServiceStrategy interface. Two implementations exist today; new strategies can be added without modifying the pipeline.
```typescript
interface ServiceStrategy {
  buildTools(ctx: ServiceStrategyContext): ToolDefinition[];
  buildServiceContext(ctx: ServiceStrategyContext): string | Promise<string>;
  dispatchToolCall(name: string, input: unknown): Promise<string>;
  extractStateTransitions(messages: unknown[]): StateTransitionResult[];
}
```
JsonServiceStrategy (deterministic, inline) — no tools are given to the LLM. Service context is built from artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the state machine. Used for services with complete artefact sets.
McpServiceStrategy (tool-based) — the LLM receives check_eligibility and advance_state tools. It calls these during the agentic loop. Tool results are parsed for state transitions. Used for service-graph services that expose department APIs.
Both strategies produce identical OrchestratorOutput. The citizen application cannot distinguish which strategy was used. This is by design: the contract is stable regardless of implementation.
Step 7 — the agentic loop — deserves closer examination. It is the only step where the language model runs, and its behaviour is tightly bounded.
The iteration limit is a hard ceiling, not a target. Most conversations complete in 1–2 iterations. The limit exists to bound worst-case latency and cost. If a service consistently requires 4+ iterations, it indicates the artefacts need refinement — more context in the state instructions reduces the LLM’s need for exploratory tool calls.
Every pipeline run produces a trace that records the duration and outcome of each step. This is the primary debugging and performance monitoring tool.
```typescript
interface PipelineStep {
  name: string;
  durationMs: number;
  outcome: "success" | "skipped" | "error";
  metadata?: Record<string, unknown>;
}
```
In practice, steps 1–6 and 8–14 are pure computation and complete in negligible time. Step 7 (LLM call) accounts for the vast majority of total pipeline duration. The trace makes it immediately visible when a slow pipeline run is caused by LLM latency versus application logic.
The core of the deterministic layer. Four classes enforce policy, manage state, handle consent, and collect citizen data — all without any LLM involvement.
Evaluates a service’s eligibility rules against a citizen’s data. The evaluator iterates each rule, evaluates conditions using typed operators, and returns a structured result. No fuzzy logic, no LLM interpretation — rules either pass or fail.
```typescript
class PolicyEvaluator {
  evaluate(
    ruleset: PolicyRuleset,
    context: Record<string, unknown>
  ): PolicyResult
}
```
The evaluation logic iterates over the rules, evaluating each condition with one of seven typed operators: >=, <=, ==, !=, exists, not-exists, in. Each rule returns a pass/fail result, with the original description serving as its explanation.
```typescript
interface PolicyRuleset {
  service_id: string;
  version: string;
  rules: PolicyRule[];
}

interface PolicyRule {
  id: string;
  description: string;
  condition: {
    field: string;
    operator: ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";
    value?: unknown;
  };
  reason_if_failed: string;
  edge_case?: boolean;
}

interface PolicyResult {
  eligible: boolean;
  passed: PolicyRule[];
  failed: PolicyRule[];
  edgeCases: PolicyRule[];
  explanation: string;
}
```
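To make the "no fuzzy logic" claim concrete, here is an illustrative sketch of the evaluation loop. It is not the production evaluator; `Rule` is a simplified local copy of `PolicyRule`, and the operator semantics follow the operator reference later in this section (numeric comparisons, strict equality, no coercion).

```typescript
// Illustrative sketch only; the production evaluator lives in the Legibility
// Engine. Rule is a simplified local copy of PolicyRule.
type Op = ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";

interface Rule {
  id: string;
  condition: { field: string; operator: Op; value?: unknown };
  reason_if_failed: string;
  edge_case?: boolean;
}

function evaluateRule(rule: Rule, ctx: Record<string, unknown>): boolean {
  const actual = ctx[rule.condition.field];
  const expected = rule.condition.value;
  switch (rule.condition.operator) {
    case ">=": return typeof actual === "number" && actual >= (expected as number);
    case "<=": return typeof actual === "number" && actual <= (expected as number);
    case "==": return actual === expected; // strict equality, no coercion
    case "!=": return actual !== expected;
    case "exists": return actual !== undefined && actual !== null;
    case "not-exists": return actual === undefined || actual === null;
    case "in": return Array.isArray(expected) && expected.includes(actual);
  }
}

function evaluate(rules: Rule[], ctx: Record<string, unknown>) {
  // Edge-case rules flag for handoff rather than causing outright failure.
  const failed = rules.filter((r) => !evaluateRule(r, ctx) && !r.edge_case);
  const edgeCases = rules.filter((r) => !evaluateRule(r, ctx) && r.edge_case);
  return { eligible: failed.length === 0, failed, edgeCases };
}
```

Because every branch is a plain comparison, the evaluator can be tested exhaustively against boundary inputs with no LLM in the loop.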
A deterministic finite-state machine constructed from a service’s StateModelDefinition. It enforces the legal transitions for a service journey, prevents the LLM from skipping steps, and identifies terminal and receipt-emitting states.
```typescript
class StateMachine {
  constructor(definition: StateModelDefinition)
  getState(): string
  allowedTransitions(): Array<{ to: string; trigger?: string }>
  transition(trigger: string): TransitionResult
  isTerminal(): boolean
  setState(stateId: string): void
}
```
Key concepts: initial state (where every journey begins), terminal states (completed, rejected, handed-off), transition guards (condition fields that must be satisfied before a transition is allowed), and receipt-emitting states (states that trigger an immutable receipt on entry).
A typical service journey. The state model defines a linear path with a branch point at eligibility, where the journey may diverge to rejection or human handoff.
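The transition discipline can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: guards are omitted, and `Transition` is a simplified shape assumed for the sketch.

```typescript
interface Transition { from: string; to: string; trigger: string }

// Sketch: legal transitions only; an unknown trigger leaves the state unchanged.
class MiniStateMachine {
  private state: string;

  constructor(
    initial: string,
    private transitions: Transition[],
    private terminal: string[]
  ) {
    this.state = initial;
  }

  getState(): string { return this.state; }

  allowedTransitions(): Transition[] {
    return this.transitions.filter((t) => t.from === this.state);
  }

  transition(trigger: string): { ok: boolean; state: string } {
    const match = this.allowedTransitions().find((t) => t.trigger === trigger);
    if (!match) return { ok: false, state: this.state }; // rejected, no change
    this.state = match.to;
    return { ok: true, state: this.state };
  }

  isTerminal(): boolean { return this.terminal.includes(this.state); }
}
```

The key property is that an illegal trigger is rejected without side effects, which is what prevents the LLM from skipping steps.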
Manages the citizen’s consent decisions for a service. Each service defines a ConsentModel with required and optional grants. The manager tracks which grants have been presented, accepted, or declined, and gates progress accordingly.
```typescript
class ConsentManager {
  getRequiredGrants(): ConsentGrant[]
  getOptionalGrants(): ConsentGrant[]
  recordDecision(
    grantId: string,
    granted: boolean
  ): ConsentDecision
  allRequiredGranted(): boolean
}
```
Consent is never assumed or inferred. The platform renders consent cards (deterministic task injection at pipeline step 11) and records the citizen’s explicit decision. If a required grant is declined, the journey cannot proceed past the consent state.
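The gating logic can be sketched as follows. This is a simplified illustration with an assumed local `Grant` shape, not the ConsentManager implementation; note that an unanswered required grant blocks progress just as a declined one does, since consent is never assumed.

```typescript
interface Grant { id: string; scope: string; purpose: string; mandatory: boolean }

// Sketch: a declined (or unanswered) required grant blocks journey progress.
class MiniConsentManager {
  private decisions = new Map<string, boolean>();

  constructor(private grants: Grant[]) {}

  recordDecision(grantId: string, granted: boolean): void {
    this.decisions.set(grantId, granted);
  }

  allRequiredGranted(): boolean {
    return this.grants
      .filter((g) => g.mandatory)
      .every((g) => this.decisions.get(g.id) === true);
  }
}
```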
Tracks the data fields required by a service, pre-fills from persona data where available, and records new fields extracted from conversation. The collector tells the prompt layer what is still missing, enabling the LLM to ask targeted questions.
```typescript
class FieldCollector {
  seed(personaData: Record<string, unknown>): void
  recordField(
    key: string,
    value: unknown,
    source: string
  ): void
  getMissing(): string[]
  isComplete(): boolean
}
```
Loads and caches structured artefacts from the service registry. Each service’s artefact set is loaded once and held in memory for the duration of the session.
```typescript
class ArtefactStore {
  loadFromRegistry(serviceId: string): Promise<number>
  fetchArtefacts(serviceId: string): Promise<ServiceArtefacts>
  getServiceArtefacts(serviceId: string): ServiceArtefacts | null
}
```
```typescript
interface ServiceArtefacts {
  manifest: CapabilityManifest;
  policy?: PolicyRuleset;
  consent?: ConsentModel;
  stateModel?: StateModelDefinition;
  stateInstructions?: StateInstructions;
}
```
The five artefact types form a complete service description. The manifest declares identity and input schema. The policy defines eligibility rules. The state model defines the journey. The consent model defines data sharing permissions. The state instructions provide per-state behavioural guidance for the agent.
The PolicyEvaluator supports seven operators for rule conditions. Each operator has strict type semantics — the evaluator does not perform type coercion.
| Operator | Semantics | Example |
|---|---|---|
| >= | Field value is greater than or equal to the specified value. Numeric comparison only. | age >= 16 |
| <= | Field value is less than or equal to the specified value. Numeric comparison only. | age <= 70 |
| == | Field value is strictly equal to the specified value. String or numeric. | jurisdiction == "England" |
| != | Field value is not equal to the specified value. | status != "disqualified" |
| exists | Field is present and non-null. Value parameter is ignored. | driving_licence_number exists |
| not-exists | Field is absent or null. Value parameter is ignored. | ban_end_date not-exists |
| in | Field value is one of the specified array values. | licence_type in ["full", "provisional"] |
Edge cases are distinct from failures. A rule with edge_case: true does not cause outright ineligibility. Instead, it flags the citizen for potential handoff to a human adviser. This handles ambiguous situations — a citizen who might be eligible but needs human judgement. The agent communicates uncertainty rather than making a binary decision.
Transitions can carry guard conditions that must be satisfied before the transition is allowed. Guards prevent premature advancement — a citizen cannot reach “payment-made” without first passing through “details-confirmed”.
```json
{
  "from": "eligibility-checked",
  "to": "consent-given",
  "trigger": "grant_consent",
  "guard": {
    "condition": "policy_result.eligible == true",
    "message": "Cannot proceed: citizen is not eligible for this service."
  }
}
```
The state machine evaluates guards synchronously. If a guard fails, the transition is rejected and the machine remains at its current state. The guard’s message is available to the agent for explaining why the journey cannot proceed.
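Guard evaluation itself is a small, deterministic step. The sketch below assumes guards are limited to simple `field == value` comparisons resolved against the pipeline context; the real guard grammar may be richer.

```typescript
interface Guard { condition: string; message: string }

// Sketch: resolves a dotted field path and checks a simple equality guard.
// Unparseable guards fail closed, so a malformed condition cannot let a
// transition through.
function guardPasses(guard: Guard, ctx: Record<string, unknown>): boolean {
  const match = guard.condition.match(/^([\w.]+)\s*==\s*(.+)$/);
  if (!match) return false; // fail closed
  const [, path, raw] = match;
  const actual = path.split(".").reduce<unknown>(
    (obj, key) => (obj as Record<string, unknown> | undefined)?.[key],
    ctx
  );
  let expected: unknown;
  try { expected = JSON.parse(raw); } catch { expected = raw; }
  return actual === expected;
}
```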
Every significant action produces an immutable trace event. The evidence plane enables audit, replay, and accountability across all citizen interactions.
The primary interface for writing events into the evidence store. Supports span-based tracing: a span groups related events (e.g., all events within a single pipeline run) under a common span ID.
```typescript
class TraceEmitter {
  startSpan(opts: SpanOptions): SpanContext
  emit(
    type: TraceEventType,
    span: SpanContext,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
  endSpan(
    span: SpanContext,
    type: TraceEventType,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
}
```
```typescript
interface TraceEvent {
  id: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  timestamp: string;
  type: TraceEventType;
  payload: Record<string, unknown>;
  metadata: {
    userId?: string;
    sessionId: string;
    capabilityId?: string;
  };
}
```
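Span-based grouping can be illustrated with a stripped-down emitter. This sketch uses an in-memory array and a simplified event shape; the real emitter is asynchronous and writes to the evidence store.

```typescript
import { randomUUID } from "node:crypto";

interface MiniEvent {
  id: string;
  traceId: string;
  spanId: string;
  type: string;
  timestamp: string;
  payload: Record<string, unknown>;
}

// Sketch: groups related events under a common span ID, appending to an
// in-memory store. Append-only: there is no update or delete path.
class MiniTraceEmitter {
  readonly events: MiniEvent[] = [];

  startSpan(traceId: string): { traceId: string; spanId: string } {
    return { traceId, spanId: randomUUID() };
  }

  emit(
    type: string,
    span: { traceId: string; spanId: string },
    payload: Record<string, unknown>
  ): MiniEvent {
    const event: MiniEvent = {
      id: randomUUID(),
      traceId: span.traceId,
      spanId: span.spanId,
      type,
      timestamp: new Date().toISOString(),
      payload,
    };
    this.events.push(event);
    return event;
  }
}
```

All events emitted within one pipeline run share a span ID, which is what lets the replay engine reconstruct a run as a unit.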
Every event is classified by type. Events marked as ledger-updating are written to the case ledger in addition to the trace store, making them available for case management and replay.
| Type | Description | Updates Ledger |
|---|---|---|
| state.transition | State machine moved to a new state | Yes |
| consent.granted | Citizen approved data sharing | Yes |
| consent.denied | Citizen declined data sharing | Yes |
| policy.evaluated | Eligibility rules checked | Yes |
| handoff.initiated | Escalation to human adviser triggered | Yes |
| capability.invoked | Department service called | Yes |
| llm.request | Language model call initiated | Yes |
| llm.response | Language model response received | Yes |
| credential.presented | Verifiable proof submitted | Yes |
| receipt.issued | Outcome documented with immutable receipt | Yes |
| field.extracted | Fact parsed from conversation | No |
| error.occurred | System error logged | No |
Generates immutable receipts from trace events. Receipts are the citizen-facing evidence that an action was taken. They include a reference to the service, the action performed, the outcome, and which data was shared.
```typescript
class ReceiptGenerator {
  generate(event: TraceEvent): Receipt
}

interface Receipt {
  id: string;
  capabilityId: string;
  action: string;
  outcome: string;
  timestamp: string;
  dataShared?: string[];
}
```
The case store maintains a ledger of in-flight and completed cases. It is updated from trace events that carry ledger-updating types. Each case aggregates the full history of a citizen’s interaction with a specific service, including state transitions, consent decisions, and receipts.
```typescript
class CaseStore {
  getCase(caseId: string): CaseRecord | null
  listByUser(userId: string): CaseRecord[]
  listByService(serviceId: string): CaseRecord[]
  updateFromEvent(event: TraceEvent): void
}

interface CaseRecord {
  id: string;
  userId: string;
  serviceId: string;
  currentState: string;
  stateHistory: string[];
  consentDecisions: ConsentDecision[];
  receipts: Receipt[];
  createdAt: string;
  updatedAt: string;
  status: "active" | "completed" | "handed-off" | "rejected";
}
```
The case store is a derived view — it is built entirely from trace events. If the case store is lost, it can be reconstructed by replaying events from the evidence store. This makes the evidence store the single source of truth.
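Because the case store is derived, rebuilding it is a fold over the event stream. A minimal sketch, assuming only `state.transition` events with a `to` field in the payload and a default initial state of "started":

```typescript
interface Ev { type: string; payload: Record<string, unknown> }

// Sketch: replay ledger-updating events in order to rebuild the current
// state and state history for one case.
function rebuildCase(events: Ev[]): { currentState: string; stateHistory: string[] } {
  const history: string[] = [];
  for (const e of events) {
    if (e.type === "state.transition" && typeof e.payload.to === "string") {
      history.push(e.payload.to);
    }
  }
  return {
    currentState: history[history.length - 1] ?? "started",
    stateHistory: history,
  };
}
```

The same fold, extended to consent and receipt events, is all that is needed to reconstruct a full CaseRecord from scratch.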
The replay engine reconstructs full conversation state from trace events. Given a trace ID, it steps through events in chronological order, rebuilding state machine position, consent decisions, collected fields, and the complete message history.
```typescript
class ReplayEngine {
  loadTrace(traceId: string): Promise<ReplaySession>
  stepForward(session: ReplaySession): ReplayFrame
  stepBackward(session: ReplaySession): ReplayFrame
  jumpToEvent(session: ReplaySession, eventId: string): ReplayFrame
}

interface ReplayFrame {
  eventIndex: number;
  totalEvents: number;
  currentState: string;
  collectedFields: Record<string, unknown>;
  consentState: Record<string, boolean>;
  event: TraceEvent;
}
```
Replay supports three use cases: (1) Audit — regulators or complaints teams step through a citizen’s journey to verify that rules were followed. (2) Debugging — engineers identify where a journey diverged from expected behaviour. (3) Dispute resolution — when a citizen contests an outcome, the replay provides incontrovertible evidence of what happened and why.
A typical pipeline run produces the following sequence of trace events.
A unified data model for citizen information with field-level provenance, tiered trust levels, and delegation-aware access control.
Every data field carries a trust tier. The tier determines how the field can be used, whether it requires consent to share, and whether the citizen can edit it.
Immutable, government-sourced data. Confirmed by a department system of record. The citizen cannot edit these fields — only the source department can update them. Examples: National Insurance number, driving licence number, passport details.
Citizen-entered data that has not been verified against a department source. The citizen can edit these fields at any time. Examples: contact preferences, correspondence address, accessibility needs.
Agent-derived data, flagged as such. Extracted from conversation context or computed from other fields. Always shown with an “inferred” label so the citizen knows its provenance. Examples: estimated income bracket, likely household composition.
Every field in the citizen data model carries source metadata. This enables the platform to show exactly where each piece of data came from and which consent grant authorised its use.
```typescript
interface FieldSource {
  source: string; // "HMRC", "DVLA", "Home Office"
  tier: "verified" | "submitted" | "inferred";
  topic: string; // "identity", "finance", "employment"
}
```
Manages delegation relationships: parents acting for children, power of attorney holders acting for vulnerable adults, executors managing a deceased person’s affairs. The store enforces scoped permissions and logs every delegated action for audit.
```typescript
class RelationshipStore {
  canActOnBehalf(
    actorId: string,
    subjectId: string,
    scope: PermissionScope
  ): boolean
  getPermissions(
    actorId: string,
    subjectId: string
  ): PermissionScope[]
  getSubjects(
    actorId: string
  ): Array<{ userId: string; type: RelationshipType }>
  logDelegatedAction(
    action: DelegatedAction
  ): void
}
```
| Relationship | Default Permissions |
|---|---|
| parent_of | full_authority |
| guardian_of | full_authority |
| executor_of | view_data, submit_on_behalf, make_payments, correspond |
| attorney_of | view_data, submit_on_behalf, manage_consent, make_payments, correspond |
| carer_of | view_status, receive_alerts |
| spouse | view_status, receive_alerts |
| representative_of | view_status, view_data, correspond |
Delegation is always scoped. A power of attorney holder can submit forms and manage consent, but a carer can only view status and receive alerts. The RelationshipStore enforces these boundaries at the API level — the agent cannot bypass them through conversational persuasion.
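The scope check can be sketched directly from the permissions table. This is an illustration, not the RelationshipStore implementation: it maps relationship type to default scopes (a subset of the table above) and treats full_authority as implying every scope.

```typescript
type Scope = string;

// Default permission scopes per relationship type, taken from the table above
// (subset shown for illustration).
const DEFAULT_PERMISSIONS: Record<string, Scope[]> = {
  parent_of: ["full_authority"],
  attorney_of: ["view_data", "submit_on_behalf", "manage_consent", "make_payments", "correspond"],
  carer_of: ["view_status", "receive_alerts"],
};

// Sketch: scoped delegation check; full_authority implies all scopes.
function canActOnBehalf(relationship: string, scope: Scope): boolean {
  const perms = DEFAULT_PERMISSIONS[relationship] ?? [];
  return perms.includes("full_authority") || perms.includes(scope);
}
```

Because the check runs at the API level, a conversational request ("my carer said I could let them manage consent") can never widen the granted scopes.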
The citizen data model aggregates fields from multiple departments into a single profile. Each field carries its source metadata, enabling the platform to track provenance across department boundaries.
```json
{
  "identity": {
    "full_name": { "value": "Margaret Chen", "tier": "verified", "source": "HMRC" },
    "date_of_birth": { "value": "1958-03-14", "tier": "verified", "source": "HMRC" },
    "ni_number": { "value": "QQ123456C", "tier": "verified", "source": "HMRC" }
  },
  "contact": {
    "address": { "value": "12 Oak Lane, Bristol", "tier": "submitted", "source": "citizen" },
    "email": { "value": "m.chen@email.com", "tier": "submitted", "source": "citizen" }
  },
  "finance": {
    "annual_income": { "value": 28400, "tier": "verified", "source": "HMRC" },
    "estimated_savings": { "value": "low", "tier": "inferred", "source": "agent" }
  }
}
```
The profile is organised by topic: identity, contact, finance, employment, health, transport, housing. Each topic maps to one or more source departments. When a service requests a field, the platform checks the consent model to determine whether the citizen has granted access to that topic from that source.
When a citizen consents to share data across departments, the platform mediates the exchange. The flow is always explicit: the citizen sees which fields will be shared, which department will receive them, and for what purpose.
The system prompt is not a monolith. It is assembled from layered fragments, each serving a specific purpose. Static layers are cached; dynamic layers change every turn.
In journey mode (service in progress), the system prompt is built from thirteen layers. Each layer adds context that shapes the agent’s behaviour for the current turn.
1. Agent personality: conservative / expansive discovery
2. Persona communication: Plain English / formal / assisted
3. Scenario context: service domain and background
4. Persona data: verified fields from citizen profile
5. Policy evaluation: eligibility results
6. Fact extraction: instructions for parsing citizen input
7. State model context: current state + allowed transitions
8. State instructions: per-state DO/DO NOT rules
9. Field collector status: required / collected / missing
10. Consent requirements: grants needed for this state
11. Accuracy guardrails: factual grounding rules
12. Task format: card type specifications
13. Structured output: JSON block schema
Each layer occupies a portion of the token budget. The layers are ordered by stability: static layers first, dynamic layers last.
Triage mode is lighter, omitting layers 4, 5, 7, 8, 9, and 10. The total token budget varies by model and configuration.
The system prompt is split into a static prefix and a dynamic suffix. The static prefix (agent personality, persona style, scenario context, guardrails) is marked with a cache control directive. The dynamic suffix (state context, field collector, extracted facts) changes every turn.
Cache hit rate: The static prefix is tagged with a cache control directive supported by the LLM provider. When the citizen remains within the same service journey, the prefix is reused from cache. This yields significant token savings on repeated turns within a journey. The savings compound across multi-turn conversations, which typically run 8–12 turns.
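The prefix/suffix split can be sketched as prompt assembly. The `cache_control` shape below is an assumption modelled on Anthropic-style prompt caching; the exact directive depends on the provider in use.

```typescript
interface PromptBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" }; // assumed Anthropic-style marker
}

// Sketch: static layers join into one cacheable prefix block; dynamic layers
// form a suffix block that is rebuilt every turn.
function buildSystemPrompt(
  staticLayers: string[],
  dynamicLayers: string[]
): PromptBlock[] {
  return [
    {
      type: "text",
      text: staticLayers.join("\n\n"),
      cache_control: { type: "ephemeral" }, // reused from cache within a journey
    },
    {
      type: "text",
      text: dynamicLayers.join("\n\n"), // never cached
    },
  ];
}
```

Keeping the split at a block boundary matters: if dynamic content leaked into the prefix, the cache key would change every turn and the savings would vanish.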
The language model must produce a JSON block at the end of every response. The orchestrator parses this block at pipeline step 8. The schema enforces a consistent contract between the LLM’s output and the deterministic validation that follows.
```json
{
  "title": "Renew your driving licence",
  "tasks": [
    {
      "type": "form",
      "title": "Confirm your details",
      "fields": ["full_name", "date_of_birth", "address"]
    }
  ],
  "transition": "details-confirmed",
  "facts": [
    { "key": "preferred_contact", "value": "email" }
  ]
}
```
The transition field is validated against the state machine (step 10). If the LLM proposes an illegal transition, it is silently dropped and the state machine remains at its current position. The tasks array may be overridden entirely by deterministic task injection (step 11) at certain states.
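The parse-then-validate step can be sketched as follows. The trailing-JSON extraction and the return shape are illustrative assumptions; the point is that an out-of-policy transition is dropped to `null` rather than applied.

```typescript
// Sketch: extract the trailing JSON block from the LLM response and validate
// the proposed transition against the allowed set. Illegal or unparseable
// proposals are silently dropped, as step 10 requires.
function parseStructuredOutput(
  response: string,
  allowedTransitions: string[]
): { transition: string | null; tasks: unknown[] } {
  const match = response.match(/\{[\s\S]*\}\s*$/); // trailing JSON block
  if (!match) return { transition: null, tasks: [] };
  let parsed: { transition?: string; tasks?: unknown[] };
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return { transition: null, tasks: [] }; // malformed block: drop everything
  }
  const transition =
    parsed.transition && allowedTransitions.includes(parsed.transition)
      ? parsed.transition
      : null;
  return { transition, tasks: parsed.tasks ?? [] };
}
```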
The agent operates in two modes. Mode selection happens at pipeline step 4 and determines which prompt layers are included and which tools are available.
No service context. The agent identifies the citizen’s need and proposes one or more services. Prompt layers 4, 5, 7, 8, 9, and 10 are omitted. No state machine, no field collector, no consent. The agent works from the full service catalogue to find the best match.
Service selected and artefacts loaded. All 13 prompt layers are active. The state machine tracks progress, the field collector tracks data, and the consent manager gates access. The agent operates within the boundaries defined by the service’s artefacts.
The agent personality layer (layer 1) defines the agent’s behavioural profile. Two personalities are currently defined, each tuned for different citizen needs.
| Personality | Behaviour | Best For |
|---|---|---|
| Conservative discovery | Cautious, step-by-step, asks for explicit confirmation before advancing. Never skips questions, even when data is already known. Prioritises citizen understanding over speed. | Vulnerable users, complex services, high-stakes decisions (benefits, legal) |
| Expansive discovery | Proactive, efficient, pre-fills forms and auto-submits where possible. Explains what it has done rather than asking permission first. Prioritises speed while maintaining accuracy. | Routine renewals, confident users, repeat interactions |
Personality does not affect deterministic behaviour. Both conservative and expansive discovery modes follow the same state machine, the same policy rules, and the same consent requirements. The personality only affects how the agent communicates — whether it asks before acting or acts before explaining. The platform’s guarantees hold regardless of personality.
Two strategies for connecting services to the runtime, a tool generation pipeline, and a three-tier service store with gap analysis.
JsonServiceStrategy (deterministic)
No tools are given to the LLM. Service context is built entirely from structured artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the orchestrator’s state machine.
Best for: Services with complete, hand-crafted artefact sets. Maximum determinism, minimum LLM autonomy.
McpServiceStrategy (tool-based)
The LLM receives check_eligibility and advance_state tools via the adapter. It calls these during the agentic loop (step 7). Tool results are parsed for state transitions and fed back into the validation pipeline.
Best for: Service-graph services that expose department APIs. The LLM has more autonomy but is still bounded by the state machine.
The server generates a standard set of integration points per service from its structured artefacts. This ensures a consistent interface regardless of the underlying department system.
Services come from three sources, with descending levels of artefact completeness. The store applies precedence rules: a fully described service always overrides a graph entry, which overrides a catalogue entry.
Hand-crafted services with complete artefact sets: manifest, policy, state model, consent, and state instructions. Highest precedence. These are the services that work end-to-end through the agent.
Service-graph nodes with metadata, department attribution, and partial artefacts. Sufficient for triage and eligibility checking, but not for full journey orchestration.
The current GOV.UK service register contains approximately 1,500 services with names and department attribution. Used for service discovery during triage. No artefacts — the agent can identify the service but cannot orchestrate a journey.
The Legibility Studio computes per-department coverage by comparing full (Tier 1) services against the total catalogue. This surfaces exactly which services need artefact authoring and in what priority order.
Coverage formula: For each department, gap analysis computes full_count / total_count. A department with 3 fully described services out of 47 total has 6.4% coverage. The studio ranks departments by total service volume and highlights the highest-impact gaps — the services that appear in the most citizen life events.
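The formula is small enough to state directly. A sketch, rounding to one decimal place as in the 6.4% example:

```typescript
// Sketch of the gap-analysis coverage formula: full (Tier 1) services
// divided by total catalogue entries for a department, as a percentage.
function coveragePercent(fullCount: number, totalCount: number): number {
  if (totalCount === 0) return 0; // empty catalogue: define coverage as 0
  return Math.round((fullCount / totalCount) * 1000) / 10; // one decimal place
}
```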
Integration teams should start with high-volume, cross-departmental services: bereavement notifications, child benefit claims, driving licence renewals. These appear in multiple life events and benefit the most citizens when made fully legible to agents.
Departments author artefacts through the Legibility Studio or by editing structured artefacts directly. The following sequence represents the recommended authoring order.
The capability manifest is the entry point for any service. Here is the structure for a typical renewal service.
```json
{
  "id": "dvla-renew-driving-licence",
  "name": "Renew a driving licence",
  "department": "Driver and Vehicle Licensing Agency",
  "description": "Renew a photocard driving licence that is expiring or has expired.",
  "version": "1.0.0",
  "input_schema": {
    "required": [
      "full_name",
      "date_of_birth",
      "driving_licence_number",
      "national_insurance_number",
      "address",
      "photo"
    ],
    "optional": ["email", "phone"]
  },
  "output_schema": {
    "produces": ["application_reference", "expected_delivery_date"]
  },
  "estimated_duration": "10 minutes",
  "fee": { "amount": 14.00, "currency": "GBP" }
}
```
The input schema drives the field collector. When the orchestrator loads a service, it seeds the FieldCollector from the manifest’s input_schema.required array. Fields already present in the citizen’s profile are marked as collected; the remainder become the “missing fields” list that the agent uses to ask targeted questions.
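The seeding step reduces to a set difference. A sketch, using the required fields from the DVLA manifest example; the helper name is illustrative, not the FieldCollector API.

```typescript
// Sketch: given the manifest's required fields and the citizen's profile,
// compute the "missing fields" list that drives targeted questions.
function missingFields(
  required: string[],
  profile: Record<string, unknown>
): string[] {
  return required.filter((f) => profile[f] === undefined || profile[f] === null);
}
```

A citizen whose profile already holds name and date of birth would only be asked for the remaining fields, never for data the platform already has.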
Four assurance layers ensure that artefacts are correct, the pipeline behaves deterministically, model changes are governed, and compliance requirements are met continuously.
Before a department’s artefacts enter the service registry, they pass through automated validation and simulation.
The deterministic pipeline is tested independently of any LLM provider. Because steps 1–6 and 8–14 are pure computation, they can be verified exhaustively against known inputs and expected outputs.
| Assurance Area | Method | Coverage |
|---|---|---|
| Policy evaluation | Deterministic tests against all seven operators with boundary conditions. Each operator is tested with valid, invalid, missing, and edge-case inputs. | All rule types and operators |
| State machine transitions | Every defined transition, guard, and terminal state is exercised. Illegal transitions are verified to be rejected. Auto-transitions and forced transitions are validated. | Complete state graph per service |
| Consent enforcement | Verification that required grants block progress when declined, optional grants do not, and all decisions are recorded in the evidence store. | All consent models |
| Evidence integrity | Trace events are verified for completeness (every pipeline run produces a span), ordering (events are chronological), and immutability (events cannot be modified after creation). | All event types |
When the underlying language model is changed or updated, the platform runs a structured evaluation process before the new model enters production.
Automated checks run continuously to verify that the platform meets its policy and consent obligations.
Four continuous compliance checks:
The platform handles sensitive citizen data across department boundaries. Security, data residency, and governance are architectural concerns, not afterthoughts.
Citizen identity is established through integration with GOV.UK One Login, the cross-government identity platform. The agent does not manage credentials directly — authentication is delegated to One Login, and the platform receives a verified identity token. Session management uses standard secure token patterns with appropriate expiry and refresh mechanisms.
All citizen data is processed and stored within UK jurisdiction. The platform infrastructure is hosted in UK data centres. No citizen personally identifiable information (PII) leaves UK borders during processing, storage, or transit. Department data shared through the platform remains subject to each department’s existing data governance framework.
In transit: all communications between the citizen application, the platform services, and department APIs use TLS 1.2 or higher. At rest: the evidence store, citizen data profiles, and consent records are encrypted using AES-256. Encryption keys are managed through a dedicated key management service with automatic rotation.
Citizen PII sent to the language model is processed in-session only. The platform’s contract with LLM providers stipulates that citizen data is not used for model training, is not logged beyond the session, and is not accessible to the provider’s staff. The LLM Adapter Layer enforces this boundary — it controls exactly which citizen data fields are included in prompts, and strips sensitive fields (financial details, health records) unless explicitly required by the current service and covered by a consent grant.
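The adapter-layer boundary described above can be sketched as a pure filtering step applied before prompt construction. Everything here is an assumption for illustration: the sensitive-field set, function name, and parameters are not the platform's real implementation.

```python
# Illustrative sensitive-field set; the real list is defined by the platform.
SENSITIVE_FIELDS = {"financial_details", "health_records"}

def fields_for_prompt(profile: dict, service_required: set, consented: set) -> dict:
    """Select the citizen data fields safe to include in an LLM prompt.

    A sensitive field passes only if the current service requires it AND
    a consent grant covers it; all other sensitive fields are stripped.
    """
    allowed = {}
    for name, value in profile.items():
        if name in SENSITIVE_FIELDS:
            if name in service_required and name in consented:
                allowed[name] = value
        else:
            allowed[name] = value
    return allowed
```

Putting the filter in front of prompt construction, rather than trusting the prompt itself, is what makes the boundary enforceable in code rather than in instructions to the model.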
The Consent Manager is a hard gate, not advisory. When a service requires data that the citizen has not consented to share, the pipeline halts. The LLM cannot override consent decisions through conversational persuasion or prompt injection. Consent decisions are recorded as immutable trace events and are available for audit at any time. Consent can be withdrawn, at which point the platform ceases to share the affected data fields with the relevant service.
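A minimal sketch of the hard gate, under assumed names (`ConsentRequired`, `enforce_consent`, and the trace-event shape are all hypothetical): every decision is recorded first, and a missing or declined required grant halts the pipeline by raising, regardless of anything the LLM proposed.

```python
class ConsentRequired(Exception):
    """Raised when a required grant is missing or declined; the pipeline halts."""

def enforce_consent(required_grants: list, decisions: dict, trace: list) -> None:
    for grant in required_grants:
        status = decisions.get(grant, "not_asked")
        # Record the decision as an immutable trace event before gating.
        trace.append({"event": "consent_decision", "grant": grant, "status": status})
        if status != "granted":
            raise ConsentRequired(grant)  # hard stop; no conversational override
```

Because the gate is ordinary control flow in deterministic code, no prompt content can route around it: the LLM never sees a code path in which an unconsented field is shared.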
Every significant action is captured in the evidence store as an immutable trace event. The evidence store is append-only — events cannot be modified or deleted after creation. This provides a complete, tamper-evident record of every policy evaluation, consent decision, state transition, LLM interaction, and receipt issued. The replay engine can reconstruct the exact state of any citizen interaction at any point in time.
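One common way to make an append-only log tamper-evident is to hash-chain its events, so that modifying any past event invalidates every later hash. The document specifies append-only immutability but not this mechanism, so the sketch below is illustrative rather than a description of the real store.

```python
import hashlib
import json

class EvidenceStore:
    """Append-only, tamper-evident event log (hash-chain sketch, not the real store)."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> str:
        # Each record's hash covers the previous hash, chaining the log.
        prev = self._events[-1]["hash"] if self._events else "genesis"
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._events.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain; any mutated or reordered event breaks it.
        prev = "genesis"
        for rec in self._events:
            payload = json.dumps(rec["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```

Replay then reduces to re-applying the verified event sequence in order, which is what lets the engine reconstruct conversation state at any point.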
The architecture’s separation of deterministic and AI layers provides structural protection against prompt injection. Even if a malicious input manipulates the LLM’s response, the deterministic pipeline (steps 8–14) validates all outputs: state transitions must be legal, consent must be granted, policy rules must pass. The LLM proposes; the platform disposes.
The RelationshipStore enforces scoped permissions for delegated access. A power of attorney holder, parent, or carer can act on behalf of another citizen only within their declared permission scope. Every delegated action is logged separately in the evidence store, recording both the actor and the subject. Permission boundaries are enforced at the API level and cannot be circumvented through the conversational interface.
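The scope check and dual-identity logging can be sketched as follows; the relationship record shape and function name are assumptions for illustration, not the RelationshipStore's actual interface.

```python
def check_delegated_action(relationship: dict, action: str, trace: list) -> None:
    """Allow a delegated action only within the declared permission scope.

    Every permitted action is logged with both the actor (the delegate)
    and the subject (the citizen acted for). Illustrative sketch only.
    """
    if action not in relationship["scope"]:
        raise PermissionError(f"{action!r} is outside the declared permission scope")
    trace.append({
        "event": "delegated_action",
        "actor": relationship["actor"],
        "subject": relationship["subject"],
        "action": action,
    })
```

Enforcing this at the API level means the check runs on every call path, so a conversational request that exceeds the scope fails in exactly the same way as a direct API call would.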
The platform is designed to degrade gracefully. Every failure mode has a defined handling strategy that preserves citizen trust and data integrity.
The platform always has a safe fallback. No failure mode results in a citizen being left without guidance, losing their progress, or having data silently dropped. The worst case is a degraded experience with human support — never a broken one.
The architecture enforces a single principle: the language model generates conversation; deterministic code enforces rules. Every integration point, every package boundary, and every type signature exists to maintain that separation.