Technical Reference

Architecture & Integration Guide

The runtime pipeline, package architecture, type system, and integration points for the Agentic Legibility Stack.

This document describes the target operating model for government service delivery through AI agents. For concept definitions see the Glossary; for product design patterns see the Product & Service Design Guide.

1. System Architecture

The stack is organised as a set of focused packages with strict dependency boundaries. Every package owns one concern; cross-cutting behaviour is mediated through well-typed interfaces.

Package Dependency Graph

The dependency tree is rooted at the Shared Schema Layer, which has zero internal dependencies. All other components declare their internal dependencies explicitly. External SDK usage is confined to two components.

Shared Schema Layer (leaf — no internal dependencies)
LLM Adapter Layer
  + LLM provider SDK (external)
  + MCP client (Model Context Protocol, external)
Evidence Plane
Identity Layer
Service Graph
Legibility Engine
  + Service Store, Evidence Plane
Personal Data Layer
  + Identity Layer, Evidence Plane
Service Store
  + Service Graph, Evidence Plane
Orchestration Layer
  + Legibility Engine
MCP Server
  + Legibility Engine
  + MCP server (Model Context Protocol, external)

Architectural Rules

Three hard rules that govern the architecture:

  1. LLM integration is confined to the LLM Adapter Layer ONLY. Zero direct LLM SDK usage elsewhere in the system. All LLM calls route through the adapter interface.
  2. Model Context Protocol (MCP) usage is split: the LLM Adapter Layer owns the MCP client, and the MCP Server owns the server side. No other component touches MCP directly.
  3. The Legibility Studio fetches evidence via HTTP. It does not import the Evidence Plane directly. This enforces a clean separation between the department-facing and citizen-facing surfaces.

Two Applications

Citizen App

The citizen-facing experience. Hosts the conversational agent, task cards, consent flows, and receipt history. Exposes an evidence API that the Legibility Studio reads from.

Legibility Studio

The department-facing admin tool. Displays service coverage, gap analysis, trace inspection, and artefact authoring. Consumes evidence over HTTP — never imports the Evidence Plane directly.

Data Layer

Structured artefacts define each service: a capability manifest, policy ruleset, state model, consent model, and state-specific instructions. These artefacts are the canonical description of what a service does, who can use it, and how it progresses.

The evidence store is append-only. Every significant event — state transitions, consent decisions, policy evaluations, credential presentations — is captured as an immutable trace event. The store supports full replay: given a trace ID, the system can reconstruct the exact conversation state at any point.

Personal data is stored with field-level source attribution. Every value carries metadata indicating which department sourced it, whether it is verified or citizen-submitted, and which consent grant authorised its use.

Artefact Structure

Each service in the store is described by up to five artefact files. Together, these form the complete machine-readable description of a government service.

manifest (required)
  Service identity, description, department, input schema, output schema, and lifecycle metadata. The canonical reference for what the service does.
policy (required)
  Eligibility rules expressed as typed conditions. Each rule declares a field, operator, expected value, and human-readable failure reason.
state-model (required)
  The finite-state machine defining the journey. States, transitions, guards, terminal markers, and receipt-emitting flags.
consent (optional)
  Required and optional consent grants. Each grant declares scope (which data), purpose (why), and whether it is mandatory.
state-instructions (optional)
  Per-state behavioural rules for the agent. DO/DO NOT lists that constrain the LLM’s responses at each point in the journey.

Artefact authoring is the integration point for departments. A department does not need to build an API or write code. It publishes structured artefacts describing its services, and the platform handles orchestration, consent, evidence, and agent behaviour. The Legibility Studio provides a guided authoring experience for department teams.

Package Responsibility Summary

Each component in the system owns a single, well-defined concern. This table summarises what each component does and what it must never do.

Shared Schema Layer
  Owns: shared types, schema validation, type guards
  Must never: import any other component
LLM Adapter Layer
  Owns: LLM calls, MCP client, tool dispatch
  Must never: make policy or state decisions
Evidence Plane
  Owns: trace events, receipts, case ledger, replay
  Must never: import LLM integration or make API calls
Identity Layer
  Owns: user profiles, authentication, session management
  Must never: access the evidence store directly
Legibility Engine
  Owns: PolicyEvaluator, StateMachine, ConsentManager, FieldCollector, ArtefactStore
  Must never: call the LLM or make network requests
MCP Server
  Owns: MCP server, tool generation, resource exposure
  Must never: use the MCP client (that belongs to the adapter layer)
Personal Data Layer
  Owns: citizen data model, field sourcing, relationship store
  Must never: bypass consent checks
Orchestration Layer
  Owns: orchestrator, pipeline, strategy dispatch
  Must never: import LLM integration directly (must go through the adapter layer)
Service Graph
  Owns: GOV.UK service graph integration, node resolution
  Must never: store state or evidence
Service Store
  Owns: service storage, tiered resolution, gap analysis
  Must never: make eligibility decisions (that belongs to the Legibility Engine)

2. The Runtime Pipeline

The centrepiece of the architecture. Every citizen interaction passes through this 14-step orchestrator pipeline. Deterministic code runs before and after the language model; the platform always has the final word.

1. Policy evaluation (deterministic)
   Evaluates PolicyRuleset against persona data via PolicyEvaluator. Returns eligible/ineligible with reasons. Rules are pure data — no code execution.
2. State machine setup (deterministic)
   Creates StateMachine from StateModelDefinition. Restores current state from client session. Computes allowed transitions.
3. Field collector (deterministic)
   Seeds FieldCollector from the manifest input schema and persona data. Tracks required versus collected fields, and identifies what is still missing.
4. Agent selection (deterministic)
   Selects triage mode (no service context, need identification) or journey mode (service in progress, artefact-driven). The choice determines prompt composition and available tools.
5. Strategy context and system prompt (deterministic)
   Calls ServiceStrategy.buildServiceContext(). Assembles the system prompt from layered fragments: personality, persona style, scenario, artefacts, guardrails, and output format.
6. Build tools (deterministic)
   Calls ServiceStrategy.buildTools(). The deterministic strategy returns an empty array; the tool-based strategy returns service-specific check and advance tools.
7. Agentic loop (AI)
   Up to 5 iterations: call the language model, dispatch any tool calls via the adapter, accumulate results. This is the only step where the LLM generates content. The loop terminates when no tool calls remain or the iteration limit is reached.
8. Parse structured output (deterministic)
   Extracts the JSON block from the LLM response. Parses conversation title, tasks, proposed state transitions, and extracted facts. Malformed output is caught and logged.
9. Build tasks (deterministic)
   Transforms parsed task descriptions into typed TaskEntry objects with structured fields: type, title, description, options, metadata.
10. State transitions (deterministic)
    Validates the LLM’s proposed transition against the StateMachine. Rejects illegal transitions. Applies forced and auto transitions defined in the state model.
11. Deterministic task injection (deterministic)
    Overrides LLM-generated tasks with platform-generated cards at specific states. Consent cards, payment forms, and document upload prompts are always rendered by the platform, never by the LLM.
12. Consent requests (deterministic)
    Surfaces consent grant cards at the eligibility-checked state. Required grants must be accepted to proceed; optional grants can be declined without blocking progress.
13. Handoff detection (deterministic)
    Evaluates safeguarding triggers, edge-case conditions, and escalation rules. When a handoff is triggered, the conversation is routed to a human adviser with full context preserved.
14. Version metadata and trace (platform)
    Hashes the system prompt, stamps ruleset and state model versions, builds the full pipeline trace with per-step timing. The trace is written to the evidence store for audit and replay.

The LLM runs at step 7 out of 14. Everything before it is deterministic setup; everything after it is validation and override. The language model proposes; the platform disposes. This is the central architectural guarantee.

OrchestratorOutput

Every pipeline run produces a single typed output. This is the contract between the runtime and the citizen application.

Interface OrchestratorOutput
interface OrchestratorOutput {
  response: string;
  reasoning: string;
  toolsUsed: string[];
  conversationTitle: string | null;
  tasks: TaskEntry[];
  policyResult?: { eligible: boolean; explanation: string };
  handoff?: { triggered: boolean; reason: string };
  serviceState?: { currentState: string; stateHistory: string[] };
  consentRequests?: ConsentRequest[];
  extractedFields?: FieldExtraction[];
  outcomeHints?: Record<string, unknown>;
  serviceProposal?: {
    serviceId: string;
    serviceName: string;
    reason: string
  };
  needProposal?: { need: string; services: string[] };
  serviceCompletions?: Array<{ serviceId: string; status: string }>;
  versionMetadata: {
    promptHash: string;
    rulesetVersion: string;
    stateModelVersion: string
  };
  pipelineTrace: {
    traceId: string;
    steps: PipelineStep[];
    totalDurationMs: number
  };
}

Service Strategy Pattern

The orchestrator does not know how a service is implemented. It delegates through a pluggable ServiceStrategy interface. Two implementations exist today; new strategies can be added without modifying the pipeline.

Interface ServiceStrategy
interface ServiceStrategy {
  buildTools(ctx: ServiceStrategyContext): ToolDefinition[];
  buildServiceContext(ctx: ServiceStrategyContext): string | Promise<string>;
  dispatchToolCall(name: string, input: unknown): Promise<string>;
  extractStateTransitions(messages: unknown[]): StateTransitionResult[];
}

JsonServiceStrategy (deterministic, inline) — no tools are given to the LLM. Service context is built from artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the state machine. Used for services with complete artefact sets.

McpServiceStrategy (tool-based) — the LLM receives check_eligibility and advance_state tools. It calls these during the agentic loop. Tool results are parsed for state transitions. Used for service-graph services that expose department APIs.

Both strategies produce identical OrchestratorOutput. The citizen application cannot distinguish which strategy was used. This is by design: the contract is stable regardless of implementation.
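
The deterministic variant can be sketched as below. The context and tool types here are simplified stand-ins for the real ServiceStrategyContext and ToolDefinition, and the class name InlineJsonStrategy is illustrative, not the actual implementation.

```typescript
// Simplified stand-ins for the real shared-schema types (illustrative only).
interface ToolDefinition { name: string; description: string }
interface StateTransitionResult { to: string }
interface ServiceStrategyContext { artefactSummary: string; missingFields: string[] }

interface ServiceStrategy {
  buildTools(ctx: ServiceStrategyContext): ToolDefinition[];
  buildServiceContext(ctx: ServiceStrategyContext): string | Promise<string>;
  dispatchToolCall(name: string, input: unknown): Promise<string>;
  extractStateTransitions(messages: unknown[]): StateTransitionResult[];
}

// Sketch of the deterministic (inline) strategy: no tools, context from artefacts.
class InlineJsonStrategy implements ServiceStrategy {
  buildTools(): ToolDefinition[] {
    return []; // the LLM receives no tools in this strategy
  }
  buildServiceContext(ctx: ServiceStrategyContext): string {
    return `${ctx.artefactSummary}\nStill missing: ${ctx.missingFields.join(", ")}`;
  }
  dispatchToolCall(name: string): Promise<string> {
    // No tools were offered, so any dispatch indicates a pipeline bug.
    return Promise.reject(new Error(`unexpected tool call: ${name}`));
  }
  extractStateTransitions(): StateTransitionResult[] {
    return []; // transitions come from the structured output instead
  }
}
```

A tool-based strategy would differ only in buildTools, dispatchToolCall, and extractStateTransitions; the pipeline that consumes the strategy is unchanged.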

Agentic Loop Detail

Step 7 — the agentic loop — deserves closer examination. It is the only step where the language model runs, and its behaviour is tightly bounded.

Iteration 1: Initial LLM call
System prompt + conversation history + tools sent to the language model. The model responds with text and optionally one or more tool calls.
Tool dispatch
Each tool call is dispatched through ServiceStrategy.dispatchToolCall(). Results are appended to the message history as tool result blocks.
Iteration 2: Follow-up LLM call
If tool calls were made, the model is called again with the updated history. It may make further tool calls or produce its final response.
Iteration limit
Maximum 5 iterations. If the model is still making tool calls after 5 rounds, the loop terminates and the last text response is used. This prevents runaway tool call chains.
Auto-submit detection
If the model’s response contains a task that can be auto-submitted (pre-filled form with all fields present), the orchestrator may skip the citizen confirmation step to reduce friction.

The iteration limit is a hard ceiling, not a target. Most conversations complete in 1–2 iterations. The limit exists to bound worst-case latency and cost. If a service consistently requires 4+ iterations, it indicates the artefacts need refinement — more context in the state instructions reduces the LLM’s need for exploratory tool calls.
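
The loop's control flow can be sketched as follows. The callModel and dispatch functions stand in for the LLM adapter and ServiceStrategy.dispatchToolCall; their shapes are assumptions of this sketch.

```typescript
// Illustrative sketch of the bounded agentic loop.
interface ToolCall { name: string; input: unknown }
interface ModelTurn { text: string; toolCalls: ToolCall[] }

const MAX_ITERATIONS = 5; // hard ceiling, not a target

async function runAgenticLoop(
  callModel: (history: unknown[]) => Promise<ModelTurn>,
  dispatch: (call: ToolCall) => Promise<string>,
  history: unknown[],
): Promise<string> {
  let lastText = "";
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const turn = await callModel(history);
    lastText = turn.text;
    if (turn.toolCalls.length === 0) {
      return lastText; // the model produced its final response
    }
    for (const call of turn.toolCalls) {
      const result = await dispatch(call); // routed via the strategy
      history.push({ role: "tool", name: call.name, result });
    }
  }
  // Limit reached mid-chain: fall back to the last text response.
  return lastText;
}
```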

Pipeline Trace Structure

Every pipeline run produces a trace that records the duration and outcome of each step. This is the primary debugging and performance monitoring tool.

Interface PipelineStep
interface PipelineStep {
  name: string;
  durationMs: number;
  outcome: "success" | "skipped" | "error";
  metadata?: Record<string, unknown>;
}

In practice, steps 1–6 and 8–14 are pure computation and complete in negligible time. Step 7 (LLM call) accounts for the vast majority of total pipeline duration. The trace makes it immediately visible when a slow pipeline run is caused by LLM latency versus application logic.
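
As a small illustration of reading a trace, total duration can be attributed and the dominant step identified (the PipelineStep interface is redeclared here so the sketch is self-contained; it assumes a non-empty trace):

```typescript
interface PipelineStep {
  name: string;
  durationMs: number;
  outcome: "success" | "skipped" | "error";
  metadata?: Record<string, unknown>;
}

// Sum per-step timings and find the slowest step in a (non-empty) trace.
function summariseTrace(steps: PipelineStep[]): { totalMs: number; slowest: string } {
  const totalMs = steps.reduce((sum, s) => sum + s.durationMs, 0);
  const slowest = steps.reduce((a, b) => (b.durationMs > a.durationMs ? b : a));
  return { totalMs, slowest: slowest.name };
}
```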

3. Legibility Package

The core of the deterministic layer. Four classes enforce policy, manage state, handle consent, and collect citizen data — all without any LLM involvement.

PolicyEvaluator

Evaluates a service’s eligibility rules against a citizen’s data. The evaluator iterates each rule, evaluates conditions using typed operators, and returns a structured result. No fuzzy logic, no LLM interpretation — rules either pass or fail.

Class PolicyEvaluator
class PolicyEvaluator {
  evaluate(
    ruleset: PolicyRuleset,
    context: Record<string, unknown>
  ): PolicyResult
}

The evaluation logic iterates the rules and evaluates each condition using one of the typed operators: >=, <=, ==, !=, exists, not-exists, in. Each rule returns a pass/fail result with the original description as an explanation.

Interface PolicyRuleset
interface PolicyRuleset {
  service_id: string;
  version: string;
  rules: PolicyRule[];
}

interface PolicyRule {
  id: string;
  description: string;
  condition: {
    field: string;
    operator: ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";
    value?: unknown;
  };
  reason_if_failed: string;
  edge_case?: boolean;
}

interface PolicyResult {
  eligible: boolean;
  passed: PolicyRule[];
  failed: PolicyRule[];
  edgeCases: PolicyRule[];
  explanation: string;
}
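
A minimal sketch of the evaluation logic under the stated no-coercion semantics. The types are simplified, the real evaluator also reports passed rules and builds the full PolicyResult; treating failed edge-case rules as handoff flags rather than failures follows the edge_case description in this section.

```typescript
type Operator = ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";

interface Rule {
  id: string;
  condition: { field: string; operator: Operator; value?: unknown };
  reason_if_failed: string;
  edge_case?: boolean;
}

// One rule, one verdict: strict equality, no type coercion, no fuzzy matching.
function rulePasses(rule: Rule, ctx: Record<string, unknown>): boolean {
  const actual = ctx[rule.condition.field];
  const expected = rule.condition.value;
  switch (rule.condition.operator) {
    case "exists":     return actual !== undefined && actual !== null;
    case "not-exists": return actual === undefined || actual === null;
    case "==":         return actual === expected;
    case "!=":         return actual !== expected;
    case ">=":         return typeof actual === "number" && typeof expected === "number" && actual >= expected;
    case "<=":         return typeof actual === "number" && typeof expected === "number" && actual <= expected;
    case "in":         return Array.isArray(expected) && expected.includes(actual);
  }
}

// Failed edge-case rules flag for handoff instead of causing ineligibility.
function evaluateRules(rules: Rule[], ctx: Record<string, unknown>) {
  const failed = rules.filter(r => !r.edge_case && !rulePasses(r, ctx));
  const edgeCases = rules.filter(r => r.edge_case && !rulePasses(r, ctx));
  return {
    eligible: failed.length === 0,
    failed,
    edgeCases,
    explanation: failed.map(r => r.reason_if_failed).join("; "),
  };
}
```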

StateMachine

A deterministic finite-state machine constructed from a service’s StateModelDefinition. It enforces the legal transitions for a service journey, prevents the LLM from skipping steps, and identifies terminal and receipt-emitting states.

Class StateMachine
class StateMachine {
  constructor(definition: StateModelDefinition)
  getState(): string
  allowedTransitions(): Array<{ to: string; trigger?: string }>
  transition(trigger: string): TransitionResult
  isTerminal(): boolean
  setState(stateId: string): void
}

Key concepts: initial state (where every journey begins), terminal states (completed, rejected, handed-off), transition guards (condition fields that must be satisfied before a transition is allowed), and receipt-emitting states (states that trigger an immutable receipt on entry).
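
A minimal sketch of the transition-validation behaviour, assuming a simplified definition shape (the real StateModelDefinition also carries guards, trigger metadata, and receipt-emitting flags):

```typescript
// Simplified stand-in for StateModelDefinition (illustrative only).
interface MiniTransition { from: string; to: string; trigger: string }
interface MiniStateModel {
  initial: string;
  terminal: string[];
  transitions: MiniTransition[];
}

class MiniStateMachine {
  private state: string;
  constructor(private def: MiniStateModel) {
    this.state = def.initial; // every journey begins at the initial state
  }
  getState(): string {
    return this.state;
  }
  allowedTransitions(): MiniTransition[] {
    return this.def.transitions.filter(t => t.from === this.state);
  }
  // Anything the state model does not explicitly allow is rejected.
  transition(trigger: string): boolean {
    const match = this.allowedTransitions().find(t => t.trigger === trigger);
    if (!match) return false;
    this.state = match.to;
    return true;
  }
  isTerminal(): boolean {
    return this.def.terminal.includes(this.state);
  }
}
```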

State Flow: DVLA Driving Licence Renewal

A typical service journey. The state model defines a linear path with a branch point at eligibility, where the journey may diverge to rejection or human handoff.

not-started → identity-verified → eligibility-checked → consent-given → details-confirmed → photo-submitted → payment-made → application-submitted → completed

Branch at eligibility-checked: → rejected or → handed-off

ConsentManager

Manages the citizen’s consent decisions for a service. Each service defines a ConsentModel with required and optional grants. The manager tracks which grants have been presented, accepted, or declined, and gates progress accordingly.

Class ConsentManager
class ConsentManager {
  getRequiredGrants(): ConsentGrant[]
  getOptionalGrants(): ConsentGrant[]
  recordDecision(
    grantId: string,
    granted: boolean
  ): ConsentDecision
  allRequiredGranted(): boolean
}

Consent is never assumed or inferred. The platform renders consent cards (deterministic task injection at pipeline step 11) and records the citizen’s explicit decision. If a required grant is declined, the journey cannot proceed past the consent state.
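
The gating check itself is trivially deterministic. A sketch, with a simplified grant shape (the real ConsentGrant also carries scope and purpose):

```typescript
interface MiniGrant { id: string; required: boolean }

// Progress past the consent state is gated on every required grant being
// explicitly accepted; an undecided grant blocks just like a declined one.
function allRequiredGranted(
  grants: MiniGrant[],
  decisions: Map<string, boolean>,
): boolean {
  return grants.filter(g => g.required).every(g => decisions.get(g.id) === true);
}
```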

FieldCollector

Tracks the data fields required by a service, pre-fills from persona data where available, and records new fields extracted from conversation. The collector tells the prompt layer what is still missing, enabling the LLM to ask targeted questions.

Class FieldCollector
class FieldCollector {
  seed(personaData: Record<string, unknown>): void
  recordField(
    key: string,
    value: unknown,
    source: string
  ): void
  getMissing(): string[]
  isComplete(): boolean
}
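
A sketch of the collector's bookkeeping, with simplified signatures: required fields come from the manifest input schema, pre-filled values from persona data, and everything else from conversation.

```typescript
// Illustrative sketch; the real FieldCollector is seeded from the manifest.
class MiniFieldCollector {
  private collected = new Map<string, unknown>();
  private sources = new Map<string, string>();

  constructor(private required: string[]) {}

  // Pre-fill required fields that already exist in the persona profile.
  seed(personaData: Record<string, unknown>): void {
    for (const key of this.required) {
      if (personaData[key] !== undefined) {
        this.collected.set(key, personaData[key]);
        this.sources.set(key, "persona");
      }
    }
  }
  recordField(key: string, value: unknown, source: string): void {
    this.collected.set(key, value);
    this.sources.set(key, source);
  }
  // What the prompt layer uses to drive targeted questions.
  getMissing(): string[] {
    return this.required.filter(k => !this.collected.has(k));
  }
  isComplete(): boolean {
    return this.getMissing().length === 0;
  }
}
```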

ArtefactStore

Loads and caches structured artefacts from the service registry. Each service’s artefact set is loaded once and held in memory for the duration of the session.

Class ArtefactStore
class ArtefactStore {
  loadFromRegistry(serviceId: string): Promise<number>
  fetchArtefacts(serviceId: string): Promise<ServiceArtefacts>
  getServiceArtefacts(serviceId: string): ServiceArtefacts | null
}
Interface ServiceArtefacts
interface ServiceArtefacts {
  manifest: CapabilityManifest;
  policy?: PolicyRuleset;
  consent?: ConsentModel;
  stateModel?: StateModelDefinition;
  stateInstructions?: StateInstructions;
}

The five artefact types form a complete service description. The manifest declares identity and input schema. The policy defines eligibility rules. The state model defines the journey. The consent model defines data sharing permissions. The state instructions provide per-state behavioural guidance for the agent.

Policy Operator Reference

The PolicyEvaluator supports seven operators for rule conditions. Each operator has strict type semantics — the evaluator does not perform type coercion.

>=           Field value is greater than or equal to the specified value. Numeric comparison only. Example: age >= 16
<=           Field value is less than or equal to the specified value. Numeric comparison only. Example: age <= 70
==           Field value is strictly equal to the specified value. String or numeric. Example: jurisdiction == "England"
!=           Field value is not equal to the specified value. Example: status != "disqualified"
exists       Field is present and non-null. Value parameter is ignored. Example: driving_licence_number exists
not-exists   Field is absent or null. Value parameter is ignored. Example: ban_end_date not-exists
in           Field value is one of the specified array values. Example: licence_type in ["full", "provisional"]

Edge cases are distinct from failures. A rule with edge_case: true does not cause outright ineligibility. Instead, it flags the citizen for potential handoff to a human adviser. This handles ambiguous situations — a citizen who might be eligible but needs human judgement. The agent communicates uncertainty rather than making a binary decision.

State Model Transition Guards

Transitions can carry guard conditions that must be satisfied before the transition is allowed. Guards prevent premature advancement — a citizen cannot reach “payment-made” without first passing through “details-confirmed”.

Example: Transition with guard
{
  "from": "eligibility-checked",
  "to": "consent-given",
  "trigger": "grant_consent",
  "guard": {
    "condition": "policy_result.eligible == true",
    "message": "Cannot proceed: citizen is not eligible for this service."
  }
}

The state machine evaluates guards synchronously. If a guard fails, the transition is rejected and the machine remains at its current state. The guard’s message is available to the agent for explaining why the journey cannot proceed.
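
A sketch of guard evaluation, on the assumption that guard conditions are restricted to simple `path == literal` expressions like the example above (the real guard grammar may be richer):

```typescript
// Evaluate a "dotted.path == literal" guard condition against a context object.
function guardHolds(condition: string, ctx: Record<string, unknown>): boolean {
  const [path, expected] = condition.split("==").map(s => s.trim());
  // Resolve a dotted path like "policy_result.eligible" step by step.
  const actual = path.split(".").reduce<unknown>(
    (obj, key) => (obj as Record<string, unknown> | undefined)?.[key],
    ctx,
  );
  // Compare the stringified value with the literal; a missing path fails.
  return String(actual) === expected;
}
```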

4. Evidence Plane

Every significant action produces an immutable trace event. The evidence plane enables audit, replay, and accountability across all citizen interactions.

TraceEmitter

The primary interface for writing events into the evidence store. Supports span-based tracing: a span groups related events (e.g., all events within a single pipeline run) under a common span ID.

Class TraceEmitter
class TraceEmitter {
  startSpan(opts: SpanOptions): SpanContext
  emit(
    type: TraceEventType,
    span: SpanContext,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
  endSpan(
    span: SpanContext,
    type: TraceEventType,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
}

TraceEvent Structure

Interface TraceEvent
interface TraceEvent {
  id: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  timestamp: string;
  type: TraceEventType;
  payload: Record<string, unknown>;
  metadata: {
    userId?: string;
    sessionId: string;
    capabilityId?: string;
  };
}

Event Types

Every event is classified by type. Events marked as ledger-updating are written to the case ledger in addition to the trace store, making them available for case management and replay.

Ledger-updating events (written to the case ledger as well as the trace store):
  state.transition      State machine moved to a new state
  consent.granted       Citizen approved data sharing
  consent.denied        Citizen declined data sharing
  policy.evaluated      Eligibility rules checked
  handoff.initiated     Escalation to human adviser triggered
  capability.invoked    Department service called
  llm.request           Language model call initiated
  llm.response          Language model response received
  credential.presented  Verifiable proof submitted
  receipt.issued        Outcome documented with immutable receipt

Trace-only events:
  field.extracted       Fact parsed from conversation
  error.occurred        System error logged

ReceiptGenerator

Generates immutable receipts from trace events. Receipts are the citizen-facing evidence that an action was taken. They include a reference to the service, the action performed, the outcome, and which data was shared.

Class & Interface ReceiptGenerator / Receipt
class ReceiptGenerator {
  generate(event: TraceEvent): Receipt
}

interface Receipt {
  id: string;
  capabilityId: string;
  action: string;
  outcome: string;
  timestamp: string;
  dataShared?: string[];
}

CaseStore

The case store maintains a ledger of in-flight and completed cases. It is updated from trace events that carry ledger-updating types. Each case aggregates the full history of a citizen’s interaction with a specific service, including state transitions, consent decisions, and receipts.

Class CaseStore
class CaseStore {
  getCase(caseId: string): CaseRecord | null
  listByUser(userId: string): CaseRecord[]
  listByService(serviceId: string): CaseRecord[]
  updateFromEvent(event: TraceEvent): void
}

interface CaseRecord {
  id: string;
  userId: string;
  serviceId: string;
  currentState: string;
  stateHistory: string[];
  consentDecisions: ConsentDecision[];
  receipts: Receipt[];
  createdAt: string;
  updatedAt: string;
  status: "active" | "completed" | "handed-off" | "rejected";
}

The case store is a derived view — it is built entirely from trace events. If the case store is lost, it can be reconstructed by replaying events from the evidence store. This makes the evidence store the single source of truth.
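
The reconstruction property can be sketched as a pure fold over the event log. The event shape and the "not-started" starting state are simplifications for illustration; the real TraceEvent carries IDs, spans, and metadata.

```typescript
// Simplified event shape (illustrative only).
interface MiniEvent { type: string; payload: Record<string, unknown> }

// A pure fold: the derived view depends only on the ordered event log,
// so losing the view costs nothing — replay the events and rebuild it.
function rebuildCaseView(events: MiniEvent[]) {
  let currentState = "not-started";
  const stateHistory: string[] = [];
  const consent: Record<string, boolean> = {};
  for (const e of events) {
    if (e.type === "state.transition") {
      currentState = e.payload.to as string;
      stateHistory.push(currentState);
    } else if (e.type === "consent.granted") {
      consent[e.payload.grantId as string] = true;
    } else if (e.type === "consent.denied") {
      consent[e.payload.grantId as string] = false;
    }
  }
  return { currentState, stateHistory, consent };
}
```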

ReplayEngine

The replay engine reconstructs full conversation state from trace events. Given a trace ID, it steps through events in chronological order, rebuilding state machine position, consent decisions, collected fields, and the complete message history.

Class ReplayEngine
class ReplayEngine {
  loadTrace(traceId: string): Promise<ReplaySession>
  stepForward(session: ReplaySession): ReplayFrame
  stepBackward(session: ReplaySession): ReplayFrame
  jumpToEvent(session: ReplaySession, eventId: string): ReplayFrame
}

interface ReplayFrame {
  eventIndex: number;
  totalEvents: number;
  currentState: string;
  collectedFields: Record<string, unknown>;
  consentState: Record<string, boolean>;
  event: TraceEvent;
}

Replay supports three use cases: (1) Audit — regulators or complaints teams step through a citizen’s journey to verify that rules were followed. (2) Debugging — engineers identify where a journey diverged from expected behaviour. (3) Dispute resolution — when a citizen contests an outcome, the replay provides incontrovertible evidence of what happened and why.

Trace Lifecycle

A typical pipeline run produces the following sequence of trace events.

span.start
Pipeline span opened with trace ID, session ID, and capability ID.
policy.evaluated
Eligibility rules checked. Payload contains passed/failed rules and overall result.
llm.request
Language model called. Payload contains prompt hash and token counts.
llm.response
Language model responded. Payload contains model ID, token usage, and tool calls made.
state.transition
State machine moved from eligibility-checked to consent-given. Transition validated.
consent.granted
Citizen accepted data sharing. Grant ID and scope recorded.
field.extracted
New facts parsed from conversation. Fields and sources recorded.
receipt.issued
Immutable receipt generated for the completed action.
span.end
Pipeline span closed. Duration and step count recorded.

5. Citizen Data Architecture

A unified data model for citizen information with field-level provenance, tiered trust levels, and delegation-aware access control.

Three Data Tiers

Every data field carries a trust tier. The tier determines how the field can be used, whether it requires consent to share, and whether the citizen can edit it.

Tier 1: Verified

Immutable, government-sourced data. Confirmed by a department system of record. The citizen cannot edit these fields — only the source department can update them. Examples: National Insurance number, driving licence number, passport details.

Tier 2: Submitted

Citizen-entered data that has not been verified against a department source. The citizen can edit these fields at any time. Examples: contact preferences, correspondence address, accessibility needs.

Tier 3: Inferred

Agent-derived data, flagged as such. Extracted from conversation context or computed from other fields. Always shown with an “inferred” label so the citizen knows its provenance. Examples: estimated income bracket, likely household composition.

Field Metadata

Every field in the citizen data model carries source metadata. This enables the platform to show exactly where each piece of data came from and which consent grant authorised its use.

Interface FieldSource
interface FieldSource {
  source: string;     // "HMRC", "DVLA", "Home Office"
  tier: "verified" | "submitted" | "inferred";
  topic: string;     // "identity", "finance", "employment"
}

RelationshipStore

Manages delegation relationships: parents acting for children, power of attorney holders acting for vulnerable adults, executors managing a deceased person’s affairs. The store enforces scoped permissions and logs every delegated action for audit.

Class RelationshipStore
class RelationshipStore {
  canActOnBehalf(
    actorId: string,
    subjectId: string,
    scope: PermissionScope
  ): boolean
  getPermissions(
    actorId: string,
    subjectId: string
  ): PermissionScope[]
  getSubjects(
    actorId: string
  ): Array<{ userId: string; type: RelationshipType }>
  logDelegatedAction(
    action: DelegatedAction
  ): void
}

Default Permissions by Relationship Type

parent_of: full_authority
guardian_of: full_authority
executor_of: view_data, submit_on_behalf, make_payments, correspond
attorney_of: view_data, submit_on_behalf, manage_consent, make_payments, correspond
carer_of: view_status, receive_alerts
spouse: view_status, receive_alerts
representative_of: view_status, view_data, correspond

Delegation is always scoped. A power of attorney holder can submit forms and manage consent, but a carer can only view status and receive alerts. The RelationshipStore enforces these boundaries at the API level — the agent cannot bypass them through conversational persuasion.
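
A sketch of the scope check using the default-permission table above. Treating full_authority as implying every other scope is an assumption of this sketch, not something the table states.

```typescript
// Default permissions by relationship type, from the table above.
const DEFAULT_PERMISSIONS: Record<string, string[]> = {
  parent_of: ["full_authority"],
  guardian_of: ["full_authority"],
  executor_of: ["view_data", "submit_on_behalf", "make_payments", "correspond"],
  attorney_of: ["view_data", "submit_on_behalf", "manage_consent", "make_payments", "correspond"],
  carer_of: ["view_status", "receive_alerts"],
  spouse: ["view_status", "receive_alerts"],
  representative_of: ["view_status", "view_data", "correspond"],
};

// Assumption: full_authority grants every scope; unknown relationships get none.
function canActOnBehalf(relationship: string, scope: string): boolean {
  const perms = DEFAULT_PERMISSIONS[relationship] ?? [];
  return perms.includes("full_authority") || perms.includes(scope);
}
```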

Unified Citizen Data Model

The citizen data model aggregates fields from multiple departments into a single profile. Each field carries its source metadata, enabling the platform to track provenance across department boundaries.

Example: Citizen profile structure
{
  "identity": {
    "full_name":        { "value": "Margaret Chen",     "tier": "verified",  "source": "HMRC" },
    "date_of_birth":    { "value": "1958-03-14",      "tier": "verified",  "source": "HMRC" },
    "ni_number":        { "value": "QQ123456C",       "tier": "verified",  "source": "HMRC" }
  },
  "contact": {
    "address":          { "value": "12 Oak Lane, Bristol", "tier": "submitted", "source": "citizen" },
    "email":            { "value": "m.chen@email.com",  "tier": "submitted", "source": "citizen" }
  },
  "finance": {
    "annual_income":    { "value": 28400,              "tier": "verified",  "source": "HMRC" },
    "estimated_savings": { "value": "low",              "tier": "inferred",  "source": "agent" }
  }
}

The profile is organised by topic: identity, contact, finance, employment, health, transport, housing. Each topic maps to one or more source departments. When a service requests a field, the platform checks the consent model to determine whether the citizen has granted access to that topic from that source.

Cross-Department Data Flow

When a citizen consents to share data across departments, the platform mediates the exchange. The flow is always explicit: the citizen sees which fields will be shared, which department will receive them, and for what purpose.

Citizen starts DVLA renewal
The service manifest declares it needs National Insurance number, which is sourced from HMRC.
Consent card presented
“DVLA needs your National Insurance number from HMRC to verify your identity. Allow?”
Citizen grants consent
Decision recorded in the evidence store with trace ID, grant ID, scope, and timestamp.
Field resolved
NI number from HMRC (Tier 1, verified) is made available to the DVLA service context. The field collector marks it as collected.
Receipt issued
An immutable receipt records that the NI number was shared with DVLA, for the purpose of identity verification, with the citizen’s explicit consent.

6. Prompt Engineering

The system prompt is not a monolith. It is assembled from layered fragments, each serving a specific purpose. Static layers are cached; dynamic layers change every turn.

System Prompt Composition

In journey mode (service in progress), the system prompt is built from thirteen layers. Each layer adds context that shapes the agent’s behaviour for the current turn.

Journey mode prompt layers
 1. Agent personality       // Conservative / expansive discovery
 2. Persona communication   // Plain English / formal / assisted
 3. Scenario context        // Service domain and background
 4. Persona data            // Verified fields from citizen profile
 5. Policy evaluation       // Eligibility results
 6. Fact extraction         // Instructions for parsing citizen input
 7. State model context     // Current state + allowed transitions
 8. State instructions      // Per-state DO/DO NOT rules
 9. Field collector status  // Required / collected / missing
10. Consent requirements    // Grants needed for this state
11. Accuracy guardrails     // Factual grounding rules
12. Task format             // Card type specifications
13. Structured output       // JSON block schema

Layer Stack

Each layer occupies a portion of the token budget. The layers are ordered by stability: static layers first, dynamic layers last.

Layer 1: Agent personality (static)
  Behavioural profile — cautious/proactive, formality level
Layer 2: Persona style (static)
  Communication preferences from citizen profile
Layer 3: Scenario context (static)
  Service domain, department, relevant background
Layer 4: Service artefacts and state (per-service)
  Policy results, state model, field status, consent
Layer 5: Dynamic context (per-turn)
  Per-turn state instructions, extracted facts, missing fields
Layer 6: Guardrails and format (static)
  Accuracy rules, task types, structured output schema

Triage mode is lighter, omitting layers 4, 5, 7, 8, 9, and 10. The total token budget varies by model and configuration.

Prompt Caching Strategy

The system prompt is split into a static prefix and a dynamic suffix. The static prefix (agent personality, persona style, scenario context, guardrails) is marked with a cache control directive. The dynamic suffix (state context, field collector, extracted facts) changes every turn.

Cache behaviour: When the citizen remains within the same service journey, the static prefix is served from the provider's cache on every turn. This yields significant token savings on repeated turns, and the savings compound across multi-turn conversations, which typically run 8–12 turns.
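The prefix/suffix split might be wired up as below. The `cache_control` shape follows Anthropic-style prompt caching; other providers expose equivalent directives, so treat the exact field names as an assumption:

```typescript
// Sketch of marking the static prefix for provider-side caching.
// The cache_control shape mirrors Anthropic-style prompt caching and
// is an assumption about the provider in use.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function buildSystemBlocks(
  staticPrefix: string,
  dynamicSuffix: string,
): SystemBlock[] {
  return [
    // Cacheable: identical across turns within the same journey.
    { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } },
    // Never cached: rebuilt every turn from state, fields, and facts.
    { type: "text", text: dynamicSuffix },
  ];
}
```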

Structured Output Schema

The language model must produce a JSON block at the end of every response. The orchestrator parses this block at pipeline step 8. The schema enforces a consistent contract between the LLM’s output and the deterministic validation that follows.

Required JSON block in LLM response
{
  "title": "Renew your driving licence",
  "tasks": [
    {
      "type": "form",
      "title": "Confirm your details",
      "fields": ["full_name", "date_of_birth", "address"]
    }
  ],
  "transition": "details-confirmed",
  "facts": [
    { "key": "preferred_contact", "value": "email" }
  ]
}

The transition field is validated against the state machine (step 10). If the LLM proposes an illegal transition, it is silently dropped and the state machine remains at its current position. The tasks array may be overridden entirely by deterministic task injection (step 11) at certain states.
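The parse-then-validate contract can be sketched as two small functions. How the real parser locates the JSON block is not specified here, so the trailing-object heuristic below is an assumption:

```typescript
// Sketch of step-8 parsing and step-10 transition validation.
// extractJsonBlock assumes the block is the final {...} object in the
// response text; the real parser likely keys off an explicit delimiter.
interface AgentBlock {
  title?: string;
  tasks?: unknown[];
  transition?: string;
  facts?: { key: string; value: string }[];
}

function extractJsonBlock(response: string): AgentBlock | null {
  const start = response.lastIndexOf("\n{");
  const candidate = start >= 0 ? response.slice(start) : response;
  try {
    return JSON.parse(candidate.trim()) as AgentBlock;
  } catch {
    // Malformed output: deliver the text, apply no transition.
    return null;
  }
}

function applyTransition(
  current: string,
  proposed: string | undefined,
  allowed: Record<string, string[]>, // state -> legal targets
): string {
  // An illegal proposal is silently dropped; the machine stays put.
  if (proposed && (allowed[current] ?? []).includes(proposed)) return proposed;
  return current;
}
```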

Triage Mode vs Journey Mode

The agent operates in two modes. Mode selection happens at pipeline step 4 and determines which prompt layers are included and which tools are available.

Triage Mode

No service context. The agent identifies the citizen’s need and proposes one or more services. Prompt layers 4, 5, 7, 8, 9, and 10 are omitted. No state machine, no field collector, no consent. The agent works from the full service catalogue to find the best match.

Journey Mode

Service selected and artefacts loaded. All 13 prompt layers are active. The state machine tracks progress, the field collector tracks data, and the consent manager gates access. The agent operates within the boundaries defined by the service’s artefacts.

Agent Personalities

The agent personality layer (layer 1) defines the agent’s behavioural profile. Two personalities are currently defined, each tuned for different citizen needs.

Conservative discovery
Behaviour: Cautious, step-by-step, asks for explicit confirmation before advancing. Never skips questions, even when data is already known. Prioritises citizen understanding over speed.
Best for: Vulnerable users, complex services, high-stakes decisions (benefits, legal).

Expansive discovery
Behaviour: Proactive, efficient, pre-fills forms and auto-submits where possible. Explains what it has done rather than asking permission first. Prioritises speed while maintaining accuracy.
Best for: Routine renewals, confident users, repeat interactions.

Personality does not affect deterministic behaviour. Both conservative and expansive discovery modes follow the same state machine, the same policy rules, and the same consent requirements. The personality only affects how the agent communicates — whether it asks before acting or acts before explaining. The platform’s guarantees hold regardless of personality.

7

Service Integration

Two strategies for connecting services to the runtime, a tool generation pipeline, and a three-tier service store with gap analysis.

Two Service Strategies

JsonServiceStrategy

deterministic

No tools given to the LLM. Service context is built entirely from structured artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the orchestrator’s state machine.

Best for: Services with complete, hand-crafted artefact sets. Maximum determinism, minimum LLM autonomy.

McpServiceStrategy

tool-based

The LLM receives check_eligibility and advance_state tools via the adapter. It calls these during the agentic loop (step 7). Tool results are parsed for state transitions and fed back into the validation pipeline.

Best for: Service-graph services that expose department APIs. The LLM has more autonomy but is still bounded by the state machine.

Tool Generation Pipeline

The server generates a standard set of integration points per service from its structured artefacts. This ensures a consistent interface regardless of the underlying department system.

Load artefacts
Manifest, policy, consent model, and state model read from the service store.
Generate tools (2 per service)
check_eligibility — evaluates policy ruleset against citizen data. advance_state — proposes a state transition with context.
Generate resources (4 per service)
Manifest, policy, consent, and state model exposed as typed URIs. Consumers read artefacts without knowing the storage format.
Generate prompts (2 per service)
Journey template (full service conversation) and eligibility check template (quick check only).
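The two generated tools might look like the sketch below. The JSON-schema parameter names are assumptions; only the tool names (`check_eligibility`, `advance_state`) come from the source:

```typescript
// Sketch of the two tool definitions generated per service.
// Parameter names in the input schemas are illustrative assumptions.
function generateTools(serviceId: string) {
  return [
    {
      name: "check_eligibility",
      description: `Evaluate the ${serviceId} policy ruleset against citizen data.`,
      inputSchema: {
        type: "object",
        properties: { citizen_fields: { type: "object" } },
        required: ["citizen_fields"],
      },
    },
    {
      name: "advance_state",
      description: `Propose a state transition for ${serviceId} with supporting context.`,
      inputSchema: {
        type: "object",
        properties: {
          target_state: { type: "string" },
          context: { type: "string" },
        },
        required: ["target_state"],
      },
    },
  ];
}
```

Whatever the LLM does with these tools, the results still flow back through the orchestrator's validation pipeline, so the state machine remains the final authority.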

Service Store Tiers

Services come from three sources, with descending levels of artefact completeness. The store applies precedence rules: a fully described service always overrides a graph entry, which overrides a catalogue entry.

Tier 1: Full

Hand-crafted services with complete artefact sets: manifest, policy, state model, consent, and state instructions. Highest precedence. These are the services that work end-to-end through the agent.

Tier 2: Graph

Service-graph nodes with metadata, department attribution, and partial artefacts. Sufficient for triage and eligibility checking, but not for full journey orchestration.

Tier 3: Catalogue

The current GOV.UK service register contains approximately 1,500 services with names and department attribution. Used for service discovery during triage. No artefacts — the agent can identify the service but cannot orchestrate a journey.

Gap Analysis

The Legibility Studio computes per-department coverage by comparing full (Tier 1) services against the total catalogue. This surfaces exactly which services need artefact authoring and in what priority order.

Coverage formula: For each department, gap analysis computes full_count / total_count. A department with 3 fully described services out of 47 total has 6.4% coverage. The studio ranks departments by total service volume and highlights the highest-impact gaps — the services that appear in the most citizen life events.

Integration teams should start with high-volume, cross-departmental services: bereavement notifications, child benefit claims, driving licence renewals. These appear in multiple life events and benefit the most citizens when made fully legible to agents.
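The coverage computation and volume ranking described above reduce to a few lines. The record shape is a hypothetical stand-in for the studio's internal types:

```typescript
// Sketch of per-department gap analysis: coverage = full / total,
// ranked by total service volume so the biggest gaps surface first.
interface DeptCoverage {
  department: string;
  full: number;   // Tier 1 (fully described) services
  total: number;  // all catalogued services
  coverage: number;
}

function gapAnalysis(
  counts: { department: string; full: number; total: number }[],
): DeptCoverage[] {
  return counts
    .map(c => ({ ...c, coverage: c.total === 0 ? 0 : c.full / c.total }))
    .sort((a, b) => b.total - a.total);
}
```

For the worked figure in the text, 3 of 47 gives a coverage of 3/47 ≈ 6.4%.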

Artefact Authoring Workflow

Departments author artefacts through the Legibility Studio or by editing structured artefacts directly. The following sequence represents the recommended authoring order.

1
Capability manifest
Start here. Define the service name, department, description, input schema (what data the service needs), and output schema (what the service produces). This is the service’s identity card.
2
Policy ruleset
Define eligibility rules. Each rule maps a citizen data field to a condition using typed operators. Include the human-readable failure message that will be shown if the rule fails. Mark edge cases that should trigger handoff rather than outright rejection.
3
State model
Map the citizen journey as a state machine. Define each state, its allowed transitions, transition guards (conditions that must be met), terminal states, and which states emit receipts. Keep the model linear where possible — branches add complexity.
4
Consent model
Declare what data the service needs to access and why. Separate required grants (must accept to proceed) from optional grants (can decline without blocking). Each grant needs a plain-English description suitable for a consent card.
5
State instructions
Write per-state DO/DO NOT rules for the agent. These constrain the LLM’s behaviour at each point in the journey. Be specific: “DO ask for the citizen’s driving licence number” is better than “DO collect required information”.

Worked Example: Manifest Structure

The capability manifest is the entry point for any service. Here is the structure for a typical renewal service.

Example: CapabilityManifest
{
  "id": "dvla-renew-driving-licence",
  "name": "Renew a driving licence",
  "department": "Driver and Vehicle Licensing Agency",
  "description": "Renew a photocard driving licence that is expiring or has expired.",
  "version": "1.0.0",
  "input_schema": {
    "required": [
      "full_name", "date_of_birth",
      "driving_licence_number", "national_insurance_number",
      "address", "photo"
    ],
    "optional": ["email", "phone"]
  },
  "output_schema": {
    "produces": ["application_reference", "expected_delivery_date"]
  },
  "estimated_duration": "10 minutes",
  "fee": {
    "amount": 14.00,
    "currency": "GBP"
  }
}

The input schema drives the field collector. When the orchestrator loads a service, it seeds the FieldCollector from the manifest’s input_schema.required array. Fields already present in the citizen’s profile are marked as collected; the remainder become the “missing fields” list that the agent uses to ask targeted questions.
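The seeding step can be sketched as a partition of the required fields against the citizen's profile. `seedFieldCollector` is a simplified stand-in for the platform's `FieldCollector`:

```typescript
// Sketch of seeding a field collector from the manifest's
// input_schema.required array. A field already present in the
// citizen's profile is marked collected; the rest are missing.
function seedFieldCollector(
  required: string[],
  profile: Record<string, unknown>,
): { collected: string[]; missing: string[] } {
  const collected = required.filter(f => profile[f] !== undefined);
  const missing = required.filter(f => profile[f] === undefined);
  return { collected, missing };
}
```

The `missing` list is what drives the agent's targeted questions in layer 9 of the journey-mode prompt.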

8

Assurance and Verification

Four assurance layers ensure that artefacts are correct, the pipeline behaves deterministically, model changes are governed, and compliance requirements are met continuously.

Artefact Verification

Before a department’s artefacts enter the service registry, they pass through automated validation and simulation.

1
Schema validation deterministic
Every artefact (manifest, policy, state model, consent, state instructions) is validated against its published schema. Type errors, missing required fields, and structural violations are caught before the artefact can be registered.
2
Referential integrity deterministic
Cross-artefact references are checked: policy rules reference fields declared in the manifest input schema; state model transitions reference valid states; consent grants reference data scopes that exist in the field catalogue.
3
Dry-run simulation platform
The platform runs the full pipeline with synthetic citizen data against the service artefacts. This verifies that the state machine can reach all terminal states, policy rules evaluate correctly, and consent flows complete without blocking. No live LLM call is required — the simulation uses deterministic response fixtures.
4
Gap analysis platform
The Legibility Studio reports on artefact completeness: which services have full coverage, which are missing policy rulesets, which state models have unreachable states. Departments can prioritise authoring work based on coverage scores.
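The referential-integrity check in step 2 is essentially a set-membership test. The rule shape below is a hypothetical simplification of a policy ruleset entry:

```typescript
// Sketch of the step-2 check: every field a policy rule references
// must be declared in the manifest's input schema. The PolicyRule
// shape is an assumption.
interface PolicyRule {
  field: string;
  operator: string;
  value: unknown;
}

function checkRuleReferences(
  rules: PolicyRule[],
  manifestFields: string[],
): string[] {
  const declared = new Set(manifestFields);
  // Returns the dangling references; an empty array means the check passes.
  return rules.map(r => r.field).filter(f => !declared.has(f));
}
```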

Pipeline Assurance

The deterministic pipeline is tested independently of any LLM provider. Because steps 1–6 and 8–14 are pure computation, they can be verified exhaustively against known inputs and expected outputs.

Policy evaluation
Method: Deterministic tests against all seven operators with boundary conditions. Each operator is tested with valid, invalid, missing, and edge-case inputs.
Coverage: All rule types and operators.

State machine transitions
Method: Every defined transition, guard, and terminal state is exercised. Illegal transitions are verified to be rejected. Auto-transitions and forced transitions are validated.
Coverage: Complete state graph per service.

Consent enforcement
Method: Verification that required grants block progress when declined, optional grants do not, and all decisions are recorded in the evidence store.
Coverage: All consent models.

Evidence integrity
Method: Trace events are verified for completeness (every pipeline run produces a span), ordering (events are chronological), and immutability (events cannot be modified after creation).
Coverage: All event types.
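Exercising the complete state graph can be sketched as a structural check over the state model. The `StateModel` shape is an assumption about how the artefact is represented:

```typescript
// Sketch of exhaustive state-graph verification: every declared
// transition must land on a defined state, and every non-terminal
// state must have a way forward. The StateModel shape is an assumption.
interface StateModel {
  states: string[];
  transitions: Record<string, string[]>; // from-state -> legal targets
  terminal: string[];
}

function verifyStateGraph(model: StateModel): string[] {
  const errors: string[] = [];
  const known = new Set(model.states);
  for (const [from, targets] of Object.entries(model.transitions)) {
    for (const to of targets) {
      if (!known.has(to)) errors.push(`undefined target ${to} from ${from}`);
    }
  }
  for (const s of model.states) {
    const outgoing = model.transitions[s] ?? [];
    if (outgoing.length === 0 && !model.terminal.includes(s)) {
      errors.push(`dead-end state ${s}`);
    }
  }
  return errors;
}
```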

Model Governance

When the underlying language model is changed or updated, the platform runs a structured evaluation process before the new model enters production.

1
Evaluation suite
A curated set of known service journeys is replayed against the new model. Each journey has expected outputs: correct structured JSON, valid state transition proposals, appropriate task card generation, and accurate field extraction.
2
Regression testing
The evaluation results are compared against the baseline from the current production model. Regressions in structured output parsing, transition accuracy, or guardrail compliance are flagged for review before approval.
3
Prompt compatibility
The prompt composition is tested to ensure it produces reliable results with the new model. Token budget, cache behaviour, and structured output adherence are validated across all thirteen prompt layers.
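The regression comparison in step 2 can be sketched as a metric-by-metric check against the baseline. The metric names and tolerance below are assumptions:

```typescript
// Sketch of baseline regression comparison for model governance.
// Metric names and the tolerance value are illustrative assumptions.
interface EvalResult {
  jsonParseRate: number;
  transitionAccuracy: number;
  guardrailCompliance: number;
}

function findRegressions(
  baseline: EvalResult,
  candidate: EvalResult,
  tolerance = 0.01,
): string[] {
  // Flag any metric that drops more than the tolerance below baseline.
  return (Object.keys(baseline) as (keyof EvalResult)[])
    .filter(k => candidate[k] < baseline[k] - tolerance)
    .map(k => String(k));
}
```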

Compliance Verification

Automated checks run continuously to verify that the platform meets its policy and consent obligations.

Four continuous compliance checks:

  • Policy rule coverage — every service with a published policy ruleset has corresponding evaluation coverage. No rule goes untested against synthetic data.
  • Consent completeness — every data field accessed by a service is covered by a consent grant in the service’s consent model. No data flows without a declared purpose.
  • Audit trail continuity — every citizen interaction produces a complete trace. Gaps in the evidence store (missing spans, orphaned events) trigger alerts.
  • Handoff accuracy — edge-case rules consistently trigger human escalation. The platform verifies that safeguarding conditions never result in automated decisions.
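The consent-completeness check above reduces to a set difference between accessed fields and granted scopes. The grant shape is a hypothetical simplification of a consent-model entry:

```typescript
// Sketch of the consent-completeness check: every field a service
// accesses must fall within some grant's scope. The ConsentGrant
// shape is an assumption.
interface ConsentGrant {
  id: string;
  scope: string[];
  required: boolean;
}

function uncoveredFields(
  accessedFields: string[],
  grants: ConsentGrant[],
): string[] {
  const covered = new Set(grants.flatMap(g => g.scope));
  // A non-empty result means data would flow without a declared purpose.
  return accessedFields.filter(f => !covered.has(f));
}
```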
9

Security and Data Governance

The platform handles sensitive citizen data across department boundaries. Security, data residency, and governance are architectural concerns, not afterthoughts.

Authentication and Identity

Citizen identity is established through integration with GOV.UK One Login, the cross-government identity platform. The agent does not manage credentials directly — authentication is delegated to One Login, and the platform receives a verified identity token. Session management uses standard secure token patterns with appropriate expiry and refresh mechanisms.

Data Residency

All citizen data is processed and stored within UK jurisdiction. The platform infrastructure is hosted in UK data centres. No citizen personally identifiable information (PII) leaves UK borders during processing, storage, or transit. Department data shared through the platform remains subject to each department’s existing data governance framework.

Encryption

In transit: all communications between the citizen application, the platform services, and department APIs use TLS 1.2 or higher. At rest: the evidence store, citizen data profiles, and consent records are encrypted using AES-256. Encryption keys are managed through a dedicated key management service with automatic rotation.

LLM Data Handling

Citizen PII sent to the language model is processed in-session only. The platform’s contract with LLM providers stipulates that citizen data is not used for model training, is not logged beyond the session, and is not accessible to the provider’s staff. The LLM Adapter Layer enforces this boundary — it controls exactly which citizen data fields are included in prompts, and strips sensitive fields (financial details, health records) unless explicitly required by the current service and covered by a consent grant.

Consent as a Technical Control

The Consent Manager is a hard gate, not advisory. When a service requires data that the citizen has not consented to share, the pipeline halts. The LLM cannot override consent decisions through conversational persuasion or prompt injection. Consent decisions are recorded as immutable trace events and are available for audit at any time. Consent can be withdrawn, at which point the platform ceases to share the affected data fields with the relevant service.

Audit Trail

Every significant action is captured in the evidence store as an immutable trace event. The evidence store is append-only — events cannot be modified or deleted after creation. This provides a complete, tamper-evident record of every policy evaluation, consent decision, state transition, LLM interaction, and receipt issued. The replay engine can reconstruct the exact state of any citizen interaction at any point in time.

Prompt Injection Mitigation

The architecture’s separation of deterministic and AI layers provides structural protection against prompt injection. Even if a malicious input manipulates the LLM’s response, the deterministic pipeline (steps 8–14) validates all outputs: state transitions must be legal, consent must be granted, policy rules must pass. The LLM proposes; the platform disposes.

Access Control and Delegation

The RelationshipStore enforces scoped permissions for delegated access. A power of attorney holder, parent, or carer can act on behalf of another citizen only within their declared permission scope. Every delegated action is logged separately in the evidence store, recording both the actor and the subject. Permission boundaries are enforced at the API level and cannot be circumvented through the conversational interface.

10

Resilience and Failure Modes

The platform is designed to degrade gracefully. Every failure mode has a defined handling strategy that preserves citizen trust and data integrity.

1
LLM unavailability
If the language model is unreachable or returns an error, the platform degrades gracefully to static guidance. The citizen is shown pre-authored content for their current service state, including next steps and links to the existing GOV.UK web journey. No data is lost — the citizen can resume the agentic journey when the model becomes available.
2
Department API timeout
When a department API call times out or fails, the platform retries with exponential backoff. If retries are exhausted, the citizen is notified that the specific action could not be completed and is offered the option to continue with partial information or save their progress and return later. The platform never silently drops a failed action.
3
Evidence store failure
If the evidence store is temporarily unavailable, trace events are queued in a durable buffer and flushed when the store recovers. The pipeline never discards trace data. If the buffer reaches capacity, the pipeline pauses rather than proceeding without audit coverage. Evidence integrity is non-negotiable.
4
Handoff as a safety valve
At any point in the pipeline, the system can escalate to a human adviser. Handoff triggers include: safeguarding concerns detected in conversation, repeated pipeline errors for the same citizen, edge-case policy rules, and explicit citizen request. The full conversation context, collected fields, and current state are preserved and transferred to the human adviser.
5
Malformed LLM output
If the language model produces output that cannot be parsed at pipeline step 8, the response text is still delivered to the citizen but no state transition is applied and no tasks are generated. The conversation continues from its current state. The malformed output is logged as a trace event for monitoring and model evaluation.
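The retry-with-exponential-backoff policy from failure mode 2 can be sketched as follows. The attempt count and delay parameters are illustrative, not the platform's actual configuration:

```typescript
// Sketch of retry with exponential backoff for department API calls.
// maxAttempts and baseDelayMs are illustrative assumptions.
async function callWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Double the delay each attempt: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Retries exhausted: surface the failure so the citizen can be
  // notified and offered options — never a silent drop.
  throw lastError;
}
```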

The platform always has a safe fallback. No failure mode results in a citizen being left without guidance, losing their progress, or having data silently dropped. The worst case is a degraded experience with human support — never a broken one.

The architecture enforces a single principle: the language model generates conversation; deterministic code enforces rules. Every integration point, every package boundary, and every type signature exists to maintain that separation.