The runtime pipeline, package architecture, type system, and integration points for the Agentic Legibility Stack.
This document describes the target operating model for government service delivery through AI agents. For concept definitions see the Glossary; for product design patterns see the Product & Service Design Guide.
The stack is organised as a set of focused packages with strict dependency boundaries. Every package owns one concern; cross-cutting behaviour is mediated through well-typed interfaces.
The dependency tree is rooted at the Shared Schema Layer, which has zero internal dependencies. All other components declare their internal dependencies explicitly. External SDK usage is confined to two components.
Three hard rules that govern the architecture:
The citizen-facing experience. Hosts the conversational agent, task cards, consent flows, and receipt history. Exposes an evidence API that the Legibility Studio reads from.
The department-facing admin tool. Displays service coverage, gap analysis, trace inspection, and artefact authoring. Consumes evidence over HTTP — never imports the Evidence Plane directly.
Structured artefacts define each service: a capability manifest, policy ruleset, state model, consent model, and state-specific instructions. These artefacts are the canonical description of what a service does, who can use it, and how it progresses.
The evidence store is append-only. Every significant event — state transitions, consent decisions, policy evaluations, credential presentations — is captured as an immutable trace event. The store supports full replay: given a trace ID, the system can reconstruct the exact conversation state at any point.
Personal data is stored with field-level source attribution. Every value carries metadata indicating which department sourced it, whether it is verified or citizen-submitted, and which consent grant authorised its use.
Each service in the store is described by up to five artefact files. Together, these form the complete machine-readable description of a government service.
| Artefact | Purpose | Required |
|---|---|---|
| manifest | Service identity, description, department, input schema, output schema, and lifecycle metadata. The canonical reference for what the service does. | Yes |
| policy | Eligibility rules expressed as typed conditions. Each rule declares a field, operator, expected value, and human-readable failure reason. | Yes |
| state-model | The finite-state machine defining the journey. States, transitions, guards, terminal markers, and receipt-emitting flags. | Yes |
| consent | Required and optional consent grants. Each grant declares scope (which data), purpose (why), and whether it is mandatory. | No |
| state-instructions | Per-state behavioural rules for the agent. DO/DO NOT lists that constrain the LLM’s responses at each point in the journey. | No |
Artefact authoring is the integration point for departments. A department does not need to build an API or write code. It publishes structured artefacts describing its services, and the platform handles orchestration, consent, evidence, and agent behaviour. The Legibility Studio provides a guided authoring experience for department teams.
Each component in the system owns a single, well-defined concern. This table summarises what each component does and what it must never do.
| Component | Owns | Must Never |
|---|---|---|
| Shared Schema Layer | Shared types, schema validation, type guards | Import any other component |
| LLM Adapter Layer | LLM calls, MCP client, tool dispatch | Make policy or state decisions |
| Evidence Plane | Trace events, receipts, case ledger, replay | Import LLM integration or make API calls |
| Identity Layer | User profiles, authentication, session management | Access evidence store directly |
| Legibility Engine | PolicyEvaluator, StateMachine, ConsentManager, FieldCollector, ArtefactStore | Call the LLM or make network requests |
| MCP Server | MCP server, tool generation, resource exposure | Use MCP client (that belongs to the adapter layer) |
| Personal Data Layer | Citizen data model, field sourcing, relationship store | Bypass consent checks |
| Orchestration Layer | Orchestrator, pipeline, strategy dispatch | Import LLM integration directly (must go through adapter layer) |
| Service Graph | GOV.UK service graph integration, node resolution | Store state or evidence |
| Service Store | Service storage, tiered resolution, gap analysis | Make eligibility decisions (that belongs to the Legibility Engine) |
The centrepiece of the architecture. Every citizen interaction passes through this 14-step orchestrator pipeline. Deterministic code runs before and after the language model; the platform always has the final word.
The LLM runs at step 7 out of 14. Everything before it is deterministic setup; everything after it is validation and override. The language model proposes; the platform disposes. This is the central architectural guarantee.
Every pipeline run produces a single typed output. This is the contract between the runtime and the citizen application.
```typescript
interface OrchestratorOutput {
  response: string;
  reasoning: string;
  toolsUsed: string[];
  conversationTitle: string | null;
  tasks: TaskEntry[];
  policyResult?: { eligible: boolean; explanation: string };
  handoff?: { triggered: boolean; reason: string };
  serviceState?: { currentState: string; stateHistory: string[] };
  consentRequests?: ConsentRequest[];
  extractedFields?: FieldExtraction[];
  outcomeHints?: Record<string, unknown>;
  serviceProposal?: { serviceId: string; serviceName: string; reason: string };
  needProposal?: { need: string; services: string[] };
  serviceCompletions?: Array<{ serviceId: string; status: string }>;
  versionMetadata: {
    promptHash: string;
    rulesetVersion: string;
    stateModelVersion: string;
  };
  pipelineTrace: {
    traceId: string;
    steps: PipelineStep[];
    totalDurationMs: number;
  };
}
```
The orchestrator does not know how a service is implemented. It delegates through a pluggable ServiceStrategy interface. Two implementations exist today; new strategies can be added without modifying the pipeline.
```typescript
interface ServiceStrategy {
  buildTools(ctx: ServiceStrategyContext): ToolDefinition[];
  buildServiceContext(ctx: ServiceStrategyContext): string | Promise<string>;
  dispatchToolCall(name: string, input: unknown): Promise<string>;
  extractStateTransitions(messages: unknown[]): StateTransitionResult[];
}
```
JsonServiceStrategy (deterministic, inline) — no tools are given to the LLM. Service context is built from artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the state machine. Used for services with complete artefact sets.
McpServiceStrategy (tool-based) — the LLM receives check_eligibility and advance_state tools. It calls these during the agentic loop. Tool results are parsed for state transitions. Used for service-graph services that expose department APIs.
Both strategies produce identical OrchestratorOutput. The citizen application cannot distinguish which strategy was used. This is by design: the contract is stable regardless of implementation.
Step 7 — the agentic loop — deserves closer examination. It is the only step where the language model runs, and its behaviour is tightly bounded.
The iteration limit is a hard ceiling, not a target. Most conversations complete in 1–2 iterations. The limit exists to bound worst-case latency and cost. If a service consistently requires 4+ iterations, it indicates the artefacts need refinement — more context in the state instructions reduces the LLM’s need for exploratory tool calls.
Every pipeline run produces a trace that records the duration and outcome of each step. This is the primary debugging and performance monitoring tool.
```typescript
interface PipelineStep {
  name: string;
  durationMs: number;
  outcome: "success" | "skipped" | "error";
  metadata?: Record<string, unknown>;
}
```
In practice, steps 1–6 and 8–14 are pure computation and complete in negligible time. Step 7 (LLM call) accounts for the vast majority of total pipeline duration. The trace makes it immediately visible when a slow pipeline run is caused by LLM latency versus application logic.
The core of the deterministic layer. Four classes enforce policy, manage state, handle consent, and collect citizen data — all without any LLM involvement.
Evaluates a service’s eligibility rules against a citizen’s data. The evaluator iterates each rule, evaluates conditions using typed operators, and returns a structured result. No fuzzy logic, no LLM interpretation — rules either pass or fail.
```typescript
class PolicyEvaluator {
  evaluate(
    ruleset: PolicyRuleset,
    context: Record<string, unknown>
  ): PolicyResult
}
```
The evaluation logic iterates over the rules, evaluating each condition with one of seven typed operators: >=, <=, ==, !=, exists, not-exists, in. Each rule returns a pass/fail result, with the original description serving as its explanation.
```typescript
interface PolicyRuleset {
  service_id: string;
  version: string;
  rules: PolicyRule[];
}

interface PolicyRule {
  id: string;
  description: string;
  condition: {
    field: string;
    operator: ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";
    value?: unknown;
  };
  reason_if_failed: string;
  edge_case?: boolean;
}

interface PolicyResult {
  eligible: boolean;
  passed: PolicyRule[];
  failed: PolicyRule[];
  edgeCases: PolicyRule[];
  explanation: string;
}
```
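To make the "no fuzzy logic" claim concrete, here is an illustrative sketch of the evaluation loop. It is not the production evaluator; `Rule` is a simplified local copy of `PolicyRule`, and the operator semantics follow the operator reference later in this section (numeric comparisons, strict equality, no coercion).

```typescript
// Illustrative sketch only; the production evaluator lives in the Legibility
// Engine. Rule is a simplified local copy of PolicyRule.
type Op = ">=" | "<=" | "==" | "!=" | "exists" | "not-exists" | "in";

interface Rule {
  id: string;
  condition: { field: string; operator: Op; value?: unknown };
  reason_if_failed: string;
  edge_case?: boolean;
}

function evaluateRule(rule: Rule, ctx: Record<string, unknown>): boolean {
  const actual = ctx[rule.condition.field];
  const expected = rule.condition.value;
  switch (rule.condition.operator) {
    case ">=": return typeof actual === "number" && actual >= (expected as number);
    case "<=": return typeof actual === "number" && actual <= (expected as number);
    case "==": return actual === expected; // strict equality, no coercion
    case "!=": return actual !== expected;
    case "exists": return actual !== undefined && actual !== null;
    case "not-exists": return actual === undefined || actual === null;
    case "in": return Array.isArray(expected) && expected.includes(actual);
  }
}

function evaluate(rules: Rule[], ctx: Record<string, unknown>) {
  // Edge-case rules flag for handoff rather than causing outright failure.
  const failed = rules.filter((r) => !evaluateRule(r, ctx) && !r.edge_case);
  const edgeCases = rules.filter((r) => !evaluateRule(r, ctx) && r.edge_case);
  return { eligible: failed.length === 0, failed, edgeCases };
}
```

Because every branch is a plain comparison, the evaluator can be tested exhaustively against boundary inputs with no LLM in the loop.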
A deterministic finite-state machine constructed from a service’s StateModelDefinition. It enforces the legal transitions for a service journey, prevents the LLM from skipping steps, and identifies terminal and receipt-emitting states.
```typescript
class StateMachine {
  constructor(definition: StateModelDefinition)
  getState(): string
  allowedTransitions(): Array<{ to: string; trigger?: string }>
  transition(trigger: string): TransitionResult
  isTerminal(): boolean
  setState(stateId: string): void
}
```
Key concepts: initial state (where every journey begins), terminal states (completed, rejected, handed-off), transition guards (condition fields that must be satisfied before a transition is allowed), and receipt-emitting states (states that trigger an immutable receipt on entry).
A typical service journey. The state model defines a linear path with a branch point at eligibility, where the journey may diverge to rejection or human handoff.
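The transition discipline can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: guards are omitted, and `Transition` is a simplified shape assumed for the sketch.

```typescript
interface Transition { from: string; to: string; trigger: string }

// Sketch: legal transitions only; an unknown trigger leaves the state unchanged.
class MiniStateMachine {
  private state: string;

  constructor(
    initial: string,
    private transitions: Transition[],
    private terminal: string[]
  ) {
    this.state = initial;
  }

  getState(): string { return this.state; }

  allowedTransitions(): Transition[] {
    return this.transitions.filter((t) => t.from === this.state);
  }

  transition(trigger: string): { ok: boolean; state: string } {
    const match = this.allowedTransitions().find((t) => t.trigger === trigger);
    if (!match) return { ok: false, state: this.state }; // rejected, no change
    this.state = match.to;
    return { ok: true, state: this.state };
  }

  isTerminal(): boolean { return this.terminal.includes(this.state); }
}
```

The key property is that an illegal trigger is rejected without side effects, which is what prevents the LLM from skipping steps.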
Manages the citizen’s consent decisions for a service. Each service defines a ConsentModel with required and optional grants. The manager tracks which grants have been presented, accepted, or declined, and gates progress accordingly.
```typescript
class ConsentManager {
  getRequiredGrants(): ConsentGrant[]
  getOptionalGrants(): ConsentGrant[]
  recordDecision(
    grantId: string,
    granted: boolean
  ): ConsentDecision
  allRequiredGranted(): boolean
}
```
Consent is never assumed or inferred. The platform renders consent cards (deterministic task injection at pipeline step 11) and records the citizen’s explicit decision. If a required grant is declined, the journey cannot proceed past the consent state.
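The gating logic can be sketched as follows. This is a simplified illustration with an assumed local `Grant` shape, not the ConsentManager implementation; note that an unanswered required grant blocks progress just as a declined one does, since consent is never assumed.

```typescript
interface Grant { id: string; scope: string; purpose: string; mandatory: boolean }

// Sketch: a declined (or unanswered) required grant blocks journey progress.
class MiniConsentManager {
  private decisions = new Map<string, boolean>();

  constructor(private grants: Grant[]) {}

  recordDecision(grantId: string, granted: boolean): void {
    this.decisions.set(grantId, granted);
  }

  allRequiredGranted(): boolean {
    return this.grants
      .filter((g) => g.mandatory)
      .every((g) => this.decisions.get(g.id) === true);
  }
}
```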
Tracks the data fields required by a service, pre-fills from persona data where available, and records new fields extracted from conversation. The collector tells the prompt layer what is still missing, enabling the LLM to ask targeted questions.
```typescript
class FieldCollector {
  seed(personaData: Record<string, unknown>): void
  recordField(
    key: string,
    value: unknown,
    source: string
  ): void
  getMissing(): string[]
  isComplete(): boolean
}
```
Loads and caches structured artefacts from the service registry. Each service’s artefact set is loaded once and held in memory for the duration of the session.
```typescript
class ArtefactStore {
  loadFromRegistry(serviceId: string): Promise<number>
  fetchArtefacts(serviceId: string): Promise<ServiceArtefacts>
  getServiceArtefacts(serviceId: string): ServiceArtefacts | null
}
```
```typescript
interface ServiceArtefacts {
  manifest: CapabilityManifest;
  policy?: PolicyRuleset;
  consent?: ConsentModel;
  stateModel?: StateModelDefinition;
  stateInstructions?: StateInstructions;
}
```
The five artefact types form a complete service description. The manifest declares identity and input schema. The policy defines eligibility rules. The state model defines the journey. The consent model defines data sharing permissions. The state instructions provide per-state behavioural guidance for the agent.
The PolicyEvaluator supports seven operators for rule conditions. Each operator has strict type semantics — the evaluator does not perform type coercion.
| Operator | Semantics | Example |
|---|---|---|
| >= | Field value is greater than or equal to the specified value. Numeric comparison only. | age >= 16 |
| <= | Field value is less than or equal to the specified value. Numeric comparison only. | age <= 70 |
| == | Field value is strictly equal to the specified value. String or numeric. | jurisdiction == "England" |
| != | Field value is not equal to the specified value. | status != "disqualified" |
| exists | Field is present and non-null. Value parameter is ignored. | driving_licence_number exists |
| not-exists | Field is absent or null. Value parameter is ignored. | ban_end_date not-exists |
| in | Field value is one of the specified array values. | licence_type in ["full", "provisional"] |
Edge cases are distinct from failures. A rule with edge_case: true does not cause outright ineligibility. Instead, it flags the citizen for potential handoff to a human adviser. This handles ambiguous situations — a citizen who might be eligible but needs human judgement. The agent communicates uncertainty rather than making a binary decision.
Transitions can carry guard conditions that must be satisfied before the transition is allowed. Guards prevent premature advancement — a citizen cannot reach “payment-made” without first passing through “details-confirmed”.
```json
{
  "from": "eligibility-checked",
  "to": "consent-given",
  "trigger": "grant_consent",
  "guard": {
    "condition": "policy_result.eligible == true",
    "message": "Cannot proceed: citizen is not eligible for this service."
  }
}
```
The state machine evaluates guards synchronously. If a guard fails, the transition is rejected and the machine remains at its current state. The guard’s message is available to the agent for explaining why the journey cannot proceed.
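Guard evaluation itself is a small, deterministic step. The sketch below assumes guards are limited to simple `field == value` comparisons resolved against the pipeline context; the real guard grammar may be richer.

```typescript
interface Guard { condition: string; message: string }

// Sketch: resolves a dotted field path and checks a simple equality guard.
// Unparseable guards fail closed, so a malformed condition cannot let a
// transition through.
function guardPasses(guard: Guard, ctx: Record<string, unknown>): boolean {
  const match = guard.condition.match(/^([\w.]+)\s*==\s*(.+)$/);
  if (!match) return false; // fail closed
  const [, path, raw] = match;
  const actual = path.split(".").reduce<unknown>(
    (obj, key) => (obj as Record<string, unknown> | undefined)?.[key],
    ctx
  );
  let expected: unknown;
  try { expected = JSON.parse(raw); } catch { expected = raw; }
  return actual === expected;
}
```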
Every significant action produces an immutable trace event. The evidence plane enables audit, replay, and accountability across all citizen interactions.
The primary interface for writing events into the evidence store. Supports span-based tracing: a span groups related events (e.g., all events within a single pipeline run) under a common span ID.
```typescript
class TraceEmitter {
  startSpan(opts: SpanOptions): SpanContext
  emit(
    type: TraceEventType,
    span: SpanContext,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
  endSpan(
    span: SpanContext,
    type: TraceEventType,
    payload: Record<string, unknown>
  ): Promise<TraceEvent>
}
```
```typescript
interface TraceEvent {
  id: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  timestamp: string;
  type: TraceEventType;
  payload: Record<string, unknown>;
  metadata: {
    userId?: string;
    sessionId: string;
    capabilityId?: string;
  };
}
```
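Span-based grouping can be illustrated with a stripped-down emitter. This sketch uses an in-memory array and a simplified event shape; the real emitter is asynchronous and writes to the evidence store.

```typescript
import { randomUUID } from "node:crypto";

interface MiniEvent {
  id: string;
  traceId: string;
  spanId: string;
  type: string;
  timestamp: string;
  payload: Record<string, unknown>;
}

// Sketch: groups related events under a common span ID, appending to an
// in-memory store. Append-only: there is no update or delete path.
class MiniTraceEmitter {
  readonly events: MiniEvent[] = [];

  startSpan(traceId: string): { traceId: string; spanId: string } {
    return { traceId, spanId: randomUUID() };
  }

  emit(
    type: string,
    span: { traceId: string; spanId: string },
    payload: Record<string, unknown>
  ): MiniEvent {
    const event: MiniEvent = {
      id: randomUUID(),
      traceId: span.traceId,
      spanId: span.spanId,
      type,
      timestamp: new Date().toISOString(),
      payload,
    };
    this.events.push(event);
    return event;
  }
}
```

All events emitted within one pipeline run share a span ID, which is what lets the replay engine reconstruct a run as a unit.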
Every event is classified by type. Events marked as ledger-updating are written to the case ledger in addition to the trace store, making them available for case management and replay.
| Type | Description | Updates Ledger |
|---|---|---|
| state.transition | State machine moved to a new state | Yes |
| consent.granted | Citizen approved data sharing | Yes |
| consent.denied | Citizen declined data sharing | Yes |
| policy.evaluated | Eligibility rules checked | Yes |
| handoff.initiated | Escalation to human adviser triggered | Yes |
| capability.invoked | Department service called | Yes |
| llm.request | Language model call initiated | Yes |
| llm.response | Language model response received | Yes |
| credential.presented | Verifiable proof submitted | Yes |
| receipt.issued | Outcome documented with immutable receipt | Yes |
| field.extracted | Fact parsed from conversation | No |
| error.occurred | System error logged | No |
Generates immutable receipts from trace events. Receipts are the citizen-facing evidence that an action was taken. They include a reference to the service, the action performed, the outcome, and which data was shared.
```typescript
class ReceiptGenerator {
  generate(event: TraceEvent): Receipt
}

interface Receipt {
  id: string;
  capabilityId: string;
  action: string;
  outcome: string;
  timestamp: string;
  dataShared?: string[];
}
```
The case store maintains a ledger of in-flight and completed cases. It is updated from trace events that carry ledger-updating types. Each case aggregates the full history of a citizen’s interaction with a specific service, including state transitions, consent decisions, and receipts.
```typescript
class CaseStore {
  getCase(caseId: string): CaseRecord | null
  listByUser(userId: string): CaseRecord[]
  listByService(serviceId: string): CaseRecord[]
  updateFromEvent(event: TraceEvent): void
}

interface CaseRecord {
  id: string;
  userId: string;
  serviceId: string;
  currentState: string;
  stateHistory: string[];
  consentDecisions: ConsentDecision[];
  receipts: Receipt[];
  createdAt: string;
  updatedAt: string;
  status: "active" | "completed" | "handed-off" | "rejected";
}
```
The case store is a derived view — it is built entirely from trace events. If the case store is lost, it can be reconstructed by replaying events from the evidence store. This makes the evidence store the single source of truth.
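Because the case store is derived, rebuilding it is a fold over the event stream. A minimal sketch, assuming only `state.transition` events with a `to` field in the payload and a default initial state of "started":

```typescript
interface Ev { type: string; payload: Record<string, unknown> }

// Sketch: replay ledger-updating events in order to rebuild the current
// state and state history for one case.
function rebuildCase(events: Ev[]): { currentState: string; stateHistory: string[] } {
  const history: string[] = [];
  for (const e of events) {
    if (e.type === "state.transition" && typeof e.payload.to === "string") {
      history.push(e.payload.to);
    }
  }
  return {
    currentState: history[history.length - 1] ?? "started",
    stateHistory: history,
  };
}
```

The same fold, extended to consent and receipt events, is all that is needed to reconstruct a full CaseRecord from scratch.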
The replay engine reconstructs full conversation state from trace events. Given a trace ID, it steps through events in chronological order, rebuilding state machine position, consent decisions, collected fields, and the complete message history.
```typescript
class ReplayEngine {
  loadTrace(traceId: string): Promise<ReplaySession>
  stepForward(session: ReplaySession): ReplayFrame
  stepBackward(session: ReplaySession): ReplayFrame
  jumpToEvent(session: ReplaySession, eventId: string): ReplayFrame
}

interface ReplayFrame {
  eventIndex: number;
  totalEvents: number;
  currentState: string;
  collectedFields: Record<string, unknown>;
  consentState: Record<string, boolean>;
  event: TraceEvent;
}
```
Replay supports three use cases: (1) Audit — regulators or complaints teams step through a citizen’s journey to verify that rules were followed. (2) Debugging — engineers identify where a journey diverged from expected behaviour. (3) Dispute resolution — when a citizen contests an outcome, the replay provides incontrovertible evidence of what happened and why.
A typical pipeline run produces the following sequence of trace events.
A unified data model for citizen information with field-level provenance, tiered trust levels, and delegation-aware access control.
Every data field carries a trust tier. The tier determines how the field can be used, whether it requires consent to share, and whether the citizen can edit it.
Immutable, government-sourced data. Confirmed by a department system of record. The citizen cannot edit these fields — only the source department can update them. Examples: National Insurance number, driving licence number, passport details.
Citizen-entered data that has not been verified against a department source. The citizen can edit these fields at any time. Examples: contact preferences, correspondence address, accessibility needs.
Agent-derived data, flagged as such. Extracted from conversation context or computed from other fields. Always shown with an “inferred” label so the citizen knows its provenance. Examples: estimated income bracket, likely household composition.
Every field in the citizen data model carries source metadata. This enables the platform to show exactly where each piece of data came from and which consent grant authorised its use.
```typescript
interface FieldSource {
  source: string; // "HMRC", "DVLA", "Home Office"
  tier: "verified" | "submitted" | "inferred";
  topic: string; // "identity", "finance", "employment"
}
```
Manages delegation relationships: parents acting for children, power of attorney holders acting for vulnerable adults, executors managing a deceased person’s affairs. The store enforces scoped permissions and logs every delegated action for audit.
```typescript
class RelationshipStore {
  canActOnBehalf(
    actorId: string,
    subjectId: string,
    scope: PermissionScope
  ): boolean
  getPermissions(
    actorId: string,
    subjectId: string
  ): PermissionScope[]
  getSubjects(
    actorId: string
  ): Array<{ userId: string; type: RelationshipType }>
  logDelegatedAction(
    action: DelegatedAction
  ): void
}
```
| Relationship | Default Permissions |
|---|---|
| parent_of | full_authority |
| guardian_of | full_authority |
| executor_of | view_data, submit_on_behalf, make_payments, correspond |
| attorney_of | view_data, submit_on_behalf, manage_consent, make_payments, correspond |
| carer_of | view_status, receive_alerts |
| spouse | view_status, receive_alerts |
| representative_of | view_status, view_data, correspond |
Delegation is always scoped. A power of attorney holder can submit forms and manage consent, but a carer can only view status and receive alerts. The RelationshipStore enforces these boundaries at the API level — the agent cannot bypass them through conversational persuasion.
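The scope check can be sketched directly from the permissions table. This is an illustration, not the RelationshipStore implementation: it maps relationship type to default scopes (a subset of the table above) and treats full_authority as implying every scope.

```typescript
type Scope = string;

// Default permission scopes per relationship type, taken from the table above
// (subset shown for illustration).
const DEFAULT_PERMISSIONS: Record<string, Scope[]> = {
  parent_of: ["full_authority"],
  attorney_of: ["view_data", "submit_on_behalf", "manage_consent", "make_payments", "correspond"],
  carer_of: ["view_status", "receive_alerts"],
};

// Sketch: scoped delegation check; full_authority implies all scopes.
function canActOnBehalf(relationship: string, scope: Scope): boolean {
  const perms = DEFAULT_PERMISSIONS[relationship] ?? [];
  return perms.includes("full_authority") || perms.includes(scope);
}
```

Because the check runs at the API level, a conversational request ("my carer said I could let them manage consent") can never widen the granted scopes.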
The citizen data model aggregates fields from multiple departments into a single profile. Each field carries its source metadata, enabling the platform to track provenance across department boundaries.
```json
{
  "identity": {
    "full_name": { "value": "Margaret Chen", "tier": "verified", "source": "HMRC" },
    "date_of_birth": { "value": "1958-03-14", "tier": "verified", "source": "HMRC" },
    "ni_number": { "value": "QQ123456C", "tier": "verified", "source": "HMRC" }
  },
  "contact": {
    "address": { "value": "12 Oak Lane, Bristol", "tier": "submitted", "source": "citizen" },
    "email": { "value": "m.chen@email.com", "tier": "submitted", "source": "citizen" }
  },
  "finance": {
    "annual_income": { "value": 28400, "tier": "verified", "source": "HMRC" },
    "estimated_savings": { "value": "low", "tier": "inferred", "source": "agent" }
  }
}
```
The profile is organised by topic: identity, contact, finance, employment, health, transport, housing. Each topic maps to one or more source departments. When a service requests a field, the platform checks the consent model to determine whether the citizen has granted access to that topic from that source.
When a citizen consents to share data across departments, the platform mediates the exchange. The flow is always explicit: the citizen sees which fields will be shared, which department will receive them, and for what purpose.
The system prompt is not a monolith. It is assembled from layered fragments, each serving a specific purpose. Static layers are cached; dynamic layers change every turn.
In journey mode (service in progress), the system prompt is built from thirteen layers. Each layer adds context that shapes the agent’s behaviour for the current turn.
1. Agent personality: conservative / expansive discovery
2. Persona communication: Plain English / formal / assisted
3. Scenario context: service domain and background
4. Persona data: verified fields from citizen profile
5. Policy evaluation: eligibility results
6. Fact extraction: instructions for parsing citizen input
7. State model context: current state + allowed transitions
8. State instructions: per-state DO/DO NOT rules
9. Field collector status: required / collected / missing
10. Consent requirements: grants needed for this state
11. Accuracy guardrails: factual grounding rules
12. Task format: card type specifications
13. Structured output: JSON block schema
Each layer occupies a portion of the token budget. The layers are ordered by stability: static layers first, dynamic layers last.
Triage mode is lighter, omitting layers 4, 5, 7, 8, 9, and 10. The total token budget varies by model and configuration.
The system prompt is split into a static prefix and a dynamic suffix. The static prefix (agent personality, persona style, scenario context, guardrails) is marked with a cache control directive. The dynamic suffix (state context, field collector, extracted facts) changes every turn.
Cache hit rate: The static prefix is tagged with a cache control directive supported by the LLM provider. When the citizen remains within the same service journey, the prefix is reused from cache. This yields significant token savings on repeated turns within a journey. The savings compound across multi-turn conversations, which typically run 8–12 turns.
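The prefix/suffix split can be sketched as prompt assembly. The `cache_control` shape below is an assumption modelled on Anthropic-style prompt caching; the exact directive depends on the provider in use.

```typescript
interface PromptBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" }; // assumed Anthropic-style marker
}

// Sketch: static layers join into one cacheable prefix block; dynamic layers
// form a suffix block that is rebuilt every turn.
function buildSystemPrompt(
  staticLayers: string[],
  dynamicLayers: string[]
): PromptBlock[] {
  return [
    {
      type: "text",
      text: staticLayers.join("\n\n"),
      cache_control: { type: "ephemeral" }, // reused from cache within a journey
    },
    {
      type: "text",
      text: dynamicLayers.join("\n\n"), // never cached
    },
  ];
}
```

Keeping the split at a block boundary matters: if dynamic content leaked into the prefix, the cache key would change every turn and the savings would vanish.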
The language model must produce a JSON block at the end of every response. The orchestrator parses this block at pipeline step 8. The schema enforces a consistent contract between the LLM’s output and the deterministic validation that follows.
```json
{
  "title": "Renew your driving licence",
  "tasks": [
    {
      "type": "form",
      "title": "Confirm your details",
      "fields": ["full_name", "date_of_birth", "address"]
    }
  ],
  "transition": "details-confirmed",
  "facts": [
    { "key": "preferred_contact", "value": "email" }
  ]
}
```
The transition field is validated against the state machine (step 10). If the LLM proposes an illegal transition, it is silently dropped and the state machine remains at its current position. The tasks array may be overridden entirely by deterministic task injection (step 11) at certain states.
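The parse-then-validate step can be sketched as follows. The trailing-JSON extraction and the return shape are illustrative assumptions; the point is that an out-of-policy transition is dropped to `null` rather than applied.

```typescript
// Sketch: extract the trailing JSON block from the LLM response and validate
// the proposed transition against the allowed set. Illegal or unparseable
// proposals are silently dropped, as step 10 requires.
function parseStructuredOutput(
  response: string,
  allowedTransitions: string[]
): { transition: string | null; tasks: unknown[] } {
  const match = response.match(/\{[\s\S]*\}\s*$/); // trailing JSON block
  if (!match) return { transition: null, tasks: [] };
  let parsed: { transition?: string; tasks?: unknown[] };
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return { transition: null, tasks: [] }; // malformed block: drop everything
  }
  const transition =
    parsed.transition && allowedTransitions.includes(parsed.transition)
      ? parsed.transition
      : null;
  return { transition, tasks: parsed.tasks ?? [] };
}
```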
The agent operates in two modes. Mode selection happens at pipeline step 4 and determines which prompt layers are included and which tools are available.
No service context. The agent identifies the citizen’s need and proposes one or more services. Prompt layers 4, 5, 7, 8, 9, and 10 are omitted. No state machine, no field collector, no consent. The agent works from the full service catalogue to find the best match.
Service selected and artefacts loaded. All 13 prompt layers are active. The state machine tracks progress, the field collector tracks data, and the consent manager gates access. The agent operates within the boundaries defined by the service’s artefacts.
The agent personality layer (layer 1) defines the agent’s behavioural profile. Two personalities are currently defined, each tuned for different citizen needs.
| Personality | Behaviour | Best For |
|---|---|---|
| Conservative discovery | Cautious, step-by-step, asks for explicit confirmation before advancing. Never skips questions, even when data is already known. Prioritises citizen understanding over speed. | Vulnerable users, complex services, high-stakes decisions (benefits, legal) |
| Expansive discovery | Proactive, efficient, pre-fills forms and auto-submits where possible. Explains what it has done rather than asking permission first. Prioritises speed while maintaining accuracy. | Routine renewals, confident users, repeat interactions |
Personality does not affect deterministic behaviour. Both conservative and expansive discovery modes follow the same state machine, the same policy rules, and the same consent requirements. The personality only affects how the agent communicates — whether it asks before acting or acts before explaining. The platform’s guarantees hold regardless of personality.
Two strategies for connecting services to the runtime, a tool generation pipeline, and a three-tier service store with gap analysis.
JsonServiceStrategy (deterministic)
No tools are given to the LLM. Service context is built entirely from structured artefacts and the field collector. All state transitions are proposed by the LLM in its structured output and validated by the orchestrator’s state machine.
Best for: Services with complete, hand-crafted artefact sets. Maximum determinism, minimum LLM autonomy.
McpServiceStrategy (tool-based)
The LLM receives check_eligibility and advance_state tools via the adapter. It calls these during the agentic loop (step 7). Tool results are parsed for state transitions and fed back into the validation pipeline.
Best for: Service-graph services that expose department APIs. The LLM has more autonomy but is still bounded by the state machine.
The server generates a standard set of integration points per service from its structured artefacts. This ensures a consistent interface regardless of the underlying department system.
Services come from three sources, with descending levels of artefact completeness. The store applies precedence rules: a fully described service always overrides a graph entry, which overrides a catalogue entry.
Hand-crafted services with complete artefact sets: manifest, policy, state model, consent, and state instructions. Highest precedence. These are the services that work end-to-end through the agent.
Service-graph nodes with metadata, department attribution, and partial artefacts. Sufficient for triage and eligibility checking, but not for full journey orchestration.
The current GOV.UK service register contains approximately 1,500 services with names and department attribution. Used for service discovery during triage. No artefacts — the agent can identify the service but cannot orchestrate a journey.
The Legibility Studio computes per-department coverage by comparing full (Tier 1) services against the total catalogue. This surfaces exactly which services need artefact authoring and in what priority order.
Coverage formula: For each department, gap analysis computes full_count / total_count. A department with 3 fully described services out of 47 total has 6.4% coverage. The studio ranks departments by total service volume and highlights the highest-impact gaps — the services that appear in the most citizen life events.
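The formula is small enough to state directly. A sketch, rounding to one decimal place as in the 6.4% example:

```typescript
// Sketch of the gap-analysis coverage formula: full (Tier 1) services
// divided by total catalogue entries for a department, as a percentage.
function coveragePercent(fullCount: number, totalCount: number): number {
  if (totalCount === 0) return 0; // empty catalogue: define coverage as 0
  return Math.round((fullCount / totalCount) * 1000) / 10; // one decimal place
}
```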
Integration teams should start with high-volume, cross-departmental services: bereavement notifications, child benefit claims, driving licence renewals. These appear in multiple life events and benefit the most citizens when made fully legible to agents.
Departments author artefacts through the Legibility Studio or by editing structured artefacts directly. The following sequence represents the recommended authoring order.
The capability manifest is the entry point for any service. Here is the structure for a typical renewal service.
```json
{
  "id": "dvla-renew-driving-licence",
  "name": "Renew a driving licence",
  "department": "Driver and Vehicle Licensing Agency",
  "description": "Renew a photocard driving licence that is expiring or has expired.",
  "version": "1.0.0",
  "input_schema": {
    "required": [
      "full_name",
      "date_of_birth",
      "driving_licence_number",
      "national_insurance_number",
      "address",
      "photo"
    ],
    "optional": ["email", "phone"]
  },
  "output_schema": {
    "produces": ["application_reference", "expected_delivery_date"]
  },
  "estimated_duration": "10 minutes",
  "fee": { "amount": 14.00, "currency": "GBP" }
}
```
The input schema drives the field collector. When the orchestrator loads a service, it seeds the FieldCollector from the manifest’s input_schema.required array. Fields already present in the citizen’s profile are marked as collected; the remainder become the “missing fields” list that the agent uses to ask targeted questions.
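The seeding step reduces to a set difference. A sketch, using the required fields from the DVLA manifest example; the helper name is illustrative, not the FieldCollector API.

```typescript
// Sketch: given the manifest's required fields and the citizen's profile,
// compute the "missing fields" list that drives targeted questions.
function missingFields(
  required: string[],
  profile: Record<string, unknown>
): string[] {
  return required.filter((f) => profile[f] === undefined || profile[f] === null);
}
```

A citizen whose profile already holds name and date of birth would only be asked for the remaining fields, never for data the platform already has.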
Four assurance layers ensure that artefacts are correct, the pipeline behaves deterministically, model changes are governed, and compliance requirements are met continuously.
Before a department’s artefacts enter the service registry, they pass through automated validation and simulation.
The deterministic pipeline is tested independently of any LLM provider. Because steps 1–6 and 8–14 are pure computation, they can be verified exhaustively against known inputs and expected outputs.
| Assurance Area | Method | Coverage |
|---|---|---|
| Policy evaluation | Deterministic tests against all seven operators with boundary conditions. Each operator is tested with valid, invalid, missing, and edge-case inputs. | All rule types and operators |
| State machine transitions | Every defined transition, guard, and terminal state is exercised. Illegal transitions are verified to be rejected. Auto-transitions and forced transitions are validated. | Complete state graph per service |
| Consent enforcement | Verification that required grants block progress when declined, optional grants do not, and all decisions are recorded in the evidence store. | All consent models |
| Evidence integrity | Trace events are verified for completeness (every pipeline run produces a span), ordering (events are chronological), and immutability (events cannot be modified after creation). | All event types |
When the underlying language model is changed or updated, the platform runs a structured evaluation process before the new model enters production.
Automated checks run continuously to verify that the platform meets its policy and consent obligations.
Four continuous compliance checks:
The platform handles sensitive citizen data across department boundaries. Security, data residency, and governance are architectural concerns, not afterthoughts.
Citizen identity is established through integration with GOV.UK One Login, the cross-government identity platform. The agent does not manage credentials directly — authentication is delegated to One Login, and the platform receives a verified identity token. Session management uses standard secure token patterns with appropriate expiry and refresh mechanisms.
All citizen data is processed and stored within UK jurisdiction. The platform infrastructure is hosted in UK data centres. No citizen personally identifiable information (PII) leaves UK borders during processing, storage, or transit. Department data shared through the platform remains subject to each department’s existing data governance framework.
In transit: all communications between the citizen application, the platform services, and department APIs use TLS 1.2 or higher. At rest: the evidence store, citizen data profiles, and consent records are encrypted using AES-256. Encryption keys are managed through a dedicated key management service with automatic rotation.
Citizen PII sent to the language model is processed in-session only. The platform’s contract with LLM providers stipulates that citizen data is not used for model training, is not logged beyond the session, and is not accessible to the provider’s staff. The LLM Adapter Layer enforces this boundary — it controls exactly which citizen data fields are included in prompts, and strips sensitive fields (financial details, health records) unless explicitly required by the current service and covered by a consent grant.
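The adapter-layer boundary described above can be sketched as a pure filtering step applied before prompt construction. Everything here is an assumption for illustration: the sensitive-field set, function name, and parameters are not the platform's real implementation.

```python
# Illustrative sensitive-field set; the real list is defined by the platform.
SENSITIVE_FIELDS = {"financial_details", "health_records"}

def fields_for_prompt(profile: dict, service_required: set, consented: set) -> dict:
    """Select the citizen data fields safe to include in an LLM prompt.

    A sensitive field passes only if the current service requires it AND
    a consent grant covers it; all other sensitive fields are stripped.
    """
    allowed = {}
    for name, value in profile.items():
        if name in SENSITIVE_FIELDS:
            if name in service_required and name in consented:
                allowed[name] = value
        else:
            allowed[name] = value
    return allowed
```

Putting the filter in front of prompt construction, rather than trusting the prompt itself, is what makes the boundary enforceable in code rather than in instructions to the model.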
The Consent Manager is a hard gate, not advisory. When a service requires data that the citizen has not consented to share, the pipeline halts. The LLM cannot override consent decisions through conversational persuasion or prompt injection. Consent decisions are recorded as immutable trace events and are available for audit at any time. Consent can be withdrawn, at which point the platform ceases to share the affected data fields with the relevant service.
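A minimal sketch of the hard gate, under assumed names (`ConsentRequired`, `enforce_consent`, and the trace-event shape are all hypothetical): every decision is recorded first, and a missing or declined required grant halts the pipeline by raising, regardless of anything the LLM proposed.

```python
class ConsentRequired(Exception):
    """Raised when a required grant is missing or declined; the pipeline halts."""

def enforce_consent(required_grants: list, decisions: dict, trace: list) -> None:
    for grant in required_grants:
        status = decisions.get(grant, "not_asked")
        # Record the decision as an immutable trace event before gating.
        trace.append({"event": "consent_decision", "grant": grant, "status": status})
        if status != "granted":
            raise ConsentRequired(grant)  # hard stop; no conversational override
```

Because the gate is ordinary control flow in deterministic code, no prompt content can route around it: the LLM never sees a code path in which an unconsented field is shared.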
Every significant action is captured in the evidence store as an immutable trace event. The evidence store is append-only — events cannot be modified or deleted after creation. This provides a complete, tamper-evident record of every policy evaluation, consent decision, state transition, LLM interaction, and receipt issued. The replay engine can reconstruct the exact state of any citizen interaction at any point in time.
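One common way to make an append-only log tamper-evident is to hash-chain its events, so that modifying any past event invalidates every later hash. The document specifies append-only immutability but not this mechanism, so the sketch below is illustrative rather than a description of the real store.

```python
import hashlib
import json

class EvidenceStore:
    """Append-only, tamper-evident event log (hash-chain sketch, not the real store)."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> str:
        # Each record's hash covers the previous hash, chaining the log.
        prev = self._events[-1]["hash"] if self._events else "genesis"
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._events.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain; any mutated or reordered event breaks it.
        prev = "genesis"
        for rec in self._events:
            payload = json.dumps(rec["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```

Replay then reduces to re-applying the verified event sequence in order, which is what lets the engine reconstruct conversation state at any point.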
The architecture’s separation of deterministic and AI layers provides structural protection against prompt injection. Even if a malicious input manipulates the LLM’s response, the deterministic pipeline (steps 8–14) validates all outputs: state transitions must be legal, consent must be granted, policy rules must pass. The LLM proposes; the platform disposes.
The RelationshipStore enforces scoped permissions for delegated access. A power of attorney holder, parent, or carer can act on behalf of another citizen only within their declared permission scope. Every delegated action is logged separately in the evidence store, recording both the actor and the subject. Permission boundaries are enforced at the API level and cannot be circumvented through the conversational interface.
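The scope check and dual-identity logging can be sketched as follows; the relationship record shape and function name are assumptions for illustration, not the RelationshipStore's actual interface.

```python
def check_delegated_action(relationship: dict, action: str, trace: list) -> None:
    """Allow a delegated action only within the declared permission scope.

    Every permitted action is logged with both the actor (the delegate)
    and the subject (the citizen acted for). Illustrative sketch only.
    """
    if action not in relationship["scope"]:
        raise PermissionError(f"{action!r} is outside the declared permission scope")
    trace.append({
        "event": "delegated_action",
        "actor": relationship["actor"],
        "subject": relationship["subject"],
        "action": action,
    })
```

Enforcing this at the API level means the check runs on every call path, so a conversational request that exceeds the scope fails in exactly the same way as a direct API call would.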
The platform is designed to degrade gracefully. Every failure mode has a defined handling strategy that preserves citizen trust and data integrity.
The platform always has a safe fallback. No failure mode results in a citizen being left without guidance, losing their progress, or having data silently dropped. The worst case is a degraded experience with human support — never a broken one.
The architecture enforces a single principle: the language model generates conversation; deterministic code enforces rules. Every integration point, every package boundary, and every type signature exists to maintain that separation.