Running AI agents in regulated industries is an engineering problem. The models work. The hard part is building the infrastructure around them (audit systems, approval workflows, data boundaries, access controls) so the organisation can demonstrate compliance at any point. This reference covers the technical controls and architectural patterns required.
The governance stack
Compliant AI agent deployment requires controls at every layer. Governance added after the fact does not work. It has to be structural.
Immutable audit trails
Every action an AI agent takes must be recorded in a tamper-evident log:
- Inputs: the documents, data, or queries the agent received
- Processing: which model was invoked, what version, what configuration parameters were active
- Reasoning: intermediate steps, retrieved context, and decision logic
- Outputs: the final result, recommendation, or action taken
- Human review: whether the output was reviewed, by whom, when, and what decision was made
The log must be append-only. Records cannot be modified or deleted. Timestamps must be reliable and synchronised. Storage must meet the retention requirements of the applicable regulatory framework, which in financial services typically means five to seven years.
This is not a debugging tool. It is a regulatory artefact. When the FCA requests evidence of your decision-making process, or the SRA asks how client data was handled, this log is what you produce.
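One common way to make a log tamper-evident is hash chaining: each record embeds a hash of the previous record, so altering any entry breaks the chain. A minimal sketch (field names illustrative; a production system would also anchor the chain to external storage with WORM retention):

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only, tamper-evident log: each record carries the hash
    of the previous record, so any modification breaks the chain."""

    def __init__(self):
        self._records = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,
            "event": event,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self) -> bool:
        """Recompute the chain; returns False if any record was altered."""
        prev = "0" * 64
        for record in self._records:
            if record["prev_hash"] != prev:
                return False
            body = {k: v for k, v in record.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

The `verify` pass is what an auditor (or a scheduled integrity job) runs to demonstrate the trail has not been modified since records were written.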
Human-in-the-loop controls
Not every agent action requires human approval. But the system must support configurable approval gates, settable per process, per risk level, or per output confidence score.
Three common patterns:
Pre-execution review. The agent prepares a recommendation but does not act until a human approves. Suited to high-stakes decisions: binding insurance risks, issuing legal advice, making investment recommendations.
Post-execution review. The agent acts immediately but flags outputs for human review within a defined window. Suited to high-volume, lower-risk processes where speed matters: document classification, data extraction, routine correspondence.
Exception-based review. The agent acts autonomously unless it hits an edge case or produces a low-confidence output. Suited to well-understood processes where accuracy is established: standard bordereaux validation, routine compliance checks.
The approval workflow must be integrated into the agent’s execution pipeline, not layered on top as a separate system. The agent must be aware of its approval status and halt execution when required.
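The three review patterns can be expressed as a single gate inside the execution pipeline. A minimal sketch, assuming a `task` object that exposes `execute()` and `queue_for_review()` (both hypothetical names):

```python
from enum import Enum

class ReviewMode(Enum):
    PRE_EXECUTION = "pre"          # halt until a human approves
    POST_EXECUTION = "post"        # act, then queue for review
    EXCEPTION_BASED = "exception"  # act unless confidence is low

def run_with_gate(task, mode: ReviewMode, confidence: float,
                  threshold: float = 0.85):
    """Approval gate embedded in the execution pipeline: the agent
    knows its approval status and halts when required."""
    if mode is ReviewMode.PRE_EXECUTION:
        task.queue_for_review()    # human must approve before any action
        return "awaiting_approval"
    if mode is ReviewMode.EXCEPTION_BASED and confidence < threshold:
        task.queue_for_review()    # low-confidence output: escalate
        return "escalated"
    result = task.execute()
    if mode is ReviewMode.POST_EXECUTION:
        task.queue_for_review()    # review within the defined window
    return result
```

The gate runs before (or around) every action, which is what distinguishes an integrated workflow from an approval system layered on top.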
Role-based access control
Compliant deployment requires granular permissions across several dimensions:
- Builder permissions: who can create and configure agents, define their data access, and set behavioural parameters
- Deployment permissions: who can promote agents from testing to production
- Review permissions: who can approve or reject agent outputs
- Audit permissions: who can access the audit trail and export records for regulatory review
- Administration permissions: who can modify system configuration, manage users, and update security settings
These roles must map to the organisation’s existing governance structure. In financial services under SM&CR, this means the Senior Manager responsible for AI systems can trace accountability from oversight through deployment and review.
Deployment architecture
Where AI agents run determines whether compliance is achievable. The deployment model is a regulatory decision, not just a procurement one.
Private cloud deployment
The most common architecture for regulated deployments places all components within the organisation’s own cloud tenancy:
Infrastructure layer. Compute, storage, and networking within a single cloud region (typically Azure UK South, AWS eu-west-2, or equivalent). The organisation holds the subscription and manages access through its own identity provider.
Model layer. Large language models run as containers within this tenancy. Open-weight models (Llama, Mistral, and similar) can be deployed directly. Commercial models from Anthropic or OpenAI can be accessed through private endpoints (Azure OpenAI Service, AWS Bedrock) that do not route data outside the tenancy boundary.
Encryption. Data at rest is encrypted with keys managed through the organisation’s own key management service (Azure Key Vault, AWS KMS). The AI platform vendor does not hold decryption keys.
Network controls. The platform operates within a virtual network with no public internet exposure. Access is through VPN or private endpoints only. See the security architecture for details on how these controls are implemented.
This architecture satisfies GDPR data residency requirements, FCA expectations around data security, and Lloyd’s requirements for managing agent systems and controls. If the relationship with the AI platform vendor ends, the organisation retains all data and can export all agent configurations.
On-premises deployment
Some organisations, particularly in defence, certain government departments, and firms handling highly sensitive financial data, require air-gapped deployment with no cloud dependency.
On-premises deployment runs the full stack on the organisation's own hardware: model inference, agent orchestration, governance controls, and storage. This gives maximum isolation, but the organisation must manage the underlying infrastructure, including GPU provisioning for model inference.
The trade-off is operational complexity. Model updates, platform upgrades, and scaling must be handled internally or through scheduled on-site support.
Hybrid deployment
A hybrid model places the platform management layer (agent configuration, monitoring dashboards, user management) in the vendor’s infrastructure while keeping model execution and data storage within the organisation’s environment.
This reduces operational overhead while maintaining data sovereignty. The management plane communicates with the execution plane through encrypted channels and never processes or stores the organisation’s data.
Hybrid works well for organisations that want rapid setup and managed platform operations but cannot allow data to leave their infrastructure boundary.
Model management
Running AI agents in production requires treating models as managed assets with their own lifecycle.
Model selection and hosting
The architecture should be model-agnostic. Locking into a single model provider creates vendor dependency and limits the ability to respond to advances in the field or changes in provider terms.
A practical approach supports:
- Open-weight models deployed as containers within the organisation’s infrastructure, offering maximum control. The organisation holds the weights, controls the inference environment, and faces no external data transmission.
- Commercial models accessed through private API endpoints within the organisation’s cloud tenancy. Azure OpenAI Service and AWS Bedrock provide this without routing data to the model provider’s general infrastructure.
- Specialised models fine-tuned on domain-specific data for particular tasks, running alongside general-purpose models, with the orchestration layer routing tasks to the appropriate model.
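Model-agnostic routing can be as simple as a configuration table the orchestration layer consults per task. A sketch (task types, backends, and model names are illustrative):

```python
# Illustrative routing table: swapping a provider becomes a
# configuration change rather than a code change.
MODEL_ROUTES = {
    "contract_review": {"backend": "private-endpoint", "model": "gpt-4o"},
    "bordereaux_extraction": {"backend": "container", "model": "llama-3-70b"},
    "default": {"backend": "container", "model": "mistral-7b"},
}

def route(task_type: str) -> dict:
    """Return the configured backend and model for a task type,
    falling back to the default route."""
    return MODEL_ROUTES.get(task_type, MODEL_ROUTES["default"])
```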
Version control and rollback
Every model deployed in the system must be versioned. The audit trail must record which model version produced each output. When a model is updated, the system must:
- Record the change: what version was replaced, what was deployed, who authorised it, and when
- Run validation against a reference dataset to confirm the new version meets accuracy thresholds
- Support rollback to the previous version if the new version produces degraded results in production
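The three requirements above can be combined in a small registry. A minimal sketch, assuming a caller-supplied `validate` function that scores a candidate version against the reference dataset:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelRegistry:
    """Versioned deployment with validation gating and rollback."""
    history: list = field(default_factory=list)
    active: str = None

    def deploy(self, version: str, validate, threshold: float,
               authorised_by: str):
        # Validation runs before promotion: a failing version
        # never becomes active.
        score = validate(version)
        if score < threshold:
            raise ValueError(
                f"{version} scored {score:.2f}, below threshold {threshold}")
        # Record the change: what was replaced, what was deployed,
        # who authorised it, and when.
        self.history.append({
            "replaced": self.active,
            "deployed": version,
            "score": score,
            "authorised_by": authorised_by,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.active = version

    def rollback(self):
        """Revert to the version the last deployment replaced."""
        if self.history:
            self.active = self.history[-1]["replaced"]
```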
Drift monitoring
Model performance changes over time as the data distribution shifts. A compliant deployment must monitor for drift:
- Output quality metrics. Accuracy, precision, recall, and domain-specific measures tracked over time.
- Confidence distribution. Shifts in the agent’s confidence scores may indicate the model is encountering data outside its training distribution.
- Human override rates. An increase in reviewers rejecting agent outputs signals degradation.
When drift is detected, the system should flag the affected agent for review rather than allowing it to continue producing outputs of declining quality.
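Of the three signals, the human override rate is the simplest to instrument. A sketch of a sliding-window monitor (window size, minimum sample, and threshold are illustrative tuning parameters):

```python
from collections import deque

class OverrideRateMonitor:
    """Flags an agent for review when the human override rate over
    a sliding window exceeds a threshold."""

    def __init__(self, window: int = 200, threshold: float = 0.10,
                 min_samples: int = 50):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, overridden: bool) -> bool:
        """Record one reviewed output; returns True when the agent
        should be flagged for review."""
        self.outcomes.append(overridden)
        if len(self.outcomes) < self.min_samples:
            return False  # too few samples to judge
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.threshold
```

Confidence-distribution and accuracy monitoring follow the same pattern: compute a rolling statistic, compare against a baseline, and route breaches to the review queue rather than silently continuing.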
Integration patterns
AI agents in regulated environments must connect to existing enterprise systems without creating new compliance gaps.
API-based integration
The standard approach for systems with modern APIs. The agent accesses source systems through authenticated API calls, with each request logged in the audit trail. OAuth 2.0 or certificate-based authentication ensures agent access is traceable and revocable.
Common integrations include document management systems (SharePoint, NetDocuments), CRM platforms (Salesforce), core insurance platforms, and email systems (Microsoft Graph API for Outlook).
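A thin wrapper is enough to guarantee that every source-system call lands in the audit trail. A sketch, assuming a requests-style `client` exposing `request(method, url, **kwargs)` and returning an object with a `status_code`:

```python
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("agent.audit")

def audited_call(client, method: str, url: str, agent_id: str, **kwargs):
    """Wrap a source-system API call so the request, the acting
    agent, and the response status are all logged."""
    started = datetime.now(timezone.utc).isoformat()
    response = client.request(method, url, **kwargs)
    audit_log.info("agent=%s method=%s url=%s status=%s at=%s",
                   agent_id, method, url, response.status_code, started)
    return response
```

Routing all integrations through one wrapper also gives a single place to enforce token refresh and revocation checks.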
Browser-based computer control
For legacy systems without APIs (and regulated industries have many), agents can interact through browser-based automation. The agent controls a browser session, navigating web interfaces, reading screen content, and performing actions as a human would.
This requires additional governance: the browser session must be recorded, actions must be logged at a granular level, and the agent must operate within defined boundaries to prevent unintended interactions.
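One concrete boundary is a host allowlist checked before every navigation, so the agent cannot wander outside the systems it is authorised to operate. A minimal sketch (the hostname is illustrative):

```python
from urllib.parse import urlparse

# Illustrative boundary: the only system this agent may operate.
ALLOWED_HOSTS = {"legacy-policy-admin.internal"}

def navigation_allowed(url: str) -> bool:
    """Run before every agent navigation; deny anything outside
    the explicitly allowed hosts."""
    return urlparse(url).hostname in ALLOWED_HOSTS
```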
Data ingestion pipelines
Document-heavy processes require structured ingestion: receiving documents from multiple sources (email, portals, file shares), classifying them, extracting relevant data, and passing structured outputs to downstream agents or systems.
The pipeline must handle format diversity (PDF, DOCX, XLSX, scanned images, emails with attachments) and maintain provenance. Every extracted data point must trace back to its source document and location within that document.
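Provenance is easiest to enforce if it is part of the data type itself, so an extracted value cannot exist without its source reference. A sketch (field names illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedValue:
    """An extracted data point that always carries provenance back
    to its source document and location within that document."""
    field: str
    value: str
    source_document: str
    page: int
    bounding_box: tuple  # (x0, y0, x1, y1) on the page

premium = ExtractedValue(
    field="gross_premium",
    value="125000.00",
    source_document="bordereau_2024_Q3.pdf",
    page=4,
    bounding_box=(102.0, 310.5, 198.2, 324.1),
)
```

Because the dataclass is frozen and every field is required, a downstream agent cannot receive a value whose origin has been lost, which is what makes the provenance claim auditable.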
Implementation approach
Deploying AI agents in regulated industries works when scoped tightly and delivered iteratively. The pattern that fails is the eighteen-month transformation programme that tries to automate everything at once.
Phase 1: Assessment (weeks 1-2)
Identify the specific processes where AI agents can deliver measurable value. Assess the data environment, regulatory constraints, and integration requirements. Produce a prioritised roadmap based on impact and feasibility.
Phase 2: First agent (weeks 3-6)
Build, test, and deploy a single agent against a real process with real data. This phase establishes the infrastructure: cloud environment, governance controls, audit logging, and human review workflows.
The goal is not a proof of concept. It is a production system processing real work under real governance. If the agent cannot satisfy compliance requirements, the issues surface now rather than after twelve months of development.
Phase 3: Expand (months 2-6)
With the platform and governance layer in place, subsequent agents are faster to deploy. Each new agent reuses the existing infrastructure, audit framework, and approval workflows. The organisation builds internal capability to configure and manage agents, reducing dependency on the implementation partner.
Phase 4: Operate (ongoing)
Ongoing operation covers model updates, drift monitoring, new agent development, and periodic regulatory review. The governance framework evolves as regulations change and the organisation’s use of AI agents matures.
Evaluating an AI agent platform
When assessing platforms for regulated AI agent deployment, these capabilities matter:
| Capability | Why it matters |
|---|---|
| Private deployment | Data sovereignty and regulatory compliance require infrastructure you control |
| Immutable audit trails | Regulators require evidence of every AI decision and its reasoning |
| Configurable human oversight | Different processes need different levels of human involvement |
| Model agnosticism | Avoid vendor lock-in; use the best model for each task |
| Role-based access control | Accountability requires traceable permissions aligned to governance structures |
| Explainable outputs | Regulators, clients, and affected individuals need comprehensible explanations |
| Integration framework | Agents must connect to existing systems without creating new data leakage risks |
| Export capability | If you change platforms, you must retain your data, configurations, and audit history |
Platforms that treat these as premium add-ons rather than core architecture are not built for regulated environments.