Executive summary
An overlay approach preserves the core’s stability while delivering measurable business outcomes in months, not years. For boards and CIOs, the overlay model is the shortest path from PoC to production, with controlled risk, clear FinOps levers, and auditability. (Accenture)
Why you shouldn’t “rip and replace” the core for every AI idea
Core banking systems — whether monolithic mainframes or modern cloud cores — are the single source of truth for ledger state, settlement, regulatory reporting, and reconciliation. Touching them for every new AI feature multiplies testing, increases regulatory reviews, and risks operational continuity. Many banks that attempt continual core changes find delivery times measured in quarters or years and face ballooning costs.
Instead, overlays treat the core as authoritative while attaching a flexible, observable orchestration layer that:
- reads data in real time (or via CDC / event streams),
- enriches and reasons using agentic AI patterns, and
- writes only narrow, reversible artifacts (decision IDs, status flags, correlation IDs) under strict policy controls.
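To make "narrow, reversible artifacts" concrete, here is a minimal sketch in Python. The type name, field names, and status value are illustrative assumptions, not a prescribed schema; the point is that the overlay's write surface is a small, immutable record of flags and IDs, never ledger fields.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class WritebackArtifact:
    """The only payload the overlay may write to the core:
    a status flag plus identifiers — never balances or ledger state."""
    correlation_id: str   # links the overlay's decision log to the core record
    decision_id: str      # immutable ID of the overlay decision
    status_flag: str      # e.g. "PRE_APPROVED_PENDING_REVIEW" (illustrative)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def make_artifact(decision_id: str, status_flag: str) -> WritebackArtifact:
    """Mint a new writeback artifact with a fresh correlation ID."""
    return WritebackArtifact(
        correlation_id=str(uuid.uuid4()),
        decision_id=decision_id,
        status_flag=status_flag,
    )
```

Because the dataclass is frozen, an artifact cannot be mutated after creation; reversal happens by writing a new artifact, which keeps the audit trail append-only.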
This separation keeps transaction integrity in the core while allowing rapid experimentation and safe automation on top. It also reduces the scope of security and audit reviews for new features — the overlay surface can be hardened and certified independently. (McKinsey)
The overlay architecture — roles, patterns, and why they map to outcomes

A practical overlay is not a single chatbot but a small ecosystem of clearly bounded services and agent roles. Think in terms of patterns you can reuse across product lines:
Core roles (pattern catalog)
- Router (Identity & Scope): Authenticates, masks PII, labels jurisdiction and channel, and decides which orchestration pattern to use.
- Planner (Flow Engine): Breaks the work into ordered steps (fetch docs, run scorer, craft message, schedule callback), chooses model tiers, and applies retries/timeouts.
- Knowledge / Retrieval (RAG): Pulls exact policy, pricing, product rules, and customer history; returns citations and confidence bands.
- Tool Executor (Action): Executes scoped APIs — e.g., create a ticket, schedule a callback, issue a status flag — under least privilege.
- Supervisor (Guardrails & HITL): Enforces policy-as-code, rate limits, redaction, and human approvals for exceptions.
- Critic / Telemetry (FinOps & QA): Samples outputs for drift, grounded-answer rate, cost per action, and triggers rollbacks if thresholds fail.
Why this matters: each role is auditable, replaceable, and testable. You can instrument the Planner to route heavy synthesis to large models only for edge cases, and keep cheap classification models in the fast path — giving predictable cost and predictable latency.
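A minimal sketch of the Planner's tier-routing logic described above. The step names, tier labels, and 0.8 confidence threshold are assumptions for illustration; a real Planner would also weigh step SLAs and budgets.

```python
def choose_model_tier(step: str, confidence: float) -> str:
    """Route cheap classification to small models and reserve
    large models for low-confidence or synthesis-heavy steps."""
    if step == "classify":
        return "small"                      # fast path, predictable cost
    if step in ("summarize", "score"):
        # escalate only when the cheaper tier is unlikely to be reliable
        return "mid" if confidence >= 0.8 else "large"
    # explanation / legal text always goes to the large tier
    return "large"
```

Keeping this decision in one pure function makes the routing policy itself testable and auditable, which is the point of isolating the Planner role.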
Four proven overlay patterns (and the business problems they solve)
1. Read-heavy Decision Layer (real-time, narrow write)
   - Use when you must give a near-instant decision (pre-approval, eligibility check) without touching accounting flows.
   - Business impact: shorter time-to-decision, higher conversion, fewer manual handoffs.
2. Event-driven Orchestration (stream → enrich → act)
   - Subscribe to core events (transactions, disbursements) and orchestrate follow-ups (fraud triage, promise management).
   - Business impact: fewer late claims, higher straight-through processing, lower complaint rates.
3. Batch Augmentation & Change Sets (safe reconciliation)
   - Run heavy models in batch windows; produce reconciled change lists for controlled ingestion.
   - Business impact: rapid portfolio repricing, better portfolio health without 24/7 core load.
4. API Facade & Experience Mesh (fast front ends)
   - Build composite APIs that merge core reads with AI enrichment to power digital assistants, partner APIs, or white-label endpoints.
   - Business impact: faster time to market for new channels and partner features.
Each pattern keeps the core’s transactional semantics intact while unlocking the specific business value you need now.
Governance and auditability — the non-negotiables

Regulators and auditors will ask three questions: who made the decision, what data supported it, and can you reconstruct that path? The overlay must make those answers trivial.
Must-have guardrails:
- Policy-as-code: encode rules (e.g., writeback conditions, rate limits, disclosure text) as machine-enforceable policies the Supervisor executes at runtime.
- Immutable decision logs: for every automated action record: request, retrieval IDs (document and chunk IDs), model ID/version, prompts, responses, Supervisor outcome, and the user or human approver.
- Scoped write contracts: define exactly what fields the overlay can change and under what approvals.
- Data residency & encryption controls: ensure all PII handling complies with residency rules and encryption standards.
- Testing & canary rollouts: any change to Planner, Knowledge or Supervisor must pass automated quality checks and staged canary releases with rollback triggers.
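One way to make a decision log tamper-evident, and therefore credibly "immutable" to an auditor, is to hash-chain the entries so that altering any historical record breaks the chain. The sketch below is a minimal illustration of that idea, not a prescribed log format; production systems would typically write to append-only storage as well.

```python
import hashlib
import json

def append_decision(log: list, record: dict) -> list:
    """Append a decision record; each entry embeds the hash of the
    previous entry, so after-the-fact edits are detectable."""
    prev_hash = log[-1]["entry_hash"] if log else "genesis"
    body = {**record, "prev_hash": prev_hash}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "entry_hash": entry_hash})
    return log

def verify_chain(log: list) -> bool:
    """Recompute every hash; returns False if any entry was altered."""
    prev = "genesis"
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

Each record would carry the fields listed above (request, retrieval IDs, model ID/version, prompts, responses, Supervisor outcome, approver), and `verify_chain` gives auditors a one-call integrity check.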
These controls reduce audit friction, demonstrate compliance by design, and make governance an enabler rather than a blocker.
FinOps & performance levers — keeping AI cost predictable
AI overlays introduce recurring compute and model costs — but they also create levers to manage them:
- Model routing & tiering: classify → cheap model; summarize/score → mid model; explain/legal text → large model. The Planner should route automatically based on step SLAs and confidence.
- Cache & TTL: cache retrieval results and common decision artifacts with sensible TTLs to avoid repeated costly retrievals.
- Batching for heavy work: push expensive retraining, re-scoring, or recompute to scheduled windows.
- Per-decision cost attribution: tag every run with a cost token so product owners see spend per outcome and can trade off quality against price.
- Fallback & portability: abstract model providers so you can route to a cheaper provider for low-value flows and switch if SLA/cost slips.
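Per-decision cost attribution reduces to a small aggregation once every run is tagged. The sketch below assumes a hypothetical run record with `workflow`, `cost_usd`, and `resolved` fields; real systems would pull these from telemetry.

```python
from collections import defaultdict

def cost_per_resolved_decision(runs: list[dict]) -> dict[str, float]:
    """Aggregate tagged run costs into spend per *resolved* decision,
    per workflow — the number finance actually cares about."""
    spend = defaultdict(float)
    resolved = defaultdict(int)
    for run in runs:
        spend[run["workflow"]] += run["cost_usd"]
        resolved[run["workflow"]] += int(run["resolved"])
    # workflows with no resolved decisions are omitted rather than
    # reported as infinite cost
    return {
        wf: round(spend[wf] / resolved[wf], 4)
        for wf in spend if resolved[wf] > 0
    }
```

Note that unresolved runs still count toward spend in the numerator, which is deliberate: failed or abandoned runs are part of the true cost per outcome.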
Combine these with a FinOps dashboard that shows cost per resolved decision (not just token spend) and you get finance buy-in to scale the program responsibly.
Implementation blueprint: 90 days from pilot to meaningful production

Goal: reduce loan pre-approval time by 40% while keeping only reversible writebacks to the core.
Days 0–14 — Discovery & baseline
- Map core APIs, identify CDC streams, and baseline metrics (time-to-decision, overrides, % manual touches).
- Choose a single high-value pilot workflow.
Days 15–45 — Build MVP overlay
- Stand up Router, Planner, Knowledge, Tool Executor, and Supervisor in a sandbox.
- Implement read-only integration first and a one-screen human review for any writeback.
Days 46–75 — Supervised rollout
- Route a controlled percentage of traffic through the overlay in supervised mode (human approves exceptions).
- Track grounded-answer rate, Supervisor acceptance, per-decision cost, and latency p50/p95.
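Tracking latency p50/p95 during the supervised rollout needs nothing more than the standard library; a minimal sketch, assuming raw per-request latencies in milliseconds:

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50/p95 from raw per-request latencies (milliseconds)."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94]}
```

Computing percentiles from raw samples (rather than averaging) matters here because the p95 tail is what surfaces the occasional expensive large-model call.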
Days 76–90 — Optimize & scale
- Introduce model tiering, caching, and cost-based routing.
- Expand to a second workflow and plan a staged production cutover.
This approach limits risk, creates measurable business outcomes, and builds the governance artifacts required for enterprise sign-off.
Example KPIs to measure impact (operational + financial)
- Decision time: p50 / p95 time-to-decision (target: 40% reduction p50).
- Supervisor acceptance rate: % of automated decisions accepted without edits.
- Cost per decision: (model + infra + verification hours) / accepted decisions.
- Grounded-answer rate: % of outputs that cite a verifiable source.
- Core writeback incidents: number of reconciliation exceptions per 10k writebacks (target: near zero).
These KPIs tie AI activity to finance and ops outcomes, making the program measurable and fundable.
Common objections and how to answer them
“But we must keep the core pristine — any writes are risky.”
Answer: limit writes to status flags, correlation IDs, and human-approved change sets. Reconciliation and idempotency prevent upstream risk.
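The idempotency half of that answer can be sketched in a few lines. The function below is an illustrative pattern, with a hypothetical key-value `store` standing in for durable storage: a retried call (for example after a network timeout) returns the recorded result instead of writing to the core a second time.

```python
def idempotent_writeback(store: dict, idempotency_key: str, write_fn):
    """Apply a core writeback at most once per idempotency key.

    If the key has already been used, return the recorded result
    instead of invoking the write again — so retries are safe."""
    if idempotency_key in store:
        return store[idempotency_key]
    result = write_fn()            # the single, scoped write to the core
    store[idempotency_key] = result
    return result
```

Pairing this with nightly reconciliation (comparing the decision log against core state) is what makes narrow writebacks safe enough for audit sign-off.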
“How will regulators view this?”
Answer: provide immutable decision logs, policy-as-code, and a supervisor that enforces escalation for any adverse outcome. This usually shortens, not lengthens, audit timelines.
“Won’t this increase vendor lock-in?”
Answer: abstract models and tools behind clear contracts and design the Planner to switch providers — run portability drills before you scale.
Two practical examples of overlay wins
- Credit pre-approval acceleration — an overlay reads core balances and payment history, runs a Planner that assembles evidence and uses a mid-sized model to craft an offer. A status flag is written to the core pending human sign-off. Result: 30–50% faster pre-approvals with audit traceability.
- Deflection & promise management — event-driven orchestration subscribes to missed payment events, enriches with risk score and communication preference, sends a personalized status update, and schedules a one-click payment link. Result: reduced recontacts and improved cure rates.
These are achievable without replatforming the core — and they compound quickly as you reuse patterns.

Schedule a Strategy Call — a21.ai