The Enterprise AI Readiness Audit, in Five Gates

A practitioner audit framework for CIOs and CTOs scoping an enterprise AI build. Each gate has pass/fail criteria, not discussion points. Run it before the SOW is signed.

Format: Playbook / Checklist
Sector: BFSI, healthcare, public sector
Service relevance: AI security, AI copilots, data
Author: Vishal Shukla, VP of Technology

Why this exists

The failure pattern is operational, not technical

Three out of four enterprise AI projects fail to deliver their intended ROI. The 2026 research from McKinsey, RAND, and the MIT NANDA Initiative is consistent on this. Almost none of the failures are technical. Programs ship a model and stall on data, governance, security, evaluation, or handover. By the time the slip is visible to the CIO, the program has spent eighteen months and seven figures and the team is asking for more.

This is an audit you run before the build, not after. The five gates are the questions an enterprise AI build has to answer cleanly before the SOW is signed. Each gate is binary. Either the criteria are met or they are not. Read each gate, mark each criterion honestly, and treat any failed gate as work that has to happen before construction, not during it.

Key takeaways
  • Three out of four enterprise AI projects miss their intended ROI. Almost none of the failures are technical. The pattern is operational: data, governance, security, evaluation, or handover.
  • This is an audit you run before the build, not after. Each gate is binary. You pass or you do not start construction.
  • Most enterprises fail at least two gates on the first run. The failed gates are the work that has to happen before the build.
  • The audit is best run by a cross-functional team: AI program owner, CISO, a data owner, a compliance lead, and the executive sponsor.

The five gates

A gate fails the moment one criterion fails

The order reflects priority. If you must fix in sequence: data first, security second, governance third, evaluation fourth, handover fifth. But the audit itself is one pass across all five.
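The pass rule and the fix order above can be sketched in a few lines of Python. This is a minimal illustration, not part of the playbook's tooling: the `Gate` class, the criterion labels, and `remediation_queue` are hypothetical names. The only logic taken from the text is that one failed criterion fails the gate, and that fixes run in the order data, security, governance, evaluation, handover.

```python
from dataclasses import dataclass, field

# Fix order stated in the playbook. The short gate names are illustrative.
GATE_ORDER = ["data", "security", "governance", "evaluation", "handover"]


@dataclass
class Gate:
    name: str
    # Hypothetical criterion label -> True if met, False if not.
    criteria: dict = field(default_factory=dict)

    def passed(self) -> bool:
        # Assumption: a gate with no marked criteria counts as not passed.
        return bool(self.criteria) and all(self.criteria.values())

    def failed_criteria(self) -> list:
        return [label for label, met in self.criteria.items() if not met]


def remediation_queue(gates):
    """Names of the failed gates, sorted into the playbook's fix order."""
    failed = [g for g in gates if not g.passed()]
    return [g.name for g in sorted(failed, key=lambda g: GATE_ORDER.index(g.name))]


# Example audit marks (made up for illustration).
audit = [
    Gate("evaluation", {"harness_automated": False, "thresholds_documented": True}),
    Gate("data", {"owners_named": True, "ground_truth_exists": False}),
    Gate("security", {"threat_model_written": True}),
]
print(remediation_queue(audit))
```

The audit is still one pass across all five gates. The ordering only decides which failed gate gets remediated first.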

Gate 1

Data Readiness

Does the organization have data that an AI system can be trained on, evaluated against, and trusted in production?

Pass criteria

  1. Source data for the in-scope use cases is identified, named, and owned. Each data set has a named human owner who can answer questions about provenance and meaning. No anonymous lake tables.
  2. Data quality has been measured against a defined standard, not assumed. Completeness, accuracy, freshness, and consistency each have current measurements. Gartner estimates only 12 percent of organizations have data clean enough to support production AI.
  3. Sensitive fields are tagged at the column level. PII, PHI, financial data, and any sector-specific protected categories are tagged at ingestion, not at consumption.
  4. Lineage is traceable. For any record the model will train on or infer against, you can produce the source system, the ingestion timestamp, the transformation steps, and the consuming systems within one business day.
  5. A ground-truth set exists for the use case. A labelled, agreed, version-controlled set of input and output pairs. The labels are owned by the business, not by the data team.
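Criterion 2 asks for measurements against a defined standard. As a rough sketch of what "measured, not assumed" can look like, the Python below computes two of the four dimensions (completeness and freshness) over a handful of records and compares them with minimums. The column names, the sample rows, and the threshold values are all assumptions made for illustration. They are not recommended targets.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Share of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def freshness(rows, ts_column, max_age, now):
    """Share of rows whose timestamp is no older than `max_age`."""
    if not rows:
        return 0.0
    fresh = sum(
        1 for row in rows
        if row.get(ts_column) is not None and now - row[ts_column] <= max_age
    )
    return fresh / len(rows)


def score_against_standard(measured, standard):
    """Map each metric to (measured value, required minimum, met?)."""
    return {
        metric: (measured[metric], minimum, measured[metric] >= minimum)
        for metric, minimum in standard.items()
    }


# Fixed "now" so the example is reproducible.
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
rows = [
    {"customer_id": "c1", "email": "a@example.com", "updated_at": now - timedelta(hours=2)},
    {"customer_id": "c2", "email": "", "updated_at": now - timedelta(days=3)},
    {"customer_id": "c3", "email": "c@example.com", "updated_at": now - timedelta(hours=20)},
    {"customer_id": "c4", "email": "d@example.com", "updated_at": now - timedelta(days=9)},
]
measured = {
    "email_completeness": completeness(rows, "email"),
    "freshness_24h": freshness(rows, "updated_at", timedelta(days=1), now),
}
# Hypothetical standard: illustrative minimums only.
standard = {"email_completeness": 0.95, "freshness_24h": 0.90}
report = score_against_standard(measured, standard)
for metric, (value, minimum, met) in report.items():
    print(f"{metric}: {value:.2f} (minimum {minimum:.2f}) {'PASS' if met else 'FAIL'}")
```

Accuracy and consistency need a reference source or cross-field rules, so they are left out of this sketch.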

Most common failure

The team passes criteria one through three and fails on four or five. Lineage cannot be reconstructed for older records. The ground-truth set is "we will build it during the project." That is a hidden failure: the build phase absorbs the ground-truth work and the timeline slips by months.

If you fail this gate

Stop. The data work is the work. Run a focused data foundation engagement that closes the failed criteria before the AI build kicks off. Doing this work before the build costs meaningfully less than doing it during the build.

Gate 2

Security Boundaries

Will the AI system, once built, satisfy your existing security posture and the sector-specific regulatory regime you operate under?

Pass criteria

  1. The threat model has been written down. Not a generic catalog. A model specific to your architecture, data flows, trust boundaries, and user populations. Prompt injection, model extraction, training-data poisoning, and prompt-based exfiltration are addressed by name.
  2. Framework alignment has been mapped, not assumed. The system is mapped to NIST AI RMF, ISO/IEC 42001, OWASP LLM Top 10, and the regulatory and sector frameworks that apply, such as the EU AI Act, HIPAA, DORA, and SEBI. Mapping means a gap document, not a logo wall.
  3. Access control is enforced at the API and data layer, not at the UI. PII access, model inference, and tool execution permissions are governed by the same identity model that governs the rest of your production estate.
  4. An adversarial test plan exists. Curated prompt-injection, jailbreak, and data-exfiltration corpora are in place, with a dated schedule for adversarial testing in production. "We will pen-test before launch" is not a plan.
  5. The system has a named owner for security operations after launch. Not a project role. An operating role on the security org chart.
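To make criterion 3 concrete, here is a minimal sketch of a permission check that sits on the service function itself rather than in the UI. Everything named here is hypothetical: the role names, the permission strings, and the in-memory `ROLE_PERMISSIONS` map stand in for whatever identity provider already governs your production estate.

```python
from functools import wraps


class PermissionDenied(Exception):
    """Raised when the caller's role lacks the required permission."""


# Hypothetical role -> permissions map. In practice this lookup would be
# answered by the organization's existing identity system.
ROLE_PERMISSIONS = {
    "analyst": {"model:infer"},
    "claims_agent": {"model:infer", "pii:read"},
    "automation": {"model:infer", "tool:execute"},
}


def requires(permission):
    """Reject the call before the wrapped function runs if permission is missing."""
    def decorator(func):
        @wraps(func)
        def wrapper(caller_role, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(caller_role, set()):
                raise PermissionDenied(f"{caller_role!r} lacks {permission!r}")
            return func(caller_role, *args, **kwargs)
        return wrapper
    return decorator


@requires("pii:read")
def fetch_customer_record(caller_role, customer_id):
    # Placeholder payload; a real implementation would query the data layer.
    return {"customer_id": customer_id, "name": "<redacted in example>"}


@requires("tool:execute")
def run_tool(caller_role, tool_name):
    return f"executed {tool_name}"


print(fetch_customer_record("claims_agent", "c-42"))
try:
    fetch_customer_record("analyst", "c-42")
except PermissionDenied as err:
    print("denied:", err)
```

Because the check wraps the function, a caller that bypasses the UI and hits the API directly is still refused.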

Most common failure

Criteria one and two are addressed at planning time and forgotten by build time. The threat model becomes stale on day thirty. The framework mapping is never refreshed against the actual implementation. Both are zombie artifacts.

If you fail this gate

Bring in an AI Security Review engagement before the build. The review produces the threat model, the framework alignment scorecard, and the adversarial test plan. The build then ships against those artifacts, not toward them.

Gate 3

Model Governance

When the model produces a decision that someone (regulator, auditor, board member, customer) questions, can you defend that decision?

Pass criteria

  1. The use case has an explicit risk classification, documented before the build, against your internal AI risk framework and the EU AI Act categorisation if applicable.
  2. Model approval gates exist and are named. A model does not reach production without explicit sign-off from named roles: data owner, security lead, compliance lead, business sponsor. The gates are written into the SOW.
  3. Decision logging is built in from day one. For every inference, you log the input, the model version, the output, the confidence score, and the reasoning chain where applicable, retrievable for the regulatory retention period.
  4. Bias and fairness testing is part of the build, not a follow-on. The protected categories and the unacceptable-disparity thresholds are agreed before training, not interpreted after.
  5. An incident response playbook exists for AI-specific incidents: hallucination at scale, model drift, adversarial attack post-launch, regulatory enquiry. Each scenario has a named owner and a documented first-hour response.
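The fields listed in criterion 3 map naturally onto one log record per inference. The sketch below shows one possible shape, assuming an append-only JSON Lines sink. The `DecisionRecord` class, the field names, and the in-memory sink are illustrative choices, and retention and access controls are out of scope here.

```python
import io
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    logged_at: str                   # UTC timestamp, ISO 8601
    model_version: str
    input_payload: dict
    output_payload: dict
    confidence: float
    reasoning_chain: Optional[list]  # None where not applicable


def log_decision(sink, *, model_version, input_payload, output_payload,
                 confidence, reasoning_chain=None):
    """Append one inference record as a JSON line and return its id."""
    record = DecisionRecord(
        decision_id=str(uuid.uuid4()),
        logged_at=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        input_payload=input_payload,
        output_payload=output_payload,
        confidence=confidence,
        reasoning_chain=reasoning_chain,
    )
    sink.write(json.dumps(asdict(record), sort_keys=True) + "\n")
    return record.decision_id


def find_decision(lines, decision_id):
    """Linear scan for one record; a real store would index by id."""
    for line in lines:
        record = json.loads(line)
        if record["decision_id"] == decision_id:
            return record
    return None


# In-memory sink for the example; production would use durable storage.
sink = io.StringIO()
decision_id = log_decision(
    sink,
    model_version="credit-risk-1.4.2",  # hypothetical version label
    input_payload={"applicant_id": "a-17", "income_band": "B"},
    output_payload={"decision": "refer"},
    confidence=0.62,
    reasoning_chain=["income band B", "thin credit file", "refer to underwriter"],
)
found = find_decision(sink.getvalue().splitlines(), decision_id)
print(found["model_version"], found["output_payload"])
```

Writing the record in the same call path as the inference is what lets the input and reasoning chain be reconstructed later.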

Most common failure

Decision logging is bolted on after the fact and is incomplete. The team can show the model produced an output but cannot reconstruct the input or the reasoning chain. The first regulatory enquiry exposes the gap.

If you fail this gate

Bring the governance work to the front of the build. Model approval gates, decision logging architecture, and the incident response playbook are scoped into the build SOW, not assumed as overhead.

Gate 4

Evaluation Infrastructure

How will you know the model is working in production, and how quickly will you know when it stops?

Pass criteria

  1. A ground-truth evaluation set exists, version-controlled, owned by the business, refreshed on a documented cadence. This is the same set from Gate 1, audited here for whether it is fit for evaluation.
  2. An evaluation harness runs against the ground-truth set automatically: on every model version change, every prompt change, and on a scheduled cadence in production. Manual runs do not count.
  3. Production performance is monitored at three levels: accuracy against ground truth, input-distribution drift, and output-distribution drift. Each has a defined alert threshold.
  4. Hallucination rate is measured for any generative use case. With citation enforcement in place, a rate under 3 percent on monitored queries is a defensible benchmark for production-grade enterprise copilots. The threshold is documented before launch.
  5. The evaluation infrastructure is owned by your team after the build. Your team can extend the ground-truth set, change thresholds, and rerun evaluation independently.
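Criteria 2 and 3 can be illustrated with a small harness that scores a model against the ground-truth set and checks output-distribution drift against alert thresholds. The playbook does not name a drift statistic. This sketch uses the Population Stability Index (PSI) as one common choice. The toy model, the sample data, and both threshold values are assumptions for illustration only.

```python
import math
from collections import Counter


def accuracy(model, ground_truth):
    """Fraction of (input, expected) pairs the model answers exactly."""
    if not ground_truth:
        raise ValueError("ground-truth set is empty")
    correct = sum(1 for x, expected in ground_truth if model(x) == expected)
    return correct / len(ground_truth)


def psi(baseline, current, eps=1e-6):
    """Population Stability Index over categorical values.

    Shares are floored at `eps` so a category missing on one side
    does not cause a division by zero or log of zero.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    base_counts, cur_counts = Counter(baseline), Counter(current)
    total = 0.0
    for category in set(base_counts) | set(cur_counts):
        expected = max(base_counts[category] / len(baseline), eps)
        actual = max(cur_counts[category] / len(current), eps)
        total += (actual - expected) * math.log(actual / expected)
    return total


def evaluate(model, ground_truth, baseline_outputs, current_outputs, thresholds):
    """Return the measurements and the names of any breached thresholds."""
    results = {
        "accuracy": accuracy(model, ground_truth),
        "output_psi": psi(baseline_outputs, current_outputs),
    }
    alerts = []
    if results["accuracy"] < thresholds["min_accuracy"]:
        alerts.append("accuracy")
    if results["output_psi"] > thresholds["max_output_psi"]:
        alerts.append("output_psi")
    return results, alerts


# Toy stand-ins: a "model" that upper-cases its input, and made-up outputs.
ground_truth = [("a", "A"), ("b", "B"), ("c", "X"), ("d", "D")]
baseline_outputs = ["approve"] * 8 + ["refer"] * 2
current_outputs = ["approve"] * 5 + ["refer"] * 5
thresholds = {"min_accuracy": 0.90, "max_output_psi": 0.20}  # illustrative values

results, alerts = evaluate(str.upper, ground_truth, baseline_outputs,
                           current_outputs, thresholds)
print(results, alerts)
```

To satisfy criterion 2, a function like `evaluate` would be triggered by the pipeline on every model or prompt change and on a schedule, never by hand. Input-distribution drift can reuse `psi` on binned input features.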

Most common failure

The evaluation harness exists at launch and then atrophies. Six months in, no one has updated the ground-truth set. The model has drifted, the evaluation has not, and the only signal of a problem is a user complaint.

If you fail this gate

Build the evaluation infrastructure first, then the model. A model without an evaluation harness is a demo, not a production system.

Gate 5

Operational Handover

On the day the build partner leaves, does your team have what it needs to run the system?

Pass criteria

  1. A named operator inside the organization owns the system from day one of operations, with operating budget allocated. Not the build partner. Not "to be determined at handover."
  2. The operations runbook exists and has been used. The named operator has executed it against the system at least once before handover: drift response, retraining, incident response, and routine monitoring all rehearsed.
  3. The team that will run the system has been trained against the actual system, not a generic curriculum. Training happens during the build, not after.
  4. The build partner has a defined exit: a handover date, a documented deliverables set, a final acceptance protocol, and a transition support window.
  5. The system can be operated without the build partner. Critical dependencies on the partner's tooling, accounts, or knowledge are documented and either transferred or replaced.

Most common failure

This gate fails silently. The build ships, the team is happy with launch, and six months later the system has drifted. There is no one to call. The build partner moved on. The original operator was reassigned. The model produces degraded output and the program is quietly shelved.

If you fail this gate

Do not start the build. The most expensive failure mode in enterprise AI is shipping a system no one is staffed to run. Either solve the staffing problem before construction, or scope an operate retainer with a partner that genuinely owns the system from day one.

What to do if a gate fails

A failed gate is a finding, not a verdict

The finding produces a remediation plan with a named owner, a scope, a timeline, and a budget. The remediation completes before the build starts. This is unromantic work. It is also the difference between a program that ships and a program that joins the three out of four that do not.

We run a version of this framework at the start of every enterprise AI engagement, included at no additional cost for any program we scope. The internal version is a structured checklist with evidence requirements per criterion, scoring per gate, and a remediation roadmap as its output.

Standards cited: NIST AI RMF, ISO/IEC 42001, OWASP LLM Top 10, EU AI Act, and the sector frameworks HIPAA, DORA, and SEBI.

Download the playbook

Take the full audit to your team

The complete six-page playbook, formatted to send to your security, finance, and delivery teams. We will email the link directly.

Talk to us

Want to run this audit against your program?

Book a 30-minute scoping call. We will walk the five gates against your specific AI build and send the working checklist with evidence requirements per criterion.