Industrial LLM Strategy and Pilot Specification

Industrial / Oil & Gas · 2025 · 8 weeks

International Oil and Gas Service Provider

LLM strategy and on-premise pilot specification for pipeline integrity audits at an international oil and gas services provider.

Client

International Oil and Gas Service Provider · Industrial / Oil & Gas

Engagement

Strategy and pilot specification

Timeline

8 weeks · 2025

Capabilities

AI Strategy · Industrial · Oil & Gas · RAG · On-prem

01 The challenge

Test whether an LLM could safely assist on a regulated pipeline integrity audit, without softening the standard the methodology exists to enforce.

The client's pipeline integrity audits sit underneath capital decisions, insurance positions, and regulatory submissions for some of the largest operators in the world. The methodology is proprietary, refined over decades of field work. The judgement-heavy step in the middle of every audit (a senior consultant cross-referencing the framework against several thousand pages of evidence) lived in the heads of people approaching retirement.

The brief had a succession edge to it. The sponsor wanted an honest answer on whether AI could carry some of that load, written down well enough to defend the answer either way. Off-the-shelf hosted assistants were out: audit content cannot leave the client environment. That is a contractual line, not a preference, and it rules out every cloud LLM on the market before the design conversation starts.

Audit content stays inside the client environment, end to end.
Assist, do not replace. Marks stay with the consultant, every time.
Every draft finding has to cite the source pages it came from.
The proprietary methodology never reaches a hosted model.
The recommendation has to be defensible to procurement, engineering, and the regulator.

03 What we built

A staged on-premise pilot, with a binary gate between knowledge and judgement.

Two routes were on the table: a single-shot end-to-end consultant assistant, or a two-stage pilot with a conditional gate between the stages. We recommended the staged route, and the strategy review endorsed it without amendment.

Stage one is a knowledge base engine: document processing, retrieval on both meaning and exact wording, a simple query interface. It earns its keep on day one because junior staff stop spending days assembling evidence, and it puts the on-premise GPU and Kubernetes stack through real workloads before anything riskier gets layered on.

Stage two is the gap-analysis engine. It breaks the framework into testable criteria, drafts a finding for each one against the evidence index, and routes those drafts into a consultant review interface before anything reaches a customer-facing report. The position is "assist, not replace", and the architecture is what enforces it. The gate between the stages is binary: if stage one cannot hit the agreed retrieval accuracy and stability targets on five historical audits, stage two does not start.
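The gate can be pictured as a single pass/fail function. The sketch below is illustrative only: the `AuditResult` shape, the function names, the 90% top-three hit-rate floor, the use of 95th-percentile latency, and the rule that each of the five audits must pass on its own are all assumptions, not the agreed targets. The latency, precision, and recall thresholds mirror the pilot success metrics listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    """Stage-one measurements for one historical audit (hypothetical shape)."""
    audit_id: str
    top3_hit_rate: float   # share of queries with the right source in the top three
    p95_latency_s: float   # assumed: latency is judged at the 95th percentile
    precision: float
    recall: float


MIN_TOP3_HIT_RATE = 0.90   # assumption: the source does not state a floor
MAX_LATENCY_S = 3.0        # "sub-3s response"
MIN_PRECISION = 0.85       # "precision above 85%"
MIN_RECALL = 0.90          # "recall above 90%"
REQUIRED_AUDITS = 5        # "five historical audits"


def audit_passes(result: AuditResult) -> bool:
    """True only if every target is met for this audit."""
    return (
        result.top3_hit_rate >= MIN_TOP3_HIT_RATE
        and result.p95_latency_s < MAX_LATENCY_S
        and result.precision > MIN_PRECISION
        and result.recall > MIN_RECALL
    )


def stage_two_may_start(results: list[AuditResult]) -> bool:
    """Binary gate: no partial credit and no start on incomplete evidence."""
    if len(results) < REQUIRED_AUDITS:
        return False
    return all(audit_passes(r) for r in results)
```

The point of keeping it this blunt is that a near miss on one audit cannot be argued into a pass.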

The five interviews shaped every design call. The retired-knowledge problem became the case for capturing institutional memory as a machine-readable index. The earlier in-house benchmarking work, stalled at proof-of-concept for lack of GPU capacity, became the case for using the now-available on-premise hardware as the primary path. The data-sovereignty constraint became the design boundary, not a wish.

Document ingestion that normalises across PDFs, Word, and scans into one searchable index.
Retrieval on meaning and exact wording at the same time, catching both concept and clause references.
Gap-analysis engine that drafts findings against framework criteria with source-page citations.
Consultant review interface: accept, edit, reject, with immutable audit trail.
Sovereign data path: primary on the client's GPU and Kubernetes stack, with hybrid overflow inheriting the same posture.
Pilot success metrics: top-three retrieval, sub-3s response, precision above 85%, recall above 90%.
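To make the citation rule and the review trail concrete, here is a minimal sketch in Python. Every name in it (`DraftFinding`, `ReviewLog`, the field names) is hypothetical, and the hash chain is one common way to make a log tamper-evident, swapped in for illustration; the source does not say how immutability is achieved in the specification.

```python
import hashlib
import json
from dataclasses import dataclass

ALLOWED_DECISIONS = {"accept", "edit", "reject"}
GENESIS_HASH = "0" * 64


@dataclass(frozen=True)
class DraftFinding:
    """A machine-drafted finding; it cannot exist without source pages."""
    criterion_id: str
    text: str
    source_pages: tuple  # e.g. (("evidence-017.pdf", 42),)

    def __post_init__(self):
        if not self.source_pages:
            raise ValueError("a draft finding must cite at least one source page")


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ReviewLog:
    """Append-only log; each entry carries the hash of the one before it."""

    def __init__(self):
        self._entries = []

    def record(self, finding, reviewer, decision, final_text=None):
        if decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        if decision == "edit" and not final_text:
            raise ValueError("an edit must supply the consultant's final text")
        if decision == "accept":
            final_text = finding.text
        elif decision == "reject":
            final_text = None
        body = {
            "criterion_id": finding.criterion_id,
            "draft_text": finding.text,
            "source_pages": [list(p) for p in finding.source_pages],
            "reviewer": reviewer,
            "decision": decision,
            "final_text": final_text,
            "prev_hash": self._entries[-1]["hash"] if self._entries else GENESIS_HASH,
        }
        entry = dict(body, hash=_digest(body))
        self._entries.append(entry)
        return dict(entry)  # hand back a copy, not the stored record

    def verify(self) -> bool:
        """Recompute the chain; any altered or reordered entry breaks it."""
        prev = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

In this sketch the draft never becomes report text by itself: the only route to a `final_text` is a recorded consultant decision.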

Design process

Mapping the consultant's job today, and what shifts when a knowledge engine joins it.

The annotated framework page, covered in highlights and margin questions, was the artefact of the judgement work, and it was about to leave the building with the people doing it. The pair of diagrams below was the conversation: on the left, today's bottleneck; on the right, the system that retrieves and drafts so the consultant can still make the call.

Today: one consultant, a framework, thousands of pages.

With a knowledge engine: retrieval and first draft, consultant still decides.

04 Outcomes

A board-ready answer and a build-ready specification.

8 wks

In-depth AI consultancy sprint

5

Lifecycle interviews

2-stage

Pilot with go/no-go gate

On-prem

Sovereign data path

Strategy report

Strategic framing, friction-point register, infrastructure assessment, success metrics, and a risk register, written for a board-level go or no-go decision.

Technical specification

Architecture, security and data-governance posture, testing strategy, AI validation methodology, and support model, written so the build team could start without us in the room.

Staged pilot plan

Two stages with a conditional gate between them. The first earns its keep on retrieval alone; the second only proceeds if the first hits the agreed metrics.

Interview minutes

Structured notes from every session across the audit lifecycle, written for the in-house team to keep using as reference long after the engagement closed.

Cohort, by role in the audit lifecycle

Five interviews. One per stage. Every claim in the report anchored in something we could look at.

The interview programme covered the audit lifecycle end to end and went after the parts of the work that lived in senior heads rather than in any document. Five sessions over Microsoft Teams, each paired with an evidence request: anonymised questionnaires, generic procedures, prior in-house benchmarking material, sample workflow diagrams.

01

Founding pipeline integrity consultant

Originator of the methodology, who walked us through how the questionnaire and on-site visits run, and what gets templated versus rebuilt every time.

02

Integrity team lead and project sponsor

End-to-end workflow walkthrough: the 50-day audit cycle, the 50 to 150 days of follow-on document creation, and where recommendation overload happens.

03

Head of in-house LLM infrastructure

What the on-premise GPU and Kubernetes stack can do today, prior open-weight benchmarking work, and what data is and is not allowed to leave the environment.

04

Upstream and offshore consultant

Bespoke, data-heavy projects: hundreds of mixed-format files per asset, annotated diagrams, knowledge that completes a project and then disappears into a personal drive.

05

Compliance audit specialist

Pre-audits ahead of regulator inspections: checklist building from industry guidance, multi-discipline on-site note capture, the consolidation step nobody enjoys.

Hosting posture, decided

Three options on the table. One won the pilot. One stays in reserve. One was ruled out before the design conversation started.

Eliminated

Pure cloud

Ruled out by the data-sovereignty constraint. Audit content cannot leave the client environment, full stop.

Chosen for pilot

Pure on-premise

Existing GPU and Kubernetes stack made it a zero-marginal-cost path to a working prototype. Methodology never reaches a hosted model.

In reserve

Hybrid (production)

Held for production on the condition that overflow capacity inherits the same data-handling posture and never becomes the convenient route round it.

Architectural decisions worth naming

Two design calls do most of the work.

Split on meaning, not tokens

Documents are broken up on boundaries that mean something inside an audit document. Headings, tables, and cross-references carry information a naive fixed-token split would shred.
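As a rough illustration of splitting on structure rather than token count, the sketch below cuts a document at numbered headings and keeps each heading attached to its body, so a table or cross-reference stays with the clause it belongs to. The heading pattern (`4.2 Title`), the `Chunk` shape, and the function name are assumptions; real audit documents would need their own boundary rules, and this pattern would misread a body line that happens to start with a number.

```python
import re
from dataclasses import dataclass

# Assumed heading style: "4", "4.2" or "4.2.1" followed by a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Chunk:
    section: str   # e.g. "4.2"; empty for text before the first heading
    title: str
    body: str


def split_on_headings(text: str) -> list[Chunk]:
    """Cut at headings only, so tables and cross-references stay in one piece."""
    chunks = []
    section, title, lines = "", "", []

    def flush():
        # Emit the chunk collected so far, skipping an empty preamble.
        if section or any(line.strip() for line in lines):
            chunks.append(Chunk(section, title, "\n".join(lines).strip()))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            section, title, lines = match.group(1), match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Because the section number travels with every chunk, a retrieved passage can be cited back to its clause without re-reading the source file.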

Search meaning and exact wording at the same time

Auditors cite specific clause numbers and standards as often as they search by concept. The retrieval engine catches both shapes of question in one pass.
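One common way to combine the two kinds of search is reciprocal rank fusion, sketched below. This is an illustrative technique, not necessarily the one in the specification. The semantic ranking is assumed to come from an embedding model elsewhere and is passed in as a plain list; `exact_match_ranking`, the sample chunk ids, and `k=60` (a conventional default for this method) are assumptions.

```python
from collections import defaultdict


def exact_match_ranking(query: str, docs: dict[str, str]) -> list[str]:
    """Rank chunk ids by how many query terms appear verbatim (case-insensitive)."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        if hits:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings; each contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    "c1": "Clause 7.3.2 requires annual inspection of the coating.",
    "c2": "Corrosion protection is reviewed every year.",
    "c3": "Site access procedure.",
}
keyword_hits = exact_match_ranking("clause 7.3.2", docs)
semantic_hits = ["c2", "c1"]  # assumed output of an embedding search
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Here the chunk that matches the clause number verbatim and also appears in the semantic results ranks first, which is the behaviour a clause-citing auditor would expect.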

In their words

We worked with OpenKit, as we wanted a company with domain expertise in LLMs to look at our strategy and test the concept. The study was focussed and really gave us a validation of our concept, a technical roadmap and prioritisation of the developments. The OpenKit team engaged well during the project, and through capture of workflows and interviews gained a good understanding of what we do. OpenKit provided a good technical study and exceptional value for money.

Project Sponsor · Pipeline integrity programme lead · International Oil and Gas Service Provider

Approach

How we delivered it.

Stack

Private retrieval over audit content · Open-weight LLM on client GPU · Kubernetes platform, client-owned · Consultant review interface · Immutable audit trail

Capabilities

AI Strategy · Industrial · Oil & Gas · RAG · On-prem

Compliance

ISO 27001 · ISO 9001 · GDPR · UK data residency · Client data sovereignty

Engagement

From scoping to live.

Discovery and scoping (Weeks 1-2): Fixed-fee discovery designed to land at a go or no-go decision, with named deliverables and a defined audit-lifecycle interview programme.
Stakeholder interviews (Weeks 3-5): Five sessions across the audit lifecycle, each paired with an evidence request. Every claim in the final report anchored in something we could look at.
Strategy and architecture (Weeks 6-7): Hosting-posture comparison, two-stage pilot recommendation, on-premise specification written precisely enough for the in-house build team to start without us in the room.
Strategy review and handover (Week 8): Sign-off with the project sponsor, digital solutions lead, and head of in-house LLM infrastructure aligned on direction. Specification and pilot plan handed across.
Internal pilot build (Post-engagement): Client's own team taking the specification forward. Benchmarking conversation restarted shortly after handover.
