AI Knowledge Systems for UK Firms | ISO 27001 | OpenKit

AI knowledge systems
for firms whose value is in their archive.

Private RAG. Built on top of your stack.

Insights agencies with twenty years of research no one can find. Regulatory consultancies whose senior partners are walking knowledge bases. Specialist law firms with case files locked in SharePoint. Niche manufacturers whose IP lives in PDFs and someone’s head. We build the system that makes the archive searchable, citable, and useful to the whole team.

ISO 27001, ISO 9001, Cyber Essentials. Cambridge-based, working across the UK.

[Figure: Archive → citation. A query against the proprietary corpus returns a cited excerpt with its source: report_2019_q3.pdf, p.14, conf 0.92.]

OpenKit is a UK AI consulting firm that builds private knowledge systems for organisations with valuable internal IP. OpenKit deploys retrieval-augmented generation across your proprietary corpus (case files, research archives, internal reports, telemetry), integrated with your existing storage and identity stack. OpenKit holds ISO 27001, ISO 9001, and Cyber Essentials certifications and works with regulated and IP-heavy clients across the United Kingdom from a base in Cambridge.

The shape of the problem.

Four patterns we see across the firms that engage us for knowledge-systems work. If two or more of these read as familiar, the system we build is probably worth the conversation.

PATTERN 01

The archive nobody searches

Twelve years of research reports, case files, customer interviews, expert memos sitting in SharePoint, Google Drive, network shares. Search returns nothing useful. Staff ask colleagues instead of looking it up. Institutional memory walks out the door when a senior leaves.

PATTERN 02

The senior bottleneck

Every difficult question routes to the same three or four senior people because only they remember the relevant precedent. They burn out. The work that should run on their archive runs on their attention.

PATTERN 03

New joiners cannot get oriented

Onboarding a new hire takes six months of shadowing because the institutional knowledge is not written down anywhere they can read it. The longer the firm has been operating, the worse this gets.

PATTERN 04

The off-the-shelf AI tool is wrong

ChatGPT and Copilot are confident but they do not know your firm. They cite generic answers when the actual answer is in your 2019 report. Sending sensitive material to public APIs is not an option for regulated work.

What we build.

A private retrieval-augmented generation system over your proprietary corpus, built to your environment, on top of your existing stack, respecting the access controls you already have. The four principles below are non-negotiable on every build.

The technical engineering pattern is covered on the retrieval-augmented generation page. This page is the buyer-side view.

01

Private by default

Your corpus stays in your environment. The retrieval and generation layer is configurable to keep data on-prem, in a dedicated cloud tenant, or in a private deployment of a major LLM provider, whatever your obligations require.

02

Integrated with your stack

We deploy on top of your existing storage: SharePoint, S3, on-prem document stores, your case management system. No new data lake to migrate to and no new platform for the team to learn.

03

Permissioned at the source

The system respects the access controls already configured in your storage and identity systems. People can only retrieve what they were already allowed to see. Audit logging is on as standard.

04

Citable

Every answer includes a citation back to the source document or excerpt. No confident hallucinations. If the system does not have the answer, it says so rather than making one up.
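
As a rough illustration of the "cite or say so" rule, here is a minimal Python sketch. It is not our production code: the `Excerpt` type, the `MIN_SCORE` cut-off, and the `answer_with_citations` function are hypothetical names chosen for this example, and the relevance scale is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    source: str   # document name, e.g. a PDF in the archive
    page: int
    text: str
    score: float  # retrieval relevance in [0, 1]; the scale is an assumption


# Hypothetical cut-off: excerpts scoring below this count as "not found".
MIN_SCORE = 0.5


def answer_with_citations(question: str, retrieved: list[Excerpt]) -> dict:
    """Return cited excerpts, or an explicit refusal when nothing relevant was retrieved."""
    supported = sorted(
        (e for e in retrieved if e.score >= MIN_SCORE),
        key=lambda e: e.score,
        reverse=True,
    )
    if not supported:
        # No source clears the bar, so the system says so instead of guessing.
        return {
            "question": question,
            "answer": None,
            "citations": [],
            "note": "No relevant source found in the corpus.",
        }
    # In a real build the supported excerpts would be the LLM's only context;
    # here the top excerpt simply stands in for the generated answer.
    return {
        "question": question,
        "answer": supported[0].text,
        "citations": [f"{e.source} p.{e.page}" for e in supported],
        "note": None,
    }
```

The point of the sketch is the early return: an empty citation list never travels with an answer, so a reader can always trace what they are told back to a document.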

Stack we integrate with.

We do not require you to migrate to a new platform. The system plugs into the storage and identity tools you already operate. Sector-specific obligations are addressed at the integration layer, not bolted on later.

Storage

  • Microsoft 365 / SharePoint Online
  • Microsoft 365 document libraries
  • AWS S3
  • On-prem document stores

Identity

  • Microsoft Entra ID / Azure AD
  • SAML 2.0 / OIDC providers

LLM deployment

  • Anthropic Claude (UK / EU regions)
  • OpenAI
  • Open-weights on private GPU (Llama, Mistral)
  • On-prem inference for sovereign requirements

Regulated controls

  • ISO 27001 audit logging
  • UK GDPR data processing register
  • NHS DSP Toolkit alignment
  • FCA SYSC operational resilience

Knowledge systems we have built.

Three recent engagements. Clients here stay anonymised at their request. We describe outcomes in terms the engineering and procurement teams can verify against their own constraints.

— 01

A UK insurance intelligence publisher

Insurance media · Anonymised
A publisher with over a hundred thousand proprietary articles and reports needed a customer-facing search and Q&A layer that cited the source material. We built the retrieval system, the citation layer, and the access controls so subscribers see only what their tier allows.
Outcome: Querying the archive in seconds. Citation accuracy verified against ground-truth on a sampled set. Subscribers retained the IP they were paying for.
— 02

A UK manufacturing group

Manufacturing / lighting · Anonymised
A lighting manufacturer with field telemetry across cloud time-series infrastructure needed a way for the sales and support teams to ask questions of the data without learning SQL. We built the retrieval layer over the telemetry stack and the conversational interface on top.
Outcome: Sales and support teams query the telemetry directly. Engineering time previously spent on ad-hoc data requests recovered.
— 03

A UK cultural insights agency

Insights and strategy · Anonymised
A cultural insights agency with decades of past client work in mixed-format archives (decks, transcripts, PDFs) needed an internal system the team could query when starting new engagements. Confidentiality across overlapping clients was the load-bearing constraint.
Outcome: Team queries the archive on new briefs in minutes. Confidentiality boundaries preserved at the source-document level via permissioning.

Questions buyers ask.

What is an AI knowledge system?
An AI knowledge system is a private retrieval-augmented generation deployment over an organisation's proprietary corpus (reports, case files, archives, telemetry) that lets staff query the institutional knowledge directly. At OpenKit we build these as integrated systems on top of the client's existing storage and identity stack, with permissioning honoured at the source.
How is this different from ChatGPT or Microsoft Copilot?
Public AI tools answer from their training data. They do not know your archive. They cite generic sources when the actual answer is in your 2019 report. A private knowledge system retrieves from your corpus only and cites back to the document. Confidence is grounded in your data, not synthesised from someone else's.
Where does our data live?
Your data stays where it is. The retrieval layer reads from your existing storage (SharePoint, on-prem document stores, S3, your case management system). The LLM call goes to a configurable backend: a UK or EU region of a major provider, a private deployment on your cloud tenant, or on-prem inference if your obligations require it. We do not require migration to a new platform.
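
To make the "configurable backend" idea concrete, here is a small Python sketch of how a deployment choice might be derived from a client's obligations. The `Backend` options mirror the three listed above; the `Obligations` fields and the `choose_backend` rule are hypothetical simplifications for illustration, not our scoping process.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    PROVIDER_REGION = "provider_region"  # UK or EU region of a major provider
    PRIVATE_TENANT = "private_tenant"    # private deployment on the client's cloud tenant
    ON_PREM = "on_prem"                  # inference on the client's own hardware


@dataclass(frozen=True)
class Obligations:
    # Both flags are illustrative; real obligations are scoped per engagement.
    data_must_stay_on_site: bool = False
    data_must_stay_in_own_tenant: bool = False


def choose_backend(o: Obligations) -> Backend:
    """Pick the least restrictive backend that still satisfies the obligations."""
    if o.data_must_stay_on_site:
        return Backend.ON_PREM
    if o.data_must_stay_in_own_tenant:
        return Backend.PRIVATE_TENANT
    return Backend.PROVIDER_REGION
```

For example, `choose_backend(Obligations(data_must_stay_on_site=True))` selects on-prem inference, while a client with no residency constraint beyond UK or EU hosting falls through to the provider region.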
How do you handle access controls and permissioning?
The system respects whatever access controls are configured in your storage and identity systems. If a user cannot see a document in SharePoint, they cannot retrieve it through the AI layer either. Permissioning is checked at retrieval time against your existing identity provider, not pre-baked into the embeddings.
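
A minimal Python sketch of what "checked at retrieval time" can look like. It assumes each indexed chunk carries the group list from its source document's access controls and that group membership is looked up per query; `Chunk`, `groups_for`, `permitted`, and the in-memory `directory` are hypothetical stand-ins, not a real identity-provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # mirrored from the source system's ACL (assumption)


def groups_for(user_id: str, directory: dict[str, set[str]]) -> set[str]:
    """Stand-in for a live group lookup against the identity provider."""
    return directory.get(user_id, set())


def permitted(
    candidates: list[Chunk], user_id: str, directory: dict[str, set[str]]
) -> list[Chunk]:
    """Drop every candidate the user could not open in the source system."""
    groups = groups_for(user_id, directory)
    # Keep a chunk only if the user shares at least one group with its ACL.
    return [c for c in candidates if c.allowed_groups & groups]
```

Because the filter runs on every query, removing someone from a group takes effect on their next question. Nothing about who may see a document is baked into the embeddings themselves.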
What about hallucinations?
Every answer includes a citation back to the source document or excerpt. If the system cannot find a relevant source in your corpus, it says so rather than making one up. The retrieval-grounded answer is the only answer the system returns. On the engagements where we benchmarked it, the hallucination rate on grounded answers measured in single-digit percentages.
How is this different from your retrieval-augmented generation page?
The technical RAG page describes the engineering: embeddings, retrieval, evaluation, deployment patterns. This page describes the engagement from the buyer's side: what we build for an insights agency, a regulatory consultancy, a specialist law firm, a manufacturer. Both apply; this one is the audience-led version.
How is this different from a productised private AI setup?
Productised private AI offerings are configured wrappers over a generic LLM deployment. They work well when the client wants AI access without the engineering depth. Our knowledge-systems work is custom-built against the client's proprietary corpus and integrated with their existing storage, identity, and audit-logging stack. Different scope, different price point. The audit identifies which is the right fit for your business.
How long does a knowledge-systems engagement take?
A typical sequence starts with the AI Audit, which scopes the build. The build itself runs from a few weeks to a few months depending on the corpus size, the integration surface, and the regulated-controls work. We can usually demonstrate the retrieval layer running against a sample of your corpus within the first month.
How much does it cost?
Cost is scoped per engagement, since it depends on corpus size, integration surface, and regulatory requirements. The audit that scopes the build is a fixed-fee engagement; the build itself is quoted once we have seen the corpus and the integration surface.
Do you work with regulated industries?
Yes. We have built knowledge systems for insurance media, manufacturing, cultural insights, and regulated professional services. ISO 27001 audit logging is on as standard. Sector-specific obligations (UK GDPR, FCA SYSC, NHS DSP Toolkit, SRA) sit at the integration layer per engagement.
Can we keep the system on-prem?
Yes. We deploy retrieval, embedding generation, and LLM inference on-prem when sovereignty or contract obligations require it. Open-weights models running on your GPU infrastructure are the most common pattern for fully on-prem deployments.

Want a system that knows your archive?

Two-week audit to scope the build. Private by default. Citable by design. Integrated with your existing storage, identity, and audit-logging stack.

Start Your AI Project

Thank you for your interest. Enter your project details below and our team will get in contact within 24 hours.

By submitting this form, you confirm that you have read and agree to our privacy policy. We will only use your information to respond to your inquiry.