Open Knowledge Format (OKF) and the Agentic Web
TL;DR
AI agents are brilliant at execution but routinely fail because they lack the right context. In most news organisations, critical editorial and business knowledge is scattered across a dozen incompatible systems.
- OKF is not software. It is a vendor-neutral format — plain markdown files with YAML frontmatter, stored in an ordinary folder tree. No runtime, no SDK, no proprietary tooling.
- A Knowledge Bundle is the whole folder. A Concept is one markdown file inside it. A Concept ID is its relative path with the .md extension removed (the file's "address").
- Agents query the bundle instead of hunting across CMSs, wikis, and heads of senior staff — one predictable location, one shared format.
- The payoff is reliability, not magic. OKF gives agents accurate, citable context and gives humans a single place to maintain it.
As AI models and agents become more capable, they face a frustrating hurdle: they lack context. An AI agent is brilliant at writing code, summarising massive documents, or analysing data, but it cannot give you accurate, useful results if it doesn't have the right background information.
In most news organisations, this vital editorial and business knowledge is incredibly fractured. It lives across a dozen different surfaces: database structures, internal publishing definitions, editorial style guides, breaking-news playbooks, syndication API rules — stuck inside content management systems, buried in code comments, or simply trapped in the heads of a few senior editors and data analysts.
When an AI agent needs to answer a simple question like "How do we calculate our article conversion rate across our morning newsletter stream?" it has to hunt through these scattered, incompatible systems. Every vendor has a different setup, which means developers are constantly forced to reinvent the wheel just to feed the right information to their agents.
To fix this, we don't need another heavy enterprise software platform. We need a universal, shared format.
1. Introducing the Open Knowledge Format (OKF)
The Open Knowledge Format (OKF) v0.1 is an open specification designed to standardise how we package knowledge for both humans and AI agents. It formalises the "LLM wiki" pattern introduced by Andrej Karpathy — the idea of maintaining a plain-text knowledge base that you hand directly to a language model — and turns it into a portable, vendor-neutral standard.
Best of all, it requires no complex software, no new runtimes, and no proprietary SDKs. An OKF bundle is simply:
- Just Markdown. Plain text files that anyone can read in any text editor, view on GitHub, or index with a basic search tool.
- Just Files. A basic directory folder that you can zip up, store in a standard Git repository, or mount onto any normal filesystem.
- Just YAML Frontmatter. A small block of structured, searchable text at the very top of each file that holds key metadata such as type, title, description, tags, and timestamp.
If you have ever used a markdown note-taking tool like Obsidian or a static-site generator like Hugo, the setup will feel familiar. OKF just provides the shared rules that let these separate wikis interoperate.
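To make the "just YAML frontmatter" point concrete, here is a minimal Python sketch that splits a concept file into its frontmatter and its body. The function name `split_frontmatter` is hypothetical, and the hand-rolled parser is a deliberate simplification: it understands only flat `key: value` lines and inline `[a, b]` lists. A real tool would more likely hand the block to a full YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a concept file into (frontmatter, body).

    Simplification: only flat `key: value` lines and inline `[a, b]` lists
    are understood; multi-line YAML values are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    # Find the closing `---` delimiter (index into `lines`).
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text  # unterminated block: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep or line[:1].isspace():
            continue  # continuation lines are beyond this simple parser
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    body = "\n".join(lines[end + 1:])
    return meta, body


sample = "---\ntype: metric\ntitle: Paywall Conversion Rate\ntags: [paywall, conversion]\n---\n**Definition:** ..."
meta, body = split_frontmatter(sample)
print(meta["title"], meta["tags"])
```

Because the metadata comes back as a plain dictionary, an agent or script can filter concepts by type or tag without reading the body at all.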
2. Understanding OKF Terminology
OKF relies on a small set of clean, agreed-upon terms to structure information:
- Knowledge Bundle: The entire project folder or directory tree. This is the complete, self-contained library of files that you hand over to an AI agent or store in Git.
- Concept: A single, standalone unit of knowledge within your bundle, represented as an individual markdown document. It can describe something physical like an editorial table, or something abstract like a publishing metric.
- Concept ID: The unique "address" or identifier of that piece of knowledge inside the bundle. It is simply the relative file path with the .md extension removed (e.g., metrics/paywall_conversion).
- Frontmatter: The configuration block at the very top of the markdown file wrapped in triple dashes (---). It uses YAML to provide core metadata that agents can quickly query or filter.
- Body: Everything that comes after the frontmatter. Written in plain markdown, it contains the actual explanations, tables, code blocks, or equations.
- Link: Standard markdown links used to point from one concept file to another. This connects independent files together into a rich relationship web.
- Citation: Pointers listed at the bottom of the document that show where the information came from, so an agent's claims can be checked against a source rather than taken on trust.
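The Concept ID rule above is mechanical enough to sketch in a few lines of Python. The helper name `concept_id` is hypothetical, and the sketch assumes the path is already expressed relative to the bundle root with forward slashes.

```python
from pathlib import PurePosixPath


def concept_id(relative_path: str) -> str:
    """Return the Concept ID for a file path given relative to the bundle root."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".md":
        raise ValueError(f"not a markdown concept file: {relative_path}")
    # Drop the .md extension; the remaining relative path is the ID.
    return str(path.with_suffix(""))


print(concept_id("metrics/paywall_conversion_rate.md"))
# prints: metrics/paywall_conversion_rate
```

Deriving the ID from the path, rather than storing it in the file, means that renaming or moving a file changes its address, so links pointing at it need updating in the same commit.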
3. Real-World Examples of OKF Concepts
While OKF can represent a technical table schema, it can represent abstract media operations just as easily. Below are two completely different examples of how concepts look inside a news publisher's bundle directory.
Example 1: A News Publisher Metric
If saved at metrics/paywall_conversion_rate.md, this file maps out how the media company tracks paywall efficacy:
---
type: metric
title: Paywall Conversion Rate
description: >
  Tracks the percentage of anonymous readers who start a paid
  subscription after hitting a metered paywall gate.
tags: [paywall, conversion, revenue, subscriptions]
timestamp: 2024-11-15T09:00:00Z
---
**Definition:** The ratio of readers who initiate a paid subscription
to the total number of readers who encounter the paywall gate.
**Formula:**
Paywall Conversion Rate =
(New Subscriptions from Paywall) / (Unique Paywall Hits) × 100
**Data sources:** Piano Analytics, or custom CMS paywall events.
**Caveats:** Exclude readers arriving via partner referral links —
these bypass the paywall by design. Anonymous-to-registered
conversions are tracked separately in the registration funnel.
**Links:**
- [Revenue KPI Dashboard](../metrics/revenue_kpi_overview)
- [Metered Paywall Config](../config/paywall_meter_config)
**Citations:**
- Piano Analytics internal documentation (accessed 2024-11-01)
- Newsroom Revenue Handbook, Q3 2024
An AI agent querying this bundle now has an unambiguous, citable definition. It doesn't need to guess whether "paywall conversions" includes referral bypasses; the concept tells it explicitly.
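As a worked example of the formula in this concept, the sketch below computes the rate in Python. The function name and the input figures are hypothetical and chosen only to illustrate the arithmetic; per the caveat in the concept, the hit count is assumed to already exclude partner-referral readers.

```python
def paywall_conversion_rate(new_subscriptions: int, unique_paywall_hits: int) -> float:
    """Paywall Conversion Rate as a percentage, following the concept's formula."""
    if unique_paywall_hits <= 0:
        raise ValueError("unique_paywall_hits must be positive")
    return 100 * new_subscriptions / unique_paywall_hits


# Hypothetical figures: 45 new subscriptions from 3,000 unique paywall hits.
rate = paywall_conversion_rate(new_subscriptions=45, unique_paywall_hits=3000)
print(f"{rate:.2f}%")  # prints: 1.50%
```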
Example 2: An Editorial Breaking-News Playbook
If saved at playbooks/breaking_news_publishing.md, this file codifies the steps editors follow under deadline pressure — the exact kind of institutional knowledge that typically lives only in a senior editor's head:
---
type: playbook
title: Breaking News Publishing Playbook
description: >
  Step-by-step protocol for publishing breaking stories on the
  live site, including CMS workflow, social push sequence, and
  SEO rules.
tags: [breaking-news, publishing, editorial, CMS, seo]
timestamp: 2025-03-01T08:00:00Z
---
## Triggering criteria
A story is classified as **breaking** when it meets at least one:
- Confirmed fatalities or major injuries
- Government announcement with immediate public impact
- Market-moving financial event (index move ≥ 2%)
## CMS checklist (in order)
1. Set article `type: breaking` in the CMS article settings.
2. Add the `<BreakingNewsBanner>` component to the lede slot.
3. Set `newsKeywords` meta tag to the top three Reuters topic codes.
4. Publish with *initial slug*; never change the slug after first index.
5. Push social notification via the Ping desk (not automated).
## SEO rules
- Title must be ≤ 60 characters; lead with the most specific noun.
- Do **not** add a paywall gate for the first 6 hours.
- Submit URL to Google Search Console Inspect immediately after publish.
**Links:**
- [Paywall Conversion Rate](../metrics/paywall_conversion_rate)
- [SEO Style Guide](../guides/seo_style_guide)
**Citations:**
- Editorial Standards Manual v4.2, January 2025
- Google News Publisher Centre documentation
Notice the Links section cross-references the paywall metric defined in Example 1. That's the relationship web in action — an agent following a question about breaking-news traffic can traverse from the playbook to the paywall metric to the revenue dashboard without leaving the bundle.
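The traversal described above depends on turning each relative link into a Concept ID. A minimal Python sketch of that step follows; the function name `linked_concept_ids` is hypothetical, and it assumes links are resolved relative to the folder that holds the source concept, as the `../` prefixes in the examples suggest.

```python
import posixpath
import re

# Matches standard markdown links of the form [label](target).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def linked_concept_ids(source_id: str, body: str) -> list[str]:
    """Resolve relative markdown links in `body` to Concept IDs.

    Assumption: targets are relative to the source concept's folder;
    external links (anything with a URL scheme) are skipped.
    """
    base_dir = posixpath.dirname(source_id)
    ids = []
    for _label, target in LINK_PATTERN.findall(body):
        if "://" in target:
            continue
        resolved = posixpath.normpath(posixpath.join(base_dir, target))
        if resolved.endswith(".md"):
            resolved = resolved[:-3]  # tolerate links that keep the extension
        ids.append(resolved)
    return ids


playbook_links = (
    "- [Paywall Conversion Rate](../metrics/paywall_conversion_rate)\n"
    "- [SEO Style Guide](../guides/seo_style_guide)\n"
)
print(linked_concept_ids("playbooks/breaking_news_publishing", playbook_links))
# prints: ['metrics/paywall_conversion_rate', 'guides/seo_style_guide']
```

Repeating this from each resolved concept gives the agent a simple graph walk across the bundle.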
4. How Agents Actually Use an OKF Bundle
When an AI agent receives a task — say, "Why did our paywall conversion drop on the morning of March 15th?" — the workflow with an OKF bundle looks like this:
1. Agent receives task
2. Agent queries bundle index for relevant concepts
→ matches metrics/paywall_conversion_rate
→ matches playbooks/breaking_news_publishing
3. Agent reads both concept files
4. Agent reasons with accurate, citable definitions
5. Agent produces an answer grounded in the bundle
Without the bundle, step 2 becomes a long, error-prone hunt: checking the CMS documentation, asking the data team what their table schema looks like, and hoping the answer isn't buried in a Confluence page from 2021.
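Step 2 of the workflow, querying the bundle index, can be sketched with nothing more than keyword overlap against each concept's title and tags. Everything here is an assumption for illustration: the `find_concepts` name, the in-memory `index` shape, and the matching strategy itself. This naive version surfaces only the paywall metric for the sample task; reaching the playbook as well would take richer matching, such as searching concept bodies or using semantic similarity.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_concepts(index: dict[str, dict], task: str) -> list[str]:
    """Return Concept IDs whose title or tags share words with the task,
    ordered by number of shared words (highest first)."""
    task_words = tokenize(task)
    scored = []
    for cid, meta in index.items():
        searchable = " ".join([meta.get("title", ""), *meta.get("tags", [])])
        overlap = len(task_words & tokenize(searchable))
        if overlap:
            scored.append((overlap, cid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored]


# Hypothetical index built from the frontmatter of the two example concepts.
index = {
    "metrics/paywall_conversion_rate": {
        "title": "Paywall Conversion Rate",
        "tags": ["paywall", "conversion", "revenue", "subscriptions"],
    },
    "playbooks/breaking_news_publishing": {
        "title": "Breaking News Publishing Playbook",
        "tags": ["breaking-news", "publishing", "editorial", "CMS", "seo"],
    },
}
task = "Why did our paywall conversion drop on the morning of March 15th?"
print(find_concepts(index, task))
# prints: ['metrics/paywall_conversion_rate']
```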
The bundle also addresses a subtler problem: hallucination. Every concept file ends with a Citations block, which gives the model, and the human reviewing its output, a source to check against. An agent that cites Newsroom Revenue Handbook, Q3 2024 is easier to verify and easier to correct than one that produces a plausible-sounding definition from its training weights.
5. OKF vs RAG: Why Not Just Embed Everything?
The obvious question: if you already run a RAG pipeline or a vector store, why bother with OKF at all?
| Dimension | RAG / Vector Store | OKF Bundle |
|---|---|---|
| Human readability | Poor: the index is opaque vectors | Excellent: plain markdown |
| Editability | Requires re-embedding pipeline | Edit a text file |
| Versioning | Complex (index + chunk drift) | Standard Git diff/blame |
| Tooling required | Vector DB, embedding model, retriever | None beyond a text editor |
| Cross-vendor portability | Low: vectors are tied to the embedding model that produced them | Complete: it's just files |
| Best for | Large, unstructured corpora (millions of docs) | Curated, structured business knowledge (hundreds of concepts) |
OKF and RAG are not competitors; they operate at different scales. A RAG pipeline over your entire article archive is the right tool for "find me articles similar to this topic." An OKF bundle covering your metric definitions, playbooks, and data schemas is the right tool for "what does this organisation mean by engaged reader?" One is broad retrieval; the other is precise institutional knowledge.
In practice you'll often run both: the OKF bundle loaded into the agent's system prompt (or fetched at task start), the RAG index queried for large-corpus lookups. The bundle keeps the agent grounded in how your organisation operates; the RAG index gives it access to what your organisation has published.
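Loading the bundle into a system prompt can be as simple as concatenating its concept files. The Python sketch below shows one way to do that; the `build_context` name and the `### Concept:` separator are assumptions made for illustration and are not part of OKF.

```python
import tempfile
from pathlib import Path


def build_context(bundle_root: Path) -> str:
    """Concatenate every concept in the bundle into one prompt-ready string."""
    sections = []
    for path in sorted(bundle_root.rglob("*.md")):
        # Concept ID: path relative to the bundle root, without the .md suffix.
        cid = path.relative_to(bundle_root).with_suffix("").as_posix()
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"### Concept: {cid}\n{text}")
    return "\n\n".join(sections)


# Usage with a throwaway two-file bundle (file contents are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "metrics").mkdir()
    (root / "metrics" / "paywall_conversion_rate.md").write_text(
        "Metric body", encoding="utf-8"
    )
    (root / "playbooks").mkdir()
    (root / "playbooks" / "breaking_news_publishing.md").write_text(
        "Playbook body", encoding="utf-8"
    )
    context = build_context(root)

print(context)
```

Labelling each section with its Concept ID lets the agent name the concept it relied on, which keeps its answers traceable to a file.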
OKF also slots naturally into the Context Warehouse pattern — the architecture where a whole curated corpus is loaded into the model's context window rather than retrieved chunk by chunk. A Knowledge Bundle is exactly the kind of pre-assembled, structured, versioned corpus a Context Warehouse expects: small focused files, explicit citations, no noise. Where a Context Warehouse answers "load everything relevant," OKF answers "here is everything relevant, already organised."
6. Structuring Your First Bundle
For a news publisher, a sensible starting directory tree looks like this:
knowledge-bundle/
├── okf.yaml ← bundle manifest (name, version, owner)
├── metrics/
│ ├── paywall_conversion_rate.md
│ ├── page_views_per_session.md
│ └── newsletter_open_rate.md
├── playbooks/
│ ├── breaking_news_publishing.md
│ └── correction_and_retraction.md
├── guides/
│ ├── seo_style_guide.md
│ └── structured_data_requirements.md
└── schemas/
    ├── article_table.md
    └── author_profile_table.md
The okf.yaml manifest at the root declares the bundle's identity (its name, version, owner, and a short description), so an agent receiving it knows immediately what it holds and how current it is.
A few rules of thumb I'd apply when building a first bundle:
- One concept per file. Resist the temptation to put five related metrics in one document. Granularity makes links precise and retrieval reliable.
- Mandatory citations. If you can't source a definition, it's not ready for the bundle. The whole point is grounding agents in verified facts.
- Version in Git. Treat your bundle like production code. Every change should be a commit with a clear message. An agent that gives a wrong answer because it read a stale definition is a debugging problem you want to be able to trace.
- Start small. Ten well-maintained concepts beat a hundred half-finished ones. Agents degrade on noisy context just as reliably as humans do.
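The "mandatory citations" rule is easy to enforce with a small check run before each commit. The Python sketch below is one possible linter, under stated assumptions: the `lint_concept` name is hypothetical, the list of required frontmatter keys is taken from the fields used in the examples rather than confirmed by the specification, and a Citations block is detected by the `**Citations:**` label those examples use.

```python
# Assumed required keys, based on the fields shown in the example concepts.
REQUIRED_KEYS = ("type", "title", "description", "tags", "timestamp")


def lint_concept(text: str) -> list[str]:
    """Return a list of problems found in one concept file (empty if none)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter block"]
    closing = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if closing is None:
        return ["frontmatter block is not closed"]
    # Top-level keys are unindented lines containing a colon.
    keys = {
        line.split(":", 1)[0].strip()
        for line in lines[1:closing]
        if ":" in line and not line[:1].isspace()
    }
    problems = [f"missing frontmatter key: {key}" for key in REQUIRED_KEYS if key not in keys]
    body = "\n".join(lines[closing + 1:])
    if "**Citations:**" not in body:
        problems.append("missing Citations block")
    return problems


draft = "---\ntype: metric\ntitle: Newsletter Open Rate\n---\nDefinition still to be sourced.\n"
for problem in lint_concept(draft):
    print(problem)
```

Running a check like this in a pre-commit hook or CI job keeps unsourced concepts out of the bundle without adding any runtime dependency.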
Key takeaways
- The AI context problem is a knowledge organisation problem, not a model capability problem. Better models do not fix scattered, incompatible knowledge sources.
- OKF is a format, not a platform — it imposes almost no infrastructure cost because it's just files.
- The YAML frontmatter is the machine-readable layer; the markdown body is the human-readable layer. Both matter — humans maintain what agents consume.
- OKF and RAG solve different scale problems. Use both: the bundle for curated institutional knowledge, the vector index for large-corpus retrieval.
- A bundle maintained in Git, with citations in every concept, is the most reliable way to keep agents honest, auditable, and correctable.
The agentic web is not coming — it is here. The organisations that will get the most out of AI agents are not necessarily the ones with the most compute or the most advanced models. They are the ones that do the unsexy work of documenting what they know, in a format that agents can actually read. OKF is one bet on what that format looks like. It is deliberately minimal: if it works, it will feel obvious in retrospect.
