Announcements · 9 min read

Introducing KCP — Compile knowledge before you prompt

Stop pushing PDFs into LLMs. KCP turns documents into semantically compiled context — higher fidelity, fewer tokens, and a science lab to prove the gain.

Paulo Tomazinho, PhD · Creator of KCP
May 15, 2026

Large language models are extraordinary readers — but they read the way we ask them to. When we ask them to read a PDF, they read structural noise. When we ask them to read a raw transcript, they read low-density prose. When we paste a 200-page manual into the context window, we pay in tokens, latency and hallucinations for content that was never designed to be machine-native in the first place.


Today we are introducing KCP — the Knowledge Context Protocol: an open format and a complete platform that turns unstructured documents into a verifiable, portable layer of context that any LLM can consume with higher fidelity, fewer tokens and less hallucination.

The problem with raw documents

Documents were built for human eyes. They contain headers, footers, page numbers, decorative figures, repeated boilerplate and the implicit assumption that a human will bridge the gaps between paragraphs. None of that is helpful to a model. Three failure modes show up over and over again:

  • Structural noise. Up to 30% of the tokens in a typical PDF are layout artifacts the model has to ignore.

  • Low semantic density. A single fact is often diluted across multiple paragraphs, summaries and restatements.

  • Missing relations. Causal, procedural and conditional links between entities live in the reader's head, never in the document.

What KCP is

KCP compiles a source document into a .kcp.yaml artifact: a structured, semantically dense package containing entities, relations, facts, procedures, rules, edge cases and provenance. Every block carries a SHA256 hash that ties it back to the exact span of the source — so an agent can cite, verify and trace anything it uses.

# excerpt of a .kcp.yaml package
entities:
  - id: e1
    name: "Insulin pump"
    type: "device"
relations:
  - subject: e1
    predicate: "requires"
    object: "calibration_every_72h"
rules:
  - id: r1
    when: "glucose > 250 mg/dL"
    then: "deliver correction bolus"
    severity: "critical"
provenance:
  source_sha256: "9f2c…"
  page: 42
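The provenance idea above can be sketched in a few lines of Python. The `verify_block` helper below is hypothetical (KCP's actual verification code is not shown here); it simply illustrates the mechanism: recompute the SHA256 of the cited source span and compare it to the hash recorded in the package.

```python
import hashlib

def verify_block(source_span: str, recorded_sha256: str) -> bool:
    """Recompute the hash of a cited source span and compare it to the
    hash recorded in the .kcp package's provenance block."""
    digest = hashlib.sha256(source_span.encode("utf-8")).hexdigest()
    return digest == recorded_sha256

# Illustrative usage: hash a span at compile time, verify it later.
span = "Calibrate the insulin pump every 72 hours."
recorded = hashlib.sha256(span.encode("utf-8")).hexdigest()
assert verify_block(span, recorded)            # untouched span verifies
assert not verify_block(span + " ", recorded)  # any edit breaks the chain
```

Because the hash anchors a block to an exact byte span, an agent quoting that block can always be audited back to the original document.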

The three layers

1. Compiler

Upload a PDF, TXT or Markdown file and the compiler extracts entities, relations, facts and procedures into a .kcp package. The pipeline is deterministic where it can be, and assisted by LLMs where extraction needs interpretation.
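A deterministic-first, LLM-assisted pipeline of that shape can be sketched as follows. Function names and the regex are illustrative assumptions, not the compiler's published internals:

```python
import re

def extract_rules_deterministic(text: str):
    """Deterministic stage: pattern-match explicit if/then statements."""
    pattern = re.compile(r"[Ii]f (?P<when>[^,]+), (?:then )?(?P<then>[^.]+)\.")
    return [m.groupdict() for m in pattern.finditer(text)]

def compile_document(text: str, llm_fallback=None):
    """Run the cheap deterministic extractor first; defer to an
    LLM-assisted stage only when interpretation is needed."""
    rules = extract_rules_deterministic(text)
    if not rules and llm_fallback is not None:
        rules = llm_fallback(text)  # ambiguous prose needs a model
    return {"rules": rules}

doc = "If glucose exceeds 250 mg/dL, deliver a correction bolus."
print(compile_document(doc))
```

The design choice matters for reproducibility: everything the deterministic stage produces is bit-for-bit repeatable, and only the residue requires model interpretation.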

2. Visualizer

A .kcp package is also a navigable graph. The Visualizer renders the compiled artifact as an annotated YAML view side-by-side with an interactive entity graph, so researchers can audit what was extracted and where it came from.

3. Science Lab

The most distinctive layer. The Lab runs controlled experiments in three parallel conditions — pdf_raw, txt_raw and kcp — across configurable replicas, with random question order, SHA256-hashed prompts and a blind LLM judge that grades each response. The result is not a feeling; it is evidence.

Anatomy of a .kcp package

Each package is a single YAML/JSON file with a small but expressive schema:

  • entities — the nouns of the document

  • relations — typed edges between entities

  • facts — atomic, verifiable statements

  • procedures — ordered steps with preconditions

  • rules — when / then logic with severity

  • edge_cases — what the document explicitly handles at the boundary

  • provenance — SHA256 hashes that anchor every block to the source
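Since a package can be serialized as JSON as well as YAML, the block types above can be explored with nothing but the standard library. The records below are an invented subset for illustration, not a real compiled package:

```python
import json

# Illustrative subset of a compiled package, serialized as JSON.
package = json.loads("""
{
  "facts": [
    {"id": "f1", "text": "Calibration is required every 72 hours."}
  ],
  "procedures": [
    {"id": "p1",
     "preconditions": ["pump is paired"],
     "steps": ["open menu", "select calibrate", "confirm reading"]}
  ],
  "edge_cases": [
    {"id": "x1", "when": "sensor offline", "handling": "fall back to manual entry"}
  ]
}
""")

# Each block type is just a list of small, auditable records.
for proc in package["procedures"]:
    print(proc["id"], "->", proc["steps"])
```

The point of the schema is exactly this flatness: every block is a small record a human can read and a program can filter, rather than an opaque vector.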

The Science Lab — proving the gain

Anyone can claim their preprocessing makes LLMs smarter. KCP ships with the apparatus to measure it. Each experiment defines a hypothesis, a question battery (factual, inferential, procedural, causal, edge-case), gold answers, an agent model and a blind judge model. The Lab then runs the same battery against three contexts:

  1. pdf_raw — the original PDF text dumped into the prompt

  2. txt_raw — a cleaned plain-text version

  3. kcp — the compiled .kcp.yaml package
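An experiment definition along those lines might look like the sketch below. All field names and model identifiers are hypothetical placeholders mirroring the description above, not the Lab's actual schema:

```python
# Hypothetical experiment definition mirroring the Lab's inputs.
experiment = {
    "hypothesis": "kcp outperforms pdf_raw on procedural questions",
    "conditions": ["pdf_raw", "txt_raw", "kcp"],
    "replicas": 5,
    "randomize_question_order": True,
    "agent_model": "agent-model-id",   # placeholder identifier
    "judge_model": "judge-model-id",   # blind judge, placeholder
    "battery": [
        {"type": "factual",
         "question": "How often must the pump be calibrated?",
         "gold": "Every 72 hours."},
        {"type": "edge_case",
         "question": "What happens when the sensor is offline?",
         "gold": "Fall back to manual entry."},
    ],
}

assert set(experiment["conditions"]) == {"pdf_raw", "txt_raw", "kcp"}
```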

Statistical analysis uses bootstrap 95% confidence intervals and the Wilcoxon paired signed-rank test. A composite score combines accuracy, faithfulness and token cost, and the dashboard renders bar, radar and heatmap views, with side-by-side response diffs and a downloadable PDF report.
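The bootstrap half of that analysis is simple enough to sketch with the standard library. The scores are made up for illustration; for the paired Wilcoxon test one would reach for `scipy.stats.wilcoxon`:

```python
import random
import statistics

def bootstrap_ci(paired_diffs, n_resamples=10_000, alpha=0.05, seed=7):
    """95% bootstrap confidence interval for the mean paired
    difference (e.g. kcp accuracy minus pdf_raw accuracy)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(paired_diffs, k=len(paired_diffs)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up per-question accuracy deltas between kcp and pdf_raw.
diffs = [0.10, 0.05, 0.20, 0.00, 0.15, 0.08, 0.12, -0.02, 0.18, 0.07]
low, high = bootstrap_ci(diffs)
print(f"mean delta {statistics.mean(diffs):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

An interval that excludes zero is the kind of evidence the Lab reports instead of a vibe check.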

How KCP differs from RAG and fine-tuning

  • vs. RAG: RAG retrieves text chunks. KCP supplies a structured graph with explicit relations and provenance.

  • vs. embeddings: embeddings are opaque vectors. KCP blocks are human-readable and auditable.

  • vs. fine-tuning: fine-tuning bakes knowledge into weights. KCP keeps knowledge inspectable, swappable and versionable.

  • vs. generic benchmarks: the Science Lab produces reproducible experiments tied to your own documents.

Where KCP fits today

  • Compliance & legal — verifiable rules with provenance

  • Academic research — measurable comparisons across model families

  • Healthcare — explicit edge cases and severity-tagged rules

  • Finance & engineering — procedural knowledge that agents can execute

  • LLM vendor evaluation — apples-to-apples context-quality benchmarks

Architecture at a glance

The platform is built on TanStack Start, Tailwind v4 and shadcn, with createServerFn handling all backend calls behind requireSupabaseAuth. Seven tables, full RLS, server-side PDF generation and a Lovable AI Gateway power the judge and the assistant features inside the experiment wizard.

The road ahead

KCP today is a working compiler, visualizer and science lab. KCP tomorrow is an open standard for context exchange between humans, tools and LLMs. The roadmap includes public experiment sharing, domain-specific question batteries, a KCP marketplace, a public API, IDE plugins and KCP-native agents.

If LLMs are going to do real work on real documents, the documents themselves need an upgrade. KCP is that upgrade — and the Science Lab is how we keep ourselves honest about it.

KCP · Protocol · LLM · Context Engineering · Science Lab
