Knowledge System

I didn’t build this because I thought the world needed another RAG framework.
I built it because I didn’t trust the answers I was getting—and I didn’t trust my own understanding of why those answers existed.

Reading about knowledge graphs and retrieval-augmented generation is easy. Nodding along to architecture diagrams is easy. Believing that “this reduces hallucinations” is easy.
Understanding where trust actually comes from is not.
So I built KnowGraphRAG, not as a product, but as an experiment: What happens if you stop treating the LLM as the center of intelligence, and instead force it to speak only from a
structure you can inspect?

Why Chunk-Based RAG Breaks Down in Real Work

Traditional RAG systems tend to look like this:

1. Break documents into chunks
2. Embed those chunks
3. Retrieve “similar” chunks at query time

Hand them to an LLM and hope it behaves

This works surprisingly well—until it doesn’t.

The failure modes show up fast when:

you’re using smaller local models
your data isn’t clean prose (logs, configs, dumps, CSVs)
you care why an answer exists, not just what it says

Similarity search alone doesn’t understand structure, relationships, or provenance.
Two chunks can be “similar” and still be misleading when taken together. And once the LLM starts bridging gaps on its own, hallucinations creep in—especially on constrained
hardware.

I wasn’t interested in making the model smarter.
I was interested in making it more constrained.

Flipping the Model: The Graph Comes First

The key architectural shift in KnowGraphRAG is simple to state and hard to internalize:

The knowledge graph is the system of record.
The LLM is just a renderer.

Under the hood, ingestion looks roughly like this:

Documents are ingested whole, regardless of format
– PDFs, DOCX, CSV, JSON, XML, network configs, logs
They are chunked, but chunks are not treated as isolated facts
Entities are extracted (IPs, orgs, people, hosts, dates, etc.)
Relationships are created
– document → chunk
– chunk → chunk (sequence)
– document → entity
– entity → entity (when relationships can be inferred)
Everything is stored in a graph, not a vector index

Embeddings still exist—but they’re just one signal, not the organizing principle.
The result is a graph where

documents know what they contain
chunks know where they came from
entities know who mentions them
relationships are explicit, not inferred on the fly

That structure turns out to matter a lot.

What “Retrieval” Means in a Graph-Based RAG

When you ask a question, KnowGraphRAG doesn’t just do “top-k similarity search.”
Instead, it roughly follows this flow:

Extract entities from the query
– Not embeddings yet—actual concepts
Anchor the search in the graph
– Find documents, chunks, and entities already connected
Traverse outward
– Follow relationships to build a connected subgraph
Use embeddings to rank, not invent
– Similarity helps order candidates, not define truth
Expand context deliberately
– Adjacent chunks, related entities, structural neighbors

Only after that context is assembled does the LLM get involved.

And when it does, it gets a very constrained prompt:

- Here is the context
- Here are the citations
- Do not answer outside of this

This is how hallucinations get starved—not eliminated, but suffocated.

Why This Works Especially Well with Local LLMs

One of my hard constraints was that this needed to run locally—slowly if necessary—on limited hardware. Even something like a Raspberry Pi.

That constraint forced an architectural honesty check.

Small, non-reasoning models are actually very good at:

summarizing known facts
rephrasing structured input
correlating already-adjacent information

They are terrible at inventing missing links responsibly.

By moving correlation, traversal, and selection into the graph layer, the LLM no longer has to “figure things out.” It just has to talk.

That shift made local models dramatically more useful—and far more predictable.

The Part I Didn’t Expect: Auditability Becomes the Feature

The biggest surprise wasn’t retrieval quality.

It was auditability.

Because every answer is derived from:

specific graph nodes
specific relationships
specific documents and chunks

…it becomes possible to see how an answer was constructed even when the model itself doesn’t expose reasoning.

That turns out to be incredibly valuable for:

compliance work
risk analysis
explaining decisions to humans who don’t care about embeddings

Instead of saying “the model thinks,” you can say:

these entities were involved
these documents contributed
this is the retrieval path

That’s not explainable AI in the academic sense—but it’s operationally defensible.

What KnowGraphRAG Actually Is (and Isn’t)

KnowGraphRAG ended up being a full system, not a demo:

Graph-backed storage (in-memory + persistent)
Entity and relationship extraction
Hybrid retrieval (graph-first, embeddings second)
Document versioning and change tracking
Query history and audit trails
Batch ingestion with guardrails
Visualization so you can see the graph
Support for local and remote LLM backends
An MCP interface so other tools can drive it

But it’s not a silver bullet.
It won’t magically make bad data good.
It won’t remove all hallucinations.
It won’t replace judgment.

What it does do is move responsibility out of the model and back into the system you control.

The Mindset Shift That Matters

If there’s one lesson I’d pass on, it’s this:

Don’t ask LLMs to be trustworthy.
Architect systems where trust is unavoidable.

Knowledge graphs and RAG aren’t a panacea—but together, they create boundaries.
And boundaries are what make local LLMs useful for serious work.

I didn’t fully understand that until I built it.

And now that I have, I don’t think I could go back.