I didn’t build this because I thought the world needed another RAG framework.
I built it because I didn’t trust the answers I was getting—and I didn’t trust my own understanding of why those answers existed.
Reading about knowledge graphs and retrieval-augmented generation is easy. Nodding along to architecture diagrams is easy. Believing that “this reduces hallucinations” is easy.
Understanding where trust actually comes from is not.
So I built KnowGraphRAG, not as a product, but as an experiment: what happens if you stop treating the LLM as the center of intelligence, and instead force it to speak only from a structure you can inspect?
Traditional RAG systems tend to look like this:
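In miniature, something like this; every name below is an illustrative stand-in (a bag-of-words "embedding," a fake LLM handle), not code from any particular framework:

```python
# Classic RAG in miniature: embed, retrieve by similarity, stuff the prompt, generate.
# Everything here is an illustrative stand-in, not production code.
from collections import Counter
import math

def embed(text):
    # Stand-in "embedding": bag-of-words counts, enough to show the shape.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

store = []  # list of (embedding, chunk_text)

def ingest(documents, chunk_size=50):
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((embed(chunk), chunk))

def answer(question, llm, top_k=3):
    # Top-k nearest chunks by similarity, stuffed into one prompt. That's the whole strategy.
    q = embed(question)
    best = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)[:top_k]
    context = "\n---\n".join(chunk for _, chunk in best)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # whatever local or hosted model you point at it
```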
This works surprisingly well—until it doesn’t.
The failure modes show up fast.
Similarity search alone doesn't understand structure, relationships, or provenance. Two chunks can be “similar” and still be misleading when taken together. And once the LLM starts bridging gaps on its own, hallucinations creep in—especially on constrained hardware.
I wasn’t interested in making the model smarter.
I was interested in making it more constrained.
The key architectural shift in KnowGraphRAG is simple to state and hard to internalize: the graph, not the model, is the organizing principle. Correlation, traversal, and selection live in a structure you can inspect; the LLM only speaks from what that structure hands it.
Under the hood, ingestion looks roughly like this:
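A rough sketch of that shape. The chunking, the toy entity extraction, and the embedding below are stand-ins for whatever models you actually run, and the graph here is plain networkx rather than the real store:

```python
# Ingestion as graph construction, not just embedding.
# Illustrative only: extract_entities() and embed() are trivial stand-ins.
import networkx as nx

def extract_entities(text):
    # Stand-in for a real NER step: treat capitalized words as entities.
    return {w.strip(".,!?") for w in text.split() if w[:1].isupper()}

def embed(text):
    # Stand-in for a real embedding model.
    return [float(len(text))]

graph = nx.MultiDiGraph()

def ingest(doc_id, text, chunk_size=80):
    words = text.split()
    for i in range(0, len(words), chunk_size):
        chunk = " ".join(words[i:i + chunk_size])
        chunk_id = f"{doc_id}:chunk:{i // chunk_size}"

        # Every chunk is a node that carries its provenance, with the embedding
        # stored as just one property among several.
        graph.add_node(chunk_id, kind="chunk", text=chunk,
                       source=doc_id, embedding=embed(chunk))

        # Entities become first-class nodes, linked back to the chunks
        # that mention them.
        for entity in extract_entities(chunk):
            graph.add_node(entity, kind="entity")
            graph.add_edge(chunk_id, entity, relation="mentions")
```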
Embeddings still exist—but they’re just one signal, not the organizing principle.
The result is a graph where chunks, entities, and sources are explicit nodes, relationships are explicit edges, and every piece of context carries provenance back to the document it came from.
That structure turns out to matter a lot.
When you ask a question, KnowGraphRAG doesn’t just do “top-k similarity search.”
Instead, it roughly follows this flow:
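In sketch form, continuing the illustrative graph from the ingestion example above (the entity matching and ranking here are deliberately naive):

```python
# Query-time retrieval as graph traversal, not just top-k similarity.
# Continues the illustrative graph and extract_entities() from the ingestion sketch.
def retrieve(question, hops=1, limit=5):
    asked = extract_entities(question)

    # 1. Anchor: find entity nodes the question actually mentions.
    anchors = [n for n, d in graph.nodes(data=True)
               if d.get("kind") == "entity" and n in asked]

    # 2. Expand: walk outward from the anchors to connected chunks.
    candidates = set()
    for anchor in anchors:
        for neighbor in nx.ego_graph(graph.to_undirected(), anchor, radius=hops):
            if graph.nodes[neighbor].get("kind") == "chunk":
                candidates.add(neighbor)

    # 3. Rank: a crude heuristic here; connectivity is what got us the candidates.
    ranked = sorted(candidates, key=lambda c: graph.degree(c), reverse=True)[:limit]

    # 4. Assemble context, keeping provenance attached to every chunk.
    return [(graph.nodes[c]["source"], graph.nodes[c]["text"]) for c in ranked]
```

The specifics are naive on purpose; the point is that relevance is decided by walking explicit relationships, not by similarity alone.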
Only after that context is assembled does the LLM get involved.
And when it does, it gets a very constrained prompt:
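In spirit, it looks something like this (an illustrative template, not the exact wording):

```python
# The shape of the prompt the LLM finally sees. Paraphrased, not the real template.
PROMPT_TEMPLATE = """You are answering strictly from the context below.
Each item includes its source. Do not use any other knowledge.
If the context does not contain the answer, say so.

Context:
{context}

Question: {question}
"""

def build_prompt(question, retrieved):
    context = "\n".join(f"[{source}] {text}" for source, text in retrieved)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```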
This is how hallucinations get starved—not eliminated, but suffocated.
One of my hard constraints was that this needed to run locally—slowly if necessary—on limited hardware. Even something like a Raspberry Pi.
That constraint forced an architectural honesty check.
Small, non-reasoning models are actually very good at restating facts they're handed, following narrow instructions, and turning structured context into readable prose.
They are terrible at inventing missing links responsibly.
By moving correlation, traversal, and selection into the graph layer, the LLM no longer has to “figure things out.” It just has to talk.
That shift made local models dramatically more useful—and far more predictable.
The biggest surprise wasn’t retrieval quality.
It was auditability.
Because every answer is derived from specific nodes, specific edges, and a traversal you can replay, it becomes possible to see how an answer was constructed even when the model itself doesn't expose reasoning.
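One way to make that concrete is to persist a trace alongside every answer. A minimal sketch; the structure and field names here are mine, not a canonical schema:

```python
# One concrete form for the audit trail: record, per answer, exactly which
# nodes and chunks produced the context. Field names are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class AnswerTrace:
    question: str
    anchor_entities: list   # entity nodes matched in the question
    chunks_used: list       # (source, chunk_id) pairs handed to the model
    prompt: str             # the exact prompt the LLM received
    answer: str             # what it said back

    def to_json(self):
        return json.dumps(asdict(self), indent=2)
```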
That turns out to be incredibly valuable for debugging retrieval, reviewing answers after the fact, and defending results to people who never saw the prompt.
Instead of saying “the model thinks,” you can say: “this answer was built from these nodes, connected by these edges, drawn from these sources.”
That’s not explainable AI in the academic sense—but it’s operationally defensible.
KnowGraphRAG ended up being a full system, not a demo: ingestion, graph construction, traversal-based retrieval, constrained prompting, and answer tracing, all running against local models.
But it’s not a silver bullet.
It won’t magically make bad data good.
It won’t remove all hallucinations.
It won’t replace judgment.
What it does do is move responsibility out of the model and back into the system you control.
If there’s one lesson I’d pass on, it’s this:
Don’t ask LLMs to be trustworthy.
Architect systems where trust is unavoidable.
Knowledge graphs and RAG aren’t a panacea—but together, they create boundaries.
And boundaries are what make local LLMs useful for serious work.
I didn’t fully understand that until I built it.
And now that I have, I don’t think I could go back.