13 May 2026

Finding a Needle in a Million Haystacks, in 30ms

Your personal knowledge base grows quietly, accumulating notes, reports, articles, and emails, until one day search starts to lag. The culprit isn't your hardware. It's the approach: checking every document chunk one by one scales linearly, and linear eventually loses. This post explores how ThinkableSpace sidesteps that wall using HNSW (Hierarchical Navigable Small World) graphs, which apply the same principle as the social-network idea of "six degrees of separation" to navigate millions of vectors in 15 to 30ms, regardless of library size.

So far we've established that ThinkableSpace converts your documents into embeddings: lists of numbers that represent meaning. Finding the most relevant results for your search means finding the embeddings that are mathematically closest to your query's embedding.
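To make "closest" concrete, here is a minimal sketch of the distance calculation in Python, using NumPy and made-up 4-dimensional vectors for illustration (real embeddings have hundreds of dimensions):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # 1.0 means the vectors point the same way (same meaning);
        # values near 0 mean the texts are unrelated.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    query     = np.array([0.2, 0.8, 0.1, 0.4])      # embedding of the search text
    on_topic  = np.array([0.25, 0.75, 0.05, 0.45])  # chunk about the same topic
    off_topic = np.array([0.9, 0.1, 0.7, 0.0])      # unrelated chunk

    print(cosine_similarity(query, on_topic))   # close to 1.0
    print(cosine_similarity(query, off_topic))  # much lower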

Simple enough in theory. But there's a practical problem that becomes impossible to ignore once your knowledge base grows beyond a few dozen files.

The obvious approach, and why it breaks

The most straightforward way to find the closest embedding to your query is to compare your query against every single document chunk you've indexed.

Check chunk 1. Check chunk 2. Check chunk 3. All the way to the end.
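In code, the exhaustive approach is one loop over the whole library: one similarity score per chunk, every time you search. A minimal sketch for illustration, not ThinkableSpace's actual implementation:

    import numpy as np

    def brute_force_search(query: np.ndarray, chunks: np.ndarray, k: int = 5):
        # One cosine similarity per chunk: the work grows linearly with N.
        sims = chunks @ query / (np.linalg.norm(chunks, axis=1) * np.linalg.norm(query))
        return np.argsort(-sims)[:k]   # indices of the k most similar chunks

    chunks = np.random.rand(100_000, 384).astype(np.float32)  # 100k chunk embeddings
    query  = np.random.rand(384).astype(np.float32)
    print(brute_force_search(query, chunks))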

For a small collection (a few hundred documents, a few thousand chunks), this works fine. The comparisons happen fast enough that you never notice any delay.

But personal knowledge bases grow. A few years of notes, reports, saved articles, and exported emails can easily produce tens of thousands of chunks. A serious researcher or a business user might have hundreds of thousands.

At that scale, checking every single chunk starts taking seconds. And search that takes seconds stops feeling like search; it starts feeling like waiting.

The problem isn't hardware speed. It's the approach itself. No matter how fast your processor gets, a method that scales linearly with the size of your library will eventually lose.

A smarter way to navigate

The solution comes from an insight borrowed from network theory: navigable small worlds.

The "small world" phenomenon is familiar from social networks. Despite the world having billions of people, you can typically reach any stranger through a surprisingly short chain of mutual connections, the famous idea that everyone is separated by about six degrees.

This happens because social networks aren't random. They have structure. Clusters of closely connected people, with occasional long-range connections that bridge distant clusters. You can navigate to any person efficiently by hopping through this structure rather than meeting everyone individually.

Vector search uses the same principle. Instead of searching through all your document chunks one by one, it builds a graph, a web of connections between chunks that are close to each other in meaning.
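A rough sketch of that construction: connect every chunk to its few nearest neighbours, so closeness in meaning becomes adjacency in the graph. This flat, brute-force-built graph is a simplification; real HNSW construction is incremental and layered:

    import numpy as np

    def build_knn_graph(vectors: np.ndarray, m: int = 5) -> dict:
        # Link each chunk to its m nearest neighbours in embedding space.
        graph = {}
        for i, v in enumerate(vectors):
            dists = np.linalg.norm(vectors - v, axis=1)
            dists[i] = np.inf                       # never link a node to itself
            graph[i] = list(np.argsort(dists)[:m])  # the m closest chunks
        return graph

    vectors = np.random.rand(1_000, 64).astype(np.float32)  # 1,000 toy embeddings
    graph = build_knn_graph(vectors)
    print(graph[0])   # the five chunks nearest to chunk 0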

Navigating the graph

When you run a search, the system doesn't visit every node in the graph. It navigates.

It starts at an entry point and asks a simple question: which of this node's neighbours is closer to my query than I currently am? Then it moves there. And asks again. And moves again.

Each hop brings it closer to the answer. The path from the starting point to the nearest result is short, not because the graph is small, but because it's structured to make navigation efficient.
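Greedy navigation is only a few lines. A sketch, assuming the `vectors` array and `graph` neighbour lists from the construction snippet above:

    import numpy as np

    def greedy_search(query, vectors, graph, entry=0):
        # Keep hopping to whichever neighbour is closer to the query;
        # stop when no neighbour improves on the current node.
        current = entry
        current_dist = np.linalg.norm(vectors[current] - query)
        while True:
            best, best_dist = current, current_dist
            for n in graph[current]:
                d = np.linalg.norm(vectors[n] - query)
                if d < best_dist:
                    best, best_dist = n, d
            if best == current:          # no neighbour is closer: stop here
                return current
            current, current_dist = best, best_dist

    query = np.random.rand(64).astype(np.float32)
    print(greedy_search(query, vectors, graph))   # index of a near-nearest chunk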

The technical name for this approach is HNSW, Hierarchical Navigable Small World. The "hierarchical" part adds another layer of elegance: the graph is built in multiple layers, from sparse (few nodes, long-range connections) at the top to dense (all nodes, short-range connections) at the bottom. Search starts at the top, where large jumps can be made quickly, then descends into increasingly fine-grained layers as it zeroes in on the answer.
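You rarely build this structure by hand. Open-source libraries such as hnswlib implement the full layered graph; here is a minimal usage sketch (the parameter values are common illustrative defaults, not ThinkableSpace's settings):

    import hnswlib
    import numpy as np

    dim, n = 384, 100_000
    data = np.random.rand(n, dim).astype(np.float32)

    index = hnswlib.Index(space="cosine", dim=dim)
    # M: neighbours per node; ef_construction: how widely to search while building.
    index.init_index(max_elements=n, M=16, ef_construction=200)
    index.add_items(data)

    index.set_ef(50)   # query-time search width: higher = more accurate, slower
    query = np.random.rand(dim).astype(np.float32)
    labels, distances = index.knn_query(query, k=5)
    print(labels, distances)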

What this means in practice

The result is search that doesn't get slower as your library grows.

Checking every chunk individually takes twice as long when you have twice as many documents. HNSW-based search adds only a fraction of a millisecond, because the number of hops grows with the logarithm of the library size rather than the size itself. Whether you have 10,000 chunks or 1,000,000, the search completes in roughly the same time, typically between 15 and 30 milliseconds from the moment you press Enter.
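A quick back-of-the-envelope comparison makes the gap vivid (the hop counts are a rough log2 estimate; each hop examines only a handful of neighbours):

    import math

    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9,} chunks: ~{n:,} comparisons exhaustively, "
              f"~{math.ceil(math.log2(n))} hops through the graph")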

A 15 to 30 millisecond response is fast enough to feel instant. Fast enough that the delay disappears entirely into the background noise of clicking a button.

Accuracy without exhaustiveness

You might wonder: if you're not checking everything, how do you know you're finding the best results?

This is a fair concern, and it's one the field of approximate nearest neighbour search takes seriously.

The answer is that the graph is built to be navigable in a way that reliably leads to very good answers: not always the mathematically perfect nearest neighbour, but results close enough to be indistinguishable in practice. The typical recall of this approach, the standard accuracy measure for approximate search, is around 95%, meaning that in 19 out of 20 cases the result is identical to what a full exhaustive search would have returned.
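That figure is measurable rather than hand-wavy: run the same queries through both an exact scan and the graph index, and count how often the top results agree. A sketch using hnswlib on random data (actual recall depends on your data and index settings):

    import hnswlib
    import numpy as np

    dim, n, k = 64, 20_000, 10
    data = np.random.rand(n, dim).astype(np.float32)
    queries = np.random.rand(100, dim).astype(np.float32)

    index = hnswlib.Index(space="l2", dim=dim)
    index.init_index(max_elements=n, M=16, ef_construction=200)
    index.add_items(data)
    index.set_ef(50)

    approx_ids, _ = index.knn_query(queries, k=k)   # graph-based, approximate

    hits = 0
    for q, approx in zip(queries, approx_ids):
        exact = np.argsort(np.linalg.norm(data - q, axis=1))[:k]  # exhaustive
        hits += len(set(exact) & set(approx))
    print(f"recall@{k}: {hits / (len(queries) * k):.2f}")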

For document search, this trade-off is entirely reasonable. The difference between the most semantically similar chunk and the second most similar one is rarely meaningful to a human reader. What matters is that relevant results appear at the top, and they reliably do.

Speed as a design constraint, not an afterthought

The reason ThinkableSpace can search a large personal knowledge base in milliseconds isn't that the hardware is unusually powerful. It's that the search is architected to avoid the slow path from the start.

Fast search isn't a luxury feature. It's what determines whether a tool actually gets used. A search that takes three seconds gets used occasionally. A search that returns in 20 milliseconds becomes a reflex.

Next: how ThinkableSpace balances speed and accuracy even further, using the same AI model at different levels of precision depending on what the search needs.