Building a semantic search engine in ±250 lines of Python

Once upon a time I wrote a post about building a toy TF-IDF keyword search engine. It has been one of the more popular posts I’ve written, and in this age of AI I felt a sequel has been long overdue. It’s pretty fast (even though it’s written in pure Python), it ranks results with TF-IDF, and it can rank 6.4 million Wikipedia articles for a given query in milliseconds. But it has absolutely no context of what words mean. ...

February 9, 2026 · 14 min · Bart de Goede

Building a full-text search engine in 150 lines of Python code

Full-text search is everywhere. From finding a book on Scribd, a movie on Netflix, toilet paper on Amazon, or anything else on the web through Google (like how to do your job as a software engineer), you’ve searched vast amounts of unstructured data multiple times today. What’s even more amazing is that even though you searched millions (or billions) of records, you got a response in milliseconds. In this post, we are going to explore the basic components of a full-text search engine, and use them to build one that can search across millions of documents and rank them according to their relevance in milliseconds, in less than 150 lines of Python code! ...

March 24, 2021 · 15 min · Bart de Goede