Building a semantic search engine in ±250 lines of Python

Once upon a time I wrote a post about building a toy TF-IDF keyword search engine. It has been one of the more popular posts I've written, and in this age of AI I felt a sequel was long overdue. That search engine is pretty fast (even though it's written in pure Python): it ranks results with TF-IDF, and it can rank 6.4 million Wikipedia articles for a given query in milliseconds. But it has absolutely no notion of what words mean. ...

February 9, 2026 · 14 min · Bart de Goede
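To make that limitation concrete, here is a toy sketch of TF-IDF ranking. It is not the code from the post: the three-document corpus, the whitespace tokenizer, and the log10 IDF are assumptions chosen for illustration. Because matching happens on literal tokens, a query for "automobile" never finds the document about a "car".

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the Wikipedia data used in the post).
documents = {
    1: "the car drove down the road",
    2: "an automobile parked near the house",
    3: "the road to the house was long",
}


def tokenize(text: str) -> list[str]:
    # Simplest possible analysis: lowercase and split on whitespace.
    return text.lower().split()


tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}


def idf(term: str) -> float:
    # Inverse document frequency: terms found in fewer documents weigh more.
    df = sum(1 for tokens in tokenized.values() if term in tokens)
    return math.log10(len(tokenized) / df) if df else 0.0


def tf_idf_search(query: str) -> list[tuple[int, float]]:
    query_terms = tokenize(query)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)  # term frequency per document
        score = sum(counts[term] * idf(term) for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(tf_idf_search("car"))         # only document 1 contains the token "car"
print(tf_idf_search("automobile"))  # only document 2; the two words are unrelated tokens
print(tf_idf_search("vehicle"))     # no document contains "vehicle", so nothing matches
```

A term that appears in every document (like "the" here) gets an IDF of zero, so it contributes nothing to the score. That is useful, but no amount of weighting makes "car" and "automobile" related; that takes a representation of meaning.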

Modernizing my 150-line Python search engine: Yahoo! dumps -> Hugging Face 🤗

A few years ago I wrote a full-text search engine in 150 lines of Python. The Wikipedia data source it relied on has since been discontinued, and the tooling around it was showing its age. I wanted to (finally) write a follow-up about semantic search, but I realized that I had to get the old repository in a working state first. It’s now using Hugging Face (🤗) datasets, uv, ruff, pytest, and GitHub Actions, without touching the core search logic. ...

February 9, 2026 · 6 min · Bart de Goede

Building a full-text search engine in 150 lines of Python code

Full-text search is everywhere. Whether you were finding a book on Scribd, a movie on Netflix, toilet paper on Amazon, or anything else on the web through Google (like how to do your job as a software engineer), you've searched vast amounts of unstructured data multiple times today. What's even more amazing is that even though you searched millions (or billions) of records, you got a response in milliseconds. In this post, we are going to explore the basic components of a full-text search engine and use them to build one that can search across millions of documents and rank them by relevance in milliseconds, in fewer than 150 lines of Python code! ...

March 24, 2021 · 15 min · Bart de Goede
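The two core components can be sketched in a few lines: an analysis step that turns text into tokens, and an inverted index that maps each token to the documents containing it. This is a minimal sketch under stated assumptions (a tiny hypothetical stopword list, no stemming, boolean AND queries, no ranking), not the implementation from the post.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; real engines use larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def analyze(text: str) -> list[str]:
    # Analysis: lowercase, split into word tokens, drop stopwords.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


class InvertedIndex:
    def __init__(self) -> None:
        # Maps each token to the set of document IDs that contain it.
        self.index: dict[str, set[int]] = defaultdict(set)
        self.documents: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.documents[doc_id] = text
        for token in analyze(text):
            self.index[token].add(doc_id)

    def search(self, query: str) -> set[int]:
        # Boolean AND: keep only documents that contain every query token.
        # .get() avoids inserting empty entries into the defaultdict.
        postings = [self.index.get(token, set()) for token in analyze(query)]
        if not postings:
            return set()
        return set.intersection(*postings)


# Usage with hypothetical document titles.
index = InvertedIndex()
index.add(1, "London Beer Flood")
index.add(2, "The Great Fire of London")
index.add(3, "Beer in the Middle Ages")

print(sorted(index.search("london beer")))  # [1]
print(sorted(index.search("London")))       # [1, 2]
```

Looking up a token is a single dictionary access, which is what lets an index like this answer queries quickly even as the number of documents grows; ranking (for example with TF-IDF) is then applied only to the matching documents.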

Use Hugo Output Formats to generate Lunr index files for your static site search

I've been using Lunr.js to enable some basic site search on this blog. Lunr.js requires an index file that contains all the content you want to make searchable. To generate that file, I had a somewhat hacky setup that depended on running a Grunt script on every deploy. That introduces a dependency on Node.js, and nobody really wants any of that for just a static HTML website. ...

July 12, 2019 · 3 min · Bart de Goede

Custom OpenSearch: search from your URL bar

Almost all modern browsers let websites hook into their built-in search feature, so users can search a site directly from the URL bar without first visiting it and hunting for its search box. If your website has search functionality accessible through a basic GET request, it's surprisingly simple to enable this for your website too. ...

November 21, 2018 · 5 min · Bart de Goede

Searching your Hugo site with Lunr

Like many software engineers, I figured I needed a blog of sorts, because it would give me a place for my own notes on “How To Do Things™”, let me have a URL to give people, and share my ramblings about Life, the Universe and Everything Else with whoever wants to read them. ...

March 4, 2018 · 10 min · Bart de Goede