Building a full-text search engine in 150 lines of Python code

Full-text search is everywhere. From finding a book on Scribd, a movie on Netflix, or toilet paper on Amazon, to finding anything else on the web through Google (like how to do your job as a software engineer), you’ve searched vast amounts of unstructured data multiple times today. What’s even more amazing is that, even though you searched millions (or billions) of records, you got a response in milliseconds. In this post, we are going to explore the basic components of a full-text search engine, and use them to build one that can search across millions of documents and rank them according to their relevance in milliseconds, in fewer than 150 lines of Python code! ...

March 24, 2021 · 15 min · Bart de Goede

Use Hugo Output Formats to generate Lunr index files for your static site search

I’ve been using Lunr.js to enable some basic site search on this blog. Lunr.js requires an index file that contains all the content you want to make available for search. To generate that file, I had a somewhat hacky setup that depended on running a Grunt script on every deploy, which introduced a dependency on Node, and nobody really wants any of that for just a static HTML website. ...

July 12, 2019 · 3 min · Bart de Goede

Custom OpenSearch: search from your URL bar

Almost all modern browsers let websites plug into the built-in search feature, so that users can reach a site’s search directly from the URL bar, without first going to the website and finding the search input box. If your website has search functionality accessible through a basic GET request, it’s surprisingly simple to enable this for your website too. ...

November 21, 2018 · 5 min · Bart de Goede

Searching your Hugo site with Lunr

Like many software engineers, I figured I needed a blog of sorts, because it would give me a place for my own notes on “How To Do Things™”, let me have a URL to give people, and share my ramblings about Life, the Universe and Everything Else with whoever wants to read them. ...

March 4, 2018 · 10 min · Bart de Goede