Building a full-text search engine in 150 lines of Python code

Full-text search is everywhere. From finding a book on Scribd, a movie on Netflix, toilet paper on Amazon, or anything else on the web through Google (like how to do your job as a software engineer), you’ve searched vast amounts of unstructured data multiple times today. What’s even more amazing, is that you’ve even though you searched millions (or billions) of records, you got a response in milliseconds. In this post, we are going to explore the basic components of a full-text search engine, and use them to build one that can search across millions of documents and rank them according to their relevance in milliseconds, in less than 150 lines of Python code! ...

March 24, 2021 · 15 min · Bart de Goede

Bloom filters, using bit arrays for recommendations, caches and Bitcoin

Bloom filters are cool. In my experience, it’s a somewhat underestimated data structure that sounds more complex than it actually is. In this post I’ll go over what they are, how they work (I’ve hacked together an interactive example to help visualise what happens behind the scenes) and go over some of their usecases in the wild. ...

March 23, 2018 · 9 min · Bart de Goede