Building a full-text search engine in 150 lines of Python code

Full-text search is everywhere. From finding a book on Scribd, a movie on Netflix or toilet paper on Amazon, to anything else on the web through Google (like how to do your job as a software engineer), you’ve searched vast amounts of unstructured data multiple times today. What’s even more amazing is that even though you searched millions (or billions) of records, you got a response in milliseconds. In this post, we are going to build a basic full-text search engine that can search across millions of documents and rank them according to their relevance to the query in milliseconds, in fewer than 150 lines of code!

Read On →

Use Google Cloud Text-to-Speech to create an audio version of your blog posts

Audio is big. Like really big, and growing fast, to the tune of “two-thirds of the population listens to online audio” and “weekly online listeners reporting an average nearly 17 hours of listening in the last week”. These numbers include all kinds of audio, from online radio stations and audiobooks to streaming services and podcasts (hi Spotify!). It makes sense too. Audio content is easier to consume and more engaging than written content while you’re on the go, exercising, commuting or doing household chores. But what do you do if you’re like me and don’t have the time or recording equipment to ride this podcasting wave, and just write the occasional blog post?

Read On →

Use Hugo Output Formats to generate Lunr index files for your static site search

I’ve been using Lunr.js to enable some basic site search on this blog. Lunr.js requires an index file that contains all the content you want to make available for search. In order to generate that file, I had a kind of hacky setup, depending on running a Grunt script on every deploy, which introduces a dependency on node, and nobody really wants any of that for just a static HTML website.

Read On →

Custom OpenSearch: search from your URL bar

Almost all modern browsers let websites plug into the built-in search feature, so users can search your site directly from the URL bar without first going to your website and finding the search input box. If your website has search functionality accessible through a basic GET request, it’s surprisingly simple to enable this for your website too.

Read On →

Free SSL on GitHub Pages with a custom domain: Part 2 - Let's Encrypt

GitHub Pages has just become even more awesome. Since yesterday, GitHub Pages supports HTTPS for custom domains. And yes, it is still free!

Read On →

Free SSL with a custom domain on GitHub Pages

GitHub Pages is pretty awesome. It lets you push a bunch of static HTML (and/or CSS and JavaScript) to a GitHub repository, and they’ll host and serve it for you. For free!

Read On →

Bloom filters, using bit arrays for recommendations, caches and Bitcoin

Bloom filters are cool. In my experience, the Bloom filter is a somewhat underestimated data structure that sounds more complex than it actually is. In this post I’ll go over what they are and how they work (I’ve hacked together an interactive example to help visualise what happens behind the scenes), and look at some of their use cases in the wild.

Read On →

Searching your Hugo site with Lunr

Like many software engineers, I figured I needed a blog of sorts, because it would give me a place for my own notes on “How To Do Things™”, let me have a URL to give people, and share my ramblings about Life, the Universe and Everything Else with whoever wants to read them.

Read On →