Use Hugo Output Formats to generate Lunr index files for your static site search

I’ve been using Lunr.js to enable some basic site search on this blog. Lunr.js requires an index file that contains all the content you want to make available for search. In order to generate that file, I had a kind of hacky setup, depending on running a Grunt script on every deploy, which introduces a dependency on node, and nobody really wants any of that for just a static HTML website.

Listen to this article instead
waveform

I have been wanting forever to have Hugo build that file for me instead1. As it turns out, Output Formats2 make building that index file very easy. Output formats let you generate your content in other formats than HTML, such as AMP or XML for an RSS feed, and it also speaks JSON.

The search on my blog lives on the homepage, where some (very ugly) Javascript downloads the index file, parses it contents into an inverted index, and replaces the content on the page with search results whenever someone starts typing. Essentially, I want to create some JSON output on my homepage (index.json instead of index.html).

I added the following snippet to my config.toml, that says that besides HTML, the homepage also has JSON output:

[outputs]
    home = ["HTML", "JSON"]
    page = ["HTML"]

N.B.: this means that there won’t be a JSON version of the other pages; I just need it on my homepage, because that serves as the search results page too.

Now, I don’t want that index.json file to basically be the list of links it is in the HTML version and in the RSS feed, so I added an index.json file in my layouts folder with the following content:

[
    {{ range $index, $page := .Site.Pages }}
    {{- if eq $page.Type "post" -}}
        {{- if $page.Plain -}}
            {{- if and $index (gt $index 0) -}},{{- end }}
                {
                    "href": "{{ $page.Permalink }}",
                    "title": "{{ htmlEscape $page.Title }}",
                    "categories": [{{ range $tindex, $tag := $page.Params.categories }}{{ if $tindex }}, {{ end }}"{{ $tag| htmlEscape }}"{{ end }}],
                    "content": {{$page.Plain | jsonify}}
                }
            {{- end -}}
      {{- end -}}
    {{- end -}}
]

This will render a JSON file (named index.json) with an array in the root directory of my site, and every item in that array is one of the .Site.Pages (i.e. my posts), whenever that page has text in it and it’s not the homepage. I didn’t bother with minification, because the file is tiny and will be served nicely gzipped by Cloudflare anyway. Whenever Hugo builds the site, it will reindex all the data (i.e. rebuild this file), and I don’t have a dependency on Node and Grunt scripts anymore.


  1. Ever since someone opened a GitHub issue about it 😄 ↩︎

  2. Ships with Hugo version 0.20.0 or greater. ↩︎