Appendix: Algolia Search

Algolia is a commercial index-as-a-service offering, with a free plan for open source projects. It is used by many websites built with Antora, including Antora’s own website.

This page describes how we have incorporated Algolia to provide a searchable index for the causeway.apache.org website. The steps are based on, and adapted from, Dan Allen’s notes for setting up search on the Antora website.

An alternative and perhaps slightly simpler approach would be to use Algolia’s DocSearch, which performs the scanning automatically.

The benefit of the approach described here is that it could be scripted easily into CI.

Clone the causeway-site repo

If necessary, clone the current version of the Causeway website:

git clone https://github.com/apache/causeway-site .
git checkout asf-site

The "asf-site" branch is what is published, and the content of the site resides in the content directory.

Create Algolia Application and Empty Index

  • Registered on https://algolia.com

  • Applied for open source license

  • On website:

    • Created app: "5ISP5TFAEN"

    • Created empty index "causeway-apache-org"

    • Made note of the search API key: "0fc51c28b4ad46e7318e96d4e97fab7c"

    • Made note of the Admin API key: "xxx"

      The Admin API key must not be made public, as it allows the index to be modified or deleted.
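
The search-only key, by contrast, can only query the index. As a rough way to confirm that the application ID and search key work together, a query could be sent to Algolia's REST search endpoint. This is a minimal sketch, assuming curl is installed; the function name search_index is hypothetical, and the JSON body is built naively, so the search term should not contain quotes.

```shell
# Sketch: query the index with the search-only key via Algolia's REST API.
APP_ID="5ISP5TFAEN"
SEARCH_KEY="0fc51c28b4ad46e7318e96d4e97fab7c"
INDEX="causeway-apache-org"

# search_index is a hypothetical helper; $1 is the search term.
search_index() {
  curl -s -X POST \
    -H "X-Algolia-Application-Id: ${APP_ID}" \
    -H "X-Algolia-API-Key: ${SEARCH_KEY}" \
    -d "{\"query\": \"$1\", \"hitsPerPage\": 1}" \
    "https://${APP_ID}-dsn.algolia.net/1/indexes/${INDEX}/query"
}
```

Once the index has been populated, calling search_index with a term should return a JSON document containing a hits array.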

Create the Crawler config

We use Algolia’s DocSearch crawler tool (also called the scraper) to crawl the site’s static HTML pages and populate the index records. A config file describes how to process the pages.

algolia-config.json
{
  "index_name": "causeway-apache-org",
  "start_urls": [
    "https://causeway.apache.org"
  ],
  "sitemap_urls": [
    "https://causeway.apache.org/sitemap.xml"
  ],
  "stop_urls": [
    "https://causeway[.]apache[.]org/(docs|comguide|conguide|setupguide|relnotes|refguide|userguide|core|security|extensions|valuetypes|vro|vw|pjdo|pjpa|testing|tutorials|incubator|regressiontests)/(2.0.0|2.0.0-RC2|2.0.0-RC1|2.0.0-M9|2.0.0-M8|2.0.0-M7|2.0.0-M6|2.0.0-M5)/.*",
    "https://causeway[.]apache[.]org/versions/(1.17.0|1.16.2|1.16.1|1.16.0|1.15.1|1.15.0|1.14.0|1.13.2|1.13.1|1.13.0|1.12.2|1.12.1|1.12.0|1.11.1|1.11.0)/.*"
  ],
  "selectors": {
    "lvl0": {
      "selector": "//nav[@class='breadcrumbs']//li[last()]/a",
      "type": "xpath",
      "global": true,
      "default_value": "Home"
    },
    "lvl1": ".doc h1",
    "lvl2": ".doc h2",
    "lvl3": ".doc h3",
    "lvl4": ".doc h4",
    "text": ".doc p, .doc td.content, .doc th.tableblock"
  }
}

The stop_urls property lists any paths that should not be crawled. Each entry is a regular expression matched against page URLs.

Our policy is to only index the most recent version. This avoids lots of duplication in the index; previous versions of the page are easily accessible.
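
To see how such a pattern separates older versions from the current one, the regular expression can be tried out with grep. This is a sketch only: the pattern is a shortened, hypothetical version of the first stop_urls entry, the URLs are made-up examples, and grep's matching is not guaranteed to be identical to the scraper's.

```shell
# Shortened, hypothetical stop_urls-style pattern (two components, two versions).
STOP_PATTERN='^https://causeway[.]apache[.]org/(docs|userguide)/(2[.]0[.]0|2[.]0[.]0-RC2)/.*'

# is_stopped is a hypothetical helper: prints "skipped" if the URL matches
# the pattern, otherwise "crawled".
is_stopped() {
  # grep -E uses extended regular expressions; -q suppresses output.
  if printf '%s\n' "$1" | grep -Eq "$STOP_PATTERN"; then
    echo "skipped"
  else
    echo "crawled"
  fi
}

is_stopped "https://causeway.apache.org/userguide/2.0.0/about.html"   # prints "skipped"
is_stopped "https://causeway.apache.org/userguide/latest/about.html"  # prints "crawled"
```

Only URLs whose version segment appears in the alternation are excluded, so a path for the current version falls through and is indexed.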

The reference for this config file format can be found in the DocSearch documentation.

The file itself resides in the https://github.com/apache/causeway-site repo (in the asf-site branch).

Create the algolia.env file

The algolia.env file provides the credentials used to populate the search index maintained by the Algolia service.

algolia.env
APPLICATION_ID=5ISP5TFAEN
API_KEY=xxxx

The API_KEY is available in the 1password.com vault shared between all Causeway committers.

The file itself resides in the https://github.com/apache/causeway-site repo (in the asf-site branch).
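
With algolia-config.json and algolia.env in place, the scraper could be run from Docker. The following is a minimal sketch, assuming Docker is installed, both files are in the current directory, and the public algolia/docsearch-scraper image is used. The function name run_scraper is hypothetical, and the API_KEY in algolia.env must be the real Admin API key rather than the placeholder.

```shell
# Sketch: run the DocSearch scraper in Docker against the two files above.
# run_scraper is a hypothetical helper name.
run_scraper() {
  # Flatten the JSON config onto one line so it can be passed as an env var.
  config=$(tr -d '\n' < algolia-config.json)
  # --env-file supplies APPLICATION_ID and API_KEY; CONFIG carries the crawl rules.
  docker run --rm --env-file=algolia.env -e "CONFIG=$config" algolia/docsearch-scraper
}
```

Calling run_scraper from the directory containing both files would crawl the site and repopulate the index. Because it needs no interactive input, the same call could be placed in a CI job.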

Update the Antora UI bundle

The Antora UI bundle (which defines the skin of the website) was updated. There are four steps:

  • reference the CSS

  • reference the JavaScript

  • set up an input field for the docsearch JavaScript function to hook into

  • run the docsearch JavaScript function on page load

These steps are fully described in the Algolia docs. Other styling options are also available.

Generate and Publish the site

The remaining steps are routine and performed each time there is a change to the site:

And you’re done.