Appendix: Algolia Search
Algolia is a commercial index-as-a-service offering, with a free plan for open source projects. It is used by many websites built with Antora, including Antora’s own website.
This page describes how we have incorporated Algolia to provide a searchable index for the causeway.apache.org website. These notes are adapted from Dan Allen’s notes for setting up search on the Antora website.
An alternative and perhaps slightly simpler approach would be to use Algolia’s DocSearch, which performs the scanning automatically. The benefit of the approach described here is that it could be scripted easily into CI.
Clone the causeway-site
If necessary, clone the current version of the Causeway website:
git clone https://github.com/apache/causeway-site .
git checkout asf-site
The "asf-site" branch is what is published, and the content of the site resides in the content directory.
Create Algolia Application and Empty Index
- Registered on https://algolia.com
- Applied for open source license
- On website:
  - Created app: "5ISP5TFAEN"
  - Created empty index "causeway-apache-org"
  - Made note of the search API key: "0fc51c28b4ad46e7318e96d4e97fab7c"
  - Made note of the Admin API key: "xxx"
    The Admin API key should not be made public, as it allows the index to be modified or deleted.
Create the Crawler config
We use Algolia’s DocSearch crawler tool (also called the scraper) to crawl the static HTML files and populate the index records. A config file describes how to process the files.
{
"index_name": "causeway-apache-org",
"start_urls": [
"https://causeway.apache.org"
],
"sitemap_urls": [
"https://causeway.apache.org/sitemap.xml"
],
"stop_urls": [
"https://causeway[.]apache[.]org/(docs|comguide|conguide|setupguide|relnotes|refguide|userguide|core|security|extensions|valuetypes|vro|vw|pjdo|pjpa|testing|tutorials|incubator|regressiontests)/(2.0.0|2.0.0-RC2|2.0.0-RC1|2.0.0-M9|2.0.0-M8|2.0.0-M7|2.0.0-M6|2.0.0-M5)/.*",
"https://causeway[.]apache[.]org/versions/(1.17.0|1.16.2|1.16.1|1.16.0|1.15.1|1.15.0|1.14.0|1.13.2|1.13.2|1.13.1|1.13.0|1.12.2|1.12.1|1.12.0|1.11.1|1.11.0)/.*"
],
"selectors": {
"lvl0": {
"selector": "//nav[@class='breadcrumbs']//li[last()]/a",
"type": "xpath",
"global": true,
"default_value": "Home"
},
"lvl1": ".doc h1",
"lvl2": ".doc h2",
"lvl3": ".doc h3",
"lvl4": ".doc h4",
"text": ".doc p, .doc td.content, .doc th.tableblock"
}
}
The stop_urls property lists any paths that should not be crawled.
Our policy is to index only the most recent version. This avoids lots of duplication in the index; previous versions of a page are still easily accessible.
The reference for this config file format can be found in Algolia’s DocSearch documentation.
The file itself resides in the https://github.com/apache/causeway-site repo (in the asf-site branch).
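To make the effect of stop_urls concrete, the sketch below checks a few URLs against abbreviated copies of the two patterns from the config. It assumes that the crawler applies each pattern as a regular-expression search, which is how Python's re.search behaves; the crawler's exact matching rules may differ. The helper name is_stopped and the sample URLs are illustrative only and are not part of the crawler.

```python
import re

# Abbreviated copies of the two stop_urls patterns from the config
# (only a few components and versions are kept, for readability).
STOP_URLS = [
    r"https://causeway[.]apache[.]org/(docs|userguide|refguide)/(2.0.0|2.0.0-RC2|2.0.0-M9)/.*",
    r"https://causeway[.]apache[.]org/versions/(1.17.0|1.16.2)/.*",
]


def is_stopped(url: str) -> bool:
    """Return True if any stop pattern matches the URL.

    Assumption: patterns are applied with search semantics (re.search).
    """
    return any(re.search(pattern, url) for pattern in STOP_URLS)


# Hypothetical URLs, for illustration only.
SAMPLE_URLS = [
    "https://causeway.apache.org/docs/2.0.0/about.html",
    "https://causeway.apache.org/docs/latest/about.html",
    "https://causeway.apache.org/versions/1.17.0/index.html",
]

for url in SAMPLE_URLS:
    print("skip " if is_stopped(url) else "crawl", url)
```

Under these assumptions, pages beneath a pinned version number are skipped, while a path that names none of the listed versions is still crawled.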
Create the algolia.env file
The algolia.env file provides the credentials used to populate the search index maintained by the Algolia service.
APPLICATION_ID=5ISP5TFAEN
API_KEY=xxxx
The API_KEY is available in the 1password.com vault shared between all Causeway committers.
The file itself resides in the https://github.com/apache/causeway-site repo (in the asf-site branch).
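With the crawler config and algolia.env in place, a CI job could check the credentials file and assemble the crawler invocation. The following is a minimal sketch, not the project's actual script. It assumes the crawler is run from the algolia/docsearch-scraper Docker image and reads its config from a CONFIG environment variable, as the legacy DocSearch scraper is documented to do; the function names are hypothetical. The command is only built here, not executed.

```python
import json

REQUIRED_KEYS = ("APPLICATION_ID", "API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def scraper_command(env_file: str, config: dict) -> list[str]:
    """Assemble (but do not run) a docker invocation of the scraper.

    Assumption: image name and CONFIG variable follow the legacy
    DocSearch scraper usage; adjust if the tool has changed.
    """
    compact = json.dumps(config, separators=(",", ":"))
    return [
        "docker", "run", "--rm",
        f"--env-file={env_file}",
        "-e", f"CONFIG={compact}",
        "algolia/docsearch-scraper",
    ]


# Placeholder content mirroring the algolia.env file shown above.
env = parse_env("APPLICATION_ID=5ISP5TFAEN\nAPI_KEY=xxxx\n")
print("missing:", missing_keys(env))
print(scraper_command("algolia.env", {"index_name": "causeway-apache-org"}))
```

Passing the resulting list to subprocess.run would start the crawl; keeping the command as a list avoids shell quoting problems with the JSON config.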
Update the Antora UI bundle
The Antora UI bundle (which defines the skin of the website) was updated. There are four steps:
- reference the CSS
- reference the JavaScript
- set up an input field for the docsearch JavaScript function to hook into
- run the docsearch JavaScript function on page load
These steps are fully described in the Algolia docs, which also cover other options for styling.
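Before wiring up the search box, it can be useful to confirm that the index returns results for the search-only key. The sketch below builds a query against Algolia's REST search endpoint using the application ID, index name and search API key noted earlier. It is an illustration rather than part of the site build: the function names are hypothetical, the key may have been rotated since these notes were written, and actually sending the request needs network access and a populated index.

```python
import json
import urllib.request

APPLICATION_ID = "5ISP5TFAEN"
SEARCH_API_KEY = "0fc51c28b4ad46e7318e96d4e97fab7c"  # search-only key, safe to publish
INDEX_NAME = "causeway-apache-org"


def build_query_request(query: str, hits_per_page: int = 3) -> urllib.request.Request:
    """Build (but do not send) a search request for the index."""
    url = f"https://{APPLICATION_ID}-dsn.algolia.net/1/indexes/{INDEX_NAME}/query"
    body = json.dumps({"query": query, "hitsPerPage": hits_per_page}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Algolia-Application-Id": APPLICATION_ID,
            "X-Algolia-API-Key": SEARCH_API_KEY,
            "Content-Type": "application/json",
        },
    )


def search(query: str) -> list:
    """Send the request and return the list of hits (requires network)."""
    with urllib.request.urlopen(build_query_request(query), timeout=10) as resp:
        return json.load(resp)["hits"]
```

Calling search with a term that appears in the docs should return a non-empty list once the crawler has run; an empty list suggests the index has not been populated or the selectors did not match.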