Wagtail and Elasticsearch: Clean up unused indexes
Wagtail deletes the search index and rebuilds it from scratch by default when using the Elasticsearch backend. This means users won’t get any results until the re-index is finished. To prevent that, set ATOMIC_REBUILD to True in the search backend configuration. Wagtail then creates new indices, indexes your content into them, points the canonical aliases at them and removes the old ones.
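As a rough illustration, the setting sits inside the backend entry of WAGTAILSEARCH_BACKENDS in your Django settings. The backend module, URL and index name below are placeholder assumptions for a local Elasticsearch 7 setup and will differ per project; only ATOMIC_REBUILD comes from the text above.

```python
# settings.py (sketch; BACKEND, URLS and INDEX are placeholder assumptions)
WAGTAILSEARCH_BACKENDS = {
    "default": {
        "BACKEND": "wagtail.search.backends.elasticsearch7",
        "URLS": ["http://localhost:9200"],
        "INDEX": "wagtail",
        # build into fresh indices and switch the aliases once indexing succeeds
        "ATOMIC_REBUILD": True,
    }
}
```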
However, should the indexing process result in any error, you end up with stray indices. This could spell trouble if you’re on limited hosting plans such as Bonsai on Heroku.
If you have access to the Elasticsearch console you could run the following checks:
GET /_cat/indices
This returns all indices, their status, number of primaries/replicas, documents in them and size:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open wagtail__wagtailimages_image_lcgak2e <UUID> 1 1 0 0 416b 208b
green open wagtail__images_customimage_eatxk42 <UUID> 1 1 452 4 416.1kb 209.3kb
green open wagtail__wagtaildocs_document_ajkn4li <UUID> 1 1 0 0 416b 208b
green open wagtail__documents_customdocument_xtwnqfk <UUID> 1 1 90 0 99.6kb 49.8kb
green open wagtail__wagtailcore_page_zqkorrt <UUID> 1 1 3251 0 6.8mb 3.4mb
green open wagtail__wagtailcore_page_bqsaiep <UUID> 1 1 3238 0 6.9mb 3.4mb
green open wagtail__wagtailcore_page_nm7irjb <UUID> 1 1 6792 95 23.7mb 11.7mb
GET /_cat/aliases
This gives you the list of aliases:
alias index filter routing.index routing.search is_write_index
wagtail__documents_customdocument wagtail__documents_customdocument_xtwnqfk - - - -
wagtail__images_customimage wagtail__images_customimage_eatxk42 - - - -
wagtail__wagtaildocs_document wagtail__wagtaildocs_document_ajkn4li - - - -
wagtail__wagtailimages_image wagtail__wagtailimages_image_lcgak2e - - - -
wagtail__wagtailcore_page wagtail__wagtailcore_page_nm7irjb - - - -
Elasticsearch breaks an index into shards in order to distribute them and scale. It is recommended to run a cluster of nodes so that each primary shard has a replica on another node for reliability. On services such as Bonsai, each shard, replicas included, counts towards your plan's limits.
GET /_cat/shards
This lists all shards:
index shard prirep state docs store ip node
wagtail__images_customimage_eatxk42 0 p STARTED 452 209.3kb <ip> <node>
wagtail__images_customimage_eatxk42 0 r STARTED 452 206.8kb <ip> <node>
wagtail__documents_customdocument_xtwnqfk 0 r STARTED 90 49.8kb <ip> <node>
wagtail__documents_customdocument_xtwnqfk 0 p STARTED 90 49.8kb <ip> <node>
wagtail__wagtaildocs_document_ajkn4li 0 p STARTED 0 208b <ip> <node>
wagtail__wagtaildocs_document_ajkn4li 0 r STARTED 0 208b <ip> <node>
wagtail__wagtailimages_image_lcgak2e 0 r STARTED 0 208b <ip> <node>
wagtail__wagtailimages_image_lcgak2e 0 p STARTED 0 208b <ip> <node>
wagtail__wagtailcore_page_bqsaiep 0 p STARTED 3238 3.4mb <ip> <node>
wagtail__wagtailcore_page_bqsaiep 0 r STARTED 3238 3.4mb <ip> <node>
wagtail__wagtailcore_page_nm7irjb 0 p STARTED 6792 11.7mb <ip> <node>
wagtail__wagtailcore_page_nm7irjb 0 r STARTED 6792 12mb <ip> <node>
wagtail__wagtailcore_page_zqkorrt 0 r STARTED 3251 3.4mb <ip> <node>
wagtail__wagtailcore_page_zqkorrt 0 p STARTED 3251 3.4mb <ip> <node>
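The number of rows in the shard listing follows from the pri and rep columns of the index listing: each index has its primary shards plus one copy of each primary per replica. A small sketch with the values from the output above (the function name is made up for illustration):

```python
def shard_count(primaries: int, replicas: int) -> int:
    """Total shards for one index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


# every index in the listing above has pri=1 and rep=1
number_of_indices = 7
total = number_of_indices * shard_count(primaries=1, replicas=1)

print(shard_count(primaries=1, replicas=1))  # 2 shards per index
print(total)  # 14, one per row in the shard listing
```

So every index left behind by a failed rebuild costs two shards in this setup.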
In the examples above, we know that wagtail__wagtailcore_page is an alias of wagtail__wagtailcore_page_nm7irjb, thus wagtail__wagtailcore_page_bqsaiep and wagtail__wagtailcore_page_zqkorrt are old indices that need removing.
To remove an index via the Elasticsearch console/API, run DELETE /index_name (for example, DELETE /wagtail__wagtailcore_page_bqsaiep).
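The comparison between the index listing and the alias listing can be sketched as a small helper. This is only an illustration working on names copied from the console output: the function name and the "wagtail__" prefix filter are assumptions, and it presumes ATOMIC_REBUILD is on so that every live index sits behind an alias.

```python
def find_stray_indices(all_indices, alias_targets, prefix="wagtail__"):
    """Return indices with the given prefix that no alias points to."""
    live = set(alias_targets.values())
    return sorted(
        name for name in all_indices
        if name.startswith(prefix) and name not in live
    )


# index names from GET /_cat/indices
all_indices = [
    "wagtail__wagtailimages_image_lcgak2e",
    "wagtail__images_customimage_eatxk42",
    "wagtail__wagtaildocs_document_ajkn4li",
    "wagtail__documents_customdocument_xtwnqfk",
    "wagtail__wagtailcore_page_zqkorrt",
    "wagtail__wagtailcore_page_bqsaiep",
    "wagtail__wagtailcore_page_nm7irjb",
]

# alias -> index pairs from GET /_cat/aliases
alias_targets = {
    "wagtail__documents_customdocument": "wagtail__documents_customdocument_xtwnqfk",
    "wagtail__images_customimage": "wagtail__images_customimage_eatxk42",
    "wagtail__wagtaildocs_document": "wagtail__wagtaildocs_document_ajkn4li",
    "wagtail__wagtailimages_image": "wagtail__wagtailimages_image_lcgak2e",
    "wagtail__wagtailcore_page": "wagtail__wagtailcore_page_nm7irjb",
}

stray = find_stray_indices(all_indices, alias_targets)
print(stray)
# ['wagtail__wagtailcore_page_bqsaiep', 'wagtail__wagtailcore_page_zqkorrt']
```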
Wagtail
If you do not have access to the Elasticsearch console, you could make use of the tools Wagtail provides via the search backend.
from wagtail.models import Page
from wagtail.search.backends import get_search_backend

# get the default backend, and find the current page index
backend = get_search_backend("default")
page_index = backend.get_index_for_model(Page)

# get the index name (it is 'wagtail__wagtailcore_page')
print(page_index.name)

# get the indices the page index is an alias for. in our example `wagtail__wagtailcore_page_nm7irjb`
for aliased_index in page_index.aliased_indices():
    print(aliased_index.name)

# check that a given index exists
index_name = "wagtail__wagtailcore_page_bqsaiep"
print(page_index.es.indices.exists(index=index_name))

# list all indices
print(page_index.es.cat.indices())

# provide a list of index names to remove
indices_to_delete: list[str] = []
for index_name in indices_to_delete:
    page_index.es.indices.delete(index=index_name)
Note: you can do this using a script or via the Django shell (django-admin shell).
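If you would rather not compile the list of stray indices by hand, the lookup can be sketched against the same client. The helper name and the "wagtail__" prefix are assumptions for illustration. It relies on the elasticsearch-py `indices.get` call returning, for each concrete index, its details including an "aliases" mapping, and it presumes ATOMIC_REBUILD is enabled so that live indices always have an alias.

```python
def find_unaliased_indices(es, prefix: str) -> list[str]:
    """Return names of indices starting with `prefix` that have no alias.

    `es` is assumed to be an elasticsearch-py client, such as `page_index.es`.
    """
    # indices.get returns a mapping of concrete index name -> index details;
    # the "aliases" entry is empty when nothing points at the index
    details = es.indices.get(index=f"{prefix}*")
    return sorted(name for name in details if not details[name].get("aliases"))


# In the Django shell, after the snippet above:
# indices_to_delete = find_unaliased_indices(page_index.es, "wagtail__")
# print(indices_to_delete)
```

Review the result before deleting anything, and run it only when no rebuild is in progress: an index that is still being built has no alias yet and would show up in the list.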