Elasticsearch deep search approaches

How to go through Elasticsearch data with pageable requests

A practical guide to using three types of Elasticsearch pageable search with Spring Boot: differences, pros and cons, API, and implementations with Spring Boot and Jest.

Introduction

Many products use Elasticsearch (ES) as storage and display its documents (the entities stored in ES) in a UI. This often requires the API to return paginated results. The most widespread way to implement pagination in ES uses the from and size parameters. It is the right approach as long as the number of documents stays under the limit. Otherwise, you get an exception like this:

Result window is too large.

The easiest way to fix this issue is to increase the limit with a PUT request that updates the index settings. The limit is the index.max_result_window setting, which defaults to 10,000.
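As a rough sketch of that request, the snippet below builds (but does not send) the settings update with the JDK HTTP client. The node address http://localhost:9200, the index name customers, and the class name are illustrative assumptions, not part of the original setup.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class MaxResultWindowRequest {

    // JSON body for PUT /<index>/_settings that raises index.max_result_window.
    static String settingsBody(int maxResultWindow) {
        return "{\"index\":{\"max_result_window\":" + maxResultWindow + "}}";
    }

    // Builds the settings update request; the caller decides when to send it.
    static HttpRequest build(String baseUrl, String index, int maxResultWindow) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(maxResultWindow)))
                .build();
    }
}
```

For example, MaxResultWindowRequest.build("http://localhost:9200", "customers", 50_000) yields a PUT request to /customers/_settings that can be passed to HttpClient.send.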

However, Elasticsearch engineers didn’t pull this limit out of a hat. It would be too simple if you could raise the limit without side effects. For a deep search, the side effects are degraded performance and increased hardware load. A deep search is a search that goes through all your data. The bottleneck is the far pages: to serve a page, every shard has to collect and sort from + size hits, so the cost grows with the page depth. I want to look at the from/size approach and its alternatives and share how deep search can be implemented in a way that suits the use case.

FROM/SIZE pageable search

Use cases: pageable search (UI view for the document).
Pros: any page can be requested at random; sorted data; aggregations.
Limits: a limited total number of pages OR degraded performance; not suited to deep search.
Docs: Search request from size

The from/size approach is the canonical way to request paginated results. The implementation uses two parameters to define a page: size is the page size, and from is the index of the first element on the page. The query is pageable and sorted by score and by the customer field.
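A minimal sketch of the page arithmetic and of the resulting search body, using plain Java rather than the Spring/Jest query builders. The class name, the zero-based page convention, and the match_all query are assumptions made for illustration; customer is the sort field mentioned above.

```java
class FromSizeQuery {

    // Default value of index.max_result_window in Elasticsearch.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Zero-based page number -> index of the first hit on that page.
    static int from(int page, int size) {
        return Math.multiplyExact(page, size);
    }

    // A page is only retrievable while from + size stays within the window.
    static boolean fitsWindow(int page, int size, int maxResultWindow) {
        return (long) from(page, size) + size <= maxResultWindow;
    }

    // Search body sorted by relevance first, then by the given field.
    static String body(int page, int size, String sortBy, String direction) {
        return "{\"from\":" + from(page, size)
                + ",\"size\":" + size
                + ",\"query\":{\"match_all\":{}}"
                + ",\"sort\":[{\"_score\":\"desc\"},{\"" + sortBy + "\":\"" + direction + "\"}]}";
    }
}
```

With 20 hits per page and the default window, page 499 is the last one that fits (from = 9,980), while page 500 would trigger the "Result window is too large" error.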

In the Spring implementation, the query is executed through elasticsearchTemplate, which is part of the spring-data-jest library.

A client can call this implementation through a classic paging API with the following parameters: page, page size, sort direction, and sortBy.

SCROLL API

Use cases: scrolling on the UI; deep search.
Pros: doesn’t hit max_result_window limits.
Limits: memory consumption (Elasticsearch keeps a search context alive for every open scroll); continuous pagination (you cannot start pagination from any page but the first).
Docs: Scroll API

Jest and Spring also support Scroll API out of the box.

The scroll timeout parameter indicates how long Elasticsearch should retain the search context for the request.

The response contains a _scroll_id field. Pass its value in the next request to get the following page.
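The raw REST calls behind this flow can be sketched as follows. This is plain Java that only assembles paths and JSON bodies; the class and method names are hypothetical, and the match_all query is an assumption.

```java
class ScrollRequests {

    // First request: POST /<index>/_search?scroll=<timeout>; the timeout goes into the query string.
    static String firstPagePath(String index, String scrollTimeout) {
        return "/" + index + "/_search?scroll=" + scrollTimeout;
    }

    // Body of the first request: an ordinary search with a batch size.
    static String firstPageBody(int size) {
        return "{\"size\":" + size + ",\"query\":{\"match_all\":{}}}";
    }

    // Every following page: POST /_search/scroll (no index in the path) with the ID from the previous response.
    static String nextPageBody(String scrollTimeout, String scrollId) {
        return "{\"scroll\":\"" + scrollTimeout + "\",\"scroll_id\":\"" + scrollId + "\"}";
    }

    // DELETE /_search/scroll with this body releases the search context before the timeout expires.
    static String clearBody(String scrollId) {
        return "{\"scroll_id\":\"" + scrollId + "\"}";
    }
}
```

Each follow-up request renews the timeout, so the value only needs to cover the processing time of one batch, not of the whole scroll.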

While a search request returns a single “page” of results, 
the scroll API can be used to retrieve large numbers of results 
(or even all results) from a single search request, 
in much the same way as you would use a cursor 
on a traditional database.
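The cursor-like loop can be sketched as below. ScrollBackend is a hypothetical abstraction over the three scroll calls, and InMemoryBackend is a stand-in that serves batches from a list so the loop runs without a cluster; neither is part of Jest or Spring.

```java
import java.util.ArrayList;
import java.util.List;

class ScrollLoop {

    // One batch of hits plus the ID that fetches the next batch.
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Hypothetical abstraction over "start scroll", "next page" and "clear scroll".
    interface ScrollBackend {
        Page start(int size);
        Page next(String scrollId);
        void clear(String scrollId);
    }

    // In-memory stand-in: the "scroll ID" is simply the offset of the next batch.
    static final class InMemoryBackend implements ScrollBackend {
        private final List<String> docs;
        private int size;
        boolean cleared;

        InMemoryBackend(List<String> docs) {
            this.docs = docs;
        }

        public Page start(int size) {
            this.size = size;
            return batch(0);
        }

        public Page next(String scrollId) {
            return batch(Integer.parseInt(scrollId));
        }

        public void clear(String scrollId) {
            cleared = true;
        }

        private Page batch(int offset) {
            int end = Math.min(offset + size, docs.size());
            return new Page(String.valueOf(end), new ArrayList<>(docs.subList(offset, end)));
        }
    }

    // Reads batches until an empty one arrives, then releases the search context.
    static List<String> readAll(ScrollBackend backend, int size) {
        List<String> all = new ArrayList<>();
        Page page = backend.start(size);
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = backend.next(page.scrollId);
            }
        } finally {
            backend.clear(page.scrollId);
        }
        return all;
    }
}
```

The loop always moves forward from the first batch, which is exactly the "continuous pagination" limit listed above.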

SEARCH AFTER API

Use cases: the best choice for the deep search (based on the latest ES docs).
Pros: data is sorted by business logic; aggregations.
Limits: continuous pagination (you can go only from the first page to the last); point in time is not available in ES before v7.10; manual implementation with Spring Boot/Jest.
Docs: Search after

According to the latest documentation, search after is the recommended way to do a deep search.

The request must be sorted. ES recommends including a tiebreaker field in the sort. The tiebreaker field should be unique for each document, but it should not be the _id field itself.

The search after approach takes the sort values of the last hit in the response and passes them in the search_after parameter to get the next batch.

The first page request:
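A minimal sketch of how such a first-page body could be assembled in plain Java. The field names created_at and tie_breaker_id used in the example call, the match_all query, and the class name are illustrative assumptions.

```java
class SearchAfterFirstPage {

    // First page: an ordinary sorted search; the tiebreaker is the last sort key.
    static String body(int size, String sortField, String tiebreakerField) {
        return "{\"size\":" + size
                + ",\"query\":{\"match_all\":{}}"
                + ",\"sort\":[{\"" + sortField + "\":\"asc\"},{\"" + tiebreakerField + "\":\"asc\"}]}";
    }
}
```

Calling SearchAfterFirstPage.body(10, "created_at", "tie_breaker_id") produces a body that can be posted to /<index>/_search. Note that it carries no from parameter.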

Response:

The following request:
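A sketch of the follow-up body under the same assumptions (hypothetical class name, match_all query, illustrative field names). The only difference from the first page is the search_after array, which repeats the sort values of the last hit of the previous page in the same order as the sort clause.

```java
import java.util.List;

class SearchAfterNextPage {

    // lastSortValues: the "sort" array of the last hit on the previous page,
    // each value already rendered as a JSON literal (numbers bare, strings quoted).
    static String body(int size, String sortField, String tiebreakerField, List<String> lastSortValues) {
        return "{\"size\":" + size
                + ",\"query\":{\"match_all\":{}}"
                + ",\"sort\":[{\"" + sortField + "\":\"asc\"},{\"" + tiebreakerField + "\":\"asc\"}]"
                + ",\"search_after\":[" + String.join(",", lastSortValues) + "]}";
    }
}
```

The query and the sort must stay identical between requests; only search_after changes from page to page.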

The order of the results can change if documents are updated while you page through them. You can use a PIT (point in time) to prevent this (available starting from version 7.10).

Create PIT:

Delete PIT:

Use PIT in the search request:
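The three PIT calls can be sketched in plain Java as path and body builders. The class and method names are hypothetical, and the match_all query and single sort field are assumptions. A search that uses a PIT is posted to /_search without an index in the path, because the PIT already pins the index.

```java
class PointInTimeRequests {

    // Create: POST /<index>/_pit?keep_alive=<duration>; the response carries the PIT id.
    static String createPath(String index, String keepAlive) {
        return "/" + index + "/_pit?keep_alive=" + keepAlive;
    }

    // Delete: DELETE /_pit with the id in the body.
    static String deleteBody(String pitId) {
        return "{\"id\":\"" + pitId + "\"}";
    }

    // Search: POST /_search with a "pit" section; keep_alive extends the PIT lifetime.
    static String searchBody(int size, String pitId, String keepAlive, String sortField) {
        return "{\"size\":" + size
                + ",\"query\":{\"match_all\":{}}"
                + ",\"pit\":{\"id\":\"" + pitId + "\",\"keep_alive\":\"" + keepAlive + "\"}"
                + ",\"sort\":[{\"" + sortField + "\":\"asc\"}]}";
    }
}
```

For the following pages, the same body is extended with a search_after array, and the PIT id should be taken from the most recent response because it can change between requests.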

Jest and Spring Data Elasticsearch don’t support search after out of the box.

You can manually build a string query to make a search request with the desired parameters.

Here is an example of search service implementation.

Client implementation:
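The client-side loop can be sketched as follows. This is not the original service code: the search method is an in-memory stand-in for one search request, and Doc reduces a document to its two sort keys (a timestamp and a unique tiebreaker), all of which are assumptions made to keep the example runnable without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterLoop {

    // A document reduced to its sort key: a timestamp plus a unique tiebreaker.
    static final class Doc {
        final long timestamp;
        final String tiebreaker;

        Doc(long timestamp, String tiebreaker) {
            this.timestamp = timestamp;
            this.tiebreaker = tiebreaker;
        }
    }

    // Sort order of the request: timestamp first, tiebreaker second.
    static int compare(Doc a, Doc b) {
        int byTime = Long.compare(a.timestamp, b.timestamp);
        return byTime != 0 ? byTime : a.tiebreaker.compareTo(b.tiebreaker);
    }

    // Stand-in for one search request: up to `size` docs strictly after `after` (null means first page).
    static List<Doc> search(List<Doc> index, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(SearchAfterLoop::compare);
        List<Doc> page = new ArrayList<>();
        for (Doc doc : sorted) {
            if (page.size() == size) {
                break;
            }
            if (after == null || compare(doc, after) > 0) {
                page.add(doc);
            }
        }
        return page;
    }

    // Keeps requesting pages, feeding the last hit's sort key into the next request.
    static List<Doc> readAll(List<Doc> index, int size) {
        List<Doc> all = new ArrayList<>();
        Doc after = null;
        while (true) {
            List<Doc> page = search(index, after, size);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            after = page.get(page.size() - 1);
        }
        return all;
    }
}
```

Because the tiebreaker makes every sort key unique, documents that share a timestamp are neither skipped nor returned twice when a page boundary falls between them.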

Summary

ES is an excellent search engine. It can be used in various scenarios that require paginated results. There are three types of pageable search, and each fits a particular use case. Use from/size for a limited, real-time UI search; scroll through the data with the Scroll API; or apply search after to do a deep ordered search without hitting memory limits.