Search using geolocations with Elasticsearch

In my last project we needed to build an engine to search for available products within a huge catalog. Each product is available in a certain geographic region, and many conditions must match before it is returned as a search result. The engine also had to support typical search features such as filters and pagination.

One of the most important requirements was to solve this with a fast query that uses the coordinates of both the customer and the product location. Product availability may be represented in two different ways:

  • a pair of coordinates and a fixed maximum distance of deliverability
  • a polygon that represents the area where it can be fulfilled.

To build the search queries we decided to use Elasticsearch. Since the API is written in Ruby on Rails, we created an Active Record model and indexed it into an Elasticsearch index. The elasticsearch-rails gem gave us methods to easily run Elasticsearch operations: searching, creating and deleting an index, and indexing, updating and deleting documents.

This search engine let us run fast queries over complex structures with nested attributes and build filters from its aggregations. Best of all, it supports two types of geo fields without any plugin:

  • geo_point: fields which support lat/lon pairs,
  • geo_shape: fields which support points, lines, circles, polygons, multi-polygons, etc.
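As a minimal sketch of how both field types could be declared, the index body below uses plain Ruby hashes. The field names geo_point and area match the queries later in this post; the price type, the method name and the typeless mapping layout (Elasticsearch 7 or later) are assumptions for illustration, not our production mapping.

```ruby
require 'json'

# Hypothetical index body. `geo_point` and `area` are the field names used by
# the queries in this post; `price` as a float is an assumption.
def product_index_body
  {
    mappings: {
      properties: {
        price:     { type: 'float' },
        geo_point: { type: 'geo_point' }, # product location as lat/lon
        area:      { type: 'geo_shape' }  # polygon where the product can be fulfilled
      }
    }
  }
end

puts JSON.pretty_generate(product_index_body)
```

The resulting JSON is what would be sent as the body when creating the index.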

For these data fields it provides four types of queries:

  • geo_bounding_box: finds documents with geo-points that fall into the specified rectangle.
  • geo_distance: finds documents with geo-points inside a circle centered on a specified geo-point, with a specified radius.
  • geo_polygon: finds documents with geo-points within a specified polygon.
  • geo_shape: finds documents with:
    • geo-shapes which either intersect, are contained by, contain, or do not intersect with the specified geo-shape
    • geo-points which intersect the specified geo-shape
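We did not need geo_bounding_box in our project, but for comparison with the queries shown later, here is a rough sketch of how its body could be built in Ruby. The method name and the sample corner coordinates are hypothetical; the field name geo_point is the one used throughout this post.

```ruby
require 'json'

# Builds a geo_bounding_box query body for a geo_point field.
# Corners are given as [lat, lon] pairs (an assumption of this helper).
def bounding_box_query(top_left:, bottom_right:, field: 'geo_point')
  {
    query: {
      bool: {
        must: {
          geo_bounding_box: {
            field => {
              top_left:     { lat: top_left[0],     lon: top_left[1] },
              bottom_right: { lat: bottom_right[0], lon: bottom_right[1] }
            }
          }
        }
      }
    }
  }
end

# Hypothetical rectangle around the sample point used in this post.
puts JSON.pretty_generate(
  bounding_box_query(top_left: [40.9, -74.1], bottom_right: [40.6, -73.8])
)
```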

Despite all these benefits, we were aware of some well-known disadvantages of document-oriented systems. Elasticsearch does not support joins between indexes, so we had to rely on nested attributes and duplicated data, and we had to take care of keeping that data up to date ourselves.

Since our project could work within those limitations, we decided the benefits outweighed them and went with Elasticsearch.

For queries that need to find products near the customer, the geo_distance query is the best fit. For products available in a delimited area, geo_shape queries are useful because they let us find the documents whose area contains the customer's coordinates.

geo_distance queries

The following query returns all documents whose geo_point field (of type geo_point) lies within a 20 km radius of the point 40.743479, -73.992377:

{
  "query": {
    "bool": {
      "must": {
        "geo_distance": {
          "distance": "20km",
          "geo_point": {
            "lon": -73.992377,
            "lat": 40.743479
          }
        }
      }
    }
  }
}
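To make the condition concrete, here is a small Ruby sketch of the kind of check a distance filter expresses: a great-circle distance compared against a radius. This is only an illustration using the haversine formula and a mean Earth radius of 6371 km; it is not a description of how Elasticsearch computes distances internally, and the two product locations are made up.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius, an assumption of this sketch

def to_rad(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in km between two lat/lon points (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# True when the product lies within radius_km of the customer.
# Both points are [lat, lon] pairs.
def within_radius?(customer, product, radius_km)
  haversine_km(*customer, *product) <= radius_km
end

customer = [40.743479, -73.992377]
nearby   = [40.70, -74.01] # hypothetical product a few km away
distant  = [41.5, -73.0]   # hypothetical product well outside 20 km

puts within_radius?(customer, nearby, 20)
puts within_radius?(customer, distant, 20)
```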

geo_shape queries

Documents have an area field of type geo_shape. We use this query to check whether the customer's location is contained in the area where the product is available. Note that the coordinate format here differs from the one above: the shape follows the GeoJSON convention and takes an array in [lon, lat] order.

{
  "query": {
    "bool": {
      "must": {
        "geo_shape": {
          "area": {
            "shape": {
              "type": "point",
              "coordinates": [-73.992377, 40.743479]
            },
            "relation": "contains"
          }
        }
      }
    }
  }
}
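Conceptually, the contains relation with a point asks whether that point falls inside the product's polygon. The Ruby sketch below illustrates the idea with a simple planar ray-casting test on [lon, lat] pairs. It is only an approximation for intuition (it ignores holes and the curvature of the Earth) and is not how Elasticsearch evaluates geo_shape queries; the delivery area is a made-up rectangle.

```ruby
# Planar ray-casting test. `point` is [lon, lat]; `ring` is a closed list of
# [lon, lat] vertices (first vertex repeated at the end, as in GeoJSON).
def point_in_polygon?(point, ring)
  x, y = point
  inside = false
  ring.each_cons(2) do |(x1, y1), (x2, y2)|
    # Only edges that straddle the horizontal line through the point count.
    next unless (y1 > y) != (y2 > y)

    # Longitude at which the edge crosses that horizontal line.
    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1).to_f
    inside = !inside if x < x_cross
  end
  inside
end

# Hypothetical delivery area around the sample point used in this post.
delivery_area = [
  [-74.1, 40.6], [-73.8, 40.6], [-73.8, 40.9], [-74.1, 40.9], [-74.1, 40.6]
]

puts point_in_polygon?([-73.992377, 40.743479], delivery_area)
puts point_in_polygon?([-75.0, 40.7], delivery_area)
```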

Ordering by distance

Geo point fields also allow us to sort documents by distance.

{
  "sort": [
    { "price": "asc" },
    {
      "_geo_distance": {
        "geo_point": {
          "lon": -73.992377,
          "lat": 40.743479
        },
        "order": "asc",
        "unit": "km",
        "mode": "min",
        "distance_type": "arc",
        "ignore_unmapped": true
      }
    }
  ]
}

Here we sort first by price, and then sort products with the same price by the distance between their location and the point 40.743479, -73.992377.

Finally, by combining should clauses with minimum_should_match, we can match documents that satisfy either the distance condition or the area condition in a single query when needed.
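A minimal sketch of such a combined search body, built as a plain Ruby hash, could look like this. The method name, the default 20km distance and the from/size pagination values are assumptions for illustration; the field names and clauses are the ones from the queries above.

```ruby
require 'json'

# Builds one search body that matches products either within `distance` of the
# customer (geo_distance) or whose area contains the customer (geo_shape).
def availability_search(lat:, lon:, distance: '20km', from: 0, size: 20)
  {
    query: {
      bool: {
        should: [
          { geo_distance: { distance: distance, geo_point: { lat: lat, lon: lon } } },
          {
            geo_shape: {
              area: {
                # geo_shape takes coordinates in [lon, lat] order
                shape: { type: 'point', coordinates: [lon, lat] },
                relation: 'contains'
              }
            }
          }
        ],
        minimum_should_match: 1 # at least one of the two conditions must hold
      }
    },
    sort: [
      { price: 'asc' },
      {
        _geo_distance: {
          geo_point: { lat: lat, lon: lon },
          order: 'asc',
          unit: 'km',
          ignore_unmapped: true
        }
      }
    ],
    from: from,
    size: size
  }
end

puts JSON.pretty_generate(availability_search(lat: 40.743479, lon: -73.992377))
```

The generated hash can be serialized to JSON and passed as the search body; keeping it in one method makes the [lat, lon] versus [lon, lat] difference between the two clauses harder to get wrong.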

Conclusions:

Elasticsearch solved our problem: it kept the search fast and made it easy to include geographic queries. With it we met both our functional and non-functional requirements and delivered a traditional search enhanced with customer and product locations, so we highly recommend it.