[Recipes] Nested Aggregations in Elasticsearch

Problem: 

Nest an aggregation inside another aggregation. Run aggregations within a query context.

Solution Summary: 

Bucket aggregations can have a nested bucket or metric aggregation within it. However, metric aggregations cannot nest any aggregation within it.

Prerequisites: 

Setup accounts json as given here.

Add department docs as given here.

Solution Steps: 

Case 1 - Bucket aggregation nested with a metric aggregation

GET accounts/_search
{
  "aggs" : {
    "opening_date_terms" : {
      "terms" : {
        "field":"opening_date",
        "missing": "2017/12/31"
      },
      "aggs": {
        "opening_date_stats": {
          "stats": {
            "field":"opening_date"
          }
        }
      }
    }
  },
  "size": 0
}

Note:

  1. Child aggregation runs within the context of parent aggregation. 

  2. The child "aggs" is inside "opening_date_terms", which is indide parent "aggs".

 

Part of response:

...

"aggregations": {
    "opening_date_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514678400000,
          "key_as_string": "2017/12/31 00:00:00",
          "doc_count": 990,
          "opening_date_stats": {
            "count": 0,
            "min": null,
            "max": null,
            "avg": null,
            "sum": null
          }
        },
        {
          "key": 1514764800000,
          "key_as_string": "2018/01/01 00:00:00",
          "doc_count": 2,
          "opening_date_stats": {
            "count": 2,
            "min": 1514764800000,
            "max": 1514764800000,
            "avg": 1514764800000,
            "sum": 3029529600000,
            "min_as_string": "2018/01/01 00:00:00",
            "max_as_string": "2018/01/01 00:00:00",
            "avg_as_string": "2018/01/01 00:00:00",
            "sum_as_string": "2066/01/01 00:00:00"
          }
        },

...

 

Case 2 - Bucket aggregations nested with bucket aggregations

GET accounts/_search
{
  "aggs" : {
    "opening_date_terms" : {
      "terms" : {
        "field":"opening_date",
        "missing": "2017/12/31"
      },
      "aggs": {
        "gender_terms": {
          "terms": {
            "field":"gender.keyword"
          }
        }
      }
    }
  },
  "size": 0
}

 

Part of response:

...

"aggregations": {
    "opening_date_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514678400000,
          "key_as_string": "2017/12/31 00:00:00",
          "doc_count": 990,
          "gender_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 502
              },
              {
                "key": "F",
                "doc_count": 488
              }
            ]
          }
        },

...

Note: Here I have only displayed the first bucket, which is the bucket for documents missing "opening_date". 

 

Case 3 - Run the nested aggregation within a query context

We will use the same aggregation  query and add a query context to filter out any documents without "opening_date". Now the missing bucket will have not records.

Response will contain:

"aggregations": {
    "opening_date_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514678400000,
          "key_as_string": "2017/12/31 00:00:00",
          "doc_count": 990,
          "opening_date_stats": {
            "count": 0,
            "min": null,
            "max": null,
            "avg": null,
            "sum": null
          }
        },
        {
          "key": 1514764800000,
          "key_as_string": "2018/01/01 00:00:00",
          "doc_count": 2,
          "opening_date_stats": {
            "count": 2,
            "min": 1514764800000,
            "max": 1514764800000,
            "avg": 1514764800000,
            "sum": 3029529600000,
            "min_as_string": "2018/01/01 00:00:00",
            "max_as_string": "2018/01/01 00:00:00",
            "avg_as_string": "2018/01/01 00:00:00",
            "sum_as_string": "2066/01/01 00:00:00"
          }
        },

 

TODO

  1. In case 3, display the bucket with no records (missing) with the use of an appropriate setting. 
    1. Hint: Refer to cases here.

Recipe Tags: 

Learn Serverless from Serverless Programming Cookbook

Contact

Please first use the contact form or facebook page messaging to connect.

Offline Contact
We currently connect locally for discussions and sessions at Bangalore, India. Please follow us on our facebook page for details.
WhatsApp (Primary): (+91) 7411174113
Phone (Escalations): (+91) 7411174114

Business newsletter

Complete the form below, and we'll send you an e-mail every now and again with all the latest news.

About

CloudMaterials is my blog to share notes and learning materials on Cloud and Data Analytics. My current focus is on Microsoft Azure and Amazon Web Services (AWS).

I like to write and I try to document what I learn to share with others. I believe that knowledge is useless unless you share it; the more you share, the more you learn.

Recent comments

Photo Stream