[Recipes] Nested Aggregations in Elasticsearch

Problem:

Nest an aggregation inside another aggregation. Run aggregations within a query context.

Solution Summary:

Bucket aggregations can have a nested bucket or metric aggregation within it. However, metric aggregations cannot nest any aggregation within it.

Prerequisites:

Setup accounts json as given here.

Add department docs as given here.

Solution Steps:

Case 1 - Bucket aggregation nested with a metric aggregation

GET accounts/_search
{
"aggs" : {
"opening_date_terms" : {
"terms" : {
"field":"opening_date",
"missing": "2017/12/31"
},
"aggs": {
"opening_date_stats": {
"stats": {
"field":"opening_date"
}
}
}
}
},
"size": 0
}

Note:

Child aggregation runs within the context of parent aggregation.
The child "aggs" is inside "opening_date_terms", which is indide parent "aggs".

Part of response:

...

"aggregations": {
"opening_date_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1514678400000,
"key_as_string": "2017/12/31 00:00:00",
"doc_count": 990,
"opening_date_stats": {
"count": 0,
"min": null,
"max": null,
"avg": null,
"sum": null
}
},
{
"key": 1514764800000,
"key_as_string": "2018/01/01 00:00:00",
"doc_count": 2,
"opening_date_stats": {
"count": 2,
"min": 1514764800000,
"max": 1514764800000,
"avg": 1514764800000,
"sum": 3029529600000,
"min_as_string": "2018/01/01 00:00:00",
"max_as_string": "2018/01/01 00:00:00",
"avg_as_string": "2018/01/01 00:00:00",
"sum_as_string": "2066/01/01 00:00:00"
}
},

...

Case 2 - Bucket aggregations nested with bucket aggregations

GET accounts/_search
{
"aggs" : {
"opening_date_terms" : {
"terms" : {
"field":"opening_date",
"missing": "2017/12/31"
},
"aggs": {
"gender_terms": {
"terms": {
"field":"gender.keyword"
}
}
}
}
},
"size": 0
}

Part of response:

...

"aggregations": {
"opening_date_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1514678400000,
"key_as_string": "2017/12/31 00:00:00",
"doc_count": 990,
"gender_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M",
"doc_count": 502
},
{
"key": "F",
"doc_count": 488
}
]
}
},

...

Note: Here I have only displayed the first bucket, which is the bucket for documents missing "opening_date".

Case 3 - Run the nested aggregation within a query context

We will use the same aggregation query and add a query context to filter out any documents without "opening_date". Now the missing bucket will have not records.

Response will contain:

"aggregations": {
"opening_date_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1514678400000,
"key_as_string": "2017/12/31 00:00:00",
"doc_count": 990,
"opening_date_stats": {
"count": 0,
"min": null,
"max": null,
"avg": null,
"sum": null
}
},
{
"key": 1514764800000,
"key_as_string": "2018/01/01 00:00:00",
"doc_count": 2,
"opening_date_stats": {
"count": 2,
"min": 1514764800000,
"max": 1514764800000,
"avg": 1514764800000,
"sum": 3029529600000,
"min_as_string": "2018/01/01 00:00:00",
"max_as_string": "2018/01/01 00:00:00",
"avg_as_string": "2018/01/01 00:00:00",
"sum_as_string": "2066/01/01 00:00:00"
}
},