Aggregation analysis
Introduction to aggregations
What is an aggregation?
Aggregation analysis is an important database feature, for example finding the maximum, minimum, average, or sum of a field.
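Computing max, min, avg, sum, and similar values over a data set is called a metric aggregation in ES.
Relational databases can also group query results (GROUP BY) and then compute max, min, etc. per group; in ES this is called bucketing, or a bucket aggregation.
In addition, ES provides matrix aggregations and pipeline aggregations.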
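To make the metric vs. bucket distinction concrete, here is a minimal local Python sketch (not ES code). The documents are hypothetical, with field names borrowed from the bank index used later; metric_max and bucket_max are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical sample documents (field names mirror the bank index used below).
docs = [
    {"state": "LA", "balance": 48745},
    {"state": "GA", "balance": 47782},
    {"state": "GA", "balance": 1000},
]

def metric_max(docs, field):
    """Metric aggregation: a single value computed over the whole data set."""
    return max(d[field] for d in docs)

def bucket_max(docs, group_field, field):
    """Bucket aggregation: group docs (like GROUP BY), then run a metric in each bucket."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[group_field]].append(d)
    return {key: metric_max(group, field) for key, group in buckets.items()}

print(metric_max(docs, "balance"))           # 48745
print(bucket_max(docs, "state", "balance"))  # {'LA': 48745, 'GA': 47782}
```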
Aggregation syntax
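| {
  "aggregations": {                                  // aggregation keyword; may be abbreviated to "aggs"
    "<AGG_NAME>": {                                  // aggregation name, chosen by you
      "<AGG_TYPE>": {                                // aggregation type, e.g. max, avg, terms
        <AGG_BODY>                                   // aggregation body: which field(s) to aggregate
      }
      [, "aggregations": {[<SUB_AGGREGATION>]+ }]?   // sub-aggregations defined inside this aggregation
      [, "meta": {[<META_DATA_BODY>]}]?              // metadata attached to this aggregation
    }
    [, "<AGG_NAME_2>": {...}]*                       // further sibling aggregations, 0 or N
  }
}
|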
In the syntax above: * means 0 or N, + means 1 or N, ? means 0 or 1.
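As a minimal Python sketch, a request body following this syntax can be assembled from plain dicts. The helper agg() and the names max_balance, by_state and avg_balance are hypothetical. The field state.keyword assumes the bank index used dynamic mapping, which adds a keyword subfield. terms is a bucket aggregation type, used here only to show a sub-aggregation.

```python
import json

def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one "<AGG_NAME>" value: {"<AGG_TYPE>": <AGG_BODY>, optional "aggs", optional "meta"}."""
    node = {agg_type: body}
    if sub_aggs:              # [, "aggregations": {...}]?  ("aggs" is the short form)
        node["aggs"] = sub_aggs
    if meta:                  # [, "meta": {...}]?
        node["meta"] = meta
    return node

# Two sibling aggregations under a single "aggs" key; the second nests a sub-aggregation.
request = {
    "size": 0,
    "aggs": {
        "max_balance": agg("max", {"field": "balance"}),
        "by_state": agg("terms", {"field": "state.keyword"},  # assumed keyword subfield
                        sub_aggs={"avg_balance": agg("avg", {"field": "balance"})}),
    },
}
print(json.dumps(request, indent=2))
```

Siblings go side by side inside one "aggs" object; a JSON object cannot hold two "aggregations" keys at the same level.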
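Where aggregated values come from
The values being aggregated can be taken from a field or from the result of a script.
Metric aggregations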
max/min/sum/avg
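Find the maximum balance across all customers
Find the maximum balance among customers aged 24. The max aggregation runs over every document matched by the query (42 here); size: 2 and sort only control which hits are returned.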
| POST /bank/_search HTTP/1.1
Content-Type: application/json

{
  "size": 2,
  "query": {"match": {"age": 24}},
  "sort": [{"balance": {"order": "desc"}}],
  "aggs": {
    "max_balance": {
      "max": {"field": "balance"}
    }
  }
}
|
| {
"aggregations": {
"max_balance": {
"value": 48745
}
},
...
"hits": {
"total": 42,
"max_score": null,
"hits": [
{
...
"_source": {
"account_number": 697,
"balance": 48745,
"firstname": "Mallory",
"lastname": "Emerson",
"age": 24,
"gender": "F",
"address": "318 Dunne Court",
"employer": "Exoplode",
"email": "malloryemerson@exoplode.com",
"city": "Montura",
"state": "LA"
},
"sort": [48745]
},
{
...
"_source": {
"account_number": 917,
"balance": 47782,
"firstname": "Parks",
"lastname": "Hurst",
"age": 24,
"gender": "M",
"address": "933 Cozine Avenue",
"employer": "Pyramis",
"email": "parkshurst@pyramis.com",
"city": "Lindcove",
"state": "GA"
},
"sort": [47782]
}
]
}
}
|
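The relationship between the query, the returned hits, and the aggregation can be mimicked locally in Python. Only the two returned hits are real. The other two documents are hypothetical stand-ins for the remaining matches and for a non-matching customer.

```python
# Two real hits from the response plus two hypothetical documents.
docs = [
    {"account_number": 697, "age": 24, "balance": 48745},
    {"account_number": 917, "age": 24, "balance": 47782},
    {"account_number": 1,   "age": 24, "balance": 3000},   # hypothetical other match
    {"account_number": 2,   "age": 30, "balance": 49999},  # hypothetical, excluded by the query
]

matching = [d for d in docs if d["age"] == 24]                          # "query": age 24
hits = sorted(matching, key=lambda d: d["balance"], reverse=True)[:2]   # "sort" + "size": 2
max_balance = max(d["balance"] for d in matching)                       # "aggs": over ALL matches

print(len(matching), [h["account_number"] for h in hits], max_balance)
# 3 [697, 917] 48745
```

Because the hits are sorted by balance descending, the first hit's sort value equals the max aggregation, as in the response (48745).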
Values from a script: compute the average age of all customers, and the average age plus 10
| POST /bank/_search?size=0 HTTP/1.1
Content-Type: application/json

{
  "aggs": {
    "avg_age": {
      "avg": {"script": {"source": "doc.age.value"}}
    },
    "avg_age10": {
      "avg": {"script": {"source": "doc.age.value + 10"}}
    }
  }
}
|
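The script runs once per document, and the avg aggregation then averages the script results. Because averaging is linear, adding 10 per document shifts the average by exactly 10. A minimal Python sketch with hypothetical ages:

```python
ages = [24, 30, 36]  # hypothetical ages

avg_age = sum(ages) / len(ages)                    # like "doc.age.value"
avg_age10 = sum(a + 10 for a in ages) / len(ages)  # like "doc.age.value + 10", applied per document

print(avg_age, avg_age10)  # 30.0 40.0
```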
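count: document count
Value count: counts the values extracted for a field; for a multi-valued field, every value is counted
cardinality: counts distinct values (ES returns an approximate count based on HyperLogLog++)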
Example
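A minimal Python sketch of how these three counts differ. The documents and the multi-valued field tags are hypothetical, and the helper values() is illustrative. Unlike ES, this computes cardinality exactly with a set.

```python
# Hypothetical documents: "tags" is multi-valued, one doc has an empty list, one lacks the field.
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["b"]},
    {"tags": []},
    {},
]

def values(doc, field):
    """Return the field's values as a list (missing field -> no values)."""
    v = doc.get(field, [])
    return v if isinstance(v, list) else [v]

doc_count = len(docs)                                             # count: number of documents
value_count = sum(len(values(d, "tags")) for d in docs)          # value_count: every value counts
cardinality = len({v for d in docs for v in values(d, "tags")})  # cardinality: distinct values

print(doc_count, value_count, cardinality)  # 4 3 2
```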
stats: returns five values at once: count, max, min, avg, and sum
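A minimal Python sketch of the five values a stats aggregation returns, computed locally over a hypothetical, non-empty list of numbers (ES returns null values for an empty set; this helper does not handle that case).

```python
def stats(values):
    """Five stats values for a non-empty list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }

print(stats([20, 30, 40]))
# {'count': 3, 'min': 20, 'max': 40, 'avg': 30.0, 'sum': 90}
```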
Extended stats
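Returns four more results than stats: sum_of_squares, variance, std_deviation, and std_deviation_bounds (the interval of the mean plus/minus two standard deviations)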
Example
| POST /bank/_search HTTP/1.1
Content-Type: application/json

{
  "aggs": {
    "age_stats": {
      "extended_stats": {"field": "age"}
    }
  }
}
|
| {
...
"hits": {
"total": 1000,
"max_score": 0,
"hits": []
},
"aggregations": {
"age_stats": {
"count": 1000,
"min": 20,
"max": 40,
"avg": 30.171,
"sum": 30171,
"sum_of_squares": 946393,
"variance": 36.10375899999996,
"std_deviation": 6.008640362012022,
"std_deviation_bounds": {
"upper": 42.18828072402404,
"lower": 18.153719275975956
}
}
}
}
|
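The extra fields follow from count, sum and sum_of_squares. A minimal Python sketch, assuming population variance (sum_of_squares / count minus avg squared) and the default of two standard deviations for the bounds (ES exposes this as the sigma parameter). Plugging in the numbers from the age_stats response reproduces it:

```python
import math

def extended_stats(count, total, sum_of_squares, sigma=2.0):
    """Derive avg, variance, std_deviation and the bounds from count, sum and sum_of_squares."""
    avg = total / count
    variance = sum_of_squares / count - avg ** 2   # population variance
    std = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

# Inputs copied from the age_stats response above.
r = extended_stats(count=1000, total=30171, sum_of_squares=946393)
print(round(r["variance"], 6), round(r["std_deviation"], 6))  # 36.103759 6.00864
```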
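Percentiles: the field value at each requested percentile (e.g. the value below which 95% of values fall)
Percentile ranks: the percentage of values that are less than or equal to a given value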
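A minimal exact Python sketch of the two ideas, using linear interpolation between closest ranks for percentiles. ES computes both approximately (TDigest by default), so its results can differ slightly. The ages list is hypothetical.

```python
def percentile(values, p):
    """Value at percentile p (0-100), linearly interpolated between the closest ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def percentile_rank(values, v):
    """Percentage of values less than or equal to v."""
    return 100 * sum(1 for x in values if x <= v) / len(values)

ages = [20, 22, 25, 30, 40]  # hypothetical
print(percentile(ages, 50), percentile_rank(ages, 25))  # 25.0 60.0
```

The two are roughly inverse: percentiles maps a percentage to a value, percentile ranks map a value to a percentage.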
Bucket aggregations