关注互联网应用及运维技术的个人博客

ElasticSearch中 distinct, count和group by的实现

ElasticSearch中 distinct, count和group by的实现。ElasticSearch最新版高级插件已经完美支持sql的查询。这里备份一下针对老版本或者需要json翻译SQL的方法。主要是针对一些类SQL操作

distinct

# SQL
SELECT DISTINCT(user_id) FROM table WHERE user_id_type = 3;

# ElasticSearch:
{
    "query": {
        "term": {
            "user_id_type": 3
        }
    },
    "collapse": {
        "field": "user_id"
    }
}

# Result
{
    "hits": {
        "hits": [
            {
                "_index": "es_qd_mkt_visitor_packet_dev_v1_20180621",
                "_type": "ad_crowd",
                "_source": {
                    "user_id": "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ",
                    "user_id_type": 3
                },
                "fields": {
                    "user_id": [
                        "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ"
                    ]
                }
            }
        ]
    }
}

总结:使用collapse字段后,查询结果中[hits]中会出现[fields]字段,其中包含了去重后的user_id.

注意:对于collapse字段,折叠功能ES5.3版本之后才发布的,聚合&折叠只能针对keyword类型有效。

count + distinct

# SQL:
SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 3;
 
# ElasticSearch
{
    "query": {
        "term": {
            "user_id_type": 3
        }
    },
    "aggs": {
        "count": {
            "cardinality": {
                "field": "user_id"
            }
        }
    }
}
 
# Result
{
    "hits": {},
    "aggregations": {
        "count": {
            "value": 121
        }
    }
}

总结:aggscardinality的字段代表需要distinct的字段

count + group by

# SQL:
SELECT COUNT(user_id) FROM table GROUP BY user_id_type;
 
# ElasticSearch
{
    "aggs": {
        "user_type": {
            "terms": {
                "field": "user_id_type"
            }
        }
    }
}
 
# Result
{
    "hits": {},
    "aggregations": {
        "user_type": {
            ...
            "buckets": [
                {
                    "key": 4,
                    "doc_count": 1220
                },
                {
                    "key": 3,
                    "doc_count": 488
                }
            ]
        }
    }
}

总结:aggsterms的字段代表需要gruop by的字段

count + distinct + group by

# SQL:
SELECT COUNT(DISTINCT(user_id)) FROM table GROUP BY user_id_type;

# ElasticSearch
{
    "aggs": {
        "user_type": {
            "terms": {
                "field": "user_id_type"
            },
            "aggs": {
                "count": {
                    "cardinality": {
                        "field": "user_id"
                    }
                }
            }
        }
    }
}

# Result
{
  ...
  "hits": {
    ...
  },
  "aggregations": {
    "user_type": {
      ...
      "buckets": [
        {
          "key": 4,
          "doc_count": 1220, //去重前数据1220条
          "count": {
            "value": 276 //去重后数据276条
          }
        },
        {
          "key": 3,
          "doc_count": 488, //去重前数据488条
          "count": {
            "value": 121 //去重后数据121条
          }
        }
      ]
    }
  }
}

count + distinct + group by + where

# SQL:
SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 2 GROUP BY user_id;

总结:对于既有group by又有distinct的查询要求,需要在aggs中嵌套子aggs

赞(0)
未经允许不得转载:飞天狒狒 » ElasticSearch中 distinct, count和group by的实现

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址