语雀文档库: https://www.yuque.com/imoyt/zssuuf/gcbz67

一、简介

mysql 用作持久化存储,ES 用作检索

基本概念:index库>type表>document文档

  1. index 索引

​ 动词:相当于 mysql 的 insert

​ 名词:相当于 mysql 的 db

  2. Type 类型

​ 在 index 中,可以定义一个或多个类型

​ 类似于 mysql 的 table,每一种类型的数据放在一起

  3. Document 文档

​ 保存在某个 index 下、某种 type 的一条数据叫 document(文档),文档是 JSON 格式的。document 就像是 mysql 中某个 table 里面的一行记录,文档的每个属性对应表中的一列。

为什么 ES 搜索快?倒排索引

保存的记录

  • 红海行动
  • 探索红海行动
  • 红海特别行动
  • 红海记录片
  • 特工红海特别探索

将内容分词就记录到索引中

词        记录(文档编号)
红海      1,2,3,4,5
行动      1,2,3
探索      2,5
特别      3,5
记录片    4
特工      5

检索:

1)红海特工行动?查出后计算相关性得分:3 号记录命中了 2 个词,且 3 号本身只有 3 个词,得分 2/3,所以 3 号最匹配。
2)红海行动?同理,1 号记录命中 2 个词,且本身只有 2 个词,得分 2/2,所以 1 号最匹配。

关系型数据库中,两个数据表是独立的,即使它们里面有相同名称的列也不影响使用,但 ES 中不是这样。Elasticsearch 是基于 Lucene 开发的搜索引擎,而 ES 中不同 type 下名称相同的 field,在 Lucene 中的处理方式是一样的。

  • 两个不同的 type 下的两个 user_name,在 ES 同一个索引下其实被认为是同一个 field,你必须在两个不同的 type 中定义相同的 field 映射。否则,不同 type 中的相同字段名称就会在处理中出现冲突的情况,导致 Lucene 处理效率下降。

  • 去掉 type 就是为了提高 ES 处理数据的效率。

  • ElasticSearch 7.x URL 中的 type 参数为可选。比如,索引一个文档不再要求提供文档类型。

  • ElasticSearch 8.x 不再支持 URL 中的 type 参数

    解决:将索引从多个类型迁移到单类型,每种类型的文档使用一个独立索引,如下面的迁移示意。
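
下面给出一个最小的迁移示意(假设旧索引 twitter 下有 tweet 这个 type,且已按单类型建好目标索引 tweets;索引名均为举例,7.x 中在 source 里指定 type 已属于过时用法):

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "tweet"
  },
  "dest": {
    "index": "tweets"
  }
}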

二、ElasticSearch 安装

请参考这篇博客:CentOS 7 安装 Elasticsearch

三、初步检索

1、检索 es 信息

(1)GET /_cat/nodes:查看所有节点

如:http://192.168.56.10:9200/_cat/nodes

可以直接浏览器输入上面的 url,也可以在 kibana 中输入 GET /_cat/nodes

127.0.0.1 13 96 4 0.00 0.05 0.17 dilm * 566f1291aedb

566f1291aedb 代表节点

* 代表是主节点

(2)GET /_cat/health: 查看 es 健康状况

如:http://192.168.56.10:9200/_cat/health

1623940778 14:39:38 elasticsearch green 1 1 3 3 0 0 0 0 - 100.0%

注意: green 表示健康值正常

(3)GET /_cat/master: 查看主节点

如:http://192.168.56.10:9200/_cat/master

JBYr_91ySkSi1cHVNzkgaw 127.0.0.1 127.0.0.1 566f1291aedb

主节点唯一编号
虚拟机地址

(4)GET /_cat/indices : 查看所有的索引,等价于 mysql 数据库的 show databases

如:http://192.168.56.10:9200/_cat/indices

green  open .kibana_task_manager_1   DhtDmKrsRDOUHPJm1EFVqQ 1 0 2 3 40.8kb 40.8kb
green open .apm-agent-configuration vxzRbo9sQ1SvMtGkx6aAHQ 1 0 0 0 230b 230b
green open .kibana_1 rdJ5pejQSKWjKxRtx-EIkQ 1 0 5 1 18.2kb 18.2kb

这3个索引是kibana创建的

2、新增文档

保存一个数据时,要指定保存在哪个索引的哪个类型下(相当于哪个数据库的哪张表下),并用唯一标识指定这条数据

# 在customer索引下的external类型下保存1号数据
PUT customer/external/1

# POSTMAN输入
http://192.168.56.10:9200/customer/external/1

{
"name":"John Doe"
}

PUT 和 POST 区别

  • POST 新增。如果不指定 id,会自动生成 id;指定 id 就会修改这个数据,并递增版本号;
    • 可以不指定 id,不指定 id 时永远是创建
    • 指定不存在的 id 为创建
    • 指定已存在的 id 为更新,版本号会根据内容是否变化来决定是否递增
  • PUT 可以新增也可以修改。PUT 必须指定 id; 由于 PUT 需要指定 id,我们一般用来修改操作,不指定 id 会报错。
    • 必须指定 id
    • 版本号总会增加

区分技巧:put 和 java 里的 map.put 一样必须指定 key-value。 而 post 相当于 mysql insert
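
用 Kibana Dev Tools 的写法做一个最小对照示意(customer/external 沿用上文,文档内容为示例数据):

# POST 不指定 id:每次执行都会新增一条文档,id 自动生成
POST customer/external
{
  "name": "John Doe"
}

# POST/PUT 指定 id:该 id 不存在则创建,已存在则更新(_version 递增)
PUT customer/external/1
{
  "name": "John Doe"
}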

seq_no 和 version 的区别:

  • 每个文档的版本号 _version 起始值都为 1,每次对当前文档成功操作后都加 1;
  • 而序列号 _seq_no 可以看作是索引级别的:索引刚插入数据时为 0,每对索引内的数据成功操作一次,_seq_no 就加 1,同时文档会记录下是第几次操作使它变成了现在的样子。

可以参考:https://www.cnblogs.com/Taeso/p/13363136.html

测试

image-20210617230719119

创建数据成功后,显示 201 created 表示插入记录成功。

返回数据:
带有下划线开头的,称为元数据,反映了当前的基本信息
{
"_index": "customer", 表明该数据在那个数据库下;
"_type": "external", 表明该数据在那个类型下;
"_id": "1", 表明被保存数据的id;
"_version": 1, 被保存数据的版本
"result": "created", 这里是创建了一条数据,如果重新put一条数据,则该状态会变为updated,并且版本号也会发生变化。
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}

下面选用 POST 方式:

添加数据的时候,不指定 ID,会自动的生成 id,并且类型是新增:

{
"_index": "customer",
"_type": "external",
"_id": "5MIjvncBKdY1wAQm-wNZ",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 11,
"_primary_term": 6
}

再次使用 POST 插入数据,不指定 ID,仍然是新增的:

{
"_index": "customer",
"_type": "external",
"_id": "5cIkvncBKdY1wAQmcQNk",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 12,
"_primary_term": 6
}

添加数据的时候,指定 ID,会使用该 id,并且类型是新增:

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 13,
"_primary_term": 6
}

再次使用 POST 插入数据,指定同样的 ID,类型为 updated

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 14,
"_primary_term": 6
}

3、查看文档

GET /customer/external/1

http://192.168.56.10:9200/customer/external/1

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 10,
"_seq_no": 18, // 并发控制字段,每次更新都会+1,用来做乐观锁
"_primary_term": 6, //同上,主分片重新分配,如果重启,就会变化
"found": true,
"_source": {
"name": "John Doe"
}
}

乐观锁用法:通过“if_seq_no=1&if_primary_term=1”,当序列号匹配的时候,才进行修改,否则不修改

实例:将 id=1 的数据更新为 name=1,然后再次更新为 name=2。更新时要带上当前文档的 _seq_no 和 _primary_term(以实际查询到的值为准,下面的示例中为 _seq_no=6、_primary_term=2)。

(1)将 name 更新为 1

PUT http://192.168.56.10:9200/customer/external/1?if_seq_no=6&if_primary_term=2
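
完整的请求示意如下(请求体即要更新的示例数据):

PUT customer/external/1?if_seq_no=6&if_primary_term=2
{
  "name": "1"
}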

image-20210618211751569

返回结果:

image-20210618211900290

再次查询:

image-20210618212044783

(2)将 name 更新为 2,更新过程中仍然使用 if_seq_no=6(此时文档的 _seq_no 已经变化,所以会失败)

PUT http://192.168.56.10:9200/customer/external/1?if_seq_no=6&if_primary_term=2

结果为:

{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [6], primary term [2]. current document has seqNo [7] and primary term [2]",
"index_uuid": "Tdur0O2yQOm75NF-JaLPpw",
"shard": "0",
"index": "customer"
}
],
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [6], primary term [2]. current document has seqNo [7] and primary term [2]",
"index_uuid": "Tdur0O2yQOm75NF-JaLPpw",
"shard": "0",
"index": "customer"
},
"status": 409
}

(3)查询新的数据

GET http://192.168.56.10:9200/customer/external/1

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 6,
"_seq_no": 7,
"_primary_term": 2,
"found": true,
"_source": {
"name": 2
}
}

能够看到_seq_no 变为 7

(4)再次更新,更新成功

PUT http://192.168.56.10:9200/customer/external/1?if_seq_no=7&if_primary_term=2

4、更新文档 _update

POST customer/external/1/_update
{
"doc":{
"name":"111"
}
}
或者
POST customer/external/1
{
"doc":{
"name":"222"
}
}
或者
PUT customer/external/1
{
"doc":{
"name":"222"
}
}

区别:带 _update 的情况下

  • POST 操作会对比源文档数据,如果相同则不会有任何操作,文档 version 不会增加;
  • PUT 操作总会重新保存并增加 version 版本。

即 POST 带 _update 时会先对比原数据,如果一样就不进行任何操作。

使用场景:

  • 对于大并发更新,不带 _update;
  • 对于大并发查询、偶尔更新的场景,带 _update:先对比再更新,相同则不做处理,避免无谓的重新索引。

(1)POST 更新文档,带有_update

http://192.168.56.10:9200/customer/external/1/_update
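
请求体示意如下(name 的取值仅为示例):

POST customer/external/1/_update
{
  "doc": {
    "name": "updated_name"
  }
}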

image-20210618215519208

image-20210618215656180

如果再次执行更新,则不执行任何操作,序列号也不发生变化

返回
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 10,
"result": "noop", // 无操作
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
},
"_seq_no": 11,
"_primary_term": 2
}

POST 带 _update 的更新方式,会对比原来的数据,和原来的相同则不执行任何操作,version 和 _seq_no 都不变。

(2)POST 更新文档,不带_update

在更新过程中,重复执行更新操作,数据也能够更新成功,不会和原来的数据进行对比。

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 13,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 21,
"_primary_term": 6
}

5、删除文档或索引

DELETE customer/external/1
DELETE customer

注意:elasticsearch 并没有提供删除类型的操作,只提供了删除索引和文档的操作。

实例:删除 id=1 的数据,删除后继续查询

DELETE http://192.168.56.10:9200/customer/external/1

image-20210618220643602

再次执行 DELETE http://192.168.56.10:9200/customer/external/1

image-20210618220715096

GET http://192.168.56.10:9200/customer/external/1

{
"_index": "customer",
"_type": "external",
"_id": "1",
"found": false
}

删除索引

实例:删除整个 customer 索引数据

删除前,所有的索引http://192.168.56.10:9200/_cat/indices

green  open .kibana_task_manager_1   wJwVPfDPTbWhGFrpnFq6ag 1 0 2 0 12.5kb 12.5kb
green open .apm-agent-configuration 9HdINrLnTjOhHSHjtFHaiQ 1 0 0 0 283b 283b
green open .kibana_1 JjDT586BTdayZm-jpeB2vg 1 0 8 0 28.6kb 28.6kb
yellow open customer Tdur0O2yQOm75NF-JaLPpw 1 1 3 1 9.9kb 9.9kb

删除“ customer ”索引

DELETE http://192.168.56.10:9200/customer

响应
{
"acknowledged": true
}

删除后,所有的索引http://192.168.56.10:9200/_cat/indices

green open .kibana_task_manager_1   wJwVPfDPTbWhGFrpnFq6ag 1 0 2 0 12.5kb 12.5kb
green open .apm-agent-configuration 9HdINrLnTjOhHSHjtFHaiQ 1 0 0 0 283b 283b
green open .kibana_1 JjDT586BTdayZm-jpeB2vg 1 0 8 0 28.6kb 28.6kb

6、ES 的批量操作——bulk

批量导入数据

POST http://192.168.56.10:9200/customer/external/_bulk

每两行为一个整体:
{"index":{"_id":"1"}}
{"name":"a"}
{"index":{"_id":"2"}}
{"name":"b"}
注意:在 POSTMAN 中请求体格式选 json 或 text 均不行,需要到 kibana 的 Dev Tools 里执行。
image-20210618221652900

语法格式:

{action:{metadata}}\n
{request body }\n

{action:{metadata}}\n
{request body }\n

​ 这里的批量操作,当发生某一条执行失败时,其他的数据仍然能够接着执行,也就是说彼此之间是独立的。

​ bulk api 会依次按顺序执行所有的 action(动作)。如果某个单独的动作因任何原因失败,它将继续处理它后面剩余的动作。当 bulk api 返回时,它将提供每个动作的状态(与发送的顺序相同),所以您可以检查某个指定的动作是否失败。

实例 1: 执行多条数据

POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

执行结果:

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
"took" : 285,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "customer",
"_type" : "external",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "customer",
"_type" : "external",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
}
]
}

实例 2:对于整个索引执行批量操作

POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

运行结果:

#! Deprecation: [types removal] Specifying types in bulk requests is deprecated.
{
"took" : 370,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"result" : "not_found", // 没有该记录
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 404
}
},
{
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"result" : "created", // 创建
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : { // 保存
"_index" : "website",
"_type" : "blog",
"_id" : "F2KBH3oBzwta_wrj-92p",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 201
}
},
{
"update" : { // 更新
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1,
"status" : 200
}
}
]
}

7、样本测试数据

准备了一份顾客银行账户信息的虚构的 JSON 文档样本。每个文档都有下列的 schema(模式)。

{
"account_number": 1,
"balance": 39225,
"firstname": "Amber",
"lastname": "Duke",
"age": 32,
"gender": "M",
"address": "880 Holmes Lane",
"employer": "Pyrami",
"email": "amberduke@pyrami.com",
"city": "Brogan",
"state": "IL"
}

https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json , 导入测试数据

考虑到官方仓库的 master 分支目前已经没有 accounts.json,而我们的版本是 7.4.2,可以在 github 中切换到你当前使用的版本分支后再下载。

image-20210618225111364

POST bank/account/_bulk
上面的数据

image-20210618225153228

http://192.168.56.10:9200/_cat/indices
刚导入了 1000 条数据
yellow open bank 99m64ElxRuiH46wV7RjXZA 1 1 1000 0 427.8kb 427.8kb

四、进阶语法

1、search 检索文档

ES 支持两种基本方式检索:

  • 通过 REST request url 发送检索信息 (uri + 检索参数);
  • 通过 REST request body 来发送它们(uri+请求体)

信息检索

API: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/getting-started-search.html

GET bank/_search?q=*&sort=account_number:asc

说明:
q=* # 查询所有
sort # 排序字段
asc # 升序

检索bank下所有的信息,包括type和docs
GET bank/_search

返回的结果:

image-20210622223943079

  • took - 花费多少 ms 搜索
  • timed_out - 是否超时
  • _shards - 多少分片被搜索了,以及多少成功/失败的搜索分片
  • max_score - 文档相关性最高得分
  • hits.total.value - 多少匹配文档被找到
  • hits.sort - 结果的排序 key(列),没有的话按照 score 排序
  • hits._score -相关的得分 (not applicable when using match_all)
GET bank/_search?q=*&sort=account_number:asc

检索了 1000 条数据,但默认只返回前 10 条(size 默认为 10)

uri + 请求体进行检索

GET bank/_search
{
"query":{"match_all": {}},
"sort":[
{"account_number": "asc"} ,
{"balance": "desc"}
]
}

​ POSTMAN 中 GET 不能携带请求体,改为 POST 效果也是一样的:我们 POST 一个 JSON 风格的查询请求体到 _search。需要了解的是,一旦搜索的结果被返回,ES 就完成了这次请求,不会也不能维护任何服务端的资源或者结果的 cursor 游标。
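
例如,上面的查询改用 POST 发送时,请求体保持不变(示意):

POST bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ]
}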

2、DSL 领域特定语言

如何写复杂查询

​ Elasticsearch 提供一个可以执行查询的 json 风格的 DSL(domain-specific language 领域特定语言)。这个被称为 Query DSL,该查询语言非常全面。

2.1 基本语法格式

一个查询语句的典型结构

如果针对于某个字段,那么它的结构如下:
{
QUERY_NAME: { # 使用的功能
FIELD_NAME: { # 功能参数
ARGUMENT:VALUE,
ARGUMENT:VALUE,...
}
}
}
GET bank/_search
{
"query":{ # 查询的字段
"match_all":{}
},
"from":0, # 从第几条文档开始查
"size":5,
"_source":["balance"],
"sort":[
{
"account_number":{ # 返回结果按哪个列排序
"order":"desc" # 降序
}
}
]
}

_source为要返回的字段

query 定义如何查询:

  1. match_all 查询类型【代表查询所有】,es 中可以在 query 中组合非常多的查询类型来完成复杂查询;
  2. 除了 query 参数之外,我们也可以传递其他参数可改变查询结果,如 sort、size;
  3. from + size 限定,完成分页功能;
  4. sort 排序,支持多字段排序:前序字段相等时再按后续字段排序,否则以前序字段为准;

2.2 返回部分字段

GET bank/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 5,
"sort": [
{
"account_number": {
"order": "desc"
}
}
],
"_source": ["balance","firstname"]

}

查询结果:

{
"took": 18,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "999",
"_score": null,
"_source": {
"firstname": "Dorothy",
"balance": 6087
},
"sort": [999]
},
{
"_index": "bank",
"_type": "account",
"_id": "998",
"_score": null,
"_source": {
"firstname": "Letha",
"balance": 16869
},
"sort": [998]
},
{
"_index": "bank",
"_type": "account",
"_id": "997",
"_score": null,
"_source": {
"firstname": "Combs",
"balance": 25311
},
"sort": [997]
},
{
"_index": "bank",
"_type": "account",
"_id": "996",
"_score": null,
"_source": {
"firstname": "Andrews",
"balance": 17541
},
"sort": [996]
},
{
"_index": "bank",
"_type": "account",
"_id": "995",
"_score": null,
"_source": {
"firstname": "Phelps",
"balance": 21153
},
"sort": [995]
}
]
}
}

2.3 match 匹配查询

基本类型(非字符串)的查询,“account_number”: 20 的值可以加引号也可以不加,对数值字段都是精确匹配。

GET bank/_search
{
"query":{
"match": {
"account_number": "20"
}
}
}

match 返回 account_number=20 的数据

查询结果:

{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 1.0,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place",
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
}
]
}
}

字符串字段的查询,是全文检索(模糊匹配)

GET bank/_search
{
"query":{
"match": {
"address": "kings"
}
}
}

全文检索,最终会按照评分进行排序,会对检索条件进行匹配。

查询结果:

{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 5.9908285,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 5.9908285,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place",
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "722",
"_score": 5.9908285,
"_source": {
"account_number": 722,
"balance": 27256,
"firstname": "Roberts",
"lastname": "Beasley",
"age": 34,
"gender": "F",
"address": "305 Kings Hwy",
"employer": "Quintity",
"email": "robertsbeasley@quintity.com",
"city": "Hayden",
"state": "PA"
}
}
]
}
}

2.4 match_phrase [短语匹配]

将需要匹配的值当成一个整个单词(不分词)进行检索

GET bank/_search
{
"query" :{
"match_phrase": {
"address": "mill road"
}
}
}

查出 address 中包含短语 "mill road" 的所有记录,并给出相关性得分

查询结果:

{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 8.926605,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 8.926605,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}

注意:

match_phrase 和 match 的区别,观察如下的实例:

  • match_phrase 是做短语匹配
  • match 是分词匹配,例如 990 Mill 会匹配含有 990 或者 Mill 的结果(见下方补充的 match 示例)
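
作为对照,补充一个 match 分词匹配的示意(与下文 match_phrase 示例使用同样的 "990 Mill" 条件):

GET bank/_search
{
  "query": {
    "match": {
      "address": "990 Mill"
    }
  }
}

这种写法会把含 990 或含 Mill 的文档都检索出来,同时包含两者的记录得分最高;而下面的 match_phrase 只会命中包含完整短语 "990 Mill" 的文档。
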
GET bank/_search
{
"query":{
"match_phrase": {
"address": "990 Mill"
}
}
}

查询结果:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 10.806405,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 10.806405,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}

使用 match 的 keyword

GET bank/_search
{
"query": {
"match": {
"address.keyword": "990 Mill"
}
}
}

查询结果,一条也未匹配到

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}

修改匹配数据为“990 Mill Road”

GET bank/_search
{
"query": {
"match": {
"address.keyword": "990 Mill Road"
}
}
}

查询出一条数据

{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 6.5032897,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 6.5032897,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}

文本字段用 keyword 匹配时,匹配条件必须是该字段的完整值,即精确匹配。

match_phrase 是做短语匹配,只要文本中包含这个短语,就能匹配到。

2.5 multi_match【多字段匹配】

GET bank/_search
{
"query":{
"multi_match": {
"query": "mill",
"fields": [
"state" ,
"address"
]
}
}
}

state 或者 address 中包含 mill,并且在查询过程中,会对查询条件进行分词

查询结果:

{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 5.4032025,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 5.4032025,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 5.4032025,
"_source": {
"account_number": 136,
"balance": 45801,
"firstname": "Winnie",
"lastname": "Holland",
"age": 38,
"gender": "M",
"address": "198 Mill Lane",
"employer": "Neteria",
"email": "winnieholland@neteria.com",
"city": "Urie",
"state": "IL"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "345",
"_score": 5.4032025,
"_source": {
"account_number": 345,
"balance": 9812,
"firstname": "Parker",
"lastname": "Hines",
"age": 38,
"gender": "M",
"address": "715 Mill Avenue",
"employer": "Baluba",
"email": "parkerhines@baluba.com",
"city": "Blackgum",
"state": "KY"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "472",
"_score": 5.4032025,
"_source": {
"account_number": 472,
"balance": 25571,
"firstname": "Lee",
"lastname": "Long",
"age": 32,
"gender": "F",
"address": "288 Mill Street",
"employer": "Comverges",
"email": "leelong@comverges.com",
"city": "Movico",
"state": "MT"
}
}
]
}
}

2.6 bool 用来做复合查询

复合语句可以合并任何其他查询语句,包括复合语句。这也就意味着,复合语句之间可以相互嵌套,可以表达非常复杂的逻辑。

must: 必须达到 must 所列举的所有条件

实例:查询 gender=m,并且 address=mill 的数据

GET bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"address": "mill"}},
{"match": {"gender": "M"}}
]
}
}
}

查询结果:

{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 6.0824604,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 6.0824604,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 6.0824604,
"_source": {
"account_number": 136,
"balance": 45801,
"firstname": "Winnie",
"lastname": "Holland",
"age": 38,
"gender": "M",
"address": "198 Mill Lane",
"employer": "Neteria",
"email": "winnieholland@neteria.com",
"city": "Urie",
"state": "IL"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "345",
"_score": 6.0824604,
"_source": {
"account_number": 345,
"balance": 9812,
"firstname": "Parker",
"lastname": "Hines",
"age": 38,
"gender": "M",
"address": "715 Mill Avenue",
"employer": "Baluba",
"email": "parkerhines@baluba.com",
"city": "Blackgum",
"state": "KY"
}
}
]
}
}

must_not,必须不匹配 must_not 所列举的所有条件。

实例:查询 gender=m,并且 address=mill 的数据,但是 age 不等于 38 的

GET bank/_search
{
"query":{
"bool":{
"must":[
{
"match":
{
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not":[
{
"match":{
"age":"38"
}
}
]
}
}
}

查询结果:

{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 6.0824604,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 6.0824604,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}

should,应该满足 should 所列举的条件。

满足 should 条件会增加相关文档的评分,但并不会改变查询的结果。如果 query 中只有 should 且只有一种匹配规则,那么 should 的条件就会被作为默认匹配条件而改变查询结果。

实例:匹配 lastName 应该属于 Wallace 的数据

GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "18"
}
}
],
"should": [
{
"match": {
"lastname": "Wallace"
}
}
]
}
}
}

查询结果:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 12.585751,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 12.585751,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 6.0824604,
"_source": {
"account_number": 136,
"balance": 45801,
"firstname": "Winnie",
"lastname": "Holland",
"age": 38,
"gender": "M",
"address": "198 Mill Lane",
"employer": "Neteria",
"email": "winnieholland@neteria.com",
"city": "Urie",
"state": "IL"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "345",
"_score": 6.0824604,
"_source": {
"account_number": 345,
"balance": 9812,
"firstname": "Parker",
"lastname": "Hines",
"age": 38,
"gender": "M",
"address": "715 Mill Avenue",
"employer": "Baluba",
"email": "parkerhines@baluba.com",
"city": "Blackgum",
"state": "KY"
}
}
]
}
}

能够看到相关度越高,得分越高。

2.7 Filter【结果过滤】

并不是所有的查询都需要产生分数,特别是那些仅用于 filtering(过滤)的文档。为了不计算分数,elasticsearch 会自动检查场景并且优化查询的执行。

GET bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"address":"mill"}}
],
"filter":{
"range":{
"balance":{
"gte":"10000",
"lte":"20000"
}
}
}
}
}
}

这里先是查询所有匹配 address=mill 的文档,然后再根据 10000<=balance<=20000 进行过滤查询结果

查询结果:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 5.4032025,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 5.4032025,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}

Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document’s relevance score. The higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.

在 boolean 查询中,must、should 和 must_not 元素都被称为查询子句。文档是否符合每个 must 或 should 子句中的标准,决定了文档的“相关性得分”。得分越高,文档越符合您的搜索条件。默认情况下,Elasticsearch 返回根据这些相关性得分排序的文档。

The criteria in a must_not clause is treated as a filter. It affects whether or not the document is included in the results, but does not contribute to how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.

“must_not”子句中的条件被视为“过滤器”。 它影响文档是否包含在结果中, 但不影响文档的评分方式。 还可以显式地指定任意过滤器来包含或排除基于结构化数据的文档。

filter 在使用过程中,并不会计算相关得分_score:

GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "mill"
}
}
],
"filter": {
"range": {
"balance": {
"gte": "10000",
"lte": "20000"
}
}
}
}
}
}
//gte:>= lte:<=

查询结果:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 213,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 0.0,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place",
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "37",
"_score": 0.0,
"_source": {
"account_number": 37,
"balance": 18612,
"firstname": "Mcgee",
"lastname": "Mooney",
"age": 39,
"gender": "M",
"address": "826 Fillmore Place",
"employer": "Reversus",
"email": "mcgeemooney@reversus.com",
"city": "Tooleville",
"state": "OK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "51",
"_score": 0.0,
"_source": {
"account_number": 51,
"balance": 14097,
"firstname": "Burton",
"lastname": "Meyers",
"age": 31,
"gender": "F",
"address": "334 River Street",
"employer": "Bezal",
"email": "burtonmeyers@bezal.com",
"city": "Jacksonburg",
"state": "MO"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "56",
"_score": 0.0,
"_source": {
"account_number": 56,
"balance": 14992,
"firstname": "Josie",
"lastname": "Nelson",
"age": 32,
"gender": "M",
"address": "857 Tabor Court",
"employer": "Emtrac",
"email": "josienelson@emtrac.com",
"city": "Sunnyside",
"state": "UT"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "121",
"_score": 0.0,
"_source": {
"account_number": 121,
"balance": 19594,
"firstname": "Acevedo",
"lastname": "Dorsey",
"age": 32,
"gender": "M",
"address": "479 Nova Court",
"employer": "Netropic",
"email": "acevedodorsey@netropic.com",
"city": "Islandia",
"state": "CT"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "176",
"_score": 0.0,
"_source": {
"account_number": 176,
"balance": 18607,
"firstname": "Kemp",
"lastname": "Walters",
"age": 28,
"gender": "F",
"address": "906 Howard Avenue",
"employer": "Eyewax",
"email": "kempwalters@eyewax.com",
"city": "Why",
"state": "KY"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "183",
"_score": 0.0,
"_source": {
"account_number": 183,
"balance": 14223,
"firstname": "Hudson",
"lastname": "English",
"age": 26,
"gender": "F",
"address": "823 Herkimer Place",
"employer": "Xinware",
"email": "hudsonenglish@xinware.com",
"city": "Robbins",
"state": "ND"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "222",
"_score": 0.0,
"_source": {
"account_number": 222,
"balance": 14764,
"firstname": "Rachelle",
"lastname": "Rice",
"age": 36,
"gender": "M",
"address": "333 Narrows Avenue",
"employer": "Enaut",
"email": "rachellerice@enaut.com",
"city": "Wright",
"state": "AZ"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "227",
"_score": 0.0,
"_source": {
"account_number": 227,
"balance": 19780,
"firstname": "Coleman",
"lastname": "Berg",
"age": 22,
"gender": "M",
"address": "776 Little Street",
"employer": "Exoteric",
"email": "colemanberg@exoteric.com",
"city": "Eagleville",
"state": "WV"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "272",
"_score": 0.0,
"_source": {
"account_number": 272,
"balance": 19253,
"firstname": "Lilly",
"lastname": "Morgan",
"age": 25,
"gender": "F",
"address": "689 Fleet Street",
"employer": "Biolive",
"email": "lillymorgan@biolive.com",
"city": "Sunbury",
"state": "OH"
}
}
]
}
}

能看到所有文档的 “_score” : 0.0。

2.8 term

和 match 一样。匹配某个属性值。全文检索字段用 match,其他非 text 字段匹配用 term.

Avoid using the term query for text fields.

避免对文本字段使用“term”查询

By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult.

默认情况下,Elasticsearch 会在 analysis(分析)过程中更改 text 字段的值,这使得为 text 字段值寻找精确匹配变得困难。

To search text field values, use the match.

要搜索 text 字段值,请使用 match 查询。

https://www.elastic.co/guide/en/elasticsearch/reference/7.6/query-dsl-term-query.html

使用 term 匹配查询

GET bank/_search
{
"query":{
"term": {
"age":"28"
}
}
}

如果是 text 则查不到

GET bank/_search
{
"query": {
"term": {
"gender" : "F"
}
}
}

image-20210626183135965

一条也没有匹配到

而更换为 match 匹配时,能够匹配到 32 个文档

image-20210626183227857

也就是说,全文检索字段用 match,其他非 text 字段匹配用 term
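
补充一个示意:如果确实要对文本内容做精确匹配,可以对它的 keyword 子字段使用 term(基于前文 bank 索引的默认映射):

GET bank/_search
{
  "query": {
    "term": {
      "gender.keyword": "F"
    }
  }
}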

2.9 Aggregation(执行聚合)

聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于 SQL 的 GROUP BY 和 SQL 聚合函数。在 elasticsearch 中,执行搜索可以返回 hits(命中结果),并且同时返回聚合结果,二者在响应中是彼此分开的。这是非常强大且高效的:你可以在一次请求中执行查询和多个聚合,并分别得到各自的返回结果,使用一次简化的 API 避免多次网络往返。

size: 0 不显示搜索数据

aggs: 执行聚合

语法如下:

"aggs":{
"aggs_name这次聚合的名字,方便展示在结果集中":{
"AGG_TYPE聚合的类型(avg,term,terms)":{}
}
}

搜索 address 中包含 mill 的所有人的年龄分布以及平均年龄,但不显示这些人的详情

GET bank/_search
{
"query":{
"match": {
"address": "Mill"
}
},
"aggs":{
"ageAgg":{
"terms": {
"field": "age",
"size": 10
}
},
"aggAvg":{
"avg": {
"field": "age"
}
},
"balanceAvg":{
"avg":{
"field": "balance"
}
}
},
"size":0
}
//ageAgg:聚合名字 terms:聚合类型 "field": "age":按照age字段聚合 size:10:取出前十种age
//avg:平均值聚合类型
//不显示这些人的详情,只看聚合结果

查询结果:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"aggAvg": {
"value": 34.0
},
"ageAgg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 38,
"doc_count": 2
},
{
"key": 28,
"doc_count": 1
},
{
"key": 32,
"doc_count": 1
}
]
},
"balanceAvg": {
"value": 25208.0
}
}
}

复杂:按照年龄聚合,并且求这些年龄段的这些人的平均薪资

GET bank/_search
{
"query":{
"match_all": {}
},
"aggs":{
"ageAgg":{
"terms": {
"field": "age",
"size": 100
},
"aggs":{
"ageAvg":{
"avg": {
"field": "balance"
}
}
}
}
}
}

输出结果:

{
"took" : 44,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
}
},
.....省略
]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"ageAvg" : {
"value" : 28312.918032786885
}
},
{
"key" : 39,
"doc_count" : 60,
"ageAvg" : {
"value" : 25269.583333333332
}
},
....省略

]
}
}
}

查出所有年龄分布,并且这些年龄段中 M 的平均薪资和 F 的平均薪资以及这个年龄段的总体平均薪资

GET bank/_search
{
"query":{
"match_all": {}
},
"aggs":{
"ageAgg":{
"terms": {
"field": "age",
"size": 10
},
"aggs":{
"genderAgg":{
"terms":{
"field": "gender.keyword"
},
"aggs": {
"balanceAvg":{
"avg":{
"field": "balance"
}
}
}
},
"ageBalanceAvg":{
"avg": {
"field": "balance"
}
}
}
}
},
"size":0
}

查询结果:

{
"took" : 119,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
},
"ageBalanceAvg" : {
"value" : 28312.918032786885
}
}
]
.......//省略其他
}
}
}

五、Mapping

1、字段类型

image-20210626223158965

2、映射

Mapping(映射):是用来定义文档(document),以及它所包含的属性(field)是如何存储和索引的。比如,使用 mapping 来定义:

  • 哪些字符串属性应该被看作全文本属性(full text fields);
  • 哪些属性包含数字,日期或地理位置
  • 文档中的所有属性是否都能被索引(_all 配置)
  • 日期格式
  • 自定义映射规则来执行动态添加属性。

查看 mapping 信息

GET bank/_mapping

{
"bank": {
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"balance": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}

修改 mapping 信息

官方文档地址:https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

自动猜测的映射类型

image-20210626225344047

3、新版本改变

ElasticSearch7-去掉 type 概念

  1. 关系型数据库中两个数据表是独立的,即使他们里面有相同名称的列也不影响使用,但 ES 中不是这样的。elasticsearch 是基于 Lucene 开发的搜索引擎,而 ES 中不同 type 下名称相同的 field 最终在 Lucene 中的处理方式是一样的。

    • 两个不同 type 下的两个 user_name,在 ES 同一个索引下其实被认为是同一个 field,你必须在两个不同的 type 中定义相同的 field 映射。否则,不同 type 中的相同字段名称就会在处理中出现冲突的情况,导致 Lucene 处理效率下降。
    • 去掉 type 就是为了提高 ES 处理数据的效率。
  2. Elasticsearch 7.x URL 中的 type 参数为可选。比如,索引一个文档不再要求提供文档类型。

  3. Elasticsearch 8.x 不再支持 URL 中的 type 参数。

  4. 解决:
    将索引从多类型迁移到单类型,每种类型文档一个独立索引

    将已存在的索引下的类型数据,全部迁移到指定位置即可。详见数据迁移

官方:

Elasticsearch 7.x

  • Specifying types in requests is deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids. Note that in 7.0, _doc is a permanent part of the path, and represents the endpoint name rather than the document type.
  • The include_type_name parameter in the index creation, index template, and mapping APIs will default to false. Setting the parameter at all will result in a deprecation warning.
  • The _default_ mapping type is removed.

Elasticsearch 8.x

  • Specifying types in requests is no longer supported.
  • The include_type_name parameter is removed.

3.1 创建索引

创建索引并指定属性的映射规则(相当于新建表并指定字段和字段类型)

PUT /my_index
{
"mappings":{
"properties":{
"age":{
"type":"integer"
},
"email":{
"type":"keyword"
},
"name":{
"type":"text"
}
}
}
}

输出:

image-20210626230845738

3.2 查看映射

GET /my_index

输出结果:

// "index": false 是否被索引即能检索到,默认是true
{
"my_index": {
"aliases": {},
"mappings": {
"properties": {
"age": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"name": {
"type": "text"
}
}
},
"settings": {
"index": {
"creation_date": "1624720095622",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "jGHXQpc6RpeUwJNAHu7AgQ",
"version": {
"created": "7040299"
},
"provided_name": "my_index"
}
}
}
}

3.3 添加新的字段映射

PUT /my_index/_mapping
{
"properties":{
"employee-id":{
"type":"keyword",
"index": false
}
}
}

这里的 “index”: false, 表明新增字段不能被检索,只是一个冗余字段。
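
可以再次查看映射,确认新字段已经加进去(示意):

GET /my_index/_mapping

# 返回的 properties 中应包含(节选):
# "employee-id": { "type": "keyword", "index": false }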

3.4 更新字段

对于已经存在的字段映射,我们不能更新。更新必须创建新的索引,进行数据迁移。

3.5 数据迁移

先创建 new_twitters 的正确映射,然后使用如下方式进行数据迁移。

POST _reindex [固定写法]
{
"source":{
"index":"twitter"
},
"dest":{
"index":"new_twitters"
}
}

更多详情见: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/docs-reindex.html

GET /bank/_search

{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
}
}
.......

想将年龄修改为 integer

GET /bank/_search

image-20210626232553134

  • 创建 newbank 索引
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text"
},
"gender": {
"type": "keyword"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
  • 查看“newbank”的映射:

GET /newbank/_mapping

能够看到 age 的映射类型被修改为了 integer

image-20210626232949187

  • 将 bank 中的数据迁移到 newbank 中
POST _reindex
{
"source":{
"index":"bank",
"type":"account"
},
"dest":{
"index":"newbank"
}
}

运行输出:

image-20210626233550951

查看 newbank 中的数据

image-20210626233650289

六、分词

一个 tokenizer(分词器)接收一个字符流,将之分割为独立的 tokens(词元,通常是独立的单词),然后输出 tokens 流。

例如:whitespace tokenizer 遇到空白字符时分割文本。它会将文本"Quick brown fox!"分割为[Quick,brown,fox!]
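
可以用 _analyze 接口直接验证这个例子(示意):

POST _analyze
{
  "tokenizer": "whitespace",
  "text": "Quick brown fox!"
}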

该 tokenizer(分词器)还负责记录各个 terms(词条)的顺序或 position 位置(用于 phrase 短语和 word proximity 词近邻查询),以及 term(词条)所代表的原始 word(单词)的 start(起始)和 end(结束)的 character offsets(字符串偏移量)(用于高亮显示搜索的内容)。

elasticsearch 提供了很多内置的分词器(标准分词器),可以用来构建 custom analyzers(自定义分词器)。

关于分词器https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis.html

POST _analyze
{
"analyzer": "standard",
"text": "The 2 Brown-Foxes bone"
}

测试结果:

{
"tokens": [
{
"token": "the",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "2",
"start_offset": 4,
"end_offset": 5,
"type": "<NUM>",
"position": 1
},
{
"token": "brown",
"start_offset": 6,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "foxes",
"start_offset": 12,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "bone",
"start_offset": 18,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 4
}
]
}

对于中文,我们需要安装额外的分词器

1、安装 ik 分词器

所有语言的分词,默认使用的都是“Standard Analyzer”,但是这些分词器针对中文的分词并不友好。为此需要安装中文的分词器。

注意: 不能使用默认的 elasticsearch-plugin install xxx.zip 进行安装

https://github.com/medcl/elasticsearch-analysis-ik/releases

在前面安装的 elasticsearch 时,我们已经将 elasticsearch 容器的“/usr/share/elasticsearch/plugins”目录,映射到宿主机的“ /mydata/elasticsearch/plugins”目录下,所以比较方便的做法就是下载“/elasticsearch-analysis-ik-7.4.2.zip”文件,然后解压到该文件夹下即可。安装完毕后,需要重启 elasticsearch 容器。

如果不嫌麻烦,还可以采用如下的方式。

  1. 查看 elasticsearch 版本号:
[root@localhost ~]#  curl http://localhost:9200
{
"name" : "ce20fcb8d039",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "hle-F0zaRHKN-H1Qnj4tZA",
"version" : {
"number" : "7.4.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
"build_date" : "2019-10-28T20:40:44.881551Z",
"build_snapshot" : false,
"lucene_version" : "8.2.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
  1. 进入 es 内部 plugins 目录
  • docker exec -it 容器 id /bin/bash
[vagrant@localhost ~]$ sudo docker exec -it elasticsearch /bin/bash

[root@66718a266132 elasticsearch]# pwd
/usr/share/elasticsearch
[root@66718a266132 elasticsearch]# pwd
/usr/share/elasticsearch
[root@66718a266132 elasticsearch]# yum install wget
#下载ik7.4.2
[root@66718a266132 elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
  • unzip 下载的文件
[root@66718a266132 elasticsearch]# unzip elasticsearch-analysis-ik-7.4.2.zip -d ik

#移动到plugins目录下
[root@66718a266132 elasticsearch]# mv ik plugins/
chmod -R 777 plugins/ik

docker restart elasticsearch
  • rm -rf *.zip
[root@66718a266132 elasticsearch]# rm -rf elasticsearch-analysis-ik-7.4.2.zip

2、测试分词器

使用默认分词器

GET _analyze
{
"analyzer": "standard",
"text":"我是中国人"
}

请观察执行结果:

image-20210627104644371

GET _analyze
{
"analyzer": "ik_smart",
"text":"我是中国人"
}

输出结果:

{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
}
]
}
GET _analyze
{
"analyzer": "ik_max_word",
"text":"我是中国人"
}

输出结果:

{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
},
{
"token": "中国",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 3
},
{
"token": "国人",
"start_offset": 3,
"end_offset": 5,
"type": "CN_WORD",
"position": 4
}
]
}

3、自定义词库

比如我们要把尚硅谷算作一个词

  • 修改/usr/share/elasticsearch/plugins/ik/config 中的 IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典 -->
<entry key="ext_dict"></entry>
<!--用户可以在这里配置自己的扩展停止词字典-->
<entry key="ext_stopwords"></entry>
<!--用户可以在这里配置远程扩展字典 -->
<entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
<!--用户可以在这里配置远程扩展停止词字典-->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

修改完成后,需要重启 elasticsearch 容器,否则修改不生效。docker restart elasticsearch

更新完成后,es 只会对于新增的数据用更新分词。历史数据是不会重新分词的。如果想要历史数据重新分词,需要执行:

POST my_index/_update_by_query?conflicts=proceed

安装 Nginx 请查看 补充

mkdir /mydata/nginx/html/es
cd /mydata/nginx/html/es
vim fenci.txt
输入尚硅谷

测试http://192.168.56.10/es/fenci.txt

往 fenci.txt 文件添加内容:

echo "樱桃萨其马,带你甜蜜入夏" > /mydata/nginx/html/es/fenci.txt

测试效果:

GET _analyze
{
"analyzer": "ik_max_word",
"text":"樱桃萨其马,带你甜蜜入夏"
}
{
"tokens": [
{
"token": "樱桃",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "萨其马",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 1
},
{
"token": "带你",
"start_offset": 6,
"end_offset": 8,
"type": "CN_WORD",
"position": 2
},
{
"token": "甜蜜",
"start_offset": 8,
"end_offset": 10,
"type": "CN_WORD",
"position": 3
},
{
"token": "入夏",
"start_offset": 10,
"end_offset": 12,
"type": "CN_WORD",
"position": 4
}
]
}

七、elasticsearch-Rest-Client

java 操作 es 有两种方式

1、9300: TCP

  • spring-data-elasticsearch:transport-api.jar;
    • springboot 版本不同,transport-api.jar 不同,不能适配 es 版本
    • 7.x 已经不建议使用,8 以后就要废弃

2、9200: HTTP

有诸多包

  • jestClient: 非官方,更新慢;
  • RestTemplate:模拟 HTTP 请求,ES 很多操作需要自己封装,麻烦;
  • HttpClient:同上;
  • Elasticsearch-Rest-Client:官方 RestClient,封装了 ES 操作,API 层次分明,上手简单;

最终选择 Elasticsearch-Rest-Client(elasticsearch-rest-high-level-client)

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html

八、SpringBoot 整合 ElasticSearch

创建项目 gulimall-search

image-20210627114816175

选择依赖 web,但不要在里面选择 es

image-20210627114902769

1、导入依赖

这里的版本要和所安装的 ELK 版本匹配。

<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.4.2</version>
</dependency>

在 spring-boot-dependencies 中所依赖的 ES 版本为 6.8.5,要改掉

<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.4.2</elasticsearch.version>
</properties>

RequestOptions 用于设置请求选项,比如 es 添加了安全访问规则时,访问 es 需要添加一个安全头,就可以通过 requestOptions 设置

官方建议把 requestOptions 创建成单实例

@Configuration
public class GulimallElasticSearchConfig {
public static final RequestOptions COMMON_OPTIONS;
static {
RequestOptions requestOptions;
RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();

COMMON_OPTIONS = builder.build();
}

@Bean
public RestHighLevelClient esRestClient(){
RestClientBuilder builder = null;
// 可以指定多个es
builder = RestClient.builder(new HttpHost("192.168.56.10",9200,"http"));

RestHighLevelClient client = new RestHighLevelClient(builder);
return client;

}
}

2、测试

2.1 保存数据

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-index.html

保存方式分为同步和异步,异步方式多了个 listener 回调

@Test
public void indexData() throws IOException {

// 设置索引
IndexRequest indexRequest = new IndexRequest ("users");
indexRequest.id("1");

User user = new User();
user.setUserName("张三");
user.setAge(20);
user.setGender("男");
String jsonString = JSON.toJSONString(user);

//设置要保存的内容,指定数据和类型
indexRequest.source(jsonString, XContentType.JSON);

//执行创建索引和保存数据
IndexResponse index = client.index(indexRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

System.out.println(index);

}

2.2 获取数据

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html

@Test
public void find() throws IOException {
// 1 创建检索请求
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("bank");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// 构造检索条件
// sourceBuilder.query();
// sourceBuilder.from();
// sourceBuilder.size();
// sourceBuilder.aggregation();
sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
System.out.println(sourceBuilder.toString());

searchRequest.source(sourceBuilder);

// 2 执行检索
SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
// 3 分析响应结果
System.out.println(response.toString());
}
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 5.4032025,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 5.4032025,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 5.4032025,
"_source": {
"account_number": 136,
"balance": 45801,
"firstname": "Winnie",
"lastname": "Holland",
"age": 38,
"gender": "M",
"address": "198 Mill Lane",
"employer": "Neteria",
"email": "winnieholland@neteria.com",
"city": "Urie",
"state": "IL"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "345",
"_score": 5.4032025,
"_source": {
"account_number": 345,
"balance": 9812,
"firstname": "Parker",
"lastname": "Hines",
"age": 38,
"gender": "M",
"address": "715 Mill Avenue",
"employer": "Baluba",
"email": "parkerhines@baluba.com",
"city": "Blackgum",
"state": "KY"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "472",
"_score": 5.4032025,
"_source": {
"account_number": 472,
"balance": 25571,
"firstname": "Lee",
"lastname": "Long",
"age": 32,
"gender": "F",
"address": "288 Mill Street",
"employer": "Comverges",
"email": "leelong@comverges.com",
"city": "Movico",
"state": "MT"
}
}
]
}
}
@Test
public void find() throws IOException {
// 1 创建检索请求
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("bank");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// 构造检索条件
// sourceBuilder.query();
// sourceBuilder.from();
// sourceBuilder.size();
// sourceBuilder.aggregation();
sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
//AggregationBuilders工具类构建AggregationBuilder
// 构建第一个聚合条件:按照年龄的值分布
TermsAggregationBuilder agg1 = AggregationBuilders.terms("ageAgg").field("age").size(10);// 聚合名称
// 参数为AggregationBuilder
sourceBuilder.aggregation(agg1);
// 构建第二个聚合条件:平均薪资
AvgAggregationBuilder agg2 = AggregationBuilders.avg("balanceAvg").field("balance");
sourceBuilder.aggregation(agg2);

System.out.println("检索条件"+sourceBuilder.toString());

searchRequest.source(sourceBuilder);

// 2 执行检索
SearchResponse response = client.search(searchRequest, GuliESConfig.COMMON_OPTIONS);
// 3 分析响应结果
System.out.println(response.toString());
}

转换 bean

@Data
static class Account {
private int account_number;
private int balance;
private String firstname;
private String lastname;
private int age;
private String gender;
private String address;
private String employer;
private String email;
private String city;
private String state;

@Override
public String toString() {
return "Account{" +
"account_number=" + account_number +
", balance=" + balance +
", firstname='" + firstname + '\'' +
", lastname='" + lastname + '\'' +
", age=" + age +
", gender='" + gender + '\'' +
", address='" + address + '\'' +
", employer='" + employer + '\'' +
", email='" + email + '\'' +
", city='" + city + '\'' +
", state='" + state + '\'' +
'}';
}
}
// 3.1 获取java bean
SearchHits hits = response.getHits();
SearchHit[] hits1 = hits.getHits();
for (SearchHit hit : hits1) {
hit.getId();
hit.getIndex();
String sourceAsString = hit.getSourceAsString();
Account account = JSON.parseObject(sourceAsString, Account.class);
System.out.println(account);

}
Account(accountNumber=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account(accountNumber=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account(accountNumber=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account(accountNumber=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)

Buckets 分析信息

// 3.2 获取检索到的分析信息
Aggregations aggregations = response.getAggregations();
Terms ageAgg = aggregations.get("ageAgg");
for (Terms.Bucket bucket : ageAgg.getBuckets()) {
System.out.println("年龄:" + bucket.getKeyAsString() + "--人数: " + bucket.getDocCount());
}
Avg balanceAvg = aggregations.get("balanceAvg");
System.err.println("薪资平均值:"+balanceAvg.getValue());

搜索 address 中包含 mill 的所有人的年龄分布以及平均年龄,平均薪资

GET bank/_search
{
"query": {
"match": {
"address": "Mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}

补充:安装 Nginx

  • 随便启动一个 nginx 实例,只是为了复制出配置
docker run -p80:80 --name nginx -d nginx:1.10
  • 将容器内的配置文件拷贝到当前目录 (别忘了后面的点 )
docker container cp nginx:/etc/nginx .
  • 关闭 nginx 容器和删除
docker stop nginx
docker rm nginx
  • 查看 mydata 目录下的 nginx 复制文件
cd /mydata/nginx
ll

image-20210627112135785

  • 在 mydata 目录下把 nginx 文件夹名修改为 conf
mv nginx conf

image-20210627112412712

  • 在 mydata 目录下再次新建 nginx 目录,把 conf 目录移动进去
[root@localhost mydata]# mkdir nginx
[root@localhost mydata]# mv conf nginx/
[root@localhost mydata]# ls
elasticsearch mysql nginx portainer redis
  • 创建新的 nginx
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10

image-20210627112844296

  • 创建“/mydata/nginx/html/index.html”文件,测试是否能够正常访问
echo '<h2>hello nginx!</h2>' >index.html

访问:http://nginx 所在主机的 IP:80/index.html

image-20210627113513696