6.索引(Index)的基础配置

# 1.Dynamic Mapping

当写入文档是如果索引不存在，会自动创建文档
Dynamic Mapping的机制是我们无需手动创建Mapping,自动推算出文件类型
如果类型推算错误，一些功能会推算失败如地理位置等

# 字段数据类型

简单类型
- text
- date
- integer/floating
- Boolean
- Ipv4
复杂类型
- 对象和嵌套类型
特殊类型
- geo-point

新增数据类型自动识别

JSON 类型	Elasticsearch类型
字符串	Date float/long text
布尔值	boolean
浮点数	float
整数	long
对象	object
数组	由第一个非空值类型决定
空值	忽略

# 能否更改Mapping字段

PUT user/
{
  "mappings":{
    "_doc":{
      "dynamic":"false"
    }
  }
}

1
2
3
4
5
6
7
8

对新增字段

Dynamics设为true,一旦有新字段加入,Mapping同时也会更新
Dynamics设为false,Mapping不会更细，新增字段的数据无法索引,但信息会出现在_source
Dynamics设为strict,数据不会更新,

对已有字段

一旦有数据写入，就不会修改对字段定义

\	true	false	strict
文档可索引	Y	Y	N
字段可索引	Y	N	N
Mapping被更新	Y	N	N

# 2.定义mapping

# 1.mapping的其他属性自定义

PUT users
{
  "mappings":{
    "_doc":{
      "dynamic":"true"
    },
    "properties": {
      "firstName":{
        "type": "text"
      },
      "LastName":{
        "type": "text",
        "null_value": ""
      },
      "mobile":{
        "type": "text",
        "index": false
      }
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

index: 定义是否索引
null_value : 定义null时的值

PUT products
{
  "mappings":{
    "properties": {
      "company":{
        "type": "text",
        "fields":{
          "keyword":{
            "type":"keyword"
          }
        }
      },
      "comment":{
        "type": "text",
        "fields":{
          "english_comment":{
            "type":"text",
            "analyzer":"english",
            "search_analyzer":"english"
          }
        }
      }
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

# 1.setting的配置

elasticsearch自带分词器

Character Filter：TOKenizer处理前，对文本特殊处理
TOKenizer：分词
TOKen Filter：TOKenizer输出的单词加工

输入:
GET _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello word</b>"
}
结果:
{
  "tokens" : [
    {
      "token" : "hello word",
      "start_offset" : 3,
      "end_offset" : 17,
      "type" : "word",
      "position" : 0
    }
  ]
}


输入:
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": ["html_strip"],
  "text": "<b>hello word</b>"
}
结果:
{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 3,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "word",
      "start_offset" : 9,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

输入:
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop"],
  "text": ["the rain is Spain"]
}
结果:
{
  "tokens" : [
    {
      "token" : "rain",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "Spain",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "word",
      "position" : 3
    }
  ]
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74

← 5.SearchAPI