Installing the IK Analysis Chinese Tokenizer Plugin for Elasticsearch

Search Engines, February 17, 2019

These notes record how to install the IK Analysis Chinese tokenizer plugin for Elasticsearch, and then test tokenization and search.

Related notes:
Installing Elasticsearch on CentOS 6.9 from the RPM package
Installing Elasticsearch on CentOS 6.9

Installing IK
Method 1: download the prebuilt package

wget -c https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

Create the plugin directory: cd your-es-root/plugins/ && mkdir ik

Unzip the plugin into your-es-root/plugins/ik
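The three steps of method 1 can be collected into one small script. This is only a sketch: ES_HOME, IK_VERSION, and both function names are hypothetical, /usr/share/elasticsearch is assumed to be the install root (the usual location for an RPM install), and the plugin version is assumed to match the installed Elasticsearch version.

```shell
#!/bin/sh
# ES_HOME and IK_VERSION are assumed names, not part of the original notes.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"   # assumed install root (RPM layout)
IK_VERSION="${IK_VERSION:-6.6.0}"                # should match the Elasticsearch version

# Build the release URL for a given IK version.
ik_zip_url() {
    printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' "$1" "$1"
}

# Download the archive into the current directory and unpack it into plugins/ik.
install_ik_manually() {
    wget -c "$(ik_zip_url "$IK_VERSION")" &&
    mkdir -p "$ES_HOME/plugins/ik" &&
    unzip -o "elasticsearch-analysis-ik-${IK_VERSION}.zip" -d "$ES_HOME/plugins/ik"
}

# Example (not run here):
#   ES_HOME=your-es-root install_ik_manually
```

Nothing is downloaded until install_ik_manually is called, so the file can be sourced safely first.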

Method 2: install with elasticsearch-plugin (supported since v5.5.1)

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

Restart Elasticsearch after installing

service elasticsearch restart
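After the restart it may be worth confirming that the node actually loaded the plugin. The helper below is a sketch: ik_listed is a hypothetical name, and it assumes the plugin registers itself as "analysis-ik", which is the name expected in the output of the _cat/plugins API.

```shell
#!/bin/sh
# Reads a plugin listing on stdin and succeeds if the IK plugin appears in it.
# Assumption: the plugin is reported under the name "analysis-ik".
ik_listed() {
    grep -q 'analysis-ik'
}

# Example (not run here):
#   curl -s http://192.168.75.135:9200/_cat/plugins | ik_listed && echo "IK is loaded"
```

The check goes through _cat/plugins rather than the plugins directory because, with method 1, the folder is named ik and not after the plugin.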

Tokenization test
Create an index

curl -XPUT http://192.168.75.135:9200/ikindex

Create a mapping for the content field

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/_mapping -H 'Content-Type:application/json' -d'
{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word"
        }
    }
}'
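The mapping above uses ik_max_word for both indexing and searching. IK also provides a coarser analyzer, ik_smart, so one may want to try other combinations. The sketch below only generates the mapping body; ik_mapping_body is a hypothetical helper, and it assumes the field and analyzer names contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Prints a mapping body for one text field.
# $1 = field name, $2 = index-time analyzer, $3 = search-time analyzer
ik_mapping_body() {
    printf '{\n  "properties": {\n    "%s": {\n      "type": "text",\n      "analyzer": "%s",\n      "search_analyzer": "%s"\n    }\n  }\n}\n' "$1" "$2" "$3"
}

# Example (not run here), against a freshly created index:
#   ik_mapping_body content ik_max_word ik_smart |
#     curl -XPOST http://192.168.75.135:9200/ikindex/iktype/_mapping \
#          -H 'Content-Type:application/json' -d @-
```

The index-time analyzer of an existing field cannot be changed, so a different combination has to be applied to a new index.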

Tokenization test with the default analyzer

curl -XGET 'http://192.168.75.135:9200/ikindex/_analyze?pretty=true'  -H 'Content-Type:application/json' -d'{"text":"我的学习笔记"}'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "学",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "习",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "笔",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "记",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    }
  ]
}

Tokenization test with IK

curl -XGET 'http://192.168.75.135:9200/ikindex/_analyze?pretty=true'  -H 'Content-Type:application/json' -d'{"analyzer":"ik_max_word","text":"我的学习笔记"}'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "学习",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "笔记",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
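The two responses above are easier to compare when only the tokens are shown. The filter below is a minimal sketch (analyze_tokens is a hypothetical name): it pulls the "token" values out of an _analyze response using grep and cut only, and assumes that no token contains a double quote.

```shell
#!/bin/sh
# Reads an _analyze response on stdin and prints one token per line.
analyze_tokens() {
    grep -o '"token" *: *"[^"]*"' | cut -d'"' -f4
}

# Example (not run here):
#   curl -s -XGET 'http://192.168.75.135:9200/ikindex/_analyze' \
#        -H 'Content-Type:application/json' \
#        -d'{"analyzer":"ik_max_word","text":"我的学习笔记"}' | analyze_tokens
```

For the IK response shown above, this would list 我, 的, 学习 and 笔记, one per line.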

Search test
Index some documents

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/1 -H 'Content-Type:application/json' -d'{"content":"家明的家,我的学习笔记"}'

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/2 -H 'Content-Type:application/json' -d'{"content":"php是世界上最好的语言"}'

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/3 -H 'Content-Type:application/json' -d'{"content":"我喜欢户外,喜欢钓鱼,喜欢自驾,我是家明"}'

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/4 -H 'Content-Type:application/json' -d'{"content":"这是我的个人博客,家明的学习笔记"}'

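curl -XPOST http://192.168.75.135:9200/ikindex/iktype/4 -H 'Content-Type:application/json' -d'{"content":"爱学习的孩纸,才是个好孩纸"}'

Note that this last request reuses id 4, so it overwrites the document indexed just before it; the search results below reflect that.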
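Several documents can also be sent in one request through the _bulk API, which expects newline-delimited JSON: one action line followed by one source line per document. The helper below is a sketch that only builds those lines; bulk_index_lines is a hypothetical name, and the content is assumed to contain no double quotes or backslashes.

```shell
#!/bin/sh
# Prints the two NDJSON lines the _bulk API expects for one document.
# $1 = document id, $2 = content (assumed free of double quotes and backslashes)
bulk_index_lines() {
    printf '{"index":{"_id":"%s"}}\n{"content":"%s"}\n' "$1" "$2"
}

# Example (not run here):
#   { bulk_index_lines 1 "家明的家,我的学习笔记"
#     bulk_index_lines 2 "php是世界上最好的语言"
#   } | curl -XPOST http://192.168.75.135:9200/ikindex/iktype/_bulk \
#            -H 'Content-Type: application/x-ndjson' --data-binary @-
```

The --data-binary option matters here because it keeps the newlines that separate the bulk lines, which -d would strip.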

Search with highlighted results

curl -XPOST http://192.168.75.135:9200/ikindex/iktype/_search  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "学习笔记" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}'

Search results

{
	"took": 9,
	"timed_out": false,
	"_shards": {
		"total": 5,
		"successful": 5,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": 2,
		"max_score": 0.66301036,
		"hits": [{
			"_index": "ikindex",
			"_type": "iktype",
			"_id": "4",
			"_score": 0.66301036,
			"_source": {
				"content": "爱学习的孩纸,才是个好孩纸"
			},
			"highlight": {
				"content": ["爱<tag1>学习</tag1>的孩纸,才是个好孩纸"]
			}
		}, {
			"_index": "ikindex",
			"_type": "iktype",
			"_id": "1",
			"_score": 0.5753642,
			"_source": {
				"content": "家明的家,我的学习笔记"
			},
			"highlight": {
				"content": ["家明的家,我的<tag1>学习</tag1><tag1>笔记</tag1>"]
			}
		}]
	}
}