PHP使用Elasticsearch实现基于IK分词搜索及多维度聚合查询实例

搜索引擎 2020年05月29日

本篇笔记记录了在linux的CentOS7发行版下使用RPM包安装Elasticsearch,安装IK分词插件,创建索引设置和映射,以及使用Composer安装PHP的Elasticsearch包的过程;并实现了搜索和多维度聚合查询的实例。

安装Elasticsearch

下载RPM包并安装

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.7.0-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.7.0-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.7.0-x86_64.rpm.sha512 
sudo rpm --install elasticsearch-7.7.0-x86_64.rpm

设置开机自动启动

/bin/systemctl daemon-reload
/bin/systemctl enable elasticsearch.service

启动Elasticsearch

systemctl start elasticsearch

systemd查看Elasticsearch运行状态

systemctl status elasticsearch

curl查看Elasticsearch运行状态

curl http://localhost:9200
{
  "name" : "localhost.localdomain",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "RiSXKDmvQVCwljXTdXHFwA",
  "version" : {
    "number" : "7.7.0",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
    "build_date" : "2020-05-12T02:01:37.602180Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

安装IK分词插件

下载IK包

wget --no-check-certificate  https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.7.0/elasticsearch-analysis-ik-7.7.0.zip

创建IK插件目录

cd /usr/share/elasticsearch/plugins/ && mkdir ik

解压IK包至ik目录

unzip /usr/local/elasticsearch-analysis-ik-7.7.0.zip -d /usr/share/elasticsearch/plugins/ik/

重启Elasticsearch

systemctl restart elasticsearch

测试IK分词

curl -XGET 'http://localhost:9200/_analyze?pretty=true'  -H 'Content-Type:application/json' -d'{"analyzer":"ik_max_word","text":"我的学习笔记"}'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "学习",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "笔记",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

创建索引

curl -X PUT http://localhost:9200/article -H 'Content-Type:application/json' -d'{
	"settings": {
		"number_of_shards": 3,
		"number_of_replicas": 2
	},
	"mappings": {
		"properties": {
			"title": {
				"type": "text",
				"analyzer": "ik_max_word",
				"search_analyzer": "ik_max_word"
			},
			"category": {
				"type": "keyword"
			},
			"create_date": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"article_views": {
				"type": "integer"
			}
		}
	}
}'

查看索引信息

curl http://localhost:9200/article?pretty=true
{
	"article": {
		"aliases": {},
		"mappings": {
			"properties": {
				"article_views": {
					"type": "integer"
				},
				"category": {
					"type": "keyword"
				},
				"create_date": {
					"type": "date",
					"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
				},
				"title": {
					"type": "text",
					"analyzer": "ik_max_word"
				}
			}
		},
		"settings": {
			"index": {
				"creation_date": "1587886948979",
				"number_of_shards": "3",
				"number_of_replicas": "2",
				"uuid": "pkEmM36VT9KHjVGYxfqXJA",
				"version": {
					"created": "7070099"
				},
				"provided_name": "article"
			}
		}
	}
}

settings:
number_of_shards主分片数
number_of_replicas副本数
mappings:
title文章标题,启用ik_max_word分词模式
category文章分类
article_views文章浏览量
create_date文章创建时间
修改配置文件

vim /etc/elasticsearch/elasticsearch.yml

配置

network.host: 0.0.0.0
cluster.initial_master_nodes: ["node-1", "node-2"]

重启Elasticsearch

systemctl restart elasticsearch

如果重启失败,查看log

tail -f /var/log/elasticsearch/elasticsearch.log

安装PHP的Elasticsearch包

本实例基于yii2框架,但不过多使用yii2特性,主要演示Elasticsearch

Elasticsearch包运行要求

  • PHP 7.1.0或更高版本
  • Composer
  • PHP的Libcurl扩展

安装

composer require elasticsearch/elasticsearch ~7.0

引入Elasticsearch包

我这里没有require 'vendor/autoload.php';是因为yii的index.php入口文件已经帮我做了这件事

use Elasticsearch\ClientBuilder;

测试一下,查看索引的字段映射信息

    public function actionTest()
    {
        $hosts = [
            '192.168.75.237:9200',         // IP + Port
        ];
        $client = ClientBuilder::create()           // Instantiate a new ClientBuilder
        ->setHosts($hosts)      // Set the hosts
        ->build();
        $params = [
            'index'  => 'article',
        ];
        $result = $client->indices()->getMapping($params);
        print_r($result);
    }

执行结果

创建索引

mysql部分略过......

    public function actionCreateIndex()
    {
        $hosts = [
            '192.168.75.237:9200',         // IP + Port
        ];
        $client = ClientBuilder::create()           // Instantiate a new ClientBuilder
        ->setHosts($hosts)      // Set the hosts
        ->build();
        $article = Article::find()->where(['article_status' => 1])->all();
        foreach ($article as $row) {
            $params = [
                'index' => 'article',
                'id'    => $row->id,
                'body'  => [
                    'title' => $row->article_title,
                    'category' => $row->category->category_name,
                    'article_views' => $row->article_views,
                    'create_date' => $row->create_date,
                ]
            ];
            $response = $client->index($params);
            var_dump($response['_shards']['successful']);
        }
    }

执行结果

Elasticsearch支持使用$client->bulk($params);方法批量创建索引,详情请查阅官方资料
搜索数据

public function actionResult()
    {
        $keyword = Yii::$app->request->get('keyword', '');
        $hosts = [
            '192.168.75.237:9200',         // IP + Port
        ];
        $client = ClientBuilder::create()           // Instantiate a new ClientBuilder
        ->setHosts($hosts)      // Set the hosts
        ->build();
        $params = [
            'index' => 'article',
            'body'  => [
                //查询
                'query' => [
                    //匹配搜索
                    'match' => [
                        'title' => $keyword
                    ]
                ],
                //高亮
                'highlight' => [
                    //自定义高亮起始标签,默认em
                    'pre_tags' => ['<span class="red">'],
                    //自定义高亮结束标签,默认em
                    'post_tags' => ['</span>'],
                    //高亮字段
                    'fields' => [
                        'title' => new \stdClass(),
                    ]
                ]
            ]
        ];

        $results = $client->search($params);
        print_r($results);
        exit();
    }

搜索结果

搜索数据,并进行多维度聚合

    public function actionResult()
    {
        $keyword = Yii::$app->request->get('keyword', '');
        $hosts = [
            '192.168.75.237:9200',         // IP + Port
        ];
        $client = ClientBuilder::create()           // Instantiate a new ClientBuilder
        ->setHosts($hosts)      // Set the hosts
        ->build();
        $params = [
            'index' => 'article',
            'body'  => [
                'from' => 0,
                'size' => 10,
                //查询
                'query' => [
                    //匹配搜索
                    'match' => [
                        'title' => $keyword
                    ]
                ],
                //高亮
                'highlight' => [
                    //自定义高亮起始标签,默认em
                    'pre_tags' => ['<span class="red">'],
                    //自定义高亮结束标签,默认em
                    'post_tags' => ['</span>'],
                    //高亮字段
                    'fields' => [
                        'title' => new \stdClass(),
                    ]
                ],
				//聚合
                'aggs' => [
					//按分类聚合
                    'group_category' => [
                        'terms' => ['field' => 'category'],
                    ],
					//按创建时间聚合,步长按月,最小数量设置1,过滤掉0的数据
                    'group_create_date' => [
                        'date_histogram' => [
                            'field' => 'create_date',
                            'interval' => 'month',
                            'format' => 'yyyy-MM',
                            'min_doc_count' => 1,
                        ],
                    ],
					//按访问量区间聚合,步长500
                    'interval_article_views' => [
                        'histogram' => [
                            'field' => 'article_views',
                            'interval' => 500,
                        ],
                    ],
                ],
            ]
        ];

        $results = $client->search($params);
        print_r($results);
        exit();
    }

搜索结果

Array
(
    [took] => 12
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [skipped] => 0
            [failed] => 0
        )

    [hits] => Array
        (
            [total] => Array
                (
                    [value] => 13
                    [relation] => eq
                )

            [max_score] => 8.7013235
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => article
                            [_type] => _doc
                            [_id] => 850
                            [_score] => 8.7013235
                            [_source] => Array
                                (
                                    [title] => Redis Cluster集群搭建
                                    [category] => nosql笔记
                                    [article_views] => 25
                                    [create_date] => 2020-05-26 12:45:44
                                )

                            [highlight] => Array
                                (
                                    [title] => Array
                                        (
                                            [0] => <span class="red">Redis</span> Cluster<span class="red">集群</span><span class="red">搭建</span>
                                        )

                                )

                        )

                    [1] => Array
                        (
                            [_index] => article
                            [_type] => _doc
                            [_id] => 851
                            [_score] => 3.7347772
                            [_source] => Array
                                (
                                    [title] => Redis Cluster集群新增、删除节点以及重新分配hash slot哈希槽
                                    [category] => nosql笔记
                                    [article_views] => 15
                                    [create_date] => 2020-05-27 15:42:21
                                )

                            [highlight] => Array
                                (
                                    [title] => Array
                                        (
                                            [0] => <span class="red">Redis</span> Cluster<span class="red">集群</span>新增、删除节点以及重新分配hash slot哈希槽
                                        )

                                )

                        )
............

                    [9] => Array
                        (
                            [_index] => article
                            [_type] => _doc
                            [_id] => 601
                            [_score] => 1.3900613
                            [_source] => Array
                                (
                                    [title] => PHP使用Redis的Transaction(事务)命令
                                    [category] => nosql笔记
                                    [article_views] => 463
                                    [create_date] => 2019-01-24 22:34:56
                                )

                            [highlight] => Array
                                (
                                    [title] => Array
                                        (
                                            [0] => PHP使用<span class="red">Redis</span>的Transaction(事务)命令
                                        )

                                )

                        )

                )

        )

    [aggregations] => Array
        (
            [group_category] => Array
                (
                    [doc_count_error_upper_bound] => 0
                    [sum_other_doc_count] => 0
                    [buckets] => Array
                        (
                            [0] => Array
                                (
                                    [key] => nosql笔记
                                    [doc_count] => 11
                                )

                            [1] => Array
                                (
                                    [key] => git笔记
                                    [doc_count] => 1
                                )

                            [2] => Array
                                (
                                    [key] => 消息队列
                                    [doc_count] => 1
                                )

                        )

                )

            [group_create_date] => Array
                (
                    [buckets] => Array
                        (
                            [0] => Array
                                (
                                    [key_as_string] => 2018-12
                                    [key] => 1543622400000
                                    [doc_count] => 2
                                )

                            [1] => Array
                                (
                                    [key_as_string] => 2019-01
                                    [key] => 1546300800000
                                    [doc_count] => 6
                                )

                            [2] => Array
                                (
                                    [key_as_string] => 2019-03
                                    [key] => 1551398400000
                                    [doc_count] => 1
                                )

                            [3] => Array
                                (
                                    [key_as_string] => 2020-05
                                    [key] => 1588291200000
                                    [doc_count] => 4
                                )

                        )

                )

            [interval_article_views] => Array
                (
                    [buckets] => Array
                        (
                            [0] => Array
                                (
                                    [key] => 0
                                    [doc_count] => 9
                                )

                            [1] => Array
                                (
                                    [key] => 500
                                    [doc_count] => 4
                                )

                        )

                )

        )

)

搜索结果页面展示(模版部分略过......)

主要实现了以下功能:

  • ik分词搜索
  • 搜索结果高亮
  • 按分类聚合:搜索命中的数据的分类和对应的数量
  • 按发布时间聚合:搜索命中的数据的发布月份和对应的数量
  • 按浏览量聚合:搜索命中的数据的浏览量区间和对应的数量