docs/ES/ES_8.18/3.1_hanlp安装.md 5.12 KB
648cb4c2   tangwang   ES docs
  

  

  

  TODO:

  The error is again: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/data/model/perceptron/large/cws.bin" "read")

  

  I. Environment:

  [root@192 elasticsearch]# bin/elasticsearch --version

  Version: 7.10.2, Build: default/rpm/747e1cc71def077253878a59143c1f785afa92b9/2021-01-13T00:42:12.435326Z, JVM: 15.0.1

  [root@192 ~]# javac -version

  javac 1.8.0_362

  

  II. Additional entries in the JDK java.policy:

  [root@192 elasticsearch]# tail /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.362.b08-3.el8.x86_64/jre/lib/security/java.policy

          permission java.util.PropertyPermission "java.vm.specification.name", "read";

          permission java.util.PropertyPermission "java.vm.version", "read";

          permission java.util.PropertyPermission "java.vm.vendor", "read";

          permission java.util.PropertyPermission "java.vm.name", "read";

  

          permission java.util.PropertyPermission "sun.security.pkcs11.disableKeyExtraction", "read";

          permission java.net.SocketPermission "*", "connect,resolve";

          permission java.io.FilePermission "-","read,write,delete";

  };

  

  III. Install the plugin:

  cd /usr/share/elasticsearch/

  bin/elasticsearch-plugin install file:///ssd/samba_root1/projects/hanLP/elasticsearch-analysis-hanlp-7.10.2.zip 

  

  chown -R elasticsearch:elasticsearch plugins/

  

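  IV. Configure the plugin's plugin-security.policy

  The policy already grants permissions under plugins/analysis-hanlp, but the error message refers to elasticsearch/data/model rather than elasticsearch/plugins/analysis-hanlp/data/model, so I also added read/write permission for the parent directory.

  vi plugins/analysis-hanlp/plugin-security.policy

  Add: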

    permission java.io.FilePermission "-", "read,write,delete";

  

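  V. At this point the error says the files under model do not exist, because the model directory is empty:

  1) The data/model directory actually being used is not the one under the plugin; it is a data directory at the same level as plugins. So I copy data to that level as well, so that both paths resolve.

  2) The model directory contains only a README, so I take the full data package downloaded from the HanLP website and copy the contents of its model directory over.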

  

  ll plugins/analysis-hanlp/data/model/perceptron/large/cws.bin

  

  Copy data to the parent level:

  cd /usr/share/elasticsearch

  cp -r /ssd/samba_root1/projects/hanLP/data/model/* plugins/analysis-hanlp/data/model/

  chown -R elasticsearch:elasticsearch plugins/

  cp -r /ssd/samba_root1/projects/hanLP/data . 

  chown -R elasticsearch:elasticsearch data/

  

  Make sure both paths are accessible:

  ll /usr/share/elasticsearch/data/model/perceptron/large/cws.bin

  ll /usr/share/elasticsearch/plugins/analysis-hanlp/data/model/perceptron/large/cws.bin
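
  The two `ll` checks above can be wrapped in a small helper so they are easy to repeat for other model files. This is only a sketch: `check_model_paths` is a hypothetical name, and the two directory layouts are the ones described in these notes.

```shell
#!/usr/bin/env bash
# Sketch: report whether one model file is readable under both data directories.
# check_model_paths is a hypothetical helper, not part of Elasticsearch or HanLP.
check_model_paths() {
    local es_home="$1"    # e.g. /usr/share/elasticsearch
    local rel_path="$2"   # e.g. model/perceptron/large/cws.bin
    local missing=0
    local dir
    for dir in "$es_home/data" "$es_home/plugins/analysis-hanlp/data"; do
        if [ -r "$dir/$rel_path" ]; then
            echo "OK       $dir/$rel_path"
        else
            echo "MISSING  $dir/$rel_path"
            missing=1
        fi
    done
    # Non-zero exit status if at least one copy is missing or unreadable.
    return "$missing"
}
```

  Example: `check_model_paths /usr/share/elasticsearch model/perceptron/large/cws.bin`. Note that this only tests filesystem access for the user running it, not what the Elasticsearch process is allowed to read.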

  

  VI. The same error is still reported:

  java.security.AccessControlException: access denied ("java.io.FilePermission" "/usr/share/elasticsearch/data/model/perceptron/large/cws.bin" "read")

  But this path does exist:

  [root@192 elasticsearch]# l /usr/share/elasticsearch/data/model/perceptron/large/cws.bin

  -rw-r--r-- 1 elasticsearch elasticsearch 278038786 3月   9 19:44 /usr/share/elasticsearch/data/model/perceptron/large/cws.bin
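
  Because `AccessControlException` is raised by the Java security manager rather than by the filesystem, the `ls` output above cannot confirm whether the grant is in effect. As a minimal sketch (the helper name `list_file_permissions` is hypothetical), the function below prints the `java.io.FilePermission` lines found in each policy file passed to it, which makes it easier to compare what the plugin policy and the JDK policy actually grant. Since the version output in section I reports JVM 15.0.1 while the edited java.policy belongs to the 1.8.0 JDK, it may also be worth checking which JVM Elasticsearch is really running on.

```shell
#!/usr/bin/env bash
# Sketch: list FilePermission grants in one or more Java policy files.
# list_file_permissions is a hypothetical helper name.
list_file_permissions() {
    local policy
    for policy in "$@"; do
        if [ -f "$policy" ]; then
            # -H prefixes each match with the file name, -n with the line number.
            grep -Hn 'java\.io\.FilePermission' "$policy" \
                || echo "$policy: no FilePermission entries"
        else
            echo "$policy: not found"
        fi
    done
}
```

  Example: `list_file_permissions plugins/analysis-hanlp/plugin-security.policy /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.362.b08-3.el8.x86_64/jre/lib/security/java.policy`.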

  

  

  

  

  

  Usage reference:

  https://www.bilibili.com/read/cv22886449/

  

  I. Install the Python library

  pip install hanlp -i https://pypi.tuna.tsinghua.edu.cn/simple

  pip install --upgrade "hanlp[full]" -i https://pypi.tuna.tsinghua.edu.cn/simple

  

  # Environment setup:

  # pip install hanlp -i https://pypi.tuna.tsinghua.edu.cn/simple

  # # Install the full version; note that in zsh the square brackets [] must be wrapped in quotes

  # pip install --upgrade "hanlp[full]" -i https://pypi.tuna.tsinghua.edu.cn/simple

  # References

  # https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_stl.ipynb

  # https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_mtl.ipynb

  

  (There is also a pyhanlp package, but the official site uses hanlp rather than pyhanlp, so ignore it for now.)

  

  

  

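  II. Install the ES plugin:

  1. ES version compatibility

  The hanlp plugin only supports versions up to 7.10.2, so ES has to be downgraded to 7.10.2.

  Note: before downgrading, delete all existing data (deleting only the paths under the data directory was not enough). After deleting, run yum remove elastics*, then reinstall. Otherwise the downgraded ES cannot read the newer data format and fails to start.

  (Before removing, back up /etc/elasticsearch/elasticsearch.yml; otherwise the changes have to be made all over again.)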
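
  One way to take that backup before running yum remove is sketched below. The helper name `backup_config` and the timestamped `.bak` naming are assumptions made for illustration; a plain `cp` works just as well.

```shell
#!/usr/bin/env bash
# Sketch: copy a config file into a backup directory under a timestamped name.
# backup_config is a hypothetical helper name.
backup_config() {
    local src="$1"        # e.g. /etc/elasticsearch/elasticsearch.yml
    local dest_dir="$2"   # any directory that survives the package removal
    local dest
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    # -p preserves mode and timestamps; print the backup path on success.
    mkdir -p "$dest_dir" && cp -p "$src" "$dest" && echo "$dest"
}
```

  Example: `backup_config /etc/elasticsearch/elasticsearch.yml /root/es-backup`, where `/root/es-backup` is just a placeholder directory.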

  

  Once ES is up, the elasticsearch package installed via pip must be changed to a matching version as well:

  pip install elasticsearch==7.10.1

  

  2. Install the plugin

  1) Download

  hanlp:

  https://github.com/KennFalcon/elasticsearch-analysis-hanlp

  Pick the latest release here:

  https://github.com/KennFalcon/elasticsearch-analysis-hanlp/releases

  

  2) Install the plugin. Note that the file path must be prefixed with file://

  cd /usr/share/elasticsearch

  bin/elasticsearch-plugin install file:///ssd/samba_root1/projects/hanLP/elasticsearch-analysis-hanlp-7.10.2.zip 

  

  To remove the plugin:

  bin/elasticsearch-plugin remove analysis-hanlp

  

  3) Install the data package

  https://github.com/hankcs/HanLP/releases?page=1

  From May 2020 through the latest release, every release lists the same data package: compatible data package data-for-1.7.5.zip md5=1d9e1be4378b2dbc635858d9c3517aaa
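
  Since the release page publishes an md5 for the data package, the download can be checked before unzipping. The following is a minimal sketch using `md5sum`; the helper name `verify_md5` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's md5 checksum against an expected value.
# verify_md5 is a hypothetical helper name.
verify_md5() {
    local file="$1"
    local expected="$2"
    local actual
    # md5sum prints "<hash>  <file>"; keep only the hash.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "md5 OK: $file"
    else
        echo "md5 MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

  Example: `verify_md5 data-for-1.7.5.zip 1d9e1be4378b2dbc635858d9c3517aaa`.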

  

  After unzipping, back up the old data directory and copy the new one in:

  

  cd /usr/share/elasticsearch/plugins/analysis-hanlp

  mv data data__bak_replace_by_full_data

  cp -rf /ssd/samba_root1/projects/hanLP/data .