4d824a77
tangwang
All tenants share one unified configuration.tena...
# Unified Configuration for Multi-Tenant Search Engine
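# Unified configuration file; all tenants share one configuration
# Note: the index structure is defined in mappings/search_products.json; this file only configures search behavior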
# Elasticsearch Index
es_index_name: "search_products"
# ES Index Settings (basic settings)
es_settings:
number_of_shards: 1
number_of_replicas: 0
refresh_interval: "30s"
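# Field weight configuration (field boosts applied at search time)
# Only weights are configured here, not field structure (which is defined in mappings/search_products.json)
field_boosts:
# Text relevance fields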
"title.zh": 3.0
"brief.zh": 1.5
"description.zh": 1.0
"vendor.zh": 1.5
"title.en": 3.0
"brief.en": 1.5
"description.en": 1.0
"vendor.en": 1.5
# Category-related fields
"category_path.zh": 1.5
"category_name_text.zh": 1.5
"category_path.en": 1.5
"category_name_text.en": 1.5
# Tag and attribute-value fields
tags: 1.0
option1_values: 0.5
option2_values: 0.5
option3_values: 0.5
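# Illustration only (an assumption, not confirmed by this file): if these weights
# are rendered into an Elasticsearch multi_match query, the fields list could use
# the field^boost syntax, for example:
#   fields: ["title.zh^3.0", "brief.zh^1.5", "tags^1.0", "option1_values^0.5"]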
# Query Domains configuration
# Defines different search strategies by specifying which fields are searched together
indexes:
- name: "default"
label: "默认搜索" # "Default search"
fields:
- "title.zh"
- "brief.zh"
- "description.zh"
- "vendor.zh"
- "tags"
- "category_path.zh"
- "category_name_text.zh"
- "option1_values"
boost: 1.0
- name: "title"
label: "标题搜索" # "Title search"
fields:
- "title.zh"
boost: 2.0
- name: "vendor"
label: "品牌搜索" # "Brand search"
fields:
- "vendor.zh"
boost: 1.5
- name: "category"
label: "类目搜索" # "Category search"
fields:
- "category_path.zh"
- "category_name_text.zh"
boost: 1.5
- name: "tags"
label: "标签搜索" # "Tag search"
fields:
- "tags"
boost: 1.0
# Query Configuration
query_config:
# Supported languages
supported_languages:
- "zh"
- "en"
default_language: "en"
# Feature switches (the translation switch is controlled by tenant_config)
enable_text_embedding: true
enable_query_rewrite: true
enable_multilang_search: true # Enable multilingual search (uses translation for cross-lingual retrieval)
# Embedding field names
text_embedding_field: "title_embedding"
image_embedding_field: null
# Embedding disable thresholds (short queries skip vector search)
embedding_disable_thresholds:
chinese_char_limit: 4
english_word_limit: 3
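# Illustration only, assuming each limit is an inclusive upper bound (the exact
# comparison is not stated here): a Chinese query of 4 or fewer characters, or an
# English query of 3 or fewer words, would skip vector search and rely on text
# matching alone; longer queries would also use the embedding.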
# Translation API configuration (provider/URL live in services.translation)
translation_service: "deepl"
translation_api_key: null # Set via environment variable
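# Translation prompt configuration (improves translation quality; passed as the context parameter of the DeepL API)
translation_prompts:
# Prompts for translating product titles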
product_title_zh: "请将原文翻译成中文商品SKU名称,要求:确保精确、完整地传达原文信息的基础上,语言简洁清晰、地道、专业。"
product_title_en: "Translate the original text into an English product SKU name. Requirements: Ensure accurate and complete transmission of the original information, with concise, clear, authentic, and professional language."
# Prompts for translating queries
query_zh: "电商领域"
query_en: "e-commerce domain"
# Default translation prompts
default_zh: "电商领域"
default_en: "e-commerce domain"
# Returned-field configuration (_source includes)
# null returns all fields, [] returns no fields, and a list returns only the listed fields
source_fields: null
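# Illustration only: restricting the response to a few fields could look like the
# commented line below (the field names are examples; check them against
# mappings/search_products.json before use):
#   source_fields: ["spu_id", "title", "tags"]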
# KNN boost configuration (boost value for vector recall)
knn_boost: 0.25 # Lower boost for embedding recall
# Ranking Configuration
ranking:
expression: "bm25() + 0.25*text_embedding_relevance()"
description: "BM25 text relevance combined with semantic embedding similarity"
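# Worked example of the expression above, with made-up scores for one document:
# bm25() = 12.0 and text_embedding_relevance() = 0.8 give
#   12.0 + 0.25 * 0.8 = 12.2
# (the value ranges of both functions are assumptions; this file does not state them).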
# Function Score configuration (ES-level scoring rules)
function_score:
score_mode: "sum"
boost_mode: "multiply"
functions: []
# Rerank configuration (provider/URL live in services.rerank)
rerank:
rerank_window: 1000
timeout_sec: 15.0
weight_es: 0.4
weight_ai: 0.6
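# Illustration only, assuming the two weights form a linear blend of normalized
# scores (the fusion formula is not stated in this file):
#   final = weight_es * es_score + weight_ai * ai_score
# For es_score = 0.5 and ai_score = 0.9: 0.4 * 0.5 + 0.6 * 0.9 = 0.74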
rerank_query_template: "{query}"
rerank_doc_template: "{title}"
# Extensible service/provider registry (single source of configuration)
services:
translation:
provider: "direct" # direct | http | google(reserved)
base_url: "http://127.0.0.1:6006"
model: "qwen"
timeout_sec: 10.0
providers:
direct:
model: "qwen"
http:
base_url: "http://127.0.0.1:6006"
model: "qwen"
timeout_sec: 10.0
google:
enabled: false
project_id: ""
location: "global"
model: ""
embedding:
provider: "http" # http
base_url: "http://127.0.0.1:6005"
providers:
http:
base_url: "http://127.0.0.1:6005"
# In-service text backend (read when the embedding process starts)
backend: "tei" # tei | local_st
backends:
tei:
base_url: "http://127.0.0.1:8080"
timeout_sec: 60
model_id: "Qwen/Qwen3-Embedding-0.6B"
local_st:
model_id: "Qwen/Qwen3-Embedding-0.6B"
device: "cuda"
batch_size: 32
normalize_embeddings: true
rerank:
provider: "http"
base_url: "http://127.0.0.1:6007"
providers:
http:
base_url: "http://127.0.0.1:6007"
service_url: "http://127.0.0.1:6007/rerank"
# In-service backend (read when the reranker process starts)
backend: "qwen3_vllm" # bge | qwen3_vllm
backends:
bge:
model_name: "BAAI/bge-reranker-v2-m3"
device: null
use_fp16: true
batch_size: 64
max_length: 512
cache_dir: "./model_cache"
enable_warmup: true
qwen3_vllm:
model_name: "Qwen/Qwen3-Reranker-0.6B"
engine: "vllm"
max_model_len: 256
tensor_parallel_size: 1
gpu_memory_utilization: 0.36
dtype: "float16"
enable_prefix_caching: true
enforce_eager: false
instruction: "Given a web search query, retrieve relevant passages that answer the query"
# SPU configuration (enabled, uses nested skus)
spu_config:
enabled: true
spu_field: "spu_id"
inner_hits_size: 10
# Which option dimensions participate in retrieval (both indexing and online search)
# A list containing one or more of option1/option2/option3
searchable_option_dimensions: ['option1', 'option2', 'option3']
# Tenant Configuration
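# Each tenant can configure a primary language (primary_language) and index languages (index_languages: the main market languages, selectable by the merchant)
# Default index_languages: [en, zh]; may be set to any subset of SUPPORTED_INDEX_LANGUAGES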
tenant_config:
default:
primary_language: "en"
index_languages: ["en", "zh"]
tenants:
"1":
primary_language: "zh"
index_languages: ["zh", "en"]
"2":
primary_language: "en"
index_languages: ["en", "zh"]
"3":
primary_language: "zh"
index_languages: ["zh", "en"]
"162":
primary_language: "zh"
index_languages: ["zh", "en"]
"170":
primary_language: "en"
index_languages: ["en", "zh"]
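# Illustration only: an override for a further tenant would follow the same shape.
# The tenant ID "999" and the choice of languages are hypothetical:
#   "999":
#     primary_language: "en"
#     index_languages: ["en"]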