# Embedding Module and API Documentation

This document describes the architecture, API endpoints, configuration, and usage of the embedding (vectorization) module in the SearchEngine project.

## Table of Contents

1. [Overview](#overview)
   - 1.1 [Introduction to the Embedding Module](#11-introduction-to-the-embedding-module)
   - 1.2 [Technology Choices](#12-technology-choices)
   - 1.3 [Use Cases](#13-use-cases)

2. [Embedding Service Architecture](#embedding-service-architecture)
   - 2.1 [Local Embedding Service](#21-local-embedding-service)
   - 2.2 [Cloud Embedding Service](#22-cloud-embedding-service)
   - 2.3 [Architecture Comparison](#23-architecture-comparison)

3. [Local Embedding Service](#local-embedding-service)
   - 3.1 [Starting the Service](#31-starting-the-service)
   - 3.2 [Service Configuration](#32-service-configuration)
   - 3.3 [Model Details](#33-model-details)

4. [Cloud Embedding Service](#cloud-embedding-service)
   - 4.1 [Alibaba Cloud DashScope](#41-alibaba-cloud-dashscope)
   - 4.2 [API Key Configuration](#42-api-key-configuration)
   - 4.3 [Usage](#43-usage)

5. [Embedding API Reference](#embedding-api-reference)
   - 5.1 [API Overview](#51-api-overview)
   - 5.2 [Health Check Endpoint](#52-health-check-endpoint)
   - 5.3 [Text Embedding Endpoint](#53-text-embedding-endpoint)
   - 5.4 [Image Embedding Endpoint](#54-image-embedding-endpoint)
   - 5.5 [Error Handling](#55-error-handling)

6. [Configuration](#configuration)
   - 6.1 [Service Configuration](#61-service-configuration)
   - 6.2 [Model Configuration](#62-model-configuration)
   - 6.3 [Batch Configuration](#63-batch-configuration)

7. [Client Integration Examples](#client-integration-examples)
   - 7.1 [Python Client](#71-python-client)
   - 7.2 [Java Client](#72-java-client)
   - 7.3 [cURL Examples](#73-curl-examples)

8. [Performance Comparison and Optimization](#性能对比与优化)
   - 8.1 [Performance Comparison](#81-性能对比)
   - 8.2 [Cost Comparison](#82-成本对比)
   - 8.3 [Optimization Recommendations](#83-优化建议)

9. [Troubleshooting](#故障排查)
   - 9.1 [Common Issues](#91-常见问题)
   - 9.2 [Viewing Logs](#92-日志查看)
   - 9.3 [Performance Tuning](#93-性能调优)

10. [Appendix](#附录)
    - 10.1 [Vector Dimensions](#101-向量维度说明)
    - 10.2 [Model Version Information](#102-模型版本信息)
    - 10.3 [Related Documents](#103-相关文档)

---
|
## Overview

### 1.1 Introduction to the Embedding Module

The SearchEngine project provides text and image embedding, with two deployment options:

1. **Local embedding service**: a standalone microservice that runs the BGE-M3 and CN-CLIP models on a local GPU or CPU
2. **Cloud embedding service**: an integration with the Alibaba Cloud DashScope API, billed by usage

The embedding module is a core component of the search engine: it produces the vectors on which semantic search and image search compute similarity.

### 1.2 Technology Choices

| Feature | Local service | Cloud service |
|---------|---------------|---------------|
| **Text model** | BGE-M3 (Xorbits/bge-m3) | text-embedding-v4 |
| **Image model** | CN-CLIP (ViT-H-14) | - |
| **Vector dimension** | 1024 | 1024 |
| **Service framework** | FastAPI | Alibaba Cloud API |
| **Deployment** | Docker / local | Cloud API |

### 1.3 Use Cases

- **Semantic search**: embed the query text and compute its similarity to product vectors
- **Image search**: embed product images to support search by image
- **Hybrid retrieval**: rank by a combination of BM25 and vector similarity
- **Multilingual search**: cross-lingual semantic matching between Chinese and English
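
The hybrid retrieval case can be illustrated with a small score-fusion sketch. This document does not say how SearchEngine actually combines BM25 with vector similarity, so the min-max normalization, the `alpha` weight, and the function names below are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight given to the BM25 side
) -> List[Tuple[str, float]]:
    """Combine normalized BM25 and vector scores and sort best-first."""
    bm25 = min_max_normalize(bm25_scores)
    vec = min_max_normalize(vector_scores)
    doc_ids = set(bm25) | set(vec)
    combined = {
        doc_id: alpha * bm25.get(doc_id, 0.0) + (1.0 - alpha) * vec.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranked = hybrid_rank(
    {"a": 10.0, "b": 5.0, "c": 0.0},   # BM25 scores (unbounded)
    {"a": 0.2, "b": 0.9, "c": 0.1},    # vector similarities
)
print([doc_id for doc_id, _ in ranked])
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities lie in a fixed range; without it, one signal would dominate the weighted sum.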
|
---

## Embedding Service Architecture

### 2.1 Local Embedding Service

```
┌─────────────────────────────────────────┐
│  Embedding Microservice (FastAPI)       │
│  Port: 6005, Workers: 1                 │
└──────────────┬──────────────────────────┘
               │
       ┌───────┴───────┐
       │               │
┌──────▼──────┐   ┌────▼─────┐
│ BGE-M3      │   │ CN-CLIP  │
│ Text Model  │   │ Image    │
│ (CUDA/CPU)  │   │ Model    │
└─────────────┘   └──────────┘
```

**Key features**:
- Deployed independently and can scale horizontally
- GPU acceleration
- Thread-safe design
- Models are preloaded at startup

### 2.2 Cloud Embedding Service

```
┌─────────────────────────────────────┐
│   SearchEngine Main Service         │
│   (uses CloudTextEncoder)           │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   Aliyun DashScope API              │
│   text-embedding-v4                 │
│   (HTTP/REST)                       │
└─────────────────────────────────────┘
```

**Key features**:
- No GPU resources required
- Pay-per-use billing
- Automatic scaling
- Low operational overhead

### 2.3 Architecture Comparison

| Dimension | Local service | Cloud service |
|-----------|---------------|---------------|
| **Upfront cost** | High (GPU server) | Low (pay as you go) |
| **Running cost** | Fixed | Variable (by call volume) |
| **Latency** | <100 ms | 300-400 ms |
| **Throughput** | High (~32 qps) | Medium (~2-3 qps) |
| **Offline support** | ✅ | ❌ |
| **Maintenance cost** | High | Low |
| **Scalability** | Manual scaling | Automatic scaling |
| **Best suited for** | Large-scale production | Early development / small-scale applications |
|
---

## Local Embedding Service

### 3.1 Starting the Service

#### Option 1: Start with the script (recommended)

```bash
# Start the embedding service
./scripts/start_embedding_service.sh
```

The script:
- Activates the conda environment automatically
- Reads the port from the configuration file
- Starts the service in single-worker mode

#### Option 2: Start manually

```bash
# Activate the environment
source /home/tw/miniconda3/etc/profile.d/conda.sh
conda activate searchengine

# Start the service
python -m uvicorn embeddings.server:app \
  --host 0.0.0.0 \
  --port 6005 \
  --workers 1
```

#### Option 3: Docker deployment (production)

```bash
# Build the image
docker build -t searchengine-embedding:latest .

# Start the container
docker run -d \
  --name embedding-service \
  --gpus all \
  -p 6005:6005 \
  searchengine-embedding:latest
```

### 3.2 Service Configuration

Configuration file: `embeddings/config.py`

```python
class EmbeddingConfig:
    # Service settings
    HOST = "0.0.0.0"                    # Listen address
    PORT = 6005                         # Listen port

    # Text model (BGE-M3)
    TEXT_MODEL_DIR = "Xorbits/bge-m3"   # Model path or HuggingFace ID
    TEXT_DEVICE = "cuda"                # Device: "cuda" or "cpu"
    TEXT_BATCH_SIZE = 32                # Batch size

    # Image model (CN-CLIP)
    IMAGE_MODEL_NAME = "ViT-H-14"       # Model name
    IMAGE_DEVICE = None                 # None = auto-detect, or "cuda" / "cpu"
    IMAGE_BATCH_SIZE = 8                # Batch size
```
|
### 3.3 Model Details

#### BGE-M3 Text Model

- **Model ID**: `Xorbits/bge-m3`
- **Vector dimension**: 1024
- **Languages**: Chinese, English, and 100+ other languages
- **Characteristics**: strong semantic understanding; supports long text
- **Deployment**: downloaded automatically from HuggingFace

#### CN-CLIP Image Model

- **Model**: ViT-H-14 (Chinese CLIP)
- **Vector dimension**: 1024
- **Input**: image URL or local path
- **Characteristics**: Chinese image-text understanding, well suited to e-commerce
- **Preprocessing**: automatic download, resizing, and normalization

---

## Cloud Embedding Service

### 4.1 Alibaba Cloud DashScope

**Service endpoints**:
- Beijing region: `https://dashscope.aliyuncs.com/compatible-mode/v1`
- Singapore region: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`

**Model information**:
- **Model name**: `text-embedding-v4`
- **Vector dimension**: 1024
- **Input limits**: up to 2048 texts per request, up to 8192 tokens per text
- **Rate limits**: depend on the API plan
|
### 4.2 API Key Configuration

#### Option 1: Environment variable (recommended)

```bash
# Set for the current shell session
export DASHSCOPE_API_KEY="sk-your-api-key-here"

# Set permanently (append to ~/.bashrc or ~/.zshrc)
echo 'export DASHSCOPE_API_KEY="sk-your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
```

#### Option 2: .env file

Create a `.env` file in the project root:

```bash
DASHSCOPE_API_KEY=sk-your-api-key-here
```

**Getting an API key**: https://help.aliyun.com/zh/model-studio/get-api-key

### 4.3 Usage

```python
from embeddings.cloud_text_encoder import CloudTextEncoder

# Initialize the encoder (the API key is read from the environment)
encoder = CloudTextEncoder()

# Embed a single text
text = "衣服的质量杠杠的"  # "The quality of the clothes is excellent"
embedding = encoder.encode(text)
print(embedding.shape)  # (1, 1024)

# Embed a batch of texts
texts = ["文本1", "文本2", "文本3"]
embeddings = encoder.encode(texts)
print(embeddings.shape)  # (3, 1024)

# Large batches (split into sub-batches automatically)
large_texts = [f"商品 {i}" for i in range(1000)]
embeddings = encoder.encode_batch(large_texts, batch_size=32)
```
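
The internals of `encode_batch` are not shown in this document. As a rough sketch of what automatic batching usually amounts to, the helper below (`iter_batches`, a hypothetical name) slices a list into consecutive sub-batches of at most `batch_size` items; each sub-batch would then be sent as one request.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


large_texts = [f"商品 {i}" for i in range(1000)]
batches = list(iter_batches(large_texts, batch_size=32))
print(len(batches), len(batches[-1]))  # 32 batches; the last one holds 8 texts
```

Because the slices are taken in order, concatenating the per-batch results restores the original input order.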
|
**Custom configuration**:

```python
# Use the Singapore region
encoder = CloudTextEncoder(
    api_key="sk-xxx",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)
```

---

## Embedding API Reference

### 5.1 API Overview

The local embedding service exposes a RESTful API:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/embed/text` | POST | Text embedding |
| `/embed/image` | POST | Image embedding |

**Service address**:
- Default: `http://localhost:6005`
- Production: `http://<your-server>:6005`

### 5.2 Health Check Endpoint

```http
GET /health
```

**Example response**:
```json
{
  "status": "ok",
  "text_model_loaded": true,
  "image_model_loaded": true
}
```

**Fields**:
- `status`: service status; `"ok"` means healthy
- `text_model_loaded`: whether the text model loaded successfully
- `image_model_loaded`: whether the image model loaded successfully

**cURL example**:
```bash
curl http://localhost:6005/health
```
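
Because the models are loaded at startup, a caller may want to wait until both flags are true before sending traffic. The following is a minimal sketch of such a readiness poll; `wait_until_ready`, `is_ready`, and their parameters are hypothetical names, and the HTTP call is injected as a function so the logic can be exercised without a running service.

```python
import time
from typing import Callable, Dict


def is_ready(health: Dict[str, object]) -> bool:
    """True only when the status is "ok" and both models report loaded."""
    return (
        health.get("status") == "ok"
        and health.get("text_model_loaded") is True
        and health.get("image_model_loaded") is True
    )


def wait_until_ready(
    fetch_health: Callable[[], Dict[str, object]],
    attempts: int = 30,          # hypothetical default
    delay_seconds: float = 2.0,  # hypothetical default
) -> bool:
    """Poll `fetch_health` until the service is ready or attempts run out."""
    for attempt in range(attempts):
        try:
            if is_ready(fetch_health()):
                return True
        except (OSError, ValueError):
            pass  # connection refused or malformed JSON: not ready yet
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

With the standard library, `fetch_health` could be something like `lambda: json.load(urllib.request.urlopen("http://localhost:6005/health", timeout=5))`.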
|
### 5.3 Text Embedding Endpoint

```http
POST /embed/text
Content-Type: application/json
```

#### Request Format

**Request body** (JSON array):
```json
[
  "衣服的质量杠杠的",
  "Bohemian Maxi Dress",
  "Vintage Denim Jacket"
]
```

**Parameters**:
- Type: `List[str]`
- Length: ≤100 recommended (to avoid timeouts)
- Single text: ≤512 characters recommended

#### Response Format

**Success response** (200 OK):
```json
[
  [0.1234, -0.5678, 0.9012, ..., 0.3456],  // 1024-dimensional vector
  [0.2345, 0.6789, -0.1234, ..., 0.4567],  // 1024-dimensional vector
  [0.3456, -0.7890, 0.2345, ..., 0.5678]   // 1024-dimensional vector
]
```

**Fields**:
- Type: `List[List[float]]`
- Each vector: 1024 floats
- Alignment: the output array corresponds to the input array index by index
- Failed items: returned as `null`

**Example with a failed item**:
```json
[
  [0.1234, -0.5678, ...],  // success
  null,                    // failure (empty text or other error)
  [0.3456, 0.7890, ...]    // success
]
```
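
Since the response is aligned with the request by index and failed items come back as `null`, a client can separate successes from failures without losing track of which input each vector belongs to. The helper below is a sketch of that bookkeeping; `pair_results` is a hypothetical name, not part of the service.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def pair_results(
    texts: Sequence[str],
    embeddings: Sequence[Optional[Vector]],
) -> Tuple[Dict[int, Vector], List[int]]:
    """Return ({input index: vector}, [indices of failed inputs])."""
    if len(texts) != len(embeddings):
        raise ValueError("response length does not match request length")
    succeeded: Dict[int, Vector] = {}
    failed: List[int] = []
    for index, vector in enumerate(embeddings):
        if vector is None:
            failed.append(index)   # JSON null is decoded as None
        else:
            succeeded[index] = vector
    return succeeded, failed


texts = ["红色连衣裙", "", "blue jeans"]
response = [[0.1, 0.2], None, [0.3, 0.4]]  # shortened vectors for illustration
succeeded, failed = pair_results(texts, response)
print(sorted(succeeded), failed)  # [0, 2] [1]
```

Keeping the original indices lets the caller retry or log exactly the inputs that failed.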
|
#### cURL Examples

```bash
# Single text
curl -X POST http://localhost:6005/embed/text \
  -H "Content-Type: application/json" \
  -d '["测试查询文本"]'

# Batch of texts
curl -X POST http://localhost:6005/embed/text \
  -H "Content-Type: application/json" \
  -d '["红色连衣裙", "blue jeans", "vintage dress"]'
```

#### Python Example

```python
import requests
import numpy as np

def embed_texts(texts):
    """Embed a list of texts via the local service."""
    response = requests.post(
        "http://localhost:6005/embed/text",
        json=texts,
        timeout=30
    )
    response.raise_for_status()
    embeddings = response.json()

    # Drop failed items (null) and convert to a NumPy array.
    # Note: dropping items breaks the index alignment with `texts`.
    valid_embeddings = [e for e in embeddings if e is not None]
    return np.array(valid_embeddings)

# Usage
texts = ["红色连衣裙", "blue jeans"]  # "red dress", "blue jeans"
embeddings = embed_texts(texts)
print(f"Shape: {embeddings.shape}")  # (2, 1024) when both texts succeed

# Similarity as a dot product (equals cosine similarity only for L2-normalized vectors)
similarity = np.dot(embeddings[0], embeddings[1])
print(f"Similarity: {similarity}")
```
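
The dot product in the example above matches cosine similarity only when the vectors have unit length, and this document does not state whether the service normalizes its output. A dependency-free sketch that divides by the norms explicitly is shown below; the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # parallel vectors
```

If the vectors are already normalized, the result is the same as the plain dot product, so using this form is safe either way.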
|
### 5.4 Image Embedding Endpoint

```http
POST /embed/image
Content-Type: application/json
```

#### Request Format

**Request body** (JSON array):
```json
[
  "https://example.com/product1.jpg",
  "https://example.com/product2.png",
  "/local/path/to/product3.jpg"
]
```

**Parameters**:
- Type: `List[str]`
- Accepted values: HTTP URLs or local file paths
- Formats: JPG, PNG, and other common image formats
- Length: ≤10 recommended (image processing is slow)

#### Response Format

**Success response** (200 OK):
```json
[
  [0.1234, 0.5678, 0.9012, ..., 0.3456],  // 1024-dimensional vector
  null,                                   // failure (invalid image or download failed)
  [0.3456, 0.7890, 0.2345, ..., 0.5678]   // 1024-dimensional vector
]
```

**Behavior**:
- Automatic download: images given as HTTP URLs are downloaded automatically
- One at a time: images are processed serially (a lock ensures thread safety)
- Fault tolerance: a single failure does not affect the other images
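
Because the endpoint accepts both HTTP URLs and local file paths in the same array, a client may find it useful to know which entries rely on network access and which rely on a file being present where the service runs. The sketch below separates the two on the client side; `split_sources` is a hypothetical helper, and how the service itself tells them apart is not described in this document.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def split_sources(sources: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split image sources into (http/https URLs, everything else treated as paths)."""
    urls: List[str] = []
    paths: List[str] = []
    for source in sources:
        scheme = urlparse(source).scheme.lower()
        if scheme in ("http", "https"):
            urls.append(source)
        else:
            paths.append(source)
    return urls, paths


urls, paths = split_sources([
    "https://example.com/img1.jpg",
    "/data/images/img2.png",
])
print(urls)   # ['https://example.com/img1.jpg']
print(paths)  # ['/data/images/img2.png']
```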
|
#### cURL Examples

```bash
# Single image (URL)
curl -X POST http://localhost:6005/embed/image \
  -H "Content-Type: application/json" \
  -d '["https://example.com/product.jpg"]'

# Multiple images (mixing URLs and local paths)
curl -X POST http://localhost:6005/embed/image \
  -H "Content-Type: application/json" \
  -d '["https://example.com/img1.jpg", "/data/images/img2.png"]'
```

#### Python Example

```python
import requests

def embed_images(image_urls):
    """Embed a list of images via the local service."""
    response = requests.post(
        "http://localhost:6005/embed/image",
        json=image_urls,
        timeout=120  # image processing is slow, so allow a longer timeout
    )
    response.raise_for_status()
    embeddings = response.json()

    # Keep only the images that were embedded successfully, paired with their URL
    valid_embeddings = [(url, emb) for url, emb in zip(image_urls, embeddings) if emb is not None]
    return valid_embeddings

# Usage
image_urls = [
    "https://example.com/dress1.jpg",
    "https://example.com/dress2.jpg"
]

results = embed_images(image_urls)
for url, embedding in results:
    print(f"{url}: {len(embedding)} dimensions")
```
|
### 5.5 Error Handling

#### HTTP Status Codes

| Status code | Meaning | Action |
|-------------|---------|--------|
| 200 | Success | Process the response normally |
| 500 | Server error | Check the service logs |
| 503 | Service unavailable | Model not loaded; check the startup logs |

#### Common Error Scenarios

1. **Model not loaded**
```json
{
  "detail": "Runtime Error: Text model not loaded"
}
```
**Fix**: check the service startup logs and confirm that the model loaded successfully

2. **Invalid input**
```json
[null, null]
```
**Cause**: the input contains empty strings or None

3. **Image download failed**
```json
[
  [0.123, ...],
  null  // invalid URL or network problem
]
```
**Fix**: check that the URL is reachable
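
One way a client might act on the status codes above is to retry only the transient case. The sketch below treats 503 as retryable with exponential backoff and everything else as fatal; that policy, the delays, and the `call_with_retry` name are assumptions rather than behavior defined by the service. The request itself is passed in as a function returning `(status_code, body)`.

```python
import time
from typing import Callable, Tuple

# Assumption: 503 (model not loaded yet) is transient; 500 is treated as fatal.
RETRYABLE_STATUS = {503}


def call_with_retry(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 4,      # hypothetical default
    base_delay: float = 0.5,    # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call `send` until it returns HTTP 200, backing off on retryable statuses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            raise RuntimeError(f"embedding request failed with HTTP {status}: {body}")
        sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
        attempt += 1
```

Note that a 200 response can still contain `null` items; those are per-item failures and need separate handling.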
|
---

## Configuration

### 6.1 Service Configuration

Edit `embeddings/config.py` to change the service settings:

```python
class EmbeddingConfig:
    # ========== Service settings ==========
    HOST = "0.0.0.0"  # Listen on all network interfaces
    PORT = 6005       # Default port
```

**Production recommendations**:
- Put a reverse proxy (Nginx) in front of the service to handle SSL
- Configure firewall rules to restrict access
- Isolate the service in a Docker container

### 6.2 Model Configuration

#### Text Model Configuration

```python
# ========== BGE-M3 text model ==========
TEXT_MODEL_DIR = "Xorbits/bge-m3"  # HuggingFace model ID
TEXT_DEVICE = "cuda"               # Device
TEXT_BATCH_SIZE = 32               # Batch size
```

**TEXT_DEVICE options**:
- `"cuda"`: GPU acceleration (recommended; requires CUDA)
- `"cpu"`: CPU mode (slower, but broadly compatible)

**Recommended batch sizes**:
- GPU (16 GB VRAM): 32-64
- GPU (8 GB VRAM): 16-32
- CPU: 8-16

#### Image Model Configuration

```python
# ========== CN-CLIP image model ==========
IMAGE_MODEL_NAME = "ViT-H-14"  # Model name
IMAGE_DEVICE = None            # None = auto-detect
IMAGE_BATCH_SIZE = 8           # Batch size
```

**IMAGE_DEVICE options**:
- `None`: auto-detect (recommended)
- `"cuda"`: force GPU
- `"cpu"`: force CPU
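
The three device options can be read as a small decision rule. The sketch below spells out one plausible interpretation; the actual resolution logic in the service is not shown in this document, and `resolve_device` is a hypothetical helper. CUDA availability is passed in as a flag so the rule can be checked without a GPU.

```python
from typing import Optional


def resolve_device(configured: Optional[str], cuda_available: bool) -> str:
    """Map a configured device value (None, "cuda", "cpu") to the device to use."""
    if configured is None:
        # Auto-detect: prefer the GPU when one is available
        return "cuda" if cuda_available else "cpu"
    if configured not in ("cuda", "cpu"):
        raise ValueError(f"unsupported device: {configured!r}")
    if configured == "cuda" and not cuda_available:
        # Assumption: a forced GPU setting fails loudly instead of falling back
        raise RuntimeError("device 'cuda' was requested but CUDA is not available")
    return configured


print(resolve_device(None, cuda_available=False))   # cpu
print(resolve_device("cpu", cuda_available=True))   # cpu
```

With PyTorch installed, the `cuda_available` flag would typically come from `torch.cuda.is_available()`.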
|
### 6.3 Batch Configuration

**Batch size tuning**:

| Scenario | Text batch size | Image batch size | Notes |
|----------|-----------------|------------------|-------|
| Development / testing | 16 | 1 | Fast response |
| Production (GPU) | 32-64 | 4-8 | Balanced performance |
| Production (CPU) | 8-16 | 1-2 | Avoids running out of memory |
| Offline batch processing | 128+ | 16+ | Maximizes throughput |

**Batch tuning steps**:
1. Monitor GPU memory usage with `nvidia-smi`
2. Increase batch_size step by step until you hit an out-of-memory (OOM) error
3. Settle on a smaller value that leaves about 20% of memory free
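
The tuning steps can be approximated in code by doubling the batch size until a trial run fails and then backing off. In the sketch below, `fits` stands for any hypothetical probe that runs one batch of the given size and returns False on OOM; scaling the batch size by 0.8 is only a rough proxy for 20% memory headroom, since memory use is not strictly proportional to batch size.

```python
from typing import Callable


def find_batch_size(
    fits: Callable[[int], bool],
    start: int = 8,
    limit: int = 1024,       # hypothetical upper bound for the search
    headroom: float = 0.2,   # fraction to back off from the largest working size
) -> int:
    """Double the batch size while `fits` succeeds, then back off by `headroom`."""
    if not fits(start):
        raise RuntimeError(f"even batch_size={start} does not fit")
    largest_ok = start
    candidate = start * 2
    while candidate <= limit and fits(candidate):
        largest_ok = candidate
        candidate *= 2
    return max(1, int(largest_ok * (1.0 - headroom)))


# Simulated probe: pretend anything above 100 items runs out of memory
print(find_batch_size(lambda size: size <= 100))  # largest working size is 64
```

In practice the probe would run a real forward pass and watch for an OOM exception, with `nvidia-smi` used to confirm the remaining headroom.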
|
---

## Client Integration Examples

### 7.1 Python Client

#### Basic Client Class

```python
import requests
from typing import List, Optional
import numpy as np

class EmbeddingServiceClient:
    """Client for the embedding service."""

    def __init__(self, base_url: str = "http://localhost:6005"):
        self.base_url = base_url.rstrip('/')
        self.timeout = 30

    def health_check(self) -> dict:
        """Health check."""
        response = requests.get(f"{self.base_url}/health", timeout=5)
        response.raise_for_status()
        return response.json()

    def embed_text(self, text: str) -> Optional[List[float]]:
        """Embed a single text."""
        result = self.embed_texts([text])
        return result[0] if result else None

    def embed_texts(self, texts: List[str]) -> List[Optional[List[float]]]:
        """Embed a batch of texts."""
        if not texts:
            return []

        response = requests.post(
            f"{self.base_url}/embed/text",
            json=texts,
            timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()

    def embed_image(self, image_url: str) -> Optional[List[float]]:
        """Embed a single image."""
        result = self.embed_images([image_url])
        return result[0] if result else None

    def embed_images(self, image_urls: List[str]) -> List[Optional[List[float]]]:
        """Embed a batch of images."""
        if not image_urls:
            return []

        response = requests.post(
            f"{self.base_url}/embed/image",
            json=image_urls,
            timeout=120  # image processing takes longer
        )
        response.raise_for_status()
        return response.json()

    def embed_texts_to_numpy(self, texts: List[str]) -> Optional[np.ndarray]:
        """Embed a batch of texts and return a NumPy array (failed items are dropped)."""
        embeddings = self.embed_texts(texts)
        valid_embeddings = [e for e in embeddings if e is not None]
        if not valid_embeddings:
            return None
        return np.array(valid_embeddings, dtype=np.float32)

# Usage example
if __name__ == "__main__":
    client = EmbeddingServiceClient()

    # Health check
    health = client.health_check()
    print(f"Service status: {health}")

    # Text embedding
    texts = ["红色连衣裙", "blue jeans", "vintage dress"]
    embeddings = client.embed_texts_to_numpy(texts)
    print(f"Embeddings shape: {embeddings.shape}")

    # Compute similarity
    from sklearn.metrics.pairwise import cosine_similarity
    similarities = cosine_similarity(embeddings)
    print(f"Similarity matrix:\n{similarities}")
```
|
#### Advanced Usage: Async Client

```python
import aiohttp
import asyncio
from typing import List, Optional

class AsyncEmbeddingClient:
    """Asynchronous client for the embedding service."""

    def __init__(self, base_url: str = "http://localhost:6005"):
        self.base_url = base_url.rstrip('/')
        self.session: Optional[aiohttp.ClientSession] = None

    async def __aenter__(self):
        self.session = aiohttp.ClientSession()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            await self.session.close()

    async def embed_texts(self, texts: List[str]) -> List[Optional[List[float]]]:
        """Embed a batch of texts asynchronously."""
        if not texts:
            return []

        if not self.session:
            raise RuntimeError("Client not initialized. Use 'async with'.")

        async with self.session.post(
            f"{self.base_url}/embed/text",
            json=texts,
            timeout=aiohttp.ClientTimeout(total=30)
        ) as response:
            response.raise_for_status()
            return await response.json()

# Usage example
async def main():
    async with AsyncEmbeddingClient() as client:
        texts = ["text1", "text2", "text3"]
        embeddings = await client.embed_texts(texts)
        print(f"Got {len(embeddings)} embeddings")

asyncio.run(main())
```
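
An async client becomes more useful when a large input is split into several requests that run with bounded concurrency. The sketch below shows that pattern with `asyncio.Semaphore` and `asyncio.gather`; `embed_in_chunks` and its defaults are hypothetical, and `embed` can be any coroutine function with the same shape as `embed_texts` above. The chunk size of 100 follows the request-length recommendation for the text endpoint.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence

Vector = List[float]
EmbedFn = Callable[[List[str]], Awaitable[List[Optional[Vector]]]]


async def embed_in_chunks(
    embed: EmbedFn,
    texts: Sequence[str],
    chunk_size: int = 100,
    max_concurrency: int = 4,  # hypothetical default
) -> List[Optional[Vector]]:
    """Embed `texts` in chunks, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    chunks = [list(texts[i:i + chunk_size]) for i in range(0, len(texts), chunk_size)]

    async def run(chunk: List[str]) -> List[Optional[Vector]]:
        async with semaphore:
            return await embed(chunk)

    # gather returns results in the order the awaitables were passed in,
    # so the flattened output stays aligned with `texts`
    results = await asyncio.gather(*(run(chunk) for chunk in chunks))
    return [vector for chunk_result in results for vector in chunk_result]


async def fake_embed(chunk: List[str]) -> List[Optional[Vector]]:
    """Stand-in for a real request: returns a 1-dimensional vector per text."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in chunk]


vectors = asyncio.run(embed_in_chunks(fake_embed, [f"t{i}" for i in range(250)]))
print(len(vectors))  # 250
```

Whether concurrent requests actually improve throughput depends on the server; a single-worker service that serializes model calls may gain little from a high `max_concurrency`.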
|
### 7.2 Java Client

#### Basic Client Class

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ArrayNode;

public class EmbeddingServiceClient {
    private final HttpClient httpClient;
    private final ObjectMapper objectMapper;
    private final String baseUrl;

    public EmbeddingServiceClient(String baseUrl) {
        this.baseUrl = baseUrl.replaceAll("/$", "");
        this.httpClient = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
        this.objectMapper = new ObjectMapper();
    }

    /**
     * Health check.
     */
    public HealthStatus healthCheck() throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/health"))
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build();

        HttpResponse<String> response = httpClient.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );

        JsonNode json = objectMapper.readTree(response.body());
        return new HealthStatus(
            json.get("status").asText(),
            json.get("text_model_loaded").asBoolean(),
            json.get("image_model_loaded").asBoolean()
        );
    }

    /**
     * Embed a batch of texts.
     */
    public List<float[]> embedTexts(List<String> texts) throws Exception {
        // Build the request body
        ArrayNode requestBody = objectMapper.createArrayNode();
        for (String text : texts) {
            requestBody.add(text);
        }

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/embed/text"))
            .header("Content-Type", "application/json")
            .timeout(Duration.ofSeconds(30))
            .POST(HttpRequest.BodyPublishers.ofString(
                objectMapper.writeValueAsString(requestBody)
            ))
            .build();

        HttpResponse<String> response = httpClient.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );

        if (response.statusCode() != 200) {
            throw new RuntimeException("API error: " + response.body());
        }

        // Parse the response; failed items stay as null to preserve index alignment
        JsonNode root = objectMapper.readTree(response.body());
        List<float[]> embeddings = new ArrayList<>();

        for (JsonNode item : root) {
            if (item.isNull()) {
                embeddings.add(null);
            } else {
                float[] vector = objectMapper.treeToValue(item, float[].class);
                embeddings.add(vector);
            }
        }

        return embeddings;
    }

    /**
     * Compute cosine similarity.
     */
    public static float cosineSimilarity(float[] v1, float[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("Vectors must be same length");
        }

        float dotProduct = 0.0f;
        float norm1 = 0.0f;
        float norm2 = 0.0f;

        for (int i = 0; i < v1.length; i++) {
            dotProduct += v1[i] * v2[i];
            norm1 += v1[i] * v1[i];
            norm2 += v2[i] * v2[i];
        }

        return (float) (dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2)));
    }

    // Health status data class
    public static class HealthStatus {
        public final String status;
        public final boolean textModelLoaded;
        public final boolean imageModelLoaded;

        public HealthStatus(String status, boolean textModelLoaded, boolean imageModelLoaded) {
            this.status = status;
            this.textModelLoaded = textModelLoaded;
            this.imageModelLoaded = imageModelLoaded;
        }

        @Override
        public String toString() {
            return String.format("HealthStatus{status='%s', textModelLoaded=%b, imageModelLoaded=%b}",
                status, textModelLoaded, imageModelLoaded);
        }
    }

    // Usage example
    public static void main(String[] args) throws Exception {
        EmbeddingServiceClient client = new EmbeddingServiceClient("http://localhost:6005");

        // Health check
        HealthStatus health = client.healthCheck();
        System.out.println("Health: " + health);

        // Text embedding
        List<String> texts = List.of("红色连衣裙", "blue jeans", "vintage dress");
        List<float[]> embeddings = client.embedTexts(texts);

        System.out.println("Got " + embeddings.size() + " embeddings");
        for (int i = 0; i < embeddings.size(); i++) {
            System.out.println("Embedding " + i + " dimensions: " +
                (embeddings.get(i) != null ? embeddings.get(i).length : "null"));
        }

        // Compute similarity
        if (embeddings.get(0) != null && embeddings.get(1) != null) {
            float similarity = cosineSimilarity(embeddings.get(0), embeddings.get(1));
            System.out.println("Similarity between text 0 and 1: " + similarity);
        }
    }
}
```

**Maven dependency** (`pom.xml`):

```xml
<dependencies>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.15.2</version>
    </dependency>
</dependencies>
```
|
| |
944
| + |
|
| |
945
### 7.3 cURL Examples

#### Health check

```bash
curl http://localhost:6005/health
```

#### Text embedding

```bash
# Single text
curl -s -X POST http://localhost:6005/embed/text \
  -H "Content-Type: application/json" \
  -d '["衣服的质量杠杠的"]' \
  | jq '.[0][0:10]'  # print the first 10 dimensions

# Batch of texts
curl -s -X POST http://localhost:6005/embed/text \
  -H "Content-Type: application/json" \
  -d '["红色连衣裙", "blue jeans", "vintage dress"]' \
  | jq 'length'  # check the number of returned vectors
```

#### Image embedding

```bash
# Image by URL
curl -s -X POST http://localhost:6005/embed/image \
  -H "Content-Type: application/json" \
  -d '["https://example.com/product.jpg"]' \
  | jq '.[0][0:5]'

# Image by local path
curl -s -X POST http://localhost:6005/embed/image \
  -H "Content-Type: application/json" \
  -d '["/data/images/product.jpg"]'
```

#### Error handling example

```bash
# Check service health
if ! curl -sf http://localhost:6005/health > /dev/null; then
  echo "Embedding service is not healthy!"
  exit 1
fi

# Call the API and check for errors
response=$(curl -s -X POST http://localhost:6005/embed/text \
  -H "Content-Type: application/json" \
  -d '["test query"]')

# Fail unless the response is a JSON array whose first element is not null.
# This also catches empty responses and error objects, not just null vectors.
if ! echo "$response" | jq -e 'type == "array" and .[0] != null' > /dev/null 2>&1; then
  echo "Embedding failed!"
  echo "$response"
  exit 1
fi

echo "Embedding succeeded!"
```
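
When a batch is sent, the same null check can be applied per item. The sketch below assumes, as the examples above suggest, that the service returns one entry per input and `null` for inputs it could not embed. `split_embedding_results` is a hypothetical helper name, not part of the service or its clients.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def split_embedding_results(
    inputs: Sequence[str],
    embeddings: Sequence[Optional[Vector]],
) -> Tuple[List[Tuple[str, Vector]], List[str]]:
    """Pair each input with its vector and collect inputs whose vector is null."""
    if len(inputs) != len(embeddings):
        raise ValueError("expected exactly one embedding per input")
    succeeded: List[Tuple[str, Vector]] = []
    failed: List[str] = []
    for text, vector in zip(inputs, embeddings):
        if vector is None:
            failed.append(text)  # candidate for a retry or a log entry
        else:
            succeeded.append((text, vector))
    return succeeded, failed


# Hand-written response for illustration only (not real model output).
ok, failed = split_embedding_results(["a", "b"], [[0.1, 0.2], None])
print(len(ok), failed)
```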

---

## Performance Comparison and Optimization

### 8.1 Performance Comparison

#### Local service performance

| Operation | Hardware | Latency | Throughput |
|------|---------|------|--------|
| Text embedding (single) | GPU (RTX 3090) | ~80ms | ~12 qps |
| Text embedding (batch of 32) | GPU (RTX 3090) | ~2.5s | ~256 qps |
| Text embedding (single) | CPU (16 cores) | ~500ms | ~2 qps |
| Image embedding (single) | GPU (RTX 3090) | ~150ms | ~6 qps |
| Image embedding (batch of 4) | GPU (RTX 3090) | ~600ms | ~6 qps |

#### Cloud service performance

| Operation | Metric | Value |
|------|------|-----|
| Text embedding (single) | Latency | 300-400ms |
| Text embedding (batch) | Throughput | ~2-3 qps |
| API limits | Rate limit | Depends on plan |
| Availability | SLA | 99.9% |

### 8.2 Cost Comparison

#### Local service cost

| Configuration | Hardware cost (monthly) | Electricity (monthly) | Total (monthly) |
|------|--------------|-----------|------------|
| GPU server (RTX 3090) | ¥3000 | ¥500 | ¥3500 |
| GPU server (A100) | ¥8000 | ¥800 | ¥8800 |
| CPU server (16 cores) | ¥800 | ¥200 | ¥1000 |

#### Cloud service cost

Aliyun DashScope pricing (for reference only):

| Plan | Price | Volume | Use case |
|------|------|--------|---------|
| Pay-as-you-go | ¥0.0007/1K tokens | Unlimited | Testing / small scale |
| Basic | ¥100/month | 1M tokens | Small applications |
| Professional | ¥500/month | 10M tokens | Medium scale |
| Enterprise | Custom | Unlimited | Large scale |

**Cost calculation example**:

Assume 100,000 searches per day with an average of 10 tokens per query:
- Daily volume: 100,000 × 10 = 1M tokens
- Monthly volume: 1M × 30 = 30M tokens
- Monthly cost: 30M tokens × ¥0.7 per 1M tokens (¥0.0007/1K tokens) = ¥21 (pay-as-you-go)
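
The arithmetic above can be reproduced for other traffic levels with a few lines of Python. This is a minimal sketch: the function names are illustrative, the 30-day month is an assumption, and the prices are the reference figures from the tables above rather than current DashScope pricing.

```python
def monthly_token_cost(
    searches_per_day: int,
    tokens_per_query: int,
    price_per_1k_tokens: float,
    days: int = 30,  # assumption: a 30-day month, as in the example above
) -> float:
    """Pay-as-you-go cost in yuan for one month of query embedding."""
    tokens = searches_per_day * tokens_per_query * days
    return tokens / 1000 * price_per_1k_tokens


def break_even_tokens(fixed_monthly_cost: float, price_per_1k_tokens: float) -> float:
    """Monthly token volume at which pay-as-you-go equals a fixed monthly cost."""
    return fixed_monthly_cost / price_per_1k_tokens * 1000


cost = monthly_token_cost(100_000, 10, 0.0007)
print(f"Monthly cost: ¥{cost:.2f}")
print(f"Break-even vs. ¥3500 server: {break_even_tokens(3500, 0.0007):.3g} tokens/month")
```

Under these reference prices, pay-as-you-go only reaches the ¥3500 monthly cost of the RTX 3090 server at roughly 5 billion tokens per month, which is why the cloud option looks cheap for query-side embedding at the traffic level in the example.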

### 8.3 Optimization Recommendations

#### Local service optimization

1. **GPU utilization**
```python
# Increase the batch size
TEXT_BATCH_SIZE = 64  # increased from 32
```

2. **Half precision (FP16)**
```python
# Use half-precision floats to reduce GPU memory usage
import torch
model = model.half()  # FP16
```

3. **Model warmup**
```python
# Warm up the model once the service has started
@app.on_event("startup")
async def warmup():
    _text_model.encode(["warmup"], device="cuda")
```

4. **Connection and concurrency settings**
```bash
# uvicorn options (append these to the uvicorn launch command)
--workers 1              # single worker (GPU model constraint)
--backlog 2048           # larger connection backlog
--limit-concurrency 32   # cap the number of concurrent requests
```

#### Cloud service optimization

1. **Request batching**
```python
import asyncio

# Accumulate requests and send them to the cloud API as one batch
class BatchEncoder:
    def __init__(self, batch_size=32, timeout=0.1):
        self.batch_size = batch_size
        self.timeout = timeout  # max seconds a request waits for a full batch
        self.queue = []

    def _encode_batch(self, texts):
        raise NotImplementedError  # placeholder: call the cloud API, return one vector per text

    async def encode(self, text: str):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.queue.append((text, future))
        if len(self.queue) >= self.batch_size:
            self._flush()
        elif len(self.queue) == 1:
            loop.call_later(self.timeout, self._flush)  # flush a partial batch after the timeout
        return await future

    def _flush(self):
        batch, self.queue = self.queue, []
        try:
            vectors = self._encode_batch([text for text, _ in batch]) if batch else []
        except Exception as exc:
            for _, future in batch:
                future.set_exception(exc)
            return
        for (_, future), vector in zip(batch, vectors):
            future.set_result(vector)
```

2. **Local cache**
```python
import hashlib
import os
import pickle

class CachedEncoder:
    def __init__(self, cache_file="embedding_cache.pkl"):
        self.cache_file = cache_file
        self.cache = self._load_cache(cache_file)

    def _load_cache(self, cache_file):
        # Load a previously saved cache, or start with an empty one
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return pickle.load(f)
        return {}

    def _call_api(self, text):
        raise NotImplementedError  # placeholder: call the cloud embedding API

    def encode(self, text: str):
        key = hashlib.md5(text.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]

        embedding = self._call_api(text)
        self.cache[key] = embedding
        return embedding
```

3. **Fallback strategy**
```python
import logging

logger = logging.getLogger(__name__)

# CloudTextEncoder and BgeEncoder are the project's cloud and local encoders
class HybridEncoder:
    def __init__(self):
        self.cloud_encoder = CloudTextEncoder()
        self.local_encoder = None  # loaded on demand

    def encode(self, text: str):
        try:
            return self.cloud_encoder.encode(text)
        except Exception as e:
            logger.warning(f"Cloud API failed: {e}, falling back to local")
            if self.local_encoder is None:
                self.local_encoder = BgeEncoder()
            return self.local_encoder.encode(text)
```

---

## Troubleshooting

### 9.1 Common Issues

#### Issue 1: Service fails to start

**Symptom**:
```bash
$ ./scripts/start_embedding_service.sh
Error: Port 6005 already in use
```

**Fix**:
```bash
# Find the process using the port
lsof -i :6005

# Stop that process (use kill -9 only if it does not exit)
kill <PID>

# Or change the port in the configuration file
# embeddings/config.py: PORT = 6006
```
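
The port check can also be done from Python before launching the service, for example in a deployment script. This is a small sketch using only the standard library; `is_port_in_use` is an illustrative name, and the check only detects a listener that accepts TCP connections on the given host.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if is_port_in_use(6005):
        print("Port 6005 is already in use")
    else:
        print("Port 6005 is free")
```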

#### Issue 2: CUDA Out of Memory

**Symptom**:
```
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
```

**Fix**:
```python
# Reduce the batch size
TEXT_BATCH_SIZE = 16  # reduced from 32

# Or run on CPU
TEXT_DEVICE = "cpu"
```
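
Instead of lowering the batch size permanently, a caller can shrink it only when an out-of-memory error actually occurs. The sketch below is one way to do that under stated assumptions: `encode_batch` is any callable that embeds a list of texts, and the error is recognized by the "out of memory" text in the `RuntimeError` message shown above. It is not the service's own logic.

```python
from typing import Callable, List, Sequence


def encode_with_batch_backoff(
    encode_batch: Callable[[Sequence[str]], List],
    texts: Sequence[str],
    batch_size: int = 32,
    min_batch_size: int = 1,
) -> List:
    """Encode texts in batches, halving the batch size after an OOM error."""
    results: List = []
    start = 0
    while start < len(texts):
        batch = texts[start:start + batch_size]
        try:
            results.extend(encode_batch(batch))
        except RuntimeError as exc:
            # Re-raise anything that is not an OOM, or an OOM at the minimum size
            if "out of memory" not in str(exc) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same slice with the smaller batch
        start += len(batch)
    return results
```

With a real PyTorch model it may also help to release cached GPU memory before retrying; that step is left out here to keep the sketch free of framework dependencies.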

#### Issue 3: Model download fails

**Symptom**:
```
OSError: Can't load tokenizer for 'Xorbits/bge-m3'
```

**Fix**:
```bash
# Download the model manually
huggingface-cli download Xorbits/bge-m3

# Or use a mirror
export HF_ENDPOINT=https://hf-mirror.com
```

#### Issue 4: Cloud API key not set

**Symptom**:
```
ERROR: DASHSCOPE_API_KEY environment variable is not set!
```

**Fix**:
```bash
# Set the environment variable
export DASHSCOPE_API_KEY="sk-your-key"

# Verify
echo $DASHSCOPE_API_KEY
```
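
An application can fail fast at startup when the key is missing, rather than at the first API call. This is a minimal sketch; `require_api_key` and `mask_key` are illustrative helper names, and masking is suggested only because echoing the full key, as in the shell check above, leaves it in terminal scrollback and logs.

```python
import os


def require_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} environment variable is not set")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Show only the first few characters so the key can be logged safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return key[:visible] + "*" * (len(key) - visible)


if __name__ == "__main__":
    print("Using API key:", mask_key(require_api_key()))
```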

#### Issue 5: API rate limit

**Symptom**:
```
Rate limit exceeded. Please try again later.
```

**Fix**:
```python
# Add a delay between batches
import time
for batch in batches:
    embeddings = encoder.encode_batch(batch)
    time.sleep(0.1)  # wait 100ms between batches
```
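
A fixed delay slows down every batch, including those that would have succeeded. An alternative is to retry only the calls that hit the limit, waiting longer each time. The sketch below shows exponential backoff with jitter; `RateLimitError` is a hypothetical stand-in for whatever exception the cloud client raises, and the delay values are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # give up after max_retries retries
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
            attempt += 1
```

A call site would wrap the batch call, for example `call_with_backoff(lambda: encoder.encode_batch(batch))`, after mapping the client's real rate-limit error to the exception caught here.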

### 9.2 Viewing Logs

#### Service logs

```bash
# Follow the log while also writing it to a file
./scripts/start_embedding_service.sh 2>&1 | tee embedding.log

# Or use systemd (if the service is configured as a unit)
journalctl -u embedding-service -f
```

#### Python application logs

```python
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

# Usage (encoder and texts are defined by the calling code)
logger.info("Encoding texts...")
try:
    embeddings = encoder.encode(texts)
except Exception as e:
    logger.error("Encoding failed: %s", e)
```

#### GPU monitoring

```bash
# Watch GPU usage in real time
watch -n 1 nvidia-smi

# Query detailed metrics
nvidia-smi --query-gpu=timestamp,name,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.used,memory.free --format=csv
```

### 9.3 Performance Tuning

#### Performance profiling

```python
import time
import numpy as np

def benchmark_encoder(encoder, texts, iterations=100):
    """Benchmark an encoder: time repeated encode() calls on the same batch."""
    times = []

    for i in range(iterations):
        # perf_counter is a monotonic, high-resolution clock suited to timing
        start = time.perf_counter()
        embeddings = encoder.encode(texts)
        end = time.perf_counter()
        times.append(end - start)

    times = np.array(times)
    print(f"Mean: {times.mean():.3f}s")
    print(f"Std: {times.std():.3f}s")
    print(f"Min: {times.min():.3f}s")
    print(f"Max: {times.max():.3f}s")
    # Texts encoded per second, based on the mean batch time
    print(f"QPS: {len(texts) / times.mean():.2f}")

# Usage
benchmark_encoder(encoder, texts=["test"] * 32, iterations=100)
```
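
Mean and max alone can hide tail latency, which is usually what users of a search service notice. As a complement to the benchmark above, the following sketch summarizes a list of timings by percentile using only the standard library. `latency_summary` is an illustrative name, and the choice of p50/p95/p99 is a common convention rather than something this project prescribes.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(times_s: Sequence[float]) -> Dict[str, float]:
    """Return p50, p95, and p99 latency (in seconds) for a list of timings."""
    if len(times_s) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    cuts = statistics.quantiles(times_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Illustrative timings only (not measured values).
sample = [0.08, 0.09, 0.08, 0.10, 0.25, 0.08, 0.09, 0.11, 0.08, 0.09]
for name, value in latency_summary(sample).items():
    print(f"{name}: {value * 1000:.1f} ms")
```

To use it with `benchmark_encoder`, return or store the `times` list there and pass it to `latency_summary`.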

#### Memory profiling

```bash
# Install the Python memory profiler
pip install memory_profiler

# Add the decorator to the function to profile in script.py:
#
#   from memory_profiler import profile
#
#   @profile
#   def encode_batch(texts):
#       return encoder.encode(texts)

# Run the script under the profiler
python -m memory_profiler script.py
```

---

## Appendix

### 10.1 Vector Dimensions

#### Why 1024 dimensions?

1. **Expressiveness**: 1024 dimensions capture rich semantic information
2. **Compute efficiency**: the dimensionality is moderate, so similarity computation stays fast
3. **Storage balance**: each vector has a reasonable size (about 4KB)
4. **Model choice**: both BGE-M3 and text-embedding-v4 use 1024 dimensions

#### Vector storage calculation

```
Size of one vector  = 1024 × 4 bytes (FP32) = 4KB
1 million vectors   = 4KB × 1,000,000  ≈ 4GB
10 million vectors  = 4KB × 10,000,000 ≈ 40GB
```
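
The same estimate can be computed for other corpus sizes or precisions. This is a minimal sketch with an illustrative function name; it counts raw vector payload only and ignores index structures and metadata, which add to the real footprint.

```python
def vector_storage_bytes(num_vectors: int, dims: int = 1024, bytes_per_value: int = 4) -> int:
    """Raw storage for num_vectors vectors (default: 1024 dims, FP32)."""
    return num_vectors * dims * bytes_per_value


for n in (1_000_000, 10_000_000):
    fp32 = vector_storage_bytes(n)
    fp16 = vector_storage_bytes(n, bytes_per_value=2)  # half precision halves the size
    print(f"{n:>10,} vectors: FP32 {fp32 / 1e9:.1f} GB, FP16 {fp16 / 1e9:.1f} GB")
```

The figures printed here use decimal gigabytes, so they come out slightly above the rounded 4GB and 40GB shown in the calculation above.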

### 10.2 Model Versions

#### BGE-M3

- **HuggingFace ID**: `Xorbits/bge-m3`
- **Paper**: [BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation](https://arxiv.org/abs/2402.03216)
- **GitHub**: https://github.com/FlagOpen/FlagEmbedding
- **Features**:
  - Supports 100+ languages
  - Supports inputs of up to 8192 tokens
  - Rich semantic representations

#### CN-CLIP

- **Model**: ViT-H-14
- **Paper**: [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335)
- **GitHub**: https://github.com/OFA-Sys/Chinese-CLIP
- **Features**:
  - Chinese image-text understanding
  - Supports image retrieval and text retrieval
  - Well suited to e-commerce scenarios

#### Aliyun text-embedding-v4

- **Provider**: Aliyun DashScope
- **Documentation**: https://help.aliyun.com/zh/model-studio/getting-started/models
- **Features**:
  - Cloud API, no deployment required
  - High availability (99.9% SLA)
  - Automatic scaling

### 10.3 Related Documentation

#### Project documentation

- **Search API integration guide**: `docs/搜索API对接指南.md`
- **Index field reference**: `docs/索引字段说明v2.md`
- **System design document**: `docs/系统设计文档.md`
- **CLAUDE project guide**: `CLAUDE.md`

#### External references

- **BGE-M3 official documentation**: https://github.com/FlagOpen/FlagEmbedding/tree/master/BGE_M3
- **Aliyun DashScope**: https://help.aliyun.com/zh/model-studio/
- **Elasticsearch vector search**: https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html
- **FastAPI documentation**: https://fastapi.tiangolo.com/

#### Test scripts

```bash
# Test the local embedding service
./scripts/test_embedding_service.sh

# Test the cloud embedding service
python scripts/test_cloud_embedding.py

# Run the performance benchmark
python scripts/benchmark_embeddings.py
```

---

## Version History

| Version | Date | Changes |
|------|------|---------|
| v1.0 | 2025-12-23 | Initial version: complete documentation of the embedding module |

---

## Contact

For questions or suggestions, please contact the project maintainers.

**Project repository**: `/data/tw/SearchEngine`

**Documentation directory**: `docs/`