Commit 950a640e392f52ccd2d836e2fedd7186ada0720f
1 parent cc11ae04
embeddings
Showing 41 changed files with 339 additions and 2415 deletions
.env.example
| ... | ... | @@ -38,7 +38,7 @@ START_TRANSLATOR=0 |
| 38 | 38 | START_RERANKER=0 |
| 39 | 39 | |
| 40 | 40 | # Embedding Models |
| 41 | -TEXT_MODEL_DIR=/data/tw/models/bge-m3 # switched to web requests; the local model is no longer used | |
| 41 | +TEXT_MODEL_ID=Qwen/Qwen3-Embedding-0.6B | |
| 42 | 42 | IMAGE_MODEL_DIR=/data/tw/models/cn-clip # switched to web requests; the local model is no longer used |
| 43 | 43 | |
| 44 | 44 | # Cache Directory | ... | ... |
CHANGES.md deleted
| ... | ... | @@ -1,218 +0,0 @@ |
| 1 | -# Cloud Embedding Module Update - Change Summary | |
| 2 | - | |
| 3 | -## 📅 Update Date | |
| 4 | -2025-12-05 | |
| 5 | - | |
| 6 | -## 🎯 Goal | |
| 7 | -Add cloud text embedding, backed by the Aliyun DashScope API, to the saas-search project. | |
| 8 | - | |
| 9 | -## 📝 New Files | |
| 10 | - | |
| 11 | -### 1. Core Module | |
| 12 | -- ✅ `embeddings/cloud_text_encoder.py` | |
| 13 | -  - Cloud text embedding encoder | |
| 14 | -  - Uses the text-embedding-v4 model | |
| 15 | -  - Singleton pattern, thread-safe | |
| 16 | -  - Supports batch processing | |
| 17 | - | |
| 18 | -### 2. Test Script | |
| 19 | -- ✅ `scripts/test_cloud_embedding.py` | |
| 20 | -  - Reads the first 100 entries of queries.txt | |
| 21 | -  - Records send/receive timestamps and latency | |
| 22 | -  - Prints a detailed test report | |
| 23 | - | |
| 24 | -### 3. Example Code | |
| 25 | -- ✅ `examples/cloud_embedding_example.py` | |
| 26 | -  - Single-text embedding example | |
| 27 | -  - Batch processing example | |
| 28 | -  - Similarity computation example | |
| 29 | -  - Complete error handling | |
| 30 | - | |
| 31 | -### 4. Documentation | |
| 32 | -- ✅ `docs/cloud_embedding_usage.md` - detailed usage guide | |
| 33 | -- ✅ `docs/cloud_embedding_quickstart.md` - quick start guide | |
| 34 | -- ✅ `CLOUD_EMBEDDING_README.md` - feature overview | |
| 35 | -- ✅ `CHANGES.md` - this document | |
| 36 | - | |
| 37 | -## 🔧 Modified Files | |
| 38 | - | |
| 39 | -### 1. requirements.txt | |
| 40 | -- ✅ Added dependency: `openai>=1.0.0` | |
| 41 | - | |
| 42 | -## 📊 Features | |
| 43 | - | |
| 44 | -### The CloudTextEncoder Class | |
| 45 | - | |
| 46 | -**Methods:** | |
| 47 | -- `__init__(api_key=None, base_url=None)` - initialization (singleton) | |
| 48 | -- `encode(sentences, ...)` - embed text | |
| 49 | -- `encode_batch(texts, batch_size=32, ...)` - batch processing | |
| 50 | -- `get_embedding_dimension()` - get the embedding dimension | |
| 51 | - | |
| 52 | -**Properties:** | |
| 53 | -- Singleton pattern, guarantees a single instance | |
| 54 | -- Thread-safe | |
| 55 | -- Automatic error handling | |
| 56 | -- Batches requests to avoid API limits | |
| 57 | -- Returns numpy arrays (compatible with existing code) | |
| 58 | - | |
| 59 | -### Test Script Behavior | |
| 60 | - | |
| 61 | -**test_cloud_embedding.py:** | |
| 62 | -- Reads queries.txt (count is configurable) | |
| 63 | -- Sends one embedding request per query | |
| 64 | -- Records, for each request: | |
| 65 | -  - Send time (millisecond precision) | |
| 66 | -  - Receive time (millisecond precision) | |
| 67 | -  - Duration (seconds) | |
| 68 | -  - Embedding dimension | |
| 69 | -  - Success/failure status | |
| 70 | -- Prints summary statistics: | |
| 71 | -  - Total queries | |
| 72 | -  - Success/failure counts | |
| 73 | -  - Success rate | |
| 74 | -  - Total time | |
| 75 | -  - Average duration | |
| 76 | -  - Throughput | |
| 77 | - | |
| 78 | -## 📖 Usage Examples | |
| 79 | - | |
| 80 | -### Basic Usage | |
| 81 | -```python | |
| 82 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 83 | - | |
| 84 | -encoder = CloudTextEncoder() | |
| 85 | -embedding = encoder.encode("衣服的质量杠杠的") | |
| 86 | -``` | |
| 87 | - | |
| 88 | -### Batch Processing | |
| 89 | -```python | |
| 90 | -texts = ["文本1", "文本2", "文本3"] | |
| 91 | -embeddings = encoder.encode(texts) | |
| 92 | -``` | |
| 93 | - | |
| 94 | -### Running the Test | |
| 95 | -```bash | |
| 96 | -export DASHSCOPE_API_KEY="sk-xxx" | |
| 97 | -python scripts/test_cloud_embedding.py | |
| 98 | -``` | |
| 99 | - | |
| 100 | -## 🔍 Sample Test Output | |
| 101 | - | |
| 102 | -``` | |
| 103 | -================================================================================ | |
| 104 | -Test Summary | |
| 105 | -================================================================================ | |
| 106 | -Total Queries: 100 | |
| 107 | -Successful: 100 | |
| 108 | -Failed: 0 | |
| 109 | -Success Rate: 100.0% | |
| 110 | -Total Time: 35.123s | |
| 111 | -Total API Time: 32.456s | |
| 112 | -Average Duration: 0.325s per query | |
| 113 | -Throughput: 2.85 queries/second | |
| 114 | -================================================================================ | |
| 115 | -``` | |
| 116 | - | |
| 117 | -## 📚 Documentation Layout | |
| 118 | - | |
| 119 | -``` | |
| 120 | -docs/ | |
| 121 | -├── cloud_embedding_usage.md       # detailed usage guide | |
| 122 | -│   ├── Module overview | |
| 123 | -│   ├── Usage steps | |
| 124 | -│   ├── Caveats | |
| 125 | -│   ├── Region selection | |
| 126 | -│   ├── Troubleshooting | |
| 127 | -│   └── Further reading | |
| 128 | -│ | |
| 129 | -└── cloud_embedding_quickstart.md  # quick start | |
| 130 | -    ├── New file list | |
| 131 | -    ├── Quick start steps | |
| 132 | -    ├── Code examples | |
| 133 | -    ├── Performance metrics | |
| 134 | -    ├── Configuration options | |
| 135 | -    ├── FAQ | |
| 136 | -    └── Best practices | |
| 137 | -``` | |
| 138 | - | |
| 139 | -## ✅ Verification Steps | |
| 140 | - | |
| 141 | -1. **Environment setup** | |
| 142 | -   - [x] Install the openai package | |
| 143 | -   - [x] Set DASHSCOPE_API_KEY | |
| 144 | - | |
| 145 | -2. **Functional tests** | |
| 146 | -   - [ ] Run `python scripts/test_cloud_embedding.py` | |
| 147 | -   - [ ] Run `python examples/cloud_embedding_example.py` | |
| 148 | -   - [ ] Check that the output looks correct | |
| 149 | - | |
| 150 | -3. **Integration tests** | |
| 151 | -   - [ ] Import the module in the project | |
| 152 | -   - [ ] Verify compatibility with existing code | |
| 153 | - | |
| 154 | -## 🎯 Expected Performance | |
| 155 | - | |
| 156 | -Based on the text-embedding-v4 model: | |
| 157 | -- **Embedding dimension**: 1024 | |
| 158 | -- **Average latency**: 300-400ms | |
| 159 | -- **Throughput**: ~2-3 queries/second (single-threaded) | |
| 160 | -- **Success rate**: >99% | |
| 161 | - | |
| 162 | -## ⚠️ Caveats | |
| 163 | - | |
| 164 | -1. **API key security** | |
| 165 | -   - Never commit the API key to the code repository | |
| 166 | -   - Use environment variables or config files | |
| 167 | - | |
| 168 | -2. **Cost control** | |
| 169 | -   - The cloud API bills by usage | |
| 170 | -   - Cache embeddings for frequent queries | |
| 171 | -   - Monitor usage | |
| 172 | - | |
| 173 | -3. **Rate limits** | |
| 174 | -   - Mind the API rate limits | |
| 175 | -   - The test script already adds a small delay | |
| 176 | -   - Adjust batch_size as needed | |
| 177 | - | |
| 178 | -4. **Network dependency** | |
| 179 | -   - Requires a stable network connection | |
| 180 | -   - Consider adding a retry mechanism | |
| 181 | - | |
| 182 | -## 🔄 Follow-up Plans | |
| 183 | - | |
| 184 | -Suggested improvements: | |
| 185 | -- [ ] Add an embedding cache | |
| 186 | -- [ ] Implement automatic retry logic | |
| 187 | -- [ ] Add multithreading/async support | |
| 188 | -- [ ] Integrate into the search module | |
| 189 | -- [ ] Add performance monitoring | |
| 190 | -- [ ] Implement cost tracking | |
| 191 | - | |
| 192 | -## 🆚 Comparison with the Local Encoder | |
| 193 | - | |
| 194 | -| Feature | CloudTextEncoder | BgeEncoder | | |
| 195 | -|------|------------------|------------| | |
| 196 | -| Deployment | None required | Local service required | | |
| 197 | -| Latency | ~350ms | <100ms | | |
| 198 | -| Cost | Pay per use | Fixed cost | | |
| 199 | -| Maintenance | None | Required | | |
| 200 | -| Offline | Not supported | Supported | | |
| 201 | -| Scaling | Automatic | Manual | | |
| 202 | - | |
| 203 | -## 📞 Getting Help | |
| 204 | - | |
| 205 | -- Read the docs: `docs/cloud_embedding_*.md` | |
| 206 | -- Run the example: `python examples/cloud_embedding_example.py` | |
| 207 | -- Aliyun docs: https://help.aliyun.com/zh/model-studio/ | |
| 208 | - | |
| 209 | -## ✨ Summary | |
| 210 | - | |
| 211 | -This update adds cloud embedding support, including: | |
| 212 | -- ✅ A complete encoder implementation | |
| 213 | -- ✅ A detailed test script | |
| 214 | -- ✅ Rich example code | |
| 215 | -- ✅ Thorough documentation | |
| 216 | - | |
| 217 | -All features have been tested and are ready to use. | |
| 218 | - |
CLIP_SERVICE_README.md
| ... | ... | @@ -82,7 +82,7 @@ cd /data/saas-search |
| 82 | 82 | |
| 83 | 83 | - Existing local embedding service: |
| 84 | 84 | - Startup script: `./scripts/start_embedding_service.sh` |
| 85 | - - Implementation: `embeddings/server.py` (FastAPI + local models `bge_model.py` / `clip_model.py`) | |
| 85 | + - Implementation: `embeddings/server.py` (FastAPI + local models `qwen3_model.py` / `clip_model.py`) | |
| 86 | 86 | - The new `clip-server`: |
| 87 | 87 | - Uses the official implementation, in a separate process and environment |
| 88 | 88 | - A CLIP embedding service for images and text |
| ... | ... | @@ -189,8 +189,6 @@ INFO gateway/rep-0@XXXXX start server bound to 0.0.0.0:51000 |
| 189 | 189 | ## 5. References |
| 190 | 190 | |
| 191 | 191 | - Project: `https://github.com/jina-ai/clip-as-service` |
| 192 | -- This project's embedding module docs: `embeddings/README.md`, `CLOUD_EMBEDDING_README.md` | |
| 193 | - | |
| 194 | - | |
| 192 | +- This project's embedding module docs: `embeddings/README.md` | |
| 195 | 193 | |
| 196 | 194 | ... | ... |
config/config.yaml
| ... | ... | @@ -162,16 +162,11 @@ services: |
| 162 | 162 | location: "global" |
| 163 | 163 | model: "" |
| 164 | 164 | embedding: |
| 165 | - provider: "http" # http | vllm(reserved) | |
| 165 | + provider: "http" # http | |
| 166 | 166 | base_url: "http://127.0.0.1:6005" |
| 167 | 167 | providers: |
| 168 | 168 | http: |
| 169 | 169 | base_url: "http://127.0.0.1:6005" |
| 170 | - vllm: | |
| 171 | - enabled: false | |
| 172 | - base_url: "" | |
| 173 | - model: "" | |
| 174 | - note: "reserved for future vLLM embedding backend" | |
| 175 | 170 | rerank: |
| 176 | 171 | provider: "http" |
| 177 | 172 | base_url: "http://127.0.0.1:6007" | ... | ... |
config/env_config.py
| ... | ... | @@ -79,8 +79,8 @@ EMBEDDING_SERVICE_URL = os.getenv('EMBEDDING_SERVICE_URL') or f'http://{EMBEDDIN |
| 79 | 79 | TRANSLATION_SERVICE_URL = os.getenv('TRANSLATION_SERVICE_URL') or f'http://{TRANSLATION_HOST}:{TRANSLATION_PORT}' |
| 80 | 80 | RERANKER_SERVICE_URL = os.getenv('RERANKER_SERVICE_URL') or f'http://{RERANKER_HOST}:{RERANKER_PORT}/rerank' |
| 81 | 81 | |
| 82 | -# Model Directories | |
| 83 | -TEXT_MODEL_DIR = os.getenv('TEXT_MODEL_DIR', '/data/tw/models/bge-m3') | |
| 82 | +# Model IDs / paths | |
| 83 | +TEXT_MODEL_DIR = os.getenv('TEXT_MODEL_DIR', os.getenv('TEXT_MODEL_ID', 'Qwen/Qwen3-Embedding-0.6B')) | |
| 84 | 84 | IMAGE_MODEL_DIR = os.getenv('IMAGE_MODEL_DIR', '/data/tw/models/cn-clip') |
| 85 | 85 | |
| 86 | 86 | # Cache Directory | ... | ... |
docs/DEVELOPER_GUIDE.md
| ... | ... | @@ -154,7 +154,7 @@ docs/ # 文档(含本指南) |
| 154 | 154 | |
| 155 | 155 | ### 4.6 embeddings |
| 156 | 156 | |
| 157 | -- **Responsibilities**: provide the embedding service (FastAPI): `POST /embed/text`, `POST /embed/image`; the service loads the text backend (e.g. BGE) and the image backend (e.g. clip-as-service or local CN-CLIP) according to configuration, so any implementation of the protocol is pluggable. | |
| 157 | +- **Responsibilities**: provide the embedding service (FastAPI): `POST /embed/text`, `POST /embed/image`; the service loads the text backend (e.g. Qwen3-Embedding-0.6B) and the image backend (e.g. clip-as-service or local CN-CLIP) according to configuration, so any implementation of the protocol is pluggable. | |
| 158 | 158 | - **Principles**: image backends implement `embeddings/protocols.ImageEncoderProtocol`; configuration is read from `config` or `embeddings/config.py` first, kept separate from the `services.embedding` URL. |
| 159 | 159 | - **See**: `embeddings/README.md` and §7.5–§7.6 of this guide. |
| 160 | 160 | |
| ... | ... | @@ -303,7 +303,7 @@ services: |
| 303 | 303 | |------|--------|------|--------| |
| 304 | 304 | | Caller | `services.<capability>.provider` | http | http | |
| 305 | 305 | | Caller | `services.<capability>.providers.http.base_url` | 6007 | 6005 | |
| 306 | -| In-service | `services.<capability>.backend` | bge / qwen3_vllm etc. | see embeddings/config | | |
| 306 | +| In-service | `services.<capability>.backend` | qwen3_embedding_st / qwen3_vllm etc. | see embeddings/config | | |
| 307 | 307 | | In-service | `services.<capability>.backends.<name>` | model name, batch, vLLM params | model name, device, etc. | |
| 308 | 308 | |
| 309 | 309 | ### 7.6 New Backend Checklist (using Qwen3-Reranker as an example) | ... | ... |
docs/QUICKSTART.md
| ... | ... | @@ -396,7 +396,7 @@ services: |
| 396 | 396 | Two layers are distinguished here: |
| 397 | 397 | |
| 398 | 398 | - **Provider layer (invocation)**: how callers access a capability (http/direct) |
| 399 | -- **Backend layer (inference implementation)**: the concrete model implementation inside the service process (bge / qwen3_vllm / ...) | |
| 399 | +- **Backend layer (inference implementation)**: the concrete model implementation inside the service process (qwen3_embedding_st / qwen3_vllm / ...) | |
| 400 | 400 | |
| 401 | 401 | ### 4.1 Rerank Extension |
| 402 | 402 | |
| ... | ... | @@ -429,7 +429,7 @@ services: |
| 429 | 429 | Service implementation: |
| 430 | 430 | |
| 431 | 431 | - `embeddings/server.py` (text and image backends load independently) |
| 432 | -- Text backend currently uses BGE | |
| 432 | +- Text backend currently uses Qwen3-Embedding-0.6B (Sentence Transformers) | |
| 433 | 433 | - Image backend supports local CLIP or clip-as-service |
| 434 | 434 | |
| 435 | 435 | Extension tips: |
| ... | ... | @@ -438,6 +438,12 @@ services: |
| 438 | 438 | - Image backends implement `ImageEncoderProtocol` (`encode_image_urls`) |
| 439 | 439 | - If more backends are added later, configure them under `services.embedding.backend(s)`, as rerank already does (the backend contract is sketched after this checklist) |
| 440 | 440 | |
| 441 | +Selection notes (Qwen3-Embedding-0.6B): | |
| 442 | + | |
| 443 | +- Current repo default: local inference via `Sentence Transformers` (lowest migration cost; a drop-in replacement for the existing 6005 service) | |
| 444 | +- For high-concurrency production: `TEI` (Text Embeddings Inference) usually fits embedding-only workloads better than vLLM | |
| 445 | +- `vLLM` suits mixed generative/rerank workloads; for text embedding alone it is usually not the first choice | |
| 446 | + | |
| 441 447 | ### 4.3 New Backend Checklist (using `qwen3_vllm` as an example) |
| 442 448 | |
| 443 449 | 1. Implement the backend protocol (in-service) | ... | ... |
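The checklist hinges on the implicit text-backend contract: `embeddings/server.py` only ever calls `encode_batch(...)` on the loaded text model. A hedged sketch of what a new backend must provide; the class name `MyTextBackend` is illustrative and not part of the repo:

```python
from typing import List

import numpy as np

class MyTextBackend:
    """Illustrative text backend (hypothetical name).

    embeddings/server.py invokes exactly this method signature (see the
    server.py hunk further below), so any class exposing it can be wired
    in as a new backend.
    """

    def encode_batch(
        self,
        texts: List[str],
        batch_size: int = 32,
        device: str = "cuda",
        normalize_embeddings: bool = True,
    ) -> np.ndarray:
        # Must return one vector per input text, shape (len(texts), dim).
        raise NotImplementedError
```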
docs/TODO.txt
| ... | ... | @@ -32,28 +32,18 @@ HOST:10.200.16.14 / localhost |
| 32 | 32 | |
| 33 | 33 | Have you installed nvidia-container-toolkit? |
| 34 | 34 | Several open-source inference engines now support embedding and rerank models well, and we need to split this part out anyway, so we want to rework it. |
| 35 | -Evaluated TEI, vLLM, and xinference; xinference with a vLLM backend currently looks the best fit, | |
| 36 | -ideally deployed via docker; making the GPU visible to docker requires nvidia-container-toolkit, | |
| 37 | -and I tried several ways to install nvidia-container-toolkit, all of which failed | |
| 35 | +Decided: standardize embedding on Qwen3-Embedding-0.6B (Sentence Transformers, local 6005) for now; if a standalone high-concurrency service is needed later, evaluate TEI first. | |
| 36 | +Ideally deploy via docker; making the GPU visible to docker requires nvidia-container-toolkit, | |
| 37 | +and I tried several ways to install nvidia-container-toolkit, all of which failed | |
| 38 | 38 | https://mirrors.aliyun.com/github/releases/NVIDIA/nvidia-container-toolkit/ |
| 39 | 39 | https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html |
| 40 | 40 | |
| 41 | 41 | |
| 42 | -bge-m3 | |
| 43 | 42 | qwen3-embedding |
| 44 | 43 | qwen3-reranker |
| 45 | 44 | |
| 46 | -It used to take about 0.026s; with xinference it takes 0.5s, so check whether the xinference install and the embedding model deployment are misconfigured | |
| 47 | - | |
| 48 | - | |
| 49 | - | |
| 50 | -xinference directly supports embedding and reranker model types (essentially a wrapper on top of vLLM), so the calling interface is simple, and it supports the bge and qwen3 series. But with performance this poor, something is probably wrong. | |
| 51 | -If that is hard to debug, vLLM or another inference engine would also work, | |
| 52 | - | |
| 53 | -Pick an inference engine; compared with calling modelscope/sentence-transformers directly, the main gains are multiprocessing, load balancing, and continuous batching | |
| 54 | -Not sure whether my understanding is right | |
| 55 | -Evaluated TEI, vLLM, xinference (+vLLM backend) | |
| 56 | -Which inference engine is the right pick, vLLM or xinference? | |
| 45 | +Pick an inference engine; compared with calling sentence-transformers directly, the main gains are multiprocessing, load balancing, and continuous batching |
| 46 | +Current conclusion: prefer TEI for embedding workloads; vLLM leans toward generative and rerank scenarios. | |
| 57 | 47 | |
| 58 | 48 | |
| 59 | 49 | |
| ... | ... | @@ -83,4 +73,3 @@ https://cloud.tencent.com/document/product/1729/113395#4.-.E7.A4.BA.E4.BE.8B |
| 84 | 74 | Log in to the Model Studio US-region console: https://modelstudio.console.aliyun.com/us-east-1?spm=5176.2020520104.0.0.6b383a98WjpXff |
| 85 | 75 | Under API Key management, create or copy a key valid for the US region |
| 85 | 75 | |
| 86 | - | ... | ... |
docs/向量化模块和API说明文档.md
| ... | ... | @@ -10,6 +10,7 @@ |
| 10 | 10 | ## Configuration |
| 11 | 11 | |
| 12 | 12 | - Provider/URL: `services.embedding` in `config/config.yaml` |
| 13 | -- Model paths: `embeddings/config.py` or the env vars `TEXT_MODEL_DIR`, `IMAGE_MODEL_DIR` | |
| 13 | +- Text model: `TEXT_MODEL_ID` in `embeddings/config.py` (default `Qwen/Qwen3-Embedding-0.6B`) | |
| 14 | +- Runtime parameters: `TEXT_DEVICE`, `TEXT_BATCH_SIZE`, `TEXT_NORMALIZE_EMBEDDINGS` | |
| 14 | 15 | |
| 15 | 16 | See `embeddings/README.md` for details. | ... | ... |
docs/系统设计文档.md
| ... | ... | @@ -420,11 +420,11 @@ query_config: |
| 420 | 420 | 1. **Conditional generation**: |
| 421 | 421 | - Embeddings are generated only when `enable_text_embedding=true` |
| 422 | 422 | - Embeddings are generated only for `default`-domain queries |
| 423 | -2. **Embedding model**: BGE-M3 (1024-dim vectors) | |
| 423 | +2. **Embedding model**: Qwen3-Embedding-0.6B (1024-dim vectors) | |
| 424 | 424 | 3. **Purpose**: semantic search (KNN retrieval) |
| 425 | 425 | |
| 426 | 426 | #### Implementation Modules |
| 427 | -- `embeddings/bge_encoder.py` - BGE text encoder | |
| 427 | +- `embeddings/text_encoder.py` + `embeddings/server.py` - text embedding service client and server implementation | |
| 428 | 428 | - `query/query_parser.py` - query parser (integrates embedding generation) |
| 429 | 429 | |
| 430 | 430 | --- |
| ... | ... | @@ -585,7 +585,7 @@ ranking: |
| 585 | 585 | - ✅ Query rewriting (dictionary-driven) |
| 586 | 586 | - ✅ Language detection |
| 587 | 587 | - ✅ Multilingual translation (DeepL API) |
| 588 | -- ✅ Text embedding (BGE-M3) | |
| 588 | +- ✅ Text embedding (Qwen3-Embedding-0.6B) | |
| 589 | 589 | - ✅ Domain extraction (supports the `domain:query` syntax) |
| 590 | 590 | |
| 591 | 591 | ### 6.4 Search Features |
| ... | ... | @@ -623,7 +623,7 @@ ranking: |
| 623 | 623 | - **Backend**: Python 3.6+ |
| 624 | 624 | - **Search engine**: Elasticsearch |
| 625 | 625 | - **Database**: MySQL (Shoplazza) |
| 626 | -- **Embedding models**: BGE-M3 (text), CN-CLIP (images) | |
| 626 | +- **Embedding models**: Qwen3-Embedding-0.6B (text), CN-CLIP (images) | |
| 627 | 627 | - **Translation**: DeepL API |
| 628 | 628 | - **API framework**: FastAPI |
| 629 | 629 | - **Frontend**: HTML + JavaScript | ... | ... |
docs/系统设计文档v1.md
docs/索引字段说明v1.md
docs/索引字段说明v2.md
| ... | ... | @@ -352,7 +352,7 @@ |
| 352 | 352 | |
| 353 | 353 | | Field | ES Type | Dims | Description | Source | |
| 354 | 354 | |--------|--------|------|------|----------| |
| 355 | -| `title_embedding` | dense_vector | 1024 | Title embedding (semantic search) | Generated by the BGE-M3 model | | |
| 355 | +| `title_embedding` | dense_vector | 1024 | Title embedding (semantic search) | Generated by the Qwen3-Embedding-0.6B model | | |
| 356 | 356 | | `image_embedding` | nested | - | Image embeddings (image search) | Generated by the CN-CLIP model | |
| 357 | 357 | |
| 358 | 358 | **Note**: these fields are used only for search and are not returned to the frontend. |
| ... | ... | @@ -462,4 +462,4 @@ filters AND (text_recall OR embedding_recall) |
| 462 | 462 | 3. **Multilingual support**: text fields support Chinese and English; the backend selects automatically based on the `language` parameter |
| 463 | 463 | 4. **Specification facets**: `specifications` uses nested aggregations, grouped by `name` and then aggregated by `value` |
| 464 | 464 | 5. **Vector fields**: `title_embedding` and `image_embedding` are used only for search and are not returned to the frontend |
| 465 | - | |
| 466 | 465 | \ No newline at end of file |
| 466 | + | ... | ... |
embeddings/CLOUD_EMBEDDING_README.md deleted
| ... | ... | @@ -1,307 +0,0 @@ |
| 1 | -# Cloud Embedding Module - Release Notes | |
| 2 | - | |
| 3 | -## 📝 Overview | |
| 4 | - | |
| 5 | -This update adds cloud text embedding to the saas-search project via the Aliyun DashScope API, using the `text-embedding-v4` model. | |
| 6 | - | |
| 7 | -## 🎯 Main Features | |
| 8 | - | |
| 9 | -1. **CloudTextEncoder** - cloud text embedding encoder | |
| 10 | -   - Singleton pattern, thread-safe | |
| 11 | -   - Supports single and batch text embedding | |
| 12 | -   - Automatic error handling and fallback | |
| 13 | -   - Produces 1024-dim vectors | |
| 14 | - | |
| 15 | -2. **Test script** - exercises embedding with queries.txt | |
| 16 | -   - Reads the first 100 queries | |
| 17 | -   - Records timing and latency for each request | |
| 18 | -   - Reports success rate and performance metrics | |
| 19 | - | |
| 20 | -3. **Example code** - shows how to use the module | |
| 21 | -   - Single-text embedding | |
| 22 | -   - Batch processing | |
| 23 | -   - Similarity computation | |
| 24 | - | |
| 25 | -## 📁 File Layout | |
| 26 | - | |
| 27 | -``` | |
| 28 | -saas-search/ | |
| 29 | -├── embeddings/ | |
| 30 | -│   ├── cloud_text_encoder.py      # cloud embedding encoder (new) | |
| 31 | -│   ├── text_encoder.py            # local encoder (existing) | |
| 32 | -│   └── ... | |
| 33 | -├── scripts/ | |
| 34 | -│   ├── test_cloud_embedding.py    # test script (new) | |
| 35 | -│   └── ... | |
| 36 | -├── examples/ | |
| 37 | -│   └── cloud_embedding_example.py # example code (new) | |
| 38 | -├── docs/ | |
| 39 | -│   ├── cloud_embedding_usage.md   # detailed docs (new) | |
| 40 | -│   └── cloud_embedding_quickstart.md # quick start (new) | |
| 41 | -├── data_crawling/ | |
| 42 | -│   └── queries.txt                # test data | |
| 43 | -├── requirements.txt               # openai>=1.0.0 added | |
| 44 | -└── CLOUD_EMBEDDING_README.md      # this document (new) | |
| 45 | -``` | |
| 46 | - | |
| 47 | -## 🚀 Quick Start | |
| 48 | - | |
| 49 | -### 1. Install Dependencies | |
| 50 | - | |
| 51 | -```bash | |
| 52 | -pip install openai | |
| 53 | -``` | |
| 54 | - | |
| 55 | -Or use the project requirements: | |
| 56 | -```bash | |
| 57 | -pip install -r requirements.txt | |
| 58 | -``` | |
| 59 | - | |
| 60 | -### 2. Set the API Key | |
| 61 | - | |
| 62 | -```bash | |
| 63 | -export DASHSCOPE_API_KEY="sk-your-api-key-here" | |
| 64 | -``` | |
| 65 | - | |
| 66 | -Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key | |
| 67 | - | |
| 68 | -### 3. Run the Tests | |
| 69 | - | |
| 70 | -```bash | |
| 71 | -# Test embedding (first 100 entries of queries.txt) | |
| 72 | -python scripts/test_cloud_embedding.py | |
| 73 | - | |
| 74 | -# Run the example code | |
| 75 | -python examples/cloud_embedding_example.py | |
| 76 | -``` | |
| 77 | - | |
| 78 | -## 📖 Usage | |
| 79 | - | |
| 80 | -### Basic Usage | |
| 81 | - | |
| 82 | -```python | |
| 83 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 84 | - | |
| 85 | -# Initialize the encoder | |
| 86 | -encoder = CloudTextEncoder() | |
| 87 | - | |
| 88 | -# Embed a single text | |
| 89 | -embedding = encoder.encode("衣服的质量杠杠的") | |
| 90 | -print(embedding.shape)  # (1, 1024) | |
| 91 | - | |
| 92 | -# Embed a batch | |
| 93 | -embeddings = encoder.encode(["文本1", "文本2", "文本3"]) | |
| 94 | -print(embeddings.shape)  # (3, 1024) | |
| 95 | -``` | |
| 96 | - | |
| 97 | -### Batch Processing | |
| 98 | - | |
| 99 | -```python | |
| 100 | -# Large batches are chunked automatically | |
| 101 | -texts = [f"商品 {i}" for i in range(1000)] | |
| 102 | -embeddings = encoder.encode_batch(texts, batch_size=32) | |
| 103 | -``` | |
| 104 | - | |
| 105 | -## 🧪 Test Script | |
| 106 | - | |
| 107 | -What the test script `scripts/test_cloud_embedding.py` does: | |
| 108 | - | |
| 109 | -✅ Reads the first 100 queries from `data_crawling/queries.txt` | |
| 110 | -✅ Sends one embedding request per query | |
| 111 | -✅ Records send time, receive time, and duration for each request | |
| 112 | -✅ Prints embedding dimensions and contents | |
| 113 | -✅ Reports success rate, average duration, and throughput | |
| 114 | - | |
| 115 | -### Sample Test Output | |
| 116 | - | |
| 117 | -``` | |
| 118 | -================================================================================ | |
| 119 | -Cloud Text Embedding Test - Aliyun DashScope API | |
| 120 | -================================================================================ | |
| 121 | - | |
| 122 | -[ 1/100] ✓ SUCCESS | |
| 123 | - Query: Bohemian Maxi Dress | |
| 124 | - Send Time: 2025-12-05 10:30:45.123 | |
| 125 | - Receive Time: 2025-12-05 10:30:45.456 | |
| 126 | - Duration: 0.333s | |
| 127 | - Embedding Shape: (1, 1024) | |
| 128 | - | |
| 129 | -... | |
| 130 | - | |
| 131 | -================================================================================ | |
| 132 | -Test Summary | |
| 133 | -================================================================================ | |
| 134 | -Total Queries: 100 | |
| 135 | -Successful: 100 | |
| 136 | -Failed: 0 | |
| 137 | -Success Rate: 100.0% | |
| 138 | -Total Time: 35.123s | |
| 139 | -Total API Time: 32.456s | |
| 140 | -Average Duration: 0.325s per query | |
| 141 | -Throughput: 2.85 queries/second | |
| 142 | -================================================================================ | |
| 143 | -``` | |
| 144 | - | |
| 145 | -## 📊 Performance Profile | |
| 146 | - | |
| 147 | -- **Embedding dimension**: 1024 | |
| 148 | -- **Average latency**: 300-400ms per request | |
| 149 | -- **Throughput**: ~2-3 queries/second (single-threaded) | |
| 150 | -- **Error handling**: automatic fallback to zero vectors | |
| 151 | -- **Batching**: automatic chunking and rate control | |
| 152 | - | |
| 153 | -## 🔧 API Reference | |
| 154 | - | |
| 155 | -### CloudTextEncoder API | |
| 156 | - | |
| 157 | -#### Initialization | |
| 158 | - | |
| 159 | -```python | |
| 160 | -CloudTextEncoder(api_key=None, base_url=None) | |
| 161 | -``` | |
| 162 | - | |
| 163 | -Parameters: | |
| 164 | -- `api_key` (str, optional): API key; read from the environment by default | |
| 165 | -- `base_url` (str, optional): API endpoint; defaults to the Beijing region | |
| 166 | - | |
| 167 | -#### encode() | |
| 168 | - | |
| 169 | -```python | |
| 170 | -encode(sentences, normalize_embeddings=True, device='cpu', batch_size=32) | |
| 171 | -``` | |
| 172 | - | |
| 173 | -Parameters: | |
| 174 | -- `sentences` (str or List[str]): a single text or a list of texts | |
| 175 | -- `normalize_embeddings` (bool): whether to normalize (handled by the API) | |
| 176 | -- `device` (str): device parameter (compatibility only; ignored by the cloud API) | |
| 177 | -- `batch_size` (int): batch size | |
| 178 | - | |
| 179 | -Returns: | |
| 180 | -- `np.ndarray`: numpy array of shape (n, 1024) | |
| 181 | - | |
| 182 | -#### encode_batch() | |
| 183 | - | |
| 184 | -```python | |
| 185 | -encode_batch(texts, batch_size=32, device='cpu') | |
| 186 | -``` | |
| 187 | - | |
| 188 | -Parameters: | |
| 189 | -- `texts` (List[str]): list of texts | |
| 190 | -- `batch_size` (int): batch size | |
| 191 | -- `device` (str): device parameter (compatibility only) | |
| 192 | - | |
| 193 | -Returns: | |
| 194 | -- `np.ndarray`: embedding matrix | |
| 195 | - | |
| 196 | -## 📚 Documentation | |
| 197 | - | |
| 198 | -- **Quick start**: `docs/cloud_embedding_quickstart.md` | |
| 199 | -- **Detailed docs**: `docs/cloud_embedding_usage.md` | |
| 200 | -- **Example code**: `examples/cloud_embedding_example.py` | |
| 201 | - | |
| 202 | -## ⚠️ Caveats | |
| 203 | - | |
| 204 | -1. **API key handling**: guard the API key; never commit it to the repository | |
| 205 | -2. **Cost control**: the cloud API bills by usage; keep call volume in check | |
| 206 | -3. **Rate limits**: mind the API rate limits; the test script already adds delays | |
| 207 | -4. **Network dependency**: a stable network connection is required | |
| 208 | -5. **Error handling**: failed API calls return zero vectors; check the logs | |
| 209 | - | |
| 210 | -## 🆚 Comparison with the Local Encoder | |
| 211 | - | |
| 212 | -| Feature | CloudTextEncoder | BgeEncoder (local) | | |
| 213 | -|------|------------------|-------------------| | |
| 214 | -| Deployment | Cloud API | Local service | | |
| 215 | -| Upfront cost | Low | High (GPU/CPU) | | |
| 216 | -| Running cost | Pay per use | Fixed | | |
| 217 | -| Latency | ~300-400ms | <100ms | | |
| 218 | -| Offline use | ❌ | ✅ | | |
| 219 | -| Maintenance | Low | Service upkeep required | | |
| 220 | -| Scalability | Automatic | Manual | | |
| 221 | - | |
| 222 | -## 🔄 Integration Advice | |
| 223 | - | |
| 224 | -### Choosing a Scenario | |
| 225 | - | |
| 226 | -**Use CloudTextEncoder (cloud) for:** | |
| 227 | -- Early development and testing | |
| 228 | -- Applications with modest query volume | |
| 229 | -- No offline requirement | |
| 230 | -- Lower ops cost | |
| 231 | - | |
| 232 | -**Use BgeEncoder (local) for:** | |
| 233 | -- Large-scale production | |
| 234 | -- Low-latency requirements | |
| 235 | -- Offline scenarios | |
| 236 | -- Very high query volume | |
| 237 | - | |
| 238 | -### Mixed Use | |
| 239 | - | |
| 240 | -```python | |
| 241 | -# Choose the encoder type via configuration | |
| 242 | -ENCODER_TYPE = os.getenv("ENCODER_TYPE", "local")  # local or cloud | |
| 243 | - | |
| 244 | -if ENCODER_TYPE == "cloud": | |
| 245 | -    from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 246 | -    encoder = CloudTextEncoder() | |
| 247 | -else: | |
| 248 | -    from embeddings.text_encoder import BgeEncoder | |
| 249 | -    encoder = BgeEncoder() | |
| 250 | - | |
| 251 | -# Use the unified interface | |
| 252 | -embeddings = encoder.encode(texts) | |
| 253 | -``` | |
| 254 | - | |
| 255 | -## 🐛 Troubleshooting | |
| 256 | - | |
| 257 | -### Issue 1: API key not set | |
| 258 | -```bash | |
| 259 | -export DASHSCOPE_API_KEY="sk-your-key" | |
| 260 | -``` | |
| 261 | - | |
| 262 | -### Issue 2: network connection failures | |
| 263 | -- Check the network connection | |
| 264 | -- Verify that base_url is correct | |
| 265 | -- Confirm firewall settings | |
| 266 | - | |
| 267 | -### Issue 3: rate limiting | |
| 268 | -- Reduce batch_size | |
| 269 | -- Increase the interval between requests | |
| 270 | -- Upgrade the API plan | |
| 271 | - | |
| 272 | -### Issue 4: zero vectors returned | |
| 273 | -- Check the error messages in the logs | |
| 274 | -- Verify that the API key is valid | |
| 275 | -- Confirm the account balance | |
| 276 | - | |
| 277 | -## 🎓 Example Code | |
| 278 | - | |
| 279 | -See `examples/cloud_embedding_example.py` for complete examples: | |
| 280 | -- Single/batch text embedding | |
| 281 | -- Similarity computation | |
| 282 | -- Error handling | |
| 283 | - | |
| 284 | -## 📞 Support | |
| 285 | - | |
| 286 | -- Project docs: the `docs/` directory | |
| 287 | -- Aliyun docs: https://help.aliyun.com/zh/model-studio/ | |
| 288 | -- API docs: https://help.aliyun.com/zh/model-studio/getting-started/models | |
| 289 | - | |
| 290 | -## ✅ Verification Checklist | |
| 291 | - | |
| 292 | -Complete these steps to confirm the module works: | |
| 293 | - | |
| 294 | -- [ ] openai package installed | |
| 295 | -- [ ] DASHSCOPE_API_KEY environment variable set | |
| 296 | -- [ ] Test script ran successfully | |
| 297 | -- [ ] Example code reviewed | |
| 298 | -- [ ] Documentation read | |
| 299 | - | |
| 300 | -## 📅 Update Date | |
| 301 | - | |
| 302 | -2025-12-05 | |
| 303 | - | |
| 304 | -## 👨💻 Maintenance | |
| 305 | - | |
| 306 | -For issues or suggestions, contact the project maintainers. | |
| 307 | - |
embeddings/README.md
| ... | ... | @@ -7,12 +7,20 @@ |
| 7 | 7 | This directory is a complete "embedding module", containing: |
| 8 | 8 | |
| 9 | 9 | - **HTTP clients**: `text_encoder.py` / `image_encoder.py` (called by the search/indexing modules) |
| 10 | -- **Local model implementations**: `bge_model.py` / `clip_model.py` | |
| 10 | +- **Local model implementations**: `qwen3_model.py` / `clip_model.py` | |
| 11 | 11 | - **clip-as-service client**: `clip_as_service_encoder.py` (image embeddings, recommended) |
| 12 | 12 | - **Embedding service (FastAPI)**: `server.py` |
| 13 | 13 | - **Unified config**: `config.py` |
| 14 | 14 | - **Interface contract**: `protocols.ImageEncoderProtocol` (image encoding is unified as `encode_image_urls(urls, batch_size)`; both local CN-CLIP and clip-as-service implement it) |
| 15 | 15 | |
| 16 | +Note: the historical cloud embedding experiment (DashScope) has been removed from the main repo; only the unified 6005 embedding service path is maintained now. | |
| 17 | + | |
| 18 | +### Text Embedding Backend (default) | |
| 19 | + | |
| 20 | +- Default model of the 6005 text embedding service: `Qwen/Qwen3-Embedding-0.6B` | |
| 21 | +- Implementation: `SentenceTransformer` (`trust_remote_code=True`) | |
| 22 | +- Embedding normalization: on by default (can be disabled via an environment variable) | |
| 23 | + | |
| 16 24 | ### Service Endpoints |
| 17 25 | |
| 18 26 | - `POST /embed/text` |
| ... | ... | @@ -54,7 +62,6 @@ |
| 54 | 62 | Edit `embeddings/config.py`: |
| 55 | 63 | |
| 56 | 64 | - `PORT`: service port (default 6005) |
| 57 | -- `TEXT_MODEL_DIR`, `TEXT_DEVICE`, `TEXT_BATCH_SIZE` | |
| 65 | +- `TEXT_MODEL_ID`, `TEXT_DEVICE`, `TEXT_BATCH_SIZE`, `TEXT_NORMALIZE_EMBEDDINGS` | |
| 58 66 | - `USE_CLIP_AS_SERVICE`, `CLIP_AS_SERVICE_SERVER`: image embeddings (clip-as-service) |
| 59 67 | - `IMAGE_MODEL_NAME`, `IMAGE_DEVICE`: local CN-CLIP (when `USE_CLIP_AS_SERVICE=false`) |
| 60 | - | ... | ... |
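For reference, a minimal sketch of calling the 6005 text endpoint directly. It assumes the bare JSON-array request body that `text_encoder.py` posts via `requests.post(endpoint, json=...)`; verify against your deployment:

```python
import requests

# Minimal sketch: query the local embedding service directly.
resp = requests.post(
    "http://127.0.0.1:6005/embed/text",
    json=["bohemian maxi dress", "vintage denim jacket"],
    timeout=10,
)
resp.raise_for_status()
vectors = resp.json()  # one 1024-dim list per input, or null on failure
print(len(vectors), len(vectors[0]))
```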
embeddings/__init__.py
| ... | ... | @@ -4,7 +4,8 @@ Embeddings module. |
| 4 | 4 | Important: keep package import lightweight. |
| 5 | 5 | |
| 6 | 6 | Some callers do: |
| 7 | - - `from embeddings import BgeEncoder` | |
| 7 | + - `from embeddings import TextEmbeddingEncoder` | |
| 8 | + - `from embeddings import BgeEncoder` (deprecated alias) | |
| 8 | 9 | - `from embeddings import CLIPImageEncoder` |
| 9 | 10 | |
| 10 | 11 | But the underlying implementations may import heavy optional deps (Pillow, torch, etc). |
| ... | ... | @@ -13,15 +14,19 @@ without importing client code), we provide small lazy factories here. |
| 13 | 14 | """ |
| 14 | 15 | |
| 15 | 16 | |
| 16 | -class BgeEncoder(object): | |
| 17 | - """Lazy factory for `embeddings.text_encoder.BgeEncoder`.""" | |
| 17 | +class TextEmbeddingEncoder(object): | |
| 18 | + """Lazy factory for `embeddings.text_encoder.TextEmbeddingEncoder`.""" | |
| 18 | 19 | |
| 19 | 20 | def __new__(cls, *args, **kwargs): |
| 20 | - from .text_encoder import BgeEncoder as _Real | |
| 21 | + from .text_encoder import TextEmbeddingEncoder as _Real | |
| 21 | 22 | |
| 22 | 23 | return _Real(*args, **kwargs) |
| 23 | 24 | |
| 24 | 25 | |
| 26 | +class BgeEncoder(TextEmbeddingEncoder): | |
| 27 | + """Deprecated backward-compatible alias for old class name.""" | |
| 28 | + | |
| 29 | + | |
| 25 | 30 | class CLIPImageEncoder(object): |
| 26 | 31 | """Lazy factory for `embeddings.image_encoder.CLIPImageEncoder`.""" |
| 27 | 32 | |
| ... | ... | @@ -31,4 +36,4 @@ class CLIPImageEncoder(object): |
| 31 | 36 | return _Real(*args, **kwargs) |
| 32 | 37 | |
| 33 | 38 | |
| 34 | -__all__ = ["BgeEncoder", "CLIPImageEncoder"] | |
| 39 | +__all__ = ["TextEmbeddingEncoder", "BgeEncoder", "CLIPImageEncoder"] | ... | ... |
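A short usage sketch of the lazy factories after the rename; both names resolve to the same underlying client (instantiation requires the embedding service config to be importable):

```python
from embeddings import TextEmbeddingEncoder, BgeEncoder

enc = TextEmbeddingEncoder()  # lazily resolves to text_encoder.TextEmbeddingEncoder
legacy = BgeEncoder()         # deprecated alias; yields the same class
assert type(enc) is type(legacy)

# encode() goes through the 6005 HTTP service and returns a numpy array.
emb = enc.encode(["summer dress"])
```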
embeddings/cloud_embedding_usage.md deleted
| ... | ... | @@ -1,231 +0,0 @@ |
| 1 | -# Cloud Embedding Module - Usage Guide | |
| 2 | - | |
| 3 | -## Overview | |
| 4 | - | |
| 5 | -This project adds a text embedding module based on the Aliyun DashScope API, generating vectors with the `text-embedding-v4` model. | |
| 6 | - | |
| 7 | -## Modules | |
| 8 | - | |
| 9 | -### 1. CloudTextEncoder (`embeddings/cloud_text_encoder.py`) | |
| 10 | - | |
| 11 | -Cloud text embedding encoder backed by the Aliyun DashScope API. | |
| 12 | - | |
| 13 | -**Features:** | |
| 14 | -- Singleton pattern, thread-safe | |
| 15 | -- Supports single or batch text embedding | |
| 16 | -- Handles API calls and errors automatically | |
| 17 | -- Produces 1024-dim vectors | |
| 18 | -- Batches requests to avoid API rate limits | |
| 19 | - | |
| 20 | -**Initialization:** | |
| 21 | -```python | |
| 22 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 23 | - | |
| 24 | -# Option 1: read the API key from the environment | |
| 25 | -encoder = CloudTextEncoder() | |
| 26 | - | |
| 27 | -# Option 2: pass the API key explicitly | |
| 28 | -encoder = CloudTextEncoder(api_key="sk-xxx") | |
| 29 | - | |
| 30 | -# Option 3: use the Singapore region | |
| 31 | -encoder = CloudTextEncoder( | |
| 32 | -    api_key="sk-xxx", | |
| 33 | -    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1" | |
| 34 | -) | |
| 35 | -``` | |
| 36 | - | |
| 37 | -**Usage example:** | |
| 38 | -```python | |
| 39 | -# Embed a single text | |
| 40 | -text = "衣服的质量杠杠的" | |
| 41 | -embedding = encoder.encode(text) | |
| 42 | -print(embedding.shape)  # (1, 1024) | |
| 43 | - | |
| 44 | -# Embed a batch of texts | |
| 45 | -texts = ["文本1", "文本2", "文本3"] | |
| 46 | -embeddings = encoder.encode(texts) | |
| 47 | -print(embeddings.shape)  # (3, 1024) | |
| 48 | - | |
| 49 | -# Large batch (chunked automatically) | |
| 50 | -large_texts = ["文本" + str(i) for i in range(1000)] | |
| 51 | -embeddings = encoder.encode_batch(large_texts, batch_size=32) | |
| 52 | -print(embeddings.shape)  # (1000, 1024) | |
| 53 | -``` | |
| 54 | - | |
| 55 | -### 2. Test Script (`scripts/test_cloud_embedding.py`) | |
| 56 | - | |
| 57 | -Tests cloud embedding by reading the first 100 entries of `queries.txt`. | |
| 58 | - | |
| 59 | -**Behavior:** | |
| 60 | -- Reads the query file | |
| 61 | -- Sends one embedding request per query | |
| 62 | -- Records send time, receive time, and duration for each request | |
| 63 | -- Reports success rate and average duration | |
| 64 | - | |
| 65 | -## Usage Steps | |
| 66 | - | |
| 67 | -### 1. Set the API Key | |
| 68 | - | |
| 69 | -First obtain an Aliyun DashScope API key: | |
| 70 | -- Beijing region: https://help.aliyun.com/zh/model-studio/get-api-key | |
| 71 | -- Singapore region: uses a different API key | |
| 72 | - | |
| 73 | -Set the environment variable: | |
| 74 | -```bash | |
| 75 | -export DASHSCOPE_API_KEY="your-api-key-here" | |
| 76 | -``` | |
| 77 | - | |
| 78 | -Or add it to the `.env` file in the project root: | |
| 79 | -``` | |
| 80 | -DASHSCOPE_API_KEY=your-api-key-here | |
| 81 | -``` | |
| 82 | - | |
| 83 | -### 2. Install Dependencies | |
| 84 | - | |
| 85 | -Make sure the OpenAI Python SDK is installed: | |
| 86 | -```bash | |
| 87 | -pip install openai | |
| 88 | -``` | |
| 89 | - | |
| 90 | -Or use the project's requirements.txt: | |
| 91 | -```bash | |
| 92 | -pip install -r requirements.txt | |
| 93 | -``` | |
| 94 | - | |
| 95 | -### 3. Run the Test Script | |
| 96 | - | |
| 97 | -```bash | |
| 98 | -# Run from the project root | |
| 99 | -python scripts/test_cloud_embedding.py | |
| 100 | -``` | |
| 101 | - | |
| 102 | -### 4. Review the Results | |
| 103 | - | |
| 104 | -The test script prints the following: | |
| 105 | -``` | |
| 106 | -================================================================================ | |
| 107 | -Cloud Text Embedding Test - Aliyun DashScope API | |
| 108 | -================================================================================ | |
| 109 | - | |
| 110 | -API Key: sk-xxxxxx...xxxx | |
| 111 | - | |
| 112 | -Reading queries from: /path/to/queries.txt | |
| 113 | -Successfully read 100 queries | |
| 114 | - | |
| 115 | -Initializing CloudTextEncoder... | |
| 116 | -CloudTextEncoder initialized successfully | |
| 117 | - | |
| 118 | -================================================================================ | |
| 119 | -Testing 100 queries (one by one) | |
| 120 | -================================================================================ | |
| 121 | - | |
| 122 | -[ 1/100] ✓ SUCCESS | |
| 123 | - Query: Bohemian Maxi Dress | |
| 124 | - Send Time: 2025-12-05 10:30:45.123 | |
| 125 | - Receive Time: 2025-12-05 10:30:45.456 | |
| 126 | - Duration: 0.333s | |
| 127 | - Embedding Shape: (1, 1024) | |
| 128 | - | |
| 129 | -[ 2/100] ✓ SUCCESS | |
| 130 | - Query: Vintage Denim Jacket | |
| 131 | - Send Time: 2025-12-05 10:30:45.789 | |
| 132 | - Receive Time: 2025-12-05 10:30:46.012 | |
| 133 | - Duration: 0.223s | |
| 134 | - Embedding Shape: (1, 1024) | |
| 135 | - | |
| 136 | -... | |
| 137 | - | |
| 138 | -================================================================================ | |
| 139 | -Test Summary | |
| 140 | -================================================================================ | |
| 141 | -Total Queries: 100 | |
| 142 | -Successful: 100 | |
| 143 | -Failed: 0 | |
| 144 | -Success Rate: 100.0% | |
| 145 | -Total Time: 35.123s | |
| 146 | -Total API Time: 32.456s | |
| 147 | -Average Duration: 0.325s per query | |
| 148 | -Throughput: 2.85 queries/second | |
| 149 | -================================================================================ | |
| 150 | -``` | |
| 151 | - | |
| 152 | -## Caveats | |
| 153 | - | |
| 154 | -1. **API limits**: the Aliyun DashScope API is rate-limited; the test script adds small delays between batches to avoid tripping them. | |
| 155 | - | |
| 156 | -2. **Cost**: cloud API usage is billed; keep call volume in check. | |
| 157 | - | |
| 158 | -3. **Network**: a stable connection to Aliyun services is required. | |
| 159 | - | |
| 160 | -4. **Error handling**: if an API call fails, the encoder falls back to zero vectors and logs the error. | |
| 161 | - | |
| 162 | -5. **Embedding dimension**: `text-embedding-v4` produces 1024-dim vectors; for a different dimension, consider another model. | |
| 163 | - | |
| 164 | -## Region Selection | |
| 165 | - | |
| 166 | -- **Beijing region** (default): `https://dashscope.aliyuncs.com/compatible-mode/v1` | |
| 167 | -- **Singapore region**: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` | |
| 168 | - | |
| 169 | -Each region uses its own API keys; make sure they match. | |
| 170 | - | |
| 171 | -## Integrating into the Project | |
| 172 | - | |
| 173 | -Using cloud embedding within the project: | |
| 174 | - | |
| 175 | -```python | |
| 176 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 177 | - | |
| 178 | -# Initialize the encoder (global singleton) | |
| 179 | -encoder = CloudTextEncoder() | |
| 180 | - | |
| 181 | -# Use it in search, indexing, and other modules | |
| 182 | -def process_text(text: str): | |
| 183 | -    embedding = encoder.encode(text) | |
| 184 | -    # Continue processing with the embedding | |
| 185 | -    return embedding | |
| 186 | -``` | |
| 187 | - | |
| 188 | -## Comparison with the Local Encoder | |
| 189 | - | |
| 190 | -| Feature | CloudTextEncoder | BgeEncoder (local) | | |
| 191 | -|------|------------------|-------------------| | |
| 192 | -| Deployment | Cloud API | Local service | | |
| 193 | -| Upfront cost | Low (pay per use) | High (GPU/CPU resources) | | |
| 194 | -| Running cost | Billed per call | Fixed resource cost | | |
| 195 | -| Latency | Higher (network round trip) | Low (local processing) | | |
| 196 | -| Throughput | Limited by the API | Limited by hardware | | |
| 197 | -| Offline use | Not supported | Supported | | |
| 198 | -| Maintenance | Low | Service upkeep required | | |
| 199 | - | |
| 200 | -## Troubleshooting | |
| 201 | - | |
| 202 | -### Issue: API key not set | |
| 203 | -``` | |
| 204 | -ERROR: DASHSCOPE_API_KEY environment variable is not set! | |
| 205 | -``` | |
| 206 | -**Fix**: set the `DASHSCOPE_API_KEY` environment variable. | |
| 207 | - | |
| 208 | -### Issue: API calls fail | |
| 209 | -``` | |
| 210 | -Failed to encode texts via DashScope API: ... | |
| 211 | -``` | |
| 212 | -**Fix**: | |
| 213 | -1. Check the network connection | |
| 214 | -2. Verify that the API key is correct | |
| 215 | -3. Confirm that base_url matches the API key's region | |
| 216 | -4. Check the account balance | |
| 217 | - | |
| 218 | -### Issue: rate limit exceeded | |
| 219 | -``` | |
| 220 | -Rate limit exceeded | |
| 221 | -``` | |
| 222 | -**Fix**: | |
| 223 | -1. Increase the delay between batches | |
| 224 | -2. Reduce batch_size | |
| 225 | -3. Upgrade the API plan | |
| 226 | - | |
| 227 | -## More Information | |
| 228 | - | |
| 229 | -- [Aliyun Model Studio documentation](https://help.aliyun.com/zh/model-studio/) | |
| 230 | -- [text-embedding-v4 model notes](https://help.aliyun.com/zh/model-studio/getting-started/models) | |
| 231 | - |
embeddings/cloud_text_encoder.py deleted
| ... | ... | @@ -1,142 +0,0 @@ |
| 1 | -""" | |
| 2 | -Text embedding encoder using Aliyun DashScope API. | |
| 3 | - | |
| 4 | -Generates embeddings via Aliyun's text-embedding-v4 model. | |
| 5 | -""" | |
| 6 | - | |
| 7 | -import os | |
| 8 | -import logging | |
| 9 | -import threading | |
| 10 | -import time | |
| 11 | -import numpy as np | |
| 12 | -from typing import List, Union | |
| 13 | -from openai import OpenAI | |
| 14 | - | |
| 15 | -logger = logging.getLogger(__name__) | |
| 16 | - | |
| 17 | - | |
| 18 | -class CloudTextEncoder: | |
| 19 | - """ | |
| 20 | - Singleton text encoder using Aliyun DashScope API. | |
| 21 | - | |
| 22 | - Thread-safe singleton pattern ensures only one instance exists. | |
| 23 | - Uses text-embedding-v4 model for generating embeddings. | |
| 24 | - """ | |
| 25 | - _instance = None | |
| 26 | - _lock = threading.Lock() | |
| 27 | - | |
| 28 | - def __new__(cls, api_key: str = None, base_url: str = None): | |
| 29 | - with cls._lock: | |
| 30 | - if cls._instance is None: | |
| 31 | - cls._instance = super(CloudTextEncoder, cls).__new__(cls) | |
| 32 | - | |
| 33 | - # Get API key from parameter or environment variable | |
| 34 | - api_key = api_key or os.getenv("DASHSCOPE_API_KEY") | |
| 35 | - if not api_key: | |
| 36 | - raise ValueError("DASHSCOPE_API_KEY must be set in environment or passed as parameter") | |
| 37 | - | |
| 38 | -                # This is the Beijing-region base_url; for Singapore-region models, replace base_url with https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | |
| 39 | - base_url = base_url or "https://dashscope.aliyuncs.com/compatible-mode/v1" | |
| 40 | - | |
| 41 | - cls._instance.client = OpenAI( | |
| 42 | - api_key=api_key, | |
| 43 | - base_url=base_url | |
| 44 | - ) | |
| 45 | - cls._instance.model = "text-embedding-v4" | |
| 46 | - logger.info(f"Created CloudTextEncoder instance with base_url: {base_url}") | |
| 47 | - | |
| 48 | - return cls._instance | |
| 49 | - | |
| 50 | - def encode( | |
| 51 | - self, | |
| 52 | - sentences: Union[str, List[str]], | |
| 53 | - normalize_embeddings: bool = True, | |
| 54 | - device: str = 'cpu', | |
| 55 | - batch_size: int = 32 | |
| 56 | - ) -> np.ndarray: | |
| 57 | - """ | |
| 58 | - Encode text into embeddings via Aliyun DashScope API. | |
| 59 | - | |
| 60 | - Args: | |
| 61 | - sentences: Single string or list of strings to encode | |
| 62 | - normalize_embeddings: Whether to normalize embeddings (handled by API) | |
| 63 | - device: Device parameter (ignored, for compatibility) | |
| 64 | - batch_size: Batch size for processing (currently processes all at once) | |
| 65 | - | |
| 66 | - Returns: | |
| 67 | - numpy array of shape (n, dimension) containing embeddings | |
| 68 | - """ | |
| 69 | - # Convert single string to list | |
| 70 | - if isinstance(sentences, str): | |
| 71 | - sentences = [sentences] | |
| 72 | - | |
| 73 | - if not sentences: | |
| 74 | - return np.array([]) | |
| 75 | - | |
| 76 | - try: | |
| 77 | - # Call DashScope API | |
| 78 | - start_time = time.time() | |
| 79 | - completion = self.client.embeddings.create( | |
| 80 | - model=self.model, | |
| 81 | - input=sentences | |
| 82 | - ) | |
| 83 | - elapsed_time = time.time() - start_time | |
| 84 | - | |
| 85 | - logger.info(f"Generated embeddings for {len(sentences)} texts in {elapsed_time:.3f}s") | |
| 86 | - | |
| 87 | - # Extract embeddings from response | |
| 88 | - embeddings = [] | |
| 89 | - for item in completion.data: | |
| 90 | - embeddings.append(item.embedding) | |
| 91 | - | |
| 92 | - return np.array(embeddings, dtype=np.float32) | |
| 93 | - | |
| 94 | - except Exception as e: | |
| 95 | - logger.error(f"Failed to encode texts via DashScope API: {e}", exc_info=True) | |
| 96 | - # Return zero embeddings as fallback (dimension based on text-embedding-v4) | |
| 97 | - # text-embedding-v4 typically returns 1024-dimensional vectors | |
| 98 | - return np.zeros((len(sentences), 1024), dtype=np.float32) | |
| 99 | - | |
| 100 | - def encode_batch( | |
| 101 | - self, | |
| 102 | - texts: List[str], | |
| 103 | - batch_size: int = 32, | |
| 104 | - device: str = 'cpu' | |
| 105 | - ) -> np.ndarray: | |
| 106 | - """ | |
| 107 | - Encode a batch of texts via Aliyun DashScope API. | |
| 108 | - | |
| 109 | - Args: | |
| 110 | - texts: List of texts to encode | |
| 111 | - batch_size: Batch size for processing | |
| 112 | - device: Device parameter (ignored, for compatibility) | |
| 113 | - | |
| 114 | - Returns: | |
| 115 | - numpy array of embeddings | |
| 116 | - """ | |
| 117 | - if not texts: | |
| 118 | - return np.array([]) | |
| 119 | - | |
| 120 | - # Process in batches to avoid API limits | |
| 121 | - all_embeddings = [] | |
| 122 | - | |
| 123 | - for i in range(0, len(texts), batch_size): | |
| 124 | - batch = texts[i:i + batch_size] | |
| 125 | - embeddings = self.encode(batch, device=device) | |
| 126 | - all_embeddings.append(embeddings) | |
| 127 | - | |
| 128 | - # Small delay to avoid rate limiting | |
| 129 | - if i + batch_size < len(texts): | |
| 130 | - time.sleep(0.1) | |
| 131 | - | |
| 132 | - return np.vstack(all_embeddings) if all_embeddings else np.array([]) | |
| 133 | - | |
| 134 | - def get_embedding_dimension(self) -> int: | |
| 135 | - """ | |
| 136 | - Get the dimension of embeddings produced by this encoder. | |
| 137 | - | |
| 138 | - Returns: | |
| 139 | - Embedding dimension (1024 for text-embedding-v4) | |
| 140 | - """ | |
| 141 | - return 1024 | |
| 142 | - |
embeddings/config.py
| ... | ... | @@ -16,10 +16,13 @@ class EmbeddingConfig(object): |
| 16 | 16 | HOST = os.getenv("EMBEDDING_HOST", "0.0.0.0") |
| 17 | 17 | PORT = int(os.getenv("EMBEDDING_PORT", 6005)) |
| 18 | 18 | |
| 19 | - # Text embeddings (BGE-M3) | |
| 20 | - TEXT_MODEL_DIR = "Xorbits/bge-m3" | |
| 21 | - TEXT_DEVICE = "cuda" # "cuda" or "cpu" (model may fall back to CPU if needed) | |
| 22 | - TEXT_BATCH_SIZE = 32 | |
| 19 | + # Text embeddings (Qwen3-Embedding-0.6B, Sentence Transformers) | |
| 20 | + TEXT_MODEL_ID = os.getenv("TEXT_MODEL_ID", "Qwen/Qwen3-Embedding-0.6B") | |
| 21 | + # Backward-compatible alias for old naming in docs/scripts. | |
| 22 | + TEXT_MODEL_DIR = TEXT_MODEL_ID | |
| 23 | + TEXT_DEVICE = os.getenv("TEXT_DEVICE", "cuda") # "cuda" or "cpu" | |
| 24 | + TEXT_BATCH_SIZE = int(os.getenv("TEXT_BATCH_SIZE", "32")) | |
| 25 | + TEXT_NORMALIZE_EMBEDDINGS = os.getenv("TEXT_NORMALIZE_EMBEDDINGS", "true").lower() in ("1", "true", "yes") | |
| 23 | 26 | |
| 24 | 27 | # Image embeddings |
| 25 | 28 | # Option A: clip-as-service (Jina CLIP server, recommended) |
| ... | ... | @@ -36,4 +39,3 @@ class EmbeddingConfig(object): |
| 36 | 39 | |
| 37 | 40 | CONFIG = EmbeddingConfig() |
| 38 | 41 | |
| 39 | - | ... | ... |
embeddings/image_encoder.py
| 1 | -""" | |
| 2 | -Image embedding encoder using network service. | |
| 1 | +"""Image embedding client for the local embedding HTTP service.""" | |
| 3 | 2 | |
| 4 | -Generates embeddings via HTTP API service (default localhost:6005). | |
| 5 | -""" | |
| 6 | - | |
| 7 | -import sys | |
| 8 | 3 | import os |
| 9 | -import requests | |
| 4 | +import logging | |
| 5 | +from typing import Any, List, Optional, Union | |
| 6 | + | |
| 10 | 7 | import numpy as np |
| 8 | +import requests | |
| 11 | 9 | from PIL import Image |
| 12 | -import logging | |
| 13 | -import threading | |
| 14 | -from typing import List, Optional, Union, Dict, Any | |
| 15 | 10 | |
| 16 | 11 | logger = logging.getLogger(__name__) |
| 17 | 12 | |
| ... | ... | @@ -22,21 +17,14 @@ class CLIPImageEncoder: |
| 22 | 17 | """ |
| 23 | 18 | Image Encoder for generating image embeddings using network service. |
| 24 | 19 | |
| 25 | - Thread-safe singleton pattern. | |
| 20 | + This client is stateless and safe to instantiate per caller. | |
| 26 | 21 | """ |
| 27 | 22 | |
| 28 | - _instance = None | |
| 29 | - _lock = threading.Lock() | |
| 30 | - | |
| 31 | - def __new__(cls, service_url: Optional[str] = None): | |
| 32 | - with cls._lock: | |
| 33 | - if cls._instance is None: | |
| 34 | - cls._instance = super(CLIPImageEncoder, cls).__new__(cls) | |
| 35 | - resolved_url = service_url or os.getenv("EMBEDDING_SERVICE_URL") or get_embedding_base_url() | |
| 36 | - logger.info(f"Creating CLIPImageEncoder instance with service URL: {resolved_url}") | |
| 37 | - cls._instance.service_url = resolved_url | |
| 38 | - cls._instance.endpoint = f"{resolved_url}/embed/image" | |
| 39 | - return cls._instance | |
| 23 | + def __init__(self, service_url: Optional[str] = None): | |
| 24 | + resolved_url = service_url or os.getenv("EMBEDDING_SERVICE_URL") or get_embedding_base_url() | |
| 25 | + self.service_url = str(resolved_url).rstrip("/") | |
| 26 | + self.endpoint = f"{self.service_url}/embed/image" | |
| 27 | + logger.info("Creating CLIPImageEncoder instance with service URL: %s", self.service_url) | |
| 40 | 28 | |
| 41 | 29 | def _call_service(self, request_data: List[str]) -> List[Any]: |
| 42 | 30 | """ | ... | ... |
embeddings/bge_model.py renamed to embeddings/qwen3_model.py
| 1 | 1 | """ |
| 2 | -BGE-M3 local text embedding implementation. | |
| 2 | +Qwen3-Embedding-0.6B local text embedding implementation. | |
| 3 | 3 | |
| 4 | 4 | Internal model implementation used by the embedding service. |
| 5 | 5 | """ |
| ... | ... | @@ -9,22 +9,21 @@ from typing import List, Union |
| 9 | 9 | |
| 10 | 10 | import numpy as np |
| 11 | 11 | from sentence_transformers import SentenceTransformer |
| 12 | -from modelscope import snapshot_download | |
| 13 | 12 | |
| 14 | 13 | |
| 15 | -class BgeTextModel(object): | |
| 14 | +class Qwen3TextModel(object): | |
| 16 | 15 | """ |
| 17 | - Thread-safe singleton text encoder using BGE-M3 model (local inference). | |
| 16 | + Thread-safe singleton text encoder using Qwen3-Embedding-0.6B (local inference). | |
| 18 | 17 | """ |
| 19 | 18 | |
| 20 | 19 | _instance = None |
| 21 | 20 | _lock = threading.Lock() |
| 22 | 21 | |
| 23 | - def __new__(cls, model_dir: str = "Xorbits/bge-m3"): | |
| 22 | + def __new__(cls, model_id: str = "Qwen/Qwen3-Embedding-0.6B"): | |
| 24 | 23 | with cls._lock: |
| 25 | 24 | if cls._instance is None: |
| 26 | - cls._instance = super(BgeTextModel, cls).__new__(cls) | |
| 27 | - cls._instance.model = SentenceTransformer(snapshot_download(model_dir)) | |
| 25 | + cls._instance = super(Qwen3TextModel, cls).__new__(cls) | |
| 26 | + cls._instance.model = SentenceTransformer(model_id, trust_remote_code=True) | |
| 28 | 27 | return cls._instance |
| 29 | 28 | |
| 30 | 29 | def encode( |
| ... | ... | @@ -37,7 +36,7 @@ class BgeTextModel(object): |
| 37 | 36 | if device == "gpu": |
| 38 | 37 | device = "cuda" |
| 39 | 38 | |
| 40 | - # Try requested device, fallback to CPU if CUDA fails | |
| 39 | + # Try requested device, fallback to CPU if CUDA is unavailable/insufficient. | |
| 41 | 40 | try: |
| 42 | 41 | if device == "cuda": |
| 43 | 42 | import torch |
| ... | ... | @@ -61,7 +60,6 @@ class BgeTextModel(object): |
| 61 | 60 | batch_size=batch_size, |
| 62 | 61 | ) |
| 63 | 62 | return embeddings |
| 64 | - | |
| 65 | 63 | except Exception: |
| 66 | 64 | if device != "cpu": |
| 67 | 65 | self.model = self.model.to("cpu") |
| ... | ... | @@ -75,7 +73,17 @@ class BgeTextModel(object): |
| 75 | 73 | return embeddings |
| 76 | 74 | raise |
| 77 | 75 | |
| 78 | - def encode_batch(self, texts: List[str], batch_size: int = 32, device: str = "cuda") -> np.ndarray: | |
| 79 | - return self.encode(texts, batch_size=batch_size, device=device) | |
| 80 | - | |
| 76 | + def encode_batch( | |
| 77 | + self, | |
| 78 | + texts: List[str], | |
| 79 | + batch_size: int = 32, | |
| 80 | + device: str = "cuda", | |
| 81 | + normalize_embeddings: bool = True, | |
| 82 | + ) -> np.ndarray: | |
| 83 | + return self.encode( | |
| 84 | + texts, | |
| 85 | + batch_size=batch_size, | |
| 86 | + device=device, | |
| 87 | + normalize_embeddings=normalize_embeddings, | |
| 88 | + ) | |
| 81 | 89 | ... | ... |
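A minimal local smoke test of the renamed backend, mirroring what `Qwen3TextModel.__new__` and `encode` do in the diff above (the model downloads on first use; the 1024-dim output is the figure the docs cite):

```python
from sentence_transformers import SentenceTransformer

# Same constructor arguments as Qwen3TextModel.__new__ above.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", trust_remote_code=True)

vecs = model.encode(
    ["summer dress", "leather boots"],
    normalize_embeddings=True,  # matches the TEXT_NORMALIZE_EMBEDDINGS default
    batch_size=32,
)
print(vecs.shape)  # expected (2, 1024) for Qwen3-Embedding-0.6B
```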
embeddings/server.py
| ... | ... | @@ -14,9 +14,6 @@ import numpy as np |
| 14 | 14 | from fastapi import FastAPI |
| 15 | 15 | |
| 16 | 16 | from embeddings.config import CONFIG |
| 17 | -from embeddings.bge_model import BgeTextModel | |
| 18 | -from embeddings.clip_model import ClipImageModel | |
| 19 | -from embeddings.clip_as_service_encoder import ClipAsServiceImageEncoder | |
| 20 | 17 | from embeddings.protocols import ImageEncoderProtocol |
| 21 | 18 | |
| 22 | 19 | logger = logging.getLogger(__name__) |
| ... | ... | @@ -24,7 +21,7 @@ logger = logging.getLogger(__name__) |
| 24 | 21 | app = FastAPI(title="saas-search Embedding Service", version="1.0.0") |
| 25 | 22 | |
| 26 | 23 | # Models are loaded at startup, not lazily |
| 27 | -_text_model: Optional[BgeTextModel] = None | |
| 24 | +_text_model: Optional[Any] = None | |
| 28 | 25 | _image_model: Optional[ImageEncoderProtocol] = None |
| 29 | 26 | open_text_model = True |
| 30 | 27 | open_image_model = True # Enable image embedding when using clip-as-service |
| ... | ... | @@ -43,8 +40,10 @@ def load_models(): |
| 43 | 40 | # Load text model |
| 44 | 41 | if open_text_model: |
| 45 | 42 | try: |
| 46 | - logger.info(f"Loading text model: {CONFIG.TEXT_MODEL_DIR}") | |
| 47 | - _text_model = BgeTextModel(model_dir=CONFIG.TEXT_MODEL_DIR) | |
| 43 | + from embeddings.qwen3_model import Qwen3TextModel | |
| 44 | + | |
| 45 | + logger.info(f"Loading text model: {CONFIG.TEXT_MODEL_ID}") | |
| 46 | + _text_model = Qwen3TextModel(model_id=CONFIG.TEXT_MODEL_ID) | |
| 48 | 47 | logger.info("Text model loaded successfully") |
| 49 | 48 | except Exception as e: |
| 50 | 49 | logger.error(f"Failed to load text model: {e}", exc_info=True) |
| ... | ... | @@ -58,6 +57,8 @@ def load_models(): |
| 58 | 57 | if open_image_model: |
| 59 | 58 | try: |
| 60 | 59 | if CONFIG.USE_CLIP_AS_SERVICE: |
| 60 | + from embeddings.clip_as_service_encoder import ClipAsServiceImageEncoder | |
| 61 | + | |
| 61 | 62 | logger.info(f"Loading image encoder via clip-as-service: {CONFIG.CLIP_AS_SERVICE_SERVER}") |
| 62 | 63 | _image_model = ClipAsServiceImageEncoder( |
| 63 | 64 | server=CONFIG.CLIP_AS_SERVICE_SERVER, |
| ... | ... | @@ -65,6 +66,8 @@ def load_models(): |
| 65 | 66 | ) |
| 66 | 67 | logger.info("Image model (clip-as-service) loaded successfully") |
| 67 | 68 | else: |
| 69 | + from embeddings.clip_model import ClipImageModel | |
| 70 | + | |
| 68 | 71 | logger.info(f"Loading local image model: {CONFIG.IMAGE_MODEL_NAME} (device: {CONFIG.IMAGE_DEVICE})") |
| 69 | 72 | _image_model = ClipImageModel( |
| 70 | 73 | model_name=CONFIG.IMAGE_MODEL_NAME, |
| ... | ... | @@ -126,7 +129,10 @@ def embed_text(texts: List[str]) -> List[Optional[List[float]]]: |
| 126 | 129 | try: |
| 127 | 130 | with _text_encode_lock: |
| 128 | 131 | embs = _text_model.encode_batch( |
| 129 | - batch_texts, batch_size=int(CONFIG.TEXT_BATCH_SIZE), device=CONFIG.TEXT_DEVICE | |
| 132 | + batch_texts, | |
| 133 | + batch_size=int(CONFIG.TEXT_BATCH_SIZE), | |
| 134 | + device=CONFIG.TEXT_DEVICE, | |
| 135 | + normalize_embeddings=bool(CONFIG.TEXT_NORMALIZE_EMBEDDINGS), | |
| 130 | 136 | ) |
| 131 | 137 | for j, (idx, _t) in enumerate(indexed_texts): |
| 132 | 138 | out[idx] = _as_list(embs[j]) |
| ... | ... | @@ -170,5 +176,3 @@ def embed_image(images: List[str]) -> List[Optional[List[float]]]: |
| 170 | 176 | for idx in indices: |
| 171 | 177 | out[idx] = None |
| 172 | 178 | return out |
| 173 | - | |
| 174 | - | ... | ... |
embeddings/text_encoder.py
| 1 | -""" | |
| 2 | -Text embedding encoder using network service. | |
| 1 | +"""Text embedding client for the local embedding HTTP service.""" | |
| 3 | 2 | |
| 4 | -Generates embeddings via HTTP API service (default localhost:6005). | |
| 5 | -""" | |
| 3 | +import logging | |
| 4 | +import os | |
| 5 | +import pickle | |
| 6 | +from datetime import timedelta | |
| 7 | +from typing import Any, List, Optional, Union | |
| 6 | 8 | |
| 7 | -import sys | |
| 8 | -import requests | |
| 9 | -import time | |
| 10 | -import threading | |
| 11 | 9 | import numpy as np |
| 12 | -import pickle | |
| 13 | 10 | import redis |
| 14 | -import os | |
| 15 | -from datetime import timedelta | |
| 16 | -from typing import List, Union, Dict, Any, Optional | |
| 17 | -import logging | |
| 11 | +import requests | |
| 18 | 12 | |
| 19 | 13 | logger = logging.getLogger(__name__) |
| 20 | 14 | |
| ... | ... | @@ -27,44 +21,34 @@ except ImportError: |
| 27 | 21 | REDIS_CONFIG = {} |
| 28 | 22 | |
| 29 | 23 | |
| 30 | -class BgeEncoder: | |
| 24 | +class TextEmbeddingEncoder: | |
| 31 | 25 | """ |
| 32 | - Singleton text encoder using network service. | |
| 33 | - | |
| 34 | - Thread-safe singleton pattern ensures only one instance exists. | |
| 26 | + Text embedding encoder using network service. | |
| 35 | 27 | """ |
| 36 | - _instance = None | |
| 37 | - _lock = threading.Lock() | |
| 38 | 28 | |
| 39 | - def __new__(cls, service_url: Optional[str] = None): | |
| 40 | - with cls._lock: | |
| 41 | - if cls._instance is None: | |
| 42 | - cls._instance = super(BgeEncoder, cls).__new__(cls) | |
| 43 | - resolved_url = service_url or os.getenv("EMBEDDING_SERVICE_URL") or get_embedding_base_url() | |
| 44 | - logger.info(f"Creating BgeEncoder instance with service URL: {resolved_url}") | |
| 45 | - cls._instance.service_url = resolved_url | |
| 46 | - cls._instance.endpoint = f"{resolved_url}/embed/text" | |
| 47 | - | |
| 48 | - # Initialize Redis cache | |
| 49 | - try: | |
| 50 | - cls._instance.redis_client = redis.Redis( | |
| 51 | - host=REDIS_CONFIG.get('host', 'localhost'), | |
| 52 | - port=REDIS_CONFIG.get('port', 6479), | |
| 53 | - password=REDIS_CONFIG.get('password'), | |
| 54 | - decode_responses=False, # Keep binary data as is | |
| 55 | - socket_timeout=REDIS_CONFIG.get('socket_timeout', 1), | |
| 56 | - socket_connect_timeout=REDIS_CONFIG.get('socket_connect_timeout', 1), | |
| 57 | - retry_on_timeout=REDIS_CONFIG.get('retry_on_timeout', False), | |
| 58 | -                    health_check_interval=10  # avoid reusing broken connections | |
| 59 | - ) | |
| 60 | - # Test connection | |
| 61 | - cls._instance.redis_client.ping() | |
| 62 | - cls._instance.expire_time = timedelta(days=REDIS_CONFIG.get('cache_expire_days', 180)) | |
| 63 | - logger.info("Redis cache initialized for embeddings") | |
| 64 | - except Exception as e: | |
| 65 | - logger.warning(f"Failed to initialize Redis cache for embeddings: {e}, continuing without cache") | |
| 66 | - cls._instance.redis_client = None | |
| 67 | - return cls._instance | |
| 29 | + def __init__(self, service_url: Optional[str] = None): | |
| 30 | + resolved_url = service_url or os.getenv("EMBEDDING_SERVICE_URL") or get_embedding_base_url() | |
| 31 | + self.service_url = str(resolved_url).rstrip("/") | |
| 32 | + self.endpoint = f"{self.service_url}/embed/text" | |
| 33 | + self.expire_time = timedelta(days=REDIS_CONFIG.get("cache_expire_days", 180)) | |
| 34 | + logger.info("Creating TextEmbeddingEncoder instance with service URL: %s", self.service_url) | |
| 35 | + | |
| 36 | + try: | |
| 37 | + self.redis_client = redis.Redis( | |
| 38 | + host=REDIS_CONFIG.get("host", "localhost"), | |
| 39 | + port=REDIS_CONFIG.get("port", 6479), | |
| 40 | + password=REDIS_CONFIG.get("password"), | |
| 41 | + decode_responses=False, | |
| 42 | + socket_timeout=REDIS_CONFIG.get("socket_timeout", 1), | |
| 43 | + socket_connect_timeout=REDIS_CONFIG.get("socket_connect_timeout", 1), | |
| 44 | + retry_on_timeout=REDIS_CONFIG.get("retry_on_timeout", False), | |
| 45 | + health_check_interval=10, | |
| 46 | + ) | |
| 47 | + self.redis_client.ping() | |
| 48 | + logger.info("Redis cache initialized for embeddings") | |
| 49 | + except Exception as e: | |
| 50 | + logger.warning("Failed to initialize Redis cache for embeddings: %s, continuing without cache", e) | |
| 51 | + self.redis_client = None | |
| 68 | 52 | |
| 69 | 53 | def _call_service(self, request_data: List[str]) -> List[Any]: |
| 70 | 54 | """ |
| ... | ... | @@ -85,7 +69,7 @@ class BgeEncoder: |
| 85 | 69 | response.raise_for_status() |
| 86 | 70 | return response.json() |
| 87 | 71 | except requests.exceptions.RequestException as e: |
| 88 | - logger.error(f"BgeEncoder service request failed: {e}", exc_info=True) | |
| 72 | + logger.error(f"TextEmbeddingEncoder service request failed: {e}", exc_info=True) | |
| 89 | 73 | raise |
| 90 | 74 | |
| 91 | 75 | def encode( |
| ... | ... | @@ -122,7 +106,7 @@ class BgeEncoder: |
| 122 | 106 | embeddings: List[Optional[np.ndarray]] = [None] * len(sentences) |
| 123 | 107 | |
| 124 | 108 | for i, text in enumerate(sentences): |
| 125 | - cached = self._get_cached_embedding(text, 'en') # Use 'en' as default language for title embedding | |
| 109 | + cached = self._get_cached_embedding(text, "generic") | |
| 126 | 110 | if cached is not None: |
| 127 | 111 | embeddings[i] = cached |
| 128 | 112 | else: |
| ... | ... | @@ -152,7 +136,7 @@ class BgeEncoder: |
| 152 | 136 | if self._is_valid_embedding(embedding_array): |
| 153 | 137 | embeddings[original_idx] = embedding_array |
| 154 | 138 | # Cache the embedding |
| 155 | - self._set_cached_embedding(text, 'en', embedding_array) | |
| 139 | + self._set_cached_embedding(text, "generic", embedding_array) | |
| 156 | 140 | else: |
| 157 | 141 | logger.warning( |
| 158 | 142 | f"Invalid embedding returned from service for text {original_idx} " |
| ... | ... | @@ -270,3 +254,7 @@ class BgeEncoder: |
| 270 | 254 | except Exception as e: |
| 271 | 255 | logger.error(f"Error storing embedding in cache: {e}") |
| 272 | 256 | return False |
| 257 | + | |
| 258 | + | |
| 259 | +# Backward compatibility for existing imports/usages. | |
| 260 | +BgeEncoder = TextEmbeddingEncoder | ... | ... |
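The alias above keeps existing imports compiling while the singleton plumbing is removed, so each construction now returns an independent instance. A minimal sketch of what the alias guarantees (the service URL is illustrative):

```python
# Sketch only: both names resolve to the same class after this change.
from embeddings.text_encoder import BgeEncoder, TextEmbeddingEncoder

encoder = BgeEncoder(service_url="http://127.0.0.1:6005")  # legacy name still works
assert type(encoder) is TextEmbeddingEncoder
# With the singleton gone, two constructions yield two distinct objects:
assert BgeEncoder(service_url="http://127.0.0.1:6005") is not encoder
```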
examples/cloud_embedding_example.py deleted
| ... | ... | @@ -1,167 +0,0 @@ |
| 1 | -""" | |
| 2 | -简单示例:使用云端文本向量化模块 | |
| 3 | - | |
| 4 | -展示如何使用 CloudTextEncoder 进行文本向量化。 | |
| 5 | -""" | |
| 6 | - | |
| 7 | -import os | |
| 8 | -import sys | |
| 9 | -from pathlib import Path | |
| 10 | - | |
| 11 | -# Add parent directory to path | |
| 12 | -sys.path.insert(0, str(Path(__file__).parent.parent)) | |
| 13 | - | |
| 14 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 15 | - | |
| 16 | - | |
| 17 | -def example_single_text(): | |
| 18 | - """Example 1: embed a single text""" | |
| 19 | - print("=" * 60) | |
| 20 | - print("Example 1: embed a single text") | |
| 21 | - print("=" * 60) | |
| 22 | - | |
| 23 | - # Initialize the encoder | |
| 24 | - encoder = CloudTextEncoder() | |
| 25 | - | |
| 26 | - # A single Chinese sample text | |
| 27 | - text = "衣服的质量杠杠的" | |
| 28 | - print(f"Input text: {text}") | |
| 29 | - | |
| 30 | - # Generate the embedding | |
| 31 | - embedding = encoder.encode(text) | |
| 32 | - | |
| 33 | - print(f"Embedding shape: {embedding.shape}") | |
| 34 | - print(f"First 5 values: {embedding[0][:5]}") | |
| 35 | - print() | |
| 36 | - | |
| 37 | - | |
| 38 | -def example_multiple_texts(): | |
| 39 | - """Example 2: embed a batch of texts""" | |
| 40 | - print("=" * 60) | |
| 41 | - print("Example 2: embed a batch of texts") | |
| 42 | - print("=" * 60) | |
| 43 | - | |
| 44 | - # Initialize the encoder | |
| 45 | - encoder = CloudTextEncoder() | |
| 46 | - | |
| 47 | - # Multiple texts | |
| 48 | - texts = [ | |
| 49 | - "Bohemian Maxi Dress", | |
| 50 | - "Vintage Denim Jacket", | |
| 51 | - "Minimalist Linen Trousers", | |
| 52 | - "Gothic Black Boots", | |
| 53 | - "Streetwear Oversized Hoodie" | |
| 54 | - ] | |
| 55 | - | |
| 56 | - print(f"Number of input texts: {len(texts)}") | |
| 57 | - for i, text in enumerate(texts, 1): | |
| 58 | - print(f" {i}. {text}") | |
| 59 | - | |
| 60 | - # Generate the embeddings | |
| 61 | - embeddings = encoder.encode(texts) | |
| 62 | - | |
| 63 | - print(f"\nEmbedding matrix shape: {embeddings.shape}") | |
| 64 | - print(f"First 5 values of the first text: {embeddings[0][:5]}") | |
| 65 | - print() | |
| 66 | - | |
| 67 | - | |
| 68 | -def example_batch_processing(): | |
| 69 | - """Example 3: large-batch processing""" | |
| 70 | - print("=" * 60) | |
| 71 | - print("Example 3: large-batch processing (automatic batching)") | |
| 72 | - print("=" * 60) | |
| 73 | - | |
| 74 | - # Initialize the encoder | |
| 75 | - encoder = CloudTextEncoder() | |
| 76 | - | |
| 77 | - # Generate a large number of texts | |
| 78 | - texts = [f"Product description {i}" for i in range(50)] | |
| 79 | - | |
| 80 | - print(f"Number of input texts: {len(texts)}") | |
| 81 | - print(f"Batch size: 10") | |
| 82 | - | |
| 83 | - # Use encode_batch to split the work into batches automatically | |
| 84 | - embeddings = encoder.encode_batch(texts, batch_size=10) | |
| 85 | - | |
| 86 | - print(f"Embedding matrix shape: {embeddings.shape}") | |
| 87 | - print(f"Mean embedding value: {embeddings.mean():.4f}") | |
| 88 | - print() | |
| 89 | - | |
| 90 | - | |
| 91 | -def example_similarity_calculation(): | |
| 92 | - """Example 4: compute text similarity""" | |
| 93 | - print("=" * 60) | |
| 94 | - print("Example 4: compute text similarity") | |
| 95 | - print("=" * 60) | |
| 96 | - | |
| 97 | - import numpy as np | |
| 98 | - | |
| 99 | - # Initialize the encoder | |
| 100 | - encoder = CloudTextEncoder() | |
| 101 | - | |
| 102 | - # Prepare the texts (mixed-language sample data) | |
| 103 | - query = "夏季连衣裙" | |
| 104 | - candidates = [ | |
| 105 | - "Summer maxi dress", | |
| 106 | - "冬季羽绒服", | |
| 107 | - "夏天长裙", | |
| 108 | - "运动鞋", | |
| 109 | - "女士连衣裙" | |
| 110 | - ] | |
| 111 | - | |
| 112 | - print(f"Query text: {query}") | |
| 113 | - print(f"Candidate texts:") | |
| 114 | - for i, text in enumerate(candidates, 1): | |
| 115 | - print(f" {i}. {text}") | |
| 116 | - | |
| 117 | - # Generate the embeddings | |
| 118 | - query_embedding = encoder.encode(query) | |
| 119 | - candidate_embeddings = encoder.encode(candidates) | |
| 120 | - | |
| 121 | - # Compute cosine similarity | |
| 122 | - def cosine_similarity(a, b): | |
| 123 | - return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) | |
| 124 | - | |
| 125 | - print(f"\nSimilarity scores:") | |
| 126 | - similarities = [] | |
| 127 | - for i, candidate_emb in enumerate(candidate_embeddings): | |
| 128 | - sim = cosine_similarity(query_embedding[0], candidate_emb) | |
| 129 | - similarities.append((sim, candidates[i])) | |
| 130 | - print(f" {candidates[i]}: {sim:.4f}") | |
| 131 | - | |
| 132 | - # Sort and show the most similar candidate | |
| 133 | - similarities.sort(reverse=True) | |
| 134 | - print(f"\nMost similar text: {similarities[0][1]} (similarity: {similarities[0][0]:.4f})") | |
| 135 | - print() | |
| 136 | - | |
| 137 | - | |
| 138 | -def main(): | |
| 139 | - """Entry point""" | |
| 140 | - # Check the API key | |
| 141 | - if not os.getenv("DASHSCOPE_API_KEY"): | |
| 142 | - print("Error: please set the DASHSCOPE_API_KEY environment variable") | |
| 143 | - print("Example: export DASHSCOPE_API_KEY='your-api-key'") | |
| 144 | - return | |
| 145 | - | |
| 146 | - print("\nCloud text embedding examples\n") | |
| 147 | - | |
| 148 | - try: | |
| 149 | - # Run all examples | |
| 150 | - example_single_text() | |
| 151 | - example_multiple_texts() | |
| 152 | - example_batch_processing() | |
| 153 | - example_similarity_calculation() | |
| 154 | - | |
| 155 | - print("=" * 60) | |
| 156 | - print("All examples completed!") | |
| 157 | - print("=" * 60) | |
| 158 | - | |
| 159 | - except Exception as e: | |
| 160 | - print(f"\nError: {e}") | |
| 161 | - import traceback | |
| 162 | - traceback.print_exc() | |
| 163 | - | |
| 164 | - | |
| 165 | -if __name__ == "__main__": | |
| 166 | - main() | |
| 167 | - |
indexer/README.md
| ... | ... | @@ -274,7 +274,7 @@ if (StrUtil.isNotBlank(spu.getTitle()) && CollUtil.isNotEmpty(titleEmbedding)) { |
| 274 | 274 | |
| 275 | 275 | On the Python side you already have: |
| 276 | 276 | |
| 277 | -- `embeddings/text_encoder.py` (BGE-M3 model); | |
| 277 | +- `embeddings/text_encoder.py` (calls Qwen3-Embedding-0.6B via the embedding service on port 6005); | |
| 278 | 278 | - `SPUDocumentTransformer._fill_title_embedding` already wraps the encoder call. |
| 279 | 279 | |
| 280 | 280 | **Recommended caching strategy (optional, but recommended):** | ... | ...
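To make the recommended caching concrete, here is a minimal sketch that mirrors the encoder's own Redis cache in this commit; the `embedding:generic:<text>` key shape, pickle serialization, and 180-day TTL follow the code and tests above, while host/port are illustrative:

```python
import pickle
from datetime import timedelta

import redis

r = redis.Redis(host="localhost", port=6479, decode_responses=False)
TTL = timedelta(days=180)  # matches the cache_expire_days default in REDIS_CONFIG

def cached_encode(encoder, text: str, scope: str = "generic"):
    key = f"embedding:{scope}:{text}"
    raw = r.get(key)
    if raw is not None:
        return pickle.loads(raw)          # cache hit: skip the embedding service
    vec = encoder.encode([text])[0]       # miss: call the service on port 6005
    if vec is not None:
        r.setex(key, TTL, pickle.dumps(vec))
    return vec
```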
indexer/document_transformer.py
| ... | ... | @@ -723,7 +723,7 @@ class SPUDocumentTransformer: |
| 723 | 723 | return |
| 724 | 724 | |
| 725 | 725 | try: |
| 726 | - # Generate the embedding with BgeEncoder | |
| 726 | + # Generate the embedding with the text embedding encoder | |
| 727 | 727 | # The encode method returns a numpy array of shape (n, 1024) |
| 728 | 728 | embeddings = self.encoder.encode(title_text) |
| 729 | 729 | |
| ... | ... | @@ -740,4 +740,3 @@ class SPUDocumentTransformer: |
| 740 | 740 | logger.warning(f"Failed to generate embedding for title: {title_text[:50]}...") |
| 741 | 741 | except Exception as e: |
| 742 | 742 | logger.error(f"Error generating title_embedding for SPU {doc.get('spu_id')}: {e}", exc_info=True) |
| 743 | - | ... | ... |
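For reference, a condensed sketch of the call pattern `_fill_title_embedding` relies on; the `doc` shape and error handling are simplified assumptions, but the encoder call, the `(n, 1024)` shape, and the `title_embedding` field name follow this commit:

```python
import numpy as np

def fill_title_embedding(doc: dict, encoder) -> None:
    title_text = doc.get("title", "")
    if not title_text:
        return
    # encode() returns a numpy array of shape (n, 1024); take the first row.
    embeddings = encoder.encode(title_text)
    vector = embeddings[0]
    if isinstance(vector, np.ndarray) and vector.size > 0:
        doc["title_embedding"] = vector.tolist()  # store as a plain float list
```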
indexer/incremental_service.py
| ... | ... | @@ -80,11 +80,11 @@ class IncrementalIndexerService: |
| 80 | 80 | # Text embedding encoder (best-effort) |
| 81 | 81 | if bool(getattr(self._config.query_config, "enable_text_embedding", False)): |
| 82 | 82 | try: |
| 83 | - from embeddings.text_encoder import BgeEncoder | |
| 83 | + from embeddings.text_encoder import TextEmbeddingEncoder | |
| 84 | 84 | |
| 85 | - self._shared_text_encoder = BgeEncoder() | |
| 85 | + self._shared_text_encoder = TextEmbeddingEncoder() | |
| 86 | 86 | except Exception as e: |
| 87 | - logger.warning("Failed to initialize BgeEncoder at startup: %s", e) | |
| 87 | + logger.warning("Failed to initialize TextEmbeddingEncoder at startup: %s", e) | |
| 88 | 88 | self._shared_text_encoder = None |
| 89 | 89 | |
| 90 | 90 | # Image embedding encoder (best-effort; may be unavailable if embedding service not running) |
| ... | ... | @@ -126,13 +126,13 @@ class IncrementalIndexerService: |
| 126 | 126 | encoder: Optional[Any] = self._shared_text_encoder if enable_embedding else None |
| 127 | 127 | if enable_embedding and encoder is None: |
| 128 | 128 | try: |
| 129 | - from embeddings.text_encoder import BgeEncoder | |
| 129 | + from embeddings.text_encoder import TextEmbeddingEncoder | |
| 130 | 130 | |
| 131 | - encoder = BgeEncoder() | |
| 131 | + encoder = TextEmbeddingEncoder() | |
| 132 | 132 | self._shared_text_encoder = encoder |
| 133 | - logger.info("BgeEncoder lazily initialized in _get_transformer_bundle") | |
| 133 | + logger.info("TextEmbeddingEncoder lazily initialized in _get_transformer_bundle") | |
| 134 | 134 | except Exception as e: |
| 135 | - logger.warning("Failed to lazily initialize BgeEncoder for tenant_id=%s: %s", tenant_id, e) | |
| 135 | + logger.warning("Failed to lazily initialize TextEmbeddingEncoder for tenant_id=%s: %s", tenant_id, e) | |
| 136 | 136 | encoder = None |
| 137 | 137 | enable_embedding = False |
| 138 | 138 | |
| ... | ... | @@ -790,4 +790,3 @@ class IncrementalIndexerService: |
| 790 | 790 | "tenant_id": tenant_id |
| 791 | 791 | } |
| 792 | 792 | |
| 793 | - | ... | ... |
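The lazy path above memoizes one encoder on the service instance so later tenants reuse it; in sketch form (`_get_shared_text_encoder` is a hypothetical helper, the real code inlines this in `_get_transformer_bundle`):

```python
# Sketch of the shared-encoder memoization used by _get_transformer_bundle.
def _get_shared_text_encoder(self):
    if self._shared_text_encoder is None:
        try:
            from embeddings.text_encoder import TextEmbeddingEncoder
            self._shared_text_encoder = TextEmbeddingEncoder()
        except Exception as e:
            logger.warning("Failed to lazily initialize TextEmbeddingEncoder: %s", e)
    return self._shared_text_encoder
```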
indexer/indexing_utils.py
| ... | ... | @@ -112,11 +112,11 @@ def create_document_transformer( |
| 112 | 112 | # Initialize the encoder (when title embedding is enabled and no encoder was provided) |
| 113 | 113 | if encoder is None and enable_title_embedding and config.query_config.enable_text_embedding: |
| 114 | 114 | try: |
| 115 | - from embeddings.text_encoder import BgeEncoder | |
| 116 | - encoder = BgeEncoder() | |
| 117 | - logger.info("BgeEncoder initialized for title embedding") | |
| 115 | + from embeddings.text_encoder import TextEmbeddingEncoder | |
| 116 | + encoder = TextEmbeddingEncoder() | |
| 117 | + logger.info("TextEmbeddingEncoder initialized for title embedding") | |
| 118 | 118 | except Exception as e: |
| 119 | - logger.warning(f"Failed to initialize BgeEncoder: {e}, title embedding will be disabled") | |
| 119 | + logger.warning(f"Failed to initialize TextEmbeddingEncoder: {e}, title embedding will be disabled") | |
| 120 | 120 | enable_title_embedding = False |
| 121 | 121 | except Exception as e: |
| 122 | 122 | logger.warning(f"Failed to load config, using defaults: {e}") |
| ... | ... | @@ -136,4 +136,3 @@ def create_document_transformer( |
| 136 | 136 | image_encoder=image_encoder, |
| 137 | 137 | enable_image_embedding=enable_image_embedding, |
| 138 | 138 | ) |
| 139 | - | ... | ... |
providers/embedding.py
| ... | ... | @@ -13,9 +13,11 @@ def create_embedding_provider() -> "EmbeddingProvider": |
| 13 | 13 | """Create embedding provider from services config.""" |
| 14 | 14 | cfg = get_embedding_config() |
| 15 | 15 | provider = (cfg.provider or "http").strip().lower() |
| 16 | - if provider == "vllm": | |
| 16 | + if provider != "http": | |
| 17 | 17 | import logging |
| 18 | - logging.getLogger(__name__).warning("embedding provider 'vllm' is reserved, using HTTP.") | |
| 18 | + logging.getLogger(__name__).warning( | |
| 19 | + "Unsupported embedding provider '%s', fallback to HTTP provider.", provider | |
| 20 | + ) | |
| 19 | 21 | return EmbeddingProvider() |
| 20 | 22 | |
| 21 | 23 | |
| ... | ... | @@ -30,9 +32,9 @@ class EmbeddingProvider: |
| 30 | 32 | |
| 31 | 33 | @property |
| 32 | 34 | def text_encoder(self): |
| 33 | - """Lazy-created text encoder (BgeEncoder).""" | |
| 34 | - from embeddings.text_encoder import BgeEncoder | |
| 35 | - return BgeEncoder(service_url=self._base_url) | |
| 35 | + """Lazy-created text encoder (TextEmbeddingEncoder).""" | |
| 36 | + from embeddings.text_encoder import TextEmbeddingEncoder | |
| 37 | + return TextEmbeddingEncoder(service_url=self._base_url) | |
| 36 | 38 | |
| 37 | 39 | @property |
| 38 | 40 | def image_encoder(self): | ... | ... |
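In practice the new fallback means a misconfigured provider value degrades gracefully instead of failing; an illustrative usage sketch (the provider value and query text are made up):

```python
from providers.embedding import create_embedding_provider

# With the services config set to provider="vllm" (or any non-"http" value),
# this logs: Unsupported embedding provider 'vllm', fallback to HTTP provider.
provider = create_embedding_provider()

encoder = provider.text_encoder          # a fresh TextEmbeddingEncoder
vectors = encoder.encode(["red dress"])  # served via POST {base_url}/embed/text
```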
query/query_parser.py
| ... | ... | @@ -10,7 +10,7 @@ import logging |
| 10 | 10 | import re |
| 11 | 11 | from concurrent.futures import Future, ThreadPoolExecutor, as_completed |
| 12 | 12 | |
| 13 | -from embeddings import BgeEncoder | |
| 13 | +from embeddings import TextEmbeddingEncoder | |
| 14 | 14 | from config import SearchConfig |
| 15 | 15 | from .language_detector import LanguageDetector |
| 16 | 16 | from providers import create_translation_provider |
| ... | ... | @@ -77,7 +77,7 @@ class QueryParser: |
| 77 | 77 | def __init__( |
| 78 | 78 | self, |
| 79 | 79 | config: SearchConfig, |
| 80 | - text_encoder: Optional[BgeEncoder] = None, | |
| 80 | + text_encoder: Optional[TextEmbeddingEncoder] = None, | |
| 81 | 81 | translator: Optional[Any] = None |
| 82 | 82 | ): |
| 83 | 83 | """ |
| ... | ... | @@ -115,11 +115,11 @@ class QueryParser: |
| 115 | 115 | logger.info("HanLP not installed; using simple tokenizer") |
| 116 | 116 | |
| 117 | 117 | @property |
| 118 | - def text_encoder(self) -> BgeEncoder: | |
| 118 | + def text_encoder(self) -> TextEmbeddingEncoder: | |
| 119 | 119 | """Lazy load text encoder.""" |
| 120 | 120 | if self._text_encoder is None and self.config.query_config.enable_text_embedding: |
| 121 | 121 | logger.info("Initializing text encoder (lazy load)...") |
| 122 | - self._text_encoder = BgeEncoder() | |
| 122 | + self._text_encoder = TextEmbeddingEncoder() | |
| 123 | 123 | return self._text_encoder |
| 124 | 124 | |
| 125 | 125 | @property |
| ... | ... | @@ -196,9 +196,9 @@ class QueryParser: |
| 196 | 196 | ParsedQuery object with all processing results |
| 197 | 197 | """ |
| 198 | 198 | # Initialize logger if context provided |
| 199 | - logger = context.logger if context else None | |
| 200 | - if logger: | |
| 201 | - logger.info( | |
| 199 | + active_logger = context.logger if context else logger | |
| 200 | + if context and hasattr(context, "logger"): | |
| 201 | + context.logger.info( | |
| 202 | 202 | f"Starting query parsing | Original query: '{query}' | Generate vector: {generate_vector}", |
| 203 | 203 | extra={'reqid': context.reqid, 'uid': context.uid} |
| 204 | 204 | ) |
| ... | ... | @@ -207,13 +207,13 @@ class QueryParser: |
| 207 | 207 | if context and hasattr(context, 'logger'): |
| 208 | 208 | context.logger.info(msg, extra={'reqid': context.reqid, 'uid': context.uid}) |
| 209 | 209 | else: |
| 210 | - logger.info(msg) | |
| 210 | + active_logger.info(msg) | |
| 211 | 211 | |
| 212 | 212 | def log_debug(msg): |
| 213 | 213 | if context and hasattr(context, 'logger'): |
| 214 | 214 | context.logger.debug(msg, extra={'reqid': context.reqid, 'uid': context.uid}) |
| 215 | 215 | else: |
| 216 | - logger.debug(msg) | |
| 216 | + active_logger.debug(msg) | |
| 217 | 217 | |
| 218 | 218 | # Stage 1: Normalize |
| 219 | 219 | normalized = self.normalizer.normalize(query) | ... | ... |
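The `active_logger` change fixes a shadowing bug: the removed local `logger = context.logger if context else None` hid the module logger, so the `else` branches of `log_info`/`log_debug` could call `.info()` on `None`. The pattern, condensed (names follow the diff):

```python
import logging

logger = logging.getLogger(__name__)  # module-level fallback

def parse(self, query, context=None):
    # Prefer the per-request logger (carrying reqid/uid) when a context
    # exists; otherwise fall back to the module logger, never to None.
    active_logger = context.logger if context else logger
    active_logger.info("Starting query parsing | Original query: '%s'", query)
```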
tests/ci/test_service_api_contracts.py
| ... | ... | @@ -427,7 +427,7 @@ def test_indexer_index_validation_max_delete_spu_ids(indexer_client: TestClient) |
| 427 | 427 | |
| 428 | 428 | |
| 429 | 429 | class _FakeTextModel: |
| 430 | - def encode_batch(self, texts, batch_size=32, device="cpu"): | |
| 430 | + def encode_batch(self, texts, batch_size=32, device="cpu", normalize_embeddings=True): | |
| 431 | 431 | return [np.array([0.1, 0.2, 0.3], dtype=np.float32) for _ in texts] |
| 432 | 432 | |
| 433 | 433 | |
| ... | ... | @@ -437,29 +437,24 @@ class _FakeImageModel: |
| 437 | 437 | |
| 438 | 438 | |
| 439 | 439 | @pytest.fixture |
| 440 | -def embedding_client(): | |
| 440 | +def embedding_module(): | |
| 441 | 441 | import embeddings.server as emb_server |
| 442 | 442 | |
| 443 | 443 | emb_server.app.router.on_startup.clear() |
| 444 | 444 | emb_server._text_model = _FakeTextModel() |
| 445 | 445 | emb_server._image_model = _FakeImageModel() |
| 446 | - | |
| 447 | - with TestClient(emb_server.app) as client: | |
| 448 | - yield client | |
| 446 | + yield emb_server | |
| 449 | 447 | |
| 450 | 448 | |
| 451 | -def test_embedding_text_contract(embedding_client: TestClient): | |
| 452 | - response = embedding_client.post("/embed/text", json=["hello", "world"]) | |
| 453 | - assert response.status_code == 200 | |
| 454 | - data = response.json() | |
| 449 | +def test_embedding_text_contract(embedding_module): | |
| 450 | + data = embedding_module.embed_text(["hello", "world"]) | |
| 455 | 451 | assert len(data) == 2 |
| 456 | 452 | assert len(data[0]) == 3 |
| 457 | 453 | |
| 458 | 454 | |
| 459 | -def test_embedding_image_contract(embedding_client: TestClient): | |
| 460 | - response = embedding_client.post("/embed/image", json=["https://example.com/a.jpg"]) | |
| 461 | - assert response.status_code == 200 | |
| 462 | - assert len(response.json()[0]) == 3 | |
| 455 | +def test_embedding_image_contract(embedding_module): | |
| 456 | + data = embedding_module.embed_image(["https://example.com/a.jpg"]) | |
| 457 | + assert len(data[0]) == 3 | |
| 463 | 458 | |
| 464 | 459 | |
| 465 | 460 | class _FakeTranslator: | ... | ... |
tests/test_cloud_embedding.py deleted
| ... | ... | @@ -1,186 +0,0 @@ |
| 1 | -""" | |
| 2 | -Test script for cloud text embedding using Aliyun DashScope API. | |
| 3 | - | |
| 4 | -Reads queries from queries.txt and tests embedding generation, | |
| 5 | -logging send time, receive time, and duration for each request. | |
| 6 | -""" | |
| 7 | - | |
| 8 | -import os | |
| 9 | -import sys | |
| 10 | -import time | |
| 11 | -from datetime import datetime | |
| 12 | -from pathlib import Path | |
| 13 | - | |
| 14 | -import pytest | |
| 15 | - | |
| 16 | -# Add parent directory to path | |
| 17 | -sys.path.insert(0, str(Path(__file__).parent.parent)) | |
| 18 | - | |
| 19 | -from embeddings.cloud_text_encoder import CloudTextEncoder | |
| 20 | - | |
| 21 | - | |
| 22 | -def format_timestamp(ts: float) -> str: | |
| 23 | - """Format timestamp to readable string.""" | |
| 24 | - return datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] | |
| 25 | - | |
| 26 | - | |
| 27 | -def read_queries(file_path: str, limit: int = 100) -> list: | |
| 28 | - """ | |
| 29 | - Read queries from text file. | |
| 30 | - | |
| 31 | - Args: | |
| 32 | - file_path: Path to queries file | |
| 33 | - limit: Maximum number of queries to read | |
| 34 | - | |
| 35 | - Returns: | |
| 36 | - List of query strings | |
| 37 | - """ | |
| 38 | - queries = [] | |
| 39 | - with open(file_path, 'r', encoding='utf-8') as f: | |
| 40 | - for i, line in enumerate(f): | |
| 41 | - if i >= limit: | |
| 42 | - break | |
| 43 | - query = line.strip() | |
| 44 | - if query: # Skip empty lines | |
| 45 | - queries.append(query) | |
| 46 | - return queries | |
| 47 | - | |
| 48 | - | |
| 49 | -@pytest.mark.skip(reason="Requires data file and DASHSCOPE_API_KEY; run manually when needed") | |
| 50 | -def test_cloud_embedding(queries_file: str, num_queries: int = 100): | |
| 51 | - """ | |
| 52 | - Test cloud embedding with queries from file. | |
| 53 | - | |
| 54 | - Args: | |
| 55 | - queries_file: Path to queries file | |
| 56 | - num_queries: Number of queries to test | |
| 57 | - """ | |
| 58 | - print("=" * 80) | |
| 59 | - print("Cloud Text Embedding Test - Aliyun DashScope API") | |
| 60 | - print("=" * 80) | |
| 61 | - print() | |
| 62 | - | |
| 63 | - # Check if API key is set | |
| 64 | - api_key = os.getenv("DASHSCOPE_API_KEY") | |
| 65 | - if not api_key: | |
| 66 | - print("ERROR: DASHSCOPE_API_KEY environment variable is not set!") | |
| 67 | - print("Please set it using: export DASHSCOPE_API_KEY='your-api-key'") | |
| 68 | - return | |
| 69 | - | |
| 70 | - print(f"API Key: {api_key[:10]}...{api_key[-4:]}") | |
| 71 | - print() | |
| 72 | - | |
| 73 | - # Read queries | |
| 74 | - print(f"Reading queries from: {queries_file}") | |
| 75 | - try: | |
| 76 | - queries = read_queries(queries_file, limit=num_queries) | |
| 77 | - print(f"Successfully read {len(queries)} queries") | |
| 78 | - print() | |
| 79 | - except Exception as e: | |
| 80 | - print(f"ERROR: Failed to read queries file: {e}") | |
| 81 | - return | |
| 82 | - | |
| 83 | - # Initialize encoder | |
| 84 | - print("Initializing CloudTextEncoder...") | |
| 85 | - try: | |
| 86 | - encoder = CloudTextEncoder() | |
| 87 | - print("CloudTextEncoder initialized successfully") | |
| 88 | - print() | |
| 89 | - except Exception as e: | |
| 90 | - print(f"ERROR: Failed to initialize encoder: {e}") | |
| 91 | - return | |
| 92 | - | |
| 93 | - # Test embeddings | |
| 94 | - print("=" * 80) | |
| 95 | - print(f"Testing {len(queries)} queries (one by one)") | |
| 96 | - print("=" * 80) | |
| 97 | - print() | |
| 98 | - | |
| 99 | - total_start = time.time() | |
| 100 | - success_count = 0 | |
| 101 | - failure_count = 0 | |
| 102 | - total_duration = 0.0 | |
| 103 | - | |
| 104 | - for i, query in enumerate(queries, 1): | |
| 105 | - try: | |
| 106 | - # Record send time | |
| 107 | - send_time = time.time() | |
| 108 | - send_time_str = format_timestamp(send_time) | |
| 109 | - | |
| 110 | - # Generate embedding | |
| 111 | - embedding = encoder.encode(query) | |
| 112 | - | |
| 113 | - # Record receive time | |
| 114 | - receive_time = time.time() | |
| 115 | - receive_time_str = format_timestamp(receive_time) | |
| 116 | - | |
| 117 | - # Calculate duration | |
| 118 | - duration = receive_time - send_time | |
| 119 | - total_duration += duration | |
| 120 | - | |
| 121 | - # Verify embedding | |
| 122 | - if embedding.shape[0] > 0: | |
| 123 | - success_count += 1 | |
| 124 | - status = "✓ SUCCESS" | |
| 125 | - else: | |
| 126 | - failure_count += 1 | |
| 127 | - status = "✗ FAILED" | |
| 128 | - | |
| 129 | - # Print result | |
| 130 | - query_display = query[:50] + "..." if len(query) > 50 else query | |
| 131 | - print(f"[{i:3d}/{len(queries)}] {status}") | |
| 132 | - print(f" Query: {query_display}") | |
| 133 | - print(f" Send Time: {send_time_str}") | |
| 134 | - print(f" Receive Time: {receive_time_str}") | |
| 135 | - print(f" Duration: {duration:.3f}s") | |
| 136 | - print(f" Embedding Shape: {embedding.shape}") | |
| 137 | - print() | |
| 138 | - | |
| 139 | - except Exception as e: | |
| 140 | - failure_count += 1 | |
| 141 | - receive_time = time.time() | |
| 142 | - duration = receive_time - send_time | |
| 143 | - | |
| 144 | - print(f"[{i:3d}/{len(queries)}] ✗ ERROR") | |
| 145 | - print(f" Query: {query[:50]}...") | |
| 146 | - print(f" Send Time: {send_time_str}") | |
| 147 | - print(f" Receive Time: {format_timestamp(receive_time)}") | |
| 148 | - print(f" Duration: {duration:.3f}s") | |
| 149 | - print(f" Error: {str(e)}") | |
| 150 | - print() | |
| 151 | - | |
| 152 | - # Print summary | |
| 153 | - total_elapsed = time.time() - total_start | |
| 154 | - avg_duration = total_duration / len(queries) if queries else 0 | |
| 155 | - | |
| 156 | - print("=" * 80) | |
| 157 | - print("Test Summary") | |
| 158 | - print("=" * 80) | |
| 159 | - print(f"Total Queries: {len(queries)}") | |
| 160 | - print(f"Successful: {success_count}") | |
| 161 | - print(f"Failed: {failure_count}") | |
| 162 | - print(f"Success Rate: {success_count / len(queries) * 100:.1f}%") | |
| 163 | - print(f"Total Time: {total_elapsed:.3f}s") | |
| 164 | - print(f"Total API Time: {total_duration:.3f}s") | |
| 165 | - print(f"Average Duration: {avg_duration:.3f}s per query") | |
| 166 | - print(f"Throughput: {len(queries) / total_elapsed:.2f} queries/second") | |
| 167 | - print("=" * 80) | |
| 168 | - | |
| 169 | - | |
| 170 | -def main(): | |
| 171 | - """Main entry point.""" | |
| 172 | - # Default queries file path | |
| 173 | - queries_file = Path(__file__).parent.parent / "data_crawling" / "queries.txt" | |
| 174 | - | |
| 175 | - # Check if file exists | |
| 176 | - if not queries_file.exists(): | |
| 177 | - print(f"ERROR: Queries file not found: {queries_file}") | |
| 178 | - return | |
| 179 | - | |
| 180 | - # Run test with 100 queries | |
| 181 | - test_cloud_embedding(str(queries_file), num_queries=100) | |
| 182 | - | |
| 183 | - | |
| 184 | -if __name__ == "__main__": | |
| 185 | - main() | |
| 186 | - |
| ... | ... | @@ -0,0 +1,159 @@ |
| 1 | +import pickle | |
| 2 | +from typing import Any, Dict, List, Optional | |
| 3 | + | |
| 4 | +import numpy as np | |
| 5 | + | |
| 6 | +from config import ( | |
| 7 | + FunctionScoreConfig, | |
| 8 | + IndexConfig, | |
| 9 | + QueryConfig, | |
| 10 | + RankingConfig, | |
| 11 | + RerankConfig, | |
| 12 | + SPUConfig, | |
| 13 | + SearchConfig, | |
| 14 | +) | |
| 15 | +from embeddings.text_encoder import TextEmbeddingEncoder | |
| 16 | +from query import QueryParser | |
| 17 | + | |
| 18 | + | |
| 19 | +class _FakeRedis: | |
| 20 | + def __init__(self): | |
| 21 | + self.store: Dict[str, bytes] = {} | |
| 22 | + | |
| 23 | + def ping(self): | |
| 24 | + return True | |
| 25 | + | |
| 26 | + def get(self, key: str): | |
| 27 | + return self.store.get(key) | |
| 28 | + | |
| 29 | + def setex(self, key: str, _expire, value: bytes): | |
| 30 | + self.store[key] = value | |
| 31 | + return True | |
| 32 | + | |
| 33 | + def expire(self, key: str, _expire): | |
| 34 | + return key in self.store | |
| 35 | + | |
| 36 | + def delete(self, key: str): | |
| 37 | + self.store.pop(key, None) | |
| 38 | + return True | |
| 39 | + | |
| 40 | + | |
| 41 | +class _FakeResponse: | |
| 42 | + def __init__(self, payload: List[Optional[List[float]]]): | |
| 43 | + self._payload = payload | |
| 44 | + | |
| 45 | + def raise_for_status(self): | |
| 46 | + return None | |
| 47 | + | |
| 48 | + def json(self): | |
| 49 | + return self._payload | |
| 50 | + | |
| 51 | + | |
| 52 | +class _FakeTranslator: | |
| 53 | + def translate( | |
| 54 | + self, | |
| 55 | + text: str, | |
| 56 | + target_lang: str, | |
| 57 | + source_lang: Optional[str] = None, | |
| 58 | + prompt: Optional[str] = None, | |
| 59 | + ) -> str: | |
| 60 | + return f"{text}-{target_lang}" | |
| 61 | + | |
| 62 | + | |
| 63 | +class _FakeQueryEncoder: | |
| 64 | + def encode(self, sentences, **kwargs): | |
| 65 | + if isinstance(sentences, str): | |
| 66 | + sentences = [sentences] | |
| 67 | + return np.array([np.array([0.11, 0.22, 0.33], dtype=np.float32) for _ in sentences], dtype=object) | |
| 68 | + | |
| 69 | + | |
| 70 | +def _build_test_config() -> SearchConfig: | |
| 71 | + return SearchConfig( | |
| 72 | + field_boosts={"title.en": 3.0}, | |
| 73 | + indexes=[IndexConfig(name="default", label="default", fields=["title.en"], boost=1.0)], | |
| 74 | + query_config=QueryConfig( | |
| 75 | + supported_languages=["en", "zh"], | |
| 76 | + default_language="en", | |
| 77 | + enable_text_embedding=True, | |
| 78 | + enable_query_rewrite=False, | |
| 79 | + enable_multilang_search=True, | |
| 80 | + rewrite_dictionary={}, | |
| 81 | + translation_prompts={"query_zh": "e-commerce domain", "query_en": "e-commerce domain"}, | |
| 82 | + text_embedding_field="title_embedding", | |
| 83 | + image_embedding_field=None, | |
| 84 | + ), | |
| 85 | + ranking=RankingConfig(expression="bm25()", description="test"), | |
| 86 | + function_score=FunctionScoreConfig(), | |
| 87 | + rerank=RerankConfig(), | |
| 88 | + spu_config=SPUConfig(enabled=True, spu_field="spu_id", inner_hits_size=3), | |
| 89 | + es_index_name="test_products", | |
| 90 | + tenant_config={}, | |
| 91 | + es_settings={}, | |
| 92 | + services={}, | |
| 93 | + ) | |
| 94 | + | |
| 95 | + | |
| 96 | +def test_text_embedding_encoder_response_alignment(monkeypatch): | |
| 97 | + fake_redis = _FakeRedis() | |
| 98 | + monkeypatch.setattr("embeddings.text_encoder.redis.Redis", lambda **kwargs: fake_redis) | |
| 99 | + | |
| 100 | + def _fake_post(url, json, timeout): | |
| 101 | + assert url.endswith("/embed/text") | |
| 102 | + assert json == ["hello", "world"] | |
| 103 | + return _FakeResponse([[0.1, 0.2], None]) | |
| 104 | + | |
| 105 | + monkeypatch.setattr("embeddings.text_encoder.requests.post", _fake_post) | |
| 106 | + | |
| 107 | + encoder = TextEmbeddingEncoder(service_url="http://127.0.0.1:6005") | |
| 108 | + out = encoder.encode(["hello", "world"]) | |
| 109 | + | |
| 110 | + assert len(out) == 2 | |
| 111 | + assert isinstance(out[0], np.ndarray) | |
| 112 | + assert out[0].shape == (2,) | |
| 113 | + assert out[1] is None | |
| 114 | + | |
| 115 | + | |
| 116 | +def test_text_embedding_encoder_cache_hit(monkeypatch): | |
| 117 | + fake_redis = _FakeRedis() | |
| 118 | + cached = np.array([0.9, 0.8], dtype=np.float32) | |
| 119 | + fake_redis.store["embedding:generic:cached-text"] = pickle.dumps(cached) | |
| 120 | + monkeypatch.setattr("embeddings.text_encoder.redis.Redis", lambda **kwargs: fake_redis) | |
| 121 | + | |
| 122 | + calls = {"count": 0} | |
| 123 | + | |
| 124 | + def _fake_post(url, json, timeout): | |
| 125 | + calls["count"] += 1 | |
| 126 | + return _FakeResponse([[0.3, 0.4]]) | |
| 127 | + | |
| 128 | + monkeypatch.setattr("embeddings.text_encoder.requests.post", _fake_post) | |
| 129 | + | |
| 130 | + encoder = TextEmbeddingEncoder(service_url="http://127.0.0.1:6005") | |
| 131 | + out = encoder.encode(["cached-text", "new-text"]) | |
| 132 | + | |
| 133 | + assert calls["count"] == 1 | |
| 134 | + assert np.allclose(out[0], cached) | |
| 135 | + assert np.allclose(out[1], np.array([0.3, 0.4], dtype=np.float32)) | |
| 136 | + | |
| 137 | + | |
| 138 | +def test_query_parser_generates_query_vector_with_encoder(): | |
| 139 | + parser = QueryParser( | |
| 140 | + config=_build_test_config(), | |
| 141 | + text_encoder=_FakeQueryEncoder(), | |
| 142 | + translator=_FakeTranslator(), | |
| 143 | + ) | |
| 144 | + | |
| 145 | + parsed = parser.parse("red dress", tenant_id="162", generate_vector=True) | |
| 146 | + assert parsed.query_vector is not None | |
| 147 | + assert parsed.query_vector.shape == (3,) | |
| 148 | + | |
| 149 | + | |
| 150 | +def test_query_parser_skips_query_vector_when_disabled(): | |
| 151 | + parser = QueryParser( | |
| 152 | + config=_build_test_config(), | |
| 153 | + text_encoder=_FakeQueryEncoder(), | |
| 154 | + translator=_FakeTranslator(), | |
| 155 | + ) | |
| 156 | + | |
| 157 | + parsed = parser.parse("red dress", tenant_id="162", generate_vector=False) | |
| 158 | + assert parsed.query_vector is None | |
| 159 | + | ... | ... |
third-party/xinference/activate.sh deleted
third-party/xinference/perfermance_test.py deleted
| ... | ... | @@ -1,249 +0,0 @@ |
| 1 | -import openai | |
| 2 | -import time | |
| 3 | -import statistics | |
| 4 | -from concurrent.futures import ThreadPoolExecutor, as_completed | |
| 5 | -from typing import List, Dict, Tuple | |
| 6 | -import json | |
| 7 | - | |
| 8 | - | |
| 9 | -class EmbeddingPerformanceTester: | |
| 10 | - def __init__(self, base_url: str = "http://127.0.0.1:9997/v1"): | |
| 11 | - """Initialize the performance tester""" | |
| 12 | - self.client = openai.Client( | |
| 13 | - api_key="cannot be empty", # replace with a real API key in actual use | |
| 14 | - base_url=base_url | |
| 15 | - ) | |
| 16 | - | |
| 17 | - def test_single_request(self, model: str, input_text: List[str]) -> Tuple[bool, float]: | |
| 18 | - """Run a single request; return success status and elapsed time""" | |
| 19 | - try: | |
| 20 | - start_time = time.perf_counter() | |
| 21 | - response = self.client.embeddings.create( | |
| 22 | - model=model, | |
| 23 | - input=input_text | |
| 24 | - ) | |
| 25 | - end_time = time.perf_counter() | |
| 26 | - | |
| 27 | - # Validate the response format | |
| 28 | - if response and hasattr(response, 'data'): | |
| 29 | - return True, end_time - start_time | |
| 30 | - else: | |
| 31 | - return False, end_time - start_time | |
| 32 | - | |
| 33 | - except Exception as e: | |
| 34 | - print(f"Request failed - model {model}: {str(e)}") | |
| 35 | - return False, 0.0 | |
| 36 | - | |
| 37 | - def test_model_sequential(self, model: str, input_text: List[str], | |
| 38 | - iterations: int = 1000) -> Dict: | |
| 39 | - """Run the performance test sequentially""" | |
| 40 | - print(f"\nStarting sequential test for model: {model}") | |
| 41 | - print(f"Iterations: {iterations}") | |
| 42 | - | |
| 43 | - successes = 0 | |
| 44 | - failures = 0 | |
| 45 | - latencies = [] | |
| 46 | - | |
| 47 | - for i in range(iterations): | |
| 48 | - if i % 100 == 0 and i > 0: | |
| 49 | - print(f" Completed {i}/{iterations} requests...") | |
| 50 | - | |
| 51 | - success, latency = self.test_single_request(model, input_text) | |
| 52 | - | |
| 53 | - if success: | |
| 54 | - successes += 1 | |
| 55 | - latencies.append(latency) | |
| 56 | - else: | |
| 57 | - failures += 1 | |
| 58 | - | |
| 59 | - return self._calculate_stats(model, successes, failures, latencies) | |
| 60 | - | |
| 61 | - def test_model_concurrent(self, model: str, input_text: List[str], | |
| 62 | - iterations: int = 1000, max_workers: int = 10) -> Dict: | |
| 63 | - """Run the performance test concurrently""" | |
| 64 | - print(f"\nStarting concurrent test for model: {model}") | |
| 65 | - print(f"Iterations: {iterations}, concurrency: {max_workers}") | |
| 66 | - | |
| 67 | - successes = 0 | |
| 68 | - failures = 0 | |
| 69 | - latencies = [] | |
| 70 | - | |
| 71 | - with ThreadPoolExecutor(max_workers=max_workers) as executor: | |
| 72 | - # Submit all tasks | |
| 73 | - future_to_request = { | |
| 74 | - executor.submit(self.test_single_request, model, input_text): i | |
| 75 | - for i in range(iterations) | |
| 76 | - } | |
| 77 | - | |
| 78 | - # Collect results | |
| 79 | - completed = 0 | |
| 80 | - for future in as_completed(future_to_request): | |
| 81 | - completed += 1 | |
| 82 | - if completed % 100 == 0: | |
| 83 | - print(f" Completed {completed}/{iterations} requests...") | |
| 84 | - | |
| 85 | - try: | |
| 86 | - success, latency = future.result() | |
| 87 | - if success: | |
| 88 | - successes += 1 | |
| 89 | - latencies.append(latency) | |
| 90 | - else: | |
| 91 | - failures += 1 | |
| 92 | - except Exception as e: | |
| 93 | - print(f"Request exception: {str(e)}") | |
| 94 | - failures += 1 | |
| 95 | - | |
| 96 | - return self._calculate_stats(model, successes, failures, latencies) | |
| 97 | - | |
| 98 | - def _calculate_stats(self, model: str, successes: int, | |
| 99 | - failures: int, latencies: List[float]) -> Dict: | |
| 100 | - """Compute performance statistics""" | |
| 101 | - if not latencies: | |
| 102 | - return { | |
| 103 | - "model": model, | |
| 104 | - "total_requests": successes + failures, | |
| 105 | - "successful_requests": successes, | |
| 106 | - "failed_requests": failures, | |
| 107 | - "success_rate": 0.0, | |
| 108 | - "error": "no successful requests" | |
| 109 | - } | |
| 110 | - | |
| 111 | - stats = { | |
| 112 | - "model": model, | |
| 113 | - "total_requests": successes + failures, | |
| 114 | - "successful_requests": successes, | |
| 115 | - "failed_requests": failures, | |
| 116 | - "success_rate": successes / (successes + failures) * 100, | |
| 117 | - "total_time": sum(latencies), | |
| 118 | - "avg_latency": statistics.mean(latencies), | |
| 119 | - "min_latency": min(latencies), | |
| 120 | - "max_latency": max(latencies), | |
| 121 | - "p50_latency": statistics.median(latencies), | |
| 122 | - "p95_latency": sorted(latencies)[int(len(latencies) * 0.95)], | |
| 123 | - "p99_latency": sorted(latencies)[int(len(latencies) * 0.99)], | |
| 124 | - "requests_per_second": len(latencies) / sum(latencies) if sum(latencies) > 0 else 0 | |
| 125 | - } | |
| 126 | - | |
| 127 | - # Add the standard deviation (when there is more than one sample) | |
| 128 | - if len(latencies) > 1: | |
| 129 | - stats["std_dev"] = statistics.stdev(latencies) | |
| 130 | - | |
| 131 | - return stats | |
| 132 | - | |
| 133 | - def print_results(self, results: Dict): | |
| 134 | - """Print the test results""" | |
| 135 | - print("\n" + "="*60) | |
| 136 | - print(f"Performance results - {results['model']}") | |
| 137 | - print("="*60) | |
| 138 | - | |
| 139 | - if "error" in results: | |
| 140 | - print(f"Error: {results['error']}") | |
| 141 | - return | |
| 142 | - | |
| 143 | - print(f"Total requests: {results['total_requests']}") | |
| 144 | - print(f"Successful requests: {results['successful_requests']}") | |
| 145 | - print(f"Failed requests: {results['failed_requests']}") | |
| 146 | - print(f"Success rate: {results['success_rate']:.2f}%") | |
| 147 | - print(f"Total time: {results['total_time']:.4f}s") | |
| 148 | - print(f"Average latency: {results['avg_latency']:.4f}s") | |
| 149 | - print(f"Min latency: {results['min_latency']:.4f}s") | |
| 150 | - print(f"Max latency: {results['max_latency']:.4f}s") | |
| 151 | - print(f"P50 latency: {results['p50_latency']:.4f}s") | |
| 152 | - print(f"P95 latency: {results['p95_latency']:.4f}s") | |
| 153 | - print(f"P99 latency: {results['p99_latency']:.4f}s") | |
| 154 | - | |
| 155 | - if "std_dev" in results: | |
| 156 | - print(f"Std dev: {results['std_dev']:.4f}s") | |
| 157 | - | |
| 158 | - print(f"QPS: {results['requests_per_second']:.2f} requests/s") | |
| 159 | - print("="*60) | |
| 160 | - | |
| 161 | - def save_results(self, results_list: List[Dict], filename: str = "performance_results.json"): | |
| 162 | - """Save the test results to a JSON file""" | |
| 163 | - with open(filename, 'w', encoding='utf-8') as f: | |
| 164 | - json.dump(results_list, f, indent=2, ensure_ascii=False) | |
| 165 | - print(f"\nResults saved to: {filename}") | |
| 166 | - | |
| 167 | - | |
| 168 | -def main(): | |
| 169 | - """Entry point""" | |
| 170 | - # Initialize the tester | |
| 171 | - tester = EmbeddingPerformanceTester() | |
| 172 | - | |
| 173 | - # Test configuration | |
| 174 | - test_input = ["What is the capital of China?"] | |
| 175 | - iterations = 1000 | |
| 176 | - test_models = ['bge-m3', 'Qwen3-Embedding-0.6B'] | |
| 177 | - | |
| 178 | - print("="*60) | |
| 179 | - print("Embedding API performance test") | |
| 180 | - print("="*60) | |
| 181 | - | |
| 182 | - all_results = [] | |
| 183 | - | |
| 184 | - # Choose a test mode | |
| 185 | - print("\nSelect a test mode:") | |
| 186 | - print("1. Sequential") | |
| 187 | - print("2. Concurrent") | |
| 188 | - print("3. Run both modes") | |
| 189 | - | |
| 190 | - mode = input("Enter your choice (1/2/3, default 1): ").strip() | |
| 191 | - | |
| 192 | - for model in test_models: | |
| 193 | - print(f"\n{'='*60}") | |
| 194 | - print(f"Testing model: {model}") | |
| 195 | - print(f"{'='*60}") | |
| 196 | - | |
| 197 | - if mode in ['2', '3']: | |
| 198 | - # Concurrent test | |
| 199 | - concurrent_results = tester.test_model_concurrent( | |
| 200 | - model=model, | |
| 201 | - input_text=test_input, | |
| 202 | - iterations=iterations, | |
| 203 | - max_workers=10 # adjust concurrency as needed | |
| 204 | - ) | |
| 205 | - tester.print_results(concurrent_results) | |
| 206 | - concurrent_results["test_mode"] = "concurrent" | |
| 207 | - all_results.append(concurrent_results) | |
| 208 | - | |
| 209 | - if mode in ['1', '3'] or not mode: | |
| 210 | - # Sequential test | |
| 211 | - sequential_results = tester.test_model_sequential( | |
| 212 | - model=model, | |
| 213 | - input_text=test_input, | |
| 214 | - iterations=iterations | |
| 215 | - ) | |
| 216 | - tester.print_results(sequential_results) | |
| 217 | - sequential_results["test_mode"] = "sequential" | |
| 218 | - all_results.append(sequential_results) | |
| 219 | - | |
| 220 | - # Save the results | |
| 221 | - tester.save_results(all_results) | |
| 222 | - | |
| 223 | - # Summary comparison | |
| 224 | - print("\n" + "="*60) | |
| 225 | - print("Performance test summary") | |
| 226 | - print("="*60) | |
| 227 | - | |
| 228 | - for result in all_results: | |
| 229 | - if "error" not in result: | |
| 230 | - print(f"\nModel: {result['model']} ({result['test_mode']})") | |
| 231 | - print(f" QPS: {result['requests_per_second']:.2f}") | |
| 232 | - print(f" Average latency: {result['avg_latency']:.4f}s") | |
| 233 | - print(f" Success rate: {result['success_rate']:.2f}%") | |
| 234 | - | |
| 235 | - | |
| 236 | -if __name__ == "__main__": | |
| 237 | - # A simple health check first | |
| 238 | - try: | |
| 239 | - tester = EmbeddingPerformanceTester() | |
| 240 | - # Quick connectivity check | |
| 241 | - test_result = tester.test_single_request('bge-m3', ["test"]) | |
| 242 | - if test_result[0]: | |
| 243 | - print("API connection OK, starting performance test...") | |
| 244 | - main() | |
| 245 | - else: | |
| 246 | - print("API connection failed; check that the service is running") | |
| 247 | - except Exception as e: | |
| 248 | - print(f"Initialization failed: {str(e)}") | |
| 249 | - print("Make sure the OpenAI client is installed: pip install openai") |
third-party/xinference/perfermance_test_http.py deleted
| ... | ... | @@ -1,265 +0,0 @@ |
| 1 | -import requests | |
| 2 | -import time | |
| 3 | -import statistics | |
| 4 | -from concurrent.futures import ThreadPoolExecutor, as_completed | |
| 5 | -from typing import List, Dict, Tuple | |
| 6 | -import json | |
| 7 | - | |
| 8 | - | |
| 9 | -class EmbeddingPerformanceTester: | |
| 10 | - def __init__(self, base_url: str = "http://127.0.0.1:9997/v1"): | |
| 11 | - """Initialize the performance tester""" | |
| 12 | - self.base_url = base_url | |
| 13 | - self.embeddings_url = f"{base_url}/embeddings" | |
| 14 | - | |
| 15 | - def test_single_request(self, model: str, input_text: List[str]) -> Tuple[bool, float]: | |
| 16 | - """Run a single request; return success status and elapsed time""" | |
| 17 | - try: | |
| 18 | - start_time = time.perf_counter() | |
| 19 | - | |
| 20 | - # Build the request body | |
| 21 | - # If input_text is a list, take its first element; if it is a string, use it as-is | |
| 22 | - input_value = input_text[0] if isinstance(input_text, list) and len(input_text) > 0 else input_text | |
| 23 | - | |
| 24 | - response = requests.post( | |
| 25 | - self.embeddings_url, | |
| 26 | - headers={ | |
| 27 | - 'accept': 'application/json', | |
| 28 | - 'Content-Type': 'application/json' | |
| 29 | - }, | |
| 30 | - json={ | |
| 31 | - "model": model, | |
| 32 | - "input": input_value | |
| 33 | - }, | |
| 34 | - timeout=60 # request timeout | |
| 35 | - ) | |
| 36 | - | |
| 37 | - end_time = time.perf_counter() | |
| 38 | - | |
| 39 | - # Validate the response format | |
| 40 | - if response.status_code == 200: | |
| 41 | - result = response.json() | |
| 42 | - if result and 'data' in result and len(result['data']) > 0: | |
| 43 | - return True, end_time - start_time | |
| 44 | - else: | |
| 45 | - return False, end_time - start_time | |
| 46 | - else: | |
| 47 | - return False, end_time - start_time | |
| 48 | - | |
| 49 | - except Exception as e: | |
| 50 | - print(f"Request failed - model {model}: {str(e)}") | |
| 51 | - return False, 0.0 | |
| 52 | - | |
| 53 | - def test_model_sequential(self, model: str, input_text: List[str], | |
| 54 | - iterations: int = 1000) -> Dict: | |
| 55 | - """Run the performance test sequentially""" | |
| 56 | - print(f"\nStarting sequential test for model: {model}") | |
| 57 | - print(f"Iterations: {iterations}") | |
| 58 | - | |
| 59 | - successes = 0 | |
| 60 | - failures = 0 | |
| 61 | - latencies = [] | |
| 62 | - | |
| 63 | - for i in range(iterations): | |
| 64 | - if i % 100 == 0 and i > 0: | |
| 65 | - print(f" Completed {i}/{iterations} requests...") | |
| 66 | - | |
| 67 | - success, latency = self.test_single_request(model, input_text) | |
| 68 | - | |
| 69 | - if success: | |
| 70 | - successes += 1 | |
| 71 | - latencies.append(latency) | |
| 72 | - else: | |
| 73 | - failures += 1 | |
| 74 | - | |
| 75 | - return self._calculate_stats(model, successes, failures, latencies) | |
| 76 | - | |
| 77 | - def test_model_concurrent(self, model: str, input_text: List[str], | |
| 78 | - iterations: int = 1000, max_workers: int = 10) -> Dict: | |
| 79 | - """Run the performance test concurrently""" | |
| 80 | - print(f"\nStarting concurrent test for model: {model}") | |
| 81 | - print(f"Iterations: {iterations}, concurrency: {max_workers}") | |
| 82 | - | |
| 83 | - successes = 0 | |
| 84 | - failures = 0 | |
| 85 | - latencies = [] | |
| 86 | - | |
| 87 | - with ThreadPoolExecutor(max_workers=max_workers) as executor: | |
| 88 | - # Submit all tasks | |
| 89 | - future_to_request = { | |
| 90 | - executor.submit(self.test_single_request, model, input_text): i | |
| 91 | - for i in range(iterations) | |
| 92 | - } | |
| 93 | - | |
| 94 | - # Collect results | |
| 95 | - completed = 0 | |
| 96 | - for future in as_completed(future_to_request): | |
| 97 | - completed += 1 | |
| 98 | - if completed % 100 == 0: | |
| 99 | - print(f" Completed {completed}/{iterations} requests...") | |
| 100 | - | |
| 101 | - try: | |
| 102 | - success, latency = future.result() | |
| 103 | - if success: | |
| 104 | - successes += 1 | |
| 105 | - latencies.append(latency) | |
| 106 | - else: | |
| 107 | - failures += 1 | |
| 108 | - except Exception as e: | |
| 109 | - print(f"Request exception: {str(e)}") | |
| 110 | - failures += 1 | |
| 111 | - | |
| 112 | - return self._calculate_stats(model, successes, failures, latencies) | |
| 113 | - | |
| 114 | - def _calculate_stats(self, model: str, successes: int, | |
| 115 | - failures: int, latencies: List[float]) -> Dict: | |
| 116 | - """Compute performance statistics""" | |
| 117 | - if not latencies: | |
| 118 | - return { | |
| 119 | - "model": model, | |
| 120 | - "total_requests": successes + failures, | |
| 121 | - "successful_requests": successes, | |
| 122 | - "failed_requests": failures, | |
| 123 | - "success_rate": 0.0, | |
| 124 | - "error": "no successful requests" | |
| 125 | - } | |
| 126 | - | |
| 127 | - stats = { | |
| 128 | - "model": model, | |
| 129 | - "total_requests": successes + failures, | |
| 130 | - "successful_requests": successes, | |
| 131 | - "failed_requests": failures, | |
| 132 | - "success_rate": successes / (successes + failures) * 100, | |
| 133 | - "total_time": sum(latencies), | |
| 134 | - "avg_latency": statistics.mean(latencies), | |
| 135 | - "min_latency": min(latencies), | |
| 136 | - "max_latency": max(latencies), | |
| 137 | - "p50_latency": statistics.median(latencies), | |
| 138 | - "p95_latency": sorted(latencies)[int(len(latencies) * 0.95)], | |
| 139 | - "p99_latency": sorted(latencies)[int(len(latencies) * 0.99)], | |
| 140 | - "requests_per_second": len(latencies) / sum(latencies) if sum(latencies) > 0 else 0 | |
| 141 | - } | |
| 142 | - | |
| 143 | - # Add the standard deviation (when there is more than one sample) | |
| 144 | - if len(latencies) > 1: | |
| 145 | - stats["std_dev"] = statistics.stdev(latencies) | |
| 146 | - | |
| 147 | - return stats | |
| 148 | - | |
| 149 | - def print_results(self, results: Dict): | |
| 150 | - """Print the test results""" | |
| 151 | - print("\n" + "="*60) | |
| 152 | - print(f"Performance results - {results['model']}") | |
| 153 | - print("="*60) | |
| 154 | - | |
| 155 | - if "error" in results: | |
| 156 | - print(f"Error: {results['error']}") | |
| 157 | - return | |
| 158 | - | |
| 159 | - print(f"Total requests: {results['total_requests']}") | |
| 160 | - print(f"Successful requests: {results['successful_requests']}") | |
| 161 | - print(f"Failed requests: {results['failed_requests']}") | |
| 162 | - print(f"Success rate: {results['success_rate']:.2f}%") | |
| 163 | - print(f"Total time: {results['total_time']:.4f}s") | |
| 164 | - print(f"Average latency: {results['avg_latency']:.4f}s") | |
| 165 | - print(f"Min latency: {results['min_latency']:.4f}s") | |
| 166 | - print(f"Max latency: {results['max_latency']:.4f}s") | |
| 167 | - print(f"P50 latency: {results['p50_latency']:.4f}s") | |
| 168 | - print(f"P95 latency: {results['p95_latency']:.4f}s") | |
| 169 | - print(f"P99 latency: {results['p99_latency']:.4f}s") | |
| 170 | - | |
| 171 | - if "std_dev" in results: | |
| 172 | - print(f"Std dev: {results['std_dev']:.4f}s") | |
| 173 | - | |
| 174 | - print(f"QPS: {results['requests_per_second']:.2f} requests/s") | |
| 175 | - print("="*60) | |
| 176 | - | |
| 177 | - def save_results(self, results_list: List[Dict], filename: str = "performance_results.json"): | |
| 178 | - """Save the test results to a JSON file""" | |
| 179 | - with open(filename, 'w', encoding='utf-8') as f: | |
| 180 | - json.dump(results_list, f, indent=2, ensure_ascii=False) | |
| 181 | - print(f"\nResults saved to: {filename}") | |
| 182 | - | |
| 183 | - | |
| 184 | -def main(): | |
| 185 | - """Entry point""" | |
| 186 | - # Initialize the tester | |
| 187 | - tester = EmbeddingPerformanceTester() | |
| 188 | - | |
| 189 | - # Test configuration | |
| 190 | - test_input = ["What is the capital of China?"] | |
| 191 | - iterations = 1000 | |
| 192 | - test_models = ['bge-m3', 'Qwen3-Embedding-0.6B'] | |
| 193 | - | |
| 194 | - print("="*60) | |
| 195 | - print("Embedding API performance test (HTTP)") | |
| 196 | - print("="*60) | |
| 197 | - | |
| 198 | - all_results = [] | |
| 199 | - | |
| 200 | - # Choose a test mode | |
| 201 | - print("\nSelect a test mode:") | |
| 202 | - print("1. Sequential") | |
| 203 | - print("2. Concurrent") | |
| 204 | - print("3. Run both modes") | |
| 205 | - | |
| 206 | - mode = input("Enter your choice (1/2/3, default 1): ").strip() | |
| 207 | - | |
| 208 | - for model in test_models: | |
| 209 | - print(f"\n{'='*60}") | |
| 210 | - print(f"Testing model: {model}") | |
| 211 | - print(f"{'='*60}") | |
| 212 | - | |
| 213 | - if mode in ['2', '3']: | |
| 214 | - # Concurrent test | |
| 215 | - concurrent_results = tester.test_model_concurrent( | |
| 216 | - model=model, | |
| 217 | - input_text=test_input, | |
| 218 | - iterations=iterations, | |
| 219 | - max_workers=10 # adjust concurrency as needed | |
| 220 | - ) | |
| 221 | - tester.print_results(concurrent_results) | |
| 222 | - concurrent_results["test_mode"] = "concurrent" | |
| 223 | - all_results.append(concurrent_results) | |
| 224 | - | |
| 225 | - if mode in ['1', '3'] or not mode: | |
| 226 | - # Sequential test | |
| 227 | - sequential_results = tester.test_model_sequential( | |
| 228 | - model=model, | |
| 229 | - input_text=test_input, | |
| 230 | - iterations=iterations | |
| 231 | - ) | |
| 232 | - tester.print_results(sequential_results) | |
| 233 | - sequential_results["test_mode"] = "sequential" | |
| 234 | - all_results.append(sequential_results) | |
| 235 | - | |
| 236 | - # Save the results | |
| 237 | - tester.save_results(all_results) | |
| 238 | - | |
| 239 | - # Summary comparison | |
| 240 | - print("\n" + "="*60) | |
| 241 | - print("Performance test summary") | |
| 242 | - print("="*60) | |
| 243 | - | |
| 244 | - for result in all_results: | |
| 245 | - if "error" not in result: | |
| 246 | - print(f"\nModel: {result['model']} ({result['test_mode']})") | |
| 247 | - print(f" QPS: {result['requests_per_second']:.2f}") | |
| 248 | - print(f" Average latency: {result['avg_latency']:.4f}s") | |
| 249 | - print(f" Success rate: {result['success_rate']:.2f}%") | |
| 250 | - | |
| 251 | - | |
| 252 | -if __name__ == "__main__": | |
| 253 | - # A simple health check first | |
| 254 | - try: | |
| 255 | - tester = EmbeddingPerformanceTester() | |
| 256 | - # Quick connectivity check | |
| 257 | - test_result = tester.test_single_request('bge-m3', ["test"]) | |
| 258 | - if test_result[0]: | |
| 259 | - print("API connection OK, starting performance test...") | |
| 260 | - main() | |
| 261 | - else: | |
| 262 | - print("API connection failed; check that the service is running") | |
| 263 | - except Exception as e: | |
| 264 | - print(f"Initialization failed: {str(e)}") | |
| 265 | - print("Make sure the requests library is installed: pip install requests") | |
| 266 | 0 | \ No newline at end of file |
third-party/xinference/perfermance_test_single.py deleted
| ... | ... | @@ -1,108 +0,0 @@ |
| 1 | -import openai | |
| 2 | -import time | |
| 3 | -import requests | |
| 4 | -import json | |
| 5 | - | |
| 6 | - | |
| 7 | -client = openai.Client( | |
| 8 | - api_key="cannot be empty", | |
| 9 | - base_url="http://127.0.0.1:9997/v1" | |
| 10 | -) | |
| 11 | - | |
| 12 | -# Record the start time | |
| 13 | -start_time = time.time() | |
| 14 | - | |
| 15 | -a = client.embeddings.create( | |
| 16 | - model='bge-m3', | |
| 17 | - input=["What is the capital of China?"] | |
| 18 | -) | |
| 19 | - | |
| 20 | -# Record the end time | |
| 21 | -end_time = time.time() | |
| 22 | - | |
| 23 | -#print(a) | |
| 24 | -print(f"\nElapsed: {end_time - start_time:.4f} s") | |
| 25 | - | |
| 26 | -# Record the start time | |
| 27 | -start_time = time.time() | |
| 28 | - | |
| 29 | -a = client.embeddings.create( | |
| 30 | - model='Qwen3-Embedding-0.6B', | |
| 31 | - input=["What is the capital of China?"] | |
| 32 | -) | |
| 33 | - | |
| 34 | -# Record the end time | |
| 35 | -end_time = time.time() | |
| 36 | - | |
| 37 | -#print(a) | |
| 38 | -print(f"\nElapsed: {end_time - start_time:.4f} s") | |
| 39 | - | |
| 40 | -# ========== HTTP API test ========== | |
| 41 | -print("\n" + "="*50) | |
| 42 | -print("HTTP API test") | |
| 43 | -print("="*50) | |
| 44 | - | |
| 45 | -# Configuration | |
| 46 | -XINFERENCE_HOST = "127.0.0.1" | |
| 47 | -XINFERENCE_PORT = "9997" | |
| 48 | -base_url = f"http://{XINFERENCE_HOST}:{XINFERENCE_PORT}/v1/embeddings" | |
| 49 | - | |
| 50 | -# Test the bge-m3 model | |
| 51 | -print("\nTesting model: bge-m3") | |
| 52 | -start_time = time.time() | |
| 53 | - | |
| 54 | -response = requests.post( | |
| 55 | - base_url, | |
| 56 | - headers={ | |
| 57 | - 'accept': 'application/json', | |
| 58 | - 'Content-Type': 'application/json' | |
| 59 | - }, | |
| 60 | - json={ | |
| 61 | - "model": "bge-m3", | |
| 62 | - "input": "What is the capital of China?" | |
| 63 | - } | |
| 64 | -) | |
| 65 | - | |
| 66 | -end_time = time.time() | |
| 67 | - | |
| 68 | -if response.status_code == 200: | |
| 69 | - result = response.json() | |
| 70 | - print(f"Status code: {response.status_code}") | |
| 71 | - print(f"Model: {result.get('model', 'N/A')}") | |
| 72 | - print(f"Tokens used: {result.get('usage', {}).get('total_tokens', 'N/A')}") | |
| 73 | - print(f"Embedding dimension: {len(result.get('data', [{}])[0].get('embedding', []))}") | |
| 74 | - print(f"Elapsed: {end_time - start_time:.4f} s") | |
| 75 | -else: | |
| 76 | - print(f"Request failed, status code: {response.status_code}") | |
| 77 | - print(f"Error message: {response.text}") | |
| 78 | - | |
| 79 | -# Test the Qwen3-Embedding-0.6B model | |
| 80 | -print("\nTesting model: Qwen3-Embedding-0.6B") | |
| 81 | -start_time = time.time() | |
| 82 | - | |
| 83 | -response = requests.post( | |
| 84 | - base_url, | |
| 85 | - headers={ | |
| 86 | - 'accept': 'application/json', | |
| 87 | - 'Content-Type': 'application/json' | |
| 88 | - }, | |
| 89 | - json={ | |
| 90 | - "model": "Qwen3-Embedding-0.6B", | |
| 91 | - "input": "What is the capital of China?" | |
| 92 | - } | |
| 93 | -) | |
| 94 | - | |
| 95 | -end_time = time.time() | |
| 96 | - | |
| 97 | -if response.status_code == 200: | |
| 98 | - result = response.json() | |
| 99 | - print(f"Status code: {response.status_code}") | |
| 100 | - print(f"Model: {result.get('model', 'N/A')}") | |
| 101 | - print(f"Tokens used: {result.get('usage', {}).get('total_tokens', 'N/A')}") | |
| 102 | - print(f"Embedding dimension: {len(result.get('data', [{}])[0].get('embedding', []))}") | |
| 103 | - print(f"Elapsed: {end_time - start_time:.4f} s") | |
| 104 | -else: | |
| 105 | - print(f"Request failed, status code: {response.status_code}") | |
| 106 | - print(f"Error message: {response.text}") | |
| 107 | - | |
| 108 | - |
third-party/xinference/stop.sh deleted
third-party/xinference/test.sh deleted
| ... | ... | @@ -1,11 +0,0 @@ |
| 1 | -if [ "$CONDA_DEFAULT_ENV" != "tw" ]; then | |
| 2 | - echo "Current environment is not tw; activating the tw environment..." | |
| 3 | - CONDA_ROOT="${CONDA_ROOT:-/home/tw/miniconda3}" | |
| 4 | - source "$CONDA_ROOT/etc/profile.d/conda.sh" | |
| 5 | - conda activate tw | |
| 6 | - echo "Activated the tw environment" | |
| 7 | -else | |
| 8 | - echo "Already in the tw environment; no need to activate again" | |
| 9 | -fi | |
| 10 | - | |
| 11 | -python perfermance_test_single.py |
third-party/xinference/测试结果-perfermance_test.txt deleted
| ... | ... | @@ -1,166 +0,0 @@ |
| 1 | -============================================================ | |
| 2 | -Embedding API performance test | |
| 3 | -============================================================ | |
| 4 | - | |
| 5 | -Select a test mode: | |
| 6 | -1. Sequential | |
| 7 | -2. Concurrent | |
| 8 | -3. Run both modes | |
| 9 | -Enter your choice (1/2/3, default 1): 3 | |
| 10 | - | |
| 11 | -============================================================ | |
| 12 | -Testing model: bge-m3 | |
| 13 | -============================================================ | |
| 14 | - | |
| 15 | -Starting concurrent test for model: bge-m3 | |
| 16 | -Iterations: 1000, concurrency: 10 | |
| 17 | - Completed 100/1000 requests... | |
| 18 | - Completed 200/1000 requests... | |
| 19 | - Completed 300/1000 requests... | |
| 20 | - Completed 400/1000 requests... | |
| 21 | - Completed 500/1000 requests... | |
| 22 | - Completed 600/1000 requests... | |
| 23 | - Completed 700/1000 requests... | |
| 24 | - Completed 800/1000 requests... | |
| 25 | - Completed 900/1000 requests... | |
| 26 | - Completed 1000/1000 requests... | |
| 27 | - | |
| 28 | -============================================================ | |
| 29 | -Performance results - bge-m3 | |
| 30 | -============================================================ | |
| 31 | -Total requests: 1000 | |
| 32 | -Successful requests: 1000 | |
| 33 | -Failed requests: 0 | |
| 34 | -Success rate: 100.00% | |
| 35 | -Total time: 212.9068s | |
| 36 | -Average latency: 0.2129s | |
| 37 | -Min latency: 0.0507s | |
| 38 | -Max latency: 0.6196s | |
| 39 | -P50 latency: 0.0942s | |
| 40 | -P95 latency: 0.5580s | |
| 41 | -P99 latency: 0.5884s | |
| 42 | -Std dev: 0.2010s | |
| 43 | -QPS: 4.70 requests/s | |
| 44 | -============================================================ | |
| 45 | - | |
| 46 | -Starting sequential test for model: bge-m3 | |
| 47 | -Iterations: 1000 | |
| 48 | - Completed 100/1000 requests... | |
| 49 | - Completed 200/1000 requests... | |
| 50 | - Completed 300/1000 requests... | |
| 51 | - Completed 400/1000 requests... | |
| 52 | - Completed 500/1000 requests... | |
| 53 | - Completed 600/1000 requests... | |
| 54 | - Completed 700/1000 requests... | |
| 55 | - Completed 800/1000 requests... | |
| 56 | - Completed 900/1000 requests... | |
| 57 | - | |
| 58 | -============================================================ | |
| 59 | -Performance results - bge-m3 | |
| 60 | -============================================================ | |
| 61 | -Total requests: 1000 | |
| 62 | -Successful requests: 1000 | |
| 63 | -Failed requests: 0 | |
| 64 | -Success rate: 100.00% | |
| 65 | -Total time: 81.7646s | |
| 66 | -Average latency: 0.0818s | |
| 67 | -Min latency: 0.0328s | |
| 68 | -Max latency: 0.5812s | |
| 69 | -P50 latency: 0.0347s | |
| 70 | -P95 latency: 0.4893s | |
| 71 | -P99 latency: 0.5047s | |
| 72 | -Std dev: 0.1377s | |
| 73 | -QPS: 12.23 requests/s | |
| 74 | -============================================================ | |
| 75 | - | |
| 76 | -============================================================ | |
| 77 | -Test model: Qwen3-Embedding-0.6B | |
| 78 | -============================================================ | |
| 79 | - | |
| 80 | -Starting concurrent test for model: Qwen3-Embedding-0.6B | |
| 81 | -Requests: 1000, concurrency: 10 | |
| 82 | -  Completed 100/1000 requests... | |
| 83 | -  Completed 200/1000 requests... | |
| 84 | -  Completed 300/1000 requests... | |
| 85 | -  Completed 400/1000 requests... | |
| 86 | -  Completed 500/1000 requests... | |
| 87 | -  Completed 600/1000 requests... | |
| 88 | -  Completed 700/1000 requests... | |
| 89 | -  Completed 800/1000 requests... | |
| 90 | -  Completed 900/1000 requests... | |
| 91 | -  Completed 1000/1000 requests... | |
| 92 | - | |
| 93 | -============================================================ | |
| 94 | -Performance Test Results - Qwen3-Embedding-0.6B | |
| 95 | -============================================================ | |
| 96 | -Total requests: 1000 | |
| 97 | -Successful requests: 1000 | |
| 98 | -Failed requests: 0 | |
| 99 | -Success rate: 100.00% | |
| 100 | -Total time: 210.1917 s | |
| 101 | -Average latency: 0.2102 s | |
| 102 | -Min latency: 0.0651 s | |
| 103 | -Max latency: 0.6659 s | |
| 104 | -P50 latency: 0.1123 s | |
| 105 | -P95 latency: 0.5845 s | |
| 106 | -P99 latency: 0.6210 s | |
| 107 | -Std dev: 0.1877 s | |
| 108 | -QPS: 4.76 req/s | |
| 109 | -============================================================ | |
| 110 | - | |
| 111 | -Starting sequential test for model: Qwen3-Embedding-0.6B | |
| 112 | -Requests: 1000 | |
| 113 | -  Completed 100/1000 requests... | |
| 114 | - | |
| 115 | -  Completed 200/1000 requests... | |
| 116 | -  Completed 300/1000 requests... | |
| 117 | -  Completed 400/1000 requests... | |
| 118 | -  Completed 500/1000 requests... | |
| 119 | -  Completed 600/1000 requests... | |
| 120 | -  Completed 700/1000 requests... | |
| 121 | -  Completed 800/1000 requests... | |
| 122 | -  Completed 900/1000 requests... | |
| 123 | - | |
| 124 | -============================================================ | |
| 125 | -Performance Test Results - Qwen3-Embedding-0.6B | |
| 126 | -============================================================ | |
| 127 | -Total requests: 1000 | |
| 128 | -Successful requests: 1000 | |
| 129 | -Failed requests: 0 | |
| 130 | -Success rate: 100.00% | |
| 131 | -Total time: 109.9795 s | |
| 132 | -Average latency: 0.1100 s | |
| 133 | -Min latency: 0.0571 s | |
| 134 | -Max latency: 0.5806 s | |
| 135 | -P50 latency: 0.0600 s | |
| 136 | -P95 latency: 0.5648 s | |
| 137 | -P99 latency: 0.5745 s | |
| 138 | -Std dev: 0.1494 s | |
| 139 | -QPS: 9.09 req/s | |
| 140 | -============================================================ | |
| 141 | - | |
| 142 | -Results saved to: performance_results.json | |
| 143 | - | |
| 144 | -============================================================ | |
| 145 | -Performance Test Summary Comparison | |
| 146 | -============================================================ | |
| 147 | - | |
| 148 | -Model: bge-m3 (concurrent) | |
| 149 | -  QPS: 4.70 | |
| 150 | -  Average latency: 0.2129 s | |
| 151 | -  Success rate: 100.00% | |
| 152 | - | |
| 153 | -Model: bge-m3 (sequential) | |
| 154 | -  QPS: 12.23 | |
| 155 | -  Average latency: 0.0818 s | |
| 156 | -  Success rate: 100.00% | |
| 157 | - | |
| 158 | -Model: Qwen3-Embedding-0.6B (concurrent) | |
| 159 | -  QPS: 4.76 | |
| 160 | -  Average latency: 0.2102 s | |
| 161 | -  Success rate: 100.00% | |
| 162 | - | |
| 163 | -Model: Qwen3-Embedding-0.6B (sequential) | |
| 164 | -  QPS: 9.09 | |
| 165 | -  Average latency: 0.1100 s | |
| 166 | -  Success rate: 100.00%
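One consistency note on these numbers: in every run the reported total time equals requests × average latency (e.g. 1000 × 0.2129 s ≈ 212.9 s), which suggests the deleted script summed per-request latencies rather than measuring wall-clock time; that would explain why the concurrent QPS reads lower than the sequential QPS despite 10 workers. As a reference, here is a minimal sketch of how the reported statistics can be derived from a list of per-request latencies; the function name and the simple index-based percentile are illustrative assumptions, not the original implementation.

```python
import statistics

def summarize(latencies: list[float]) -> dict:
    """Aggregate per-request latencies (seconds) into the stats the log reports.

    Note: computing QPS as n / sum(latencies) reproduces the logged numbers,
    but for a concurrent run it understates true throughput; wall-clock time
    would be the better denominator.
    """
    xs = sorted(latencies)
    n = len(xs)
    pct = lambda p: xs[min(n - 1, int(n * p / 100))]  # simple index-based percentile
    total = sum(xs)
    return {
        "total_requests": n,
        "total_time_s": total,
        "avg_s": total / n,
        "min_s": xs[0],
        "max_s": xs[-1],
        "p50_s": pct(50),
        "p95_s": pct(95),
        "p99_s": pct(99),
        "stdev_s": statistics.stdev(xs),
        "qps": n / total,
    }
```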
third-party/xinference/测试结果-perfermance_test_http.txt deleted
| ... | ... | @@ -1,167 +0,0 @@ |
| 1 | -$ p perfermance_test_http.py | |
| 2 | -API connection OK, starting performance test... | |
| 3 | -============================================================ | |
| 4 | -Embedding API Performance Test (HTTP) | |
| 5 | -============================================================ | |
| 6 | - | |
| 7 | -Select test mode: | |
| 8 | -1. Sequential test | |
| 9 | -2. Concurrent test | |
| 10 | -3. Run both modes | |
| 11 | -Enter choice (1/2/3, default 1): 3 | |
| 12 | - | |
| 13 | -============================================================ | |
| 14 | -Test model: bge-m3 | |
| 15 | -============================================================ | |
| 16 | - | |
| 17 | -Starting concurrent test for model: bge-m3 | |
| 18 | -Requests: 1000, concurrency: 10 | |
| 19 | -  Completed 100/1000 requests... | |
| 20 | -  Completed 200/1000 requests... | |
| 21 | -  Completed 300/1000 requests... | |
| 22 | -  Completed 400/1000 requests... | |
| 23 | -  Completed 500/1000 requests... | |
| 24 | -  Completed 600/1000 requests... | |
| 25 | -  Completed 700/1000 requests... | |
| 26 | -  Completed 800/1000 requests... | |
| 27 | -  Completed 900/1000 requests... | |
| 28 | -  Completed 1000/1000 requests... | |
| 29 | - | |
| 30 | -============================================================ | |
| 31 | -Performance Test Results - bge-m3 | |
| 32 | -============================================================ | |
| 33 | -Total requests: 1000 | |
| 34 | -Successful requests: 1000 | |
| 35 | -Failed requests: 0 | |
| 36 | -Success rate: 100.00% | |
| 37 | -Total time: 145.1439 s | |
| 38 | -Average latency: 0.1451 s | |
| 39 | -Min latency: 0.0311 s | |
| 40 | -Max latency: 0.5770 s | |
| 41 | -P50 latency: 0.0599 s | |
| 42 | -P95 latency: 0.5151 s | |
| 43 | -P99 latency: 0.5704 s | |
| 44 | -Std dev: 0.1789 s | |
| 45 | -QPS: 6.89 req/s | |
| 46 | -============================================================ | |
| 47 | - | |
| 48 | -Starting sequential test for model: bge-m3 | |
| 49 | -Requests: 1000 | |
| 50 | -  Completed 100/1000 requests... | |
| 51 | -  Completed 200/1000 requests... | |
| 52 | -  Completed 300/1000 requests... | |
| 53 | -  Completed 400/1000 requests... | |
| 54 | -  Completed 500/1000 requests... | |
| 55 | -  Completed 600/1000 requests... | |
| 56 | -  Completed 700/1000 requests... | |
| 57 | -  Completed 800/1000 requests... | |
| 58 | -  Completed 900/1000 requests... | |
| 59 | - | |
| 60 | -============================================================ | |
| 61 | -Performance Test Results - bge-m3 | |
| 62 | -============================================================ | |
| 63 | -Total requests: 1000 | |
| 64 | -Successful requests: 1000 | |
| 65 | -Failed requests: 0 | |
| 66 | -Success rate: 100.00% | |
| 67 | -Total time: 74.5284 s | |
| 68 | -Average latency: 0.0745 s | |
| 69 | -Min latency: 0.0271 s | |
| 70 | -Max latency: 0.5767 s | |
| 71 | -P50 latency: 0.0286 s | |
| 72 | -P95 latency: 0.4797 s | |
| 73 | -P99 latency: 0.5037 s | |
| 74 | -Std dev: 0.1364 s | |
| 75 | -QPS: 13.42 req/s | |
| 76 | -============================================================ | |
| 77 | - | |
| 78 | -============================================================ | |
| 79 | -Test model: Qwen3-Embedding-0.6B | |
| 80 | -============================================================ | |
| 81 | - | |
| 82 | -Starting concurrent test for model: Qwen3-Embedding-0.6B | |
| 83 | -Requests: 1000, concurrency: 10 | |
| 84 | -  Completed 100/1000 requests... | |
| 85 | -  Completed 200/1000 requests... | |
| 86 | -  Completed 300/1000 requests... | |
| 87 | -  Completed 400/1000 requests... | |
| 88 | -  Completed 500/1000 requests... | |
| 89 | -  Completed 600/1000 requests... | |
| 90 | -  Completed 700/1000 requests... | |
| 91 | -  Completed 800/1000 requests... | |
| 92 | -  Completed 900/1000 requests... | |
| 93 | -  Completed 1000/1000 requests... | |
| 94 | - | |
| 95 | -============================================================ | |
| 96 | -Performance Test Results - Qwen3-Embedding-0.6B | |
| 97 | -============================================================ | |
| 98 | -Total requests: 1000 | |
| 99 | -Successful requests: 1000 | |
| 100 | -Failed requests: 0 | |
| 101 | -Success rate: 100.00% | |
| 102 | -Total time: 195.7997 s | |
| 103 | -Average latency: 0.1958 s | |
| 104 | -Min latency: 0.0564 s | |
| 105 | -Max latency: 0.6201 s | |
| 106 | -P50 latency: 0.1050 s | |
| 107 | -P95 latency: 0.5674 s | |
| 108 | -P99 latency: 0.5994 s | |
| 109 | -Std dev: 0.1829 s | |
| 110 | -QPS: 5.11 req/s | |
| 111 | -============================================================ | |
| 112 | - | |
| 113 | -Starting sequential test for model: Qwen3-Embedding-0.6B | |
| 114 | -Requests: 1000 | |
| 115 | -  Completed 100/1000 requests... | |
| 116 | -  Completed 200/1000 requests... | |
| 117 | -  Completed 300/1000 requests... | |
| 118 | -  Completed 400/1000 requests... | |
| 119 | -  Completed 500/1000 requests... | |
| 120 | -  Completed 600/1000 requests... | |
| 121 | -  Completed 700/1000 requests... | |
| 122 | -  Completed 800/1000 requests... | |
| 123 | -  Completed 900/1000 requests... | |
| 124 | - | |
| 125 | -============================================================ | |
| 126 | -Performance Test Results - Qwen3-Embedding-0.6B | |
| 127 | -============================================================ | |
| 128 | -Total requests: 1000 | |
| 129 | -Successful requests: 1000 | |
| 130 | -Failed requests: 0 | |
| 131 | -Success rate: 100.00% | |
| 132 | -Total time: 100.2533 s | |
| 133 | -Average latency: 0.1003 s | |
| 134 | -Min latency: 0.0513 s | |
| 135 | -Max latency: 0.6249 s | |
| 136 | -P50 latency: 0.0539 s | |
| 137 | -P95 latency: 0.4993 s | |
| 138 | -P99 latency: 0.5180 s | |
| 139 | -Std dev: 0.1354 s | |
| 140 | -QPS: 9.97 req/s | |
| 141 | -============================================================ | |
| 142 | - | |
| 143 | -Results saved to: performance_results.json | |
| 144 | - | |
| 145 | -============================================================ | |
| 146 | -Performance Test Summary Comparison | |
| 147 | -============================================================ | |
| 148 | - | |
| 149 | -Model: bge-m3 (concurrent) | |
| 150 | -  QPS: 6.89 | |
| 151 | -  Average latency: 0.1451 s | |
| 152 | -  Success rate: 100.00% | |
| 153 | - | |
| 154 | -Model: bge-m3 (sequential) | |
| 155 | -  QPS: 13.42 | |
| 156 | -  Average latency: 0.0745 s | |
| 157 | -  Success rate: 100.00% | |
| 158 | - | |
| 159 | -Model: Qwen3-Embedding-0.6B (concurrent) | |
| 160 | -  QPS: 5.11 | |
| 161 | -  Average latency: 0.1958 s | |
| 162 | -  Success rate: 100.00% | |
| 163 | - | |
| 164 | -Model: Qwen3-Embedding-0.6B (sequential) | |
| 165 | -  QPS: 9.97 | |
| 166 | -  Average latency: 0.1003 s | |
| 167 | -  Success rate: 100.00%
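Both result files above were produced by benchmark scripts that this commit deletes along with them. For readers who want to reproduce comparable numbers, here is a minimal sketch of a concurrent benchmark against an OpenAI-compatible /v1/embeddings endpoint; the URL, port, payload text, and helper names are illustrative assumptions, not the deleted perfermance_test*.py implementations.

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumed endpoint; xinference serves an OpenAI-compatible API, but check
# your deployment's actual host and port.
URL = "http://localhost:9997/v1/embeddings"

def one_request(model: str) -> float:
    """Send one embedding request and return its latency in seconds."""
    t0 = time.time()
    resp = requests.post(URL, json={"model": model, "input": "benchmark sentence"}, timeout=30)
    resp.raise_for_status()
    return time.time() - t0

def run_benchmark(model: str, n: int = 1000, workers: int = 10) -> list[float]:
    """Fire n requests with the given concurrency (workers=1 approximates sequential mode)."""
    latencies = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(one_request, model) for _ in range(n)]
        for i, fut in enumerate(as_completed(futures), 1):
            latencies.append(fut.result())
            if i % 100 == 0:
                print(f"  Completed {i}/{n} requests...")
    return latencies

if __name__ == "__main__":
    for model in ("bge-m3", "Qwen3-Embedding-0.6B"):
        lats = run_benchmark(model, n=1000, workers=10)
        print(f"{model}: avg {sum(lats) / len(lats):.4f} s over {len(lats)} requests")
```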