Recommended batch sizes for microservices

enrich: 20; text embedding: 24; image embedding: 8; translation: 24

tangwang
1 parent 6826fd31
Showing 1 changed file with 24 additions and 39 deletions
docs/搜索API对接指南-07-微服务接口（Embedding-Reranker-Translation）.md
@@ -48,10 +48,12 @@
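 Converts a list of texts into 1024-dimensional vectors for semantic search, document indexing, and similar uses.
-**Request body** (JSON array):
+**Full curl example**: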
-```json
-["文本1", "文本2", "文本3"]
+```bash
+curl -X POST "http://localhost:6005/embed/text?normalize=true&priority=1" \
+  -H "Content-Type: application/json" \
+  -d '["芭比娃娃 儿童玩具", "纯棉T恤 短袖"]'
 ```
 **Response** (JSON array, one entry per input, in the same order):
@@ -60,17 +62,11 @@
 [[0.01, -0.02, ...], [0.03, 0.01, ...], ...]
 ```
-**Full curl example**:
-
-```bash
-curl -X POST "http://localhost:6005/embed/text?normalize=true&priority=1" \
-  -H "Content-Type: application/json" \
-  -d '["芭比娃娃 儿童玩具", "纯棉T恤 短袖"]'
-```
-
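 Notes:
-- Online queries / real-time requests: explicitly passing `priority=1` is recommended
+- Online queries / real-time requests: pass `priority=1` explicitly
 - Offline indexing / bulk backfill: keep the default `priority=0`
+- For offline indexing, pass an array directly and **aggregate up to 24 texts per request where possible**.
+- The TEI client is currently tuned for `TEI_MAX_CLIENT_BATCH_SIZE=24`; inputs above that value are still split into further batches, but callers are encouraged to batch proactively.
 #### 7.1.2 `POST /embed/image`: image embedding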
@@ -78,10 +74,12 @@ curl -X POST "http://localhost:6005/embed/text?normalize=true&priority=1" \
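 Prerequisite: the `cnclip` service must be running (default port `51000`). If it is not, the image embedding service fails to start or requests return errors.
-**Request body** (JSON array):
+**Full curl example**: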
-```json
-["https://example.com/image1.jpg", "https://example.com/image2.jpg"]
+```bash
+curl -X POST "http://localhost:6008/embed/image?normalize=true&priority=1" \
+  -H "Content-Type: application/json" \
+  -d '["https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg"]'
 ```
 **Response** (JSON array, one entry per input, in the same order):
@@ -90,26 +88,19 @@ curl -X POST "http://localhost:6005/embed/text?normalize=true&priority=1" \
 [[0.01, -0.02, ...], [0.03, 0.01, ...], ...]
 ```
-**Full curl example**:
-
-```bash
-curl -X POST "http://localhost:6008/embed/image?normalize=true&priority=1" \
-  -H "Content-Type: application/json" \
-  -d '["https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg"]'
-```
-
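 Real-time scenarios such as online image-to-image search can pass `priority=1`; offline index backfill keeps the default `priority=0`.
+For offline image embedding, **aggregate `8` image URLs per request** rather than calling one image at a time over long runs.
 #### 7.1.3 `POST /embed/clip_text`: CN-CLIP multimodal text vectors (same space as images)
 Encodes **natural-language phrases** into vectors that **share a vector space** with the image vectors returned by `POST /embed/image` (the Chinese-CLIP text tower / image tower). Typical uses are **text-to-image search** and KNN against the ES `image_embedding` field. The default configuration is **ViT-H-14** with vector length **1024** (matching `image_embedding.vector.dims` in `mappings/search_products.json`); switching to ViT-L-14 yields 768 dimensions and requires updating the index mapping and running a full reindex.
-This is **not the same model and not the same vector space** as `POST /embed/text` in **7.1.1** (TEI/BGE, semantic retrieval); do not mix the two.
-
-**Request body** (JSON array of strings; do **not** pass `http://` / `https://` image URLs; use `/embed/image` for images):
+**curl example**: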
-```json
-["纯棉短袖T恤", "芭比娃娃连衣裙"]
+```bash
+curl -X POST "http://localhost:6008/embed/clip_text?normalize=true&priority=1" \
+  -H "Content-Type: application/json" \
+  -d '["纯棉短袖", "street tee"]'
 ```
 **Response** (JSON array, one entry per input, in the same order):
@@ -118,15 +109,8 @@ curl -X POST "http://localhost:6008/embed/image?normalize=true&priority=1" \
 [[0.01, -0.02, ...], [0.03, 0.01, ...], ...]
 ```
-**curl example**:
-
-```bash
-curl -X POST "http://localhost:6008/embed/clip_text?normalize=true&priority=1" \
-  -H "Content-Type: application/json" \
-  -d '["纯棉短袖", "street tee"]'
-```
-
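 Note: this endpoint shares the image-side rate limiting and `IMAGE_MAX_INFLIGHT` with `/embed/image`; the Redis cache key namespace is `clip_text`, kept separate from the TEI text cache.
+Calling advice: when building a text-image retrieval index offline in bulk, **aggregate `24` short texts per request**.
 #### 7.1.4 `GET /health`: health check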
@@ -164,7 +148,7 @@ curl "http://localhost:6008/ready"
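 A single main service can cover both:
 - Online query embedding (low latency, typically `batch=1~4`)
-- Index-build embedding (high throughput, typically `batch=15~20`)
+- Index-build embedding (high throughput, typically `batch=24`)
 Unified startup (main path):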
@@ -309,10 +293,11 @@ curl "http://localhost:6007/health"
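 **Batch size / calling advice**:
 - This endpoint accepts `text: string[]`; for offline or bulk index translation, merge requests as much as possible so the underlying backend can batch.
+- For short texts such as product titles, title fragments, and attribute values, callers should aggregate **`8~16` items per batch** before calling; this uses GPU throughput better than many concurrent `batch=1` requests.
 - In the current `Tesla T4` load tests, the recommended configuration for `nllb-200-distilled-600m` is `batch_size=16`, `max_new_tokens=64`, `attn_implementation=sdpa`; raising it to `batch_size=32` may increase throughput, but tail latency degrades noticeably.
 - For online queries, a single-item request can be treated as `batch_size=1`; the concern there is request latency rather than offline throughput.
 - `opus-mt-zh-en` / `opus-mt-en-zh` also run with `batch_size=16` in production, a suitable low-latency local default for Chinese-English translation; online single-item calls are likewise treated as `batch_size=1`.
-- `llm`: single-item requests are sufficient.
+- `llm`, `qwen-mt`, and `deepl` also accept list input; on the offline indexing path, merging into batches is still preferred to reduce HTTP round trips and cache-lookup overhead.
 **Response**: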
 ```json
@@ -444,7 +429,7 @@ curl "http://localhost:6006/health"
 - **Base URL**: Indexer service address, e.g. `http://localhost:6004`
 - **Path**: `POST /indexer/enrich-content`
-- **Description**: Generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` in bulk from product titles, for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; the default is `generic + category_taxonomy(apparel)`. Supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, `others`. Taxonomy output for every profile is returned in `zh` + `en`; `category_taxonomy_profile` only determines the set of fields. An LLM is used internally (requires `DASHSCOPE_API_KEY`), with multi-language support and Redis caching; up to 50 items per request, and batch calls are recommended for efficiency.
+- **Description**: Generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` in bulk from product titles, for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; the default is `generic + category_taxonomy(apparel)`. Supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, `others`. Taxonomy output for every profile is returned in `zh` + `en`; `category_taxonomy_profile` only determines the set of fields. An LLM is used internally (requires `DASHSCOPE_API_KEY`), with per-item Redis caching; up to 50 items per request, but **keeping routine requests to around `20` items is recommended**. Internally, LLM processing is split into batches of `20`.
 For request/response formats, examples, and error codes, see [Part 05: Indexer API](./搜索API对接指南-05-索引接口（Indexer）.md#58-内容理解字段生成接口).
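
The batch-size recommendations in this change could be applied on the caller side roughly as follows. This is a minimal sketch, not code from the repository: the `RECOMMENDED_BATCH_SIZE` keys and the `chunked` / `plan_batches` helpers are hypothetical names introduced for illustration, and the translation value assumes the upper end of the `8~16` range given in the guide.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-request batch sizes taken from the notes in this change.
# The keys are hypothetical labels, not identifiers used by the services.
RECOMMENDED_BATCH_SIZE = {
    "enrich": 20,                # POST /indexer/enrich-content
    "embed_text": 24,            # POST /embed/text
    "embed_image": 8,            # POST /embed/image
    "embed_clip_text": 24,       # POST /embed/clip_text
    "translate_short_text": 16,  # assumed: upper end of the 8~16 range
}


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_batches(kind: str, items: Sequence[T]) -> List[List[T]]:
    """Split `items` into request-sized batches for the given call type."""
    return list(chunked(items, RECOMMENDED_BATCH_SIZE[kind]))


if __name__ == "__main__":
    # 20 placeholder image URLs are split into batches of 8, 8 and 4.
    urls = [f"https://example.com/{i}.jpg" for i in range(20)]
    for batch in plan_batches("embed_image", urls):
        print(len(batch))
```

Each resulting batch would then be sent as the JSON array body of one request, in the same shape as the curl examples in the guide.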