Compare View

Commits (8)
  • tangwang
     
  • tangwang
     
  • tangwang
     
  • tangwang
     
  • ## Background
    In a multilingual index, user queries often mix Chinese and English. We need to tag the script type explicitly during parsing and cover the corresponding language fields in the BM25 clauses.
    
    ## Approach
    
    ### 1. Query analysis (query_parser.ParsedQuery)
    - New `contains_chinese`: the query text contains CJK (reuses _contains_cjk).
    - New `contains_english`: the tokenized output contains a pure-English token with len>=3 (fullmatch on letters plus an optional hyphen).
    - Written into to_dict and the request-context intermediates, for debugging and API exposure.
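    A minimal sketch of the token rule above (the real check in query_parser may differ; `contains_english` here is a stand-in for the parser's internal helper):

    ```python
    import re

    # A token counts as "English" if it is ASCII letters with at most one
    # hyphen-joined part and is at least 3 characters long.
    _EN_TOKEN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)?")

    def contains_english(tokens):
        return any(len(t) >= 3 and _EN_TOKEN.fullmatch(t) for t in tokens)
    ```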
    
    ### 2. ES text recall (es_query_builder._build_advanced_text_query)
    - For each search_lang clause: if the query contains English and the clause language is not en (and the tenant's index_languages include en), merge in the en-column fields; if it contains Chinese and the clause language is not zh (and zh is indexed), merge in the zh-column fields.
    - The boost of each merged-in field is multiplied by `mixed_script_merged_field_boost_scale` (default 0.8, adjustable via the ESQueryBuilder constructor).
    - The fallback_original_query_* branches apply the same logic.
    
    ### 3. Implementation cleanup
    - Introduced `MatchFieldSpec = (field_path, boost)`: `_build_match_field_specs` is the single source of truth for weights; `_merge_supplemental_lang_field_specs` / `_expand_match_field_specs_for_mixed_script` merge and scale on tuples; `_format_match_field_specs` formats them into ES `path^boost` strings only at the end, avoiding the build-string-then-reparse round trip.
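    The tuple pipeline above can be sketched as follows (function names follow the commit message; the actual signatures in es_query_builder.py may differ):

    ```python
    from typing import List, Tuple

    MatchFieldSpec = Tuple[str, float]  # (field_path, boost)

    def merge_supplemental_lang_field_specs(
        base: List[MatchFieldSpec],
        extra: List[MatchFieldSpec],
        scale: float = 0.8,  # mixed_script_merged_field_boost_scale
    ) -> List[MatchFieldSpec]:
        # Merge another language's fields with a scaled-down boost,
        # skipping any field path already present in the clause.
        seen = {path for path, _ in base}
        merged = list(base)
        for path, boost in extra:
            if path not in seen:
                merged.append((path, boost * scale))
        return merged

    def format_match_field_specs(specs: List[MatchFieldSpec]) -> List[str]:
        # Only at the very end are specs rendered as ES "path^boost" strings.
        return [f"{path}^{boost:g}" for path, boost in specs]
    ```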
    
    ## Tests
    - tests/test_query_parser_mixed_language.py: script tagging and token rules.
    - tests/test_es_query_builder.py: merged fields, the 0.8 scaling, the index_languages restriction.
    
    Made-with: Cursor
    tangwang
     
  • tangwang
     
  • ResultFormatter.format_search_results() runs.
    
    What changed:
    
    For each final paginated SPU hit, the searcher now scans
    skus[].option1_value against the query text set built from the original
    query, normalized query, rewritten query, and translations.
    If no option1_value matches textually, it falls back to embedding
    similarity and picks the SKU with the highest inner product against the
    query embedding.
    The matched SKU is promoted to the front of the SPU’s skus list.
    The SPU-level image_url is replaced with that matched SKU’s image_src.
    I left api/result_formatter.py unchanged because it already preserves
    the SKU order and reads image_url from _source; updating the page hits
    in searcher makes the formatter return the desired result automatically.
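    A hedged sketch of that promotion step (field names follow the message; `promote_matching_sku` itself is a hypothetical helper, and the embedding fallback assumes precomputed SKU vectors):

    ```python
    def promote_matching_sku(spu_hit, query_texts, query_embedding, sku_embeddings=None):
        skus = spu_hit.get("skus") or []
        if not skus:
            return spu_hit
        texts = {t.lower() for t in query_texts if t}
        # 1) Textual match: option1_value against the query text set.
        matched = next(
            (s for s in skus if str(s.get("option1_value", "")).lower() in texts),
            None,
        )
        # 2) Fallback: the SKU with the highest inner product vs the query embedding.
        if matched is None and query_embedding is not None and sku_embeddings:
            scores = [
                sum(a * b for a, b in zip(vec, query_embedding))
                for vec in sku_embeddings
            ]
            matched = skus[max(range(len(scores)), key=scores.__getitem__)]
        if matched is not None:
            # Promote the matched SKU and swap its image in as the SPU main image.
            spu_hit["skus"] = [matched] + [s for s in skus if s is not matched]
            if matched.get("image_src"):
                spu_hit["image_url"] = matched["image_src"]
        return spu_hit
    ```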
    
    Verification:
    
    ReadLints on the edited files: no errors
    Passed targeted tests:
    pytest tests/test_search_rerank_window.py -k "translated_query or no_direct_option_match"
    tangwang
     
  • tangwang
     
@@ -22,6 +22,8 @@ API_HOST=0.0.0.0
22 API_PORT=6002 22 API_PORT=6002
23 INDEXER_HOST=0.0.0.0 23 INDEXER_HOST=0.0.0.0
24 INDEXER_PORT=6004 24 INDEXER_PORT=6004
  25 +# Embedding HTTP servers (text 6005 / image 6008): bind all interfaces
  26 +EMBEDDING_HOST=0.0.0.0
25 27
26 # Optional service ports 28 # Optional service ports
27 FRONTEND_PORT=6003 29 FRONTEND_PORT=6003
@@ -0,0 +1,17 @@
  1 +docs
  2 +# Please enter the commit message for your changes. Lines starting
  3 +# with '#' will be ignored, and an empty message aborts the commit.
  4 +#
  5 +# On branch master
  6 +# Your branch is ahead of 'origin/master' by 5 commits.
  7 +# (use "git push" to publish your local commits)
  8 +#
  9 +# Changes to be committed:
  10 +# modified: config/config.yaml
  11 +# modified: docs/TODO.txt
  12 +# modified: "docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md"
  13 +# modified: "docs/相关性检索优化说明.md"
  14 +#
  15 +# Changes not staged for commit:
  16 +# modified: third-party/clip-as-service (untracked content)
  17 +#
config/config.yaml
@@ -31,9 +31,9 @@ field_boosts:
31 category_path: 1.5 31 category_path: 1.5
32 category_name_text: 1.5 32 category_name_text: 1.5
33 tags: 1.0 33 tags: 1.0
34 - option1_values: 0.5 34 + option1_values: 0.6
35 - option2_values: 0.5 35 + option2_values: 0.4
36 - option3_values: 0.5 36 + option3_values: 0.4
37 37
38 # Query Configuration(查询配置) 38 # Query Configuration(查询配置)
39 query_config: 39 query_config:
@@ -47,6 +47,11 @@ query_config:
47 enable_text_embedding: true 47 enable_text_embedding: true
48 enable_query_rewrite: true 48 enable_query_rewrite: true
49 49
  50 + # Query-parse stage: translation and the query embedding run concurrently and share one wait budget (ms).
  51 + # Detected language already in the tenant's index_languages: shorter; not in the index languages: longer (translation matters more for recall).
  52 + translation_embedding_wait_budget_ms_source_in_index: 80
  53 + translation_embedding_wait_budget_ms_source_not_in_index: 200
  54 +
50 # 动态多语言检索字段配置 55 # 动态多语言检索字段配置
51 # multilingual_fields 会被拼成 title.{lang}/brief.{lang}/... 形式; 56 # multilingual_fields 会被拼成 title.{lang}/brief.{lang}/... 形式;
52 # shared_fields 为无语言后缀字段。 57 # shared_fields 为无语言后缀字段。
@@ -85,7 +90,34 @@ query_config:
85 90
86 # 返回字段配置(_source includes) 91 # 返回字段配置(_source includes)
87 # null表示返回所有字段,[]表示不返回任何字段,列表表示只返回指定字段 92 # null表示返回所有字段,[]表示不返回任何字段,列表表示只返回指定字段
88 - source_fields: null 93 + # The fields below match api/result_formatter.py (SpuResult population) and search/searcher.py (SKU ordering / main-image replacement)
  94 + source_fields:
  95 + - spu_id
  96 + - handle
  97 + - title
  98 + - brief
  99 + - description
  100 + - vendor
  101 + - category_name
  102 + - category_name_text
  103 + - category_path
  104 + - category_id
  105 + - category_level
  106 + - category1_name
  107 + - category2_name
  108 + - category3_name
  109 + - tags
  110 + - min_price
  111 + - compare_at_price
  112 + - image_url
  113 + - sku_prices
  114 + - sku_weights
  115 + - sku_weight_units
  116 + - total_inventory
  117 + - option1_name
  118 + - option1_values
  119 + - specifications
  120 + - skus
89 121
90 # KNN boost配置(向量召回的boost值) 122 # KNN boost配置(向量召回的boost值)
91 knn_boost: 0.25 # Lower boost for embedding recall 123 knn_boost: 0.25 # Lower boost for embedding recall
@@ -110,7 +142,7 @@ rerank:
110 services: 142 services:
111 translation: 143 translation:
112 service_url: "http://127.0.0.1:6006" 144 service_url: "http://127.0.0.1:6006"
113 - default_model: "llm" 145 + default_model: "nllb-200-distilled-600m"
114 default_scene: "general" 146 default_scene: "general"
115 timeout_sec: 10.0 147 timeout_sec: 10.0
116 cache: 148 cache:
@@ -297,6 +297,12 @@ class AppConfigLoader:
297 default_translation_model=str( 297 default_translation_model=str(
298 query_cfg.get("default_translation_model") or "nllb-200-distilled-600m" 298 query_cfg.get("default_translation_model") or "nllb-200-distilled-600m"
299 ), 299 ),
  300 + translation_embedding_wait_budget_ms_source_in_index=int(
  301 + query_cfg.get("translation_embedding_wait_budget_ms_source_in_index", 80)
  302 + ),
  303 + translation_embedding_wait_budget_ms_source_not_in_index=int(
  304 + query_cfg.get("translation_embedding_wait_budget_ms_source_not_in_index", 200)
  305 + ),
300 ) 306 )
301 307
302 function_score_cfg = raw.get("function_score") if isinstance(raw.get("function_score"), dict) else {} 308 function_score_cfg = raw.get("function_score") if isinstance(raw.get("function_score"), dict) else {}
@@ -445,7 +451,7 @@ class AppConfigLoader:
445 api_port=int(os.getenv("API_PORT", 6002)), 451 api_port=int(os.getenv("API_PORT", 6002)),
446 indexer_host=os.getenv("INDEXER_HOST", "0.0.0.0"), 452 indexer_host=os.getenv("INDEXER_HOST", "0.0.0.0"),
447 indexer_port=int(os.getenv("INDEXER_PORT", 6004)), 453 indexer_port=int(os.getenv("INDEXER_PORT", 6004)),
448 - embedding_host=os.getenv("EMBEDDING_HOST", "127.0.0.1"), 454 + embedding_host=os.getenv("EMBEDDING_HOST", "0.0.0.0"),
449 embedding_port=int(os.getenv("EMBEDDING_PORT", 6005)), 455 embedding_port=int(os.getenv("EMBEDDING_PORT", 6005)),
450 embedding_text_port=int(os.getenv("EMBEDDING_TEXT_PORT", 6005)), 456 embedding_text_port=int(os.getenv("EMBEDDING_TEXT_PORT", 6005)),
451 embedding_image_port=int(os.getenv("EMBEDDING_IMAGE_PORT", 6008)), 457 embedding_image_port=int(os.getenv("EMBEDDING_IMAGE_PORT", 6008)),
@@ -61,6 +61,11 @@ class QueryConfig:
61 zh_to_en_model: str = "opus-mt-zh-en" 61 zh_to_en_model: str = "opus-mt-zh-en"
62 en_to_zh_model: str = "opus-mt-en-zh" 62 en_to_zh_model: str = "opus-mt-en-zh"
63 default_translation_model: str = "nllb-200-distilled-600m" 63 default_translation_model: str = "nllb-200-distilled-600m"
  64 + # Query stage: translation and embedding generation are submitted concurrently and share one wait budget (ms).
  65 + # Detected language in the tenant's index_languages: results should come back fast, so the budget is short.
  66 + # Detected language not in index_languages: translation is more critical for recall, so the budget is longer.
  67 + translation_embedding_wait_budget_ms_source_in_index: int = 80
  68 + translation_embedding_wait_budget_ms_source_not_in_index: int = 200
64 69
65 70
66 @dataclass(frozen=True) 71 @dataclass(frozen=True)
@@ -254,13 +259,13 @@ class RuntimeConfig:
254 api_port: int = 6002 259 api_port: int = 6002
255 indexer_host: str = "0.0.0.0" 260 indexer_host: str = "0.0.0.0"
256 indexer_port: int = 6004 261 indexer_port: int = 6004
257 - embedding_host: str = "127.0.0.1" 262 + embedding_host: str = "0.0.0.0"
258 embedding_port: int = 6005 263 embedding_port: int = 6005
259 embedding_text_port: int = 6005 264 embedding_text_port: int = 6005
260 embedding_image_port: int = 6008 265 embedding_image_port: int = 6008
261 - translator_host: str = "127.0.0.1" 266 + translator_host: str = "0.0.0.0"
262 translator_port: int = 6006 267 translator_port: int = 6006
263 - reranker_host: str = "127.0.0.1" 268 + reranker_host: str = "0.0.0.0"
264 reranker_port: int = 6007 269 reranker_port: int = 6007
265 270
266 271
1 1
2 2
3 3
  4 +First read the text-embedding related code:
  5 +@embeddings/README.md @embeddings/server.py @docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md @embeddings/text_encoder.py
  6 +There is currently TEXT_MAX_INFLIGHT / IMAGE_MAX_INFLIGHT admission control; requests over the limit get an overload status code.
  7 +
  8 +The text embedding service should support a priority query parameter; priority > 0 requests must never be rejected by admission control. (The image embedding service does not need it, since only offline jobs use image embeddings.)
  9 +priority == 0 (default, suited to offline tasks such as indexing): still goes through the existing TEXT_MAX_INFLIGHT / IMAGE_MAX_INFLIGHT admission; over the limit, return the overload status code.
  10 +priority > 0 (or == 1; suited to online requests): never rejected by admission, but still occupies inflight, so online requests are not throttled and heavy online traffic can push out offline requests.
  11 +
  12 +Beyond the admission-rule change, such requests should also be processed first (priority=1 ahead of priority=0).
  13 +For the technical approach there are Worker + dual queues, PriorityMutex, and so on; please also consider other suitable designs.
  14 +Mature and stable, with no side effects on complexity, performance, or stability, matters most. Study the code and the requirement first and think the solution through.
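A minimal sketch of the admission rule being requested here (assumption: a counter guarded by a lock; the real service may use semaphores or the dual-queue worker mentioned in the note):

```python
import threading

class PriorityAdmission:
    def __init__(self, max_inflight: int):
        self.max_inflight = max_inflight  # e.g. TEXT_MAX_INFLIGHT
        self.inflight = 0
        self._lock = threading.Lock()

    def try_enter(self, priority: int = 0) -> bool:
        with self._lock:
            # priority == 0 (offline): rejected once the limit is reached;
            # the caller would return the overload status code.
            if priority <= 0 and self.inflight >= self.max_inflight:
                return False
            # priority > 0 (online): never rejected, but still counted,
            # so concurrent online traffic can crowd out offline requests.
            self.inflight += 1
            return True

    def leave(self):
        with self._lock:
            self.inflight -= 1
```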
  15 +
  16 +
  17 +
  18 +Refactoring of the configuration system.
  19 +
  20 +Referring to @docs/config-system-review-and-redesign.md , most of the modifications have been completed. Could you conduct a review to check what else needs improvement in the configuration documentation system? Are there any outstanding issues?
  21 +
  22 +I. There are still many places that read configuration via environment variables
  23 +_SERVICE_KIND = (os.getenv("EMBEDDING_SERVICE_KIND", "all") or "all").strip().lower()
  24 +if _SERVICE_KIND not in {"all", "text", "image"}:
  25 + raise RuntimeError(
  26 + f"Invalid EMBEDDING_SERVICE_KIND={_SERVICE_KIND!r}; expected all, text, or image"
  27 + )
  28 +_TEXT_ENABLED_BY_ENV = os.getenv("EMBEDDING_ENABLE_TEXT_MODEL", "true").lower() in ("1", "true", "yes")
  29 +_IMAGE_ENABLED_BY_ENV = os.getenv("EMBEDDING_ENABLE_IMAGE_MODEL", "true").lower() in ("1", "true", "yes")
  30 +open_text_model = _TEXT_ENABLED_BY_ENV and _SERVICE_KIND in {"all", "text"}
  31 +open_image_model = _IMAGE_ENABLED_BY_ENV and _SERVICE_KIND in {"all", "image"}
  32 +
  33 +_text_encode_lock = threading.Lock()
  34 +_image_encode_lock = threading.Lock()
  35 +
  36 +_TEXT_MICROBATCH_WINDOW_SEC = max(
  37 + 0.0, float(os.getenv("TEXT_MICROBATCH_WINDOW_MS", "4")) / 1000.0
  38 +)
  39 +_TEXT_REQUEST_TIMEOUT_SEC = max(
  40 + 1.0, float(os.getenv("TEXT_REQUEST_TIMEOUT_SEC", "30"))
  41 +)
  42 +_TEXT_MAX_INFLIGHT = max(1, int(os.getenv("TEXT_MAX_INFLIGHT", "32")))
  43 +_IMAGE_MAX_INFLIGHT = max(1, int(os.getenv("IMAGE_MAX_INFLIGHT", "1")))
  44 +_OVERLOAD_STATUS_CODE = int(os.getenv("EMBEDDING_OVERLOAD_STATUS_CODE", "503"))
  45 +_LOG_PREVIEW_COUNT = max(1, int(os.getenv("EMBEDDING_LOG_PREVIEW_COUNT", "3")))
  46 +_LOG_TEXT_PREVIEW_CHARS = max(32, int(os.getenv("EMBEDDING_LOG_TEXT_PREVIEW_CHARS", "120")))
  47 +_LOG_IMAGE_PREVIEW_CHARS = max(32, int(os.getenv("EMBEDDING_LOG_IMAGE_PREVIEW_CHARS", "180")))
  48 +_VECTOR_PREVIEW_DIMS = max(1, int(os.getenv("EMBEDDING_VECTOR_PREVIEW_DIMS", "6")))
  49 +_CACHE_PREFIX = str(REDIS_CONFIG.get("embedding_cache_prefix", "embedding")).strip() or "embedding"
  50 +
  51 +
  52 +
  53 +
  54 +
  55 +And these hard-coded addresses in @embedding/config.py
  56 +
  57 +self.TEI_BASE_URL = str(text_backend.get("base_url") or "http://127.0.0.1:8080")
  58 +self.TEI_TIMEOUT_SEC = int(text_backend.get("timeout_sec", 60))
  59 +
  60 +self.USE_CLIP_AS_SERVICE = services.image_backend == "clip_as_service"
  61 +self.CLIP_AS_SERVICE_SERVER = str(image_backend.get("server") or "grpc://127.0.0.1:51000")
  62 +
  63 +
  64 +
  65 +
  66 +It looks like these principles were not fully followed?
  67 +4. Design principles for the redesign
  68 +The redesign should follow the rules below.
  69 +
  70 +4.1 A single logical configuration system
  71 +There may be multiple files, but not multiple loaders with overlapping responsibilities.
  72 +There must be one loader pipeline that produces a typed AppConfig object.
  73 +
  74 +4.2 Config files declare, parsing code interprets, environment variables inject at runtime
  75 +Responsibilities should be split as follows:
  76 +Config files
  77 +Declare non-sensitive target behavior and deployable non-sensitive settings
  78 +Parsing logic
  79 +Loads, merges, validates, normalizes, and exposes typed configuration
  80 +Never invents hidden business behavior
  81 +Environment variables
  82 +Carry secrets and a small number of runtime/process-specific values
  83 +Do not casually redefine business behavior
  84 +
  85 +4.3 One precedence rule for the whole system
  86 +Unless explicitly exempted, every configuration category should follow the same merge model.
  87 +
  88 +4.4 No silent implicit fallbacks for business behavior
  89 +At startup, fail fast when required configuration is missing or invalid.
  90 +Do not silently fall back to legacy behavior such as a hard-coded language list.
  91 +
  92 +4.5 Effective configuration must be observable
  93 +Every service should be able to show:
  94 +Config version or hash
  95 +Source files loaded
  96 +Environment name
  97 +Sanitized effective configuration
  98 +
  99 +5. Recommended target design
  100 +
  101 +5.1 Boundary model
  102 +Use three clear tiers.
  103 +Tier 1: static configuration managed in the repository
  104 +Purpose:
  105 +Search behavior
  106 +Tenant behavior
  107 +Provider/backend registries
  108 +Non-sensitive service topology defaults
  109 +Feature flags
  110 +Examples:
  111 +Field boosts
  112 +Query strategy
  113 +Rerank fusion parameters
  114 +Tenant language schemes
  115 +Translation capability registry
  116 +Embedding backend selection defaults
  117 +
  118 +Tier 2: environment-specific overlay configuration
  119 +Purpose:
  120 +Non-sensitive per-environment differences
  121 +Per-environment service endpoints
  122 +Per-environment resource sizing defaults
  123 +Operational differences across dev/test/prod
  124 +Examples:
  125 +Local embedding URL vs production embedding URL
  126 +Dev rerank backend vs production rerank backend
  127 +Lower concurrency in local development
  128 +
  129 +Tier 3: environment variables
  130 +Purpose:
  131 +Secrets
  132 +Bind hosts/ports
  133 +External infrastructure credentials
  134 +Last-mile injection by the container orchestrator
  135 +Examples:
  136 +ES_HOST, ES_USERNAME, ES_PASSWORD
  137 +DB_HOST, DB_USERNAME, DB_PASSWORD
  138 +REDIS_HOST, REDIS_PASSWORD
  139 +DASHSCOPE_API_KEY, DEEPL_AUTH_KEY
  140 +API_HOST, API_PORT, INDEXER_PORT, TRANSLATION_PORT
  141 +Rules:
  142 +Environment variables should not be the normal way to select business behavior (such as translation model, embedding backend, or tenant language policy)
  143 +If env-var overrides are allowed for non-sensitive fields, they must be explicitly listed and documented as operational overrides, not hidden conventions
  144 +
  145 +5.2 Unified precedence
  146 +Recommended precedence order:
  147 +Schema defaults in code
  148 +config/base.yaml
  149 +config/environments/<env>.yaml
  150 +Tenant overlays from config/tenants/
  151 +Environment variables for explicitly allowed runtime keys
  152 +Command-line flags scoped to the current process only
  153 +Important rules:
  154 +Only one module may implement this merge logic
  155 +No business module may call os.getenv() directly for configuration
  156 +
  157 +
  158 +
  159 +II. The review cursor (composer2) did for me
  160 +
  161 +## Configuration documentation system review
  162 +
  163 +A brief review of the **configuration documentation system**, checked against [`docs/config-system-review-and-redesign.md`](docs/config-system-review-and-redesign.md) and the current codebase.
  164 +
  165 +### Where the implementation already matches
  166 +
  167 +The codebase has moved a long way toward the redesign goals:
  168 +
  169 +- **Single typed root**: `AppConfig` lives in [`config/schema.py`](config/schema.py), loaded via [`config/loader.py`](config/loader.py) and [`get_app_config()`](config/loader.py).
  170 +- **Slimmed-down `services_config`**: [`config/services_config.py`](config/services_config.py) delegates to `get_app_config()` (resolves Finding A for the service layer).
  171 +- **Tenant directory**: [`config/tenant_config_loader.py`](config/tenant_config_loader.py) uses `get_app_config().tenants`.
  172 +- **Rewrite asset path**: `config/dictionaries/query_rewrite.dict` exists (resolves the filename mismatch in Finding E).
  173 +- **Observability**: [`GET /admin/config`](api/routes/admin.py) returns a **sanitized effective config tree**; [`GET /admin/config/meta`](api/routes/admin.py) exposes the environment name, `config_hash`, `loaded_files`, `deprecated_keys` (covers the intent of §5.10; the doc proposed `/admin/config/effective`, but the actual implementation lives at `/admin/config`).
  174 +
  175 +So the **"single loader + effective-config visibility"** story is largely realized in code; the docs have not fully caught up.
  176 +
  177 +---
  178 +
  179 +## Documentation issues (highest impact)
  180 +
  181 +### 1. The admin API docs describe `/admin/config` incorrectly
  182 +
  183 +[`docs/搜索API对接指南.md`](docs/搜索API对接指南.md) (near the admin section) and [`docs/搜索API对接指南-06-管理接口(Admin).md`](docs/搜索API对接指南-06-管理接口(Admin).md) still describe `/admin/config` as **per-tenant** JSON (with fields like `tenant_id`, `es_index_name`, `supported_languages`). The actual implementation returns **`AppConfig.sanitized_dict()`** (the full app config with secrets redacted), not tenant summary fields.
  184 +
  185 +**Also missing from these guides:** `GET /admin/config/meta`.
  186 +
  187 +**Health check:** the examples in the split guide include fields that do not exist in [`HealthResponse`](api/models.py) (which has only `status` and `elasticsearch`).
  188 +
  189 +For anyone integrating against the API from the docs alone, this is the most visible "unresolved issue".
  190 +
  191 +### 2. The developer-facing guides still center `services_config` as the "config resolver"
  192 +
  193 +[`docs/DEVELOPER_GUIDE.md`](docs/DEVELOPER_GUIDE.md) §5.2 still says search config is loaded by **`ConfigLoader`** and services are "resolved" by **`config/services_config`**. §6.2 still lists **`config/services_config.py`** as the primary "resolution entry point". [`docs/QUICKSTART.md`](docs/QUICKSTART.md) §3.1 still says "config resolution: `config/services_config.py`".
  194 +
  195 +The accurate story for the docs: **the canonical entry is `config/loader.py` + `get_app_config()`**; `ConfigLoader` in [`config/config_loader.py`](config/config_loader.py) wraps the unified loader; `services_config` is a **compatibility facade** for existing call sites.
  196 +
  197 +### 3. The redesign doc itself is not a "living" status document
  198 +
  199 +[`docs/config-system-review-and-redesign.md`](docs/config-system-review-and-redesign.md) still reads as a **pure problem statement + goals**, with no short **"implemented vs remaining"** section. That makes it easy to assume nothing was done, or to duplicate work. A small appendix (or a one-page `config/README.md`, see below) would fix this.
  200 +
  201 +### 4. Missing `config/README.md` (recommended in §5.3)
  202 +
  203 +There is still no dedicated **`config/README.md`** describing: the loader entry point, high-level precedence, where dictionaries live, links to `/admin/config` + `/admin/config/meta`, and a link to the redesign doc. It is an explicit deliverable of the redesign and would anchor the "documentation system".
  204 +
  205 +### 5. Minor doc-hygiene issues
  206 +
  207 +- [`docs/QUICKSTART.md`](docs/QUICKSTART.md) §1.9: the line after the env-var items has a stray character: `---·` (likely a typo).
  208 +- [`docs/DEVELOPER_GUIDE.md`](docs/DEVELOPER_GUIDE.md) §10 **doc index** does not list `config-system-review-and-redesign.md` or the future `config/README.md`.
  209 +
  210 +---
  211 +
  212 +## Gaps between the redesign goals and the current code (docs must not claim "done")
  213 +
  214 +These affect the honesty of the docs:
  215 +
  216 +| Topic | Status |
  217 +|--------|--------|
  218 +| **`config dump` CLI** (§5.10) | Does not exist in `main.py`; operators rely on HTTP or ad-hoc scripts. |
  219 +| **Hidden `["en", "zh"]` fallback** (Phase 3 / Finding D) | Still used in [`indexer/document_transformer.py`](indexer/document_transformer.py), [`suggestion/builder.py`](suggestion/builder.py), etc. |
  220 +| **`os.getenv` outside the loader** (Rules 1–2) | Still used in e.g. [`embeddings/server.py`](embeddings/server.py), [`reranker/server.py`](reranker/server.py), [`api/app.py`](api/app.py); claiming "loader only" in the docs would overstate it. |
  221 +| **Split `base.yaml` / `environments/` / `tenants/*.yaml`** (Phase 5) | Not adopted; still a single [`config/config.yaml`](config/config.yaml). |
  222 +| **Legacy tenant flags** (Phase 6 / Finding H) | [`indexer/README.md`](indexer/README.md) still describes `translate_to_en` / `translate_to_zh` from upstream MySQL (may stay as **upstream schema** documentation; should be distinguished from the Python `tenant_config` model). |
  223 +
  224 +---
  225 +
  226 +## Recommended next steps (docs only, by priority)
  227 +
  228 +1. **Fix the admin API docs** (merged guide + the `-06-` split): the `/admin/config` response shape, add `/admin/config/meta`, align the health-check example with [`HealthResponse`](api/models.py).
  229 +2. **Update DEVELOPER_GUIDE §5–§6 and QUICKSTART §1.9 / §3.1** to describe **`get_app_config()` / `loader.py`** as the primary entry and **`services_config`** as an adapter.
  230 +3. **Add `config/README.md`** (a short ops + developer entry point).
  231 +4. **Add a dated implementation-status table** to `config-system-review-and-redesign.md` (what shipped vs what was deferred), so the review doc does not contradict reality.
  232 +5. **DEVELOPER_GUIDE §9 checklist**: replace "config comes from `services_config`" with wording that allows `get_app_config()` or the slim adapter, consistent with §6.
  233 +
  234 +If useful, I can apply patches for items 1–3 and a short status block in the redesign doc as a follow-up.
  235 +
  236 +
  237 +
  238 +
  239 +
  240 +
  241 +
  242 +
  243 +
  244 +
  245 +
  246 +
  247 +
  248 +
  249 +
  250 +Relevance retrieval optimization:
  251 +Both the original query and its translations need a corresponding trunk (stem) analysis
  252 +The trunk can be a simple part-of-speech based extraction of the nouns
  253 +At search time the original term and its trunk appear as a pair: the original term and trunk_keywords together form one OR query.
  254 +An alternative is to concatenate the original and trunk terms, but then the BM25 tf coefficient needs retuning.
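  The paired OR-query idea above can be sketched as an ES bool clause (field names and boosts here are illustrative, not from the codebase):

```python
def build_trunk_or_query(original, trunk_keywords, fields=None, trunk_boost=0.5):
    # One should-clause for the original query, one (down-weighted) for the
    # extracted noun trunks, so stems add recall without dominating BM25 tf.
    fields = fields or ["title.en^3", "description.en^1"]
    should = [
        {"multi_match": {"query": original, "fields": fields}},
    ]
    if trunk_keywords:
        should.append({
            "multi_match": {
                "query": " ".join(trunk_keywords),
                "fields": fields,
                "boost": trunk_boost,
            }
        })
    return {"bool": {"should": should, "minimum_should_match": 1}}
```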
  255 +
  256 +
  257 +
  258 +
4 nllb-200-distilled-600M性能优化 259 nllb-200-distilled-600M性能优化
5 已完成(2026-03) 260 已完成(2026-03)
6 - CTranslate2 迁移 + float16 转换 261 - CTranslate2 迁移 + float16 转换
docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md
@@ -38,6 +38,10 @@
38 - `TEXT_MAX_INFLIGHT` 38 - `TEXT_MAX_INFLIGHT`
39 - `IMAGE_MAX_INFLIGHT` 39 - `IMAGE_MAX_INFLIGHT`
40 - 当超过处理能力时,服务会直接返回过载错误,而不是无限排队。 40 - 当超过处理能力时,服务会直接返回过载错误,而不是无限排队。
  41 +- Both the text and image services support a `priority` query parameter (the image service does no queue-jumping; only the admission rule matches the text service):
  42 + - `priority=0` (default): suited to offline indexing; still constrained by `TEXT_MAX_INFLIGHT` / `IMAGE_MAX_INFLIGHT` admission control.
  43 + - `priority>0` (use `1` for online requests): never rejected by admission control, but still counts toward the corresponding text/image inflight.
  44 + - The text service processes high-priority text requests first; the image service does no queue-jumping and handles requests in arrival order.
41 - `GET /health` 会返回各自的 `limits`、`stats`、`cache_enabled` 等状态;`GET /ready` 用于就绪探针。 45 - `GET /health` 会返回各自的 `limits`、`stats`、`cache_enabled` 等状态;`GET /ready` 用于就绪探针。
42 46
43 #### 7.1.1 `POST /embed/text` — 文本向量化 47 #### 7.1.1 `POST /embed/text` — 文本向量化
@@ -59,11 +63,15 @@
59 **完整 curl 示例**: 63 **完整 curl 示例**:
60 64
61 ```bash 65 ```bash
62 -curl -X POST "http://localhost:6005/embed/text?normalize=true" \ 66 +curl -X POST "http://localhost:6005/embed/text?normalize=true&priority=1" \
63 -H "Content-Type: application/json" \ 67 -H "Content-Type: application/json" \
64 -d '["芭比娃娃 儿童玩具", "纯棉T恤 短袖"]' 68 -d '["芭比娃娃 儿童玩具", "纯棉T恤 短袖"]'
65 ``` 69 ```
66 70
  71 +Notes:
  72 +- Online query / real-time requests: pass `priority=1` explicitly
  73 +- Offline indexing / batch backfill: keep the default `priority=0`
  74 +
67 #### 7.1.2 `POST /embed/image` — 图片向量化 75 #### 7.1.2 `POST /embed/image` — 图片向量化
68 76
69 将图片 URL 或路径转为向量,用于以图搜图。 77 将图片 URL 或路径转为向量,用于以图搜图。
@@ -85,11 +93,13 @@ curl -X POST "http://localhost:6005/embed/text?normalize=true" \
85 **完整 curl 示例**: 93 **完整 curl 示例**:
86 94
87 ```bash 95 ```bash
88 -curl -X POST "http://localhost:6008/embed/image?normalize=true" \ 96 +curl -X POST "http://localhost:6008/embed/image?normalize=true&priority=1" \
89 -H "Content-Type: application/json" \ 97 -H "Content-Type: application/json" \
90 -d '["https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg"]' 98 -d '["https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg"]'
91 ``` 99 ```
92 100
  101 +Real-time scenarios such as online image-to-image search can pass `priority=1`; offline index backfill keeps the default `priority=0`.
  102 +
93 #### 7.1.3 `GET /health` — 健康检查 103 #### 7.1.3 `GET /health` — 健康检查
94 104
95 ```bash 105 ```bash
@@ -118,6 +128,8 @@ curl "http://localhost:6008/ready"
118 - cache key 已区分 `normalize=true/false`,避免不同归一化策略命中同一条缓存。 128 - cache key 已区分 `normalize=true/false`,避免不同归一化策略命中同一条缓存。
119 - 当服务端发现请求是 **full-cache-hit** 时,会直接返回,不占用模型并发槽位。 129 - 当服务端发现请求是 **full-cache-hit** 时,会直接返回,不占用模型并发槽位。
120 - 当服务端发现超过 `TEXT_MAX_INFLIGHT` / `IMAGE_MAX_INFLIGHT` 时,会直接拒绝,而不是无限排队。 130 - 当服务端发现超过 `TEXT_MAX_INFLIGHT` / `IMAGE_MAX_INFLIGHT` 时,会直接拒绝,而不是无限排队。
  131 +- For `POST /embed/text`, `priority=0` is rejected outright under the inflight rule above; `priority>0` is never rejected by admission but still counts toward inflight, and is served ahead of `priority=0` requests when the server queues.
  132 +- For `POST /embed/image`, `priority=0` is constrained by `IMAGE_MAX_INFLIGHT`; `priority>0` is never rejected by admission but still counts toward inflight (no queue-jumping).
121 133
122 #### 7.1.6 TEI 统一调优建议(主服务) 134 #### 7.1.6 TEI 统一调优建议(主服务)
123 135
@@ -252,9 +264,9 @@ curl "http://localhost:6007/health"
252 - 如果是en-zh互译、期待更高的速度,可以考虑`opus-mt-zh-en` / `opus-mt-en-zh`。(质量未详细评测,一些文章说比blib-200-600m更好,但是我看了些case感觉要差不少) 264 - 如果是en-zh互译、期待更高的速度,可以考虑`opus-mt-zh-en` / `opus-mt-en-zh`。(质量未详细评测,一些文章说比blib-200-600m更好,但是我看了些case感觉要差不少)
253 265
254 **实时翻译选型建议**: 266 **实时翻译选型建议**:
255 -- 在线 query 翻译如果只是 `en/zh` 互译,优先使用 `opus-mt-zh-en` 或 `opus-mt-en-zh`,它们是当前已测本地模型里延迟最低的一档 267 +- 在线 query 翻译如果只是 `en/zh` 互译,优先使用 `opus-mt-zh-en` 或 `opus-mt-en-zh`
256 - 如果涉及其他语言,或对质量要求高于本地轻量模型,优先考虑 `deepl`。 268 - 如果涉及其他语言,或对质量要求高于本地轻量模型,优先考虑 `deepl`。
257 -- `nllb-200-distilled-600m` 不建议作为在线 query 翻译默认方案;我们在 `Tesla T4` 上测到 `batch_size=1` 时,`zh -> en` p50 约 `292.54 ms`、p95 约 `624.12 ms`,`en -> zh` p50 约 `481.61 ms`、p95 约 `1171.71 ms` 269 +- `nllb-200-distilled-600m` 不建议作为在线 query 翻译默认方案;我们在 `Tesla T4` 上测到 `batch_size=1` 时,根据query长短,耗时大概在70-150ms之间
258 270
259 **Batch Size / 调用方式建议**: 271 **Batch Size / 调用方式建议**:
260 - 本接口支持 `text: string[]`;离线或批量索引翻译时,应尽量合并请求,让底层 backend 发挥批处理能力。 272 - 本接口支持 `text: string[]`;离线或批量索引翻译时,应尽量合并请求,让底层 backend 发挥批处理能力。
docs/搜索API速查表.md renamed to docs/搜索API对接指南-速查表.md
docs/相关性检索优化说明.md
@@ -17,9 +17,9 @@
17 查询链路(文本相关): 17 查询链路(文本相关):
18 18
19 1. `QueryParser.parse()` 19 1. `QueryParser.parse()`
20 - 输出 `detected_language`、`query_text_by_lang`、`search_langs`、`index_languages`、`source_in_index_languages` 20 + 输出 `detected_language`、`query_text_by_lang`、`search_langs`、`index_languages`、`source_in_index_languages`;另输出 `contains_chinese` / `contains_english`(仅服务混写辅助召回,见 §4 末)
21 2. `ESQueryBuilder._build_advanced_text_query()` 21 2. `ESQueryBuilder._build_advanced_text_query()`
22 - 按 `search_langs` 动态拼接 `title/brief/description/vendor/category_*` 的 `.{lang}` 字段,叠加 shared 字段(`tags`、`option*_values`) 22 + 按 `search_langs` 动态拼接 `title/brief/description/vendor/category_*` 的 `.{lang}` 字段,叠加 shared 字段(`tags`、`option*_values`);若命中混写辅助条件,在同一子句内并入另一语种列(§4 末)
23 3. `build_query()` 23 3. `build_query()`
24 统一走文本策略,不再有布尔 AST 枝路。 24 统一走文本策略,不再有布尔 AST 枝路。
25 25
@@ -40,14 +40,20 @@
40 3. 若第 2 步翻译部分失败或全部失败: 40 3. 若第 2 步翻译部分失败或全部失败:
41 对缺失翻译的 `index_languages` 字段,追加“原文低权重兜底”子句,避免完全丢失这些语种索引面的召回机会。 41 对缺失翻译的 `index_languages` 字段,追加“原文低权重兜底”子句,避免完全丢失这些语种索引面的召回机会。
42 42
43 -### 3.2 翻译等待策略 43 +### 3.2 翻译与向量:并发提交与共享超时
44 44
45 -`QueryParser.parse()` 45 +`QueryParser.parse()` 内(Stage 4–6)对**离线调用**采用线程池提交 + **一次** `concurrent.futures.wait`
46 46
47 -- 当源语种不在 `index_languages`:使用 `translate_multi_async(...)` 并等待 futures 收敛  
48 -- 当源语种在 `index_languages`:使用 `translate_multi(..., async_mode=True)`,优先缓存命中,未命中可后台补齐 47 +- **翻译**:对 `index_languages` 中除 `detected_language` 外的每个目标语种各提交一个 `translator.translate` 任务(多目标时并发执行)。
  48 +- **Query embedding** (if `enable_text_embedding` is on and the domain is default): one more `text_encoder.encode` task is submitted.
  49 +These tasks enter the **same** future set; e.g. with tenant index `[zh, en]` and a detected language **not** in the index, that is typically **2 translations + 1 embedding, 3 concurrent tasks** sharing one timeout.
49 50
50 -这保证了“必须翻译才能检索”的场景不会直接空跑。 51 +**等待预算(毫秒)**由 `detected_language` 是否属于租户 `index_languages` 决定(`query_config`):
  52 +
  53 +- **In the index**: `translation_embedding_wait_budget_ms_source_in_index` (short default, e.g. 80ms). Primary recall already hits the source-language fields, so slower translation/embedding is tolerable.
  54 +- **Not in the index**: `translation_embedding_wait_budget_ms_source_not_in_index` (longer default, e.g. 200ms). Translation is critical for searchable text, so give it time.
  55 +
  56 +Tasks still unfinished at the deadline are discarded with a warning and parsing continues (possibly without some translations or without the query embedding).
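  The submit-then-single-wait pattern described above can be sketched with concurrent.futures (names are illustrative; the real QueryParser internals may differ):

```python
import concurrent.futures as cf

def run_with_budget(pool, tasks, budget_ms):
    # tasks: {name: zero-arg callable}; every future shares one wait() deadline.
    futures = {name: pool.submit(fn) for name, fn in tasks.items()}
    done, not_done = cf.wait(futures.values(), timeout=budget_ms / 1000.0)
    results = {name: f.result() for name, f in futures.items() if f in done}
    for name, f in futures.items():
        if f in not_done:
            f.cancel()  # discarded; the parser would log a warning here
    return results
```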
51 57
52 ## 4. 统一文本召回表达式 58 ## 4. 统一文本召回表达式
53 59
@@ -68,8 +74,16 @@
68 74
69 最终按 `bool.should` 组合,`minimum_should_match: 1`。 75 最终按 `bool.should` 组合,`minimum_should_match: 1`。
70 76
  77 +> **Appendix: mixed-script auxiliary recall**
  78 +> When Chinese and English (or multiple scripts) are mixed, recall is lifted slightly: `QueryParser` tags `contains_chinese` (query text has CJK) and `contains_english` (tokenization yields a pure-English token of length >= 3); `ESQueryBuilder` merges the **other language's** counterpart fields into the same `fields` list of a language's `multi_match` (limited by `index_languages`), with the merged-in boost equal to the configured value times **`mixed_script_merged_field_boost_scale` (default 0.8, an `ESQueryBuilder` constructor parameter)**. The same applies to `fallback_original_query_*`. Fields are merged internally as `(path, boost)` tuples and only then formatted into ES strings.
  79 +
71 ## 5. 关键配置项(文本策略) 80 ## 5. 关键配置项(文本策略)
72 81
  82 +Items under `query_config` related to parse-stage waiting:
  83 +
  84 +- `translation_embedding_wait_budget_ms_source_in_index`
  85 +- `translation_embedding_wait_budget_ms_source_not_in_index`
  86 +
73 位于 `config/config.yaml -> query_config.text_query_strategy`: 87 位于 `config/config.yaml -> query_config.text_query_strategy`:
74 88
75 - `base_minimum_should_match` 89 - `base_minimum_should_match`
@@ -137,6 +151,7 @@
137 - `query_text_by_lang` 151 - `query_text_by_lang`
138 - `source_in_index_languages` 152 - `source_in_index_languages`
139 - `index_languages` 153 - `index_languages`
  154 + - `contains_chinese` / `contains_english`
140 - `ESQueryBuilder` 负责“表达式展开”: 155 - `ESQueryBuilder` 负责“表达式展开”:
141 - 动态字段组装 156 - 动态字段组装
142 - 子句权重分配 157 - 子句权重分配
embeddings/README.md
@@ -30,13 +30,13 @@
30 - 文本服务(默认 `6005`) 30 - 文本服务(默认 `6005`)
31 - `POST /embed/text` 31 - `POST /embed/text`
32 - 请求体:`["文本1", "文本2", ...]` 32 - 请求体:`["文本1", "文本2", ...]`
33 - - 可选 query 参数:`normalize=true|false` 33 + - 可选 query 参数:`normalize=true|false`、`priority=0|1`
34 - 返回:`[[...], [...], ...]` 34 - 返回:`[[...], [...], ...]`
35 - 健康接口:`GET /health`、`GET /ready` 35 - 健康接口:`GET /health`、`GET /ready`
36 - 图片服务(默认 `6008`) 36 - 图片服务(默认 `6008`)
37 - `POST /embed/image` 37 - `POST /embed/image`
38 - 请求体:`["url或本地路径1", ...]` 38 - 请求体:`["url或本地路径1", ...]`
39 - - 可选 query 参数:`normalize=true|false` 39 + - 可选 query 参数:`normalize=true|false`、`priority=0|1`
40 - 返回:`[[...], [...], ...]` 40 - 返回:`[[...], [...], ...]`
41 - 健康接口:`GET /health`、`GET /ready` 41 - 健康接口:`GET /health`、`GET /ready`
42 42
@@ -61,6 +61,11 @@
61 - 图片服务可以配置得比文本更严格。 61 - 图片服务可以配置得比文本更严格。
62 - 请求若是 full-cache-hit,会在服务端直接返回,不占用模型并发槽位。 62 - 请求若是 full-cache-hit,会在服务端直接返回,不占用模型并发槽位。
63 - 超过处理能力时直接拒绝,比无限排队更稳定。 63 - 超过处理能力时直接拒绝,比无限排队更稳定。
  64 +- The text service supports `priority`:
  65 + - `priority=0` (default, suited to offline indexing) is still limited by `TEXT_MAX_INFLIGHT`; over the limit it returns overload directly.
  66 + - `priority>0` (use `1` for online queries) is never rejected by admission control, but still counts toward inflight.
  67 + - Internally the text service uses dual-queue scheduling and consumes high-priority requests first, so online requests are not stuck behind offline batch jobs.
  68 +- The image service also supports `priority` (same semantics as text, counted against `IMAGE_MAX_INFLIGHT`; no queue-jumping, only the admission rule differs).
64 69
65 ### 图片向量:clip-as-service(推荐) 70 ### 图片向量:clip-as-service(推荐)
66 71
@@ -86,6 +91,14 @@
86 - `CLIP_AS_SERVICE_MODEL_NAME=CN-CLIP/ViT-L-14` 91 - `CLIP_AS_SERVICE_MODEL_NAME=CN-CLIP/ViT-L-14`
87 - `scripts/start_cnclip_service.sh` 默认会读取同一个 `CLIP_AS_SERVICE_MODEL_NAME`,也可用 `CNCLIP_MODEL_NAME` 或 `--model-name` 临时覆盖 92 - `scripts/start_cnclip_service.sh` 默认会读取同一个 `CLIP_AS_SERVICE_MODEL_NAME`,也可用 `CNCLIP_MODEL_NAME` 或 `--model-name` 临时覆盖
88 93
  94 +### Performance and load testing (reusing the repo scripts)
  95 +
  96 +- API-level load test (same methodology as `perf_reports/2026-03-12/matrix_report/` etc.): `scripts/perf_api_benchmark.py`
  97 + - Example: `python scripts/perf_api_benchmark.py --scenario embed_text --duration 30 --concurrency 20`
  98 + - Text/image embedding can carry `priority` (same admission semantics as production): `--embed-text-priority 1`, `--embed-image-priority 1`
  99 + - Custom request templates: `--cases-file scripts/perf_cases.json.example`
  100 +- Historical matrix results and notes: `perf_reports/2026-03-12/matrix_report/summary.md`.
  101 +
89 ### 启动服务 102 ### 启动服务
90 103
91 使用仓库脚本启动: 104 使用仓库脚本启动:
embeddings/image_encoder.py
@@ -35,7 +35,12 @@ class CLIPImageEncoder:
35 namespace="image", 35 namespace="image",
36 ) 36 )
37 37
38 - def _call_service(self, request_data: List[str], normalize_embeddings: bool = True) -> List[Any]: 38 + def _call_service(
  39 + self,
  40 + request_data: List[str],
  41 + normalize_embeddings: bool = True,
  42 + priority: int = 0,
  43 + ) -> List[Any]:
39 """ 44 """
40 Call the embedding service API. 45 Call the embedding service API.
41 46
@@ -48,7 +53,10 @@
48 try: 53 try:
49 response = requests.post( 54 response = requests.post(
50 self.endpoint, 55 self.endpoint,
51 - params={"normalize": "true" if normalize_embeddings else "false"}, 56 + params={
  57 + "normalize": "true" if normalize_embeddings else "false",
  58 + "priority": max(0, int(priority)),
  59 + },
52 json=request_data, 60 json=request_data,
53 timeout=60 61 timeout=60
54 ) 62 )
@@ -66,7 +74,12 @@
66 """ 74 """
67 raise NotImplementedError("encode_image with PIL Image is not supported by embedding service") 75 raise NotImplementedError("encode_image with PIL Image is not supported by embedding service")
68 76
69 - def encode_image_from_url(self, url: str, normalize_embeddings: bool = True) -> np.ndarray: 77 + def encode_image_from_url(
  78 + self,
  79 + url: str,
  80 + normalize_embeddings: bool = True,
  81 + priority: int = 0,
  82 + ) -> np.ndarray:
70 """ 83 """
71 Generate image embedding via network service using URL. 84 Generate image embedding via network service using URL.
72 85
@@ -81,7 +94,11 @@
81 if cached is not None: 94 if cached is not None:
82 return cached 95 return cached
83 96
84 - response_data = self._call_service([url], normalize_embeddings=normalize_embeddings) 97 + response_data = self._call_service(
  98 + [url],
  99 + normalize_embeddings=normalize_embeddings,
  100 + priority=priority,
  101 + )
85 if not response_data or len(response_data) != 1 or response_data[0] is None: 102 if not response_data or len(response_data) != 1 or response_data[0] is None:
86 raise RuntimeError(f"No image embedding returned for URL: {url}") 103 raise RuntimeError(f"No image embedding returned for URL: {url}")
87 vec = np.array(response_data[0], dtype=np.float32) 104 vec = np.array(response_data[0], dtype=np.float32)
@@ -95,6 +112,7 @@ class CLIPImageEncoder:
95 images: List[Union[str, Image.Image]], 112 images: List[Union[str, Image.Image]],
96 batch_size: int = 8, 113 batch_size: int = 8,
97 normalize_embeddings: bool = True, 114 normalize_embeddings: bool = True,
  115 + priority: int = 0,
98 ) -> List[np.ndarray]: 116 ) -> List[np.ndarray]:
99 """ 117 """
100 Encode a batch of images efficiently via network service. 118 Encode a batch of images efficiently via network service.
@@ -129,7 +147,11 @@ class CLIPImageEncoder:
129 147
130 for i in range(0, len(pending_urls), batch_size): 148 for i in range(0, len(pending_urls), batch_size):
131 batch_urls = pending_urls[i : i + batch_size] 149 batch_urls = pending_urls[i : i + batch_size]
132 - response_data = self._call_service(batch_urls, normalize_embeddings=normalize_embeddings) 150 + response_data = self._call_service(
  151 + batch_urls,
  152 + normalize_embeddings=normalize_embeddings,
  153 + priority=priority,
  154 + )
133 if not response_data or len(response_data) != len(batch_urls): 155 if not response_data or len(response_data) != len(batch_urls):
134 raise RuntimeError( 156 raise RuntimeError(
135 f"Image embedding response length mismatch: expected {len(batch_urls)}, " 157 f"Image embedding response length mismatch: expected {len(batch_urls)}, "
@@ -153,6 +175,7 @@ class CLIPImageEncoder:
153 urls: List[str], 175 urls: List[str],
154 batch_size: Optional[int] = None, 176 batch_size: Optional[int] = None,
155 normalize_embeddings: bool = True, 177 normalize_embeddings: bool = True,
  178 + priority: int = 0,
156 ) -> List[np.ndarray]: 179 ) -> List[np.ndarray]:
157 """ 180 """
158 与 ClipImageModel / ClipAsServiceImageEncoder 一致的接口,供索引器 document_transformer 调用。 181 与 ClipImageModel / ClipAsServiceImageEncoder 一致的接口,供索引器 document_transformer 调用。
@@ -168,4 +191,5 @@ class CLIPImageEncoder:
168 urls, 191 urls,
169 batch_size=batch_size or 8, 192 batch_size=batch_size or 8,
170 normalize_embeddings=normalize_embeddings, 193 normalize_embeddings=normalize_embeddings,
  194 + priority=priority,
171 ) 195 )
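The image-encoder changes above all do one thing: thread a `priority` value through to the embedding service as a query parameter, clamped to be non-negative. Stripped to just that parameter handling, the behavior can be sketched as follows (hypothetical helper name; the real `_call_service` also posts the JSON body and enforces a 60s timeout):

```python
def build_embed_params(normalize: bool = True, priority: int = 0) -> dict:
    """Sketch of the query params _call_service sends after this change."""
    return {
        # normalize is serialized as a lowercase string flag
        "normalize": "true" if normalize else "false",
        # negative priorities are clamped to 0 (the normal-priority lane)
        "priority": max(0, int(priority)),
    }
```

Callers such as `encode_image_from_url` and `encode_images_batch` simply forward their own `priority` argument into these params, defaulting to 0 so existing callers are unaffected.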
embeddings/server.py
@@ -129,7 +129,7 @@ _TEXT_REQUEST_TIMEOUT_SEC = max(
129 1.0, float(os.getenv("TEXT_REQUEST_TIMEOUT_SEC", "30")) 129 1.0, float(os.getenv("TEXT_REQUEST_TIMEOUT_SEC", "30"))
130 ) 130 )
131 _TEXT_MAX_INFLIGHT = max(1, int(os.getenv("TEXT_MAX_INFLIGHT", "32"))) 131 _TEXT_MAX_INFLIGHT = max(1, int(os.getenv("TEXT_MAX_INFLIGHT", "32")))
132 -_IMAGE_MAX_INFLIGHT = max(1, int(os.getenv("IMAGE_MAX_INFLIGHT", "1"))) 132 +_IMAGE_MAX_INFLIGHT = max(1, int(os.getenv("IMAGE_MAX_INFLIGHT", "20")))
133 _OVERLOAD_STATUS_CODE = int(os.getenv("EMBEDDING_OVERLOAD_STATUS_CODE", "503")) 133 _OVERLOAD_STATUS_CODE = int(os.getenv("EMBEDDING_OVERLOAD_STATUS_CODE", "503"))
134 _LOG_PREVIEW_COUNT = max(1, int(os.getenv("EMBEDDING_LOG_PREVIEW_COUNT", "3"))) 134 _LOG_PREVIEW_COUNT = max(1, int(os.getenv("EMBEDDING_LOG_PREVIEW_COUNT", "3")))
135 _LOG_TEXT_PREVIEW_CHARS = max(32, int(os.getenv("EMBEDDING_LOG_TEXT_PREVIEW_CHARS", "120"))) 135 _LOG_TEXT_PREVIEW_CHARS = max(32, int(os.getenv("EMBEDDING_LOG_TEXT_PREVIEW_CHARS", "120")))
@@ -206,23 +206,24 @@ class _InflightLimiter:
206 def __init__(self, name: str, limit: int): 206 def __init__(self, name: str, limit: int):
207 self.name = name 207 self.name = name
208 self.limit = max(1, int(limit)) 208 self.limit = max(1, int(limit))
209 - self._sem = threading.BoundedSemaphore(self.limit)  
210 self._lock = threading.Lock() 209 self._lock = threading.Lock()
211 self._active = 0 210 self._active = 0
212 self._rejected = 0 211 self._rejected = 0
213 self._completed = 0 212 self._completed = 0
214 self._failed = 0 213 self._failed = 0
215 self._max_active = 0 214 self._max_active = 0
  215 + self._priority_bypass_total = 0
216 216
217 - def try_acquire(self) -> tuple[bool, int]:  
218 - if not self._sem.acquire(blocking=False):  
219 - with self._lock: 217 + def try_acquire(self, *, bypass_limit: bool = False) -> tuple[bool, int]:
  218 + with self._lock:
  219 + if not bypass_limit and self._active >= self.limit:
220 self._rejected += 1 220 self._rejected += 1
221 active = self._active 221 active = self._active
222 - return False, active  
223 - with self._lock: 222 + return False, active
224 self._active += 1 223 self._active += 1
225 self._max_active = max(self._max_active, self._active) 224 self._max_active = max(self._max_active, self._active)
  225 + if bypass_limit:
  226 + self._priority_bypass_total += 1
226 active = self._active 227 active = self._active
227 return True, active 228 return True, active
228 229
@@ -234,7 +235,6 @@ class _InflightLimiter:
234 else: 235 else:
235 self._failed += 1 236 self._failed += 1
236 active = self._active 237 active = self._active
237 - self._sem.release()  
238 return active 238 return active
239 239
240 def snapshot(self) -> Dict[str, int]: 240 def snapshot(self) -> Dict[str, int]:
@@ -246,9 +246,157 @@ class _InflightLimiter:
246 "completed_total": self._completed, 246 "completed_total": self._completed,
247 "failed_total": self._failed, 247 "failed_total": self._failed,
248 "max_active": self._max_active, 248 "max_active": self._max_active,
  249 + "priority_bypass_total": self._priority_bypass_total,
249 } 250 }
250 251
251 252
  253 +def _effective_priority(priority: int) -> int:
  254 + return 1 if int(priority) > 0 else 0
  255 +
  256 +
  257 +def _priority_label(priority: int) -> str:
  258 + return "high" if _effective_priority(priority) > 0 else "normal"
  259 +
  260 +
  261 +@dataclass
  262 +class _TextDispatchTask:
  263 + normalized: List[str]
  264 + effective_normalize: bool
  265 + request_id: str
  266 + priority: int
  267 + created_at: float
  268 + done: threading.Event
  269 + result: Optional[_EmbedResult] = None
  270 + error: Optional[Exception] = None
  271 +
  272 +
  273 +_text_dispatch_high_queue: "deque[_TextDispatchTask]" = deque()
  274 +_text_dispatch_normal_queue: "deque[_TextDispatchTask]" = deque()
  275 +_text_dispatch_cv = threading.Condition()
  276 +_text_dispatch_workers: List[threading.Thread] = []
  277 +_text_dispatch_worker_stop = False
  278 +_text_dispatch_worker_count = 0
  279 +
  280 +
  281 +def _text_dispatch_queue_depth() -> Dict[str, int]:
  282 + with _text_dispatch_cv:
  283 + return {
  284 + "high": len(_text_dispatch_high_queue),
  285 + "normal": len(_text_dispatch_normal_queue),
  286 + "total": len(_text_dispatch_high_queue) + len(_text_dispatch_normal_queue),
  287 + }
  288 +
  289 +
  290 +def _pop_text_dispatch_task_locked() -> Optional["_TextDispatchTask"]:
  291 + if _text_dispatch_high_queue:
  292 + return _text_dispatch_high_queue.popleft()
  293 + if _text_dispatch_normal_queue:
  294 + return _text_dispatch_normal_queue.popleft()
  295 + return None
  296 +
  297 +
  298 +def _start_text_dispatch_workers() -> None:
  299 + global _text_dispatch_workers, _text_dispatch_worker_stop, _text_dispatch_worker_count
  300 + if _text_model is None:
  301 + return
  302 + target_worker_count = 1 if _text_backend_name == "local_st" else _TEXT_MAX_INFLIGHT
  303 + alive_workers = [worker for worker in _text_dispatch_workers if worker.is_alive()]
  304 + if len(alive_workers) == target_worker_count:
  305 + _text_dispatch_workers = alive_workers
  306 + _text_dispatch_worker_count = target_worker_count
  307 + return
  308 + _text_dispatch_worker_stop = False
  309 + _text_dispatch_worker_count = target_worker_count
  310 + _text_dispatch_workers = []
  311 + for idx in range(target_worker_count):
  312 + worker = threading.Thread(
  313 + target=_text_dispatch_worker_loop,
  314 + args=(idx,),
  315 + name=f"embed-text-dispatch-{idx}",
  316 + daemon=True,
  317 + )
  318 + worker.start()
  319 + _text_dispatch_workers.append(worker)
  320 + logger.info(
  321 + "Started text dispatch workers | backend=%s workers=%d",
  322 + _text_backend_name,
  323 + target_worker_count,
  324 + )
  325 +
  326 +
  327 +def _stop_text_dispatch_workers() -> None:
  328 + global _text_dispatch_worker_stop
  329 + with _text_dispatch_cv:
  330 + _text_dispatch_worker_stop = True
  331 + _text_dispatch_cv.notify_all()
  332 +
  333 +
  334 +def _text_dispatch_worker_loop(worker_idx: int) -> None:
  335 + while True:
  336 + with _text_dispatch_cv:
  337 + while (
  338 + not _text_dispatch_high_queue
  339 + and not _text_dispatch_normal_queue
  340 + and not _text_dispatch_worker_stop
  341 + ):
  342 + _text_dispatch_cv.wait()
  343 + if _text_dispatch_worker_stop:
  344 + return
  345 + task = _pop_text_dispatch_task_locked()
  346 + if task is None:
  347 + continue
  348 + try:
  349 + queue_wait_ms = (time.perf_counter() - task.created_at) * 1000.0
  350 + logger.info(
  351 + "text dispatch start | worker=%d priority=%s inputs=%d queue_wait_ms=%.2f",
  352 + worker_idx,
  353 + _priority_label(task.priority),
  354 + len(task.normalized),
  355 + queue_wait_ms,
  356 + extra=_request_log_extra(task.request_id),
  357 + )
  358 + task.result = _embed_text_impl(
  359 + task.normalized,
  360 + task.effective_normalize,
  361 + task.request_id,
  362 + task.priority,
  363 + )
  364 + except Exception as exc:
  365 + task.error = exc
  366 + finally:
  367 + task.done.set()
  368 +
  369 +
  370 +def _submit_text_dispatch_and_wait(
  371 + normalized: List[str],
  372 + effective_normalize: bool,
  373 + request_id: str,
  374 + priority: int,
  375 +) -> _EmbedResult:
  376 + if not any(worker.is_alive() for worker in _text_dispatch_workers):
  377 + _start_text_dispatch_workers()
  378 + task = _TextDispatchTask(
  379 + normalized=normalized,
  380 + effective_normalize=effective_normalize,
  381 + request_id=request_id,
  382 + priority=_effective_priority(priority),
  383 + created_at=time.perf_counter(),
  384 + done=threading.Event(),
  385 + )
  386 + with _text_dispatch_cv:
  387 + if task.priority > 0:
  388 + _text_dispatch_high_queue.append(task)
  389 + else:
  390 + _text_dispatch_normal_queue.append(task)
  391 + _text_dispatch_cv.notify()
  392 + task.done.wait()
  393 + if task.error is not None:
  394 + raise task.error
  395 + if task.result is None:
  396 + raise RuntimeError("Text dispatch worker returned empty result")
  397 + return task.result
  398 +
  399 +
252 _text_request_limiter = _InflightLimiter(name="text", limit=_TEXT_MAX_INFLIGHT) 400 _text_request_limiter = _InflightLimiter(name="text", limit=_TEXT_MAX_INFLIGHT)
253 _image_request_limiter = _InflightLimiter(name="image", limit=_IMAGE_MAX_INFLIGHT) 401 _image_request_limiter = _InflightLimiter(name="image", limit=_IMAGE_MAX_INFLIGHT)
254 _text_stats = _EndpointStats(name="text") 402 _text_stats = _EndpointStats(name="text")
@@ -261,6 +409,7 @@ _image_cache = RedisEmbeddingCache(key_prefix=_CACHE_PREFIX, namespace="image")
261 class _SingleTextTask: 409 class _SingleTextTask:
262 text: str 410 text: str
263 normalize: bool 411 normalize: bool
  412 + priority: int
264 created_at: float 413 created_at: float
265 request_id: str 414 request_id: str
266 done: threading.Event 415 done: threading.Event
@@ -268,12 +417,30 @@ class _SingleTextTask:
268 error: Optional[Exception] = None 417 error: Optional[Exception] = None
269 418
270 419
271 -_text_single_queue: "deque[_SingleTextTask]" = deque() 420 +_text_single_high_queue: "deque[_SingleTextTask]" = deque()
  421 +_text_single_normal_queue: "deque[_SingleTextTask]" = deque()
272 _text_single_queue_cv = threading.Condition() 422 _text_single_queue_cv = threading.Condition()
273 _text_batch_worker: Optional[threading.Thread] = None 423 _text_batch_worker: Optional[threading.Thread] = None
274 _text_batch_worker_stop = False 424 _text_batch_worker_stop = False
275 425
276 426
  427 +def _text_microbatch_queue_depth() -> Dict[str, int]:
  428 + with _text_single_queue_cv:
  429 + return {
  430 + "high": len(_text_single_high_queue),
  431 + "normal": len(_text_single_normal_queue),
  432 + "total": len(_text_single_high_queue) + len(_text_single_normal_queue),
  433 + }
  434 +
  435 +
  436 +def _pop_single_text_task_locked() -> Optional["_SingleTextTask"]:
  437 + if _text_single_high_queue:
  438 + return _text_single_high_queue.popleft()
  439 + if _text_single_normal_queue:
  440 + return _text_single_normal_queue.popleft()
  441 + return None
  442 +
  443 +
277 def _compact_preview(text: str, max_chars: int) -> str: 444 def _compact_preview(text: str, max_chars: int) -> str:
278 compact = " ".join((text or "").split()) 445 compact = " ".join((text or "").split())
279 if len(compact) <= max_chars: 446 if len(compact) <= max_chars:
@@ -356,30 +523,41 @@ def _text_batch_worker_loop() -> None:
356 max_batch = max(1, int(CONFIG.TEXT_BATCH_SIZE)) 523 max_batch = max(1, int(CONFIG.TEXT_BATCH_SIZE))
357 while True: 524 while True:
358 with _text_single_queue_cv: 525 with _text_single_queue_cv:
359 - while not _text_single_queue and not _text_batch_worker_stop: 526 + while (
  527 + not _text_single_high_queue
  528 + and not _text_single_normal_queue
  529 + and not _text_batch_worker_stop
  530 + ):
360 _text_single_queue_cv.wait() 531 _text_single_queue_cv.wait()
361 if _text_batch_worker_stop: 532 if _text_batch_worker_stop:
362 return 533 return
363 534
364 - batch: List[_SingleTextTask] = [_text_single_queue.popleft()] 535 + first_task = _pop_single_text_task_locked()
  536 + if first_task is None:
  537 + continue
  538 + batch: List[_SingleTextTask] = [first_task]
365 deadline = time.perf_counter() + _TEXT_MICROBATCH_WINDOW_SEC 539 deadline = time.perf_counter() + _TEXT_MICROBATCH_WINDOW_SEC
366 540
367 while len(batch) < max_batch: 541 while len(batch) < max_batch:
368 remaining = deadline - time.perf_counter() 542 remaining = deadline - time.perf_counter()
369 if remaining <= 0: 543 if remaining <= 0:
370 break 544 break
371 - if not _text_single_queue: 545 + if not _text_single_high_queue and not _text_single_normal_queue:
372 _text_single_queue_cv.wait(timeout=remaining) 546 _text_single_queue_cv.wait(timeout=remaining)
373 continue 547 continue
374 - while _text_single_queue and len(batch) < max_batch:  
375 - batch.append(_text_single_queue.popleft()) 548 + while len(batch) < max_batch:
  549 + next_task = _pop_single_text_task_locked()
  550 + if next_task is None:
  551 + break
  552 + batch.append(next_task)
376 553
377 try: 554 try:
378 queue_wait_ms = [(time.perf_counter() - task.created_at) * 1000.0 for task in batch] 555 queue_wait_ms = [(time.perf_counter() - task.created_at) * 1000.0 for task in batch]
379 reqids = [task.request_id for task in batch] 556 reqids = [task.request_id for task in batch]
380 logger.info( 557 logger.info(
381 - "text microbatch dispatch | size=%d queue_wait_ms_min=%.2f queue_wait_ms_max=%.2f reqids=%s preview=%s", 558 + "text microbatch dispatch | size=%d priority=%s queue_wait_ms_min=%.2f queue_wait_ms_max=%.2f reqids=%s preview=%s",
382 len(batch), 559 len(batch),
  560 + _priority_label(max(task.priority for task in batch)),
383 min(queue_wait_ms) if queue_wait_ms else 0.0, 561 min(queue_wait_ms) if queue_wait_ms else 0.0,
384 max(queue_wait_ms) if queue_wait_ms else 0.0, 562 max(queue_wait_ms) if queue_wait_ms else 0.0,
385 reqids, 563 reqids,
@@ -423,22 +601,32 @@ def _text_batch_worker_loop() -> None:
423 task.done.set() 601 task.done.set()
424 602
425 603
426 -def _encode_single_text_with_microbatch(text: str, normalize: bool, request_id: str) -> List[float]: 604 +def _encode_single_text_with_microbatch(
  605 + text: str,
  606 + normalize: bool,
  607 + request_id: str,
  608 + priority: int,
  609 +) -> List[float]:
427 task = _SingleTextTask( 610 task = _SingleTextTask(
428 text=text, 611 text=text,
429 normalize=normalize, 612 normalize=normalize,
  613 + priority=_effective_priority(priority),
430 created_at=time.perf_counter(), 614 created_at=time.perf_counter(),
431 request_id=request_id, 615 request_id=request_id,
432 done=threading.Event(), 616 done=threading.Event(),
433 ) 617 )
434 with _text_single_queue_cv: 618 with _text_single_queue_cv:
435 - _text_single_queue.append(task) 619 + if task.priority > 0:
  620 + _text_single_high_queue.append(task)
  621 + else:
  622 + _text_single_normal_queue.append(task)
436 _text_single_queue_cv.notify() 623 _text_single_queue_cv.notify()
437 624
438 if not task.done.wait(timeout=_TEXT_REQUEST_TIMEOUT_SEC): 625 if not task.done.wait(timeout=_TEXT_REQUEST_TIMEOUT_SEC):
439 with _text_single_queue_cv: 626 with _text_single_queue_cv:
  627 + queue = _text_single_high_queue if task.priority > 0 else _text_single_normal_queue
440 try: 628 try:
441 - _text_single_queue.remove(task) 629 + queue.remove(task)
442 except ValueError: 630 except ValueError:
443 pass 631 pass
444 raise RuntimeError( 632 raise RuntimeError(
@@ -489,6 +677,7 @@ def load_models():
489 f"Unsupported embedding backend: {backend_name}. " 677 f"Unsupported embedding backend: {backend_name}. "
490 "Supported: tei, local_st" 678 "Supported: tei, local_st"
491 ) 679 )
  680 + _start_text_dispatch_workers()
492 logger.info("Text backend loaded successfully: %s", _text_backend_name) 681 logger.info("Text backend loaded successfully: %s", _text_backend_name)
493 except Exception as e: 682 except Exception as e:
494 logger.error("Failed to load text model: %s", e, exc_info=True) 683 logger.error("Failed to load text model: %s", e, exc_info=True)
@@ -532,6 +721,7 @@ def load_models():
532 @app.on_event("shutdown") 721 @app.on_event("shutdown")
533 def stop_workers() -> None: 722 def stop_workers() -> None:
534 _stop_text_batch_worker() 723 _stop_text_batch_worker()
  724 + _stop_text_dispatch_workers()
535 725
536 726
537 def _normalize_vector(vec: np.ndarray) -> np.ndarray: 727 def _normalize_vector(vec: np.ndarray) -> np.ndarray:
@@ -602,6 +792,8 @@ def _try_full_image_cache_hit(
602 def health() -> Dict[str, Any]: 792 def health() -> Dict[str, Any]:
603 """Health check endpoint. Returns status and current throttling stats.""" 793 """Health check endpoint. Returns status and current throttling stats."""
604 ready = (not open_text_model or _text_model is not None) and (not open_image_model or _image_model is not None) 794 ready = (not open_text_model or _text_model is not None) and (not open_image_model or _image_model is not None)
  795 + text_dispatch_depth = _text_dispatch_queue_depth()
  796 + text_microbatch_depth = _text_microbatch_queue_depth()
605 return { 797 return {
606 "status": "ok" if ready else "degraded", 798 "status": "ok" if ready else "degraded",
607 "service_kind": _SERVICE_KIND, 799 "service_kind": _SERVICE_KIND,
@@ -620,9 +812,18 @@ def health() -> Dict[str, Any]:
620 "text": _text_stats.snapshot(), 812 "text": _text_stats.snapshot(),
621 "image": _image_stats.snapshot(), 813 "image": _image_stats.snapshot(),
622 }, 814 },
  815 + "text_dispatch": {
  816 + "workers": _text_dispatch_worker_count,
  817 + "workers_alive": sum(1 for worker in _text_dispatch_workers if worker.is_alive()),
  818 + "queue_depth": text_dispatch_depth["total"],
  819 + "queue_depth_high": text_dispatch_depth["high"],
  820 + "queue_depth_normal": text_dispatch_depth["normal"],
  821 + },
623 "text_microbatch": { 822 "text_microbatch": {
624 "window_ms": round(_TEXT_MICROBATCH_WINDOW_SEC * 1000.0, 3), 823 "window_ms": round(_TEXT_MICROBATCH_WINDOW_SEC * 1000.0, 3),
625 - "queue_depth": len(_text_single_queue), 824 + "queue_depth": text_microbatch_depth["total"],
  825 + "queue_depth_high": text_microbatch_depth["high"],
  826 + "queue_depth_normal": text_microbatch_depth["normal"],
626 "worker_alive": bool(_text_batch_worker is not None and _text_batch_worker.is_alive()), 827 "worker_alive": bool(_text_batch_worker is not None and _text_batch_worker.is_alive()),
627 "request_timeout_sec": _TEXT_REQUEST_TIMEOUT_SEC, 828 "request_timeout_sec": _TEXT_REQUEST_TIMEOUT_SEC,
628 }, 829 },
@@ -654,6 +855,7 @@ def _embed_text_impl(
654 normalized: List[str], 855 normalized: List[str],
655 effective_normalize: bool, 856 effective_normalize: bool,
656 request_id: str, 857 request_id: str,
  858 + priority: int = 0,
657 ) -> _EmbedResult: 859 ) -> _EmbedResult:
658 if _text_model is None: 860 if _text_model is None:
659 raise RuntimeError("Text model not loaded") 861 raise RuntimeError("Text model not loaded")
@@ -703,6 +905,7 @@ def _embed_text_impl(
703 missing_texts[0], 905 missing_texts[0],
704 normalize=effective_normalize, 906 normalize=effective_normalize,
705 request_id=request_id, 907 request_id=request_id,
  908 + priority=priority,
706 ) 909 )
707 ] 910 ]
708 mode = "microbatch-single" 911 mode = "microbatch-single"
@@ -777,6 +980,7 @@ async def embed_text(
777 http_request: Request, 980 http_request: Request,
778 response: Response, 981 response: Response,
779 normalize: Optional[bool] = None, 982 normalize: Optional[bool] = None,
  983 + priority: int = 0,
780 ) -> List[Optional[List[float]]]: 984 ) -> List[Optional[List[float]]]:
781 if _text_model is None: 985 if _text_model is None:
782 raise HTTPException(status_code=503, detail="Text embedding model not loaded in this service") 986 raise HTTPException(status_code=503, detail="Text embedding model not loaded in this service")
@@ -784,6 +988,9 @@ async def embed_text(
784 request_id = _resolve_request_id(http_request) 988 request_id = _resolve_request_id(http_request)
785 response.headers["X-Request-ID"] = request_id 989 response.headers["X-Request-ID"] = request_id
786 990
  991 + if priority < 0:
  992 + raise HTTPException(status_code=400, detail="priority must be >= 0")
  993 + effective_priority = _effective_priority(priority)
787 effective_normalize = bool(CONFIG.TEXT_NORMALIZE_EMBEDDINGS) if normalize is None else bool(normalize) 994 effective_normalize = bool(CONFIG.TEXT_NORMALIZE_EMBEDDINGS) if normalize is None else bool(normalize)
788 normalized: List[str] = [] 995 normalized: List[str] = []
789 for i, t in enumerate(texts): 996 for i, t in enumerate(texts):
@@ -806,8 +1013,9 @@ async def embed_text(
806 cache_misses=0, 1013 cache_misses=0,
807 ) 1014 )
808 logger.info( 1015 logger.info(
809 - "embed_text response | backend=%s mode=cache-only inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=0 first_vector=%s latency_ms=%.2f", 1016 + "embed_text response | backend=%s mode=cache-only priority=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=0 first_vector=%s latency_ms=%.2f",
810 _text_backend_name, 1017 _text_backend_name,
  1018 + _priority_label(effective_priority),
811 len(normalized), 1019 len(normalized),
812 effective_normalize, 1020 effective_normalize,
813 len(cache_only.vectors[0]) if cache_only.vectors and cache_only.vectors[0] is not None else 0, 1021 len(cache_only.vectors[0]) if cache_only.vectors and cache_only.vectors[0] is not None else 0,
@@ -818,13 +1026,14 @@ async def embed_text(
818 ) 1026 )
819 return cache_only.vectors 1027 return cache_only.vectors
820 1028
821 - accepted, active = _text_request_limiter.try_acquire() 1029 + accepted, active = _text_request_limiter.try_acquire(bypass_limit=effective_priority > 0)
822 if not accepted: 1030 if not accepted:
823 _text_stats.record_rejected() 1031 _text_stats.record_rejected()
824 logger.warning( 1032 logger.warning(
825 - "embed_text rejected | client=%s backend=%s inputs=%d normalize=%s active=%d limit=%d preview=%s", 1033 + "embed_text rejected | client=%s backend=%s priority=%s inputs=%d normalize=%s active=%d limit=%d preview=%s",
826 _request_client(http_request), 1034 _request_client(http_request),
827 _text_backend_name, 1035 _text_backend_name,
  1036 + _priority_label(effective_priority),
828 len(normalized), 1037 len(normalized),
829 effective_normalize, 1038 effective_normalize,
830 active, 1039 active,
@@ -834,7 +1043,10 @@ async def embed_text(
834 ) 1043 )
835 raise HTTPException( 1044 raise HTTPException(
836 status_code=_OVERLOAD_STATUS_CODE, 1045 status_code=_OVERLOAD_STATUS_CODE,
837 - detail=f"Text embedding service busy: active={active}, limit={_TEXT_MAX_INFLIGHT}", 1046 + detail=(
  1047 + "Text embedding service busy for priority=0 requests: "
  1048 + f"active={active}, limit={_TEXT_MAX_INFLIGHT}"
  1049 + ),
838 ) 1050 )
839 1051
840 request_started = time.perf_counter() 1052 request_started = time.perf_counter()
@@ -844,9 +1056,10 @@ async def embed_text(
844 cache_misses = 0 1056 cache_misses = 0
845 try: 1057 try:
846 logger.info( 1058 logger.info(
847 - "embed_text request | client=%s backend=%s inputs=%d normalize=%s active=%d limit=%d preview=%s", 1059 + "embed_text request | client=%s backend=%s priority=%s inputs=%d normalize=%s active=%d limit=%d preview=%s",
848 _request_client(http_request), 1060 _request_client(http_request),
849 _text_backend_name, 1061 _text_backend_name,
  1062 + _priority_label(effective_priority),
850 len(normalized), 1063 len(normalized),
851 effective_normalize, 1064 effective_normalize,
852 active, 1065 active,
@@ -855,13 +1068,20 @@ async def embed_text(
855 extra=_request_log_extra(request_id), 1068 extra=_request_log_extra(request_id),
856 ) 1069 )
857 verbose_logger.info( 1070 verbose_logger.info(
858 - "embed_text detail | payload=%s normalize=%s backend=%s", 1071 + "embed_text detail | payload=%s normalize=%s backend=%s priority=%s",
859 normalized, 1072 normalized,
860 effective_normalize, 1073 effective_normalize,
861 _text_backend_name, 1074 _text_backend_name,
  1075 + _priority_label(effective_priority),
862 extra=_request_log_extra(request_id), 1076 extra=_request_log_extra(request_id),
863 ) 1077 )
864 - result = await run_in_threadpool(_embed_text_impl, normalized, effective_normalize, request_id) 1078 + result = await run_in_threadpool(
  1079 + _submit_text_dispatch_and_wait,
  1080 + normalized,
  1081 + effective_normalize,
  1082 + request_id,
  1083 + effective_priority,
  1084 + )
865 success = True 1085 success = True
866 backend_elapsed_ms = result.backend_elapsed_ms 1086 backend_elapsed_ms = result.backend_elapsed_ms
867 cache_hits = result.cache_hits 1087 cache_hits = result.cache_hits
@@ -875,9 +1095,10 @@ async def embed_text(
875 cache_misses=cache_misses, 1095 cache_misses=cache_misses,
876 ) 1096 )
877 logger.info( 1097 logger.info(
878 - "embed_text response | backend=%s mode=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=%d first_vector=%s latency_ms=%.2f", 1098 + "embed_text response | backend=%s mode=%s priority=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=%d first_vector=%s latency_ms=%.2f",
879 _text_backend_name, 1099 _text_backend_name,
880 result.mode, 1100 result.mode,
  1101 + _priority_label(effective_priority),
881 len(normalized), 1102 len(normalized),
882 effective_normalize, 1103 effective_normalize,
883 len(result.vectors[0]) if result.vectors and result.vectors[0] is not None else 0, 1104 len(result.vectors[0]) if result.vectors and result.vectors[0] is not None else 0,
@@ -888,8 +1109,9 @@ async def embed_text(
888 extra=_request_log_extra(request_id), 1109 extra=_request_log_extra(request_id),
889 ) 1110 )
890 verbose_logger.info( 1111 verbose_logger.info(
891 - "embed_text result detail | count=%d first_vector=%s latency_ms=%.2f", 1112 + "embed_text result detail | count=%d priority=%s first_vector=%s latency_ms=%.2f",
892 len(result.vectors), 1113 len(result.vectors),
  1114 + _priority_label(effective_priority),
893 result.vectors[0][: _VECTOR_PREVIEW_DIMS] 1115 result.vectors[0][: _VECTOR_PREVIEW_DIMS]
894 if result.vectors and result.vectors[0] is not None 1116 if result.vectors and result.vectors[0] is not None
895 else [], 1117 else [],
@@ -909,8 +1131,9 @@ async def embed_text(
909 cache_misses=cache_misses, 1131 cache_misses=cache_misses,
910 ) 1132 )
911 logger.error( 1133 logger.error(
912 - "embed_text failed | backend=%s inputs=%d normalize=%s latency_ms=%.2f error=%s", 1134 + "embed_text failed | backend=%s priority=%s inputs=%d normalize=%s latency_ms=%.2f error=%s",
913 _text_backend_name, 1135 _text_backend_name,
  1136 + _priority_label(effective_priority),
914 len(normalized), 1137 len(normalized),
915 effective_normalize, 1138 effective_normalize,
916 latency_ms, 1139 latency_ms,
@@ -922,8 +1145,9 @@ async def embed_text(
922 finally: 1145 finally:
923 remaining = _text_request_limiter.release(success=success) 1146 remaining = _text_request_limiter.release(success=success)
924 logger.info( 1147 logger.info(
925 - "embed_text finalize | success=%s active_after=%d", 1148 + "embed_text finalize | success=%s priority=%s active_after=%d",
926 success, 1149 success,
  1150 + _priority_label(effective_priority),
927 remaining, 1151 remaining,
928 extra=_request_log_extra(request_id), 1152 extra=_request_log_extra(request_id),
929 ) 1153 )
@@ -1019,6 +1243,7 @@ async def embed_image(
1019 http_request: Request, 1243 http_request: Request,
1020 response: Response, 1244 response: Response,
1021 normalize: Optional[bool] = None, 1245 normalize: Optional[bool] = None,
  1246 + priority: int = 0,
1022 ) -> List[Optional[List[float]]]: 1247 ) -> List[Optional[List[float]]]:
1023 if _image_model is None: 1248 if _image_model is None:
1024 raise HTTPException(status_code=503, detail="Image embedding model not loaded in this service") 1249 raise HTTPException(status_code=503, detail="Image embedding model not loaded in this service")
@@ -1026,6 +1251,10 @@ async def embed_image(
1026 request_id = _resolve_request_id(http_request) 1251 request_id = _resolve_request_id(http_request)
1027 response.headers["X-Request-ID"] = request_id 1252 response.headers["X-Request-ID"] = request_id
1028 1253
  1254 + if priority < 0:
  1255 + raise HTTPException(status_code=400, detail="priority must be >= 0")
  1256 + effective_priority = _effective_priority(priority)
  1257 +
1029 effective_normalize = bool(CONFIG.IMAGE_NORMALIZE_EMBEDDINGS) if normalize is None else bool(normalize) 1258 effective_normalize = bool(CONFIG.IMAGE_NORMALIZE_EMBEDDINGS) if normalize is None else bool(normalize)
1030 urls: List[str] = [] 1259 urls: List[str] = []
1031 for i, url_or_path in enumerate(images): 1260 for i, url_or_path in enumerate(images):
@@ -1048,7 +1277,8 @@ async def embed_image( @@ -1048,7 +1277,8 @@ async def embed_image(
1048 cache_misses=0, 1277 cache_misses=0,
1049 ) 1278 )
1050 logger.info( 1279 logger.info(
1051 - "embed_image response | mode=cache-only inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=0 first_vector=%s latency_ms=%.2f", 1280 + "embed_image response | mode=cache-only priority=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=0 first_vector=%s latency_ms=%.2f",
  1281 + _priority_label(effective_priority),
1052 len(urls), 1282 len(urls),
1053 effective_normalize, 1283 effective_normalize,
1054 len(cache_only.vectors[0]) if cache_only.vectors and cache_only.vectors[0] is not None else 0, 1284 len(cache_only.vectors[0]) if cache_only.vectors and cache_only.vectors[0] is not None else 0,
@@ -1059,12 +1289,13 @@ async def embed_image( @@ -1059,12 +1289,13 @@ async def embed_image(
1059 ) 1289 )
1060 return cache_only.vectors 1290 return cache_only.vectors
1061 1291
1062 - accepted, active = _image_request_limiter.try_acquire() 1292 + accepted, active = _image_request_limiter.try_acquire(bypass_limit=effective_priority > 0)
1063 if not accepted: 1293 if not accepted:
1064 _image_stats.record_rejected() 1294 _image_stats.record_rejected()
1065 logger.warning( 1295 logger.warning(
1066 - "embed_image rejected | client=%s inputs=%d normalize=%s active=%d limit=%d preview=%s", 1296 + "embed_image rejected | client=%s priority=%s inputs=%d normalize=%s active=%d limit=%d preview=%s",
1067 _request_client(http_request), 1297 _request_client(http_request),
  1298 + _priority_label(effective_priority),
1068 len(urls), 1299 len(urls),
1069 effective_normalize, 1300 effective_normalize,
1070 active, 1301 active,
@@ -1074,7 +1305,10 @@ async def embed_image( @@ -1074,7 +1305,10 @@ async def embed_image(
1074 ) 1305 )
1075 raise HTTPException( 1306 raise HTTPException(
1076 status_code=_OVERLOAD_STATUS_CODE, 1307 status_code=_OVERLOAD_STATUS_CODE,
1077 - detail=f"Image embedding service busy: active={active}, limit={_IMAGE_MAX_INFLIGHT}", 1308 + detail=(
  1309 + "Image embedding service busy for priority=0 requests: "
  1310 + f"active={active}, limit={_IMAGE_MAX_INFLIGHT}"
  1311 + ),
1078 ) 1312 )
1079 1313
1080 request_started = time.perf_counter() 1314 request_started = time.perf_counter()
@@ -1084,8 +1318,9 @@ async def embed_image( @@ -1084,8 +1318,9 @@ async def embed_image(
1084 cache_misses = 0 1318 cache_misses = 0
1085 try: 1319 try:
1086 logger.info( 1320 logger.info(
1087 - "embed_image request | client=%s inputs=%d normalize=%s active=%d limit=%d preview=%s", 1321 + "embed_image request | client=%s priority=%s inputs=%d normalize=%s active=%d limit=%d preview=%s",
1088 _request_client(http_request), 1322 _request_client(http_request),
  1323 + _priority_label(effective_priority),
1089 len(urls), 1324 len(urls),
1090 effective_normalize, 1325 effective_normalize,
1091 active, 1326 active,
@@ -1094,9 +1329,10 @@ async def embed_image( @@ -1094,9 +1329,10 @@ async def embed_image(
1094 extra=_request_log_extra(request_id), 1329 extra=_request_log_extra(request_id),
1095 ) 1330 )
1096 verbose_logger.info( 1331 verbose_logger.info(
1097 - "embed_image detail | payload=%s normalize=%s", 1332 + "embed_image detail | payload=%s normalize=%s priority=%s",
1098 urls, 1333 urls,
1099 effective_normalize, 1334 effective_normalize,
  1335 + _priority_label(effective_priority),
1100 extra=_request_log_extra(request_id), 1336 extra=_request_log_extra(request_id),
1101 ) 1337 )
1102 result = await run_in_threadpool(_embed_image_impl, urls, effective_normalize, request_id) 1338 result = await run_in_threadpool(_embed_image_impl, urls, effective_normalize, request_id)
@@ -1113,8 +1349,9 @@ async def embed_image( @@ -1113,8 +1349,9 @@ async def embed_image(
1113 cache_misses=cache_misses, 1349 cache_misses=cache_misses,
1114 ) 1350 )
1115 logger.info( 1351 logger.info(
1116 - "embed_image response | mode=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=%d first_vector=%s latency_ms=%.2f", 1352 + "embed_image response | mode=%s priority=%s inputs=%d normalize=%s dim=%d cache_hits=%d cache_misses=%d first_vector=%s latency_ms=%.2f",
1117 result.mode, 1353 result.mode,
  1354 + _priority_label(effective_priority),
1118 len(urls), 1355 len(urls),
1119 effective_normalize, 1356 effective_normalize,
1120 len(result.vectors[0]) if result.vectors and result.vectors[0] is not None else 0, 1357 len(result.vectors[0]) if result.vectors and result.vectors[0] is not None else 0,
@@ -1146,7 +1383,8 @@ async def embed_image( @@ -1146,7 +1383,8 @@ async def embed_image(
1146 cache_misses=cache_misses, 1383 cache_misses=cache_misses,
1147 ) 1384 )
1148 logger.error( 1385 logger.error(
1149 - "embed_image failed | inputs=%d normalize=%s latency_ms=%.2f error=%s", 1386 + "embed_image failed | priority=%s inputs=%d normalize=%s latency_ms=%.2f error=%s",
  1387 + _priority_label(effective_priority),
1150 len(urls), 1388 len(urls),
1151 effective_normalize, 1389 effective_normalize,
1152 latency_ms, 1390 latency_ms,
@@ -1158,8 +1396,9 @@ async def embed_image( @@ -1158,8 +1396,9 @@ async def embed_image(
1158 finally: 1396 finally:
1159 remaining = _image_request_limiter.release(success=success) 1397 remaining = _image_request_limiter.release(success=success)
1160 logger.info( 1398 logger.info(
1161 - "embed_image finalize | success=%s active_after=%d", 1399 + "embed_image finalize | success=%s priority=%s active_after=%d",
1162 success, 1400 success,
  1401 + _priority_label(effective_priority),
1163 remaining, 1402 remaining,
1164 extra=_request_log_extra(request_id), 1403 extra=_request_log_extra(request_id),
1165 ) 1404 )
embeddings/text_encoder.py
@@ -35,7 +35,12 @@ class TextEmbeddingEncoder: @@ -35,7 +35,12 @@ class TextEmbeddingEncoder:
35 expire_time=self.expire_time, 35 expire_time=self.expire_time,
36 ) 36 )
37 37
38 - def _call_service(self, request_data: List[str], normalize_embeddings: bool = True) -> List[Any]: 38 + def _call_service(
  39 + self,
  40 + request_data: List[str],
  41 + normalize_embeddings: bool = True,
  42 + priority: int = 0,
  43 + ) -> List[Any]:
39 """ 44 """
40 Call the embedding service API. 45 Call the embedding service API.
41 46
@@ -48,7 +53,10 @@ class TextEmbeddingEncoder: @@ -48,7 +53,10 @@ class TextEmbeddingEncoder:
48 try: 53 try:
49 response = requests.post( 54 response = requests.post(
50 self.endpoint, 55 self.endpoint,
51 - params={"normalize": "true" if normalize_embeddings else "false"}, 56 + params={
  57 + "normalize": "true" if normalize_embeddings else "false",
  58 + "priority": max(0, int(priority)),
  59 + },
52 json=request_data, 60 json=request_data,
53 timeout=60 61 timeout=60
54 ) 62 )
@@ -62,6 +70,7 @@ class TextEmbeddingEncoder: @@ -62,6 +70,7 @@ class TextEmbeddingEncoder:
62 self, 70 self,
63 sentences: Union[str, List[str]], 71 sentences: Union[str, List[str]],
64 normalize_embeddings: bool = True, 72 normalize_embeddings: bool = True,
  73 + priority: int = 0,
65 device: str = 'cpu', 74 device: str = 'cpu',
66 batch_size: int = 32 75 batch_size: int = 32
67 ) -> np.ndarray: 76 ) -> np.ndarray:
@@ -100,7 +109,11 @@ class TextEmbeddingEncoder: @@ -100,7 +109,11 @@ class TextEmbeddingEncoder:
100 109
101 # If there are uncached texts, call service 110 # If there are uncached texts, call service
102 if uncached_texts: 111 if uncached_texts:
103 - response_data = self._call_service(request_data, normalize_embeddings=normalize_embeddings) 112 + response_data = self._call_service(
  113 + request_data,
  114 + normalize_embeddings=normalize_embeddings,
  115 + priority=priority,
  116 + )
104 117
105 # Process response 118 # Process response
106 for i, text in enumerate(uncached_texts): 119 for i, text in enumerate(uncached_texts):
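The client-side contract added to `_call_service` is small: booleans are serialized as lowercase strings and `priority` is clamped to be non-negative before it is sent as a query parameter (the server additionally rejects `priority < 0` with a 400). A sketch of just the parameter construction, mirroring the diff:

```python
def build_embed_params(normalize_embeddings: bool, priority: int) -> dict:
    """Mirror the query params built in TextEmbeddingEncoder._call_service:
    lowercase string booleans, priority clamped to >= 0."""
    return {
        "normalize": "true" if normalize_embeddings else "false",
        "priority": max(0, int(priority)),
    }

# A negative priority is clamped client-side rather than forwarded.
params = build_embed_params(normalize_embeddings=True, priority=-2)
```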
frontend/static/css/style.css
@@ -371,9 +371,61 @@ body { @@ -371,9 +371,61 @@ body {
371 margin-bottom: 2px; 371 margin-bottom: 2px;
372 } 372 }
373 373
  374 +.product-debug-actions {
  375 + display: flex;
  376 + flex-wrap: wrap;
  377 + align-items: center;
  378 + gap: 10px 14px;
  379 + margin-top: 8px;
  380 +}
  381 +
  382 +.product-debug-inline-es-btn {
  383 + font-family: inherit;
  384 + font-size: 12px;
  385 + padding: 4px 10px;
  386 + border: 1px solid #ccc;
  387 + border-radius: 4px;
  388 + background: #fafafa;
  389 + color: #333;
  390 + cursor: pointer;
  391 +}
  392 +
  393 +.product-debug-inline-es-btn:hover {
  394 + background: #f0f0f0;
  395 + border-color: #bbb;
  396 +}
  397 +
  398 +.product-debug--es-expanded {
  399 + max-height: min(70vh, 720px);
  400 +}
  401 +
  402 +.product-es-doc-panel {
  403 + margin-top: 10px;
  404 + padding-top: 8px;
  405 + border-top: 1px dashed #e8e8e8;
  406 +}
  407 +
  408 +.product-es-doc-panel-status {
  409 + font-size: 12px;
  410 + color: #888;
  411 +}
  412 +
  413 +.product-es-doc-pre {
  414 + margin: 6px 0 0;
  415 + padding: 10px;
  416 + background: #f5f5f5;
  417 + border-radius: 4px;
  418 + overflow: auto;
  419 + max-height: 50vh;
  420 + font-size: 11px;
  421 + line-height: 1.35;
  422 + white-space: pre-wrap;
  423 + word-break: break-word;
  424 +}
  425 +
374 .product-debug-link { 426 .product-debug-link {
375 display: inline-block; 427 display: inline-block;
376 - margin-top: 6px; 428 + margin-top: 0;
377 font-size: 12px; 429 font-size: 12px;
378 color: #e67e22; 430 color: #e67e22;
379 text-decoration: none; 431 text-decoration: none;
frontend/static/js/app.js
@@ -68,12 +68,85 @@ function initializeApp() { @@ -68,12 +68,85 @@ function initializeApp() {
68 // Initialize the tenant dropdown and facet panel 68 // Initialize the tenant dropdown and facet panel
69 console.log('Initializing app...'); 69 console.log('Initializing app...');
70 initTenantSelect(); 70 initTenantSelect();
  71 + setupProductGridEsDocToggle();
71 const searchInput = document.getElementById('searchInput'); 72 const searchInput = document.getElementById('searchInput');
72 if (searchInput) { 73 if (searchInput) {
73 searchInput.focus(); 74 searchInput.focus();
74 } 75 }
75 } 76 }
76 77
  78 +/** Delegated handler: toggle inline ES raw response under each result card (survives innerHTML refresh on re-search). */
  79 +function setupProductGridEsDocToggle() {
  80 + const grid = document.getElementById('productGrid');
  81 + if (!grid || grid.dataset.esDocToggleBound === '1') {
  82 + return;
  83 + }
  84 + grid.dataset.esDocToggleBound = '1';
  85 + grid.addEventListener('click', onProductGridEsDocToggleClick);
  86 +}
  87 +
  88 +async function onProductGridEsDocToggleClick(event) {
  89 + const btn = event.target.closest('[data-action="toggle-es-inline-doc"]');
  90 + if (!btn) {
  91 + return;
  92 + }
  93 + event.preventDefault();
  94 + const debugRoot = btn.closest('.product-debug');
  95 + if (!debugRoot) {
  96 + return;
  97 + }
  98 + const panel = debugRoot.querySelector('.product-es-doc-panel');
  99 + const pre = debugRoot.querySelector('.product-es-doc-pre');
  100 + const statusEl = debugRoot.querySelector('.product-es-doc-panel-status');
  101 + if (!panel || !pre || !statusEl) {
  102 + return;
  103 + }
  104 +
  105 + const spuId = btn.getAttribute('data-spu-id') || '';
  106 + const tenantId = getTenantId();
  107 + const url = `${API_BASE_URL}/search/es-doc/${encodeURIComponent(spuId)}?tenant_id=${encodeURIComponent(tenantId)}`;
  108 +
  109 + if (debugRoot.dataset.esInlineOpen === '1') {
  110 + panel.setAttribute('hidden', '');
  111 + debugRoot.classList.remove('product-debug--es-expanded');
  112 + debugRoot.dataset.esInlineOpen = '0';
  113 + btn.textContent = '在结果中显示 ES 文档';
  114 + return;
  115 + }
  116 +
  117 + panel.removeAttribute('hidden');
  118 + debugRoot.classList.add('product-debug--es-expanded');
  119 + debugRoot.dataset.esInlineOpen = '1';
  120 + btn.textContent = '隐藏 ES 文档';
  121 +
  122 + if (pre.textContent.length > 0) {
  123 + panel.scrollIntoView({ behavior: 'smooth', block: 'nearest' });
  124 + return;
  125 + }
  126 +
  127 + statusEl.style.display = '';
  128 + statusEl.textContent = '加载中…';
  129 + pre.style.display = 'none';
  130 +
  131 + try {
  132 + const response = await fetch(url);
  133 + if (!response.ok) {
  134 + const errText = await response.text();
  135 + throw new Error(`HTTP ${response.status}: ${errText.slice(0, 200)}`);
  136 + }
  137 + const data = await response.json();
  138 + pre.textContent = customStringify(data);
  139 + statusEl.style.display = 'none';
  140 + pre.style.display = 'block';
  141 + } catch (err) {
  142 + console.error('ES doc fetch failed', err);
  143 + statusEl.textContent = `加载失败: ${err.message || err}`;
  144 + pre.style.display = 'none';
  145 + }
  146 +
  147 + panel.scrollIntoView({ behavior: 'smooth', block: 'nearest' });
  148 +}
  149 +
77 // Initialize once the DOM has finished loading 150 // Initialize once the DOM has finished loading
78 if (document.readyState === 'loading') { 151 if (document.readyState === 'loading') {
79 document.addEventListener('DOMContentLoaded', initializeApp); 152 document.addEventListener('DOMContentLoaded', initializeApp);
@@ -401,9 +474,20 @@ function displayResults(data) { @@ -401,9 +474,20 @@ function displayResults(data) {
401 <div class="product-debug-line">Rerank score: ${rerankScore}</div> 474 <div class="product-debug-line">Rerank score: ${rerankScore}</div>
402 <div class="product-debug-line">Fused score: ${fusedScore}</div> 475 <div class="product-debug-line">Fused score: ${fusedScore}</div>
403 ${titleLines} 476 ${titleLines}
404 - <a class="product-debug-link" href="${rawUrl}" target="_blank" rel="noopener noreferrer">  
405 - 查看 ES 原始文档  
406 - </a> 477 + <div class="product-debug-actions">
  478 + <button type="button" class="product-debug-inline-es-btn"
  479 + data-action="toggle-es-inline-doc"
  480 + data-spu-id="${escapeAttr(String(spuId || ''))}">
  481 + 在结果中显示 ES 文档
  482 + </button>
  483 + <a class="product-debug-link" href="${rawUrl}" target="_blank" rel="noopener noreferrer">
  484 + 查看 ES 原始文档
  485 + </a>
  486 + </div>
  487 + <div class="product-es-doc-panel" hidden>
  488 + <div class="product-es-doc-panel-status"></div>
  489 + <pre class="product-es-doc-pre"></pre>
  490 + </div>
407 </div> 491 </div>
408 `; 492 `;
409 } 493 }
perf_reports/README.md 0 → 100644
@@ -0,0 +1,34 @@ @@ -0,0 +1,34 @@
  1 +# Performance test report index
  2 +
  3 +This directory stores the raw JSON and notes for each load-test / matrix run. **Prefer reusing** the repository scripts instead of reinventing the wheel:
  4 +
  5 +| Script | Purpose |
  6 +|------|------|
  7 +| `scripts/perf_api_benchmark.py` | Load-tests the search backend, embedding, translation, rerank and other HTTP endpoints; supports `--embed-text-priority` / `--embed-image-priority` and `scripts/perf_cases.json.example` |
  8 +
  9 +Historical matrix example (concurrency sweep):
  10 +
  11 +- `2026-03-12/matrix_report/summary.md` — in the same directory as `summary.json`
  12 +
  13 +## 2026-03-20 — Embedding service `priority` parameter smoke test
  14 +
  15 +Environment: local `127.0.0.1:6005` (text) and `127.0.0.1:6008` (image); commands and results are in the JSON files in this directory:
  16 +
  17 +| Report file | Scenario | Notes |
  18 +|----------|------|------|
  19 +| `2026-03-20_embed_text_p0.json` | `embed_text` | `priority=0` (default), 8 s, concurrency 10 |
  20 +| `2026-03-20_embed_text_p1.json` | `embed_text` | `--embed-text-priority 1`, 8 s, concurrency 10 |
  21 +| `2026-03-20_embed_image_p0.json` | `embed_image` | `priority=0`, 8 s, concurrency 5 |
  22 +| `2026-03-20_embed_image_p1.json` | `embed_image` | `--embed-image-priority 1`, 8 s, concurrency 5 |
  23 +
  24 +To reproduce:
  25 +
  26 +```bash
  27 +source activate.sh
  28 +python scripts/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --timeout 30 --output perf_reports/2026-03-20_embed_text_p0.json
  29 +python scripts/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --embed-text-priority 1 --output perf_reports/2026-03-20_embed_text_p1.json
  30 +python scripts/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --timeout 60 --output perf_reports/2026-03-20_embed_image_p0.json
  31 +python scripts/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --embed-image-priority 1 --output perf_reports/2026-03-20_embed_image_p1.json
  32 +```
  33 +
  34 +Note: this is an **8-second smoke** run; its duration and concurrency are not directly comparable with the `2026-03-12` matrix. It only verifies that the service still returns 200 with the `priority` parameter and that payload validation passes.
query/query_parser.py
@@ -8,7 +8,7 @@ from typing import Dict, List, Optional, Any, Union @@ -8,7 +8,7 @@ from typing import Dict, List, Optional, Any, Union
8 import numpy as np 8 import numpy as np
9 import logging 9 import logging
10 import re 10 import re
11 -from concurrent.futures import ThreadPoolExecutor, as_completed, wait 11 +from concurrent.futures import ThreadPoolExecutor, wait
12 12
13 from embeddings.text_encoder import TextEmbeddingEncoder 13 from embeddings.text_encoder import TextEmbeddingEncoder
14 from config import SearchConfig 14 from config import SearchConfig
@@ -42,6 +42,8 @@ class ParsedQuery: @@ -42,6 +42,8 @@ class ParsedQuery:
42 search_langs: Optional[List[str]] = None, 42 search_langs: Optional[List[str]] = None,
43 index_languages: Optional[List[str]] = None, 43 index_languages: Optional[List[str]] = None,
44 source_in_index_languages: bool = True, 44 source_in_index_languages: bool = True,
  45 + contains_chinese: bool = False,
  46 + contains_english: bool = False,
45 ): 47 ):
46 self.original_query = original_query 48 self.original_query = original_query
47 self.query_normalized = query_normalized 49 self.query_normalized = query_normalized
@@ -58,6 +60,8 @@ class ParsedQuery: @@ -58,6 +60,8 @@ class ParsedQuery:
58 self.search_langs = search_langs or [] 60 self.search_langs = search_langs or []
59 self.index_languages = index_languages or [] 61 self.index_languages = index_languages or []
60 self.source_in_index_languages = bool(source_in_index_languages) 62 self.source_in_index_languages = bool(source_in_index_languages)
  63 + self.contains_chinese = bool(contains_chinese)
  64 + self.contains_english = bool(contains_english)
61 65
62 def to_dict(self) -> Dict[str, Any]: 66 def to_dict(self) -> Dict[str, Any]:
63 """Convert to dictionary representation.""" 67 """Convert to dictionary representation."""
@@ -73,6 +77,8 @@ class ParsedQuery: @@ -73,6 +77,8 @@ class ParsedQuery:
73 result["search_langs"] = self.search_langs 77 result["search_langs"] = self.search_langs
74 result["index_languages"] = self.index_languages 78 result["index_languages"] = self.index_languages
75 result["source_in_index_languages"] = self.source_in_index_languages 79 result["source_in_index_languages"] = self.source_in_index_languages
  80 + result["contains_chinese"] = self.contains_chinese
  81 + result["contains_english"] = self.contains_english
76 return result 82 return result
77 83
78 84
@@ -139,7 +145,6 @@ class QueryParser: @@ -139,7 +145,6 @@ class QueryParser:
139 cfg.get("default_model"), 145 cfg.get("default_model"),
140 ) 146 )
141 self._translator = create_translation_client() 147 self._translator = create_translation_client()
142 - self._translation_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="query-translation")  
143 148
144 @property 149 @property
145 def text_encoder(self) -> TextEmbeddingEncoder: 150 def text_encoder(self) -> TextEmbeddingEncoder:
@@ -218,6 +223,16 @@ class QueryParser: @@ -218,6 +223,16 @@ class QueryParser:
218 return bool(re.search(r"[\u4e00-\u9fff]", text or "")) 223 return bool(re.search(r"[\u4e00-\u9fff]", text or ""))
219 224
220 @staticmethod 225 @staticmethod
  226 + def _is_pure_english_word_token(token: str) -> bool:
  227 + """
  228 + A tokenizer token counts as English iff it is letters only (optional internal hyphens)
  229 + and length >= 3.
  230 + """
  231 + if not token or len(token) < 3:
  232 + return False
  233 + return bool(re.fullmatch(r"[A-Za-z]+(?:-[A-Za-z]+)*", token))
  234 +
  235 + @staticmethod
221 def _extract_latin_tokens(text: str) -> List[str]: 236 def _extract_latin_tokens(text: str) -> List[str]:
222 """Extract latin word tokens from query text.""" 237 """Extract latin word tokens from query text."""
223 return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text or "") 238 return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text or "")
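The English-token rule above is easiest to see on concrete inputs: a token counts only if it is letters with optional internal hyphens and at least 3 characters long. A self-contained restatement of the same rule:

```python
import re

def is_pure_english_word_token(token: str) -> bool:
    """Same rule as QueryParser._is_pure_english_word_token:
    letters only, optional internal hyphens, length >= 3."""
    if not token or len(token) < 3:
        return False
    return bool(re.fullmatch(r"[A-Za-z]+(?:-[A-Za-z]+)*", token))

cases = {
    "dress": True,     # plain English word
    "t-shirt": True,   # internal hyphen allowed
    "ab": False,       # too short
    "iphone15": False, # digits disqualify the token
    "连衣裙": False,    # CJK is not English
    "-abc": False,     # hyphen must be internal, not leading
}
```

The length floor keeps short Latin fragments (units, SKU codes, stray letters) from flipping `contains_english` on an otherwise non-English query.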
@@ -332,11 +347,14 @@ class QueryParser: @@ -332,11 +347,14 @@ class QueryParser:
332 if context: 347 if context:
333 context.store_intermediate_result('detected_language', detected_lang) 348 context.store_intermediate_result('detected_language', detected_lang)
334 349
335 - # Stage 4: Translation (with async support and conditional waiting)  
336 - translations = {}  
337 - translation_futures = {}  
338 - translation_executor = None 350 + # Stage 4: Translation — always submit to thread pool; results are collected together with
  351 + # embedding in one wait() that uses a configurable budget (short vs long by source-in-index).
  352 + translations: Dict[str, str] = {}
  353 + translation_futures: Dict[str, Any] = {}
  354 + translation_executor: Optional[ThreadPoolExecutor] = None
339 index_langs: List[str] = [] 355 index_langs: List[str] = []
  356 + detected_norm = str(detected_lang or "").strip().lower()
  357 +
340 try: 358 try:
341 # Decide translation target languages from the tenant's configured index_languages 359
342 from config.tenant_config_loader import get_tenant_config_loader 360 from config.tenant_config_loader import get_tenant_config_loader
@@ -352,59 +370,32 @@ class QueryParser: @@ -352,59 +370,32 @@ class QueryParser:
352 seen_langs.add(norm_lang) 370 seen_langs.add(norm_lang)
353 index_langs.append(norm_lang) 371 index_langs.append(norm_lang)
354 372
355 - target_langs_for_translation = [lang for lang in index_langs if lang != detected_lang] 373 + target_langs_for_translation = [lang for lang in index_langs if lang != detected_norm]
356 374
357 if target_langs_for_translation: 375 if target_langs_for_translation:
358 - target_langs = target_langs_for_translation  
359 -  
360 - if target_langs:  
361 - # Determine if we need to wait for translation results  
362 - # If detected_lang is not in index_languages, we must wait for translation  
363 - need_wait_translation = detected_lang not in index_langs  
364 -  
365 - if need_wait_translation:  
366 - translation_executor = ThreadPoolExecutor(  
367 - max_workers=max(1, min(len(target_langs), 4)),  
368 - thread_name_prefix="query-translation-wait",  
369 - )  
370 - for lang in target_langs:  
371 - model_name = self._pick_query_translation_model(detected_lang, lang, self.config)  
372 - log_debug(  
373 - f"Submitting query translation | source={detected_lang} target={lang} model={model_name}"  
374 - )  
375 - translation_futures[lang] = translation_executor.submit(  
376 - self.translator.translate,  
377 - query_text,  
378 - lang,  
379 - detected_lang,  
380 - "ecommerce_search_query",  
381 - model_name,  
382 - )  
383 - else:  
384 - for lang in target_langs:  
385 - model_name = self._pick_query_translation_model(detected_lang, lang, self.config)  
386 - log_debug(  
387 - f"Submitting query translation | source={detected_lang} target={lang} model={model_name}"  
388 - )  
389 - self._translation_executor.submit(  
390 - self.translator.translate,  
391 - query_text,  
392 - lang,  
393 - detected_lang,  
394 - "ecommerce_search_query",  
395 - model_name,  
396 - ) 376 + translation_executor = ThreadPoolExecutor(
  377 + max_workers=max(1, min(len(target_langs_for_translation), 4)),
  378 + thread_name_prefix="query-translation",
  379 + )
  380 + for lang in target_langs_for_translation:
  381 + model_name = self._pick_query_translation_model(detected_lang, lang, self.config)
  382 + log_debug(
  383 + f"Submitting query translation | source={detected_lang} target={lang} model={model_name}"
  384 + )
  385 + translation_futures[lang] = translation_executor.submit(
  386 + self.translator.translate,
  387 + query_text,
  388 + lang,
  389 + detected_lang,
  390 + "ecommerce_search_query",
  391 + model_name,
  392 + )
397 393
398 - if translations:  
399 - log_info(f"Translation completed (cache hit) | Query text: '{query_text}' | Results: {translations}")  
400 - if translation_futures:  
401 - log_debug(f"Translation in progress, waiting for results... | Query text: '{query_text}' | Languages: {list(translation_futures.keys())}")  
402 -  
403 - if context:  
404 - context.store_intermediate_result('translations', translations)  
405 - for lang, translation in translations.items():  
406 - if translation:  
407 - context.store_intermediate_result(f'translation_{lang}', translation) 394 + if context:
  395 + context.store_intermediate_result('translations', translations)
  396 + for lang, translation in translations.items():
  397 + if translation:
  398 + context.store_intermediate_result(f'translation_{lang}', translation)
408 399
409 except Exception as e: 400 except Exception as e:
410 error_msg = f"Translation failed | Error: {str(e)}" 401 error_msg = f"Translation failed | Error: {str(e)}"
@@ -416,13 +407,18 @@ class QueryParser: @@ -416,13 +407,18 @@ class QueryParser:
416 keywords = self._extract_keywords(query_text) 407 keywords = self._extract_keywords(query_text)
417 query_tokens = self._get_query_tokens(query_text) 408 query_tokens = self._get_query_tokens(query_text)
418 token_count = len(query_tokens) 409 token_count = len(query_tokens)
  410 + contains_chinese = self._contains_cjk(query_text)
  411 + contains_english = any(self._is_pure_english_word_token(t) for t in query_tokens)
419 412
420 log_debug(f"Query analysis | Keywords: {keywords} | Token count: {token_count} | " 413 log_debug(f"Query analysis | Keywords: {keywords} | Token count: {token_count} | "
421 - f"Query tokens: {query_tokens}") 414 + f"Query tokens: {query_tokens} | contains_chinese={contains_chinese} | "
  415 + f"contains_english={contains_english}")
422 if context: 416 if context:
423 context.store_intermediate_result('keywords', keywords) 417 context.store_intermediate_result('keywords', keywords)
424 context.store_intermediate_result('token_count', token_count) 418 context.store_intermediate_result('token_count', token_count)
425 context.store_intermediate_result('query_tokens', query_tokens) 419 context.store_intermediate_result('query_tokens', query_tokens)
  420 + context.store_intermediate_result('contains_chinese', contains_chinese)
  421 + context.store_intermediate_result('contains_english', contains_english)
426 422
427 # Stage 6: Text embedding (only for non-short queries) - async execution 423 # Stage 6: Text embedding (only for non-short queries) - async execution
428 query_vector = None 424 query_vector = None
@@ -442,7 +438,7 @@ class QueryParser: @@ -442,7 +438,7 @@ class QueryParser:
442 # Submit encoding task to thread pool for async execution 438 # Submit encoding task to thread pool for async execution
443 encoding_executor = ThreadPoolExecutor(max_workers=1) 439 encoding_executor = ThreadPoolExecutor(max_workers=1)
444 def _encode_query_vector() -> Optional[np.ndarray]: 440 def _encode_query_vector() -> Optional[np.ndarray]:
445 - arr = self.text_encoder.encode([query_text]) 441 + arr = self.text_encoder.encode([query_text], priority=1)
446 if arr is None or len(arr) == 0: 442 if arr is None or len(arr) == 0:
447 return None 443 return None
448 vec = arr[0] 444 vec = arr[0]
@@ -458,45 +454,66 @@ class QueryParser: @@ -458,45 +454,66 @@ class QueryParser:
458 encoding_executor = None 454 encoding_executor = None
459 embedding_future = None 455 embedding_future = None
460 456
461 - # Wait for all async tasks to complete (translation and embedding) 457 + # Wait for translation + embedding concurrently; shared budget (ms) depends on whether
  458 + # the detected language is in tenant index_languages.
  459 + qc = self.config.query_config
  460 + source_in_index_for_budget = detected_norm in index_langs
  461 + budget_ms = (
  462 + qc.translation_embedding_wait_budget_ms_source_in_index
  463 + if source_in_index_for_budget
  464 + else qc.translation_embedding_wait_budget_ms_source_not_in_index
  465 + )
  466 + budget_sec = max(0.0, float(budget_ms) / 1000.0)
  467 +
  468 + if translation_futures:
  469 + log_info(
  470 + f"Translation+embedding shared wait budget | budget_ms={budget_ms} | "
  471 + f"source_in_index_languages={source_in_index_for_budget} | "
  472 + f"translation_targets={list(translation_futures.keys())}"
  473 + )
  474 +
462 if translation_futures or embedding_future: 475 if translation_futures or embedding_future:
463 - log_debug("Waiting for async tasks to complete...")  
464 -  
465 - # Collect all futures with their identifiers  
466 - all_futures = []  
467 - future_to_lang = {} 476 + log_debug(
  477 + f"Waiting for async tasks (translation+embedding) | budget_ms={budget_ms} | "
  478 + f"source_in_index_languages={source_in_index_for_budget}"
  479 + )
  480 +
  481 + all_futures: List[Any] = []
  482 + future_to_lang: Dict[Any, tuple] = {}
468 for lang, future in translation_futures.items(): 483 for lang, future in translation_futures.items():
469 all_futures.append(future) 484 all_futures.append(future)
470 - future_to_lang[future] = ('translation', lang)  
471 - 485 + future_to_lang[future] = ("translation", lang)
  486 +
472 if embedding_future: 487 if embedding_future:
473 all_futures.append(embedding_future) 488 all_futures.append(embedding_future)
474 - future_to_lang[embedding_future] = ('embedding', None)  
475 -  
476 - # Enforce a hard timeout for translation-related work (300ms budget)  
477 - done, not_done = wait(all_futures, timeout=0.3) 489 + future_to_lang[embedding_future] = ("embedding", None)
  490 +
  491 + done, not_done = wait(all_futures, timeout=budget_sec)
478 for future in done: 492 for future in done:
479 task_type, lang = future_to_lang[future] 493 task_type, lang = future_to_lang[future]
480 try: 494 try:
481 result = future.result() 495 result = future.result()
482 - if task_type == 'translation': 496 + if task_type == "translation":
483 if result: 497 if result:
484 translations[lang] = result 498 translations[lang] = result
485 log_info( 499 log_info(
486 - f"Translation completed | Query text: '{query_text}' | Target language: {lang} | Translation result: '{result}'" 500 + f"Translation completed | Query text: '{query_text}' | "
  501 + f"Target language: {lang} | Translation result: '{result}'"
487 ) 502 )
488 if context: 503 if context:
489 - context.store_intermediate_result(f'translation_{lang}', result)  
490 - elif task_type == 'embedding': 504 + context.store_intermediate_result(f"translation_{lang}", result)
  505 + elif task_type == "embedding":
491 query_vector = result 506 query_vector = result
492 if query_vector is not None: 507 if query_vector is not None:
493 log_debug(f"Query vector generation completed | Shape: {query_vector.shape}") 508 log_debug(f"Query vector generation completed | Shape: {query_vector.shape}")
494 if context: 509 if context:
495 - context.store_intermediate_result('query_vector_shape', query_vector.shape) 510 + context.store_intermediate_result("query_vector_shape", query_vector.shape)
496 else: 511 else:
497 - log_info("Query vector generation completed but result is None, will process without vector") 512 + log_info(
  513 + "Query vector generation completed but result is None, will process without vector"
  514 + )
498 except Exception as e: 515 except Exception as e:
499 - if task_type == 'translation': 516 + if task_type == "translation":
500 error_msg = f"Translation failed | Language: {lang} | Error: {str(e)}" 517 error_msg = f"Translation failed | Language: {lang} | Error: {str(e)}"
501 else: 518 else:
502 error_msg = f"Query vector generation failed | Error: {str(e)}" 519 error_msg = f"Query vector generation failed | Error: {str(e)}"
@@ -504,30 +521,29 @@ class QueryParser: @@ -504,30 +521,29 @@ class QueryParser:
504 if context: 521 if context:
505 context.add_warning(error_msg) 522 context.add_warning(error_msg)
506 523
507 - # Log timeouts for any futures that did not finish within 300ms  
508 if not_done: 524 if not_done:
509 for future in not_done: 525 for future in not_done:
510 task_type, lang = future_to_lang[future] 526 task_type, lang = future_to_lang[future]
511 - if task_type == 'translation': 527 + if task_type == "translation":
512 timeout_msg = ( 528 timeout_msg = (
513 - f"Translation timeout (>300ms) | Language: {lang} | " 529 + f"Translation timeout (>{budget_ms}ms) | Language: {lang} | "
514 f"Query text: '{query_text}'" 530 f"Query text: '{query_text}'"
515 ) 531 )
516 else: 532 else:
517 - timeout_msg = "Query vector generation timeout (>300ms), proceeding without embedding result" 533 + timeout_msg = (
  534 + f"Query vector generation timeout (>{budget_ms}ms), proceeding without embedding result"
  535 + )
518 log_info(timeout_msg) 536 log_info(timeout_msg)
519 if context: 537 if context:
520 context.add_warning(timeout_msg) 538 context.add_warning(timeout_msg)
521 539
522 - # Clean up encoding executor  
523 if encoding_executor: 540 if encoding_executor:
524 encoding_executor.shutdown(wait=False) 541 encoding_executor.shutdown(wait=False)
525 if translation_executor: 542 if translation_executor:
526 translation_executor.shutdown(wait=False) 543 translation_executor.shutdown(wait=False)
527 -  
528 - # Update translations in context after all are complete 544 +
529 if translations and context: 545 if translations and context:
530 - context.store_intermediate_result('translations', translations) 546 + context.store_intermediate_result("translations", translations)
531 547
532 # Build language-scoped query plan: source language + available translations 548 # Build language-scoped query plan: source language + available translations
533 query_text_by_lang: Dict[str, str] = {} 549 query_text_by_lang: Dict[str, str] = {}
@@ -547,7 +563,7 @@ class QueryParser: @@ -547,7 +563,7 @@ class QueryParser:
547 # Use the original mixed-script query as a robust fallback probe for that language field set. 563 # Use the original mixed-script query as a robust fallback probe for that language field set.
548 query_text_by_lang[lang] = query_text 564 query_text_by_lang[lang] = query_text
549 565
550 - source_in_index_languages = detected_lang in index_langs 566 + source_in_index_languages = detected_norm in index_langs
551 ordered_search_langs: List[str] = [] 567 ordered_search_langs: List[str] = []
552 seen_order = set() 568 seen_order = set()
553 if detected_lang in query_text_by_lang: 569 if detected_lang in query_text_by_lang:
@@ -583,6 +599,8 @@ class QueryParser: @@ -583,6 +599,8 @@ class QueryParser:
583 search_langs=ordered_search_langs, 599 search_langs=ordered_search_langs,
584 index_languages=index_langs, 600 index_languages=index_langs,
585 source_in_index_languages=source_in_index_languages, 601 source_in_index_languages=source_in_index_languages,
  602 + contains_chinese=contains_chinese,
  603 + contains_english=contains_english,
586 ) 604 )
587 605
588 if context and hasattr(context, 'logger'): 606 if context and hasattr(context, 'logger'):
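Editor note: the `contains_chinese` / `contains_english` flags added to ParsedQuery above follow the rules described in the commit message (CJK presence via `_contains_cjk`; a "pure English, len>=3" token via fullmatch on letters with optional hyphens). A minimal standalone sketch of those rules — the regexes here are assumptions reconstructed from that description, not the actual implementation:

```python
import re

CJK_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")
# "pure English" token: ASCII letters with optional internal hyphens
EN_TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def contains_chinese(text: str) -> bool:
    """True when the raw query text carries any CJK ideograph."""
    return bool(CJK_RE.search(text))


def contains_english(tokens: list) -> bool:
    """True when any token is a pure-letter word of length >= 3."""
    return any(EN_TOKEN_RE.fullmatch(t) is not None and len(t) >= 3 for t in tokens)
```

So a query like `红色 dress` would set both flags, while `红色 of` sets only `contains_chinese` because `of` is shorter than three letters.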
scripts/perf_api_benchmark.py
@@ -15,6 +15,9 @@ Examples: @@ -15,6 +15,9 @@ Examples:
15 python scripts/perf_api_benchmark.py --scenario backend_suggest --duration 30 --concurrency 50 --tenant-id 162 15 python scripts/perf_api_benchmark.py --scenario backend_suggest --duration 30 --concurrency 50 --tenant-id 162
16 python scripts/perf_api_benchmark.py --scenario all --duration 60 --concurrency 80 --tenant-id 162 16 python scripts/perf_api_benchmark.py --scenario all --duration 60 --concurrency 80 --tenant-id 162
17 python scripts/perf_api_benchmark.py --scenario all --cases-file scripts/perf_cases.json.example --output perf_result.json 17 python scripts/perf_api_benchmark.py --scenario all --cases-file scripts/perf_cases.json.example --output perf_result.json
  18 + # Embedding admission / priority (query param `priority`; same semantics as embedding service):
  19 + python scripts/perf_api_benchmark.py --scenario embed_text --embed-text-priority 1 --duration 30 --concurrency 20
  20 + python scripts/perf_api_benchmark.py --scenario embed_image --embed-image-priority 1 --duration 30 --concurrency 10
18 """ 21 """
19 22
20 from __future__ import annotations 23 from __future__ import annotations
@@ -72,9 +75,9 @@ def validate_response_payload( @@ -72,9 +75,9 @@ def validate_response_payload(
72 ) -> Tuple[bool, str]: 75 ) -> Tuple[bool, str]:
73 """ 76 """
74 Lightweight payload validation for correctness-aware perf tests. 77 Lightweight payload validation for correctness-aware perf tests.
75 - Currently strict for embed_text to catch NaN/null vector regressions. 78 + Strict for embed_text / embed_image to catch NaN/null vector regressions.
76 """ 79 """
77 - if scenario_name != "embed_text": 80 + if scenario_name not in ("embed_text", "embed_image"):
78 return True, "" 81 return True, ""
79 82
80 expected_len = len(tpl.json_body) if isinstance(tpl.json_body, list) else None 83 expected_len = len(tpl.json_body) if isinstance(tpl.json_body, list) else None
@@ -219,6 +222,43 @@ def load_cases_from_file(path: Path, tenant_id: str) -> Dict[str, List[RequestTe @@ -219,6 +222,43 @@ def load_cases_from_file(path: Path, tenant_id: str) -> Dict[str, List[RequestTe
219 return out 222 return out
220 223
221 224
  225 +def apply_embed_priority_params(
  226 + scenarios: Dict[str, Scenario],
  227 + embed_text_priority: int,
  228 + embed_image_priority: int,
  229 +) -> None:
  230 + """
  231 + Merge default `priority` query param into embed templates when absent.
  232 + `scripts/perf_cases.json` may set per-request `params.priority` to override.
  233 + """
  234 + mapping = {
  235 + "embed_text": max(0, int(embed_text_priority)),
  236 + "embed_image": max(0, int(embed_image_priority)),
  237 + }
  238 + for name, pri in mapping.items():
  239 + if name not in scenarios:
  240 + continue
  241 + scen = scenarios[name]
  242 + new_templates: List[RequestTemplate] = []
  243 + for t in scen.templates:
  244 + params = dict(t.params or {})
  245 + params.setdefault("priority", str(pri))
  246 + new_templates.append(
  247 + RequestTemplate(
  248 + method=t.method,
  249 + path=t.path,
  250 + params=params,
  251 + json_body=t.json_body,
  252 + headers=t.headers,
  253 + )
  254 + )
  255 + scenarios[name] = Scenario(
  256 + name=scen.name,
  257 + templates=new_templates,
  258 + timeout_sec=scen.timeout_sec,
  259 + )
  260 +
  261 +
222 def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]: 262 def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]:
223 defaults = make_default_templates(args.tenant_id) 263 defaults = make_default_templates(args.tenant_id)
224 if args.cases_file: 264 if args.cases_file:
@@ -252,6 +292,11 @@ def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]: @@ -252,6 +292,11 @@ def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]:
252 ) 292 )
253 ) 293 )
254 scenarios[name] = Scenario(name=name, templates=rewritten, timeout_sec=args.timeout) 294 scenarios[name] = Scenario(name=name, templates=rewritten, timeout_sec=args.timeout)
  295 + apply_embed_priority_params(
  296 + scenarios,
  297 + embed_text_priority=args.embed_text_priority,
  298 + embed_image_priority=args.embed_image_priority,
  299 + )
255 return scenarios 300 return scenarios
256 301
257 302
@@ -483,6 +528,18 @@ def parse_args() -> argparse.Namespace: @@ -483,6 +528,18 @@ def parse_args() -> argparse.Namespace:
483 default=0, 528 default=0,
484 help="Optional top_n for rerank requests in dynamic docs mode (0 means omit top_n).", 529 help="Optional top_n for rerank requests in dynamic docs mode (0 means omit top_n).",
485 ) 530 )
  531 + parser.add_argument(
  532 + "--embed-text-priority",
  533 + type=int,
  534 + default=0,
  535 + help="Default query param priority= for embed_text (0=offline admission; >0 bypasses rejection). Merged into params unless set in --cases-file.",
  536 + )
  537 + parser.add_argument(
  538 + "--embed-image-priority",
  539 + type=int,
  540 + default=0,
  541 + help="Default query param priority= for embed_image (same semantics as embed-text-priority).",
  542 + )
486 return parser.parse_args() 543 return parser.parse_args()
487 544
488 545
@@ -609,6 +666,8 @@ async def main_async() -> int: @@ -609,6 +666,8 @@ async def main_async() -> int:
609 print(f" embedding_image_base={args.embedding_image_base}") 666 print(f" embedding_image_base={args.embedding_image_base}")
610 print(f" translator_base={args.translator_base}") 667 print(f" translator_base={args.translator_base}")
611 print(f" reranker_base={args.reranker_base}") 668 print(f" reranker_base={args.reranker_base}")
  669 + print(f" embed_text_priority={args.embed_text_priority}")
  670 + print(f" embed_image_priority={args.embed_image_priority}")
612 if args.rerank_dynamic_docs: 671 if args.rerank_dynamic_docs:
613 print(" rerank_dynamic_docs=True") 672 print(" rerank_dynamic_docs=True")
614 print(f" rerank_doc_count={args.rerank_doc_count}") 673 print(f" rerank_doc_count={args.rerank_doc_count}")
@@ -667,6 +726,8 @@ async def main_async() -> int: @@ -667,6 +726,8 @@ async def main_async() -> int:
667 "rerank_query": args.rerank_query, 726 "rerank_query": args.rerank_query,
668 "rerank_seed": args.rerank_seed, 727 "rerank_seed": args.rerank_seed,
669 "rerank_top_n": args.rerank_top_n, 728 "rerank_top_n": args.rerank_top_n,
  729 + "embed_text_priority": args.embed_text_priority,
  730 + "embed_image_priority": args.embed_image_priority,
670 }, 731 },
671 "results": results, 732 "results": results,
672 "overall": aggregate_results(results), 733 "overall": aggregate_results(results),
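Editor note: the merge semantics in `apply_embed_priority_params` are "CLI flag supplies a default; a `params.priority` set in the cases file wins". A small sketch of just that rule (`merge_priority` is an illustrative helper, not a function in the script):

```python
def merge_priority(params, default_priority: int) -> dict:
    """Mirror the setdefault merge: an explicit cases-file value overrides the CLI default."""
    merged = dict(params or {})
    # Negative CLI values are clamped to 0, matching max(0, int(...)) above.
    merged.setdefault("priority", str(max(0, int(default_priority))))
    return merged


print(merge_priority({}, 1))                 # {'priority': '1'}
print(merge_priority({"priority": "0"}, 1))  # {'priority': '0'}
```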
scripts/perf_cases.json.example
@@ -32,9 +32,18 @@ @@ -32,9 +32,18 @@
32 { 32 {
33 "method": "POST", 33 "method": "POST",
34 "path": "/embed/text", 34 "path": "/embed/text",
  35 + "params": {"priority": "0"},
35 "json": ["wireless mouse", "gaming keyboard", "USB-C cable", "barbie doll"] 36 "json": ["wireless mouse", "gaming keyboard", "USB-C cable", "barbie doll"]
36 } 37 }
37 ], 38 ],
  39 + "embed_image": [
  40 + {
  41 + "method": "POST",
  42 + "path": "/embed/image",
  43 + "params": {"normalize": "true", "priority": "0"},
  44 + "json": ["/data/saas-search/docs/image-dress1.png"]
  45 + }
  46 + ],
38 "translate": [ 47 "translate": [
39 { 48 {
40 "method": "POST", 49 "method": "POST",
search/es_query_builder.py
@@ -9,9 +9,13 @@ Simplified architecture: @@ -9,9 +9,13 @@ Simplified architecture:
9 """ 9 """
10 10
11 from typing import Dict, Any, List, Optional, Union, Tuple 11 from typing import Dict, Any, List, Optional, Union, Tuple
  12 +
12 import numpy as np 13 import numpy as np
13 from config import FunctionScoreConfig 14 from config import FunctionScoreConfig
14 15
  16 +# (Elasticsearch field path, boost before formatting as "path^boost")
  17 +MatchFieldSpec = Tuple[str, float]
  18 +
15 19
16 class ESQueryBuilder: 20 class ESQueryBuilder:
17 """Builds Elasticsearch DSL queries.""" 21 """Builds Elasticsearch DSL queries."""
@@ -36,6 +40,7 @@ class ESQueryBuilder: @@ -36,6 +40,7 @@ class ESQueryBuilder:
36 source_boost_when_missing: float = 0.6, 40 source_boost_when_missing: float = 0.6,
37 original_query_fallback_boost_when_translation_missing: float = 0.2, 41 original_query_fallback_boost_when_translation_missing: float = 0.2,
38 tie_breaker_base_query: float = 0.9, 42 tie_breaker_base_query: float = 0.9,
  43 + mixed_script_merged_field_boost_scale: float = 0.6,
39 ): 44 ):
40 """ 45 """
41 Initialize query builder. 46 Initialize query builder.
@@ -51,6 +56,7 @@ class ESQueryBuilder: @@ -51,6 +56,7 @@ class ESQueryBuilder:
51 function_score_config: Function score configuration 56 function_score_config: Function score configuration
52 default_language: Default language to use when detection fails or returns "unknown" 57 default_language: Default language to use when detection fails or returns "unknown"
53 knn_boost: Boost value for KNN (embedding recall) 58 knn_boost: Boost value for KNN (embedding recall)
  59 + mixed_script_merged_field_boost_scale: Multiply per-field ^boost for cross-script merged fields
54 """ 60 """
55 self.match_fields = match_fields 61 self.match_fields = match_fields
56 self.field_boosts = field_boosts or {} 62 self.field_boosts = field_boosts or {}
@@ -74,6 +80,7 @@ class ESQueryBuilder: @@ -74,6 +80,7 @@ class ESQueryBuilder:
74 original_query_fallback_boost_when_translation_missing 80 original_query_fallback_boost_when_translation_missing
75 ) 81 )
76 self.tie_breaker_base_query = float(tie_breaker_base_query) 82 self.tie_breaker_base_query = float(tie_breaker_base_query)
  83 + self.mixed_script_merged_field_boost_scale = float(mixed_script_merged_field_boost_scale)
77 84
78 def _apply_source_filter(self, es_query: Dict[str, Any]) -> None: 85 def _apply_source_filter(self, es_query: Dict[str, Any]) -> None:
79 """ 86 """
@@ -414,7 +421,7 @@ class ESQueryBuilder: @@ -414,7 +421,7 @@ class ESQueryBuilder:
414 def _format_field_with_boost(self, field_name: str, boost: float) -> str: 421 def _format_field_with_boost(self, field_name: str, boost: float) -> str:
415 if abs(float(boost) - 1.0) < 1e-9: 422 if abs(float(boost) - 1.0) < 1e-9:
416 return field_name 423 return field_name
417 - return f"{field_name}^{boost}" 424 + return f"{field_name}^{round(boost, 2)}"
418 425
419 def _get_field_boost(self, base_field: str, language: Optional[str] = None) -> float: 426 def _get_field_boost(self, base_field: str, language: Optional[str] = None) -> float:
420 # Language-specific override first (e.g. title.de), then base field (e.g. title) 427 # Language-specific override first (e.g. title.de), then base field (e.g. title)
@@ -426,36 +433,74 @@ class ESQueryBuilder: @@ -426,36 +433,74 @@ class ESQueryBuilder:
426 return float(self.field_boosts[base_field]) 433 return float(self.field_boosts[base_field])
427 return 1.0 434 return 1.0
428 435
429 - def _get_match_fields(self, language: str) -> Tuple[List[str], List[str]]: 436 + def _build_match_field_specs(self, language: str) -> Tuple[List[MatchFieldSpec], List[MatchFieldSpec]]:
430 """ 437 """
431 - Build dynamic match fields for one language.  
432 -  
433 - Args:  
434 - language: Language code (e.g. zh/en/de/fr/...)  
435 -  
436 - Returns:  
437 - (all_fields, core_fields) - core_fields are for phrase/keyword queries 438 + Per-language match targets as (field_path, boost). Single source of truth before string formatting.
  439 + Returns (all_fields, core_fields); core_fields are for phrase/keyword strategies elsewhere.
438 """ 440 """
439 lang = (language or "").strip().lower() 441 lang = (language or "").strip().lower()
440 - all_fields: List[str] = []  
441 - core_fields: List[str] = [] 442 + all_specs: List[MatchFieldSpec] = []
  443 + core_specs: List[MatchFieldSpec] = []
442 444
443 for base in self.multilingual_fields: 445 for base in self.multilingual_fields:
444 field = f"{base}.{lang}" 446 field = f"{base}.{lang}"
445 - boost = self._get_field_boost(base, lang)  
446 - all_fields.append(self._format_field_with_boost(field, boost)) 447 + all_specs.append((field, self._get_field_boost(base, lang)))
447 448
448 for shared in self.shared_fields: 449 for shared in self.shared_fields:
449 - boost = self._get_field_boost(shared, None)  
450 - all_fields.append(self._format_field_with_boost(shared, boost)) 450 + all_specs.append((shared, self._get_field_boost(shared, None)))
451 451
452 for base in self.core_multilingual_fields: 452 for base in self.core_multilingual_fields:
453 field = f"{base}.{lang}" 453 field = f"{base}.{lang}"
454 - boost = self._get_field_boost(base, lang)  
455 - core_fields.append(self._format_field_with_boost(field, boost)) 454 + core_specs.append((field, self._get_field_boost(base, lang)))
  455 +
  456 + return all_specs, core_specs
  457 +
  458 + def _format_match_field_specs(self, specs: List[MatchFieldSpec]) -> List[str]:
  459 + """Format (field_path, boost) pairs for Elasticsearch multi_match ``fields``."""
  460 + return [self._format_field_with_boost(path, boost) for path, boost in specs]
  461 +
  462 + def _merge_supplemental_lang_field_specs(
  463 + self,
  464 + specs: List[MatchFieldSpec],
  465 + supplemental_lang: str,
  466 + ) -> List[MatchFieldSpec]:
  467 + """Append supplemental-language columns; boosts multiplied by mixed_script scale."""
  468 + scale = float(self.mixed_script_merged_field_boost_scale)
  469 + extra_all, _ = self._build_match_field_specs(supplemental_lang)
  470 + seen = {path for path, _ in specs}
  471 + out = list(specs)
  472 + for path, boost in extra_all:
  473 + if path not in seen:
  474 + out.append((path, boost * scale))
  475 + seen.add(path)
  476 + return out
  477 +
  478 + def _expand_match_field_specs_for_mixed_script(
  479 + self,
  480 + lang: str,
  481 + specs: List[MatchFieldSpec],
  482 + contains_chinese: bool,
  483 + contains_english: bool,
  484 + index_languages: List[str],
  485 + ) -> List[MatchFieldSpec]:
  486 + """
  487 + When the query mixes scripts, widen each clause to indexed fields for the other script
  488 + (e.g. zh clause also searches title.en when the query contains an English word token).
  489 + """
  490 + norm = {str(x or "").strip().lower() for x in (index_languages or []) if str(x or "").strip()}
  491 + allow = norm or {"zh", "en"}
  492 +
  493 + def can_use(lcode: str) -> bool:
  494 + return lcode in allow if norm else True
  495 +
  496 + out = list(specs)
  497 + lnorm = (lang or "").strip().lower()
  498 + if contains_english and lnorm != "en" and can_use("en"):
  499 + out = self._merge_supplemental_lang_field_specs(out, "en")
  500 + if contains_chinese and lnorm != "zh" and can_use("zh"):
  501 + out = self._merge_supplemental_lang_field_specs(out, "zh")
  502 + return out
456 503
457 - return all_fields, core_fields  
458 -  
459 def _get_embedding_field(self, language: str) -> str: 504 def _get_embedding_field(self, language: str) -> str:
460 """Get embedding field name for a language.""" 505 """Get embedding field name for a language."""
461 # Currently using unified embedding field 506 # Currently using unified embedding field
@@ -486,6 +531,8 @@ class ESQueryBuilder: @@ -486,6 +531,8 @@ class ESQueryBuilder:
486 source_in_index_languages = True 531 source_in_index_languages = True
487 index_languages: List[str] = [] 532 index_languages: List[str] = []
488 533
  534 + contains_chinese = False
  535 + contains_english = False
489 if parsed_query: 536 if parsed_query:
490 query_text_by_lang = getattr(parsed_query, "query_text_by_lang", None) or {} 537 query_text_by_lang = getattr(parsed_query, "query_text_by_lang", None) or {}
491 search_langs = getattr(parsed_query, "search_langs", None) or [] 538 search_langs = getattr(parsed_query, "search_langs", None) or []
@@ -495,6 +542,8 @@ class ESQueryBuilder: @@ -495,6 +542,8 @@ class ESQueryBuilder:
495 getattr(parsed_query, "source_in_index_languages", True) 542 getattr(parsed_query, "source_in_index_languages", True)
496 ) 543 )
497 index_languages = getattr(parsed_query, "index_languages", None) or [] 544 index_languages = getattr(parsed_query, "index_languages", None) or []
  545 + contains_chinese = bool(getattr(parsed_query, "contains_chinese", False))
  546 + contains_english = bool(getattr(parsed_query, "contains_english", False))
498 547
499 if not query_text_by_lang: 548 if not query_text_by_lang:
500 query_text_by_lang = {source_lang: query_text} 549 query_text_by_lang = {source_lang: query_text}
@@ -508,7 +557,15 @@ class ESQueryBuilder: @@ -508,7 +557,15 @@ class ESQueryBuilder:
508 lang_query = query_text_by_lang.get(lang) 557 lang_query = query_text_by_lang.get(lang)
509 if not lang_query: 558 if not lang_query:
510 continue 559 continue
511 - match_fields, _ = self._get_match_fields(lang) 560 + all_specs, _ = self._build_match_field_specs(lang)
  561 + expanded_specs = self._expand_match_field_specs_for_mixed_script(
  562 + lang,
  563 + all_specs,
  564 + contains_chinese,
  565 + contains_english,
  566 + index_languages,
  567 + )
  568 + match_fields = self._format_match_field_specs(expanded_specs)
512 if not match_fields: 569 if not match_fields:
513 continue 570 continue
514 571
@@ -559,7 +616,15 @@ class ESQueryBuilder: @@ -559,7 +616,15 @@ class ESQueryBuilder:
559 continue 616 continue
560 if lang in query_text_by_lang: 617 if lang in query_text_by_lang:
561 continue 618 continue
562 - match_fields, _ = self._get_match_fields(lang) 619 + fb_specs, _ = self._build_match_field_specs(lang)
  620 + expanded_fb = self._expand_match_field_specs_for_mixed_script(
  621 + lang,
  622 + fb_specs,
  623 + contains_chinese,
  624 + contains_english,
  625 + index_languages,
  626 + )
  627 + match_fields = self._format_match_field_specs(expanded_fb)
563 if not match_fields: 628 if not match_fields:
564 continue 629 continue
565 should_clauses.append({ 630 should_clauses.append({
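Editor note: the `MatchFieldSpec` refactor above keeps boosts numeric until the final `path^boost` formatting, so the cross-script merge can scale them without string parsing. A self-contained sketch of the merge-and-scale step (field names and boost values are illustrative; 0.6 is the default `mixed_script_merged_field_boost_scale` from this diff):

```python
from typing import List, Tuple

MatchFieldSpec = Tuple[str, float]  # (field_path, boost)


def merge_supplemental(specs: List[MatchFieldSpec],
                       extra: List[MatchFieldSpec],
                       scale: float = 0.6) -> List[MatchFieldSpec]:
    """Append supplemental-language fields at a scaled boost, skipping duplicates."""
    seen = {path for path, _ in specs}
    out = list(specs)
    for path, boost in extra:
        if path not in seen:
            out.append((path, round(boost * scale, 2)))
            seen.add(path)
    return out


zh_specs = [("title.zh", 2.0), ("brand", 1.5)]
en_specs = [("title.en", 2.0), ("brand", 1.5)]
print(merge_supplemental(zh_specs, en_specs))
# [('title.zh', 2.0), ('brand', 1.5), ('title.en', 1.2)]
```

Note how the shared `brand` field is not duplicated, and only the supplemental `title.en` column is down-weighted.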
search/searcher.py
@@ -4,12 +4,13 @@ Main Searcher module - executes search queries against Elasticsearch. @@ -4,12 +4,13 @@ Main Searcher module - executes search queries against Elasticsearch.
4 Handles query parsing, ranking, and result formatting. 4 Handles query parsing, ranking, and result formatting.
5 """ 5 """
6 6
7 -from typing import Dict, Any, List, Optional, Union 7 +from typing import Dict, Any, List, Optional, Union, Tuple
8 import os 8 import os
9 import time, json 9 import time, json
10 import logging 10 import logging
11 import hashlib 11 import hashlib
12 from string import Formatter 12 from string import Formatter
  13 +import numpy as np
13 14
14 from utils.es_client import ESClient 15 from utils.es_client import ESClient
15 from query import QueryParser, ParsedQuery 16 from query import QueryParser, ParsedQuery
@@ -224,6 +225,265 @@ class Searcher: @@ -224,6 +225,265 @@ class Searcher:
224 hits_by_id[str(hid)] = hit 225 hits_by_id[str(hid)] = hit
225 return hits_by_id, int(resp.get("took", 0) or 0) 226 return hits_by_id, int(resp.get("took", 0) or 0)
226 227
  228 + @staticmethod
  229 + def _normalize_sku_match_text(value: Optional[str]) -> str:
  230 + """Normalize free text for lightweight SKU option matching."""
  231 + if value is None:
  232 + return ""
  233 + return " ".join(str(value).strip().casefold().split())
  234 +
  235 + @staticmethod
  236 + def _sku_option1_embedding_key(
  237 + sku: Dict[str, Any],
  238 + spu_option1_name: Optional[Any] = None,
  239 + ) -> Optional[str]:
  240 + """
  241 + Text sent to the embedding service for option1 must be "name:value"
  242 + (option name from SKU row or SPU-level option1_name).
  243 + """
  244 + value_raw = sku.get("option1_value")
  245 + if value_raw is None:
  246 + return None
  247 + value = str(value_raw).strip()
  248 + if not value:
  249 + return None
  250 + name = sku.get("option1_name")
  251 + if name is None or not str(name).strip():
  252 + name = spu_option1_name
  253 + name_str = str(name).strip() if name is not None and str(name).strip() else ""
  254 + if name_str:
  255 + value = f"{name_str}:{value}"
  256 + return value.casefold()
  257 +
  258 + def _build_sku_query_texts(self, parsed_query: ParsedQuery) -> List[str]:
  259 + """Collect original and translated query texts for SKU option matching."""
  260 + candidates: List[str] = []
  261 + for text in (
  262 + getattr(parsed_query, "original_query", None),
  263 + getattr(parsed_query, "query_normalized", None),
  264 + getattr(parsed_query, "rewritten_query", None),
  265 + ):
  266 + normalized = self._normalize_sku_match_text(text)
  267 + if normalized:
  268 + candidates.append(normalized)
  269 +
  270 + query_text_by_lang = getattr(parsed_query, "query_text_by_lang", {}) or {}
  271 + if isinstance(query_text_by_lang, dict):
  272 + for text in query_text_by_lang.values():
  273 + normalized = self._normalize_sku_match_text(text)
  274 + if normalized:
  275 + candidates.append(normalized)
  276 +
  277 + translations = getattr(parsed_query, "translations", {}) or {}
  278 + if isinstance(translations, dict):
  279 + for text in translations.values():
  280 + normalized = self._normalize_sku_match_text(text)
  281 + if normalized:
  282 + candidates.append(normalized)
  283 +
  284 + deduped: List[str] = []
  285 + seen = set()
  286 + for text in candidates:
  287 + if text in seen:
  288 + continue
  289 + seen.add(text)
  290 + deduped.append(text)
  291 + return deduped
  292 +
  293 + def _find_query_matching_sku_index(
  294 + self,
  295 + skus: List[Dict[str, Any]],
  296 + query_texts: List[str],
  297 + spu_option1_name: Optional[Any] = None,
  298 + ) -> Optional[int]:
  299 + """Return the first SKU whose option1_value (or name:value) appears in query texts."""
  300 + if not skus or not query_texts:
  301 + return None
  302 +
  303 + for index, sku in enumerate(skus):
  304 + option1_value = self._normalize_sku_match_text(sku.get("option1_value"))
  305 + if not option1_value:
  306 + continue
  307 + if any(option1_value in query_text for query_text in query_texts):
  308 + return index
  309 + embed_key = self._sku_option1_embedding_key(sku, spu_option1_name)
  310 + if embed_key and embed_key != option1_value:
  311 + composite_norm = self._normalize_sku_match_text(embed_key.replace(":", " "))
  312 + if any(composite_norm in query_text for query_text in query_texts):
  313 + return index
  314 + if any(embed_key.casefold() in query_text for query_text in query_texts):
  315 + return index
  316 + return None
  317 +
  318 + def _encode_query_vector_for_sku_matching(
  319 + self,
  320 + parsed_query: ParsedQuery,
  321 + context: Optional[RequestContext] = None,
  322 + ) -> Optional[np.ndarray]:
  323 + """Best-effort fallback query embedding for final-page SKU matching."""
  324 + query_text = (
  325 + getattr(parsed_query, "rewritten_query", None)
  326 + or getattr(parsed_query, "query_normalized", None)
  327 + or getattr(parsed_query, "original_query", None)
  328 + )
  329 + if not query_text:
  330 + return None
  331 +
  332 + text_encoder = getattr(self.query_parser, "text_encoder", None)
  333 + if text_encoder is None:
  334 + return None
  335 +
  336 + try:
  337 + vectors = text_encoder.encode([query_text], priority=1)
  338 + except Exception as exc:
  339 + logger.warning("Failed to encode query vector for SKU matching: %s", exc, exc_info=True)
  340 + if context is not None:
  341 + context.add_warning(f"SKU query embedding failed: {exc}")
  342 + return None
  343 +
  344 + if vectors is None or len(vectors) == 0:
  345 + return None
  346 +
  347 + vector = vectors[0]
  348 + if vector is None:
  349 + return None
  350 + return np.asarray(vector, dtype=np.float32)
  351 +
  352 + def _select_sku_by_embedding(
  353 + self,
  354 + skus: List[Dict[str, Any]],
  355 + option1_vectors: Dict[str, np.ndarray],
  356 + query_vector: np.ndarray,
  357 + spu_option1_name: Optional[Any] = None,
  358 + ) -> Tuple[Optional[int], Optional[float]]:
  359 + """Select the SKU whose option1 embedding key (name:value) is most similar to the query."""
  360 + best_index: Optional[int] = None
  361 + best_score: Optional[float] = None
  362 +
  363 + for index, sku in enumerate(skus):
  364 + embed_key = self._sku_option1_embedding_key(sku, spu_option1_name)
  365 + if not embed_key:
  366 + continue
  367 + option_vector = option1_vectors.get(embed_key)
  368 + if option_vector is None:
  369 + continue
  370 + score = float(np.inner(query_vector, option_vector))
  371 + if best_score is None or score > best_score:
  372 + best_index = index
  373 + best_score = score
  374 +
  375 + return best_index, best_score
  376 +
  377 + @staticmethod
  378 + def _promote_matching_sku(source: Dict[str, Any], match_index: int) -> Optional[Dict[str, Any]]:
  379 + """Move the matched SKU to the front and swap the SPU image."""
  380 + skus = source.get("skus")
  381 + if not isinstance(skus, list) or match_index < 0 or match_index >= len(skus):
  382 + return None
  383 +
  384 + matched_sku = skus.pop(match_index)
  385 + skus.insert(0, matched_sku)
  386 +
  387 + image_src = matched_sku.get("image_src") or matched_sku.get("imageSrc")
  388 + if image_src:
  389 + source["image_url"] = image_src
  390 + return matched_sku
  391 +
  392 + def _apply_sku_sorting_for_page_hits(
  393 + self,
  394 + es_hits: List[Dict[str, Any]],
  395 + parsed_query: ParsedQuery,
  396 + context: Optional[RequestContext] = None,
  397 + ) -> None:
  398 + """Sort each page hit's SKUs so the best-matching SKU is first."""
  399 + if not es_hits:
  400 + return
  401 +
  402 + query_texts = self._build_sku_query_texts(parsed_query)
  403 + unmatched_hits: List[Dict[str, Any]] = []
  404 + option1_values_to_encode: List[str] = []
  405 + seen_option1_values = set()
  406 + text_matched = 0
  407 + embedding_matched = 0
  408 +
  409 + for hit in es_hits:
  410 + source = hit.get("_source")
  411 + if not isinstance(source, dict):
  412 + continue
  413 + skus = source.get("skus")
  414 + if not isinstance(skus, list) or not skus:
  415 + continue
  416 +
  417 + spu_option1_name = source.get("option1_name")
  418 + match_index = self._find_query_matching_sku_index(
  419 + skus, query_texts, spu_option1_name=spu_option1_name
  420 + )
  421 + if match_index is not None:
  422 + self._promote_matching_sku(source, match_index)
  423 + text_matched += 1
  424 + continue
  425 +
  426 + unmatched_hits.append(hit)
  427 + for sku in skus:
  428 + embed_key = self._sku_option1_embedding_key(sku, spu_option1_name)
  429 + if not embed_key or embed_key in seen_option1_values:
  430 + continue
  431 + seen_option1_values.add(embed_key)
  432 + option1_values_to_encode.append(embed_key)
  433 +
  434 + if not unmatched_hits or not option1_values_to_encode:
  435 + return
  436 +
  437 + query_vector = getattr(parsed_query, "query_vector", None)
  438 + if query_vector is None:
  439 + query_vector = self._encode_query_vector_for_sku_matching(parsed_query, context=context)
  440 + if query_vector is None:
  441 + return
  442 +
  443 + text_encoder = getattr(self.query_parser, "text_encoder", None)
  444 + if text_encoder is None:
  445 + return
  446 +
  447 + try:
  448 + encoded_option_vectors = text_encoder.encode(option1_values_to_encode, priority=1)
  449 + except Exception as exc:
  450 + logger.warning("Failed to encode SKU option1 values for final-page sorting: %s", exc, exc_info=True)
  451 + if context is not None:
  452 + context.add_warning(f"SKU option embedding failed: {exc}")
  453 + return
  454 +
  455 + option1_vectors: Dict[str, np.ndarray] = {}
  456 + for option1_value, vector in zip(option1_values_to_encode, encoded_option_vectors):
  457 + if vector is None:
  458 + continue
  459 + option1_vectors[option1_value] = np.asarray(vector, dtype=np.float32)
  460 +
  461 + query_vector_array = np.asarray(query_vector, dtype=np.float32)
  462 + for hit in unmatched_hits:
  463 + source = hit.get("_source")
  464 + if not isinstance(source, dict):
  465 + continue
  466 + skus = source.get("skus")
  467 + if not isinstance(skus, list) or not skus:
  468 + continue
  469 + match_index, _ = self._select_sku_by_embedding(
  470 + skus,
  471 + option1_vectors,
  472 + query_vector_array,
  473 + spu_option1_name=source.get("option1_name"),
  474 + )
  475 + if match_index is None:
  476 + continue
  477 + self._promote_matching_sku(source, match_index)
  478 + embedding_matched += 1
  479 +
  480 + if text_matched or embedding_matched:
  481 + logger.info(
  482 + "Final-page SKU sorting completed | text_matched=%s | embedding_matched=%s",
  483 + text_matched,
  484 + embedding_matched,
  485 + )
  486 +
227 def search(
228 self,
229 query: str,
@@ -622,6 +882,8 @@ class Searcher:
622 continue
623 rerank_debug_by_doc[str(doc_id)] = item
624
885 + self._apply_sku_sorting_for_page_hits(es_hits, parsed_query, context=context)
886 +
625 # Format results using ResultFormatter
626 formatted_results = ResultFormatter.format_search_results(
627 es_hits,
@@ -791,7 +1053,7 @@ class Searcher:
791 # Generate image embedding
792 if self.image_encoder is None:
793 raise RuntimeError("Image encoder is not initialized at startup")
794 - image_vector = self.image_encoder.encode_image_from_url(image_url)
1056 + image_vector = self.image_encoder.encode_image_from_url(image_url, priority=1)
795
796 if image_vector is None:
797 raise ValueError(f"Failed to encode image: {image_url}")
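The embedding fallback driven by `_apply_sku_sorting_for_page_hits` / `_select_sku_by_embedding` above can be sketched roughly as follows. This is a minimal sketch under assumed semantics: the real method also folds in `spu_option1_name` when building the option text and skips SKUs whose option values failed to encode.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


def select_sku_by_embedding(
    skus: List[dict],
    option1_vectors: Dict[str, np.ndarray],
    query_vector: np.ndarray,
) -> Tuple[Optional[int], float]:
    """Pick the SKU whose option1_value embedding has the highest
    inner product with the query embedding (vectors assumed normalized)."""
    best_index: Optional[int] = None
    best_score = float("-inf")
    for index, sku in enumerate(skus):
        vector = option1_vectors.get(sku.get("option1_value"))
        if vector is None:
            # Option value was never encoded; leave this SKU out of the race.
            continue
        score = float(np.dot(vector, query_vector))
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

The winning index is then handed to `_promote_matching_sku`, which moves that SKU to the front of `skus` and swaps the SPU-level `image_url` for the SKU's `image_src`.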
tests/ci/test_service_api_contracts.py
@@ -540,7 +540,15 @@ def test_indexer_index_validation_max_delete_spu_ids(indexer_client: TestClient)
540
541
542 class _FakeTextModel:
543 - def encode_batch(self, texts, batch_size=32, device="cpu", normalize_embeddings=True):
543 + """Matches TEI / server path: `_text_model.encode(...)` (not encode_batch)."""
  544 +
  545 + def encode(
  546 + self,
  547 + texts,
  548 + batch_size=32,
  549 + device="cpu",
  550 + normalize_embeddings=True,
  551 + ):
544 return [np.array([0.1, 0.2, 0.3], dtype=np.float32) for _ in texts]
545
546
@@ -549,6 +557,18 @@ class _FakeImageModel:
549 return [np.array([0.3, 0.2, 0.1], dtype=np.float32) for _ in urls]
550
551
  560 +class _EmbeddingCacheMiss:
  561 + """Avoid Redis/module cache hits so contract tests exercise the encode path."""
  562 +
  563 + redis_client = None
  564 +
  565 + def get(self, key):
  566 + return None
  567 +
  568 + def set(self, key, value):
  569 + return True
  570 +
  571 +
552 @pytest.fixture
553 def embedding_module():
554 import embeddings.server as emb_server
@@ -556,17 +576,31 @@ def embedding_module():
556 emb_server.app.router.on_startup.clear()
557 emb_server._text_model = _FakeTextModel()
558 emb_server._image_model = _FakeImageModel()
  579 + emb_server._text_backend_name = "tei"
  580 + emb_server._text_cache = _EmbeddingCacheMiss()
  581 + emb_server._image_cache = _EmbeddingCacheMiss()
559 yield emb_server
560
561
562 def test_embedding_text_contract(embedding_module):
563 - data = embedding_module.embed_text(["hello", "world"])
586 + """Contract via HTTP like production; route handlers require Request/Response."""
  587 + from fastapi.testclient import TestClient
  588 +
  589 + with TestClient(embedding_module.app) as client:
  590 + resp = client.post("/embed/text", json=["hello", "world"])
  591 + assert resp.status_code == 200
  592 + data = resp.json()
564 assert len(data) == 2
565 assert len(data[0]) == 3
566
567
568 def test_embedding_image_contract(embedding_module):
569 - data = embedding_module.embed_image(["https://example.com/a.jpg"])
598 + from fastapi.testclient import TestClient
  599 +
  600 + with TestClient(embedding_module.app) as client:
  601 + resp = client.post("/embed/image", json=["https://example.com/a.jpg"])
  602 + assert resp.status_code == 200
  603 + data = resp.json()
570 assert len(data[0]) == 3
571
572
tests/test_embedding_pipeline.py
@@ -63,7 +63,11 @@ class _FakeTranslator:
63
64
65 class _FakeQueryEncoder:
  66 + def __init__(self):
  67 + self.calls = []
  68 +
66 def encode(self, sentences, **kwargs):
70 + self.calls.append({"sentences": sentences, "kwargs": dict(kwargs)})
67 if isinstance(sentences, str):
68 sentences = [sentences]
69 return np.array([np.array([0.11, 0.22, 0.33], dtype=np.float32) for _ in sentences], dtype=object)
@@ -98,9 +102,7 @@ def _build_test_config() -> SearchConfig:
98 rerank=RerankConfig(),
99 spu_config=SPUConfig(enabled=True, spu_field="spu_id", inner_hits_size=3),
100 es_index_name="test_products",
101 - tenant_config={},
102 es_settings={},
103 - services={},
104 )
105
106
@@ -111,6 +113,7 @@ def test_text_embedding_encoder_response_alignment(monkeypatch):
111 def _fake_post(url, json, timeout, **kwargs):
112 assert url.endswith("/embed/text")
113 assert json == ["hello", "world"]
116 + assert kwargs["params"]["priority"] == 0
114 return _FakeResponse([[0.1, 0.2], [0.3, 0.4]])
115
116 monkeypatch.setattr("embeddings.text_encoder.requests.post", _fake_post)
@@ -172,6 +175,7 @@ def test_image_embedding_encoder_cache_hit(monkeypatch):
172
173 def _fake_post(url, params, json, timeout, **kwargs):
174 calls["count"] += 1
178 + assert params["priority"] == 0
175 return _FakeResponse([[0.1, 0.2]])
176
177 monkeypatch.setattr("embeddings.image_encoder.requests.post", _fake_post)
@@ -184,16 +188,35 @@ def test_image_embedding_encoder_cache_hit(monkeypatch):
184 assert np.allclose(out[1], np.array([0.1, 0.2], dtype=np.float32))
185
186
  191 +def test_image_embedding_encoder_passes_priority(monkeypatch):
  192 + fake_cache = _FakeEmbeddingCache()
  193 + monkeypatch.setattr("embeddings.image_encoder.RedisEmbeddingCache", lambda **kwargs: fake_cache)
  194 +
  195 + def _fake_post(url, params, json, timeout, **kwargs):
  196 + assert params["priority"] == 1
  197 + return _FakeResponse([[0.1, 0.2]])
  198 +
  199 + monkeypatch.setattr("embeddings.image_encoder.requests.post", _fake_post)
  200 +
  201 + encoder = CLIPImageEncoder(service_url="http://127.0.0.1:6008")
  202 + out = encoder.encode_batch(["https://example.com/a.jpg"], priority=1)
  203 + assert len(out) == 1
  204 + assert np.allclose(out[0], np.array([0.1, 0.2], dtype=np.float32))
  205 +
  206 +
187 def test_query_parser_generates_query_vector_with_encoder():
208 + encoder = _FakeQueryEncoder()
188 parser = QueryParser(
189 config=_build_test_config(),
190 - text_encoder=_FakeQueryEncoder(),
211 + text_encoder=encoder,
191 translator=_FakeTranslator(),
192 )
193
194 parsed = parser.parse("red dress", tenant_id="162", generate_vector=True)
195 assert parsed.query_vector is not None
196 assert parsed.query_vector.shape == (3,)
218 + assert encoder.calls
219 + assert encoder.calls[0]["kwargs"]["priority"] == 1
197
198
199 def test_query_parser_skips_query_vector_when_disabled():
tests/test_embedding_service_limits.py
@@ -69,6 +69,8 @@ def test_health_exposes_limit_stats(monkeypatch):
69
70
71 def test_embed_image_rejects_when_image_lane_is_full(monkeypatch):
  72 + # Ensure no cache hit (module-level Redis cache may contain this URL from other tests).
  73 + monkeypatch.setattr(embedding_server, "_image_cache", _FakeCache({}))
72 limiter = embedding_server._InflightLimiter("image", 1)
73 acquired, _ = limiter.try_acquire()
74 assert acquired is True
tests/test_embedding_service_priority.py 0 → 100644
@@ -0,0 +1,81 @@
  1 +import threading
  2 +
  3 +import embeddings.server as emb_server
  4 +
  5 +
  6 +def test_text_inflight_limiter_priority_bypass():
  7 + limiter = emb_server._InflightLimiter(name="text", limit=1)
  8 +
  9 + accepted, active = limiter.try_acquire()
  10 + assert accepted is True
  11 + assert active == 1
  12 +
  13 + accepted, active = limiter.try_acquire()
  14 + assert accepted is False
  15 + assert active == 1
  16 +
  17 + accepted, active = limiter.try_acquire(bypass_limit=True)
  18 + assert accepted is True
  19 + assert active == 2
  20 +
  21 + snapshot = limiter.snapshot()
  22 + assert snapshot["priority_bypass_total"] == 1
  23 +
  24 + limiter.release(success=True)
  25 + limiter.release(success=True)
  26 +
  27 +
  28 +def test_text_dispatch_prefers_high_priority_queue():
  29 + high_task = emb_server._TextDispatchTask(
  30 + normalized=["online"],
  31 + effective_normalize=True,
  32 + request_id="high",
  33 + priority=1,
  34 + created_at=0.0,
  35 + done=threading.Event(),
  36 + )
  37 + normal_task = emb_server._TextDispatchTask(
  38 + normalized=["offline"],
  39 + effective_normalize=True,
  40 + request_id="normal",
  41 + priority=0,
  42 + created_at=0.0,
  43 + done=threading.Event(),
  44 + )
  45 +
  46 + with emb_server._text_dispatch_cv:
  47 + emb_server._text_dispatch_high_queue.clear()
  48 + emb_server._text_dispatch_normal_queue.clear()
  49 + emb_server._text_dispatch_normal_queue.append(normal_task)
  50 + emb_server._text_dispatch_high_queue.append(high_task)
  51 +
  52 + first = emb_server._pop_text_dispatch_task_locked()
  53 + second = emb_server._pop_text_dispatch_task_locked()
  54 +
  55 + emb_server._text_dispatch_high_queue.clear()
  56 + emb_server._text_dispatch_normal_queue.clear()
  57 +
  58 + assert first is high_task
  59 + assert second is normal_task
  60 +
  61 +
  62 +def test_image_inflight_limiter_priority_bypass():
  63 + limiter = emb_server._InflightLimiter(name="image", limit=1)
  64 +
  65 + accepted, active = limiter.try_acquire()
  66 + assert accepted is True
  67 + assert active == 1
  68 +
  69 + accepted, active = limiter.try_acquire()
  70 + assert accepted is False
  71 + assert active == 1
  72 +
  73 + accepted, active = limiter.try_acquire(bypass_limit=True)
  74 + assert accepted is True
  75 + assert active == 2
  76 +
  77 + snapshot = limiter.snapshot()
  78 + assert snapshot["priority_bypass_total"] == 1
  79 +
  80 + limiter.release(success=True)
  81 + limiter.release(success=True)
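The dispatch behavior these tests pin down, `_pop_text_dispatch_task_locked` draining the high-priority queue before the normal one, can be sketched with plain deques. The module-level queue names below are hypothetical stand-ins; the real function is assumed to be called with the dispatch condition variable's lock held, which is why the sketch itself takes no lock.

```python
from collections import deque
from typing import Any, Optional

# Hypothetical stand-ins for emb_server's module-level dispatch queues.
high_queue: deque = deque()
normal_queue: deque = deque()


def pop_text_dispatch_task_locked() -> Optional[Any]:
    """Pop the next task, preferring the high-priority lane.

    Caller must already hold the dispatch lock; both deques are only
    touched under that lock, so no extra synchronization is needed here.
    """
    if high_queue:
        return high_queue.popleft()
    if normal_queue:
        return normal_queue.popleft()
    return None
```

Within each lane this keeps FIFO order, so a flood of priority-1 online requests cannot be reordered among themselves, while priority-0 offline work only runs when the high lane is empty.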
tests/test_es_query_builder.py
@@ -80,3 +80,102 @@ def test_text_query_contains_only_base_translation_and_fallback_named_queries():
80 names = [clause["multi_match"]["_name"] for clause in should]
81
82 assert names == ["base_query", "base_query_trans_zh", "fallback_original_query_fr"]
  83 +
  84 +
  85 +def test_mixed_script_merges_en_fields_into_zh_clause():
  86 + qb = ESQueryBuilder(
  87 + match_fields=["title.en^3.0"],
  88 + multilingual_fields=["title", "brief"],
  89 + shared_fields=[],
  90 + text_embedding_field="title_embedding",
  91 + default_language="en",
  92 + )
  93 + parsed_query = SimpleNamespace(
  94 + query_text_by_lang={"zh": "法式 dress"},
  95 + search_langs=["zh"],
  96 + detected_language="zh",
  97 + source_in_index_languages=True,
  98 + index_languages=["zh", "en"],
  99 + contains_chinese=True,
  100 + contains_english=True,
  101 + )
  102 + q = qb.build_query(query_text="法式 dress", parsed_query=parsed_query, enable_knn=False)
  103 + fields = q["query"]["multi_match"]["fields"]
  104 + bases = {f.split("^", 1)[0] for f in fields}
  105 + assert "title.zh" in bases and "title.en" in bases
  106 + assert "brief.zh" in bases and "brief.en" in bases
  107 + # Merged supplemental language fields use boost * 0.8 (implicit 1.0 -> ^0.8)
  108 + assert "title.en^0.8" in fields
  109 + assert "brief.en^0.8" in fields
  110 +
  111 +
  112 +def test_mixed_script_merges_zh_fields_into_en_clause():
  113 + qb = ESQueryBuilder(
  114 + match_fields=["title.en^3.0"],
  115 + multilingual_fields=["title"],
  116 + shared_fields=[],
  117 + text_embedding_field="title_embedding",
  118 + default_language="en",
  119 + )
  120 + parsed_query = SimpleNamespace(
  121 + query_text_by_lang={"en": "red 连衣裙"},
  122 + search_langs=["en"],
  123 + detected_language="en",
  124 + source_in_index_languages=True,
  125 + index_languages=["zh", "en"],
  126 + contains_chinese=True,
  127 + contains_english=True,
  128 + )
  129 + q = qb.build_query(query_text="red 连衣裙", parsed_query=parsed_query, enable_knn=False)
  130 + fields = q["query"]["multi_match"]["fields"]
  131 + bases = {f.split("^", 1)[0] for f in fields}
  132 + assert "title.en" in bases and "title.zh" in bases
  133 + assert "title.zh^0.8" in fields
  134 +
  135 +
  136 +def test_mixed_script_merged_fields_scale_configured_boosts():
  137 + qb = ESQueryBuilder(
  138 + match_fields=["title.en^3.0"],
  139 + multilingual_fields=["title"],
  140 + shared_fields=[],
  141 + field_boosts={"title.zh": 5.0, "title.en": 10.0},
  142 + text_embedding_field="title_embedding",
  143 + default_language="en",
  144 + )
  145 + parsed_query = SimpleNamespace(
  146 + query_text_by_lang={"zh": "法式 dress"},
  147 + search_langs=["zh"],
  148 + detected_language="zh",
  149 + source_in_index_languages=True,
  150 + index_languages=["zh", "en"],
  151 + contains_chinese=True,
  152 + contains_english=True,
  153 + )
  154 + q = qb.build_query(query_text="法式 dress", parsed_query=parsed_query, enable_knn=False)
  155 + fields = q["query"]["multi_match"]["fields"]
  156 + assert "title.zh^5.0" in fields
  157 + assert "title.en^8.0" in fields # 10.0 * 0.8
  158 +
  159 +
  160 +def test_mixed_script_does_not_merge_en_when_not_in_index_languages():
  161 + qb = ESQueryBuilder(
  162 + match_fields=["title.zh^3.0"],
  163 + multilingual_fields=["title"],
  164 + shared_fields=[],
  165 + text_embedding_field="title_embedding",
  166 + default_language="zh",
  167 + )
  168 + parsed_query = SimpleNamespace(
  169 + query_text_by_lang={"zh": "法式 dress"},
  170 + search_langs=["zh"],
  171 + detected_language="zh",
  172 + source_in_index_languages=True,
  173 + index_languages=["zh"],
  174 + contains_chinese=True,
  175 + contains_english=True,
  176 + )
  177 + q = qb.build_query(query_text="法式 dress", parsed_query=parsed_query, enable_knn=False)
  178 + fields = q["query"]["multi_match"]["fields"]
  179 + bases = {f.split("^", 1)[0] for f in fields}
  180 + assert "title.zh" in bases
  181 + assert "title.en" not in bases
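The `(field_path, boost)` tuple flow these tests cover can be sketched as below. Names are adapted from `_expand_match_field_specs_for_mixed_script` / `_format_match_field_specs` in the plan; this is a simplified sketch of the merge-then-format idea, not the production implementation.

```python
from typing import Dict, List, Tuple

# MatchFieldSpec keeps path and boost separate until the final ES format step,
# avoiding the old "build path^boost strings, then re-parse them" round-trip.
MatchFieldSpec = Tuple[str, float]


def expand_specs_for_mixed_script(
    specs: List[MatchFieldSpec],
    supplemental_lang: str,
    multilingual_fields: List[str],
    field_boosts: Dict[str, float],
    scale: float = 0.8,  # mixed_script_merged_field_boost_scale default
) -> List[MatchFieldSpec]:
    """Merge the supplemental language's columns with scaled-down boosts."""
    merged = list(specs)
    existing = {path for path, _ in specs}
    for base in multilingual_fields:
        path = f"{base}.{supplemental_lang}"
        if path in existing:
            continue  # clause already searches this column at full weight
        merged.append((path, field_boosts.get(path, 1.0) * scale))
    return merged


def format_specs(specs: List[MatchFieldSpec]) -> List[str]:
    """Render specs into ES multi_match `path^boost` strings."""
    return [f"{path}^{boost}" for path, boost in specs]
```

With `field_boosts={"title.en": 10.0}`, merging `en` into a `zh` clause yields `title.en` at `10.0 * 0.8 = 8.0`, matching the `title.en^8.0` expectation in the tests above.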
tests/test_query_parser_mixed_language.py
@@ -9,6 +9,14 @@ class _DummyTranslator:
9 return f"{text}-{target_lang}"
10
11
  12 +def test_pure_english_word_token_length_and_script():
  13 + assert QueryParser._is_pure_english_word_token("ab") is False
  14 + assert QueryParser._is_pure_english_word_token("abc") is True
  15 + assert QueryParser._is_pure_english_word_token("wi-fi") is True
  16 + assert QueryParser._is_pure_english_word_token("连衣裙") is False
  17 + assert QueryParser._is_pure_english_word_token("ab12") is False
  18 +
  19 +
12 def _build_config() -> SearchConfig:
13 return SearchConfig(
14 es_index_name="test_products",
@@ -38,8 +46,11 @@ def test_parse_adds_en_fields_for_mixed_chinese_query_with_meaningful_english(mo
38 result = parser.parse("法式 dress 连衣裙", tenant_id="162", generate_vector=False)
39
40 assert result.detected_language == "zh"
49 + assert result.contains_chinese is True
50 + assert result.contains_english is True
41 assert "en" in result.search_langs
42 - assert result.query_text_by_lang["en"] == "法式 dress 连衣裙"
52 + # When translation finishes within budget, the target-language field gets the translated text (better than probing supplemental fields with the original text only).
53 + assert result.query_text_by_lang["en"] == "法式 dress 连衣裙-en"
43 assert result.query_text_by_lang["zh"] == "法式 dress 连衣裙"
44
45
@@ -55,6 +66,28 @@ def test_parse_adds_zh_fields_for_english_query_when_cjk_present(monkeypatch):
55 result = parser.parse("red 连衣裙", tenant_id="0", generate_vector=False)
56
57 assert result.detected_language == "en"
69 + assert result.contains_chinese is True
70 + assert result.contains_english is True
58 assert "zh" in result.search_langs
59 - assert result.query_text_by_lang["zh"] == "red 连衣裙"
72 + assert result.query_text_by_lang["zh"] == "red 连衣裙-zh"
60 assert result.query_text_by_lang["en"] == "red 连衣裙"
  74 +
  75 +
  76 +def test_parse_waits_for_translation_when_source_in_index_languages(monkeypatch):
77 + """When en is in index_languages, parsing should still wait for and adopt the en->zh translation (sharing the time budget with vector generation)."""
  78 + parser = QueryParser(_build_config(), translator=_DummyTranslator())
  79 + monkeypatch.setattr(parser.language_detector, "detect", lambda text: "en")
  80 + monkeypatch.setattr(
  81 + "query.query_parser.get_tenant_config_loader",
  82 + lambda: SimpleNamespace(get_tenant_config=lambda tenant_id: {"index_languages": ["en", "zh"]}),
  83 + raising=False,
  84 + )
  85 +
  86 + result = parser.parse("off shoulder top", tenant_id="0", generate_vector=False)
  87 +
  88 + assert result.detected_language == "en"
  89 + assert result.contains_chinese is False
  90 + assert result.contains_english is True
  91 + assert result.translations.get("zh") == "off shoulder top-zh"
  92 + assert result.query_text_by_lang.get("zh") == "off shoulder top-zh"
  93 + assert result.source_in_index_languages is True
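A plausible shape for `_is_pure_english_word_token`, consistent with the expectations in `test_pure_english_word_token_length_and_script` above (fullmatch on ASCII letters with optional hyphen joins, minimum length 3). The exact regex in query_parser.py may differ; this is an illustrative reconstruction.

```python
import re

# ASCII-letter runs, optionally joined by single hyphens ("wi-fi").
_EN_WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def is_pure_english_word_token(token: str) -> bool:
    """True for meaningful pure-English tokens: letters/hyphens only, len >= 3.

    Digits ("ab12") and CJK text fail the fullmatch; short tokens ("ab")
    fail the length check, keeping noise like stray initials out of the
    contains_english signal.
    """
    return len(token) >= 3 and _EN_WORD_RE.fullmatch(token) is not None
```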
tests/test_search_rerank_window.py
@@ -5,6 +5,7 @@ from pathlib import Path
5 from types import SimpleNamespace
6 from typing import Any, Dict, List
7
8 + import numpy as np
8 import yaml
9
10 from config import (
@@ -157,9 +158,7 @@ def _build_search_config(*, rerank_enabled: bool = True, rerank_window: int = 38
157 rerank=RerankConfig(enabled=rerank_enabled, rerank_window=rerank_window),
158 spu_config=SPUConfig(enabled=False),
159 es_index_name="test_products",
160 - tenant_config={},
161 es_settings={},
162 - services={},
163 )
164
165
@@ -173,6 +172,19 @@ def _build_searcher(config: SearchConfig, es_client: _FakeESClient) -> Searcher:
173 return searcher
174
175
  175 +class _FakeTextEncoder:
  176 + def __init__(self, vectors: Dict[str, List[float]]):
  177 + self.vectors = {
  178 + key: np.array(value, dtype=np.float32)
  179 + for key, value in vectors.items()
  180 + }
  181 +
  182 + def encode(self, sentences, priority: int = 0, **kwargs):
  183 + if isinstance(sentences, str):
  184 + sentences = [sentences]
  185 + return np.array([self.vectors[text] for text in sentences], dtype=object)
  186 +
  187 +
176 def test_config_loader_rerank_enabled_defaults_true(tmp_path: Path):
177 config_data = {
178 "es_index_name": "test_products",
@@ -327,3 +339,118 @@ def test_searcher_skips_rerank_when_page_exceeds_window(monkeypatch):
327 assert es_client.calls[0]["size"] == 10
328 assert es_client.calls[0]["include_named_queries_score"] is False
329 assert len(es_client.calls) == 1
  342 +
  343 +
  344 +def test_searcher_promotes_sku_when_option1_matches_translated_query(monkeypatch):
  345 + es_client = _FakeESClient(total_hits=1)
  346 + searcher = _build_searcher(_build_search_config(rerank_enabled=False), es_client)
  347 + context = create_request_context(reqid="sku-text", uid="u-sku-text")
  348 +
  349 + monkeypatch.setattr(
  350 + "search.searcher.get_tenant_config_loader",
  351 + lambda: SimpleNamespace(get_tenant_config=lambda tenant_id: {"index_languages": ["en", "zh"]}),
  352 + )
  353 +
  354 + class _TranslatedQueryParser:
  355 + text_encoder = None
  356 +
  357 + def parse(self, query: str, tenant_id: str, generate_vector: bool, context: Any):
  358 + return _FakeParsedQuery(
  359 + original_query=query,
  360 + query_normalized=query,
  361 + rewritten_query=query,
  362 + translations={"en": "black dress"},
  363 + )
  364 +
  365 + searcher.query_parser = _TranslatedQueryParser()
  366 +
  367 + def _full_source_with_skus(doc_id: str) -> Dict[str, Any]:
  368 + return {
  369 + "spu_id": doc_id,
  370 + "title": {"en": f"product-{doc_id}"},
  371 + "brief": {"en": f"brief-{doc_id}"},
  372 + "vendor": {"en": f"vendor-{doc_id}"},
  373 + "option1_name": "Color",
  374 + "image_url": "https://img/default.jpg",
  375 + "skus": [
  376 + {"sku_id": "sku-red", "option1_value": "Red", "image_src": "https://img/red.jpg"},
  377 + {"sku_id": "sku-black", "option1_value": "Black", "image_src": "https://img/black.jpg"},
  378 + ],
  379 + }
  380 +
  381 + monkeypatch.setattr(_FakeESClient, "_full_source", staticmethod(_full_source_with_skus))
  382 +
  383 + result = searcher.search(
  384 + query="黑色 连衣裙",
  385 + tenant_id="162",
  386 + from_=0,
  387 + size=1,
  388 + context=context,
  389 + enable_rerank=False,
  390 + )
  391 +
  392 + assert len(result.results) == 1
  393 + assert result.results[0].skus[0].sku_id == "sku-black"
  394 + assert result.results[0].image_url == "https://img/black.jpg"
  395 +
  396 +
  397 +def test_searcher_promotes_sku_by_embedding_when_query_has_no_direct_option_match(monkeypatch):
  398 + es_client = _FakeESClient(total_hits=1)
  399 + searcher = _build_searcher(_build_search_config(rerank_enabled=False), es_client)
  400 + context = create_request_context(reqid="sku-embed", uid="u-sku-embed")
  401 +
  402 + monkeypatch.setattr(
  403 + "search.searcher.get_tenant_config_loader",
  404 + lambda: SimpleNamespace(get_tenant_config=lambda tenant_id: {"index_languages": ["en"]}),
  405 + )
  406 +
  407 + encoder = _FakeTextEncoder(
  408 + {
  409 + "linen summer dress": [0.8, 0.2],
  410 + "color:Red": [1.0, 0.0],
  411 + "color:Blue": [0.0, 1.0],
  412 + }
  413 + )
  414 +
  415 + class _EmbeddingQueryParser:
  416 + text_encoder = encoder
  417 +
  418 + def parse(self, query: str, tenant_id: str, generate_vector: bool, context: Any):
  419 + return _FakeParsedQuery(
  420 + original_query=query,
  421 + query_normalized=query,
  422 + rewritten_query=query,
  423 + translations={},
  424 + query_vector=np.array([0.0, 1.0], dtype=np.float32),
  425 + )
  426 +
  427 + searcher.query_parser = _EmbeddingQueryParser()
  428 +
  429 + def _full_source_with_skus(doc_id: str) -> Dict[str, Any]:
  430 + return {
  431 + "spu_id": doc_id,
  432 + "title": {"en": f"product-{doc_id}"},
  433 + "brief": {"en": f"brief-{doc_id}"},
  434 + "vendor": {"en": f"vendor-{doc_id}"},
  435 + "option1_name": "Color",
  436 + "image_url": "https://img/default.jpg",
  437 + "skus": [
  438 + {"sku_id": "sku-red", "option1_value": "Red", "image_src": "https://img/red.jpg"},
  439 + {"sku_id": "sku-blue", "option1_value": "Blue", "image_src": "https://img/blue.jpg"},
  440 + ],
  441 + }
  442 +
  443 + monkeypatch.setattr(_FakeESClient, "_full_source", staticmethod(_full_source_with_skus))
  444 +
  445 + result = searcher.search(
  446 + query="linen summer dress",
  447 + tenant_id="162",
  448 + from_=0,
  449 + size=1,
  450 + context=context,
  451 + enable_rerank=False,
  452 + )
  453 +
  454 + assert len(result.results) == 1
  455 + assert result.results[0].skus[0].sku_id == "sku-blue"
  456 + assert result.results[0].image_url == "https://img/blue.jpg"