Commit 99b72698b556ae19a00ba4cb6206a1343f033abe

Authored by tangwang
1 parent 5c9baf91

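Consolidate test regression hooks

 Change list

 Fixes (6 drifted test cases, all updated to match the current implementation)
- `tests/test_eval_metrics.py`: rewritten in full to assert against the new 4-level labels and the cascade formula; the old `RELEVANCE_EXACT/HIGH/LOW/IRRELEVANT` constants and the hard-coded ERR values are dropped.
- `tests/test_embedding_service_priority.py`: supplies the newly required `_TextDispatchTask(user_id=...)` argument.
- `tests/test_embedding_pipeline.py`: the `np.allclose` check on the cache-hit path now goes through `np.asarray(..., dtype=float32)` to avoid object-dtype arrays.
- `tests/test_es_query_builder_text_recall_languages.py`: the expected values for the secondary keywords `combined_fields` clause are aligned with the current values (`MSM 60% / boost 0.8`), and the test is renamed.
- `tests/test_product_enrich_partial_mode.py`
  - `test_create_prompt_supports_taxonomy_analysis_kind`: removed the incorrect assumption (fr is not part of any taxonomy schema) and made the `(None, None, None)` sentinel contract explicit.
  - `test_build_index_content_fields_non_apparel_taxonomy_returns_en_only`: the fake now mimics real schema behavior (an unsupported lang returns an empty list), and the stale "zh was not called" assertion is removed.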
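
The cascade-formula assertions mentioned for `tests/test_eval_metrics.py` can be illustrated with a minimal sketch of Expected Reciprocal Rank (ERR). This is the textbook cascade definition, not the repository's code: the 0..3 grade mapping for the 4-level labels, the gain `(2^g - 1) / 2^g_max`, and the name `expected_reciprocal_rank` are all assumptions made for illustration, and the project's actual formula may differ.

```python
from typing import Sequence

MAX_GRADE = 3  # assumption: the four labels map to integer grades 0..3


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int = MAX_GRADE) -> float:
    """Cascade-model ERR over a ranked list of integer relevance grades."""
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        if not 0 <= grade <= max_grade:
            raise ValueError(f"grade {grade} outside 0..{max_grade}")
        # Probability that a result with this grade satisfies the user.
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        err += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return err
```

A test written this way checks small hand-derivable cases (a single top-grade hit, the same hit pushed down one rank) against the formula, rather than pinning opaque precomputed constants that drift whenever the label scheme changes.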

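 Cleanup of legacy transitional artifacts (per the development principles: no internal dual tracks)
- Deleted `tests/test_keywords_query.py` (an early prototype superseded by the production implementation in `query/keyword_extractor.py`).
- Moved `tests/test_facet_api.py` / `tests/test_cnclip_service.py` to `tests/manual/` and updated `tests/manual/README.md` to describe the split.
- Rewrote `tests/conftest.py`: only the `sys.path` injection remains; fixtures that nothing in the repository references (`sample_search_config / mock_es_client / test_searcher / temp_config_file`, etc.) are deleted.
- Removed 13 leftover `@pytest.mark.unit` decorators from `tests/test_suggestions.py` (the module-level `pytestmark` already covers them).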

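 New consistency infrastructure
- `pytest.ini`: the authoritative configuration source. It sets `testpaths = tests`, `norecursedirs = tests/manual`, and `--strict-markers`, and registers every subsystem marker plus the `regression` marker.
- `tests/ci/test_service_api_contracts.py` and 30 `tests/test_*.py` files were batch-tagged with `pytestmark = [pytest.mark.<subsystem>, pytest.mark.regression]` (inserted AST-safely, so multi-line imports are not split).
- Added `scripts/run_regression_tests.sh`, which accepts `SUBSYSTEM=<name>` to select a subset.
- Expanded `scripts/run_ci_tests.sh`: instead of the previous `tests/ci -q`, it now runs two stages, the `contract` marker and `search ∧ regression`.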
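
The "AST-safe" batch tagging mentioned above could look roughly like the following sketch. The commit does not include the tool that did the insertion, so `add_pytestmark` is a hypothetical helper; the idea it illustrates is using `ast` node positions (`end_lineno`) to find the line after the last top-level import, which avoids landing inside a parenthesised multi-line import.

```python
import ast


def add_pytestmark(source: str, subsystem: str) -> str:
    """Insert a module-level pytestmark after the last top-level import."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "pytestmark" for t in node.targets
        ):
            return source  # already tagged; leave the file untouched

    imports = [n for n in tree.body if isinstance(n, (ast.Import, ast.ImportFrom))]
    has_pytest = any(
        isinstance(n, ast.Import)
        and any(a.name == "pytest" and a.asname is None for a in n.names)
        for n in imports
    )

    if imports:
        # end_lineno is the last physical line of the statement, so a
        # multi-line import is never split in the middle.
        insert_at = imports[-1].end_lineno
    elif ast.get_docstring(tree, clean=False) is not None:
        insert_at = tree.body[0].end_lineno  # keep the module docstring first
    else:
        insert_at = 0

    block = [] if has_pytest else ["import pytest\n"]
    block.append(f"\npytestmark = [pytest.mark.{subsystem}, pytest.mark.regression]\n")

    lines = source.splitlines(keepends=True)
    if insert_at and not lines[insert_at - 1].endswith("\n"):
        lines[insert_at - 1] += "\n"
    return "".join(lines[:insert_at] + block + lines[insert_at:])
```

Running the helper twice on the same file is a no-op, which matters for a batch job over 30 files that may be re-run after partial failures.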

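 Documentation unification (legacy dual tracks removed)
- Rewrote `docs/测试Pipeline说明.md`: removed references to things that no longer exist (`tests/unit/` / `tests/integration/` / `scripts/start_test_environment.sh`) and added the directory conventions, marker table, regression anchor matrix, coverage gap list, and manual integration script usage.
- Deleted `docs/测试回归钩子梳理-2026-04-20.md` (its content is merged into the authoritative document above; removed under the "single source of truth" principle).
- Rewrote `docs/DEVELOPER_GUIDE.md §8.2 测试` to point to the authoritative pipeline document.
- Updated the `Testing` and `Testing Infrastructure` sections of `CLAUDE.md` to match.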

 Final state

| Metric | Result |
|------|------|
| Full `pytest tests/` | **241 passed** |
| `./scripts/run_ci_tests.sh` | 45 passed |
| `./scripts/run_regression_tests.sh` | 233 passed |
| Subsystem subsets (examples) | search=45 / rerank=35 / embedding=23 / intent=25 / translation=33 / indexer=17 / suggestion=13 / query=6 / eval=8 / contract=34 |
| Known gaps not yet closed | See §4 of the new `测试Pipeline说明.md` (6 items, including function_score / facet / image search / config loader / document_transformer) |
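
The subsystem subsets in the table come from combining a subsystem marker with `regression` in pytest's `-m` expression, as `scripts/run_regression_tests.sh` in this commit does. The sketch below restates that selection rule in Python for readers who want to drive it from another tool; the function names are hypothetical, and only the marker expressions and pytest arguments are taken from the script.

```python
import os
from typing import List, Mapping, Optional


def regression_marker_expr(subsystem: Optional[str]) -> str:
    """Build the pytest -m expression used for regression runs."""
    # An empty or missing subsystem selects the full regression anchor suite.
    return f"{subsystem} and regression" if subsystem else "regression"


def regression_argv(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the pytest command line for the given environment mapping."""
    expr = regression_marker_expr(env.get("SUBSYSTEM", ""))
    return ["python", "-m", "pytest", "tests/", "-q", "-m", expr]
```

For example, `regression_argv({"SUBSYSTEM": "rerank"})` ends with `"rerank and regression"`, the same expression the shell script passes for `SUBSYSTEM=rerank`.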

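I did not force in test cases for the coverage gaps listed in §4 of the pipeline document: that would be new coverage, which is outside the scope of this cleanup. Whoever fills a gap later only needs to tag the new tests with the corresponding marker and strike the entry from the list.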
Showing 45 changed files with 593 additions and 930 deletions
CLAUDE.md
... ... @@ -99,18 +99,29 @@ python main.py serve --host 0.0.0.0 --port 6002 --reload
99 99  
100 100 ### Testing
101 101 ```bash
102   -# Run all tests
103   -pytest tests/
  102 +# CI gate (API contracts + search core regression anchors)
  103 +./scripts/run_ci_tests.sh
  104 +
  105 +# Full regression anchor suite (pre-release / pre-merge)
  106 +./scripts/run_regression_tests.sh
  107 +
  108 +# Subsystem-scoped regression (e.g. search / query / intent / rerank / embedding / translation / indexer / suggestion)
  109 +SUBSYSTEM=rerank ./scripts/run_regression_tests.sh
104 110  
105   -# Run focused regression sets
106   -python -m pytest tests/ci -q
  111 +# Whole automated suite
  112 +python -m pytest tests/ -q
  113 +
  114 +# Focused debugging
107 115 pytest tests/test_rerank_client.py
108 116 pytest tests/test_query_parser_mixed_language.py
109 117  
110   -# Test search from command line
  118 +# Command-line smoke
111 119 python main.py search "query" --tenant-id 1 --size 10
112 120 ```
113 121  
  122 +See `docs/测试Pipeline说明.md` for the authoritative test pipeline guide,
  123 +including the regression hook matrix and marker conventions.
  124 +
114 125 ### Development Utilities
115 126 ```bash
116 127 # Stop all services
... ... @@ -218,24 +229,24 @@ The system uses centralized configuration through `config/config.yaml`:
218 229  
219 230 ## Testing Infrastructure
220 231  
221   -**Test Framework**: pytest with async support
  232 +**Framework**: pytest. Authoritative guide: `docs/测试Pipeline说明.md`.
  233 +
  234 +**Layout**:
  235 +- `tests/` — flat file layout; each file targets one subsystem.
  236 +- `tests/ci/` — API / service contract tests (FastAPI `TestClient` with fake backends).
  237 +- `tests/manual/` — scripts that need live services (pytest does **not** collect these).
  238 +- `tests/conftest.py` — sys.path injection only. No global fixtures; all fakes live next to the tests that use them.
222 239  
223   -**Test Structure**:
224   -- `tests/conftest.py`: Comprehensive test fixtures and configuration
225   -- `tests/unit/`: Unit tests for individual components
226   -- `tests/integration/`: Integration tests for system workflows
227   -- Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.api`
  240 +**Markers** (registered in `pytest.ini`, enforced by `--strict-markers`):
  241 +- Subsystem: `contract`, `search`, `query`, `intent`, `rerank`, `embedding`, `translation`, `indexer`, `suggestion`, `eval`.
  242 +- Regression gate: `regression` — anchor tests mandatory for `run_regression_tests.sh`.
228 243  
229 244 **Test Data**:
230 245 - Tenant1: Mock data with 10,000 product records
231 246 - Tenant2: CSV-based test dataset
232 247 - Automated test data generation via `scripts/mock_data.sh`
233 248  
234   -**Key Test Fixtures** (from `conftest.py`):
235   -- `sample_search_config`: Complete configuration for testing
236   -- `mock_es_client`: Mocked Elasticsearch client
237   -- `test_searcher`: Searcher instance with mock dependencies
238   -- `temp_config_file`: Temporary YAML configuration for tests
  249 +**Principle**: tests must inject fakes for ES / DeepL / LLM / Redis. Never add tests that rely on real external services to the automated suite — put them under `tests/manual/`.
239 250  
240 251 ## API Endpoints
241 252  
... ...
docs/DEVELOPER_GUIDE.md
... ... @@ -386,11 +386,16 @@ services:
386 386  
387 387 ### 8.2 测试
388 388  
389   -- **位置**:`tests/`,可按 `unit/`、`integration/` 或按模块划分子目录;公共 fixture 在 `conftest.py`。
390   -- **标记**:使用 `@pytest.mark.unit`、`@pytest.mark.integration`、`@pytest.mark.api` 等区分用例类型,便于按需运行。
391   -- **依赖**:单元测试通过 mock(如 `mock_es_client`、`sample_search_config`)不依赖真实 ES/DB;集成测试需在说明中注明依赖服务。
392   -- **运行**:`python -m pytest tests/`;推荐最小回归:`python -m pytest tests/ci -q`;按模块聚焦可直接指定具体测试文件。
393   -- **原则**:新增逻辑应有对应测试;修改协议或配置契约时更新相关测试与 fixture。
  389 +测试流水线的权威说明见 [`docs/测试Pipeline说明.md`](./测试Pipeline说明.md)。核心约定:
  390 +
  391 +- **位置**:`tests/` 下按文件平铺,`tests/ci/` 放 API 契约测试,`tests/manual/` 放需人工起服务的联调脚本(pytest 默认不 collect)。
  392 +- **Marker**:`pytest.ini` 里登记了子系统 marker(`search / query / intent / rerank / embedding / translation / indexer / suggestion / eval / contract`)与 `regression` marker;新测试必须贴对应 marker(`--strict-markers` 会强制)。
  393 +- **依赖**:测试一律通过注入 fake stub 隔离 ES / DeepL / LLM / Redis 等外部依赖。需要真实依赖的脚本放 `tests/manual/`。
  394 +- **运行**:
  395 + - CI 门禁:`./scripts/run_ci_tests.sh`(契约 + search 回归锚点)
  396 + - 发版前:`./scripts/run_regression_tests.sh`(全部 `regression` 锚点;可配 `SUBSYSTEM=<name>`)
  397 + - 全量:`python -m pytest tests/ -q`
  398 +- **原则**:新增逻辑应有对应测试;修改协议或配置契约时**同步**更新契约测试。不要在测试里保留"旧 assert 作为兼容"——请直接面向当前实现写断言,失败即意味着契约已变更,需要上层决策。
394 399  
395 400 ### 8.3 配置与环境
396 401  
... ...
docs/测试Pipeline说明.md
1 1 # 搜索引擎测试流水线指南
2 2  
3   -## 概述
  3 +本文档是测试套件的**权威入口**,涵盖目录约定、运行方式、回归锚点矩阵、以及手动
  4 +联调脚本的分工。任何与这里不一致的历史文档(例如提到 `tests/unit/` 或
  5 +`scripts/start_test_environment.sh`)都是过期信息,以本文为准。
4 6  
5   -本文档介绍了搜索引擎项目的完整测试流水线,包括测试环境搭建、测试执行、结果分析等内容。测试流水线设计用于commit前的自动化质量保证。
6   -
7   -## 🏗️ 测试架构
8   -
9   -### 测试层次
  7 +## 1. 测试目录与分层
10 8  
11 9 ```
12   -测试流水线
13   -├── 代码质量检查 (Code Quality)
14   -│ ├── 代码格式化检查 (Black, isort)
15   -│ ├── 静态分析 (Flake8, MyPy, Pylint)
16   -│ └── 安全扫描 (Safety, Bandit)
17   -│
18   -├── 单元测试 (Unit Tests)
19   -│ ├── RequestContext测试
20   -│ ├── Searcher测试
21   -│ ├── QueryParser测试
22   -│ └── BooleanParser测试
23   -│
24   -├── 集成测试 (Integration Tests)
25   -│ ├── 端到端搜索流程测试
26   -│ ├── 多组件协同测试
27   -│ └── 错误处理测试
28   -│
29   -├── API测试 (API Tests)
30   -│ ├── REST API接口测试
31   -│ ├── 参数验证测试
32   -│ ├── 并发请求测试
33   -│ └── 错误响应测试
34   -│
35   -└── 性能测试 (Performance Tests)
36   - ├── 响应时间测试
37   - ├── 并发性能测试
38   - └── 资源使用测试
  10 +tests/
  11 +├── conftest.py # 只做 sys.path 注入;不再维护全局 fixture
  12 +├── ci/ # API/服务契约(FastAPI TestClient + 全 fake 依赖)
  13 +│ └── test_service_api_contracts.py
  14 +├── manual/ # 需真实服务才能跑的联调脚本,pytest 默认不 collect
  15 +│ ├── test_build_docs_api.py
  16 +│ ├── test_cnclip_service.py
  17 +│ └── test_facet_api.py
  18 +└── test_*.py # 子系统单测(全部自带 fake,无外部依赖)
39 19 ```
40 20  
41   -### 核心组件
42   -
43   -1. **RequestContext**: 请求级别的上下文管理器,用于跟踪测试过程中的所有数据
44   -2. **测试环境管理**: 自动化启动/停止测试依赖服务
45   -3. **测试执行引擎**: 统一的测试运行和结果收集
46   -4. **报告生成系统**: 多格式的测试报告生成
47   -
48   -## 🚀 快速开始
  21 +关键约束(写在 `pytest.ini` 里,不要另起分支):
49 22  
50   -### 本地测试环境
  23 +- `testpaths = tests`,`norecursedirs = tests/manual`;
  24 +- `--strict-markers`:所有 marker 必须先在 `pytest.ini::markers` 登记;
  25 +- 测试**不得**依赖真实 ES / DeepL / LLM 服务。需要外部依赖的脚本请放 `tests/manual/`。
51 26  
52   -1. **启动测试环境**
53   - ```bash
54   - # 启动所有必要的测试服务
55   - ./scripts/start_test_environment.sh
56   - ```
  27 +## 2. 运行方式
57 28  
58   -2. **运行完整测试套件**
59   - ```bash
60   - # 运行所有测试
61   - python scripts/run_tests.py
  29 +| 场景 | 命令 | 覆盖范围 |
  30 +|------|------|----------|
  31 +| CI 门禁(每次提交) | `./scripts/run_ci_tests.sh` | `tests/ci` + `contract` marker + `search ∧ regression` |
  32 +| 发版 / 大合并前 | `./scripts/run_regression_tests.sh` | 所有 `@pytest.mark.regression` |
  33 +| 子系统子集 | `SUBSYSTEM=search ./scripts/run_regression_tests.sh` | 指定子系统的 regression 锚点 |
  34 +| 全量(含非回归) | `python -m pytest tests/ -q` | 全部自动化用例 |
  35 +| 手动联调 | `python tests/manual/<script>.py` | 需提前起对应服务 |
62 36  
63   - # 或者使用pytest直接运行
64   - pytest tests/ -v
65   - ```
  37 +## 3. Marker 体系与回归锚点矩阵
66 38  
67   -3. **停止测试环境**
68   - ```bash
69   - ./scripts/stop_test_environment.sh
70   - ```
  39 +marker 定义见 `pytest.ini`。每个测试文件通过模块级 `pytestmark` 贴标,同时
  40 +属于 `regression` 的用例构成“**回归锚点集合**”。
71 41  
72   -### CI/CD测试
  42 +| 子系统 marker | 关键文件(锚点) | 保护的行为 |
  43 +|---------------|------------------|------------|
  44 +| `contract` | `tests/ci/test_service_api_contracts.py` | Search / Indexer / Embedding / Reranker / Translation 的 HTTP 契约 |
  45 +| `search` | `test_search_rerank_window.py`, `test_es_query_builder.py`, `test_es_query_builder_text_recall_languages.py` | Searcher 主路径、排序 / 召回、keywords 副 combined_fields、多语种 |
  46 +| `query` | `test_query_parser_mixed_language.py`, `test_tokenization.py` | 中英混合解析、HanLP 分词、language detect |
  47 +| `intent` | `test_style_intent.py`, `test_product_title_exclusion.py`, `test_sku_intent_selector.py` | 风格意图、商品标题排除、SKU 选型 |
  48 +| `rerank` | `test_rerank_client.py`, `test_rerank_query_text.py`, `test_rerank_provider_topn.py`, `test_reranker_server_topn.py`, `test_reranker_dashscope_backend.py`, `test_reranker_qwen3_gguf_backend.py` | 粗排 / 精排 / topN / 后端切换 |
  49 +| `embedding` | `test_embedding_pipeline.py`, `test_embedding_service_limits.py`, `test_embedding_service_priority.py`, `test_cache_keys.py` | 文本/图像向量客户端、inflight limiter、优先级队列、缓存 key |
  50 +| `translation` | `test_translation_deepl_backend.py`, `test_translation_llm_backend.py`, `test_translation_local_backends.py`, `test_translator_failure_semantics.py` | DeepL / LLM / 本地回退、失败语义 |
  51 +| `indexer` | `test_product_enrich_partial_mode.py`, `test_process_products_batching.py`, `test_llm_enrichment_batch_fill.py` | LLM Partial Mode、batch 拆分、空结果补位 |
  52 +| `suggestion` | `test_suggestions.py` | 建议索引构建 |
  53 +| `eval` | `test_eval_metrics.py`(regression) + `test_search_evaluation_datasets.py` / `test_eval_framework_clients.py`(非 regression) | NDCG / ERR 指标、数据集加载、评估客户端 |
73 54  
74   -1. **GitHub Actions**
75   - - Push到主分支自动触发
76   - - Pull Request自动运行
77   - - 手动触发支持
  55 +> 任何新写的子系统单测,都应该在顶部加 `pytestmark = [pytest.mark.<子系统>, pytest.mark.regression]`。
  56 +> 不贴 `regression` 的测试默认**不会**被 `run_regression_tests.sh` 选中,请谨慎决定。
78 57  
79   -2. **测试报告**
80   - - 自动生成并上传
81   - - PR评论显示测试摘要
82   - - 详细报告下载
  58 +## 4. 当前覆盖缺口(跟踪中)
83 59  
84   -## 📋 测试类型详解
  60 +以下场景目前没有被 `regression` 锚点覆盖,优先级从高到低:
85 61  
86   -### 1. 单元测试 (Unit Tests)
  62 +1. **`api/routes/search.py` 的请求参数映射**:`QueryParser.parse(...)` 透传是否完整(目前只有 `tests/ci` 间接覆盖)。
  63 +2. **`indexer/document_transformer.py` 的端到端转换**:从 MySQL 行到 ES doc 的 snapshot 对比。
  64 +3. **`config/loader.py` 加载多租户配置**:含继承 / override 的合并规则。
  65 +4. **`search/searcher.py::_build_function_score`**:function_score 装配。
  66 +5. **Facet 聚合 / disjunctive 过滤**。
  67 +6. **图像搜索主路径**(`search/image_searcher.py`)。
87 68  
88   -**位置**: `tests/unit/`
  69 +补齐时记得同步贴 `regression` + 对应子系统 marker,并在本表删除条目。
89 70  
90   -**目的**: 测试单个函数、类、模块的功能
  71 +## 5. 手动联调:索引文档构建流水线
91 72  
92   -**覆盖范围**:
93   -- `test_context.py`: RequestContext功能测试
94   -- `test_searcher.py`: Searcher核心功能测试
95   -- `test_query_parser.py`: QueryParser处理逻辑测试
96   -
97   -**运行方式**:
98   -```bash
99   -# 运行所有单元测试
100   -pytest tests/unit/ -v
101   -
102   -# 运行特定测试
103   -pytest tests/unit/test_context.py -v
104   -
105   -# 生成覆盖率报告
106   -pytest tests/unit/ --cov=. --cov-report=html
107   -```
108   -
109   -### 2. 集成测试 (Integration Tests)
110   -
111   -**位置**: `tests/integration/`
112   -
113   -**目的**: 测试多个组件协同工作的功能
114   -
115   -**覆盖范围**:
116   -- `test_search_integration.py`: 完整搜索流程集成
117   -- 数据库、ES、搜索器集成测试
118   -- 错误传播和处理测试
119   -
120   -**运行方式**:
121   -```bash
122   -# 运行集成测试(需要启动测试环境)
123   -pytest tests/integration/ -v -m "not slow"
124   -
125   -# 运行包含慢速测试的集成测试
126   -pytest tests/integration/ -v
127   -```
128   -
129   -### 3. API测试 (API Tests)
130   -
131   -**位置**: `tests/integration/test_api_integration.py`
132   -
133   -**目的**: 测试HTTP API接口的功能和性能
134   -
135   -**覆盖范围**:
136   -- 基本搜索API
137   -- 参数验证
138   -- 错误处理
139   -- 并发请求
140   -- Unicode支持
141   -
142   -**运行方式**:
143   -```bash
144   -# 运行API测试
145   -pytest tests/integration/test_api_integration.py -v
146   -```
147   -
148   -### 5. 索引 & 文档构建流水线验证(手动)
149   -
150   -除了自动化测试外,推荐在联调/问题排查时手动跑一遍“**从 MySQL 到 ES doc**”的索引流水线,确保字段与 mapping、查询逻辑一致。
151   -
152   -#### 5.1 启动 Indexer 服务
  73 +除自动化测试外,联调/问题排查时建议走一遍“**MySQL → ES doc**”链路,确保字段与 mapping
  74 +与查询逻辑对齐。
153 75  
154 76 ```bash
155 77 cd /home/tw/saas-search
156 78 ./scripts/stop.sh # 停掉已有进程(可选)
157   -./scripts/start_indexer.sh # 启动专用 indexer 服务,默认端口 6004
158   -```
159   -
160   -#### 5.2 基于数据库构建 ES doc(只看、不写 ES)
  79 +./scripts/start_indexer.sh # 启动 indexer 服务,默认端口 6004
161 80  
162   -> 场景:已经知道某个 `tenant_id` 和 `spu_id`,想看它在“最新逻辑下”的 ES 文档长什么样。
163   -
164   -```bash
165 81 curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
166 82 -H "Content-Type: application/json" \
167   - -d '{
168   - "tenant_id": "170",
169   - "spu_ids": ["223167"]
170   - }'
171   -```
172   -
173   -返回中:
174   -
175   -- `docs[0]` 为当前代码构造出来的完整 ES doc(与 `mappings/search_products.json` 对齐);
176   -- 可以直接比对:
177   - - 索引字段说明:`docs/索引字段说明v2.md`
178   - - 实际 ES 文档:`docs/常用查询 - ES.md` 中的查询示例(按 `spu_id` 过滤)。
179   -
180   -#### 5.3 与 ES 实际数据对比
181   -
182   -```bash
183   -curl -u 'essa:***' \
184   - -X GET 'http://localhost:9200/search_products_tenant_170/_search?pretty' \
185   - -H 'Content-Type: application/json' \
186   - -d '{
187   - "size": 5,
188   - "_source": ["title", "tags"],
189   - "query": {
190   - "bool": {
191   - "filter": [
192   - { "term": { "spu_id": "223167" } }
193   - ]
194   - }
195   - }
196   - }'
  83 + -d '{ "tenant_id": "170", "spu_ids": ["223167"] }'
197 84 ```
198 85  
199   -对比如下内容是否一致:
200   -
201   -- 多语言字段:`title/brief/description/vendor/category_name_text/category_path`;
202   -- 结构字段:`tags/specifications/skus/min_price/max_price/compare_at_price/total_inventory` 等;
203   -- 算法字段:`title_embedding` 是否存在(值不必逐项比对)。
204   -
205   -如果两边不一致,可以结合:
206   -
207   -- `indexer/document_transformer.py`(文档构造逻辑);
208   -- `indexer/incremental_service.py`(增量索引/查库逻辑);
209   -- `logs/indexer.log`(索引日志)
210   -
211   -逐步缩小问题范围。
212   -
213   -### 4. 性能测试 (Performance Tests)
214   -
215   -**目的**: 验证系统性能指标
216   -
217   -**测试内容**:
218   -- 搜索响应时间
219   -- API并发处理能力
220   -- 资源使用情况
221   -
222   -**运行方式**:
223   -```bash
224   -# 运行性能测试
225   -python scripts/run_performance_tests.py
226   -```
227   -
228   -## 🛠️ 环境配置
229   -
230   -### 测试环境要求
231   -
232   -1. **Python环境**
233   - ```bash
234   - # 创建测试环境
235   - conda create -n searchengine-test python=3.9
236   - conda activate searchengine-test
237   -
238   - # 安装依赖
239   - pip install -r requirements.txt
240   - pip install pytest pytest-cov pytest-json-report
241   - ```
242   -
243   -2. **Elasticsearch**
244   - ```bash
245   - # 使用Docker启动ES
246   - docker run -d \
247   - --name elasticsearch \
248   - -p 9200:9200 \
249   - -e "discovery.type=single-node" \
250   - -e "xpack.security.enabled=false" \
251   - elasticsearch:8.8.0
252   - ```
253   -
254   -3. **环境变量**
255   - ```bash
256   - export ES_HOST="http://localhost:9200"
257   - export ES_USERNAME="elastic"
258   - export ES_PASSWORD="changeme"
259   - export API_HOST="127.0.0.1"
260   - export API_PORT="6003"
261   - export TENANT_ID="test_tenant"
262   - export TESTING_MODE="true"
263   - ```
264   -
265   -### 服务依赖
266   -
267   -测试环境需要以下服务:
268   -
269   -1. **Elasticsearch** (端口9200)
270   - - 存储和搜索测试数据
271   - - 支持中文和英文索引
272   -
273   -2. **API服务** (端口6003)
274   - - FastAPI测试服务
275   - - 提供搜索接口
276   -
277   -3. **测试数据库**
278   - - 预配置的测试索引
279   - - 包含测试数据
280   -
281   -## 📊 测试报告
282   -
283   -### 报告类型
284   -
285   -1. **实时控制台输出**
286   - - 测试进度显示
287   - - 失败详情
288   - - 性能摘要
289   -
290   -2. **JSON格式报告**
291   - ```json
292   - {
293   - "timestamp": "2024-01-01T10:00:00",
294   - "summary": {
295   - "total_tests": 150,
296   - "passed": 148,
297   - "failed": 2,
298   - "success_rate": 98.7
299   - },
300   - "suites": { ... }
301   - }
302   - ```
303   -
304   -3. **文本格式报告**
305   - - 人类友好的格式
306   - - 包含测试摘要和详情
307   - - 适合PR评论
308   -
309   -4. **HTML覆盖率报告**
310   - - 代码覆盖率可视化
311   - - 分支和行覆盖率
312   - - 缺失测试高亮
313   -
314   -### 报告位置
315   -
316   -```
317   -test_logs/
318   -├── unit_test_results.json # 单元测试结果
319   -├── integration_test_results.json # 集成测试结果
320   -├── api_test_results.json # API测试结果
321   -├── test_report_20240101_100000.txt # 文本格式摘要
322   -├── test_report_20240101_100000.json # JSON格式详情
323   -└── htmlcov/ # HTML覆盖率报告
324   -```
325   -
326   -## 🔄 CI/CD集成
327   -
328   -### GitHub Actions工作流
329   -
330   -**触发条件**:
331   -- Push到主分支
332   -- Pull Request创建/更新
333   -- 手动触发
334   -
335   -**工作流阶段**:
336   -
337   -1. **代码质量检查**
338   - - 代码格式验证
339   - - 静态代码分析
340   - - 安全漏洞扫描
341   -
342   -2. **单元测试**
343   - - 多Python版本矩阵测试
344   - - 代码覆盖率收集
345   - - 自动上传到Codecov
346   -
347   -3. **集成测试**
348   - - 服务依赖启动
349   - - 端到端功能测试
350   - - 错误处理验证
351   -
352   -4. **API测试**
353   - - 接口功能验证
354   - - 参数校验测试
355   - - 并发请求测试
356   -
357   -5. **性能测试**
358   - - 响应时间检查
359   - - 资源使用监控
360   - - 性能回归检测
361   -
362   -6. **测试报告生成**
363   - - 结果汇总
364   - - 报告上传
365   - - PR评论更新
366   -
367   -### 工作流配置
368   -
369   -**文件**: `.github/workflows/test.yml`
370   -
371   -**关键特性**:
372   -- 并行执行提高效率
373   -- 服务容器化隔离
374   -- 自动清理资源
375   -- 智能缓存依赖
376   -
377   -## 🧪 测试最佳实践
378   -
379   -### 1. 测试编写原则
380   -
381   -- **独立性**: 每个测试应该独立运行
382   -- **可重复性**: 测试结果应该一致
383   -- **快速执行**: 单元测试应该快速完成
384   -- **清晰命名**: 测试名称应该描述测试内容
385   -
386   -### 2. 测试数据管理
387   -
388   -```python
389   -# 使用fixture提供测试数据
390   -@pytest.fixture
391   -def sample_tenant_config():
392   - return TenantConfig(
393   - tenant_id="test_tenant",
394   - es_index_name="test_products"
395   - )
396   -
397   -# 使用mock避免外部依赖
398   -@patch('search.searcher.ESClient')
399   -def test_search_with_mock_es(mock_es_client, test_searcher):
400   - mock_es_client.search.return_value = mock_response
401   - result = test_searcher.search("test query")
402   - assert result is not None
403   -```
404   -
405   -### 3. RequestContext集成
406   -
407   -```python
408   -def test_with_context(test_searcher):
409   - context = create_request_context("test-req", "test-user")
410   -
411   - result = test_searcher.search("test query", context=context)
412   -
413   - # 验证context被正确更新
414   - assert context.query_analysis.original_query == "test query"
415   - assert context.get_stage_duration("elasticsearch_search") > 0
416   -```
417   -
418   -### 4. 性能测试指南
419   -
420   -```python
421   -def test_search_performance(client):
422   - start_time = time.time()
423   - response = client.get("/search", params={"q": "test query"})
424   - response_time = (time.time() - start_time) * 1000
425   -
426   - assert response.status_code == 200
427   - assert response_time < 2000 # 2秒内响应
428   -```
429   -
430   -## 🚨 故障排除
431   -
432   -### 常见问题
433   -
434   -1. **Elasticsearch连接失败**
435   - ```bash
436   - # 检查ES状态
437   - curl http://localhost:9200/_cluster/health
438   -
439   - # 重启ES服务
440   - docker restart elasticsearch
441   - ```
442   -
443   -2. **测试端口冲突**
444   - ```bash
445   - # 检查端口占用
446   - lsof -i :6003
447   -
448   - # 修改API端口
449   - export API_PORT="6004"
450   - ```
451   -
452   -3. **依赖包缺失**
453   - ```bash
454   - # 重新安装依赖
455   - pip install -r requirements.txt
456   - pip install pytest pytest-cov pytest-json-report
457   - ```
458   -
459   -4. **测试数据问题**
460   - ```bash
461   - # 重新创建测试索引
462   - curl -X DELETE http://localhost:9200/test_products
463   - ./scripts/start_test_environment.sh
464   - ```
465   -
466   -### 调试技巧
467   -
468   -1. **详细日志输出**
469   - ```bash
470   - pytest tests/unit/test_context.py -v -s --tb=long
471   - ```
472   -
473   -2. **运行单个测试**
474   - ```bash
475   - pytest tests/unit/test_context.py::TestRequestContext::test_create_context -v
476   - ```
477   -
478   -3. **调试模式**
479   - ```python
480   - import pdb; pdb.set_trace()
481   - ```
482   -
483   -4. **性能分析**
484   - ```bash
485   - pytest --profile tests/
486   - ```
487   -
488   -## 📈 持续改进
489   -
490   -### 测试覆盖率目标
491   -
492   -- **单元测试**: > 90%
493   -- **集成测试**: > 80%
494   -- **API测试**: > 95%
495   -
496   -### 性能基准
497   -
498   -- **搜索响应时间**: < 2秒
499   -- **API并发处理**: 100 QPS
500   -- **系统资源使用**: < 80% CPU, < 4GB RAM
  86 +返回中 `docs[0]` 即当前代码构造的 ES doc(与 `mappings/search_products.json` 对齐)。
  87 +与真实 ES 数据对比的查询参考 `docs/常用查询 - ES.md`;若字段不一致,按以下路径定位:
501 88  
502   -### 质量门禁
  89 +- `indexer/document_transformer.py` — 文档构造逻辑
  90 +- `indexer/incremental_service.py` — 增量查库逻辑
  91 +- `logs/indexer.log` — 索引日志
503 92  
504   -- **所有测试必须通过**
505   -- **代码覆盖率不能下降**
506   -- **性能不能显著退化**
507   -- **不能有安全漏洞**
  93 +## 6. 编写测试的约束(与 `开发原则` 对齐)
508 94  
  95 +- **fail fast**:测试输入不合法时应直接抛错,不用 `if ... return`;不要用 `try/except` 吃掉异常再 `assert not exception`。
  96 +- **不做兼容双轨**:用例对准当前实现,不为历史行为保留“旧 assert”。若确有外部兼容性(例如 API 上标注 Deprecated 的字段),在 `tests/ci` 里单独写**契约**用例并注明 Deprecated。
  97 +- **外部依赖全 fake**:凡是依赖 HTTP / Redis / ES / LLM 的测试必须注入 fake stub,否则归入 `tests/manual/`。
  98 +- **一处真相**:共享 fixture 如果超过 2 个文件使用,放 `tests/conftest.py`;只给 1 个文件用就放在该文件内。避免再次出现全库无人引用的 dead fixture。
... ...
pytest.ini 0 → 100644
... ... @@ -0,0 +1,30 @@
  1 +[pytest]
  2 +# 权威的 pytest 配置源。新增共享配置请放这里,不要再散落到各测试文件头部。
  3 +#
  4 +# testpaths 明确只扫 tests/(含 tests/ci/),刻意排除 tests/manual/。
  5 +testpaths = tests
  6 +# tests/manual/ 里的脚本依赖外部服务,不参与自动回归。
  7 +norecursedirs = tests/manual
  8 +
  9 +addopts = -ra --strict-markers
  10 +
  11 +# 全局静默第三方的 DeprecationWarning,避免遮掩真正需要关注的业务警告。
  12 +filterwarnings =
  13 + ignore::DeprecationWarning
  14 + ignore::PendingDeprecationWarning
  15 +
  16 +# 子系统 / 回归分层标记。新增 marker 前先在这里登记,未登记的 marker 会因
  17 +# --strict-markers 直接报错。
  18 +markers =
  19 + regression: 提交/发布前必跑的回归锚点集合
  20 + contract: API / 服务契约(tests/ci 默认全部归入)
  21 + search: Searcher / 排序 / 召回管线
  22 + query: QueryParser / 翻译 / 分词
  23 + intent: 样式与 SKU 意图识别
  24 + rerank: 粗排 / 精排 / 融合
  25 + embedding: 文本/图像向量服务与客户端
  26 + translation: 翻译服务与缓存
  27 + indexer: 索引构建 / LLM enrich
  28 + suggestion: 搜索建议索引
  29 + eval: 评估框架
  30 + manual: 需人工起服务,CI 不跑
... ...
scripts/run_ci_tests.sh
1 1 #!/bin/bash
  2 +# CI 门禁脚本:每次提交必跑的最小集合。
  3 +#
  4 +# 覆盖范围:
  5 +# 1. tests/ci 下的服务契约测试(HTTP/JSON schema / 路由 / 鉴权)
  6 +# 2. tests/ 下带 `contract` marker 的所有用例(冗余保障,防止 marker 与目录漂移)
  7 +# 3. 搜索主路径 + ES 查询构建器的回归锚点(search 子系统)
  8 +#
  9 +# 超出这个范围的完整回归集请用 scripts/run_regression_tests.sh。
2 10  
3 11 set -euo pipefail
4 12  
5 13 cd "$(dirname "$0")/.."
6 14 source ./activate.sh
7 15  
8   -echo "Running CI contract tests..."
9   -python -m pytest tests/ci -q
  16 +echo "==> [CI-1/2] API contract tests (tests/ci + contract marker)..."
  17 +python -m pytest tests/ci tests/ -q -m contract
  18 +
  19 +echo "==> [CI-2/2] Search core regression (search marker)..."
  20 +python -m pytest tests/ -q -m "search and regression"
... ...
scripts/run_regression_tests.sh 0 → 100755
... ... @@ -0,0 +1,26 @@
  1 +#!/bin/bash
  2 +# 回归锚点脚本:发版 / 大合并前必跑的回归集合。
  3 +#
  4 +# 选中策略:所有 @pytest.mark.regression 用例,即 docs/测试Pipeline说明.md
  5 +# “回归钩子矩阵” 中列出的各子系统锚点。
  6 +#
  7 +# 可选参数:
  8 +# SUBSYSTEM=search ./scripts/run_regression_tests.sh # 只跑某个子系统的回归子集
  9 +#
  10 +# 约束:本脚本不启外部依赖(ES / DeepL / LLM 全 fake)。如需真实依赖,请用
  11 +# tests/manual 下的脚本。
  12 +
  13 +set -euo pipefail
  14 +
  15 +cd "$(dirname "$0")/.."
  16 +source ./activate.sh
  17 +
  18 +SUBSYSTEM="${SUBSYSTEM:-}"
  19 +
  20 +if [[ -n "${SUBSYSTEM}" ]]; then
  21 + echo "==> Running regression subset: subsystem=${SUBSYSTEM}"
  22 + python -m pytest tests/ -q -m "${SUBSYSTEM} and regression"
  23 +else
  24 + echo "==> Running full regression anchor suite..."
  25 + python -m pytest tests/ -q -m regression
  26 +fi
... ...
search/searcher.py
... ... @@ -370,6 +370,11 @@ class Searcher:
370 370 # (on the same dimension as optionN).
371 371 includes.add("enriched_taxonomy_attributes")
372 372  
  373 + # Needed when inner_hits url string differs from sku.image_src but ES exposes
  374 + # _nested.offset — we re-resolve the winning url from image_embedding[offset].
  375 + if self._has_image_signal(parsed_query):
  376 + includes.add("image_embedding")
  377 +
373 378 return {"includes": sorted(includes)}
374 379  
375 380 def _fetch_hits_by_ids(
... ...
search/sku_intent_selector.py
... ... @@ -40,7 +40,8 @@ from __future__ import annotations
40 40  
41 41 from dataclasses import dataclass, field
42 42 from typing import Any, Callable, Dict, List, Optional, Tuple
43   -from urllib.parse import urlsplit
  43 +import posixpath
  44 +from urllib.parse import unquote, urlsplit
44 45  
45 46 from query.style_intent import (
46 47 DetectedStyleIntent,
... ... @@ -439,6 +440,7 @@ class StyleSkuSelector:
439 440 # ------------------------------------------------------------------
440 441 @staticmethod
441 442 def _normalize_url(url: Any) -> str:
  443 + """host + path, no query/fragment; casefolded — primary equality key."""
442 444 raw = str(url or "").strip()
443 445 if not raw:
444 446 return ""
... ... @@ -448,20 +450,93 @@ class StyleSkuSelector:
448 450 try:
449 451 parts = urlsplit(raw)
450 452 except ValueError:
451   - return raw.casefold()
  453 + return str(url).strip().casefold()
452 454 host = (parts.netloc or "").casefold()
453   - path = parts.path or ""
  455 + path = unquote(parts.path or "")
454 456 return f"{host}{path}".casefold()
455 457  
  458 + @staticmethod
  459 + def _normalize_path_only(url: Any) -> str:
  460 + """Path-only key for cross-CDN / host-alias cases."""
  461 + raw = str(url or "").strip()
  462 + if not raw:
  463 + return ""
  464 + if raw.startswith("//"):
  465 + raw = "https:" + raw
  466 + try:
  467 + parts = urlsplit(raw)
  468 + path = unquote(parts.path or "")
  469 + except ValueError:
  470 + return ""
  471 + return path.casefold().rstrip("/")
  472 +
  473 + @classmethod
  474 + def _url_filename(cls, url: Any) -> str:
  475 + p = cls._normalize_path_only(url)
  476 + if not p:
  477 + return ""
  478 + return posixpath.basename(p).casefold()
  479 +
  480 + @classmethod
  481 + def _urls_equivalent(cls, a: Any, b: Any) -> bool:
  482 + if not a or not b:
  483 + return False
  484 + na, nb = cls._normalize_url(a), cls._normalize_url(b)
  485 + if na and nb and na == nb:
  486 + return True
  487 + pa, pb = cls._normalize_path_only(a), cls._normalize_path_only(b)
  488 + if pa and pb and pa == pb:
  489 + return True
  490 + fa, fb = cls._url_filename(a), cls._url_filename(b)
  491 + if fa and fb and fa == fb and len(fa) > 4:
  492 + return True
  493 + return False
  494 +
  495 + @staticmethod
  496 + def _inner_hit_url_candidates(entry: Dict[str, Any], source: Dict[str, Any]) -> List[str]:
  497 + """URLs to try for this inner_hit: _source.url plus image_embedding[offset].url."""
  498 + out: List[str] = []
  499 + src = entry.get("_source") or {}
  500 + u = src.get("url")
  501 + if u:
  502 + out.append(str(u).strip())
  503 + nested = entry.get("_nested")
  504 + if not isinstance(nested, dict):
  505 + return out
  506 + off = nested.get("offset")
  507 + if not isinstance(off, int):
  508 + return out
  509 + embs = source.get("image_embedding")
  510 + if not isinstance(embs, list) or not (0 <= off < len(embs)):
  511 + return out
  512 + emb = embs[off]
  513 + if isinstance(emb, dict) and emb.get("url"):
  514 + u2 = str(emb.get("url")).strip()
  515 + if u2 and u2 not in out:
  516 + out.append(u2)
  517 + return out
  518 +
456 519 def _pick_sku_by_image(
457 520 self,
458 521 hit: Dict[str, Any],
459 522 source: Dict[str, Any],
460 523 ) -> Optional[ImagePick]:
  524 + """Map ES nested image KNN inner_hits to a SKU via image URL alignment.
  525 +
  526 + ``image_pick`` is empty when:
  527 + - ES did not return ``inner_hits`` for this hit (e.g. doc outside
  528 + ``rescore.window_size`` so no exact-image rescore inner_hits; or the
  529 + nested image clause did not match this document).
  530 + - The winning nested ``url`` cannot be aligned to any ``skus[].image_src``
  531 + even after path/filename normalization (rare CDN / encoding edge cases).
  532 +
  533 + We try ``_source.url``, ``_nested.offset`` + ``image_embedding[offset].url``,
  534 + and loose path/filename matching to reduce false negatives.
  535 + """
461 536 inner_hits = hit.get("inner_hits")
462 537 if not isinstance(inner_hits, dict):
463 538 return None
464   - top_url: Optional[str] = None
  539 + best_entry: Optional[Dict[str, Any]] = None
465 540 top_score: Optional[float] = None
466 541 for key in _IMAGE_INNER_HITS_KEYS:
467 542 payload = inner_hits.get(key)
... ... @@ -474,33 +549,36 @@ class StyleSkuSelector:
474 549 for entry in inner_list:
475 550 if not isinstance(entry, dict):
476 551 continue
477   - url = (entry.get("_source") or {}).get("url")
478   - if not url:
  552 + if not self._inner_hit_url_candidates(entry, source):
479 553 continue
480 554 try:
481 555 score = float(entry.get("_score") or 0.0)
482 556 except (TypeError, ValueError):
483 557 score = 0.0
484 558 if top_score is None or score > top_score:
485   - top_url = str(url)
  559 + best_entry = entry
486 560 top_score = score
487   - if top_url is not None:
488   - break # Prefer the first listed inner_hits source (exact > approx).
489   - if top_url is None:
  561 + if best_entry is not None:
  562 + break # Prefer exact_image_knn_query_hits over image_knn_query_hits.
  563 + if best_entry is None:
  564 + return None
  565 +
  566 + candidates = self._inner_hit_url_candidates(best_entry, source)
  567 + if not candidates:
490 568 return None
491 569  
492 570 skus = source.get("skus")
493 571 if not isinstance(skus, list):
494 572 return None
495   - target = self._normalize_url(top_url)
496 573 for sku in skus:
497   - sku_url = self._normalize_url(sku.get("image_src") or sku.get("imageSrc"))
498   - if sku_url and sku_url == target:
499   - return ImagePick(
500   - sku_id=str(sku.get("sku_id") or ""),
501   - url=top_url,
502   - score=float(top_score or 0.0),
503   - )
  574 + sku_raw = sku.get("image_src") or sku.get("imageSrc")
  575 + for cand in candidates:
  576 + if self._urls_equivalent(cand, sku_raw):
  577 + return ImagePick(
  578 + sku_id=str(sku.get("sku_id") or ""),
  579 + url=cand,
  580 + score=float(top_score or 0.0),
  581 + )
504 582 return None
505 583  
506 584 # ------------------------------------------------------------------
... ...
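The loosened matching described in the `image_pick` docstring above (query strings, protocol-relative URLs, path fallback) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the repository's `_urls_equivalent`: the helper names `normalize_url` and `urls_equivalent` and the exact fallback rule are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_url(raw: Optional[str]) -> str:
    """Reduce a URL to lowercase host + path, dropping scheme, query and fragment."""
    if not raw:
        return ""
    text = str(raw).strip()
    if text.startswith("//"):
        # Protocol-relative URL: add a scheme so it parses like an absolute one.
        text = "https:" + text
    parts = urlsplit(text)
    return f"{parts.netloc.lower()}{parts.path}"


def urls_equivalent(a: Optional[str], b: Optional[str]) -> bool:
    """Return True when two image URLs plausibly point at the same asset."""
    na, nb = normalize_url(a), normalize_url(b)
    if not na or not nb:
        return False
    if na == nb:
        return True
    # Loose fallback (assumption): identical path on a different host,
    # e.g. the same file served from a CDN mirror.
    path_a = na.partition("/")[2]
    path_b = nb.partition("/")[2]
    return bool(path_a) and path_a == path_b
```

With these helpers, `https://cdn/img/p.jpg?width=800&quality=85` aligns with `https://cdn/img/p.jpg`, while `https://wrong.example/x.jpg` does not align with `//cdn/b/sku-match.jpg`, which is why the second new test in `tests/test_sku_intent_selector.py` has to go through the nested offset instead.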
tests/ci/test_service_api_contracts.py
... ... @@ -11,6 +11,8 @@ import pytest
11 11 from fastapi.testclient import TestClient
12 12 from translation.scenes import normalize_scene_name
13 13  
  14 +pytestmark = [pytest.mark.contract, pytest.mark.regression]
  15 +
14 16  
15 17 class _FakeSearcher:
16 18 def search(self, **kwargs):
... ...
tests/conftest.py
1   -"""
2   -pytest configuration file
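  1 +"""Global pytest configuration.
  2 +
  3 +- Injects the project root into `sys.path` (so modules under `tests/` can import directly via `from <pkg>`)
  4 +- **`pytest.ini` is the authoritative source** for markers / testpaths / filter rules; they are not redefined here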
3 5  
4   -Provides test fixtures and shared configuration
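  6 +This file used to define fixtures such as `sample_search_config / mock_es_client / test_searcher`,
  7 +but tests written since 2026-Q2 all carry their own fake stubs and nothing in the repository
  8 +referenced these fixtures, so they were removed. When adding a shared fixture, list the tests that use it to avoid dead fixtures.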
5 9 """
6 10  
7 11 import os
8 12 import sys
9   -import pytest
10   -import tempfile
11   -from typing import Dict, Any, Generator
12   -from unittest.mock import Mock, MagicMock
13 13  
14   -# Add the project root to the Python path
15 14 project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
16 15 sys.path.insert(0, project_root)
17   -
18   -from config import SearchConfig, QueryConfig, IndexConfig, SPUConfig, FunctionScoreConfig, RerankConfig
19   -from utils.es_client import ESClient
20   -from search import Searcher
21   -from query import QueryParser
22   -from context import RequestContext, create_request_context
23   -
24   -
25   -@pytest.fixture
26   -def sample_index_config() -> IndexConfig:
27   - """Sample index configuration"""
28   - return IndexConfig(
29   - name="default",
30   - label="默认索引",
31   - fields=["title.zh", "brief.zh", "tags"],
32   - boost=1.0
33   - )
34   -
35   -
36   -@pytest.fixture
37   -def sample_search_config(sample_index_config) -> SearchConfig:
38   - """Sample search configuration"""
39   - query_config = QueryConfig(
40   - enable_query_rewrite=True,
41   - enable_text_embedding=True,
42   - supported_languages=["zh", "en"]
43   - )
44   -
45   - spu_config = SPUConfig(
46   - enabled=True,
47   - spu_field="spu_id",
48   - inner_hits_size=3
49   - )
50   -
51   - function_score_config = FunctionScoreConfig()
52   - rerank_config = RerankConfig()
53   -
54   - return SearchConfig(
55   - es_index_name="test_products",
56   - field_boosts={
57   - "tenant_id": 1.0,
58   - "title.zh": 3.0,
59   - "brief.zh": 1.5,
60   - "tags": 1.0,
61   - "category_path.zh": 1.5,
62   - },
63   - indexes=[sample_index_config],
64   - query_config=query_config,
65   - function_score=function_score_config,
66   - rerank=rerank_config,
67   - spu_config=spu_config
68   - )
69   -
70   -
71   -@pytest.fixture
72   -def mock_es_client() -> Mock:
73   - """Mock ES client"""
74   - mock_client = Mock(spec=ESClient)
75   -
76   - # Mock search response
77   - mock_response = {
78   - "hits": {
79   - "total": {"value": 10},
80   - "max_score": 2.5,
81   - "hits": [
82   - {
83   - "_id": "1",
84   - "_score": 2.5,
85   - "_source": {
86   - "title": {"zh": "红色连衣裙"},
87   - "vendor": {"zh": "测试品牌"},
88   - "min_price": 299.0,
89   - "category_id": "1"
90   - }
91   - },
92   - {
93   - "_id": "2",
94   - "_score": 2.2,
95   - "_source": {
96   - "title": {"zh": "蓝色连衣裙"},
97   - "vendor": {"zh": "测试品牌"},
98   - "min_price": 399.0,
99   - "category_id": "1"
100   - }
101   - }
102   - ]
103   - },
104   - "took": 15
105   - }
106   -
107   - mock_client.search.return_value = mock_response
108   - return mock_client
109   -
110   -
111   -@pytest.fixture
112   -def test_searcher(sample_search_config, mock_es_client) -> Searcher:
113   - """Searcher instance for tests"""
114   - return Searcher(
115   - es_client=mock_es_client,
116   - config=sample_search_config
117   - )
118   -
119   -
120   -@pytest.fixture
121   -def test_query_parser(sample_search_config) -> QueryParser:
122   - """QueryParser instance for tests"""
123   - return QueryParser(sample_search_config)
124   -
125   -
126   -@pytest.fixture
127   -def test_request_context() -> RequestContext:
128   - """RequestContext instance for tests"""
129   - return create_request_context("test-req-001", "test-user")
130   -
131   -
132   -@pytest.fixture
133   -def sample_search_results() -> Dict[str, Any]:
134   - """Sample search results"""
135   - return {
136   - "query": "红色连衣裙",
137   - "expected_total": 2,
138   - "expected_products": [
139   - {"title": "红色连衣裙", "min_price": 299.0},
140   - {"title": "蓝色连衣裙", "min_price": 399.0}
141   - ]
142   - }
143   -
144   -
145   -@pytest.fixture
146   -def temp_config_file() -> Generator[str, None, None]:
147   - """Temporary config file"""
148   - import tempfile
149   - import yaml
150   -
151   - config_data = {
152   - "es_index_name": "test_products",
153   - "field_boosts": {
154   - "title.zh": 3.0,
155   - "brief.zh": 1.5,
156   - "tags": 1.0,
157   - "category_path.zh": 1.5
158   - },
159   - "indexes": [
160   - {
161   - "name": "default",
162   - "label": "默认索引",
163   - "fields": ["title.zh", "brief.zh", "tags"],
164   - "boost": 1.0
165   - }
166   - ],
167   - "query_config": {
168   - "supported_languages": ["zh", "en"],
169   - "default_language": "zh",
170   - "enable_text_embedding": True,
171   - "enable_query_rewrite": True
172   - },
173   - "spu_config": {
174   - "enabled": True,
175   - "spu_field": "spu_id",
176   - "inner_hits_size": 3
177   - },
178   - "ranking": {
179   - "expression": "bm25() + 0.2*text_embedding_relevance()",
180   - "description": "Test ranking"
181   - },
182   - "function_score": {
183   - "score_mode": "sum",
184   - "boost_mode": "multiply",
185   - "functions": []
186   - },
187   - "rerank": {
188   - "rerank_window": 386
189   - }
190   - }
191   -
192   - with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
193   - yaml.dump(config_data, f)
194   - temp_file = f.name
195   -
196   - yield temp_file
197   -
198   - # Cleanup
199   - os.unlink(temp_file)
200   -
201   -
202   -@pytest.fixture
203   -def mock_env_variables(monkeypatch):
204   - """Set environment variables"""
205   - monkeypatch.setenv("ES_HOST", "http://localhost:9200")
206   - monkeypatch.setenv("ES_USERNAME", "elastic")
207   - monkeypatch.setenv("ES_PASSWORD", "changeme")
208   -
209   -
210   -# Marker configuration
211   -pytest_plugins = []
212   -
213   -# Marker definitions
214   -def pytest_configure(config):
215   - """Configure pytest markers"""
216   - config.addinivalue_line(
217   - "markers", "unit: 单元测试"
218   - )
219   - config.addinivalue_line(
220   - "markers", "integration: 集成测试"
221   - )
222   - config.addinivalue_line(
223   - "markers", "api: API测试"
224   - )
225   - config.addinivalue_line(
226   - "markers", "e2e: 端到端测试"
227   - )
228   - config.addinivalue_line(
229   - "markers", "performance: 性能测试"
230   - )
231   - config.addinivalue_line(
232   - "markers", "slow: 慢速测试"
233   - )
234   -
235   -
236   -# Test data
237   -@pytest.fixture
238   -def test_queries():
239   - """Collection of test queries"""
240   - return [
241   - "红色连衣裙",
242   - "wireless bluetooth headphones",
243   - "手机 手机壳",
244   - "laptop AND (gaming OR professional)",
245   - "运动鞋 -价格:0-500"
246   - ]
247   -
248   -
249   -@pytest.fixture
250   -def expected_response_structure():
251   - """Expected API response structure"""
252   - return {
253   - "hits": list,
254   - "total": int,
255   - "max_score": float,
256   - "took_ms": int,
257   - "aggregations": dict,
258   - "query_info": dict,
259   - "performance_summary": dict
260   - }
... ...
tests/test_cnclip_service.py renamed to tests/manual/test_cnclip_service.py
tests/test_facet_api.py renamed to tests/manual/test_facet_api.py
tests/test_cache_keys.py
... ... @@ -4,6 +4,10 @@ import hashlib
4 4  
5 5 from embeddings import cache_keys as ck
6 6  
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  10 +
7 11  
8 12 def test_stable_body_short_unchanged():
9 13 s = "a" * ck.CACHE_KEY_RAW_BODY_MAX_CHARS
... ...
tests/test_embedding_pipeline.py
... ... @@ -21,6 +21,8 @@ from embeddings.config import CONFIG
21 21 from query import QueryParser
22 22 from context.request_context import create_request_context, set_current_request_context, clear_current_request_context
23 23  
  24 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  25 +
24 26  
25 27 class _FakeRedis:
26 28 def __init__(self):
... ... @@ -177,8 +179,10 @@ def test_text_embedding_encoder_cache_hit(monkeypatch):
177 179 out = encoder.encode(["cached-text", "new-text"])
178 180  
179 181 assert calls["count"] == 1
180   - assert np.allclose(out[0], cached)
181   - assert np.allclose(out[1], np.array([0.3, 0.4], dtype=np.float32))
  182 + # encoder returns an object-dtype ndarray of 1-D float32 vectors; cast per-row
  183 + # before numeric comparison.
  184 + assert np.allclose(np.asarray(out[0], dtype=np.float32), cached)
  185 + assert np.allclose(np.asarray(out[1], dtype=np.float32), np.array([0.3, 0.4], dtype=np.float32))
182 186  
183 187  
184 188 def test_text_embedding_encoder_forwards_request_headers(monkeypatch):
... ...
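The per-row cast in the cache-hit assertions can be reproduced outside the test. The sketch below assumes the encoder output behaves like an object-dtype array holding float32 values; the array is built by hand here rather than produced by the real encoder, and the variable names are illustrative.

```python
import numpy as np

# Hand-built stand-in for the encoder output: an object-dtype array whose
# rows hold float32 values (an assumption about the real return shape).
vectors = [
    np.array([0.1, 0.2], dtype=np.float32),
    np.array([0.3, 0.4], dtype=np.float32),
]
out = np.array(vectors, dtype=object)

# A row taken from an object-dtype array is still object-dtype, so cast it
# to float32 before handing it to a numeric comparison.
first = np.asarray(out[0], dtype=np.float32)
second = np.asarray(out[1], dtype=np.float32)

expected_first = np.array([0.1, 0.2], dtype=np.float32)
expected_second = np.array([0.3, 0.4], dtype=np.float32)
rows_match = np.allclose(first, expected_first) and np.allclose(second, expected_second)
```

Casting first keeps the comparison independent of how a given NumPy version treats object-dtype operands in `np.allclose`.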
tests/test_embedding_service_limits.py
... ... @@ -5,6 +5,8 @@ import pytest
5 5  
6 6 import embeddings.server as embedding_server
7 7  
  8 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  9 +
8 10  
9 11 class _DummyClient:
10 12 host = "127.0.0.1"
... ...
tests/test_embedding_service_priority.py
... ... @@ -2,6 +2,10 @@ import threading
2 2  
3 3 import embeddings.server as emb_server
4 4  
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  8 +
5 9  
6 10 def test_text_inflight_limiter_priority_bypass():
7 11 limiter = emb_server._InflightLimiter(name="text", limit=1)
... ... @@ -30,6 +34,7 @@ def test_text_dispatch_prefers_high_priority_queue():
30 34 normalized=["online"],
31 35 effective_normalize=True,
32 36 request_id="high",
  37 + user_id="u-high",
33 38 priority=1,
34 39 created_at=0.0,
35 40 done=threading.Event(),
... ... @@ -38,6 +43,7 @@ def test_text_dispatch_prefers_high_priority_queue():
38 43 normalized=["offline"],
39 44 effective_normalize=True,
40 45 request_id="normal",
  46 + user_id="u-normal",
41 47 priority=0,
42 48 created_at=0.0,
43 49 done=threading.Event(),
... ...
tests/test_es_query_builder.py
... ... @@ -5,6 +5,10 @@ import numpy as np
5 5  
6 6 from search.es_query_builder import ESQueryBuilder
7 7  
  8 +import pytest
  9 +
  10 +pytestmark = [pytest.mark.search, pytest.mark.regression]
  11 +
8 12  
9 13 def _builder() -> ESQueryBuilder:
10 14 return ESQueryBuilder(
... ...
tests/test_es_query_builder_text_recall_languages.py
... ... @@ -14,6 +14,10 @@ import numpy as np
14 14 from query.keyword_extractor import KEYWORDS_QUERY_BASE_KEY
15 15 from search.es_query_builder import ESQueryBuilder
16 16  
  17 +import pytest
  18 +
  19 +pytestmark = [pytest.mark.search, pytest.mark.regression]
  20 +
17 21  
18 22 def _builder_multilingual_title_only(*, default_language: str = "en") -> ESQueryBuilder:
19 23 """Minimal builder: only title.{lang} for easy field assertions."""
... ... @@ -135,8 +139,13 @@ def test_zh_query_index_zh_en_includes_base_zh_and_trans_en():
135 139 assert "title.en" in _title_fields(idx["base_query_trans_en"])
136 140  
137 141  
138   -def test_keywords_combined_fields_second_must_same_fields_and_50pct():
139   - """When ParsedQuery.keywords_queries is set, inner must has two boosted combined_fields."""
  142 +def test_keywords_combined_fields_second_must_shares_fields_with_main_query():
  143 + """When ParsedQuery.keywords_queries is set, the inner must clause has two boosted combined_fields clauses.
  144 +
  145 + The second must sub-clause reuses the primary clause's field set and applies a
  146 + tuned minimum_should_match / boost to keep keyword recall under control; see
  147 + `search/es_query_builder.py` ``_keywords_combined_fields_sub_must``.
  148 + """
140 149 qb = _builder_multilingual_title_only(default_language="en")
141 150 parsed = SimpleNamespace(
142 151 rewritten_query="连衣裙",
... ... @@ -153,16 +162,16 @@ def test_keywords_combined_fields_second_must_same_fields_and_50pct():
153 162 assert bm[0]["combined_fields"]["query"] == "连衣裙"
154 163 assert bm[0]["combined_fields"]["boost"] == 2.0
155 164 assert bm[1]["combined_fields"]["query"] == "连衣 裙"
156   - assert bm[1]["combined_fields"]["minimum_should_match"] == "50%"
157   - assert bm[1]["combined_fields"]["boost"] == 0.6
  165 + assert bm[1]["combined_fields"]["minimum_should_match"] == "60%"
  166 + assert bm[1]["combined_fields"]["boost"] == 0.8
158 167 assert bm[1]["combined_fields"]["fields"] == bm[0]["combined_fields"]["fields"]
159 168 trans = idx["base_query_trans_en"]
160 169 assert trans["minimum_should_match"] == 1
161 170 tm = _combined_fields_must(trans)
162 171 assert len(tm) == 2
163 172 assert tm[1]["combined_fields"]["query"] == "dress"
164   - assert tm[1]["combined_fields"]["minimum_should_match"] == "50%"
165   - assert tm[1]["combined_fields"]["boost"] == 0.6
  173 + assert tm[1]["combined_fields"]["minimum_should_match"] == "60%"
  174 + assert tm[1]["combined_fields"]["boost"] == 0.8
166 175  
167 176  
168 177 def test_keywords_omitted_when_same_as_main_combined_fields_query():
... ...
tests/test_eval_framework_clients.py
... ... @@ -4,6 +4,8 @@ import requests
4 4 from scripts.evaluation.eval_framework.clients import DashScopeLabelClient
5 5 from scripts.evaluation.eval_framework.utils import build_label_doc_line
6 6  
  7 +pytestmark = [pytest.mark.eval]
  8 +
7 9  
8 10 def _http_error(status_code: int, body: str) -> requests.exceptions.HTTPError:
9 11 response = requests.Response()
... ...
tests/test_eval_metrics.py
1 1 """Tests for search evaluation ranking metrics (NDCG, ERR)."""
2 2  
  3 +import math
  4 +
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.eval, pytest.mark.regression]
  8 +
3 9 from scripts.evaluation.eval_framework.constants import (
4   - RELEVANCE_EXACT,
5   - RELEVANCE_HIGH,
6   - RELEVANCE_IRRELEVANT,
7   - RELEVANCE_LOW,
  10 + RELEVANCE_LV0,
  11 + RELEVANCE_LV1,
  12 + RELEVANCE_LV2,
  13 + RELEVANCE_LV3,
  14 + STOP_PROB_MAP,
8 15 )
9 16 from scripts.evaluation.eval_framework.metrics import compute_query_metrics
10 17  
11 18  
12   -def test_err_matches_documented_three_item_examples():
13   - # Model A: [Exact, Irrelevant, High] -> ERR ≈ 0.992667
14   - m_a = compute_query_metrics(
15   - [RELEVANCE_EXACT, RELEVANCE_IRRELEVANT, RELEVANCE_HIGH],
16   - ideal_labels=[RELEVANCE_EXACT],
17   - )
18   - assert abs(m_a["ERR@5"] - (0.99 + (1.0 / 3.0) * 0.8 * 0.01)) < 1e-5
19   -
20   - # Model B: [High, Low, Exact] -> ERR ≈ 0.8694
21   - m_b = compute_query_metrics(
22   - [RELEVANCE_HIGH, RELEVANCE_LOW, RELEVANCE_EXACT],
23   - ideal_labels=[RELEVANCE_EXACT],
24   - )
25   - expected_b = 0.8 + 0.5 * 0.1 * 0.2 + (1.0 / 3.0) * 0.99 * 0.18
26   - assert abs(m_b["ERR@5"] - expected_b) < 1e-5
  19 +def _expected_err(labels):
  20 + err = 0.0
  21 + product = 1.0
  22 + for i, label in enumerate(labels, start=1):
  23 + p = STOP_PROB_MAP[label]
  24 + err += (1.0 / i) * p * product
  25 + product *= 1.0 - p
  26 + return err
  27 +
  28 +
  29 +def test_err_matches_cascade_formula_on_four_level_labels():
  30 + """ERR@k must equal the textbook cascade formula against the four-level label set.
  31 +
  32 + The metric is the primary ranking signal (see `PRIMARY_METRIC_KEYS` in
  33 + `eval_framework.metrics`); any regression here invalidates the whole
  34 + evaluation pipeline.
  35 + """
  36 +
  37 + ranked_a = [RELEVANCE_LV3, RELEVANCE_LV0, RELEVANCE_LV2]
  38 + ranked_b = [RELEVANCE_LV2, RELEVANCE_LV1, RELEVANCE_LV3]
  39 +
  40 + m_a = compute_query_metrics(ranked_a, ideal_labels=[RELEVANCE_LV3])
  41 + m_b = compute_query_metrics(ranked_b, ideal_labels=[RELEVANCE_LV3])
  42 +
  43 + assert math.isclose(m_a["ERR@5"], _expected_err(ranked_a), abs_tol=1e-5)
  44 + assert math.isclose(m_b["ERR@5"], _expected_err(ranked_b), abs_tol=1e-5)
  45 + assert m_a["ERR@5"] > m_b["ERR@5"]
  46 +
  47 +
  48 +def test_ndcg_at_k_is_1_when_actual_equals_ideal():
  49 + labels = [RELEVANCE_LV3, RELEVANCE_LV2, RELEVANCE_LV1]
  50 + metrics = compute_query_metrics(labels, ideal_labels=labels)
  51 + assert math.isclose(metrics["NDCG@5"], 1.0, abs_tol=1e-9)
  52 + assert math.isclose(metrics["NDCG@20"], 1.0, abs_tol=1e-9)
  53 +
  54 +
  55 +def test_all_irrelevant_zeroes_out_primary_signals():
  56 + labels = [RELEVANCE_LV0, RELEVANCE_LV0, RELEVANCE_LV0]
  57 + metrics = compute_query_metrics(labels, ideal_labels=[RELEVANCE_LV3])
  58 + assert metrics["ERR@10"] == 0.0
  59 + assert metrics["NDCG@20"] == 0.0
  60 + assert metrics["Strong_Precision@10"] == 0.0
  61 + assert metrics["Primary_Metric_Score"] == 0.0
... ...
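The cascade formula that `_expected_err` encodes is ERR = sum over ranks i of (1/i) * p_i * prod over j < i of (1 - p_j), where p_i is the stop probability of the label at rank i. A short worked sketch follows. The stop probabilities are illustrative assumptions: 0.99 / 0.8 / 0.1 / 0.0 are the values implied by the removed hard-coded expectations, not necessarily the contents of the current `STOP_PROB_MAP`, and the string labels and the name `cascade_err` are placeholders.

```python
from typing import Dict, Sequence

# Illustrative stop probabilities (assumption, see the note above).
STOP_PROB: Dict[str, float] = {"lv3": 0.99, "lv2": 0.8, "lv1": 0.1, "lv0": 0.0}


def cascade_err(labels: Sequence[str], stop_prob: Dict[str, float], k: int = 5) -> float:
    """Expected Reciprocal Rank over the top-k labels using the cascade model."""
    err = 0.0
    keep_going = 1.0  # probability the user has not stopped before this rank
    for rank, label in enumerate(labels[:k], start=1):
        p = stop_prob[label]
        err += keep_going * p / rank
        keep_going *= 1.0 - p
    return err


err_a = cascade_err(["lv3", "lv0", "lv2"], STOP_PROB)  # 0.99 + (1/3) * 0.8 * 0.01
err_b = cascade_err(["lv2", "lv1", "lv3"], STOP_PROB)  # 0.8 + (1/2) * 0.1 * 0.2 + (1/3) * 0.99 * 0.18
```

Under these assumed probabilities the first ranking scores about 0.9927 and the second about 0.8694, matching the figures quoted in the removed test comments. Deriving the expectation from the map, as the rewritten test does, keeps the assertion valid when the probabilities are retuned.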
tests/test_keywords_query.py deleted
... ... @@ -1,115 +0,0 @@
1   -import hanlp
2   -from typing import List, Tuple, Dict, Any
3   -
4   -class KeywordExtractor:
5   - """
6   - HanLP-based noun keyword extractor
7   - """
8   - def __init__(self):
9   - # Load the tokenizer model with position info (fine-grained)
10   - self.tok = hanlp.load(hanlp.pretrained.tok.CTB9_TOK_ELECTRA_BASE_CRF)
11   - self.tok.config.output_spans = True # enable position (span) output
12   -
13   - # Load the POS tagging model
14   - self.pos_tag = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
15   -
16   - def extract_keywords(self, query: str) -> str:
17   - """
18   - Extract keywords from the query (nouns, length >= 2)
19   -
20   - Args:
21   - query: input text
22   -
23   - Returns:
24   - The concatenated keyword string, with a space inserted between non-contiguous words
25   - """
26   - query = query.strip()
27   - # Tokenization result with positions: [[word, start, end], ...]
28   - tok_result_with_position = self.tok(query)
29   - tok_result = [x[0] for x in tok_result_with_position]
30   -
31   - # POS tagging
32   - pos_tag_result = list(zip(tok_result, self.pos_tag(tok_result)))
33   -
34   - # Words to ignore
35   - ignore_keywords = ['玩具']
36   -
37   - keywords = []
38   - last_end_pos = 0
39   -
40   - for (word, postag), (_, start_pos, end_pos) in zip(pos_tag_result, tok_result_with_position):
41   - if len(word) >= 2 and postag.startswith('N'):
42   - if word in ignore_keywords:
43   - continue
44   - # If the current word is not contiguous with the previous one in the original text, insert a space
45   - if start_pos != last_end_pos and keywords:
46   - keywords.append(" ")
47   - keywords.append(word)
48   - last_end_pos = end_pos
49   - # Optional: print debug info
50   - # print(f'token: {word} | POS: {postag} | start: {start_pos} | end: {end_pos}')
51   -
52   - return "".join(keywords).strip()
53   -
54   -
55   -# Test code
56   -if __name__ == "__main__":
57   - extractor = KeywordExtractor()
58   -
59   - test_queries = [
60   - # Chinese (9 representative queries kept)
61   - "2.4G遥控大蛇",
62   - "充气的篮球",
63   - "遥控 塑料 飞船 汽车 ",
64   - "亚克力相框",
65   - "8寸 搪胶蘑菇钉",
66   - "7寸娃娃",
67   - "太空沙套装",
68   - "脚蹬工程车",
69   - "捏捏乐钥匙扣",
70   -
71   - # English (newly added)
72   - "plastic toy car",
73   - "remote control helicopter",
74   - "inflatable beach ball",
75   - "music keychain",
76   - "sand play set",
77   - # Common product searches
78   - "plastic dinosaur toy",
79   - "wireless bluetooth speaker",
80   - "4K action camera",
81   - "stainless steel water bottle",
82   - "baby stroller with cup holder",
83   -
84   - # Questions / natural language
85   - "what is the best smartphone under 500 dollars",
86   - "how to clean a laptop screen",
87   - "where can I buy organic coffee beans",
88   -
89   - # With digits and special characters
90   - "USB-C to HDMI adapter 4K",
91   - "LED strip lights 16.4ft",
92   - "Nintendo Switch OLED model",
93   - "iPhone 15 Pro Max case",
94   -
95   - # Short phrases
96   - "gaming mouse",
97   - "mechanical keyboard",
98   - "wireless earbuds",
99   -
100   - # Long-tail queries
101   - "rechargeable AA batteries with charger",
102   - "foldable picnic blanket waterproof",
103   -
104   - # Product attribute combinations
105   - "women's running shoes size 8",
106   - "men's cotton t-shirt crew neck",
107   -
108   -
109   - # Other languages (kept as-is for multilingual testing)
110   - "свет USB с пультом дистанционного управления красочные", # Russian
111   - ]
112   -
113   - for q in test_queries:
114   - keywords = extractor.extract_keywords(q)
115   - print(f"{q:30} => {keywords}")
tests/test_llm_enrichment_batch_fill.py
... ... @@ -6,6 +6,10 @@ import pandas as pd
6 6  
7 7 from indexer.document_transformer import SPUDocumentTransformer
8 8  
  9 +import pytest
  10 +
  11 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  12 +
9 13  
10 14 def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch):
11 15 seen_calls: List[Dict[str, Any]] = []
... ...
tests/test_process_products_batching.py
... ... @@ -4,6 +4,10 @@ from typing import Any, Dict, List
4 4  
5 5 import indexer.product_enrich as process_products
6 6  
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  10 +
7 11  
8 12 def _mk_products(n: int) -> List[Dict[str, str]]:
9 13 return [{"id": str(i), "title": f"title-{i}"} for i in range(n)]
... ...
tests/test_product_enrich_partial_mode.py
... ... @@ -9,6 +9,10 @@ import types
9 9 from pathlib import Path
10 10 from unittest import mock
11 11  
  12 +import pytest
  13 +
  14 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  15 +
12 16  
13 17 def _load_product_enrich_module():
14 18 if "dotenv" not in sys.modules:
... ... @@ -75,6 +79,12 @@ def test_create_prompt_splits_shared_context_and_localized_tail():
75 79  
76 80  
77 81 def test_create_prompt_supports_taxonomy_analysis_kind():
  82 + """Taxonomy schema must produce prompts for every language it declares.
  83 +
  84 + Unsupported (schema, lang) combinations return ``(None, None, None)`` so the
  85 + caller (``process_batch``) can mark the batch as failed without calling LLM,
  86 + instead of silently emitting garbage.
  87 + """
78 88 products = [{"id": "1", "title": "linen dress"}]
79 89  
80 90 shared_zh, user_zh, prefix_zh = product_enrich.create_prompt(
... ... @@ -82,18 +92,26 @@ def test_create_prompt_supports_taxonomy_analysis_kind():
82 92 target_lang="zh",
83 93 analysis_kind="taxonomy",
84 94 )
85   - shared_fr, user_fr, prefix_fr = product_enrich.create_prompt(
  95 + shared_en, user_en, prefix_en = product_enrich.create_prompt(
86 96 products,
87   - target_lang="fr",
  97 + target_lang="en",
88 98 analysis_kind="taxonomy",
89 99 )
90 100  
91 101 assert "apparel attribute taxonomy" in shared_zh
92 102 assert "1. linen dress" in shared_zh
93 103 assert "Language: Chinese" in user_zh
94   - assert "Language: French" in user_fr
  104 + assert "Language: English" in user_en
95 105 assert prefix_zh.startswith("| 序号 | 品类 | 目标性别 |")
96   - assert prefix_fr.startswith("| No. | Product Type | Target Gender |")
  106 + assert prefix_en.startswith("| No. | Product Type | Target Gender |")
  107 +
  108 + # Unsupported (schema, lang) must return a sentinel. French is not declared
  109 + # by any taxonomy schema.
  110 + assert product_enrich.create_prompt(
  111 + products,
  112 + target_lang="fr",
  113 + analysis_kind="taxonomy",
  114 + ) == (None, None, None)
97 115  
98 116  
99 117 def test_call_llm_logs_shared_context_once_and_verbose_contains_full_requests():
... ... @@ -573,7 +591,11 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only():
573 591 seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products)))
574 592 if analysis_kind == "taxonomy":
575 593 assert category_taxonomy_profile == "toys"
576   - assert target_lang == "en"
  594 + # Non-apparel taxonomy profiles only emit en; mirror the real
  595 + # `analyze_products` by returning empty for unsupported langs so the
  596 + # caller drops zh silently.
  597 + if target_lang != "en":
  598 + return []
577 599 return [
578 600 {
579 601 "id": products[0]["id"],
... ... @@ -638,7 +660,6 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only():
638 660 ],
639 661 }
640 662 ]
641   - assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls
642 663 assert ("taxonomy", "en", "toys", ("2",)) in seen_calls
643 664  
644 665  
... ...
tests/test_product_title_exclusion.py
... ... @@ -6,6 +6,10 @@ from query.product_title_exclusion import (
6 6 ProductTitleExclusionRegistry,
7 7 )
8 8  
  9 +import pytest
  10 +
  11 +pytestmark = [pytest.mark.intent, pytest.mark.regression]
  12 +
9 13  
10 14 def test_product_title_exclusion_detector_matches_translated_english_token():
11 15 query_config = QueryConfig(
... ...
tests/test_query_parser_mixed_language.py
1 1 from config import FunctionScoreConfig, IndexConfig, QueryConfig, RerankConfig, SPUConfig, SearchConfig
2 2 from query.query_parser import QueryParser
3 3  
  4 +import pytest
  5 +
  6 +pytestmark = [pytest.mark.query, pytest.mark.regression]
  7 +
4 8  
5 9 class _DummyTranslator:
6 10 def translate(self, text, target_lang, source_lang, scene, model_name):
... ...
tests/test_rerank_client.py
... ... @@ -3,6 +3,10 @@ from math import isclose
3 3 from config.schema import CoarseRankFusionConfig, RerankFusionConfig
4 4 from search.rerank_client import coarse_resort_hits, fuse_scores_and_resort, run_lightweight_rerank
5 5  
  6 +import pytest
  7 +
  8 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  9 +
6 10  
7 11 def test_fuse_scores_and_resort_aggregates_text_components_and_keeps_rerank_primary():
8 12 hits = [
... ...
tests/test_rerank_provider_topn.py
... ... @@ -4,6 +4,10 @@ from typing import Any, Dict
4 4  
5 5 from providers.rerank import HttpRerankProvider
6 6  
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  10 +
7 11  
8 12 class _FakeResponse:
9 13 def __init__(self, status_code: int, data: Dict[str, Any]):
... ...
tests/test_rerank_query_text.py
... ... @@ -2,6 +2,10 @@
2 2  
3 3 from query.query_parser import ParsedQuery, rerank_query_text
4 4  
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  8 +
5 9  
6 10 def test_rerank_query_text_zh_uses_original():
7 11 assert rerank_query_text("你好", detected_language="zh", translations={"en": "hello"}) == "你好"
... ...
tests/test_reranker_dashscope_backend.py
... ... @@ -7,6 +7,8 @@ import pytest
7 7 from reranker.backends import get_rerank_backend
8 8 from reranker.backends.dashscope_rerank import DashScopeRerankBackend
9 9  
  10 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  11 +
10 12  
11 13 @pytest.fixture(autouse=True)
12 14 def _clear_global_dashscope_key(monkeypatch):
... ...
tests/test_reranker_qwen3_gguf_backend.py
... ... @@ -6,6 +6,10 @@ import types
6 6 from reranker.backends import get_rerank_backend
7 7 from reranker.backends.qwen3_gguf import Qwen3GGUFRerankerBackend
8 8  
  9 +import pytest
  10 +
  11 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  12 +
9 13  
10 14 class _FakeLlama:
11 15 def __init__(self, model_path: str | None = None, **kwargs):
... ...
tests/test_reranker_server_topn.py
... ... @@ -4,6 +4,10 @@ from typing import Any, Dict, List
4 4  
5 5 from fastapi.testclient import TestClient
6 6  
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.rerank, pytest.mark.regression]
  10 +
7 11  
8 12 class _FakeTopNReranker:
9 13 _model_name = "fake-topn-reranker"
... ...
tests/test_search_evaluation_datasets.py
1 1 from config.loader import get_app_config
2 2 from scripts.evaluation.eval_framework.datasets import resolve_dataset
3 3  
  4 +import pytest
  5 +
  6 +pytestmark = [pytest.mark.eval]
  7 +
4 8  
5 9 def test_search_evaluation_registry_contains_expected_datasets() -> None:
6 10 se = get_app_config().search_evaluation
... ...
tests/test_search_rerank_window.py
... ... @@ -22,6 +22,10 @@ from context import create_request_context
22 22 from query.style_intent import DetectedStyleIntent, StyleIntentProfile
23 23 from search.searcher import Searcher
24 24  
  25 +import pytest
  26 +
  27 +pytestmark = [pytest.mark.search, pytest.mark.regression]
  28 +
25 29  
26 30 @dataclass
27 31 class _FakeParsedQuery:
... ...
tests/test_sku_intent_selector.py
... ... @@ -6,6 +6,8 @@ from config import QueryConfig
6 6 from query.style_intent import DetectedStyleIntent, StyleIntentProfile, StyleIntentRegistry
7 7 from search.sku_intent_selector import StyleSkuSelector
8 8  
  9 +pytestmark = [pytest.mark.intent, pytest.mark.regression]
  10 +
9 11  
10 12 def test_style_sku_selector_matches_first_sku_by_attribute_terms():
11 13 registry = StyleIntentRegistry.from_query_config(
... ... @@ -537,3 +539,73 @@ def test_image_pick_ignored_when_text_matches_but_visual_url_not_in_text_set():
537 539 assert decision.selected_sku_id == "khaki"
538 540 assert decision.final_source == "option"
539 541 assert decision.image_pick_sku_id == "black"
  542 +
  543 +
  544 +def test_image_pick_matches_when_inner_hit_url_has_query_string():
  545 + """The inner_hits URL carries a query string while the SKU URL has none; they should align after normalization."""
  546 + selector = StyleSkuSelector(_color_registry())
  547 + parsed_query = SimpleNamespace(style_intent_profile=None)
  548 + hits = [
  549 + {
  550 + "_id": "spu-1",
  551 + "_source": {
  552 + "skus": [
  553 + {
  554 + "sku_id": "s1",
  555 + "image_src": "https://cdn/img/p.jpg",
  556 + },
  557 + ],
  558 + },
  559 + "inner_hits": {
  560 + "exact_image_knn_query_hits": {
  561 + "hits": {
  562 + "hits": [
  563 + {
  564 + "_score": 0.8,
  565 + "_source": {"url": "https://cdn/img/p.jpg?width=800&quality=85"},
  566 + }
  567 + ]
  568 + }
  569 + }
  570 + },
  571 + }
  572 + ]
  573 + d = selector.prepare_hits(hits, parsed_query)["spu-1"]
  574 + assert d.selected_sku_id == "s1"
  575 + assert d.final_source == "image"
  576 +
  577 +
  578 +def test_image_pick_uses_nested_offset_and_image_embedding_when_needed():
  579 + """When _source.url is written differently from the SKU URL, use the nested offset to take the canonical URL from image_embedding."""
  580 + selector = StyleSkuSelector(_color_registry())
  581 + parsed_query = SimpleNamespace(style_intent_profile=None)
  582 + hits = [
  583 + {
  584 + "_id": "spu-1",
  585 + "_source": {
  586 + "image_embedding": [
  587 + {"url": "https://cdn/a/spu.jpg"},
  588 + {"url": "https://cdn/b/sku-match.jpg"},
  589 + ],
  590 + "skus": [
  591 + {"sku_id": "sku-a", "image_src": "//cdn/b/sku-match.jpg"},
  592 + ],
  593 + },
  594 + "inner_hits": {
  595 + "exact_image_knn_query_hits": {
  596 + "hits": {
  597 + "hits": [
  598 + {
  599 + "_score": 0.91,
  600 + "_nested": {"field": "image_embedding", "offset": 1},
  601 + "_source": {"url": "https://wrong.example/x.jpg"},
  602 + }
  603 + ]
  604 + }
  605 + }
  606 + },
  607 + }
  608 + ]
  609 + d = selector.prepare_hits(hits, parsed_query)["spu-1"]
  610 + assert d.selected_sku_id == "sku-a"
  611 + assert d.image_pick_url == "https://cdn/b/sku-match.jpg"
... ...
tests/test_style_intent.py
... ... @@ -3,6 +3,10 @@ from types import SimpleNamespace
3 3 from config import QueryConfig
4 4 from query.style_intent import StyleIntentDetector, StyleIntentRegistry
5 5  
  6 +import pytest
  7 +
  8 +pytestmark = [pytest.mark.intent, pytest.mark.regression]
  9 +
6 10  
7 11 def test_style_intent_detector_matches_original_and_translated_queries():
8 12 query_config = QueryConfig(
... ...
tests/test_suggestions.py
... ... @@ -12,6 +12,8 @@ from suggestion.builder import (
12 12 )
13 13 from suggestion.service import SuggestionService
14 14  
  15 +pytestmark = [pytest.mark.suggestion, pytest.mark.regression]
  16 +
15 17  
16 18 class FakeESClient:
17 19 """Lightweight fake ES client for suggestion unit tests."""
... ... @@ -160,7 +162,6 @@ class FakeESClient:
160 162 return sorted([x for x in self.indices if x.startswith(prefix)])
161 163  
162 164  
163   -@pytest.mark.unit
164 165 def test_versioned_index_name_uses_microseconds():
165 166 build_at = datetime(2026, 4, 7, 3, 52, 26, 123456, tzinfo=timezone.utc)
166 167 assert (
... ... @@ -169,7 +170,6 @@ def test_versioned_index_name_uses_microseconds():
169 170 )
170 171  
171 172  
172   -@pytest.mark.unit
173 173 def test_rebuild_cleans_up_unallocatable_new_index():
174 174 fake_es = FakeESClient()
175 175  
... ... @@ -221,7 +221,6 @@ def test_rebuild_cleans_up_unallocatable_new_index():
221 221 assert created_index not in fake_es.indices
222 222  
223 223  
224   -@pytest.mark.unit
225 224 def test_resolve_query_language_prefers_log_field():
226 225 fake_es = FakeESClient()
227 226 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -238,7 +237,6 @@ def test_resolve_query_language_prefers_log_field():
238 237 assert conflict is False
239 238  
240 239  
241   -@pytest.mark.unit
242 240 def test_resolve_query_language_uses_request_params_when_log_missing():
243 241 fake_es = FakeESClient()
244 242 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -256,7 +254,6 @@ def test_resolve_query_language_uses_request_params_when_log_missing():
256 254 assert conflict is False
257 255  
258 256  
259   -@pytest.mark.unit
260 257 def test_resolve_query_language_fallback_to_primary():
261 258 fake_es = FakeESClient()
262 259 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -272,7 +269,6 @@ def test_resolve_query_language_fallback_to_primary():
272 269 assert conflict is False
273 270  
274 271  
275   -@pytest.mark.unit
276 272 def test_suggestion_service_basic_flow_uses_alias_and_routing():
277 273 from config import tenant_config_loader as tcl
278 274  
... ... @@ -309,7 +305,6 @@ def test_suggestion_service_basic_flow_uses_alias_and_routing():
309 305 assert any(x.get("index") == alias_name for x in search_calls)
310 306  
311 307  
312   -@pytest.mark.unit
313 308 def test_publish_alias_and_cleanup_old_versions(monkeypatch):
314 309 fake_es = FakeESClient()
315 310 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -338,7 +333,6 @@ def test_publish_alias_and_cleanup_old_versions(monkeypatch):
338 333 assert "search_suggestions_tenant_162_v20260310170000" not in fake_es.indices
339 334  
340 335  
341   -@pytest.mark.unit
342 336 def test_incremental_bootstrap_when_no_active_index(monkeypatch):
343 337 fake_es = FakeESClient()
344 338 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -363,7 +357,6 @@ def test_incremental_bootstrap_when_no_active_index(monkeypatch):
363 357 assert result["bootstrap_result"]["mode"] == "full"
364 358  
365 359  
366   -@pytest.mark.unit
367 360 def test_incremental_updates_existing_index(monkeypatch):
368 361 fake_es = FakeESClient()
369 362 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -419,7 +412,6 @@ def test_incremental_updates_existing_index(monkeypatch):
419 412 assert len(bulk_calls[0]["actions"]) == 1
420 413  
421 414  
422   -@pytest.mark.unit
423 415 def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch):
424 416 fake_es = FakeESClient()
425 417 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -459,7 +451,6 @@ def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch):
459 451 assert key_to_candidate[qanchor_key].qanchor_spu_ids == {"521"}
460 452  
461 453  
462   -@pytest.mark.unit
463 454 def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch):
464 455 fake_es = FakeESClient()
465 456 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -509,7 +500,6 @@ def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch):
509 500 assert ("en", "ribbed neckline") in key_to_candidate
510 501  
511 502  
512   -@pytest.mark.unit
513 503 def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch):
514 504 fake_es = FakeESClient()
515 505 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ... @@ -542,7 +532,6 @@ def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch):
542 532 assert key_to_candidate[key].text == "Furby Furblets 2-Pack"
543 533  
544 534  
545   -@pytest.mark.unit
546 535 def test_iter_products_requests_dual_sort_and_fields():
547 536 fake_es = FakeESClient()
548 537 builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
... ...
tests/test_tokenization.py
1 1 from query.tokenization import QueryTextAnalysisCache
2 2  
  3 +import pytest
  4 +
  5 +pytestmark = [pytest.mark.query]
  6 +
3 7  
4 8 def test_han_coarse_tokens_follow_model_tokens_instead_of_whole_sentence():
5 9 cache = QueryTextAnalysisCache(
... ...
tests/test_translation_converter_resolution.py
... ... @@ -7,6 +7,8 @@ import pytest
7 7  
8 8 import translation.ct2_conversion as ct2_conversion
9 9  
  10 +pytestmark = [pytest.mark.translation]
  11 +
10 12  
11 13 class _FakeTransformersConverter:
12 14 def __init__(self, model_name_or_path):
... ...
tests/test_translation_deepl_backend.py
1 1 from translation.backends.deepl import DeepLTranslationBackend
2 2  
  3 +import pytest
  4 +
  5 +pytestmark = [pytest.mark.translation, pytest.mark.regression]
  6 +
3 7  
4 8 class _FakeResponse:
5 9 def __init__(self, status_code, payload=None, text=""):
... ...
tests/test_translation_llm_backend.py
... ... @@ -2,6 +2,10 @@ from types import SimpleNamespace
2 2  
3 3 from translation.backends.llm import LLMTranslationBackend
4 4  
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.translation, pytest.mark.regression]
  8 +
5 9  
6 10 class _FakeCompletions:
7 11 def __init__(self, responses):
... ...
tests/test_translation_local_backends.py
... ... @@ -9,6 +9,8 @@ from translation.languages import build_nllb_language_catalog, resolve_nllb_lang
9 9 from translation.service import TranslationService
10 10 from translation.text_splitter import compute_safe_input_token_limit, split_text_for_translation
11 11  
  12 +pytestmark = [pytest.mark.translation, pytest.mark.regression]
  13 +
12 14  
13 15 class _FakeBatch(dict):
14 16 def to(self, device):
... ...
tests/test_translator_failure_semantics.py
... ... @@ -11,6 +11,8 @@ from translation.logging_utils import (
11 11 from translation.service import TranslationService
12 12 from translation.settings import build_translation_config, translation_cache_probe_models
13 13  
  14 +pytestmark = [pytest.mark.translation, pytest.mark.regression]
  15 +
14 16  
15 17 class _FakeCache:
16 18 def __init__(self):
... ...
translation/prompts.py
... ... @@ -30,6 +30,18 @@ TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = {
30 30 "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce in un nome SKU prodotto {target_lang} conciso e accurato, restituisci solo il risultato: {text}",
31 31 "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para um nome SKU de produto {target_lang} conciso e preciso, produza apenas o resultado: {text}",
32 32 },
  33 + "sku_attribute": {
  34 + "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})电商翻译专家,请将原文翻译为{target_lang}商品SKU属性值(如颜色、尺码、材质等),要求简洁准确、符合属性展示习惯,只输出结果:{text}",
  35 + "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) ecommerce translator. Translate into concise {target_lang} product SKU attribute values (e.g. color, size, material), suitable for attribute display, output only the result: {text}",
  36 + "ru": "Вы переводчик e-commerce с {source_lang} ({src_lang_code}) на {target_lang} ({tgt_lang_code}). Переведите в краткие и точные значения атрибутов SKU на {target_lang} (цвет, размер, материал и т.п.), выводите только результат: {text}",
  37 + "ar": "أنت مترجم تجارة إلكترونية من {source_lang} ({src_lang_code}) إلى {target_lang} ({tgt_lang_code}). ترجم إلى قيم سمات SKU للمنتج بلغة {target_lang} (مثل اللون والمقاس والخامة) بإيجاز ودقة، وأخرج النتيجة فقط: {text}",
  38 + "ja": "{source_lang}({src_lang_code})から {target_lang}({tgt_lang_code})へのEC翻訳者として、商品SKUの属性値(色・サイズ・素材など)に簡潔かつ正確に翻訳し、結果のみ出力してください:{text}",
  39 + "es": "Eres un traductor ecommerce de {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce a valores de atributo SKU de producto en {target_lang} (color, talla, material, etc.), concisos y precisos, devuelve solo el resultado: {text}",
  40 + "de": "Du bist ein E-Commerce-Übersetzer von {source_lang} ({src_lang_code}) nach {target_lang} ({tgt_lang_code}). Übersetze in präzise {target_lang} SKU-Produktattributwerte (z. B. Farbe, Größe, Material), nur Ergebnis ausgeben: {text}",
  41 + "fr": "Vous êtes un traducteur e-commerce de {source_lang} ({src_lang_code}) vers {target_lang} ({tgt_lang_code}). Traduisez en valeurs d'attributs SKU produit {target_lang} (couleur, taille, matière, etc.), concises et précises, sortie uniquement : {text}",
  42 + "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduci in valori di attributo SKU prodotto {target_lang} (colore, taglia, materiale, ecc.), concisi e accurati, restituisci solo il risultato: {text}",
  43 + "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para valores de atributo SKU de produto em {target_lang} (cor, tamanho, material etc.), concisos e precisos, produza apenas o resultado: {text}",
  44 + },
33 45 "ecommerce_search_query": {
34 46 "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})翻译助手,请将电商搜索词准确翻译为{target_lang}并符合搜索习惯,只输出结果:{text}",
35 47 "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) translator. Translate the ecommerce search query accurately following {target_lang} search habits, output only the result: {text}",
... ... @@ -113,6 +125,39 @@ BATCH_TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = {
113 125 "Входные данные:\n{text}"
114 126 ),
115 127 },
  128 + "sku_attribute": {
  129 + "en": (
  130 + "Translate each item from {source_lang} ({src_lang_code}) to concise {target_lang} ({tgt_lang_code}) "
  131 + "product SKU attribute values (e.g. color, size, material).\n"
  132 + "Accurately preserve the meaning; keep wording short and suitable for attribute display.\n"
  133 + "Output exactly one line for each input item, in the same order, using this exact format:\n"
  134 + "1. translation\n"
  135 + "2. translation\n"
  136 + "...\n"
  137 + "Do not explain or output anything else.\n"
  138 + "Input:\n{text}"
  139 + ),
  140 + "zh": (
  141 + "将每一项从 {source_lang} ({src_lang_code}) 翻译为简洁的 {target_lang} ({tgt_lang_code}) 商品SKU属性值(如颜色、尺码、材质等)。\n"
  142 + "准确传达含义,措辞简短,适合属性展示。\n"
  143 + "请按输入顺序逐行输出,每个输入对应一行,格式必须如下:\n"
  144 + "1. 翻译结果\n"
  145 + "2. 翻译结果\n"
  146 + "...\n"
  147 + "不要解释或输出其他任何内容。\n"
  148 + "输入:\n{text}"
  149 + ),
  150 + "ru": (
  151 + "Переведите каждый элемент с {source_lang} ({src_lang_code}) на краткие значения атрибутов SKU на {target_lang} ({tgt_lang_code}) (цвет, размер, материал и т.п.).\n"
  152 + "Точно сохраняйте смысл; формулировки должны быть короткими и подходить для отображения атрибутов.\n"
  153 + "Выводите ровно по одной строке для каждого входного элемента в том же порядке, в следующем формате:\n"
  154 + "1. перевод\n"
  155 + "2. перевод\n"
  156 + "...\n"
  157 + "Не добавляйте объяснений и ничего лишнего.\n"
  158 + "Входные данные:\n{text}"
  159 + ),
  160 + },
116 161 "ecommerce_search_query": {
117 162 "en": (
118 163 "Translate each item from {source_lang} ({src_lang_code}) to a natural {target_lang} ({tgt_lang_code}) "
... ...
translation/scenes.py
... ... @@ -18,6 +18,10 @@ SCENE_DEEPL_CONTEXTS: Dict[str, Dict[str, str]] = {
18 18 "zh": "电商搜索词",
19 19 "en": "e-commerce search query",
20 20 },
  21 + "sku_attribute": {
  22 + "zh": "商品SKU属性值",
  23 + "en": "product SKU attribute value",
  24 + },
21 25 }
22 26  
23 27  
... ...
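The new `sku_attribute` templates in `translation/prompts.py` are plain format strings with five placeholders (`source_lang`, `src_lang_code`, `target_lang`, `tgt_lang_code`, `text`). A minimal sketch of how such a template could be rendered follows. The helper `render_prompt` and the fallback to the `en` template are assumptions for illustration only, not the repository's actual lookup logic. Only the `en` template from the diff is reproduced.

```python
from typing import Dict

# Excerpt: the "en" sku_attribute template as added in the diff above.
SKU_ATTRIBUTE_PROMPTS: Dict[str, str] = {
    "en": (
        "You are a professional {source_lang} ({src_lang_code}) to {target_lang} "
        "({tgt_lang_code}) ecommerce translator. Translate into concise {target_lang} "
        "product SKU attribute values (e.g. color, size, material), suitable for "
        "attribute display, output only the result: {text}"
    ),
}


def render_prompt(
    templates: Dict[str, str],
    prompt_lang: str,
    *,
    source_lang: str,
    src_lang_code: str,
    target_lang: str,
    tgt_lang_code: str,
    text: str,
    fallback_lang: str = "en",
) -> str:
    """Hypothetical helper: pick a template by language and fill its placeholders.

    Falling back to `fallback_lang` when `prompt_lang` has no template is an
    assumption of this sketch, not confirmed behaviour of the real service.
    """
    template = templates.get(prompt_lang) or templates[fallback_lang]
    return template.format(
        source_lang=source_lang,
        src_lang_code=src_lang_code,
        target_lang=target_lang,
        tgt_lang_code=tgt_lang_code,
        text=text,
    )


# "fr" is not in this excerpt, so the sketch falls back to the "en" template.
prompt = render_prompt(
    SKU_ATTRIBUTE_PROMPTS,
    "fr",
    source_lang="Chinese",
    src_lang_code="zh",
    target_lang="English",
    tgt_lang_code="en",
    text="藏青色",
)
```

A fallback like this would matter mainly for the batch variant, where the diff adds `sku_attribute` templates for `en`, `zh`, and `ru` only.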