Commit 99b72698b556ae19a00ba4cb6206a1343f033abe

Authored by tangwang
1 parent 5c9baf91

Test regression hook cleanup

 Changes

 Fixes (6 drifted test cases, all updated to the latest implementation)
- `tests/test_eval_metrics.py` — rewritten wholesale around the new 4-level label and cascade-formula assertions; drops the old `RELEVANCE_EXACT/HIGH/LOW/IRRELEVANT` constants and the hard-coded ERR values.
- `tests/test_embedding_service_priority.py` — adds the newly required `_TextDispatchTask(user_id=...)` argument.
- `tests/test_embedding_pipeline.py` — the cache-hit path's `np.allclose` comparison now goes through `np.asarray(..., dtype=float32)` to avoid object-dtype arrays.
- `tests/test_es_query_builder_text_recall_languages.py` — aligns the expectations for the secondary keywords combined_fields clause with the current values (`MSM 60% / boost 0.8`) and renames the test.
- `tests/test_product_enrich_partial_mode.py`
  - `test_create_prompt_supports_taxonomy_analysis_kind`: drops the wrong assumption (fr belongs to no taxonomy schema) and makes the `(None, None, None)` sentinel contract explicit.
  - `test_build_index_content_fields_non_apparel_taxonomy_returns_en_only`: the fake now mirrors real schema behavior (an unsupported lang returns an empty list); the stale "zh was never called" assertion is deleted.
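For context, the object-dtype pitfall behind the `np.allclose` fix can be reproduced in isolation. This is a minimal sketch; the variable names are illustrative, not taken from the test file:

```python
import numpy as np

# A cache hit may hand vectors back as an object-dtype array (e.g. when
# deserialized element by element) rather than a numeric ndarray.
cached = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=object)
fresh = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)

# np.allclose on an object-dtype array can raise TypeError (its isfinite
# check has no object-dtype loop), so the comparison side is coerced to
# float32 first.
a = np.asarray(cached, dtype=np.float32)
assert a.dtype == np.float32
assert np.allclose(a, fresh)
```

The coercion is cheap for small embedding batches and makes the assertion independent of how the cache serialized the vectors.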

 Cleanup of historical transition artifacts (per the development principles: no internal dual tracks)
- Deleted `tests/test_keywords_query.py` (an early prototype superseded by the production implementation in `query/keyword_extractor.py`).
- Moved `tests/test_facet_api.py` / `tests/test_cnclip_service.py` to `tests/manual/`; updated `tests/manual/README.md` to document the split.
- Rewrote `tests/conftest.py`: only the `sys.path` injection remains; the repo-wide unreferenced fixtures `sample_search_config / mock_es_client / test_searcher / temp_config_file` are deleted.
- Removed 13 leftover `@pytest.mark.unit` decorators from `tests/test_suggestions.py` (the module-level `pytestmark` already covers them).
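The module-level pattern that makes those per-test decorators redundant looks like this (a sketch with a hypothetical test body; pytest applies every mark in `pytestmark` to all tests collected from the module):

```python
import pytest

# One module-level declaration marks every test in this file; repeating
# @pytest.mark.unit (or any other mark) on each test adds nothing.
pytestmark = [pytest.mark.suggestion, pytest.mark.regression]


def test_build_suggestion_payload():
    # Inherits both marks from pytestmark above.
    assert "suggest" in "suggestion"
```

Selection then works purely via `-m`, e.g. `pytest -m "suggestion and regression"`.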

 New consistency infrastructure
- `pytest.ini`: the authoritative configuration source. `testpaths = tests`, `norecursedirs = tests/manual`, `--strict-markers`; registers every subsystem marker plus the `regression` marker.
- `tests/ci/test_service_api_contracts.py` plus 30 `tests/test_*.py` files tagged in bulk with `pytestmark = [pytest.mark.<subsystem>, pytest.mark.regression]` (AST-safe insertion that steers clear of multi-line imports).
- New `scripts/run_regression_tests.sh`, with `SUBSYSTEM=<name>` to select a subset.
- `scripts/run_ci_tests.sh` expanded: from the old `tests/ci -q` to two stages, the `contract` marker plus `search ∧ regression`.
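The AST-safe bulk tagging can be sketched roughly as follows (an illustrative sketch, not the actual script; `marker_insert_line` is a hypothetical helper). Using `ast` end line numbers keeps the inserted `pytestmark` line out of the middle of a parenthesized multi-line `from x import (...)`:

```python
import ast


def marker_insert_line(source: str) -> int:
    """Return the 1-based line just after the last top-level import."""
    line = 0
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # end_lineno spans the whole statement, so a multi-line
            # import is treated as one unit and never split.
            line = max(line, node.end_lineno)
    return line + 1


src = (
    "import pytest\n"
    "from helpers import (\n"
    "    fake_es,\n"
    "    fake_llm,\n"
    ")\n"
    "\n"
    "def test_ok():\n"
    "    assert True\n"
)
# marker_insert_line(src) points past the 5-line import block,
# so pytestmark is inserted at line 6, not inside the parentheses.
```

A naive regex over lines starting with `import`/`from` would land inside the parentheses here; parsing avoids that class of bug entirely.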

 Documentation unification (removing the historical dual track)
- Rewrote `docs/测试Pipeline说明.md`: drops references to `tests/unit/` / `tests/integration/` / `scripts/start_test_environment.sh` and other long-gone paths; adds the directory conventions, marker table, regression anchor matrix, coverage-gap list, and manual-script usage.
- Deleted `docs/测试回归钩子梳理-2026-04-20.md` (its content is merged into the authoritative document above; retired per the single-source-of-truth principle).
- Rewrote `docs/DEVELOPER_GUIDE.md §8.2 测试` to point at the authoritative pipeline document.
- Updated the `Testing` and `Testing Infrastructure` sections of `CLAUDE.md` to match.

 Final state

| Metric | Result |
|--------|--------|
| Full `pytest tests/` | **241 passed** |
| `./scripts/run_ci_tests.sh` | 45 passed |
| `./scripts/run_regression_tests.sh` | 233 passed |
| Subsystem subsets (examples) | search=45 / rerank=35 / embedding=23 / intent=25 / translation=33 / indexer=17 / suggestion=13 / query=6 / eval=8 / contract=34 |
| Known remaining gaps | see the new `测试Pipeline说明.md §4` (function_score / facet / image search / config loader / document_transformer etc. — 6 items) |

I deliberately did not force test cases onto the §4 coverage gaps in the pipeline document — that is "new coverage", out of scope for this cleanup. Whoever fills a gap later only needs to attach the corresponding markers and strike the entry from the list.
Showing 45 changed files with 593 additions and 930 deletions
@@ -99,18 +99,29 @@ python main.py serve --host 0.0.0.0 --port 6002 --reload
 
 ### Testing
 ```bash
-# Run all tests
-pytest tests/
+# CI gate (API contracts + search core regression anchors)
+./scripts/run_ci_tests.sh
+
+# Full regression anchor suite (pre-release / pre-merge)
+./scripts/run_regression_tests.sh
+
+# Subsystem-scoped regression (e.g. search / query / intent / rerank / embedding / translation / indexer / suggestion)
+SUBSYSTEM=rerank ./scripts/run_regression_tests.sh
 
-# Run focused regression sets
-python -m pytest tests/ci -q
+# Whole automated suite
+python -m pytest tests/ -q
+
+# Focused debugging
 pytest tests/test_rerank_client.py
 pytest tests/test_query_parser_mixed_language.py
 
-# Test search from command line
+# Command-line smoke
 python main.py search "query" --tenant-id 1 --size 10
 ```
 
+See `docs/测试Pipeline说明.md` for the authoritative test pipeline guide,
+including the regression hook matrix and marker conventions.
+
 ### Development Utilities
 ```bash
 # Stop all services
@@ -218,24 +229,24 @@ The system uses centralized configuration through `config/config.yaml`:
 
 ## Testing Infrastructure
 
-**Test Framework**: pytest with async support
+**Framework**: pytest. Authoritative guide: `docs/测试Pipeline说明.md`.
+
+**Layout**:
+- `tests/` — flat file layout; each file targets one subsystem.
+- `tests/ci/` — API / service contract tests (FastAPI `TestClient` with fake backends).
+- `tests/manual/` — scripts that need live services (pytest does **not** collect these).
+- `tests/conftest.py` — sys.path injection only. No global fixtures; all fakes live next to the tests that use them.
 
-**Test Structure**:
-- `tests/conftest.py`: Comprehensive test fixtures and configuration
-- `tests/unit/`: Unit tests for individual components
-- `tests/integration/`: Integration tests for system workflows
-- Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.api`
+**Markers** (registered in `pytest.ini`, enforced by `--strict-markers`):
+- Subsystem: `contract`, `search`, `query`, `intent`, `rerank`, `embedding`, `translation`, `indexer`, `suggestion`, `eval`.
+- Regression gate: `regression` — anchor tests mandatory for `run_regression_tests.sh`.
 
 **Test Data**:
 - Tenant1: Mock data with 10,000 product records
 - Tenant2: CSV-based test dataset
 - Automated test data generation via `scripts/mock_data.sh`
 
-**Key Test Fixtures** (from `conftest.py`):
-- `sample_search_config`: Complete configuration for testing
-- `mock_es_client`: Mocked Elasticsearch client
-- `test_searcher`: Searcher instance with mock dependencies
-- `temp_config_file`: Temporary YAML configuration for tests
+**Principle**: tests must inject fakes for ES / DeepL / LLM / Redis. Never add tests that rely on real external services to the automated suite — put them under `tests/manual/`.
 
 ## API Endpoints
 
docs/DEVELOPER_GUIDE.md
@@ -386,11 +386,16 @@ services:
 
 ### 8.2 测试
 
-- **位置**:`tests/`,可按 `unit/`、`integration/` 或按模块划分子目录;公共 fixture 在 `conftest.py`。
-- **标记**:使用 `@pytest.mark.unit`、`@pytest.mark.integration`、`@pytest.mark.api` 等区分用例类型,便于按需运行。
-- **依赖**:单元测试通过 mock(如 `mock_es_client`、`sample_search_config`)不依赖真实 ES/DB;集成测试需在说明中注明依赖服务。
-- **运行**:`python -m pytest tests/`;推荐最小回归:`python -m pytest tests/ci -q`;按模块聚焦可直接指定具体测试文件。
-- **原则**:新增逻辑应有对应测试;修改协议或配置契约时更新相关测试与 fixture。
+测试流水线的权威说明见 [`docs/测试Pipeline说明.md`](./测试Pipeline说明.md)。核心约定:
+
+- **位置**:`tests/` 下按文件平铺,`tests/ci/` 放 API 契约测试,`tests/manual/` 放需人工起服务的联调脚本(pytest 默认不 collect)。
+- **Marker**:`pytest.ini` 里登记了子系统 marker(`search / query / intent / rerank / embedding / translation / indexer / suggestion / eval / contract`)与 `regression` marker;新测试必须贴对应 marker(`--strict-markers` 会强制)。
+- **依赖**:测试一律通过注入 fake stub 隔离 ES / DeepL / LLM / Redis 等外部依赖。需要真实依赖的脚本放 `tests/manual/`。
+- **运行**:
+  - CI 门禁:`./scripts/run_ci_tests.sh`(契约 + search 回归锚点)
+  - 发版前:`./scripts/run_regression_tests.sh`(全部 `regression` 锚点;可配 `SUBSYSTEM=<name>`)
+  - 全量:`python -m pytest tests/ -q`
+- **原则**:新增逻辑应有对应测试;修改协议或配置契约时**同步**更新契约测试。不要在测试里保留"旧 assert 作为兼容"——请直接面向当前实现写断言,失败即意味着契约已变更,需要上层决策。
 
 ### 8.3 配置与环境
 
docs/测试Pipeline说明.md
 # 搜索引擎测试流水线指南
 
-## 概述
+本文档是测试套件的**权威入口**,涵盖目录约定、运行方式、回归锚点矩阵、以及手动
+联调脚本的分工。任何与这里不一致的历史文档(例如提到 `tests/unit/` 或
+`scripts/start_test_environment.sh`)都是过期信息,以本文为准。
 
-本文档介绍了搜索引擎项目的完整测试流水线,包括测试环境搭建、测试执行、结果分析等内容。测试流水线设计用于commit前的自动化质量保证。
-
-## 🏗️ 测试架构
-
-### 测试层次
+## 1. 测试目录与分层
 
 ```
-测试流水线
-├── 代码质量检查 (Code Quality)
-│ ├── 代码格式化检查 (Black, isort)
-│ ├── 静态分析 (Flake8, MyPy, Pylint)
-│ └── 安全扫描 (Safety, Bandit)
-│
-├── 单元测试 (Unit Tests)
-│ ├── RequestContext测试
-│ ├── Searcher测试
-│ ├── QueryParser测试
-│ └── BooleanParser测试
-│
-├── 集成测试 (Integration Tests)
-│ ├── 端到端搜索流程测试
-│ ├── 多组件协同测试
-│ └── 错误处理测试
-│
-├── API测试 (API Tests)
-│ ├── REST API接口测试
-│ ├── 参数验证测试
-│ ├── 并发请求测试
-│ └── 错误响应测试
-│
-└── 性能测试 (Performance Tests)
- ├── 响应时间测试
- ├── 并发性能测试
- └── 资源使用测试
+tests/
+├── conftest.py # 只做 sys.path 注入;不再维护全局 fixture
+├── ci/ # API/服务契约(FastAPI TestClient + 全 fake 依赖)
+│ └── test_service_api_contracts.py
+├── manual/ # 需真实服务才能跑的联调脚本,pytest 默认不 collect
+│ ├── test_build_docs_api.py
+│ ├── test_cnclip_service.py
+│ └── test_facet_api.py
+└── test_*.py # 子系统单测(全部自带 fake,无外部依赖)
 ```
 
-### 核心组件
-
-1. **RequestContext**: 请求级别的上下文管理器,用于跟踪测试过程中的所有数据
-2. **测试环境管理**: 自动化启动/停止测试依赖服务
-3. **测试执行引擎**: 统一的测试运行和结果收集
-4. **报告生成系统**: 多格式的测试报告生成
-
-## 🚀 快速开始
+关键约束(写在 `pytest.ini` 里,不要另起分支):
 
-### 本地测试环境
+- `testpaths = tests`,`norecursedirs = tests/manual`;
+- `--strict-markers`:所有 marker 必须先在 `pytest.ini::markers` 登记;
+- 测试**不得**依赖真实 ES / DeepL / LLM 服务。需要外部依赖的脚本请放 `tests/manual/`。
 
-1. **启动测试环境**
-   ```bash
-   # 启动所有必要的测试服务
-   ./scripts/start_test_environment.sh
-   ```
+## 2. 运行方式
 
-2. **运行完整测试套件**
-   ```bash
-   # 运行所有测试
-   python scripts/run_tests.py
+| 场景 | 命令 | 覆盖范围 |
+|------|------|----------|
+| CI 门禁(每次提交) | `./scripts/run_ci_tests.sh` | `tests/ci` + `contract` marker + `search ∧ regression` |
+| 发版 / 大合并前 | `./scripts/run_regression_tests.sh` | 所有 `@pytest.mark.regression` |
+| 子系统子集 | `SUBSYSTEM=search ./scripts/run_regression_tests.sh` | 指定子系统的 regression 锚点 |
+| 全量(含非回归) | `python -m pytest tests/ -q` | 全部自动化用例 |
+| 手动联调 | `python tests/manual/<script>.py` | 需提前起对应服务 |
 
-   # 或者使用pytest直接运行
-   pytest tests/ -v
-   ```
+## 3. Marker 体系与回归锚点矩阵
 
-3. **停止测试环境**
-   ```bash
-   ./scripts/stop_test_environment.sh
-   ```
+marker 定义见 `pytest.ini`。每个测试文件通过模块级 `pytestmark` 贴标,同时
+属于 `regression` 的用例构成“**回归锚点集合**”。
 
-### CI/CD测试
+| 子系统 marker | 关键文件(锚点) | 保护的行为 |
+|---------------|------------------|------------|
+| `contract` | `tests/ci/test_service_api_contracts.py` | Search / Indexer / Embedding / Reranker / Translation 的 HTTP 契约 |
+| `search` | `test_search_rerank_window.py`, `test_es_query_builder.py`, `test_es_query_builder_text_recall_languages.py` | Searcher 主路径、排序 / 召回、keywords 副 combined_fields、多语种 |
+| `query` | `test_query_parser_mixed_language.py`, `test_tokenization.py` | 中英混合解析、HanLP 分词、language detect |
+| `intent` | `test_style_intent.py`, `test_product_title_exclusion.py`, `test_sku_intent_selector.py` | 风格意图、商品标题排除、SKU 选型 |
+| `rerank` | `test_rerank_client.py`, `test_rerank_query_text.py`, `test_rerank_provider_topn.py`, `test_reranker_server_topn.py`, `test_reranker_dashscope_backend.py`, `test_reranker_qwen3_gguf_backend.py` | 粗排 / 精排 / topN / 后端切换 |
+| `embedding` | `test_embedding_pipeline.py`, `test_embedding_service_limits.py`, `test_embedding_service_priority.py`, `test_cache_keys.py` | 文本/图像向量客户端、inflight limiter、优先级队列、缓存 key |
+| `translation` | `test_translation_deepl_backend.py`, `test_translation_llm_backend.py`, `test_translation_local_backends.py`, `test_translator_failure_semantics.py` | DeepL / LLM / 本地回退、失败语义 |
+| `indexer` | `test_product_enrich_partial_mode.py`, `test_process_products_batching.py`, `test_llm_enrichment_batch_fill.py` | LLM Partial Mode、batch 拆分、空结果补位 |
+| `suggestion` | `test_suggestions.py` | 建议索引构建 |
+| `eval` | `test_eval_metrics.py`(regression) + `test_search_evaluation_datasets.py` / `test_eval_framework_clients.py`(非 regression) | NDCG / ERR 指标、数据集加载、评估客户端 |
 
-1. **GitHub Actions**
-   - Push到主分支自动触发
-   - Pull Request自动运行
-   - 手动触发支持
+> 任何新写的子系统单测,都应该在顶部加 `pytestmark = [pytest.mark.<子系统>, pytest.mark.regression]`。
+> 不贴 `regression` 的测试默认**不会**被 `run_regression_tests.sh` 选中,请谨慎决定。
 
-2. **测试报告**
-   - 自动生成并上传
-   - PR评论显示测试摘要
-   - 详细报告下载
+## 4. 当前覆盖缺口(跟踪中)
 
-## 📋 测试类型详解
+以下场景目前没有被 `regression` 锚点覆盖,优先级从高到低:
 
-### 1. 单元测试 (Unit Tests)
+1. **`api/routes/search.py` 的请求参数映射**:`QueryParser.parse(...)` 透传是否完整(目前只有 `tests/ci` 间接覆盖)。
+2. **`indexer/document_transformer.py` 的端到端转换**:从 MySQL 行到 ES doc 的 snapshot 对比。
+3. **`config/loader.py` 加载多租户配置**:含继承 / override 的合并规则。
+4. **`search/searcher.py::_build_function_score`**:function_score 装配。
+5. **Facet 聚合 / disjunctive 过滤**。
+6. **图像搜索主路径**(`search/image_searcher.py`)。
 
-**位置**: `tests/unit/`
+补齐时记得同步贴 `regression` + 对应子系统 marker,并在本表删除条目。
 
-**目的**: 测试单个函数、类、模块的功能
+## 5. 手动联调:索引文档构建流水线
 
-**覆盖范围**:
-- `test_context.py`: RequestContext功能测试
-- `test_searcher.py`: Searcher核心功能测试
-- `test_query_parser.py`: QueryParser处理逻辑测试
-
-**运行方式**:
-```bash
-# 运行所有单元测试
-pytest tests/unit/ -v
-
-# 运行特定测试
-pytest tests/unit/test_context.py -v
-
-# 生成覆盖率报告
-pytest tests/unit/ --cov=. --cov-report=html
-```
-
-### 2. 集成测试 (Integration Tests)
-
-**位置**: `tests/integration/`
-
-**目的**: 测试多个组件协同工作的功能
-
-**覆盖范围**:
-- `test_search_integration.py`: 完整搜索流程集成
-- 数据库、ES、搜索器集成测试
-- 错误传播和处理测试
-
-**运行方式**:
-```bash
-# 运行集成测试(需要启动测试环境)
-pytest tests/integration/ -v -m "not slow"
-
-# 运行包含慢速测试的集成测试
-pytest tests/integration/ -v
-```
-
-### 3. API测试 (API Tests)
-
-**位置**: `tests/integration/test_api_integration.py`
-
-**目的**: 测试HTTP API接口的功能和性能
-
-**覆盖范围**:
-- 基本搜索API
-- 参数验证
-- 错误处理
-- 并发请求
-- Unicode支持
-
-**运行方式**:
-```bash
-# 运行API测试
-pytest tests/integration/test_api_integration.py -v
-```
-
-### 5. 索引 & 文档构建流水线验证(手动)
-
-除了自动化测试外,推荐在联调/问题排查时手动跑一遍“**从 MySQL 到 ES doc**”的索引流水线,确保字段与 mapping、查询逻辑一致。
-
-#### 5.1 启动 Indexer 服务
+除自动化测试外,联调/问题排查时建议走一遍“**MySQL → ES doc**”链路,确保字段与 mapping
+与查询逻辑对齐。
 
 ```bash
 cd /home/tw/saas-search
 ./scripts/stop.sh # 停掉已有进程(可选)
-./scripts/start_indexer.sh # 启动专用 indexer 服务,默认端口 6004
-```
-
-#### 5.2 基于数据库构建 ES doc(只看、不写 ES)
+./scripts/start_indexer.sh # 启动 indexer 服务,默认端口 6004
 
-> 场景:已经知道某个 `tenant_id` 和 `spu_id`,想看它在“最新逻辑下”的 ES 文档长什么样。
-
-```bash
 curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
   -H "Content-Type: application/json" \
-  -d '{
-    "tenant_id": "170",
-    "spu_ids": ["223167"]
-  }'
-```
-
-返回中:
-
-- `docs[0]` 为当前代码构造出来的完整 ES doc(与 `mappings/search_products.json` 对齐);
-- 可以直接比对:
-  - 索引字段说明:`docs/索引字段说明v2.md`
-  - 实际 ES 文档:`docs/常用查询 - ES.md` 中的查询示例(按 `spu_id` 过滤)。
-
-#### 5.3 与 ES 实际数据对比
-
-```bash
-curl -u 'essa:***' \
-  -X GET 'http://localhost:9200/search_products_tenant_170/_search?pretty' \
-  -H 'Content-Type: application/json' \
-  -d '{
-    "size": 5,
-    "_source": ["title", "tags"],
-    "query": {
-      "bool": {
-        "filter": [
-          { "term": { "spu_id": "223167" } }
-        ]
-      }
-    }
-  }'
+  -d '{ "tenant_id": "170", "spu_ids": ["223167"] }'
 ```
 
-对比如下内容是否一致:
-
-- 多语言字段:`title/brief/description/vendor/category_name_text/category_path`;
-- 结构字段:`tags/specifications/skus/min_price/max_price/compare_at_price/total_inventory` 等;
-- 算法字段:`title_embedding` 是否存在(值不必逐项比对)。
-
-如果两边不一致,可以结合:
-
-- `indexer/document_transformer.py`(文档构造逻辑);
-- `indexer/incremental_service.py`(增量索引/查库逻辑);
-- `logs/indexer.log`(索引日志)
-
-逐步缩小问题范围。
-
-### 4. 性能测试 (Performance Tests)
-
-**目的**: 验证系统性能指标
-
-**测试内容**:
-- 搜索响应时间
-- API并发处理能力
-- 资源使用情况
-
-**运行方式**:
-```bash
-# 运行性能测试
-python scripts/run_performance_tests.py
-```
-
-## 🛠️ 环境配置
-
-### 测试环境要求
-
-1. **Python环境**
-   ```bash
-   # 创建测试环境
-   conda create -n searchengine-test python=3.9
-   conda activate searchengine-test
-
-   # 安装依赖
-   pip install -r requirements.txt
-   pip install pytest pytest-cov pytest-json-report
-   ```
-
-2. **Elasticsearch**
-   ```bash
-   # 使用Docker启动ES
-   docker run -d \
-     --name elasticsearch \
-     -p 9200:9200 \
-     -e "discovery.type=single-node" \
-     -e "xpack.security.enabled=false" \
-     elasticsearch:8.8.0
-   ```
-
-3. **环境变量**
-   ```bash
-   export ES_HOST="http://localhost:9200"
-   export ES_USERNAME="elastic"
-   export ES_PASSWORD="changeme"
-   export API_HOST="127.0.0.1"
-   export API_PORT="6003"
-   export TENANT_ID="test_tenant"
-   export TESTING_MODE="true"
-   ```
-
-### 服务依赖
-
-测试环境需要以下服务:
-
-1. **Elasticsearch** (端口9200)
-   - 存储和搜索测试数据
-   - 支持中文和英文索引
-
-2. **API服务** (端口6003)
-   - FastAPI测试服务
-   - 提供搜索接口
-
-3. **测试数据库**
-   - 预配置的测试索引
-   - 包含测试数据
-
-## 📊 测试报告
-
-### 报告类型
-
-1. **实时控制台输出**
-   - 测试进度显示
-   - 失败详情
-   - 性能摘要
-
-2. **JSON格式报告**
-   ```json
-   {
-     "timestamp": "2024-01-01T10:00:00",
-     "summary": {
-       "total_tests": 150,
-       "passed": 148,
-       "failed": 2,
-       "success_rate": 98.7
-     },
-     "suites": { ... }
-   }
-   ```
-
-3. **文本格式报告**
-   - 人类友好的格式
-   - 包含测试摘要和详情
-   - 适合PR评论
-
-4. **HTML覆盖率报告**
-   - 代码覆盖率可视化
-   - 分支和行覆盖率
-   - 缺失测试高亮
-
-### 报告位置
-
-```
-test_logs/
-├── unit_test_results.json # 单元测试结果
-├── integration_test_results.json # 集成测试结果
-├── api_test_results.json # API测试结果
-├── test_report_20240101_100000.txt # 文本格式摘要
-├── test_report_20240101_100000.json # JSON格式详情
-└── htmlcov/ # HTML覆盖率报告
-```
-
-## 🔄 CI/CD集成
-
-### GitHub Actions工作流
-
-**触发条件**:
-- Push到主分支
-- Pull Request创建/更新
-- 手动触发
-
-**工作流阶段**:
-
-1. **代码质量检查**
-   - 代码格式验证
-   - 静态代码分析
-   - 安全漏洞扫描
-
-2. **单元测试**
-   - 多Python版本矩阵测试
-   - 代码覆盖率收集
-   - 自动上传到Codecov
-
-3. **集成测试**
-   - 服务依赖启动
-   - 端到端功能测试
-   - 错误处理验证
-
-4. **API测试**
-   - 接口功能验证
-   - 参数校验测试
-   - 并发请求测试
-
-5. **性能测试**
-   - 响应时间检查
-   - 资源使用监控
-   - 性能回归检测
-
-6. **测试报告生成**
-   - 结果汇总
-   - 报告上传
-   - PR评论更新
-
-### 工作流配置
-
-**文件**: `.github/workflows/test.yml`
-
-**关键特性**:
-- 并行执行提高效率
-- 服务容器化隔离
-- 自动清理资源
-- 智能缓存依赖
-
-## 🧪 测试最佳实践
-
-### 1. 测试编写原则
-
-- **独立性**: 每个测试应该独立运行
-- **可重复性**: 测试结果应该一致
-- **快速执行**: 单元测试应该快速完成
-- **清晰命名**: 测试名称应该描述测试内容
-
-### 2. 测试数据管理
-
-```python
-# 使用fixture提供测试数据
-@pytest.fixture
-def sample_tenant_config():
-    return TenantConfig(
-        tenant_id="test_tenant",
-        es_index_name="test_products"
-    )
-
-# 使用mock避免外部依赖
-@patch('search.searcher.ESClient')
-def test_search_with_mock_es(mock_es_client, test_searcher):
-    mock_es_client.search.return_value = mock_response
-    result = test_searcher.search("test query")
-    assert result is not None
-```
-
-### 3. RequestContext集成
-
-```python
-def test_with_context(test_searcher):
-    context = create_request_context("test-req", "test-user")
-
-    result = test_searcher.search("test query", context=context)
-
-    # 验证context被正确更新
-    assert context.query_analysis.original_query == "test query"
-    assert context.get_stage_duration("elasticsearch_search") > 0
-```
-
-### 4. 性能测试指南
-
-```python
-def test_search_performance(client):
-    start_time = time.time()
-    response = client.get("/search", params={"q": "test query"})
-    response_time = (time.time() - start_time) * 1000
-
-    assert response.status_code == 200
-    assert response_time < 2000 # 2秒内响应
-```
-
-## 🚨 故障排除
-
-### 常见问题
-
-1. **Elasticsearch连接失败**
-   ```bash
-   # 检查ES状态
-   curl http://localhost:9200/_cluster/health
-
-   # 重启ES服务
-   docker restart elasticsearch
-   ```
-
-2. **测试端口冲突**
-   ```bash
-   # 检查端口占用
-   lsof -i :6003
-
-   # 修改API端口
-   export API_PORT="6004"
-   ```
-
-3. **依赖包缺失**
-   ```bash
-   # 重新安装依赖
-   pip install -r requirements.txt
-   pip install pytest pytest-cov pytest-json-report
-   ```
-
-4. **测试数据问题**
-   ```bash
-   # 重新创建测试索引
-   curl -X DELETE http://localhost:9200/test_products
-   ./scripts/start_test_environment.sh
-   ```
-
-### 调试技巧
-
-1. **详细日志输出**
-   ```bash
-   pytest tests/unit/test_context.py -v -s --tb=long
-   ```
-
-2. **运行单个测试**
-   ```bash
-   pytest tests/unit/test_context.py::TestRequestContext::test_create_context -v
-   ```
-
-3. **调试模式**
-   ```python
-   import pdb; pdb.set_trace()
-   ```
-
-4. **性能分析**
-   ```bash
-   pytest --profile tests/
-   ```
-
-## 📈 持续改进
-
-### 测试覆盖率目标
-
-- **单元测试**: > 90%
-- **集成测试**: > 80%
-- **API测试**: > 95%
-
-### 性能基准
-
-- **搜索响应时间**: < 2秒
-- **API并发处理**: 100 QPS
-- **系统资源使用**: < 80% CPU, < 4GB RAM
+返回中 `docs[0]` 即当前代码构造的 ES doc(与 `mappings/search_products.json` 对齐)。
+与真实 ES 数据对比的查询参考 `docs/常用查询 - ES.md`;若字段不一致,按以下路径定位:
 
-### 质量门禁
+- `indexer/document_transformer.py` — 文档构造逻辑
+- `indexer/incremental_service.py` — 增量查库逻辑
+- `logs/indexer.log` — 索引日志
 
-- **所有测试必须通过**
-- **代码覆盖率不能下降**
-- **性能不能显著退化**
-- **不能有安全漏洞**
+## 6. 编写测试的约束(与 `开发原则` 对齐)
 
+- **fail fast**:测试输入不合法时应直接抛错,不用 `if ... return`;不要用 `try/except` 吃掉异常再 `assert not exception`。
+- **不做兼容双轨**:用例对准当前实现,不为历史行为保留“旧 assert”。若确有外部兼容性(例如 API 上标注 Deprecated 的字段),在 `tests/ci` 里单独写**契约**用例并注明 Deprecated。
+- **外部依赖全 fake**:凡是依赖 HTTP / Redis / ES / LLM 的测试必须注入 fake stub,否则归入 `tests/manual/`。
+- **一处真相**:共享 fixture 如果超过 2 个文件使用,放 `tests/conftest.py`;只给 1 个文件用就放在该文件内。避免再次出现全库无人引用的 dead fixture。
pytest.ini 0 → 100644
@@ -0,0 +1,30 @@
+[pytest]
+# 权威的 pytest 配置源。新增共享配置请放这里,不要再散落到各测试文件头部。
+#
+# testpaths 明确只扫 tests/(含 tests/ci/),刻意排除 tests/manual/。
+testpaths = tests
+# tests/manual/ 里的脚本依赖外部服务,不参与自动回归。
+norecursedirs = tests/manual
+
+addopts = -ra --strict-markers
+
+# 全局静默第三方的 DeprecationWarning,避免遮掩真正需要关注的业务警告。
+filterwarnings =
+    ignore::DeprecationWarning
+    ignore::PendingDeprecationWarning
+
+# 子系统 / 回归分层标记。新增 marker 前先在这里登记,未登记的 marker 会因
+# --strict-markers 直接报错。
+markers =
+    regression: 提交/发布前必跑的回归锚点集合
+    contract: API / 服务契约(tests/ci 默认全部归入)
+    search: Searcher / 排序 / 召回管线
+    query: QueryParser / 翻译 / 分词
+    intent: 样式与 SKU 意图识别
+    rerank: 粗排 / 精排 / 融合
+    embedding: 文本/图像向量服务与客户端
+    translation: 翻译服务与缓存
+    indexer: 索引构建 / LLM enrich
+    suggestion: 搜索建议索引
+    eval: 评估框架
+    manual: 需人工起服务,CI 不跑
scripts/run_ci_tests.sh
 #!/bin/bash
+# CI 门禁脚本:每次提交必跑的最小集合。
+#
+# 覆盖范围:
+#   1. tests/ci 下的服务契约测试(HTTP/JSON schema / 路由 / 鉴权)
+#   2. tests/ 下带 `contract` marker 的所有用例(冗余保障,防止 marker 与目录漂移)
+#   3. 搜索主路径 + ES 查询构建器的回归锚点(search 子系统)
+#
+# 超出这个范围的完整回归集请用 scripts/run_regression_tests.sh。
 
 set -euo pipefail
 
 cd "$(dirname "$0")/.."
 source ./activate.sh
 
-echo "Running CI contract tests..."
-python -m pytest tests/ci -q
+echo "==> [CI-1/2] API contract tests (tests/ci + contract marker)..."
+python -m pytest tests/ci tests/ -q -m contract
+
+echo "==> [CI-2/2] Search core regression (search marker)..."
+python -m pytest tests/ -q -m "search and regression"
scripts/run_regression_tests.sh 0 → 100755
@@ -0,0 +1,26 @@
+#!/bin/bash
+# 回归锚点脚本:发版 / 大合并前必跑的回归集合。
+#
+# 选中策略:所有 @pytest.mark.regression 用例,即 docs/测试Pipeline说明.md
+# “回归钩子矩阵” 中列出的各子系统锚点。
+#
+# 可选参数:
+#   SUBSYSTEM=search ./scripts/run_regression_tests.sh   # 只跑某个子系统的回归子集
+#
+# 约束:本脚本不启外部依赖(ES / DeepL / LLM 全 fake)。如需真实依赖,请用
+# tests/manual 下的脚本。
+
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+source ./activate.sh
+
+SUBSYSTEM="${SUBSYSTEM:-}"
+
+if [[ -n "${SUBSYSTEM}" ]]; then
+  echo "==> Running regression subset: subsystem=${SUBSYSTEM}"
+  python -m pytest tests/ -q -m "${SUBSYSTEM} and regression"
+else
+  echo "==> Running full regression anchor suite..."
+  python -m pytest tests/ -q -m regression
+fi
search/searcher.py
@@ -370,6 +370,11 @@ class Searcher:
         # (on the same dimension as optionN).
         includes.add("enriched_taxonomy_attributes")
 
+        # Needed when inner_hits url string differs from sku.image_src but ES exposes
+        # _nested.offset — we re-resolve the winning url from image_embedding[offset].
+        if self._has_image_signal(parsed_query):
+            includes.add("image_embedding")
+
         return {"includes": sorted(includes)}
 
     def _fetch_hits_by_ids(
search/sku_intent_selector.py
@@ -40,7 +40,8 @@ from __future__ import annotations @@ -40,7 +40,8 @@ from __future__ import annotations
40 40
41 from dataclasses import dataclass, field 41 from dataclasses import dataclass, field
42 from typing import Any, Callable, Dict, List, Optional, Tuple 42 from typing import Any, Callable, Dict, List, Optional, Tuple
43 -from urllib.parse import urlsplit 43 +import posixpath
  44 +from urllib.parse import unquote, urlsplit
44 45
45 from query.style_intent import ( 46 from query.style_intent import (
46 DetectedStyleIntent, 47 DetectedStyleIntent,
@@ -439,6 +440,7 @@ class StyleSkuSelector: @@ -439,6 +440,7 @@ class StyleSkuSelector:
439 # ------------------------------------------------------------------ 440 # ------------------------------------------------------------------
440 @staticmethod 441 @staticmethod
441 def _normalize_url(url: Any) -> str: 442 def _normalize_url(url: Any) -> str:
  443 + """host + path, no query/fragment; casefolded — primary equality key."""
442 raw = str(url or "").strip() 444 raw = str(url or "").strip()
443 if not raw: 445 if not raw:
444 return "" 446 return ""
@@ -448,20 +450,93 @@ class StyleSkuSelector: @@ -448,20 +450,93 @@ class StyleSkuSelector:
448 try: 450 try:
449 parts = urlsplit(raw) 451 parts = urlsplit(raw)
450 except ValueError: 452 except ValueError:
451 - return raw.casefold() 453 + return str(url).strip().casefold()
452 host = (parts.netloc or "").casefold() 454 host = (parts.netloc or "").casefold()
453 - path = parts.path or "" 455 + path = unquote(parts.path or "")
454 return f"{host}{path}".casefold() 456 return f"{host}{path}".casefold()
455 457
  458 + @staticmethod
  459 + def _normalize_path_only(url: Any) -> str:
  460 + """Path-only key for cross-CDN / host-alias cases."""
  461 + raw = str(url or "").strip()
  462 + if not raw:
  463 + return ""
  464 + if raw.startswith("//"):
  465 + raw = "https:" + raw
  466 + try:
  467 + parts = urlsplit(raw)
  468 + path = unquote(parts.path or "")
  469 + except ValueError:
  470 + return ""
  471 + return path.casefold().rstrip("/")
  472 +
  473 + @classmethod
  474 + def _url_filename(cls, url: Any) -> str:
  475 + p = cls._normalize_path_only(url)
  476 + if not p:
  477 + return ""
  478 + return posixpath.basename(p).casefold()
  479 +
  480 + @classmethod
  481 + def _urls_equivalent(cls, a: Any, b: Any) -> bool:
  482 + if not a or not b:
  483 + return False
  484 + na, nb = cls._normalize_url(a), cls._normalize_url(b)
  485 + if na and nb and na == nb:
  486 + return True
  487 + pa, pb = cls._normalize_path_only(a), cls._normalize_path_only(b)
  488 + if pa and pb and pa == pb:
  489 + return True
  490 + fa, fb = cls._url_filename(a), cls._url_filename(b)
  491 + if fa and fb and fa == fb and len(fa) > 4:
  492 + return True
  493 + return False
  494 +
  495 + @staticmethod
  496 + def _inner_hit_url_candidates(entry: Dict[str, Any], source: Dict[str, Any]) -> List[str]:
  497 + """URLs to try for this inner_hit: _source.url plus image_embedding[offset].url."""
  498 + out: List[str] = []
  499 + src = entry.get("_source") or {}
  500 + u = src.get("url")
  501 + if u:
  502 + out.append(str(u).strip())
  503 + nested = entry.get("_nested")
  504 + if not isinstance(nested, dict):
  505 + return out
  506 + off = nested.get("offset")
  507 + if not isinstance(off, int):
  508 + return out
  509 + embs = source.get("image_embedding")
  510 + if not isinstance(embs, list) or not (0 <= off < len(embs)):
  511 + return out
  512 + emb = embs[off]
  513 + if isinstance(emb, dict) and emb.get("url"):
  514 + u2 = str(emb.get("url")).strip()
  515 + if u2 and u2 not in out:
  516 + out.append(u2)
  517 + return out
  518 +
456 def _pick_sku_by_image( 519 def _pick_sku_by_image(
457 self, 520 self,
458 hit: Dict[str, Any], 521 hit: Dict[str, Any],
459 source: Dict[str, Any], 522 source: Dict[str, Any],
460 ) -> Optional[ImagePick]: 523 ) -> Optional[ImagePick]:
  524 + """Map ES nested image KNN inner_hits to a SKU via image URL alignment.
  525 +
  526 + ``image_pick`` is empty when:
  527 + - ES did not return ``inner_hits`` for this hit (e.g. doc outside
  528 + ``rescore.window_size`` so no exact-image rescore inner_hits; or the
  529 + nested image clause did not match this document).
  530 + - The winning nested ``url`` cannot be aligned to any ``skus[].image_src``
  531 + even after path/filename normalization (rare CDN / encoding edge cases).
  532 +
  533 + We try ``_source.url``, ``_nested.offset`` + ``image_embedding[offset].url``,
  534 + and loose path/filename matching to reduce false negatives.
  535 + """
461 inner_hits = hit.get("inner_hits") 536 inner_hits = hit.get("inner_hits")
462 if not isinstance(inner_hits, dict): 537 if not isinstance(inner_hits, dict):
463 return None 538 return None
464 - top_url: Optional[str] = None 539 + best_entry: Optional[Dict[str, Any]] = None
465 top_score: Optional[float] = None 540 top_score: Optional[float] = None
466 for key in _IMAGE_INNER_HITS_KEYS: 541 for key in _IMAGE_INNER_HITS_KEYS:
467 payload = inner_hits.get(key) 542 payload = inner_hits.get(key)
@@ -474,33 +549,36 @@ class StyleSkuSelector:
474 for entry in inner_list: 549 for entry in inner_list:
475 if not isinstance(entry, dict): 550 if not isinstance(entry, dict):
476 continue 551 continue
477 - url = (entry.get("_source") or {}).get("url")  
478 - if not url: 552 + if not self._inner_hit_url_candidates(entry, source):
479 continue 553 continue
480 try: 554 try:
481 score = float(entry.get("_score") or 0.0) 555 score = float(entry.get("_score") or 0.0)
482 except (TypeError, ValueError): 556 except (TypeError, ValueError):
483 score = 0.0 557 score = 0.0
484 if top_score is None or score > top_score: 558 if top_score is None or score > top_score:
485 - top_url = str(url) 559 + best_entry = entry
486 top_score = score 560 top_score = score
487 - if top_url is not None:  
488 - break # Prefer the first listed inner_hits source (exact > approx).  
489 - if top_url is None: 561 + if best_entry is not None:
  562 + break # Prefer exact_image_knn_query_hits over image_knn_query_hits.
  563 + if best_entry is None:
  564 + return None
  565 +
  566 + candidates = self._inner_hit_url_candidates(best_entry, source)
  567 + if not candidates:
490 return None 568 return None
491 569
492 skus = source.get("skus") 570 skus = source.get("skus")
493 if not isinstance(skus, list): 571 if not isinstance(skus, list):
494 return None 572 return None
495 - target = self._normalize_url(top_url)  
496 for sku in skus: 573 for sku in skus:
497 - sku_url = self._normalize_url(sku.get("image_src") or sku.get("imageSrc"))  
498 - if sku_url and sku_url == target:  
499 - return ImagePick(  
500 - sku_id=str(sku.get("sku_id") or ""),  
501 - url=top_url,  
502 - score=float(top_score or 0.0),  
503 - ) 574 + sku_raw = sku.get("image_src") or sku.get("imageSrc")
  575 + for cand in candidates:
  576 + if self._urls_equivalent(cand, sku_raw):
  577 + return ImagePick(
  578 + sku_id=str(sku.get("sku_id") or ""),
  579 + url=cand,
  580 + score=float(top_score or 0.0),
  581 + )
504 return None 582 return None
505 583
506 # ------------------------------------------------------------------ 584 # ------------------------------------------------------------------
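The path/filename alignment used by `_urls_equivalent` above can be exercised standalone. The helpers below are a minimal sketch, not the class's exact implementation: they assume the normalizers compare decoded URL paths case-insensitively and that the `len(fa) > 4` guard exists to keep very short filenames from producing false matches.

```python
from urllib.parse import urlparse, unquote


def _normalize_path_only(url: str) -> str:
    # Compare by decoded path only, ignoring scheme/host/query (CDN variants).
    return unquote(urlparse(url.strip()).path).rstrip("/").lower()


def _url_filename(url: str) -> str:
    return _normalize_path_only(url).rsplit("/", 1)[-1]


def urls_equivalent(a: str, b: str) -> bool:
    pa, pb = _normalize_path_only(a), _normalize_path_only(b)
    if pa and pb and pa == pb:
        return True
    fa, fb = _url_filename(a), _url_filename(b)
    # Filename fallback, guarded by length so trivial names cannot match.
    return bool(fa and fb and fa == fb and len(fa) > 4)


# Same path on two CDN hosts with different query strings: equivalent.
assert urls_equivalent(
    "https://cdn-a.example.com/img/abc123.jpg?w=800",
    "https://cdn-b.example.com/img/abc123.jpg",
)
# Different filenames: not equivalent.
assert not urls_equivalent(
    "https://x.example.com/img/abc.jpg",
    "https://y.example.com/img/xyz.jpg",
)
```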
tests/ci/test_service_api_contracts.py
@@ -11,6 +11,8 @@ import pytest
11 from fastapi.testclient import TestClient 11 from fastapi.testclient import TestClient
12 from translation.scenes import normalize_scene_name 12 from translation.scenes import normalize_scene_name
13 13
  14 +pytestmark = [pytest.mark.contract, pytest.mark.regression]
  15 +
14 16
15 class _FakeSearcher: 17 class _FakeSearcher:
16 def search(self, **kwargs): 18 def search(self, **kwargs):
1 -"""  
2 -pytest配置文件 1 +"""pytest 全局配置。
  2 +
  3 +- 项目根路径注入(便于 `tests/` 下模块直接 `from <pkg>` 导入)
  4 +- marker / testpaths / 过滤规则的**权威来源是 `pytest.ini`**,不在这里重复定义
3 5
4 -提供测试夹具和共享配置 6 +历史上这里曾定义过一批 `sample_search_config / mock_es_client / test_searcher` 等
  7 +fixture,但 2026-Q2 起的测试全部自带 fake stub,这些 fixture 全库无人引用,已一并
  8 +移除。新增共享 fixture 时请明确列出其被哪些测试使用,避免再次出现 dead fixtures。
5 """ 9 """
6 10
7 import os 11 import os
8 import sys 12 import sys
9 -import pytest  
10 -import tempfile  
11 -from typing import Dict, Any, Generator  
12 -from unittest.mock import Mock, MagicMock  
13 13
14 -# 添加项目根目录到Python路径  
15 project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) 14 project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
16 sys.path.insert(0, project_root) 15 sys.path.insert(0, project_root)
17 -  
18 -from config import SearchConfig, QueryConfig, IndexConfig, SPUConfig, FunctionScoreConfig, RerankConfig  
19 -from utils.es_client import ESClient  
20 -from search import Searcher  
21 -from query import QueryParser  
22 -from context import RequestContext, create_request_context  
23 -  
24 -  
25 -@pytest.fixture  
26 -def sample_index_config() -> IndexConfig:  
27 - """样例索引配置"""  
28 - return IndexConfig(  
29 - name="default",  
30 - label="默认索引",  
31 - fields=["title.zh", "brief.zh", "tags"],  
32 - boost=1.0  
33 - )  
34 -  
35 -  
36 -@pytest.fixture  
37 -def sample_search_config(sample_index_config) -> SearchConfig:  
38 - """样例搜索配置"""  
39 - query_config = QueryConfig(  
40 - enable_query_rewrite=True,  
41 - enable_text_embedding=True,  
42 - supported_languages=["zh", "en"]  
43 - )  
44 -  
45 - spu_config = SPUConfig(  
46 - enabled=True,  
47 - spu_field="spu_id",  
48 - inner_hits_size=3  
49 - )  
50 -  
51 - function_score_config = FunctionScoreConfig()  
52 - rerank_config = RerankConfig()  
53 -  
54 - return SearchConfig(  
55 - es_index_name="test_products",  
56 - field_boosts={  
57 - "tenant_id": 1.0,  
58 - "title.zh": 3.0,  
59 - "brief.zh": 1.5,  
60 - "tags": 1.0,  
61 - "category_path.zh": 1.5,  
62 - },  
63 - indexes=[sample_index_config],  
64 - query_config=query_config,  
65 - function_score=function_score_config,  
66 - rerank=rerank_config,  
67 - spu_config=spu_config  
68 - )  
69 -  
70 -  
71 -@pytest.fixture  
72 -def mock_es_client() -> Mock:  
73 - """模拟ES客户端"""  
74 - mock_client = Mock(spec=ESClient)  
75 -  
76 - # 模拟搜索响应  
77 - mock_response = {  
78 - "hits": {  
79 - "total": {"value": 10},  
80 - "max_score": 2.5,  
81 - "hits": [  
82 - {  
83 - "_id": "1",  
84 - "_score": 2.5,  
85 - "_source": {  
86 - "title": {"zh": "红色连衣裙"},  
87 - "vendor": {"zh": "测试品牌"},  
88 - "min_price": 299.0,  
89 - "category_id": "1"  
90 - }  
91 - },  
92 - {  
93 - "_id": "2",  
94 - "_score": 2.2,  
95 - "_source": {  
96 - "title": {"zh": "蓝色连衣裙"},  
97 - "vendor": {"zh": "测试品牌"},  
98 - "min_price": 399.0,  
99 - "category_id": "1"  
100 - }  
101 - }  
102 - ]  
103 - },  
104 - "took": 15  
105 - }  
106 -  
107 - mock_client.search.return_value = mock_response  
108 - return mock_client  
109 -  
110 -  
111 -@pytest.fixture  
112 -def test_searcher(sample_search_config, mock_es_client) -> Searcher:  
113 - """测试用Searcher实例"""  
114 - return Searcher(  
115 - es_client=mock_es_client,  
116 - config=sample_search_config  
117 - )  
118 -  
119 -  
120 -@pytest.fixture  
121 -def test_query_parser(sample_search_config) -> QueryParser:  
122 - """测试用QueryParser实例"""  
123 - return QueryParser(sample_search_config)  
124 -  
125 -  
126 -@pytest.fixture  
127 -def test_request_context() -> RequestContext:  
128 - """测试用RequestContext实例"""  
129 - return create_request_context("test-req-001", "test-user")  
130 -  
131 -  
132 -@pytest.fixture  
133 -def sample_search_results() -> Dict[str, Any]:  
134 - """样例搜索结果"""  
135 - return {  
136 - "query": "红色连衣裙",  
137 - "expected_total": 2,  
138 - "expected_products": [  
139 - {"title": "红色连衣裙", "min_price": 299.0},  
140 - {"title": "蓝色连衣裙", "min_price": 399.0}  
141 - ]  
142 - }  
143 -  
144 -  
145 -@pytest.fixture  
146 -def temp_config_file() -> Generator[str, None, None]:  
147 - """临时配置文件"""  
148 - import tempfile  
149 - import yaml  
150 -  
151 - config_data = {  
152 - "es_index_name": "test_products",  
153 - "field_boosts": {  
154 - "title.zh": 3.0,  
155 - "brief.zh": 1.5,  
156 - "tags": 1.0,  
157 - "category_path.zh": 1.5  
158 - },  
159 - "indexes": [  
160 - {  
161 - "name": "default",  
162 - "label": "默认索引",  
163 - "fields": ["title.zh", "brief.zh", "tags"],  
164 - "boost": 1.0  
165 - }  
166 - ],  
167 - "query_config": {  
168 - "supported_languages": ["zh", "en"],  
169 - "default_language": "zh",  
170 - "enable_text_embedding": True,  
171 - "enable_query_rewrite": True  
172 - },  
173 - "spu_config": {  
174 - "enabled": True,  
175 - "spu_field": "spu_id",  
176 - "inner_hits_size": 3  
177 - },  
178 - "ranking": {  
179 - "expression": "bm25() + 0.2*text_embedding_relevance()",  
180 - "description": "Test ranking"  
181 - },  
182 - "function_score": {  
183 - "score_mode": "sum",  
184 - "boost_mode": "multiply",  
185 - "functions": []  
186 - },  
187 - "rerank": {  
188 - "rerank_window": 386  
189 - }  
190 - }  
191 -  
192 - with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:  
193 - yaml.dump(config_data, f)  
194 - temp_file = f.name  
195 -  
196 - yield temp_file  
197 -  
198 - # 清理  
199 - os.unlink(temp_file)  
200 -  
201 -  
202 -@pytest.fixture  
203 -def mock_env_variables(monkeypatch):  
204 - """设置环境变量"""  
205 - monkeypatch.setenv("ES_HOST", "http://localhost:9200")  
206 - monkeypatch.setenv("ES_USERNAME", "elastic")  
207 - monkeypatch.setenv("ES_PASSWORD", "changeme")  
208 -  
209 -  
210 -# 标记配置  
211 -pytest_plugins = []  
212 -  
213 -# 标记定义  
214 -def pytest_configure(config):  
215 - """配置pytest标记"""  
216 - config.addinivalue_line(  
217 - "markers", "unit: 单元测试"  
218 - )  
219 - config.addinivalue_line(  
220 - "markers", "integration: 集成测试"  
221 - )  
222 - config.addinivalue_line(  
223 - "markers", "api: API测试"  
224 - )  
225 - config.addinivalue_line(  
226 - "markers", "e2e: 端到端测试"  
227 - )  
228 - config.addinivalue_line(  
229 - "markers", "performance: 性能测试"  
230 - )  
231 - config.addinivalue_line(  
232 - "markers", "slow: 慢速测试"  
233 - )  
234 -  
235 -  
236 -# 测试数据  
237 -@pytest.fixture  
238 -def test_queries():  
239 - """测试查询集合"""  
240 - return [  
241 - "红色连衣裙",  
242 - "wireless bluetooth headphones",  
243 - "手机 手机壳",  
244 - "laptop AND (gaming OR professional)",  
245 - "运动鞋 -价格:0-500"  
246 - ]  
247 -  
248 -  
249 -@pytest.fixture  
250 -def expected_response_structure():  
251 - """期望的API响应结构"""  
252 - return {  
253 - "hits": list,  
254 - "total": int,  
255 - "max_score": float,  
256 - "took_ms": int,  
257 - "aggregations": dict,  
258 - "query_info": dict,  
259 - "performance_summary": dict  
260 - }  
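The `pytest.ini` that the rewritten conftest defers to might look like the sketch below. Only `testpaths`, `norecursedirs`, `--strict-markers`, and the `regression` / `contract` / subsystem markers are attested by this commit; the exact marker descriptions are illustrative.

```ini
[pytest]
testpaths = tests
norecursedirs = tests/manual
addopts = --strict-markers
markers =
    regression: anchored regression tests (run via scripts/run_regression_tests.sh)
    contract: service API contract tests
    search: search subsystem
    embedding: embedding subsystem
    indexer: indexer subsystem
    eval: evaluation framework
```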
tests/test_cnclip_service.py renamed to tests/manual/test_cnclip_service.py
tests/test_facet_api.py renamed to tests/manual/test_facet_api.py
tests/test_cache_keys.py
@@ -4,6 +4,10 @@ import hashlib
4 4
5 from embeddings import cache_keys as ck 5 from embeddings import cache_keys as ck
6 6
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  10 +
7 11
8 def test_stable_body_short_unchanged(): 12 def test_stable_body_short_unchanged():
9 s = "a" * ck.CACHE_KEY_RAW_BODY_MAX_CHARS 13 s = "a" * ck.CACHE_KEY_RAW_BODY_MAX_CHARS
tests/test_embedding_pipeline.py
@@ -21,6 +21,8 @@ from embeddings.config import CONFIG
21 from query import QueryParser 21 from query import QueryParser
22 from context.request_context import create_request_context, set_current_request_context, clear_current_request_context 22 from context.request_context import create_request_context, set_current_request_context, clear_current_request_context
23 23
  24 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  25 +
24 26
25 class _FakeRedis: 27 class _FakeRedis:
26 def __init__(self): 28 def __init__(self):
@@ -177,8 +179,10 @@ def test_text_embedding_encoder_cache_hit(monkeypatch):
177 out = encoder.encode(["cached-text", "new-text"]) 179 out = encoder.encode(["cached-text", "new-text"])
178 180
179 assert calls["count"] == 1 181 assert calls["count"] == 1
180 - assert np.allclose(out[0], cached)  
181 - assert np.allclose(out[1], np.array([0.3, 0.4], dtype=np.float32)) 182 + # encoder returns an object-dtype ndarray of 1-D float32 vectors; cast per-row
  183 + # before numeric comparison.
  184 + assert np.allclose(np.asarray(out[0], dtype=np.float32), cached)
  185 + assert np.allclose(np.asarray(out[1], dtype=np.float32), np.array([0.3, 0.4], dtype=np.float32))
182 186
183 187
184 def test_text_embedding_encoder_forwards_request_headers(monkeypatch): 188 def test_text_embedding_encoder_forwards_request_headers(monkeypatch):
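The object-dtype pitfall that motivates the `np.asarray(..., dtype=np.float32)` cast above can be reproduced in isolation. The snippet is a minimal sketch: the hand-built 2-D object array stands in for the encoder's actual return value.

```python
import numpy as np

# Equal-length rows stored with dtype=object yield a 2-D object array,
# so indexing returns an object-dtype row, not a float32 vector.
out = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=object)
cached = np.array([0.1, 0.2], dtype=np.float32)

assert out.dtype == object and out[0].dtype == object

# Cast per-row before numeric comparison, as the fixed test does.
row = np.asarray(out[0], dtype=np.float32)
assert row.dtype == np.float32
assert np.allclose(row, cached)
```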
tests/test_embedding_service_limits.py
@@ -5,6 +5,8 @@ import pytest
5 5
6 import embeddings.server as embedding_server 6 import embeddings.server as embedding_server
7 7
  8 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  9 +
8 10
9 class _DummyClient: 11 class _DummyClient:
10 host = "127.0.0.1" 12 host = "127.0.0.1"
tests/test_embedding_service_priority.py
@@ -2,6 +2,10 @@ import threading
2 2
3 import embeddings.server as emb_server 3 import embeddings.server as emb_server
4 4
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.embedding, pytest.mark.regression]
  8 +
5 9
6 def test_text_inflight_limiter_priority_bypass(): 10 def test_text_inflight_limiter_priority_bypass():
7 limiter = emb_server._InflightLimiter(name="text", limit=1) 11 limiter = emb_server._InflightLimiter(name="text", limit=1)
@@ -30,6 +34,7 @@ def test_text_dispatch_prefers_high_priority_queue():
30 normalized=["online"], 34 normalized=["online"],
31 effective_normalize=True, 35 effective_normalize=True,
32 request_id="high", 36 request_id="high",
  37 + user_id="u-high",
33 priority=1, 38 priority=1,
34 created_at=0.0, 39 created_at=0.0,
35 done=threading.Event(), 40 done=threading.Event(),
@@ -38,6 +43,7 @@ def test_text_dispatch_prefers_high_priority_queue():
38 normalized=["offline"], 43 normalized=["offline"],
39 effective_normalize=True, 44 effective_normalize=True,
40 request_id="normal", 45 request_id="normal",
  46 + user_id="u-normal",
41 priority=0, 47 priority=0,
42 created_at=0.0, 48 created_at=0.0,
43 done=threading.Event(), 49 done=threading.Event(),
tests/test_es_query_builder.py
@@ -5,6 +5,10 @@ import numpy as np
5 5
6 from search.es_query_builder import ESQueryBuilder 6 from search.es_query_builder import ESQueryBuilder
7 7
  8 +import pytest
  9 +
  10 +pytestmark = [pytest.mark.search, pytest.mark.regression]
  11 +
8 12
9 def _builder() -> ESQueryBuilder: 13 def _builder() -> ESQueryBuilder:
10 return ESQueryBuilder( 14 return ESQueryBuilder(
tests/test_es_query_builder_text_recall_languages.py
@@ -14,6 +14,10 @@ import numpy as np
14 from query.keyword_extractor import KEYWORDS_QUERY_BASE_KEY 14 from query.keyword_extractor import KEYWORDS_QUERY_BASE_KEY
15 from search.es_query_builder import ESQueryBuilder 15 from search.es_query_builder import ESQueryBuilder
16 16
  17 +import pytest
  18 +
  19 +pytestmark = [pytest.mark.search, pytest.mark.regression]
  20 +
17 21
18 def _builder_multilingual_title_only(*, default_language: str = "en") -> ESQueryBuilder: 22 def _builder_multilingual_title_only(*, default_language: str = "en") -> ESQueryBuilder:
19 """Minimal builder: only title.{lang} for easy field assertions.""" 23 """Minimal builder: only title.{lang} for easy field assertions."""
@@ -135,8 +139,13 @@ def test_zh_query_index_zh_en_includes_base_zh_and_trans_en():
135 assert "title.en" in _title_fields(idx["base_query_trans_en"]) 139 assert "title.en" in _title_fields(idx["base_query_trans_en"])
136 140
137 141
138 -def test_keywords_combined_fields_second_must_same_fields_and_50pct():  
139 - """When ParsedQuery.keywords_queries is set, inner must has two boosted combined_fields.""" 142 +def test_keywords_combined_fields_second_must_shares_fields_with_main_query():
  143 + """When ParsedQuery.keywords_queries is set, inner must has two boosted combined_fields.
  144 +
  145 + The second must sub-clause reuses the primary clause's field set and applies a
  146 + tuned minimum_should_match / boost to keep keyword recall under control; see
  147 + `search/es_query_builder.py` ``_keywords_combined_fields_sub_must``.
  148 + """
140 qb = _builder_multilingual_title_only(default_language="en") 149 qb = _builder_multilingual_title_only(default_language="en")
141 parsed = SimpleNamespace( 150 parsed = SimpleNamespace(
142 rewritten_query="连衣裙", 151 rewritten_query="连衣裙",
@@ -153,16 +162,16 @@ def test_keywords_combined_fields_second_must_same_fields_and_50pct():
153 assert bm[0]["combined_fields"]["query"] == "连衣裙" 162 assert bm[0]["combined_fields"]["query"] == "连衣裙"
154 assert bm[0]["combined_fields"]["boost"] == 2.0 163 assert bm[0]["combined_fields"]["boost"] == 2.0
155 assert bm[1]["combined_fields"]["query"] == "连衣 裙" 164 assert bm[1]["combined_fields"]["query"] == "连衣 裙"
156 - assert bm[1]["combined_fields"]["minimum_should_match"] == "50%"  
157 - assert bm[1]["combined_fields"]["boost"] == 0.6 165 + assert bm[1]["combined_fields"]["minimum_should_match"] == "60%"
  166 + assert bm[1]["combined_fields"]["boost"] == 0.8
158 assert bm[1]["combined_fields"]["fields"] == bm[0]["combined_fields"]["fields"] 167 assert bm[1]["combined_fields"]["fields"] == bm[0]["combined_fields"]["fields"]
159 trans = idx["base_query_trans_en"] 168 trans = idx["base_query_trans_en"]
160 assert trans["minimum_should_match"] == 1 169 assert trans["minimum_should_match"] == 1
161 tm = _combined_fields_must(trans) 170 tm = _combined_fields_must(trans)
162 assert len(tm) == 2 171 assert len(tm) == 2
163 assert tm[1]["combined_fields"]["query"] == "dress" 172 assert tm[1]["combined_fields"]["query"] == "dress"
164 - assert tm[1]["combined_fields"]["minimum_should_match"] == "50%"  
165 - assert tm[1]["combined_fields"]["boost"] == 0.6 173 + assert tm[1]["combined_fields"]["minimum_should_match"] == "60%"
  174 + assert tm[1]["combined_fields"]["boost"] == 0.8
166 175
167 176
168 def test_keywords_omitted_when_same_as_main_combined_fields_query(): 177 def test_keywords_omitted_when_same_as_main_combined_fields_query():
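For reference, the inner `bool.must` shape these assertions walk over looks roughly like the sketch below. The field list is illustrative; only the query strings, `60%` minimum_should_match, and `2.0` / `0.8` boosts are taken from the test itself.

```python
fields = ["title.en"]  # illustrative; real builders carry the full boosted field set

must = [
    # Primary clause: the (possibly rewritten) query at full boost.
    {"combined_fields": {"query": "连衣裙", "fields": fields, "boost": 2.0}},
    # Keywords clause: reuses the same field set with a softer boost and a
    # 60% minimum_should_match to keep keyword recall under control.
    {"combined_fields": {
        "query": "连衣 裙",
        "fields": fields,
        "minimum_should_match": "60%",
        "boost": 0.8,
    }},
]

assert must[1]["combined_fields"]["fields"] == must[0]["combined_fields"]["fields"]
assert must[1]["combined_fields"]["boost"] == 0.8
```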
tests/test_eval_framework_clients.py
@@ -4,6 +4,8 @@ import requests
4 from scripts.evaluation.eval_framework.clients import DashScopeLabelClient 4 from scripts.evaluation.eval_framework.clients import DashScopeLabelClient
5 from scripts.evaluation.eval_framework.utils import build_label_doc_line 5 from scripts.evaluation.eval_framework.utils import build_label_doc_line
6 6
  7 +pytestmark = [pytest.mark.eval, pytest.mark.regression]
  8 +
7 9
8 def _http_error(status_code: int, body: str) -> requests.exceptions.HTTPError: 10 def _http_error(status_code: int, body: str) -> requests.exceptions.HTTPError:
9 response = requests.Response() 11 response = requests.Response()
tests/test_eval_metrics.py
1 """Tests for search evaluation ranking metrics (NDCG, ERR).""" 1 """Tests for search evaluation ranking metrics (NDCG, ERR)."""
2 2
  3 +import math
  4 +
  5 +import pytest
  6 +
  7 +pytestmark = [pytest.mark.eval, pytest.mark.regression]
  8 +
3 from scripts.evaluation.eval_framework.constants import ( 9 from scripts.evaluation.eval_framework.constants import (
4 - RELEVANCE_EXACT,  
5 - RELEVANCE_HIGH,  
6 - RELEVANCE_IRRELEVANT,  
7 - RELEVANCE_LOW, 10 + RELEVANCE_LV0,
  11 + RELEVANCE_LV1,
  12 + RELEVANCE_LV2,
  13 + RELEVANCE_LV3,
  14 + STOP_PROB_MAP,
8 ) 15 )
9 from scripts.evaluation.eval_framework.metrics import compute_query_metrics 16 from scripts.evaluation.eval_framework.metrics import compute_query_metrics
10 17
11 18
12 -def test_err_matches_documented_three_item_examples():  
13 - # Model A: [Exact, Irrelevant, High] -> ERR ≈ 0.992667  
14 - m_a = compute_query_metrics(  
15 - [RELEVANCE_EXACT, RELEVANCE_IRRELEVANT, RELEVANCE_HIGH],  
16 - ideal_labels=[RELEVANCE_EXACT],  
17 - )  
18 - assert abs(m_a["ERR@5"] - (0.99 + (1.0 / 3.0) * 0.8 * 0.01)) < 1e-5  
19 -  
20 - # Model B: [High, Low, Exact] -> ERR ≈ 0.8694  
21 - m_b = compute_query_metrics(  
22 - [RELEVANCE_HIGH, RELEVANCE_LOW, RELEVANCE_EXACT],  
23 - ideal_labels=[RELEVANCE_EXACT],  
24 - )  
25 - expected_b = 0.8 + 0.5 * 0.1 * 0.2 + (1.0 / 3.0) * 0.99 * 0.18  
26 - assert abs(m_b["ERR@5"] - expected_b) < 1e-5 19 +def _expected_err(labels):
  20 + err = 0.0
  21 + product = 1.0
  22 + for i, label in enumerate(labels, start=1):
  23 + p = STOP_PROB_MAP[label]
  24 + err += (1.0 / i) * p * product
  25 + product *= 1.0 - p
  26 + return err
  27 +
  28 +
  29 +def test_err_matches_cascade_formula_on_four_level_labels():
  30 + """ERR@k must equal the textbook cascade formula against the four-level label set.
  31 +
  32 + The metric is the primary ranking signal (see `PRIMARY_METRIC_KEYS` in
  33 + `eval_framework.metrics`); any regression here invalidates the whole
  34 + evaluation pipeline.
  35 + """
  36 +
  37 + ranked_a = [RELEVANCE_LV3, RELEVANCE_LV0, RELEVANCE_LV2]
  38 + ranked_b = [RELEVANCE_LV2, RELEVANCE_LV1, RELEVANCE_LV3]
  39 +
  40 + m_a = compute_query_metrics(ranked_a, ideal_labels=[RELEVANCE_LV3])
  41 + m_b = compute_query_metrics(ranked_b, ideal_labels=[RELEVANCE_LV3])
  42 +
  43 + assert math.isclose(m_a["ERR@5"], _expected_err(ranked_a), abs_tol=1e-5)
  44 + assert math.isclose(m_b["ERR@5"], _expected_err(ranked_b), abs_tol=1e-5)
  45 + assert m_a["ERR@5"] > m_b["ERR@5"]
  46 +
  47 +
  48 +def test_ndcg_at_k_is_1_when_actual_equals_ideal():
  49 + labels = [RELEVANCE_LV3, RELEVANCE_LV2, RELEVANCE_LV1]
  50 + metrics = compute_query_metrics(labels, ideal_labels=labels)
  51 + assert math.isclose(metrics["NDCG@5"], 1.0, abs_tol=1e-9)
  52 + assert math.isclose(metrics["NDCG@20"], 1.0, abs_tol=1e-9)
  53 +
  54 +
  55 +def test_all_irrelevant_zeroes_out_primary_signals():
  56 + labels = [RELEVANCE_LV0, RELEVANCE_LV0, RELEVANCE_LV0]
  57 + metrics = compute_query_metrics(labels, ideal_labels=[RELEVANCE_LV3])
  58 + assert metrics["ERR@10"] == 0.0
  59 + assert metrics["NDCG@20"] == 0.0
  60 + assert metrics["Strong_Precision@10"] == 0.0
  61 + assert metrics["Primary_Metric_Score"] == 0.0
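The `_expected_err` helper above is the textbook cascade formula. A self-contained worked example follows, assuming the conventional `(2^g - 1) / 2^g_max` stop probabilities; the project's real per-label values live in `STOP_PROB_MAP`, so the numbers here are illustrative.

```python
def err(gains, g_max=3):
    """Expected Reciprocal Rank under the cascade user model."""
    score, keep_going = 0.0, 1.0
    for rank, g in enumerate(gains, start=1):
        p = (2 ** g - 1) / 2 ** g_max  # probability the user stops at this doc
        score += keep_going * p / rank
        keep_going *= 1.0 - p          # probability the user reads past it
    return score


# [LV3, LV0, LV2]: 0.875 at rank 1, nothing at rank 2,
# then (1/3) * 0.375 * 0.125 at rank 3.
assert err([3, 0, 2]) == 0.890625
# A stronger top result dominates a better tail.
assert err([3, 0, 2]) > err([2, 1, 3])
```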
tests/test_keywords_query.py deleted
@@ -1,115 +0,0 @@
1 -import hanlp  
2 -from typing import List, Tuple, Dict, Any  
3 -  
4 -class KeywordExtractor:  
5 - """  
6 - 基于 HanLP 的名词关键词提取器  
7 - """  
8 - def __init__(self):  
9 - # 加载带位置信息的分词模型(细粒度)  
10 - self.tok = hanlp.load(hanlp.pretrained.tok.CTB9_TOK_ELECTRA_BASE_CRF)  
11 - self.tok.config.output_spans = True # 启用位置输出  
12 -  
13 - # 加载词性标注模型  
14 - self.pos_tag = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)  
15 -  
16 - def extract_keywords(self, query: str) -> str:  
17 - """  
18 - 从查询中提取关键词(名词,长度 ≥ 2)  
19 -  
20 - Args:  
21 - query: 输入文本  
22 -  
23 - Returns:  
24 - 拼接后的关键词字符串,非连续词之间自动插入空格  
25 - """  
26 - query = query.strip()  
27 - # 分词结果带位置:[[word, start, end], ...]  
28 - tok_result_with_position = self.tok(query)  
29 - tok_result = [x[0] for x in tok_result_with_position]  
30 -  
31 - # 词性标注  
32 - pos_tag_result = list(zip(tok_result, self.pos_tag(tok_result)))  
33 -  
34 - # 需要忽略的词  
35 - ignore_keywords = ['玩具']  
36 -  
37 - keywords = []  
38 - last_end_pos = 0  
39 -  
40 - for (word, postag), (_, start_pos, end_pos) in zip(pos_tag_result, tok_result_with_position):  
41 - if len(word) >= 2 and postag.startswith('N'):  
42 - if word in ignore_keywords:  
43 - continue  
44 - # 如果当前词与上一个词在原文中不连续,插入空格  
45 - if start_pos != last_end_pos and keywords:  
46 - keywords.append(" ")  
47 - keywords.append(word)  
48 - last_end_pos = end_pos  
49 - # 可选:打印调试信息  
50 - # print(f'分词: {word} | 词性: {postag} | 起始: {start_pos} | 结束: {end_pos}')  
51 -  
52 - return "".join(keywords).strip()  
53 -  
54 -  
55 -# 测试代码  
56 -if __name__ == "__main__":  
57 - extractor = KeywordExtractor()  
58 -  
59 - test_queries = [  
60 - # 中文(保留 9 个代表性查询)  
61 - "2.4G遥控大蛇",  
62 - "充气的篮球",  
63 - "遥控 塑料 飞船 汽车 ",  
64 - "亚克力相框",  
65 - "8寸 搪胶蘑菇钉",  
66 - "7寸娃娃",  
67 - "太空沙套装",  
68 - "脚蹬工程车",  
69 - "捏捏乐钥匙扣",  
70 -  
71 - # 英文(新增)  
72 - "plastic toy car",  
73 - "remote control helicopter",  
74 - "inflatable beach ball",  
75 - "music keychain",  
76 - "sand play set",  
77 - # 常见商品搜索  
78 - "plastic dinosaur toy",  
79 - "wireless bluetooth speaker",  
80 - "4K action camera",  
81 - "stainless steel water bottle",  
82 - "baby stroller with cup holder",  
83 -  
84 - # 疑问式 / 自然语言  
85 - "what is the best smartphone under 500 dollars",  
86 - "how to clean a laptop screen",  
87 - "where can I buy organic coffee beans",  
88 -  
89 - # 含数字、特殊字符  
90 - "USB-C to HDMI adapter 4K",  
91 - "LED strip lights 16.4ft",  
92 - "Nintendo Switch OLED model",  
93 - "iPhone 15 Pro Max case",  
94 -  
95 - # 简短词组  
96 - "gaming mouse",  
97 - "mechanical keyboard",  
98 - "wireless earbuds",  
99 -  
100 - # 长尾词  
101 - "rechargeable AA batteries with charger",  
102 - "foldable picnic blanket waterproof",  
103 -  
104 - # 商品属性组合  
105 - "women's running shoes size 8",  
106 - "men's cotton t-shirt crew neck",  
107 -  
108 -  
109 - # 其他语种(保留原样,用于多语言测试)  
110 - "свет USB с пультом дистанционного управления красочные", # 俄语  
111 - ]  
112 -  
113 - for q in test_queries:  
114 - keywords = extractor.extract_keywords(q)  
115 - print(f"{q:30} => {keywords}")  
tests/test_llm_enrichment_batch_fill.py
@@ -6,6 +6,10 @@ import pandas as pd
6 6
7 from indexer.document_transformer import SPUDocumentTransformer 7 from indexer.document_transformer import SPUDocumentTransformer
8 8
  9 +import pytest
  10 +
  11 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  12 +
9 13
10 def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch): 14 def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch):
11 seen_calls: List[Dict[str, Any]] = [] 15 seen_calls: List[Dict[str, Any]] = []
tests/test_process_products_batching.py
@@ -4,6 +4,10 @@ from typing import Any, Dict, List
4 4
5 import indexer.product_enrich as process_products 5 import indexer.product_enrich as process_products
6 6
  7 +import pytest
  8 +
  9 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  10 +
7 11
8 def _mk_products(n: int) -> List[Dict[str, str]]: 12 def _mk_products(n: int) -> List[Dict[str, str]]:
9 return [{"id": str(i), "title": f"title-{i}"} for i in range(n)] 13 return [{"id": str(i), "title": f"title-{i}"} for i in range(n)]
tests/test_product_enrich_partial_mode.py
@@ -9,6 +9,10 @@ import types
9 from pathlib import Path 9 from pathlib import Path
10 from unittest import mock 10 from unittest import mock
11 11
  12 +import pytest
  13 +
  14 +pytestmark = [pytest.mark.indexer, pytest.mark.regression]
  15 +
12 16
13 def _load_product_enrich_module(): 17 def _load_product_enrich_module():
14 if "dotenv" not in sys.modules: 18 if "dotenv" not in sys.modules:
@@ -75,6 +79,12 @@ def test_create_prompt_splits_shared_context_and_localized_tail():
75 79
76 80
77 def test_create_prompt_supports_taxonomy_analysis_kind(): 81 def test_create_prompt_supports_taxonomy_analysis_kind():
  82 + """Taxonomy schema must produce prompts for every language it declares.
  83 +
  84 + Unsupported (schema, lang) combinations return ``(None, None, None)`` so the
  85 + caller (``process_batch``) can mark the batch as failed without calling LLM,
  86 + instead of silently emitting garbage.
  87 + """
78 products = [{"id": "1", "title": "linen dress"}] 88 products = [{"id": "1", "title": "linen dress"}]
79 89
80 shared_zh, user_zh, prefix_zh = product_enrich.create_prompt( 90 shared_zh, user_zh, prefix_zh = product_enrich.create_prompt(
@@ -82,18 +92,26 @@ def test_create_prompt_supports_taxonomy_analysis_kind():
82 target_lang="zh", 92 target_lang="zh",
83 analysis_kind="taxonomy", 93 analysis_kind="taxonomy",
84 ) 94 )
85 - shared_fr, user_fr, prefix_fr = product_enrich.create_prompt( 95 + shared_en, user_en, prefix_en = product_enrich.create_prompt(
86 products, 96 products,
87 - target_lang="fr", 97 + target_lang="en",
88 analysis_kind="taxonomy", 98 analysis_kind="taxonomy",
89 ) 99 )
90 100
91 assert "apparel attribute taxonomy" in shared_zh 101 assert "apparel attribute taxonomy" in shared_zh
92 assert "1. linen dress" in shared_zh 102 assert "1. linen dress" in shared_zh
93 assert "Language: Chinese" in user_zh 103 assert "Language: Chinese" in user_zh
94 - assert "Language: French" in user_fr 104 + assert "Language: English" in user_en
95 assert prefix_zh.startswith("| 序号 | 品类 | 目标性别 |") 105 assert prefix_zh.startswith("| 序号 | 品类 | 目标性别 |")
96 - assert prefix_fr.startswith("| No. | Product Type | Target Gender |") 106 + assert prefix_en.startswith("| No. | Product Type | Target Gender |")
  107 +
  108 + # Unsupported (schema, lang) must return a sentinel. French is not declared
  109 + # by any taxonomy schema.
  110 + assert product_enrich.create_prompt(
  111 + products,
  112 + target_lang="fr",
  113 + analysis_kind="taxonomy",
  114 + ) == (None, None, None)
97 115
98 116
99 def test_call_llm_logs_shared_context_once_and_verbose_contains_full_requests(): 117 def test_call_llm_logs_shared_context_once_and_verbose_contains_full_requests():
@@ -573,7 +591,11 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only():
573 seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products))) 591 seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products)))
574 if analysis_kind == "taxonomy": 592 if analysis_kind == "taxonomy":
575 assert category_taxonomy_profile == "toys" 593 assert category_taxonomy_profile == "toys"
576 - assert target_lang == "en" 594 + # Non-apparel taxonomy profiles only emit en; mirror the real
  595 + # `analyze_products` by returning empty for unsupported langs so the
  596 + # caller drops zh silently.
  597 + if target_lang != "en":
  598 + return []
577 return [ 599 return [
578 { 600 {
579 "id": products[0]["id"], 601 "id": products[0]["id"],
@@ -638,7 +660,6 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only():
638 ], 660 ],
639 } 661 }
640 ] 662 ]
641 - assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls  
642 assert ("taxonomy", "en", "toys", ("2",)) in seen_calls 663 assert ("taxonomy", "en", "toys", ("2",)) in seen_calls
643 664
644 665
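The `(None, None, None)` sentinel contract documented above reduces to a simple caller-side pattern. `build_prompt` and the schema-to-language map below are hypothetical names for illustration, not the real `create_prompt` / `process_batch` internals.

```python
def build_prompt(products, target_lang, analysis_kind):
    # Assumed schema -> supported-language map; the real mapping lives in the
    # taxonomy schema definitions.
    supported = {"taxonomy": {"zh", "en"}}
    if target_lang not in supported.get(analysis_kind, set()):
        # Unsupported (schema, lang): fail fast so the caller can mark the
        # batch failed without spending an LLM call.
        return (None, None, None)
    return ("shared-context", "user-prompt", "table-prefix")


# French is not declared by the taxonomy schema -> sentinel.
assert build_prompt([], "fr", "taxonomy") == (None, None, None)
# Declared languages produce a full prompt triple.
assert None not in build_prompt([], "en", "taxonomy")
```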
tests/test_product_title_exclusion.py
@@ -6,6 +6,10 @@ from query.product_title_exclusion import (
6 ProductTitleExclusionRegistry, 6 ProductTitleExclusionRegistry,
7 ) 7 )
8 8
  9 +import pytest
  10 +
  11 +pytestmark = [pytest.mark.intent, pytest.mark.regression]
  12 +
9 13
10 def test_product_title_exclusion_detector_matches_translated_english_token(): 14 def test_product_title_exclusion_detector_matches_translated_english_token():
11 query_config = QueryConfig( 15 query_config = QueryConfig(
tests/test_query_parser_mixed_language.py
 from config import FunctionScoreConfig, IndexConfig, QueryConfig, RerankConfig, SPUConfig, SearchConfig
 from query.query_parser import QueryParser
 
+import pytest
+
+pytestmark = [pytest.mark.query, pytest.mark.regression]
+
 
 class _DummyTranslator:
     def translate(self, text, target_lang, source_lang, scene, model_name):
tests/test_rerank_client.py
@@ -3,6 +3,10 @@ from math import isclose
 from config.schema import CoarseRankFusionConfig, RerankFusionConfig
 from search.rerank_client import coarse_resort_hits, fuse_scores_and_resort, run_lightweight_rerank
 
+import pytest
+
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 def test_fuse_scores_and_resort_aggregates_text_components_and_keeps_rerank_primary():
     hits = [
tests/test_rerank_provider_topn.py
@@ -4,6 +4,10 @@ from typing import Any, Dict
 
 from providers.rerank import HttpRerankProvider
 
+import pytest
+
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 class _FakeResponse:
     def __init__(self, status_code: int, data: Dict[str, Any]):
tests/test_rerank_query_text.py
@@ -2,6 +2,10 @@
 
 from query.query_parser import ParsedQuery, rerank_query_text
 
+import pytest
+
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 def test_rerank_query_text_zh_uses_original():
     assert rerank_query_text("你好", detected_language="zh", translations={"en": "hello"}) == "你好"
tests/test_reranker_dashscope_backend.py
@@ -7,6 +7,8 @@ import pytest
 from reranker.backends import get_rerank_backend
 from reranker.backends.dashscope_rerank import DashScopeRerankBackend
 
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 @pytest.fixture(autouse=True)
 def _clear_global_dashscope_key(monkeypatch):
tests/test_reranker_qwen3_gguf_backend.py
@@ -6,6 +6,10 @@ import types
 from reranker.backends import get_rerank_backend
 from reranker.backends.qwen3_gguf import Qwen3GGUFRerankerBackend
 
+import pytest
+
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 class _FakeLlama:
     def __init__(self, model_path: str | None = None, **kwargs):
tests/test_reranker_server_topn.py
@@ -4,6 +4,10 @@ from typing import Any, Dict, List
 
 from fastapi.testclient import TestClient
 
+import pytest
+
+pytestmark = [pytest.mark.rerank, pytest.mark.regression]
+
 
 class _FakeTopNReranker:
     _model_name = "fake-topn-reranker"
tests/test_search_evaluation_datasets.py
 from config.loader import get_app_config
 from scripts.evaluation.eval_framework.datasets import resolve_dataset
 
+import pytest
+
+pytestmark = [pytest.mark.eval]
+
 
 def test_search_evaluation_registry_contains_expected_datasets() -> None:
     se = get_app_config().search_evaluation
tests/test_search_rerank_window.py
@@ -22,6 +22,10 @@ from context import create_request_context
 from query.style_intent import DetectedStyleIntent, StyleIntentProfile
 from search.searcher import Searcher
 
+import pytest
+
+pytestmark = [pytest.mark.search, pytest.mark.regression]
+
 
 @dataclass
 class _FakeParsedQuery:
tests/test_sku_intent_selector.py
@@ -6,6 +6,8 @@ from config import QueryConfig
 from query.style_intent import DetectedStyleIntent, StyleIntentProfile, StyleIntentRegistry
 from search.sku_intent_selector import StyleSkuSelector
 
+pytestmark = [pytest.mark.intent, pytest.mark.regression]
+
 
 def test_style_sku_selector_matches_first_sku_by_attribute_terms():
     registry = StyleIntentRegistry.from_query_config(
@@ -537,3 +539,73 @@ def test_image_pick_ignored_when_text_matches_but_visual_url_not_in_text_set():
     assert decision.selected_sku_id == "khaki"
     assert decision.final_source == "option"
     assert decision.image_pick_sku_id == "black"
+
+
+def test_image_pick_matches_when_inner_hit_url_has_query_string():
+    """The inner_hits url carries a query string while the SKU url has none; after normalization they should align."""
+    selector = StyleSkuSelector(_color_registry())
+    parsed_query = SimpleNamespace(style_intent_profile=None)
+    hits = [
+        {
+            "_id": "spu-1",
+            "_source": {
+                "skus": [
+                    {
+                        "sku_id": "s1",
+                        "image_src": "https://cdn/img/p.jpg",
+                    },
+                ],
+            },
+            "inner_hits": {
+                "exact_image_knn_query_hits": {
+                    "hits": {
+                        "hits": [
+                            {
+                                "_score": 0.8,
+                                "_source": {"url": "https://cdn/img/p.jpg?width=800&quality=85"},
+                            }
+                        ]
+                    }
+                }
+            },
+        }
+    ]
+    d = selector.prepare_hits(hits, parsed_query)["spu-1"]
+    assert d.selected_sku_id == "s1"
+    assert d.final_source == "image"
+
+
+def test_image_pick_uses_nested_offset_and_image_embedding_when_needed():
+    """When _source.url does not match the SKU spelling, use the nested offset to fetch the canonical url from image_embedding."""
+    selector = StyleSkuSelector(_color_registry())
+    parsed_query = SimpleNamespace(style_intent_profile=None)
+    hits = [
+        {
+            "_id": "spu-1",
+            "_source": {
+                "image_embedding": [
+                    {"url": "https://cdn/a/spu.jpg"},
+                    {"url": "https://cdn/b/sku-match.jpg"},
+                ],
+                "skus": [
+                    {"sku_id": "sku-a", "image_src": "//cdn/b/sku-match.jpg"},
+                ],
+            },
+            "inner_hits": {
+                "exact_image_knn_query_hits": {
+                    "hits": {
+                        "hits": [
+                            {
+                                "_score": 0.91,
+                                "_nested": {"field": "image_embedding", "offset": 1},
+                                "_source": {"url": "https://wrong.example/x.jpg"},
+                            }
+                        ]
+                    }
+                }
+            },
+        }
+    ]
+    d = selector.prepare_hits(hits, parsed_query)["spu-1"]
+    assert d.selected_sku_id == "sku-a"
+    assert d.image_pick_url == "https://cdn/b/sku-match.jpg"
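The two new selector tests exercise URL normalization and nested-offset resolution. Here is a self-contained sketch of that logic; the helper names are illustrative, not the real `StyleSkuSelector` internals:

```python
from urllib.parse import urlsplit


def normalize_image_url(url: str) -> str:
    # Drop the scheme (so protocol-relative "//cdn/..." and "https://cdn/..."
    # align) and strip the query string / fragment (?width=800&quality=85 etc.).
    if url.startswith("//"):
        url = "https:" + url
    parts = urlsplit(url)
    return f"{parts.netloc}{parts.path}"


def canonical_inner_hit_url(inner_hit: dict, source: dict) -> str:
    # Prefer resolving the ES nested offset back into _source.image_embedding
    # (the canonical spelling); fall back to the inner hit's own _source.url
    # when no _nested metadata is present.
    nested = inner_hit.get("_nested") or {}
    if nested.get("field") == "image_embedding":
        return source["image_embedding"][nested["offset"]]["url"]
    return inner_hit["_source"]["url"]


assert normalize_image_url("https://cdn/img/p.jpg?width=800&quality=85") == normalize_image_url("https://cdn/img/p.jpg")
assert normalize_image_url("//cdn/b/sku-match.jpg") == normalize_image_url("https://cdn/b/sku-match.jpg")
```

Combining both steps reproduces the second test's expectation: offset 1 yields `https://cdn/b/sku-match.jpg`, which normalizes to the same key as the SKU's `//cdn/b/sku-match.jpg`.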
tests/test_style_intent.py
@@ -3,6 +3,10 @@ from types import SimpleNamespace
 from config import QueryConfig
 from query.style_intent import StyleIntentDetector, StyleIntentRegistry
 
+import pytest
+
+pytestmark = [pytest.mark.intent, pytest.mark.regression]
+
 
 def test_style_intent_detector_matches_original_and_translated_queries():
     query_config = QueryConfig(
tests/test_suggestions.py
@@ -12,6 +12,8 @@ from suggestion.builder import (
 )
 from suggestion.service import SuggestionService
 
+pytestmark = [pytest.mark.suggestion, pytest.mark.regression]
+
 
 class FakeESClient:
     """Lightweight fake ES client for suggestion unit tests."""
@@ -160,7 +162,6 @@ class FakeESClient:
         return sorted([x for x in self.indices if x.startswith(prefix)])
 
 
-@pytest.mark.unit
 def test_versioned_index_name_uses_microseconds():
     build_at = datetime(2026, 4, 7, 3, 52, 26, 123456, tzinfo=timezone.utc)
     assert (
@@ -169,7 +170,6 @@ def test_versioned_index_name_uses_microseconds():
     )
 
 
-@pytest.mark.unit
 def test_rebuild_cleans_up_unallocatable_new_index():
     fake_es = FakeESClient()
 
@@ -221,7 +221,6 @@ def test_rebuild_cleans_up_unallocatable_new_index():
     assert created_index not in fake_es.indices
 
 
-@pytest.mark.unit
 def test_resolve_query_language_prefers_log_field():
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -238,7 +237,6 @@ def test_resolve_query_language_prefers_log_field():
     assert conflict is False
 
 
-@pytest.mark.unit
 def test_resolve_query_language_uses_request_params_when_log_missing():
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -256,7 +254,6 @@ def test_resolve_query_language_uses_request_params_when_log_missing():
     assert conflict is False
 
 
-@pytest.mark.unit
 def test_resolve_query_language_fallback_to_primary():
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -272,7 +269,6 @@ def test_resolve_query_language_fallback_to_primary():
     assert conflict is False
 
 
-@pytest.mark.unit
 def test_suggestion_service_basic_flow_uses_alias_and_routing():
     from config import tenant_config_loader as tcl
 
@@ -309,7 +305,6 @@ def test_suggestion_service_basic_flow_uses_alias_and_routing():
     assert any(x.get("index") == alias_name for x in search_calls)
 
 
-@pytest.mark.unit
 def test_publish_alias_and_cleanup_old_versions(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -338,7 +333,6 @@ def test_publish_alias_and_cleanup_old_versions(monkeypatch):
     assert "search_suggestions_tenant_162_v20260310170000" not in fake_es.indices
 
 
-@pytest.mark.unit
 def test_incremental_bootstrap_when_no_active_index(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -363,7 +357,6 @@ def test_incremental_bootstrap_when_no_active_index(monkeypatch):
     assert result["bootstrap_result"]["mode"] == "full"
 
 
-@pytest.mark.unit
 def test_incremental_updates_existing_index(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -419,7 +412,6 @@ def test_incremental_updates_existing_index(monkeypatch):
     assert len(bulk_calls[0]["actions"]) == 1
 
 
-@pytest.mark.unit
 def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -459,7 +451,6 @@ def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch):
     assert key_to_candidate[qanchor_key].qanchor_spu_ids == {"521"}
 
 
-@pytest.mark.unit
 def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -509,7 +500,6 @@ def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch):
     assert ("en", "ribbed neckline") in key_to_candidate
 
 
-@pytest.mark.unit
 def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch):
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
@@ -542,7 +532,6 @@ def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch):
     assert key_to_candidate[key].text == "Furby Furblets 2-Pack"
 
 
-@pytest.mark.unit
 def test_iter_products_requests_dual_sort_and_fields():
     fake_es = FakeESClient()
     builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None)
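The deletions above work because a module-level `pytestmark` marks every test in the file, making per-test `@pytest.mark.unit` decorators redundant. A minimal sketch of the pattern (marker names from this commit; the assertion just inspects pytest's documented `MarkDecorator.name`):

```python
import pytest

# Module-level pytestmark applies these marks to every test collected from
# the module, so the thirteen removed @pytest.mark.unit decorators lose
# nothing once a module-level list is in place.
pytestmark = [pytest.mark.suggestion, pytest.mark.regression]

# Each entry is a MarkDecorator whose .name is the string used in -m
# expressions, e.g. `pytest -m "suggestion and regression"`.
assert [m.name for m in pytestmark] == ["suggestion", "regression"]
```

With `--strict-markers` in `pytest.ini`, any marker in such a list must also be registered there, which is what keeps the subsystem taxonomy honest.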
tests/test_tokenization.py
 from query.tokenization import QueryTextAnalysisCache
 
+import pytest
+
+pytestmark = [pytest.mark.query]
+
 
 def test_han_coarse_tokens_follow_model_tokens_instead_of_whole_sentence():
     cache = QueryTextAnalysisCache(
tests/test_translation_converter_resolution.py
@@ -7,6 +7,8 @@ import pytest
 
 import translation.ct2_conversion as ct2_conversion
 
+pytestmark = [pytest.mark.translation]
+
 
 class _FakeTransformersConverter:
     def __init__(self, model_name_or_path):
tests/test_translation_deepl_backend.py
 from translation.backends.deepl import DeepLTranslationBackend
 
+import pytest
+
+pytestmark = [pytest.mark.translation, pytest.mark.regression]
+
 
 class _FakeResponse:
     def __init__(self, status_code, payload=None, text=""):
tests/test_translation_llm_backend.py
@@ -2,6 +2,10 @@ from types import SimpleNamespace
 
 from translation.backends.llm import LLMTranslationBackend
 
+import pytest
+
+pytestmark = [pytest.mark.translation, pytest.mark.regression]
+
 
 class _FakeCompletions:
     def __init__(self, responses):
tests/test_translation_local_backends.py
@@ -9,6 +9,8 @@ from translation.languages import build_nllb_language_catalog, resolve_nllb_lang
 from translation.service import TranslationService
 from translation.text_splitter import compute_safe_input_token_limit, split_text_for_translation
 
+pytestmark = [pytest.mark.translation, pytest.mark.regression]
+
 
 class _FakeBatch(dict):
     def to(self, device):
tests/test_translator_failure_semantics.py
@@ -11,6 +11,8 @@ from translation.logging_utils import (
 from translation.service import TranslationService
 from translation.settings import build_translation_config, translation_cache_probe_models
 
+pytestmark = [pytest.mark.translation, pytest.mark.regression]
+
 
 class _FakeCache:
     def __init__(self):
translation/prompts.py
@@ -30,6 +30,18 @@ TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = {
         "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce in un nome SKU prodotto {target_lang} conciso e accurato, restituisci solo il risultato: {text}",
         "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para um nome SKU de produto {target_lang} conciso e preciso, produza apenas o resultado: {text}",
     },
+    "sku_attribute": {
+        "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})电商翻译专家,请将原文翻译为{target_lang}商品SKU属性值(如颜色、尺码、材质等),要求简洁准确、符合属性展示习惯,只输出结果:{text}",
+        "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) ecommerce translator. Translate into concise {target_lang} product SKU attribute values (e.g. color, size, material), suitable for attribute display, output only the result: {text}",
+        "ru": "Вы переводчик e-commerce с {source_lang} ({src_lang_code}) на {target_lang} ({tgt_lang_code}). Переведите в краткие и точные значения атрибутов SKU на {target_lang} (цвет, размер, материал и т.п.), выводите только результат: {text}",
+        "ar": "أنت مترجم تجارة إلكترونية من {source_lang} ({src_lang_code}) إلى {target_lang} ({tgt_lang_code}). ترجم إلى قيم سمات SKU للمنتج بلغة {target_lang} (مثل اللون والمقاس والخامة) بإيجاز ودقة، وأخرج النتيجة فقط: {text}",
+        "ja": "{source_lang}({src_lang_code})から {target_lang}({tgt_lang_code})へのEC翻訳者として、商品SKUの属性値(色・サイズ・素材など)に簡潔かつ正確に翻訳し、結果のみ出力してください:{text}",
+        "es": "Eres un traductor ecommerce de {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce a valores de atributo SKU de producto en {target_lang} (color, talla, material, etc.), concisos y precisos, devuelve solo el resultado: {text}",
+        "de": "Du bist ein E-Commerce-Übersetzer von {source_lang} ({src_lang_code}) nach {target_lang} ({tgt_lang_code}). Übersetze in präzise {target_lang} SKU-Produktattributwerte (z. B. Farbe, Größe, Material), nur Ergebnis ausgeben: {text}",
+        "fr": "Vous êtes un traducteur e-commerce de {source_lang} ({src_lang_code}) vers {target_lang} ({tgt_lang_code}). Traduisez en valeurs d'attributs SKU produit {target_lang} (couleur, taille, matière, etc.), concises et précises, sortie uniquement : {text}",
+        "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduci in valori di attributo SKU prodotto {target_lang} (colore, taglia, materiale, ecc.), concisi e accurati, restituisci solo il risultato: {text}",
+        "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para valores de atributo SKU de produto em {target_lang} (cor, tamanho, material etc.), concisos e precisos, produza apenas o resultado: {text}",
+    },
     "ecommerce_search_query": {
         "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})翻译助手,请将电商搜索词准确翻译为{target_lang}并符合搜索习惯,只输出结果:{text}",
         "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) translator. Translate the ecommerce search query accurately following {target_lang} search habits, output only the result: {text}",
@@ -113,6 +125,39 @@ BATCH_TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = {
             "Входные данные:\n{text}"
         ),
     },
+    "sku_attribute": {
+        "en": (
+            "Translate each item from {source_lang} ({src_lang_code}) to concise {target_lang} ({tgt_lang_code}) "
+            "product SKU attribute values (e.g. color, size, material).\n"
+            "Accurately preserve the meaning; keep wording short and suitable for attribute display.\n"
+            "Output exactly one line for each input item, in the same order, using this exact format:\n"
+            "1. translation\n"
+            "2. translation\n"
+            "...\n"
+            "Do not explain or output anything else.\n"
+            "Input:\n{text}"
+        ),
+        "zh": (
+            "将每一项从 {source_lang} ({src_lang_code}) 翻译为简洁的 {target_lang} ({tgt_lang_code}) 商品SKU属性值(如颜色、尺码、材质等)。\n"
+            "准确传达含义,措辞简短,适合属性展示。\n"
+            "请按输入顺序逐行输出,每个输入对应一行,格式必须如下:\n"
+            "1. 翻译结果\n"
+            "2. 翻译结果\n"
+            "...\n"
+            "不要解释或输出其他任何内容。\n"
+            "输入:\n{text}"
+        ),
+        "ru": (
+            "Переведите каждый элемент с {source_lang} ({src_lang_code}) на краткие значения атрибутов SKU на {target_lang} ({tgt_lang_code}) (цвет, размер, материал и т.п.).\n"
+            "Точно сохраняйте смысл; формулировки должны быть короткими и подходить для отображения атрибутов.\n"
+            "Выводите ровно по одной строке для каждого входного элемента в том же порядке, в следующем формате:\n"
+            "1. перевод\n"
+            "2. перевод\n"
+            "...\n"
+            "Не добавляйте объяснений и ничего лишнего.\n"
+            "Входные данные:\n{text}"
+        ),
+    },
     "ecommerce_search_query": {
         "en": (
             "Translate each item from {source_lang} ({src_lang_code}) to a natural {target_lang} ({tgt_lang_code}) "
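The batch prompts above demand one `N. translation` line per input, in input order. A caller consuming such output needs a strict parser; this is a hedged sketch (the helper is hypothetical, not the real translation service's parser):

```python
import re


def parse_numbered_batch_output(raw: str, expected: int) -> list[str]:
    # Collect "1. translation" lines keyed by their index, tolerating stray
    # blank lines; reject output that does not cover every input exactly once.
    results: dict[int, str] = {}
    for line in raw.splitlines():
        m = re.match(r"\s*(\d+)\.\s*(.*\S)", line)
        if m:
            results[int(m.group(1))] = m.group(2)
    if sorted(results) != list(range(1, expected + 1)):
        raise ValueError(f"expected items 1..{expected}, got {sorted(results)}")
    return [results[i] for i in range(1, expected + 1)]


out = parse_numbered_batch_output("1. Red\n\n2. Cotton blend\n3. XL\n", 3)
assert out == ["Red", "Cotton blend", "XL"]
```

Failing loudly on a count mismatch is the safer default here, since a dropped or merged line would otherwise misalign every translation after it.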
translation/scenes.py
@@ -18,6 +18,10 @@ SCENE_DEEPL_CONTEXTS: Dict[str, Dict[str, str]] = {
         "zh": "电商搜索词",
         "en": "e-commerce search query",
     },
+    "sku_attribute": {
+        "zh": "商品SKU属性值",
+        "en": "product SKU attribute value",
+    },
 }
 
 