Commit 99b72698b556ae19a00ba4cb6206a1343f033abe
1 parent: 5c9baf91
Test regression hook cleanup
Change list

Fixes (6 drifted test cases, all updated to the current implementation)
- `tests/test_eval_metrics.py`: rewritten wholesale against the new 4-level labels and cascade-formula assertions, dropping the old `RELEVANCE_EXACT/HIGH/LOW/IRRELEVANT` constants and hard-coded ERR values.
- `tests/test_embedding_service_priority.py`: filled in the newly required `_TextDispatchTask(user_id=...)` argument.
- `tests/test_embedding_pipeline.py`: the cache-hit path's `np.allclose` comparison now goes through `np.asarray(..., dtype=np.float32)` to avoid object-dtype arrays.
- `tests/test_es_query_builder_text_recall_languages.py`: aligned the expected values for the secondary keywords `combined_fields` clause with the current implementation (`MSM 60% / boost 0.8`) and renamed the test accordingly.
- `tests/test_product_enrich_partial_mode.py`
  - `test_create_prompt_supports_taxonomy_analysis_kind`: removed the wrong assumption (fr does not belong to any taxonomy schema) and made the `(None, None, None)` sentinel contract explicit.
  - `test_build_index_content_fields_non_apparel_taxonomy_returns_en_only`: the fake now mimics real schema behavior (unsupported languages return an empty list); dropped the stale "zh was never called" assertion.

Cleanup of historical transition artifacts (per the development principles: no internal dual tracks)
- Deleted `tests/test_keywords_query.py` (an early prototype superseded by the production implementation in `query/keyword_extractor.py`).
- Moved `tests/test_facet_api.py` / `tests/test_cnclip_service.py` to `tests/manual/` and updated `tests/manual/README.md` to document the split.
- Rewrote `tests/conftest.py`: only the `sys.path` injection remains; the repo-wide unreferenced fixtures (`sample_search_config / mock_es_client / test_searcher / temp_config_file`, etc.) are gone.
- Removed 13 leftover `@pytest.mark.unit` decorators from `tests/test_suggestions.py` (the module-level `pytestmark` already covers them).

New consistency infrastructure
- `pytest.ini`: the authoritative configuration source. `testpaths = tests`, `norecursedirs = tests/manual`, `--strict-markers`, and registration of all subsystem markers plus the `regression` marker.
- `tests/ci/test_service_api_contracts.py` plus 30 `tests/test_*.py` files now carry `pytestmark = [pytest.mark.<subsystem>, pytest.mark.regression]` (AST-based insertion that safely skips past multi-line imports).
- New `scripts/run_regression_tests.sh`, with `SUBSYSTEM=<name>` subset selection.
- `scripts/run_ci_tests.sh` expanded: from the previous `tests/ci -q` to a two-phase run of the `contract` marker plus `search ∧ regression`.

Documentation unification (removing the historical dual track)
- Rewrote `docs/测试Pipeline说明.md`: dropped references to `tests/unit/` / `tests/integration/` / `scripts/start_test_environment.sh` (none of which exist anymore); added directory conventions, the marker table, the regression anchor matrix, the coverage-gap list, and manual integration script usage.
- Deleted `docs/测试回归钩子梳理-2026-04-20.md` (its content is merged into the authoritative document above; retired per the "single source of truth" principle).
- Rewrote `docs/DEVELOPER_GUIDE.md §8.2` (Testing) to point at the authoritative pipeline document.
- Updated the `Testing` and `Testing Infrastructure` sections of `CLAUDE.md` in sync.

Final state

| Metric | Result |
|------|------|
| Full `pytest tests/` | **241 passed** |
| `./scripts/run_ci_tests.sh` | 45 passed |
| `./scripts/run_regression_tests.sh` | 233 passed |
| Subsystem subsets (examples) | search=45 / rerank=35 / embedding=23 / intent=25 / translation=33 / indexer=17 / suggestion=13 / query=6 / eval=8 / contract=34 |
| Known remaining gaps | see §4 of the new `测试Pipeline说明.md` (6 items: function_score / facet / image search / config loader / document_transformer, etc.) |

I did not force test cases for the coverage gaps listed in §4 of the pipeline document; that would be "new coverage", which is outside the scope of this cleanup. Whoever fills a gap later just needs to add the corresponding markers and cross the entry off the list.
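The "AST-based insertion" of `pytestmark` lines can be sketched roughly as below. This is a simplified illustration, not the actual script from this commit: the key idea is that `node.end_lineno` points past the closing parenthesis of a multi-line import, so the marker line is never spliced into the middle of one.

```python
import ast

def insert_pytestmark(source: str, subsystem: str) -> str:
    """Insert a module-level pytestmark after the last top-level import.

    Using node.end_lineno means a parenthesised multi-line import
    (`from x import (\n    a,\n)`) is never split in the middle.
    """
    tree = ast.parse(source)
    last_import_end = 0
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            last_import_end = max(last_import_end, node.end_lineno or 0)
    mark = f"pytestmark = [pytest.mark.{subsystem}, pytest.mark.regression]\n"
    lines = source.splitlines(keepends=True)
    return "".join(lines[:last_import_end]) + "\n" + mark + "".join(lines[last_import_end:])
```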
Showing 45 changed files with 593 additions and 930 deletions
CLAUDE.md
| ... | ... | @@ -99,18 +99,29 @@ python main.py serve --host 0.0.0.0 --port 6002 --reload |
| 99 | 99 | |
| 100 | 100 | ### Testing |
| 101 | 101 | ```bash |
| 102 | -# Run all tests | |
| 103 | -pytest tests/ | |
| 102 | +# CI gate (API contracts + search core regression anchors) | |
| 103 | +./scripts/run_ci_tests.sh | |
| 104 | + | |
| 105 | +# Full regression anchor suite (pre-release / pre-merge) | |
| 106 | +./scripts/run_regression_tests.sh | |
| 107 | + | |
| 108 | +# Subsystem-scoped regression (e.g. search / query / intent / rerank / embedding / translation / indexer / suggestion) | |
| 109 | +SUBSYSTEM=rerank ./scripts/run_regression_tests.sh | |
| 104 | 110 | |
| 105 | -# Run focused regression sets | |
| 106 | -python -m pytest tests/ci -q | |
| 111 | +# Whole automated suite | |
| 112 | +python -m pytest tests/ -q | |
| 113 | + | |
| 114 | +# Focused debugging | |
| 107 | 115 | pytest tests/test_rerank_client.py |
| 108 | 116 | pytest tests/test_query_parser_mixed_language.py |
| 109 | 117 | |
| 110 | -# Test search from command line | |
| 118 | +# Command-line smoke | |
| 111 | 119 | python main.py search "query" --tenant-id 1 --size 10 |
| 112 | 120 | ``` |
| 113 | 121 | |
| 122 | +See `docs/测试Pipeline说明.md` for the authoritative test pipeline guide, | |
| 123 | +including the regression hook matrix and marker conventions. | |
| 124 | + | |
| 114 | 125 | ### Development Utilities |
| 115 | 126 | ```bash |
| 116 | 127 | # Stop all services |
| ... | ... | @@ -218,24 +229,24 @@ The system uses centralized configuration through `config/config.yaml`: |
| 218 | 229 | |
| 219 | 230 | ## Testing Infrastructure |
| 220 | 231 | |
| 221 | -**Test Framework**: pytest with async support | |
| 232 | +**Framework**: pytest. Authoritative guide: `docs/测试Pipeline说明.md`. | |
| 233 | + | |
| 234 | +**Layout**: | |
| 235 | +- `tests/` — flat file layout; each file targets one subsystem. | |
| 236 | +- `tests/ci/` — API / service contract tests (FastAPI `TestClient` with fake backends). | |
| 237 | +- `tests/manual/` — scripts that need live services (pytest does **not** collect these). | |
| 238 | +- `tests/conftest.py` — sys.path injection only. No global fixtures; all fakes live next to the tests that use them. | |
| 222 | 239 | |
| 223 | -**Test Structure**: | |
| 224 | -- `tests/conftest.py`: Comprehensive test fixtures and configuration | |
| 225 | -- `tests/unit/`: Unit tests for individual components | |
| 226 | -- `tests/integration/`: Integration tests for system workflows | |
| 227 | -- Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.api` | |
| 240 | +**Markers** (registered in `pytest.ini`, enforced by `--strict-markers`): | |
| 241 | +- Subsystem: `contract`, `search`, `query`, `intent`, `rerank`, `embedding`, `translation`, `indexer`, `suggestion`, `eval`. | |
| 242 | +- Regression gate: `regression` — anchor tests mandatory for `run_regression_tests.sh`. | |
| 228 | 243 | |
| 229 | 244 | **Test Data**: |
| 230 | 245 | - Tenant1: Mock data with 10,000 product records |
| 231 | 246 | - Tenant2: CSV-based test dataset |
| 232 | 247 | - Automated test data generation via `scripts/mock_data.sh` |
| 233 | 248 | |
| 234 | -**Key Test Fixtures** (from `conftest.py`): | |
| 235 | -- `sample_search_config`: Complete configuration for testing | |
| 236 | -- `mock_es_client`: Mocked Elasticsearch client | |
| 237 | -- `test_searcher`: Searcher instance with mock dependencies | |
| 238 | -- `temp_config_file`: Temporary YAML configuration for tests | |
| 249 | +**Principle**: tests must inject fakes for ES / DeepL / LLM / Redis. Never add tests that rely on real external services to the automated suite — put them under `tests/manual/`. | |
| 239 | 250 | |
| 240 | 251 | ## API Endpoints |
| 241 | 252 | ... | ... |
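The fakes-only principle stated in the section above can be illustrated with a minimal sketch. Class and function names here (`FakeESClient`, `search_titles`) are hypothetical stand-ins, not the repo's actual API; the point is that the fake both returns canned hits and records the query it was given, so the test can assert on the query the code built.

```python
class FakeESClient:
    """Minimal ES double: records queries, returns canned hits."""

    def __init__(self, hits):
        self._hits = hits
        self.queries = []  # captured so tests can assert on the built query

    def search(self, index, body):
        self.queries.append((index, body))
        return {"hits": {"hits": self._hits, "total": {"value": len(self._hits)}}}

def search_titles(es, index, text):
    """Toy caller standing in for the real searcher: build query, extract titles."""
    body = {"query": {"match": {"title": text}}}
    resp = es.search(index, body)
    return [h["_source"]["title"] for h in resp["hits"]["hits"]]

def test_search_titles_with_fake_es():
    fake = FakeESClient([{"_source": {"title": "red chair"}}])
    assert search_titles(fake, "search_products_tenant_1", "chair") == ["red chair"]
    assert fake.queries[0][1]["query"]["match"]["title"] == "chair"
```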
docs/DEVELOPER_GUIDE.md
| ... | ... | @@ -386,11 +386,16 @@ services: |
| 386 | 386 | |
| 387 | 387 | ### 8.2 测试 |
| 388 | 388 | |
| 389 | -- **位置**:`tests/`,可按 `unit/`、`integration/` 或按模块划分子目录;公共 fixture 在 `conftest.py`。 | |
| 390 | -- **标记**:使用 `@pytest.mark.unit`、`@pytest.mark.integration`、`@pytest.mark.api` 等区分用例类型,便于按需运行。 | |
| 391 | -- **依赖**:单元测试通过 mock(如 `mock_es_client`、`sample_search_config`)不依赖真实 ES/DB;集成测试需在说明中注明依赖服务。 | |
| 392 | -- **运行**:`python -m pytest tests/`;推荐最小回归:`python -m pytest tests/ci -q`;按模块聚焦可直接指定具体测试文件。 | |
| 393 | -- **原则**:新增逻辑应有对应测试;修改协议或配置契约时更新相关测试与 fixture。 | |
| 389 | +测试流水线的权威说明见 [`docs/测试Pipeline说明.md`](./测试Pipeline说明.md)。核心约定: | |
| 390 | + | |
| 391 | +- **位置**:`tests/` 下按文件平铺,`tests/ci/` 放 API 契约测试,`tests/manual/` 放需人工起服务的联调脚本(pytest 默认不 collect)。 | |
| 392 | +- **Marker**:`pytest.ini` 里登记了子系统 marker(`search / query / intent / rerank / embedding / translation / indexer / suggestion / eval / contract`)与 `regression` marker;新测试必须贴对应 marker(`--strict-markers` 会强制)。 | |
| 393 | +- **依赖**:测试一律通过注入 fake stub 隔离 ES / DeepL / LLM / Redis 等外部依赖。需要真实依赖的脚本放 `tests/manual/`。 | |
| 394 | +- **运行**: | |
| 395 | + - CI 门禁:`./scripts/run_ci_tests.sh`(契约 + search 回归锚点) | |
| 396 | + - 发版前:`./scripts/run_regression_tests.sh`(全部 `regression` 锚点;可配 `SUBSYSTEM=<name>`) | |
| 397 | + - 全量:`python -m pytest tests/ -q` | |
| 398 | +- **原则**:新增逻辑应有对应测试;修改协议或配置契约时**同步**更新契约测试。不要在测试里保留"旧 assert 作为兼容"——请直接面向当前实现写断言,失败即意味着契约已变更,需要上层决策。 | |
| 394 | 399 | |
| 395 | 400 | ### 8.3 配置与环境 |
| 396 | 401 | ... | ... |
docs/测试Pipeline说明.md
| 1 | 1 | # 搜索引擎测试流水线指南 |
| 2 | 2 | |
| 3 | -## 概述 | |
| 3 | +本文档是测试套件的**权威入口**,涵盖目录约定、运行方式、回归锚点矩阵、以及手动 | |
| 4 | +联调脚本的分工。任何与这里不一致的历史文档(例如提到 `tests/unit/` 或 | |
| 5 | +`scripts/start_test_environment.sh`)都是过期信息,以本文为准。 | |
| 4 | 6 | |
| 5 | -本文档介绍了搜索引擎项目的完整测试流水线,包括测试环境搭建、测试执行、结果分析等内容。测试流水线设计用于commit前的自动化质量保证。 | |
| 6 | - | |
| 7 | -## 🏗️ 测试架构 | |
| 8 | - | |
| 9 | -### 测试层次 | |
| 7 | +## 1. 测试目录与分层 | |
| 10 | 8 | |
| 11 | 9 | ``` |
| 12 | -测试流水线 | |
| 13 | -├── 代码质量检查 (Code Quality) | |
| 14 | -│ ├── 代码格式化检查 (Black, isort) | |
| 15 | -│ ├── 静态分析 (Flake8, MyPy, Pylint) | |
| 16 | -│ └── 安全扫描 (Safety, Bandit) | |
| 17 | -│ | |
| 18 | -├── 单元测试 (Unit Tests) | |
| 19 | -│ ├── RequestContext测试 | |
| 20 | -│ ├── Searcher测试 | |
| 21 | -│ ├── QueryParser测试 | |
| 22 | -│ └── BooleanParser测试 | |
| 23 | -│ | |
| 24 | -├── 集成测试 (Integration Tests) | |
| 25 | -│ ├── 端到端搜索流程测试 | |
| 26 | -│ ├── 多组件协同测试 | |
| 27 | -│ └── 错误处理测试 | |
| 28 | -│ | |
| 29 | -├── API测试 (API Tests) | |
| 30 | -│ ├── REST API接口测试 | |
| 31 | -│ ├── 参数验证测试 | |
| 32 | -│ ├── 并发请求测试 | |
| 33 | -│ └── 错误响应测试 | |
| 34 | -│ | |
| 35 | -└── 性能测试 (Performance Tests) | |
| 36 | - ├── 响应时间测试 | |
| 37 | - ├── 并发性能测试 | |
| 38 | - └── 资源使用测试 | |
| 10 | +tests/ | |
| 11 | +├── conftest.py # 只做 sys.path 注入;不再维护全局 fixture | |
| 12 | +├── ci/ # API/服务契约(FastAPI TestClient + 全 fake 依赖) | |
| 13 | +│ └── test_service_api_contracts.py | |
| 14 | +├── manual/ # 需真实服务才能跑的联调脚本,pytest 默认不 collect | |
| 15 | +│ ├── test_build_docs_api.py | |
| 16 | +│ ├── test_cnclip_service.py | |
| 17 | +│ └── test_facet_api.py | |
| 18 | +└── test_*.py # 子系统单测(全部自带 fake,无外部依赖) | |
| 39 | 19 | ``` |
| 40 | 20 | |
| 41 | -### 核心组件 | |
| 42 | - | |
| 43 | -1. **RequestContext**: 请求级别的上下文管理器,用于跟踪测试过程中的所有数据 | |
| 44 | -2. **测试环境管理**: 自动化启动/停止测试依赖服务 | |
| 45 | -3. **测试执行引擎**: 统一的测试运行和结果收集 | |
| 46 | -4. **报告生成系统**: 多格式的测试报告生成 | |
| 47 | - | |
| 48 | -## 🚀 快速开始 | |
| 21 | +关键约束(写在 `pytest.ini` 里,不要另起分支): | |
| 49 | 22 | |
| 50 | -### 本地测试环境 | |
| 23 | +- `testpaths = tests`,`norecursedirs = tests/manual`; | |
| 24 | +- `--strict-markers`:所有 marker 必须先在 `pytest.ini::markers` 登记; | |
| 25 | +- 测试**不得**依赖真实 ES / DeepL / LLM 服务。需要外部依赖的脚本请放 `tests/manual/`。 | |
| 51 | 26 | |
| 52 | -1. **启动测试环境** | |
| 53 | - ```bash | |
| 54 | - # 启动所有必要的测试服务 | |
| 55 | - ./scripts/start_test_environment.sh | |
| 56 | - ``` | |
| 27 | +## 2. 运行方式 | |
| 57 | 28 | |
| 58 | -2. **运行完整测试套件** | |
| 59 | - ```bash | |
| 60 | - # 运行所有测试 | |
| 61 | - python scripts/run_tests.py | |
| 29 | +| 场景 | 命令 | 覆盖范围 | | |
| 30 | +|------|------|----------| | |
| 31 | +| CI 门禁(每次提交) | `./scripts/run_ci_tests.sh` | `tests/ci` + `contract` marker + `search ∧ regression` | | |
| 32 | +| 发版 / 大合并前 | `./scripts/run_regression_tests.sh` | 所有 `@pytest.mark.regression` | | |
| 33 | +| 子系统子集 | `SUBSYSTEM=search ./scripts/run_regression_tests.sh` | 指定子系统的 regression 锚点 | | |
| 34 | +| 全量(含非回归) | `python -m pytest tests/ -q` | 全部自动化用例 | | |
| 35 | +| 手动联调 | `python tests/manual/<script>.py` | 需提前起对应服务 | | |
| 62 | 36 | |
| 63 | - # 或者使用pytest直接运行 | |
| 64 | - pytest tests/ -v | |
| 65 | - ``` | |
| 37 | +## 3. Marker 体系与回归锚点矩阵 | |
| 66 | 38 | |
| 67 | -3. **停止测试环境** | |
| 68 | - ```bash | |
| 69 | - ./scripts/stop_test_environment.sh | |
| 70 | - ``` | |
| 39 | +marker 定义见 `pytest.ini`。每个测试文件通过模块级 `pytestmark` 贴标,同时 | |
| 40 | +属于 `regression` 的用例构成“**回归锚点集合**”。 | |
| 71 | 41 | |
| 72 | -### CI/CD测试 | |
| 42 | +| 子系统 marker | 关键文件(锚点) | 保护的行为 | | |
| 43 | +|---------------|------------------|------------| | |
| 44 | +| `contract` | `tests/ci/test_service_api_contracts.py` | Search / Indexer / Embedding / Reranker / Translation 的 HTTP 契约 | | |
| 45 | +| `search` | `test_search_rerank_window.py`, `test_es_query_builder.py`, `test_es_query_builder_text_recall_languages.py` | Searcher 主路径、排序 / 召回、keywords 副 combined_fields、多语种 | | |
| 46 | +| `query` | `test_query_parser_mixed_language.py`, `test_tokenization.py` | 中英混合解析、HanLP 分词、language detect | | |
| 47 | +| `intent` | `test_style_intent.py`, `test_product_title_exclusion.py`, `test_sku_intent_selector.py` | 风格意图、商品标题排除、SKU 选型 | | |
| 48 | +| `rerank` | `test_rerank_client.py`, `test_rerank_query_text.py`, `test_rerank_provider_topn.py`, `test_reranker_server_topn.py`, `test_reranker_dashscope_backend.py`, `test_reranker_qwen3_gguf_backend.py` | 粗排 / 精排 / topN / 后端切换 | | |
| 49 | +| `embedding` | `test_embedding_pipeline.py`, `test_embedding_service_limits.py`, `test_embedding_service_priority.py`, `test_cache_keys.py` | 文本/图像向量客户端、inflight limiter、优先级队列、缓存 key | | |
| 50 | +| `translation` | `test_translation_deepl_backend.py`, `test_translation_llm_backend.py`, `test_translation_local_backends.py`, `test_translator_failure_semantics.py` | DeepL / LLM / 本地回退、失败语义 | | |
| 51 | +| `indexer` | `test_product_enrich_partial_mode.py`, `test_process_products_batching.py`, `test_llm_enrichment_batch_fill.py` | LLM Partial Mode、batch 拆分、空结果补位 | | |
| 52 | +| `suggestion` | `test_suggestions.py` | 建议索引构建 | | |
| 53 | +| `eval` | `test_eval_metrics.py`(regression) + `test_search_evaluation_datasets.py` / `test_eval_framework_clients.py`(非 regression) | NDCG / ERR 指标、数据集加载、评估客户端 | | |
| 73 | 54 | |
| 74 | -1. **GitHub Actions** | |
| 75 | - - Push到主分支自动触发 | |
| 76 | - - Pull Request自动运行 | |
| 77 | - - 手动触发支持 | |
| 55 | +> 任何新写的子系统单测,都应该在顶部加 `pytestmark = [pytest.mark.<子系统>, pytest.mark.regression]`。 | |
| 56 | +> 不贴 `regression` 的测试默认**不会**被 `run_regression_tests.sh` 选中,请谨慎决定。 | |
| 78 | 57 | |
| 79 | -2. **测试报告** | |
| 80 | - - 自动生成并上传 | |
| 81 | - - PR评论显示测试摘要 | |
| 82 | - - 详细报告下载 | |
| 58 | +## 4. 当前覆盖缺口(跟踪中) | |
| 83 | 59 | |
| 84 | -## 📋 测试类型详解 | |
| 60 | +以下场景目前没有被 `regression` 锚点覆盖,优先级从高到低: | |
| 85 | 61 | |
| 86 | -### 1. 单元测试 (Unit Tests) | |
| 62 | +1. **`api/routes/search.py` 的请求参数映射**:`QueryParser.parse(...)` 透传是否完整(目前只有 `tests/ci` 间接覆盖)。 | |
| 63 | +2. **`indexer/document_transformer.py` 的端到端转换**:从 MySQL 行到 ES doc 的 snapshot 对比。 | |
| 64 | +3. **`config/loader.py` 加载多租户配置**:含继承 / override 的合并规则。 | |
| 65 | +4. **`search/searcher.py::_build_function_score`**:function_score 装配。 | |
| 66 | +5. **Facet 聚合 / disjunctive 过滤**。 | |
| 67 | +6. **图像搜索主路径**(`search/image_searcher.py`)。 | |
| 87 | 68 | |
| 88 | -**位置**: `tests/unit/` | |
| 69 | +补齐时记得同步贴 `regression` + 对应子系统 marker,并在本表删除条目。 | |
| 89 | 70 | |
| 90 | -**目的**: 测试单个函数、类、模块的功能 | |
| 71 | +## 5. 手动联调:索引文档构建流水线 | |
| 91 | 72 | |
| 92 | -**覆盖范围**: | |
| 93 | -- `test_context.py`: RequestContext功能测试 | |
| 94 | -- `test_searcher.py`: Searcher核心功能测试 | |
| 95 | -- `test_query_parser.py`: QueryParser处理逻辑测试 | |
| 96 | - | |
| 97 | -**运行方式**: | |
| 98 | -```bash | |
| 99 | -# 运行所有单元测试 | |
| 100 | -pytest tests/unit/ -v | |
| 101 | - | |
| 102 | -# 运行特定测试 | |
| 103 | -pytest tests/unit/test_context.py -v | |
| 104 | - | |
| 105 | -# 生成覆盖率报告 | |
| 106 | -pytest tests/unit/ --cov=. --cov-report=html | |
| 107 | -``` | |
| 108 | - | |
| 109 | -### 2. 集成测试 (Integration Tests) | |
| 110 | - | |
| 111 | -**位置**: `tests/integration/` | |
| 112 | - | |
| 113 | -**目的**: 测试多个组件协同工作的功能 | |
| 114 | - | |
| 115 | -**覆盖范围**: | |
| 116 | -- `test_search_integration.py`: 完整搜索流程集成 | |
| 117 | -- 数据库、ES、搜索器集成测试 | |
| 118 | -- 错误传播和处理测试 | |
| 119 | - | |
| 120 | -**运行方式**: | |
| 121 | -```bash | |
| 122 | -# 运行集成测试(需要启动测试环境) | |
| 123 | -pytest tests/integration/ -v -m "not slow" | |
| 124 | - | |
| 125 | -# 运行包含慢速测试的集成测试 | |
| 126 | -pytest tests/integration/ -v | |
| 127 | -``` | |
| 128 | - | |
| 129 | -### 3. API测试 (API Tests) | |
| 130 | - | |
| 131 | -**位置**: `tests/integration/test_api_integration.py` | |
| 132 | - | |
| 133 | -**目的**: 测试HTTP API接口的功能和性能 | |
| 134 | - | |
| 135 | -**覆盖范围**: | |
| 136 | -- 基本搜索API | |
| 137 | -- 参数验证 | |
| 138 | -- 错误处理 | |
| 139 | -- 并发请求 | |
| 140 | -- Unicode支持 | |
| 141 | - | |
| 142 | -**运行方式**: | |
| 143 | -```bash | |
| 144 | -# 运行API测试 | |
| 145 | -pytest tests/integration/test_api_integration.py -v | |
| 146 | -``` | |
| 147 | - | |
| 148 | -### 5. 索引 & 文档构建流水线验证(手动) | |
| 149 | - | |
| 150 | -除了自动化测试外,推荐在联调/问题排查时手动跑一遍“**从 MySQL 到 ES doc**”的索引流水线,确保字段与 mapping、查询逻辑一致。 | |
| 151 | - | |
| 152 | -#### 5.1 启动 Indexer 服务 | |
| 73 | +除自动化测试外,联调/问题排查时建议走一遍“**MySQL → ES doc**”链路,确保字段与 mapping | |
| 74 | +与查询逻辑对齐。 | |
| 153 | 75 | |
| 154 | 76 | ```bash |
| 155 | 77 | cd /home/tw/saas-search |
| 156 | 78 | ./scripts/stop.sh # 停掉已有进程(可选) |
| 157 | -./scripts/start_indexer.sh # 启动专用 indexer 服务,默认端口 6004 | |
| 158 | -``` | |
| 159 | - | |
| 160 | -#### 5.2 基于数据库构建 ES doc(只看、不写 ES) | |
| 79 | +./scripts/start_indexer.sh # 启动 indexer 服务,默认端口 6004 | |
| 161 | 80 | |
| 162 | -> 场景:已经知道某个 `tenant_id` 和 `spu_id`,想看它在“最新逻辑下”的 ES 文档长什么样。 | |
| 163 | - | |
| 164 | -```bash | |
| 165 | 81 | curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ |
| 166 | 82 | -H "Content-Type: application/json" \ |
| 167 | - -d '{ | |
| 168 | - "tenant_id": "170", | |
| 169 | - "spu_ids": ["223167"] | |
| 170 | - }' | |
| 171 | -``` | |
| 172 | - | |
| 173 | -返回中: | |
| 174 | - | |
| 175 | -- `docs[0]` 为当前代码构造出来的完整 ES doc(与 `mappings/search_products.json` 对齐); | |
| 176 | -- 可以直接比对: | |
| 177 | - - 索引字段说明:`docs/索引字段说明v2.md` | |
| 178 | - - 实际 ES 文档:`docs/常用查询 - ES.md` 中的查询示例(按 `spu_id` 过滤)。 | |
| 179 | - | |
| 180 | -#### 5.3 与 ES 实际数据对比 | |
| 181 | - | |
| 182 | -```bash | |
| 183 | -curl -u 'essa:***' \ | |
| 184 | - -X GET 'http://localhost:9200/search_products_tenant_170/_search?pretty' \ | |
| 185 | - -H 'Content-Type: application/json' \ | |
| 186 | - -d '{ | |
| 187 | - "size": 5, | |
| 188 | - "_source": ["title", "tags"], | |
| 189 | - "query": { | |
| 190 | - "bool": { | |
| 191 | - "filter": [ | |
| 192 | - { "term": { "spu_id": "223167" } } | |
| 193 | - ] | |
| 194 | - } | |
| 195 | - } | |
| 196 | - }' | |
| 83 | + -d '{ "tenant_id": "170", "spu_ids": ["223167"] }' | |
| 197 | 84 | ``` |
| 198 | 85 | |
| 199 | -对比如下内容是否一致: | |
| 200 | - | |
| 201 | -- 多语言字段:`title/brief/description/vendor/category_name_text/category_path`; | |
| 202 | -- 结构字段:`tags/specifications/skus/min_price/max_price/compare_at_price/total_inventory` 等; | |
| 203 | -- 算法字段:`title_embedding` 是否存在(值不必逐项比对)。 | |
| 204 | - | |
| 205 | -如果两边不一致,可以结合: | |
| 206 | - | |
| 207 | -- `indexer/document_transformer.py`(文档构造逻辑); | |
| 208 | -- `indexer/incremental_service.py`(增量索引/查库逻辑); | |
| 209 | -- `logs/indexer.log`(索引日志) | |
| 210 | - | |
| 211 | -逐步缩小问题范围。 | |
| 212 | - | |
| 213 | -### 4. 性能测试 (Performance Tests) | |
| 214 | - | |
| 215 | -**目的**: 验证系统性能指标 | |
| 216 | - | |
| 217 | -**测试内容**: | |
| 218 | -- 搜索响应时间 | |
| 219 | -- API并发处理能力 | |
| 220 | -- 资源使用情况 | |
| 221 | - | |
| 222 | -**运行方式**: | |
| 223 | -```bash | |
| 224 | -# 运行性能测试 | |
| 225 | -python scripts/run_performance_tests.py | |
| 226 | -``` | |
| 227 | - | |
| 228 | -## 🛠️ 环境配置 | |
| 229 | - | |
| 230 | -### 测试环境要求 | |
| 231 | - | |
| 232 | -1. **Python环境** | |
| 233 | - ```bash | |
| 234 | - # 创建测试环境 | |
| 235 | - conda create -n searchengine-test python=3.9 | |
| 236 | - conda activate searchengine-test | |
| 237 | - | |
| 238 | - # 安装依赖 | |
| 239 | - pip install -r requirements.txt | |
| 240 | - pip install pytest pytest-cov pytest-json-report | |
| 241 | - ``` | |
| 242 | - | |
| 243 | -2. **Elasticsearch** | |
| 244 | - ```bash | |
| 245 | - # 使用Docker启动ES | |
| 246 | - docker run -d \ | |
| 247 | - --name elasticsearch \ | |
| 248 | - -p 9200:9200 \ | |
| 249 | - -e "discovery.type=single-node" \ | |
| 250 | - -e "xpack.security.enabled=false" \ | |
| 251 | - elasticsearch:8.8.0 | |
| 252 | - ``` | |
| 253 | - | |
| 254 | -3. **环境变量** | |
| 255 | - ```bash | |
| 256 | - export ES_HOST="http://localhost:9200" | |
| 257 | - export ES_USERNAME="elastic" | |
| 258 | - export ES_PASSWORD="changeme" | |
| 259 | - export API_HOST="127.0.0.1" | |
| 260 | - export API_PORT="6003" | |
| 261 | - export TENANT_ID="test_tenant" | |
| 262 | - export TESTING_MODE="true" | |
| 263 | - ``` | |
| 264 | - | |
| 265 | -### 服务依赖 | |
| 266 | - | |
| 267 | -测试环境需要以下服务: | |
| 268 | - | |
| 269 | -1. **Elasticsearch** (端口9200) | |
| 270 | - - 存储和搜索测试数据 | |
| 271 | - - 支持中文和英文索引 | |
| 272 | - | |
| 273 | -2. **API服务** (端口6003) | |
| 274 | - - FastAPI测试服务 | |
| 275 | - - 提供搜索接口 | |
| 276 | - | |
| 277 | -3. **测试数据库** | |
| 278 | - - 预配置的测试索引 | |
| 279 | - - 包含测试数据 | |
| 280 | - | |
| 281 | -## 📊 测试报告 | |
| 282 | - | |
| 283 | -### 报告类型 | |
| 284 | - | |
| 285 | -1. **实时控制台输出** | |
| 286 | - - 测试进度显示 | |
| 287 | - - 失败详情 | |
| 288 | - - 性能摘要 | |
| 289 | - | |
| 290 | -2. **JSON格式报告** | |
| 291 | - ```json | |
| 292 | - { | |
| 293 | - "timestamp": "2024-01-01T10:00:00", | |
| 294 | - "summary": { | |
| 295 | - "total_tests": 150, | |
| 296 | - "passed": 148, | |
| 297 | - "failed": 2, | |
| 298 | - "success_rate": 98.7 | |
| 299 | - }, | |
| 300 | - "suites": { ... } | |
| 301 | - } | |
| 302 | - ``` | |
| 303 | - | |
| 304 | -3. **文本格式报告** | |
| 305 | - - 人类友好的格式 | |
| 306 | - - 包含测试摘要和详情 | |
| 307 | - - 适合PR评论 | |
| 308 | - | |
| 309 | -4. **HTML覆盖率报告** | |
| 310 | - - 代码覆盖率可视化 | |
| 311 | - - 分支和行覆盖率 | |
| 312 | - - 缺失测试高亮 | |
| 313 | - | |
| 314 | -### 报告位置 | |
| 315 | - | |
| 316 | -``` | |
| 317 | -test_logs/ | |
| 318 | -├── unit_test_results.json # 单元测试结果 | |
| 319 | -├── integration_test_results.json # 集成测试结果 | |
| 320 | -├── api_test_results.json # API测试结果 | |
| 321 | -├── test_report_20240101_100000.txt # 文本格式摘要 | |
| 322 | -├── test_report_20240101_100000.json # JSON格式详情 | |
| 323 | -└── htmlcov/ # HTML覆盖率报告 | |
| 324 | -``` | |
| 325 | - | |
| 326 | -## 🔄 CI/CD集成 | |
| 327 | - | |
| 328 | -### GitHub Actions工作流 | |
| 329 | - | |
| 330 | -**触发条件**: | |
| 331 | -- Push到主分支 | |
| 332 | -- Pull Request创建/更新 | |
| 333 | -- 手动触发 | |
| 334 | - | |
| 335 | -**工作流阶段**: | |
| 336 | - | |
| 337 | -1. **代码质量检查** | |
| 338 | - - 代码格式验证 | |
| 339 | - - 静态代码分析 | |
| 340 | - - 安全漏洞扫描 | |
| 341 | - | |
| 342 | -2. **单元测试** | |
| 343 | - - 多Python版本矩阵测试 | |
| 344 | - - 代码覆盖率收集 | |
| 345 | - - 自动上传到Codecov | |
| 346 | - | |
| 347 | -3. **集成测试** | |
| 348 | - - 服务依赖启动 | |
| 349 | - - 端到端功能测试 | |
| 350 | - - 错误处理验证 | |
| 351 | - | |
| 352 | -4. **API测试** | |
| 353 | - - 接口功能验证 | |
| 354 | - - 参数校验测试 | |
| 355 | - - 并发请求测试 | |
| 356 | - | |
| 357 | -5. **性能测试** | |
| 358 | - - 响应时间检查 | |
| 359 | - - 资源使用监控 | |
| 360 | - - 性能回归检测 | |
| 361 | - | |
| 362 | -6. **测试报告生成** | |
| 363 | - - 结果汇总 | |
| 364 | - - 报告上传 | |
| 365 | - - PR评论更新 | |
| 366 | - | |
| 367 | -### 工作流配置 | |
| 368 | - | |
| 369 | -**文件**: `.github/workflows/test.yml` | |
| 370 | - | |
| 371 | -**关键特性**: | |
| 372 | -- 并行执行提高效率 | |
| 373 | -- 服务容器化隔离 | |
| 374 | -- 自动清理资源 | |
| 375 | -- 智能缓存依赖 | |
| 376 | - | |
| 377 | -## 🧪 测试最佳实践 | |
| 378 | - | |
| 379 | -### 1. 测试编写原则 | |
| 380 | - | |
| 381 | -- **独立性**: 每个测试应该独立运行 | |
| 382 | -- **可重复性**: 测试结果应该一致 | |
| 383 | -- **快速执行**: 单元测试应该快速完成 | |
| 384 | -- **清晰命名**: 测试名称应该描述测试内容 | |
| 385 | - | |
| 386 | -### 2. 测试数据管理 | |
| 387 | - | |
| 388 | -```python | |
| 389 | -# 使用fixture提供测试数据 | |
| 390 | -@pytest.fixture | |
| 391 | -def sample_tenant_config(): | |
| 392 | - return TenantConfig( | |
| 393 | - tenant_id="test_tenant", | |
| 394 | - es_index_name="test_products" | |
| 395 | - ) | |
| 396 | - | |
| 397 | -# 使用mock避免外部依赖 | |
| 398 | -@patch('search.searcher.ESClient') | |
| 399 | -def test_search_with_mock_es(mock_es_client, test_searcher): | |
| 400 | - mock_es_client.search.return_value = mock_response | |
| 401 | - result = test_searcher.search("test query") | |
| 402 | - assert result is not None | |
| 403 | -``` | |
| 404 | - | |
| 405 | -### 3. RequestContext集成 | |
| 406 | - | |
| 407 | -```python | |
| 408 | -def test_with_context(test_searcher): | |
| 409 | - context = create_request_context("test-req", "test-user") | |
| 410 | - | |
| 411 | - result = test_searcher.search("test query", context=context) | |
| 412 | - | |
| 413 | - # 验证context被正确更新 | |
| 414 | - assert context.query_analysis.original_query == "test query" | |
| 415 | - assert context.get_stage_duration("elasticsearch_search") > 0 | |
| 416 | -``` | |
| 417 | - | |
| 418 | -### 4. 性能测试指南 | |
| 419 | - | |
| 420 | -```python | |
| 421 | -def test_search_performance(client): | |
| 422 | - start_time = time.time() | |
| 423 | - response = client.get("/search", params={"q": "test query"}) | |
| 424 | - response_time = (time.time() - start_time) * 1000 | |
| 425 | - | |
| 426 | - assert response.status_code == 200 | |
| 427 | - assert response_time < 2000 # 2秒内响应 | |
| 428 | -``` | |
| 429 | - | |
| 430 | -## 🚨 故障排除 | |
| 431 | - | |
| 432 | -### 常见问题 | |
| 433 | - | |
| 434 | -1. **Elasticsearch连接失败** | |
| 435 | - ```bash | |
| 436 | - # 检查ES状态 | |
| 437 | - curl http://localhost:9200/_cluster/health | |
| 438 | - | |
| 439 | - # 重启ES服务 | |
| 440 | - docker restart elasticsearch | |
| 441 | - ``` | |
| 442 | - | |
| 443 | -2. **测试端口冲突** | |
| 444 | - ```bash | |
| 445 | - # 检查端口占用 | |
| 446 | - lsof -i :6003 | |
| 447 | - | |
| 448 | - # 修改API端口 | |
| 449 | - export API_PORT="6004" | |
| 450 | - ``` | |
| 451 | - | |
| 452 | -3. **依赖包缺失** | |
| 453 | - ```bash | |
| 454 | - # 重新安装依赖 | |
| 455 | - pip install -r requirements.txt | |
| 456 | - pip install pytest pytest-cov pytest-json-report | |
| 457 | - ``` | |
| 458 | - | |
| 459 | -4. **测试数据问题** | |
| 460 | - ```bash | |
| 461 | - # 重新创建测试索引 | |
| 462 | - curl -X DELETE http://localhost:9200/test_products | |
| 463 | - ./scripts/start_test_environment.sh | |
| 464 | - ``` | |
| 465 | - | |
| 466 | -### 调试技巧 | |
| 467 | - | |
| 468 | -1. **详细日志输出** | |
| 469 | - ```bash | |
| 470 | - pytest tests/unit/test_context.py -v -s --tb=long | |
| 471 | - ``` | |
| 472 | - | |
| 473 | -2. **运行单个测试** | |
| 474 | - ```bash | |
| 475 | - pytest tests/unit/test_context.py::TestRequestContext::test_create_context -v | |
| 476 | - ``` | |
| 477 | - | |
| 478 | -3. **调试模式** | |
| 479 | - ```python | |
| 480 | - import pdb; pdb.set_trace() | |
| 481 | - ``` | |
| 482 | - | |
| 483 | -4. **性能分析** | |
| 484 | - ```bash | |
| 485 | - pytest --profile tests/ | |
| 486 | - ``` | |
| 487 | - | |
| 488 | -## 📈 持续改进 | |
| 489 | - | |
| 490 | -### 测试覆盖率目标 | |
| 491 | - | |
| 492 | -- **单元测试**: > 90% | |
| 493 | -- **集成测试**: > 80% | |
| 494 | -- **API测试**: > 95% | |
| 495 | - | |
| 496 | -### 性能基准 | |
| 497 | - | |
| 498 | -- **搜索响应时间**: < 2秒 | |
| 499 | -- **API并发处理**: 100 QPS | |
| 500 | -- **系统资源使用**: < 80% CPU, < 4GB RAM | |
| 86 | +返回中 `docs[0]` 即当前代码构造的 ES doc(与 `mappings/search_products.json` 对齐)。 | |
| 87 | +与真实 ES 数据对比的查询参考 `docs/常用查询 - ES.md`;若字段不一致,按以下路径定位: | |
| 501 | 88 | |
| 502 | -### 质量门禁 | |
| 89 | +- `indexer/document_transformer.py` — 文档构造逻辑 | |
| 90 | +- `indexer/incremental_service.py` — 增量查库逻辑 | |
| 91 | +- `logs/indexer.log` — 索引日志 | |
| 503 | 92 | |
| 504 | -- **所有测试必须通过** | |
| 505 | -- **代码覆盖率不能下降** | |
| 506 | -- **性能不能显著退化** | |
| 507 | -- **不能有安全漏洞** | |
| 93 | +## 6. 编写测试的约束(与 `开发原则` 对齐) | |
| 508 | 94 | |
| 95 | +- **fail fast**:测试输入不合法时应直接抛错,不用 `if ... return`;不要用 `try/except` 吃掉异常再 `assert not exception`。 | |
| 96 | +- **不做兼容双轨**:用例对准当前实现,不为历史行为保留“旧 assert”。若确有外部兼容性(例如 API 上标注 Deprecated 的字段),在 `tests/ci` 里单独写**契约**用例并注明 Deprecated。 | |
| 97 | +- **外部依赖全 fake**:凡是依赖 HTTP / Redis / ES / LLM 的测试必须注入 fake stub,否则归入 `tests/manual/`。 | |
| 98 | +- **一处真相**:共享 fixture 如果超过 2 个文件使用,放 `tests/conftest.py`;只给 1 个文件用就放在该文件内。避免再次出现全库无人引用的 dead fixture。 | ... | ... |
pytest.ini
| ... | ... | @@ -0,0 +1,30 @@ |
| 1 | +[pytest] | |
| 2 | +# 权威的 pytest 配置源。新增共享配置请放这里,不要再散落到各测试文件头部。 | |
| 3 | +# | |
| 4 | +# testpaths 明确只扫 tests/(含 tests/ci/),刻意排除 tests/manual/。 | |
| 5 | +testpaths = tests | |
| 6 | +# tests/manual/ 里的脚本依赖外部服务,不参与自动回归。 | |
| 7 | +norecursedirs = tests/manual | |
| 8 | + | |
| 9 | +addopts = -ra --strict-markers | |
| 10 | + | |
| 11 | +# 全局静默第三方的 DeprecationWarning,避免遮掩真正需要关注的业务警告。 | |
| 12 | +filterwarnings = | |
| 13 | + ignore::DeprecationWarning | |
| 14 | + ignore::PendingDeprecationWarning | |
| 15 | + | |
| 16 | +# 子系统 / 回归分层标记。新增 marker 前先在这里登记,未登记的 marker 会因 | |
| 17 | +# --strict-markers 直接报错。 | |
| 18 | +markers = | |
| 19 | + regression: 提交/发布前必跑的回归锚点集合 | |
| 20 | + contract: API / 服务契约(tests/ci 默认全部归入) | |
| 21 | + search: Searcher / 排序 / 召回管线 | |
| 22 | + query: QueryParser / 翻译 / 分词 | |
| 23 | + intent: 样式与 SKU 意图识别 | |
| 24 | + rerank: 粗排 / 精排 / 融合 | |
| 25 | + embedding: 文本/图像向量服务与客户端 | |
| 26 | + translation: 翻译服务与缓存 | |
| 27 | + indexer: 索引构建 / LLM enrich | |
| 28 | + suggestion: 搜索建议索引 | |
| 29 | + eval: 评估框架 | |
| 30 | + manual: 需人工起服务,CI 不跑 | ... | ... |
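For illustration, a test module conforming to this configuration might look like the following. The file name (`tests/test_rerank_example.py`) and the toy logic are hypothetical; the load-bearing part is the module-level `pytestmark` using markers registered above.

```python
# Hypothetical tests/test_rerank_example.py
import pytest

pytestmark = [pytest.mark.rerank, pytest.mark.regression]

def clamp_topn(requested: int, limit: int = 50) -> int:
    """Toy logic standing in for real rerank code."""
    return max(1, min(requested, limit))

def test_topn_clamped_to_limit():
    assert clamp_topn(200) == 50

def test_topn_floor_is_one():
    assert clamp_topn(0) == 1
```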
scripts/run_ci_tests.sh
| 1 | 1 | #!/bin/bash |
| 2 | +# CI 门禁脚本:每次提交必跑的最小集合。 | |
| 3 | +# | |
| 4 | +# 覆盖范围: | |
| 5 | +# 1. tests/ci 下的服务契约测试(HTTP/JSON schema / 路由 / 鉴权) | |
| 6 | +# 2. tests/ 下带 `contract` marker 的所有用例(冗余保障,防止 marker 与目录漂移) | |
| 7 | +# 3. 搜索主路径 + ES 查询构建器的回归锚点(search 子系统) | |
| 8 | +# | |
| 9 | +# 超出这个范围的完整回归集请用 scripts/run_regression_tests.sh。 | |
| 2 | 10 | |
| 3 | 11 | set -euo pipefail |
| 4 | 12 | |
| 5 | 13 | cd "$(dirname "$0")/.." |
| 6 | 14 | source ./activate.sh |
| 7 | 15 | |
| 8 | -echo "Running CI contract tests..." | |
| 9 | -python -m pytest tests/ci -q | |
| 16 | +echo "==> [CI-1/2] API contract tests (tests/ci + contract marker)..." | |
| 17 | +python -m pytest tests/ci tests/ -q -m contract | |
| 18 | + | |
| 19 | +echo "==> [CI-2/2] Search core regression (search marker)..." | |
| 20 | +python -m pytest tests/ -q -m "search and regression" | ... | ... |
scripts/run_regression_tests.sh
| ... | ... | @@ -0,0 +1,26 @@ |
| 1 | +#!/bin/bash | |
| 2 | +# 回归锚点脚本:发版 / 大合并前必跑的回归集合。 | |
| 3 | +# | |
| 4 | +# 选中策略:所有 @pytest.mark.regression 用例,即 docs/测试Pipeline说明.md | |
| 5 | +# “回归钩子矩阵” 中列出的各子系统锚点。 | |
| 6 | +# | |
| 7 | +# 可选参数: | |
| 8 | +# SUBSYSTEM=search ./scripts/run_regression_tests.sh # 只跑某个子系统的回归子集 | |
| 9 | +# | |
| 10 | +# 约束:本脚本不启外部依赖(ES / DeepL / LLM 全 fake)。如需真实依赖,请用 | |
| 11 | +# tests/manual 下的脚本。 | |
| 12 | + | |
| 13 | +set -euo pipefail | |
| 14 | + | |
| 15 | +cd "$(dirname "$0")/.." | |
| 16 | +source ./activate.sh | |
| 17 | + | |
| 18 | +SUBSYSTEM="${SUBSYSTEM:-}" | |
| 19 | + | |
| 20 | +if [[ -n "${SUBSYSTEM}" ]]; then | |
| 21 | + echo "==> Running regression subset: subsystem=${SUBSYSTEM}" | |
| 22 | + python -m pytest tests/ -q -m "${SUBSYSTEM} and regression" | |
| 23 | +else | |
| 24 | + echo "==> Running full regression anchor suite..." | |
| 25 | + python -m pytest tests/ -q -m regression | |
| 26 | +fi | ... | ... |
search/searcher.py
| ... | ... | @@ -370,6 +370,11 @@ class Searcher: |
| 370 | 370 | # (on the same dimension as optionN). |
| 371 | 371 | includes.add("enriched_taxonomy_attributes") |
| 372 | 372 | |
| 373 | + # Needed when inner_hits url string differs from sku.image_src but ES exposes | |
| 374 | + # _nested.offset — we re-resolve the winning url from image_embedding[offset]. | |
| 375 | + if self._has_image_signal(parsed_query): | |
| 376 | + includes.add("image_embedding") | |
| 377 | + | |
| 373 | 378 | return {"includes": sorted(includes)} |
| 374 | 379 | |
| 375 | 380 | def _fetch_hits_by_ids( | ... | ... |
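The offset-based re-resolution described in the new comment can be sketched as follows. This is a hedged illustration: the hit layout follows the generic Elasticsearch nested `inner_hits` response shape (`_nested.offset` indexes into the parent document's nested array), not a verified response from this repo.

```python
from typing import Optional

def resolve_winning_image_url(hit: dict) -> Optional[str]:
    """Re-resolve the matched nested image url via _nested.offset."""
    inner = hit.get("inner_hits", {}).get("image_embedding", {})
    inner_hits = inner.get("hits", {}).get("hits", [])
    if not inner_hits:
        return None
    offset = inner_hits[0].get("_nested", {}).get("offset")
    embeddings = hit.get("_source", {}).get("image_embedding", [])
    if offset is None or not (0 <= offset < len(embeddings)):
        return None
    return embeddings[offset].get("url")
```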
search/sku_intent_selector.py
| ... | ... | @@ -40,7 +40,8 @@ from __future__ import annotations |
| 40 | 40 | |
| 41 | 41 | from dataclasses import dataclass, field |
| 42 | 42 | from typing import Any, Callable, Dict, List, Optional, Tuple |
| 43 | -from urllib.parse import urlsplit | |
| 43 | +import posixpath | |
| 44 | +from urllib.parse import unquote, urlsplit | |
| 44 | 45 | |
| 45 | 46 | from query.style_intent import ( |
| 46 | 47 | DetectedStyleIntent, |
| ... | ... | @@ -439,6 +440,7 @@ class StyleSkuSelector: |
| 439 | 440 | # ------------------------------------------------------------------ |
| 440 | 441 | @staticmethod |
| 441 | 442 | def _normalize_url(url: Any) -> str: |
| 443 | + """host + path, no query/fragment; casefolded — primary equality key.""" | |
| 442 | 444 | raw = str(url or "").strip() |
| 443 | 445 | if not raw: |
| 444 | 446 | return "" |
| ... | ... | @@ -448,20 +450,93 @@ class StyleSkuSelector: |
| 448 | 450 | try: |
| 449 | 451 | parts = urlsplit(raw) |
| 450 | 452 | except ValueError: |
| 451 | - return raw.casefold() | |
| 453 | + return str(url).strip().casefold() | |
| 452 | 454 | host = (parts.netloc or "").casefold() |
| 453 | - path = parts.path or "" | |
| 455 | + path = unquote(parts.path or "") | |
| 454 | 456 | return f"{host}{path}".casefold() |
| 455 | 457 | |
| 458 | + @staticmethod | |
| 459 | + def _normalize_path_only(url: Any) -> str: | |
| 460 | + """Path-only key for cross-CDN / host-alias cases.""" | |
| 461 | + raw = str(url or "").strip() | |
| 462 | + if not raw: | |
| 463 | + return "" | |
| 464 | + if raw.startswith("//"): | |
| 465 | + raw = "https:" + raw | |
| 466 | + try: | |
| 467 | + parts = urlsplit(raw) | |
| 468 | + path = unquote(parts.path or "") | |
| 469 | + except ValueError: | |
| 470 | + return "" | |
| 471 | + return path.casefold().rstrip("/") | |
| 472 | + | |
| 473 | + @classmethod | |
| 474 | + def _url_filename(cls, url: Any) -> str: | |
| 475 | + p = cls._normalize_path_only(url) | |
| 476 | + if not p: | |
| 477 | + return "" | |
| 478 | + return posixpath.basename(p).casefold() | |
| 479 | + | |
| 480 | + @classmethod | |
| 481 | + def _urls_equivalent(cls, a: Any, b: Any) -> bool: | |
| 482 | + if not a or not b: | |
| 483 | + return False | |
| 484 | + na, nb = cls._normalize_url(a), cls._normalize_url(b) | |
| 485 | + if na and nb and na == nb: | |
| 486 | + return True | |
| 487 | + pa, pb = cls._normalize_path_only(a), cls._normalize_path_only(b) | |
| 488 | + if pa and pb and pa == pb: | |
| 489 | + return True | |
| 490 | + fa, fb = cls._url_filename(a), cls._url_filename(b) | |
| 491 | + if fa and fb and fa == fb and len(fa) > 4: | |
| 492 | + return True | |
| 493 | + return False | |
| 494 | + | |
| 495 | + @staticmethod | |
| 496 | + def _inner_hit_url_candidates(entry: Dict[str, Any], source: Dict[str, Any]) -> List[str]: | |
| 497 | + """URLs to try for this inner_hit: _source.url plus image_embedding[offset].url.""" | |
| 498 | + out: List[str] = [] | |
| 499 | + src = entry.get("_source") or {} | |
| 500 | + u = src.get("url") | |
| 501 | + if u: | |
| 502 | + out.append(str(u).strip()) | |
| 503 | + nested = entry.get("_nested") | |
| 504 | + if not isinstance(nested, dict): | |
| 505 | + return out | |
| 506 | + off = nested.get("offset") | |
| 507 | + if not isinstance(off, int): | |
| 508 | + return out | |
| 509 | + embs = source.get("image_embedding") | |
| 510 | + if not isinstance(embs, list) or not (0 <= off < len(embs)): | |
| 511 | + return out | |
| 512 | + emb = embs[off] | |
| 513 | + if isinstance(emb, dict) and emb.get("url"): | |
| 514 | + u2 = str(emb.get("url")).strip() | |
| 515 | + if u2 and u2 not in out: | |
| 516 | + out.append(u2) | |
| 517 | + return out | |
| 518 | + | |
| 456 | 519 | def _pick_sku_by_image( |
| 457 | 520 | self, |
| 458 | 521 | hit: Dict[str, Any], |
| 459 | 522 | source: Dict[str, Any], |
| 460 | 523 | ) -> Optional[ImagePick]: |
| 524 | + """Map ES nested image KNN inner_hits to a SKU via image URL alignment. | |
| 525 | + | |
| 526 | + ``image_pick`` is empty when: | |
| 527 | + - ES did not return ``inner_hits`` for this hit (e.g. doc outside | |
| 528 | + ``rescore.window_size`` so no exact-image rescore inner_hits; or the | |
| 529 | + nested image clause did not match this document). | |
| 530 | + - The winning nested ``url`` cannot be aligned to any ``skus[].image_src`` | |
| 531 | + even after path/filename normalization (rare CDN / encoding edge cases). | |
| 532 | + | |
| 533 | + We try ``_source.url``, ``_nested.offset`` + ``image_embedding[offset].url``, | |
| 534 | + and loose path/filename matching to reduce false negatives. | |
| 535 | + """ | |
| 461 | 536 | inner_hits = hit.get("inner_hits") |
| 462 | 537 | if not isinstance(inner_hits, dict): |
| 463 | 538 | return None |
| 464 | - top_url: Optional[str] = None | |
| 539 | + best_entry: Optional[Dict[str, Any]] = None | |
| 465 | 540 | top_score: Optional[float] = None |
| 466 | 541 | for key in _IMAGE_INNER_HITS_KEYS: |
| 467 | 542 | payload = inner_hits.get(key) |
| ... | ... | @@ -474,33 +549,36 @@ class StyleSkuSelector: |
| 474 | 549 | for entry in inner_list: |
| 475 | 550 | if not isinstance(entry, dict): |
| 476 | 551 | continue |
| 477 | - url = (entry.get("_source") or {}).get("url") | |
| 478 | - if not url: | |
| 552 | + if not self._inner_hit_url_candidates(entry, source): | |
| 479 | 553 | continue |
| 480 | 554 | try: |
| 481 | 555 | score = float(entry.get("_score") or 0.0) |
| 482 | 556 | except (TypeError, ValueError): |
| 483 | 557 | score = 0.0 |
| 484 | 558 | if top_score is None or score > top_score: |
| 485 | - top_url = str(url) | |
| 559 | + best_entry = entry | |
| 486 | 560 | top_score = score |
| 487 | - if top_url is not None: | |
| 488 | - break # Prefer the first listed inner_hits source (exact > approx). | |
| 489 | - if top_url is None: | |
| 561 | + if best_entry is not None: | |
| 562 | + break # Prefer exact_image_knn_query_hits over image_knn_query_hits. | |
| 563 | + if best_entry is None: | |
| 564 | + return None | |
| 565 | + | |
| 566 | + candidates = self._inner_hit_url_candidates(best_entry, source) | |
| 567 | + if not candidates: | |
| 490 | 568 | return None |
| 491 | 569 | |
| 492 | 570 | skus = source.get("skus") |
| 493 | 571 | if not isinstance(skus, list): |
| 494 | 572 | return None |
| 495 | - target = self._normalize_url(top_url) | |
| 496 | 573 | for sku in skus: |
| 497 | - sku_url = self._normalize_url(sku.get("image_src") or sku.get("imageSrc")) | |
| 498 | - if sku_url and sku_url == target: | |
| 499 | - return ImagePick( | |
| 500 | - sku_id=str(sku.get("sku_id") or ""), | |
| 501 | - url=top_url, | |
| 502 | - score=float(top_score or 0.0), | |
| 503 | - ) | |
| 574 | + sku_raw = sku.get("image_src") or sku.get("imageSrc") | |
| 575 | + for cand in candidates: | |
| 576 | + if self._urls_equivalent(cand, sku_raw): | |
| 577 | + return ImagePick( | |
| 578 | + sku_id=str(sku.get("sku_id") or ""), | |
| 579 | + url=cand, | |
| 580 | + score=float(top_score or 0.0), | |
| 581 | + ) | |
| 504 | 582 | return None |
| 505 | 583 | |
| 506 | 584 | # ------------------------------------------------------------------ | ... | ... |
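上面 `sku_intent_selector.py` 的改动把 URL 对齐从单一 `_normalize_url` 扩成三级兜底(host+path 精确、path-only 跨 CDN、文件名兜底)。下面用一个自包含 Python 草图还原这套级联判定,便于不起 ES 时验证边界行为;函数名与示例 URL 均为示意,并非 `StyleSkuSelector` 的逐行实现:

```python
import posixpath
from urllib.parse import unquote, urlsplit

def _norm(url: str) -> str:
    # 第一级:host + path,丢弃 query/fragment,统一 casefold
    raw = (url or "").strip()
    if raw.startswith("//"):
        raw = "https:" + raw
    parts = urlsplit(raw)
    return f"{(parts.netloc or '').casefold()}{unquote(parts.path or '')}".casefold()

def _path_only(url: str) -> str:
    # 第二级:仅 path,覆盖跨 CDN / host 别名场景
    raw = (url or "").strip()
    if raw.startswith("//"):
        raw = "https:" + raw
    return unquote(urlsplit(raw).path or "").casefold().rstrip("/")

def urls_equivalent(a: str, b: str) -> bool:
    if not a or not b:
        return False
    if _norm(a) == _norm(b):
        return True
    pa, pb = _path_only(a), _path_only(b)
    if pa and pa == pb:
        return True
    # 第三级:仅比较文件名,>4 字符才认,降低误匹配概率
    fa, fb = posixpath.basename(pa), posixpath.basename(pb)
    return bool(fa) and fa == fb and len(fa) > 4

assert urls_equivalent("https://cdn.a.com/img/P1.jpg?x=1", "//cdn.a.com/img/p1.jpg")  # 第一级
assert urls_equivalent("https://cdn-a.com/i/p1.jpg", "https://cdn-b.net/i/p1.jpg")    # 第二级
assert not urls_equivalent("https://a.com/1.jpg", "https://b.com/2.jpg")
```

第三级兜底刻意保留 `len > 4` 的阈值:短文件名(如 `a.png` 之外更短的哈希片段)在多商品图库里碰撞概率高,宁可漏配也不错配。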
tests/ci/test_service_api_contracts.py
tests/conftest.py
| 1 | -""" | |
| 2 | -pytest配置文件 | |
| 1 | +"""pytest 全局配置。 | |
| 2 | + | |
| 3 | +- 项目根路径注入(便于 `tests/` 下模块直接 `from <pkg>` 导入) | |
| 4 | +- marker / testpaths / 过滤规则的**权威来源是 `pytest.ini`**,不在这里重复定义 | |
| 3 | 5 | |
| 4 | -提供测试夹具和共享配置 | |
| 6 | +历史上这里曾定义过一批 `sample_search_config / mock_es_client / test_searcher` 等 | |
| 7 | +fixture,但 2026-Q2 起的测试全部自带 fake stub,这些 fixture 全库无人引用,已一并 | |
| 8 | +移除。新增共享 fixture 时请明确列出其被哪些测试使用,避免再次出现 dead fixtures。 | |
| 5 | 9 | """ |
| 6 | 10 | |
| 7 | 11 | import os |
| 8 | 12 | import sys |
| 9 | -import pytest | |
| 10 | -import tempfile | |
| 11 | -from typing import Dict, Any, Generator | |
| 12 | -from unittest.mock import Mock, MagicMock | |
| 13 | 13 | |
| 14 | -# 添加项目根目录到Python路径 | |
| 15 | 14 | project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) |
| 16 | 15 | sys.path.insert(0, project_root) |
| 17 | - | |
| 18 | -from config import SearchConfig, QueryConfig, IndexConfig, SPUConfig, FunctionScoreConfig, RerankConfig | |
| 19 | -from utils.es_client import ESClient | |
| 20 | -from search import Searcher | |
| 21 | -from query import QueryParser | |
| 22 | -from context import RequestContext, create_request_context | |
| 23 | - | |
| 24 | - | |
| 25 | -@pytest.fixture | |
| 26 | -def sample_index_config() -> IndexConfig: | |
| 27 | - """样例索引配置""" | |
| 28 | - return IndexConfig( | |
| 29 | - name="default", | |
| 30 | - label="默认索引", | |
| 31 | - fields=["title.zh", "brief.zh", "tags"], | |
| 32 | - boost=1.0 | |
| 33 | - ) | |
| 34 | - | |
| 35 | - | |
| 36 | -@pytest.fixture | |
| 37 | -def sample_search_config(sample_index_config) -> SearchConfig: | |
| 38 | - """样例搜索配置""" | |
| 39 | - query_config = QueryConfig( | |
| 40 | - enable_query_rewrite=True, | |
| 41 | - enable_text_embedding=True, | |
| 42 | - supported_languages=["zh", "en"] | |
| 43 | - ) | |
| 44 | - | |
| 45 | - spu_config = SPUConfig( | |
| 46 | - enabled=True, | |
| 47 | - spu_field="spu_id", | |
| 48 | - inner_hits_size=3 | |
| 49 | - ) | |
| 50 | - | |
| 51 | - function_score_config = FunctionScoreConfig() | |
| 52 | - rerank_config = RerankConfig() | |
| 53 | - | |
| 54 | - return SearchConfig( | |
| 55 | - es_index_name="test_products", | |
| 56 | - field_boosts={ | |
| 57 | - "tenant_id": 1.0, | |
| 58 | - "title.zh": 3.0, | |
| 59 | - "brief.zh": 1.5, | |
| 60 | - "tags": 1.0, | |
| 61 | - "category_path.zh": 1.5, | |
| 62 | - }, | |
| 63 | - indexes=[sample_index_config], | |
| 64 | - query_config=query_config, | |
| 65 | - function_score=function_score_config, | |
| 66 | - rerank=rerank_config, | |
| 67 | - spu_config=spu_config | |
| 68 | - ) | |
| 69 | - | |
| 70 | - | |
| 71 | -@pytest.fixture | |
| 72 | -def mock_es_client() -> Mock: | |
| 73 | - """模拟ES客户端""" | |
| 74 | - mock_client = Mock(spec=ESClient) | |
| 75 | - | |
| 76 | - # 模拟搜索响应 | |
| 77 | - mock_response = { | |
| 78 | - "hits": { | |
| 79 | - "total": {"value": 10}, | |
| 80 | - "max_score": 2.5, | |
| 81 | - "hits": [ | |
| 82 | - { | |
| 83 | - "_id": "1", | |
| 84 | - "_score": 2.5, | |
| 85 | - "_source": { | |
| 86 | - "title": {"zh": "红色连衣裙"}, | |
| 87 | - "vendor": {"zh": "测试品牌"}, | |
| 88 | - "min_price": 299.0, | |
| 89 | - "category_id": "1" | |
| 90 | - } | |
| 91 | - }, | |
| 92 | - { | |
| 93 | - "_id": "2", | |
| 94 | - "_score": 2.2, | |
| 95 | - "_source": { | |
| 96 | - "title": {"zh": "蓝色连衣裙"}, | |
| 97 | - "vendor": {"zh": "测试品牌"}, | |
| 98 | - "min_price": 399.0, | |
| 99 | - "category_id": "1" | |
| 100 | - } | |
| 101 | - } | |
| 102 | - ] | |
| 103 | - }, | |
| 104 | - "took": 15 | |
| 105 | - } | |
| 106 | - | |
| 107 | - mock_client.search.return_value = mock_response | |
| 108 | - return mock_client | |
| 109 | - | |
| 110 | - | |
| 111 | -@pytest.fixture | |
| 112 | -def test_searcher(sample_search_config, mock_es_client) -> Searcher: | |
| 113 | - """测试用Searcher实例""" | |
| 114 | - return Searcher( | |
| 115 | - es_client=mock_es_client, | |
| 116 | - config=sample_search_config | |
| 117 | - ) | |
| 118 | - | |
| 119 | - | |
| 120 | -@pytest.fixture | |
| 121 | -def test_query_parser(sample_search_config) -> QueryParser: | |
| 122 | - """测试用QueryParser实例""" | |
| 123 | - return QueryParser(sample_search_config) | |
| 124 | - | |
| 125 | - | |
| 126 | -@pytest.fixture | |
| 127 | -def test_request_context() -> RequestContext: | |
| 128 | - """测试用RequestContext实例""" | |
| 129 | - return create_request_context("test-req-001", "test-user") | |
| 130 | - | |
| 131 | - | |
| 132 | -@pytest.fixture | |
| 133 | -def sample_search_results() -> Dict[str, Any]: | |
| 134 | - """样例搜索结果""" | |
| 135 | - return { | |
| 136 | - "query": "红色连衣裙", | |
| 137 | - "expected_total": 2, | |
| 138 | - "expected_products": [ | |
| 139 | - {"title": "红色连衣裙", "min_price": 299.0}, | |
| 140 | - {"title": "蓝色连衣裙", "min_price": 399.0} | |
| 141 | - ] | |
| 142 | - } | |
| 143 | - | |
| 144 | - | |
| 145 | -@pytest.fixture | |
| 146 | -def temp_config_file() -> Generator[str, None, None]: | |
| 147 | - """临时配置文件""" | |
| 148 | - import tempfile | |
| 149 | - import yaml | |
| 150 | - | |
| 151 | - config_data = { | |
| 152 | - "es_index_name": "test_products", | |
| 153 | - "field_boosts": { | |
| 154 | - "title.zh": 3.0, | |
| 155 | - "brief.zh": 1.5, | |
| 156 | - "tags": 1.0, | |
| 157 | - "category_path.zh": 1.5 | |
| 158 | - }, | |
| 159 | - "indexes": [ | |
| 160 | - { | |
| 161 | - "name": "default", | |
| 162 | - "label": "默认索引", | |
| 163 | - "fields": ["title.zh", "brief.zh", "tags"], | |
| 164 | - "boost": 1.0 | |
| 165 | - } | |
| 166 | - ], | |
| 167 | - "query_config": { | |
| 168 | - "supported_languages": ["zh", "en"], | |
| 169 | - "default_language": "zh", | |
| 170 | - "enable_text_embedding": True, | |
| 171 | - "enable_query_rewrite": True | |
| 172 | - }, | |
| 173 | - "spu_config": { | |
| 174 | - "enabled": True, | |
| 175 | - "spu_field": "spu_id", | |
| 176 | - "inner_hits_size": 3 | |
| 177 | - }, | |
| 178 | - "ranking": { | |
| 179 | - "expression": "bm25() + 0.2*text_embedding_relevance()", | |
| 180 | - "description": "Test ranking" | |
| 181 | - }, | |
| 182 | - "function_score": { | |
| 183 | - "score_mode": "sum", | |
| 184 | - "boost_mode": "multiply", | |
| 185 | - "functions": [] | |
| 186 | - }, | |
| 187 | - "rerank": { | |
| 188 | - "rerank_window": 386 | |
| 189 | - } | |
| 190 | - } | |
| 191 | - | |
| 192 | - with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f: | |
| 193 | - yaml.dump(config_data, f) | |
| 194 | - temp_file = f.name | |
| 195 | - | |
| 196 | - yield temp_file | |
| 197 | - | |
| 198 | - # 清理 | |
| 199 | - os.unlink(temp_file) | |
| 200 | - | |
| 201 | - | |
| 202 | -@pytest.fixture | |
| 203 | -def mock_env_variables(monkeypatch): | |
| 204 | - """设置环境变量""" | |
| 205 | - monkeypatch.setenv("ES_HOST", "http://localhost:9200") | |
| 206 | - monkeypatch.setenv("ES_USERNAME", "elastic") | |
| 207 | - monkeypatch.setenv("ES_PASSWORD", "changeme") | |
| 208 | - | |
| 209 | - | |
| 210 | -# 标记配置 | |
| 211 | -pytest_plugins = [] | |
| 212 | - | |
| 213 | -# 标记定义 | |
| 214 | -def pytest_configure(config): | |
| 215 | - """配置pytest标记""" | |
| 216 | - config.addinivalue_line( | |
| 217 | - "markers", "unit: 单元测试" | |
| 218 | - ) | |
| 219 | - config.addinivalue_line( | |
| 220 | - "markers", "integration: 集成测试" | |
| 221 | - ) | |
| 222 | - config.addinivalue_line( | |
| 223 | - "markers", "api: API测试" | |
| 224 | - ) | |
| 225 | - config.addinivalue_line( | |
| 226 | - "markers", "e2e: 端到端测试" | |
| 227 | - ) | |
| 228 | - config.addinivalue_line( | |
| 229 | - "markers", "performance: 性能测试" | |
| 230 | - ) | |
| 231 | - config.addinivalue_line( | |
| 232 | - "markers", "slow: 慢速测试" | |
| 233 | - ) | |
| 234 | - | |
| 235 | - | |
| 236 | -# 测试数据 | |
| 237 | -@pytest.fixture | |
| 238 | -def test_queries(): | |
| 239 | - """测试查询集合""" | |
| 240 | - return [ | |
| 241 | - "红色连衣裙", | |
| 242 | - "wireless bluetooth headphones", | |
| 243 | - "手机 手机壳", | |
| 244 | - "laptop AND (gaming OR professional)", | |
| 245 | - "运动鞋 -价格:0-500" | |
| 246 | - ] | |
| 247 | - | |
| 248 | - | |
| 249 | -@pytest.fixture | |
| 250 | -def expected_response_structure(): | |
| 251 | - """期望的API响应结构""" | |
| 252 | - return { | |
| 253 | - "hits": list, | |
| 254 | - "total": int, | |
| 255 | - "max_score": float, | |
| 256 | - "took_ms": int, | |
| 257 | - "aggregations": dict, | |
| 258 | - "query_info": dict, | |
| 259 | - "performance_summary": dict | |
| 260 | - } | ... | ... |
tests/test_cnclip_service.py renamed to tests/manual/test_cnclip_service.py
tests/test_facet_api.py renamed to tests/manual/test_facet_api.py
tests/test_cache_keys.py
tests/test_embedding_pipeline.py
| ... | ... | @@ -21,6 +21,8 @@ from embeddings.config import CONFIG |
| 21 | 21 | from query import QueryParser |
| 22 | 22 | from context.request_context import create_request_context, set_current_request_context, clear_current_request_context |
| 23 | 23 | |
| 24 | +pytestmark = [pytest.mark.embedding, pytest.mark.regression] | |
| 25 | + | |
| 24 | 26 | |
| 25 | 27 | class _FakeRedis: |
| 26 | 28 | def __init__(self): |
| ... | ... | @@ -177,8 +179,10 @@ def test_text_embedding_encoder_cache_hit(monkeypatch): |
| 177 | 179 | out = encoder.encode(["cached-text", "new-text"]) |
| 178 | 180 | |
| 179 | 181 | assert calls["count"] == 1 |
| 180 | - assert np.allclose(out[0], cached) | |
| 181 | - assert np.allclose(out[1], np.array([0.3, 0.4], dtype=np.float32)) | |
| 182 | + # encoder returns an object-dtype ndarray of 1-D float32 vectors; cast per-row | |
| 183 | + # before numeric comparison. | |
| 184 | + assert np.allclose(np.asarray(out[0], dtype=np.float32), cached) | |
| 185 | + assert np.allclose(np.asarray(out[1], dtype=np.float32), np.array([0.3, 0.4], dtype=np.float32)) | |
| 182 | 186 | |
| 183 | 187 | |
| 184 | 188 | def test_text_embedding_encoder_forwards_request_headers(monkeypatch): | ... | ... |
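这条 cache-hit 断言的修复针对 object-dtype ndarray:encoder 以 `dtype=object` 容器存放若干 1-D float32 向量,直接对元素 `np.allclose` 会走 object 比较路径。下面是一个最小复现草图(数据为示例值,非真实 embedding):

```python
import numpy as np

# 模拟 encoder 返回:object-dtype 容器,每个元素是独立的 1-D float32 向量
out = np.empty(2, dtype=object)
out[0] = np.array([0.1, 0.2], dtype=np.float32)
out[1] = np.array([0.3, 0.4], dtype=np.float32)

cached = np.array([0.1, 0.2], dtype=np.float32)
# 逐行 asarray 成 float32 再做数值比较,避开 object-dtype 路径
assert np.allclose(np.asarray(out[0], dtype=np.float32), cached)
assert np.allclose(np.asarray(out[1], dtype=np.float32),
                   np.array([0.3, 0.4], dtype=np.float32))
```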
tests/test_embedding_service_limits.py
tests/test_embedding_service_priority.py
| ... | ... | @@ -2,6 +2,10 @@ import threading |
| 2 | 2 | |
| 3 | 3 | import embeddings.server as emb_server |
| 4 | 4 | |
| 5 | +import pytest | |
| 6 | + | |
| 7 | +pytestmark = [pytest.mark.embedding, pytest.mark.regression] | |
| 8 | + | |
| 5 | 9 | |
| 6 | 10 | def test_text_inflight_limiter_priority_bypass(): |
| 7 | 11 | limiter = emb_server._InflightLimiter(name="text", limit=1) |
| ... | ... | @@ -30,6 +34,7 @@ def test_text_dispatch_prefers_high_priority_queue(): |
| 30 | 34 | normalized=["online"], |
| 31 | 35 | effective_normalize=True, |
| 32 | 36 | request_id="high", |
| 37 | + user_id="u-high", | |
| 33 | 38 | priority=1, |
| 34 | 39 | created_at=0.0, |
| 35 | 40 | done=threading.Event(), |
| ... | ... | @@ -38,6 +43,7 @@ def test_text_dispatch_prefers_high_priority_queue(): |
| 38 | 43 | normalized=["offline"], |
| 39 | 44 | effective_normalize=True, |
| 40 | 45 | request_id="normal", |
| 46 | + user_id="u-normal", | |
| 41 | 47 | priority=0, |
| 42 | 48 | created_at=0.0, |
| 43 | 49 | done=threading.Event(), | ... | ... |
tests/test_es_query_builder.py
tests/test_es_query_builder_text_recall_languages.py
| ... | ... | @@ -14,6 +14,10 @@ import numpy as np |
| 14 | 14 | from query.keyword_extractor import KEYWORDS_QUERY_BASE_KEY |
| 15 | 15 | from search.es_query_builder import ESQueryBuilder |
| 16 | 16 | |
| 17 | +import pytest | |
| 18 | + | |
| 19 | +pytestmark = [pytest.mark.search, pytest.mark.regression] | |
| 20 | + | |
| 17 | 21 | |
| 18 | 22 | def _builder_multilingual_title_only(*, default_language: str = "en") -> ESQueryBuilder: |
| 19 | 23 | """Minimal builder: only title.{lang} for easy field assertions.""" |
| ... | ... | @@ -135,8 +139,13 @@ def test_zh_query_index_zh_en_includes_base_zh_and_trans_en(): |
| 135 | 139 | assert "title.en" in _title_fields(idx["base_query_trans_en"]) |
| 136 | 140 | |
| 137 | 141 | |
| 138 | -def test_keywords_combined_fields_second_must_same_fields_and_50pct(): | |
| 139 | - """When ParsedQuery.keywords_queries is set, inner must has two boosted combined_fields.""" | |
| 142 | +def test_keywords_combined_fields_second_must_shares_fields_with_main_query(): | |
| 143 | + """When ParsedQuery.keywords_queries is set, inner must has two boosted combined_fields. | |
| 144 | + | |
| 145 | + The second must sub-clause reuses the primary clause's field set and applies a | |
| 146 | + tuned minimum_should_match / boost to keep keyword recall under control; see | |
| 147 | + `search/es_query_builder.py` ``_keywords_combined_fields_sub_must``. | |
| 148 | + """ | |
| 140 | 149 | qb = _builder_multilingual_title_only(default_language="en") |
| 141 | 150 | parsed = SimpleNamespace( |
| 142 | 151 | rewritten_query="连衣裙", |
| ... | ... | @@ -153,16 +162,16 @@ def test_keywords_combined_fields_second_must_same_fields_and_50pct(): |
| 153 | 162 | assert bm[0]["combined_fields"]["query"] == "连衣裙" |
| 154 | 163 | assert bm[0]["combined_fields"]["boost"] == 2.0 |
| 155 | 164 | assert bm[1]["combined_fields"]["query"] == "连衣 裙" |
| 156 | - assert bm[1]["combined_fields"]["minimum_should_match"] == "50%" | |
| 157 | - assert bm[1]["combined_fields"]["boost"] == 0.6 | |
| 165 | + assert bm[1]["combined_fields"]["minimum_should_match"] == "60%" | |
| 166 | + assert bm[1]["combined_fields"]["boost"] == 0.8 | |
| 158 | 167 | assert bm[1]["combined_fields"]["fields"] == bm[0]["combined_fields"]["fields"] |
| 159 | 168 | trans = idx["base_query_trans_en"] |
| 160 | 169 | assert trans["minimum_should_match"] == 1 |
| 161 | 170 | tm = _combined_fields_must(trans) |
| 162 | 171 | assert len(tm) == 2 |
| 163 | 172 | assert tm[1]["combined_fields"]["query"] == "dress" |
| 164 | - assert tm[1]["combined_fields"]["minimum_should_match"] == "50%" | |
| 165 | - assert tm[1]["combined_fields"]["boost"] == 0.6 | |
| 173 | + assert tm[1]["combined_fields"]["minimum_should_match"] == "60%" | |
| 174 | + assert tm[1]["combined_fields"]["boost"] == 0.8 | |
| 166 | 175 | |
| 167 | 176 | |
| 168 | 177 | def test_keywords_omitted_when_same_as_main_combined_fields_query(): | ... | ... |
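上面对齐后的期望值可以用一个纯数据草图直观表达:inner must 的第二个 combined_fields 子句复用主子句的字段集,只改 `minimum_should_match` 与 `boost`(字段名为示例值,真实字段集由 builder 配置决定):

```python
# 主子句:rewritten_query,boost 2.0
main = {"combined_fields": {"query": "连衣裙", "fields": ["title.zh"], "boost": 2.0}}
# keywords 子句:空格分词后的 keywords_queries,共享主子句字段集
kw = {"combined_fields": {
    "query": "连衣 裙",
    "fields": main["combined_fields"]["fields"],
    "minimum_should_match": "60%",
    "boost": 0.8,
}}
inner_must = [main, kw]

assert kw["combined_fields"]["fields"] == main["combined_fields"]["fields"]
assert kw["combined_fields"]["minimum_should_match"] == "60%"
```

MSM 从 50% 提到 60%、boost 从 0.6 提到 0.8,属于收紧 keyword 召回的同一组调参,测试改名后断言与现行实现一一对应。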
tests/test_eval_framework_clients.py
| ... | ... | @@ -4,6 +4,8 @@ import requests |
| 4 | 4 | from scripts.evaluation.eval_framework.clients import DashScopeLabelClient |
| 5 | 5 | from scripts.evaluation.eval_framework.utils import build_label_doc_line |
| 6 | 6 | |
| 7 | +pytestmark = [pytest.mark.eval] | |
| 8 | + | |
| 7 | 9 | |
| 8 | 10 | def _http_error(status_code: int, body: str) -> requests.exceptions.HTTPError: |
| 9 | 11 | response = requests.Response() | ... | ... |
tests/test_eval_metrics.py
| 1 | 1 | """Tests for search evaluation ranking metrics (NDCG, ERR).""" |
| 2 | 2 | |
| 3 | +import math | |
| 4 | + | |
| 5 | +import pytest | |
| 6 | + | |
| 7 | +pytestmark = [pytest.mark.eval, pytest.mark.regression] | |
| 8 | + | |
| 3 | 9 | from scripts.evaluation.eval_framework.constants import ( |
| 4 | - RELEVANCE_EXACT, | |
| 5 | - RELEVANCE_HIGH, | |
| 6 | - RELEVANCE_IRRELEVANT, | |
| 7 | - RELEVANCE_LOW, | |
| 10 | + RELEVANCE_LV0, | |
| 11 | + RELEVANCE_LV1, | |
| 12 | + RELEVANCE_LV2, | |
| 13 | + RELEVANCE_LV3, | |
| 14 | + STOP_PROB_MAP, | |
| 8 | 15 | ) |
| 9 | 16 | from scripts.evaluation.eval_framework.metrics import compute_query_metrics |
| 10 | 17 | |
| 11 | 18 | |
| 12 | -def test_err_matches_documented_three_item_examples(): | |
| 13 | - # Model A: [Exact, Irrelevant, High] -> ERR ≈ 0.992667 | |
| 14 | - m_a = compute_query_metrics( | |
| 15 | - [RELEVANCE_EXACT, RELEVANCE_IRRELEVANT, RELEVANCE_HIGH], | |
| 16 | - ideal_labels=[RELEVANCE_EXACT], | |
| 17 | - ) | |
| 18 | - assert abs(m_a["ERR@5"] - (0.99 + (1.0 / 3.0) * 0.8 * 0.01)) < 1e-5 | |
| 19 | - | |
| 20 | - # Model B: [High, Low, Exact] -> ERR ≈ 0.8694 | |
| 21 | - m_b = compute_query_metrics( | |
| 22 | - [RELEVANCE_HIGH, RELEVANCE_LOW, RELEVANCE_EXACT], | |
| 23 | - ideal_labels=[RELEVANCE_EXACT], | |
| 24 | - ) | |
| 25 | - expected_b = 0.8 + 0.5 * 0.1 * 0.2 + (1.0 / 3.0) * 0.99 * 0.18 | |
| 26 | - assert abs(m_b["ERR@5"] - expected_b) < 1e-5 | |
| 19 | +def _expected_err(labels): | |
| 20 | + err = 0.0 | |
| 21 | + product = 1.0 | |
| 22 | + for i, label in enumerate(labels, start=1): | |
| 23 | + p = STOP_PROB_MAP[label] | |
| 24 | + err += (1.0 / i) * p * product | |
| 25 | + product *= 1.0 - p | |
| 26 | + return err | |
| 27 | + | |
| 28 | + | |
| 29 | +def test_err_matches_cascade_formula_on_four_level_labels(): | |
| 30 | + """ERR@k must equal the textbook cascade formula against the four-level label set. | |
| 31 | + | |
| 32 | + The metric is the primary ranking signal (see `PRIMARY_METRIC_KEYS` in | |
| 33 | + `eval_framework.metrics`); any regression here invalidates the whole | |
| 34 | + evaluation pipeline. | |
| 35 | + """ | |
| 36 | + | |
| 37 | + ranked_a = [RELEVANCE_LV3, RELEVANCE_LV0, RELEVANCE_LV2] | |
| 38 | + ranked_b = [RELEVANCE_LV2, RELEVANCE_LV1, RELEVANCE_LV3] | |
| 39 | + | |
| 40 | + m_a = compute_query_metrics(ranked_a, ideal_labels=[RELEVANCE_LV3]) | |
| 41 | + m_b = compute_query_metrics(ranked_b, ideal_labels=[RELEVANCE_LV3]) | |
| 42 | + | |
| 43 | + assert math.isclose(m_a["ERR@5"], _expected_err(ranked_a), abs_tol=1e-5) | |
| 44 | + assert math.isclose(m_b["ERR@5"], _expected_err(ranked_b), abs_tol=1e-5) | |
| 45 | + assert m_a["ERR@5"] > m_b["ERR@5"] | |
| 46 | + | |
| 47 | + | |
| 48 | +def test_ndcg_at_k_is_1_when_actual_equals_ideal(): | |
| 49 | + labels = [RELEVANCE_LV3, RELEVANCE_LV2, RELEVANCE_LV1] | |
| 50 | + metrics = compute_query_metrics(labels, ideal_labels=labels) | |
| 51 | + assert math.isclose(metrics["NDCG@5"], 1.0, abs_tol=1e-9) | |
| 52 | + assert math.isclose(metrics["NDCG@20"], 1.0, abs_tol=1e-9) | |
| 53 | + | |
| 54 | + | |
| 55 | +def test_all_irrelevant_zeroes_out_primary_signals(): | |
| 56 | + labels = [RELEVANCE_LV0, RELEVANCE_LV0, RELEVANCE_LV0] | |
| 57 | + metrics = compute_query_metrics(labels, ideal_labels=[RELEVANCE_LV3]) | |
| 58 | + assert metrics["ERR@10"] == 0.0 | |
| 59 | + assert metrics["NDCG@20"] == 0.0 | |
| 60 | + assert metrics["Strong_Precision@10"] == 0.0 | |
| 61 | + assert metrics["Primary_Metric_Score"] == 0.0 | ... | ... |
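级联 ERR 可以手工走一遍数值。下面以被删除的旧用例隐含的停留概率为示例值(Lv3≈0.99、Lv2≈0.8、Lv1≈0.1、Lv0=0;新实现的权威映射以 `constants.STOP_PROB_MAP` 为准,此处仅作公式演示):

```python
# 假设的 label → 停留概率映射(示例值,非 STOP_PROB_MAP 的权威定义)
STOP_PROB = {3: 0.99, 2: 0.8, 1: 0.1, 0: 0.0}

def err_at_k(labels, k=5):
    # textbook cascade: ERR@k = sum_i (1/i) * P(stop at i) * prod_{j<i} (1 - P_j)
    err, survive = 0.0, 1.0
    for i, label in enumerate(labels[:k], start=1):
        p = STOP_PROB[label]
        err += survive * p / i
        survive *= 1.0 - p
    return err

print(round(err_at_k([3, 0, 2]), 6))  # 0.992667,与旧文档示例一致
print(round(err_at_k([2, 1, 3]), 6))  # 0.8694
```

首位放 Lv3 后 survive 立即衰减到 0.01,后续位置几乎不再贡献分数,这正是 ERR 对"第一条就错"高度敏感、适合做主指标的原因。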
tests/test_keywords_query.py deleted
| ... | ... | @@ -1,115 +0,0 @@ |
| 1 | -import hanlp | |
| 2 | -from typing import List, Tuple, Dict, Any | |
| 3 | - | |
| 4 | -class KeywordExtractor: | |
| 5 | - """ | |
| 6 | - 基于 HanLP 的名词关键词提取器 | |
| 7 | - """ | |
| 8 | - def __init__(self): | |
| 9 | - # 加载带位置信息的分词模型(细粒度) | |
| 10 | - self.tok = hanlp.load(hanlp.pretrained.tok.CTB9_TOK_ELECTRA_BASE_CRF) | |
| 11 | - self.tok.config.output_spans = True # 启用位置输出 | |
| 12 | - | |
| 13 | - # 加载词性标注模型 | |
| 14 | - self.pos_tag = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL) | |
| 15 | - | |
| 16 | - def extract_keywords(self, query: str) -> str: | |
| 17 | - """ | |
| 18 | - 从查询中提取关键词(名词,长度 ≥ 2) | |
| 19 | - | |
| 20 | - Args: | |
| 21 | - query: 输入文本 | |
| 22 | - | |
| 23 | - Returns: | |
| 24 | - 拼接后的关键词字符串,非连续词之间自动插入空格 | |
| 25 | - """ | |
| 26 | - query = query.strip() | |
| 27 | - # 分词结果带位置:[[word, start, end], ...] | |
| 28 | - tok_result_with_position = self.tok(query) | |
| 29 | - tok_result = [x[0] for x in tok_result_with_position] | |
| 30 | - | |
| 31 | - # 词性标注 | |
| 32 | - pos_tag_result = list(zip(tok_result, self.pos_tag(tok_result))) | |
| 33 | - | |
| 34 | - # 需要忽略的词 | |
| 35 | - ignore_keywords = ['玩具'] | |
| 36 | - | |
| 37 | - keywords = [] | |
| 38 | - last_end_pos = 0 | |
| 39 | - | |
| 40 | - for (word, postag), (_, start_pos, end_pos) in zip(pos_tag_result, tok_result_with_position): | |
| 41 | - if len(word) >= 2 and postag.startswith('N'): | |
| 42 | - if word in ignore_keywords: | |
| 43 | - continue | |
| 44 | - # 如果当前词与上一个词在原文中不连续,插入空格 | |
| 45 | - if start_pos != last_end_pos and keywords: | |
| 46 | - keywords.append(" ") | |
| 47 | - keywords.append(word) | |
| 48 | - last_end_pos = end_pos | |
| 49 | - # 可选:打印调试信息 | |
| 50 | - # print(f'分词: {word} | 词性: {postag} | 起始: {start_pos} | 结束: {end_pos}') | |
| 51 | - | |
| 52 | - return "".join(keywords).strip() | |
| 53 | - | |
| 54 | - | |
| 55 | -# 测试代码 | |
| 56 | -if __name__ == "__main__": | |
| 57 | - extractor = KeywordExtractor() | |
| 58 | - | |
| 59 | - test_queries = [ | |
| 60 | - # 中文(保留 9 个代表性查询) | |
| 61 | - "2.4G遥控大蛇", | |
| 62 | - "充气的篮球", | |
| 63 | - "遥控 塑料 飞船 汽车 ", | |
| 64 | - "亚克力相框", | |
| 65 | - "8寸 搪胶蘑菇钉", | |
| 66 | - "7寸娃娃", | |
| 67 | - "太空沙套装", | |
| 68 | - "脚蹬工程车", | |
| 69 | - "捏捏乐钥匙扣", | |
| 70 | - | |
| 71 | - # 英文(新增) | |
| 72 | - "plastic toy car", | |
| 73 | - "remote control helicopter", | |
| 74 | - "inflatable beach ball", | |
| 75 | - "music keychain", | |
| 76 | - "sand play set", | |
| 77 | - # 常见商品搜索 | |
| 78 | - "plastic dinosaur toy", | |
| 79 | - "wireless bluetooth speaker", | |
| 80 | - "4K action camera", | |
| 81 | - "stainless steel water bottle", | |
| 82 | - "baby stroller with cup holder", | |
| 83 | - | |
| 84 | - # 疑问式 / 自然语言 | |
| 85 | - "what is the best smartphone under 500 dollars", | |
| 86 | - "how to clean a laptop screen", | |
| 87 | - "where can I buy organic coffee beans", | |
| 88 | - | |
| 89 | - # 含数字、特殊字符 | |
| 90 | - "USB-C to HDMI adapter 4K", | |
| 91 | - "LED strip lights 16.4ft", | |
| 92 | - "Nintendo Switch OLED model", | |
| 93 | - "iPhone 15 Pro Max case", | |
| 94 | - | |
| 95 | - # 简短词组 | |
| 96 | - "gaming mouse", | |
| 97 | - "mechanical keyboard", | |
| 98 | - "wireless earbuds", | |
| 99 | - | |
| 100 | - # 长尾词 | |
| 101 | - "rechargeable AA batteries with charger", | |
| 102 | - "foldable picnic blanket waterproof", | |
| 103 | - | |
| 104 | - # 商品属性组合 | |
| 105 | - "women's running shoes size 8", | |
| 106 | - "men's cotton t-shirt crew neck", | |
| 107 | - | |
| 108 | - | |
| 109 | - # 其他语种(保留原样,用于多语言测试) | |
| 110 | - "свет USB с пультом дистанционного управления красочные", # 俄语 | |
| 111 | - ] | |
| 112 | - | |
| 113 | - for q in test_queries: | |
| 114 | - keywords = extractor.extract_keywords(q) | |
| 115 | - print(f"{q:30} => {keywords}") |
tests/test_llm_enrichment_batch_fill.py
| ... | ... | @@ -6,6 +6,10 @@ import pandas as pd |
| 6 | 6 | |
| 7 | 7 | from indexer.document_transformer import SPUDocumentTransformer |
| 8 | 8 | |
| 9 | +import pytest | |
| 10 | + | |
| 11 | +pytestmark = [pytest.mark.indexer, pytest.mark.regression] | |
| 12 | + | |
| 9 | 13 | |
| 10 | 14 | def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch): |
| 11 | 15 | seen_calls: List[Dict[str, Any]] = [] | ... | ... |
tests/test_process_products_batching.py
| ... | ... | @@ -4,6 +4,10 @@ from typing import Any, Dict, List |
| 4 | 4 | |
| 5 | 5 | import indexer.product_enrich as process_products |
| 6 | 6 | |
| 7 | +import pytest | |
| 8 | + | |
| 9 | +pytestmark = [pytest.mark.indexer, pytest.mark.regression] | |
| 10 | + | |
| 7 | 11 | |
| 8 | 12 | def _mk_products(n: int) -> List[Dict[str, str]]: |
| 9 | 13 | return [{"id": str(i), "title": f"title-{i}"} for i in range(n)] | ... | ... |
tests/test_product_enrich_partial_mode.py
| ... | ... | @@ -9,6 +9,10 @@ import types |
| 9 | 9 | from pathlib import Path |
| 10 | 10 | from unittest import mock |
| 11 | 11 | |
| 12 | +import pytest | |
| 13 | + | |
| 14 | +pytestmark = [pytest.mark.indexer, pytest.mark.regression] | |
| 15 | + | |
| 12 | 16 | |
| 13 | 17 | def _load_product_enrich_module(): |
| 14 | 18 | if "dotenv" not in sys.modules: |
| ... | ... | @@ -75,6 +79,12 @@ def test_create_prompt_splits_shared_context_and_localized_tail(): |
| 75 | 79 | |
| 76 | 80 | |
| 77 | 81 | def test_create_prompt_supports_taxonomy_analysis_kind(): |
| 82 | + """Taxonomy schema must produce prompts for every language it declares. | |
| 83 | + | |
| 84 | + Unsupported (schema, lang) combinations return ``(None, None, None)`` so the | |
| 85 | + caller (``process_batch``) can mark the batch as failed without calling LLM, | |
| 86 | + instead of silently emitting garbage. | |
| 87 | + """ | |
| 78 | 88 | products = [{"id": "1", "title": "linen dress"}] |
| 79 | 89 | |
| 80 | 90 | shared_zh, user_zh, prefix_zh = product_enrich.create_prompt( |
| ... | ... | @@ -82,18 +92,26 @@ def test_create_prompt_supports_taxonomy_analysis_kind(): |
| 82 | 92 | target_lang="zh", |
| 83 | 93 | analysis_kind="taxonomy", |
| 84 | 94 | ) |
| 85 | - shared_fr, user_fr, prefix_fr = product_enrich.create_prompt( | |
| 95 | + shared_en, user_en, prefix_en = product_enrich.create_prompt( | |
| 86 | 96 | products, |
| 87 | - target_lang="fr", | |
| 97 | + target_lang="en", | |
| 88 | 98 | analysis_kind="taxonomy", |
| 89 | 99 | ) |
| 90 | 100 | |
| 91 | 101 | assert "apparel attribute taxonomy" in shared_zh |
| 92 | 102 | assert "1. linen dress" in shared_zh |
| 93 | 103 | assert "Language: Chinese" in user_zh |
| 94 | - assert "Language: French" in user_fr | |
| 104 | + assert "Language: English" in user_en | |
| 95 | 105 | assert prefix_zh.startswith("| 序号 | 品类 | 目标性别 |") |
| 96 | - assert prefix_fr.startswith("| No. | Product Type | Target Gender |") | |
| 106 | + assert prefix_en.startswith("| No. | Product Type | Target Gender |") | |
| 107 | + | |
| 108 | + # Unsupported (schema, lang) must return a sentinel. French is not declared | |
| 109 | + # by any taxonomy schema. | |
| 110 | + assert product_enrich.create_prompt( | |
| 111 | + products, | |
| 112 | + target_lang="fr", | |
| 113 | + analysis_kind="taxonomy", | |
| 114 | + ) == (None, None, None) | |
| 97 | 115 | |
| 98 | 116 | |
| 99 | 117 | def test_call_llm_logs_shared_context_once_and_verbose_contains_full_requests(): |
| ... | ... | @@ -573,7 +591,11 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only(): |
| 573 | 591 | seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products))) |
| 574 | 592 | if analysis_kind == "taxonomy": |
| 575 | 593 | assert category_taxonomy_profile == "toys" |
| 576 | - assert target_lang == "en" | |
| 594 | + # Non-apparel taxonomy profiles only emit en; mirror the real | |
| 595 | + # `analyze_products` by returning an empty list for unsupported | |
| 596 | + # langs, so the caller silently drops zh. | |
| 597 | + if target_lang != "en": | |
| 598 | + return [] | |
| 577 | 599 | return [ |
| 578 | 600 | { |
| 579 | 601 | "id": products[0]["id"], |
| ... | ... | @@ -638,7 +660,6 @@ def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only(): |
| 638 | 660 | ], |
| 639 | 661 | } |
| 640 | 662 | ] |
| 641 | - assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls | |
| 642 | 663 | assert ("taxonomy", "en", "toys", ("2",)) in seen_calls |
| 643 | 664 | |
| 644 | 665 | ... | ... |
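The `(None, None, None)` sentinel contract asserted in the hunks above can be sketched as follows. `create_prompt_stub`, `process_batch_stub`, and the `SUPPORTED` matrix are hypothetical stand-ins, not the real implementations; only the early-exit behavior (fail the batch without ever calling the LLM) is taken from the diff:

```python
# Hypothetical sketch of the sentinel contract exercised by the tests above.
from typing import Optional, Tuple

Prompt = Tuple[Optional[str], Optional[str], Optional[str]]

# Assumed support matrix for illustration only.
SUPPORTED = {("taxonomy", "zh"), ("taxonomy", "en")}


def create_prompt_stub(products, target_lang: str, analysis_kind: str) -> Prompt:
    if (analysis_kind, target_lang) not in SUPPORTED:
        # Sentinel: the caller must not call the LLM for this combination.
        return (None, None, None)
    shared = f"{analysis_kind} prompt for {len(products)} products"
    return (shared, f"Language: {target_lang}", "| No. |")


def process_batch_stub(products, target_lang: str) -> dict:
    shared, user, prefix = create_prompt_stub(products, target_lang, "taxonomy")
    if shared is None:
        # Unsupported (schema, lang): mark the batch failed, no LLM call.
        return {"status": "failed", "reason": f"unsupported lang {target_lang}"}
    return {"status": "ok", "prompt": (shared, user, prefix)}
```

The point of the sentinel over an exception is that `process_batch` can record a per-batch failure while sibling batches in other languages proceed normally.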
tests/test_product_title_exclusion.py
| ... | ... | @@ -6,6 +6,10 @@ from query.product_title_exclusion import ( |
| 6 | 6 | ProductTitleExclusionRegistry, |
| 7 | 7 | ) |
| 8 | 8 | |
| 9 | +import pytest | |
| 10 | + | |
| 11 | +pytestmark = [pytest.mark.intent, pytest.mark.regression] | |
| 12 | + | |
| 9 | 13 | |
| 10 | 14 | def test_product_title_exclusion_detector_matches_translated_english_token(): |
| 11 | 15 | query_config = QueryConfig( | ... | ... |
tests/test_query_parser_mixed_language.py
| 1 | 1 | from config import FunctionScoreConfig, IndexConfig, QueryConfig, RerankConfig, SPUConfig, SearchConfig |
| 2 | 2 | from query.query_parser import QueryParser |
| 3 | 3 | |
| 4 | +import pytest | |
| 5 | + | |
| 6 | +pytestmark = [pytest.mark.query, pytest.mark.regression] | |
| 7 | + | |
| 4 | 8 | |
| 5 | 9 | class _DummyTranslator: |
| 6 | 10 | def translate(self, text, target_lang, source_lang, scene, model_name): | ... | ... |
tests/test_rerank_client.py
| ... | ... | @@ -3,6 +3,10 @@ from math import isclose |
| 3 | 3 | from config.schema import CoarseRankFusionConfig, RerankFusionConfig |
| 4 | 4 | from search.rerank_client import coarse_resort_hits, fuse_scores_and_resort, run_lightweight_rerank |
| 5 | 5 | |
| 6 | +import pytest | |
| 7 | + | |
| 8 | +pytestmark = [pytest.mark.rerank, pytest.mark.regression] | |
| 9 | + | |
| 6 | 10 | |
| 7 | 11 | def test_fuse_scores_and_resort_aggregates_text_components_and_keeps_rerank_primary(): |
| 8 | 12 | hits = [ | ... | ... |
tests/test_rerank_provider_topn.py
| ... | ... | @@ -4,6 +4,10 @@ from typing import Any, Dict |
| 4 | 4 | |
| 5 | 5 | from providers.rerank import HttpRerankProvider |
| 6 | 6 | |
| 7 | +import pytest | |
| 8 | + | |
| 9 | +pytestmark = [pytest.mark.rerank, pytest.mark.regression] | |
| 10 | + | |
| 7 | 11 | |
| 8 | 12 | class _FakeResponse: |
| 9 | 13 | def __init__(self, status_code: int, data: Dict[str, Any]): | ... | ... |
tests/test_rerank_query_text.py
| ... | ... | @@ -2,6 +2,10 @@ |
| 2 | 2 | |
| 3 | 3 | from query.query_parser import ParsedQuery, rerank_query_text |
| 4 | 4 | |
| 5 | +import pytest | |
| 6 | + | |
| 7 | +pytestmark = [pytest.mark.rerank, pytest.mark.regression] | |
| 8 | + | |
| 5 | 9 | |
| 6 | 10 | def test_rerank_query_text_zh_uses_original(): |
| 7 | 11 | assert rerank_query_text("你好", detected_language="zh", translations={"en": "hello"}) == "你好" | ... | ... |
tests/test_reranker_dashscope_backend.py
| ... | ... | @@ -7,6 +7,8 @@ import pytest |
| 7 | 7 | from reranker.backends import get_rerank_backend |
| 8 | 8 | from reranker.backends.dashscope_rerank import DashScopeRerankBackend |
| 9 | 9 | |
| 10 | +pytestmark = [pytest.mark.rerank, pytest.mark.regression] | |
| 11 | + | |
| 10 | 12 | |
| 11 | 13 | @pytest.fixture(autouse=True) |
| 12 | 14 | def _clear_global_dashscope_key(monkeypatch): | ... | ... |
tests/test_reranker_qwen3_gguf_backend.py
| ... | ... | @@ -6,6 +6,10 @@ import types |
| 6 | 6 | from reranker.backends import get_rerank_backend |
| 7 | 7 | from reranker.backends.qwen3_gguf import Qwen3GGUFRerankerBackend |
| 8 | 8 | |
| 9 | +import pytest | |
| 10 | + | |
| 11 | +pytestmark = [pytest.mark.rerank, pytest.mark.regression] | |
| 12 | + | |
| 9 | 13 | |
| 10 | 14 | class _FakeLlama: |
| 11 | 15 | def __init__(self, model_path: str | None = None, **kwargs): | ... | ... |
tests/test_reranker_server_topn.py
tests/test_search_evaluation_datasets.py
| 1 | 1 | from config.loader import get_app_config |
| 2 | 2 | from scripts.evaluation.eval_framework.datasets import resolve_dataset |
| 3 | 3 | |
| 4 | +import pytest | |
| 5 | + | |
| 6 | +pytestmark = [pytest.mark.eval] | |
| 7 | + | |
| 4 | 8 | |
| 5 | 9 | def test_search_evaluation_registry_contains_expected_datasets() -> None: |
| 6 | 10 | se = get_app_config().search_evaluation | ... | ... |
tests/test_search_rerank_window.py
| ... | ... | @@ -22,6 +22,10 @@ from context import create_request_context |
| 22 | 22 | from query.style_intent import DetectedStyleIntent, StyleIntentProfile |
| 23 | 23 | from search.searcher import Searcher |
| 24 | 24 | |
| 25 | +import pytest | |
| 26 | + | |
| 27 | +pytestmark = [pytest.mark.search, pytest.mark.regression] | |
| 28 | + | |
| 25 | 29 | |
| 26 | 30 | @dataclass |
| 27 | 31 | class _FakeParsedQuery: | ... | ... |
tests/test_sku_intent_selector.py
| ... | ... | @@ -6,6 +6,8 @@ from config import QueryConfig |
| 6 | 6 | from query.style_intent import DetectedStyleIntent, StyleIntentProfile, StyleIntentRegistry |
| 7 | 7 | from search.sku_intent_selector import StyleSkuSelector |
| 8 | 8 | |
| 9 | +pytestmark = [pytest.mark.intent, pytest.mark.regression] | |
| 10 | + | |
| 9 | 11 | |
| 10 | 12 | def test_style_sku_selector_matches_first_sku_by_attribute_terms(): |
| 11 | 13 | registry = StyleIntentRegistry.from_query_config( |
| ... | ... | @@ -537,3 +539,73 @@ def test_image_pick_ignored_when_text_matches_but_visual_url_not_in_text_set(): |
| 537 | 539 | assert decision.selected_sku_id == "khaki" |
| 538 | 540 | assert decision.final_source == "option" |
| 539 | 541 | assert decision.image_pick_sku_id == "black" |
| 542 | + | |
| 543 | + | |
| 544 | +def test_image_pick_matches_when_inner_hit_url_has_query_string(): | |
| 545 | +    """Inner hit URL carries a query string, the SKU URL has none; after normalization they should align.""" | |
| 546 | + selector = StyleSkuSelector(_color_registry()) | |
| 547 | + parsed_query = SimpleNamespace(style_intent_profile=None) | |
| 548 | + hits = [ | |
| 549 | + { | |
| 550 | + "_id": "spu-1", | |
| 551 | + "_source": { | |
| 552 | + "skus": [ | |
| 553 | + { | |
| 554 | + "sku_id": "s1", | |
| 555 | + "image_src": "https://cdn/img/p.jpg", | |
| 556 | + }, | |
| 557 | + ], | |
| 558 | + }, | |
| 559 | + "inner_hits": { | |
| 560 | + "exact_image_knn_query_hits": { | |
| 561 | + "hits": { | |
| 562 | + "hits": [ | |
| 563 | + { | |
| 564 | + "_score": 0.8, | |
| 565 | + "_source": {"url": "https://cdn/img/p.jpg?width=800&quality=85"}, | |
| 566 | + } | |
| 567 | + ] | |
| 568 | + } | |
| 569 | + } | |
| 570 | + }, | |
| 571 | + } | |
| 572 | + ] | |
| 573 | + d = selector.prepare_hits(hits, parsed_query)["spu-1"] | |
| 574 | + assert d.selected_sku_id == "s1" | |
| 575 | + assert d.final_source == "image" | |
| 576 | + | |
| 577 | + | |
| 578 | +def test_image_pick_uses_nested_offset_and_image_embedding_when_needed(): | |
| 579 | +    """When _source.url disagrees with the SKU spelling, use the nested offset to fetch the canonical url from image_embedding.""" | |
| 580 | + selector = StyleSkuSelector(_color_registry()) | |
| 581 | + parsed_query = SimpleNamespace(style_intent_profile=None) | |
| 582 | + hits = [ | |
| 583 | + { | |
| 584 | + "_id": "spu-1", | |
| 585 | + "_source": { | |
| 586 | + "image_embedding": [ | |
| 587 | + {"url": "https://cdn/a/spu.jpg"}, | |
| 588 | + {"url": "https://cdn/b/sku-match.jpg"}, | |
| 589 | + ], | |
| 590 | + "skus": [ | |
| 591 | + {"sku_id": "sku-a", "image_src": "//cdn/b/sku-match.jpg"}, | |
| 592 | + ], | |
| 593 | + }, | |
| 594 | + "inner_hits": { | |
| 595 | + "exact_image_knn_query_hits": { | |
| 596 | + "hits": { | |
| 597 | + "hits": [ | |
| 598 | + { | |
| 599 | + "_score": 0.91, | |
| 600 | + "_nested": {"field": "image_embedding", "offset": 1}, | |
| 601 | + "_source": {"url": "https://wrong.example/x.jpg"}, | |
| 602 | + } | |
| 603 | + ] | |
| 604 | + } | |
| 605 | + } | |
| 606 | + }, | |
| 607 | + } | |
| 608 | + ] | |
| 609 | + d = selector.prepare_hits(hits, parsed_query)["spu-1"] | |
| 610 | + assert d.selected_sku_id == "sku-a" | |
| 611 | + assert d.image_pick_url == "https://cdn/b/sku-match.jpg" | ... | ... |
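Both new image-pick tests above hinge on URL normalization: the inner hit carries `?width=800&quality=85` while the SKU does not, and the SKU may use a scheme-relative `//cdn/...` form. A minimal sketch of matching logic that would satisfy both cases; `normalize_image_url` is a hypothetical helper name, not the selector's real API:

```python
from urllib.parse import urlsplit


def normalize_image_url(url: str) -> str:
    # Hypothetical normalizer: tolerate scheme-relative "//cdn/..." forms
    # and drop the query string/fragment so "?width=800&quality=85"
    # still matches the bare SKU image_src.
    if url.startswith("//"):
        url = "https:" + url
    parts = urlsplit(url)
    return f"{parts.netloc}{parts.path}"


# The two shapes exercised by the tests above:
assert normalize_image_url("https://cdn/img/p.jpg?width=800&quality=85") == \
    normalize_image_url("https://cdn/img/p.jpg")
assert normalize_image_url("//cdn/b/sku-match.jpg") == \
    normalize_image_url("https://cdn/b/sku-match.jpg")
```

The second test additionally shows why `_nested.offset` matters: when even the normalized `_source.url` disagrees, the offset indexes into `image_embedding` to recover the canonical URL before comparing.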
tests/test_style_intent.py
| ... | ... | @@ -3,6 +3,10 @@ from types import SimpleNamespace |
| 3 | 3 | from config import QueryConfig |
| 4 | 4 | from query.style_intent import StyleIntentDetector, StyleIntentRegistry |
| 5 | 5 | |
| 6 | +import pytest | |
| 7 | + | |
| 8 | +pytestmark = [pytest.mark.intent, pytest.mark.regression] | |
| 9 | + | |
| 6 | 10 | |
| 7 | 11 | def test_style_intent_detector_matches_original_and_translated_queries(): |
| 8 | 12 | query_config = QueryConfig( | ... | ... |
tests/test_suggestions.py
| ... | ... | @@ -12,6 +12,8 @@ from suggestion.builder import ( |
| 12 | 12 | ) |
| 13 | 13 | from suggestion.service import SuggestionService |
| 14 | 14 | |
| 15 | +pytestmark = [pytest.mark.suggestion, pytest.mark.regression] | |
| 16 | + | |
| 15 | 17 | |
| 16 | 18 | class FakeESClient: |
| 17 | 19 | """Lightweight fake ES client for suggestion unit tests.""" |
| ... | ... | @@ -160,7 +162,6 @@ class FakeESClient: |
| 160 | 162 | return sorted([x for x in self.indices if x.startswith(prefix)]) |
| 161 | 163 | |
| 162 | 164 | |
| 163 | -@pytest.mark.unit | |
| 164 | 165 | def test_versioned_index_name_uses_microseconds(): |
| 165 | 166 | build_at = datetime(2026, 4, 7, 3, 52, 26, 123456, tzinfo=timezone.utc) |
| 166 | 167 | assert ( |
| ... | ... | @@ -169,7 +170,6 @@ def test_versioned_index_name_uses_microseconds(): |
| 169 | 170 | ) |
| 170 | 171 | |
| 171 | 172 | |
| 172 | -@pytest.mark.unit | |
| 173 | 173 | def test_rebuild_cleans_up_unallocatable_new_index(): |
| 174 | 174 | fake_es = FakeESClient() |
| 175 | 175 | |
| ... | ... | @@ -221,7 +221,6 @@ def test_rebuild_cleans_up_unallocatable_new_index(): |
| 221 | 221 | assert created_index not in fake_es.indices |
| 222 | 222 | |
| 223 | 223 | |
| 224 | -@pytest.mark.unit | |
| 225 | 224 | def test_resolve_query_language_prefers_log_field(): |
| 226 | 225 | fake_es = FakeESClient() |
| 227 | 226 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -238,7 +237,6 @@ def test_resolve_query_language_prefers_log_field(): |
| 238 | 237 | assert conflict is False |
| 239 | 238 | |
| 240 | 239 | |
| 241 | -@pytest.mark.unit | |
| 242 | 240 | def test_resolve_query_language_uses_request_params_when_log_missing(): |
| 243 | 241 | fake_es = FakeESClient() |
| 244 | 242 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -256,7 +254,6 @@ def test_resolve_query_language_uses_request_params_when_log_missing(): |
| 256 | 254 | assert conflict is False |
| 257 | 255 | |
| 258 | 256 | |
| 259 | -@pytest.mark.unit | |
| 260 | 257 | def test_resolve_query_language_fallback_to_primary(): |
| 261 | 258 | fake_es = FakeESClient() |
| 262 | 259 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -272,7 +269,6 @@ def test_resolve_query_language_fallback_to_primary(): |
| 272 | 269 | assert conflict is False |
| 273 | 270 | |
| 274 | 271 | |
| 275 | -@pytest.mark.unit | |
| 276 | 272 | def test_suggestion_service_basic_flow_uses_alias_and_routing(): |
| 277 | 273 | from config import tenant_config_loader as tcl |
| 278 | 274 | |
| ... | ... | @@ -309,7 +305,6 @@ def test_suggestion_service_basic_flow_uses_alias_and_routing(): |
| 309 | 305 | assert any(x.get("index") == alias_name for x in search_calls) |
| 310 | 306 | |
| 311 | 307 | |
| 312 | -@pytest.mark.unit | |
| 313 | 308 | def test_publish_alias_and_cleanup_old_versions(monkeypatch): |
| 314 | 309 | fake_es = FakeESClient() |
| 315 | 310 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -338,7 +333,6 @@ def test_publish_alias_and_cleanup_old_versions(monkeypatch): |
| 338 | 333 | assert "search_suggestions_tenant_162_v20260310170000" not in fake_es.indices |
| 339 | 334 | |
| 340 | 335 | |
| 341 | -@pytest.mark.unit | |
| 342 | 336 | def test_incremental_bootstrap_when_no_active_index(monkeypatch): |
| 343 | 337 | fake_es = FakeESClient() |
| 344 | 338 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -363,7 +357,6 @@ def test_incremental_bootstrap_when_no_active_index(monkeypatch): |
| 363 | 357 | assert result["bootstrap_result"]["mode"] == "full" |
| 364 | 358 | |
| 365 | 359 | |
| 366 | -@pytest.mark.unit | |
| 367 | 360 | def test_incremental_updates_existing_index(monkeypatch): |
| 368 | 361 | fake_es = FakeESClient() |
| 369 | 362 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -419,7 +412,6 @@ def test_incremental_updates_existing_index(monkeypatch): |
| 419 | 412 | assert len(bulk_calls[0]["actions"]) == 1 |
| 420 | 413 | |
| 421 | 414 | |
| 422 | -@pytest.mark.unit | |
| 423 | 415 | def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch): |
| 424 | 416 | fake_es = FakeESClient() |
| 425 | 417 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -459,7 +451,6 @@ def test_build_full_candidates_fallback_to_id_when_spu_id_missing(monkeypatch): |
| 459 | 451 | assert key_to_candidate[qanchor_key].qanchor_spu_ids == {"521"} |
| 460 | 452 | |
| 461 | 453 | |
| 462 | -@pytest.mark.unit | |
| 463 | 454 | def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch): |
| 464 | 455 | fake_es = FakeESClient() |
| 465 | 456 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -509,7 +500,6 @@ def test_build_full_candidates_tags_and_qanchor_phrases(monkeypatch): |
| 509 | 500 | assert ("en", "ribbed neckline") in key_to_candidate |
| 510 | 501 | |
| 511 | 502 | |
| 512 | -@pytest.mark.unit | |
| 513 | 503 | def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch): |
| 514 | 504 | fake_es = FakeESClient() |
| 515 | 505 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) |
| ... | ... | @@ -542,7 +532,6 @@ def test_build_full_candidates_splits_long_title_for_suggest(monkeypatch): |
| 542 | 532 | assert key_to_candidate[key].text == "Furby Furblets 2-Pack" |
| 543 | 533 | |
| 544 | 534 | |
| 545 | -@pytest.mark.unit | |
| 546 | 535 | def test_iter_products_requests_dual_sort_and_fields(): |
| 547 | 536 | fake_es = FakeESClient() |
| 548 | 537 | builder = SuggestionIndexBuilder(es_client=fake_es, db_engine=None) | ... | ... |
tests/test_tokenization.py
tests/test_translation_converter_resolution.py
tests/test_translation_deepl_backend.py
tests/test_translation_llm_backend.py
tests/test_translation_local_backends.py
| ... | ... | @@ -9,6 +9,8 @@ from translation.languages import build_nllb_language_catalog, resolve_nllb_lang |
| 9 | 9 | from translation.service import TranslationService |
| 10 | 10 | from translation.text_splitter import compute_safe_input_token_limit, split_text_for_translation |
| 11 | 11 | |
| 12 | +pytestmark = [pytest.mark.translation, pytest.mark.regression] | |
| 13 | + | |
| 12 | 14 | |
| 13 | 15 | class _FakeBatch(dict): |
| 14 | 16 | def to(self, device): | ... | ... |
tests/test_translator_failure_semantics.py
| ... | ... | @@ -11,6 +11,8 @@ from translation.logging_utils import ( |
| 11 | 11 | from translation.service import TranslationService |
| 12 | 12 | from translation.settings import build_translation_config, translation_cache_probe_models |
| 13 | 13 | |
| 14 | +pytestmark = [pytest.mark.translation, pytest.mark.regression] | |
| 15 | + | |
| 14 | 16 | |
| 15 | 17 | class _FakeCache: |
| 16 | 18 | def __init__(self): | ... | ... |
translation/prompts.py
| ... | ... | @@ -30,6 +30,18 @@ TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = { |
| 30 | 30 | "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce in un nome SKU prodotto {target_lang} conciso e accurato, restituisci solo il risultato: {text}", |
| 31 | 31 | "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para um nome SKU de produto {target_lang} conciso e preciso, produza apenas o resultado: {text}", |
| 32 | 32 | }, |
| 33 | + "sku_attribute": { | |
| 34 | + "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})电商翻译专家,请将原文翻译为{target_lang}商品SKU属性值(如颜色、尺码、材质等),要求简洁准确、符合属性展示习惯,只输出结果:{text}", | |
| 35 | + "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) ecommerce translator. Translate into concise {target_lang} product SKU attribute values (e.g. color, size, material), suitable for attribute display, output only the result: {text}", | |
| 36 | + "ru": "Вы переводчик e-commerce с {source_lang} ({src_lang_code}) на {target_lang} ({tgt_lang_code}). Переведите в краткие и точные значения атрибутов SKU на {target_lang} (цвет, размер, материал и т.п.), выводите только результат: {text}", | |
| 37 | + "ar": "أنت مترجم تجارة إلكترونية من {source_lang} ({src_lang_code}) إلى {target_lang} ({tgt_lang_code}). ترجم إلى قيم سمات SKU للمنتج بلغة {target_lang} (مثل اللون والمقاس والخامة) بإيجاز ودقة، وأخرج النتيجة فقط: {text}", | |
| 38 | + "ja": "{source_lang}({src_lang_code})から {target_lang}({tgt_lang_code})へのEC翻訳者として、商品SKUの属性値(色・サイズ・素材など)に簡潔かつ正確に翻訳し、結果のみ出力してください:{text}", | |
| 39 | + "es": "Eres un traductor ecommerce de {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduce a valores de atributo SKU de producto en {target_lang} (color, talla, material, etc.), concisos y precisos, devuelve solo el resultado: {text}", | |
| 40 | + "de": "Du bist ein E-Commerce-Übersetzer von {source_lang} ({src_lang_code}) nach {target_lang} ({tgt_lang_code}). Übersetze in präzise {target_lang} SKU-Produktattributwerte (z. B. Farbe, Größe, Material), nur Ergebnis ausgeben: {text}", | |
| 41 | + "fr": "Vous êtes un traducteur e-commerce de {source_lang} ({src_lang_code}) vers {target_lang} ({tgt_lang_code}). Traduisez en valeurs d'attributs SKU produit {target_lang} (couleur, taille, matière, etc.), concises et précises, sortie uniquement : {text}", | |
| 42 | + "it": "Sei un traduttore ecommerce da {source_lang} ({src_lang_code}) a {target_lang} ({tgt_lang_code}). Traduci in valori di attributo SKU prodotto {target_lang} (colore, taglia, materiale, ecc.), concisi e accurati, restituisci solo il risultato: {text}", | |
| 43 | + "pt": "Você é um tradutor de e-commerce de {source_lang} ({src_lang_code}) para {target_lang} ({tgt_lang_code}). Traduza para valores de atributo SKU de produto em {target_lang} (cor, tamanho, material etc.), concisos e precisos, produza apenas o resultado: {text}", | |
| 44 | + }, | |
| 33 | 45 | "ecommerce_search_query": { |
| 34 | 46 | "zh": "你是一名专业的 {source_lang}({src_lang_code})到 {target_lang}({tgt_lang_code})翻译助手,请将电商搜索词准确翻译为{target_lang}并符合搜索习惯,只输出结果:{text}", |
| 35 | 47 | "en": "You are a professional {source_lang} ({src_lang_code}) to {target_lang} ({tgt_lang_code}) translator. Translate the ecommerce search query accurately following {target_lang} search habits, output only the result: {text}", |
| ... | ... | @@ -113,6 +125,39 @@ BATCH_TRANSLATION_PROMPTS: Dict[str, Dict[str, str]] = { |
| 113 | 125 | "Входные данные:\n{text}" |
| 114 | 126 | ), |
| 115 | 127 | }, |
| 128 | + "sku_attribute": { | |
| 129 | + "en": ( | |
| 130 | + "Translate each item from {source_lang} ({src_lang_code}) to concise {target_lang} ({tgt_lang_code}) " | |
| 131 | + "product SKU attribute values (e.g. color, size, material).\n" | |
| 132 | + "Accurately preserve the meaning; keep wording short and suitable for attribute display.\n" | |
| 133 | + "Output exactly one line for each input item, in the same order, using this exact format:\n" | |
| 134 | + "1. translation\n" | |
| 135 | + "2. translation\n" | |
| 136 | + "...\n" | |
| 137 | + "Do not explain or output anything else.\n" | |
| 138 | + "Input:\n{text}" | |
| 139 | + ), | |
| 140 | + "zh": ( | |
| 141 | + "将每一项从 {source_lang} ({src_lang_code}) 翻译为简洁的 {target_lang} ({tgt_lang_code}) 商品SKU属性值(如颜色、尺码、材质等)。\n" | |
| 142 | + "准确传达含义,措辞简短,适合属性展示。\n" | |
| 143 | + "请按输入顺序逐行输出,每个输入对应一行,格式必须如下:\n" | |
| 144 | + "1. 翻译结果\n" | |
| 145 | + "2. 翻译结果\n" | |
| 146 | + "...\n" | |
| 147 | + "不要解释或输出其他任何内容。\n" | |
| 148 | + "输入:\n{text}" | |
| 149 | + ), | |
| 150 | + "ru": ( | |
| 151 | + "Переведите каждый элемент с {source_lang} ({src_lang_code}) на краткие значения атрибутов SKU на {target_lang} ({tgt_lang_code}) (цвет, размер, материал и т.п.).\n" | |
| 152 | + "Точно сохраняйте смысл; формулировки должны быть короткими и подходить для отображения атрибутов.\n" | |
| 153 | + "Выводите ровно по одной строке для каждого входного элемента в том же порядке, в следующем формате:\n" | |
| 154 | + "1. перевод\n" | |
| 155 | + "2. перевод\n" | |
| 156 | + "...\n" | |
| 157 | + "Не добавляйте объяснений и ничего лишнего.\n" | |
| 158 | + "Входные данные:\n{text}" | |
| 159 | + ), | |
| 160 | + }, | |
| 116 | 161 | "ecommerce_search_query": { |
| 117 | 162 | "en": ( |
| 118 | 163 | "Translate each item from {source_lang} ({src_lang_code}) to a natural {target_lang} ({tgt_lang_code}) " | ... | ... |
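The batch prompts added above pin an exact output contract: one `N. translation` line per input item, same order, nothing else. A hedged sketch of a parser for that contract; the real parser lives in the translation service and may differ, this only illustrates why the prompts insist on the strict line format:

```python
import re

# Matches "1. some translation" with optional surrounding whitespace.
_NUMBERED_LINE = re.compile(r"^\s*(\d+)\.\s*(.*\S)\s*$")


def parse_numbered_translations(raw: str, expected: int) -> list[str]:
    """Parse '1. foo\n2. bar' style LLM output; raise on count/order drift."""
    results: dict[int, str] = {}
    for line in raw.splitlines():
        m = _NUMBERED_LINE.match(line)
        if m:
            results[int(m.group(1))] = m.group(2)
    if sorted(results) != list(range(1, expected + 1)):
        raise ValueError(
            f"expected {expected} numbered lines, got indices {sorted(results)}"
        )
    return [results[i] for i in range(1, expected + 1)]
```

Requiring a numbered line per item makes missing or reordered outputs detectable, which is why every batch prompt variant repeats the `1. / 2. / ...` template verbatim.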
translation/scenes.py