From 3abbc95aef12ebad7dead00036e4c0523b2d5c55 Mon Sep 17 00:00:00 2001
From: tangwang
Date: Thu, 9 Apr 2026 23:26:20 +0800
Subject: [PATCH] refactor(scripts): reorganize scripts directory by current
 architecture; move perf and manual test scripts

---
 CLAUDE.md                                                           |  26 +-
 benchmarks/README.md                                                |  17 +
 benchmarks/perf_api_benchmark.py                                    | 757 ++++++++++++
 benchmarks/perf_cases.json.example                                  |  71 ++
 benchmarks/reranker/benchmark_reranker_1000docs.sh                  | 130 +++
 benchmarks/reranker/benchmark_reranker_gguf_local.py                | 198 ++++
 benchmarks/reranker/benchmark_reranker_random_titles.py             | 312 ++++++
 benchmarks/reranker/manual/curl1.sh                                 |  23 +
 benchmarks/reranker/manual/curl1_simple.sh                          | 417 +++++++
 benchmarks/reranker/manual/curl2.sh                                 |  26 +
 benchmarks/reranker/manual/rerank_performance_compare.sh            | 117 ++
 benchmarks/reranker/patch_rerank_vllm_benchmark_config.py           | 100 ++
 benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh      |  89 ++
 benchmarks/reranker/smoke_qwen3_vllm_score_backend.py               |  76 ++
 benchmarks/translation/benchmark_nllb_t4_tuning.py                  | 318 ++++++
 benchmarks/translation/benchmark_translation_local_models.py        | 948 ++++++++++++++
 benchmarks/translation/benchmark_translation_local_models_focus.py  | 250 +++++
 benchmarks/translation/benchmark_translation_longtext_single.py     | 186 +++
 config/config.yaml                                                  |   6 +-
 docs/DEVELOPER_GUIDE.md                                             |   2 +-
 docs/QUICKSTART.md                                                  |   2 +-
 docs/Usage-Guide.md                                                 |   4 +-
 docs/工作总结-微服务性能优化与架构.md                               |  12 +-
 docs/性能测试报告.md                                                |  16 +-
 docs/搜索API对接指南-05-索引接口(Indexer).md                        |   2 +-
 docs/搜索API对接指南-10-接口级压测脚本.md                           |  15 +-
 docs/相关性检索优化说明.md                                          |   3 +-
 embeddings/README.md                                                |   6 +-
 perf_reports/20260311/reranker_1000docs/report.md                   |   2 +-
 perf_reports/20260317/translation_local_models/README.md            |   4 +-
 perf_reports/20260318/nllb_t4_product_names_ct2/README.md           |   2 +-
 perf_reports/20260318/translation_local_models/README.md            |   4 +-
 perf_reports/20260318/translation_local_models_ct2/README.md        |   4 +-
 perf_reports/20260318/translation_local_models_ct2_focus/README.md  |   2 +-
 perf_reports/README.md                                              |  10 +-
 perf_reports/reranker_vllm_instruction/2026-03-25/RESULTS.md        |  10 +-
 reranker/DEPLOYMENT_AND_TUNING.md                                   |   2 +-
 reranker/GGUF_0_6B_INSTALL_AND_TUNING.md                            |   2 +-
 reranker/GGUF_INSTALL_AND_TUNING.md                                 |  10 +-
 reranker/README.md                                                  |   8 +-
 scripts/README.md                                                   |  53 +
 scripts/benchmark_nllb_t4_tuning.py                                 | 318 ------
 scripts/benchmark_reranker_1000docs.sh                              | 130 ---
 scripts/benchmark_reranker_gguf_local.py                            | 198 ----
 scripts/benchmark_reranker_random_titles.py                         | 312 ------
 scripts/benchmark_translation_local_models.py                       | 948 --------------
 scripts/benchmark_translation_local_models_focus.py                 | 250 -----
 scripts/benchmark_translation_longtext_single.py                    | 186 ---
 scripts/debug/trace_indexer_calls.sh                                |  76 ++
 scripts/indexer__old_2025_11/import_tenant2_csv.py                  | 495 --------
 scripts/indexer__old_2025_11/import_test_data.py                    | 277 -----
 scripts/indexer__old_2025_11/ingest.sh                              |  92 --
 scripts/indexer__old_2025_11/ingest_shoplazza.py                    | 146 ---
 scripts/indexer__old_2025_11/recreate_and_import.py                 | 184 ----
 scripts/install_server_deps.sh                                      |  14 -
 scripts/patch_rerank_vllm_benchmark_config.py                       | 100 --
 scripts/perf_api_benchmark.py                                       | 757 ------------
 scripts/perf_cases.json.example                                     |  71 --
 scripts/reindex_from_remote_tenant_170_to_0.sh                      |  99 --
 scripts/run_reranker_vllm_instruction_benchmark.sh                  |  89 --
 scripts/smoke_qwen3_vllm_score_backend.py                           |  76 --
 scripts/start.sh                                                    |  10 -
 scripts/test_build_docs_api.py                                      | 159 ---
 scripts/trace_indexer_calls.sh                                      |  76 --
 tests/manual/README.md                                              |   5 +
 tests/manual/test_build_docs_api.py                                 | 159 +++
 tests/reranker_performance/curl1.sh                                 |  23 -
 tests/reranker_performance/curl1_simple.sh                          | 417 -------
 tests/reranker_performance/curl2.sh                                 |  26 -
 tests/reranker_performance/rerank_performance_compare.sh            | 117 --
 translation/README.md                                               |  16 +-
 71 files changed, 4411 insertions(+), 5657 deletions(-)
 create mode 100644 benchmarks/README.md
 create mode 100755 benchmarks/perf_api_benchmark.py
 create mode 100644 benchmarks/perf_cases.json.example
 create mode 100755
benchmarks/reranker/benchmark_reranker_1000docs.sh create mode 100644 benchmarks/reranker/benchmark_reranker_gguf_local.py create mode 100755 benchmarks/reranker/benchmark_reranker_random_titles.py create mode 100644 benchmarks/reranker/manual/curl1.sh create mode 100644 benchmarks/reranker/manual/curl1_simple.sh create mode 100644 benchmarks/reranker/manual/curl2.sh create mode 100644 benchmarks/reranker/manual/rerank_performance_compare.sh create mode 100755 benchmarks/reranker/patch_rerank_vllm_benchmark_config.py create mode 100755 benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh create mode 100644 benchmarks/reranker/smoke_qwen3_vllm_score_backend.py create mode 100644 benchmarks/translation/benchmark_nllb_t4_tuning.py create mode 100644 benchmarks/translation/benchmark_translation_local_models.py create mode 100644 benchmarks/translation/benchmark_translation_local_models_focus.py create mode 100644 benchmarks/translation/benchmark_translation_longtext_single.py create mode 100644 scripts/README.md delete mode 100644 scripts/benchmark_nllb_t4_tuning.py delete mode 100755 scripts/benchmark_reranker_1000docs.sh delete mode 100644 scripts/benchmark_reranker_gguf_local.py delete mode 100755 scripts/benchmark_reranker_random_titles.py delete mode 100644 scripts/benchmark_translation_local_models.py delete mode 100644 scripts/benchmark_translation_local_models_focus.py delete mode 100644 scripts/benchmark_translation_longtext_single.py create mode 100755 scripts/debug/trace_indexer_calls.sh delete mode 100755 scripts/indexer__old_2025_11/import_tenant2_csv.py delete mode 100644 scripts/indexer__old_2025_11/import_test_data.py delete mode 100755 scripts/indexer__old_2025_11/ingest.sh delete mode 100644 scripts/indexer__old_2025_11/ingest_shoplazza.py delete mode 100755 scripts/indexer__old_2025_11/recreate_and_import.py delete mode 100755 scripts/install_server_deps.sh delete mode 100755 scripts/patch_rerank_vllm_benchmark_config.py delete mode 100755 scripts/perf_api_benchmark.py delete mode 100644 scripts/perf_cases.json.example delete mode 100755 scripts/reindex_from_remote_tenant_170_to_0.sh delete mode 100755 scripts/run_reranker_vllm_instruction_benchmark.sh delete mode 100644 scripts/smoke_qwen3_vllm_score_backend.py delete mode 100755 scripts/start.sh delete mode 100644 scripts/test_build_docs_api.py delete mode 100755 scripts/trace_indexer_calls.sh create mode 100644 tests/manual/README.md create mode 100644 tests/manual/test_build_docs_api.py delete mode 100644 tests/reranker_performance/curl1.sh delete mode 100644 tests/reranker_performance/curl1_simple.sh delete mode 100644 tests/reranker_performance/curl2.sh delete mode 100644 tests/reranker_performance/rerank_performance_compare.sh diff --git a/CLAUDE.md b/CLAUDE.md index 1e67aad..97c3257 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -77,9 +77,11 @@ source activate.sh # Generate test data (Tenant1 Mock + Tenant2 CSV) ./scripts/mock_data.sh -# Ingest data to Elasticsearch -./scripts/ingest.sh [recreate] # e.g., ./scripts/ingest.sh 1 true -python main.py ingest data.csv --limit 1000 --batch-size 50 +# Create tenant index structure +./scripts/create_tenant_index.sh + +# Build / refresh suggestion index +./scripts/build_suggestions.sh --mode incremental ``` ### Running Services @@ -100,10 +102,10 @@ python main.py serve --host 0.0.0.0 --port 6002 --reload # Run all tests pytest tests/ -# Run specific test types -pytest tests/unit/ # Unit tests -pytest tests/integration/ # Integration tests -pytest -m "api" # API tests 
only
+# Run focused regression sets
+python -m pytest tests/ci -q
+pytest tests/test_rerank_client.py
+pytest tests/test_query_parser_mixed_language.py
 
 # Test search from command line
 python main.py search "query" --tenant-id 1 --size 10
@@ -114,12 +116,8 @@ python main.py search "query" --tenant-id 1 --size 10
 
 # Stop all services
 ./scripts/stop.sh
 
-# Test environment (for CI/development)
-./scripts/start_test_environment.sh
-./scripts/stop_test_environment.sh
-
-# Install server dependencies
-./scripts/install_server_deps.sh
+# Run CI contract tests
+./scripts/run_ci_tests.sh
 ```
 
 ## Architecture Overview
@@ -585,7 +583,7 @@ GET /admin/stats                     # Index statistics
 ./scripts/start_frontend.sh          # Frontend UI (port 6003)
 
 # Data Operations
-./scripts/ingest.sh [recreate]       # Index data
+./scripts/create_tenant_index.sh     # Create tenant index
 ./scripts/mock_data.sh               # Generate test data
 
 # Testing
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 0000000..5268081
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,17 @@
+# Benchmarks
+
+Benchmark and load-test scripts live under `benchmarks/`; they are no longer mixed in with the service startup and ops scripts in `scripts/`.
+
+Directory conventions:
+
+- `benchmarks/perf_api_benchmark.py`: generic HTTP API load-test entry point
+- `benchmarks/reranker/`: targeted reranker benchmarks, smoke tests, and manual comparison scripts
+- `benchmarks/translation/`: benchmarks for local translation models
+
+These scripts are not part of CI by default, because they typically:
+
+- depend on live services, GPUs, models, or specific datasets
+- produce results that vary with machine configuration and runtime load, making them unsuitable as stable regression gates
+- target capacity evaluation, tuning, and issue reproduction rather than functional correctness checks
+
+If a performance scenario needs to enter the automated regression suite, add it under `tests/` with pinned inputs, environment, and pass/fail thresholds instead of reusing these benchmark scripts directly.
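+
+## Quick start
+
+A minimal sketch of a run (ports, tenant id, and flags follow the defaults and examples documented in `benchmarks/perf_api_benchmark.py`; see `--help` for the full set):
+
+```bash
+# 30 seconds of load with 20 concurrent workers against the local backend search API
+python benchmarks/perf_api_benchmark.py --scenario backend_search \
+    --duration 30 --concurrency 20 --tenant-id 162
+```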
diff --git a/benchmarks/perf_api_benchmark.py b/benchmarks/perf_api_benchmark.py
new file mode 100755
index 0000000..339e528
--- /dev/null
+++ b/benchmarks/perf_api_benchmark.py
@@ -0,0 +1,757 @@
+#!/usr/bin/env python3
+"""
+API-level performance test script for search stack services.
+
+Default scenarios (aligned with the docs/搜索API对接指南 guide series, e.g. -01 / -02 / -07):
+- backend_search   POST /search/
+- backend_suggest  GET  /search/suggestions
+- embed_text       POST /embed/text
+- embed_image      POST /embed/image
+- translate        POST /translate
+- rerank           POST /rerank
+
+Examples:
+    python benchmarks/perf_api_benchmark.py --scenario backend_search --duration 30 --concurrency 20 --tenant-id 162
+    python benchmarks/perf_api_benchmark.py --scenario backend_suggest --duration 30 --concurrency 50 --tenant-id 162
+    python benchmarks/perf_api_benchmark.py --scenario all --duration 60 --concurrency 80 --tenant-id 162
+    python benchmarks/perf_api_benchmark.py --scenario all --cases-file benchmarks/perf_cases.json.example --output perf_result.json
+    # Embedding admission / priority (query param `priority`; same semantics as embedding service):
+    python benchmarks/perf_api_benchmark.py --scenario embed_text --embed-text-priority 1 --duration 30 --concurrency 20
+    python benchmarks/perf_api_benchmark.py --scenario embed_image --embed-image-priority 1 --duration 30 --concurrency 10
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import math
+import random
+import statistics
+import time
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+import httpx
+
+
+@dataclass
+class RequestTemplate:
+    method: str
+    path: str
+    params: Optional[Dict[str, Any]] = None
+    json_body: Optional[Any] = None
+    headers: Optional[Dict[str, str]] = None
+
+
+@dataclass
+class Scenario:
+    name: str
+    templates: List[RequestTemplate]
+    timeout_sec: float
+
+
+@dataclass
+class RequestResult:
+    ok: bool
+    status_code: int
+    latency_ms: float
+    error: str = ""
+
+
+def _is_finite_number(v: Any) -> bool:
+    if isinstance(v, bool):
+        return False
+    if isinstance(v, (int, float)):
+        return math.isfinite(float(v))
+    return False
+
+
+def validate_response_payload(
+    scenario_name: str,
+    tpl: RequestTemplate,
+    payload: Any,
+) -> Tuple[bool, str]:
+    """
+    Lightweight payload validation for correctness-aware perf tests.
+    Strict for embed_text / embed_image to catch NaN/null vector regressions.
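+    For example, embedding a batch of 3 texts must come back as a JSON list of
+    exactly 3 non-empty vectors whose entries are all finite numbers; a payload
+    like [[0.1, null, 0.2], ...] is rejected as "invalid_vector_0_non_finite".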
+ """ + if scenario_name not in ("embed_text", "embed_image"): + return True, "" + + expected_len = len(tpl.json_body) if isinstance(tpl.json_body, list) else None + if not isinstance(payload, list): + return False, "invalid_payload_non_list" + if expected_len is not None and len(payload) != expected_len: + return False, "invalid_payload_length" + if len(payload) == 0: + return False, "invalid_payload_empty" + + for i, vec in enumerate(payload): + if not isinstance(vec, list) or len(vec) == 0: + return False, f"invalid_vector_{i}_shape" + for x in vec: + if not _is_finite_number(x): + return False, f"invalid_vector_{i}_non_finite" + return True, "" + + +def percentile(sorted_values: List[float], p: float) -> float: + if not sorted_values: + return 0.0 + if p <= 0: + return sorted_values[0] + if p >= 100: + return sorted_values[-1] + rank = (len(sorted_values) - 1) * (p / 100.0) + low = int(math.floor(rank)) + high = int(math.ceil(rank)) + if low == high: + return sorted_values[low] + weight = rank - low + return sorted_values[low] * (1.0 - weight) + sorted_values[high] * weight + + +def make_default_templates(tenant_id: str) -> Dict[str, List[RequestTemplate]]: + return { + "backend_search": [ + RequestTemplate( + method="POST", + path="/search/", + headers={"X-Tenant-ID": tenant_id}, + json_body={"query": "wireless mouse", "size": 10, "language": "en"}, + ), + RequestTemplate( + method="POST", + path="/search/", + headers={"X-Tenant-ID": tenant_id}, + json_body={"query": "芭比娃娃", "size": 10, "language": "zh"}, + ), + RequestTemplate( + method="POST", + path="/search/", + headers={"X-Tenant-ID": tenant_id}, + json_body={"query": "f", "size": 10, "language": "en"}, + ), + ], + "backend_suggest": [ + RequestTemplate( + method="GET", + path="/search/suggestions", + headers={"X-Tenant-ID": tenant_id}, + params={"q": "f", "size": 10, "language": "en"}, + ), + RequestTemplate( + method="GET", + path="/search/suggestions", + headers={"X-Tenant-ID": tenant_id}, + params={"q": "玩", "size": 10, "language": "zh"}, + ), + RequestTemplate( + method="GET", + path="/search/suggestions", + headers={"X-Tenant-ID": tenant_id}, + params={"q": "shi", "size": 10, "language": "en"}, + ), + ], + "embed_text": [ + RequestTemplate( + method="POST", + path="/embed/text", + json_body=["wireless mouse", "gaming keyboard", "barbie doll"], + ) + ], + "embed_image": [ + RequestTemplate( + method="POST", + path="/embed/image", + json_body=["/data/saas-search/docs/image-dress1.png"], + ) + ], + "translate": [ + RequestTemplate( + method="POST", + path="/translate", + json_body={"text": "商品名称", "target_lang": "en", "source_lang": "zh", "model": "qwen"}, + ), + RequestTemplate( + method="POST", + path="/translate", + json_body={"text": "Product title", "target_lang": "zh", "model": "qwen"}, + ), + ], + "rerank": [ + RequestTemplate( + method="POST", + path="/rerank", + json_body={ + "query": "wireless mouse", + "docs": [ + "Wireless ergonomic mouse with rechargeable battery", + "USB-C cable 1m", + "Gaming mouse 26000 DPI", + ], + "normalize": True, + }, + ) + ], + } + + +def load_cases_from_file(path: Path, tenant_id: str) -> Dict[str, List[RequestTemplate]]: + data = json.loads(path.read_text(encoding="utf-8")) + out: Dict[str, List[RequestTemplate]] = {} + for scenario_name, requests_data in (data.get("scenarios") or {}).items(): + templates: List[RequestTemplate] = [] + for item in requests_data: + headers = dict(item.get("headers") or {}) + if "X-Tenant-ID" in headers and str(headers["X-Tenant-ID"]).strip() == 
"${tenant_id}": + headers["X-Tenant-ID"] = tenant_id + templates.append( + RequestTemplate( + method=str(item.get("method", "GET")).upper(), + path=str(item.get("path", "")).strip(), + params=item.get("params"), + json_body=item.get("json"), + headers=headers or None, + ) + ) + if templates: + out[scenario_name] = templates + return out + + +def apply_embed_priority_params( + scenarios: Dict[str, Scenario], + embed_text_priority: int, + embed_image_priority: int, +) -> None: + """ + Merge default `priority` query param into embed templates when absent. + `benchmarks/perf_cases.json` may set per-request `params.priority` to override. + """ + mapping = { + "embed_text": max(0, int(embed_text_priority)), + "embed_image": max(0, int(embed_image_priority)), + } + for name, pri in mapping.items(): + if name not in scenarios: + continue + scen = scenarios[name] + new_templates: List[RequestTemplate] = [] + for t in scen.templates: + params = dict(t.params or {}) + params.setdefault("priority", str(pri)) + new_templates.append( + RequestTemplate( + method=t.method, + path=t.path, + params=params, + json_body=t.json_body, + headers=t.headers, + ) + ) + scenarios[name] = Scenario( + name=scen.name, + templates=new_templates, + timeout_sec=scen.timeout_sec, + ) + + +def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]: + defaults = make_default_templates(args.tenant_id) + if args.cases_file: + custom = load_cases_from_file(Path(args.cases_file), tenant_id=args.tenant_id) + defaults.update(custom) + + scenario_base = { + "backend_search": args.backend_base, + "backend_suggest": args.backend_base, + "embed_text": args.embedding_text_base, + "embed_image": args.embedding_image_base, + "translate": args.translator_base, + "rerank": args.reranker_base, + } + + scenarios: Dict[str, Scenario] = {} + for name, templates in defaults.items(): + if name not in scenario_base: + continue + base = scenario_base[name].rstrip("/") + rewritten: List[RequestTemplate] = [] + for t in templates: + path = t.path if t.path.startswith("/") else f"/{t.path}" + rewritten.append( + RequestTemplate( + method=t.method, + path=f"{base}{path}", + params=t.params, + json_body=t.json_body, + headers=t.headers, + ) + ) + scenarios[name] = Scenario(name=name, templates=rewritten, timeout_sec=args.timeout) + apply_embed_priority_params( + scenarios, + embed_text_priority=args.embed_text_priority, + embed_image_priority=args.embed_image_priority, + ) + return scenarios + + +async def run_single_scenario( + scenario: Scenario, + duration_sec: int, + concurrency: int, + max_requests: int, + max_errors: int, + rerank_dynamic_cfg: Optional[Dict[str, Any]] = None, +) -> Dict[str, Any]: + latencies: List[float] = [] + status_counter: Dict[int, int] = {} + err_counter: Dict[str, int] = {} + total_requests = 0 + success_requests = 0 + stop_flag = False + lock = asyncio.Lock() + start = time.perf_counter() + + timeout = httpx.Timeout(timeout=scenario.timeout_sec) + limits = httpx.Limits(max_connections=max(concurrency * 2, 20), max_keepalive_connections=max(concurrency, 10)) + + async def worker(worker_id: int, client: httpx.AsyncClient) -> None: + nonlocal total_requests, success_requests, stop_flag + idx = worker_id % len(scenario.templates) + worker_rng: Optional[random.Random] = None + if rerank_dynamic_cfg is not None: + worker_rng = random.Random(int(rerank_dynamic_cfg["seed"]) + worker_id) + + while not stop_flag: + elapsed = time.perf_counter() - start + if duration_sec > 0 and elapsed >= duration_sec: + break + + async 
with lock:
+                if max_requests > 0 and total_requests >= max_requests:
+                    stop_flag = True
+                    break
+                total_requests += 1
+
+            tpl = scenario.templates[idx % len(scenario.templates)]
+            idx += 1
+
+            t0 = time.perf_counter()
+            ok = False
+            status = 0
+            err = ""
+            try:
+                req_json_body = tpl.json_body
+                if rerank_dynamic_cfg is not None and worker_rng is not None:
+                    req_json_body = build_random_rerank_payload(rerank_dynamic_cfg, worker_rng)
+                resp = await client.request(
+                    method=tpl.method,
+                    url=tpl.path,
+                    params=tpl.params,
+                    json=req_json_body,
+                    headers=tpl.headers,
+                )
+                status = int(resp.status_code)
+                ok = 200 <= status < 300
+                if ok:
+                    try:
+                        payload = resp.json()
+                    except Exception:
+                        ok = False
+                        err = "invalid_json_response"
+                    else:
+                        valid, reason = validate_response_payload(
+                            scenario_name=scenario.name,
+                            tpl=tpl,
+                            payload=payload,
+                        )
+                        if not valid:
+                            ok = False
+                            err = reason or "invalid_payload"
+                if not ok and not err:
+                    err = f"http_{status}"
+            except Exception as e:
+                err = type(e).__name__
+            t1 = time.perf_counter()
+            cost_ms = (t1 - t0) * 1000.0
+
+            async with lock:
+                latencies.append(cost_ms)
+                if status:
+                    status_counter[status] = status_counter.get(status, 0) + 1
+                if ok:
+                    success_requests += 1
+                else:
+                    err_counter[err or "unknown"] = err_counter.get(err or "unknown", 0) + 1
+                    total_err = sum(err_counter.values())
+                    if max_errors > 0 and total_err >= max_errors:
+                        stop_flag = True
+
+    async with httpx.AsyncClient(timeout=timeout, limits=limits) as client:
+        tasks = [asyncio.create_task(worker(i, client)) for i in range(concurrency)]
+        await asyncio.gather(*tasks)
+
+    elapsed = max(time.perf_counter() - start, 1e-9)
+    lat_sorted = sorted(latencies)
+
+    result = {
+        "scenario": scenario.name,
+        "duration_sec": round(elapsed, 3),
+        "total_requests": total_requests,
+        "success_requests": success_requests,
+        "failed_requests": max(total_requests - success_requests, 0),
+        "success_rate": round((success_requests / total_requests) * 100.0, 2) if total_requests else 0.0,
+        "throughput_rps": round(total_requests / elapsed, 2),
+        "latency_ms": {
+            "avg": round(statistics.mean(lat_sorted), 2) if lat_sorted else 0.0,
+            "p50": round(percentile(lat_sorted, 50), 2),
+            "p90": round(percentile(lat_sorted, 90), 2),
+            "p95": round(percentile(lat_sorted, 95), 2),
+            "p99": round(percentile(lat_sorted, 99), 2),
+            "max": round(max(lat_sorted), 2) if lat_sorted else 0.0,
+        },
+        "status_codes": dict(sorted(status_counter.items(), key=lambda x: x[0])),
+        "errors": dict(sorted(err_counter.items(), key=lambda x: x[0])),
+    }
+    return result
+
+
+def format_summary(result: Dict[str, Any]) -> str:
+    lines = []
+    lines.append(f"\n=== Scenario: {result['scenario']} ===")
+    lines.append(
+        "requests={total_requests} success={success_requests} fail={failed_requests} "
+        "success_rate={success_rate}% rps={throughput_rps}".format(**result)
+    )
+    lat = result["latency_ms"]
+    lines.append(
+        f"latency(ms): avg={lat['avg']} p50={lat['p50']} p90={lat['p90']} p95={lat['p95']} p99={lat['p99']} max={lat['max']}"
+    )
+    lines.append(f"status_codes: {result['status_codes']}")
+    if result["errors"]:
+        lines.append(f"errors: {result['errors']}")
+    return "\n".join(lines)
+
+
+def aggregate_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
+    if not results:
+        return {}
+    total_requests = sum(x["total_requests"] for x in results)
+    success_requests = sum(x["success_requests"] for x in results)
+    failed_requests = sum(x["failed_requests"] for x in results)
+    total_duration = sum(x["duration_sec"] for x in
results) + weighted_avg_latency = 0.0 + if total_requests > 0: + weighted_avg_latency = sum(x["latency_ms"]["avg"] * x["total_requests"] for x in results) / total_requests + + return { + "scenario": "ALL", + "total_requests": total_requests, + "success_requests": success_requests, + "failed_requests": failed_requests, + "success_rate": round((success_requests / total_requests) * 100.0, 2) if total_requests else 0.0, + "aggregate_rps": round(total_requests / max(total_duration, 1e-9), 2), + "weighted_avg_latency_ms": round(weighted_avg_latency, 2), + } + + +def parse_csv_items(raw: str) -> List[str]: + return [x.strip() for x in str(raw or "").split(",") if x.strip()] + + +def parse_csv_ints(raw: str) -> List[int]: + values: List[int] = [] + seen = set() + for item in parse_csv_items(raw): + try: + value = int(item) + except ValueError as exc: + raise ValueError(f"Invalid integer in CSV list: {item}") from exc + if value <= 0: + raise ValueError(f"Concurrency must be > 0, got {value}") + if value in seen: + continue + seen.add(value) + values.append(value) + return values + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Interface-level load test for search and related microservices") + parser.add_argument( + "--scenario", + type=str, + default="all", + help="Scenario: backend_search | backend_suggest | embed_text | embed_image | translate | rerank | all | comma-separated list", + ) + parser.add_argument("--tenant-id", type=str, default="162", help="Tenant ID for backend search/suggest") + parser.add_argument("--duration", type=int, default=30, help="Duration seconds per scenario; <=0 means no duration cap") + parser.add_argument("--concurrency", type=int, default=20, help="Concurrent workers per scenario") + parser.add_argument("--max-requests", type=int, default=0, help="Stop after N requests per scenario (0 means unlimited)") + parser.add_argument("--timeout", type=float, default=10.0, help="Request timeout seconds") + parser.add_argument("--max-errors", type=int, default=0, help="Stop scenario when accumulated errors reach this value") + + parser.add_argument("--backend-base", type=str, default="http://127.0.0.1:6002", help="Base URL for backend search API") + parser.add_argument("--embedding-text-base", type=str, default="http://127.0.0.1:6005", help="Base URL for text embedding service") + parser.add_argument("--embedding-image-base", type=str, default="http://127.0.0.1:6008", help="Base URL for image embedding service") + parser.add_argument("--translator-base", type=str, default="http://127.0.0.1:6006", help="Base URL for translation service") + parser.add_argument("--reranker-base", type=str, default="http://127.0.0.1:6007", help="Base URL for reranker service") + + parser.add_argument("--cases-file", type=str, default="", help="Optional JSON file to override/add request templates") + parser.add_argument("--output", type=str, default="", help="Optional output JSON path") + parser.add_argument("--pause", type=float, default=0.0, help="Pause seconds between scenarios in all mode") + parser.add_argument( + "--concurrency-list", + type=str, + default="", + help="Comma-separated concurrency list (e.g. 1,5,10,20). 
If set, overrides --concurrency.", + ) + parser.add_argument( + "--rerank-dynamic-docs", + action="store_true", + help="For rerank scenario, generate docs payload dynamically on every request.", + ) + parser.add_argument("--rerank-doc-count", type=int, default=386, help="Doc count per rerank request when dynamic docs are enabled") + parser.add_argument("--rerank-vocab-size", type=int, default=1000, help="Word pool size for rerank dynamic docs generation") + parser.add_argument("--rerank-sentence-min-words", type=int, default=15, help="Minimum words per generated doc sentence") + parser.add_argument("--rerank-sentence-max-words", type=int, default=40, help="Maximum words per generated doc sentence") + parser.add_argument("--rerank-query", type=str, default="wireless mouse", help="Fixed query used for rerank dynamic docs mode") + parser.add_argument("--rerank-seed", type=int, default=20260312, help="Base random seed for rerank dynamic docs mode") + parser.add_argument( + "--rerank-top-n", + type=int, + default=0, + help="Optional top_n for rerank requests in dynamic docs mode (0 means omit top_n).", + ) + parser.add_argument( + "--embed-text-priority", + type=int, + default=0, + help="Default query param priority= for embed_text (0=offline admission; >0 bypasses rejection). Merged into params unless set in --cases-file.", + ) + parser.add_argument( + "--embed-image-priority", + type=int, + default=0, + help="Default query param priority= for embed_image (same semantics as embed-text-priority).", + ) + return parser.parse_args() + + +def build_rerank_dynamic_cfg(args: argparse.Namespace) -> Dict[str, Any]: + min_words = int(args.rerank_sentence_min_words) + max_words = int(args.rerank_sentence_max_words) + doc_count = int(args.rerank_doc_count) + vocab_size = int(args.rerank_vocab_size) + if doc_count <= 0: + raise ValueError(f"rerank-doc-count must be > 0, got {doc_count}") + if vocab_size <= 0: + raise ValueError(f"rerank-vocab-size must be > 0, got {vocab_size}") + if min_words <= 0: + raise ValueError(f"rerank-sentence-min-words must be > 0, got {min_words}") + if max_words < min_words: + raise ValueError( + f"rerank-sentence-max-words must be >= rerank-sentence-min-words, got {max_words} < {min_words}" + ) + if args.rerank_seed < 0: + raise ValueError(f"rerank-seed must be >= 0, got {args.rerank_seed}") + if int(args.rerank_top_n) < 0: + raise ValueError(f"rerank-top-n must be >= 0, got {args.rerank_top_n}") + + # Use deterministic, letter-only pseudo words to avoid long tokenization of numeric strings. 
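+    # A sizing note: the 50 syllables below combine pairwise into at most
+    # 50 * 50 = 2500 distinct two-syllable words, so --rerank-vocab-size values
+    # above 2500 fail fast in the length check after the loops.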
+ syllables = [ + "al", "an", "ar", "as", "at", "ba", "be", "bi", "bo", "ca", + "ce", "ci", "co", "da", "de", "di", "do", "el", "en", "er", + "fa", "fe", "fi", "fo", "ga", "ge", "gi", "go", "ha", "he", + "hi", "ho", "ia", "ie", "il", "in", "io", "is", "ka", "ke", + "ki", "ko", "la", "le", "li", "lo", "ma", "me", "mi", "mo", + ] + word_pool: List[str] = [] + for a in syllables: + for b in syllables: + word_pool.append(f"{a}{b}") + if len(word_pool) >= vocab_size: + break + if len(word_pool) >= vocab_size: + break + if len(word_pool) < vocab_size: + raise ValueError(f"Unable to generate enough synthetic words: requested={vocab_size}, got={len(word_pool)}") + return { + "query": args.rerank_query, + "doc_count": doc_count, + "min_words": min_words, + "max_words": max_words, + "seed": int(args.rerank_seed), + "normalize": True, + "top_n": int(args.rerank_top_n), + "word_pool": word_pool, + } + + +def build_random_rerank_payload( + cfg: Dict[str, Any], + rng: random.Random, +) -> Dict[str, Any]: + word_pool: List[str] = cfg["word_pool"] + docs = [] + for _ in range(cfg["doc_count"]): + doc_len = rng.randint(cfg["min_words"], cfg["max_words"]) + docs.append(" ".join(rng.choices(word_pool, k=doc_len))) + return { + "query": cfg["query"], + "docs": docs, + "normalize": bool(cfg.get("normalize", True)), + **({"top_n": int(cfg["top_n"])} if int(cfg.get("top_n", 0)) > 0 else {}), + } + + +async def main_async() -> int: + args = parse_args() + scenarios = build_scenarios(args) + + all_names = ["backend_search", "backend_suggest", "embed_text", "embed_image", "translate", "rerank"] + if args.scenario == "all": + run_names = [x for x in all_names if x in scenarios] + else: + requested = parse_csv_items(args.scenario) + if not requested: + print("No scenario specified.") + return 2 + unknown = [name for name in requested if name not in scenarios] + if unknown: + print(f"Unknown scenario(s): {', '.join(unknown)}") + print(f"Available: {', '.join(sorted(scenarios.keys()))}") + return 2 + run_names = requested + + if not run_names: + print("No scenarios to run.") + return 2 + + rerank_dynamic_cfg: Optional[Dict[str, Any]] = None + if args.rerank_dynamic_docs: + try: + rerank_dynamic_cfg = build_rerank_dynamic_cfg(args) + except ValueError as exc: + print(str(exc)) + return 2 + + concurrency_values = [args.concurrency] + if args.concurrency_list: + try: + concurrency_values = parse_csv_ints(args.concurrency_list) + except ValueError as exc: + print(str(exc)) + return 2 + if not concurrency_values: + print("concurrency-list is empty after parsing.") + return 2 + + print("Load test config:") + print(f" scenario={args.scenario}") + print(f" tenant_id={args.tenant_id}") + print(f" duration={args.duration}s") + print(f" concurrency={args.concurrency}") + print(f" concurrency_list={concurrency_values}") + print(f" max_requests={args.max_requests}") + print(f" timeout={args.timeout}s") + print(f" max_errors={args.max_errors}") + print(f" backend_base={args.backend_base}") + print(f" embedding_text_base={args.embedding_text_base}") + print(f" embedding_image_base={args.embedding_image_base}") + print(f" translator_base={args.translator_base}") + print(f" reranker_base={args.reranker_base}") + print(f" embed_text_priority={args.embed_text_priority}") + print(f" embed_image_priority={args.embed_image_priority}") + if args.rerank_dynamic_docs: + print(" rerank_dynamic_docs=True") + print(f" rerank_doc_count={args.rerank_doc_count}") + print(f" rerank_vocab_size={args.rerank_vocab_size}") + print(f" 
rerank_sentence_words=[{args.rerank_sentence_min_words},{args.rerank_sentence_max_words}]")
+        print(f"  rerank_query={args.rerank_query}")
+        print(f"  rerank_seed={args.rerank_seed}")
+        print(f"  rerank_top_n={args.rerank_top_n}")
+
+    results: List[Dict[str, Any]] = []
+    total_jobs = len(run_names) * len(concurrency_values)
+    job_idx = 0
+    for name in run_names:
+        scenario = scenarios[name]
+        for c in concurrency_values:
+            job_idx += 1
+            print(f"\n[{job_idx}/{total_jobs}] running {name} @ concurrency={c} ...")
+            result = await run_single_scenario(
+                scenario=scenario,
+                duration_sec=args.duration,
+                concurrency=c,
+                max_requests=args.max_requests,
+                max_errors=args.max_errors,
+                rerank_dynamic_cfg=rerank_dynamic_cfg if name == "rerank" else None,
+            )
+            result["concurrency"] = c
+            print(format_summary(result))
+            results.append(result)
+
+            if args.pause > 0 and job_idx < total_jobs:
+                await asyncio.sleep(args.pause)
+
+    final = {
+        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()),
+        "config": {
+            "scenario": args.scenario,
+            "run_names": run_names,
+            "tenant_id": args.tenant_id,
+            "duration_sec": args.duration,
+            "concurrency": args.concurrency,
+            "concurrency_list": concurrency_values,
+            "max_requests": args.max_requests,
+            "timeout_sec": args.timeout,
+            "max_errors": args.max_errors,
+            "backend_base": args.backend_base,
+            "embedding_text_base": args.embedding_text_base,
+            "embedding_image_base": args.embedding_image_base,
+            "translator_base": args.translator_base,
+            "reranker_base": args.reranker_base,
+            "cases_file": args.cases_file or None,
+            "rerank_dynamic_docs": args.rerank_dynamic_docs,
+            "rerank_doc_count": args.rerank_doc_count,
+            "rerank_vocab_size": args.rerank_vocab_size,
+            "rerank_sentence_min_words": args.rerank_sentence_min_words,
+            "rerank_sentence_max_words": args.rerank_sentence_max_words,
+            "rerank_query": args.rerank_query,
+            "rerank_seed": args.rerank_seed,
+            "rerank_top_n": args.rerank_top_n,
+            "embed_text_priority": args.embed_text_priority,
+            "embed_image_priority": args.embed_image_priority,
+        },
+        "results": results,
+        "overall": aggregate_results(results),
+    }
+
+    print("\n=== Overall ===")
+    print(json.dumps(final["overall"], ensure_ascii=False, indent=2))
+
+    if args.output:
+        out_path = Path(args.output)
+        out_path.parent.mkdir(parents=True, exist_ok=True)
+        out_path.write_text(json.dumps(final, ensure_ascii=False, indent=2), encoding="utf-8")
+        print(f"Saved JSON report: {out_path}")
+
+    return 0
+
+
+def main() -> int:
+    try:
+        return asyncio.run(main_async())
+    except KeyboardInterrupt:
+        print("Interrupted by user")
+        return 130
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/benchmarks/perf_cases.json.example b/benchmarks/perf_cases.json.example
new file mode 100644
index 0000000..0291dcb
--- /dev/null
+++ b/benchmarks/perf_cases.json.example
@@ -0,0 +1,71 @@
+{
+  "scenarios": {
+    "backend_search": [
+      {
+        "method": "POST",
+        "path": "/search/",
+        "headers": {"X-Tenant-ID": "${tenant_id}"},
+        "json": {"query": "wireless mouse", "size": 20, "language": "en", "enable_rerank": false}
+      },
+      {
+        "method": "POST",
+        "path": "/search/",
+        "headers": {"X-Tenant-ID": "${tenant_id}"},
+        "json": {"query": "芭比娃娃", "size": 20, "language": "zh", "enable_rerank": false}
+      }
+    ],
+    "backend_suggest": [
+      {
+        "method": "GET",
+        "path": "/search/suggestions",
+        "headers": {"X-Tenant-ID": "${tenant_id}"},
+        "params": {"q": "f", "size": 20, "language": "en"}
+      },
+      {
+        "method": "GET",
+        "path": "/search/suggestions",
+        "headers": {"X-Tenant-ID": "${tenant_id}"},
{"X-Tenant-ID": "${tenant_id}"}, + "params": {"q": "玩", "size": 20, "language": "zh"} + } + ], + "embed_text": [ + { + "method": "POST", + "path": "/embed/text", + "params": {"priority": "0"}, + "json": ["wireless mouse", "gaming keyboard", "USB-C cable", "barbie doll"] + } + ], + "embed_image": [ + { + "method": "POST", + "path": "/embed/image", + "params": {"normalize": "true", "priority": "0"}, + "json": ["/data/saas-search/docs/image-dress1.png"] + } + ], + "translate": [ + { + "method": "POST", + "path": "/translate", + "json": {"text": "商品标题", "target_lang": "en", "source_lang": "zh", "model": "qwen"} + } + ], + "rerank": [ + { + "method": "POST", + "path": "/rerank", + "json": { + "query": "wireless mouse", + "docs": [ + "Wireless ergonomic mouse", + "Bluetooth gaming mouse", + "USB cable 1 meter", + "Mouse pad large size" + ], + "normalize": true + } + } + ] + } +} diff --git a/benchmarks/reranker/benchmark_reranker_1000docs.sh b/benchmarks/reranker/benchmark_reranker_1000docs.sh new file mode 100755 index 0000000..bb698ee --- /dev/null +++ b/benchmarks/reranker/benchmark_reranker_1000docs.sh @@ -0,0 +1,130 @@ +#!/bin/bash +# +# Benchmark reranker for e-commerce short-text workload: +# - query <= ~100 tokens +# - docs are short title / title+brief +# - one request contains ~1000 docs +# +# Outputs JSON reports under perf_reports//reranker_1000docs/ +# +# Usage: +# ./benchmarks/reranker/benchmark_reranker_1000docs.sh +# Optional env: +# BATCH_SIZES="24 32 48 64" +# C1_REQUESTS=4 +# C4_REQUESTS=8 +# TENANT_ID=162 +# +set -euo pipefail + +PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +cd "${PROJECT_ROOT}" + +TENANT_ID="${TENANT_ID:-162}" +BATCH_SIZES="${BATCH_SIZES:-24 32 48 64}" +C1_REQUESTS="${C1_REQUESTS:-4}" +C4_REQUESTS="${C4_REQUESTS:-8}" +TIMEOUT_SEC="${TIMEOUT_SEC:-240}" +RERANK_BASE="${RERANK_BASE:-http://127.0.0.1:6007}" + +DATE_TAG="$(date +%Y%m%d)" +OUT_DIR="perf_reports/${DATE_TAG}/reranker_1000docs" +TMP_CASES="/tmp/rerank_1000_shortdocs_cases.json" +mkdir -p "${OUT_DIR}" + +cleanup() { + ./scripts/service_ctl.sh stop reranker >/dev/null 2>&1 || true +} +trap cleanup EXIT + +cat > "${TMP_CASES}" <<'JSON' +{ + "scenarios": { + "rerank": [ + { + "method": "POST", + "path": "/rerank", + "json": { + "query": "wireless ergonomic gaming mouse for office use with rechargeable battery and bluetooth", + "docs": [], + "normalize": true + } + } + ] + } +} +JSON + +python3 - <<'PY' +import json +from pathlib import Path + +p = Path("/tmp/rerank_1000_shortdocs_cases.json") +d = json.loads(p.read_text(encoding="utf-8")) +docs = [] +for i in range(1000): + if i % 3 == 0: + doc = f"wireless mouse model {i} ergonomic grip 2.4g bluetooth" + elif i % 3 == 1: + doc = f"gaming mouse {i} rgb lightweight high precision sensor" + else: + doc = f"office mouse {i} rechargeable silent click compact" + if i % 5 == 0: + doc += " with usb receiver" + if i % 7 == 0: + doc += " long battery life" + docs.append(doc) + +d["scenarios"]["rerank"][0]["json"]["docs"] = docs +p.write_text(json.dumps(d, ensure_ascii=False), encoding="utf-8") +print(f"[info] generated docs={len(docs)} at {p}") +PY + +run_bench() { + local bs="$1" + local c="$2" + local req="$3" + local out="${OUT_DIR}/rerank_bs${bs}_c${c}_r${req}.json" + .venv/bin/python benchmarks/perf_api_benchmark.py \ + --scenario rerank \ + --tenant-id "${TENANT_ID}" \ + --reranker-base "${RERANK_BASE}" \ + --cases-file "${TMP_CASES}" \ + --concurrency "${c}" \ + --max-requests "${req}" \ + --timeout "${TIMEOUT_SEC}" \ + --output "${out}" >/dev/null + 
+  # Print a one-line summary from the report JSON (a reconstructed step; the
+  # field names follow the JSON that perf_api_benchmark.py writes via --output).
+  python3 - "${out}" <<'PY'
+import json
+import sys
+
+report = json.loads(open(sys.argv[1], encoding="utf-8").read())
+r = report["results"][0]
+lat = r["latency_ms"]
+print(f"[result] {r['scenario']} c={r.get('concurrency')} rps={r['throughput_rps']} p50={lat['p50']}ms p95={lat['p95']}ms")
+PY
+}
+
+for bs in ${BATCH_SIZES}; do
+  echo "[info] restarting reranker with batch size ${bs}"
+  ./scripts/service_ctl.sh stop reranker >/dev/null 2>&1 || true
+  # Assumes service_ctl.sh reads RERANK_BATCH_SIZE from the environment; adjust
+  # this line if your deployment configures the reranker batch size differently.
+  RERANK_BATCH_SIZE="${bs}" ./scripts/service_ctl.sh start reranker \
+    >"${OUT_DIR}/start_bs${bs}.log" 2>&1 &
+
+  for i in $(seq 1 180); do
+    if curl -sf "${RERANK_BASE}/health" >/dev/null 2>&1; then
+      break
+    fi
+    sleep 1
+    if [ "${i}" -eq 180 ]; then
+      echo "[error] reranker startup timeout for bs=${bs}" >&2
+      tail -n 80 "${OUT_DIR}/start_bs${bs}.log" >&2 || true
+      exit 1
+    fi
+  done
+
+  run_bench "${bs}" 1 "${C1_REQUESTS}"
+  run_bench "${bs}" 4 "${C4_REQUESTS}"
+done
+
+echo "[info] benchmark done: ${OUT_DIR}"
diff --git a/benchmarks/reranker/benchmark_reranker_gguf_local.py b/benchmarks/reranker/benchmark_reranker_gguf_local.py
new file mode 100644
index 0000000..fea1d44
--- /dev/null
+++ b/benchmarks/reranker/benchmark_reranker_gguf_local.py
@@ -0,0 +1,198 @@
+#!/usr/bin/env python3
+"""
+Local tuning probe for GGUF reranker backends.
+
+Runs the backend directly in a fresh process per config to measure:
+- load time
+- GPU memory used by this process
+- single-request rerank latency
+
+Example:
+    ./.venv-reranker-gguf/bin/python benchmarks/reranker/benchmark_reranker_gguf_local.py
+    ./.venv-reranker-gguf-06b/bin/python benchmarks/reranker/benchmark_reranker_gguf_local.py --backend-name qwen3_gguf_06b --docs 400
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import random
+import statistics
+import subprocess
+import sys
+import time
+from pathlib import Path
+from typing import Any
+
+
+DEFAULT_TITLES = Path("/home/ubuntu/rerank_test/titles.1.8w")
+
+
+def load_titles(path: Path) -> list[str]:
+    items: list[str] = []
+    with path.open(encoding="utf-8", errors="replace") as fh:
+        for line in fh:
+            text = line.strip()
+            if text:
+                items.append(text)
+    return items
+
+
+def gpu_mem_for_pid(pid: int) -> int:
+    try:
+        out = subprocess.check_output(
+            [
+                "nvidia-smi",
+                "--query-compute-apps=pid,used_gpu_memory",
+                "--format=csv,noheader,nounits",
+            ],
+            text=True,
+        )
+    except Exception:
+        return -1
+    for raw in out.splitlines():
+        parts = [p.strip() for p in raw.split(",")]
+        if len(parts) != 2:
+            continue
+        try:
+            row_pid = int(parts[0])
+            row_mem = int(parts[1])
+        except ValueError:
+            continue
+        if row_pid == pid:
+            return row_mem
+    return -1
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--backend-name", type=str, default="qwen3_gguf")
+    parser.add_argument("--titles-file", type=Path, default=DEFAULT_TITLES)
+    parser.add_argument("--query", type=str, default="白色oversized T-shirt")
+    parser.add_argument("--docs", type=int, default=160)
+    parser.add_argument("--repeat", type=int, default=1)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument(
+        "--configs-json",
+        type=str,
+        default="",
+        help="JSON array of config objects; when omitted, uses built-in scan set.",
+    )
+    args = parser.parse_args()
+
+    if not args.titles_file.is_file():
+        print(f"missing titles file: {args.titles_file}", file=sys.stderr)
+        return 2
+
+    titles = load_titles(args.titles_file)
+    if len(titles) < args.docs:
+        print(f"not enough titles: need {args.docs}, got {len(titles)}", file=sys.stderr)
+        return 2
+
+    random.seed(args.seed)
+    docs = random.sample(titles, args.docs)
+
+    if args.configs_json:
+        configs = json.loads(args.configs_json)
+    elif args.backend_name == "qwen3_gguf_06b":
+        configs = [
+            {"name": "gguf_06b_full_256", "n_ctx": 256, "n_batch": 256, "n_ubatch": 256, "n_gpu_layers": 999},
+            {"name": "gguf_06b_full_320", "n_ctx": 320, "n_batch": 320, "n_ubatch": 320, "n_gpu_layers": 999},
+            {"name": "gguf_06b_full_384", "n_ctx": 384, "n_batch": 384, "n_ubatch": 384,
"n_gpu_layers": 999}, + {"name": "gguf_06b_full_512", "n_ctx": 512, "n_batch": 512, "n_ubatch": 512, "n_gpu_layers": 999}, + ] + else: + configs = [ + {"name": "gguf_t4_24g", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 24}, + {"name": "gguf_t4_40g", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 40}, + {"name": "gguf_t4_full", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 999}, + {"name": "gguf_t4_full_512", "n_ctx": 512, "n_batch": 512, "n_ubatch": 256, "n_gpu_layers": 999}, + {"name": "gguf_t4_full_512_u512", "n_ctx": 512, "n_batch": 512, "n_ubatch": 512, "n_gpu_layers": 999}, + {"name": "gguf_t4_full_768", "n_ctx": 768, "n_batch": 768, "n_ubatch": 256, "n_gpu_layers": 999}, + ] + + from reranker.backends.qwen3_gguf import Qwen3GGUFRerankerBackend + + default_cfg_by_backend: dict[str, dict[str, Any]] = { + "qwen3_gguf": { + "_backend_name": "qwen3_gguf", + "repo_id": "DevQuasar/Qwen.Qwen3-Reranker-4B-GGUF", + "filename": "*Q8_0.gguf", + "local_dir": "./models/reranker/qwen3-reranker-4b-gguf", + "infer_batch_size": 8, + }, + "qwen3_gguf_06b": { + "_backend_name": "qwen3_gguf_06b", + "repo_id": "ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF", + "filename": "qwen3-reranker-0.6b-q8_0.gguf", + "local_dir": "./models/reranker/qwen3-reranker-0.6b-q8_0-gguf", + "infer_batch_size": 32, + }, + } + if args.backend_name not in default_cfg_by_backend: + print(f"unsupported backend: {args.backend_name}", file=sys.stderr) + return 2 + + base_cfg: dict[str, Any] = { + **default_cfg_by_backend[args.backend_name], + "instruction": "Rank products by query with category & style match prioritized", + "cache_dir": "./model_cache", + "main_gpu": 0, + "n_threads": 2, + "n_threads_batch": 4, + "flash_attn": True, + "offload_kqv": True, + "use_mmap": True, + "use_mlock": False, + "sort_by_doc_length": True, + "length_sort_mode": "char", + "enable_warmup": True, + "verbose": False, + "reuse_query_state": True, + } + + all_results: list[dict[str, Any]] = [] + for cfg in configs: + merged = dict(base_cfg) + merged.update(cfg) + name = str(merged.pop("name")) + + t0 = time.perf_counter() + backend = Qwen3GGUFRerankerBackend(merged) + load_ms = (time.perf_counter() - t0) * 1000.0 + gpu_mem_mib = gpu_mem_for_pid(os.getpid()) + + runs: list[float] = [] + last_meta: dict[str, Any] = {} + for _ in range(args.repeat): + t1 = time.perf_counter() + _scores, meta = backend.score_with_meta(args.query, docs, normalize=True) + runs.append((time.perf_counter() - t1) * 1000.0) + last_meta = dict(meta) + + result = { + "name": name, + "config": merged, + "load_ms": round(load_ms, 2), + "gpu_mem_mib": gpu_mem_mib, + "latency_ms_min": round(min(runs), 2), + "latency_ms_avg": round(statistics.mean(runs), 2), + "latency_ms_max": round(max(runs), 2), + "meta": last_meta, + } + all_results.append(result) + print(json.dumps(result, ensure_ascii=False)) + del backend + + print("SUMMARY") + for item in sorted(all_results, key=lambda x: x["latency_ms_avg"]): + print( + f'{item["name"]}: avg={item["latency_ms_avg"]}ms ' + f'gpu={item["gpu_mem_mib"]}MiB load={item["load_ms"]}ms' + ) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/benchmarks/reranker/benchmark_reranker_random_titles.py b/benchmarks/reranker/benchmark_reranker_random_titles.py new file mode 100755 index 0000000..4e19435 --- /dev/null +++ b/benchmarks/reranker/benchmark_reranker_random_titles.py @@ -0,0 +1,312 @@ +#!/usr/bin/env python3 +""" +Single-request rerank latency probe using real title lines 
(e.g. 1.8w export). + +Randomly samples N titles from a text file (one title per line), POSTs to the +rerank HTTP API, prints wall-clock latency. + +Supports multiple N values (comma-separated) and multiple repeats per N. +Each invocation runs 3 warmup requests with n=400 first; those are not timed for summaries. + +Example: + source activate.sh + python benchmarks/reranker/benchmark_reranker_random_titles.py 386 + python benchmarks/reranker/benchmark_reranker_random_titles.py 40,80,100 + python benchmarks/reranker/benchmark_reranker_random_titles.py 40,80,100 --repeat 3 --seed 42 + RERANK_BASE=http://127.0.0.1:6007 python benchmarks/reranker/benchmark_reranker_random_titles.py 200 +""" + +from __future__ import annotations + +import argparse +import json +import os +import random +import statistics +import sys +import time +from pathlib import Path +from typing import List, Optional, Tuple + +import httpx + + +def _load_titles(path: Path) -> List[str]: + lines: List[str] = [] + with path.open(encoding="utf-8", errors="replace") as f: + for line in f: + s = line.strip() + if s: + lines.append(s) + return lines + + +def _parse_doc_counts(s: str) -> List[int]: + parts = [p.strip() for p in s.split(",") if p.strip()] + if not parts: + raise ValueError("empty doc-count list") + out: List[int] = [] + for p in parts: + v = int(p, 10) + if v <= 0: + raise ValueError(f"doc count must be positive, got {v}") + out.append(v) + return out + + +def _do_rerank( + client: httpx.Client, + url: str, + query: str, + docs: List[str], + *, + top_n: int, + normalize: bool, +) -> Tuple[bool, int, float, Optional[int], str]: + payload: dict = {"query": query, "docs": docs, "normalize": normalize} + if top_n > 0: + payload["top_n"] = top_n + body = json.dumps(payload, ensure_ascii=False) + headers = {"Content-Type": "application/json"} + t0 = time.perf_counter() + try: + resp = client.post(url, content=body.encode("utf-8"), headers=headers) + except httpx.HTTPError: + raise + elapsed_ms = (time.perf_counter() - t0) * 1000.0 + text = resp.text or "" + ok = resp.status_code == 200 + scores_len: Optional[int] = None + if ok: + try: + data = resp.json() + sc = data.get("scores") + if isinstance(sc, list): + scores_len = len(sc) + except json.JSONDecodeError: + scores_len = None + return ok, resp.status_code, elapsed_ms, scores_len, text + + +def main() -> int: + parser = argparse.ArgumentParser( + description="POST /rerank with N random titles from a file and print latency." + ) + parser.add_argument( + "n", + type=str, + metavar="N[,N,...]", + help="Doc counts: one integer or comma-separated list, e.g. 
40,80,100.", + ) + parser.add_argument( + "--repeat", + type=int, + default=3, + help="Number of runs per doc count (default: 3).", + ) + parser.add_argument( + "--titles-file", + type=Path, + default=Path(os.environ.get("RERANK_TITLE_FILE", "/home/ubuntu/rerank_test/titles.1.8w")), + help="Path to newline-separated titles (default: %(default)s or env RERANK_TITLE_FILE).", + ) + parser.add_argument( + "--url", + type=str, + default=os.environ.get("RERANK_BASE", "http://127.0.0.1:6007").rstrip("/") + "/rerank", + help="Full rerank URL (default: $RERANK_BASE/rerank or http://127.0.0.1:6007/rerank).", + ) + parser.add_argument( + "--query", + type=str, + default="健身女生T恤短袖", + help="Rerank query string.", + ) + parser.add_argument( + "--seed", + type=int, + default=None, + help="RNG base seed; each (n, run) uses a derived seed when set (optional).", + ) + parser.add_argument( + "--top-n", + type=int, + default=0, + help="If > 0, include top_n in JSON body (omit field when 0).", + ) + parser.add_argument( + "--no-normalize", + action="store_true", + help="Send normalize=false (default: normalize=true).", + ) + parser.add_argument( + "--timeout", + type=float, + default=float(os.environ.get("RERANK_TIMEOUT_SEC", "240")), + help="HTTP timeout seconds.", + ) + parser.add_argument( + "--print-body-preview", + action="store_true", + help="Print first ~500 chars of response body on success (last run only).", + ) + parser.add_argument( + "--tag", + type=str, + default=os.environ.get("BENCH_TAG", ""), + help="Optional label stored in --json-summary-out (default: env BENCH_TAG or empty).", + ) + parser.add_argument( + "--json-summary-out", + type=Path, + default=None, + help="Write one JSON object with per-n latencies and aggregates for downstream tables.", + ) + parser.add_argument( + "--quiet-runs", + action="store_true", + help="Suppress per-run lines; still prints warmup lines and text summaries.", + ) + args = parser.parse_args() + + try: + doc_counts = _parse_doc_counts(args.n) + except ValueError as exc: + print(f"error: invalid N list {args.n!r}: {exc}", file=sys.stderr) + return 2 + + repeat = int(args.repeat) + if repeat <= 0: + print("error: --repeat must be positive", file=sys.stderr) + return 2 + + if not args.titles_file.is_file(): + print(f"error: titles file not found: {args.titles_file}", file=sys.stderr) + return 2 + + titles = _load_titles(args.titles_file) + warmup_n = 400 + warmup_runs = 3 + max_n = max(max(doc_counts), warmup_n) + if len(titles) < max_n: + print( + f"error: file has only {len(titles)} non-empty lines, need at least {max_n}", + file=sys.stderr, + ) + return 2 + + top_n = int(args.top_n) + normalize = not args.no_normalize + any_fail = False + summary: dict[int, List[float]] = {n: [] for n in doc_counts} + + with httpx.Client(timeout=args.timeout) as client: + for w in range(warmup_runs): + if args.seed is not None: + random.seed(args.seed + 8_000_000 + w) + docs_w = random.sample(titles, warmup_n) + try: + ok_w, status_w, _elapsed_w, scores_len_w, _text_w = _do_rerank( + client, + args.url, + args.query, + docs_w, + top_n=top_n, + normalize=normalize, + ) + except httpx.HTTPError as exc: + print( + f"warmup n={warmup_n} {w + 1}/{warmup_runs} error: request failed: {exc}", + file=sys.stderr, + ) + any_fail = True + continue + if not ok_w: + any_fail = True + print( + f"warmup n={warmup_n} {w + 1}/{warmup_runs} status={status_w} " + f"scores={scores_len_w if scores_len_w is not None else 'n/a'} (not timed)" + ) + + for n in doc_counts: + for run_idx in range(repeat): 
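+                # With --seed set, each (n, run) pair gets its own derived seed,
+                # so repeats sample different but reproducible title subsets.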
+                if args.seed is not None:
+                    random.seed(args.seed + n * 10_000 + run_idx)
+                docs = random.sample(titles, n)
+                try:
+                    ok, status, elapsed_ms, scores_len, text = _do_rerank(
+                        client,
+                        args.url,
+                        args.query,
+                        docs,
+                        top_n=top_n,
+                        normalize=normalize,
+                    )
+                except httpx.HTTPError as exc:
+                    print(
+                        f"n={n} run={run_idx + 1}/{repeat} error: request failed: {exc}",
+                        file=sys.stderr,
+                    )
+                    any_fail = True
+                    continue
+
+                if ok:
+                    summary[n].append(elapsed_ms)
+                else:
+                    any_fail = True
+
+                if not args.quiet_runs:
+                    print(
+                        f"n={n} run={run_idx + 1}/{repeat} status={status} "
+                        f"latency_ms={elapsed_ms:.2f} scores={scores_len if scores_len is not None else 'n/a'}"
+                    )
+                if args.print_body_preview and text and run_idx == repeat - 1 and n == doc_counts[-1]:
+                    preview = text[:500] + ("…" if len(text) > 500 else "")
+                    print(preview)
+
+    for n in doc_counts:
+        lat = summary[n]
+        if not lat:
+            print(f"summary n={n} runs=0 (all failed)")
+            continue
+        avg = statistics.mean(lat)
+        lo, hi = min(lat), max(lat)
+        extra = ""
+        if len(lat) >= 2:
+            extra = f" stdev_ms={statistics.stdev(lat):.2f}"
+        print(
+            f"summary n={n} runs={len(lat)} min_ms={lo:.2f} max_ms={hi:.2f} avg_ms={avg:.2f}{extra}"
+        )
+
+    if args.json_summary_out is not None:
+        per_n: dict = {}
+        for n in doc_counts:
+            lat = summary[n]
+            row: dict = {"values_ms": lat, "runs": len(lat)}
+            if lat:
+                row["mean_ms"] = statistics.mean(lat)
+                row["min_ms"] = min(lat)
+                row["max_ms"] = max(lat)
+                if len(lat) >= 2:
+                    row["stdev_ms"] = statistics.stdev(lat)
+            per_n[str(n)] = row
+        out_obj = {
+            "tag": args.tag or None,
+            "doc_counts": doc_counts,
+            "repeat": repeat,
+            "url": args.url,
+            "per_n": per_n,
+            "failed": bool(any_fail),
+        }
+        args.json_summary_out.parent.mkdir(parents=True, exist_ok=True)
+        args.json_summary_out.write_text(
+            json.dumps(out_obj, ensure_ascii=False, indent=2) + "\n",
+            encoding="utf-8",
+        )
+        print(f"wrote json summary -> {args.json_summary_out}")
+
+    return 1 if any_fail else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/benchmarks/reranker/manual/curl1.sh b/benchmarks/reranker/manual/curl1.sh
new file mode 100644
index 0000000..e9b946e
--- /dev/null
+++ b/benchmarks/reranker/manual/curl1.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+start=$(date +%s%N) # start time, in nanoseconds
+
+# Convert each line of titles.400 into a JSON array.
+
+docs_json=$(jq -R -s 'split("\n") | map(select(length > 0))' /data/saas-search/tests/data/titles.400)
+
+time curl -X POST "http://localhost:6007/rerank" \
+  -H "Content-Type: application/json" \
+  -d "$(jq -n \
+    --arg query "健身女生T恤短袖" \
+    --argjson docs "$docs_json" \
+    '{
+      query: $query,
+      docs: $docs,
+      top_n: 386,
+      normalize: true
+    }')"
+
+end=$(date +%s%N) # end time, in nanoseconds
+duration=$(( (end - start) / 1000000 )) # convert to milliseconds
+echo "Command took $duration milliseconds."
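For a quick manual check without jq, the same request can be sent from Python. The following is a minimal sketch, not part of the patched files: it assumes the rerank service from curl1.sh is listening on localhost:6007, reuses the same titles fixture and payload shape (query/docs/top_n/normalize), and relies on httpx, which the benchmark scripts above already use.

    import time

    import httpx

    # Same test fixture path as curl1.sh; adjust to your environment.
    with open("/data/saas-search/tests/data/titles.400", encoding="utf-8") as f:
        titles = [line.strip() for line in f if line.strip()]

    payload = {"query": "健身女生T恤短袖", "docs": titles, "top_n": 386, "normalize": True}
    t0 = time.perf_counter()
    resp = httpx.post("http://localhost:6007/rerank", json=payload, timeout=240.0)
    latency_ms = (time.perf_counter() - t0) * 1000.0
    scores = resp.json().get("scores", []) if resp.status_code == 200 else []
    print(f"status={resp.status_code} latency_ms={latency_ms:.1f} scores={len(scores)}")

As with the shell scripts, this measures end-to-end wall-clock latency, including JSON serialization of the ~400-title payload.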
diff --git a/benchmarks/reranker/manual/curl1_simple.sh b/benchmarks/reranker/manual/curl1_simple.sh new file mode 100644 index 0000000..f55842e --- /dev/null +++ b/benchmarks/reranker/manual/curl1_simple.sh @@ -0,0 +1,417 @@ +#!/bin/bash +start=$(date +%s%N) # 开始时间,纳秒级 + +time curl -X POST "http://localhost:6007/rerank" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "健身女生T恤短袖", + "docs": [ "60 Jelly Bracelets 80 s Adult Size - MAQIHAN Neon Gummy Bracelets for Women 80s Jelly Bangles Glow Silicone Bands Jewelry Wristband Rainbow Jellies Bangle Girls Boys Colored Accessories Party Favor", +"MEROKEETY Women s 2025 Summer Square Neck Puff Sleeve Boho Midi Dress Swiss Dot Ruffle Flowy Tie Back Dress", +"FITORY Mens Sandals", +"Lefant 3 Packs Dust Bags Replacement Kit Suitable for Lefant Base Station of M3/M3 Max Robot Vacuum", +"Merrell Mens Hydro Moc", +"Lounge Sets for Women Summer Outfits Women 2 Piece Sets 2025 Sleeveless Matching Lounge Crop Top High Waisted Short", +"Men s Underwear", +"Executive Functioning Workbook for Teens: 101 Activities and Strategies for Enhancing Self-Discipline", +"LEVSOX Compression Socks Women and Men", +"MGparty 12 Pieces Christmas Headbands Christmas Parties Favors Decoration Supplies Xmas Gifts Photo Booth Xmas Tree Snowman Reindeer Antlers Santa Hat", +"10 Large Vacuum Storage Bags with Hand Pump", +"Disney Lilo and Stitch Boys Swim Set", +"Sterling Silver Hoop Earrings", +"23 Pcs Day of The Dead Altar Decorations Set", +"Travel Makeup Bag for Women Fashion Large Capacity Pouch Open Flat Cosmetic Portable Organizer Waterproof Large Opening Storage Toiletry Bags Vertical Free-Standing Brush Holder for Easy Access Blue", +"Iron Flame: Empyrean", +"Luxebell Luggage Straps Suitcase Belt TSA Approved Travel Accessories Gift 4-Pack 6.56ft (Green)", +"TONY & SANDY Christian Gifts for Women", +"Blue Birthday Party Supplies", +"Vionic Women s Coral Loafer Moccasin", +"LIQING 35L Large Picnic Basket 2 Layers of Internal Pockets Leak-Proof and Insulated ,Folding with Internal Support for enhansed Stability", +"40oz Softball Tumbler with Handle Softball Gifts Stuff for Women Girls Men Gift for Coach Lovers Fan Stainless Steel Cup", +"Crayola Colour & Erase Reusable Puzzle Set", +"Carry On Luggage with Front Compartment and Cup Holder", +"Interactive Cat Toy Rechargeable", +"Nike Air Rift", +"Portable Hookah Set for Travel - Premium Handheld Glass Aluminum Mini Hookah Real Metal Accessories", +"Clear Backpack for Boys", +"Women’s Knee High Boots Round Toe Chunky Heel Faux Leather Tall Riding Boots with Side Zipper", +"Golf Grip Trainer & Connection Band 2Set", +"Monster High Self Scare Day Cleo De Nile Doll Play Set", +"Fortnite eGift Card - Powered by the Epic Games Store", +"Mesh Beach Bags", +"Crowye Anime Cosplay Costume for Halloween Princess Costume Accessories Anime White Cosplay Wig Egypt Arm Cuff Bracelet Gold Earrings Greek Goddess Set for Halloween Dress up Princess", +"Premium Women s Leather Tote Handbag - Bag for Everyday Use", +"Ekouaer Maternity Nursing Gown and Robe Set Labor Delivery Nursing Nightgowns for Breastfeeding Pregnancy Clothes", +"Superband Mermaid Tails for Swimming for Women and Adults Without Monofin", +"Pink Queen Women s 2025 Casual Pullover Sweaters Sexy V Neck Long Sleeve Twist Knot Cropped Knit Sweater Tops", +"WDIRARA Girl s Bow Puff Sleeve A Line Midi Dress Cute Collared Ruffle Hem Swing Dresses", +"Funziez! 
Adult Onesie Halloween Costume Animal Dinosaur Shark Unisex Plush One Piece Cosplay Suit for Adults", +"Rockland Duffel Bag", +"Centipede Demon Baby Shoes Baby Boys Girls Walking Shoes Non Slip Booties Sock Shoe Infants Breathable Sneakers Lightweight Barefoot Slip On Sneakers", +"CYDREAM Long Sleeve Bodysuits for Women - Square Neck Shapewear Bodysuit Tops Going Out Body Suits Shirt Leotard", +"Men s Oversized Letter Graphic Tank Top Sleeveless Casual Summer Tops Y2K Streetwear", +"Flower Claw Clip 7 PCS Claw Clips", +"waist twister,waist twisting machine ab twister board with 300 lbs Weight Capacity", +"PAGE ONE Womens Winter Ribbed Beanie Crossed Cap Chunky Cable Knit Pompom Soft Warm Hat", +"5 Pack Cute Keychains for Girls", +"Dragon Ball Super - Complete Series - Blu-ray", +"VejiA Multifunctional Simple Shoe Cabinet Storage Shoe Rack Save Space Hallway Furniture", +"50Pcs Handbag Purse Feet Handbag Nailhead Brass Studs Screw-Back Feet Flat Head Stud Metal Studs Rivet Leather Craft DIY for DIY Purse Leather Craft", +"Wearable Blanket Hoodie with Letter A-Z - Oversized Blanket Hooded Personalized Birthday Christmas Gifts for Women Mom", +"On Women s Cloudnova Form 2 Sneakers", +"SANTINY 18 Skorts for Women with 4 Pockets High Waist Long Athletic Tennis Skirt Golf Skort Dressy Casual", +"Compatible with AirTag Case Keychain", +"Rod Holder Plugs", +"Protective Case Compatible with Have A Seat Figure-Clear PVC Portable Storage Box with Keychain", +"adidas Men s Swift Run 1.0 Running Shoes", +"M MOOHAM Cross Necklace for Women Teen Girls", +"Sportneer Adjustable Ankle Weights for Women and Men 7 lbs/Pair Adjustable Leg Weights with Secure Straps", +"PRETTYGARDEN Women s 2 Piece Outfits Sleeveless Suit Vest and Wide Leg Pants Business Casual Blazer Sets", +"Bouncer Seat for Babies 0-12 Months", +"Womens Crew Socks Cotton Long Gym Socks Lightweight Athletic Running Socks", +"Denior Magnetic Card Phone Wallet Holder for iPhone 17/16/15/14/13/12 Series", +"LIGHT DOT Women s Summer Dress Plisse Maxi Tube Bodycon Dress Back Tie Beach Resort Vacation", +"Vivresina UV Resin 400g (400.0", +"Wide Leg Pants High Waisted Pleated Trousers with 4 Colors", +"Osprey Daylite Shoulder Sling Bag – Compact Crossbody Backpack for Everyday Carry", +"Tote Bag for Women Large PVC Tote Bag Letters Print Plastic Handbag for Christmas Gift", +"Hello Kitty Giant Coloring & Activity Book 11x16", +"Skechers Mens Delson 3.0 - Roth 210606", +"3pcs Heart Badge Reel with Alligator Clip Cute Retractable Badge Holder Acrylic Nurse Badge Clip for Office Workers", +"Ortho Balance Hiking Shoes for Men Women", +"GOLDENMATE 1000VA/600W Lithium UPS Battery Backup and Surge Protector", +"Gelante Solid Color 100% Cotton Bucket Hat for Women and Men Packable Travel Summer Beach Hat", +"Sonic The Hedgehog 3 Movie Action Figures 2.5-Inch Movie Collector Toy Figure Multi-Pack Includes Sonic The Hedgehog Knuckles Shadow Buzz Bomber & Drone- Officially Licensed Toys", +"61 Pcs Nacho Libre Stickers Comedy Movie Graffiti Waterproof Vinyl for Adults for Birthday Party Supplies Decoration Favors for Water Bottles Laptop Suitcase Scrapbooking Choice", +"Neck Lift Tape", +"925 Sterling Silver Earrings for Womens Sparkly Colorful Full Diamond Simple Stylish Elegant Hypoallergenic Jewelry", +"Pink Ceramic Bow Vase for Flowers", +"Winter Coats For Men Winter Jackets Water Resistant Warm Thicken Parka Puffer Coat Long Down Jacket", +"Alarm Clocks for Bedrooms", +"KINURI Running Belt for Men & Women – Fits All Smartphones – Waterproof Waist Pack with 
Adjustable Strap – Ideal for Jogging", +"DREAM PAIRS Heels for Women Flip Flops Kitten Low Heels Open Square Toe Thong Heeled Sandals", +"Amazon Basics All Purpose Washable School Craft Liquid Glue for Making Slime", +"Inflatable Costume Adult Frog Full Body Deluxe Funny Air Blow Up Costume for Men Women Halloween", +"Mens Golf Pants Stretch Casual Dress Pants Elastic Drawstring Slacks for Men Lightweight Trousers with 5 Pockets", +"Lip Smacker Hello Kitty Lip Balm", +"Brown Sugar Keeper 3D – Terracotta Clay Bear Softener", +"MEETSUN Polarized Sunglasses for Women Men Trendy Classic Retro Designer Style", +"Corset Top Bustier Lingerie for Women Zipper Front Flower Sexy Burlesque Vintage", +"Pro Club Men s Heavyweight Mesh Basketball Shorts", +"Nike Tech Men s Full-Zip Windrunner Hoodie (HV0949-237", +"Ear Piercing Kit", +"Timberland Men s 6 Premium Boot", +"STAR WARS The Black Series Darth Maul", +"VZQI Halloween Cosplay Costumes Kamado Tanjir Kids Anime Kimono Halloween Green Cloak", +"Fringe Vest for Women Faux Suede Open Front Cardigan Sleeveless Tassels Fringed Vest Cardigan Hippie Jacket", +"Smart Health Ring 2.0 for Women Men", +"Fast Forward Kid s Licensed 15 Backpack With Lunch Box Combo Set (Hello Kitty)", +"Handmade Authentic Katana - 41-inch Full Tang Sharp Blade", +"Inateck Sling Bag X", +"EXLURA Women s Fashion Faux Wool Mini Skirt High Waisted Y2K Trendy Side Slit Tweed Plaid Skirts 2025 Fall Winter Outfits", +"LASLULU Womens Sexy Crossover Crop Top Long Sleeve Workout Tops Crewneck Athletic Yoga T-Shirts Fall Outfits", +"Wrangler Authentics Men s Classic Relaxed Fit Five Pocket Jean Short", +"ZeroBound Built in Bra Tank Tops for Women - High Neck Racerback Tank Tops", +"Nike Mens Air Max Alpha Trainer 6", +"MAZZERI Solid Gold Plated Sterling Silver Italian 1.3/1.6/2.2/2.8mm Diamond-Cut Braided Rope Chain Necklace for Men Women", +"Milumia Women s Polka Dots Twist Front Halter Top Dressy Casual Textured Peplum Going Out Tops", +"80s 90s Outfit for Women", +"EFAN Womens Sexy Sleeveless Double Lined Crop Tops Workout Cute Tight Racerback Tank Tops Summer Clothes Teen Girls 2025", +"Nike Mens Shorts Dri-Fit Flex Woven Shorts 7inch (US", +"top handle satchel Women", +"Kono Expandable Luggage 3 Piece Set Hardshell Lightweight 20in 24in 28in Carry On Suitcase with Spinner Wheels TSA Lock(Black & Brown)", +"Nations of The World | National Pride Flag Symbol Arms Tee Unisex T-Shirt for Men or Women", +"Jo & Bette Seamless Thongs for Women - High Waist Panties 6 Pack - Thong Underwear Pack Breathable No show Sports", +"eKids Disney Frozen 2 Bluetooth Headphones with Microphone", +"Arctix Kids Insulated Snow Bib Overalls", +"USA Flag Charlie Gift T-Shirt", +"CBKSUHBADE 15in×11in Anime One Piece Wanted Bounty Posters", +"Plus Size Underwear for Women XL-5XL Cotton High Waist Women Briefs Full Coverage Ladies Panties 4 Pack", +"Little Adventures Enchanted Rapunzel Dress-Up Costume for Adult Women", +"G Gradual Tennis Dress for Women Golf Outfits with Shorts and Pockets Sleeveless Active Exercise Athletic Dresses for Women", +"Pastoral Style Porch Goose Outfits", +"Vive Thigh High Compression Stockings for Women & Men - 15-20 mmHg Graduated Support Hose - Opaque Closed Toe Compression Tights - Stockings for Varicose Veins", +"Canada is Not for Sale Vintage Cotton Twill Cap", +"TomTiger Yoga Shorts for Women Tummy Control High Waist Biker Shorts Exercise Workout Butt Lifting Tights Women s Short Pants", +"4PCS GOD IS FIRST IM SECOND Bracelet: Faith Priority Bracelet - Engraved Cross 
Silicone Wristband for Daily Encouragement", +"Tahitian Black Pearl Pendant Necklace AAAA 18K White Gold Plated 925 Sterling Silver Black Pearl Jewelry Gift for Women Mother Wife Her for Anniversary Christmas Birthday", +"HOTOUCH Womens Short Sleeve Button Down Shirts Loose Fit V Neck Business Casual Blouses Summer Top with Pockets S-XXL", +"Men s Corduroy Short Sleeved Cargo Shirt Relaxed Fit Button Down Casual Wear Tops with Flap Pockets", +"Orange Blue Light Blocking Glasses for Better Sleep - 99.5% Premium Acetate Migraine Glasses for Women & Men", +"Disney Stitch Beach Towel for Kids Cotton Bath Towels with 2 Clothes Pins Travel Swimming Quick Dry Towel Beach Vacation Essentials", +"PGANDS Womens Crew Neck Solid/Color Block Sweatshirts Long Sleeve Casual Lightweight Pullover Tops", +"Premium Organic Whole Cloves 5.3 oz (150 grams)", +"habibee Bra for Women No Underwire Comfort Seamless Bras Push Up Wireless Bras Full Coverage Bralettes", +"Puma Mens Caven 2.0 Shoes", +"PRETTYGARDEN Women s Fall Button Down Shirts Dressy Casual Spring Long Puff Sleeve Eyelet Loose Fit Collared Blouse Top", +"TNNZEET 2 Pack Plus Size Biker Shorts for Women - 8 Black High Waisted Tummy Control Spandex Workout Shorts (XL-4XL)", +"Marvel Legends Series Captain America Shield", +"PAVOI 14K Gold AAA+ Handpicked White Freshwater Cultured Pearl Earrings Studs", +"Trendy Queen Long Skirts for Women Boho Maxi Skirt Winter Swing Tiered A-Line Elastic High Waist Dress with Pockets Fashion", +"Reebok Classic Leather Sneakers for Men", +"PRETTYGARDEN Women s Summer Bodycon Maxi Tube Dress Ribbed Strapless Side Slit Long Going Out Casual Elegant Party Dresses", +"Favorite Daughter Women s Classic Logo Baseball Cap", +"Reebok Men s Cotton Vital Fleece Sweatpant", +"COOFANDY Mens Hawaiian Shirt Short Sleeve Button Down Shirts Tropical Summer Beach Shirts Casual Floral Aloha Shirts", +"Columbia Mens Grander Marlin Iii Offshore Short", +"Satin One Shoulder Flower Girl Dress with Bow Wedding Princess Pageant Party Gown Puffy Formal First Communion", +"Nike Mens V5 RNR", +"Speed Cube 3x3", +"FOURSTEEDS Women s Cotton Zipper Front Multi-Pocket Twill Bermuda Women Cargo Shorts", +"Curly Hair Brush Defining", +"YQXCC Cooling Towels | 4 Pack 47x12 | Ice Cool for Neck | Microfiber Soft Breathable Chilly | for Yoga", +"Hot Wheels Toy Car Playset with Lights", +"Carhartt Men s Loose Fit Heavyweight Short-Sleeve Pocket Henley T-Shirt", +"Women s Mid-High Rise Ripped Denim Shorts Stretchy Distressed Jean Shorts with Pockets Folded Hem Casual Summer Jorts", +"Monster High Cleo De Nile Doll in Golden Blouse & Layered Skirt", +"Ariat Women’s Fatbaby Western Boot", +"UYYE Car Registration and Insurance Card Holder", +"365 by Whole Foods Market", +"Crystal Bracelet for Women Fashion 7 Inch Approximately Rainbow Sparkling Crystal Bracelet with Adjustable Elastic Cord", +"Samsung Galaxy Watch 7 (44mm) AI Smartwatch w/ 1.5 AMOLED", +"DOUKEN 4 Pair Sneaker Creases Protector", +"Elvis: The Legend music word search puzzle.: Great Country Music Word Scrambles about Elvis. Large print word puzzle for adults and rock music lovers. ... 
Great music gift for your friends or family.", +"Pinkfong Bebefinn Plush Toy - 12 (30cm) Stuffed Doll | Soft Cuddly Plush for Toddlers | Bebefinn Toy | Perfect Birthday", +"Thrusting Dildo Vibrator Sex Toys for Women", +"VANLOVEMAC Baseball Gifts for Boys 8-12 Baseball Stuff College Going Away Gifts Welcome Back to School Gifts Dorm Room Essentials for Guys Off to College", +"Hello Kitty and Friends - Cinnamoroll 12” Pink Monochrome Plush", +"BOBISUKA Pearl White Face Body Paint", +"OMKAGI 2 Piece Workout Sets for Women Halter Sports Bras Gym Sets Booty Leggings Outfits", +"Ivay Womens Scoop Neck Ribbed Knit Tank Top Sleeveless Cotton Wife Beater Camisole Shirts", +"SOLY HUX Women s Graphic Tee Shirts Novelty Funny Short Sleeve Summer Casual Tops", +"Wooden Taper Candle Holders: Wood Candlestick Holders Rustic Brown Farmhouse Fall Decor for Living Room Dinning Table Centerpiece Christmas Set of 2", +"PRETTYGARDEN Long Sleeve Shirts for Women 2025 Fall V Neck Waffle Basic Tee Dressy Casual Winter Blouses Knit Tunic Tops", +"Ray-Ban RB2140 Original Wayfarer Square Sunglasses", +"Lee Womens Ultra Lux Comfort with Flex-to-go Utility Skimmer Capri Pant", +"3D Pedometer for Walking", +"HiiFeuer Medieval Faux Leather Chest Armor", +"Pet Deadly Dog Costume", +"Western Chief Kids Freestyle Neoprene Outdoor Boot", +"SKECHERS Women s Ultra Flex 3.0-Brilliant Path Hands Free Slip-INS Sneaker", +"LUOBO Keychain Accessory Decor Keychain Decoration backpacks Bag Pendant", +"10inch Teddy Bear Stuffed Animal", +"Halloweentown University T-Shirt for Women Fall Pumpkin Shirts Funny Halloween Thanksgiving Gift Tops", +"Women s Sexy American Flag Crop Tank 4th of July Patriotic Sleeveless Tee Tops", +"Gillette Fusion5 ProGlide Men s Razor Blade Refills", +"Poppy Playtime - Mommy Long Legs Plush (14 Medium Plush", +"Women’s Heated Vest with 12V 20000mAh Battery – Cropped Stand Collar Lightweight Insulated Winter Vest.", +"toolant Winter Work Gloves for Men", +"192Pcs Halloween Favors Stationery Gift Set", +"20 Pcs Ultra Thin Tattoo Cover up Patch Waterproof Tattoo Cover up Tape Sweatproof Tattoos Covers Patches Cuttable Invisible Non-Woven Fabric Patches for Tattoos Scar Birthmark 4.72×3.35In(Light Skin)", +"Popcorns Maker", +"Paladone Kuromi GloBuddies Night Light", +"Creativity for Kids Sensory Minis Dinosaur Kit | Cloud Clay Sensory Toy for Toddlers | Squish", +"Mouse Ears Headband Fully Sewn Sturdy Headbands 2-Pcs, 4.6-Inch Sequin Big Ears 3D Silk Satin Bowknot Suitable for Women and Girls Theme Role Play Costume Accessories Party", +"Tanluhu Sweatbands Sport Headbands for Men & Women", +"Pilates Reformer Machine", +"Fossil Fenmore Analog Men Watch", +"Stray Kids Official Lightstick Ver 2", +"Zima Dental Pod PRO: New Ultrasonic Retainer Cleaner Machine. 
Market-Leading", +"2300pcs Polymer Clay Beads Bracelet Making Kit", +"AI ACCESSORY INNOVATIONS Bluey 4 Piece Backpack Set for Pre School Girls & Boys", +"MIRITY Women s High Waist Cotton Underwear - Soft Full Coverage Briefs with Double-Layer Waistedband", +"Plus Size Summer Dresses - Floral Beach Wedding Guest Semi Formal Tiered Flowy Long Sundress", +"AUTOMET Womens Tops Summer Sweater Long Tunic Dressy Casual Blouses Business Cute Trendy Short Sleeve Shirt 2025", +"Black Sabbath Sketch Band T-Shirt", +"Loomie Upgraded 6 Drawer White Dresser for Bedroom", +"Michael Kors Womens Zuma Trainer", +"Chunky Silver Bohemian Flower Bracelet For Wemen Men", +"Classic Black Western Felt Roll Up Brim Cowboy and Cowgirl Hat for Women and Men - Decoration with Western Belt Bukle", +"Jellycat Little Pig Bag Charm", +"LARNMERN Steel Toe Work Boots Men", +"3PCS Gold Hair Ties", +"Red Kap Men s Snap Front Cotton Coverall", +"Citizen Quartz Mens Watch", +"ATHMILE Long Sleeve Shirts for Women Tunic Fall Tops Loose Fit Dressy Crew Neck Basic Sweaters 2025", +"Narecte Summer Maxi Dresses for Women Back Strap Beach Dress Women s Casual Dress Long Flowy Dresses for Vacation", +"LIDHAY Cowboy Hat for Women and Men Western Cowgirl Hats Suede Cowboy Hat for Rodeo", +"BIC Classic Maxi Pocket Lighter", +"A + S Luxxe Diaper Bag Tote – Stylish", +"100pack Name Badge Holders Name Tag Holder Clear Plastic Badge Holder ID Holders for Lanyard (100Pcs Vertical)", +"MOOSEA Christmas Gifts for Women Wife - Love Knot Moissanite Necklace 1-3ct D Color VVS1 Clarity Moissanite 925 Sterling Silver Necklace Anniversary Birthday Gifts for Women Wife Mom Girlfriend", +"Solid Wood Retangle End Table with Drawer and Storage Shelf", +"Madden Girl womens Beella Heeled SandalHeeled Sandal", +"Ekouaer 2 Pack Womens Pajama Sets Short Sleeve Sleepwear Soft Crew Neck Pj Shorts Set Printed Loungewear Set S-XXL", +"NPQQUAN Original Classic Low Profile Baseball Cap Golf Dad Hat Adjustable Cotton Hats Men Women Unconstructed Plain Cap", +"YEOREO Women Workout Biker Shorts Impact 4.5 No Front Seam Hidden Scrunch Lifting Seamless Yoga Gym Shorts", +"Merino Wool Underwear Men by Thermowave - Sport & Everyday Men s Merino Wool Boxer Brief - 150 GSM Stretchy & Soft", +"COACH Women s Leah Platform Loafers", +"Doodle Me Happy Kids Thank You Cards - 25 Cards With Envelopes - Cute", +"Spring Summer Women Pleated Casual Denim V Neck Ruffle Sleeve Dress Light Blue XL", +"Disney Hooded Matching Family Cosplay T-Shirt Infant to Adult Sizes (12 Months - 2XL)", +"Leather CPR Cleaner & Conditioner 18oz - Cleans", +"Baseball Shirts Women Baseball Mom Tshirt Baseball Heart Graphic Tee Game Day Gifts Funny Short Sleeve Tops", +"4 Pack Cooling Towels", +"ZEEPORTE Mask Fin Snorkel Set", +"60 Pcs Bride Tribe Bachelorette Party Favors Bulk Friendship Bridesmaid Gifts 12 Set Friendship Bracelets Heart Sunglasses Satin Scrunchie for Engagement Bridal Shower Wedding Favor", +"AUSELILY Summer Dress Sundress Beach Cover up Swing Dresses", +"Loungefly Disney Minnie Mouse Crossbody Satchel Handbag", +"Tactical Gym Bag for Men,50L Large 3 in 1 Sports Duffle Bag with Shoes Compartment for Travel", +"YETI Rambler 42 oz Tumbler with Handle and Straw Lid", +"Samsonite Classic Leather Slim Backpack", +"Vive Thigh High Compression Stockings for Women & Men - 15-20 mmHg Graduated Support Hose - Opaque Closed Toe Compression Tights - Stockings for Varicose Veins", +"Canada is Not for Sale Vintage Cotton Twill Cap", +"TomTiger Yoga Shorts for Women Tummy Control High Waist Biker 
Shorts Exercise Workout Butt Lifting Tights Women s Short Pants", +"4PCS GOD IS FIRST IM SECOND Bracelet: Faith Priority Bracelet - Engraved Cross Silicone Wristband for Daily Encouragement", +"Tahitian Black Pearl Pendant Necklace AAAA 18K White Gold Plated 925 Sterling Silver Black Pearl Jewelry Gift for Women Mother Wife Her for Anniversary Christmas Birthday", +"HOTOUCH Womens Short Sleeve Button Down Shirts Loose Fit V Neck Business Casual Blouses Summer Top with Pockets S-XXL", +"Men s Corduroy Short Sleeved Cargo Shirt Relaxed Fit Button Down Casual Wear Tops with Flap Pockets", +"Orange Blue Light Blocking Glasses for Better Sleep - 99.5% Premium Acetate Migraine Glasses for Women & Men", +"Disney Stitch Beach Towel for Kids Cotton Bath Towels with 2 Clothes Pins Travel Swimming Quick Dry Towel Beach Vacation Essentials", +"PGANDS Womens Crew Neck Solid/Color Block Sweatshirts Long Sleeve Casual Lightweight Pullover Tops", +"Premium Organic Whole Cloves 5.3 oz (150 grams)", +"habibee Bra for Women No Underwire Comfort Seamless Bras Push Up Wireless Bras Full Coverage Bralettes", +"Puma Mens Caven 2.0 Shoes", +"PRETTYGARDEN Women s Fall Button Down Shirts Dressy Casual Spring Long Puff Sleeve Eyelet Loose Fit Collared Blouse Top", +"TNNZEET 2 Pack Plus Size Biker Shorts for Women - 8 Black High Waisted Tummy Control Spandex Workout Shorts (XL-4XL)", +"Marvel Legends Series Captain America Shield", +"PAVOI 14K Gold AAA+ Handpicked White Freshwater Cultured Pearl Earrings Studs", +"Trendy Queen Long Skirts for Women Boho Maxi Skirt Winter Swing Tiered A-Line Elastic High Waist Dress with Pockets Fashion", +"Reebok Classic Leather Sneakers for Men", +"PRETTYGARDEN Women s Summer Bodycon Maxi Tube Dress Ribbed Strapless Side Slit Long Going Out Casual Elegant Party Dresses", +"Favorite Daughter Women s Classic Logo Baseball Cap", +"Reebok Men s Cotton Vital Fleece Sweatpant", +"COOFANDY Mens Hawaiian Shirt Short Sleeve Button Down Shirts Tropical Summer Beach Shirts Casual Floral Aloha Shirts", +"Columbia Mens Grander Marlin Iii Offshore Short", +"Satin One Shoulder Flower Girl Dress with Bow Wedding Princess Pageant Party Gown Puffy Formal First Communion", +"Nike Mens V5 RNR", +"Speed Cube 3x3", +"FOURSTEEDS Women s Cotton Zipper Front Multi-Pocket Twill Bermuda Women Cargo Shorts", +"Curly Hair Brush Defining", +"YQXCC Cooling Towels | 4 Pack 47x12 | Ice Cool for Neck | Microfiber Soft Breathable Chilly | for Yoga", +"Hot Wheels Toy Car Playset with Lights", +"Carhartt Men s Loose Fit Heavyweight Short-Sleeve Pocket Henley T-Shirt", +"Women s Mid-High Rise Ripped Denim Shorts Stretchy Distressed Jean Shorts with Pockets Folded Hem Casual Summer Jorts", +"Monster High Cleo De Nile Doll in Golden Blouse & Layered Skirt", +"Ariat Women’s Fatbaby Western Boot", +"UYYE Car Registration and Insurance Card Holder", +"365 by Whole Foods Market", +"Crystal Bracelet for Women Fashion 7 Inch Approximately Rainbow Sparkling Crystal Bracelet with Adjustable Elastic Cord", +"Samsung Galaxy Watch 7 (44mm) AI Smartwatch w/ 1.5 AMOLED", +"DOUKEN 4 Pair Sneaker Creases Protector", +"Elvis: The Legend music word search puzzle.: Great Country Music Word Scrambles about Elvis. Large print word puzzle for adults and rock music lovers. ... 
Great music gift for your friends or family.", +"Pinkfong Bebefinn Plush Toy - 12 (30cm) Stuffed Doll | Soft Cuddly Plush for Toddlers | Bebefinn Toy | Perfect Birthday", +"Thrusting Dildo Vibrator Sex Toys for Women", +"VANLOVEMAC Baseball Gifts for Boys 8-12 Baseball Stuff College Going Away Gifts Welcome Back to School Gifts Dorm Room Essentials for Guys Off to College", +"Hello Kitty and Friends - Cinnamoroll 12” Pink Monochrome Plush", +"BOBISUKA Pearl White Face Body Paint", +"OMKAGI 2 Piece Workout Sets for Women Halter Sports Bras Gym Sets Booty Leggings Outfits", +"Ivay Womens Scoop Neck Ribbed Knit Tank Top Sleeveless Cotton Wife Beater Camisole Shirts", +"SOLY HUX Women s Graphic Tee Shirts Novelty Funny Short Sleeve Summer Casual Tops", +"Wooden Taper Candle Holders: Wood Candlestick Holders Rustic Brown Farmhouse Fall Decor for Living Room Dinning Table Centerpiece Christmas Set of 2", +"PRETTYGARDEN Long Sleeve Shirts for Women 2025 Fall V Neck Waffle Basic Tee Dressy Casual Winter Blouses Knit Tunic Tops", +"Ray-Ban RB2140 Original Wayfarer Square Sunglasses", +"Lee Womens Ultra Lux Comfort with Flex-to-go Utility Skimmer Capri Pant", +"3D Pedometer for Walking", +"HiiFeuer Medieval Faux Leather Chest Armor", +"Pet Deadly Dog Costume", +"Western Chief Kids Freestyle Neoprene Outdoor Boot", +"SKECHERS Women s Ultra Flex 3.0-Brilliant Path Hands Free Slip-INS Sneaker", +"LUOBO Keychain Accessory Decor Keychain Decoration backpacks Bag Pendant", +"10inch Teddy Bear Stuffed Animal", +"Halloweentown University T-Shirt for Women Fall Pumpkin Shirts Funny Halloween Thanksgiving Gift Tops", +"Women s Sexy American Flag Crop Tank 4th of July Patriotic Sleeveless Tee Tops", +"Gillette Fusion5 ProGlide Men s Razor Blade Refills", +"Poppy Playtime - Mommy Long Legs Plush (14 Medium Plush", +"Women’s Heated Vest with 12V 20000mAh Battery – Cropped Stand Collar Lightweight Insulated Winter Vest.", +"toolant Winter Work Gloves for Men", +"192Pcs Halloween Favors Stationery Gift Set", +"20 Pcs Ultra Thin Tattoo Cover up Patch Waterproof Tattoo Cover up Tape Sweatproof Tattoos Covers Patches Cuttable Invisible Non-Woven Fabric Patches for Tattoos Scar Birthmark 4.72×3.35In(Light Skin)", +"Popcorns Maker", +"Paladone Kuromi GloBuddies Night Light", +"Creativity for Kids Sensory Minis Dinosaur Kit | Cloud Clay Sensory Toy for Toddlers | Squish", +"Mouse Ears Headband Fully Sewn Sturdy Headbands 2-Pcs, 4.6-Inch Sequin Big Ears 3D Silk Satin Bowknot Suitable for Women and Girls Theme Role Play Costume Accessories Party", +"Tanluhu Sweatbands Sport Headbands for Men & Women", +"Pilates Reformer Machine", +"Fossil Fenmore Analog Men Watch", +"Stray Kids Official Lightstick Ver 2", +"Zima Dental Pod PRO: New Ultrasonic Retainer Cleaner Machine. 
Market-Leading", +"2300pcs Polymer Clay Beads Bracelet Making Kit", +"AI ACCESSORY INNOVATIONS Bluey 4 Piece Backpack Set for Pre School Girls & Boys", +"MIRITY Women s High Waist Cotton Underwear - Soft Full Coverage Briefs with Double-Layer Waistedband", +"Plus Size Summer Dresses - Floral Beach Wedding Guest Semi Formal Tiered Flowy Long Sundress", +"AUTOMET Womens Tops Summer Sweater Long Tunic Dressy Casual Blouses Business Cute Trendy Short Sleeve Shirt 2025", +"Black Sabbath Sketch Band T-Shirt", +"Loomie Upgraded 6 Drawer White Dresser for Bedroom", +"Michael Kors Womens Zuma Trainer", +"Chunky Silver Bohemian Flower Bracelet For Wemen Men", +"Classic Black Western Felt Roll Up Brim Cowboy and Cowgirl Hat for Women and Men - Decoration with Western Belt Bukle", +"Jellycat Little Pig Bag Charm", +"LARNMERN Steel Toe Work Boots Men", +"3PCS Gold Hair Ties", +"Red Kap Men s Snap Front Cotton Coverall", +"Citizen Quartz Mens Watch", +"ATHMILE Long Sleeve Shirts for Women Tunic Fall Tops Loose Fit Dressy Crew Neck Basic Sweaters 2025", +"Narecte Summer Maxi Dresses for Women Back Strap Beach Dress Women s Casual Dress Long Flowy Dresses for Vacation", +"LIDHAY Cowboy Hat for Women and Men Western Cowgirl Hats Suede Cowboy Hat for Rodeo", +"BIC Classic Maxi Pocket Lighter", +"A + S Luxxe Diaper Bag Tote – Stylish", +"100pack Name Badge Holders Name Tag Holder Clear Plastic Badge Holder ID Holders for Lanyard (100Pcs Vertical)", +"MOOSEA Christmas Gifts for Women Wife - Love Knot Moissanite Necklace 1-3ct D Color VVS1 Clarity Moissanite 925 Sterling Silver Necklace Anniversary Birthday Gifts for Women Wife Mom Girlfriend", +"Solid Wood Retangle End Table with Drawer and Storage Shelf", +"Madden Girl womens Beella Heeled SandalHeeled Sandal", +"Ekouaer 2 Pack Womens Pajama Sets Short Sleeve Sleepwear Soft Crew Neck Pj Shorts Set Printed Loungewear Set S-XXL", +"NPQQUAN Original Classic Low Profile Baseball Cap Golf Dad Hat Adjustable Cotton Hats Men Women Unconstructed Plain Cap", +"YEOREO Women Workout Biker Shorts Impact 4.5 No Front Seam Hidden Scrunch Lifting Seamless Yoga Gym Shorts", +"Merino Wool Underwear Men by Thermowave - Sport & Everyday Men s Merino Wool Boxer Brief - 150 GSM Stretchy & Soft", +"COACH Women s Leah Platform Loafers", +"Doodle Me Happy Kids Thank You Cards - 25 Cards With Envelopes - Cute", +"Spring Summer Women Pleated Casual Denim V Neck Ruffle Sleeve Dress Light Blue XL", +"Disney Hooded Matching Family Cosplay T-Shirt Infant to Adult Sizes (12 Months - 2XL)", +"Leather CPR Cleaner & Conditioner 18oz - Cleans", +"Baseball Shirts Women Baseball Mom Tshirt Baseball Heart Graphic Tee Game Day Gifts Funny Short Sleeve Tops", +"4 Pack Cooling Towels", +"ZEEPORTE Mask Fin Snorkel Set", +"60 Pcs Bride Tribe Bachelorette Party Favors Bulk Friendship Bridesmaid Gifts 12 Set Friendship Bracelets Heart Sunglasses Satin Scrunchie for Engagement Bridal Shower Wedding Favor", +"AUSELILY Summer Dress Sundress Beach Cover up Swing Dresses", +"Loungefly Disney Minnie Mouse Crossbody Satchel Handbag", +"Tactical Gym Bag for Men,50L Large 3 in 1 Sports Duffle Bag with Shoes Compartment for Travel", +"YETI Rambler 42 oz Tumbler with Handle and Straw Lid", +"Samsonite Classic Leather Slim Backpack", +"Fabletics Men s Only Short", +"3pcs Heart Badge Reel with Alligator Clip Cute Retractable Badge Holder Acrylic Nurse Badge Clip for Office Workers", +"Ortho Balance Hiking Shoes for Men Women", +"GOLDENMATE 1000VA/600W Lithium UPS Battery Backup and Surge Protector", +"Gelante Solid 
Color 100% Cotton Bucket Hat for Women and Men Packable Travel Summer Beach Hat", +"Sonic The Hedgehog 3 Movie Action Figures 2.5-Inch Movie Collector Toy Figure Multi-Pack Includes Sonic The Hedgehog Knuckles Shadow Buzz Bomber & Drone- Officially Licensed Toys", +"61 Pcs Nacho Libre Stickers Comedy Movie Graffiti Waterproof Vinyl for Adults for Birthday Party Supplies Decoration Favors for Water Bottles Laptop Suitcase Scrapbooking Choice", +"Neck Lift Tape", +"925 Sterling Silver Earrings for Womens Sparkly Colorful Full Diamond Simple Stylish Elegant Hypoallergenic Jewelry", +"Pink Ceramic Bow Vase for Flowers", +"Winter Coats For Men Winter Jackets Water Resistant Warm Thicken Parka Puffer Coat Long Down Jacket", +"Alarm Clocks for Bedrooms", +"KINURI Running Belt for Men & Women – Fits All Smartphones – Waterproof Waist Pack with Adjustable Strap – Ideal for Jogging", +"DREAM PAIRS Heels for Women Flip Flops Kitten Low Heels Open Square Toe Thong Heeled Sandals", +"Amazon Basics All Purpose Washable School Craft Liquid Glue for Making Slime", +"Inflatable Costume Adult Frog Full Body Deluxe Funny Air Blow Up Costume for Men Women Halloween", +"Mens Golf Pants Stretch Casual Dress Pants Elastic Drawstring Slacks for Men Lightweight Trousers with 5 Pockets", +"Lip Smacker Hello Kitty Lip Balm", +"Brown Sugar Keeper 3D – Terracotta Clay Bear Softener", +"MEETSUN Polarized Sunglasses for Women Men Trendy Classic Retro Designer Style", +"Corset Top Bustier Lingerie for Women Zipper Front Flower Sexy Burlesque Vintage", +"Pro Club Men s Heavyweight Mesh Basketball Shorts", +"Nike Tech Men s Full-Zip Windrunner Hoodie (HV0949-237", +"Ear Piercing Kit", +"Timberland Men s 6 Premium Boot", +"Nike Air Rift", +"Portable Hookah Set for Travel - Premium Handheld Glass Aluminum Mini Hookah Real Metal Accessories", +"Clear Backpack for Boys", +"Women’s Knee High Boots Round Toe Chunky Heel Faux Leather Tall Riding Boots with Side Zipper", +"Golf Grip Trainer & Connection Band 2Set", +"Monster High Self Scare Day Cleo De Nile Doll Play Set", +"Fortnite eGift Card - Powered by the Epic Games Store", +"Mesh Beach Bags", +"Crowye Anime Cosplay Costume for Halloween Princess Costume Accessories Anime White Cosplay Wig Egypt Arm Cuff Bracelet Gold Earrings Greek Goddess Set for Halloween Dress up Princess", +"Premium Women s Leather Tote Handbag - Bag for Everyday Use", +"Ekouaer Maternity Nursing Gown and Robe Set Labor Delivery Nursing Nightgowns for Breastfeeding Pregnancy Clothes", +"Superband Mermaid Tails for Swimming for Women and Adults Without Monofin", +"Pink Queen Women s 2025 Casual Pullover Sweaters Sexy V Neck Long Sleeve Twist Knot Cropped Knit Sweater Tops" + ], + "top_n":386, + "normalize": true + }' + +end=$(date +%s%N) # 结束时间,纳秒级 +duration=$(( (end - start) / 1000000 )) # 转换为毫秒 +echo "Command took $duration milliseconds." 
+
+
diff --git a/benchmarks/reranker/manual/curl2.sh b/benchmarks/reranker/manual/curl2.sh
new file mode 100644
index 0000000..f5f894a
--- /dev/null
+++ b/benchmarks/reranker/manual/curl2.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+start=$(date +%s%N) # start time, in nanoseconds
+
+# Convert each line of titles.400 into a JSON array.
+documents_json=$(jq -R -s 'split("\n") | map(select(length > 0))' /data/saas-search/tests/data/titles.400)
+#echo $documents_json
+#exit
+
+time curl -X POST "http://10.200.16.14:9997/v1/rerank" \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d "$(jq -n \
+    --arg model "Qwen3-Reranker-0.6B" \
+    --arg query "健身女生T恤短袖" \
+    --argjson documents "$documents_json" \
+    '{
+      model: $model,
+      query: $query,
+      documents: $documents
+    }')" \
+  -i
+
+end=$(date +%s%N) # end time, in nanoseconds
+duration=$(( (end - start) / 1000000 )) # convert to milliseconds
+echo "Command took $duration milliseconds."
diff --git a/benchmarks/reranker/manual/rerank_performance_compare.sh b/benchmarks/reranker/manual/rerank_performance_compare.sh
new file mode 100644
index 0000000..32539d7
--- /dev/null
+++ b/benchmarks/reranker/manual/rerank_performance_compare.sh
@@ -0,0 +1,117 @@
+#!/bin/bash
+
+set -u
+
+FILE="/data/saas-search/tests/data/titles.1.8w"
+ROUNDS=10
+SAMPLE_SIZE=400
+
+if [ ! -f "$FILE" ]; then
+    echo "File not found: $FILE"
+    exit 1
+fi
+
+# Sample 400 random lines and convert them into a JSON array.
+generate_docs_json() {
+    shuf -n "$SAMPLE_SIZE" "$FILE" | jq -R -s 'split("\n")[:-1]'
+}
+
+# Aggregate statistics.
+summarize_times() {
+    local name="$1"
+    shift
+    local arr=("$@")
+    local total=0
+    local min=${arr[0]}
+    local max=${arr[0]}
+    local count=${#arr[@]}
+
+    for t in "${arr[@]}"; do
+        total=$((total + t))
+        if [ "$t" -lt "$min" ]; then
+            min=$t
+        fi
+        if [ "$t" -gt "$max" ]; then
+            max=$t
+        fi
+    done
+
+    local avg=$((total / count))
+
+    echo "========================================"
+    echo "$name summary"
+    echo "Rounds: $count"
+    echo "Total: ${total} ms"
+    echo "Average: ${avg} ms"
+    echo "Min: ${min} ms"
+    echo "Max: ${max} ms"
+    echo "========================================"
+}
+
+echo "Starting tests..."
+echo "Data file: $FILE"
+echo "Random sample per round: $SAMPLE_SIZE lines"
+echo "Rounds per test target: $ROUNDS"
+echo
+
+times_obj1=()
+times_obj2=()
+
+for ((i=1; i<=ROUNDS; i++)); do
+    echo "---------- round $i ----------"
+
+    # Generate a fresh random sample of 400 lines each round.
+    DOCS_JSON=$(generate_docs_json)
+
+    # Test target 1
+    PAYLOAD1=$(jq -n \
+        --arg query "健身女生T恤短袖" \
+        --argjson docs "$DOCS_JSON" \
+        --argjson top_n 386 \
+        --argjson normalize true \
+        '{
+            query: $query,
+            docs: $docs,
+            top_n: $top_n,
+            normalize: $normalize
+        }')
+
+    start1=$(date +%s%N)
+    curl -s -o /dev/null -X POST "http://localhost:6007/rerank" \
+        -H "Content-Type: application/json" \
+        -d "$PAYLOAD1"
+    end1=$(date +%s%N)
+    duration1=$(( (end1 - start1) / 1000000 ))
+    times_obj1+=("$duration1")
+    echo "Test target 1, round $i: ${duration1} ms"
+
+    # Test target 2 (same query as target 1, for a like-for-like comparison)
+    PAYLOAD2=$(jq -n \
+        --arg model "Qwen3-Reranker-0.6B" \
+        --arg query "健身女生T恤短袖" \
+        --argjson documents "$DOCS_JSON" \
+        '{
+            model: $model,
+            query: $query,
+            documents: $documents
+        }')
+
+    start2=$(date +%s%N)
+    curl -s -o /dev/null -X POST "http://10.200.16.14:9997/v1/rerank" \
+        -H "accept: application/json" \
+        -H "Content-Type: application/json" \
+        -d "$PAYLOAD2"
+    end2=$(date +%s%N)
+    duration2=$(( (end2 - start2) / 1000000 ))
+    times_obj2+=("$duration2")
+    echo "Test target 2, round $i: ${duration2} ms"
+
+    echo
+done
+
+echo
+echo "Tests complete, aggregating..."
+echo + +summarize_times "测试对象1" "${times_obj1[@]}" +summarize_times "测试对象2" "${times_obj2[@]}" diff --git a/benchmarks/reranker/patch_rerank_vllm_benchmark_config.py b/benchmarks/reranker/patch_rerank_vllm_benchmark_config.py new file mode 100755 index 0000000..97af9cb --- /dev/null +++ b/benchmarks/reranker/patch_rerank_vllm_benchmark_config.py @@ -0,0 +1,100 @@ +#!/usr/bin/env python3 +""" +Surgically patch config/config.yaml: + services.rerank.backend + services.rerank.backends.qwen3_vllm.instruction_format + services.rerank.backends.qwen3_vllm_score.instruction_format + +Preserves comments and unrelated lines. Used for benchmark matrix runs. +""" + +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path + + +def _with_stripped_body(line: str) -> tuple[str, str]: + """Return (body without end newline, newline suffix including '' if none).""" + if line.endswith("\r\n"): + return line[:-2], "\r\n" + if line.endswith("\n"): + return line[:-1], "\n" + return line, "" + + +def _patch_backend_in_rerank_block(lines: list[str], backend: str) -> None: + in_rerank = False + for i, line in enumerate(lines): + if line.startswith(" rerank:"): + in_rerank = True + continue + if in_rerank: + if line.startswith(" ") and not line.startswith(" ") and line.strip(): + in_rerank = False + continue + body, nl = _with_stripped_body(line) + m = re.match(r'^(\s*backend:\s*")[^"]+(".*)$', body) + if m: + lines[i] = f'{m.group(1)}{backend}{m.group(2)}{nl}' + return + raise RuntimeError("services.rerank.backend line not found") + + +def _patch_instruction_format_under_backend( + lines: list[str], section: str, fmt: str +) -> None: + """section is 'qwen3_vllm' or 'qwen3_vllm_score' (first line is ' qwen3_vllm:').""" + header = f" {section}:" + start = None + for i, line in enumerate(lines): + if line.rstrip() == header: + start = i + break + if start is None: + raise RuntimeError(f"section {section!r} not found") + + for j in range(start + 1, len(lines)): + line = lines[j] + body, nl = _with_stripped_body(line) + if re.match(r"^ [a-zA-Z0-9_]+:\s*$", body): + break + m = re.match(r"^(\s*instruction_format:\s*)\S+", body) + if m: + lines[j] = f"{m.group(1)}{fmt}{nl}" + return + raise RuntimeError(f"instruction_format not found under {section!r}") + + +def main() -> int: + p = argparse.ArgumentParser() + p.add_argument( + "--config", + type=Path, + default=Path(__file__).resolve().parents[2] / "config" / "config.yaml", + ) + p.add_argument("--backend", choices=("qwen3_vllm", "qwen3_vllm_score"), required=True) + p.add_argument( + "--instruction-format", + dest="instruction_format", + choices=("compact", "standard"), + required=True, + ) + args = p.parse_args() + text = args.config.read_text(encoding="utf-8") + lines = text.splitlines(keepends=True) + if not lines: + print("empty config", file=sys.stderr) + return 2 + _patch_backend_in_rerank_block(lines, args.backend) + _patch_instruction_format_under_backend(lines, "qwen3_vllm", args.instruction_format) + _patch_instruction_format_under_backend(lines, "qwen3_vllm_score", args.instruction_format) + args.config.write_text("".join(lines), encoding="utf-8") + print(f"patched {args.config}: backend={args.backend} instruction_format={args.instruction_format} (both vLLM blocks)") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh b/benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh new file mode 100755 index 
0000000..d0964e5
--- /dev/null
+++ b/benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh
@@ -0,0 +1,89 @@
+#!/usr/bin/env bash
+# Patch config, restart reranker, wait for /health, run benchmark_reranker_random_titles.py.
+# Requires: curl. PyYAML is not needed (the config patch step is standalone Python).
+
+set -euo pipefail
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"  # repo root; this script lives in benchmarks/reranker/
+cd "$ROOT"
+
+PYTHON="${ROOT}/.venv/bin/python"
+DAY="$(date +%F)"
+OUT_DIR="${ROOT}/perf_reports/reranker_vllm_instruction/${DAY}"
+mkdir -p "$OUT_DIR"
+
+health_ok() {
+    local want_backend="$1"
+    local want_fmt="$2"
+    local body
+    if ! body="$(curl -sS --connect-timeout 2 --max-time 5 "http://127.0.0.1:6007/health" 2>/dev/null)"; then
+        return 1
+    fi
+    echo "$body" | "$PYTHON" -c "
+import json, sys
+want_b, want_f = sys.argv[1], sys.argv[2]
+d = json.load(sys.stdin)
+if d.get('status') != 'ok' or not d.get('model_loaded'):
+    sys.exit(1)
+if d.get('backend') != want_b:
+    sys.exit(1)
+if d.get('instruction_format') != want_f:
+    sys.exit(1)
+sys.exit(0)
+" "$want_backend" "$want_fmt"
+}
+
+wait_health() {
+    local want_backend="$1"
+    local want_fmt="$2"
+    local i
+    for i in $(seq 1 180); do
+        if health_ok "$want_backend" "$want_fmt"; then
+            curl -sS "http://127.0.0.1:6007/health" | "$PYTHON" -m json.tool
+            return 0
+        fi
+        echo "[wait] ${i}/180 backend=${want_backend} instruction_format=${want_fmt} ..."
+        sleep 3
+    done
+    echo "[error] health did not match in time" >&2
+    return 1
+}
+
+run_one() {
+    local backend="$1"
+    local fmt="$2"
+    local tag="${backend}|${fmt}"
+    local jf="${OUT_DIR}/${backend}_${fmt}.json"
+
+    echo "========== ${tag} =========="
+    "$PYTHON" "${ROOT}/benchmarks/reranker/patch_rerank_vllm_benchmark_config.py" \
+        --backend "$backend" --instruction-format "$fmt"
+
+    "${ROOT}/restart.sh" reranker
+    wait_health "$backend" "$fmt"
+
+    if ! "$PYTHON" "${ROOT}/benchmarks/reranker/benchmark_reranker_random_titles.py" \
+        100,200,400,600,800,1000 \
+        --repeat 5 \
+        --seed 42 \
+        --quiet-runs \
+        --timeout 360 \
+        --tag "$tag" \
+        --json-summary-out "$jf"
+    then
+        echo "[warn] benchmark exited non-zero for ${tag} (see ${jf} failed flag / partial runs)" >&2
+    fi
+
+    echo "artifact: $jf"
+}
+
+run_one qwen3_vllm compact
+run_one qwen3_vllm standard
+run_one qwen3_vllm_score compact
+run_one qwen3_vllm_score standard
+
+# Restore repo-default-style rerank settings (score + compact).
+"$PYTHON" "${ROOT}/benchmarks/reranker/patch_rerank_vllm_benchmark_config.py" \
+    --backend qwen3_vllm_score --instruction-format compact
+"${ROOT}/restart.sh" reranker
+wait_health qwen3_vllm_score compact
+echo "Restored config: qwen3_vllm_score + compact. Done. Artifacts under ${OUT_DIR}"
diff --git a/benchmarks/reranker/smoke_qwen3_vllm_score_backend.py b/benchmarks/reranker/smoke_qwen3_vllm_score_backend.py
new file mode 100644
index 0000000..4631444
--- /dev/null
+++ b/benchmarks/reranker/smoke_qwen3_vllm_score_backend.py
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+"""
+Smoke test: load Qwen3VLLMScoreRerankerBackend (must run as a file, not stdin — vLLM spawn).
+
+Usage (from repo root, score venv):
+    PYTHONPATH=. ./.venv-reranker-score/bin/python benchmarks/reranker/smoke_qwen3_vllm_score_backend.py
+
+Same as production: vLLM child processes need the venv's ``bin`` on PATH (for pip's ``ninja`` when
+vLLM auto-selects FLASHINFER on T4/Turing). 
``start_reranker.sh`` exports that; this script prepends +``sysconfig.get_path("scripts")`` (the stdlib location for this environment's console scripts, +independent of ``python`` symlink targets). +""" + +from __future__ import annotations + +import argparse +import logging +import os +import sys +import sysconfig +from pathlib import Path + +# Repo root on sys.path when run from benchmarks/reranker/. +_ROOT = Path(__file__).resolve().parents[2] +if str(_ROOT) not in sys.path: + sys.path.insert(0, str(_ROOT)) + +logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s") + +import torch + +from reranker.backends.qwen3_vllm_score import ( + Qwen3VLLMScoreRerankerBackend, +) + + +def main() -> int: + p = argparse.ArgumentParser() + p.add_argument( + "--gpu-memory-utilization", + type=float, + default=0.12, + help="vLLM gpu_memory_utilization (default 0.12 for tight GPUs)", + ) + args = p.parse_args() + + scripts = sysconfig.get_path("scripts") + if scripts: + os.environ["PATH"] = scripts + os.pathsep + os.environ.get("PATH", "") + + if not torch.cuda.is_available(): + print("SKIP: CUDA not available") + return 0 + + cfg = { + "model_name": "Qwen/Qwen3-Reranker-0.6B", + "max_model_len": 160, + "tensor_parallel_size": 1, + "gpu_memory_utilization": args.gpu_memory_utilization, + "dtype": "float16", + "enable_prefix_caching": False, + "enforce_eager": True, + "infer_batch_size": 4, + "instruction_format": "compact", + } + print("Loading backend ...") + backend = Qwen3VLLMScoreRerankerBackend(cfg) + scores, meta = backend.score_with_meta("smoke query", ["title one", "title two"], normalize=False) + print("scores:", scores) + print("meta:", {k: meta[k] for k in ("backend", "infer_batch_size", "instruction_format") if k in meta}) + print("OK") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/benchmarks/translation/benchmark_nllb_t4_tuning.py b/benchmarks/translation/benchmark_nllb_t4_tuning.py new file mode 100644 index 0000000..3b22ee0 --- /dev/null +++ b/benchmarks/translation/benchmark_nllb_t4_tuning.py @@ -0,0 +1,318 @@ +#!/usr/bin/env python3 +"""Focused NLLB T4 tuning benchmark for product-name translation.""" + +from __future__ import annotations + +import argparse +import copy +import json +import sys +from datetime import datetime +from pathlib import Path +from typing import Any, Dict, List, Tuple + +PROJECT_ROOT = Path(__file__).resolve().parents[2] +if str(PROJECT_ROOT) not in sys.path: + sys.path.insert(0, str(PROJECT_ROOT)) + +from config.services_config import get_translation_config +from benchmarks.translation.benchmark_translation_local_models import ( + benchmark_concurrency_case, + benchmark_serial_case, + build_environment_info, + ensure_cuda_stats_reset, + load_texts, +) +from translation.service import TranslationService + + +SCENARIOS = [ + { + "name": "nllb zh->en", + "model": "nllb-200-distilled-600m", + "source_lang": "zh", + "target_lang": "en", + "column": "title_cn", + "scene": "sku_name", + }, + { + "name": "nllb en->zh", + "model": "nllb-200-distilled-600m", + "source_lang": "en", + "target_lang": "zh", + "column": "title", + "scene": "sku_name", + }, +] + +VARIANTS = [ + { + "name": "ct2_default_fixed64", + "description": "Original CT2 default", + "overrides": { + "ct2_inter_threads": 1, + "ct2_max_queued_batches": 0, + "ct2_batch_type": "examples", + "max_new_tokens": 64, + }, + }, + { + "name": "ct2_prev_t4_fixed64", + "description": "Previous T4 tuning result", + "overrides": { + "ct2_inter_threads": 2, + 
"ct2_max_queued_batches": 16, + "ct2_batch_type": "examples", + "max_new_tokens": 64, + }, + }, + { + "name": "ct2_best_t4_dynamic", + "description": "Recommended T4 profile after this round", + "overrides": { + "ct2_inter_threads": 4, + "ct2_max_queued_batches": 32, + "ct2_batch_type": "examples", + "max_new_tokens": 64, + "ct2_decoding_length_mode": "source", + "ct2_decoding_length_extra": 8, + "ct2_decoding_length_min": 32, + }, + }, + { + "name": "ct2_fixed48_experiment", + "description": "High-gain experiment with truncation risk", + "overrides": { + "ct2_inter_threads": 3, + "ct2_max_queued_batches": 16, + "ct2_batch_type": "examples", + "max_new_tokens": 48, + }, + }, +] + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Focused NLLB T4 tuning benchmark") + parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path") + parser.add_argument( + "--output-dir", + default="perf_reports/20260318/nllb_t4_product_names_ct2", + help="Directory for JSON/Markdown reports", + ) + parser.add_argument("--batch-size", type=int, default=64, help="Batch size for the bulk scenario") + parser.add_argument("--batch-items", type=int, default=256, help="Rows used for the bulk scenario") + parser.add_argument("--concurrency", type=int, default=64, help="Concurrency for the online scenario") + parser.add_argument( + "--requests-per-case", + type=int, + default=24, + help="Requests per worker in the online scenario", + ) + parser.add_argument("--quality-samples", type=int, default=100, help="Rows used for quality spot-checks") + parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring") + return parser.parse_args() + + +def build_service(model: str, overrides: Dict[str, Any]) -> Tuple[TranslationService, Dict[str, Any]]: + config = copy.deepcopy(get_translation_config()) + for name, cfg in config["capabilities"].items(): + cfg["enabled"] = name == model + cfg["use_cache"] = False + config["default_model"] = model + capability = config["capabilities"][model] + capability.update(overrides) + return TranslationService(config), capability + + +def build_quality_reference_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]: + reference = dict(overrides) + reference.pop("ct2_decoding_length_mode", None) + reference.pop("ct2_decoding_length_extra", None) + reference.pop("ct2_decoding_length_min", None) + reference["max_new_tokens"] = max(64, int(reference.get("max_new_tokens", 64))) + return reference + + +def summarize_quality(reference_outputs: List[Any], candidate_outputs: List[Any], texts: List[str]) -> Dict[str, Any]: + same = 0 + diffs: List[Dict[str, str]] = [] + for text, ref_output, candidate_output in zip(texts, reference_outputs, candidate_outputs): + if ref_output == candidate_output: + same += 1 + continue + if len(diffs) < 3: + diffs.append( + { + "input": text, + "candidate": "" if candidate_output is None else str(candidate_output), + "reference": "" if ref_output is None else str(ref_output), + } + ) + return { + "same": same, + "total": len(texts), + "changed": len(texts) - same, + "sample_diffs": diffs, + } + + +def render_markdown(report: Dict[str, Any]) -> str: + lines = [ + "# NLLB T4 Product-Name Tuning", + "", + f"- Generated at: `{report['generated_at']}`", + f"- Python: `{report['environment']['python']}`", + f"- Torch: `{report['environment']['torch']}`", + f"- Transformers: `{report['environment']['transformers']}`", + f"- CUDA: 
`{report['environment']['cuda_available']}`", + ] + if report["environment"]["gpu_name"]: + lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") + lines.extend( + [ + "", + "## Scope", + "", + f"- Bulk scenario: `batch={report['config']['batch_size']}, concurrency=1`", + f"- Online scenario: `batch=1, concurrency={report['config']['concurrency']}`", + f"- Online requests per worker: `{report['config']['requests_per_case']}`", + f"- Quality spot-check samples: `{report['config']['quality_samples']}`", + "", + "## Variants", + "", + ] + ) + for variant in report["variants"]: + lines.append(f"- `{variant['name']}`: {variant['description']} -> `{variant['overrides']}`") + + for scenario in report["scenarios"]: + lines.extend( + [ + "", + f"## {scenario['name']}", + "", + "| Variant | Bulk items/s | Bulk p95 ms | Online items/s | Online p95 ms | Quality same/total |", + "|---|---:|---:|---:|---:|---:|", + ] + ) + for variant in scenario["variants"]: + quality = variant["quality_vs_reference"] + lines.append( + f"| {variant['name']} | {variant['bulk']['items_per_second']} | {variant['bulk']['request_latency_p95_ms']} | " + f"{variant['online']['items_per_second']} | {variant['online']['request_latency_p95_ms']} | " + f"{quality['same']}/{quality['total']} |" + ) + for variant in scenario["variants"]: + quality = variant["quality_vs_reference"] + if not quality["sample_diffs"]: + continue + lines.extend( + [ + "", + f"### Quality Notes: {variant['name']}", + "", + ] + ) + for diff in quality["sample_diffs"]: + lines.append(f"- Input: `{diff['input']}`") + lines.append(f"- Candidate: `{diff['candidate']}`") + lines.append(f"- Reference: `{diff['reference']}`") + lines.append("") + + return "\n".join(lines).rstrip() + "\n" + + +def main() -> None: + args = parse_args() + csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) + output_dir = (PROJECT_ROOT / args.output_dir).resolve() if not Path(args.output_dir).is_absolute() else Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + report: Dict[str, Any] = { + "generated_at": datetime.now().isoformat(timespec="seconds"), + "environment": build_environment_info(), + "config": { + "csv_path": str(csv_path), + "batch_size": args.batch_size, + "batch_items": args.batch_items, + "concurrency": args.concurrency, + "requests_per_case": args.requests_per_case, + "quality_samples": args.quality_samples, + }, + "variants": VARIANTS, + "scenarios": [], + } + + for scenario in SCENARIOS: + batch_texts = load_texts(csv_path, scenario["column"], args.batch_items) + online_texts = load_texts(csv_path, scenario["column"], args.concurrency * args.requests_per_case) + quality_texts = load_texts(csv_path, scenario["column"], args.quality_samples) + + scenario_report = dict(scenario) + scenario_report["variants"] = [] + for variant in VARIANTS: + print(f"[start] {scenario['name']} | {variant['name']}", flush=True) + ensure_cuda_stats_reset() + service, capability = build_service(scenario["model"], variant["overrides"]) + backend = service.get_backend(scenario["model"]) + bulk = benchmark_serial_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=batch_texts, + batch_size=args.batch_size, + warmup_batches=args.warmup_batches, + ) + online = benchmark_concurrency_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=online_texts, + 
batch_size=1,
+                concurrency=args.concurrency,
+                requests_per_case=args.concurrency * args.requests_per_case,  # total requests = workers x per-worker requests
+                warmup_batches=args.warmup_batches,
+            )
+            quality_reference_overrides = build_quality_reference_overrides(variant["overrides"])
+            reference_service, _ = build_service(scenario["model"], quality_reference_overrides)
+            reference_outputs = reference_service.translate(
+                quality_texts,
+                source_lang=scenario["source_lang"],
+                target_lang=scenario["target_lang"],
+                model=scenario["model"],
+                scene=scenario["scene"],
+            )
+            candidate_outputs = service.translate(
+                quality_texts,
+                source_lang=scenario["source_lang"],
+                target_lang=scenario["target_lang"],
+                model=scenario["model"],
+                scene=scenario["scene"],
+            )
+            scenario_report["variants"].append(
+                {
+                    "name": variant["name"],
+                    "description": variant["description"],
+                    "overrides": variant["overrides"],
+                    "quality_reference_overrides": quality_reference_overrides,
+                    "bulk": bulk,
+                    "online": online,
+                    "quality_vs_reference": summarize_quality(reference_outputs, candidate_outputs, quality_texts),
+                }
+            )
+        report["scenarios"].append(scenario_report)
+
+    timestamp = datetime.now().strftime("%H%M%S")
+    json_path = output_dir / f"nllb_t4_tuning_{timestamp}.json"
+    md_path = output_dir / f"nllb_t4_tuning_{timestamp}.md"
+    json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")
+    md_path.write_text(render_markdown(report), encoding="utf-8")
+    print(f"JSON_REPORT={json_path}")
+    print(f"MARKDOWN_REPORT={md_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/benchmarks/translation/benchmark_translation_local_models.py b/benchmarks/translation/benchmark_translation_local_models.py
new file mode 100644
index 0000000..7e74c60
--- /dev/null
+++ b/benchmarks/translation/benchmark_translation_local_models.py
@@ -0,0 +1,948 @@
+#!/usr/bin/env python3
+"""Benchmark local translation models with products_analyzed.csv."""
+
+from __future__ import annotations
+
+import argparse
+import concurrent.futures
+import copy
+import csv
+import json
+import math
+import platform
+import resource
+import statistics
+import subprocess
+import sys
+import time
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, Iterable, List, Sequence
+
+import torch
+import transformers
+
+PROJECT_ROOT = Path(__file__).resolve().parents[2]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from config.services_config import get_translation_config  # noqa: E402
+from translation.service import TranslationService  # noqa: E402
+from translation.settings import get_translation_capability  # noqa: E402
+
+
+DEFAULT_BATCH_SIZES = [1, 4, 8, 16, 32, 64]
+DEFAULT_CONCURRENCIES = [1, 2, 4, 8, 16, 64]
+
+SCENARIOS: List[Dict[str, str]] = [
+    {
+        "name": "nllb-200-distilled-600m zh->en",
+        "model": "nllb-200-distilled-600m",
+        "source_lang": "zh",
+        "target_lang": "en",
+        "column": "title_cn",
+        "scene": "sku_name",
+    },
+    {
+        "name": "nllb-200-distilled-600m en->zh",
+        "model": "nllb-200-distilled-600m",
+        "source_lang": "en",
+        "target_lang": "zh",
+        "column": "title",
+        "scene": "sku_name",
+    },
+    {
+        "name": "opus-mt-zh-en zh->en",
+        "model": "opus-mt-zh-en",
+        "source_lang": "zh",
+        "target_lang": "en",
+        "column": "title_cn",
+        "scene": "sku_name",
+    },
+    {
+        "name": "opus-mt-en-zh en->zh",
+        "model": "opus-mt-en-zh",
+        "source_lang": "en",
+        "target_lang": "zh",
+        "column": "title",
+        "scene": "sku_name",
+    },
+]
+
+
+def parse_args() -> argparse.Namespace:
+    parser = 
argparse.ArgumentParser(description="Benchmark local translation models") + parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path") + parser.add_argument("--limit", type=int, default=0, help="Limit rows for baseline or single-case run; 0 means all") + parser.add_argument("--output-dir", default="", help="Directory for JSON/Markdown reports") + parser.add_argument("--single", action="store_true", help="Run a single scenario in-process") + parser.add_argument("--model", default="", help="Model name for --single mode") + parser.add_argument("--source-lang", default="", help="Source language for --single mode") + parser.add_argument("--target-lang", default="", help="Target language for --single mode") + parser.add_argument("--column", default="", help="CSV column to benchmark for --single mode") + parser.add_argument("--scene", default="sku_name", help="Scene passed to translation service") + parser.add_argument("--batch-size", type=int, default=0, help="Override configured batch size") + parser.add_argument("--device-override", default="", help="Override configured device, for example cpu or cuda") + parser.add_argument("--torch-dtype-override", default="", help="Override configured torch dtype, for example float32 or float16") + parser.add_argument("--max-new-tokens", type=int, default=0, help="Override configured max_new_tokens") + parser.add_argument("--num-beams", type=int, default=0, help="Override configured num_beams") + parser.add_argument("--attn-implementation", default="", help="Override attention implementation, for example sdpa") + parser.add_argument("--ct2-inter-threads", type=int, default=-1, help="Override CTranslate2 inter_threads") + parser.add_argument("--ct2-intra-threads", type=int, default=-1, help="Override CTranslate2 intra_threads") + parser.add_argument( + "--ct2-max-queued-batches", + type=int, + default=-1, + help="Override CTranslate2 max_queued_batches", + ) + parser.add_argument( + "--ct2-batch-type", + default="", + help="Override CTranslate2 batch_type, for example examples or tokens", + ) + parser.add_argument( + "--ct2-decoding-length-mode", + default="", + help="Override CTranslate2 decoding length mode, for example fixed or source", + ) + parser.add_argument( + "--ct2-decoding-length-extra", + type=int, + default=0, + help="Extra tokens added when ct2 decoding length mode is source", + ) + parser.add_argument( + "--ct2-decoding-length-min", + type=int, + default=0, + help="Minimum decoding length when ct2 decoding length mode is source", + ) + parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring") + parser.add_argument("--disable-cache", action="store_true", help="Disable translation cache during benchmarks") + parser.add_argument( + "--suite", + choices=["baseline", "extended"], + default="baseline", + help="baseline keeps the previous all-scenarios summary; extended adds batch/concurrency/matrix sweeps", + ) + parser.add_argument( + "--batch-size-list", + default="", + help="Comma-separated batch sizes for extended suite; default 1,4,8,16,32,64", + ) + parser.add_argument( + "--concurrency-list", + default="", + help="Comma-separated concurrency levels for extended suite; default 1,2,4,8,16,64", + ) + parser.add_argument( + "--serial-items-per-case", + type=int, + default=512, + help="Items per batch-size case in extended suite", + ) + parser.add_argument( + "--concurrency-requests-per-case", + type=int, + default=128, + help="Requests per concurrency or matrix 
case in extended suite", + ) + parser.add_argument( + "--concurrency-batch-size", + type=int, + default=1, + help="Batch size used by the dedicated concurrency sweep", + ) + parser.add_argument( + "--max-batch-concurrency-product", + type=int, + default=128, + help="Skip matrix cases where batch_size * concurrency exceeds this value; 0 disables the limit", + ) + return parser.parse_args() + + +def parse_csv_ints(raw: str, fallback: Sequence[int]) -> List[int]: + if not raw.strip(): + return list(fallback) + values: List[int] = [] + for item in raw.split(","): + stripped = item.strip() + if not stripped: + continue + value = int(stripped) + if value <= 0: + raise ValueError(f"Expected positive integer, got {value}") + values.append(value) + if not values: + raise ValueError("Parsed empty integer list") + return values + + +def load_texts(csv_path: Path, column: str, limit: int) -> List[str]: + texts: List[str] = [] + with csv_path.open("r", encoding="utf-8") as handle: + reader = csv.DictReader(handle) + for row in reader: + value = (row.get(column) or "").strip() + if value: + texts.append(value) + if limit > 0 and len(texts) >= limit: + break + if not texts: + raise ValueError(f"No non-empty texts found in column '{column}' from {csv_path}") + return texts + + +def batched(values: Sequence[str], batch_size: int) -> Iterable[List[str]]: + for start in range(0, len(values), batch_size): + yield list(values[start:start + batch_size]) + + +def percentile(values: List[float], p: float) -> float: + if not values: + return 0.0 + ordered = sorted(values) + if len(values) == 1: + return float(ordered[0]) + idx = (len(ordered) - 1) * p + lower = math.floor(idx) + upper = math.ceil(idx) + if lower == upper: + return float(ordered[lower]) + return float(ordered[lower] + (ordered[upper] - ordered[lower]) * (idx - lower)) + + +def resolve_output_dir(output_dir: str) -> Path: + if output_dir: + path = Path(output_dir) + else: + path = PROJECT_ROOT / "perf_reports" / datetime.now().strftime("%Y%m%d") / "translation_local_models" + path.mkdir(parents=True, exist_ok=True) + return path + + +def build_environment_info() -> Dict[str, Any]: + gpu_name = None + gpu_total_mem_gb = None + if torch.cuda.is_available(): + gpu_name = torch.cuda.get_device_name(0) + props = torch.cuda.get_device_properties(0) + gpu_total_mem_gb = round(props.total_memory / (1024 ** 3), 2) + return { + "python": platform.python_version(), + "torch": torch.__version__, + "transformers": transformers.__version__, + "cuda_available": torch.cuda.is_available(), + "gpu_name": gpu_name, + "gpu_total_mem_gb": gpu_total_mem_gb, + "platform": platform.platform(), + } + + +def scenario_from_args(args: argparse.Namespace) -> Dict[str, str]: + return { + "name": f"{args.model} {args.source_lang}->{args.target_lang}", + "model": args.model, + "source_lang": args.source_lang, + "target_lang": args.target_lang, + "column": args.column, + "scene": args.scene, + } + + +def build_config_and_capability( + args: argparse.Namespace, + *, + batch_size_override: int | None = None, +) -> tuple[Dict[str, Any], Dict[str, Any]]: + config = copy.deepcopy(get_translation_config()) + for name, cfg in config["capabilities"].items(): + cfg["enabled"] = name == args.model + config["default_model"] = args.model + capability = get_translation_capability(config, args.model, require_enabled=False) + if args.device_override: + capability["device"] = args.device_override + if args.torch_dtype_override: + capability["torch_dtype"] = args.torch_dtype_override + if 
batch_size_override is not None: + capability["batch_size"] = batch_size_override + elif args.batch_size: + capability["batch_size"] = args.batch_size + if args.max_new_tokens: + capability["max_new_tokens"] = args.max_new_tokens + if args.num_beams: + capability["num_beams"] = args.num_beams + if args.attn_implementation: + capability["attn_implementation"] = args.attn_implementation + if args.ct2_inter_threads >= 0: + capability["ct2_inter_threads"] = args.ct2_inter_threads + if args.ct2_intra_threads >= 0: + capability["ct2_intra_threads"] = args.ct2_intra_threads + if args.ct2_max_queued_batches >= 0: + capability["ct2_max_queued_batches"] = args.ct2_max_queued_batches + if args.ct2_batch_type: + capability["ct2_batch_type"] = args.ct2_batch_type + if args.ct2_decoding_length_mode: + capability["ct2_decoding_length_mode"] = args.ct2_decoding_length_mode + if args.ct2_decoding_length_extra: + capability["ct2_decoding_length_extra"] = args.ct2_decoding_length_extra + if args.ct2_decoding_length_min: + capability["ct2_decoding_length_min"] = args.ct2_decoding_length_min + if args.disable_cache: + capability["use_cache"] = False + config["capabilities"][args.model] = capability + return config, capability + + +def ensure_cuda_stats_reset() -> None: + if torch.cuda.is_available(): + torch.cuda.empty_cache() + torch.cuda.reset_peak_memory_stats() + + +def build_memory_metrics() -> Dict[str, Any]: + peak_gpu_mem_gb = None + peak_gpu_reserved_gb = None + if torch.cuda.is_available(): + peak_gpu_mem_gb = round(torch.cuda.max_memory_allocated() / (1024 ** 3), 3) + peak_gpu_reserved_gb = round(torch.cuda.max_memory_reserved() / (1024 ** 3), 3) + max_rss_mb = round(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, 2) + return { + "max_rss_mb": max_rss_mb, + "peak_gpu_memory_gb": peak_gpu_mem_gb, + "peak_gpu_reserved_gb": peak_gpu_reserved_gb, + } + + +def make_request_payload(batch: Sequence[str]) -> str | List[str]: + if len(batch) == 1: + return batch[0] + return list(batch) + + +def benchmark_serial_case( + *, + service: TranslationService, + backend: Any, + scenario: Dict[str, str], + capability: Dict[str, Any], + texts: List[str], + batch_size: int, + warmup_batches: int, +) -> Dict[str, Any]: + backend.batch_size = batch_size + measured_batches = list(batched(texts, batch_size)) + warmup_count = min(max(warmup_batches, 0), len(measured_batches)) + + for batch in measured_batches[:warmup_count]: + service.translate( + text=make_request_payload(batch), + source_lang=scenario["source_lang"], + target_lang=scenario["target_lang"], + model=scenario["model"], + scene=scenario["scene"], + ) + + batch_latencies_ms: List[float] = [] + success_count = 0 + failure_count = 0 + output_chars = 0 + total_input_chars = sum(len(text) for text in texts) + + start = time.perf_counter() + for batch in measured_batches: + batch_start = time.perf_counter() + outputs = service.translate( + text=make_request_payload(batch), + source_lang=scenario["source_lang"], + target_lang=scenario["target_lang"], + model=scenario["model"], + scene=scenario["scene"], + ) + elapsed_ms = (time.perf_counter() - batch_start) * 1000 + batch_latencies_ms.append(elapsed_ms) + + if isinstance(outputs, list): + result_items = outputs + else: + result_items = [outputs] + for item in result_items: + if item is None: + failure_count += 1 + else: + success_count += 1 + output_chars += len(item) + translate_seconds = time.perf_counter() - start + total_items = len(texts) + memory = build_memory_metrics() + + return { + "mode": 
"serial_batch", + "batch_size": batch_size, + "concurrency": 1, + "rows": total_items, + "requests": len(measured_batches), + "input_chars": total_input_chars, + "load_seconds": 0.0, + "translate_seconds": round(translate_seconds, 4), + "total_seconds": round(translate_seconds, 4), + "batch_count": len(batch_latencies_ms), + "request_latency_p50_ms": round(percentile(batch_latencies_ms, 0.50), 2), + "request_latency_p95_ms": round(percentile(batch_latencies_ms, 0.95), 2), + "request_latency_max_ms": round(max(batch_latencies_ms), 2), + "avg_request_latency_ms": round(statistics.fmean(batch_latencies_ms), 2), + "avg_item_latency_ms": round((translate_seconds / total_items) * 1000, 3), + "requests_per_second": round(len(measured_batches) / translate_seconds, 2), + "items_per_second": round(total_items / translate_seconds, 2), + "input_chars_per_second": round(total_input_chars / translate_seconds, 2), + "output_chars_per_second": round(output_chars / translate_seconds, 2), + "success_count": success_count, + "failure_count": failure_count, + "success_rate": round(success_count / total_items, 6), + "device": str(getattr(backend, "device", capability.get("device", "unknown"))), + "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), + "configured_batch_size": int(capability.get("batch_size") or batch_size), + "used_batch_size": batch_size, + "warmup_batches": warmup_count, + **memory, + } + + +def benchmark_concurrency_case( + *, + service: TranslationService, + backend: Any, + scenario: Dict[str, str], + capability: Dict[str, Any], + texts: List[str], + batch_size: int, + concurrency: int, + requests_per_case: int, + warmup_batches: int, +) -> Dict[str, Any]: + backend.batch_size = batch_size + required_items = batch_size * requests_per_case + case_texts = texts[:required_items] + request_batches = list(batched(case_texts, batch_size)) + if not request_batches: + raise ValueError("No request batches prepared for concurrency benchmark") + warmup_count = min(max(warmup_batches, 0), len(request_batches)) + + for batch in request_batches[:warmup_count]: + service.translate( + text=make_request_payload(batch), + source_lang=scenario["source_lang"], + target_lang=scenario["target_lang"], + model=scenario["model"], + scene=scenario["scene"], + ) + + request_latencies_ms: List[float] = [] + success_count = 0 + failure_count = 0 + output_chars = 0 + total_input_chars = sum(len(text) for text in case_texts) + + def worker(batch: List[str]) -> tuple[float, int, int, int]: + started = time.perf_counter() + outputs = service.translate( + text=make_request_payload(batch), + source_lang=scenario["source_lang"], + target_lang=scenario["target_lang"], + model=scenario["model"], + scene=scenario["scene"], + ) + elapsed_ms = (time.perf_counter() - started) * 1000 + if isinstance(outputs, list): + result_items = outputs + else: + result_items = [outputs] + local_success = 0 + local_failure = 0 + local_output_chars = 0 + for item in result_items: + if item is None: + local_failure += 1 + else: + local_success += 1 + local_output_chars += len(item) + return elapsed_ms, local_success, local_failure, local_output_chars + + wall_start = time.perf_counter() + with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor: + futures = [executor.submit(worker, batch) for batch in request_batches] + for future in concurrent.futures.as_completed(futures): + latency_ms, local_success, local_failure, local_output_chars = future.result() + 
request_latencies_ms.append(latency_ms) + success_count += local_success + failure_count += local_failure + output_chars += local_output_chars + wall_seconds = time.perf_counter() - wall_start + total_items = len(case_texts) + memory = build_memory_metrics() + + return { + "mode": "concurrency", + "batch_size": batch_size, + "concurrency": concurrency, + "rows": total_items, + "requests": len(request_batches), + "input_chars": total_input_chars, + "load_seconds": 0.0, + "translate_seconds": round(wall_seconds, 4), + "total_seconds": round(wall_seconds, 4), + "batch_count": len(request_latencies_ms), + "request_latency_p50_ms": round(percentile(request_latencies_ms, 0.50), 2), + "request_latency_p95_ms": round(percentile(request_latencies_ms, 0.95), 2), + "request_latency_max_ms": round(max(request_latencies_ms), 2), + "avg_request_latency_ms": round(statistics.fmean(request_latencies_ms), 2), + "avg_item_latency_ms": round((wall_seconds / total_items) * 1000, 3), + "requests_per_second": round(len(request_batches) / wall_seconds, 2), + "items_per_second": round(total_items / wall_seconds, 2), + "input_chars_per_second": round(total_input_chars / wall_seconds, 2), + "output_chars_per_second": round(output_chars / wall_seconds, 2), + "success_count": success_count, + "failure_count": failure_count, + "success_rate": round(success_count / total_items, 6), + "device": str(getattr(backend, "device", capability.get("device", "unknown"))), + "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), + "configured_batch_size": int(capability.get("batch_size") or batch_size), + "used_batch_size": batch_size, + "warmup_batches": warmup_count, + **memory, + } + + +def benchmark_single_scenario(args: argparse.Namespace) -> Dict[str, Any]: + csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) + scenario = scenario_from_args(args) + config, capability = build_config_and_capability(args) + configured_batch_size = int(capability.get("batch_size") or 1) + batch_size = configured_batch_size + texts = load_texts(csv_path, args.column, args.limit) + + ensure_cuda_stats_reset() + load_start = time.perf_counter() + service = TranslationService(config) + backend = service.get_backend(args.model) + load_seconds = time.perf_counter() - load_start + + runtime = benchmark_serial_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=texts, + batch_size=batch_size, + warmup_batches=args.warmup_batches, + ) + runtime["load_seconds"] = round(load_seconds, 4) + runtime["total_seconds"] = round(runtime["load_seconds"] + runtime["translate_seconds"], 4) + + return { + "scenario": scenario, + "dataset": { + "csv_path": str(csv_path), + "rows": len(texts), + "input_chars": sum(len(text) for text in texts), + }, + "runtime": runtime, + } + + +def benchmark_extended_scenario(args: argparse.Namespace) -> Dict[str, Any]: + csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) + scenario = scenario_from_args(args) + batch_sizes = parse_csv_ints(args.batch_size_list, DEFAULT_BATCH_SIZES) + concurrencies = parse_csv_ints(args.concurrency_list, DEFAULT_CONCURRENCIES) + largest_batch = max(batch_sizes + [args.concurrency_batch_size]) + largest_concurrency = max(concurrencies) + max_product = args.max_batch_concurrency_product + required_items = max( + args.limit or 0, + max(args.serial_items_per_case, largest_batch), + 
args.concurrency_requests_per_case * args.concurrency_batch_size, + largest_batch * args.concurrency_requests_per_case, + ) + texts = load_texts(csv_path, args.column, required_items) + config, capability = build_config_and_capability(args) + + ensure_cuda_stats_reset() + load_start = time.perf_counter() + service = TranslationService(config) + backend = service.get_backend(args.model) + load_seconds = time.perf_counter() - load_start + + batch_sweep: List[Dict[str, Any]] = [] + concurrency_sweep: List[Dict[str, Any]] = [] + matrix_results: List[Dict[str, Any]] = [] + + for batch_size in batch_sizes: + case_texts = texts[: max(batch_size, args.serial_items_per_case)] + batch_sweep.append( + benchmark_serial_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=case_texts, + batch_size=batch_size, + warmup_batches=args.warmup_batches, + ) + ) + + for concurrency in concurrencies: + concurrency_sweep.append( + benchmark_concurrency_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=texts, + batch_size=args.concurrency_batch_size, + concurrency=concurrency, + requests_per_case=args.concurrency_requests_per_case, + warmup_batches=args.warmup_batches, + ) + ) + + for batch_size in batch_sizes: + for concurrency in concurrencies: + if max_product > 0 and batch_size * concurrency > max_product: + continue + matrix_results.append( + benchmark_concurrency_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=texts, + batch_size=batch_size, + concurrency=concurrency, + requests_per_case=args.concurrency_requests_per_case, + warmup_batches=args.warmup_batches, + ) + ) + + for collection in (batch_sweep, concurrency_sweep, matrix_results): + for idx, item in enumerate(collection): + item["load_seconds"] = round(load_seconds if idx == 0 else 0.0, 4) + item["total_seconds"] = round(item["load_seconds"] + item["translate_seconds"], 4) + + return { + "scenario": scenario, + "dataset": { + "csv_path": str(csv_path), + "rows_loaded": len(texts), + }, + "config": { + "batch_sizes": batch_sizes, + "concurrencies": concurrencies, + "serial_items_per_case": args.serial_items_per_case, + "concurrency_requests_per_case": args.concurrency_requests_per_case, + "concurrency_batch_size": args.concurrency_batch_size, + "max_batch_concurrency_product": max_product, + "cache_disabled": bool(args.disable_cache), + }, + "runtime_defaults": { + "device": str(getattr(backend, "device", capability.get("device", "unknown"))), + "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), + "configured_batch_size": int(capability.get("batch_size") or 1), + "load_seconds": round(load_seconds, 4), + }, + "batch_sweep": batch_sweep, + "concurrency_sweep": concurrency_sweep, + "matrix": matrix_results, + } + + +def run_all_scenarios(args: argparse.Namespace) -> Dict[str, Any]: + report = { + "generated_at": datetime.now().isoformat(timespec="seconds"), + "suite": args.suite, + "environment": build_environment_info(), + "scenarios": [], + } + + for scenario in SCENARIOS: + cmd = [ + sys.executable, + str(Path(__file__).resolve()), + "--single", + "--csv-path", + args.csv_path, + "--model", + scenario["model"], + "--source-lang", + scenario["source_lang"], + "--target-lang", + scenario["target_lang"], + "--column", + scenario["column"], + "--scene", + scenario["scene"], + "--warmup-batches", + str(args.warmup_batches), + "--suite", + args.suite, + 
"--serial-items-per-case", + str(args.serial_items_per_case), + "--concurrency-requests-per-case", + str(args.concurrency_requests_per_case), + "--concurrency-batch-size", + str(args.concurrency_batch_size), + "--max-batch-concurrency-product", + str(args.max_batch_concurrency_product), + ] + if args.limit: + cmd.extend(["--limit", str(args.limit)]) + if args.batch_size: + cmd.extend(["--batch-size", str(args.batch_size)]) + if args.batch_size_list: + cmd.extend(["--batch-size-list", args.batch_size_list]) + if args.concurrency_list: + cmd.extend(["--concurrency-list", args.concurrency_list]) + if args.device_override: + cmd.extend(["--device-override", args.device_override]) + if args.torch_dtype_override: + cmd.extend(["--torch-dtype-override", args.torch_dtype_override]) + if args.max_new_tokens: + cmd.extend(["--max-new-tokens", str(args.max_new_tokens)]) + if args.num_beams: + cmd.extend(["--num-beams", str(args.num_beams)]) + if args.attn_implementation: + cmd.extend(["--attn-implementation", args.attn_implementation]) + if args.ct2_inter_threads >= 0: + cmd.extend(["--ct2-inter-threads", str(args.ct2_inter_threads)]) + if args.ct2_intra_threads >= 0: + cmd.extend(["--ct2-intra-threads", str(args.ct2_intra_threads)]) + if args.ct2_max_queued_batches >= 0: + cmd.extend(["--ct2-max-queued-batches", str(args.ct2_max_queued_batches)]) + if args.ct2_batch_type: + cmd.extend(["--ct2-batch-type", args.ct2_batch_type]) + if args.ct2_decoding_length_mode: + cmd.extend(["--ct2-decoding-length-mode", args.ct2_decoding_length_mode]) + if args.ct2_decoding_length_extra: + cmd.extend(["--ct2-decoding-length-extra", str(args.ct2_decoding_length_extra)]) + if args.ct2_decoding_length_min: + cmd.extend(["--ct2-decoding-length-min", str(args.ct2_decoding_length_min)]) + if args.disable_cache: + cmd.append("--disable-cache") + + completed = subprocess.run(cmd, capture_output=True, text=True, check=True) + result_line = "" + for line in reversed(completed.stdout.splitlines()): + if line.startswith("JSON_RESULT="): + result_line = line + break + if not result_line: + raise RuntimeError(f"Scenario output missing JSON_RESULT marker:\n{completed.stdout}\n{completed.stderr}") + payload = json.loads(result_line.split("=", 1)[1]) + payload["scenario"]["name"] = scenario["name"] + report["scenarios"].append(payload) + + return report + + +def render_baseline_markdown_report(report: Dict[str, Any]) -> str: + lines = [ + "# Local Translation Model Benchmark", + "", + f"- Generated at: `{report['generated_at']}`", + f"- Suite: `{report['suite']}`", + f"- Python: `{report['environment']['python']}`", + f"- Torch: `{report['environment']['torch']}`", + f"- Transformers: `{report['environment']['transformers']}`", + f"- CUDA: `{report['environment']['cuda_available']}`", + ] + if report["environment"]["gpu_name"]: + lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") + lines.extend( + [ + "", + "| Scenario | Items/s | Avg item ms | Req p50 ms | Req p95 ms | Load s | Peak GPU GiB | Success |", + "|---|---:|---:|---:|---:|---:|---:|---:|", + ] + ) + for item in report["scenarios"]: + runtime = item["runtime"] + lines.append( + "| {name} | {items_per_second} | {avg_item_latency_ms} | {request_latency_p50_ms} | {request_latency_p95_ms} | {load_seconds} | {peak_gpu_memory_gb} | {success_rate} |".format( + name=item["scenario"]["name"], + items_per_second=runtime["items_per_second"], + avg_item_latency_ms=runtime["avg_item_latency_ms"], + 
request_latency_p50_ms=runtime["request_latency_p50_ms"], + request_latency_p95_ms=runtime["request_latency_p95_ms"], + load_seconds=runtime["load_seconds"], + peak_gpu_memory_gb=runtime["peak_gpu_memory_gb"], + success_rate=runtime["success_rate"], + ) + ) + + lines.append("") + for item in report["scenarios"]: + runtime = item["runtime"] + dataset = item["dataset"] + lines.extend( + [ + f"## {item['scenario']['name']}", + "", + f"- Dataset rows: `{dataset['rows']}` from column `{item['scenario']['column']}`", + f"- Direction: `{item['scenario']['source_lang']} -> {item['scenario']['target_lang']}`", + f"- Batch size: configured `{runtime['configured_batch_size']}`, used `{runtime['used_batch_size']}`", + f"- Load time: `{runtime['load_seconds']} s`", + f"- Translate time: `{runtime['translate_seconds']} s`", + f"- Throughput: `{runtime['items_per_second']} items/s`, `{runtime['input_chars_per_second']} input chars/s`", + f"- Latency: avg item `{runtime['avg_item_latency_ms']} ms`, req p50 `{runtime['request_latency_p50_ms']} ms`, req p95 `{runtime['request_latency_p95_ms']} ms`, req max `{runtime['request_latency_max_ms']} ms`", + f"- Memory: max RSS `{runtime['max_rss_mb']} MB`, peak GPU allocated `{runtime['peak_gpu_memory_gb']} GiB`, peak GPU reserved `{runtime['peak_gpu_reserved_gb']} GiB`", + f"- Success: `{runtime['success_count']}/{dataset['rows']}`", + "", + ] + ) + return "\n".join(lines) + + +def render_case_table( + title: str, + rows: Sequence[Dict[str, Any]], + *, + include_batch: bool, + include_concurrency: bool, +) -> List[str]: + headers = ["Rows", "Requests", "Items/s", "Req/s", "Avg req ms", "Req p50 ms", "Req p95 ms", "Peak GPU GiB"] + prefix_headers: List[str] = [] + if include_batch: + prefix_headers.append("Batch") + if include_concurrency: + prefix_headers.append("Concurrency") + headers = prefix_headers + headers + lines = [f"### {title}", ""] + lines.append("| " + " | ".join(headers) + " |") + lines.append("|" + "|".join(["---:"] * len(headers)) + "|") + for item in rows: + values: List[str] = [] + if include_batch: + values.append(str(item["batch_size"])) + if include_concurrency: + values.append(str(item["concurrency"])) + values.extend( + [ + str(item["rows"]), + str(item["requests"]), + str(item["items_per_second"]), + str(item["requests_per_second"]), + str(item["avg_request_latency_ms"]), + str(item["request_latency_p50_ms"]), + str(item["request_latency_p95_ms"]), + str(item["peak_gpu_memory_gb"]), + ] + ) + lines.append("| " + " | ".join(values) + " |") + lines.append("") + return lines + + +def render_extended_markdown_report(report: Dict[str, Any]) -> str: + lines = [ + "# Local Translation Model Extended Benchmark", + "", + f"- Generated at: `{report['generated_at']}`", + f"- Suite: `{report['suite']}`", + f"- Python: `{report['environment']['python']}`", + f"- Torch: `{report['environment']['torch']}`", + f"- Transformers: `{report['environment']['transformers']}`", + f"- CUDA: `{report['environment']['cuda_available']}`", + ] + if report["environment"]["gpu_name"]: + lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") + + lines.extend( + [ + "", + "## Reading Guide", + "", + "- `batch_sweep`: single stream only (`concurrency=1`), used to compare bulk translation efficiency across batch sizes.", + "- `concurrency_sweep`: fixed request batch size, used to compare online request latency and throughput as concurrency rises.", + "- `matrix`: combined `batch_size x concurrency` runs, filtered by 
`batch_size * concurrency <= limit` when configured.", + "", + ] + ) + + for item in report["scenarios"]: + lines.extend( + [ + f"## {item['scenario']['name']}", + "", + f"- Direction: `{item['scenario']['source_lang']} -> {item['scenario']['target_lang']}`", + f"- Column: `{item['scenario']['column']}`", + f"- Loaded rows: `{item['dataset']['rows_loaded']}`", + f"- Load time: `{item['runtime_defaults']['load_seconds']} s`", + f"- Device: `{item['runtime_defaults']['device']}`", + f"- DType: `{item['runtime_defaults']['torch_dtype']}`", + f"- Cache disabled: `{item['config']['cache_disabled']}`", + "", + ] + ) + lines.extend(render_case_table("Batch Sweep (`concurrency=1`)", item["batch_sweep"], include_batch=True, include_concurrency=False)) + lines.extend( + render_case_table( + f"Concurrency Sweep (`batch_size={item['config']['concurrency_batch_size']}`)", + item["concurrency_sweep"], + include_batch=False, + include_concurrency=True, + ) + ) + lines.extend(render_case_table("Batch x Concurrency Matrix", item["matrix"], include_batch=True, include_concurrency=True)) + return "\n".join(lines) + + +def render_markdown_report(report: Dict[str, Any]) -> str: + if report["suite"] == "extended": + return render_extended_markdown_report(report) + return render_baseline_markdown_report(report) + + +def main() -> None: + args = parse_args() + if args.single: + if args.suite == "extended": + result = benchmark_extended_scenario(args) + else: + result = benchmark_single_scenario(args) + print("JSON_RESULT=" + json.dumps(result, ensure_ascii=False)) + return + + report = run_all_scenarios(args) + output_dir = resolve_output_dir(args.output_dir) + timestamp = datetime.now().strftime("%H%M%S") + suffix = "extended" if args.suite == "extended" else "baseline" + json_path = output_dir / f"translation_local_models_{suffix}_{timestamp}.json" + md_path = output_dir / f"translation_local_models_{suffix}_{timestamp}.md" + json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8") + md_path.write_text(render_markdown_report(report), encoding="utf-8") + + print(f"JSON report: {json_path}") + print(f"Markdown report: {md_path}") + for item in report["scenarios"]: + if args.suite == "extended": + best_batch = max(item["batch_sweep"], key=lambda x: x["items_per_second"]) + best_concurrency = max(item["concurrency_sweep"], key=lambda x: x["items_per_second"]) + print( + f"{item['scenario']['name']}: " + f"best_batch={best_batch['batch_size']} ({best_batch['items_per_second']} items/s) | " + f"best_concurrency={best_concurrency['concurrency']} ({best_concurrency['items_per_second']} items/s @ batch={best_concurrency['batch_size']})" + ) + else: + runtime = item["runtime"] + print( + f"{item['scenario']['name']}: " + f"{runtime['items_per_second']} items/s | " + f"avg_item={runtime['avg_item_latency_ms']} ms | " + f"p95_req={runtime['request_latency_p95_ms']} ms | " + f"load={runtime['load_seconds']} s" + ) + + +if __name__ == "__main__": + main() diff --git a/benchmarks/translation/benchmark_translation_local_models_focus.py b/benchmarks/translation/benchmark_translation_local_models_focus.py new file mode 100644 index 0000000..d793a10 --- /dev/null +++ b/benchmarks/translation/benchmark_translation_local_models_focus.py @@ -0,0 +1,250 @@ +#!/usr/bin/env python3 +"""Focused translation benchmark for two stress scenarios on local CT2 models.""" + +from __future__ import annotations + +import argparse +import copy +import json +import sys +from datetime import datetime +from pathlib 
import Path
+from typing import Any, Dict, List
+
+PROJECT_ROOT = Path(__file__).resolve().parents[2]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+
+from config.services_config import get_translation_config  # noqa: E402
+from benchmarks.translation.benchmark_translation_local_models import (  # noqa: E402
+    SCENARIOS,
+    benchmark_concurrency_case,
+    benchmark_serial_case,
+    build_environment_info,
+    ensure_cuda_stats_reset,
+    load_texts,
+)
+from translation.service import TranslationService  # noqa: E402
+
+DEFAULT_HIGH_BATCH_SIZES = [32, 64, 128]
+DEFAULT_HIGH_CONCURRENCIES = [8, 16, 32, 64]
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Focused benchmark for local CT2 translation models")
+    parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path")
+    parser.add_argument(
+        "--output-dir",
+        default="perf_reports/20260318/translation_local_models_ct2_focus",
+        help="Directory for JSON/Markdown focused reports",
+    )
+    parser.add_argument(
+        "--high-batch-sizes",
+        default=",".join(str(v) for v in DEFAULT_HIGH_BATCH_SIZES),  # keep the module-level default in one place
+        help="Comma-separated batch sizes for the high-batch/low-concurrency scenario",
+    )
+    parser.add_argument(
+        "--high-concurrencies",
+        default=",".join(str(v) for v in DEFAULT_HIGH_CONCURRENCIES),  # keep the module-level default in one place
+        help="Comma-separated concurrency levels for the high-concurrency/low-batch scenario",
+    )
+    parser.add_argument(
+        "--high-batch-rows",
+        type=int,
+        default=512,
+        help="Rows used for the high-batch/low-concurrency scenario",
+    )
+    parser.add_argument(
+        "--high-concurrency-requests",
+        type=int,
+        default=32,
+        help="Requests per high-concurrency/low-batch case",
+    )
+    parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring")
+    return parser.parse_args()
+
+
+def parse_csv_ints(raw: str) -> List[int]:
+    values: List[int] = []
+    for item in raw.split(","):
+        stripped = item.strip()
+        if not stripped:
+            continue
+        value = int(stripped)
+        if value <= 0:
+            raise ValueError(f"Expected positive integer, got {value}")
+        values.append(value)
+    if not values:
+        raise ValueError("Parsed empty integer list")
+    return values
+
+
+def build_variant_config(model: str, overrides: Dict[str, Any]) -> tuple[Dict[str, Any], Dict[str, Any]]:
+    config = copy.deepcopy(get_translation_config())
+    for name, cfg in config["capabilities"].items():
+        cfg["enabled"] = name == model
+        cfg["use_cache"] = False
+    config["default_model"] = model
+    capability = config["capabilities"][model]
+    capability.update(overrides)
+    config["capabilities"][model] = capability
+    return config, capability
+
+
+def render_markdown(report: Dict[str, Any]) -> str:
+    lines = [
+        "# Local Translation Model Focused Benchmark",
+        "",
+        f"- Generated at: `{report['generated_at']}`",
+        f"- Python: `{report['environment']['python']}`",
+        f"- Torch: `{report['environment']['torch']}`",
+        f"- Transformers: `{report['environment']['transformers']}`",
+        f"- CUDA: `{report['environment']['cuda_available']}`",
+    ]
+    if report["environment"]["gpu_name"]:
+        lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)")
+    lines.extend(
+        [
+            "",
+            "## Scope",
+            "",
+            "- Scenario 1: high batch size + low concurrency",
+            "- Scenario 2: high concurrency + low batch size",
+            "- Variants in this report:",
+        ]
+    )
+    for variant in report["variants"]:
+        lines.append(f"  - `{variant['name']}`: `{variant['overrides']}`")
+
+    for scenario in report["scenarios"]:
+        lines.extend(
+            [
+                "",
+                f"## {scenario['name']}",
+                "",
+                f"- Direction: 
`{scenario['source_lang']} -> {scenario['target_lang']}`", + f"- Column: `{scenario['column']}`", + ] + ) + for variant in scenario["variants"]: + lines.extend( + [ + "", + f"### Variant `{variant['name']}`", + "", + "| Scenario | Setting | Items/s | Req p95 ms | Avg req ms |", + "|---|---|---:|---:|---:|", + ] + ) + for row in variant["high_batch_low_concurrency"]: + lines.append( + f"| high-batch/low-concurrency | batch={row['batch_size']}, concurrency=1 | " + f"{row['items_per_second']} | {row['request_latency_p95_ms']} | {row['avg_request_latency_ms']} |" + ) + for row in variant["high_concurrency_low_batch"]: + lines.append( + f"| high-concurrency/low-batch | batch=1, concurrency={row['concurrency']} | " + f"{row['items_per_second']} | {row['request_latency_p95_ms']} | {row['avg_request_latency_ms']} |" + ) + return "\n".join(lines) + "\n" + + +def main() -> None: + args = parse_args() + csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) + output_dir = (PROJECT_ROOT / args.output_dir).resolve() if not Path(args.output_dir).is_absolute() else Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + high_batch_sizes = parse_csv_ints(args.high_batch_sizes) + high_concurrencies = parse_csv_ints(args.high_concurrencies) + + variants = [ + {"name": "ct2_default", "overrides": {}}, + { + "name": "ct2_tuned_t4", + "overrides": { + "ct2_inter_threads": 2, + "ct2_max_queued_batches": 16, + "ct2_batch_type": "examples", + }, + }, + ] + + report: Dict[str, Any] = { + "generated_at": datetime.now().isoformat(timespec="seconds"), + "environment": build_environment_info(), + "csv_path": str(csv_path), + "variants": variants, + "scenarios": [], + } + + largest_batch = max(high_batch_sizes) + high_batch_rows = max(args.high_batch_rows, largest_batch) + + for scenario in SCENARIOS: + scenario_entry = dict(scenario) + scenario_entry["variants"] = [] + batch_texts = load_texts(csv_path, scenario["column"], high_batch_rows) + conc_needed = max(high_concurrencies) * args.high_concurrency_requests + conc_texts = load_texts(csv_path, scenario["column"], conc_needed) + + for variant in variants: + print(f"[start] {scenario['name']} | {variant['name']}", flush=True) + config, capability = build_variant_config(scenario["model"], variant["overrides"]) + ensure_cuda_stats_reset() + service = TranslationService(config) + backend = service.get_backend(scenario["model"]) + + high_batch_results = [] + for batch_size in high_batch_sizes: + high_batch_results.append( + benchmark_serial_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=batch_texts[: max(batch_size, high_batch_rows)], + batch_size=batch_size, + warmup_batches=args.warmup_batches, + ) + ) + + high_concurrency_results = [] + for concurrency in high_concurrencies: + high_concurrency_results.append( + benchmark_concurrency_case( + service=service, + backend=backend, + scenario=scenario, + capability=capability, + texts=conc_texts, + batch_size=1, + concurrency=concurrency, + requests_per_case=args.high_concurrency_requests, + warmup_batches=args.warmup_batches, + ) + ) + + scenario_entry["variants"].append( + { + "name": variant["name"], + "overrides": variant["overrides"], + "high_batch_low_concurrency": high_batch_results, + "high_concurrency_low_batch": high_concurrency_results, + } + ) + print(f"[done] {scenario['name']} | {variant['name']}", flush=True) + + report["scenarios"].append(scenario_entry) + + stamp = 
datetime.now().strftime("%H%M%S") + json_path = output_dir / f"translation_local_models_focus_{stamp}.json" + md_path = output_dir / f"translation_local_models_focus_{stamp}.md" + json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8") + md_path.write_text(render_markdown(report), encoding="utf-8") + print(f"JSON report: {json_path}") + print(f"Markdown report: {md_path}") + + +if __name__ == "__main__": + main() diff --git a/benchmarks/translation/benchmark_translation_longtext_single.py b/benchmarks/translation/benchmark_translation_longtext_single.py new file mode 100644 index 0000000..8d8c726 --- /dev/null +++ b/benchmarks/translation/benchmark_translation_longtext_single.py @@ -0,0 +1,186 @@ +#!/usr/bin/env python3 +"""Benchmark a single long-text translation request for local models.""" + +from __future__ import annotations + +import argparse +import copy +import json +import logging +import statistics +import time +from pathlib import Path + +import torch + +PROJECT_ROOT = Path(__file__).resolve().parents[2] + +import sys + +if str(PROJECT_ROOT) not in sys.path: + sys.path.insert(0, str(PROJECT_ROOT)) + +from config.services_config import get_translation_config # noqa: E402 +from translation.service import TranslationService # noqa: E402 +from translation.text_splitter import compute_safe_input_token_limit # noqa: E402 + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Benchmark a long-text translation request") + parser.add_argument("--model", default="nllb-200-distilled-600m") + parser.add_argument("--source-lang", default="zh") + parser.add_argument("--target-lang", default="en") + parser.add_argument("--scene", default="sku_name") + parser.add_argument("--source-md", default="docs/DEVELOPER_GUIDE.md") + parser.add_argument("--paragraph-min-chars", type=int, default=250) + parser.add_argument("--target-doc-chars", type=int, default=4500) + parser.add_argument("--min-doc-chars", type=int, default=2400) + parser.add_argument("--runs", type=int, default=3) + parser.add_argument("--batch-size", type=int, default=64) + parser.add_argument("--ct2-inter-threads", type=int, default=4) + parser.add_argument("--ct2-max-queued-batches", type=int, default=32) + parser.add_argument("--ct2-batch-type", default="examples") + parser.add_argument("--max-new-tokens", type=int, default=64) + parser.add_argument("--ct2-decoding-length-mode", default="source") + parser.add_argument("--ct2-decoding-length-extra", type=int, default=8) + parser.add_argument("--ct2-decoding-length-min", type=int, default=32) + return parser.parse_args() + + +def build_long_document(args: argparse.Namespace) -> str: + source_path = (PROJECT_ROOT / args.source_md).resolve() + text = source_path.read_text(encoding="utf-8") + paragraphs = [] + for raw in text.split("\n\n"): + normalized = " ".join(line.strip() for line in raw.splitlines() if line.strip()) + if len(normalized) >= args.paragraph_min_chars and not normalized.startswith("```"): + paragraphs.append(normalized) + + parts = [] + total = 0 + for paragraph in paragraphs: + parts.append(paragraph) + total += len(paragraph) + 2 + if total >= args.target_doc_chars: + break + document = "\n\n".join(parts) + if len(document) < args.min_doc_chars: + raise ValueError( + f"Prepared long document is too short: {len(document)} chars < {args.min_doc_chars}" + ) + return document + + +def build_service(args: argparse.Namespace) -> TranslationService: + config = copy.deepcopy(get_translation_config()) + 
for name, capability in config["capabilities"].items(): + capability["enabled"] = name == args.model + + capability = config["capabilities"][args.model] + capability["use_cache"] = False + capability["batch_size"] = args.batch_size + capability["ct2_inter_threads"] = args.ct2_inter_threads + capability["ct2_max_queued_batches"] = args.ct2_max_queued_batches + capability["ct2_batch_type"] = args.ct2_batch_type + capability["max_new_tokens"] = args.max_new_tokens + capability["ct2_decoding_length_mode"] = args.ct2_decoding_length_mode + capability["ct2_decoding_length_extra"] = args.ct2_decoding_length_extra + capability["ct2_decoding_length_min"] = args.ct2_decoding_length_min + config["default_model"] = args.model + return TranslationService(config) + + +def percentile(values: list[float], p: float) -> float: + if not values: + return 0.0 + ordered = sorted(values) + if len(ordered) == 1: + return float(ordered[0]) + index = min(len(ordered) - 1, max(0, round((len(ordered) - 1) * p))) + return float(ordered[index]) + + +def main() -> None: + args = parse_args() + logging.getLogger().setLevel(logging.WARNING) + + document = build_long_document(args) + load_started = time.perf_counter() + service = build_service(args) + backend = service.get_backend(args.model) + load_seconds = time.perf_counter() - load_started + + safe_input_limit = compute_safe_input_token_limit( + max_input_length=backend.max_input_length, + max_new_tokens=backend.max_new_tokens, + decoding_length_mode=backend.ct2_decoding_length_mode, + decoding_length_extra=backend.ct2_decoding_length_extra, + ) + segments = backend._split_text_if_needed( + document, + target_lang=args.target_lang, + source_lang=args.source_lang, + ) + + # Warm up once before measurements. + _ = service.translate( + document, + source_lang=args.source_lang, + target_lang=args.target_lang, + model=args.model, + scene=args.scene, + ) + if torch.cuda.is_available(): + torch.cuda.synchronize() + + latencies_ms: list[float] = [] + output_chars = 0 + for _ in range(args.runs): + started = time.perf_counter() + output = service.translate( + document, + source_lang=args.source_lang, + target_lang=args.target_lang, + model=args.model, + scene=args.scene, + ) + if torch.cuda.is_available(): + torch.cuda.synchronize() + latencies_ms.append((time.perf_counter() - started) * 1000) + output_chars += len(output or "") + + total_seconds = sum(latencies_ms) / 1000.0 + payload = { + "model": args.model, + "source_lang": args.source_lang, + "target_lang": args.target_lang, + "doc_chars": len(document), + "runs": args.runs, + "load_seconds": round(load_seconds, 3), + "batch_size": backend.batch_size, + "ct2_inter_threads": backend.ct2_inter_threads, + "ct2_max_queued_batches": backend.ct2_max_queued_batches, + "ct2_batch_type": backend.ct2_batch_type, + "max_new_tokens": backend.max_new_tokens, + "ct2_decoding_length_mode": backend.ct2_decoding_length_mode, + "ct2_decoding_length_extra": backend.ct2_decoding_length_extra, + "ct2_decoding_length_min": backend.ct2_decoding_length_min, + "safe_input_limit": safe_input_limit, + "segment_count": len(segments), + "segment_char_lengths": { + "min": min(len(segment) for segment in segments), + "max": max(len(segment) for segment in segments), + "avg": round(statistics.fmean(len(segment) for segment in segments), 1), + }, + "latency_avg_ms": round(statistics.fmean(latencies_ms), 2), + "latency_p50_ms": round(percentile(latencies_ms, 0.50), 2), + "latency_p95_ms": round(percentile(latencies_ms, 0.95), 2), + "latency_max_ms": 
round(max(latencies_ms), 2), + "input_chars_per_second": round((len(document) * args.runs) / total_seconds, 2), + "output_chars_per_second": round(output_chars / total_seconds, 2), + } + print(json.dumps(payload, ensure_ascii=False)) + + +if __name__ == "__main__": + main() diff --git a/config/config.yaml b/config/config.yaml index 578bd67..620c3dc 100644 --- a/config/config.yaml +++ b/config/config.yaml @@ -114,7 +114,7 @@ field_boosts: qanchors: 1.0 enriched_tags: 1.0 enriched_attributes.value: 1.5 - enriched_taxonomy_attributes.value: 0.3 + # enriched_taxonomy_attributes.value: 0.3 category_name_text: 2.0 category_path: 2.0 keywords: 2.0 @@ -195,7 +195,7 @@ query_config: - qanchors - enriched_tags - enriched_attributes.value - - enriched_taxonomy_attributes.value + # - enriched_taxonomy_attributes.value - option1_values - option2_values - option3_values @@ -254,7 +254,7 @@ query_config: # - qanchors # - enriched_tags # - enriched_attributes - # - enriched_taxonomy_attributes.value + # - # enriched_taxonomy_attributes.value - min_price - compare_at_price - image_url diff --git a/docs/DEVELOPER_GUIDE.md b/docs/DEVELOPER_GUIDE.md index 2c70b2e..17b8ee9 100644 --- a/docs/DEVELOPER_GUIDE.md +++ b/docs/DEVELOPER_GUIDE.md @@ -389,7 +389,7 @@ services: - **位置**:`tests/`,可按 `unit/`、`integration/` 或按模块划分子目录;公共 fixture 在 `conftest.py`。 - **标记**:使用 `@pytest.mark.unit`、`@pytest.mark.integration`、`@pytest.mark.api` 等区分用例类型,便于按需运行。 - **依赖**:单元测试通过 mock(如 `mock_es_client`、`sample_search_config`)不依赖真实 ES/DB;集成测试需在说明中注明依赖服务。 -- **运行**:`python -m pytest tests/`;仅单元:`python -m pytest tests/unit/` 或 `-m unit`。 +- **运行**:`python -m pytest tests/`;推荐最小回归:`python -m pytest tests/ci -q`;按模块聚焦可直接指定具体测试文件。 - **原则**:新增逻辑应有对应测试;修改协议或配置契约时更新相关测试与 fixture。 ### 8.3 配置与环境 diff --git a/docs/QUICKSTART.md b/docs/QUICKSTART.md index 9000ff7..db2d057 100644 --- a/docs/QUICKSTART.md +++ b/docs/QUICKSTART.md @@ -69,7 +69,7 @@ source activate.sh ./run.sh all # 仅为薄封装:等价于 ./scripts/service_ctl.sh up all # 说明: -# - all = tei cnclip embedding embedding-image translator reranker reranker-fine backend indexer frontend eval-web +# - all = tei cnclip embedding embedding-image translator reranker backend indexer frontend eval-web # - up 会同时启动 monitor daemon(运行期连续失败自动重启) # - reranker 为 GPU 强制模式(资源不足会直接启动失败) # - TEI 默认使用 GPU;当 TEI_DEVICE=cuda 且 GPU 不可用时会直接失败(不会自动降级到 CPU) diff --git a/docs/Usage-Guide.md b/docs/Usage-Guide.md index 5873680..c29619a 100644 --- a/docs/Usage-Guide.md +++ b/docs/Usage-Guide.md @@ -126,7 +126,7 @@ cd /data/saas-search 这个脚本会自动: 1. 创建日志目录 -2. 按目标启动服务(`all`:`tei cnclip embedding embedding-image translator reranker reranker-fine backend indexer frontend eval-web`) +2. 按目标启动服务(`all`:`tei cnclip embedding embedding-image translator reranker backend indexer frontend eval-web`) 3. 写入 PID 到 `logs/*.pid` 4. 执行健康检查 5. 
启动 monitor daemon(运行期连续失败自动重启) @@ -202,7 +202,7 @@ python -m pytest -q tests/test_rerank_client.py tests/test_es_query_builder.py t ./scripts/service_ctl.sh restart backend sleep 3 ./scripts/service_ctl.sh status backend -./scripts/evaluation/start_eval.sh.sh batch +./scripts/evaluation/start_eval.sh batch ``` 离线批量评估会把标注与报表写到 `artifacts/search_evaluation/`(SQLite、`batch_reports/` 下的 JSON/Markdown 等)。说明与命令见 [scripts/evaluation/README.md](../scripts/evaluation/README.md)。 diff --git a/docs/工作总结-微服务性能优化与架构.md b/docs/工作总结-微服务性能优化与架构.md index debd48e..5407569 100644 --- a/docs/工作总结-微服务性能优化与架构.md +++ b/docs/工作总结-微服务性能优化与架构.md @@ -129,7 +129,7 @@ instruction: "Given a shopping query, rank product titles by relevance" - 可选:embedding(text) **6005**、embedding-image **6008**、translator **6006**、reranker **6007**、tei **8080**、cnclip **51000**。 - 端口可由环境变量覆盖:`API_PORT`、`INDEXER_PORT`、`FRONTEND_PORT`、`EVAL_WEB_PORT`、`EMBEDDING_TEXT_PORT`、`EMBEDDING_IMAGE_PORT`、`TRANSLATION_PORT`、`RERANKER_PORT`、`TEI_PORT`、`CNCLIP_PORT`。 - **命令**: - - `./scripts/service_ctl.sh start [service...]` 或 `up all` / `start all`(all 含 tei、cnclip、embedding、embedding-image、translator、reranker、reranker-fine、backend、indexer、frontend、eval-web,按依赖顺序);`stop`、`restart`、`down` 同参数;`status` 默认列出所有服务。 + - `./scripts/service_ctl.sh start [service...]` 或 `up all` / `start all`(all 含 tei、cnclip、embedding、embedding-image、translator、reranker、backend、indexer、frontend、eval-web,按依赖顺序);`stop`、`restart`、`down` 同参数;`status` 默认列出所有服务。 - 启动时:backend/indexer/frontend/embedding/translator/reranker 会写 pid 到 `logs/.pid`,并执行 `wait_for_health`(GET `http://127.0.0.1:/health`);reranker 健康重试 90 次,其余 30 次;TEI 校验 Docker 容器存在且 `/health` 成功;cnclip 无 HTTP 健康则仅校验进程/端口。 - **监控常驻**: - `./scripts/service_ctl.sh monitor-start ` 启动后台监控进程,将 targets 写入 `logs/service-monitor.targets`,pid 写入 `logs/service-monitor.pid`,日志追加到 `logs/service-monitor.log`。 @@ -153,12 +153,12 @@ instruction: "Given a shopping query, rank product titles by relevance" ## 三、性能测试报告摘要 -以下数据来自 `docs/性能测试报告.md`,测试时间 **2026-03-12**,环境:**8 vCPU**(Intel Xeon Platinum 8255C @ 2.50GHz)、**约 15Gi 可用内存**;租户 **162** 文档数约 **53**(search/search/suggestions/rerank 与文档规模相关)。压测工具:`scripts/perf_api_benchmark.py`,场景×并发矩阵,每档 **20s**。 +以下数据来自 `docs/性能测试报告.md`,测试时间 **2026-03-12**,环境:**8 vCPU**(Intel Xeon Platinum 8255C @ 2.50GHz)、**约 15Gi 可用内存**;租户 **162** 文档数约 **53**(search/search/suggestions/rerank 与文档规模相关)。压测工具:`benchmarks/perf_api_benchmark.py`,场景×并发矩阵,每档 **20s**。 **复现命令(四场景×四并发)**: ```bash cd /data/saas-search -.venv/bin/python scripts/perf_api_benchmark.py \ +.venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario backend_search,backend_suggest,embed_text,rerank \ --concurrency-list 1,5,10,20 \ --duration 20 \ @@ -188,7 +188,7 @@ cd /data/saas-search 口径:query 固定 `wireless mouse`,每次请求 **386 docs**,句长 15–40 词随机(从 1000 词池采样);配置 `rerank_window=384`。复现命令: ```bash -.venv/bin/python scripts/perf_api_benchmark.py \ +.venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario rerank --duration 20 --concurrency-list 1,5,10,20 --timeout 60 \ --rerank-dynamic-docs --rerank-doc-count 386 --rerank-vocab-size 1000 \ --rerank-sentence-min-words 15 --rerank-sentence-max-words 40 \ @@ -217,7 +217,7 @@ cd /data/saas-search | 10 | 181 | 100% | 8.78 | 1129.23| 1295.88| 1330.96| | 20 | 161 | 100% | 7.63 | 2594.00| 4706.44| 4783.05| -**结论**:吞吐约 **8 rps** 平台化,延迟随并发上升明显,符合“检索 + 向量 + 重排”重链路特征。多租户补测(文档数 500–10000,见报告 §12)表明:文档数越大,RPS 下降、延迟升高;tenant 0(10000 doc)在并发 20 出现部分 ReadTimeout(成功率 59.02%),需注意 timeout 与容量规划;补测命令示例:`for t in 0 1 2 
3 4; do .venv/bin/python scripts/perf_api_benchmark.py --scenario backend_search --concurrency-list 1,5,10,20 --duration 20 --tenant-id $t --output perf_reports/2026-03-12/search_tenant_matrix/tenant_${t}.json; done`。 +**结论**:吞吐约 **8 rps** 平台化,延迟随并发上升明显,符合“检索 + 向量 + 重排”重链路特征。多租户补测(文档数 500–10000,见报告 §12)表明:文档数越大,RPS 下降、延迟升高;tenant 0(10000 doc)在并发 20 出现部分 ReadTimeout(成功率 59.02%),需注意 timeout 与容量规划;补测命令示例:`for t in 0 1 2 3 4; do .venv/bin/python benchmarks/perf_api_benchmark.py --scenario backend_search --concurrency-list 1,5,10,20 --duration 20 --tenant-id $t --output perf_reports/2026-03-12/search_tenant_matrix/tenant_${t}.json; done`。 --- @@ -247,5 +247,5 @@ cd /data/saas-search **关键文件与复现**: - 配置:`config/config.yaml`(services、rerank、query_config)、`.env`(端口与 API Key)。 -- 脚本:`scripts/service_ctl.sh`(启停与监控)、`scripts/perf_api_benchmark.py`(压测)、`scripts/build_suggestions.sh`(suggest 构建)。 +- 脚本:`scripts/service_ctl.sh`(启停与监控)、`benchmarks/perf_api_benchmark.py`(压测)、`scripts/build_suggestions.sh`(suggest 构建)。 - 完整步骤与多租户/rerank 对比见:`docs/性能测试报告.md`。 diff --git a/docs/性能测试报告.md b/docs/性能测试报告.md index 1380d37..806245e 100644 --- a/docs/性能测试报告.md +++ b/docs/性能测试报告.md @@ -18,13 +18,13 @@ 执行方式: - 每组压测持续 `20s` -- 使用统一脚本 `scripts/perf_api_benchmark.py` +- 使用统一脚本 `benchmarks/perf_api_benchmark.py` - 通过 `--scenario` 多值 + `--concurrency-list` 一次性跑完 `场景 x 并发` ## 3. 压测工具优化说明(复用现有脚本) 为了解决原脚本“一次只能跑一个场景+一个并发”的可用性问题,本次直接扩展现有脚本: -- `scripts/perf_api_benchmark.py` +- `benchmarks/perf_api_benchmark.py` 能力: - 一条命令执行 `场景列表 x 并发列表` 全矩阵 @@ -33,7 +33,7 @@ 示例: ```bash -.venv/bin/python scripts/perf_api_benchmark.py \ +.venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario backend_search,backend_suggest,embed_text,rerank \ --concurrency-list 1,5,10,20 \ --duration 20 \ @@ -106,7 +106,7 @@ curl -sS http://127.0.0.1:6007/health ```bash cd /data/saas-search -.venv/bin/python scripts/perf_api_benchmark.py \ +.venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario backend_search,backend_suggest,embed_text,rerank \ --concurrency-list 1,5,10,20 \ --duration 20 \ @@ -164,7 +164,7 @@ cd /data/saas-search 复现命令: ```bash -.venv/bin/python scripts/perf_api_benchmark.py \ +.venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario rerank \ --duration 20 \ --concurrency-list 1,5,10,20 \ @@ -237,7 +237,7 @@ cd /data/saas-search - 使用项目虚拟环境执行: ```bash -.venv/bin/python scripts/perf_api_benchmark.py -h +.venv/bin/python benchmarks/perf_api_benchmark.py -h ``` ### 10.3 某场景成功率下降 @@ -249,7 +249,7 @@ cd /data/saas-search ## 11. 
关联文件 -- 压测脚本:`scripts/perf_api_benchmark.py` +- 压测脚本:`benchmarks/perf_api_benchmark.py` - 本次结果:`perf_reports/2026-03-12/perf_matrix_report.json` - Search 多租户补测:`perf_reports/2026-03-12/search_tenant_matrix/` - Reranker 386 docs 口径补测:`perf_reports/2026-03-12/rerank_realistic/rerank_386docs.json` @@ -280,7 +280,7 @@ cd /data/saas-search cd /data/saas-search mkdir -p perf_reports/2026-03-12/search_tenant_matrix for t in 0 1 2 3 4; do - .venv/bin/python scripts/perf_api_benchmark.py \ + .venv/bin/python benchmarks/perf_api_benchmark.py \ --scenario backend_search \ --concurrency-list 1,5,10,20 \ --duration 20 \ diff --git a/docs/搜索API对接指南-05-索引接口(Indexer).md b/docs/搜索API对接指南-05-索引接口(Indexer).md index 70a65ee..398a5c1 100644 --- a/docs/搜索API对接指南-05-索引接口(Indexer).md +++ b/docs/搜索API对接指南-05-索引接口(Indexer).md @@ -498,7 +498,7 @@ curl -X GET "http://localhost:6004/indexer/health" #### 请求示例(完整 curl) -> 完整请求体参考 `scripts/test_build_docs_api.py` 中的 `build_sample_request()`。 +> 完整请求体参考 `tests/manual/test_build_docs_api.py` 中的 `build_sample_request()`。 ```bash # 单条 SPU 示例(含 spu、skus、options) diff --git a/docs/搜索API对接指南-10-接口级压测脚本.md b/docs/搜索API对接指南-10-接口级压测脚本.md index 68f463c..593f104 100644 --- a/docs/搜索API对接指南-10-接口级压测脚本.md +++ b/docs/搜索API对接指南-10-接口级压测脚本.md @@ -4,7 +4,7 @@ ## 10. 接口级压测脚本 -仓库提供统一压测脚本:`scripts/perf_api_benchmark.py`,用于对以下接口做并发压测: +仓库提供统一压测脚本:`benchmarks/perf_api_benchmark.py`,用于对以下接口做并发压测: - 后端搜索:`POST /search/` - 搜索建议:`GET /search/suggestions` @@ -18,21 +18,21 @@ ```bash # suggest 压测(tenant 162) -python scripts/perf_api_benchmark.py \ +python benchmarks/perf_api_benchmark.py \ --scenario backend_suggest \ --tenant-id 162 \ --duration 30 \ --concurrency 50 # search 压测 -python scripts/perf_api_benchmark.py \ +python benchmarks/perf_api_benchmark.py \ --scenario backend_search \ --tenant-id 162 \ --duration 30 \ --concurrency 20 # 全链路压测(search + suggest + embedding + translate + rerank) -python scripts/perf_api_benchmark.py \ +python benchmarks/perf_api_benchmark.py \ --scenario all \ --tenant-id 162 \ --duration 60 \ @@ -45,17 +45,16 @@ python scripts/perf_api_benchmark.py \ 可通过 `--cases-file` 覆盖默认请求模板。示例文件: ```bash -scripts/perf_cases.json.example +benchmarks/perf_cases.json.example ``` 执行示例: ```bash -python scripts/perf_api_benchmark.py \ +python benchmarks/perf_api_benchmark.py \ --scenario all \ --tenant-id 162 \ - --cases-file scripts/perf_cases.json.example \ + --cases-file benchmarks/perf_cases.json.example \ --duration 60 \ --concurrency 40 ``` - diff --git a/docs/相关性检索优化说明.md b/docs/相关性检索优化说明.md index a313bbc..d601647 100644 --- a/docs/相关性检索优化说明.md +++ b/docs/相关性检索优化说明.md @@ -330,7 +330,7 @@ python -m pytest -q tests/test_rerank_client.py tests/test_es_query_builder.py t ./scripts/service_ctl.sh restart backend sleep 3 ./scripts/service_ctl.sh status backend -./scripts/evaluation/start_eval.sh.sh batch +./scripts/evaluation/start_eval.sh batch ``` 评估产物在 `artifacts/search_evaluation/`(如 `search_eval.sqlite3`、`batch_reports/` 下的 JSON/Markdown)。流程与参数说明见 [scripts/evaluation/README.md](../scripts/evaluation/README.md)。 @@ -895,4 +895,3 @@ rerank_score:0.4784 rerank_score:0.5849 "zh": "新款女士修身仿旧牛仔短裤 – 休闲性感磨边水洗牛仔短裤,时尚舒", "en": "New Women's Slim-fit Vintage Washed Denim Shorts – Casual Sexy Frayed Hem, Fashionable & Comfortable" - diff --git a/embeddings/README.md b/embeddings/README.md index e7acb6c..fa99e3b 100644 --- a/embeddings/README.md +++ b/embeddings/README.md @@ -98,10 +98,10 @@ ### 性能与压测(沿用仓库脚本) -- 接口级压测(与 `perf_reports/2026-03-12/matrix_report/` 
等方法一致):`scripts/perf_api_benchmark.py` - - 示例:`python scripts/perf_api_benchmark.py --scenario embed_text --duration 30 --concurrency 20` +- 接口级压测(与 `perf_reports/2026-03-12/matrix_report/` 等方法一致):`benchmarks/perf_api_benchmark.py` + - 示例:`python benchmarks/perf_api_benchmark.py --scenario embed_text --duration 30 --concurrency 20` - 文本/图片向量可带 `priority`(与线上 admission 语义一致):`--embed-text-priority 1`、`--embed-image-priority 1` - - 自定义请求模板:`--cases-file scripts/perf_cases.json.example` + - 自定义请求模板:`--cases-file benchmarks/perf_cases.json.example` - 历史矩阵结果与说明见 `perf_reports/2026-03-12/matrix_report/summary.md`。 ### 启动服务 diff --git a/perf_reports/20260311/reranker_1000docs/report.md b/perf_reports/20260311/reranker_1000docs/report.md index 5ce73aa..6e888c8 100644 --- a/perf_reports/20260311/reranker_1000docs/report.md +++ b/perf_reports/20260311/reranker_1000docs/report.md @@ -34,5 +34,5 @@ Workload profile: ## Reproduce ```bash -./scripts/benchmark_reranker_1000docs.sh +./benchmarks/reranker/benchmark_reranker_1000docs.sh ``` diff --git a/perf_reports/20260317/translation_local_models/README.md b/perf_reports/20260317/translation_local_models/README.md index f347d60..20bcb86 100644 --- a/perf_reports/20260317/translation_local_models/README.md +++ b/perf_reports/20260317/translation_local_models/README.md @@ -1,6 +1,6 @@ # Local Translation Model Benchmark Report -Test script: [`scripts/benchmark_translation_local_models.py`](/data/saas-search/scripts/benchmark_translation_local_models.py) +Test script: [`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py) Test time: `2026-03-17` @@ -67,7 +67,7 @@ To model online search query translation, we reran NLLB with `batch_size=1`. 
In Command used: ```bash -./.venv-translator/bin/python scripts/benchmark_translation_local_models.py \ +./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py \ --single \ --model nllb-200-distilled-600m \ --source-lang zh \ diff --git a/perf_reports/20260318/nllb_t4_product_names_ct2/README.md b/perf_reports/20260318/nllb_t4_product_names_ct2/README.md index c4107aa..4ac5f1a 100644 --- a/perf_reports/20260318/nllb_t4_product_names_ct2/README.md +++ b/perf_reports/20260318/nllb_t4_product_names_ct2/README.md @@ -1,7 +1,7 @@ # NLLB T4 Product-Name Tuning Summary 测试脚本: -- [`scripts/benchmark_nllb_t4_tuning.py`](/data/saas-search/scripts/benchmark_nllb_t4_tuning.py) +- [`benchmarks/translation/benchmark_nllb_t4_tuning.py`](/data/saas-search/benchmarks/translation/benchmark_nllb_t4_tuning.py) 本轮报告: - Markdown:[`nllb_t4_tuning_003608.md`](/data/saas-search/perf_reports/20260318/nllb_t4_product_names_ct2/nllb_t4_tuning_003608.md) diff --git a/perf_reports/20260318/translation_local_models/README.md b/perf_reports/20260318/translation_local_models/README.md index cdee75b..c82f5f6 100644 --- a/perf_reports/20260318/translation_local_models/README.md +++ b/perf_reports/20260318/translation_local_models/README.md @@ -1,7 +1,7 @@ # Local Translation Model Benchmark Report 测试脚本: -- [`scripts/benchmark_translation_local_models.py`](/data/saas-search/scripts/benchmark_translation_local_models.py) +- [`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py) 完整结果: - Markdown:[`translation_local_models_extended_221846.md`](/data/saas-search/perf_reports/20260318/translation_local_models/translation_local_models_extended_221846.md) @@ -39,7 +39,7 @@ ```bash cd /data/saas-search -./.venv-translator/bin/python scripts/benchmark_translation_local_models.py \ +./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py \ --suite extended \ --disable-cache \ --serial-items-per-case 256 \ diff --git a/perf_reports/20260318/translation_local_models_ct2/README.md b/perf_reports/20260318/translation_local_models_ct2/README.md index 3d0dbc4..712d482 100644 --- a/perf_reports/20260318/translation_local_models_ct2/README.md +++ b/perf_reports/20260318/translation_local_models_ct2/README.md @@ -1,7 +1,7 @@ # Local Translation Model Benchmark Report (CTranslate2) 测试脚本: -- [`scripts/benchmark_translation_local_models.py`](/data/saas-search/scripts/benchmark_translation_local_models.py) +- [`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py) 本轮 CT2 结果: - Markdown:[`translation_local_models_ct2_extended_233253.md`](/data/saas-search/perf_reports/20260318/translation_local_models_ct2/translation_local_models_ct2_extended_233253.md) @@ -46,7 +46,7 @@ from datetime import datetime from pathlib import Path from types import SimpleNamespace -from scripts.benchmark_translation_local_models import ( +from benchmarks.translation.benchmark_translation_local_models import ( SCENARIOS, benchmark_extended_scenario, build_environment_info, diff --git a/perf_reports/20260318/translation_local_models_ct2_focus/README.md b/perf_reports/20260318/translation_local_models_ct2_focus/README.md index 2092880..16e46e6 100644 --- a/perf_reports/20260318/translation_local_models_ct2_focus/README.md +++ b/perf_reports/20260318/translation_local_models_ct2_focus/README.md @@ -1,7 +1,7 @@ # Local Translation 
Model Focused T4 Tuning 测试脚本: -- [`scripts/benchmark_translation_local_models_focus.py`](/data/saas-search/scripts/benchmark_translation_local_models_focus.py) +- [`benchmarks/translation/benchmark_translation_local_models_focus.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models_focus.py) 本轮聚焦结果: - Markdown:[`translation_local_models_focus_235018.md`](/data/saas-search/perf_reports/20260318/translation_local_models_ct2_focus/translation_local_models_focus_235018.md) diff --git a/perf_reports/README.md b/perf_reports/README.md index c918ef4..8f529f3 100644 --- a/perf_reports/README.md +++ b/perf_reports/README.md @@ -4,7 +4,7 @@ | 脚本 | 用途 | |------|------| -| `scripts/perf_api_benchmark.py` | 搜索后端、向量、翻译、重排等 HTTP 接口压测;支持 `--embed-text-priority` / `--embed-image-priority` 与 `scripts/perf_cases.json.example` | +| `benchmarks/perf_api_benchmark.py` | 搜索后端、向量、翻译、重排等 HTTP 接口压测;支持 `--embed-text-priority` / `--embed-image-priority` 与 `benchmarks/perf_cases.json.example` | 历史矩阵示例(并发扫描): @@ -25,10 +25,10 @@ ```bash source activate.sh -python scripts/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --timeout 30 --output perf_reports/2026-03-20_embed_text_p0.json -python scripts/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --embed-text-priority 1 --output perf_reports/2026-03-20_embed_text_p1.json -python scripts/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --timeout 60 --output perf_reports/2026-03-20_embed_image_p0.json -python scripts/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --embed-image-priority 1 --output perf_reports/2026-03-20_embed_image_p1.json +python benchmarks/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --timeout 30 --output perf_reports/2026-03-20_embed_text_p0.json +python benchmarks/perf_api_benchmark.py --scenario embed_text --duration 8 --concurrency 10 --embed-text-priority 1 --output perf_reports/2026-03-20_embed_text_p1.json +python benchmarks/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --timeout 60 --output perf_reports/2026-03-20_embed_image_p0.json +python benchmarks/perf_api_benchmark.py --scenario embed_image --duration 8 --concurrency 5 --embed-image-priority 1 --output perf_reports/2026-03-20_embed_image_p1.json ``` 说明:本次为 **8 秒 smoke**,与 `2026-03-12` 矩阵的时长/并发不可直接横向对比;仅验证 `priority` 参数下服务仍返回 200 且 payload 校验通过。 diff --git a/perf_reports/reranker_vllm_instruction/2026-03-25/RESULTS.md b/perf_reports/reranker_vllm_instruction/2026-03-25/RESULTS.md index 71a832f..0de29c3 100644 --- a/perf_reports/reranker_vllm_instruction/2026-03-25/RESULTS.md +++ b/perf_reports/reranker_vllm_instruction/2026-03-25/RESULTS.md @@ -25,7 +25,7 @@ Shared across both backends for this run: ## Methodology -- Script: `python scripts/benchmark_reranker_random_titles.py 100,200,400,600,800,1000 --repeat 5` with **`--seed 99`** (see note below), **`--quiet-runs`**, **`--timeout 360`**. +- Script: `python benchmarks/reranker/benchmark_reranker_random_titles.py 100,200,400,600,800,1000 --repeat 5` with **`--seed 99`** (see note below), **`--quiet-runs`**, **`--timeout 360`**. - Titles: default file `/home/ubuntu/rerank_test/titles.1.8w` (one title per line). - Query: default `健身女生T恤短袖`. - Each scenario: **3 warm-up** requests at `n=400` (not timed), then **5 timed** runs per `n`. 
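For readers who want the shape of this methodology without the full script, here is a minimal sketch (not the benchmark itself). It assumes only what the bullets above state: an HTTP `/rerank` endpoint at the default `RERANK_BASE`, a newline-separated titles file, 3 untimed warm-ups at `n=400`, then 5 timed runs per `n`. The payload shape `{"query", "docs", "normalize"}` follows the full `benchmark_reranker_random_titles.py` included later in this patch; the single shared RNG is a simplification of that script's per-`(n, run)` derived seeds.

```python
"""Minimal sketch of the warm-up-then-timed-runs loop described above."""
import random
import statistics
import time

import httpx

BASE = "http://127.0.0.1:6007"  # default RERANK_BASE used throughout this patch
QUERY = "健身女生T恤短袖"          # default query from the methodology above


def timed_rerank(client: httpx.Client, docs: list[str]) -> float:
    """POST one rerank request and return wall-clock latency in ms."""
    t0 = time.perf_counter()
    resp = client.post(f"{BASE}/rerank", json={"query": QUERY, "docs": docs, "normalize": True})
    resp.raise_for_status()
    return (time.perf_counter() - t0) * 1000.0


def main() -> None:
    with open("/home/ubuntu/rerank_test/titles.1.8w", encoding="utf-8") as fh:
        titles = [line.strip() for line in fh if line.strip()]
    rng = random.Random(99)  # --seed 99, as in the run documented here
    with httpx.Client(timeout=360.0) as client:  # --timeout 360
        for _ in range(3):  # 3 warm-up requests at n=400, not timed
            timed_rerank(client, rng.sample(titles, 400))
        for n in (100, 200, 400, 600, 800, 1000):
            runs = [timed_rerank(client, rng.sample(titles, n)) for _ in range(5)]
            print(f"n={n} mean_ms={statistics.mean(runs):.2f} stdev_ms={statistics.stdev(runs):.2f}")


if __name__ == "__main__":
    main()
```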
@@ -56,9 +56,9 @@ JSON aggregates (means, stdev, raw `values_ms`): same directory, `qwen3_vllm_{co ## Tooling added / changed - `reranker/server.py`: `/health` includes `instruction_format` when the active backend sets `_instruction_format`. -- `scripts/benchmark_reranker_random_titles.py`: `--tag`, `--json-summary-out`, `--quiet-runs`. -- `scripts/patch_rerank_vllm_benchmark_config.py`: surgical YAML patch (preserves newlines). -- `scripts/run_reranker_vllm_instruction_benchmark.sh`: full matrix driver (continues if a benchmark exits non-zero; uses `--timeout 360`). +- `benchmarks/reranker/benchmark_reranker_random_titles.py`: `--tag`, `--json-summary-out`, `--quiet-runs`. +- `benchmarks/reranker/patch_rerank_vllm_benchmark_config.py`: surgical YAML patch (preserves newlines). +- `benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh`: full matrix driver (continues if a benchmark exits non-zero; uses `--timeout 360`). --- @@ -73,7 +73,7 @@ JSON aggregates (means, stdev, raw `values_ms`): same directory, `qwen3_vllm_{co | Attention | Backend forced / steered attention on T4 (e.g. `TRITON_ATTN` path) | **No** `attention_config` in `LLM(...)`; vLLM **auto** — on this T4 run, logs show **`FLASHINFER`** | | Config surface | `vllm_attention_backend` / `RERANK_VLLM_ATTENTION_BACKEND` 等 | **Removed**(少 YAML/环境变量分支,逻辑收敛) | | Code default `instruction_format` | `qwen3_vllm_score` 默认 `standard` | 与 `qwen3_vllm` 对齐为 **`compact`**(仍可在 YAML 写 `standard`) | -| Smoke / 启动 | — | `scripts/smoke_qwen3_vllm_score_backend.py`;`scripts/start_reranker.sh` 将 **venv `bin` 置于 `PATH`**(FLASHINFER JIT 依赖 venv 内的 `ninja`) | +| Smoke / 启动 | — | `benchmarks/reranker/smoke_qwen3_vllm_score_backend.py`;`scripts/start_reranker.sh` 将 **venv `bin` 置于 `PATH`**(FLASHINFER JIT 依赖 venv 内的 `ninja`) | Micro-benchmark (same machine, isolated): **~927.5 ms → ~673.1 ms** at **n=400** docs on `LLM.score()` steady state (~**28%**), after removing the forced attention path and letting vLLM pick **FLASHINFER**. 
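To make the "steady state" measurement concrete, below is a sketch of how such an isolated `LLM.score()` probe can be written. The model identifier and the `task="score"` constructor argument are assumptions for illustration; the table above only establishes that `LLM(...)` is built without `attention_config`, so vLLM auto-selects the attention backend (FLASHINFER on this T4).

```python
# Sketch of an isolated LLM.score() steady-state probe. Model id and task=
# kwarg are assumptions, not taken from this patch.
import statistics
import time

from vllm import LLM

llm = LLM(model="Qwen/Qwen3-Reranker-0.6B", task="score")  # hypothetical model id

query = "健身女生T恤短袖"
docs = [f"sample product title {i}" for i in range(400)]  # n=400, as in the micro-benchmark above

llm.score(query, docs)  # first call pays warm-up/JIT cost (e.g. FLASHINFER kernels)

timings = []
for _ in range(5):
    t0 = time.perf_counter()
    llm.score(query, docs)
    timings.append((time.perf_counter() - t0) * 1000.0)

print(f"steady-state mean: {statistics.mean(timings):.1f} ms (min {min(timings):.1f})")
```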
diff --git a/reranker/DEPLOYMENT_AND_TUNING.md b/reranker/DEPLOYMENT_AND_TUNING.md index 289873a..cfb94d8 100644 --- a/reranker/DEPLOYMENT_AND_TUNING.md +++ b/reranker/DEPLOYMENT_AND_TUNING.md @@ -109,7 +109,7 @@ curl -sS http://127.0.0.1:6007/health ### 5.1 使用一键压测脚本 ```bash -./scripts/benchmark_reranker_1000docs.sh +./benchmarks/reranker/benchmark_reranker_1000docs.sh ``` 输出目录: diff --git a/reranker/GGUF_0_6B_INSTALL_AND_TUNING.md b/reranker/GGUF_0_6B_INSTALL_AND_TUNING.md index 68e1ddd..548f53f 100644 --- a/reranker/GGUF_0_6B_INSTALL_AND_TUNING.md +++ b/reranker/GGUF_0_6B_INSTALL_AND_TUNING.md @@ -144,7 +144,7 @@ qwen3_gguf_06b: ```bash PYTHONPATH=/data/saas-search ./.venv-reranker-gguf/bin/python \ - scripts/benchmark_reranker_gguf_local.py --backend-name qwen3_gguf_06b --docs 400 + benchmarks/reranker/benchmark_reranker_gguf_local.py --backend-name qwen3_gguf_06b --docs 400 ``` 按服务方式启动: diff --git a/reranker/GGUF_INSTALL_AND_TUNING.md b/reranker/GGUF_INSTALL_AND_TUNING.md index 773b249..43568fa 100644 --- a/reranker/GGUF_INSTALL_AND_TUNING.md +++ b/reranker/GGUF_INSTALL_AND_TUNING.md @@ -117,7 +117,7 @@ HF_HUB_DISABLE_XET=1 ```bash PYTHONPATH=/data/saas-search ./.venv-reranker-gguf/bin/python \ - scripts/benchmark_reranker_gguf_local.py --docs 64 --repeat 1 + benchmarks/reranker/benchmark_reranker_gguf_local.py --docs 64 --repeat 1 ``` 它会直接实例化 GGUF backend,输出: @@ -134,7 +134,7 @@ PYTHONPATH=/data/saas-search ./.venv-reranker-gguf/bin/python \ - Query: `白色oversized T-shirt` - Docs: `64` 条商品标题 -- 本地脚本:`scripts/benchmark_reranker_gguf_local.py` +- 本地脚本:`benchmarks/reranker/benchmark_reranker_gguf_local.py` - 每组 1 次,重点比较相对趋势 结果: @@ -195,7 +195,7 @@ n_gpu_layers=999 ```bash RERANK_BASE=http://127.0.0.1:6007 \ - ./.venv/bin/python scripts/benchmark_reranker_random_titles.py 64 --repeat 1 --query '白色oversized T-shirt' + ./.venv/bin/python benchmarks/reranker/benchmark_reranker_random_titles.py 64 --repeat 1 --query '白色oversized T-shirt' ``` 得到: @@ -206,7 +206,7 @@ RERANK_BASE=http://127.0.0.1:6007 \ ```bash RERANK_BASE=http://127.0.0.1:6007 \ - ./.venv/bin/python scripts/benchmark_reranker_random_titles.py 153 --repeat 1 --query '白色oversized T-shirt' + ./.venv/bin/python benchmarks/reranker/benchmark_reranker_random_titles.py 153 --repeat 1 --query '白色oversized T-shirt' ``` 得到: @@ -276,5 +276,5 @@ offload_kqv: true - `config/config.yaml` - `scripts/setup_reranker_venv.sh` - `scripts/start_reranker.sh` -- `scripts/benchmark_reranker_gguf_local.py` +- `benchmarks/reranker/benchmark_reranker_gguf_local.py` - `reranker/GGUF_INSTALL_AND_TUNING.md` diff --git a/reranker/README.md b/reranker/README.md index c5b6235..5c2e8fe 100644 --- a/reranker/README.md +++ b/reranker/README.md @@ -46,9 +46,9 @@ Reranker 服务提供统一的 `/rerank` API,支持可插拔后端(BGE、Jin - `backends/dashscope_rerank.py`:DashScope 云端重排后端 - `scripts/setup_reranker_venv.sh`:按后端创建独立 venv - `scripts/start_reranker.sh`:启动 reranker 服务 -- `scripts/smoke_qwen3_vllm_score_backend.py`:`qwen3_vllm_score` 本地 smoke -- `scripts/benchmark_reranker_random_titles.py`:随机标题压测脚本 -- `scripts/run_reranker_vllm_instruction_benchmark.sh`:历史矩阵脚本 +- `benchmarks/reranker/smoke_qwen3_vllm_score_backend.py`:`qwen3_vllm_score` 本地 smoke +- `benchmarks/reranker/benchmark_reranker_random_titles.py`:随机标题压测脚本 +- `benchmarks/reranker/run_reranker_vllm_instruction_benchmark.sh`:历史矩阵脚本 ## 环境基线 @@ -118,7 +118,7 @@ nvidia-smi ### 4. Smoke ```bash -PYTHONPATH=. ./.venv-reranker-score/bin/python scripts/smoke_qwen3_vllm_score_backend.py --gpu-memory-utilization 0.2 +PYTHONPATH=. 
./.venv-reranker-score/bin/python benchmarks/reranker/smoke_qwen3_vllm_score_backend.py --gpu-memory-utilization 0.2 ``` ## `jina_reranker_v3` diff --git a/scripts/README.md b/scripts/README.md new file mode 100644 index 0000000..8c6a3b6 --- /dev/null +++ b/scripts/README.md @@ -0,0 +1,53 @@ +# Scripts + +`scripts/` 现在只保留当前架构下仍然有效的运行、运维、环境和数据处理脚本。 + +## 当前分类 + +- 服务编排 + - `service_ctl.sh` + - `start_backend.sh` + - `start_indexer.sh` + - `start_frontend.sh` + - `start_eval_web.sh` + - `start_embedding_service.sh` + - `start_embedding_text_service.sh` + - `start_embedding_image_service.sh` + - `start_reranker.sh` + - `start_translator.sh` + - `start_tei_service.sh` + - `start_cnclip_service.sh` + - `stop.sh` + - `stop_tei_service.sh` + - `stop_cnclip_service.sh` + +- 环境初始化 + - `create_venv.sh` + - `init_env.sh` + - `setup_embedding_venv.sh` + - `setup_reranker_venv.sh` + - `setup_translator_venv.sh` + - `setup_cnclip_venv.sh` + +- 数据与索引 + - `create_tenant_index.sh` + - `build_suggestions.sh` + - `mock_data.sh` + +- 评估与专项工具 + - `evaluation/` + - `redis/` + - `debug/` + +## 已迁移 + +- 基准压测与 smoke 脚本:迁到 `benchmarks/` +- 手工接口试跑脚本:迁到 `tests/manual/` + +## 已清理 + +- 历史备份目录:`indexer__old_2025_11/` +- 过时壳脚本:`start.sh` +- Conda 时代残留:`install_server_deps.sh` + +后续如果新增脚本,优先放到明确子目录,不再把 benchmark、manual、历史备份直接丢回根 `scripts/`。 diff --git a/scripts/benchmark_nllb_t4_tuning.py b/scripts/benchmark_nllb_t4_tuning.py deleted file mode 100644 index b33459f..0000000 --- a/scripts/benchmark_nllb_t4_tuning.py +++ /dev/null @@ -1,318 +0,0 @@ -#!/usr/bin/env python3 -"""Focused NLLB T4 tuning benchmark for product-name translation.""" - -from __future__ import annotations - -import argparse -import copy -import json -import sys -from datetime import datetime -from pathlib import Path -from typing import Any, Dict, List, Tuple - -PROJECT_ROOT = Path(__file__).resolve().parent.parent -if str(PROJECT_ROOT) not in sys.path: - sys.path.insert(0, str(PROJECT_ROOT)) - -from config.services_config import get_translation_config -from scripts.benchmark_translation_local_models import ( - benchmark_concurrency_case, - benchmark_serial_case, - build_environment_info, - ensure_cuda_stats_reset, - load_texts, -) -from translation.service import TranslationService - - -SCENARIOS = [ - { - "name": "nllb zh->en", - "model": "nllb-200-distilled-600m", - "source_lang": "zh", - "target_lang": "en", - "column": "title_cn", - "scene": "sku_name", - }, - { - "name": "nllb en->zh", - "model": "nllb-200-distilled-600m", - "source_lang": "en", - "target_lang": "zh", - "column": "title", - "scene": "sku_name", - }, -] - -VARIANTS = [ - { - "name": "ct2_default_fixed64", - "description": "Original CT2 default", - "overrides": { - "ct2_inter_threads": 1, - "ct2_max_queued_batches": 0, - "ct2_batch_type": "examples", - "max_new_tokens": 64, - }, - }, - { - "name": "ct2_prev_t4_fixed64", - "description": "Previous T4 tuning result", - "overrides": { - "ct2_inter_threads": 2, - "ct2_max_queued_batches": 16, - "ct2_batch_type": "examples", - "max_new_tokens": 64, - }, - }, - { - "name": "ct2_best_t4_dynamic", - "description": "Recommended T4 profile after this round", - "overrides": { - "ct2_inter_threads": 4, - "ct2_max_queued_batches": 32, - "ct2_batch_type": "examples", - "max_new_tokens": 64, - "ct2_decoding_length_mode": "source", - "ct2_decoding_length_extra": 8, - "ct2_decoding_length_min": 32, - }, - }, - { - "name": "ct2_fixed48_experiment", - "description": "High-gain experiment with truncation risk", - "overrides": { - "ct2_inter_threads": 
3, - "ct2_max_queued_batches": 16, - "ct2_batch_type": "examples", - "max_new_tokens": 48, - }, - }, -] - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Focused NLLB T4 tuning benchmark") - parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path") - parser.add_argument( - "--output-dir", - default="perf_reports/20260318/nllb_t4_product_names_ct2", - help="Directory for JSON/Markdown reports", - ) - parser.add_argument("--batch-size", type=int, default=64, help="Batch size for the bulk scenario") - parser.add_argument("--batch-items", type=int, default=256, help="Rows used for the bulk scenario") - parser.add_argument("--concurrency", type=int, default=64, help="Concurrency for the online scenario") - parser.add_argument( - "--requests-per-case", - type=int, - default=24, - help="Requests per worker in the online scenario", - ) - parser.add_argument("--quality-samples", type=int, default=100, help="Rows used for quality spot-checks") - parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring") - return parser.parse_args() - - -def build_service(model: str, overrides: Dict[str, Any]) -> Tuple[TranslationService, Dict[str, Any]]: - config = copy.deepcopy(get_translation_config()) - for name, cfg in config["capabilities"].items(): - cfg["enabled"] = name == model - cfg["use_cache"] = False - config["default_model"] = model - capability = config["capabilities"][model] - capability.update(overrides) - return TranslationService(config), capability - - -def build_quality_reference_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]: - reference = dict(overrides) - reference.pop("ct2_decoding_length_mode", None) - reference.pop("ct2_decoding_length_extra", None) - reference.pop("ct2_decoding_length_min", None) - reference["max_new_tokens"] = max(64, int(reference.get("max_new_tokens", 64))) - return reference - - -def summarize_quality(reference_outputs: List[Any], candidate_outputs: List[Any], texts: List[str]) -> Dict[str, Any]: - same = 0 - diffs: List[Dict[str, str]] = [] - for text, ref_output, candidate_output in zip(texts, reference_outputs, candidate_outputs): - if ref_output == candidate_output: - same += 1 - continue - if len(diffs) < 3: - diffs.append( - { - "input": text, - "candidate": "" if candidate_output is None else str(candidate_output), - "reference": "" if ref_output is None else str(ref_output), - } - ) - return { - "same": same, - "total": len(texts), - "changed": len(texts) - same, - "sample_diffs": diffs, - } - - -def render_markdown(report: Dict[str, Any]) -> str: - lines = [ - "# NLLB T4 Product-Name Tuning", - "", - f"- Generated at: `{report['generated_at']}`", - f"- Python: `{report['environment']['python']}`", - f"- Torch: `{report['environment']['torch']}`", - f"- Transformers: `{report['environment']['transformers']}`", - f"- CUDA: `{report['environment']['cuda_available']}`", - ] - if report["environment"]["gpu_name"]: - lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") - lines.extend( - [ - "", - "## Scope", - "", - f"- Bulk scenario: `batch={report['config']['batch_size']}, concurrency=1`", - f"- Online scenario: `batch=1, concurrency={report['config']['concurrency']}`", - f"- Online requests per worker: `{report['config']['requests_per_case']}`", - f"- Quality spot-check samples: `{report['config']['quality_samples']}`", - "", - "## Variants", - "", - ] - ) - for variant 
in report["variants"]: - lines.append(f"- `{variant['name']}`: {variant['description']} -> `{variant['overrides']}`") - - for scenario in report["scenarios"]: - lines.extend( - [ - "", - f"## {scenario['name']}", - "", - "| Variant | Bulk items/s | Bulk p95 ms | Online items/s | Online p95 ms | Quality same/total |", - "|---|---:|---:|---:|---:|---:|", - ] - ) - for variant in scenario["variants"]: - quality = variant["quality_vs_reference"] - lines.append( - f"| {variant['name']} | {variant['bulk']['items_per_second']} | {variant['bulk']['request_latency_p95_ms']} | " - f"{variant['online']['items_per_second']} | {variant['online']['request_latency_p95_ms']} | " - f"{quality['same']}/{quality['total']} |" - ) - for variant in scenario["variants"]: - quality = variant["quality_vs_reference"] - if not quality["sample_diffs"]: - continue - lines.extend( - [ - "", - f"### Quality Notes: {variant['name']}", - "", - ] - ) - for diff in quality["sample_diffs"]: - lines.append(f"- Input: `{diff['input']}`") - lines.append(f"- Candidate: `{diff['candidate']}`") - lines.append(f"- Reference: `{diff['reference']}`") - lines.append("") - - return "\n".join(lines).rstrip() + "\n" - - -def main() -> None: - args = parse_args() - csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) - output_dir = (PROJECT_ROOT / args.output_dir).resolve() if not Path(args.output_dir).is_absolute() else Path(args.output_dir) - output_dir.mkdir(parents=True, exist_ok=True) - - report: Dict[str, Any] = { - "generated_at": datetime.now().isoformat(timespec="seconds"), - "environment": build_environment_info(), - "config": { - "csv_path": str(csv_path), - "batch_size": args.batch_size, - "batch_items": args.batch_items, - "concurrency": args.concurrency, - "requests_per_case": args.requests_per_case, - "quality_samples": args.quality_samples, - }, - "variants": VARIANTS, - "scenarios": [], - } - - for scenario in SCENARIOS: - batch_texts = load_texts(csv_path, scenario["column"], args.batch_items) - online_texts = load_texts(csv_path, scenario["column"], args.concurrency * args.requests_per_case) - quality_texts = load_texts(csv_path, scenario["column"], args.quality_samples) - - scenario_report = dict(scenario) - scenario_report["variants"] = [] - for variant in VARIANTS: - print(f"[start] {scenario['name']} | {variant['name']}", flush=True) - ensure_cuda_stats_reset() - service, capability = build_service(scenario["model"], variant["overrides"]) - backend = service.get_backend(scenario["model"]) - bulk = benchmark_serial_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=batch_texts, - batch_size=args.batch_size, - warmup_batches=args.warmup_batches, - ) - online = benchmark_concurrency_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=online_texts, - batch_size=1, - concurrency=args.concurrency, - requests_per_case=args.requests_per_case, - warmup_batches=args.warmup_batches, - ) - quality_reference_overrides = build_quality_reference_overrides(variant["overrides"]) - reference_service, _ = build_service(scenario["model"], quality_reference_overrides) - reference_outputs = reference_service.translate( - quality_texts, - source_lang=scenario["source_lang"], - target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - candidate_outputs = service.translate( - quality_texts, - source_lang=scenario["source_lang"], - 
target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - scenario_report["variants"].append( - { - "name": variant["name"], - "description": variant["description"], - "overrides": variant["overrides"], - "quality_reference_overrides": quality_reference_overrides, - "bulk": bulk, - "online": online, - "quality_vs_reference": summarize_quality(reference_outputs, candidate_outputs, quality_texts), - } - ) - report["scenarios"].append(scenario_report) - - timestamp = datetime.now().strftime("%H%M%S") - json_path = output_dir / f"nllb_t4_tuning_{timestamp}.json" - md_path = output_dir / f"nllb_t4_tuning_{timestamp}.md" - json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8") - md_path.write_text(render_markdown(report), encoding="utf-8") - print(f"JSON_REPORT={json_path}") - print(f"MARKDOWN_REPORT={md_path}") - - -if __name__ == "__main__": - main() diff --git a/scripts/benchmark_reranker_1000docs.sh b/scripts/benchmark_reranker_1000docs.sh deleted file mode 100755 index a90e26d..0000000 --- a/scripts/benchmark_reranker_1000docs.sh +++ /dev/null @@ -1,130 +0,0 @@ -#!/bin/bash -# -# Benchmark reranker for e-commerce short-text workload: -# - query <= ~100 tokens -# - docs are short title / title+brief -# - one request contains ~1000 docs -# -# Outputs JSON reports under perf_reports//reranker_1000docs/ -# -# Usage: -# ./scripts/benchmark_reranker_1000docs.sh -# Optional env: -# BATCH_SIZES="24 32 48 64" -# C1_REQUESTS=4 -# C4_REQUESTS=8 -# TENANT_ID=162 -# -set -euo pipefail - -PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)" -cd "${PROJECT_ROOT}" - -TENANT_ID="${TENANT_ID:-162}" -BATCH_SIZES="${BATCH_SIZES:-24 32 48 64}" -C1_REQUESTS="${C1_REQUESTS:-4}" -C4_REQUESTS="${C4_REQUESTS:-8}" -TIMEOUT_SEC="${TIMEOUT_SEC:-240}" -RERANK_BASE="${RERANK_BASE:-http://127.0.0.1:6007}" - -DATE_TAG="$(date +%Y%m%d)" -OUT_DIR="perf_reports/${DATE_TAG}/reranker_1000docs" -TMP_CASES="/tmp/rerank_1000_shortdocs_cases.json" -mkdir -p "${OUT_DIR}" - -cleanup() { - ./scripts/service_ctl.sh stop reranker >/dev/null 2>&1 || true -} -trap cleanup EXIT - -cat > "${TMP_CASES}" <<'JSON' -{ - "scenarios": { - "rerank": [ - { - "method": "POST", - "path": "/rerank", - "json": { - "query": "wireless ergonomic gaming mouse for office use with rechargeable battery and bluetooth", - "docs": [], - "normalize": true - } - } - ] - } -} -JSON - -python3 - <<'PY' -import json -from pathlib import Path - -p = Path("/tmp/rerank_1000_shortdocs_cases.json") -d = json.loads(p.read_text(encoding="utf-8")) -docs = [] -for i in range(1000): - if i % 3 == 0: - doc = f"wireless mouse model {i} ergonomic grip 2.4g bluetooth" - elif i % 3 == 1: - doc = f"gaming mouse {i} rgb lightweight high precision sensor" - else: - doc = f"office mouse {i} rechargeable silent click compact" - if i % 5 == 0: - doc += " with usb receiver" - if i % 7 == 0: - doc += " long battery life" - docs.append(doc) - -d["scenarios"]["rerank"][0]["json"]["docs"] = docs -p.write_text(json.dumps(d, ensure_ascii=False), encoding="utf-8") -print(f"[info] generated docs={len(docs)} at {p}") -PY - -run_bench() { - local bs="$1" - local c="$2" - local req="$3" - local out="${OUT_DIR}/rerank_bs${bs}_c${c}_r${req}.json" - .venv/bin/python scripts/perf_api_benchmark.py \ - --scenario rerank \ - --tenant-id "${TENANT_ID}" \ - --reranker-base "${RERANK_BASE}" \ - --cases-file "${TMP_CASES}" \ - --concurrency "${c}" \ - --max-requests "${req}" \ - --timeout "${TIMEOUT_SEC}" \ - --output "${out}" >/dev/null - 
python3 - <"${OUT_DIR}/start_bs${bs}.log" 2>&1 & - - for i in $(seq 1 180); do - if curl -sf "${RERANK_BASE}/health" >/dev/null 2>&1; then - break - fi - sleep 1 - if [ "${i}" -eq 180 ]; then - echo "[error] reranker startup timeout for bs=${bs}" >&2 - tail -n 80 "${OUT_DIR}/start_bs${bs}.log" >&2 || true - exit 1 - fi - done - - run_bench "${bs}" 1 "${C1_REQUESTS}" - run_bench "${bs}" 4 "${C4_REQUESTS}" -done - -echo "[info] benchmark done: ${OUT_DIR}" diff --git a/scripts/benchmark_reranker_gguf_local.py b/scripts/benchmark_reranker_gguf_local.py deleted file mode 100644 index 2d12b33..0000000 --- a/scripts/benchmark_reranker_gguf_local.py +++ /dev/null @@ -1,198 +0,0 @@ -#!/usr/bin/env python3 -""" -Local tuning probe for GGUF reranker backends. - -Runs the backend directly in a fresh process per config to measure: -- load time -- GPU memory used by this process -- single-request rerank latency - -Example: - ./.venv-reranker-gguf/bin/python scripts/benchmark_reranker_gguf_local.py - ./.venv-reranker-gguf-06b/bin/python scripts/benchmark_reranker_gguf_local.py --backend-name qwen3_gguf_06b --docs 400 -""" - -from __future__ import annotations - -import argparse -import json -import os -import random -import statistics -import subprocess -import sys -import time -from pathlib import Path -from typing import Any - - -DEFAULT_TITLES = Path("/home/ubuntu/rerank_test/titles.1.8w") - - -def load_titles(path: Path) -> list[str]: - items: list[str] = [] - with path.open(encoding="utf-8", errors="replace") as fh: - for line in fh: - text = line.strip() - if text: - items.append(text) - return items - - -def gpu_mem_for_pid(pid: int) -> int: - try: - out = subprocess.check_output( - [ - "nvidia-smi", - "--query-compute-apps=pid,used_gpu_memory", - "--format=csv,noheader,nounits", - ], - text=True, - ) - except Exception: - return -1 - for raw in out.splitlines(): - parts = [p.strip() for p in raw.split(",")] - if len(parts) != 2: - continue - try: - row_pid = int(parts[0]) - row_mem = int(parts[1]) - except ValueError: - continue - if row_pid == pid: - return row_mem - return -1 - - -def main() -> int: - parser = argparse.ArgumentParser() - parser.add_argument("--backend-name", type=str, default="qwen3_gguf") - parser.add_argument("--titles-file", type=Path, default=DEFAULT_TITLES) - parser.add_argument("--query", type=str, default="白色oversized T-shirt") - parser.add_argument("--docs", type=int, default=160) - parser.add_argument("--repeat", type=int, default=1) - parser.add_argument("--seed", type=int, default=42) - parser.add_argument( - "--configs-json", - type=str, - default="", - help="JSON array of config objects; when omitted, uses built-in scan set.", - ) - args = parser.parse_args() - - if not args.titles_file.is_file(): - print(f"missing titles file: {args.titles_file}", file=sys.stderr) - return 2 - - titles = load_titles(args.titles_file) - if len(titles) < args.docs: - print(f"not enough titles: need {args.docs}, got {len(titles)}", file=sys.stderr) - return 2 - - random.seed(args.seed) - docs = random.sample(titles, args.docs) - - if args.configs_json: - configs = json.loads(args.configs_json) - elif args.backend_name == "qwen3_gguf_06b": - configs = [ - {"name": "gguf_06b_full_256", "n_ctx": 256, "n_batch": 256, "n_ubatch": 256, "n_gpu_layers": 999}, - {"name": "gguf_06b_full_320", "n_ctx": 320, "n_batch": 320, "n_ubatch": 320, "n_gpu_layers": 999}, - {"name": "gguf_06b_full_384", "n_ctx": 384, "n_batch": 384, "n_ubatch": 384, "n_gpu_layers": 999}, - {"name": "gguf_06b_full_512", 
"n_ctx": 512, "n_batch": 512, "n_ubatch": 512, "n_gpu_layers": 999}, - ] - else: - configs = [ - {"name": "gguf_t4_24g", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 24}, - {"name": "gguf_t4_40g", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 40}, - {"name": "gguf_t4_full", "n_ctx": 384, "n_batch": 384, "n_ubatch": 128, "n_gpu_layers": 999}, - {"name": "gguf_t4_full_512", "n_ctx": 512, "n_batch": 512, "n_ubatch": 256, "n_gpu_layers": 999}, - {"name": "gguf_t4_full_512_u512", "n_ctx": 512, "n_batch": 512, "n_ubatch": 512, "n_gpu_layers": 999}, - {"name": "gguf_t4_full_768", "n_ctx": 768, "n_batch": 768, "n_ubatch": 256, "n_gpu_layers": 999}, - ] - - from reranker.backends.qwen3_gguf import Qwen3GGUFRerankerBackend - - default_cfg_by_backend: dict[str, dict[str, Any]] = { - "qwen3_gguf": { - "_backend_name": "qwen3_gguf", - "repo_id": "DevQuasar/Qwen.Qwen3-Reranker-4B-GGUF", - "filename": "*Q8_0.gguf", - "local_dir": "./models/reranker/qwen3-reranker-4b-gguf", - "infer_batch_size": 8, - }, - "qwen3_gguf_06b": { - "_backend_name": "qwen3_gguf_06b", - "repo_id": "ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF", - "filename": "qwen3-reranker-0.6b-q8_0.gguf", - "local_dir": "./models/reranker/qwen3-reranker-0.6b-q8_0-gguf", - "infer_batch_size": 32, - }, - } - if args.backend_name not in default_cfg_by_backend: - print(f"unsupported backend: {args.backend_name}", file=sys.stderr) - return 2 - - base_cfg: dict[str, Any] = { - **default_cfg_by_backend[args.backend_name], - "instruction": "Rank products by query with category & style match prioritized", - "cache_dir": "./model_cache", - "main_gpu": 0, - "n_threads": 2, - "n_threads_batch": 4, - "flash_attn": True, - "offload_kqv": True, - "use_mmap": True, - "use_mlock": False, - "sort_by_doc_length": True, - "length_sort_mode": "char", - "enable_warmup": True, - "verbose": False, - "reuse_query_state": True, - } - - all_results: list[dict[str, Any]] = [] - for cfg in configs: - merged = dict(base_cfg) - merged.update(cfg) - name = str(merged.pop("name")) - - t0 = time.perf_counter() - backend = Qwen3GGUFRerankerBackend(merged) - load_ms = (time.perf_counter() - t0) * 1000.0 - gpu_mem_mib = gpu_mem_for_pid(os.getpid()) - - runs: list[float] = [] - last_meta: dict[str, Any] = {} - for _ in range(args.repeat): - t1 = time.perf_counter() - _scores, meta = backend.score_with_meta(args.query, docs, normalize=True) - runs.append((time.perf_counter() - t1) * 1000.0) - last_meta = dict(meta) - - result = { - "name": name, - "config": merged, - "load_ms": round(load_ms, 2), - "gpu_mem_mib": gpu_mem_mib, - "latency_ms_min": round(min(runs), 2), - "latency_ms_avg": round(statistics.mean(runs), 2), - "latency_ms_max": round(max(runs), 2), - "meta": last_meta, - } - all_results.append(result) - print(json.dumps(result, ensure_ascii=False)) - del backend - - print("SUMMARY") - for item in sorted(all_results, key=lambda x: x["latency_ms_avg"]): - print( - f'{item["name"]}: avg={item["latency_ms_avg"]}ms ' - f'gpu={item["gpu_mem_mib"]}MiB load={item["load_ms"]}ms' - ) - return 0 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/scripts/benchmark_reranker_random_titles.py b/scripts/benchmark_reranker_random_titles.py deleted file mode 100755 index 64fe917..0000000 --- a/scripts/benchmark_reranker_random_titles.py +++ /dev/null @@ -1,312 +0,0 @@ -#!/usr/bin/env python3 -""" -Single-request rerank latency probe using real title lines (e.g. 1.8w export). 
- -Randomly samples N titles from a text file (one title per line), POSTs to the -rerank HTTP API, prints wall-clock latency. - -Supports multiple N values (comma-separated) and multiple repeats per N. -Each invocation runs 3 warmup requests with n=400 first; those are not timed for summaries. - -Example: - source activate.sh - python scripts/benchmark_reranker_random_titles.py 386 - python scripts/benchmark_reranker_random_titles.py 40,80,100 - python scripts/benchmark_reranker_random_titles.py 40,80,100 --repeat 3 --seed 42 - RERANK_BASE=http://127.0.0.1:6007 python scripts/benchmark_reranker_random_titles.py 200 -""" - -from __future__ import annotations - -import argparse -import json -import os -import random -import statistics -import sys -import time -from pathlib import Path -from typing import List, Optional, Tuple - -import httpx - - -def _load_titles(path: Path) -> List[str]: - lines: List[str] = [] - with path.open(encoding="utf-8", errors="replace") as f: - for line in f: - s = line.strip() - if s: - lines.append(s) - return lines - - -def _parse_doc_counts(s: str) -> List[int]: - parts = [p.strip() for p in s.split(",") if p.strip()] - if not parts: - raise ValueError("empty doc-count list") - out: List[int] = [] - for p in parts: - v = int(p, 10) - if v <= 0: - raise ValueError(f"doc count must be positive, got {v}") - out.append(v) - return out - - -def _do_rerank( - client: httpx.Client, - url: str, - query: str, - docs: List[str], - *, - top_n: int, - normalize: bool, -) -> Tuple[bool, int, float, Optional[int], str]: - payload: dict = {"query": query, "docs": docs, "normalize": normalize} - if top_n > 0: - payload["top_n"] = top_n - body = json.dumps(payload, ensure_ascii=False) - headers = {"Content-Type": "application/json"} - t0 = time.perf_counter() - try: - resp = client.post(url, content=body.encode("utf-8"), headers=headers) - except httpx.HTTPError: - raise - elapsed_ms = (time.perf_counter() - t0) * 1000.0 - text = resp.text or "" - ok = resp.status_code == 200 - scores_len: Optional[int] = None - if ok: - try: - data = resp.json() - sc = data.get("scores") - if isinstance(sc, list): - scores_len = len(sc) - except json.JSONDecodeError: - scores_len = None - return ok, resp.status_code, elapsed_ms, scores_len, text - - -def main() -> int: - parser = argparse.ArgumentParser( - description="POST /rerank with N random titles from a file and print latency." - ) - parser.add_argument( - "n", - type=str, - metavar="N[,N,...]", - help="Doc counts: one integer or comma-separated list, e.g. 
40,80,100.", - ) - parser.add_argument( - "--repeat", - type=int, - default=3, - help="Number of runs per doc count (default: 3).", - ) - parser.add_argument( - "--titles-file", - type=Path, - default=Path(os.environ.get("RERANK_TITLE_FILE", "/home/ubuntu/rerank_test/titles.1.8w")), - help="Path to newline-separated titles (default: %(default)s or env RERANK_TITLE_FILE).", - ) - parser.add_argument( - "--url", - type=str, - default=os.environ.get("RERANK_BASE", "http://127.0.0.1:6007").rstrip("/") + "/rerank", - help="Full rerank URL (default: $RERANK_BASE/rerank or http://127.0.0.1:6007/rerank).", - ) - parser.add_argument( - "--query", - type=str, - default="健身女生T恤短袖", - help="Rerank query string.", - ) - parser.add_argument( - "--seed", - type=int, - default=None, - help="RNG base seed; each (n, run) uses a derived seed when set (optional).", - ) - parser.add_argument( - "--top-n", - type=int, - default=0, - help="If > 0, include top_n in JSON body (omit field when 0).", - ) - parser.add_argument( - "--no-normalize", - action="store_true", - help="Send normalize=false (default: normalize=true).", - ) - parser.add_argument( - "--timeout", - type=float, - default=float(os.environ.get("RERANK_TIMEOUT_SEC", "240")), - help="HTTP timeout seconds.", - ) - parser.add_argument( - "--print-body-preview", - action="store_true", - help="Print first ~500 chars of response body on success (last run only).", - ) - parser.add_argument( - "--tag", - type=str, - default=os.environ.get("BENCH_TAG", ""), - help="Optional label stored in --json-summary-out (default: env BENCH_TAG or empty).", - ) - parser.add_argument( - "--json-summary-out", - type=Path, - default=None, - help="Write one JSON object with per-n latencies and aggregates for downstream tables.", - ) - parser.add_argument( - "--quiet-runs", - action="store_true", - help="Suppress per-run lines; still prints warmup lines and text summaries.", - ) - args = parser.parse_args() - - try: - doc_counts = _parse_doc_counts(args.n) - except ValueError as exc: - print(f"error: invalid N list {args.n!r}: {exc}", file=sys.stderr) - return 2 - - repeat = int(args.repeat) - if repeat <= 0: - print("error: --repeat must be positive", file=sys.stderr) - return 2 - - if not args.titles_file.is_file(): - print(f"error: titles file not found: {args.titles_file}", file=sys.stderr) - return 2 - - titles = _load_titles(args.titles_file) - warmup_n = 400 - warmup_runs = 3 - max_n = max(max(doc_counts), warmup_n) - if len(titles) < max_n: - print( - f"error: file has only {len(titles)} non-empty lines, need at least {max_n}", - file=sys.stderr, - ) - return 2 - - top_n = int(args.top_n) - normalize = not args.no_normalize - any_fail = False - summary: dict[int, List[float]] = {n: [] for n in doc_counts} - - with httpx.Client(timeout=args.timeout) as client: - for w in range(warmup_runs): - if args.seed is not None: - random.seed(args.seed + 8_000_000 + w) - docs_w = random.sample(titles, warmup_n) - try: - ok_w, status_w, _elapsed_w, scores_len_w, _text_w = _do_rerank( - client, - args.url, - args.query, - docs_w, - top_n=top_n, - normalize=normalize, - ) - except httpx.HTTPError as exc: - print( - f"warmup n={warmup_n} {w + 1}/{warmup_runs} error: request failed: {exc}", - file=sys.stderr, - ) - any_fail = True - continue - if not ok_w: - any_fail = True - print( - f"warmup n={warmup_n} {w + 1}/{warmup_runs} status={status_w} " - f"scores={scores_len_w if scores_len_w is not None else 'n/a'} (not timed)" - ) - - for n in doc_counts: - for run_idx in range(repeat): 
- if args.seed is not None: - random.seed(args.seed + n * 10_000 + run_idx) - docs = random.sample(titles, n) - try: - ok, status, elapsed_ms, scores_len, text = _do_rerank( - client, - args.url, - args.query, - docs, - top_n=top_n, - normalize=normalize, - ) - except httpx.HTTPError as exc: - print( - f"n={n} run={run_idx + 1}/{repeat} error: request failed: {exc}", - file=sys.stderr, - ) - any_fail = True - continue - - if ok: - summary[n].append(elapsed_ms) - else: - any_fail = True - - if not args.quiet_runs: - print( - f"n={n} run={run_idx + 1}/{repeat} status={status} " - f"latency_ms={elapsed_ms:.2f} scores={scores_len if scores_len is not None else 'n/a'}" - ) - if args.print_body_preview and text and run_idx == repeat - 1 and n == doc_counts[-1]: - preview = text[:500] + ("…" if len(text) > 500 else "") - print(preview) - - for n in doc_counts: - lat = summary[n] - if not lat: - print(f"summary n={n} runs=0 (all failed)") - continue - avg = statistics.mean(lat) - lo, hi = min(lat), max(lat) - extra = "" - if len(lat) >= 2: - extra = f" stdev_ms={statistics.stdev(lat):.2f}" - print( - f"summary n={n} runs={len(lat)} min_ms={lo:.2f} max_ms={hi:.2f} avg_ms={avg:.2f}{extra}" - ) - - if args.json_summary_out is not None: - per_n: dict = {} - for n in doc_counts: - lat = summary[n] - row: dict = {"values_ms": lat, "runs": len(lat)} - if lat: - row["mean_ms"] = statistics.mean(lat) - row["min_ms"] = min(lat) - row["max_ms"] = max(lat) - if len(lat) >= 2: - row["stdev_ms"] = statistics.stdev(lat) - per_n[str(n)] = row - out_obj = { - "tag": args.tag or None, - "doc_counts": doc_counts, - "repeat": repeat, - "url": args.url, - "per_n": per_n, - "failed": bool(any_fail), - } - args.json_summary_out.parent.mkdir(parents=True, exist_ok=True) - args.json_summary_out.write_text( - json.dumps(out_obj, ensure_ascii=False, indent=2) + "\n", - encoding="utf-8", - ) - print(f"wrote json summary -> {args.json_summary_out}") - - return 1 if any_fail else 0 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/scripts/benchmark_translation_local_models.py b/scripts/benchmark_translation_local_models.py deleted file mode 100644 index ded8c64..0000000 --- a/scripts/benchmark_translation_local_models.py +++ /dev/null @@ -1,948 +0,0 @@ -#!/usr/bin/env python3 -"""Benchmark local translation models with products_analyzed.csv.""" - -from __future__ import annotations - -import argparse -import concurrent.futures -import copy -import csv -import json -import math -import platform -import resource -import statistics -import subprocess -import sys -import time -from datetime import datetime -from pathlib import Path -from typing import Any, Dict, Iterable, List, Sequence - -import torch -import transformers - -PROJECT_ROOT = Path(__file__).resolve().parent.parent -if str(PROJECT_ROOT) not in sys.path: - sys.path.insert(0, str(PROJECT_ROOT)) - -from config.services_config import get_translation_config # noqa: E402 -from translation.service import TranslationService # noqa: E402 -from translation.settings import get_translation_capability # noqa: E402 - - -DEFAULT_BATCH_SIZES = [1, 4, 8, 16, 32, 64] -DEFAULT_CONCURRENCIES = [1, 2, 4, 8, 16, 64] - -SCENARIOS: List[Dict[str, str]] = [ - { - "name": "nllb-200-distilled-600m zh->en", - "model": "nllb-200-distilled-600m", - "source_lang": "zh", - "target_lang": "en", - "column": "title_cn", - "scene": "sku_name", - }, - { - "name": "nllb-200-distilled-600m en->zh", - "model": "nllb-200-distilled-600m", - "source_lang": "en", - "target_lang": "zh", - 
"column": "title", - "scene": "sku_name", - }, - { - "name": "opus-mt-zh-en zh->en", - "model": "opus-mt-zh-en", - "source_lang": "zh", - "target_lang": "en", - "column": "title_cn", - "scene": "sku_name", - }, - { - "name": "opus-mt-en-zh en->zh", - "model": "opus-mt-en-zh", - "source_lang": "en", - "target_lang": "zh", - "column": "title", - "scene": "sku_name", - }, -] - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Benchmark local translation models") - parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path") - parser.add_argument("--limit", type=int, default=0, help="Limit rows for baseline or single-case run; 0 means all") - parser.add_argument("--output-dir", default="", help="Directory for JSON/Markdown reports") - parser.add_argument("--single", action="store_true", help="Run a single scenario in-process") - parser.add_argument("--model", default="", help="Model name for --single mode") - parser.add_argument("--source-lang", default="", help="Source language for --single mode") - parser.add_argument("--target-lang", default="", help="Target language for --single mode") - parser.add_argument("--column", default="", help="CSV column to benchmark for --single mode") - parser.add_argument("--scene", default="sku_name", help="Scene passed to translation service") - parser.add_argument("--batch-size", type=int, default=0, help="Override configured batch size") - parser.add_argument("--device-override", default="", help="Override configured device, for example cpu or cuda") - parser.add_argument("--torch-dtype-override", default="", help="Override configured torch dtype, for example float32 or float16") - parser.add_argument("--max-new-tokens", type=int, default=0, help="Override configured max_new_tokens") - parser.add_argument("--num-beams", type=int, default=0, help="Override configured num_beams") - parser.add_argument("--attn-implementation", default="", help="Override attention implementation, for example sdpa") - parser.add_argument("--ct2-inter-threads", type=int, default=-1, help="Override CTranslate2 inter_threads") - parser.add_argument("--ct2-intra-threads", type=int, default=-1, help="Override CTranslate2 intra_threads") - parser.add_argument( - "--ct2-max-queued-batches", - type=int, - default=-1, - help="Override CTranslate2 max_queued_batches", - ) - parser.add_argument( - "--ct2-batch-type", - default="", - help="Override CTranslate2 batch_type, for example examples or tokens", - ) - parser.add_argument( - "--ct2-decoding-length-mode", - default="", - help="Override CTranslate2 decoding length mode, for example fixed or source", - ) - parser.add_argument( - "--ct2-decoding-length-extra", - type=int, - default=0, - help="Extra tokens added when ct2 decoding length mode is source", - ) - parser.add_argument( - "--ct2-decoding-length-min", - type=int, - default=0, - help="Minimum decoding length when ct2 decoding length mode is source", - ) - parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring") - parser.add_argument("--disable-cache", action="store_true", help="Disable translation cache during benchmarks") - parser.add_argument( - "--suite", - choices=["baseline", "extended"], - default="baseline", - help="baseline keeps the previous all-scenarios summary; extended adds batch/concurrency/matrix sweeps", - ) - parser.add_argument( - "--batch-size-list", - default="", - help="Comma-separated batch sizes for extended suite; default 
1,4,8,16,32,64", - ) - parser.add_argument( - "--concurrency-list", - default="", - help="Comma-separated concurrency levels for extended suite; default 1,2,4,8,16,64", - ) - parser.add_argument( - "--serial-items-per-case", - type=int, - default=512, - help="Items per batch-size case in extended suite", - ) - parser.add_argument( - "--concurrency-requests-per-case", - type=int, - default=128, - help="Requests per concurrency or matrix case in extended suite", - ) - parser.add_argument( - "--concurrency-batch-size", - type=int, - default=1, - help="Batch size used by the dedicated concurrency sweep", - ) - parser.add_argument( - "--max-batch-concurrency-product", - type=int, - default=128, - help="Skip matrix cases where batch_size * concurrency exceeds this value; 0 disables the limit", - ) - return parser.parse_args() - - -def parse_csv_ints(raw: str, fallback: Sequence[int]) -> List[int]: - if not raw.strip(): - return list(fallback) - values: List[int] = [] - for item in raw.split(","): - stripped = item.strip() - if not stripped: - continue - value = int(stripped) - if value <= 0: - raise ValueError(f"Expected positive integer, got {value}") - values.append(value) - if not values: - raise ValueError("Parsed empty integer list") - return values - - -def load_texts(csv_path: Path, column: str, limit: int) -> List[str]: - texts: List[str] = [] - with csv_path.open("r", encoding="utf-8") as handle: - reader = csv.DictReader(handle) - for row in reader: - value = (row.get(column) or "").strip() - if value: - texts.append(value) - if limit > 0 and len(texts) >= limit: - break - if not texts: - raise ValueError(f"No non-empty texts found in column '{column}' from {csv_path}") - return texts - - -def batched(values: Sequence[str], batch_size: int) -> Iterable[List[str]]: - for start in range(0, len(values), batch_size): - yield list(values[start:start + batch_size]) - - -def percentile(values: List[float], p: float) -> float: - if not values: - return 0.0 - ordered = sorted(values) - if len(values) == 1: - return float(ordered[0]) - idx = (len(ordered) - 1) * p - lower = math.floor(idx) - upper = math.ceil(idx) - if lower == upper: - return float(ordered[lower]) - return float(ordered[lower] + (ordered[upper] - ordered[lower]) * (idx - lower)) - - -def resolve_output_dir(output_dir: str) -> Path: - if output_dir: - path = Path(output_dir) - else: - path = PROJECT_ROOT / "perf_reports" / datetime.now().strftime("%Y%m%d") / "translation_local_models" - path.mkdir(parents=True, exist_ok=True) - return path - - -def build_environment_info() -> Dict[str, Any]: - gpu_name = None - gpu_total_mem_gb = None - if torch.cuda.is_available(): - gpu_name = torch.cuda.get_device_name(0) - props = torch.cuda.get_device_properties(0) - gpu_total_mem_gb = round(props.total_memory / (1024 ** 3), 2) - return { - "python": platform.python_version(), - "torch": torch.__version__, - "transformers": transformers.__version__, - "cuda_available": torch.cuda.is_available(), - "gpu_name": gpu_name, - "gpu_total_mem_gb": gpu_total_mem_gb, - "platform": platform.platform(), - } - - -def scenario_from_args(args: argparse.Namespace) -> Dict[str, str]: - return { - "name": f"{args.model} {args.source_lang}->{args.target_lang}", - "model": args.model, - "source_lang": args.source_lang, - "target_lang": args.target_lang, - "column": args.column, - "scene": args.scene, - } - - -def build_config_and_capability( - args: argparse.Namespace, - *, - batch_size_override: int | None = None, -) -> tuple[Dict[str, Any], Dict[str, Any]]: 
- config = copy.deepcopy(get_translation_config()) - for name, cfg in config["capabilities"].items(): - cfg["enabled"] = name == args.model - config["default_model"] = args.model - capability = get_translation_capability(config, args.model, require_enabled=False) - if args.device_override: - capability["device"] = args.device_override - if args.torch_dtype_override: - capability["torch_dtype"] = args.torch_dtype_override - if batch_size_override is not None: - capability["batch_size"] = batch_size_override - elif args.batch_size: - capability["batch_size"] = args.batch_size - if args.max_new_tokens: - capability["max_new_tokens"] = args.max_new_tokens - if args.num_beams: - capability["num_beams"] = args.num_beams - if args.attn_implementation: - capability["attn_implementation"] = args.attn_implementation - if args.ct2_inter_threads >= 0: - capability["ct2_inter_threads"] = args.ct2_inter_threads - if args.ct2_intra_threads >= 0: - capability["ct2_intra_threads"] = args.ct2_intra_threads - if args.ct2_max_queued_batches >= 0: - capability["ct2_max_queued_batches"] = args.ct2_max_queued_batches - if args.ct2_batch_type: - capability["ct2_batch_type"] = args.ct2_batch_type - if args.ct2_decoding_length_mode: - capability["ct2_decoding_length_mode"] = args.ct2_decoding_length_mode - if args.ct2_decoding_length_extra: - capability["ct2_decoding_length_extra"] = args.ct2_decoding_length_extra - if args.ct2_decoding_length_min: - capability["ct2_decoding_length_min"] = args.ct2_decoding_length_min - if args.disable_cache: - capability["use_cache"] = False - config["capabilities"][args.model] = capability - return config, capability - - -def ensure_cuda_stats_reset() -> None: - if torch.cuda.is_available(): - torch.cuda.empty_cache() - torch.cuda.reset_peak_memory_stats() - - -def build_memory_metrics() -> Dict[str, Any]: - peak_gpu_mem_gb = None - peak_gpu_reserved_gb = None - if torch.cuda.is_available(): - peak_gpu_mem_gb = round(torch.cuda.max_memory_allocated() / (1024 ** 3), 3) - peak_gpu_reserved_gb = round(torch.cuda.max_memory_reserved() / (1024 ** 3), 3) - max_rss_mb = round(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, 2) - return { - "max_rss_mb": max_rss_mb, - "peak_gpu_memory_gb": peak_gpu_mem_gb, - "peak_gpu_reserved_gb": peak_gpu_reserved_gb, - } - - -def make_request_payload(batch: Sequence[str]) -> str | List[str]: - if len(batch) == 1: - return batch[0] - return list(batch) - - -def benchmark_serial_case( - *, - service: TranslationService, - backend: Any, - scenario: Dict[str, str], - capability: Dict[str, Any], - texts: List[str], - batch_size: int, - warmup_batches: int, -) -> Dict[str, Any]: - backend.batch_size = batch_size - measured_batches = list(batched(texts, batch_size)) - warmup_count = min(max(warmup_batches, 0), len(measured_batches)) - - for batch in measured_batches[:warmup_count]: - service.translate( - text=make_request_payload(batch), - source_lang=scenario["source_lang"], - target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - - batch_latencies_ms: List[float] = [] - success_count = 0 - failure_count = 0 - output_chars = 0 - total_input_chars = sum(len(text) for text in texts) - - start = time.perf_counter() - for batch in measured_batches: - batch_start = time.perf_counter() - outputs = service.translate( - text=make_request_payload(batch), - source_lang=scenario["source_lang"], - target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - elapsed_ms = 
(time.perf_counter() - batch_start) * 1000 - batch_latencies_ms.append(elapsed_ms) - - if isinstance(outputs, list): - result_items = outputs - else: - result_items = [outputs] - for item in result_items: - if item is None: - failure_count += 1 - else: - success_count += 1 - output_chars += len(item) - translate_seconds = time.perf_counter() - start - total_items = len(texts) - memory = build_memory_metrics() - - return { - "mode": "serial_batch", - "batch_size": batch_size, - "concurrency": 1, - "rows": total_items, - "requests": len(measured_batches), - "input_chars": total_input_chars, - "load_seconds": 0.0, - "translate_seconds": round(translate_seconds, 4), - "total_seconds": round(translate_seconds, 4), - "batch_count": len(batch_latencies_ms), - "request_latency_p50_ms": round(percentile(batch_latencies_ms, 0.50), 2), - "request_latency_p95_ms": round(percentile(batch_latencies_ms, 0.95), 2), - "request_latency_max_ms": round(max(batch_latencies_ms), 2), - "avg_request_latency_ms": round(statistics.fmean(batch_latencies_ms), 2), - "avg_item_latency_ms": round((translate_seconds / total_items) * 1000, 3), - "requests_per_second": round(len(measured_batches) / translate_seconds, 2), - "items_per_second": round(total_items / translate_seconds, 2), - "input_chars_per_second": round(total_input_chars / translate_seconds, 2), - "output_chars_per_second": round(output_chars / translate_seconds, 2), - "success_count": success_count, - "failure_count": failure_count, - "success_rate": round(success_count / total_items, 6), - "device": str(getattr(backend, "device", capability.get("device", "unknown"))), - "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), - "configured_batch_size": int(capability.get("batch_size") or batch_size), - "used_batch_size": batch_size, - "warmup_batches": warmup_count, - **memory, - } - - -def benchmark_concurrency_case( - *, - service: TranslationService, - backend: Any, - scenario: Dict[str, str], - capability: Dict[str, Any], - texts: List[str], - batch_size: int, - concurrency: int, - requests_per_case: int, - warmup_batches: int, -) -> Dict[str, Any]: - backend.batch_size = batch_size - required_items = batch_size * requests_per_case - case_texts = texts[:required_items] - request_batches = list(batched(case_texts, batch_size)) - if not request_batches: - raise ValueError("No request batches prepared for concurrency benchmark") - warmup_count = min(max(warmup_batches, 0), len(request_batches)) - - for batch in request_batches[:warmup_count]: - service.translate( - text=make_request_payload(batch), - source_lang=scenario["source_lang"], - target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - - request_latencies_ms: List[float] = [] - success_count = 0 - failure_count = 0 - output_chars = 0 - total_input_chars = sum(len(text) for text in case_texts) - - def worker(batch: List[str]) -> tuple[float, int, int, int]: - started = time.perf_counter() - outputs = service.translate( - text=make_request_payload(batch), - source_lang=scenario["source_lang"], - target_lang=scenario["target_lang"], - model=scenario["model"], - scene=scenario["scene"], - ) - elapsed_ms = (time.perf_counter() - started) * 1000 - if isinstance(outputs, list): - result_items = outputs - else: - result_items = [outputs] - local_success = 0 - local_failure = 0 - local_output_chars = 0 - for item in result_items: - if item is None: - local_failure += 1 - else: - local_success += 1 - local_output_chars += 
len(item) - return elapsed_ms, local_success, local_failure, local_output_chars - - wall_start = time.perf_counter() - with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor: - futures = [executor.submit(worker, batch) for batch in request_batches] - for future in concurrent.futures.as_completed(futures): - latency_ms, local_success, local_failure, local_output_chars = future.result() - request_latencies_ms.append(latency_ms) - success_count += local_success - failure_count += local_failure - output_chars += local_output_chars - wall_seconds = time.perf_counter() - wall_start - total_items = len(case_texts) - memory = build_memory_metrics() - - return { - "mode": "concurrency", - "batch_size": batch_size, - "concurrency": concurrency, - "rows": total_items, - "requests": len(request_batches), - "input_chars": total_input_chars, - "load_seconds": 0.0, - "translate_seconds": round(wall_seconds, 4), - "total_seconds": round(wall_seconds, 4), - "batch_count": len(request_latencies_ms), - "request_latency_p50_ms": round(percentile(request_latencies_ms, 0.50), 2), - "request_latency_p95_ms": round(percentile(request_latencies_ms, 0.95), 2), - "request_latency_max_ms": round(max(request_latencies_ms), 2), - "avg_request_latency_ms": round(statistics.fmean(request_latencies_ms), 2), - "avg_item_latency_ms": round((wall_seconds / total_items) * 1000, 3), - "requests_per_second": round(len(request_batches) / wall_seconds, 2), - "items_per_second": round(total_items / wall_seconds, 2), - "input_chars_per_second": round(total_input_chars / wall_seconds, 2), - "output_chars_per_second": round(output_chars / wall_seconds, 2), - "success_count": success_count, - "failure_count": failure_count, - "success_rate": round(success_count / total_items, 6), - "device": str(getattr(backend, "device", capability.get("device", "unknown"))), - "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), - "configured_batch_size": int(capability.get("batch_size") or batch_size), - "used_batch_size": batch_size, - "warmup_batches": warmup_count, - **memory, - } - - -def benchmark_single_scenario(args: argparse.Namespace) -> Dict[str, Any]: - csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) - scenario = scenario_from_args(args) - config, capability = build_config_and_capability(args) - configured_batch_size = int(capability.get("batch_size") or 1) - batch_size = configured_batch_size - texts = load_texts(csv_path, args.column, args.limit) - - ensure_cuda_stats_reset() - load_start = time.perf_counter() - service = TranslationService(config) - backend = service.get_backend(args.model) - load_seconds = time.perf_counter() - load_start - - runtime = benchmark_serial_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=texts, - batch_size=batch_size, - warmup_batches=args.warmup_batches, - ) - runtime["load_seconds"] = round(load_seconds, 4) - runtime["total_seconds"] = round(runtime["load_seconds"] + runtime["translate_seconds"], 4) - - return { - "scenario": scenario, - "dataset": { - "csv_path": str(csv_path), - "rows": len(texts), - "input_chars": sum(len(text) for text in texts), - }, - "runtime": runtime, - } - - -def benchmark_extended_scenario(args: argparse.Namespace) -> Dict[str, Any]: - csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) - scenario = scenario_from_args(args) - 
batch_sizes = parse_csv_ints(args.batch_size_list, DEFAULT_BATCH_SIZES) - concurrencies = parse_csv_ints(args.concurrency_list, DEFAULT_CONCURRENCIES) - largest_batch = max(batch_sizes + [args.concurrency_batch_size]) - largest_concurrency = max(concurrencies) - max_product = args.max_batch_concurrency_product - required_items = max( - args.limit or 0, - max(args.serial_items_per_case, largest_batch), - args.concurrency_requests_per_case * args.concurrency_batch_size, - largest_batch * args.concurrency_requests_per_case, - ) - texts = load_texts(csv_path, args.column, required_items) - config, capability = build_config_and_capability(args) - - ensure_cuda_stats_reset() - load_start = time.perf_counter() - service = TranslationService(config) - backend = service.get_backend(args.model) - load_seconds = time.perf_counter() - load_start - - batch_sweep: List[Dict[str, Any]] = [] - concurrency_sweep: List[Dict[str, Any]] = [] - matrix_results: List[Dict[str, Any]] = [] - - for batch_size in batch_sizes: - case_texts = texts[: max(batch_size, args.serial_items_per_case)] - batch_sweep.append( - benchmark_serial_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=case_texts, - batch_size=batch_size, - warmup_batches=args.warmup_batches, - ) - ) - - for concurrency in concurrencies: - concurrency_sweep.append( - benchmark_concurrency_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=texts, - batch_size=args.concurrency_batch_size, - concurrency=concurrency, - requests_per_case=args.concurrency_requests_per_case, - warmup_batches=args.warmup_batches, - ) - ) - - for batch_size in batch_sizes: - for concurrency in concurrencies: - if max_product > 0 and batch_size * concurrency > max_product: - continue - matrix_results.append( - benchmark_concurrency_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=texts, - batch_size=batch_size, - concurrency=concurrency, - requests_per_case=args.concurrency_requests_per_case, - warmup_batches=args.warmup_batches, - ) - ) - - for collection in (batch_sweep, concurrency_sweep, matrix_results): - for idx, item in enumerate(collection): - item["load_seconds"] = round(load_seconds if idx == 0 else 0.0, 4) - item["total_seconds"] = round(item["load_seconds"] + item["translate_seconds"], 4) - - return { - "scenario": scenario, - "dataset": { - "csv_path": str(csv_path), - "rows_loaded": len(texts), - }, - "config": { - "batch_sizes": batch_sizes, - "concurrencies": concurrencies, - "serial_items_per_case": args.serial_items_per_case, - "concurrency_requests_per_case": args.concurrency_requests_per_case, - "concurrency_batch_size": args.concurrency_batch_size, - "max_batch_concurrency_product": max_product, - "cache_disabled": bool(args.disable_cache), - }, - "runtime_defaults": { - "device": str(getattr(backend, "device", capability.get("device", "unknown"))), - "torch_dtype": str(getattr(backend, "torch_dtype", capability.get("torch_dtype", "unknown"))), - "configured_batch_size": int(capability.get("batch_size") or 1), - "load_seconds": round(load_seconds, 4), - }, - "batch_sweep": batch_sweep, - "concurrency_sweep": concurrency_sweep, - "matrix": matrix_results, - } - - -def run_all_scenarios(args: argparse.Namespace) -> Dict[str, Any]: - report = { - "generated_at": datetime.now().isoformat(timespec="seconds"), - "suite": args.suite, - "environment": build_environment_info(), - "scenarios": [], - } - - for scenario in 
SCENARIOS: - cmd = [ - sys.executable, - str(Path(__file__).resolve()), - "--single", - "--csv-path", - args.csv_path, - "--model", - scenario["model"], - "--source-lang", - scenario["source_lang"], - "--target-lang", - scenario["target_lang"], - "--column", - scenario["column"], - "--scene", - scenario["scene"], - "--warmup-batches", - str(args.warmup_batches), - "--suite", - args.suite, - "--serial-items-per-case", - str(args.serial_items_per_case), - "--concurrency-requests-per-case", - str(args.concurrency_requests_per_case), - "--concurrency-batch-size", - str(args.concurrency_batch_size), - "--max-batch-concurrency-product", - str(args.max_batch_concurrency_product), - ] - if args.limit: - cmd.extend(["--limit", str(args.limit)]) - if args.batch_size: - cmd.extend(["--batch-size", str(args.batch_size)]) - if args.batch_size_list: - cmd.extend(["--batch-size-list", args.batch_size_list]) - if args.concurrency_list: - cmd.extend(["--concurrency-list", args.concurrency_list]) - if args.device_override: - cmd.extend(["--device-override", args.device_override]) - if args.torch_dtype_override: - cmd.extend(["--torch-dtype-override", args.torch_dtype_override]) - if args.max_new_tokens: - cmd.extend(["--max-new-tokens", str(args.max_new_tokens)]) - if args.num_beams: - cmd.extend(["--num-beams", str(args.num_beams)]) - if args.attn_implementation: - cmd.extend(["--attn-implementation", args.attn_implementation]) - if args.ct2_inter_threads >= 0: - cmd.extend(["--ct2-inter-threads", str(args.ct2_inter_threads)]) - if args.ct2_intra_threads >= 0: - cmd.extend(["--ct2-intra-threads", str(args.ct2_intra_threads)]) - if args.ct2_max_queued_batches >= 0: - cmd.extend(["--ct2-max-queued-batches", str(args.ct2_max_queued_batches)]) - if args.ct2_batch_type: - cmd.extend(["--ct2-batch-type", args.ct2_batch_type]) - if args.ct2_decoding_length_mode: - cmd.extend(["--ct2-decoding-length-mode", args.ct2_decoding_length_mode]) - if args.ct2_decoding_length_extra: - cmd.extend(["--ct2-decoding-length-extra", str(args.ct2_decoding_length_extra)]) - if args.ct2_decoding_length_min: - cmd.extend(["--ct2-decoding-length-min", str(args.ct2_decoding_length_min)]) - if args.disable_cache: - cmd.append("--disable-cache") - - completed = subprocess.run(cmd, capture_output=True, text=True, check=True) - result_line = "" - for line in reversed(completed.stdout.splitlines()): - if line.startswith("JSON_RESULT="): - result_line = line - break - if not result_line: - raise RuntimeError(f"Scenario output missing JSON_RESULT marker:\n{completed.stdout}\n{completed.stderr}") - payload = json.loads(result_line.split("=", 1)[1]) - payload["scenario"]["name"] = scenario["name"] - report["scenarios"].append(payload) - - return report - - -def render_baseline_markdown_report(report: Dict[str, Any]) -> str: - lines = [ - "# Local Translation Model Benchmark", - "", - f"- Generated at: `{report['generated_at']}`", - f"- Suite: `{report['suite']}`", - f"- Python: `{report['environment']['python']}`", - f"- Torch: `{report['environment']['torch']}`", - f"- Transformers: `{report['environment']['transformers']}`", - f"- CUDA: `{report['environment']['cuda_available']}`", - ] - if report["environment"]["gpu_name"]: - lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") - lines.extend( - [ - "", - "| Scenario | Items/s | Avg item ms | Req p50 ms | Req p95 ms | Load s | Peak GPU GiB | Success |", - "|---|---:|---:|---:|---:|---:|---:|---:|", - ] - ) - for item in 
report["scenarios"]: - runtime = item["runtime"] - lines.append( - "| {name} | {items_per_second} | {avg_item_latency_ms} | {request_latency_p50_ms} | {request_latency_p95_ms} | {load_seconds} | {peak_gpu_memory_gb} | {success_rate} |".format( - name=item["scenario"]["name"], - items_per_second=runtime["items_per_second"], - avg_item_latency_ms=runtime["avg_item_latency_ms"], - request_latency_p50_ms=runtime["request_latency_p50_ms"], - request_latency_p95_ms=runtime["request_latency_p95_ms"], - load_seconds=runtime["load_seconds"], - peak_gpu_memory_gb=runtime["peak_gpu_memory_gb"], - success_rate=runtime["success_rate"], - ) - ) - - lines.append("") - for item in report["scenarios"]: - runtime = item["runtime"] - dataset = item["dataset"] - lines.extend( - [ - f"## {item['scenario']['name']}", - "", - f"- Dataset rows: `{dataset['rows']}` from column `{item['scenario']['column']}`", - f"- Direction: `{item['scenario']['source_lang']} -> {item['scenario']['target_lang']}`", - f"- Batch size: configured `{runtime['configured_batch_size']}`, used `{runtime['used_batch_size']}`", - f"- Load time: `{runtime['load_seconds']} s`", - f"- Translate time: `{runtime['translate_seconds']} s`", - f"- Throughput: `{runtime['items_per_second']} items/s`, `{runtime['input_chars_per_second']} input chars/s`", - f"- Latency: avg item `{runtime['avg_item_latency_ms']} ms`, req p50 `{runtime['request_latency_p50_ms']} ms`, req p95 `{runtime['request_latency_p95_ms']} ms`, req max `{runtime['request_latency_max_ms']} ms`", - f"- Memory: max RSS `{runtime['max_rss_mb']} MB`, peak GPU allocated `{runtime['peak_gpu_memory_gb']} GiB`, peak GPU reserved `{runtime['peak_gpu_reserved_gb']} GiB`", - f"- Success: `{runtime['success_count']}/{dataset['rows']}`", - "", - ] - ) - return "\n".join(lines) - - -def render_case_table( - title: str, - rows: Sequence[Dict[str, Any]], - *, - include_batch: bool, - include_concurrency: bool, -) -> List[str]: - headers = ["Rows", "Requests", "Items/s", "Req/s", "Avg req ms", "Req p50 ms", "Req p95 ms", "Peak GPU GiB"] - prefix_headers: List[str] = [] - if include_batch: - prefix_headers.append("Batch") - if include_concurrency: - prefix_headers.append("Concurrency") - headers = prefix_headers + headers - lines = [f"### {title}", ""] - lines.append("| " + " | ".join(headers) + " |") - lines.append("|" + "|".join(["---:"] * len(headers)) + "|") - for item in rows: - values: List[str] = [] - if include_batch: - values.append(str(item["batch_size"])) - if include_concurrency: - values.append(str(item["concurrency"])) - values.extend( - [ - str(item["rows"]), - str(item["requests"]), - str(item["items_per_second"]), - str(item["requests_per_second"]), - str(item["avg_request_latency_ms"]), - str(item["request_latency_p50_ms"]), - str(item["request_latency_p95_ms"]), - str(item["peak_gpu_memory_gb"]), - ] - ) - lines.append("| " + " | ".join(values) + " |") - lines.append("") - return lines - - -def render_extended_markdown_report(report: Dict[str, Any]) -> str: - lines = [ - "# Local Translation Model Extended Benchmark", - "", - f"- Generated at: `{report['generated_at']}`", - f"- Suite: `{report['suite']}`", - f"- Python: `{report['environment']['python']}`", - f"- Torch: `{report['environment']['torch']}`", - f"- Transformers: `{report['environment']['transformers']}`", - f"- CUDA: `{report['environment']['cuda_available']}`", - ] - if report["environment"]["gpu_name"]: - lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} 
GiB)") - - lines.extend( - [ - "", - "## Reading Guide", - "", - "- `batch_sweep`: single stream only (`concurrency=1`), used to compare bulk translation efficiency across batch sizes.", - "- `concurrency_sweep`: fixed request batch size, used to compare online request latency and throughput as concurrency rises.", - "- `matrix`: combined `batch_size x concurrency` runs, filtered by `batch_size * concurrency <= limit` when configured.", - "", - ] - ) - - for item in report["scenarios"]: - lines.extend( - [ - f"## {item['scenario']['name']}", - "", - f"- Direction: `{item['scenario']['source_lang']} -> {item['scenario']['target_lang']}`", - f"- Column: `{item['scenario']['column']}`", - f"- Loaded rows: `{item['dataset']['rows_loaded']}`", - f"- Load time: `{item['runtime_defaults']['load_seconds']} s`", - f"- Device: `{item['runtime_defaults']['device']}`", - f"- DType: `{item['runtime_defaults']['torch_dtype']}`", - f"- Cache disabled: `{item['config']['cache_disabled']}`", - "", - ] - ) - lines.extend(render_case_table("Batch Sweep (`concurrency=1`)", item["batch_sweep"], include_batch=True, include_concurrency=False)) - lines.extend( - render_case_table( - f"Concurrency Sweep (`batch_size={item['config']['concurrency_batch_size']}`)", - item["concurrency_sweep"], - include_batch=False, - include_concurrency=True, - ) - ) - lines.extend(render_case_table("Batch x Concurrency Matrix", item["matrix"], include_batch=True, include_concurrency=True)) - return "\n".join(lines) - - -def render_markdown_report(report: Dict[str, Any]) -> str: - if report["suite"] == "extended": - return render_extended_markdown_report(report) - return render_baseline_markdown_report(report) - - -def main() -> None: - args = parse_args() - if args.single: - if args.suite == "extended": - result = benchmark_extended_scenario(args) - else: - result = benchmark_single_scenario(args) - print("JSON_RESULT=" + json.dumps(result, ensure_ascii=False)) - return - - report = run_all_scenarios(args) - output_dir = resolve_output_dir(args.output_dir) - timestamp = datetime.now().strftime("%H%M%S") - suffix = "extended" if args.suite == "extended" else "baseline" - json_path = output_dir / f"translation_local_models_{suffix}_{timestamp}.json" - md_path = output_dir / f"translation_local_models_{suffix}_{timestamp}.md" - json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8") - md_path.write_text(render_markdown_report(report), encoding="utf-8") - - print(f"JSON report: {json_path}") - print(f"Markdown report: {md_path}") - for item in report["scenarios"]: - if args.suite == "extended": - best_batch = max(item["batch_sweep"], key=lambda x: x["items_per_second"]) - best_concurrency = max(item["concurrency_sweep"], key=lambda x: x["items_per_second"]) - print( - f"{item['scenario']['name']}: " - f"best_batch={best_batch['batch_size']} ({best_batch['items_per_second']} items/s) | " - f"best_concurrency={best_concurrency['concurrency']} ({best_concurrency['items_per_second']} items/s @ batch={best_concurrency['batch_size']})" - ) - else: - runtime = item["runtime"] - print( - f"{item['scenario']['name']}: " - f"{runtime['items_per_second']} items/s | " - f"avg_item={runtime['avg_item_latency_ms']} ms | " - f"p95_req={runtime['request_latency_p95_ms']} ms | " - f"load={runtime['load_seconds']} s" - ) - - -if __name__ == "__main__": - main() diff --git a/scripts/benchmark_translation_local_models_focus.py b/scripts/benchmark_translation_local_models_focus.py deleted file mode 100644 index 
00f0610..0000000 --- a/scripts/benchmark_translation_local_models_focus.py +++ /dev/null @@ -1,250 +0,0 @@ -#!/usr/bin/env python3 -"""Focused translation benchmark for two stress scenarios on local CT2 models.""" - -from __future__ import annotations - -import argparse -import copy -import json -import sys -from datetime import datetime -from pathlib import Path -from typing import Any, Dict, List - -PROJECT_ROOT = Path(__file__).resolve().parent.parent -if str(PROJECT_ROOT) not in sys.path: - sys.path.insert(0, str(PROJECT_ROOT)) - -from config.services_config import get_translation_config -from scripts.benchmark_translation_local_models import ( - SCENARIOS, - benchmark_concurrency_case, - benchmark_serial_case, - build_environment_info, - ensure_cuda_stats_reset, - load_texts, -) -from translation.service import TranslationService - -DEFAULT_HIGH_BATCH_SIZES = [32, 64, 128] -DEFAULT_HIGH_CONCURRENCIES = [8, 16, 32, 64] - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Focused benchmark for local CT2 translation models") - parser.add_argument("--csv-path", default="products_analyzed.csv", help="Benchmark dataset CSV path") - parser.add_argument( - "--output-dir", - default="perf_reports/20260318/translation_local_models_ct2_focus", - help="Directory for JSON/Markdown focused reports", - ) - parser.add_argument( - "--high-batch-sizes", - default="32,64,128", - help="Comma-separated batch sizes for the high-batch/low-concurrency scenario", - ) - parser.add_argument( - "--high-concurrencies", - default="8,16,32,64", - help="Comma-separated concurrency levels for the high-concurrency/low-batch scenario", - ) - parser.add_argument( - "--high-batch-rows", - type=int, - default=512, - help="Rows used for the high-batch/low-concurrency scenario", - ) - parser.add_argument( - "--high-concurrency-requests", - type=int, - default=32, - help="Requests per high-concurrency/low-batch case", - ) - parser.add_argument("--warmup-batches", type=int, default=1, help="Warmup batches before measuring") - return parser.parse_args() - - -def parse_csv_ints(raw: str) -> List[int]: - values: List[int] = [] - for item in raw.split(","): - stripped = item.strip() - if not stripped: - continue - value = int(stripped) - if value <= 0: - raise ValueError(f"Expected positive integer, got {value}") - values.append(value) - if not values: - raise ValueError("Parsed empty integer list") - return values - - -def build_variant_config(model: str, overrides: Dict[str, Any]) -> tuple[Dict[str, Any], Dict[str, Any]]: - config = copy.deepcopy(get_translation_config()) - for name, cfg in config["capabilities"].items(): - cfg["enabled"] = name == model - cfg["use_cache"] = False - config["default_model"] = model - capability = config["capabilities"][model] - capability.update(overrides) - config["capabilities"][model] = capability - return config, capability - - -def render_markdown(report: Dict[str, Any]) -> str: - lines = [ - "# Local Translation Model Focused Benchmark", - "", - f"- Generated at: `{report['generated_at']}`", - f"- Python: `{report['environment']['python']}`", - f"- Torch: `{report['environment']['torch']}`", - f"- Transformers: `{report['environment']['transformers']}`", - f"- CUDA: `{report['environment']['cuda_available']}`", - ] - if report["environment"]["gpu_name"]: - lines.append(f"- GPU: `{report['environment']['gpu_name']}` ({report['environment']['gpu_total_mem_gb']} GiB)") - lines.extend( - [ - "", - "## Scope", - "", - "- Scenario 1: high batch size + low 
concurrency", - "- Scenario 2: high concurrency + low batch size", - "- Variants in this report:", - ] - ) - for variant in report["variants"]: - lines.append(f" - `{variant['name']}`: `{variant['overrides']}`") - - for scenario in report["scenarios"]: - lines.extend( - [ - "", - f"## {scenario['name']}", - "", - f"- Direction: `{scenario['source_lang']} -> {scenario['target_lang']}`", - f"- Column: `{scenario['column']}`", - ] - ) - for variant in scenario["variants"]: - lines.extend( - [ - "", - f"### Variant `{variant['name']}`", - "", - "| Scenario | Setting | Items/s | Req p95 ms | Avg req ms |", - "|---|---|---:|---:|---:|", - ] - ) - for row in variant["high_batch_low_concurrency"]: - lines.append( - f"| high-batch/low-concurrency | batch={row['batch_size']}, concurrency=1 | " - f"{row['items_per_second']} | {row['request_latency_p95_ms']} | {row['avg_request_latency_ms']} |" - ) - for row in variant["high_concurrency_low_batch"]: - lines.append( - f"| high-concurrency/low-batch | batch=1, concurrency={row['concurrency']} | " - f"{row['items_per_second']} | {row['request_latency_p95_ms']} | {row['avg_request_latency_ms']} |" - ) - return "\n".join(lines) + "\n" - - -def main() -> None: - args = parse_args() - csv_path = (PROJECT_ROOT / args.csv_path).resolve() if not Path(args.csv_path).is_absolute() else Path(args.csv_path) - output_dir = (PROJECT_ROOT / args.output_dir).resolve() if not Path(args.output_dir).is_absolute() else Path(args.output_dir) - output_dir.mkdir(parents=True, exist_ok=True) - - high_batch_sizes = parse_csv_ints(args.high_batch_sizes) - high_concurrencies = parse_csv_ints(args.high_concurrencies) - - variants = [ - {"name": "ct2_default", "overrides": {}}, - { - "name": "ct2_tuned_t4", - "overrides": { - "ct2_inter_threads": 2, - "ct2_max_queued_batches": 16, - "ct2_batch_type": "examples", - }, - }, - ] - - report: Dict[str, Any] = { - "generated_at": datetime.now().isoformat(timespec="seconds"), - "environment": build_environment_info(), - "csv_path": str(csv_path), - "variants": variants, - "scenarios": [], - } - - largest_batch = max(high_batch_sizes) - high_batch_rows = max(args.high_batch_rows, largest_batch) - - for scenario in SCENARIOS: - scenario_entry = dict(scenario) - scenario_entry["variants"] = [] - batch_texts = load_texts(csv_path, scenario["column"], high_batch_rows) - conc_needed = max(high_concurrencies) * args.high_concurrency_requests - conc_texts = load_texts(csv_path, scenario["column"], conc_needed) - - for variant in variants: - print(f"[start] {scenario['name']} | {variant['name']}", flush=True) - config, capability = build_variant_config(scenario["model"], variant["overrides"]) - ensure_cuda_stats_reset() - service = TranslationService(config) - backend = service.get_backend(scenario["model"]) - - high_batch_results = [] - for batch_size in high_batch_sizes: - high_batch_results.append( - benchmark_serial_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=batch_texts[: max(batch_size, high_batch_rows)], - batch_size=batch_size, - warmup_batches=args.warmup_batches, - ) - ) - - high_concurrency_results = [] - for concurrency in high_concurrencies: - high_concurrency_results.append( - benchmark_concurrency_case( - service=service, - backend=backend, - scenario=scenario, - capability=capability, - texts=conc_texts, - batch_size=1, - concurrency=concurrency, - requests_per_case=args.high_concurrency_requests, - warmup_batches=args.warmup_batches, - ) - ) - - 
scenario_entry["variants"].append( - { - "name": variant["name"], - "overrides": variant["overrides"], - "high_batch_low_concurrency": high_batch_results, - "high_concurrency_low_batch": high_concurrency_results, - } - ) - print(f"[done] {scenario['name']} | {variant['name']}", flush=True) - - report["scenarios"].append(scenario_entry) - - stamp = datetime.now().strftime("%H%M%S") - json_path = output_dir / f"translation_local_models_focus_{stamp}.json" - md_path = output_dir / f"translation_local_models_focus_{stamp}.md" - json_path.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8") - md_path.write_text(render_markdown(report), encoding="utf-8") - print(f"JSON report: {json_path}") - print(f"Markdown report: {md_path}") - - -if __name__ == "__main__": - main() diff --git a/scripts/benchmark_translation_longtext_single.py b/scripts/benchmark_translation_longtext_single.py deleted file mode 100644 index ba48d56..0000000 --- a/scripts/benchmark_translation_longtext_single.py +++ /dev/null @@ -1,186 +0,0 @@ -#!/usr/bin/env python3 -"""Benchmark a single long-text translation request for local models.""" - -from __future__ import annotations - -import argparse -import copy -import json -import logging -import statistics -import time -from pathlib import Path - -import torch - -PROJECT_ROOT = Path(__file__).resolve().parent.parent - -import sys - -if str(PROJECT_ROOT) not in sys.path: - sys.path.insert(0, str(PROJECT_ROOT)) - -from config.services_config import get_translation_config # noqa: E402 -from translation.service import TranslationService # noqa: E402 -from translation.text_splitter import compute_safe_input_token_limit # noqa: E402 - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Benchmark a long-text translation request") - parser.add_argument("--model", default="nllb-200-distilled-600m") - parser.add_argument("--source-lang", default="zh") - parser.add_argument("--target-lang", default="en") - parser.add_argument("--scene", default="sku_name") - parser.add_argument("--source-md", default="docs/DEVELOPER_GUIDE.md") - parser.add_argument("--paragraph-min-chars", type=int, default=250) - parser.add_argument("--target-doc-chars", type=int, default=4500) - parser.add_argument("--min-doc-chars", type=int, default=2400) - parser.add_argument("--runs", type=int, default=3) - parser.add_argument("--batch-size", type=int, default=64) - parser.add_argument("--ct2-inter-threads", type=int, default=4) - parser.add_argument("--ct2-max-queued-batches", type=int, default=32) - parser.add_argument("--ct2-batch-type", default="examples") - parser.add_argument("--max-new-tokens", type=int, default=64) - parser.add_argument("--ct2-decoding-length-mode", default="source") - parser.add_argument("--ct2-decoding-length-extra", type=int, default=8) - parser.add_argument("--ct2-decoding-length-min", type=int, default=32) - return parser.parse_args() - - -def build_long_document(args: argparse.Namespace) -> str: - source_path = (PROJECT_ROOT / args.source_md).resolve() - text = source_path.read_text(encoding="utf-8") - paragraphs = [] - for raw in text.split("\n\n"): - normalized = " ".join(line.strip() for line in raw.splitlines() if line.strip()) - if len(normalized) >= args.paragraph_min_chars and not normalized.startswith("```"): - paragraphs.append(normalized) - - parts = [] - total = 0 - for paragraph in paragraphs: - parts.append(paragraph) - total += len(paragraph) + 2 - if total >= args.target_doc_chars: - break - document = 
"\n\n".join(parts) - if len(document) < args.min_doc_chars: - raise ValueError( - f"Prepared long document is too short: {len(document)} chars < {args.min_doc_chars}" - ) - return document - - -def build_service(args: argparse.Namespace) -> TranslationService: - config = copy.deepcopy(get_translation_config()) - for name, capability in config["capabilities"].items(): - capability["enabled"] = name == args.model - - capability = config["capabilities"][args.model] - capability["use_cache"] = False - capability["batch_size"] = args.batch_size - capability["ct2_inter_threads"] = args.ct2_inter_threads - capability["ct2_max_queued_batches"] = args.ct2_max_queued_batches - capability["ct2_batch_type"] = args.ct2_batch_type - capability["max_new_tokens"] = args.max_new_tokens - capability["ct2_decoding_length_mode"] = args.ct2_decoding_length_mode - capability["ct2_decoding_length_extra"] = args.ct2_decoding_length_extra - capability["ct2_decoding_length_min"] = args.ct2_decoding_length_min - config["default_model"] = args.model - return TranslationService(config) - - -def percentile(values: list[float], p: float) -> float: - if not values: - return 0.0 - ordered = sorted(values) - if len(ordered) == 1: - return float(ordered[0]) - index = min(len(ordered) - 1, max(0, round((len(ordered) - 1) * p))) - return float(ordered[index]) - - -def main() -> None: - args = parse_args() - logging.getLogger().setLevel(logging.WARNING) - - document = build_long_document(args) - load_started = time.perf_counter() - service = build_service(args) - backend = service.get_backend(args.model) - load_seconds = time.perf_counter() - load_started - - safe_input_limit = compute_safe_input_token_limit( - max_input_length=backend.max_input_length, - max_new_tokens=backend.max_new_tokens, - decoding_length_mode=backend.ct2_decoding_length_mode, - decoding_length_extra=backend.ct2_decoding_length_extra, - ) - segments = backend._split_text_if_needed( - document, - target_lang=args.target_lang, - source_lang=args.source_lang, - ) - - # Warm up once before measurements. 
-    _ = service.translate(
-        document,
-        source_lang=args.source_lang,
-        target_lang=args.target_lang,
-        model=args.model,
-        scene=args.scene,
-    )
-    if torch.cuda.is_available():
-        torch.cuda.synchronize()
-
-    latencies_ms: list[float] = []
-    output_chars = 0
-    for _ in range(args.runs):
-        started = time.perf_counter()
-        output = service.translate(
-            document,
-            source_lang=args.source_lang,
-            target_lang=args.target_lang,
-            model=args.model,
-            scene=args.scene,
-        )
-        if torch.cuda.is_available():
-            torch.cuda.synchronize()
-        latencies_ms.append((time.perf_counter() - started) * 1000)
-        output_chars += len(output or "")
-
-    total_seconds = sum(latencies_ms) / 1000.0
-    payload = {
-        "model": args.model,
-        "source_lang": args.source_lang,
-        "target_lang": args.target_lang,
-        "doc_chars": len(document),
-        "runs": args.runs,
-        "load_seconds": round(load_seconds, 3),
-        "batch_size": backend.batch_size,
-        "ct2_inter_threads": backend.ct2_inter_threads,
-        "ct2_max_queued_batches": backend.ct2_max_queued_batches,
-        "ct2_batch_type": backend.ct2_batch_type,
-        "max_new_tokens": backend.max_new_tokens,
-        "ct2_decoding_length_mode": backend.ct2_decoding_length_mode,
-        "ct2_decoding_length_extra": backend.ct2_decoding_length_extra,
-        "ct2_decoding_length_min": backend.ct2_decoding_length_min,
-        "safe_input_limit": safe_input_limit,
-        "segment_count": len(segments),
-        "segment_char_lengths": {
-            "min": min(len(segment) for segment in segments),
-            "max": max(len(segment) for segment in segments),
-            "avg": round(statistics.fmean(len(segment) for segment in segments), 1),
-        },
-        "latency_avg_ms": round(statistics.fmean(latencies_ms), 2),
-        "latency_p50_ms": round(percentile(latencies_ms, 0.50), 2),
-        "latency_p95_ms": round(percentile(latencies_ms, 0.95), 2),
-        "latency_max_ms": round(max(latencies_ms), 2),
-        "input_chars_per_second": round((len(document) * args.runs) / total_seconds, 2),
-        "output_chars_per_second": round(output_chars / total_seconds, 2),
-    }
-    print(json.dumps(payload, ensure_ascii=False))
-
-
-if __name__ == "__main__":
-    main()
diff --git a/scripts/debug/trace_indexer_calls.sh b/scripts/debug/trace_indexer_calls.sh
new file mode 100755
index 0000000..d22b9ea
--- /dev/null
+++ b/scripts/debug/trace_indexer_calls.sh
@@ -0,0 +1,76 @@
+#!/bin/bash
+#
+# Script for tracing who is calling the indexer service.
+# Usage: ./scripts/debug/trace_indexer_calls.sh
+#
+
+set -euo pipefail
+
+# This script lives in scripts/debug/, so go up two levels to the project root.
+cd "$(dirname "$0")/../.."
+source ./activate.sh 2>/dev/null || true
+
+echo "=========================================="
+echo "Indexer service caller diagnostics"
+echo "=========================================="
+
+INDEXER_PORT="${INDEXER_PORT:-6004}"
+EMBEDDING_TEXT_PORT="${EMBEDDING_TEXT_PORT:-6005}"
+EMBEDDING_IMAGE_PORT="${EMBEDDING_IMAGE_PORT:-6008}"
+
+echo ""
+echo "1. Process listening on port 6004 (Indexer service)"
+echo "------------------------------------------"
+if command -v lsof >/dev/null 2>&1; then
+  lsof -i :"${INDEXER_PORT}" 2>/dev/null || echo "  (nothing listening, or lsof lacks permission)"
+else
+  ss -tlnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo "  (nothing listening)"
+fi
+
+echo ""
+echo "2. Clients connected to port 6004 (who is calling the Indexer)"
+echo "------------------------------------------"
+if command -v ss >/dev/null 2>&1; then
+  ss -tnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo "  (no active connections)"
+elif command -v netstat >/dev/null 2>&1; then
+  netstat -tnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo "  (no active connections)"
+else
+  echo "  please install ss or netstat"
+fi
+
+echo ""
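+# Note: in the `ss -tnp` output above, the trailing users:(("python",pid=...,fd=...))
+# field names the local process that owns each socket, which is usually enough to
+# identify a local caller; remote callers only show up as a source IP. Seeing
+# sockets owned by other users may require running this script as root.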
+echo "3. Clients connected to the embedding services"
+echo "------------------------------------------"
+if command -v ss >/dev/null 2>&1; then
+  ss -tnp 2>/dev/null | grep -E ":${EMBEDDING_TEXT_PORT}|:${EMBEDDING_IMAGE_PORT}" || echo "  (no active connections)"
+fi
+
+echo ""
+echo "4. Check scheduled tasks (cron)"
+echo "------------------------------------------"
+(crontab -l 2>/dev/null | grep -i indexer) || echo "  no related cron entries for the current user"
+if [ -d /etc/cron.d ]; then
+  grep -l -i indexer /etc/cron.d/* 2>/dev/null || true
+fi
+
+echo ""
+echo "5. Ports and call flow"
+echo "------------------------------------------"
+echo "  - Indexer service: port ${INDEXER_PORT}"
+echo "    Start: ./scripts/start_indexer.sh or python main.py serve-indexer"
+echo "    Endpoints: POST /indexer/reindex, POST /indexer/index, POST /indexer/build-docs, etc."
+echo ""
+echo "  - Callers (per the docs): external Java programs or HTTP clients such as curl"
+echo "    Full reindex: curl -X POST http://localhost:${INDEXER_PORT}/indexer/reindex -d '{\"tenant_id\":\"170\",\"batch_size\":500}'"
+echo "    Incremental: curl -X POST http://localhost:${INDEXER_PORT}/indexer/index -d '{\"tenant_id\":\"170\",\"spu_ids\":[\"123\"]}'"
+echo ""
+echo "  - The Indexer itself calls:"
+echo "    - Text Embedding service (${EMBEDDING_TEXT_PORT}): POST /embed/text"
+echo "    - Image Embedding service (${EMBEDDING_IMAGE_PORT}): POST /embed/image"
+echo "    - Qwen API: dashscope.aliyuncs.com (translation, LLM analysis)"
+echo "    - MySQL: product data"
+echo "    - Elasticsearch: index writes"
+echo ""
+echo "6. Monitor connections live (press Ctrl+C to stop)"
+echo "------------------------------------------"
+echo "  Run: watch -n 2 'ss -tnp | grep -E \":${INDEXER_PORT}|:${EMBEDDING_TEXT_PORT}|:${EMBEDDING_IMAGE_PORT}\"'"
+echo ""
diff --git a/scripts/indexer__old_2025_11/import_tenant2_csv.py b/scripts/indexer__old_2025_11/import_tenant2_csv.py
deleted file mode 100755
index 063dd77..0000000
--- a/scripts/indexer__old_2025_11/import_tenant2_csv.py
+++ /dev/null
@@ -1,495 +0,0 @@
-#!/usr/bin/env python3
-"""
-Import tenant2 CSV data into MySQL Shoplazza tables.
-
-Reads CSV file and generates SQL INSERT statements for SPU and SKU tables.
-Each CSV row corresponds to 1 SPU and 1 SKU.
-This script is for generating test data for tenant_id=2 from CSV files.
-"""
-
-import sys
-import os
-import csv
-import random
-import argparse
-import re
-from pathlib import Path
-from datetime import datetime, timedelta
-
-# Add parent directory to path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-
-def escape_sql_string(value: str) -> str:
-    """
-    Escape SQL string value (replace single quotes with doubled quotes and handle special characters).
-
-    Args:
-        value: String value to escape
-
-    Returns:
-        Escaped string
-    """
-    if value is None:
-        return ''
-
-    # Convert to string and handle None
-    s = str(value)
-
-    # Replace single quotes with doubled quotes (SQL standard)
-    s = s.replace("'", "''")
-
-    # Replace backslashes (MySQL escape)
-    s = s.replace("\\", "\\\\")
-
-    # Remove or replace control characters that can break SQL
-    # Replace newlines and carriage returns with spaces
-    s = s.replace("\n", " ").replace("\r", " ")
-
-    # Remove other control characters (except tab)
-    s = re.sub(r'[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]', '', s)
-
-    # Remove null bytes
-    s = s.replace('\x00', '')
-
-    return s
-
-
-def generate_handle(title: str) -> str:
-    """
-    Generate URL-friendly handle from title.
- - Args: - title: Product title - - Returns: - URL-friendly handle - """ - # Remove special characters, convert to lowercase, replace spaces with hyphens - handle = re.sub(r'[^\w\s-]', '', title.lower()) - handle = re.sub(r'[-\s]+', '-', handle) - handle = handle.strip('-') - # Limit length - if len(handle) > 255: - handle = handle[:255] - return handle or 'product' - - -def parse_csv_row(row: dict) -> dict: - """ - Parse CSV row and extract fields. - - Args: - row: CSV row dictionary - - Returns: - Parsed data dictionary - """ - # Remove quotes from values if present - def clean_value(value): - if value is None: - return '' - value = str(value).strip() - # Remove surrounding quotes - if value.startswith('"') and value.endswith('"'): - value = value[1:-1] - return value - - return { - 'skuId': clean_value(row.get('skuId', '')), - 'name': clean_value(row.get('name', '')), - 'name_pinyin': clean_value(row.get('name_pinyin', '')), - 'create_time': clean_value(row.get('create_time', '')), - 'ruSkuName': clean_value(row.get('ruSkuName', '')), - 'enSpuName': clean_value(row.get('enSpuName', '')), - 'categoryName': clean_value(row.get('categoryName', '')), - 'supplierName': clean_value(row.get('supplierName', '')), - 'brandName': clean_value(row.get('brandName', '')), - 'file_id': clean_value(row.get('file_id', '')), - 'days_since_last_update': clean_value(row.get('days_since_last_update', '')), - 'id': clean_value(row.get('id', '')), - 'imageUrl': clean_value(row.get('imageUrl', '')) - } - - -def generate_spu_data(csv_data: dict, spu_id: int, tenant_id: str = "2") -> dict: - """ - Generate SPU data from CSV row. - - Args: - csv_data: Parsed CSV row data - spu_id: SPU ID - tenant_id: Tenant ID (default: "2") - - Returns: - SPU data dictionary - """ - # Parse create_time - try: - created_at = datetime.strptime(csv_data['create_time'], '%Y-%m-%d %H:%M:%S') - except: - created_at = datetime.now() - timedelta(days=random.randint(1, 365)) - - updated_at = created_at + timedelta(days=random.randint(0, 30)) - - # Generate handle from title - title = csv_data['name'] or csv_data['enSpuName'] or 'Product' - handle = generate_handle(title) - - # Generate tags from category and brand - tags_parts = [] - if csv_data['categoryName']: - tags_parts.append(csv_data['categoryName']) - if csv_data['brandName']: - tags_parts.append(csv_data['brandName']) - tags = ','.join(tags_parts) if tags_parts else '' - - # Generate SEO fields - seo_title = f"{title} - {csv_data['categoryName']}" if csv_data['categoryName'] else title - seo_description = f"购买{csv_data['brandName']}{title}" if csv_data['brandName'] else title - seo_keywords = f"{title},{csv_data['categoryName']},{csv_data['brandName']}" if csv_data['categoryName'] else title - - spu = { - 'id': spu_id, - 'shop_id': 1, - 'shoplazza_id': csv_data['id'] or f"spu-{spu_id}", - 'handle': handle, - 'title': title, - 'brief': csv_data['name'] or '', - 'description': f"
{csv_data['name']}
" if csv_data['name'] else '', - 'spu': '', - 'vendor': csv_data['supplierName'] or '', - 'vendor_url': '', - 'seo_title': seo_title, - 'seo_description': seo_description, - 'seo_keywords': seo_keywords, - 'image_src': csv_data['imageUrl'] or '', - 'image_width': 800, - 'image_height': 600, - 'image_path': f"products/{spu_id}.jpg", - 'image_alt': title, - 'inventory_policy': '', - 'inventory_quantity': 0, - 'inventory_tracking': '0', - 'published': 1, - 'published_at': created_at.strftime('%Y-%m-%d %H:%M:%S'), - 'requires_shipping': 1, - 'taxable': 0, - 'fake_sales': 0, - 'display_fake_sales': 0, - 'mixed_wholesale': 0, - 'need_variant_image': 0, - 'has_only_default_variant': 0, - 'tags': tags, - 'note': '', - 'category': csv_data['categoryName'] or '', - 'shoplazza_created_at': created_at.strftime('%Y-%m-%d %H:%M:%S'), - 'shoplazza_updated_at': updated_at.strftime('%Y-%m-%d %H:%M:%S'), - 'tenant_id': tenant_id, - 'creator': '1', - 'create_time': created_at.strftime('%Y-%m-%d %H:%M:%S'), - 'updater': '1', - 'update_time': updated_at.strftime('%Y-%m-%d %H:%M:%S'), - 'deleted': 0 - } - - return spu - - -def generate_sku_data(csv_data: dict, spu_id: int, sku_id: int, tenant_id: str = "2") -> dict: - """ - Generate SKU data from CSV row. - - Args: - csv_data: Parsed CSV row data - spu_id: Associated SPU ID - sku_id: SKU ID (from CSV skuId) - tenant_id: Tenant ID (default: "2") - - Returns: - SKU data dictionary - """ - # Parse create_time - try: - created_at = datetime.strptime(csv_data['create_time'], '%Y-%m-%d %H:%M:%S') - except: - created_at = datetime.now() - timedelta(days=random.randint(1, 365)) - - updated_at = created_at + timedelta(days=random.randint(0, 30)) - - # Generate random price - price = round(random.uniform(50, 500), 2) - compare_at_price = round(price * random.uniform(1.2, 1.5), 2) - cost_price = round(price * 0.6, 2) - - # Generate random stock - inventory_quantity = random.randint(0, 100) - - # Generate random weight - weight = round(random.uniform(0.1, 5.0), 2) - - # Use ruSkuName as title, fallback to name - title = csv_data['ruSkuName'] or csv_data['name'] or 'SKU' - - # Use skuId as SKU code - sku_code = csv_data['skuId'] or f"SKU-{sku_id}" - - sku = { - 'id': sku_id, - 'spu_id': spu_id, - 'shop_id': 1, - 'shoplazza_id': f"sku-{sku_id}", - 'shoplazza_product_id': csv_data['id'] or f"spu-{spu_id}", - 'shoplazza_image_id': '', - 'title': title, - 'sku': sku_code, - 'barcode': f"BAR{sku_id:08d}", - 'position': 1, - 'price': price, - 'compare_at_price': compare_at_price, - 'cost_price': cost_price, - 'option1': '', - 'option2': '', - 'option3': '', - 'inventory_quantity': inventory_quantity, - 'weight': weight, - 'weight_unit': 'kg', - 'image_src': csv_data['imageUrl'] or '', - 'wholesale_price': f'[{{"price": {round(price * 0.8, 2)}, "minQuantity": 10}}]', - 'note': '', - 'extend': None, # JSON field, use NULL - 'shoplazza_created_at': created_at.strftime('%Y-%m-%d %H:%M:%S'), - 'shoplazza_updated_at': updated_at.strftime('%Y-%m-%d %H:%M:%S'), - 'tenant_id': tenant_id, - 'creator': '1', - 'create_time': created_at.strftime('%Y-%m-%d %H:%M:%S'), - 'updater': '1', - 'update_time': updated_at.strftime('%Y-%m-%d %H:%M:%S'), - 'deleted': 0 - } - - return sku - - -def read_csv_file(csv_file: str) -> list: - """ - Read CSV file and return list of parsed rows. 
- - Args: - csv_file: Path to CSV file - - Returns: - List of parsed CSV data dictionaries - """ - csv_data_list = [] - - with open(csv_file, 'r', encoding='utf-8') as f: - # Use csv.DictReader to handle quoted fields properly - reader = csv.DictReader(f) - for row in reader: - parsed = parse_csv_row(row) - csv_data_list.append(parsed) - - return csv_data_list - - -def generate_sql_inserts(spus: list, skus: list, output_file: str): - """ - Generate SQL INSERT statements. - - Args: - spus: List of SPU data - skus: List of SKU data - output_file: Output file path - """ - with open(output_file, 'w', encoding='utf-8') as f: - f.write("-- SPU Data from tenant2 CSV\n") - f.write("INSERT INTO shoplazza_product_spu (\n") - f.write(" id, shop_id, shoplazza_id, handle, title, brief, description, spu,\n") - f.write(" vendor, vendor_url, seo_title, seo_description, seo_keywords,\n") - f.write(" image_src, image_width, image_height, image_path, image_alt,\n") - f.write(" inventory_policy, inventory_quantity, inventory_tracking,\n") - f.write(" published, published_at, requires_shipping, taxable,\n") - f.write(" fake_sales, display_fake_sales, mixed_wholesale, need_variant_image,\n") - f.write(" has_only_default_variant, tags, note, category,\n") - f.write(" shoplazza_created_at, shoplazza_updated_at, tenant_id,\n") - f.write(" creator, create_time, updater, update_time, deleted\n") - f.write(") VALUES\n") - - for i, spu in enumerate(spus): - values = ( - f"({spu['id']}, {spu['shop_id']}, '{escape_sql_string(spu['shoplazza_id'])}', " - f"'{escape_sql_string(spu['handle'])}', '{escape_sql_string(spu['title'])}', " - f"'{escape_sql_string(spu['brief'])}', '{escape_sql_string(spu['description'])}', " - f"'{escape_sql_string(spu['spu'])}', '{escape_sql_string(spu['vendor'])}', " - f"'{escape_sql_string(spu['vendor_url'])}', '{escape_sql_string(spu['seo_title'])}', " - f"'{escape_sql_string(spu['seo_description'])}', '{escape_sql_string(spu['seo_keywords'])}', " - f"'{escape_sql_string(spu['image_src'])}', {spu['image_width']}, " - f"{spu['image_height']}, '{escape_sql_string(spu['image_path'])}', " - f"'{escape_sql_string(spu['image_alt'])}', '{escape_sql_string(spu['inventory_policy'])}', " - f"{spu['inventory_quantity']}, '{escape_sql_string(spu['inventory_tracking'])}', " - f"{spu['published']}, '{escape_sql_string(spu['published_at'])}', " - f"{spu['requires_shipping']}, {spu['taxable']}, " - f"{spu['fake_sales']}, {spu['display_fake_sales']}, {spu['mixed_wholesale']}, " - f"{spu['need_variant_image']}, {spu['has_only_default_variant']}, " - f"'{escape_sql_string(spu['tags'])}', '{escape_sql_string(spu['note'])}', " - f"'{escape_sql_string(spu['category'])}', '{escape_sql_string(spu['shoplazza_created_at'])}', " - f"'{escape_sql_string(spu['shoplazza_updated_at'])}', '{escape_sql_string(spu['tenant_id'])}', " - f"'{escape_sql_string(spu['creator'])}', '{escape_sql_string(spu['create_time'])}', " - f"'{escape_sql_string(spu['updater'])}', '{escape_sql_string(spu['update_time'])}', " - f"{spu['deleted']})" - ) - f.write(values) - if i < len(spus) - 1: - f.write(",\n") - else: - f.write(";\n\n") - - f.write("-- SKU Data from tenant2 CSV\n") - f.write("INSERT INTO shoplazza_product_sku (\n") - f.write(" id, spu_id, shop_id, shoplazza_id, shoplazza_product_id, shoplazza_image_id,\n") - f.write(" title, sku, barcode, position, price, compare_at_price, cost_price,\n") - f.write(" option1, option2, option3, inventory_quantity, weight, weight_unit,\n") - f.write(" image_src, wholesale_price, note, extend,\n") - 
f.write(" shoplazza_created_at, shoplazza_updated_at, tenant_id,\n") - f.write(" creator, create_time, updater, update_time, deleted\n") - f.write(") VALUES\n") - - for i, sku in enumerate(skus): - # Handle extend field (JSON, can be NULL) - extend_value = 'NULL' if sku['extend'] is None else f"'{escape_sql_string(sku['extend'])}'" - - values = ( - f"({sku['id']}, {sku['spu_id']}, {sku['shop_id']}, '{escape_sql_string(sku['shoplazza_id'])}', " - f"'{escape_sql_string(sku['shoplazza_product_id'])}', '{escape_sql_string(sku['shoplazza_image_id'])}', " - f"'{escape_sql_string(sku['title'])}', '{escape_sql_string(sku['sku'])}', " - f"'{escape_sql_string(sku['barcode'])}', {sku['position']}, " - f"{sku['price']}, {sku['compare_at_price']}, {sku['cost_price']}, " - f"'{escape_sql_string(sku['option1'])}', '{escape_sql_string(sku['option2'])}', " - f"'{escape_sql_string(sku['option3'])}', {sku['inventory_quantity']}, {sku['weight']}, " - f"'{escape_sql_string(sku['weight_unit'])}', '{escape_sql_string(sku['image_src'])}', " - f"'{escape_sql_string(sku['wholesale_price'])}', '{escape_sql_string(sku['note'])}', " - f"{extend_value}, '{escape_sql_string(sku['shoplazza_created_at'])}', " - f"'{escape_sql_string(sku['shoplazza_updated_at'])}', '{escape_sql_string(sku['tenant_id'])}', " - f"'{escape_sql_string(sku['creator'])}', '{escape_sql_string(sku['create_time'])}', " - f"'{escape_sql_string(sku['updater'])}', '{escape_sql_string(sku['update_time'])}', " - f"{sku['deleted']})" - ) - f.write(values) - if i < len(skus) - 1: - f.write(",\n") - else: - f.write(";\n") - - -def get_max_ids_from_db(db_config=None): - """ - Get maximum IDs from database to avoid primary key conflicts. - - Args: - db_config: Optional database config dict with keys: host, port, database, username, password - - Returns: - tuple: (max_spu_id, max_sku_id) or (0, 0) if cannot connect - """ - if not db_config: - return 0, 0 - - try: - from utils.db_connector import create_db_connection - from sqlalchemy import text - - db_engine = create_db_connection( - host=db_config['host'], - port=db_config['port'], - database=db_config['database'], - username=db_config['username'], - password=db_config['password'] - ) - - with db_engine.connect() as conn: - result = conn.execute(text('SELECT MAX(id) FROM shoplazza_product_spu')) - max_spu_id = result.scalar() or 0 - - result = conn.execute(text('SELECT MAX(id) FROM shoplazza_product_sku')) - max_sku_id = result.scalar() or 0 - - return max_spu_id, max_sku_id - except Exception as e: - print(f"Warning: Could not get max IDs from database: {e}") - return 0, 0 - - -def main(): - parser = argparse.ArgumentParser(description='Import tenant2 CSV data into MySQL Shoplazza tables') - parser.add_argument('--csv-file', required=True, help='CSV file path') - parser.add_argument('--tenant-id', default='2', help='Tenant ID (default: 2)') - parser.add_argument('--start-spu-id', type=int, default=None, help='Starting SPU ID (default: auto-calculate from DB)') - parser.add_argument('--output', default='tenant2_data.sql', help='Output SQL file (default: tenant2_data.sql)') - parser.add_argument('--db-host', help='Database host (for auto-calculating start IDs)') - parser.add_argument('--db-port', type=int, default=3306, help='Database port (default: 3306)') - parser.add_argument('--db-database', help='Database name (for auto-calculating start IDs)') - parser.add_argument('--db-username', help='Database username (for auto-calculating start IDs)') - parser.add_argument('--db-password', help='Database password 
(for auto-calculating start IDs)') - - args = parser.parse_args() - - print(f"Reading CSV file: {args.csv_file}") - csv_data_list = read_csv_file(args.csv_file) - print(f"Read {len(csv_data_list)} rows from CSV") - - # Auto-calculate start IDs if not provided and DB config available - start_spu_id = args.start_spu_id - if start_spu_id is None and args.db_host and args.db_database and args.db_username and args.db_password: - print("Auto-calculating start IDs from database...") - db_config = { - 'host': args.db_host, - 'port': args.db_port, - 'database': args.db_database, - 'username': args.db_username, - 'password': args.db_password - } - max_spu_id, max_sku_id = get_max_ids_from_db(db_config) - start_spu_id = max_spu_id + 1 - print(f" Max SPU ID in DB: {max_spu_id}") - print(f" Using start SPU ID: {start_spu_id}") - elif start_spu_id is None: - start_spu_id = 1 - print(f"Using default start SPU ID: {start_spu_id}") - - # Generate SPU and SKU data - print(f"Generating SPU and SKU data (tenant_id={args.tenant_id})...") - spus = [] - skus = [] - spu_id = start_spu_id - - for csv_data in csv_data_list: - # Generate SPU - spu = generate_spu_data(csv_data, spu_id, args.tenant_id) - spus.append(spu) - - # Generate SKU - use skuId from CSV as SKU ID - try: - sku_id = int(csv_data['skuId']) - except: - # If skuId is not valid, use a generated ID - sku_id = 1000000 + spu_id - - sku = generate_sku_data(csv_data, spu_id, sku_id, args.tenant_id) - skus.append(sku) - - spu_id += 1 - - print(f"Generated {len(spus)} SPUs and {len(skus)} SKUs") - - # Generate SQL file - print(f"Generating SQL file: {args.output}") - generate_sql_inserts(spus, skus, args.output) - print(f"SQL file generated: {args.output}") - print(f" - SPUs: {len(spus)}") - print(f" - SKUs: {len(skus)}") - - -if __name__ == '__main__': - main() - diff --git a/scripts/indexer__old_2025_11/import_test_data.py b/scripts/indexer__old_2025_11/import_test_data.py deleted file mode 100644 index 97ea83d..0000000 --- a/scripts/indexer__old_2025_11/import_test_data.py +++ /dev/null @@ -1,277 +0,0 @@ -#!/usr/bin/env python3 -""" -Import test data into MySQL Shoplazza tables. - -Reads SQL file generated by generate_test_data.py and imports into MySQL. -""" - -import sys -import os -import argparse -from pathlib import Path - -# Add parent directory to path -sys.path.insert(0, str(Path(__file__).parent.parent)) - -from utils.db_connector import create_db_connection, test_connection - - -def import_sql_file(db_engine, sql_file: str): - """ - Import SQL file into database using MySQL client (more reliable for large files). 
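-    Falls back to import_sql_file_sqlalchemy() if the mysql client binary is
-    missing or the client invocation fails.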
- - Args: - db_engine: SQLAlchemy database engine (used to get connection info) - sql_file: Path to SQL file - """ - import subprocess - import os - from pathlib import Path - - # Get connection info from engine URL - engine_url = str(db_engine.url) - # Parse: mysql+pymysql://user:pass@host:port/database - import re - match = re.match(r'mysql\+pymysql://([^:]+):([^@]+)@([^:]+):(\d+)/(.+)', engine_url) - if not match: - raise ValueError(f"Cannot parse database URL: {engine_url}") - - username, password, host, port, database = match.groups() - - # Use MySQL client to execute SQL file (more reliable) - sql_file_path = Path(sql_file).absolute() - - # Build mysql command - mysql_cmd = [ - 'mysql', - f'-h{host}', - f'-P{port}', - f'-u{username}', - f'-p{password}', - database - ] - - print(f"Executing SQL file using MySQL client...") - print(f" File: {sql_file_path}") - print(f" Database: {host}:{port}/{database}") - - try: - with open(sql_file_path, 'r', encoding='utf-8') as f: - result = subprocess.run( - mysql_cmd, - stdin=f, - capture_output=True, - text=True, - timeout=300 # 5 minute timeout - ) - - if result.returncode != 0: - error_msg = result.stderr or result.stdout - print(f"ERROR: MySQL execution failed") - print(f"Error output: {error_msg[:500]}") - raise Exception(f"MySQL execution failed: {error_msg[:200]}") - - print("SQL file executed successfully") - return True - - except FileNotFoundError: - # Fallback to SQLAlchemy if mysql client not available - print("MySQL client not found, falling back to SQLAlchemy...") - return import_sql_file_sqlalchemy(db_engine, sql_file) - except subprocess.TimeoutExpired: - raise Exception("SQL execution timed out after 5 minutes") - except Exception as e: - print(f"Error using MySQL client: {e}") - print("Falling back to SQLAlchemy...") - return import_sql_file_sqlalchemy(db_engine, sql_file) - - -def import_sql_file_sqlalchemy(db_engine, sql_file: str): - """ - Fallback method: Import SQL file using SQLAlchemy (for when mysql client unavailable). 
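-    Strips comment lines, splits the script on semicolons outside string
-    literals, and executes the resulting INSERT statements one by one on a raw
-    pymysql cursor.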
- """ - from sqlalchemy import text - - with open(sql_file, 'r', encoding='utf-8') as f: - sql_content = f.read() - - # Remove comment lines - lines = sql_content.split('\n') - cleaned_lines = [] - for line in lines: - stripped = line.lstrip() - if stripped.startswith('--'): - continue - cleaned_lines.append(line) - - sql_content = '\n'.join(cleaned_lines) - - # Split by semicolon - but we need to handle strings properly - # Use a state machine to track string boundaries - statements = [] - current = [] - in_string = False - i = 0 - - while i < len(sql_content): - char = sql_content[i] - - if char == "'": - # Check for escaped quote (two single quotes) - if i + 1 < len(sql_content) and sql_content[i+1] == "'": - current.append("''") - i += 1 # Skip next quote - elif not in_string: - in_string = True - current.append(char) - else: - in_string = False - current.append(char) - else: - current.append(char) - - # Split on semicolon only if not in string - if char == ';' and not in_string: - stmt = ''.join(current).strip() - if stmt and stmt.upper().startswith('INSERT INTO'): - statements.append(stmt) - current = [] - - i += 1 - - # Handle last statement - if current: - stmt = ''.join(current).strip() - if stmt and stmt.upper().startswith('INSERT INTO'): - statements.append(stmt) - - print(f"Parsed {len(statements)} SQL statements") - print(f"Executing {len(statements)} SQL statements...") - - # Use raw connection to avoid SQLAlchemy parameter parsing - raw_conn = db_engine.raw_connection() - try: - cursor = raw_conn.cursor() - try: - for i, statement in enumerate(statements, 1): - try: - # Execute raw SQL directly using pymysql cursor - cursor.execute(statement) - raw_conn.commit() - if i % 1000 == 0 or i == len(statements): - print(f" [{i}/{len(statements)}] Executed successfully") - except Exception as e: - print(f" [{i}/{len(statements)}] ERROR: {e}") - error_start = max(0, statement.find('VALUES') - 100) - error_end = min(len(statement), error_start + 500) - print(f" Statement context: ...{statement[error_start:error_end]}...") - raise - finally: - cursor.close() - finally: - raw_conn.close() - - return True - - -def verify_import(db_engine, tenant_id: str): - """ - Verify imported data. 
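The quote-aware statement splitter above is the subtle part of the SQLAlchemy fallback. The same state machine, reduced to a self-checkable sketch (unlike the script it keeps all statements rather than filtering to INSERTs):

```python
def split_sql(sql: str) -> list:
    """Split on ';' outside single-quoted strings; '' inside a string is an escaped quote."""
    statements = []
    current = []
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if ch == "'":
            if i + 1 < len(sql) and sql[i + 1] == "'":
                current.append("''")
                i += 1  # consume the escaped quote without toggling state
            else:
                in_string = not in_string
                current.append(ch)
        else:
            current.append(ch)
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements

assert split_sql("INSERT INTO t VALUES ('a;b');") == ["INSERT INTO t VALUES ('a;b');"]
assert split_sql("INSERT INTO t VALUES ('it''s');") == ["INSERT INTO t VALUES ('it''s');"]
```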
- - Args: - db_engine: SQLAlchemy database engine - tenant_id: Tenant ID to verify - """ - from sqlalchemy import text - - with db_engine.connect() as conn: - # Count SPUs - result = conn.execute(text("SELECT COUNT(*) FROM shoplazza_product_spu WHERE tenant_id = :tenant_id"), {"tenant_id": tenant_id}) - spu_count = result.scalar() - - # Count SKUs - result = conn.execute(text("SELECT COUNT(*) FROM shoplazza_product_sku WHERE tenant_id = :tenant_id"), {"tenant_id": tenant_id}) - sku_count = result.scalar() - - print(f"\nVerification:") - print(f" SPUs: {spu_count}") - print(f" SKUs: {sku_count}") - - return spu_count, sku_count - - -def main(): - parser = argparse.ArgumentParser(description='Import test data into MySQL') - - # Database connection - parser.add_argument('--db-host', required=True, help='MySQL host') - parser.add_argument('--db-port', type=int, default=3306, help='MySQL port (default: 3306)') - parser.add_argument('--db-database', required=True, help='MySQL database name') - parser.add_argument('--db-username', required=True, help='MySQL username') - parser.add_argument('--db-password', required=True, help='MySQL password') - - # Import options - parser.add_argument('--sql-file', required=True, help='SQL file to import') - parser.add_argument('--tenant-id', help='Tenant ID to verify (optional)') - - args = parser.parse_args() - - print(f"Connecting to MySQL: {args.db_host}:{args.db_port}/{args.db_database}") - - # Connect to database - try: - db_engine = create_db_connection( - host=args.db_host, - port=args.db_port, - database=args.db_database, - username=args.db_username, - password=args.db_password - ) - except Exception as e: - print(f"ERROR: Failed to connect to MySQL: {e}") - return 1 - - # Test connection - if not test_connection(db_engine): - print("ERROR: Database connection test failed") - return 1 - - print("Database connection successful") - - # Clean existing data if tenant_id provided - if args.tenant_id: - print(f"\nCleaning existing data for tenant_id: {args.tenant_id}") - from sqlalchemy import text - try: - with db_engine.connect() as conn: - # Delete SKUs first (foreign key constraint) - conn.execute(text(f"DELETE FROM shoplazza_product_sku WHERE tenant_id = '{args.tenant_id}'")) - # Delete SPUs - conn.execute(text(f"DELETE FROM shoplazza_product_spu WHERE tenant_id = '{args.tenant_id}'")) - conn.commit() - print("✓ Existing data cleaned") - except Exception as e: - print(f"⚠ Warning: Failed to clean existing data: {e}") - # Continue anyway - - # Import SQL file - print(f"\nImporting SQL file: {args.sql_file}") - try: - import_sql_file(db_engine, args.sql_file) - print("Import completed successfully") - except Exception as e: - print(f"ERROR: Failed to import SQL file: {e}") - import traceback - traceback.print_exc() - return 1 - - # Verify import if tenant_id provided - if args.tenant_id: - verify_import(db_engine, args.tenant_id) - - return 0 - - -if __name__ == '__main__': - sys.exit(main()) - diff --git a/scripts/indexer__old_2025_11/ingest.sh b/scripts/indexer__old_2025_11/ingest.sh deleted file mode 100755 index 572ab81..0000000 --- a/scripts/indexer__old_2025_11/ingest.sh +++ /dev/null @@ -1,92 +0,0 @@ -#!/bin/bash - -# Unified data ingestion script for saas-search -# Ingests data from MySQL to Elasticsearch -# -# [LEGACY] 此脚本仅保留用于历史兼容,不建议新流程继续使用。 -# 推荐改用: -# 1) ./scripts/create_tenant_index.sh -# 2) POST /indexer/reindex - -cd "$(dirname "$0")/.." 
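The recommended replacement flow mentioned in the header boils down to a single HTTP call once the tenant index exists. A minimal sketch; the endpoint shape matches the curl examples in trace_indexer_calls.sh later in this patch, and the helper name is hypothetical:

```python
import requests

def trigger_reindex(tenant_id: str, base_url: str = "http://localhost:6004",
                    batch_size: int = 500) -> dict:
    """Full reindex through the Indexer service instead of this legacy shell pipeline."""
    resp = requests.post(
        f"{base_url}/indexer/reindex",
        json={"tenant_id": tenant_id, "batch_size": batch_size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```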
-source /home/tw/miniconda3/etc/profile.d/conda.sh -conda activate searchengine - -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' - -echo -e "${GREEN}========================================${NC}" -echo -e "${GREEN}数据灌入脚本${NC}" -echo -e "${GREEN}========================================${NC}" - -# Load config from .env file if it exists -if [ -f .env ]; then - set -a - source .env - set +a -fi - -# Parameters -TENANT_ID=${1:-""} -RECREATE_INDEX=${2:-"false"} - -DB_HOST=${DB_HOST:-"120.79.247.228"} -DB_PORT=${DB_PORT:-"3316"} -DB_DATABASE=${DB_DATABASE:-"saas"} -DB_USERNAME=${DB_USERNAME:-"saas"} -DB_PASSWORD=${DB_PASSWORD:-"P89cZHS5d7dFyc9R"} -ES_HOST=${ES_HOST:-"http://localhost:9200"} -BATCH_SIZE=${BATCH_SIZE:-500} - -echo -e "\n${YELLOW}Configuration:${NC}" -echo " Tenant ID: $TENANT_ID" -echo " Recreate Index: $RECREATE_INDEX" -echo " MySQL: $DB_HOST:$DB_PORT/$DB_DATABASE" -echo " Elasticsearch: $ES_HOST" -echo " Batch Size: $BATCH_SIZE" - -# Validate parameters -if [ -z "$TENANT_ID" ]; then - echo -e "${RED}ERROR: Tenant ID is required${NC}" - echo "Usage: $0 [recreate_index]" - echo " tenant_id: Required, tenant ID" - echo " recreate_index: Optional, recreate index if exists (true/false, default: false)" - exit 1 -fi - -if [ -z "$DB_PASSWORD" ]; then - echo -e "${RED}ERROR: DB_PASSWORD未设置,请检查.env文件或环境变量${NC}" - exit 1 -fi - -# Build command -CMD="python scripts/ingest_shoplazza.py \ - --db-host $DB_HOST \ - --db-port $DB_PORT \ - --db-database $DB_DATABASE \ - --db-username $DB_USERNAME \ - --db-password $DB_PASSWORD \ - --tenant-id $TENANT_ID \ - --es-host $ES_HOST \ - --batch-size $BATCH_SIZE" - -if [ "$RECREATE_INDEX" = "true" ] || [ "$RECREATE_INDEX" = "1" ]; then - CMD="$CMD --recreate" - echo -e "\n${YELLOW}Warning: Index will be deleted and recreated!${NC}" -fi - -echo -e "\n${YELLOW}Starting data ingestion...${NC}" -eval $CMD - -if [ $? -eq 0 ]; then - echo -e "\n${GREEN}========================================${NC}" - echo -e "${GREEN}数据灌入完成!${NC}" - echo -e "${GREEN}========================================${NC}" -else - echo -e "\n${RED}========================================${NC}" - echo -e "${RED}数据灌入失败!${NC}" - echo -e "${RED}========================================${NC}" - exit 1 -fi diff --git a/scripts/indexer__old_2025_11/ingest_shoplazza.py b/scripts/indexer__old_2025_11/ingest_shoplazza.py deleted file mode 100644 index 60699c0..0000000 --- a/scripts/indexer__old_2025_11/ingest_shoplazza.py +++ /dev/null @@ -1,146 +0,0 @@ -#!/usr/bin/env python3 -""" -Shoplazza data ingestion script. - -Loads SPU and SKU data from MySQL and indexes into Elasticsearch using SPU transformer. 
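Both this script and recreate_and_import.py below build the Elasticsearch client the same way: basic auth is used only when both env vars are set. A condensed sketch of that shared pattern (`ESClient` is this repo's wrapper; the helper name is hypothetical):

```python
import os

def make_es_client(es_host: str):
    """Create an ESClient, enabling basic auth only when both env vars are present."""
    from utils.es_client import ESClient  # repo-local wrapper
    user = os.environ.get("ES_USERNAME")
    password = os.environ.get("ES_PASSWORD")
    if user and password:
        return ESClient(hosts=[es_host], username=user, password=password)
    return ESClient(hosts=[es_host])
```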
-""" - -import sys -import os -import argparse -from pathlib import Path - -# Add parent directory to path -sys.path.insert(0, str(Path(__file__).parent.parent)) - -from utils.db_connector import create_db_connection -from utils.es_client import ESClient -from indexer.spu_transformer import SPUTransformer -from indexer.mapping_generator import load_mapping, DEFAULT_INDEX_NAME -from indexer.bulk_indexer import BulkIndexer - - -def main(): - parser = argparse.ArgumentParser(description='Ingest Shoplazza SPU/SKU data into Elasticsearch') - - # Database connection - parser.add_argument('--db-host', required=True, help='MySQL host') - parser.add_argument('--db-port', type=int, default=3306, help='MySQL port (default: 3306)') - parser.add_argument('--db-database', required=True, help='MySQL database name') - parser.add_argument('--db-username', required=True, help='MySQL username') - parser.add_argument('--db-password', required=True, help='MySQL password') - - # Tenant and index - parser.add_argument('--tenant-id', required=True, help='Tenant ID (required)') - parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host') - - # Options - parser.add_argument('--recreate', action='store_true', help='Recreate index if exists') - parser.add_argument('--batch-size', type=int, default=500, help='Batch size for indexing (default: 500)') - - args = parser.parse_args() - - print(f"Starting Shoplazza data ingestion for tenant: {args.tenant_id}") - - # Load mapping from JSON file - try: - mapping = load_mapping() - print(f"Loaded mapping configuration") - except Exception as e: - print(f"ERROR: Failed to load mapping: {e}") - return 1 - - index_name = DEFAULT_INDEX_NAME - - # Connect to MySQL - print(f"Connecting to MySQL: {args.db_host}:{args.db_port}/{args.db_database}") - try: - db_engine = create_db_connection( - host=args.db_host, - port=args.db_port, - database=args.db_database, - username=args.db_username, - password=args.db_password - ) - except Exception as e: - print(f"ERROR: Failed to connect to MySQL: {e}") - return 1 - - # Connect to Elasticsearch - es_host = args.es_host - es_username = os.environ.get('ES_USERNAME') - es_password = os.environ.get('ES_PASSWORD') - - print(f"Connecting to Elasticsearch: {es_host}") - if es_username and es_password: - print(f"Using authentication: {es_username}") - es_client = ESClient(hosts=[es_host], username=es_username, password=es_password) - else: - es_client = ESClient(hosts=[es_host]) - - if not es_client.ping(): - print(f"ERROR: Cannot connect to Elasticsearch at {es_host}") - return 1 - - # Create index if needed - if args.recreate: - if es_client.index_exists(index_name): - print(f"Deleting existing index: {index_name}") - if not es_client.delete_index(index_name): - print(f"ERROR: Failed to delete index '{index_name}'") - return 1 - - if not es_client.index_exists(index_name): - print(f"Creating index: {index_name}") - if not es_client.create_index(index_name, mapping): - print(f"ERROR: Failed to create index '{index_name}'") - print("Please check the mapping configuration and try again.") - return 1 - else: - print(f"Using existing index: {index_name}") - - # Initialize SPU transformer - print(f"Initializing SPU transformer for tenant: {args.tenant_id}") - transformer = SPUTransformer(db_engine, args.tenant_id) - - # Transform data - print("Transforming SPU and SKU data...") - try: - documents = transformer.transform_batch() - print(f"Transformed {len(documents)} SPU documents") - except Exception as e: - 
print(f"ERROR: Failed to transform data: {e}") - import traceback - traceback.print_exc() - return 1 - - if not documents: - print("WARNING: No documents to index") - return 0 - - # Bulk index - print(f"Indexing {len(documents)} documents (batch size: {args.batch_size})...") - indexer = BulkIndexer(es_client, index_name, batch_size=args.batch_size) - - try: - results = indexer.index_documents(documents, id_field="spu_id", show_progress=True) - print(f"\nIngestion complete:") - print(f" Success: {results['success']}") - print(f" Failed: {results['failed']}") - print(f" Time: {results.get('elapsed_time', 0):.2f}s") - - if results['failed'] > 0: - print(f"\nWARNING: {results['failed']} documents failed to index") - return 1 - - return 0 - except Exception as e: - print(f"ERROR: Failed to index documents: {e}") - import traceback - traceback.print_exc() - return 1 - - -if __name__ == '__main__': - sys.exit(main()) - diff --git a/scripts/indexer__old_2025_11/recreate_and_import.py b/scripts/indexer__old_2025_11/recreate_and_import.py deleted file mode 100755 index af0a448..0000000 --- a/scripts/indexer__old_2025_11/recreate_and_import.py +++ /dev/null @@ -1,184 +0,0 @@ -#!/usr/bin/env python3 -""" -重建索引并导入数据的脚本。 - -清除旧索引,使用新的mapping重建索引,然后导入数据。 -""" - -import sys -import os -import argparse -from pathlib import Path - -# Add parent directory to path -sys.path.insert(0, str(Path(__file__).parent.parent)) - -from utils.db_connector import create_db_connection -from utils.es_client import ESClient -from indexer.mapping_generator import load_mapping, delete_index_if_exists, DEFAULT_INDEX_NAME -from indexer.spu_transformer import SPUTransformer -from indexer.bulk_indexer import BulkIndexer - - -def main(): - parser = argparse.ArgumentParser(description='重建ES索引并导入数据') - - # Database connection - parser.add_argument('--db-host', help='MySQL host (或使用环境变量 DB_HOST)') - parser.add_argument('--db-port', type=int, help='MySQL port (或使用环境变量 DB_PORT, 默认: 3306)') - parser.add_argument('--db-database', help='MySQL database (或使用环境变量 DB_DATABASE)') - parser.add_argument('--db-username', help='MySQL username (或使用环境变量 DB_USERNAME)') - parser.add_argument('--db-password', help='MySQL password (或使用环境变量 DB_PASSWORD)') - - # Tenant and ES - parser.add_argument('--tenant-id', required=True, help='Tenant ID (必需)') - parser.add_argument('--es-host', help='Elasticsearch host (或使用环境变量 ES_HOST, 默认: http://localhost:9200)') - - # Options - parser.add_argument('--batch-size', type=int, default=500, help='批量导入大小 (默认: 500)') - parser.add_argument('--skip-delete', action='store_true', help='跳过删除旧索引步骤') - - args = parser.parse_args() - - print("=" * 60) - print("重建ES索引并导入数据") - print("=" * 60) - - # 加载mapping - print("\n[1/4] 加载mapping配置...") - try: - mapping = load_mapping() - print(f"✓ 成功加载mapping配置") - except Exception as e: - print(f"✗ 加载mapping失败: {e}") - return 1 - - index_name = DEFAULT_INDEX_NAME - print(f"索引名称: {index_name}") - - # 连接Elasticsearch - print("\n[2/4] 连接Elasticsearch...") - es_host = args.es_host or os.environ.get('ES_HOST', 'http://localhost:9200') - es_username = os.environ.get('ES_USERNAME') - es_password = os.environ.get('ES_PASSWORD') - - print(f"ES地址: {es_host}") - if es_username: - print(f"ES用户名: {es_username}") - - try: - if es_username and es_password: - es_client = ESClient(hosts=[es_host], username=es_username, password=es_password) - else: - es_client = ESClient(hosts=[es_host]) - - if not es_client.ping(): - print(f"✗ 无法连接到Elasticsearch: {es_host}") - return 1 - print("✓ Elasticsearch连接成功") - 
except Exception as e: - print(f"✗ 连接Elasticsearch失败: {e}") - return 1 - - # 删除旧索引 - if not args.skip_delete: - print("\n[3/4] 删除旧索引...") - if es_client.index_exists(index_name): - print(f"发现已存在的索引: {index_name}") - if delete_index_if_exists(es_client, index_name): - print(f"✓ 成功删除索引: {index_name}") - else: - print(f"✗ 删除索引失败: {index_name}") - return 1 - else: - print(f"索引不存在,跳过删除: {index_name}") - else: - print("\n[3/4] 跳过删除旧索引步骤") - - # 创建新索引 - print("\n[4/4] 创建新索引...") - try: - if es_client.index_exists(index_name): - print(f"✓ 索引已存在: {index_name},跳过创建") - else: - print(f"创建索引: {index_name}") - if es_client.create_index(index_name, mapping): - print(f"✓ 成功创建索引: {index_name}") - else: - print(f"✗ 创建索引失败: {index_name}") - return 1 - except Exception as e: - print(f"✗ 创建索引失败: {e}") - import traceback - traceback.print_exc() - return 1 - - # 连接MySQL - print("\n[5/5] 连接MySQL...") - db_host = args.db_host or os.environ.get('DB_HOST') - db_port = args.db_port or int(os.environ.get('DB_PORT', 3306)) - db_database = args.db_database or os.environ.get('DB_DATABASE') - db_username = args.db_username or os.environ.get('DB_USERNAME') - db_password = args.db_password or os.environ.get('DB_PASSWORD') - - if not all([db_host, db_database, db_username, db_password]): - print("✗ MySQL连接参数不完整") - print("请提供 --db-host, --db-database, --db-username, --db-password") - print("或设置环境变量: DB_HOST, DB_DATABASE, DB_USERNAME, DB_PASSWORD") - return 1 - - print(f"MySQL: {db_host}:{db_port}/{db_database}") - try: - db_engine = create_db_connection( - host=db_host, - port=db_port, - database=db_database, - username=db_username, - password=db_password - ) - print("✓ MySQL连接成功") - except Exception as e: - print(f"✗ 连接MySQL失败: {e}") - return 1 - - # 导入数据 - print("\n[6/6] 导入数据...") - print(f"Tenant ID: {args.tenant_id}") - print(f"批量大小: {args.batch_size}") - - try: - transformer = SPUTransformer(db_engine, args.tenant_id) - print("正在转换数据...") - documents = transformer.transform_batch() - print(f"✓ 转换完成: {len(documents)} 个文档") - - if not documents: - print("⚠ 没有数据需要导入") - return 0 - - print(f"正在导入数据到ES (批量大小: {args.batch_size})...") - indexer = BulkIndexer(es_client, index_name, batch_size=args.batch_size) - results = indexer.index_documents(documents, id_field="spu_id", show_progress=True) - - print(f"\n{'='*60}") - print("导入完成!") - print(f"{'='*60}") - print(f"成功: {results['success']}") - print(f"失败: {results['failed']}") - print(f"耗时: {results.get('elapsed_time', 0):.2f}秒") - - if results['failed'] > 0: - print(f"\n⚠ 警告: {results['failed']} 个文档导入失败") - return 1 - - return 0 - except Exception as e: - print(f"✗ 导入数据失败: {e}") - import traceback - traceback.print_exc() - return 1 - - -if __name__ == '__main__': - sys.exit(main()) - diff --git a/scripts/install_server_deps.sh b/scripts/install_server_deps.sh deleted file mode 100755 index b144d65..0000000 --- a/scripts/install_server_deps.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/bin/bash - -echo "Installing server security dependencies..." - -# Check if we're in a conda environment -if [ -z "$CONDA_DEFAULT_ENV" ]; then - echo "Warning: No conda environment detected. Installing with pip..." - pip install slowapi>=0.1.9 anyio>=3.7.0 -else - echo "Installing in conda environment: $CONDA_DEFAULT_ENV" - pip install slowapi>=0.1.9 anyio>=3.7.0 -fi - -echo "Dependencies installed successfully!" 
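For context on what these two dependencies are for: slowapi provides rate limiting for FastAPI apps (with anyio as its async foundation). A minimal, self-contained sketch of the usual wiring, following slowapi's documented quickstart; the route itself is hypothetical, not code from this repo:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # rate-limit per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/ping")
@limiter.limit("10/minute")  # returns HTTP 429 beyond this rate
async def ping(request: Request):
    return {"status": "ok"}
```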
\ No newline at end of file diff --git a/scripts/patch_rerank_vllm_benchmark_config.py b/scripts/patch_rerank_vllm_benchmark_config.py deleted file mode 100755 index c2daec9..0000000 --- a/scripts/patch_rerank_vllm_benchmark_config.py +++ /dev/null @@ -1,100 +0,0 @@ -#!/usr/bin/env python3 -""" -Surgically patch config/config.yaml: - services.rerank.backend - services.rerank.backends.qwen3_vllm.instruction_format - services.rerank.backends.qwen3_vllm_score.instruction_format - -Preserves comments and unrelated lines. Used for benchmark matrix runs. -""" - -from __future__ import annotations - -import argparse -import re -import sys -from pathlib import Path - - -def _with_stripped_body(line: str) -> tuple[str, str]: - """Return (body without end newline, newline suffix including '' if none).""" - if line.endswith("\r\n"): - return line[:-2], "\r\n" - if line.endswith("\n"): - return line[:-1], "\n" - return line, "" - - -def _patch_backend_in_rerank_block(lines: list[str], backend: str) -> None: - in_rerank = False - for i, line in enumerate(lines): - if line.startswith(" rerank:"): - in_rerank = True - continue - if in_rerank: - if line.startswith(" ") and not line.startswith(" ") and line.strip(): - in_rerank = False - continue - body, nl = _with_stripped_body(line) - m = re.match(r'^(\s*backend:\s*")[^"]+(".*)$', body) - if m: - lines[i] = f'{m.group(1)}{backend}{m.group(2)}{nl}' - return - raise RuntimeError("services.rerank.backend line not found") - - -def _patch_instruction_format_under_backend( - lines: list[str], section: str, fmt: str -) -> None: - """section is 'qwen3_vllm' or 'qwen3_vllm_score' (first line is ' qwen3_vllm:').""" - header = f" {section}:" - start = None - for i, line in enumerate(lines): - if line.rstrip() == header: - start = i - break - if start is None: - raise RuntimeError(f"section {section!r} not found") - - for j in range(start + 1, len(lines)): - line = lines[j] - body, nl = _with_stripped_body(line) - if re.match(r"^ [a-zA-Z0-9_]+:\s*$", body): - break - m = re.match(r"^(\s*instruction_format:\s*)\S+", body) - if m: - lines[j] = f"{m.group(1)}{fmt}{nl}" - return - raise RuntimeError(f"instruction_format not found under {section!r}") - - -def main() -> int: - p = argparse.ArgumentParser() - p.add_argument( - "--config", - type=Path, - default=Path(__file__).resolve().parent.parent / "config" / "config.yaml", - ) - p.add_argument("--backend", choices=("qwen3_vllm", "qwen3_vllm_score"), required=True) - p.add_argument( - "--instruction-format", - dest="instruction_format", - choices=("compact", "standard"), - required=True, - ) - args = p.parse_args() - text = args.config.read_text(encoding="utf-8") - lines = text.splitlines(keepends=True) - if not lines: - print("empty config", file=sys.stderr) - return 2 - _patch_backend_in_rerank_block(lines, args.backend) - _patch_instruction_format_under_backend(lines, "qwen3_vllm", args.instruction_format) - _patch_instruction_format_under_backend(lines, "qwen3_vllm_score", args.instruction_format) - args.config.write_text("".join(lines), encoding="utf-8") - print(f"patched {args.config}: backend={args.backend} instruction_format={args.instruction_format} (both vLLM blocks)") - return 0 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/scripts/perf_api_benchmark.py b/scripts/perf_api_benchmark.py deleted file mode 100755 index 4795f2e..0000000 --- a/scripts/perf_api_benchmark.py +++ /dev/null @@ -1,757 +0,0 @@ -#!/usr/bin/env python3 -""" -API-level performance test script for search stack 
services. - -Default scenarios (aligned with docs/搜索API对接指南 分册,如 -01 / -02 / -07): -- backend_search POST /search/ -- backend_suggest GET /search/suggestions -- embed_text POST /embed/text -- embed_image POST /embed/image -- translate POST /translate -- rerank POST /rerank - -Examples: - python scripts/perf_api_benchmark.py --scenario backend_search --duration 30 --concurrency 20 --tenant-id 162 - python scripts/perf_api_benchmark.py --scenario backend_suggest --duration 30 --concurrency 50 --tenant-id 162 - python scripts/perf_api_benchmark.py --scenario all --duration 60 --concurrency 80 --tenant-id 162 - python scripts/perf_api_benchmark.py --scenario all --cases-file scripts/perf_cases.json.example --output perf_result.json - # Embedding admission / priority (query param `priority`; same semantics as embedding service): - python scripts/perf_api_benchmark.py --scenario embed_text --embed-text-priority 1 --duration 30 --concurrency 20 - python scripts/perf_api_benchmark.py --scenario embed_image --embed-image-priority 1 --duration 30 --concurrency 10 -""" - -from __future__ import annotations - -import argparse -import asyncio -import json -import math -import random -import statistics -import time -from dataclasses import dataclass -from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple - -import httpx - - -@dataclass -class RequestTemplate: - method: str - path: str - params: Optional[Dict[str, Any]] = None - json_body: Optional[Any] = None - headers: Optional[Dict[str, str]] = None - - -@dataclass -class Scenario: - name: str - templates: List[RequestTemplate] - timeout_sec: float - - -@dataclass -class RequestResult: - ok: bool - status_code: int - latency_ms: float - error: str = "" - - -def _is_finite_number(v: Any) -> bool: - if isinstance(v, bool): - return False - if isinstance(v, (int, float)): - return math.isfinite(float(v)) - return False - - -def validate_response_payload( - scenario_name: str, - tpl: RequestTemplate, - payload: Any, -) -> Tuple[bool, str]: - """ - Lightweight payload validation for correctness-aware perf tests. - Strict for embed_text / embed_image to catch NaN/null vector regressions. 
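Concretely, this strict validation exists to catch payloads that look healthy at the HTTP layer but contain non-finite values. A small illustration of the finiteness rule the script applies per vector component:

```python
import math

def is_finite_number(v) -> bool:
    """Same rule as the script's _is_finite_number: bools rejected, only finite ints/floats pass."""
    if isinstance(v, bool):
        return False
    return isinstance(v, (int, float)) and math.isfinite(float(v))

assert is_finite_number(0.12) and is_finite_number(-3)
assert not is_finite_number(True)           # bool is not a valid vector component
assert not is_finite_number(float("nan"))   # the regression this guards against
assert not is_finite_number(None)
```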
- """ - if scenario_name not in ("embed_text", "embed_image"): - return True, "" - - expected_len = len(tpl.json_body) if isinstance(tpl.json_body, list) else None - if not isinstance(payload, list): - return False, "invalid_payload_non_list" - if expected_len is not None and len(payload) != expected_len: - return False, "invalid_payload_length" - if len(payload) == 0: - return False, "invalid_payload_empty" - - for i, vec in enumerate(payload): - if not isinstance(vec, list) or len(vec) == 0: - return False, f"invalid_vector_{i}_shape" - for x in vec: - if not _is_finite_number(x): - return False, f"invalid_vector_{i}_non_finite" - return True, "" - - -def percentile(sorted_values: List[float], p: float) -> float: - if not sorted_values: - return 0.0 - if p <= 0: - return sorted_values[0] - if p >= 100: - return sorted_values[-1] - rank = (len(sorted_values) - 1) * (p / 100.0) - low = int(math.floor(rank)) - high = int(math.ceil(rank)) - if low == high: - return sorted_values[low] - weight = rank - low - return sorted_values[low] * (1.0 - weight) + sorted_values[high] * weight - - -def make_default_templates(tenant_id: str) -> Dict[str, List[RequestTemplate]]: - return { - "backend_search": [ - RequestTemplate( - method="POST", - path="/search/", - headers={"X-Tenant-ID": tenant_id}, - json_body={"query": "wireless mouse", "size": 10, "language": "en"}, - ), - RequestTemplate( - method="POST", - path="/search/", - headers={"X-Tenant-ID": tenant_id}, - json_body={"query": "芭比娃娃", "size": 10, "language": "zh"}, - ), - RequestTemplate( - method="POST", - path="/search/", - headers={"X-Tenant-ID": tenant_id}, - json_body={"query": "f", "size": 10, "language": "en"}, - ), - ], - "backend_suggest": [ - RequestTemplate( - method="GET", - path="/search/suggestions", - headers={"X-Tenant-ID": tenant_id}, - params={"q": "f", "size": 10, "language": "en"}, - ), - RequestTemplate( - method="GET", - path="/search/suggestions", - headers={"X-Tenant-ID": tenant_id}, - params={"q": "玩", "size": 10, "language": "zh"}, - ), - RequestTemplate( - method="GET", - path="/search/suggestions", - headers={"X-Tenant-ID": tenant_id}, - params={"q": "shi", "size": 10, "language": "en"}, - ), - ], - "embed_text": [ - RequestTemplate( - method="POST", - path="/embed/text", - json_body=["wireless mouse", "gaming keyboard", "barbie doll"], - ) - ], - "embed_image": [ - RequestTemplate( - method="POST", - path="/embed/image", - json_body=["/data/saas-search/docs/image-dress1.png"], - ) - ], - "translate": [ - RequestTemplate( - method="POST", - path="/translate", - json_body={"text": "商品名称", "target_lang": "en", "source_lang": "zh", "model": "qwen"}, - ), - RequestTemplate( - method="POST", - path="/translate", - json_body={"text": "Product title", "target_lang": "zh", "model": "qwen"}, - ), - ], - "rerank": [ - RequestTemplate( - method="POST", - path="/rerank", - json_body={ - "query": "wireless mouse", - "docs": [ - "Wireless ergonomic mouse with rechargeable battery", - "USB-C cable 1m", - "Gaming mouse 26000 DPI", - ], - "normalize": True, - }, - ) - ], - } - - -def load_cases_from_file(path: Path, tenant_id: str) -> Dict[str, List[RequestTemplate]]: - data = json.loads(path.read_text(encoding="utf-8")) - out: Dict[str, List[RequestTemplate]] = {} - for scenario_name, requests_data in (data.get("scenarios") or {}).items(): - templates: List[RequestTemplate] = [] - for item in requests_data: - headers = dict(item.get("headers") or {}) - if "X-Tenant-ID" in headers and str(headers["X-Tenant-ID"]).strip() == 
"${tenant_id}": - headers["X-Tenant-ID"] = tenant_id - templates.append( - RequestTemplate( - method=str(item.get("method", "GET")).upper(), - path=str(item.get("path", "")).strip(), - params=item.get("params"), - json_body=item.get("json"), - headers=headers or None, - ) - ) - if templates: - out[scenario_name] = templates - return out - - -def apply_embed_priority_params( - scenarios: Dict[str, Scenario], - embed_text_priority: int, - embed_image_priority: int, -) -> None: - """ - Merge default `priority` query param into embed templates when absent. - `scripts/perf_cases.json` may set per-request `params.priority` to override. - """ - mapping = { - "embed_text": max(0, int(embed_text_priority)), - "embed_image": max(0, int(embed_image_priority)), - } - for name, pri in mapping.items(): - if name not in scenarios: - continue - scen = scenarios[name] - new_templates: List[RequestTemplate] = [] - for t in scen.templates: - params = dict(t.params or {}) - params.setdefault("priority", str(pri)) - new_templates.append( - RequestTemplate( - method=t.method, - path=t.path, - params=params, - json_body=t.json_body, - headers=t.headers, - ) - ) - scenarios[name] = Scenario( - name=scen.name, - templates=new_templates, - timeout_sec=scen.timeout_sec, - ) - - -def build_scenarios(args: argparse.Namespace) -> Dict[str, Scenario]: - defaults = make_default_templates(args.tenant_id) - if args.cases_file: - custom = load_cases_from_file(Path(args.cases_file), tenant_id=args.tenant_id) - defaults.update(custom) - - scenario_base = { - "backend_search": args.backend_base, - "backend_suggest": args.backend_base, - "embed_text": args.embedding_text_base, - "embed_image": args.embedding_image_base, - "translate": args.translator_base, - "rerank": args.reranker_base, - } - - scenarios: Dict[str, Scenario] = {} - for name, templates in defaults.items(): - if name not in scenario_base: - continue - base = scenario_base[name].rstrip("/") - rewritten: List[RequestTemplate] = [] - for t in templates: - path = t.path if t.path.startswith("/") else f"/{t.path}" - rewritten.append( - RequestTemplate( - method=t.method, - path=f"{base}{path}", - params=t.params, - json_body=t.json_body, - headers=t.headers, - ) - ) - scenarios[name] = Scenario(name=name, templates=rewritten, timeout_sec=args.timeout) - apply_embed_priority_params( - scenarios, - embed_text_priority=args.embed_text_priority, - embed_image_priority=args.embed_image_priority, - ) - return scenarios - - -async def run_single_scenario( - scenario: Scenario, - duration_sec: int, - concurrency: int, - max_requests: int, - max_errors: int, - rerank_dynamic_cfg: Optional[Dict[str, Any]] = None, -) -> Dict[str, Any]: - latencies: List[float] = [] - status_counter: Dict[int, int] = {} - err_counter: Dict[str, int] = {} - total_requests = 0 - success_requests = 0 - stop_flag = False - lock = asyncio.Lock() - start = time.perf_counter() - - timeout = httpx.Timeout(timeout=scenario.timeout_sec) - limits = httpx.Limits(max_connections=max(concurrency * 2, 20), max_keepalive_connections=max(concurrency, 10)) - - async def worker(worker_id: int, client: httpx.AsyncClient) -> None: - nonlocal total_requests, success_requests, stop_flag - idx = worker_id % len(scenario.templates) - worker_rng: Optional[random.Random] = None - if rerank_dynamic_cfg is not None: - worker_rng = random.Random(int(rerank_dynamic_cfg["seed"]) + worker_id) - - while not stop_flag: - elapsed = time.perf_counter() - start - if duration_sec > 0 and elapsed >= duration_sec: - break - - async 
with lock: - if max_requests > 0 and total_requests >= max_requests: - stop_flag = True - break - total_requests += 1 - - tpl = scenario.templates[idx % len(scenario.templates)] - idx += 1 - - t0 = time.perf_counter() - ok = False - status = 0 - err = "" - try: - req_json_body = tpl.json_body - if rerank_dynamic_cfg is not None and worker_rng is not None: - req_json_body = build_random_rerank_payload(rerank_dynamic_cfg, worker_rng) - resp = await client.request( - method=tpl.method, - url=tpl.path, - params=tpl.params, - json=req_json_body, - headers=tpl.headers, - ) - status = int(resp.status_code) - ok = 200 <= status < 300 - if ok: - try: - payload = resp.json() - except Exception: - ok = False - err = "invalid_json_response" - else: - valid, reason = validate_response_payload( - scenario_name=scenario.name, - tpl=tpl, - payload=payload, - ) - if not valid: - ok = False - err = reason or "invalid_payload" - if not ok and not err: - err = f"http_{status}" - except Exception as e: - err = type(e).__name__ - t1 = time.perf_counter() - cost_ms = (t1 - t0) * 1000.0 - - async with lock: - latencies.append(cost_ms) - if status: - status_counter[status] = status_counter.get(status, 0) + 1 - if ok: - success_requests += 1 - else: - err_counter[err or "unknown"] = err_counter.get(err or "unknown", 0) + 1 - total_err = sum(err_counter.values()) - if max_errors > 0 and total_err >= max_errors: - stop_flag = True - - async with httpx.AsyncClient(timeout=timeout, limits=limits) as client: - tasks = [asyncio.create_task(worker(i, client)) for i in range(concurrency)] - await asyncio.gather(*tasks) - - elapsed = max(time.perf_counter() - start, 1e-9) - lat_sorted = sorted(latencies) - - result = { - "scenario": scenario.name, - "duration_sec": round(elapsed, 3), - "total_requests": total_requests, - "success_requests": success_requests, - "failed_requests": max(total_requests - success_requests, 0), - "success_rate": round((success_requests / total_requests) * 100.0, 2) if total_requests else 0.0, - "throughput_rps": round(total_requests / elapsed, 2), - "latency_ms": { - "avg": round(statistics.mean(lat_sorted), 2) if lat_sorted else 0.0, - "p50": round(percentile(lat_sorted, 50), 2), - "p90": round(percentile(lat_sorted, 90), 2), - "p95": round(percentile(lat_sorted, 95), 2), - "p99": round(percentile(lat_sorted, 99), 2), - "max": round(max(lat_sorted), 2) if lat_sorted else 0.0, - }, - "status_codes": dict(sorted(status_counter.items(), key=lambda x: x[0])), - "errors": dict(sorted(err_counter.items(), key=lambda x: x[0])), - } - return result - - -def format_summary(result: Dict[str, Any]) -> str: - lines = [] - lines.append(f"\\n=== Scenario: {result['scenario']} ===") - lines.append( - "requests={total_requests} success={success_requests} fail={failed_requests} " - "success_rate={success_rate}% rps={throughput_rps}".format(**result) - ) - lat = result["latency_ms"] - lines.append( - f"latency(ms): avg={lat['avg']} p50={lat['p50']} p90={lat['p90']} p95={lat['p95']} p99={lat['p99']} max={lat['max']}" - ) - lines.append(f"status_codes: {result['status_codes']}") - if result["errors"]: - lines.append(f"errors: {result['errors']}") - return "\\n".join(lines) - - -def aggregate_results(results: List[Dict[str, Any]]) -> Dict[str, Any]: - if not results: - return {} - total_requests = sum(x["total_requests"] for x in results) - success_requests = sum(x["success_requests"] for x in results) - failed_requests = sum(x["failed_requests"] for x in results) - total_duration = sum(x["duration_sec"] for x in 
results) - weighted_avg_latency = 0.0 - if total_requests > 0: - weighted_avg_latency = sum(x["latency_ms"]["avg"] * x["total_requests"] for x in results) / total_requests - - return { - "scenario": "ALL", - "total_requests": total_requests, - "success_requests": success_requests, - "failed_requests": failed_requests, - "success_rate": round((success_requests / total_requests) * 100.0, 2) if total_requests else 0.0, - "aggregate_rps": round(total_requests / max(total_duration, 1e-9), 2), - "weighted_avg_latency_ms": round(weighted_avg_latency, 2), - } - - -def parse_csv_items(raw: str) -> List[str]: - return [x.strip() for x in str(raw or "").split(",") if x.strip()] - - -def parse_csv_ints(raw: str) -> List[int]: - values: List[int] = [] - seen = set() - for item in parse_csv_items(raw): - try: - value = int(item) - except ValueError as exc: - raise ValueError(f"Invalid integer in CSV list: {item}") from exc - if value <= 0: - raise ValueError(f"Concurrency must be > 0, got {value}") - if value in seen: - continue - seen.add(value) - values.append(value) - return values - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Interface-level load test for search and related microservices") - parser.add_argument( - "--scenario", - type=str, - default="all", - help="Scenario: backend_search | backend_suggest | embed_text | embed_image | translate | rerank | all | comma-separated list", - ) - parser.add_argument("--tenant-id", type=str, default="162", help="Tenant ID for backend search/suggest") - parser.add_argument("--duration", type=int, default=30, help="Duration seconds per scenario; <=0 means no duration cap") - parser.add_argument("--concurrency", type=int, default=20, help="Concurrent workers per scenario") - parser.add_argument("--max-requests", type=int, default=0, help="Stop after N requests per scenario (0 means unlimited)") - parser.add_argument("--timeout", type=float, default=10.0, help="Request timeout seconds") - parser.add_argument("--max-errors", type=int, default=0, help="Stop scenario when accumulated errors reach this value") - - parser.add_argument("--backend-base", type=str, default="http://127.0.0.1:6002", help="Base URL for backend search API") - parser.add_argument("--embedding-text-base", type=str, default="http://127.0.0.1:6005", help="Base URL for text embedding service") - parser.add_argument("--embedding-image-base", type=str, default="http://127.0.0.1:6008", help="Base URL for image embedding service") - parser.add_argument("--translator-base", type=str, default="http://127.0.0.1:6006", help="Base URL for translation service") - parser.add_argument("--reranker-base", type=str, default="http://127.0.0.1:6007", help="Base URL for reranker service") - - parser.add_argument("--cases-file", type=str, default="", help="Optional JSON file to override/add request templates") - parser.add_argument("--output", type=str, default="", help="Optional output JSON path") - parser.add_argument("--pause", type=float, default=0.0, help="Pause seconds between scenarios in all mode") - parser.add_argument( - "--concurrency-list", - type=str, - default="", - help="Comma-separated concurrency list (e.g. 1,5,10,20). 
If set, overrides --concurrency.", - ) - parser.add_argument( - "--rerank-dynamic-docs", - action="store_true", - help="For rerank scenario, generate docs payload dynamically on every request.", - ) - parser.add_argument("--rerank-doc-count", type=int, default=386, help="Doc count per rerank request when dynamic docs are enabled") - parser.add_argument("--rerank-vocab-size", type=int, default=1000, help="Word pool size for rerank dynamic docs generation") - parser.add_argument("--rerank-sentence-min-words", type=int, default=15, help="Minimum words per generated doc sentence") - parser.add_argument("--rerank-sentence-max-words", type=int, default=40, help="Maximum words per generated doc sentence") - parser.add_argument("--rerank-query", type=str, default="wireless mouse", help="Fixed query used for rerank dynamic docs mode") - parser.add_argument("--rerank-seed", type=int, default=20260312, help="Base random seed for rerank dynamic docs mode") - parser.add_argument( - "--rerank-top-n", - type=int, - default=0, - help="Optional top_n for rerank requests in dynamic docs mode (0 means omit top_n).", - ) - parser.add_argument( - "--embed-text-priority", - type=int, - default=0, - help="Default query param priority= for embed_text (0=offline admission; >0 bypasses rejection). Merged into params unless set in --cases-file.", - ) - parser.add_argument( - "--embed-image-priority", - type=int, - default=0, - help="Default query param priority= for embed_image (same semantics as embed-text-priority).", - ) - return parser.parse_args() - - -def build_rerank_dynamic_cfg(args: argparse.Namespace) -> Dict[str, Any]: - min_words = int(args.rerank_sentence_min_words) - max_words = int(args.rerank_sentence_max_words) - doc_count = int(args.rerank_doc_count) - vocab_size = int(args.rerank_vocab_size) - if doc_count <= 0: - raise ValueError(f"rerank-doc-count must be > 0, got {doc_count}") - if vocab_size <= 0: - raise ValueError(f"rerank-vocab-size must be > 0, got {vocab_size}") - if min_words <= 0: - raise ValueError(f"rerank-sentence-min-words must be > 0, got {min_words}") - if max_words < min_words: - raise ValueError( - f"rerank-sentence-max-words must be >= rerank-sentence-min-words, got {max_words} < {min_words}" - ) - if args.rerank_seed < 0: - raise ValueError(f"rerank-seed must be >= 0, got {args.rerank_seed}") - if int(args.rerank_top_n) < 0: - raise ValueError(f"rerank-top-n must be >= 0, got {args.rerank_top_n}") - - # Use deterministic, letter-only pseudo words to avoid long tokenization of numeric strings. 
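The two-syllable cross product is what gives the pool its headroom: with the 50 syllables defined below, pairwise concatenation can yield up to 50 × 50 = 2500 distinct words, comfortably above the default `--rerank-vocab-size` of 1000. A condensed sketch of the same construction (hypothetical helper name):

```python
def build_word_pool(syllables: list, vocab_size: int) -> list:
    """Deterministic letter-only pseudo words via pairwise syllable concatenation."""
    pool = []
    for a in syllables:
        for b in syllables:
            pool.append(a + b)
            if len(pool) >= vocab_size:
                return pool
    raise ValueError(f"pool exhausted at {len(pool)} < {vocab_size}")
```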
- syllables = [ - "al", "an", "ar", "as", "at", "ba", "be", "bi", "bo", "ca", - "ce", "ci", "co", "da", "de", "di", "do", "el", "en", "er", - "fa", "fe", "fi", "fo", "ga", "ge", "gi", "go", "ha", "he", - "hi", "ho", "ia", "ie", "il", "in", "io", "is", "ka", "ke", - "ki", "ko", "la", "le", "li", "lo", "ma", "me", "mi", "mo", - ] - word_pool: List[str] = [] - for a in syllables: - for b in syllables: - word_pool.append(f"{a}{b}") - if len(word_pool) >= vocab_size: - break - if len(word_pool) >= vocab_size: - break - if len(word_pool) < vocab_size: - raise ValueError(f"Unable to generate enough synthetic words: requested={vocab_size}, got={len(word_pool)}") - return { - "query": args.rerank_query, - "doc_count": doc_count, - "min_words": min_words, - "max_words": max_words, - "seed": int(args.rerank_seed), - "normalize": True, - "top_n": int(args.rerank_top_n), - "word_pool": word_pool, - } - - -def build_random_rerank_payload( - cfg: Dict[str, Any], - rng: random.Random, -) -> Dict[str, Any]: - word_pool: List[str] = cfg["word_pool"] - docs = [] - for _ in range(cfg["doc_count"]): - doc_len = rng.randint(cfg["min_words"], cfg["max_words"]) - docs.append(" ".join(rng.choices(word_pool, k=doc_len))) - return { - "query": cfg["query"], - "docs": docs, - "normalize": bool(cfg.get("normalize", True)), - **({"top_n": int(cfg["top_n"])} if int(cfg.get("top_n", 0)) > 0 else {}), - } - - -async def main_async() -> int: - args = parse_args() - scenarios = build_scenarios(args) - - all_names = ["backend_search", "backend_suggest", "embed_text", "embed_image", "translate", "rerank"] - if args.scenario == "all": - run_names = [x for x in all_names if x in scenarios] - else: - requested = parse_csv_items(args.scenario) - if not requested: - print("No scenario specified.") - return 2 - unknown = [name for name in requested if name not in scenarios] - if unknown: - print(f"Unknown scenario(s): {', '.join(unknown)}") - print(f"Available: {', '.join(sorted(scenarios.keys()))}") - return 2 - run_names = requested - - if not run_names: - print("No scenarios to run.") - return 2 - - rerank_dynamic_cfg: Optional[Dict[str, Any]] = None - if args.rerank_dynamic_docs: - try: - rerank_dynamic_cfg = build_rerank_dynamic_cfg(args) - except ValueError as exc: - print(str(exc)) - return 2 - - concurrency_values = [args.concurrency] - if args.concurrency_list: - try: - concurrency_values = parse_csv_ints(args.concurrency_list) - except ValueError as exc: - print(str(exc)) - return 2 - if not concurrency_values: - print("concurrency-list is empty after parsing.") - return 2 - - print("Load test config:") - print(f" scenario={args.scenario}") - print(f" tenant_id={args.tenant_id}") - print(f" duration={args.duration}s") - print(f" concurrency={args.concurrency}") - print(f" concurrency_list={concurrency_values}") - print(f" max_requests={args.max_requests}") - print(f" timeout={args.timeout}s") - print(f" max_errors={args.max_errors}") - print(f" backend_base={args.backend_base}") - print(f" embedding_text_base={args.embedding_text_base}") - print(f" embedding_image_base={args.embedding_image_base}") - print(f" translator_base={args.translator_base}") - print(f" reranker_base={args.reranker_base}") - print(f" embed_text_priority={args.embed_text_priority}") - print(f" embed_image_priority={args.embed_image_priority}") - if args.rerank_dynamic_docs: - print(" rerank_dynamic_docs=True") - print(f" rerank_doc_count={args.rerank_doc_count}") - print(f" rerank_vocab_size={args.rerank_vocab_size}") - print(f" 
rerank_sentence_words=[{args.rerank_sentence_min_words},{args.rerank_sentence_max_words}]") - print(f" rerank_query={args.rerank_query}") - print(f" rerank_seed={args.rerank_seed}") - print(f" rerank_top_n={args.rerank_top_n}") - - results: List[Dict[str, Any]] = [] - total_jobs = len(run_names) * len(concurrency_values) - job_idx = 0 - for name in run_names: - scenario = scenarios[name] - for c in concurrency_values: - job_idx += 1 - print(f"\\n[{job_idx}/{total_jobs}] running {name} @ concurrency={c} ...") - result = await run_single_scenario( - scenario=scenario, - duration_sec=args.duration, - concurrency=c, - max_requests=args.max_requests, - max_errors=args.max_errors, - rerank_dynamic_cfg=rerank_dynamic_cfg if name == "rerank" else None, - ) - result["concurrency"] = c - print(format_summary(result)) - results.append(result) - - if args.pause > 0 and job_idx < total_jobs: - await asyncio.sleep(args.pause) - - final = { - "timestamp": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), - "config": { - "scenario": args.scenario, - "run_names": run_names, - "tenant_id": args.tenant_id, - "duration_sec": args.duration, - "concurrency": args.concurrency, - "concurrency_list": concurrency_values, - "max_requests": args.max_requests, - "timeout_sec": args.timeout, - "max_errors": args.max_errors, - "backend_base": args.backend_base, - "embedding_text_base": args.embedding_text_base, - "embedding_image_base": args.embedding_image_base, - "translator_base": args.translator_base, - "reranker_base": args.reranker_base, - "cases_file": args.cases_file or None, - "rerank_dynamic_docs": args.rerank_dynamic_docs, - "rerank_doc_count": args.rerank_doc_count, - "rerank_vocab_size": args.rerank_vocab_size, - "rerank_sentence_min_words": args.rerank_sentence_min_words, - "rerank_sentence_max_words": args.rerank_sentence_max_words, - "rerank_query": args.rerank_query, - "rerank_seed": args.rerank_seed, - "rerank_top_n": args.rerank_top_n, - "embed_text_priority": args.embed_text_priority, - "embed_image_priority": args.embed_image_priority, - }, - "results": results, - "overall": aggregate_results(results), - } - - print("\\n=== Overall ===") - print(json.dumps(final["overall"], ensure_ascii=False, indent=2)) - - if args.output: - out_path = Path(args.output) - out_path.parent.mkdir(parents=True, exist_ok=True) - out_path.write_text(json.dumps(final, ensure_ascii=False, indent=2), encoding="utf-8") - print(f"Saved JSON report: {out_path}") - - return 0 - - -def main() -> int: - try: - return asyncio.run(main_async()) - except KeyboardInterrupt: - print("Interrupted by user") - return 130 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/scripts/perf_cases.json.example b/scripts/perf_cases.json.example deleted file mode 100644 index 0291dcb..0000000 --- a/scripts/perf_cases.json.example +++ /dev/null @@ -1,71 +0,0 @@ -{ - "scenarios": { - "backend_search": [ - { - "method": "POST", - "path": "/search/", - "headers": {"X-Tenant-ID": "${tenant_id}"}, - "json": {"query": "wireless mouse", "size": 20, "language": "en", "enable_rerank": false} - }, - { - "method": "POST", - "path": "/search/", - "headers": {"X-Tenant-ID": "${tenant_id}"}, - "json": {"query": "芭比娃娃", "size": 20, "language": "zh", "enable_rerank": false} - } - ], - "backend_suggest": [ - { - "method": "GET", - "path": "/search/suggestions", - "headers": {"X-Tenant-ID": "${tenant_id}"}, - "params": {"q": "f", "size": 20, "language": "en"} - }, - { - "method": "GET", - "path": "/search/suggestions", - "headers": 
{"X-Tenant-ID": "${tenant_id}"}, - "params": {"q": "玩", "size": 20, "language": "zh"} - } - ], - "embed_text": [ - { - "method": "POST", - "path": "/embed/text", - "params": {"priority": "0"}, - "json": ["wireless mouse", "gaming keyboard", "USB-C cable", "barbie doll"] - } - ], - "embed_image": [ - { - "method": "POST", - "path": "/embed/image", - "params": {"normalize": "true", "priority": "0"}, - "json": ["/data/saas-search/docs/image-dress1.png"] - } - ], - "translate": [ - { - "method": "POST", - "path": "/translate", - "json": {"text": "商品标题", "target_lang": "en", "source_lang": "zh", "model": "qwen"} - } - ], - "rerank": [ - { - "method": "POST", - "path": "/rerank", - "json": { - "query": "wireless mouse", - "docs": [ - "Wireless ergonomic mouse", - "Bluetooth gaming mouse", - "USB cable 1 meter", - "Mouse pad large size" - ], - "normalize": true - } - } - ] - } -} diff --git a/scripts/reindex_from_remote_tenant_170_to_0.sh b/scripts/reindex_from_remote_tenant_170_to_0.sh deleted file mode 100755 index 214766e..0000000 --- a/scripts/reindex_from_remote_tenant_170_to_0.sh +++ /dev/null @@ -1,99 +0,0 @@ -#!/bin/bash -# -# 从远程 ES 的 search_products_tenant_170 同步 10000 条到本机 search_products_tenant_0。 -# 请求发往本机 ES,由本机去拉远程数据;需在本机 elasticsearch.yml 配置 reindex.remote.whitelist。 -# -# 用法: -# ./scripts/reindex_from_remote_tenant_170_to_0.sh -# -# 环境变量(可选): -# LOCAL_ES_HOST 本机 ES 地址,用于创建索引和发送 _reindex(默认从 .env 的 ES_HOST 读取,应为本机) -# REMOTE_ES_HOST 远程 ES 地址(默认 http://120.76.41.98:9200) -# REMOTE_ES_USER 远程 ES 用户名(默认 essa) -# REMOTE_ES_PASS 远程 ES 密码(默认 4hOaLaf41y2VuI8y) -# MAX_DOCS 同步条数(默认 10000) -# - -set -e - -cd "$(dirname "$0")/.." -PROJECT_ROOT="$(pwd)" - -# 加载 .env -# shellcheck source=scripts/lib/load_env.sh -source "${PROJECT_ROOT}/scripts/lib/load_env.sh" -load_env_file "${PROJECT_ROOT}/.env" - -# 本机 ES(发 _reindex 请求的目标) -LOCAL_ES_HOST="${LOCAL_ES_HOST:-${ES_HOST:-http://localhost:9200}}" -ES_USERNAME="${ES_USERNAME:-}" -ES_PASSWORD="${ES_PASSWORD:-}" -ES_INDEX_NAMESPACE="${ES_INDEX_NAMESPACE:-}" - -# 远程 ES(数据源) -REMOTE_ES_HOST="${REMOTE_ES_HOST:-http://120.76.41.98:9200}" -REMOTE_ES_USER="${REMOTE_ES_USER:-essa}" -REMOTE_ES_PASS="${REMOTE_ES_PASS:-4hOaLaf41y2VuI8y}" - -MAX_DOCS="${MAX_DOCS:-10000}" -SOURCE_INDEX="search_products_tenant_170" -DEST_INDEX="${ES_INDEX_NAMESPACE}search_products_tenant_0" -MAPPING_FILE="${PROJECT_ROOT}/mappings/search_products.json" - -# 本机 curl 认证 -AUTH_PARAM="" -if [ -n "$ES_USERNAME" ] && [ -n "$ES_PASSWORD" ]; then - AUTH_PARAM="-u ${ES_USERNAME}:${ES_PASSWORD}" -fi - -echo "本机 ES: $LOCAL_ES_HOST" -echo "远程 ES: $REMOTE_ES_HOST" -echo "源索引: $SOURCE_INDEX" -echo "目标索引: $DEST_INDEX" -echo "同步条数: $MAX_DOCS" -echo "" - -# 1. 若目标索引不存在,则创建 -if ! curl -s $AUTH_PARAM "${LOCAL_ES_HOST}/${DEST_INDEX}" -o /dev/null -w "%{http_code}" | grep -q 200; then - echo "创建目标索引: $DEST_INDEX" - if [ ! -f "$MAPPING_FILE" ]; then - echo "错误: mapping 文件不存在: $MAPPING_FILE" - exit 1 - fi - curl -X PUT "${LOCAL_ES_HOST}/${DEST_INDEX}" \ - -H "Content-Type: application/json" \ - $AUTH_PARAM \ - -d @"${MAPPING_FILE}" \ - -w "\nHTTP: %{http_code}\n" -s | tail -1 - echo "" -else - echo "目标索引已存在: $DEST_INDEX,将写入数据(可能覆盖同 id 文档)" -fi - -# 2. Reindex from remote(JSON 中的密码用 env 传入,避免 shell 转义) -echo "执行 Reindex from remote(最多 $MAX_DOCS 条)..." 
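The heredoc body piped into curl is truncated in this hunk; for reference, the standard Elasticsearch reindex-from-remote request shape, sketched as a Python dict with this script's defaults (the password is injected via env rather than shell-escaped, as the comment above notes):

```python
reindex_body = {
    "max_docs": 10000,  # MAX_DOCS
    "source": {
        "remote": {
            "host": "http://120.76.41.98:9200",  # REMOTE_ES_HOST
            "username": "essa",                   # REMOTE_ES_USER
            "password": "<REMOTE_ES_PASS>",       # taken from the environment
        },
        "index": "search_products_tenant_170",
    },
    "dest": {"index": "search_products_tenant_0"},
}
# POSTed to {LOCAL_ES_HOST}/_reindex?wait_for_completion=true
# (on ES 9.x wait_for_completion is a query parameter, not part of the body).
```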
-export REMOTE_ES_HOST REMOTE_ES_USER REMOTE_ES_PASS SOURCE_INDEX DEST_INDEX MAX_DOCS -# ES 9.x 将 wait_for_completion 放在 query 参数,不在 body -curl -X POST "${LOCAL_ES_HOST}/_reindex?wait_for_completion=true&pretty" \ - -H "Content-Type: application/json" \ - $AUTH_PARAM \ - -d @- </dev/null)"; then - return 1 - fi - echo "$body" | "$PYTHON" -c " -import json, sys -want_b, want_f = sys.argv[1], sys.argv[2] -d = json.load(sys.stdin) -if d.get('status') != 'ok' or not d.get('model_loaded'): - sys.exit(1) -if d.get('backend') != want_b: - sys.exit(1) -if d.get('instruction_format') != want_f: - sys.exit(1) -sys.exit(0) -" "$want_backend" "$want_fmt" -} - -wait_health() { - local want_backend="$1" - local want_fmt="$2" - local i - for i in $(seq 1 180); do - if health_ok "$want_backend" "$want_fmt"; then - curl -sS "http://127.0.0.1:6007/health" | "$PYTHON" -m json.tool - return 0 - fi - echo "[wait] ${i}/180 backend=${want_backend} instruction_format=${want_fmt} ..." - sleep 3 - done - echo "[error] health did not match in time" >&2 - return 1 -} - -run_one() { - local backend="$1" - local fmt="$2" - local tag="${backend}|${fmt}" - local jf="${OUT_DIR}/${backend}_${fmt}.json" - - echo "========== ${tag} ==========" - "$PYTHON" "${ROOT}/scripts/patch_rerank_vllm_benchmark_config.py" \ - --backend "$backend" --instruction-format "$fmt" - - "${ROOT}/restart.sh" reranker - wait_health "$backend" "$fmt" - - if ! "$PYTHON" "${ROOT}/scripts/benchmark_reranker_random_titles.py" \ - 100,200,400,600,800,1000 \ - --repeat 5 \ - --seed 42 \ - --quiet-runs \ - --timeout 360 \ - --tag "$tag" \ - --json-summary-out "$jf" - then - echo "[warn] benchmark exited non-zero for ${tag} (see ${jf} failed flag / partial runs)" >&2 - fi - - echo "artifact: $jf" -} - -run_one qwen3_vllm compact -run_one qwen3_vllm standard -run_one qwen3_vllm_score compact -run_one qwen3_vllm_score standard - -# Restore repo-default-style rerank settings (score + compact). -"$PYTHON" "${ROOT}/scripts/patch_rerank_vllm_benchmark_config.py" \ - --backend qwen3_vllm_score --instruction-format compact -"${ROOT}/restart.sh" reranker -wait_health qwen3_vllm_score compact -echo "Restored config: qwen3_vllm_score + compact. Done. Artifacts under ${OUT_DIR}" diff --git a/scripts/smoke_qwen3_vllm_score_backend.py b/scripts/smoke_qwen3_vllm_score_backend.py deleted file mode 100644 index 02322ac..0000000 --- a/scripts/smoke_qwen3_vllm_score_backend.py +++ /dev/null @@ -1,76 +0,0 @@ -#!/usr/bin/env python3 -""" -Smoke test: load Qwen3VLLMScoreRerankerBackend (must run as a file, not stdin — vLLM spawn). - -Usage (from repo root, score venv): - PYTHONPATH=. ./.venv-reranker-score/bin/python scripts/smoke_qwen3_vllm_score_backend.py - -Same as production: vLLM child processes need the venv's ``bin`` on PATH (for pip's ``ninja`` when -vLLM auto-selects FLASHINFER on T4/Turing). ``start_reranker.sh`` exports that; this script prepends -``sysconfig.get_path("scripts")`` (the stdlib location for this environment's console scripts, -independent of ``python`` symlink targets). 
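The bash `wait_health` loop in the benchmark runner above amounts to the following check. A Python sketch of the same polling logic, assuming the same /health contract (`status`, `model_loaded`, `backend`, `instruction_format`) and reusing httpx as elsewhere in this patch:

```python
import time
import httpx

def wait_health(backend: str, fmt: str, url: str = "http://127.0.0.1:6007/health",
                attempts: int = 180, interval: float = 3.0) -> bool:
    """Poll /health until the reranker reports the expected backend and format."""
    for _ in range(attempts):
        try:
            d = httpx.get(url, timeout=5.0).json()
            if (d.get("status") == "ok" and d.get("model_loaded")
                    and d.get("backend") == backend
                    and d.get("instruction_format") == fmt):
                return True
        except Exception:
            pass  # service still restarting; keep polling
        time.sleep(interval)
    return False
```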
-""" - -from __future__ import annotations - -import argparse -import logging -import os -import sys -import sysconfig -from pathlib import Path - -# Repo root on sys.path when run as scripts/smoke_*.py -_ROOT = Path(__file__).resolve().parents[1] -if str(_ROOT) not in sys.path: - sys.path.insert(0, str(_ROOT)) - -logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s") - -import torch - -from reranker.backends.qwen3_vllm_score import ( - Qwen3VLLMScoreRerankerBackend, -) - - -def main() -> int: - p = argparse.ArgumentParser() - p.add_argument( - "--gpu-memory-utilization", - type=float, - default=0.12, - help="vLLM gpu_memory_utilization (default 0.12 for tight GPUs)", - ) - args = p.parse_args() - - scripts = sysconfig.get_path("scripts") - if scripts: - os.environ["PATH"] = scripts + os.pathsep + os.environ.get("PATH", "") - - if not torch.cuda.is_available(): - print("SKIP: CUDA not available") - return 0 - - cfg = { - "model_name": "Qwen/Qwen3-Reranker-0.6B", - "max_model_len": 160, - "tensor_parallel_size": 1, - "gpu_memory_utilization": args.gpu_memory_utilization, - "dtype": "float16", - "enable_prefix_caching": False, - "enforce_eager": True, - "infer_batch_size": 4, - "instruction_format": "compact", - } - print("Loading backend ...") - backend = Qwen3VLLMScoreRerankerBackend(cfg) - scores, meta = backend.score_with_meta("smoke query", ["title one", "title two"], normalize=False) - print("scores:", scores) - print("meta:", {k: meta[k] for k in ("backend", "infer_batch_size", "instruction_format") if k in meta}) - print("OK") - return 0 - - -if __name__ == "__main__": - raise SystemExit(main()) diff --git a/scripts/start.sh b/scripts/start.sh deleted file mode 100755 index cc15e75..0000000 --- a/scripts/start.sh +++ /dev/null @@ -1,10 +0,0 @@ -#!/bin/bash - -# Service start entrypoint. -# Delegates to unified service controller. - -set -euo pipefail - -cd "$(dirname "$0")/.." - -./scripts/service_ctl.sh up "$@" diff --git a/scripts/test_build_docs_api.py b/scripts/test_build_docs_api.py deleted file mode 100644 index 24aa533..0000000 --- a/scripts/test_build_docs_api.py +++ /dev/null @@ -1,159 +0,0 @@ -#!/usr/bin/env python3 -""" -测试 POST /indexer/build-docs 接口:构造请求数据、调用接口、打印完整响应。 - -用法: - 1. 先启动 Indexer 服务: ./scripts/start_indexer.sh (或 uvicorn api.indexer_app:app --port 6004) - 2. 
执行: python scripts/test_build_docs_api.py - - 也可指定地址: INDEXER_URL=http://localhost:6004 python scripts/test_build_docs_api.py -""" - -import json -import os -import sys -from datetime import datetime, timezone - -# 项目根目录 -ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) -sys.path.insert(0, ROOT) - -# 默认使用 requests 调真实服务;若未安装则回退到 TestClient -try: - import requests - HAS_REQUESTS = True -except ImportError: - HAS_REQUESTS = False - - -def build_sample_request(): - """构造一条完整的 build-docs 请求体(对应 shoplazza_product_spu / sku / option 表结构)。""" - now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") - sample_image_url = os.getenv( - "SAMPLE_IMAGE_URL", - "https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg", - ) - return { - "tenant_id": "162", - "items": [ - { - "spu": { - "id": 10001, - "tenant_id": "162", - "title": "测试T恤 纯棉短袖", - "brief": "舒适纯棉,多色可选", - "description": "这是一款适合日常穿着的纯棉T恤,透气吸汗。", - "vendor": "测试品牌", - "category": "服装/上衣/T恤", - "category_id": 100, - "category_level": 2, - "category_path": "服装/上衣/T恤", - "fake_sales": 1280, - "image_src": sample_image_url, - "tags": "T恤,纯棉,短袖,夏季", - "create_time": now, - "update_time": now, - }, - "skus": [ - { - "id": 20001, - "spu_id": 10001, - "price": 99.0, - "compare_at_price": 129.0, - "sku": "SKU-TSHIRT-001", - "inventory_quantity": 50, - "option1": "黑色", - "option2": "M", - "option3": None, - }, - { - "id": 20002, - "spu_id": 10001, - "price": 99.0, - "compare_at_price": 129.0, - "sku": "SKU-TSHIRT-002", - "inventory_quantity": 30, - "option1": "白色", - "option2": "L", - "option3": None, - }, - ], - "options": [ - {"id": 1, "position": 1, "name": "颜色"}, - {"id": 2, "position": 2, "name": "尺码"}, - ], - } - ], - } - - -def call_via_http(base_url: str, body: dict): - """通过 HTTP 调用 build-docs。""" - url = f"{base_url.rstrip('/')}/indexer/build-docs" - r = requests.post(url, json=body, timeout=30) - return r.status_code, r.text, r.json() if r.headers.get("content-type", "").startswith("application/json") else None - - -def call_via_test_client(body: dict): - """通过 FastAPI TestClient 调用(不依赖已启动服务,但需 DB/ES 已配置)。""" - from fastapi.testclient import TestClient - import api.indexer_app as indexer_app - - with TestClient(indexer_app.app) as client: - r = client.post("/indexer/build-docs", json=body) - return r.status_code, r.text, r.json() if r.headers.get("content-type", "").startswith("application/json") else None - - -def main(): - body = build_sample_request() - - print("=" * 60) - print("【请求】POST /indexer/build-docs") - print("=" * 60) - print(json.dumps(body, ensure_ascii=False, indent=2)) - - base_url = os.getenv("INDEXER_URL", "http://localhost:6004") - use_http = HAS_REQUESTS and (os.getenv("USE_TEST_CLIENT", "").lower() not in ("1", "true", "yes")) - - if use_http: - try: - status, raw, data = call_via_http(base_url, body) - except requests.RequestException as e: - print("\n[错误] 无法连接 Indexer 服务:", e) - print("请先启动: ./scripts/start_indexer.sh 或 uvicorn api.indexer_app:app --port 6004") - if HAS_REQUESTS: - print("或使用进程内测试: USE_TEST_CLIENT=1 python scripts/test_build_docs_api.py") - sys.exit(1) - else: - if not use_http and not HAS_REQUESTS: - print("\n[提示] 未安装 requests,使用 TestClient 调用(需配置 DB/ES)。") - else: - print("\n[提示] 使用 TestClient 调用(USE_TEST_CLIENT=1)。") - try: - status, raw, data = call_via_test_client(body) - except Exception as e: - print("\n[错误] TestClient 调用失败:", e) - print("请确保已 source activate.sh 且 DB/ES 环境变量正确,或先启动 Indexer 再用 HTTP 调用。") - sys.exit(1) - - print("\n" + "=" * 60) - print("【响应】HTTP 
status =", status) - print("=" * 60) - if data is not None: - print(json.dumps(data, ensure_ascii=False, indent=2, default=str)) - if data.get("docs"): - doc = data["docs"][0] - print("\n" + "=" * 60) - print("【返回 doc 顶层字段】共 {} 个".format(len(doc))) - print("=" * 60) - for k in sorted(doc.keys()): - print(" ", k) - else: - print(raw) - - if status != 200: - sys.exit(1) - - -if __name__ == "__main__": - main() diff --git a/scripts/trace_indexer_calls.sh b/scripts/trace_indexer_calls.sh deleted file mode 100755 index b17f480..0000000 --- a/scripts/trace_indexer_calls.sh +++ /dev/null @@ -1,76 +0,0 @@ -#!/bin/bash -# -# 排查「谁在调用索引服务」的脚本 -# 用法: ./scripts/trace_indexer_calls.sh -# - -set -euo pipefail - -cd "$(dirname "$0")/.." -source ./activate.sh 2>/dev/null || true - -echo "==========================================" -echo "索引服务调用方排查" -echo "==========================================" - -INDEXER_PORT="${INDEXER_PORT:-6004}" -EMBEDDING_TEXT_PORT="${EMBEDDING_TEXT_PORT:-6005}" -EMBEDDING_IMAGE_PORT="${EMBEDDING_IMAGE_PORT:-6008}" - -echo "" -echo "1. 监听端口 6004 的进程(Indexer 服务)" -echo "------------------------------------------" -if command -v lsof >/dev/null 2>&1; then - lsof -i :"${INDEXER_PORT}" 2>/dev/null || echo " (无进程监听或 lsof 无权限)" -else - ss -tlnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo " (无进程监听)" -fi - -echo "" -echo "2. 连接到 6004 的客户端(谁在请求 Indexer)" -echo "------------------------------------------" -if command -v ss >/dev/null 2>&1; then - ss -tnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo " (当前无活跃连接)" -elif command -v netstat >/dev/null 2>&1; then - netstat -tnp 2>/dev/null | grep ":${INDEXER_PORT}" || echo " (当前无活跃连接)" -else - echo " 请安装 ss 或 netstat" -fi - -echo "" -echo "3. 连接到 Embedding 服务的客户端" -echo "------------------------------------------" -if command -v ss >/dev/null 2>&1; then - ss -tnp 2>/dev/null | grep -E ":${EMBEDDING_TEXT_PORT}|:${EMBEDDING_IMAGE_PORT}" || echo " (当前无活跃连接)" -fi - -echo "" -echo "4. 检查定时任务(cron)" -echo "------------------------------------------" -(crontab -l 2>/dev/null | grep -i indexer) || echo " 当前用户无相关 cron" -if [ -d /etc/cron.d ]; then - grep -l -i indexer /etc/cron.d/* 2>/dev/null || true -fi - -echo "" -echo "5. 端口与逻辑说明" -echo "------------------------------------------" -echo " - Indexer 服务: 端口 ${INDEXER_PORT}" -echo " 启动: ./scripts/start_indexer.sh 或 python main.py serve-indexer" -echo " 接口: POST /indexer/reindex, POST /indexer/index, POST /indexer/build-docs 等" -echo "" -echo " - 调用方(文档说明): 外部 Java 程序或 curl 等 HTTP 客户端" -echo " 全量: curl -X POST http://localhost:${INDEXER_PORT}/indexer/reindex -d '{\"tenant_id\":\"170\",\"batch_size\":500}'" -echo " 增量: curl -X POST http://localhost:${INDEXER_PORT}/indexer/index -d '{\"tenant_id\":\"170\",\"spu_ids\":[\"123\"]}'" -echo "" -echo " - Indexer 内部会调用:" -echo " - Text Embedding 服务 (${EMBEDDING_TEXT_PORT}): POST /embed/text" -echo " - Image Embedding 服务 (${EMBEDDING_IMAGE_PORT}): POST /embed/image" -echo " - Qwen API: dashscope.aliyuncs.com (翻译、LLM 分析)" -echo " - MySQL: 商品数据" -echo " - Elasticsearch: 写入索引" -echo "" -echo "6. 
-echo "------------------------------------------"
-echo " Run: watch -n 2 'ss -tnp | grep -E \":${INDEXER_PORT}|:${EMBEDDING_TEXT_PORT}|:${EMBEDDING_IMAGE_PORT}\"'"
-echo ""
diff --git a/tests/manual/README.md b/tests/manual/README.md
new file mode 100644
index 0000000..e72e11f
--- /dev/null
+++ b/tests/manual/README.md
@@ -0,0 +1,5 @@
+# Manual Tests
+
+`tests/manual/` holds trial-run scripts that need manually started dependent services, manual inspection of results, or a real external environment.
+
+These scripts are outside the automated `pytest` regression scope and must not be lumped in with the contract tests under `tests/ci`.
diff --git a/tests/manual/test_build_docs_api.py b/tests/manual/test_build_docs_api.py
new file mode 100644
index 0000000..bf13793
--- /dev/null
+++ b/tests/manual/test_build_docs_api.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""
+Test the POST /indexer/build-docs endpoint: build request data, call the endpoint, and print the full response.
+
+Usage:
+    1. Start the Indexer service first: ./scripts/start_indexer.sh (or uvicorn api.indexer_app:app --port 6004)
+    2. Run: python tests/manual/test_build_docs_api.py
+
+    Alternatively, point it at a specific address: INDEXER_URL=http://localhost:6004 python tests/manual/test_build_docs_api.py
+"""
+
+import json
+import os
+import sys
+from datetime import datetime, timezone
+
+# Project root directory
+ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+sys.path.insert(0, ROOT)
+
+# Prefer requests against a live service; fall back to TestClient if it is not installed
+try:
+    import requests
+    HAS_REQUESTS = True
+except ImportError:
+    HAS_REQUESTS = False
+
+
+def build_sample_request():
+    """Build one complete build-docs request body (matching the shoplazza_product_spu / sku / option table structures)."""
+    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+    sample_image_url = os.getenv(
+        "SAMPLE_IMAGE_URL",
+        "https://oss.essa.cn/98532128-cf8e-456c-9e30-6f2a5ea0c19f.jpg",
+    )
+    return {
+        "tenant_id": "162",
+        "items": [
+            {
+                "spu": {
+                    "id": 10001,
+                    "tenant_id": "162",
+                    "title": "测试T恤 纯棉短袖",
+                    "brief": "舒适纯棉,多色可选",
+                    "description": "这是一款适合日常穿着的纯棉T恤,透气吸汗。",
+                    "vendor": "测试品牌",
+                    "category": "服装/上衣/T恤",
+                    "category_id": 100,
+                    "category_level": 2,
+                    "category_path": "服装/上衣/T恤",
+                    "fake_sales": 1280,
+                    "image_src": sample_image_url,
+                    "tags": "T恤,纯棉,短袖,夏季",
+                    "create_time": now,
+                    "update_time": now,
+                },
+                "skus": [
+                    {
+                        "id": 20001,
+                        "spu_id": 10001,
+                        "price": 99.0,
+                        "compare_at_price": 129.0,
+                        "sku": "SKU-TSHIRT-001",
+                        "inventory_quantity": 50,
+                        "option1": "黑色",
+                        "option2": "M",
+                        "option3": None,
+                    },
+                    {
+                        "id": 20002,
+                        "spu_id": 10001,
+                        "price": 99.0,
+                        "compare_at_price": 129.0,
+                        "sku": "SKU-TSHIRT-002",
+                        "inventory_quantity": 30,
+                        "option1": "白色",
+                        "option2": "L",
+                        "option3": None,
+                    },
+                ],
+                "options": [
+                    {"id": 1, "position": 1, "name": "颜色"},
+                    {"id": 2, "position": 2, "name": "尺码"},
+                ],
+            }
+        ],
+    }
+
+
+def call_via_http(base_url: str, body: dict):
+    """Call build-docs over HTTP."""
+    url = f"{base_url.rstrip('/')}/indexer/build-docs"
+    r = requests.post(url, json=body, timeout=30)
+    return r.status_code, r.text, r.json() if r.headers.get("content-type", "").startswith("application/json") else None
+
+
+def call_via_test_client(body: dict):
+    """Call via FastAPI TestClient (no running service required, but DB/ES must be configured)."""
+    from fastapi.testclient import TestClient
+    import api.indexer_app as indexer_app
+
+    with TestClient(indexer_app.app) as client:
+        r = client.post("/indexer/build-docs", json=body)
+        return r.status_code, r.text, r.json() if r.headers.get("content-type", "").startswith("application/json") else None
+
+
+def main():
+    body = build_sample_request()
+
+    print("=" * 60)
+    print("[Request] POST /indexer/build-docs")
+    print("=" * 60)
+    print(json.dumps(body, ensure_ascii=False, indent=2))
+
+    base_url = os.getenv("INDEXER_URL", "http://localhost:6004")
+    use_http = HAS_REQUESTS and (os.getenv("USE_TEST_CLIENT", "").lower() not in ("1", "true", "yes"))
+
+    if use_http:
+        try:
+            status, raw, data = call_via_http(base_url, body)
+        except requests.RequestException as e:
+            print("\n[Error] Could not connect to the Indexer service:", e)
+            print("Start it first: ./scripts/start_indexer.sh or uvicorn api.indexer_app:app --port 6004")
+            if HAS_REQUESTS:
+                print("Or test in-process: USE_TEST_CLIENT=1 python tests/manual/test_build_docs_api.py")
+            sys.exit(1)
+    else:
+        if not use_http and not HAS_REQUESTS:
+            print("\n[Note] requests is not installed; calling via TestClient (DB/ES must be configured).")
+        else:
+            print("\n[Note] Calling via TestClient (USE_TEST_CLIENT=1).")
+        try:
+            status, raw, data = call_via_test_client(body)
+        except Exception as e:
+            print("\n[Error] TestClient call failed:", e)
+            print("Make sure activate.sh has been sourced and the DB/ES environment variables are correct, or start the Indexer first and call it over HTTP.")
+            sys.exit(1)
+
+    print("\n" + "=" * 60)
+    print("[Response] HTTP status =", status)
+    print("=" * 60)
+    if data is not None:
+        print(json.dumps(data, ensure_ascii=False, indent=2, default=str))
+        if data.get("docs"):
+            doc = data["docs"][0]
+            print("\n" + "=" * 60)
+            print("[Top-level fields of returned doc] {} in total".format(len(doc)))
+            print("=" * 60)
+            for k in sorted(doc.keys()):
+                print(" ", k)
+    else:
+        print(raw)
+
+    if status != 200:
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/reranker_performance/curl1.sh b/tests/reranker_performance/curl1.sh
deleted file mode 100644
index e9b946e..0000000
--- a/tests/reranker_performance/curl1.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash
-
-start=$(date +%s%N)  # start time, in nanoseconds
-
-# Convert each line of titles.400 into a JSON array
-
-docs_json=$(jq -R -s 'split("\n") | map(select(length > 0))' /data/saas-search/tests/data/titles.400)
-
-time curl -X POST "http://localhost:6007/rerank" \
-    -H "Content-Type: application/json" \
-    -d "$(jq -n \
-        --arg query "健身女生T恤短袖" \
-        --argjson docs "$docs_json" \
-        '{
-            query: $query,
-            docs: $docs,
-            top_n: 386,
-            normalize: true
-        }')"
-
-end=$(date +%s%N)  # end time, in nanoseconds
-duration=$(( (end - start) / 1000000 ))  # convert to milliseconds
-echo "Command took $duration milliseconds."
diff --git a/tests/reranker_performance/curl1_simple.sh b/tests/reranker_performance/curl1_simple.sh deleted file mode 100644 index f55842e..0000000 --- a/tests/reranker_performance/curl1_simple.sh +++ /dev/null @@ -1,417 +0,0 @@ -#!/bin/bash -start=$(date +%s%N) # 开始时间,纳秒级 - -time curl -X POST "http://localhost:6007/rerank" \ - -H "Content-Type: application/json" \ - -d '{ - "query": "健身女生T恤短袖", - "docs": [ "60 Jelly Bracelets 80 s Adult Size - MAQIHAN Neon Gummy Bracelets for Women 80s Jelly Bangles Glow Silicone Bands Jewelry Wristband Rainbow Jellies Bangle Girls Boys Colored Accessories Party Favor", -"MEROKEETY Women s 2025 Summer Square Neck Puff Sleeve Boho Midi Dress Swiss Dot Ruffle Flowy Tie Back Dress", -"FITORY Mens Sandals", -"Lefant 3 Packs Dust Bags Replacement Kit Suitable for Lefant Base Station of M3/M3 Max Robot Vacuum", -"Merrell Mens Hydro Moc", -"Lounge Sets for Women Summer Outfits Women 2 Piece Sets 2025 Sleeveless Matching Lounge Crop Top High Waisted Short", -"Men s Underwear", -"Executive Functioning Workbook for Teens: 101 Activities and Strategies for Enhancing Self-Discipline", -"LEVSOX Compression Socks Women and Men", -"MGparty 12 Pieces Christmas Headbands Christmas Parties Favors Decoration Supplies Xmas Gifts Photo Booth Xmas Tree Snowman Reindeer Antlers Santa Hat", -"10 Large Vacuum Storage Bags with Hand Pump", -"Disney Lilo and Stitch Boys Swim Set", -"Sterling Silver Hoop Earrings", -"23 Pcs Day of The Dead Altar Decorations Set", -"Travel Makeup Bag for Women Fashion Large Capacity Pouch Open Flat Cosmetic Portable Organizer Waterproof Large Opening Storage Toiletry Bags Vertical Free-Standing Brush Holder for Easy Access Blue", -"Iron Flame: Empyrean", -"Luxebell Luggage Straps Suitcase Belt TSA Approved Travel Accessories Gift 4-Pack 6.56ft (Green)", -"TONY & SANDY Christian Gifts for Women", -"Blue Birthday Party Supplies", -"Vionic Women s Coral Loafer Moccasin", -"LIQING 35L Large Picnic Basket 2 Layers of Internal Pockets Leak-Proof and Insulated ,Folding with Internal Support for enhansed Stability", -"40oz Softball Tumbler with Handle Softball Gifts Stuff for Women Girls Men Gift for Coach Lovers Fan Stainless Steel Cup", -"Crayola Colour & Erase Reusable Puzzle Set", -"Carry On Luggage with Front Compartment and Cup Holder", -"Interactive Cat Toy Rechargeable", -"Nike Air Rift", -"Portable Hookah Set for Travel - Premium Handheld Glass Aluminum Mini Hookah Real Metal Accessories", -"Clear Backpack for Boys", -"Women’s Knee High Boots Round Toe Chunky Heel Faux Leather Tall Riding Boots with Side Zipper", -"Golf Grip Trainer & Connection Band 2Set", -"Monster High Self Scare Day Cleo De Nile Doll Play Set", -"Fortnite eGift Card - Powered by the Epic Games Store", -"Mesh Beach Bags", -"Crowye Anime Cosplay Costume for Halloween Princess Costume Accessories Anime White Cosplay Wig Egypt Arm Cuff Bracelet Gold Earrings Greek Goddess Set for Halloween Dress up Princess", -"Premium Women s Leather Tote Handbag - Bag for Everyday Use", -"Ekouaer Maternity Nursing Gown and Robe Set Labor Delivery Nursing Nightgowns for Breastfeeding Pregnancy Clothes", -"Superband Mermaid Tails for Swimming for Women and Adults Without Monofin", -"Pink Queen Women s 2025 Casual Pullover Sweaters Sexy V Neck Long Sleeve Twist Knot Cropped Knit Sweater Tops", -"WDIRARA Girl s Bow Puff Sleeve A Line Midi Dress Cute Collared Ruffle Hem Swing Dresses", -"Funziez! 
Adult Onesie Halloween Costume Animal Dinosaur Shark Unisex Plush One Piece Cosplay Suit for Adults", -"Rockland Duffel Bag", -"Centipede Demon Baby Shoes Baby Boys Girls Walking Shoes Non Slip Booties Sock Shoe Infants Breathable Sneakers Lightweight Barefoot Slip On Sneakers", -"CYDREAM Long Sleeve Bodysuits for Women - Square Neck Shapewear Bodysuit Tops Going Out Body Suits Shirt Leotard", -"Men s Oversized Letter Graphic Tank Top Sleeveless Casual Summer Tops Y2K Streetwear", -"Flower Claw Clip 7 PCS Claw Clips", -"waist twister,waist twisting machine ab twister board with 300 lbs Weight Capacity", -"PAGE ONE Womens Winter Ribbed Beanie Crossed Cap Chunky Cable Knit Pompom Soft Warm Hat", -"5 Pack Cute Keychains for Girls", -"Dragon Ball Super - Complete Series - Blu-ray", -"VejiA Multifunctional Simple Shoe Cabinet Storage Shoe Rack Save Space Hallway Furniture", -"50Pcs Handbag Purse Feet Handbag Nailhead Brass Studs Screw-Back Feet Flat Head Stud Metal Studs Rivet Leather Craft DIY for DIY Purse Leather Craft", -"Wearable Blanket Hoodie with Letter A-Z - Oversized Blanket Hooded Personalized Birthday Christmas Gifts for Women Mom", -"On Women s Cloudnova Form 2 Sneakers", -"SANTINY 18 Skorts for Women with 4 Pockets High Waist Long Athletic Tennis Skirt Golf Skort Dressy Casual", -"Compatible with AirTag Case Keychain", -"Rod Holder Plugs", -"Protective Case Compatible with Have A Seat Figure-Clear PVC Portable Storage Box with Keychain", -"adidas Men s Swift Run 1.0 Running Shoes", -"M MOOHAM Cross Necklace for Women Teen Girls", -"Sportneer Adjustable Ankle Weights for Women and Men 7 lbs/Pair Adjustable Leg Weights with Secure Straps", -"PRETTYGARDEN Women s 2 Piece Outfits Sleeveless Suit Vest and Wide Leg Pants Business Casual Blazer Sets", -"Bouncer Seat for Babies 0-12 Months", -"Womens Crew Socks Cotton Long Gym Socks Lightweight Athletic Running Socks", -"Denior Magnetic Card Phone Wallet Holder for iPhone 17/16/15/14/13/12 Series", -"LIGHT DOT Women s Summer Dress Plisse Maxi Tube Bodycon Dress Back Tie Beach Resort Vacation", -"Vivresina UV Resin 400g (400.0", -"Wide Leg Pants High Waisted Pleated Trousers with 4 Colors", -"Osprey Daylite Shoulder Sling Bag – Compact Crossbody Backpack for Everyday Carry", -"Tote Bag for Women Large PVC Tote Bag Letters Print Plastic Handbag for Christmas Gift", -"Hello Kitty Giant Coloring & Activity Book 11x16", -"Skechers Mens Delson 3.0 - Roth 210606", -"3pcs Heart Badge Reel with Alligator Clip Cute Retractable Badge Holder Acrylic Nurse Badge Clip for Office Workers", -"Ortho Balance Hiking Shoes for Men Women", -"GOLDENMATE 1000VA/600W Lithium UPS Battery Backup and Surge Protector", -"Gelante Solid Color 100% Cotton Bucket Hat for Women and Men Packable Travel Summer Beach Hat", -"Sonic The Hedgehog 3 Movie Action Figures 2.5-Inch Movie Collector Toy Figure Multi-Pack Includes Sonic The Hedgehog Knuckles Shadow Buzz Bomber & Drone- Officially Licensed Toys", -"61 Pcs Nacho Libre Stickers Comedy Movie Graffiti Waterproof Vinyl for Adults for Birthday Party Supplies Decoration Favors for Water Bottles Laptop Suitcase Scrapbooking Choice", -"Neck Lift Tape", -"925 Sterling Silver Earrings for Womens Sparkly Colorful Full Diamond Simple Stylish Elegant Hypoallergenic Jewelry", -"Pink Ceramic Bow Vase for Flowers", -"Winter Coats For Men Winter Jackets Water Resistant Warm Thicken Parka Puffer Coat Long Down Jacket", -"Alarm Clocks for Bedrooms", -"KINURI Running Belt for Men & Women – Fits All Smartphones – Waterproof Waist Pack with 
Adjustable Strap – Ideal for Jogging", -"DREAM PAIRS Heels for Women Flip Flops Kitten Low Heels Open Square Toe Thong Heeled Sandals", -"Amazon Basics All Purpose Washable School Craft Liquid Glue for Making Slime", -"Inflatable Costume Adult Frog Full Body Deluxe Funny Air Blow Up Costume for Men Women Halloween", -"Mens Golf Pants Stretch Casual Dress Pants Elastic Drawstring Slacks for Men Lightweight Trousers with 5 Pockets", -"Lip Smacker Hello Kitty Lip Balm", -"Brown Sugar Keeper 3D – Terracotta Clay Bear Softener", -"MEETSUN Polarized Sunglasses for Women Men Trendy Classic Retro Designer Style", -"Corset Top Bustier Lingerie for Women Zipper Front Flower Sexy Burlesque Vintage", -"Pro Club Men s Heavyweight Mesh Basketball Shorts", -"Nike Tech Men s Full-Zip Windrunner Hoodie (HV0949-237", -"Ear Piercing Kit", -"Timberland Men s 6 Premium Boot", -"STAR WARS The Black Series Darth Maul", -"VZQI Halloween Cosplay Costumes Kamado Tanjir Kids Anime Kimono Halloween Green Cloak", -"Fringe Vest for Women Faux Suede Open Front Cardigan Sleeveless Tassels Fringed Vest Cardigan Hippie Jacket", -"Smart Health Ring 2.0 for Women Men", -"Fast Forward Kid s Licensed 15 Backpack With Lunch Box Combo Set (Hello Kitty)", -"Handmade Authentic Katana - 41-inch Full Tang Sharp Blade", -"Inateck Sling Bag X", -"EXLURA Women s Fashion Faux Wool Mini Skirt High Waisted Y2K Trendy Side Slit Tweed Plaid Skirts 2025 Fall Winter Outfits", -"LASLULU Womens Sexy Crossover Crop Top Long Sleeve Workout Tops Crewneck Athletic Yoga T-Shirts Fall Outfits", -"Wrangler Authentics Men s Classic Relaxed Fit Five Pocket Jean Short", -"ZeroBound Built in Bra Tank Tops for Women - High Neck Racerback Tank Tops", -"Nike Mens Air Max Alpha Trainer 6", -"MAZZERI Solid Gold Plated Sterling Silver Italian 1.3/1.6/2.2/2.8mm Diamond-Cut Braided Rope Chain Necklace for Men Women", -"Milumia Women s Polka Dots Twist Front Halter Top Dressy Casual Textured Peplum Going Out Tops", -"80s 90s Outfit for Women", -"EFAN Womens Sexy Sleeveless Double Lined Crop Tops Workout Cute Tight Racerback Tank Tops Summer Clothes Teen Girls 2025", -"Nike Mens Shorts Dri-Fit Flex Woven Shorts 7inch (US", -"top handle satchel Women", -"Kono Expandable Luggage 3 Piece Set Hardshell Lightweight 20in 24in 28in Carry On Suitcase with Spinner Wheels TSA Lock(Black & Brown)", -"Nations of The World | National Pride Flag Symbol Arms Tee Unisex T-Shirt for Men or Women", -"Jo & Bette Seamless Thongs for Women - High Waist Panties 6 Pack - Thong Underwear Pack Breathable No show Sports", -"eKids Disney Frozen 2 Bluetooth Headphones with Microphone", -"Arctix Kids Insulated Snow Bib Overalls", -"USA Flag Charlie Gift T-Shirt", -"CBKSUHBADE 15in×11in Anime One Piece Wanted Bounty Posters", -"Plus Size Underwear for Women XL-5XL Cotton High Waist Women Briefs Full Coverage Ladies Panties 4 Pack", -"Little Adventures Enchanted Rapunzel Dress-Up Costume for Adult Women", -"G Gradual Tennis Dress for Women Golf Outfits with Shorts and Pockets Sleeveless Active Exercise Athletic Dresses for Women", -"Pastoral Style Porch Goose Outfits", -"Vive Thigh High Compression Stockings for Women & Men - 15-20 mmHg Graduated Support Hose - Opaque Closed Toe Compression Tights - Stockings for Varicose Veins", -"Canada is Not for Sale Vintage Cotton Twill Cap", -"TomTiger Yoga Shorts for Women Tummy Control High Waist Biker Shorts Exercise Workout Butt Lifting Tights Women s Short Pants", -"4PCS GOD IS FIRST IM SECOND Bracelet: Faith Priority Bracelet - Engraved Cross 
Silicone Wristband for Daily Encouragement", -"Tahitian Black Pearl Pendant Necklace AAAA 18K White Gold Plated 925 Sterling Silver Black Pearl Jewelry Gift for Women Mother Wife Her for Anniversary Christmas Birthday", -"HOTOUCH Womens Short Sleeve Button Down Shirts Loose Fit V Neck Business Casual Blouses Summer Top with Pockets S-XXL", -"Men s Corduroy Short Sleeved Cargo Shirt Relaxed Fit Button Down Casual Wear Tops with Flap Pockets", -"Orange Blue Light Blocking Glasses for Better Sleep - 99.5% Premium Acetate Migraine Glasses for Women & Men", -"Disney Stitch Beach Towel for Kids Cotton Bath Towels with 2 Clothes Pins Travel Swimming Quick Dry Towel Beach Vacation Essentials", -"PGANDS Womens Crew Neck Solid/Color Block Sweatshirts Long Sleeve Casual Lightweight Pullover Tops", -"Premium Organic Whole Cloves 5.3 oz (150 grams)", -"habibee Bra for Women No Underwire Comfort Seamless Bras Push Up Wireless Bras Full Coverage Bralettes", -"Puma Mens Caven 2.0 Shoes", -"PRETTYGARDEN Women s Fall Button Down Shirts Dressy Casual Spring Long Puff Sleeve Eyelet Loose Fit Collared Blouse Top", -"TNNZEET 2 Pack Plus Size Biker Shorts for Women - 8 Black High Waisted Tummy Control Spandex Workout Shorts (XL-4XL)", -"Marvel Legends Series Captain America Shield", -"PAVOI 14K Gold AAA+ Handpicked White Freshwater Cultured Pearl Earrings Studs", -"Trendy Queen Long Skirts for Women Boho Maxi Skirt Winter Swing Tiered A-Line Elastic High Waist Dress with Pockets Fashion", -"Reebok Classic Leather Sneakers for Men", -"PRETTYGARDEN Women s Summer Bodycon Maxi Tube Dress Ribbed Strapless Side Slit Long Going Out Casual Elegant Party Dresses", -"Favorite Daughter Women s Classic Logo Baseball Cap", -"Reebok Men s Cotton Vital Fleece Sweatpant", -"COOFANDY Mens Hawaiian Shirt Short Sleeve Button Down Shirts Tropical Summer Beach Shirts Casual Floral Aloha Shirts", -"Columbia Mens Grander Marlin Iii Offshore Short", -"Satin One Shoulder Flower Girl Dress with Bow Wedding Princess Pageant Party Gown Puffy Formal First Communion", -"Nike Mens V5 RNR", -"Speed Cube 3x3", -"FOURSTEEDS Women s Cotton Zipper Front Multi-Pocket Twill Bermuda Women Cargo Shorts", -"Curly Hair Brush Defining", -"YQXCC Cooling Towels | 4 Pack 47x12 | Ice Cool for Neck | Microfiber Soft Breathable Chilly | for Yoga", -"Hot Wheels Toy Car Playset with Lights", -"Carhartt Men s Loose Fit Heavyweight Short-Sleeve Pocket Henley T-Shirt", -"Women s Mid-High Rise Ripped Denim Shorts Stretchy Distressed Jean Shorts with Pockets Folded Hem Casual Summer Jorts", -"Monster High Cleo De Nile Doll in Golden Blouse & Layered Skirt", -"Ariat Women’s Fatbaby Western Boot", -"UYYE Car Registration and Insurance Card Holder", -"365 by Whole Foods Market", -"Crystal Bracelet for Women Fashion 7 Inch Approximately Rainbow Sparkling Crystal Bracelet with Adjustable Elastic Cord", -"Samsung Galaxy Watch 7 (44mm) AI Smartwatch w/ 1.5 AMOLED", -"DOUKEN 4 Pair Sneaker Creases Protector", -"Elvis: The Legend music word search puzzle.: Great Country Music Word Scrambles about Elvis. Large print word puzzle for adults and rock music lovers. ... 
Great music gift for your friends or family.", -"Pinkfong Bebefinn Plush Toy - 12 (30cm) Stuffed Doll | Soft Cuddly Plush for Toddlers | Bebefinn Toy | Perfect Birthday", -"Thrusting Dildo Vibrator Sex Toys for Women", -"VANLOVEMAC Baseball Gifts for Boys 8-12 Baseball Stuff College Going Away Gifts Welcome Back to School Gifts Dorm Room Essentials for Guys Off to College", -"Hello Kitty and Friends - Cinnamoroll 12” Pink Monochrome Plush", -"BOBISUKA Pearl White Face Body Paint", -"OMKAGI 2 Piece Workout Sets for Women Halter Sports Bras Gym Sets Booty Leggings Outfits", -"Ivay Womens Scoop Neck Ribbed Knit Tank Top Sleeveless Cotton Wife Beater Camisole Shirts", -"SOLY HUX Women s Graphic Tee Shirts Novelty Funny Short Sleeve Summer Casual Tops", -"Wooden Taper Candle Holders: Wood Candlestick Holders Rustic Brown Farmhouse Fall Decor for Living Room Dinning Table Centerpiece Christmas Set of 2", -"PRETTYGARDEN Long Sleeve Shirts for Women 2025 Fall V Neck Waffle Basic Tee Dressy Casual Winter Blouses Knit Tunic Tops", -"Ray-Ban RB2140 Original Wayfarer Square Sunglasses", -"Lee Womens Ultra Lux Comfort with Flex-to-go Utility Skimmer Capri Pant", -"3D Pedometer for Walking", -"HiiFeuer Medieval Faux Leather Chest Armor", -"Pet Deadly Dog Costume", -"Western Chief Kids Freestyle Neoprene Outdoor Boot", -"SKECHERS Women s Ultra Flex 3.0-Brilliant Path Hands Free Slip-INS Sneaker", -"LUOBO Keychain Accessory Decor Keychain Decoration backpacks Bag Pendant", -"10inch Teddy Bear Stuffed Animal", -"Halloweentown University T-Shirt for Women Fall Pumpkin Shirts Funny Halloween Thanksgiving Gift Tops", -"Women s Sexy American Flag Crop Tank 4th of July Patriotic Sleeveless Tee Tops", -"Gillette Fusion5 ProGlide Men s Razor Blade Refills", -"Poppy Playtime - Mommy Long Legs Plush (14 Medium Plush", -"Women’s Heated Vest with 12V 20000mAh Battery – Cropped Stand Collar Lightweight Insulated Winter Vest.", -"toolant Winter Work Gloves for Men", -"192Pcs Halloween Favors Stationery Gift Set", -"20 Pcs Ultra Thin Tattoo Cover up Patch Waterproof Tattoo Cover up Tape Sweatproof Tattoos Covers Patches Cuttable Invisible Non-Woven Fabric Patches for Tattoos Scar Birthmark 4.72×3.35In(Light Skin)", -"Popcorns Maker", -"Paladone Kuromi GloBuddies Night Light", -"Creativity for Kids Sensory Minis Dinosaur Kit | Cloud Clay Sensory Toy for Toddlers | Squish", -"Mouse Ears Headband Fully Sewn Sturdy Headbands 2-Pcs, 4.6-Inch Sequin Big Ears 3D Silk Satin Bowknot Suitable for Women and Girls Theme Role Play Costume Accessories Party", -"Tanluhu Sweatbands Sport Headbands for Men & Women", -"Pilates Reformer Machine", -"Fossil Fenmore Analog Men Watch", -"Stray Kids Official Lightstick Ver 2", -"Zima Dental Pod PRO: New Ultrasonic Retainer Cleaner Machine. 
Market-Leading", -"2300pcs Polymer Clay Beads Bracelet Making Kit", -"AI ACCESSORY INNOVATIONS Bluey 4 Piece Backpack Set for Pre School Girls & Boys", -"MIRITY Women s High Waist Cotton Underwear - Soft Full Coverage Briefs with Double-Layer Waistedband", -"Plus Size Summer Dresses - Floral Beach Wedding Guest Semi Formal Tiered Flowy Long Sundress", -"AUTOMET Womens Tops Summer Sweater Long Tunic Dressy Casual Blouses Business Cute Trendy Short Sleeve Shirt 2025", -"Black Sabbath Sketch Band T-Shirt", -"Loomie Upgraded 6 Drawer White Dresser for Bedroom", -"Michael Kors Womens Zuma Trainer", -"Chunky Silver Bohemian Flower Bracelet For Wemen Men", -"Classic Black Western Felt Roll Up Brim Cowboy and Cowgirl Hat for Women and Men - Decoration with Western Belt Bukle", -"Jellycat Little Pig Bag Charm", -"LARNMERN Steel Toe Work Boots Men", -"3PCS Gold Hair Ties", -"Red Kap Men s Snap Front Cotton Coverall", -"Citizen Quartz Mens Watch", -"ATHMILE Long Sleeve Shirts for Women Tunic Fall Tops Loose Fit Dressy Crew Neck Basic Sweaters 2025", -"Narecte Summer Maxi Dresses for Women Back Strap Beach Dress Women s Casual Dress Long Flowy Dresses for Vacation", -"LIDHAY Cowboy Hat for Women and Men Western Cowgirl Hats Suede Cowboy Hat for Rodeo", -"BIC Classic Maxi Pocket Lighter", -"A + S Luxxe Diaper Bag Tote – Stylish", -"100pack Name Badge Holders Name Tag Holder Clear Plastic Badge Holder ID Holders for Lanyard (100Pcs Vertical)", -"MOOSEA Christmas Gifts for Women Wife - Love Knot Moissanite Necklace 1-3ct D Color VVS1 Clarity Moissanite 925 Sterling Silver Necklace Anniversary Birthday Gifts for Women Wife Mom Girlfriend", -"Solid Wood Retangle End Table with Drawer and Storage Shelf", -"Madden Girl womens Beella Heeled SandalHeeled Sandal", -"Ekouaer 2 Pack Womens Pajama Sets Short Sleeve Sleepwear Soft Crew Neck Pj Shorts Set Printed Loungewear Set S-XXL", -"NPQQUAN Original Classic Low Profile Baseball Cap Golf Dad Hat Adjustable Cotton Hats Men Women Unconstructed Plain Cap", -"YEOREO Women Workout Biker Shorts Impact 4.5 No Front Seam Hidden Scrunch Lifting Seamless Yoga Gym Shorts", -"Merino Wool Underwear Men by Thermowave - Sport & Everyday Men s Merino Wool Boxer Brief - 150 GSM Stretchy & Soft", -"COACH Women s Leah Platform Loafers", -"Doodle Me Happy Kids Thank You Cards - 25 Cards With Envelopes - Cute", -"Spring Summer Women Pleated Casual Denim V Neck Ruffle Sleeve Dress Light Blue XL", -"Disney Hooded Matching Family Cosplay T-Shirt Infant to Adult Sizes (12 Months - 2XL)", -"Leather CPR Cleaner & Conditioner 18oz - Cleans", -"Baseball Shirts Women Baseball Mom Tshirt Baseball Heart Graphic Tee Game Day Gifts Funny Short Sleeve Tops", -"4 Pack Cooling Towels", -"ZEEPORTE Mask Fin Snorkel Set", -"60 Pcs Bride Tribe Bachelorette Party Favors Bulk Friendship Bridesmaid Gifts 12 Set Friendship Bracelets Heart Sunglasses Satin Scrunchie for Engagement Bridal Shower Wedding Favor", -"AUSELILY Summer Dress Sundress Beach Cover up Swing Dresses", -"Loungefly Disney Minnie Mouse Crossbody Satchel Handbag", -"Tactical Gym Bag for Men,50L Large 3 in 1 Sports Duffle Bag with Shoes Compartment for Travel", -"YETI Rambler 42 oz Tumbler with Handle and Straw Lid", -"Samsonite Classic Leather Slim Backpack", -"Vive Thigh High Compression Stockings for Women & Men - 15-20 mmHg Graduated Support Hose - Opaque Closed Toe Compression Tights - Stockings for Varicose Veins", -"Canada is Not for Sale Vintage Cotton Twill Cap", -"TomTiger Yoga Shorts for Women Tummy Control High Waist Biker 
Shorts Exercise Workout Butt Lifting Tights Women s Short Pants", -"4PCS GOD IS FIRST IM SECOND Bracelet: Faith Priority Bracelet - Engraved Cross Silicone Wristband for Daily Encouragement", -"Tahitian Black Pearl Pendant Necklace AAAA 18K White Gold Plated 925 Sterling Silver Black Pearl Jewelry Gift for Women Mother Wife Her for Anniversary Christmas Birthday", -"HOTOUCH Womens Short Sleeve Button Down Shirts Loose Fit V Neck Business Casual Blouses Summer Top with Pockets S-XXL", -"Men s Corduroy Short Sleeved Cargo Shirt Relaxed Fit Button Down Casual Wear Tops with Flap Pockets", -"Orange Blue Light Blocking Glasses for Better Sleep - 99.5% Premium Acetate Migraine Glasses for Women & Men", -"Disney Stitch Beach Towel for Kids Cotton Bath Towels with 2 Clothes Pins Travel Swimming Quick Dry Towel Beach Vacation Essentials", -"PGANDS Womens Crew Neck Solid/Color Block Sweatshirts Long Sleeve Casual Lightweight Pullover Tops", -"Premium Organic Whole Cloves 5.3 oz (150 grams)", -"habibee Bra for Women No Underwire Comfort Seamless Bras Push Up Wireless Bras Full Coverage Bralettes", -"Puma Mens Caven 2.0 Shoes", -"PRETTYGARDEN Women s Fall Button Down Shirts Dressy Casual Spring Long Puff Sleeve Eyelet Loose Fit Collared Blouse Top", -"TNNZEET 2 Pack Plus Size Biker Shorts for Women - 8 Black High Waisted Tummy Control Spandex Workout Shorts (XL-4XL)", -"Marvel Legends Series Captain America Shield", -"PAVOI 14K Gold AAA+ Handpicked White Freshwater Cultured Pearl Earrings Studs", -"Trendy Queen Long Skirts for Women Boho Maxi Skirt Winter Swing Tiered A-Line Elastic High Waist Dress with Pockets Fashion", -"Reebok Classic Leather Sneakers for Men", -"PRETTYGARDEN Women s Summer Bodycon Maxi Tube Dress Ribbed Strapless Side Slit Long Going Out Casual Elegant Party Dresses", -"Favorite Daughter Women s Classic Logo Baseball Cap", -"Reebok Men s Cotton Vital Fleece Sweatpant", -"COOFANDY Mens Hawaiian Shirt Short Sleeve Button Down Shirts Tropical Summer Beach Shirts Casual Floral Aloha Shirts", -"Columbia Mens Grander Marlin Iii Offshore Short", -"Satin One Shoulder Flower Girl Dress with Bow Wedding Princess Pageant Party Gown Puffy Formal First Communion", -"Nike Mens V5 RNR", -"Speed Cube 3x3", -"FOURSTEEDS Women s Cotton Zipper Front Multi-Pocket Twill Bermuda Women Cargo Shorts", -"Curly Hair Brush Defining", -"YQXCC Cooling Towels | 4 Pack 47x12 | Ice Cool for Neck | Microfiber Soft Breathable Chilly | for Yoga", -"Hot Wheels Toy Car Playset with Lights", -"Carhartt Men s Loose Fit Heavyweight Short-Sleeve Pocket Henley T-Shirt", -"Women s Mid-High Rise Ripped Denim Shorts Stretchy Distressed Jean Shorts with Pockets Folded Hem Casual Summer Jorts", -"Monster High Cleo De Nile Doll in Golden Blouse & Layered Skirt", -"Ariat Women’s Fatbaby Western Boot", -"UYYE Car Registration and Insurance Card Holder", -"365 by Whole Foods Market", -"Crystal Bracelet for Women Fashion 7 Inch Approximately Rainbow Sparkling Crystal Bracelet with Adjustable Elastic Cord", -"Samsung Galaxy Watch 7 (44mm) AI Smartwatch w/ 1.5 AMOLED", -"DOUKEN 4 Pair Sneaker Creases Protector", -"Elvis: The Legend music word search puzzle.: Great Country Music Word Scrambles about Elvis. Large print word puzzle for adults and rock music lovers. ... 
Great music gift for your friends or family.", -"Pinkfong Bebefinn Plush Toy - 12 (30cm) Stuffed Doll | Soft Cuddly Plush for Toddlers | Bebefinn Toy | Perfect Birthday", -"Thrusting Dildo Vibrator Sex Toys for Women", -"VANLOVEMAC Baseball Gifts for Boys 8-12 Baseball Stuff College Going Away Gifts Welcome Back to School Gifts Dorm Room Essentials for Guys Off to College", -"Hello Kitty and Friends - Cinnamoroll 12” Pink Monochrome Plush", -"BOBISUKA Pearl White Face Body Paint", -"OMKAGI 2 Piece Workout Sets for Women Halter Sports Bras Gym Sets Booty Leggings Outfits", -"Ivay Womens Scoop Neck Ribbed Knit Tank Top Sleeveless Cotton Wife Beater Camisole Shirts", -"SOLY HUX Women s Graphic Tee Shirts Novelty Funny Short Sleeve Summer Casual Tops", -"Wooden Taper Candle Holders: Wood Candlestick Holders Rustic Brown Farmhouse Fall Decor for Living Room Dinning Table Centerpiece Christmas Set of 2", -"PRETTYGARDEN Long Sleeve Shirts for Women 2025 Fall V Neck Waffle Basic Tee Dressy Casual Winter Blouses Knit Tunic Tops", -"Ray-Ban RB2140 Original Wayfarer Square Sunglasses", -"Lee Womens Ultra Lux Comfort with Flex-to-go Utility Skimmer Capri Pant", -"3D Pedometer for Walking", -"HiiFeuer Medieval Faux Leather Chest Armor", -"Pet Deadly Dog Costume", -"Western Chief Kids Freestyle Neoprene Outdoor Boot", -"SKECHERS Women s Ultra Flex 3.0-Brilliant Path Hands Free Slip-INS Sneaker", -"LUOBO Keychain Accessory Decor Keychain Decoration backpacks Bag Pendant", -"10inch Teddy Bear Stuffed Animal", -"Halloweentown University T-Shirt for Women Fall Pumpkin Shirts Funny Halloween Thanksgiving Gift Tops", -"Women s Sexy American Flag Crop Tank 4th of July Patriotic Sleeveless Tee Tops", -"Gillette Fusion5 ProGlide Men s Razor Blade Refills", -"Poppy Playtime - Mommy Long Legs Plush (14 Medium Plush", -"Women’s Heated Vest with 12V 20000mAh Battery – Cropped Stand Collar Lightweight Insulated Winter Vest.", -"toolant Winter Work Gloves for Men", -"192Pcs Halloween Favors Stationery Gift Set", -"20 Pcs Ultra Thin Tattoo Cover up Patch Waterproof Tattoo Cover up Tape Sweatproof Tattoos Covers Patches Cuttable Invisible Non-Woven Fabric Patches for Tattoos Scar Birthmark 4.72×3.35In(Light Skin)", -"Popcorns Maker", -"Paladone Kuromi GloBuddies Night Light", -"Creativity for Kids Sensory Minis Dinosaur Kit | Cloud Clay Sensory Toy for Toddlers | Squish", -"Mouse Ears Headband Fully Sewn Sturdy Headbands 2-Pcs, 4.6-Inch Sequin Big Ears 3D Silk Satin Bowknot Suitable for Women and Girls Theme Role Play Costume Accessories Party", -"Tanluhu Sweatbands Sport Headbands for Men & Women", -"Pilates Reformer Machine", -"Fossil Fenmore Analog Men Watch", -"Stray Kids Official Lightstick Ver 2", -"Zima Dental Pod PRO: New Ultrasonic Retainer Cleaner Machine. 
Market-Leading", -"2300pcs Polymer Clay Beads Bracelet Making Kit", -"AI ACCESSORY INNOVATIONS Bluey 4 Piece Backpack Set for Pre School Girls & Boys", -"MIRITY Women s High Waist Cotton Underwear - Soft Full Coverage Briefs with Double-Layer Waistedband", -"Plus Size Summer Dresses - Floral Beach Wedding Guest Semi Formal Tiered Flowy Long Sundress", -"AUTOMET Womens Tops Summer Sweater Long Tunic Dressy Casual Blouses Business Cute Trendy Short Sleeve Shirt 2025", -"Black Sabbath Sketch Band T-Shirt", -"Loomie Upgraded 6 Drawer White Dresser for Bedroom", -"Michael Kors Womens Zuma Trainer", -"Chunky Silver Bohemian Flower Bracelet For Wemen Men", -"Classic Black Western Felt Roll Up Brim Cowboy and Cowgirl Hat for Women and Men - Decoration with Western Belt Bukle", -"Jellycat Little Pig Bag Charm", -"LARNMERN Steel Toe Work Boots Men", -"3PCS Gold Hair Ties", -"Red Kap Men s Snap Front Cotton Coverall", -"Citizen Quartz Mens Watch", -"ATHMILE Long Sleeve Shirts for Women Tunic Fall Tops Loose Fit Dressy Crew Neck Basic Sweaters 2025", -"Narecte Summer Maxi Dresses for Women Back Strap Beach Dress Women s Casual Dress Long Flowy Dresses for Vacation", -"LIDHAY Cowboy Hat for Women and Men Western Cowgirl Hats Suede Cowboy Hat for Rodeo", -"BIC Classic Maxi Pocket Lighter", -"A + S Luxxe Diaper Bag Tote – Stylish", -"100pack Name Badge Holders Name Tag Holder Clear Plastic Badge Holder ID Holders for Lanyard (100Pcs Vertical)", -"MOOSEA Christmas Gifts for Women Wife - Love Knot Moissanite Necklace 1-3ct D Color VVS1 Clarity Moissanite 925 Sterling Silver Necklace Anniversary Birthday Gifts for Women Wife Mom Girlfriend", -"Solid Wood Retangle End Table with Drawer and Storage Shelf", -"Madden Girl womens Beella Heeled SandalHeeled Sandal", -"Ekouaer 2 Pack Womens Pajama Sets Short Sleeve Sleepwear Soft Crew Neck Pj Shorts Set Printed Loungewear Set S-XXL", -"NPQQUAN Original Classic Low Profile Baseball Cap Golf Dad Hat Adjustable Cotton Hats Men Women Unconstructed Plain Cap", -"YEOREO Women Workout Biker Shorts Impact 4.5 No Front Seam Hidden Scrunch Lifting Seamless Yoga Gym Shorts", -"Merino Wool Underwear Men by Thermowave - Sport & Everyday Men s Merino Wool Boxer Brief - 150 GSM Stretchy & Soft", -"COACH Women s Leah Platform Loafers", -"Doodle Me Happy Kids Thank You Cards - 25 Cards With Envelopes - Cute", -"Spring Summer Women Pleated Casual Denim V Neck Ruffle Sleeve Dress Light Blue XL", -"Disney Hooded Matching Family Cosplay T-Shirt Infant to Adult Sizes (12 Months - 2XL)", -"Leather CPR Cleaner & Conditioner 18oz - Cleans", -"Baseball Shirts Women Baseball Mom Tshirt Baseball Heart Graphic Tee Game Day Gifts Funny Short Sleeve Tops", -"4 Pack Cooling Towels", -"ZEEPORTE Mask Fin Snorkel Set", -"60 Pcs Bride Tribe Bachelorette Party Favors Bulk Friendship Bridesmaid Gifts 12 Set Friendship Bracelets Heart Sunglasses Satin Scrunchie for Engagement Bridal Shower Wedding Favor", -"AUSELILY Summer Dress Sundress Beach Cover up Swing Dresses", -"Loungefly Disney Minnie Mouse Crossbody Satchel Handbag", -"Tactical Gym Bag for Men,50L Large 3 in 1 Sports Duffle Bag with Shoes Compartment for Travel", -"YETI Rambler 42 oz Tumbler with Handle and Straw Lid", -"Samsonite Classic Leather Slim Backpack", -"Fabletics Men s Only Short", -"3pcs Heart Badge Reel with Alligator Clip Cute Retractable Badge Holder Acrylic Nurse Badge Clip for Office Workers", -"Ortho Balance Hiking Shoes for Men Women", -"GOLDENMATE 1000VA/600W Lithium UPS Battery Backup and Surge Protector", -"Gelante Solid 
Color 100% Cotton Bucket Hat for Women and Men Packable Travel Summer Beach Hat", -"Sonic The Hedgehog 3 Movie Action Figures 2.5-Inch Movie Collector Toy Figure Multi-Pack Includes Sonic The Hedgehog Knuckles Shadow Buzz Bomber & Drone- Officially Licensed Toys", -"61 Pcs Nacho Libre Stickers Comedy Movie Graffiti Waterproof Vinyl for Adults for Birthday Party Supplies Decoration Favors for Water Bottles Laptop Suitcase Scrapbooking Choice", -"Neck Lift Tape", -"925 Sterling Silver Earrings for Womens Sparkly Colorful Full Diamond Simple Stylish Elegant Hypoallergenic Jewelry", -"Pink Ceramic Bow Vase for Flowers", -"Winter Coats For Men Winter Jackets Water Resistant Warm Thicken Parka Puffer Coat Long Down Jacket", -"Alarm Clocks for Bedrooms", -"KINURI Running Belt for Men & Women – Fits All Smartphones – Waterproof Waist Pack with Adjustable Strap – Ideal for Jogging", -"DREAM PAIRS Heels for Women Flip Flops Kitten Low Heels Open Square Toe Thong Heeled Sandals", -"Amazon Basics All Purpose Washable School Craft Liquid Glue for Making Slime", -"Inflatable Costume Adult Frog Full Body Deluxe Funny Air Blow Up Costume for Men Women Halloween", -"Mens Golf Pants Stretch Casual Dress Pants Elastic Drawstring Slacks for Men Lightweight Trousers with 5 Pockets", -"Lip Smacker Hello Kitty Lip Balm", -"Brown Sugar Keeper 3D – Terracotta Clay Bear Softener", -"MEETSUN Polarized Sunglasses for Women Men Trendy Classic Retro Designer Style", -"Corset Top Bustier Lingerie for Women Zipper Front Flower Sexy Burlesque Vintage", -"Pro Club Men s Heavyweight Mesh Basketball Shorts", -"Nike Tech Men s Full-Zip Windrunner Hoodie (HV0949-237", -"Ear Piercing Kit", -"Timberland Men s 6 Premium Boot", -"Nike Air Rift", -"Portable Hookah Set for Travel - Premium Handheld Glass Aluminum Mini Hookah Real Metal Accessories", -"Clear Backpack for Boys", -"Women’s Knee High Boots Round Toe Chunky Heel Faux Leather Tall Riding Boots with Side Zipper", -"Golf Grip Trainer & Connection Band 2Set", -"Monster High Self Scare Day Cleo De Nile Doll Play Set", -"Fortnite eGift Card - Powered by the Epic Games Store", -"Mesh Beach Bags", -"Crowye Anime Cosplay Costume for Halloween Princess Costume Accessories Anime White Cosplay Wig Egypt Arm Cuff Bracelet Gold Earrings Greek Goddess Set for Halloween Dress up Princess", -"Premium Women s Leather Tote Handbag - Bag for Everyday Use", -"Ekouaer Maternity Nursing Gown and Robe Set Labor Delivery Nursing Nightgowns for Breastfeeding Pregnancy Clothes", -"Superband Mermaid Tails for Swimming for Women and Adults Without Monofin", -"Pink Queen Women s 2025 Casual Pullover Sweaters Sexy V Neck Long Sleeve Twist Knot Cropped Knit Sweater Tops" - ], - "top_n":386, - "normalize": true - }' - -end=$(date +%s%N) # 结束时间,纳秒级 -duration=$(( (end - start) / 1000000 )) # 转换为毫秒 -echo "Command took $duration milliseconds." 
-
-
diff --git a/tests/reranker_performance/curl2.sh b/tests/reranker_performance/curl2.sh
deleted file mode 100644
index f5f894a..0000000
--- a/tests/reranker_performance/curl2.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/bash
-
-start=$(date +%s%N)  # start time, in nanoseconds
-
-# Convert each line of titles.400 into a JSON array
-documents_json=$(jq -R -s 'split("\n") | map(select(length > 0))' /data/saas-search/tests/data/titles.400)
-#echo $documents_json
-#exit
-
-time curl -X POST "http://10.200.16.14:9997/v1/rerank" \
-    -H "accept: application/json" \
-    -H "Content-Type: application/json" \
-    -d "$(jq -n \
-        --arg model "Qwen3-Reranker-0.6B" \
-        --arg query "健身女生T恤短袖" \
-        --argjson documents "$documents_json" \
-        '{
-            model: $model,
-            query: $query,
-            documents: $documents
-        }')" \
-    -i
-
-end=$(date +%s%N)  # end time, in nanoseconds
-duration=$(( (end - start) / 1000000 ))  # convert to milliseconds
-echo "Command took $duration milliseconds."
diff --git a/tests/reranker_performance/rerank_performance_compare.sh b/tests/reranker_performance/rerank_performance_compare.sh
deleted file mode 100644
index 32539d7..0000000
--- a/tests/reranker_performance/rerank_performance_compare.sh
+++ /dev/null
@@ -1,117 +0,0 @@
-#!/bin/bash
-
-set -u
-
-FILE="/data/saas-search/tests/data/titles.1.8w"
-ROUNDS=10
-SAMPLE_SIZE=400
-
-if [ ! -f "$FILE" ]; then
-    echo "File does not exist: $FILE"
-    exit 1
-fi
-
-# Draw a random sample of 400 lines and convert it into a JSON array
-generate_docs_json() {
-    shuf -n "$SAMPLE_SIZE" "$FILE" | jq -R -s 'split("\n")[:-1]'
-}
-
-# Summary statistics
-summarize_times() {
-    local name="$1"
-    shift
-    local arr=("$@")
-    local total=0
-    local min=${arr[0]}
-    local max=${arr[0]}
-    local count=${#arr[@]}
-
-    for t in "${arr[@]}"; do
-        total=$((total + t))
-        if [ "$t" -lt "$min" ]; then
-            min=$t
-        fi
-        if [ "$t" -gt "$max" ]; then
-            max=$t
-        fi
-    done
-
-    local avg=$((total / count))
-
-    echo "========================================"
-    echo "$name summary"
-    echo "Runs: $count"
-    echo "Total: ${total} ms"
-    echo "Average: ${avg} ms"
-    echo "Min: ${min} ms"
-    echo "Max: ${max} ms"
-    echo "========================================"
-}
-
-echo "Starting tests..."
-echo "Data file: $FILE"
-echo "Random sample per round: $SAMPLE_SIZE lines"
-echo "Runs per test target: $ROUNDS"
-echo
-
-times_obj1=()
-times_obj2=()
-
-for ((i=1; i<=ROUNDS; i++)); do
-    echo "---------- Round $i ----------"
-
-    # Draw a fresh random set of 400 lines each round
-    DOCS_JSON=$(generate_docs_json)
-
-    # Test target 1
-    PAYLOAD1=$(jq -n \
-        --arg query "健身女生T恤短袖" \
-        --argjson docs "$DOCS_JSON" \
-        --argjson top_n 386 \
-        --argjson normalize true \
-        '{
-            query: $query,
-            docs: $docs,
-            top_n: $top_n,
-            normalize: $normalize
-        }')
-
-    start1=$(date +%s%N)
-    curl -s -o /dev/null -X POST "http://localhost:6007/rerank" \
-        -H "Content-Type: application/json" \
-        -d "$PAYLOAD1"
-    end1=$(date +%s%N)
-    duration1=$(( (end1 - start1) / 1000000 ))
-    times_obj1+=("$duration1")
-    echo "Test target 1, run $i: ${duration1} ms"
-
-    # Test target 2 (note: sends a different query than target 1)
-    PAYLOAD2=$(jq -n \
-        --arg model "Qwen3-Reranker-0.6B" \
-        --arg query "什么是机器学习" \
-        --argjson documents "$DOCS_JSON" \
-        '{
-            model: $model,
-            query: $query,
-            documents: $documents
-        }')
-
-    start2=$(date +%s%N)
-    curl -s -o /dev/null -X POST "http://10.200.16.14:9997/v1/rerank" \
-        -H "accept: application/json" \
-        -H "Content-Type: application/json" \
-        -d "$PAYLOAD2"
-    end2=$(date +%s%N)
-    duration2=$(( (end2 - start2) / 1000000 ))
-    times_obj2+=("$duration2")
-    echo "Test target 2, run $i: ${duration2} ms"
-
-    echo
-done
-
-echo
-echo "Tests finished; summarizing..."
-echo
-
-summarize_times "Test target 1" "${times_obj1[@]}"
-summarize_times "Test target 2" "${times_obj2[@]}"
diff --git a/translation/README.md b/translation/README.md
index b2c9b37..810424d 100644
--- a/translation/README.md
+++ b/translation/README.md
@@ -12,8 +12,8 @@
 - Startup script: [`scripts/start_translator.sh`](/data/saas-search/scripts/start_translator.sh)
 - Virtualenv: [`scripts/setup_translator_venv.sh`](/data/saas-search/scripts/setup_translator_venv.sh)
 - Model download: [`scripts/download_translation_models.py`](/data/saas-search/scripts/download_translation_models.py)
-- Local model benchmark: [`scripts/benchmark_translation_local_models.py`](/data/saas-search/scripts/benchmark_translation_local_models.py)
-- Focused benchmark script: [`scripts/benchmark_translation_local_models_focus.py`](/data/saas-search/scripts/benchmark_translation_local_models_focus.py)
+- Local model benchmark: [`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py)
+- Focused benchmark script: [`benchmarks/translation/benchmark_translation_local_models_focus.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models_focus.py)
 - Baseline performance report: [`perf_reports/20260318/translation_local_models/README.md`](/data/saas-search/perf_reports/20260318/translation_local_models/README.md)
 - CT2 extended report: [`perf_reports/20260318/translation_local_models_ct2/README.md`](/data/saas-search/perf_reports/20260318/translation_local_models_ct2/README.md)
 - CT2 focused tuning report: [`perf_reports/20260318/translation_local_models_ct2_focus/README.md`](/data/saas-search/perf_reports/20260318/translation_local_models_ct2_focus/README.md)
@@ -550,8 +550,8 @@ curl -X POST http://127.0.0.1:6006/translate \
 - After switching to CTranslate2, a new round of benchmarks is needed, paying particular attention to the single-request latency and concurrent tail latency of `nllb-200-distilled-600m` and the batch throughput of `opus-mt-*`.
 
 Performance scripts:
-- [`scripts/benchmark_translation_local_models.py`](/data/saas-search/scripts/benchmark_translation_local_models.py)
-- [`scripts/benchmark_translation_local_models_focus.py`](/data/saas-search/scripts/benchmark_translation_local_models_focus.py)
+- [`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py)
+- [`benchmarks/translation/benchmark_translation_local_models_focus.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models_focus.py)
 
 Datasets:
 - [`products_analyzed.csv`](/data/saas-search/products_analyzed.csv)
@@ -601,14 +601,14 @@
 
 ```bash
 cd /data/saas-search
-./.venv-translator/bin/python scripts/benchmark_translation_local_models.py
+./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py
 ```
 
 Command to reproduce this round's extended benchmark:
 
 ```bash
 cd /data/saas-search
-./.venv-translator/bin/python scripts/benchmark_translation_local_models.py \
+./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py \
   --suite extended \
   --disable-cache \
   --serial-items-per-case 256 \
@@ -620,7 +620,7 @@ cd /data/saas-search
 Single-model extended benchmark example:
 
 ```bash
-./.venv-translator/bin/python scripts/benchmark_translation_local_models.py \
+./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py \
   --single \
   --suite extended \
   --model opus-mt-zh-en \
@@ -639,7 +639,7 @@
 Single-request latency reproduction:
 
 ```bash
-./.venv-translator/bin/python scripts/benchmark_translation_local_models.py \
+./.venv-translator/bin/python benchmarks/translation/benchmark_translation_local_models.py \
   --single \
   --suite extended \
   --model nllb-200-distilled-600m \
-- 
libgit2 0.21.2
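The removed rerank_performance_compare.sh benchmarks the two rerank endpoints over repeated random samples and prints a min/average/max summary. A Python sketch of the same loop follows, with one deliberate change: the shell script sent a different query to each target, whereas this version sends the same query to both so the latency numbers are directly comparable. Endpoints, round count, and the fixture path are taken from the deleted script; `requests` is assumed to be installed.

```python
#!/usr/bin/env python3
"""A/B latency comparison mirroring rerank_performance_compare.sh (sketch)."""
import random
import statistics
import time

import requests  # assumed available

FILE = "/data/saas-search/tests/data/titles.1.8w"
ROUNDS = 10
SAMPLE_SIZE = 400
QUERY = "健身女生T恤短袖"  # same query for both targets, unlike the shell script

with open(FILE, encoding="utf-8") as f:
    titles = [line.strip() for line in f if line.strip()]


def timed_post(url: str, body: dict) -> float:
    """POST the body and return wall-clock latency in milliseconds."""
    start = time.perf_counter()
    requests.post(url, json=body, timeout=120)
    return (time.perf_counter() - start) * 1000


times: dict[str, list[float]] = {"target 1 (/rerank)": [], "target 2 (/v1/rerank)": []}
for _ in range(ROUNDS):
    docs = random.sample(titles, SAMPLE_SIZE)  # fresh random sample per round
    times["target 1 (/rerank)"].append(timed_post(
        "http://localhost:6007/rerank",
        {"query": QUERY, "docs": docs, "top_n": 386, "normalize": True},
    ))
    times["target 2 (/v1/rerank)"].append(timed_post(
        "http://10.200.16.14:9997/v1/rerank",
        {"model": "Qwen3-Reranker-0.6B", "query": QUERY, "documents": docs},
    ))

for name, ms in times.items():
    print(f"{name}: avg {statistics.mean(ms):.0f} ms, min {min(ms):.0f} ms, "
          f"max {max(ms):.0f} ms over {len(ms)} rounds")
```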