Commit d3dd01d3687413795804d7c9164f10d8aadc585d
1 parent
628ff04f
Automated parameter tuning:

- Make the batch timeout effectively unlimited, so a batch can run indefinitely:
  - [tune_fusion.py](/data/saas-search/scripts/evaluation/tune_fusion.py:400)
  - When `--batch-eval-timeout-sec <= 0`, no Python-level timeout is passed to `subprocess.run`.
- Add a resilient wrapper that handles automatic resumption:
  - [run_coarse_fusion_tuning_resilient.sh](/data/saas-search/scripts/evaluation/run_coarse_fusion_tuning_resilient.sh)
  - It counts the completed live evals in `trials.jsonl` and keeps running `resume-run` until `max_evals` is reached.
  - Even after an abnormal exit, it sleeps and then resumes from the existing `run_dir`.
- Switch both the start and resume scripts to resilient mode:
  - [start_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/start_coarse_fusion_tuning_long.sh)
  - [resume_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/resume_coarse_fusion_tuning_long.sh)

**Current job**

- `run_name`: `coarse_fusion_clothing_top771_resilient_20260422T091650Z`
- `run_dir`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z)
- `launch log`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z.log](/data/saas-search/artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log)

**Confirmed**

- The wrapper has started and entered `attempt=1`.
- The value actually passed is `--batch-eval-timeout-sec 0`.
- `tune_fusion.py` is running.
- `build_annotation_set.py batch` is already running.
- `eval.log` already shows evaluation progress for the first few queries of this round, so the job is not idling.

**Monitoring**

- `tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log`
- `tail -f logs/eval.log`
- `tail -f artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
- `cat artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`

**Key differences from the previous run**

- Last time, a single batch round was cut off by the Python timeout.
- This time, a single round has no Python timeout, and the outer wrapper resumes automatically.
- Long runs, mid-run interruptions, and later resumption therefore all keep advancing within the same `run_dir`.
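The timeout rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `tune_fusion.py`: the helper names `resolve_timeout` and `run_batch` are hypothetical, and the only behaviour taken from the commit message is that a value `<= 0` results in no Python-level timeout being given to `subprocess.run`.

```python
import subprocess
import sys
from typing import Optional, Sequence


def resolve_timeout(batch_eval_timeout_sec: float) -> Optional[float]:
    """Map the CLI value to a subprocess.run timeout; <= 0 means no limit."""
    return batch_eval_timeout_sec if batch_eval_timeout_sec > 0 else None


def run_batch(cmd: Sequence[str], batch_eval_timeout_sec: float) -> subprocess.CompletedProcess:
    """Run one batch evaluation command (hypothetical helper)."""
    # timeout=None makes subprocess.run wait for the child indefinitely;
    # a positive value raises subprocess.TimeoutExpired once it elapses.
    return subprocess.run(
        list(cmd),
        capture_output=True,
        text=True,
        timeout=resolve_timeout(batch_eval_timeout_sec),
    )


if __name__ == "__main__":
    # A stand-in child process; the real command would be the batch evaluation.
    result = run_batch([sys.executable, "-c", "print('batch done')"], 0)
    print(result.stdout.strip())
```

With this split, protection against a hung or crashed batch moves out of Python and into the outer wrapper, which is the role the commit message gives to the resilient script.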
Showing 19 changed files with 473 additions and 184 deletions
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023815Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +3843738 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023951Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +3845416 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T021002Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +3842050 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun2 --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +3843512 | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +bash scripts/evaluation/run_coarse_fusion_tuning_resilient.sh coarse_fusion_clothing_top771_resilient_dryrun clothing_top771 18 2 160 20260422 scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --help | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +4126011 | ... | ... |
config/config-with-reranker.yaml
| ... | ... | @@ -260,6 +260,7 @@ function_score: |
| 260 | 260 | score_mode: sum |
| 261 | 261 | boost_mode: multiply |
| 262 | 262 | functions: [] |
| 263 | + | |
| 263 | 264 | coarse_rank: |
| 264 | 265 | enabled: true |
| 265 | 266 | input_window: 480 |
| ... | ... | @@ -271,7 +272,7 @@ coarse_rank: |
| 271 | 272 | text_exponent: 0.35 |
| 272 | 273 | # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client) |
| 273 | 274 | # The ES score already discounts the translated query, so no further discount is applied here |
| 274 | - text_translation_weight: 0.8 | |
| 275 | + text_translation_weight: 1.0 | |
| 275 | 276 | knn_text_weight: 1.0 |
| 276 | 277 | knn_image_weight: 2.0 |
| 277 | 278 | knn_tie_breaker: 0.3 | ... | ... |
config/config.yaml
| ... | ... | @@ -100,11 +100,8 @@ es_settings: |
| 100 | 100 | number_of_shards: 1 |
| 101 | 101 | number_of_replicas: 0 |
| 102 | 102 | refresh_interval: 30s |
| 103 | - | |
| 104 | -# Configure by field base name; the .{lang} suffix is appended dynamically at query time according to the actual search language | |
| 105 | 103 | field_boosts: |
| 106 | 104 | title: 3.0 |
| 107 | - # qanchors and enriched_tags also appear in enriched_attributes.value, so their effective weight is their own weight plus the weight of enriched_attributes.value | |
| 108 | 105 | qanchors: 1.0 |
| 109 | 106 | enriched_tags: 1.0 |
| 110 | 107 | enriched_attributes.value: 1.5 |
| ... | ... | @@ -118,7 +115,6 @@ field_boosts: |
| 118 | 115 | brief: 1.0 |
| 119 | 116 | description: 1.0 |
| 120 | 117 | vendor: 1.0 |
| 121 | - | |
| 122 | 118 | query_config: |
| 123 | 119 | supported_languages: |
| 124 | 120 | - zh |
| ... | ... | @@ -126,16 +122,12 @@ query_config: |
| 126 | 122 | default_language: en |
| 127 | 123 | enable_text_embedding: true |
| 128 | 124 | enable_query_rewrite: true |
| 129 | - | |
| 130 | - zh_to_en_model: nllb-200-distilled-600m # nllb-200-distilled-600m deepl opus-mt-zh-en / opus-mt-en-zh | |
| 125 | + zh_to_en_model: nllb-200-distilled-600m | |
| 131 | 126 | en_to_zh_model: nllb-200-distilled-600m |
| 132 | 127 | default_translation_model: nllb-200-distilled-600m |
| 133 | - # Translation quality matters more when the source language is not in index_languages, so it is configured separately | |
| 134 | 128 | zh_to_en_model__source_not_in_index: deepl |
| 135 | 129 | en_to_zh_model__source_not_in_index: deepl |
| 136 | 130 | default_translation_model__source_not_in_index: deepl |
| 137 | - | |
| 138 | - # Query parsing stage: translation and query embedding run concurrently and share one wait budget (milliseconds) | |
| 139 | 131 | translation_embedding_wait_budget_ms_source_in_index: 300 |
| 140 | 132 | translation_embedding_wait_budget_ms_source_not_in_index: 400 |
| 141 | 133 | style_intent: |
| ... | ... | @@ -165,31 +157,22 @@ query_config: |
| 165 | 157 | enabled: true |
| 166 | 158 | dictionary_path: config/dictionaries/product_title_exclusion.tsv |
| 167 | 159 | search_fields: |
| 168 | - # Configure by field base name; the .{lang} suffix is appended dynamically at query time according to the actual search language | |
| 169 | 160 | multilingual_fields: |
| 170 | 161 | - title |
| 171 | 162 | - keywords |
| 172 | 163 | - qanchors |
| 173 | 164 | - enriched_tags |
| 174 | 165 | - enriched_attributes.value |
| 175 | - # - enriched_taxonomy_attributes.value | |
| 176 | 166 | - option1_values |
| 177 | 167 | - option2_values |
| 178 | 168 | - option3_values |
| 179 | 169 | - category_path |
| 180 | 170 | - category_name_text |
| 181 | - # - brief | |
| 182 | - # - description | |
| 183 | - # - vendor | |
| 184 | - # shared_fields: fields without a language suffix; examples: tags, option1_values, option2_values, option3_values | |
| 185 | - | |
| 186 | 171 | shared_fields: null |
| 187 | 172 | core_multilingual_fields: |
| 188 | 173 | - title |
| 189 | 174 | - qanchors |
| 190 | 175 | - category_name_text |
| 191 | - | |
| 192 | - # Text recall (main query + translated query) | |
| 193 | 176 | text_query_strategy: |
| 194 | 177 | base_minimum_should_match: 60% |
| 195 | 178 | translation_minimum_should_match: 60% |
| ... | ... | @@ -206,8 +189,6 @@ query_config: |
| 206 | 189 | phrase_match_boost: 3.0 |
| 207 | 190 | text_embedding_field: title_embedding |
| 208 | 191 | image_embedding_field: image_embedding.vector |
| 209 | - | |
| 210 | - # null returns all fields; [] returns no fields | |
| 211 | 192 | source_fields: |
| 212 | 193 | - spu_id |
| 213 | 194 | - handle |
| ... | ... | @@ -223,13 +204,8 @@ query_config: |
| 223 | 204 | - category1_name |
| 224 | 205 | - category2_name |
| 225 | 206 | - category3_name |
| 226 | - # - tags | |
| 227 | - # - keywords | |
| 228 | - # - qanchors | |
| 229 | - # - enriched_tags | |
| 230 | 207 | - enriched_attributes |
| 231 | 208 | - enriched_taxonomy_attributes |
| 232 | - | |
| 233 | 209 | - min_price |
| 234 | 210 | - compare_at_price |
| 235 | 211 | - image_url |
| ... | ... | @@ -245,17 +221,14 @@ query_config: |
| 245 | 221 | - option3_values |
| 246 | 222 | - specifications |
| 247 | 223 | - skus |
| 248 | - | |
| 249 | - # KNN: separate boost and recall settings (k / num_candidates) for the text vector and the multimodal (image) vector | |
| 250 | 224 | knn_text_boost: 4 |
| 251 | 225 | knn_image_boost: 4 |
| 252 | 226 | knn_text_k: 160 |
| 253 | - knn_text_num_candidates: 560 # k * 3.4 | |
| 227 | + knn_text_num_candidates: 560 | |
| 254 | 228 | knn_text_k_long: 400 |
| 255 | 229 | knn_text_num_candidates_long: 1200 |
| 256 | 230 | knn_image_k: 400 |
| 257 | 231 | knn_image_num_candidates: 1200 |
| 258 | - | |
| 259 | 232 | function_score: |
| 260 | 233 | score_mode: sum |
| 261 | 234 | boost_mode: multiply |
| ... | ... | @@ -269,20 +242,18 @@ coarse_rank: |
| 269 | 242 | es_exponent: 0.05 |
| 270 | 243 | text_bias: 0.1 |
| 271 | 244 | text_exponent: 0.35 |
| 272 | - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client) | |
| 273 | - # The ES score already discounts the translated query, so no further discount is applied here | |
| 274 | 245 | text_translation_weight: 1.0 |
| 275 | 246 | knn_text_weight: 1.0 |
| 276 | 247 | knn_image_weight: 2.0 |
| 277 | 248 | knn_tie_breaker: 0.3 |
| 278 | - knn_bias: 0.2 | |
| 279 | - knn_exponent: 5.6 | |
| 249 | + knn_bias: 0.6 | |
| 250 | + knn_exponent: 0.4 | |
| 280 | 251 | knn_text_bias: 0.2 |
| 281 | 252 | knn_text_exponent: 0.0 |
| 282 | 253 | knn_image_bias: 0.2 |
| 283 | 254 | knn_image_exponent: 0.0 |
| 284 | 255 | fine_rank: |
| 285 | - enabled: false # when false, results pass through with their order preserved | |
| 256 | + enabled: false | |
| 286 | 257 | input_window: 160 |
| 287 | 258 | output_window: 80 |
| 288 | 259 | timeout_sec: 10.0 |
| ... | ... | @@ -290,7 +261,7 @@ fine_rank: |
| 290 | 261 | rerank_doc_template: '{title}' |
| 291 | 262 | service_profile: fine |
| 292 | 263 | rerank: |
| 293 | - enabled: false # when false, results pass through with their order preserved | |
| 264 | + enabled: false | |
| 294 | 265 | rerank_window: 160 |
| 295 | 266 | exact_knn_rescore_enabled: true |
| 296 | 267 | exact_knn_rescore_window: 160 |
| ... | ... | @@ -300,10 +271,6 @@ rerank: |
| 300 | 271 | rerank_query_template: '{query}' |
| 301 | 272 | rerank_doc_template: '{title}' |
| 302 | 273 | service_profile: default |
| 303 | - # Multiplicative fusion: fused = Π (max(score,0) + bias) ** exponent (es / rerank / fine / text / knn) | |
| 304 | - # where knn_score first goes through one dis_max layer: | |
| 305 | - # max(knn_text_weight * text_knn, knn_image_weight * image_knn) | |
| 306 | - # + knn_tie_breaker * the weaker signal on the other side | |
| 307 | 274 | fusion: |
| 308 | 275 | es_bias: 10.0 |
| 309 | 276 | es_exponent: 0.05 |
| ... | ... | @@ -312,7 +279,6 @@ rerank: |
| 312 | 279 | fine_bias: 0.1 |
| 313 | 280 | fine_exponent: 1.0 |
| 314 | 281 | text_bias: 0.1 |
| 315 | - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client) | |
| 316 | 282 | text_exponent: 0.25 |
| 317 | 283 | text_translation_weight: 0.8 |
| 318 | 284 | knn_text_weight: 1.0 |
| ... | ... | @@ -320,7 +286,6 @@ rerank: |
| 320 | 286 | knn_tie_breaker: 0.3 |
| 321 | 287 | knn_bias: 0.0 |
| 322 | 288 | knn_exponent: 5.6 |
| 323 | - | |
| 324 | 289 | services: |
| 325 | 290 | translation: |
| 326 | 291 | service_url: http://127.0.0.1:6006 |
| ... | ... | @@ -330,9 +295,6 @@ services: |
| 330 | 295 | cache: |
| 331 | 296 | ttl_seconds: 62208000 |
| 332 | 297 | sliding_expiration: true |
| 333 | - # When false, cache keys are exact-match per request model only (ignores model_quality_tiers for lookups) | |
| 334 | - # Higher tier = better quality. Multiple models may share one tier (same level). | |
| 335 | - # A request may reuse Redis keys from models with tier > A or tier == A (not from lower tiers). | |
| 336 | 298 | enable_model_quality_tier_cache: true |
| 337 | 299 | model_quality_tiers: |
| 338 | 300 | deepl: 30 |
| ... | ... | @@ -443,10 +405,7 @@ services: |
| 443 | 405 | device: cuda |
| 444 | 406 | batch_size: 32 |
| 445 | 407 | normalize_embeddings: true |
| 446 | - # In-service image backend (read when the embedding process starts; cnclip gRPC and 6008 must use the same model_name) | |
| 447 | - # Chinese-CLIP: ViT-H-14 → 1024 dims, ViT-L-14 → 768 dims. Must match | |
| 448 | - # image_embedding.vector.dims in mappings/search_products.json (the current index uses 1024 → ViT-H-14 by default). | |
| 449 | - image_backend: clip_as_service # clip_as_service | local_cnclip | |
| 408 | + image_backend: clip_as_service | |
| 450 | 409 | image_backends: |
| 451 | 410 | clip_as_service: |
| 452 | 411 | server: grpc://127.0.0.1:51000 |
| ... | ... | @@ -472,7 +431,6 @@ services: |
| 472 | 431 | request: |
| 473 | 432 | max_docs: 1000 |
| 474 | 433 | normalize: true |
| 475 | - # Named instances: the same reranker code reads a different port / backend / runtime directory per instance name. | |
| 476 | 434 | default_instance: default |
| 477 | 435 | instances: |
| 478 | 436 | default: |
| ... | ... | @@ -515,31 +473,11 @@ services: |
| 515 | 473 | enforce_eager: false |
| 516 | 474 | infer_batch_size: 100 |
| 517 | 475 | sort_by_doc_length: true |
| 518 | - | |
| 519 | - # standard=_format_instruction__standard (fixed yes/no system prompt); compact=_format_instruction (instruction used as system, with Instruct repeated inside user) | |
| 520 | - instruction_format: standard # compact standard | |
| 521 | - # instruction: "Given a query, score the product for relevance" | |
| 522 | - # "rank products by given query" works slightly better than "Given a query, score the product for relevance" | |
| 523 | - # instruction: "rank products by given query, category match first" | |
| 524 | - # instruction: "Rank products by query relevance, prioritizing category match" | |
| 525 | - # instruction: "Rank products by query relevance, prioritizing category and style match" | |
| 526 | - # instruction: "Rank by query relevance, prioritize category & style" | |
| 527 | - # instruction: "Relevance ranking: category & style match first" | |
| 528 | - # instruction: "Score product relevance by query with category & style match prioritized" | |
| 529 | - # instruction: "Rank products by query with category & style match prioritized" | |
| 530 | - # instruction: "Given a fashion shopping query, retrieve relevant products that answer the query" | |
| 476 | + instruction_format: standard | |
| 531 | 477 | instruction: rank products by given query |
| 532 | - | |
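| 533 | - # vLLM LLM.score() (cross-encoder scoring). Separate high-performance environment .venv-reranker-score (pinned to vllm 0.18): ./scripts/setup_reranker_venv.sh qwen3_vllm_score | |
| 534 | - # Can share the same model_name / HF cache with qwen3_vllm; the venv is kept separate so vLLM can be upgraded without affecting the generate backend. | |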
| 535 | 478 | qwen3_vllm_score: |
| 536 | 479 | model_name: Qwen/Qwen3-Reranker-0.6B |
| 537 | - # The original official Hub weights require true; set to false when using converted seq-cls weights (e.g. tomaarsen/...-seq-cls) | |
| 538 | 480 | use_original_qwen3_hf_overrides: true |
| 539 | - # vllm_runner: "auto" | |
| 540 | - # vllm_convert: "auto" | |
| 541 | - # Optional: merged with the built-in overrides when use_original_qwen3_hf_overrides is true | |
| 542 | - # hf_overrides: {} | |
| 543 | 481 | engine: vllm |
| 544 | 482 | max_model_len: 172 |
| 545 | 483 | tensor_parallel_size: 1 |
| ... | ... | @@ -549,10 +487,7 @@ services: |
| 549 | 487 | enforce_eager: false |
| 550 | 488 | infer_batch_size: 80 |
| 551 | 489 | sort_by_doc_length: true |
| 552 | - # The default, standard, matches the official vLLM Qwen3 reranker prefix | |
| 553 | - instruction_format: standard # compact standard | |
| 554 | - # instruction: "Rank products by query with category & style match prioritized" | |
| 555 | - # instruction: "Given a shopping query, rank products by relevance" | |
| 490 | + instruction_format: standard | |
| 556 | 491 | instruction: Rank products by query with category & style match prioritized |
| 557 | 492 | qwen3_transformers: |
| 558 | 493 | model_name: Qwen/Qwen3-Reranker-0.6B |
| ... | ... | @@ -620,25 +555,19 @@ services: |
| 620 | 555 | endpoint: https://dashscope.aliyuncs.com/compatible-api/v1/reranks |
| 621 | 556 | api_key_env: RERANK_DASHSCOPE_API_KEY_CN |
| 622 | 557 | timeout_sec: 10.0 |
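| 623 | - top_n_cap: 0 # 0 means top_n = number of documents in the current request | |
| 624 | - batchsize: 64 # 0 disables; >0 enables concurrent small-batch scheduling (top_n/top_n_cap still apply, with global truncation after batching) | |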
| 558 | + top_n_cap: 0 | |
| 559 | + batchsize: 64 | |
| 625 | 560 | instruct: Given a shopping query, rank product titles by relevance |
| 626 | 561 | max_retries: 2 |
| 627 | 562 | retry_backoff_sec: 0.2 |
| 628 | - | |
| 629 | 563 | spu_config: |
| 630 | 564 | enabled: true |
| 631 | 565 | spu_field: spu_id |
| 632 | 566 | inner_hits_size: 10 |
| 633 | - # Which option dimensions take part in retrieval (indexing and online search) | |
| 634 | - # Format is a list; choose one or more of option1/option2/option3 | |
| 635 | 567 | searchable_option_dimensions: |
| 636 | 568 | - option1 |
| 637 | 569 | - option2 |
| 638 | 570 | - option3 |
| 639 | - | |
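| 640 | -# Each tenant can configure a primary language (primary_language) and index languages (index_languages: main market languages, selectable by the merchant) | |
| 641 | -# Default index_languages: [en, zh]; may be set to any subset of SOURCE_LANG_CODE_MAP.keys() | |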
| 642 | 571 | tenant_config: |
| 643 | 572 | default: |
| 644 | 573 | primary_language: en | ... | ... |
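The fusion comment removed from `config.yaml` in this commit describes a multiplicative fusion, `fused = Π (max(score,0) + bias) ** exponent`, with the KNN score first combined by a dis_max step. A minimal sketch of that formula follows. It is an illustration only: the function names `knn_dismax` and `fuse` are hypothetical, the input scores are made up, and treating the "weaker side" as the weighted weaker signal is an assumption; the actual implementation in `search/rerank_client` may differ.

```python
from typing import Mapping, Sequence


def knn_dismax(text_knn: float, image_knn: float, text_weight: float,
               image_weight: float, tie_breaker: float) -> float:
    """dis_max over two weighted KNN signals: stronger + tie_breaker * weaker."""
    weighted_text = text_weight * text_knn
    weighted_image = image_weight * image_knn
    stronger = max(weighted_text, weighted_image)
    # Assumption: the weaker side is also taken after weighting.
    weaker = min(weighted_text, weighted_image)
    return stronger + tie_breaker * weaker


def fuse(scores: Mapping[str, float], params: Mapping[str, float],
         factors: Sequence[str] = ("es", "text", "knn")) -> float:
    """Product over factors of (max(score, 0) + bias) ** exponent."""
    fused = 1.0
    for name in factors:
        base = max(scores.get(name, 0.0), 0.0) + params[f"{name}_bias"]
        fused *= base ** params[f"{name}_exponent"]
    return fused


if __name__ == "__main__":
    # coarse_rank values as they appear in the config.yaml diff; scores are invented.
    params = {
        "es_bias": 10.0, "es_exponent": 0.05,
        "text_bias": 0.1, "text_exponent": 0.35,
        "knn_bias": 0.6, "knn_exponent": 0.4,
    }
    knn = knn_dismax(text_knn=0.5, image_knn=0.4,
                     text_weight=1.0, image_weight=2.0, tie_breaker=0.3)
    print(fuse({"es": 12.0, "text": 0.7, "knn": knn}, params))
```

Under this formula, an exponent of `0.0` turns a factor into a constant 1, so it stops influencing the ranking.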
docs/issues/issue-2026-04-16-bayes寻参-clothing_top771数据集上寻参.md
0 → 100644
| ... | ... | @@ -0,0 +1,89 @@ |
| 1 | +Prompt - 1 | |
| 2 | + | |
| 3 | +II. Parameter tuning on the large annotation set | |
| 4 | + | |
| 5 | +I previously ran one round of tuning based on 54 evaluation samples (queries.txt); the best parameter set found during that run was: | |
| 6 | +0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker': | |
| 7 | + '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'} | |
| 8 | + | |
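| 9 | +This parameter set is rather extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily) and the image exponent is too large. It is nevertheless the best on that dataset, so I suspect overfitting. The dataset therefore needs to grow: first expand the annotation set, then continue tuning on the expanded set. | |
| 10 | + | |
| 11 | +So I created a new annotation set, and the annotation task is now complete: Clothing Filtered 771. Please launch the tuning job and get it running; once the program finishes we should have tuning results, and next time you can analyse them and draw conclusions. | |
| 12 | + | |
| 13 | +For the tuning approach, refer to the previous round of tuning: | |
| 14 | +My tuning request at the time: | |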
| 15 | + | |
| 16 | +Please tune the coarse_rank fusion formula: | |
| 17 | + The current baseline is this set, Primary_Metric_Score: 0.637642: | |
| 18 | + coarse_rank: | |
| 19 | + ... | |
| 20 | + fusion: | |
| 21 | + es_bias: 10.0 | |
| 22 | + es_exponent: 0.05 | |
| 23 | + text_bias: 0.1 | |
| 24 | + text_exponent: 0.35 | |
| 25 | + text_translation_weight: 1.0 | |
| 26 | + knn_text_weight: 1.0 | |
| 27 | + knn_image_weight: 2.0 | |
| 28 | + knn_tie_breaker: 0.3 | |
| 29 | + knn_bias: 0.2 | |
| 30 | + knn_exponent: 5.6 | |
| 31 | + knn_text_bias: 0.2 | |
| 32 | + knn_text_exponent: 0.0 | |
| 33 | + knn_image_bias: 0.2 | |
| 34 | + knn_image_exponent: 0.0 | |
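| 35 | + The evaluation metrics are in /data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md | |
| 36 | + Using this as the baseline, think broadly and tune over a wider range. Each restart plus evaluation takes several minutes, so please write a tuning framework: on top of it, define multiple parameter sets, write the scripts, and after each round collect the results and adjust the parameter distribution automatically (a Cartesian product over many parameters is too expensive, so consider Bayesian optimization or similar methods | |
| 37 | + that converge automatically through scripted multi-round iteration) | |
| 38 | + The backend must be restarted after every parameter change (eval-web sometimes also seems to go down after a backend restart; if so, track down the cause and fix it) | |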
| 39 | + ./restart.sh backend | |
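| 40 | + Note: please debug the scripts and run one round of analysis. The end result should be a reusable set of tuning scripts that I can rerun next time (still for this parameter set), that iterate automatically (adjusting the parameter distribution), collect the metrics for each parameter set, and converge on the best combination. | |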
| 41 | + | |
| 42 | + | |
| 43 | + | |
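| 44 | +The tuning scripts you provided at the time (repeated iterations of "seed experiments + random exploration + adaptive sampling based on the distribution of the best historical results". The search range has been widened, with several new seeds far from the baseline and jitter around the baseline retained, which suits slow convergence over long runs.): | |
| 45 | + | |
| 46 | + The long-running automatic tuning scripts are set up. The core files are: | |
| 47 | + | |
| 48 | + - Main tuning program: scripts/evaluation/tune_fusion.py | |
| 49 | + - Wide-range search space: scripts/evaluation/tuning/coarse_rank_fusion_space.yaml | |
| 50 | + - Background long-run launch script: scripts/evaluation/start_coarse_fusion_tuning_long.sh | |
| 51 | + - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh | |
| 52 | + - Usage guide: scripts/evaluation/tuning/README.md | |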
| 53 | + | |
| 54 | + Launch one round directly: | |
| 55 | + | |
| 56 | + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh | |
| 57 | + | |
| 58 | + To run for longer: | |
| 59 | + | |
| 60 | + MAX_EVALS=48 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 \ | |
| 61 | + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh | |
| 62 | + | |
| 63 | + View the log: | |
| 64 | + | |
| 65 | + tail -f artifacts/search_evaluation/tuning_launches/<run_name>.log | |
| 66 | + | |
| 67 | + View the results directory: | |
| 68 | + | |
| 69 | + ls artifacts/search_evaluation/tuning_runs/<run_name>/ | |
| 70 | + | |
| 71 | + Resume: | |
| 72 | + | |
| 73 | + ./scripts/evaluation/resume_coarse_fusion_tuning_long.sh <run_name> | |
| 74 | + | |
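| 75 | + Each round automatically does the following: | |
| 76 | + | |
| 77 | + - Generate a batch of candidate coarse_rank.fusion parameters | |
| 78 | + - Write them to the config and restart the backend | |
| 79 | + - Check eval-web and bring it back up if necessary | |
| 80 | + - Run the evaluation and collect Primary_Metric_Score | |
| 81 | + - Update trials.jsonl, leaderboard.csv, and summary.md | |
| 82 | + - Adjust the next round's sampling distribution based on past results | |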
| 83 | + | |
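| 84 | +The above is only a reference from the earlier tuning job. Now please tune using the new dataset. | |
| 85 | +Note that this dataset is fairly large, so each tuning round takes a long time, and a wide, fine-grained search is not practical. Consider analysing the previous tuning results carefully and running a fine-grained search on top of them; if those results are not sufficient, run a coarse search on the small dataset first and then tune on the large dataset. | |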
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | +Response - 1 | ... | ... |
docs/issues/issue-2026-04-16-数据集扩增&bayes寻参-TODO.md
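| ... | ... | @@ -377,10 +377,9 @@ CLI / launch script design |
| 377 | 377 | |
| 378 | 378 | This parameter set is rather extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily) and the image exponent is too large. It is nevertheless the best on that dataset, so I suspect overfitting. The dataset therefore needs to grow: first expand the annotation set, then continue tuning on the expanded set. |
| 379 | 379 | |
| 380 | -I have already created a new annotation set. Please launch the tuning job and get it running; once the program finishes we should have tuning results, and next time you can analyse them and draw conclusions. | |
| 380 | +So I created a new annotation set, and the annotation task is now complete: Clothing Filtered 771. Please launch the tuning job and get it running; once the program finishes we should have tuning results, and next time you can analyse them and draw conclusions. | |
| 381 | 381 | |
| 382 | - | |
| 383 | -The previous round of tuning: | |
| 382 | +For the tuning approach, refer to the previous round of tuning: | |
| 384 | 383 | My tuning request at the time: |
| 385 | 384 | |
| 386 | 385 | Please tune the coarse_rank fusion formula: |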
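| ... | ... | @@ -411,7 +410,7 @@ CLI / launch script design |
| 411 | 410 | |
| 412 | 411 | |
| 413 | 412 | |
| 414 | -The tuning scripts you provided: | |
| 413 | +The tuning scripts you provided at the time (repeated iterations of "seed experiments + random exploration + adaptive sampling based on the distribution of the best historical results". The search range has been widened, with several new seeds far from the baseline and jitter around the baseline retained, which suits slow convergence over long runs.): | |
| 415 | 414 | |
| 416 | 415 | The long-running automatic tuning scripts are set up. The core files are: |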
| 417 | 416 | |
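| ... | ... | @@ -421,8 +420,6 @@ CLI / launch script design |
| 421 | 420 | - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh |
| 422 | 421 | - Usage guide: scripts/evaluation/tuning/README.md |
| 423 | 422 | |
| 424 | - This approach is not an exhaustive Cartesian product; it repeatedly iterates "seed experiments + random exploration + adaptive sampling based on the distribution of the best historical results". The search range has been widened, with several new seeds far from the baseline and jitter around the baseline retained, which suits slow convergence over long runs. | |
| 425 | - | |
| 426 | 423 | Launch one round directly: |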
| 427 | 424 | |
| 428 | 425 | ./scripts/evaluation/start_coarse_fusion_tuning_long.sh |
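| ... | ... | @@ -453,56 +450,5 @@ CLI / launch script design |
| 453 | 450 | - Update trials.jsonl, leaderboard.csv, and summary.md |
| 454 | 451 | - Adjust the next round's sampling distribution based on past results |
| 455 | 452 | |
| 456 | - Run one long job first. When it finishes, send me the run_name or the corresponding experiment directory from scripts/evaluation/tuning/README.md, and I will use the results for the next analysis step and to narrow the search space. | |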
| 457 | - | |
| 458 | - | |
| 459 | - | |
| 460 | - | |
| 461 | -› I ran: MAX_EVALS=400 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh | |
| 462 | - | |
| 463 | - run_name=coarse_fusion_long_001 | |
| 464 | - pid=2218620 | |
| 465 | - log=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log | |
| 466 | - pid_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.pid | |
| 467 | - cmd_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.cmd | |
| 468 | - run_dir=artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001 | |
| 469 | - | |
| 470 | - tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log | |
| 471 | - cat artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001/leaderboard.csv | |
| 472 | - | |
| 473 | - | |
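| 474 | -• I will first look at the log of this long run and the current leaderboard to confirm whether the job progressed normally, whether it was interrupted abnormally, and in which direction the current best parameters lie. | |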
| 475 | - | |
| 476 | -• Explored | |
| 477 | - └ Read coarse_fusion_long_001.log, leaderboard.csv | |
| 478 | - List coarse_fusion_long_001 | |
| 479 | - Read summary.md | |
| 480 | - | |
| 481 | -─── | |
| 482 | - | |
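| 483 | -• The results are already enough to show a clear trend. I will add an aggregate analysis to see which parameter directions recur in the top range, rather than focusing on a single best point. | |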
| 484 | - | |
| 485 | -• Ran ps -p 2218620 -o pid=,etimes=,cmd= | |
| 486 | - └ (no output) | |
| 487 | - | |
| 488 | -• Ran python - <<'PY' | |
| 489 | - │ import csv, statistics, math | |
| 490 | - │ from pathlib import Path | |
| 491 | - │ … +15 lines | |
| 492 | - └ rows 385 | |
| 493 | - | |
| 494 | - … +61 lines (ctrl + t to view transcript) | |
| 495 | - knn_image_bias: mean=0.5631 median=0.2000 min=0.0010 max=4.0000 | |
| 496 | - knn_image_exponent: mean=4.6316 median=5.8680 min=0.0000 max=6.0000 | |
| 497 | - | |
| 498 | -• Ran python - <<'PY' | |
| 499 | - │ import csv | |
| 500 | - │ from pathlib import Path | |
| 501 | - │ … +19 lines | |
| 502 | - └ baseline_score 0.637642 | |
| 503 | - | |
| 504 | - … +30 lines (ctrl + t to view transcript) | |
| 505 | - 10 bo_234 0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker': | |
| 506 | - '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'} | |
| 507 | - | |
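| 508 | -This run was terminated by an external cause (disk full); the above is the best parameter set. | |
| 453 | +The above is only a reference from the earlier tuning job. Now please tune using the new dataset. | |
| 454 | +Note that this dataset is fairly large, so each tuning round takes a long time, and a wide, fine-grained search is not practical. Consider analysing the previous tuning results carefully and running a fine-grained search on top of them; if those results are not sufficient, run a coarse search on the small dataset first and then tune on the large dataset. | |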
| 509 | 455 | \ No newline at end of file | ... | ... |
scripts/evaluation/resume_coarse_fusion_tuning_long.sh
| ... | ... | @@ -26,10 +26,28 @@ if [ ! -d "${RUN_DIR}" ]; then |
| 26 | 26 | exit 1 |
| 27 | 27 | fi |
| 28 | 28 | |
| 29 | -MAX_EVALS="${MAX_EVALS:-36}" | |
| 30 | -BATCH_SIZE="${BATCH_SIZE:-3}" | |
| 31 | -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}" | |
| 32 | 29 | DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}" |
| 30 | +case "${DATASET_ID}" in | |
| 31 | + clothing_top771) | |
| 32 | + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md" | |
| 33 | + DEFAULT_MAX_EVALS="18" | |
| 34 | + DEFAULT_BATCH_SIZE="2" | |
| 35 | + DEFAULT_CANDIDATE_POOL_SIZE="160" | |
| 36 | + ;; | |
| 37 | + *) | |
| 38 | + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md" | |
| 39 | + DEFAULT_MAX_EVALS="36" | |
| 40 | + DEFAULT_BATCH_SIZE="3" | |
| 41 | + DEFAULT_CANDIDATE_POOL_SIZE="512" | |
| 42 | + ;; | |
| 43 | +esac | |
| 44 | + | |
| 45 | +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}" | |
| 46 | +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}" | |
| 47 | +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}" | |
| 48 | +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}" | |
| 49 | +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}" | |
| 50 | +RANDOM_SEED="${RANDOM_SEED:-20260422}" | |
| 33 | 51 | |
| 34 | 52 | LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" |
| 35 | 53 | mkdir -p "${LAUNCH_DIR}" |
| ... | ... | @@ -38,28 +56,25 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.pid" |
| 38 | 56 | CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.cmd" |
| 39 | 57 | |
| 40 | 58 | CMD=( |
| 41 | - python | |
| 42 | - scripts/evaluation/tune_fusion.py | |
| 43 | - --mode optimize | |
| 44 | - --resume-run "${RUN_DIR}" | |
| 45 | - --search-space "${RUN_DIR}/search_space.yaml" | |
| 46 | - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md | |
| 47 | - --tenant-id 163 | |
| 48 | - --dataset-id "${DATASET_ID}" | |
| 49 | - --queries-file scripts/evaluation/queries/queries.txt | |
| 50 | - --top-k 100 | |
| 51 | - --language en | |
| 52 | - --search-base-url http://127.0.0.1:6002 | |
| 53 | - --eval-web-base-url http://127.0.0.1:6010 | |
| 54 | - --max-evals "${MAX_EVALS}" | |
| 55 | - --batch-size "${BATCH_SIZE}" | |
| 56 | - --candidate-pool-size "${CANDIDATE_POOL_SIZE}" | |
| 59 | + bash | |
| 60 | + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh | |
| 61 | + "${RUN_NAME}" | |
| 62 | + "${DATASET_ID}" | |
| 63 | + "${MAX_EVALS}" | |
| 64 | + "${BATCH_SIZE}" | |
| 65 | + "${CANDIDATE_POOL_SIZE}" | |
| 66 | + "${RANDOM_SEED}" | |
| 67 | + "${RUN_DIR}/search_space.yaml" | |
| 68 | + "${SEED_REPORT}" | |
| 69 | + "${RUN_DIR}" | |
| 57 | 70 | ) |
| 58 | 71 | |
| 59 | 72 | if [ "$#" -gt 0 ]; then |
| 60 | 73 | CMD+=("$@") |
| 61 | 74 | fi |
| 62 | 75 | |
| 76 | +export BATCH_EVAL_TIMEOUT_SEC | |
| 77 | + | |
| 63 | 78 | printf '%q ' "${CMD[@]}" > "${CMD_PATH}" |
| 64 | 79 | printf '\n' >> "${CMD_PATH}" |
| 65 | 80 | ... | ... |
scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
0 → 100755
| ... | ... | @@ -0,0 +1,117 @@ |
| 1 | +#!/bin/bash | |
| 2 | + | |
| 3 | +set -euo pipefail | |
| 4 | + | |
| 5 | +cd "$(dirname "$0")/../.." | |
| 6 | +source ./activate.sh | |
| 7 | + | |
| 8 | +usage() { | |
| 9 | + echo "usage: $0 <run_name> <dataset_id> <max_evals> <batch_size> <candidate_pool_size> <random_seed> <search_space> <seed_report> [resume_run_dir]" >&2 | |
| 10 | + exit 1 | |
| 11 | +} | |
| 12 | + | |
| 13 | +if [ "$#" -lt 8 ]; then | |
| 14 | + usage | |
| 15 | +fi | |
| 16 | + | |
| 17 | +RUN_NAME="$1" | |
| 18 | +DATASET_ID="$2" | |
| 19 | +MAX_EVALS="$3" | |
| 20 | +BATCH_SIZE="$4" | |
| 21 | +CANDIDATE_POOL_SIZE="$5" | |
| 22 | +RANDOM_SEED="$6" | |
| 23 | +SEARCH_SPACE="$7" | |
| 24 | +SEED_REPORT="$8" | |
| 25 | +RESUME_RUN_DIR="${9:-}" | |
| 26 | + | |
| 27 | +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}" | |
| 28 | +RESTART_SLEEP_SEC="${RESTART_SLEEP_SEC:-30}" | |
| 29 | +SEARCH_BASE_URL="${SEARCH_BASE_URL:-http://127.0.0.1:6002}" | |
| 30 | +EVAL_WEB_BASE_URL="${EVAL_WEB_BASE_URL:-http://127.0.0.1:6010}" | |
| 31 | +RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}" | |
| 32 | + | |
| 33 | +mkdir -p "$(dirname "$RUN_DIR")" | |
| 34 | + | |
| 35 | +count_live_successes() { | |
| 36 | + python3 - "$RUN_DIR" <<'PY' | |
| 37 | +import json | |
| 38 | +import sys | |
| 39 | +from pathlib import Path | |
| 40 | + | |
| 41 | +run_dir = Path(sys.argv[1]) | |
| 42 | +path = run_dir / "trials.jsonl" | |
| 43 | +count = 0 | |
| 44 | +if path.is_file(): | |
| 45 | + for line in path.read_text(encoding="utf-8").splitlines(): | |
| 46 | + line = line.strip() | |
| 47 | + if not line: | |
| 48 | + continue | |
| 49 | + obj = json.loads(line) | |
| 50 | + if obj.get("status") == "ok" and not obj.get("is_seed"): | |
| 51 | + count += 1 | |
| 52 | +print(count) | |
| 53 | +PY | |
| 54 | +} | |
| 55 | + | |
| 56 | +build_cmd() { | |
| 57 | + local cmd=( | |
| 58 | + python | |
| 59 | + scripts/evaluation/tune_fusion.py | |
| 60 | + --mode optimize | |
| 61 | + --search-space "$SEARCH_SPACE" | |
| 62 | + --seed-report "$SEED_REPORT" | |
| 63 | + --tenant-id 163 | |
| 64 | + --dataset-id "$DATASET_ID" | |
| 65 | + --queries-file scripts/evaluation/queries/queries.txt | |
| 66 | + --top-k 100 | |
| 67 | + --language en | |
| 68 | + --search-base-url "$SEARCH_BASE_URL" | |
| 69 | + --eval-web-base-url "$EVAL_WEB_BASE_URL" | |
| 70 | + --max-evals "$MAX_EVALS" | |
| 71 | + --batch-size "$BATCH_SIZE" | |
| 72 | + --candidate-pool-size "$CANDIDATE_POOL_SIZE" | |
| 73 | + --random-seed "$RANDOM_SEED" | |
| 74 | + --batch-eval-timeout-sec "$BATCH_EVAL_TIMEOUT_SEC" | |
| 75 | + ) | |
| 76 | + if [ -n "$RESUME_RUN_DIR" ]; then | |
| 77 | + cmd+=(--resume-run "$RESUME_RUN_DIR") | |
| 78 | + else | |
| 79 | + cmd+=(--run-name "$RUN_NAME") | |
| 80 | + fi | |
| 81 | + printf '%q ' "${cmd[@]}" | |
| 82 | + printf '\n' | |
| 83 | +} | |
| 84 | + | |
| 85 | +attempt=0 | |
| 86 | +while true; do | |
| 87 | + live_successes="$(count_live_successes)" | |
| 88 | + if [ "$live_successes" -ge "$MAX_EVALS" ]; then | |
| 89 | + echo "[resilient] complete run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS" | |
| 90 | + exit 0 | |
| 91 | + fi | |
| 92 | + | |
| 93 | + attempt=$((attempt + 1)) | |
| 94 | + if [ -d "$RUN_DIR" ]; then | |
| 95 | + RESUME_RUN_DIR="$RUN_DIR" | |
| 96 | + fi | |
| 97 | + | |
| 98 | + echo "[resilient] attempt=$attempt run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS" | |
| 99 | + CMD_STR="$(build_cmd)" | |
| 100 | + echo "[resilient] cmd=$CMD_STR" | |
| 101 | + | |
| 102 | + set +e | |
| 103 | + bash -lc "$CMD_STR" | |
| 104 | + exit_code=$? | |
| 105 | + set -e | |
| 106 | + | |
| 107 | + live_successes="$(count_live_successes)" | |
| 108 | + echo "[resilient] exit_code=$exit_code live_successes=$live_successes" | |
| 109 | + | |
| 110 | + if [ "$live_successes" -ge "$MAX_EVALS" ]; then | |
| 111 | + echo "[resilient] finished after attempt=$attempt" | |
| 112 | + exit 0 | |
| 113 | + fi | |
| 114 | + | |
| 115 | + echo "[resilient] sleeping ${RESTART_SLEEP_SEC}s before resume" | |
| 116 | + sleep "$RESTART_SLEEP_SEC" | |
| 117 | +done | ... | ... |
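The wrapper decides whether to resume by counting the rows in `trials.jsonl` whose `status` is `ok` and which are not seed trials. The following is a minimal standalone sketch of that count. Unlike the heredoc in the script, it skips lines that fail to parse, on the assumption (not confirmed by this commit) that an interrupted run can leave a half-written final line. The function name mirrors the shell helper, and the sample records are invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def count_live_successes(trials_path: Path) -> int:
    """Count non-seed trials whose status is "ok" in a JSONL file."""
    if not trials_path.is_file():
        return 0
    count = 0
    for raw in trials_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: a truncated line comes from an interrupted write
            # and should not abort the whole count.
            continue
        if isinstance(obj, dict) and obj.get("status") == "ok" and not obj.get("is_seed"):
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "trials.jsonl"
        path.write_text(
            '{"status": "ok", "is_seed": true}\n'
            '{"status": "ok"}\n'
            '{"status": "failed"}\n'
            '{"status": "ok", "tru',  # simulated partial write
            encoding="utf-8",
        )
        print(count_live_successes(path))
```

Because the script runs under `set -e`, a parse error inside the heredoc would make the `$(count_live_successes)` assignment fail and end the wrapper, so tolerating a bad line may be worth considering if partial writes can occur.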
scripts/evaluation/start_coarse_fusion_tuning_long.sh
| ... | ... | @@ -5,12 +5,34 @@ set -euo pipefail |
| 5 | 5 | cd "$(dirname "$0")/../.." |
| 6 | 6 | source ./activate.sh |
| 7 | 7 | |
| 8 | -RUN_NAME="${RUN_NAME:-coarse_fusion_long_$(date -u +%Y%m%dT%H%M%SZ)}" | |
| 9 | -MAX_EVALS="${MAX_EVALS:-36}" | |
| 10 | -BATCH_SIZE="${BATCH_SIZE:-3}" | |
| 11 | -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}" | |
| 12 | -RANDOM_SEED="${RANDOM_SEED:-20260416}" | |
| 13 | 8 | DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}" |
| 9 | +case "${DATASET_ID}" in | |
| 10 | + clothing_top771) | |
| 11 | + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml" | |
| 12 | + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md" | |
| 13 | + DEFAULT_MAX_EVALS="18" | |
| 14 | + DEFAULT_BATCH_SIZE="2" | |
| 15 | + DEFAULT_CANDIDATE_POOL_SIZE="160" | |
| 16 | + DEFAULT_RANDOM_SEED="20260422" | |
| 17 | + ;; | |
| 18 | + *) | |
| 19 | + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space.yaml" | |
| 20 | + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md" | |
| 21 | + DEFAULT_MAX_EVALS="36" | |
| 22 | + DEFAULT_BATCH_SIZE="3" | |
| 23 | + DEFAULT_CANDIDATE_POOL_SIZE="512" | |
| 24 | + DEFAULT_RANDOM_SEED="20260416" | |
| 25 | + ;; | |
| 26 | +esac | |
| 27 | + | |
| 28 | +RUN_NAME="${RUN_NAME:-coarse_fusion_${DATASET_ID}_$(date -u +%Y%m%dT%H%M%SZ)}" | |
| 29 | +SEARCH_SPACE="${SEARCH_SPACE:-${DEFAULT_SEARCH_SPACE}}" | |
| 30 | +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}" | |
| 31 | +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}" | |
| 32 | +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}" | |
| 33 | +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}" | |
| 34 | +RANDOM_SEED="${RANDOM_SEED:-${DEFAULT_RANDOM_SEED}}" | |
| 35 | +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}" | |
| 14 | 36 | |
| 15 | 37 | LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" |
| 16 | 38 | mkdir -p "${LAUNCH_DIR}" |
| ... | ... | @@ -19,29 +41,24 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.pid" |
| 19 | 41 | CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.cmd" |
| 20 | 42 | |
| 21 | 43 | CMD=( |
| 22 | - python | |
| 23 | - scripts/evaluation/tune_fusion.py | |
| 24 | - --mode optimize | |
| 25 | - --run-name "${RUN_NAME}" | |
| 26 | - --search-space scripts/evaluation/tuning/coarse_rank_fusion_space.yaml | |
| 27 | - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md | |
| 28 | - --tenant-id 163 | |
| 29 | - --dataset-id "${DATASET_ID}" | |
| 30 | - --queries-file scripts/evaluation/queries/queries.txt | |
| 31 | - --top-k 100 | |
| 32 | - --language en | |
| 33 | - --search-base-url http://127.0.0.1:6002 | |
| 34 | - --eval-web-base-url http://127.0.0.1:6010 | |
| 35 | - --max-evals "${MAX_EVALS}" | |
| 36 | - --batch-size "${BATCH_SIZE}" | |
| 37 | - --candidate-pool-size "${CANDIDATE_POOL_SIZE}" | |
| 38 | - --random-seed "${RANDOM_SEED}" | |
| 44 | + bash | |
| 45 | + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh | |
| 46 | + "${RUN_NAME}" | |
| 47 | + "${DATASET_ID}" | |
| 48 | + "${MAX_EVALS}" | |
| 49 | + "${BATCH_SIZE}" | |
| 50 | + "${CANDIDATE_POOL_SIZE}" | |
| 51 | + "${RANDOM_SEED}" | |
| 52 | + "${SEARCH_SPACE}" | |
| 53 | + "${SEED_REPORT}" | |
| 39 | 54 | ) |
| 40 | 55 | |
| 41 | 56 | if [ "$#" -gt 0 ]; then |
| 42 | 57 | CMD+=("$@") |
| 43 | 58 | fi |
| 44 | 59 | |
| 60 | +export BATCH_EVAL_TIMEOUT_SEC | |
| 61 | + | |
| 45 | 62 | printf '%q ' "${CMD[@]}" > "${CMD_PATH}" |
| 46 | 63 | printf '\n' >> "${CMD_PATH}" |
| 47 | 64 | ... | ... |
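The start script now picks per-dataset defaults in a `case` block and lets environment variables override them through `${VAR:-default}`. A rough Python equivalent of that resolution order is sketched below. `resolve_settings` and the two dictionaries are hypothetical names, the values are copied from the diff above, and an empty variable falls back to the default just as `:-` does in bash.

```python
from typing import Dict, Mapping

# Per-dataset defaults, copied from the case block in the start script.
DATASET_DEFAULTS: Dict[str, Dict[str, str]] = {
    "clothing_top771": {
        "MAX_EVALS": "18",
        "BATCH_SIZE": "2",
        "CANDIDATE_POOL_SIZE": "160",
        "RANDOM_SEED": "20260422",
    },
}

# Defaults used for any other dataset (the `*)` branch).
FALLBACK_DEFAULTS: Dict[str, str] = {
    "MAX_EVALS": "36",
    "BATCH_SIZE": "3",
    "CANDIDATE_POOL_SIZE": "512",
    "RANDOM_SEED": "20260416",
}


def resolve_settings(dataset_id: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Return settings where a non-empty env value wins over the dataset default."""
    defaults = DATASET_DEFAULTS.get(dataset_id, FALLBACK_DEFAULTS)
    return {name: env.get(name) or default for name, default in defaults.items()}


if __name__ == "__main__":
    import os

    print(resolve_settings("clothing_top771", os.environ))
```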
scripts/evaluation/tune_fusion.py
| ... | ... | @@ -379,6 +379,7 @@ def run_batch_eval( |
| 379 | 379 | top_k: int, |
| 380 | 380 | language: str, |
| 381 | 381 | force_refresh_labels: bool, |
| 382 | + timeout_sec: int, | |
| 382 | 383 | ) -> Dict[str, Any]: |
| 383 | 384 | cmd = [ |
| 384 | 385 | str(PROJECT_ROOT / ".venv" / "bin" / "python"), |
| ... | ... | @@ -397,13 +398,14 @@ def run_batch_eval( |
| 397 | 398 | cmd.extend(["--queries-file", str(queries_file)]) |
| 398 | 399 | if force_refresh_labels: |
| 399 | 400 | cmd.append("--force-refresh-labels") |
| 401 | + timeout = timeout_sec if timeout_sec and timeout_sec > 0 else None | |
| 400 | 402 | completed = subprocess.run( |
| 401 | 403 | cmd, |
| 402 | 404 | cwd=PROJECT_ROOT, |
| 403 | 405 | check=True, |
| 404 | 406 | capture_output=True, |
| 405 | 407 | text=True, |
| 406 | - timeout=7200, | |
| 408 | + timeout=timeout, | |
| 407 | 409 | ) |
| 408 | 410 | output = (completed.stdout or "") + "\n" + (completed.stderr or "") |
| 409 | 411 | batch_ids = re.findall(r"batch_id=([A-Za-z0-9_]+)", output) |
| ... | ... | @@ -1221,6 +1223,7 @@ def run_optimize_mode(args: argparse.Namespace) -> None: |
| 1221 | 1223 | top_k=args.top_k, |
| 1222 | 1224 | language=args.language, |
| 1223 | 1225 | force_refresh_labels=force_refresh_labels, |
| 1226 | + timeout_sec=args.batch_eval_timeout_sec, | |
| 1224 | 1227 | ) |
| 1225 | 1228 | ensure_disk_headroom( |
| 1226 | 1229 | min_free_gb=args.min_free_gb, |
| ... | ... | @@ -1362,6 +1365,7 @@ def build_parser() -> argparse.ArgumentParser: |
| 1362 | 1365 | parser.add_argument("--resume-run", default=None) |
| 1363 | 1366 | parser.add_argument("--max-evals", type=int, default=12) |
| 1364 | 1367 | parser.add_argument("--batch-size", type=int, default=3) |
| 1368 | + parser.add_argument("--batch-eval-timeout-sec", type=int, default=0) | |
| 1365 | 1369 | parser.add_argument("--init-random", type=int, default=None) |
| 1366 | 1370 | parser.add_argument("--candidate-pool-size", type=int, default=None) |
| 1367 | 1371 | parser.add_argument("--random-seed", type=int, default=20260415) | ... | ... |
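The change to `run_batch_eval` maps any non-positive `--batch-eval-timeout-sec` to `timeout=None`, which `subprocess.run` treats as "wait indefinitely". The sketch below isolates that mapping. `normalize_timeout` and `run_with_optional_timeout` are hypothetical helper names, and the child command is a trivial stand-in for the real batch evaluation.

```python
import subprocess
import sys
from typing import List, Optional


def normalize_timeout(timeout_sec: Optional[int]) -> Optional[int]:
    """Map 0, negative values and None to None (no Python-level timeout)."""
    return timeout_sec if timeout_sec and timeout_sec > 0 else None


def run_with_optional_timeout(
    cmd: List[str], timeout_sec: Optional[int]
) -> "subprocess.CompletedProcess[str]":
    """Run cmd, enforcing a timeout only when timeout_sec is positive."""
    return subprocess.run(
        cmd,
        check=True,
        capture_output=True,
        text=True,
        timeout=normalize_timeout(timeout_sec),
    )


if __name__ == "__main__":
    done = run_with_optional_timeout([sys.executable, "-c", "print('ok')"], 0)
    print(done.stdout.strip())
```

With a positive value, `subprocess.run` raises `subprocess.TimeoutExpired` once the limit is exceeded, which is the truncation the previous fixed `timeout=7200` produced on long batches.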
scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml
0 → 100644
| ... | ... | @@ -0,0 +1,161 @@ |
| 1 | +target_path: coarse_rank.fusion | |
| 2 | + | |
| 3 | +baseline: | |
| 4 | + es_bias: 10.0 | |
| 5 | + es_exponent: 0.05 | |
| 6 | + text_bias: 0.1 | |
| 7 | + text_exponent: 0.35 | |
| 8 | + text_translation_weight: 1.0 | |
| 9 | + knn_text_weight: 1.0 | |
| 10 | + knn_image_weight: 2.0 | |
| 11 | + knn_tie_breaker: 0.3 | |
| 12 | + knn_bias: 0.2 | |
| 13 | + knn_exponent: 5.6 | |
| 14 | + knn_text_bias: 0.2 | |
| 15 | + knn_text_exponent: 0.0 | |
| 16 | + knn_image_bias: 0.2 | |
| 17 | + knn_image_exponent: 0.0 | |
| 18 | + | |
| 19 | +parameters: | |
| 20 | + es_bias: {min: 2.0, max: 20.0, scale: log, round: 4} | |
| 21 | + es_exponent: {min: 0.03, max: 0.28, scale: linear, round: 4} | |
| 22 | + text_bias: {min: 0.01, max: 4.0, scale: log, round: 4} | |
| 23 | + text_exponent: {min: 0.2, max: 1.6, scale: linear, round: 4} | |
| 24 | + text_translation_weight: {min: 0.7, max: 1.8, scale: linear, round: 4} | |
| 25 | + knn_text_weight: {min: 0.05, max: 1.8, scale: linear, round: 4} | |
| 26 | + knn_image_weight: {min: 1.2, max: 6.0, scale: linear, round: 4} | |
| 27 | + knn_tie_breaker: {min: 0.0, max: 0.4, scale: linear, round: 4} | |
| 28 | + knn_bias: {min: 0.001, max: 2.5, scale: log, round: 4} | |
| 29 | + knn_exponent: {min: 0.05, max: 12.0, scale: log, round: 4} | |
| 30 | + knn_text_bias: {min: 0.001, max: 4.0, scale: log, round: 4} | |
| 31 | + knn_text_exponent: {min: 0.0, max: 2.0, scale: linear, round: 4} | |
| 32 | + knn_image_bias: {min: 0.01, max: 1.5, scale: log, round: 4} | |
| 33 | + knn_image_exponent: {min: 0.0, max: 6.0, scale: linear, round: 4} | |
| 34 | + | |
| 35 | +seed_experiments: | |
| 36 | + - name: seed_low_knn_global | |
| 37 | + description: First check whether the low global knn exponent seen in 021002 still helps once the reranker is removed. | |
| 38 | + params: | |
| 39 | + knn_bias: 0.6 | |
| 40 | + knn_exponent: 0.4 | |
| 41 | + - name: seed_bigset_knn_soft | |
| 42 | + description: Starting from the low global knn exponent, further smooth the knn nonlinearity. | |
| 43 | + params: | |
| 44 | + text_exponent: 0.42 | |
| 45 | + text_translation_weight: 1.05 | |
| 46 | + knn_text_weight: 0.85 | |
| 47 | + knn_image_weight: 2.4 | |
| 48 | + knn_tie_breaker: 0.18 | |
| 49 | + knn_bias: 0.9 | |
| 50 | + knn_exponent: 0.18 | |
| 51 | + knn_image_exponent: 0.2 | |
| 52 | + - name: seed_bigset_knn_mid | |
| 53 | + description: Keep the smoothed knn but make the image path slightly stronger, to test whether the large set needs moderate nonlinearity. | |
| 54 | + params: | |
| 55 | + es_bias: 8.0 | |
| 56 | + es_exponent: 0.08 | |
| 57 | + text_bias: 0.15 | |
| 58 | + text_exponent: 0.5 | |
| 59 | + text_translation_weight: 1.15 | |
| 60 | + knn_text_weight: 0.65 | |
| 61 | + knn_image_weight: 3.1 | |
| 62 | + knn_tie_breaker: 0.12 | |
| 63 | + knn_bias: 0.45 | |
| 64 | + knn_exponent: 0.85 | |
| 65 | + knn_text_bias: 0.35 | |
| 66 | + knn_text_exponent: 0.2 | |
| 67 | + knn_image_bias: 0.22 | |
| 68 | + knn_image_exponent: 0.8 | |
| 69 | + - name: seed_bigset_text_stable | |
| 70 | + description: Increase lexical discrimination and see whether the large set prefers a more robust text ranking. | |
| 71 | + params: | |
| 72 | + es_bias: 7.0 | |
| 73 | + es_exponent: 0.12 | |
| 74 | + text_bias: 0.25 | |
| 75 | + text_exponent: 0.72 | |
| 76 | + text_translation_weight: 1.0 | |
| 77 | + knn_text_weight: 0.55 | |
| 78 | + knn_image_weight: 2.2 | |
| 79 | + knn_tie_breaker: 0.08 | |
| 80 | + knn_bias: 0.7 | |
| 81 | + knn_exponent: 0.35 | |
| 82 | + knn_text_bias: 0.5 | |
| 83 | + knn_text_exponent: 0.4 | |
| 84 | + knn_image_bias: 0.18 | |
| 85 | + knn_image_exponent: 0.35 | |
| 86 | + - name: seed_hybrid_transfer | |
| 87 | + description: Based mainly on the large-set baseline, gently absorbing the image/text strengthening pattern of the small set's historical winners. | |
| 88 | + params: | |
| 89 | + es_bias: 7.2 | |
| 90 | + es_exponent: 0.15 | |
| 91 | + text_bias: 0.6 | |
| 92 | + text_exponent: 0.82 | |
| 93 | + text_translation_weight: 1.28 | |
| 94 | + knn_text_weight: 0.45 | |
| 95 | + knn_image_weight: 4.0 | |
| 96 | + knn_tie_breaker: 0.08 | |
| 97 | + knn_bias: 0.2 | |
| 98 | + knn_exponent: 1.2 | |
| 99 | + knn_text_bias: 0.8 | |
| 100 | + knn_text_exponent: 0.45 | |
| 101 | + knn_image_bias: 0.3 | |
| 102 | + knn_image_exponent: 1.4 | |
| 103 | + - name: seed_legacy_bo234 | |
| 104 | + description: Directly test how the historical best from the 53-entry set transfers to the 771-entry set. | |
| 105 | + params: | |
| 106 | + es_bias: 7.214 | |
| 107 | + es_exponent: 0.2025 | |
| 108 | + text_bias: 4.0 | |
| 109 | + text_exponent: 1.584 | |
| 110 | + text_translation_weight: 1.4441 | |
| 111 | + knn_text_weight: 0.1 | |
| 112 | + knn_image_weight: 5.6232 | |
| 113 | + knn_tie_breaker: 0.021 | |
| 114 | + knn_bias: 0.0019 | |
| 115 | + knn_exponent: 11.8477 | |
| 116 | + knn_text_bias: 2.3125 | |
| 117 | + knn_text_exponent: 1.1547 | |
| 118 | + knn_image_bias: 0.9641 | |
| 119 | + knn_image_exponent: 5.8671 | |
| 120 | + - name: seed_legacy_bo340 | |
| 121 | + description: Check whether the small-set champion parameters still have value on the large set. | |
| 122 | + params: | |
| 123 | + es_bias: 5.887 | |
| 124 | + es_exponent: 0.2145 | |
| 125 | + text_bias: 4.0 | |
| 126 | + text_exponent: 1.6 | |
| 127 | + text_translation_weight: 1.4788 | |
| 128 | + knn_text_weight: 0.3693 | |
| 129 | + knn_image_weight: 5.7028 | |
| 130 | + knn_tie_breaker: 0.0174 | |
| 131 | + knn_bias: 0.0016 | |
| 132 | + knn_exponent: 12.0 | |
| 133 | + knn_text_bias: 2.6071 | |
| 134 | + knn_text_exponent: 1.0458 | |
| 135 | + knn_image_bias: 0.8282 | |
| 136 | + knn_image_exponent: 6.0 | |
| 137 | + - name: seed_image_guard | |
| 138 | + description: Limit the image weight but allow an exponent on the image sub-term, to find the balance point between recall and precision. | |
| 139 | + params: | |
| 140 | + es_bias: 9.0 | |
| 141 | + es_exponent: 0.09 | |
| 142 | + text_bias: 0.12 | |
| 143 | + text_exponent: 0.45 | |
| 144 | + text_translation_weight: 1.1 | |
| 145 | + knn_text_weight: 0.7 | |
| 146 | + knn_image_weight: 2.8 | |
| 147 | + knn_tie_breaker: 0.1 | |
| 148 | + knn_bias: 0.55 | |
| 149 | + knn_exponent: 0.55 | |
| 150 | + knn_text_bias: 0.25 | |
| 151 | + knn_text_exponent: 0.15 | |
| 152 | + knn_image_bias: 0.28 | |
| 153 | + knn_image_exponent: 1.0 | |
| 154 | + | |
| 155 | +optimizer: | |
| 156 | + init_random: 2 | |
| 157 | + candidate_pool_size: 160 | |
| 158 | + explore_probability: 0.12 | |
| 159 | + local_jitter_probability: 0.62 | |
| 160 | + elite_fraction: 0.25 | |
| 161 | + min_normalized_distance: 0.08 | ... | ... |
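Each entry under `parameters` gives a `min`, `max`, `scale` and `round`. How `tune_fusion.py` draws candidates from these ranges is not shown in this diff. The sketch below is one plausible reading, assuming `scale: log` means uniform sampling in log space and `round` is a number of decimal places. `sample_param` is a hypothetical helper, not a function from the repository.

```python
import math
import random
from typing import Dict, Union

Spec = Dict[str, Union[float, int, str]]


def sample_param(spec: Spec, rng: random.Random) -> float:
    """Draw one value from a {min, max, scale, round} spec."""
    lo, hi = float(spec["min"]), float(spec["max"])
    if spec.get("scale") == "log":
        if lo <= 0:
            raise ValueError("log scale requires min > 0")
        # Uniform in log space, so each decade is equally likely.
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    else:
        value = rng.uniform(lo, hi)
    value = round(value, int(spec.get("round", 4)))
    # Clamp in case exp() or rounding nudged the value past a bound.
    return min(max(value, lo), hi)


if __name__ == "__main__":
    rng = random.Random(20260422)
    knn_bias = {"min": 0.001, "max": 2.5, "scale": "log", "round": 4}
    es_exponent = {"min": 0.03, "max": 0.28, "scale": "linear", "round": 4}
    print(sample_param(knn_bias, rng), sample_param(es_exponent, rng))
```

Under this reading, a log scale suits ranges such as `knn_bias` (0.001 to 2.5) that span several orders of magnitude, where linear sampling would almost never propose values near the lower bound.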