Commit d3dd01d3687413795804d7c9164f10d8aadc585d

Authored by tangwang
1 parent 628ff04f

Automated parameter search:

- Allow the batch timeout to be disabled so a batch can run indefinitely:
  - [tune_fusion.py](/data/saas-search/scripts/evaluation/tune_fusion.py:400)
  - When `--batch-eval-timeout-sec <= 0`, `subprocess.run` is now called without a Python-level timeout
- Add a resilient wrapper that resumes the run automatically:
  - [run_coarse_fusion_tuning_resilient.sh](/data/saas-search/scripts/evaluation/run_coarse_fusion_tuning_resilient.sh)
  - It counts the completed live evals in `trials.jsonl` and keeps calling `resume-run` until the count reaches `max_evals`
  - After an abnormal exit it sleeps, then resumes from the existing `run_dir`
- Switch both the start and resume scripts to resilient mode:
  - [start_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/start_coarse_fusion_tuning_long.sh)
  - [resume_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/resume_coarse_fusion_tuning_long.sh)
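
The timeout rule can be sketched roughly as follows. This is a minimal illustration, not the code in `tune_fusion.py`: the helper names `effective_timeout` and `run_batch` are hypothetical, and the only behavior relied on is that `subprocess.run(..., timeout=None)` waits without a deadline.

```python
import subprocess
import sys
from typing import Optional, Sequence


def effective_timeout(timeout_sec: int) -> Optional[int]:
    """Map the CLI value to what subprocess.run expects.

    Hypothetical helper: a value <= 0 means "no Python-level timeout",
    which subprocess.run expresses as timeout=None.
    """
    return timeout_sec if timeout_sec > 0 else None


def run_batch(cmd: Sequence[str], timeout_sec: int) -> int:
    """Run one batch command and return its exit code.

    With timeout_sec <= 0 the call blocks until the child exits;
    with a positive value subprocess.TimeoutExpired may be raised.
    """
    completed = subprocess.run(
        list(cmd),
        timeout=effective_timeout(timeout_sec),
        check=False,
    )
    return completed.returncode


if __name__ == "__main__":
    # Trivial child process, used only to show the call shape.
    print(run_batch([sys.executable, "-c", "pass"], timeout_sec=0))
```

Keeping the mapping in one small function makes the "0 disables the timeout" convention easy to check on its own.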

**Current run**
- `run_name`: `coarse_fusion_clothing_top771_resilient_20260422T091650Z`
- `run_dir`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z)
- `launch log`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z.log](/data/saas-search/artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log)

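**Confirmed**
- The wrapper has started and entered `attempt=1`
- The value actually passed is `--batch-eval-timeout-sec 0`
- `tune_fusion.py` is running
- `build_annotation_set.py batch` is already running
- `eval.log` already shows evaluation progress for the first few queries of this round, so the run is not idling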

**Monitoring**
- `tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log`
- `tail -f logs/eval.log`
- `tail -f artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
- `cat artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
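
Progress can also be checked by counting finished live evals in `trials.jsonl`. The sketch below mirrors the counting rule used by the wrapper in this commit (`status == "ok"` and not `is_seed`); the function is a standalone illustration, and any other fields in a trial record are not assumed.

```python
import json
from pathlib import Path


def count_live_successes(run_dir: str) -> int:
    """Count trials that finished successfully and are not seed entries."""
    path = Path(run_dir) / "trials.jsonl"
    if not path.is_file():
        # No trials file yet: nothing has completed.
        return 0
    count = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        trial = json.loads(line)
        if trial.get("status") == "ok" and not trial.get("is_seed"):
            count += 1
    return count


if __name__ == "__main__":
    run_dir = (
        "artifacts/search_evaluation/tuning_runs/"
        "coarse_fusion_clothing_top771_resilient_20260422T091650Z"
    )
    print(count_live_successes(run_dir))
```

Comparing the printed number with `max_evals` shows how far the run is from completion.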

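**Key differences from the previous run**
- Previously, a single batch round was cut off by the Python-level timeout
- Now, a single round has no Python-level timeout, and the outer wrapper resumes automatically
- As a result, long runs, mid-run interruptions, and later resumes all continue in the same `run_dir`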
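
The resume loop can be sketched in Python as follows. The actual wrapper is a bash script; this is only an illustration of the control flow, with hypothetical callables (`count_done`, `run_attempt`, `sleep`) injected so the loop can be exercised without launching anything.

```python
import time
from typing import Callable


def run_until_complete(
    count_done: Callable[[], int],
    run_attempt: Callable[[int], int],
    max_evals: int,
    sleep_sec: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep launching attempts until enough live evals have completed.

    Returns the number of attempts that were made.
    """
    attempt = 0
    while count_done() < max_evals:
        attempt += 1
        # The exit code is informational only: progress is judged from
        # the recorded trials, so a crashed attempt is simply retried.
        exit_code = run_attempt(attempt)
        done = count_done()
        print(f"[resilient] attempt={attempt} exit_code={exit_code} done={done}")
        if done >= max_evals:
            break
        sleep(sleep_sec)
    return attempt


if __name__ == "__main__":
    state = {"done": 0}

    def fake_attempt(n: int) -> int:
        state["done"] += 2          # pretend each attempt completes two evals
        return 1 if n == 1 else 0   # pretend the first attempt crashed

    run_until_complete(lambda: state["done"], fake_attempt, max_evals=5, sleep_sec=0.0)
```

Judging completion from recorded trials rather than from the exit code is what lets an interrupted attempt continue in the same `run_dir`.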
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.cmd 0 → 100644
... ... @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023815Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.pid 0 → 100644
... ... @@ -0,0 +1 @@
  1 +3843738
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.cmd 0 → 100644
... ... @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023951Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.pid 0 → 100644
... ... @@ -0,0 +1 @@
  1 +3845416
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.cmd 0 → 100644
... ... @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T021002Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.pid 0 → 100644
... ... @@ -0,0 +1 @@
  1 +3842050
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.cmd 0 → 100644
... ... @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun2 --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.pid 0 → 100644
... ... @@ -0,0 +1 @@
  1 +3843512
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.cmd 0 → 100644
... ... @@ -0,0 +1 @@
  1 +bash scripts/evaluation/run_coarse_fusion_tuning_resilient.sh coarse_fusion_clothing_top771_resilient_dryrun clothing_top771 18 2 160 20260422 scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --help
... ...
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.pid 0 → 100644
... ... @@ -0,0 +1 @@
  1 +4126011
... ...
config/config-with-reranker.yaml
... ... @@ -260,6 +260,7 @@ function_score:
260 260 score_mode: sum
261 261 boost_mode: multiply
262 262 functions: []
  263 +
263 264 coarse_rank:
264 265 enabled: true
265 266 input_window: 480
... ... @@ -271,7 +272,7 @@ coarse_rank:
271 272 text_exponent: 0.35
272 273 # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
273 274 # The ES score already discounts the translated query, so no further discount is applied here
274   - text_translation_weight: 0.8
  275 + text_translation_weight: 1.0
275 276 knn_text_weight: 1.0
276 277 knn_image_weight: 2.0
277 278 knn_tie_breaker: 0.3
... ...
config/config.yaml
... ... @@ -100,11 +100,8 @@ es_settings:
100 100 number_of_shards: 1
101 101 number_of_replicas: 0
102 102 refresh_interval: 30s
103   -
104   -# Configure by field base name; .{lang} is appended at query time according to the actual search language
105 103 field_boosts:
106 104 title: 3.0
107   - # qanchors and enriched_tags also appear in enriched_attributes.value, so their effective weight is their own weight plus that of enriched_attributes.value
108 105 qanchors: 1.0
109 106 enriched_tags: 1.0
110 107 enriched_attributes.value: 1.5
... ... @@ -118,7 +115,6 @@ field_boosts:
118 115 brief: 1.0
119 116 description: 1.0
120 117 vendor: 1.0
121   -
122 118 query_config:
123 119 supported_languages:
124 120 - zh
... ... @@ -126,16 +122,12 @@ query_config:
126 122 default_language: en
127 123 enable_text_embedding: true
128 124 enable_query_rewrite: true
129   -
130   - zh_to_en_model: nllb-200-distilled-600m # nllb-200-distilled-600m deepl opus-mt-zh-en / opus-mt-en-zh
  125 + zh_to_en_model: nllb-200-distilled-600m
131 126 en_to_zh_model: nllb-200-distilled-600m
132 127 default_translation_model: nllb-200-distilled-600m
133   - # Translation quality matters more when the source language is not in index_languages, so it is configured separately
134 128 zh_to_en_model__source_not_in_index: deepl
135 129 en_to_zh_model__source_not_in_index: deepl
136 130 default_translation_model__source_not_in_index: deepl
137   -
138   - # Query parsing stage: translation and query embedding run concurrently and share one wait budget (ms)
139 131 translation_embedding_wait_budget_ms_source_in_index: 300
140 132 translation_embedding_wait_budget_ms_source_not_in_index: 400
141 133 style_intent:
... ... @@ -165,31 +157,22 @@ query_config:
165 157 enabled: true
166 158 dictionary_path: config/dictionaries/product_title_exclusion.tsv
167 159 search_fields:
168   - # Configure by field base name; .{lang} is appended at query time according to the actual search language
169 160 multilingual_fields:
170 161 - title
171 162 - keywords
172 163 - qanchors
173 164 - enriched_tags
174 165 - enriched_attributes.value
175   - # - enriched_taxonomy_attributes.value
176 166 - option1_values
177 167 - option2_values
178 168 - option3_values
179 169 - category_path
180 170 - category_name_text
181   - # - brief
182   - # - description
183   - # - vendor
184   - # shared_fields: fields without a language suffix; e.g. tags, option1_values, option2_values, option3_values
185   -
186 171 shared_fields: null
187 172 core_multilingual_fields:
188 173 - title
189 174 - qanchors
190 175 - category_name_text
191   -
192   - # Text recall (main query + translated query)
193 176 text_query_strategy:
194 177 base_minimum_should_match: 60%
195 178 translation_minimum_should_match: 60%
... ... @@ -206,8 +189,6 @@ query_config:
206 189 phrase_match_boost: 3.0
207 190 text_embedding_field: title_embedding
208 191 image_embedding_field: image_embedding.vector
209   -
210   - # null returns all fields; [] returns no fields
211 192 source_fields:
212 193 - spu_id
213 194 - handle
... ... @@ -223,13 +204,8 @@ query_config:
223 204 - category1_name
224 205 - category2_name
225 206 - category3_name
226   - # - tags
227   - # - keywords
228   - # - qanchors
229   - # - enriched_tags
230 207 - enriched_attributes
231 208 - enriched_taxonomy_attributes
232   -
233 209 - min_price
234 210 - compare_at_price
235 211 - image_url
... ... @@ -245,17 +221,14 @@ query_config:
245 221 - option3_values
246 222 - specifications
247 223 - skus
248   -
249   - # KNN: separate boost and recall settings (k / num_candidates) for the text vector and the multimodal (image) vector
250 224 knn_text_boost: 4
251 225 knn_image_boost: 4
252 226 knn_text_k: 160
253   - knn_text_num_candidates: 560 # k * 3.4
  227 + knn_text_num_candidates: 560
254 228 knn_text_k_long: 400
255 229 knn_text_num_candidates_long: 1200
256 230 knn_image_k: 400
257 231 knn_image_num_candidates: 1200
258   -
259 232 function_score:
260 233 score_mode: sum
261 234 boost_mode: multiply
... ... @@ -269,20 +242,18 @@ coarse_rank:
269 242 es_exponent: 0.05
270 243 text_bias: 0.1
271 244 text_exponent: 0.35
272   - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
273   - # The ES score already discounts the translated query, so no further discount is applied here
274 245 text_translation_weight: 1.0
275 246 knn_text_weight: 1.0
276 247 knn_image_weight: 2.0
277 248 knn_tie_breaker: 0.3
278   - knn_bias: 0.2
279   - knn_exponent: 5.6
  249 + knn_bias: 0.6
  250 + knn_exponent: 0.4
280 251 knn_text_bias: 0.2
281 252 knn_text_exponent: 0.0
282 253 knn_image_bias: 0.2
283 254 knn_image_exponent: 0.0
284 255 fine_rank:
285   - enabled: false # when false, results pass through in their original order
  256 + enabled: false
286 257 input_window: 160
287 258 output_window: 80
288 259 timeout_sec: 10.0
... ... @@ -290,7 +261,7 @@ fine_rank:
290 261 rerank_doc_template: '{title}'
291 262 service_profile: fine
292 263 rerank:
293   - enabled: false # when false, results pass through in their original order
  264 + enabled: false
294 265 rerank_window: 160
295 266 exact_knn_rescore_enabled: true
296 267 exact_knn_rescore_window: 160
... ... @@ -300,10 +271,6 @@ rerank:
300 271 rerank_query_template: '{query}'
301 272 rerank_doc_template: '{title}'
302 273 service_profile: default
303   - # Multiplicative fusion: fused = Π (max(score,0) + bias) ** exponent (es / rerank / fine / text / knn)
304   - # knn_score first goes through a dis_max layer:
305   - # max(knn_text_weight * text_knn, knn_image_weight * image_knn)
306   - # + knn_tie_breaker * the weaker signal on the other side
307 274 fusion:
308 275 es_bias: 10.0
309 276 es_exponent: 0.05
... ... @@ -312,7 +279,6 @@ rerank:
312 279 fine_bias: 0.1
313 280 fine_exponent: 1.0
314 281 text_bias: 0.1
315   - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
316 282 text_exponent: 0.25
317 283 text_translation_weight: 0.8
318 284 knn_text_weight: 1.0
... ... @@ -320,7 +286,6 @@ rerank:
320 286 knn_tie_breaker: 0.3
321 287 knn_bias: 0.0
322 288 knn_exponent: 5.6
323   -
324 289 services:
325 290 translation:
326 291 service_url: http://127.0.0.1:6006
... ... @@ -330,9 +295,6 @@ services:
330 295 cache:
331 296 ttl_seconds: 62208000
332 297 sliding_expiration: true
333   - # When false, cache keys are exact-match per request model only (ignores model_quality_tiers for lookups)
334   - # Higher tier = better quality. Multiple models may share one tier (same level).
335   - # A request may reuse Redis keys from models with tier > A or tier == A (not from lower tiers).
336 298 enable_model_quality_tier_cache: true
337 299 model_quality_tiers:
338 300 deepl: 30
... ... @@ -443,10 +405,7 @@ services:
443 405 device: cuda
444 406 batch_size: 32
445 407 normalize_embeddings: true
446   - # In-service image backend (read when the embedding process starts; cnclip gRPC and 6008 must use the same model_name)
447   - # Chinese-CLIP: ViT-H-14 → 1024 dims, ViT-L-14 → 768 dims. Must match
448   - # image_embedding.vector.dims in mappings/search_products.json (the current index uses 1024 → default ViT-H-14).
449   - image_backend: clip_as_service # clip_as_service | local_cnclip
  408 + image_backend: clip_as_service
450 409 image_backends:
451 410 clip_as_service:
452 411 server: grpc://127.0.0.1:51000
... ... @@ -472,7 +431,6 @@ services:
472 431 request:
473 432 max_docs: 1000
474 433 normalize: true
475   - # Named instances: the same reranker code reads a different port / backend / runtime directory per instance name.
476 434 default_instance: default
477 435 instances:
478 436 default:
... ... @@ -515,31 +473,11 @@ services:
515 473 enforce_eager: false
516 474 infer_batch_size: 100
517 475 sort_by_doc_length: true
518   -
519   - # standard=_format_instruction__standard (fixed yes/no system prompt); compact=_format_instruction (instruction used as system, with Instruct repeated in the user message)
520   - instruction_format: standard # compact standard
521   - # instruction: "Given a query, score the product for relevance"
522   - # "rank products by given query" works slightly better than "Given a query, score the product for relevance"
523   - # instruction: "rank products by given query, category match first"
524   - # instruction: "Rank products by query relevance, prioritizing category match"
525   - # instruction: "Rank products by query relevance, prioritizing category and style match"
526   - # instruction: "Rank by query relevance, prioritize category & style"
527   - # instruction: "Relevance ranking: category & style match first"
528   - # instruction: "Score product relevance by query with category & style match prioritized"
529   - # instruction: "Rank products by query with category & style match prioritized"
530   - # instruction: "Given a fashion shopping query, retrieve relevant products that answer the query"
  476 + instruction_format: standard
531 477 instruction: rank products by given query
532   -
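533   - # vLLM LLM.score() (cross-encoder scoring). Dedicated high-performance environment .venv-reranker-score (pinned to vllm 0.18): ./scripts/setup_reranker_venv.sh qwen3_vllm_score
534   - # Can share model_name / HF cache with qwen3_vllm; the venv is separate so vLLM can be upgraded without affecting the generate backend.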
535 478 qwen3_vllm_score:
536 479 model_name: Qwen/Qwen3-Reranker-0.6B
537   - # The original official Hub weights require true; set to false when using converted seq-cls weights (e.g. tomaarsen/...-seq-cls)
538 480 use_original_qwen3_hf_overrides: true
539   - # vllm_runner: "auto"
540   - # vllm_convert: "auto"
541   - # Optional: merged with the built-in overrides when use_original_qwen3_hf_overrides is true
542   - # hf_overrides: {}
543 481 engine: vllm
544 482 max_model_len: 172
545 483 tensor_parallel_size: 1
... ... @@ -549,10 +487,7 @@ services:
549 487 enforce_eager: false
550 488 infer_batch_size: 80
551 489 sort_by_doc_length: true
552   - # The default, standard, matches the official vLLM Qwen3 reranker prefix
553   - instruction_format: standard # compact standard
554   - # instruction: "Rank products by query with category & style match prioritized"
555   - # instruction: "Given a shopping query, rank products by relevance"
  490 + instruction_format: standard
556 491 instruction: Rank products by query with category & style match prioritized
557 492 qwen3_transformers:
558 493 model_name: Qwen/Qwen3-Reranker-0.6B
... ... @@ -620,25 +555,19 @@ services:
620 555 endpoint: https://dashscope.aliyuncs.com/compatible-api/v1/reranks
621 556 api_key_env: RERANK_DASHSCOPE_API_KEY_CN
622 557 timeout_sec: 10.0
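623   - top_n_cap: 0 # 0 means top_n = number of documents in the current request
624   - batchsize: 64 # 0 disables; >0 enables concurrent small-batch scheduling (top_n/top_n_cap still apply, with global truncation after batching)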
  558 + top_n_cap: 0
  559 + batchsize: 64
625 560 instruct: Given a shopping query, rank product titles by relevance
626 561 max_retries: 2
627 562 retry_backoff_sec: 0.2
628   -
629 563 spu_config:
630 564 enabled: true
631 565 spu_field: spu_id
632 566 inner_hits_size: 10
633   - # Which option dimensions take part in retrieval (indexing and online search)
634   - # A list of one or more of option1/option2/option3
635 567 searchable_option_dimensions:
636 568 - option1
637 569 - option2
638 570 - option3
639   -
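640   -# Each tenant can configure a primary language (primary_language) and index languages (index_languages: the main market languages, selectable by the merchant)
641   -# Default index_languages: [en, zh]; may be set to any subset of SOURCE_LANG_CODE_MAP.keys()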
642 571 tenant_config:
643 572 default:
644 573 primary_language: en
... ...
docs/issues/issue-2026-04-16-bayes寻参-clothing_top771数据集上寻参.md 0 → 100644
... ... @@ -0,0 +1,89 @@
  1 +Prompt - 1
  2 +
  3 +2. Parameter search on the large annotation set
  4 +
  5 +I previously ran one round of tuning on 54 evaluation samples (queries.txt); the best parameter set found in that round was:
  6 +0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker':
  7 + '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'}
  8 +
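  9 +This parameter set is fairly extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily) and the image exponent is too large. It is nonetheless the best on that dataset, and I suspect overfitting, so the dataset should be enlarged: first expand the annotation set, then continue the parameter search on the expanded set.
  10 +
  11 +So I created a new annotation set, and its labeling is already complete: Clothing Filtered 771. Please launch the parameter search and get it running; once it finishes the results should be available, and next time you can analyze them and draw conclusions.
  12 +
  13 +For the tuning method, please refer to the previous round of tuning:
  14 +My tuning request at the time:
  15 +
  16 +Please tune the coarse_rank fusion formula:
  17 + The current baseline is this set, Primary_Metric_Score: 0.637642: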
  18 + coarse_rank:
  19 + ...
  20 + fusion:
  21 + es_bias: 10.0
  22 + es_exponent: 0.05
  23 + text_bias: 0.1
  24 + text_exponent: 0.35
  25 + text_translation_weight: 1.0
  26 + knn_text_weight: 1.0
  27 + knn_image_weight: 2.0
  28 + knn_tie_breaker: 0.3
  29 + knn_bias: 0.2
  30 + knn_exponent: 5.6
  31 + knn_text_bias: 0.2
  32 + knn_text_exponent: 0.0
  33 + knn_image_bias: 0.2
  34 + knn_image_exponent: 0.0
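  35 + The evaluation metrics are in /data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md
  36 + Using this as the baseline, think broadly and tune over a wider range. Each restart plus evaluation takes several minutes, so please write a tuning framework: on top of it, set up multiple parameter groups and scripts, and after each round collect the results and automatically adjust the parameter distribution (a Cartesian product over many parameters is too expensive, so consider Bayesian optimization or similar methods that
  37 + converge automatically through scripted multi-round iteration)
  38 + The backend must be restarted after every parameter change (sometimes eval-web also seems to go down after a backend restart; if that happens, track down the cause and fix it)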
  39 + ./restart.sh backend
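  40 + Note: please debug the scripts and run one round of analysis. The end result should be a reusable set of tuning scripts that I can rerun next time (still for this parameter group), which iterates automatically (adjusting the parameter distribution), collects the metrics for each parameter set, and tunes toward the best combination.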
  41 +
  42 +
  43 +
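  44 +The tuning scripts you provided at the time (repeated iterations of "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened, several seeds far from the baseline were added, and jitter around the baseline is kept, which suits slow convergence over a long run.):
  45 +
  46 + The long-running automated tuning scripts are set up. The core files are:
  47 +
  48 + - Main tuning program: scripts/evaluation/tune_fusion.py
  49 + - Wide-range search space: scripts/evaluation/tuning/coarse_rank_fusion_space.yaml
  50 + - Background long-run launch script: scripts/evaluation/start_coarse_fusion_tuning_long.sh
  51 + - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh
  52 + - Usage notes: scripts/evaluation/tuning/README.md
  53 +
  54 + Start a round directly: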
  55 +
  56 + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
  57 +
  58 + To run longer:
  59 +
  60 + MAX_EVALS=48 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 \
  61 + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
  62 +
  63 + View the log:
  64 +
  65 + tail -f artifacts/search_evaluation/tuning_launches/<run_name>.log
  66 +
  67 + View the results directory:
  68 +
  69 + ls artifacts/search_evaluation/tuning_runs/<run_name>/
  70 +
  71 + Resume:
  72 +
  73 + ./scripts/evaluation/resume_coarse_fusion_tuning_long.sh <run_name>
  74 +
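  75 + Each round automatically does the following:
  76 +
  77 + - Generate a batch of candidate coarse_rank.fusion parameters
  78 + - Write them to the config and restart the backend
  79 + - Check eval-web and bring it back up if necessary
  80 + - Run the evaluation and collect Primary_Metric_Score
  81 + - Update trials.jsonl, leaderboard.csv, and summary.md
  82 + - Adjust the next round's sampling distribution based on past results
  83 +
  84 +The above is only a reference from the earlier search. Now please run the parameter search on the new dataset.
  85 +Note that this dataset is fairly large, so each search round takes a long time, and a wide-range fine-grained search is impractical. Consider analyzing the previous search results carefully and doing a fine-grained search that builds on them; if those results are not sufficient, run a coarse search on the small dataset first, then search on the large dataset.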
  86 +
  87 +
  88 +
  89 +Response - 1
... ...
docs/issues/issue-2026-04-16-数据集扩增&bayes寻参-TODO.md
... ... @@ -377,10 +377,9 @@ CLI / launch script design
377 377  
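378 378 This parameter set is fairly extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily) and the image exponent is too large. It is nonetheless the best on that dataset, and I suspect overfitting, so the dataset should be enlarged: first expand the annotation set, then continue the parameter search on the expanded set.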
379 379  
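380   -I have already created a new annotation set. Please launch the parameter search and get it running; once it finishes the results should be available, and next time you can analyze them and draw conclusions.
  380 +So I created a new annotation set, and its labeling is already complete: Clothing Filtered 771. Please launch the parameter search and get it running; once it finishes the results should be available, and next time you can analyze them and draw conclusions.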
381 381  
382   -
383   -The previous round of tuning:
  382 +For the tuning method, please refer to the previous round of tuning:
384 383 My tuning request at the time:
385 384  
386 385 Please tune the coarse_rank fusion formula:
... ... @@ -411,7 +410,7 @@ CLI / launch script design
411 410  
412 411  
413 412  
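414   -The tuning scripts you provided
  413 +The tuning scripts you provided at the time (repeated iterations of "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened, several seeds far from the baseline were added, and jitter around the baseline is kept, which suits slow convergence over a long run.)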
415 414  
416 415 The long-running automated tuning scripts are set up. The core files are:
417 416  
... ... @@ -421,8 +420,6 @@ CLI / launch script design
421 420 - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh
422 421 - Usage notes: scripts/evaluation/tuning/README.md
423 422  
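424   - This approach is not an exhaustive Cartesian product; it repeatedly iterates "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened, several seeds far from the baseline were added, and jitter around the baseline is kept, which suits slow convergence over a long run.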
425   -
426 423 Start a round directly:
427 424  
428 425 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
... ... @@ -453,56 +450,5 @@ CLI / launch script design
453 450 - Update trials.jsonl, leaderboard.csv, and summary.md
454 451 - Adjust the next round's sampling distribution based on past results
455 452  
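456   - You run one long job first. When it finishes, send me the run_name or the corresponding experiment directory from scripts/evaluation/tuning/README.md, and I will do the next analysis and narrow the search space based on the results.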
457   -
458   -
459   -
460   -
461   -› I ran: MAX_EVALS=400 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
462   -
463   - run_name=coarse_fusion_long_001
464   - pid=2218620
465   - log=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log
466   - pid_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.pid
467   - cmd_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.cmd
468   - run_dir=artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001
469   -
470   - tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log
471   - cat artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001/leaderboard.csv
472   -
473   -
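474   -• I'll first look at this long run's log and the current leaderboard to confirm that the job is progressing normally, whether there were any abnormal interruptions, and which direction the current best parameters fall in.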
475   -
476   -• Explored
477   - └ Read coarse_fusion_long_001.log, leaderboard.csv
478   - List coarse_fusion_long_001
479   - Read summary.md
480   -
481   -───
482   -
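483   -• The results already show a clear trend. I'll add one more layer of aggregate analysis to see which parameter directions recur in the top range, rather than focusing only on the single best point.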
484   -
485   -• Ran ps -p 2218620 -o pid=,etimes=,cmd=
486   - └ (no output)
487   -
488   -• Ran python - <<'PY'
489   - │ import csv, statistics, math
490   - │ from pathlib import Path
491   - │ … +15 lines
492   - └ rows 385
493   -
494   - … +61 lines (ctrl + t to view transcript)
495   - knn_image_bias: mean=0.5631 median=0.2000 min=0.0010 max=4.0000
496   - knn_image_exponent: mean=4.6316 median=5.8680 min=0.0000 max=6.0000
497   -
498   -• Ran python - <<'PY'
499   - │ import csv
500   - │ from pathlib import Path
501   - │ … +19 lines
502   - └ baseline_score 0.637642
503   -
504   - … +30 lines (ctrl + t to view transcript)
505   - 10 bo_234 0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker':
506   - '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'}
507   -
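508   -This run was terminated by an external cause (disk full); the above is the best parameter set.
  453 +The above is only a reference from the earlier search. Now please run the parameter search on the new dataset.
  454 +Note that this dataset is fairly large, so each search round takes a long time, and a wide-range fine-grained search is impractical. Consider analyzing the previous search results carefully and doing a fine-grained search that builds on them; if those results are not sufficient, run a coarse search on the small dataset first, then search on the large dataset.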
509 455 \ No newline at end of file
... ...
scripts/evaluation/resume_coarse_fusion_tuning_long.sh
... ... @@ -26,10 +26,28 @@ if [ ! -d "${RUN_DIR}" ]; then
26 26 exit 1
27 27 fi
28 28  
29   -MAX_EVALS="${MAX_EVALS:-36}"
30   -BATCH_SIZE="${BATCH_SIZE:-3}"
31   -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}"
32 29 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}"
  30 +case "${DATASET_ID}" in
  31 + clothing_top771)
  32 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md"
  33 + DEFAULT_MAX_EVALS="18"
  34 + DEFAULT_BATCH_SIZE="2"
  35 + DEFAULT_CANDIDATE_POOL_SIZE="160"
  36 + ;;
  37 + *)
  38 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md"
  39 + DEFAULT_MAX_EVALS="36"
  40 + DEFAULT_BATCH_SIZE="3"
  41 + DEFAULT_CANDIDATE_POOL_SIZE="512"
  42 + ;;
  43 +esac
  44 +
  45 +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}"
  46 +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}"
  47 +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}"
  48 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}"
  49 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  50 +RANDOM_SEED="${RANDOM_SEED:-20260422}"
33 51  
34 52 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
35 53 mkdir -p "${LAUNCH_DIR}"
... ... @@ -38,28 +56,25 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.pid"
38 56 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.cmd"
39 57  
40 58 CMD=(
41   - python
42   - scripts/evaluation/tune_fusion.py
43   - --mode optimize
44   - --resume-run "${RUN_DIR}"
45   - --search-space "${RUN_DIR}/search_space.yaml"
46   - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md
47   - --tenant-id 163
48   - --dataset-id "${DATASET_ID}"
49   - --queries-file scripts/evaluation/queries/queries.txt
50   - --top-k 100
51   - --language en
52   - --search-base-url http://127.0.0.1:6002
53   - --eval-web-base-url http://127.0.0.1:6010
54   - --max-evals "${MAX_EVALS}"
55   - --batch-size "${BATCH_SIZE}"
56   - --candidate-pool-size "${CANDIDATE_POOL_SIZE}"
  59 + bash
  60 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  61 + "${RUN_NAME}"
  62 + "${DATASET_ID}"
  63 + "${MAX_EVALS}"
  64 + "${BATCH_SIZE}"
  65 + "${CANDIDATE_POOL_SIZE}"
  66 + "${RANDOM_SEED}"
  67 + "${RUN_DIR}/search_space.yaml"
  68 + "${SEED_REPORT}"
  69 + "${RUN_DIR}"
57 70 )
58 71  
59 72 if [ "$#" -gt 0 ]; then
60 73 CMD+=("$@")
61 74 fi
62 75  
  76 +export BATCH_EVAL_TIMEOUT_SEC
  77 +
63 78 printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
64 79 printf '\n' >> "${CMD_PATH}"
65 80  
... ...
scripts/evaluation/run_coarse_fusion_tuning_resilient.sh 0 → 100755
... ... @@ -0,0 +1,117 @@
  1 +#!/bin/bash
  2 +
  3 +set -euo pipefail
  4 +
  5 +cd "$(dirname "$0")/../.."
  6 +source ./activate.sh
  7 +
  8 +usage() {
  9 + echo "usage: $0 <run_name> <dataset_id> <max_evals> <batch_size> <candidate_pool_size> <random_seed> <search_space> <seed_report> [resume_run_dir]" >&2
  10 + exit 1
  11 +}
  12 +
  13 +if [ "$#" -lt 8 ]; then
  14 + usage
  15 +fi
  16 +
  17 +RUN_NAME="$1"
  18 +DATASET_ID="$2"
  19 +MAX_EVALS="$3"
  20 +BATCH_SIZE="$4"
  21 +CANDIDATE_POOL_SIZE="$5"
  22 +RANDOM_SEED="$6"
  23 +SEARCH_SPACE="$7"
  24 +SEED_REPORT="$8"
  25 +RESUME_RUN_DIR="${9:-}"
  26 +
  27 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  28 +RESTART_SLEEP_SEC="${RESTART_SLEEP_SEC:-30}"
  29 +SEARCH_BASE_URL="${SEARCH_BASE_URL:-http://127.0.0.1:6002}"
  30 +EVAL_WEB_BASE_URL="${EVAL_WEB_BASE_URL:-http://127.0.0.1:6010}"
  31 +RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}"
  32 +
  33 +mkdir -p "$(dirname "$RUN_DIR")"
  34 +
  35 +count_live_successes() {
  36 + python3 - "$RUN_DIR" <<'PY'
  37 +import json
  38 +import sys
  39 +from pathlib import Path
  40 +
  41 +run_dir = Path(sys.argv[1])
  42 +path = run_dir / "trials.jsonl"
  43 +count = 0
  44 +if path.is_file():
  45 + for line in path.read_text(encoding="utf-8").splitlines():
  46 + line = line.strip()
  47 + if not line:
  48 + continue
  49 + obj = json.loads(line)
  50 + if obj.get("status") == "ok" and not obj.get("is_seed"):
  51 + count += 1
  52 +print(count)
  53 +PY
  54 +}
  55 +
  56 +build_cmd() {
  57 + local cmd=(
  58 + python
  59 + scripts/evaluation/tune_fusion.py
  60 + --mode optimize
  61 + --search-space "$SEARCH_SPACE"
  62 + --seed-report "$SEED_REPORT"
  63 + --tenant-id 163
  64 + --dataset-id "$DATASET_ID"
  65 + --queries-file scripts/evaluation/queries/queries.txt
  66 + --top-k 100
  67 + --language en
  68 + --search-base-url "$SEARCH_BASE_URL"
  69 + --eval-web-base-url "$EVAL_WEB_BASE_URL"
  70 + --max-evals "$MAX_EVALS"
  71 + --batch-size "$BATCH_SIZE"
  72 + --candidate-pool-size "$CANDIDATE_POOL_SIZE"
  73 + --random-seed "$RANDOM_SEED"
  74 + --batch-eval-timeout-sec "$BATCH_EVAL_TIMEOUT_SEC"
  75 + )
  76 + if [ -n "$RESUME_RUN_DIR" ]; then
  77 + cmd+=(--resume-run "$RESUME_RUN_DIR")
  78 + else
  79 + cmd+=(--run-name "$RUN_NAME")
  80 + fi
  81 + printf '%q ' "${cmd[@]}"
  82 + printf '\n'
  83 +}
  84 +
  85 +attempt=0
  86 +while true; do
  87 + live_successes="$(count_live_successes)"
  88 + if [ "$live_successes" -ge "$MAX_EVALS" ]; then
  89 + echo "[resilient] complete run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS"
  90 + exit 0
  91 + fi
  92 +
  93 + attempt=$((attempt + 1))
  94 + if [ -d "$RUN_DIR" ]; then
  95 + RESUME_RUN_DIR="$RUN_DIR"
  96 + fi
  97 +
  98 + echo "[resilient] attempt=$attempt run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS"
  99 + CMD_STR="$(build_cmd)"
  100 + echo "[resilient] cmd=$CMD_STR"
  101 +
  102 + set +e
  103 + bash -lc "$CMD_STR"
  104 + exit_code=$?
  105 + set -e
  106 +
  107 + live_successes="$(count_live_successes)"
  108 + echo "[resilient] exit_code=$exit_code live_successes=$live_successes"
  109 +
  110 + if [ "$live_successes" -ge "$MAX_EVALS" ]; then
  111 + echo "[resilient] finished after attempt=$attempt"
  112 + exit 0
  113 + fi
  114 +
  115 + echo "[resilient] sleeping ${RESTART_SLEEP_SEC}s before resume"
  116 + sleep "$RESTART_SLEEP_SEC"
  117 +done
... ...
scripts/evaluation/start_coarse_fusion_tuning_long.sh
... ... @@ -5,12 +5,34 @@ set -euo pipefail
5 5 cd "$(dirname "$0")/../.."
6 6 source ./activate.sh
7 7  
8   -RUN_NAME="${RUN_NAME:-coarse_fusion_long_$(date -u +%Y%m%dT%H%M%SZ)}"
9   -MAX_EVALS="${MAX_EVALS:-36}"
10   -BATCH_SIZE="${BATCH_SIZE:-3}"
11   -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}"
12   -RANDOM_SEED="${RANDOM_SEED:-20260416}"
13 8 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}"
  9 +case "${DATASET_ID}" in
  10 + clothing_top771)
  11 + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml"
  12 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md"
  13 + DEFAULT_MAX_EVALS="18"
  14 + DEFAULT_BATCH_SIZE="2"
  15 + DEFAULT_CANDIDATE_POOL_SIZE="160"
  16 + DEFAULT_RANDOM_SEED="20260422"
  17 + ;;
  18 + *)
  19 + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space.yaml"
  20 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md"
  21 + DEFAULT_MAX_EVALS="36"
  22 + DEFAULT_BATCH_SIZE="3"
  23 + DEFAULT_CANDIDATE_POOL_SIZE="512"
  24 + DEFAULT_RANDOM_SEED="20260416"
  25 + ;;
  26 +esac
  27 +
  28 +RUN_NAME="${RUN_NAME:-coarse_fusion_${DATASET_ID}_$(date -u +%Y%m%dT%H%M%SZ)}"
  29 +SEARCH_SPACE="${SEARCH_SPACE:-${DEFAULT_SEARCH_SPACE}}"
  30 +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}"
  31 +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}"
  32 +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}"
  33 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}"
  34 +RANDOM_SEED="${RANDOM_SEED:-${DEFAULT_RANDOM_SEED}}"
  35 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
14 36  
15 37 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
16 38 mkdir -p "${LAUNCH_DIR}"
... ... @@ -19,29 +41,24 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.pid"
19 41 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.cmd"
20 42  
21 43 CMD=(
22   - python
23   - scripts/evaluation/tune_fusion.py
24   - --mode optimize
25   - --run-name "${RUN_NAME}"
26   - --search-space scripts/evaluation/tuning/coarse_rank_fusion_space.yaml
27   - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md
28   - --tenant-id 163
29   - --dataset-id "${DATASET_ID}"
30   - --queries-file scripts/evaluation/queries/queries.txt
31   - --top-k 100
32   - --language en
33   - --search-base-url http://127.0.0.1:6002
34   - --eval-web-base-url http://127.0.0.1:6010
35   - --max-evals "${MAX_EVALS}"
36   - --batch-size "${BATCH_SIZE}"
37   - --candidate-pool-size "${CANDIDATE_POOL_SIZE}"
38   - --random-seed "${RANDOM_SEED}"
  44 + bash
  45 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  46 + "${RUN_NAME}"
  47 + "${DATASET_ID}"
  48 + "${MAX_EVALS}"
  49 + "${BATCH_SIZE}"
  50 + "${CANDIDATE_POOL_SIZE}"
  51 + "${RANDOM_SEED}"
  52 + "${SEARCH_SPACE}"
  53 + "${SEED_REPORT}"
39 54 )
40 55  
41 56 if [ "$#" -gt 0 ]; then
42 57 CMD+=("$@")
43 58 fi
44 59  
  60 +export BATCH_EVAL_TIMEOUT_SEC
  61 +
45 62 printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
46 63 printf '\n' >> "${CMD_PATH}"
47 64  
... ...
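The per-dataset defaults plus environment overrides in the start script follow a simple precedence rule, which the Python sketch below restates. The `DEFAULTS` table copies the values from the `case` block above; `resolve_settings` is a hypothetical helper, and treating an empty variable as unset is meant to mirror the `${VAR:-default}` expansion.

```python
import os
from typing import Dict, Mapping

# Values copied from the case block in start_coarse_fusion_tuning_long.sh.
DEFAULTS: Dict[str, Dict[str, str]] = {
    "clothing_top771": {
        "MAX_EVALS": "18",
        "BATCH_SIZE": "2",
        "CANDIDATE_POOL_SIZE": "160",
        "RANDOM_SEED": "20260422",
    },
    # Fallback branch (`*)` in the shell script).
    "*": {
        "MAX_EVALS": "36",
        "BATCH_SIZE": "3",
        "CANDIDATE_POOL_SIZE": "512",
        "RANDOM_SEED": "20260416",
    },
}


def resolve_settings(dataset_id: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: env value wins if set and non-empty, else the dataset default."""
    defaults = DEFAULTS.get(dataset_id, DEFAULTS["*"])
    # `env.get(name) or default` treats "" like unset, as `${VAR:-default}` does.
    return {name: env.get(name) or default for name, default in defaults.items()}
```

For example, `resolve_settings("clothing_top771", {"MAX_EVALS": "6"})` keeps the dataset's batch size of `2` while overriding only the evaluation budget.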
scripts/evaluation/tune_fusion.py
... ... @@ -379,6 +379,7 @@ def run_batch_eval(
379 379 top_k: int,
380 380 language: str,
381 381 force_refresh_labels: bool,
  382 + timeout_sec: int,
382 383 ) -> Dict[str, Any]:
383 384 cmd = [
384 385 str(PROJECT_ROOT / ".venv" / "bin" / "python"),
... ... @@ -397,13 +398,14 @@ def run_batch_eval(
397 398 cmd.extend(["--queries-file", str(queries_file)])
398 399 if force_refresh_labels:
399 400 cmd.append("--force-refresh-labels")
  401 + timeout = timeout_sec if timeout_sec and timeout_sec > 0 else None
400 402 completed = subprocess.run(
401 403 cmd,
402 404 cwd=PROJECT_ROOT,
403 405 check=True,
404 406 capture_output=True,
405 407 text=True,
406   - timeout=7200,
  408 + timeout=timeout,
407 409 )
408 410 output = (completed.stdout or "") + "\n" + (completed.stderr or "")
409 411 batch_ids = re.findall(r"batch_id=([A-Za-z0-9_]+)", output)
... ... @@ -1221,6 +1223,7 @@ def run_optimize_mode(args: argparse.Namespace) -> None:
1221 1223 top_k=args.top_k,
1222 1224 language=args.language,
1223 1225 force_refresh_labels=force_refresh_labels,
  1226 + timeout_sec=args.batch_eval_timeout_sec,
1224 1227 )
1225 1228 ensure_disk_headroom(
1226 1229 min_free_gb=args.min_free_gb,
... ... @@ -1362,6 +1365,7 @@ def build_parser() -> argparse.ArgumentParser:
1362 1365 parser.add_argument("--resume-run", default=None)
1363 1366 parser.add_argument("--max-evals", type=int, default=12)
1364 1367 parser.add_argument("--batch-size", type=int, default=3)
  1368 + parser.add_argument("--batch-eval-timeout-sec", type=int, default=0)
1365 1369 parser.add_argument("--init-random", type=int, default=None)
1366 1370 parser.add_argument("--candidate-pool-size", type=int, default=None)
1367 1371 parser.add_argument("--random-seed", type=int, default=20260415)
... ...
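The timeout change in `run_batch_eval` maps any non-positive value to `None`, which `subprocess.run` interprets as "no timeout". The standalone sketch below isolates that rule; `normalize_timeout` and `run_with_optional_timeout` are hypothetical names, not functions in `tune_fusion.py`.

```python
import subprocess
import sys
from typing import List, Optional


def normalize_timeout(timeout_sec: Optional[int]) -> Optional[int]:
    """Return a positive timeout unchanged; map 0, negatives, and None to None."""
    return timeout_sec if timeout_sec and timeout_sec > 0 else None


def run_with_optional_timeout(cmd: List[str], timeout_sec: Optional[int]) -> subprocess.CompletedProcess:
    """Run cmd, enforcing a timeout only when timeout_sec is positive."""
    return subprocess.run(
        cmd,
        check=True,
        capture_output=True,
        text=True,
        timeout=normalize_timeout(timeout_sec),  # None disables the timeout
    )


if __name__ == "__main__":
    # 0 means "run as long as needed", matching the new CLI default.
    result = run_with_optional_timeout([sys.executable, "-c", "print('ok')"], 0)
    print(result.stdout.strip())
```

With a positive value, `subprocess.run` still raises `subprocess.TimeoutExpired` when the child overruns, so callers that opt back into a limit keep the previous behavior.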
scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml 0 → 100644
... ... @@ -0,0 +1,161 @@
  1 +target_path: coarse_rank.fusion
  2 +
  3 +baseline:
  4 + es_bias: 10.0
  5 + es_exponent: 0.05
  6 + text_bias: 0.1
  7 + text_exponent: 0.35
  8 + text_translation_weight: 1.0
  9 + knn_text_weight: 1.0
  10 + knn_image_weight: 2.0
  11 + knn_tie_breaker: 0.3
  12 + knn_bias: 0.2
  13 + knn_exponent: 5.6
  14 + knn_text_bias: 0.2
  15 + knn_text_exponent: 0.0
  16 + knn_image_bias: 0.2
  17 + knn_image_exponent: 0.0
  18 +
  19 +parameters:
  20 + es_bias: {min: 2.0, max: 20.0, scale: log, round: 4}
  21 + es_exponent: {min: 0.03, max: 0.28, scale: linear, round: 4}
  22 + text_bias: {min: 0.01, max: 4.0, scale: log, round: 4}
  23 + text_exponent: {min: 0.2, max: 1.6, scale: linear, round: 4}
  24 + text_translation_weight: {min: 0.7, max: 1.8, scale: linear, round: 4}
  25 + knn_text_weight: {min: 0.05, max: 1.8, scale: linear, round: 4}
  26 + knn_image_weight: {min: 1.2, max: 6.0, scale: linear, round: 4}
  27 + knn_tie_breaker: {min: 0.0, max: 0.4, scale: linear, round: 4}
  28 + knn_bias: {min: 0.001, max: 2.5, scale: log, round: 4}
  29 + knn_exponent: {min: 0.05, max: 12.0, scale: log, round: 4}
  30 + knn_text_bias: {min: 0.001, max: 4.0, scale: log, round: 4}
  31 + knn_text_exponent: {min: 0.0, max: 2.0, scale: linear, round: 4}
  32 + knn_image_bias: {min: 0.01, max: 1.5, scale: log, round: 4}
  33 + knn_image_exponent: {min: 0.0, max: 6.0, scale: linear, round: 4}
  34 +
  35 +seed_experiments:
  36 + - name: seed_low_knn_global
  37 + description: First verify whether the low global knn exponent seen in 021002 still yields a gain once the reranker is removed.
  38 + params:
  39 + knn_bias: 0.6
  40 + knn_exponent: 0.4
  41 + - name: seed_bigset_knn_soft
  42 + description: Starting from the low global knn exponent, further smooth the knn nonlinearity.
  43 + params:
  44 + text_exponent: 0.42
  45 + text_translation_weight: 1.05
  46 + knn_text_weight: 0.85
  47 + knn_image_weight: 2.4
  48 + knn_tie_breaker: 0.18
  49 + knn_bias: 0.9
  50 + knn_exponent: 0.18
  51 + knn_image_exponent: 0.2
  52 + - name: seed_bigset_knn_mid
  53 + description: Keep the smoothed knn but strengthen the image path a little, to check whether the large set needs moderate nonlinearity.
  54 + params:
  55 + es_bias: 8.0
  56 + es_exponent: 0.08
  57 + text_bias: 0.15
  58 + text_exponent: 0.5
  59 + text_translation_weight: 1.15
  60 + knn_text_weight: 0.65
  61 + knn_image_weight: 3.1
  62 + knn_tie_breaker: 0.12
  63 + knn_bias: 0.45
  64 + knn_exponent: 0.85
  65 + knn_text_bias: 0.35
  66 + knn_text_exponent: 0.2
  67 + knn_image_bias: 0.22
  68 + knn_image_exponent: 0.8
  69 + - name: seed_bigset_text_stable
  70 + description: Increase lexical discrimination and observe whether the large set prefers a more robust text ranking.
  71 + params:
  72 + es_bias: 7.0
  73 + es_exponent: 0.12
  74 + text_bias: 0.25
  75 + text_exponent: 0.72
  76 + text_translation_weight: 1.0
  77 + knn_text_weight: 0.55
  78 + knn_image_weight: 2.2
  79 + knn_tie_breaker: 0.08
  80 + knn_bias: 0.7
  81 + knn_exponent: 0.35
  82 + knn_text_bias: 0.5
  83 + knn_text_exponent: 0.4
  84 + knn_image_bias: 0.18
  85 + knn_image_exponent: 0.35
  86 + - name: seed_hybrid_transfer
  87 + description: Based mainly on the large-set baseline, gently absorb the image/text strengthening pattern of the historical small-set winners.
  88 + params:
  89 + es_bias: 7.2
  90 + es_exponent: 0.15
  91 + text_bias: 0.6
  92 + text_exponent: 0.82
  93 + text_translation_weight: 1.28
  94 + knn_text_weight: 0.45
  95 + knn_image_weight: 4.0
  96 + knn_tie_breaker: 0.08
  97 + knn_bias: 0.2
  98 + knn_exponent: 1.2
  99 + knn_text_bias: 0.8
  100 + knn_text_exponent: 0.45
  101 + knn_image_bias: 0.3
  102 + knn_image_exponent: 1.4
  103 + - name: seed_legacy_bo234
  104 + description: Directly verify how the historical best from the 53-item set transfers to the 771-item set.
  105 + params:
  106 + es_bias: 7.214
  107 + es_exponent: 0.2025
  108 + text_bias: 4.0
  109 + text_exponent: 1.584
  110 + text_translation_weight: 1.4441
  111 + knn_text_weight: 0.1
  112 + knn_image_weight: 5.6232
  113 + knn_tie_breaker: 0.021
  114 + knn_bias: 0.0019
  115 + knn_exponent: 11.8477
  116 + knn_text_bias: 2.3125
  117 + knn_text_exponent: 1.1547
  118 + knn_image_bias: 0.9641
  119 + knn_image_exponent: 5.8671
  120 + - name: seed_legacy_bo340
  121 + description: Verify whether the small-set champion parameters still have value on the large set.
  122 + params:
  123 + es_bias: 5.887
  124 + es_exponent: 0.2145
  125 + text_bias: 4.0
  126 + text_exponent: 1.6
  127 + text_translation_weight: 1.4788
  128 + knn_text_weight: 0.3693
  129 + knn_image_weight: 5.7028
  130 + knn_tie_breaker: 0.0174
  131 + knn_bias: 0.0016
  132 + knn_exponent: 12.0
  133 + knn_text_bias: 2.6071
  134 + knn_text_exponent: 1.0458
  135 + knn_image_bias: 0.8282
  136 + knn_image_exponent: 6.0
  137 + - name: seed_image_guard
  138 + description: Constrain the image weight but allow an image sub-term exponent, to find the balance point between recall and precision.
  139 + params:
  140 + es_bias: 9.0
  141 + es_exponent: 0.09
  142 + text_bias: 0.12
  143 + text_exponent: 0.45
  144 + text_translation_weight: 1.1
  145 + knn_text_weight: 0.7
  146 + knn_image_weight: 2.8
  147 + knn_tie_breaker: 0.1
  148 + knn_bias: 0.55
  149 + knn_exponent: 0.55
  150 + knn_text_bias: 0.25
  151 + knn_text_exponent: 0.15
  152 + knn_image_bias: 0.28
  153 + knn_image_exponent: 1.0
  154 +
  155 +optimizer:
  156 + init_random: 2
  157 + candidate_pool_size: 160
  158 + explore_probability: 0.12
  159 + local_jitter_probability: 0.62
  160 + elite_fraction: 0.25
  161 + min_normalized_distance: 0.08
... ...
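Each entry under `parameters` gives a range, a `scale`, and a `round` precision. One plausible way to draw a value from such a spec is sketched below. It is an assumption, not the sampler in `tune_fusion.py` (which is not shown in this diff): `sample_param` is a hypothetical name, and `log` is read here as "uniform in log space".

```python
import math
import random
from typing import Dict, Union

Spec = Dict[str, Union[float, int, str]]


def sample_param(spec: Spec, rng: random.Random) -> float:
    """Draw one value from a {min, max, scale, round} spec (assumed semantics)."""
    lo, hi = float(spec["min"]), float(spec["max"])
    if spec["scale"] == "log":
        # Log scale needs a strictly positive lower bound.
        if lo <= 0:
            raise ValueError("log scale requires min > 0")
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    else:
        value = rng.uniform(lo, hi)
    value = round(value, int(spec["round"]))
    # Clamp in case floating-point error or rounding nudged the value past a bound.
    return min(max(value, lo), hi)


if __name__ == "__main__":
    rng = random.Random(20260422)
    knn_bias = {"min": 0.001, "max": 2.5, "scale": "log", "round": 4}
    knn_tie_breaker = {"min": 0.0, "max": 0.4, "scale": "linear", "round": 4}
    print(sample_param(knn_bias, rng), sample_param(knn_tie_breaker, rng))
```

Under this reading, a log scale spreads samples evenly across orders of magnitude, which suits ranges such as `knn_bias` (0.001 to 2.5), while parameters whose range starts at 0.0 must stay linear.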