Commit d3dd01d3687413795804d7c9164f10d8aadc585d

Authored by tangwang
1 parent 628ff04f

Automatic parameter tuning:

- Allow the batch timeout to be disabled so that a batch can run indefinitely:
  - [tune_fusion.py](/data/saas-search/scripts/evaluation/tune_fusion.py:400)
  - When `--batch-eval-timeout-sec <= 0`, `subprocess.run` is now called without a Python-level timeout
- Add a resilient wrapper that resumes the run automatically:
  - [run_coarse_fusion_tuning_resilient.sh](/data/saas-search/scripts/evaluation/run_coarse_fusion_tuning_resilient.sh)
  - Logic: count the completed live evals in `trials.jsonl`; while the count is below `max_evals`, keep calling `resume-run`
  - Even after an abnormal exit, it sleeps and then resumes from the existing `run_dir`
- Switch both the start and resume scripts to resilient mode:
  - [start_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/start_coarse_fusion_tuning_long.sh)
  - [resume_coarse_fusion_tuning_long.sh](/data/saas-search/scripts/evaluation/resume_coarse_fusion_tuning_long.sh)
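
The timeout change in the first bullet can be pictured with a minimal Python sketch. It assumes the flag value arrives as a number; `effective_timeout` and `run_batch_eval` are hypothetical names used for illustration, not the actual code in `tune_fusion.py`. The only library behaviour relied on is that `subprocess.run` treats `timeout=None` as "no limit".

```python
import subprocess
import sys
from typing import Optional, Sequence


def effective_timeout(timeout_sec: float) -> Optional[float]:
    """Map a CLI timeout value to the value handed to subprocess.run.

    Hypothetical helper: zero or a negative value means no Python-level timeout.
    """
    return timeout_sec if timeout_sec > 0 else None


def run_batch_eval(cmd: Sequence[str], timeout_sec: float) -> subprocess.CompletedProcess:
    """Run one batch evaluation command; timeout=None lets it run indefinitely."""
    return subprocess.run(
        list(cmd),
        capture_output=True,
        text=True,
        timeout=effective_timeout(timeout_sec),
    )


if __name__ == "__main__":
    # A trivial stand-in command; the real batch command is not reproduced here.
    result = run_batch_eval([sys.executable, "-c", "print('done')"], timeout_sec=0)
    print(result.returncode, result.stdout.strip())
```

With a positive value, `subprocess.run` raises `subprocess.TimeoutExpired` once the limit is reached, which is presumably the truncation the earlier run hit.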

**Current run**
- `run_name`: `coarse_fusion_clothing_top771_resilient_20260422T091650Z`
- `run_dir`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z)
- `launch log`: [coarse_fusion_clothing_top771_resilient_20260422T091650Z.log](/data/saas-search/artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log)

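**Confirmed**
- The wrapper has started and entered `attempt=1`
- The value actually passed is `--batch-eval-timeout-sec 0`
- `tune_fusion.py` is running
- `build_annotation_set.py batch` is already running
- `eval.log` already shows evaluation progress for the first few queries of this round, so the run is not idling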

**Monitoring**
- `tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_20260422T091650Z.log`
- `tail -f logs/eval.log`
- `tail -f artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
- `cat artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`

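**Key differences from the previous run**
- Last time, a single batch round was cut off by the Python timeout
- This time, a single round has no Python timeout, and the outer wrapper resumes automatically
- As a result, long runs, mid-run interruptions, and resumes all keep advancing within the same `run_dir`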
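
The resume behaviour described in this message can be sketched in Python. This only illustrates the control flow and is not the wrapper itself, which is a bash script: `run_until_complete` and the `launch` callback are hypothetical, while the counting rule (records with `status == "ok"` that are not seeds) follows the `count_live_successes` helper in `run_coarse_fusion_tuning_resilient.sh`.

```python
import json
import time
from pathlib import Path
from typing import Callable


def count_live_successes(run_dir: Path) -> int:
    """Count non-seed trials with status "ok" in <run_dir>/trials.jsonl."""
    path = run_dir / "trials.jsonl"
    if not path.is_file():
        return 0
    count = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if obj.get("status") == "ok" and not obj.get("is_seed"):
            count += 1
    return count


def run_until_complete(
    run_dir: Path,
    max_evals: int,
    launch: Callable[[int], int],
    restart_sleep_sec: float = 30.0,
) -> int:
    """Call launch(attempt) until enough live evals are recorded.

    launch is a hypothetical callback that starts or resumes the tuner and
    returns its exit code. A non-zero exit does not stop the loop, because
    progress already written to trials.jsonl is kept. Returns the number of
    attempts made.
    """
    attempt = 0
    while count_live_successes(run_dir) < max_evals:
        attempt += 1
        exit_code = launch(attempt)
        if count_live_successes(run_dir) >= max_evals:
            break
        print(f"attempt={attempt} exit_code={exit_code}; sleeping before resume")
        time.sleep(restart_sleep_sec)
    return attempt
```

Note that a loop of this shape never gives up: if `launch` fails without ever recording a new successful trial, it keeps retrying, so a real deployment may want an attempt cap or an alert.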
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.cmd 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023815Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023815Z.pid 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +3843738
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.cmd 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_20260422T023951Z --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_20260422T023951Z.pid 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +3845416
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.cmd 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T021002Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun.pid 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +3842050
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.cmd 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +python scripts/evaluation/tune_fusion.py --mode optimize --run-name coarse_fusion_clothing_top771_dryrun2 --search-space scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml --seed-report artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --tenant-id 163 --dataset-id clothing_top771 --queries-file scripts/evaluation/queries/queries.txt --top-k 100 --language en --search-base-url http://127.0.0.1:6002 --eval-web-base-url http://127.0.0.1:6010 --max-evals 18 --batch-size 2 --candidate-pool-size 160 --random-seed 20260422 --help
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_dryrun2.pid 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +3843512
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.cmd 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +bash scripts/evaluation/run_coarse_fusion_tuning_resilient.sh coarse_fusion_clothing_top771_resilient_dryrun clothing_top771 18 2 160 20260422 scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md --help
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_resilient_dryrun.pid 0 → 100644
@@ -0,0 +1 @@ @@ -0,0 +1 @@
  1 +4126011
config/config-with-reranker.yaml
@@ -260,6 +260,7 @@ function_score: @@ -260,6 +260,7 @@ function_score:
260 score_mode: sum 260 score_mode: sum
261 boost_mode: multiply 261 boost_mode: multiply
262 functions: [] 262 functions: []
  263 +
263 coarse_rank: 264 coarse_rank:
264 enabled: true 265 enabled: true
265 input_window: 480 266 input_window: 480
@@ -271,7 +272,7 @@ coarse_rank: @@ -271,7 +272,7 @@ coarse_rank:
271 text_exponent: 0.35 272 text_exponent: 0.35
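272 # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client) 273 # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
273 # ES scoring already discounts the translated query, so no further discount is applied here 274 # ES scoring already discounts the translated query, so no further discount is applied here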
274 - text_translation_weight: 0.8 275 + text_translation_weight: 1.0
275 knn_text_weight: 1.0 276 knn_text_weight: 1.0
276 knn_image_weight: 2.0 277 knn_image_weight: 2.0
277 knn_tie_breaker: 0.3 278 knn_tie_breaker: 0.3
config/config.yaml
@@ -100,11 +100,8 @@ es_settings: @@ -100,11 +100,8 @@ es_settings:
100 number_of_shards: 1 100 number_of_shards: 1
101 number_of_replicas: 0 101 number_of_replicas: 0
102 refresh_interval: 30s 102 refresh_interval: 30s
103 -  
104 -# Configure uniformly by field base name; at query time, .{lang} is appended dynamically for the actual search language
105 field_boosts: 103 field_boosts:
106 title: 3.0 104 title: 3.0
107 - # qanchors and enriched_tags also appear in enriched_attributes.value, so their effective weight is their own weight plus the enriched_attributes.value weight
108 qanchors: 1.0 105 qanchors: 1.0
109 enriched_tags: 1.0 106 enriched_tags: 1.0
110 enriched_attributes.value: 1.5 107 enriched_attributes.value: 1.5
@@ -118,7 +115,6 @@ field_boosts: @@ -118,7 +115,6 @@ field_boosts:
118 brief: 1.0 115 brief: 1.0
119 description: 1.0 116 description: 1.0
120 vendor: 1.0 117 vendor: 1.0
121 -  
122 query_config: 118 query_config:
123 supported_languages: 119 supported_languages:
124 - zh 120 - zh
@@ -126,16 +122,12 @@ query_config: @@ -126,16 +122,12 @@ query_config:
126 default_language: en 122 default_language: en
127 enable_text_embedding: true 123 enable_text_embedding: true
128 enable_query_rewrite: true 124 enable_query_rewrite: true
129 -  
130 - zh_to_en_model: nllb-200-distilled-600m # nllb-200-distilled-600m deepl opus-mt-zh-en / opus-mt-en-zh 125 + zh_to_en_model: nllb-200-distilled-600m
131 en_to_zh_model: nllb-200-distilled-600m 126 en_to_zh_model: nllb-200-distilled-600m
132 default_translation_model: nllb-200-distilled-600m 127 default_translation_model: nllb-200-distilled-600m
133 - # Translation quality matters more when the source language is not in index_languages, so it is configured separately
134 zh_to_en_model__source_not_in_index: deepl 128 zh_to_en_model__source_not_in_index: deepl
135 en_to_zh_model__source_not_in_index: deepl 129 en_to_zh_model__source_not_in_index: deepl
136 default_translation_model__source_not_in_index: deepl 130 default_translation_model__source_not_in_index: deepl
137 -  
138 - # Query parsing stage: translation and query embedding run concurrently and share one wait budget (milliseconds)
139 translation_embedding_wait_budget_ms_source_in_index: 300 131 translation_embedding_wait_budget_ms_source_in_index: 300
140 translation_embedding_wait_budget_ms_source_not_in_index: 400 132 translation_embedding_wait_budget_ms_source_not_in_index: 400
141 style_intent: 133 style_intent:
@@ -165,31 +157,22 @@ query_config: @@ -165,31 +157,22 @@ query_config:
165 enabled: true 157 enabled: true
166 dictionary_path: config/dictionaries/product_title_exclusion.tsv 158 dictionary_path: config/dictionaries/product_title_exclusion.tsv
167 search_fields: 159 search_fields:
168 - # Configure uniformly by field base name; at query time, .{lang} is appended dynamically for the actual search language
169 multilingual_fields: 160 multilingual_fields:
170 - title 161 - title
171 - keywords 162 - keywords
172 - qanchors 163 - qanchors
173 - enriched_tags 164 - enriched_tags
174 - enriched_attributes.value 165 - enriched_attributes.value
175 - # - enriched_taxonomy_attributes.value  
176 - option1_values 166 - option1_values
177 - option2_values 167 - option2_values
178 - option3_values 168 - option3_values
179 - category_path 169 - category_path
180 - category_name_text 170 - category_name_text
181 - # - brief  
182 - # - description  
183 - # - vendor  
184 - # shared_fields: fields without a language suffix; e.g. tags, option1_values, option2_values, option3_values
185 -  
186 shared_fields: null 171 shared_fields: null
187 core_multilingual_fields: 172 core_multilingual_fields:
188 - title 173 - title
189 - qanchors 174 - qanchors
190 - category_name_text 175 - category_name_text
191 -  
192 - # Text recall (main query + translated query)
193 text_query_strategy: 176 text_query_strategy:
194 base_minimum_should_match: 60% 177 base_minimum_should_match: 60%
195 translation_minimum_should_match: 60% 178 translation_minimum_should_match: 60%
@@ -206,8 +189,6 @@ query_config: @@ -206,8 +189,6 @@ query_config:
206 phrase_match_boost: 3.0 189 phrase_match_boost: 3.0
207 text_embedding_field: title_embedding 190 text_embedding_field: title_embedding
208 image_embedding_field: image_embedding.vector 191 image_embedding_field: image_embedding.vector
209 -  
210 - # null returns all fields; [] returns no fields
211 source_fields: 192 source_fields:
212 - spu_id 193 - spu_id
213 - handle 194 - handle
@@ -223,13 +204,8 @@ query_config: @@ -223,13 +204,8 @@ query_config:
223 - category1_name 204 - category1_name
224 - category2_name 205 - category2_name
225 - category3_name 206 - category3_name
226 - # - tags  
227 - # - keywords  
228 - # - qanchors  
229 - # - enriched_tags  
230 - enriched_attributes 207 - enriched_attributes
231 - enriched_taxonomy_attributes 208 - enriched_taxonomy_attributes
232 -  
233 - min_price 209 - min_price
234 - compare_at_price 210 - compare_at_price
235 - image_url 211 - image_url
@@ -245,17 +221,14 @@ query_config: @@ -245,17 +221,14 @@ query_config:
245 - option3_values 221 - option3_values
246 - specifications 222 - specifications
247 - skus 223 - skus
248 -  
249 - # KNN: separate boost and recall (k / num_candidates) for text vectors and multimodal (image) vectors
250 knn_text_boost: 4 224 knn_text_boost: 4
251 knn_image_boost: 4 225 knn_image_boost: 4
252 knn_text_k: 160 226 knn_text_k: 160
253 - knn_text_num_candidates: 560 # k * 3.4 227 + knn_text_num_candidates: 560
254 knn_text_k_long: 400 228 knn_text_k_long: 400
255 knn_text_num_candidates_long: 1200 229 knn_text_num_candidates_long: 1200
256 knn_image_k: 400 230 knn_image_k: 400
257 knn_image_num_candidates: 1200 231 knn_image_num_candidates: 1200
258 -  
259 function_score: 232 function_score:
260 score_mode: sum 233 score_mode: sum
261 boost_mode: multiply 234 boost_mode: multiply
@@ -269,20 +242,18 @@ coarse_rank: @@ -269,20 +242,18 @@ coarse_rank:
269 es_exponent: 0.05 242 es_exponent: 0.05
270 text_bias: 0.1 243 text_bias: 0.1
271 text_exponent: 0.35 244 text_exponent: 0.35
272 - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
273 - # ES scoring already discounts the translated query, so no further discount is applied here
274 text_translation_weight: 1.0 245 text_translation_weight: 1.0
275 knn_text_weight: 1.0 246 knn_text_weight: 1.0
276 knn_image_weight: 2.0 247 knn_image_weight: 2.0
277 knn_tie_breaker: 0.3 248 knn_tie_breaker: 0.3
278 - knn_bias: 0.2  
279 - knn_exponent: 5.6 249 + knn_bias: 0.6
  250 + knn_exponent: 0.4
280 knn_text_bias: 0.2 251 knn_text_bias: 0.2
281 knn_text_exponent: 0.0 252 knn_text_exponent: 0.0
282 knn_image_bias: 0.2 253 knn_image_bias: 0.2
283 knn_image_exponent: 0.0 254 knn_image_exponent: 0.0
284 fine_rank: 255 fine_rank:
285 - enabled: false # when false, results pass through with order preserved 256 + enabled: false
286 input_window: 160 257 input_window: 160
287 output_window: 80 258 output_window: 80
288 timeout_sec: 10.0 259 timeout_sec: 10.0
@@ -290,7 +261,7 @@ fine_rank: @@ -290,7 +261,7 @@ fine_rank:
290 rerank_doc_template: '{title}' 261 rerank_doc_template: '{title}'
291 service_profile: fine 262 service_profile: fine
292 rerank: 263 rerank:
293 - enabled: false # when false, results pass through with order preserved 264 + enabled: false
294 rerank_window: 160 265 rerank_window: 160
295 exact_knn_rescore_enabled: true 266 exact_knn_rescore_enabled: true
296 exact_knn_rescore_window: 160 267 exact_knn_rescore_window: 160
@@ -300,10 +271,6 @@ rerank: @@ -300,10 +271,6 @@ rerank:
300 rerank_query_template: '{query}' 271 rerank_query_template: '{query}'
301 rerank_doc_template: '{title}' 272 rerank_doc_template: '{title}'
302 service_profile: default 273 service_profile: default
303 - # Multiplicative fusion: fused = Π (max(score,0) + bias) ** exponent (es / rerank / fine / text / knn)
304 - # where knn_score first goes through a dis_max step:
305 - # max(knn_text_weight * text_knn, knn_image_weight * image_knn)
306 - # + knn_tie_breaker * the weaker signal on the other side
307 fusion: 274 fusion:
308 es_bias: 10.0 275 es_bias: 10.0
309 es_exponent: 0.05 276 es_exponent: 0.05
@@ -312,7 +279,6 @@ rerank: @@ -312,7 +279,6 @@ rerank:
312 fine_bias: 0.1 279 fine_bias: 0.1
313 fine_exponent: 1.0 280 fine_exponent: 1.0
314 text_bias: 0.1 281 text_bias: 0.1
315 - # Weight of base_query_trans_* relative to base_query (see the text dismax fusion in search/rerank_client)
316 text_exponent: 0.25 282 text_exponent: 0.25
317 text_translation_weight: 0.8 283 text_translation_weight: 0.8
318 knn_text_weight: 1.0 284 knn_text_weight: 1.0
@@ -320,7 +286,6 @@ rerank: @@ -320,7 +286,6 @@ rerank:
320 knn_tie_breaker: 0.3 286 knn_tie_breaker: 0.3
321 knn_bias: 0.0 287 knn_bias: 0.0
322 knn_exponent: 5.6 288 knn_exponent: 5.6
323 -  
324 services: 289 services:
325 translation: 290 translation:
326 service_url: http://127.0.0.1:6006 291 service_url: http://127.0.0.1:6006
@@ -330,9 +295,6 @@ services: @@ -330,9 +295,6 @@ services:
330 cache: 295 cache:
331 ttl_seconds: 62208000 296 ttl_seconds: 62208000
332 sliding_expiration: true 297 sliding_expiration: true
333 - # When false, cache keys are exact-match per request model only (ignores model_quality_tiers for lookups)  
334 - # Higher tier = better quality. Multiple models may share one tier (same level).
335 - # A request may reuse Redis keys from models with tier > A or tier == A (not from lower tiers).  
336 enable_model_quality_tier_cache: true 298 enable_model_quality_tier_cache: true
337 model_quality_tiers: 299 model_quality_tiers:
338 deepl: 30 300 deepl: 30
@@ -443,10 +405,7 @@ services: @@ -443,10 +405,7 @@ services:
443 device: cuda 405 device: cuda
444 batch_size: 32 406 batch_size: 32
445 normalize_embeddings: true 407 normalize_embeddings: true
446 - # In-service image backend (read when the embedding process starts; cnclip gRPC and 6008 must use the same model_name)
447 - # Chinese-CLIP: ViT-H-14 → 1024 dims, ViT-L-14 → 768 dims. Must match
448 - # image_embedding.vector.dims in mappings/search_products.json (the current index uses 1024 → default ViT-H-14).
449 - image_backend: clip_as_service # clip_as_service | local_cnclip 408 + image_backend: clip_as_service
450 image_backends: 409 image_backends:
451 clip_as_service: 410 clip_as_service:
452 server: grpc://127.0.0.1:51000 411 server: grpc://127.0.0.1:51000
@@ -472,7 +431,6 @@ services: @@ -472,7 +431,6 @@ services:
472 request: 431 request:
473 max_docs: 1000 432 max_docs: 1000
474 normalize: true 433 normalize: true
475 - # Named instances: the same reranker code reads a different port / backend / runtime directory per instance name.
476 default_instance: default 434 default_instance: default
477 instances: 435 instances:
478 default: 436 default:
@@ -515,31 +473,11 @@ services: @@ -515,31 +473,11 @@ services:
515 enforce_eager: false 473 enforce_eager: false
516 infer_batch_size: 100 474 infer_batch_size: 100
517 sort_by_doc_length: true 475 sort_by_doc_length: true
518 -  
519 - # standard=_format_instruction__standard (fixed yes/no system); compact=_format_instruction (instruction used as system, with Instruct repeated inside user)
520 - instruction_format: standard # compact standard  
521 - # instruction: "Given a query, score the product for relevance"  
522 - # "rank products by given query" works slightly better than "Given a query, score the product for relevance"
523 - # instruction: "rank products by given query, category match first"  
524 - # instruction: "Rank products by query relevance, prioritizing category match"  
525 - # instruction: "Rank products by query relevance, prioritizing category and style match"  
526 - # instruction: "Rank by query relevance, prioritize category & style"  
527 - # instruction: "Relevance ranking: category & style match first"  
528 - # instruction: "Score product relevance by query with category & style match prioritized"  
529 - # instruction: "Rank products by query with category & style match prioritized"  
530 - # instruction: "Given a fashion shopping query, retrieve relevant products that answer the query" 476 + instruction_format: standard
531 instruction: rank products by given query 477 instruction: rank products by given query
532 -  
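533 - # vLLM LLM.score() (cross-encoder scoring). Dedicated high-performance environment .venv-reranker-score (pinned to vllm 0.18): ./scripts/setup_reranker_venv.sh qwen3_vllm_score
534 - # Can share the same model_name / HF cache with qwen3_vllm; the venvs are separate so that vLLM can be upgraded without affecting the generate backend.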
535 qwen3_vllm_score: 478 qwen3_vllm_score:
536 model_name: Qwen/Qwen3-Reranker-0.6B 479 model_name: Qwen/Qwen3-Reranker-0.6B
537 - # The official Hub original requires true; set to false when using converted seq-cls weights (e.g. tomaarsen/...-seq-cls)
538 use_original_qwen3_hf_overrides: true 480 use_original_qwen3_hf_overrides: true
539 - # vllm_runner: "auto"  
540 - # vllm_convert: "auto"  
541 - # Optional: merged with the built-in overrides when use_original_qwen3_hf_overrides is true
542 - # hf_overrides: {}  
543 engine: vllm 481 engine: vllm
544 max_model_len: 172 482 max_model_len: 172
545 tensor_parallel_size: 1 483 tensor_parallel_size: 1
@@ -549,10 +487,7 @@ services: @@ -549,10 +487,7 @@ services:
549 enforce_eager: false 487 enforce_eager: false
550 infer_batch_size: 80 488 infer_batch_size: 80
551 sort_by_doc_length: true 489 sort_by_doc_length: true
552 - # The default, standard, matches the official vLLM Qwen3 reranker prefix
553 - instruction_format: standard # compact standard  
554 - # instruction: "Rank products by query with category & style match prioritized"  
555 - # instruction: "Given a shopping query, rank products by relevance" 490 + instruction_format: standard
556 instruction: Rank products by query with category & style match prioritized 491 instruction: Rank products by query with category & style match prioritized
557 qwen3_transformers: 492 qwen3_transformers:
558 model_name: Qwen/Qwen3-Reranker-0.6B 493 model_name: Qwen/Qwen3-Reranker-0.6B
@@ -620,25 +555,19 @@ services: @@ -620,25 +555,19 @@ services:
620 endpoint: https://dashscope.aliyuncs.com/compatible-api/v1/reranks 555 endpoint: https://dashscope.aliyuncs.com/compatible-api/v1/reranks
621 api_key_env: RERANK_DASHSCOPE_API_KEY_CN 556 api_key_env: RERANK_DASHSCOPE_API_KEY_CN
622 timeout_sec: 10.0 557 timeout_sec: 10.0
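623 - top_n_cap: 0 # 0 means top_n = number of documents in the current request
624 - batchsize: 64 # 0 disables; >0 enables concurrent small-batch scheduling (top_n/top_n_cap still apply, with global truncation after batching) 558 + top_n_cap: 0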
  559 + batchsize: 64
625 instruct: Given a shopping query, rank product titles by relevance 560 instruct: Given a shopping query, rank product titles by relevance
626 max_retries: 2 561 max_retries: 2
627 retry_backoff_sec: 0.2 562 retry_backoff_sec: 0.2
628 -  
629 spu_config: 563 spu_config:
630 enabled: true 564 enabled: true
631 spu_field: spu_id 565 spu_field: spu_id
632 inner_hits_size: 10 566 inner_hits_size: 10
633 - # Which option dimensions participate in retrieval (indexing and online search)
634 - # Format: a list containing one or more of option1/option2/option3
635 searchable_option_dimensions: 567 searchable_option_dimensions:
636 - option1 568 - option1
637 - option2 569 - option2
638 - option3 570 - option3
639 -  
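640 -# Each tenant can configure a primary language (primary_language) and index languages (index_languages: main market languages, selectable by the merchant)
641 -# Default index_languages: [en, zh]; can be any subset of SOURCE_LANG_CODE_MAP.keys()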
642 tenant_config: 571 tenant_config:
643 default: 572 default:
644 primary_language: en 573 primary_language: en
docs/issues/issue-2026-04-16-bayes寻参-clothing_top771数据集上寻参.md 0 → 100644
@@ -0,0 +1,89 @@ @@ -0,0 +1,89 @@
  1 +Prompt - 1
  2 +
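  3 +II. Parameter tuning on the large annotation set
  4 +
  5 +I previously ran one round of tuning on 54 evaluation samples (queries.txt); the best parameter set found during that run was: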
  6 +0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker':
  7 + '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'}
  8 +
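  9 +This parameter set is rather extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily), and the image exponent is too large. It is nonetheless the best on that dataset, which I suspect may be overfitting, so the dataset should be enlarged: first extend the annotation set, then continue tuning on the extended set.
  10 +
  11 +So I created a new annotation set, and its annotation is complete: Clothing Filtered 771. Please start the tuning job and get it running; once the program finishes, the tuning results should be available, and next time you can analyze them and draw conclusions.
  12 +
  13 +As for the tuning approach, refer to the previous round:
  14 +My tuning request at the time:
  15 +
  16 +Please tune the coarse_rank fusion formula:
  17 + The current baseline is the following set, Primary_Metric_Score: 0.637642: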
  18 + coarse_rank:
  19 + ...
  20 + fusion:
  21 + es_bias: 10.0
  22 + es_exponent: 0.05
  23 + text_bias: 0.1
  24 + text_exponent: 0.35
  25 + text_translation_weight: 1.0
  26 + knn_text_weight: 1.0
  27 + knn_image_weight: 2.0
  28 + knn_tie_breaker: 0.3
  29 + knn_bias: 0.2
  30 + knn_exponent: 5.6
  31 + knn_text_bias: 0.2
  32 + knn_text_exponent: 0.0
  33 + knn_image_bias: 0.2
  34 + knn_image_exponent: 0.0
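  35 + The evaluation metrics are in /data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md
  36 + Using this as the baseline, think broadly and tune over a wider range. Each restart and evaluation takes several minutes, so please write a tuning framework: define multiple parameter sets on top of it, script the process, and after each round collect the results and automatically adjust the parameter distribution (a Cartesian product over many parameters is too costly, so consider methods such as Bayesian optimization that
  37 + converge automatically over multiple scripted iterations)
  38 + After each parameter change the backend must be restarted (sometimes eval-web also seems to go down after a backend restart; if so, track down the cause and fix it)
  39 + ./restart.sh backend
  40 + Note: please debug the scripts and run one round of analysis. The end result should be a reusable set of tuning scripts that I can rerun next time (for the same parameters), that iterates automatically (adjusting the parameter distribution), collects the metrics for each parameter set, and tunes toward the best combination.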
  41 +
  42 +
  43 +
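  44 +The tuning scripts you provided at the time (iterating over "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened: several seeds far from the baseline were added, and jitter around the baseline is kept, which suits long, slow convergence.):
  45 +
  46 + The long-running automatic tuning scripts are set up; the core files are:
  47 +
  48 + - Main tuning program: scripts/evaluation/tune_fusion.py
  49 + - Wide-range search space: scripts/evaluation/tuning/coarse_rank_fusion_space.yaml
  50 + - Background long-run start script: scripts/evaluation/start_coarse_fusion_tuning_long.sh
  51 + - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh
  52 + - Usage notes: scripts/evaluation/tuning/README.md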
  53 +
  54 + Start a round directly:
  55 +
  56 + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
  57 +
  58 + To run for longer:
  59 +
  60 + MAX_EVALS=48 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 \
  61 + ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
  62 +
  63 + View the log:
  64 +
  65 + tail -f artifacts/search_evaluation/tuning_launches/<run_name>.log
  66 +
  67 + View the results directory:
  68 +
  69 + ls artifacts/search_evaluation/tuning_runs/<run_name>/
  70 +
  71 + Resume:
  72 +
  73 + ./scripts/evaluation/resume_coarse_fusion_tuning_long.sh <run_name>
  74 +
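  75 + Each round automatically does the following:
  76 +
  77 + - Generate a batch of candidate coarse_rank.fusion parameters
  78 + - Write them to the config and restart the backend
  79 + - Check eval-web and bring it back up if needed
  80 + - Run the evaluation and collect Primary_Metric_Score
  81 + - Update trials.jsonl, leaderboard.csv, summary.md
  82 + - Adjust the next round's sampling distribution based on historical results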
  83 +
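  84 +The above is only a reference from the previous tuning job. Now please tune on the new dataset.
  85 +Note that this dataset is fairly large, so each tuning round takes a long time, and a wide, fine-grained search is not advisable. Consider analyzing the previous tuning results carefully and running a fine-grained search that builds on them; if those results are not sufficient, run a coarse search on the small dataset first and then tune on the large dataset.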
  86 +
  87 +
  88 +
  89 +Response - 1
docs/issues/issue-2026-04-16-数据集扩增&bayes寻参-TODO.md
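@@ -377,10 +377,9 @@ CLI / launch script design @@ -377,10 +377,9 @@ CLI / launch script design
377 377
378 This parameter set is rather extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily), and the image exponent is too large. It is nonetheless the best on that dataset, which I suspect may be overfitting, so the dataset should be enlarged: first extend the annotation set, then continue tuning on the extended set. 378 This parameter set is rather extreme: text_bias is too large (the text score lies in 0~1, so adding 4 dilutes it heavily), and the image exponent is too large. It is nonetheless the best on that dataset, which I suspect may be overfitting, so the dataset should be enlarged: first extend the annotation set, then continue tuning on the extended set.
379 379
380 -I have already created a new annotation set. Please start the tuning job and get it running; once the program finishes, the tuning results should be available, and next time you can analyze them and draw conclusions. 380 +So I created a new annotation set, and its annotation is complete: Clothing Filtered 771. Please start the tuning job and get it running; once the program finishes, the tuning results should be available, and next time you can analyze them and draw conclusions.
381 381
382 -  
383 -The previous round of tuning: 382 +As for the tuning approach, refer to the previous round:
384 My tuning request at the time: 383 My tuning request at the time:
385 384
386 Please tune the coarse_rank fusion formula: 385 Please tune the coarse_rank fusion formula: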
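@@ -411,7 +410,7 @@ CLI / launch script design @@ -411,7 +410,7 @@ CLI / launch script design
411 410
412 411
413 412
414 -The tuning scripts you provided 413 +The tuning scripts you provided at the time (iterating over "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened: several seeds far from the baseline were added, and jitter around the baseline is kept, which suits long, slow convergence.)
415 414
416 The long-running automatic tuning scripts are set up; the core files are: 415 The long-running automatic tuning scripts are set up; the core files are:
417 416
@@ -421,8 +420,6 @@ CLI / launch script design @@ -421,8 +420,6 @@ CLI / launch script design
421 - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh 420 - Resume script: scripts/evaluation/resume_coarse_fusion_tuning_long.sh
422 - Usage notes: scripts/evaluation/tuning/README.md 421 - Usage notes: scripts/evaluation/tuning/README.md
423 422
424 - This approach is not an exhaustive Cartesian product; it iterates over "seed experiments + random exploration + adaptive sampling based on the historical best distribution". The search range has been widened: several seeds far from the baseline were added, and jitter around the baseline is kept, which suits long, slow convergence.
425 -  
426 Start a round directly: 423 Start a round directly: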
427 424
428 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh 425 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
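@@ -453,56 +450,5 @@ CLI / launch script design @@ -453,56 +450,5 @@ CLI / launch script design
453 - Update trials.jsonl, leaderboard.csv, summary.md 450 - Update trials.jsonl, leaderboard.csv, summary.md
454 - Adjust the next round's sampling distribution based on historical results 451 - Adjust the next round's sampling distribution based on historical results
455 452
456 - Run one long job first. When it finishes, send me the run_name or the corresponding experiment directory from scripts/evaluation/tuning/README.md, and I will do the next round of analysis and narrow the search space based on the results.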
457 -  
458 -  
459 -  
460 -  
461 -› I ran: MAX_EVALS=400 BATCH_SIZE=3 CANDIDATE_POOL_SIZE=512 RUN_NAME=coarse_fusion_long_001 ./scripts/evaluation/start_coarse_fusion_tuning_long.sh
462 -  
463 - run_name=coarse_fusion_long_001  
464 - pid=2218620  
465 - log=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log  
466 - pid_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.pid  
467 - cmd_file=artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.cmd  
468 - run_dir=artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001  
469 -  
470 - tail -f artifacts/search_evaluation/tuning_launches/coarse_fusion_long_001.log  
471 - cat artifacts/search_evaluation/tuning_runs/coarse_fusion_long_001/leaderboard.csv  
472 -  
473 -  
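474 -• I will first look at the log of this long run and the current leaderboard to confirm whether the job is progressing normally, whether it was interrupted abnormally, and which direction the current best parameters point in.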
475 -  
476 -• Explored  
477 - └ Read coarse_fusion_long_001.log, leaderboard.csv  
478 - List coarse_fusion_long_001  
479 - Read summary.md  
480 -  
481 -───  
482 -  
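483 -• The results are already enough to show a clear trend. I will add one more layer of aggregate analysis to see which parameter directions recur in the top range, rather than focusing on a single best point.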
484 -  
485 -• Ran ps -p 2218620 -o pid=,etimes=,cmd=  
486 - └ (no output)  
487 -  
488 -• Ran python - <<'PY'  
489 - │ import csv, statistics, math  
490 - │ from pathlib import Path  
491 - │ … +15 lines  
492 - └ rows 385  
493 -  
494 - … +61 lines (ctrl + t to view transcript)  
495 - knn_image_bias: mean=0.5631 median=0.2000 min=0.0010 max=4.0000  
496 - knn_image_exponent: mean=4.6316 median=5.8680 min=0.0000 max=6.0000  
497 -  
498 -• Ran python - <<'PY'  
499 - │ import csv  
500 - │ from pathlib import Path  
501 - │ … +19 lines  
502 - └ baseline_score 0.637642  
503 -  
504 - … +30 lines (ctrl + t to view transcript)  
505 - 10 bo_234 0.641241 {'es_bias': '7.214', 'es_exponent': '0.2025', 'text_bias': '4.0', 'text_exponent': '1.584', 'text_translation_weight': '1.4441', 'knn_text_weight': '0.1', 'knn_image_weight': '5.6232', 'knn_tie_breaker':  
506 - '0.021', 'knn_bias': '0.0019', 'knn_exponent': '11.8477', 'knn_text_bias': '2.3125', 'knn_text_exponent': '1.1547', 'knn_image_bias': '0.9641', 'knn_image_exponent': '5.8671'}  
507 -  
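508 -This run was terminated by an external cause (disk full); the above is the best parameter set. 453 +The above is only a reference from the previous tuning job. Now please tune on the new dataset.
  454 +Note that this dataset is fairly large, so each tuning round takes a long time, and a wide, fine-grained search is not advisable. Consider analyzing the previous tuning results carefully and running a fine-grained search that builds on them; if those results are not sufficient, run a coarse search on the small dataset first and then tune on the large dataset.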
509 \ No newline at end of file 455 \ No newline at end of file
scripts/evaluation/resume_coarse_fusion_tuning_long.sh
@@ -26,10 +26,28 @@ if [ ! -d "${RUN_DIR}" ]; then @@ -26,10 +26,28 @@ if [ ! -d "${RUN_DIR}" ]; then
26 exit 1 26 exit 1
27 fi 27 fi
28 28
29 -MAX_EVALS="${MAX_EVALS:-36}"  
30 -BATCH_SIZE="${BATCH_SIZE:-3}"  
31 -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}"  
32 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}" 29 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}"
  30 +case "${DATASET_ID}" in
  31 + clothing_top771)
  32 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md"
  33 + DEFAULT_MAX_EVALS="18"
  34 + DEFAULT_BATCH_SIZE="2"
  35 + DEFAULT_CANDIDATE_POOL_SIZE="160"
  36 + ;;
  37 + *)
  38 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md"
  39 + DEFAULT_MAX_EVALS="36"
  40 + DEFAULT_BATCH_SIZE="3"
  41 + DEFAULT_CANDIDATE_POOL_SIZE="512"
  42 + ;;
  43 +esac
  44 +
  45 +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}"
  46 +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}"
  47 +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}"
  48 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}"
  49 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  50 +RANDOM_SEED="${RANDOM_SEED:-20260422}"
33 51
34 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" 52 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
35 mkdir -p "${LAUNCH_DIR}" 53 mkdir -p "${LAUNCH_DIR}"
@@ -38,28 +56,25 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.pid" @@ -38,28 +56,25 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.pid"
38 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.cmd" 56 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.resume.cmd"
39 57
40 CMD=( 58 CMD=(
41 - python  
42 - scripts/evaluation/tune_fusion.py  
43 - --mode optimize  
44 - --resume-run "${RUN_DIR}"  
45 - --search-space "${RUN_DIR}/search_space.yaml"  
46 - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md  
47 - --tenant-id 163  
48 - --dataset-id "${DATASET_ID}"  
49 - --queries-file scripts/evaluation/queries/queries.txt  
50 - --top-k 100  
51 - --language en  
52 - --search-base-url http://127.0.0.1:6002  
53 - --eval-web-base-url http://127.0.0.1:6010  
54 - --max-evals "${MAX_EVALS}"  
55 - --batch-size "${BATCH_SIZE}"  
56 - --candidate-pool-size "${CANDIDATE_POOL_SIZE}" 59 + bash
  60 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  61 + "${RUN_NAME}"
  62 + "${DATASET_ID}"
  63 + "${MAX_EVALS}"
  64 + "${BATCH_SIZE}"
  65 + "${CANDIDATE_POOL_SIZE}"
  66 + "${RANDOM_SEED}"
  67 + "${RUN_DIR}/search_space.yaml"
  68 + "${SEED_REPORT}"
  69 + "${RUN_DIR}"
57 ) 70 )
58 71
59 if [ "$#" -gt 0 ]; then 72 if [ "$#" -gt 0 ]; then
60 CMD+=("$@") 73 CMD+=("$@")
61 fi 74 fi
62 75
  76 +export BATCH_EVAL_TIMEOUT_SEC
  77 +
63 printf '%q ' "${CMD[@]}" > "${CMD_PATH}" 78 printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
64 printf '\n' >> "${CMD_PATH}" 79 printf '\n' >> "${CMD_PATH}"
65 80
scripts/evaluation/run_coarse_fusion_tuning_resilient.sh 0 → 100755
@@ -0,0 +1,117 @@
  1 +#!/bin/bash
  2 +
  3 +set -euo pipefail
  4 +
  5 +cd "$(dirname "$0")/../.."
  6 +source ./activate.sh
  7 +
  8 +usage() {
  9 + echo "usage: $0 <run_name> <dataset_id> <max_evals> <batch_size> <candidate_pool_size> <random_seed> <search_space> <seed_report> [resume_run_dir]" >&2
  10 + exit 1
  11 +}
  12 +
  13 +if [ "$#" -lt 8 ]; then
  14 + usage
  15 +fi
  16 +
  17 +RUN_NAME="$1"
  18 +DATASET_ID="$2"
  19 +MAX_EVALS="$3"
  20 +BATCH_SIZE="$4"
  21 +CANDIDATE_POOL_SIZE="$5"
  22 +RANDOM_SEED="$6"
  23 +SEARCH_SPACE="$7"
  24 +SEED_REPORT="$8"
  25 +RESUME_RUN_DIR="${9:-}"
  26 +
  27 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  28 +RESTART_SLEEP_SEC="${RESTART_SLEEP_SEC:-30}"
  29 +SEARCH_BASE_URL="${SEARCH_BASE_URL:-http://127.0.0.1:6002}"
  30 +EVAL_WEB_BASE_URL="${EVAL_WEB_BASE_URL:-http://127.0.0.1:6010}"
  31 +RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}"
  32 +
  33 +mkdir -p "$(dirname "$RUN_DIR")"
  34 +
  35 +count_live_successes() {
  36 + python3 - "$RUN_DIR" <<'PY'
  37 +import json
  38 +import sys
  39 +from pathlib import Path
  40 +
  41 +run_dir = Path(sys.argv[1])
  42 +path = run_dir / "trials.jsonl"
  43 +count = 0
  44 +if path.is_file():
  45 + for line in path.read_text(encoding="utf-8").splitlines():
  46 + line = line.strip()
  47 + if not line:
  48 + continue
  49 + obj = json.loads(line)
  50 + if obj.get("status") == "ok" and not obj.get("is_seed"):
  51 + count += 1
  52 +print(count)
  53 +PY
  54 +}
  55 +
  56 +build_cmd() {
  57 + local cmd=(
  58 + python
  59 + scripts/evaluation/tune_fusion.py
  60 + --mode optimize
  61 + --search-space "$SEARCH_SPACE"
  62 + --seed-report "$SEED_REPORT"
  63 + --tenant-id 163
  64 + --dataset-id "$DATASET_ID"
  65 + --queries-file scripts/evaluation/queries/queries.txt
  66 + --top-k 100
  67 + --language en
  68 + --search-base-url "$SEARCH_BASE_URL"
  69 + --eval-web-base-url "$EVAL_WEB_BASE_URL"
  70 + --max-evals "$MAX_EVALS"
  71 + --batch-size "$BATCH_SIZE"
  72 + --candidate-pool-size "$CANDIDATE_POOL_SIZE"
  73 + --random-seed "$RANDOM_SEED"
  74 + --batch-eval-timeout-sec "$BATCH_EVAL_TIMEOUT_SEC"
  75 + )
  76 + if [ -n "$RESUME_RUN_DIR" ]; then
  77 + cmd+=(--resume-run "$RESUME_RUN_DIR")
  78 + else
  79 + cmd+=(--run-name "$RUN_NAME")
  80 + fi
  81 + printf '%q ' "${cmd[@]}"
  82 + printf '\n'
  83 +}
  84 +
  85 +attempt=0
  86 +while true; do
  87 + live_successes="$(count_live_successes)"
  88 + if [ "$live_successes" -ge "$MAX_EVALS" ]; then
  89 + echo "[resilient] complete run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS"
  90 + exit 0
  91 + fi
  92 +
  93 + attempt=$((attempt + 1))
  94 + if [ -d "$RUN_DIR" ]; then
  95 + RESUME_RUN_DIR="$RUN_DIR"
  96 + fi
  97 +
  98 + echo "[resilient] attempt=$attempt run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS"
  99 + CMD_STR="$(build_cmd)"
  100 + echo "[resilient] cmd=$CMD_STR"
  101 +
  102 + set +e
  103 + bash -lc "$CMD_STR"
  104 + exit_code=$?
  105 + set -e
  106 +
  107 + live_successes="$(count_live_successes)"
  108 + echo "[resilient] exit_code=$exit_code live_successes=$live_successes"
  109 +
  110 + if [ "$live_successes" -ge "$MAX_EVALS" ]; then
  111 + echo "[resilient] finished after attempt=$attempt"
  112 + exit 0
  113 + fi
  114 +
  115 + echo "[resilient] sleeping ${RESTART_SLEEP_SEC}s before resume"
  116 + sleep "$RESTART_SLEEP_SEC"
  117 +done
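The wrapper decides whether to resume by counting finished non-seed trials in `trials.jsonl`. The embedded helper calls `json.loads` on every line, so a half-written last record (for example after a crash mid-write) would make it exit non-zero, which under `set -e` would presumably stop the loop. The following is a minimal sketch of a more tolerant counter, assuming the same record fields (`status`, `is_seed`) that the script reads; skipping malformed lines is an assumption about the desired behavior, not something the commit does.

```python
import json
from pathlib import Path


def count_live_successes(run_dir: Path) -> int:
    """Count non-seed trials with status "ok" in <run_dir>/trials.jsonl."""
    path = run_dir / "trials.jsonl"
    if not path.is_file():
        # No trials file yet: nothing has completed.
        return 0
    count = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: a truncated record should be ignored, not fatal.
            continue
        if isinstance(obj, dict) and obj.get("status") == "ok" and not obj.get("is_seed"):
            count += 1
    return count
```

With this variant a truncated trailing line is simply not counted, so the next `--resume-run` attempt would re-evaluate that trial instead of the wrapper exiting.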
scripts/evaluation/start_coarse_fusion_tuning_long.sh
@@ -5,12 +5,34 @@ set -euo pipefail
5 cd "$(dirname "$0")/../.." 5 cd "$(dirname "$0")/../.."
6 source ./activate.sh 6 source ./activate.sh
7 7
8 -RUN_NAME="${RUN_NAME:-coarse_fusion_long_$(date -u +%Y%m%dT%H%M%SZ)}"  
9 -MAX_EVALS="${MAX_EVALS:-36}"  
10 -BATCH_SIZE="${BATCH_SIZE:-3}"  
11 -CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-512}"  
12 -RANDOM_SEED="${RANDOM_SEED:-20260416}"  
13 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}" 8 DATASET_ID="${REPO_EVAL_DATASET_ID:-core_queries}"
  9 +case "${DATASET_ID}" in
  10 + clothing_top771)
  11 + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml"
  12 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/datasets/clothing_top771/batch_reports/batch_20260422T014610Z_5426bba1a6/report.md"
  13 + DEFAULT_MAX_EVALS="18"
  14 + DEFAULT_BATCH_SIZE="2"
  15 + DEFAULT_CANDIDATE_POOL_SIZE="160"
  16 + DEFAULT_RANDOM_SEED="20260422"
  17 + ;;
  18 + *)
  19 + DEFAULT_SEARCH_SPACE="scripts/evaluation/tuning/coarse_rank_fusion_space.yaml"
  20 + DEFAULT_SEED_REPORT="artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md"
  21 + DEFAULT_MAX_EVALS="36"
  22 + DEFAULT_BATCH_SIZE="3"
  23 + DEFAULT_CANDIDATE_POOL_SIZE="512"
  24 + DEFAULT_RANDOM_SEED="20260416"
  25 + ;;
  26 +esac
  27 +
  28 +RUN_NAME="${RUN_NAME:-coarse_fusion_${DATASET_ID}_$(date -u +%Y%m%dT%H%M%SZ)}"
  29 +SEARCH_SPACE="${SEARCH_SPACE:-${DEFAULT_SEARCH_SPACE}}"
  30 +SEED_REPORT="${SEED_REPORT:-${DEFAULT_SEED_REPORT}}"
  31 +MAX_EVALS="${MAX_EVALS:-${DEFAULT_MAX_EVALS}}"
  32 +BATCH_SIZE="${BATCH_SIZE:-${DEFAULT_BATCH_SIZE}}"
  33 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-${DEFAULT_CANDIDATE_POOL_SIZE}}"
  34 +RANDOM_SEED="${RANDOM_SEED:-${DEFAULT_RANDOM_SEED}}"
  35 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
14 36
15 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" 37 LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
16 mkdir -p "${LAUNCH_DIR}" 38 mkdir -p "${LAUNCH_DIR}"
@@ -19,29 +41,24 @@ PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.pid"
19 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.cmd" 41 CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.cmd"
20 42
21 CMD=( 43 CMD=(
22 - python  
23 - scripts/evaluation/tune_fusion.py  
24 - --mode optimize  
25 - --run-name "${RUN_NAME}"  
26 - --search-space scripts/evaluation/tuning/coarse_rank_fusion_space.yaml  
27 - --seed-report artifacts/search_evaluation/batch_reports/batch_20260415T150754Z_00b6a8aa3d.md  
28 - --tenant-id 163  
29 - --dataset-id "${DATASET_ID}"  
30 - --queries-file scripts/evaluation/queries/queries.txt  
31 - --top-k 100  
32 - --language en  
33 - --search-base-url http://127.0.0.1:6002  
34 - --eval-web-base-url http://127.0.0.1:6010  
35 - --max-evals "${MAX_EVALS}"  
36 - --batch-size "${BATCH_SIZE}"  
37 - --candidate-pool-size "${CANDIDATE_POOL_SIZE}"  
38 - --random-seed "${RANDOM_SEED}" 44 + bash
  45 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  46 + "${RUN_NAME}"
  47 + "${DATASET_ID}"
  48 + "${MAX_EVALS}"
  49 + "${BATCH_SIZE}"
  50 + "${CANDIDATE_POOL_SIZE}"
  51 + "${RANDOM_SEED}"
  52 + "${SEARCH_SPACE}"
  53 + "${SEED_REPORT}"
39 ) 54 )
40 55
41 if [ "$#" -gt 0 ]; then 56 if [ "$#" -gt 0 ]; then
42 CMD+=("$@") 57 CMD+=("$@")
43 fi 58 fi
44 59
  60 +export BATCH_EVAL_TIMEOUT_SEC
  61 +
45 printf '%q ' "${CMD[@]}" > "${CMD_PATH}" 62 printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
46 printf '\n' >> "${CMD_PATH}" 63 printf '\n' >> "${CMD_PATH}"
47 64
scripts/evaluation/tune_fusion.py
@@ -379,6 +379,7 @@ def run_batch_eval(
379 top_k: int, 379 top_k: int,
380 language: str, 380 language: str,
381 force_refresh_labels: bool, 381 force_refresh_labels: bool,
  382 + timeout_sec: int,
382 ) -> Dict[str, Any]: 383 ) -> Dict[str, Any]:
383 cmd = [ 384 cmd = [
384 str(PROJECT_ROOT / ".venv" / "bin" / "python"), 385 str(PROJECT_ROOT / ".venv" / "bin" / "python"),
@@ -397,13 +398,14 @@ def run_batch_eval(
397 cmd.extend(["--queries-file", str(queries_file)]) 398 cmd.extend(["--queries-file", str(queries_file)])
398 if force_refresh_labels: 399 if force_refresh_labels:
399 cmd.append("--force-refresh-labels") 400 cmd.append("--force-refresh-labels")
  401 + timeout = timeout_sec if timeout_sec and timeout_sec > 0 else None
400 completed = subprocess.run( 402 completed = subprocess.run(
401 cmd, 403 cmd,
402 cwd=PROJECT_ROOT, 404 cwd=PROJECT_ROOT,
403 check=True, 405 check=True,
404 capture_output=True, 406 capture_output=True,
405 text=True, 407 text=True,
406 - timeout=7200, 408 + timeout=timeout,
407 ) 409 )
408 output = (completed.stdout or "") + "\n" + (completed.stderr or "") 410 output = (completed.stdout or "") + "\n" + (completed.stderr or "")
409 batch_ids = re.findall(r"batch_id=([A-Za-z0-9_]+)", output) 411 batch_ids = re.findall(r"batch_id=([A-Za-z0-9_]+)", output)
@@ -1221,6 +1223,7 @@ def run_optimize_mode(args: argparse.Namespace) -> None:
1221 top_k=args.top_k, 1223 top_k=args.top_k,
1222 language=args.language, 1224 language=args.language,
1223 force_refresh_labels=force_refresh_labels, 1225 force_refresh_labels=force_refresh_labels,
  1226 + timeout_sec=args.batch_eval_timeout_sec,
1224 ) 1227 )
1225 ensure_disk_headroom( 1228 ensure_disk_headroom(
1226 min_free_gb=args.min_free_gb, 1229 min_free_gb=args.min_free_gb,
@@ -1362,6 +1365,7 @@ def build_parser() -> argparse.ArgumentParser:
1362 parser.add_argument("--resume-run", default=None) 1365 parser.add_argument("--resume-run", default=None)
1363 parser.add_argument("--max-evals", type=int, default=12) 1366 parser.add_argument("--max-evals", type=int, default=12)
1364 parser.add_argument("--batch-size", type=int, default=3) 1367 parser.add_argument("--batch-size", type=int, default=3)
  1368 + parser.add_argument("--batch-eval-timeout-sec", type=int, default=0)
1365 parser.add_argument("--init-random", type=int, default=None) 1369 parser.add_argument("--init-random", type=int, default=None)
1366 parser.add_argument("--candidate-pool-size", type=int, default=None) 1370 parser.add_argument("--candidate-pool-size", type=int, default=None)
1367 parser.add_argument("--random-seed", type=int, default=20260415) 1371 parser.add_argument("--random-seed", type=int, default=20260415)
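The key change in `run_batch_eval` is that a non-positive `--batch-eval-timeout-sec` is mapped to `timeout=None`, which tells `subprocess.run` to wait indefinitely. The sketch below isolates that rule. `normalize_timeout` and `run_with_optional_timeout` are hypothetical helper names, not functions in `tune_fusion.py`.

```python
import subprocess
import sys
from typing import Optional, Sequence


def normalize_timeout(timeout_sec: Optional[int]) -> Optional[int]:
    """Map 0, negative values and None to "no timeout"; keep positive values."""
    return timeout_sec if timeout_sec and timeout_sec > 0 else None


def run_with_optional_timeout(
    cmd: Sequence[str], timeout_sec: Optional[int]
) -> "subprocess.CompletedProcess[str]":
    """Run cmd and capture its output; only positive timeouts are enforced."""
    return subprocess.run(
        list(cmd),
        check=True,
        capture_output=True,
        text=True,
        timeout=normalize_timeout(timeout_sec),
    )


if __name__ == "__main__":
    # A timeout of 0 means "wait as long as the child needs".
    done = run_with_optional_timeout([sys.executable, "-c", "print('ok')"], 0)
    print(done.stdout.strip())
```

When a positive timeout is passed and the child exceeds it, `subprocess.run` raises `subprocess.TimeoutExpired`; with the new default of `0` that path is never taken, so recovery from a stuck batch is left to the outer wrapper or the operator.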
scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771.yaml 0 → 100644
@@ -0,0 +1,161 @@
  1 +target_path: coarse_rank.fusion
  2 +
  3 +baseline:
  4 + es_bias: 10.0
  5 + es_exponent: 0.05
  6 + text_bias: 0.1
  7 + text_exponent: 0.35
  8 + text_translation_weight: 1.0
  9 + knn_text_weight: 1.0
  10 + knn_image_weight: 2.0
  11 + knn_tie_breaker: 0.3
  12 + knn_bias: 0.2
  13 + knn_exponent: 5.6
  14 + knn_text_bias: 0.2
  15 + knn_text_exponent: 0.0
  16 + knn_image_bias: 0.2
  17 + knn_image_exponent: 0.0
  18 +
  19 +parameters:
  20 + es_bias: {min: 2.0, max: 20.0, scale: log, round: 4}
  21 + es_exponent: {min: 0.03, max: 0.28, scale: linear, round: 4}
  22 + text_bias: {min: 0.01, max: 4.0, scale: log, round: 4}
  23 + text_exponent: {min: 0.2, max: 1.6, scale: linear, round: 4}
  24 + text_translation_weight: {min: 0.7, max: 1.8, scale: linear, round: 4}
  25 + knn_text_weight: {min: 0.05, max: 1.8, scale: linear, round: 4}
  26 + knn_image_weight: {min: 1.2, max: 6.0, scale: linear, round: 4}
  27 + knn_tie_breaker: {min: 0.0, max: 0.4, scale: linear, round: 4}
  28 + knn_bias: {min: 0.001, max: 2.5, scale: log, round: 4}
  29 + knn_exponent: {min: 0.05, max: 12.0, scale: log, round: 4}
  30 + knn_text_bias: {min: 0.001, max: 4.0, scale: log, round: 4}
  31 + knn_text_exponent: {min: 0.0, max: 2.0, scale: linear, round: 4}
  32 + knn_image_bias: {min: 0.01, max: 1.5, scale: log, round: 4}
  33 + knn_image_exponent: {min: 0.0, max: 6.0, scale: linear, round: 4}
  34 +
  35 +seed_experiments:
  36 + - name: seed_low_knn_global
  37 + description: First check whether the low global knn exponent seen in 021002 still helps once the reranker is removed.
  38 + params:
  39 + knn_bias: 0.6
  40 + knn_exponent: 0.4
  41 + - name: seed_bigset_knn_soft
  42 + description: Starting from the low global knn exponent, further smooth the knn nonlinearity.
  43 + params:
  44 + text_exponent: 0.42
  45 + text_translation_weight: 1.05
  46 + knn_text_weight: 0.85
  47 + knn_image_weight: 2.4
  48 + knn_tie_breaker: 0.18
  49 + knn_bias: 0.9
  50 + knn_exponent: 0.18
  51 + knn_image_exponent: 0.2
  52 + - name: seed_bigset_knn_mid
  53 + description: Keep the smoothed knn but make the image path a little stronger, to test whether the large set needs moderate nonlinearity.
  54 + params:
  55 + es_bias: 8.0
  56 + es_exponent: 0.08
  57 + text_bias: 0.15
  58 + text_exponent: 0.5
  59 + text_translation_weight: 1.15
  60 + knn_text_weight: 0.65
  61 + knn_image_weight: 3.1
  62 + knn_tie_breaker: 0.12
  63 + knn_bias: 0.45
  64 + knn_exponent: 0.85
  65 + knn_text_bias: 0.35
  66 + knn_text_exponent: 0.2
  67 + knn_image_bias: 0.22
  68 + knn_image_exponent: 0.8
  69 + - name: seed_bigset_text_stable
  70 + description: Increase lexical discrimination and see whether the large set prefers a more stable text ranking.
  71 + params:
  72 + es_bias: 7.0
  73 + es_exponent: 0.12
  74 + text_bias: 0.25
  75 + text_exponent: 0.72
  76 + text_translation_weight: 1.0
  77 + knn_text_weight: 0.55
  78 + knn_image_weight: 2.2
  79 + knn_tie_breaker: 0.08
  80 + knn_bias: 0.7
  81 + knn_exponent: 0.35
  82 + knn_text_bias: 0.5
  83 + knn_text_exponent: 0.4
  84 + knn_image_bias: 0.18
  85 + knn_image_exponent: 0.35
  86 + - name: seed_hybrid_transfer
  87 + description: Anchor on the large-set baseline and moderately adopt the image/text strengthening patterns of past small-set winners.
  88 + params:
  89 + es_bias: 7.2
  90 + es_exponent: 0.15
  91 + text_bias: 0.6
  92 + text_exponent: 0.82
  93 + text_translation_weight: 1.28
  94 + knn_text_weight: 0.45
  95 + knn_image_weight: 4.0
  96 + knn_tie_breaker: 0.08
  97 + knn_bias: 0.2
  98 + knn_exponent: 1.2
  99 + knn_text_bias: 0.8
  100 + knn_text_exponent: 0.45
  101 + knn_image_bias: 0.3
  102 + knn_image_exponent: 1.4
  103 + - name: seed_legacy_bo234
  104 + description: Directly test how the historical best from the 53-query set transfers to the 771-query set.
  105 + params:
  106 + es_bias: 7.214
  107 + es_exponent: 0.2025
  108 + text_bias: 4.0
  109 + text_exponent: 1.584
  110 + text_translation_weight: 1.4441
  111 + knn_text_weight: 0.1
  112 + knn_image_weight: 5.6232
  113 + knn_tie_breaker: 0.021
  114 + knn_bias: 0.0019
  115 + knn_exponent: 11.8477
  116 + knn_text_bias: 2.3125
  117 + knn_text_exponent: 1.1547
  118 + knn_image_bias: 0.9641
  119 + knn_image_exponent: 5.8671
  120 + - name: seed_legacy_bo340
  121 + description: Check whether the small-set champion parameters are still useful on the large set.
  122 + params:
  123 + es_bias: 5.887
  124 + es_exponent: 0.2145
  125 + text_bias: 4.0
  126 + text_exponent: 1.6
  127 + text_translation_weight: 1.4788
  128 + knn_text_weight: 0.3693
  129 + knn_image_weight: 5.7028
  130 + knn_tie_breaker: 0.0174
  131 + knn_bias: 0.0016
  132 + knn_exponent: 12.0
  133 + knn_text_bias: 2.6071
  134 + knn_text_exponent: 1.0458
  135 + knn_image_bias: 0.8282
  136 + knn_image_exponent: 6.0
  137 + - name: seed_image_guard
  138 + description: Limit the image weight but allow an image sub-term exponent, to probe the balance point between recall and precision.
  139 + params:
  140 + es_bias: 9.0
  141 + es_exponent: 0.09
  142 + text_bias: 0.12
  143 + text_exponent: 0.45
  144 + text_translation_weight: 1.1
  145 + knn_text_weight: 0.7
  146 + knn_image_weight: 2.8
  147 + knn_tie_breaker: 0.1
  148 + knn_bias: 0.55
  149 + knn_exponent: 0.55
  150 + knn_text_bias: 0.25
  151 + knn_text_exponent: 0.15
  152 + knn_image_bias: 0.28
  153 + knn_image_exponent: 1.0
  154 +
  155 +optimizer:
  156 + init_random: 2
  157 + candidate_pool_size: 160
  158 + explore_probability: 0.12
  159 + local_jitter_probability: 0.62
  160 + elite_fraction: 0.25
  161 + min_normalized_distance: 0.08
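Each entry under `parameters` gives a range, a `scale` (`log` or `linear`) and a `round` precision. How `tune_fusion.py` actually draws candidates is not shown in this diff; the following is only a sketch of one plausible reading, in which `log` means sampling uniformly in log space. `sample_param` is a hypothetical helper.

```python
import math
import random
from typing import Dict, Union

Spec = Dict[str, Union[float, int, str]]


def sample_param(spec: Spec, rng: random.Random) -> float:
    """Draw one value from a {min, max, scale, round} spec."""
    lo, hi = float(spec["min"]), float(spec["max"])
    if spec.get("scale") == "log":
        # Log scale needs strictly positive bounds; sample uniformly in log space.
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    else:
        value = rng.uniform(lo, hi)
    value = round(value, int(spec.get("round", 4)))
    # Clamp in case floating-point error or rounding lands just outside the range.
    return min(max(value, lo), hi)


if __name__ == "__main__":
    rng = random.Random(20260422)
    knn_bias = {"min": 0.001, "max": 2.5, "scale": "log", "round": 4}
    es_exponent = {"min": 0.03, "max": 0.28, "scale": "linear", "round": 4}
    print(sample_param(knn_bias, rng), sample_param(es_exponent, rng))
```

Under this reading, a log-scaled parameter such as `knn_bias` (0.001 to 2.5) spends roughly as many draws between 0.001 and 0.01 as between 0.1 and 1, which suits ranges that span several orders of magnitude.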