Commit 935f6e1b1df162953f7ae96de7ee27702eb8f2da

Authored by tangwang
1 parent d3dd01d3

coarse_rank parameter-search results **Parameter list (4 sets)**

- `baseline` (best on top771, `seed_baseline`)
  - `es_bias: 10.0`, `es_exponent: 0.05`
  - `text_bias: 0.1`, `text_exponent: 0.35`, `text_translation_weight: 1.0`
  - `knn_text_weight: 1.0`, `knn_image_weight: 2.0`, `knn_tie_breaker: 0.3`
  - `knn_bias: 0.2`, `knn_exponent: 5.6`
  - `knn_text_bias: 0.2`, `knn_text_exponent: 0.0`
  - `knn_image_bias: 0.2`, `knn_image_exponent: 0.0`
- `extreme solution from the 54-query set` (`seed_legacy_bo234`)
  - `es_bias: 7.214`, `es_exponent: 0.2025`
  - `text_bias: 4.0`, `text_exponent: 1.584`, `text_translation_weight: 1.4441`
  - `knn_text_weight: 0.1`, `knn_image_weight: 5.6232`, `knn_tie_breaker: 0.021`
  - `knn_bias: 0.0019`, `knn_exponent: 11.8477`
  - `knn_text_bias: 2.3125`, `knn_text_exponent: 1.1547`
  - `knn_image_bias: 0.9641`, `knn_image_exponent: 5.8671`
- `bo_012` (`Primary_Metric_Score=0.485027`)
  - `es_bias: 6.6233`, `es_exponent: 0.2377`
  - `text_bias: 0.049`, `text_exponent: 0.4446`, `text_translation_weight: 1.6236`
  - `knn_text_weight: 1.0344`, `knn_image_weight: 1.3565`, `knn_tie_breaker: 0.212`
  - `knn_bias: 0.0052`, `knn_exponent: 4.4639`
  - `knn_text_bias: 0.1148`, `knn_text_exponent: 1.0926`
  - `knn_image_bias: 0.0114`, `knn_image_exponent: 5.2496`
- `bo_018` (`Primary_Metric_Score=0.484691`)
  - `es_bias: 8.8861`, `es_exponent: 0.2794`
  - `text_bias: 0.0189`, `text_exponent: 0.2`, `text_translation_weight: 1.7178`
  - `knn_text_weight: 1.7459`, `knn_image_weight: 4.2658`, `knn_tie_breaker: 0.2814`
  - `knn_bias: 0.001`, `knn_exponent: 1.4923`
  - `knn_text_bias: 4.0`, `knn_text_exponent: 0.9309`
  - `knn_image_bias: 0.01`, `knn_image_exponent: 5.8289`

**How to find them (reproducible)**
- From `leaderboard.csv` (one row holds the score plus every parameter): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
  - e.g. `rg '^2,bo_012,' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
- From `trials.jsonl` (the most authoritative source: the params the tuner actually wrote): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  - e.g. `rg '"name": "bo_012"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  - e.g. `rg '"name": "seed_legacy_bo234"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
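For scripted access, the `rg` lookups above could be replaced by a small Python helper like the sketch below. It assumes only what the `rg` patterns imply: `trials.jsonl` holds one JSON object per line with a top-level `"name"` key. `find_trial` and `load_trial` are hypothetical helpers, not code from this repo.

```python
import json
from typing import Iterable, Optional


def find_trial(lines: Iterable[str], name: str) -> Optional[dict]:
    """Return the first JSON-lines record whose top-level "name" equals `name`.

    Assumes one JSON object per line with a "name" key, as the rg patterns
    above imply; blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("name") == name:
            return record
    return None


def load_trial(trials_path: str, name: str) -> Optional[dict]:
    """Open a trials.jsonl file and look up one trial by name."""
    with open(trials_path, encoding="utf-8") as fh:
        return find_trial(fh, name)
```

`load_trial(".../trials.jsonl", "bo_012")` returns the whole record. Check a real line to see which key holds the tuned parameters, because this commit does not show the record layout.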

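**Added to `config.yaml`**
- I added these 4 parameter sets as commented-out presets next to `coarse_rank.fusion`: `config/config.yaml:236`
- Note: the effective `coarse_rank.fusion` values in your current `config/config.yaml` are `knn_bias=0.6 / knn_exponent=0.4`, which look more like `seed_low_knn_global` than the baseline that was best on this large set.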
README.md
@@ -46,10 +46,11 @@ source activate.sh
46 46 - `6002` backend (`/search/*`, `/admin/*`)
47 47 - `6003` frontend
48 48 - `6004` indexer (`/indexer/*`)
  49 +- `6006` translator
49 50 - `6005` embedding-text (optional, `POST /embed/text`; the usual backend is TEI, default `8080`)
50   -- `6006` translator (optional)
51   -- `6007` reranker (optional, `POST /rerank`; fine ranking can use a `service_profile` separate from the main rerank, see `config.yaml` → `fine_rank` / `services.rerank`)
52 51 - `6008` embedding-image (optional, `POST /embed/image`, etc.)
  52 +- `6007` reranker
  53 +- `6009` fine_rank
53 54 - `6010` eval-web (search evaluation UI; service name `eval-web` in `./scripts/service_ctl.sh`)
54 55  
55 56 See `docs/QUICKSTART.md` for more complete examples.
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_knn_tail_20260424T093837Z.daemon.cmd 0 → 100644
@@ -0,0 +1 @@
  1 +bash scripts/evaluation/run_coarse_fusion_tuning_resilient.sh coarse_fusion_clothing_top771_knn_tail_20260424T093837Z clothing_top771 20 2 96 20260424 scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml ''
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_knn_tail_20260424T093837Z.daemon.pid 0 → 100644
@@ -0,0 +1 @@
  1 +1355252
config/config.yaml
@@ -247,11 +247,11 @@ coarse_rank:
247 247 knn_image_weight: 2.0
248 248 knn_tie_breaker: 0.3
249 249 knn_bias: 0.6
250   - knn_exponent: 0.4
251   - knn_text_bias: 0.2
252   - knn_text_exponent: 0.0
253   - knn_image_bias: 0.2
254   - knn_image_exponent: 0.0
  250 + knn_exponent: 4.4639
  251 + knn_text_bias: 0.1148
  252 + knn_text_exponent: 1.0926
  253 + knn_image_bias: 0.0114
  254 + knn_image_exponent: 5.2496
255 255 fine_rank:
256 256 enabled: false
257 257 input_window: 160
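The hunk above copies five `bo_012` values into `coarse_rank.fusion`: `knn_exponent` plus the `knn_text_*` and `knn_image_*` bias/exponent pairs. The unchanged context keeps `knn_bias: 0.6`, `knn_image_weight: 2.0` and `knn_tie_breaker: 0.3`. The sketch below reproduces that overlay using only the keys visible in this hunk, leaving out the rest of the block, so you can see which values still differ from the full `bo_012` preset. The `overlay` helper is hypothetical and not part of the repo.

```python
# Keys of coarse_rank.fusion visible in the hunk, with their values before this commit.
OLD_FUSION = {
    "knn_image_weight": 2.0,
    "knn_tie_breaker": 0.3,
    "knn_bias": 0.6,
    "knn_exponent": 0.4,
    "knn_text_bias": 0.2,
    "knn_text_exponent": 0.0,
    "knn_image_bias": 0.2,
    "knn_image_exponent": 0.0,
}

# knn-related values of bo_012, copied from the parameter list in the commit message.
BO_012 = {
    "knn_image_weight": 1.3565,
    "knn_tie_breaker": 0.212,
    "knn_bias": 0.0052,
    "knn_exponent": 4.4639,
    "knn_text_bias": 0.1148,
    "knn_text_exponent": 1.0926,
    "knn_image_bias": 0.0114,
    "knn_image_exponent": 5.2496,
}

# The five keys this commit changes.
KNN_TAIL = ("knn_exponent", "knn_text_bias", "knn_text_exponent",
            "knn_image_bias", "knn_image_exponent")


def overlay(base, preset, keys):
    """Return a copy of `base` with `keys` taken from `preset`; other keys are untouched."""
    merged = dict(base)
    for key in keys:
        merged[key] = preset[key]
    return merged


new_fusion = overlay(OLD_FUSION, BO_012, KNN_TAIL)
# Keys that still differ from bo_012 after the overlay.
print(sorted(k for k in BO_012 if new_fusion[k] != BO_012[k]))
# prints ['knn_bias', 'knn_image_weight', 'knn_tie_breaker']
```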
docs/issues/issue-2026-04-16-bayes寻参-clothing_top771数据集上寻参.md
@@ -87,3 +87,175 @@ Prompt - 1
87 87  
88 88  
89 89 Response - 1
  90 +Done: this `run_dir` has finished, and no tuning process is still running.
  91 +
  92 +Result files:
  93 +- [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md)
  94 +- [leaderboard.csv](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv)
  95 +- [trials.jsonl](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl)
  96 +
  97 +**Conclusions**
  98 +1. All 18 live evals in this round ran to completion.
  99 + See [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:11)
  100 +
  101 +2. On the large set `clothing_top771`, the best is still the fusion-only baseline, with `Primary_Metric_Score=0.487476`.
  102 + See [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:17)
  103 +
  104 +3. The optimizer came very close to the baseline this round but did not beat it.
  105 + The closest were:
  106 + - `bo_012 = 0.485027`
  107 + - `bo_018 = 0.484691`
  108 + - `bo_013 = 0.483849`
  109 + Their gaps to the baseline are `-0.002449`, `-0.002785`, and `-0.003627` respectively.
  110 +
  111 +4. The "extreme parameters" found on the small set did not transfer.
  112 + Typical examples:
  113 + - `seed_low_knn_global = 0.447485`
  114 + - `seed_legacy_bo340 = 0.477537`
  115 + - `seed_bigset_text_stable = 0.449802`
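  116 + This shows the large set does not support the noticeably more aggressive shapes found earlier.
  117 +
  118 +**How to read this round**
  119 +The main takeaway of this round is not a new optimum but a clear picture of the effective parameter region on the large set.
  120 +
  121 +Candidates that did well on the large set differ in their exact values but share several traits:
  122 +
  123 +1. `text_bias` is generally very low.
  124 + Most top-ranked candidates sit around `0.02 ~ 0.10`, so the baseline's `0.1` is not at a disadvantage.
  125 + This is clearly the opposite direction from `text_bias=4.0` on the small set.
  126 +
  127 +2. `es_exponent` tends to be higher than in the baseline.
  128 + The baseline uses `0.05`, while many top candidates sit at `0.15 ~ 0.28`.
  129 + This suggests that moderately strengthening the lexical path is worthwhile on the large set, but the gain is not yet enough to offset side effects from the other terms.
  130 +
  131 +3. `text_translation_weight` tends to be higher than in the baseline.
  132 + Most top candidates sit at `1.3 ~ 1.7`, clearly above the baseline's `1.0`.
  133 + This is one of the signals from this round most worth keeping.
  134 +
  135 +4. `knn_image_weight` can be higher than in the baseline, but "strong image + extreme nonlinearity" alone cannot push the score up.
  136 + Many near-optimal points raised `knn_image_weight` to `4+` and still did not beat the baseline.
  137 + So the image path helps, but it has to be balanced against the other terms.
  138 +
  139 +5. A low `knn_bias` is a signal, but not a sufficient condition.
  140 + Many high-scoring candidates push `knn_bias` very low, even close to `0`.
  141 + But the "low `knn_bias` + low `knn_exponent`" route on its own does not work, as `seed_low_knn_global` already showed.
  142 +
  143 +6. Once `knn_text_exponent` and `knn_image_exponent` are freed from `0`, candidates get close to the baseline, but no stable advantage has emerged yet.
  144 + Top candidates often push these two up, so they are not useless; judging from the results, though, they act more as compensating terms than as decisive sources of gain.
  145 +
  146 +**What this round tells us about the overfitting concern raised last time**
  147 +We can say fairly clearly:
  148 +
  149 +- The extreme solution found on the 54-query small set does show clear signs of overfitting.
  150 +- On the 771-query large set, the most robust option is still a "mild fusion" scheme like the original baseline.
  151 +- The baseline is not the absolute best shape either, though; at least two directions are worth narrowing in on:
  152 + - a higher `es_exponent`
  153 + - a higher `text_translation_weight`
  154 +
  155 +**Suggested narrowing for the next round**
  156 +I suggest not running another 14-dimensional long search of the same width; we already have enough information. The next round can be a "fine-tune around the baseline".
  157 +
  158 +Suggested narrowed ranges: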
  159 +
  160 +- `es_bias: 7.5 ~ 11.5`
  161 +- `es_exponent: 0.12 ~ 0.28`
  162 +- `text_bias: 0.01 ~ 0.15`
  163 +- `text_exponent: 0.20 ~ 0.55`
  164 +- `text_translation_weight: 1.20 ~ 1.75`
  165 +- `knn_text_weight: 0.75 ~ 1.35`
  166 +- `knn_image_weight: 1.2 ~ 4.8`
  167 +- `knn_tie_breaker: 0.18 ~ 0.32`
  168 +- `knn_bias: 0.001 ~ 0.25`
  169 +- `knn_exponent: 3.5 ~ 8.0`
  170 +- `knn_text_bias: 0.001 ~ 0.3`
  171 +- `knn_text_exponent: 0.4 ~ 1.3`
  172 +- `knn_image_bias: 0.001 ~ 0.35`
  173 +- `knn_image_exponent: 2.0 ~ 5.8`
  174 +
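  175 +To cut cost more aggressively, I suggest freezing (or tightly narrowing) these:
  176 +- `text_bias=0.1`
  177 +- `knn_tie_breaker` within a small range, `0.24~0.30`
  178 +- `knn_text_bias` narrowed to a very small range
  179 +- `knn_image_bias` narrowed to a very small range
  180 +
  181 +and then concentrate the search on these 6: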
  182 +- `es_exponent`
  183 +- `text_translation_weight`
  184 +- `knn_image_weight`
  185 +- `knn_bias`
  186 +- `knn_exponent`
  187 +- `knn_text_exponent`
  188 +
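  189 +**Other findings**
  190 +There is also a pipeline problem: in the second half of this round, `bo_018` failed repeatedly before it succeeded, mainly because of:
  191 +- `build_annotation_set.py batch` exiting non-zero
  192 +- a few `restart.sh backend` non-zero exits
  193 +
  194 +The supervisor logic still carried the job to completion, as both the failures in [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:30) and the daemon log show. Automatic resumption works, but the evaluation pipeline itself is not yet clean enough.
  195 +
  196 +If you like, I can go straight on to one of these two next:
  197 +1. Generate a narrowed `clothing_top771_v2` search space from this round's conclusions and launch the next round
  198 +2. First fix the intermittent `build_annotation_set.py batch` failures, to cut wasted retries during the next long run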
  199 +
  200 +
  201 +
  202 +**Parameter list (4 sets)**
  203 +- `baseline` (best on top771, `seed_baseline`)
  204 + - `es_bias: 10.0`, `es_exponent: 0.05`
  205 + - `text_bias: 0.1`, `text_exponent: 0.35`, `text_translation_weight: 1.0`
  206 + - `knn_text_weight: 1.0`, `knn_image_weight: 2.0`, `knn_tie_breaker: 0.3`
  207 + - `knn_bias: 0.2`, `knn_exponent: 5.6`
  208 + - `knn_text_bias: 0.2`, `knn_text_exponent: 0.0`
  209 + - `knn_image_bias: 0.2`, `knn_image_exponent: 0.0`
  210 +- `extreme solution from the 54-query set` (`seed_legacy_bo234`)
  211 + - `es_bias: 7.214`, `es_exponent: 0.2025`
  212 + - `text_bias: 4.0`, `text_exponent: 1.584`, `text_translation_weight: 1.4441`
  213 + - `knn_text_weight: 0.1`, `knn_image_weight: 5.6232`, `knn_tie_breaker: 0.021`
  214 + - `knn_bias: 0.0019`, `knn_exponent: 11.8477`
  215 + - `knn_text_bias: 2.3125`, `knn_text_exponent: 1.1547`
  216 + - `knn_image_bias: 0.9641`, `knn_image_exponent: 5.8671`
  217 +- `bo_012` (`Primary_Metric_Score=0.485027`)
  218 + - `es_bias: 6.6233`, `es_exponent: 0.2377`
  219 + - `text_bias: 0.049`, `text_exponent: 0.4446`, `text_translation_weight: 1.6236`
  220 + - `knn_text_weight: 1.0344`, `knn_image_weight: 1.3565`, `knn_tie_breaker: 0.212`
  221 + - `knn_bias: 0.0052`, `knn_exponent: 4.4639`
  222 + - `knn_text_bias: 0.1148`, `knn_text_exponent: 1.0926`
  223 + - `knn_image_bias: 0.0114`, `knn_image_exponent: 5.2496`
  224 +- `bo_018` (`Primary_Metric_Score=0.484691`)
  225 + - `es_bias: 8.8861`, `es_exponent: 0.2794`
  226 + - `text_bias: 0.0189`, `text_exponent: 0.2`, `text_translation_weight: 1.7178`
  227 + - `knn_text_weight: 1.7459`, `knn_image_weight: 4.2658`, `knn_tie_breaker: 0.2814`
  228 + - `knn_bias: 0.001`, `knn_exponent: 1.4923`
  229 + - `knn_text_bias: 4.0`, `knn_text_exponent: 0.9309`
  230 + - `knn_image_bias: 0.01`, `knn_image_exponent: 5.8289`
  231 +
  232 +**How to find them (reproducible)**
  233 +- From `leaderboard.csv` (one row holds the score plus every parameter): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
  234 + - e.g. `rg '^2,bo_012,' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
  235 +- From `trials.jsonl` (the most authoritative source: the params the tuner actually wrote): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  236 + - e.g. `rg '"name": "bo_012"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  237 + - e.g. `rg '"name": "seed_legacy_bo234"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  238 +
  239 +**Added to `config.yaml`**
  240 +- I added these 4 parameter sets as commented-out presets next to `coarse_rank.fusion`: `config/config.yaml:236`
  241 +- Note: the effective `coarse_rank.fusion` values in your current `config/config.yaml` are `knn_bias=0.6 / knn_exponent=0.4`, which look more like `seed_low_knn_global` than the baseline that was best on this large set.
  242 +
  243 +
  244 +
  245 +
  246 +
  247 +
  248 +# request 2
  249 +
  250 +
  251 +Fix these parameters (they no longer take part in the search):
  252 +es_bias: 6.62, es_exponent: 0.24
  253 +text_bias: 0.05, text_exponent: 0.445, text_translation_weight: 1.0
  254 +knn_text_weight: 1.0, knn_image_weight: 1.35, knn_tie_breaker: 0.212
  255 +knn_bias: 0.0052
  256 +
  257 +Then search over the following parameters:
  258 +knn_exponent (0.3 ~ 6.0)
  259 +knn_text_bias (0.0 ~ 0.3), knn_text_exponent (0.2 ~ 3.0)
  260 +knn_image_bias (0.0 ~ 0.3), knn_image_exponent (1.0 ~ 7.0)
  261 +Design the search script and start it. After it starts, check that it is running stably, and make sure it can keep running for a long time until the whole run finishes.
90 262 \ No newline at end of file
scripts/evaluation/resume_coarse_fusion_tuning_knn_tail.sh 0 → 100755
@@ -0,0 +1,72 @@
  1 +#!/bin/bash
  2 +
  3 +set -euo pipefail
  4 +
  5 +if [ "$#" -lt 1 ]; then
  6 + echo "usage: $0 <run_dir_or_name>" >&2
  7 + exit 1
  8 +fi
  9 +
  10 +cd "$(dirname "$0")/../.."
  11 +source ./activate.sh
  12 +
  13 +TARGET="$1"
  14 +
  15 +if [ -d "${TARGET}" ]; then
  16 + RUN_DIR="${TARGET}"
  17 + RUN_NAME="$(basename "${RUN_DIR}")"
  18 +else
  19 + RUN_NAME="${TARGET}"
  20 + RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}"
  21 +fi
  22 +
  23 +if [ ! -d "${RUN_DIR}" ]; then
  24 + echo "run dir not found: ${RUN_DIR}" >&2
  25 + exit 1
  26 +fi
  27 +
  28 +DATASET_ID="${REPO_EVAL_DATASET_ID:-clothing_top771}"
  29 +MAX_EVALS="${MAX_EVALS:-20}"
  30 +BATCH_SIZE="${BATCH_SIZE:-2}"
  31 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-96}"
  32 +RANDOM_SEED="${RANDOM_SEED:-20260424}"
  33 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  34 +
  35 +LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
  36 +mkdir -p "${LAUNCH_DIR}"
  37 +LOG_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.log"
  38 +PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.pid"
  39 +CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.cmd"
  40 +
  41 +CMD=(
  42 + bash
  43 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  44 + "${RUN_NAME}"
  45 + "${DATASET_ID}"
  46 + "${MAX_EVALS}"
  47 + "${BATCH_SIZE}"
  48 + "${CANDIDATE_POOL_SIZE}"
  49 + "${RANDOM_SEED}"
  50 + "${RUN_DIR}/search_space.yaml"
  51 + ""
  52 + "${RUN_DIR}"
  53 +)
  54 +
  55 +export BATCH_EVAL_TIMEOUT_SEC
  56 +
  57 +printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
  58 +printf '\n' >> "${CMD_PATH}"
  59 +
  60 +setsid "${CMD[@]}" > "${LOG_PATH}" 2>&1 < /dev/null &
  61 +PID=$!
  62 +echo "${PID}" > "${PID_PATH}"
  63 +
  64 +echo "run_name=${RUN_NAME}"
  65 +echo "pid=${PID}"
  66 +echo "log=${LOG_PATH}"
  67 +echo "pid_file=${PID_PATH}"
  68 +echo "cmd_file=${CMD_PATH}"
  69 +echo "run_dir=${RUN_DIR}"
  70 +echo
  71 +echo "tail -f ${LOG_PATH}"
  72 +echo "cat ${RUN_DIR}/leaderboard.csv"
scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
@@ -29,9 +29,48 @@ RESTART_SLEEP_SEC="${RESTART_SLEEP_SEC:-30}"
29 29 SEARCH_BASE_URL="${SEARCH_BASE_URL:-http://127.0.0.1:6002}"
30 30 EVAL_WEB_BASE_URL="${EVAL_WEB_BASE_URL:-http://127.0.0.1:6010}"
31 31 RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}"
  32 +LOCK_DIR="${RUN_DIR}/.resilient_lock"
  33 +HEALTH_POLL_SEC="${HEALTH_POLL_SEC:-15}"
32 34  
33 35 mkdir -p "$(dirname "$RUN_DIR")"
34 36  
  37 +release_lock() {
  38 + if [ -d "$LOCK_DIR" ] && [ -f "$LOCK_DIR/pid" ] && [ "$(cat "$LOCK_DIR/pid" 2>/dev/null || true)" = "$$" ]; then
  39 + rm -rf "$LOCK_DIR"
  40 + fi
  41 +}
  42 +
  43 +acquire_lock() {
  44 + mkdir -p "$RUN_DIR"
  45 + if mkdir "$LOCK_DIR" 2>/dev/null; then
  46 + echo "$$" > "$LOCK_DIR/pid"
  47 + date -u +%Y-%m-%dT%H:%M:%SZ > "$LOCK_DIR/started_at"
  48 + return 0
  49 + fi
  50 +
  51 + local owner_pid=""
  52 + if [ -f "$LOCK_DIR/pid" ]; then
  53 + owner_pid="$(cat "$LOCK_DIR/pid" 2>/dev/null || true)"
  54 + fi
  55 + if [ -n "$owner_pid" ] && kill -0 "$owner_pid" 2>/dev/null; then
  56 + echo "[resilient] lock already held by pid=${owner_pid}, exiting"
  57 + exit 0
  58 + fi
  59 +
  60 + echo "[resilient] removing stale lock at ${LOCK_DIR}"
  61 + rm -rf "$LOCK_DIR"
  62 + if mkdir "$LOCK_DIR" 2>/dev/null; then
  63 + echo "$$" > "$LOCK_DIR/pid"
  64 + date -u +%Y-%m-%dT%H:%M:%SZ > "$LOCK_DIR/started_at"
  65 + return 0
  66 + fi
  67 +
  68 + echo "[resilient] failed to acquire lock at ${LOCK_DIR}"
  69 + exit 1
  70 +}
  71 +
  72 +trap release_lock EXIT; trap 'exit 130' INT; trap 'exit 143' TERM
  73 +
35 74 count_live_successes() {
36 75 python3 - "$RUN_DIR" <<'PY'
37 76 import json
@@ -53,13 +92,61 @@ print(count)
53 92 PY
54 93 }
55 94  
  95 +wait_for_health() {
  96 + local url="$1"
  97 + local timeout_sec="$2"
  98 + local deadline=$(( $(date +%s) + timeout_sec ))
  99 + while [ "$(date +%s)" -lt "$deadline" ]; do
  100 + if curl -fsS "$url" >/dev/null 2>&1; then
  101 + return 0
  102 + fi
  103 + sleep 2
  104 + done
  105 + return 1
  106 +}
  107 +
  108 +ensure_services() {
  109 + if ! wait_for_health "${SEARCH_BASE_URL}/health" 20; then
  110 + echo "[resilient] backend unhealthy, restarting backend"
  111 + ./restart.sh backend || true
  112 + sleep 5
  113 + fi
  114 + if ! wait_for_health "${SEARCH_BASE_URL}/health" 180; then
  115 + echo "[resilient] backend still unhealthy after restart"
  116 + return 1
  117 + fi
  118 +
  119 + if ! wait_for_health "${EVAL_WEB_BASE_URL}/api/history" 20; then
  120 + echo "[resilient] eval-web unhealthy, restarting eval-web"
  121 + ./restart.sh eval-web || true
  122 + sleep 5
  123 + fi
  124 + if ! wait_for_health "${EVAL_WEB_BASE_URL}/api/history" 180; then
  125 + echo "[resilient] eval-web still unhealthy after restart"
  126 + return 1
  127 + fi
  128 + return 0
  129 +}
  130 +
  131 +heal_services_nonblocking() {
  132 + if ! curl -fsS "${SEARCH_BASE_URL}/health" >/dev/null 2>&1; then
  133 + echo "[resilient] backend became unhealthy during run, restarting backend"
  134 + ./restart.sh backend || true
  135 + sleep 5
  136 + fi
  137 + if ! curl -fsS "${EVAL_WEB_BASE_URL}/api/history" >/dev/null 2>&1; then
  138 + echo "[resilient] eval-web became unhealthy during run, restarting eval-web"
  139 + ./restart.sh eval-web || true
  140 + sleep 5
  141 + fi
  142 +}
  143 +
56 144 build_cmd() {
57 145 local cmd=(
58 146 python
59 147 scripts/evaluation/tune_fusion.py
60 148 --mode optimize
61 149 --search-space "$SEARCH_SPACE"
62   - --seed-report "$SEED_REPORT"
63 150 --tenant-id 163
64 151 --dataset-id "$DATASET_ID"
65 152 --queries-file scripts/evaluation/queries/queries.txt
@@ -73,6 +160,9 @@ build_cmd() {
73 160 --random-seed "$RANDOM_SEED"
74 161 --batch-eval-timeout-sec "$BATCH_EVAL_TIMEOUT_SEC"
75 162 )
  163 + if [ -n "$SEED_REPORT" ]; then
  164 + cmd+=(--seed-report "$SEED_REPORT")
  165 + fi
76 166 if [ -n "$RESUME_RUN_DIR" ]; then
77 167 cmd+=(--resume-run "$RESUME_RUN_DIR")
78 168 else
@@ -83,6 +173,7 @@ build_cmd() {
83 173 }
84 174  
85 175 attempt=0
  176 +acquire_lock
86 177 while true; do
87 178 live_successes="$(count_live_successes)"
88 179 if [ "$live_successes" -ge "$MAX_EVALS" ]; then
@@ -96,11 +187,23 @@ while true; do
96 187 fi
97 188  
98 189 echo "[resilient] attempt=$attempt run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS"
  190 + if ! ensure_services; then
  191 + echo "[resilient] service preflight failed, sleeping ${RESTART_SLEEP_SEC}s before retry"
  192 + sleep "$RESTART_SLEEP_SEC"
  193 + continue
  194 + fi
99 195 CMD_STR="$(build_cmd)"
100 196 echo "[resilient] cmd=$CMD_STR"
101 197  
102 198 set +e
103   - bash -lc "$CMD_STR"
  199 + bash -lc "$CMD_STR" &
  200 + child_pid=$!
  201 + echo "[resilient] child_pid=${child_pid}"
  202 + while kill -0 "$child_pid" 2>/dev/null; do
  203 + heal_services_nonblocking
  204 + sleep "$HEALTH_POLL_SEC"
  205 + done
  206 + wait "$child_pid"
104 207 exit_code=$?
105 208 set -e
106 209  
@@ -112,6 +215,9 @@ while true; do
112 215 exit 0
113 216 fi
114 217  
  218 + if ! ensure_services; then
  219 + echo "[resilient] service recovery failed after exit_code=$exit_code"
  220 + fi
115 221 echo "[resilient] sleeping ${RESTART_SLEEP_SEC}s before resume"
116 222 sleep "$RESTART_SLEEP_SEC"
117 223 done
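The new lock in `run_coarse_fusion_tuning_resilient.sh` works because `mkdir` is atomic: only one process can create `.resilient_lock`. A lock whose recorded PID is no longer alive is treated as stale and replaced. The sketch below restates that logic in Python for illustration only; the function names are hypothetical. Unlike the script, which exits 0 when the lock is held, `acquire_lock` here returns `False`. Like the shell version, it is not race-free: a second process that sees the directory before the `pid` file is written will treat the lock as stale.

```python
import os
import shutil


def pid_alive(pid: int) -> bool:
    """Best-effort equivalent of `kill -0 PID` (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_owner(lock_dir: str):
    """Return the PID stored in lock_dir/pid, or None if missing or malformed."""
    try:
        with open(os.path.join(lock_dir, "pid"), encoding="utf-8") as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def acquire_lock(lock_dir: str) -> bool:
    """Take a mkdir-based lock; return False if a live process already holds it."""
    for _ in range(2):  # the second pass only happens after clearing a stale lock
        try:
            os.mkdir(lock_dir)  # atomic: exactly one caller can create it
        except FileExistsError:
            owner = read_owner(lock_dir)
            if owner is not None and owner > 0 and pid_alive(owner):
                return False
            shutil.rmtree(lock_dir, ignore_errors=True)  # stale: owner is gone
            continue
        with open(os.path.join(lock_dir, "pid"), "w", encoding="utf-8") as fh:
            fh.write(str(os.getpid()))
        return True
    return False


def release_lock(lock_dir: str) -> None:
    """Remove the lock only if this process owns it, as release_lock does."""
    if read_owner(lock_dir) == os.getpid():
        shutil.rmtree(lock_dir, ignore_errors=True)
```

As in the script, the parent directory must already exist; the shell version ensures this with `mkdir -p "$RUN_DIR"`.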
scripts/evaluation/start_coarse_fusion_tuning_knn_tail.sh 0 → 100755
@@ -0,0 +1,53 @@
  1 +#!/bin/bash
  2 +
  3 +set -euo pipefail
  4 +
  5 +cd "$(dirname "$0")/../.."
  6 +source ./activate.sh
  7 +
  8 +RUN_NAME="${RUN_NAME:-coarse_fusion_clothing_top771_knn_tail_$(date -u +%Y%m%dT%H%M%SZ)}"
  9 +DATASET_ID="${REPO_EVAL_DATASET_ID:-clothing_top771}"
  10 +SEARCH_SPACE="${SEARCH_SPACE:-scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml}"
  11 +MAX_EVALS="${MAX_EVALS:-20}"
  12 +BATCH_SIZE="${BATCH_SIZE:-2}"
  13 +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-96}"
  14 +RANDOM_SEED="${RANDOM_SEED:-20260424}"
  15 +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}"
  16 +
  17 +LAUNCH_DIR="artifacts/search_evaluation/tuning_launches"
  18 +mkdir -p "${LAUNCH_DIR}"
  19 +LOG_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.log"
  20 +PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.pid"
  21 +CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.cmd"
  22 +
  23 +CMD=(
  24 + bash
  25 + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
  26 + "${RUN_NAME}"
  27 + "${DATASET_ID}"
  28 + "${MAX_EVALS}"
  29 + "${BATCH_SIZE}"
  30 + "${CANDIDATE_POOL_SIZE}"
  31 + "${RANDOM_SEED}"
  32 + "${SEARCH_SPACE}"
  33 + ""
  34 +)
  35 +
  36 +export BATCH_EVAL_TIMEOUT_SEC
  37 +
  38 +printf '%q ' "${CMD[@]}" > "${CMD_PATH}"
  39 +printf '\n' >> "${CMD_PATH}"
  40 +
  41 +setsid "${CMD[@]}" > "${LOG_PATH}" 2>&1 < /dev/null &
  42 +PID=$!
  43 +echo "${PID}" > "${PID_PATH}"
  44 +
  45 +echo "run_name=${RUN_NAME}"
  46 +echo "pid=${PID}"
  47 +echo "log=${LOG_PATH}"
  48 +echo "pid_file=${PID_PATH}"
  49 +echo "cmd_file=${CMD_PATH}"
  50 +echo "run_dir=artifacts/search_evaluation/tuning_runs/${RUN_NAME}"
  51 +echo
  52 +echo "tail -f ${LOG_PATH}"
  53 +echo "cat artifacts/search_evaluation/tuning_runs/${RUN_NAME}/leaderboard.csv"
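The launcher writes the daemon PID to `${LAUNCH_DIR}/${RUN_NAME}.daemon.pid` and sends all output to `${RUN_NAME}.daemon.log`. Those two files are what the issue's request to "check that it is running stably" can work from. Below is a hypothetical helper sketch, not part of the repo, that checks whether the recorded PID is alive and shows the tail of the log. A recycled PID would be reported as alive, so treat it as a quick check only.

```python
import os
from collections import deque


def daemon_running(pid_path: str) -> bool:
    """True if the PID recorded in a *.daemon.pid file belongs to a live process.

    A missing or malformed pid file counts as "not running".
    """
    try:
        with open(pid_path, encoding="utf-8") as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # like `kill -0`: checks existence, sends nothing (POSIX)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # alive, but owned by another user
    return True


def tail(log_path: str, n: int = 5) -> list:
    """Return the last n lines of a log file, similar to `tail -n`."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

For example, poll `daemon_running("artifacts/search_evaluation/tuning_launches/<run_name>.daemon.pid")` every few minutes, and use `tail(...)` on the matching `.daemon.log` to confirm that new `[resilient] attempt=...` lines keep appearing.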
scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml 0 → 100644
@@ -0,0 +1,82 @@
  1 +target_path: coarse_rank.fusion
  2 +
  3 +baseline:
  4 + es_bias: 6.62
  5 + es_exponent: 0.24
  6 + text_bias: 0.05
  7 + text_exponent: 0.445
  8 + text_translation_weight: 1.0
  9 + knn_text_weight: 1.0
  10 + knn_image_weight: 1.35
  11 + knn_tie_breaker: 0.212
  12 + knn_bias: 0.0052
  13 + knn_exponent: 4.4639
  14 + knn_text_bias: 0.1148
  15 + knn_text_exponent: 1.0926
  16 + knn_image_bias: 0.0114
  17 + knn_image_exponent: 5.2496
  18 +
  19 +parameters:
  20 + knn_exponent: {min: 0.3, max: 6.0, scale: linear, round: 4}
  21 + knn_text_bias: {min: 0.0, max: 0.3, scale: linear, round: 4}
  22 + knn_text_exponent: {min: 0.2, max: 3.0, scale: linear, round: 4}
  23 + knn_image_bias: {min: 0.0, max: 0.3, scale: linear, round: 4}
  24 + knn_image_exponent: {min: 1.0, max: 7.0, scale: linear, round: 4}
  25 +
  26 +seed_experiments:
  27 + - name: seed_fixed_anchor
  28 + description: Anchored on the 5 knn sub-parameters of bo_012, to check whether they transfer under the new fixed parameters.
  29 + params:
  30 + knn_exponent: 4.4639
  31 + knn_text_bias: 0.1148
  32 + knn_text_exponent: 1.0926
  33 + knn_image_bias: 0.0114
  34 + knn_image_exponent: 5.2496
  35 + - name: seed_knn_soft
  36 + description: Smoother global knn exponent, keeping a strong image sub-branch exponent.
  37 + params:
  38 + knn_exponent: 1.2
  39 + knn_text_bias: 0.06
  40 + knn_text_exponent: 0.9
  41 + knn_image_bias: 0.02
  42 + knn_image_exponent: 5.4
  43 + - name: seed_knn_balanced
  44 + description: Moderate knn exponent and moderate sub-branch nonlinearity, as a robust center point.
  45 + params:
  46 + knn_exponent: 2.8
  47 + knn_text_bias: 0.12
  48 + knn_text_exponent: 1.4
  49 + knn_image_bias: 0.05
  50 + knn_image_exponent: 4.2
  51 + - name: seed_knn_high
  52 + description: Higher global knn exponent, to check whether the large set still prefers a steeper top-rank boost.
  53 + params:
  54 + knn_exponent: 5.6
  55 + knn_text_bias: 0.04
  56 + knn_text_exponent: 0.8
  57 + knn_image_bias: 0.03
  58 + knn_image_exponent: 5.0
  59 + - name: seed_text_branch_heavier
  60 + description: Raises the knn_text sub-branch bias and exponent to find the balance point between the text and image sub-branches.
  61 + params:
  62 + knn_exponent: 3.6
  63 + knn_text_bias: 0.22
  64 + knn_text_exponent: 2.2
  65 + knn_image_bias: 0.01
  66 + knn_image_exponent: 3.2
  67 + - name: seed_image_branch_heavier
  68 + description: Raises the knn_image sub-branch bias and exponent to probe the ceiling of the image path under the current fixed main parameters.
  69 + params:
  70 + knn_exponent: 3.4
  71 + knn_text_bias: 0.03
  72 + knn_text_exponent: 0.6
  73 + knn_image_bias: 0.16
  74 + knn_image_exponent: 6.2
  75 +
  76 +optimizer:
  77 + init_random: 2
  78 + candidate_pool_size: 96
  79 + explore_probability: 0.14
  80 + local_jitter_probability: 0.62
  81 + elite_fraction: 0.3
  82 + min_normalized_distance: 0.06
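All six `seed_experiments` lie inside the declared `parameters` bounds. The sketch below checks that, and also shows one plausible reading of `scale: linear, round: 4` and of a normalized distance between candidates. This commit does not show how `tune_fusion.py` actually samples candidates or measures the distance behind `min_normalized_distance`, so `sample` and `normalized_distance` are assumptions, not the tuner's code. To keep the sketch dependency-free, the values are copied from the YAML rather than parsed.

```python
import math
import random

# Bounds copied from the `parameters` section above (all scale: linear, round: 4).
BOUNDS = {
    "knn_exponent": (0.3, 6.0),
    "knn_text_bias": (0.0, 0.3),
    "knn_text_exponent": (0.2, 3.0),
    "knn_image_bias": (0.0, 0.3),
    "knn_image_exponent": (1.0, 7.0),
}

# Seed values copied from `seed_experiments`, in BOUNDS key order.
SEEDS = {
    "seed_fixed_anchor": (4.4639, 0.1148, 1.0926, 0.0114, 5.2496),
    "seed_knn_soft": (1.2, 0.06, 0.9, 0.02, 5.4),
    "seed_knn_balanced": (2.8, 0.12, 1.4, 0.05, 4.2),
    "seed_knn_high": (5.6, 0.04, 0.8, 0.03, 5.0),
    "seed_text_branch_heavier": (3.6, 0.22, 2.2, 0.01, 3.2),
    "seed_image_branch_heavier": (3.4, 0.03, 0.6, 0.16, 6.2),
}


def as_params(values):
    """Pair a tuple of seed values with the parameter names."""
    return dict(zip(BOUNDS, values))


def out_of_bounds(params):
    """Names of parameters whose value falls outside [min, max]."""
    return [k for k, v in params.items() if not BOUNDS[k][0] <= v <= BOUNDS[k][1]]


def sample(rng):
    """Assumed reading of `scale: linear, round: 4`: uniform draw, 4 decimals."""
    return {k: round(rng.uniform(lo, hi), 4) for k, (lo, hi) in BOUNDS.items()}


def normalized_distance(a, b):
    """Assumed metric: RMS of per-parameter differences scaled by (max - min)."""
    terms = [((a[k] - b[k]) / (hi - lo)) ** 2 for k, (lo, hi) in BOUNDS.items()]
    return math.sqrt(sum(terms) / len(terms))


for name, values in SEEDS.items():
    print(name, out_of_bounds(as_params(values)))  # every seed prints []
```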
... ...