Commit 935f6e1b1df162953f7ae96de7ee27702eb8f2da
1 parent: d3dd01d3
coarse_rank parameter search results

**Parameter sets (4)**
- `baseline` (best on top771, `seed_baseline`)
  - `es_bias: 10.0`, `es_exponent: 0.05`
  - `text_bias: 0.1`, `text_exponent: 0.35`, `text_translation_weight: 1.0`
  - `knn_text_weight: 1.0`, `knn_image_weight: 2.0`, `knn_tie_breaker: 0.3`
  - `knn_bias: 0.2`, `knn_exponent: 5.6`
  - `knn_text_bias: 0.2`, `knn_text_exponent: 0.0`
  - `knn_image_bias: 0.2`, `knn_image_exponent: 0.0`
- `extreme solution from the 54-entry set` (`seed_legacy_bo234`)
  - `es_bias: 7.214`, `es_exponent: 0.2025`
  - `text_bias: 4.0`, `text_exponent: 1.584`, `text_translation_weight: 1.4441`
  - `knn_text_weight: 0.1`, `knn_image_weight: 5.6232`, `knn_tie_breaker: 0.021`
  - `knn_bias: 0.0019`, `knn_exponent: 11.8477`
  - `knn_text_bias: 2.3125`, `knn_text_exponent: 1.1547`
  - `knn_image_bias: 0.9641`, `knn_image_exponent: 5.8671`
- `bo_012` (`Primary_Metric_Score=0.485027`)
  - `es_bias: 6.6233`, `es_exponent: 0.2377`
  - `text_bias: 0.049`, `text_exponent: 0.4446`, `text_translation_weight: 1.6236`
  - `knn_text_weight: 1.0344`, `knn_image_weight: 1.3565`, `knn_tie_breaker: 0.212`
  - `knn_bias: 0.0052`, `knn_exponent: 4.4639`
  - `knn_text_bias: 0.1148`, `knn_text_exponent: 1.0926`
  - `knn_image_bias: 0.0114`, `knn_image_exponent: 5.2496`
- `bo_018` (`Primary_Metric_Score=0.484691`)
  - `es_bias: 8.8861`, `es_exponent: 0.2794`
  - `text_bias: 0.0189`, `text_exponent: 0.2`, `text_translation_weight: 1.7178`
  - `knn_text_weight: 1.7459`, `knn_image_weight: 4.2658`, `knn_tie_breaker: 0.2814`
  - `knn_bias: 0.001`, `knn_exponent: 1.4923`
  - `knn_text_bias: 4.0`, `knn_text_exponent: 0.9309`
  - `knn_image_bias: 0.01`, `knn_image_exponent: 5.8289`

**How to find them (reproducible)**
- From `leaderboard.csv` (score and parameters together on one row): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
  - Example: `rg '^2,bo_012,' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv`
- From `trials.jsonl` (the most authoritative source: the params the tuner actually wrote): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  - Example: `rg '\"name\": \"bo_012\"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`
  - Example: `rg '\"name\": \"seed_legacy_bo234\"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl`

**Added to `config.yaml`**
- I added these 4 parameter sets as commented-out presets next to `coarse_rank.fusion`: `config/config.yaml:236`
- Note: the values currently in effect for `coarse_rank.fusion` in your `config/config.yaml` are `knn_bias=0.6 / knn_exponent=0.4`, which looks more like `seed_low_knn_global` than the baseline that scored best on the large set in this run.
Showing 9 changed files with 497 additions and 9 deletions
README.md
| ... | ... | @@ -46,10 +46,11 @@ source activate.sh |
| 46 | 46 | - `6002` backend (`/search/*`, `/admin/*`) |
| 47 | 47 | - `6003` frontend |
| 48 | 48 | - `6004` indexer (`/indexer/*`) |
| 49 | +- `6006` translator | |
| 49 | 50 | - `6005` embedding-text (optional, `POST /embed/text`; the backend is usually TEI, default `8080`) |
| 50 | -- `6006` translator (optional) | |
| 51 | -- `6007` reranker (optional, `POST /rerank`; fine ranking can use a separate `service_profile` from the main rerank, see `config.yaml` → `fine_rank` / `services.rerank`) | |
| 52 | 51 | - `6008` embedding-image (optional, `POST /embed/image`, etc.) |
| 52 | +- `6007` reranker | |
| 53 | +- `6009` fine_rank | |
| 53 | 54 | - `6010` eval-web (search evaluation UI; service name `eval-web` in `./scripts/service_ctl.sh`) |
| 54 | 55 | |
| 55 | 56 | For more complete examples, see `docs/QUICKSTART.md`. | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_knn_tail_20260424T093837Z.daemon.cmd
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +bash scripts/evaluation/run_coarse_fusion_tuning_resilient.sh coarse_fusion_clothing_top771_knn_tail_20260424T093837Z clothing_top771 20 2 96 20260424 scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml '' | ... | ... |
artifacts/search_evaluation/tuning_launches/coarse_fusion_clothing_top771_knn_tail_20260424T093837Z.daemon.pid
0 → 100644
| ... | ... | @@ -0,0 +1 @@ |
| 1 | +1355252 | ... | ... |
config/config.yaml
| ... | ... | @@ -247,11 +247,11 @@ coarse_rank: |
| 247 | 247 | knn_image_weight: 2.0 |
| 248 | 248 | knn_tie_breaker: 0.3 |
| 249 | 249 | knn_bias: 0.6 |
| 250 | - knn_exponent: 0.4 | |
| 251 | - knn_text_bias: 0.2 | |
| 252 | - knn_text_exponent: 0.0 | |
| 253 | - knn_image_bias: 0.2 | |
| 254 | - knn_image_exponent: 0.0 | |
| 250 | + knn_exponent: 4.4639 | |
| 251 | + knn_text_bias: 0.1148 | |
| 252 | + knn_text_exponent: 1.0926 | |
| 253 | + knn_image_bias: 0.0114 | |
| 254 | + knn_image_exponent: 5.2496 | |
| 255 | 255 | fine_rank: |
| 256 | 256 | enabled: false |
| 257 | 257 | input_window: 160 | ... | ... |
docs/issues/issue-2026-04-16-bayes寻参-clothing_top771数据集上寻参.md
| ... | ... | @@ -87,3 +87,175 @@ Prompt - 1 |
| 87 | 87 | |
| 88 | 88 | |
| 89 | 89 | Response - 1 |
| 90 | +The run is finished: this `run_dir` is complete, and no tuning process is still running. | |
| 91 | + | |
| 92 | +Result files: | |
| 93 | +- [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md) | |
| 94 | +- [leaderboard.csv](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv) | |
| 95 | +- [trials.jsonl](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl) | |
| 96 | + | |
| 97 | +**Conclusions** | |
| 98 | +1. All 18 live evals in this round ran to completion. | |
| 99 | + See [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:11) | |
| 100 | + | |
| 101 | +2. On the large set `clothing_top771`, the best result is still the fusion-only baseline, with `Primary_Metric_Score=0.487476`. | |
| 102 | + See [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:17) | |
| 103 | + | |
| 104 | +3. The optimizer came very close to the baseline this round but did not beat it. | |
| 105 | + The closest candidates were: | |
| 106 | + - `bo_012 = 0.485027` | |
| 107 | + - `bo_018 = 0.484691` | |
| 108 | + - `bo_013 = 0.483849` | |
| 109 | + Their gaps to the baseline are `-0.002449`, `-0.002785`, and `-0.003627`, respectively. | |
| 110 | + | |
| 111 | +4. The "extreme parameters" found on the small set did not transfer. | |
| 112 | + Typical examples: | |
| 113 | + - `seed_low_knn_global = 0.447485` | |
| 114 | + - `seed_legacy_bo340 = 0.477537` | |
| 115 | + - `seed_bigset_text_stable = 0.449802` | |
| 116 | + This indicates that the large set does not support the markedly more aggressive shapes seen earlier. | |
| 117 | + | |
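| 118 | +**How to read this round** | |
| 119 | +The main takeaway from this round is not a new optimum; it is a clear picture of the effective range on the large set. | |
| 120 | + | |
| 121 | +The better-performing candidates on the large set differ in their exact values but share several traits: | |
| 122 | + | |
| 123 | +1. `text_bias` is generally very low. | |
| 124 | + Most top-ranked candidates sit around `0.02 ~ 0.10`, so the baseline's `0.1` is not at a disadvantage. | |
| 125 | + This is clearly the opposite direction from `text_bias=4.0` on the small set. | |
| 126 | + | |
| 127 | +2. `es_exponent` tends to be higher than the baseline. | |
| 128 | + The baseline uses `0.05`, while many top candidates fall in `0.15 ~ 0.28`. | |
| 129 | + This suggests that moderately strengthening the lexical path is valuable on the large set, but the gain is not yet enough to offset the side effects of the other terms. | |
| 130 | + | |
| 131 | +3. `text_translation_weight` tends to be higher than the baseline. | |
| 132 | + Most top candidates fall in `1.3 ~ 1.7`, clearly above the baseline's `1.0`. | |
| 133 | + This is one of the signals from this round most worth keeping. | |
| 134 | + | |
| 135 | +4. `knn_image_weight` can be higher than the baseline, but pushing hard with "strong image + extreme nonlinearity" alone does not work. | |
| 136 | + Many near-best points raised `knn_image_weight` to `4+`, yet still did not beat the baseline. | |
| 137 | + The image path helps, but it has to be balanced against the other terms. | |
| 138 | + | |
| 139 | +5. A low `knn_bias` is a signal, but not a sufficient condition. | |
| 140 | + Quite a few high-scoring candidates pushed `knn_bias` very low, even close to `0`. | |
| 141 | + But the "low knn_bias + low knn_exponent" route on its own performs poorly; `seed_low_knn_global` has already disproved it. | |
| 142 | + | |
| 143 | +6. Freeing `knn_text_exponent` and `knn_image_exponent` from `0` gets close to the baseline but has not yet produced a stable advantage. | |
| 144 | + These two parameters are often raised among the top candidates, so they are not useless; judging by the results, though, they act more as compensating terms than as decisive sources of gain. | |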
| 145 | + | |
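| 146 | +**What this round says about the earlier overfitting concern** | |
| 147 | +We can say fairly clearly: | |
| 148 | + | |
| 149 | +- The extreme solution obtained on the 54-entry small set shows clear signs of overfitting. | |
| 150 | +- On the 771-entry large set, the most robust choice is still a "mild fusion" scheme like the original baseline. | |
| 151 | +- The baseline is not the absolute optimum either, though; at least two directions are worth narrowing in on: | |
| 152 | + - a higher `es_exponent` | |
| 153 | + - a higher `text_translation_weight` | |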
| 154 | + | |
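| 155 | +**Suggestions for narrowing the next round** | |
| 156 | +I suggest not doing another long 14-dimensional run at the same width; we already have enough information. The next round can refine around the baseline. | |
| 157 | + | |
| 158 | +Suggested narrowed ranges: | |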
| 159 | + | |
| 160 | +- `es_bias: 7.5 ~ 11.5` | |
| 161 | +- `es_exponent: 0.12 ~ 0.28` | |
| 162 | +- `text_bias: 0.01 ~ 0.15` | |
| 163 | +- `text_exponent: 0.20 ~ 0.55` | |
| 164 | +- `text_translation_weight: 1.20 ~ 1.75` | |
| 165 | +- `knn_text_weight: 0.75 ~ 1.35` | |
| 166 | +- `knn_image_weight: 1.2 ~ 4.8` | |
| 167 | +- `knn_tie_breaker: 0.18 ~ 0.32` | |
| 168 | +- `knn_bias: 0.001 ~ 0.25` | |
| 169 | +- `knn_exponent: 3.5 ~ 8.0` | |
| 170 | +- `knn_text_bias: 0.001 ~ 0.3` | |
| 171 | +- `knn_text_exponent: 0.4 ~ 1.3` | |
| 172 | +- `knn_image_bias: 0.001 ~ 0.35` | |
| 173 | +- `knn_image_exponent: 2.0 ~ 5.8` | |
| 174 | + | |
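| 175 | +To cut cost more aggressively, I suggest freezing or tightly bounding these: | |
| 176 | +- `text_bias=0.1` | |
| 177 | +- `knn_tie_breaker=0.24~0.30` (a narrow range) | |
| 178 | +- `knn_text_bias`: narrow to a very small range | |
| 179 | +- `knn_image_bias`: narrow to a very small range | |
| 180 | + | |
| 181 | +Then focus on these 6: | |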
| 182 | +- `es_exponent` | |
| 183 | +- `text_translation_weight` | |
| 184 | +- `knn_image_weight` | |
| 185 | +- `knn_bias` | |
| 186 | +- `knn_exponent` | |
| 187 | +- `knn_text_exponent` | |
| 188 | + | |
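| 189 | +**Additional finding** | |
| 190 | +There is also a process issue: in the second half of this round, `bo_018` failed repeatedly before it finally succeeded, mainly because of: | |
| 191 | +- `build_annotation_set.py batch` exiting with a non-zero status | |
| 192 | +- a few non-zero exits from `restart.sh backend` | |
| 193 | + | |
| 194 | +Still, the daemon logic did get the job finished in the end. This is visible both in the failures section of [summary.md](/data/saas-search/artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/summary.md:30) and in the daemon log. So auto-resume works, but the evaluation pipeline itself is still not clean enough. | |
| 195 | + | |
| 196 | +If you like, I can proceed directly with one of two things next: | |
| 197 | +1. Based on this round's conclusions, generate a narrowed `clothing_top771_v2` search space and start the next round | |
| 198 | +2. First fix the intermittent `build_annotation_set.py batch` failures, to reduce wasted retries in the next long run | |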
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | +**Parameter sets (4)** | |
| 203 | +- `baseline` (best on top771, `seed_baseline`) | |
| 204 | + - `es_bias: 10.0`, `es_exponent: 0.05` | |
| 205 | + - `text_bias: 0.1`, `text_exponent: 0.35`, `text_translation_weight: 1.0` | |
| 206 | + - `knn_text_weight: 1.0`, `knn_image_weight: 2.0`, `knn_tie_breaker: 0.3` | |
| 207 | + - `knn_bias: 0.2`, `knn_exponent: 5.6` | |
| 208 | + - `knn_text_bias: 0.2`, `knn_text_exponent: 0.0` | |
| 209 | + - `knn_image_bias: 0.2`, `knn_image_exponent: 0.0` | |
| 210 | +- `extreme solution from the 54-entry set` (`seed_legacy_bo234`) | |
| 211 | + - `es_bias: 7.214`, `es_exponent: 0.2025` | |
| 212 | + - `text_bias: 4.0`, `text_exponent: 1.584`, `text_translation_weight: 1.4441` | |
| 213 | + - `knn_text_weight: 0.1`, `knn_image_weight: 5.6232`, `knn_tie_breaker: 0.021` | |
| 214 | + - `knn_bias: 0.0019`, `knn_exponent: 11.8477` | |
| 215 | + - `knn_text_bias: 2.3125`, `knn_text_exponent: 1.1547` | |
| 216 | + - `knn_image_bias: 0.9641`, `knn_image_exponent: 5.8671` | |
| 217 | +- `bo_012` (`Primary_Metric_Score=0.485027`) | |
| 218 | + - `es_bias: 6.6233`, `es_exponent: 0.2377` | |
| 219 | + - `text_bias: 0.049`, `text_exponent: 0.4446`, `text_translation_weight: 1.6236` | |
| 220 | + - `knn_text_weight: 1.0344`, `knn_image_weight: 1.3565`, `knn_tie_breaker: 0.212` | |
| 221 | + - `knn_bias: 0.0052`, `knn_exponent: 4.4639` | |
| 222 | + - `knn_text_bias: 0.1148`, `knn_text_exponent: 1.0926` | |
| 223 | + - `knn_image_bias: 0.0114`, `knn_image_exponent: 5.2496` | |
| 224 | +- `bo_018` (`Primary_Metric_Score=0.484691`) | |
| 225 | + - `es_bias: 8.8861`, `es_exponent: 0.2794` | |
| 226 | + - `text_bias: 0.0189`, `text_exponent: 0.2`, `text_translation_weight: 1.7178` | |
| 227 | + - `knn_text_weight: 1.7459`, `knn_image_weight: 4.2658`, `knn_tie_breaker: 0.2814` | |
| 228 | + - `knn_bias: 0.001`, `knn_exponent: 1.4923` | |
| 229 | + - `knn_text_bias: 4.0`, `knn_text_exponent: 0.9309` | |
| 230 | + - `knn_image_bias: 0.01`, `knn_image_exponent: 5.8289` | |
| 231 | + | |
| 232 | +**How to find them (reproducible)** | |
| 233 | +- From `leaderboard.csv` (score and parameters together on one row): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv` | |
| 234 | + - Example: `rg '^2,bo_012,' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/leaderboard.csv` | |
| 235 | +- From `trials.jsonl` (the most authoritative source: the params the tuner actually wrote): `artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl` | |
| 236 | + - Example: `rg '\"name\": \"bo_012\"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl` | |
| 237 | + - Example: `rg '\"name\": \"seed_legacy_bo234\"' artifacts/search_evaluation/tuning_runs/coarse_fusion_clothing_top771_resilient_20260422T091650Z/trials.jsonl` | |
| 238 | + | |
| 239 | +**Added to `config.yaml`** | |
| 240 | +- I added these 4 parameter sets as commented-out presets next to `coarse_rank.fusion`: `config/config.yaml:236` | |
| 241 | +- Note: the values currently in effect for `coarse_rank.fusion` in your `config/config.yaml` are `knn_bias=0.6 / knn_exponent=0.4`, which looks more like `seed_low_knn_global` than the baseline that scored best on the large set in this run. | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | +# request 2 | |
| 249 | + | |
| 250 | + | |
| 251 | +Fix these parameters (exclude them from the search): | |
| 252 | +es_bias: 6.62, es_exponent: 0.24 | |
| 253 | +text_bias: 0.05, text_exponent: 0.445, text_translation_weight: 1.0 | |
| 254 | +knn_text_weight: 1.0, knn_image_weight: 1.35, knn_tie_breaker: 0.212 | |
| 255 | +knn_bias: 0.0052 | |
| 256 | + | |
| 257 | +Then search over the following parameters: | |
| 258 | +knn_exponent (0.3~6.0) | |
| 259 | +knn_text_bias (0.0~0.3) knn_text_exponent (0.2~3.0) | |
| 260 | +knn_image_bias (0.0~0.3) knn_image_exponent (1.0~7.0) | |
| 261 | +Once the search script is designed, start it. After launch, check that the run has stabilized, and make sure it can keep running for a long time until everything completes. | |
| 90 | 262 | \ No newline at end of file | ... | ... |
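The `rg` lookups in the notes above can also be done programmatically. The following is a minimal Python sketch, assuming each line of `trials.jsonl` is a JSON object with a top-level `"name"` key (which the `rg` pattern suggests, but the full schema is not shown in this commit). The helper names `find_trial` and `score_gap` are hypothetical and do not exist in the repository.

```python
import json
from pathlib import Path
from typing import Optional


def find_trial(trials_path: Path, name: str) -> Optional[dict]:
    """Return the last record whose "name" field equals `name`, or None."""
    match = None
    with trials_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("name") == name:
                match = record  # keep the latest entry if a trial appears twice
    return match


def score_gap(score: float, baseline: float) -> float:
    """Gap to the baseline, rounded to the six decimals used in the notes."""
    return round(score - baseline, 6)
```

For example, `score_gap(0.485027, 0.487476)` reproduces the `-0.002449` gap quoted for `bo_012`. Keeping the last matching record is a design guess: a resumed run might append a retried trial under the same name, which is an assumption rather than documented behavior.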
scripts/evaluation/resume_coarse_fusion_tuning_knn_tail.sh
0 → 100755
| ... | ... | @@ -0,0 +1,72 @@ |
| 1 | +#!/bin/bash | |
| 2 | + | |
| 3 | +set -euo pipefail | |
| 4 | + | |
| 5 | +if [ "$#" -lt 1 ]; then | |
| 6 | + echo "usage: $0 <run_dir_or_name>" >&2 | |
| 7 | + exit 1 | |
| 8 | +fi | |
| 9 | + | |
| 10 | +cd "$(dirname "$0")/../.." | |
| 11 | +source ./activate.sh | |
| 12 | + | |
| 13 | +TARGET="$1" | |
| 14 | + | |
| 15 | +if [ -d "${TARGET}" ]; then | |
| 16 | + RUN_DIR="${TARGET}" | |
| 17 | + RUN_NAME="$(basename "${RUN_DIR}")" | |
| 18 | +else | |
| 19 | + RUN_NAME="${TARGET}" | |
| 20 | + RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}" | |
| 21 | +fi | |
| 22 | + | |
| 23 | +if [ ! -d "${RUN_DIR}" ]; then | |
| 24 | + echo "run dir not found: ${RUN_DIR}" >&2 | |
| 25 | + exit 1 | |
| 26 | +fi | |
| 27 | + | |
| 28 | +DATASET_ID="${REPO_EVAL_DATASET_ID:-clothing_top771}" | |
| 29 | +MAX_EVALS="${MAX_EVALS:-20}" | |
| 30 | +BATCH_SIZE="${BATCH_SIZE:-2}" | |
| 31 | +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-96}" | |
| 32 | +RANDOM_SEED="${RANDOM_SEED:-20260424}" | |
| 33 | +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}" | |
| 34 | + | |
| 35 | +LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" | |
| 36 | +mkdir -p "${LAUNCH_DIR}" | |
| 37 | +LOG_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.log" | |
| 38 | +PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.pid" | |
| 39 | +CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.cmd" | |
| 40 | + | |
| 41 | +CMD=( | |
| 42 | + bash | |
| 43 | + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh | |
| 44 | + "${RUN_NAME}" | |
| 45 | + "${DATASET_ID}" | |
| 46 | + "${MAX_EVALS}" | |
| 47 | + "${BATCH_SIZE}" | |
| 48 | + "${CANDIDATE_POOL_SIZE}" | |
| 49 | + "${RANDOM_SEED}" | |
| 50 | + "${RUN_DIR}/search_space.yaml" | |
| 51 | + "" | |
| 52 | + "${RUN_DIR}" | |
| 53 | +) | |
| 54 | + | |
| 55 | +export BATCH_EVAL_TIMEOUT_SEC | |
| 56 | + | |
| 57 | +printf '%q ' "${CMD[@]}" > "${CMD_PATH}" | |
| 58 | +printf '\n' >> "${CMD_PATH}" | |
| 59 | + | |
| 60 | +setsid "${CMD[@]}" > "${LOG_PATH}" 2>&1 < /dev/null & | |
| 61 | +PID=$! | |
| 62 | +echo "${PID}" > "${PID_PATH}" | |
| 63 | + | |
| 64 | +echo "run_name=${RUN_NAME}" | |
| 65 | +echo "pid=${PID}" | |
| 66 | +echo "log=${LOG_PATH}" | |
| 67 | +echo "pid_file=${PID_PATH}" | |
| 68 | +echo "cmd_file=${CMD_PATH}" | |
| 69 | +echo "run_dir=${RUN_DIR}" | |
| 70 | +echo | |
| 71 | +echo "tail -f ${LOG_PATH}" | |
| 72 | +echo "cat ${RUN_DIR}/leaderboard.csv" | ... | ... |
scripts/evaluation/run_coarse_fusion_tuning_resilient.sh
| ... | ... | @@ -29,9 +29,48 @@ RESTART_SLEEP_SEC="${RESTART_SLEEP_SEC:-30}" |
| 29 | 29 | SEARCH_BASE_URL="${SEARCH_BASE_URL:-http://127.0.0.1:6002}" |
| 30 | 30 | EVAL_WEB_BASE_URL="${EVAL_WEB_BASE_URL:-http://127.0.0.1:6010}" |
| 31 | 31 | RUN_DIR="artifacts/search_evaluation/tuning_runs/${RUN_NAME}" |
| 32 | +LOCK_DIR="${RUN_DIR}/.resilient_lock" | |
| 33 | +HEALTH_POLL_SEC="${HEALTH_POLL_SEC:-15}" | |
| 32 | 34 | |
| 33 | 35 | mkdir -p "$(dirname "$RUN_DIR")" |
| 34 | 36 | |
| 37 | +release_lock() { | |
| 38 | + if [ -d "$LOCK_DIR" ] && [ -f "$LOCK_DIR/pid" ] && [ "$(cat "$LOCK_DIR/pid" 2>/dev/null || true)" = "$$" ]; then | |
| 39 | + rm -rf "$LOCK_DIR" | |
| 40 | + fi | |
| 41 | +} | |
| 42 | + | |
| 43 | +acquire_lock() { | |
| 44 | + mkdir -p "$RUN_DIR" | |
| 45 | + if mkdir "$LOCK_DIR" 2>/dev/null; then | |
| 46 | + echo "$$" > "$LOCK_DIR/pid" | |
| 47 | + date -u +%Y-%m-%dT%H:%M:%SZ > "$LOCK_DIR/started_at" | |
| 48 | + return 0 | |
| 49 | + fi | |
| 50 | + | |
| 51 | + local owner_pid="" | |
| 52 | + if [ -f "$LOCK_DIR/pid" ]; then | |
| 53 | + owner_pid="$(cat "$LOCK_DIR/pid" 2>/dev/null || true)" | |
| 54 | + fi | |
| 55 | + if [ -n "$owner_pid" ] && kill -0 "$owner_pid" 2>/dev/null; then | |
| 56 | + echo "[resilient] lock already held by pid=${owner_pid}, exiting" | |
| 57 | + exit 0 | |
| 58 | + fi | |
| 59 | + | |
| 60 | + echo "[resilient] removing stale lock at ${LOCK_DIR}" | |
| 61 | + rm -rf "$LOCK_DIR" | |
| 62 | + if mkdir "$LOCK_DIR" 2>/dev/null; then | |
| 63 | + echo "$$" > "$LOCK_DIR/pid" | |
| 64 | + date -u +%Y-%m-%dT%H:%M:%SZ > "$LOCK_DIR/started_at" | |
| 65 | + return 0 | |
| 66 | + fi | |
| 67 | + | |
| 68 | + echo "[resilient] failed to acquire lock at ${LOCK_DIR}" | |
| 69 | + exit 1 | |
| 70 | +} | |
| 71 | + | |
| 72 | +trap release_lock EXIT INT TERM | |
| 73 | + | |
| 35 | 74 | count_live_successes() { |
| 36 | 75 | python3 - "$RUN_DIR" <<'PY' |
| 37 | 76 | import json |
| ... | ... | @@ -53,13 +92,61 @@ print(count) |
| 53 | 92 | PY |
| 54 | 93 | } |
| 55 | 94 | |
| 95 | +wait_for_health() { | |
| 96 | + local url="$1" | |
| 97 | + local timeout_sec="$2" | |
| 98 | + local deadline=$(( $(date +%s) + timeout_sec )) | |
| 99 | + while [ "$(date +%s)" -lt "$deadline" ]; do | |
| 100 | + if curl -fsS "$url" >/dev/null 2>&1; then | |
| 101 | + return 0 | |
| 102 | + fi | |
| 103 | + sleep 2 | |
| 104 | + done | |
| 105 | + return 1 | |
| 106 | +} | |
| 107 | + | |
| 108 | +ensure_services() { | |
| 109 | + if ! wait_for_health "${SEARCH_BASE_URL}/health" 20; then | |
| 110 | + echo "[resilient] backend unhealthy, restarting backend" | |
| 111 | + ./restart.sh backend || true | |
| 112 | + sleep 5 | |
| 113 | + fi | |
| 114 | + if ! wait_for_health "${SEARCH_BASE_URL}/health" 180; then | |
| 115 | + echo "[resilient] backend still unhealthy after restart" | |
| 116 | + return 1 | |
| 117 | + fi | |
| 118 | + | |
| 119 | + if ! wait_for_health "${EVAL_WEB_BASE_URL}/api/history" 20; then | |
| 120 | + echo "[resilient] eval-web unhealthy, restarting eval-web" | |
| 121 | + ./restart.sh eval-web || true | |
| 122 | + sleep 5 | |
| 123 | + fi | |
| 124 | + if ! wait_for_health "${EVAL_WEB_BASE_URL}/api/history" 180; then | |
| 125 | + echo "[resilient] eval-web still unhealthy after restart" | |
| 126 | + return 1 | |
| 127 | + fi | |
| 128 | + return 0 | |
| 129 | +} | |
| 130 | + | |
| 131 | +heal_services_nonblocking() { | |
| 132 | + if ! curl -fsS "${SEARCH_BASE_URL}/health" >/dev/null 2>&1; then | |
| 133 | + echo "[resilient] backend became unhealthy during run, restarting backend" | |
| 134 | + ./restart.sh backend || true | |
| 135 | + sleep 5 | |
| 136 | + fi | |
| 137 | + if ! curl -fsS "${EVAL_WEB_BASE_URL}/api/history" >/dev/null 2>&1; then | |
| 138 | + echo "[resilient] eval-web became unhealthy during run, restarting eval-web" | |
| 139 | + ./restart.sh eval-web || true | |
| 140 | + sleep 5 | |
| 141 | + fi | |
| 142 | +} | |
| 143 | + | |
| 56 | 144 | build_cmd() { |
| 57 | 145 | local cmd=( |
| 58 | 146 | python |
| 59 | 147 | scripts/evaluation/tune_fusion.py |
| 60 | 148 | --mode optimize |
| 61 | 149 | --search-space "$SEARCH_SPACE" |
| 62 | - --seed-report "$SEED_REPORT" | |
| 63 | 150 | --tenant-id 163 |
| 64 | 151 | --dataset-id "$DATASET_ID" |
| 65 | 152 | --queries-file scripts/evaluation/queries/queries.txt |
| ... | ... | @@ -73,6 +160,9 @@ build_cmd() { |
| 73 | 160 | --random-seed "$RANDOM_SEED" |
| 74 | 161 | --batch-eval-timeout-sec "$BATCH_EVAL_TIMEOUT_SEC" |
| 75 | 162 | ) |
| 163 | + if [ -n "$SEED_REPORT" ]; then | |
| 164 | + cmd+=(--seed-report "$SEED_REPORT") | |
| 165 | + fi | |
| 76 | 166 | if [ -n "$RESUME_RUN_DIR" ]; then |
| 77 | 167 | cmd+=(--resume-run "$RESUME_RUN_DIR") |
| 78 | 168 | else |
| ... | ... | @@ -83,6 +173,7 @@ build_cmd() { |
| 83 | 173 | } |
| 84 | 174 | |
| 85 | 175 | attempt=0 |
| 176 | +acquire_lock | |
| 86 | 177 | while true; do |
| 87 | 178 | live_successes="$(count_live_successes)" |
| 88 | 179 | if [ "$live_successes" -ge "$MAX_EVALS" ]; then |
| ... | ... | @@ -96,11 +187,23 @@ while true; do |
| 96 | 187 | fi |
| 97 | 188 | |
| 98 | 189 | echo "[resilient] attempt=$attempt run_name=$RUN_NAME live_successes=$live_successes target=$MAX_EVALS" |
| 190 | + if ! ensure_services; then | |
| 191 | + echo "[resilient] service preflight failed, sleeping ${RESTART_SLEEP_SEC}s before retry" | |
| 192 | + sleep "$RESTART_SLEEP_SEC" | |
| 193 | + continue | |
| 194 | + fi | |
| 99 | 195 | CMD_STR="$(build_cmd)" |
| 100 | 196 | echo "[resilient] cmd=$CMD_STR" |
| 101 | 197 | |
| 102 | 198 | set +e |
| 103 | - bash -lc "$CMD_STR" | |
| 199 | + bash -lc "$CMD_STR" & | |
| 200 | + child_pid=$! | |
| 201 | + echo "[resilient] child_pid=${child_pid}" | |
| 202 | + while kill -0 "$child_pid" 2>/dev/null; do | |
| 203 | + heal_services_nonblocking | |
| 204 | + sleep "$HEALTH_POLL_SEC" | |
| 205 | + done | |
| 206 | + wait "$child_pid" | |
| 104 | 207 | exit_code=$? |
| 105 | 208 | set -e |
| 106 | 209 | |
| ... | ... | @@ -112,6 +215,9 @@ while true; do |
| 112 | 215 | exit 0 |
| 113 | 216 | fi |
| 114 | 217 | |
| 218 | + if ! ensure_services; then | |
| 219 | + echo "[resilient] service recovery failed after exit_code=$exit_code" | |
| 220 | + fi | |
| 115 | 221 | echo "[resilient] sleeping ${RESTART_SLEEP_SEC}s before resume" |
| 116 | 222 | sleep "$RESTART_SLEEP_SEC" |
| 117 | 223 | done | ... | ... |
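The lock handling added to `run_coarse_fusion_tuning_resilient.sh` above relies on `mkdir` being atomic and on `kill -0` to detect a dead owner. A rough Python sketch of the same idea follows, for readers who find it easier to reason about outside bash. It assumes a POSIX system; `pid_alive` and `try_acquire` are hypothetical names, and unlike the shell version this sketch treats a permission error from the liveness probe as "owner alive".

```python
import os
import shutil
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Signal 0 only checks that the process exists (like `kill -0`)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire(lock_dir: Path) -> bool:
    """Take the lock via atomic mkdir; reclaim it if the recorded owner is dead."""
    for _ in range(2):  # the second pass runs only after removing a stale lock
        try:
            os.mkdir(lock_dir)
        except FileExistsError:
            pid_file = lock_dir / "pid"
            owner = pid_file.read_text().strip() if pid_file.exists() else ""
            if owner.isdigit() and pid_alive(int(owner)):
                return False  # a live owner holds the lock
            shutil.rmtree(lock_dir, ignore_errors=True)  # stale lock
            continue
        (lock_dir / "pid").write_text(f"{os.getpid()}\n")
        return True
    return False
```

As in the shell script, a lock directory without a readable `pid` file is treated as stale and removed before a second acquisition attempt.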
scripts/evaluation/start_coarse_fusion_tuning_knn_tail.sh
0 → 100755
| ... | ... | @@ -0,0 +1,53 @@ |
| 1 | +#!/bin/bash | |
| 2 | + | |
| 3 | +set -euo pipefail | |
| 4 | + | |
| 5 | +cd "$(dirname "$0")/../.." | |
| 6 | +source ./activate.sh | |
| 7 | + | |
| 8 | +RUN_NAME="${RUN_NAME:-coarse_fusion_clothing_top771_knn_tail_$(date -u +%Y%m%dT%H%M%SZ)}" | |
| 9 | +DATASET_ID="${REPO_EVAL_DATASET_ID:-clothing_top771}" | |
| 10 | +SEARCH_SPACE="${SEARCH_SPACE:-scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml}" | |
| 11 | +MAX_EVALS="${MAX_EVALS:-20}" | |
| 12 | +BATCH_SIZE="${BATCH_SIZE:-2}" | |
| 13 | +CANDIDATE_POOL_SIZE="${CANDIDATE_POOL_SIZE:-96}" | |
| 14 | +RANDOM_SEED="${RANDOM_SEED:-20260424}" | |
| 15 | +BATCH_EVAL_TIMEOUT_SEC="${BATCH_EVAL_TIMEOUT_SEC:-0}" | |
| 16 | + | |
| 17 | +LAUNCH_DIR="artifacts/search_evaluation/tuning_launches" | |
| 18 | +mkdir -p "${LAUNCH_DIR}" | |
| 19 | +LOG_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.log" | |
| 20 | +PID_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.pid" | |
| 21 | +CMD_PATH="${LAUNCH_DIR}/${RUN_NAME}.daemon.cmd" | |
| 22 | + | |
| 23 | +CMD=( | |
| 24 | + bash | |
| 25 | + scripts/evaluation/run_coarse_fusion_tuning_resilient.sh | |
| 26 | + "${RUN_NAME}" | |
| 27 | + "${DATASET_ID}" | |
| 28 | + "${MAX_EVALS}" | |
| 29 | + "${BATCH_SIZE}" | |
| 30 | + "${CANDIDATE_POOL_SIZE}" | |
| 31 | + "${RANDOM_SEED}" | |
| 32 | + "${SEARCH_SPACE}" | |
| 33 | + "" | |
| 34 | +) | |
| 35 | + | |
| 36 | +export BATCH_EVAL_TIMEOUT_SEC | |
| 37 | + | |
| 38 | +printf '%q ' "${CMD[@]}" > "${CMD_PATH}" | |
| 39 | +printf '\n' >> "${CMD_PATH}" | |
| 40 | + | |
| 41 | +setsid "${CMD[@]}" > "${LOG_PATH}" 2>&1 < /dev/null & | |
| 42 | +PID=$! | |
| 43 | +echo "${PID}" > "${PID_PATH}" | |
| 44 | + | |
| 45 | +echo "run_name=${RUN_NAME}" | |
| 46 | +echo "pid=${PID}" | |
| 47 | +echo "log=${LOG_PATH}" | |
| 48 | +echo "pid_file=${PID_PATH}" | |
| 49 | +echo "cmd_file=${CMD_PATH}" | |
| 50 | +echo "run_dir=artifacts/search_evaluation/tuning_runs/${RUN_NAME}" | |
| 51 | +echo | |
| 52 | +echo "tail -f ${LOG_PATH}" | |
| 53 | +echo "cat artifacts/search_evaluation/tuning_runs/${RUN_NAME}/leaderboard.csv" | ... | ... |
scripts/evaluation/tuning/coarse_rank_fusion_space_clothing_top771_knn_tail.yaml
0 → 100644
| ... | ... | @@ -0,0 +1,82 @@ |
| 1 | +target_path: coarse_rank.fusion | |
| 2 | + | |
| 3 | +baseline: | |
| 4 | + es_bias: 6.62 | |
| 5 | + es_exponent: 0.24 | |
| 6 | + text_bias: 0.05 | |
| 7 | + text_exponent: 0.445 | |
| 8 | + text_translation_weight: 1.0 | |
| 9 | + knn_text_weight: 1.0 | |
| 10 | + knn_image_weight: 1.35 | |
| 11 | + knn_tie_breaker: 0.212 | |
| 12 | + knn_bias: 0.0052 | |
| 13 | + knn_exponent: 4.4639 | |
| 14 | + knn_text_bias: 0.1148 | |
| 15 | + knn_text_exponent: 1.0926 | |
| 16 | + knn_image_bias: 0.0114 | |
| 17 | + knn_image_exponent: 5.2496 | |
| 18 | + | |
| 19 | +parameters: | |
| 20 | + knn_exponent: {min: 0.3, max: 6.0, scale: linear, round: 4} | |
| 21 | + knn_text_bias: {min: 0.0, max: 0.3, scale: linear, round: 4} | |
| 22 | + knn_text_exponent: {min: 0.2, max: 3.0, scale: linear, round: 4} | |
| 23 | + knn_image_bias: {min: 0.0, max: 0.3, scale: linear, round: 4} | |
| 24 | + knn_image_exponent: {min: 1.0, max: 7.0, scale: linear, round: 4} | |
| 25 | + | |
| 26 | +seed_experiments: | |
| 27 | + - name: seed_fixed_anchor | |
| 28 | + description: Anchor on bo_012's 5 knn sub-term values and verify transfer under the new fixed parameters. | |
| 29 | + params: | |
| 30 | + knn_exponent: 4.4639 | |
| 31 | + knn_text_bias: 0.1148 | |
| 32 | + knn_text_exponent: 1.0926 | |
| 33 | + knn_image_bias: 0.0114 | |
| 34 | + knn_image_exponent: 5.2496 | |
| 35 | + - name: seed_knn_soft | |
| 36 | + description: Smoother global knn exponent, keeping a fairly strong image sub-term exponent. | |
| 37 | + params: | |
| 38 | + knn_exponent: 1.2 | |
| 39 | + knn_text_bias: 0.06 | |
| 40 | + knn_text_exponent: 0.9 | |
| 41 | + knn_image_bias: 0.02 | |
| 42 | + knn_image_exponent: 5.4 | |
| 43 | + - name: seed_knn_balanced | |
| 44 | + description: Medium knn exponent and medium sub-term nonlinearity, as a robust center point. | |
| 45 | + params: | |
| 46 | + knn_exponent: 2.8 | |
| 47 | + knn_text_bias: 0.12 | |
| 48 | + knn_text_exponent: 1.4 | |
| 49 | + knn_image_bias: 0.05 | |
| 50 | + knn_image_exponent: 4.2 | |
| 51 | + - name: seed_knn_high | |
| 52 | + description: Higher global knn exponent, to check whether the large set still prefers steeper top-rank boosting. | |
| 53 | + params: | |
| 54 | + knn_exponent: 5.6 | |
| 55 | + knn_text_bias: 0.04 | |
| 56 | + knn_text_exponent: 0.8 | |
| 57 | + knn_image_bias: 0.03 | |
| 58 | + knn_image_exponent: 5.0 | |
| 59 | + - name: seed_text_branch_heavier | |
| 60 | + description: Raise the knn_text sub-term bias and exponent to observe the balance point between the text/image sub-terms. | |
| 61 | + params: | |
| 62 | + knn_exponent: 3.6 | |
| 63 | + knn_text_bias: 0.22 | |
| 64 | + knn_text_exponent: 2.2 | |
| 65 | + knn_image_bias: 0.01 | |
| 66 | + knn_image_exponent: 3.2 | |
| 67 | + - name: seed_image_branch_heavier | |
| 68 | + description: Raise the knn_image sub-term bias and exponent to check the ceiling of the image path under the current fixed main parameters. | |
| 69 | + params: | |
| 70 | + knn_exponent: 3.4 | |
| 71 | + knn_text_bias: 0.03 | |
| 72 | + knn_text_exponent: 0.6 | |
| 73 | + knn_image_bias: 0.16 | |
| 74 | + knn_image_exponent: 6.2 | |
| 75 | + | |
| 76 | +optimizer: | |
| 77 | + init_random: 2 | |
| 78 | + candidate_pool_size: 96 | |
| 79 | + explore_probability: 0.14 | |
| 80 | + local_jitter_probability: 0.62 | |
| 81 | + elite_fraction: 0.3 | |
| 82 | + min_normalized_distance: 0.06 | ... | ... |
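Before launching a long run, it can be worth checking that every seed in a search space like the one above lies inside the declared `parameters` bounds. The sketch below hard-codes the bounds and two of the seeds from this file instead of parsing the YAML, to stay within the standard library. `out_of_bounds` is a hypothetical helper, and whether `tune_fusion.py` performs a similar check itself is not shown in this commit.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]

# Bounds copied from the `parameters` block of the search space above.
PARAMETERS: Bounds = {
    "knn_exponent": (0.3, 6.0),
    "knn_text_bias": (0.0, 0.3),
    "knn_text_exponent": (0.2, 3.0),
    "knn_image_bias": (0.0, 0.3),
    "knn_image_exponent": (1.0, 7.0),
}

# Two of the six seeds, copied from `seed_experiments`.
SEEDS: Dict[str, Dict[str, float]] = {
    "seed_fixed_anchor": {
        "knn_exponent": 4.4639,
        "knn_text_bias": 0.1148,
        "knn_text_exponent": 1.0926,
        "knn_image_bias": 0.0114,
        "knn_image_exponent": 5.2496,
    },
    "seed_image_branch_heavier": {
        "knn_exponent": 3.4,
        "knn_text_bias": 0.03,
        "knn_text_exponent": 0.6,
        "knn_image_bias": 0.16,
        "knn_image_exponent": 6.2,
    },
}


def out_of_bounds(params: Dict[str, float], bounds: Bounds) -> List[str]:
    """List every parameter that is unknown or outside its [min, max] range."""
    problems = []
    for key, value in params.items():
        if key not in bounds:
            problems.append(f"{key}: not a searched parameter")
            continue
        low, high = bounds[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} outside [{low}, {high}]")
    return problems
```

Both seeds shown here pass the check. By contrast, the `knn_exponent: 11.8477` of `seed_legacy_bo234` from the earlier round would be flagged against this narrower space.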