19 Mar, 2026
2 commits
-
- Text and image embedding are now split into separate services/processes, while still keeping a single replica as requested. The split lives in [embeddings/server.py](/data/saas-search/embeddings/server.py#L112), [config/services_config.py](/data/saas-search/config/services_config.py#L68), [providers/embedding.py](/data/saas-search/providers/embedding.py#L27), and the start scripts [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36), [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh), [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh). - Independent admission control is in place now: text and image have separate inflight limits, and image can be kept much stricter than text. The request handling, reject path, `/health`, and `/ready` are in [embeddings/server.py](/data/saas-search/embeddings/server.py#L613), [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028). - I checked the Redis embedding cache. It did exist, but there was a real flaw: cache keys did not distinguish `normalize=true` from `normalize=false`. I fixed that in [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6), and both text and image now use the same normalize-aware keying. I also added service-side BF16 cache hits that short-circuit before the model lane, so repeated requests no longer get throttled behind image inference. **What This Means** - Image pressure no longer blocks text, because they are on different ports/processes. - Repeated text/image requests now return from Redis without consuming model capacity. - Over-capacity requests are rejected quickly instead of sitting blocked. - I did not add a load balancer or multi-replica HA, per your GPU constraint. I also did not build Grafana/Prometheus dashboards in this pass, but `/health` now exposes the metrics needed to wire them. **Validation** - Tests passed: `.venv/bin/python -m pytest -q tests/test_embedding_pipeline.py tests/test_embedding_service_limits.py` -> `10 passed` - Stress test tool updates are in [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155) - Fresh benchmark on split text service `6105`: 535 requests / 3s, 100% success, `174.56 rps`, avg `88.48 ms` - Fresh benchmark on split image service `6108`: 1213 requests / 3s, 100% success, `403.32 rps`, avg `9.64 ms` - Live health after the run showed cache hits and non-zero cache-hit latency accounting: - text `avg_latency_ms=4.251` - image `avg_latency_ms=1.462`
13 Mar, 2026
2 commits
09 Mar, 2026
2 commits
-
CNCLIP_DEVICE=cuda TEI_USE_GPU=1 ./scripts/service_ctl.sh start 搜索后端+indexer+测试前段+4个微服务 跑通
08 Mar, 2026
1 commit
07 Mar, 2026
1 commit
02 Dec, 2025
2 commits
-
1. 加了一个配置searchable_option_dimensions,功能是配置子sku的option1_value option2_value option3_value 哪些参与检索(进索引、以及在线搜索的时候将对应字段纳入搜索field)。格式为list,选择三者中的一个或多个。 2. 索引 @mappings/search_products.json 要加3个字段 option1_values option2_values option3_values,各自的 数据灌入(mysql->ES)的模块也要修改,这个字段是对子sku的option1_value option2_value option3_value分别提取去抽后得到的list。 searchable_option_dimensions 中配置的,才进索引,比如 searchable_option_dimensions = ['option1'] 则 只对option1提取属性值去重组织list进入索引,其余两个字段为空 3. 在线 对应的将 searchable_option_dimensions 中 对应的索引字段纳入 multi_match 的 fields,权重设为0.5 (各个字段的权重配置放到一起集中管理) 1. 配置文件改动 (config/config.yaml) ✅ 在 spu_config 中添加了 searchable_option_dimensions 配置项,默认值为 ['option1', 'option2', 'option3'] ✅ 添加了3个新字段定义:option1_values, option2_values, option3_values,类型为 KEYWORD,权重为 0.5 ✅ 在 default 索引域的 fields 列表中添加了这3个字段,使其参与搜索 2. ES索引Mapping改动 (mappings/search_products.json) ✅ 添加了3个新字段:option1_values, option2_values, option3_values,类型为 keyword 3. 配置加载器改动 (config/config_loader.py) ✅ 在 SPUConfig 类中添加了 searchable_option_dimensions 字段 ✅ 更新了配置解析逻辑,支持读取 searchable_option_dimensions ✅ 更新了配置转换为字典的逻辑 4. 数据灌入改动 (indexer/spu_transformer.py) ✅ 在初始化时加载配置,获取 searchable_option_dimensions ✅ 在 _transform_spu_to_doc 方法中添加逻辑: 从所有子SKU中提取 option1, option2, option3 值 去重后存入 option1_values, option2_values, option3_values 根据配置决定哪些字段实际写入数据(未配置的字段写空数组) =
-
query config/ranking config优化
13 Nov, 2025
1 commit
12 Nov, 2025
1 commit
-
核心改动: 1. 配置化打分规则 - 新增FunctionScoreConfig和RerankConfig配置类 - 支持filter_weight、field_value_factor、decay三种ES原生function - 从代码中移除硬编码的打分逻辑 2. 配置模型定义 - FunctionScoreConfig: score_mode, boost_mode, functions - RerankConfig: enabled, expression(当前禁用) - 添加到CustomerConfig中 3. 查询构建器改造 - MultiLanguageQueryBuilder.init添加function_score_config引用 - _build_score_functions从配置动态构建ES functions - 支持配置的score_mode和boost_mode 4. 配置文件示例 - 添加完整的function_score配置示例 - 包含3种function类型的详细注释 - 提供常见场景的配置模板 5. ES原生能力支持 - Filter+Weight: 条件匹配提权 - Field Value Factor: 字段值映射打分 * modifier支持: none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, reciprocal - Decay Functions: 衰减函数 * 支持: gauss, exp, linear 配置示例: - 7天新品提权(weight: 1.3) - 30天新品提权(weight: 1.15) - 有视频提权(weight: 1.05) - 销量因子(field_value_factor + log1p) - 时间衰减(gauss decay) 优势: ✓ 配置化 - 客户自己调整,无需改代码 ✓ 基于ES原生 - 性能最优,功能完整 ✓ 灵活易用 - YAML格式,有示例和注释 ✓ 统一约定 - function_score必需,简化设计 参考:https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-function-score-query
08 Nov, 2025
1 commit