ai-saas / saas-search

07 Apr, 2026

1 commit

9f33fe3c fix suggestion rebuild flow and es index creation

- consolidate suggestion rebuild flow into build_suggestions.sh via --rebuild and remove the redundant rebuild_suggestions.sh wrapper
- make suggestion versioned index names use microseconds and handle index-create retries/timeouts without false already_exists failures
- treat create requests as successful when the index was created server-side, then explicitly wait for shard readiness and surface allocation diagnostics
- clean up freshly created suggestion indices on rebuild failure to avoid leaving red orphan indices behind
- make rebuild smoke tests target the local backend by default, with SUGGESTIONS_SMOKE_BASE_URL as the explicit override
- add unit coverage for microsecond versioned index names and cleanup on unallocatable index failures
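The naming and create-handling ideas in this commit can be sketched as follows. This is a minimal illustration, not the repository's code: the `suggestions` prefix, the `_v` suffix layout, and the `create`/`exists` callables are all hypothetical stand-ins for whatever the real scripts and Elasticsearch client use.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical prefix; the real naming scheme is not shown in the commit.
INDEX_PREFIX = "suggestions"


def versioned_index_name(prefix: str = INDEX_PREFIX,
                         now: Optional[datetime] = None) -> str:
    """Build an index name whose version suffix has microsecond resolution."""
    now = now or datetime.now(timezone.utc)
    # %f is the zero-padded six-digit microsecond field, so two rebuilds
    # started within the same second still get distinct names.
    return f"{prefix}_v{now.strftime('%Y%m%d%H%M%S%f')}"


def ensure_created(create: Callable[[], None],
                   exists: Callable[[], bool]) -> bool:
    """Return True if the index exists after the attempt.

    `create` and `exists` are hypothetical wrappers around the real client.
    """
    try:
        create()
        return True
    except TimeoutError:
        # The request may have been applied server-side even though the
        # client timed out, so check before reporting a failure.
        return exists()
```

With second-level timestamps, a retry inside the same second would reuse the name and could surface as a spurious already-exists error; the microsecond suffix makes that collision much less likely.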

2026-04-07 12:43:04 +0800

31 Mar, 2026

1 commit

7b8d9e1a Startup script for the evaluation framework

tangwang
2026-03-31 19:36:47 +0800

27 Mar, 2026

1 commit

daa2690b Funnel parameter tuning and presentation improvements

tangwang
2026-03-27 23:00:16 +0800

26 Mar, 2026

1 commit

7a013ca7 Multimodal text embedding service working

tangwang
2026-03-26 20:46:24 +0800

19 Mar, 2026

2 commits

af03fdef Embedding module code cleanup

tangwang
2026-03-19 14:24:35 +0800

7214c2e7 **Implemented**

- Text and image embedding are now split into separate
  services/processes, while still keeping a single replica as requested.
  The split lives in
  [embeddings/server.py](/data/saas-search/embeddings/server.py#L112),
  [config/services_config.py](/data/saas-search/config/services_config.py#L68),
  [providers/embedding.py](/data/saas-search/providers/embedding.py#L27),
  and the start scripts
  [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36),
  [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh),
  [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh).
- Independent admission control is in place now: text and image have
  separate inflight limits, and image can be kept much stricter than
  text. The request handling, reject path, `/health`, and `/ready` are in
  [embeddings/server.py](/data/saas-search/embeddings/server.py#L613),
  [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and
  [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028).
- I checked the Redis embedding cache. It did exist, but there was a
  real flaw: cache keys did not distinguish `normalize=true` from
  `normalize=false`. I fixed that in
  [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6),
  and both text and image now use the same normalize-aware keying. I also
  added service-side BF16 cache hits that short-circuit before the model
  lane, so repeated requests no longer get throttled behind image
  inference.
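To illustrate the cache-key flaw described above, here is a minimal sketch of normalize-aware keying. The key layout, the `emb:` prefix, and the function name are assumptions for illustration only; the actual format in `embeddings/cache_keys.py` is not shown in this commit message.

```python
import hashlib


def embedding_cache_key(kind: str, model: str, payload: str,
                        normalize: bool) -> str:
    """Build a cache key that differs for normalized and raw embeddings.

    Hypothetical layout: emb:<kind>:<model>:<norm flag>:<sha256 of payload>.
    """
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Including the flag keeps a normalize=false vector from being served
    # to a caller that asked for normalize=true, and vice versa.
    norm_flag = "norm1" if normalize else "norm0"
    return f"emb:{kind}:{model}:{norm_flag}:{digest}"
```

Without the flag in the key, whichever variant was cached first would be returned for both request types.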

**What This Means**
- Image pressure no longer blocks text, because they are on different
  ports/processes.
- Repeated text/image requests now return from Redis without consuming
  model capacity.
- Over-capacity requests are rejected quickly instead of sitting
  blocked.
- I did not add a load balancer or multi-replica HA, per your GPU
  constraint. I also did not build Grafana/Prometheus dashboards in this
  pass, but `/health` now exposes the metrics needed to wire them.
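The fast-reject behaviour can be sketched with a non-blocking semaphore per lane. This is an assumption-laden illustration: the class name, the limit values, and the 429 status code are hypothetical, and the real services run text and image in separate processes rather than in one module as shown here.

```python
import threading
from typing import Callable, Optional, Tuple


class InflightLimiter:
    """Caps the number of concurrent requests in one lane."""

    def __init__(self, max_inflight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_inflight)

    def try_acquire(self) -> bool:
        # Non-blocking: a full lane returns False at once instead of queueing.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


# Hypothetical limits; the image lane is kept much stricter than text.
text_limiter = InflightLimiter(max_inflight=32)
image_limiter = InflightLimiter(max_inflight=1)


def handle(limiter: InflightLimiter,
           work: Callable[[], object]) -> Tuple[int, Optional[object]]:
    """Run `work` if a slot is free, otherwise reject immediately."""
    if not limiter.try_acquire():
        return 429, None  # assumed status code for over-capacity requests
    try:
        return 200, work()
    finally:
        limiter.release()
```

Because each lane owns its own limiter, a saturated image lane rejects image requests without affecting text capacity.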

**Validation**
- Tests passed: `.venv/bin/python -m pytest -q tests/test_embedding_pipeline.py tests/test_embedding_service_limits.py` -> `10 passed`
- Stress test tool updates are in
  [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155)
- Fresh benchmark on split text service `6105`: 535 requests / 3s, 100%
  success, `174.56 rps`, avg `88.48 ms`
- Fresh benchmark on split image service `6108`: 1213 requests / 3s,
  100% success, `403.32 rps`, avg `9.64 ms`
- Live health after the run showed cache hits and non-zero cache-hit
  latency accounting:
  - text `avg_latency_ms=4.251`
  - image `avg_latency_ms=1.462`

2026-03-19 13:21:01 +0800

18 Mar, 2026

1 commit

c90f80ed Relevance optimization

tangwang
2026-03-18 16:44:27 +0800

13 Mar, 2026

4 commits

985752f5 1. Frontend debugging feature
```
2. Handling for translation rate limits (qwen-mt rate limiting)
```
tangwang
2026-03-13 16:15:06 +0800
22ae00c7 product_annotator

tangwang
2026-03-13 13:48:23 +0800
2260eed2 Push alerts to a WeChat group webhook

tangwang
2026-03-13 12:19:25 +0800
a7bb846c monitor

tangwang
2026-03-13 12:08:20 +0800

12 Mar, 2026

2 commits

c6da6bca add status.sh

tangwang
2026-03-12 23:51:43 +0800
7913e2fb Service management and monitoring

tangwang
2026-03-12 23:31:59 +0800

11 Mar, 2026

4 commits

28e57bb1 Logging system optimization

tangwang
2026-03-11 23:04:17 +0800

af7ee060 Simplify service_ctl to an "explicit service list" mode

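Removed the START_* control-variable logic; by default only the core services backend/indexer/frontend are started.
Optional services are now started with an explicit command: ./scripts/service_ctl.sh start embedding translator reranker tei cnclip.
The translator port is now read only from TRANSLATION_PORT (TRANSLATOR_PORT compatibility removed).
Strict validation of unknown service names is kept.
Key file: service_ctl.sh

"Duplicate/ambiguous name" fixes
Frontend port naming is unified: FRONTEND_PORT is primary, PORT is only a fallback.
start_frontend.sh explicitly exports PORT="${FRONTEND_PORT}", which avoids the case where FRONTEND_PORT is configured but the service still runs on 6003.
Files: start_frontend.sh, frontend_server.py, env_config.py

Log/PID naming cleanup, continued
The unified rule logs/<service>.log, logs/<service>.pid continues to be rolled out.
cnclip keeps logs/cnclip.log + logs/cnclip.pid.
Files: service_ctl.sh, start_cnclip_service.sh, stop_cnclip_service.sh

Startup style unified with backend/indexer, with related gaps filled in
frontend/translator are also aligned on set -euo pipefail and use exec to launch the main process directly.
Files: start_frontend.sh, start_translator.sh, start_backend.sh, start_indexer.sh

Legacy entry point cleanup
Deleted: start_servers.py, stop_reranker.sh, stop_translator.sh.
The reranker stop logic is merged into service_ctl (including VLLM::EngineCore cleanup).
The benchmark script now uses the unified entry point: service_ctl.sh stop reranker.
File: benchmark_reranker_1000docs.sh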
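The explicit service list and the port precedence rule can be sketched in Python as below. The real logic lives in shell (`service_ctl.sh`, `start_frontend.sh`), so the function names here are hypothetical. Two assumptions are baked in: that naming services starts exactly those services, and that 6003 is the frontend default (inferred from the message above).

```python
from typing import Iterable, List, Mapping

CORE_SERVICES = ("backend", "indexer", "frontend")
OPTIONAL_SERVICES = ("embedding", "translator", "reranker", "tei", "cnclip")
ASSUMED_DEFAULT_FRONTEND_PORT = 6003  # inferred from the commit message


def select_services(requested: Iterable[str]) -> List[str]:
    """Return the services to start: core by default, else exactly those named."""
    names = list(requested)
    if not names:
        return list(CORE_SERVICES)
    known = set(CORE_SERVICES) | set(OPTIONAL_SERVICES)
    unknown = [n for n in names if n not in known]
    if unknown:
        # Strict validation: a typo fails loudly instead of being ignored.
        raise ValueError(f"unknown service(s): {', '.join(unknown)}")
    return names


def resolve_frontend_port(env: Mapping[str, str]) -> int:
    """FRONTEND_PORT wins; PORT is only a fallback."""
    for name in ("FRONTEND_PORT", "PORT"):
        value = env.get(name, "").strip()
        if value:
            return int(value)
    return ASSUMED_DEFAULT_FRONTEND_PORT
```

For example, `select_services(["embedding", "tei"])` returns just those two names, while `select_services([])` falls back to the three core services.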

2026-03-11 22:39:39 +0800

7fbca0d7 Startup script optimization

tangwang
2026-03-11 19:23:57 +0800
9f5994b4 reranker

tangwang
2026-03-11 14:26:34 +0800

10 Mar, 2026

2 commits

200fdddf embed norm

tangwang
2026-03-10 17:56:28 +0800

c7e80cc2 The new .env management mechanism is as follows:

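1. New `scripts/init_env.sh`
- If `.env` does not exist, it is created by copying `.env.example`
- Supports `--force`: overwrites `.env` and keeps a backup as `.env.bak`
- For first-time setup, always run: `./scripts/init_env.sh`

2. Unified loading logic in `scripts/lib/load_env.sh`
- Removed the duplicated parsing logic from `activate.sh` and `service_ctl.sh`
- Both now use the shared `load_env_file`, which exports safely via `eval "$(printf 'export %s=%q\n' "$key" "$value")"`
- Supports values containing special characters such as `#`, `$`, and spaces (they must be quoted in `.env`)

3. Usage
- **activate.sh**: runs `source scripts/lib/load_env.sh`, then calls `load_env_file`
- **service_ctl.sh**: same as above; the inline `load_env_file` implementation is removed
- **create_tenant_index.sh**: now uses the shared loader instead of `set -a; source .env`

4. Documentation updates
- **README.md**: added `./scripts/init_env.sh` to the quick start
- **docs/QUICKSTART.md**: explains how to use `init_env.sh` and stresses that passwords containing special characters must be quoted
- **.env.example**: added comments explaining the quoting rules

5. setup.sh
- Uses `./scripts/init_env.sh` in place of the previous `cp .env.example .env`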
---

**Recommended workflow**:
```bash
./scripts/create_venv.sh
./scripts/init_env.sh     # generate a local .env from .env.example
source activate.sh
./run.sh
```

**Writing passwords**: if a password contains `#`, `$`, `&`, spaces, or similar characters, wrap it in quotes, for example:
```env
DB_PASSWORD="qY8tgodLoA&KTyQ"
ES_PASSWORD="4hOaLaf41y2VuI8y"
```

2026-03-10 10:40:14 +0800

09 Mar, 2026

2 commits

07cf5a93 START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 ...
```
CNCLIP_DEVICE=cuda TEI_USE_GPU=1 ./scripts/service_ctl.sh start
Search backend + indexer + test frontend + 4 microservices all running end to end
```
tangwang
2026-03-09 23:29:07 +0800
cc11ae04 cnclip

tangwang
2026-03-09 13:26:40 +0800

07 Mar, 2026

1 commit

d1d356f8 Script optimization

tangwang
2026-03-07 11:48:59 +0800