Commit af7ee060e8748965f823930ecc840238d66ecfbf

Authored by tangwang
1 parent bb6420d3

service_ctl simplified to an "explicit service list" model

Removed the START_* control-variable logic; by default only the core services backend/indexer/frontend are started.
Optional services are now started explicitly: ./scripts/service_ctl.sh start embedding translator reranker tei cnclip.
The translator port is now read uniformly from TRANSLATION_PORT (the TRANSLATOR_PORT compatibility fallback is removed).
Strict validation of unknown service names is retained.
Key file: service_ctl.sh
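The strict check reduces to a `case` whitelist; a minimal sketch mirroring the `service_exists` helper added to `service_ctl.sh` (service names copied from the diff below):

```shell
#!/bin/sh
# Whitelist of known services; anything else is rejected up front.
service_exists() {
  case "$1" in
    backend|indexer|frontend|embedding|translator|reranker|tei|cnclip) return 0 ;;
    *) return 1 ;;
  esac
}

service_exists cnclip && echo "cnclip: known"
service_exists foobar || echo "foobar: unknown"
```

Rejecting typos before any start/stop work means a misspelled target fails loudly instead of silently doing nothing.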
"Duplicate/ambiguous naming" fixes

Frontend port naming unified: FRONTEND_PORT is primary, PORT is only a fallback.
start_frontend.sh now explicitly exports PORT="${FRONTEND_PORT}", fixing the case where FRONTEND_PORT was configured but the service still ran on 6003.
Files: start_frontend.sh, frontend_server.py, env_config.py
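The precedence is FRONTEND_PORT, then PORT, then 6003. A shell sketch of that chain (the real resolution is Python, `os.getenv('FRONTEND_PORT', os.getenv('PORT', 6003))` in frontend_server.py):

```shell
#!/bin/sh
# FRONTEND_PORT is canonical, PORT a secondary fallback, 6003 the default.
resolve_frontend_port() {
  echo "${FRONTEND_PORT:-${PORT:-6003}}"
}

( unset FRONTEND_PORT PORT;       resolve_frontend_port )  # 6003
( FRONTEND_PORT=7003; PORT=9999;  resolve_frontend_port )  # 7003
( unset FRONTEND_PORT; PORT=9999; resolve_frontend_port )  # 9999
```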
Log/PID naming cleanup, continued

The unified rule logs/&lt;service&gt;.log and logs/&lt;service&gt;.pid continues to roll out.
cnclip now follows it as logs/cnclip.log + logs/cnclip.pid.
Files: service_ctl.sh, start_cnclip_service.sh, stop_cnclip_service.sh
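With the `cnclip_service.pid` special case gone, both helpers collapse to a single template (sketch of the simplified `pid_file`/`log_file` shape from `service_ctl.sh`):

```shell
#!/bin/sh
LOG_DIR="logs"

# One naming rule for every service -- no per-service special cases left.
pid_file() { echo "${LOG_DIR}/$1.pid"; }
log_file() { echo "${LOG_DIR}/$1.log"; }

pid_file cnclip   # logs/cnclip.pid (was logs/cnclip_service.pid)
log_file backend  # logs/backend.log
```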
Remaining startup scripts aligned with the backend/indexer style

frontend/translator now also use set -euo pipefail and exec the main process directly.
Files: start_frontend.sh, start_translator.sh, start_backend.sh, start_indexer.sh
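Two effects worth noting: under `set -u` every unset variable becomes a hard error, so defaults must be spelled `${VAR:-default}`; and `exec` replaces the wrapper shell with the service process, so the recorded PID and any signals target the real process. A POSIX-sh sketch of the `-u`-safe default (the actual scripts use bash and also enable `pipefail`):

```shell
#!/bin/sh
set -eu   # -e: abort on error; -u: unset variables are fatal

unset API_PORT || true    # make sure the variable really is unset
echo "${API_PORT:-6002}"  # the :- default is the only safe spelling under -u
# A bare "$API_PORT" here would terminate the script instead.
```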
Legacy entrypoint cleanup

Deleted: start_servers.py, stop_reranker.sh, stop_translator.sh.
Reranker stop logic moved into service_ctl (including VLLM::EngineCore orphan cleanup).
The benchmark script now uses the unified entrypoint: service_ctl.sh stop reranker.
File: benchmark_reranker_1000docs.sh
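The benchmark keeps its cleanup on an EXIT trap; only the command inside it changed to the unified entrypoint. Sketch of the pattern (the real trap body, `./scripts/service_ctl.sh stop reranker`, is stubbed with an echo here):

```shell
#!/bin/sh
# EXIT trap: cleanup fires on success, failure, or interrupt alike.
cleanup() {
  # real version: ./scripts/service_ctl.sh stop reranker >/dev/null 2>&1 || true
  echo "stop reranker"
}
trap cleanup EXIT

echo "benchmark running"
```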
.env.example
... ... @@ -24,8 +24,10 @@ INDEXER_HOST=0.0.0.0
24 24 INDEXER_PORT=6004
25 25  
26 26 # Optional service ports
  27 +FRONTEND_PORT=6003
27 28 EMBEDDING_PORT=6005
28 29 TEI_PORT=8080
  30 +CNCLIP_PORT=51000
29 31 TRANSLATION_PORT=6006
30 32 RERANKER_PORT=6007
31 33 EMBEDDING_SERVICE_URL=http://127.0.0.1:6005
... ... @@ -35,7 +37,7 @@ TRANSLATION_PROVIDER=direct
35 37 TRANSLATION_MODEL=qwen
36 38 EMBEDDING_BACKEND=tei
37 39 TEI_BASE_URL=http://127.0.0.1:8080
38   -TEI_USE_GPU=1
  40 +TEI_DEVICE=cuda
39 41 TEI_VERSION=1.9
40 42 TEI_MAX_BATCH_TOKENS=2048
41 43 TEI_MAX_CLIENT_BATCH_SIZE=8
... ... @@ -43,13 +45,6 @@ TEI_HEALTH_TIMEOUT_SEC=300
43 45 RERANK_PROVIDER=http
44 46 RERANK_BACKEND=qwen3_vllm
45 47  
46   -# Optional startup switches (run.sh / scripts/service_ctl.sh)
47   -START_EMBEDDING=0
48   -START_TRANSLATOR=0
49   -START_RERANKER=0
50   -START_TEI=0
51   -START_CNCLIP=0
52   -
53 48 # Cache Directory
54 49 CACHE_DIR=.cache
55 50  
... ...
README.md
... ... @@ -29,12 +29,15 @@ source activate.sh
29 29 ./run.sh
30 30  
31 31 # 可选:附加能力服务(按需开启)
32   -START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./run.sh
  32 +./scripts/service_ctl.sh start embedding translator reranker tei cnclip
33 33  
34 34 # 查看状态
35 35 ./scripts/service_ctl.sh status
36 36 ```
37 37  
  38 +服务管理全盘说明(入口职责、默认行为、全量启停方式)见:
  39 +- `docs/Usage-Guide.md` -> `服务管理总览`
  40 +
38 41 核心端口:
39 42  
40 43 - `6002` backend(`/search/*`, `/admin/*`)
... ...
config/env_config.py
... ... @@ -62,7 +62,7 @@ INDEXER_PORT = int(os.getenv('INDEXER_PORT', 6004))
62 62 EMBEDDING_HOST = os.getenv('EMBEDDING_HOST', '127.0.0.1')
63 63 EMBEDDING_PORT = int(os.getenv('EMBEDDING_PORT', 6005))
64 64 TRANSLATION_HOST = os.getenv('TRANSLATION_HOST', '127.0.0.1')
65   -TRANSLATION_PORT = int(os.getenv('TRANSLATION_PORT', os.getenv('TRANSLATOR_PORT', 6006)))
  65 +TRANSLATION_PORT = int(os.getenv('TRANSLATION_PORT', 6006))
66 66 TRANSLATION_PROVIDER = os.getenv('TRANSLATION_PROVIDER', 'direct')
67 67 TRANSLATION_MODEL = os.getenv('TRANSLATION_MODEL', 'qwen')
68 68 RERANKER_HOST = os.getenv('RERANKER_HOST', '127.0.0.1')
... ...
docs/CNCLIP_SERVICE说明文档.md
... ... @@ -70,7 +70,8 @@ cd /data/saas-search
70 70 ### 6.1 通过统一编排启动
71 71  
72 72 ```bash
73   -START_EMBEDDING=1 START_TEI=1 START_CNCLIP=1 ./scripts/service_ctl.sh start
  73 +./scripts/service_ctl.sh start cnclip
  74 +# 或一次启动可选能力:./scripts/service_ctl.sh start embedding tei cnclip
74 75 ```
75 76  
76 77 ### 6.2 设备选择优先级
... ... @@ -110,6 +111,11 @@ cat third-party/clip-as-service/server/torch-flow-temp.yml
110 111 - GPU 模式:`device: 'cuda'`
111 112 - CPU 模式:`device: 'cpu'`
112 113  
  114 +### 7.2.1 日志与 PID 文件
  115 +
  116 +- 日志:`logs/cnclip.log`
  117 +- PID:`logs/cnclip.pid`
  118 +
113 119 ### 7.3 发送一次编码请求(触发模型加载)
114 120  
115 121 ```bash
... ...
docs/QUICKSTART.md
... ... @@ -66,18 +66,20 @@ source activate.sh
66 66 ```bash
67 67 ./run.sh
68 68 # 启动全部能力
69   -START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./run.sh
70   -# 等价方式(直接使用服务控制器)
71   -START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./scripts/service_ctl.sh start
  69 +# 追加可选能力服务(显式指定)
  70 +TEI_DEVICE=cuda CNCLIP_DEVICE=cuda ./scripts/service_ctl.sh start tei cnclip embedding translator reranker
72 71 # 说明:
73 72 # - reranker 为 GPU 强制模式(资源不足会直接启动失败)
74   -# - TEI 默认使用 GPU;当 TEI_USE_GPU=1 且 GPU 不可用时会直接失败(不会自动降级到 CPU)
  73 +# - TEI 默认使用 GPU;当 TEI_DEVICE=cuda 且 GPU 不可用时会直接失败(不会自动降级到 CPU)
75 74 # - cnclip 默认使用 cuda;若显式配置为 cuda 且 GPU 不可用会直接失败(不会自动降级到 cpu)
76 75  
77 76 ./scripts/service_ctl.sh status
78 77 ./scripts/stop.sh
79 78 ```
80 79  
  80 +服务管理方式(入口职责、默认行为、全量拉起顺序)见:
  81 +- `docs/Usage-Guide.md` -> `服务管理总览`
  82 +
81 83 ### 1.3 常用 API 请求示例
82 84  
83 85 #### 搜索 API(backend 6002)
... ... @@ -135,7 +137,7 @@ API 文档:`http://localhost:6004/docs`
135 137 ```bash
136 138 # TEI(文本向量后端,默认)
137 139 # GPU(需 nvidia-container-toolkit)
138   -TEI_USE_GPU=1 ./scripts/start_tei_service.sh
  140 +TEI_DEVICE=cuda ./scripts/start_tei_service.sh
139 141  
140 142 # Embedding API(会校验 TEI /health)
141 143 ./scripts/start_embedding_service.sh
... ... @@ -151,7 +153,7 @@ curl -X POST http://localhost:6005/embed/image \
151 153  
152 154 说明:
153 155 - TEI 默认镜像按 `TEI_VERSION` 组装:`cuda-<version>`(默认 `1.9`)。
154   -- `TEI_USE_GPU=1` 时会严格校验 Docker GPU runtime;未配置会直接报错退出。
  156 +- `TEI_DEVICE=cuda` 时会严格校验 Docker GPU runtime;未配置会直接报错退出。
155 157 - `/embed/image` 依赖 `cnclip`(`grpc://127.0.0.1:51000`),未启动时 embedding 服务会启动失败。
156 158  
157 159 #### Translator 服务(6006)
... ... @@ -514,6 +516,8 @@ curl http://localhost:6007/health
514 516 - `logs/embedding.log`
515 517 - `logs/translator.log`
516 518 - `logs/reranker.log`
  519 +- `logs/tei.log`
  520 +- `logs/cnclip.log`
517 521 - `logs/search_engine.log`
518 522 - `logs/errors.log`
519 523  
... ...
docs/TEI_SERVICE说明文档.md
... ... @@ -63,13 +63,13 @@ docker info --format '{{json .Runtimes}}'
63 63 ### 5.1 GPU 模式启动(默认)
64 64  
65 65 ```bash
66   -TEI_USE_GPU=1 ./scripts/start_tei_service.sh
  66 +TEI_DEVICE=cuda ./scripts/start_tei_service.sh
67 67 ```
68 68  
69 69 预期输出包含:
70 70  
71 71 - `Image: ghcr.io/huggingface/text-embeddings-inference:turing-...` 或 `cuda-...`(脚本按 GPU 架构自动选择)
72   -- `Mode: gpu`
  72 +- `Mode: cuda`
73 73 - `TEI is ready and output probe passed: http://127.0.0.1:8080`
74 74  
75 75 说明:
... ... @@ -79,7 +79,7 @@ TEI_USE_GPU=1 ./scripts/start_tei_service.sh
79 79 ### 5.2 CPU 模式启动(显式)
80 80  
81 81 ```bash
82   -TEI_USE_GPU=0 ./scripts/start_tei_service.sh
  82 +TEI_DEVICE=cpu ./scripts/start_tei_service.sh
83 83 ```
84 84  
85 85 预期输出包含:
... ... @@ -135,7 +135,7 @@ curl -sS -X POST "http://127.0.0.1:6005/embed/text" \
135 135  
136 136 `scripts/start_tei_service.sh` 支持下列变量:
137 137  
138   -- `TEI_USE_GPU`:`1/0`(或 `true/false`),默认 `1`
  138 +- `TEI_DEVICE`:`cuda/cpu`,默认 `cuda`
139 139 - `TEI_CONTAINER_NAME`:容器名,默认 `saas-search-tei`
140 140 - `TEI_PORT`:宿主机端口,默认 `8080`
141 141 - `TEI_MODEL_ID`:默认 `Qwen/Qwen3-Embedding-0.6B`
... ... @@ -152,7 +152,7 @@ curl -sS -X POST "http://127.0.0.1:6005/embed/text" \
152 152 启动全套(含 TEI):
153 153  
154 154 ```bash
155   -START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 TEI_USE_GPU=1 ./scripts/service_ctl.sh start
  155 +TEI_DEVICE=cuda ./scripts/service_ctl.sh start embedding translator reranker tei cnclip
156 156 ```
157 157  
158 158 仅启动 TEI:
... ... @@ -167,9 +167,11 @@ START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1
167 167 ./scripts/service_ctl.sh status tei
168 168 ```
169 169  
  170 +日志文件:`logs/tei.log`
  171 +
170 172 ## 9. 常见问题与排障
171 173  
172   -### 9.1 `ERROR: TEI_USE_GPU=1 but Docker nvidia runtime is not configured`
  174 +### 9.1 `ERROR: TEI_DEVICE=cuda but Docker nvidia runtime is not configured`
173 175  
174 176 - 原因:Docker 未配置 NVIDIA runtime。
175 177 - 处理:按本文 4.2 配置后重启 Docker。
... ... @@ -181,7 +183,7 @@ START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1
181 183  
182 184 ```bash
183 185 ./scripts/stop_tei_service.sh
184   -TEI_USE_GPU=0 ./scripts/start_tei_service.sh # 或改为 1
  186 +TEI_DEVICE=cpu ./scripts/start_tei_service.sh # 或改为 cuda
185 187 ```
186 188  
187 189 ### 9.3 embedding 服务报 TEI 不可达
... ...
docs/Usage-Guide.md
... ... @@ -8,10 +8,11 @@
8 8  
9 9 1. [环境准备](#环境准备)
10 10 2. [服务启动](#服务启动)
11   -3. [配置说明](#配置说明)
12   -4. [查看日志](#查看日志)
13   -5. [测试验证](#测试验证)
14   -6. [常见问题](#常见问题)
  11 +3. [服务管理总览](#全盘串讲服务管理方式)
  12 +4. [配置说明](#配置说明)
  13 +5. [查看日志](#查看日志)
  14 +6. [测试验证](#测试验证)
  15 +7. [常见问题](#常见问题)
15 16  
16 17 ---
17 18  
... ... @@ -50,10 +51,10 @@ TEI 文本向量服务使用 Docker 容器:
50 51  
51 52 ```bash
52 53 # GPU(需 nvidia-container-toolkit)
53   -TEI_USE_GPU=1 ./scripts/start_tei_service.sh
  54 +TEI_DEVICE=cuda ./scripts/start_tei_service.sh
54 55  
55 56 # CPU
56   -TEI_USE_GPU=0 ./scripts/start_tei_service.sh
  57 +TEI_DEVICE=cpu ./scripts/start_tei_service.sh
57 58 ```
58 59  
59 60 专项说明:
... ... @@ -138,7 +139,7 @@ cd /data/saas-search
138 139 可选:全功能模式(同时启动 embedding/translator/reranker/tei/cnclip):
139 140  
140 141 ```bash
141   -START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./run.sh
  142 +TEI_DEVICE=cuda CNCLIP_DEVICE=cuda ./scripts/service_ctl.sh start tei cnclip embedding translator reranker
142 143 ```
143 144  
144 145 ### 方式2: 统一控制脚本(推荐)
... ... @@ -241,6 +242,49 @@ cd frontend
241 242 python -m http.server 6003
242 243 ```
243 244  
  245 +## 服务管理总览
  246 +
  247 +### 1) 入口脚本职责
  248 +
  249 +- `./run.sh`:仅启动核心服务(`backend/indexer/frontend`)。
  250 +- `./restart.sh`:重启逻辑为“先停所有已知服务,再启动核心服务”。
  251 +- `./scripts/stop.sh`:停止所有已知服务。
  252 +- `./scripts/service_ctl.sh`:统一控制器,支持 `start/stop/restart/status`,是唯一推荐入口。
  253 +
  254 +### 2) `service_ctl.sh` 的默认行为
  255 +
  256 +- `start`(不带服务名):启动核心服务 `backend/indexer/frontend`。
  257 +- `stop`(不带服务名):停止全部已知服务(含可选服务)。
  258 +- `restart`(不带服务名):先停全部,再只启动核心服务。
  259 +- `status`(不带服务名):显示全部已知服务状态。
  260 +
  261 +### 3) 全量服务一键拉起
  262 +
  263 +```bash
  264 +TEI_DEVICE=cuda CNCLIP_DEVICE=cuda ./scripts/service_ctl.sh start tei cnclip embedding translator reranker
  265 +```
  266 +
  267 +说明:
  268 +- `TEI_DEVICE` / `CNCLIP_DEVICE` 统一使用 `cuda|cpu`。
  269 +- 显式把 `tei`、`cnclip` 放在前面,避免 `embedding` 因依赖未就绪启动失败。
  270 +
  271 +### 4) 常用运维命令
  272 +
  273 +```bash
  274 +# 先重启核心,再拉起可选服务(最常用)
  275 +./restart.sh
  276 +TEI_DEVICE=cuda CNCLIP_DEVICE=cuda ./scripts/service_ctl.sh start tei cnclip embedding translator reranker
  277 +
  278 +# 查看全量状态
  279 +./scripts/service_ctl.sh status
  280 +
  281 +# 仅重启某个服务
  282 +./scripts/service_ctl.sh restart embedding
  283 +
  284 +# 停止全部
  285 +./scripts/service_ctl.sh stop
  286 +```
  287 +
244 288 ### 停止服务
245 289  
246 290 ```bash
... ... @@ -303,16 +347,12 @@ INDEXER_HOST=0.0.0.0
303 347 INDEXER_PORT=6004
304 348  
305 349 # Optional service ports
  350 +FRONTEND_PORT=6003
306 351 EMBEDDING_PORT=6005
  352 +TEI_PORT=8080
  353 +CNCLIP_PORT=51000
307 354 TRANSLATION_PORT=6006
308 355 RERANKER_PORT=6007
309   -
310   -# Optional startup switches (for run.sh / service_ctl.sh)
311   -START_EMBEDDING=0
312   -START_TRANSLATOR=0
313   -START_RERANKER=0
314   -START_TEI=0
315   -START_CNCLIP=0
316 356 ```
317 357  
318 358 ### 修改配置
... ... @@ -334,6 +374,8 @@ START_CNCLIP=0
334 374 - `logs/embedding.log` - 向量服务日志(可选)
335 375 - `logs/translator.log` - 翻译服务日志(可选)
336 376 - `logs/reranker.log` - 重排服务日志(可选)
  377 +- `logs/tei.log` - TEI 启停日志(可选)
  378 +- `logs/cnclip.log` - CN-CLIP 启停日志(可选)
337 379 - `logs/search_engine.log` - 应用主日志(按天轮转)
338 380 - `logs/errors.log` - 错误日志(按天轮转)
339 381  
... ...
docs/搜索API对接指南.md
... ... @@ -1655,6 +1655,7 @@ curl -X POST "http://localhost:6004/indexer/enrich-content" \
1655 1655 | 重排服务 | 6007 | `http://localhost:6007` | 对检索结果进行二次排序 |
1656 1656  
1657 1657 生产环境请将 `localhost` 替换为实际服务地址。
  1658 +服务管理入口与完整启停规则见:`docs/Usage-Guide.md` -> `服务管理总览`。
1658 1659  
1659 1660 ### 7.1 向量服务(Embedding)
1660 1661  
... ... @@ -1663,7 +1664,7 @@ curl -X POST "http://localhost:6004/indexer/enrich-content" \
1663 1664 - **依赖**:
1664 1665 - 文本向量后端默认走 TEI(`http://127.0.0.1:8080`)
1665 1666 - 图片向量依赖 `cnclip`(`grpc://127.0.0.1:51000`)
1666   - - TEI 默认使用 GPU(`TEI_USE_GPU=1`);当配置为 GPU 且不可用时会启动失败(不会自动降级到 CPU)
  1667 + - TEI 默认使用 GPU(`TEI_DEVICE=cuda`);当配置为 GPU 且不可用时会启动失败(不会自动降级到 CPU)
1667 1668 - cnclip 默认使用 `cuda`;若配置为 `cuda` 但 GPU 不可用会启动失败(不会自动降级到 `cpu`)
1668 1669  
1669 1670 #### 7.1.1 `POST /embed/text` — 文本向量化
... ...
embeddings/README.md
... ... @@ -65,10 +65,10 @@
65 65  
66 66 ```bash
67 67 # GPU(需 nvidia-container-toolkit)
68   -TEI_USE_GPU=1 ./scripts/start_tei_service.sh
  68 +TEI_DEVICE=cuda ./scripts/start_tei_service.sh
69 69  
70 70 # CPU
71   -TEI_USE_GPU=0 ./scripts/start_tei_service.sh
  71 +TEI_DEVICE=cpu ./scripts/start_tei_service.sh
72 72  
73 73 ./scripts/start_embedding_service.sh
74 74 ```
... ... @@ -82,4 +82,4 @@ TEI_USE_GPU=0 ./scripts/start_tei_service.sh
82 82 - `IMAGE_NORMALIZE_EMBEDDINGS`(默认 true)
83 83 - `USE_CLIP_AS_SERVICE`, `CLIP_AS_SERVICE_SERVER`:图片向量(clip-as-service)
84 84 - `IMAGE_MODEL_NAME`, `IMAGE_DEVICE`:本地 CN-CLIP(当 `USE_CLIP_AS_SERVICE=false` 时)
85   -- TEI 相关:`TEI_USE_GPU`、`TEI_VERSION`、`TEI_MAX_BATCH_TOKENS`、`TEI_MAX_CLIENT_BATCH_SIZE`、`TEI_HEALTH_TIMEOUT_SEC`
  85 +- TEI 相关:`TEI_DEVICE`、`TEI_VERSION`、`TEI_MAX_BATCH_TOKENS`、`TEI_MAX_CLIENT_BATCH_SIZE`、`TEI_HEALTH_TIMEOUT_SEC`
... ...
reranker/DEPLOYMENT_AND_TUNING.md
... ... @@ -78,7 +78,7 @@ services:
78 78  
79 79 ```bash
80 80 ./scripts/start_reranker.sh
81   -./scripts/stop_reranker.sh
  81 +./scripts/service_ctl.sh stop reranker
82 82 ```
83 83  
84 84 健康检查:
... ...
restart.sh
... ... @@ -2,6 +2,8 @@
2 2  
3 3 # Unified restart script for saas-search services
4 4  
  5 +set -euo pipefail
  6 +
5 7 cd "$(dirname "$0")"
6 8  
7 9 ./scripts/service_ctl.sh restart
... ...
... ... @@ -2,6 +2,8 @@
2 2  
3 3 # Unified startup script for saas-search services
4 4  
  5 +set -euo pipefail
  6 +
5 7 cd "$(dirname "$0")"
6 8  
7 9 ./scripts/service_ctl.sh start
... ...
scripts/benchmark_reranker_1000docs.sh
... ... @@ -33,7 +33,7 @@ TMP_CASES="/tmp/rerank_1000_shortdocs_cases.json"
33 33 mkdir -p "${OUT_DIR}"
34 34  
35 35 cleanup() {
36   - ./scripts/stop_reranker.sh >/dev/null 2>&1 || true
  36 + ./scripts/service_ctl.sh stop reranker >/dev/null 2>&1 || true
37 37 }
38 38 trap cleanup EXIT
39 39  
... ...
scripts/frontend_server.py
... ... @@ -30,8 +30,8 @@ BACKEND_PROXY_URL = os.getenv('BACKEND_PROXY_URL', 'http://127.0.0.1:6002').rstr
30 30 frontend_dir = os.path.join(os.path.dirname(__file__), '../frontend')
31 31 os.chdir(frontend_dir)
32 32  
33   -# Get port from environment variable or default
34   -PORT = int(os.getenv('PORT', 6003))
  33 +# FRONTEND_PORT is the canonical config; keep PORT as a secondary fallback.
  34 +PORT = int(os.getenv('FRONTEND_PORT', os.getenv('PORT', 6003)))
35 35  
36 36 # Configure logging to suppress scanner noise
37 37 logging.basicConfig(level=logging.ERROR, format='%(asctime)s - %(levelname)s - %(message)s')
... ...
scripts/service_ctl.sh
... ... @@ -28,7 +28,7 @@ get_port() {
28 28 indexer) echo "${INDEXER_PORT:-6004}" ;;
29 29 frontend) echo "${FRONTEND_PORT:-6003}" ;;
30 30 embedding) echo "${EMBEDDING_PORT:-6005}" ;;
31   - translator) echo "${TRANSLATION_PORT:-${TRANSLATOR_PORT:-6006}}" ;;
  31 + translator) echo "${TRANSLATION_PORT:-6006}" ;;
32 32 reranker) echo "${RERANKER_PORT:-6007}" ;;
33 33 tei) echo "${TEI_PORT:-8080}" ;;
34 34 cnclip) echo "${CNCLIP_PORT:-51000}" ;;
... ... @@ -38,10 +38,7 @@ get_port() {
38 38  
39 39 pid_file() {
40 40 local service="$1"
41   - case "${service}" in
42   - cnclip) echo "${LOG_DIR}/cnclip_service.pid" ;;
43   - *) echo "${LOG_DIR}/${service}.pid" ;;
44   - esac
  41 + echo "${LOG_DIR}/${service}.pid"
45 42 }
46 43  
47 44 log_file() {
... ... @@ -64,6 +61,24 @@ service_start_cmd() {
64 61 esac
65 62 }
66 63  
  64 +service_exists() {
  65 + local service="$1"
  66 + case "${service}" in
  67 + backend|indexer|frontend|embedding|translator|reranker|tei|cnclip) return 0 ;;
  68 + *) return 1 ;;
  69 + esac
  70 +}
  71 +
  72 +validate_targets() {
  73 + local targets="$1"
  74 + for svc in ${targets}; do
  75 + if ! service_exists "${svc}"; then
  76 + echo "[error] unknown service: ${svc}" >&2
  77 + return 1
  78 + fi
  79 + done
  80 +}
  81 +
67 82 wait_for_health() {
68 83 local service="$1"
69 84 local max_retries="${2:-30}"
... ... @@ -188,9 +203,15 @@ start_one() {
188 203 cnclip|tei)
189 204 echo "[start] ${service} (managed by native script)"
190 205 if [ "${service}" = "cnclip" ]; then
191   - CNCLIP_DEVICE="${CNCLIP_DEVICE:-cuda}" bash -lc "${cmd}" >> "${lf}" 2>&1
  206 + if ! CNCLIP_DEVICE="${CNCLIP_DEVICE:-cuda}" bash -lc "${cmd}" >> "${lf}" 2>&1; then
  207 + echo "[error] ${service} start script failed, inspect ${lf}" >&2
  208 + return 1
  209 + fi
192 210 else
193   - bash -lc "${cmd}" >> "${lf}" 2>&1
  211 + if ! bash -lc "${cmd}" >> "${lf}" 2>&1; then
  212 + echo "[error] ${service} start script failed, inspect ${lf}" >&2
  213 + return 1
  214 + fi
194 215 fi
195 216 if [ "${service}" = "tei" ]; then
196 217 if is_running_tei_container; then
... ... @@ -244,6 +265,24 @@ start_one() {
244 265 esac
245 266 }
246 267  
  268 +cleanup_reranker_orphans() {
  269 + local engine_pids
  270 + engine_pids="$(pgrep -f 'VLLM::EngineCore' 2>/dev/null || true)"
  271 + if [ -z "${engine_pids}" ]; then
  272 + return 0
  273 + fi
  274 +
  275 + echo "[stop] reranker orphan engines=${engine_pids}"
  276 + for pid in ${engine_pids}; do
  277 + kill -TERM "${pid}" 2>/dev/null || true
  278 + done
  279 + sleep 1
  280 + engine_pids="$(pgrep -f 'VLLM::EngineCore' 2>/dev/null || true)"
  281 + for pid in ${engine_pids}; do
  282 + kill -KILL "${pid}" 2>/dev/null || true
  283 + done
  284 +}
  285 +
247 286 stop_one() {
248 287 local service="$1"
249 288 cd "${PROJECT_ROOT}"
... ... @@ -257,11 +296,6 @@ stop_one() {
257 296 bash -lc "./scripts/stop_tei_service.sh" || true
258 297 return 0
259 298 fi
260   - if [ "${service}" = "reranker" ]; then
261   - echo "[stop] reranker (managed by native script)"
262   - bash -lc "./scripts/stop_reranker.sh" || true
263   - return 0
264   - fi
265 299  
266 300 local pf
267 301 pf="$(pid_file "${service}")"
... ... @@ -297,6 +331,10 @@ stop_one() {
297 331 done
298 332 fi
299 333 fi
  334 +
  335 + if [ "${service}" = "reranker" ]; then
  336 + cleanup_reranker_orphans
  337 + fi
300 338 }
301 339  
302 340 status_one() {
... ... @@ -340,13 +378,7 @@ resolve_targets() {
340 378  
341 379 case "${scope}" in
342 380 start)
343   - local targets=("${CORE_SERVICES[@]}")
344   - if [ "${START_TEI:-0}" = "1" ]; then targets+=("tei"); fi
345   - if [ "${START_CNCLIP:-0}" = "1" ]; then targets+=("cnclip"); fi
346   - if [ "${START_EMBEDDING:-0}" = "1" ]; then targets+=("embedding"); fi
347   - if [ "${START_TRANSLATOR:-0}" = "1" ]; then targets+=("translator"); fi
348   - if [ "${START_RERANKER:-0}" = "1" ]; then targets+=("reranker"); fi
349   - echo "${targets[@]}"
  381 + echo "${CORE_SERVICES[@]}"
350 382 ;;
351 383 stop|status)
352 384 echo "$(all_services)"
... ... @@ -370,14 +402,14 @@ Usage:
370 402 ./scripts/service_ctl.sh status [service...]
371 403  
372 404 Default target set (when no service provided):
373   - start -> backend indexer frontend (+ optional by env flags)
  405 + start -> backend indexer frontend
374 406 stop -> all known services
375 407 restart -> stop all known services, then start with start targets
376 408 status -> all known services
377 409  
378   -Optional startup flags:
379   - START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./run.sh
380   - START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./scripts/service_ctl.sh start
  410 +Optional service startup:
  411 + ./scripts/service_ctl.sh start embedding translator reranker tei cnclip
  412 + TEI_DEVICE=cuda|cpu ./scripts/service_ctl.sh start tei
381 413 CNCLIP_DEVICE=cuda|cpu ./scripts/service_ctl.sh start cnclip
382 414 EOF
383 415 }
... ... @@ -404,6 +436,7 @@ main() {
404 436 usage
405 437 exit 1
406 438 fi
  439 + validate_targets "${targets}"
407 440  
408 441 case "${action}" in
409 442 start)
... ...
scripts/start.sh
1 1 #!/bin/bash
2 2  
3   -# Backward-compatible start entrypoint.
  3 +# Service start entrypoint.
4 4 # Delegates to unified service controller.
5 5  
6   -set -e
  6 +set -euo pipefail
7 7  
8 8 cd "$(dirname "$0")/.."
9 9  
... ... @@ -11,12 +11,12 @@ echo "========================================"
11 11 echo "saas-search 服务启动"
12 12 echo "========================================"
13 13 echo "默认启动核心服务: backend/indexer/frontend"
14   -echo "可选服务通过环境变量开启:"
15   -echo " START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 START_CNCLIP=1 ./run.sh"
  14 +echo "可选服务请显式指定:"
  15 +echo " ./scripts/service_ctl.sh start embedding translator reranker tei cnclip"
16 16 echo
17 17  
18 18 ./scripts/service_ctl.sh start
19 19  
20 20 echo
21 21 echo "当前服务状态:"
22   -./scripts/service_ctl.sh status backend indexer frontend embedding translator reranker tei
  22 +./scripts/service_ctl.sh status backend indexer frontend embedding translator reranker tei cnclip
... ...
scripts/start_backend.sh
... ... @@ -2,7 +2,7 @@
2 2  
3 3 # Start Backend API Service
4 4  
5   -set -e
  5 +set -euo pipefail
6 6  
7 7 cd "$(dirname "$0")/.."
8 8 source ./activate.sh
... ... @@ -15,24 +15,25 @@ echo -e "${GREEN}========================================${NC}"
15 15 echo -e "${GREEN}Starting Backend API Service${NC}"
16 16 echo -e "${GREEN}========================================${NC}"
17 17  
  18 +API_HOST="${API_HOST:-0.0.0.0}"
  19 +API_PORT="${API_PORT:-6002}"
  20 +ES_HOST="${ES_HOST:-http://localhost:9200}"
  21 +ES_USERNAME="${ES_USERNAME:-}"
  22 +ES_PASSWORD="${ES_PASSWORD:-}"
  23 +
18 24 echo -e "\n${YELLOW}Configuration:${NC}"
19   -echo " API Host: ${API_HOST:-0.0.0.0}"
20   -echo " API Port: ${API_PORT:-6002}"
21   -echo " ES Host: ${ES_HOST:-http://localhost:9200}"
  25 +echo " API Host: ${API_HOST}"
  26 +echo " API Port: ${API_PORT}"
  27 +echo " ES Host: ${ES_HOST}"
22 28 echo " ES Username: ${ES_USERNAME:-not set}"
23 29  
24 30 echo -e "\n${YELLOW}Starting backend API service (search + admin)...${NC}"
25 31  
26 32 # Export environment variables for the Python process
27   -export API_HOST=${API_HOST:-0.0.0.0}
28   -export API_PORT=${API_PORT:-6002}
29   -export ES_HOST=${ES_HOST:-http://localhost:9200}
30   -export ES_USERNAME=${ES_USERNAME:-}
31   -export ES_PASSWORD=${ES_PASSWORD:-}
32   -
33   -python main.py serve \
34   - --host $API_HOST \
35   - --port $API_PORT \
36   - --es-host $ES_HOST
  33 +export API_HOST API_PORT ES_HOST ES_USERNAME ES_PASSWORD
37 34  
  35 +exec python main.py serve \
  36 + --host "${API_HOST}" \
  37 + --port "${API_PORT}" \
  38 + --es-host "${ES_HOST}"
38 39  
... ...
scripts/start_cnclip_service.sh
... ... @@ -12,9 +12,6 @@
12 12 # 选项:
13 13 # --port PORT 服务端口(默认:51000)
14 14 # --device DEVICE 设备类型:cuda 或 cpu(默认:cuda)
15   -# --batch-size SIZE 批处理大小(默认:32)
16   -# --num-workers NUM 预处理线程数(默认:4)
17   -# --dtype TYPE 数据类型:float16 或 float32(默认:float16)
18 15 # --model-name NAME 模型名称(默认:CN-CLIP/ViT-H-14)
19 16 # --replicas NUM 副本数(默认:1)
20 17 # --help 显示帮助信息
... ... @@ -22,11 +19,10 @@
22 19 # 示例:
23 20 # ./scripts/start_cnclip_service.sh
24 21 # ./scripts/start_cnclip_service.sh --port 52000 --device cuda
25   -# ./scripts/start_cnclip_service.sh --batch-size 16 --dtype float32
26 22 #
27 23 ###############################################################################
28 24  
29   -set -e # 遇到错误立即退出
  25 +set -euo pipefail
30 26  
31 27 # 颜色定义
32 28 RED='\033[0;31m'
... ... @@ -38,9 +38,6 @@ NC='\033[0m' # No Color
38 34 # 默认配置
39 35 DEFAULT_PORT=51000
40 36 DEFAULT_DEVICE="cuda"
41   -DEFAULT_BATCH_SIZE=32
42   -DEFAULT_NUM_WORKERS=4
43   -DEFAULT_DTYPE="float16"
44 37 DEFAULT_MODEL_NAME="CN-CLIP/ViT-H-14"
45 38 # DEFAULT_MODEL_NAME="CN-CLIP/ViT-L-14-336"
46 39 DEFAULT_REPLICAS=1 # 副本数
... ... @@ -49,8 +42,8 @@ DEFAULT_REPLICAS=1 # 副本数
49 42 PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
50 43 CLIP_SERVER_DIR="${PROJECT_ROOT}/third-party/clip-as-service/server"
51 44 LOG_DIR="${PROJECT_ROOT}/logs"
52   -PID_FILE="${LOG_DIR}/cnclip_service.pid"
53   -LOG_FILE="${LOG_DIR}/cnclip_service.log"
  45 +PID_FILE="${LOG_DIR}/cnclip.pid"
  46 +LOG_FILE="${LOG_DIR}/cnclip.log"
54 47  
55 48 # 帮助信息
56 49 show_help() {
... ... @@ -59,11 +52,8 @@ show_help() {
59 52 echo "用法: $0 [选项]"
60 53 echo ""
61 54 echo "选项:"
62   - echo " --port PORT 服务端口(默认:${DEFAULT_PORT})"
  55 + echo " --port PORT 服务端口(默认:${CNCLIP_PORT:-${DEFAULT_PORT}})"
63 56 echo " --device DEVICE 设备类型:cuda 或 cpu(默认:cuda)"
64   - echo " --batch-size SIZE 批处理大小(默认:${DEFAULT_BATCH_SIZE})"
65   - echo " --num-workers NUM 预处理线程数(默认:${DEFAULT_NUM_WORKERS})"
66   - echo " --dtype TYPE 数据类型:float16 或 float32(默认:${DEFAULT_DTYPE})"
67 57 echo " --model-name NAME 模型名称(默认:${DEFAULT_MODEL_NAME})"
68 58 echo " --replicas NUM 副本数(默认:${DEFAULT_REPLICAS})"
69 59 echo " --help 显示此帮助信息"
... ... @@ -72,23 +62,19 @@ show_help() {
72 62 echo " $0 # 使用默认配置启动"
73 63 echo " $0 --port 52000 --device cuda # 指定 CUDA 模式,端口 52000"
74 64 echo " $0 --port 52000 --device cpu # 显式使用 CPU 模式"
75   - echo " $0 --batch-size 16 --dtype float32 # 小批处理,float32 精度"
76 65 echo " $0 --replicas 2 # 启动2个副本(需8-10GB显存)"
77 66 echo ""
78 67 echo "支持的模型:"
79 68 echo " - CN-CLIP/ViT-B-16 基础版本,速度快"
80 69 echo " - CN-CLIP/ViT-L-14 平衡版本"
81   - echo " - CN-CLIP/ViT-L-14-336 高分辨率版本(默认)"
82   - echo " - CN-CLIP/ViT-H-14 大型版本,精度高"
  70 + echo " - CN-CLIP/ViT-L-14-336 高分辨率版本"
  71 + echo " - CN-CLIP/ViT-H-14 大型版本,精度高(默认)"
83 72 echo " - CN-CLIP/RN50 ResNet-50 版本"
84 73 }
85 74  
86 75 # 解析命令行参数
87   -PORT=${DEFAULT_PORT}
  76 +PORT="${CNCLIP_PORT:-${DEFAULT_PORT}}"
88 77 DEVICE=${DEFAULT_DEVICE}
89   -BATCH_SIZE=${DEFAULT_BATCH_SIZE}
90   -NUM_WORKERS=${DEFAULT_NUM_WORKERS}
91   -DTYPE=${DEFAULT_DTYPE}
92 78 MODEL_NAME=${DEFAULT_MODEL_NAME}
93 79 REPLICAS=${DEFAULT_REPLICAS}
94 80  
... ... @@ -102,18 +88,6 @@ while [[ $# -gt 0 ]]; do
102 88 DEVICE="$2"
103 89 shift 2
104 90 ;;
105   - --batch-size)
106   - BATCH_SIZE="$2"
107   - shift 2
108   - ;;
109   - --num-workers)
110   - NUM_WORKERS="$2"
111   - shift 2
112   - ;;
113   - --dtype)
114   - DTYPE="$2"
115   - shift 2
116   - ;;
117 91 --model-name)
118 92 MODEL_NAME="$2"
119 93 shift 2
... ... @@ -197,7 +171,7 @@ python -c "import cn_clip" 2>/dev/null || {
197 171 }
198 172  
199 173 # clip_server 通过 PYTHONPATH 加载(见下方启动命令),此处仅做可导入性检查
200   -export PYTHONPATH="${CLIP_SERVER_DIR}:${PYTHONPATH}"
  174 +export PYTHONPATH="${CLIP_SERVER_DIR}${PYTHONPATH:+:${PYTHONPATH}}"
201 175 python -c "import clip_server" 2>/dev/null || {
202 176 echo -e "${RED}错误: clip_server 不可用${NC}"
203 177 echo -e "${YELLOW}请重建专用环境: ./scripts/setup_cnclip_venv.sh${NC}"
... ... @@ -251,7 +225,7 @@ fi
251 225 cd "${CLIP_SERVER_DIR}"
252 226  
253 227 # 设置环境变量
254   -export PYTHONPATH="${CLIP_SERVER_DIR}:${PYTHONPATH}"
  228 +export PYTHONPATH="${CLIP_SERVER_DIR}${PYTHONPATH:+:${PYTHONPATH}}"
255 229 export NO_VERSION_CHECK=1 # 跳过版本检查
256 230  
257 231 # 启动服务
... ...
scripts/start_frontend.sh
... ... @@ -2,7 +2,7 @@
2 2  
3 3 # Start Frontend Server
4 4  
5   -set -e
  5 +set -euo pipefail
6 6  
7 7 cd "$(dirname "$0")/.."
8 8 source ./activate.sh
... ... @@ -17,6 +17,7 @@ echo -e "${GREEN}========================================${NC}"
17 17  
18 18 FRONTEND_PORT="${FRONTEND_PORT:-6003}"
19 19 API_PORT="${API_PORT:-6002}"
  20 +PORT="${FRONTEND_PORT}"
20 21  
21 22 echo -e "\n${YELLOW}Frontend will be available at:${NC}"
22 23 echo -e " ${GREEN}http://localhost:${FRONTEND_PORT}${NC}"
... ... @@ -25,4 +26,5 @@ echo -e "${YELLOW}Make sure the backend API is running at:${NC}"
25 26 echo -e " ${GREEN}http://localhost:${API_PORT}${NC}"
26 27 echo ""
27 28  
28   -python scripts/frontend_server.py
  29 +export FRONTEND_PORT API_PORT PORT
  30 +exec python scripts/frontend_server.py
... ...
scripts/start_indexer.sh
... ... @@ -2,7 +2,7 @@
2 2  
3 3 # Start dedicated Indexer API Service
4 4  
5   -set -e
  5 +set -euo pipefail
6 6  
7 7 cd "$(dirname "$0")/.."
8 8 source ./activate.sh
... ... @@ -15,24 +15,25 @@ echo -e "${GREEN}========================================${NC}"
15 15 echo -e "${GREEN}Starting Indexer API Service${NC}"
16 16 echo -e "${GREEN}========================================${NC}"
17 17  
  18 +INDEXER_HOST="${INDEXER_HOST:-0.0.0.0}"
  19 +INDEXER_PORT="${INDEXER_PORT:-6004}"
  20 +ES_HOST="${ES_HOST:-http://localhost:9200}"
  21 +ES_USERNAME="${ES_USERNAME:-}"
  22 +ES_PASSWORD="${ES_PASSWORD:-}"
  23 +
18 24 echo -e "\n${YELLOW}Configuration:${NC}"
19   -echo " INDEXER Host: ${INDEXER_HOST:-0.0.0.0}"
20   -echo " INDEXER Port: ${INDEXER_PORT:-6004}"
21   -echo " ES Host: ${ES_HOST:-http://localhost:9200}"
  25 +echo " INDEXER Host: ${INDEXER_HOST}"
  26 +echo " INDEXER Port: ${INDEXER_PORT}"
  27 +echo " ES Host: ${ES_HOST}"
22 28 echo " ES Username: ${ES_USERNAME:-not set}"
23 29  
24 30 echo -e "\n${YELLOW}Starting indexer service...${NC}"
25 31  
26 32 # Export environment variables for the Python process
27   -export INDEXER_HOST=${INDEXER_HOST:-0.0.0.0}
28   -export INDEXER_PORT=${INDEXER_PORT:-6004}
29   -export ES_HOST=${ES_HOST:-http://localhost:9200}
30   -export ES_USERNAME=${ES_USERNAME:-}
31   -export ES_PASSWORD=${ES_PASSWORD:-}
32   -
33   -python main.py serve-indexer \
34   - --host $INDEXER_HOST \
35   - --port $INDEXER_PORT \
36   - --es-host $ES_HOST
  33 +export INDEXER_HOST INDEXER_PORT ES_HOST ES_USERNAME ES_PASSWORD
37 34  
  35 +exec python main.py serve-indexer \
  36 + --host "${INDEXER_HOST}" \
  37 + --port "${INDEXER_PORT}" \
  38 + --es-host "${ES_HOST}"
38 39  
... ...
scripts/start_servers.py deleted
... ... @@ -1,249 +0,0 @@
1   -#!/usr/bin/env python3
2   -"""
3   -Production-ready server startup script with proper error handling and monitoring.
4   -
5   -[LEGACY]
6   -This script is kept for historical compatibility.
7   -Preferred entrypoint is:
8   - ./scripts/service_ctl.sh start
9   -"""
10   -
11   -import os
12   -import sys
13   -import signal
14   -import time
15   -import subprocess
16   -import logging
17   -import argparse
18   -from typing import Dict, List, Optional
19   -import multiprocessing
20   -import threading
21   -
22   -# Configure logging
23   -logging.basicConfig(
24   - level=logging.INFO,
25   - format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
26   - handlers=[
27   - logging.StreamHandler(),
28   - logging.FileHandler('/tmp/search_engine_startup.log', mode='a')
29   - ]
30   -)
31   -logger = logging.getLogger(__name__)
32   -
33   -class ServerManager:
34   - """Manages frontend and API server processes."""
35   -
36   - def __init__(self):
37   - self.processes: Dict[str, subprocess.Popen] = {}
38   - self.running = True
39   -
40   - def start_frontend_server(self) -> bool:
41   - """Start the frontend server."""
42   - try:
43   - frontend_script = os.path.join(os.path.dirname(__file__), 'frontend_server.py')
44   -
45   - cmd = [sys.executable, frontend_script]
46   - env = os.environ.copy()
47   - env['PYTHONUNBUFFERED'] = '1'
48   -
49   - process = subprocess.Popen(
50   - cmd,
51   - env=env,
52   - stdout=subprocess.PIPE,
53   - stderr=subprocess.STDOUT,
54   - universal_newlines=True,
55   - bufsize=1
56   - )
57   -
58   - self.processes['frontend'] = process
59   - logger.info(f"Frontend server started with PID: {process.pid}")
60   -
61   - # Start monitoring thread
62   - threading.Thread(
63   - target=self._monitor_output,
64   - args=('frontend', process),
65   - daemon=True
66   - ).start()
67   -
68   - return True
69   -
70   - except Exception as e:
71   - logger.error(f"Failed to start frontend server: {e}")
72   - return False
73   -
74   - def start_api_server(self, es_host: str = "http://localhost:9200") -> bool:
75   - """Start the API server."""
76   - try:
77   - cmd = [
78   - sys.executable, 'main.py', 'serve',
79   - '--es-host', es_host,
80   - '--host', '0.0.0.0',
81   - '--port', '6002'
82   - ]
83   -
84   - env = os.environ.copy()
85   - env['PYTHONUNBUFFERED'] = '1'
86   - env['ES_HOST'] = es_host
87   -
88   - process = subprocess.Popen(
89   - cmd,
90   - env=env,
91   - stdout=subprocess.PIPE,
92   - stderr=subprocess.STDOUT,
93   - universal_newlines=True,
94   - bufsize=1
95   - )
96   -
97   - self.processes['api'] = process
98   - logger.info(f"API server started with PID: {process.pid}")
99   -
100   - # Start monitoring thread
101   - threading.Thread(
102   - target=self._monitor_output,
103   - args=('api', process),
104   - daemon=True
105   - ).start()
106   -
107   - return True
108   -
109   - except Exception as e:
110   - logger.error(f"Failed to start API server: {e}")
111   - return False
112   -
113   - def _monitor_output(self, name: str, process: subprocess.Popen):
114   - """Monitor process output and log appropriately."""
115   - try:
116   - for line in iter(process.stdout.readline, ''):
117   - if line.strip() and self.running:
118   - # Filter out scanner noise for frontend server
119   - if name == 'frontend':
120   - noise_patterns = [
121   - 'code 400',
122   - 'Bad request version',
123   - 'Bad request syntax',
124   - 'Bad HTTP/0.9 request type'
125   - ]
126   - if any(pattern in line for pattern in noise_patterns):
127   - continue
128   -
129   - logger.info(f"[{name}] {line.strip()}")
130   -
131   - except Exception as e:
132   - if self.running:
133   - logger.error(f"Error monitoring {name} output: {e}")
134   -
135   - def check_servers(self) -> bool:
136   - """Check if all servers are still running."""
137   - all_running = True
138   -
139   - for name, process in self.processes.items():
140   - if process.poll() is not None:
141   - logger.error(f"{name} server has stopped with exit code: {process.returncode}")
142   - all_running = False
143   -
144   - return all_running
145   -
146   - def stop_all(self):
147   - """Stop all servers gracefully."""
148   - logger.info("Stopping all servers...")
149   - self.running = False
150   -
151   - for name, process in self.processes.items():
152   - try:
153   - logger.info(f"Stopping {name} server (PID: {process.pid})...")
154   -
155   - # Try graceful shutdown first
156   - process.terminate()
157   -
158   - # Wait up to 10 seconds for graceful shutdown
159   - try:
160   - process.wait(timeout=10)
161   - logger.info(f"{name} server stopped gracefully")
162   - except subprocess.TimeoutExpired:
163   - # Force kill if graceful shutdown fails
164   - logger.warning(f"{name} server didn't stop gracefully, forcing...")
165   - process.kill()
166   - process.wait()
167   - logger.info(f"{name} server stopped forcefully")
168   -
169   - except Exception as e:
170   - logger.error(f"Error stopping {name} server: {e}")
171   -
172   - self.processes.clear()
173   - logger.info("All servers stopped")
174   -
175   -def signal_handler(signum, frame):
176   - """Handle shutdown signals."""
177   - logger.info(f"Received signal {signum}, shutting down...")
178   - if 'manager' in globals():
179   - manager.stop_all()
180   - sys.exit(0)
181   -
182   -def main():
183   - """Main function to start all servers."""
184   - global manager
185   -
186   - parser = argparse.ArgumentParser(description='Start saas-search servers (multi-tenant)')
187   - parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
188   - parser.add_argument('--check-dependencies', action='store_true', help='Check dependencies before starting')
189   - args = parser.parse_args()
190   -
191   - logger.info("Starting saas-search servers (multi-tenant)...")
192   - logger.info(f"Elasticsearch: {args.es_host}")
193   -
194   - # Check dependencies if requested
195   - if args.check_dependencies:
196   - logger.info("Checking dependencies...")
197   - try:
198   - import slowapi
199   - import anyio
200   - logger.info("✓ All dependencies available")
201   - except ImportError as e:
202   - logger.error(f"✗ Missing dependency: {e}")
203   - logger.info("Please run: pip install -r requirements_server.txt")
204   - sys.exit(1)
205   -
206   - manager = ServerManager()
207   -
208   - # Set up signal handlers
209   - signal.signal(signal.SIGINT, signal_handler)
210   - signal.signal(signal.SIGTERM, signal_handler)
211   -
212   - try:
213   - # Start servers
214   - if not manager.start_api_server(args.es_host):
215   - logger.error("Failed to start API server")
216   - sys.exit(1)
217   -
218   - # Wait a moment before starting frontend server
219   - time.sleep(2)
220   -
221   - if not manager.start_frontend_server():
222   - logger.error("Failed to start frontend server")
223   - manager.stop_all()
224   - sys.exit(1)
225   -
226   - logger.info("All servers started successfully!")
227   - logger.info("Frontend: http://localhost:6003")
228   - logger.info("API: http://localhost:6002")
229   - logger.info("API Docs: http://localhost:6002/docs")
230   - logger.info("Press Ctrl+C to stop all servers")
231   -
232   - # Monitor servers
233   - while manager.running:
234   - if not manager.check_servers():
235   - logger.error("One or more servers have stopped unexpectedly")
236   - manager.stop_all()
237   - sys.exit(1)
238   -
239   - time.sleep(5) # Check every 5 seconds
240   -
241   - except KeyboardInterrupt:
242   - logger.info("Received interrupt signal")
243   - except Exception as e:
244   - logger.error(f"Unexpected error: {e}")
245   - finally:
246   - manager.stop_all()
247   -
248   -if __name__ == '__main__':
249   - main()
250 0 \ No newline at end of file
scripts/start_tei_service.sh
... ... @@ -27,14 +27,10 @@ TEI_DTYPE="${TEI_DTYPE:-float16}"
27 27 HF_CACHE_DIR="${HF_CACHE_DIR:-$HOME/.cache/huggingface}"
28 28 TEI_HEALTH_TIMEOUT_SEC="${TEI_HEALTH_TIMEOUT_SEC:-300}"
29 29  
30   -USE_GPU_RAW="${TEI_USE_GPU:-1}"
31   -USE_GPU="$(echo "${USE_GPU_RAW}" | tr '[:upper:]' '[:lower:]')"
32   -if [[ "${USE_GPU}" == "1" || "${USE_GPU}" == "true" || "${USE_GPU}" == "yes" ]]; then
33   - USE_GPU="1"
34   -elif [[ "${USE_GPU}" == "0" || "${USE_GPU}" == "false" || "${USE_GPU}" == "no" ]]; then
35   - USE_GPU="0"
36   -else
37   - echo "ERROR: invalid TEI_USE_GPU=${USE_GPU_RAW}. Use 1/0 (or true/false)." >&2
  30 +TEI_DEVICE_RAW="${TEI_DEVICE:-cuda}"
  31 +TEI_DEVICE="$(echo "${TEI_DEVICE_RAW}" | tr '[:upper:]' '[:lower:]')"
  32 +if [[ "${TEI_DEVICE}" != "cuda" && "${TEI_DEVICE}" != "cpu" ]]; then
  33 + echo "ERROR: invalid TEI_DEVICE=${TEI_DEVICE_RAW}. Use cuda/cpu." >&2
38 34 exit 1
39 35 fi
40 36  
... ... @@ -50,19 +46,19 @@ detect_gpu_tei_image() {
50 46 fi
51 47 }
52 48  
53   -if [[ "${USE_GPU}" == "1" ]]; then
  49 +if [[ "${TEI_DEVICE}" == "cuda" ]]; then
54 50 if ! command -v nvidia-smi >/dev/null 2>&1 || ! nvidia-smi >/dev/null 2>&1; then
55   - echo "ERROR: TEI_USE_GPU=1 but NVIDIA GPU is not available. No CPU fallback." >&2
  51 + echo "ERROR: TEI_DEVICE=cuda but NVIDIA GPU is not available. No CPU fallback." >&2
56 52 exit 1
57 53 fi
58 54 if ! docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q 'nvidia'; then
59   - echo "ERROR: TEI_USE_GPU=1 but Docker nvidia runtime is not configured." >&2
  55 + echo "ERROR: TEI_DEVICE=cuda but Docker nvidia runtime is not configured." >&2
60 56 echo "Install and configure nvidia-container-toolkit, then restart Docker." >&2
61 57 exit 1
62 58 fi
63 59 TEI_IMAGE="${TEI_IMAGE:-$(detect_gpu_tei_image)}"
64 60 GPU_ARGS=(--gpus all)
65   - TEI_MODE="gpu"
  61 + TEI_MODE="cuda"
66 62 else
67 63 TEI_IMAGE="${TEI_IMAGE:-ghcr.io/huggingface/text-embeddings-inference:${TEI_VERSION}}"
68 64 GPU_ARGS=()
... ... @@ -81,9 +77,9 @@ if [[ -n "${existing_id}" ]]; then
81 77 if [[ "${current_image}" == *":cuda-"* || "${current_image}" == *":turing-"* ]]; then
82 78 current_is_gpu_image=1
83 79 fi
84   - if [[ "${USE_GPU}" == "1" ]]; then
  80 + if [[ "${TEI_DEVICE}" == "cuda" ]]; then
85 81 if [[ "${current_is_gpu_image}" -eq 1 ]] && [[ "${device_req}" != "null" ]] && [[ "${current_image}" == "${TEI_IMAGE}" ]]; then
86   - echo "TEI already running (GPU): ${TEI_CONTAINER_NAME}"
  82 + echo "TEI already running (CUDA): ${TEI_CONTAINER_NAME}"
87 83 exit 0
88 84 fi
89 85 echo "TEI running with different mode/image; recreating container ${TEI_CONTAINER_NAME}"
... ...
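The hunk above replaces the boolean `TEI_USE_GPU` with an enumerated `TEI_DEVICE`, which collapses the old three-branch normalization into a single whitelist check. A minimal sketch of that validation as a standalone helper (the `validate_tei_device` function name is illustrative, not part of the actual script):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative helper mirroring the TEI_DEVICE check in start_tei_service.sh:
# lowercase the raw value, accept only "cuda" or "cpu", reject everything else.
validate_tei_device() {
  local raw="${1:-cuda}"
  local dev
  dev="$(echo "${raw}" | tr '[:upper:]' '[:lower:]')"
  if [[ "${dev}" != "cuda" && "${dev}" != "cpu" ]]; then
    echo "ERROR: invalid TEI_DEVICE=${raw}. Use cuda/cpu." >&2
    return 1
  fi
  echo "${dev}"
}
```

Compared with the deleted truthy/falsy parsing (`1/0/true/false/yes/no`), an enum fails loudly on values like `gpu` or `auto` instead of silently mapping them.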
scripts/start_translator.sh
... ... @@ -3,13 +3,13 @@
3 3 # Start Translation Service
4 4 #
5 5  
6   -set -e
  6 +set -euo pipefail
7 7  
8 8 cd "$(dirname "$0")/.."
9 9 source ./activate.sh
10 10  
11 11 TRANSLATION_HOST="${TRANSLATION_HOST:-0.0.0.0}"
12   -TRANSLATION_PORT="${TRANSLATION_PORT:-${TRANSLATOR_PORT:-6006}}"
  12 +TRANSLATION_PORT="${TRANSLATION_PORT:-6006}"
13 13  
14 14 echo "========================================"
15 15 echo "Starting Translation Service"
... ...
scripts/stop.sh
1 1 #!/bin/bash
2 2  
3   -# Backward-compatible stop entrypoint.
  3 +# Service stop entrypoint.
4 4 # Delegates to unified service controller.
5 5  
6   -set -e
  6 +set -euo pipefail
7 7  
8 8 cd "$(dirname "$0")/.."
9 9  
... ...
scripts/stop_cnclip_service.sh
... ... @@ -11,7 +11,7 @@
11 11 #
12 12 ###############################################################################
13 13  
14   -set -e
  14 +set -euo pipefail
15 15  
16 16 # 颜色定义
17 17 RED='\033[0;31m'
... ... @@ -23,7 +23,7 @@ NC='\033[0m' # No Color
23 23 # 项目路径
24 24 PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
25 25 LOG_DIR="${PROJECT_ROOT}/logs"
26   -PID_FILE="${LOG_DIR}/cnclip_service.pid"
  26 +PID_FILE="${LOG_DIR}/cnclip.pid"
27 27  
28 28 echo -e "${BLUE}========================================${NC}"
29 29 echo -e "${BLUE}停止 CN-CLIP 服务${NC}"
... ...
scripts/stop_reranker.sh deleted
... ... @@ -1,57 +0,0 @@
1   -#!/bin/bash
2   -#
3   -# Stop Reranker Service
4   -#
5   -
6   -set -e
7   -
8   -cd "$(dirname "$0")/.."
9   -
10   -PID_FILE="logs/reranker.pid"
11   -RERANKER_PORT="${RERANKER_PORT:-6007}"
12   -
13   -echo "========================================"
14   -echo "Stopping Reranker Service"
15   -echo "========================================"
16   -
17   -if [ -f "${PID_FILE}" ]; then
18   - PID="$(cat "${PID_FILE}" 2>/dev/null || true)"
19   - if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
20   - echo "Stopping PID from file: ${PID}"
21   - kill -TERM "${PID}" 2>/dev/null || true
22   - sleep 1
23   - if kill -0 "${PID}" 2>/dev/null; then
24   - kill -KILL "${PID}" 2>/dev/null || true
25   - fi
26   - fi
27   - rm -f "${PID_FILE}"
28   -fi
29   -
30   -PORT_PIDS="$(lsof -ti:${RERANKER_PORT} 2>/dev/null || true)"
31   -if [ -n "${PORT_PIDS}" ]; then
32   - echo "Stopping process on port ${RERANKER_PORT}: ${PORT_PIDS}"
33   - for PID in ${PORT_PIDS}; do
34   - kill -TERM "${PID}" 2>/dev/null || true
35   - done
36   - sleep 1
37   - PORT_PIDS="$(lsof -ti:${RERANKER_PORT} 2>/dev/null || true)"
38   - for PID in ${PORT_PIDS}; do
39   - kill -KILL "${PID}" 2>/dev/null || true
40   - done
41   -fi
42   -
43   -# Cleanup orphaned vLLM engine workers that may survive a failed startup.
44   -ENGINE_PIDS="$(pgrep -f 'VLLM::EngineCore' 2>/dev/null || true)"
45   -if [ -n "${ENGINE_PIDS}" ]; then
46   - echo "Stopping orphaned vLLM engine processes: ${ENGINE_PIDS}"
47   - for PID in ${ENGINE_PIDS}; do
48   - kill -TERM "${PID}" 2>/dev/null || true
49   - done
50   - sleep 1
51   - ENGINE_PIDS="$(pgrep -f 'VLLM::EngineCore' 2>/dev/null || true)"
52   - for PID in ${ENGINE_PIDS}; do
53   - kill -KILL "${PID}" 2>/dev/null || true
54   - done
55   -fi
56   -
57   -echo "Reranker service stopped."
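The PID-file phase of this deleted script (TERM, grace period, KILL, remove the file) is the part every per-service stop shared, and is what service_ctl.sh now presumably factors out. A hedged sketch of such a shared helper; the `stop_by_pid_file` name is an assumption, not the actual service_ctl function:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of a generic PID-file stop, modeled on the deleted stop_*.sh scripts:
# read the PID, try SIGTERM, escalate to SIGKILL after one second, drop the file.
stop_by_pid_file() {
  local pid_file="$1"
  [ -f "${pid_file}" ] || return 0
  local pid
  pid="$(cat "${pid_file}" 2>/dev/null || true)"
  if [ -n "${pid}" ] && kill -0 "${pid}" 2>/dev/null; then
    kill -TERM "${pid}" 2>/dev/null || true
    sleep 1
    kill -0 "${pid}" 2>/dev/null && kill -KILL "${pid}" 2>/dev/null || true
  fi
  rm -f "${pid_file}"
}
```

Parameterizing on the PID-file path is what lets `logs/<service>.pid` naming and stop logic live in one place instead of N near-identical scripts.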
scripts/stop_translator.sh deleted
... ... @@ -1,43 +0,0 @@
1   -#!/bin/bash
2   -#
3   -# Stop Translation Service
4   -#
5   -
6   -set -e
7   -
8   -cd "$(dirname "$0")/.."
9   -
10   -PID_FILE="logs/translator.pid"
11   -TRANSLATION_PORT="${TRANSLATION_PORT:-${TRANSLATOR_PORT:-6006}}"
12   -
13   -echo "========================================"
14   -echo "Stopping Translation Service"
15   -echo "========================================"
16   -
17   -if [ -f "${PID_FILE}" ]; then
18   - PID="$(cat "${PID_FILE}" 2>/dev/null || true)"
19   - if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
20   - echo "Stopping PID from file: ${PID}"
21   - kill -TERM "${PID}" 2>/dev/null || true
22   - sleep 1
23   - if kill -0 "${PID}" 2>/dev/null; then
24   - kill -KILL "${PID}" 2>/dev/null || true
25   - fi
26   - fi
27   - rm -f "${PID_FILE}"
28   -fi
29   -
30   -PORT_PIDS="$(lsof -ti:${TRANSLATION_PORT} 2>/dev/null || true)"
31   -if [ -n "${PORT_PIDS}" ]; then
32   - echo "Stopping process on port ${TRANSLATION_PORT}: ${PORT_PIDS}"
33   - for PID in ${PORT_PIDS}; do
34   - kill -TERM "${PID}" 2>/dev/null || true
35   - done
36   - sleep 1
37   - PORT_PIDS="$(lsof -ti:${TRANSLATION_PORT} 2>/dev/null || true)"
38   - for PID in ${PORT_PIDS}; do
39   - kill -KILL "${PID}" 2>/dev/null || true
40   - done
41   -fi
42   -
43   -echo "Translation service stopped."
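Both deleted stop scripts follow the same escalation flow: PID file, then port (`lsof -ti`), then, for the reranker, a pattern sweep for orphaned `VLLM::EngineCore` workers. That last phase, now folded into `service_ctl.sh stop reranker`, could be sketched as a generic pattern-kill helper (the `kill_by_pattern` name is illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the pgrep-based orphan cleanup from the deleted stop_reranker.sh,
# generalized: TERM all matches, wait a second, KILL whatever survived.
kill_by_pattern() {
  local pattern="$1"
  local pids
  pids="$(pgrep -f "${pattern}" 2>/dev/null || true)"
  [ -n "${pids}" ] || return 0
  for p in ${pids}; do kill -TERM "${p}" 2>/dev/null || true; done
  sleep 1
  pids="$(pgrep -f "${pattern}" 2>/dev/null || true)"
  for p in ${pids}; do kill -KILL "${p}" 2>/dev/null || true; done
}
```

The `|| true` guards matter under `set -euo pipefail`: `pgrep` exits non-zero on no match, and a PID can vanish between the lookup and the `kill`.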