Commit 32e9b30c71aba5d8ce1c76793107546c6d6a9712

Authored by tangwang
1 parent 3abbc95a

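The scripts/ root now mainly keeps the startup/orchestration entry points; the other scripts have been moved into a few fixed subdirectories:

  - Data conversion scripts go to scripts/data_import/ (see scripts/data_import/README.md)
  - Diagnostic and inspection scripts go to scripts/inspect/ (see scripts/inspect/README.md)
  - Ops helper scripts go to scripts/ops/ (see scripts/ops/README.md)
  - The frontend helper server moves to scripts/frontend/frontend_server.py
  - The translation model downloader moves to scripts/translation/download_translation_models.py
  - The temporary image embedding backfill script is consolidated as
    scripts/maintenance/embed_tenant_image_urls.py
  - The Redis monitoring script is merged into redis/ and is now scripts/redis/monitor_eviction.py

  I also updated the actual call chain to point at the new locations:

  - scripts/start_frontend.sh
  - scripts/start_cnclip_service.sh
  - scripts/service_ctl.sh
  - scripts/setup_translator_venv.sh
  - scripts/README.md

  Paths to these scripts in the documentation were updated as well, mainly in docs/QUICKSTART.md and
translation/README.md.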
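
  Because the moved scripts now sit one directory deeper, the diff below changes their repo-root lookup from one level up (`parent.parent` / `parents[1]`) to `parents[2]`. The following is a minimal sketch of why that index changes. The `/repo` prefix and the `repo_root` helper are hypothetical and only for illustration, and pure paths are used so the example does not touch the filesystem (the real scripts call `Path(__file__).resolve()`).

```python
from pathlib import PurePosixPath


def repo_root(script: str, depth: int) -> PurePosixPath:
    """Return the ancestor `depth` levels above the script's own directory.

    parents[0] is the directory containing the script, parents[1] is its
    parent, and so on. `repo_root` is a hypothetical helper, not part of
    this commit.
    """
    return PurePosixPath(script).parents[depth]


# Hypothetical absolute locations before and after the move.
old_location = "/repo/scripts/wechat_alert.py"
new_location = "/repo/scripts/ops/wechat_alert.py"

print(repo_root(old_location, 1))  # /repo
print(repo_root(new_location, 1))  # /repo/scripts (stale index after the move)
print(repo_root(new_location, 2))  # /repo
```

  A script that kept the old index after being moved would resolve `scripts/` instead of the repository root, so lookups such as `.env` or top-level packages like `utils` would no longer be found relative to it.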
Showing 28 changed files with 83 additions and 53 deletions
docs/QUICKSTART.md
... ... @@ -166,7 +166,7 @@ curl -X POST http://localhost:6008/embed/image \
166 166  
167 167 ```bash
168 168 ./scripts/setup_translator_venv.sh
169   -./.venv-translator/bin/python scripts/download_translation_models.py --all-local # 如需本地模型
  169 +./.venv-translator/bin/python scripts/translation/download_translation_models.py --all-local # 如需本地模型
170 170 ./scripts/start_translator.sh
171 171  
172 172 curl -X POST http://localhost:6006/translate \
... ...
docs/工作总结-微服务性能优化与架构.md
... ... @@ -133,8 +133,8 @@ instruction: "Given a shopping query, rank product titles by relevance"
133 133 - 启动时:backend/indexer/frontend/embedding/translator/reranker 会写 pid 到 `logs/<service>.pid`,并执行 `wait_for_health`(GET `http://127.0.0.1:<port>/health`);reranker 健康重试 90 次,其余 30 次;TEI 校验 Docker 容器存在且 `/health` 成功;cnclip 无 HTTP 健康则仅校验进程/端口。
134 134 - **监控常驻**:
135 135 - `./scripts/service_ctl.sh monitor-start <targets>` 启动后台监控进程,将 targets 写入 `logs/service-monitor.targets`,pid 写入 `logs/service-monitor.pid`,日志追加到 `logs/service-monitor.log`。
136   - - 轮询间隔 `MONITOR_INTERVAL_SEC` 默认 **10** 秒;连续 **3** 次(`MONITOR_FAIL_THRESHOLD`)健康失败则触发重启;重启冷却 `MONITOR_RESTART_COOLDOWN_SEC` 默认 **30** 秒;每小时最多重启 `MONITOR_MAX_RESTARTS_PER_HOUR` 默认 **6** 次;超限时调用 `scripts/wechat_alert.py` 告警(若存在)。
137   -- **日志**:各服务按日滚动到 `logs/<service>-<date>.log`,通过 `scripts/daily_log_router.sh` 与 `LOG_RETENTION_DAYS`(默认 30)控制保留。
  136 + - 轮询间隔 `MONITOR_INTERVAL_SEC` 默认 **10** 秒;连续 **3** 次(`MONITOR_FAIL_THRESHOLD`)健康失败则触发重启;重启冷却 `MONITOR_RESTART_COOLDOWN_SEC` 默认 **30** 秒;每小时最多重启 `MONITOR_MAX_RESTARTS_PER_HOUR` 默认 **6** 次;超限时调用 `scripts/ops/wechat_alert.py` 告警(若存在)。
  137 +- **日志**:各服务按日滚动到 `logs/<service>-<date>.log`,通过 `scripts/ops/daily_log_router.sh` 与 `LOG_RETENTION_DAYS`(默认 30)控制保留。
138 138  
139 139 详见:`scripts/service_ctl.sh` 内注释及 `docs/Usage-Guide.md`。
140 140  
... ...
scripts/README.md
1 1 # Scripts
2 2  
3   -`scripts/` 现在只保留当前架构下仍然有效的运行、运维、环境和数据处理脚本
  3 +`scripts/` 现在只保留当前架构下仍然有效的运行、运维、环境和数据处理脚本,并按职责拆到稳定子目录,避免继续在根目录平铺
4 4  
5 5 ## 当前分类
6 6  
... ... @@ -20,6 +20,8 @@
20 20 - `stop.sh`
21 21 - `stop_tei_service.sh`
22 22 - `stop_cnclip_service.sh`
  23 + - `frontend/`
  24 + - `ops/`
23 25  
24 26 - 环境初始化
25 27 - `create_venv.sh`
... ... @@ -33,11 +35,15 @@
33 35 - `create_tenant_index.sh`
34 36 - `build_suggestions.sh`
35 37 - `mock_data.sh`
  38 + - `data_import/`
  39 + - `inspect/`
  40 + - `maintenance/`
36 41  
37 42 - 评估与专项工具
38 43 - `evaluation/`
39 44 - `redis/`
40 45 - `debug/`
  46 + - `translation/`
41 47  
42 48 ## 已迁移
43 49  
... ...
scripts/data_import/README.md 0 → 100644
... ... @@ -0,0 +1,13 @@
  1 +# Data Import Scripts
  2 +
  3 +这一组脚本用于把外部商品数据或 CSV/XLSX 样本转换为 Shoplazza 导入格式。
  4 +
  5 +- `amazon_xlsx_to_shoplazza_xlsx.py`
  6 +- `competitor_xlsx_to_shoplazza_xlsx.py`
  7 +- `csv_to_excel.py`
  8 +- `csv_to_excel_multi_variant.py`
  9 +- `shoplazza_excel_template.py`
  10 +- `shoplazza_import_template.py`
  11 +- `tenant3_csv_to_shoplazza_xlsx.sh`
  12 +
  13 +这里是离线数据转换工具,不属于线上服务运维入口。
... ...
scripts/amazon_xlsx_to_shoplazza_xlsx.py renamed to scripts/data_import/amazon_xlsx_to_shoplazza_xlsx.py
... ... @@ -35,9 +35,10 @@ from pathlib import Path
35 35  
36 36 from openpyxl import load_workbook
37 37  
38   -# Allow running as `python scripts/xxx.py` without installing as a package
39   -sys.path.insert(0, str(Path(__file__).resolve().parent))
40   -from shoplazza_excel_template import create_excel_from_template_fast
  38 +REPO_ROOT = Path(__file__).resolve().parents[2]
  39 +sys.path.insert(0, str(REPO_ROOT))
  40 +
  41 +from scripts.data_import.shoplazza_excel_template import create_excel_from_template_fast
41 42  
42 43  
43 44 PREFERRED_OPTION_KEYS = [
... ... @@ -612,4 +613,3 @@ def main():
612 613 if __name__ == "__main__":
613 614 main()
614 615  
615   -
... ...
scripts/competitor_xlsx_to_shoplazza_xlsx.py renamed to scripts/data_import/competitor_xlsx_to_shoplazza_xlsx.py
... ... @@ -6,7 +6,7 @@ The input `data/mai_jia_jing_ling/products_data/*.xlsx` files are Amazon-format
6 6 (Parent/Child ASIN), not “competitor data”.
7 7  
8 8 Please use:
9   - - `scripts/amazon_xlsx_to_shoplazza_xlsx.py`
  9 + - `scripts/data_import/amazon_xlsx_to_shoplazza_xlsx.py`
10 10  
11 11 This wrapper simply forwards all CLI args to the correctly named script, so you
12 12 automatically get the latest performance improvements (fast read/write).
... ... @@ -15,13 +15,12 @@ automatically get the latest performance improvements (fast read/write).
15 15 import sys
16 16 from pathlib import Path
17 17  
18   -# Allow running as `python scripts/xxx.py` without installing as a package
19   -sys.path.insert(0, str(Path(__file__).resolve().parent))
  18 +REPO_ROOT = Path(__file__).resolve().parents[2]
  19 +sys.path.insert(0, str(REPO_ROOT))
20 20  
21   -from amazon_xlsx_to_shoplazza_xlsx import main as amazon_main
  21 +from scripts.data_import.amazon_xlsx_to_shoplazza_xlsx import main as amazon_main
22 22  
23 23  
24 24 if __name__ == "__main__":
25 25 amazon_main()
26 26  
27   -
... ...
scripts/csv_to_excel.py renamed to scripts/data_import/csv_to_excel.py
... ... @@ -22,12 +22,12 @@ from openpyxl import load_workbook
22 22 from openpyxl.styles import Font, Alignment
23 23 from openpyxl.utils import get_column_letter
24 24  
25   -# Shared helpers (keeps template writing consistent across scripts)
26   -from scripts.shoplazza_import_template import create_excel_from_template as _create_excel_from_template_shared
27   -from scripts.shoplazza_import_template import generate_handle as _generate_handle_shared
  25 +REPO_ROOT = Path(__file__).resolve().parents[2]
  26 +sys.path.insert(0, str(REPO_ROOT))
28 27  
29   -# Add parent directory to path
30   -sys.path.insert(0, str(Path(__file__).parent.parent))
  28 +# Shared helpers (keeps template writing consistent across scripts)
  29 +from scripts.data_import.shoplazza_import_template import create_excel_from_template as _create_excel_from_template_shared
  30 +from scripts.data_import.shoplazza_import_template import generate_handle as _generate_handle_shared
31 31  
32 32  
33 33 def clean_value(value):
... ... @@ -299,4 +299,3 @@ def main():
299 299  
300 300 if __name__ == '__main__':
301 301 main()
302   -
... ...
scripts/csv_to_excel_multi_variant.py renamed to scripts/data_import/csv_to_excel_multi_variant.py
... ... @@ -22,12 +22,12 @@ import itertools
22 22 from openpyxl import load_workbook
23 23 from openpyxl.styles import Alignment
24 24  
25   -# Shared helpers (keeps template writing consistent across scripts)
26   -from scripts.shoplazza_import_template import create_excel_from_template as _create_excel_from_template_shared
27   -from scripts.shoplazza_import_template import generate_handle as _generate_handle_shared
  25 +REPO_ROOT = Path(__file__).resolve().parents[2]
  26 +sys.path.insert(0, str(REPO_ROOT))
28 27  
29   -# Add parent directory to path
30   -sys.path.insert(0, str(Path(__file__).parent.parent))
  28 +# Shared helpers (keeps template writing consistent across scripts)
  29 +from scripts.data_import.shoplazza_import_template import create_excel_from_template as _create_excel_from_template_shared
  30 +from scripts.data_import.shoplazza_import_template import generate_handle as _generate_handle_shared
31 31  
32 32 # Color definitions
33 33 COLORS = [
... ... @@ -562,4 +562,3 @@ def main():
562 562  
563 563 if __name__ == '__main__':
564 564 main()
565   -
... ...
scripts/shoplazza_excel_template.py renamed to scripts/data_import/shoplazza_excel_template.py
scripts/shoplazza_import_template.py renamed to scripts/data_import/shoplazza_import_template.py
scripts/tenant3__csv_to_shoplazza_xlsx.sh renamed to scripts/data_import/tenant3_csv_to_shoplazza_xlsx.sh
... ... @@ -5,16 +5,16 @@ cd "$(dirname "$0")/.."
5 5 source ./activate.sh
6 6  
7 7 # # 基本使用(生成所有数据)
8   -# python scripts/csv_to_excel.py
  8 +# python scripts/data_import/csv_to_excel.py
9 9  
10 10 # # 指定输出文件
11   -# python scripts/csv_to_excel.py --output tenant3_imports.xlsx
  11 +# python scripts/data_import/csv_to_excel.py --output tenant3_imports.xlsx
12 12  
13 13 # # 限制处理行数(用于测试)
14   -# python scripts/csv_to_excel.py --limit 100
  14 +# python scripts/data_import/csv_to_excel.py --limit 100
15 15  
16 16 # 指定CSV文件和模板文件
17   -python scripts/csv_to_excel.py \
  17 +python scripts/data_import/csv_to_excel.py \
18 18 --csv-file data/customer1/goods_with_pic.5years_congku.csv.shuf.1w \
19 19 --template docs/商品导入模板.xlsx \
20   - --output tenant3_imports.xlsx
21 20 \ No newline at end of file
  21 + --output tenant3_imports.xlsx
... ...
scripts/frontend_server.py renamed to scripts/frontend/frontend_server.py
... ... @@ -16,7 +16,7 @@ from pathlib import Path
16 16 from dotenv import load_dotenv
17 17  
18 18 # Load .env file
19   -project_root = Path(__file__).parent.parent
  19 +project_root = Path(__file__).resolve().parents[2]
20 20 load_dotenv(project_root / '.env')
21 21  
22 22 # Get API_BASE_URL from environment(默认不注入,避免被旧 .env 覆盖同源策略)
... ... @@ -27,7 +27,7 @@ INJECT_API_BASE_URL = os.getenv('FRONTEND_INJECT_API_BASE_URL', '0') == '1'
27 27 BACKEND_PROXY_URL = os.getenv('BACKEND_PROXY_URL', 'http://127.0.0.1:6002').rstrip('/')
28 28  
29 29 # Change to frontend directory
30   -frontend_dir = os.path.join(os.path.dirname(__file__), '../frontend')
  30 +frontend_dir = os.path.join(project_root, 'frontend')
31 31 os.chdir(frontend_dir)
32 32  
33 33 # FRONTEND_PORT is the canonical config; keep PORT as a secondary fallback.
... ...
scripts/inspect/README.md 0 → 100644
... ... @@ -0,0 +1,10 @@
  1 +# Inspect Scripts
  2 +
  3 +这一组脚本用于做一次性诊断、索引检查和数据核对:
  4 +
  5 +- `check_data_source.py`
  6 +- `check_es_data.py`
  7 +- `check_index_mapping.py`
  8 +- `compare_index_mappings.py`
  9 +
  10 +它们依赖真实 DB / ES 环境,不属于 CI 测试或 benchmark。
... ...
scripts/check_data_source.py renamed to scripts/inspect/check_data_source.py
... ... @@ -14,8 +14,8 @@ import argparse
14 14 from pathlib import Path
15 15 from sqlalchemy import create_engine, text
16 16  
17   -# Add parent directory to path
18   -sys.path.insert(0, str(Path(__file__).parent.parent))
  17 +# Add repo root to path
  18 +sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
19 19  
20 20 from utils.db_connector import create_db_connection
21 21  
... ... @@ -298,4 +298,3 @@ def main():
298 298  
299 299 if __name__ == '__main__':
300 300 sys.exit(main())
301   -
... ...
scripts/check_es_data.py renamed to scripts/inspect/check_es_data.py
... ... @@ -8,7 +8,7 @@ import os
8 8 import argparse
9 9 from pathlib import Path
10 10  
11   -sys.path.insert(0, str(Path(__file__).parent.parent))
  11 +sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
12 12  
13 13 from utils.es_client import ESClient
14 14  
... ... @@ -265,4 +265,3 @@ def main():
265 265  
266 266 if __name__ == '__main__':
267 267 sys.exit(main())
268   -
... ...
scripts/check_index_mapping.py renamed to scripts/inspect/check_index_mapping.py
... ... @@ -8,7 +8,7 @@ import sys
8 8 import json
9 9 from pathlib import Path
10 10  
11   -sys.path.insert(0, str(Path(__file__).parent.parent))
  11 +sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
12 12  
13 13 from utils.es_client import get_es_client_from_env
14 14 from indexer.mapping_generator import get_tenant_index_name
... ...
scripts/compare_index_mappings.py renamed to scripts/inspect/compare_index_mappings.py
... ... @@ -9,7 +9,7 @@ import json
9 9 from pathlib import Path
10 10 from typing import Dict, Any
11 11  
12   -sys.path.insert(0, str(Path(__file__).parent.parent))
  12 +sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
13 13  
14 14 from utils.es_client import get_es_client_from_env
15 15  
... ... @@ -186,4 +186,3 @@ def main():
186 186  
187 187 if __name__ == '__main__':
188 188 sys.exit(main())
189   -
... ...
scripts/temp_embed_tenant_image_urls.py renamed to scripts/maintenance/embed_tenant_image_urls.py
... ... @@ -5,7 +5,7 @@
5 5  
6 6 用法:
7 7 source activate.sh # 会加载 .env,提供 ES_HOST / ES_USERNAME / ES_PASSWORD
8   - python scripts/temp_embed_tenant_image_urls.py
  8 + python scripts/maintenance/embed_tenant_image_urls.py
9 9  
10 10 未 source 时脚本也会尝试加载项目根目录 .env。
11 11 """
... ... @@ -30,7 +30,7 @@ from elasticsearch.helpers import scan
30 30 try:
31 31 from dotenv import load_dotenv
32 32  
33   - _ROOT = Path(__file__).resolve().parents[1]
  33 + _ROOT = Path(__file__).resolve().parents[2]
34 34 load_dotenv(_ROOT / ".env")
35 35 except ImportError:
36 36 pass
... ...
scripts/ops/README.md 0 → 100644
... ... @@ -0,0 +1,8 @@
  1 +# Ops Scripts
  2 +
  3 +这一组脚本是服务编排过程中的辅助脚本:
  4 +
  5 +- `daily_log_router.sh`:按天切日志
  6 +- `wechat_alert.py`:监控告警发送
  7 +
  8 +如果其他启动脚本引用这些文件,应通过这里的固定路径,不要再复制出新的同类工具。
... ...
scripts/daily_log_router.sh renamed to scripts/ops/daily_log_router.sh
... ... @@ -3,7 +3,7 @@
3 3 # Route incoming log stream into per-day files.
4 4 #
5 5 # Usage:
6   -# command 2>&1 | ./scripts/daily_log_router.sh <service> <log_dir> [retention_days]
  6 +# command 2>&1 | ./scripts/ops/daily_log_router.sh <service> <log_dir> [retention_days]
7 7 #
8 8  
9 9 set -euo pipefail
... ...
scripts/wechat_alert.py renamed to scripts/ops/wechat_alert.py
... ... @@ -6,7 +6,7 @@ This module is intentionally small and focused so that Bash-based monitors
6 6 can invoke it without pulling in the full application stack.
7 7  
8 8 Usage example:
9   - python scripts/wechat_alert.py --service backend --level error --message "backend restarted"
  9 + python scripts/ops/wechat_alert.py --service backend --level error --message "backend restarted"
10 10 """
11 11  
12 12 import argparse
... ... @@ -101,4 +101,3 @@ def main(argv: list[str] | None = None) -> int:
101 101  
102 102 if __name__ == "__main__":
103 103 raise SystemExit(main())
104   -
... ...
scripts/monitor_eviction.py renamed to scripts/redis/monitor_eviction.py
... ... @@ -12,7 +12,7 @@ from pathlib import Path
12 12 from datetime import datetime
13 13  
14 14 # 添加项目路径
15   -project_root = Path(__file__).parent.parent
  15 +project_root = Path(__file__).resolve().parents[2]
16 16 sys.path.insert(0, str(project_root))
17 17  
18 18 from config.env_config import REDIS_CONFIG
... ...
scripts/service_ctl.sh
... ... @@ -334,7 +334,7 @@ monitor_services() {
334 334 local fail_threshold="${MONITOR_FAIL_THRESHOLD:-3}"
335 335 local restart_cooldown_sec="${MONITOR_RESTART_COOLDOWN_SEC:-30}"
336 336 local max_restarts_per_hour="${MONITOR_MAX_RESTARTS_PER_HOUR:-6}"
337   - local wechat_alert_py="${PROJECT_ROOT}/scripts/wechat_alert.py"
  337 + local wechat_alert_py="${PROJECT_ROOT}/scripts/ops/wechat_alert.py"
338 338  
339 339 require_positive_int "MONITOR_INTERVAL_SEC" "${interval_sec}"
340 340 require_positive_int "MONITOR_FAIL_THRESHOLD" "${fail_threshold}"
... ...
scripts/setup_translator_venv.sh
... ... @@ -39,5 +39,5 @@ echo "Using TMPDIR=${TMPDIR}"
39 39 echo
40 40 echo "Done."
41 41 echo "Translator venv: ${VENV_DIR}"
42   -echo "Download local models: ./.venv-translator/bin/python scripts/download_translation_models.py --all-local"
  42 +echo "Download local models: ./.venv-translator/bin/python scripts/translation/download_translation_models.py --all-local"
43 43 echo "Start service: ./scripts/start_translator.sh"
... ...
scripts/start_cnclip_service.sh
... ... @@ -61,7 +61,7 @@ LOG_DIR="${PROJECT_ROOT}/logs"
61 61 PID_FILE="${LOG_DIR}/cnclip.pid"
62 62 LOG_LINK="${LOG_DIR}/cnclip.log"
63 63 LOG_FILE="${LOG_DIR}/cnclip-$(date +%F).log"
64   -LOG_ROUTER_SCRIPT="${PROJECT_ROOT}/scripts/daily_log_router.sh"
  64 +LOG_ROUTER_SCRIPT="${PROJECT_ROOT}/scripts/ops/daily_log_router.sh"
65 65  
66 66 # 帮助信息
67 67 show_help() {
... ...
scripts/start_frontend.sh
... ... @@ -27,4 +27,4 @@ echo -e " ${GREEN}http://localhost:${API_PORT}${NC}"
27 27 echo ""
28 28  
29 29 export FRONTEND_PORT API_PORT PORT
30   -exec python scripts/frontend_server.py
  30 +exec python scripts/frontend/frontend_server.py
... ...
scripts/download_translation_models.py renamed to scripts/translation/download_translation_models.py
... ... @@ -13,7 +13,7 @@ from typing import Iterable
13 13  
14 14 from huggingface_hub import snapshot_download
15 15  
16   -PROJECT_ROOT = Path(__file__).resolve().parent.parent
  16 +PROJECT_ROOT = Path(__file__).resolve().parents[2]
17 17 if str(PROJECT_ROOT) not in sys.path:
18 18 sys.path.insert(0, str(PROJECT_ROOT))
19 19 os.environ.setdefault("HF_HUB_DISABLE_XET", "1")
... ...
translation/README.md
... ... @@ -11,7 +11,7 @@
11 11 相关脚本与报告:
12 12 - 启动脚本:[`scripts/start_translator.sh`](/data/saas-search/scripts/start_translator.sh)
13 13 - 虚拟环境:[`scripts/setup_translator_venv.sh`](/data/saas-search/scripts/setup_translator_venv.sh)
14   -- 模型下载:[`scripts/download_translation_models.py`](/data/saas-search/scripts/download_translation_models.py)
  14 +- 模型下载:[`scripts/translation/download_translation_models.py`](/data/saas-search/scripts/translation/download_translation_models.py)
15 15 - 本地模型压测:[`benchmarks/translation/benchmark_translation_local_models.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models.py)
16 16 - 聚焦压测脚本:[`benchmarks/translation/benchmark_translation_local_models_focus.py`](/data/saas-search/benchmarks/translation/benchmark_translation_local_models_focus.py)
17 17 - 基线性能报告:[`perf_reports/20260318/translation_local_models/README.md`](/data/saas-search/perf_reports/20260318/translation_local_models/README.md)
... ... @@ -493,7 +493,7 @@ cd /data/saas-search
493 493 下载全部本地模型:
494 494  
495 495 ```bash
496   -./.venv-translator/bin/python scripts/download_translation_models.py --all-local
  496 +./.venv-translator/bin/python scripts/translation/download_translation_models.py --all-local
497 497 ```
498 498  
499 499 下载完成后,默认目录应存在:
... ...