Commit e8d3bbb21f8437e885b57c10e9d58411bc9cb2f4

Authored by tangwang
1 parent b3ffdc72

Add test-env TEI GPU override examples

This branch is intended to differ from master only in deployment configuration for the test machine.

- Add `.env.test.example` as a secrets-free override snippet to be appended onto `.env`.
  - Pins TEI to GPU mode (`TEI_DEVICE=cuda`) with `float16` for performance.
  - Pins a Tesla T4 compatible TEI image (`text-embeddings-inference:turing-1.9`) to avoid
    compute-capability mismatch errors (T4=sm75 vs non-turing images compiled for sm80).
  - Keeps TEI request limits aligned with current service settings (`TEI_MAX_BATCH_TOKENS=2048`,
    `TEI_MAX_CLIENT_BATCH_SIZE=8`) and provides an example BGE-M3 snapshot path.
- Extend `.env.example` with guidance on selecting the correct TEI image tag (`turing-*` for T4,
  `cuda-*` for Ampere+) and optional mirror repository override.
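The tag-selection rule above (`turing-*` for T4's compute capability 7.5, `cuda-*` for Ampere and newer) can be sketched as a small helper. This is a minimal sketch, not part of the commit: the `tei_tag_for_cap` function name is hypothetical, and the commented usage line assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field.

```shell
#!/bin/sh
# Map a GPU compute capability to a TEI image tag, mirroring the
# guidance in .env.example: turing-* for 7.x (e.g. Tesla T4 = 7.5),
# cuda-* for 8.0+ (Ampere and newer). Pre-Turing GPUs are not handled.
tei_tag_for_cap() {
  # $1 = compute capability string, e.g. "7.5" or "8.6"
  major=${1%%.*}
  if [ "$major" -ge 8 ]; then
    echo "cuda-1.9"
  else
    echo "turing-1.9"
  fi
}

# Usage on a GPU host (assumes nvidia-smi exposes the compute_cap field):
# cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
# echo "TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:$(tei_tag_for_cap "$cap")"
```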

No credentials are committed; `.env` remains local-only.
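Appending the override snippet onto `.env` works because duplicate keys resolve last-wins when the file is consumed, so the test values shadow the base ones. A minimal check of that assumption, using plain shell sourcing (most dotenv loaders behave the same way within a single file, but verify for your loader):

```shell
#!/bin/sh
# Demonstrate last-wins semantics: a later assignment in the same
# env file overrides an earlier one, which is what makes the
# "cat .env.test.example >> .env" overlay pattern work.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
TEI_DEVICE=cpu
TEI_DEVICE=cuda
EOF
. "$tmp"
echo "$TEI_DEVICE"   # cuda
rm -f "$tmp"
```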

Made-with: Cursor
Showing 2 changed files with 48 additions and 0 deletions
.env.example
... ... @@ -46,6 +46,14 @@ EMBEDDING_BACKEND=tei
46 46 TEI_BASE_URL=http://127.0.0.1:8080
47 47 TEI_DEVICE=cuda
48 48 TEI_VERSION=1.9
  49 +# Optional: override TEI docker image repository (useful for mirrors).
  50 +# TEI_IMAGE_REPO=ghcr.m.daocloud.io/huggingface/text-embeddings-inference
  51 +#
  52 +# Optional: pin an explicit TEI image tag.
  53 +# - For Tesla T4 (compute capability 7.5), prefer the `turing-*` image tag, e.g.:
  54 +# TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
  55 +# - For Ampere+ GPUs, prefer the `cuda-*` image tag, e.g.:
  56 +# TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:cuda-1.9
49 57 TEI_MAX_BATCH_TOKENS=2048
50 58 TEI_MAX_CLIENT_BATCH_SIZE=8
51 59 TEI_HEALTH_TIMEOUT_SEC=300
... ...
.env.test.example 0 → 100644
... ... @@ -0,0 +1,40 @@
  1 +# Test environment overrides example (no secrets).
  2 +#
  3 +# Usage:
  4 +# cp .env.example .env
  5 +# cat .env.test.example >> .env
  6 +#
  7 +# Notes:
  8 +# - This repo is multi-service; values below focus on local test deployment.
  9 +# - Keep real credentials (Redis/MySQL/ES passwords) out of VCS.
  10 +
  11 +# ===== runtime / namespace =====
  12 +RUNTIME_ENV=test
  13 +ES_INDEX_NAMESPACE=test_
  14 +
  15 +# ===== Elasticsearch (example: local docker on non-default port) =====
  16 +ES_HOST=http://127.0.0.1:19200
  17 +ES_USERNAME=
  18 +ES_PASSWORD=
  19 +ES_DOCKER_HTTP_PORT=19200
  20 +ES_DOCKER_CONTAINER_NAME=saas-search-es9-test
  21 +
  22 +# ===== HuggingFace cache =====
  23 +HF_CACHE_DIR=/data/tw/.cache/huggingface
  24 +
  25 +# ===== TEI (text embeddings inference) =====
  26 +# Service port exposed by container (host:8080 -> container:80)
  27 +TEI_PORT=8080
  28 +# Use GPU when available
  29 +TEI_DEVICE=cuda
  30 +# Use float16 for performance on GPU
  31 +TEI_DTYPE=float16
  32 +# IMPORTANT for Tesla T4 (compute capability 7.5): use turing image tag
  33 +TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
  34 +# Example pinned model snapshot path (update per-machine)
  35 +TEI_MODEL_ID=/data/hub/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181
  36 +TEI_MAX_BATCH_TOKENS=2048
  37 +TEI_MAX_CLIENT_BATCH_SIZE=8
  38 +TEI_HEALTH_TIMEOUT_SEC=240
  39 +TEI_CONTAINER_NAME=saas-search-tei-test
  40 +
... ...