Commit e8d3bbb21f8437e885b57c10e9d58411bc9cb2f4
Parent: b3ffdc72
Add test-env TEI GPU override examples
This branch is intended to differ from master only by deployment configuration for the test machine.
- Add `.env.test.example` as a secrets-free override snippet to be appended onto `.env`.
- Pin TEI to GPU mode (`TEI_DEVICE=cuda`) with `float16` for performance.
- Pin a Tesla T4 compatible TEI image (`text-embeddings-inference:turing-1.9`) to avoid
  compute-capability mismatch errors (the T4 is sm_75, while non-`turing` images are compiled for sm_80+).
- Keep TEI request limits aligned with current service settings (`TEI_MAX_BATCH_TOKENS=2048`,
  `TEI_MAX_CLIENT_BATCH_SIZE=8`) and provide an example BGE-M3 snapshot path.
- Extend `.env.example` with guidance on selecting the correct TEI image tag (`turing-*` for T4,
  `cuda-*` for Ampere and newer) and an optional mirror repository override.
No credentials are committed; `.env` remains local-only.
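The append-based override flow (`cat .env.test.example >> .env`) relies on last-assignment-wins semantics when the merged `.env` is loaded. A minimal sketch of that behavior, using a plain dict-based parser rather than any particular dotenv library (the variable values below are illustrative):

```python
def parse_env_layers(*texts: str) -> dict[str, str]:
    """Parse dotenv-style text blocks in order. Later assignments win,
    which is why appending test overrides after the base file works."""
    env: dict[str, str] = {}
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()  # last occurrence overrides
    return env

base = "TEI_DEVICE=cpu\nTEI_HEALTH_TIMEOUT_SEC=300\n"
overrides = "# test overrides\nTEI_DEVICE=cuda\nTEI_HEALTH_TIMEOUT_SEC=240\n"
merged = parse_env_layers(base, overrides)
```

Note that most dotenv loaders follow the same rule, but some (e.g. `python-dotenv` with `override=False` against the process environment) behave differently, so verify against the loader the services actually use.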
Made-with: Cursor
2 changed files with 48 additions and 0 deletions
.env.example

```diff
@@ -46,6 +46,14 @@ EMBEDDING_BACKEND=tei
 TEI_BASE_URL=http://127.0.0.1:8080
 TEI_DEVICE=cuda
 TEI_VERSION=1.9
+# Optional: override TEI docker image repository (useful for mirrors).
+# TEI_IMAGE_REPO=ghcr.m.daocloud.io/huggingface/text-embeddings-inference
+#
+# Optional: pin an explicit TEI image tag.
+# - For Tesla T4 (compute capability 7.5), prefer the `turing-*` image tag, e.g.:
+#   TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
+# - For Ampere+ GPUs, prefer the `cuda-*` image tag, e.g.:
+#   TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:cuda-1.9
 TEI_MAX_BATCH_TOKENS=2048
 TEI_MAX_CLIENT_BATCH_SIZE=8
 TEI_HEALTH_TIMEOUT_SEC=300
```
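The tag-selection guidance in `.env.example` above can be expressed as a small helper. This is an illustrative sketch, not code from the repo; `pick_tei_image_tag` and its threshold values are assumptions based on the commit's own notes (T4 = sm_75 needs `turing-*`, sm_80+ uses `cuda-*`):

```python
def pick_tei_image_tag(compute_cap: tuple[int, int], version: str = "1.9") -> str:
    """Map a GPU compute capability to a TEI image tag suffix.

    sm_75 (Turing, e.g. Tesla T4) needs the `turing-*` build; sm_80 and
    newer (Ampere+) can use the `cuda-*` build, per the guidance above.
    """
    if compute_cap >= (8, 0):
        return f"cuda-{version}"
    if compute_cap >= (7, 5):
        return f"turing-{version}"
    major, minor = compute_cap
    raise ValueError(f"no suitable prebuilt TEI GPU image for sm_{major}{minor}")
```

On the host itself, recent NVIDIA drivers can report the capability directly (e.g. `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints `7.5` on a T4), which could feed this helper.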
.env.test.example (new file)

```diff
@@ -0,0 +1,40 @@
+# Test environment overrides example (no secrets).
+#
+# Usage:
+#   cp .env.example .env
+#   cat .env.test.example >> .env
+#
+# Notes:
+# - This repo is multi-service; values below focus on local test deployment.
+# - Keep real credentials (Redis/MySQL/ES passwords) out of VCS.
+
+# ===== runtime / namespace =====
+RUNTIME_ENV=test
+ES_INDEX_NAMESPACE=test_
+
+# ===== Elasticsearch (example: local docker on non-default port) =====
+ES_HOST=http://127.0.0.1:19200
+ES_USERNAME=
+ES_PASSWORD=
+ES_DOCKER_HTTP_PORT=19200
+ES_DOCKER_CONTAINER_NAME=saas-search-es9-test
+
+# ===== HuggingFace cache =====
+HF_CACHE_DIR=/data/tw/.cache/huggingface
+
+# ===== TEI (text embeddings inference) =====
+# Service port exposed by container (host:8080 -> container:80)
+TEI_PORT=8080
+# Use GPU when available
+TEI_DEVICE=cuda
+# Use float16 for performance on GPU
+TEI_DTYPE=float16
+# IMPORTANT for Tesla T4 (compute capability 7.5): use turing image tag
+TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
+# Example pinned model snapshot path (update per-machine)
+TEI_MODEL_ID=/data/hub/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181
+TEI_MAX_BATCH_TOKENS=2048
+TEI_MAX_CLIENT_BATCH_SIZE=8
+TEI_HEALTH_TIMEOUT_SEC=240
+TEI_CONTAINER_NAME=saas-search-tei-test
+
```
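For context, a hedged sketch of how these variables could feed a `docker run` invocation. The repo's actual launch script is not shown in this commit, so the mount path and flag set here are assumptions; `--model-id`, `--dtype`, `--max-batch-tokens`, and `--max-client-batch-size` are standard TEI router flags:

```shell
# Sketch only: assumes the variables from .env are exported in the shell,
# and that the host /data tree containing the model snapshot is mounted 1:1
# so the absolute TEI_MODEL_ID path resolves inside the container.
docker run -d --gpus all \
  --name "${TEI_CONTAINER_NAME}" \
  -p "${TEI_PORT}:80" \
  -v /data:/data \
  "${TEI_IMAGE}" \
  --model-id "${TEI_MODEL_ID}" \
  --dtype "${TEI_DTYPE}" \
  --max-batch-tokens "${TEI_MAX_BATCH_TOKENS}" \
  --max-client-batch-size "${TEI_MAX_CLIENT_BATCH_SIZE}"
```

Readiness can then be polled against `http://127.0.0.1:${TEI_PORT}/health` until it succeeds or `TEI_HEALTH_TIMEOUT_SEC` elapses.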