Commit e8d3bbb21f8437e885b57c10e9d58411bc9cb2f4
Parent: b3ffdc72
Add test-env TEI GPU override examples
This branch is intended to differ from master only by deployment configuration for the test machine.
- Add `.env.test.example` as a secrets-free override snippet to be appended onto `.env`.
- Pin TEI to GPU mode (`TEI_DEVICE=cuda`) with `float16` for performance.
- Pin a Tesla T4 compatible TEI image (`text-embeddings-inference:turing-1.9`) to avoid
  compute-capability mismatch errors (the T4 is sm_75, while non-`turing` images are compiled for sm_80+).
- Keep TEI request limits aligned with current service settings (`TEI_MAX_BATCH_TOKENS=2048`,
  `TEI_MAX_CLIENT_BATCH_SIZE=8`) and provide an example BGE-M3 snapshot path.
- Extend `.env.example` with guidance on selecting the correct TEI image tag (`turing-*` for T4,
  `cuda-*` for Ampere and newer) and an optional mirror repository override.
No credentials are committed; `.env` remains local-only.
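The append-based override flow (`cat .env.test.example >> .env`) relies on last-assignment-wins semantics when the merged `.env` is loaded. A minimal sketch of that behavior, using a plain dict-based parser rather than any particular dotenv library (the variable values below are illustrative):

```python
def parse_env_layers(*texts: str) -> dict[str, str]:
    """Parse dotenv-style text blocks in order. Later assignments win,
    which is why appending test overrides after the base file works."""
    env: dict[str, str] = {}
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()  # last occurrence overrides
    return env

base = "TEI_DEVICE=cpu\nTEI_HEALTH_TIMEOUT_SEC=300\n"
overrides = "# test overrides\nTEI_DEVICE=cuda\nTEI_HEALTH_TIMEOUT_SEC=240\n"
merged = parse_env_layers(base, overrides)
```

Note that most dotenv loaders follow the same rule, but some (e.g. `python-dotenv` with `override=False` against the process environment) behave differently, so verify against the loader the services actually use.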
Made-with: Cursor
2 changed files with 48 additions and 0 deletions
.env.example

```diff
@@ -46,6 +46,14 @@ EMBEDDING_BACKEND=tei
 TEI_BASE_URL=http://127.0.0.1:8080
 TEI_DEVICE=cuda
 TEI_VERSION=1.9
+# Optional: override TEI docker image repository (useful for mirrors).
+# TEI_IMAGE_REPO=ghcr.m.daocloud.io/huggingface/text-embeddings-inference
+#
+# Optional: pin an explicit TEI image tag.
+# - For Tesla T4 (compute capability 7.5), prefer the `turing-*` image tag, e.g.:
+#   TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
+# - For Ampere+ GPUs, prefer the `cuda-*` image tag, e.g.:
+#   TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:cuda-1.9
 TEI_MAX_BATCH_TOKENS=2048
 TEI_MAX_CLIENT_BATCH_SIZE=8
 TEI_HEALTH_TIMEOUT_SEC=300
```
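The tag-selection guidance in `.env.example` above can be expressed as a small helper. This is an illustrative sketch, not code from the repo; `pick_tei_image_tag` and its threshold values are assumptions based on the commit's own notes (T4 = sm_75 needs `turing-*`, sm_80+ uses `cuda-*`):

```python
def pick_tei_image_tag(compute_cap: tuple[int, int], version: str = "1.9") -> str:
    """Map a GPU compute capability to a TEI image tag suffix.

    sm_75 (Turing, e.g. Tesla T4) needs the `turing-*` build; sm_80 and
    newer (Ampere+) can use the `cuda-*` build, per the guidance above.
    """
    if compute_cap >= (8, 0):
        return f"cuda-{version}"
    if compute_cap >= (7, 5):
        return f"turing-{version}"
    major, minor = compute_cap
    raise ValueError(f"no suitable prebuilt TEI GPU image for sm_{major}{minor}")
```

On the host itself, recent NVIDIA drivers can report the capability directly (e.g. `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints `7.5` on a T4), which could feed this helper.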
.env.test.example (new file)

```diff
@@ -0,0 +1,40 @@
+# Test environment overrides example (no secrets).
+#
+# Usage:
+#   cp .env.example .env
+#   cat .env.test.example >> .env
+#
+# Notes:
+# - This repo is multi-service; values below focus on local test deployment.
+# - Keep real credentials (Redis/MySQL/ES passwords) out of VCS.
+
+# ===== runtime / namespace =====
+RUNTIME_ENV=test
+ES_INDEX_NAMESPACE=test_
+
+# ===== Elasticsearch (example: local docker on non-default port) =====
+ES_HOST=http://127.0.0.1:19200
+ES_USERNAME=
+ES_PASSWORD=
+ES_DOCKER_HTTP_PORT=19200
+ES_DOCKER_CONTAINER_NAME=saas-search-es9-test
+
+# ===== HuggingFace cache =====
+HF_CACHE_DIR=/data/tw/.cache/huggingface
+
+# ===== TEI (text embeddings inference) =====
+# Service port exposed by container (host:8080 -> container:80)
+TEI_PORT=8080
+# Use GPU when available
+TEI_DEVICE=cuda
+# Use float16 for performance on GPU
+TEI_DTYPE=float16
+# IMPORTANT for Tesla T4 (compute capability 7.5): use turing image tag
+TEI_IMAGE=ghcr.m.daocloud.io/huggingface/text-embeddings-inference:turing-1.9
+# Example pinned model snapshot path (update per-machine)
+TEI_MODEL_ID=/data/hub/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181
+TEI_MAX_BATCH_TOKENS=2048
+TEI_MAX_CLIENT_BATCH_SIZE=8
+TEI_HEALTH_TIMEOUT_SEC=240
+TEI_CONTAINER_NAME=saas-search-tei-test
+
```
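For context, a hedged sketch of how these variables could feed a `docker run` invocation. The repo's actual launch script is not shown in this commit, so the mount path and flag set here are assumptions; `--model-id`, `--dtype`, `--max-batch-tokens`, and `--max-client-batch-size` are standard TEI router flags:

```shell
# Sketch only: assumes the variables from .env are exported in the shell,
# and that the host /data tree containing the model snapshot is mounted 1:1
# so the absolute TEI_MODEL_ID path resolves inside the container.
docker run -d --gpus all \
  --name "${TEI_CONTAINER_NAME}" \
  -p "${TEI_PORT}:80" \
  -v /data:/data \
  "${TEI_IMAGE}" \
  --model-id "${TEI_MODEL_ID}" \
  --dtype "${TEI_DTYPE}" \
  --max-batch-tokens "${TEI_MAX_BATCH_TOKENS}" \
  --max-client-batch-size "${TEI_MAX_CLIENT_BATCH_SIZE}"
```

Readiness can then be polled against `http://127.0.0.1:${TEI_PORT}/health` until it succeeds or `TEI_HEALTH_TIMEOUT_SEC` elapses.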