# Reranker Module

For **request examples**, see `docs/QUICKSTART.md` §3.5.

---

A minimal, production-ready reranker service based on **BAAI/bge-reranker-v2-m3**.

## Features

- FP16 on GPU
- Length-based sorting to reduce padding waste
- Deduplication to avoid redundant inference
- Scores returned in original input order
- Simple FastAPI service

## Files

- `reranker/bge_reranker.py`: core model loading and scoring logic
- `reranker/server.py`: FastAPI service with `/health` and `/rerank`
- `reranker/config.py`: simple configuration

## Requirements

Install the Python dependencies (already listed in the project requirements):

- `torch`
- `modelscope`
- `fastapi`
- `uvicorn`

## Configuration

Edit `reranker/config.py`:

- `MODEL_NAME`: default `BAAI/bge-reranker-v2-m3`
- `DEVICE`: `None` (auto), `cuda`, or `cpu`
- `USE_FP16`: enable FP16 on GPU
- `BATCH_SIZE`: default 64
- `MAX_LENGTH`: default 512
- `PORT`: default 6007
- `MAX_DOCS`: per-request document limit (default 1000)

## Run the Service

```bash
uvicorn reranker.server:app --host 0.0.0.0 --port 6007
```

## API

### Health

```
GET /health
```

### Rerank

```
POST /rerank
Content-Type: application/json

{
  "query": "wireless mouse",
  "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
}
```

Response:

```
{
  "scores": [0.93, 0.02, 0.88],
  "meta": {
    "input_docs": 3,
    "usable_docs": 3,
    "unique_docs": 3,
    "dedup_ratio": 0.0,
    "elapsed_ms": 12.4,
    "model": "BAAI/bge-reranker-v2-m3",
    "device": "cuda",
    "fp16": true,
    "batch_size": 64,
    "max_length": 512,
    "normalize": true,
    "service_elapsed_ms": 13.1
  }
}
```

## Logging

The service uses standard Python logging. For structured logs and full output, run uvicorn with:

```bash
uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info
```

## Notes

- No caching is used, by design.
- Inputs are deduplicated by exact string match.
- Empty or null docs are skipped and scored as 0.
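
The deduplication, length-based sorting, and order-restoring behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code in `reranker/bge_reranker.py`: the names `rerank_scores`, `score_batch`, and `fake_score_batch` are hypothetical, character length stands in for token length, and treating whitespace-only strings as empty is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence


def rerank_scores(
    query: str,
    docs: Sequence[Optional[str]],
    score_batch: Callable[[str, List[str]], List[float]],
    batch_size: int = 64,
) -> List[float]:
    """Return one score per input doc, in the original input order."""
    # Skipped docs (empty or null) keep the default score of 0.
    scores = [0.0] * len(docs)

    # Deduplicate by exact string match, remembering every input position.
    positions: Dict[str, List[int]] = {}
    for i, doc in enumerate(docs):
        if doc is None or not doc.strip():  # assumption: whitespace-only counts as empty
            continue
        positions.setdefault(doc, []).append(i)

    # Sort unique docs by length so each batch pads to a similar size.
    unique_docs = sorted(positions, key=len)

    for start in range(0, len(unique_docs), batch_size):
        batch = unique_docs[start:start + batch_size]
        for doc, score in zip(batch, score_batch(query, batch)):
            # Fan the score back out to every position that held this doc.
            for i in positions[doc]:
                scores[i] = float(score)
    return scores


def fake_score_batch(query: str, batch: List[str]) -> List[float]:
    """Hypothetical stand-in for the model: fraction of query words found in the doc."""
    q = set(query.split())
    return [len(q & set(d.split())) / len(q) for d in batch]


if __name__ == "__main__":
    docs = ["wireless mouse bluetooth", "", "usb cable", None, "wireless mouse bluetooth"]
    print(rerank_scores("wireless mouse", docs, fake_score_batch))
    # -> [1.0, 0.0, 0.0, 0.0, 1.0]
```

With a real model, `score_batch` would wrap the tokenizer and forward pass; the bookkeeping around it stays the same, which is why duplicated docs cost only one inference each.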