Reranker Module
For request examples, see docs/QUICKSTART.md §3.5.
A minimal, production-ready reranker service based on BAAI/bge-reranker-v2-m3.
Features
- FP16 on GPU
- Length-based sorting to reduce padding waste
- Deduplication to avoid redundant inference
- Scores returned in original input order
- Simple FastAPI service
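The length-based sorting and original-order guarantees listed above can be sketched roughly as follows. This is a minimal illustration, not the code in reranker/bge_reranker.py: the function name `score_in_length_order` and the stand-in scorer `fake_score_batch` are hypothetical, and character length is used as the sort key on the assumption that it approximates token length well enough to show the idea.

```python
from typing import Callable, List, Sequence


def score_in_length_order(
    docs: Sequence[str],
    score_batch: Callable[[List[str]], List[float]],
    batch_size: int = 64,
) -> List[float]:
    """Score docs in length-sorted batches and return scores in input order."""
    # Indices ordered by document length, so each batch holds texts of
    # similar length and needs little padding.
    order = sorted(range(len(docs)), key=lambda i: len(docs[i]))
    scores = [0.0] * len(docs)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        batch_scores = score_batch([docs[i] for i in batch_idx])
        # Write each score back into the slot of the document it belongs to.
        for i, s in zip(batch_idx, batch_scores):
            scores[i] = s
    return scores


def fake_score_batch(batch: List[str]) -> List[float]:
    """Stand-in scorer for illustration only: score = character count."""
    return [float(len(text)) for text in batch]


docs = ["wireless mouse bluetooth", "usb cable", "logitech mx master"]
scores = score_in_length_order(docs, fake_score_batch, batch_size=2)
```

Here `scores[i]` always corresponds to `docs[i]`, even though the batches were formed from the length-sorted order.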
Files
- reranker/bge_reranker.py: core model loading + scoring logic
- reranker/server.py: FastAPI service with /health and /rerank
- reranker/config.py: simple configuration
Requirements
Install the Python dependencies (already listed in the project requirements):
- torch
- modelscope
- fastapi
- uvicorn
Configuration
Edit reranker/config.py:
- MODEL_NAME: default BAAI/bge-reranker-v2-m3
- DEVICE: None (auto), cuda, or cpu
- USE_FP16: enable fp16 on GPU
- BATCH_SIZE: default 64
- MAX_LENGTH: default 512
- PORT: default 6007
- MAX_DOCS: request limit (default 1000)
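As a rough picture of what these settings might look like in the file, here is a sketch. The names and numeric defaults are taken from the list above; the module-level-constant layout and the `USE_FP16 = True` default are assumptions, since the README does not state them.

```python
# Hypothetical layout for reranker/config.py; only the setting names and
# the stated defaults come from the README.
from typing import Optional

MODEL_NAME: str = "BAAI/bge-reranker-v2-m3"
DEVICE: Optional[str] = None  # None = auto-detect; otherwise "cuda" or "cpu"
USE_FP16: bool = True         # assumed default; only takes effect on GPU
BATCH_SIZE: int = 64          # documents scored per forward pass
MAX_LENGTH: int = 512         # maximum sequence length passed to the model
PORT: int = 6007              # port the service listens on
MAX_DOCS: int = 1000          # upper limit on docs per request
```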
Run the Service
uvicorn reranker.server:app --host 0.0.0.0 --port 6007
API
Health
GET /health
Rerank
POST /rerank
Content-Type: application/json
{
  "query": "wireless mouse",
  "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
}
Response:
{
  "scores": [0.93, 0.02, 0.88],
  "meta": {
    "input_docs": 3,
    "usable_docs": 3,
    "unique_docs": 3,
    "dedup_ratio": 0.0,
    "elapsed_ms": 12.4,
    "model": "BAAI/bge-reranker-v2-m3",
    "device": "cuda",
    "fp16": true,
    "batch_size": 64,
    "max_length": 512,
    "normalize": true,
    "service_elapsed_ms": 13.1
  }
}
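A client for this endpoint could look something like the sketch below, which uses only the Python standard library. The helper names (`build_rerank_request`, `rerank`, `rank_docs`) are hypothetical, and the base URL assumes the service is reachable on localhost at the default port 6007.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence, Tuple

BASE_URL = "http://localhost:6007"  # assumption: service running locally


def build_rerank_request(
    query: str, docs: Sequence[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /rerank request with a JSON body."""
    body = json.dumps({"query": query, "docs": list(docs)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(
    query: str, docs: Sequence[str], base_url: str = BASE_URL, timeout: float = 10.0
) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response."""
    req = build_rerank_request(query, docs, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rank_docs(docs: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each doc with its score and sort best-first."""
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
```

Because scores come back in input order, `rank_docs(docs, rerank(query, docs)["scores"])` yields the documents from most to least relevant.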
Logging
The service uses the standard Python logging module. To see its structured log lines and full output, run uvicorn at the info log level:
uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info
Notes
- No caching is used by design.
- Inputs are deduplicated by exact string match.
- Empty or null docs are skipped and scored as 0.
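The deduplication and empty-document rules above might be implemented along these lines. This is a sketch under assumptions rather than the service's actual code: `score_with_dedup` and `score_unique` are hypothetical names, "empty" is taken to mean `None` or the empty string, and the `dedup_ratio` formula is a guess that merely agrees with the example response (3 usable, 3 unique, ratio 0.0).

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def score_with_dedup(
    docs: Sequence[Optional[str]],
    score_unique: Callable[[List[str]], List[float]],
) -> Tuple[List[float], Dict[str, float]]:
    """Score each distinct non-empty doc once; empty or null docs get 0."""
    position: Dict[str, int] = {}  # exact string -> index into `unique`
    unique: List[str] = []
    usable = 0
    for doc in docs:
        if not doc:  # None or "" is skipped
            continue
        usable += 1
        if doc not in position:
            position[doc] = len(unique)
            unique.append(doc)

    # Run inference only on the distinct strings.
    unique_scores = score_unique(unique) if unique else []

    # Fan scores back out to every input slot, in original order.
    scores = [unique_scores[position[doc]] if doc else 0.0 for doc in docs]
    meta = {
        "input_docs": len(docs),
        "usable_docs": usable,
        "unique_docs": len(unique),
        # Assumed definition: share of usable docs that were duplicates.
        "dedup_ratio": 1 - len(unique) / usable if usable else 0.0,
    }
    return scores, meta
```

Matching is by exact string, so two documents that differ only in case or whitespace are scored separately.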