# Reranker Module

For **request examples**, see `docs/QUICKSTART.md` §3.5.

---

A minimal, production-ready reranker service based on **BAAI/bge-reranker-v2-m3**.

## Features
- FP16 on GPU
- Length-based sorting to reduce padding waste
- Deduplication to avoid redundant inference
- Scores returned in original input order
- Simple FastAPI service
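
The deduplication, length-based sorting, and order-restoring steps listed above can be sketched as follows. This is a minimal illustration, not the code in `reranker/bge_reranker.py`: the function name `score_with_dedup` and the `score_batch` callable (a stand-in for the model forward pass) are hypothetical, and character length is used as a rough proxy for token length.

```python
from typing import Callable, Dict, List, Sequence


def score_with_dedup(
    query: str,
    docs: Sequence[str],
    score_batch: Callable[[str, List[str]], List[float]],
    batch_size: int = 64,
) -> List[float]:
    """Score docs against query; output[i] corresponds to docs[i]."""
    # Deduplicate by exact string match, keeping first-seen order.
    unique: List[str] = list(dict.fromkeys(docs))

    # Sort by length so each batch holds similarly sized texts,
    # which reduces padding inside a batch.
    ordered = sorted(unique, key=len)

    # Score batch by batch and remember each unique doc's score.
    score_of: Dict[str, float] = {}
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        for doc, score in zip(batch, score_batch(query, batch)):
            score_of[doc] = score

    # Map scores back to the original input order, duplicates included.
    return [score_of[doc] for doc in docs]


def fake_score_batch(query: str, batch: List[str]) -> List[float]:
    # Hypothetical scorer: fraction of query words that appear in the doc.
    words = query.split()
    return [sum(w in d for w in words) / len(words) for d in batch]


docs = ["usb cable", "wireless mouse bluetooth", "usb cable"]
scores = score_with_dedup("wireless mouse", docs, fake_score_batch, batch_size=2)
print(scores)  # [0.0, 1.0, 0.0]
```

The duplicate `"usb cable"` is scored once, yet both of its positions receive the same score in the output.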

## Files
- `reranker/bge_reranker.py`: core model loading + scoring logic
- `reranker/server.py`: FastAPI service with `/health` and `/rerank`
- `reranker/config.py`: simple configuration

## Requirements
Install the Python dependencies (already listed in the project requirements):
- `torch`
- `modelscope`
- `fastapi`
- `uvicorn`

## Configuration
Edit `reranker/config.py`:
- `MODEL_NAME`: default `BAAI/bge-reranker-v2-m3`
- `DEVICE`: `None` (auto), `cuda`, or `cpu`
- `USE_FP16`: enable fp16 on GPU
- `BATCH_SIZE`: default 64
- `MAX_LENGTH`: default 512
- `PORT`: default 6007
- `MAX_DOCS`: request limit (default 1000)
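
As an illustration of how `DEVICE = None` (auto) and `USE_FP16` might interact, here is a small sketch. The helper `resolve_device` is hypothetical and not part of `reranker/config.py`. It assumes that auto-detection prefers CUDA when available and that fp16 is silently disabled on CPU; the actual module may behave differently.

```python
from typing import Optional, Tuple


def resolve_device(device: Optional[str], use_fp16: bool) -> Tuple[str, bool]:
    """Return the concrete device and whether fp16 would actually be used."""
    if device is None:
        try:
            # Imported lazily so this sketch also runs where torch is missing.
            import torch
            device = "cuda" if torch.cuda.is_available() else "cpu"
        except ImportError:
            device = "cpu"
    # Assumption: fp16 only applies on GPU, per the USE_FP16 description.
    return device, use_fp16 and device.startswith("cuda")


print(resolve_device("cpu", True))   # ('cpu', False)
print(resolve_device("cuda", True))  # ('cuda', True)
```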

## Run the Service
```bash
uvicorn reranker.server:app --host 0.0.0.0 --port 6007
```

## API
### Health
```
GET /health
```

### Rerank
```
POST /rerank
Content-Type: application/json

{
  "query": "wireless mouse",
  "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
}
```

Response:
```json
{
  "scores": [0.93, 0.02, 0.88],
  "meta": {
    "input_docs": 3,
    "usable_docs": 3,
    "unique_docs": 3,
    "dedup_ratio": 0.0,
    "elapsed_ms": 12.4,
    "model": "BAAI/bge-reranker-v2-m3",
    "device": "cuda",
    "fp16": true,
    "batch_size": 64,
    "max_length": 512,
    "normalize": true,
    "service_elapsed_ms": 13.1
  }
}
```
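
A minimal client for the `/rerank` endpoint, using only the Python standard library, might look like the sketch below. The function names and the `base_url` value are assumptions for illustration. Because scores come back in input order, ranking the documents is a separate sort on the client side.

```python
import json
import urllib.request
from typing import List, Tuple


def build_rerank_request(base_url: str, query: str, docs: List[str]) -> urllib.request.Request:
    """Build the POST /rerank request without sending it."""
    body = json.dumps({"query": query, "docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url: str, query: str, docs: List[str], timeout: float = 10.0) -> List[float]:
    """Send the request and return the scores, one per input doc."""
    req = build_rerank_request(base_url, query, docs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["scores"]


def top_k(docs: List[str], scores: List[float], k: int) -> List[Tuple[str, float]]:
    """Pair docs with scores and keep the k highest-scoring pairs."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With the service running locally, `rerank("http://localhost:6007", "wireless mouse", docs)` would return the `scores` list, which `top_k` can then turn into a ranking.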

## Logging
The service uses standard Python logging. To control how much is logged,
set the uvicorn log level explicitly:
```bash
uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info
```

## Notes
- By design, scores are not cached between requests.
- Inputs are deduplicated by exact string match.
- Empty or null docs are skipped and scored as 0.
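
The handling of empty or null docs, and the `input_docs`, `usable_docs`, `unique_docs`, and `dedup_ratio` fields in `meta`, can be sketched as below. The helper names are hypothetical. The sketch assumes that "empty" means `None` or `""`, and that `dedup_ratio` is `1 - unique_docs / usable_docs`; that formula is consistent with the example response (3 usable, 3 unique, ratio 0.0) but is not confirmed by the source.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_usable(docs: Sequence[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Return the positions and texts of docs that are neither None nor empty."""
    usable_idx = [i for i, d in enumerate(docs) if d]
    return usable_idx, [docs[i] for i in usable_idx]


def fill_scores(n: int, usable_idx: List[int], usable_scores: List[float]) -> List[float]:
    """Place usable scores at their original positions; skipped docs get 0."""
    scores = [0.0] * n
    for i, s in zip(usable_idx, usable_scores):
        scores[i] = s
    return scores


def dedup_meta(docs: Sequence[Optional[str]], usable_texts: List[str]) -> Dict[str, float]:
    """Compute the counting fields of meta (dedup_ratio formula is assumed)."""
    usable = len(usable_texts)
    unique = len(set(usable_texts))
    ratio = 0.0 if usable == 0 else 1.0 - unique / usable
    return {
        "input_docs": len(docs),
        "usable_docs": usable,
        "unique_docs": unique,
        "dedup_ratio": ratio,
    }


docs = ["a", None, "", "a", "b"]
idx, texts = split_usable(docs)  # idx == [0, 3, 4], texts == ["a", "a", "b"]
meta = dedup_meta(docs, texts)   # 5 input, 3 usable, 2 unique
scores = fill_scores(len(docs), idx, [0.9, 0.9, 0.1])
print(scores)  # [0.9, 0.0, 0.0, 0.9, 0.1]
```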