
Reranker Service (BGE v2 m3)

A minimal, production-ready reranker service based on BAAI/bge-reranker-v2-m3.

Features

  • Optional FP16 inference on GPU
  • Length-based sorting to reduce padding waste within batches
  • Deduplication of identical docs to avoid redundant inference
  • Scores returned in the original input order
  • Simple FastAPI service
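The length-sorting and order-restoring features can be illustrated with a minimal sketch. It assumes a hypothetical `score_batch` callable standing in for the model's forward pass and uses character length as a cheap proxy for token length. The actual logic in reranker/bge_reranker.py may sort by token count and differ in other details.

```python
from typing import Callable, List, Sequence


def score_length_sorted(
    query: str,
    docs: Sequence[str],
    score_batch: Callable[[str, List[str]], List[float]],  # hypothetical model stand-in
    batch_size: int = 64,
) -> List[float]:
    """Score docs in length-sorted batches, returning scores in input order."""
    # Order doc indices by length so each batch holds similarly sized inputs;
    # padding to the longest item in a batch then wastes less compute.
    order = sorted(range(len(docs)), key=lambda i: len(docs[i]))
    scores = [0.0] * len(docs)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        batch_scores = score_batch(query, [docs[i] for i in batch_idx])
        # Scatter each score back to the position of its original doc.
        for i, s in zip(batch_idx, batch_scores):
            scores[i] = s
    return scores


def fake_score_batch(query: str, batch: List[str]) -> List[float]:
    # Toy scorer for illustration only: counts words shared with the query.
    q = set(query.split())
    return [float(len(q & set(d.split()))) for d in batch]


docs = ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
print(score_length_sorted("wireless mouse", docs, fake_score_batch, batch_size=2))
# → [0.0, 0.0, 2.0]
```

Internally the docs are processed shortest-first ("usb cable", then "logitech mx master", then "wireless mouse bluetooth"), but the caller still receives scores aligned with the order it sent.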

Files

  • reranker/bge_reranker.py: core model loading + scoring logic
  • reranker/server.py: FastAPI service with /health and /rerank
  • reranker/config.py: model, batching, and service settings

Requirements

Install the Python dependencies (already listed in the project requirements):

  • torch
  • modelscope
  • fastapi
  • uvicorn

Configuration

Edit reranker/config.py:

  • MODEL_NAME: default BAAI/bge-reranker-v2-m3
  • DEVICE: None (auto), cuda, or cpu
  • USE_FP16: enable FP16 inference (GPU only)
  • BATCH_SIZE: default 64
  • MAX_LENGTH: maximum input length in tokens, default 512
  • PORT: default 6007
  • MAX_DOCS: maximum number of docs per request, default 1000
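For reference, a hypothetical reranker/config.py holding these settings might look like the sketch below. The names and defaults come from the list above; the USE_FP16 default is not documented, so `True` is an assumption. The last lines compute the worst-case number of forward passes per request implied by MAX_DOCS and BATCH_SIZE.

```python
import math

# Hypothetical contents of reranker/config.py; the real file may differ.
MODEL_NAME = "BAAI/bge-reranker-v2-m3"
DEVICE = None        # None = auto-detect; or "cuda" / "cpu"
USE_FP16 = True      # assumption: default not documented; only applies on GPU
BATCH_SIZE = 64
MAX_LENGTH = 512
PORT = 6007
MAX_DOCS = 1000

# Worst case: MAX_DOCS unique, non-empty docs scored BATCH_SIZE at a time.
max_batches_per_request = math.ceil(MAX_DOCS / BATCH_SIZE)
print(max_batches_per_request)  # → 16
```

Deduplication and skipped empty docs can only lower this count, since fewer unique docs reach the model.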

Run the Service

uvicorn reranker.server:app --host 0.0.0.0 --port 6007

API

Health

GET /health

Rerank

POST /rerank
Content-Type: application/json

{
  "query": "wireless mouse",
  "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
}
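A minimal Python client for both endpoints, using only the standard library, might look like this. The base URL assumes the service is running locally on port 6007, and the helper names (`build_rerank_request`, `rerank`, `is_healthy`) are hypothetical. Because the /health response body is not documented, the probe only checks for HTTP 200.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:6007"  # assumption: local service on the default port


def build_rerank_request(base_url: str, query: str, docs: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST /rerank request with a JSON body."""
    body = json.dumps({"query": query, "docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url: str, query: str, docs: List[str], timeout: float = 30.0) -> dict:
    """Send POST /rerank and return the decoded JSON response."""
    req = build_rerank_request(base_url, query, docs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Treat HTTP 200 from GET /health as healthy; connection or HTTP errors as not."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


if __name__ == "__main__":
    if is_healthy(BASE_URL):
        result = rerank(BASE_URL, "wireless mouse",
                        ["logitech mx master", "usb cable", "wireless mouse bluetooth"])
        print(result["scores"])
```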

Example response:

{
  "scores": [0.93, 0.02, 0.88],
  "meta": {
    "input_docs": 3,
    "usable_docs": 3,
    "unique_docs": 3,
    "dedup_ratio": 0.0,
    "elapsed_ms": 12.4,
    "model": "BAAI/bge-reranker-v2-m3",
    "device": "cuda",
    "fp16": true,
    "batch_size": 64,
    "max_length": 512,
    "normalize": true,
    "service_elapsed_ms": 13.1
  }
}
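Because scores come back in the same order as the submitted docs, a client can pair them positionally and sort to get a ranking. A minimal sketch using the example response above (the `rank_docs` helper and its `top_k` parameter are hypothetical):

```python
from typing import List, Tuple

# Example response body from above, trimmed to the fields used here.
response = {
    "scores": [0.93, 0.02, 0.88],
    "meta": {"input_docs": 3, "unique_docs": 3},
}
docs = ["logitech mx master", "usb cable", "wireless mouse bluetooth"]


def rank_docs(docs: List[str], scores: List[float], top_k: int = 10) -> List[Tuple[str, float]]:
    """Pair each doc with its score (same order) and sort best-first."""
    if len(docs) != len(scores):
        raise ValueError("scores must align one-to-one with the submitted docs")
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


for doc, score in rank_docs(docs, response["scores"]):
    print(f"{score:.2f}  {doc}")
# → 0.93  logitech mx master
#   0.88  wireless mouse bluetooth
#   0.02  usb cable
```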

Logging

The service uses the standard Python logging module. To set the log level explicitly when starting uvicorn, run:

uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info

Notes

  • By design, no result caching is used.
  • Inputs are deduplicated by exact string match.
  • Empty or null docs are skipped and scored as 0.
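The deduplication and empty-doc rules above can be sketched as follows. The `score_unique` callable is a hypothetical stand-in for the model: each distinct non-empty string is scored once, duplicates reuse that score, and empty or null entries get 0.0. Whitespace-only strings are not treated as empty here, which may differ from the real service.

```python
from typing import Callable, Dict, List, Optional, Sequence


def score_with_dedup(
    query: str,
    docs: Sequence[Optional[str]],
    score_unique: Callable[[str, List[str]], List[float]],  # hypothetical model stand-in
) -> List[float]:
    """Score each distinct non-empty doc once; map results back to every input slot."""
    # Exact-string dedup; dicts preserve first-seen insertion order.
    unique: Dict[str, int] = {}
    for d in docs:
        if d:  # skips None and ""
            unique.setdefault(d, len(unique))
    unique_docs = list(unique)
    unique_scores = score_unique(query, unique_docs) if unique_docs else []
    # Empty or null docs get 0.0; duplicates reuse their first copy's score.
    return [unique_scores[unique[d]] if d else 0.0 for d in docs]


calls = []


def fake_score_unique(query: str, batch: List[str]) -> List[float]:
    # Toy scorer for illustration: records each call and scores by length.
    calls.append(list(batch))
    return [float(len(d)) for d in batch]


print(score_with_dedup("wireless mouse", ["usb cable", "", "usb cable", None, "mouse"], fake_score_unique))
# → [9.0, 0.0, 9.0, 0.0, 5.0]
```

Only two strings ("usb cable" and "mouse") reach the scorer for five inputs, which is the redundant inference the deduplication step avoids.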