Reranker Module

For request examples, see docs/QUICKSTART.md §3.5.

A minimal, production-ready reranker service based on BAAI/bge-reranker-v2-m3.

Features

FP16 on GPU
Length-based sorting to reduce padding waste
Deduplication to avoid redundant inference
Scores returned in original input order
Simple FastAPI service
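
The deduplication, length-based sorting, and order-preserving features can be illustrated with a small sketch. This is not the code in reranker/bge_reranker.py. The function name `rerank_scores` and the `score_batch` callback (standing in for the model forward pass) are hypothetical, and character length is used as a rough proxy for token length.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank_scores(
    query: str,
    docs: Sequence[str],
    score_batch: Callable[[List[Pair]], List[float]],  # hypothetical model call
    batch_size: int = 64,
) -> List[float]:
    """Score docs against query and return scores in the original input order."""
    # Deduplicate by exact string match, keeping first-seen order.
    unique = list(dict.fromkeys(docs))
    # Sort by length so each batch holds similarly sized texts,
    # which reduces padding once the batch is tokenized.
    ordered = sorted(unique, key=len)

    score_of: Dict[str, float] = {}
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        scores = score_batch([(query, doc) for doc in batch])
        score_of.update(zip(batch, scores))

    # Duplicates share one score; output order matches the input order.
    return [score_of[doc] for doc in docs]
```

With this shape, each distinct document is scored once regardless of how many times it appears in the request, and the caller never sees the internal sort order.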

Files

reranker/bge_reranker.py: core model loading + scoring logic
reranker/server.py: FastAPI service with /health and /rerank
reranker/config.py: simple configuration

Requirements

Install the Python dependencies (already listed in the project requirements):

torch
modelscope
fastapi
uvicorn

Configuration

Edit reranker/config.py:

MODEL_NAME: default BAAI/bge-reranker-v2-m3
DEVICE: None (auto), cuda, or cpu
USE_FP16: enable fp16 on GPU
BATCH_SIZE: default 64
MAX_LENGTH: default 512
PORT: default 6007
MAX_DOCS: request limit (default 1000)
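
As a rough illustration, a config module with these settings might look like the sketch below. The actual reranker/config.py may differ. The `USE_FP16` default is an assumption (the README does not state it), and `resolve_device` and `effective_fp16` are hypothetical helpers showing one way the `None` (auto) setting could be resolved.

```python
from typing import Optional

MODEL_NAME: str = "BAAI/bge-reranker-v2-m3"
DEVICE: Optional[str] = None  # None = auto-detect; otherwise "cuda" or "cpu"
USE_FP16: bool = True         # assumed default; only meaningful on GPU
BATCH_SIZE: int = 64
MAX_LENGTH: int = 512
PORT: int = 6007
MAX_DOCS: int = 1000          # maximum number of docs accepted per request


def resolve_device(device: Optional[str], cuda_available: bool) -> str:
    """Hypothetical helper: pick a device when DEVICE is None (auto)."""
    if device is not None:
        return device
    return "cuda" if cuda_available else "cpu"


def effective_fp16(use_fp16: bool, device: str) -> bool:
    """Hypothetical helper: apply fp16 only when running on a GPU."""
    return use_fp16 and device == "cuda"
```

In a real service, `cuda_available` would typically come from `torch.cuda.is_available()`.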

Run the Service

uvicorn reranker.server:app --host 0.0.0.0 --port 6007

API

Health

GET /health

Rerank

POST /rerank
Content-Type: application/json

{
  "query": "wireless mouse",
  "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
}

Response:

{
  "scores": [0.93, 0.02, 0.88],
  "meta": {
    "input_docs": 3,
    "usable_docs": 3,
    "unique_docs": 3,
    "dedup_ratio": 0.0,
    "elapsed_ms": 12.4,
    "model": "BAAI/bge-reranker-v2-m3",
    "device": "cuda",
    "fp16": true,
    "batch_size": 64,
    "max_length": 512,
    "normalize": true,
    "service_elapsed_ms": 13.1
  }
}
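
A minimal client sketch for the endpoint above, using only the standard library. The base URL `http://localhost:6007`, the 30-second timeout, and the helper names `rerank` and `rank_docs` are assumptions for illustration. Because scores come back in input order, the client pairs them with the docs by position before sorting.

```python
import json
import urllib.request
from typing import List, Tuple


def rerank(query: str, docs: List[str],
           base_url: str = "http://localhost:6007") -> dict:
    """POST query and docs to /rerank and return the decoded JSON response."""
    body = json.dumps({"query": query, "docs": docs}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def rank_docs(docs: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Pair each doc with its score by position and sort, highest score first."""
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
```

Typical usage would be `rank_docs(docs, rerank(query, docs)["scores"])` against a running service.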

Logging

The service uses standard Python logging. For structured logs and full output, run uvicorn with:

uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info

Notes

No caching is used by design.
Inputs are deduplicated by exact string match.
Empty or null docs are skipped and scored as 0.
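
The handling of empty and duplicate inputs can be sketched as follows. The helper names `prepare_docs` and `expand_scores` are hypothetical, only `None` and the empty string are treated as unusable here, and the `dedup_ratio` formula (1 minus unique over usable) is an assumption that the README does not confirm.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def prepare_docs(
    docs: Sequence[Optional[str]],
) -> Tuple[List[str], Dict[str, float]]:
    """Return the unique usable docs plus count statistics for the response."""
    # Skip None and empty strings; they are never sent to the model.
    usable = [d for d in docs if d]
    # Exact-string deduplication, keeping first-seen order.
    unique = list(dict.fromkeys(usable))
    # Assumed definition: fraction of usable docs removed as duplicates.
    ratio = 1.0 - len(unique) / len(usable) if usable else 0.0
    meta = {
        "input_docs": len(docs),
        "usable_docs": len(usable),
        "unique_docs": len(unique),
        "dedup_ratio": ratio,
    }
    return unique, meta


def expand_scores(
    docs: Sequence[Optional[str]],
    unique: List[str],
    unique_scores: List[float],
) -> List[float]:
    """Map scores for unique docs back to input order; skipped docs get 0."""
    score_of = dict(zip(unique, unique_scores))
    return [score_of.get(d, 0.0) if d else 0.0 for d in docs]
```

For the input `["a", None, "", "a", "b"]`, only `"a"` and `"b"` would be scored, and the two skipped entries receive 0 in the output.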
