  # Reranker Module
  
  **Request examples:** see `docs/QUICKSTART.md` §3.5.
  
  ---
  
  A minimal, production-ready reranker service based on **BAAI/bge-reranker-v2-m3**.
  
  ## Features
  - FP16 on GPU
  - Length-based sorting to reduce padding waste
  - Deduplication to avoid redundant inference
  - Scores returned in original input order
  - Simple FastAPI service
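
The length-based sorting and original-order guarantees listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `reranker/bge_reranker.py`: `score_in_original_order` and `fake_score_batch` are hypothetical names, and the stand-in scorer replaces the real model call.

```python
from typing import Callable, List, Sequence


def score_in_original_order(
    docs: Sequence[str],
    score_batch: Callable[[List[str]], List[float]],
    batch_size: int = 64,
) -> List[float]:
    """Score docs in length-sorted batches, then restore the input order."""
    # Sort indices by text length so each batch holds similar-length docs,
    # which keeps padding inside a batch small.
    order = sorted(range(len(docs)), key=lambda i: len(docs[i]))
    scores = [0.0] * len(docs)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch_scores = score_batch([docs[i] for i in idx])
        # Write each score back into the slot of its original position.
        for i, s in zip(idx, batch_scores):
            scores[i] = s
    return scores


# Hypothetical stand-in scorer; the real service would run the model here.
def fake_score_batch(batch: List[str]) -> List[float]:
    return [float(len(text)) for text in batch]


docs = ["wireless mouse bluetooth", "usb cable", "logitech mx master"]
result = score_in_original_order(docs, fake_score_batch, batch_size=2)
```

Character length is used here as a cheap proxy for token length; a real implementation might sort by tokenized length instead.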
  
  ## Files
  - `reranker/bge_reranker.py`: core model loading + scoring logic
  - `reranker/server.py`: FastAPI service with `/health` and `/rerank`
  - `reranker/config.py`: simple configuration
  
  ## Requirements
  Python dependencies (already listed in the project requirements):
  - `torch`
  - `modelscope`
  - `fastapi`
  - `uvicorn`
  
  ## Configuration
  Edit `reranker/config.py`:
  - `MODEL_NAME`: default `BAAI/bge-reranker-v2-m3`
  - `DEVICE`: `None` (auto), `cuda`, or `cpu`
  - `USE_FP16`: enable fp16 on GPU
  - `BATCH_SIZE`: default 64
  - `MAX_LENGTH`: default 512
  - `PORT`: default 6007
  - `MAX_DOCS`: request limit (default 1000)
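
For orientation, a config module with the settings above might look like the sketch below. The layout and type annotations are assumptions, and the `USE_FP16` default is a guess (the README does not state it); check the actual `reranker/config.py` before relying on any of it.

```python
# Illustrative sketch of reranker/config.py; the real file may differ.
from typing import Optional

MODEL_NAME: str = "BAAI/bge-reranker-v2-m3"
DEVICE: Optional[str] = None  # None = auto-detect, or "cuda" / "cpu"
USE_FP16: bool = True         # assumed default; only takes effect on GPU
BATCH_SIZE: int = 64
MAX_LENGTH: int = 512
PORT: int = 6007
MAX_DOCS: int = 1000          # maximum number of docs per request
```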
  
  ## Run the Service
  ```bash
  uvicorn reranker.server:app --host 0.0.0.0 --port 6007
  ```
  
  ## API
  ### Health
  ```
  GET /health
  ```
  
  ### Rerank
  ```
  POST /rerank
  Content-Type: application/json
  
  {
    "query": "wireless mouse",
    "docs": ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
  }
  ```
  
  Response:
  ```json
  {
    "scores": [0.93, 0.02, 0.88],
    "meta": {
      "input_docs": 3,
      "usable_docs": 3,
      "unique_docs": 3,
      "dedup_ratio": 0.0,
      "elapsed_ms": 12.4,
      "model": "BAAI/bge-reranker-v2-m3",
      "device": "cuda",
      "fp16": true,
      "batch_size": 64,
      "max_length": 512,
      "normalize": true,
      "service_elapsed_ms": 13.1
    }
  }
  ```
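
A client for this endpoint could look like the following sketch, which uses only the Python standard library. The helper names `rerank` and `rank_docs` are hypothetical, and `http://localhost:6007` assumes the service runs locally on the default port. Since `scores` follows the input order, pairing each score with its doc is a plain `zip`.

```python
import json
import urllib.request
from typing import List, Tuple


def rerank(query: str, docs: List[str],
           base_url: str = "http://localhost:6007") -> List[float]:
    """POST to /rerank and return one score per input doc, in input order."""
    payload = json.dumps({"query": query, "docs": docs}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/rerank",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["scores"]


def rank_docs(docs: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Pair docs with their scores and sort by score, best first."""
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


# Using the scores from the sample response above (no network call needed).
docs = ["logitech mx master", "usb cable", "wireless mouse bluetooth"]
ranked = rank_docs(docs, [0.93, 0.02, 0.88])
```

With a running service, `rank_docs(docs, rerank("wireless mouse", docs))` would produce the same kind of ranked list from live scores.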
  
  ## Logging
  The service uses standard Python logging. To set uvicorn's log verbosity
  explicitly, pass `--log-level`:
  ```bash
  uvicorn reranker.server:app --host 0.0.0.0 --port 6007 --log-level info
  ```
  
  ## Notes
  - No caching is used by design.
  - Inputs are deduplicated by exact string match.
  - Empty or null docs are skipped and scored as 0.
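
The deduplication and empty-doc rules in these notes can be sketched as follows. This is an illustration under assumptions, not the service's actual code: `score_with_dedup` is a hypothetical name, "empty" is taken to mean `None` or `""`, and `dedup_ratio` is assumed to be `1 - unique_docs / usable_docs`, which matches the sample response but is not stated in this README.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def score_with_dedup(
    docs: Sequence[Optional[str]],
    score_unique: Callable[[List[str]], List[float]],
) -> Tuple[List[float], Dict[str, float]]:
    """Score each distinct non-empty doc once and fan scores back out."""
    scores = [0.0] * len(docs)  # skipped docs keep the default score of 0
    positions: Dict[str, List[int]] = {}
    for i, doc in enumerate(docs):
        if not doc:  # None or "" is skipped (assumed meaning of "empty")
            continue
        # Exact string match: identical texts share one dictionary key.
        positions.setdefault(doc, []).append(i)

    unique = list(positions)
    if unique:
        for doc, s in zip(unique, score_unique(unique)):
            for i in positions[doc]:
                scores[i] = s

    usable = sum(len(p) for p in positions.values())
    meta = {
        "input_docs": len(docs),
        "usable_docs": usable,
        "unique_docs": len(unique),
        # Assumed definition: fraction of usable docs removed as duplicates.
        "dedup_ratio": 1 - len(unique) / usable if usable else 0.0,
    }
    return scores, meta


# Hypothetical stand-in scorer that records what it was asked to score.
calls: List[List[str]] = []


def fake_score_unique(batch: List[str]) -> List[float]:
    calls.append(list(batch))
    return [float(len(text)) for text in batch]


scores, meta = score_with_dedup(
    ["usb cable", "", "usb cable", None, "wireless mouse"], fake_score_unique
)
```

Here the scorer sees only the two distinct non-empty docs, while the returned `scores` list still has five entries in input order.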