Commit 971a085177e6b04b92dc5ca6fcad888121774e06

Authored by tangwang
1 parent 93be98cb

Add the reranker-jina backend; explore the advantages of listwise reranking

config/config.yaml
... ... @@ -238,7 +238,7 @@ services:
238 238 translation:
239 239 service_url: "http://127.0.0.1:6006"
240 240 # default_model: "nllb-200-distilled-600m"
241   - default_model: "deepl"
  241 + default_model: "nllb-200-distilled-600m"
242 242 default_scene: "general"
243 243 timeout_sec: 10.0
244 244 cache:
... ... @@ -382,7 +382,7 @@ services:
382 382 max_docs: 1000
383 383 normalize: true
384 384 # In-service backend (read when the reranker process starts)
385   - backend: "qwen3_vllm_score" # bge | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank
  385 + backend: "qwen3_vllm_score" # bge | jina_reranker_v3 | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank
386 386 backends:
387 387 bge:
388 388 model_name: "BAAI/bge-reranker-v2-m3"
... ... @@ -392,6 +392,13 @@ services:
392 392 max_length: 160
393 393 cache_dir: "./model_cache"
394 394 enable_warmup: true
  395 + jina_reranker_v3:
  396 + model_name: "jinaai/jina-reranker-v3"
  397 + device: null
  398 + dtype: "auto"
  399 + batch_size: 64
  400 + cache_dir: "./model_cache"
  401 + trust_remote_code: true
395 402 qwen3_vllm:
396 403 model_name: "Qwen/Qwen3-Reranker-0.6B"
397 404 engine: "vllm"
... ...
requirements_reranker_jina_reranker_v3.txt 0 → 100644
... ... @@ -0,0 +1,5 @@
  1 +# Isolated dependencies for jina_reranker_v3 reranker backend.
  2 +
  3 +-r requirements_reranker_base.txt
  4 +torch>=2.0.0
  5 +transformers>=4.51.0
... ...
reranker/README.md
... ... @@ -4,7 +4,7 @@
4 4  
5 5 ---
6 6  
7   -The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Qwen3-vLLM, Qwen3-Transformers, Qwen3-GGUF, DashScope cloud rerank). Callers access it over HTTP and do not care which backend is behind it.
  7 +The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Jina Reranker v3, Qwen3-vLLM, Qwen3-Transformers, Qwen3-GGUF, DashScope cloud rerank). Callers access it over HTTP and do not care which backend is behind it.
8 8  
9 9 ## Current conclusions
10 10  
... ... @@ -26,6 +26,7 @@ The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Qwe
26 26 |------|----------|------|
27 27 | `qwen3_vllm_score` | Primary recommendation | Uses vLLM's **`LLM.score()`** **pooling / classify** path: each (query, doc) pair **directly yields a relevance score**, without going through the causal LM's full **generate** step. Compared with **`qwen3_vllm`** (`generate(max_tokens=1)` + deriving the score from **yes/no** logprobs), it **skips** the usual per-pair overhead of the **large-vocabulary softmax / sampling constraints** layer, and its semantics line up better with cross-encoder-style rerank; best latency on the current stack and T4 (see the sketch below) |
28 28 | `qwen3_vllm` | Secondary recommendation | Stable, mature, easy to troubleshoot; a good fallback and control group |
  29 +| `jina_reranker_v3` | New local option | Uses `AutoModel(..., trust_remote_code=True)` + `model.rerank(query, docs)` as officially recommended; closer to Jina's native listwise rerank usage |
29 30 | `qwen3_transformers` | Compatibility option | |
30 31 | `qwen3_transformers_packed` | Scenario-specific option | The implementation may still have issues; not tuned yet |
31 32 | `qwen3_gguf` / `qwen3_gguf_06b` | Low-VRAM / functional fallback | Better suited to resource-constrained scenarios; not a fit as the current primary online option |
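
To make the `qwen3_vllm_score` / `qwen3_vllm` contrast above concrete, here is a minimal sketch of the `LLM.score()` path, assuming a vLLM build whose `task="score"` mode supports this model; the repo's actual backend lives in `backends/qwen3_vllm_score.py` and may add model-specific setup:

```python
# Illustrative sketch only -- not the repo's backend code.
from vllm import LLM

# task="score" selects the pooling/classify path instead of causal generation.
llm = LLM(model="Qwen/Qwen3-Reranker-0.6B", task="score")

# One relevance score per (query, doc) pair: no generate() step, hence no
# per-pair large-vocabulary softmax or sampling-constraint overhead.
outputs = llm.score(
    "what is a panda?",
    ["The giant panda is a bear native to China.", "Paris is the capital of France."],
)
scores = [o.outputs.score for o in outputs]
```
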
... ... @@ -36,6 +37,7 @@ The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Qwe
36 37 - `reranker/server.py`: FastAPI service; loads one backend at startup according to config
37 38 - `reranker/backends/`: backend implementations and factory
38 39 - `backends/__init__.py`: `get_rerank_backend(name, config)`
  40 + - `backends/jina_reranker_v3.py`: the official Jina `model.rerank(...)` integration
39 41 - `backends/qwen3_vllm_score.py`: currently the best local GPU reranker
40 42 - `backends/qwen3_vllm.py`: the runner-up local GPU reranker
41 43 - `backends/qwen3_transformers.py`: Transformers baseline implementation
... ... @@ -64,6 +66,7 @@ The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Qwe
64 66  
65 67 - `qwen3_vllm` -> `.venv-reranker`
66 68 - `qwen3_vllm_score` -> `.venv-reranker-score`
  69 +- `jina_reranker_v3` -> `.venv-reranker-jina`
67 70 - `qwen3_transformers` -> `.venv-reranker-transformers`
68 71 - `qwen3_transformers_packed` -> `.venv-reranker-transformers-packed`
69 72 - `qwen3_gguf` -> `.venv-reranker-gguf`
... ... @@ -91,6 +94,12 @@ The Reranker service provides a unified `/rerank` API with pluggable backends (BGE, Qwe
91 94 ./scripts/setup_reranker_venv.sh qwen3_vllm
92 95 ```
93 96  
  97 +`jina_reranker_v3`:
  98 +
  99 +```bash
  100 +./scripts/setup_reranker_venv.sh jina_reranker_v3
  101 +```
  102 +
94 103 ### 2. Basic checks
95 104  
96 105 ```bash
... ... @@ -112,6 +121,43 @@ nvidia-smi
112 121 PYTHONPATH=. ./.venv-reranker-score/bin/python scripts/smoke_qwen3_vllm_score_backend.py --gpu-memory-utilization 0.2
113 122 ```
114 123  
  124 +## `jina_reranker_v3`
  125 +
  126 +This backend is integrated following the official Jina model card, using:
  127 +
  128 +```python
  129 +from transformers import AutoModel
  130 +
  131 +model = AutoModel.from_pretrained(
  132 + "jinaai/jina-reranker-v3",
  133 + dtype="auto",
  134 + trust_remote_code=True,
  135 +)
  136 +results = model.rerank(query, documents)
  137 +```
  138 +
  139 +The in-service implementation adds a few pieces of engineering work:
  140 +
  141 +- Adapts to the unified `/rerank` protocol, returning `scores` aligned with the input docs
  142 +- Preprocesses empty and duplicate documents to avoid redundant inference (see the sketch below)
  143 +- Supports a `top_n` hint while keeping the output in the original input order
  144 +- Keeps the `cache_dir` / `device` / `dtype` / `batch_size` config options
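
A minimal standalone sketch of that dedup-and-realign step (hypothetical helper names; the real logic is in `reranker/backends/jina_reranker_v3.py` below):

```python
from typing import Callable, Dict, List

def dedup_rerank(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, List[str]], List[float]],
) -> List[float]:
    """Score each unique non-empty doc once, then realign to the input order."""
    uniq: List[str] = []
    index_of: Dict[str, int] = {}
    for d in docs:
        t = (d or "").strip()
        if t and t not in index_of:
            index_of[t] = len(uniq)
            uniq.append(t)
    uniq_scores = score_fn(query, uniq) if uniq else []
    # Empty docs keep 0.0; duplicates reuse the score of their first occurrence.
    return [uniq_scores[index_of[(d or "").strip()]] if (d or "").strip() else 0.0 for d in docs]
```

For example, `dedup_rerank("q", ["a", "", "a"], scorer)` runs `scorer` on `["a"]` once and reuses that score at positions 0 and 2, with `0.0` for the empty doc in the middle.
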
  145 +
  146 +Recommended config:
  147 +
  148 +```yaml
  149 +services:
  150 + rerank:
  151 + backends:
  152 + jina_reranker_v3:
  153 + model_name: "jinaai/jina-reranker-v3"
  154 + device: null
  155 + dtype: "auto"
  156 + batch_size: 64
  157 + cache_dir: "./model_cache"
  158 + trust_remote_code: true
  159 +```
  160 +
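
With the service running on this backend, clients still hit the same `/rerank` protocol as every other backend. A hedged example (host and port are assumptions; use your deployment's address):

```python
# Hypothetical client call; the request/response shape follows the protocol
# documented in reranker/server.py, but the address here is assumed.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/rerank",  # assumed host:port
    json={
        "query": "what is a panda?",
        "docs": ["The giant panda is a bear.", "Paris is the capital of France."],
        "normalize": True,
    },
    timeout=30,
)
payload = resp.json()
print(payload["scores"])           # aligned with the input docs
print(payload["meta"]["backend"])  # "jina_reranker_v3"
```
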
115 161 ## Current best option: `qwen3_vllm_score`
116 162  
117 163  
... ... @@ -238,4 +284,4 @@ ll tests/reranker_performance/
238 284 curl1.sh
239 285 curl1_simple.sh
240 286 rerank_performance_compare.sh
241   -```
242 287 \ No newline at end of file
  288 +```
... ...
reranker/backends/__init__.py
... ... @@ -40,6 +40,9 @@ def get_rerank_backend(name: str, config: Dict[str, Any]) -> RerankBackendProtoc
40 40 if name == "bge":
41 41 from reranker.backends.bge import BGERerankerBackend
42 42 return BGERerankerBackend(config)
  43 + if name == "jina_reranker_v3":
  44 + from reranker.backends.jina_reranker_v3 import JinaRerankerV3Backend
  45 + return JinaRerankerV3Backend(config)
43 46 if name == "qwen3_vllm":
44 47 from reranker.backends.qwen3_vllm import Qwen3VLLMRerankerBackend
45 48 return Qwen3VLLMRerankerBackend(config)
... ... @@ -68,7 +71,7 @@ def get_rerank_backend(name: str, config: Dict[str, Any]) -> RerankBackendProtoc
68 71 from reranker.backends.dashscope_rerank import DashScopeRerankBackend
69 72 return DashScopeRerankBackend(config)
70 73 raise ValueError(
71   - f"Unknown rerank backend: {name!r}. Supported: bge, qwen3_vllm, qwen3_vllm_score, qwen3_transformers, qwen3_transformers_packed, qwen3_gguf, qwen3_gguf_06b, dashscope_rerank"
  74 + f"Unknown rerank backend: {name!r}. Supported: bge, jina_reranker_v3, qwen3_vllm, qwen3_vllm_score, qwen3_transformers, qwen3_transformers_packed, qwen3_gguf, qwen3_gguf_06b, dashscope_rerank"
72 75 )
73 76  
74 77  
... ...
reranker/backends/jina_reranker_v3.py 0 → 100644
... ... @@ -0,0 +1,193 @@
  1 +"""
  2 +Jina reranker v3 backend using the model card's recommended AutoModel API.
  3 +
  4 +Reference: https://huggingface.co/jinaai/jina-reranker-v3
  5 +Requires: transformers, torch.
  6 +"""
  7 +
  8 +from __future__ import annotations
  9 +
  10 +import logging
  11 +import threading
  12 +import time
  13 +from typing import Any, Dict, List, Tuple
  14 +
  15 +import torch
  16 +from transformers import AutoModel
  17 +
  18 +logger = logging.getLogger("reranker.backends.jina_reranker_v3")
  19 +
  20 +
  21 +class JinaRerankerV3Backend:
  22 + """
  23 + jina-reranker-v3 backend using `AutoModel(..., trust_remote_code=True)`.
  24 +
  25 + The official model card recommends calling:
  26 + model = AutoModel.from_pretrained(..., trust_remote_code=True)
  27 + model.rerank(query, documents, top_n=...)
  28 +
  29 + Config from services.rerank.backends.jina_reranker_v3.
  30 + """
  31 +
  32 + def __init__(self, config: Dict[str, Any]) -> None:
  33 + self._config = config or {}
  34 + self._model_name = str(
  35 + self._config.get("model_name") or "jinaai/jina-reranker-v3"
  36 + )
  37 + self._cache_dir = self._config.get("cache_dir") or "./model_cache"
  38 + self._dtype = str(self._config.get("dtype") or "auto")
  39 + self._device = self._config.get("device")
  40 + self._batch_size = max(1, int(self._config.get("batch_size", 64)))
  41 + self._return_embeddings = bool(self._config.get("return_embeddings", False))
  42 + self._trust_remote_code = bool(self._config.get("trust_remote_code", True))
  43 + self._lock = threading.Lock()
  44 +
  45 + logger.info(
  46 + "[Jina_Reranker_V3] Loading model %s (dtype=%s, device=%s, batch=%s)",
  47 + self._model_name,
  48 + self._dtype,
  49 + self._device,
  50 + self._batch_size,
  51 + )
  52 +
  53 + load_kwargs: Dict[str, Any] = {
  54 + "trust_remote_code": self._trust_remote_code,
  55 + "cache_dir": self._cache_dir,
  56 + "dtype": self._dtype,
  57 + }
  58 + self._model = AutoModel.from_pretrained(self._model_name, **load_kwargs)
  59 + self._model.eval()
  60 +
  61 + if self._device is not None:
  62 + self._model = self._model.to(self._device)
  63 + elif torch.cuda.is_available():
  64 + self._device = "cuda"
  65 + self._model = self._model.to(self._device)
  66 + else:
  67 + self._device = "cpu"
  68 +
  69 + logger.info(
  70 + "[Jina_Reranker_V3] Model ready | model=%s device=%s",
  71 + self._model_name,
  72 + self._device,
  73 + )
  74 +
  75 + def score_with_meta(
  76 + self,
  77 + query: str,
  78 + docs: List[str],
  79 + normalize: bool = True,
  80 + ) -> Tuple[List[float], Dict[str, Any]]:
  81 + return self.score_with_meta_topn(query, docs, normalize=normalize, top_n=None)
  82 +
  83 + def score_with_meta_topn(
  84 + self,
  85 + query: str,
  86 + docs: List[str],
  87 + normalize: bool = True,
  88 + top_n: int | None = None,
  89 + ) -> Tuple[List[float], Dict[str, Any]]:
  90 + start_ts = time.time()
  91 + total_docs = len(docs) if docs else 0
  92 + output_scores: List[float] = [0.0] * total_docs
  93 +
  94 + query = "" if query is None else str(query).strip()
  95 + indexed: List[Tuple[int, str]] = []
  96 + for i, doc in enumerate(docs or []):
  97 + if doc is None:
  98 + continue
  99 + text = str(doc).strip()
  100 + if not text:
  101 + continue
  102 + indexed.append((i, text))
  103 +
  104 + if not query or not indexed:
  105 + elapsed_ms = (time.time() - start_ts) * 1000.0
  106 + return output_scores, {
  107 + "input_docs": total_docs,
  108 + "usable_docs": len(indexed),
  109 + "unique_docs": 0,
  110 + "dedup_ratio": 0.0,
  111 + "elapsed_ms": round(elapsed_ms, 3),
  112 + "model": self._model_name,
  113 + "backend": "jina_reranker_v3",
  114 + "normalize": normalize,
  115 + "normalize_note": "jina_reranker_v3 returns model relevance scores directly",
  116 + }
  117 +
  118 + unique_texts: List[str] = []
  119 + unique_first_indices: List[int] = []
  120 + text_to_unique_idx: Dict[str, int] = {}
  121 + for orig_idx, text in indexed:
  122 + unique_idx = text_to_unique_idx.get(text)
  123 + if unique_idx is None:
  124 + unique_idx = len(unique_texts)
  125 + text_to_unique_idx[text] = unique_idx
  126 + unique_texts.append(text)
  127 + unique_first_indices.append(orig_idx)
  128 +
  129 + effective_top_n = min(top_n, len(unique_texts)) if top_n is not None else None
  130 +
  131 + unique_scores = self._rerank_unique(
  132 + query=query,
  133 + docs=unique_texts,
  134 + top_n=effective_top_n,
  135 + )
  136 +
  137 + for orig_idx, text in indexed:
  138 + unique_idx = text_to_unique_idx[text]
  139 + output_scores[orig_idx] = float(unique_scores[unique_idx])
  140 +
  141 + elapsed_ms = (time.time() - start_ts) * 1000.0
  142 + dedup_ratio = 1.0 - (len(unique_texts) / float(len(indexed))) if indexed else 0.0
  143 + meta = {
  144 + "input_docs": total_docs,
  145 + "usable_docs": len(indexed),
  146 + "unique_docs": len(unique_texts),
  147 + "dedup_ratio": round(dedup_ratio, 4),
  148 + "elapsed_ms": round(elapsed_ms, 3),
  149 + "model": self._model_name,
  150 + "backend": "jina_reranker_v3",
  151 + "device": self._device,
  152 + "dtype": self._dtype,
  153 + "batch_size": self._batch_size,
  154 + "normalize": normalize,
  155 + "normalize_note": "jina_reranker_v3 returns model relevance scores directly",
  156 + }
  157 + if effective_top_n is not None:
  158 + meta["top_n"] = effective_top_n
  159 + if len(unique_texts) > self._batch_size:
  160 + meta["top_n_note"] = (
  161 + "Applied as a request hint only; full scores were computed because "
  162 + "global top_n across multiple local batches would be lossy."
  163 + )
  164 + return output_scores, meta
  165 +
  166 + def _rerank_unique(
  167 + self,
  168 + query: str,
  169 + docs: List[str],
  170 + top_n: int | None,
  171 + ) -> List[float]:
  172 + if not docs:
  173 + return []
  174 +
  175 + unique_scores: List[float] = [0.0] * len(docs)
  176 +
  177 + with self._lock:
  178 + for start in range(0, len(docs), self._batch_size):
  179 + batch_docs = docs[start : start + self._batch_size]
  180 + batch_top_n = None
  181 + if top_n is not None and len(docs) <= self._batch_size:
  182 + batch_top_n = min(top_n, len(batch_docs))
  183 + results = self._model.rerank(
  184 + query,
  185 + batch_docs,
  186 + top_n=batch_top_n,
  187 + return_embeddings=self._return_embeddings,
  188 + )
  189 + for item in results:
  190 + batch_index = int(item["index"])
  191 + unique_scores[start + batch_index] = float(item["relevance_score"])
  192 +
  193 + return unique_scores
... ...
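For a non-HTTP smoke test, the backend can also be exercised directly through the factory added in `backends/__init__.py`; a sketch using only the config keys and methods shown in this commit:

```python
# Sketch: drive the new backend directly via the factory from this commit.
from reranker.backends import get_rerank_backend

backend = get_rerank_backend(
    "jina_reranker_v3",
    {
        "model_name": "jinaai/jina-reranker-v3",
        "batch_size": 64,
        "cache_dir": "./model_cache",
        "trust_remote_code": True,
    },
)
scores, meta = backend.score_with_meta(
    "what is a panda?",
    ["The giant panda is a bear.", "", "The giant panda is a bear."],
)
# Duplicates share one inference and the empty doc scores 0.0, per the
# dedup behavior implemented in score_with_meta_topn above.
print(scores, meta["unique_docs"], meta["dedup_ratio"])
```
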
reranker/server.py
... ... @@ -7,7 +7,7 @@ Request: { "query": "...", "docs": ["doc1", "doc2", ...], "normalize": optional
7 7 Response: { "scores": [float], "meta": {...} }
8 8  
9 9 Backend selected via config: services.rerank.backend
10   -(bge | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank), env RERANK_BACKEND.
  10 +(bge | jina_reranker_v3 | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank), env RERANK_BACKEND.
11 11 """
12 12  
13 13 import logging
... ...
scripts/lib/reranker_backend_env.sh
... ... @@ -40,6 +40,7 @@ reranker_backend_venv_dir() {
40 40 case "${backend}" in
41 41 qwen3_vllm) printf '%s/.venv-reranker\n' "${project_root}" ;;
42 42 qwen3_vllm_score) printf '%s/.venv-reranker-score\n' "${project_root}" ;;
  43 + jina_reranker_v3) printf '%s/.venv-reranker-jina\n' "${project_root}" ;;
43 44 qwen3_gguf) printf '%s/.venv-reranker-gguf\n' "${project_root}" ;;
44 45 qwen3_gguf_06b) printf '%s/.venv-reranker-gguf-06b\n' "${project_root}" ;;
45 46 qwen3_transformers) printf '%s/.venv-reranker-transformers\n' "${project_root}" ;;
... ... @@ -57,6 +58,7 @@ reranker_backend_requirements_file() {
57 58 case "${backend}" in
58 59 qwen3_vllm) printf '%s/requirements_reranker_qwen3_vllm.txt\n' "${project_root}" ;;
59 60 qwen3_vllm_score) printf '%s/requirements_reranker_qwen3_vllm_score.txt\n' "${project_root}" ;;
  61 + jina_reranker_v3) printf '%s/requirements_reranker_jina_reranker_v3.txt\n' "${project_root}" ;;
60 62 qwen3_gguf) printf '%s/requirements_reranker_qwen3_gguf.txt\n' "${project_root}" ;;
61 63 qwen3_gguf_06b) printf '%s/requirements_reranker_qwen3_gguf_06b.txt\n' "${project_root}" ;;
62 64 qwen3_transformers) printf '%s/requirements_reranker_qwen3_transformers.txt\n' "${project_root}" ;;
... ...