Commit 971a085177e6b04b92dc5ca6fcad888121774e06
1 parent 93be98cb
Add reranker-jina; explore the advantages of listwise reranking
Showing 7 changed files with 262 additions and 6 deletions
config/config.yaml
| ... | ... | @@ -238,7 +238,7 @@ services: |
| 238 | 238 | translation: |
| 239 | 239 | service_url: "http://127.0.0.1:6006" |
| 240 | 240 | # default_model: "nllb-200-distilled-600m" |
| 241 | - default_model: "deepl" | |
| 241 | + default_model: "nllb-200-distilled-600m" | |
| 242 | 242 | default_scene: "general" |
| 243 | 243 | timeout_sec: 10.0 |
| 244 | 244 | cache: |
| ... | ... | @@ -382,7 +382,7 @@ services: |
| 382 | 382 | max_docs: 1000 |
| 383 | 383 | normalize: true |
| 384 | 384 | # In-service backend (read when the reranker process starts) |
| 385 | - backend: "qwen3_vllm_score" # bge | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank | |
| 385 | + backend: "qwen3_vllm_score" # bge | jina_reranker_v3 | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank | |
| 386 | 386 | backends: |
| 387 | 387 | bge: |
| 388 | 388 | model_name: "BAAI/bge-reranker-v2-m3" |
| ... | ... | @@ -392,6 +392,13 @@ services: |
| 392 | 392 | max_length: 160 |
| 393 | 393 | cache_dir: "./model_cache" |
| 394 | 394 | enable_warmup: true |
| 395 | + jina_reranker_v3: | |
| 396 | + model_name: "jinaai/jina-reranker-v3" | |
| 397 | + device: null | |
| 398 | + dtype: "auto" | |
| 399 | + batch_size: 64 | |
| 400 | + cache_dir: "./model_cache" | |
| 401 | + trust_remote_code: true | |
| 395 | 402 | qwen3_vllm: |
| 396 | 403 | model_name: "Qwen/Qwen3-Reranker-0.6B" |
| 397 | 404 | engine: "vllm" |
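As a reviewer's note, a minimal sketch (illustrative loader, not project code) of how the `services.rerank` section above feeds the backend factory; the key nesting follows this diff and the new backend's docstring (`services.rerank.backends.jina_reranker_v3`):

```python
# Illustrative only: load config.yaml and resolve the configured rerank backend.
import yaml  # assumes PyYAML is available

from reranker.backends import get_rerank_backend

with open("config/config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

rerank_cfg = cfg["services"]["rerank"]
name = rerank_cfg["backend"]  # e.g. "qwen3_vllm_score" or "jina_reranker_v3"
# Each backend receives only its own sub-config, e.g. backends.jina_reranker_v3.
backend = get_rerank_backend(name, rerank_cfg["backends"].get(name, {}))
```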
reranker/README.md
| ... | ... | @@ -4,7 +4,7 @@ |
| 4 | 4 | |
| 5 | 5 | --- |
| 6 | 6 | |
| 7 | -The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Qwen3-vLLM, Qwen3-Transformers, Qwen3-GGUF, DashScope cloud rerank). Callers access it over HTTP and do not need to know which backend is in use. | |
| 7 | +The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Jina Reranker v3, Qwen3-vLLM, Qwen3-Transformers, Qwen3-GGUF, DashScope cloud rerank). Callers access it over HTTP and do not need to know which backend is in use. | |
| 8 | 8 | |
| 9 | 9 | ## Current conclusions |
| 10 | 10 | |
| ... | ... | @@ -26,6 +26,7 @@ The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Qwe |
| 26 | 26 | |------|----------|------| |
| 27 | 27 | | `qwen3_vllm_score` | Primary recommendation | Uses vLLM's **`LLM.score()`** **pooling / classify** path: it **directly produces a relevance score** for each (query, doc) pair, without a full causal-LM **generate** step. Compared with **`qwen3_vllm`** (`generate(max_tokens=1)` + deriving a score from **yes/no** logprobs), it **skips** the per-pair overhead of the **large-vocabulary softmax / sampling constraints**, and its semantics are closer to cross-encoder-style reranking; best latency on the current stack and T4 | |
| 28 | 28 | | `qwen3_vllm` | Secondary recommendation | Stable, mature, easy to troubleshoot; a good fallback and control group | |
| 29 | +| `jina_reranker_v3` | New local option | Uses the officially recommended `AutoModel(..., trust_remote_code=True)` + `model.rerank(query, docs)`, closer to Jina's native listwise rerank usage | | |
| 29 | 30 | | `qwen3_transformers` | Compatibility option | |
| 30 | 31 | | `qwen3_transformers_packed` | Special-case option | The implementation may still have issues; not yet tuned |
| 31 | 32 | | `qwen3_gguf` / `qwen3_gguf_06b` | Low-VRAM / functional fallback | Better suited to resource-constrained scenarios; not appropriate as the current primary online option |
| ... | ... | @@ -36,6 +37,7 @@ The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Qwe |
| 36 | 37 | - `reranker/server.py`: FastAPI service; loads one backend at startup according to config |
| 37 | 38 | - `reranker/backends/`: backend implementations and factory |
| 38 | 39 | - `backends/__init__.py`:`get_rerank_backend(name, config)` |
| 40 | + - `backends/jina_reranker_v3.py`: Jina's official `model.rerank(...)` integration | |
| 39 | 41 | - `backends/qwen3_vllm_score.py`: the current best local GPU reranker |
| 40 | 42 | - `backends/qwen3_vllm.py`: the second-best local GPU reranker |
| 41 | 43 | - `backends/qwen3_transformers.py`: Transformers baseline implementation |
| ... | ... | @@ -64,6 +66,7 @@ The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Qwe |
| 64 | 66 | |
| 65 | 67 | - `qwen3_vllm` -> `.venv-reranker` |
| 66 | 68 | - `qwen3_vllm_score` -> `.venv-reranker-score` |
| 69 | +- `jina_reranker_v3` -> `.venv-reranker-jina` | |
| 67 | 70 | - `qwen3_transformers` -> `.venv-reranker-transformers` |
| 68 | 71 | - `qwen3_transformers_packed` -> `.venv-reranker-transformers-packed` |
| 69 | 72 | - `qwen3_gguf` -> `.venv-reranker-gguf` |
| ... | ... | @@ -91,6 +94,12 @@ The Reranker service exposes a unified `/rerank` API with pluggable backends (BGE, Qwe |
| 91 | 94 | ./scripts/setup_reranker_venv.sh qwen3_vllm |
| 92 | 95 | ``` |
| 93 | 96 | |
| 97 | +`jina_reranker_v3`: | |
| 98 | + | |
| 99 | +```bash | |
| 100 | +./scripts/setup_reranker_venv.sh jina_reranker_v3 | |
| 101 | +``` | |
| 102 | + | |
| 94 | 103 | ### 2. Basic checks |
| 95 | 104 | |
| 96 | 105 | ```bash |
| ... | ... | @@ -112,6 +121,43 @@ nvidia-smi |
| 112 | 121 | PYTHONPATH=. ./.venv-reranker-score/bin/python scripts/smoke_qwen3_vllm_score_backend.py --gpu-memory-utilization 0.2 |
| 113 | 122 | ``` |
| 114 | 123 | |
| 124 | +## `jina_reranker_v3` | |
| 125 | + | |
| 126 | +This backend is integrated following the official Jina model card, using: | |
| 127 | + | |
| 128 | +```python | |
| 129 | +from transformers import AutoModel | |
| 130 | + | |
| 131 | +model = AutoModel.from_pretrained( | |
| 132 | + "jinaai/jina-reranker-v3", | |
| 133 | + dtype="auto", | |
| 134 | + trust_remote_code=True, | |
| 135 | +) | |
| 136 | +results = model.rerank(query, documents) | |
| 137 | +``` | |
| 138 | + | |
| 139 | +The in-service implementation adds a few pieces of engineering work: | |
| 140 | + | |
| 141 | +- Adapts the unified `/rerank` protocol, returning `scores` aligned with the input docs | |
| 142 | +- Pre-processes empty and duplicate documents to avoid redundant inference | |
| 143 | +- Supports a `top_n` hint while preserving the original input order in the output | |
| 144 | +- Keeps the `cache_dir` / `device` / `dtype` / `batch_size` config options | |
| 145 | + | |
| 147 | +Recommended configuration: | |
| 147 | + | |
| 148 | +```yaml | |
| 149 | +services: | |
| 150 | + rerank: | |
| 151 | + backends: | |
| 152 | + jina_reranker_v3: | |
| 153 | + model_name: "jinaai/jina-reranker-v3" | |
| 154 | + device: null | |
| 155 | + dtype: "auto" | |
| 156 | + batch_size: 64 | |
| 157 | + cache_dir: "./model_cache" | |
| 158 | + trust_remote_code: true | |
| 159 | +``` | |
| 160 | + | |
| 115 | 161 | ## Current best option: `qwen3_vllm_score` |
| 116 | 162 | |
| 117 | 163 | |
| ... | ... | @@ -238,4 +284,4 @@ ll tests/reranker_performance/ |
| 238 | 284 | curl1.sh |
| 239 | 285 | curl1_simple.sh |
| 240 | 286 | rerank_performance_compare.sh |
| 241 | -``` | |
| 242 | 287 | \ No newline at end of file |
| 288 | +``` |
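To make the README's `/rerank` contract concrete, a minimal client sketch; the host/port below are placeholders (this section does not pin the rerank service address), and the payload shape follows the server docstring ({"query", "docs", "normalize"} -> {"scores", "meta"}):

```python
# Minimal /rerank client sketch; the URL is a placeholder, adjust to your deployment.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/rerank",  # assumed address, not from the repo config
    json={
        "query": "what is vLLM?",
        "docs": ["vLLM is a fast LLM serving engine.", "Jina AI builds rerankers."],
        "normalize": True,
    },
    timeout=10,
)
resp.raise_for_status()
body = resp.json()
print(body["scores"], body["meta"].get("backend"))  # per-doc scores plus backend info
```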
reranker/backends/__init__.py
| ... | ... | @@ -40,6 +40,9 @@ def get_rerank_backend(name: str, config: Dict[str, Any]) -> RerankBackendProtoc |
| 40 | 40 | if name == "bge": |
| 41 | 41 | from reranker.backends.bge import BGERerankerBackend |
| 42 | 42 | return BGERerankerBackend(config) |
| 43 | + if name == "jina_reranker_v3": | |
| 44 | + from reranker.backends.jina_reranker_v3 import JinaRerankerV3Backend | |
| 45 | + return JinaRerankerV3Backend(config) | |
| 43 | 46 | if name == "qwen3_vllm": |
| 44 | 47 | from reranker.backends.qwen3_vllm import Qwen3VLLMRerankerBackend |
| 45 | 48 | return Qwen3VLLMRerankerBackend(config) |
| ... | ... | @@ -68,7 +71,7 @@ def get_rerank_backend(name: str, config: Dict[str, Any]) -> RerankBackendProtoc |
| 68 | 71 | from reranker.backends.dashscope_rerank import DashScopeRerankBackend |
| 69 | 72 | return DashScopeRerankBackend(config) |
| 70 | 73 | raise ValueError( |
| 71 | - f"Unknown rerank backend: {name!r}. Supported: bge, qwen3_vllm, qwen3_vllm_score, qwen3_transformers, qwen3_transformers_packed, qwen3_gguf, qwen3_gguf_06b, dashscope_rerank" | |
| 74 | + f"Unknown rerank backend: {name!r}. Supported: bge, jina_reranker_v3, qwen3_vllm, qwen3_vllm_score, qwen3_transformers, qwen3_transformers_packed, qwen3_gguf, qwen3_gguf_06b, dashscope_rerank" | |
| 72 | 75 | ) |
| 73 | 76 | |
| 74 | 77 | |
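A quick sanity-check sketch for the new factory branch (the query/doc strings are toy inputs; instantiating the backend downloads and loads the model):

```python
# Toy example: dispatch through the factory and score one (query, doc) pair.
from reranker.backends import get_rerank_backend

backend = get_rerank_backend(
    "jina_reranker_v3",
    {"model_name": "jinaai/jina-reranker-v3", "batch_size": 64},
)
scores, meta = backend.score_with_meta(
    "what is vLLM?",
    ["vLLM is a fast LLM inference and serving engine."],
)
print(scores, meta["backend"])  # one float score; backend == "jina_reranker_v3"
```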
reranker/backends/jina_reranker_v3.py
| ... | ... | @@ -0,0 +1,193 @@ |
| 1 | +""" | |
| 2 | +Jina reranker v3 backend using the model card's recommended AutoModel API. | |
| 3 | + | |
| 4 | +Reference: https://huggingface.co/jinaai/jina-reranker-v3 | |
| 5 | +Requires: transformers, torch. | |
| 6 | +""" | |
| 7 | + | |
| 8 | +from __future__ import annotations | |
| 9 | + | |
| 10 | +import logging | |
| 11 | +import threading | |
| 12 | +import time | |
| 13 | +from typing import Any, Dict, List, Tuple | |
| 14 | + | |
| 15 | +import torch | |
| 16 | +from transformers import AutoModel | |
| 17 | + | |
| 18 | +logger = logging.getLogger("reranker.backends.jina_reranker_v3") | |
| 19 | + | |
| 20 | + | |
| 21 | +class JinaRerankerV3Backend: | |
| 22 | + """ | |
| 23 | + jina-reranker-v3 backend using `AutoModel(..., trust_remote_code=True)`. | |
| 24 | + | |
| 25 | + The official model card recommends calling: | |
| 26 | + model = AutoModel.from_pretrained(..., trust_remote_code=True) | |
| 27 | + model.rerank(query, documents, top_n=...) | |
| 28 | + | |
| 29 | + Config from services.rerank.backends.jina_reranker_v3. | |
| 30 | + """ | |
| 31 | + | |
| 32 | + def __init__(self, config: Dict[str, Any]) -> None: | |
| 33 | + self._config = config or {} | |
| 34 | + self._model_name = str( | |
| 35 | + self._config.get("model_name") or "jinaai/jina-reranker-v3" | |
| 36 | + ) | |
| 37 | + self._cache_dir = self._config.get("cache_dir") or "./model_cache" | |
| 38 | + self._dtype = str(self._config.get("dtype") or "auto") | |
| 39 | + self._device = self._config.get("device") | |
| 40 | + self._batch_size = max(1, int(self._config.get("batch_size", 64))) | |
| 41 | + self._return_embeddings = bool(self._config.get("return_embeddings", False)) | |
| 42 | + self._trust_remote_code = bool(self._config.get("trust_remote_code", True)) | |
| 43 | + self._lock = threading.Lock() | |
| 44 | + | |
| 45 | + logger.info( | |
| 46 | + "[Jina_Reranker_V3] Loading model %s (dtype=%s, device=%s, batch=%s)", | |
| 47 | + self._model_name, | |
| 48 | + self._dtype, | |
| 49 | + self._device, | |
| 50 | + self._batch_size, | |
| 51 | + ) | |
| 52 | + | |
| 53 | + load_kwargs: Dict[str, Any] = { | |
| 54 | + "trust_remote_code": self._trust_remote_code, | |
| 55 | + "cache_dir": self._cache_dir, | |
| 56 | + "dtype": self._dtype, | |
| 57 | + } | |
| 58 | + self._model = AutoModel.from_pretrained(self._model_name, **load_kwargs) | |
| 59 | + self._model.eval() | |
| 60 | + | |
| 61 | + if self._device is not None: | |
| 62 | + self._model = self._model.to(self._device) | |
| 63 | + elif torch.cuda.is_available(): | |
| 64 | + self._device = "cuda" | |
| 65 | + self._model = self._model.to(self._device) | |
| 66 | + else: | |
| 67 | + self._device = "cpu" | |
| 68 | + | |
| 69 | + logger.info( | |
| 70 | + "[Jina_Reranker_V3] Model ready | model=%s device=%s", | |
| 71 | + self._model_name, | |
| 72 | + self._device, | |
| 73 | + ) | |
| 74 | + | |
| 75 | + def score_with_meta( | |
| 76 | + self, | |
| 77 | + query: str, | |
| 78 | + docs: List[str], | |
| 79 | + normalize: bool = True, | |
| 80 | + ) -> Tuple[List[float], Dict[str, Any]]: | |
| 81 | + return self.score_with_meta_topn(query, docs, normalize=normalize, top_n=None) | |
| 82 | + | |
| 83 | + def score_with_meta_topn( | |
| 84 | + self, | |
| 85 | + query: str, | |
| 86 | + docs: List[str], | |
| 87 | + normalize: bool = True, | |
| 88 | + top_n: int | None = None, | |
| 89 | + ) -> Tuple[List[float], Dict[str, Any]]: | |
| 90 | + start_ts = time.time() | |
| 91 | + total_docs = len(docs) if docs else 0 | |
| 92 | + output_scores: List[float] = [0.0] * total_docs | |
| 93 | + | |
| 94 | + query = "" if query is None else str(query).strip() | |
| 95 | + indexed: List[Tuple[int, str]] = [] | |
| 96 | + for i, doc in enumerate(docs or []): | |
| 97 | + if doc is None: | |
| 98 | + continue | |
| 99 | + text = str(doc).strip() | |
| 100 | + if not text: | |
| 101 | + continue | |
| 102 | + indexed.append((i, text)) | |
| 103 | + | |
| 104 | + if not query or not indexed: | |
| 105 | + elapsed_ms = (time.time() - start_ts) * 1000.0 | |
| 106 | + return output_scores, { | |
| 107 | + "input_docs": total_docs, | |
| 108 | + "usable_docs": len(indexed), | |
| 109 | + "unique_docs": 0, | |
| 110 | + "dedup_ratio": 0.0, | |
| 111 | + "elapsed_ms": round(elapsed_ms, 3), | |
| 112 | + "model": self._model_name, | |
| 113 | + "backend": "jina_reranker_v3", | |
| 114 | + "normalize": normalize, | |
| 115 | + "normalize_note": "jina_reranker_v3 returns model relevance scores directly", | |
| 116 | + } | |
| 117 | + | |
| 118 | + unique_texts: List[str] = [] | |
| 119 | + unique_first_indices: List[int] = [] | |
| 120 | + text_to_unique_idx: Dict[str, int] = {} | |
| 121 | + for orig_idx, text in indexed: | |
| 122 | + unique_idx = text_to_unique_idx.get(text) | |
| 123 | + if unique_idx is None: | |
| 124 | + unique_idx = len(unique_texts) | |
| 125 | + text_to_unique_idx[text] = unique_idx | |
| 126 | + unique_texts.append(text) | |
| 127 | + unique_first_indices.append(orig_idx) | |
| 128 | + | |
| 129 | + effective_top_n = min(top_n, len(unique_texts)) if top_n is not None else None | |
| 130 | + | |
| 131 | + unique_scores = self._rerank_unique( | |
| 132 | + query=query, | |
| 133 | + docs=unique_texts, | |
| 134 | + top_n=effective_top_n, | |
| 135 | + ) | |
| 136 | + | |
| 137 | + for orig_idx, text in indexed: | |
| 138 | + unique_idx = text_to_unique_idx[text] | |
| 139 | + output_scores[orig_idx] = float(unique_scores[unique_idx]) | |
| 140 | + | |
| 141 | + elapsed_ms = (time.time() - start_ts) * 1000.0 | |
| 142 | + dedup_ratio = 1.0 - (len(unique_texts) / float(len(indexed))) if indexed else 0.0 | |
| 143 | + meta = { | |
| 144 | + "input_docs": total_docs, | |
| 145 | + "usable_docs": len(indexed), | |
| 146 | + "unique_docs": len(unique_texts), | |
| 147 | + "dedup_ratio": round(dedup_ratio, 4), | |
| 148 | + "elapsed_ms": round(elapsed_ms, 3), | |
| 149 | + "model": self._model_name, | |
| 150 | + "backend": "jina_reranker_v3", | |
| 151 | + "device": self._device, | |
| 152 | + "dtype": self._dtype, | |
| 153 | + "batch_size": self._batch_size, | |
| 154 | + "normalize": normalize, | |
| 155 | + "normalize_note": "jina_reranker_v3 returns model relevance scores directly", | |
| 156 | + } | |
| 157 | + if effective_top_n is not None: | |
| 158 | + meta["top_n"] = effective_top_n | |
| 159 | + if len(unique_texts) > self._batch_size: | |
| 160 | + meta["top_n_note"] = ( | |
| 161 | + "Applied as a request hint only; full scores were computed because " | |
| 162 | + "global top_n across multiple local batches would be lossy." | |
| 163 | + ) | |
| 164 | + return output_scores, meta | |
| 165 | + | |
| 166 | + def _rerank_unique( | |
| 167 | + self, | |
| 168 | + query: str, | |
| 169 | + docs: List[str], | |
| 170 | + top_n: int | None, | |
| 171 | + ) -> List[float]: | |
| 172 | + if not docs: | |
| 173 | + return [] | |
| 174 | + | |
| 175 | + unique_scores: List[float] = [0.0] * len(docs) | |
| 176 | + | |
| 177 | + with self._lock: | |
| 178 | + for start in range(0, len(docs), self._batch_size): | |
| 179 | + batch_docs = docs[start : start + self._batch_size] | |
| 180 | + batch_top_n = None | |
| 181 | + if top_n is not None and len(docs) <= self._batch_size: | |
| 182 | + batch_top_n = min(top_n, len(batch_docs)) | |
| 183 | + results = self._model.rerank( | |
| 184 | + query, | |
| 185 | + batch_docs, | |
| 186 | + top_n=batch_top_n, | |
| 187 | + return_embeddings=self._return_embeddings, | |
| 188 | + ) | |
| 189 | + for item in results: | |
| 190 | + batch_index = int(item["index"]) | |
| 191 | + unique_scores[start + batch_index] = float(item["relevance_score"]) | |
| 192 | + | |
| 193 | + return unique_scores |
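An illustrative behavior check for the dedup/alignment logic in `score_with_meta_topn` (assumes the model loads successfully; the numbers in the comments follow directly from the code above):

```python
# Duplicates are scored once and re-expanded; empty docs keep the 0.0 default.
from reranker.backends.jina_reranker_v3 import JinaRerankerV3Backend

backend = JinaRerankerV3Backend({"model_name": "jinaai/jina-reranker-v3"})
docs = [
    "vLLM is a fast LLM serving engine.",
    "",                                    # skipped: never sent to the model
    "vLLM is a fast LLM serving engine.",  # duplicate of docs[0]
    "Jina AI builds rerankers.",
]
scores, meta = backend.score_with_meta_topn("what is vLLM?", docs, top_n=2)
assert len(scores) == len(docs)  # output stays aligned with the input order
assert scores[0] == scores[2]    # the duplicate reuses a single inference result
assert scores[1] == 0.0          # the empty doc keeps the default score
print(meta["unique_docs"], meta["dedup_ratio"])  # 2 unique of 3 usable -> 0.3333
```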
reranker/server.py
| ... | ... | @@ -7,7 +7,7 @@ Request: { "query": "...", "docs": ["doc1", "doc2", ...], "normalize": optional |
| 7 | 7 | Response: { "scores": [float], "meta": {...} } |
| 8 | 8 | |
| 9 | 9 | Backend selected via config: services.rerank.backend |
| 10 | -(bge | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank), env RERANK_BACKEND. | |
| 10 | +(bge | jina_reranker_v3 | qwen3_vllm | qwen3_vllm_score | qwen3_transformers | qwen3_transformers_packed | qwen3_gguf | qwen3_gguf_06b | dashscope_rerank), env RERANK_BACKEND. | |
| 11 | 11 | """ |
| 12 | 12 | |
| 13 | 13 | import logging |
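The docstring names both the config key and the `RERANK_BACKEND` env var but not their precedence; a sketch of one plausible resolution order (this helper is illustrative, not the actual `server.py` code, and it assumes the env var wins):

```python
# Hypothetical helper: env var overrides services.rerank.backend from config.
import os

def resolve_backend_name(cfg: dict) -> str:
    return os.environ.get("RERANK_BACKEND") or cfg["services"]["rerank"]["backend"]
```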
scripts/lib/reranker_backend_env.sh
| ... | ... | @@ -40,6 +40,7 @@ reranker_backend_venv_dir() { |
| 40 | 40 | case "${backend}" in |
| 41 | 41 | qwen3_vllm) printf '%s/.venv-reranker\n' "${project_root}" ;; |
| 42 | 42 | qwen3_vllm_score) printf '%s/.venv-reranker-score\n' "${project_root}" ;; |
| 43 | + jina_reranker_v3) printf '%s/.venv-reranker-jina\n' "${project_root}" ;; | |
| 43 | 44 | qwen3_gguf) printf '%s/.venv-reranker-gguf\n' "${project_root}" ;; |
| 44 | 45 | qwen3_gguf_06b) printf '%s/.venv-reranker-gguf-06b\n' "${project_root}" ;; |
| 45 | 46 | qwen3_transformers) printf '%s/.venv-reranker-transformers\n' "${project_root}" ;; |
| ... | ... | @@ -57,6 +58,7 @@ reranker_backend_requirements_file() { |
| 57 | 58 | case "${backend}" in |
| 58 | 59 | qwen3_vllm) printf '%s/requirements_reranker_qwen3_vllm.txt\n' "${project_root}" ;; |
| 59 | 60 | qwen3_vllm_score) printf '%s/requirements_reranker_qwen3_vllm_score.txt\n' "${project_root}" ;; |
| 61 | + jina_reranker_v3) printf '%s/requirements_reranker_jina_reranker_v3.txt\n' "${project_root}" ;; | |
| 60 | 62 | qwen3_gguf) printf '%s/requirements_reranker_qwen3_gguf.txt\n' "${project_root}" ;; |
| 61 | 63 | qwen3_gguf_06b) printf '%s/requirements_reranker_qwen3_gguf_06b.txt\n' "${project_root}" ;; |
| 62 | 64 | qwen3_transformers) printf '%s/requirements_reranker_qwen3_transformers.txt\n' "${project_root}" ;; |
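For tooling outside the shell script, a Python mirror of the backend-to-venv mapping above (the shell script remains the source of truth; `venv_python` is a hypothetical helper assuming the standard POSIX venv layout):

```python
from pathlib import Path

# Mirrors reranker_backend_venv_dir; keep in sync with the shell case table.
VENV_DIRS = {
    "qwen3_vllm": ".venv-reranker",
    "qwen3_vllm_score": ".venv-reranker-score",
    "jina_reranker_v3": ".venv-reranker-jina",
    "qwen3_gguf": ".venv-reranker-gguf",
    "qwen3_gguf_06b": ".venv-reranker-gguf-06b",
    "qwen3_transformers": ".venv-reranker-transformers",
    "qwen3_transformers_packed": ".venv-reranker-transformers-packed",
}

def venv_python(project_root: str, backend: str) -> Path:
    # Hypothetical helper: resolve the interpreter inside the backend's venv.
    return Path(project_root) / VENV_DIRS[backend] / "bin" / "python"
```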