docs/TODO.txt

product_enrich : Partial Mode
https://help.aliyun.com/zh/model-studio/partial-mode?spm=a2c4g.11186623.help-menu-2400256.d_0_3_0_7.74a630119Ct6zR
In the messages array, set the role of the last message to assistant, provide the prefix in its content, and set "partial": true on that message. The messages format is as follows:
[
    {
        "role": "user",
        "content": "Complete this Fibonacci function; do not add anything else"
    },
    {
        "role": "assistant",
        "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
        "partial": true
    }
]
The model starts generating from the prefix content.
Supported in non-thinking mode.
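A minimal sketch of building this payload in Python. The helper names `build_partial_messages` and `stitch` are hypothetical, and the sketch assumes the reply contains only the continuation rather than the prefix; confirm that against the linked documentation. It only builds the `messages` list and makes no API call.

```python
from typing import Dict, List


def build_partial_messages(user_prompt: str, prefix: str) -> List[Dict[str, object]]:
    """Build a messages list whose last entry is an assistant prefix with partial=True."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefix, "partial": True},
    ]


def stitch(prefix: str, continuation: str) -> str:
    """Join the prefix with the continuation (assumes the reply omits the prefix)."""
    return prefix + continuation


messages = build_partial_messages(
    "Complete this Fibonacci function; do not add anything else",
    "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
)
```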
Score fusion:
1. Fusion formula
def fuse_scores(self, context: SearchContext) -> None:
    for result in context.results:
        # Compute the text relevance score
        text_score = (
            result.es_base_score * 0.5 +
            result.es_phrase_score * 0.3 +
            result.es_keywords_score * 0.2
        )
        
        # Final fused score
        result.fused_score = (
            (result.rerank_text + 0.00001) ** 1.0 *
            (result.rerank_boost + 0.1) ** 1.0 *
            (result.es_knn_score + 0.6) ** 0.2 *
            (text_score + 0.1) ** 0.77
        )
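The formula above can be tried in isolation with a standalone sketch. `Hit` is a hypothetical stand-in for the real result object; the weights, offsets, and exponents are copied from the snippet above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a search result carrying the per-signal scores."""
    es_base_score: float = 0.0
    es_phrase_score: float = 0.0
    es_keywords_score: float = 0.0
    es_knn_score: float = 0.0
    rerank_text: float = 0.0
    rerank_boost: float = 0.0


def text_score(hit: Hit) -> float:
    # Weighted sum of the three lexical signals (weights sum to 1.0).
    return (
        hit.es_base_score * 0.5
        + hit.es_phrase_score * 0.3
        + hit.es_keywords_score * 0.2
    )


def fused_score(hit: Hit) -> float:
    # Product of offset signals; each exponent controls that signal's influence.
    return (
        (hit.rerank_text + 0.00001) ** 1.0
        * (hit.rerank_boost + 0.1) ** 1.0
        * (hit.es_knn_score + 0.6) ** 0.2
        * (text_score(hit) + 0.1) ** 0.77
    )
```

The additive offsets keep a single zero-valued signal from zeroing the whole product. The small exponent on the kNN term (0.2) combined with its large offset (0.6) makes that term vary much less than the rerank terms.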
2. matched_queries
        # Add debug information if matched_queries is present and from_search_debug is True
        if 'matched_queries' in hit:
            # Extract individual scores from matched_queries
            matched_queries = hit['matched_queries']
            if context.from_search_debug:
                result.matched_queries = matched_queries
            if isinstance(matched_queries, dict):
                result.es_knn_score = matched_queries.get('knn_query', 0.0)
                result.es_phrase_score = matched_queries.get('phrase_query', 0.0)
                result.es_base_score = matched_queries.get('base_query', 0.0)
                result.es_keywords_score = matched_queries.get('keywords_query', 0.0)
                result.es_tags_score = matched_queries.get('tags_query', 0.0)
            elif isinstance(matched_queries, list):
                for query in matched_queries:
                    if query == 'knn_query':
                        result.es_knn_score = 1.0
                    elif query == 'phrase_query':
                        result.es_phrase_score = 1.0
                    elif query == 'base_query':
                        result.es_base_score = 1.0
                    elif query == 'keywords_query':
                        result.es_keywords_score = 1.0
                    elif query == 'tags_query':
                        result.es_tags_score = 1.0
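The two branches above can also be written as one table-driven helper. This is a sketch, not the project's code: `extract_scores` is a hypothetical name, and unlike the snippet above it returns a plain dict that defaults every score to 0.0 instead of mutating `result`.

```python
from typing import Dict, List, Union

# Named query -> score attribute, taken from the snippet above.
QUERY_TO_FIELD = {
    "knn_query": "es_knn_score",
    "phrase_query": "es_phrase_score",
    "base_query": "es_base_score",
    "keywords_query": "es_keywords_score",
    "tags_query": "es_tags_score",
}


def extract_scores(matched_queries: Union[Dict[str, float], List[str]]) -> Dict[str, float]:
    """Map matched_queries (dict of scores, or list of names) to per-signal scores."""
    scores = {field: 0.0 for field in QUERY_TO_FIELD.values()}
    if isinstance(matched_queries, dict):
        # Dict form: the value is the score of each named query.
        for name, field in QUERY_TO_FIELD.items():
            scores[field] = float(matched_queries.get(name, 0.0))
    elif isinstance(matched_queries, list):
        # List form: only the names are known, so a match counts as 1.0.
        for name in matched_queries:
            field = QUERY_TO_FIELD.get(name)
            if field is not None:
                scores[field] = 1.0
    return scores
```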
es_query["track_scores"] = True
Does track_scores need to be enabled to get these scores? Not necessarily.
Track scores
When sorting on a field, scores are not computed. By setting track_scores to true, scores will still be computed and tracked.
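A sketch of a request body that combines named queries with `track_scores` when sorting on a field. The field names `title` and `sales` and the helper `build_search_body` are hypothetical; only `_name`, `sort`, and `track_scores` are standard Elasticsearch request keys.

```python
def build_search_body(query_text: str, sort_field: str) -> dict:
    """Build a search body with two named queries, a field sort, and track_scores."""
    return {
        "query": {
            "bool": {
                "should": [
                    # _name makes the clause show up in each hit's matched_queries.
                    {"match": {"title": {"query": query_text, "_name": "base_query"}}},
                    {"match_phrase": {"title": {"query": query_text, "_name": "phrase_query"}}},
                ]
            }
        },
        # Sorting on a field skips score computation unless track_scores is set.
        "sort": [{sort_field: {"order": "desc"}}],
        "track_scores": True,
    }


body = build_search_body("red shoes", "sales")
```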
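The translation health check is very slow.
provider vs. backend: how the two relate and how they work together.
Translator design:
QueryParser does not actually call 6006. Currently 6006 is wrapped as a provider, yet the overall translate configuration also carries the 6006 base URL, which is confusing.
The translation module refactor is complete. The earlier notes above are obsolete and no longer apply. Current state:
- Business code no longer selects translation as a provider.
- `QueryParser` and the indexer both call the 6006 translator service through `translation.create_translation_client()`.
- Translation configuration is unified under `services.translation`:
  - External configuration keeps only deployment-related items, such as `service_url`, `default_model`, `default_scene`, and each capability's `backend/base_url/api_url/model_dir`.
  - Scene rules, language-code mappings, LLM prompt templates, and local-model direction constraints all live inside `translation/`.
- The external interface uses `model + scene` throughout and no longer exposes `prompt`.
The following documents are authoritative: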
- `docs/翻译模块说明.md`
- `docs/DEVELOPER_GUIDE.md`
- `docs/QUICKSTART.md`
- `docs/搜索API对接指南.md`
suggest index: currently a full-rebuild script; to be handed over to 金伟
Translation: add facebook/nllb-200-distilled-600M
https://blog.csdn.net/qq_42746084/article/details/154947534
https://huggingface.co/facebook/nllb-200-distilled-600M
Store languages: English accounts for about 80%, so add a dedicated en-zh model
https://huggingface.co/Helsinki-NLP/opus-mt-zh-en
https://huggingface.co/Helsinki-NLP/opus-mt-en-zh
opus-mt-en-zh
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Local directory holding the downloaded Helsinki-NLP/opus-mt-en-zh files
model_name = "./models/opus-mt-en-zh"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
data = 'test'
# Tokenize a batch of one sentence into PyTorch tensors
encoded = tokenizer([data], return_tensors="pt")
translation = model.generate(**encoded)
# Decode the generated token ids back to text
result = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]
print(result)
Qwen3-Reranker-4B-GGUF
https://modelscope.cn/models/dengcao/Qwen3-Reranker-4B-GGUF/summary
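1. Decide which quantization method to use
2. Decide on the prompt
reranker addition: nvidia/llama-nemotron-rerank-1b-v2
Encoder architecture.
Relatively new.
Better performance.
Outperforms qwen-reranker-4b on the Amazon e-commerce search dataset.
Supports vLLM.
Check the translation cache status
Check the embedding (vector) cache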
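One way to inspect both caches is to count keys by prefix with a non-blocking SCAN. This is a sketch under assumptions: the key patterns `translation:*` and `embedding:*` are hypothetical and must be replaced with the prefixes actually in use, and `client` is any object exposing redis-py's `scan_iter(match=..., count=...)`.

```python
def count_keys(client, pattern: str, batch: int = 1000) -> int:
    """Count keys matching a glob pattern using incremental SCAN (no KEYS call)."""
    return sum(1 for _ in client.scan_iter(match=pattern, count=batch))


def cache_report(client, patterns) -> dict:
    """Return {pattern: key count} for each cache prefix of interest."""
    return {pattern: count_keys(client, pattern) for pattern in patterns}


# Usage with redis-py (connection details are deployment specific):
#   import redis
#   client = redis.Redis(host="localhost", port=6479, password="...")
#   print(cache_report(client, ["translation:*", "embedding:*"]))
```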
AI - Production - MySQL
Host: 10.200.16.14 / localhost
Port: 3316
Username: root
Password: <REDACTED>
AI - Production - Redis
Host: 10.200.16.14 / localhost
Port: 6479
Password: <REDACTED>
Remote login:
# redis
redis-cli -h 43.166.252.75 -p 6479
# mysql: 3 users, all can log in remotely
mysql -uroot -p
CREATE USER 'saas'@'%' IDENTIFIED BY '<REDACTED>';
CREATE USER 'sa'@'%' IDENTIFIED BY '<REDACTED>';
ES:
Host: 10.200.16.14 / localhost
Port: 9200
Access example:
Username and password: saas:<REDACTED>
Install nvidia-container-toolkit (done)
https://mirrors.aliyun.com/github/releases/NVIDIA/nvidia-container-toolkit/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html
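qwen3-embedding, qwen3-reranker (done)
Pick an inference engine. Compared with calling sentence-transformers directly, the main benefits are multi-process serving, load balancing, and continuous batching, all of which are useful here.
Current conclusion: prefer TEI for embedding; vLLM is better suited to generative and rerank workloads.
Hunyuan LLM: use hunyuan-turbos-latest
Hunyuan OpenAI-compatible API call examples: https://cloud.tencent.com/document/product/1729/111007
Tencent Cloud Hunyuan LLM API_KEY: <REDACTED>
Hunyuan translation: use the model hunyuan-translation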
https://cloud.tencent.com/document/product/1729/113395#4.-.E7.A4.BA.E4.BE.8B
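Google Translate Basic: https://docs.cloud.google.com/translate/docs/reference/rest/v2/translate
Alibaba Cloud Bailian (Model Studio): the API key currently in use belongs to the mainland China region.
Each region's Base URL is bound to that region's API key.
We now run on a US server and use the US endpoint, so the API_KEY must be created or obtained in the US region console (https://modelstudio.console.aliyun.com/us-east-1 ):
Log in to the Bailian US region console: https://modelstudio.console.aliyun.com/us-east-1?spm=5176.2020520104.0.0.6b383a98WjpXff
In API Key management, create or copy a key that is valid for the US region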