"""LLM prompt templates for relevance judging (keep wording changes here)."""

from __future__ import annotations

from typing import Sequence


_QUERY_INTENT_ANALYSIS_TEMPLATE_EN = """You are an intent analysis expert for a fashion e-commerce search system.

Given a user's search query, analyze the shopping intent behind it in the context of fashion and apparel e-commerce, and concisely summarize the user's core search need.
Also provide the Chinese translation and English translation of the query.

Requirements:
- Keep the intent analysis concise and easy to understand, using 1 to 3 short sentences.
- Stay grounded in the original query and summarize the user's likely shopping intent without adding unnecessary context.
- When the query is vague or ambiguous, take a conservative approach and keep the analysis close to the original wording.
- Chinese translation: if the original query is already in Chinese, keep it unchanged.
- English translation: if the original query is already in English, keep it unchanged.
- Do not output anything other than the required three-line format.

Output format (exactly three lines):
Intent: concise analysis of the user's search intent
Query中文翻译: Chinese translation of the query
Query English translation: English translation of the query

Now analyze the following query:

Query: {query}
"""
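

# Illustrative sketch (not part of the original module): one way a caller might
# parse the three-line reply requested by the template above. The function name
# `parse_intent_reply` and the returned dict keys are assumptions made for this
# example; only the three line prefixes come from the template itself.
def parse_intent_reply(reply: str) -> dict:
    """Split a three-line intent reply into its fields; missing fields stay empty."""
    prefixes = {
        "Intent:": "intent",
        "Query中文翻译:": "query_zh",
        "Query English translation:": "query_en",
    }
    parsed = {key: "" for key in prefixes.values()}
    for raw_line in reply.splitlines():
        line = raw_line.strip()
        for prefix, key in prefixes.items():
            if line.startswith(prefix):
                # Keep only the text after the prefix, without padding.
                parsed[key] = line[len(prefix):].strip()
                break
    return parsed
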

_QUERY_INTENT_ANALYSIS_RESULT_TEMPLATE_ZH = """你是一个服装品类电商搜索意图分析专家。

给定用户输入的搜索词，请在服装品类电商场景下，分析该搜索词背后的购物意图，并简要概括用户的核心搜索需求。
同时，提供该搜索词的中文翻译和英文翻译。

要求：
- 意图分析应简洁易懂，用 1 到 3 句短句概括用户的搜索意图。
- 结合 query 本身，尽量贴近用户原始搜索需求进行总结，不添加不必要的背景、延伸或臆测。
- 如果 query 不够明确或有歧义，应保守处理，尽量保持与原词表达一致。
- 中文翻译：如果原始 query 本身就是中文，则按原样输出。
- 英文翻译：如果原始 query 本身就是英文，则按原样输出。
- 除指定格式外，不要输出任何额外说明。

输出格式（严格按三行输出）：
Intent: 对用户搜索意图的简洁分析
Query中文翻译: query 的中文翻译
Query English translation: query 的英文翻译

现在请分析以下搜索词：

Query: {query}
"""

_CLASSIFY_TEMPLATE_EN = """You are a relevance labeling assistant for a fashion e-commerce search system.
Given a user query and the information for each product, assign a relevance label to every product.

Your goal is to judge relevance from the perspective of e-commerce search ranking:
the key question is whether the user would view the product as the target item they want, or at least as an acceptable substitute.

## Relevance Labels

### Fully Relevant
The product satisfies the user’s core shopping intent: the core product type matches, and every explicitly stated key attribute in the query is supported by the product information.

Typical cases:
- The query contains only a product type, and the product is exactly that type.
- The query contains “product type + attributes,” and the product matches both the type and all explicitly stated attributes.

Examples:
- Query: “commuter skirt”
  Product: “black office skirt”
  → The category matches, and “office skirt” clearly reflects the commuter use case, so it satisfies all requirements in the query.

- Query: “slim-fit jeans”
  Product: “high-waisted straight-leg slight-flare cropped stretch jeans”
  → The product type matches, and the stretch fabric can support a slim-fit effect, so the query intent is fully satisfied. Other attributes (high-waisted, straight-leg, slight flare, cropped) are additional details and do not conflict with the query.

- Query: “beach vacation outfit”
  Product: “tropical resort Hawaiian-style summer dress”
  → The style matches and satisfies the vacation intent.

### Mostly Relevant
The product satisfies the user’s main intent: the core product type matches, but some explicitly requested attributes are missing from the product information, cannot be confirmed, or show slight / non-critical deviations. Overall, however, the product is still close to the ideal target and would be a strong substitute for the user’s core need.

Use “Mostly Relevant” in the following situations:
- The core product type matches, but some attributes are missing, not mentioned, or cannot be verified.
- The core product type matches, but there are minor deviations in color, material, style, fit, length, or similar attributes, as long as the deviation does not significantly undermine the main purchase intent.
- The product is not the user’s ideal target, but in a realistic shopping context it would still likely be seen as an acceptable and high-quality alternative.

Typical situations:
- Some attributes deviate, but the product remains reasonably close to the ideal target.
- Some required attributes cannot be confirmed, but there is no clear contradiction with the query, so the product may still fully satisfy the intent.

Examples:
- Query: “red slim-fit T-shirt”
  Product: “women’s T-shirt”
  → The category matches, but color and fit cannot be confirmed. It could still be the target item, so label it as “Mostly Relevant.”

- Query: “puff-sleeve short sleeve top”
  Product: “white puff-sleeve T-shirt”
  → It is unclear whether it is specifically short-sleeved, but it does have puff sleeves. Some attributes are unverified, yet nothing explicitly conflicts with the query, so there is a strong chance it is the intended item. Label it as “Mostly Relevant.”

- Query: “mid-length skirt”
  Product: “new loose-flowy long floral skirt with pleated design”
  Analysis:
    - The category matches, but the length differs (“long” rather than “mid-length”).
    - Still, it is a fairly similar substitute overall, so label it as “Mostly Relevant.”

### Weakly Relevant
The product is noticeably different from the user’s core target, but it still shares some similarity with the query in style, use case, function, or broader category. A small portion of users might still view it as a barely acceptable substitute. In other words, it is not the target item, but it still has some connection to the query.

Use “Weakly Relevant” in the following situations:
- The core product type does not match, but the two items are very similar in style, wearing scenario, or function, so there is still some substitute value.
- The core product type matches, but the product differs from the ideal target on multiple attributes. It still has some relevance, but it is no longer a strong substitute.
- An important attribute in the query is clearly violated, yet the product still retains a limited reason to be clicked.

Typical situations:
- Type matches, but the attribute mismatch is substantial (not especially close, but not fundamentally contradictory):
  - Query: “red T-shirt”
    Product: “blue T-shirt”
    → Both are T-shirts, but the color is different, so label it as “Weakly Relevant.”
  - Query: “floral dress for petites”
    Product: “summer sleeveless casual loose dress”
    → Both are dresses, but “petite-friendly” and “floral” are not satisfied, so label it as “Weakly Relevant.”
  - Query: “cotton long-sleeve shirt”
    Product: “men’s linen casual button-up long-sleeve shirt”
    → The style matches well, but the material requirement is different. The mismatch is meaningful, though not severely conflicting, so label it as “Weakly Relevant.”

- Type does not match, but style / use case is close:
  - Query: “white T-shirt”
    Product: “plain white base layer top”
    → A T-shirt and a base layer top are different product types, but their fit, material, and wearing scenario are very similar, and the color also matches. Label it as “Weakly Relevant.”
  - Query: “black mid-length skirt”
    Product: “new elegant black printed mid-length dress”
    → The core type differs (“skirt” vs. “dress”), but both are skirt-based garments, both are mid-length, and the styling scenarios are similar. Label it as “Weakly Relevant.”
  - Query: “jeans”
    Product: “casual pants”
    → The core type is different, but both belong to the broader pants category, and the style / use case may be similar. Label it as “Weakly Relevant.”

### Irrelevant
The product does not satisfy the user’s main shopping intent, and the chance of a click is very low.

This generally applies in any of the following situations:
- The core product type does not match the query, and the product is not a close substitute in style, scenario, or function.
- The product belongs only to a loosely related broad category, but it is not interchangeable with the specific subtype named in the query, and the style or use case is also quite different.
- The core product type matches, but the product clearly violates an explicit and important requirement in the query, with little to no acceptable substitute value.

Typical situations:

**1. Different core categories, with no substitute value**
- Query: “pants”
  Product: “shoes”
- Query: “boots”
  Product: “sneakers”

**2. Core category is somewhat related, but a key attribute clearly conflicts**
- Query: “leggings”
  Product: “wide-leg pants”
- Query: “sleeveless dress”
  Product: “long-sleeve dress”
- Query: “oversized sweatshirt”
  Product: “slim-fit T-shirt”

**3. Same broad category, but major style / use-case mismatch**
- Query: “jeans”
  Product: “sweatpants / dress pants”

## Judgment Principles

1. **Product type is the highest-priority factor.**
   If the query explicitly specifies a concrete product type, the result must match that product type to be considered “Fully Relevant” or “Mostly Relevant.”
   Different product types should usually be labeled “Weakly Relevant” or “Irrelevant.”
   - **Weakly Relevant**: use only when the two product types are very close in style, use case, or function, such that users might still see one as a barely acceptable substitute for the other.
   - **Irrelevant**: use for all other product-type mismatches.

2. **When the query clearly specifies a product type, similar or related product types are usually not directly interchangeable; distinguish between “Weakly Relevant” and “Irrelevant” based on how close they really are.**
   For example:
   - **Close in style / use case → Weakly Relevant**: dress vs. skirt, jeans vs. casual pants, sneakers vs. skate shoes.
   - **Far apart in style / use case → Irrelevant**: pants vs. shoes, T-shirt vs. hat, boots vs. sneakers, jeans vs. suit pants, backpack vs. handbag.

3. **Once the core product type matches, then evaluate attributes.**
   - All explicit attributes are directly shown or reasonably implied → **Fully Relevant**
   - Some attributes are missing, unmentioned, unverifiable, or slightly off, but the product remains close to the ideal target overall → **Mostly Relevant**
   - There are obvious deviations; the product is not especially close to the ideal target, but also not fundamentally opposed to it → **Weakly Relevant**
   - There is a clear and important hard conflict or opposite attribute (e.g. sleeveless vs. long-sleeve, slim-fit vs. loose-fit), with very low substitute value → **Irrelevant**

4. **Substitutability should be judged from a real shopping-intent perspective.**
   Do not rely on surface-level lexical similarity alone. Ask whether a user in an actual shopping context would plausibly accept the product.
   - Good substitute → **Mostly Relevant**
   - Barely acceptable substitute → **Weakly Relevant**
   - Not substitutable at all → **Irrelevant**

Query: {query}
{intent_suffix}

Products:
{lines}

## Output Format
Output exactly {n} lines.
Each line must be exactly one of the following four labels:
Fully Relevant
Mostly Relevant
Weakly Relevant
Irrelevant

Now, based on the query "{query}", assign a relevance label to each result.
The output lines must correspond exactly to the product order above.
Do not output anything else.
"""
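

# Illustrative sketch (not part of the original module): validating a reply to
# the classification template above. `EN_LABELS` and `parse_label_reply` are
# hypothetical names; the four label strings are copied from the template's
# "Output Format" section. Raising on any mismatch is only one possible policy;
# a caller could equally retry the request or fall back to a default label.
EN_LABELS = ("Fully Relevant", "Mostly Relevant", "Weakly Relevant", "Irrelevant")


def parse_label_reply(reply: str, expected_count: int) -> list:
    """Return one label per product, or raise ValueError if the reply is malformed."""
    # Ignore blank lines so a trailing newline does not count as a label.
    labels = [line.strip() for line in reply.splitlines() if line.strip()]
    if len(labels) != expected_count:
        raise ValueError(f"expected {expected_count} labels, got {len(labels)}")
    for label in labels:
        if label not in EN_LABELS:
            raise ValueError(f"unknown label: {label!r}")
    return labels
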


_CLASSIFY_TEMPLATE_ZH = """你是一个服饰电商搜索系统中的相关性判断助手。
给定用户查询词以及每个商品的信息，请为每个商品分配一个相关性标签。

你的目标是从电商搜索排序的角度，判断商品是否满足用户的购物意图。
判断时应优先考虑“用户是否会把该商品视为目标商品，或可接受的替代品”。

## 相关性标签

### 完全相关
商品满足用户的核心购物意图：核心商品类型匹配，且查询中所有明确提及的关键属性均有商品信息支持。

典型适用场景：
- 查询仅包含商品类型，商品即为该类型。
- 查询包含“商品类型 + 属性”，商品在类型及所有明确属性上均符合。

典型案例：
- 查询：“通勤裙”，商品：“黑色职业裙”
  → 品类匹配且“职业裙”体现出通勤属性，满足了查询中所有的要求。

- 查询：“修身牛仔裤”，商品：“高腰直筒微喇九分弹力牛仔裤”
  → 产品类型匹配，弹力面料可以支撑修身效果，因此查询意图完全满足。其他属性（高腰、直筒、微喇、九分）是额外细节，不与查询冲突。

- 查询：“海滩度假装”，商品：“热带度假夏威夷风连衣裙”
  → 风格匹配，满足度假意图。

### 基本相关
商品满足用户的主要意图：核心商品类型匹配，查询中明确提出的部分要求未在商品信息中体现、无法确认，或存在轻微偏差/非关键偏差，但是整体上与理想目标比较接近，是满足用户核心需求的很好的替代品。

在以下情况使用“基本相关”：
- 核心商品类型匹配，但部分属性缺失、未提及或无法确认。
- 核心商品类型匹配，但颜色、材质、风格、版型、长度等属性存在轻微偏差，只要这种偏差不会明显破坏用户的主要购买意图。
- 商品不是用户最理想的目标，但在电商购物场景下仍可能被视为可接受、且较优的替代品。

典型情况：
- 部分属性有偏差，但是跟理想目标比较接近。
- 部分属性要求无法确认，但是整体上没有与用户需求存在矛盾的点，可能属于完全满足意图的产品。

案例：
- 查询：“红色修身T恤”，商品：“女士T恤”
  → 品类匹配，但是颜色、版型无法确认，仍有概率属于用户的目标商品，判为“基本相关”。

- 查询：“泡泡袖短袖”，商品：“白色泡泡袖T恤”
  → 无法确定是否为短袖，但确实有泡泡袖。部分属性未核实，但没有任何内容与查询明确冲突，因此有很大概率是目标商品，判为“基本相关”。

- 查询：“中长半身裙”
  商品：“春秋季新款宽松显瘦大摆长裙碎花半身裙褶皱设计裙”
  分析：
    - 品类匹配，长度（长裙≠中长）有偏差，整体上为相似度较高的替代品，判为基本相关。

### 弱相关
商品与用户的核心目标存在明显差距，但仍与查询在风格、场景、功能或大类上具有一定相似性，可能被少量用户视为勉强可接受的替代品。属于“非目标，但仍有一定关联”。

在以下情况使用“弱相关”：
- 核心商品类型不一致，但两者在风格、穿着场景或功能上非常接近，仍具有一定替代性。
- 核心商品类型匹配，但在多个属性上与用户理想目标差距较大，虽仍有一定关联性，但已不是高质量替代品。
- 查询要求中的某个重要属性被明显违背，但商品仍保留少量被点击的理由。

典型情况：
- 类型匹配 + 属性要求有偏差（不属于相似，也不造成整体产品调性的严重冲突）：
  - 查询：“红色T恤”，商品：“蓝色T恤” → 都属于T恤，但颜色不同，判为“弱相关”。
  - 查询：“小个子碎花连衣裙”，商品：“夏季无袖休闲宽松连衣裙” → 都属于连衣裙，但未满足“小个子”及“碎花”要求，判为“弱相关”。
  - 查询：“棉质长袖衬衫”，商品：“男式亚麻衬衫休闲纽扣长袖” → 款式完全匹配，材质要求不同，不是非常接近、也没有严重冲突，判为“弱相关”。

- 类型不匹配 + 风格/场景接近：
  - 查询：“白色T恤”，商品：“纯白春秋打底衫” → T恤和打底衫品类不同，但版型、材质、穿着场景非常接近，且也是白色，因此应判为“弱相关”。
  - 查询：“黑色中长半身裙”，商品：“新款高腰V领中长款连衣裙 优雅印花黑色性感连衣裙” → 核心商品类型“半身裙”与“连衣裙”不同，但同属裙装，款式均为“中长款”，穿搭场景接近，判为“弱相关”。
  - 查询：“牛仔裤”，商品：“休闲裤” → 核心商品类型不同，但同属裤装大类，风格和穿着场景可能接近，判为“弱相关”。

### 不相关
商品未满足用户的主要购物意图，用户点击动机极低。

主要表现为以下情形之一：
- 核心商品类型与查询不匹配，且不属于风格/场景/功能接近的可替代品。
- 商品虽属于大致相关的大类，但与查询明确指定的具体子类不可互换，且风格或场景差异大。
- 核心商品类型匹配，但商品明显违背了查询中一个明确且重要的要求，且几乎不具备可接受的替代性。

典型情况：

**1. 核心品类不同，且无替代性**
- 查询：“裤子”，商品：“鞋子”
- 查询：“靴子”，商品：“运动鞋”

**2. 核心品类相近，但关键属性明显冲突**
- 查询：“紧身裤”，商品：“阔腿裤”
- 查询：“无袖连衣裙”，商品：“长袖连衣裙”
- 查询：“宽松卫衣”，商品：“修身T恤”

**3. 核心品类同属大类，但风格、场景差异巨大**
- 查询：“牛仔裤”，商品：“运动裤 / 西裤”

## 判断原则

1. **商品类型是最高优先级因素。**
   如果查询明确指定了具体商品类型，那么结果必须匹配该商品类型，才可能判为“完全相关”或“基本相关”。
   不同商品类型通常应判为“弱相关”或“不相关”。
   - **弱相关**：仅当两种商品类型在风格、场景、功能上非常接近，用户有一定概率将其视为勉强可接受的替代品时使用。
   - **不相关**：其他所有商品类型不匹配的情况。

2. **相似或相关的商品类型，在查询明确时通常不可直接互换，但要根据接近程度区分“弱相关”与“不相关”。**
   例如：
   - **风格/场景高度接近，可判为弱相关**：连衣裙 vs 半身裙、牛仔裤 vs 休闲裤、运动鞋 vs 板鞋。
   - **风格/场景差异大，应判为不相关**：裤子 vs 鞋子、T恤 vs 帽子、靴子 vs 运动鞋、牛仔裤 vs 西装裤、双肩包 vs 手提包。

3. **当核心商品类型匹配后，再评估属性。**
   - 所有明确属性都有显式或隐式的支撑 → **完全相关**
   - 部分属性缺失、未提及、无法确认，或存在轻微偏差，但是商品整体上与理想目标比较接近 → **基本相关**
   - 存在明显的偏差，该偏差使商品与理想目标不算接近，但也不构成严重对立 → **弱相关**
   - 存在明确且重要的强冲突/相反（如无袖 vs 长袖、修身 vs 宽松），替代性极低 → **不相关**

4. **“是否可替代”应从真实电商购物意图出发判断。**
   不是只看字面相似，而要看用户在购物场景下是否可能接受该商品。
   - 良好替代品 → **基本相关**
   - 勉强替代品 → **弱相关**
   - 完全不可替代 → **不相关**

查询：{query}
{intent_suffix}

商品：
{lines}

## 输出格式
严格输出 {n} 行，每行只能是以下四者之一：
完全相关
基本相关
弱相关
不相关

现在请根据 query“{query}”，为每个结果标注相关性标签。输出行必须与上方商品顺序一一对应，不要输出任何其他内容。
"""


def intent_analysis_prompt(query: str) -> str:
    """Return the English intent-analysis prompt for ``query``."""
    return _QUERY_INTENT_ANALYSIS_TEMPLATE_EN.format(query=query)


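def classify_prompt(
    query: str,
    numbered_doc_lines: Sequence[str],
    *,
    query_intent_block: str = "",
) -> str:
    """Return the English relevance-labeling prompt for ``query``.

    ``numbered_doc_lines`` holds one pre-numbered line per product;
    ``query_intent_block`` is optional intent text placed under the query line.
    """
    lines = "\n".join(numbered_doc_lines)
    n = len(numbered_doc_lines)
    # Strip once; an empty or whitespace-only block adds nothing to the prompt.
    intent_block = query_intent_block.strip() if query_intent_block else ""
    intent_suffix = f"\n{intent_block}" if intent_block else ""
    return _CLASSIFY_TEMPLATE_EN.format(query=query, intent_suffix=intent_suffix, lines=lines, n=n)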
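

# Illustrative sketch (not part of the original module): building the
# `numbered_doc_lines` argument for `classify_prompt`. The "1. <text>" layout
# and the name `number_doc_lines` are assumptions; the source does not specify
# how callers number products. Internal whitespace is collapsed so that every
# product occupies exactly one line, which keeps the reply's label lines in
# one-to-one correspondence with the products.
def number_doc_lines(product_texts):
    """Prefix each product description with its 1-based position."""
    return [
        f"{index}. {' '.join(text.split())}"
        for index, text in enumerate(product_texts, start=1)
    ]
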