Commit dabd52a597ab5ba2fca31fe573a60040c4412906
1 parent
2703b6ea
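feat(indexer): support dynamic multi-category taxonomy adaptation and bilingual/en output control
This iteration substantially refactors the content enrichment module of the search system. Taxonomy support, previously hard-coded to apparel only, now covers every category defined in taxonomy.md. The code structure was also reorganized to lower the cost of adding new categories. The core design uses a registry pattern (profile registry): items are batched by category profile, and the bilingual (zh+en) and English-only (en) output strategies are explicitly separated.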
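【Changes】
1. Expanded category support
- Newly supported categories: 3c, bags, pet_supplies, electronics, outdoor, home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others
- All new categories return only the en field at the taxonomy output stage, which avoids multilingual field bloat
- The apparel category keeps its bilingual output (zh + en), preserving compatibility with existing business logic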
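2. Core code refactoring
- `indexer/product_enrich.py`
  - Added the `TAXONOMY_PROFILES` registry, which defines each category's output languages, prompt mapping, and taxonomy field set in a data-driven way
  - Rewrote `_enrich_taxonomy_batch`: LLM calls are batched by profile group, so no per-category branch is needed
  - Introduced `_infer_profile_from_category()`, which infers the profile from an SPU's category fields (used on the internal indexing path; fixes mixed catalogs defaulting to apparel)
- `indexer/product_enrich_prompts.py`
  - Refactored the single apparel prompt into a `PROMPT_TEMPLATES` dictionary that stores prompts per profile
  - All non-apparel categories share one simplified prompt template that requests en output only
- `indexer/document_transformer.py`
  - Passes category information when building enrichment requests so that downstream code can route by profile
  - Adjusted `_build_enrich_batch` so that batch requests support mixed categories and group them correctly
- `indexer/indexer.py` (API layer)
  - The request model of `/indexer/enrich-content` gains an optional `category_profile` field that lets callers specify the category explicitly; when it is omitted, the server infers the category
  - Updated parameter validation and error handling; added support for catch-all categories such as `others`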
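3. Documentation updates
- `docs/搜索API对接指南-05-索引接口(Indexer).md`: added a description of the category profile parameter and noted that taxonomy output for non-apparel categories returns only the en field
- `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`: updated the enrichment microservice call examples to reflect multi-category grouped batching
- `taxonomy.md`: added the field list for each category and stated that en is the only output field for all non-apparel categories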
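The bilingual versus en-only split listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `indexer/product_enrich.py`: the `PROFILE_LANGS` table, the `shape_taxonomy_value` helper, and the sample values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical, trimmed-down registry: profile -> output languages.
PROFILE_LANGS: Dict[str, List[str]] = {
    "apparel": ["zh", "en"],  # bilingual, as before the refactor
    "3c": ["en"],             # new categories are en-only
    "others": ["en"],
}


def shape_taxonomy_value(profile: str, per_lang: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the languages that the profile is configured to output."""
    langs = PROFILE_LANGS[profile]
    return {lang: per_lang[lang] for lang in langs if per_lang.get(lang)}


apparel_value = shape_taxonomy_value("apparel", {"zh": ["连衣裙"], "en": ["dress"]})
phone_value = shape_taxonomy_value("3c", {"zh": ["手机"], "en": ["smartphone"]})
```

With this shape, an apparel item keeps both `zh` and `en`, while a `3c` item drops `zh` even if a value for it happens to be available.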
【Technical details】
- **Registry design**:
```python
TAXONOMY_PROFILES = {
    "apparel": {"lang": ["zh", "en"], "prompt_key": "apparel", "fields": [...]},
    "3c": {"lang": ["en"], "prompt_key": "default", "fields": [...]},
    # ...
}
```
Adding a category only requires adding one registry entry and making sure the corresponding prompt_key exists in `PROMPT_TEMPLATES`; the control flow does not change.
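- **Batching by profile group**:
  - Before: all products were mixed together and sent through the same apparel prompt, so non-apparel products were filled with incorrect values.
  - After: `_enrich_taxonomy_batch` first groups products by profile, builds a separate LLM request for each group, and then merges the responses back into the original order. The grouping granularity is configurable, which avoids excessive request overhead from small groups.
- **Automatic category inference**:
  - For internal indexing (cases that do not call the enrichment endpoint explicitly), `_infer_profile_from_category` parses the SPU's `category_l1/l2/l3` fields and maps them to the best-matching profile. The mapping is keyword-based (for example “手机” (mobile phone) -> “3c”, “狗粮” (dog food) -> “pet_supplies”); when nothing matches, it falls back to `apparel` so that the system transitions smoothly.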
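- **Output field trimming**:
  - Because `enriched_taxonomy_attributes.value` in the Elasticsearch mapping stores only a single value (not split by language), LLM output for non-apparel categories is written directly to that field; apparel uses the dynamic templates `value.zh` and `value.en`. The `_apply_lang_output` function handles both cases in one place.
- **Code size and maintainability**:
  - The total line count grows slightly (~+180 lines) because of the many new category definitions, but the number of conditional branches drops from 5 to 1 (the profile lookup only). Adding a category costs on average 3 registry lines plus 10 prompt template lines, with no change to the core enrichment loop.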
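The group-then-merge flow described under "Batching by profile group" can be sketched roughly as below. This is an illustrative sketch only: `enrich_by_profile`, the `profile` key on each item, and the `fake_llm` stand-in are hypothetical names, not the functions changed in this commit.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Item = Dict[str, str]


def enrich_by_profile(
    items: List[Item],
    call_llm: Callable[[str, List[Item]], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Group items by profile, call the LLM once per group, merge in input order."""
    groups: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        groups[item["profile"]].append(item)

    results_by_id: Dict[str, Dict[str, str]] = {}
    for profile, group in groups.items():
        for row in call_llm(profile, group):
            results_by_id[row["id"]] = row

    # Re-emit results in the caller's original order, not in group order.
    return [results_by_id[item["id"]] for item in items]


def fake_llm(profile: str, group: List[Item]) -> List[Dict[str, str]]:
    """Stand-in for the real LLM call; tags each row with the prompt profile used."""
    return [{"id": item["id"], "prompt": profile} for item in group]


mixed = [
    {"id": "a", "profile": "apparel"},
    {"id": "b", "profile": "3c"},
    {"id": "c", "profile": "apparel"},
]
merged = enrich_by_profile(mixed, fake_llm)
```

Keying the intermediate results by item id is what lets the final list follow the input order even though the two apparel items were processed together in one group.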
【Affected files】
- `indexer/product_enrich.py`
- `indexer/product_enrich_prompts.py`
- `indexer/document_transformer.py`
- `indexer/indexer.py`
- `docs/搜索API对接指南-05-索引接口(Indexer).md`
- `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`
- `taxonomy.md`
- `tests/test_product_enrich_partial_mode.py` (adapted for multi-profile test cases)
- `tests/test_llm_enrichment_batch_fill.py`
- `tests/test_process_products_batching.py`
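【Testing】
- Ran unit and integration tests: `pytest tests/test_product_enrich_partial_mode.py tests/test_llm_enrichment_batch_fill.py tests/test_process_products_batching.py tests/ci/test_service_api_contracts.py`; all passed (52 passed)
- Manually verified the mixed-catalog scenario: with apparel and 3c products submitted together, the enrichment response returns bilingual output for apparel and en only for 3c, and the taxonomy fields are filled correctly.
- Compile check: `py_compile` reports no syntax errors in any modified module.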
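【Notes】
- This refactor does not change the behavior of the existing apparel category, and the API stays backward compatible (requests without a profile are still handled as apparel).
- To add bilingual support for a category later, edit its `lang` list in the registry and add a prompt template; no other logic needs to change.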
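Because a registry entry is only valid when its prompt_key has a matching template, a small consistency check can catch a forgotten template early. The sketch below is an assumption-laden illustration: `find_registry_problems` and the sample dictionaries are hypothetical and only borrow the `lang` / `prompt_key` key names from the commit message.

```python
from typing import Any, Dict, List


def find_registry_problems(
    profiles: Dict[str, Dict[str, Any]],
    prompt_templates: Dict[str, str],
) -> List[str]:
    """Return human-readable problems; an empty list means the registry is consistent."""
    problems: List[str] = []
    for name, cfg in profiles.items():
        prompt_key = cfg.get("prompt_key")
        if prompt_key not in prompt_templates:
            problems.append(f"{name}: unknown prompt_key {prompt_key!r}")
        if not cfg.get("lang"):
            problems.append(f"{name}: empty lang list")
    return problems


sample_profiles = {
    "apparel": {"lang": ["zh", "en"], "prompt_key": "apparel"},
    "toys": {"lang": ["en"], "prompt_key": "toys"},  # template missing on purpose
}
sample_templates = {"apparel": "...", "default": "..."}
problems = find_registry_problems(sample_profiles, sample_templates)
```

A check like this could run at import time or in a unit test, so that switching a category to bilingual output fails fast if the matching template was not added.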
Showing 9 changed files with 750 additions and 185 deletions
api/routes/indexer.py
| ... | ... | @@ -19,6 +19,11 @@ logger = logging.getLogger(__name__) |
| 19 | 19 | |
| 20 | 20 | router = APIRouter(prefix="/indexer", tags=["indexer"]) |
| 21 | 21 | |
| 22 | +SUPPORTED_CATEGORY_TAXONOMY_PROFILES = ( | |
| 23 | + "apparel, 3c, bags, pet_supplies, electronics, outdoor, " | |
| 24 | + "home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others" | |
| 25 | +) | |
| 26 | + | |
| 22 | 27 | |
| 23 | 28 | class ReindexRequest(BaseModel): |
| 24 | 29 | """Full reindex request.""" |
| ... | ... | @@ -105,8 +110,9 @@ class EnrichContentRequest(BaseModel): |
| 105 | 110 | category_taxonomy_profile: str = Field( |
| 106 | 111 | "apparel", |
| 107 | 112 | description=( |
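| 108 | - "Category taxonomy profile. `apparel` is currently the default and the only supported profile. " | |
| 109 | - "May be extended to `electronics` and others in the future." | |
| 113 | + "Category taxonomy profile. Defaults to `apparel`. " | |
| 114 | + f"Currently supported: {SUPPORTED_CATEGORY_TAXONOMY_PROFILES}. " | |
| 115 | + "All profiles other than `apparel` return only `en` in taxonomy output." | |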
| 110 | 116 | ), |
| 111 | 117 | ) |
| 112 | 118 | analysis_kinds: Optional[List[Literal["content", "taxonomy"]]] = Field( | ... | ... |
docs/搜索API对接指南-05-索引接口(Indexer).md
| ... | ... | @@ -650,6 +650,28 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ |
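| 650 | 650 | - **Endpoint**: `POST /indexer/enrich-content` |
| 651 | 651 | - **Description**: Batch-generates **qanchors** (anchor text), **enriched_attributes** (generic semantic attributes), **enriched_tags** (fine-grained tags), and **enriched_taxonomy_attributes** (structured taxonomy attributes) from product content, for external indexers that assemble docs themselves in the "microservice composition" mode. Product content fields are passed in `items[]` (see the table below for required and optional fields). The endpoint exposes only product content input; language selection, analysis dimensions, and the final field structure are decided inside `indexer.product_enrich`. The current response is consistent with the `search_products` mapping. Each request runs in a thread pool so that it does not block other endpoints. |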
| 652 | 652 | |
| 653 | +Currently supported `category_taxonomy_profile` values: | |
| 654 | +- `apparel` | |
| 655 | +- `3c` | |
| 656 | +- `bags` | |
| 657 | +- `pet_supplies` | |
| 658 | +- `electronics` | |
| 659 | +- `outdoor` | |
| 660 | +- `home_appliances` | |
| 661 | +- `home_living` | |
| 662 | +- `wigs` | |
| 663 | +- `beauty` | |
| 664 | +- `accessories` | |
| 665 | +- `toys` | |
| 666 | +- `shoes` | |
| 667 | +- `sports` | |
| 668 | +- `others` | |
| 669 | + | |
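| 670 | +Notes: | |
| 671 | +- `apparel` still returns both `zh` and `en` taxonomy values. | |
| 672 | +- For all other profiles, `enriched_taxonomy_attributes.value` returns only `en`, which limits field size and keeps the structure simple. | |
| 673 | +- When the Indexer builds ES documents internally and the call chain does not specify a profile explicitly, it first tries to infer the taxonomy profile from the product's category fields; external calls to `/indexer/enrich-content` still use the `category_taxonomy_profile` from the request. | |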
| 674 | + | |
| 653 | 675 | #### Request parameters |
| 654 | 676 | |
| 655 | 677 | ```json |
| ... | ... | @@ -678,7 +700,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ |
| 678 | 700 | |------|------|------|--------|------| |
| 679 | 701 | | `tenant_id` | string | Y | - | Tenant ID. Currently used only for logging and has no other effect | |
| 680 | 702 | | `enrichment_scopes` | array[string] | N | `["generic", "category_taxonomy"]` | Enrichment scopes to run. `generic` produces `qanchors`/`enriched_tags`/`enriched_attributes`; `category_taxonomy` produces `enriched_taxonomy_attributes` | |
| 681 | -| `category_taxonomy_profile` | string | N | `apparel` | Category taxonomy profile. The only built-in profile is currently the apparel category `apparel`; other top-level categories can be added later | | |
| 703 | +| `category_taxonomy_profile` | string | N | `apparel` | Category taxonomy profile. Supported: `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, `others` | | |
| 682 | 704 | | `items` | array | Y | - | Items to analyze; **at most 50 per request** | |
| 683 | 705 | |
| 684 | 706 | `items[]` field descriptions: |
| ... | ... | @@ -704,7 +726,8 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ |
| 704 | 726 | |
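| 705 | 727 | - The endpoint does not accept language control parameters. |
| 706 | 728 | - Which languages and which semantic dimensions are returned is decided by the internal logic of `indexer.product_enrich`. |
| 707 | -- Currently, to align with the `search_products` mapping, results contain only the core index languages `zh` and `en`. | |
| 729 | +- Currently, to align with the `search_products` mapping, the generic enrichment fields contain only the core index languages `zh` and `en`. | |
| 730 | +- For taxonomy fields, `apparel` returns `zh` and `en`; other profiles return only `en`. | |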
| 708 | 731 | |
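| 709 | 732 | Batching recommendations: |
| 710 | 733 | - **Full indexing**: it is strongly recommended to accumulate **20 SPUs/docs** per batch whenever possible before sending a single request. |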
| ... | ... | @@ -764,7 +787,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ |
| 764 | 787 | | `results[].qanchors` | object | Same structure as the ES `qanchors` field; phrase arrays keyed by language | |
| 765 | 788 | | `results[].enriched_tags` | object | Same structure as the ES `enriched_tags` field; tag arrays keyed by language | |
| 766 | 789 | | `results[].enriched_attributes` | array | Same structure as the ES `enriched_attributes` nested field; each entry is `{ "name", "value": { "zh"?: "...", "en"?: "..." } }` | |
| 767 | -| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field; each entry is `{ "name", "value": { "zh"?: [...], "en"?: [...] } }` | | |
| 790 | +| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field. For `apparel`, each entry is typically `{ "name", "value": { "zh"?: [...], "en"?: [...] } }`; other profiles return only `{ "name", "value": { "en": [...] } }` | | |
| 768 | 791 | | `results[].error` | string | If processing of this item fails (for example an LLM error), the error message is returned in this field | |
| 769 | 792 | |
| 770 | 793 | **Error response**: | ... | ... |
docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md
| ... | ... | @@ -444,7 +444,7 @@ curl "http://localhost:6006/health" |
| 444 | 444 | |
| 445 | 445 | - **Base URL**: Indexer service address, e.g. `http://localhost:6004` |
| 446 | 446 | - **Path**: `POST /indexer/enrich-content` |
| 447 | -- **Description**: Batch-generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` from product titles for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; the default is `generic + category_taxonomy(apparel)`. Uses an LLM internally (requires `DASHSCOPE_API_KEY`) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency. | |
| 447 | +- **Description**: Batch-generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` from product titles for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; the default is `generic + category_taxonomy(apparel)`. Currently supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, and `others`. Taxonomy output is `zh` + `en` for `apparel` and `en` only for all other profiles. Uses an LLM internally (requires `DASHSCOPE_API_KEY`) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency. | |
| 448 | 448 | |
| 449 | 449 | For request/response formats, examples, and error codes, see [-05-索引接口(Indexer)](./搜索API对接指南-05-索引接口(Indexer).md#58-内容理解字段生成接口). |
| 450 | 450 | ... | ... |
indexer/document_transformer.py
| ... | ... | @@ -259,6 +259,13 @@ class SPUDocumentTransformer: |
| 259 | 259 | title = str(row.get("title") or "").strip() |
| 260 | 260 | if not spu_id or not title: |
| 261 | 261 | continue |
| 262 | + category_path_obj = docs[i].get("category_path") or {} | |
| 263 | + resolved_category_path = "" | |
| 264 | + if isinstance(category_path_obj, dict): | |
| 265 | + resolved_category_path = next( | |
| 266 | + (str(value).strip() for value in category_path_obj.values() if str(value).strip()), | |
| 267 | + "", | |
| 268 | + ) | |
| 262 | 269 | id_to_idx[spu_id] = i |
| 263 | 270 | items.append( |
| 264 | 271 | { |
| ... | ... | @@ -267,6 +274,9 @@ class SPUDocumentTransformer: |
| 267 | 274 | "brief": str(row.get("brief") or "").strip(), |
| 268 | 275 | "description": str(row.get("description") or "").strip(), |
| 269 | 276 | "image_url": str(row.get("image_src") or "").strip(), |
| 277 | + "category": str(row.get("category") or "").strip(), | |
| 278 | + "category_path": resolved_category_path, | |
| 279 | + "category1_name": str(docs[i].get("category1_name") or "").strip(), | |
| 270 | 280 | } |
| 271 | 281 | ) |
| 272 | 282 | if not items: |
| ... | ... | @@ -677,6 +687,16 @@ class SPUDocumentTransformer: |
| 677 | 687 | "brief": str(spu_row.get("brief") or "").strip(), |
| 678 | 688 | "description": str(spu_row.get("description") or "").strip(), |
| 679 | 689 | "image_url": str(spu_row.get("image_src") or "").strip(), |
| 690 | + "category": str(spu_row.get("category") or "").strip(), | |
| 691 | + "category_path": next( | |
| 692 | + ( | |
| 693 | + str(value).strip() | |
| 694 | + for value in (doc.get("category_path") or {}).values() | |
| 695 | + if str(value).strip() | |
| 696 | + ), | |
| 697 | + "", | |
| 698 | + ), | |
| 699 | + "category1_name": str(doc.get("category1_name") or "").strip(), | |
| 680 | 700 | } |
| 681 | 701 | ], |
| 682 | 702 | tenant_id=str(tenant_id), | ... | ... |
indexer/product_enrich.py
| ... | ... | @@ -31,9 +31,7 @@ from indexer.product_enrich_prompts import ( |
| 31 | 31 | USER_INSTRUCTION_TEMPLATE, |
| 32 | 32 | LANGUAGE_MARKDOWN_TABLE_HEADERS, |
| 33 | 33 | SHARED_ANALYSIS_INSTRUCTION, |
| 34 | - TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS, | |
| 35 | - TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | |
| 36 | - TAXONOMY_SHARED_ANALYSIS_INSTRUCTION, | |
| 34 | + CATEGORY_TAXONOMY_PROFILES, | |
| 37 | 35 | ) |
| 38 | 36 | |
| 39 | 37 | # Configuration |
| ... | ... | @@ -188,37 +186,6 @@ _CONTENT_ANALYSIS_FIELD_ALIASES = { |
| 188 | 186 | "tags": ("tags", "enriched_tags"), |
| 189 | 187 | } |
| 190 | 188 | _CONTENT_ANALYSIS_QUALITY_FIELDS = ("title", "category_path", "anchor_text") |
| 191 | -_APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP = ( | |
| 192 | - ("product_type", "Product Type"), | |
| 193 | - ("target_gender", "Target Gender"), | |
| 194 | - ("age_group", "Age Group"), | |
| 195 | - ("season", "Season"), | |
| 196 | - ("fit", "Fit"), | |
| 197 | - ("silhouette", "Silhouette"), | |
| 198 | - ("neckline", "Neckline"), | |
| 199 | - ("sleeve_length_type", "Sleeve Length Type"), | |
| 200 | - ("sleeve_style", "Sleeve Style"), | |
| 201 | - ("strap_type", "Strap Type"), | |
| 202 | - ("rise_waistline", "Rise / Waistline"), | |
| 203 | - ("leg_shape", "Leg Shape"), | |
| 204 | - ("skirt_shape", "Skirt Shape"), | |
| 205 | - ("length_type", "Length Type"), | |
| 206 | - ("closure_type", "Closure Type"), | |
| 207 | - ("design_details", "Design Details"), | |
| 208 | - ("fabric", "Fabric"), | |
| 209 | - ("material_composition", "Material Composition"), | |
| 210 | - ("fabric_properties", "Fabric Properties"), | |
| 211 | - ("clothing_features", "Clothing Features"), | |
| 212 | - ("functional_benefits", "Functional Benefits"), | |
| 213 | - ("color", "Color"), | |
| 214 | - ("color_family", "Color Family"), | |
| 215 | - ("print_pattern", "Print / Pattern"), | |
| 216 | - ("occasion_end_use", "Occasion / End Use"), | |
| 217 | - ("style_aesthetic", "Style Aesthetic"), | |
| 218 | -) | |
| 219 | -_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS = tuple( | |
| 220 | - field_name for field_name, _ in _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP | |
| 221 | -) | |
| 222 | 189 | |
| 223 | 190 | |
| 224 | 191 | @dataclass(frozen=True) |
| ... | ... | @@ -228,6 +195,7 @@ class AnalysisSchema: |
| 228 | 195 | markdown_table_headers: Dict[str, List[str]] |
| 229 | 196 | result_fields: Tuple[str, ...] |
| 230 | 197 | meaningful_fields: Tuple[str, ...] |
| 198 | + output_languages: Tuple[str, ...] = ("zh", "en") | |
| 231 | 199 | cache_version: str = "v1" |
| 232 | 200 | field_aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict) |
| 233 | 201 | fallback_headers: Optional[List[str]] = None |
| ... | ... | @@ -249,36 +217,111 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = { |
| 249 | 217 | markdown_table_headers=LANGUAGE_MARKDOWN_TABLE_HEADERS, |
| 250 | 218 | result_fields=_CONTENT_ANALYSIS_RESULT_FIELDS, |
| 251 | 219 | meaningful_fields=_CONTENT_ANALYSIS_MEANINGFUL_FIELDS, |
| 220 | + output_languages=_CORE_INDEX_LANGUAGES, | |
| 252 | 221 | cache_version="v2", |
| 253 | 222 | field_aliases=_CONTENT_ANALYSIS_FIELD_ALIASES, |
| 254 | 223 | quality_fields=_CONTENT_ANALYSIS_QUALITY_FIELDS, |
| 255 | 224 | ), |
| 256 | 225 | } |
| 257 | 226 | |
| 258 | -_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = { | |
| 259 | - "apparel": AnalysisSchema( | |
| 260 | - name="taxonomy:apparel", | |
| 261 | - shared_instruction=TAXONOMY_SHARED_ANALYSIS_INSTRUCTION, | |
| 262 | - markdown_table_headers=TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS, | |
| 263 | - result_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS, | |
| 264 | - meaningful_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS, | |
| 227 | +def _build_taxonomy_profile_schema(profile: str, config: Dict[str, Any]) -> AnalysisSchema: | |
| 228 | + result_fields = tuple(field["key"] for field in config["fields"]) | |
| 229 | + headers = config["markdown_table_headers"] | |
| 230 | + return AnalysisSchema( | |
| 231 | + name=f"taxonomy:{profile}", | |
| 232 | + shared_instruction=config["shared_instruction"], | |
| 233 | + markdown_table_headers=headers, | |
| 234 | + result_fields=result_fields, | |
| 235 | + meaningful_fields=result_fields, | |
| 236 | + output_languages=tuple(config["output_languages"]), | |
| 265 | 237 | cache_version="v1", |
| 266 | - fallback_headers=TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | |
| 267 | - ), | |
| 238 | + fallback_headers=headers.get("en") if len(headers) > 1 else None, | |
| 239 | + ) | |
| 240 | + | |
| 241 | + | |
| 242 | +_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = { | |
| 243 | + profile: _build_taxonomy_profile_schema(profile, config) | |
| 244 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | |
| 268 | 245 | } |
| 269 | 246 | |
| 270 | 247 | _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS: Dict[str, Tuple[Tuple[str, str], ...]] = { |
| 271 | - "apparel": _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP, | |
| 248 | + profile: tuple((field["key"], field["label"]) for field in config["fields"]) | |
| 249 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | |
| 272 | 250 | } |
| 273 | 251 | |
| 274 | 252 | |
| 253 | +def get_supported_category_taxonomy_profiles() -> Tuple[str, ...]: | |
| 254 | + return tuple(_CATEGORY_TAXONOMY_PROFILE_SCHEMAS.keys()) | |
| 255 | + | |
| 256 | + | |
| 257 | +def _normalize_category_hint(text: Any) -> str: | |
| 258 | + value = str(text or "").strip().lower() | |
| 259 | + if not value: | |
| 260 | + return "" | |
| 261 | + value = value.replace("_", " ").replace(">", " ").replace("/", " ") | |
| 262 | + value = re.sub(r"\s+", " ", value) | |
| 263 | + return value | |
| 264 | + | |
| 265 | + | |
| 266 | +_CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: Tuple[Tuple[str, str], ...] = tuple( | |
| 267 | + sorted( | |
| 268 | + ( | |
| 269 | + (_normalize_category_hint(alias), profile) | |
| 270 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | |
| 271 | + for alias in (profile, *tuple(config.get("aliases") or ())) | |
| 272 | + if _normalize_category_hint(alias) | |
| 273 | + ), | |
| 274 | + key=lambda item: len(item[0]), | |
| 275 | + reverse=True, | |
| 276 | + ) | |
| 277 | +) | |
| 278 | + | |
| 279 | + | |
| 275 | 280 | def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str] = None) -> str: |
| 276 | 281 | profile = str(category_taxonomy_profile or _DEFAULT_CATEGORY_TAXONOMY_PROFILE).strip() |
| 277 | 282 | if profile not in _CATEGORY_TAXONOMY_PROFILE_SCHEMAS: |
| 278 | - raise ValueError(f"Unsupported category_taxonomy_profile: {profile}") | |
| 283 | + supported = ", ".join(get_supported_category_taxonomy_profiles()) | |
| 284 | + raise ValueError( | |
| 285 | + f"Unsupported category_taxonomy_profile: {profile}. Supported profiles: {supported}" | |
| 286 | + ) | |
| 279 | 287 | return profile |
| 280 | 288 | |
| 281 | 289 | |
| 290 | +def detect_category_taxonomy_profile(item: Dict[str, Any]) -> Optional[str]: | |
| 291 | + """ | |
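| 292 | + Guess the taxonomy profile from the product's existing category information. | |
| 293 | + Returns None when nothing matches; the caller decides whether to fall back to the default profile. | |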
| 294 | + """ | |
| 295 | + category_hints = ( | |
| 296 | + item.get("category_taxonomy_profile"), | |
| 297 | + item.get("category1_name"), | |
| 298 | + item.get("category_name_text"), | |
| 299 | + item.get("category"), | |
| 300 | + item.get("category_path"), | |
| 301 | + ) | |
| 302 | + for hint in category_hints: | |
| 303 | + normalized_hint = _normalize_category_hint(hint) | |
| 304 | + if not normalized_hint: | |
| 305 | + continue | |
| 306 | + for alias, profile in _CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: | |
| 307 | + if alias and alias in normalized_hint: | |
| 308 | + return profile | |
| 309 | + return None | |
| 310 | + | |
| 311 | + | |
| 312 | +def _resolve_category_taxonomy_profile( | |
| 313 | + item: Dict[str, Any], | |
| 314 | + fallback_profile: Optional[str] = None, | |
| 315 | +) -> str: | |
| 316 | + explicit_profile = str(item.get("category_taxonomy_profile") or "").strip() | |
| 317 | + if explicit_profile: | |
| 318 | + return _normalize_category_taxonomy_profile(explicit_profile) | |
| 319 | + detected_profile = detect_category_taxonomy_profile(item) | |
| 320 | + if detected_profile: | |
| 321 | + return detected_profile | |
| 322 | + return _normalize_category_taxonomy_profile(fallback_profile) | |
| 323 | + | |
| 324 | + | |
| 282 | 325 | def _get_analysis_schema( |
| 283 | 326 | analysis_kind: str, |
| 284 | 327 | *, |
| ... | ... | @@ -299,6 +342,17 @@ def _get_taxonomy_attribute_field_map( |
| 299 | 342 | return _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS[profile] |
| 300 | 343 | |
| 301 | 344 | |
| 345 | +def _get_analysis_output_languages( | |
| 346 | + analysis_kind: str, | |
| 347 | + *, | |
| 348 | + category_taxonomy_profile: Optional[str] = None, | |
| 349 | +) -> Tuple[str, ...]: | |
| 350 | + return _get_analysis_schema( | |
| 351 | + analysis_kind, | |
| 352 | + category_taxonomy_profile=category_taxonomy_profile, | |
| 353 | + ).output_languages | |
| 354 | + | |
| 355 | + | |
| 302 | 356 | def _normalize_enrichment_scopes( |
| 303 | 357 | enrichment_scopes: Optional[List[str]] = None, |
| 304 | 358 | ) -> Tuple[str, ...]: |
| ... | ... | @@ -508,6 +562,11 @@ def _normalize_index_content_item(item: Dict[str, Any]) -> Dict[str, str]: |
| 508 | 562 | "brief": str(item.get("brief") or "").strip(), |
| 509 | 563 | "description": str(item.get("description") or "").strip(), |
| 510 | 564 | "image_url": str(item.get("image_url") or "").strip(), |
| 565 | + "category": str(item.get("category") or "").strip(), | |
| 566 | + "category_path": str(item.get("category_path") or "").strip(), | |
| 567 | + "category_name_text": str(item.get("category_name_text") or "").strip(), | |
| 568 | + "category1_name": str(item.get("category1_name") or "").strip(), | |
| 569 | + "category_taxonomy_profile": str(item.get("category_taxonomy_profile") or "").strip(), | |
| 511 | 570 | } |
| 512 | 571 | |
| 513 | 572 | |
| ... | ... | @@ -525,7 +584,8 @@ def build_index_content_fields( |
| 525 | 584 | - `title` |
| 526 | 585 | - optional `brief` / `description` / `image_url` |
| 527 | 586 | - optional `enrichment_scopes`; runs both `generic` and `category_taxonomy` by default |
| 528 | - - optional `category_taxonomy_profile`; defaults to `apparel` | |
| 587 | + - optional `category_taxonomy_profile`; if omitted, the profile is first inferred from the item's own category fields, otherwise it falls back to the default `apparel` | |
| 588 | + - optional category hint fields: `category` / `category_path` / `category_name_text` / `category1_name` | |
| 529 | 589 | |
| 530 | 590 | Returned item structure: |
| 531 | 591 | - `id` |
| ... | ... | @@ -540,10 +600,21 @@ def build_index_content_fields( |
| 540 | 600 | - `enriched_tags.{lang}` is an array of tags |
| 541 | 601 | """ |
| 542 | 602 | requested_enrichment_scopes = _normalize_enrichment_scopes(enrichment_scopes) |
| 543 | - normalized_taxonomy_profile = _normalize_category_taxonomy_profile(category_taxonomy_profile) | |
| 603 | + fallback_taxonomy_profile = ( | |
| 604 | + _normalize_category_taxonomy_profile(category_taxonomy_profile) | |
| 605 | + if category_taxonomy_profile | |
| 606 | + else None | |
| 607 | + ) | |
| 544 | 608 | normalized_items = [_normalize_index_content_item(item) for item in items] |
| 545 | 609 | if not normalized_items: |
| 546 | 610 | return [] |
| 611 | + taxonomy_profile_by_id = { | |
| 612 | + item["id"]: _resolve_category_taxonomy_profile( | |
| 613 | + item, | |
| 614 | + fallback_profile=fallback_taxonomy_profile, | |
| 615 | + ) | |
| 616 | + for item in normalized_items | |
| 617 | + } | |
| 547 | 618 | |
| 548 | 619 | results_by_id: Dict[str, Dict[str, Any]] = { |
| 549 | 620 | item["id"]: { |
| ... | ... | @@ -556,7 +627,7 @@ def build_index_content_fields( |
| 556 | 627 | for item in normalized_items |
| 557 | 628 | } |
| 558 | 629 | |
| 559 | - for lang in _CORE_INDEX_LANGUAGES: | |
| 630 | + for lang in _get_analysis_output_languages("content"): | |
| 560 | 631 | if "generic" in requested_enrichment_scopes: |
| 561 | 632 | try: |
| 562 | 633 | rows = analyze_products( |
| ... | ... | @@ -565,7 +636,7 @@ def build_index_content_fields( |
| 565 | 636 | batch_size=BATCH_SIZE, |
| 566 | 637 | tenant_id=tenant_id, |
| 567 | 638 | analysis_kind="content", |
| 568 | - category_taxonomy_profile=normalized_taxonomy_profile, | |
| 639 | + category_taxonomy_profile=fallback_taxonomy_profile, | |
| 569 | 640 | ) |
| 570 | 641 | except Exception as e: |
| 571 | 642 | logger.warning("build_index_content_fields content enrichment failed for lang=%s: %s", lang, e) |
| ... | ... | @@ -582,39 +653,49 @@ def build_index_content_fields( |
| 582 | 653 | continue |
| 583 | 654 | _apply_index_content_row(results_by_id[item_id], row=row, lang=lang) |
| 584 | 655 | |
| 585 | - if "category_taxonomy" in requested_enrichment_scopes: | |
| 586 | - try: | |
| 587 | - taxonomy_rows = analyze_products( | |
| 588 | - products=normalized_items, | |
| 589 | - target_lang=lang, | |
| 590 | - batch_size=BATCH_SIZE, | |
| 591 | - tenant_id=tenant_id, | |
| 592 | - analysis_kind="taxonomy", | |
| 593 | - category_taxonomy_profile=normalized_taxonomy_profile, | |
| 594 | - ) | |
| 595 | - except Exception as e: | |
| 596 | - logger.warning( | |
| 597 | - "build_index_content_fields taxonomy enrichment failed for lang=%s: %s", | |
| 598 | - lang, | |
| 599 | - e, | |
| 600 | - ) | |
| 601 | - for item in normalized_items: | |
| 602 | - results_by_id[item["id"]].setdefault("error", str(e)) | |
| 603 | - continue | |
| 656 | + if "category_taxonomy" in requested_enrichment_scopes: | |
| 657 | + items_by_profile: Dict[str, List[Dict[str, str]]] = {} | |
| 658 | + for item in normalized_items: | |
| 659 | + items_by_profile.setdefault(taxonomy_profile_by_id[item["id"]], []).append(item) | |
| 604 | 660 | |
| 605 | - for row in taxonomy_rows or []: | |
| 606 | - item_id = str(row.get("id") or "").strip() | |
| 607 | - if not item_id or item_id not in results_by_id: | |
| 608 | - continue | |
| 609 | - if row.get("error"): | |
| 610 | - results_by_id[item_id].setdefault("error", row["error"]) | |
| 661 | + for taxonomy_profile, profile_items in items_by_profile.items(): | |
| 662 | + for lang in _get_analysis_output_languages( | |
| 663 | + "taxonomy", | |
| 664 | + category_taxonomy_profile=taxonomy_profile, | |
| 665 | + ): | |
| 666 | + try: | |
| 667 | + taxonomy_rows = analyze_products( | |
| 668 | + products=profile_items, | |
| 669 | + target_lang=lang, | |
| 670 | + batch_size=BATCH_SIZE, | |
| 671 | + tenant_id=tenant_id, | |
| 672 | + analysis_kind="taxonomy", | |
| 673 | + category_taxonomy_profile=taxonomy_profile, | |
| 674 | + ) | |
| 675 | + except Exception as e: | |
| 676 | + logger.warning( | |
| 677 | + "build_index_content_fields taxonomy enrichment failed for profile=%s lang=%s: %s", | |
| 678 | + taxonomy_profile, | |
| 679 | + lang, | |
| 680 | + e, | |
| 681 | + ) | |
| 682 | + for item in profile_items: | |
| 683 | + results_by_id[item["id"]].setdefault("error", str(e)) | |
| 611 | 684 | continue |
| 612 | - _apply_index_taxonomy_row( | |
| 613 | - results_by_id[item_id], | |
| 614 | - row=row, | |
| 615 | - lang=lang, | |
| 616 | - category_taxonomy_profile=normalized_taxonomy_profile, | |
| 617 | - ) | |
| 685 | + | |
| 686 | + for row in taxonomy_rows or []: | |
| 687 | + item_id = str(row.get("id") or "").strip() | |
| 688 | + if not item_id or item_id not in results_by_id: | |
| 689 | + continue | |
| 690 | + if row.get("error"): | |
| 691 | + results_by_id[item_id].setdefault("error", row["error"]) | |
| 692 | + continue | |
| 693 | + _apply_index_taxonomy_row( | |
| 694 | + results_by_id[item_id], | |
| 695 | + row=row, | |
| 696 | + lang=lang, | |
| 697 | + category_taxonomy_profile=taxonomy_profile, | |
| 698 | + ) | |
| 618 | 699 | |
| 619 | 700 | return [results_by_id[item["id"]] for item in normalized_items] |
| 620 | 701 | ... | ... |
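The alias matching added in `indexer/product_enrich.py` above sorts aliases longest-first before doing substring checks. A standalone sketch of that idea is given below; the alias table is hypothetical (the real aliases live in `CATEGORY_TAXONOMY_PROFILES`, which is not fully visible in this diff), and the function names are simplified stand-ins.

```python
import re
from typing import Dict, Optional, Tuple


def normalize_hint(text: object) -> str:
    """Lowercase, turn '_', '>' and '/' into spaces, then collapse whitespace."""
    value = str(text or "").strip().lower()
    value = value.replace("_", " ").replace(">", " ").replace("/", " ")
    return re.sub(r"\s+", " ", value).strip()


# Hypothetical alias table for illustration only.
ALIASES: Dict[str, Tuple[str, ...]] = {
    "pet_supplies": ("pet", "dog food"),
    "home_appliances": ("appliance",),
    "home_living": ("home",),
}

# Longest alias first, so "home appliances" is tried before the shorter "home".
MATCHERS = tuple(
    sorted(
        (
            (normalize_hint(alias), profile)
            for profile, aliases in ALIASES.items()
            for alias in (profile, *aliases)
        ),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
)


def detect_profile(hint: object) -> Optional[str]:
    """Return the first profile whose alias occurs in the hint, else None."""
    normalized = normalize_hint(hint)
    if not normalized:
        return None
    for alias, profile in MATCHERS:
        if alias in normalized:
            return profile
    return None
```

Without the length-based ordering, a hint such as `Home Appliances > Kitchen` could match the short alias `home` first and be routed to `home_living`. Plain substring matching can still over-match (in this sketch, `pet` also occurs inside `carpet`), so short aliases deserve care.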
indexer/product_enrich_prompts.py
| 1 | 1 | #!/usr/bin/env python3 |
| 2 | 2 | |
| 3 | -from typing import Any, Dict | |
| 3 | +from typing import Any, Dict, Tuple | |
| 4 | 4 | |
| 5 | 5 | SYSTEM_MESSAGE = ( |
| 6 | 6 | "You are an e-commerce product annotator. " |
| ... | ... | @@ -33,110 +33,362 @@ Input product list: |
| 33 | 33 | USER_INSTRUCTION_TEMPLATE = """Please strictly return a Markdown table following the given columns in the specified language. For any column containing multiple values, separate them with commas. Do not add any other explanation. |
| 34 | 34 | Language: {language}""" |
| 35 | 35 | |
| 36 | -TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = """Analyze each input product text and fill the columns below using an apparel attribute taxonomy. | |
| 36 | +def _taxonomy_field( | |
| 37 | + key: str, | |
| 38 | + label: str, | |
| 39 | + description: str, | |
| 40 | + zh_label: str | None = None, | |
| 41 | +) -> Dict[str, str]: | |
| 42 | + return { | |
| 43 | + "key": key, | |
| 44 | + "label": label, | |
| 45 | + "description": description, | |
| 46 | + "zh_label": zh_label or label, | |
| 47 | + } | |
| 37 | 48 | |
| 38 | -Output columns: | |
| 39 | -1. Product Type: concise ecommerce apparel category label, not a full marketing title | |
| 40 | -2. Target Gender: intended gender only if clearly implied | |
| 41 | -3. Age Group: only if clearly implied, e.g. adults, kids, teens, toddlers, babies | |
| 42 | -4. Season: season(s) or all-season suitability only if supported | |
| 43 | -5. Fit: body closeness, e.g. slim, regular, relaxed, oversized, fitted | |
| 44 | -6. Silhouette: overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg | |
| 45 | -7. Neckline: neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck | |
| 46 | -8. Sleeve Length Type: sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve | |
| 47 | -9. Sleeve Style: sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve | |
| 48 | -10. Strap Type: strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap | |
| 49 | -11. Rise / Waistline: waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist | |
| 50 | -12. Leg Shape: for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg | |
| 51 | -13. Skirt Shape: for skirts only, e.g. A-line, pleated, pencil, mermaid | |
| 52 | -14. Length Type: design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length | |
| 53 | -15. Closure Type: fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop | |
| 54 | -16. Design Details: construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem | |
| 55 | -17. Fabric: fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill | |
| 56 | -18. Material Composition: fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane | |
| 57 | -19. Fabric Properties: inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant | |
| 58 | -20. Clothing Features: product features, e.g. lined, reversible, hooded, packable, padded, pocketed | |
| 59 | -21. Functional Benefits: wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression | |
| 60 | -22. Color: specific color name when available | |
| 61 | -23. Color Family: normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray | |
| 62 | -24. Print / Pattern: surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print | |
| 63 | -25. Occasion / End Use: likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor | |
| 64 | -26. Style Aesthetic: overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful | |
| 65 | 49 | |
| 66 | -Rules: | |
| 67 | -- Keep the same row order and row count as input. | |
| 68 | -- Infer only from the provided product text. | |
| 69 | -- Leave blank if not applicable or not reasonably supported. | |
| 70 | -- Use concise, standardized ecommerce wording. | |
| 71 | -- Do not combine different attribute dimensions in one field. | |
| 72 | -- If multiple values are needed, use the delimiter required by the localization setting. | |
| 50 | +def _build_taxonomy_shared_instruction(profile_label: str, fields: Tuple[Dict[str, str], ...]) -> str: | |
| 51 | + lines = [ | |
| 52 | + f"Analyze each input product text and fill the columns below using a {profile_label} attribute taxonomy.", | |
| 53 | + "", | |
| 54 | + "Output columns:", | |
| 55 | + ] | |
| 56 | + for idx, field in enumerate(fields, start=1): | |
| 57 | + lines.append(f"{idx}. {field['label']}: {field['description']}") | |
| 58 | + lines.extend( | |
| 59 | + [ | |
| 60 | + "", | |
| 61 | + "Rules:", | |
| 62 | + "- Keep the same row order and row count as input.", | |
| 63 | + "- Infer only from the provided product text.", | |
| 64 | + "- Leave blank if not applicable or not reasonably supported.", | |
| 65 | + "- Use concise, standardized ecommerce wording.", | |
| 66 | + "- Do not combine different attribute dimensions in one field.", | |
| 67 | + "- If multiple values are needed, use the delimiter required by the localization setting.", | |
| 68 | + "", | |
| 69 | + "Input product list:", | |
| 70 | + ] | |
| 71 | + ) | |
| 72 | + return "\n".join(lines) | |
| 73 | 73 | |
| 74 | -Input product list: | |
| 75 | -""" | |
| 76 | 74 | |
| 77 | -TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = [ | |
| 78 | - "No.", | |
| 79 | - "Product Type", | |
| 80 | - "Target Gender", | |
| 81 | - "Age Group", | |
| 82 | - "Season", | |
| 83 | - "Fit", | |
| 84 | - "Silhouette", | |
| 85 | - "Neckline", | |
| 86 | - "Sleeve Length Type", | |
| 87 | - "Sleeve Style", | |
| 88 | - "Strap Type", | |
| 89 | - "Rise / Waistline", | |
| 90 | - "Leg Shape", | |
| 91 | - "Skirt Shape", | |
| 92 | - "Length Type", | |
| 93 | - "Closure Type", | |
| 94 | - "Design Details", | |
| 95 | - "Fabric", | |
| 96 | - "Material Composition", | |
| 97 | - "Fabric Properties", | |
| 98 | - "Clothing Features", | |
| 99 | - "Functional Benefits", | |
| 100 | - "Color", | |
| 101 | - "Color Family", | |
| 102 | - "Print / Pattern", | |
| 103 | - "Occasion / End Use", | |
| 104 | - "Style Aesthetic", | |
| 105 | -] | |
| 75 | +def _make_taxonomy_profile( | |
| 76 | + profile_label: str, | |
| 77 | + fields: Tuple[Dict[str, str], ...], | |
| 78 | + *, | |
| 79 | + aliases: Tuple[str, ...], | |
| 80 | + output_languages: Tuple[str, ...] = ("en",), | |
| 81 | + zh_headers: Tuple[str, ...] = (), | |
| 82 | +) -> Dict[str, Any]: | |
| 83 | + headers = {"en": ["No.", *[field["label"] for field in fields]]} | |
| 84 | + if zh_headers: | |
| 85 | + headers["zh"] = ["序号", *zh_headers] | |
| 86 | + return { | |
| 87 | + "profile_label": profile_label, | |
| 88 | + "fields": fields, | |
| 89 | + "aliases": aliases, | |
| 90 | + "output_languages": output_languages, | |
| 91 | + "shared_instruction": _build_taxonomy_shared_instruction(profile_label, fields), | |
| 92 | + "markdown_table_headers": headers, | |
| 93 | + } | |
| 106 | 94 | |
| 107 | -TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { | |
| 108 | - "en": TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | |
| 109 | - "zh": [ | |
| 110 | - "序号", | |
| 111 | - "品类", | |
| 112 | - "目标性别", | |
| 113 | - "年龄段", | |
| 114 | - "适用季节", | |
| 115 | - "版型", | |
| 116 | - "廓形", | |
| 117 | - "领型", | |
| 118 | - "袖长类型", | |
| 119 | - "袖型", | |
| 120 | - "肩带设计", | |
| 121 | - "腰型", | |
| 122 | - "裤型", | |
| 123 | - "裙型", | |
| 124 | - "长度类型", | |
| 125 | - "闭合方式", | |
| 126 | - "设计细节", | |
| 127 | - "面料", | |
| 128 | - "成分", | |
| 129 | - "面料特性", | |
| 130 | - "服装特征", | |
| 131 | - "功能", | |
| 132 | - "主颜色", | |
| 133 | - "色系", | |
| 134 | - "印花 / 图案", | |
| 135 | - "适用场景", | |
| 136 | - "风格", | |
| 137 | - ], | |
| 95 | + | |
| 96 | +APPAREL_TAXONOMY_FIELDS = ( | |
| 97 | + _taxonomy_field("product_type", "Product Type", "concise ecommerce apparel category label, not a full marketing title", "品类"), | |
| 98 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"), | |
| 99 | + _taxonomy_field("age_group", "Age Group", "only if clearly implied, e.g. adults, kids, teens, toddlers, babies", "年龄段"), | |
| 100 | + _taxonomy_field("season", "Season", "season(s) or all-season suitability only if supported", "适用季节"), | |
| 101 | + _taxonomy_field("fit", "Fit", "body closeness, e.g. slim, regular, relaxed, oversized, fitted", "版型"), | |
| 102 | + _taxonomy_field("silhouette", "Silhouette", "overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg", "廓形"), | |
| 103 | + _taxonomy_field("neckline", "Neckline", "neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck", "领型"), | |
| 104 | + _taxonomy_field("sleeve_length_type", "Sleeve Length Type", "sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve", "袖长类型"), | |
| 105 | + _taxonomy_field("sleeve_style", "Sleeve Style", "sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve", "袖型"), | |
| 106 | + _taxonomy_field("strap_type", "Strap Type", "strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap", "肩带设计"), | |
| 107 | + _taxonomy_field("rise_waistline", "Rise / Waistline", "waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist", "腰型"), | |
| 108 | + _taxonomy_field("leg_shape", "Leg Shape", "for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg", "裤型"), | |
| 109 | + _taxonomy_field("skirt_shape", "Skirt Shape", "for skirts only, e.g. A-line, pleated, pencil, mermaid", "裙型"), | |
| 110 | + _taxonomy_field("length_type", "Length Type", "design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length", "长度类型"), | |
| 111 | + _taxonomy_field("closure_type", "Closure Type", "fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop", "闭合方式"), | |
| 112 | + _taxonomy_field("design_details", "Design Details", "construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem", "设计细节"), | |
| 113 | + _taxonomy_field("fabric", "Fabric", "fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill", "面料"), | |
| 114 | + _taxonomy_field("material_composition", "Material Composition", "fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane", "成分"), | |
| 115 | + _taxonomy_field("fabric_properties", "Fabric Properties", "inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant", "面料特性"), | |
| 116 | + _taxonomy_field("clothing_features", "Clothing Features", "product features, e.g. lined, reversible, hooded, packable, padded, pocketed", "服装特征"), | |
| 117 | + _taxonomy_field("functional_benefits", "Functional Benefits", "wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression", "功能"), | |
| 118 | + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"), | |
| 119 | + _taxonomy_field("color_family", "Color Family", "normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray", "色系"), | |
| 120 | + _taxonomy_field("print_pattern", "Print / Pattern", "surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print", "印花 / 图案"), | |
| 121 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor", "适用场景"), | |
| 122 | + _taxonomy_field("style_aesthetic", "Style Aesthetic", "overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful", "风格"), | |
| 123 | +) | |
| 124 | + | |
| 125 | +THREE_C_TAXONOMY_FIELDS = ( | |
| 126 | + _taxonomy_field("product_type", "Product Type", "concise 3C accessory or peripheral category label"), | |
| 127 | + _taxonomy_field("compatible_device", "Compatible Device / Model", "supported device family, series, model, or form factor when clearly stated"), | |
| 128 | + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, wireless, Bluetooth, Wi-Fi, NFC, or 2.4G"), | |
| 129 | + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant connector or port, e.g. USB-C, Lightning, HDMI, AUX, RJ45"), | |
| 130 | + _taxonomy_field("power_charging", "Power Source / Charging", "charging or power mode, e.g. battery powered, fast charging, rechargeable, plug-in"), | |
| 131 | + _taxonomy_field("key_features", "Key Features", "primary hardware features such as noise cancelling, foldable, magnetic, backlit, waterproof"), | |
| 132 | + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"), | |
| 133 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 134 | + _taxonomy_field("pack_size", "Pack Size", "unit count or bundle size when stated"), | |
| 135 | + _taxonomy_field("use_case", "Use Case", "intended usage such as travel, office, gaming, car, charging, streaming"), | |
| 136 | +) | |
| 137 | + | |
| 138 | +BAGS_TAXONOMY_FIELDS = ( | |
| 139 | + _taxonomy_field("product_type", "Product Type", "concise bag category such as backpack, tote bag, crossbody bag, luggage, or wallet"), | |
| 140 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | |
| 141 | + _taxonomy_field("carry_style", "Carry Style", "how the bag is worn or carried, e.g. handheld, shoulder, crossbody, backpack"), | |
| 142 | + _taxonomy_field("size_capacity", "Size / Capacity", "size tier or capacity when supported, e.g. mini, large capacity, 20L"), | |
| 143 | + _taxonomy_field("material", "Material", "main bag material such as leather, nylon, canvas, PU, straw"), | |
| 144 | + _taxonomy_field("closure_type", "Closure Type", "bag closure such as zipper, flap, buckle, drawstring, magnetic snap"), | |
| 145 | + _taxonomy_field("structure_compartments", "Structure / Compartments", "organizational structure such as multi-pocket, laptop sleeve, card slots, expandable"), | |
| 146 | + _taxonomy_field("strap_handle_type", "Strap / Handle Type", "strap or handle design such as chain strap, top handle, adjustable strap"), | |
| 147 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 148 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as commute, travel, evening, school, casual"), | |
| 149 | +) | |
| 150 | + | |
| 151 | +PET_SUPPLIES_TAXONOMY_FIELDS = ( | |
| 152 | + _taxonomy_field("product_type", "Product Type", "concise pet supplies category label"), | |
| 153 | + _taxonomy_field("pet_type", "Pet Type", "target pet such as dog, cat, bird, fish, hamster"), | |
| 154 | + _taxonomy_field("breed_size", "Breed Size", "pet size or breed size when stated, e.g. small breed, large dogs"), | |
| 155 | + _taxonomy_field("life_stage", "Life Stage", "pet age stage when supported, e.g. puppy, kitten, adult, senior"), | |
| 156 | + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredient composition when supported"), | |
| 157 | + _taxonomy_field("flavor_scent", "Flavor / Scent", "flavor or scent when applicable"), | |
| 158 | + _taxonomy_field("key_features", "Key Features", "primary attributes such as interactive, leak-proof, orthopedic, washable, elevated"), | |
| 159 | + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as dental care, calming, digestion support, joint support"), | |
| 160 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, or net content when stated"), | |
| 161 | + _taxonomy_field("use_scenario", "Use Scenario", "usage such as feeding, training, grooming, travel, indoor play"), | |
| 162 | +) | |
| 163 | + | |
| 164 | +ELECTRONICS_TAXONOMY_FIELDS = ( | |
| 165 | + _taxonomy_field("product_type", "Product Type", "concise electronics device or component category label"), | |
| 166 | + _taxonomy_field("device_category", "Device Category / Compatibility", "supported platform, component class, or compatible device family when stated"), | |
| 167 | + _taxonomy_field("power_voltage", "Power / Voltage", "power, voltage, wattage, or battery spec when supported"), | |
| 168 | + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, Bluetooth, Wi-Fi, RF, or smart app control"), | |
| 169 | + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant port or interface such as USB-C, AC plug type, HDMI, SATA"), | |
| 170 | + _taxonomy_field("capacity_storage", "Capacity / Storage", "capacity or storage spec such as 256GB, 2TB, 5000mAh"), | |
| 171 | + _taxonomy_field("key_features", "Key Features", "main product features such as touch control, HD display, noise reduction, smart control"), | |
| 172 | + _taxonomy_field("material_finish", "Material / Finish", "main housing material or finish when supported"), | |
| 173 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 174 | + _taxonomy_field("use_case", "Use Case", "intended use such as home entertainment, office, charging, security, repair"), | |
| 175 | +) | |
| 176 | + | |
| 177 | +OUTDOOR_TAXONOMY_FIELDS = ( | |
| 178 | + _taxonomy_field("product_type", "Product Type", "concise outdoor gear category label"), | |
| 179 | + _taxonomy_field("activity_type", "Activity Type", "primary outdoor activity such as camping, hiking, fishing, climbing, travel"), | |
| 180 | + _taxonomy_field("season_weather", "Season / Weather", "season or weather suitability when supported"), | |
| 181 | + _taxonomy_field("material", "Material", "main material such as aluminum, ripstop nylon, stainless steel, EVA"), | |
| 182 | + _taxonomy_field("capacity_size", "Capacity / Size", "size, length, or capacity when stated"), | |
| 183 | + _taxonomy_field("protection_resistance", "Protection / Resistance", "resistance or protection such as waterproof, UV resistant, windproof"), | |
| 184 | + _taxonomy_field("key_features", "Key Features", "primary gear attributes such as foldable, lightweight, insulated, non-slip"), | |
| 185 | + _taxonomy_field("portability_packability", "Portability / Packability", "carry or storage trait such as collapsible, compact, ultralight, packable"), | |
| 186 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 187 | + _taxonomy_field("use_scenario", "Use Scenario", "likely use setting such as campsite, trail, survival kit, beach, picnic"), | |
| 188 | +) | |
| 189 | + | |
| 190 | +HOME_APPLIANCES_TAXONOMY_FIELDS = ( | |
| 191 | + _taxonomy_field("product_type", "Product Type", "concise home appliance category label"), | |
| 192 | + _taxonomy_field("appliance_category", "Appliance Category", "functional class such as kitchen appliance, cleaning appliance, personal care appliance"), | |
| 193 | + _taxonomy_field("power_voltage", "Power / Voltage", "wattage, voltage, plug type, or power supply when supported"), | |
| 194 | + _taxonomy_field("capacity_coverage", "Capacity / Coverage", "capacity or coverage metric such as 1.5L, 20L, 40sqm"), | |
| 195 | + _taxonomy_field("control_method", "Control Method", "operation method such as touch, knob, remote, app control"), | |
| 196 | + _taxonomy_field("installation_type", "Installation Type", "setup style such as countertop, handheld, portable, wall-mounted, built-in"), | |
| 197 | + _taxonomy_field("key_features", "Key Features", "main product features such as timer, steam, HEPA filter, self-cleaning"), | |
| 198 | + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"), | |
| 199 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 200 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as cooking, cleaning, grooming, cooling, air treatment"), | |
| 201 | +) | |
| 202 | + | |
| 203 | +HOME_LIVING_TAXONOMY_FIELDS = ( | |
| 204 | + _taxonomy_field("product_type", "Product Type", "concise home and living category label"), | |
| 205 | + _taxonomy_field("room_placement", "Room / Placement", "intended room or placement such as bedroom, kitchen, bathroom, desktop"), | |
| 206 | + _taxonomy_field("material", "Material", "main material such as wood, ceramic, cotton, glass, metal"), | |
| 207 | + _taxonomy_field("style", "Style", "home style such as modern, farmhouse, minimalist, boho, Nordic"), | |
| 208 | + _taxonomy_field("size_dimensions", "Size / Dimensions", "size or dimensions when stated"), | |
| 209 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 210 | + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface pattern or finish such as solid, marble, matte, ribbed"), | |
| 211 | + _taxonomy_field("key_features", "Key Features", "main product features such as stackable, washable, blackout, space-saving"), | |
| 212 | + _taxonomy_field("assembly_installation", "Assembly / Installation", "assembly or installation trait when supported"), | |
| 213 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as storage, dining, decor, sleep, organization"), | |
| 214 | +) | |
| 215 | + | |
| 216 | +WIGS_TAXONOMY_FIELDS = ( | |
| 217 | + _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label"), | |
| 218 | + _taxonomy_field("hair_material", "Hair Material", "hair material such as human hair, synthetic fiber, heat-resistant fiber"), | |
| 219 | + _taxonomy_field("hair_texture", "Hair Texture", "texture or curl pattern such as straight, body wave, curly, kinky"), | |
| 220 | + _taxonomy_field("hair_length", "Hair Length", "hair length when stated"), | |
| 221 | + _taxonomy_field("hair_color", "Hair Color", "specific hair color or blend when available"), | |
| 222 | + _taxonomy_field("cap_construction", "Cap Construction", "cap type such as full lace, lace front, glueless, U part"), | |
| 223 | + _taxonomy_field("lace_area_part_type", "Lace Area / Part Type", "lace size or part style such as 13x4 lace, middle part, T part"), | |
| 224 | + _taxonomy_field("density_volume", "Density / Volume", "hair density or fullness when supported"), | |
| 225 | + _taxonomy_field("style_bang_type", "Style / Bang Type", "style cue such as bob, pixie, layered, with bangs"), | |
| 226 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "intended use such as daily wear, cosplay, protective style, party"), | |
| 227 | +) | |
| 228 | + | |
| 229 | +BEAUTY_TAXONOMY_FIELDS = ( | |
| 230 | + _taxonomy_field("product_type", "Product Type", "concise beauty or cosmetics category label"), | |
| 231 | + _taxonomy_field("target_area", "Target Area", "target area such as face, lips, eyes, nails, hair, body"), | |
| 232 | + _taxonomy_field("skin_hair_type", "Skin Type / Hair Type", "suitable skin or hair type when supported"), | |
| 233 | + _taxonomy_field("finish_effect", "Finish / Effect", "cosmetic finish or effect such as matte, dewy, volumizing, brightening"), | |
| 234 | + _taxonomy_field("key_ingredients", "Key Ingredients", "notable ingredients when stated"), | |
| 235 | + _taxonomy_field("shade_color", "Shade / Color", "specific shade or color when available"), | |
| 236 | + _taxonomy_field("scent", "Scent", "fragrance or scent only when supported"), | |
| 237 | + _taxonomy_field("formulation", "Formulation", "product form such as cream, serum, powder, gel, stick"), | |
| 238 | + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as hydration, anti-aging, long-wear, repair, sun protection"), | |
| 239 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as daily routine, salon, travel, evening makeup"), | |
| 240 | +) | |
| 241 | + | |
| 242 | +ACCESSORIES_TAXONOMY_FIELDS = ( | |
| 243 | + _taxonomy_field("product_type", "Product Type", "concise accessory category label such as necklace, watch, belt, hat, or sunglasses"), | |
| 244 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | |
| 245 | + _taxonomy_field("material", "Material", "main material such as alloy, leather, stainless steel, acetate, fabric"), | |
| 246 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 247 | + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface treatment or style finish such as polished, textured, braided, rhinestone"), | |
| 248 | + _taxonomy_field("closure_fastening", "Closure / Fastening", "fastening method when applicable"), | |
| 249 | + _taxonomy_field("size_fit", "Size / Fit", "size or fit information such as adjustable, one size, 42mm"), | |
| 250 | + _taxonomy_field("style", "Style", "style cue such as minimalist, vintage, statement, sporty"), | |
| 251 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as daily wear, formal, party, travel, sun protection"), | |
| 252 | + _taxonomy_field("set_pack_size", "Set / Pack Size", "set count or pack size when stated"), | |
| 253 | +) | |
| 254 | + | |
| 255 | +TOYS_TAXONOMY_FIELDS = ( | |
| 256 | + _taxonomy_field("product_type", "Product Type", "concise toy category label"), | |
| 257 | + _taxonomy_field("age_group", "Age Group", "intended age group when clearly implied"), | |
| 258 | + _taxonomy_field("character_theme", "Character / Theme", "licensed character, theme, or play theme when supported"), | |
| 259 | + _taxonomy_field("material", "Material", "main toy material such as plush, plastic, wood, silicone"), | |
| 260 | + _taxonomy_field("power_source", "Power Source", "battery, rechargeable, wind-up, or non-powered when supported"), | |
| 261 | + _taxonomy_field("interactive_features", "Interactive Features", "interactive functions such as sound, lights, remote control, motion"), | |
| 262 | + _taxonomy_field("educational_play_value", "Educational / Play Value", "play value such as STEM, pretend play, sensory, puzzle solving"), | |
| 263 | + _taxonomy_field("piece_count_size", "Piece Count / Size", "piece count or size when stated"), | |
| 264 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 265 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as indoor play, bath time, party favor, outdoor play"), | |
| 266 | +) | |
| 267 | + | |
| 268 | +SHOES_TAXONOMY_FIELDS = ( | |
| 269 | + _taxonomy_field("product_type", "Product Type", "concise footwear category label"), | |
| 270 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | |
| 271 | + _taxonomy_field("age_group", "Age Group", "only if clearly implied"), | |
| 272 | + _taxonomy_field("closure_type", "Closure Type", "fastening method such as lace-up, slip-on, buckle, hook-and-loop"), | |
| 273 | + _taxonomy_field("toe_shape", "Toe Shape", "toe shape when applicable, e.g. round toe, pointed toe, open toe"), | |
| 274 | + _taxonomy_field("heel_sole_type", "Heel Height / Sole Type", "heel or sole profile such as flat, block heel, wedge, platform, thick sole"), | |
| 275 | + _taxonomy_field("upper_material", "Upper Material", "main upper material such as leather, knit, canvas, mesh"), | |
| 276 | + _taxonomy_field("lining_insole_material", "Lining / Insole Material", "lining or insole material when supported"), | |
| 277 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 278 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as running, casual, office, hiking, formal"), | |
| 279 | +) | |
| 280 | + | |
| 281 | +SPORTS_TAXONOMY_FIELDS = ( | |
| 282 | + _taxonomy_field("product_type", "Product Type", "concise sports product category label"), | |
| 283 | + _taxonomy_field("sport_activity", "Sport / Activity", "primary sport or activity such as fitness, yoga, basketball, cycling, swimming"), | |
| 284 | + _taxonomy_field("skill_level", "Skill Level", "target user level when supported, e.g. beginner, training, professional"), | |
| 285 | + _taxonomy_field("material", "Material", "main material such as EVA, carbon fiber, neoprene, latex"), | |
| 286 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, weight, resistance level, or capacity when stated"), | |
| 287 | + _taxonomy_field("protection_support", "Protection / Support", "support or protection function such as ankle support, shock absorption, impact protection"), | |
| 288 | + _taxonomy_field("key_features", "Key Features", "main features such as anti-slip, adjustable, foldable, quick-dry"), | |
| 289 | + _taxonomy_field("power_source", "Power Source", "battery, electric, or non-powered when applicable"), | |
| 290 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 291 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as gym, home workout, field training, competition"), | |
| 292 | +) | |
| 293 | + | |
| 294 | +OTHERS_TAXONOMY_FIELDS = ( | |
| 295 | + _taxonomy_field("product_type", "Product Type", "concise product category label, not a full marketing title"), | |
| 296 | + _taxonomy_field("product_category", "Product Category", "broader retail grouping when the specific product type is narrow"), | |
| 297 | + _taxonomy_field("target_user", "Target User", "intended user, audience, or recipient when clearly implied"), | |
| 298 | + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredients when supported"), | |
| 299 | + _taxonomy_field("key_features", "Key Features", "primary product attributes or standout features"), | |
| 300 | + _taxonomy_field("functional_benefits", "Functional Benefits", "practical benefits or performance advantages when supported"), | |
| 301 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, weight, or capacity when stated"), | |
| 302 | + _taxonomy_field("color", "Color", "specific color name when available"), | |
| 303 | + _taxonomy_field("style_theme", "Style / Theme", "overall style, design theme, or visual direction when supported"), | |
| 304 | + _taxonomy_field("use_scenario", "Use Scenario", "likely use occasion or application setting when supported"), | |
| 305 | +) | |
| 306 | + | |
| 307 | +CATEGORY_TAXONOMY_PROFILES: Dict[str, Dict[str, Any]] = { | |
| 308 | + "apparel": _make_taxonomy_profile( | |
| 309 | + "apparel", | |
| 310 | + APPAREL_TAXONOMY_FIELDS, | |
| 311 | + aliases=("服装", "服饰", "apparel", "clothing", "fashion"), | |
| 312 | + output_languages=("zh", "en"), | |
| 313 | + zh_headers=tuple(field["zh_label"] for field in APPAREL_TAXONOMY_FIELDS), | |
| 314 | + ), | |
| 315 | + "3c": _make_taxonomy_profile( | |
| 316 | + "3C", | |
| 317 | + THREE_C_TAXONOMY_FIELDS, | |
| 318 | + aliases=("3c", "数码", "phone accessories", "computer peripherals", "smart wearables", "audio", "gaming gear"), | |
| 319 | + ), | |
| 320 | + "bags": _make_taxonomy_profile( | |
| 321 | + "bags", | |
| 322 | + BAGS_TAXONOMY_FIELDS, | |
| 323 | + aliases=("bags", "bag", "包", "箱包", "handbag", "backpack", "wallet", "luggage"), | |
| 324 | + ), | |
| 325 | + "pet_supplies": _make_taxonomy_profile( | |
| 326 | + "pet supplies", | |
| 327 | + PET_SUPPLIES_TAXONOMY_FIELDS, | |
| 328 | + aliases=("pet", "宠物", "pet supplies", "pet food", "pet toys", "pet care"), | |
| 329 | + ), | |
| 330 | + "electronics": _make_taxonomy_profile( | |
| 331 | + "electronics", | |
| 332 | + ELECTRONICS_TAXONOMY_FIELDS, | |
| 333 | + aliases=("electronics", "电子", "electronic components", "consumer electronics", "digital devices"), | |
| 334 | + ), | |
| 335 | + "outdoor": _make_taxonomy_profile( | |
| 336 | + "outdoor products", | |
| 337 | + OUTDOOR_TAXONOMY_FIELDS, | |
| 338 | + aliases=("outdoor", "户外", "camping", "hiking", "fishing", "travel accessories"), | |
| 339 | + ), | |
| 340 | + "home_appliances": _make_taxonomy_profile( | |
| 341 | + "home appliances", | |
| 342 | + HOME_APPLIANCES_TAXONOMY_FIELDS, | |
| 343 | + aliases=("home appliances", "家电", "电器", "kitchen appliances", "cleaning appliances", "smart home devices"), | |
| 344 | + ), | |
| 345 | + "home_living": _make_taxonomy_profile( | |
| 346 | + "home and living", | |
| 347 | + HOME_LIVING_TAXONOMY_FIELDS, | |
| 348 | + aliases=("home", "living", "家居", "家具", "家纺", "home decor", "kitchenware"), | |
| 349 | + ), | |
| 350 | + "wigs": _make_taxonomy_profile( | |
| 351 | + "wigs", | |
| 352 | + WIGS_TAXONOMY_FIELDS, | |
| 353 | + aliases=("wig", "wigs", "假发", "hairpiece"), | |
| 354 | + ), | |
| 355 | + "beauty": _make_taxonomy_profile( | |
| 356 | + "beauty and cosmetics", | |
| 357 | + BEAUTY_TAXONOMY_FIELDS, | |
| 358 | + aliases=("beauty", "cosmetics", "美容", "美妆", "makeup", "skincare", "nail care"), | |
| 359 | + ), | |
| 360 | + "accessories": _make_taxonomy_profile( | |
| 361 | + "accessories", | |
| 362 | + ACCESSORIES_TAXONOMY_FIELDS, | |
| 363 | + aliases=("accessories", "配饰", "jewelry", "watches", "belts", "scarves", "hats", "sunglasses"), | |
| 364 | + ), | |
| 365 | + "toys": _make_taxonomy_profile( | |
| 366 | + "toys", | |
| 367 | + TOYS_TAXONOMY_FIELDS, | |
| 368 | + aliases=("toys", "toy", "玩具", "plush", "action figures", "puzzles", "educational toys"), | |
| 369 | + ), | |
| 370 | + "shoes": _make_taxonomy_profile( | |
| 371 | + "shoes", | |
| 372 | + SHOES_TAXONOMY_FIELDS, | |
| 373 | + aliases=("shoes", "shoe", "鞋", "sneakers", "boots", "sandals", "heels"), | |
| 374 | + ), | |
| 375 | + "sports": _make_taxonomy_profile( | |
| 376 | + "sports products", | |
| 377 | + SPORTS_TAXONOMY_FIELDS, | |
| 378 | + aliases=("sports", "sport", "运动", "fitness", "cycling", "team sports", "water sports"), | |
| 379 | + ), | |
| 380 | + "others": _make_taxonomy_profile( | |
| 381 | + "general merchandise", | |
| 382 | + OTHERS_TAXONOMY_FIELDS, | |
| 383 | + aliases=("others", "other", "其他", "general merchandise"), | |
| 384 | + ), | |
| 138 | 385 | } |
| 139 | 386 | |
| 387 | +CATEGORY_TAXONOMY_PROFILE_NAMES = tuple(CATEGORY_TAXONOMY_PROFILES.keys()) | |
| 388 | +TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = CATEGORY_TAXONOMY_PROFILES["apparel"]["shared_instruction"] | |
| 389 | +TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]["en"] | |
| 390 | +TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"] | |
| 391 | + | |
| 140 | 392 | LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { |
| 141 | 393 | "en": [ |
| 142 | 394 | "No.", | ... | ... |
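
Review note: the prompt text that `_build_taxonomy_shared_instruction` renders can be previewed with a small standalone sketch. The builder below repeats the logic added in the diff above. `_taxonomy_field` is defined outside this hunk, so the version here is an assumed stand-in whose dict keys (`key`, `label`, `description`, `zh_label`) are inferred from how the diff reads each field. `SAMPLE_FIELDS` is a two-field excerpt of `WIGS_TAXONOMY_FIELDS`, used only for illustration.

```python
from typing import Dict, Optional, Tuple


def _taxonomy_field(
    key: str, label: str, description: str, zh_label: Optional[str] = None
) -> Dict[str, str]:
    # Assumed stand-in: the real helper is defined outside this hunk.
    field = {"key": key, "label": label, "description": description}
    if zh_label is not None:
        field["zh_label"] = zh_label
    return field


def _build_taxonomy_shared_instruction(
    profile_label: str, fields: Tuple[Dict[str, str], ...]
) -> str:
    # Same logic as the function added in the diff.
    lines = [
        f"Analyze each input product text and fill the columns below using a {profile_label} attribute taxonomy.",
        "",
        "Output columns:",
    ]
    # One numbered line per output column, in registry order.
    for idx, field in enumerate(fields, start=1):
        lines.append(f"{idx}. {field['label']}: {field['description']}")
    lines.extend(
        [
            "",
            "Rules:",
            "- Keep the same row order and row count as input.",
            "- Infer only from the provided product text.",
            "- Leave blank if not applicable or not reasonably supported.",
            "- Use concise, standardized ecommerce wording.",
            "- Do not combine different attribute dimensions in one field.",
            "- If multiple values are needed, use the delimiter required by the localization setting.",
            "",
            "Input product list:",
        ]
    )
    return "\n".join(lines)


# Two-field excerpt of WIGS_TAXONOMY_FIELDS, for illustration only.
SAMPLE_FIELDS = (
    _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label"),
    _taxonomy_field(
        "hair_material",
        "Hair Material",
        "hair material such as human hair, synthetic fiber, heat-resistant fiber",
    ),
)

prompt = _build_taxonomy_shared_instruction("wigs", SAMPLE_FIELDS)
# English table headers are derived the same way as in _make_taxonomy_profile.
en_headers = ["No.", *[field["label"] for field in SAMPLE_FIELDS]]

print(prompt)
print(en_headers)
```

Because the article in the first prompt line is fixed, vowel-initial labels render as "a apparel attribute taxonomy" or "a electronics attribute taxonomy". That is worth keeping in mind when comparing the generated apparel prompt against the previous hand-written one.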
indexer/taxonomy.md
| ... | ... | @@ -171,3 +171,27 @@ Rules: |
| 171 | 171 | Input product list: |
| 172 | 172 | """ |
| 173 | 173 | ``` |
| 174 | + | |
| 175 | +## 2. Other taxonomy profiles | |
| 176 | + | |
| 177 | +Notes: | |
| 178 | +- `apparel` continues to return `zh` + `en`. | |
| 179 | +- All other profiles return `en` only and define English column names only. | |
| 180 | +- The profile slugs used in the code match those listed below. | |
| 181 | + | |
| 182 | +| Profile | Core columns (`en`) | | |
| 183 | +| --- | --- | | |
| 184 | +| `3c` | Product Type, Compatible Device / Model, Connectivity, Interface / Port Type, Power Source / Charging, Key Features, Material / Finish, Color, Pack Size, Use Case | | |
| 185 | +| `bags` | Product Type, Target Gender, Carry Style, Size / Capacity, Material, Closure Type, Structure / Compartments, Strap / Handle Type, Color, Occasion / End Use | | |
| 186 | +| `pet_supplies` | Product Type, Pet Type, Breed Size, Life Stage, Material / Ingredients, Flavor / Scent, Key Features, Functional Benefits, Size / Capacity, Use Scenario | | |
| 187 | +| `electronics` | Product Type, Device Category / Compatibility, Power / Voltage, Connectivity, Interface / Port Type, Capacity / Storage, Key Features, Material / Finish, Color, Use Case | | |
| 188 | +| `outdoor` | Product Type, Activity Type, Season / Weather, Material, Capacity / Size, Protection / Resistance, Key Features, Portability / Packability, Color, Use Scenario | | |
| 189 | +| `home_appliances` | Product Type, Appliance Category, Power / Voltage, Capacity / Coverage, Control Method, Installation Type, Key Features, Material / Finish, Color, Use Scenario | | |
| 190 | +| `home_living` | Product Type, Room / Placement, Material, Style, Size / Dimensions, Color, Pattern / Finish, Key Features, Assembly / Installation, Use Scenario | | |
| 191 | +| `wigs` | Product Type, Hair Material, Hair Texture, Hair Length, Hair Color, Cap Construction, Lace Area / Part Type, Density / Volume, Style / Bang Type, Occasion / End Use | | |
| 192 | +| `beauty` | Product Type, Target Area, Skin Type / Hair Type, Finish / Effect, Key Ingredients, Shade / Color, Scent, Formulation, Functional Benefits, Use Scenario | | |
| 193 | +| `accessories` | Product Type, Target Gender, Material, Color, Pattern / Finish, Closure / Fastening, Size / Fit, Style, Occasion / End Use, Set / Pack Size | | |
| 194 | +| `toys` | Product Type, Age Group, Character / Theme, Material, Power Source, Interactive Features, Educational / Play Value, Piece Count / Size, Color, Use Scenario | | |
| 195 | +| `shoes` | Product Type, Target Gender, Age Group, Closure Type, Toe Shape, Heel Height / Sole Type, Upper Material, Lining / Insole Material, Color, Occasion / End Use | | |
| 196 | +| `sports` | Product Type, Sport / Activity, Skill Level, Material, Size / Capacity, Protection / Support, Key Features, Power Source, Color, Use Scenario | | |
| 197 | +| `others` | Product Type, Product Category, Target User, Material / Ingredients, Key Features, Functional Benefits, Size / Capacity, Color, Style / Theme, Use Scenario | | ... | ... |
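
Note on the table above: the test fixtures in `tests/test_product_enrich_partial_mode.py` use row keys such as `character_theme` and `piece_count_size`, which look like normalized forms of the `en` column names. The following is a minimal sketch of how such a mapping could work. The helper names `column_to_field_key` and `output_languages` are hypothetical and do not appear in the diff. The normalization rule is an assumption inferred from the fixtures, not the confirmed implementation.

```python
import re
from typing import List

# Columns for the `toys` profile, copied from the taxonomy.md table.
TOYS_COLUMNS: List[str] = [
    "Product Type", "Age Group", "Character / Theme", "Material",
    "Power Source", "Interactive Features", "Educational / Play Value",
    "Piece Count / Size", "Color", "Use Scenario",
]


def column_to_field_key(column: str) -> str:
    """Assumed rule: lowercase, collapse every non-alphanumeric run to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", column.lower()).strip("_")


def output_languages(profile: str) -> List[str]:
    """Per the notes in taxonomy.md: apparel is bilingual, the rest are en-only."""
    return ["zh", "en"] if profile == "apparel" else ["en"]


if __name__ == "__main__":
    for column in TOYS_COLUMNS:
        print(f"{column!r} -> {column_to_field_key(column)!r}")
```

If the real code uses an explicit column-to-key mapping instead, the sketch still shows which keys the `toys` fixture expects.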
tests/ci/test_service_api_contracts.py
| ... | ... | @@ -454,6 +454,52 @@ def test_indexer_enrich_content_contract_accepts_deprecated_analysis_kinds(index |
| 454 | 454 | assert data["category_taxonomy_profile"] == "apparel" |
| 455 | 455 | |
| 456 | 456 | |
| 457 | +def test_indexer_enrich_content_contract_supports_non_apparel_taxonomy_profiles(indexer_client: TestClient, monkeypatch): | |
| 458 | + import indexer.product_enrich as process_products | |
| 459 | + | |
| 460 | + def _fake_build_index_content_fields( | |
| 461 | + items: List[Dict[str, str]], | |
| 462 | + tenant_id: str | None = None, | |
| 463 | + enrichment_scopes: List[str] | None = None, | |
| 464 | + category_taxonomy_profile: str = "apparel", | |
| 465 | + ): | |
| 466 | + assert tenant_id == "162" | |
| 467 | + assert enrichment_scopes == ["category_taxonomy"] | |
| 468 | + assert category_taxonomy_profile == "toys" | |
| 469 | + return [ | |
| 470 | + { | |
| 471 | + "id": items[0]["spu_id"], | |
| 472 | + "qanchors": {}, | |
| 473 | + "enriched_tags": {}, | |
| 474 | + "enriched_attributes": [], | |
| 475 | + "enriched_taxonomy_attributes": [ | |
| 476 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | |
| 477 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | |
| 478 | + ], | |
| 479 | + } | |
| 480 | + ] | |
| 481 | + | |
| 482 | + monkeypatch.setattr(process_products, "build_index_content_fields", _fake_build_index_content_fields) | |
| 483 | + | |
| 484 | + response = indexer_client.post( | |
| 485 | + "/indexer/enrich-content", | |
| 486 | + json={ | |
| 487 | + "tenant_id": "162", | |
| 488 | + "enrichment_scopes": ["category_taxonomy"], | |
| 489 | + "category_taxonomy_profile": "toys", | |
| 490 | + "items": [{"spu_id": "1001", "title": "Toy"}], | |
| 491 | + }, | |
| 492 | + ) | |
| 493 | + | |
| 494 | + assert response.status_code == 200 | |
| 495 | + data = response.json() | |
| 496 | + assert data["category_taxonomy_profile"] == "toys" | |
| 497 | + assert data["results"][0]["enriched_taxonomy_attributes"] == [ | |
| 498 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | |
| 499 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | |
| 500 | + ] | |
| 501 | + | |
| 502 | + | |
| 457 | 503 | def test_indexer_documents_contract(indexer_client: TestClient): |
| 458 | 504 | """POST /indexer/documents: tenant_id + spu_ids, returns success/failed lists (no ES write).""" |
| 459 | 505 | response = indexer_client.post( | ... | ... |
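
Note on the contract test above: the request carries an optional `category_taxonomy_profile`, and the fake `build_index_content_fields` defaults it to `"apparel"`. A minimal sketch of how a request-level profile could be validated before dispatch is shown below. The function `resolve_request_profile` and the constant `KNOWN_PROFILES` are hypothetical names. The fallback to `"apparel"` mirrors the fake's default and is an assumption, since the actual service may infer the profile from category data instead.

```python
from typing import Optional, Tuple

# Slugs copied from taxonomy.md (apparel plus the profiles in section 2).
KNOWN_PROFILES: Tuple[str, ...] = (
    "apparel", "3c", "bags", "pet_supplies", "electronics", "outdoor",
    "home_appliances", "home_living", "wigs", "beauty", "accessories",
    "toys", "shoes", "sports", "others",
)


def resolve_request_profile(requested: Optional[str], default: str = "apparel") -> str:
    """Return the profile slug for a request, or raise ValueError if unknown."""
    if requested is None or not requested.strip():
        # Assumption: an omitted profile falls back to the default.
        return default
    slug = requested.strip().lower()
    if slug not in KNOWN_PROFILES:
        raise ValueError(f"unknown category_taxonomy_profile: {requested!r}")
    return slug
```

An API layer would typically translate the `ValueError` into a 4xx response. The diff does not show which status code the real endpoint uses.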
tests/test_product_enrich_partial_mode.py
| ... | ... | @@ -500,7 +500,6 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() |
| 500 | 500 | "style_aesthetic": "", |
| 501 | 501 | } |
| 502 | 502 | ] |
| 503 | - assert category_taxonomy_profile == "apparel" | |
| 504 | 503 | return [ |
| 505 | 504 | { |
| 506 | 505 | "id": products[0]["id"], |
| ... | ... | @@ -562,6 +561,120 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() |
| 562 | 561 | ] |
| 563 | 562 | |
| 564 | 563 | |
| 564 | +def test_detect_category_taxonomy_profile_matches_category_hints(): | |
| 565 | + assert product_enrich.detect_category_taxonomy_profile({"category1_name": "玩具"}) == "toys" | |
| 566 | + assert product_enrich.detect_category_taxonomy_profile({"category": "Beauty & Cosmetics"}) == "beauty" | |
| 567 | + assert product_enrich.detect_category_taxonomy_profile({"category_path": "Home Appliances / Kitchen"}) == "home_appliances" | |
| 568 | + | |
| 569 | + | |
| 570 | +def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_apparel_returns_en_only(): | |
| 571 | + seen_calls = [] | |
| 572 | + | |
| 573 | + def fake_analyze_products( | |
| 574 | + products, | |
| 575 | + target_lang="zh", | |
| 576 | + batch_size=None, | |
| 577 | + tenant_id=None, | |
| 578 | + analysis_kind="content", | |
| 579 | + category_taxonomy_profile=None, | |
| 580 | + ): | |
| 581 | + seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products))) | |
| 582 | + if analysis_kind == "taxonomy": | |
| 583 | + if category_taxonomy_profile == "apparel": | |
| 584 | + return [ | |
| 585 | + { | |
| 586 | + "id": products[0]["id"], | |
| 587 | + "lang": target_lang, | |
| 588 | + "title_input": products[0]["title"], | |
| 589 | + "product_type": f"{target_lang}-dress", | |
| 590 | + "target_gender": f"{target_lang}-women", | |
| 591 | + "age_group": "", | |
| 592 | + "season": "", | |
| 593 | + "fit": "", | |
| 594 | + "silhouette": "", | |
| 595 | + "neckline": "", | |
| 596 | + "sleeve_length_type": "", | |
| 597 | + "sleeve_style": "", | |
| 598 | + "strap_type": "", | |
| 599 | + "rise_waistline": "", | |
| 600 | + "leg_shape": "", | |
| 601 | + "skirt_shape": "", | |
| 602 | + "length_type": "", | |
| 603 | + "closure_type": "", | |
| 604 | + "design_details": "", | |
| 605 | + "fabric": "", | |
| 606 | + "material_composition": "", | |
| 607 | + "fabric_properties": "", | |
| 608 | + "clothing_features": "", | |
| 609 | + "functional_benefits": "", | |
| 610 | + "color": "", | |
| 611 | + "color_family": "", | |
| 612 | + "print_pattern": "", | |
| 613 | + "occasion_end_use": "", | |
| 614 | + "style_aesthetic": "", | |
| 615 | + } | |
| 616 | + ] | |
| 617 | + assert category_taxonomy_profile == "toys" | |
| 618 | + assert target_lang == "en" | |
| 619 | + return [ | |
| 620 | + { | |
| 621 | + "id": products[0]["id"], | |
| 622 | + "lang": "en", | |
| 623 | + "title_input": products[0]["title"], | |
| 624 | + "product_type": "doll set", | |
| 625 | + "age_group": "kids", | |
| 626 | + "character_theme": "", | |
| 627 | + "material": "", | |
| 628 | + "power_source": "", | |
| 629 | + "interactive_features": "", | |
| 630 | + "educational_play_value": "", | |
| 631 | + "piece_count_size": "", | |
| 632 | + "color": "", | |
| 633 | + "use_scenario": "", | |
| 634 | + } | |
| 635 | + ] | |
| 636 | + | |
| 637 | + return [ | |
| 638 | + { | |
| 639 | + "id": product["id"], | |
| 640 | + "lang": target_lang, | |
| 641 | + "title_input": product["title"], | |
| 642 | + "title": product["title"], | |
| 643 | + "category_path": "", | |
| 644 | + "tags": f"{target_lang}-tag", | |
| 645 | + "target_audience": "", | |
| 646 | + "usage_scene": "", | |
| 647 | + "season": "", | |
| 648 | + "key_attributes": "", | |
| 649 | + "material": "", | |
| 650 | + "features": "", | |
| 651 | + "anchor_text": f"{target_lang}-anchor", | |
| 652 | + } | |
| 653 | + for product in products | |
| 654 | + ] | |
| 655 | + | |
| 656 | + with mock.patch.object(product_enrich, "analyze_products", side_effect=fake_analyze_products): | |
| 657 | + result = product_enrich.build_index_content_fields( | |
| 658 | + items=[ | |
| 659 | + {"spu_id": "1", "title": "dress", "category_taxonomy_profile": "apparel"}, | |
| 660 | + {"spu_id": "2", "title": "toy", "category_taxonomy_profile": "toys"}, | |
| 661 | + ], | |
| 662 | + tenant_id="170", | |
| 663 | + category_taxonomy_profile="apparel", | |
| 664 | + ) | |
| 665 | + | |
| 666 | + assert result[0]["enriched_taxonomy_attributes"] == [ | |
| 667 | + {"name": "Product Type", "value": {"zh": ["zh-dress"], "en": ["en-dress"]}}, | |
| 668 | + {"name": "Target Gender", "value": {"zh": ["zh-women"], "en": ["en-women"]}}, | |
| 669 | + ] | |
| 670 | + assert result[1]["enriched_taxonomy_attributes"] == [ | |
| 671 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | |
| 672 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | |
| 673 | + ] | |
| 674 | + assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls | |
| 675 | + assert ("taxonomy", "en", "toys", ("2",)) in seen_calls | |
| 676 | + | |
| 677 | + | |
| 565 | 678 | def test_anchor_cache_key_depends_on_product_input_not_identifiers(): |
| 566 | 679 | product_a = { |
| 567 | 680 | "id": "1", | ... | ... |
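
Note on `test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_apparel_returns_en_only`: the test expects an item-level `category_taxonomy_profile` to override the call-level value, empty fields to be dropped, and per-language rows to be merged into `{"name": ..., "value": {lang: [...]}}` entries. The sketch below reproduces that behavior under stated assumptions. `group_by_profile` and `merge_language_rows` are hypothetical helpers, and wrapping each value in a single-element list is a simplification, since the real code may split multi-valued strings.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_profile(
    items: List[Dict[str, Any]], default_profile: str = "apparel"
) -> Dict[str, List[int]]:
    """Group item indexes by profile; an item-level value wins over the default."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, item in enumerate(items):
        profile = item.get("category_taxonomy_profile") or default_profile
        groups[profile].append(index)
    return dict(groups)


def merge_language_rows(
    fields: List[Tuple[str, str]], rows_by_lang: Dict[str, Dict[str, str]]
) -> List[Dict[str, Any]]:
    """Merge one row per language into name/value entries, skipping empty fields.

    `fields` holds (display name, row key) pairs in output order.
    """
    attributes: List[Dict[str, Any]] = []
    for display_name, key in fields:
        value: Dict[str, List[str]] = {}
        for lang, row in rows_by_lang.items():
            text = (row.get(key) or "").strip()
            if text:
                value[lang] = [text]
        if value:  # fields empty in every language are omitted entirely
            attributes.append({"name": display_name, "value": value})
    return attributes
```

Keeping the original indexes in each group makes it straightforward to write results back in input order after the per-profile calls return.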