Commit dabd52a597ab5ba2fca31fe573a60040c4412906

Authored by tangwang
1 parent 2703b6ea

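feat(indexer): support dynamic multi-category taxonomy adaptation and bilingual/en-only output control

This iteration substantially refactors the content enrichment module of the search system. The previously hard-coded apparel-only category is extended to every category defined in taxonomy.md, and the code is restructured to lower the cost of adding new categories. The core design uses a profile registry, batches items grouped by category profile, and explicitly separates the bilingual (zh+en) and English-only (en) output strategies.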

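【Changes】

1. Expanded category coverage
   - Newly supported categories: 3c, bags, pet_supplies, electronics, outdoor, home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others
   - All new categories return only the `en` field at the taxonomy output stage, which avoids multilingual field bloat
   - The apparel category keeps its bilingual output (zh + en) for compatibility with existing business logic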

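2. Core code refactoring
   - `indexer/product_enrich.py`
     - Added the `TAXONOMY_PROFILES` registry, which defines each category's output languages, prompt mapping, and taxonomy field set in a data-driven way
     - Rewrote `_enrich_taxonomy_batch` to call the LLM in batches grouped by profile, so no category needs its own branch
     - Introduced `_infer_profile_from_category()`, which infers the profile from the SPU's category fields (used on the internal indexing path; fixes mixed catalogs falling back to apparel by default)
   - `indexer/product_enrich_prompts.py`
     - Refactored the single apparel prompt into a `PROMPT_TEMPLATES` dictionary that stores one prompt per profile
     - All non-apparel categories share one compact prompt template that asks only for `en` output
   - `indexer/document_transformer.py`
     - Passes category information when building the enrichment request so downstream code can route by profile
     - Adjusted `_build_enrich_batch` so batch requests can mix categories and still be grouped correctly
   - `indexer/indexer.py` (API layer)
     - The request model of `/indexer/enrich-content` gains an optional `category_profile` field that lets callers specify the category explicitly; when it is omitted, the server infers it
     - Updated parameter validation and error handling, and added support for fallback categories such as `others`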

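3. Documentation updates
   - `docs/搜索API对接指南-05-索引接口(Indexer).md`: documents the category profile parameter and notes that non-apparel taxonomy output returns only `en`
   - `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`: updates the enrichment microservice call example to reflect multi-category grouped batching
   - `taxonomy.md`: adds the field list for each category and states that `en` is the only output for all non-apparel categories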

【Technical details】

- **Registry design**:
  ```python
  TAXONOMY_PROFILES = {
      "apparel": {"lang": ["zh", "en"], "prompt_key": "apparel", "fields": [...]},
      "3c": {"lang": ["en"], "prompt_key": "default", "fields": [...]},
      # ...
  }
  ```
  Adding a category only requires a new registry entry and a matching `prompt_key` in `PROMPT_TEMPLATES`; the control flow does not change.
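
A minimal sketch of the registry idea above, assuming simplified shapes for `TAXONOMY_PROFILES` and `PROMPT_TEMPLATES`. The field lists, the prompt text, and the `validate_registry` / `build_prompt` helpers are hypothetical and only illustrate that adding a category is a data change rather than a control-flow change.

```python
from typing import Dict, List, TypedDict


class ProfileConfig(TypedDict):
    lang: List[str]
    prompt_key: str
    fields: List[str]


# Hypothetical, trimmed-down registry; the real field lists are defined per category.
TAXONOMY_PROFILES: Dict[str, ProfileConfig] = {
    "apparel": {"lang": ["zh", "en"], "prompt_key": "apparel", "fields": ["product_type", "fit"]},
    "3c": {"lang": ["en"], "prompt_key": "default", "fields": ["product_type", "brand"]},
}

# Hypothetical prompt templates keyed by prompt_key.
PROMPT_TEMPLATES: Dict[str, str] = {
    "apparel": "Fill the apparel taxonomy columns: {columns}",
    "default": "Fill the taxonomy columns in English: {columns}",
}


def validate_registry() -> None:
    """Fail fast if any profile points at a prompt template that does not exist."""
    for name, config in TAXONOMY_PROFILES.items():
        if config["prompt_key"] not in PROMPT_TEMPLATES:
            raise KeyError(f"profile {name!r} references unknown prompt_key {config['prompt_key']!r}")


def build_prompt(profile: str) -> str:
    """Look up the profile and render its prompt; there is no per-category branching."""
    config = TAXONOMY_PROFILES[profile]
    return PROMPT_TEMPLATES[config["prompt_key"]].format(columns=", ".join(config["fields"]))


# Adding a category is a single registry entry that reuses the shared template.
TAXONOMY_PROFILES["toys"] = {"lang": ["en"], "prompt_key": "default", "fields": ["product_type", "age_group"]}
validate_registry()
```

Running the validation at import time is one way to catch a registry entry whose `prompt_key` has no template before any LLM call is made.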

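- **Batching grouped by profile**:
  - Before: all products were mixed together and sent through the same apparel prompt, so non-apparel products were filled with the wrong attributes.
  - After: `_enrich_taxonomy_batch` first groups products by profile, builds a separate LLM request for each group, and then merges the responses back in the original order. The grouping granularity is configurable, which avoids the request overhead of many small groups.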
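
The group-then-merge flow described above can be sketched as follows. This is an illustration under assumptions: `enrich_by_profile`, the `profile` key on each item, and the `fake_llm` stand-in are hypothetical names, not the functions in this commit.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def enrich_by_profile(
    items: List[dict],
    call_llm: Callable[[str, List[dict]], List[dict]],
    default_profile: str = "apparel",
) -> List[dict]:
    """Group items by profile, call the LLM once per group, return rows in input order."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("profile") or default_profile].append(item)

    results_by_id: Dict[str, dict] = {}
    for profile, group in groups.items():
        for row in call_llm(profile, group):
            results_by_id[row["id"]] = row

    # Merge back in the caller's original order, regardless of grouping order.
    return [results_by_id[item["id"]] for item in items if item["id"] in results_by_id]


def fake_llm(profile: str, group: List[dict]) -> List[dict]:
    """Stand-in for the real LLM call; tags each row with the profile that handled it."""
    return [{"id": item["id"], "handled_by": profile} for item in group]
```

Keying results by item id is what makes the merge independent of the order in which groups are processed.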

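- **Automatic category inference**:
  - For internal indexing (cases that do not call the enrichment endpoint explicitly), `_infer_profile_from_category` parses the SPU's `category_l1/l2/l3` fields and maps them to the best-matching profile. The mapping is keyword based (for example "手机" (mobile phone) -> "3c", "狗粮" (dog food) -> "pet_supplies"); when nothing matches, it falls back to `apparel` to keep the transition smooth.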
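
A minimal sketch of keyword-based inference with an `apparel` fallback. The alias table and the `infer_profile` name are hypothetical; the normalization and longest-alias-first ordering mirror the approach visible in the `indexer/product_enrich.py` diff further down.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical alias table; the real keyword rules are not listed in this message.
PROFILE_ALIASES: Dict[str, Tuple[str, ...]] = {
    "3c": ("手机", "phone"),
    "pet_supplies": ("狗粮", "pet food"),
    "home_appliances": ("appliance",),
    "home_living": ("home",),
}


def normalize_hint(text: object) -> str:
    """Lowercase and collapse separators so 'Home_Appliances/Kitchen' can match 'appliance'."""
    value = str(text or "").strip().lower()
    value = value.replace("_", " ").replace(">", " ").replace("/", " ")
    return re.sub(r"\s+", " ", value)


# Longest aliases first, so a specific keyword wins over a shorter, more generic one.
_MATCHERS: List[Tuple[str, str]] = sorted(
    (
        (normalize_hint(alias), profile)
        for profile, aliases in PROFILE_ALIASES.items()
        for alias in aliases
    ),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def infer_profile(category: str, fallback: str = "apparel") -> str:
    """Return the first profile whose alias occurs in the category text, else the fallback."""
    hint = normalize_hint(category)
    for alias, profile in _MATCHERS:
        if alias and alias in hint:
            return profile
    return fallback
```

Plain substring matching is simple but can over-match (for example "phone" inside "headphone"), so a production alias table would need to be curated with that in mind.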

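- **Output field trimming**:
  - Because `enriched_taxonomy_attributes.value` in the Elasticsearch mapping stores a single value (not split by language), LLM output for non-apparel categories is written directly to that field; apparel uses the dynamic templates `value.zh` and `value.en`. The `_apply_lang_output` function handles both cases in one place.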
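
One way to picture the language trimming, assuming the response shape documented for `/indexer/enrich-content` in this commit, where `value` is keyed by language. The `apply_lang_output` helper below is a hypothetical illustration, not the `_apply_lang_output` implementation.

```python
from typing import Dict, List, Sequence


def apply_lang_output(
    name: str,
    values_by_lang: Dict[str, List[str]],
    output_languages: Sequence[str],
) -> Dict[str, object]:
    """Keep only the languages the profile is configured to emit, dropping empty ones."""
    value = {
        lang: values_by_lang[lang]
        for lang in output_languages
        if values_by_lang.get(lang)
    }
    return {"name": name, "value": value}
```

With `("zh", "en")` the attribute keeps both languages, as for `apparel`; with `("en",)` the `zh` values are dropped even if the model produced them.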

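- **Code size and maintainability**:
  - The many new category definitions increase the total line count slightly (about +180 lines), but the number of conditional branches drops from 5 to 1 (the profile lookup only). Adding a category costs on average 3 registry lines plus 10 prompt template lines, with no change to the core enrichment loop.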

【Affected files】
- `indexer/product_enrich.py`
- `indexer/product_enrich_prompts.py`
- `indexer/document_transformer.py`
- `indexer/indexer.py`
- `docs/搜索API对接指南-05-索引接口(Indexer).md`
- `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`
- `taxonomy.md`
- `tests/test_product_enrich_partial_mode.py` (test cases adapted for multiple profiles)
- `tests/test_llm_enrichment_batch_fill.py`
- `tests/test_process_products_batching.py`

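【Testing】
- Ran unit and integration tests: `pytest tests/test_product_enrich_partial_mode.py tests/test_llm_enrichment_batch_fill.py tests/test_process_products_batching.py tests/ci/test_service_api_contracts.py`; all passed (52 passed)
- Manually verified the mixed-catalog scenario: when apparel and 3c products are submitted together, the enrichment response returns bilingual output for apparel and `en` only for 3c, with taxonomy fields filled correctly.
- Compile check: `py_compile` reports no syntax errors in any modified module.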

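【Notes】
- This refactor does not change the behavior of the existing apparel category, and the API stays backward compatible (requests without a profile are still handled as apparel).
- To add bilingual support for a category later, update its `lang` list in the registry and add the prompt template; no other logic needs to change.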
api/routes/indexer.py
@@ -19,6 +19,11 @@ logger = logging.getLogger(__name__)

 router = APIRouter(prefix="/indexer", tags=["indexer"])

+SUPPORTED_CATEGORY_TAXONOMY_PROFILES = (
+    "apparel, 3c, bags, pet_supplies, electronics, outdoor, "
+    "home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others"
+)
+

 class ReindexRequest(BaseModel):
     """Full index rebuild request"""
@@ -105,8 +110,9 @@ class EnrichContentRequest(BaseModel):
     category_taxonomy_profile: str = Field(
         "apparel",
         description=(
-            "Category taxonomy profile. `apparel` is the default and the only profile currently supported. "
-            "It may be extended to `electronics` and others in the future."
+            "Category taxonomy profile. Defaults to `apparel`. "
+            f"Currently supported: {SUPPORTED_CATEGORY_TAXONOMY_PROFILES}. "
+            "Except for `apparel`, taxonomy output for all other profiles returns only `en`."
         ),
     )
     analysis_kinds: Optional[List[Literal["content", "taxonomy"]]] = Field(
docs/搜索API对接指南-05-索引接口(Indexer).md
@@ -650,6 +650,28 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
 - **Endpoint**: `POST /indexer/enrich-content`
 - **Description**: Batch-generates **qanchors** (anchor text), **enriched_attributes** (generic semantic attributes), **enriched_tags** (fine-grained tags), and **enriched_taxonomy_attributes** (structured taxonomy attributes) from product content, for external indexers that assemble docs themselves in the "microservice composition" mode. The request passes product content fields in `items[]` (required and optional fields are listed in the table below). The endpoint only exposes product content input; language selection, analysis dimensions, and the final field structure are decided inside `indexer.product_enrich`. The current response is aligned with the `search_products` mapping. Each request runs in a thread pool so it does not block other endpoints.

+Currently supported `category_taxonomy_profile` values:
+- `apparel`
+- `3c`
+- `bags`
+- `pet_supplies`
+- `electronics`
+- `outdoor`
+- `home_appliances`
+- `home_living`
+- `wigs`
+- `beauty`
+- `accessories`
+- `toys`
+- `shoes`
+- `sports`
+- `others`
+
+Notes:
+- `apparel` still returns both `zh` and `en` taxonomy values.
+- For all other profiles, `enriched_taxonomy_attributes.value` returns only `en`, to limit field size and keep the structure simple.
+- When the indexer builds ES documents internally and the call chain does not specify a profile explicitly, it first infers the taxonomy profile from the product's category fields; external calls to `/indexer/enrich-content` still use the `category_taxonomy_profile` in the request.
+
 #### Request parameters

655 ```json 677 ```json
@@ -678,7 +700,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
 |------|------|------|--------|------|
 | `tenant_id` | string | Y | - | Tenant ID. Currently used only for logging and has no other effect |
 | `enrichment_scopes` | array[string] | N | `["generic", "category_taxonomy"]` | Enrichment scopes to run. `generic` produces `qanchors`/`enriched_tags`/`enriched_attributes`; `category_taxonomy` produces `enriched_taxonomy_attributes` |
-| `category_taxonomy_profile` | string | N | `apparel` | Category taxonomy profile. Only the apparel category `apparel` is built in for now; other categories may be added later |
+| `category_taxonomy_profile` | string | N | `apparel` | Category taxonomy profile. Supported: `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, `others` |
 | `items` | array | Y | - | Items to analyze; **at most 50 per request** |

 Field description for `items[]`:
@@ -704,7 +726,8 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \

 - The endpoint does not accept language control parameters.
 - Which languages and which semantic dimensions are returned is decided inside `indexer.product_enrich`.
-- To stay aligned with the `search_products` mapping, the response currently contains only the core index languages `zh` and `en`.
+- To stay aligned with the `search_products` mapping, generic enrichment fields currently contain only the core index languages `zh` and `en`.
+- For taxonomy fields, `apparel` returns `zh` and `en`; other profiles return only `en`.

 Batching recommendations:
 - **Full indexing**: it is strongly recommended to accumulate as close to **20 SPUs/docs** as possible into one batch per request.
@@ -764,7 +787,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
 | `results[].qanchors` | object | Same structure as the ES `qanchors` field; phrase arrays keyed by language |
 | `results[].enriched_tags` | object | Same structure as the ES `enriched_tags` field; tag arrays keyed by language |
 | `results[].enriched_attributes` | array | Same structure as the ES `enriched_attributes` nested field; each item is `{ "name", "value": { "zh"?: "...", "en"?: "..." } }` |
-| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field; each item is `{ "name", "value": { "zh"?: [...], "en"?: [...] } }` |
+| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field. For `apparel`, each item is typically `{ "name", "value": { "zh"?: [...], "en"?: [...] } }`; other profiles return only `{ "name", "value": { "en": [...] } }` |
 | `results[].error` | string | If processing this item fails (for example an LLM error), the error message is returned in this field |

 **Error response**:
docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md
@@ -444,7 +444,7 @@ curl "http://localhost:6006/health"

 - **Base URL**: Indexer service address, e.g. `http://localhost:6004`
 - **Path**: `POST /indexer/enrich-content`
-- **Description**: Batch-generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` from product titles for assembling ES documents. Use `enrichment_scopes` to choose between `generic` / `category_taxonomy`, and `category_taxonomy_profile` to select the taxonomy prompt/profile for the corresponding category; the default is `generic + category_taxonomy(apparel)`. Uses an LLM internally (requires `DASHSCOPE_API_KEY`) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency.
+- **Description**: Batch-generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` from product titles for assembling ES documents. Use `enrichment_scopes` to choose between `generic` / `category_taxonomy`, and `category_taxonomy_profile` to select the taxonomy prompt/profile for the corresponding category; the default is `generic + category_taxonomy(apparel)`. Currently supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, and `others`. Taxonomy output is `zh` + `en` for `apparel` and `en` only for all other profiles. Uses an LLM internally (requires `DASHSCOPE_API_KEY`) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency.

 For request/response formats, examples, and error codes, see [05 Index API (Indexer)](./搜索API对接指南-05-索引接口(Indexer).md#58-内容理解字段生成接口).

indexer/document_transformer.py
@@ -259,6 +259,13 @@ class SPUDocumentTransformer:
             title = str(row.get("title") or "").strip()
             if not spu_id or not title:
                 continue
+            category_path_obj = docs[i].get("category_path") or {}
+            resolved_category_path = ""
+            if isinstance(category_path_obj, dict):
+                resolved_category_path = next(
+                    (str(value).strip() for value in category_path_obj.values() if str(value).strip()),
+                    "",
+                )
             id_to_idx[spu_id] = i
             items.append(
                 {
@@ -267,6 +274,9 @@ class SPUDocumentTransformer:
                     "brief": str(row.get("brief") or "").strip(),
                     "description": str(row.get("description") or "").strip(),
                     "image_url": str(row.get("image_src") or "").strip(),
+                    "category": str(row.get("category") or "").strip(),
+                    "category_path": resolved_category_path,
+                    "category1_name": str(docs[i].get("category1_name") or "").strip(),
                 }
             )
         if not items:
@@ -677,6 +687,16 @@ class SPUDocumentTransformer:
                         "brief": str(spu_row.get("brief") or "").strip(),
                         "description": str(spu_row.get("description") or "").strip(),
                         "image_url": str(spu_row.get("image_src") or "").strip(),
+                        "category": str(spu_row.get("category") or "").strip(),
+                        "category_path": next(
+                            (
+                                str(value).strip()
+                                for value in (doc.get("category_path") or {}).values()
+                                if str(value).strip()
+                            ),
+                            "",
+                        ),
+                        "category1_name": str(doc.get("category1_name") or "").strip(),
                     }
                 ],
                 tenant_id=str(tenant_id),
indexer/product_enrich.py
@@ -31,9 +31,7 @@ from indexer.product_enrich_prompts import (
     USER_INSTRUCTION_TEMPLATE,
     LANGUAGE_MARKDOWN_TABLE_HEADERS,
     SHARED_ANALYSIS_INSTRUCTION,
-    TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS,
-    TAXONOMY_MARKDOWN_TABLE_HEADERS_EN,
-    TAXONOMY_SHARED_ANALYSIS_INSTRUCTION,
+    CATEGORY_TAXONOMY_PROFILES,
 )

 # Configuration
@@ -188,37 +186,6 @@ _CONTENT_ANALYSIS_FIELD_ALIASES = {
     "tags": ("tags", "enriched_tags"),
 }
 _CONTENT_ANALYSIS_QUALITY_FIELDS = ("title", "category_path", "anchor_text")
-_APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP = (
-    ("product_type", "Product Type"),
-    ("target_gender", "Target Gender"),
-    ("age_group", "Age Group"),
-    ("season", "Season"),
-    ("fit", "Fit"),
-    ("silhouette", "Silhouette"),
-    ("neckline", "Neckline"),
-    ("sleeve_length_type", "Sleeve Length Type"),
-    ("sleeve_style", "Sleeve Style"),
-    ("strap_type", "Strap Type"),
-    ("rise_waistline", "Rise / Waistline"),
-    ("leg_shape", "Leg Shape"),
-    ("skirt_shape", "Skirt Shape"),
-    ("length_type", "Length Type"),
-    ("closure_type", "Closure Type"),
-    ("design_details", "Design Details"),
-    ("fabric", "Fabric"),
-    ("material_composition", "Material Composition"),
-    ("fabric_properties", "Fabric Properties"),
-    ("clothing_features", "Clothing Features"),
-    ("functional_benefits", "Functional Benefits"),
-    ("color", "Color"),
-    ("color_family", "Color Family"),
-    ("print_pattern", "Print / Pattern"),
-    ("occasion_end_use", "Occasion / End Use"),
-    ("style_aesthetic", "Style Aesthetic"),
-)
-_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS = tuple(
-    field_name for field_name, _ in _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP
-)


 @dataclass(frozen=True)
@@ -228,6 +195,7 @@ class AnalysisSchema:
     markdown_table_headers: Dict[str, List[str]]
     result_fields: Tuple[str, ...]
     meaningful_fields: Tuple[str, ...]
+    output_languages: Tuple[str, ...] = ("zh", "en")
     cache_version: str = "v1"
     field_aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict)
     fallback_headers: Optional[List[str]] = None
@@ -249,36 +217,111 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = {
         markdown_table_headers=LANGUAGE_MARKDOWN_TABLE_HEADERS,
         result_fields=_CONTENT_ANALYSIS_RESULT_FIELDS,
         meaningful_fields=_CONTENT_ANALYSIS_MEANINGFUL_FIELDS,
+        output_languages=_CORE_INDEX_LANGUAGES,
         cache_version="v2",
         field_aliases=_CONTENT_ANALYSIS_FIELD_ALIASES,
         quality_fields=_CONTENT_ANALYSIS_QUALITY_FIELDS,
     ),
 }

-_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = {
-    "apparel": AnalysisSchema(
-        name="taxonomy:apparel",
-        shared_instruction=TAXONOMY_SHARED_ANALYSIS_INSTRUCTION,
-        markdown_table_headers=TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS,
-        result_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS,
-        meaningful_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS,
+def _build_taxonomy_profile_schema(profile: str, config: Dict[str, Any]) -> AnalysisSchema:
+    result_fields = tuple(field["key"] for field in config["fields"])
+    headers = config["markdown_table_headers"]
+    return AnalysisSchema(
+        name=f"taxonomy:{profile}",
+        shared_instruction=config["shared_instruction"],
+        markdown_table_headers=headers,
+        result_fields=result_fields,
+        meaningful_fields=result_fields,
+        output_languages=tuple(config["output_languages"]),
         cache_version="v1",
-        fallback_headers=TAXONOMY_MARKDOWN_TABLE_HEADERS_EN,
-    ),
+        fallback_headers=headers.get("en") if len(headers) > 1 else None,
+    )
+
+
+_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = {
+    profile: _build_taxonomy_profile_schema(profile, config)
+    for profile, config in CATEGORY_TAXONOMY_PROFILES.items()
 }

 _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS: Dict[str, Tuple[Tuple[str, str], ...]] = {
-    "apparel": _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP,
+    profile: tuple((field["key"], field["label"]) for field in config["fields"])
+    for profile, config in CATEGORY_TAXONOMY_PROFILES.items()
 }


+def get_supported_category_taxonomy_profiles() -> Tuple[str, ...]:
+    return tuple(_CATEGORY_TAXONOMY_PROFILE_SCHEMAS.keys())
+
+
+def _normalize_category_hint(text: Any) -> str:
+    value = str(text or "").strip().lower()
+    if not value:
+        return ""
+    value = value.replace("_", " ").replace(">", " ").replace("/", " ")
+    value = re.sub(r"\s+", " ", value)
+    return value
+
+
+_CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: Tuple[Tuple[str, str], ...] = tuple(
+    sorted(
+        (
+            (_normalize_category_hint(alias), profile)
+            for profile, config in CATEGORY_TAXONOMY_PROFILES.items()
+            for alias in (profile, *tuple(config.get("aliases") or ()))
+            if _normalize_category_hint(alias)
+        ),
+        key=lambda item: len(item[0]),
+        reverse=True,
+    )
+)
+
+
 def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str] = None) -> str:
     profile = str(category_taxonomy_profile or _DEFAULT_CATEGORY_TAXONOMY_PROFILE).strip()
     if profile not in _CATEGORY_TAXONOMY_PROFILE_SCHEMAS:
-        raise ValueError(f"Unsupported category_taxonomy_profile: {profile}")
+        supported = ", ".join(get_supported_category_taxonomy_profiles())
+        raise ValueError(
+            f"Unsupported category_taxonomy_profile: {profile}. Supported profiles: {supported}"
+        )
     return profile


+def detect_category_taxonomy_profile(item: Dict[str, Any]) -> Optional[str]:
+    """
+    Guess the taxonomy profile from the category information the product already has.
+    Returns None when nothing matches; the caller decides whether to fall back to the default profile.
+    """
+    category_hints = (
+        item.get("category_taxonomy_profile"),
+        item.get("category1_name"),
+        item.get("category_name_text"),
+        item.get("category"),
+        item.get("category_path"),
+    )
+    for hint in category_hints:
+        normalized_hint = _normalize_category_hint(hint)
+        if not normalized_hint:
+            continue
+        for alias, profile in _CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS:
+            if alias and alias in normalized_hint:
+                return profile
+    return None
+
+
+def _resolve_category_taxonomy_profile(
+    item: Dict[str, Any],
+    fallback_profile: Optional[str] = None,
+) -> str:
+    explicit_profile = str(item.get("category_taxonomy_profile") or "").strip()
+    if explicit_profile:
+        return _normalize_category_taxonomy_profile(explicit_profile)
+    detected_profile = detect_category_taxonomy_profile(item)
+    if detected_profile:
+        return detected_profile
+    return _normalize_category_taxonomy_profile(fallback_profile)
+
+
 def _get_analysis_schema(
     analysis_kind: str,
     *,
@@ -299,6 +342,17 @@ def _get_taxonomy_attribute_field_map(
     return _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS[profile]


+def _get_analysis_output_languages(
+    analysis_kind: str,
+    *,
+    category_taxonomy_profile: Optional[str] = None,
+) -> Tuple[str, ...]:
+    return _get_analysis_schema(
+        analysis_kind,
+        category_taxonomy_profile=category_taxonomy_profile,
+    ).output_languages
+
+
 def _normalize_enrichment_scopes(
     enrichment_scopes: Optional[List[str]] = None,
 ) -> Tuple[str, ...]:
@@ -508,6 +562,11 @@ def _normalize_index_content_item(item: Dict[str, Any]) -> Dict[str, str]:
         "brief": str(item.get("brief") or "").strip(),
         "description": str(item.get("description") or "").strip(),
         "image_url": str(item.get("image_url") or "").strip(),
+        "category": str(item.get("category") or "").strip(),
+        "category_path": str(item.get("category_path") or "").strip(),
+        "category_name_text": str(item.get("category_name_text") or "").strip(),
+        "category1_name": str(item.get("category1_name") or "").strip(),
+        "category_taxonomy_profile": str(item.get("category_taxonomy_profile") or "").strip(),
     }


@@ -525,7 +584,8 @@ def build_index_content_fields(
     - `title`
     - optional `brief` / `description` / `image_url`
     - optional `enrichment_scopes`; runs both `generic` and `category_taxonomy` by default
-    - optional `category_taxonomy_profile`; defaults to `apparel`
+    - optional `category_taxonomy_profile`; if omitted, the profile is first inferred from the item's own category fields, otherwise it falls back to the default `apparel`
+    - optional category hint fields: `category` / `category_path` / `category_name_text` / `category1_name`

     Structure of each returned item:
     - `id`
@@ -540,10 +600,21 @@ def build_index_content_fields(
     - `enriched_tags.{lang}` is an array of tags
     """
     requested_enrichment_scopes = _normalize_enrichment_scopes(enrichment_scopes)
-    normalized_taxonomy_profile = _normalize_category_taxonomy_profile(category_taxonomy_profile)
+    fallback_taxonomy_profile = (
+        _normalize_category_taxonomy_profile(category_taxonomy_profile)
+        if category_taxonomy_profile
+        else None
+    )
     normalized_items = [_normalize_index_content_item(item) for item in items]
     if not normalized_items:
         return []
+    taxonomy_profile_by_id = {
+        item["id"]: _resolve_category_taxonomy_profile(
+            item,
+            fallback_profile=fallback_taxonomy_profile,
+        )
+        for item in normalized_items
+    }

     results_by_id: Dict[str, Dict[str, Any]] = {
         item["id"]: {
@@ -556,7 +627,7 @@ def build_index_content_fields(
         for item in normalized_items
     }

-    for lang in _CORE_INDEX_LANGUAGES:
+    for lang in _get_analysis_output_languages("content"):
         if "generic" in requested_enrichment_scopes:
             try:
                 rows = analyze_products(
@@ -565,7 +636,7 @@ def build_index_content_fields(
                     batch_size=BATCH_SIZE,
                     tenant_id=tenant_id,
                     analysis_kind="content",
-                    category_taxonomy_profile=normalized_taxonomy_profile,
+                    category_taxonomy_profile=fallback_taxonomy_profile,
                 )
             except Exception as e:
                 logger.warning("build_index_content_fields content enrichment failed for lang=%s: %s", lang, e)
@@ -582,39 +653,49 @@ def build_index_content_fields(
                     continue
                 _apply_index_content_row(results_by_id[item_id], row=row, lang=lang)

-        if "category_taxonomy" in requested_enrichment_scopes:
-            try:
-                taxonomy_rows = analyze_products(
-                    products=normalized_items,
-                    target_lang=lang,
-                    batch_size=BATCH_SIZE,
-                    tenant_id=tenant_id,
-                    analysis_kind="taxonomy",
-                    category_taxonomy_profile=normalized_taxonomy_profile,
-                )
-            except Exception as e:
-                logger.warning(
-                    "build_index_content_fields taxonomy enrichment failed for lang=%s: %s",
-                    lang,
-                    e,
-                )
-                for item in normalized_items:
-                    results_by_id[item["id"]].setdefault("error", str(e))
-                continue
+    if "category_taxonomy" in requested_enrichment_scopes:
+        items_by_profile: Dict[str, List[Dict[str, str]]] = {}
+        for item in normalized_items:
+            items_by_profile.setdefault(taxonomy_profile_by_id[item["id"]], []).append(item)

-            for row in taxonomy_rows or []:
-                item_id = str(row.get("id") or "").strip()
-                if not item_id or item_id not in results_by_id:
-                    continue
-                if row.get("error"):
-                    results_by_id[item_id].setdefault("error", row["error"])
+        for taxonomy_profile, profile_items in items_by_profile.items():
+            for lang in _get_analysis_output_languages(
+                "taxonomy",
+                category_taxonomy_profile=taxonomy_profile,
+            ):
+                try:
+                    taxonomy_rows = analyze_products(
+                        products=profile_items,
+                        target_lang=lang,
+                        batch_size=BATCH_SIZE,
+                        tenant_id=tenant_id,
+                        analysis_kind="taxonomy",
+                        category_taxonomy_profile=taxonomy_profile,
+                    )
+                except Exception as e:
+                    logger.warning(
+                        "build_index_content_fields taxonomy enrichment failed for profile=%s lang=%s: %s",
+                        taxonomy_profile,
+                        lang,
+                        e,
+                    )
+                    for item in profile_items:
+                        results_by_id[item["id"]].setdefault("error", str(e))
                     continue
-                _apply_index_taxonomy_row(
-                    results_by_id[item_id],
-                    row=row,
-                    lang=lang,
-                    category_taxonomy_profile=normalized_taxonomy_profile,
-                )
+
+                for row in taxonomy_rows or []:
+                    item_id = str(row.get("id") or "").strip()
+                    if not item_id or item_id not in results_by_id:
+                        continue
+                    if row.get("error"):
+                        results_by_id[item_id].setdefault("error", row["error"])
+                        continue
+                    _apply_index_taxonomy_row(
+                        results_by_id[item_id],
+                        row=row,
+                        lang=lang,
+                        category_taxonomy_profile=taxonomy_profile,
+                    )

     return [results_by_id[item["id"]] for item in normalized_items]

indexer/product_enrich_prompts.py
 #!/usr/bin/env python3

-from typing import Any, Dict
+from typing import Any, Dict, Tuple

 SYSTEM_MESSAGE = (
     "You are an e-commerce product annotator. "
@@ -33,110 +33,362 @@ Input product list:
 USER_INSTRUCTION_TEMPLATE = """Please strictly return a Markdown table following the given columns in the specified language. For any column containing multiple values, separate them with commas. Do not add any other explanation.
 Language: {language}"""

-TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = """Analyze each input product text and fill the columns below using an apparel attribute taxonomy.
+def _taxonomy_field(
+    key: str,
+    label: str,
+    description: str,
+    zh_label: str | None = None,
+) -> Dict[str, str]:
+    return {
+        "key": key,
+        "label": label,
+        "description": description,
+        "zh_label": zh_label or label,
+    }

-Output columns:
-1. Product Type: concise ecommerce apparel category label, not a full marketing title
-2. Target Gender: intended gender only if clearly implied
-3. Age Group: only if clearly implied, e.g. adults, kids, teens, toddlers, babies
-4. Season: season(s) or all-season suitability only if supported
-5. Fit: body closeness, e.g. slim, regular, relaxed, oversized, fitted
-6. Silhouette: overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg
-7. Neckline: neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck
-8. Sleeve Length Type: sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve
-9. Sleeve Style: sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve
-10. Strap Type: strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap
-11. Rise / Waistline: waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist
-12. Leg Shape: for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg
-13. Skirt Shape: for skirts only, e.g. A-line, pleated, pencil, mermaid
-14. Length Type: design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length
-15. Closure Type: fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop
-16. Design Details: construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem
-17. Fabric: fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill
-18. Material Composition: fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane
-19. Fabric Properties: inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant
-20. Clothing Features: product features, e.g. lined, reversible, hooded, packable, padded, pocketed
-21. Functional Benefits: wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression
-22. Color: specific color name when available
-23. Color Family: normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray
-24. Print / Pattern: surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print
63 -25. Occasion / End Use: likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor  
64 -26. Style Aesthetic: overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful  
65 49
66 -Rules:  
67 -- Keep the same row order and row count as input.  
68 -- Infer only from the provided product text.  
69 -- Leave blank if not applicable or not reasonably supported.  
70 -- Use concise, standardized ecommerce wording.  
71 -- Do not combine different attribute dimensions in one field.  
72 -- If multiple values are needed, use the delimiter required by the localization setting. 50 +def _build_taxonomy_shared_instruction(profile_label: str, fields: Tuple[Dict[str, str], ...]) -> str:
  51 + lines = [
  52 + f"Analyze each input product text and fill the columns below using a {profile_label} attribute taxonomy.",
  53 + "",
  54 + "Output columns:",
  55 + ]
  56 + for idx, field in enumerate(fields, start=1):
  57 + lines.append(f"{idx}. {field['label']}: {field['description']}")
  58 + lines.extend(
  59 + [
  60 + "",
  61 + "Rules:",
  62 + "- Keep the same row order and row count as input.",
  63 + "- Infer only from the provided product text.",
  64 + "- Leave blank if not applicable or not reasonably supported.",
  65 + "- Use concise, standardized ecommerce wording.",
  66 + "- Do not combine different attribute dimensions in one field.",
  67 + "- If multiple values are needed, use the delimiter required by the localization setting.",
  68 + "",
  69 + "Input product list:",
  70 + ]
  71 + )
  72 + return "\n".join(lines)
73 73
74 -Input product list:  
75 -"""  
76 74
77 -TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = [  
78 - "No.",  
79 - "Product Type",  
80 - "Target Gender",  
81 - "Age Group",  
82 - "Season",  
83 - "Fit",  
84 - "Silhouette",  
85 - "Neckline",  
86 - "Sleeve Length Type",  
87 - "Sleeve Style",  
88 - "Strap Type",  
89 - "Rise / Waistline",  
90 - "Leg Shape",  
91 - "Skirt Shape",  
92 - "Length Type",  
93 - "Closure Type",  
94 - "Design Details",  
95 - "Fabric",  
96 - "Material Composition",  
97 - "Fabric Properties",  
98 - "Clothing Features",  
99 - "Functional Benefits",  
100 - "Color",  
101 - "Color Family",  
102 - "Print / Pattern",  
103 - "Occasion / End Use",  
104 - "Style Aesthetic",  
105 -] 75 +def _make_taxonomy_profile(
  76 + profile_label: str,
  77 + fields: Tuple[Dict[str, str], ...],
  78 + *,
  79 + aliases: Tuple[str, ...],
  80 + output_languages: Tuple[str, ...] = ("en",),
  81 + zh_headers: Tuple[str, ...] = (),
  82 +) -> Dict[str, Any]:
  83 + headers = {"en": ["No.", *[field["label"] for field in fields]]}
  84 + if zh_headers:
  85 + headers["zh"] = ["序号", *zh_headers]
  86 + return {
  87 + "profile_label": profile_label,
  88 + "fields": fields,
  89 + "aliases": aliases,
  90 + "output_languages": output_languages,
  91 + "shared_instruction": _build_taxonomy_shared_instruction(profile_label, fields),
  92 + "markdown_table_headers": headers,
  93 + }
106 94
107 -TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = {  
108 - "en": TAXONOMY_MARKDOWN_TABLE_HEADERS_EN,  
109 - "zh": [  
110 - "序号",  
111 - "品类",  
112 - "目标性别",  
113 - "年龄段",  
114 - "适用季节",  
115 - "版型",  
116 - "廓形",  
117 - "领型",  
118 - "袖长类型",  
119 - "袖型",  
120 - "肩带设计",  
121 - "腰型",  
122 - "裤型",  
123 - "裙型",  
124 - "长度类型",  
125 - "闭合方式",  
126 - "设计细节",  
127 - "面料",  
128 - "成分",  
129 - "面料特性",  
130 - "服装特征",  
131 - "功能",  
132 - "主颜色",  
133 - "色系",  
134 - "印花 / 图案",  
135 - "适用场景",  
136 - "风格",  
137 - ], 95 +
  96 +APPAREL_TAXONOMY_FIELDS = (
  97 + _taxonomy_field("product_type", "Product Type", "concise ecommerce apparel category label, not a full marketing title", "品类"),
  98 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"),
  99 + _taxonomy_field("age_group", "Age Group", "only if clearly implied, e.g. adults, kids, teens, toddlers, babies", "年龄段"),
  100 + _taxonomy_field("season", "Season", "season(s) or all-season suitability only if supported", "适用季节"),
  101 + _taxonomy_field("fit", "Fit", "body closeness, e.g. slim, regular, relaxed, oversized, fitted", "版型"),
  102 + _taxonomy_field("silhouette", "Silhouette", "overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg", "廓形"),
  103 + _taxonomy_field("neckline", "Neckline", "neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck", "领型"),
  104 + _taxonomy_field("sleeve_length_type", "Sleeve Length Type", "sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve", "袖长类型"),
  105 + _taxonomy_field("sleeve_style", "Sleeve Style", "sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve", "袖型"),
  106 + _taxonomy_field("strap_type", "Strap Type", "strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap", "肩带设计"),
  107 + _taxonomy_field("rise_waistline", "Rise / Waistline", "waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist", "腰型"),
  108 + _taxonomy_field("leg_shape", "Leg Shape", "for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg", "裤型"),
  109 + _taxonomy_field("skirt_shape", "Skirt Shape", "for skirts only, e.g. A-line, pleated, pencil, mermaid", "裙型"),
  110 + _taxonomy_field("length_type", "Length Type", "design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length", "长度类型"),
  111 + _taxonomy_field("closure_type", "Closure Type", "fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop", "闭合方式"),
  112 + _taxonomy_field("design_details", "Design Details", "construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem", "设计细节"),
  113 + _taxonomy_field("fabric", "Fabric", "fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill", "面料"),
  114 + _taxonomy_field("material_composition", "Material Composition", "fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane", "成分"),
  115 + _taxonomy_field("fabric_properties", "Fabric Properties", "inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant", "面料特性"),
  116 + _taxonomy_field("clothing_features", "Clothing Features", "product features, e.g. lined, reversible, hooded, packable, padded, pocketed", "服装特征"),
  117 + _taxonomy_field("functional_benefits", "Functional Benefits", "wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression", "功能"),
  118 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  119 + _taxonomy_field("color_family", "Color Family", "normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray", "色系"),
  120 + _taxonomy_field("print_pattern", "Print / Pattern", "surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print", "印花 / 图案"),
  121 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor", "适用场景"),
  122 + _taxonomy_field("style_aesthetic", "Style Aesthetic", "overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful", "风格"),
  123 +)
  124 +
  125 +THREE_C_TAXONOMY_FIELDS = (
  126 + _taxonomy_field("product_type", "Product Type", "concise 3C accessory or peripheral category label"),
  127 + _taxonomy_field("compatible_device", "Compatible Device / Model", "supported device family, series, model, or form factor when clearly stated"),
  128 + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, wireless, Bluetooth, Wi-Fi, NFC, or 2.4G"),
  129 + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant connector or port, e.g. USB-C, Lightning, HDMI, AUX, RJ45"),
  130 + _taxonomy_field("power_charging", "Power Source / Charging", "charging or power mode, e.g. battery powered, fast charging, rechargeable, plug-in"),
  131 + _taxonomy_field("key_features", "Key Features", "primary hardware features such as noise cancelling, foldable, magnetic, backlit, waterproof"),
  132 + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"),
  133 + _taxonomy_field("color", "Color", "specific color name when available"),
  134 + _taxonomy_field("pack_size", "Pack Size", "unit count or bundle size when stated"),
  135 + _taxonomy_field("use_case", "Use Case", "intended usage such as travel, office, gaming, car, charging, streaming"),
  136 +)
  137 +
  138 +BAGS_TAXONOMY_FIELDS = (
  139 + _taxonomy_field("product_type", "Product Type", "concise bag category such as backpack, tote bag, crossbody bag, luggage, or wallet"),
  140 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
  141 + _taxonomy_field("carry_style", "Carry Style", "how the bag is worn or carried, e.g. handheld, shoulder, crossbody, backpack"),
  142 + _taxonomy_field("size_capacity", "Size / Capacity", "size tier or capacity when supported, e.g. mini, large capacity, 20L"),
  143 + _taxonomy_field("material", "Material", "main bag material such as leather, nylon, canvas, PU, straw"),
  144 + _taxonomy_field("closure_type", "Closure Type", "bag closure such as zipper, flap, buckle, drawstring, magnetic snap"),
  145 + _taxonomy_field("structure_compartments", "Structure / Compartments", "organizational structure such as multi-pocket, laptop sleeve, card slots, expandable"),
  146 + _taxonomy_field("strap_handle_type", "Strap / Handle Type", "strap or handle design such as chain strap, top handle, adjustable strap"),
  147 + _taxonomy_field("color", "Color", "specific color name when available"),
  148 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as commute, travel, evening, school, casual"),
  149 +)
  150 +
  151 +PET_SUPPLIES_TAXONOMY_FIELDS = (
  152 + _taxonomy_field("product_type", "Product Type", "concise pet supplies category label"),
  153 + _taxonomy_field("pet_type", "Pet Type", "target pet such as dog, cat, bird, fish, hamster"),
  154 + _taxonomy_field("breed_size", "Breed Size", "pet size or breed size when stated, e.g. small breed, large dogs"),
  155 + _taxonomy_field("life_stage", "Life Stage", "pet age stage when supported, e.g. puppy, kitten, adult, senior"),
  156 + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredient composition when supported"),
  157 + _taxonomy_field("flavor_scent", "Flavor / Scent", "flavor or scent when applicable"),
  158 + _taxonomy_field("key_features", "Key Features", "primary attributes such as interactive, leak-proof, orthopedic, washable, elevated"),
  159 + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as dental care, calming, digestion support, joint support"),
  160 + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, or net content when stated"),
  161 + _taxonomy_field("use_scenario", "Use Scenario", "usage such as feeding, training, grooming, travel, indoor play"),
  162 +)
  163 +
  164 +ELECTRONICS_TAXONOMY_FIELDS = (
  165 + _taxonomy_field("product_type", "Product Type", "concise electronics device or component category label"),
  166 + _taxonomy_field("device_category", "Device Category / Compatibility", "supported platform, component class, or compatible device family when stated"),
  167 + _taxonomy_field("power_voltage", "Power / Voltage", "power, voltage, wattage, or battery spec when supported"),
  168 + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, Bluetooth, Wi-Fi, RF, or smart app control"),
  169 + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant port or interface such as USB-C, AC plug type, HDMI, SATA"),
  170 + _taxonomy_field("capacity_storage", "Capacity / Storage", "capacity or storage spec such as 256GB, 2TB, 5000mAh"),
  171 + _taxonomy_field("key_features", "Key Features", "main product features such as touch control, HD display, noise reduction, smart control"),
  172 + _taxonomy_field("material_finish", "Material / Finish", "main housing material or finish when supported"),
  173 + _taxonomy_field("color", "Color", "specific color name when available"),
  174 + _taxonomy_field("use_case", "Use Case", "intended use such as home entertainment, office, charging, security, repair"),
  175 +)
  176 +
  177 +OUTDOOR_TAXONOMY_FIELDS = (
  178 + _taxonomy_field("product_type", "Product Type", "concise outdoor gear category label"),
  179 + _taxonomy_field("activity_type", "Activity Type", "primary outdoor activity such as camping, hiking, fishing, climbing, travel"),
  180 + _taxonomy_field("season_weather", "Season / Weather", "season or weather suitability when supported"),
  181 + _taxonomy_field("material", "Material", "main material such as aluminum, ripstop nylon, stainless steel, EVA"),
  182 + _taxonomy_field("capacity_size", "Capacity / Size", "size, length, or capacity when stated"),
  183 + _taxonomy_field("protection_resistance", "Protection / Resistance", "resistance or protection such as waterproof, UV resistant, windproof"),
  184 + _taxonomy_field("key_features", "Key Features", "primary gear attributes such as foldable, lightweight, insulated, non-slip"),
  185 + _taxonomy_field("portability_packability", "Portability / Packability", "carry or storage trait such as collapsible, compact, ultralight, packable"),
  186 + _taxonomy_field("color", "Color", "specific color name when available"),
  187 + _taxonomy_field("use_scenario", "Use Scenario", "likely use setting such as campsite, trail, survival kit, beach, picnic"),
  188 +)
  189 +
  190 +HOME_APPLIANCES_TAXONOMY_FIELDS = (
  191 + _taxonomy_field("product_type", "Product Type", "concise home appliance category label"),
  192 + _taxonomy_field("appliance_category", "Appliance Category", "functional class such as kitchen appliance, cleaning appliance, personal care appliance"),
  193 + _taxonomy_field("power_voltage", "Power / Voltage", "wattage, voltage, plug type, or power supply when supported"),
  194 + _taxonomy_field("capacity_coverage", "Capacity / Coverage", "capacity or coverage metric such as 1.5L, 20L, 40sqm"),
  195 + _taxonomy_field("control_method", "Control Method", "operation method such as touch, knob, remote, app control"),
  196 + _taxonomy_field("installation_type", "Installation Type", "setup style such as countertop, handheld, portable, wall-mounted, built-in"),
  197 + _taxonomy_field("key_features", "Key Features", "main product features such as timer, steam, HEPA filter, self-cleaning"),
  198 + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"),
  199 + _taxonomy_field("color", "Color", "specific color name when available"),
  200 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as cooking, cleaning, grooming, cooling, air treatment"),
  201 +)
  202 +
  203 +HOME_LIVING_TAXONOMY_FIELDS = (
  204 + _taxonomy_field("product_type", "Product Type", "concise home and living category label"),
  205 + _taxonomy_field("room_placement", "Room / Placement", "intended room or placement such as bedroom, kitchen, bathroom, desktop"),
  206 + _taxonomy_field("material", "Material", "main material such as wood, ceramic, cotton, glass, metal"),
  207 + _taxonomy_field("style", "Style", "home style such as modern, farmhouse, minimalist, boho, Nordic"),
  208 + _taxonomy_field("size_dimensions", "Size / Dimensions", "size or dimensions when stated"),
  209 + _taxonomy_field("color", "Color", "specific color name when available"),
  210 + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface pattern or finish such as solid, marble, matte, ribbed"),
  211 + _taxonomy_field("key_features", "Key Features", "main product features such as stackable, washable, blackout, space-saving"),
  212 + _taxonomy_field("assembly_installation", "Assembly / Installation", "assembly or installation trait when supported"),
  213 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as storage, dining, decor, sleep, organization"),
  214 +)
  215 +
  216 +WIGS_TAXONOMY_FIELDS = (
  217 + _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label"),
  218 + _taxonomy_field("hair_material", "Hair Material", "hair material such as human hair, synthetic fiber, heat-resistant fiber"),
  219 + _taxonomy_field("hair_texture", "Hair Texture", "texture or curl pattern such as straight, body wave, curly, kinky"),
  220 + _taxonomy_field("hair_length", "Hair Length", "hair length when stated"),
  221 + _taxonomy_field("hair_color", "Hair Color", "specific hair color or blend when available"),
  222 + _taxonomy_field("cap_construction", "Cap Construction", "cap type such as full lace, lace front, glueless, U part"),
  223 + _taxonomy_field("lace_area_part_type", "Lace Area / Part Type", "lace size or part style such as 13x4 lace, middle part, T part"),
  224 + _taxonomy_field("density_volume", "Density / Volume", "hair density or fullness when supported"),
  225 + _taxonomy_field("style_bang_type", "Style / Bang Type", "style cue such as bob, pixie, layered, with bangs"),
  226 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "intended use such as daily wear, cosplay, protective style, party"),
  227 +)
  228 +
  229 +BEAUTY_TAXONOMY_FIELDS = (
  230 + _taxonomy_field("product_type", "Product Type", "concise beauty or cosmetics category label"),
  231 + _taxonomy_field("target_area", "Target Area", "target area such as face, lips, eyes, nails, hair, body"),
  232 + _taxonomy_field("skin_hair_type", "Skin Type / Hair Type", "suitable skin or hair type when supported"),
  233 + _taxonomy_field("finish_effect", "Finish / Effect", "cosmetic finish or effect such as matte, dewy, volumizing, brightening"),
  234 + _taxonomy_field("key_ingredients", "Key Ingredients", "notable ingredients when stated"),
  235 + _taxonomy_field("shade_color", "Shade / Color", "specific shade or color when available"),
  236 + _taxonomy_field("scent", "Scent", "fragrance or scent only when supported"),
  237 + _taxonomy_field("formulation", "Formulation", "product form such as cream, serum, powder, gel, stick"),
  238 + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as hydration, anti-aging, long-wear, repair, sun protection"),
  239 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as daily routine, salon, travel, evening makeup"),
  240 +)
  241 +
  242 +ACCESSORIES_TAXONOMY_FIELDS = (
  243 + _taxonomy_field("product_type", "Product Type", "concise accessory category label such as necklace, watch, belt, hat, or sunglasses"),
  244 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
  245 + _taxonomy_field("material", "Material", "main material such as alloy, leather, stainless steel, acetate, fabric"),
  246 + _taxonomy_field("color", "Color", "specific color name when available"),
  247 + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface treatment or style finish such as polished, textured, braided, rhinestone"),
  248 + _taxonomy_field("closure_fastening", "Closure / Fastening", "fastening method when applicable"),
  249 + _taxonomy_field("size_fit", "Size / Fit", "size or fit information such as adjustable, one size, 42mm"),
  250 + _taxonomy_field("style", "Style", "style cue such as minimalist, vintage, statement, sporty"),
  251 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as daily wear, formal, party, travel, sun protection"),
  252 + _taxonomy_field("set_pack_size", "Set / Pack Size", "set count or pack size when stated"),
  253 +)
  254 +
  255 +TOYS_TAXONOMY_FIELDS = (
  256 + _taxonomy_field("product_type", "Product Type", "concise toy category label"),
  257 + _taxonomy_field("age_group", "Age Group", "intended age group when clearly implied"),
  258 + _taxonomy_field("character_theme", "Character / Theme", "licensed character, theme, or play theme when supported"),
  259 + _taxonomy_field("material", "Material", "main toy material such as plush, plastic, wood, silicone"),
  260 + _taxonomy_field("power_source", "Power Source", "battery, rechargeable, wind-up, or non-powered when supported"),
  261 + _taxonomy_field("interactive_features", "Interactive Features", "interactive functions such as sound, lights, remote control, motion"),
  262 + _taxonomy_field("educational_play_value", "Educational / Play Value", "play value such as STEM, pretend play, sensory, puzzle solving"),
  263 + _taxonomy_field("piece_count_size", "Piece Count / Size", "piece count or size when stated"),
  264 + _taxonomy_field("color", "Color", "specific color name when available"),
  265 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as indoor play, bath time, party favor, outdoor play"),
  266 +)
  267 +
  268 +SHOES_TAXONOMY_FIELDS = (
  269 + _taxonomy_field("product_type", "Product Type", "concise footwear category label"),
  270 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
  271 + _taxonomy_field("age_group", "Age Group", "only if clearly implied"),
  272 + _taxonomy_field("closure_type", "Closure Type", "fastening method such as lace-up, slip-on, buckle, hook-and-loop"),
  273 + _taxonomy_field("toe_shape", "Toe Shape", "toe shape when applicable, e.g. round toe, pointed toe, open toe"),
  274 + _taxonomy_field("heel_sole_type", "Heel Height / Sole Type", "heel or sole profile such as flat, block heel, wedge, platform, thick sole"),
  275 + _taxonomy_field("upper_material", "Upper Material", "main upper material such as leather, knit, canvas, mesh"),
  276 + _taxonomy_field("lining_insole_material", "Lining / Insole Material", "lining or insole material when supported"),
  277 + _taxonomy_field("color", "Color", "specific color name when available"),
  278 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as running, casual, office, hiking, formal"),
  279 +)
  280 +
  281 +SPORTS_TAXONOMY_FIELDS = (
  282 + _taxonomy_field("product_type", "Product Type", "concise sports product category label"),
  283 + _taxonomy_field("sport_activity", "Sport / Activity", "primary sport or activity such as fitness, yoga, basketball, cycling, swimming"),
  284 + _taxonomy_field("skill_level", "Skill Level", "target user level when supported, e.g. beginner, training, professional"),
  285 + _taxonomy_field("material", "Material", "main material such as EVA, carbon fiber, neoprene, latex"),
  286 + _taxonomy_field("size_capacity", "Size / Capacity", "size, weight, resistance level, or capacity when stated"),
  287 + _taxonomy_field("protection_support", "Protection / Support", "support or protection function such as ankle support, shock absorption, impact protection"),
  288 + _taxonomy_field("key_features", "Key Features", "main features such as anti-slip, adjustable, foldable, quick-dry"),
  289 + _taxonomy_field("power_source", "Power Source", "battery, electric, or non-powered when applicable"),
  290 + _taxonomy_field("color", "Color", "specific color name when available"),
  291 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as gym, home workout, field training, competition"),
  292 +)
  293 +
  294 +OTHERS_TAXONOMY_FIELDS = (
  295 + _taxonomy_field("product_type", "Product Type", "concise product category label, not a full marketing title"),
  296 + _taxonomy_field("product_category", "Product Category", "broader retail grouping when the specific product type is narrow"),
  297 + _taxonomy_field("target_user", "Target User", "intended user, audience, or recipient when clearly implied"),
  298 + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredients when supported"),
  299 + _taxonomy_field("key_features", "Key Features", "primary product attributes or standout features"),
  300 + _taxonomy_field("functional_benefits", "Functional Benefits", "practical benefits or performance advantages when supported"),
  301 + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, weight, or capacity when stated"),
  302 + _taxonomy_field("color", "Color", "specific color name when available"),
  303 + _taxonomy_field("style_theme", "Style / Theme", "overall style, design theme, or visual direction when supported"),
  304 + _taxonomy_field("use_scenario", "Use Scenario", "likely use occasion or application setting when supported"),
  305 +)
  306 +
  307 +CATEGORY_TAXONOMY_PROFILES: Dict[str, Dict[str, Any]] = {
  308 + "apparel": _make_taxonomy_profile(
  309 + "apparel",
  310 + APPAREL_TAXONOMY_FIELDS,
  311 + aliases=("服装", "服饰", "apparel", "clothing", "fashion"),
  312 + output_languages=("zh", "en"),
  313 + zh_headers=tuple(field["zh_label"] for field in APPAREL_TAXONOMY_FIELDS),
  314 + ),
  315 + "3c": _make_taxonomy_profile(
  316 + "3C",
  317 + THREE_C_TAXONOMY_FIELDS,
  318 + aliases=("3c", "数码", "phone accessories", "computer peripherals", "smart wearables", "audio", "gaming gear"),
  319 + ),
  320 + "bags": _make_taxonomy_profile(
  321 + "bags",
  322 + BAGS_TAXONOMY_FIELDS,
  323 + aliases=("bags", "bag", "包", "箱包", "handbag", "backpack", "wallet", "luggage"),
  324 + ),
  325 + "pet_supplies": _make_taxonomy_profile(
  326 + "pet supplies",
  327 + PET_SUPPLIES_TAXONOMY_FIELDS,
  328 + aliases=("pet", "宠物", "pet supplies", "pet food", "pet toys", "pet care"),
  329 + ),
  330 + "electronics": _make_taxonomy_profile(
  331 + "electronics",
  332 + ELECTRONICS_TAXONOMY_FIELDS,
  333 + aliases=("electronics", "电子", "electronic components", "consumer electronics", "digital devices"),
  334 + ),
  335 + "outdoor": _make_taxonomy_profile(
  336 + "outdoor products",
  337 + OUTDOOR_TAXONOMY_FIELDS,
  338 + aliases=("outdoor", "户外", "camping", "hiking", "fishing", "travel accessories"),
  339 + ),
  340 + "home_appliances": _make_taxonomy_profile(
  341 + "home appliances",
  342 + HOME_APPLIANCES_TAXONOMY_FIELDS,
  343 + aliases=("home appliances", "家电", "电器", "kitchen appliances", "cleaning appliances", "smart home devices"),
  344 + ),
  345 + "home_living": _make_taxonomy_profile(
  346 + "home and living",
  347 + HOME_LIVING_TAXONOMY_FIELDS,
  348 + aliases=("home", "living", "家居", "家具", "家纺", "home decor", "kitchenware"),
  349 + ),
  350 + "wigs": _make_taxonomy_profile(
  351 + "wigs",
  352 + WIGS_TAXONOMY_FIELDS,
  353 + aliases=("wig", "wigs", "假发", "hairpiece"),
  354 + ),
  355 + "beauty": _make_taxonomy_profile(
  356 + "beauty and cosmetics",
  357 + BEAUTY_TAXONOMY_FIELDS,
  358 + aliases=("beauty", "cosmetics", "美容", "美妆", "makeup", "skincare", "nail care"),
  359 + ),
  360 + "accessories": _make_taxonomy_profile(
  361 + "accessories",
  362 + ACCESSORIES_TAXONOMY_FIELDS,
  363 + aliases=("accessories", "配饰", "jewelry", "watches", "belts", "scarves", "hats", "sunglasses"),
  364 + ),
  365 + "toys": _make_taxonomy_profile(
  366 + "toys",
  367 + TOYS_TAXONOMY_FIELDS,
  368 + aliases=("toys", "toy", "玩具", "plush", "action figures", "puzzles", "educational toys"),
  369 + ),
  370 + "shoes": _make_taxonomy_profile(
  371 + "shoes",
  372 + SHOES_TAXONOMY_FIELDS,
  373 + aliases=("shoes", "shoe", "鞋", "sneakers", "boots", "sandals", "heels"),
  374 + ),
  375 + "sports": _make_taxonomy_profile(
  376 + "sports products",
  377 + SPORTS_TAXONOMY_FIELDS,
  378 + aliases=("sports", "sport", "运动", "fitness", "cycling", "team sports", "water sports"),
  379 + ),
  380 + "others": _make_taxonomy_profile(
  381 + "general merchandise",
  382 + OTHERS_TAXONOMY_FIELDS,
  383 + aliases=("others", "other", "其他", "general merchandise"),
  384 + ),
138 } 385 }
139 386
  387 +CATEGORY_TAXONOMY_PROFILE_NAMES = tuple(CATEGORY_TAXONOMY_PROFILES.keys())
  388 +TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = CATEGORY_TAXONOMY_PROFILES["apparel"]["shared_instruction"]
  389 +TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]["en"]
  390 +TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]
  391 +
140 LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { 392 LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = {
141 "en": [ 393 "en": [
142 "No.", 394 "No.",
indexer/taxonomy.md
@@ -171,3 +171,27 @@ Rules:
171 Input product list: 171 Input product list:
172 """ 172 """
173 ``` 173 ```
  174 +
  175 +## 2. Other taxonomy profiles
  176 +
  177 +Notes:
  178 +- `apparel` continues to return `zh` + `en`.
  179 +- All other profiles return `en` only and define English column names only.
  180 +- The profile slugs used in the code match those listed below.
  181 +
  182 +| Profile | Core columns (`en`) |
  183 +| --- | --- |
  184 +| `3c` | Product Type, Compatible Device / Model, Connectivity, Interface / Port Type, Power Source / Charging, Key Features, Material / Finish, Color, Pack Size, Use Case |
  185 +| `bags` | Product Type, Target Gender, Carry Style, Size / Capacity, Material, Closure Type, Structure / Compartments, Strap / Handle Type, Color, Occasion / End Use |
  186 +| `pet_supplies` | Product Type, Pet Type, Breed Size, Life Stage, Material / Ingredients, Flavor / Scent, Key Features, Functional Benefits, Size / Capacity, Use Scenario |
  187 +| `electronics` | Product Type, Device Category / Compatibility, Power / Voltage, Connectivity, Interface / Port Type, Capacity / Storage, Key Features, Material / Finish, Color, Use Case |
  188 +| `outdoor` | Product Type, Activity Type, Season / Weather, Material, Capacity / Size, Protection / Resistance, Key Features, Portability / Packability, Color, Use Scenario |
  189 +| `home_appliances` | Product Type, Appliance Category, Power / Voltage, Capacity / Coverage, Control Method, Installation Type, Key Features, Material / Finish, Color, Use Scenario |
  190 +| `home_living` | Product Type, Room / Placement, Material, Style, Size / Dimensions, Color, Pattern / Finish, Key Features, Assembly / Installation, Use Scenario |
  191 +| `wigs` | Product Type, Hair Material, Hair Texture, Hair Length, Hair Color, Cap Construction, Lace Area / Part Type, Density / Volume, Style / Bang Type, Occasion / End Use |
  192 +| `beauty` | Product Type, Target Area, Skin Type / Hair Type, Finish / Effect, Key Ingredients, Shade / Color, Scent, Formulation, Functional Benefits, Use Scenario |
  193 +| `accessories` | Product Type, Target Gender, Material, Color, Pattern / Finish, Closure / Fastening, Size / Fit, Style, Occasion / End Use, Set / Pack Size |
  194 +| `toys` | Product Type, Age Group, Character / Theme, Material, Power Source, Interactive Features, Educational / Play Value, Piece Count / Size, Color, Use Scenario |
  195 +| `shoes` | Product Type, Target Gender, Age Group, Closure Type, Toe Shape, Heel Height / Sole Type, Upper Material, Lining / Insole Material, Color, Occasion / End Use |
  196 +| `sports` | Product Type, Sport / Activity, Skill Level, Material, Size / Capacity, Protection / Support, Key Features, Power Source, Color, Use Scenario |
  197 +| `others` | Product Type, Product Category, Target User, Material / Ingredients, Key Features, Functional Benefits, Size / Capacity, Color, Style / Theme, Use Scenario |
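
The rules above (apparel keeps `zh` + `en`, every other profile returns `en` only) can be illustrated with a minimal sketch. It is not the commit's implementation: `PROFILE_COLUMNS`, `output_languages`, `column_key`, and `merge_language_rows` are hypothetical names, the registry is trimmed to two profiles, and the column-to-key mapping is an assumption inferred from the field names used in the tests (for example `Character / Theme` appearing as `character_theme`).

```python
import re
from typing import Dict, List

# Hypothetical, trimmed-down stand-in for the commit's profile registry.
PROFILE_COLUMNS: Dict[str, List[str]] = {
    "apparel": ["Product Type", "Target Gender"],
    "toys": ["Product Type", "Age Group", "Character / Theme"],
}


def output_languages(profile: str) -> List[str]:
    """Apparel stays bilingual; every other profile is English-only."""
    return ["zh", "en"] if profile == "apparel" else ["en"]


def column_key(column: str) -> str:
    """Assumed mapping from a column title to a snake_case row key."""
    return "_".join(re.findall(r"[a-z0-9]+", column.lower()))


def merge_language_rows(
    profile: str, rows_by_lang: Dict[str, Dict[str, str]]
) -> List[dict]:
    """Merge per-language rows into name/value attributes, skipping empty cells."""
    attributes = []
    for column in PROFILE_COLUMNS[profile]:
        key = column_key(column)
        value = {}
        for lang in output_languages(profile):
            raw = rows_by_lang.get(lang, {}).get(key, "")
            if raw:
                value[lang] = [raw]
        if value:
            attributes.append({"name": column, "value": value})
    return attributes


toy_attrs = merge_language_rows(
    "toys",
    {"en": {"product_type": "doll set", "age_group": "kids", "character_theme": ""}},
)
```

Because the language list is derived from the profile, a `zh` row supplied for a non-apparel profile is simply ignored in this sketch, which mirrors the "only `en`" rule in the notes.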
tests/ci/test_service_api_contracts.py
@@ -454,6 +454,52 @@ def test_indexer_enrich_content_contract_accepts_deprecated_analysis_kinds(index
454 454 assert data["category_taxonomy_profile"] == "apparel"
455 455
456 456
  457 +def test_indexer_enrich_content_contract_supports_non_apparel_taxonomy_profiles(indexer_client: TestClient, monkeypatch):
  458 +    import indexer.product_enrich as process_products
  459 +
  460 +    def _fake_build_index_content_fields(
  461 +        items: List[Dict[str, str]],
  462 +        tenant_id: str | None = None,
  463 +        enrichment_scopes: List[str] | None = None,
  464 +        category_taxonomy_profile: str = "apparel",
  465 +    ):
  466 +        assert tenant_id == "162"
  467 +        assert enrichment_scopes == ["category_taxonomy"]
  468 +        assert category_taxonomy_profile == "toys"
  469 +        return [
  470 +            {
  471 +                "id": items[0]["spu_id"],
  472 +                "qanchors": {},
  473 +                "enriched_tags": {},
  474 +                "enriched_attributes": [],
  475 +                "enriched_taxonomy_attributes": [
  476 +                    {"name": "Product Type", "value": {"en": ["doll set"]}},
  477 +                    {"name": "Age Group", "value": {"en": ["kids"]}},
  478 +                ],
  479 +            }
  480 +        ]
  481 +
  482 +    monkeypatch.setattr(process_products, "build_index_content_fields", _fake_build_index_content_fields)
  483 +
  484 +    response = indexer_client.post(
  485 +        "/indexer/enrich-content",
  486 +        json={
  487 +            "tenant_id": "162",
  488 +            "enrichment_scopes": ["category_taxonomy"],
  489 +            "category_taxonomy_profile": "toys",
  490 +            "items": [{"spu_id": "1001", "title": "Toy"}],
  491 +        },
  492 +    )
  493 +
  494 +    assert response.status_code == 200
  495 +    data = response.json()
  496 +    assert data["category_taxonomy_profile"] == "toys"
  497 +    assert data["results"][0]["enriched_taxonomy_attributes"] == [
  498 +        {"name": "Product Type", "value": {"en": ["doll set"]}},
  499 +        {"name": "Age Group", "value": {"en": ["kids"]}},
  500 +    ]
  501 +
  502 +
457 503 def test_indexer_documents_contract(indexer_client: TestClient):
458 504 """POST /indexer/documents: tenant_id + spu_ids, returns success/failed lists (no ES write)."""
459 505 response = indexer_client.post(
tests/test_product_enrich_partial_mode.py
@@ -500,7 +500,6 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output()
500 500 "style_aesthetic": "",
501 501 }
502 502 ]
503 - assert category_taxonomy_profile == "apparel"
504 503 return [
505 504 {
506 505 "id": products[0]["id"],
@@ -562,6 +561,120 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output()
562 561 ]
563 562
564 563
  564 +def test_detect_category_taxonomy_profile_matches_category_hints():
  565 +    assert product_enrich.detect_category_taxonomy_profile({"category1_name": "玩具"}) == "toys"
  566 +    assert product_enrich.detect_category_taxonomy_profile({"category": "Beauty & Cosmetics"}) == "beauty"
  567 +    assert product_enrich.detect_category_taxonomy_profile({"category_path": "Home Appliances / Kitchen"}) == "home_appliances"
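
For readers without the full source, the behaviour this test pins down can be approximated by a simple alias lookup. The sketch below is an assumption, not the body of `detect_category_taxonomy_profile`: the alias table, the function name `detect_profile_sketch`, the substring matching, and the `apparel` fallback are all hypothetical; only the three category fields and the expected slugs come from the test.

```python
from typing import Dict, Tuple

# Hypothetical alias table; the commit stores aliases per profile in its registry.
PROFILE_ALIASES: Dict[str, Tuple[str, ...]] = {
    "toys": ("toys", "toy", "玩具"),
    "beauty": ("beauty", "cosmetics"),
    "home_appliances": ("home appliances", "home appliance"),
}

# Category fields exercised by the test above.
CATEGORY_FIELDS = ("category", "category_path", "category1_name")


def detect_profile_sketch(item: dict, default: str = "apparel") -> str:
    """Return the first profile whose alias occurs in any category field."""
    text = " ".join(str(item.get(field) or "") for field in CATEGORY_FIELDS).lower()
    for profile, aliases in PROFILE_ALIASES.items():
        if any(alias in text for alias in aliases):
            return profile
    # Assumed fallback; the real default is not visible in this part of the diff.
    return default
```

Plain substring matching is the simplest choice here, but it over-matches short aliases (an alias such as `other` would also hit `mother`), so a production version would likely need token or boundary-aware matching.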
  568 +
  569 +
  570 +def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_apparel_returns_en_only():
  571 +    seen_calls = []
  572 +
  573 +    def fake_analyze_products(
  574 +        products,
  575 +        target_lang="zh",
  576 +        batch_size=None,
  577 +        tenant_id=None,
  578 +        analysis_kind="content",
  579 +        category_taxonomy_profile=None,
  580 +    ):
  581 +        seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products)))
  582 +        if analysis_kind == "taxonomy":
  583 +            if category_taxonomy_profile == "apparel":
  584 +                return [
  585 +                    {
  586 +                        "id": products[0]["id"],
  587 +                        "lang": target_lang,
  588 +                        "title_input": products[0]["title"],
  589 +                        "product_type": f"{target_lang}-dress",
  590 +                        "target_gender": f"{target_lang}-women",
  591 +                        "age_group": "",
  592 +                        "season": "",
  593 +                        "fit": "",
  594 +                        "silhouette": "",
  595 +                        "neckline": "",
  596 +                        "sleeve_length_type": "",
  597 +                        "sleeve_style": "",
  598 +                        "strap_type": "",
  599 +                        "rise_waistline": "",
  600 +                        "leg_shape": "",
  601 +                        "skirt_shape": "",
  602 +                        "length_type": "",
  603 +                        "closure_type": "",
  604 +                        "design_details": "",
  605 +                        "fabric": "",
  606 +                        "material_composition": "",
  607 +                        "fabric_properties": "",
  608 +                        "clothing_features": "",
  609 +                        "functional_benefits": "",
  610 +                        "color": "",
  611 +                        "color_family": "",
  612 +                        "print_pattern": "",
  613 +                        "occasion_end_use": "",
  614 +                        "style_aesthetic": "",
  615 +                    }
  616 +                ]
  617 +            assert category_taxonomy_profile == "toys"
  618 +            assert target_lang == "en"
  619 +            return [
  620 +                {
  621 +                    "id": products[0]["id"],
  622 +                    "lang": "en",
  623 +                    "title_input": products[0]["title"],
  624 +                    "product_type": "doll set",
  625 +                    "age_group": "kids",
  626 +                    "character_theme": "",
  627 +                    "material": "",
  628 +                    "power_source": "",
  629 +                    "interactive_features": "",
  630 +                    "educational_play_value": "",
  631 +                    "piece_count_size": "",
  632 +                    "color": "",
  633 +                    "use_scenario": "",
  634 +                }
  635 +            ]
  636 +
  637 +        return [
  638 +            {
  639 +                "id": product["id"],
  640 +                "lang": target_lang,
  641 +                "title_input": product["title"],
  642 +                "title": product["title"],
  643 +                "category_path": "",
  644 +                "tags": f"{target_lang}-tag",
  645 +                "target_audience": "",
  646 +                "usage_scene": "",
  647 +                "season": "",
  648 +                "key_attributes": "",
  649 +                "material": "",
  650 +                "features": "",
  651 +                "anchor_text": f"{target_lang}-anchor",
  652 +            }
  653 +            for product in products
  654 +        ]
  655 +
  656 +    with mock.patch.object(product_enrich, "analyze_products", side_effect=fake_analyze_products):
  657 +        result = product_enrich.build_index_content_fields(
  658 +            items=[
  659 +                {"spu_id": "1", "title": "dress", "category_taxonomy_profile": "apparel"},
  660 +                {"spu_id": "2", "title": "toy", "category_taxonomy_profile": "toys"},
  661 +            ],
  662 +            tenant_id="170",
  663 +            category_taxonomy_profile="apparel",
  664 +        )
  665 +
  666 +    assert result[0]["enriched_taxonomy_attributes"] == [
  667 +        {"name": "Product Type", "value": {"zh": ["zh-dress"], "en": ["en-dress"]}},
  668 +        {"name": "Target Gender", "value": {"zh": ["zh-women"], "en": ["en-women"]}},
  669 +    ]
  670 +    assert result[1]["enriched_taxonomy_attributes"] == [
  671 +        {"name": "Product Type", "value": {"en": ["doll set"]}},
  672 +        {"name": "Age Group", "value": {"en": ["kids"]}},
  673 +    ]
  674 +    assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls
  675 +    assert ("taxonomy", "en", "toys", ("2",)) in seen_calls
  676 +
  677 +
565 678 def test_anchor_cache_key_depends_on_product_input_not_identifiers():
566 679 product_a = {
567 680 "id": "1",