Commit dabd52a597ab5ba2fca31fe573a60040c4412906
1 parent
2703b6ea
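feat(indexer): support dynamic multi-category taxonomy adaptation and bilingual/en-only output control
This iteration substantially refactors the content enrichment module of the retrieval system. It extends the previously hard-coded apparel-only category to every category defined in `taxonomy.md`, tidies the code structure, and lowers the cost of adding new categories. The core design uses a profile registry, batches items grouped by category profile, and clearly separates the bilingual (zh+en) output strategy from the English-only (en) one.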
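【Changes】
1. Broader category coverage
- Newly supported categories: 3c, bags, pet_supplies, electronics, outdoor, home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others
- All new categories return only the en field at the taxonomy output stage, which avoids multilingual field bloat
- The apparel category keeps its bilingual output (zh + en) to stay compatible with existing business usage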
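2. Core code refactoring
- `indexer/product_enrich.py`
  - Added the `TAXONOMY_PROFILES` registry, which defines each category's output languages, prompt mapping, and taxonomy field set in a data-driven way
  - Rewrote `_enrich_taxonomy_batch` to call the LLM in batches grouped by profile, so no per-category branch has to be written
  - Introduced `_infer_profile_from_category()`, which infers the profile from an SPU's category fields (used on the internal indexing path; fixes mixed catalogs falling back to apparel by default)
- `indexer/product_enrich_prompts.py`
  - Refactored the single apparel prompt into a `PROMPT_TEMPLATES` dictionary that stores one prompt per profile
  - All non-apparel categories share one compact prompt template that asks for en output only
- `indexer/document_transformer.py`
  - Passes category information when building enrichment requests so that downstream code can route by profile
  - Adjusted `_build_enrich_batch` so that batch requests can contain mixed categories and are grouped correctly
- `indexer/indexer.py` (API layer)
  - The request model of the `/indexer/enrich-content` endpoint gains an optional `category_profile` field that lets callers specify the category explicitly; when it is omitted, the server infers it
  - Updated parameter validation and error handling, and added support for catch-all categories such as `others`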
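3. Documentation updates
- `docs/搜索API对接指南-05-索引接口(Indexer).md`: added a description of the category profile parameter and noted that non-apparel taxonomy output returns the en field only
- `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`: updated the enrichment microservice call example to reflect multi-category grouped batching
- `taxonomy.md`: added the field list for each category and stated that en is the only output for all non-apparel categories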
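The data-driven lookup described in item 2 can be sketched as follows. This is a minimal illustration, not the code from `indexer/product_enrich.py`: `PROFILES`, `output_languages`, and the field names are hypothetical, and only the language rule (apparel returns zh + en, every other profile returns en) is taken from the commit message.

```python
from typing import Dict, List, Tuple

# Hypothetical, trimmed-down registry. The field lists are placeholders,
# not the real taxonomy definitions from taxonomy.md.
PROFILES: Dict[str, Dict[str, List[str]]] = {
    "apparel": {"lang": ["zh", "en"], "fields": ["product_type", "fit"]},
    "3c": {"lang": ["en"], "fields": ["product_type", "connectivity"]},
    "others": {"lang": ["en"], "fields": ["product_type"]},
}


def output_languages(profile: str) -> Tuple[str, ...]:
    """Return the taxonomy output languages for a profile, or raise if it is unknown."""
    if profile not in PROFILES:
        supported = ", ".join(PROFILES)
        raise ValueError(f"Unsupported profile: {profile}. Supported: {supported}")
    return tuple(PROFILES[profile]["lang"])
```

Because the language rule lives in the registry entry, the calling code never has to test for a specific category name.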
【Technical details】
- **Registry design**:
```python
TAXONOMY_PROFILES = {
    "apparel": {"lang": ["zh", "en"], "prompt_key": "apparel", "fields": [...]},
    "3c": {"lang": ["en"], "prompt_key": "default", "fields": [...]},
    # ...
}
```
Adding a category only requires one new registry entry, plus a matching prompt_key in `PROMPT_TEMPLATES`; the control flow does not change.
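- **Batching grouped by profile**:
  - Before: all products were mixed together and sent through the same apparel prompt, so non-apparel products were filled with incorrect attributes.
  - After: `_enrich_taxonomy_batch` first groups products by profile, builds a separate LLM request for each group, and then merges the responses back in the original order. The grouping granularity is configurable, which avoids the request overhead of many small groups.
- **Automatic category inference**:
  - For internal indexing (cases that do not call the enrichment endpoint explicitly), `_infer_profile_from_category` parses the SPU's `category_l1/l2/l3` fields and maps them to the best-matching profile. The mapping rules use keyword matching (for example "手机" (mobile phone) -> "3c", "狗粮" (dog food) -> "pet_supplies"); when nothing matches, it falls back to `apparel` to keep the transition smooth.
- **Output field trimming**:
  - Because the `enriched_taxonomy_attributes.value` field in the Elasticsearch mapping stores a single value (not split by language), the LLM output for non-apparel categories is written directly to that field; apparel uses the dynamic templates `value.zh` and `value.en`. The `_apply_lang_output` function handles both cases in one place.
- **Code size and maintainability**:
  - The total line count grows slightly (about +180 lines) because of the many new category definitions, but the number of conditional branches drops from 5 to 1 (the profile lookup only). Adding a category costs on average 3 registry lines plus 10 prompt-template lines, with no change to the core enrichment loop.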
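The group-then-merge flow described under "Batching grouped by profile" can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: `enrich_by_profile`, `fake_llm`, and the `profile` / `handled_by` keys are hypothetical, and the real code calls an LLM where this sketch calls a stand-in function.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def enrich_by_profile(
    items: List[Item],
    call_llm: Callable[[str, List[Item]], List[Item]],
) -> List[Item]:
    """Group items by profile, call the model once per group, restore input order."""
    groups: Dict[str, List[Item]] = {}
    for item in items:
        groups.setdefault(item["profile"], []).append(item)

    results_by_id: Dict[str, Item] = {}
    for profile, group in groups.items():
        for row in call_llm(profile, group):
            results_by_id[row["id"]] = row

    # Emit rows in the caller's original order, skipping ids the model dropped.
    return [results_by_id[item["id"]] for item in items if item["id"] in results_by_id]


def fake_llm(profile: str, group: List[Item]) -> List[Item]:
    # Stand-in for the real LLM call: tags each row with the profile that handled it.
    return [{"id": item["id"], "handled_by": profile} for item in group]


mixed = [
    {"id": "a", "profile": "apparel"},
    {"id": "b", "profile": "3c"},
    {"id": "c", "profile": "apparel"},
]
merged = enrich_by_profile(mixed, fake_llm)
```

Keying the results by id, rather than concatenating the per-group responses, is what keeps the output aligned with the input when items of different profiles are interleaved.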
【Affected files】
- `indexer/product_enrich.py`
- `indexer/product_enrich_prompts.py`
- `indexer/document_transformer.py`
- `indexer/indexer.py`
- `docs/搜索API对接指南-05-索引接口(Indexer).md`
- `docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md`
- `taxonomy.md`
- `tests/test_product_enrich_partial_mode.py` (test cases adapted for multiple profiles)
- `tests/test_llm_enrichment_batch_fill.py`
- `tests/test_process_products_batching.py`
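【Testing】
- Ran the unit and integration tests: `pytest tests/test_product_enrich_partial_mode.py tests/test_llm_enrichment_batch_fill.py tests/test_process_products_batching.py tests/ci/test_service_api_contracts.py`, all passing (52 passed)
- Manually verified the mixed-catalog scenario: with apparel and 3c products submitted together, the enrichment response returns bilingual output for apparel and en only for 3c, and the taxonomy fields are filled correctly.
- Compile check: `py_compile` reports no syntax errors in any modified module.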
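【Notes】
- This refactor does not change the behavior of the existing apparel category, and the API stays backward compatible (requests that do not specify a profile are still handled as apparel).
- To add bilingual support for a category later, change its `lang` list in the registry and add the prompt template; no other logic needs to change.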
Showing 9 changed files with 750 additions and 185 deletions
api/routes/indexer.py
| @@ -19,6 +19,11 @@ logger = logging.getLogger(__name__) | @@ -19,6 +19,11 @@ logger = logging.getLogger(__name__) | ||
| 19 | 19 | ||
| 20 | router = APIRouter(prefix="/indexer", tags=["indexer"]) | 20 | router = APIRouter(prefix="/indexer", tags=["indexer"]) |
| 21 | 21 | ||
| 22 | +SUPPORTED_CATEGORY_TAXONOMY_PROFILES = ( | ||
| 23 | + "apparel, 3c, bags, pet_supplies, electronics, outdoor, " | ||
| 24 | + "home_appliances, home_living, wigs, beauty, accessories, toys, shoes, sports, others" | ||
| 25 | +) | ||
| 26 | + | ||
| 22 | 27 | ||
| 23 | class ReindexRequest(BaseModel): | 28 | class ReindexRequest(BaseModel): |
| 24 | """全量重建索引请求""" | 29 | """全量重建索引请求""" |
| @@ -105,8 +110,9 @@ class EnrichContentRequest(BaseModel): | @@ -105,8 +110,9 @@ class EnrichContentRequest(BaseModel): | ||
| 105 | category_taxonomy_profile: str = Field( | 110 | category_taxonomy_profile: str = Field( |
| 106 | "apparel", | 111 | "apparel", |
| 107 | description=( | 112 | description=( |
| 108 | - "品类 taxonomy profile。当前默认且已支持的是 `apparel`。" | ||
| 109 | - "未来可扩展为 `electronics` 等。" | 113 | + "品类 taxonomy profile。默认 `apparel`。" |
| 114 | + f"当前支持:{SUPPORTED_CATEGORY_TAXONOMY_PROFILES}。" | ||
| 115 | + "其中除 `apparel` 外,其余 profile 的 taxonomy 输出仅返回 `en`。" | ||
| 110 | ), | 116 | ), |
| 111 | ) | 117 | ) |
| 112 | analysis_kinds: Optional[List[Literal["content", "taxonomy"]]] = Field( | 118 | analysis_kinds: Optional[List[Literal["content", "taxonomy"]]] = Field( |
docs/搜索API对接指南-05-索引接口(Indexer).md
| @@ -650,6 +650,28 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | @@ -650,6 +650,28 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | ||
| 650 | - **端点**: `POST /indexer/enrich-content` | 650 | - **端点**: `POST /indexer/enrich-content` |
| 651 | - **描述**: 根据商品内容信息批量生成 **qanchors**(锚文本)、**enriched_attributes**(通用语义属性)、**enriched_tags**(细分标签)、**enriched_taxonomy_attributes**(taxonomy 结构化属性),供外部 indexer 在「微服务组合」方式下自行拼装 doc 时使用。请求以 `items[]` 传入商品内容字段(必填/可选见下表)。接口只暴露商品内容输入,语言选择、分析维度与最终字段结构统一由 `indexer.product_enrich` 内部决定;当前返回结果与 `search_products` mapping 保持一致。单次请求在线程池中执行,避免阻塞其他接口。 | 651 | - **描述**: 根据商品内容信息批量生成 **qanchors**(锚文本)、**enriched_attributes**(通用语义属性)、**enriched_tags**(细分标签)、**enriched_taxonomy_attributes**(taxonomy 结构化属性),供外部 indexer 在「微服务组合」方式下自行拼装 doc 时使用。请求以 `items[]` 传入商品内容字段(必填/可选见下表)。接口只暴露商品内容输入,语言选择、分析维度与最终字段结构统一由 `indexer.product_enrich` 内部决定;当前返回结果与 `search_products` mapping 保持一致。单次请求在线程池中执行,避免阻塞其他接口。 |
| 652 | 652 | ||
| 653 | +当前支持的 `category_taxonomy_profile`: | ||
| 654 | +- `apparel` | ||
| 655 | +- `3c` | ||
| 656 | +- `bags` | ||
| 657 | +- `pet_supplies` | ||
| 658 | +- `electronics` | ||
| 659 | +- `outdoor` | ||
| 660 | +- `home_appliances` | ||
| 661 | +- `home_living` | ||
| 662 | +- `wigs` | ||
| 663 | +- `beauty` | ||
| 664 | +- `accessories` | ||
| 665 | +- `toys` | ||
| 666 | +- `shoes` | ||
| 667 | +- `sports` | ||
| 668 | +- `others` | ||
| 669 | + | ||
| 670 | +说明: | ||
| 671 | +- `apparel` 仍返回 `zh` + `en` 两种 taxonomy 值。 | ||
| 672 | +- 其余 profile 的 `enriched_taxonomy_attributes.value` 只返回 `en`,以控制字段体积并保持结构简单。 | ||
| 673 | +- Indexer 内部构建 ES 文档时,如果调用链没有显式指定 profile,会优先根据商品的类目字段自动推断 taxonomy profile;外部调用 `/indexer/enrich-content` 时仍以请求中的 `category_taxonomy_profile` 为准。 | ||
| 674 | + | ||
| 653 | #### 请求参数 | 675 | #### 请求参数 |
| 654 | 676 | ||
| 655 | ```json | 677 | ```json |
| @@ -678,7 +700,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | @@ -678,7 +700,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | ||
| 678 | |------|------|------|--------|------| | 700 | |------|------|------|--------|------| |
| 679 | | `tenant_id` | string | Y | - | 租户 ID。目前仅用于记录日志,不产生实际作用| | 701 | | `tenant_id` | string | Y | - | 租户 ID。目前仅用于记录日志,不产生实际作用| |
| 680 | | `enrichment_scopes` | array[string] | N | `["generic", "category_taxonomy"]` | 选择要执行的增强范围。`generic` 生成 `qanchors`/`enriched_tags`/`enriched_attributes`,`category_taxonomy` 生成 `enriched_taxonomy_attributes` | | 702 | | `enrichment_scopes` | array[string] | N | `["generic", "category_taxonomy"]` | 选择要执行的增强范围。`generic` 生成 `qanchors`/`enriched_tags`/`enriched_attributes`,`category_taxonomy` 生成 `enriched_taxonomy_attributes` | |
| 681 | -| `category_taxonomy_profile` | string | N | `apparel` | 品类 taxonomy profile。当前内置为服装大类 `apparel`,后续可扩展到其他大类 | | 703 | +| `category_taxonomy_profile` | string | N | `apparel` | 品类 taxonomy profile。支持:`apparel`、`3c`、`bags`、`pet_supplies`、`electronics`、`outdoor`、`home_appliances`、`home_living`、`wigs`、`beauty`、`accessories`、`toys`、`shoes`、`sports`、`others` | |
| 682 | | `items` | array | Y | - | 待分析列表;**单次最多 50 条** | | 704 | | `items` | array | Y | - | 待分析列表;**单次最多 50 条** | |
| 683 | 705 | ||
| 684 | `items[]` 字段说明: | 706 | `items[]` 字段说明: |
| @@ -704,7 +726,8 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | @@ -704,7 +726,8 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | ||
| 704 | 726 | ||
| 705 | - 接口不接受语言控制参数。 | 727 | - 接口不接受语言控制参数。 |
| 706 | - 返回哪些语言、返回哪些语义维度,统一由 `indexer.product_enrich` 内部逻辑决定。 | 728 | - 返回哪些语言、返回哪些语义维度,统一由 `indexer.product_enrich` 内部逻辑决定。 |
| 707 | -- 当前为了与 `search_products` mapping 对齐,返回结果只包含核心索引语言 `zh`、`en`。 | 729 | +- 当前为了与 `search_products` mapping 对齐,通用增强字段只包含核心索引语言 `zh`、`en`。 |
| 730 | +- taxonomy 字段中,`apparel` 返回 `zh`、`en`;其他 profile 仅返回 `en`。 | ||
| 708 | 731 | ||
| 709 | 批量请求建议: | 732 | 批量请求建议: |
| 710 | - **全量**:强烈建议 尽可能 **20 个 SPU/doc** 攒成一个批次后再请求一次。 | 733 | - **全量**:强烈建议 尽可能 **20 个 SPU/doc** 攒成一个批次后再请求一次。 |
| @@ -764,7 +787,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | @@ -764,7 +787,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \ | ||
| 764 | | `results[].qanchors` | object | 与 ES `qanchors` 字段同结构,按语言键返回短语数组 | | 787 | | `results[].qanchors` | object | 与 ES `qanchors` 字段同结构,按语言键返回短语数组 | |
| 765 | | `results[].enriched_tags` | object | 与 ES `enriched_tags` 字段同结构,按语言键返回标签数组 | | 788 | | `results[].enriched_tags` | object | 与 ES `enriched_tags` 字段同结构,按语言键返回标签数组 | |
| 766 | | `results[].enriched_attributes` | array | 与 ES `enriched_attributes` nested 字段同结构,每项为 `{ "name", "value": { "zh"?: "...", "en"?: "..." } }` | | 789 | | `results[].enriched_attributes` | array | 与 ES `enriched_attributes` nested 字段同结构,每项为 `{ "name", "value": { "zh"?: "...", "en"?: "..." } }` | |
| 767 | -| `results[].enriched_taxonomy_attributes` | array | 与 ES `enriched_taxonomy_attributes` nested 字段同结构,每项为 `{ "name", "value": { "zh"?: [...], "en"?: [...] } }` | | 790 | +| `results[].enriched_taxonomy_attributes` | array | 与 ES `enriched_taxonomy_attributes` nested 字段同结构。`apparel` 每项通常为 `{ "name", "value": { "zh"?: [...], "en"?: [...] } }`;其他 profile 仅返回 `{ "name", "value": { "en": [...] } }` | |
| 768 | | `results[].error` | string | 若该条处理失败(如 LLM 异常),会在此字段返回错误信息 | | 791 | | `results[].error` | string | 若该条处理失败(如 LLM 异常),会在此字段返回错误信息 | |
| 769 | 792 | ||
| 770 | **错误响应**: | 793 | **错误响应**: |
docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md
| @@ -444,7 +444,7 @@ curl "http://localhost:6006/health" | @@ -444,7 +444,7 @@ curl "http://localhost:6006/health" | ||
| 444 | 444 | ||
| 445 | - **Base URL**: Indexer 服务地址,如 `http://localhost:6004` | 445 | - **Base URL**: Indexer 服务地址,如 `http://localhost:6004` |
| 446 | - **路径**: `POST /indexer/enrich-content` | 446 | - **路径**: `POST /indexer/enrich-content` |
| 447 | -- **说明**: 根据商品标题批量生成 `qanchors`、`enriched_attributes`、`enriched_tags`、`enriched_taxonomy_attributes`,用于拼装 ES 文档。支持通过 `enrichment_scopes` 选择执行 `generic` / `category_taxonomy`,并通过 `category_taxonomy_profile` 选择对应大类的 taxonomy prompt/profile;默认执行 `generic + category_taxonomy(apparel)`。内部使用大模型(需配置 `DASHSCOPE_API_KEY`),支持多语言与 Redis 缓存;单次最多 50 条,建议批量调用以提升效率。 | 447 | +- **说明**: 根据商品标题批量生成 `qanchors`、`enriched_attributes`、`enriched_tags`、`enriched_taxonomy_attributes`,用于拼装 ES 文档。支持通过 `enrichment_scopes` 选择执行 `generic` / `category_taxonomy`,并通过 `category_taxonomy_profile` 选择对应大类的 taxonomy prompt/profile;默认执行 `generic + category_taxonomy(apparel)`。当前支持的 taxonomy profile 包括 `apparel`、`3c`、`bags`、`pet_supplies`、`electronics`、`outdoor`、`home_appliances`、`home_living`、`wigs`、`beauty`、`accessories`、`toys`、`shoes`、`sports`、`others`。其中 `apparel` 的 taxonomy 输出为 `zh` + `en`,其余 profile 的 taxonomy 输出仅返回 `en`。内部使用大模型(需配置 `DASHSCOPE_API_KEY`),支持多语言与 Redis 缓存;单次最多 50 条,建议批量调用以提升效率。 |
| 448 | 448 | ||
| 449 | 请求/响应格式、示例及错误码见 [-05-索引接口(Indexer)](./搜索API对接指南-05-索引接口(Indexer).md#58-内容理解字段生成接口)。 | 449 | 请求/响应格式、示例及错误码见 [-05-索引接口(Indexer)](./搜索API对接指南-05-索引接口(Indexer).md#58-内容理解字段生成接口)。 |
| 450 | 450 |
indexer/document_transformer.py
| @@ -259,6 +259,13 @@ class SPUDocumentTransformer: | @@ -259,6 +259,13 @@ class SPUDocumentTransformer: | ||
| 259 | title = str(row.get("title") or "").strip() | 259 | title = str(row.get("title") or "").strip() |
| 260 | if not spu_id or not title: | 260 | if not spu_id or not title: |
| 261 | continue | 261 | continue |
| 262 | + category_path_obj = docs[i].get("category_path") or {} | ||
| 263 | + resolved_category_path = "" | ||
| 264 | + if isinstance(category_path_obj, dict): | ||
| 265 | + resolved_category_path = next( | ||
| 266 | + (str(value).strip() for value in category_path_obj.values() if str(value).strip()), | ||
| 267 | + "", | ||
| 268 | + ) | ||
| 262 | id_to_idx[spu_id] = i | 269 | id_to_idx[spu_id] = i |
| 263 | items.append( | 270 | items.append( |
| 264 | { | 271 | { |
| @@ -267,6 +274,9 @@ class SPUDocumentTransformer: | @@ -267,6 +274,9 @@ class SPUDocumentTransformer: | ||
| 267 | "brief": str(row.get("brief") or "").strip(), | 274 | "brief": str(row.get("brief") or "").strip(), |
| 268 | "description": str(row.get("description") or "").strip(), | 275 | "description": str(row.get("description") or "").strip(), |
| 269 | "image_url": str(row.get("image_src") or "").strip(), | 276 | "image_url": str(row.get("image_src") or "").strip(), |
| 277 | + "category": str(row.get("category") or "").strip(), | ||
| 278 | + "category_path": resolved_category_path, | ||
| 279 | + "category1_name": str(docs[i].get("category1_name") or "").strip(), | ||
| 270 | } | 280 | } |
| 271 | ) | 281 | ) |
| 272 | if not items: | 282 | if not items: |
| @@ -677,6 +687,16 @@ class SPUDocumentTransformer: | @@ -677,6 +687,16 @@ class SPUDocumentTransformer: | ||
| 677 | "brief": str(spu_row.get("brief") or "").strip(), | 687 | "brief": str(spu_row.get("brief") or "").strip(), |
| 678 | "description": str(spu_row.get("description") or "").strip(), | 688 | "description": str(spu_row.get("description") or "").strip(), |
| 679 | "image_url": str(spu_row.get("image_src") or "").strip(), | 689 | "image_url": str(spu_row.get("image_src") or "").strip(), |
| 690 | + "category": str(spu_row.get("category") or "").strip(), | ||
| 691 | + "category_path": next( | ||
| 692 | + ( | ||
| 693 | + str(value).strip() | ||
| 694 | + for value in (doc.get("category_path") or {}).values() | ||
| 695 | + if str(value).strip() | ||
| 696 | + ), | ||
| 697 | + "", | ||
| 698 | + ), | ||
| 699 | + "category1_name": str(doc.get("category1_name") or "").strip(), | ||
| 680 | } | 700 | } |
| 681 | ], | 701 | ], |
| 682 | tenant_id=str(tenant_id), | 702 | tenant_id=str(tenant_id), |
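Both hunks above pick the first non-empty value out of the document's `category_path` object. A small helper along the following lines shows the same expression in isolation; `first_category_path` is a hypothetical name, and the assumption that the dict is keyed by language code is mine, not something the diff states.

```python
from typing import Any


def first_category_path(category_path: Any) -> str:
    """Return the first non-empty value of a category_path dict, or "" if there is none."""
    if not isinstance(category_path, dict):
        return ""
    return next(
        (str(value).strip() for value in category_path.values() if str(value).strip()),
        "",
    )
```

One property of this expression is worth knowing: `str(None)` is the non-empty string `"None"`, so a `None` value that precedes a real path is returned as `"None"`. If such values can occur in practice, filtering with `value is not None` before the `str()` call would avoid that.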
indexer/product_enrich.py
| @@ -31,9 +31,7 @@ from indexer.product_enrich_prompts import ( | @@ -31,9 +31,7 @@ from indexer.product_enrich_prompts import ( | ||
| 31 | USER_INSTRUCTION_TEMPLATE, | 31 | USER_INSTRUCTION_TEMPLATE, |
| 32 | LANGUAGE_MARKDOWN_TABLE_HEADERS, | 32 | LANGUAGE_MARKDOWN_TABLE_HEADERS, |
| 33 | SHARED_ANALYSIS_INSTRUCTION, | 33 | SHARED_ANALYSIS_INSTRUCTION, |
| 34 | - TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS, | ||
| 35 | - TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | ||
| 36 | - TAXONOMY_SHARED_ANALYSIS_INSTRUCTION, | 34 | + CATEGORY_TAXONOMY_PROFILES, |
| 37 | ) | 35 | ) |
| 38 | 36 | ||
| 39 | # 配置 | 37 | # 配置 |
| @@ -188,37 +186,6 @@ _CONTENT_ANALYSIS_FIELD_ALIASES = { | @@ -188,37 +186,6 @@ _CONTENT_ANALYSIS_FIELD_ALIASES = { | ||
| 188 | "tags": ("tags", "enriched_tags"), | 186 | "tags": ("tags", "enriched_tags"), |
| 189 | } | 187 | } |
| 190 | _CONTENT_ANALYSIS_QUALITY_FIELDS = ("title", "category_path", "anchor_text") | 188 | _CONTENT_ANALYSIS_QUALITY_FIELDS = ("title", "category_path", "anchor_text") |
| 191 | -_APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP = ( | ||
| 192 | - ("product_type", "Product Type"), | ||
| 193 | - ("target_gender", "Target Gender"), | ||
| 194 | - ("age_group", "Age Group"), | ||
| 195 | - ("season", "Season"), | ||
| 196 | - ("fit", "Fit"), | ||
| 197 | - ("silhouette", "Silhouette"), | ||
| 198 | - ("neckline", "Neckline"), | ||
| 199 | - ("sleeve_length_type", "Sleeve Length Type"), | ||
| 200 | - ("sleeve_style", "Sleeve Style"), | ||
| 201 | - ("strap_type", "Strap Type"), | ||
| 202 | - ("rise_waistline", "Rise / Waistline"), | ||
| 203 | - ("leg_shape", "Leg Shape"), | ||
| 204 | - ("skirt_shape", "Skirt Shape"), | ||
| 205 | - ("length_type", "Length Type"), | ||
| 206 | - ("closure_type", "Closure Type"), | ||
| 207 | - ("design_details", "Design Details"), | ||
| 208 | - ("fabric", "Fabric"), | ||
| 209 | - ("material_composition", "Material Composition"), | ||
| 210 | - ("fabric_properties", "Fabric Properties"), | ||
| 211 | - ("clothing_features", "Clothing Features"), | ||
| 212 | - ("functional_benefits", "Functional Benefits"), | ||
| 213 | - ("color", "Color"), | ||
| 214 | - ("color_family", "Color Family"), | ||
| 215 | - ("print_pattern", "Print / Pattern"), | ||
| 216 | - ("occasion_end_use", "Occasion / End Use"), | ||
| 217 | - ("style_aesthetic", "Style Aesthetic"), | ||
| 218 | -) | ||
| 219 | -_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS = tuple( | ||
| 220 | - field_name for field_name, _ in _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP | ||
| 221 | -) | ||
| 222 | 189 | ||
| 223 | 190 | ||
| 224 | @dataclass(frozen=True) | 191 | @dataclass(frozen=True) |
| @@ -228,6 +195,7 @@ class AnalysisSchema: | @@ -228,6 +195,7 @@ class AnalysisSchema: | ||
| 228 | markdown_table_headers: Dict[str, List[str]] | 195 | markdown_table_headers: Dict[str, List[str]] |
| 229 | result_fields: Tuple[str, ...] | 196 | result_fields: Tuple[str, ...] |
| 230 | meaningful_fields: Tuple[str, ...] | 197 | meaningful_fields: Tuple[str, ...] |
| 198 | + output_languages: Tuple[str, ...] = ("zh", "en") | ||
| 231 | cache_version: str = "v1" | 199 | cache_version: str = "v1" |
| 232 | field_aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict) | 200 | field_aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict) |
| 233 | fallback_headers: Optional[List[str]] = None | 201 | fallback_headers: Optional[List[str]] = None |
| @@ -249,36 +217,111 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = { | @@ -249,36 +217,111 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = { | ||
| 249 | markdown_table_headers=LANGUAGE_MARKDOWN_TABLE_HEADERS, | 217 | markdown_table_headers=LANGUAGE_MARKDOWN_TABLE_HEADERS, |
| 250 | result_fields=_CONTENT_ANALYSIS_RESULT_FIELDS, | 218 | result_fields=_CONTENT_ANALYSIS_RESULT_FIELDS, |
| 251 | meaningful_fields=_CONTENT_ANALYSIS_MEANINGFUL_FIELDS, | 219 | meaningful_fields=_CONTENT_ANALYSIS_MEANINGFUL_FIELDS, |
| 220 | + output_languages=_CORE_INDEX_LANGUAGES, | ||
| 252 | cache_version="v2", | 221 | cache_version="v2", |
| 253 | field_aliases=_CONTENT_ANALYSIS_FIELD_ALIASES, | 222 | field_aliases=_CONTENT_ANALYSIS_FIELD_ALIASES, |
| 254 | quality_fields=_CONTENT_ANALYSIS_QUALITY_FIELDS, | 223 | quality_fields=_CONTENT_ANALYSIS_QUALITY_FIELDS, |
| 255 | ), | 224 | ), |
| 256 | } | 225 | } |
| 257 | 226 | ||
| 258 | -_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = { | ||
| 259 | - "apparel": AnalysisSchema( | ||
| 260 | - name="taxonomy:apparel", | ||
| 261 | - shared_instruction=TAXONOMY_SHARED_ANALYSIS_INSTRUCTION, | ||
| 262 | - markdown_table_headers=TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS, | ||
| 263 | - result_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS, | ||
| 264 | - meaningful_fields=_APPAREL_TAXONOMY_ANALYSIS_RESULT_FIELDS, | 227 | +def _build_taxonomy_profile_schema(profile: str, config: Dict[str, Any]) -> AnalysisSchema: |
| 228 | + result_fields = tuple(field["key"] for field in config["fields"]) | ||
| 229 | + headers = config["markdown_table_headers"] | ||
| 230 | + return AnalysisSchema( | ||
| 231 | + name=f"taxonomy:{profile}", | ||
| 232 | + shared_instruction=config["shared_instruction"], | ||
| 233 | + markdown_table_headers=headers, | ||
| 234 | + result_fields=result_fields, | ||
| 235 | + meaningful_fields=result_fields, | ||
| 236 | + output_languages=tuple(config["output_languages"]), | ||
| 265 | cache_version="v1", | 237 | cache_version="v1", |
| 266 | - fallback_headers=TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | ||
| 267 | - ), | 238 | + fallback_headers=headers.get("en") if len(headers) > 1 else None, |
| 239 | + ) | ||
| 240 | + | ||
| 241 | + | ||
| 242 | +_CATEGORY_TAXONOMY_PROFILE_SCHEMAS: Dict[str, AnalysisSchema] = { | ||
| 243 | + profile: _build_taxonomy_profile_schema(profile, config) | ||
| 244 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | ||
| 268 | } | 245 | } |
| 269 | 246 | ||
| 270 | _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS: Dict[str, Tuple[Tuple[str, str], ...]] = { | 247 | _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS: Dict[str, Tuple[Tuple[str, str], ...]] = { |
| 271 | - "apparel": _APPAREL_TAXONOMY_ATTRIBUTE_FIELD_MAP, | 248 | + profile: tuple((field["key"], field["label"]) for field in config["fields"]) |
| 249 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | ||
| 272 | } | 250 | } |
| 273 | 251 | ||
| 274 | 252 | ||
| 253 | +def get_supported_category_taxonomy_profiles() -> Tuple[str, ...]: | ||
| 254 | + return tuple(_CATEGORY_TAXONOMY_PROFILE_SCHEMAS.keys()) | ||
| 255 | + | ||
| 256 | + | ||
| 257 | +def _normalize_category_hint(text: Any) -> str: | ||
| 258 | + value = str(text or "").strip().lower() | ||
| 259 | + if not value: | ||
| 260 | + return "" | ||
| 261 | + value = value.replace("_", " ").replace(">", " ").replace("/", " ") | ||
| 262 | + value = re.sub(r"\s+", " ", value) | ||
| 263 | + return value | ||
| 264 | + | ||
| 265 | + | ||
| 266 | +_CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: Tuple[Tuple[str, str], ...] = tuple( | ||
| 267 | + sorted( | ||
| 268 | + ( | ||
| 269 | + (_normalize_category_hint(alias), profile) | ||
| 270 | + for profile, config in CATEGORY_TAXONOMY_PROFILES.items() | ||
| 271 | + for alias in (profile, *tuple(config.get("aliases") or ())) | ||
| 272 | + if _normalize_category_hint(alias) | ||
| 273 | + ), | ||
| 274 | + key=lambda item: len(item[0]), | ||
| 275 | + reverse=True, | ||
| 276 | + ) | ||
| 277 | +) | ||
| 278 | + | ||
| 279 | + | ||
| 275 | def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str] = None) -> str: | 280 | def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str] = None) -> str: |
| 276 | profile = str(category_taxonomy_profile or _DEFAULT_CATEGORY_TAXONOMY_PROFILE).strip() | 281 | profile = str(category_taxonomy_profile or _DEFAULT_CATEGORY_TAXONOMY_PROFILE).strip() |
| 277 | if profile not in _CATEGORY_TAXONOMY_PROFILE_SCHEMAS: | 282 | if profile not in _CATEGORY_TAXONOMY_PROFILE_SCHEMAS: |
| 278 | - raise ValueError(f"Unsupported category_taxonomy_profile: {profile}") | 283 | + supported = ", ".join(get_supported_category_taxonomy_profiles()) |
| 284 | + raise ValueError( | ||
| 285 | + f"Unsupported category_taxonomy_profile: {profile}. Supported profiles: {supported}" | ||
| 286 | + ) | ||
| 279 | return profile | 287 | return profile |
| 280 | 288 | ||
| 281 | 289 | ||
| 290 | +def detect_category_taxonomy_profile(item: Dict[str, Any]) -> Optional[str]: | ||
| 291 | + """ | ||
| 292 | + 根据商品已有类目信息猜测 taxonomy profile。 | ||
| 293 | + 未命中时返回 None,由上层决定是否回退到默认 profile。 | ||
| 294 | + """ | ||
| 295 | + category_hints = ( | ||
| 296 | + item.get("category_taxonomy_profile"), | ||
| 297 | + item.get("category1_name"), | ||
| 298 | + item.get("category_name_text"), | ||
| 299 | + item.get("category"), | ||
| 300 | + item.get("category_path"), | ||
| 301 | + ) | ||
| 302 | + for hint in category_hints: | ||
| 303 | + normalized_hint = _normalize_category_hint(hint) | ||
| 304 | + if not normalized_hint: | ||
| 305 | + continue | ||
| 306 | + for alias, profile in _CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: | ||
| 307 | + if alias and alias in normalized_hint: | ||
| 308 | + return profile | ||
| 309 | + return None | ||
| 310 | + | ||
| 311 | + | ||
| 312 | +def _resolve_category_taxonomy_profile( | ||
| 313 | + item: Dict[str, Any], | ||
| 314 | + fallback_profile: Optional[str] = None, | ||
| 315 | +) -> str: | ||
| 316 | + explicit_profile = str(item.get("category_taxonomy_profile") or "").strip() | ||
| 317 | + if explicit_profile: | ||
| 318 | + return _normalize_category_taxonomy_profile(explicit_profile) | ||
| 319 | + detected_profile = detect_category_taxonomy_profile(item) | ||
| 320 | + if detected_profile: | ||
| 321 | + return detected_profile | ||
| 322 | + return _normalize_category_taxonomy_profile(fallback_profile) | ||
| 323 | + | ||
| 324 | + | ||
| 282 | def _get_analysis_schema( | 325 | def _get_analysis_schema( |
| 283 | analysis_kind: str, | 326 | analysis_kind: str, |
| 284 | *, | 327 | *, |
| @@ -299,6 +342,17 @@ def _get_taxonomy_attribute_field_map( | @@ -299,6 +342,17 @@ def _get_taxonomy_attribute_field_map( | ||
| 299 | return _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS[profile] | 342 | return _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS[profile] |
| 300 | 343 | ||
| 301 | 344 | ||
| 345 | +def _get_analysis_output_languages( | ||
| 346 | + analysis_kind: str, | ||
| 347 | + *, | ||
| 348 | + category_taxonomy_profile: Optional[str] = None, | ||
| 349 | +) -> Tuple[str, ...]: | ||
| 350 | + return _get_analysis_schema( | ||
| 351 | + analysis_kind, | ||
| 352 | + category_taxonomy_profile=category_taxonomy_profile, | ||
| 353 | + ).output_languages | ||
| 354 | + | ||
| 355 | + | ||
| 302 | def _normalize_enrichment_scopes( | 356 | def _normalize_enrichment_scopes( |
| 303 | enrichment_scopes: Optional[List[str]] = None, | 357 | enrichment_scopes: Optional[List[str]] = None, |
| 304 | ) -> Tuple[str, ...]: | 358 | ) -> Tuple[str, ...]: |
| @@ -508,6 +562,11 @@ def _normalize_index_content_item(item: Dict[str, Any]) -> Dict[str, str]: | @@ -508,6 +562,11 @@ def _normalize_index_content_item(item: Dict[str, Any]) -> Dict[str, str]: | ||
| 508 | "brief": str(item.get("brief") or "").strip(), | 562 | "brief": str(item.get("brief") or "").strip(), |
| 509 | "description": str(item.get("description") or "").strip(), | 563 | "description": str(item.get("description") or "").strip(), |
| 510 | "image_url": str(item.get("image_url") or "").strip(), | 564 | "image_url": str(item.get("image_url") or "").strip(), |
| 565 | + "category": str(item.get("category") or "").strip(), | ||
| 566 | + "category_path": str(item.get("category_path") or "").strip(), | ||
| 567 | + "category_name_text": str(item.get("category_name_text") or "").strip(), | ||
| 568 | + "category1_name": str(item.get("category1_name") or "").strip(), | ||
| 569 | + "category_taxonomy_profile": str(item.get("category_taxonomy_profile") or "").strip(), | ||
| 511 | } | 570 | } |
| 512 | 571 | ||
| 513 | 572 | ||
| @@ -525,7 +584,8 @@ def build_index_content_fields( | @@ -525,7 +584,8 @@ def build_index_content_fields( | ||
| 525 | - `title` | 584 | - `title` |
| 526 | - 可选 `brief` / `description` / `image_url` | 585 | - 可选 `brief` / `description` / `image_url` |
| 527 | - 可选 `enrichment_scopes`,默认同时执行 `generic` 与 `category_taxonomy` | 586 | - 可选 `enrichment_scopes`,默认同时执行 `generic` 与 `category_taxonomy` |
| 528 | - - 可选 `category_taxonomy_profile`,默认 `apparel` | 587 | + - 可选 `category_taxonomy_profile`;若不传,则优先根据 item 自带的类目字段推断,否则回退到默认 `apparel` |
| 588 | + - 可选类目提示字段:`category` / `category_path` / `category_name_text` / `category1_name` | ||
| 529 | 589 | ||
| 530 | 返回项结构: | 590 | 返回项结构: |
| 531 | - `id` | 591 | - `id` |
| @@ -540,10 +600,21 @@ def build_index_content_fields( | @@ -540,10 +600,21 @@ def build_index_content_fields( | ||
| 540 | - `enriched_tags.{lang}` 为标签数组 | 600 | - `enriched_tags.{lang}` 为标签数组 |
| 541 | """ | 601 | """ |
| 542 | requested_enrichment_scopes = _normalize_enrichment_scopes(enrichment_scopes) | 602 | requested_enrichment_scopes = _normalize_enrichment_scopes(enrichment_scopes) |
| 543 | - normalized_taxonomy_profile = _normalize_category_taxonomy_profile(category_taxonomy_profile) | 603 | + fallback_taxonomy_profile = ( |
| 604 | + _normalize_category_taxonomy_profile(category_taxonomy_profile) | ||
| 605 | + if category_taxonomy_profile | ||
| 606 | + else None | ||
| 607 | + ) | ||
| 544 | normalized_items = [_normalize_index_content_item(item) for item in items] | 608 | normalized_items = [_normalize_index_content_item(item) for item in items] |
| 545 | if not normalized_items: | 609 | if not normalized_items: |
| 546 | return [] | 610 | return [] |
| 611 | + taxonomy_profile_by_id = { | ||
| 612 | + item["id"]: _resolve_category_taxonomy_profile( | ||
| 613 | + item, | ||
| 614 | + fallback_profile=fallback_taxonomy_profile, | ||
| 615 | + ) | ||
| 616 | + for item in normalized_items | ||
| 617 | + } | ||
| 547 | 618 | ||
| 548 | results_by_id: Dict[str, Dict[str, Any]] = { | 619 | results_by_id: Dict[str, Dict[str, Any]] = { |
| 549 | item["id"]: { | 620 | item["id"]: { |
| @@ -556,7 +627,7 @@ def build_index_content_fields( | @@ -556,7 +627,7 @@ def build_index_content_fields( | ||
| 556 | for item in normalized_items | 627 | for item in normalized_items |
| 557 | } | 628 | } |
| 558 | 629 | ||
| 559 | - for lang in _CORE_INDEX_LANGUAGES: | 630 | + for lang in _get_analysis_output_languages("content"): |
| 560 | if "generic" in requested_enrichment_scopes: | 631 | if "generic" in requested_enrichment_scopes: |
| 561 | try: | 632 | try: |
| 562 | rows = analyze_products( | 633 | rows = analyze_products( |
| @@ -565,7 +636,7 @@ def build_index_content_fields( | @@ -565,7 +636,7 @@ def build_index_content_fields( | ||
| 565 | batch_size=BATCH_SIZE, | 636 | batch_size=BATCH_SIZE, |
| 566 | tenant_id=tenant_id, | 637 | tenant_id=tenant_id, |
| 567 | analysis_kind="content", | 638 | analysis_kind="content", |
| 568 | - category_taxonomy_profile=normalized_taxonomy_profile, | 639 | + category_taxonomy_profile=fallback_taxonomy_profile, |
| 569 | ) | 640 | ) |
| 570 | except Exception as e: | 641 | except Exception as e: |
| 571 | logger.warning("build_index_content_fields content enrichment failed for lang=%s: %s", lang, e) | 642 | logger.warning("build_index_content_fields content enrichment failed for lang=%s: %s", lang, e) |
| @@ -582,39 +653,49 @@ def build_index_content_fields( | @@ -582,39 +653,49 @@ def build_index_content_fields( | ||
| 582 | continue | 653 | continue |
| 583 | _apply_index_content_row(results_by_id[item_id], row=row, lang=lang) | 654 | _apply_index_content_row(results_by_id[item_id], row=row, lang=lang) |
| 584 | 655 | ||
| 585 | - if "category_taxonomy" in requested_enrichment_scopes: | ||
| 586 | - try: | ||
| 587 | - taxonomy_rows = analyze_products( | ||
| 588 | - products=normalized_items, | ||
| 589 | - target_lang=lang, | ||
| 590 | - batch_size=BATCH_SIZE, | ||
| 591 | - tenant_id=tenant_id, | ||
| 592 | - analysis_kind="taxonomy", | ||
| 593 | - category_taxonomy_profile=normalized_taxonomy_profile, | ||
| 594 | - ) | ||
| 595 | - except Exception as e: | ||
| 596 | - logger.warning( | ||
| 597 | - "build_index_content_fields taxonomy enrichment failed for lang=%s: %s", | ||
| 598 | - lang, | ||
| 599 | - e, | ||
| 600 | - ) | ||
| 601 | - for item in normalized_items: | ||
| 602 | - results_by_id[item["id"]].setdefault("error", str(e)) | ||
| 603 | - continue | 656 | + if "category_taxonomy" in requested_enrichment_scopes: |
| 657 | + items_by_profile: Dict[str, List[Dict[str, str]]] = {} | ||
| 658 | + for item in normalized_items: | ||
| 659 | + items_by_profile.setdefault(taxonomy_profile_by_id[item["id"]], []).append(item) | ||
| 604 | 660 | ||
| 605 | - for row in taxonomy_rows or []: | ||
| 606 | - item_id = str(row.get("id") or "").strip() | ||
| 607 | - if not item_id or item_id not in results_by_id: | ||
| 608 | - continue | ||
| 609 | - if row.get("error"): | ||
| 610 | - results_by_id[item_id].setdefault("error", row["error"]) | 661 | + for taxonomy_profile, profile_items in items_by_profile.items(): |
| 662 | + for lang in _get_analysis_output_languages( | ||
| 663 | + "taxonomy", | ||
| 664 | + category_taxonomy_profile=taxonomy_profile, | ||
| 665 | + ): | ||
| 666 | + try: | ||
| 667 | + taxonomy_rows = analyze_products( | ||
| 668 | + products=profile_items, | ||
| 669 | + target_lang=lang, | ||
| 670 | + batch_size=BATCH_SIZE, | ||
| 671 | + tenant_id=tenant_id, | ||
| 672 | + analysis_kind="taxonomy", | ||
| 673 | + category_taxonomy_profile=taxonomy_profile, | ||
| 674 | + ) | ||
| 675 | + except Exception as e: | ||
| 676 | + logger.warning( | ||
| 677 | + "build_index_content_fields taxonomy enrichment failed for profile=%s lang=%s: %s", | ||
| 678 | + taxonomy_profile, | ||
| 679 | + lang, | ||
| 680 | + e, | ||
| 681 | + ) | ||
| 682 | + for item in profile_items: | ||
| 683 | + results_by_id[item["id"]].setdefault("error", str(e)) | ||
| 611 | continue | 684 | continue |
| 612 | - _apply_index_taxonomy_row( | ||
| 613 | - results_by_id[item_id], | ||
| 614 | - row=row, | ||
| 615 | - lang=lang, | ||
| 616 | - category_taxonomy_profile=normalized_taxonomy_profile, | ||
| 617 | - ) | 685 | + |
| 686 | + for row in taxonomy_rows or []: | ||
| 687 | + item_id = str(row.get("id") or "").strip() | ||
| 688 | + if not item_id or item_id not in results_by_id: | ||
| 689 | + continue | ||
| 690 | + if row.get("error"): | ||
| 691 | + results_by_id[item_id].setdefault("error", row["error"]) | ||
| 692 | + continue | ||
| 693 | + _apply_index_taxonomy_row( | ||
| 694 | + results_by_id[item_id], | ||
| 695 | + row=row, | ||
| 696 | + lang=lang, | ||
| 697 | + category_taxonomy_profile=taxonomy_profile, | ||
| 698 | + ) | ||
| 618 | 699 | ||
| 619 | return [results_by_id[item["id"]] for item in normalized_items] | 700 | return [results_by_id[item["id"]] for item in normalized_items] |
| 620 | 701 |
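
Reviewer note: the taxonomy hunk above replaces a single `analyze_products` call with per-profile grouping, a per-profile language loop, and a merge back into input order. The following is a minimal, self-contained sketch of that control flow, not the project's implementation. `enrich_by_profile`, `output_languages`, `OUTPUT_LANGUAGES`, `fake_analyze`, and the `"value"` row key are hypothetical stand-ins (for `build_index_content_fields`, `_get_analysis_output_languages`, the profile registry, `analyze_products`, and the real row payload respectively), introduced here only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: apparel is bilingual, everything else falls back to en.
OUTPUT_LANGUAGES: Dict[str, Tuple[str, ...]] = {"apparel": ("zh", "en")}


def output_languages(profile: str) -> Tuple[str, ...]:
    """Stand-in for _get_analysis_output_languages("taxonomy", ...)."""
    return OUTPUT_LANGUAGES.get(profile, ("en",))


def enrich_by_profile(
    items: List[Dict[str, str]],
    profile_by_id: Dict[str, str],
    analyze: Callable[[List[Dict[str, str]], str, str], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    results_by_id: Dict[str, Dict[str, Any]] = {
        item["id"]: {"id": item["id"]} for item in items
    }

    # 1. Group items by their resolved taxonomy profile.
    items_by_profile: Dict[str, List[Dict[str, str]]] = {}
    for item in items:
        items_by_profile.setdefault(profile_by_id[item["id"]], []).append(item)

    # 2. One analysis call per (profile, language) pair.
    for profile, profile_items in items_by_profile.items():
        for lang in output_languages(profile):
            try:
                rows = analyze(profile_items, lang, profile)
            except Exception as exc:
                # A failed group marks only its own items; other groups continue.
                for item in profile_items:
                    results_by_id[item["id"]].setdefault("error", str(exc))
                continue

            for row in rows or []:
                item_id = str(row.get("id") or "").strip()
                if not item_id or item_id not in results_by_id:
                    continue
                if row.get("error"):
                    results_by_id[item_id].setdefault("error", row["error"])
                    continue
                results_by_id[item_id].setdefault("taxonomy", {})[lang] = row["value"]

    # 3. Return results in the original input order, regardless of grouping.
    return [results_by_id[item["id"]] for item in items]


def fake_analyze(products, lang, profile):
    """Hypothetical analyzer: fails for 'bags', echoes profile:lang otherwise."""
    if profile == "bags":
        raise RuntimeError("LLM unavailable")
    return [{"id": p["id"], "value": f"{profile}:{lang}"} for p in products]


items = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
profiles = {"1": "apparel", "2": "bags", "3": "apparel"}
results = enrich_by_profile(items, profiles, fake_analyze)
```

In this sketch, as in the hunk, `setdefault("error", ...)` keeps the first error recorded for an item, and a failure in one profile group does not prevent the remaining groups from being processed.
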
indexer/product_enrich_prompts.py
| 1 | #!/usr/bin/env python3 | 1 | #!/usr/bin/env python3 |
| 2 | 2 | ||
| 3 | -from typing import Any, Dict | 3 | +from typing import Any, Dict, Tuple |
| 4 | 4 | ||
| 5 | SYSTEM_MESSAGE = ( | 5 | SYSTEM_MESSAGE = ( |
| 6 | "You are an e-commerce product annotator. " | 6 | "You are an e-commerce product annotator. " |
| @@ -33,110 +33,362 @@ Input product list: | @@ -33,110 +33,362 @@ Input product list: | ||
| 33 | USER_INSTRUCTION_TEMPLATE = """Please strictly return a Markdown table following the given columns in the specified language. For any column containing multiple values, separate them with commas. Do not add any other explanation. | 33 | USER_INSTRUCTION_TEMPLATE = """Please strictly return a Markdown table following the given columns in the specified language. For any column containing multiple values, separate them with commas. Do not add any other explanation. |
| 34 | Language: {language}""" | 34 | Language: {language}""" |
| 35 | 35 | ||
| 36 | -TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = """Analyze each input product text and fill the columns below using an apparel attribute taxonomy. | 36 | +def _taxonomy_field( |
| 37 | + key: str, | ||
| 38 | + label: str, | ||
| 39 | + description: str, | ||
| 40 | + zh_label: str | None = None, | ||
| 41 | +) -> Dict[str, str]: | ||
| 42 | + return { | ||
| 43 | + "key": key, | ||
| 44 | + "label": label, | ||
| 45 | + "description": description, | ||
| 46 | + "zh_label": zh_label or label, | ||
| 47 | + } | ||
| 37 | 48 | ||
| 38 | -Output columns: | ||
| 39 | -1. Product Type: concise ecommerce apparel category label, not a full marketing title | ||
| 40 | -2. Target Gender: intended gender only if clearly implied | ||
| 41 | -3. Age Group: only if clearly implied, e.g. adults, kids, teens, toddlers, babies | ||
| 42 | -4. Season: season(s) or all-season suitability only if supported | ||
| 43 | -5. Fit: body closeness, e.g. slim, regular, relaxed, oversized, fitted | ||
| 44 | -6. Silhouette: overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg | ||
| 45 | -7. Neckline: neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck | ||
| 46 | -8. Sleeve Length Type: sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve | ||
| 47 | -9. Sleeve Style: sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve | ||
| 48 | -10. Strap Type: strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap | ||
| 49 | -11. Rise / Waistline: waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist | ||
| 50 | -12. Leg Shape: for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg | ||
| 51 | -13. Skirt Shape: for skirts only, e.g. A-line, pleated, pencil, mermaid | ||
| 52 | -14. Length Type: design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length | ||
| 53 | -15. Closure Type: fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop | ||
| 54 | -16. Design Details: construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem | ||
| 55 | -17. Fabric: fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill | ||
| 56 | -18. Material Composition: fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane | ||
| 57 | -19. Fabric Properties: inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant | ||
| 58 | -20. Clothing Features: product features, e.g. lined, reversible, hooded, packable, padded, pocketed | ||
| 59 | -21. Functional Benefits: wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression | ||
| 60 | -22. Color: specific color name when available | ||
| 61 | -23. Color Family: normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray | ||
| 62 | -24. Print / Pattern: surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print | ||
| 63 | -25. Occasion / End Use: likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor | ||
| 64 | -26. Style Aesthetic: overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful | ||
| 65 | 49 | ||
| 66 | -Rules: | ||
| 67 | -- Keep the same row order and row count as input. | ||
| 68 | -- Infer only from the provided product text. | ||
| 69 | -- Leave blank if not applicable or not reasonably supported. | ||
| 70 | -- Use concise, standardized ecommerce wording. | ||
| 71 | -- Do not combine different attribute dimensions in one field. | ||
| 72 | -- If multiple values are needed, use the delimiter required by the localization setting. | 50 | +def _build_taxonomy_shared_instruction(profile_label: str, fields: Tuple[Dict[str, str], ...]) -> str: |
| 51 | + lines = [ | ||
| 52 | + f"Analyze each input product text and fill the columns below using a {profile_label} attribute taxonomy.", | ||
| 53 | + "", | ||
| 54 | + "Output columns:", | ||
| 55 | + ] | ||
| 56 | + for idx, field in enumerate(fields, start=1): | ||
| 57 | + lines.append(f"{idx}. {field['label']}: {field['description']}") | ||
| 58 | + lines.extend( | ||
| 59 | + [ | ||
| 60 | + "", | ||
| 61 | + "Rules:", | ||
| 62 | + "- Keep the same row order and row count as input.", | ||
| 63 | + "- Infer only from the provided product text.", | ||
| 64 | + "- Leave blank if not applicable or not reasonably supported.", | ||
| 65 | + "- Use concise, standardized ecommerce wording.", | ||
| 66 | + "- Do not combine different attribute dimensions in one field.", | ||
| 67 | + "- If multiple values are needed, use the delimiter required by the localization setting.", | ||
| 68 | + "", | ||
| 69 | + "Input product list:", | ||
| 70 | + ] | ||
| 71 | + ) | ||
| 72 | + return "\n".join(lines) | ||
| 73 | 73 | ||
| 74 | -Input product list: | ||
| 75 | -""" | ||
| 76 | 74 | ||
| 77 | -TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = [ | ||
| 78 | - "No.", | ||
| 79 | - "Product Type", | ||
| 80 | - "Target Gender", | ||
| 81 | - "Age Group", | ||
| 82 | - "Season", | ||
| 83 | - "Fit", | ||
| 84 | - "Silhouette", | ||
| 85 | - "Neckline", | ||
| 86 | - "Sleeve Length Type", | ||
| 87 | - "Sleeve Style", | ||
| 88 | - "Strap Type", | ||
| 89 | - "Rise / Waistline", | ||
| 90 | - "Leg Shape", | ||
| 91 | - "Skirt Shape", | ||
| 92 | - "Length Type", | ||
| 93 | - "Closure Type", | ||
| 94 | - "Design Details", | ||
| 95 | - "Fabric", | ||
| 96 | - "Material Composition", | ||
| 97 | - "Fabric Properties", | ||
| 98 | - "Clothing Features", | ||
| 99 | - "Functional Benefits", | ||
| 100 | - "Color", | ||
| 101 | - "Color Family", | ||
| 102 | - "Print / Pattern", | ||
| 103 | - "Occasion / End Use", | ||
| 104 | - "Style Aesthetic", | ||
| 105 | -] | 75 | +def _make_taxonomy_profile( |
| 76 | + profile_label: str, | ||
| 77 | + fields: Tuple[Dict[str, str], ...], | ||
| 78 | + *, | ||
| 79 | + aliases: Tuple[str, ...], | ||
| 80 | + output_languages: Tuple[str, ...] = ("en",), | ||
| 81 | + zh_headers: Tuple[str, ...] = (), | ||
| 82 | +) -> Dict[str, Any]: | ||
| 83 | + headers = {"en": ["No.", *[field["label"] for field in fields]]} | ||
| 84 | + if zh_headers: | ||
| 85 | + headers["zh"] = ["序号", *zh_headers] | ||
| 86 | + return { | ||
| 87 | + "profile_label": profile_label, | ||
| 88 | + "fields": fields, | ||
| 89 | + "aliases": aliases, | ||
| 90 | + "output_languages": output_languages, | ||
| 91 | + "shared_instruction": _build_taxonomy_shared_instruction(profile_label, fields), | ||
| 92 | + "markdown_table_headers": headers, | ||
| 93 | + } | ||
| 106 | 94 | ||
| 107 | -TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { | ||
| 108 | - "en": TAXONOMY_MARKDOWN_TABLE_HEADERS_EN, | ||
| 109 | - "zh": [ | ||
| 110 | - "序号", | ||
| 111 | - "品类", | ||
| 112 | - "目标性别", | ||
| 113 | - "年龄段", | ||
| 114 | - "适用季节", | ||
| 115 | - "版型", | ||
| 116 | - "廓形", | ||
| 117 | - "领型", | ||
| 118 | - "袖长类型", | ||
| 119 | - "袖型", | ||
| 120 | - "肩带设计", | ||
| 121 | - "腰型", | ||
| 122 | - "裤型", | ||
| 123 | - "裙型", | ||
| 124 | - "长度类型", | ||
| 125 | - "闭合方式", | ||
| 126 | - "设计细节", | ||
| 127 | - "面料", | ||
| 128 | - "成分", | ||
| 129 | - "面料特性", | ||
| 130 | - "服装特征", | ||
| 131 | - "功能", | ||
| 132 | - "主颜色", | ||
| 133 | - "色系", | ||
| 134 | - "印花 / 图案", | ||
| 135 | - "适用场景", | ||
| 136 | - "风格", | ||
| 137 | - ], | 95 | + |
| 96 | +APPAREL_TAXONOMY_FIELDS = ( | ||
| 97 | + _taxonomy_field("product_type", "Product Type", "concise ecommerce apparel category label, not a full marketing title", "品类"), | ||
| 98 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"), | ||
| 99 | + _taxonomy_field("age_group", "Age Group", "only if clearly implied, e.g. adults, kids, teens, toddlers, babies", "年龄段"), | ||
| 100 | + _taxonomy_field("season", "Season", "season(s) or all-season suitability only if supported", "适用季节"), | ||
| 101 | + _taxonomy_field("fit", "Fit", "body closeness, e.g. slim, regular, relaxed, oversized, fitted", "版型"), | ||
| 102 | + _taxonomy_field("silhouette", "Silhouette", "overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg", "廓形"), | ||
| 103 | + _taxonomy_field("neckline", "Neckline", "neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck", "领型"), | ||
| 104 | + _taxonomy_field("sleeve_length_type", "Sleeve Length Type", "sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve", "袖长类型"), | ||
| 105 | + _taxonomy_field("sleeve_style", "Sleeve Style", "sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve", "袖型"), | ||
| 106 | + _taxonomy_field("strap_type", "Strap Type", "strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap", "肩带设计"), | ||
| 107 | + _taxonomy_field("rise_waistline", "Rise / Waistline", "waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist", "腰型"), | ||
| 108 | + _taxonomy_field("leg_shape", "Leg Shape", "for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg", "裤型"), | ||
| 109 | + _taxonomy_field("skirt_shape", "Skirt Shape", "for skirts only, e.g. A-line, pleated, pencil, mermaid", "裙型"), | ||
| 110 | + _taxonomy_field("length_type", "Length Type", "design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length", "长度类型"), | ||
| 111 | + _taxonomy_field("closure_type", "Closure Type", "fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop", "闭合方式"), | ||
| 112 | + _taxonomy_field("design_details", "Design Details", "construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem", "设计细节"), | ||
| 113 | + _taxonomy_field("fabric", "Fabric", "fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill", "面料"), | ||
| 114 | + _taxonomy_field("material_composition", "Material Composition", "fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane", "成分"), | ||
| 115 | + _taxonomy_field("fabric_properties", "Fabric Properties", "inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant", "面料特性"), | ||
| 116 | + _taxonomy_field("clothing_features", "Clothing Features", "product features, e.g. lined, reversible, hooded, packable, padded, pocketed", "服装特征"), | ||
| 117 | + _taxonomy_field("functional_benefits", "Functional Benefits", "wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression", "功能"), | ||
| 118 | + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"), | ||
| 119 | + _taxonomy_field("color_family", "Color Family", "normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray", "色系"), | ||
| 120 | + _taxonomy_field("print_pattern", "Print / Pattern", "surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print", "印花 / 图案"), | ||
| 121 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor", "适用场景"), | ||
| 122 | + _taxonomy_field("style_aesthetic", "Style Aesthetic", "overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful", "风格"), | ||
| 123 | +) | ||
| 124 | + | ||
| 125 | +THREE_C_TAXONOMY_FIELDS = ( | ||
| 126 | + _taxonomy_field("product_type", "Product Type", "concise 3C accessory or peripheral category label"), | ||
| 127 | + _taxonomy_field("compatible_device", "Compatible Device / Model", "supported device family, series, model, or form factor when clearly stated"), | ||
| 128 | + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, wireless, Bluetooth, Wi-Fi, NFC, or 2.4G"), | ||
| 129 | + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant connector or port, e.g. USB-C, Lightning, HDMI, AUX, RJ45"), | ||
| 130 | + _taxonomy_field("power_charging", "Power Source / Charging", "charging or power mode, e.g. battery powered, fast charging, rechargeable, plug-in"), | ||
| 131 | + _taxonomy_field("key_features", "Key Features", "primary hardware features such as noise cancelling, foldable, magnetic, backlit, waterproof"), | ||
| 132 | + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"), | ||
| 133 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 134 | + _taxonomy_field("pack_size", "Pack Size", "unit count or bundle size when stated"), | ||
| 135 | + _taxonomy_field("use_case", "Use Case", "intended usage such as travel, office, gaming, car, charging, streaming"), | ||
| 136 | +) | ||
| 137 | + | ||
| 138 | +BAGS_TAXONOMY_FIELDS = ( | ||
| 139 | + _taxonomy_field("product_type", "Product Type", "concise bag category such as backpack, tote bag, crossbody bag, luggage, or wallet"), | ||
| 140 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | ||
| 141 | + _taxonomy_field("carry_style", "Carry Style", "how the bag is worn or carried, e.g. handheld, shoulder, crossbody, backpack"), | ||
| 142 | + _taxonomy_field("size_capacity", "Size / Capacity", "size tier or capacity when supported, e.g. mini, large capacity, 20L"), | ||
| 143 | + _taxonomy_field("material", "Material", "main bag material such as leather, nylon, canvas, PU, straw"), | ||
| 144 | + _taxonomy_field("closure_type", "Closure Type", "bag closure such as zipper, flap, buckle, drawstring, magnetic snap"), | ||
| 145 | + _taxonomy_field("structure_compartments", "Structure / Compartments", "organizational structure such as multi-pocket, laptop sleeve, card slots, expandable"), | ||
| 146 | + _taxonomy_field("strap_handle_type", "Strap / Handle Type", "strap or handle design such as chain strap, top handle, adjustable strap"), | ||
| 147 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 148 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as commute, travel, evening, school, casual"), | ||
| 149 | +) | ||
| 150 | + | ||
| 151 | +PET_SUPPLIES_TAXONOMY_FIELDS = ( | ||
| 152 | + _taxonomy_field("product_type", "Product Type", "concise pet supplies category label"), | ||
| 153 | + _taxonomy_field("pet_type", "Pet Type", "target pet such as dog, cat, bird, fish, hamster"), | ||
| 154 | + _taxonomy_field("breed_size", "Breed Size", "pet size or breed size when stated, e.g. small breed, large dogs"), | ||
| 155 | + _taxonomy_field("life_stage", "Life Stage", "pet age stage when supported, e.g. puppy, kitten, adult, senior"), | ||
| 156 | + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredient composition when supported"), | ||
| 157 | + _taxonomy_field("flavor_scent", "Flavor / Scent", "flavor or scent when applicable"), | ||
| 158 | + _taxonomy_field("key_features", "Key Features", "primary attributes such as interactive, leak-proof, orthopedic, washable, elevated"), | ||
| 159 | + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as dental care, calming, digestion support, joint support"), | ||
| 160 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, or net content when stated"), | ||
| 161 | + _taxonomy_field("use_scenario", "Use Scenario", "usage such as feeding, training, grooming, travel, indoor play"), | ||
| 162 | +) | ||
| 163 | + | ||
| 164 | +ELECTRONICS_TAXONOMY_FIELDS = ( | ||
| 165 | + _taxonomy_field("product_type", "Product Type", "concise electronics device or component category label"), | ||
| 166 | + _taxonomy_field("device_category", "Device Category / Compatibility", "supported platform, component class, or compatible device family when stated"), | ||
| 167 | + _taxonomy_field("power_voltage", "Power / Voltage", "power, voltage, wattage, or battery spec when supported"), | ||
| 168 | + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, Bluetooth, Wi-Fi, RF, or smart app control"), | ||
| 169 | + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant port or interface such as USB-C, AC plug type, HDMI, SATA"), | ||
| 170 | + _taxonomy_field("capacity_storage", "Capacity / Storage", "capacity or storage spec such as 256GB, 2TB, 5000mAh"), | ||
| 171 | + _taxonomy_field("key_features", "Key Features", "main product features such as touch control, HD display, noise reduction, smart control"), | ||
| 172 | + _taxonomy_field("material_finish", "Material / Finish", "main housing material or finish when supported"), | ||
| 173 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 174 | + _taxonomy_field("use_case", "Use Case", "intended use such as home entertainment, office, charging, security, repair"), | ||
| 175 | +) | ||
| 176 | + | ||
| 177 | +OUTDOOR_TAXONOMY_FIELDS = ( | ||
| 178 | + _taxonomy_field("product_type", "Product Type", "concise outdoor gear category label"), | ||
| 179 | + _taxonomy_field("activity_type", "Activity Type", "primary outdoor activity such as camping, hiking, fishing, climbing, travel"), | ||
| 180 | + _taxonomy_field("season_weather", "Season / Weather", "season or weather suitability when supported"), | ||
| 181 | + _taxonomy_field("material", "Material", "main material such as aluminum, ripstop nylon, stainless steel, EVA"), | ||
| 182 | + _taxonomy_field("capacity_size", "Capacity / Size", "size, length, or capacity when stated"), | ||
| 183 | + _taxonomy_field("protection_resistance", "Protection / Resistance", "resistance or protection such as waterproof, UV resistant, windproof"), | ||
| 184 | + _taxonomy_field("key_features", "Key Features", "primary gear attributes such as foldable, lightweight, insulated, non-slip"), | ||
| 185 | + _taxonomy_field("portability_packability", "Portability / Packability", "carry or storage trait such as collapsible, compact, ultralight, packable"), | ||
| 186 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 187 | + _taxonomy_field("use_scenario", "Use Scenario", "likely use setting such as campsite, trail, survival kit, beach, picnic"), | ||
| 188 | +) | ||
| 189 | + | ||
| 190 | +HOME_APPLIANCES_TAXONOMY_FIELDS = ( | ||
| 191 | + _taxonomy_field("product_type", "Product Type", "concise home appliance category label"), | ||
| 192 | + _taxonomy_field("appliance_category", "Appliance Category", "functional class such as kitchen appliance, cleaning appliance, personal care appliance"), | ||
| 193 | + _taxonomy_field("power_voltage", "Power / Voltage", "wattage, voltage, plug type, or power supply when supported"), | ||
| 194 | + _taxonomy_field("capacity_coverage", "Capacity / Coverage", "capacity or coverage metric such as 1.5L, 20L, 40sqm"), | ||
| 195 | + _taxonomy_field("control_method", "Control Method", "operation method such as touch, knob, remote, app control"), | ||
| 196 | + _taxonomy_field("installation_type", "Installation Type", "setup style such as countertop, handheld, portable, wall-mounted, built-in"), | ||
| 197 | + _taxonomy_field("key_features", "Key Features", "main product features such as timer, steam, HEPA filter, self-cleaning"), | ||
| 198 | + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"), | ||
| 199 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 200 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as cooking, cleaning, grooming, cooling, air treatment"), | ||
| 201 | +) | ||
| 202 | + | ||
| 203 | +HOME_LIVING_TAXONOMY_FIELDS = ( | ||
| 204 | + _taxonomy_field("product_type", "Product Type", "concise home and living category label"), | ||
| 205 | + _taxonomy_field("room_placement", "Room / Placement", "intended room or placement such as bedroom, kitchen, bathroom, desktop"), | ||
| 206 | + _taxonomy_field("material", "Material", "main material such as wood, ceramic, cotton, glass, metal"), | ||
| 207 | + _taxonomy_field("style", "Style", "home style such as modern, farmhouse, minimalist, boho, Nordic"), | ||
| 208 | + _taxonomy_field("size_dimensions", "Size / Dimensions", "size or dimensions when stated"), | ||
| 209 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 210 | + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface pattern or finish such as solid, marble, matte, ribbed"), | ||
| 211 | + _taxonomy_field("key_features", "Key Features", "main product features such as stackable, washable, blackout, space-saving"), | ||
| 212 | + _taxonomy_field("assembly_installation", "Assembly / Installation", "assembly or installation trait when supported"), | ||
| 213 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as storage, dining, decor, sleep, organization"), | ||
| 214 | +) | ||
| 215 | + | ||
| 216 | +WIGS_TAXONOMY_FIELDS = ( | ||
| 217 | + _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label"), | ||
| 218 | + _taxonomy_field("hair_material", "Hair Material", "hair material such as human hair, synthetic fiber, heat-resistant fiber"), | ||
| 219 | + _taxonomy_field("hair_texture", "Hair Texture", "texture or curl pattern such as straight, body wave, curly, kinky"), | ||
| 220 | + _taxonomy_field("hair_length", "Hair Length", "hair length when stated"), | ||
| 221 | + _taxonomy_field("hair_color", "Hair Color", "specific hair color or blend when available"), | ||
| 222 | + _taxonomy_field("cap_construction", "Cap Construction", "cap type such as full lace, lace front, glueless, U part"), | ||
| 223 | + _taxonomy_field("lace_area_part_type", "Lace Area / Part Type", "lace size or part style such as 13x4 lace, middle part, T part"), | ||
| 224 | + _taxonomy_field("density_volume", "Density / Volume", "hair density or fullness when supported"), | ||
| 225 | + _taxonomy_field("style_bang_type", "Style / Bang Type", "style cue such as bob, pixie, layered, with bangs"), | ||
| 226 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "intended use such as daily wear, cosplay, protective style, party"), | ||
| 227 | +) | ||
| 228 | + | ||
| 229 | +BEAUTY_TAXONOMY_FIELDS = ( | ||
| 230 | + _taxonomy_field("product_type", "Product Type", "concise beauty or cosmetics category label"), | ||
| 231 | + _taxonomy_field("target_area", "Target Area", "target area such as face, lips, eyes, nails, hair, body"), | ||
| 232 | + _taxonomy_field("skin_hair_type", "Skin Type / Hair Type", "suitable skin or hair type when supported"), | ||
| 233 | + _taxonomy_field("finish_effect", "Finish / Effect", "cosmetic finish or effect such as matte, dewy, volumizing, brightening"), | ||
| 234 | + _taxonomy_field("key_ingredients", "Key Ingredients", "notable ingredients when stated"), | ||
| 235 | + _taxonomy_field("shade_color", "Shade / Color", "specific shade or color when available"), | ||
| 236 | + _taxonomy_field("scent", "Scent", "fragrance or scent only when supported"), | ||
| 237 | + _taxonomy_field("formulation", "Formulation", "product form such as cream, serum, powder, gel, stick"), | ||
| 238 | + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as hydration, anti-aging, long-wear, repair, sun protection"), | ||
| 239 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as daily routine, salon, travel, evening makeup"), | ||
| 240 | +) | ||
| 241 | + | ||
| 242 | +ACCESSORIES_TAXONOMY_FIELDS = ( | ||
| 243 | + _taxonomy_field("product_type", "Product Type", "concise accessory category label such as necklace, watch, belt, hat, or sunglasses"), | ||
| 244 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | ||
| 245 | + _taxonomy_field("material", "Material", "main material such as alloy, leather, stainless steel, acetate, fabric"), | ||
| 246 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 247 | + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface treatment or style finish such as polished, textured, braided, rhinestone"), | ||
| 248 | + _taxonomy_field("closure_fastening", "Closure / Fastening", "fastening method when applicable"), | ||
| 249 | + _taxonomy_field("size_fit", "Size / Fit", "size or fit information such as adjustable, one size, 42mm"), | ||
| 250 | + _taxonomy_field("style", "Style", "style cue such as minimalist, vintage, statement, sporty"), | ||
| 251 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as daily wear, formal, party, travel, sun protection"), | ||
| 252 | + _taxonomy_field("set_pack_size", "Set / Pack Size", "set count or pack size when stated"), | ||
| 253 | +) | ||
| 254 | + | ||
| 255 | +TOYS_TAXONOMY_FIELDS = ( | ||
| 256 | + _taxonomy_field("product_type", "Product Type", "concise toy category label"), | ||
| 257 | + _taxonomy_field("age_group", "Age Group", "intended age group when clearly implied"), | ||
| 258 | + _taxonomy_field("character_theme", "Character / Theme", "licensed character, theme, or play theme when supported"), | ||
| 259 | + _taxonomy_field("material", "Material", "main toy material such as plush, plastic, wood, silicone"), | ||
| 260 | + _taxonomy_field("power_source", "Power Source", "battery, rechargeable, wind-up, or non-powered when supported"), | ||
| 261 | + _taxonomy_field("interactive_features", "Interactive Features", "interactive functions such as sound, lights, remote control, motion"), | ||
| 262 | + _taxonomy_field("educational_play_value", "Educational / Play Value", "play value such as STEM, pretend play, sensory, puzzle solving"), | ||
| 263 | + _taxonomy_field("piece_count_size", "Piece Count / Size", "piece count or size when stated"), | ||
| 264 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 265 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as indoor play, bath time, party favor, outdoor play"), | ||
| 266 | +) | ||
| 267 | + | ||
| 268 | +SHOES_TAXONOMY_FIELDS = ( | ||
| 269 | + _taxonomy_field("product_type", "Product Type", "concise footwear category label"), | ||
| 270 | + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"), | ||
| 271 | + _taxonomy_field("age_group", "Age Group", "only if clearly implied"), | ||
| 272 | + _taxonomy_field("closure_type", "Closure Type", "fastening method such as lace-up, slip-on, buckle, hook-and-loop"), | ||
| 273 | + _taxonomy_field("toe_shape", "Toe Shape", "toe shape when applicable, e.g. round toe, pointed toe, open toe"), | ||
| 274 | + _taxonomy_field("heel_sole_type", "Heel Height / Sole Type", "heel or sole profile such as flat, block heel, wedge, platform, thick sole"), | ||
| 275 | + _taxonomy_field("upper_material", "Upper Material", "main upper material such as leather, knit, canvas, mesh"), | ||
| 276 | + _taxonomy_field("lining_insole_material", "Lining / Insole Material", "lining or insole material when supported"), | ||
| 277 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 278 | + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as running, casual, office, hiking, formal"), | ||
| 279 | +) | ||
| 280 | + | ||
| 281 | +SPORTS_TAXONOMY_FIELDS = ( | ||
| 282 | + _taxonomy_field("product_type", "Product Type", "concise sports product category label"), | ||
| 283 | + _taxonomy_field("sport_activity", "Sport / Activity", "primary sport or activity such as fitness, yoga, basketball, cycling, swimming"), | ||
| 284 | + _taxonomy_field("skill_level", "Skill Level", "target user level when supported, e.g. beginner, training, professional"), | ||
| 285 | + _taxonomy_field("material", "Material", "main material such as EVA, carbon fiber, neoprene, latex"), | ||
| 286 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, weight, resistance level, or capacity when stated"), | ||
| 287 | + _taxonomy_field("protection_support", "Protection / Support", "support or protection function such as ankle support, shock absorption, impact protection"), | ||
| 288 | + _taxonomy_field("key_features", "Key Features", "main features such as anti-slip, adjustable, foldable, quick-dry"), | ||
| 289 | + _taxonomy_field("power_source", "Power Source", "battery, electric, or non-powered when applicable"), | ||
| 290 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 291 | + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as gym, home workout, field training, competition"), | ||
| 292 | +) | ||
| 293 | + | ||
| 294 | +OTHERS_TAXONOMY_FIELDS = ( | ||
| 295 | + _taxonomy_field("product_type", "Product Type", "concise product category label, not a full marketing title"), | ||
| 296 | + _taxonomy_field("product_category", "Product Category", "broader retail grouping when the specific product type is narrow"), | ||
| 297 | + _taxonomy_field("target_user", "Target User", "intended user, audience, or recipient when clearly implied"), | ||
| 298 | + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredients when supported"), | ||
| 299 | + _taxonomy_field("key_features", "Key Features", "primary product attributes or standout features"), | ||
| 300 | + _taxonomy_field("functional_benefits", "Functional Benefits", "practical benefits or performance advantages when supported"), | ||
| 301 | + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, weight, or capacity when stated"), | ||
| 302 | + _taxonomy_field("color", "Color", "specific color name when available"), | ||
| 303 | + _taxonomy_field("style_theme", "Style / Theme", "overall style, design theme, or visual direction when supported"), | ||
| 304 | + _taxonomy_field("use_scenario", "Use Scenario", "likely use occasion or application setting when supported"), | ||
| 305 | +) | ||
| 306 | + | ||
| 307 | +CATEGORY_TAXONOMY_PROFILES: Dict[str, Dict[str, Any]] = { | ||
| 308 | + "apparel": _make_taxonomy_profile( | ||
| 309 | + "apparel", | ||
| 310 | + APPAREL_TAXONOMY_FIELDS, | ||
| 311 | + aliases=("服装", "服饰", "apparel", "clothing", "fashion"), | ||
| 312 | + output_languages=("zh", "en"), | ||
| 313 | + zh_headers=tuple(field["zh_label"] for field in APPAREL_TAXONOMY_FIELDS), | ||
| 314 | + ), | ||
| 315 | + "3c": _make_taxonomy_profile( | ||
| 316 | + "3C", | ||
| 317 | + THREE_C_TAXONOMY_FIELDS, | ||
| 318 | + aliases=("3c", "数码", "phone accessories", "computer peripherals", "smart wearables", "audio", "gaming gear"), | ||
| 319 | + ), | ||
| 320 | + "bags": _make_taxonomy_profile( | ||
| 321 | + "bags", | ||
| 322 | + BAGS_TAXONOMY_FIELDS, | ||
| 323 | + aliases=("bags", "bag", "包", "箱包", "handbag", "backpack", "wallet", "luggage"), | ||
| 324 | + ), | ||
| 325 | + "pet_supplies": _make_taxonomy_profile( | ||
| 326 | + "pet supplies", | ||
| 327 | + PET_SUPPLIES_TAXONOMY_FIELDS, | ||
| 328 | + aliases=("pet", "宠物", "pet supplies", "pet food", "pet toys", "pet care"), | ||
| 329 | + ), | ||
| 330 | + "electronics": _make_taxonomy_profile( | ||
| 331 | + "electronics", | ||
| 332 | + ELECTRONICS_TAXONOMY_FIELDS, | ||
| 333 | + aliases=("electronics", "电子", "electronic components", "consumer electronics", "digital devices"), | ||
| 334 | + ), | ||
| 335 | + "outdoor": _make_taxonomy_profile( | ||
| 336 | + "outdoor products", | ||
| 337 | + OUTDOOR_TAXONOMY_FIELDS, | ||
| 338 | + aliases=("outdoor", "户外", "camping", "hiking", "fishing", "travel accessories"), | ||
| 339 | + ), | ||
| 340 | + "home_appliances": _make_taxonomy_profile( | ||
| 341 | + "home appliances", | ||
| 342 | + HOME_APPLIANCES_TAXONOMY_FIELDS, | ||
| 343 | + aliases=("home appliances", "家电", "电器", "kitchen appliances", "cleaning appliances", "smart home devices"), | ||
| 344 | + ), | ||
| 345 | + "home_living": _make_taxonomy_profile( | ||
| 346 | + "home and living", | ||
| 347 | + HOME_LIVING_TAXONOMY_FIELDS, | ||
| 348 | + aliases=("home", "living", "家居", "家具", "家纺", "home decor", "kitchenware"), | ||
| 349 | + ), | ||
| 350 | + "wigs": _make_taxonomy_profile( | ||
| 351 | + "wigs", | ||
| 352 | + WIGS_TAXONOMY_FIELDS, | ||
| 353 | + aliases=("wig", "wigs", "假发", "hairpiece"), | ||
| 354 | + ), | ||
| 355 | + "beauty": _make_taxonomy_profile( | ||
| 356 | + "beauty and cosmetics", | ||
| 357 | + BEAUTY_TAXONOMY_FIELDS, | ||
| 358 | + aliases=("beauty", "cosmetics", "美容", "美妆", "makeup", "skincare", "nail care"), | ||
| 359 | + ), | ||
| 360 | + "accessories": _make_taxonomy_profile( | ||
| 361 | + "accessories", | ||
| 362 | + ACCESSORIES_TAXONOMY_FIELDS, | ||
| 363 | + aliases=("accessories", "配饰", "jewelry", "watches", "belts", "scarves", "hats", "sunglasses"), | ||
| 364 | + ), | ||
| 365 | + "toys": _make_taxonomy_profile( | ||
| 366 | + "toys", | ||
| 367 | + TOYS_TAXONOMY_FIELDS, | ||
| 368 | + aliases=("toys", "toy", "玩具", "plush", "action figures", "puzzles", "educational toys"), | ||
| 369 | + ), | ||
| 370 | + "shoes": _make_taxonomy_profile( | ||
| 371 | + "shoes", | ||
| 372 | + SHOES_TAXONOMY_FIELDS, | ||
| 373 | + aliases=("shoes", "shoe", "鞋", "sneakers", "boots", "sandals", "heels"), | ||
| 374 | + ), | ||
| 375 | + "sports": _make_taxonomy_profile( | ||
| 376 | + "sports products", | ||
| 377 | + SPORTS_TAXONOMY_FIELDS, | ||
| 378 | + aliases=("sports", "sport", "运动", "fitness", "cycling", "team sports", "water sports"), | ||
| 379 | + ), | ||
| 380 | + "others": _make_taxonomy_profile( | ||
| 381 | + "general merchandise", | ||
| 382 | + OTHERS_TAXONOMY_FIELDS, | ||
| 383 | + aliases=("others", "other", "其他", "general merchandise"), | ||
| 384 | + ), | ||
| 138 | } | 385 | } |
| 139 | 386 | ||
| 387 | +CATEGORY_TAXONOMY_PROFILE_NAMES = tuple(CATEGORY_TAXONOMY_PROFILES.keys()) | ||
| 388 | +TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = CATEGORY_TAXONOMY_PROFILES["apparel"]["shared_instruction"] | ||
| 389 | +TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]["en"] | ||
| 390 | +TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"] | ||
| 391 | + | ||
| 140 | LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { | 392 | LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = { |
| 141 | "en": [ | 393 | "en": [ |
| 142 | "No.", | 394 | "No.", |
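Reviewer sketch: the `aliases` tuples registered above are presumably what category detection matches against. The snippet below is a minimal illustration of one way such matching could work. It is not the code in `indexer/product_enrich.py`. The names `PROFILE_ALIASES`, `CATEGORY_KEYS`, and `detect_profile_sketch` are hypothetical, and the longest-alias-wins rule is an assumption made so that "Home Appliances" is not captured by the shorter `home` alias of `home_living`.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical, trimmed-down alias table; the alias strings are copied from the diff.
PROFILE_ALIASES: Dict[str, Tuple[str, ...]] = {
    "apparel": ("服装", "服饰", "apparel", "clothing", "fashion"),
    "toys": ("toys", "toy", "玩具", "plush", "action figures", "puzzles", "educational toys"),
    "beauty": ("beauty", "cosmetics", "美容", "美妆", "makeup", "skincare", "nail care"),
    "home_appliances": ("home appliances", "家电", "电器", "kitchen appliances",
                        "cleaning appliances", "smart home devices"),
    "home_living": ("home", "living", "家居", "家具", "家纺", "home decor", "kitchenware"),
}

# Assumed item keys that may carry category text (taken from the test inputs in this commit).
CATEGORY_KEYS = ("category", "category_path", "category1_name")


def detect_profile_sketch(item: Dict[str, Any], default: str = "others") -> str:
    """Return the profile whose longest alias occurs in the item's category text."""
    text = " ".join(str(item.get(key) or "") for key in CATEGORY_KEYS).lower()
    best: Optional[Tuple[int, str]] = None  # (alias length, profile slug)
    for profile, aliases in PROFILE_ALIASES.items():
        for alias in aliases:
            # Plain substring match; prefer the longest alias so that
            # "home appliances" beats the generic "home".
            if alias.lower() in text and (best is None or len(alias) > best[0]):
                best = (len(alias), profile)
    return best[1] if best else default
```

Plain substring matching is the simplest option but can over-match short aliases (for example `toy` inside an unrelated word), so a real implementation might tokenize the category path first.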
indexer/taxonomy.md
| @@ -171,3 +171,27 @@ Rules: | @@ -171,3 +171,27 @@ Rules: | ||
| 171 | Input product list: | 171 | Input product list: |
| 172 | """ | 172 | """ |
| 173 | ``` | 173 | ``` |
| 174 | + | ||
| 175 | +## 2. Other taxonomy profiles | ||
| 176 | + | ||
| 177 | +Notes: | ||
| 178 | +- `apparel` continues to return `zh` + `en`. | ||
| 179 | +- All other profiles return `en` only and define English column names only. | ||
| 180 | +- The profile slugs used in the code match the ones listed below. | ||
| 181 | + | ||
| 182 | +| Profile | Core columns (`en`) | | ||
| 183 | +| --- | --- | | ||
| 184 | +| `3c` | Product Type, Compatible Device / Model, Connectivity, Interface / Port Type, Power Source / Charging, Key Features, Material / Finish, Color, Pack Size, Use Case | | ||
| 185 | +| `bags` | Product Type, Target Gender, Carry Style, Size / Capacity, Material, Closure Type, Structure / Compartments, Strap / Handle Type, Color, Occasion / End Use | | ||
| 186 | +| `pet_supplies` | Product Type, Pet Type, Breed Size, Life Stage, Material / Ingredients, Flavor / Scent, Key Features, Functional Benefits, Size / Capacity, Use Scenario | | ||
| 187 | +| `electronics` | Product Type, Device Category / Compatibility, Power / Voltage, Connectivity, Interface / Port Type, Capacity / Storage, Key Features, Material / Finish, Color, Use Case | | ||
| 188 | +| `outdoor` | Product Type, Activity Type, Season / Weather, Material, Capacity / Size, Protection / Resistance, Key Features, Portability / Packability, Color, Use Scenario | | ||
| 189 | +| `home_appliances` | Product Type, Appliance Category, Power / Voltage, Capacity / Coverage, Control Method, Installation Type, Key Features, Material / Finish, Color, Use Scenario | | ||
| 190 | +| `home_living` | Product Type, Room / Placement, Material, Style, Size / Dimensions, Color, Pattern / Finish, Key Features, Assembly / Installation, Use Scenario | | ||
| 191 | +| `wigs` | Product Type, Hair Material, Hair Texture, Hair Length, Hair Color, Cap Construction, Lace Area / Part Type, Density / Volume, Style / Bang Type, Occasion / End Use | | ||
| 192 | +| `beauty` | Product Type, Target Area, Skin Type / Hair Type, Finish / Effect, Key Ingredients, Shade / Color, Scent, Formulation, Functional Benefits, Use Scenario | | ||
| 193 | +| `accessories` | Product Type, Target Gender, Material, Color, Pattern / Finish, Closure / Fastening, Size / Fit, Style, Occasion / End Use, Set / Pack Size | | ||
| 194 | +| `toys` | Product Type, Age Group, Character / Theme, Material, Power Source, Interactive Features, Educational / Play Value, Piece Count / Size, Color, Use Scenario | | ||
| 195 | +| `shoes` | Product Type, Target Gender, Age Group, Closure Type, Toe Shape, Heel Height / Sole Type, Upper Material, Lining / Insole Material, Color, Occasion / End Use | | ||
| 196 | +| `sports` | Product Type, Sport / Activity, Skill Level, Material, Size / Capacity, Protection / Support, Key Features, Power Source, Color, Use Scenario | | ||
| 197 | +| `others` | Product Type, Product Category, Target User, Material / Ingredients, Key Features, Functional Benefits, Size / Capacity, Color, Style / Theme, Use Scenario | |
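Reviewer sketch: the table above fixes the English column names, and the tests in this commit expect each attribute as `{"name": <column>, "value": {<lang>: [...]}}` with empty fields dropped. The snippet below shows one way that shape could be assembled from per-language rows. The helper name `build_taxonomy_attributes`, the `TOYS_FIELDS_SKETCH` subset, and the comma-splitting of multi-value cells are assumptions for illustration, not the repository's implementation.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical subset of (field key, English column name) pairs for the `toys` profile.
TOYS_FIELDS_SKETCH: Tuple[Tuple[str, str], ...] = (
    ("product_type", "Product Type"),
    ("age_group", "Age Group"),
    ("material", "Material"),
)


def build_taxonomy_attributes(
    fields: Sequence[Tuple[str, str]],
    rows_by_lang: Dict[str, Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Merge one analysis row per language into name/value attributes."""
    attributes: List[Dict[str, Any]] = []
    for key, label in fields:
        value: Dict[str, List[str]] = {}
        for lang, row in rows_by_lang.items():
            raw = (row.get(key) or "").strip()
            if raw:
                # Assumption: multi-value cells are comma-separated.
                value[lang] = [part.strip() for part in raw.split(",") if part.strip()]
        if value:  # fields that are empty in every language are omitted
            attributes.append({"name": label, "value": value})
    return attributes
```

For an `en`-only profile the caller passes a single `"en"` row. For `apparel` it would pass both `"zh"` and `"en"` rows, and the same loop yields the bilingual shape.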
tests/ci/test_service_api_contracts.py
| @@ -454,6 +454,52 @@ def test_indexer_enrich_content_contract_accepts_deprecated_analysis_kinds(index | @@ -454,6 +454,52 @@ def test_indexer_enrich_content_contract_accepts_deprecated_analysis_kinds(index | ||
| 454 | assert data["category_taxonomy_profile"] == "apparel" | 454 | assert data["category_taxonomy_profile"] == "apparel" |
| 455 | 455 | ||
| 456 | 456 | ||
| 457 | +def test_indexer_enrich_content_contract_supports_non_apparel_taxonomy_profiles(indexer_client: TestClient, monkeypatch): | ||
| 458 | + import indexer.product_enrich as process_products | ||
| 459 | + | ||
| 460 | + def _fake_build_index_content_fields( | ||
| 461 | + items: List[Dict[str, str]], | ||
| 462 | + tenant_id: str | None = None, | ||
| 463 | + enrichment_scopes: List[str] | None = None, | ||
| 464 | + category_taxonomy_profile: str = "apparel", | ||
| 465 | + ): | ||
| 466 | + assert tenant_id == "162" | ||
| 467 | + assert enrichment_scopes == ["category_taxonomy"] | ||
| 468 | + assert category_taxonomy_profile == "toys" | ||
| 469 | + return [ | ||
| 470 | + { | ||
| 471 | + "id": items[0]["spu_id"], | ||
| 472 | + "qanchors": {}, | ||
| 473 | + "enriched_tags": {}, | ||
| 474 | + "enriched_attributes": [], | ||
| 475 | + "enriched_taxonomy_attributes": [ | ||
| 476 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | ||
| 477 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | ||
| 478 | + ], | ||
| 479 | + } | ||
| 480 | + ] | ||
| 481 | + | ||
| 482 | + monkeypatch.setattr(process_products, "build_index_content_fields", _fake_build_index_content_fields) | ||
| 483 | + | ||
| 484 | + response = indexer_client.post( | ||
| 485 | + "/indexer/enrich-content", | ||
| 486 | + json={ | ||
| 487 | + "tenant_id": "162", | ||
| 488 | + "enrichment_scopes": ["category_taxonomy"], | ||
| 489 | + "category_taxonomy_profile": "toys", | ||
| 490 | + "items": [{"spu_id": "1001", "title": "Toy"}], | ||
| 491 | + }, | ||
| 492 | + ) | ||
| 493 | + | ||
| 494 | + assert response.status_code == 200 | ||
| 495 | + data = response.json() | ||
| 496 | + assert data["category_taxonomy_profile"] == "toys" | ||
| 497 | + assert data["results"][0]["enriched_taxonomy_attributes"] == [ | ||
| 498 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | ||
| 499 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | ||
| 500 | + ] | ||
| 501 | + | ||
| 502 | + | ||
| 457 | def test_indexer_documents_contract(indexer_client: TestClient): | 503 | def test_indexer_documents_contract(indexer_client: TestClient): |
| 458 | """POST /indexer/documents: tenant_id + spu_ids, returns success/failed lists (no ES write).""" | 504 | """POST /indexer/documents: tenant_id + spu_ids, returns success/failed lists (no ES write).""" |
| 459 | response = indexer_client.post( | 505 | response = indexer_client.post( |
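Reviewer sketch: the contract test above sends one request-level `category_taxonomy_profile`, while the commit message describes grouping mixed-category batches by profile and merging the results back in the original order. The snippet below illustrates that grouping step under stated assumptions: an item-level `category_taxonomy_profile` overrides the request-level default, and `enrich_group` is a hypothetical callback standing in for the per-profile LLM call. It is not the body of `_enrich_taxonomy_batch`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional


def enrich_grouped_by_profile(
    items: List[Dict[str, Any]],
    default_profile: str,
    enrich_group: Callable[[str, List[Dict[str, Any]]], List[Any]],
) -> List[Optional[Any]]:
    """Call enrich_group once per profile and restore the input order."""
    # Remember the original positions of the items belonging to each profile.
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, item in enumerate(items):
        profile = item.get("category_taxonomy_profile") or default_profile
        positions[profile].append(index)

    results: List[Optional[Any]] = [None] * len(items)
    for profile, indexes in positions.items():
        outputs = enrich_group(profile, [items[i] for i in indexes])
        # Assumes enrich_group returns one output per input, in the same order.
        for i, output in zip(indexes, outputs):
            results[i] = output
    return results
```

Tracking indexes rather than sorting keeps the merge linear in the batch size and leaves any item whose group returned too few outputs as `None`, which makes a short response easy to detect.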
tests/test_product_enrich_partial_mode.py
| @@ -500,7 +500,6 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() | @@ -500,7 +500,6 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() | ||
| 500 | "style_aesthetic": "", | 500 | "style_aesthetic": "", |
| 501 | } | 501 | } |
| 502 | ] | 502 | ] |
| 503 | - assert category_taxonomy_profile == "apparel" | ||
| 504 | return [ | 503 | return [ |
| 505 | { | 504 | { |
| 506 | "id": products[0]["id"], | 505 | "id": products[0]["id"], |
| @@ -562,6 +561,120 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() | @@ -562,6 +561,120 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output() | ||
| 562 | ] | 561 | ] |
| 563 | 562 | ||
| 564 | 563 | ||
| 564 | +def test_detect_category_taxonomy_profile_matches_category_hints(): | ||
| 565 | + assert product_enrich.detect_category_taxonomy_profile({"category1_name": "玩具"}) == "toys" | ||
| 566 | + assert product_enrich.detect_category_taxonomy_profile({"category": "Beauty & Cosmetics"}) == "beauty" | ||
| 567 | + assert product_enrich.detect_category_taxonomy_profile({"category_path": "Home Appliances / Kitchen"}) == "home_appliances" | ||
| 568 | + | ||
| 569 | + | ||
| 570 | +def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_apparel_returns_en_only(): | ||
| 571 | + seen_calls = [] | ||
| 572 | + | ||
| 573 | + def fake_analyze_products( | ||
| 574 | + products, | ||
| 575 | + target_lang="zh", | ||
| 576 | + batch_size=None, | ||
| 577 | + tenant_id=None, | ||
| 578 | + analysis_kind="content", | ||
| 579 | + category_taxonomy_profile=None, | ||
| 580 | + ): | ||
| 581 | + seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products))) | ||
| 582 | + if analysis_kind == "taxonomy": | ||
| 583 | + if category_taxonomy_profile == "apparel": | ||
| 584 | + return [ | ||
| 585 | + { | ||
| 586 | + "id": products[0]["id"], | ||
| 587 | + "lang": target_lang, | ||
| 588 | + "title_input": products[0]["title"], | ||
| 589 | + "product_type": f"{target_lang}-dress", | ||
| 590 | + "target_gender": f"{target_lang}-women", | ||
| 591 | + "age_group": "", | ||
| 592 | + "season": "", | ||
| 593 | + "fit": "", | ||
| 594 | + "silhouette": "", | ||
| 595 | + "neckline": "", | ||
| 596 | + "sleeve_length_type": "", | ||
| 597 | + "sleeve_style": "", | ||
| 598 | + "strap_type": "", | ||
| 599 | + "rise_waistline": "", | ||
| 600 | + "leg_shape": "", | ||
| 601 | + "skirt_shape": "", | ||
| 602 | + "length_type": "", | ||
| 603 | + "closure_type": "", | ||
| 604 | + "design_details": "", | ||
| 605 | + "fabric": "", | ||
| 606 | + "material_composition": "", | ||
| 607 | + "fabric_properties": "", | ||
| 608 | + "clothing_features": "", | ||
| 609 | + "functional_benefits": "", | ||
| 610 | + "color": "", | ||
| 611 | + "color_family": "", | ||
| 612 | + "print_pattern": "", | ||
| 613 | + "occasion_end_use": "", | ||
| 614 | + "style_aesthetic": "", | ||
| 615 | + } | ||
| 616 | + ] | ||
| 617 | + assert category_taxonomy_profile == "toys" | ||
| 618 | + assert target_lang == "en" | ||
| 619 | + return [ | ||
| 620 | + { | ||
| 621 | + "id": products[0]["id"], | ||
| 622 | + "lang": "en", | ||
| 623 | + "title_input": products[0]["title"], | ||
| 624 | + "product_type": "doll set", | ||
| 625 | + "age_group": "kids", | ||
| 626 | + "character_theme": "", | ||
| 627 | + "material": "", | ||
| 628 | + "power_source": "", | ||
| 629 | + "interactive_features": "", | ||
| 630 | + "educational_play_value": "", | ||
| 631 | + "piece_count_size": "", | ||
| 632 | + "color": "", | ||
| 633 | + "use_scenario": "", | ||
| 634 | + } | ||
| 635 | + ] | ||
| 636 | + | ||
| 637 | + return [ | ||
| 638 | + { | ||
| 639 | + "id": product["id"], | ||
| 640 | + "lang": target_lang, | ||
| 641 | + "title_input": product["title"], | ||
| 642 | + "title": product["title"], | ||
| 643 | + "category_path": "", | ||
| 644 | + "tags": f"{target_lang}-tag", | ||
| 645 | + "target_audience": "", | ||
| 646 | + "usage_scene": "", | ||
| 647 | + "season": "", | ||
| 648 | + "key_attributes": "", | ||
| 649 | + "material": "", | ||
| 650 | + "features": "", | ||
| 651 | + "anchor_text": f"{target_lang}-anchor", | ||
| 652 | + } | ||
| 653 | + for product in products | ||
| 654 | + ] | ||
| 655 | + | ||
| 656 | + with mock.patch.object(product_enrich, "analyze_products", side_effect=fake_analyze_products): | ||
| 657 | + result = product_enrich.build_index_content_fields( | ||
| 658 | + items=[ | ||
| 659 | + {"spu_id": "1", "title": "dress", "category_taxonomy_profile": "apparel"}, | ||
| 660 | + {"spu_id": "2", "title": "toy", "category_taxonomy_profile": "toys"}, | ||
| 661 | + ], | ||
| 662 | + tenant_id="170", | ||
| 663 | + category_taxonomy_profile="apparel", | ||
| 664 | + ) | ||
| 665 | + | ||
| 666 | + assert result[0]["enriched_taxonomy_attributes"] == [ | ||
| 667 | + {"name": "Product Type", "value": {"zh": ["zh-dress"], "en": ["en-dress"]}}, | ||
| 668 | + {"name": "Target Gender", "value": {"zh": ["zh-women"], "en": ["en-women"]}}, | ||
| 669 | + ] | ||
| 670 | + assert result[1]["enriched_taxonomy_attributes"] == [ | ||
| 671 | + {"name": "Product Type", "value": {"en": ["doll set"]}}, | ||
| 672 | + {"name": "Age Group", "value": {"en": ["kids"]}}, | ||
| 673 | + ] | ||
| 674 | + assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls | ||
| 675 | + assert ("taxonomy", "en", "toys", ("2",)) in seen_calls | ||
| 676 | + | ||
| 677 | + | ||
| 565 | def test_anchor_cache_key_depends_on_product_input_not_identifiers(): | 678 | def test_anchor_cache_key_depends_on_product_input_not_identifiers(): |
| 566 | product_a = { | 679 | product_a = { |
| 567 | "id": "1", | 680 | "id": "1", |