Commit 048631be73b69068b8fc2d139a1a9f769704e34a

Authored by tangwang
1 parent dabd52a5

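1. Add the documentation file 《product_enrich模块说明.md》.

2. Remove the logic that auto-infers the taxonomy profile; build_index_content_fields()
   now only accepts an explicitly passed category_taxonomy_profile.
3. All taxonomy profiles now output zh/en; the logic that switched languages by industry is removed.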
config/config.yaml
... ... @@ -114,6 +114,7 @@ field_boosts:
114 114 qanchors: 1.0
115 115 enriched_tags: 1.0
116 116 enriched_attributes.value: 1.5
  117 + enriched_taxonomy_attributes.value: 0.3
117 118 category_name_text: 2.0
118 119 category_path: 2.0
119 120 keywords: 2.0
... ... @@ -194,6 +195,7 @@ query_config:
194 195 - qanchors
195 196 - enriched_tags
196 197 - enriched_attributes.value
  198 + - enriched_taxonomy_attributes.value
197 199 - option1_values
198 200 - option2_values
199 201 - option3_values
... ... @@ -252,6 +254,7 @@ query_config:
252 254 # - qanchors
253 255 # - enriched_tags
254 256 # - enriched_attributes
  257 + # - enriched_taxonomy_attributes.value
255 258 - min_price
256 259 - compare_at_price
257 260 - image_url
... ...
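The config hunks above add `enriched_taxonomy_attributes.value` with a low boost of 0.3. How the project turns `field_boosts` into a query is not shown in this commit, so the following is only a minimal sketch under that assumption: it renders each searchable field in Elasticsearch's `field^boost` notation. The helper name `build_boosted_fields` is hypothetical, and the sketch ignores that `enriched_taxonomy_attributes` is a nested field, which would need a nested query in practice.

```python
from typing import Dict, List


def build_boosted_fields(field_boosts: Dict[str, float], search_fields: List[str]) -> List[str]:
    """Render fields in Elasticsearch's ``field^boost`` notation (hypothetical helper).

    Fields without an entry in ``field_boosts`` keep the neutral boost 1.0,
    which is emitted as the bare field name.
    """
    boosted: List[str] = []
    for name in search_fields:
        boost = field_boosts.get(name, 1.0)
        boosted.append(name if boost == 1.0 else f"{name}^{boost}")
    return boosted


# Values copied from the config.yaml hunk above.
FIELD_BOOSTS = {
    "qanchors": 1.0,
    "enriched_attributes.value": 1.5,
    "enriched_taxonomy_attributes.value": 0.3,
}
SEARCH_FIELDS = ["qanchors", "enriched_attributes.value", "enriched_taxonomy_attributes.value"]
```

A boost below 1.0 keeps taxonomy matches available for recall while letting title-like and category fields dominate the ranking.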
docs/搜索API对接指南-05-索引接口(Indexer).md
... ... @@ -668,9 +668,9 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
668 668 - `others`
669 669  
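670 670 Notes:
671   -- `apparel` still returns taxonomy values in both `zh` and `en`.
672   -- For all other profiles, `enriched_taxonomy_attributes.value` returns only `en`, to limit field size and keep the structure simple.
673   -- When the Indexer builds ES documents internally and the call chain does not explicitly specify a profile, it first tries to infer the taxonomy profile from the product's category fields; external calls to `/indexer/enrich-content` still follow the `category_taxonomy_profile` in the request.
  671 +- For every profile, `enriched_taxonomy_attributes.value` uniformly returns `zh` + `en`.
  672 +- External calls to `/indexer/enrich-content` follow the `category_taxonomy_profile` in the request.
  673 +- When the Indexer currently builds ES documents internally, the taxonomy profile is temporarily fixed to `apparel`; a TODO in the code marks this for replacement once the tenant's actual industry can be read from the database.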
674 674  
675 675 #### 请求参数
676 676  
... ... @@ -726,8 +726,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
726 726  
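727 727 - The endpoint does not accept any language-control parameter.
728 728 - Which languages and which semantic dimensions are returned is decided entirely by the internal logic of `indexer.product_enrich`.
729   -- Currently, to align with the `search_products` mapping, the generic enrichment fields contain only the core index languages `zh` and `en`.
730   -- For taxonomy fields, `apparel` returns `zh` and `en`; other profiles return only `en`.
  729 +- Currently, to align with the `search_products` mapping, both the generic enrichment fields and the taxonomy fields uniformly return only the core index languages `zh` and `en`.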
731 730  
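732 731 Batching recommendations:
733 732 - **Full indexing**: strongly recommended to accumulate **20 SPUs/docs** into one batch whenever possible, then send a single request.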
... ... @@ -787,7 +786,7 @@ curl -X POST "http://127.0.0.1:6004/indexer/build-docs-from-db" \
787 786 | `results[].qanchors` | object | Same structure as the ES `qanchors` field; returns arrays of phrases keyed by language |
788 787 | `results[].enriched_tags` | object | Same structure as the ES `enriched_tags` field; returns arrays of tags keyed by language |
789 788 | `results[].enriched_attributes` | array | Same structure as the ES `enriched_attributes` nested field; each item is `{ "name", "value": { "zh"?: "...", "en"?: "..." } }` |
790   -| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field. For `apparel`, each item is typically `{ "name", "value": { "zh"?: [...], "en"?: [...] } }`; other profiles return only `{ "name", "value": { "en": [...] } }` |
  789 +| `results[].enriched_taxonomy_attributes` | array | Same structure as the ES `enriched_taxonomy_attributes` nested field. Each item is typically `{ "name", "value": { "zh"?: [...], "en"?: [...] } }` |
791 790 | `results[].error` | string | If processing fails for this item (e.g. an LLM exception), the error message is returned in this field |
792 791  
793 792 **Error response**:
... ...
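The response table above describes each `enriched_taxonomy_attributes` item as `{ "name", "value": { "zh"?: [...], "en"?: [...] } }`, with both language keys optional. A minimal sketch of reading that shape on the caller side follows; the function name `collect_taxonomy_values` and the sample values are illustrative assumptions, not part of the project.

```python
from typing import Any, Dict, List


def collect_taxonomy_values(result: Dict[str, Any], lang: str) -> Dict[str, List[str]]:
    """Map attribute name -> values for one language, skipping absent or empty entries.

    Hypothetical helper: it only relies on the item shape documented in the table.
    """
    collected: Dict[str, List[str]] = {}
    for attr in result.get("enriched_taxonomy_attributes") or []:
        name = attr.get("name")
        values = (attr.get("value") or {}).get(lang) or []
        if name and values:
            collected[name] = list(values)
    return collected


# Illustrative data only; real attribute names depend on the selected profile.
SAMPLE_RESULT = {
    "id": "1",
    "enriched_taxonomy_attributes": [
        {"name": "Material", "value": {"zh": ["棉"], "en": ["cotton"]}},
        {"name": "Color", "value": {"en": ["black", "white"]}},
    ],
}
```

Because `zh` and `en` are both marked optional, the sketch treats a missing language key the same as an empty list rather than raising.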
docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md
... ... @@ -444,7 +444,7 @@ curl "http://localhost:6006/health"
444 444  
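445 445 - **Base URL**: Indexer service address, e.g. `http://localhost:6004`
446 446 - **Path**: `POST /indexer/enrich-content`
447   -- **Description**: Generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` in batch from product titles, for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; by default it runs `generic + category_taxonomy(apparel)`. Currently supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, and `others`. The taxonomy output for `apparel` is `zh` + `en`; all other profiles return only `en`. Internally it uses an LLM (requires `DASHSCOPE_API_KEY` to be configured) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency.
  447 +- **Description**: Generates `qanchors`, `enriched_attributes`, `enriched_tags`, and `enriched_taxonomy_attributes` in batch from product titles, for assembling ES documents. `enrichment_scopes` selects whether to run `generic` / `category_taxonomy`, and `category_taxonomy_profile` selects the taxonomy prompt/profile for the corresponding top-level category; by default it runs `generic + category_taxonomy(apparel)`. Currently supported taxonomy profiles are `apparel`, `3c`, `bags`, `pet_supplies`, `electronics`, `outdoor`, `home_appliances`, `home_living`, `wigs`, `beauty`, `accessories`, `toys`, `shoes`, `sports`, and `others`. All profiles uniformly return taxonomy output in `zh` + `en`; `category_taxonomy_profile` only determines the field set. Internally it uses an LLM (requires `DASHSCOPE_API_KEY` to be configured) and supports multiple languages and Redis caching; at most 50 items per request, and batched calls are recommended for efficiency.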
448 448  
449 449 For request/response formats, examples, and error codes, see [-05-索引接口(Indexer)](./搜索API对接指南-05-索引接口(Indexer).md#58-内容理解字段生成接口).
450 450  
... ...
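The docs above cap `/indexer/enrich-content` at 50 items per request and recommend batches of about 20 SPUs for full indexing. A minimal client-side sketch of that batching follows; `plan_batches` is a hypothetical helper, and it deliberately stops before building any request body, since the exact request schema is not shown in this diff.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_ITEMS_PER_REQUEST = 50   # hard limit stated in the endpoint description
RECOMMENDED_BATCH_SIZE = 20  # batch size recommended for full indexing


def plan_batches(items: Sequence[T], batch_size: int = RECOMMENDED_BATCH_SIZE) -> List[List[T]]:
    """Split items into consecutive batches no larger than the per-request limit."""
    if not 1 <= batch_size <= MAX_ITEMS_PER_REQUEST:
        raise ValueError(f"batch_size must be between 1 and {MAX_ITEMS_PER_REQUEST}")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]
```

Each returned batch would then be sent as one request with the same `category_taxonomy_profile`, since the profile now applies to the whole call rather than to individual items.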
indexer/document_transformer.py
... ... @@ -259,13 +259,6 @@ class SPUDocumentTransformer:
259 259 title = str(row.get("title") or "").strip()
260 260 if not spu_id or not title:
261 261 continue
262   - category_path_obj = docs[i].get("category_path") or {}
263   - resolved_category_path = ""
264   - if isinstance(category_path_obj, dict):
265   - resolved_category_path = next(
266   - (str(value).strip() for value in category_path_obj.values() if str(value).strip()),
267   - "",
268   - )
269 262 id_to_idx[spu_id] = i
270 263 items.append(
271 264 {
... ... @@ -274,9 +267,6 @@ class SPUDocumentTransformer:
274 267 "brief": str(row.get("brief") or "").strip(),
275 268 "description": str(row.get("description") or "").strip(),
276 269 "image_url": str(row.get("image_src") or "").strip(),
277   - "category": str(row.get("category") or "").strip(),
278   - "category_path": resolved_category_path,
279   - "category1_name": str(docs[i].get("category1_name") or "").strip(),
280 270 }
281 271 )
282 272 if not items:
... ... @@ -284,7 +274,12 @@ class SPUDocumentTransformer:
284 274  
285 275 tenant_id = str(docs[0].get("tenant_id") or "").strip() or None
286 276 try:
287   - results = build_index_content_fields(items=items, tenant_id=tenant_id)
  277 + # TODO: Read the tenant's actual industry from the database and use it to replace the current default apparel profile.
  278 + results = build_index_content_fields(
  279 + items=items,
  280 + tenant_id=tenant_id,
  281 + category_taxonomy_profile="apparel",
  282 + )
288 283 except Exception as e:
289 284 logger.warning("LLM batch attribute fill failed: %s", e)
290 285 return
... ... @@ -679,6 +674,7 @@ class SPUDocumentTransformer:
679 674  
680 675 tenant_id = doc.get("tenant_id")
681 676 try:
  677 + # TODO: Read the tenant's actual industry from the database and use it to replace the current default apparel profile.
682 678 results = build_index_content_fields(
683 679 items=[
684 680 {
... ... @@ -687,19 +683,10 @@ class SPUDocumentTransformer:
687 683 "brief": str(spu_row.get("brief") or "").strip(),
688 684 "description": str(spu_row.get("description") or "").strip(),
689 685 "image_url": str(spu_row.get("image_src") or "").strip(),
690   - "category": str(spu_row.get("category") or "").strip(),
691   - "category_path": next(
692   - (
693   - str(value).strip()
694   - for value in (doc.get("category_path") or {}).values()
695   - if str(value).strip()
696   - ),
697   - "",
698   - ),
699   - "category1_name": str(doc.get("category1_name") or "").strip(),
700 686 }
701 687 ],
702 688 tenant_id=str(tenant_id),
  689 + category_taxonomy_profile="apparel",
703 690 )
704 691 except Exception as e:
705 692 logger.warning("LLM attribute fill failed for SPU %s: %s", spu_id, e)
... ...
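Both call sites above hard-code `category_taxonomy_profile="apparel"` and leave a TODO to look up the tenant's real industry. One possible shape for that follow-up is sketched below. Everything here is an assumption: the function name, the in-memory `tenant_industries` mapping standing in for the database lookup, and the choice to fall back silently to `apparel` instead of raising on an unknown value.

```python
from typing import Mapping, Optional

# Profile names as listed in the API docs touched by this commit.
SUPPORTED_PROFILES = frozenset({
    "apparel", "3c", "bags", "pet_supplies", "electronics", "outdoor",
    "home_appliances", "home_living", "wigs", "beauty", "accessories",
    "toys", "shoes", "sports", "others",
})
DEFAULT_PROFILE = "apparel"


def resolve_profile_for_tenant(
    tenant_id: Optional[str],
    tenant_industries: Mapping[str, str],
) -> str:
    """Return the tenant's taxonomy profile, or the default when unknown.

    `tenant_industries` is a hypothetical stand-in for a database lookup.
    """
    industry = str(tenant_industries.get(str(tenant_id or ""), "")).strip()
    return industry if industry in SUPPORTED_PROFILES else DEFAULT_PROFILE
```

The resolved value would be passed as `category_taxonomy_profile` to `build_index_content_fields()` in place of the literal `"apparel"`.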
indexer/product_enrich.py
... ... @@ -18,7 +18,7 @@ from dataclasses import dataclass, field
18 18 from collections import OrderedDict
19 19 from datetime import datetime
20 20 from concurrent.futures import ThreadPoolExecutor
21   -from typing import List, Dict, Tuple, Any, Optional
  21 +from typing import List, Dict, Tuple, Any, Optional, FrozenSet
22 22  
23 23 import redis
24 24 import requests
... ... @@ -146,8 +146,22 @@ if _missing_prompt_langs:
146 146 )
147 147  
148 148  
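149   -# Multi-value field separators: ASCII comma, Chinese comma, enumeration comma (、), plus the historically accepted ; | / and whitespace
  149 +# Multi-value field separators
150 150 _MULTI_VALUE_FIELD_SPLIT_RE = re.compile(r"[,、,;|/\n\t]+")
  151 +# Placeholders in a table cell that are treated as "no content"
  152 +_MARKDOWN_EMPTY_CELL_LITERALS: Tuple[str, ...] = ("-", "–", "—", "none", "null", "n/a", "无")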
  153 +_MARKDOWN_EMPTY_CELL_TOKENS_CF: FrozenSet[str] = frozenset(
  154 + lit.casefold() for lit in _MARKDOWN_EMPTY_CELL_LITERALS
  155 +)
  156 +
  157 +def _normalize_markdown_table_cell(raw: Optional[str]) -> str:
  158 + """Strip whitespace; treat placeholder tokens uniformly as an empty string."""
  159 + s = str(raw or "").strip()
  160 + if not s:
  161 + return ""
  162 + if s.casefold() in _MARKDOWN_EMPTY_CELL_TOKENS_CF:
  163 + return ""
  164 + return s
151 165 _CORE_INDEX_LANGUAGES = ("zh", "en")
152 166 _DEFAULT_ENRICHMENT_SCOPES = ("generic", "category_taxonomy")
153 167 _DEFAULT_CATEGORY_TAXONOMY_PROFILE = "apparel"
... ... @@ -195,19 +209,12 @@ class AnalysisSchema:
195 209 markdown_table_headers: Dict[str, List[str]]
196 210 result_fields: Tuple[str, ...]
197 211 meaningful_fields: Tuple[str, ...]
198   - output_languages: Tuple[str, ...] = ("zh", "en")
199 212 cache_version: str = "v1"
200 213 field_aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict)
201   - fallback_headers: Optional[List[str]] = None
202 214 quality_fields: Tuple[str, ...] = ()
203 215  
204 216 def get_headers(self, target_lang: str) -> Optional[List[str]]:
205   - headers = self.markdown_table_headers.get(target_lang)
206   - if headers:
207   - return headers
208   - if self.fallback_headers:
209   - return self.fallback_headers
210   - return None
  217 + return self.markdown_table_headers.get(target_lang)
211 218  
212 219  
213 220 _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = {
... ... @@ -217,7 +224,6 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = {
217 224 markdown_table_headers=LANGUAGE_MARKDOWN_TABLE_HEADERS,
218 225 result_fields=_CONTENT_ANALYSIS_RESULT_FIELDS,
219 226 meaningful_fields=_CONTENT_ANALYSIS_MEANINGFUL_FIELDS,
220   - output_languages=_CORE_INDEX_LANGUAGES,
221 227 cache_version="v2",
222 228 field_aliases=_CONTENT_ANALYSIS_FIELD_ALIASES,
223 229 quality_fields=_CONTENT_ANALYSIS_QUALITY_FIELDS,
... ... @@ -225,17 +231,13 @@ _ANALYSIS_SCHEMAS: Dict[str, AnalysisSchema] = {
225 231 }
226 232  
227 233 def _build_taxonomy_profile_schema(profile: str, config: Dict[str, Any]) -> AnalysisSchema:
228   - result_fields = tuple(field["key"] for field in config["fields"])
229   - headers = config["markdown_table_headers"]
230 234 return AnalysisSchema(
231 235 name=f"taxonomy:{profile}",
232 236 shared_instruction=config["shared_instruction"],
233   - markdown_table_headers=headers,
234   - result_fields=result_fields,
235   - meaningful_fields=result_fields,
236   - output_languages=tuple(config["output_languages"]),
  237 + markdown_table_headers=config["markdown_table_headers"],
  238 + result_fields=tuple(field["key"] for field in config["fields"]),
  239 + meaningful_fields=tuple(field["key"] for field in config["fields"]),
237 240 cache_version="v1",
238   - fallback_headers=headers.get("en") if len(headers) > 1 else None,
239 241 )
240 242  
241 243  
... ... @@ -254,29 +256,6 @@ def get_supported_category_taxonomy_profiles() -> Tuple[str, ...]:
254 256 return tuple(_CATEGORY_TAXONOMY_PROFILE_SCHEMAS.keys())
255 257  
256 258  
257   -def _normalize_category_hint(text: Any) -> str:
258   - value = str(text or "").strip().lower()
259   - if not value:
260   - return ""
261   - value = value.replace("_", " ").replace(">", " ").replace("/", " ")
262   - value = re.sub(r"\s+", " ", value)
263   - return value
264   -
265   -
266   -_CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS: Tuple[Tuple[str, str], ...] = tuple(
267   - sorted(
268   - (
269   - (_normalize_category_hint(alias), profile)
270   - for profile, config in CATEGORY_TAXONOMY_PROFILES.items()
271   - for alias in (profile, *tuple(config.get("aliases") or ()))
272   - if _normalize_category_hint(alias)
273   - ),
274   - key=lambda item: len(item[0]),
275   - reverse=True,
276   - )
277   -)
278   -
279   -
280 259 def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str] = None) -> str:
281 260 profile = str(category_taxonomy_profile or _DEFAULT_CATEGORY_TAXONOMY_PROFILE).strip()
282 261 if profile not in _CATEGORY_TAXONOMY_PROFILE_SCHEMAS:
... ... @@ -287,41 +266,6 @@ def _normalize_category_taxonomy_profile(category_taxonomy_profile: Optional[str
287 266 return profile
288 267  
289 268  
290   -def detect_category_taxonomy_profile(item: Dict[str, Any]) -> Optional[str]:
291   - """
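292   - Guess the taxonomy profile from the product's existing category information.
293   - Return None on no match; the caller decides whether to fall back to the default profile.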
294   - """
295   - category_hints = (
296   - item.get("category_taxonomy_profile"),
297   - item.get("category1_name"),
298   - item.get("category_name_text"),
299   - item.get("category"),
300   - item.get("category_path"),
301   - )
302   - for hint in category_hints:
303   - normalized_hint = _normalize_category_hint(hint)
304   - if not normalized_hint:
305   - continue
306   - for alias, profile in _CATEGORY_TAXONOMY_PROFILE_ALIAS_MATCHERS:
307   - if alias and alias in normalized_hint:
308   - return profile
309   - return None
310   -
311   -
312   -def _resolve_category_taxonomy_profile(
313   - item: Dict[str, Any],
314   - fallback_profile: Optional[str] = None,
315   -) -> str:
316   - explicit_profile = str(item.get("category_taxonomy_profile") or "").strip()
317   - if explicit_profile:
318   - return _normalize_category_taxonomy_profile(explicit_profile)
319   - detected_profile = detect_category_taxonomy_profile(item)
320   - if detected_profile:
321   - return detected_profile
322   - return _normalize_category_taxonomy_profile(fallback_profile)
323   -
324   -
325 269 def _get_analysis_schema(
326 270 analysis_kind: str,
327 271 *,
... ... @@ -342,17 +286,6 @@ def _get_taxonomy_attribute_field_map(
342 286 return _CATEGORY_TAXONOMY_PROFILE_ATTRIBUTE_FIELD_MAPS[profile]
343 287  
344 288  
345   -def _get_analysis_output_languages(
346   - analysis_kind: str,
347   - *,
348   - category_taxonomy_profile: Optional[str] = None,
349   -) -> Tuple[str, ...]:
350   - return _get_analysis_schema(
351   - analysis_kind,
352   - category_taxonomy_profile=category_taxonomy_profile,
353   - ).output_languages
354   -
355   -
356 289 def _normalize_enrichment_scopes(
357 290 enrichment_scopes: Optional[List[str]] = None,
358 291 ) -> Tuple[str, ...]:
... ... @@ -562,11 +495,6 @@ def _normalize_index_content_item(item: Dict[str, Any]) -> Dict[str, str]:
562 495 "brief": str(item.get("brief") or "").strip(),
563 496 "description": str(item.get("description") or "").strip(),
564 497 "image_url": str(item.get("image_url") or "").strip(),
565   - "category": str(item.get("category") or "").strip(),
566   - "category_path": str(item.get("category_path") or "").strip(),
567   - "category_name_text": str(item.get("category_name_text") or "").strip(),
568   - "category1_name": str(item.get("category1_name") or "").strip(),
569   - "category_taxonomy_profile": str(item.get("category_taxonomy_profile") or "").strip(),
570 498 }
571 499  
572 500  
... ... @@ -584,8 +512,7 @@ def build_index_content_fields(
584 512 - `title`
585 513 - optional `brief` / `description` / `image_url`
586 514 - optional `enrichment_scopes`; by default both `generic` and `category_taxonomy` are run
587   - - optional `category_taxonomy_profile`; if omitted, it is first inferred from the item's own category fields, otherwise it falls back to the default `apparel`
588   - - optional category hint fields: `category` / `category_path` / `category_name_text` / `category1_name`
  515 + - optional `category_taxonomy_profile`, default `apparel`
589 516  
590 517 Structure of each returned item:
591 518 - `id`
... ... @@ -600,21 +527,10 @@ def build_index_content_fields(
600 527 - `enriched_tags.{lang}` is an array of tags
601 528 """
602 529 requested_enrichment_scopes = _normalize_enrichment_scopes(enrichment_scopes)
603   - fallback_taxonomy_profile = (
604   - _normalize_category_taxonomy_profile(category_taxonomy_profile)
605   - if category_taxonomy_profile
606   - else None
607   - )
  530 + normalized_taxonomy_profile = _normalize_category_taxonomy_profile(category_taxonomy_profile)
608 531 normalized_items = [_normalize_index_content_item(item) for item in items]
609 532 if not normalized_items:
610 533 return []
611   - taxonomy_profile_by_id = {
612   - item["id"]: _resolve_category_taxonomy_profile(
613   - item,
614   - fallback_profile=fallback_taxonomy_profile,
615   - )
616   - for item in normalized_items
617   - }
618 534  
619 535 results_by_id: Dict[str, Dict[str, Any]] = {
620 536 item["id"]: {
... ... @@ -627,7 +543,7 @@ def build_index_content_fields(
627 543 for item in normalized_items
628 544 }
629 545  
630   - for lang in _get_analysis_output_languages("content"):
  546 + for lang in _CORE_INDEX_LANGUAGES:
631 547 if "generic" in requested_enrichment_scopes:
632 548 try:
633 549 rows = analyze_products(
... ... @@ -636,7 +552,7 @@ def build_index_content_fields(
636 552 batch_size=BATCH_SIZE,
637 553 tenant_id=tenant_id,
638 554 analysis_kind="content",
639   - category_taxonomy_profile=fallback_taxonomy_profile,
  555 + category_taxonomy_profile=normalized_taxonomy_profile,
640 556 )
641 557 except Exception as e:
642 558 logger.warning("build_index_content_fields content enrichment failed for lang=%s: %s", lang, e)
... ... @@ -654,48 +570,40 @@ def build_index_content_fields(
654 570 _apply_index_content_row(results_by_id[item_id], row=row, lang=lang)
655 571  
656 572 if "category_taxonomy" in requested_enrichment_scopes:
657   - items_by_profile: Dict[str, List[Dict[str, str]]] = {}
658   - for item in normalized_items:
659   - items_by_profile.setdefault(taxonomy_profile_by_id[item["id"]], []).append(item)
660   -
661   - for taxonomy_profile, profile_items in items_by_profile.items():
662   - for lang in _get_analysis_output_languages(
663   - "taxonomy",
664   - category_taxonomy_profile=taxonomy_profile,
665   - ):
666   - try:
667   - taxonomy_rows = analyze_products(
668   - products=profile_items,
669   - target_lang=lang,
670   - batch_size=BATCH_SIZE,
671   - tenant_id=tenant_id,
672   - analysis_kind="taxonomy",
673   - category_taxonomy_profile=taxonomy_profile,
674   - )
675   - except Exception as e:
676   - logger.warning(
677   - "build_index_content_fields taxonomy enrichment failed for profile=%s lang=%s: %s",
678   - taxonomy_profile,
679   - lang,
680   - e,
681   - )
682   - for item in profile_items:
683   - results_by_id[item["id"]].setdefault("error", str(e))
684   - continue
  573 + for lang in _CORE_INDEX_LANGUAGES:
  574 + try:
  575 + taxonomy_rows = analyze_products(
  576 + products=normalized_items,
  577 + target_lang=lang,
  578 + batch_size=BATCH_SIZE,
  579 + tenant_id=tenant_id,
  580 + analysis_kind="taxonomy",
  581 + category_taxonomy_profile=normalized_taxonomy_profile,
  582 + )
  583 + except Exception as e:
  584 + logger.warning(
  585 + "build_index_content_fields taxonomy enrichment failed for profile=%s lang=%s: %s",
  586 + normalized_taxonomy_profile,
  587 + lang,
  588 + e,
  589 + )
  590 + for item in normalized_items:
  591 + results_by_id[item["id"]].setdefault("error", str(e))
  592 + continue
685 593  
686   - for row in taxonomy_rows or []:
687   - item_id = str(row.get("id") or "").strip()
688   - if not item_id or item_id not in results_by_id:
689   - continue
690   - if row.get("error"):
691   - results_by_id[item_id].setdefault("error", row["error"])
692   - continue
693   - _apply_index_taxonomy_row(
694   - results_by_id[item_id],
695   - row=row,
696   - lang=lang,
697   - category_taxonomy_profile=taxonomy_profile,
698   - )
  594 + for row in taxonomy_rows or []:
  595 + item_id = str(row.get("id") or "").strip()
  596 + if not item_id or item_id not in results_by_id:
  597 + continue
  598 + if row.get("error"):
  599 + results_by_id[item_id].setdefault("error", row["error"])
  600 + continue
  601 + _apply_index_taxonomy_row(
  602 + results_by_id[item_id],
  603 + row=row,
  604 + lang=lang,
  605 + category_taxonomy_profile=normalized_taxonomy_profile,
  606 + )
699 607  
700 608 return [results_by_id[item["id"]] for item in normalized_items]
701 609  
... ... @@ -1173,7 +1081,8 @@ def parse_markdown_table(
1173 1081 if len(parts) >= 2:
1174 1082 row = {"seq_no": parts[0]}
1175 1083 for field_index, field_name in enumerate(schema.result_fields, start=1):
1176   - row[field_name] = parts[field_index] if len(parts) > field_index else ""
  1084 + cell = parts[field_index] if len(parts) > field_index else ""
  1085 + row[field_name] = _normalize_markdown_table_cell(cell)
1177 1086 data.append(row)
1178 1087  
1179 1088 return data
... ...
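The `product_enrich.py` changes above normalize placeholder cells such as `-`, `n/a`, or `无` to an empty string while parsing the markdown table. The sketch below reproduces that normalization in a self-contained form and pairs it with a hypothetical `split_multi_value` helper; how the project actually applies `_MULTI_VALUE_FIELD_SPLIT_RE` downstream is not shown in this diff, so the splitting step is an assumption.

```python
import re
from typing import FrozenSet, List, Optional, Tuple

# Separator pattern and placeholder literals copied from the diff above.
_MULTI_VALUE_FIELD_SPLIT_RE = re.compile(r"[,、,;|/\n\t]+")
_EMPTY_CELL_LITERALS: Tuple[str, ...] = ("-", "–", "—", "none", "null", "n/a", "无")
_EMPTY_CELL_TOKENS_CF: FrozenSet[str] = frozenset(lit.casefold() for lit in _EMPTY_CELL_LITERALS)


def normalize_cell(raw: Optional[str]) -> str:
    """Strip whitespace and map placeholder tokens to an empty string."""
    s = str(raw or "").strip()
    if not s or s.casefold() in _EMPTY_CELL_TOKENS_CF:
        return ""
    return s


def split_multi_value(cell: Optional[str]) -> List[str]:
    """Hypothetical helper: normalize first, then split on the separator pattern."""
    parts = (p.strip() for p in _MULTI_VALUE_FIELD_SPLIT_RE.split(normalize_cell(cell)))
    return [p for p in parts if p]
```

Normalizing before splitting matters for a token like `n/a`: because `/` is also a separator, splitting first would yield the two spurious values `n` and `a`.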
indexer/product_enrich_prompts.py
... ... @@ -60,11 +60,9 @@ def _build_taxonomy_shared_instruction(profile_label: str, fields: Tuple[Dict[st
60 60 "",
61 61 "Rules:",
62 62 "- Keep the same row order and row count as input.",
63   - "- Infer only from the provided product text.",
64   - "- Leave blank if not applicable or not reasonably supported.",
  63 + "- Leave blank if not applicable, unmentioned, or unsupported.",
65 64 "- Use concise, standardized ecommerce wording.",
66   - "- Do not combine different attribute dimensions in one field.",
67   - "- If multiple values are needed, use the delimiter required by the localization setting.",
  65 + "- If multiple values, separate with commas.",
68 66 "",
69 67 "Input product list:",
70 68 ]
... ... @@ -75,19 +73,14 @@ def _build_taxonomy_shared_instruction(profile_label: str, fields: Tuple[Dict[st
75 73 def _make_taxonomy_profile(
76 74 profile_label: str,
77 75 fields: Tuple[Dict[str, str], ...],
78   - *,
79   - aliases: Tuple[str, ...],
80   - output_languages: Tuple[str, ...] = ("en",),
81   - zh_headers: Tuple[str, ...] = (),
82 76 ) -> Dict[str, Any]:
83   - headers = {"en": ["No.", *[field["label"] for field in fields]]}
84   - if zh_headers:
85   - headers["zh"] = ["序号", *zh_headers]
  77 + headers = {
  78 + "en": ["No.", *[field["label"] for field in fields]],
  79 + "zh": ["序号", *[field["zh_label"] for field in fields]],
  80 + }
86 81 return {
87 82 "profile_label": profile_label,
88 83 "fields": fields,
89   - "aliases": aliases,
90   - "output_languages": output_languages,
91 84 "shared_instruction": _build_taxonomy_shared_instruction(profile_label, fields),
92 85 "markdown_table_headers": headers,
93 86 }
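With this hunk, every field carries its own `zh_label`, and the `zh` header row is derived from the fields instead of being passed in separately. The sketch below illustrates that derivation in isolation. The body of `_taxonomy_field` is not shown in this diff, so the stand-in `taxonomy_field` and its dict layout are assumptions inferred from the keys the diff reads (`key`, `label`, `zh_label`).

```python
from typing import Dict, List, Tuple


def taxonomy_field(key: str, label: str, description: str, zh_label: str) -> Dict[str, str]:
    """Stand-in for the project's `_taxonomy_field`; the real implementation may differ."""
    return {"key": key, "label": label, "description": description, "zh_label": zh_label}


def build_headers(fields: Tuple[Dict[str, str], ...]) -> Dict[str, List[str]]:
    """Build per-language markdown table headers the way the new `_make_taxonomy_profile` does."""
    return {
        "en": ["No.", *[f["label"] for f in fields]],
        "zh": ["序号", *[f["zh_label"] for f in fields]],
    }


# Two fields copied from BAGS_TAXONOMY_FIELDS in the diff.
EXAMPLE_FIELDS = (
    taxonomy_field("product_type", "Product Type", "concise bag category", "品类"),
    taxonomy_field("material", "Material", "main bag material", "材质"),
)
```

Keeping the Chinese label next to the English one means the two header rows cannot drift out of step when a field is added or reordered.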
... ... @@ -123,268 +116,250 @@ APPAREL_TAXONOMY_FIELDS = (
123 116 )
124 117  
125 118 THREE_C_TAXONOMY_FIELDS = (
126   - _taxonomy_field("product_type", "Product Type", "concise 3C accessory or peripheral category label"),
127   - _taxonomy_field("compatible_device", "Compatible Device / Model", "supported device family, series, model, or form factor when clearly stated"),
128   - _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, wireless, Bluetooth, Wi-Fi, NFC, or 2.4G"),
129   - _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant connector or port, e.g. USB-C, Lightning, HDMI, AUX, RJ45"),
130   - _taxonomy_field("power_charging", "Power Source / Charging", "charging or power mode, e.g. battery powered, fast charging, rechargeable, plug-in"),
131   - _taxonomy_field("key_features", "Key Features", "primary hardware features such as noise cancelling, foldable, magnetic, backlit, waterproof"),
132   - _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"),
133   - _taxonomy_field("color", "Color", "specific color name when available"),
134   - _taxonomy_field("pack_size", "Pack Size", "unit count or bundle size when stated"),
135   - _taxonomy_field("use_case", "Use Case", "intended usage such as travel, office, gaming, car, charging, streaming"),
  119 + _taxonomy_field("product_type", "Product Type", "concise 3C accessory or peripheral category label", "品类"),
  120 + _taxonomy_field("compatible_device", "Compatible Device / Model", "supported device family, series, model, or form factor when clearly stated", "适配设备 / 型号"),
  121 + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, wireless, Bluetooth, Wi-Fi, NFC, or 2.4G", "连接方式"),
  122 + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant connector or port, e.g. USB-C, Lightning, HDMI, AUX, RJ45", "接口 / 端口类型"),
  123 + _taxonomy_field("power_charging", "Power Source / Charging", "charging or power mode, e.g. battery powered, fast charging, rechargeable, plug-in", "供电 / 充电方式"),
  124 + _taxonomy_field("key_features", "Key Features", "primary hardware features such as noise cancelling, foldable, magnetic, backlit, waterproof", "关键特征"),
  125 + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported", "材质 / 表面处理"),
  126 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  127 + _taxonomy_field("pack_size", "Pack Size", "unit count or bundle size when stated", "包装规格"),
  128 + _taxonomy_field("use_case", "Use Case", "intended usage such as travel, office, gaming, car, charging, streaming", "使用场景"),
136 129 )
137 130  
138 131 BAGS_TAXONOMY_FIELDS = (
139   - _taxonomy_field("product_type", "Product Type", "concise bag category such as backpack, tote bag, crossbody bag, luggage, or wallet"),
140   - _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
141   - _taxonomy_field("carry_style", "Carry Style", "how the bag is worn or carried, e.g. handheld, shoulder, crossbody, backpack"),
142   - _taxonomy_field("size_capacity", "Size / Capacity", "size tier or capacity when supported, e.g. mini, large capacity, 20L"),
143   - _taxonomy_field("material", "Material", "main bag material such as leather, nylon, canvas, PU, straw"),
144   - _taxonomy_field("closure_type", "Closure Type", "bag closure such as zipper, flap, buckle, drawstring, magnetic snap"),
145   - _taxonomy_field("structure_compartments", "Structure / Compartments", "organizational structure such as multi-pocket, laptop sleeve, card slots, expandable"),
146   - _taxonomy_field("strap_handle_type", "Strap / Handle Type", "strap or handle design such as chain strap, top handle, adjustable strap"),
147   - _taxonomy_field("color", "Color", "specific color name when available"),
148   - _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as commute, travel, evening, school, casual"),
  132 + _taxonomy_field("product_type", "Product Type", "concise bag category such as backpack, tote bag, crossbody bag, luggage, or wallet", "品类"),
  133 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"),
  134 + _taxonomy_field("carry_style", "Carry Style", "how the bag is worn or carried, e.g. handheld, shoulder, crossbody, backpack", "携带方式"),
  135 + _taxonomy_field("size_capacity", "Size / Capacity", "size tier or capacity when supported, e.g. mini, large capacity, 20L", "尺寸 / 容量"),
  136 + _taxonomy_field("material", "Material", "main bag material such as leather, nylon, canvas, PU, straw", "材质"),
  137 + _taxonomy_field("closure_type", "Closure Type", "bag closure such as zipper, flap, buckle, drawstring, magnetic snap", "闭合方式"),
  138 + _taxonomy_field("structure_compartments", "Structure / Compartments", "organizational structure such as multi-pocket, laptop sleeve, card slots, expandable", "结构 / 分层"),
  139 + _taxonomy_field("strap_handle_type", "Strap / Handle Type", "strap or handle design such as chain strap, top handle, adjustable strap", "肩带 / 提手类型"),
  140 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  141 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as commute, travel, evening, school, casual", "适用场景"),
149 142 )
150 143  
151 144 PET_SUPPLIES_TAXONOMY_FIELDS = (
152   - _taxonomy_field("product_type", "Product Type", "concise pet supplies category label"),
153   - _taxonomy_field("pet_type", "Pet Type", "target pet such as dog, cat, bird, fish, hamster"),
154   - _taxonomy_field("breed_size", "Breed Size", "pet size or breed size when stated, e.g. small breed, large dogs"),
155   - _taxonomy_field("life_stage", "Life Stage", "pet age stage when supported, e.g. puppy, kitten, adult, senior"),
156   - _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredient composition when supported"),
157   - _taxonomy_field("flavor_scent", "Flavor / Scent", "flavor or scent when applicable"),
158   - _taxonomy_field("key_features", "Key Features", "primary attributes such as interactive, leak-proof, orthopedic, washable, elevated"),
159   - _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as dental care, calming, digestion support, joint support"),
160   - _taxonomy_field("size_capacity", "Size / Capacity", "size, count, or net content when stated"),
161   - _taxonomy_field("use_scenario", "Use Scenario", "usage such as feeding, training, grooming, travel, indoor play"),
  145 + _taxonomy_field("product_type", "Product Type", "concise pet supplies category label", "品类"),
  146 + _taxonomy_field("pet_type", "Pet Type", "target pet such as dog, cat, bird, fish, hamster", "宠物类型"),
  147 + _taxonomy_field("breed_size", "Breed Size", "pet size or breed size when stated, e.g. small breed, large dogs", "体型 / 品种大小"),
  148 + _taxonomy_field("life_stage", "Life Stage", "pet age stage when supported, e.g. puppy, kitten, adult, senior", "成长阶段"),
  149 + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredient composition when supported", "材质 / 成分"),
  150 + _taxonomy_field("flavor_scent", "Flavor / Scent", "flavor or scent when applicable", "口味 / 气味"),
  151 + _taxonomy_field("key_features", "Key Features", "primary attributes such as interactive, leak-proof, orthopedic, washable, elevated", "关键特征"),
  152 + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as dental care, calming, digestion support, joint support", "功能"),
  153 + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, or net content when stated", "尺寸 / 容量"),
  154 + _taxonomy_field("use_scenario", "Use Scenario", "usage such as feeding, training, grooming, travel, indoor play", "使用场景"),
162 155 )
163 156  
164 157 ELECTRONICS_TAXONOMY_FIELDS = (
165   - _taxonomy_field("product_type", "Product Type", "concise electronics device or component category label"),
166   - _taxonomy_field("device_category", "Device Category / Compatibility", "supported platform, component class, or compatible device family when stated"),
167   - _taxonomy_field("power_voltage", "Power / Voltage", "power, voltage, wattage, or battery spec when supported"),
168   - _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, Bluetooth, Wi-Fi, RF, or smart app control"),
169   - _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant port or interface such as USB-C, AC plug type, HDMI, SATA"),
170   - _taxonomy_field("capacity_storage", "Capacity / Storage", "capacity or storage spec such as 256GB, 2TB, 5000mAh"),
171   - _taxonomy_field("key_features", "Key Features", "main product features such as touch control, HD display, noise reduction, smart control"),
172   - _taxonomy_field("material_finish", "Material / Finish", "main housing material or finish when supported"),
173   - _taxonomy_field("color", "Color", "specific color name when available"),
174   - _taxonomy_field("use_case", "Use Case", "intended use such as home entertainment, office, charging, security, repair"),
  158 + _taxonomy_field("product_type", "Product Type", "concise electronics device or component category label", "品类"),
  159 + _taxonomy_field("device_category", "Device Category / Compatibility", "supported platform, component class, or compatible device family when stated", "设备类别 / 兼容性"),
  160 + _taxonomy_field("power_voltage", "Power / Voltage", "power, voltage, wattage, or battery spec when supported", "功率 / 电压"),
  161 + _taxonomy_field("connectivity", "Connectivity", "connection method such as wired, Bluetooth, Wi-Fi, RF, or smart app control", "连接方式"),
  162 + _taxonomy_field("interface_port_type", "Interface / Port Type", "relevant port or interface such as USB-C, AC plug type, HDMI, SATA", "接口 / 端口类型"),
  163 + _taxonomy_field("capacity_storage", "Capacity / Storage", "capacity or storage spec such as 256GB, 2TB, 5000mAh", "容量 / 存储"),
  164 + _taxonomy_field("key_features", "Key Features", "main product features such as touch control, HD display, noise reduction, smart control", "关键特征"),
  165 + _taxonomy_field("material_finish", "Material / Finish", "main housing material or finish when supported", "材质 / 表面处理"),
  166 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  167 + _taxonomy_field("use_case", "Use Case", "intended use such as home entertainment, office, charging, security, repair", "使用场景"),
175 168 )
176 169  
177 170 OUTDOOR_TAXONOMY_FIELDS = (
178   - _taxonomy_field("product_type", "Product Type", "concise outdoor gear category label"),
179   - _taxonomy_field("activity_type", "Activity Type", "primary outdoor activity such as camping, hiking, fishing, climbing, travel"),
180   - _taxonomy_field("season_weather", "Season / Weather", "season or weather suitability when supported"),
181   - _taxonomy_field("material", "Material", "main material such as aluminum, ripstop nylon, stainless steel, EVA"),
182   - _taxonomy_field("capacity_size", "Capacity / Size", "size, length, or capacity when stated"),
183   - _taxonomy_field("protection_resistance", "Protection / Resistance", "resistance or protection such as waterproof, UV resistant, windproof"),
184   - _taxonomy_field("key_features", "Key Features", "primary gear attributes such as foldable, lightweight, insulated, non-slip"),
185   - _taxonomy_field("portability_packability", "Portability / Packability", "carry or storage trait such as collapsible, compact, ultralight, packable"),
186   - _taxonomy_field("color", "Color", "specific color name when available"),
187   - _taxonomy_field("use_scenario", "Use Scenario", "likely use setting such as campsite, trail, survival kit, beach, picnic"),
  171 + _taxonomy_field("product_type", "Product Type", "concise outdoor gear category label", "品类"),
  172 + _taxonomy_field("activity_type", "Activity Type", "primary outdoor activity such as camping, hiking, fishing, climbing, travel", "活动类型"),
  173 + _taxonomy_field("season_weather", "Season / Weather", "season or weather suitability when supported", "适用季节 / 天气"),
  174 + _taxonomy_field("material", "Material", "main material such as aluminum, ripstop nylon, stainless steel, EVA", "材质"),
  175 + _taxonomy_field("capacity_size", "Capacity / Size", "size, length, or capacity when stated", "容量 / 尺寸"),
  176 + _taxonomy_field("protection_resistance", "Protection / Resistance", "resistance or protection such as waterproof, UV resistant, windproof", "防护 / 耐受性"),
  177 + _taxonomy_field("key_features", "Key Features", "primary gear attributes such as foldable, lightweight, insulated, non-slip", "关键特征"),
  178 + _taxonomy_field("portability_packability", "Portability / Packability", "carry or storage trait such as collapsible, compact, ultralight, packable", "便携 / 收纳性"),
  179 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  180 + _taxonomy_field("use_scenario", "Use Scenario", "likely use setting such as campsite, trail, survival kit, beach, picnic", "使用场景"),
188 181 )
189 182  
190 183 HOME_APPLIANCES_TAXONOMY_FIELDS = (
191   - _taxonomy_field("product_type", "Product Type", "concise home appliance category label"),
192   - _taxonomy_field("appliance_category", "Appliance Category", "functional class such as kitchen appliance, cleaning appliance, personal care appliance"),
193   - _taxonomy_field("power_voltage", "Power / Voltage", "wattage, voltage, plug type, or power supply when supported"),
194   - _taxonomy_field("capacity_coverage", "Capacity / Coverage", "capacity or coverage metric such as 1.5L, 20L, 40sqm"),
195   - _taxonomy_field("control_method", "Control Method", "operation method such as touch, knob, remote, app control"),
196   - _taxonomy_field("installation_type", "Installation Type", "setup style such as countertop, handheld, portable, wall-mounted, built-in"),
197   - _taxonomy_field("key_features", "Key Features", "main product features such as timer, steam, HEPA filter, self-cleaning"),
198   - _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported"),
199   - _taxonomy_field("color", "Color", "specific color name when available"),
200   - _taxonomy_field("use_scenario", "Use Scenario", "intended use such as cooking, cleaning, grooming, cooling, air treatment"),
  184 + _taxonomy_field("product_type", "Product Type", "concise home appliance category label", "品类"),
  185 + _taxonomy_field("appliance_category", "Appliance Category", "functional class such as kitchen appliance, cleaning appliance, personal care appliance", "家电类别"),
  186 + _taxonomy_field("power_voltage", "Power / Voltage", "wattage, voltage, plug type, or power supply when supported", "功率 / 电压"),
  187 + _taxonomy_field("capacity_coverage", "Capacity / Coverage", "capacity or coverage metric such as 1.5L, 20L, 40sqm", "容量 / 覆盖范围"),
  188 + _taxonomy_field("control_method", "Control Method", "operation method such as touch, knob, remote, app control", "控制方式"),
  189 + _taxonomy_field("installation_type", "Installation Type", "setup style such as countertop, handheld, portable, wall-mounted, built-in", "安装方式"),
  190 + _taxonomy_field("key_features", "Key Features", "main product features such as timer, steam, HEPA filter, self-cleaning", "关键特征"),
  191 + _taxonomy_field("material_finish", "Material / Finish", "main material or exterior finish when supported", "材质 / 表面处理"),
  192 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  193 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as cooking, cleaning, grooming, cooling, air treatment", "使用场景"),
201 194 )
202 195  
203 196 HOME_LIVING_TAXONOMY_FIELDS = (
204   - _taxonomy_field("product_type", "Product Type", "concise home and living category label"),
205   - _taxonomy_field("room_placement", "Room / Placement", "intended room or placement such as bedroom, kitchen, bathroom, desktop"),
206   - _taxonomy_field("material", "Material", "main material such as wood, ceramic, cotton, glass, metal"),
207   - _taxonomy_field("style", "Style", "home style such as modern, farmhouse, minimalist, boho, Nordic"),
208   - _taxonomy_field("size_dimensions", "Size / Dimensions", "size or dimensions when stated"),
209   - _taxonomy_field("color", "Color", "specific color name when available"),
210   - _taxonomy_field("pattern_finish", "Pattern / Finish", "surface pattern or finish such as solid, marble, matte, ribbed"),
211   - _taxonomy_field("key_features", "Key Features", "main product features such as stackable, washable, blackout, space-saving"),
212   - _taxonomy_field("assembly_installation", "Assembly / Installation", "assembly or installation trait when supported"),
213   - _taxonomy_field("use_scenario", "Use Scenario", "intended use such as storage, dining, decor, sleep, organization"),
  197 + _taxonomy_field("product_type", "Product Type", "concise home and living category label", "品类"),
  198 + _taxonomy_field("room_placement", "Room / Placement", "intended room or placement such as bedroom, kitchen, bathroom, desktop", "适用空间 / 摆放位置"),
  199 + _taxonomy_field("material", "Material", "main material such as wood, ceramic, cotton, glass, metal", "材质"),
  200 + _taxonomy_field("style", "Style", "home style such as modern, farmhouse, minimalist, boho, Nordic", "风格"),
  201 + _taxonomy_field("size_dimensions", "Size / Dimensions", "size or dimensions when stated", "尺寸 / 规格"),
  202 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  203 + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface pattern or finish such as solid, marble, matte, ribbed", "图案 / 表面处理"),
  204 + _taxonomy_field("key_features", "Key Features", "main product features such as stackable, washable, blackout, space-saving", "关键特征"),
  205 + _taxonomy_field("assembly_installation", "Assembly / Installation", "assembly or installation trait when supported", "组装 / 安装"),
  206 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as storage, dining, decor, sleep, organization", "使用场景"),
214 207 )
215 208  
216 209 WIGS_TAXONOMY_FIELDS = (
217   - _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label"),
218   - _taxonomy_field("hair_material", "Hair Material", "hair material such as human hair, synthetic fiber, heat-resistant fiber"),
219   - _taxonomy_field("hair_texture", "Hair Texture", "texture or curl pattern such as straight, body wave, curly, kinky"),
220   - _taxonomy_field("hair_length", "Hair Length", "hair length when stated"),
221   - _taxonomy_field("hair_color", "Hair Color", "specific hair color or blend when available"),
222   - _taxonomy_field("cap_construction", "Cap Construction", "cap type such as full lace, lace front, glueless, U part"),
223   - _taxonomy_field("lace_area_part_type", "Lace Area / Part Type", "lace size or part style such as 13x4 lace, middle part, T part"),
224   - _taxonomy_field("density_volume", "Density / Volume", "hair density or fullness when supported"),
225   - _taxonomy_field("style_bang_type", "Style / Bang Type", "style cue such as bob, pixie, layered, with bangs"),
226   - _taxonomy_field("occasion_end_use", "Occasion / End Use", "intended use such as daily wear, cosplay, protective style, party"),
  210 + _taxonomy_field("product_type", "Product Type", "concise wig or hairpiece category label", "品类"),
  211 + _taxonomy_field("hair_material", "Hair Material", "hair material such as human hair, synthetic fiber, heat-resistant fiber", "发丝材质"),
  212 + _taxonomy_field("hair_texture", "Hair Texture", "texture or curl pattern such as straight, body wave, curly, kinky", "发质纹理"),
  213 + _taxonomy_field("hair_length", "Hair Length", "hair length when stated", "发长"),
  214 + _taxonomy_field("hair_color", "Hair Color", "specific hair color or blend when available", "发色"),
  215 + _taxonomy_field("cap_construction", "Cap Construction", "cap type such as full lace, lace front, glueless, U part", "帽网结构"),
  216 + _taxonomy_field("lace_area_part_type", "Lace Area / Part Type", "lace size or part style such as 13x4 lace, middle part, T part", "蕾丝面积 / 分缝类型"),
  217 + _taxonomy_field("density_volume", "Density / Volume", "hair density or fullness when supported", "密度 / 发量"),
  218 + _taxonomy_field("style_bang_type", "Style / Bang Type", "style cue such as bob, pixie, layered, with bangs", "款式 / 刘海类型"),
  219 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "intended use such as daily wear, cosplay, protective style, party", "适用场景"),
227 220 )
228 221  
229 222 BEAUTY_TAXONOMY_FIELDS = (
230   - _taxonomy_field("product_type", "Product Type", "concise beauty or cosmetics category label"),
231   - _taxonomy_field("target_area", "Target Area", "target area such as face, lips, eyes, nails, hair, body"),
232   - _taxonomy_field("skin_hair_type", "Skin Type / Hair Type", "suitable skin or hair type when supported"),
233   - _taxonomy_field("finish_effect", "Finish / Effect", "cosmetic finish or effect such as matte, dewy, volumizing, brightening"),
234   - _taxonomy_field("key_ingredients", "Key Ingredients", "notable ingredients when stated"),
235   - _taxonomy_field("shade_color", "Shade / Color", "specific shade or color when available"),
236   - _taxonomy_field("scent", "Scent", "fragrance or scent only when supported"),
237   - _taxonomy_field("formulation", "Formulation", "product form such as cream, serum, powder, gel, stick"),
238   - _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as hydration, anti-aging, long-wear, repair, sun protection"),
239   - _taxonomy_field("use_scenario", "Use Scenario", "intended use such as daily routine, salon, travel, evening makeup"),
  223 + _taxonomy_field("product_type", "Product Type", "concise beauty or cosmetics category label", "品类"),
  224 + _taxonomy_field("target_area", "Target Area", "target area such as face, lips, eyes, nails, hair, body", "适用部位"),
  225 + _taxonomy_field("skin_hair_type", "Skin Type / Hair Type", "suitable skin or hair type when supported", "肤质 / 发质"),
  226 + _taxonomy_field("finish_effect", "Finish / Effect", "cosmetic finish or effect such as matte, dewy, volumizing, brightening", "妆效 / 效果"),
  227 + _taxonomy_field("key_ingredients", "Key Ingredients", "notable ingredients when stated", "关键成分"),
  228 + _taxonomy_field("shade_color", "Shade / Color", "specific shade or color when available", "色号 / 颜色"),
  229 + _taxonomy_field("scent", "Scent", "fragrance or scent only when supported", "香味"),
  230 + _taxonomy_field("formulation", "Formulation", "product form such as cream, serum, powder, gel, stick", "剂型 / 形态"),
  231 + _taxonomy_field("functional_benefits", "Functional Benefits", "benefits such as hydration, anti-aging, long-wear, repair, sun protection", "功能"),
  232 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as daily routine, salon, travel, evening makeup", "使用场景"),
240 233 )
241 234  
242 235 ACCESSORIES_TAXONOMY_FIELDS = (
243   - _taxonomy_field("product_type", "Product Type", "concise accessory category label such as necklace, watch, belt, hat, or sunglasses"),
244   - _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
245   - _taxonomy_field("material", "Material", "main material such as alloy, leather, stainless steel, acetate, fabric"),
246   - _taxonomy_field("color", "Color", "specific color name when available"),
247   - _taxonomy_field("pattern_finish", "Pattern / Finish", "surface treatment or style finish such as polished, textured, braided, rhinestone"),
248   - _taxonomy_field("closure_fastening", "Closure / Fastening", "fastening method when applicable"),
249   - _taxonomy_field("size_fit", "Size / Fit", "size or fit information such as adjustable, one size, 42mm"),
250   - _taxonomy_field("style", "Style", "style cue such as minimalist, vintage, statement, sporty"),
251   - _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as daily wear, formal, party, travel, sun protection"),
252   - _taxonomy_field("set_pack_size", "Set / Pack Size", "set count or pack size when stated"),
  236 + _taxonomy_field("product_type", "Product Type", "concise accessory category label such as necklace, watch, belt, hat, or sunglasses", "品类"),
  237 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"),
  238 + _taxonomy_field("material", "Material", "main material such as alloy, leather, stainless steel, acetate, fabric", "材质"),
  239 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  240 + _taxonomy_field("pattern_finish", "Pattern / Finish", "surface treatment or style finish such as polished, textured, braided, rhinestone", "图案 / 表面处理"),
  241 + _taxonomy_field("closure_fastening", "Closure / Fastening", "fastening method when applicable", "闭合 / 固定方式"),
  242 + _taxonomy_field("size_fit", "Size / Fit", "size or fit information such as adjustable, one size, 42mm", "尺寸 / 适配"),
  243 + _taxonomy_field("style", "Style", "style cue such as minimalist, vintage, statement, sporty", "风格"),
  244 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as daily wear, formal, party, travel, sun protection", "适用场景"),
  245 + _taxonomy_field("set_pack_size", "Set / Pack Size", "set count or pack size when stated", "套装 / 规格"),
253 246 )
254 247  
255 248 TOYS_TAXONOMY_FIELDS = (
256   - _taxonomy_field("product_type", "Product Type", "concise toy category label"),
257   - _taxonomy_field("age_group", "Age Group", "intended age group when clearly implied"),
258   - _taxonomy_field("character_theme", "Character / Theme", "licensed character, theme, or play theme when supported"),
259   - _taxonomy_field("material", "Material", "main toy material such as plush, plastic, wood, silicone"),
260   - _taxonomy_field("power_source", "Power Source", "battery, rechargeable, wind-up, or non-powered when supported"),
261   - _taxonomy_field("interactive_features", "Interactive Features", "interactive functions such as sound, lights, remote control, motion"),
262   - _taxonomy_field("educational_play_value", "Educational / Play Value", "play value such as STEM, pretend play, sensory, puzzle solving"),
263   - _taxonomy_field("piece_count_size", "Piece Count / Size", "piece count or size when stated"),
264   - _taxonomy_field("color", "Color", "specific color name when available"),
265   - _taxonomy_field("use_scenario", "Use Scenario", "intended use such as indoor play, bath time, party favor, outdoor play"),
  249 + _taxonomy_field("product_type", "Product Type", "concise toy category label", "品类"),
  250 + _taxonomy_field("age_group", "Age Group", "intended age group when clearly implied", "年龄段"),
  251 + _taxonomy_field("character_theme", "Character / Theme", "licensed character, theme, or play theme when supported", "角色 / 主题"),
  252 + _taxonomy_field("material", "Material", "main toy material such as plush, plastic, wood, silicone", "材质"),
  253 + _taxonomy_field("power_source", "Power Source", "battery, rechargeable, wind-up, or non-powered when supported", "供电方式"),
  254 + _taxonomy_field("interactive_features", "Interactive Features", "interactive functions such as sound, lights, remote control, motion", "互动功能"),
  255 + _taxonomy_field("educational_play_value", "Educational / Play Value", "play value such as STEM, pretend play, sensory, puzzle solving", "教育 / 可玩性"),
  256 + _taxonomy_field("piece_count_size", "Piece Count / Size", "piece count or size when stated", "件数 / 尺寸"),
  257 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  258 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as indoor play, bath time, party favor, outdoor play", "使用场景"),
266 259 )
267 260  
268 261 SHOES_TAXONOMY_FIELDS = (
269   - _taxonomy_field("product_type", "Product Type", "concise footwear category label"),
270   - _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied"),
271   - _taxonomy_field("age_group", "Age Group", "only if clearly implied"),
272   - _taxonomy_field("closure_type", "Closure Type", "fastening method such as lace-up, slip-on, buckle, hook-and-loop"),
273   - _taxonomy_field("toe_shape", "Toe Shape", "toe shape when applicable, e.g. round toe, pointed toe, open toe"),
274   - _taxonomy_field("heel_sole_type", "Heel Height / Sole Type", "heel or sole profile such as flat, block heel, wedge, platform, thick sole"),
275   - _taxonomy_field("upper_material", "Upper Material", "main upper material such as leather, knit, canvas, mesh"),
276   - _taxonomy_field("lining_insole_material", "Lining / Insole Material", "lining or insole material when supported"),
277   - _taxonomy_field("color", "Color", "specific color name when available"),
278   - _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as running, casual, office, hiking, formal"),
  262 + _taxonomy_field("product_type", "Product Type", "concise footwear category label", "品类"),
  263 + _taxonomy_field("target_gender", "Target Gender", "intended gender only if clearly implied", "目标性别"),
  264 + _taxonomy_field("age_group", "Age Group", "only if clearly implied", "年龄段"),
  265 + _taxonomy_field("closure_type", "Closure Type", "fastening method such as lace-up, slip-on, buckle, hook-and-loop", "闭合方式"),
  266 + _taxonomy_field("toe_shape", "Toe Shape", "toe shape when applicable, e.g. round toe, pointed toe, open toe", "鞋头形状"),
  267 + _taxonomy_field("heel_sole_type", "Heel Height / Sole Type", "heel or sole profile such as flat, block heel, wedge, platform, thick sole", "跟高 / 鞋底类型"),
  268 + _taxonomy_field("upper_material", "Upper Material", "main upper material such as leather, knit, canvas, mesh", "鞋面材质"),
  269 + _taxonomy_field("lining_insole_material", "Lining / Insole Material", "lining or insole material when supported", "里料 / 鞋垫材质"),
  270 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  271 + _taxonomy_field("occasion_end_use", "Occasion / End Use", "likely use such as running, casual, office, hiking, formal", "适用场景"),
279 272 )
280 273  
281 274 SPORTS_TAXONOMY_FIELDS = (
282   - _taxonomy_field("product_type", "Product Type", "concise sports product category label"),
283   - _taxonomy_field("sport_activity", "Sport / Activity", "primary sport or activity such as fitness, yoga, basketball, cycling, swimming"),
284   - _taxonomy_field("skill_level", "Skill Level", "target user level when supported, e.g. beginner, training, professional"),
285   - _taxonomy_field("material", "Material", "main material such as EVA, carbon fiber, neoprene, latex"),
286   - _taxonomy_field("size_capacity", "Size / Capacity", "size, weight, resistance level, or capacity when stated"),
287   - _taxonomy_field("protection_support", "Protection / Support", "support or protection function such as ankle support, shock absorption, impact protection"),
288   - _taxonomy_field("key_features", "Key Features", "main features such as anti-slip, adjustable, foldable, quick-dry"),
289   - _taxonomy_field("power_source", "Power Source", "battery, electric, or non-powered when applicable"),
290   - _taxonomy_field("color", "Color", "specific color name when available"),
291   - _taxonomy_field("use_scenario", "Use Scenario", "intended use such as gym, home workout, field training, competition"),
  275 + _taxonomy_field("product_type", "Product Type", "concise sports product category label", "品类"),
  276 + _taxonomy_field("sport_activity", "Sport / Activity", "primary sport or activity such as fitness, yoga, basketball, cycling, swimming", "运动 / 活动"),
  277 + _taxonomy_field("skill_level", "Skill Level", "target user level when supported, e.g. beginner, training, professional", "适用水平"),
  278 + _taxonomy_field("material", "Material", "main material such as EVA, carbon fiber, neoprene, latex", "材质"),
  279 + _taxonomy_field("size_capacity", "Size / Capacity", "size, weight, resistance level, or capacity when stated", "尺寸 / 容量"),
  280 + _taxonomy_field("protection_support", "Protection / Support", "support or protection function such as ankle support, shock absorption, impact protection", "防护 / 支撑"),
  281 + _taxonomy_field("key_features", "Key Features", "main features such as anti-slip, adjustable, foldable, quick-dry", "关键特征"),
  282 + _taxonomy_field("power_source", "Power Source", "battery, electric, or non-powered when applicable", "供电方式"),
  283 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  284 + _taxonomy_field("use_scenario", "Use Scenario", "intended use such as gym, home workout, field training, competition", "使用场景"),
292 285 )
293 286  
294 287 OTHERS_TAXONOMY_FIELDS = (
295   - _taxonomy_field("product_type", "Product Type", "concise product category label, not a full marketing title"),
296   - _taxonomy_field("product_category", "Product Category", "broader retail grouping when the specific product type is narrow"),
297   - _taxonomy_field("target_user", "Target User", "intended user, audience, or recipient when clearly implied"),
298   - _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredients when supported"),
299   - _taxonomy_field("key_features", "Key Features", "primary product attributes or standout features"),
300   - _taxonomy_field("functional_benefits", "Functional Benefits", "practical benefits or performance advantages when supported"),
301   - _taxonomy_field("size_capacity", "Size / Capacity", "size, count, weight, or capacity when stated"),
302   - _taxonomy_field("color", "Color", "specific color name when available"),
303   - _taxonomy_field("style_theme", "Style / Theme", "overall style, design theme, or visual direction when supported"),
304   - _taxonomy_field("use_scenario", "Use Scenario", "likely use occasion or application setting when supported"),
  288 + _taxonomy_field("product_type", "Product Type", "concise product category label, not a full marketing title", "品类"),
  289 + _taxonomy_field("product_category", "Product Category", "broader retail grouping when the specific product type is narrow", "商品类别"),
  290 + _taxonomy_field("target_user", "Target User", "intended user, audience, or recipient when clearly implied", "适用人群"),
  291 + _taxonomy_field("material_ingredients", "Material / Ingredients", "main material or ingredients when supported", "材质 / 成分"),
  292 + _taxonomy_field("key_features", "Key Features", "primary product attributes or standout features", "关键特征"),
  293 + _taxonomy_field("functional_benefits", "Functional Benefits", "practical benefits or performance advantages when supported", "功能"),
  294 + _taxonomy_field("size_capacity", "Size / Capacity", "size, count, weight, or capacity when stated", "尺寸 / 容量"),
  295 + _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
  296 + _taxonomy_field("style_theme", "Style / Theme", "overall style, design theme, or visual direction when supported", "风格 / 主题"),
  297 + _taxonomy_field("use_scenario", "Use Scenario", "likely use occasion or application setting when supported", "使用场景"),
305 298 )
306 299  
307 300 CATEGORY_TAXONOMY_PROFILES: Dict[str, Dict[str, Any]] = {
308 301 "apparel": _make_taxonomy_profile(
309 302 "apparel",
310 303 APPAREL_TAXONOMY_FIELDS,
311   - aliases=("服装", "服饰", "apparel", "clothing", "fashion"),
312   - output_languages=("zh", "en"),
313   - zh_headers=tuple(field["zh_label"] for field in APPAREL_TAXONOMY_FIELDS),
314 304 ),
315 305 "3c": _make_taxonomy_profile(
316 306 "3C",
317 307 THREE_C_TAXONOMY_FIELDS,
318   - aliases=("3c", "数码", "phone accessories", "computer peripherals", "smart wearables", "audio", "gaming gear"),
319 308 ),
320 309 "bags": _make_taxonomy_profile(
321 310 "bags",
322 311 BAGS_TAXONOMY_FIELDS,
323   - aliases=("bags", "bag", "包", "箱包", "handbag", "backpack", "wallet", "luggage"),
324 312 ),
325 313 "pet_supplies": _make_taxonomy_profile(
326 314 "pet supplies",
327 315 PET_SUPPLIES_TAXONOMY_FIELDS,
328   - aliases=("pet", "宠物", "pet supplies", "pet food", "pet toys", "pet care"),
329 316 ),
330 317 "electronics": _make_taxonomy_profile(
331 318 "electronics",
332 319 ELECTRONICS_TAXONOMY_FIELDS,
333   - aliases=("electronics", "电子", "electronic components", "consumer electronics", "digital devices"),
334 320 ),
335 321 "outdoor": _make_taxonomy_profile(
336 322 "outdoor products",
337 323 OUTDOOR_TAXONOMY_FIELDS,
338   - aliases=("outdoor", "户外", "camping", "hiking", "fishing", "travel accessories"),
339 324 ),
340 325 "home_appliances": _make_taxonomy_profile(
341 326 "home appliances",
342 327 HOME_APPLIANCES_TAXONOMY_FIELDS,
343   - aliases=("home appliances", "家电", "电器", "kitchen appliances", "cleaning appliances", "smart home devices"),
344 328 ),
345 329 "home_living": _make_taxonomy_profile(
346 330 "home and living",
347 331 HOME_LIVING_TAXONOMY_FIELDS,
348   - aliases=("home", "living", "家居", "家具", "家纺", "home decor", "kitchenware"),
349 332 ),
350 333 "wigs": _make_taxonomy_profile(
351 334 "wigs",
352 335 WIGS_TAXONOMY_FIELDS,
353   - aliases=("wig", "wigs", "假发", "hairpiece"),
354 336 ),
355 337 "beauty": _make_taxonomy_profile(
356 338 "beauty and cosmetics",
357 339 BEAUTY_TAXONOMY_FIELDS,
358   - aliases=("beauty", "cosmetics", "美容", "美妆", "makeup", "skincare", "nail care"),
359 340 ),
360 341 "accessories": _make_taxonomy_profile(
361 342 "accessories",
362 343 ACCESSORIES_TAXONOMY_FIELDS,
363   - aliases=("accessories", "配饰", "jewelry", "watches", "belts", "scarves", "hats", "sunglasses"),
364 344 ),
365 345 "toys": _make_taxonomy_profile(
366 346 "toys",
367 347 TOYS_TAXONOMY_FIELDS,
368   - aliases=("toys", "toy", "玩具", "plush", "action figures", "puzzles", "educational toys"),
369 348 ),
370 349 "shoes": _make_taxonomy_profile(
371 350 "shoes",
372 351 SHOES_TAXONOMY_FIELDS,
373   - aliases=("shoes", "shoe", "鞋", "sneakers", "boots", "sandals", "heels"),
374 352 ),
375 353 "sports": _make_taxonomy_profile(
376 354 "sports products",
377 355 SPORTS_TAXONOMY_FIELDS,
378   - aliases=("sports", "sport", "运动", "fitness", "cycling", "team sports", "water sports"),
379 356 ),
380 357 "others": _make_taxonomy_profile(
381 358 "general merchandise",
382 359 OTHERS_TAXONOMY_FIELDS,
383   - aliases=("others", "other", "其他", "general merchandise"),
384 360 ),
385 361 }
386 362  
387   -CATEGORY_TAXONOMY_PROFILE_NAMES = tuple(CATEGORY_TAXONOMY_PROFILES.keys())
388 363 TAXONOMY_SHARED_ANALYSIS_INSTRUCTION = CATEGORY_TAXONOMY_PROFILES["apparel"]["shared_instruction"]
389 364 TAXONOMY_MARKDOWN_TABLE_HEADERS_EN = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]["en"]
390 365 TAXONOMY_LANGUAGE_MARKDOWN_TABLE_HEADERS: Dict[str, Dict[str, Any]] = CATEGORY_TAXONOMY_PROFILES["apparel"]["markdown_table_headers"]
... ...
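Reviewer sketch (not part of the commit): the calls above pass a fourth positional argument, the Chinese label, to `_taxonomy_field`, and `_make_taxonomy_profile` no longer takes `aliases`, `output_languages`, or `zh_headers`. The helper bodies are not shown in this diff, so the version below is a hypothetical reconstruction of how such helpers could derive both header sets from the field tuples. The dict keys `key`, `label`, `description`, `profile_label`, and `fields` are assumptions; only `zh_label` and `markdown_table_headers["en"]` appear in the diff itself.

```python
from typing import Any, Dict, Tuple


# Hypothetical reconstruction: the real helper bodies are not shown in this diff.
def _taxonomy_field(key: str, label: str, description: str, zh_label: str) -> Dict[str, str]:
    """Bundle one taxonomy column definition with its English and Chinese labels."""
    return {"key": key, "label": label, "description": description, "zh_label": zh_label}


def _make_taxonomy_profile(profile_label: str, fields: Tuple[Dict[str, str], ...]) -> Dict[str, Any]:
    """Build a profile whose table headers always cover both zh and en."""
    return {
        "profile_label": profile_label,  # assumed key name
        "fields": fields,                # assumed key name
        "markdown_table_headers": {
            "en": [field["label"] for field in fields],
            "zh": [field["zh_label"] for field in fields],
        },
    }


# Two fields copied from the toys profile above, used only as demo input.
DEMO_FIELDS = (
    _taxonomy_field("product_type", "Product Type", "concise toy category label", "品类"),
    _taxonomy_field("color", "Color", "specific color name when available", "主颜色"),
)
demo_profile = _make_taxonomy_profile("toys", DEMO_FIELDS)
```

Deriving the `zh` headers from the field tuples, rather than passing them per profile, would be one way to keep every profile bilingual without per-profile language switches.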
indexer/product_enrich模块说明.md 0 → 100644
... ... @@ -0,0 +1,173 @@
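  1 +# Content Enrichment Module Overview
  2 +
  3 +This document describes the responsibilities, entry points, and output structure of the product content enrichment module, as well as the design constraints of the current taxonomy profiles.
  4 +
  5 +## 1. Module goals
  6 +
  7 +The content enrichment module calls an LLM on product text to generate the following index fields: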
  8 +
  9 +- `qanchors`
  10 +- `enriched_tags`
  11 +- `enriched_attributes`
  12 +- `enriched_taxonomy_attributes`
  13 +
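  14 +Design principles the module aims for:
  15 +
  16 +- Single responsibility: handles only content understanding and structured output, not CSV reading or writing
  17 +- Output aligned with the ES mapping: the returned structure can be written directly to `search_products`
  18 +- Configuration-driven extension: taxonomy profiles are extended through data configuration rather than scattered conditional branches
  19 +- Lean code: targets normal usage only and avoids piling up patch logic for unreasonable calls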
  20 +
  21 +## 2. Main files
  22 +
  23 +- [product_enrich.py](/data/saas-search/indexer/product_enrich.py)
  24 + Runtime core logic: batching, caching, prompt assembly, LLM calls, markdown parsing, and output shaping
  25 +- [product_enrich_prompts.py](/data/saas-search/indexer/product_enrich_prompts.py)
  26 + Prompt templates and taxonomy profile configuration
  27 +- [document_transformer.py](/data/saas-search/indexer/document_transformer.py)
  28 + Calls the content enrichment module in the internal index build pipeline and backfills the results into the ES doc
  29 +- [taxonomy.md](/data/saas-search/indexer/taxonomy.md)
  30 + Taxonomy design notes and field list
  31 +
  32 +## 3. External entry points
  33 +
  34 +### 3.1 Python entry point
  35 +
  36 +Core entry point:
  38 +```python
  39 +build_index_content_fields(
  40 + items,
  41 + tenant_id=None,
  42 + enrichment_scopes=None,
  43 + category_taxonomy_profile=None,
  44 +)
  45 +```
  46 +
  47 +Minimum required input:
  48 +
  49 +- `id` 或 `spu_id`
  50 +- `title`
  51 +
  52 +Optional input:
  53 +
  54 +- `brief`
  55 +- `description`
  56 +- `image_url`
  57 +
  58 +Key parameters:
  59 +
  60 +- `enrichment_scopes`
  61 + Accepts `generic` and `category_taxonomy`
  62 +- `category_taxonomy_profile`
  63 + The taxonomy profile; defaults to `apparel`
  64 +
  65 +### 3.2 HTTP entry point
  66 +
  67 +API route:
  68 +
  69 +- `POST /indexer/enrich-content`
  70 +
  71 +Related documentation:
  72 +
  73 +- [搜索API对接指南-05-索引接口(Indexer)](/data/saas-search/docs/搜索API对接指南-05-索引接口(Indexer).md)
  74 +- [搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation)](/data/saas-search/docs/搜索API对接指南-07-微服务接口(Embedding-Reranker-Translation).md)
  75 +
  76 +## 4. Output structure
  77 +
  78 +The returned result is aligned with the ES mapping:
  79 +
  80 +```json
  81 +{
  82 + "id": "223167",
  83 + "qanchors": {
  84 + "zh": ["短袖T恤", "纯棉"],
  85 + "en": ["t-shirt", "cotton"]
  86 + },
  87 + "enriched_tags": {
  88 + "zh": ["短袖", "纯棉"],
  89 + "en": ["short sleeve", "cotton"]
  90 + },
  91 + "enriched_attributes": [
  92 + {
  93 + "name": "enriched_tags",
  94 + "value": {
  95 + "zh": ["短袖", "纯棉"],
  96 + "en": ["short sleeve", "cotton"]
  97 + }
  98 + }
  99 + ],
  100 + "enriched_taxonomy_attributes": [
  101 + {
  102 + "name": "Product Type",
  103 + "value": {
  104 + "zh": ["T恤"],
  105 + "en": ["t-shirt"]
  106 + }
  107 + }
  108 + ]
  109 +}
  110 +```
  111 +
  112 +Notes:
  113 +
  114 +- The `generic` part always outputs the core index languages `zh` and `en`
  115 +- The `taxonomy` part likewise always outputs `zh` and `en`
  116 +
  117 +## 5. Taxonomy profile
  118 +
  119 +Currently supported:
  120 +
  121 +- `apparel`
  122 +- `3c`
  123 +- `bags`
  124 +- `pet_supplies`
  125 +- `electronics`
  126 +- `outdoor`
  127 +- `home_appliances`
  128 +- `home_living`
  129 +- `wigs`
  130 +- `beauty`
  131 +- `accessories`
  132 +- `toys`
  133 +- `shoes`
  134 +- `sports`
  135 +- `others`
  136 +
  137 +Common constraints:
  138 +
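  139 +- Every profile returns `zh` + `en`
  140 +- The profile only determines the set of taxonomy fields; it no longer determines the output languages
  141 +- Every profile configures both Chinese and English field names, and the prompt/header structure stays consistent across profiles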
  142 +
  143 +## 6. Current constraints of the internal indexing pipeline
  144 +
  145 +In the internal ES document-building pipeline, `document_transformer` currently passes a fixed taxonomy profile when it calls content enrichment:
  146 +
  147 +```python
  148 +category_taxonomy_profile="apparel"
  149 +```
  150 +
  151 +This is a temporary strategy that is explicit, controllable, and keeps the code cleaner.
  152 +
  153 +The code currently keeps a TODO:
  154 +
  155 +- Later, read the tenant's actual industry from the database
  156 +- Then replace the fixed `apparel` with that industry
  157 +
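  158 +There is currently no implicit logic that guesses the profile from the product's category text; this avoids redundant code and unnecessary uncertainty.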
  159 +
  160 +## 7. Caching and batching
  161 +
  162 +The cache key is determined jointly by:
  163 +
  164 +- `analysis_kind`
  165 +- `target_lang`
  166 +- the prompt/schema version fingerprint
  167 +- the actual prompt input text
  168 +
  169 +Batching rules:
  170 +
  171 +- A single LLM call handles at most 20 items
  172 +- Callers may pass larger batches; the module splits them internally
  173 +- Uncached batches can run concurrently
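
The batching and cache-key rules listed in section 7 can be illustrated with a minimal sketch. It is not code from this commit: the helper names `split_into_batches` and `make_cache_key`, the SHA-256 hashing, and the separator are assumptions made for illustration; only the 20-item limit and the four key components come from the document.

```python
import hashlib
from typing import Any, Dict, Iterator, List

# Limit stated in the module notes: at most 20 items per LLM call.
MAX_ITEMS_PER_LLM_CALL = 20


def split_into_batches(
    items: List[Dict[str, Any]],
    size: int = MAX_ITEMS_PER_LLM_CALL,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_cache_key(
    analysis_kind: str,
    target_lang: str,
    prompt_fingerprint: str,
    prompt_input: str,
) -> str:
    """Hypothetical key: hash the four components the document lists.

    The unit-separator character keeps adjacent fields from running together.
    """
    raw = "\x1f".join([analysis_kind, target_lang, prompt_fingerprint, prompt_input])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    products = [{"id": str(i), "title": f"item {i}"} for i in range(45)]
    print([len(batch) for batch in split_into_batches(products)])
    print(make_cache_key("taxonomy", "en", "v1", "item 0"))
```

Under these assumptions, a 45-item input such as the one used in `tests/test_llm_enrichment_batch_fill.py` would be split into three LLM calls, and changing any one of the four components produces a different cache key.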
... ...
indexer/taxonomy.md
... ... @@ -175,8 +175,7 @@ Input product list:
175 175 ## 2. Other taxonomy profiles
176 176  
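177 177 Notes:
178   -- `apparel` continues to return `zh` + `en`.
179   -- Other profiles return only `en` and define only English column names.
  178 +- All profiles uniformly return `zh` + `en`.
180 179 - The profile slugs in the code match the ones listed below.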
181 180  
182 181 | Profile | Core columns (`en`) |
... ...
tests/test_llm_enrichment_batch_fill.py
... ... @@ -10,8 +10,14 @@ from indexer.document_transformer import SPUDocumentTransformer
10 10 def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch):
11 11 seen_calls: List[Dict[str, Any]] = []
12 12  
13   - def _fake_build_index_content_fields(items, tenant_id=None):
14   - seen_calls.append({"n": len(items), "tenant_id": tenant_id})
  13 + def _fake_build_index_content_fields(items, tenant_id=None, category_taxonomy_profile=None):
  14 + seen_calls.append(
  15 + {
  16 + "n": len(items),
  17 + "tenant_id": tenant_id,
  18 + "category_taxonomy_profile": category_taxonomy_profile,
  19 + }
  20 + )
15 21 return [
16 22 {
17 23 "id": item["id"],
... ... @@ -53,7 +59,7 @@ def test_fill_llm_attributes_batch_uses_product_enrich_helper(monkeypatch):
53 59  
54 60 transformer.fill_llm_attributes_batch(docs, rows)
55 61  
56   - assert seen_calls == [{"n": 45, "tenant_id": "162"}]
  62 + assert seen_calls == [{"n": 45, "tenant_id": "162", "category_taxonomy_profile": "apparel"}]
57 63  
58 64 assert docs[0]["qanchors"]["zh"] == ["zh-anchor-0"]
59 65 assert docs[0]["qanchors"]["en"] == ["en-anchor-0"]
... ...
tests/test_product_enrich_partial_mode.py
... ... @@ -559,15 +559,7 @@ def test_build_index_content_fields_maps_internal_tags_to_enriched_tags_output()
559 559 ],
560 560 }
561 561 ]
562   -
563   -
564   -def test_detect_category_taxonomy_profile_matches_category_hints():
565   - assert product_enrich.detect_category_taxonomy_profile({"category1_name": "玩具"}) == "toys"
566   - assert product_enrich.detect_category_taxonomy_profile({"category": "Beauty & Cosmetics"}) == "beauty"
567   - assert product_enrich.detect_category_taxonomy_profile({"category_path": "Home Appliances / Kitchen"}) == "home_appliances"
568   -
569   -
570   -def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_apparel_returns_en_only():
  562 +def test_build_index_content_fields_non_apparel_taxonomy_returns_en_only():
571 563 seen_calls = []
572 564  
573 565 def fake_analyze_products(
... ... @@ -580,40 +572,6 @@ def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_appa
580 572 ):
581 573 seen_calls.append((analysis_kind, target_lang, category_taxonomy_profile, tuple(p["id"] for p in products)))
582 574 if analysis_kind == "taxonomy":
583   - if category_taxonomy_profile == "apparel":
584   - return [
585   - {
586   - "id": products[0]["id"],
587   - "lang": target_lang,
588   - "title_input": products[0]["title"],
589   - "product_type": f"{target_lang}-dress",
590   - "target_gender": f"{target_lang}-women",
591   - "age_group": "",
592   - "season": "",
593   - "fit": "",
594   - "silhouette": "",
595   - "neckline": "",
596   - "sleeve_length_type": "",
597   - "sleeve_style": "",
598   - "strap_type": "",
599   - "rise_waistline": "",
600   - "leg_shape": "",
601   - "skirt_shape": "",
602   - "length_type": "",
603   - "closure_type": "",
604   - "design_details": "",
605   - "fabric": "",
606   - "material_composition": "",
607   - "fabric_properties": "",
608   - "clothing_features": "",
609   - "functional_benefits": "",
610   - "color": "",
611   - "color_family": "",
612   - "print_pattern": "",
613   - "occasion_end_use": "",
614   - "style_aesthetic": "",
615   - }
616   - ]
617 575 assert category_taxonomy_profile == "toys"
618 576 assert target_lang == "en"
619 577 return [
... ... @@ -655,21 +613,30 @@ def test_build_index_content_fields_routes_taxonomy_by_item_profile_and_non_appa
655 613  
656 614 with mock.patch.object(product_enrich, "analyze_products", side_effect=fake_analyze_products):
657 615 result = product_enrich.build_index_content_fields(
658   - items=[
659   - {"spu_id": "1", "title": "dress", "category_taxonomy_profile": "apparel"},
660   - {"spu_id": "2", "title": "toy", "category_taxonomy_profile": "toys"},
661   - ],
  616 + items=[{"spu_id": "2", "title": "toy"}],
662 617 tenant_id="170",
663   - category_taxonomy_profile="apparel",
  618 + category_taxonomy_profile="toys",
664 619 )
665 620  
666   - assert result[0]["enriched_taxonomy_attributes"] == [
667   - {"name": "Product Type", "value": {"zh": ["zh-dress"], "en": ["en-dress"]}},
668   - {"name": "Target Gender", "value": {"zh": ["zh-women"], "en": ["en-women"]}},
669   - ]
670   - assert result[1]["enriched_taxonomy_attributes"] == [
671   - {"name": "Product Type", "value": {"en": ["doll set"]}},
672   - {"name": "Age Group", "value": {"en": ["kids"]}},
  621 + assert result == [
  622 + {
  623 + "id": "2",
  624 + "qanchors": {"zh": ["zh-anchor"], "en": ["en-anchor"]},
  625 + "enriched_tags": {"zh": ["zh-tag"], "en": ["en-tag"]},
  626 + "enriched_attributes": [
  627 + {
  628 + "name": "enriched_tags",
  629 + "value": {
  630 + "zh": ["zh-tag"],
  631 + "en": ["en-tag"],
  632 + },
  633 + }
  634 + ],
  635 + "enriched_taxonomy_attributes": [
  636 + {"name": "Product Type", "value": {"en": ["doll set"]}},
  637 + {"name": "Age Group", "value": {"en": ["kids"]}},
  638 + ],
  639 + }
673 640 ]
674 641 assert ("taxonomy", "zh", "toys", ("2",)) not in seen_calls
675 642 assert ("taxonomy", "en", "toys", ("2",)) in seen_calls
... ...