Commit 78cdef1c21718b2c9333855f4b44f5bf8a3ca49f

Authored by tangwang
1 parent f27a8d90

Add the enriched_taxonomy_attributes field

indexer/Untitled 0 → 100644
... ... @@ -0,0 +1 @@
  1 +taxonomy
0 2 \ No newline at end of file
... ...
indexer/taxonomy.md 0 → 100644
... ... @@ -0,0 +1,127 @@
  1 +
  2 +Taxonomy for the apparel category
  3 +
  4 +### A. Product Classification
  5 +
  6 +| Top-Level Group | Chinese Column Name | English Column Name |
  7 +| ------------------------- | ---- | ------------------- |
  8 +| A. Product Classification | 品类 | Product Type |
  9 +| A. Product Classification | 目标性别 | Target Gender |
  10 +| A. Product Classification | 年龄段 | Age Group |
  11 +| A. Product Classification | 适用季节 | Season |
  12 +
  13 +### B. Garment Design
  14 +
  15 +| Top-Level Group | Chinese Column Name | English Column Name |
  16 +| ----------------- | ---- | ------------------- |
  17 +| B. Garment Design | 版型 | Fit |
  18 +| B. Garment Design | 廓形 | Silhouette |
  19 +| B. Garment Design | 领型 | Neckline |
  20 +| B. Garment Design | 袖型 | Sleeve Style |
  21 +| B. Garment Design | 肩带设计 | Strap Type |
  22 +| B. Garment Design | 腰型 | Rise / Waistline |
  23 +| B. Garment Design | 裤型 | Leg Shape |
  24 +| B. Garment Design | 裙型 | Skirt Shape |
  25 +| B. Garment Design | 长度 | Length Type |
  26 +| B. Garment Design | 闭合方式 | Closure Type |
  27 +| B. Garment Design | 设计细节 | Design Details |
  28 +
  29 +### C. Material & Performance
  30 +
  31 +| Top-Level Group | Chinese Column Name | English Column Name |
  32 +| ------------------------- | ----------- | -------------------- |
  33 +| C. Material & Performance | 面料 | Fabric |
  34 +| C. Material & Performance | 成分 | Material Composition |
  35 +| C. Material & Performance | 面料特性 | Fabric Properties |
  36 +| C. Material & Performance | 服装特征 / 功能细节 | Clothing Features |
  37 +| C. Material & Performance | 功能 | Functional Benefits |
  38 +
  39 +### D. Merchandising Attributes
  40 +
  41 +| Top-Level Group | Chinese Column Name | English Column Name |
  42 +| --------------------------- | ------- | ------------------- |
  43 +| D. Merchandising Attributes | 主颜色 | Color |
  44 +| D. Merchandising Attributes | 色系 | Color Family |
  45 +| D. Merchandising Attributes | 印花 / 图案 | Print / Pattern |
  46 +| D. Merchandising Attributes | 适用场景 | Occasion / End Use |
  47 +| D. Merchandising Attributes | 风格 | Style Aesthetic |
  48 +
  49 +
  50 +
  51 +Based on the taxonomy above, generate
  52 +enriched_taxonomy_attributes
  53 +
  54 +```text
  55 +Product Type
  56 +Target Gender
  57 +Age Group
  58 +Season
  59 +Fit
  60 +Silhouette
  61 +Neckline
  62 +Sleeve Length Type
  63 +Sleeve Style
  64 +Strap Type
  65 +Rise / Waistline
  66 +Leg Shape
  67 +Skirt Shape
  68 +Length Type
  69 +Closure Type
  70 +Design Details
  71 +Fabric
  72 +Material Composition
  73 +Fabric Properties
  74 +Clothing Features
  75 +Functional Benefits
  76 +Color
  77 +Color Family
  78 +Print / Pattern
  79 +Occasion / End Use
  80 +Style Aesthetic
  81 +```
  82 +
  83 +Prompt:
  84 +
  85 +```python
  86 +SHARED_ANALYSIS_INSTRUCTION = """
  87 +Analyze each input product text and fill the columns below using an apparel attribute taxonomy.
  88 +
  89 +Output columns:
  90 +1. Product Type: concise ecommerce apparel category label, not a full marketing title
  91 +2. Target Gender: intended gender only if clearly implied
  92 +3. Age Group: only if clearly implied, e.g. adults, kids, teens, toddlers, babies
  93 +4. Season: season(s) or all-season suitability only if supported
  94 +5. Fit: body closeness, e.g. slim, regular, relaxed, oversized, fitted
  95 +6. Silhouette: overall garment shape, e.g. straight, A-line, boxy, tapered, bodycon, wide-leg
  96 +7. Neckline: neckline type when applicable, e.g. crew neck, V-neck, hooded, collared, square neck
  97 +8. Sleeve Length Type: sleeve length only, e.g. sleeveless, short sleeve, long sleeve, three-quarter sleeve
  98 +9. Sleeve Style: sleeve design only, e.g. puff sleeve, raglan sleeve, batwing sleeve, bell sleeve
  99 +10. Strap Type: strap design when applicable, e.g. spaghetti strap, wide strap, halter strap, adjustable strap
  100 +11. Rise / Waistline: waist placement when applicable, e.g. high rise, mid rise, low rise, empire waist
  101 +12. Leg Shape: for bottoms only, e.g. straight leg, wide leg, flare leg, tapered leg, skinny leg
  102 +13. Skirt Shape: for skirts only, e.g. A-line, pleated, pencil, mermaid
  103 +14. Length Type: design length only, not size, e.g. cropped, regular, longline, mini, midi, maxi, ankle length, full length
  104 +15. Closure Type: fastening method when applicable, e.g. zipper, button, drawstring, elastic waist, hook-and-loop
  105 +16. Design Details: construction or visual details, e.g. ruched, ruffled, pleated, cut-out, layered, distressed, split hem
  106 +17. Fabric: fabric type only, e.g. denim, knit, chiffon, jersey, fleece, cotton twill
  107 +18. Material Composition: fiber content or blend only if stated, e.g. cotton, polyester, spandex, linen blend, 95% cotton 5% elastane
  108 +19. Fabric Properties: inherent fabric traits, e.g. stretch, breathable, lightweight, soft-touch, water-resistant
  109 +20. Clothing Features: product features, e.g. lined, reversible, hooded, packable, padded, pocketed
  110 +21. Functional Benefits: wearer benefits, e.g. moisture-wicking, thermal insulation, UV protection, easy care, supportive compression
  111 +22. Color: specific color name when available
  112 +23. Color Family: normalized broad retail color group, e.g. black, white, blue, green, red, pink, beige, brown, gray
  113 +24. Print / Pattern: surface pattern when applicable, e.g. solid, striped, plaid, floral, graphic, animal print
  114 +25. Occasion / End Use: likely use occasion only if supported, e.g. office, casual wear, streetwear, lounge, workout, outdoor
  115 +26. Style Aesthetic: overall style only if supported, e.g. minimalist, streetwear, athleisure, smart casual, romantic, playful
  116 +
  117 +Rules:
  118 +- Keep the same row order and row count as input.
  119 +- Infer only from the provided product text.
  120 +- Leave blank if not applicable or not reasonably supported.
  121 +- Use concise, standardized English ecommerce wording.
  122 +- Do not combine different attribute dimensions in one field.
  123 +- If multiple values are needed, use the delimiter required by the localization setting.
  124 +
  125 +Input product list:
  126 +"""
  127 +```
... ...
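The prompt above asks for one row of 26 columns per product, while the mapping added in this commit stores `enriched_taxonomy_attributes` as nested `name` / `value` pairs. A minimal sketch of the conversion step follows. It assumes each model output row has already been parsed into a dict keyed by column name. The function name `to_nested_attributes`, the `TAXONOMY_COLUMNS` constant, and the `lang` parameter are hypothetical and not part of this commit.

```python
from typing import Dict, List

# Column names copied from the attribute list in indexer/taxonomy.md.
TAXONOMY_COLUMNS = [
    "Product Type", "Target Gender", "Age Group", "Season",
    "Fit", "Silhouette", "Neckline", "Sleeve Length Type", "Sleeve Style",
    "Strap Type", "Rise / Waistline", "Leg Shape", "Skirt Shape",
    "Length Type", "Closure Type", "Design Details",
    "Fabric", "Material Composition", "Fabric Properties",
    "Clothing Features", "Functional Benefits",
    "Color", "Color Family", "Print / Pattern",
    "Occasion / End Use", "Style Aesthetic",
]


def to_nested_attributes(row: Dict[str, str], lang: str = "en") -> List[dict]:
    """Convert one parsed output row into nested {name, value} entries.

    Hypothetical helper: blank cells are skipped, matching the prompt rule
    "Leave blank if not applicable or not reasonably supported".
    """
    entries = []
    for column in TAXONOMY_COLUMNS:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # blank cell: attribute not applicable or not supported
        # The value is stored under a language key, as in the mapping (zh / en).
        entries.append({"name": column, "value": {lang: value}})
    return entries


example_row = {"Product Type": "T-shirt", "Fit": " relaxed ", "Season": ""}
example_entries = to_nested_attributes(example_row)
```

Columns that are not in the taxonomy are ignored, and entries keep the taxonomy order, which keeps indexed documents stable across runs.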
mappings/README.md
... ... @@ -68,6 +68,7 @@
68 68 - `option2_values`
69 69 - `option3_values`
70 70 - `enriched_attributes.value`
  71 +- `enriched_taxonomy_attributes.value`
71 72 - `specifications.value_text`
72 73  
73 74 Taking `category_path` and `option*_values` as examples, the indexed core-language result should contain at least:
... ...
mappings/generate_search_products_mapping.py
... ... @@ -214,6 +214,11 @@ FIELD_SPECS = [
214 214 scalar_field("name", "keyword"),
215 215 text_field("value", "core_language_text_with_keyword"),
216 216 ),
  217 + nested_field(
  218 + "enriched_taxonomy_attributes",
  219 + scalar_field("name", "keyword"),
  220 + text_field("value", "core_language_text_with_keyword"),
  221 + ),
217 222 scalar_field("option1_name", "keyword"),
218 223 scalar_field("option2_name", "keyword"),
219 224 scalar_field("option3_name", "keyword"),
... ...
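The bodies of `nested_field`, `scalar_field`, and `text_field` are not shown in this diff. The sketch below is a hypothetical reimplementation, written only to illustrate how the new `FIELD_SPECS` entry could expand into the nested mapping that appears in `mappings/search_products.json`. The return shape (a `(name, mapping)` tuple) and the restriction to the `zh` and `en` sub-fields are assumptions taken from the generated JSON, not from the generator source.

```python
import json


def _keyword_subfield() -> dict:
    # Lowercase-normalized keyword sub-field, as seen in the generated JSON.
    return {"keyword": {"type": "keyword", "normalizer": "lowercase"}}


def scalar_field(name: str, es_type: str):
    """Hypothetical: a plain scalar field such as a keyword."""
    return name, {"type": es_type}


def text_field(name: str, kind: str):
    """Hypothetical: expand a named preset into per-language text fields."""
    if kind != "core_language_text_with_keyword":
        raise ValueError(f"unsupported preset in this sketch: {kind}")
    return name, {
        "type": "object",
        "properties": {
            "zh": {
                "type": "text",
                "analyzer": "index_ik",
                "search_analyzer": "query_ik",
                "fields": _keyword_subfield(),
            },
            "en": {
                "type": "text",
                "analyzer": "english",
                "fields": _keyword_subfield(),
            },
        },
    }


def nested_field(name: str, *children):
    """Hypothetical: wrap child fields in a nested mapping."""
    return name, {"type": "nested", "properties": dict(children)}


field_name, field_mapping = nested_field(
    "enriched_taxonomy_attributes",
    scalar_field("name", "keyword"),
    text_field("value", "core_language_text_with_keyword"),
)
print(json.dumps({field_name: field_mapping}, indent=2))
```

Under these assumptions the printed structure has the same keys and values as the block added to `search_products.json`.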
mappings/search_products.json
... ... @@ -34,8 +34,8 @@
34 34 "similarity": {
35 35 "default": {
36 36 "type": "BM25",
37   - "b": 0.1,
38   - "k1": 0.3
  37 + "b": 0.0,
  38 + "k1": 0.0
39 39 }
40 40 }
41 41 },
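Setting both `k1` and `b` to `0.0` changes ranking behavior, not just its tuning. The sketch below uses the textbook BM25 term-frequency component, `tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))`, to compare the old and new parameters. Lucene's implementation may differ in constant factors, so treat the numbers as illustrative. The function name `bm25_tf_component` is hypothetical.

```python
def bm25_tf_component(tf: float, doc_len: float, avg_doc_len: float,
                      k1: float, b: float) -> float:
    """Textbook BM25 term-frequency factor (the part that multiplies IDF)."""
    if tf <= 0:
        return 0.0
    # Length normalization: b controls how much document length matters.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


# Old parameters (k1=0.3, b=0.1): repeated terms still raise the score a little.
old_single = bm25_tf_component(1, 10, 10, k1=0.3, b=0.1)
old_repeated = bm25_tf_component(5, 10, 10, k1=0.3, b=0.1)

# New parameters (k1=0.0, b=0.0): any matching term contributes the same factor,
# regardless of how often it occurs or how long the field is.
new_single = bm25_tf_component(1, 10, 10, k1=0.0, b=0.0)
new_repeated_long = bm25_tf_component(5, 100, 10, k1=0.0, b=0.0)
```

With `k1 = 0` the factor is `1.0` for every matching term, so under this formula scores depend only on IDF and any query-time boosts. That fits short attribute values, where repetition and field length carry little signal.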
... ... @@ -2116,6 +2116,40 @@
2116 2116 }
2117 2117 }
2118 2118 },
  2119 + "enriched_taxonomy_attributes": {
  2120 + "type": "nested",
  2121 + "properties": {
  2122 + "name": {
  2123 + "type": "keyword"
  2124 + },
  2125 + "value": {
  2126 + "type": "object",
  2127 + "properties": {
  2128 + "zh": {
  2129 + "type": "text",
  2130 + "analyzer": "index_ik",
  2131 + "search_analyzer": "query_ik",
  2132 + "fields": {
  2133 + "keyword": {
  2134 + "type": "keyword",
  2135 + "normalizer": "lowercase"
  2136 + }
  2137 + }
  2138 + },
  2139 + "en": {
  2140 + "type": "text",
  2141 + "analyzer": "english",
  2142 + "fields": {
  2143 + "keyword": {
  2144 + "type": "keyword",
  2145 + "normalizer": "lowercase"
  2146 + }
  2147 + }
  2148 + }
  2149 + }
  2150 + }
  2151 + }
  2152 + },
2119 2153 "option1_name": {
2120 2154 "type": "keyword"
2121 2155 },
... ...
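Because `enriched_taxonomy_attributes` is mapped as `nested`, a query that should match a specific attribute name together with its value has to go through a `nested` query. Otherwise a name from one entry could pair with a value from another. A minimal sketch of such a request body follows. The helper name `taxonomy_attribute_query` and the example attribute values are hypothetical. The field paths come from the mapping above.

```python
def taxonomy_attribute_query(name: str, value: str, lang: str = "en") -> dict:
    """Build an Elasticsearch request body matching one name/value pair."""
    path = "enriched_taxonomy_attributes"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # Exact match on the keyword field holding the column name.
                        "filter": [{"term": {f"{path}.name": name}}],
                        # Analyzed full-text match on the per-language value.
                        "must": [{"match": {f"{path}.value.{lang}": value}}],
                    }
                },
            }
        }
    }


body = taxonomy_attribute_query("Neckline", "v-neck")
```

The `name` sub-field has no normalizer in the mapping, so the `term` filter must use the exact stored casing, for example `Neckline` and not `neckline`.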