fix: search image_url normalization, logging, Streamlit width API

- Search image_url: parse results[].image_url, add _normalize_image_url() to convert protocol-less URLs (////host/path) to https://host/path; fix double slash (use https:// + url.lstrip("/") so normalized URL has single //).
- Logging: log full LLM request/response (LLM_REQUEST, LLM_RESPONSE), full tool call results (TOOL_CALL_RESULT); for search tool log SEARCH_RESULT summary and per-item SEARCH_RESULT_ITEM (image_url_raw) and SEARCH_RESULT_PRODUCT (image_url_normalized).
- Streamlit: replace deprecated use_container_width=True with width="stretch" for st.image and st.button.

Co-authored-by: Cursor <cursoragent@cursor.com>
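
The normalization rule described above can be tried in isolation. The following is a minimal standalone sketch, not the committed function itself: the name `normalize_image_url` and the sample path `a.jpg` are assumptions made for illustration, while the host comes from the docstring in the diff.

```python
from typing import Optional


def normalize_image_url(url: Optional[str]) -> Optional[str]:
    """Return an absolute URL, or None for empty or non-string input."""
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    if not url:
        return None
    if url.startswith(("https://", "http://")):
        return url
    # Drop every leading slash so exactly one "//" follows the scheme.
    # For input without leading slashes, lstrip("/") is a no-op.
    return "https://" + url.lstrip("/")


# Hypothetical sample inputs; only the host appears in the original commit.
samples = [
    "////cnres.appracle.com/a.jpg",
    "//cnres.appracle.com/a.jpg",
    "cnres.appracle.com/a.jpg",
    "http://cnres.appracle.com/a.jpg",
    "   ",
    None,
]
for raw in samples:
    print(repr(raw), "->", repr(normalize_image_url(raw)))
```

Stripping all leading slashes, rather than prepending `https:` to the raw value, is what keeps `////host/path` from turning into `https:////host/path`.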

tangwang
1 parent 66442668
Showing 3 changed files with 106 additions and 13 deletions
app.py
app/agents/shopping_agent.py
app/tools/search_tools.py
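
The logging changes in this commit cap how much text each log record carries. A rough sketch of that truncation pattern follows; the helper name `truncate_for_log` and the sample message are hypothetical, and only the two limits are taken from the diff.

```python
import json

LOG_CONTENT_MAX = 8000      # mirrors _LOG_CONTENT_MAX in the diff
LOG_TOOL_RESULT_MAX = 4000  # mirrors _LOG_TOOL_RESULT_MAX in the diff


def truncate_for_log(text: str, limit: int) -> str:
    """Hypothetical helper: cap text at `limit` chars and note the original length."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [truncated, total {len(text)} chars]"


# Illustrative message list; the content is made up for this example.
messages = [{"type": "human", "content": "x" * 10000}]
req_json = json.dumps(messages, ensure_ascii=False)
logged = truncate_for_log(req_json, LOG_CONTENT_MAX)
print(len(req_json), len(logged))
```

One consequence worth keeping in mind: once a serialized JSON string is cut this way, the logged value is no longer parseable JSON, so downstream log tooling should treat it as plain text.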
@@ -248,9 +248,9 @@ def initialize_session():
     if "show_image_upload" not in st.session_state:
         st.session_state.show_image_upload = False
  
-    # Debug panel toggle
+    # Debug panel toggle (default True so the "显示调试过程" ("Show debug process") checkbox is checked by default)
     if "show_debug" not in st.session_state:
-        st.session_state.show_debug = False
+        st.session_state.show_debug = True
  
  
 def save_uploaded_image(uploaded_file) -> Optional[str]:
@@ -276,7 +276,7 @@ def save_uploaded_image(uploaded_file) -> Optional[str]:
  
  
 def _load_product_image(product: ProductItem) -> Optional[Image.Image]:
-    """Try to load a product image: image_url from API → local data/images → None."""
+    """Try to load a product image: image_url from API (normalized when stored) → local data/images → None."""
     if product.image_url:
         try:
             import requests
@@ -306,7 +306,7 @@ def display_product_card_from_item(product: ProductItem) -> None:
             img = ImageOps.fit(img, target, method=Image.Resampling.LANCZOS)
         except AttributeError:
             img = ImageOps.fit(img, target, method=Image.LANCZOS)
-        st.image(img, use_container_width=True)
+        st.image(img, width="stretch")
     else:
         st.markdown(
             '<div style="height:120px;background:#f5f5f5;border-radius:6px;'
@@ -530,7 +530,7 @@ def main():
     with st.sidebar:
         st.markdown("### ⚙️ Settings")
  
-        if st.button("🗑️ Clear Chat", use_container_width=True):
+        if st.button("🗑️ Clear Chat", width="stretch"):
             if "shopping_agent" in st.session_state:
                 st.session_state.shopping_agent.clear_history()
             # Clear search result registry for this session
@@ -595,7 +595,7 @@ def main():
  
         with col1:
             # Image upload toggle button
-            if st.button("➕", help="Add image", use_container_width=True):
+            if st.button("➕", help="Add image", width="stretch"):
                 st.session_state.show_image_upload = (
                     not st.session_state.show_image_upload
                 )
@@ -8,9 +8,10 @@ Architecture:
   re-listing product details; the UI renders product cards from the registry
 """
  
+import json
 import logging
 from pathlib import Path
-from typing import Optional, Sequence
+from typing import Any, Optional, Sequence
  
 from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
 from langchain_openai import ChatOpenAI
@@ -32,15 +33,14 @@ logger = logging.getLogger(__name__)
 #   1. Guides multi-query search planning with explicit evaluate-and-decide loop
 #   2. Forbids re-listing product details in the final response
 #   3. Mandates [SEARCH_REF:xxx] inline citation as the only product presentation mechanism
-SYSTEM_PROMPT = """
-角色定义
+SYSTEM_PROMPT = """角色定义
 你是一名专业的服装电商导购，是一个善于倾听、主动引导、懂得搭配的“时尚顾问”，通过有温度的对话，给用户提供有价值的信息，包括需求引导、方案推荐、搜索结果推荐，最终促成满意的购物决策或转化行为。
  
 一些原则：
 1. 你是一个真人导购，是一个贴心、专业的销售，保持灵活，根据上下文，基于常识灵活的切换策略，在合适的上下文询问合适的问题、给出有价值的方案和搜索结果的呈现。
-2. 兼顾推荐与信息收集：适时的提供有价值的信息，如商品推荐、穿搭建议、趋势信息，在推荐方向上有需求缺口、需要明确的重要信息时，要适时的做“信息收集”，引导式的帮助用户更清晰的呈现需求、提高商品发现的效率，形成“提供-反馈”的良性循环。
-  1. 在意图不明时，主动通过1-2个关键问题（如品类、场景、风格、预算）进行引导，并提供初步方向。
-  2. 在了解到初步意向后，要进行相关商品的搜索、进行搜索结果的呈现，同时思考该方向下重要的决策因素，进行提议和问题收集，让用户既得到相关信息、又得到下一步的方向引导、同时也有机会修正或者细化诉求。
+2. 商品搜索结果推荐与信息收集：
+  1. 根据上下文、用户诉求，灵活的切换侧重点，何时需要进行搜索、何时要引导客户完善需求，你需要站在用户角度进行思考。比如已经有较为清晰的意图，则以搜索、方案推荐为主，有必要的时候，思考该方向下重要的决策因素，进行提议和问题收集，让用户既得到相关信息、又得到下一步的方向引导、同时也有机会修正或者细化诉求。如果存在重大的需求方向缺口，主动通过1-2个关键问题进行引导，并提供初步方向。
+  2. 适时的提供有价值的信息，如商品推荐、穿搭建议、趋势信息，在推荐方向上有需求缺口、需要明确的重要信息时，要适时的做“信息收集”，引导式的帮助用户更清晰的呈现需求、提高商品发现的效率，形成“提供-反馈”的良性循环。
   3. 对于复杂需求时，要能基于上下文，将导购任务进行合理拆解。
 3. 引导或者收集需求时，需要站在用户立场，比如询问用户期待的效果或感觉、使用的场合、偏好的风格等用户立场需，而不是询问具体的款式或参数，你需要将用户立场的需求理解/翻译/转化为具体的搜索计划，最后筛选产品、结合需求+结果特性组织推荐理由、呈现方案。
 4. 如何使用search_products：在需要搜索商品的时候，可以将需求分解为 2-4 个搜索查询，每个 query 聚焦一个明确的商品子类或搜索角度。每次调用 search_products 后，工具会返回以下内容，你需要决策是否要调整搜索策略，比如结果质量太差，可能需要调整搜索词、或者加大试探的query数量（不要超过3-5个）。可以进行多轮搜索，但是要适时的总结和反馈信息避免用户等待过长时间：
@@ -65,6 +65,11 @@ class AgentState(TypedDict):
  
 # ── Helper ─────────────────────────────────────────────────────────────────────
  
+# Max length for logging single content field (avoid huge logs)
+_LOG_CONTENT_MAX = 8000
+_LOG_TOOL_RESULT_MAX = 4000
+
+
 def _extract_message_text(msg) -> str:
     """Extract plain text from a LangChain message (handles str or content_blocks)."""
     content = getattr(msg, "content", "")
@@ -81,6 +86,23 @@ def _extract_message_text(msg) -> str:
     return str(content) if content else ""
  
  
+def _message_for_log(msg: BaseMessage) -> dict:
+    """Serialize a message for structured logging (content truncated)."""
+    text = _extract_message_text(msg)
+    if len(text) > _LOG_CONTENT_MAX:
+        text = text[:_LOG_CONTENT_MAX] + f"... [truncated, total {len(text)} chars]"
+    out: dict[str, Any] = {
+        "type": getattr(msg, "type", "unknown"),
+        "content": text,
+    }
+    if hasattr(msg, "tool_calls") and msg.tool_calls:
+        out["tool_calls"] = [
+            {"name": tc.get("name"), "args": tc.get("args", {})}
+            for tc in msg.tool_calls
+        ]
+    return out
+
+
 # ── Agent class ────────────────────────────────────────────────────────────────
  
 class ShoppingAgent:
@@ -111,7 +133,18 @@ class ShoppingAgent:
             messages = state["messages"]
             if not any(isinstance(m, SystemMessage) for m in messages):
                 messages = [SystemMessage(content=SYSTEM_PROMPT)] + list(messages)
+            request_log = [_message_for_log(m) for m in messages]
+            req_json = json.dumps(request_log, ensure_ascii=False)
+            if len(req_json) > _LOG_CONTENT_MAX:
+                req_json = req_json[:_LOG_CONTENT_MAX] + f"... [truncated total {len(req_json)}]"
+            logger.info("[%s] LLM_REQUEST messages=%s", self.session_id, req_json)
             response = self.llm_with_tools.invoke(messages)
+            response_log = _message_for_log(response)
+            logger.info(
+                "[%s] LLM_RESPONSE %s",
+                self.session_id,
+                json.dumps(response_log, ensure_ascii=False),
+            )
             return {"messages": [response]}
  
         def should_continue(state: AgentState):
@@ -202,6 +235,16 @@ class ShoppingAgent:
                         preview = text[:600] + ("…" if len(text) > 600 else "")
                         if i < len(unresolved):
                             unresolved[i]["result"] = preview
+                            tc_name = unresolved[i].get("name", "")
+                            tc_args = unresolved[i].get("args", {})
+                            result_log = text if len(text) <= _LOG_TOOL_RESULT_MAX else text[:_LOG_TOOL_RESULT_MAX] + f"... [truncated total {len(text)}]"
+                            logger.info(
+                                "[%s] TOOL_CALL_RESULT name=%s args=%s result=%s",
+                                self.session_id,
+                                tc_name,
+                                json.dumps(tc_args, ensure_ascii=False),
+                                result_log,
+                            )
                         step_results.append({"content": preview})
  
                     debug_steps.append({"node": "tools", "results": step_results})
@@ -38,6 +38,21 @@ logger = logging.getLogger(__name__)
 _openai_client: Optional[OpenAI] = None
  
  
+def _normalize_image_url(url: Optional[str]) -> Optional[str]:
+    """Normalize image_url from API (e.g. ////cnres.appracle.com/... → https://cnres.appracle.com/...)."""
+    if not url or not isinstance(url, str):
+        return None
+    url = url.strip()
+    if not url:
+        return None
+    if url.startswith("https://") or url.startswith("http://"):
+        return url
+    # // or ////host/path → https://host/path (exactly one "//" after scheme)
+    if url.startswith("/"):
+        return "https://" + url.lstrip("/")
+    return "https://" + url
+
+
 def get_openai_client() -> OpenAI:
     global _openai_client
     if _openai_client is None:
@@ -226,7 +241,7 @@ def make_search_products_tool(
                                 raw.get("category_path") or raw.get("category_name")
                             ),
                             vendor=raw.get("vendor"),
-                            image_url=raw.get("image_url"),
+                            image_url=_normalize_image_url(raw.get("image_url")),
                             relevance_score=raw.get("relevance_score"),
                             match_label=label,
                             tags=raw.get("tags") or [],
@@ -249,6 +264,41 @@ def make_search_products_tool(
                 products=products,
             )
             registry.register(session_id, result)
+
+            # ── Search result detailed log (ref_id, summary, per-item id + image_url raw/normalized) ──
+            logger.info(
+                "[%s] SEARCH_RESULT ref_id=%s query=%s total_api_hits=%s returned_count=%s "
+                "verdict=%s quality_summary=%s perfect=%s partial=%s irrelevant=%s",
+                session_id,
+                ref_id,
+                query,
+                total_hits,
+                len(raw_results),
+                verdict,
+                quality_summary,
+                perfect_count,
+                partial_count,
+                irrelevant_count,
+            )
+            for idx, raw in enumerate(raw_results):
+                raw_img = raw.get("image_url") or ""
+                logger.info(
+                    "[%s] SEARCH_RESULT_ITEM raw idx=%s spu_id=%s title=%s image_url_raw=%s",
+                    session_id,
+                    idx,
+                    raw.get("spu_id", ""),
+                    (raw.get("title") or "")[:60],
+                    raw_img,
+                )
+            for p in products:
+                logger.info(
+                    "[%s] SEARCH_RESULT_PRODUCT spu_id=%s match_label=%s image_url_normalized=%s",
+                    session_id,
+                    p.spu_id,
+                    p.match_label,
+                    p.image_url or "",
+                )
+
             logger.info(
                 f"[{session_id}] Registered {ref_id}: verdict={verdict}, "
                 f"perfect={perfect_count}, partial={partial_count}, irrel={irrelevant_count}"