Build a Men's Fashion Recommendation System Using FastEmbed and Qdrant

Author: 布加迪, 2025-07-11 07:49:07


Translator | 布加迪

Reviewer | 重樓

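Recommendation systems are everywhere, from Netflix and Spotify to Amazon. But what if you want to build a visual recommendation engine that looks at the images themselves, not just titles or tags? This article builds a men's fashion recommendation system using image embeddings and the Qdrant vector database. You will go from raw image data to real-time visual recommendations.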

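Learning Objectives

How do image embeddings represent visual content?
How do you generate vectors with FastEmbed?
How do you store and search vectors with Qdrant?
How do you build a feedback-driven recommendation engine?
How do you create a simple user interface with Streamlit?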

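Use Case: Visual Recommendations for T-Shirts and Polo Shirts

Imagine a user clicks on a stylish polo shirt. Instead of relying on product tags, your fashion recommender suggests T-shirts and polo shirts that look similar. It makes its decisions from the images themselves.

Let's explore how to do this.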

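Step 1: Understanding Image Embeddings

What is an image embedding?

An image embedding is a vector, that is, a list of numbers. These numbers encode the key features of the image. Similar images have embeddings that lie close together in vector space, which lets the system measure visual similarity.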

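For example, two different T-shirts may look quite different pixel by pixel. But if their colors, patterns, and textures are similar, their embeddings will be close together. This is exactly what a fashion recommender needs.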
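To make "close together in vector space" concrete, here is a minimal sketch of cosine similarity in plain Python, the metric used for search later in this article. The three-dimensional "embeddings" and their values are made-up assumptions for illustration; real image embeddings have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (real image embeddings are much longer)
red_tshirt = [0.9, 0.1, 0.2]
dark_red_tshirt = [0.8, 0.2, 0.2]
blue_polo = [0.1, 0.9, 0.3]

print(round(cosine_similarity(red_tshirt, dark_red_tshirt), 3))  # → 0.991
print(round(cosine_similarity(red_tshirt, blue_polo), 3))        # → 0.271
```

The two red T-shirts score far higher than the red T-shirt and the blue polo, even though no pixels were compared directly: only the feature vectors were.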

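How are embeddings generated?

Most embedding models rely on deep learning. Networks such as CNNs (convolutional neural networks) or Vision Transformers (ViT) extract visual patterns, and these patterns become the components of the vector.

In this article we use FastEmbed. The embedding model used here is Qdrant/Unicom-ViT-B-32, a Vision Transformer-based model.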

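import os
from typing import List

from dotenv import load_dotenv
from fastembed import ImageEmbedding

load_dotenv()
# IMAGE_EMBEDDING_MODEL is expected to hold the model name, e.g. "Qdrant/Unicom-ViT-B-32"
model = ImageEmbedding(os.getenv("IMAGE_EMBEDDING_MODEL", "Qdrant/Unicom-ViT-B-32"))

def compute_image_embedding(image_paths: List[str]) -> List[List[float]]:
    # model.embed() lazily yields one numpy array per image; convert each to a plain list of floats
    return [emb.tolist() for emb in model.embed(image_paths)]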

This function takes a list of image paths and returns one embedding vector per image, capturing the essence of each image.

Step 2: Getting the Dataset

We use a dataset of about 2,000 men's clothing images, available on Kaggle (virat164/fashion-database). Here is how we load it:

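import os
import shutil

import kagglehub
from dotenv import load_dotenv

load_dotenv()
kaggle_repo = os.getenv("KAGGLE_REPO")    # e.g. "virat164/fashion-database"
target_folder = os.getenv("DATA_PATH")    # local folder that will hold the images

def getData():
    # Only download and copy the dataset if it is not already present locally
    if not os.path.exists(target_folder):
        # kagglehub downloads (and caches) the dataset and returns its local path
        path = kagglehub.dataset_download(kaggle_repo)
        shutil.copytree(path, target_folder)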

The script checks whether the target folder exists. If it does not, the images downloaded from Kaggle are copied into it.
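The article does not show how the list of image paths handed to the vector store is built. One possible approach is a hypothetical helper, list_image_paths, that walks the DATA_PATH folder; the set of accepted file extensions is an assumption to adjust to your dataset.

```python
import os
from typing import List

# Assumed set of image extensions; adjust to match the files in your dataset
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_image_paths(folder: str) -> List[str]:
    """Recursively collect image file paths under `folder`, sorted for reproducibility."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Compare extensions case-insensitively (".JPG" and ".jpg" both match)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

For example, list_image_paths(os.getenv("DATA_PATH")) would yield a stable, sorted list that can be passed to the insert_images method of the vector store in Step 3.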

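Step 3: Storing and Searching Vectors with Qdrant

Once we have embeddings, we need to store and search them. This is where Qdrant comes in: it is a fast, scalable vector database.

Here is how to connect to the Qdrant vector database: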

import os

from qdrant_client import QdrantClient

# Connection details come from environment variables (loaded earlier with load_dotenv())
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)
Here is how to insert the images, each paired with its embedding, into a Qdrant collection:
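import uuid
from typing import List

from qdrant_client import models

class VectorStore:
    def __init__(self, embed_batch: int = 64, upload_batch: int = 32, parallel_uploads: int = 3):
        # ... (initializer code omitted for brevity) ...
        # It sets the attributes used below: self.client, self.collection_name,
        # self.embed_batch, self.upload_batch and self.parallel_uploads.
        ...

    def insert_images(self, image_paths: List[str]):
        def chunked(iterable, size):
            # Yield successive slices of `size` items
            for i in range(0, len(iterable), size):
                yield iterable[i:i + size]

        for batch in chunked(image_paths, self.embed_batch):
            embeddings = compute_image_embedding(batch)  # Batch embed (function from Step 1)
            points = [
                models.PointStruct(
                    id=str(uuid.uuid4()),             # unique ID per image
                    vector=[float(x) for x in emb],   # plain Python floats for the point model
                    payload={"image_path": img},      # keep the path so the UI can display it
                )
                for emb, img in zip(embeddings, batch)
            ]

            # Upload this batch in sub-batches of upload_batch points
            self.client.upload_points(
                collection_name=self.collection_name,
                points=points,
                batch_size=self.upload_batch,
                parallel=self.parallel_uploads,
                max_retries=3,
                wait=True,
            )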

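This code takes a list of image file paths, embeds them in batches of embed_batch images, and uploads the resulting embeddings to a Qdrant collection. Each image receives a unique ID (a random UUID) and is wrapped, together with its embedding and its path, into a "point". upload_points then sends each batch in sub-batches of upload_batch points, using up to parallel_uploads parallel upload processes and retrying failed requests up to three times. The check that the collection exists presumably happens in the initializer, which is omitted above.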
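As a worked example of this batching, the plain-Python sketch below (plan_batches is a hypothetical helper, not part of the article's code) computes how exactly 2,000 images would be split with the defaults embed_batch=64 and upload_batch=32. It assumes, as the batch_size parameter of qdrant-client's upload_points indicates, that each call sends its points in requests of at most upload_batch points.

```python
from typing import List


def plan_batches(n_images: int, embed_batch: int = 64, upload_batch: int = 32) -> List[List[int]]:
    """For each embedding batch, list the sizes of the upload requests it is split into."""
    plan = []
    for start in range(0, n_images, embed_batch):
        size = min(embed_batch, n_images - start)  # the last batch may be smaller
        plan.append([min(upload_batch, size - offset) for offset in range(0, size, upload_batch)])
    return plan


plan = plan_batches(2000)
print(len(plan))          # → 32 calls to compute_image_embedding
print(plan[0], plan[-1])  # → [32, 32] [16]
```

With these defaults, 31 full batches of 64 images plus one final batch of 16 give 32 embedding calls; a larger embed_batch means fewer calls but more images held in memory per call.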

Searching for similar images

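def search_similar(query_image_path: str, limit: int = 5):
    # Embed the query image (compute_image_embedding from Step 1), then ask Qdrant for its nearest neighbours
    emb_list = compute_image_embedding([query_image_path])
    # Note: newer qdrant-client releases deprecate search() in favour of query_points()
    hits = client.search(
        collection_name="fashion_images",
        query_vector=emb_list[0],
        limit=limit,
    )
    return [{"id": h.id, "image_path": h.payload.get("image_path")} for h in hits]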

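You supply a query image, and the system returns the most visually similar images, ranked by cosine similarity (assuming the collection was created with the cosine distance metric).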
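For intuition, here is an exact, brute-force version of that search in plain Python. Qdrant does not normally scan every vector like this; it typically uses an approximate HNSW index, but with the cosine metric the ranking it approximates is the one computed here. The point IDs and vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_search(query: Sequence[float], points: Dict[str, Sequence[float]], limit: int = 5) -> List[str]:
    """Score every stored vector against the query and return the IDs of the `limit` best matches."""
    ranked = sorted(points, key=lambda pid: cosine_similarity(query, points[pid]), reverse=True)
    return ranked[:limit]


# Illustrative collection: point ID -> embedding
points = {
    "red_tshirt": [0.9, 0.1, 0.2],
    "dark_red_tshirt": [0.8, 0.2, 0.2],
    "blue_polo": [0.1, 0.9, 0.3],
}

# Query with the red T-shirt's own vector
print(brute_force_search(points["red_tshirt"], points, limit=2))  # → ['red_tshirt', 'dark_red_tshirt']
```

Because the query vector is itself stored in the collection, it comes back as the top hit. The same is likely with search_similar when the query image was also inserted into Qdrant, so you may want to request one extra result and drop the query image.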

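Step 4: Building a Recommendation Engine with Feedback

Now let's go a step further. What if a user likes some images but not others? Can the fashion recommender learn from that?

Yes. Qdrant's recommendation API accepts positive and negative examples and uses them to return better, more personalized results: items close to what the user liked and far from what the user disliked.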

from typing import List

class RecommendationEngine:
    def get_recommendations(self, liked_images: List[str], disliked_images: List[str], limit=10):
        # Liked point IDs pull results closer, disliked point IDs push them away
        recommended = client.recommend(
            collection_name="fashion_images",
            positive=liked_images,
            negative=disliked_images,
            limit=limit,
        )
        return [{"id": hit.id, "image_path": hit.payload.get("image_path")} for hit in recommended]

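These are the function's inputs:

liked_images: a list of image (point) IDs representing the items the user liked.
disliked_images: a list of image (point) IDs representing the items the user disliked.
limit (optional): an integer specifying the maximum number of recommendations to return (default 10).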

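The function returns recommended clothing items, ranked by the same embedding-vector similarity used for search above.

Because recommendations are recomputed from the current likes and dislikes on every call, the system adapts to changing preferences immediately, with no model retraining.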
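What does Qdrant do with the liked and disliked IDs? Its documentation describes the default average_vector strategy roughly as: average the positive vectors, step further away from the average of the negative vectors (target = avg(positive) + (avg(positive) - avg(negative))), and search with that target. The sketch below reimplements this idea in plain Python on made-up vectors; it is an illustration, not Qdrant's actual code, and excluding the example images from the results is a design choice of this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked: List[str], disliked: List[str],
              points: Dict[str, Sequence[float]], limit: int = 10) -> List[str]:
    if not liked:
        # This strategy needs at least one positive example
        raise ValueError("at least one liked image is required")
    avg_pos = mean_vector([points[i] for i in liked])
    if disliked:
        avg_neg = mean_vector([points[i] for i in disliked])
        # avg_pos + (avg_pos - avg_neg): move further away from what was disliked
        target = [p + (p - n) for p, n in zip(avg_pos, avg_neg)]
    else:
        target = avg_pos
    # Design choice of this sketch: never return the example images themselves
    examples = set(liked) | set(disliked)
    candidates = [pid for pid in points if pid not in examples]
    candidates.sort(key=lambda pid: cosine_similarity(target, points[pid]), reverse=True)
    return candidates[:limit]


points = {
    "red_tshirt": [0.9, 0.1, 0.2],
    "dark_red_tshirt": [0.8, 0.2, 0.2],
    "blue_polo": [0.1, 0.9, 0.3],
    "navy_polo": [0.2, 0.8, 0.4],
}

print(recommend(["red_tshirt"], ["blue_polo"], points, limit=2))  # → ['dark_red_tshirt', 'navy_polo']
```

Liking the red T-shirt and disliking the blue polo pushes the target toward "red, not blue", so the dark red T-shirt ranks first and the navy polo last.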

Step 5: Building the UI with Streamlit

We use Streamlit to build the interface. It is simple and fast, and everything is written in Python.

Users can:

Browse clothing items
Like or dislike items
See new, more relevant recommendations
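These actions boil down to two dictionaries of image IDs, which is how the Streamlit code tracks them in st.session_state.liked and st.session_state.disliked. As a sketch under that assumption, the hypothetical Preferences class below isolates that bookkeeping in plain Python so it can be tested without Streamlit; it also makes sure an image is never liked and disliked at the same time.

```python
from typing import Dict


class Preferences:
    """Hypothetical helper: like/dislike state in which each image ID is in at most one dictionary."""

    def __init__(self) -> None:
        self.liked: Dict[str, str] = {}
        self.disliked: Dict[str, str] = {}

    def like(self, img_id: str) -> None:
        self.disliked.pop(img_id, None)  # drop any earlier dislike first
        self.liked[img_id] = "liked"

    def dislike(self, img_id: str) -> None:
        self.liked.pop(img_id, None)  # drop any earlier like first
        self.disliked[img_id] = "disliked"

    def switch(self, img_id: str) -> None:
        # Flip liked <-> disliked; unknown IDs are ignored
        if img_id in self.liked:
            self.dislike(img_id)
        elif img_id in self.disliked:
            self.like(img_id)

    def remove(self, img_id: str) -> None:
        self.liked.pop(img_id, None)
        self.disliked.pop(img_id, None)
```

Each button handler in the UI would call one of these methods and then refresh the recommendations from the updated ID lists.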

Here is the Streamlit code:

import os

import streamlit as st
from PIL import Image

from src.recommendation.engine import RecommendationEngine
from src.vector_database.vectorstore import VectorStore
from src.data.get_data import getData

# -------------- Config --------------
st.set_page_config(page_title="Men's Fashion Recommender", layout="wide")
IMAGES_PER_PAGE = 12

# -------------- Ensure Dataset Exists (once) --------------
@st.cache_resource
def initialize_data():
    # Download the dataset if needed and build the store/engine once per server process
    getData()
    return VectorStore(), RecommendationEngine()

vector_store, recommendation_engine = initialize_data()

# -------------- Session State Defaults --------------
session_defaults = {
    "liked": {},                                # img_id -> "liked"
    "disliked": {},                             # img_id -> "disliked"
    "current_page": 0,
    "recommended_images": vector_store.points,  # start by showing every stored image
    "vector_store": vector_store,
    "recommendation_engine": recommendation_engine,
}

for key, value in session_defaults.items():
    if key not in st.session_state:
        st.session_state[key] = value

# -------------- Sidebar Info --------------
with st.sidebar:
    st.title("Men's Fashion Recommender")

    st.markdown("""
    **Discover fashion styles that suit your taste.**
    Like or dislike outfits and receive AI-powered recommendations tailored to you.
    """)

    st.markdown("### Dataset")
    st.markdown("""
    - Source: [Kaggle – virat164/fashion-database](https://www.kaggle.com/datasets/virat164/fashion-database)
    - ~2,000 fashion images
    """)

    st.markdown("### How It Works")
    st.markdown("""
    1. Images are embedded into vector space
    2. You provide preferences via Like/Dislike
    3. Qdrant finds visually similar images
    4. Results are updated in real-time
    """)

    st.markdown("### Technologies")
    st.markdown("""
    - **Streamlit** UI
    - **Qdrant** vector DB
    - **Python** backend
    - **PIL** for image handling
    - **Kaggle API** for data
    """)

    st.markdown("---")
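
# -------------- Core Logic Functions --------------
def get_recommendations(liked_ids, disliked_ids):
    return st.session_state.recommendation_engine.get_recommendations(
        liked_images=liked_ids,
        disliked_images=disliked_ids,
        limit=3 * IMAGES_PER_PAGE,  # enough results for three pages
    )

def refresh_recommendations():
    liked_ids = list(st.session_state.liked.keys())
    disliked_ids = list(st.session_state.disliked.keys())
    st.session_state.current_page = 0  # new results: go back to the first page
    if not liked_ids:
        # Qdrant's default recommendation strategy needs at least one positive example,
        # so fall back to the full catalogue when nothing is liked
        st.session_state.recommended_images = st.session_state.vector_store.points
    else:
        st.session_state.recommended_images = get_recommendations(liked_ids, disliked_ids)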

# -------------- Display: Selected Preferences --------------
def display_selected_images():
    if not st.session_state.liked and not st.session_state.disliked:
        return

    st.markdown("### Your Picks")
    cols = st.columns(6)
    images = st.session_state.vector_store.points

    for i, (img_id, status) in enumerate(
        list(st.session_state.liked.items()) + list(st.session_state.disliked.items())
    ):
        # Look up the file path of the picked image by its ID
        img_path = next((img["image_path"] for img in images if img["id"] == img_id), None)
        if img_path and os.path.exists(img_path):
            with cols[i % 6]:
                st.image(img_path, use_container_width=True, caption=f"{img_id} ({status})")
                col1, col2 = st.columns(2)
                if col1.button("Remove", key=f"remove_{img_id}"):
                    if status == "liked":
                        del st.session_state.liked[img_id]
                    else:
                        del st.session_state.disliked[img_id]
                    refresh_recommendations()
                    st.rerun()

                if col2.button("Switch", key=f"switch_{img_id}"):
                    if status == "liked":
                        del st.session_state.liked[img_id]
                        st.session_state.disliked[img_id] = "disliked"
                    else:
                        del st.session_state.disliked[img_id]
                        st.session_state.liked[img_id] = "liked"
                    refresh_recommendations()
                    st.rerun()

# -------------- Display: Recommended Gallery --------------
def display_gallery():
    st.markdown("### Smart Suggestions")

    page = st.session_state.current_page
    start_idx = page * IMAGES_PER_PAGE
    end_idx = start_idx + IMAGES_PER_PAGE
    current_images = st.session_state.recommended_images[start_idx:end_idx]

    cols = st.columns(4)
    for idx, img in enumerate(current_images):
        with cols[idx % 4]:
            if os.path.exists(img["image_path"]):
                st.image(img["image_path"], use_container_width=True)
            else:
                st.warning("Image not found")

            col1, col2 = st.columns(2)
            if col1.button("Like", key=f"like_{img['id']}"):
                st.session_state.disliked.pop(img["id"], None)  # an image cannot be both liked and disliked
                st.session_state.liked[img["id"]] = "liked"
                refresh_recommendations()
                st.rerun()
            if col2.button("Dislike", key=f"dislike_{img['id']}"):
                st.session_state.liked.pop(img["id"], None)
                st.session_state.disliked[img["id"]] = "disliked"
                refresh_recommendations()
                st.rerun()

    # Pagination
    col1, _, col3 = st.columns([1, 2, 1])
    with col1:
        if st.button("Previous") and page > 0:
            st.session_state.current_page -= 1
            st.rerun()
    with col3:
        if st.button("Next") and end_idx < len(st.session_state.recommended_images):
            st.session_state.current_page += 1
            st.rerun()

# -------------- Main Render Pipeline --------------
st.title("Men's Fashion Recommender")

display_selected_images()
st.divider()
display_gallery()

This UI closes the loop, turning the functions above into a usable product.

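Conclusion

You have just built a complete fashion recommendation system: it embeds images, compares their visual features, and turns user feedback into recommendations.

With FastEmbed, Qdrant, and Streamlit, you now have a working recommendation system. It works for T-shirts, polo shirts, and any other men's clothing, and the same approach carries over to any other image-based recommendation task.

Original title: Build a Men’s Fashion Recommendation System Using FastEmbed and Qdrant, by Rindra Randriamihamina

Editor: 姜華 | Source: 51CTO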
