Challenge Scenario
You are a data scientist at Cymbal, an online retail store. You want to build a pipeline that continuously searches the market for similar products to inform a marketing comparison study. You face a few challenges:
How to handle multimodal data: The data you have collected is multimodal, including text, images, and video, and some of it is stored as files in Cloud Storage.
How to perform a semantic similarity search instead of a keyword search: You want to find products that are similar across multiple dimensions (e.g., image, description, and specific features), where a keyword search may not be effective.
How to do it in BigQuery: Since most of your data is already in BigQuery, using the same tool keeps the learning curve small.
To address these challenges, you decide to implement multimodal vector search with BigQuery.
Topics tested
- Creating an external source connection in BigQuery and granting it the required IAM permissions.
- Creating an object table that references the images stored in Cloud Storage.
- Generating embeddings to convert images (multimodal data) to vectors.
- Running a vector search to find similar products.
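Conceptually, a vector search ranks stored embeddings by their distance to a query embedding and returns the closest `top_k`. The sketch below is a minimal, exact (brute-force) version of that idea in plain Python. It is not BigQuery code. The 3-dimensional "embeddings" and product names are hypothetical toy values, not output from a real model. In the lab, the vectors would come from an embedding model (for example via `ML.GENERATE_EMBEDDING`), and the ranking would be done by BigQuery's `VECTOR_SEARCH`, which can also use a vector index for approximate search at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def vector_search(query, catalog, top_k=2):
    """Exact nearest-neighbour search: return the top_k (name, cosine distance)
    pairs, where cosine distance = 1 - cosine similarity (smaller = more similar)."""
    scored = [(name, 1.0 - cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Hypothetical toy embeddings standing in for model-generated product vectors.
CATALOG = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "red_boot": [0.8, 0.3, 0.1],
    "blue_mug": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    query_embedding = [1.0, 0.0, 0.0]  # hypothetical embedding of a query image
    for name, distance in vector_search(query_embedding, CATALOG, top_k=2):
        print(f"{name}: {distance:.3f}")
```

Cosine distance is used here because it compares direction rather than magnitude, which is a common choice for embeddings. Exact search scans every row, so it is simple but grows linearly with catalog size. That cost is why indexed, approximate search matters for large product tables.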