Building AI-Powered Search for E-Commerce with Amazon OpenSearch: From "Gift for My Wife" to Relevant Products

The Problem with Keyword Search

Type "gift for my wife" into a traditional e-commerce search box built on BM25/TF-IDF keyword matching, and you'll get exactly what you asked for: nothing useful. The engine looks for products whose titles or descriptions literally contain the tokens "gift," "wife," or maybe "for." It has no concept of intent — that this query implies categories like jewelry, perfume, handbags, or spa gift sets, filtered by price range, occasion, and maybe recipient age bracket.

This is the core limitation of lexical search: it matches strings, not meaning. Natural language e-commerce queries ("gift for my wife," "something cozy for winter under ₹2000," "office bag that isn't too formal") require semantic understanding — mapping a fuzzy human intent to a structured set of relevant products.

This post walks through a production-grade architecture for solving this using Amazon OpenSearch Service, combining:

Vector (k-NN) semantic search for meaning-based retrieval
Hybrid search (lexical + semantic) for precision and recall
Query understanding via an LLM (Amazon Bedrock) to extract structured intent from free text
Neural sparse / dense embeddings for product catalog indexing
Related-product recommendations using vector similarity

Architecture Overview

The two pillars are:

Ingestion pipeline: every product document gets a vector embedding generated automatically as it's indexed.
Search pipeline: every query is embedded the same way, then matched via approximate nearest-neighbor (ANN) search, optionally blended with classic keyword search.

OpenSearch's neural search feature (generally available since OpenSearch 2.9) and the ML Commons plugin handle this end to end, so you don't need a separate embedding microservice. OpenSearch calls out to Amazon Bedrock (or SageMaker) through a registered ML connector during both indexing and querying. The hybrid query used later in this post requires OpenSearch 2.10 or newer.

Step 1: Provision the OpenSearch Domain

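The k-NN plugin ships with Amazon OpenSearch Service, so there is nothing extra to install. The vector engine (FAISS here, a solid performance/recall tradeoff for production e-commerce catalogs) is chosen later, in the index mapping. Use OpenSearch 2.19 or newer, because the Step 4 mapping pairs the FAISS engine with the cosinesimil space type, a combination FAISS only supports from 2.19 onward:

aws opensearch create-domain \
  --domain-name ecommerce-search \
  --engine-version OpenSearch_2.19 \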
  --cluster-config InstanceType=r6g.large.search,InstanceCount=3,DedicatedMasterEnabled=true \
  --ebs-options EBSEnabled=true,VolumeType=gp3,VolumeSize=100 \
  --node-to-node-encryption-options Enabled=true \
  --encryption-at-rest-options Enabled=true \
  --domain-endpoint-options EnforceHTTPS=true

Create an IAM role that allows bedrock:InvokeModel on the embedding model and that the OpenSearch Service principal is trusted to assume. The ML connector assumes this role whenever it calls Bedrock.

Step 2: Register an Embedding Model via an ML Connector

Amazon Titan Text Embeddings V2 (1024-dim, supports multilingual queries — useful if you serve Hindi/Tamil/regional-language queries alongside English) or Cohere Embed on Bedrock both work well for product catalogs. Create the connector:

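import boto3
import requests
from requests_aws4auth import AWS4Auth  # pip install requests-aws4auth

REGION = "ap-south-1"
HOST = "https://<your-domain-endpoint>"  # placeholder: your OpenSearch domain endpoint

# boto3's "opensearch" client only manages domains. Connectors are created through
# the domain's own REST API, so the request is SigV4-signed for the "es" service.
creds = boto3.Session().get_credentials()
awsauth = AWS4Auth(creds.access_key, creds.secret_key, REGION, "es",
                   session_token=creds.token)

connector_payload = {
    "name": "Bedrock Titan Embedding Connector",
    "description": "Connector to Titan Text Embeddings V2",
    "version": 1,
    "protocol": "aws_sigv4",
    "parameters": {
        "region": REGION,
        "service_name": "bedrock"
    },
    "credential": {
        "roleArn": "arn:aws:iam::<account-id>:role/OpenSearch-Bedrock-Invoke-Role"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://bedrock-runtime.ap-south-1.amazonaws.com/model/amazon.titan-embed-text-v2:0/invoke",
            "headers": {"content-type": "application/json", "x-amz-content-sha256": "required"},
            "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
            # Built-in functions that translate between the neural plugin's
            # input/output format and the Bedrock embedding API.
            "pre_process_function": "connector.pre_process.bedrock.embedding",
            "post_process_function": "connector.post_process.bedrock.embedding"
        }
    ]
}

# Create the connector and keep its ID for the model registration step.
resp = requests.post(f"{HOST}/_plugins/_ml/connectors/_create",
                     auth=awsauth, json=connector_payload, timeout=30)
resp.raise_for_status()
connector_id = resp.json()["connector_id"]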

Register and deploy it through the ML Commons plugin (via _plugins/_ml/connectors/_create, then _plugins/_ml/models/_register and _deploy). Note the returned model_id — you'll reference it everywhere downstream.
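
The register-then-deploy sequence can be sketched as below. This is a minimal sketch under stated assumptions: post is a hypothetical helper (path, body) -> parsed JSON that wraps a SigV4-signed HTTP call to your domain, the model name is illustrative, and the register response is assumed to include model_id directly, as it typically does for remote models. If your version returns only a task_id, poll the task instead.

```python
from typing import Callable


def register_and_deploy(post: Callable[[str, dict], dict], connector_id: str) -> str:
    """Register a remote model backed by the connector, deploy it, return model_id.

    `post` is a hypothetical callable: it sends a signed POST to the domain
    and returns the parsed JSON response.
    """
    registered = post(
        "/_plugins/_ml/models/_register",
        {
            "name": "titan-embed-text-v2",  # illustrative name
            "function_name": "remote",      # the model runs outside the cluster
            "description": "Titan Text Embeddings V2 via Bedrock",
            "connector_id": connector_id,
        },
    )
    model_id = registered["model_id"]  # assumption: present for remote models
    post(f"/_plugins/_ml/models/{model_id}/_deploy", {})
    return model_id
```

Passing the HTTP call in as a function keeps the sequence easy to test and lets you swap in whichever signed client you already use.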

Step 3: Build the Ingest Pipeline

The text_embedding processor automatically converts product text into a vector at index time:

PUT _ingest/pipeline/product-embedding-pipeline
{
  "description": "Generates embeddings for product catalog",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "search_text": "product_vector"
        }
      }
    }
  ]
}

search_text is a concatenated, denormalized field you construct at write time, e.g. "{title} {category} {description} {occasion_tags} {recipient_tags} {brand}". This gives the embedding model richer semantic surface area than the title alone. For "gift for my wife" to match a "Rose Gold Pendant Necklace," the embedding needs occasion/recipient metadata baked into that text, not just the product name.
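
One way to assemble that field at write time is sketched below. The function name build_search_text is hypothetical, and the field names are assumed to follow the index mapping used in this post (tag fields as lists of strings).

```python
def build_search_text(product: dict) -> str:
    """Concatenate the fields that carry semantic signal into one string."""
    parts = [
        product.get("title", ""),
        product.get("category", ""),
        product.get("description", ""),
        " ".join(product.get("occasion_tags", [])),
        " ".join(product.get("recipient_tags", [])),
        product.get("brand", ""),
    ]
    # Drop empty pieces and collapse stray whitespace so the embedding
    # input stays clean.
    return " ".join(" ".join(p.split()) for p in parts if p and p.strip())


example = {
    "title": "Rose Gold Pendant Necklace",
    "category": "jewelry",
    "occasion_tags": ["anniversary", "gift"],
    "recipient_tags": ["female"],
    "brand": "Acme",
}
print(build_search_text(example))
# -> Rose Gold Pendant Necklace jewelry anniversary gift female Acme
```

Field order matters little to most embedding models, but keeping it stable makes re-indexing reproducible.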

Step 4: Create the k-NN Index

PUT /products
{
  "settings": {
    "index": {
      "knn": true,
      "default_pipeline": "product-embedding-pipeline"
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "category": { "type": "keyword" },
      "brand": { "type": "keyword" },
      "price": { "type": "float" },
      "occasion_tags": { "type": "keyword" },
      "recipient_tags": { "type": "keyword" },
      "rating": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "search_text": { "type": "text" },
      "product_vector": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "cosinesimil",
          "parameters": { "ef_construction": 256, "m": 16 }
        }
      }
    }
  }
}

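HNSW with FAISS can keep ANN latency in the tens of milliseconds even on catalogs with millions of SKUs, provided the graph fits in memory and ef_search is tuned sensibly. That is what makes this viable for real-time storefront search rather than just batch recommendations. Benchmark on your own catalog and instance types before committing to a latency target.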
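
Whether the graph fits in memory is worth checking before sizing the cluster. The sketch below uses the rule of thumb from the OpenSearch k-NN documentation for HNSW, roughly 1.1 * (4 * dimension + 8 * m) bytes per vector. The function name is hypothetical, and the result is an estimate for one copy of the data, so multiply by the number of replicas you keep.

```python
def hnsw_memory_gib(num_vectors: int, dimension: int = 1024, m: int = 16) -> float:
    """Approximate native memory for one copy of an HNSW graph, in GiB."""
    bytes_per_vector = 1.1 * (4 * dimension + 8 * m)
    return num_vectors * bytes_per_vector / 1024**3


for skus in (1_000_000, 5_000_000):
    print(f"{skus:>9,} SKUs -> {hnsw_memory_gib(skus):.1f} GiB")
```

With the mapping above (1024 dimensions, m = 16), one million SKUs come to a little over 4 GiB per copy, which is why very large catalogs push you toward smaller dimensions or quantization.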

Bulk-index your catalog — each document is embedded automatically on write via the default pipeline.
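
The _bulk API expects newline-delimited JSON: one action line followed by one document line per product, with a trailing newline. A minimal sketch of building that body follows. It assumes each product dict already contains search_text and carries a sku key to use as the document ID; both the helper name and the sku field are illustrative.

```python
import json
from typing import Iterable


def to_bulk_ndjson(products: Iterable[dict], index: str = "products") -> str:
    """Build a _bulk request body: action line + source line per product."""
    lines = []
    for product in products:
        # Assumption: "sku" is the unique product key; it becomes the _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": product["sku"]}}))
        lines.append(json.dumps({k: v for k, v in product.items() if k != "sku"}))
    # _bulk requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Send the result to POST /_bulk with the content type application/x-ndjson. Because every document triggers an embedding call through the default pipeline, modest batch sizes are a sensible starting point.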

Step 5: Query Understanding — Turning "Gift for My Wife" into Structure

This is the step most tutorials skip, and it's the difference between a demo and a production system. A raw neural k-NN query on "gift for my wife" will retrieve semantically similar products, but it won't reliably apply hard constraints like gender, budget, or category exclusions (you don't want power tools surfacing just because someone bought them "for their wife" in a training corpus).

Use a lightweight Bedrock LLM call (Claude Haiku or Amazon Nova Micro — cheap, low-latency, good enough for structured extraction) to parse intent before hitting OpenSearch:

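import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

SYSTEM_PROMPT = """Extract shopping intent from the user query as strict JSON only.
Schema: { "semantic_query": string, "recipient": string|null,
"occasion": string|null, "category_hints": string[], "max_price": number|null,
"gender_hint": string|null }"""

def extract_intent(query: str) -> dict:
    """Ask the LLM for structured shopping intent and parse its JSON reply."""
    resp = bedrock.invoke_model(
        # Depending on the region, Bedrock may require an inference profile ID
        # here instead of the bare model ID.
        modelId="anthropic.claude-haiku-4-5-20251001-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 200,
            "system": SYSTEM_PROMPT,
            "messages": [{"role": "user", "content": query}]
        })
    )
    result = json.loads(resp["body"].read())
    text = result["content"][0]["text"]
    # Keep only the outermost {...} in case the model wraps the JSON in extra text.
    # If no object is found, json.loads raises and the caller can fall back.
    return json.loads(text[text.find("{"): text.rfind("}") + 1])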

# extract_intent("gift for my wife under 3000 rupees")
# -> {
#   "semantic_query": "romantic gift jewelry perfume for wife",
#   "recipient": "wife", "occasion": "general gifting",
#   "category_hints": ["jewelry", "beauty", "accessories"],
#   "max_price": 3000, "gender_hint": "female"
# }

The semantic_query field is an expanded, embedding-friendly rewrite of the raw query. This alone can meaningfully improve vector recall, because the rewritten query sits much closer in embedding space to a product like "Rose Gold Pendant Necklace" than the original short query does.

The structured fields (max_price, gender_hint, category_hints) become hard filters in the OpenSearch query — this is important. Never rely on the embedding alone to enforce a budget constraint; vector similarity is fuzzy by nature and will happily return a ₹15,000 watch for a ₹3,000 budget query if nothing constrains it.
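
Turning the extracted fields into filter clauses can be sketched as follows. The helper name intent_to_filters and the known_categories argument are hypothetical. The category filter is applied only when a hint matches a value that actually exists in the catalog, since LLM-produced hints may not line up with the index's keyword values.

```python
from typing import Iterable, Optional


def intent_to_filters(intent: dict,
                      known_categories: Optional[Iterable[str]] = None) -> list:
    """Translate extracted intent into OpenSearch filter clauses."""
    filters = [{"term": {"in_stock": True}}]

    max_price = intent.get("max_price")
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})

    gender = intent.get("gender_hint")
    if gender:
        # "unisex" products stay eligible for any recipient.
        filters.append({"terms": {"recipient_tags": [gender, "unisex"]}})

    # Only filter on categories that exist in the catalog taxonomy.
    if known_categories is not None:
        known = set(known_categories)
        matched = [c for c in (intent.get("category_hints") or []) if c in known]
        if matched:
            filters.append({"terms": {"category": matched}})

    return filters
```

Leaving known_categories unset keeps categories as a soft signal carried by semantic_query, which is the safer default until the taxonomy mapping is validated.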

Step 6: Hybrid Search Query

Combine lexical BM25 (catches exact brand/product-name matches) with neural k-NN (catches semantic intent), using OpenSearch's hybrid query and a normalization search pipeline to blend scores fairly (BM25 and cosine similarity live on different scales):

PUT /_search/pipeline/hybrid-norm-pipeline
{
  "description": "Normalize and combine lexical + neural scores",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.35, 0.65] }
        }
      }
    }
  ]
}
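
To make the pipeline's arithmetic concrete, here is a simplified illustration of min-max normalization followed by a weighted arithmetic mean. It is a sketch of the idea, not OpenSearch's exact implementation (which differs in details such as how the lowest score and missing documents are handled), and the scores are made up for the example.

```python
def min_max(scores: dict) -> dict:
    """Rescale one sub-query's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(bm25: dict, knn: dict, weights=(0.35, 0.65)) -> list:
    """Weighted mean of normalized scores; a missing score counts as 0."""
    bm25_n, knn_n = min_max(bm25), min_max(knn)
    w_lex, w_sem = weights
    combined = {
        doc: (w_lex * bm25_n.get(doc, 0.0) + w_sem * knn_n.get(doc, 0.0)) / (w_lex + w_sem)
        for doc in set(bm25_n) | set(knn_n)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative raw scores: BM25 is unbounded, cosine similarity is not.
bm25_scores = {"necklace": 12.0, "perfume": 6.0, "drill": 3.0}
knn_scores = {"necklace": 0.82, "perfume": 0.80, "handbag": 0.70}
for doc, score in combine(bm25_scores, knn_scores):
    print(f"{doc:10s} {score:.3f}")
```

Without normalization, the raw BM25 values would swamp the cosine scores regardless of the weights, which is the problem the pipeline exists to solve.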

GET /products/_search?search_pipeline=hybrid-norm-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "search_text": "gift wife jewelry perfume"
          }
        },
        {
          "neural": {
            "product_vector": {
              "query_text": "romantic gift jewelry perfume for wife",
              "model_id": "<your_model_id>",
              "k": 50
            }
          }
        }
      ]
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        { "range": { "price": { "lte": 3000 } } },
        { "term": { "in_stock": true } },
        { "terms": { "recipient_tags": ["female", "unisex"] } }
      ]
    }
  },
  "size": 20
}

The 0.35/0.65 weighting favors semantic relevance over exact keyword match — tune this per vertical. Fashion/gifting categories benefit from higher semantic weight; spare-parts or electronics catalogs (where users search exact model numbers like "iPhone 15 128GB") benefit from higher lexical weight. This is worth A/B testing rather than guessing.
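
In application code the query body is usually assembled from the extracted intent rather than written by hand. The sketch below is one possible shape, with a hypothetical helper name. It differs from the query above in one deliberate way: the filters are placed inside each sub-query (a bool filter for the lexical clause, the neural query's filter parameter for the vector clause) instead of post_filter. A post_filter trims results after the top k have been chosen, so a strict budget filter can leave fewer than size hits; filtering inside the sub-queries avoids that, assuming your OpenSearch version supports filtered neural queries.

```python
def build_hybrid_query(raw_query: str, semantic_query: str, model_id: str,
                       filters: list, k: int = 50, size: int = 20) -> dict:
    """Assemble a hybrid (lexical + neural) search body with in-query filters."""
    return {
        "size": size,
        "query": {
            "hybrid": {
                "queries": [
                    {   # Lexical leg: BM25 on the denormalized text field.
                        "bool": {
                            "must": [{"match": {"search_text": raw_query}}],
                            "filter": filters,
                        }
                    },
                    {   # Semantic leg: the query text is embedded by the registered model.
                        "neural": {
                            "product_vector": {
                                "query_text": semantic_query,
                                "model_id": model_id,
                                "k": k,
                                "filter": {"bool": {"filter": filters}},
                            }
                        }
                    },
                ]
            }
        },
    }
```

The order of the two sub-queries must match the order of the weights in the search pipeline: lexical first, neural second.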

Step 7: Related-Product Recommendations

Once a user views or adds a product to cart, you can recommend related items using the same product embeddings, with no separate recommendation model needed for a first version. This is a "more like this" k-NN query using the source product's own vector:

GET /products/_search
{
  "query": {
    "knn": {
      "product_vector": {
        "vector": "<embedding_of_current_product>",
        "k": 12
      }
    }
  },
  "post_filter": {
    "bool": {
      "must_not": [{ "term": { "_id": "<current_product_id>" } }],
      "filter": [{ "term": { "in_stock": true } }]
    }
  }
}
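
The same request can be built in code once the source product's vector has been fetched (for example with GET /products/_doc/<id>?_source_includes=product_vector). The sketch below uses a hypothetical helper name and, as a variation on the query above, moves the exclusions into the k-NN query's filter so that the response still contains k items after the current product and out-of-stock items are removed. This assumes a FAISS or Lucene index on a version that supports filtered k-NN.

```python
def related_products_query(vector: list, product_id: str, k: int = 12) -> dict:
    """k-NN 'more like this' body that excludes the source product itself."""
    return {
        "size": k,
        "query": {
            "knn": {
                "product_vector": {
                    "vector": vector,
                    "k": k,
                    "filter": {
                        "bool": {
                            "filter": [{"term": {"in_stock": True}}],
                            "must_not": [{"ids": {"values": [product_id]}}],
                        }
                    },
                }
            }
        },
        # The raw vectors are large; leave them out of the response.
        "_source": {"excludes": ["product_vector"]},
    }
```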

For cross-sell rather than pure similarity (e.g., recommending a jewelry box alongside a necklace, rather than another necklace), a stronger approach layers co-purchase collaborative filtering on top. Index a second co_purchase_vector derived from user session/purchase graphs (e.g., via a nightly batch job using OpenSearch Ingestion's ML offline batch inference, or an Item2Vec model trained on order history), then blend it with the content-based vector using the same hybrid-combination technique shown above. Content embeddings answer "what looks similar"; co-purchase embeddings answer "what do people actually buy together." You generally want both.

Handling Query Latency and Cost

A few practical considerations that matter once you're past the prototype:

Cache the LLM intent extraction. Common queries ("gift for wife," "gift for mother," "birthday gift for husband") repeat heavily across users. Cache query → structured intent in something like ElastiCache/Redis with a TTL of a few hours; this removes the LLM call from the hot path for most traffic and cuts both latency and Bedrock cost substantially.
Avoid redundant query-time embedding. The neural query calls the embedding model synchronously on every search request. That is fine at moderate QPS, but at high traffic, consider caching embeddings for frequent, normalized queries and issuing a plain k-NN query with the cached vector.
Tune ef_search on HNSW to trade recall for latency depending on catalog size. Larger catalogs (>5M SKUs) may need to move away from plain HNSW, for example to an IVF-based FAISS index or a quantized encoder, to keep the memory footprint under control.
Build a fallback path. If the LLM intent-extraction call fails or times out, fall back to a pure hybrid query using the raw user query as both the lexical and semantic input. Never let the LLM layer become a hard dependency for search availability.
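
The caching and fallback points combine naturally into one wrapper around intent extraction. The sketch below is illustrative: TTLCache is an in-memory stand-in for Redis/ElastiCache, and normalize_query, fallback_intent, and resolve_intent are hypothetical names. The extractor argument is whatever function performs the LLM call, and it is assumed to enforce its own timeout (for instance through the client's read timeout) and raise on failure.

```python
import time
from typing import Callable, Optional


def normalize_query(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


class TTLCache:
    """In-memory stand-in for Redis/ElastiCache with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def fallback_intent(query: str) -> dict:
    """Degraded intent: raw query as the semantic query, no structured filters."""
    return {"semantic_query": query, "recipient": None, "occasion": None,
            "category_hints": [], "max_price": None, "gender_hint": None}


def resolve_intent(query: str, extractor: Callable[[str], dict], cache: TTLCache) -> dict:
    """Serve intent from cache when possible; never let extraction failures block search."""
    key = normalize_query(query)
    cached = cache.get(key)
    if cached is not None:
        return cached
    try:
        intent = extractor(query)
    except Exception:
        # Timeouts, throttling, or malformed JSON: search still runs, and the
        # degraded result is not cached.
        return fallback_intent(query)
    cache.set(key, intent)
    return intent
```

Not caching the degraded result matters: otherwise a brief Bedrock outage would pin popular queries to the unfiltered fallback for the whole TTL.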

Evaluating Relevance

Don't ship this on vibes. Before rolling out to production traffic:

Build a labeled eval set of ~200–500 realistic gifting/natural-language queries with human-judged relevant products (even a simple 0–3 relevance scale works).
Track NDCG@10 and Recall@20 for: (a) pure BM25 baseline, (b) pure neural, (c) hybrid at different weight splits, (d) hybrid + LLM query rewriting.
Expect hybrid + LLM rewriting to show its largest gains over pure neural on ambiguous, occasion-based queries like "gift for my wife," because the rewrite step surfaces category signal that the raw short query lacks. Pure neural tends to hold up best on descriptive queries ("cozy oversized sweater in earth tones"), where the raw query already carries rich semantic content. Treat both as hypotheses to confirm on your own eval set rather than as guarantees.
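
Both metrics are short enough to implement directly, which keeps the eval harness free of dependencies. The sketch below assumes judgments maps product IDs to the 0–3 grades described above and ranked_ids is the ordered result list from one search configuration. It uses linear gain (the grade itself) for NDCG; the exponential variant (2^grade - 1) is an equally common choice.

```python
import math


def dcg(gains: list) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids: list, judgments: dict, k: int = 10) -> float:
    """NDCG@k with linear gain; unjudged products count as grade 0."""
    gains = [judgments.get(doc, 0) for doc in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids: list, judgments: dict, k: int = 20, min_grade: int = 1) -> float:
    """Share of relevant products (grade >= min_grade) found in the top k."""
    relevant = {doc for doc, grade in judgments.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)
```

Average each metric over the whole query set per configuration, and compare configurations on the same set so the differences are attributable to the search setup rather than to the queries.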

The key insight is that natural-language e-commerce search isn't a single model problem — it's an orchestration problem. The LLM handles fuzzy intent extraction, embeddings handle semantic matching, and OpenSearch's hybrid pipeline arbitrates between semantic and lexical signal while hard filters keep the results commercially sane. None of these layers alone gets you from "gift for my wife" to a relevant, in-stock, budget-appropriate product list — but together, on infrastructure you already run for search, they do.