Semantic Search API

Status: active

A FastAPI service for semantic search over documents using S3 Vectors and Bedrock embeddings.

Python · FastAPI · S3 Vectors · Bedrock · Pydantic

Overview

A production-ready API for semantic search. Upload documents, automatically chunk and embed them, then search with natural language queries.

Architecture

Client → FastAPI → S3 Vectors
                 ↓
              Bedrock (Cohere Embed)

  1. Ingest: Documents chunked with overlap, embedded via Bedrock
  2. Index: Vectors stored in S3 Vectors with metadata
  3. Query: Query embedded, k-NN search, results ranked (see the sketch below)
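
A minimal sketch of the query path under stated assumptions: the embedding call uses the Cohere Embed v3 English model through the Bedrock runtime invoke_model API, while the s3vectors client name, the query_vectors operation, and its parameter names are recalled from the S3 Vectors preview SDK and should be checked against the current boto3 release; the bucket and index names are placeholders.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")
s3vectors = boto3.client("s3vectors")  # client name assumed from the S3 Vectors preview SDK

VECTOR_BUCKET = "example-vector-bucket"  # placeholder
INDEX_NAME = "documents"                 # placeholder

def embed_query(text: str) -> list[float]:
    # Cohere Embed on Bedrock distinguishes query vs. document embeddings via input_type
    response = bedrock.invoke_model(
        modelId="cohere.embed-english-v3",
        body=json.dumps({"texts": [text], "input_type": "search_query"}),
    )
    return json.loads(response["body"].read())["embeddings"][0]

def knn_search(query: str, top_k: int = 10) -> list[dict]:
    # k-NN search over the S3 Vectors index; parameter names are assumptions
    result = s3vectors.query_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        queryVector={"float32": embed_query(query)},
        topK=top_k,
        returnMetadata=True,
        returnDistance=True,
    )
    return result["vectors"]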

Key Features

  • Automatic chunking — Configurable chunk size and overlap (see the sketch after this list)
  • Metadata filtering — Filter by document type, date, tags
  • Hybrid search — Combine semantic and keyword matching
  • Batch processing — Async document ingestion
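
A minimal sketch of the overlapping chunker behind the automatic-chunking feature; the whitespace tokenization, the 512-token window, and the 64-token overlap are illustrative defaults rather than the service's actual settings, and chunk_text is a hypothetical helper, not the chunk_document used in the API below.

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    # Slide a fixed-size window over whitespace tokens, stepping by chunk_size - overlap
    # so content near a boundary lands in two chunks and stays searchable.
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks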

API Design

@app.post("/documents")
async def ingest_document(
    file: UploadFile,
    metadata: DocumentMetadata = Depends()
) -> IngestResponse:
    chunks = chunk_document(file)
    embeddings = await embed_batch(chunks)
    await index_vectors(embeddings, metadata)
    return IngestResponse(chunks=len(chunks))

@app.post("/search")
async def search(
    query: str,
    filters: SearchFilters = None,
    limit: int = 10
) -> SearchResponse:
    embedding = await embed(query)
    results = await vector_search(embedding, filters, limit)
    return SearchResponse(results=results)
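
A hedged usage sketch against these two endpoints, assuming the service is running locally on port 8000 and that httpx is available; the file name and query text are illustrative.

import httpx

with httpx.Client(base_url="http://localhost:8000") as client:
    # Ingest a document; DocumentMetadata fields arrive as query parameters via Depends()
    with open("handbook.pdf", "rb") as f:
        ingested = client.post("/documents", files={"file": f})
    print(ingested.json())  # e.g. {"chunks": 42}

    # Natural-language search over the indexed chunks
    hits = client.post("/search", params={"query": "vacation policy", "limit": 5})
    for result in hits.json()["results"]:
        print(result)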

Performance

  • p50 search latency: 45ms
  • p99 search latency: 120ms
  • Throughput: 500 queries/second (single instance)

The service scales horizontally: S3 Vectors handles the vector search, and the API layer itself is stateless.
