Semantic Search API
Status: active
A FastAPI service for semantic search over documents using S3 Vectors and Bedrock embeddings.
Tags: Python, FastAPI, S3 Vectors, Bedrock, Pydantic
Overview
A production-ready API for semantic search. Upload documents, automatically chunk and embed them, then search with natural language queries.
Architecture
Client → FastAPI → S3 Vectors
            ↓
  Bedrock (Cohere Embed)
- Ingest: Documents chunked with overlap, embedded via Bedrock
- Index: Vectors stored in S3 Vectors with metadata
- Query: Query embedded, k-NN search, results ranked
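The "chunked with overlap" part of the ingest step can be sketched as follows. This is a minimal character-based illustration, not the service's actual `chunk_document`; the name `chunk_text` and its default sizes are assumptions made for the example, and a real implementation might split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap preserves more context across boundaries but produces more chunks to embed and store.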
Key Features
- Automatic chunking — Configurable chunk size and overlap
- Metadata filtering — Filter by document type, date, tags
- Hybrid search — Combine semantic and keyword matching
- Batch processing — Async document ingestion
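One way to picture metadata filtering and hybrid search together is the in-memory sketch below. It is an illustration under stated assumptions, not how the service or S3 Vectors implements them: the `Chunk` type, the `alpha` weighting, the term-overlap keyword score, and the exact-match filter are all hypothetical, and the real k-NN search runs inside S3 Vectors rather than as a Python loop.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vector, chunks, filters=None, alpha=0.7, limit=10):
    """Return (score, chunk) pairs, best first.

    score = alpha * semantic similarity + (1 - alpha) * keyword overlap.
    Chunks whose metadata does not match every filter are skipped.
    """
    filters = filters or {}
    scored = []
    for chunk in chunks:
        if any(chunk.metadata.get(key) != value for key, value in filters.items()):
            continue
        semantic = cosine(query_vector, chunk.vector)
        lexical = keyword_score(query, chunk.text)
        scored.append((alpha * semantic + (1 - alpha) * lexical, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy data: two-dimensional vectors stand in for real embeddings.
corpus = [
    Chunk("vector search with s3", [1.0, 0.0], {"type": "guide"}),
    Chunk("billing faq", [0.0, 1.0], {"type": "faq"}),
    Chunk("search tips", [0.6, 0.8], {"type": "guide"}),
]
for score, chunk in hybrid_search("vector search", [1.0, 0.0], corpus, {"type": "guide"}):
    print(round(score, 2), chunk.text)
```

Setting `alpha` to 1.0 reduces this to pure semantic ranking, and 0.0 to pure keyword matching.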
API Design
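    from typing import Optional

    from fastapi import Depends, FastAPI, UploadFile

    # DocumentMetadata, SearchFilters, the response models, and the helper
    # functions used below are part of the service and are not shown here.
    app = FastAPI()

    @app.post("/documents")
    async def ingest_document(
        file: UploadFile,
        metadata: DocumentMetadata = Depends(),
    ) -> IngestResponse:
        # Split the upload into overlapping chunks, embed them, then index the vectors.
        chunks = chunk_document(file)
        embeddings = await embed_batch(chunks)
        await index_vectors(embeddings, metadata)
        return IngestResponse(chunks=len(chunks))

    @app.post("/search")
    async def search(
        query: str,
        filters: Optional[SearchFilters] = None,
        limit: int = 10,
    ) -> SearchResponse:
        # Embed the query, run the k-NN search, and return the ranked results.
        embedding = await embed(query)
        results = await vector_search(embedding, filters, limit)
        return SearchResponse(results=results)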
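The `embed_batch(chunks)` call in the ingest handler is not shown in this document. A rough sketch of how such a helper might batch chunks and limit concurrent embedding requests is given below. Everything here is an assumption for illustration: the name `embed_in_batches`, its parameters, and the `fake_embed` stand-in, which returns toy vectors instead of calling Bedrock.

```python
import asyncio
from typing import Awaitable, Callable

# Any async function that maps a list of texts to one vector per text.
EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


async def embed_in_batches(
    chunks: list[str],
    embed_fn: EmbedFn,
    batch_size: int = 16,
    max_concurrency: int = 4,
) -> list[list[float]]:
    """Embed chunks in fixed-size batches, at most max_concurrency at a time.

    The returned vectors are in the same order as the input chunks.
    """
    if not chunks:
        return []
    semaphore = asyncio.Semaphore(max_concurrency)
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

    async def run(batch: list[str]) -> list[list[float]]:
        async with semaphore:  # cap the number of in-flight requests
            return await embed_fn(batch)

    # gather() returns results in the order the batches were submitted.
    results = await asyncio.gather(*(run(batch) for batch in batches))
    return [vector for batch in results for vector in batch]


async def fake_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for the real embedding call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [[float(len(text))] for text in texts]


vectors = asyncio.run(
    embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
)
print(vectors)  # [[1.0], [2.0], [3.0], [4.0], [5.0]]
```

Capping concurrency with a semaphore is one common way to stay within an upstream service's rate limits while still overlapping requests.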
Performance
- p50 search latency: 45ms
- p99 search latency: 120ms
- Throughput: 500 queries/second (single instance)
The service scales horizontally: S3 Vectors handles the vector search, and the API itself is stateless.