# Prathamesh Saraf

> Senior Forward Deployed Engineer · GenAI Architect. I help enterprise teams ship production GenAI: voice agents, agentic workflows, RAG, and the infrastructure to make them stick.

Based in the US (remote). Contact: pratamesh1867@gmail.com.

## Profile
- [Portfolio](https://prathameshsaraf.com/)
- [LinkedIn](https://www.linkedin.com/in/sarafpr)
- [X / Twitter](https://x.com/S1LV3R_J1NX)
- [GitHub](https://github.com/S1LV3RJ1NX)
- [Toptal: Top 3% AI Specialization](https://www.toptal.com/developers/resume/prathamesh-saraf#PzaPn5)
- [Resume PDF](/resume.pdf)

## Engagements
- CVS Health × TrueFoundry, Senior Forward Deployed Engineer (Feb 2024 to present, Remote (USA))
- ChatOwl, Technical Lead · Founding Engineer (Dec 2022 to Jan 2024, Remote (USA))
- Indian Institute of Science (IISc), Graduate Researcher, Cloud Systems Lab (Aug 2021 to Apr 2024, Bangalore, India)
- Saarthi.ai, Chatbot Developer (Aug 2020 to Aug 2021, Bangalore, India)

## Case studies
- [Ferremundo AI: fine-tuned retrieval, image embeddings, and LangGraph ordering for B2B hardware distribution](https://prathameshsaraf.com/case-studies/ferremundo-ai/): An AI product-matching and ordering layer for a Latin American B2B hardware distributor. Fine-tuned multilingual E5 text embeddings, contrastively fine-tuned SigLIP-2 image search, a LangGraph agent with Postgres or Redis checkpointing, and a production stack on AWS RDS, ECS, and Datadog APM.
- [Praxium: source-grounded, narrated courses from any PDF](https://prathameshsaraf.com/case-studies/praxium/): A SaaS that ingests a PDF and produces a structured, narrated course: personas, ABCD learning objectives, failure-mode analysis, a hierarchical outline, source-grounded per-subsection content with verbatim reference snippets, inline micro-checks (matching, ordering, fill-blank), KaTeX equations, end-of-lesson MCQs with confetti, narrated Synthesia videos, and a retrieval-grounded chat over the source. Driven by a resumable, multi-state instructional-design state machine, with a multi-dimension eval harness that scores coverage, grounding, coherence, persona alignment, and MCQ quality.
- [Scalable, cost-effective voice agents: a platform-based blueprint](https://prathameshsaraf.com/case-studies/voice-agents-blueprint-cvs/): A hierarchical voice-agent platform for a Fortune-5 healthcare buyer handling millions of daily customer interactions. A Master Agent orchestrates specialized SLM- and LLM-powered sub-agents; a tiered model strategy plus high-precision intent classification cut a theoretical workload of tens of millions of LLM calls per day by over 90%. Public co-authored blueprint on the CVS Health Tech Blog, Feb 2026.
- [From IVR to agentic: multi-vector retrieval for pharmacy intent classification](https://prathameshsaraf.com/case-studies/ivr-to-agentic/): Replacing a BERT plus LLaMA hybrid intent classifier in CVS Health's pharmacy IVR with a multi-vector retrieval pipeline on Qdrant: BM25 sparse retrieval, a fine-tuned dense encoder, ColBERT late-interaction reranking, and an LLM as the final classifier and out-of-scope filter on the top five candidates. Weighted F1 from 0.58 to 0.86 across roughly 1.5M daily customer interactions and 32 distinct pharmacy intents. The public retrospective is on the CVS Health Tech Blog.
- [MCP Gateway Catalog: one catalog, many tools, unified auth](https://prathameshsaraf.com/case-studies/mcp-registry/): A live product surface that lets any team browse 47+ Model Context Protocol servers, complete OAuth or API-key handshakes, and try every tool from the browser, without a local agent host.
- [MCP-Guardian: putting MCP on a diet](https://prathameshsaraf.com/case-studies/mcp-guardian/): An MCP proxy that replaces hundreds of tool schemas with three meta-tools, cutting 160k+ startup tokens to 456 (a 99.7% reduction), and adds scoping, audit, and OAuth-aware fan-out to any upstream MCP server, with no client changes.
- [TrueMem: a model-agnostic memory layer for AI applications](https://prathameshsaraf.com/case-studies/truemem/): A persistent, two-tier (short-term + long-term) memory service for LLM applications. Distilled facts replace verbatim history, semantic retrieval surfaces what matters, and the same memory follows users across any model.
- [CogenticAI DB Agent: natural-language database queries for an enterprise SaaS](https://prathameshsaraf.com/case-studies/db-agent/): A production-grade NL-to-SQL agent built with Google ADK and FastAPI: a SQL Generator, a SQL Executor with retry-with-feedback, and a Response Generator orchestrated as a sequential + loop agent. Converts business questions into safe, read-only Postgres queries and natural-language answers.
- [AIME: meeting intelligence and voice agents, end-to-end](https://prathameshsaraf.com/case-studies/aime-meetings/): An AI meeting and voice-agent platform (capture, transcribe, summarize, retrieve, and run live agents on top) built as a clean separation of a Python backend and a modern web operator console so each side can evolve independently.
- [Yukti: a workable, end-to-end RAG stack](https://prathameshsaraf.com/case-studies/yukti-rag/): A complete RAG application built to be understood: ingestion, embeddings, retrieval, an evaluation harness, and a UI. Small enough to read, real enough to deploy.

## Publications
- [CARL: Cost-Optimized Online Container Placement on VMs Using Adversarial Reinforcement Learning](https://ieeexplore.ieee.org/abstract/document/10839070/), IEEE Transactions on Cloud Computing, 2025

## Featured writing
- [Building Scalable and Cost-Effective Voice Agents: A Platform-Based Blueprint](https://medium.com/cvs-health-tech-blog/building-scalable-and-cost-effective-voice-agents-a-platform-based-blueprint-fae6ee5881c9), CVS Health Tech Blog
- [TrueMem: Building a Model-Agnostic Memory Layer for AI](https://www.truefoundry.com/blog/truemem-building-a-model-agnostic-memory-layer-for-ai), TrueFoundry engineering blog · byline
- [Transforming Customer Interactions: Evolving IVR Systems for Enhanced Experiences](https://medium.com/cvs-health-tech-blog/transforming-customer-interactions-evolving-ivr-systems-for-enhanced-experiences-73b12c7f5aea), CVS Health Tech Blog
- [My Adventures with LLMs](https://leanpub.com/adventures-with-llms), Leanpub · book

## Open source
- [MCP-Guardian](https://github.com/S1LV3RJ1NX/mcp-guardian): MCP proxy that replaces hundreds of tool schemas with three meta-tools, a 99.7% startup-token reduction with scoping, audit, and OAuth-aware fan-out. Talk at Linux Foundation MCP Dev Summit Bengaluru, June 2026.
- [MAL-Code](https://github.com/S1LV3RJ1NX/mal-code): Companion code for the book *My Adventures with LLMs*. Transformers to DeepSeek in PyTorch, from scratch.
- AIME: AI meeting intelligence and voice agents, end-to-end: capture, transcribe, summarize, retrieve, and run live agents on top, with a self-hosted meeting bot for ingest.
- Yukti: A complete, end-to-end RAG stack built to be understood: ingestion, embeddings, retrieval, an eval harness, and an operator console. Small enough to read, real enough to deploy.
- [PaymentTracking](https://github.com/S1LV3RJ1NX/PaymentTracking): PWA expense and income tracker for freelance sole-proprietors. Claude-powered OCR over invoices and FIRA certificates, live Google Sheets ledger, India-tax calculator (Sec 44ADA, new regime), all on Cloudflare Pages and R2.
- [Cognita](https://github.com/truefoundry/cognita): Open-source RAG framework I co-built at TrueFoundry. Production-ready primitives for ingestion, retrieval, and serving.

## Optional
- [Full content (llms-full.txt)](https://prathameshsaraf.com/llms-full.txt)