kalyan
research papers before graduation, production AI after --
I build systems that think, remember, and ship.
everything else is configuration.
Experience
5 published papers, then production AI at Guidewire -- voice agents, claim summarizers, knowledge graphs. I build things that work for real users, not things that work in demos.
- Cut LLM claim-summarization latency materially by tracing every capture point end to end and slimming payloads
- Built a multi-judge LLM eval framework: 5 judge models across 5 families, 5 metrics per field, 95% confidence intervals, automated HTML reports
- Shipped a production claim-summarization agent end to end -- data model, persistence, sync REST API, event-trigger wiring
- Built and shipped a production event-scheduling agent end to end -- Slack intake, scheduling, room/resource booking, ServiceNow ticketing, reminders, and feedback -- as a stateful LangGraph workflow (FastAPI + Postgres)
- Range: a real-time voice claims-intake prototype, a voice (MCP) capability merged into the AI developer platform, and an AI-engineering competency framework I authored
- Built deep learning models for underwater object detection using VLMs on Side-Scan Sonar imagery
- Co-authored and published TRAM: Transformer-Based Mask R-CNN for sonar data
- Cut LLM API cost ~20% on a production SaaS (AWS) by batching and right-sizing model calls instead of one large request per task
- Built MedGPT -- diagnostic AI deployed at AIMS Kochi Hospital, improved accuracy from 30% to 80%
- Co-authored 4 research papers across medical AI, computer vision, and NLP
What I Do
Agent Architectures
Multi-agent orchestration, tool use, memory, voice pipelines. LangGraph, MCP, Claude Code agents.
AI Evaluation & Quality
Eval frameworks, prompt versioning, LLM-as-judge, CI/CD for agents. promptfoo, multi-judge eval harnesses, anchored rubrics with confidence intervals.
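To make "multi-judge with confidence intervals" concrete, here is a minimal sketch of how scores from several judge models could be pooled into a mean with a 95% interval. It is an illustration under stated assumptions, not the framework itself: the judge names, the 0-1 score scale, the `mean_with_ci` helper, and the normal approximation are all hypothetical choices made for this example. With only a handful of judges, a t-quantile or a bootstrap would give a wider and more honest interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, low, high) for a list of judge scores.

    Uses a normal approximation to the sampling distribution of the mean.
    This is an assumption of the sketch, chosen to stay in the standard library.
    """
    if len(scores) < 2:
        raise ValueError("need at least two judge scores")
    m = mean(scores)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


# Hypothetical scores from five judges for one field on one metric.
judge_scores = {
    "judge_a": 0.80,
    "judge_b": 0.90,
    "judge_c": 0.70,
    "judge_d": 0.85,
    "judge_e": 0.75,
}

m, low, high = mean_with_ci(list(judge_scores.values()))
print(f"mean={m:.3f}  95% CI=[{low:.3f}, {high:.3f}]")
```

Reporting the interval next to the mean shows how much the judges disagree, which a single averaged score hides.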
Enterprise AI Integration
MCP servers, enterprise knowledge search, insurance claim processing. ClaimCenter and InsuranceNow integrations.
Cognitive Tooling
Knowledge graph (400+ entities), 30+ custom skills, session hooks, persistent memory. Systems engineering for thinking.
Research to Production
5 published papers, then shipped to real hospitals and real users. IISc Bangalore, AIMS Kochi, Amrita University.
Workshops & Enablement
Agent-building workshop (3 agent types), AI Agents & MCPs tech talk, ML Bootcamp co-lead (500+ participants).
The Lab
agam
FEATURED: Published open-source memory system for AI coding agents. Built on a contrarian bet: proactively inject relevant context instead of retrieving it on demand. Sole author and maintainer, 300+ tests.
Cognitive OS
The private system agam is distilled from. Multi-agent setup on Claude Code: 30+ custom skills, 14 specialized agents, SQLite knowledge graph with 400+ entities, session hooks, and memory that persists across conversations.
Voice FNOL
FEATURED: Voice-based insurance claim intake. Parallel LLM pipeline (conversation + extraction), local STT via mlx-whisper, on-device TTS, 3D audio-reactive visualizer.
MedGPT
Diagnostic AI deployed at AIMS Kochi Hospital. Improved accuracy from 30% to 80% across clinical workflows.
VoiceAI
Automated customer interactions -- product explanations, event registration. Integrated with Indian IVR systems.
Kanaku
AI-powered dead stock management for FMCG distributors. 221 real products, 74 stores, statistical ML scoring, WhatsApp integration. Deployed on EC2.
JobScrapper
AI job matching pipeline. Two-layer scoring (heuristic + semantic), automated scraping, email alerts for subscribers.
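A rough sketch of what two-layer scoring can look like: a cheap heuristic layer (skill keyword coverage) blended with a similarity layer. Everything here is an assumption made for illustration, including the function names, the equal weighting, and the bag-of-words cosine similarity, which stands in for a real embedding model so the example runs with the standard library only. It is not JobScrapper's actual scoring code.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; keeps '+' and '#' so 'c++' and 'c#' survive."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def heuristic_score(job_text, wanted_skills):
    """Layer 1: fraction of wanted skills that appear in the job text."""
    if not wanted_skills:
        return 0.0
    found = set(tokens(job_text))
    return sum(skill.lower() in found for skill in wanted_skills) / len(wanted_skills)


def semantic_score(text_a, text_b):
    """Layer 2 stand-in: cosine similarity of bag-of-words count vectors."""
    va, vb = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def match_score(job_text, profile_text, wanted_skills, weight=0.5):
    """Blend both layers; `weight` is the share given to the heuristic layer."""
    return weight * heuristic_score(job_text, wanted_skills) + (
        1 - weight
    ) * semantic_score(job_text, profile_text)


# Hypothetical inputs for illustration.
profile = "Backend engineer with Python and FastAPI experience"
skills = ["python", "fastapi"]
good_job = "Python backend engineer building FastAPI services"
other_job = "Retail store manager"

print(match_score(good_job, profile, skills))
print(match_score(other_job, profile, skills))
```

The heuristic layer is cheap enough to run on every scraped posting, so it can also serve as a pre-filter before the more expensive similarity step.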
SheLaw
Legal assistance platform for women. RAG with multi-query retrieval over legal documents.
Research
5 published papers across computer vision, NLP, and medical AI. Research at IISc Bangalore and Amrita University.
TRAM: Transformer-Based Mask R-CNN for Underwater Object Detection in Side-Scan Sonar Data
MedFlorence2: Fine-tuning Small VLMs for Medical Question Answering
MultiView Material Classification
Estimation of Chronic Academic Stress using Short Form Video Contents
Early Detection of Cerebral Palsy in Children with GAIT
Gender Bias Mitigation in LLMs
The Stack
agent-architectures
Multi-agent systems with tool use, memory, and orchestration. Built 14 specialized agents and a knowledge graph that tracks 400+ entities across projects.
production-ai
LLM eval frameworks, prompt versioning, voice AI pipelines, RAG with pgvector. Systems that handle real insurance claims and real hospital diagnostics.
full-stack
Python (FastAPI, Django) and TypeScript (Next.js, Fastify). Most things I build have a backend talking to a database and a frontend someone actually uses.
tooling
MCP servers, CLI tools, automation bots, Claude Code skills. I build the tools I wish existed, then use them daily.