Applied AI Engineer. Production ML systems. Frustrated by the missing infrastructure layer — so I started building it.
Over the past year, I kept running into the same problem across different parts of the stack — evaluation, serving, prompt optimization, agent tracing. The models are good. The tooling around using them well in production isn't. Aevyra is my attempt to build what's missing.
Started on TorchServe — #4 all-time contributor, 177+ merged PRs. Led integration of TensorRT-LLM, vLLM, torch.compile, and torch.export. Architected a 75% cost reduction in multi-model GPU inference. Established TorchServe as the industry standard for PyTorch serving at scale (4.4K+ stars).
Moved to the Enterprise Llama team, deploying custom Llama solutions for organizations across different verticals: fine-tuning on proprietary datasets, data and model distillation, and architectural customizations. Led technical strategy for Llama adoption across Meta's enterprise and partner ecosystems. Speaker at Google Cloud Next '25. Active contributor to Llama Cookbook (18.2K+ stars).
Prototyped production computer vision systems — instance segmentation, object detection and tracking, pose estimation, gesture recognition. Built the applied ML competency that translated directly into LLM inference work at Meta.
Layer 2 forwarding in Cisco's Nexus 7000 and 9000 data center switches. Built container telemetry infrastructure using Docker, Contiv, and Kibana. Early work at the intersection of systems engineering and machine learning for network serviceability.
RTL verification and design for image processors and graphics processors in a 3G modem SoC. The hardware foundation that informs how I think about performance and systems today.
If you're deploying agents in production and hitting walls — with tracing, evaluation, debugging, or knowing which model is actually right for your task — I'd like to hear about it.