Lead/Staff AI Runtime Engineer - LLM/PyTorch
Bangalore
Talent Pro
...model lifecycle: training, checkpointing, fault recovery, packaging, and deployment.
- Implement observability hooks, diagnostics, and resilience mechanisms for deep learning workloads.
- Champion best practices in CI/CD, testing, and software quality across the AI Runtime stack.
Collaborate & Mentor:
- Work cross-functionally [...]
Category: IT & Telecommunications