Staff AI Runtime Engineer
Bangalore
Scaling Theory Technologies Pvt Ltd
...and inference at scale.
- Design resilient and elastic runtime features (e.g. dynamic node scaling, job recovery) within our custom PyTorch stack.
- Optimise distributed training reliability, orchestration, and job-level fault tolerance.
- Profile and enhance low-level system performance across training and inference pipelines.
- [...]
Category: IT & Telecommunications