Machine Learning Engineer (Distributed Training)

GW127
  • $250,000-$400,000
  • Santa Clara, CA
  • Permanent

About the job


Machine Learning Engineer (Distributed Training)


We are seeking a Machine Learning Engineer focused on distributed training to train, accelerate, and deploy specialized foundation models for a $50M+-funded, later-stage scale-up building the world's leading 3D foundation models.


The team was founded on the back of state-of-the-art MIT research in computer graphics; the founding team is a mix of some of the most-cited researchers in this field globally and commercially experienced veterans from companies like Nvidia and Microsoft. You'll be joining an elite team of engineers and researchers.


You'll join to build and optimize the world's largest 3D-native ML systems, working at the lower levels of the stack to build an end-to-end ML framework spanning pretraining through training, quantization, and inference. You'll work closely with model researchers on the in-house foundation models, optimizing for throughput and efficiency.


We are seeking a Machine Learning Engineer (Distributed Training) with:


  • Experience with low-level training optimizations, including quantization and parallelism
  • Demonstrable experience in low-level training systems, which could include attention and softmax work; this may include kernel experience
  • Expertise in PyTorch


Location: Bay Area, on a hybrid basis


Compensation: Substantial cash and generous equity, plus potential for sign-on bonuses


Tom Parker, Senior Software Systems & HPC Recruiter

Apply for this role