Research Engineer, Infrastructure, Inference
$350k - $475k • San Francisco
Posted 1mo ago
About the job
Thinking Machines Lab is seeking an infrastructure research engineer to design, optimize, and scale the systems that power large AI models. The goal is to make inference faster, more cost-effective, more reliable, and more reproducible, enabling research teams to focus on advancing model capabilities. This role is crucial for ensuring that every experiment, evaluation, and deployment runs smoothly at scale, with a focus on performant and efficient model inference for both real-world applications and research acceleration.
Responsibilities
- Bring cutting-edge AI models into production in collaboration with researchers and engineers.
- Enable high-performance inference for novel architectures by collaborating with research teams.
- Design and implement new techniques, tools, and architectures to improve performance, latency, throughput, and efficiency.
- Optimize the codebase and compute fleet (e.g., GPUs) to maximize utilization of hardware FLOPs, bandwidth, and memory.
- Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Establish standards for reliability, observability, and reproducibility across the inference stack.
- Publish and share learnings through internal documentation, open-source libraries, or technical reports to advance scalable AI infrastructure.
Requirements
- Bachelor's degree or equivalent experience in computer science, engineering, or a related field.
- Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their system architectures.
- Experience with inference serving systems optimized for throughput and latency (e.g., SGLang, vLLM).
- Ability to thrive in a highly collaborative environment with cross-functional partners.
- Proactive mindset and willingness to work across different stacks and teams.
- Strong engineering skills with the ability to contribute performant, maintainable code and debug complex codebases.
- Experience training or supporting large-scale language models (preferred).
- Understanding of distributed compute systems, GPU parallelism, and hardware-aware optimizations (preferred).
- Contributions to open-source ML or systems infrastructure projects (e.g., SGLang, vLLM, PyTorch, Triton, DeepSpeed, XLA) (preferred).
- Track record of improving research productivity through infrastructure design or process improvements (preferred).
Benefits
- Generous health, dental, and vision benefits
- Unlimited PTO
- Paid parental leave
- Relocation support