Research Engineer, Infrastructure, Kernels

$350k - $475k • San Francisco

Posted 1mo ago

Job Location

San Francisco

Tech Stack

OpenAI, Mistral, LLM, PyTorch, Deep Learning, JAX, CUDA, GPU, CuTe, Triton, ML kernels, distributed systems, low-precision arithmetic

Remote Work Policy

On-site

About the job

Thinking Machines Lab is seeking an infrastructure research engineer to design, optimize, and maintain the compute foundations for large-scale language model training. This role involves developing high-performance ML kernels, enabling efficient low-precision arithmetic, and improving the distributed compute stack. You will work closely with researchers and systems architects to bridge algorithmic design and hardware efficiency, prototype new kernel implementations, and define numerical and parallelism strategies for scaling AI systems.

Responsibilities

Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations, optimized for modern GPU and accelerator architectures.
Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.
Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.
Develop and maintain a library of reusable kernels and performance benchmarks for internal model training.
Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.
Document and share insights through internal talks, technical papers, or open-source contributions.

Requirements

Bachelor’s degree or equivalent experience in a relevant technical field.
Strong engineering skills with the ability to contribute performant, maintainable code and debug complex codebases.
Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.
Ability to thrive in a highly collaborative environment with cross-functional partners.
A proactive mindset, with the initiative to work across different stacks and teams.
Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.
Experience training or supporting large-scale language models (preferred).
Experience developing or tuning kernels for deep learning frameworks (preferred).
Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks (preferred).
Experience implementing low-precision formats or contributing to related compiler stacks (preferred).
Contributions to open-source GPU, ML systems, or compiler optimization projects (preferred).

Benefits

Health, dental, and vision benefits
Unlimited PTO
Paid parental leave
Relocation support
