Research Engineer, Infrastructure, Kernels

$350k - $475k, San Francisco

Posted 1mo ago

Remote Work Policy

On-site

Categories

AI Infrastructure Engineer

About the job

Thinking Machines Lab is seeking an infrastructure research engineer to design, optimize, and maintain the compute foundations for large-scale language model training. This role involves developing high-performance ML kernels, enabling efficient low-precision arithmetic, and improving the distributed compute stack. You will work closely with researchers and systems architects, bridging algorithmic design with hardware efficiency, prototyping new kernel implementations, and defining numerical and parallelism strategies for scaling AI systems.

Responsibilities

  • Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations, optimized for modern GPU and accelerator architectures.
  • Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.
  • Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.
  • Develop and maintain a library of reusable kernels and performance benchmarks for internal model training.
  • Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.
  • Document and share insights through internal talks, technical papers, or open-source contributions.

Requirements

  • Bachelor’s degree or equivalent experience in a relevant technical field.
  • Strong engineering skills with the ability to contribute performant, maintainable code and debug complex codebases.
  • Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.
  • Ability to thrive in a highly collaborative environment with cross-functional partners.
  • Proactive mindset, with a willingness to take initiative and work across different stacks and teams.
  • Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
  • Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.
  • Experience training or supporting large-scale language models (preferred).
  • Experience developing or tuning kernels for deep learning frameworks (preferred).
  • Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks (preferred).
  • Experience implementing low-precision formats or contributing to related compiler stacks (preferred).
  • Contributions to open-source GPU, ML systems, or compiler optimization projects (preferred).

Benefits

  • Health, dental, and vision benefits
  • Unlimited PTO
  • Paid parental leave
  • Relocation support
