Software Engineer, Data Infrastructure

$350k - $475k · San Francisco

Posted 1mo ago

Remote Work Policy

On-site

Categories

AI Infrastructure Engineer

About the job

Thinking Machines Lab is seeking an engineer to join a high-impact team focused on data infrastructure. This role is crucial for architecting and scaling the core systems that power distributed training pipelines, multimodal data catalogs, and intelligent processing of petabytes of data. You will work directly with researchers to accelerate experiments, develop new datasets, enhance infrastructure efficiency, and derive key insights from our data assets. If you are passionate about distributed systems, large-scale data mining, and building foundational tools from the ground up, we encourage you to apply.

Responsibilities

  • Design, build, and operate scalable, fault-tolerant infrastructure for LLM research, including distributed compute, data orchestration, and storage across modalities.
  • Develop high-throughput systems for data ingestion, processing, and transformation, covering training data catalogs, deduplication, quality checks, and search.
  • Build systems for traceability, reproducibility, and robust quality control throughout the data lifecycle.
  • Implement and maintain monitoring and alerting systems to ensure platform reliability and performance.
  • Collaborate with research teams to enable new features, improve data quality, and expedite training cycles.

Requirements

  • Bachelor's degree or equivalent experience in computer science, engineering, or a related field.
  • Proficiency in at least one backend language, such as Python or Rust.
  • Fluency in distributed compute frameworks like Apache Spark or Ray.
  • Deep familiarity with cloud infrastructure, data lake architectures, and batch/streaming pipelines.
  • Comfort operating across the full stack and owning projects end-to-end.
  • Ability to thrive in a highly collaborative environment with cross-functional partners and subject matter experts.
  • A proactive approach with a bias for action to drive initiatives across different stacks and teams.
  • Hands-on experience with Kafka, dbt, Terraform, and Airflow is preferred.
  • Experience building a web crawler is a plus.
  • Extensive experience in scaling deduplication, data mining, and search is beneficial.
  • Strong knowledge of file formats and storage systems (e.g., Parquet, Delta Lake) and their impact on performance and scalability.
  • A proactive approach to documentation, testing, and empowering teammates with good tooling.

Benefits

  • Generous health, dental, and vision benefits
  • Unlimited PTO
  • Paid parental leave
  • Relocation support
