Infrastructure Engineer, Security

$200k - $475k San Francisco

Posted 1mo ago

Remote Work Policy

On-site

Categories

AI Infrastructure Engineer

About the job

Thinking Machines Lab is seeking an infrastructure engineer to lead and strengthen the security infrastructure for its foundation models. The role spans compute, storage, networking, and data platforms, with the goal of keeping systems secure, reliable, and scalable. The engineer will define security controls, architecture, and tooling, and build security into the platform by default. Close collaboration with research and product teams is key to enabling rapid progress while maintaining robust protection for models, data, and environments.

Responsibilities

  • Architect security patterns for platforms and services, including network segmentation, service-to-service authentication, RBAC, and policy enforcement in Kubernetes and cloud environments.
  • Manage identity, access, and secrets for humans and services, covering workload and cross-cloud identity, least-privilege IAM, and secrets management.
  • Build secure platforms for data ingestion, processing, and curation, implementing classification, encryption, access controls, and safe sharing patterns.
  • Develop threat models and review designs with researchers and engineers so that features and experiments ship safely and at scale.
  • Automate security checks and establish guardrails through policy-as-code, secure infrastructure baselines, CI/CD validation, and user-friendly security tools.

Requirements

  • Bachelor’s degree or equivalent experience in engineering or a related field.
  • Strong background in containers and orchestration (e.g., Kubernetes) and their security (namespaces, network policies, pod security, admission controls).
  • Practical experience with Infrastructure as Code (Terraform or similar) for provisioning networks, IAM, and shared services.
  • Solid understanding of cloud networking and security concepts (VPCs, load balancers, service discovery, mTLS, firewalls, zero-trust architectures).
  • Proficiency in a systems language such as Rust, plus Python scripting, for building platform components and tools.
  • Demonstrated experience owning complex, production-critical systems and debugging cross-layer issues.
  • Experience with ML infrastructure, GPU clusters, or large-scale training environments is preferred.
  • Background in AI labs, HPC environments, or ML-heavy organizations is preferred.
  • Experience profiling and tuning high-throughput systems is preferred.
  • Familiarity with securing specialized hardware (GPUs, TPUs) and their integration into pipelines is preferred.

Benefits

  • Generous health, dental, and vision benefits
  • Unlimited PTO
  • Paid parental leave
  • Relocation support

© 2026 AI Job Board. All rights reserved.