Stability AI
https://stability.ai
Open positions (6)
Generative AI Inference Engineer
Stability AI · United States
About the role:
We are seeking passionate Machine Learning Engineers to join our Inference team, focusing on the creative applications of generative AI models. The ideal candidate will have substantial experience developing and running inference for multi-modal models. A deep understanding of diffusion model architectures and familiarity with workflow tools like ComfyUI are a big plus. You will be expected to apply and push the boundaries of state-of-the-art inference optimization techniques for multi-modal generative models. This role offers the opportunity to work alongside top researchers and engineers, using cutting-edge high-performance computing resources to make a significant impact in the rapidly evolving field of generative AI.

Responsibilities:
- Lead efforts to drive the design and development of customer-facing multi-modal ML inference systems.
- Work with the Platform and Inference teams on building inference systems for the next generation of models, in areas such as optimization, model tuning, and deployment.
- Partner with leading cloud providers to deliver hosted Stability AI inference solutions.
- Be a strategic thought partner for leaders across the organization on driving business impact through machine learning.
- Be part of the team that brings new Stability models and pipelines into existence.
- Prototype and productionize inference platform improvements and new features.

Qualifications:
- 7+ years productionizing machine learning systems, including inference pipeline development.
- Expert-level knowledge of writing and running Python services at scale.
- 5+ years working with the Python scientific stack, PyTorch, and at least one high-performance inference framework (e.g., Triton or TensorRT).
- Deep understanding of diffusion architectures.
- Experience profiling and optimizing deep neural networks on NVIDIA GPUs, using profiling tools such as NVIDIA Nsight.
- Experience with Python-based image manipulation/encoding/decoding frameworks, such as OpenCV.
- Experience deploying to cloud orchestration systems such as Kubernetes and to cloud providers such as AWS, GCP, and Azure.
- Experience with Docker.
- Ability to rapidly prototype solutions and iterate on them under tight product deadlines.
- Strong communication, collaboration, and documentation skills.
- Experience with the open-source ML ecosystem (HuggingFace, W&B, etc.).

Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
2mo ago
Research Scientist – Controlled 3D Generation
Stability AI · Remote
Location: Remote

About the Role
We’re seeking a Research Scientist passionate about 3D generation, flow matching, and diffusion models. You’ll help advance the frontier of controllable 3D content creation, building models that generate consistent, editable, and physically grounded 3D assets and scenes.

What You’ll Do
- Conduct cutting-edge research on flow-matching, diffusion, and score-based methods for 3D generation and reconstruction.
- Design and implement scalable training pipelines for controllable 3D generation (meshes, Gaussians, NeRFs, voxels, implicit fields).
- Develop techniques for conditioning and control (text, sketch, pose, camera, physics) and multi-view consistency.
- Analyse model behaviour through ablations, visualisations, and quantitative metrics.
- Collaborate with cross-disciplinary research, graphics, and infrastructure teams to translate research into production-ready systems.
- Publish results at top-tier venues and work with interns.

What You Bring
- PhD (or equivalent experience) in Machine Learning, Computer Vision, or Computer Graphics.
- Published work on diffusion, flow-matching, or score-based generative models (2D or 3D).
- Strong engineering and problem-solving abilities: experience with PyTorch, JAX, or CUDA-level optimisation.
- Understanding of 3D representations (meshes, Gaussians, signed-distance fields, volumetric grids, implicit networks).
- Solid grasp of geometry processing, multi-view consistency, and differentiable rendering.
- Ability to scale experiments efficiently and communicate complex results clearly.

Bonus / Preferred
- Experience generating coherent 3D scenes with multiple interacting objects, lighting, and spatial layout.
- Familiarity with scene-level control (object placement, camera path, simulation, or text-to-scene composition).
- Knowledge of video-to-3D, image-to-scene, or 4D temporal generation.
- Background in physically-based rendering, simulation, or world-model architectures.
- Track record of impactful publications or open-source releases.

Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
2mo ago
Multimodal Generative AI Researcher
Stability AI · Remote
Location: Remote

About the Role
We’re looking for a Research Scientist with deep expertise in training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs) for downstream multimodal tasks. You’ll help push the next frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.

What You’ll Do
- Design and fine-tune large-scale VLMs / LLMs, and hybrid architectures, for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
- Build robust, efficient training and evaluation pipelines (data curation, distributed training, mixed precision, scalable fine-tuning).
- Conduct in-depth analysis of model performance: ablations, bias / robustness checks, and generalisation studies.
- Collaborate across research, engineering, and 3D / graphics teams to bring models from prototype to production.
- Publish impactful research and help establish best practices for multimodal model adaptation.

What You Bring
- PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.
- Strong engineering mindset: you can design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Awareness of 3D-aware multimodal models that use NeRFs, Gaussian splatting, or differentiable renderers for grounded reasoning and 3D scene understanding.
- Hands-on experience with PyTorch / DeepSpeed / Ray and distributed or mixed-precision training.
- Excellent communication skills and a collaborative mindset.

Bonus / Preferred
- Experience integrating 3D and graphics pipelines into training workflows (e.g., mesh or point-cloud encoding, differentiable rendering, 3D VLMs).
- Research or implementation experience with vision-language-action models, world-model-style architectures, or multimodal agents that perceive and act.
- Familiarity with efficient adaptation methods: LoRA, adapters, QLoRA, parameter-efficient fine-tuning, and distillation for edge deployment.
- Knowledge of video and 4D generation trends, latent diffusion / rectified flow methods, or multimodal retrieval and reasoning pipelines.
- Background in GPU optimisation, quantisation, or model compression for real-time inference.
- Open-source or publication track record in top-tier ML / CV / NLP venues.

Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
4mo ago