About LockedIn AI
LockedIn AI is the #1 real-time AI interview and meeting copilot, trusted by over 1 million users worldwide. We are building a next-generation AI career platform that helps users succeed in interviews, coding assessments, and professional communication through real-time AI assistance.
Our system runs on complex, high-performance AI infrastructure that must scale reliably, globally, and in real time.
Role Overview
We are hiring a cloud-native AI Cloud Engineer to design, build, and optimize the infrastructure powering LockedIn AI’s machine learning systems and real-time AI products.
This is a specialized role at the intersection of cloud engineering, distributed systems, and AI infrastructure. You will own the environments where models are trained, fine-tuned, deployed, and served at scale to over 1 million users.
You will own the full AI cloud stack, from GPU compute clusters to inference serving infrastructure and cost-optimized scaling systems.
Key Responsibilities
AI Cloud Architecture
- Design cloud-native infrastructure for AI/ML workloads
- Build GPU clusters for training, fine-tuning, and evaluation
- Architect multi-environment setups (training, staging, production)
- Optimize AWS/GCP/Azure systems for AI performance
Inference & Model Serving Infrastructure
- Deploy and manage real-time AI inference systems (LLMs, STT, RAG)
- Optimize serving frameworks like vLLM, Triton, TensorRT, or TGI
- Improve latency, throughput, batching, and GPU utilization
- Build failover, routing, and load balancing for AI endpoints
GPU & Distributed Compute
- Manage GPU infrastructure for distributed training and inference
- Configure multi-node training and model parallelism
- Optimize scheduling for spot and reserved GPU instances
- Automate scaling of compute resources based on demand
Cloud Cost Optimization (FinOps for AI)
- Reduce cloud spend across GPU, storage, and inference workloads
- Implement reserved/spot instance strategies for AI training
- Track cost-per-inference and cost-per-training-job metrics
- Optimize LLM API usage and caching strategies
Infrastructure as Code & Automation
- Build all infrastructure using Terraform, Pulumi, or CloudFormation
- Automate provisioning of AI environments and services
- Implement GitOps workflows for cloud infrastructure
- Ensure reproducible, version-controlled cloud systems
Observability & Reliability
- Monitor GPU health, inference latency, and system performance
- Build dashboards for AI infrastructure metrics
- Set up alerting for failures, spikes, and performance degradation
- Ensure high availability of real-time AI systems
Security & Networking
- Design secure cloud networks (VPC, IAM, encryption, access control)
- Protect model weights, embeddings, and AI pipelines
- Ensure compliance readiness (SOC 2, GDPR, CCPA)
- Secure inference endpoints and data flows
Required Qualifications
Experience
- 3+ years in cloud engineering, DevOps, or infrastructure roles
- Experience with ML/AI production systems
- Hands-on GPU infrastructure or AI deployment experience
- Experience working with engineering and AI teams in production environments
Technical Skills
- Cloud platforms: deep expertise in AWS, GCP, or Azure
- Kubernetes and Docker (production-grade usage)
- Infrastructure as Code: Terraform, Pulumi, or CloudFormation
- AI serving frameworks: vLLM, Triton, TensorRT, or similar
- Monitoring tools: Prometheus, Grafana, Datadog, CloudWatch
- Python, Go, or Bash for automation and tooling
Preferred Qualifications
- Experience with LLM inference at scale
- Distributed training systems (multi-GPU / multi-node)
- Real-time systems (WebSockets, streaming, low-latency APIs)
- Knowledge of RDMA, InfiniBand, or GPU networking
- Multi-cloud infrastructure experience
- Background in SaaS, edtech, or AI consumer products
What We Offer
- Equity in a fast-growing AI company
- Direct impact on a product used by 1M+ users
- Remote-first flexibility with optional NYC collaboration
- High ownership over AI infrastructure systems
- Fast-paced startup environment with real technical challenges
Why Join LockedIn AI?
- Build infrastructure powering real-time AI at scale
- Work on GPU-heavy, latency-critical AI systems
- Own cloud systems that directly impact model performance
- Join a category-defining AI career tools platform
- Operate at the frontier of applied AI infrastructure
How to Apply
Please submit:
- Resume / CV
- Short note including:
  - Why you want to join LockedIn AI
  - Whether you’ve used the product
  - What improvements you would suggest
- GitHub, projects, or technical writing (optional)
Equal Opportunity
LockedIn AI is committed to building a diverse and inclusive team. All hiring decisions are based on merit, skills, and business needs.