About LockedIn AI
LockedIn AI is the #1 real-time AI interview and meeting copilot, trusted by over 1 million users worldwide. We are building a next-generation AI career platform that helps users succeed in interviews, coding assessments, and professional communication through real-time AI assistance.
Our system runs on complex, high-performance AI infrastructure that must scale reliably, globally, and in real time.
Role Overview
We are hiring a cloud-native AI Cloud Engineer to design, build, and optimize the infrastructure powering LockedIn AI’s machine learning systems and real-time AI products.
This is a specialized role at the intersection of cloud engineering, distributed systems, and AI infrastructure. You will own the environments where models are trained, fine-tuned, deployed, and served at scale to over 1 million users.
You will own the full AI cloud stack, from GPU compute clusters to inference serving infrastructure and cost-optimized scaling systems.
Key Responsibilities
AI Cloud Architecture
- Design cloud-native infrastructure for AI/ML workloads
- Build GPU clusters for training, fine-tuning, and evaluation
- Architect multi-environment setups (training, staging, production)
- Optimize AWS/GCP/Azure systems for AI performance
Inference & Model Serving Infrastructure
- Deploy and manage real-time AI inference systems (LLMs, STT, RAG)
- Optimize serving frameworks like vLLM, Triton, TensorRT, or TGI
- Improve latency, throughput, batching, and GPU utilization
- Build failover, routing, and load balancing for AI endpoints
GPU & Distributed Compute
- Manage GPU infrastructure for distributed training and inference
- Configure multi-node training and model parallelism
- Optimize scheduling for spot and reserved GPU instances
- Automate scaling of compute resources based on demand
Cloud Cost Optimization (FinOps for AI)
- Reduce cloud spend across GPU, storage, and inference workloads
- Implement reserved/spot instance strategies for AI training
- Track cost-per-inference and cost-per-training-job metrics
- Optimize LLM API usage and caching strategies
Infrastructure as Code & Automation
- Build all infrastructure using Terraform, Pulumi, or CloudFormation
- Automate provisioning of AI environments and services
- Implement GitOps workflows for cloud infrastructure
- Ensure reproducible, version-controlled cloud systems
Observability & Reliability
- Monitor GPU health, inference latency, and system performance
- Build dashboards for AI infrastructure metrics
- Set up alerting for failures, spikes, and performance degradation
- Ensure high availability of real-time AI systems
Security & Networking
- Design secure cloud networks (VPC, IAM, encryption, access control)
- Protect model weights, embeddings, and AI pipelines
- Ensure compliance readiness (SOC 2, GDPR, CCPA)
- Secure inference endpoints and data flows
Required Qualifications
Experience
- 3+ years in cloud engineering, DevOps, or infrastructure roles
- Experience with ML/AI production systems
- Hands-on GPU infrastructure or AI deployment experience
- Experience working with engineering and AI teams in production environments
Technical Skills
- Cloud platforms: deep expertise in AWS, GCP, or Azure
- Kubernetes and Docker (production-grade usage)
- Infrastructure as Code: Terraform, Pulumi, or CloudFormation
- AI serving frameworks: vLLM, Triton, TensorRT, or similar
- Monitoring tools: Prometheus, Grafana, Datadog, CloudWatch
- Python, Go, or Bash for automation and tooling
Preferred Qualifications
- Experience with LLM inference at scale
- Distributed training systems (multi-GPU / multi-node)
- Real-time systems (WebSockets, streaming, low-latency APIs)
- Knowledge of RDMA, InfiniBand, or GPU networking
- Multi-cloud infrastructure experience
- Background in SaaS, edtech, or AI consumer products
What We Offer
- Equity in a fast-growing AI company
- Direct impact on a product used by 1M+ users
- Remote-first flexibility with optional NYC collaboration
- High ownership over AI infrastructure systems
- Fast-paced startup environment with real technical challenges
Why Join LockedIn AI?
- Build infrastructure powering real-time AI at scale
- Work on GPU-heavy, latency-critical AI systems
- Own cloud systems that directly impact model performance
- Join a category-defining AI career tools platform
- Operate at the frontier of applied AI infrastructure
How to Apply
Please submit:
- Resume / CV
- Short note including:
  - Why you want to join LockedIn AI
  - Whether you’ve used the product
  - What improvements you would suggest
- GitHub, projects, or technical writing (optional)
Equal Opportunity
LockedIn AI is committed to building a diverse and inclusive team. All hiring decisions are based on merit, skills, and business needs.