Cloud Inference Engineer

Fugue, Inc

San Francisco, CA, USA

Posted on Mar 4, 2026

Qualifications

CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute (GPU experience is a big plus)
No degree required

Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day-to-Day Responsibilities

Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
Conduct model performance reviews
Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
Occasionally write kernels and, yes, do some tasteful shitposting
