The GPU Cloud for AI Developers

Develop, train, and deploy AI models faster than ever. No code changes needed.

Reproducible Dev Environments

Provision a cloud GPU machine with your environment, code, and SSH keys ready to go.

Connect to it over SSH or from your favorite IDE, or open it in a Jupyter notebook.

~/project/dev.yaml

resources:
  accelerators: A100:1
workdir: ~/my_project
setup: |
  pip install -r requirements.txt

$ komo machine launch ~/project/dev.yaml --name dev
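Once the machine is up, connecting is a one-liner. The commands below are illustrative only: the `dev` host alias assumes the launch step registered the machine in your SSH config, which the source does not confirm.

$ ssh dev                                        # assumes a "dev" entry exists in ~/.ssh/config
$ code --remote ssh-remote+dev ~/my_project      # open the same machine in VS Code (Remote-SSH extension)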
Serverless Jobs

Launch batch jobs for tasks such as training, fine-tuning, and data processing. Easily scale to multiple nodes for distributed jobs; a multi-node sketch follows the example below.

Once your job completes, the cloud instances are automatically terminated. Never pay for idle instances again.

~/project/train.yaml

resources:
  accelerators: A100:8
workdir: ~/my_project
setup: |
  pip install -r requirements.txt
run: |
  python train.py --ngpus 8

$ komo job launch ~/project/train.yaml
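A minimal multi-node sketch. The file name is hypothetical, and the `num_nodes` field and `$MASTER_ADDR` variable are assumptions in the style of SkyPilot-like schedulers; verify the exact field names against the Komodo docs.

~/project/train_multinode.yaml (hypothetical)

resources:
  accelerators: A100:8
num_nodes: 2          # assumption: SkyPilot-style field, not confirmed by the source
workdir: ~/my_project
setup: |
  pip install -r requirements.txt
run: |
  # assumption: the platform exposes the head node's address, shown here as $MASTER_ADDR
  torchrun --nnodes 2 --nproc-per-node 8 \
    --rdzv-backend c10d --rdzv-endpoint $MASTER_ADDR:29500 \
    train.py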

Infinitely Scalable Models

Deploy AI models behind a secure endpoint. With built-in load balancing and autoscaling, your models scale with traffic, so you only pay for the compute you need.

Use the serving framework of your choice (vLLM, Triton, etc.) for maximum flexibility.

~/project/serve.yaml

resources:
  accelerators: A100:1
  ports: 8000
workdir: ~/my_project
envs:
  HF_TOKEN: MY_TOKEN
setup: |
  pip install -r requirements.txt
run: |
  python -u -m vllm.entrypoints.openai.api_server \
    --port 8000 \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --max-num-seqs 64
service:
  replica_policy:
    min_replicas: 1
    max_replicas: 3
    target_qps_per_replica: 5

  # Readiness probe that sends an actual chat request.
  readiness_probe:
    initial_delay_seconds: 1800
    path: /v1/chat/completions
    post_data:
      model: meta-llama/Meta-Llama-3-8B-Instruct
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1

$ komo service launch ~/project/serve.yaml --name llama3
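Once the service is live, any OpenAI-compatible client can call it. A sketch with curl, where the endpoint URL is a placeholder and authentication details are omitted; the request body mirrors the readiness probe above:

$ curl http://<your-service-endpoint>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta-llama/Meta-Llama-3-8B-Instruct",
          "messages": [{"role": "user", "content": "Hello! What is your name?"}],
          "max_tokens": 64
        }'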
Multi-Cloud Execution

Runs on AWS, GCP, Azure, and Kubernetes.

Use the Komodo Cloud, or bring your own cloud account to spend your existing cloud credits and keep your data within your own private VPC.

With built-in Kubernetes support, you can seamlessly overflow from your on-prem cluster to the cloud.