Overview

Unified API for AI inference on decentralized GPU infrastructure.

Open Beta — We're expanding our provider network and refining the platform. Expect new models and features regularly. Feedback welcome!

Relay provides standard REST endpoints for chat, images, speech, and video—abstracting away the complexity of GPU provisioning, load balancing, and blockchain settlement. Currently, all models are provided and maintained by the Relay team. Support for self-hosted models and provider onboarding is on the immediate roadmap.

How It Works

Relay organizes resources as mode.source.model:

mode       → routing/billing category (direct or opengpu)
source     → service type (ollama, vllm, audio, automatic1111, video)
model      → specific model (gpt-oss:120b, sesame/csm-1b, etc.)
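
The addressing scheme can be handled with a simple split. This is a minimal sketch (the helper name is ours, not part of the Relay API); the key detail is splitting on the first two dots only, since model names themselves may contain punctuation.

```python
def parse_resource(resource: str) -> tuple[str, str, str]:
    """Split a Relay resource identifier into (mode, source, model)."""
    # Split on the first two dots only: model names may contain
    # characters like ":" or "/" (e.g. "gpt-oss:120b", "sesame/csm-1b").
    mode, source, model = resource.split(".", 2)
    return mode, source, model

mode, source, model = parse_resource("direct.ollama.gpt-oss:120b")
# mode == "direct", source == "ollama", model == "gpt-oss:120b"
```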

Modes:

  • direct — Low-latency (<1s), pay-per-request

  • opengpu — Decentralized execution on the OpenGPU Network, a permissionless compute network with redundant execution for reliability and censorship resistance

Sources:

  • ollama — Chat (Ollama-compatible)

  • vllm — Chat (OpenAI-compatible)

  • audio — Text-to-speech, Speech-to-text

  • automatic1111 — Image generation

  • video — Text-to-video generation

Quick Example
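
A minimal chat request against the Ollama-compatible endpoint, using only the standard library. The base URL and bearer-auth header are placeholders/assumptions; substitute your actual Relay base URL and API key (see Authentication). The endpoint path and model name are taken from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.relay.example"  # placeholder; use your actual base URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": False,
}

req = urllib.request.Request(
    BASE_URL + "/v2/ollama/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",  # auth scheme is an assumption
    },
    method="POST",
)

# Uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```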

Endpoints

Source           Endpoint                               Description
---------------  -------------------------------------  --------------------------
ollama           /v2/ollama/api/chat                    Ollama-compatible chat
vllm             /v2/vllm/v1/chat/completions           OpenAI-compatible chat
audio            /v2/audio/tts/sesame                   Text-to-speech
audio            /v2/audio/asr/whisper                  Speech-to-text
automatic1111    /v2/automatic1111/sdapi/v1/txt2img     Image generation
video            /v2/video/generate                     Text-to-video generation
video            /v2/video/edit                         Image-to-video generation
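
Non-chat endpoints take plain JSON bodies as well. As an illustration, a minimal image-generation payload in the AUTOMATIC1111 sdapi style; the field names follow the upstream sdapi convention, and which fields Relay honors is an assumption to verify against the endpoint documentation.

```python
import json

# Minimal txt2img payload (AUTOMATIC1111 sdapi convention; fields assumed).
payload = {
    "prompt": "a lighthouse at dusk, oil painting",
    "negative_prompt": "blurry, low quality",
    "width": 512,
    "height": 512,
    "steps": 20,
}

# POST this body to /v2/automatic1111/sdapi/v1/txt2img
body = json.dumps(payload).encode("utf-8")
```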

Get Started

  1. Quickstart - First request in 60 seconds

  2. Authentication - API keys and tiers

  3. Endpoints - Full documentation
