# Overview

Unified API for AI inference on decentralized GPU infrastructure.

> **Open Beta** — We're expanding our provider network and refining the platform. Expect new models and features regularly. Feedback welcome!

Relay provides standard REST endpoints for chat, images, speech, and video—abstracting away the complexity of GPU provisioning, load balancing, and blockchain settlement. Currently, all models are provided and maintained by the Relay team. Support for self-hosted models and provider onboarding is on the immediate roadmap.

## How It Works

Relay organizes resources as **mode.source.model**:

```
mode       → routing/billing category (direct or opengpu)
source     → service type (ollama, vllm, audio, image, video, anthropic, openai)
model      → specific model (gpt-oss:120b, sesame/csm-1b, etc.)
```

**Modes:**

* `direct` — Low-latency (<1s), pay-per-use
* `opengpu` — Decentralized execution on the [OpenGPU Network](https://opengpu.network), a permissionless compute network that runs workloads redundantly for reliability and censorship resistance
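
As an illustration, a three-part identifier in this scheme can be split on the first two dots, which leaves characters such as `:` or `/` inside the model name intact. The dotted string form shown here is an assumption inferred from the `mode.source.model` layout above, not a documented wire format:

```python
def parse_resource(identifier: str) -> tuple[str, str, str]:
    """Split a mode.source.model identifier into its three parts.

    Splitting on the first two dots only keeps characters such as
    ':' or '/' inside the model name intact.
    """
    mode, source, model = identifier.split(".", 2)
    return mode, source, model

print(parse_resource("direct.ollama.gpt-oss:120b"))
# → ('direct', 'ollama', 'gpt-oss:120b')
```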

## Quick Example

```bash
curl -X POST https://relay.opengpu.network/v2/ollama/api/chat \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss:20b", "messages": [{"role": "user", "content": "Hello"}]}'
```
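
The same request can be issued from Python using only the standard library. This is a sketch: the URL, header names, and payload shape are taken directly from the curl example above.

```python
import json
import urllib.request

API_URL = "https://relay.opengpu.network/v2/ollama/api/chat"

def build_chat_request(api_key: str, model: str, content: str) -> urllib.request.Request:
    """Build the POST request shown in the curl example."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("YOUR_KEY", "gpt-oss:20b", "Hello")
# urllib.request.urlopen(req)  # sends the request; requires a valid key
```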

## Endpoints

| Category         | Endpoints                                                          | Description                                           |
| ---------------- | ------------------------------------------------------------------ | ----------------------------------------------------- |
| Chat (Ollama)    | `/v2/ollama/api/chat`                                              | Ollama-compatible LLM chat                            |
| Chat (vLLM)      | `/v2/vllm/v1/chat/completions`                                     | OpenAI-compatible LLM chat (streaming)                |
| Chat (OpenAI)    | `/v2/openai/v1/chat/completions`                                   | OpenAI models (streaming)                             |
| Chat (Anthropic) | `/v2/anthropic/v1/messages`                                        | Anthropic models (streaming)                          |
| Audio            | `/v2/audio/tts/sesame`, `/v2/audio/asr/whisper`                    | Text-to-speech, Speech-to-text                        |
| Image            | `/v2/image/flux/generate`, `/v2/image/gemini-3/generate`, and more | Multiple providers: Flux, Gemini, GPT Image, Qwen, SD |
| Video            | `/v2/video/wan-25/t2v`, `/v2/video/sora-2/t2v`, and more           | Multiple providers: Wan, Kling, Sora                  |

For full parameter details, see the [API Explorer](https://docs.relaygpu.com/reference/api-explorer).
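
The streaming chat endpoints are described as OpenAI-compatible, so a client can assume the usual server-sent-events framing, where each chunk arrives as a `data: {...}` line and the stream ends with `data: [DONE]`. That framing is an assumption based on OpenAI compatibility, not confirmed by this page. A minimal parser for such a stream:

```python
import json
from typing import Iterable, Iterator

def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE chat-completion lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Hello
```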

## Get Started

1. [Quickstart](/relay/introduction/quickstart.md) - First request in 60 seconds
2. [Authentication](/relay/introduction/authentication.md) - API keys and tiers
3. [API Explorer](https://docs.relaygpu.com/reference/api-explorer) - Full endpoint reference


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://opengpu-network.gitbook.io/relay/introduction/readme.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question, along with relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
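
The question must be URL-encoded before it is placed in the `ask` query parameter. A sketch using only the Python standard library, with the page URL taken from the pattern above (sending the request is left commented out):

```python
import urllib.parse
# import urllib.request  # uncomment to actually send the request

DOCS_URL = "https://opengpu-network.gitbook.io/relay/introduction/readme.md"

def build_ask_url(question: str) -> str:
    """Append a URL-encoded `ask` query parameter to the docs page URL."""
    return DOCS_URL + "?" + urllib.parse.urlencode({"ask": question})

url = build_ask_url("Which models are available in direct mode?")
print(url)
# answer = urllib.request.urlopen(url).read().decode("utf-8")
```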
