# OpenAI

## Create chat completion (OpenAI-compatible)

> Create a chat completion using the OpenAI-compatible API.
>
> ## Authentication
>
> **Optional** — omit for guest access (lower rate limits).
>
> | Method | Header |
> |--------|--------|
> | API Key | `X-API-Key: relay_sk_...` |
> | Bearer Token | `Authorization: Bearer relay_sk_...` |
>
> If both are provided, `X-API-Key` takes priority.
>
> **Available Models:** See `GET /v2/models` for the current list. Key models:
>
> | Model | Vendor | Specialty |
> |-------|--------|-----------|
> | `openai/gpt-5.2` | OpenAI | General-purpose reasoning |
> | `openai/gpt-5.4` | OpenAI | Latest flagship model |
> | `deepseek-ai/DeepSeek-V3.1` | DeepSeek | Open-weight reasoning |
> | `deepseek-ai/DeepSeek-V4-Flash` | DeepSeek | Fast, low-latency reasoning (thinking mode) |
> | `deepseek-ai/DeepSeek-V4-Pro` | DeepSeek | Advanced open-weight reasoning (thinking mode) |
> | `Qwen/Qwen3-Coder` | Alibaba | Code generation & reasoning |
> | `Qwen/Qwen3.5-397B-A17B-FP8` | Alibaba | Largest open model, supports `enable_thinking` |
> | `Qwen/Qwen3.5-35B-A3B-GPTQ-Int4` | Alibaba | Compact MoE (35B total / 3B active), supports `enable_thinking` |
> | `moonshotai/kimi-k2.5` | Moonshot AI | Multilingual reasoning |
> | `infercom/DeepSeek-V3.1` | DeepSeek (Infercom) | DeepSeek V3.1 via Infercom |
> | `infercom/MiniMax-M2.5` | MiniMax (Infercom) | MiniMax flagship, 164K context |
> | `infercom/gpt-oss-120b` | OpenGPU (Infercom) | GPT-OSS 120B reasoning model |
>
> ## Required Parameters
>
> | Parameter | Type | Description |
> |-----------|------|-------------|
> | model | string | Model ID (see table above, case-sensitive) |
> | messages | array | Array of message objects with `role` (`system`/`user`/`assistant`/`tool`) and `content` |
>
> ## Optional Parameters
>
> | Parameter | Type | Default | Description |
> |-----------|------|---------|-------------|
> | max_completion_tokens | integer | - | Max tokens to generate, including reasoning (1-128000). Preferred over `max_tokens`. |
> | max_tokens | integer | - | Max tokens to generate (1-128000). Deprecated — use `max_completion_tokens` instead. |
> | temperature | float | - | Sampling temperature (0.0-2.0). Lower = more deterministic, higher = more creative. |
> | top_p | float | - | Nucleus sampling threshold (0.0-1.0). Alternative to temperature. |
> | frequency_penalty | float | - | Penalize frequent tokens (-2.0 to 2.0). Positive values reduce repetition. |
> | presence_penalty | float | - | Penalize repeated topics (-2.0 to 2.0). Positive values encourage new topics. |
> | stop | string/array | - | Up to 4 stop sequences at which generation halts |
> | stream | boolean | `false` | Enable SSE streaming (direct mode only). Cannot be combined with `async`. |
> | response_format | object | - | Output format: `{"type": "json_object"}` or `{"type": "text"}` |
> | seed | integer | - | Random seed for deterministic output |
> | n | integer | - | Number of completions to generate (1-10) |
> | logprobs | boolean | - | Return log probabilities of output tokens |
> | top_logprobs | integer | - | Number of most likely tokens per position (0-20, requires `logprobs: true`) |
> | tools | array | - | Available tools: `[{"type":"function","function":{"name":"...","parameters":{...}}}]` |
> | tool_choice | string/object | - | Tool choice: `"auto"`, `"none"`, `"required"`, or `{"type":"function","function":{"name":"..."}}` |
> | stream_options | object | - | Streaming options: `{"include_usage": true}` to get token usage in the stream |
> | store | boolean | - | Whether to store the output for later use |
> | user | string | - | Unique end-user identifier for abuse monitoring |
>
> ## Streaming Mode
>
> Set `stream: true` to receive Server-Sent Events in OpenAI format.
> Each event is a `data: {json}` line, and the stream ends with `data: [DONE]`.
>
> ## Async Mode
>
> Set `async: true` to receive a `task_id` immediately, then poll `GET /v2/tasks/{task_id}` for the result.
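As a sketch of a minimal call, the snippet below assembles the two required fields plus a couple of optional ones and sends them with the `X-API-Key` header. The base URL and key are placeholders (the spec in this page only defines the path, `/v2/openai/v1/chat/completions`); the network call itself is left commented out.

```python
import json
import urllib.request

API_BASE = "https://relay.example.com"  # placeholder: substitute the real Relay base URL
API_KEY = "relay_sk_..."                # optional: drop the header entirely for guest access

def build_chat_request(model, messages, **options):
    """Assemble a request body; only `model` and `messages` are required."""
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body

payload = build_chat_request(
    "deepseek-ai/DeepSeek-V3.1",
    [{"role": "user", "content": "Say hello."}],
    max_completion_tokens=256,
    temperature=0.7,
)

req = urllib.request.Request(
    f"{API_BASE}/v2/openai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:   # actual call omitted in this sketch
#     completion = json.load(resp)
#     print(completion["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDK should also work by pointing its base URL at the Relay host and passing the key as a bearer token.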
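The streaming framing described above (`data: {json}` lines terminated by `data: [DONE]`) can be consumed with a small parser. This sketch assumes the standard OpenAI delta shape (`choices[0].delta.content`), which this page does not spell out, and runs against a simulated stream:

```python
import json

def parse_sse_chunks(lines):
    """Yield parsed JSON chunks from an OpenAI-style SSE stream.

    Each event is a `data: {json}` line; `data: [DONE]` ends the stream.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

# Simulated stream, assuming the OpenAI delta format:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_chunks(sample)
)
# text == "Hello"
```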
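Async mode can be handled with a generic polling loop. Per the `AsyncTaskAccepted` schema in the spec below, a task is finished once its `status` is `completed` or `failed`; the sketch injects the HTTP fetch as a callable (an assumption made here purely so the loop runs without a network):

```python
import time

def poll_task(fetch_status, task_id, interval=2.0, timeout=120.0):
    """Poll an async task until it reaches a terminal status.

    `fetch_status(task_id)` should perform GET /v2/tasks/{task_id}
    and return the parsed JSON; it is injected so the loop can be
    exercised without a live endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")

# Simulated poller: one "running" response, then "completed".
responses = iter([{"status": "running"}, {"status": "completed"}])
result = poll_task(lambda task_id: next(responses), "direct:1234", interval=0.0)
# result["status"] == "completed"
```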

```json
{"openapi":"3.1.0","info":{"title":"Relay API","version":"2.0.0"},"security":[{"ApiKeyHeader":[]},{"BearerAuth":[]}],"components":{"securitySchemes":{"ApiKeyHeader":{"type":"apiKey","in":"header","name":"X-API-Key","description":"Native API key: `relay_sk_...`"},"BearerAuth":{"type":"http","scheme":"bearer","description":"OpenAI SDK compatible: `relay_sk_...`"}},"schemas":{"OpenAIChatRequest":{"properties":{"model":{"type":"string","title":"Model","description":"Model identifier: openai/gpt-5.2, openai/gpt-5.4, deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V4-Flash, deepseek-ai/DeepSeek-V4-Pro, Qwen/Qwen3-Coder, Qwen/Qwen3.5-397B-A17B-FP8, Qwen/Qwen3.5-35B-A3B-GPTQ-Int4, moonshotai/kimi-k2.5, infercom/DeepSeek-V3.1, infercom/MiniMax-M2.5, infercom/gpt-oss-120b"},"messages":{"items":{"additionalProperties":true,"type":"object"},"type":"array","title":"Messages","description":"Array of conversation messages with role and content"},"max_tokens":{"anyOf":[{"type":"integer","maximum":128000,"minimum":1},{"type":"null"}],"title":"Max Tokens","description":"Maximum tokens to generate (deprecated, use max_completion_tokens)"},"max_completion_tokens":{"anyOf":[{"type":"integer","maximum":128000,"minimum":1},{"type":"null"}],"title":"Max Completion Tokens","description":"Maximum tokens to generate including reasoning"},"temperature":{"anyOf":[{"type":"number","maximum":2,"minimum":0},{"type":"null"}],"title":"Temperature","description":"Sampling temperature (0.0-2.0)"},"top_p":{"anyOf":[{"type":"number","maximum":1,"minimum":0},{"type":"null"}],"title":"Top P","description":"Nucleus sampling parameter"},"frequency_penalty":{"anyOf":[{"type":"number","maximum":2,"minimum":-2},{"type":"null"}],"title":"Frequency Penalty","description":"Frequency penalty for token repetition"},"presence_penalty":{"anyOf":[{"type":"number","maximum":2,"minimum":-2},{"type":"null"}],"title":"Presence Penalty","description":"Presence penalty for topic 
repetition"},"stop":{"anyOf":[{"type":"string"},{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Stop","description":"Stop sequence(s)"},"stream":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Stream","description":"Enable streaming responses (SSE format, direct mode only)","default":false},"response_format":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Response Format","description":"Output format (e.g. {\"type\": \"json_object\"})"},"seed":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Seed","description":"Random seed for deterministic output"},"tools":{"anyOf":[{"items":{"additionalProperties":true,"type":"object"},"type":"array"},{"type":"null"}],"title":"Tools","description":"Available tools for function calling"},"tool_choice":{"anyOf":[{"type":"string"},{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Tool Choice","description":"Tool choice strategy"},"n":{"anyOf":[{"type":"integer","maximum":10,"minimum":1},{"type":"null"}],"title":"N","description":"Number of completions to generate"},"logprobs":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Logprobs","description":"Return log probabilities of output tokens"},"top_logprobs":{"anyOf":[{"type":"integer","maximum":20,"minimum":0},{"type":"null"}],"title":"Top Logprobs","description":"Number of most likely tokens to return at each position"},"user":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"User","description":"Unique identifier for the end-user"},"stream_options":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Stream Options","description":"Streaming options (e.g. 
{\"include_usage\": true})"},"store":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Store","description":"Whether to store the output for later use"},"chat_template_kwargs":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Chat Template Kwargs","description":"Chat template parameters (e.g. {\"enable_thinking\": true} for reasoning models)"},"mode":{"anyOf":[{"type":"string","enum":["auto","opengpu","direct"]},{"type":"null"}],"title":"Mode","description":"Routing mode: 'auto' (default), 'direct', or 'opengpu'","default":"auto"},"async":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Async","description":"Async mode: returns task_id immediately, poll /v2/tasks/{task_id} for result.","default":false}},"additionalProperties":false,"type":"object","required":["model","messages"],"title":"OpenAIChatRequest","description":"OpenAI Chat Completions API request"},"OpenAIChatResponse":{"properties":{"id":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Id","description":"Unique completion ID"},"object":{"type":"string","title":"Object","description":"Object type","default":"chat.completion"},"created":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Created","description":"Unix timestamp"},"model":{"type":"string","title":"Model","description":"Model used"},"choices":{"items":{"additionalProperties":true,"type":"object"},"type":"array","title":"Choices","description":"Completion choices"},"usage":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"title":"Usage","description":"Token usage statistics"},"system_fingerprint":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"System Fingerprint","description":"System fingerprint"},"task_id":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Id","description":"Task identifier (direct:{uuid} or opengpu:{uuid})"},"task_address":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Address","description":"Blockchain task address (opengpu 
mode only, null for direct)"},"mode":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Mode","description":"Execution mode used"}},"additionalProperties":true,"type":"object","required":["model","choices"],"title":"OpenAIChatResponse","description":"OpenAI Chat Completions API response"},"AsyncTaskAccepted":{"properties":{"task_id":{"type":"string","title":"Task Id","description":"Unique task identifier for polling (direct:{uuid} or opengpu:{uuid})"},"status":{"type":"string","title":"Status","description":"Current task status","default":"running"},"poll_url":{"type":"string","title":"Poll Url","description":"URL to poll for task status"},"message":{"type":"string","title":"Message","description":"Human-readable status message"}},"type":"object","required":["task_id","poll_url","message"],"title":"AsyncTaskAccepted","description":"Response returned when async mode is enabled (HTTP 202 Accepted).\n\nClient should poll the poll_url until status is 'completed' or 'failed'."},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error Type"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"}}},"paths":{"/v2/openai/v1/chat/completions":{"post":{"tags":["OpenAI"],"summary":"Create chat completion (OpenAI-compatible)","description":"Create a chat completion using OpenAI-compatible API.\n\n## Authentication\n**Optional** — omit for guest access (lower rate limits).\n\n| Method | Header |\n|--------|--------|\n| API Key | `X-API-Key: relay_sk_...` |\n| Bearer Token | `Authorization: Bearer relay_sk_...` |\n\nIf both are provided, `X-API-Key` takes priority.\n\n**Available Models:** See `GET /v2/models` for the 
current list. Key models:\n\n| Model | Vendor | Specialty |\n|-------|--------|-----------|\n| `openai/gpt-5.2` | OpenAI | General-purpose reasoning |\n| `openai/gpt-5.4` | OpenAI | Latest flagship model |\n| `deepseek-ai/DeepSeek-V3.1` | DeepSeek | Open-weight reasoning |\n| `deepseek-ai/DeepSeek-V4-Flash` | DeepSeek | Fast, low-latency reasoning (thinking mode) |\n| `deepseek-ai/DeepSeek-V4-Pro` | DeepSeek | Advanced open-weight reasoning (thinking mode) |\n| `Qwen/Qwen3-Coder` | Alibaba | Code generation & reasoning |\n| `Qwen/Qwen3.5-397B-A17B-FP8` | Alibaba | Largest open model, supports `enable_thinking` |\n| `Qwen/Qwen3.5-35B-A3B-GPTQ-Int4` | Alibaba | Compact MoE (35B/3B active), supports `enable_thinking` |\n| `moonshotai/kimi-k2.5` | Moonshot AI | Multilingual reasoning |\n| `infercom/DeepSeek-V3.1` | DeepSeek (Infercom) | DeepSeek V3.1 via Infercom |\n| `infercom/MiniMax-M2.5` | MiniMax (Infercom) | MiniMax flagship, 164K context |\n| `infercom/gpt-oss-120b` | OpenGPU (Infercom) | GPT-OSS 120B reasoning model |\n\n## Required Parameters\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| model | string | Model ID (see table above, case-sensitive) |\n| messages | array | Array of message objects with `role` (`system`/`user`/`assistant`/`tool`) and `content` |\n\n## Optional Parameters\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| max_completion_tokens | integer | - | Max tokens to generate including reasoning (1-128000). Preferred over `max_tokens`. |\n| max_tokens | integer | - | Max tokens to generate (1-128000). Deprecated — use `max_completion_tokens` instead. |\n| temperature | float | - | Sampling temperature (0.0-2.0). Lower = more deterministic, higher = more creative. |\n| top_p | float | - | Nucleus sampling threshold (0.0-1.0). Alternative to temperature. |\n| frequency_penalty | float | - | Penalize frequent tokens (-2.0 to 2.0). Positive values reduce repetition. 
|\n| presence_penalty | float | - | Penalize repeated topics (-2.0 to 2.0). Positive values encourage new topics. |\n| stop | string/array | - | Up to 4 stop sequence(s) where generation halts |\n| stream | boolean | `false` | Enable SSE streaming (direct mode only). Cannot combine with `async`. |\n| response_format | object | - | Output format: `{\"type\": \"json_object\"}` or `{\"type\": \"text\"}` |\n| seed | integer | - | Random seed for deterministic output |\n| n | integer | - | Number of completions to generate (1-10) |\n| logprobs | boolean | - | Return log probabilities of output tokens |\n| top_logprobs | integer | - | Number of most likely tokens per position (0-20, requires `logprobs: true`) |\n| tools | array | - | Available tools: `[{\"type\":\"function\",\"function\":{\"name\":\"...\",\"parameters\":{...}}}]` |\n| tool_choice | string/object | - | Tool choice: `\"auto\"`, `\"none\"`, `\"required\"`, or `{\"type\":\"function\",\"function\":{\"name\":\"...\"}}` |\n| stream_options | object | - | Streaming options: `{\"include_usage\": true}` to get token usage in stream |\n| store | boolean | - | Whether to store the output for later use |\n| user | string | - | Unique end-user identifier for abuse monitoring |\n\n## Streaming Mode\nSet `stream: true` to receive Server-Sent Events in OpenAI format.\nEach event is a `data: {json}` line, ending with `data: [DONE]`.\n\n## Async Mode\nSet `async: true` to get a task_id immediately and poll for 
results.","operationId":"openai_chat_completions_v2_openai_v1_chat_completions_post","parameters":[{"name":"x-api-key","in":"header","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"X-Api-Key"}},{"name":"authorization","in":"header","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Authorization"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/OpenAIChatRequest"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/OpenAIChatResponse"}}}},"202":{"description":"Task accepted (async mode). Poll the poll_url for status.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AsyncTaskAccepted"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}}}}
```
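The `OpenAIChatRequest` schema above also accepts `chat_template_kwargs`, which the parameter tables do not cover. Following the schema's own example (`{"enable_thinking": true}` for reasoning models), a request body enabling thinking on a Qwen model might look like:

```python
# Thinking mode is toggled via chat_template_kwargs on models that
# support it (per the model table, e.g. the Qwen3.5 family).
payload = {
    "model": "Qwen/Qwen3.5-397B-A17B-FP8",
    "messages": [{"role": "user", "content": "Plan a 3-step migration."}],
    "chat_template_kwargs": {"enable_thinking": True},
    "max_completion_tokens": 2048,
}
```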


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://opengpu-network.gitbook.io/relay/reference/api-explorer/openai.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
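A sketch of building the query URL, since the question text must be URL-encoded before it is placed in the `ask` parameter:

```python
import urllib.parse

PAGE = "https://opengpu-network.gitbook.io/relay/reference/api-explorer/openai.md"

def ask_url(question):
    """Return the page URL with the question URL-encoded into `ask`."""
    return f"{PAGE}?{urllib.parse.urlencode({'ask': question})}"

url = ask_url("Which models support enable_thinking?")
# url == PAGE + "?ask=Which+models+support+enable_thinking%3F"
```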
