Chat

Two sources are available for LLM chat: ollama (Ollama-compatible API) and vllm (OpenAI-compatible API).


Ollama Source

Ollama-compatible API format.

POST /v2/ollama/api/chat

Request

```json
{
  "model": "gpt-oss:20b",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "mode": "auto",
  "stream": false
}
```
| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model name (required) |
| messages | array | Conversation messages (required) |
| mode | string | auto, direct, or opengpu (default: auto) |
| stream | bool | Enable streaming (default: false) |
| think | string | Reasoning depth: low, medium, or high |
| temperature | float | Sampling randomness, 0.0 to 2.0 |

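A minimal Python sketch of a non-streaming call. The base URL and any authentication headers are deployment-specific assumptions, not part of this documentation:

```python
import requests

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your deployment

resp = requests.post(
    f"{BASE_URL}/v2/ollama/api/chat",
    json={
        "model": "gpt-oss:20b",
        "messages": [{"role": "user", "content": "Hello"}],
        "mode": "auto",    # let the service choose direct or opengpu
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```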
Response

| Field | Description |
| --- | --- |
| mode | Actual mode used: direct or opengpu |
| task_address | Unique identifier. For opengpu mode, verifiable on ogpuscan.io |

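An illustrative opengpu-mode response. Only mode and task_address are documented above; the remaining fields are assumed to follow the standard Ollama chat reply shape, and all values here are hypothetical:

```json
{
  "model": "gpt-oss:20b",
  "message": {"role": "assistant", "content": "Hello! How can I help?"},
  "done": true,
  "mode": "opengpu",
  "task_address": "0x12ab...cd34"
}
```
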
Streaming

With stream set to true, the response is delivered as Server-Sent Events:

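A sketch of the event stream, with chunk fields assumed to mirror the non-streaming response shown above:

```
data: {"message": {"role": "assistant", "content": "Hel"}, "done": false}

data: {"message": {"role": "assistant", "content": "lo!"}, "done": false}

data: {"mode": "opengpu", "task_address": "0x12ab...cd34", "done": true}
```

And a minimal Python consumer for that stream, using the same hypothetical base URL as the earlier example:

```python
import json
import requests

with requests.post(
    "http://localhost:8080/v2/ollama/api/chat",  # hypothetical host
    json={
        "model": "gpt-oss:20b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE payload lines are prefixed with "data: "; skip everything else
        if not line.startswith(b"data: "):
            continue
        chunk = json.loads(line[len(b"data: "):])
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
```
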
Models (ollama source)

| Model | Modes | Tiers |
| --- | --- | --- |
| gpt-oss:20b | opengpu | all |
| gpt-oss:120b | direct, opengpu | pro/max |
| llama3.2:3b | opengpu | all |
| deepseek-r1:8b | opengpu | all |


vLLM Source

OpenAI-compatible API format. Direct mode only.

Request

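No request example is shown for this source. Since the format is OpenAI-compatible, the body would presumably follow the chat-completions shape; this example is an assumption (the endpoint path is not documented here), with the model name taken from the table below:

```json
{
  "model": "openai/gpt-oss-120b",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}
```
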
Response

| Field | Description |
| --- | --- |
| mode | Actual mode used (always direct for vLLM) |
| task_address | Unique identifier for this request |

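An illustrative response, assuming the standard OpenAI chat-completion shape extended with the two fields documented above; all values are hypothetical:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "openai/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "mode": "direct",
  "task_address": "0x12ab...cd34"
}
```
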
Models (vllm source)

| Model | Modes | Tiers |
| --- | --- | --- |
| openai/gpt-oss-120b | direct | pro/max |
