vLLM
OpenAI-compatible chat completion request for vLLM.
Supports standard OpenAI API parameters plus vLLM-specific extensions. Parameters are passed directly to the vLLM provider without transformation.
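Because the request shape follows the OpenAI chat completions API, the standard OpenAI client can talk to the endpoint directly. The sketch below is a minimal example under that assumption; the host, path, and API key are placeholders, not part of this reference.

```python
# Minimal sketch: call the vLLM-backed endpoint through the OpenAI Python client.
# The base URL and API key below are placeholders; substitute your deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-gateway-host>/v1",  # placeholder host/path
    api_key="YOUR_API_KEY",                      # placeholder credential
)

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_completion_tokens=64,   # preferred over the deprecated max_tokens
    temperature=0.7,
)
print(resp.choices[0].message.content)
```

A fuller request that exercises the vLLM-specific sampling extensions follows the parameter list below.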
Model identifier. Available: openai/gpt-oss-120b. Default: openai/gpt-oss-120b
Maximum tokens to generate (deprecated, use max_completion_tokens)
Maximum tokens to generate (preferred over max_tokens)
Minimum tokens to generate before stopping
Sampling temperature (0.0 = deterministic)
Top-p (nucleus) sampling
Top-k sampling (-1 to disable)
Minimum probability threshold for sampling
Frequency penalty for token repetition
Presence penalty for topic repetition
Repetition penalty (1.0 = no penalty)
Number of completions to generate
Stop sequence(s) - generation stops when encountered
Random seed for reproducibility
Enable streaming responses (not yet supported). Default: false
Reasoning effort level: 'low', 'medium', 'high'. Omit to use model default.
Include reasoning content in response. Omit to use model default.
Return log probabilities of output tokens
Number of most likely tokens to return at each position
Token ID to bias value mapping (-100 to 100)
Unique identifier for the end-user
Routing mode: 'auto' or 'direct' (vLLM is direct-only). Default: auto
Async mode: returns task_address immediately; poll /v2/tasks/{task_address} for the result. Default: false (sync mode).
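For the vLLM-specific sampling extensions, a plain HTTP request makes the extra fields explicit. In the sketch below the field names are inferred from the parameter descriptions above, and the /v2/chat/completions path and host are placeholders; adjust them to your deployment.

```python
# Sketch of a request using the extended sampling parameters documented above.
# Field names are inferred from the descriptions; the endpoint path and host are placeholders.
import requests

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_completion_tokens": 128,
    "min_tokens": 8,              # minimum tokens to generate before stopping
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 40,                  # -1 disables top-k sampling
    "min_p": 0.05,                # minimum probability threshold for sampling
    "repetition_penalty": 1.1,    # 1.0 means no penalty
    "stop": ["\n\n"],
    "seed": 42,
}

resp = requests.post(
    "https://<your-gateway-host>/v2/chat/completions",  # placeholder path
    headers={"Authorization": "Bearer YOUR_API_KEY"},    # placeholder credential
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```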
Successful Response
Task accepted (async mode). Poll the poll_url for status.
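A possible shape of the async flow, under the assumption that async mode is enabled by a boolean request field and that the accepted response carries the task_address described above; exact field names and paths may differ from your deployment.

```python
# Sketch of the async flow: submit a task, then poll /v2/tasks/{task_address}.
# The host, chat path, and the "async" flag name are assumptions for illustration.
import time
import requests

BASE = "https://<your-gateway-host>"  # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential

# Submit in async mode: the service returns a task handle instead of the completion.
submit = requests.post(
    f"{BASE}/v2/chat/completions",  # placeholder path
    headers=HEADERS,
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Summarize this ticket in one line."}],
        "async": True,  # assumed name of the async-mode flag
    },
    timeout=30,
)
submit.raise_for_status()
task_address = submit.json()["task_address"]

# Poll the tasks endpoint until the task reaches a terminal state.
while True:
    status = requests.get(f"{BASE}/v2/tasks/{task_address}", headers=HEADERS, timeout=30)
    status.raise_for_status()
    body = status.json()
    if body.get("status") in ("completed", "failed"):
        print(body)
        break
    time.sleep(1.0)
```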
Validation Error