
OpenAI-Compatible API

Last updated December 1, 2025

AI Gateway provides OpenAI-compatible API endpoints, letting you use multiple AI providers through a familiar interface. You can use existing OpenAI client libraries, switch to the AI Gateway with a URL change, and keep your current tools and workflows without code rewrites.

The OpenAI-compatible API implements the same specification as the OpenAI API.

The OpenAI-compatible API is available at the base URL `https://ai-gateway.vercel.sh/v1`.

The OpenAI-compatible API supports the same authentication methods as the main AI Gateway:

  • API key: Use your AI Gateway API key with the `Authorization: Bearer <API_KEY>` header
  • OIDC token: Use your Vercel OIDC token with the `Authorization: Bearer <OIDC_TOKEN>` header

You only need to use one of these forms of authentication. If an API key is specified, it takes precedence over any OIDC token, even if the API key is invalid.
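For example, here is a minimal sketch of pointing the official `openai` npm client at the gateway (the environment variable name is illustrative):

```ts
import OpenAI from 'openai';

// Reuse the standard OpenAI client; only the base URL and the key change.
const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY, // your AI Gateway API key (env var name is illustrative)
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
```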

The AI Gateway supports the following OpenAI-compatible endpoints:

  • `GET /v1/models` - List available models
  • `GET /v1/models/{model}` - Retrieve a specific model
  • `POST /v1/chat/completions` - Create chat completions with support for streaming, attachments, tool calls, and image generation
  • `POST /v1/embeddings` - Generate vector embeddings

You can use the AI Gateway's OpenAI-compatible API with existing tools and libraries like the OpenAI client libraries and AI SDK 4. Point your existing client to the AI Gateway's base URL and use your AI Gateway API key or OIDC token for authentication.

For compatibility with AI SDK v4 and AI Gateway, install the @ai-sdk/openai-compatible package.

Verify that you are using AI SDK 4 by checking your installed package versions: the `ai` package should be on a 4.x release, and `@ai-sdk/openai-compatible` should be on a release compatible with AI SDK 4.
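As a sketch (assuming the base URL above; the model ID is illustrative), the provider can be wired up like this:

```ts
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { generateText } from 'ai'; // AI SDK 4

// Create an AI SDK provider that targets the gateway's OpenAI-compatible API.
const gateway = createOpenAICompatible({
  name: 'ai-gateway',
  baseURL: 'https://ai-gateway.vercel.sh/v1',
  apiKey: process.env.AI_GATEWAY_API_KEY, // env var name is illustrative
});

const { text } = await generateText({
  model: gateway('openai/gpt-4o'), // illustrative model ID
  prompt: 'Say hello from the AI Gateway.',
});
console.log(text);
```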

Retrieve a list of all available models that can be used with the AI Gateway.

Endpoint

`GET https://ai-gateway.vercel.sh/v1/models`
Example request
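A minimal sketch using `fetch` (env var name is illustrative):

```ts
// List all models available through the gateway.
const res = await fetch('https://ai-gateway.vercel.sh/v1/models', {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
const models = await res.json();
console.log(models.data.map((m: { id: string }) => m.id));
```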
Response format

The response follows the OpenAI API format:
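A sketch of the shape, following the OpenAI models-list format (fields abridged, values illustrative):

```ts
// Illustrative response body for GET /v1/models (abridged).
const exampleModelList = {
  object: 'list',
  data: [
    {
      id: 'openai/gpt-4o', // illustrative model ID
      object: 'model',
      created: 1700000000,
      owned_by: 'openai',
    },
  ],
};
```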

Retrieve details about a specific model.

Endpoint

`GET https://ai-gateway.vercel.sh/v1/models/{model}`

Parameters

  • `model` (required): The model ID to retrieve (e.g., an ID returned by the list models endpoint)
Example request
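A minimal sketch (model ID illustrative):

```ts
const modelId = 'openai/gpt-4o'; // illustrative model ID
const res = await fetch(`https://ai-gateway.vercel.sh/v1/models/${modelId}`, {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
console.log(await res.json());
```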
Response format
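A sketch of the shape (values illustrative):

```ts
// Illustrative response body for GET /v1/models/{model} (abridged).
const exampleModel = {
  id: 'openai/gpt-4o',
  object: 'model',
  created: 1700000000,
  owned_by: 'openai',
};
```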

Create chat completions using various AI models available through the AI Gateway.

Endpoint

`POST https://ai-gateway.vercel.sh/v1/chat/completions`

Create a non-streaming chat completion.

Example request
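A minimal sketch (model ID illustrative):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4', // illustrative model ID
    messages: [{ role: 'user', content: 'Explain quantum entanglement in one sentence.' }],
  }),
});
const completion = await res.json();
console.log(completion.choices[0].message.content);
```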
Response format
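A sketch of the shape, following the OpenAI chat completion format (values illustrative):

```ts
// Illustrative non-streaming response body (abridged).
const exampleCompletion = {
  id: 'chatcmpl-abc123', // illustrative ID
  object: 'chat.completion',
  created: 1700000000,
  model: 'anthropic/claude-sonnet-4',
  choices: [
    {
      index: 0,
      message: { role: 'assistant', content: 'Entangled particles share correlated states…' },
      finish_reason: 'stop',
    },
  ],
  usage: { prompt_tokens: 12, completion_tokens: 20, total_tokens: 32 },
};
```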

Create a streaming chat completion that streams tokens as they are generated.

Example request
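A sketch using the official `openai` client, which parses the SSE stream for you (model ID illustrative):

```ts
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY, // env var name is illustrative
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

const stream = await client.chat.completions.create({
  model: 'anthropic/claude-sonnet-4', // illustrative model ID
  messages: [{ role: 'user', content: 'Write a short poem about streams.' }],
  stream: true,
});

// Tokens arrive incrementally as SSE chunks.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```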

Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.

The response format follows the OpenAI streaming specification.

Key characteristics:

  • Each line starts with `data: ` followed by JSON
  • Content is delivered incrementally in the `delta.content` field
  • The stream ends with `data: [DONE]`
  • Empty lines separate events
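For illustration, a raw stream might look like this (payloads abridged and invented):

```ts
// Raw SSE stream as received over HTTP, shown as a string for illustration.
const rawStream = `data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"}}]}

data: [DONE]
`;
```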

SSE Parsing Libraries:

If you're building custom SSE parsing (instead of using the OpenAI SDK), these libraries can help:

  • JavaScript/TypeScript: `eventsource-parser` - Robust SSE parsing with support for partial events
  • Python: `httpx-sse` - SSE support for HTTPX, or `sseclient-py` for requests

For more details about the SSE specification, see the W3C specification.

Send images as part of your chat completion request.

Example request
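A sketch following the OpenAI multi-part content format (model ID and URL illustrative):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o', // illustrative vision-capable model
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'What is in this image?' },
          // Remote URLs and base64 data URIs both follow the OpenAI image_url format.
          { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
        ],
      },
    ],
  }),
});
console.log((await res.json()).choices[0].message.content);
```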

Send PDF documents as part of your chat completion request.

Example request
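A sketch using the OpenAI file-content format, sending the PDF as a base64 data URI (model ID and file path illustrative):

```ts
import { readFileSync } from 'node:fs';

// Read a local PDF and encode it as base64.
const pdfBase64 = readFileSync('./document.pdf').toString('base64');

const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4', // illustrative PDF-capable model
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Summarize this document.' },
          {
            type: 'file',
            file: { filename: 'document.pdf', file_data: `data:application/pdf;base64,${pdfBase64}` },
          },
        ],
      },
    ],
  }),
});
console.log((await res.json()).choices[0].message.content);
```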

The AI Gateway supports OpenAI-compatible function calling, allowing models to call tools and functions. This follows the same specification as the OpenAI Function Calling API.

Controlling tool selection: By default, `tool_choice` is set to `auto`, allowing the model to decide when to use tools. You can also:

  • Set `tool_choice` to `none` to disable tool calls
  • Force a specific tool with: `{"type": "function", "function": {"name": "my_function"}}`

When the model makes tool calls, the response includes tool call information:
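A sketch of a request with a hypothetical `get_weather` tool, and the `tool_calls` shape that comes back (per the OpenAI format):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o', // illustrative model ID
    messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather', // hypothetical tool
          description: 'Get the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      },
    ],
    tool_choice: 'auto',
  }),
});

// When the model calls the tool, the assistant message carries a tool_calls
// array instead of text content:
const message = (await res.json()).choices[0].message;
console.log(message.tool_calls);
// e.g. [{ id: 'call_…', type: 'function',
//         function: { name: 'get_weather', arguments: '{"city":"Paris"}' } }]
```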

Generate structured JSON responses that conform to a specific schema, ensuring predictable and reliable data formats for your applications.

Use the OpenAI standard response format for the most robust structured output experience. This follows the official OpenAI Structured Outputs specification.

Example request
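A sketch using a hypothetical `event` schema (model ID illustrative):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o', // illustrative model ID
    messages: [{ role: 'user', content: 'Extract the event details: "Team sync on Friday at 10am".' }],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'event', // hypothetical schema name
        schema: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            day: { type: 'string' },
            time: { type: 'string' },
          },
          required: ['title', 'day', 'time'],
        },
      },
    },
  }),
});

// message.content is a JSON string conforming to the schema.
const event = JSON.parse((await res.json()).choices[0].message.content);
console.log(event);
```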
Response format

The response's `message.content` contains structured JSON that conforms to your specified schema. The `response_format` object itself has the following fields:

  • `type`: Must be `json_schema`
  • `json_schema`: Object containing the schema definition
    • `name` (required): Name of the response schema
    • `description` (optional): Human-readable description of the expected output
    • `schema` (required): Valid JSON Schema object defining the structure

Legacy format: The `json_object` response format is supported for backward compatibility. For new implementations, use the `json_schema` format above.

Both `json_schema` and the legacy `json_object` format work with streaming responses.

Streaming assembly: When using structured outputs with streaming, you'll need to collect all the content chunks and parse the complete JSON response once the stream is finished.
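A sketch of that assembly step, where `stream` is the chunk iterator from the streaming example above:

```ts
declare const stream: AsyncIterable<any>; // parsed chunks from a streaming request

// Accumulate the JSON string across all deltas.
let json = '';
for await (const chunk of stream) {
  json += chunk.choices[0]?.delta?.content ?? '';
}

// Parse only once the stream has finished and the JSON is complete.
const parsed = JSON.parse(json);
console.log(parsed);
```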

Configure reasoning behavior for models that support extended thinking or chain-of-thought reasoning. The `reasoning` parameter allows you to control how reasoning tokens are generated and returned.

Example request
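A sketch (model ID illustrative; use either `max_tokens` or `effort`, not both):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4', // illustrative reasoning-capable model
    messages: [{ role: 'user', content: 'How many weekdays are there between March 1 and March 15?' }],
    reasoning: { enabled: true, max_tokens: 2048 }, // or { effort: 'medium' } — not both
  }),
});
console.log(await res.json());
```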

The `reasoning` object supports the following parameters:

  • `enabled` (boolean, optional): Enable reasoning output. When `true`, the model will provide its reasoning process.
  • `max_tokens` (number, optional): Maximum number of tokens to allocate for reasoning. This helps control costs and response times. Cannot be used with `effort`.
  • `effort` (string, optional): Control reasoning effort level. Accepts `low`, `medium`, or `high`. Cannot be used with `max_tokens`.
  • `exclude` (boolean, optional): When `true`, excludes reasoning content from the response but still generates it internally. Useful for reducing response payload size.

Mutually exclusive parameters: You cannot specify both `max_tokens` and `effort` in the same request. Choose one based on your use case.

When reasoning is enabled, the non-streaming response includes the reasoning content in the assistant message's `reasoning` field; in streaming responses, reasoning content is streamed incrementally in the `reasoning` field of each delta:
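Illustrative shapes (abridged; field names per the description above):

```ts
// Non-streaming: reasoning appears on the assistant message.
const exampleMessage = {
  role: 'assistant',
  content: 'There are 10 weekdays.',
  reasoning: 'Counting Monday through Friday in each week…', // abridged
};

// Streaming: reasoning arrives in delta chunks before/alongside content.
const exampleChunkDelta = { reasoning: 'Counting Monday' };
```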

The AI Gateway preserves reasoning details from models across interactions, normalizing the different formats used by OpenAI, Anthropic, and other providers into a consistent structure. This allows you to switch between models without rewriting your conversation management logic.

This is particularly useful during tool calling workflows where the model needs to resume its thought process after receiving tool results.

Controlling reasoning details

When `exclude` is `false` (or when `exclude` is not set), responses include a `reasoning_details` array alongside the standard text field. This structured field captures cryptographic signatures, encrypted content, and other verification data that providers include with their reasoning output.

Each detail object contains:

  • `type`: one of the following, depending on the provider and model
    • `text`: Contains the actual reasoning content as plain text in the `text` field. May include a `signature` field (Anthropic models) for cryptographic verification.
    • `encrypted`: Contains encrypted or redacted reasoning content in the `data` field. Used by OpenAI models when reasoning is protected, or by Anthropic models when thinking is redacted. Preserves the encrypted payload for verification purposes.
    • `summary`: Contains a condensed version of the reasoning process in the `summary` field. Used by OpenAI models to provide a readable summary alongside encrypted reasoning.
  • `id` (optional): Unique identifier for the reasoning block, used for tracking and correlation
  • `format`: Provider format identifier (for example, an Anthropic or OpenAI format tag)
  • `index` (optional): Position in the reasoning sequence (for responses with multiple reasoning blocks)

Example response with reasoning details

For Anthropic models, the details typically contain a single `text` entry with a `signature`; OpenAI models return both a `summary` entry and an `encrypted` entry:
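A purely illustrative sketch of both shapes (all values invented and abridged):

```ts
// Anthropic-style: plain-text reasoning plus a cryptographic signature.
const anthropicDetails = [
  {
    type: 'text',
    text: 'Let me work through the dates…', // abridged reasoning text
    signature: 'Eq8BCkgIBBABGAIiQ…',        // signature payload (abridged)
    format: 'anthropic',                    // illustrative format tag
    index: 0,
  },
];

// OpenAI-style: a readable summary plus an encrypted payload.
const openaiDetails = [
  { type: 'summary', summary: 'Counted weekdays across two weeks.', format: 'openai', index: 0 },
  { type: 'encrypted', data: 'gAAAAAB…', format: 'openai', index: 1 }, // encrypted reasoning (abridged)
];
```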

Streaming reasoning details

When streaming, reasoning details are delivered incrementally in each chunk's `delta.reasoning_details`. For Anthropic models, `text` chunks arrive as the reasoning is generated; for OpenAI models, `summary` chunks arrive during reasoning, followed by an `encrypted` entry at the end.

The AI Gateway automatically maps reasoning parameters to each provider's native format:

  • OpenAI: Maps to `reasoning_effort` and controls summary detail
  • Anthropic: Maps to `thinking` budget tokens
  • Google: Maps to `thinkingConfig` with budget and visibility settings
  • Groq: Maps to `reasoning_format` to control reasoning format (hidden/parsed)
  • xAI: Maps to reasoning effort levels
  • Other providers: Generic mapping applied for compatibility

Automatic extraction: For models that don't natively support reasoning output, the gateway automatically extracts reasoning from `<think>` tags in the response.

The AI Gateway can route your requests across multiple AI providers for better reliability and performance. You can control which providers are used and in what order through the `providerOptions` parameter.

Example request
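A sketch matching the Vertex-then-Anthropic routing described below (model ID illustrative; treat the exact `gateway.order` keys as an assumption if your gateway version differs):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4', // illustrative model ID
    messages: [{ role: 'user', content: 'Hello!' }],
    providerOptions: {
      gateway: {
        order: ['vertex', 'anthropic'], // try Vertex AI first, then Anthropic
      },
    },
  }),
});
console.log(await res.json());
```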

Provider routing: In this example, the gateway will first attempt to use Vertex AI to serve the Claude model. If Vertex AI is unavailable or fails, it will fall back to Anthropic. Other providers are still available but will only be used after the specified providers.

You can specify fallback models that will be tried in order if the primary model fails. There are two ways to do this:

Option 1: Direct `models` field

The simplest way is to use the `models` field directly at the top level of your request:
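A sketch of the request body (model IDs illustrative):

```ts
// The top-level models array lists fallbacks, tried in order.
const body = {
  model: 'anthropic/claude-sonnet-4',
  models: ['openai/gpt-4o', 'google/gemini-2.5-flash'],
  messages: [{ role: 'user', content: 'Hello!' }],
};
```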

Option 2: Via provider options

Alternatively, you can specify model fallbacks through the `providerOptions` field:
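A sketch of the same fallbacks expressed via provider options (the nesting under `gateway` is an assumption; model IDs illustrative):

```ts
const body = {
  model: 'anthropic/claude-sonnet-4',
  messages: [{ role: 'user', content: 'Hello!' }],
  providerOptions: {
    gateway: {
      models: ['openai/gpt-4o', 'google/gemini-2.5-flash'], // fallbacks, tried in order
    },
  },
};
```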

Which approach to use: Both methods achieve the same result. Use the direct `models` field (Option 1) for simplicity, or `providerOptions` (Option 2) if you're already using provider options for other configurations.

Both configurations will:

  1. Try the primary model (the `model` field) first
  2. If it fails, try the first entry in `models`
  3. If that also fails, try the next entry
  4. Return the result from the first model that succeeds

Provider options work with streaming requests as well: include the same `providerOptions` object in a request with `stream: true`.

For more details about available providers and advanced provider configuration, see the Provider Options documentation.

The chat completions endpoint supports the following parameters:

  • `model` (string): The model to use for the completion (e.g., an ID returned by the list models endpoint)
  • `messages` (array): Array of message objects with `role` and `content` fields
  • `stream` (boolean): Whether to stream the response. Defaults to `false`
  • `temperature` (number): Controls randomness in the output. Range: 0-2
  • `max_tokens` (integer): Maximum number of tokens to generate
  • `top_p` (number): Nucleus sampling parameter. Range: 0-1
  • `frequency_penalty` (number): Penalty for frequent tokens. Range: -2 to 2
  • `presence_penalty` (number): Penalty for present tokens. Range: -2 to 2
  • `stop` (string or array): Stop sequences for the generation
  • `tools` (array): Array of tool definitions for function calling
  • `tool_choice` (string or object): Controls which tools are called (`auto`, `none`, or a specific function)
  • `providerOptions` (object): Provider routing and configuration options
  • `response_format` (object): Controls the format of the model's response
    • For OpenAI standard format: `{ "type": "json_schema", "json_schema": { ... } }`
    • For legacy format: `{ "type": "json_object" }`
    • For plain text: `{ "type": "text" }`
    • See Structured outputs for detailed examples

Messages support different content types:
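A sketch of the supported shapes, following the OpenAI content format (URLs and data URIs are placeholders):

```ts
// Plain text content:
const textMessage = { role: 'user', content: 'Hello!' };

// Multi-part content mixing text, images, and files:
const multiPartMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image and summarize the attached PDF.' },
    { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
    { type: 'file', file: { filename: 'doc.pdf', file_data: 'data:application/pdf;base64,...' } },
  ],
};
```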

Generate images using AI models that support multimodal output through the OpenAI-compatible API. This feature allows you to create images alongside text responses using models like Google's Gemini 2.5 Flash Image.

Endpoint

`POST https://ai-gateway.vercel.sh/v1/chat/completions` (the standard chat completions endpoint)

Parameters

To enable image generation, include the `modalities` parameter in your request:

  • `modalities` (array): Array of strings specifying the desired output modalities. Use `["text", "image"]` for both text and image generation, or `["image"]` for image-only generation.
Example requests
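A sketch (the Gemini 2.5 Flash Image model ID is an assumption):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.5-flash-image', // illustrative model ID
    messages: [{ role: 'user', content: 'Generate an image of a lighthouse at dusk.' }],
    modalities: ['text', 'image'], // request both text and image output
  }),
});
console.log(await res.json());
```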
Response format

When image generation is enabled, the response separates text content from generated images:

  • `content`: Contains the text description as a string
  • `images`: Array of generated images, each with:
    • `type`: Always `image_url`
    • `image_url`: Base64-encoded data URI of the generated image

For streaming requests, images are delivered in delta chunks:

When processing streaming responses, check for both text content and images in each delta:
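A sketch of that per-delta check, where `stream` is a chunk iterator as in the streaming examples above (the `images` delta shape is an assumption based on the fields described here):

```ts
declare const stream: AsyncIterable<any>; // parsed chunks from a streaming request

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta;
  // Text and images can arrive in the same or separate deltas.
  if (delta?.content) process.stdout.write(delta.content);
  for (const image of delta?.images ?? []) {
    // e.g. { type: 'image_url', image_url: { url: 'data:image/png;base64,…' } }
    console.log('received image:', image.image_url?.url?.slice(0, 40), '…');
  }
}
```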

Image generation support: Currently, image generation is supported by Google's Gemini 2.5 Flash Image model. The generated images are returned as base64-encoded data URIs in the response. For more detailed information about image generation capabilities, see the Image Generation documentation.

Generate vector embeddings from input text for semantic search, similarity matching, and retrieval-augmented generation (RAG).

Endpoint

`POST https://ai-gateway.vercel.sh/v1/embeddings`
Example request
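A minimal sketch (embedding model ID illustrative):

```ts
const res = await fetch('https://ai-gateway.vercel.sh/v1/embeddings', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/text-embedding-3-small', // illustrative embedding model ID
    input: 'The quick brown fox jumps over the lazy dog',
    // dimensions: 512, // optional; see "Dimensions parameter" below
  }),
});
console.log(await res.json());
```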
Response format
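A sketch of the shape, following the OpenAI embeddings format (values illustrative):

```ts
// Illustrative response body for POST /v1/embeddings (abridged).
const exampleEmbeddings = {
  object: 'list',
  data: [{ object: 'embedding', index: 0, embedding: [0.0123, -0.0456 /* … */] }],
  model: 'openai/text-embedding-3-small',
  usage: { prompt_tokens: 9, total_tokens: 9 },
};
```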
Dimensions parameter

You can set the root-level `dimensions` field (from the OpenAI Embeddings API spec) and the gateway will auto-map it to each provider's expected field; `providerOptions` still passes through as-is and isn't required for `dimensions` to work.

The API returns standard HTTP status codes and error responses:

  • `400 Bad Request`: Invalid request parameters
  • `401 Unauthorized`: Invalid or missing authentication
  • `403 Forbidden`: Insufficient permissions
  • `404 Not Found`: Model or endpoint not found
  • `429 Too Many Requests`: Rate limit exceeded
  • `500 Internal Server Error`: Server error

If you prefer to use the AI Gateway API directly without the OpenAI client libraries, you can make HTTP requests using any HTTP client. Here is an example using JavaScript's `fetch` API:
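A minimal sketch (model ID illustrative):

```ts
// Plain fetch, no SDK required.
const res = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o', // illustrative model ID
    messages: [{ role: 'user', content: 'Hello from plain HTTP!' }],
  }),
});
console.log((await res.json()).choices[0].message.content);
```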

