Optional effort: If the model supports effort levels, this parameter specifies the effort level.
Optional enable: Whether to enable caching for this request. Implementation depends on the specific provider (the following are examples; many other providers exist):
true - Caching is enabled by default for providers that support it.
Optional frequency: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Optional includeLogProbs: Not all AI providers support this feature. When supported, this parameter indicates whether logprobs are requested. Logprobs report the likelihood of each token in the response, which can be useful for debugging or understanding the model's behavior. When a model supports this and the property is set to true, the model returns logprobs for the tokens in the response.
Optional max: Maximum number of tokens the model may generate in its response.
Array of messages; allows full control over the order and content of the conversation.
Optional minP: Minimum probability threshold for token sampling (0-1). Tokens with probability below this threshold are filtered out before sampling. This is a newer parameter that is not yet widely supported.
Model name, required.
Optional modelSpecificResponseFormat: The standard response formats may not be sufficient for all models. This field allows a model-specific response format to be specified. For this field to be used, responseFormat must be set to 'ModelSpecific'.
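As an illustration of the dependency between these two fields, a minimal sketch; the `FormatOptions` interface, the field names, and the example payload are assumptions inferred from the descriptions here, not a confirmed API:

```typescript
// Hypothetical shape; field names are assumptions for illustration.
interface FormatOptions {
  responseFormat?: string;               // e.g. 'Any' (the default) or 'ModelSpecific'
  modelSpecificResponseFormat?: unknown; // only consulted when responseFormat === 'ModelSpecific'
}

const options: FormatOptions = {
  responseFormat: 'ModelSpecific',
  // Example payload: an OpenAI-style JSON-schema format object (illustrative only).
  modelSpecificResponseFormat: {
    type: 'json_schema',
    json_schema: { name: 'answer', schema: { type: 'object' } },
  },
};
```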
Optional presence: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Optional reasoning: Maximum number of tokens the model may spend on reasoning, for reasoning models.
Optional responseFormat: Specifies the format that the model should output. Not all models support all formats. If not specified, the default is 'Any'.
Optional seed: Seed for reproducible outputs. Not all models support seeding, but when supported, using the same seed with the same inputs should produce identical outputs.
Optional stop: Array of sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.
Optional streaming: Whether to use streaming for this request. If true and the provider supports streaming, responses will be streamed. If true but the provider doesn't support streaming, the request falls back to non-streaming.
Optional streamingCallbacks: Callbacks for streaming responses. Only used when streaming is true.
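A minimal sketch of how streaming callbacks might be wired up; the `onToken`/`onDone` callback names are assumptions, and the actual callback shape depends on the library:

```typescript
// Hypothetical callback shape; names are assumptions for illustration.
interface StreamingCallbacks {
  onToken?: (token: string) => void;    // invoked once per streamed chunk
  onDone?: (fullText: string) => void;  // invoked when the stream completes
}

let received = '';
let final = '';

const request = {
  streaming: true,
  streamingCallbacks: {
    onToken: (t: string) => { received += t; },
    onDone: (text: string) => { final = text; },
  } as StreamingCallbacks,
};

// Simulate a provider that supports streaming and emits three chunks.
for (const chunk of ['Hel', 'lo', '!']) {
  request.streamingCallbacks.onToken?.(chunk);
}
request.streamingCallbacks.onDone?.(received);
```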
Optional temperature: Model temperature. Higher values produce more random output; lower values are more deterministic.
Optional topK: Top-k sampling parameter. Limits the model to sampling from the K most likely tokens at each step. For example, k=50 means the model will only consider the 50 most likely next tokens. Not supported by all providers (e.g., OpenAI doesn't support it).
Optional top: Number of top log probabilities to return per token. Only used when includeLogProbs is true. Typically ranges from 2-20, depending on the provider.
Optional topP: Top-p (nucleus) sampling parameter (0-1). An alternative to temperature sampling that considers the smallest set of tokens whose cumulative probability exceeds the probability p. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. Generally, use either temperature OR top-p, not both.
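Pulling the sampling knobs together, a hedged example of one request's sampling configuration; the penalty field names `frequencyPenalty` and `presencePenalty` are assumptions, and you would normally tune either temperature or topP, not both:

```typescript
// Illustrative sampling configuration; some field names are assumptions.
const samplingOptions = {
  temperature: 0.7,       // moderate randomness; omit if using topP instead
  // topP: 0.9,           // nucleus sampling, an alternative to temperature
  topK: 50,               // only consider the 50 most likely next tokens
  minP: 0.05,             // drop tokens below a 5% probability floor (newer, limited support)
  frequencyPenalty: 0.5,  // discourage verbatim repetition (assumed name)
  presencePenalty: 0.3,   // nudge the model toward new topics (assumed name)
  seed: 42,               // reproducible outputs where seeding is supported
};
```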
Optional cancellation: Token to abort the chat completion request. When this signal is aborted, the provider should cancel the request and return a cancelled result as gracefully as possible.
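A sketch of cancellation using the standard AbortController; the request field name `cancellationSignal` is an assumption based on the description above:

```typescript
const controller = new AbortController();

// Hypothetical request object; only the signal wiring is the point here.
const request = {
  model: 'example-model',
  messages: [{ role: 'user', content: 'Hello' }],
  cancellationSignal: controller.signal,  // assumed field name
};

// Later, e.g. from a timeout or a user-initiated cancel:
controller.abort();
```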