API Reference

Packages

inference.networking.k8s.io/v1

Package v1 contains API Schema definitions for the inference.networking.k8s.io API group.

Resource Types

EndpointPickerConfig

EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint picker extension. This type is intended to be a union of mutually exclusive configuration options that we may add in the future.

Appears in: - InferencePoolSpec

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| extensionRef | Extension | Extension configures an endpoint picker as an extension service. | | Required: {} |

Extension

Extension specifies how to configure an extension that runs the endpoint picker.

Appears in: - EndpointPickerConfig - InferencePoolSpec

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| group | Group | Group is the group of the referent. The default value is "", representing the Core API group. | | MaxLength: 253; Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
| kind | Kind | Kind is the Kubernetes resource kind of the referent. Defaults to "Service" when not specified. ExternalName services can refer to CNAME DNS records that may live outside of the cluster and as such are difficult to reason about in terms of conformance. They also may not be safe to forward to (see CVE-2021-25740 for more information). Implementations MUST NOT support ExternalName Services. | Service | MaxLength: 63; MinLength: 1; Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| name | ObjectName | Name is the name of the referent. | | MaxLength: 253; MinLength: 1; Required: {} |
| portNumber | PortNumber | The port number on the service running the extension. When unspecified, implementations SHOULD infer a default value of 9002 when the Kind is Service. | | Maximum: 65535; Minimum: 1 |
| failureMode | ExtensionFailureMode | Configures how the gateway handles the case when the extension is not responsive. Defaults to FailClose. | FailClose | Enum: [FailOpen FailClose] |
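
The defaulting rules for Extension can be sketched as a small function. This is only an illustration: `apply_extension_defaults` is a hypothetical helper, not part of the API. In a cluster, the API server applies the schema defaults, and the 9002 port is a value that implementations SHOULD infer rather than a schema default.

```python
from typing import Any, Dict


def apply_extension_defaults(ext: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an extensionRef mapping with the documented defaults filled in.

    Hypothetical helper for illustration; it mirrors the defaults listed for Extension.
    """
    if not ext.get("name"):
        # name is the only field marked Required.
        raise ValueError("name is required")
    out = dict(ext)
    out.setdefault("group", "")            # "" means the Core API group
    out.setdefault("kind", "Service")
    out.setdefault("failureMode", "FailClose")
    # The 9002 default is only inferred when the Kind is Service.
    if "portNumber" not in out and out["kind"] == "Service":
        out["portNumber"] = 9002
    return out
```

For example, `apply_extension_defaults({"name": "epp"})` fills in the group, kind, failure mode, and port, while a reference with a non-Service kind is left without a port number.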

ExtensionConnection

ExtensionConnection encapsulates options that configure the connection to the extension.

Appears in: - Extension

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| failureMode | ExtensionFailureMode | Configures how the gateway handles the case when the extension is not responsive. Defaults to FailClose. | FailClose | Enum: [FailOpen FailClose] |

ExtensionFailureMode

Underlying type: string

ExtensionFailureMode defines the options for how the gateway handles the case when the extension is not responsive.

Validation: - Enum: [FailOpen FailClose]

Appears in: - Extension - ExtensionConnection

| Field | Description |
| --- | --- |
| FailOpen | FailOpen specifies that the proxy should forward the request to an endpoint of its picking when the Endpoint Picker fails. |
| FailClose | FailClose specifies that the proxy should drop the request when the Endpoint Picker fails. |
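
The difference between the two modes can be illustrated with a minimal sketch. Everything here is hypothetical (`choose_endpoint`, `RequestDropped`, and the use of the first known endpoint as a stand-in for the proxy's own selection); a real proxy implements this inside its data path.

```python
from typing import Callable, List


class RequestDropped(Exception):
    """Raised when the request is dropped instead of being forwarded."""


def choose_endpoint(
    picker: Callable[[], str],
    endpoints: List[str],
    failure_mode: str = "FailClose",
) -> str:
    """Ask the endpoint picker for a destination and apply the failure mode if it fails."""
    try:
        return picker()
    except Exception:
        if failure_mode == "FailOpen" and endpoints:
            # FailOpen: the proxy forwards to an endpoint of its own picking.
            # Taking the first endpoint is an assumption made for this sketch.
            return endpoints[0]
        # FailClose (the default): the request is dropped.
        raise RequestDropped("endpoint picker failed")
```

With a picker that raises, `FailOpen` returns one of the known endpoints, while `FailClose` raises `RequestDropped`.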

ExtensionReference

ExtensionReference is a reference to the extension.

If a reference is invalid, the implementation MUST update the ResolvedRefs Condition on the InferencePool's status to status: False. A 5XX status code MUST be returned for the request that would have otherwise been routed to the invalid backend.
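
As a rough sketch of this requirement, the function below returns the ResolvedRefs condition together with the status code to use for affected requests. The function name, the set of known Services, and the choice of 500 are assumptions for illustration; the text above only requires some 5XX code.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_extension_ref(
    name: str, known_services: Set[str]
) -> Tuple[Dict[str, str], Optional[int]]:
    """Return (ResolvedRefs condition, HTTP status for affected requests or None)."""
    if name in known_services:
        return {"type": "ResolvedRefs", "status": "True"}, None
    # Invalid reference: the condition becomes False and requests get a 5XX.
    # 500 is an illustrative choice; any 5XX code satisfies the rule above.
    return {"type": "ResolvedRefs", "status": "False"}, 500
```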

Appears in: - Extension

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| group | Group | Group is the group of the referent. The default value is "", representing the Core API group. | | MaxLength: 253; Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
| kind | Kind | Kind is the Kubernetes resource kind of the referent. Defaults to "Service" when not specified. ExternalName services can refer to CNAME DNS records that may live outside of the cluster and as such are difficult to reason about in terms of conformance. They also may not be safe to forward to (see CVE-2021-25740 for more information). Implementations MUST NOT support ExternalName Services. | Service | MaxLength: 63; MinLength: 1; Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| name | ObjectName | Name is the name of the referent. | | MaxLength: 253; MinLength: 1; Required: {} |
| portNumber | PortNumber | The port number on the service running the extension. When unspecified, implementations SHOULD infer a default value of 9002 when the Kind is Service. | | Maximum: 65535; Minimum: 1 |

Group

Underlying type: string

Group refers to a Kubernetes Group. It must either be an empty string or an RFC 1123 subdomain.

This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208

Valid values include:

  • "" - empty string implies core Kubernetes API group
  • "gateway.networking.k8s.io"
  • "foo.example.com"

Invalid values include:

  • "example.com/bar" - "/" is an invalid character

Validation: - MaxLength: 253 - Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
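
To check a value against these rules outside the API server, the pattern and length limit can be applied directly. This is a sketch: `is_valid_group` is a hypothetical helper name, and `fullmatch` is used so that a trailing newline is not accepted by `$`, which is closer to how the API server's regular expression engine treats the anchor.

```python
import re

# Pattern and MaxLength copied from the Validation line above.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(value: str) -> bool:
    """Return True if value is "" or an RFC 1123 subdomain of at most 253 characters."""
    if len(value) > GROUP_MAX_LENGTH:
        return False
    return GROUP_PATTERN.fullmatch(value) is not None
```

The valid and invalid values listed above behave as documented: `""`, `"gateway.networking.k8s.io"`, and `"foo.example.com"` are accepted, while `"example.com/bar"` is rejected.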

Appears in: - Extension - ExtensionReference - ParentGatewayReference

InferencePool

InferencePool is the Schema for the InferencePools API.

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| apiVersion | string | inference.networking.k8s.io/v1 | | |
| kind | string | InferencePool | | |
| metadata | ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
| spec | InferencePoolSpec | | | |
| status | InferencePoolStatus | Status defines the observed state of InferencePool. | {"parent": [{"conditions": [{"lastTransitionTime": "1970-01-01T00:00:00Z", "message": "Waiting for controller", "reason": "Pending", "status": "Unknown", "type": "Accepted"}], "parentRef": {"kind": "Status", "name": "default"}}]} | |

InferencePoolSpec

InferencePoolSpec defines the desired state of InferencePool.

Appears in: - InferencePool

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| selector | object (keys:LabelKey, values:LabelValue) | Selector defines a map of labels to watch model server Pods that should be included in the InferencePool. In some cases, implementations may translate this field to a Service selector, so this matches the simple map used for Service selectors instead of the full Kubernetes LabelSelector type. If specified, it will be applied to match the model server pods in the same namespace as the InferencePool. Cross-namespace selectors are not supported. | | Required: {} |
| targetPortNumber | integer | TargetPortNumber defines the port number to access the selected model server Pods. The number must be in the range 1 to 65535. | | Maximum: 65535; Minimum: 1; Required: {} |
| extensionRef | Extension | Extension configures an endpoint picker as an extension service. | | Required: {} |
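
The required fields and the port range can be checked with a few lines of code before a manifest is submitted. This is a sketch only: `validate_pool_spec` is a hypothetical helper, and it covers just the constraints listed for InferencePoolSpec, not the label key and value patterns.

```python
from typing import Any, Dict, List


def validate_pool_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    errors: List[str] = []
    # All three fields are marked Required.
    for field in ("selector", "targetPortNumber", "extensionRef"):
        if field not in spec:
            errors.append(f"{field} is required")
    port = spec.get("targetPortNumber")
    if port is not None:
        is_int = isinstance(port, int) and not isinstance(port, bool)
        if not (is_int and 1 <= port <= 65535):
            errors.append("targetPortNumber must be an integer from 1 to 65535")
    selector = spec.get("selector")
    if selector is not None and not isinstance(selector, dict):
        errors.append("selector must be a map of label keys to label values")
    return errors
```

A spec such as `{"selector": {"app": "model-server"}, "targetPortNumber": 8000, "extensionRef": {"name": "epp"}}` (all values hypothetical) passes these checks.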

InferencePoolStatus

InferencePoolStatus defines the observed state of InferencePool.

Appears in: - InferencePool

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| parent | PoolStatus array | Parents is a list of parent resources (usually Gateways) that are associated with the InferencePool, and the status of the InferencePool with respect to each parent. A maximum of 32 Gateways will be represented in this list. When the list contains kind: Status, name: default, it indicates that the InferencePool is not associated with any Gateway, and a controller must do the following: (1) remove that parent when setting the "Accepted" condition; (2) add that parent when the controller will no longer manage the InferencePool and no other parents exist. | | MaxItems: 32 |
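
The handling of the placeholder parent (kind: Status, name: default) can be sketched as two list operations. The function names are hypothetical and the sketch simplifies condition handling by replacing a parent's conditions wholesale; a real controller would merge conditions and update timestamps.

```python
from typing import Any, Dict, List

DEFAULT_PARENT_REF = {"kind": "Status", "name": "default"}
MAX_PARENTS = 32


def default_parent() -> Dict[str, Any]:
    """Placeholder entry meaning the pool is not associated with any Gateway."""
    return {
        "parentRef": dict(DEFAULT_PARENT_REF),
        "conditions": [{
            "type": "Accepted",
            "status": "Unknown",
            "reason": "Pending",
            "message": "Waiting for controller",
            "lastTransitionTime": "1970-01-01T00:00:00Z",
        }],
    }


def set_accepted(parents: List[Dict[str, Any]], parent_ref: Dict[str, str],
                 condition: Dict[str, str]) -> List[Dict[str, Any]]:
    """Record an Accepted condition for parent_ref and remove the placeholder parent."""
    kept = [p for p in parents
            if p.get("parentRef") not in (DEFAULT_PARENT_REF, parent_ref)]
    kept.append({"parentRef": parent_ref, "conditions": [condition]})
    if len(kept) > MAX_PARENTS:
        raise ValueError("at most 32 parents may be listed")
    return kept


def release(parents: List[Dict[str, Any]], parent_ref: Dict[str, str]) -> List[Dict[str, Any]]:
    """Drop parent_ref; restore the placeholder if no other parents remain."""
    kept = [p for p in parents if p.get("parentRef") != parent_ref]
    return kept if kept else [default_parent()]
```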

Kind

Underlying type: string

Kind refers to a Kubernetes Kind.

Valid values include:

  • "Service"
  • "HTTPRoute"

Invalid values include:

  • "invalid/kind" - "/" is an invalid character

Validation: - MaxLength: 63 - MinLength: 1 - Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$

Appears in: - Extension - ExtensionReference - ParentGatewayReference

LabelKey

Underlying type: string

LabelKey was originally copied from https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731 and is duplicated here so as not to take an unexpected dependency on the Gateway API.

LabelKey is the key of a label. This is used for validation of maps. This matches the Kubernetes "qualified name" validation that is used for labels. Labels are case sensitive, so: my-label and My-Label are considered distinct.

Valid values include:

  • example
  • example.com
  • example.com/path
  • example.com/path.html

Invalid values include:

  • example~ - "~" is an invalid character
  • example.com. - cannot start or end with "."

Validation: - MaxLength: 253 - MinLength: 1 - Pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$

Appears in: - InferencePoolSpec

LabelValue

Underlying type: string

LabelValue is the value of a label. This is used for validation of maps. This matches the Kubernetes label validation rules:

  • must be 63 characters or less (can be empty),
  • unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
  • could contain dashes (-), underscores (_), dots (.), and alphanumerics between.

Valid values include:

  • MyValue
  • my.name
  • 123-my-value

Validation: - MaxLength: 63 - MinLength: 0 - Pattern: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$
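
A minimal sketch of the same check in code, assuming a hypothetical helper named `is_valid_label_value`. The invalid inputs mentioned afterwards are not from the list above; they follow from the rule that a non-empty value must begin and end with an alphanumeric character.

```python
import re

# Pattern and MaxLength copied from the Validation line above.
LABEL_VALUE_PATTERN = re.compile(r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$")
LABEL_VALUE_MAX_LENGTH = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value is empty or a valid label value of at most 63 characters."""
    if len(value) > LABEL_VALUE_MAX_LENGTH:
        return False
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return LABEL_VALUE_PATTERN.fullmatch(value) is not None
```

`MyValue`, `my.name`, `123-my-value`, and the empty string are accepted; values such as `-leading` or `trailing-` are rejected.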

Appears in: - InferencePoolSpec

Namespace

Underlying type: string

Namespace refers to a Kubernetes namespace. It must be an RFC 1123 label.

This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L187

This is used for Namespace name validation here: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/api/validation/generic.go#L63

Valid values include:

  • "example"

Invalid values include:

  • "example.com" - "." is an invalid character

Validation: - MaxLength: 63 - MinLength: 1 - Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$

Appears in: - ParentGatewayReference

ObjectName

Underlying type: string

ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.

Validation: - MaxLength: 253 - MinLength: 1

Appears in: - Extension - ExtensionReference - ParentGatewayReference

ParentGatewayReference

ParentGatewayReference identifies an API object including its namespace, defaulting to Gateway.

Appears in: - PoolStatus

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| group | Group | Group is the group of the referent. | gateway.networking.k8s.io | MaxLength: 253; Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
| kind | Kind | Kind is the kind of the referent. For example "Gateway". | Gateway | MaxLength: 63; MinLength: 1; Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| name | ObjectName | Name is the name of the referent. | | MaxLength: 253; MinLength: 1 |
| namespace | Namespace | Namespace is the namespace of the referent. If not present, the namespace of the referent is assumed to be the same as the namespace of the referring object. | | MaxLength: 63; MinLength: 1; Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ |
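
The defaulting behavior of a ParentGatewayReference can be illustrated with a short sketch. `normalize_parent_ref` is a hypothetical helper; it assumes the reference carries a `name` and that the caller knows the namespace of the referring InferencePool.

```python
from typing import Dict


def normalize_parent_ref(ref: Dict[str, str], pool_namespace: str) -> Dict[str, str]:
    """Fill in the documented defaults for group, kind, and namespace."""
    return {
        # dict.get is used instead of "or" so that an explicit "" group
        # (the Core API group) is preserved rather than replaced.
        "group": ref.get("group", "gateway.networking.k8s.io"),
        "kind": ref.get("kind", "Gateway"),
        "name": ref["name"],
        # A missing namespace means "same namespace as the referring object".
        "namespace": ref.get("namespace", pool_namespace),
    }
```

For instance, a reference containing only a name resolves to a Gateway in the `gateway.networking.k8s.io` group, in the same namespace as the InferencePool.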

PoolStatus

PoolStatus defines the observed state of InferencePool from a Gateway.

Appears in: - InferencePoolStatus

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| parentRef | ParentGatewayReference | GatewayRef indicates the gateway that observed the state of the InferencePool. | | |
| conditions | Condition array | Conditions track the state of the InferencePool. Known condition types are: "Accepted", "ResolvedRefs". | [{"lastTransitionTime": "1970-01-01T00:00:00Z", "message": "Waiting for controller", "reason": "Pending", "status": "Unknown", "type": "Accepted"}] | MaxItems: 8 |

PortNumber

Underlying type: integer

PortNumber defines a network port.

Validation: - Maximum: 65535 - Minimum: 1

Appears in: - Extension - ExtensionReference