Interact with a Deployment
You can interact with a deployed inference service by sending a request to the service’s endpoint. The service will respond with a prediction based on the input data.
Before You Start #
- Ensure you have completed the Developer System Setup.
- Ensure that you have an active inference service deployed and in a `Ready` state (`aioli d`).
- Depending upon your Kubernetes configuration, you may need to set up:
  - A DNS domain name or Magic DNS configuration to access the service via the endpoint hostname.
  - A `port-forward` to enable interactions with your service to perform inference requests. See the KServe documentation for more information.
How to Interact with a Service #
Using Service Endpoint Hostname #
If a DNS or Magic DNS has been configured, you can interact with the service using the endpoint hostname provided in the UI or CLI output after deploying the service. For example, the endpoint hostname `http://iris-classifier-deployment.default.example.com` can be used to interact with the service:
curl -s \
  -H "Authorization: Bearer <YOUR_ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  http://iris-classifier-deployment.default.example.com/classify \
  -d '[[5.9, 3, 5.1, 1.8]]'
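
If you prefer to call the service programmatically, the same request can be issued from Python. The following is a minimal sketch, assuming the `requests` package is installed; the endpoint hostname and access token are placeholders that you replace with the values for your own deployment:

```python
# Minimal sketch: the Python equivalent of the curl command above.
# The endpoint URL and token are placeholders, not values produced by this guide.
import requests

endpoint = "http://iris-classifier-deployment.default.example.com/classify"
headers = {
    "Authorization": "Bearer <YOUR_ACCESS_TOKEN>",
    "Content-Type": "application/json",
}

# One sample with four feature values, matching the curl payload above
response = requests.post(endpoint, headers=headers, json=[[5.9, 3, 5.1, 1.8]], timeout=30)
response.raise_for_status()
print(response.json())  # for example: {"prediction": 2}
```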
Using Deployment Access Token #
You can use a `port-forward` via the ingress gateway service, together with the endpoint shown in the deployment state, using the following commands:
- Set up environment variables for the ingress gateway service.

  export INGRESS_HOST=localhost
  export INGRESS_PORT=8100
  INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
- Use `port-forward` to access the service.

  kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} ${INGRESS_PORT}:80
- If authentication is required, generate a deployment token with the following command:

  aioli token create <DEPLOYMENT_NAME>
- To view the details of the generated token, including the token itself, use:

  aioli token show <TOKEN_ID>
- Interact with the service using `curl` commands (a Python equivalent is sketched after this list).

  curl -s \
    -H "Host: iris-classifier-deployment.default.example.com" \
    -H "Authorization: Bearer <YOUR_ACCESS_TOKEN>" \
    -H "Content-Type: application/json" \
    http://${INGRESS_HOST}:${INGRESS_PORT}/classify \
    -d '[[5.9, 3, 5.1, 1.8]]'

  The service responds with the predicted class:

  {"prediction":2}
As an alternative to a deployment token, you can create a model token:

aioli model token <MODEL_NAME>
token: <MODEL_TOKEN_HERE>
Determining the API Route #
The API route used in this guide (`/classify`) is specific to the service and may vary based on the service's implementation. For example, if the model image was created using `bentoml`, the API route would be defined in the `service.py` file:
from typing import Dict, Any

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray, JSON
import torch

iris_clf_runner = bentoml.pytorch.get("iris:latest").to_runner()

svc = bentoml.Service("iris", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=JSON())
async def classify(input_array: np.ndarray) -> Dict[str, Any]:
    # Convert the numpy array to a float tensor
    input_tensor = torch.FloatTensor(input_array)
    # Run the prediction. Since this is an async endpoint, we use `async_run` instead of `run`
    result_tensor = await iris_clf_runner.async_run(input_tensor)
    # Convert the output tensor to numpy for post-processing
    result_array = result_tensor.numpy()
    # Assuming the model outputs class probabilities, take the argmax to get the class label
    class_label = np.argmax(result_array, axis=1)[0]
    # Convert to a native Python int for JSON serialization
    return {"prediction": int(class_label)}