Integrate your own LLMs
You can add a container with a custom model directly to hila, or you can connect to an external model.
For internal models, start with step 1. For external models, start with step 2.
1. (Internal only) Build a custom container with the model.
   - Expose the port with the REST endpoint, "/predict".
   - Push the container to your container registry.
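
   A minimal sketch of such a container's entrypoint, assuming a Flask HTTP server and a stubbed inference function (both are illustrative choices, not part of hila):

   # serve.py - serves POST /predict; any HTTP framework works.
   from flask import Flask, request, jsonify

   app = Flask(__name__)

   def run_inference(prompt: str) -> str:
       # Placeholder: replace with your model's actual inference call.
       return "model output for: " + prompt

   @app.route("/predict", methods=["POST"])
   def predict():
       payload = request.get_json(force=True)
       result = run_inference(payload.get("prompt", ""))
       return jsonify({"prediction": result})

   if __name__ == "__main__":
       # Listen on the port that the container image exposes.
       app.run(host="0.0.0.0", port=8080)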
2. Create a custom job. A job starts the container that has the custom model.
   Post the following body to the POST /v1/job-types API, giving a custom job name and the path to the image in your container registry:

   {
     "name": "string",
     "image_url": "string",
     "runtime_object": "knative"
   }
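
   For example, with Python's requests library (the host URL and bearer-token header are assumptions; substitute your deployment's base URL and auth scheme):

   import requests

   HILA_HOST = "https://your-hila-host"   # assumption: your hila base URL
   API_TOKEN = "<YOUR_TOKEN>"             # assumption: bearer-token auth

   body = {
       "name": "my-custom-llm-job",                        # custom job name
       "image_url": "registry.example.com/my-llm:latest",  # image in your registry
       "runtime_object": "knative",
   }

   resp = requests.post(
       f"{HILA_HOST}/v1/job-types",
       json=body,
       headers={"Authorization": f"Bearer {API_TOKEN}"},
   )
   resp.raise_for_status()
   print(resp.json())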
3. Use or edit an existing wrapper, or create a new one.
   To view the wrappers:
   - Open edahub in the hila monitoring app. See Open edahub.
   - In the left pane of edahub, navigate to /source/vianai/llm. The wrappers reside in this directory.

   The wrappers associated with the hila models are:

   Model                     Wrapper
   gpt-4o                    vianai.llm.GPTWrapper.GPTWrapper
   gpt-4o-mini               vianai.llm.GPTWrapper.GPTWrapper
   azure-openai-gpt4o        vianai.llm.AzureGPTWrapper.AzureGPTWrapper
   azure-openai-gpt4o-mini   vianai.llm.AzureGPTWrapper.AzureGPTWrapper
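
   If none of these fit your model, a new wrapper can follow the same shape. The outline below is hypothetical: the actual base class, constructor signature, and method names are defined by the wrappers in /source/vianai/llm, so mirror one of those (for example GPTWrapper) rather than this sketch.

   import requests

   class CustomModelWrapper:
       # Hypothetical outline; align the interface with an existing
       # wrapper in /source/vianai/llm before relying on it.
       def __init__(self, model_api_url, api_token, init_config_params=None):
           self.model_api_url = model_api_url
           self.api_token = api_token
           self.config = init_config_params or {}

       def predict(self, prompt):
           # Forward the prompt to the model's REST endpoint.
           resp = requests.post(
               self.model_api_url,
               json={"prompt": prompt, **self.config},
               headers={"Authorization": f"Bearer {self.api_token}"},
           )
           resp.raise_for_status()
           return resp.json()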
4. Create a placeholder model object that points to the model and the wrapper, and passes the parameters that the wrapper needs to interact with the LLM.
   The object must contain the predict_metadata tag as shown in the following examples. In init_config_params, tweak temperature or top_p, but not both; max_tokens limits the number of output tokens.

   OpenAI example object

   {
     "project_name": "LLM Project",
     "model_name_ext": "gpt-4o",
     "model_type": "chat",
     "model_class": "vianai.llm.GPTWrapper.GPTWrapper",
     "model_api_url": "https://api.openai.com/v1/chat/completions",
     "api_token": "<YOUR_TOKEN>",
     "init_config_params": {
       "temperature": 0,
       "max_tokens": 1000,
       "top_p": 1,
       "response_format": {"type": "json_object"}
     },
     "prompt_template": {
       "role": "user",
       "content": "Can you explain the advancements in chatgpt-4o compared to chatGPT-3.5-turbo?"
     },
     "response_format": {"type": "json_object"},
     "max_output_tokens": 8192,
     "cost_output_tokens": 0.015
   }

   Azure OpenAI example object

   {
     "name": "predict_metadata",
     "value": {
       "model_name_ext": "azure-gpt-4o-deployment",
       "model_class": "vianai.llm.AzureGPTWrapper.AzureGPTWrapper",
       "model_type": "chat",
       "task_type": "question_answering",
       "api_token": "<YOUR_TOKEN>",
       "init_config_params": {
         "temperature": 0.0,
         "max_tokens": 1000,
         "top_p": 1,
         "response_format": {"type": "json_object"}
       },
       "model_file": "",
       "max_output_tokens": 4096,
       "prompt_template": {
         "messages": [
           {"role": "system", "content": "Output JSON"},
           {"role": "user", "content": ""},
           {"role": "assistant", "content": ""}
         ]
       },
       "cost_output_tokens": 0.015,
       "api_type": "azure",
       "api_version": "2024-06-01",
       "azure_endpoint": "https://hila-benchmarking.openai.azure.com/"
     },
     "status": "active"
   }
5. Post the predict_metadata object to the POST /v1/models API to save it.
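
   For example, reusing the host and token conventions from the job-types call above (both are assumptions about your deployment):

   import json
   import requests

   HILA_HOST = "https://your-hila-host"   # assumption: your hila base URL
   API_TOKEN = "<YOUR_TOKEN>"             # assumption: bearer-token auth

   # Load one of the example objects above, saved here as predict_metadata.json.
   with open("predict_metadata.json") as f:
       predict_metadata = json.load(f)

   resp = requests.post(
       f"{HILA_HOST}/v1/models",
       json=predict_metadata,
       headers={"Authorization": f"Bearer {API_TOKEN}"},
   )
   resp.raise_for_status()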