Integrate your own LLMs
You can add a container with a custom model directly to hila, or you can connect to an external model.
For internal models, start with step 1. For external models, start with step 2.
1. (Internal only) Build a custom container with the model.
   - Expose the port with the REST endpoint, "/predict".
   - Push the container to your container registry.
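
   A minimal sketch of such a container's entrypoint, assuming a Flask HTTP server and a stubbed inference function (both are illustrative choices, not part of hila):

   # serve.py - serves POST /predict; any HTTP framework works.
   from flask import Flask, request, jsonify

   app = Flask(__name__)

   def run_inference(prompt: str) -> str:
       # Placeholder: replace with your model's actual inference call.
       return "model output for: " + prompt

   @app.route("/predict", methods=["POST"])
   def predict():
       payload = request.get_json(force=True)
       result = run_inference(payload.get("prompt", ""))
       return jsonify({"prediction": result})

   if __name__ == "__main__":
       # Listen on the port that the container image exposes.
       app.run(host="0.0.0.0", port=8080)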
2. Create a custom job. A job starts the container that has the custom model.
   Post the following body to the POST /v1/job-types API, giving a custom job name and the path to the image in your container registry:

   {
     "name": "string",
     "image_url": "string",
     "runtime_object": "knative"
   }
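
   For example, with Python's requests library (the host URL and bearer-token header are assumptions; substitute your deployment's base URL and auth scheme):

   import requests

   HILA_HOST = "https://your-hila-host"   # assumption: your hila base URL
   API_TOKEN = "<YOUR_TOKEN>"             # assumption: bearer-token auth

   body = {
       "name": "my-custom-llm-job",                        # custom job name
       "image_url": "registry.example.com/my-llm:latest",  # image in your registry
       "runtime_object": "knative",
   }

   resp = requests.post(
       f"{HILA_HOST}/v1/job-types",
       json=body,
       headers={"Authorization": f"Bearer {API_TOKEN}"},
   )
   resp.raise_for_status()
   print(resp.json())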
3. Use or edit an existing wrapper, or create a new one.
   To view the wrappers:
   - Open edahub in the hila monitoring app. See Open edahub.
   - In the left pane of edahub, navigate to /source/vianai/llm. The wrappers reside in this directory.

   The wrappers associated with the hila models are:

   Model                     Wrapper
   gpt-4o                    vianai.llm.GPTWrapper.GPTWrapper
   gpt-4o-mini               vianai.llm.GPTWrapper.GPTWrapper
   azure-openai-gpt4o        vianai.llm.AzureGPTWrapper.AzureGPTWrapper
   azure-openai-gpt4o-mini   vianai.llm.AzureGPTWrapper.AzureGPTWrapper
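
   If none of these fit your model, a new wrapper can follow the same shape. The outline below is hypothetical: the actual base class, constructor signature, and method names are defined by the wrappers in /source/vianai/llm, so mirror one of those (for example GPTWrapper) rather than this sketch.

   import requests

   class CustomModelWrapper:
       # Hypothetical outline; align the interface with an existing
       # wrapper in /source/vianai/llm before relying on it.
       def __init__(self, model_api_url, api_token, init_config_params=None):
           self.model_api_url = model_api_url
           self.api_token = api_token
           self.config = init_config_params or {}

       def predict(self, prompt):
           # Forward the prompt to the model's REST endpoint.
           resp = requests.post(
               self.model_api_url,
               json={"prompt": prompt, **self.config},
               headers={"Authorization": f"Bearer {self.api_token}"},
           )
           resp.raise_for_status()
           return resp.json()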
4. Create a placeholder model object that points to the model and the wrapper, and passes the parameters that the wrapper needs to interact with the LLM.
   The object must contain the predict_metadata tag as shown in the following examples. In init_config_params, tweak temperature or top_p, but not both; max_tokens limits the number of output tokens.

   OpenAI example object

   {
     "project_name": "LLM Project",
     "model_name_ext": "gpt-4o",
     "model_type": "chat",
     "model_class": "vianai.llm.GPTWrapper.GPTWrapper",
     "model_api_url": "https://api.openai.com/v1/chat/completions",
     "api_token": "<YOUR_TOKEN>",
     "init_config_params": {
       "temperature": 0,
       "max_tokens": 1000,
       "top_p": 1,
       "response_format": {"type": "json_object"}
     },
     "prompt_template": {
       "role": "user",
       "content": "Can you explain the advancements in chatgpt-4o compared to chatGPT-3.5-turbo?"
     },
     "response_format": {"type": "json_object"},
     "max_output_tokens": 8192,
     "cost_output_tokens": 0.015
   }

   Azure OpenAI example object

   {
     "name": "predict_metadata",
     "value": {
       "model_name_ext": "azure-gpt-4o-deployment",
       "model_class": "vianai.llm.AzureGPTWrapper.AzureGPTWrapper",
       "model_type": "chat",
       "task_type": "question_answering",
       "api_token": "<YOUR_TOKEN>",
       "init_config_params": {
         "temperature": 0.0,
         "max_tokens": 1000,
         "top_p": 1,
         "response_format": {"type": "json_object"}
       },
       "model_file": "",
       "max_output_tokens": 4096,
       "prompt_template": {
         "messages": [
           {"role": "system", "content": "Output JSON"},
           {"role": "user", "content": ""},
           {"role": "assistant", "content": ""}
         ]
       },
       "cost_output_tokens": 0.015,
       "api_type": "azure",
       "api_version": "2024-06-01",
       "azure_endpoint": "https://hila-benchmarking.openai.azure.com/"
     },
     "status": "active"
   }
5. Post the predict_metadata object to the POST /v1/models API to save it.
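
   For example, reusing the host and token conventions from the job-types call above (both are assumptions about your deployment):

   import json
   import requests

   HILA_HOST = "https://your-hila-host"   # assumption: your hila base URL
   API_TOKEN = "<YOUR_TOKEN>"             # assumption: bearer-token auth

   # Load one of the example objects above, saved here as predict_metadata.json.
   with open("predict_metadata.json") as f:
       predict_metadata = json.load(f)

   resp = requests.post(
       f"{HILA_HOST}/v1/models",
       json=predict_metadata,
       headers={"Authorization": f"Bearer {API_TOKEN}"},
   )
   resp.raise_for_status()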