
    Integrate your own LLMs

    You can integrate your own LLM in two ways: add a container with a custom model directly to hila, or connect to an externally hosted model.

    For internal models, start with step 1. For external models, skip to step 2.

    1. (Internal only) Build a custom container with the model.

      • Expose the port that serves the REST endpoint /predict; a minimal server sketch follows this list.
      • Push the container to your container registry.
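
      The following is a minimal sketch of a container entrypoint that serves /predict. The framework (FastAPI), the port, and the request/response shapes are assumptions for illustration, not hila requirements; adapt them to your model.

       # main.py: hypothetical /predict server for a custom model container
       from fastapi import FastAPI
       from pydantic import BaseModel

       app = FastAPI()

       class PredictRequest(BaseModel):
           prompt: str

       @app.post("/predict")
       def predict(req: PredictRequest):
           # Replace this placeholder with a call into your model.
           return {"output": f"model response for: {req.prompt}"}

       # Run with: uvicorn main:app --host 0.0.0.0 --port 8080
       # and EXPOSE the same port in your Dockerfile.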
    2. Create a custom job. A job starts the container that has the custom model.

      Post the following body to the /v1/job-types POST API, giving a custom job name and the path to the image in your container registry.

       {
           "name": "string",
           "image_url": "string",
           "runtime_object": "knative"
       }
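
      As a sketch, the POST can be made with Python requests; the base URL and the bearer-token header are placeholders for your hila deployment, and the job name and image path below are illustrative values:

       import requests

       HILA_BASE_URL = "https://<your-hila-host>"               # placeholder
       headers = {"Authorization": "Bearer <YOUR_API_TOKEN>"}   # assumption: bearer auth

       body = {
           "name": "my-custom-llm-job",                                # your custom job name
           "image_url": "registry.example.com/team/custom-llm:1.0",    # image path in your registry
           "runtime_object": "knative",
       }

       resp = requests.post(f"{HILA_BASE_URL}/v1/job-types", json=body, headers=headers)
       resp.raise_for_status()
       print(resp.json())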
      
    3. Use or edit an existing wrapper, or create a new one.

      To view the wrappers:

      1. Open edahub in the hila monitoring app. See Open edahub.

      2. In the left pane of edahub, navigate to /source/vianai/llm. The wrappers reside in this directory.

      The wrappers associated with the hila models are:

      Model                     Wrapper class
      gpt-4o                    vianai.llm.GPTWrapper.GPTWrapper
      gpt-4o-mini               vianai.llm.GPTWrapper.GPTWrapper
      azure-openai-gpt4o        vianai.llm.AzureGPTWrapper.AzureGPTWrapper
      azure-openai-gpt4o-mini   vianai.llm.AzureGPTWrapper.AzureGPTWrapper
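
      If you create a new wrapper, model it on one of the shipped wrappers in /source/vianai/llm (for example, GPTWrapper). The skeleton below is purely illustrative: the class name, constructor parameters, and predict method are assumptions chosen to mirror the fields of the model object in step 4, not the actual hila wrapper interface.

       # Hypothetical wrapper skeleton; copy a real wrapper from /source/vianai/llm instead.
       import requests

       class MyModelWrapper:
           def __init__(self, model_api_url, api_token, init_config_params=None):
               self.model_api_url = model_api_url
               self.api_token = api_token
               self.config = init_config_params or {}

           def predict(self, prompt):
               # Forward the prompt and configuration to the model's endpoint.
               resp = requests.post(
                   self.model_api_url,
                   json={"prompt": prompt, **self.config},
                   headers={"Authorization": f"Bearer {self.api_token}"},
               )
               resp.raise_for_status()
               return resp.json()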
    4. Create a placeholder model object that points to the model and the wrapper, and passes parameters needed for the wrapper to interact with the LLM.

      The object must contain the predict_metadata tag as shown in the following examples.

      OpenAI example object

       {
           "project_name": "LLM Project",
           "model_name_ext": "gpt-4o",
           "model_type": "chat",
           "model_class": "vianai.llm.GPTWrapper.GPTWrapper",
           "model_api_url": "https://api.openai.com/v1/chat/completions",
           "api_token": "<YOUR_TOKEN>",
           "init_config_params": {
               "temperature": 0,
               "max_tokens": 1000,
               "top_p": 1,
               "response_format": {"type": "json_object"},
           },
           "prompt_template": {
               "role": "user",
               "content": "Can you explain the advancements in chatgpt-4o compared to chatGPT-3.5-turbo?"
           },
           "response_format": {"type": "json_object"},
           "max_output_tokens": 8192,
           "cost_output_tokens": 0.015
       }
      

      Azure OpenAI example object

      In init_config_params, adjust temperature or top_p, but not both; max_tokens caps the number of output tokens.

       {
           "name": "predict_metadata",
           "value": {
               "model_name_ext": "azure-gpt-4o-deployment",
               "model_class": "vianai.llm.AzureGPTWrapper.AzureGPTWrapper",
               "model_type": "chat",
               "task_type": "question_answering",
               "api_token": "YOUR_TOKEN",
               "init_config_params": {
                   "temperature": 0.0,  # tweak temperature or top_p, not both
                   "max_tokens": 1000,  # in output
                   "top_p": 1,
                   "response_format": {"type": "json_object"},
               },
               "model_file": "",
               "max_output_tokens": 4096,
               "prompt_template": {
                   "messages": [
                       {"role": "system", "content": "Output JSON"},
                       {"role": "user", "content": ""},
                       {"role": "assistant", "content": ""},
                   ],
                   "cost_output_tokens": 0.015,
               },
               "api_type": "azure",
               "api_version": "2024-06-01",
               "azure_endpoint": "https://hila-benchmarking.openai.azure.com/"
           },
           "status": "active",
       }
      
    5. Post the predict_metadata object to the /v1/models POST API to save it.
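
      As with the job type, the call can be sketched with Python requests; the base URL and auth header are placeholders for your deployment, and predict_metadata.json stands in for the object you assembled in step 4:

       import json
       import requests

       HILA_BASE_URL = "https://<your-hila-host>"               # placeholder
       headers = {"Authorization": "Bearer <YOUR_API_TOKEN>"}   # assumption: bearer auth

       # Load the predict_metadata object assembled in step 4.
       with open("predict_metadata.json") as f:
           model_object = json.load(f)

       resp = requests.post(f"{HILA_BASE_URL}/v1/models", json=model_object, headers=headers)
       resp.raise_for_status()
       print(resp.json())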
