
    Hallucination Framework for hila

    Detecting hallucinations in LLM-generated answers

    The main approach to hallucination mitigation in hila is post-LLM-generation detection. This means that hila detects hallucinations after they occur and then either highlights the potential hallucinations, removes them directly from the original answer, or generates a new answer that specifically targets and eliminates them.
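
    As a concrete illustration, the sketch below shows how these three mitigation strategies could be wired together. The function names (detect_hallucinated_sentences, llm) and the prompt are placeholders for illustration, not hila's actual API.

```python
# A minimal sketch of post-generation mitigation, assuming a hypothetical
# detector and a hypothetical LLM callable; not hila's actual implementation.
from typing import Callable, List


def mitigate(answer_sentences: List[str],
             context: str,
             detect_hallucinated_sentences: Callable[[List[str], str], List[int]],
             llm: Callable[[str], str],
             strategy: str = "highlight") -> str:
    """Highlight, remove, or regenerate depending on the chosen strategy."""
    flagged = set(detect_hallucinated_sentences(answer_sentences, context))

    if strategy == "highlight":
        # Mark each potentially hallucinated sentence for the reader.
        return " ".join(
            f"[POSSIBLE HALLUCINATION] {s}" if i in flagged else s
            for i, s in enumerate(answer_sentences)
        )
    if strategy == "remove":
        # Drop the flagged sentences and keep the rest of the answer.
        return " ".join(s for i, s in enumerate(answer_sentences) if i not in flagged)

    # strategy == "regenerate": ask the model for a new answer that specifically
    # corrects or removes the flagged statements.
    flagged_text = "\n".join(answer_sentences[i] for i in sorted(flagged))
    prompt = (
        "Rewrite the answer below using only facts supported by the context.\n"
        f"Context:\n{context}\n\n"
        f"Answer:\n{' '.join(answer_sentences)}\n\n"
        f"Unsupported statements to correct or remove:\n{flagged_text}"
    )
    return llm(prompt)
```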

    hila is primarily concerned with faithfulness hallucinations, that is, generated text that diverges from the context retrieved by the retrieval system. In particular, hila focuses on identifying context inconsistency and logical inconsistency. For context inconsistency, hila detects whether an LLM-generated answer incorrectly states information derived from the context or states information that directly conflicts with facts present in the context. For logical inconsistency, hila detects whether the LLM fails to maintain a series of logical deductions based on the provided context.
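
    A made-up example of the two failure modes:

```python
# Hypothetical example, not taken from real data.
context = "The company's Q3 revenue was $2.1B, down 4% year over year."

# Context inconsistency: the answer states a figure that conflicts with the context.
context_inconsistent_answer = "The company's Q3 revenue was $3.4B."

# Logical inconsistency: the individual facts come from the context,
# but the conclusion does not follow from them.
logically_inconsistent_answer = (
    "Revenue was $2.1B, down 4% year over year, "
    "so the company grew faster than it did last year."
)
```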

    When an LLM-generated answer is unfaithful to the retrieved context, either by contradicting it or by drawing unwarranted conclusions from it, hila aims to highlight exactly which sentences in the generated answer are potential hallucinations. It also specifies the sentences in the context on which it bases its detection.
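
    A minimal sketch of what such a sentence-level detection result could look like; the class and field names are illustrative, not hila's actual schema.

```python
# Illustrative result structure: each flagged answer sentence is paired with
# the context sentences the detection was based on.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlaggedSentence:
    answer_sentence: str           # sentence in the generated answer
    supporting_context: List[str]  # context sentences used for the decision
    score: float                   # hallucination score for this sentence


@dataclass
class DetectionResult:
    answer: str
    flagged: List[FlaggedSentence] = field(default_factory=list)

    @property
    def is_faithful(self) -> bool:
        return not self.flagged
```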

    hila hallucination detection techniques include the following:

    1. Entailment-based detection — this is a proprietary, patent-pending VIANAI technique employed in hila.

      By determining whether each sentence in the LLM-generated answer is entailed by the most similar sentences in the context, hila can arrive at a measure of hallucination. Vianai has determined through extensive research that simply checking the entailment score between answer sentences and context sentences is insufficient, because it does not capture the variability in sentences found in the answer text and source text. hila therefore applies a thorough process that extends beyond pure entailment and leverages additional techniques to capture sentence variability and nuances in the text. We are thereby able to provide a hallucination score that accounts for differences in sentence structure between an LLM-generated answer and the sources the LLM uses to generate that answer. A simplified illustration of the core entailment check appears after this list.

    2. Iterative refinement technique

      An LLM is used to determine whether or not an LLM-generated answer contains a hallucination with regard to a provided context. This check is performed multiple times, for each piece of context, to obtain a hallucination score. hila then applies additional steps that refine the final answer based on the detected hallucinations. This technique not only produces a metric for hallucination (the hallucination score), but also provides an enhanced answer that overcomes the detected hallucinations. A sketch of this loop appears after this list.

    3. Multistep verification technique

      Each LLM-generated answer is sent to an LLM to generate follow-up questions. hila uses these follow-up questions to retrieve more context from the retrieval system, and then uses that additional context to answer the follow-up questions. The follow-up responses are then used by an LLM to modify the original answer, which leads to a more refined answer. A sketch of this flow appears after this list.
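
    The sketch below illustrates the core entailment check behind technique 1, using open-source models as stand-ins. hila's patent-pending method goes well beyond this plain entailment scoring; the model names, similarity search, and 1 - entailment aggregation here are assumptions for illustration only.

```python
# Simplified entailment-based scoring: for each answer sentence, find its most
# similar context sentences and check whether any of them entails it.
# Models and aggregation are illustrative choices, not hila's.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_MODEL = "roberta-large-mnli"
embedder = SentenceTransformer("all-MiniLM-L6-v2")
tokenizer = AutoTokenizer.from_pretrained(NLI_MODEL)
nli = AutoModelForSequenceClassification.from_pretrained(NLI_MODEL)
ENTAIL = {label.upper(): idx for idx, label in nli.config.id2label.items()}["ENTAILMENT"]


def entailment_score(premise: str, hypothesis: str) -> float:
    """Probability that the premise (a context sentence) entails the hypothesis."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli(**inputs).logits, dim=-1)
    return probs[0, ENTAIL].item()


def hallucination_scores(answer_sentences, context_sentences, top_k=3):
    """Per answer sentence: 1 - best entailment over its most similar context sentences."""
    ans_emb = embedder.encode(answer_sentences, convert_to_tensor=True)
    ctx_emb = embedder.encode(context_sentences, convert_to_tensor=True)
    sims = util.cos_sim(ans_emb, ctx_emb)
    scores = []
    for i, sentence in enumerate(answer_sentences):
        nearest = sims[i].topk(min(top_k, len(context_sentences))).indices.tolist()
        best = max(entailment_score(context_sentences[j], sentence) for j in nearest)
        scores.append(1.0 - best)  # higher score = more likely hallucinated
    return scores
```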
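
    The next sketch corresponds to technique 2 (iterative refinement), simplified to a single yes/no verdict per context chunk. The llm callable and the prompt wording are placeholders, not hila's prompts.

```python
# Simplified iterative refinement: judge the answer against each context chunk,
# aggregate the verdicts into a hallucination score, then refine the answer.
from typing import Callable, List, Tuple


def iterative_refinement(question: str,
                         answer: str,
                         context_chunks: List[str],
                         llm: Callable[[str], str]) -> Tuple[float, str]:
    verdicts = []
    for chunk in context_chunks:
        reply = llm(
            "Does the answer contain any claim that is not supported by the context? "
            "Reply YES or NO.\n"
            f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer: {answer}"
        )
        verdicts.append(1.0 if reply.strip().upper().startswith("YES") else 0.0)

    # Fraction of context chunks against which a hallucination was reported.
    hallucination_score = sum(verdicts) / len(verdicts) if verdicts else 0.0

    refined_answer = answer
    if hallucination_score > 0:
        joined_context = "\n\n".join(context_chunks)
        refined_answer = llm(
            "Rewrite the answer so that every claim is supported by the context.\n"
            f"Context:\n{joined_context}\n\nQuestion: {question}\nAnswer: {answer}"
        )
    return hallucination_score, refined_answer
```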
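
    Finally, a sketch of the multistep verification flow in technique 3. llm and retrieve are placeholder callables standing in for the model and the retrieval system; the prompts are illustrative.

```python
# Simplified multistep verification: generate follow-up questions, answer them
# with freshly retrieved context, then revise the original answer.
from typing import Callable, List


def multistep_verification(question: str,
                           answer: str,
                           llm: Callable[[str], str],
                           retrieve: Callable[[str], List[str]]) -> str:
    # 1. Ask the model for follow-up questions about its own answer.
    followups = [
        q.strip()
        for q in llm(
            "List follow-up questions that would help verify this answer, one per line.\n"
            f"Question: {question}\nAnswer: {answer}"
        ).splitlines()
        if q.strip()
    ]

    # 2. Retrieve fresh context for each follow-up question and answer it.
    followup_answers = []
    for fq in followups:
        ctx = "\n".join(retrieve(fq))
        followup_answers.append(
            llm(f"Context:\n{ctx}\n\nQuestion: {fq}\nAnswer concisely.")
        )

    # 3. Revise the original answer in light of the follow-up answers.
    notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(followups, followup_answers))
    return llm(
        "Revise the answer below so it is consistent with the verification notes.\n"
        f"Original question: {question}\nOriginal answer: {answer}\n"
        f"Verification notes:\n{notes}"
    )
```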

    Retrieval-Based Hallucination Reduction Methods

    If an LLM is provided with incorrect context from the start, the result is an unsatisfactory or factually hallucinatory answer. Vianai has developed techniques that improve the hila retrieval process to mitigate such downstream hallucinations at the source.

    1. MARAG — Model Augmented Retrieval Augmented Generation.

      This is a proprietary VIANAI technique inspired by the way humans retrieve information when answering questions during long research processes. It is a complex retrieval process that leverages language models to enhance retrieval. This technique not only enables us to find the best possible sources to answer a question, but also enables us to determine which sources not to consider when prompting an LLM. Although this process is computationally expensive and slow, it produces much more factually correct results, because the final answer is generated by an LLM using the best possible context retrieved from the hila retrieval system. This technique is most useful for report generation, where longer wait times are expected. A rough sketch of the general idea appears below.
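
    MARAG itself is proprietary, so the sketch below captures only the general idea described above: use a language model to decide which retrieved sources to keep and which to exclude before the final answer is generated. llm and retrieve are placeholder callables; the prompt and candidate limit are assumptions.

```python
# Rough sketch of model-augmented source filtering: an LLM vets each retrieved
# passage before it is allowed into the final prompt. Not the actual MARAG method.
from typing import Callable, List


def model_augmented_retrieval(question: str,
                              retrieve: Callable[[str], List[str]],
                              llm: Callable[[str], str],
                              candidates: int = 20) -> List[str]:
    sources = retrieve(question)[:candidates]
    kept = []
    for source in sources:
        verdict = llm(
            "Would this passage help answer the question accurately, or could it "
            "mislead the answer? Reply KEEP or DISCARD.\n"
            f"Question: {question}\n\nPassage:\n{source}"
        )
        if verdict.strip().upper().startswith("KEEP"):
            kept.append(source)
    return kept
```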