teaching a small llm to filter its own context

the hallucination gap in rag

rag systems have a well-known failure mode: the retriever finds chunks that are topically related but do not actually contain the answer. the generator then hallucinates a plausible-sounding response that appears grounded in the retrieved context but is not. the user sees citations, assumes accuracy, and trusts a fabricated answer.

this is especially dangerous in sensitive domains -- medical, financial, legal -- where a confident-sounding wrong answer can cause real harm. the retriever did its job (found relevant documents), but relevance is not the same as sufficiency. the question is whether the retrieved context actually contains enough information to produce a correct answer.

a binary classifier for context sufficiency

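one solution is a lightweight classifier that operates on question-context-answer triplets. given a question, the retrieved context, and a candidate answer, the classifier outputs a binary decision: is this answer actually supported by the provided context, or is it fabricated? when the context is insufficient, any specific answer generated from it will not be supported by it, so checking groundedness of the answer doubles as a check on sufficiency of the context.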

this classifier runs as a validation step between generation and response delivery, since it needs the candidate answer as input. if the classifier flags the answer as not grounded in the context, the system can fall back to "i don't have enough information to answer this" instead of delivering a hallucinated response.

model and training setup

the base model is llama 3.2 1b -- small enough to run as a sidecar alongside the main rag pipeline. at roughly 1 billion parameters, a single classification call is cheap compared to the generator it is checking, even on modest hardware.

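training uses lora (low-rank adaptation) with 4-bit quantization via bitsandbytes. this keeps gpu memory requirements low: the frozen base weights are stored in 4 bits, and only the small low-rank adapter matrices are trained. the sequence length is capped at 2048 tokens, sufficient for most retrieval contexts; longer prompts have to be truncated or have their context trimmed before classification.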

training_config = {
    "model": "meta-llama/Llama-3.2-1B",
    "quantization": "4bit",
    "max_seq_length": 2048,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "learning_rate": 2e-4,
    "epochs": 3,
    "batch_size": 4,
}
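
as a rough sketch of how this dictionary could map onto library objects, assuming the hugging face transformers and peft packages are used: the helper names are hypothetical, and the nf4 quant type, bfloat16 compute dtype, and attention-projection target modules are assumptions, since the settings above do not specify them.

```python
# sketch only: maps the config above onto keyword arguments for
# transformers.BitsAndBytesConfig and peft.LoraConfig.
training_config = {
    "model": "meta-llama/Llama-3.2-1B",
    "quantization": "4bit",
    "max_seq_length": 2048,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "learning_rate": 2e-4,
    "epochs": 3,
    "batch_size": 4,
}


def quantization_kwargs(cfg):
    """keyword arguments for BitsAndBytesConfig (hypothetical helper)."""
    if cfg["quantization"] != "4bit":
        raise ValueError("this sketch only handles 4-bit quantization")
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",  # assumption: not stated in the config
    }


def lora_kwargs(cfg):
    """keyword arguments for LoraConfig (hypothetical helper)."""
    return {
        "r": cfg["lora_r"],
        "lora_alpha": cfg["lora_alpha"],
        "lora_dropout": cfg["lora_dropout"],
        # assumption: adapt the attention projections of the llama blocks
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "task_type": "CAUSAL_LM",
    }


def load_model_for_training(cfg):
    """load the quantized base model and attach lora adapters."""
    # heavy imports stay local so the helpers above work without a gpu stack
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
        **quantization_kwargs(cfg),
    )
    model = AutoModelForCausalLM.from_pretrained(
        cfg["model"], quantization_config=bnb
    )
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, LoraConfig(**lora_kwargs(cfg)))
```

the learning rate, epoch count, and batch size would go to whatever trainer is used; they are left out here because the trainer is not named.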

dataset

the training dataset contains 3,200 balanced samples -- half positive (answer is grounded in context) and half negative (answer is hallucinated or unsupported). each sample is a triplet of question, context passage, and answer, with a binary label.

negative samples are generated by pairing questions with contexts from different documents, or by using llm-generated answers that sound plausible but contradict or go beyond the provided context. this teaches the classifier to distinguish between "sounds right" and "is actually supported."
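
the first strategy, pairing a question with a context from a different document, can be sketched as below. this is an illustration and not the script used to build the dataset; the Sample type, the doc_ids argument, and the function name are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    question: str
    context: str
    answer: str
    label: int  # 1 = answer grounded in context, 0 = not grounded


def make_mismatched_negatives(positives, doc_ids, seed=0):
    """build negatives by swapping in a context from a different document.

    positives: list of Sample with label 1
    doc_ids:   doc_ids[i] identifies the source document of positives[i]
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    negatives = []
    for i, sample in enumerate(positives):
        # only contexts from other documents are valid replacements
        candidates = [j for j in range(len(positives)) if doc_ids[j] != doc_ids[i]]
        if not candidates:
            continue  # every sample comes from the same document
        j = rng.choice(candidates)
        negatives.append(
            Sample(sample.question, positives[j].context, sample.answer, 0)
        )
    return negatives
```

one caveat with this approach: a context from another document can occasionally still support the answer, so spot-checking a sample of the generated negatives is worthwhile.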

the dataset is available on huggingface as context-relevance-classifier-dataset.

the input format

each training sample is formatted as a structured prompt:

prompt_template = """given the following context, question, and answer,
determine if the answer is supported by the context.

context: {context}

question: {question}

answer: {answer}

is the answer supported by the context? respond with
only 'yes' or 'no'."""

the model learns to output a single word -- yes or no -- so classification needs only one short decoding step. with greedy decoding the output is deterministic, and there is no long response to parse.
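
a minimal sketch of filling the template and reading the verdict back, assuming the model's generated text is available as a plain string. build_prompt and parse_verdict are hypothetical helper names; returning None for anything other than yes or no is a defensive choice added here, not something the fine-tuned model is documented to need.

```python
PROMPT_TEMPLATE = """given the following context, question, and answer,
determine if the answer is supported by the context.

context: {context}

question: {question}

answer: {answer}

is the answer supported by the context? respond with
only 'yes' or 'no'."""


def build_prompt(context, question, answer):
    """fill the template with one (context, question, answer) triplet."""
    return PROMPT_TEMPLATE.format(
        context=context, question=question, answer=answer
    )


def parse_verdict(generated_text):
    """return True for yes, False for no, None for anything else."""
    words = generated_text.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!'\"")  # tolerate trailing punctuation or quotes
    if first == "yes":
        return True
    if first == "no":
        return False
    return None
```

a caller that wants to fail safe can treat None the same as False, so an unexpected output never lets an unchecked answer through.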

integration into rag pipelines

the classifier slots into the pipeline after the generator produces a response but before it is returned to the user:

1. user submits a query
2. retriever fetches top-k relevant chunks
3. generator produces an answer conditioned on the retrieved context
4. classifier evaluates the (question, context, answer) triplet
5. if classified as grounded: return the answer
6. if classified as ungrounded: return a fallback response or trigger re-retrieval with expanded search
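
the steps above can be sketched as a single function. the retriever, generator, and classifier are passed in as callables, so nothing here depends on a particular library; the function name, the k and expanded_k defaults, and the choice to retry exactly once with a larger k are assumptions made for illustration.

```python
FALLBACK = "i don't have enough information to answer this"


def answer_with_check(query, retrieve, generate, is_grounded, k=5, expanded_k=20):
    """run retrieval, generation, and the groundedness check.

    retrieve(query, top_k)                -> list of text chunks
    generate(query, context)              -> candidate answer string
    is_grounded(query, context, answer)   -> bool from the classifier
    """
    # first pass uses the normal top-k, second pass is the expanded search
    for top_k in (k, expanded_k):
        chunks = retrieve(query, top_k)
        context = "\n\n".join(chunks)
        answer = generate(query, context)
        if is_grounded(query, context, answer):
            return answer
    # neither attempt produced a supported answer
    return FALLBACK
```

note that the expanded context still has to fit in the classifier's 2048-token window together with the question and answer, which bounds how far the search can usefully be widened.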

the latency overhead is minimal. the 1b model classifies a triplet in under 50ms on a single gpu, which is negligible compared to the retrieval and generation steps.
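
latency depends on hardware and prompt length, so it is worth measuring on the target deployment. a small timing harness along these lines would do, assuming classify is any callable that takes a (question, context, answer) triplet and blocks until the verdict is available; the function name and the warmup default are hypothetical.

```python
import statistics
import time


def measure_latency_ms(classify, triplets, warmup=3):
    """time classify() per triplet and report median and p95 in milliseconds."""
    if not triplets:
        raise ValueError("need at least one triplet to time")
    # warmup calls are not timed; the first calls often include one-off setup
    for triplet in triplets[:warmup]:
        classify(*triplet)
    timings = []
    for triplet in triplets:
        start = time.perf_counter()
        classify(*triplet)
        timings.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(timings)
    # nearest-rank style index for the 95th percentile
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }
```

on a gpu, classify should return a plain python value such as a bool, so that the timer does not stop before the device has finished its work.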

where this matters most

for general-purpose chatbots, occasional hallucination is tolerable. for medical qa systems advising clinicians, financial systems generating investment summaries, or legal systems analyzing contract clauses, it is not. the classifier acts as a safety net -- it cannot guarantee the answer is correct, but it can flag when the answer is not supported by the evidence.

the fine-tuned model is available on huggingface: axondendriteplus/llama-3.2-1b-context-relevance-classifier.