Spilled Energy in Large Language Models

Abstract

We reinterpret the final softmax classifier over the vocabulary of Large Language Models (LLMs) as an Energy-based Model (EBM). This lets us decompose the chain of probabilities used in sequence-to-sequence modeling into multiple EBMs that interact at inference time. The decomposition offers a principled way to measure where the “energy spills” during LLM decoding, and we show empirically that spilled energy correlates well with factual errors, inaccuracies, biases, and failures. Like Orgad et al. (2025), we localize the exact token associated with the answer. Unlike them, we do not need to train a classifier or ablate which activations to feed to it: our method detects hallucinations without any training and generalizes naturally across tasks and LLMs, using only the output logits across subsequent generation steps. We propose two hallucination scores. The first, spilled energy, is the difference between energy values at two generation steps that mathematically should be equal. The second, marginal energy, can be measured at a single step. Our method is training-free, mathematically principled, and demonstrates strong cross-dataset generalization: we scale our analysis to state-of-the-art LLMs, including LLaMa-3, Mistral, and Qwen-3, and evaluate on nine benchmarks, achieving competitive performance with robust results across datasets and LLMs.
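
As a rough illustration of the quantities named in the abstract, the sketch below reads a logit vector as a set of negative energies. The function names are hypothetical, and the pairing used for spilled energy (the chosen token's energy at one step against the marginal energy at the next step) is an assumption made for illustration. The abstract does not give the formula, so this should not be read as the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def token_energy(logits, token_id):
    """Energy of one token under the EBM reading: the negative logit."""
    return -logits[token_id]


def marginal_energy(logits):
    """Single-step quantity: negative log-partition over the vocabulary."""
    return -logsumexp(logits)


def log_prob(logits, token_id):
    """Softmax log-probability recovered from the two energies."""
    return marginal_energy(logits) - token_energy(logits, token_id)


def spilled_energy(logits_t, token_id, logits_next):
    """ASSUMED definition: gap between the chosen token's energy at step t
    and the marginal energy at step t + 1."""
    return token_energy(logits_t, token_id) - marginal_energy(logits_next)


# Toy logits over a 3-token vocabulary at two consecutive steps.
step_t = [2.0, 0.5, -1.0]
step_next = [0.1, 0.3, 1.2]
chosen = 0  # token emitted at step t

print("marginal energy at t:", marginal_energy(step_t))
print("log p(chosen) at t:", log_prob(step_t, chosen))
print("spilled energy (assumed form):", spilled_energy(step_t, chosen, step_next))
```

With real models, the two logit vectors would come from consecutive decoding steps of the same generation. Under the assumed definition, a gap far from zero marks a step where the two energies disagree.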

Publication
International Conference on Learning Representations (ICLR)
Robert Adrian Minut
PhD Student

Hello! 👋 I’m Adrian, a PhD student in the Computer Science Department at Sapienza University, where I study Large Language Models (LLMs). My work focuses on probing the robustness, interpretability, and diverse applications of these models, and I am particularly intrigued by how they handle such a wide variety of tasks.

Hazem Dewidar
PhD Student
Iacopo Masi
Associate Professor (PI)

My research interests include computer vision, biometrics, and AI.