Implicit Inversion turns CLIP into a Decoder

Antonio D'Orazio, Maria Rosaria Briglia, Donato Crisostomi, Dario Loi, Emanuele Rodolà, Iacopo Masi

June 2025

Abstract

CLIP is a discriminative model trained to align images and text in a shared embedding space. Due to its multimodal structure, it serves as the backbone of many generative pipelines, where a decoder is trained to map from the shared space back to images. In this work, we show that image synthesis is nevertheless possible using CLIP alone—without any decoder, training, or fine-tuning. Our approach optimizes a frequency-aware implicit neural representation that encourages coarse-to-fine generation by stratifying frequencies across network layers. To stabilize this inverse mapping, we introduce adversarially robust initialization, a lightweight Orthogonal Procrustes projection to align local text and image embeddings, and a blending loss that anchors outputs to natural image statistics. Without altering CLIP’s weights, this framework unlocks capabilities such as text-to-image generation, style transfer, and image reconstruction. These findings suggest that discriminative models may hold untapped generative potential, hidden in plain sight.
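As an illustration of the Orthogonal Procrustes projection mentioned in the abstract, the following is a minimal sketch of the standard closed-form solution via SVD. It is not the authors' implementation: the function name `orthogonal_procrustes`, the array shapes, and the synthetic stand-ins for text and image embeddings are all assumptions made for this example, and the abstract does not specify which embeddings are paired or how many are used.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F.

    source, target: arrays of shape (n, d) holding paired embeddings.
    (Hypothetical helper; names and shapes are assumptions of this sketch.)
    """
    # Cross-covariance between the two sets of embeddings.
    m = source.T @ target
    # The closed-form minimizer is U @ Vt, where m = U @ diag(s) @ Vt.
    u, _, vt = np.linalg.svd(m)
    return u @ vt


# Synthetic stand-ins for text and image embeddings (not real CLIP outputs).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(32, 8))

# Build "image" embeddings as an exactly rotated copy of the text embeddings.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
image_emb = text_emb @ q

rotation = orthogonal_procrustes(text_emb, image_emb)
aligned = text_emb @ rotation
```

Because the solution is constrained to be orthogonal, the map preserves norms and angles, so it can move one set of embeddings toward the other without distorting its internal geometry. In this toy setup the recovered `rotation` matches the hidden matrix `q`; with real embeddings the alignment would only be approximate.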

Type

Arxiv-Preprint

Publication

arXiv preprint (technical report)

Antonio D'Orazio

PhD Student

Hi! 👋 I’m Antonio, a Ph.D. student at Sapienza University. I am fascinated by the world of Computer Graphics, and my research interests range from solving inverse problems in the Graphics domain to finding explainable representations using Neuro-Symbolic AI.

Maria Rosaria Briglia

PhD Student

Hello everyone! My name is Maria Rosaria, and I am a Ph.D. student in AI Security at Sapienza University. My main research interest is developing adversarial techniques for generative AI, with a particular focus on diffusion models, and applying them to Explainable AI. My main research topics are Diffusion Models, Adversarial Machine Learning, and Explainable AI through counterfactual examples.

Donato Crisostomi

Research Assistant (assegnista)

PhD student @ Sapienza, University of Rome | former Applied Science intern @ Amazon Search, Luxembourg | former Research Science intern @ Amazon Alexa, Turin

Iacopo Masi

Associate Professor (PI)

My research interests include computer vision, biometrics, and AI.