
Paper Released: Self-Visual-RAG

My Adobe internship work has been accepted as a conference paper at ICLR 2025: “SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.” Huge thanks to my mentor, Ruiyi Zhang, for his invaluable support and guidance! An improved implementation, Self-Visual-RAG, is available; it was developed after my internship with the support of my labmate at UB.

SV-RAG enhances long-document understanding by adapting MLLMs for self-visual retrieval-augmented generation, optimizing both evidence retrieval and question answering with specialized LoRA adapters.
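In practice this means one frozen MLLM backbone with two LoRA adapters that are swapped between the retrieval step and the answering step. Below is a minimal sketch of that pattern using Hugging Face PEFT; the checkpoint name, adapter paths, and the elided plumbing are placeholders for illustration, not the released SV-RAG artifacts.

```python
from transformers import AutoModelForVision2Seq
from peft import PeftModel

# Hypothetical base MLLM checkpoint; any LoRA-compatible MLLM works here.
base = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")

# One frozen backbone, two task-specific LoRA adapters (paths are placeholders).
model = PeftModel.from_pretrained(base, "path/to/retrieval-lora", adapter_name="retrieval")
model.load_adapter("path/to/qa-lora", adapter_name="qa")

# Step 1: retrieval -- embed the query and each page, keep the top-k pages.
model.set_adapter("retrieval")
# ... compute hidden-state embeddings and interaction scores here ...

# Step 2: question answering over the retrieved pages with the QA adapter.
model.set_adapter("qa")
# ... generate the answer conditioned on the top-k page images ...
```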

Specifically, we use the MLLM’s hidden states as embedding features and train it via contrastive learning to compute query–page interaction scores, while the same MLLM, under a different adapter, performs question answering.
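The retrieval objective can be pictured as follows: take the last-layer hidden states as token-level embeddings, score a query against a candidate page with a late-interaction (MaxSim-style) rule, and train with an in-batch contrastive loss. Here is a hedged PyTorch sketch of that idea, with illustrative shapes, names, and a MaxSim scorer standing in for the paper’s exact formulation:

```python
import torch
import torch.nn.functional as F

def token_embeddings(hidden_states: torch.Tensor) -> torch.Tensor:
    """L2-normalize last-layer hidden states -> (seq_len, dim) embeddings."""
    return F.normalize(hidden_states, dim=-1)

def interaction_score(q: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Late-interaction score: match each query token to its best document
    token, then sum over query tokens. q: (q_len, dim), d: (d_len, dim)."""
    sim = q @ d.T                          # (q_len, d_len) cosine similarities
    return sim.max(dim=-1).values.sum()    # scalar score

def contrastive_loss(queries, docs, temperature: float = 0.05):
    """In-batch InfoNCE: the i-th page is the positive for the i-th query;
    the other pages in the batch serve as negatives.
    queries, docs: lists of (seq_len, dim) token-embedding tensors."""
    scores = torch.stack([
        torch.stack([interaction_score(q, d) for d in docs])
        for q in queries
    ])                                     # (batch, batch) score matrix
    labels = torch.arange(len(queries))
    return F.cross_entropy(scores / temperature, labels)
```

Because the scorer reuses the generator’s own hidden states, retrieval adds no separate encoder; only the lightweight adapter weights differ between the two roles.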


This post is licensed under CC BY 4.0 by the author.