
Paper Released: Self-Visual-RAG

My Adobe internship work has been accepted as a conference paper at ICLR 2025: “SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.” Huge thanks to my mentor, Ruiyi Zhang, for his invaluable support and guidance! An improved implementation, Self-Visual-RAG, is available; it was developed after my internship with the support of my labmate at UB.

SV-RAG enhances long-document understanding by adapting MLLMs for self-visual retrieval-augmented generation, optimizing both evidence retrieval and question answering with specialized LoRA adapters.
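In practice this means one frozen MLLM backbone with two LoRA adapters that are swapped between the retrieval step and the answering step. Below is a minimal sketch of that pattern using Hugging Face PEFT; the checkpoint name, adapter paths, and the elided plumbing are placeholders for illustration, not the released SV-RAG artifacts.

```python
from transformers import AutoModelForVision2Seq
from peft import PeftModel

# Hypothetical base MLLM checkpoint; any LoRA-compatible MLLM works here.
base = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")

# One frozen backbone, two task-specific LoRA adapters (paths are placeholders).
model = PeftModel.from_pretrained(base, "path/to/retrieval-lora", adapter_name="retrieval")
model.load_adapter("path/to/qa-lora", adapter_name="qa")

# Step 1: retrieval -- embed the query and each page, keep the top-k pages.
model.set_adapter("retrieval")
# ... compute hidden-state embeddings and interaction scores here ...

# Step 2: question answering over the retrieved pages with the QA adapter.
model.set_adapter("qa")
# ... generate the answer conditioned on the top-k page images ...
```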

Specifically, we use the MLLM’s hidden states as embedding features and train it via contrastive learning to compute query–page interaction scores, while the same MLLM, under a different adapter, performs question answering.
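The retrieval objective can be pictured as follows: take the last-layer hidden states as token-level embeddings, score a query against a candidate page with a late-interaction (MaxSim-style) rule, and train with an in-batch contrastive loss. Here is a hedged PyTorch sketch of that idea, with illustrative shapes, names, and a MaxSim scorer standing in for the paper’s exact formulation:

```python
import torch
import torch.nn.functional as F

def token_embeddings(hidden_states: torch.Tensor) -> torch.Tensor:
    """L2-normalize last-layer hidden states -> (seq_len, dim) embeddings."""
    return F.normalize(hidden_states, dim=-1)

def interaction_score(q: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Late-interaction score: match each query token to its best document
    token, then sum over query tokens. q: (q_len, dim), d: (d_len, dim)."""
    sim = q @ d.T                          # (q_len, d_len) cosine similarities
    return sim.max(dim=-1).values.sum()    # scalar score

def contrastive_loss(queries, docs, temperature: float = 0.05):
    """In-batch InfoNCE: the i-th page is the positive for the i-th query;
    the other pages in the batch serve as negatives.
    queries, docs: lists of (seq_len, dim) token-embedding tensors."""
    scores = torch.stack([
        torch.stack([interaction_score(q, d) for d in docs])
        for q in queries
    ])                                     # (batch, batch) score matrix
    labels = torch.arange(len(queries))
    return F.cross_entropy(scores / temperature, labels)
```

Because the scorer reuses the generator’s own hidden states, retrieval adds no separate encoder; only the lightweight adapter weights differ between the two roles.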


This post is licensed under CC BY 4.0 by the author.