
Paper Released: SV-RAG

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

My Adobe internship work has been accepted as a conference paper at ICLR 2025: “SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.” [PDF] Huge thanks to my mentor, Ruiyi Zhang, for his invaluable support and guidance! An improved implementation is available at Self-Visual-RAG, developed after my internship with support from my labmate at UB.

SV-RAG enhances long-document understanding by adapting MLLMs for self-visual retrieval-augmented generation, optimizing both evidence retrieval and question answering with specialized LoRA adapters.

Specifically, we use the MLLM's hidden states as embedding features and train the model with contrastive learning to compute interaction scores between query and page sequences, while the same MLLM handles question answering.
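For concreteness, here is a minimal PyTorch sketch of the retrieval side, assuming a ColBERT-style late-interaction score over token-level hidden states and an InfoNCE contrastive loss. The shapes, function names, and scoring details are illustrative assumptions for this post, not the paper's exact implementation.

import torch
import torch.nn.functional as F

def interaction_score(q_hidden, d_hidden):
    # q_hidden: (Lq, D) query token hidden states from the retrieval adapter
    # d_hidden: (Ld, D) page token hidden states (assumed shapes, for illustration)
    q = F.normalize(q_hidden, dim=-1)
    d = F.normalize(d_hidden, dim=-1)
    sim = q @ d.T                        # (Lq, Ld) token-token cosine similarities
    return sim.max(dim=-1).values.sum()  # best page token per query token, summed

def contrastive_loss(q_hidden, page_hiddens, pos_idx, temperature=0.05):
    # InfoNCE: the labeled evidence page is the positive; other pages are negatives.
    scores = torch.stack([interaction_score(q_hidden, p) for p in page_hiddens])
    return F.cross_entropy(scores.unsqueeze(0) / temperature, torch.tensor([pos_idx]))

# Toy example with random tensors standing in for real MLLM hidden states.
D = 64
query = torch.randn(8, D, requires_grad=True)    # 8 query tokens
pages = [torch.randn(128, D) for _ in range(4)]  # 4 candidate pages
loss = contrastive_loss(query, pages, pos_idx=0)
loss.backward()                                  # gradients flow to the embedding side
print(loss.item())

At inference, the retrieval adapter scores every page against the question, and the top-scoring pages are fed back to the same MLLM, now running its QA adapter, to generate the answer.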

[Poster: SV-RAG]

Reference

@inproceedings{chen2025svrag,
  title={SV-RAG: LoRA-Contextualizing Adaptation of {MLLM}s for Long Document Understanding},
  author={Jian Chen and Ruiyi Zhang and Yufan Zhou and Tong Yu and Franck Dernoncourt and Jiuxiang Gu and Ryan A. Rossi and Changyou Chen and Tong Sun},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=FDaHjwInXO}
}


This post is licensed under CC BY 4.0 by the author.