QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-Free Visual Document Understanding
Abstract: In Visual Document Understanding (VDU) tasks, fine-tuning a pre-trained Vision-Language Model (VLM) on new datasets often falls short in optimizing the vision encoder to identify ...