Abstract: Recent advances in image understanding have enabled methods that leverage large language models for multimodal reasoning in remote sensing. However, existing approaches still struggle to ...
This repo implements UniTok, a unified visual tokenizer well-suited for both generation and understanding tasks. It is compatiable with autoregressive generative models (e.g. LlamaGen), multimodal ...
Abstract: This paper proposes a novel architecture that efficiently integrates visual place recognition (VPR) and loop closure detection (LCD) into a unified system, evaluated on challenging outdoor ...