Abstract: Unlike traditional single-modal visual tracking, vision-language tracking leverages textual descriptions to provide semantic understanding, aiding in handling challenges such as appearance ...