FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Apple Machine Learning Research

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.

Visual understanding is inherently contextual: what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus […]