Apple has announced its participation in CVPR 2025, the IEEE/CVF Conference on Computer Vision and Pattern Recognition, taking place in Nashville, Tennessee. The conference is a major venue for computer vision research within artificial intelligence (AI) and machine learning (ML). Apple is proud to be a sponsor, and will showcase its research and take part in discussions with the community.
Key Research Presentations
Apple researchers will present their latest findings across several topics in computer vision, including vision language models, 3D photogrammetry, large multimodal models, and video diffusion models. Attendees can try interactive demonstrations of this research at Apple’s booth #1217 during exhibition hours.
FastVLM: Efficient Vision Encoding
One standout research contribution is FastVLM, which targets the inefficiency of popular visual encoders at high input resolutions in vision language models (VLMs). FastVLM introduces FastViTHD, a hybrid vision encoder that takes less time to encode an image and outputs fewer tokens, improving the accuracy-latency trade-off that matters for real-time applications.
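To make the token-output point concrete, here is a minimal sketch of how the number of visual tokens grows with input resolution. It assumes a square image and an encoder that emits one token per stride-by-stride pixel region. The function name `visual_token_count` and the stride values 16 and 64 are illustrative assumptions, not figures taken from the FastVLM work.

```python
def visual_token_count(image_size: int, stride: int) -> int:
    """Tokens produced for a square image when the encoder emits
    one token per stride x stride pixel region (an assumed model)."""
    if image_size <= 0 or stride <= 0:
        raise ValueError("image_size and stride must be positive")
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    side = image_size // stride  # tokens along one edge of the image
    return side * side           # token count grows with the square of resolution


if __name__ == "__main__":
    # Hypothetical strides: 16 for a plain patch-based encoder,
    # 64 for an encoder that downsamples more aggressively.
    for size in (256, 512, 1024):
        plain = visual_token_count(size, 16)
        reduced = visual_token_count(size, 64)
        print(f"{size}px: stride 16 -> {plain} tokens, stride 64 -> {reduced} tokens")
```

Under these assumptions, a 1024-pixel image yields 4096 tokens at stride 16 and 256 tokens at stride 64. Since the language model has to process every visual token, a smaller token count at high resolution reduces latency downstream of the encoder as well as in it.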
Matrix3D: A Unified Photogrammetry Model
In a Highlight presentation, Apple will showcase Matrix3D, an all-in-one model for 3D scene reconstruction from 2D images. Rather than chaining separate models for each step, this unified approach handles multiple photogrammetry tasks in one model, and its multimodal training strategy both improves performance and widens the pool of usable training data.
Autoregressive Pre-Training for Vision Encoders
Another focus will be multimodal autoregressive pre-training for large vision encoders. Apple’s researchers have developed a family of vision encoders that excel at multimodal tasks and visual recognition benchmarks while training more efficiently than existing models.
World-Consistent Video Diffusion
Additionally, Apple will introduce World-Consistent Video Diffusion, which addresses the challenge of generating 3D-consistent video content. This model offers flexibility for various tasks, from single-image-to-3D conversion to camera-controlled video generation.
Supporting the ML Research Community
Apple remains committed to supporting underrepresented groups in the ML community and is sponsoring multiple events hosted by affinity groups at CVPR. These initiatives reflect Apple’s dedication to fostering a diverse and inclusive environment in the field of machine learning.
Apple invites all attendees to explore its research advancements and engage in insightful discussions at CVPR 2025.