UMassAmherst_VanHorn

PI: Grant Van Horn · University of Massachusetts Amherst

Computer and information sciences

Fine-grained visual classification (FGVC) has blossomed thanks to large-scale image datasets such as iNaturalist and algorithmic contributions such as ResNet, Vision Transformers (ViT), Vision-Language Models (VLMs) like CLIP, and Multimodal Large Language Models (MLLMs). However, VLMs and MLLMs have drawn increased attention in FGVC because, surprisingly, they underperform more classical, simpler, and smaller approaches. We are currently investigating the root causes of this underperformance and possible solutions. We focus on three directions: ensuring that VLM/MLLM responses are visually grounded (e.g., through fine-tuning), developing more faithful ways to evaluate their responses (evaluation procedures and benchmarks), and steering predictions with expert knowledge (Visipedia).

Cumulative usage · Jul 2, 2025 – Jul 2, 2026

306 TB · Data delivered over the OSDF (Open Science Data Federation)
36,165 · Jobs
107.1K · Files via OSDF
165.4K · CPU hours
607.6 · GPU hours
