Poster Companion

This interactive gallery highlights representative PSF-Med failures where semantically equivalent clinical questions trigger different answers from the same model on the same chest X-ray.

The examples below are intentionally curated for a poster audience: compact, phone-friendly, and focused on the core failure mode. Image filenames, internal record IDs, and other nonessential identifiers have been removed from the published view.
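
As a rough illustration of the failure mode described above, the check below flags a case when one model gives different answers to paraphrases of the same question about the same image. This is a minimal sketch, not the PSF-Med scoring method: the function names `normalize` and `is_inconsistent`, the string normalization, and the sample answers are all hypothetical and assume short closed-form answers such as "yes" or "no".

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and strip whitespace and trailing punctuation."""
    return answer.strip().lower().rstrip(".!?")


def is_inconsistent(answers: list[str]) -> bool:
    """Return True if the normalized answers are not all identical."""
    return len({normalize(a) for a in answers}) > 1


# Hypothetical answers from one model to four paraphrases of one question
# about the same chest X-ray (illustrative values, not benchmark data).
answers = ["Yes.", "yes", "No", "Yes"]
print(is_inconsistent(answers))
```

Exact string matching after normalization is only suitable for short yes/no style answers; free-text answers would need a semantic comparison instead.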

5 Curated examples
3 Models compared
4-5 Paraphrases per case
1 Image per case: same image, different wording
Read the PSF-Med preprint | Benchmark and code

Failure Cases

Select a case to inspect the image, the original question, and how each model responded to rephrasings.

Case details