Performance of 28 vision-language models across 5 regions. All numbers are accuracy (%).
Submit your results via Pull Request on GitHub.