Table 1. User agreement (K’s \(\alpha\)) on a subset of the PASCAL VOC test set, the number of samples per category in this subset, and the best machine-generated results (AI-1 and AI-2, average precision) on the whole test set.

From: “Are Machines Better Than Humans in Image Tagging?” - A User Study Adds to the Puzzle

| Concept | #Samples per concept | Human (K’s \(\alpha\)) | AI-1 | AI-2 |
|---|---|---|---|---|
| Airplane | 77 | 0.980 | 0.986 | 0.998 |
| Cat | 96 | 0.978 | 0.955 | 0.990 |
| Bird | 101 | 0.976 | 0.934 | 0.976 |
| Dog | 123 | 0.974 | 0.947 | 0.989 |
| Sheep | 35 | 0.970 | 0.874 | 0.950 |
| Cow | 34 | 0.960 | 0.821 | 0.943 |
| Horse | 42 | 0.959 | 0.929 | 0.985 |
| Bus | 45 | 0.956 | 0.910 | 0.959 |
| Train | 55 | 0.953 | 0.960 | 0.987 |
| Boat | 46 | 0.940 | 0.922 | 0.964 |
| Motorbike | 55 | 0.938 | 0.921 | 0.972 |
| Bicycle | 68 | 0.909 | 0.860 | 0.947 |
| TV monitor | 63 | 0.898 | 0.827 | 0.942 |
| Person | 420 | 0.895 | 0.950 | 0.988 |
| Car | 126 | 0.848 | 0.836 | 0.948 |
| Sofa | 79 | 0.796 | 0.678 | 0.868 |
| Bottle | 93 | 0.761 | 0.654 | 0.836 |
| Dining table | 61 | 0.737 | 0.796 | 0.881 |
| Chair | 123 | 0.716 | 0.734 | 0.904 |
| Potted plant | 59 | 0.668 | 0.594 | 0.768 |
| Overall/MAP | - | 0.913 | 0.854 | 0.940 |
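
The MAP entries in the last row can be checked against the per-concept values. The following is a minimal sketch, assuming MAP here means the unweighted mean of the 20 per-concept average precision values. The table does not state this explicitly, and the helper name `mean_average_precision` is hypothetical, introduced only for illustration.

```python
# Per-concept average precision copied from Table 1: (AI-1, AI-2).
ap = {
    "Airplane": (0.986, 0.998), "Cat": (0.955, 0.990),
    "Bird": (0.934, 0.976), "Dog": (0.947, 0.989),
    "Sheep": (0.874, 0.950), "Cow": (0.821, 0.943),
    "Horse": (0.929, 0.985), "Bus": (0.910, 0.959),
    "Train": (0.960, 0.987), "Boat": (0.922, 0.964),
    "Motorbike": (0.921, 0.972), "Bicycle": (0.860, 0.947),
    "TV monitor": (0.827, 0.942), "Person": (0.950, 0.988),
    "Car": (0.836, 0.948), "Sofa": (0.678, 0.868),
    "Bottle": (0.654, 0.836), "Dining table": (0.796, 0.881),
    "Chair": (0.734, 0.904), "Potted plant": (0.594, 0.768),
}


def mean_average_precision(values):
    """Unweighted mean over concepts (assumed definition of MAP)."""
    values = list(values)
    return sum(values) / len(values)


map_ai1 = mean_average_precision(v[0] for v in ap.values())
map_ai2 = mean_average_precision(v[1] for v in ap.values())

print(f"AI-1 MAP: {map_ai1:.3f}")  # AI-1 MAP: 0.854
print(f"AI-2 MAP: {map_ai2:.3f}")  # AI-2 MAP: 0.940
```

Under this assumption both machine columns reproduce the reported MAP values. The same unweighted mean applied to the Human column gives about 0.891 rather than the reported 0.913, so the overall agreement figure is presumably computed differently, for example over all judgments at once.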