What is the most important reason for using Computer Vision methods in humanities research? In this article, I argue that the use of numerical representation and data analysis methods offers a new language for describing cultural artifacts, experiences and dynamics. The human languages such as English or Russian that developed rather recently in human evolution are not good at capturing analog properties of human sensorial and cultural experiences. These limitations become particularly worrying if we want to compare thousands, millions or billions of artifacts—i.e. to study contemporary media and cultures at their new twenty-first century scale. When we instead use numerical measurements of image properties standard in Computer Vision, we can better capture details of a single artifact as well as visual differences between a number of artifacts–even if they are very small. The examples of visual dimensions that numbers can capture better then languages include color, shape, texture, contours, composition, and visual characteristics of represented faces, bodies and objects. The methods of finding structures and relationships in large numerical datasets developed in statistics and machine learning allow us to extend this analysis to very big datasets of cultural objects. Equally importantly, numerical image features used in Computer Vision also give us a new language to represent gradual and continuous temporal changes—something which natural languages are also bad at. This applies to both single artworks such as a film or a dance piece (describing movement and rhythm) and also to changes in visual characteristics in millions of artifacts over decades or centuries.