The Web is made for humans, not for machines. Most web assets cannot be understood by machines because they lack explicit, machine-readable semantics. To discover, process, and link web content fully automatically, machines must be able to understand its meaning. Today, multimedia documents such as images, video, and audio files, as well as other electronic documents such as PDFs, word processor files, spreadsheets, slide show presentations, and file archives, are indispensable constituents of the Web and account for the majority of the bandwidth used on the Internet. These documents largely contain unstructured data, partly in proprietary formats, which makes it difficult for machines to extract the actual content and its meaning. Even though a web browser can display an image, it cannot understand the image content.
Consider the following scenario: someone uploads a holiday photograph to a web server so that it is publicly available to her friends. They can download the image and admire her in front of a spectacular sight. But if the image is downloaded by a computer, the computer cannot see or recognize the content of the photograph as a human does. Given explicit metadata for that image, the computer would know where the picture was taken, which objects can be seen, and so on. Using this knowledge, a machine could provide background information to the user, link the image to the user's personal data, make it retrievable by its content, and suggest using it for a particular purpose, e.g. as an illustration in a travel blog.
The Semantic Web introduces languages such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) to bring structure to the content of web pages, with the goal of providing explicit and machine-understandable semantics. One way to provide explicit semantics in HTML pages is to embed structured data, such as RDFa and schema.org annotations, which annotate web documents with formal descriptions that are connected to Linked Data resources with the help of vocabularies.
Web documents are delivered via the Hypertext Transfer Protocol (HTTP). With HTTP content negotiation, different versions of the same web document can be identified and accessed via one unified URI. In the Web of Data, Linked Data resources already use this mechanism: the same URI gives access to a human-readable HTML document as well as to a machine-understandable RDF version of the same resource. This mechanism should not be restricted to Linked Data resources. Content providers should provide content descriptions and metadata for every kind of asset on the Web, including multimedia data. Moreover, this should be accomplished with minimal effort, i.e. without the overhead of laboriously creating supplementary metadata by hand.
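From the client's perspective, the idea of one URI with several representations can be sketched as follows. This is a minimal illustration rather than part of the proposed framework: the URI `http://example.org/photos/holiday.jpg`, the helper `build_request`, and the choice of `text/turtle` as the RDF serialization are assumptions made for the example. The requests are only constructed, not sent.

```python
import urllib.request

# Hypothetical asset URI, used for illustration only.
ASSET_URI = "http://example.org/photos/holiday.jpg"


def build_request(uri: str, media_type: str) -> urllib.request.Request:
    """Build an HTTP GET request that asks for one representation of `uri`."""
    # The Accept header tells the server which media type the client prefers.
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Same URI, two different representations requested.
image_request = build_request(ASSET_URI, "image/jpeg")      # the photograph itself
metadata_request = build_request(ASSET_URI, "text/turtle")  # an RDF description of it
```

Passing `metadata_request` to `urllib.request.urlopen` would return the RDF description only if the server behind the URI actually supports content negotiation for that media type; otherwise it would typically answer with the image or an error status.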
When a web asset's URL is requested via HTTP, the computer receives a copy of the original resource. To provide a machine-understandable, explicit semantic description of the web asset, HTTP content negotiation should be enabled so that an RDF description of the asset's content can be delivered on request. This RDF description can be created manually, derived from existing metadata, or generated with the help of automated analysis algorithms. Overall, the possibility of automatically receiving machine-readable metadata lowers the barrier for machines to understand and correctly interpret web assets.
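On the server side, the negotiation step described above amounts to choosing one of the available representations based on the client's `Accept` header. The following sketch illustrates that selection under simplifying assumptions: the list `AVAILABLE` and the function names are hypothetical, and the matching ignores the specificity precedence rules of the HTTP specification (a more specific media range does not override a wildcard here).

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representations a server could offer for one asset URI.
# The first entry is the default, i.e. the original resource.
AVAILABLE = ("image/jpeg", "text/turtle", "application/rdf+xml")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_range, q))
    # sorted() is stable, so equal q values keep their order in the header.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """Check whether a media range such as 'image/*' covers a media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def negotiate(accept_header: str,
              available: Sequence[str] = AVAILABLE) -> Optional[str]:
    """Return the media type to deliver, or None if nothing is acceptable."""
    if not accept_header.strip():
        return available[0]  # no preference stated: deliver the original
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 marks a media range as not acceptable
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None  # a server would typically answer with 406 Not Acceptable
```

With this sketch, a client that sends `Accept: text/turtle` is routed to the RDF description, while a browser sending `Accept: image/*` still receives the original image from the same URI.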
In this paper we propose a framework based on standardized web protocols that enables the delivery of machine-readable, content-related metadata for arbitrary documents on the Web, independent of a document's type, modality, and encoding. To deliver metadata smoothly and with minimal effort, we propose to use the content negotiation mechanism, which makes it possible to identify the original content as well as its metadata via the same URI. We demonstrate the feasibility of our approach with a prototype implementation that combines automated visual analysis, offered as a web service, with the content negotiation and metadata delivery mechanism, and that requires little effort from any content provider.
The paper is structured as follows: Sect. 2 describes technologies and description formats related to content representation, followed by potential use cases and service provisioning. Section 3 provides a detailed description of the prototypical implementation. Section 4 summarizes related approaches and Sect. 5 concludes the paper.