Multimedia Tools and Applications

Volume 70, Issue 2, pp 977–1005

Building mobile multimedia services: a hybrid cloud computing approach


  • Dejan Kovachev
    • RWTH Aachen University
  • Yiwei Cao
    • RWTH Aachen University
  • Ralf Klamma
    • RWTH Aachen University

DOI: 10.1007/s11042-012-1100-6

Cite this article as:
Kovachev, D., Cao, Y. & Klamma, R. Multimed Tools Appl (2014) 70: 977. doi:10.1007/s11042-012-1100-6


Mobile multimedia services are in high demand, but their development comes at high cost. The emergent cloud computing paradigm has great potential to address these issues. In fact, we are at an early stage of the coalescence of cloud computing, mobile multimedia and the Web. Motivated by the tremendous success story of the Web, built on its simplicity principles, we argue for a comprehensive review of current practices in web and mobile multimedia cloud computing techniques in order to avoid frictions. We draw on experience from the development of advanced collaborative multimedia web applications utilizing multimedia metadata standards like MPEG-7 and real-time communication protocols like XMPP. We propose i5Cloud, a hybrid cloud architecture, which serves as a substrate for scalable mobile multimedia services with fast time-to-market. This paper demonstrates the applicability of emerging cloud computing concepts for mobile multimedia.


Keywords: Mobile multimedia · Cloud computing · Multimedia metadata · XMPP

1 Introduction

The widespread use of smartphones and other mobile devices contributes to unprecedented sharing of mobile multimedia on social networking sites like Facebook or streaming on web sites like YouTube. Additionally, web and mobile multimedia converge, as the mobile networks become an integral part of the Internet.

However, in current development practice and in the literature we can observe trends which may lead to unnecessary frictions in the development of professional mobile web multimedia applications. There is an asymmetry between multimedia production, multimedia processing and multimedia consumption, which takes place in web browsers on desktops or in apps on mobile devices. Professional or semi-professional services for multimedia management are available for free or at very low rates. Part of this success story is the use of standards and protocols in web applications like HTML and HTTP. However, if we take a closer look at mobile devices themselves, it is not possible for technical amateurs to access multimedia materials without highly specialized mobile applications (apps) or to share all multimedia materials via their mobile web browsers. The reasons are the speed of innovation in mobile technologies and the lack of standardization. Flash is an extreme example of such disputes among high-tech companies fencing off their claims. Consequently, great opportunities of mobile devices, like real-time collaboration among mobile users and mobile semantic processing of multimedia materials, can only be realized at high cost in specialized apps.

On the other hand, cloud computing has great potential to overcome the current problems with the asymmetric production and use of multimedia materials. However, it still has its limitations in mobile settings. Most cloud computing research and development concentrates on the scalability of businesses over the Web and mostly neglects the needs of (mobile) users [32]. Similar to the cloud computing paradigm, mobile cloud computing envisions transparent augmentation of mobile device capabilities via ubiquitous wireless access to cloud storage and computing resources. Since this is a new computing paradigm, developers are still working out how to utilize the benefits of cloud services properly. It is still questionable whether it is the right way to rely completely on public cloud services (e.g. Amazon) and how to integrate in-house hardware infrastructure into the cloud. These questions pose different challenges to different developer groups who possess different levels of resources.

Thus, we aim to bring multimedia services to mobile users via cloud technologies. We argue that we have to try out and evaluate new standard and protocol suites, combined with the mobile cloud computing delivery model, in order to improve current development practices and to shape mobile and web multimedia convergence. For instance, HTML5 [26] has great potential to remedy mobile device fragmentation, and XMPP is gradually taking on a role as a cloud protocol [25, 30, 45].

Contributions of this paper include a requirements overview, an architecture, a platform and services for mobile multimedia cloud computing. We are aware that we have not identified all issues on the way to a mobile multimedia architecture, and that at the moment the implementation of such use cases is blurred by obscure technical problems and limited development resources. But we are also sure that the engineering knowledge we share here is valuable for other researchers and developers.

In this paper we describe our experiences with the development of advanced multimedia services and applications (Section 2). From these experiences it became clear that a comprehensive view on the different perspectives contributing to the use of mobile multimedia cloud computing and the Web is still missing. Section 3 systematizes these perspectives. With the knowledge gained, we designed a conceptual hybrid multimedia architecture, described in Section 4, that overcomes many of the issues already identified. In particular, this hybrid mobile multimedia architecture helps developers prototype and develop their apps fast. It saves the time otherwise spent on buying hardware, installing software, and setting up runtime environments. Commonly used services can be reused, so that developers can focus on their core functionality and achieve a faster time-to-market than usual.

Furthermore, through the new mobile multimedia cloud computing services (Section 5) we tackle problems such as combining private cloud infrastructure with the public infrastructure offered by big Internet vendors, collaborative work on multimedia metadata, and the improvement of the mobile video user experience. We address the evaluation of our system and services in Section 6. Finally, we refer to related work in Section 7, and conclude and describe future work in Section 8.

2 Experiences from building multimedia Web applications

During the last years, our research group has developed several prototypes of advanced multimedia web applications. Virtual Campfire [10] embraces a set of advanced applications for communities of practice [55]. It is a scenario for creating, searching, and sharing multimedia artifacts with context awareness across user communities. Through standard protocols, a large variety of (mobile) interfaces facilitates the rapid design and prototyping of context-aware multimedia community information systems. Virtual Campfire uses mobile multimedia semantics expressed in multimedia metadata and multimedia context expressed as ontology models. In the background, Virtual Campfire is fueled by our community engine, the Lightweight Application Server (LAS) [48], which follows service-oriented principles.

However, with the existing architectural solution we learned that several changes are needed. In fact, this is a common problem with many web applications (generally using HTTP/HTML and LAMP) which fail to meet the following requirements [19]. First, mobile multimedia applications should be able to handle large data sets and many users. The reason is that the rise of mobile multimedia, social media, and the Internet of Things increases the volume and detail of information that applications need to handle [51]. Second, application scale independence allows an application to grow by orders of magnitude without having to be changed. The Web 2.0 development model requires the ability to automatically scale (up and down) with the users’ requests [3]. The video creation website Animoto is a well-known example: it went from 25,000 to 250,000 users when it suddenly became popular on Facebook. Third, as computers are evolving from tools into media that connect human and non-human actor representations in digital social networks [28], the ability to rapidly deploy and reuse common features like real-time collaboration can make a difference in the time-to-market process. Finally, novel protocols and standards like XMPP, SIP, or HTTP DASH aim to overcome some of the issues in the dynamic mobile/Web ecosystem. The HTML5 [26] standardization efforts consider the requirements of interactive applications that can run both on traditional computers and smartphones, thus bridging the gap between them. Streaming media and metadata, HTML5, XMPP, and WebSockets have great potential to empower users’ rich multimedia sharing experiences across the Web and mobile devices.

Under these considerations we have designed a cloud-based multimedia framework for mobile and web services using emerging protocols and standards, as partially described in [30, 31]. The validity of this approach is shown later in this paper through the following application prototypes:
  • SeViAnno [11] is an interactive Web platform for MPEG-7 based semantic video annotation. SeViAnno features a well-balanced trade-off between a simple user interface and video semantization complexity (see Fig. 1). It allows video annotation based on the MPEG-7 standard with various integrated tagging approaches at multi-granular and community levels.

  • AnViAnno is a mobile application for the context-aware acquisition, sharing and consumption of mobile videos and images (see Fig. 2). Additionally, AnViAnno captures the device (spatio-temporal) context, which is further used to support semantic annotation on mobile devices and web clients. Semantic annotations are realized as MPEG-7 Semantic Base Types, including Agent, Concept, Event, Object, Place, and Time (see Fig. 2 right). AnViAnno seamlessly integrates with the web-based SeViAnno with regard to the multimedia content and metadata; in fact, AnViAnno can be considered the mobile counterpart of SeViAnno. Additionally, the application allows collaborative metadata creation and sharing.

  • Cloud Video Transcoder (ClViTra) is a cloud service that adapts video content by transcoding videos into multiple versions with different parameters, which are then used for adaptive video delivery to different clients (see Section 6.1).
Fig. 1

A screen snapshot of SeViAnno’s user interface featuring a video player (top left), video information and video list (bottom left), user-created annotations (top right), and Google map mashup for place annotations (bottom right)
Fig. 2

Screen snapshots of AnViAnno—an Android application for context-aware multimedia acquisition, annotation, sharing, and consumption

These applications and others (which are out of the scope of this paper) are built using services of our multimedia-centric cloud platform called i5Cloud (see Section 4). Ultimately, the goal of i5Cloud is to enable a single person, i.e. a technical amateur, to design and run large-scale multimedia applications with little effort. The next section systematizes these considerations.

3 Requirements perspectives on mobile multimedia cloud computing

As mentioned above, many emerging and advanced technologies are available to enrich mobile multimedia experiences. Cloud-based applications are complex information systems. Therefore, we propose three crucial perspectives on the requirements for a mobile multimedia cloud platform: the technology, the mobile multimedia, and the user and community perspective. Each perspective has further sub-perspectives related to technology selection and realization. These three perspectives stem from practical experience and a literature survey. They do not constitute a complete analysis of mobile multimedia cloud computing; however, they give complementary coverage and useful guidelines for its realization. We have chosen these perspectives because technology drives innovation in new services, multimedia is a central artifact in today’s digital world, and users and communities are the main actors on the Web 2.0.

3.1 Technology perspective

The technology perspective lays the groundwork for facilitating mobile cloud computing with application portability and platform independence. The Pew Internet & American Life Project and Elon University’s “Imagining the Internet” Center have conducted a survey showing that some 71% of respondents think that by 2020 most people will work in Internet-based cloud applications such as Google Docs and in applications run from smartphones [42]. They state that mobile devices will remain the driving force for people to make use of cloud-based services and applications. However, there are still technological barriers to using cloud services on capacity-limited mobile devices.

Data management

With the growing scale of web applications and the data associated with them, scalable cloud data management becomes a necessary part of the cloud ecosystem. Some of the popular scalable storage technologies at the moment are Amazon Simple Storage Service (S3), Google BigTable, Hadoop HBase and HDFS, etc. Basically, these distributed blob and key-value storage systems are very suitable for multimedia content, i.e. they are scalable and reliable because they use distributed and replicated storage over many virtual servers or network drives. The common approach to increasing availability (reducing access latency or increasing bandwidth) is to use solutions like content delivery networks (CDNs), which address the issue of how to deliver static multimedia files at the edge.
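
The replication idea behind such distributed blob stores can be illustrated with a toy sketch. The following Python snippet (our own illustration, not the implementation of any of the systems named above) uses consistent hashing to map each blob key onto a ring of virtual nodes and writes every blob to the next few distinct servers on the ring:

```python
import hashlib
from bisect import bisect_right

class ReplicatedBlobStore:
    """Toy sketch of a distributed blob store: consistent hashing maps each
    blob key onto a ring of virtual nodes; every blob is written to the next
    `replicas` distinct servers encountered clockwise on the ring."""

    def __init__(self, servers, replicas=3, vnodes=64):
        self.replicas = replicas
        # Each physical server contributes `vnodes` points on the hash ring.
        self.ring = sorted(
            (self._hash(f"{s}#{v}"), s)
            for s in servers for v in range(vnodes)
        )
        self.disks = {s: {} for s in servers}  # stand-in for real storage

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def _targets(self, key):
        # Walk clockwise from the key's ring position, collecting distinct servers.
        i = bisect_right(self.ring, (self._hash(key),))
        seen = []
        while len(seen) < self.replicas:
            _, server = self.ring[i % len(self.ring)]
            if server not in seen:
                seen.append(server)
            i += 1
        return seen

    def put(self, key, blob):
        targets = self._targets(key)
        for server in targets:
            self.disks[server][key] = blob
        return targets

    def get(self, key):
        # Any replica can serve the read.
        for server in self._targets(key):
            if key in self.disks[server]:
                return self.disks[server][key]
        raise KeyError(key)
```

Because keys are spread by hash, adding a server only remaps a fraction of the blobs, which is what makes such stores elastic.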

Regarding multimedia metadata management, the aforementioned techniques are inferior to traditional relational databases and ontologies. As these data storage technologies fall into the category of NoSQL databases, they trade off schemas, joins, and ACID transactions for elastic horizontal scaling and big data storage. The ACID principle requires that transactions execute with atomicity, consistency, isolation, and durability. This is considerably more challenging for distributed data storage in the cloud than for centralized database management systems (DBMS). Furthermore, applications need to understand their priorities, since it is impossible to achieve Consistency, Availability and Partition tolerance (the CAP theorem) at the same time [22]. Meanwhile, recent research reports on data management tools with elastic (scale up and down) consistency, which can be tuned based on different parameters (application requirements, load, and cost) [3, 5].


Communication

Mobile multimedia clouds require broadband Internet connections in order to meet the required quality of experience (QoE). 4G mobile networks, with upload speeds up to 500 Mbit/s and download speeds up to 1 Gbit/s [40], open up new classes of interactive applications such as instant streaming of videos [20] and remote-rendered 3D content [39]. Application protocols like XMPP and SIP, together with their extensions, are powerful protocols for cloud services and demonstrate several advantages over traditional HTTP-based Web services, e.g. SOAP and REST [30]. XMPP provides a common layer for synchronous and asynchronous human-to-human, human-to-machine and machine-to-machine communication [25]. For instance, Almeida and Matos [1] showed the scalability (in terms of throughput required per number of nodes) and traffic overhead advantages of XMPP over SOAP/HTTP.


Processing

Clouds have huge processing power at their disposal, but it is still challenging to make it truly accessible to mobile devices. The traditional client-server model and Web services/applications can be considered the most widespread cloud application architectures. However, several other approaches to augmenting the computation capabilities of constrained mobile devices have been proposed. Offloading has gained great attention in mobile cloud computing research, because it has aims similar to those of the emerging cloud computing paradigm, i.e. to surmount mobile devices’ shortcomings by augmenting their capabilities with external resources. The full potential of mobile cloud applications can only be unleashed if computation and storage are offloaded into the cloud without hurting user interactivity, introducing latency or limiting application possibilities [30]. Representative examples include the execution of software images on virtual machines in the cloud [14] or on nearby computers (called cloudlets) [46]. Instead of offloading the whole mobile software stack, some approaches offload application parts as computation tasks [33]. Applications can be split automatically [27], or developed intentionally for an adaptive shift of their execution between cloud and device [57]. Basically, these approaches give mobile application developers the illusion of programming much more powerful mobile devices with higher computational and storage capacities. Recent studies have shown that offloading can efficiently save energy [16, 34] and increase performance [38] by an order of magnitude on popular mobile platforms.
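
The energy argument for offloading can be made concrete with a simple back-of-the-envelope model (a sketch in the spirit of the cited offloading studies; all parameter names and power figures below are illustrative assumptions, not measurements from any of the referenced systems):

```python
def should_offload(cycles, data_bytes, local_speed, cloud_speed,
                   bandwidth_bps, p_compute, p_idle, p_transfer):
    """Decide whether offloading a task saves energy on the device.

    cycles        -- instructions the task executes
    local_speed   -- device instructions/second; cloud_speed for the server
    p_compute     -- device power (W) while computing locally
    p_idle        -- device power (W) while waiting for the cloud result
    p_transfer    -- device power (W) while transmitting the input data
    Offload when waiting idle plus shipping the data costs less energy
    than computing the task on the device itself.
    """
    e_local = (cycles / local_speed) * p_compute
    e_cloud = ((cycles / cloud_speed) * p_idle
               + (data_bytes * 8 / bandwidth_bps) * p_transfer)
    return e_cloud < e_local
```

The model makes the trade-off visible: compute-heavy, data-light tasks (e.g. video feature extraction on a short clip) favor offloading, while data-heavy, compute-light tasks do not.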

3.2 Mobile multimedia perspective

The mobile multimedia perspective pertains to aspects of multimedia itself as a rich resource for multimedia processing. It covers how multimedia is represented and encoded at the lower level, and how it is analyzed, modeled and processed at the higher level. This requirement is driven by the strong growth of mobile multimedia traffic: Cisco Systems has reported that mobile video made up 52 percent of all mobile traffic by the end of 2011 and that this share will grow to over 80% in 2016 [15].

Content adaptation

Multimedia applications may need to transfer adapted multimedia through different interconnected networks, servers, and clients with different media modalities and quality. Multimedia content is usually compressed using compression algorithms, or codecs, in order to achieve smaller file sizes for faster transmission and more efficient storage. However, different mobile device media platforms are based on different formats, containers and codings. For example, considering video codecs, Android supports H.263, H.264 AVC, MPEG-4 SP and VP8, while the iPhone supports H.264 and MPEG-4. Obviously, in order to achieve interoperability in the heterogeneous ecosystem of mobile platforms, adaptation services are needed. In general, video adaptation requires large computing resources, especially when a vast number of users send requests at the same time. Clouds tend to abstract away the technological complexities connected with seamless multimedia content adaptation. For instance, cloud software-as-a-service (SaaS) video encoding solutions have emerged on the Internet which can do the heavy lifting of CPU-expensive video encoding, thus relieving clients from upfront investment.
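
A transcoding service of this kind essentially maps each target platform to a rendition profile and drives an encoder such as FFmpeg. The sketch below (profile names, bitrates and sizes are illustrative assumptions; only the codec split between Android and iOS follows the text above) builds the corresponding command lines without executing them:

```python
# Hypothetical rendition profiles. Per the platform codec support above:
# Android plays H.264 and VP8, iOS plays H.264 only.
PROFILES = {
    "android-hq": {"vcodec": "libx264", "bitrate": "1500k", "size": "1280x720"},
    "android-lq": {"vcodec": "libvpx",  "bitrate": "500k",  "size": "640x360"},
    "ios":        {"vcodec": "libx264", "bitrate": "800k",  "size": "960x540"},
}

def ffmpeg_command(src, profile_name, out_dir="/tmp"):
    """Build one FFmpeg invocation for a rendition (sketch only; a real
    transcoder would also pick audio codecs, container flags, etc.)."""
    p = PROFILES[profile_name]
    ext = "webm" if p["vcodec"] == "libvpx" else "mp4"
    out = f"{out_dir}/{profile_name}.{ext}"
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", p["vcodec"], "-b:v", p["bitrate"], "-s", p["size"], out]
```

In a cloud setting, each such command is a natural unit of work to dispatch to a worker node in the processing pool.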

Multimedia semantics

Low-level multimedia analysis such as feature extraction, metrics and segmentation can be performed automatically. The resulting attribute values help browsers process multimedia files of various formats easily. Multimedia analysis, machine learning methods and logic-based modeling have proved to be good at discovering complex relations and interdependencies, which serve as input for reasoning in media interpretation processes.

Multimedia modeling

Mobile devices are able to produce different kinds of multimedia content. Moreover, the rich sensing functionality embedded in mobile devices provides valuable context information which can be used for indexing, querying, retrieval, and exploration of multimedia content. Multimedia metadata standards like MPEG-7 and ontologies are the foundations of semantic multimedia knowledge representation and interpretation [9]. In the case of mobile multimedia, the semantics and metadata also support adaptation and personalization processes.
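
To give a feel for what an MPEG-7 semantic annotation looks like, the following heavily simplified sketch assembles one. The element names follow the MPEG-7 MDS vocabulary (Semantic, SemanticBase, SemanticPlaceType, Label), but the output is deliberately trimmed and is not a schema-valid MPEG-7 document; a real one carries namespaces and far more mandatory structure:

```python
import xml.etree.ElementTree as ET

def semantic_annotation(label, base_type, free_text):
    """Build a trimmed-down MPEG-7-style semantic annotation. `base_type`
    is one of the Semantic Base Types (e.g. SemanticPlaceType, EventType)."""
    root = ET.Element("Mpeg7", {
        # xsi namespace declared so the xsi:type attribute stays well-formed
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    })
    desc = ET.SubElement(root, "Description")
    sem = ET.SubElement(desc, "Semantic")
    base = ET.SubElement(sem, "SemanticBase", {"xsi:type": base_type})
    ET.SubElement(base, "Label").text = label
    ET.SubElement(base, "Definition").text = free_text
    return ET.tostring(root, encoding="unicode")

xml_str = semantic_annotation("Aachen Cathedral", "SemanticPlaceType",
                              "Place shown in the video segment")
```

Small fragments like this are what the annotation clients exchange and persist, which is why a text-oriented protocol such as XMPP pairs well with MPEG-7 metadata.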

3.3 User and community perspective

This perspective relates to users’ and communities’ experiences with mobile multimedia applications. The Pew Research Center reported that 71% of online adults used video-sharing sites such as YouTube and Vimeo as of May 2011. On those video-sharing sites, users and communities are the main actors who produce and consume multimedia on mobile devices. The previous two perspectives address mobile multimedia service developers, while this perspective is more related to end-user requirements. User interfaces are the bridge between users and applications, and user evaluation procedures can be applied to validate this perspective.

Sharing and collaboration

People, as social beings by nature, like to interact with each other, and the capabilities of mobile networks and devices craft new ways of ubiquitous interaction over Web 2.0 digital social networks. Consequently, mobile devices, Web 2.0, and social software result in two phenomena. First, there is an exponential growth of user-generated mobile multimedia on the Web 2.0 which, in turn, is a driving force for further mobile device improvements. Second, there is a large number of diverse emergent communities, i.e. groups of people, usually co-workers or people with similar interests, trying to perform tasks to achieve a common goal. These two phenomena clearly show the demand for easy creation, sharing, and collaboration on multimedia elements such as photos, videos, interactive maps, learning objects, etc. Fortunately, sharing is an intrinsic part of cloud services. Sharing through a cloud generally enhances the quality of service, because cloud-to-device connections tend to be better than device-to-device connections.

Ubiquitous multimedia services

One of the biggest challenges in future multimedia application development is device heterogeneity. Future users are likely to own many types of devices; one quarter of mobile users are predicted to own two or more mobile-connected devices by 2016 [15]. When switching from one device to another, users expect ubiquitous access to their multimedia content. Cloud computing is one of the promising solutions for offloading tedious multimedia processing from mobile devices and for making storage and access transparent.

Privacy and security

The adoption of cloud computing will affect security in mobile systems. The relevant aspects are ensuring that data and processing controlled by a third party remain secure and private, and that the transmission of data between the cloud and the mobile device is secured [35]. Holistic trust models of the devices, applications, communication channels and cloud service providers are required [41].

In summary, these three perspectives provide a good structure for taking various complex aspects into consideration when developing a mobile multimedia cloud architecture. Although they have some overlapping sub-perspectives, their foci are distinct, and a combination of different (sub-)perspectives can refine the requirements. For example, “Ubiquitous multimedia services” is also tightly related to the mobile multimedia perspective. From the technology perspective, XMPP is related to the provisioning of XMPP servers, whereas from the user and community perspective it is related to the realization of real-time collaboration. Furthermore, MPEG-7 and ontologies enable the expression of multimedia semantics. The combination of MPEG-7 and XMPP enables the easy creation of scalable, semantic, real-time collaborative multimedia applications. Sections 4 and 5 exemplify the three perspectives.

4 i5Cloud: a multimedia cloud computing architecture

Considering the general requirements from the aforementioned three perspectives, we propose i5Cloud, a cloud architecture for multimedia applications and services. On the one hand, we design this architecture mainly with a focus on the technology and mobile multimedia perspectives. On the other hand, the user and community perspective is considered through the design and realization of the multimedia services.

Three layers can be distinguished in the architecture diagram in Fig. 4. From the bottom up, the infrastructure and platform layers focus on requirements from the technology perspective. The multimedia service layer considers the issues from the mobile multimedia perspective. Using the multimedia services, developers are able to build scalable (mobile) multimedia applications that reflect the user and community requirements.

The lowest layer, the Infrastructure Layer, includes storage, compute, and basic software infrastructure. It can be broken down into several realms that allow services and applications to run smoothly. Accordingly, a Solaris Container driver was developed to manage the Sun SPARC Enterprise server in i5Cloud. The Deltacloud API enables web services to start, stop, persist, destroy, and monitor virtual instances. Figure 3 illustrates the state machine diagram of virtual instances in the cloud. These virtual instances operate on dedicated CPU, virtual/physical memory and storage resources which are easily configured. Virtual machines are grouped into realms which present boundaries between different computing resources. We have three realms: a processing realm for parallel processing over many machines, a streaming realm responsible for scalable handling of streaming requests, and a general realm for running other servers such as XMPP or Web servers.
Fig. 3

State machine diagram of cloud computing instances (according to Deltacloud API)
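
An instance life cycle of this kind reduces to a small transition table. The sketch below approximates the Deltacloud model; the exact state and action names vary per back-end driver, so treat the entries as illustrative rather than as the authoritative Deltacloud state chart:

```python
# Transition table approximating the virtual-instance life cycle of Fig. 3
# (illustrative; real Deltacloud drivers may expose slightly different
# states and actions).
TRANSITIONS = {
    ("PENDING", "boot"):    "RUNNING",
    ("RUNNING", "stop"):    "STOPPED",
    ("RUNNING", "reboot"):  "RUNNING",
    ("STOPPED", "start"):   "RUNNING",
    ("STOPPED", "destroy"): "FINISHED",
}

class VirtualInstance:
    def __init__(self):
        self.state = "PENDING"  # a freshly created instance is provisioning

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} invalid in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state
```

Encoding the life cycle as data makes it trivial for the platform layer to validate requested actions before forwarding them to a driver.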

At the middle layer, the Platform Layer, the resource manager elastically scales active computing nodes (virtual instances) up and down according to request demand. In addition to computation and storage services, the i5Cloud platform also provides services for data streaming from the cloud to clients and vice versa. Media streaming is achieved using standard software such as FFmpeg and Wowza servers, and text-based data (or metadata) is streamed using Openfire XMPP servers. Thanks to the extensibility of XMPP, the Openfire server can easily be extended with custom or third-party plugins. Our collaborative metadata services demonstrate these features (see Section “Collaborative metadata services”). The data storage manager handles the multimedia content and metadata in a highly available manner. Some of the functionalities of i5Cloud at this level are exposed as platform services which application developers can use for more flexible control of the execution environment. For example, file transfer and RTP streaming provide direct access to data storage. Furthermore, monitoring services can update applications with status information in real time. This is used in our video transcoding service, described in more detail in Section 6.1.
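
The core of such a resource manager is a scaling rule. A minimal sketch (the thresholds and pool bounds are illustrative assumptions, not i5Cloud's actual policy) keeps just enough nodes active to absorb the current request backlog, clamped to the realm's pool size:

```python
import math

def desired_nodes(pending_requests, capacity_per_node,
                  min_nodes=1, max_nodes=16):
    """Toy scale-up/scale-down rule: enough nodes to absorb the current
    backlog, never fewer than min_nodes (so the realm stays responsive)
    and never more than max_nodes (the realm's pool bound)."""
    need = math.ceil(pending_requests / capacity_per_node)
    return max(min_nodes, min(max_nodes, need))
```

The resource manager would evaluate such a rule periodically and start or stop virtual instances to close the gap between the current and the desired node count.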

The top layer of the architecture, the Multimedia Service Layer, is explained in more detail in the forthcoming sections. In brief, the concurrent editing and multimedia sharing components are the engine for the collaborative multimedia and semantic metadata services, which in turn are the main building blocks for collaborative multimedia applications. The MPEG-7 metadata standard is employed to realize the semantic metadata services. Moreover, the intelligent media adaptation services enable more interactive mobile video applications, i.e. they contribute to an improved mobile user experience.

Generally speaking, the purpose of i5Cloud is not to compete with public cloud providers like Amazon AWS or Google App Engine, which provide a whole palette of generic compute and storage cloud services. Rather, i5Cloud is distinguished in several respects. First, it is a specialized framework that enables a single developer or a technical amateur to build large-scale applications with a focus on multimedia. The burden of scalable multimedia and metadata management is carried by the framework, so the application developer can focus on the application logic.

Second, we use a hybrid cloud computing strategy. That means i5Cloud takes advantage of the in-house commodity hardware infrastructure that is usually available in most organizations, companies or institutions. In the case of a cloud burst, i.e. when more resources are needed than are available in the private cloud pool, i5Cloud can automatically reach out to external public cloud infrastructures such as Amazon EC2. As a result of this hybrid cloud computing approach, we achieve a balance between resource limitations and the re-utilization of existing infrastructure. Intuitively, such a hybrid approach offers better security (the main components run on controlled hardware) and lower latencies (with respect to data locality). It can be used by individuals or small organizations that want to dynamically expand their system capacities by leasing resources from public clouds at a reasonable cost, while still retaining control of their own applications and data.
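
The cloud-burst placement decision can be sketched in a few lines (our own illustration of the private-first policy described above; the burst cap standing in for a cost budget is an assumption):

```python
def allocate(requested, private_free, public_limit):
    """Hybrid placement sketch: satisfy demand from the private pool first
    and burst only the remainder to a public provider, up to a
    cost-driven cap on leased public instances."""
    private = min(requested, private_free)
    public = min(requested - private, public_limit)
    if private + public < requested:
        raise RuntimeError("demand exceeds private pool plus burst cap")
    return {"private": private, "public": public}
```

Keeping the private pool as the first choice preserves the security and latency benefits noted above, while the cap bounds the public-cloud bill.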

i5Cloud uses two core technologies that enable next-generation mobile multimedia applications (Fig. 4). First, it uses the Deltacloud API at the infrastructure level to enable cloud interoperability. Second, it employs the bi-directional application-level protocol XMPP as the main communication protocol (except for multimedia content delivery and streaming). The following subsections provide deeper insights into these core components.
Fig. 4

Layered architecture of i5Cloud. The virtualized computing and storage infrastructure enables scalable and highly-available multimedia-centric services with easy development

Deltacloud API: enabling cloud interoperability

Deltacloud plays a big role in the i5Cloud architecture. Its RESTful API layer enables cross-cloud interoperability at the infrastructure level with other cloud providers, e.g. Amazon EC2, Eucalyptus or GoGrid. The Deltacloud Core framework can be extended by creating intermediary drivers that interpret the Deltacloud RESTful API at the front while communicating with cloud providers through their own native APIs at the back. The drivers abstract away the differences between IaaS cloud providers. The Deltacloud API thus provides single unified access to heterogeneous cloud infrastructures.

Moreover, this cloud computing architecture is not constrained to run on our infrastructure only, since the infrastructure layer runs on top of virtualized hardware. The upper three layers can easily be migrated to another public or private cloud infrastructure, thanks to the virtualization abstraction at the service level. In other words, we are using a unified API for common cloud infrastructure management. This is crucial since many different virtualization layers exist. Popular cloud middleware like Eucalyptus or OpenNebula is restricted to the most popular virtualization technologies like Xen or KVM, which were not supported by our hardware. Deltacloud does not restrict the virtualization technology used. For example, our working prototype uses the Sun Solaris Containers virtualization technology which is built into our in-house hardware. Thus, Deltacloud makes it possible to use heterogeneous commodity hardware.
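
The driver pattern behind Deltacloud can be illustrated with a minimal sketch: a uniform interface up front, back-end-specific translation behind it. The method names and the in-memory zone bookkeeping below are our own illustration, not the actual Deltacloud driver API (which is a Ruby interface):

```python
from abc import ABC, abstractmethod

class CloudDriver(ABC):
    """Uniform driver interface in the spirit of Deltacloud: each back end
    translates these operations into its native API calls."""

    @abstractmethod
    def create_instance(self, image_id): ...
    @abstractmethod
    def stop_instance(self, instance_id): ...
    @abstractmethod
    def instance_state(self, instance_id): ...

class SolarisContainerDriver(CloudDriver):
    """Toy stand-in for the Solaris Container driver mentioned above;
    a real driver would shell out to zone administration commands."""

    def __init__(self):
        self._zones = {}  # instance id -> state

    def create_instance(self, image_id):
        iid = f"zone-{len(self._zones)}"
        self._zones[iid] = "RUNNING"
        return iid

    def stop_instance(self, instance_id):
        self._zones[instance_id] = "STOPPED"

    def instance_state(self, instance_id):
        return self._zones[instance_id]
```

Because callers only see `CloudDriver`, swapping the Solaris back end for an EC2-backed driver requires no change above the infrastructure layer, which is exactly the interoperability property the text describes.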

XMPP: the glue for mobile cloud services

An essential enabling technology in our approach is the Extensible Messaging and Presence Protocol (XMPP) [45]. As an instant messaging protocol, XMPP is able to enhance real-time collaboration on multimedia metadata, adaptation and sharing. XMPP-based communication takes place not only among users and their communities, but among multimedia cloud services as well.

Many advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services, and many researchers have identified XMPP as a suitable cloud protocol [25, 54]. When integrating devices with clouds, we must consider two main communication aspects: communication between the devices and the cloud, and communication between the services within the cloud. An XMPP network can be seen as a complete XML-based routing framework upon which a messaging middleware can be built. Hence, an XMPP-based middleware can be used to integrate different services into a distributed computing environment. For example, application modules, external sensors and external services are XMPP entities identified by unique JIDs (Jabber IDs). Likewise, long-lived processing tasks can easily benefit from the built-in publish/subscribe messaging without the need for HTTP long polling techniques. HTTP was originally designed to accommodate the query and retrieval of web pages and was not aimed at more complex communication. The intrinsically synchronous HTTP protocol is unsuitable for time-consuming operations, such as computation-demanding database lookups or video processing.
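
To make the publish/subscribe idea concrete, the sketch below assembles a simplified XMPP pub/sub publish stanza of the kind a long-running cloud task could emit. The `iq`/`pubsub`/`publish` shapes follow XEP-0060, but the payload element and all JIDs are illustrative assumptions, and a real stanza would be sent over an authenticated XML stream rather than serialized to a string:

```python
import xml.etree.ElementTree as ET

def task_publish_stanza(from_jid, node, payload):
    """Build a trimmed-down XEP-0060 publish stanza: a worker publishes a
    status update to a pub/sub node; subscribed clients are then notified
    by the server without any polling."""
    iq = ET.Element("iq", {"type": "set", "from": from_jid, "id": "pub1"})
    pubsub = ET.SubElement(iq, "pubsub",
                           {"xmlns": "http://jabber.org/protocol/pubsub"})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item")
    # <status> is a hypothetical payload element, not part of XEP-0060.
    ET.SubElement(item, "status").text = payload
    return ET.tostring(iq, encoding="unicode")
```

For a transcoding job, for instance, the worker could publish progress updates to a per-job node, and every interested mobile or web client would receive them pushed in real time.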

5 The Multimedia Services Layer in i5Cloud

The Multimedia Services Layer delivers mobile services to users and communities. This section covers some fundamental services that apply the i5Cloud infrastructure to mobile multimedia use cases, which in turn demonstrate the applicability and usefulness of our approach. First, we describe data management services for seamless mobile/web integration along the multimedia value chain, i.e. acquisition, adaptation and delivery of multimedia documents. Second, we explain services for multimedia metadata management.

5.1 Content management services

Mobile devices are becoming an inseparable part of the Web today. Mobile/web integration means not only mobile web pages, but also the integration of mobile devices as equal nodes on the Internet, on a par with desktop PCs and servers. The Web nowadays is a common communication channel for multimedia content that is “prosumed” on different personal computing devices such as desktops, laptops, tablets, or smartphones.

Content Management Services support seamless multimedia sharing across different platforms, ranging from web clients to mobile devices. We demonstrate cross-platform video data management from both the mobile multimedia perspective and the technology perspective. The demo prototype highlights the mobile and web multimedia integration. The mobile application AnViAnno seamlessly integrates with the web-based SeViAnno (see Section 2). The seamless multimedia integration is done through i5Cloud, which does the heavy lifting of the necessary multimedia operations such as transcoding, adaptation, highly available storage, responsive delivery, and scalable processing.

Both prototypes AnViAnno and SeViAnno demonstrate a current trend in web applications, i.e. mobile/web integration. Nowadays, most services are offered on different platforms using the Web as a common denominator. Seamless and uniform interaction with multimedia has become a prerequisite factor for successful services. However, the heterogeneity of end devices inevitably requires multimedia adaptation.

Ubiquitous multimedia acquisition and delivery

Using AnViAnno, users can record and annotate video/image content which can be uploaded or directly streamed to the cloud repository. In the i5Cloud, the videos are transcoded with the cloud video transcoding service. At the same time, the semantic metadata services handle the metadata content and store it in the MPEG-7 metadata store, making it available to other multimedia services. The management of all MPEG-7 data is implemented as LAS Web services. The LAS multimedia and user management services enable Web clients to create, search, and retrieve MPEG-7 metadata. Afterwards, the uploaded videos are available in SeViAnno for search, viewing and further annotation. More details about the SeViAnno part can be found in our previous research work [11].

5.2 Metadata management services

As an important part of multimedia, metadata can be used to describe useful information about multimedia artifacts and their content in a machine-readable format. We provide a set of services for mobile clients to annotate multimedia collaboratively in real time and to share multimedia and its metadata. Additionally, the services exploit the rich mobile context information.

Context-aware semantic annotation services

With AnViAnno users can also semantically annotate videos. The annotation is based on the MPEG-7 metadata standard [29]. MPEG-7 is one of the most complete existing standards for multimedia metadata. It is XML-based and consists of several parts: systems, description definition language, visual, audio, multimedia description schemes, reference software, conformance testing, extraction and use of MPEG-7 descriptions, profiles and levels, schema definition, MPEG-7 profile schemata, and query format. However, several other approaches have been used as metadata formats in multimedia applications, such as the Ontology for Media Resources [56] and COMM [4]. In order to enable interoperability between systems using these different formats, we have implemented or used mapping services. For example, our MPEG-7 to RDF converter [8] is able to convert MPEG-7 documents into RDF documents for further reasoning and fact derivation about the multimedia.

As mentioned in Section 2, users are able to capture videos and annotate the videos with rich semantics in the MPEG-7 standard. The user-generated annotations are further used to navigate within video content or improve the retrieval from multimedia collections. For example, users can navigate through the video(s) using a seekbar or semantic annotations. The videos and their annotations are exposed to other internal LAS MPEG-7 services [10] and external clients.

The mobile devices enable users to make the initial metadata enrichment on site during the multimedia acquisition and capture context cues. However, they have limitations in regard to input modes and screen sizes. Our multimedia services using the i5Cloud enable extension of the metadata management on desktops or laptops using the SeViAnno web application.

Collaborative metadata services

The user-generated multimedia content changes relatively slowly after its creation. The associated metadata, on the other hand, is under constant modification. For example, a video creator initially describes and tags a new video. But after sharing the video, many other people contribute to it with annotations, hyperlinks, comments, ratings, etc. Therefore, the success of multimedia services highly depends on features for metadata sharing and collaborative metadata editing.

One of the key services for real-time collaboration is shared editing of XML documents. XML has been established as a de facto standard for data exchange and interoperability, including multimedia metadata standards. Since XML is generic and extensible, different kinds of information can be represented, such as graphics (SVG), augmented reality content (ARML), and multimedia metadata (MPEG-7). Therefore, real-time collaboration within multimedia applications in practice generally means concurrent editing of metadata documents in XML.

We use XMPP as a main communication protocol to support real-time collaboration. Our XMPP-based Mobile Multimedia Collaboration (XMMC) services run on top of i5Cloud. Multiple XMMC clients can bi-directionally stream XML messages over the XMMC i5Cloud services using XMPP channels. The Android client application (AnViAnno) also operates as a tool for multimedia acquisition, annotation with metadata, and multimedia content and metadata consumption (see Fig. 5).
Fig. 5

Main components of AnViAnno—an Android client application for semantic multimedia. This application can communicate with i5Cloud over three different connectors

On the cloud side, the XMPP communication is conducted via an Openfire XMPP Server [43] (see Fig. 4). The system is built on top of the XMPP server as an Openfire plugin. Two XMPP modules are responsible for the XMPP communication between clients.

First, the Media Catalog Module is responsible for the create, retrieve, update, and delete operations on multimedia content and metadata. The Media Catalog Module persists the basic multimedia-related data in a relational database and ensures consistency of this data.

Second, the Collaborative Metadata Module handles metadata-related services, i.e. metadata management and real-time synchronization of the annotation metadata among all collaborating client applications. The concurrent XML editing service is based on the Collaborative Editing Framework for XML (CEFX+) [21, 53]. CEFX+ enables concurrent editing of XML documents in real time using operational transformation algorithms [49]. The synchronization keeps a copy of the XML document at every client in the session; on edit operations, the service ensures timely updates of the copies. If conflicts exist, it resolves them and broadcasts the changes with a conflict resolution message to all clients.
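The core idea of operational transformation can be illustrated with a minimal sketch (Python; this is a didactic simplification of what CEFX+ does, covering only concurrent character inserts and ignoring the site-ID tie-breaking a full algorithm needs):

```python
def transform_insert(pos, other_pos):
    """Shift an insert position to account for a concurrent insert at
    other_pos (equal positions would additionally need a site-ID tie-break)."""
    return pos + 1 if other_pos <= pos else pos

def apply_insert(doc, pos, ch):
    return doc[:pos] + ch + doc[pos:]

base = "abc"
# Client A inserts "X" at position 1; client B concurrently inserts "Y" at 2.
doc_a = apply_insert(base, 1, "X")                        # A applies its own op
doc_a = apply_insert(doc_a, transform_insert(2, 1), "Y")  # then B's op, transformed
doc_b = apply_insert(base, 2, "Y")                        # B applies its own op
doc_b = apply_insert(doc_b, transform_insert(1, 2), "X")  # then A's op, transformed
# Both replicas converge to the same document without locking.
```

Convergence without locking is exactly what makes this approach attractive over high-latency mobile links.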

During the synchronization of all working document copies, a copy is also kept at the LAS MPEG-7 Integration Service, so that this service acts as a client that synchronizes the XML metadata file and calls the related LAS MPEG-7 Semantic Base Type Services whenever an XML document is updated. Thus, XMMC achieves interoperability with pre-existing and standard-compliant multimedia repositories (such as the one described in [7]).

Figure 6 shows the sequence diagram of collaborative multimedia annotation. First, Mobile Client-1 sends a <collanno-req> get IQ in order to retrieve the metadata XML of a multimedia object. Then, the i5Cloud Collaborative Annotation Module (CAM) gets the corresponding XML file from the Collaborative Editing Service (CES) and sends the file back via the XEP-0096 SI File Transfer extension of XMPP [36]. Once the transfer of the file succeeds and is acknowledged by the client, CAM sends back a <collanno-req> result IQ so that clients can start with the annotation.
Fig. 6

Sequence diagram of a collaborative annotation

Meanwhile, Mobile Client-2 also wants to participate in the collaborative annotation session. It sends a <collanno-req> message and follows the same procedure as Mobile Client-1. Both clients then share the editing session. When Client-1 performs an insert, delete or update operation, the operation is propagated to all participants of the editing session via a <groupchat> message of the XEP-0045 Multi-User Chat extension of XMPP [44]. The XML document at i5Cloud is also updated. Finally, CES notifies the LAS MPEG-7 Integration Service of the changes.
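As a rough illustration, an edit operation wrapped in a MUC groupchat message could be built as follows (Python sketch; the room JID, sender JID and the <edit> payload schema are hypothetical illustrations, only the type='groupchat' attribute comes from XEP-0045):

```python
import xml.etree.ElementTree as ET

def make_edit_stanza(room_jid, sender_jid, op, node_id, value):
    """Build a MUC <message type='groupchat'> carrying a metadata edit.
    The <edit> child element is an illustrative payload, not a real XEP."""
    msg = ET.Element("message", {
        "to": room_jid,
        "from": sender_jid,
        "type": "groupchat",
    })
    edit = ET.SubElement(msg, "edit", {"op": op, "node": node_id})
    edit.text = value
    return ET.tostring(msg, encoding="unicode")

stanza = make_edit_stanza("session42@muc.i5cloud", "client1@i5cloud/mobile",
                          "update", "annotation-7", "Roman period")
```

The MUC room then fans the stanza out to every occupant, which is what propagates the operation to all session participants.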

In summary, the main contribution of the multimedia services layer of i5Cloud and its client applications is to augment resource-poor mobile devices with cloud services such as fast intelligent video adaptation, in order to enhance user experiences in mobile multimedia and to enable metadata creation, real-time sharing, and concurrent collaborative editing.

6 Evaluation

The evaluation aimed to assess i5Cloud and its mobile multimedia services from the three perspectives described in Section 3. The i5Cloud infrastructure layer was tested from the technology and mobile multimedia perspectives in an experiment on transcoding of video files. The ubiquitous multimedia management and collaborative metadata services, on the other hand, were assessed from the user and community perspective.

6.1 Cloud Video transcoding

The Cloud Video Transcoder is a scalable hybrid cloud application which uses i5Cloud and Amazon Web Services (AWS). Video transcoding is a technique to adaptively deliver video streams across heterogeneous networks: video is converted from one compressed format to another for adjustment to the channel bandwidth or the receiver. Video transcoding [52] is a CPU-intensive operation; therefore, it is a suitable test domain for scalable cloud computing applications. The main idea behind the Cloud Video Transcoder is to use its own private cloud infrastructure, and to start and use extra instances from a public cloud provider on demand. More concretely, users upload multiple videos to the system; if the number of videos in the transcoding queue exceeds the number of free instances in the cloud, new instances are started and the videos are transcoded in parallel. When the transcoding queue becomes empty, these extra instances are terminated. In the cloud, the cloud video transcoding service transcodes each video into streamable formats and stores the different versions of the video.
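The cloud-bursting policy described above boils down to a simple decision rule, sketched here in Python (function and parameter names are our own illustration, not the implemented scheduler):

```python
def instances_to_start(queue_len, free_private, running_public):
    """Public-cloud instances to launch when the transcoding queue
    outgrows the free private capacity."""
    overflow = max(0, queue_len - free_private)
    return max(0, overflow - running_public)

def instances_to_stop(queue_len, running_public):
    """Terminate all extra public instances once the queue drains."""
    return running_public if queue_len == 0 else 0
```

For example, with 10 queued videos, 4 free private instances and 2 public instances already running, the rule would start 4 more public instances; once the queue is empty, all public instances are released.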

The i5Cloud runs on a Sun SPARC Enterprise T5240 Server with the Solaris 10 operating system. The server supports a maximum of 128 simultaneous threads, i.e. 128 single-threaded instances as virtual machines. Amazon EC2 is used as the external public cloud provider. Table 1 contains information about the hardware profiles of the VM instances in use. The i5Cloud instance types are modeled according to Amazon's setup, in order to allow at least a coarse comparison. FFmpeg version 0.8.2 is used as the video processing library.
Table 1

Hardware profile details of the i5Cloud and Amazon EC2 instances

                         i5Cloud                           Amazon EC2
                         Small    Medium   Large           Micro            Small            Medium
Memory                   1 GB     2 GB     2 GB            613 MB           1.7 GB           1.7 GB
CPU                      2 CPU    4 CPU    8 CPU           Up to 2 ECUsa    1 ECU            5 ECUs
                         threads  threads  threads         (short bursts)   (1 virt. core)   (2 virt. cores, 2.5 ECUs each)
Instance startup time    234 s    232 s    235 s           96 s             96 s             97 s
Max. no. instances       –        –        –               –                –                –

aOne EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor

The usage of computing resources of i5Cloud can easily be monitored in real time. The resource manager exposes its status through the monitoring interface (i5Cloud API). The resource monitor in Fig. 7 shows the status monitoring of resource usage during video transcoding tasks from different original formats into selected formats. Additionally, the cloud resource allocation can be visualized in real time during the transcoding operations of uploaded videos. The overall CPU workload is balanced and reduced by employing additional Amazon EC2 instances. These have been started based on a scheduling algorithm, in order to support i5Cloud during a request peak (see Fig. 7). More cloud instances can be employed instantly depending on the workload. When the workload decreases, the number of instances is reduced to optimize resource usage for efficiency.
Fig. 7

Cloud resource usage monitoring console: CPU load of processing instances

Figure 8 shows a comparison chart of transcoding execution time on different compute instance types for different file sizes. The three test videos had identical encoding parameters, i.e. the same codec, bitrate, and frame rate.
Fig. 8

Transcoding execution time on instances with different hardware profiles in i5Cloud private cloud and Amazon EC2 public cloud infrastructure provider

Figure 9 clearly shows the advantage of the cloud computing approach. Using commodity hardware and virtualization technology gives us a hybrid cloud computing environment that elastically scales up and down depending on the workload. In particular, the chart plots the execution times for transcoding 15 identical videos. Each video was 10.73 MB, with a duration of 2 min 8 s, encoded with the wmv3 codec at a bitrate of 626 kb/s and 30 frames per second. The number of videos was kept constant and only the number of instances was increased at every iteration, from 2 up to 16 instances. A further increase in the number of instances would not affect the execution time, since the number of instances would then exceed the number of videos, i.e. only 15 instances would be busy and the rest would be idle. The diagram clearly shows the scalability of our i5Cloud architecture, i.e. the execution time can be reduced by increasing the number of instances. The scalability behaves similarly on our private infrastructure and on a public cloud infrastructure (Amazon EC2). Please note that the diagram does not show that i5Cloud private instances perform better than Amazon EC2, but that they exhibit similar scalability behavior. The differences in execution time are probably due to the different hardware profiles.
Fig. 9

Scalability of i5Cloud. The chart shows the total transcoding time of 15 identical video files using different numbers of instances (x-axis)
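The scaling pattern in Fig. 9 follows simple batch arithmetic: with n instances and 15 jobs, the idealized total time is ceil(15/n) parallel batches times the per-video transcode time. A sketch, assuming a hypothetical 60 s per transcode (the measured times differ per instance type):

```python
import math

def ideal_batch_time(n_videos, n_instances, t_per_video_s):
    """Idealized total time: videos are processed in parallel batches,
    ignoring startup and queueing overhead."""
    return math.ceil(n_videos / n_instances) * t_per_video_s

# 15 identical videos, hypothetical 60 s per transcode.
times = {n: ideal_batch_time(15, n, 60) for n in (2, 4, 8, 16)}
# Beyond 15 instances, adding more cannot help:
# ceil(15/16) == ceil(15/15) == 1 batch.
```

This also explains why increasing the instance count past the number of videos leaves the execution time unchanged.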

The goal of i5Cloud (and the Cloud Video Transcoder) is not to be a data center, but rather to use cloud computing principles to provide development primitives for multimedia applications which common single-machine applications cannot provide. Additionally, the advantage of multimedia applications developed on top of i5Cloud is that they can easily be deployed on other cloud infrastructures. Therefore, two factors in this use case are important, i.e. the ability to scale with the load and to offload to other infrastructure when necessary, both of which are shown above.

Nevertheless, real-world web applications need to handle thousands of streams. In order to show the benefit of this hybrid approach, we conducted a simulation using the CloudSim toolkit, one of the most popular cloud simulation tools [6]. CloudSim allows modeling of data centers, physical hosts, virtual machines, processing jobs, etc. We modeled the Cloud Video Transcoder with two data centers representing the private i5Cloud and the public Amazon cloud. The parameters for the virtual machines were set to match the small instances at i5Cloud and Amazon. The other parameters were inferred from the empirical results presented above. The experiment consisted of simulating a sudden demand of 10,000 videos for processing. Each video was set to have a size of 100 MB. The simulation was repeated five times with the maximum number of private (i5Cloud) instances (25), adding 5 additional Amazon EC2 instances at every iteration. Figure 10 shows the speed-up in processing achieved by simply offloading to the public cloud and the respective cost of using the Amazon provider. For the sake of simplicity, network latencies were not considered in the scheduling algorithm and data transfer costs were not calculated; these results are therefore a very coarse estimate.
Fig. 10

Simulation of the Cloud Video Transcoder using the CloudSim simulator. The maximum of 25 i5Cloud medium instances are running all the time. Upon a request spike of 10,000 videos, the Cloud Video Transcoder offloads to the Amazon cloud. At each iteration 5 additional EC2 instances are added, which reduces the total processing time but increases the monetary costs
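The time/cost trade-off in this simulation can be approximated with a back-of-the-envelope model (Python; the 60 s job time and the $0.10 hourly rate are placeholder figures, and like the simulation, the model ignores network latency and transfer costs):

```python
import math

def hybrid_run(n_jobs, t_job_s, n_private, n_public, ec2_hourly_rate):
    """Coarse model of a hybrid run: jobs execute in parallel batches
    across private + public instances; public instances are billed per
    started instance-hour."""
    wall_time_s = math.ceil(n_jobs / (n_private + n_public)) * t_job_s
    cost = n_public * math.ceil(wall_time_s / 3600) * ec2_hourly_rate
    return wall_time_s, cost

# Hypothetical figures: 10,000 jobs of 60 s each, 25 private instances,
# placeholder EC2 rate of $0.10 per instance-hour.
baseline = hybrid_run(10000, 60, 25, 0, 0.10)   # private cloud only
burst = hybrid_run(10000, 60, 25, 5, 0.10)      # plus 5 EC2 instances
```

Each step of 5 additional public instances shortens the wall time while adding a roughly linear monetary cost, which is the qualitative shape of Fig. 10.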

6.2 Collaborative metadata services in cultural heritage

In order to evaluate i5Cloud and its services from the user and community perspective, we applied a use case scenario based on a professional community's requirements. In the user study and performance test we conducted, seven participants tested the mobile services for collaborative metadata annotation. Different Android smartphones and tablets were used, with OS versions ranging between Android 2.2 and 3.1. The devices' hardware ranged between 400 MHz and (dual-core) 1.2 GHz CPUs and between 256 MB and 1 GB of RAM. In order to enable participants to assess the different qualities of the services, they completed short-term tasks that simulated documentation work at a cultural heritage site. The tasks consisted of multimedia acquisition and sharing, context-aware metadata creation and collaborative annotation. The goal of the user study was to evaluate the context-aware mobile collaborative services delivered by the i5Cloud.

A simulated research team consisting of experts from different fields like archeology, architecture and history documents an archaeological site. Team members were distributed in the field. Figure 11 outlines the scenario. First, a documentation expert discovers some artifacts on site and documents them with photos and videos. He also tags the multimedia content with basic metadata like name and description. The multimedia is stored in the Collaborative Multimedia Cloud, i.e. using the mobile multimedia services running on top of i5Cloud. Other experts, on-site or remote, join the session to collaboratively annotate the multimedia in real time from different aspects. Therefore, a collaboration session is established by the Collaborative Multimedia Services. Then, for example, an architecture expert annotates the origin date/period of an artifact. This annotation is propagated to all team members immediately. Later, a historian refines that annotation, since he has deeper knowledge in this area. His correction is also pushed to all others seamlessly.
Fig. 11

XMPP-based mobile multimedia collaboration

The mobile multimedia cloud platform comes with real-time support, multimedia acquisition and sharing with other users, collaborative multimedia metadata annotation and integration with existing metadata repositories. Users can navigate through the video and place MPEG-7 based annotations collaboratively to fulfill the requirements from the user and community perspective.

Ensuring real-time responsiveness within mobile network settings can be problematic due to the unstable, low-bandwidth, high-latency characteristics of mobile networks. We evaluated the CEFX+ framework in a collaborative annotation context on mobile devices. Performance tests for sending and executing remote updates were conducted and the results analyzed. We used two Motorola Milestone smartphones and measured the time from when one mobile client generates an operation (adding, modifying, or deleting a semantic annotation) and sends it to the other mobile client, until the corresponding operation is executed on that client. In the test scenario, ten insert, ten update and ten delete operations were generated.

The collected performance values, shown in Fig. 12, reveal an average execution time of 412 ms and a standard deviation of 209 ms. Both the average and the deviation are acceptable for near-real-time text-based collaboration applications. However, these values mostly depend on the network characteristics; since we conducted our tests over a WiFi connection, the processing time can be longer in mobile networks like EDGE or 3G.
Fig. 12

Execution time of remote concurrent XML editing operations on two Android devices in a WiFi network

The feedback results shown in Fig. 13 suggest that the collaborative metadata services have increased the cultural heritage awareness of the participants to a certain extent. The prototype proved to have useful user interfaces with helpful collaboration features such as chat, real-time concurrent editing of annotations and real-time notifications. However, some issues still need to be solved, especially with regard to providing collaborative services over unpredictable wireless network connections.
Fig. 13

Partial evaluation results from the user study of the collaborative multimedia annotation services

6.3 Discussion and future work

The emergence of cloud computing has a deep impact on the entire life cycle of mobile multimedia. Since cloud systems exhibit features of complex information systems, we propose three perspectives from which to consider such systems.

Table 2 shows a mapping of the (sub-)perspectives on the mentioned services, applications and components. The video streaming and the XMPP-based metadata streaming deal with the cloud communication. Deltacloud API with video transcoding and processing services embrace the cloud computing benefits. The cloud video transcoding service takes charge of multimedia formats; the metadata services treat the multimedia semantics; and the intelligent processing services use multimedia (in this case video) analytics to deliver better UX on mobile devices. Furthermore, AnViAnno and SeViAnno demonstrate the convergence of mobile and Web multimedia.
Table 2

Mapping of the perspectives

                           Current status                                Future work

Technology
                           Cache, replicated storage, CDN
                                                                         Adaptive HTTP streaming (DASH)
                           Cloud burst offloading, Deltacloud            Improved scheduling model and monitoring

Mobile Multimedia
                           Cloud Video Transcoder                        Scalable video coding
                           MPEG-7-based semantic video annotation        W3C Media annotations
                                                                         Context models [8]

User and Community
   Sharing & collab.       XMPP-based mobile multimedia collaboration    Mobile near real-time video stream sharing
                           AnViAnno & SeViAnno                           Distributed user interfaces
   Privacy & security      Basic user authentication                     ACL policies, OpenID

Additionally, Table 2 lists some of the future steps needed to fully realize the vision of a mobile multimedia cloud. For instance, using scalable video coding techniques we could reduce the footprint of adapted content. The new DASH standard is a promising solution for mobile streaming to heterogeneous devices. Currently, the multimedia metadata employed is limited to one standard, i.e. the MPEG-7 Semantic Base Types. In order to integrate with other video platforms and, more importantly, to incorporate annotations from other sources, mapping tools are required. The W3C Media Annotations Working Group is working on a promising solution, providing an ontology and API for cross-platform media object integration. A LAS service and API will be developed in order to provide easy access to many other multimedia repositories. We are developing more multimedia services that benefit from i5Cloud. Additionally, there are open issues in i5Cloud, such as more optimized resource scheduling, where only an experimental validation can incorporate the complexities of such architectures and infrastructures with specific features [24].

7 Related work

Cloud computing has been hyped a lot in recent years. Existing cloud computing tools tackle only specific problems such as parallelized processing of massive data volumes [17], flexible virtual machine (VM) management [2] or large data storage [13]. Since it is not a new technology but a technology trend, many researchers try to put their work under the cloud computing umbrella. This also holds for multimedia research projects [12].

Several commercial services with cloud features have emerged lately which try to relieve the user of long-running processing and pricey software for media operations. The cloud video transcoding service using i5Cloud is not to be compared with such services, since its main purpose is to demonstrate the capabilities of our i5Cloud hybrid cloud architecture.

Many research projects explore the rich possibilities of application-level protocols like XMPP outside the traditional domain of instant messaging applications [25]. The MOBILIS project [47] provides a set of generic services for real-time collaboration based on XMPP and integration with existing Web 2.0 social networks. The system takes advantage of existing XEPs that provide service discovery, multi-user chat, file transfer and enriched presence services. Junction [18] is an application-level communication platform, providing an XMPP-based framework and libraries for cross-platform (mobile and web) application cooperation. These projects exploit the benefits of the XMPP protocol for collaboration and peer-to-peer XML messaging, but mostly for mobile multi-player games and social applications. They are not focused on multimedia metadata applications and cloud services.

A lot of research work has been done in the area of collaborative document editing. It is still a challenging task to support multiple users editing a shared document at the same time. Apache Wave [50] (formerly known as Google Wave) and Novell Vibe [37] represent some of the state-of-the-art systems for collaboration. Interestingly, they also use XMPP as a communication protocol. They are very powerful collaboration platforms, but they are still not applicable on mobile platforms because of their large footprint, which mobile devices currently cannot support. CEFX+ [21, 53], on the other hand, has already been used in mobile applications, and we have used it as the backbone of our collaborative metadata services.

8 Conclusions

We have described the key challenges faced by mobile/web multimedia applications through three requirements perspectives: the technology perspective, the mobile multimedia perspective, and the user and community perspective. i5Cloud addresses the opportunity for a hybrid cloud computing solution tailored to this class of applications. It provides a substrate for scalable compute services, real-time communication and cloud-interoperable applications that exploit novel protocols and standards.

The described use cases cover different aspects from the listed perspectives on mobile multimedia cloud computing. The evaluation of the research results is split in two parts: cloud video transcoding with focus on the technology and the mobile multimedia perspectives, and collaborative metadata services with focus on the user and community perspective. The perspectives prove to be useful in helping us develop and evaluate such complex mobile multimedia services.


This work is supported by the Excellence Initiative of the German National Science Foundation (DFG) within the research cluster Ultra High-Speed Mobile Information and Communication (UMIC) and in part by the NRW State within the B-IT Research School. We thank Gökhan Aksakalli and Michael Lottko for their prototype implementations.

Copyright information

© Springer Science+Business Media, LLC 2012