Designing and evaluating the usability of an API for real-time multimedia services in the Internet
In the last few years, multimedia technologies in general, and Real-Time multimedia Communications (RTC) in particular, have become mainstream among WWW and smartphone developers, who have an increasing interest in richer media capabilities for creating their applications. The engineering literature proposing novel algorithms, protocols and architectures for managing and processing multimedia information is currently overwhelming. However, most of these results do not reach applications due to the lack of simple and usable APIs. Interestingly, in this context in which APIs are the critical ingredient for reaching wide developer audiences, the scientific literature on multimedia APIs and their usability is scarce. In this paper we try to help fill this gap by proposing the RTC Media API: a novel type of API designed with the aim of making it simple for developers to use the latest trends in RTC multimedia, including WebRTC, Video Content Analysis and Augmented Reality. We provide a specification of this API and discuss how it satisfies a set of design requirements including programming-language agnosticism, adaptation to cloud environments, support for multisensory multimedia, etc. After that, we describe an implementation of the API created in the context of the Kurento open source software project, and present a study evaluating the API usability performed with a group of more than 40 professional developers distributed worldwide. In light of the obtained results, we conclude that the usability of the API is adequate across the main development activities (i.e. API learning, code creation and code maintenance), with an average usability score of 3.39 out of 5 on a Likert scale, and that this result is robust with respect to developers’ profiles, cultures, professional experience and preferred programming languages.
Keywords: Media server · Real-time multimedia communications · Application Programming Interfaces · WebRTC · Multimedia processing · Multimedia tools and applications · Cognitive dimensions of notations
Analysts such as Marc Andreessen claim that “software is eating the world”, stressing the importance of software-centered models in the economy and the transition of traditional businesses to software-based organizations. This trend is permeating all areas of IT (Information Technologies), including the multimedia industry. In the last few years, we have witnessed how multimedia technologies have been evolving toward software-centered paradigms, embracing cloud concepts through different types of XaaS (Everything as a Service) models.
More recently, another turn of the screw is taking place thanks to the emergence and popularization of APIs (Application Programming Interfaces). This is perfectly summarized by Steven Willmott with his claim that “software is eating the world and APIs are eating software”. Software developers worldwide are growing accustomed to creating their applications as a composition of capabilities exposed through different APIs. These APIs are typically accessible through SDKs (Software Development Kits) and expose, in an abstract way, all kinds of capabilities including device hardware, owned resources and remote third-party infrastructures. This model, applied to cloud concepts, is quite convenient for individual developers and small companies, which now have the opportunity to compete with large market stakeholders without huge investments of effort and without needing to acquire hardware infrastructure or software licenses. Thanks to this, in the last few years, we have been experiencing an explosion of innovation, with thousands of new applications and services for both WWW and smartphone platforms being catalyzed by the rich and wide ecosystems of APIs made available to developers.
This trend towards “APIfication” is also invading the multimedia arena and, very particularly, the RTC (Real-Time multimedia Communications) area. Initiatives such as WebRTC are bringing audiovisual RTC in a standard and universal way to WWW users. The main difference between WebRTC and other popular video-conferencing applications is that WebRTC is not a service, but a set of APIs enabling WWW developers to create customized applications using standard WWW development techniques.
WebRTC belongs to the HTML5 ecosystem and has awakened significant interest among the most important Internet and telecommunication companies. As opposed to previous proprietary WWW multimedia technologies, it has been conceived to be open in a broad sense, both by being based on open standards and by providing open source software implementations. Currently, a huge standardization effort on WebRTC protocols is taking place at different IETF working groups (WGs), the RTCWeb WG being the most remarkable one. In turn, WebRTC APIs are being defined and consolidated at the W3C WebRTC WG. WebRTC standards are still maturing and may take some time to consolidate. In spite of this, most of the major browsers in the market already support WebRTC, and it is currently available in billions of devices providing interoperable multimedia communications.
Hence, WebRTC is an opportunity for the creation of a next generation of disruptive and innovative multimedia services catalyzed worldwide through those emerging APIs. However, to reach this goal, the WebRTC ecosystem needs to evolve further. Relying only on WebRTC browser capabilities, services can provide nothing but peer-to-peer communications, which restricts use cases to simple person-to-person calls involving few users. In order to enhance this model, server-side infrastructures need to be involved. This is not new: as is well known, the traditional WWW architecture is based on a three-tier model involving an application server layer and a service layer, the latter typically reserved for databases. In the same way, rich media applications are also based on an equivalent three-tier model where the service layer provides advanced media capabilities. The media component in charge of providing such capabilities is typically called a media server in the jargon.
Group communication capabilities: These include mixing and forwarding. This type of media server is called an MCU (Multipoint Control Unit) following the H.323 terminology and usually takes the form of a media mixer or a Selective Forwarding Unit (SFU).
Media archiving capabilities: These are related to the recording of the audiovisual streams into structured or unstructured repositories and the ability to recover them later for visualization.
Media bridging capabilities: This refers to attaining interoperability among networks or domains having incompatible media formats or protocols. Transcoders and IMS (IP Multimedia Subsystem) Gateways  are among the most popular in this area.
Media servers are a critical ingredient for transforming WebRTC into the next wave of multimedia communications, and the availability of mature solutions exposing simple-to-use yet powerful APIs is a necessary requirement in that area. However, most standardization and implementation efforts are still concentrated on the client side, and server-side technologies are still quite fragmented. Although a relevant number of WebRTC media servers are available, they do not provide coherent APIs compatible with WWW development models. Developing solutions with them typically requires expertise in low-level protocols such as SIP, XMPP or MGCP, with which average WWW developers do not have any experience. In addition, most state-of-the-art WebRTC media servers provide just the three basic capabilities specified above and are extremely hard to extend with further features. However, nowadays, many RTC services involve person-to-machine and machine-to-machine communication models and require richer multimedia processing capabilities such as computer vision, augmented reality, speech analysis and synthesis, etc.
In this paper we propose an evolution of current state-of-the-art RTC media servers by presenting a new type of RTC API for media server control, which has been designed for usability. This API addresses many current state-of-the-art limitations, such as the ones described above, and is aligned with WWW development principles, architectures and methodologies. The main contributions of this paper are threefold. First, we introduce the main concepts of the above-mentioned API. Second, we present how developers may leverage it to create applications providing transparent interoperability among heterogeneous formats and protocols through a modular and extensible architecture. Third, we present an evaluation of the proposed API’s usability based on the Cognitive Dimensions of Notations (CDs), a lightweight framework created for describing and analyzing the usability of notational systems, such as user interfaces, programming languages and APIs.
The remainder of this paper is organized as follows. Section 2 summarizes some approaches to RTC media servers and APIs available in the literature. Section 3 presents the proposed RTC Media API and illustrates how to create applications with it. Section 4 describes a survey in which our API is evaluated by means of a research questionnaire following the CDs framework. The last section concludes this research with a discussion, the contributions of the study and suggestions for further work.
2 Related work
2.1 RTC media server control APIs
Media server technologies emerged in the 1990s, catalyzed by the popularization of digital video services. Initial media servers specialized in specific functions such as streaming, transcoding and RTC for audio and video conferencing. In this paper we concentrate on this latter category.
The popularization of video and audio conferencing made RTC media servers evolve through different types of standards. These include H.323, where the media server role is played by elements such as the MCU (Multipoint Control Unit), and the IMS (IP Multimedia Subsystem), where media servers are generically called MRFs (Media Resource Functions). These standards were conceived by operators and corporate communications solution vendors, who concentrated on the specificities of their infrastructures and not on the needs of developers. As a consequence, the involved media control interfaces were designed around low-level protocols rather than high-level, friendly APIs. Among such protocols we can find the IETF MGCP, which later evolved into the 3GPP H.248 recommendation. These are based on binary formats, which are hard to understand, implement, debug and extend. Probably due to this, these protocols did not have much impact outside telecommunication providers.
More recently, the commoditization of RTC media server technologies brought increasing interest in more flexible mechanisms for media control. Several IETF WGs emerged with the objective of democratizing them among common developers. As a result, further protocols such as MSCML and MSML emerged, providing the ability to control media server resources through technologies understandable and familiar to average developers, such as XML.
Although these protocols are simpler to understand and integrate, developing applications on top of them is still a cumbersome, complex and error-prone process. Due to this, many stakeholders noticed that the natural tools used by developers are not protocols but APIs and SDKs. Hence, a number of initiatives emerged trying to transform the protocol-based development methodology into an API-based development experience, providing seamless media server control through interfaces adapted to programming-language specificities rather than to infrastructure characteristics. In particular, the Java platform was one of the first to integrate this philosophy by trying to reproduce the WWW development experience and methodology for the creation of RTC media-enabled applications. A relevant activity in this area is JAIN (Java API for Integrated Networks), which issued several APIs for the signaling, control and orchestration of media capabilities. These include the JAIN SIP API, the JAIN SLEE API and the JAIN MEGACO API; the latter being specifically devoted to controlling media servers through the H.248 protocol. JAIN APIs did not permeate much beyond operators, but their ideas inspired more popular developments such as the SIP Servlet API for the signaling plane and the Media Server Control API (aka JSR 309) for the media plane, which have been more widely used for the development of RTC solutions for voice and video.
Among all these APIs, this paper is especially interested in JSR 309. JSR 309 concepts were quite revolutionary at the time because the API tried to fully abstract the low-level media server control protocols and media format details. The objective was to enable developers to concentrate on application logic. JSR 309 defined both a programming model and an object model for media server control through a northbound interface that is independent of media server control protocols and hence does not require any specific southbound protocol driver. JSR 309 does not make any kind of assumption in relation to the signaling protocol or the call flow, which are left to the application logic.
From a developer’s perspective, probably the most innovative concept of JSR 309 was the introduction of a mechanism for defining the media processing logic in terms of a topology. This mechanism is based on an interface called Joinable. In JSR 309, all objects having the ability to manipulate media (e.g. send, receive, process, archive, etc.) implement this interface, whose join method enables interconnecting such objects following arbitrary dynamic topologies. Hence, a specific media processing logic can be implemented by developers just by joining the appropriate objects. As an example, if you want to create an application mixing two RTP (Real-time Transport Protocol) streams and recording the resulting composite into a file, you just need to join the appropriate objects with the appropriate topology. Taking into consideration that in JSR 309 NetworkConnection is the class of objects capable of receiving RTP streams, that MediaMixer is the class of objects with mixing capability and that MediaGroup is the class with the ability to record, the above-mentioned media topology can be achieved just by joining two NetworkConnection instances to a MediaMixer instance, which in turn is joined to a recording MediaGroup. This approach makes it possible for developers to conceive their media processing logic as graphs of “black-box” joinables, which is a quite modular and intuitive mechanism for working in abstract terms with the complex concepts involved in RTC multimedia applications.
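The graph-building style described above can be sketched with a minimal, self-contained model of the Joinable idea. The real javax.media.mscontrol types carry far more detail (join directions, media configurations, session factories); the class names below only mirror the JSR 309 vocabulary and are not the actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Every media-capable object is a "black box" that can be joined to others.
abstract class Joinable {
    final String name;
    final List<Joinable> downstream = new ArrayList<>();
    Joinable(String name) { this.name = name; }
    void join(Joinable sink) { downstream.add(sink); }  // wire output to sink's input
}

class NetworkConnection extends Joinable {  // receives an RTP stream
    NetworkConnection(String n) { super(n); }
}
class MediaMixer extends Joinable {         // mixes all joined inputs
    MediaMixer(String n) { super(n); }
}
class MediaGroup extends Joinable {         // records its input
    MediaGroup(String n) { super(n); }
}

public class JoinableTopology {
    public static void main(String[] args) {
        // Two RTP inputs -> mixer -> recorder, as in the example above.
        NetworkConnection nc1 = new NetworkConnection("rtp-in-1");
        NetworkConnection nc2 = new NetworkConnection("rtp-in-2");
        MediaMixer mixer = new MediaMixer("mixer");
        MediaGroup recorder = new MediaGroup("recorder");
        nc1.join(mixer);
        nc2.join(mixer);
        mixer.join(recorder);
        System.out.println(mixer.downstream.get(0).name);  // prints "recorder"
    }
}
```

The application logic only wires the graph; what each node does to the media remains encapsulated behind its interface.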
Another relevant innovation of JSR 309 is the introduction of media events. Thanks to this mechanism, the media processing logic held by a media server can fire events to applications through a publish/subscribe mechanism. This is very convenient for enabling applications to become media-aware, meaning that complex processing algorithms at the media server can provide asynchronous information about things happening inside the media, for instance DTMF (Dual-Tone Multi-Frequency) tones being detected, voice activity being present, and so on.
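The publish/subscribe event mechanism can be modeled with a tiny hypothetical sketch (JSR 309 defines its own MediaEventListener types; MediaElement and its methods below are invented for illustration only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A server-side media element publishes events; applications subscribe.
class MediaElement {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    void addEventListener(Consumer<String> listener) { listeners.add(listener); }
    void fire(String event) { listeners.forEach(l -> l.accept(event)); }
}

public class MediaEvents {
    public static void main(String[] args) {
        MediaElement dtmfDetector = new MediaElement();
        // The application becomes media-aware without polling the stream.
        dtmfDetector.addEventListener(ev -> System.out.println("app got: " + ev));
        dtmfDetector.fire("DTMF tone 5 detected");  // simulates in-media detection
    }
}
```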
JSR 309 permeated into mainstream developer audiences as a suitable API for media server control following the typical three-tier model. However, in the last few years, the emergence of novel technologies and computation paradigms has exposed relevant limitations in JSR 309. For example, nowadays group videoconferencing services are evolving from media mixing models, which require relevant media processing, towards SFU (Selective Forwarding Unit) models, which are based on media routing. JSR 309 is heavily adapted to media mixing and, due to this, most of its APIs assume that participants send/receive only one media stream to/from the media server. As a consequence, SFU models do not fit nicely into JSR 309 APIs. This is particularly a problem when all the streams of a group videoconference are multiplexed into a single RTP session, as typically happens on modern WebRTC SFU media servers supporting bundled RTP, because JSR 309 APIs do not provide any mechanism for demultiplexing streams from a NetworkConnection. Moreover, the JSR 309 API specification explicitly forbids several input NetworkConnections from being joined to a single output NetworkConnection, as an SFU router would require. Instead, they need to be joined first to a MediaMixer, which, in turn, can be joined to the output NetworkConnection.
When looking at other modern RTC technologies, we notice again that the JSR 309 design has limitations. For example, if we consider the WebRTC W3C APIs, we see that they split endpoint capabilities into different functional blocks, each of which is exposed through an abstract interface (e.g. RtpSender, RtpReceiver, PeerConnection, etc.). However, if we want to expose WebRTC media server capabilities through JSR 309, we need to accept that endpoints can only be represented through the NetworkConnection interface, which is too limited to support rich WebRTC capabilities such as DataChannels, Trickle ICE, simulcast, etc.
JSR 309 also shows drawbacks in relation to its extensibility. In JSR 309 it is possible to support new media object types using MediaGroups; however, the configuration of these new types has to be done through media-server-specific descriptions encoded as strings, which cannot be validated by the compiler. It is important to note that these new media object types cannot be NetworkConnections, only MediaGroups. This is a hard limitation because no network protocol other than RTP (negotiated through SDP) can be incorporated. The ideal would be to allow the creation of new object types in the same way as the core types, with factory methods in MediaSession (e.g. createNetworkConnection, createMediaGroup, etc.), but this is not possible because MediaSession is an interface defined in the JSR 309 API and hence cannot be modified by the API user.
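The difference between string-based configuration and the typed factory style argued for here can be sketched as follows. WebSocketEndpoint and createWebSocketEndpoint are invented names used purely for illustration; they do not exist in JSR 309 or any real API.

```java
public class ExtensibilitySketch {
    // JSR 309 style: new behavior is configured via an opaque vendor string
    // that the compiler cannot validate; typos surface only at runtime.
    static String createMediaGroup(String vendorSpecificConfig) {
        return vendorSpecificConfig;
    }

    // Desired style: one typed factory method per new media object type,
    // mirroring how core types such as NetworkConnection are created.
    static final class WebSocketEndpoint {
        final int port;
        WebSocketEndpoint(int port) { this.port = port; }
    }
    static WebSocketEndpoint createWebSocketEndpoint(int port) {
        return new WebSocketEndpoint(port);  // port is checked at compile time
    }

    public static void main(String[] args) {
        System.out.println(createMediaGroup("vendor:recorder;codec=opus"));
        System.out.println(createWebSocketEndpoint(8443).port);
    }
}
```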
A counter-intuitive asynchronous development model based on an obscure joinInitiate primitive, which is incompatible with modern Java mechanisms for managing asynchrony such as futures, continuations or lambdas. This lack of a clean asynchronous programming model makes JSR 309 difficult to adapt to reactive programming frameworks and languages that are in high demand among developers today, such as Node.js or Scala.
A complete lack of mechanisms for monitoring and gathering quality stats on media sessions. This is an essential ingredient for production systems.
JSR 309 is designed specifically for the Java language. A portable API that can be used in as many languages as possible would be desirable.
This API is specifically designed to control media servers for phone communications because it exposes concepts like Dialogs (prompt and record, DTMF, VoiceXML dialogs, etc.). For example, it is mandatory for an implementation to provide a player with the capability to detect DTMF audio signals, but this kind of functionality is not very useful in web applications.
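The asynchronous-model limitation listed above can be illustrated by contrasting the joinInitiate style with a future-based one. joinAsync below is a hypothetical method, not part of JSR 309 or any real API; it only sketches the kind of composable asynchrony (futures plus lambdas) that the text argues modern developers expect.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncJoinSketch {
    // Hypothetical joinAsync(): returns a CompletableFuture instead of
    // requiring a joinInitiate(...) call plus a separate status-event callback.
    static CompletableFuture<String> joinAsync(String source, String sink) {
        return CompletableFuture.supplyAsync(() -> source + " -> " + sink);
    }

    public static void main(String[] args) throws Exception {
        // Futures compose naturally with lambdas and map directly onto
        // promise-based styles in reactive platforms such as Node.js.
        joinAsync("networkConnection", "mixer")
                .thenAccept(link -> System.out.println("joined: " + link))
                .get();  // wait for completion (demo only)
    }
}
```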
2.2 Foundations of API evaluation and characterization
APIs are critical, non-optional and cross-cutting in the construction of modern software systems. Programming is hard mental work and developers need to deal with large amounts of information to write satisfactory code. In that duty, APIs are the most critical ingredient, especially when dealing with distributed systems and enterprise frameworks. For example, recent works show that API misuse is the single most prevalent cause of software defects.
Designing APIs consists of conceiving abstractions through types and interfaces so that they can be consumed seamlessly, efficiently and safely by application developers. This is quite a complex topic about which very little is known and which requires interdisciplinary knowledge combining cognitive psychology and software engineering. However, the responsibility for API design is typically assigned to development team members who often do not have expertise or training in this area and who, typically, are more concerned with implementation details than with usability.
In spite of the well-known importance of APIs, API design and evaluation has not been a mainstream research topic, and only recently has some light been shed on this area. Early attempts at investigating APIs typically followed unstructured and ad-hoc approaches concentrating on the specificities of given technologies. For example, works have been published with guidelines and recommendations for API design in C#, Java or C++ and for ad-hoc evaluation of new programming languages.
From another perspective, some authors concentrated on specific problems transversal to all APIs, independently of their underlying technologies. Some remarkable efforts in this area made it possible to understand that, for instance, the factory pattern tends to generate usability problems and that there is a systematic set of questions that developers have when learning new APIs. All these efforts are relevant due to the talent of their authors in detecting and isolating common patterns and practices, but they do not make it possible to build a consistent and reusable methodology for the area.
During the last decades, further authors have tried to systematize the problem of API design and usability evaluation from a holistic perspective. Different approaches have been created for this [2, 12, 21, 28]. However, the one that has gained the highest popularity in this area is the Cognitive Dimensions of Notations (CDs) framework [13, 36]. CDs is a framework for describing the usability of notational systems. In this context, a notational system typically consists of a collection of symbols made on some medium which define a behavior (i.e. meaning) through some kind of structured interactions. Examples of notational systems include English text on paper, buttons on a WWW GUI or programming with API calls in an IDE. CDs allow designers of notational systems to evaluate their designs with respect to the impact they have on the users of those designs.
The CDs framework is not an analytic method. Rather, it is a set of discussion tools for use by designers and people evaluating designs whose main aim is to improve the quality of discussion. CDs emerged because, at the end of the day, API design is more of an engineering craft than a scientific discipline. It is subject to elements of affect, of fashion and of social acceptance, in addition to technical considerations. For these reasons, we can learn from studies of other design disciplines where the same craft elements apply. For example, a study comparing knitwear designers and helicopter designers  observed that designers’ communities develop their own vocabulary for design criteria that is created through practice and tradition. The CDs framework aims to provide the same kind of vocabulary for API designers.
As a result, the main objective of CDs is to enable API designers to reason consistently about how well an API supports the intended activities of its users. Simply stated, CDs make it possible to discuss in a coherent way the extent to which an API supports application developers when performing typical activities such as API learning and understanding, application design and creation, application maintenance and evolution, etc. For this, the framework considers a set of dimensions, each of which describes an aspect of API usability. These dimensions constitute a vocabulary of terms that can be used to characterize cognitive artifacts and which makes it possible to establish comparisons and to discuss and investigate the implications of design decisions on those artifacts. It is important to remark that these dimensions are not good or bad in themselves; they simply describe properties of the system with respect to developers’ activities.
Viscosity: resistance to change.
A viscous system needs many user actions to accomplish one goal. Changing all headings to upper-case may need one action per heading. (Environments containing suitable abstractions can reduce viscosity.) We distinguish repetition viscosity, many actions of the same type, from knock-on viscosity, where further actions are required to restore consistency.
Visibility: ability to view components easily.
Systems that bury information in encapsulations reduce visibility. Since examples are important for problem-solving, such systems are to be deprecated for exploratory activities; likewise, if consistency of transcription is to be maintained, high visibility may be needed.
Premature commitment: constraints on the order of doing things.
Self-explanatory. Examples: being forced to declare identifiers too soon; choosing a search path down a decision tree; having to select your cutlery before you choose your food.
Hidden dependencies: important links between entities are not visible.
If one entity cites another entity, which in turn cites a third, changing the value of the third entity may have unexpected repercussions. Examples: cells of spreadsheets; style definitions in Word; complex class hierarchies; HTML links. There are sometimes actions that cause dependencies to get frozen, e.g. soft figure numbering can be frozen when changing platforms; these interactions with changes over time are still problematic in the framework.
Role-expressiveness: the purpose of an entity is readily inferred.
Role-expressive notations make it easy to discover why the author has built the structure in a particular way; in other notations each entity looks much the same and discovering their relationships is difficult. Assessing role-expressiveness requires a reasonable conjecture about cognitive representations.
Error-proneness: the notation invites mistakes and the system gives little protection.
Enough is known about the cognitive psychology of slips and errors to predict that certain notations will invite them. Prevention mechanisms (e.g. check digits, declarations of identifiers, etc.) can redeem the problem.
Abstraction: types and availability of abstraction mechanisms.
Abstractions (redefinitions) change the underlying notation. Macros, data structures, global find-and-replace commands, quick-dial telephone codes, and word-processor styles are all abstractions. Some are persistent, some are transient. Abstractions, if the user is allowed to modify them, always require an abstraction manager (i.e. a redefinition sub-device). It will sometimes have its own notation and environment (e.g. the Word style sheet manager) but not always (for example, a class hierarchy can be built in a conventional text editor). Systems that allow many abstractions are potentially difficult to learn.
Closeness of mapping: closeness of representation to domain.
How closely related is the notation to the result it is describing?
Consistency: similar semantics are expressed in similar syntactic forms.
Users often infer the structure of information artifacts from patterns in notation. If similar information is obscured by presenting it in different ways, usability is compromised.
Diffuseness: verbosity of language.
Some notations can be annoyingly long-winded, or occupy too much valuable “real-estate” within a display area. Big icons and long words reduce the available working area.
Hard mental operations: high demand on cognitive resources.
A notation can make things complex or difficult to work out in your head, by making inordinate demands on working memory, or requiring deeply nested goal structures.
Provisionality: degree of commitment to actions or marks.
Even if there are hard constraints on the order of doing things (premature commitment), it can be useful to make provisional actions such as recording potential design options, sketching, or playing “what-if” games. Not all notational systems allow users to fool around or make sketchy markings.
Progressive evaluation: work-to-date can be checked at any time.
Evaluation is an important part of a design process, and notational systems can facilitate evaluation by allowing users to stop in the middle to check work so far, find out how much progress has been made, or check what stage in the work they are up to. A major advantage of interpreted programming environments such as BASIC is that users can try out partially-completed versions of the product program, perhaps leaving type information or declarations incomplete.
The dimensions are vaguely defined, often leading to misinterpretation when applying them.
The theoretical and empirical foundations of the dimensions are poorly defined.
The dimensions lack clear operationalization (i.e. evaluation procedures and metrics), which means they can only be applied in a subjective manner.
It does not support evaluation, as the dimensions simply define properties of notations and are not meant to be either “good” or “bad”.
It does not support design: the dimensions are not design guidelines and issues of effectiveness are excluded from its scope.
Its level of generality precludes specific predictions meaning that it is unfalsifiable and, hence, it cannot be considered to provide a scientific basis for evaluating anything.
They offer a comprehensive, broad-brush evaluation mechanism which does not suffer the ‘death by details’ symptom of other techniques.
They offer a set of discussion tools and a common vocabulary helpful for evaluating designs.
They are based on terms that are comprehensible by non-specialists.
They are directly applicable, without requiring customizations or reinterpretations, to all types of notations including APIs.
Although they are not theoretically complete, they are theoretically coherent, which makes it possible for analysts to generate consistent analyses.
They describe a set of necessary, though not sufficient, conditions for usability, which enable deriving usability predictions from the structural properties of a notation, the properties and resources of an environment and the type of activity.
2.3 Quantitative evaluation of API usability
CDs are used by designers for performing quantitative evaluations of API usability. The common practice for this is to use a questionnaire requesting users to evaluate, through a Likert scale, how they experience the CDs dimensions when performing their development activities. There is a broad literature illustrating how to create such reliable questionnaires. When questionnaires target unsupervised and open audiences through the WWW, as is the case in this paper, a critical aspect for attaining reasonable answer rates and acceptable accuracy is simplicity. Without a full and complete understanding of the questions, developers under evaluation might not be willing to provide any information on the API usability at all, or might give incomplete or mistaken answers.
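This kind of quantitative aggregation reduces, at its core, to averaging the Likert ratings collected per dimension. The sketch below uses hypothetical ratings from four respondents (the real study gathered responses from over 40 developers) for two of the dimensions discussed later.

```java
import java.util.Locale;

public class LikertSummary {
    // Mean of 1-5 Likert ratings for one usability dimension.
    static double mean(int[] scores) {
        int sum = 0;
        for (int s : scores) sum += s;
        return (double) sum / scores.length;
    }

    public static void main(String[] args) {
        // Hypothetical ratings; not data from the actual survey.
        int[] understandability = {4, 3, 4, 5};
        int[] abstraction = {3, 3, 4, 3};
        System.out.println(String.format(Locale.ROOT,
                "Understandability: %.2f", mean(understandability)));
        System.out.println(String.format(Locale.ROOT,
                "Abstraction: %.2f", mean(abstraction)));
    }
}
```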
As stated above, the CDs framework defines dimensions as a vocabulary that can be used by designers when investigating the cognitive implications of their design decisions, so that designers might be able to express any properties of their information artifacts as a composition of these basic dimensions. As an analogy, this is somewhat similar to the way vector spaces work: any vector in the space can be expressed as a composition of the base vectors. From this perspective, the base CDs dimensions are designed for independence (i.e. they do not overlap) and not for clarity and simplicity. As a result, questionnaires addressing the complete set of CDs dimensions in the context of all common development activities are too complex, long and impractical for our objectives. Using them might decrease the willingness of the target population to provide answers as well as the overall usefulness of the resulting research.
This table shows the relation of Clarke’s dimensions of API usability to CDs dimensions and illustrates the meaning of each of these dimensions for developers. As can be seen, Clarke’s dimensions are, in all cases, more intuitive and simpler to understand than the original CDs dimensions.
Clarke's dimension   Main related CDs dimensions                                      Meaning for developers
Understandability    Closeness of mapping; hidden dependencies; hard mental           The higher, the simpler the process of learning and
                     operations                                                       understanding the API is
Abstraction          Abstraction                                                      The higher, the less the low-level details developers need
                                                                                      to manage for creating their applications
Expressiveness       Role-expressiveness; visibility; consistency; premature          The higher, the simpler the process of translating
                     commitment; provisionality; error-proneness                      application requirements into code is
Reusability          Viscosity; diffuseness                                           The higher, the simpler the maintenance and evolution of
                                                                                      applications are
Learnability         Progressive evaluation                                           The higher, the more incremental and progressive the API
                                                                                      learning process is
Understandability deals with evaluating the effort required for understanding how to use the API for achieving a desired functionality. This dimension encompasses aspects such as whether the API names are descriptive and the relations among API types and constructs are clear and unambiguous. This relates to the base CDs dimension called closeness of mapping. It also includes the ability of the API to spare developers from managing hidden information not explicitly represented in the API, which is called hidden dependencies in terms of the base CDs dimensions. In addition, the base CDs dimension called hard mental operations also affects understandability. In brief, this dimension addresses how simple it is to access API features through object creation, primitive invocations or other means.
Abstraction, which is itself a base CDs dimension, relates to the ability of the API to guarantee that programmers can use the API proficiently without requiring specific knowledge or assumptions about its implementation details. Abstractions should match the conventions and practices of programmers, without being elegantly abstract at the expense of understandability or other practical concerns. Abstraction is typically correlated with the degree of comfort developers feel when using the API. Summarizing with a slogan, this dimension asks whether the API “makes simple things simple, and complex things possible”.
Expressiveness can be seen as the ability to readily infer the purpose of an entity. This is related to the base CDs dimension called role-expressiveness. Expressiveness is also related to how easy it is for the programmer to build her code without needing to assume any specific cognitive model about API use. Intuitively, code written using expressive APIs tends to be simpler to read, and transforming requirements into code is typically more efficient with expressive APIs. In terms of base CDs dimensions, these properties are related to visibility and consistency. Moreover, expressive APIs impose constraints neither on the order of creation nor on the definiteness of the components comprising the code, which is related to the CDs dimensions called premature commitment and provisionality. We also consider the base CDs dimension called error-proneness to be part of the expressiveness properties of our API.
Reusability determines whether the client code is maintainable and extensible. In particular, this dimension addresses the typical concern of how hard it is to modify pre-existing code and adapt it to slightly different, extended or more general requirements. The main related base CDs dimension is viscosity, understood as resistance to change, but it also involves other base dimensions such as diffuseness (i.e. the verbosity of the notation).
Learnability addresses the ability of the API learning process to be incremental. Learnable APIs enable developers to understand APIs gradually, without requiring disproportionate initial efforts, which is related to the base CDs dimension called progressive evaluation. Learnability also deals with whether performing a certain programming task using the API has a positive impact on performing other related but different tasks. This dimension might have some overlap with understandability, but it specifically emphasizes the learning process rather than its practical outcomes.
2.4 Contributions of this paper: the RTC Media API requirements
Seamless API extensibility through custom modules
Adaptation to WWW technologies and methodologies
This requirement has two aspects. The first, and most important, is the need for our API to be adapted to novel RTC WWW technologies and, very particularly, to WebRTC. The WebRTC architecture, based on heavy use of RTP bundle and RTCP demultiplexing mechanisms and requiring complex ICE management techniques such as Trickle ICE, makes it complex to comply with this requirement. Also, as specified in the sections above, JSR 309 is not compatible with this, as its NetworkConnection is based on plain RTP. The second is the need for the API to adapt to the typical WWW three-tier development model. This means that the RTC Media API should be usable by WWW developers with their common development, deployment and debugging techniques and tools. To some extent, this means that the RTC Media API should be perceived by WWW developers as any other of the APIs consumed in the application logic, such as database APIs or ESB (Enterprise Service Bus) APIs.
Full abstraction of media details (i.e. codecs and protocols)
Media representation and transport technologies are complex and require specialized knowledge that is not typically available to common developers. For maximizing productivity and minimizing development and debugging complexity, the RTC Media API should hide all the low-level details of such technologies through the appropriate abstractions. In doing so, these abstractions must maintain the appropriate expressiveness, enabling the API semantics to give developers the ability to perform the required operations on protocols and formats, including payloading, depayloading, decoding, encoding, re-scaling, etc.
Programming language agnostic
RTC media topology agnostic
One of the main objectives of RTC media servers is to provide group communication capabilities to applications. Due to this, any useful RTC media API must consider this as a central aspect of its design by exposing the appropriate constructs for group communications. When looking at how RTC group communications are technically implemented, we can notice that they are based on a set of well-known RTP interconnection topologies, among which the most common ones are Media Mixing Mixers (MMM), Media Switching Mixers (MSM) and Selective Forwarding Units (SFU). In short, MMMs are based on the principle of composing a single output media stream out of N input media streams, so that the final composite stream represents the addition of the N input streams. MMMs require decoding the N input streams, generating the composite (e.g. linear addition for audio or a matrix layout for video) and encoding the output stream. Due to the performance cost of these operations, MMMs do not scale well. On the other hand, MSMs and SFUs do not perform any heavyweight processing; they just forward and route N incoming streams to M outgoing streams, which is why they have better scalability properties. Their only difference is that MSMs enable the N-to-M mapping to change dynamically, while on SFUs it is static and the only possible operation is switching forwarding on or off on any of the M output streams.
Understanding the differences and appropriate usage scenarios of these topologies is complex and a source of extra complexity for application developers. Due to this, we include a requirement for our RTC Media API to transparently manage all the subtleties of this problem, so that the most appropriate solution is provided transparently by the API. Remark that JSR 309 also tried to comply with this requirement through the “Joinable” mechanism, making it possible for developers to establish topologies just by joining sources with sinks. However, as explained above, both JSR 309 and, equivalently, JSR 79 are only compatible with MMM topologies and cannot manage the MSM or SFU models, which are, incidentally, the most popular ones.
Advanced media QoS information gathering
QoS is critical in multimedia services. A few milliseconds of latency or jitter can be the difference between successful and unsuccessful applications. For this reason, RTC media developers need to have the appropriate instrumentation mechanisms enabling seamless debugging, monitoring and optimization of applications. This requirement guarantees that RTC Media API developers are able to access advanced QoS metrics of the streams, including relevant information such as packet loss, bandwidth, latency or jitter. Remark that none of the above-mentioned RTC media server APIs, including JSR 309, provide this kind of capability.
Compatibility with advanced media processing capabilities
So far, most RTC media technologies and APIs have concentrated on the problem of transport (i.e. taking media information in one place and moving it to other places). This happened because the most prevalent use case for RTC is person-to-person communications, where end-users expect technology to eliminate distance barriers (i.e. to maintain a conversation as if it were face-to-face). However, during the last decade, novel use cases involving person-to-machine and machine-to-machine communications have been gaining popularity in different verticals such as video surveillance, smart cities, smart environments, etc. In all these verticals, going beyond plain transport is a relevant requirement. As an example, the number of low-latency RTC video applications being used in security scenarios is skyrocketing. In all these applications, the ability to integrate Video Content Analysis (VCA) capabilities through different types of computer vision algorithms is an unavoidable requirement. In addition, modern media applications in areas such as gaming or entertainment complement VCA with another trending technology: Augmented Reality (AR), which is also in high demand from users. As a result, we require our RTC Media API to provide full compatibility with these advanced processing techniques, enabling their seamless integration and use.
In RTC media services, as in other types of services, context is becoming a relevant ingredient for providing added value to applications. Context is a somewhat ambiguous concept for which there is not yet a formal definition. However, most authors accept context as any kind of information that can be used for characterizing the situation of an entity. The OMA (Open Mobile Alliance) has generated a formal definition of context through the NGSI standard as a set of attributes that can be associated with an entity. When working with RTC media, the entity is most typically an RTC media session (e.g. a media call).
Considering this context definition, this requirement means that our RTC Media API needs to be capable of consuming context for customizing and adapting the end-user experience but, most importantly, it needs to be capable of extracting context attributes from the media communication itself. In other words, the part of the context dealing with the media itself (i.e. what the media content is and what it represents at any time) needs to be manageable by the proposed API.
Adapted to multisensory multimedia
Traditionally, RTC media has referred to simple audiovisual streams typically comprising one video track and one or two (i.e. stereo) audio tracks. However, modern trends and technologies extend this to a new multisensory notion, where multisensory streams may comprise several audio and video tracks (e.g. multi-view and 3D video) but may also enable the integration of additional sensor information beyond cameras and microphones (e.g. thermometers, accelerometers, etc.). Hence, we establish a requirement for our RTC Media API to be capable of managing such multisensory multimedia in a seamless and natural way.
Adaptation to cloud media servers
Cloud computing is permeating all IT domains, including multimedia, as the de-facto standard for system deployment and management. This trend is also permeating the RTC media server arena, which is why we need to consider it in the definition of our API. Adapting the RTC Media API to cloud environments basically means making it compatible with how a PaaS (Platform as a Service) media server works. In other words, our API needs to be compatible with a new notion of distributed media server which, in opposition to traditional monolithic media servers, is distributed throughout a cloud environment and can elastically scale to adapt to the load generated by end-users.
3 Description of the proposed API: the RTC Media API
3.1 API specification
3.1.1 MediaObjects: MediaElements and MediaPipelines
Before providing a formal description of the RTC Media API, which might be too dense as a starting point, let us introduce some simple initial concepts that might be helpful for understanding the basic mechanisms and philosophy behind our API. The RTC Media API is built on top of an object-oriented model where the root of the inheritance hierarchy is the MediaObject. The MediaObject is only a holder providing utility members (it is abstract and cannot be instantiated). The two main types inheriting from MediaObject are MediaElement and MediaPipeline.
The MediaElement is the main abstraction of the RTC Media API. Intuitively, a MediaElement can be seen as a black box implementing a specific media capability. In general, MediaElements receive media streams through sinks, send media streams through sources and, in the middle, do “something” with the media. There are two main subclasses of MediaElements: Endpoints and Filters. An Endpoint is always a MediaElement with the ability of communicating media with the external world. All media streams coming into an Endpoint sink are sent out of the MediaElement through some kind of external interface (e.g. network interface, file system interface, etc.). In the same way, all media streams received from the external interface are published and made available to other MediaElements through the Endpoint source. Filters, on the other hand, do not communicate media streams with the external world. Their only function is to implement some kind of media processing. This can be simple transport (e.g. a pass-through filter) or may involve complex processing algorithms including computer vision or augmented reality.
MediaElements can be connected to each other by means of a connect primitive. When a MediaElement (let’s call it A) is connected to another MediaElement (say B), the media streams available at A’s source are fed to B’s sink. The connectivity of MediaElements follows quite intuitive and natural rules. First, a MediaElement source can be connected to as many MediaElement sinks as desired (i.e. a MediaElement can provide media to many MediaElements). Second, a MediaElement sink can only receive media from a single connected source. Hence, connecting a source to a sink that is already connected makes that sink first disconnect from its previous source before being connected to the new one. In this way, application developers create their media processing logic just by connecting media elements following the desired topology.
Another interesting feature of MediaElements is that the connect primitive is overloaded to provide the ability of connecting just one of the tracks available on a media stream. The RTC Media API distinguishes three types of tracks: AUDIO, VIDEO and DATA. The former two correspond to the typical audiovisual components of a stream. The latter represents arbitrary sensor data whose semantics are application-dependent. The DATA component makes it possible to integrate any kind of sensor data into media applications.
RtpEndpoint: it represents an Endpoint having the capability of sending and receiving media streams based on standards such as the RTP protocol , the AVP and AVPF RTP profiles [54, 67], and the SDP media session negotiation mechanisms .
WebRtcEndpoint: it represents an Endpoint having the capability of sending and receiving WebRTC streams complying with the appropriate standards and drafts .
PlayerEndpoint: it represents an Endpoint with the ability of reading streams from different sources, such as a file system, an HTTP resource or RTSP .
RecorderEndpoint: it represents an Endpoint with the ability of storing media out of the pipeline, typically on the media server file system or in a media repository through HTTP.
FaceOverlayFilter: it consists of a Filter using the Haar computer vision algorithm for detecting faces in a stream and overlaying on top of them images with customized scales and offsets.
MediaPipelines, in turn, are just containers of MediaElement graphs. A MediaPipeline holds MediaElements that can connect among each other following an arbitrary and dynamic topology. MediaElements owned by one MediaPipeline cannot connect to MediaElements owned by another MediaPipeline. Hence, the MediaPipeline represents an isolated multimedia session from the perspective of the application.
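To make these rules concrete, the following self-contained sketch models in memory the connection semantics described above (fan-out from sources, single-source sinks with automatic disconnection, track-level connects and MediaPipeline isolation). This is not the actual Kurento Client API; class and method names merely mirror the concepts in the text.

```java
import java.util.*;

// Toy in-memory model of MediaElement connectivity; illustrative only.
enum MediaType { AUDIO, VIDEO, DATA }

class MediaPipeline { }

class MediaElement {
    final MediaPipeline pipeline;
    final String name;
    // source currently feeding this element's sink, per track type
    final Map<MediaType, MediaElement> sourceOf = new EnumMap<>(MediaType.class);
    // sinks currently fed by this element's source
    final Set<MediaElement> sinks = new LinkedHashSet<>();

    MediaElement(MediaPipeline pipeline, String name) {
        this.pipeline = pipeline;
        this.name = name;
    }

    // Full connect: feeds every track type to the given sink.
    void connect(MediaElement sink) {
        for (MediaType track : MediaType.values()) connect(sink, track);
    }

    // Overloaded connect: feeds a single track. Elements of different
    // pipelines cannot be connected, and a sink already fed by another
    // source is first disconnected from it.
    void connect(MediaElement sink, MediaType track) {
        if (sink.pipeline != this.pipeline)
            throw new IllegalArgumentException("MediaElements belong to different MediaPipelines");
        MediaElement previous = sink.sourceOf.put(track, this);
        if (previous != null && previous != this && !sink.sourceOf.containsValue(previous))
            previous.sinks.remove(sink);
        this.sinks.add(sink);
    }
}

public class ConnectDemo {
    public static void main(String[] args) {
        MediaPipeline pipeline = new MediaPipeline();
        MediaElement a = new MediaElement(pipeline, "A");
        MediaElement b = new MediaElement(pipeline, "B");
        MediaElement c = new MediaElement(pipeline, "C");
        MediaElement d = new MediaElement(pipeline, "D");

        a.connect(b);                   // A feeds B (all tracks)
        a.connect(c);                   // fan-out: A also feeds C
        d.connect(b);                   // B switches its source from A to D
        c.connect(d, MediaType.AUDIO);  // track-level connect: only AUDIO

        System.out.println(a.sinks.size());                        // 1 (only C remains)
        System.out.println(b.sourceOf.get(MediaType.VIDEO).name);  // D
    }
}
```

In the real API, of course, connect triggers actual media flow in the server; the sketch only captures the topology bookkeeping.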
Code snippet for developing the application specified in Fig. 1 in Java with the Kurento Client API, our reference implementation of the RTC Media API. Media from each WebRtcEndpoint is recorded in the file system of the media server (files videoUserA.webm and videoUserB.webm respectively)
3.1.2 RTC Media API IDL specification
One of the main requirements of the RTC Media API is that it should be available in different programming languages. Due to this, RTC Media API capabilities are specified through an IDL (Interface Definition Language) which is language agnostic. From an implementation perspective that IDL is compiled later to different programming languages in order to generate the appropriate SDKs. In this way, RTC Media API capabilities are defined only once but the corresponding implementations can be generated for a variety of languages.
The remoteClasses section is used to define the interface of media server objects. We call these classes “remote” because the corresponding objects are remote from the perspective of the API consumer, as they are hosted in the RTC media server. For example, PlayerEndpoint and ImageOverlayFilter are defined in this section in their corresponding IDL files.
The complexTypes section is used to define enumerated types and registers used by remote classes or events. For example, the enumerated type MediaType with possible values AUDIO, DATA or VIDEO may be defined in this section.
The events section is used to define the events that can be fired when using RTC Media API. For example, EndOfStream may be defined in the events section of the IDL file describing a PlayerEndpoint, so that the event is fired when the end of the stream is reached by the player.
The code section is used to define properties to control the code generation phase for different programming languages. For example, in this section we can specify the package name in which all artifacts are generated for the Java language.
Example of an RTC Media API IDL file defining a PlayerEndpoint media element capability
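To make the four-section structure concrete, a minimal JSON IDL sketch for a PlayerEndpoint could look roughly as follows. The exact field names are assumptions for illustration and may differ from the actual IDL schema used by implementations:

```json
{
  "code": {
    "api": { "java": { "packageName": "org.example.media" } }
  },
  "remoteClasses": [{
    "name": "PlayerEndpoint",
    "extends": "Endpoint",
    "constructor": {
      "params": [
        { "name": "mediaPipeline", "type": "MediaPipeline" },
        { "name": "uri", "type": "String" },
        { "name": "useEncodedMedia", "type": "boolean", "optional": true }
      ]
    },
    "methods": [{ "name": "play", "params": [] }],
    "events": ["EndOfStream"]
  }],
  "complexTypes": [{
    "name": "MediaType",
    "typeFormat": "ENUM",
    "values": ["AUDIO", "DATA", "VIDEO"]
  }],
  "events": [{
    "name": "EndOfStream"
  }]
}
```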
Extends: A remote class may extend another remote class. In this case, all properties, methods and events of the superclass are available in objects of the subclass. Note that constructors of the superclass are not inherited. That is, they cannot be used to create objects of the subclass.
Constructor: A remote class constructor is defined with a parameter list. Every parameter has a name and a type. The available types are: primitive types (String, boolean, float, double, int and int64), remote classes or complex types. Parameters can be defined as optional.
Properties: A property is a value associated with a name. To define a remote class property it is necessary to specify its name and type. Properties can be defined as “read only”.
Methods: Methods are named procedures that can be invoked with or without parameters. Every parameter is specified by its name and type. Parameters can be defined as optional. A return type can be specified if the method returns a value.
Events: If a remote class declares an event, it means that events of this type can be fired by objects of this remote class. How these events are processed depends on the target programming language.
To define an event, it is mandatory to assign it a name. In addition, an event can have properties. Every property must be defined with a name and a type. In the same way as remote classes, events can also extend a parent event type, inheriting all its properties.
Regarding complex types, they can have two formats: enumerated or register. If a property or parameter is defined with an enumerated complex type, it can only hold a value from the list of specified values. For example, properties based on the enumerated complex type MediaType of Table 3 must have the value AUDIO, DATA or VIDEO. On the other hand, register complex types can hold objects with several properties. For example, the register complex type Fraction has two int properties: numerator and denominator.
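The two complex-type formats map naturally onto a Java enum and a plain bean. The following self-contained sketch illustrates this mapping with the MediaType and Fraction types mentioned above (the class shapes follow the codegen rules described in the next section, but are written by hand here for illustration):

```java
public class ComplexTypesDemo {
    // Enumerated complex type: a property of this type can only hold
    // one of the listed values.
    enum MediaType { AUDIO, DATA, VIDEO }

    // Register complex type: an object with several typed properties,
    // mapped to a basic Java bean by the code generator.
    static class Fraction {
        private final int numerator;
        private final int denominator;

        Fraction(int numerator, int denominator) {
            this.numerator = numerator;
            this.denominator = denominator;
        }
        int getNumerator()   { return numerator; }
        int getDenominator() { return denominator; }
    }

    public static void main(String[] args) {
        MediaType track = MediaType.valueOf("AUDIO");
        Fraction frameRate = new Fraction(30, 1);
        System.out.println(track + " " + frameRate.getNumerator()
                + "/" + frameRate.getDenominator()); // AUDIO 30/1
    }
}
```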
3.1.3 Compiling the RTC Media API IDL
Package: all artifacts (i.e. classes, interfaces and enums) are generated in the package specified in the code.api.java.packageName section of the JSON IDL file.
- Remote classes: For every remote class there are two generated artifacts: an interface and a builder class:
Interface: For every remote class a Java interface is generated. This interface has the remote class methods defined in the IDL. In addition, for every property, a getter method is also included. The name of the method is the string “get” followed by the property name. If the property is not read only, a setter method is also generated following the same approach. Finally, for every event declared in the remote class, a method to subscribe listeners to it is generated. For example, the PlayerEndpoint has the event EndOfStream declared in the IDL, so the method String addEndOfStreamListener(Listener<EndOfStream> listener) is generated. The complementary method to remove the subscription is also generated. Listener<E> is a generic interface with only one method: onEvent(E event).
- Builder class: We use the builder pattern to create new remote class instances. A Builder is generated for each remote class. All mandatory parameters of the remote class constructor are mapped to parameters of the single constructor of the builder class. In this way, the compiler enforces that all mandatory parameters have a value. Optional constructor parameters are generated in the builder class as fluent setter methods (prefixed with “with” instead of “set”, or not prefixed if the method starts with “use”). The builder class is generated as an internal type of the above-mentioned interface to easily associate the class and the interface. The code snippet in Table 4 shows the creation of a PlayerEndpoint with the optional constructor parameter useEncodedMedia set to true.
Code snippet showing how to instantiate a PlayerEndpoint in Java
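As a self-contained sketch of this pattern, the following code shows the shape such a generated interface and its inner Builder might take. The names mirror the PlayerEndpoint example above, but the bodies are hand-written stand-ins, not the actual generated code:

```java
public class BuilderDemo {
    // Hypothetical shape of a generated remote-class interface with its
    // inner Builder: mandatory constructor params go in the Builder
    // constructor, optional ones become fluent setters.
    interface PlayerEndpoint {
        String getUri();
        boolean isUseEncodedMedia();

        class Builder {
            private final String uri;          // mandatory constructor parameter
            private boolean useEncodedMedia;   // optional parameter

            Builder(String uri) { this.uri = uri; }

            // Not prefixed with "with" because the name starts with "use".
            Builder useEncodedMedia() { this.useEncodedMedia = true; return this; }

            PlayerEndpoint build() {
                final String u = uri;
                final boolean enc = useEncodedMedia;
                return new PlayerEndpoint() {
                    public boolean isUseEncodedMedia() { return enc; }
                    public String getUri() { return u; }
                };
            }
        }
    }

    public static void main(String[] args) {
        PlayerEndpoint player = new PlayerEndpoint.Builder("file:///video.mp4")
                .useEncodedMedia()
                .build();
        System.out.println(player.getUri() + " " + player.isUseEncodedMedia());
    }
}
```

Note how forgetting the mandatory uri parameter is a compile-time error, while the optional parameter can simply be omitted.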
- Complex types: Depending on the complex type format (enum or register) the code generation is different:
Enumerated complex type: A Java enum class is generated.
- Register complex type: A basic Java bean class is created. For every property, getter and setter methods are generated. In addition, a constructor with all properties as parameters is also generated. The code snippet in Table 5 shows sample code using a register (WindowParam) as a constructor parameter of a PointerDetectorFilter remote class.
Example illustrating how to instantiate a register complex type (WindowParam) as a Java bean
- Events: For each event defined in an RTC Media API IDL file, a new Java class is generated, with “Event” appended to the name of the class. This class is very similar to the classes generated for register complex types. That is, a getter and a setter method are included for each property. In addition, all event classes extend the RaiseBaseEvent base class. This base class contains properties for holding the source of the event (source) and the timestamp at which the event was generated (timestamp). The code snippet in Table 6 shows an example illustrating how to work with events.
Example illustrating how to work with events both in Java 7 and Java 8
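The listener mechanism can be sketched in a self-contained way as follows. The Listener<E> interface and the addEndOfStreamListener signature follow the mapping rules described above; PlayerEndpointStub is a hand-written stand-in for the real generated class, added so the example runs on its own:

```java
import java.util.*;

public class EventsDemo {
    // Generic listener interface as described in the codegen rules.
    interface Listener<E> { void onEvent(E event); }

    // Simplified stand-in for a generated event class (the real one would
    // extend RaiseBaseEvent, which carries source and timestamp).
    static class EndOfStreamEvent {
        final String source;
        final long timestamp;
        EndOfStreamEvent(String source, long timestamp) {
            this.source = source;
            this.timestamp = timestamp;
        }
    }

    // Hand-written stub standing in for a generated PlayerEndpoint.
    static class PlayerEndpointStub {
        private final List<Listener<EndOfStreamEvent>> listeners = new ArrayList<>();

        String addEndOfStreamListener(Listener<EndOfStreamEvent> listener) {
            listeners.add(listener);
            return UUID.randomUUID().toString(); // subscription id
        }

        void fireEndOfStream() {
            EndOfStreamEvent e = new EndOfStreamEvent("player", System.currentTimeMillis());
            for (Listener<EndOfStreamEvent> l : listeners) l.onEvent(e);
        }
    }

    public static void main(String[] args) {
        PlayerEndpointStub player = new PlayerEndpointStub();
        // Java 7 style: anonymous inner class
        player.addEndOfStreamListener(new Listener<EndOfStreamEvent>() {
            public void onEvent(EndOfStreamEvent event) {
                System.out.println("anon: " + event.source);
            }
        });
        // Java 8 style: lambda expression (Listener is a functional interface)
        player.addEndOfStreamListener(event -> System.out.println("lambda: " + event.source));
        player.fireEndOfStream();
    }
}
```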
package name: code.api.js.nodeName
package description: code.api.js.npmDescription
3.1.4 Creation and deletion of media capabilities
3.1.5 Synchronous and asynchronous programming models in the RTC Media API
One of the most critical decisions when designing APIs is how they behave in relation to threads. When performing I/O (Input/Output) operations, there is a common agreement that asynchronous APIs are more scalable than synchronous ones. Synchronous I/O typically blocks threads until a response is received or a timeout is reached. Hence, given that there is a practical limit on the number of threads in a system (mainly due to memory constraints), synchronous API models tend to generate thread starvation and decrease performance due to the overload they generate on the operating system task scheduler. To solve this problem, many modern APIs provide asynchronous I/O operations. In this case, the thread issuing the I/O is not blocked after the invocation and can be used to execute other tasks. However, asynchronous APIs are more complex to use and are susceptible to a problem called “callback hell”. This is a well-known problem that arises when asynchronous calls are invoked in the callbacks of other asynchronous calls, creating a deep nesting of callbacks.
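The callback-hell problem, and one way of flattening it, can be illustrated with a small self-contained Java sketch. The asyncOp helper simulates an asynchronous I/O primitive; it is not part of the RTC Media API:

```java
import java.util.concurrent.*;

public class AsyncDemo {
    // Simulated asynchronous I/O primitive: invokes the callback on another thread.
    static void asyncOp(String result, java.util.function.Consumer<String> callback) {
        CompletableFuture.runAsync(() -> callback.accept(result));
    }

    public static void main(String[] args) throws Exception {
        // "Callback hell": each asynchronous call is issued inside the
        // callback of the previous one, nesting ever deeper.
        CountDownLatch done = new CountDownLatch(1);
        asyncOp("pipeline", r1 ->
            asyncOp(r1 + "+player", r2 ->
                asyncOp(r2 + "+recorder", r3 -> {
                    System.out.println(r3); // pipeline+player+recorder
                    done.countDown();
                })));
        done.await();

        // The same chain flattened with CompletableFuture composition:
        // no nesting, and the calling thread is only blocked at the final get().
        String result = CompletableFuture.supplyAsync(() -> "pipeline")
                .thenApply(r -> r + "+player")
                .thenApply(r -> r + "+recorder")
                .get();
        System.out.println(result); // pipeline+player+recorder
    }
}
```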
Example illustrating the creation of a PlayerEndpoint using the asynchronous Java API
Creation of a PlayerEndpoint in ES7
3.1.6 RTC Media API capabilities
Once we have presented the formal aspects of the RTC Media API, we can switch to a more practical perspective and introduce its media capabilities. These capabilities comprise specific media objects that are made available to application developers to create their RTC media enabled applications following the above-described API guidelines. These capabilities can be grouped into two main categories: media elements, which inherit from the MediaElement class and manage a single media stream, and hubs, which inherit from the Hub class and have been specifically designed for the management of groups of streams.
The WebRtcEndpoint is an I/O endpoint that provides full-duplex WebRTC media communications compatible with the corresponding protocol standards. It is important to remark that, among WebRtcEndpoint capabilities, the RTC Media API defines DataChannel support as mandatory. DataChannels are a mechanism for receiving media information beyond audio and video, given their ability to accommodate arbitrary sensor data that is transported in the same ICE connection as the audio and the video and, hence, may maintain synchronization with them.
The RtpEndpoint is equivalent, but based on the plain RTP protocol.
The HttpPostEndpoint is an input-only endpoint that accepts media using HTTP POST requests. This capability needs to support HTTP multipart and chunked encodings, so that it is compatible with the HTTP file upload function exposed by WWW browsers. This endpoint must support the MP4 and WebM media formats.
The PlayerEndpoint is an input-only endpoint that retrieves content from the local file system, HTTP URLs or RTSP URLs and injects it into the media pipeline. This endpoint must support the MP4 and WebM media formats for all input mechanisms as well as RTP/AVP/H.264 for RTSP streams.
The RecorderEndpoint is an output-only endpoint that provides the ability to store contents in reliable mode (i.e. it does not discard data). This endpoint may write media streams to the local file system, or to HTTP URLs using POST messages. This endpoint must support the MP4 and WebM media formats.
Filters, in turn, are used for processing media streams. Filters are useful for integrating different types of capabilities such as Video Content Analysis (VCA), Augmented Reality (AR) or custom media adaptation mechanisms. The RTC Media API does not specify any kind of mandatory filter, and it is left to API implementers to define their filters following the RTC Media API extensibility mechanisms.
Composites, as all hubs, act as factories of HubPorts. This means that on a Composite instance we can create as many HubPorts as desired. These HubPorts are media elements having sources and sinks, which makes it possible to connect other media elements to them and get media into and out of the hub.
A Composite mixes all streams received at its HubPorts’ sinks and exposes the resulting mixed stream at their sources. The audio of the mixed stream obtained at a HubPort’s source includes all the inputs except the one received at its own HubPort’s sink. The video, on the other hand, combines all HubPorts’ sinks into the resulting composite matrix.
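The audio-mixing rule above (each port hears everyone but itself, so participants do not receive their own voice back) can be modeled with a tiny self-contained sketch; port and stream names are invented for illustration:

```java
import java.util.*;

public class CompositeDemo {
    // Toy model of the Composite audio-mixing rule: the mixed stream at a
    // HubPort's source contains every input except the one received at
    // that same HubPort's sink.
    static List<String> mixedAudioFor(String port, Map<String, String> inputs) {
        List<String> mix = new ArrayList<>();
        for (Map.Entry<String, String> e : inputs.entrySet())
            if (!e.getKey().equals(port)) mix.add(e.getValue());
        return mix;
    }

    public static void main(String[] args) {
        // One audio input per HubPort sink.
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("portA", "audioA");
        inputs.put("portB", "audioB");
        inputs.put("portC", "audioC");
        // portA's source carries everyone's audio but its own.
        System.out.println(mixedAudioFor("portA", inputs)); // [audioB, audioC]
    }
}
```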
3.1.7 Extending the API
A module definition: the MediaElement interfaces and related types defined in the RTC Media API IDL.
The corresponding software libraries: the specific language-dependent SDK enabling developers to use the module in their software projects.
Example of module definition for a new Filter called CompuVisionFilter
Using the filter CompuVisionFilter previously defined
3.1.8 Implementing the RTC Media API: the Kurento Client API
In order to implement the RTC Media API and make it expose useful capabilities to developers, we just need two ingredients. The first is an RTC media server. This media server needs to expose at its northbound interface some kind of control protocol enabling the management of RTC media capabilities in a way compatible with the semantic requirements of the RTC Media API. The details about how to create such an RTC media server and control protocol are out of the scope of this paper. The second is to implement an RTC Media IDL compiler suitable for translating the RTC Media IDL into the corresponding programming-language-dependent SDKs. Remark that this compiler is not protocol agnostic, in the sense that it needs to translate the RTC Media API invocations into the appropriate messages of the RTC media server control protocol. In other words, each specific media server control protocol needs to have a custom IDL compiler.
The ZBarFilter filter detects QR and bar codes in video streams. When a code is found, the filter publishes a CodeFoundEvent. Application developers can add a listener to this event to execute some logic.
The ImageOverlayFilter filter inserts still images in the video stream. The filter makes possible to select the position, scaling and rotation coordinates of the image.
The FaceOverlayFilter filter detects faces in a video stream and overlays custom images onto the face coordinates. The filter makes possible to select specific scaling and offsets for the image position.
The CrowdDetectorFilter filter implements a computer vision algorithm suitable for detecting crowds of people in video streams. The level of crowdedness is published through a custom event that contains information about the direction and speed of movement of the crowd.
PlateDetectorFilter filter detects European car plates and publishes the detected plate number as a custom event.
AugmentedRealityFilter filter wraps the Alvar library  to provide marker and markerless Augmented Reality capabilities.
The AlphaBlending hub is a special type of hub that makes it possible to mix different video streams using alpha transparency. This hub is useful for producing chroma-blended videos in real time.
Thanks to all these capabilities, the Kurento software stack has been used for creating hundreds of applications combining different types of features, including WebRTC and RTP transports, media recording, Video Content Analysis, and Augmented Reality. All in all, Kurento provides a full working test-bed where the RTC Media API constructs described in this paper can be used, evaluated and improved.
3.1.9 Matching the RTC Media API requirements
Seamless API extensibility through custom modules: As seen in Section 3.1.7, the RTC Media API can be extended seamlessly through the RTC Media Module mechanism, which provides full flexibility with no restrictions other than extending the base RTC Media API classes.
Adaptation to WWW technologies and methodologies: As shown in Section 3.1.8, RTC Media API implementations fully comply with the traditional WWW three-tier development model and enable developers to create applications leveraging novel WWW RTC media technologies, such as WebRTC, in a seamless and direct way.
Full abstraction of media details (i.e. codecs and protocols): As can be appreciated in the discussions and code examples in Sections 3.1.1, 3.1.3 and 3.1.7, the connect primitive exposed by all MediaElements makes it possible to fully abstract codecs, protocols and formats. None of our examples contain explicit references to codecs or formats, even though many of them require specific transcodings to work. This is because the semantics of the connect primitive mandate that the underlying media server capabilities perform all the appropriate adaptations in a fully transparent way.
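The hiding of format adaptations behind connect can be sketched as follows. This is a toy model under stated assumptions: the format fields and the `buildInternalGraph` helper are hypothetical stand-ins for logic that, in a real implementation, lives inside the media server.

```java
// Minimal sketch of the connect primitive's abstraction property: the
// developer wires elements with connect() and never names a codec; a
// hypothetical runtime (buildInternalGraph) inserts transcoding steps only
// where formats mismatch. All names here are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;

public class ConnectSketch {
    public static class MediaElement {
        public final String name;
        public final String producesFormat;
        public final String consumesFormat;
        public MediaElement sink;

        public MediaElement(String name, String produces, String consumes) {
            this.name = name;
            this.producesFormat = produces;
            this.consumesFormat = consumes;
        }

        /** Developer-facing primitive: no codec or format details exposed. */
        public void connect(MediaElement sink) { this.sink = sink; }
    }

    /** Hidden runtime logic: adds a transcode step only when formats differ. */
    public static List<String> buildInternalGraph(MediaElement source) {
        List<String> steps = new ArrayList<>();
        for (MediaElement e = source; e != null; e = e.sink) {
            steps.add(e.name);
            if (e.sink != null && !e.producesFormat.equals(e.sink.consumesFormat)) {
                steps.add("transcode(" + e.producesFormat + "->"
                        + e.sink.consumesFormat + ")");
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        MediaElement player = new MediaElement("PlayerEndpoint", "VP8", "VP8");
        MediaElement recorder = new MediaElement("RecorderEndpoint", "H264", "H264");
        player.connect(recorder);                       // no codecs mentioned here
        System.out.println(buildInternalGraph(player)); // transcoding is implicit
    }
}
```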
RTC media topology agnostic: Following the discussions in Sections 3.1.1 and 3.1.6, one can appreciate that the RTC Media API makes it possible to interconnect media elements following arbitrary and dynamic topologies thanks to the connect primitive. This means that developers do not need to be aware of the low-level details of MMM, MSM or SFU technologies: they just need to interconnect their endpoints, filters and hubs according to their needs. The RTC Media API semantics translate these interconnections into the appropriate low-level mechanisms using MMMs, MSMs or SFUs in a fully transparent way.
Advanced media QoS information gathering: As can be observed in the discussion in Section 3.1.2, the RTC Media IDL does not restrict in any way the information a media object may expose through its properties and methods. We have leveraged this flexibility to create QoS metrics gathering mechanisms in all endpoints based on the RTP protocol. In particular, the WebRtcEndpoint exposes primitives fully compliant with the standard WebRTC “inboundrtp” and “outboundrtp” stats.
Compatibility with advanced media processing capabilities: As can be observed in the discussion in Section 3.1.8, the Kurento software project has created a set of modules providing advanced capabilities such as Video Content Analysis, Augmented Reality and Computer Vision. This demonstrates the ability of the RTC Media API Filter concept to host all kinds of extensions for advanced media processing.
Context awareness: The notion of context emerges quite seamlessly from the discussion in Section 3.1.2. As can be observed, the RTC Media API event mechanism makes it possible for media capabilities to publish events to applications. These events may contain semantic information about the media content itself as shown, for example, by the CrowdDetectorFilter mentioned in Section 3.1.8. Hence, creating multimedia context-aware applications is straightforward: the application logic just needs to subscribe to the relevant events and publish them into a context database based on NGSI or any other equivalent standard.
Adapted to multisensory multimedia: The RTC Media API can seamlessly manage arbitrary sensor data beyond audio and video. This is achieved through the combination of two features. The first is the support for DataChannels which, as specified in Section 3.1.6, makes it possible for any media pipeline to exchange multisensory multimedia with the external world using the WebRTC protocol stack. The second is the fact that, as described in Section 3.1.1, all streams exchanged among MediaElements may have a DATA track. In particular, any information received through DataChannels at a WebRtcEndpoint is published to the rest of the pipeline through the endpoint's source DATA track. In the same way, any information received through the DATA track at a WebRtcEndpoint's sink is sent to the network using DataChannels. As the MediaElement interface enables all the information received through the DATA track to be used by the element's internal logic, this mechanism makes it possible, for example, to create Augmented Reality filters that leverage sensor information for customizing the augmentation logic.
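The DATA-track routing just described can be sketched with the following toy model. The types here (`WebRtcEndpointSketch`, `ArFilterSketch`, `DataSink`) are illustrative assumptions, not the real API; the point is only the flow from DataChannel message to downstream consumer.

```java
// Sketch of multisensory data routing: a message arriving over a WebRTC
// DataChannel is republished on the endpoint's source DATA track, so a
// downstream element (e.g. an AR filter) can consume it. Illustrative types.
import java.util.ArrayList;
import java.util.List;

public class DataTrackSketch {
    public interface DataSink { void onData(String payload); }

    public static class WebRtcEndpointSketch {
        private final List<DataSink> dataTrackSinks = new ArrayList<>();

        /** Connects a downstream element to this endpoint's DATA track. */
        public void connectData(DataSink sink) { dataTrackSinks.add(sink); }

        /** Simulates a message arriving from the network over a DataChannel. */
        public void onDataChannelMessage(String payload) {
            for (DataSink s : dataTrackSinks) s.onData(payload);
        }
    }

    public static class ArFilterSketch implements DataSink {
        public String lastSensorReading;
        // A real filter would use this reading to customize the augmentation.
        public void onData(String payload) { lastSensorReading = payload; }
    }

    public static void main(String[] args) {
        WebRtcEndpointSketch endpoint = new WebRtcEndpointSketch();
        ArFilterSketch ar = new ArFilterSketch();
        endpoint.connectData(ar);
        endpoint.onDataChannelMessage("gyro:0.13,-0.02,0.98");
        System.out.println(ar.lastSensorReading);
    }
}
```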
Adaptation to cloud media servers: As can be observed in the discussion of Section 3.1.2, the RTC Media API does not specify how media pipelines are placed onto media server instances. The API implementer has full freedom to select how newly created media pipelines are scheduled. This flexibility can be leveraged by API implementers to adapt their code to all kinds of cloud architectures. For example, as shown in the code snippet in Table 2, in the Kurento Client Java implementation of the RTC Media API, we decided that the RTC Media API is represented by a specific class (i.e. KurentoClient) that is built through a static create factory method. This method may accept as a parameter a single IP, in which case all pipelines are instantiated in the media server listening at that IP; a list of IPs, which causes media pipelines to be round-robin distributed among the corresponding media servers; or a media server scheduling interface, which can provide arbitrary logic for scheduling media pipeline creation on media servers. It may also accept no parameters and let the developer specify the behavior in a configuration file. All this flexibility makes it possible for our RTC Media API to work seamlessly in cloud clusters of Kurento Media Server instances. As an example, this scheme is currently used in the NUBOMEDIA and FIWARE clouds. The complex details of how this happens are out of the scope of this paper. The point is that the RTC Media API does not constrain the API implementer in any way when adapting to complex cloud scheduling and placement logic.
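The round-robin variant of this scheduling flexibility can be sketched as below. The class and method names are illustrative assumptions and not the exact KurentoClient signatures; the sketch only shows how a factory holding several server addresses could distribute pipeline creations.

```java
// Sketch of pipeline scheduling: a client factory accepting several media
// server addresses and distributing new pipelines among them round-robin.
// Names are illustrative, not the real KurentoClient API.
import java.util.Arrays;
import java.util.List;

public class SchedulerSketch {
    private final List<String> serverIps;
    private int next = 0;

    private SchedulerSketch(List<String> serverIps) {
        this.serverIps = serverIps;
    }

    /** Factory method: one or more media server IPs. */
    public static SchedulerSketch create(String... ips) {
        return new SchedulerSketch(Arrays.asList(ips));
    }

    /** Returns the IP of the media server chosen for a new pipeline. */
    public String createMediaPipeline() {
        String ip = serverIps.get(next);
        next = (next + 1) % serverIps.size();
        return ip;
    }

    public static void main(String[] args) {
        SchedulerSketch client = SchedulerSketch.create("10.0.0.1", "10.0.0.2");
        System.out.println(client.createMediaPipeline()); // 10.0.0.1
        System.out.println(client.createMediaPipeline()); // 10.0.0.2
        System.out.println(client.createMediaPipeline()); // 10.0.0.1
    }
}
```

A scheduling-interface variant would simply replace the internal counter with a user-provided strategy object.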
3.2 Some real-world example applications
The RTC Media API is currently being used in the context of the Kurento open source software community by hundreds of developers for creating RTC applications. Just for illustration, we can briefly describe here two of these applications.
4 API evaluation
In the sections above we have presented the RTC Media API and we have introduced a specific implementation of it. The rest of this paper is devoted to describing a study we performed for evaluating the RTC Media API usability in the context of the Kurento open source software community.
4.1 Study design: methodology and hypotheses
The RTC Media API herein presented enables the creation of rich RTC applications consuming advanced media capabilities with full abstraction of low-level details and in a programming language agnostic way.
The term creation must be understood here in a wide sense to cover all the activities developers perform in relation to the API. As also stated in Section 2.3, these activities include exploratory learning (i.e. the process of learning how to use the API), exploratory design (i.e. the process of creating application code consuming the API) and maintenance (i.e. the process of debugging and evolving the application code after the application has been first created).
Do developers feel that the API can be learnt in a simple, incremental and seamless way?
Do developers feel that the API is helpful for the creation of clean and error-free application code without needing to manage low-level complexities?
Do developers feel that maintaining and evolving code consuming the API is smooth and uncomplicated?
Do developers have the same perception of the API usability independently of their demographic characteristics (i.e. years of experience, nationality, etc.) and of the types of applications they create?
Do developers have the same perception of the API usability independently of their programming language?
4.1.1 Research questionnaire
Research questionnaire used for evaluating developers' perception of API usability on the 5 target dimensions. Every assertion in the questionnaire is identified with a unique ID for further reference (e.g. U.1 refers to the first question of the Understandability dimension). Assertions formulated in negative terms (i.e. N-assertions) start with an (N) mark. These assertions are useful for evaluating the consistency of the research. Participants are asked to provide their degree of agreement with every assertion on a scale from 1 (I fully disagree) to 5 (I fully agree). For the statistical analysis, N-assertion scores are inverted so that the coherence of the questionnaire is maintained
Kurento APIs are, in general, easy to understand
In Kurento APIs object names are descriptive and unambiguous
(N) I need to keep track of hidden information not represented by the APIs to create my applications
(N) Kurento APIs are obscure and it takes a huge effort to use them, even for creating simple applications
Kurento API objects, types, and primitives represent appropriately the underlying media-related concepts
I understand the difference between a Media Pipeline and a Media Element
Kurento APIs make it simple to create applications without needing to worry about low-level media details
(N) I needed to adapt the API (e.g. inheriting, overriding, etc.) for having it meet my needs
(N) It’s necessary to understand how codecs and protocols work for being able to use Kurento APIs
I like writing applications with Kurento APIs. I'm familiar with their programming model
I find the general approach of Kurento APIs appealing and attractive
Creating simple applications with Kurento APIs is simple. Creating complex applications is possible
Developing with Kurento fully matches the expectations I had
I can translate my media application requirements into code in an easy way
Reading an application code, I can understand what the application is doing in a simple way
After creating an application, I can explain seamlessly to other people what I have done in terms of media elements and their interconnections
(N) There are missing features in Kurento APIs that make it impossible to implement interesting applications
(N) Programming with Kurento APIs is error prone. You need to take into consideration a lot of details for having an application working
(N) Creating applications requires too long and verbose code specifying too many things
(N) Adding a recording capability to a non-recording application requires modifying a lot of code
My code using Kurento APIs can be maintained and evolved easily
I can re-use Kurento related code in a simple way
(N) When using Kurento APIs, there are many different ways of doing the same thing and I need to take too many decisions in the process
Adapting my Kurento-based application to new media requirements is quite simple
I learned how to use Kurento in an incremental way, starting with simple concepts and progressing towards complex applications
(N) Programming with Kurento requires learning a lot of classes and dependencies, even for simple applications.
When I create a complex application, I can start from a simple example and evolve it later in a seamless way
(N) I needed to read all the documentation and tutorials to be able to create my first application
Reading the simple tutorials made it possible for me to better understand the complex ones and to later create applications complying with my requirements
- For the question “Type of application being developed”, we provided participants the possibility of selecting multiple items among the following options (the corresponding encoding token is provided between <> signs):
I’m creating video applications with recording capability (<Recording>)
I’m creating my own filters and extending Kurento APIs (<Filter>)
I’m creating video surveillance applications (<Video surveillance>)
I’m creating videoconferencing applications (<Videoconferencing>)
I’m creating broadcasting applications for distributing media among large groups of receivers (<Broadcasting>)
I’m using Kurento for integrating with other types of technologies beyond WebRTC (<Integration>)
I’m creating other types of applications (<Other>)
All questions dealing with “self-assessment of expertise” were expressed in the poll as assertions in the form “I’m an expert in …”, where answers are in the above mentioned 1 (I fully disagree) to 5 (I fully agree) format.
- The question dealing with “Learning stage on Kurento technologies” provided the ability of selecting one option among the following items:
I tried to install Kurento unsuccessfully
I installed Kurento and executed some of the provided demos
I developed a simple application
I developed a complex application
I developed a complex application which is in production
This table shows the additional questions asked of participants in order to characterize them. The first column shows the type of data to be gathered through the question, the second column shows the question itself (summarized for the sake of readability), and the third column shows the value type accepted by the web form. The mark  indicates users are given the option of choosing one item in a list. The mark * indicates that multiple items may be selected. In this table, list items are tokenized for simplicity
[Male | Female]
Main computer language used
Type of application being developed
[Recording | Filter | Video surveillance | Videoconferencing | Broadcasting | Integration | Other]*
Years of experience as developer
Hours learning or programming with Kurento
Self-assessment of expertise as WebRTC developer
[1 to 5]
Self-assessment of expertise in media technologies
[1 to 5]
Self-assessment of expertise with Kurento technologies
[1 to 5]
Learning stage on Kurento technologies
[Tried | Installed | SimpleApp | ComplexApp | ProductionApp]
4.1.2 Participants and protocol
Most API usability studies are performed by recruiting students or researchers who are trained on the API through lectures or exercises and who are later interviewed for the evaluation. These types of protocols are sensitive to many different types of bias that may affect the study's reliability. In particular, their main weakness is that participants are typically not professional developers and are not faced with real-world programming tasks. Hence, their perception of the API limitations and usability problems can be severely biased by their own background and by the nature and contents of the training materials and proposed exercises. In addition, those materials and exercises are typically created by the API designers, which significantly increases the risk of introducing the designers' cognitive models and of hiding API limitations that might not be known even to the designers themselves. Many API evaluation research works are aware of these limitations, but solving them is not trivial given the difficulty of reaching a statistically significant population of professional developers who are independent of the designers and have the time to learn the API concepts and to work on solving real-world tasks with them.
In order to avoid these problems, we leverage the fact that the RTC Media API has been implemented as part of the Kurento project. More specifically, the Kurento Client API is an almost complete implementation of it. This is a significant advantage because Kurento has been released as open source software and a community of developers has emerged around it. The size of the community is unknown, but its main communication channel, the Kurento Public mailing list, has, at the time of this writing, 432 subscribers, most of whom are professional developers at different stages of the API learning process.
4.2 Results and analysis
4.2.1 Analysis of participants
The survey was activated following the protocol described above. One week after the initial e-mail invitation, a total of 17 participants had answered. Several reminders were sent to the Kurento Public mailing list and the announcement was also published through different social channels, such as the Kurento Twitter account. Within two weeks, 42 answers had been received, which represents 9.7 % of the number of Kurento Public mailing list subscribers. This is aligned with typical response rates in surveys.
Summary of answers in relation to participant’s expertise as developers and in the different involved technological areas
Type of data
Years of experience as developer
Hours learning or programming with Kurento
Self-assessment on WebRTC expertise
1 to 5
Self- assessment on video technologies expertise
1 to 5
Self- assessment on Kurento expertise
1 to 5
4.2.2 Analysis of dimensions
Results of the research showing, for each assertion of the poll, the main statistics of the provided answers. Notice that N-assertions have their results inverted to maintain coherence. For each dimension, the statistics are computed on each user's average value across all assertions of that dimension
Correlation between the different parameter data captured through the questionnaire against the scores of API usability perception averaged across all assertions
Parameter to be correlated with overall average score along all answers
Years of experience as developer
Hours learning or programming with Kurento
Self-assessment of expertise as WebRTC developer
Self-assessment of expertise in media technologies
Self-assessment of expertise with Kurento technologies
For completeness, we also evaluated the correlation of the perception of the different dimensions with other demographic data, including nationality (consolidated per continent) and the type of application being created. The corresponding results are illustrated in Figs. 16 and 17. As can be observed, there are no significant dependencies of the API usability perception on these variables.
4.3 Validity of the analyses
Following commonly accepted techniques for evaluating assessment data, we discuss the main threats to the validity of our research as well as the measures we deployed to minimize their impact.
4.3.1 Construct validity
We used a well-established methodology based on the CDs framework, the most widely accepted technique for this purpose, which has already been used successfully in a number of usability studies worldwide.
We performed a careful design of the questionnaire based on high-level usability dimensions adapted to participants' needs rather than to the API designers' needs. Each of the high-level dimensions was measured through a group of 5 to 6 assertions facing the problem from different perspectives, which minimizes the effects of assertion misinterpretation.
The questionnaire contained complementary questions digging into the different components of each of the high-level dimensions and combining positively and negatively formulated assessments. This should enhance the consistency guarantees of the answers.
The protocol avoided introducing any kind of bias by enabling participants to answer assertions based only on their own knowledge of the API artifacts (i.e. documentation, code, etc.) and not on previous information provided by designers (e.g. training courses) or on specific artificial exercises which could be associated with particular cognitive models about the API.
4.3.2 Internal validity
Internal validity is a property associated with the extent to which a study minimizes systematic errors and avoids introducing bias into measurements. To enhance our internal validity, we tried to avoid any kind of selection bias by enabling Kurento Open Source Community members to answer the poll freely. This strategy was clearly successful given the wide spectrum of participants we had, comprising developers of different ages, expertise degrees, nationalities and cultures. This significantly enhances our internal validity in relation to previous similar studies where API designers and participants have tight relationships (e.g. professors and students, workers of the same company, etc.). The risk of statistical artifacts in the data is also low given that the poll was answered by 42 participants, a population sample significantly larger than those of other similar studies.
Cronbach’s alpha computed for all the high-level dimensions of our test
Commonly accepted rule of thumb for describing internal consistency in terms of Cronbach's alpha
alpha >= 0.9: Excellent
0.9 > alpha >= 0.8: Good
0.8 > alpha >= 0.7: Acceptable
0.7 > alpha >= 0.6: Questionable
0.6 > alpha >= 0.5: Poor
0.5 > alpha: Unacceptable
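For reference, the thresholds above apply to the standard Cronbach's alpha statistic, which for a dimension measured through $K$ assertions is

$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K}\sigma^2_{Y_i}}{\sigma^2_X}\right),$$

where $\sigma^2_{Y_i}$ is the variance of the answers to assertion $i$ and $\sigma^2_X$ is the variance of the per-participant total scores over the dimension.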
As can be observed, the reliability of the obtained data is within acceptable margins, which is reasonable for our type of questionnaire and open research methodology, where there is no control over the who's, how's and why's of participants.
4.3.3 External validity
External validity refers to the extent to which a study can be generalized to other situations or populations. In relation to this, the main threats to external validity come from the research protocol, which was designed around specificities of our API implementation. In particular, the fact that we leveraged the Kurento open source software community to obtain the test population is quite a strong restriction on the generalizability of our findings, given that most newly designed APIs might not be open source and, even if they are, they need not have an active international community of more than 400 developers. However, the rest of our methodology, as well as the analysis we performed and the conclusions we drew from it, does not assume any specific requirements. This suggests that the gist of our findings is also applicable to different contexts and populations.
Based on the analysis shown above, we return to hypothesis H1, as stated in Section 4.1, and analyze its degree of fulfillment. For this, we use the results of our study to answer the questions stated therein, namely:
Do developers feel that the API can be learnt in a simple, incremental and seamless way?
The evaluation of the learning ability of developers emerges mainly from two of the dimensions under analysis: understandability and learnability.
As illustrated in Table 20, and as a general perception, participants find the API understandability to be fine, with an average of 3.46 over all answers on this dimension. Through U.1 (3.33) we find a general declaration that the API is easy to understand. In particular, as shown by the answers to assertion U.2, participants rate as outstanding (4.26) the descriptiveness of object and primitive names. On the other hand, through the U.3 N-assertion (2.69), we find that developers detect the presence of hidden dependencies that make the API more complex to understand.
As also illustrated in Table 20, the learnability of the API is evaluated positively by participants (3.43 on average). The improvement area in this topic emerges from L.2 (3.07) and L.4 (2.86), which evidence developers' impression of needing to learn about a lot of API constructs and to read a relevant amount of documentation before being able to start using the API for useful things. On the other hand, as L.1 (3.88) evidences, the learning process seems compatible with an incremental approach where complexity is introduced in progressive steps.
Based on this, we can confirm that the API can be understood and learnt by developers in a seamless and incremental way. The main areas of improvement are the initial learning curve, which seems to be too steep, and the presence of hidden information and dependencies among the API constructs. Probably, both topics are related and caused by the inherent complexity of RTC technologies. Our guess is that better and more complete documentation might help minimize both problems.
Do developers feel that the API is helpful for the creation of clean and error-free application code without needing to manage low-level complexities?
The process of exploratory design, understood as the creative activity of writing application code that consumes the API, is mainly related to two of our target dimensions: abstraction and expressiveness.
As shown in Table 20, when coming to abstraction we also find a generally positive evaluation (the overall average is 3.39). Answers to all questions are quite uniform, A.5 being the top-ranked one (3.69), which shows that developers find the API approach appealing, and A.2 the bottom-ranked one (3.24), indicating that some developers feel the need to adapt the API to their needs. It is remarkable that A.2 has the largest standard deviation (1.28) of the A assertions, which evidences some degree of controversy. This is confirmed when looking closely at the answers: among A.2 answers, 12 % are ones and 19 % are fives, while across all Abstraction answers the ratios of ones and fives are 4.3 % and 12.6 % respectively.
The Expressiveness analysis also reflects a positive evaluation but shows improvement areas. This is the least successful dimension, with an overall average ranking of 3.17. Expressiveness limitations seem to emerge in assertions E.5 and E.6 (2.76 and 2.90 respectively). In particular, E.5 reveals that developers miss features that are relevant for their applications. E.6, in turn, manifests that the API does not give enough protection against failures. On the other hand, as demonstrated through E.3 and E.4, our API is easy to read (3.43) and is consistent when explaining code logic in terms of the API constructs (3.45).
Hence, our API is suitable for use in the process of creating application code. However, as Fig. 14 illustrates, abstraction and, more significantly, expressiveness are the two dimensions with the lowest usability scores. This evidences that, although the API ideas are appealing and intuitive (learnability and understandability get very high scores), leveraging them for creating real-world RTC applications still presents some difficulties. These seem to be related to the lack of further desirable features (i.e. richer extensions to the API might be necessary) and to the lack of protection against failures. The latter is a pervasive problem in most RTC media APIs due to their distributed and real-time nature and, to the best of our knowledge, there are no simple solutions for fixing it.
Do developers feel that maintaining and evolving code consuming the API is smooth and uncomplicated?
Corrective and evolutive maintenance of the code is related to the dimension we call reusability:
As shown in Table 20, reusability is the dimension with the highest ranking (3.47 on average). This is illustrated by the results of assertions R.2, R.3 and R.6, which average 3.67, 3.31 and 3.14 respectively. The API also demonstrates good properties in relation to verbosity, as shown by the 3.48 exhibited by R.1: our API is considered concise and not overly verbose.
Hence, we may conclude that, once the application code using the API has been created, it can be modified, maintained and evolved without much effort.
Do developers have the same perception of the API usability independently of their demographic characteristics (i.e. years of experience, nationality, etc.) and of the types of applications they create?
In our main hypothesis H1, as stated in Section 4.1, we assume that the perception of API usability is fine for all developers, independently of their origin, culture or experience. To validate this assertion, we have performed several statistical analyses, whose outcomes are the following:
Correlation between API usability and programming experience.
Hence, we can conclude that, once an initial knowledge of the API and its foundations has been acquired, the perception of API usability does not depend significantly on developers' proficiency.
Relation between API usability and nationality/culture.
Relation between API usability and the type of application being created.
Do developers have the same perception of the API usability independently of their programming language?
To finish, and based on the answers to these questions, we can conclude that the main hypothesis of the paper is validated by our study and that, with the exception of some collateral effects that are not quantitatively relevant, the API usability is confirmed to be good across all application creation activities, and the high usability scores are robust with respect to developers' profiles, cultures, experience and preferred programming languages.
Throughout the paper, we have tried to stress the importance of listening to developers' needs and solving developers' problems. We contend that software is eating RTC multimedia technologies. Hence, to push them to the next level we need to create novel APIs and SDKs suitable for their democratization among wider developer audiences. In the current state of the art, there is a huge number of algorithms and technologies for transporting, analyzing and enriching media, but there are very few APIs and SDKs making it possible for average WWW and smartphone developers to use them in a seamless and effortless way. Our RTC Media API brings a whole new concept by incorporating WWW development methodologies into the multimedia arena.
The RTC Media API in general, and the Kurento Client API implementation in particular, are still research artifacts under maturation and lack many relevant ingredients. In particular, the adaptation to the latest trends in WebRTC technologies, including the incorporation of ORTC (http://ortc.org/) concepts, would significantly benefit the API's flexibility. The API would also benefit from richer support for complex media streams including multiple audio and video tracks, so that 3D or MVC multimedia is supported. Improvements are also possible from the perspective of development tools beyond the API itself: seamless mechanisms for debugging, diagnosing and optimizing applications would be more than welcome by developers. To conclude, further efforts should be invested in the future to perform a consistent and complete evaluation of the API performance, suitable for illustrating the main QoS metrics of the different media elements and of the media pipeline mechanism under real-world operational conditions, following the scheme of previous research in this area.
- 1.Latest versions of the IDL files specifying the Kurento RTC Media API in the kmd.json format can be found at the Kurento GitHub repository at the following locations:
- 2.Latest versions of the IDL files specifying the Kurento RTC Media API extensions in the kmd.json format can be found at the Kurento GitHub repository in different locations, including the following:
Demos showing different Kurento applications created using the Kurento RTC Media API implementations are accessible at the Kurento YouTube channel: https://www.youtube.com/channel/UCFtGhWYqahVlzMgGNtEmKug
This work has been supported by the European Commission under projects FI-WARE FP7-2011-ICT-FI GA-285248, FI-CORE FP7-2014-ICT-FI GA- 632893 and NUBOMEDIA FP7-ICT-2013-1.6 GA-610576; by Spanish Ministerio de Educación under project Reactive Media (TIN2013-41819-R); and by the Regional Government of Madrid (CM) under project Cloud4BigData (S2013/ICE-2894) co-funded by FSE & FEDER.
- 1.Abowd GD, Dey AK, Brown PJ, Davies N, Smith M, Steggles P (1999) Towards a better understanding of context and context-awareness. In: Handheld and ubiquitous computing. Springer, pp 304–307Google Scholar
- 2.Afonso LM, Cerqueira RFG, de Souza CS (2012) Evaluating application programming interfaces as communication artefacts. System 100:8–31Google Scholar
- 4.Allen IE, Seaman CA (2007) Likert scales and data analyses. Qual Prog 40(7):64Google Scholar
- 5.Alvestrand H (2015) Transports for WebRTC draft-ietf-rtcweb-transports-10. https://tools.ietf.org/html/draft-ietf-rtcweb-transports-10. Accessed 20 Dec 2015
- 6.Andreasen F, Arango M, Huitema C, Kumar R, Pickett S, Elliott I, Foster B, Dugan A (2003) Media gateway control protocol (MGCP) version 1.0. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 3435
- 7.Andreessen M (2011) Why software is eating the world. Wall Street Journal, 20 Aug 2011. Available online: http://www.wsj.com/articles/SB10001424053111903480904576512250915629460
- 9.Bajaj V (2004) JAIN MEGACO API specification. Tech. rep., Java Community Process, Java Specification Request (JSR) 79
- 10.Bauer M, Kovacs E, Schülke A, Ito N, Criminisi C, Goix LW, Valla M (2010) The context API in the OMA next generation service interface. In: Intelligence in next generation networks (ICIN), 2010 14th international conference on. IEEE, pp 1–5
- 13.Blackwell AF, Britton C, Cox A, Green TR, Gurr C, Kadoda G, Kutar M, Loomes M, Nehaniv CL, Petre M, et al (2001) Cognitive dimensions of notations: design tools for cognitive technology. In: Cognitive technology: instruments of mind. Springer, pp 325–341
- 14.Blackwell AF, Green TR (2000) A cognitive dimensions questionnaire optimised for users. In: Proceedings of the twelfth annual meeting of the psychology of programming interest group, pp 137–152
- 15.Bloch J (2006) How to design a good API and why it matters. In: Companion to the 21st ACM SIGPLAN symposium on object-oriented programming systems, languages, and applications. ACM, pp 506–507
- 16.Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F (1998) Extensible markup language (XML). World Wide Web consortium recommendation REC-xml-19980210 http://www.w3.org/TR/1998/REC-xml-19980210
- 17.Catherine MR, Edwin EB (2013) A survey on recent trends in cloud computing and its application for multimedia. Int J Adv Res Comput Eng Technol (IJARCET) 2(1):304–309
- 18.Clarke S (2001) Evaluating a new programming language. In: 13th workshop of the psychology of programming interest group, pp 275–289
- 21.Daughtry III JM, Carroll JM (2012) Perceived self-efficacy and APIs. Programming Interest Group, p 42
- 24.Duala-Ekoko E, Robillard MP (2012) Asking and answering questions about unfamiliar APIs: an exploratory study. In: Proceedings of the 34th international conference on software engineering. IEEE Press, pp 266–276
- 26.Ellis B, Stylos J, Myers B (2007) The factory pattern in API design: a usability evaluation. In: Proceedings of the 29th international conference on software engineering. IEEE Computer Society, pp 302–312
- 27.Ericson T, Brandt M (2009) Media server control API. Tech. rep., Java Community Process, Java Specification Request (JSR) 309
- 28.Farooq U, Zirkler D (2010) API peer reviews: a method for evaluating usability of application programming interfaces. In: Proceedings of the 2010 ACM conference on computer supported cooperative work. ACM, pp 207–210
- 29.Ferry D, Lim S (2004) JAIN SLEE API specification. Tech. rep., Java Community Process, Java Specification Request (JSR) 22
- 30.FIWARE Consortium (2015) Future internet core platform. Available online: http://www.fiware.org, FP7-2011-ICT-FI (GA-285248)
- 32.Gamma E, Helm R, Johnson R, Vlissides J (1994) Design patterns: elements of reusable object-oriented software. Addison-Wesley
- 33.Ganassali S (2008) The influence of the design of web survey questionnaires on the quality of responses. Surv Res Methods 2:21–32
- 34.Gouveia F, Wahle S, Blum N, Magedanz T (2009) Cloud computing and EPC/IMS integration: new value-added services on demand. In: Proceedings of the 5th international ICST mobile multimedia communications conference. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), p 51
- 35.Green TR (1989) Cognitive dimensions of notations. In: People and Computers V, pp 443–460
- 38.Handley M, Perkins C, Jacobson V (2006) SDP: session description protocol. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 4566
- 40.Hovemeyer D (2005) Simple and effective static analysis to find bugs. PhD thesis, College Park, MD, USA, AAI3184274
- 41.Internet Engineering Task Force (2015) Real-time communication in web-browsers (rtcweb). https://datatracker.ietf.org/wg/rtcweb. Accessed 9 Dec 2015
- 42.Ivov E, Marocco E, Holmberg C (2014) A session initiation protocol (SIP) usage for Trickle ICE draft-ietf-mmusic-trickle-ice-sip-03. https://tools.ietf.org/html/draft-ietf-mmusic-trickle-ice-sip-03. Accessed 12 Dec 2015
- 43.Jennings C, Holmberg C, Alvestrand HT (2015) Negotiating media multiplexing using the session description protocol (SDP) draft-ietf-mmusic-sdp-bundle-negotiation-23. https://tools.ietf.org/html/draft-ietf-mmusic-sdp-bundle-negotiation-23. Accessed 12 Dec 2015
- 44.Johnston AB, Burnett DC (2012) WebRTC: APIs and RTCWEB protocols of the HTML5 real-time web. Digital Codex LLC
- 45.Kambona K, Boix EG, De Meuter W (2013) An evaluation of reactive programming and promises for structuring collaborative web applications. In: Proceedings of the 7th workshop on dynamic languages and applications (DYLA '13). ACM, New York, NY, USA, pp 3:1–3:9. doi: 10.1145/2489798.2489802
- 47.Kristensen A (2003) SIP Servlet API. Tech. rep., Java Community Process, Java Specification Request (JSR) 116
- 49.Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: Image processing. 2002. Proceedings. 2002 international conference on, vol 1, pp I-900–I-903. doi: 10.1109/ICIP.2002.1038171
- 50.Melanchuk T (2009) An architectural framework for media server control. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 5567
- 52.NUBOMEDIA Consortium (2015) NUBOMEDIA: an elastic Platform as a Service (PaaS) cloud for interactive social multimedia. Available online: http://www.nubomedia.eu, ICT-2013.1.6 (GA-610576)
- 53.O’Doherty P, Ranganathan M (2003) JAIN SIP API specification. Tech. rep., Java Community Process, Java Specification Request (JSR) 32
- 54.Ott J, Wenger S, Sato N, Burmeister C, Rey J (2006) Extended RTP profile for real-time transport control protocol (RTCP)-based feedback (RTP/AVPF). Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 4585
- 56.Perkins C, Westerlund M (2010) Multiplexing RTP data and control packets on a single port. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 5761
- 57.Piccioni M, Furia C, Meyer B, et al (2013) An empirical study of API usability. In: Empirical software engineering and measurement, 2013 ACM/IEEE international symposium on. IEEE, pp 5–14
- 58.Pickard A (2012) Research methods in information. Facet Publishing, London
- 59.Prusty N (2015) Learning ECMAScript 6. Packt Publishing Ltd, Birmingham
- 60.Reddy M (2011) API design for C++. Elsevier, Amsterdam
- 62.Rosenberg J (2010) Interactive connectivity establishment (ICE): a protocol for network address translator (NAT) traversal for offer/answer protocols. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 5245
- 63.Rosenberg J, Schulzrinne H, Camarillo G, Johnston A, Peterson J, Sparks R, Handley M, Schooler E (2002) SIP: session initiation protocol. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 3261
- 64.Saint-Andre P (2011) Extensible messaging and presence protocol (XMPP): core. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 6120
- 65.Saleem A, Xin Y, Sharratt G (2010) Media server markup language (MSML). Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 5707
- 66.Schulzrinne H (1998) Real time streaming protocol (RTSP). Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 2326
- 67.Schulzrinne H (2003) RTP profile for audio and video conferences with minimal control. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 3551
- 68.Schulzrinne H, Casner S, Frederick R, Jacobson V (2003) RTP: a transport protocol for real-time applications. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 3550
- 71.Tang X, Zhang F, Chanson ST (2002) Streaming media caching algorithms for transcoding proxies. In: Parallel processing, 2002. Proceedings. International conference on. IEEE, pp 287–295
- 75.Van Dyke J, Burger E, Spitzer A (2007) Media server control markup language (MSCML) and protocol. Tech. rep., Internet Engineering Task Force, Request for Comments (RFC) 4722
- 78.Wagner B (2010) Effective C# (covers C# 4.0): 50 specific ways to improve your C#. Pearson Education
- 79.Westerlund M, Burman B, Nandakumar S (2015) Using simulcast in RTP sessions draft-westerlund-avtcore-rtp-simulcast-04. https://tools.ietf.org/html/draft-westerlund-avtcore-rtp-simulcast-04. Accessed 12 Dec 2015
- 80.Westerlund M, Wenger S (2016) RTP topologies draft-ietf-avtcore-rtp-topologies-update-10. https://tools.ietf.org/html/draft-ietf-avtcore-rtp-topologies-update-10. Accessed 7 Jan 2016
- 81.Willmott S, Balas G (2013) Winning in the API economy. Available online: http://www.3scale.net/wp-content/uploads/2013/10/Winning-in-the-API-Economy-eBook-3scale.pdf
- 82.World Wide Web Consortium (2011) Web real-time communications working group. http://www.w3.org/2011/04/webrtc/. Accessed 11 Dec 2015
- 83.World Wide Web Consortium (2015) WebRTC stats. Available online: http://w3c.github.io/webrtc-stats/
- 84.Zhou F, Duh HBL, Billinghurst M (2008) Trends in augmented reality tracking, interaction and display: a review of ten years of ISMAR. In: Proceedings of the 7th IEEE/ACM international symposium on mixed and augmented reality. IEEE Computer Society, pp 193–202