DASH has attracted considerable interest in the traditional Web-based video market. Netflix and YouTube chose DASH as their main streaming technology [3, 24]. 3GPP, in the Release 10 specification , adopted DASH as the solution for streaming media over LTE Advanced mobile networks. It is therefore no surprise that researchers have started to consider applying the DASH approach in the IoT domain as well.
In  the authors propose a video streaming framework dedicated to IoT that is built on top of the Content Centric Networking (CCN)  architecture. They use DASH mechanisms to create a set of video segments with different representations. Segment requests generated by the DASH client are then translated to CCN interests, whereas responses from the HTTP server are translated to CCN content objects. Although the CCN architecture, like other name-based approaches such as the ones presented in , is considered to provide many benefits to IoT , the main drawback of the proposed solution is that CCN follows a clean-slate design and operates on a new network stack. It is therefore incompatible with the existing network infrastructure, and its deployment requires building the network from scratch . In turn,  presents a DASH-based video platform for IoT, called WVSNP-DASH, that provides a specific naming syntax for video segments. In this way, the essential metadata required for video playout is embedded in the name of every segment, and a client can play the segment without downloading a manifest file. However, the authors of  do not investigate the communication overhead introduced by WVSNP-DASH, which can be significant, especially when video content is divided into many small segments and the URL of each segment is enlarged by video metadata. In addition, WVSNP-DASH inserts only a small set of parameters into a segment’s name: the segment’s index and representation, the video container format, the highest available representation and the total number of segments for the given media content. By contrast, the MPD file defined by the DASH specification contains much more information that is crucial for video playout, such as data related to video resolution, DRM (Digital Rights Management) or the timeline. Consequently, WVSNP-DASH is in fact not fully compliant with the DASH data model.
In this paper, we propose Dynamic Adaptive Streaming over CoAP – DASCo. To the best of our knowledge, DASCo is the first solution that integrates DASH with the CoAP protocol designed for IoT. DASCo is compatible with the DASH semantics but uses CoAP as the platform for accessing video segments. It works on a standard CoAP server, which means that its implementation can be based on existing CoAP infrastructure. At the same time, compatibility with the DASH specification means that DASCo inherits the main advantages of DASH, such as support for different streaming services (On-demand, Live and Time-shift), support for various DRM solutions, and the capability to work with a number of different codecs and video container formats.
CoAP has been designed to run on resource-constrained devices. Therefore, unlike HTTP, it runs over UDP, which provides lower communication overhead (the UDP header is only 8 bytes long, and there is no three-way handshake because UDP is connectionless). In general, we assume that streaming services will be requested by handheld devices such as smartphones or tablets, which in fact have enough resources to handle an HTTP/TCP stack implementation. However, these devices can also benefit from the lightweight protocol stack defined for constrained IoT objects, because data transfer can be performed at a lower cost in terms of energy and bandwidth consumption.
Another benefit of using CoAP in a streaming application is native interoperability between the application and the CoAP-compliant IoT domain. The application can communicate directly with IoT objects to get universal and fast access to sensor observations or actuator inputs. CoAP offers an observe option  that enables the implementation of a publish/subscribe interaction model. Using it, sensors can push information to the streaming application whenever the state of the monitored quantity changes. As a result, a video streaming service can easily be enhanced with knowledge of the physical environment and interaction with it. The all-CoAP model simplifies the structure of such an enhanced streaming application and reduces its development costs, since all communication is performed using the same platform (at least at the application level). In addition, the speed and reliability of the application may increase, because there is no need for additional devices responsible for protocol translation at the application layer (e.g. HTTP-CoAP proxies). Considering the upcoming IEEE 802.11ah (a.k.a. Wi-Fi HaLow) standard that enables low-power connectivity , the protocol stack matching between the user’s handheld device running the DASCo player and IoT objects spans from the application layer to the data link layer (in the case of 802.15.4 sensors, their interoperability with the user device is usually provided through layer 2 gateways located at the edge of the wireless sensor network).
CoAP constraints for media streaming
The central premise of CoAP was to provide RESTful access with low communication overhead to scalar data acquired by resource-constrained sensors, or to send short commands to actuators. The CoAP header has a fixed length of only 4 bytes and can be extended with additional options. As a result, CoAP’s average transaction size in bytes, in a typical IoT scenario, is almost 10 times smaller than that of HTTP . Several approaches have therefore been published in the recent literature that consider CoAP as a lightweight transfer protocol for scenarios other than interaction with IoT devices. For example, A. Eriksson et al. in  propose a solution that uses CoAP to implement point-to-multipoint communication for Information-Centric Networking (ICN) services such as content distribution or live video streaming. The authors of  exploited CoAP to handle large amounts of data during deterministic data transfer in an Industrial Ethernet environment. Nevertheless, streaming of multimedia content differs from the transfer of scalar data generated by typical sensors, since it involves the transmission of much larger data sets and, particularly in the case of a live streaming service, requires delay-sensitive communication.
The standard CoAP model includes two abstract layers. The upper layer implements REST semantics with a Request/Response communication scheme realized by means of the GET, PUT, POST and DELETE methods. The lower layer controls the transfer of CoAP messages over the underlying UDP protocol. This messaging layer supports two modes: reliable and unreliable. Reliable data transfer is based on retransmission and timeout mechanisms and is carried out by confirmable (CON) messages. Each CON message must be acknowledged with an ACK message before the timeout expires. Otherwise, the message is retransmitted. Non-confirmable (NON) messages, on the other hand, are used for unreliable communication, as their sender does not expect an ACK as confirmation. All CoAP messages are identified by a Message ID (MID), which is used to detect duplicates and to bind an ACK to its CON message during reliable communication.
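As a rough illustration of the messaging layer described above, the fixed 4-byte CoAP header can be packed and parsed in a few lines of Python. This is a minimal sketch rather than a protocol implementation: the bit layout and type codes follow RFC 7252, while the function names pack_header and unpack_header are chosen here purely for illustration.

```python
import struct

# Message types used by the CoAP messaging layer (RFC 7252).
CON, NON, ACK, RST = 0, 1, 2, 3
GET = 0x01       # request code 0.01
CONTENT = 0x45   # response code 2.05


def pack_header(msg_type, code, message_id, token_length=0, version=1):
    """Pack the fixed header: Ver(2 bits) | T(2) | TKL(4) | Code(8) | MID(16)."""
    first_byte = (version << 6) | (msg_type << 4) | token_length
    return struct.pack("!BBH", first_byte, code, message_id)


def unpack_header(data):
    """Parse the first 4 bytes of a CoAP message into its header fields."""
    first_byte, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first_byte >> 6,
        "type": (first_byte >> 4) & 0x3,
        "token_length": first_byte & 0xF,
        "code": code,
        "message_id": message_id,
    }


# A confirmable GET and the ACK that is bound to it through the same MID.
request = pack_header(CON, GET, message_id=0x1234)
reply = pack_header(ACK, CONTENT, message_id=0x1234)
```

The shared Message ID is what lets the client match the ACK to its outstanding CON message.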
To limit the communication overhead resulting from packet fragmentation at the IP layer, the CoAP specification provides a default upper bound of 1024 bytes for the message payload size. Since the size of a video segment usually exceeds this value, media streaming through CoAP can be performed in the block-wise fashion specified in . The block-wise option introduces an intermediate abstract layer, as depicted in Fig. 2, which is responsible for segmenting and resequencing operations.
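The segmenting and resequencing role of the block-wise layer can be sketched as follows. This is an illustrative simplification that works on an in-memory byte string; the helper names split_into_blocks and reassemble are hypothetical and do not come from any CoAP library.

```python
def split_into_blocks(payload, block_size=1024):
    """Yield (num, more, chunk) triples, one per block of the payload."""
    total = len(payload)
    for num, offset in enumerate(range(0, total, block_size)):
        chunk = payload[offset:offset + block_size]
        more = offset + block_size < total  # True while further blocks follow
        yield num, more, chunk


def reassemble(blocks):
    """Resequence blocks by their number and concatenate the chunks."""
    ordered = sorted(blocks, key=lambda block: block[0])
    return b"".join(chunk for _, _, chunk in ordered)


segment = bytes(2500)  # stand-in for a small video segment
blocks = list(split_into_blocks(segment))
```

Here a 2500-byte payload yields three blocks, the last of which carries the remaining 452 bytes with the "more" flag cleared.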
The block-wise layer divides a payload into a set of blocks. The client sends a block request as a CON message and the server replies with an ACK that carries a piggybacked response with the requested block. Each block transfer is handled separately by the messaging layer, and the CoAP specification , by default, allows only one pending interaction between the client and a given server at a time. Consequently, between the generation of a request and the receipt of the complete response at the request/response layer, the message layer performs a series of individual block downloads in a Stop-and-Wait manner (see Fig. 3). This Stop-and-Wait data transfer is very inefficient for a streaming service for the following reasons:
A single block request results in the transmission of exactly one block. In networks characterized by a long Round Trip Time (RTT), the client spends most of its time waiting for the request to arrive at the server and for the server’s response to come back. In  an improvement has been proposed that allows the server to send several blocks in response to one block request. However, this solution is not compatible with the current CoAP specification because it requires modifications in the messaging layer.
One block carries a limited amount of data, since a block-wise option header defines the maximum block size as 1024 bytes only .
In the reliable transfer mode, CoAP implements a simple congestion control mechanism with a retransmission timeout and exponential back-off between retransmissions. The default values of the retransmission parameters defined by the CoAP specification are very conservative. The initial timeout is chosen randomly between 2 and 3 s and is doubled on every successive timeout expiration. These values reflect the characteristics of wireless sensor networks, where a response can be delayed because an endpoint has entered sleep mode to save energy. Nevertheless, the default CoAP congestion control may lead to significant delays in the media streaming process when packet losses occur. Consequently, delivery of a whole media segment, which consists of many blocks, may take too long, resulting in undesired freezing events during video playout.
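To make the cost of the default back-off concrete, the timeout schedule for a single CON message can be computed directly. The sketch below assumes the RFC 7252 defaults (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5, MAX_RETRANSMIT = 4); the value of MAX_RETRANSMIT is not stated in the text above, and the function names are illustrative.

```python
ACK_TIMEOUT = 2.0        # seconds, lower bound of the initial timeout
ACK_RANDOM_FACTOR = 1.5  # initial timeout is drawn from [2 s, 3 s]
MAX_RETRANSMIT = 4       # RFC 7252 default, assumed here


def timeout_schedule(initial_timeout, max_retransmit=MAX_RETRANSMIT):
    """Timeouts waited after the first transmission and each retransmission."""
    return [initial_timeout * 2 ** attempt for attempt in range(max_retransmit + 1)]


def worst_case_wait(max_retransmit=MAX_RETRANSMIT):
    """Longest time a sender may wait before giving up on one message."""
    slowest_initial = ACK_TIMEOUT * ACK_RANDOM_FACTOR
    return sum(timeout_schedule(slowest_initial, max_retransmit))


best_case = timeout_schedule(ACK_TIMEOUT)
worst_case = timeout_schedule(ACK_TIMEOUT * ACK_RANDOM_FACTOR)
```

With these defaults a single lost block can stall the download for several seconds, and a block that is never acknowledged occupies the sender for up to 93 s, which is far longer than a typical playout buffer.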
The continuity of multimedia transmission can be provided by using NON messages in a block-wise transfer. In this case, the client can send a number of block requests without waiting for receiving a response for the previous request. However, the non-confirmable transfer mode may result in too high and uncontrolled packet losses, particularly in a congested wireless environment. Furthermore, the advantage of using TCP in DASH streaming is not only a strict lossless transmission, since multimedia application can tolerate the loss of single packets . Another benefit of the congestion control mechanism provided by TCP is its ability to prevent congestion in multimedia transmission while simultaneously keeping the ongoing multimedia sessions. With a non-confirmable transmission, the CoAP client may generate a number of requests that exceeds network and/or server capabilities, thereby introducing a distortion in multimedia streaming service.
All the above mentioned obstacles should be carefully considered during the design of adaptive streaming in a CoAP-compliant environment.
Dynamic adaptive streaming over CoAP – DASCo
The proposed DASCo streaming framework defines the communication principles between a DASCo client application (i.e. a media player) and a standard CoAP server. DASCo maintains compliance with the DASH specification; it therefore uses similar metadata formats and methods, but relies on a different delivery protocol. Information about the delivery protocol is included in the MPD file, which instructs the client application how to construct a segment’s URL (e.g., by concatenating the <BaseURL> and <SegmentURL> elements from the MPD).
CoAP follows the REST architecture principles and uses a URI (Uniform Resource Identifier) to name resources hosted by a server. For DASCo purposes, two kinds of resources bound to a streaming service can be registered on the CoAP server. The first one is mpd; the associated URI, coap://streaming_server_host/mpd/, refers to a list of all MPD files hosted by the streaming server. The second resource is media_content. When a service operator wants to provide new content for users, for example a video recorded by a camera marked as camera1, he/she registers a new resource camera1video as a child resource of media_content, using the POST method. Next, the streaming server segments the uploaded media content and transcodes the segments to different representations. Afterwards, the server creates an MPD file for the given content and registers the file as a child resource of mpd.
The specification  provides a resource discovery mechanism with resource descriptions for the CoAP environment, which a client application can use to query a server for its list of hosted MPD files. A user who wants to stream the new media content sends a request to the server to obtain the related MPD file, for example:
Next, based on metadata available in the obtained MPD, user’s application consecutively requests media segments:
where the query parameter repID indicates the requested segment’s representation, and no refers to the sequence number of the segment. If the segmentation process is implemented by means of a byte range option, the range attribute should be sent as a query parameter instead of the segment’s sequence number:
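The request forms described above can be sketched as simple URI builders. The exact URIs of the original examples are not reproduced here, so the path layout (an MPD named after the content under mpd, segments addressed under media_content) and the "first-last" format of the range value are assumptions made for illustration; only the host name, the resource names and the query parameter names repID, no and range come from the text.

```python
from urllib.parse import urlencode

SERVER = "coap://streaming_server_host"  # host name as used in the text


def mpd_uri(content):
    """URI of the MPD file for a given content (assumed path layout)."""
    return f"{SERVER}/mpd/{content}"


def segment_uri(content, rep_id, no):
    """URI of a segment selected by representation and sequence number."""
    query = urlencode({"repID": rep_id, "no": no})
    return f"{SERVER}/media_content/{content}?{query}"


def byte_range_uri(content, rep_id, first_byte, last_byte):
    """URI of a segment selected by byte range (assumed 'first-last' format)."""
    query = urlencode({"repID": rep_id, "range": f"{first_byte}-{last_byte}"})
    return f"{SERVER}/media_content/{content}?{query}"
```

A client would issue a GET for mpd_uri("camera1video") first and then request segments one by one with increasing sequence numbers.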
A DASCo streaming server can easily make the same content available outside the CoAP domain, via an HTTP service for standard DASH clients. This can be done by deploying two manifest files in parallel, a DASCo MPD and a DASH MPD, both having almost the same structure. The only difference lies in the values of the elements that identify the delivery protocol in a segment’s URL, such as <BaseURL>. On-the-fly conversion between DASCo and DASH MPDs may also be applied when a request for a DASH MPD arrives, since it requires nothing more than swapping the URL prefix from “coap” to “http” (alternatively, the values of the <SegmentURL> and <SegmentTemplate> elements may need to be changed if the segment’s location in DASH is to be specified as a plain path to the segment file rather than as a URI of a REST resource, as it is in CoAP). Deploying DASH streaming together with DASCo is even easier when only relative URLs are used. In this case the MPD file does not contain any information about the URL of the host where segments are stored, and thus the MPD does not specify a delivery protocol. The absolute URL of a segment, including the protocol part, is then constructed from the service context, i.e. the location of video segments is resolved based on the location of the associated MPD file. As a result, the streaming server can use the same MPD as a response to a request received via CoAP (DASCo) or HTTP (DASH).
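The on-the-fly conversion described above amounts to rewriting the scheme of absolute URLs in the manifest. The following sketch uses only the Python standard library and handles <BaseURL> elements; the function names are hypothetical, and a real deployment might also need to adjust ports or the <SegmentURL> and <SegmentTemplate> values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def swap_scheme(url, new_scheme="http"):
    """Replace a coap:// prefix; relative or non-CoAP URLs are left unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "coap":
        return url
    return urlunsplit((new_scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def dasco_to_dash(mpd_xml):
    """Return a copy of the MPD in which every BaseURL points to HTTP."""
    root = ET.fromstring(mpd_xml)
    for element in root.iter():
        # endswith() also matches namespaced tags such as "{urn:...}BaseURL".
        if element.tag.endswith("BaseURL") and element.text:
            element.text = swap_scheme(element.text.strip())
    return ET.tostring(root, encoding="unicode")


dasco_mpd = (
    "<MPD><BaseURL>coap://streaming_server_host/media_content/camera1video/"
    "</BaseURL></MPD>"
)
dash_mpd = dasco_to_dash(dasco_mpd)
```

Because relative URLs pass through unchanged, an MPD that uses only relative references needs no conversion at all, which matches the last case discussed above.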
It is worth noting that placing the streaming server in the same network domain as the video sources and DASCo clients, as presented in Fig. 1, results in low response times, which is particularly crucial for a live streaming service (or rather a “quasi-live” one, since we allow a small delay related to the transcoding and segmentation processes). Moreover, the server can gather statistics about the network and then use them during segmentation and transcoding in order to better adapt the created segments to network conditions. Such cooperation between the codec and the streaming protocol increases the overall efficiency of the adaptive streaming system .
As discussed in subsection 3.1, the default CoAP retransmission timeout (initially between 2 and 3 s) is relatively conservative and may lead to inefficiency in a streaming service. To cope with this issue, monitoring and self-tuning mechanisms should be incorporated into the DASCo client application to estimate the timeout value based on RTT measurements. Such an approach is in line with the CoAP specification , which allows an application to dynamically adjust the values of different transmission parameters. An example of a timeout tuning mechanism is presented in .
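One possible way to derive the timeout from RTT measurements is a smoothed estimator in the style of the TCP retransmission timer (RFC 6298). The sketch below is not the mechanism from the cited work; it is a stand-in that illustrates the idea, and the class name, the 0.1 s floor and the 2 s starting value are assumptions.

```python
class RttTimeoutEstimator:
    """Smoothed RTT estimator in the style of RFC 6298 (assumed parameters)."""

    def __init__(self, alpha=0.125, beta=0.25, k=4, min_timeout=0.1, initial=2.0):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.min_timeout = min_timeout  # hypothetical floor, in seconds
        self.srtt = None                # smoothed RTT
        self.rttvar = None              # RTT variation
        self.timeout = initial          # start from the CoAP default

    def update(self, sample):
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        self.timeout = max(self.min_timeout, self.srtt + self.k * self.rttvar)
        return self.timeout


estimator = RttTimeoutEstimator()
first = estimator.update(0.1)   # one block acknowledged after 100 ms
second = estimator.update(0.1)  # a second, identical sample
```

On a path with a stable 100 ms RTT the estimate settles well below the 2 s default, so a lost block is detected much earlier.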
Moreover, higher performance of the streaming service can be achieved by increasing the maximum message payload size beyond the default 1024 bytes. For example, WiFi devices usually use an MTU of 1500 bytes, which allows allocating, without unnecessary packet fragmentation, 1492 bytes for the CoAP message (8 bytes are consumed by the UDP header) and 1364 bytes for the message payload (we assume that 128 bytes are reserved for the CoAP header and potential CoAP options). Although the specification permits increasing the message payload size, unfortunately this does not apply to block-wise transmission. The reason lies in the structure of the Block option that is used during block-wise transfer. The SZX (“size exponent”) field of this structure, which determines the size of the transferred block, has a fixed length of 3 bits, and the largest block size it can encode is 1024 bytes. Therefore, bandwidth utilization can be increased through a larger message payload only by using standard CoAP messages, without the block-wise transfer mode. Such a solution significantly increases the complexity of DASCo applications, because they need to implement additional mechanisms responsible for fragmenting the video segment into smaller pieces that match the message payload size (at the server side), assembling them at the client, detecting duplicate messages, etc.
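The block size limit follows directly from the encoding of the Block option. As a brief sketch based on the block-wise transfer specification (RFC 7959), the block size is 2^(SZX + 4) bytes with SZX ranging from 0 to 6 (the value 7 is reserved), and the option value packs the block number, the "more" flag and SZX; the function names below are illustrative.

```python
def block_size(szx):
    """Block size in bytes encoded by a 3-bit SZX value."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be in 0..6 (7 is reserved)")
    return 2 ** (szx + 4)


def encode_block_option(num, more, szx):
    """Pack NUM, the M flag and SZX into the integer value of a Block option."""
    block_size(szx)  # validates szx
    return (num << 4) | (int(more) << 3) | szx


def largest_szx_for(payload_limit):
    """Largest SZX whose block fits into the given payload limit."""
    fitting = [szx for szx in range(7) if block_size(szx) <= payload_limit]
    if not fitting:
        raise ValueError("payload limit is smaller than the minimum block size")
    return max(fitting)
```

Even with the 1364-byte payload budget mentioned above, largest_szx_for(1364) still returns 6, i.e. a 1024-byte block, so the extra room cannot be used in block-wise mode.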
Designing an efficient adaptation algorithm for DASCo is out of the scope of this paper. Many proposals addressing the problem of how to select the highest-bitrate representation of a segment whose bitrate does not exceed the download rate have been published in the last few years. In general, the adaptation algorithms proposed for DASH can be classified into three basic classes: rate-based, buffer-based and time-based (a comprehensive overview of the main features of each class is presented in [6, 30]). Choosing the most appropriate algorithm for the CoAP environment is an interesting research challenge. Time-based adaptation algorithms, such as those presented in [6, 43], which operate on the download times of media segments, are worth considering. Their important feature is awareness of varying segment sizes, which is relevant in the congested wireless environments with limited download rates that are specific to IoT.
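The selection problem stated above can be written down in its simplest, rate-based form. This sketch is only a baseline for illustration and not an algorithm proposed for DASCo; the fallback to the lowest representation when none fits is an assumption.

```python
def select_representation(bitrates_kbps, download_rate_kbps):
    """Pick the highest bitrate that does not exceed the measured download rate."""
    candidates = [rate for rate in bitrates_kbps if rate <= download_rate_kbps]
    if candidates:
        return max(candidates)
    # Nothing fits: fall back to the lowest representation (assumed policy).
    return min(bitrates_kbps)


representations = [250, 500, 1000, 2000]  # hypothetical bitrate ladder in kbit/s
```

For a measured rate of 1200 kbit/s this picks the 1000 kbit/s representation; a time-based algorithm would additionally account for the actual size of each segment.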
Given the Stop-and-Wait data transfer that results from the block-wise option, an adaptation algorithm designed for DASCo may include a feature that allows several segments and/or blocks to be downloaded in parallel. This can be achieved by increasing the value of the NSTART parameter. According to the CoAP standard , the default value of NSTART is 1, so only a single ongoing transaction between the client and a given server is allowed. An enhanced DASCo adaptation algorithm may decide not only on the representation of the downloaded segment, but also control the number of simultaneously downloaded segments by changing the NSTART value, depending on the monitored network conditions.
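The effect of NSTART on download time can be estimated with a deliberately idealized model in which every round of outstanding requests costs one RTT and transmission time and losses are ignored. The function below is a back-of-the-envelope sketch under these assumptions, not a performance model from the paper.

```python
import math


def download_time(segment_bytes, rtt_s, block_size=1024, nstart=1):
    """Idealized segment download time with nstart block requests in flight."""
    blocks = math.ceil(segment_bytes / block_size)
    rounds = math.ceil(blocks / nstart)  # each round costs one RTT
    return rounds * rtt_s


segment_size = 512 * 1024  # hypothetical 512 KiB segment
stop_and_wait = download_time(segment_size, rtt_s=0.05)
parallel = download_time(segment_size, rtt_s=0.05, nstart=4)
```

With a 50 ms RTT, the 512 blocks of this segment take about 25.6 s in pure Stop-and-Wait mode and about 6.4 s with four outstanding requests, which shows why NSTART is an attractive control knob.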
Furthermore, block-wise transmission gives a DASCo client the possibility to precisely control the segment download process, since the application can be aware of the download time of each block of a given segment. The client can know the total size of the requested segment (for example by using the Size2 option defined in the block-wise specification ) and how many blocks of the segment have already been downloaded. Based on this information, the DASCo adaptation algorithm can evaluate the progress of the download and, if necessary, decide to stop downloading the current segment and request a new segment with a lower representation (i.e. smaller size) to avoid a video playout stall. Such functionality is hard to implement in TCP-based DASH, since the entity that controls the TCP connection is the operating system. Moreover, tearing down a TCP connection at the application level is inefficient: the application does not know whether a request for the subsequent portion of data has already been sent by the TCP client (if so, the server response with the requested data is useless), and a costly establishment of a new connection (TCP three-way handshake, slow start phase) then needs to be performed.
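A possible form of such a progress check is sketched below. It is a hypothetical heuristic, not the authors’ algorithm: it extrapolates the throughput observed so far and aborts when the remaining bytes cannot arrive before the playout buffer runs empty. The parameter names and the decision rule are assumptions.

```python
def should_abort(total_bytes, received_bytes, elapsed_s, buffer_s):
    """Decide whether to abandon the current segment download.

    total_bytes    -- segment size, e.g. learned from the Size2 option
    received_bytes -- bytes of the segment downloaded so far
    elapsed_s      -- time spent on this segment so far
    buffer_s       -- seconds of video still available in the playout buffer
    """
    if received_bytes <= 0 or elapsed_s <= 0:
        return False  # no throughput estimate yet
    rate = received_bytes / elapsed_s  # observed bytes per second
    remaining_s = (total_bytes - received_bytes) / rate
    return remaining_s > buffer_s
```

For instance, after receiving 100 kB of a 1 MB segment in 2 s, the remaining 900 kB would need about 18 s at the observed rate, so with only 4 s of buffered video the client would switch to a lower representation.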
The CoAP specification also allows an application to change the value of the MAX_RETRANSMIT parameter, which defines the maximum number of retransmissions. With this feature, the DASCo adaptation algorithm can control the number of retransmission attempts, since multimedia content can tolerate some losses but often requires delay-sensitive communication, particularly in the case of live streaming.
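One way to exploit this parameter is to derive MAX_RETRANSMIT from a delay budget. The sketch below assumes the RFC 7252 back-off rule (timeout doubled on each attempt, initial timeout scaled by ACK_RANDOM_FACTOR in the worst case); the function name and the idea of a per-block delay budget are illustrative assumptions.

```python
def max_retransmit_for(budget_s, ack_timeout=2.0, ack_random_factor=1.5, ceiling=4):
    """Largest MAX_RETRANSMIT whose worst-case wait fits into budget_s.

    Returns 0 (no retransmissions) when even a single timeout exceeds the budget.
    """
    slowest_initial = ack_timeout * ack_random_factor
    best = 0
    for retransmits in range(ceiling + 1):
        # Sum of the doubling timeouts: initial * (2^(n+1) - 1).
        worst_case = slowest_initial * (2 ** (retransmits + 1) - 1)
        if worst_case <= budget_s:
            best = retransmits
    return best
```

With the default 2 s timeout, a 10 s budget leaves room for a single retransmission, whereas a tuned timeout of 0.2 s allows one retransmission within a 2 s budget, which is closer to what a live stream can tolerate.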