Big Data and Fog Computing
Fog Computing A model of distributed computing comprising virtualized, heterogeneous, commodity computing and storage resources for hosting applications, analytics, content, and services that are accessible with low latency from the edge of wide area networks where clients are present, while also having back-end connectivity to cloud computing resources.
Fog computing is an evolving computing paradigm that offers resources that lie between the edge devices and the cloud data center. It is motivated by the need to process high-speed, high-volume Big Data from the Internet of Things with low latency. This article discusses the characteristics of Fog computing, data-driven applications on it, and future research problems.
Ever since computers could connect over a network, computing paradigms have undergone cyclical phases in where within the network the computation is performed. While the original mainframes till the 1970s were large, centralized time-sharing machines accessed by multiple users through remote terminals, the personal computers (PCs) of the 1980s heralded local processing for individuals (Nelson and Bell 1986). The growth of local area networks (LAN), the Internet, and the World Wide Web (WWW) brought about client-server models in the 1990s where many clients could access content hosted on individual servers, though most of the processing was still done on the PC (Sinha 1992). Web services and e-commerce of the 2000s led to the growth of cloud computing, where computation once again skewed to centralized data centers, but with server farms rather than single servers hosting services that were consumed by PCs (Hwang et al. 2013). A complementary phenomenon in that decade was peer-to-peer (P2P) systems where PCs distributed across the Internet would work collaboratively for content sharing, to tackle bandwidth limitations of the Internet (Milojicic et al. 2002). Both cloud computing and P2P signaled the arrival of the Big Data age, where the ability to collect large enterprise, scientific and web datasets, and media content put an emphasis on being able to share and process them at large scales.
The decade of the 2010s is seeing a similar cyclical shift, but at a faster pace due to several technology advances. Starting with a more centralized cloud computing model hosting thousands of virtual machines (VMs), we have seen the rollout of pervasive broadband Internet and cellular network communication combined with the rapid growth of smartphones as general-purpose computing platforms backed by cloud computing. Internet of Things (IoT) is yet another paradigm, enabled by the convergence of these other technologies (Gubbi et al. 2013). Sensors and constrained devices connected to the Internet are being deployed to support vertical IoT domains such as personal fitness using wearables, smart utilities using metering infrastructure, and even self-driving cars. Both smartphones and IoT mark the advent of Edge Computing (or mobile cloud computing). Here, ubiquitous devices numbering in the billions are present at the edge of the wide area network (WAN) that is the Internet and host applications that either operate locally or serve as lightweight clients that publish data to or consume content from cloud services (Garcia Lopez et al. 2015; Fernando et al. 2013).
The limitations of an edge-only or a cloud-only model were recognized by Satyanarayanan et al., who introduced the concept of cloudlets (Satyanarayanan et al. 2009). These are resource-rich servers, relative to edge devices, that could host VMs while also being closer in the network topology to the edge devices, relative to cloud data centers. They are designed to overcome the constrained resources available on edge platforms while reducing the network latency expended in being tethered to the cloud for interactive applications. Fog computing generalizes (and popularizes) the notion of cloudlets.
The term “fog computing” was coined by Cisco and first appears publicly in a talk by Flavio Bonomi, Vice President and Fellow at Cisco Systems, as part of the Network-Optimized Computing at the Edge Of the Network Workshop, colocated with International Symposium on Computer Architecture (ISCA), in 2011 (Bonomi 2011). This was further described as extending the concept of cloud computing to the edge of the network to support low-latency and geo-distributed applications for mobile and IoT domains (Bonomi et al. 2014). Since then, fog computing has been evolving as a concept and covers a wide class of resources that sit between the edge devices and cloud data centers on the network topology, have capacities that fall between edge devices and commodity clusters on clouds, and may be managed ad hoc as a smart gateway or professionally as a computing infrastructure (Varshney and Simmhan 2017; Vaquero and Rodero-Merino 2014; Yi et al. 2015).
Fog computing is relevant in the context of wide-area distributed systems, with many clients at the edge of the Internet (Donnet and Friedman 2007). Such clients may be mobile or consumer devices (e.g., smartphone, smart watch, virtual reality (VR) headset) used interactively by humans or devices that are part of IoT (e.g., smart meter, traffic cameras and signaling, driverless cars) for machine-to-machine (M2M) interactions. The client may serve both as consumers of data or actuators that receive control signals (e.g., fitness notification on smart watch, signal change operation on traffic light), as well as producers of data (e.g., heart rate observations from smart watch, video streams from traffic cameras) (Dastjerdi and Buyya 2016).
Cloud and edge computing have their relative advantages. Cloud computing outsources computing infrastructure management to providers, who offer elastic access to seemingly infinite compute resources on demand that can be rented by the minute. Cloud resources are also cheaper due to economies of scale at centralized locations (Cai et al. 2017). Edge computing leverages the compute capacity of existing captive devices and reduces network transfers, both of which lower costs. There may also be enhanced trust and context available closer to the edge (Garcia Lopez et al. 2015).
While fog computing is still maturing, there are many reasons why its rise is inevitable, given the gaps in these two common computing approaches. The network latency from the edge client to the cloud data center is high and variable, averaging between 20 and 250 ms depending on the location of the client and data center (He et al. 2013). The network bandwidth between the edge and the cloud, similarly, averages at about 800–1200 KB/s. Both these mean that latency-sensitive or bandwidth-intensive applications will offer poor performance using a cloud-centric model due to the round-trip time between edge and cloud (Satyanarayanan et al. 2009, 2015). Another factor is the connectivity of devices to the Internet. Mobile devices may periodically be out of network coverage, causing cloud-centric applications to degrade or lose functionality (Shi et al. 2016).
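The impact of these latency and bandwidth figures can be seen with back-of-the-envelope arithmetic. The sketch below uses the cloud-path averages cited above; the payload size and the fog-path figures (a LAN-class link) are illustrative assumptions, not measurements from the cited studies:

```python
# Rough response-time estimate for offloading one request from an edge device:
# network round-trip delay plus the time to transfer the payload.

def response_time_ms(payload_kb, rtt_ms, bandwidth_kbps):
    """Round-trip network delay plus one-way transfer time, in milliseconds."""
    transfer_ms = payload_kb / bandwidth_kbps * 1000
    return rtt_ms + transfer_ms

payload_kb = 500  # e.g., one compressed camera frame (assumed size)

# Cloud path: worst-case average RTT and bandwidth cited above.
cloud = response_time_ms(payload_kb, rtt_ms=250, bandwidth_kbps=1000)
# Fog path: assumed single-hop LAN-class link.
fog = response_time_ms(payload_kb, rtt_ms=10, bandwidth_kbps=10000)

print(f"cloud path: {cloud:.0f} ms, fog path: {fog:.0f} ms")
```

Under these assumptions the cloud path costs 750 ms per request against 60 ms via the fog, which is the difference between an unusable and a responsive interactive application.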
Edge computing, while avoiding these network overheads, suffers from operating on constrained devices that have limited battery, compute, and memory capacities (Barbera et al. 2013). This reduces application performance and limits sustained processing, which can drain the battery. These devices may also be less robust and their network connectivity less available (Aral and Brandic 2017).
Fog computing serves as a computing layer that sits between the edge devices and the cloud in the network topology. Fog resources have more compute capacity than edge devices but much less than cloud data centers. They typically have high uptime and always-on Internet connectivity. Applications that make use of the fog can avoid the network performance limitations of cloud computing while being less resource constrained than edge computing. As a result, the fog offers a useful balance between the current paradigms.
The resource capacity in the fog can vary. At one end, Raspberry Pi devices with 1 GHz processors, 1 GB RAM, and 100 Mbps Ethernet may serve as gateway fog resources for lower-end edge devices. At the other end, fog resources could be provisioned as “micro” or “nano” data centers with clusters of servers or even accelerators present (Garcia Lopez et al. 2015). Individual fog devices can also have heterogeneous capacities (Chiang and Zhang 2016).
Fog resources tend to be more reliable and available than edge resources, though they lack the robust fail-over mechanisms that the cloud's large resource pool makes possible. This makes them well suited to serve as a layer for persisting data and services for the short and medium term. Fog resources themselves may be deployed in a stationary environment (e.g., coffee shop, airport, cell tower) or on a mobile platform (e.g., train, cab) (Chiang and Zhang 2016). This can affect the network connectivity of the fog with the cloud, in case it uses cellular networks for Internet access, and even its energy footprint (Jalali et al. 2017).
Fog deployment models are still emerging. These resources may be deployed within a public or a private network, depending on its end use. Smart city deployments may make them available to utility services within the city network (Yannuzzi et al. 2017), while retail shops and transit services can make them available to their customers. A wider deployment for public use on-demand, say, by cellular providers, cities, or even cloud providers, will make it comparable to cloud computing in terms of accessibility. These have implications on the operational costs as well (Vaquero and Rodero-Merino 2014).
The fog resources may be made available as-a-service, similar to cloud resources. These may be virtualized or non-virtualized infrastructure (Bittencourt et al. 2015), with containers offering a useful alternative to hypervisor-based VMs that may be too heavyweight for lower-end fog resources (Anand et al. 2017). However, there is still a lack of a common platform, and programming models are just emerging (Hong et al. 2013; Ravindra et al. 2017). Most applications that use the fog tend to be custom designed, and there has been only some theoretical work on scheduling applications across edge, fog, and cloud (Brogi and Forti 2017; Ghosh and Simmhan In Press).
Role of Big Data
One of the key rationales for deploying and using fog computing is the Big Data generated at the edge of the network, a trend accelerated by IoT deployments. Traditional web clients, which just consume services and content from the WWW, spurred the growth of content distribution networks (CDN) to serve that content with low latency. IoT sensors, in contrast, generate data at the clients that is pushed to the cloud (Cai et al. 2017). In this context, fog computing has been described as acting like an inverse CDN (Satyanarayanan et al. 2015).
A large swathe of IoT data comes as observational streams, or time-series data, from widely distributed sensors (Naas et al. 2017; Shukla et al. 2017). These data streams vary in their rates – from once every 15 min for smart utility meters, to once a second for heart rate monitoring by a fitness watch, to 50 Hz for phasor measurement units (PMU) in smart power grids – and the number of sensors can range in the millions for city-scale deployments. These are high-velocity data streams that are latency sensitive and need online analytics and decision-making to, say, provide health alerts or manage power grid behavior (Simmhan et al. 2013). Here, fog computing can help move the decision-making close to the edge to reduce latency.
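The aggregate velocity of such streams can be estimated from the reporting intervals above. In the sketch below, the per-sensor rates are the ones mentioned in the text, while the deployment sizes are illustrative assumptions for a city-scale scenario:

```python
# Aggregate event rate for an illustrative city-scale IoT deployment.
# Reporting intervals follow the text; the sensor counts are assumed.

sensors = [
    # (name, deployed count, events per sensor per second)
    ("smart meters", 1_000_000, 1 / (15 * 60)),  # one reading every 15 min
    ("fitness monitors", 100_000, 1.0),          # one reading per second
    ("PMUs", 1_000, 50.0),                       # 50 Hz phasor measurements
]

total_eps = sum(count * rate for _, count, rate in sensors)
print(f"aggregate rate: {total_eps:,.0f} events/sec")
```

Notably, a thousand 50 Hz PMUs generate far more events per second than a million smart meters, so fog placement decisions depend on stream rates, not just sensor counts.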
Another class of high-volume data that is emerging is from video streams from traffic and surveillance cameras, for public safety, intelligent traffic management, and even driverless cars (Aazam et al. 2016). Here, the bandwidth consumed in moving the data from the edge to the cloud can be enormous as high-definition cameras become cheap but network capacity growth does not keep pace (Satyanarayanan et al. 2015). The applications that need to be supported can span real-time video analytics to just recording footage for future use. Fog computing can reduce the bandwidth consumed in the core Internet and limit data movement to the local network. In addition, it can offer higher compute resources and accelerators to deploy complex analytics as well.
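A similar estimate shows why camera feeds stress the core network. The camera count, per-stream bitrate, and event fraction below are illustrative assumptions used only to make the bandwidth argument concrete:

```python
# Backhaul bandwidth needed to ship all camera feeds to the cloud,
# versus filtering at fog nodes and forwarding only detected events.
# All parameters are illustrative assumptions.

cameras = 1_000
mbps_per_stream = 4          # assumed bitrate of one compressed HD stream

core_gbps = cameras * mbps_per_stream / 1_000
print(f"WAN bandwidth if all feeds go to the cloud: {core_gbps:.1f} Gbps")

# If fog nodes run the video analytics and forward only flagged segments:
event_fraction = 0.01        # assumed: 1% of footage is of interest
print(f"with fog-side filtering: {core_gbps * event_fraction:.2f} Gbps")
```

Even at these modest assumed rates, a single city's cameras saturate multi-gigabit WAN links, while fog-side filtering keeps the bulk of the traffic within the local network.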
In addition, telemetry data from monitoring the health of the IoT fabric itself may form a large corpus (Yannuzzi et al. 2017). Related to this is provenance that describes the source and processing done on the distributed devices that may be essential to determine the quality and veracity of the data. Fog can help with the collection and curation of such ancillary data streams as well.
Data archival is another key requirement within such applications (Cai et al. 2017; Vaquero and Rodero-Merino 2014). Besides online analytics, the raw or preprocessed observational data may need to be persisted for medium or long term to train models for machine learning (ML) algorithms, for auditing to justify automated decisions, or to analyze on-demand based on external factors. Depending on the duration of persistence, data may be buffered in the fog either transiently or for movement to the cloud during off-peak hours to shape bandwidth usage. Data can also be filtered or aggregated to send only the necessary subset to the cloud.
Lastly, metadata describing the entities in the ecosystem will be essential for information integration from diverse domains (Anand et al. 2017). These can be static or slow changing data or even complex knowledge or semantic graphs that are constructed. They may need to be combined with real-time data to support decision-making (Zhou et al. 2017). The fog layer can play a role in replicating and maintaining this across distributed resources closer to the edge.
There has been rapid progress on Big Data platforms on clouds and clusters, with frameworks like Spark, Storm, Flink, HBase, Pregel, and TensorFlow helping store and process large data volumes, velocities, and semi-structured data. Clouds also offer these platforms as a service. However, there is a lack of programming models, platforms, and middleware to support various processing patterns necessary over Big Data at the edge of the network that can effectively leverage edge, fog, and cloud resources (Pradhan et al. In Press).
Examples of Applications
The applications driving the deployment and need for fog computing are diverse. But some requirements are recurrent: low latency processing, high volume data, high computing or storage needs, privacy and security, and robustness. These span virtual and augmented reality (VR/AR) applications and gaming (Yi et al. 2015), Industrial IoT (Chiang and Zhang 2016), and field support for the military (Lewis et al. 2014). Some of the emerging and high-impact applications are highlighted below.
Smart cities are a key driver for fog computing, and these are already being deployed. The Barcelona city’s “street-side cabinets” offer fog resources as part of the city infrastructure (Yannuzzi et al. 2017). Here, audio and environment sensors, video cameras, and power utility monitors are packaged alongside compute resources and network backhaul capacity as part of fog cabinets placed along streets. These help aggregate data from sensors, perform basic analytics, and also offer WiFi hot spots for public use. As an example, audio analytics at the fog helps identify loud noises that then triggers a surveillance camera to capture a segment of video for further analysis. Similar efforts are underway at other cities as well (Amrutur et al. 2017). These go toward supporting diverse smart city applications for power and water utilities, intelligent transport, public safety, etc., both by the city and by app developers.
One of the major drivers for such city-scale fog infrastructure is likely to be video surveillance that is starting to become pervasive in urban spaces (Satyanarayanan et al. 2015). Such city or even crowd-sourced video feeds form meta-sensors when combined with recent advances in deep neural networks (DNN). For example, feature extraction and classification can count traffic and crowds, detect pollution levels, identify anomalies, etc. As such, they can replace myriad other environment sensors when supported by real-time analytics. Such DNN and ML algorithms are computationally costly to train, and even to apply for inference, and can make use of fog resources with accelerators (Khochare et al. 2017).
Wearables are playing a big role in not just personal fitness but also as assistive technologies in healthcare. Projects have investigated the use of such on-person monitoring devices to detect when stroke patients have fallen and need external help (Cao et al. 2015). Others use eye-glass cameras and head-up displays (HUDs) to offer verbal and cognitive cues to Alzheimer’s patients suffering from memory loss, based on visual analytics (Satyanarayanan et al. 2009). Predictive analytics over brain signals monitored from EEG headsets have been used for real-time mental state monitoring (Sadeghi et al. 2016). These are then used to mitigate external conditions and stimuli that can affect patients’ mental state.
All these applications require low latency and reliable analytics to be performed over observations that are being collected by wearables. Given that such devices need to be lightweight, the computation is often outsourced to a smartphone or a server that acts as a fog resource and to which the observational data is passed for computation. There are also decisions to be made with respect to what to compute on the wearable and what to communicate to the fog, so as to balance the energy usage of the device.
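The compute-versus-communicate decision above is often framed as a simple energy comparison: offload when transmitting the input costs less energy than computing on it locally. The sketch below follows that classic mobile-offloading model; the device parameters are illustrative assumptions, not measurements of any real wearable:

```python
# Local-vs-offload decision for a wearable, comparing the energy to
# compute a task on-device against the radio energy to ship its input
# to a fog resource. All coefficients are assumed for illustration.

def should_offload(cycles, data_bytes,
                   joules_per_cycle=1e-9,    # assumed CPU energy per cycle
                   joules_per_byte=5e-7):    # assumed radio energy per byte
    """True when transmitting the input is cheaper than computing locally."""
    local_energy = cycles * joules_per_cycle
    offload_energy = data_bytes * joules_per_byte
    return offload_energy < local_energy

# A heavy analytics step over a small sensor window favors offloading:
print(should_offload(cycles=5e9, data_bytes=2_000))
# A light computation over bulky data is better done on the wearable:
print(should_offload(cycles=1e6, data_bytes=100_000))
```

Real systems also factor in network latency, deadlines, and the fog resource's own load, but the same compute-versus-communicate structure underlies those richer schedulers.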
Driverless cars and drones are emerging application domains where fog computing plays a key role (Chiang and Zhang 2016). Both these platforms have many onboard sensors and real-time analytics for autonomous mobility. Vehicular networks allow connected cars to communicate with each other (V2V) to cooperatively share information to make decisions on traffic and road conditions (Hou et al. 2016). These can be extended to share compute capacity to perform these analytics. This allows a collection of parked or slow-moving vehicles to form a fog of resources among proximate ones. These may complement occasional roadside units that offer connectivity with the cloud. Here, the entities forming the fog can change dynamically and require distributed resource coordination.
Drones are finding use in asset monitoring, such as inspecting gas pipelines and power transmission towers at remote locations (Loke 2015). Here, a mobile base station has a digital control tether with a collection of drones that follow a preset path to collect observational data. The drones have limited compute capacity to increase their endurance and need to prioritize their use for autonomous navigation. So they typically serve as “data mules,” collecting data from sensors along the route but doing limited onboard processing (Misra et al. 2015). However, when situations of interest arise, they can make use of their local capacity, resources available with nearby drones, or fog resources in the base station, while the trip is ongoing. This can help decide if an additional flypast is necessary.
Fog computing is still exploratory in nature. While it has gained recent attention from researchers, many more topics need to be examined further.
Many of the proposed fog architectural designs have not seen large-scale deployments and are just plausible proposals. While city-scale deployments with 100s of fog devices are coming online, the experiences from their operations will inform future design (Yannuzzi et al. 2017). Network management is likely to be a key technical challenge as traffic management within the metropolitan area network (MAN) gains importance (Vaquero and Rodero-Merino 2014). Unlike static fog resources, mobile or ad hoc resources such as vehicular fog will pose challenges of resource discovery, access, and coordination (Hou et al. 2016; He et al. In Press). Resource churn will need to be handled through intelligent scheduling (Garcia Lopez et al. 2015). Resources will also require robust adaptation mechanisms based on the situational context (Preden et al. 2015). Open standards will be required to ensure interoperability. To this end, there are initial efforts on defining reference models for fog computing (Byers and Swanson 2017).
Tracking and managing content across edge, fog, and cloud will be a key challenge. Part of this is the result of devices in IoT acting as data sources and compute platforms, which necessitates coordination across the cyber and physical worlds (Anand et al. 2017). The generation of event streams that are transient and need to be processed in a timely manner poses additional challenges to the velocity dimension of Big Data (Naas et al. 2017). Data discovery, replication, placement, and persistence will need careful examination in the context of wide area networks and transient computing resources. Sensing will need to be complemented with “sensemaking” so that data is interpreted correctly by integrating multiple sources (Preden et al. 2015).
Programming Models and Platforms.
Despite the growth of edge and fog resources, there is a lack of a common programming abstraction or runtime environments for defining and executing distributed applications on these resources (Stoica et al. 2017). There has been some preliminary work in defining a hierarchical pattern for composing applications that generate data from the edge and need to incrementally aggregate and process them at the fog and cloud layers (Hong et al. 2013). They use spatial partitioning to assign edge devices to fogs. The ECHO platform offers a dataflow model to compose applications that are then scheduled on distributed runtime engines that are present on edge, fog, and cloud resources (Ravindra et al. 2017). It supports diverse execution engines such as NiFi, Storm, and TensorFlow, but the user couples the tasks to a specific engine. A declarative application specification and Big Data platform are necessary to ease the composition of applications in such complex environments.
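The hierarchical edge-fog-cloud pattern described above can be sketched in a few lines: edge devices emit readings, each fog node incrementally aggregates its assigned spatial partition of edges, and the cloud combines the per-fog partials into a global result. This is a minimal illustration of the pattern, not the API of any of the cited platforms; the topology and the averaging reduction are assumptions:

```python
# Minimal sketch of hierarchical edge -> fog -> cloud aggregation.

def edge_readings(device_id):
    """Stand-in for a short window of a sensor stream from one edge device."""
    return [device_id * 10 + i for i in range(3)]

def fog_aggregate(edge_ids):
    """Partial aggregation at a fog node over its spatial partition of edges."""
    values = [v for e in edge_ids for v in edge_readings(e)]
    return sum(values), len(values)   # (partial sum, count)

def cloud_combine(partials):
    """Global average computed in the cloud from per-fog partials."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Two fog nodes, each assigned two edge devices by spatial partitioning:
partials = [fog_aggregate([1, 2]), fog_aggregate([3, 4])]
print(cloud_combine(partials))
```

The key property is that each fog node forwards only a constant-size partial (sum and count) rather than raw readings, which is what makes the hierarchy bandwidth-efficient.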
Related to this are application deployment and resource scheduling. VMs are used in the cloud to configure the required environment, but they may prove to be too resource intensive for fog. Some have examined the use of just a subset of the VM’s footprint on the fog and migrating this image across resources to track the mobility of user(s) who access its services (Bittencourt et al. 2015). Resource scheduling on edge, fog, and cloud has also been explored, though often validated just through simulations due to the lack of access to large-scale fog setups (Gupta et al. 2017; Zeng et al. 2017). Spatial awareness and energy awareness are distinctive features that have been included in such schedulers (Brogi and Forti 2017; Ghosh and Simmhan In Press). Formal modeling of the fog has been undertaken as well (Sarkar and Misra 2016). Quality of Experience (QoE) as a user-centric alternative metric to Quality of Service (QoS) is also being examined (Aazam et al. 2016).
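The flavor of such schedulers can be conveyed with a toy heuristic: place each task on the lowest-latency tier that still has free capacity, spilling the overflow toward the cloud. This is an illustrative sketch, not any of the cited schedulers, and the tier latencies and capacities are assumed:

```python
# Toy latency-aware placement across edge, fog, and cloud tiers:
# greedily assign each task to the lowest-latency tier with free capacity.

tiers = [  # (name, assumed latency in ms, assumed capacity in task slots)
    ("edge", 5, 1),
    ("fog", 20, 3),
    ("cloud", 150, 1000),   # effectively unbounded
]

def place(tasks):
    free = {name: cap for name, _, cap in tiers}
    placement = {}
    for task in tasks:
        for name, _, _ in tiers:        # tiers are ordered by latency
            if free[name] > 0:
                free[name] -= 1
                placement[task] = name
                break
    return placement

print(place([f"t{i}" for i in range(6)]))
```

Real schedulers replace the single "slots" dimension with CPU, memory, bandwidth, energy, and mobility constraints, which is what makes the problem hard enough to warrant the simulation studies cited above.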
Such research will need to be revisited as the architectural and application models for fog computing become clearer, with mobility, availability, and energy usage of resources offering unique challenges (Shi et al. 2012).
Security, Privacy, Trust.
Unlike cloud computing, where there is a degree of trust in the service provider, fog computing may contain resources from diverse and ad hoc providers. Further, fog devices may not be physically secured like a data center and may be accessible by third parties (Chiang and Zhang 2016). Containerization does not offer the same degree of sandboxing between multiple tenant applications that virtualization does. Hence, data and applications in the fog operate within a mix of trusted and untrusted zones (Garcia Lopez et al. 2015). This requires constant supervision of the device, fabric, applications, and data by multiple stakeholders to ensure that security and privacy are not compromised. Techniques like anomaly detection, intrusion detection, moving target defense, etc. will need to be employed. Credential and identity management will be required. Provenance and auditing mechanisms will prove essential as well. These will need to be considered as first-class features when designing the fog deployment or the application.
- Aazam M, Huh EN (2014) Fog computing and smart gateway based communication for cloud of things. In: International conference on future internet of things and cloud (FiCloud)
- Aazam M, St-Hilaire M, Lung CH, Lambadaris I (2016) MeFoRE: QoE based resource estimation at fog to enhance QoS in IoT. In: International conference on telecommunications (ICT)
- Amrutur B et al (2017) An open smart city IoT test bed: street light poles as smart city spines. In: ACM/IEEE international conference on internet-of-things design and implementation (IoTDI)
- Anand N, Chintalapally A, Puri C, Tung T (2017) Practical edge analytics: architectural approach and use cases. In: IEEE international conference on edge computing (EDGE)
- Aral A, Brandic I (2017) Quality of service channelling for latency sensitive edge applications. In: IEEE international conference on edge computing (EDGE)
- Barbera MV, Kosta S, Mei A, Stefa J (2013) To offload or not to offload? The bandwidth and energy costs of mobile cloud computing. In: IEEE international conference on computer communications (INFOCOM)
- Beck MT, Werner M, Feld S, Schimper T (2014) Mobile edge computing: a taxonomy. In: International conference on advances in future internet
- Bittencourt LF, Lopes MM, Petri I, Rana OF (2015) Towards virtual machine migration in fog computing. In: IEEE international conference on P2P, parallel, grid, cloud and internet computing (3PGCIC)
- Bonomi F (2011) Cloud and fog computing: trade-offs and applications. In: Workshop on network-optimized computing at the edge of the network (EON)
- Byers C, Swanson R (2017) OpenFog reference architecture for fog computing. Technical report, OpenFog Consortium
- Cai H, Xu B, Jiang L, Vasilakos AV (2017) IoT-based big data storage systems in cloud computing: perspectives and challenges. IEEE Internet Things J 4(1):75–87
- Cao Y, Chen S, Hou P, Brown D (2015) FAST: a fog computing assisted distributed analytics system to monitor fall for stroke mitigation. In: IEEE international conference on networking, architecture and storage (NAS)
- Ghosh R, Simmhan Y (In Press) Distributed scheduling of event analytics across edge and cloud. ACM Trans Cyber Phys Syst
- Gupta H, Vahid Dastjerdi A, Ghosh SK, Buyya R (2017) iFogSim: a toolkit for modeling and simulation of resource management techniques in the Internet of things, edge and fog computing environments. Softw Pract Exp 47(9):1275–1296
- He J, Wei J, Chen K, Tang Z, Zhou Y, Zhang Y (In Press) Multi-tier fog computing with large-scale IoT data analytics for smart cities. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2017.2724845
- Hwang K, Dongarra J, Fox GC (2013) Distributed and cloud computing: from parallel processing to the Internet of things. Morgan Kaufmann, Waltham, MA
- Jalali F, Khodadustan S, Gray C, Hinton K, Suits F (2017) Greening IoT with fog: a survey. In: IEEE international conference on edge computing (EDGE)
- Khochare A, Ravindra P, Reddy SP, Simmhan Y (2017) Distributed video analytics across edge and cloud using ECHO. In: International conference on service-oriented computing (ICSOC) demo
- Lewis G, Echeverría S, Simanta S, Bradshaw B, Root J (2014) Tactical cloudlets: moving cloud computing to the edge. In: IEEE military communications conference (MILCOM)
- Loke SW (2015) The Internet of flying-things: opportunities and challenges with airborne fog computing and mobile cloud in the clouds. Technical report 1507.04492v1, arXiv
- Milojicic DS, Kalogeraki V, Lukose R, Nagaraja K, Pruyne J, Richard B, Rollins S, Xu Z (2002) Peer-to-peer computing. Technical report HPL-2002-57, HP Labs
- Misra P, Simmhan Y, Warrior J (2015) Towards a practical architecture for internet of things: an India-centric view. IEEE Internet of Things (IoT) Newsletter
- Naas MI, Parvedy PR, Boukhobza J, Lemarchand L (2017) iFogStor: an IoT data placement strategy for fog infrastructure. In: IEEE international conference on fog and edge computing (ICFEC)
- Pradhan S, Dubey A, Khare S, Nannapaneni S, Gokhale A, Mahadevan S, Schmidt DC, Lehofer M (In Press) CHARIOT: goal-driven orchestration middleware for resilient IoT systems. ACM Trans Cyber-Phys Syst
- Ravindra P, Khochare A, Reddy SP, Sharma S, Varshney P, Simmhan Y (2017) ECHO: an adaptive orchestration platform for hybrid dataflows across cloud and edge. In: International conference on service-oriented computing (ICSOC)
- Sadeghi K, Banerjee A, Sohankar J, Gupta SK (2016) Optimization of brain mobile interface applications using IoT. In: IEEE international conference on high performance computing (HiPC)
- Shi C, Lakafosis V, Ammar MH, Zegura EW (2012) Serendipity: enabling remote computing among intermittently connected mobile devices. In: ACM international symposium on mobile ad hoc networking and computing (MobiHoc)
- Simmhan Y (2017) IoT analytics across edge and cloud platforms. IEEE IoT Newsletter. https://iot.ieee.org/newsletter/may-2017/iot-analytics-across-edge-and-cloud-platforms.html
- Stoica I, Song D, Popa RA, Patterson DA, Mahoney MW, Katz RH, Joseph AD, Jordan M, Hellerstein JM, Gonzalez J et al (2017) A Berkeley view of systems challenges for AI. Technical report, University of California, Berkeley
- Varshney P, Simmhan Y (2017) Demystifying fog computing: characterizing architectures, applications and abstractions. In: IEEE international conference on fog and edge computing (ICFEC)