It is necessary to know the typical topology and use cases for Intelligent Multi-Modal Security Systems (IMSS) to understand the security considerations.

Summary

The design of Intelligent Multi-Modal Security Systems (IMSS) has experienced major transformations: from the era when analog cameras were monitored by humans and recordings were stored on VHS tapes, to today, when an IP-networked, Deep Learning-driven system can efficiently augment humans with insightful information and recommendations. Intel expects further developments in this space and is enabling game-changing technologies that will usher us into the next generation of IMSS.

In this Chapter, we explore the various historical transformations of IMSS technologies and show you a glimpse of how Intel is changing the future by driving exponential changes end-to-end from endpoint sensor edge devices, at the network edge, and through the network infrastructure to the cloud.

Intel is making advances in new technologies in Machine Learning-based inferencing, computing devices, memory, storage, and security; this chapter shows how these advances allow IMSS System Architects to design for various constraints around cost, performance, security, privacy, and public policy.

  • Intel technology can add intelligence on the edge to optimize network bandwidth utilization, reduce storage and computing costs in the Data Center, and reduce human review time and fatigue.

  • Intel technologies like the OpenVINO™ toolkit make it easy to develop, deploy, and scale analytics intelligence on a variety of hardware platforms that optimize for performance, power, and cost.

  • Intel OpenVINO™ security Add-on (OVSA) solutions can be used to protect valuable analytics applications in transmission, storage, and at runtime. In addition, OVSA can provide privacy protections for video streams and analytics results.

  • Foundational Intel Security solutions ensure platform integrity, protect data, provide trusted execution environments, and accelerate end-to-end cryptographic operations.

  • Overall security robustness and system efficiency can be improved by taking an end-to-end system approach to security.

  • IMSS built with Intel Security technology help to support new privacy and public policy requirements, laws, and regulations.

  • Adding intelligent analytics to edge devices improves privacy protection in IMSS.

  • Intel Corporation’s advances in AI, memory, and compute device designs drive the future capabilities of IMSS by enabling the use of efficient sensor fusion technologies.

History of Intelligent Multi-modal Security System Solutions

Video 1.0 – Analog Video Technology

Over the past 15 years, new technology has profoundly changed the design of IMSS solutions. Before the 2000s, typical IMSS implementations were built around analog cameras; the recordings they made were spooled to VHS tapes on stand-alone systems. When an incident occurred, a security agent faced a time-intensive process of screening VHS tapes on a video monitor to find an incident. Sharing the video information with another investigator required a security team to manually retrieve a tape and transport it to the next agent, who would then spend even more time scrolling through the VHS tape.

In this analog camera era, security was simple physical security; systems were hardwired and the integrity of the wiring and the recorders and recording media was protected by limiting physical access to the system.

Video 2.0 – IP-Connected Cameras and Video Recorders

Starting in the early 2000s, Physical Security System technology adopted the Internet revolution with the Internet Protocol Camera (IPC), and with it a major shift in the digital recording process: the digital data was now stored on a local server (Networked Video Recorder – NVR) rather than on VHS tapes (Figure 2-1). A local security agent could quickly retrieve an incident while at their desk and decide what to do based on the screening. A digital clip could be forwarded electronically to the next agent in the investigation.

Figure 2-1
An illustration of the evolution of Video 3.0 and DSS 2.5. Both have an HD IP camera, with AI training or inference and storage or monitoring at the ends. In DSS, a wired or wireless LAN connects the NVR in the center. In Video 3.0, a wireless WAN also connects edge cloud AI training; a wired LAN or WAN is the next path.

Recent evolution in IMSS system designs

These systems were installed and maintained mostly by consumers and physical security professionals, not Information Technology security experts. The shift to digital video and IP cameras went unnoticed by remote network attackers for many years, but in 2016, the video systems industry got a wake-up call. Starting in September 2016, the Mirai botnet DDoS attack took down the Akamai-hosted Krebs on Security siteFootnote 1 with a worldwide botnet generating 620 Gbps of traffic from more than 600,000 networked devices, most of them IP cameras and video recorders. This was followed quickly by a similar attack on the Dyn servers (hosting Twitter, Spotify, Reddit, and others) in October 2016. The Mirai botnet exploited IP-connected devices with default or fixed remote login credentials.

The next innovation in IMSS brought basic cameras with intelligence in the form of traditional computer vision. However, these system designs placed higher demands on a Data Center for more intelligence and computing power. System designers off-loaded some of these demands by connecting basic cameras with intelligent edge servers, and then connecting those with Data Centers. Today, the new system designs include smart camera technology with intelligence at the sensor, at the edge, and in the Data Center.

The recent releases of Intel vision technology enable moving intelligence to devices at the edge. Intelligent edge devices make it possible to detect and properly annotate objects of interest in the video stream. Such objects (termed “annotated video data”) are then transmitted to the Data Centers, where they receive more computationally intensive analysis and operation.

Intelligent edge devices bring four major benefits for system designers in optimizing the system operation:

  • The optimal use of network bandwidth and storage resources: irrelevant or redundant data is discarded, and only the relevant data is transmitted to the Data Center for further analysis.

  • The optimal use of Data Center operators' time: they review only the annotated events, focusing attention on the important tasks.

  • The optimal use for review. When an administrator reviews captured and annotated data at the data center, personnel can quickly zero-in on potential areas of interest. This use case does not optimize the use of network bandwidth and storage resources; however, it greatly aids a human reviewer in finding and screening important events.

  • The ability to optimize for response latency through local analytics, or for accuracy and performance through connected high-performance systems at the network edge or in data centers.

Through the application of edge intelligence, video streams are now annotated with metadata that enables reviewers to find events of significance (e.g., a person of interest). However, as the number of cameras grows, and their resolution increases, more network capacity is required, raising the demand for more processing and storage resources in Data Centers.

In addition, performance and efficiency improvements in edge inferencing, bandwidth constraints and cost, and response latency constraints are driving the migration of analytics from the data center to on-premise edge video recorders, and even to edge cameras. Further bandwidth and storage benefits can be attained when only the frames or regions of interest can be upstreamed rather than the entire video stream.
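As a rough back-of-the-envelope model (illustrative numbers, not measured data; compressed bitrate does not scale perfectly linearly with cropped area), the bandwidth benefit of upstreaming only regions of interest can be sketched as:

```python
def upstream_bandwidth_mbps(stream_mbps, event_fraction, roi_fraction):
    """Estimate upstream bandwidth when only the regions of interest of
    event frames are sent instead of the full compressed stream.

    stream_mbps     -- full compressed stream bitrate
    event_fraction  -- fraction of frames containing events of interest
    roi_fraction    -- fraction of each event frame occupied by the ROI
    """
    return stream_mbps * event_fraction * roi_fraction

# A 4 Mbps camera stream where ~5% of frames contain events and the ROI
# covers ~20% of those frames needs only ~0.04 Mbps upstream.
saved = upstream_bandwidth_mbps(4.0, 0.05, 0.20)
```

Even with generous margins for metadata overhead, this kind of estimate shows why ROI upstreaming can reduce WAN and storage demands by orders of magnitude.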

Current Intelligent Multi-modal Security Systems Solutions

Video 3.0 – Intelligent Cameras and Video Recorders

Today, Intel offers the next evolution in this technology process: E2E Video 3.0. This innovation places compute intelligence in the form of Machine Learning (ML) inferencing at the edge of the Internet. Recent designs prove that the application of analytics to raw data streams at the on-premise edge creates a compelling advantage by improving compute efficiency, reducing latency, and reducing network bandwidth utilization.

Traditional computer vision algorithms worked well when both the target and the environment were well defined. In practice, however, real-life situations are often not clearly defined and leave unacceptable gaps in certainty. The need for clear identification of objects of interest has driven recent developments in Intel compute systems. This has resulted in hardware accelerators capable of deploying popular Convolutional Neural Network (CNN) models that have been trained to identify anomalies in targets and environments at the edge. Intel has invested heavily in these hardware accelerators to efficiently process computer vision workloads in similar environments like Autonomous Vehicles, where ever-changing environments and situations require close-to-100% certainty in environmental awareness.

Current system solutions employ network link protection, enabled via Open Network Video Interface Forum (ONVIFFootnote 2) standards, to protect the video streams while in transit across the network. While this improves security, it does not address the security inside private networks or on the devices.

Bandwidth and Connectivity

Intel’s Visual Computing accelerators for edge analytics have enabled new designs from edge to cloud that are faster and more efficient. Intel devices on the edge and in the Data Center offer varying degrees of power and performance to meet system constraints. This makes it possible for Intel to provide a suite of products that address customer design needs from edge to cloud (see Figure 2-2). With these intelligent edge devices, Intel has altered the type of data being transferred to a Data Center: metadata describing detected events can now be sent in place of, or alongside, raw data streams depending on design requirements. This pre-analysis and data pre-processing unlocks several advantages: it reduces the amount of data to be transferred to a Data Center, frees network bandwidth for other functions or more metadata streams, and increases the usefulness of the data at the Data Center.

Figure 2-2
An illustration of the flow of metadata, compressed video, and network traffic between edge, gateway, edge DC, and cloud, with their specifications.

Key performance improvements at each stage in an intelligent video system design

It matters where the analytics are located in a computer system design. A system design with intelligent video system capabilities that is placed at the edge helps to balance the overall compute performance. Consider what happens when analytics are embedded into edge sensor compute devices; for example, in the form of a field programmable gate array (FPGA). When an analytics application is installed in an edge device, the device can reduce the raw data streams into actionable metadata for Data Center analysts. This changes the analyst’s role from that of a performer of forensics analysis (searching data streams to analyze past events) to a decision maker (reviewing actionable metadata in near-real-time).

Cost/Power/Performance

Most camera systems use Power over Ethernet (PoE) to minimize installation infrastructure complexity and cost. The lowest cost PoE (Type 1) supplies about 13W to the camera. After power supply efficiency losses, about 10W remains available to the camera electronics. Fitting added intelligence within this 10W constraint can be challenging. As a result, many of the older designs require an extensive and expensive infrastructure redesign to be used in an edge device analytics environment.
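The PoE budget arithmetic above can be sketched as a quick feasibility check. The Type 1 figure of 12.95 W is the IEEE 802.3af power available at the powered device; the efficiency and load values below are illustrative assumptions:

```python
# Nominal power available at the powered device, per PoE type (watts).
POE_TYPE_BUDGET_W = {1: 12.95, 2: 25.5, 3: 51.0, 4: 71.3}

def camera_power_headroom(poe_type, supply_efficiency, loads_w):
    """Return the wattage left for added functions (e.g., an analytics
    accelerator) after power conversion losses and the listed loads."""
    available = POE_TYPE_BUDGET_W[poe_type] * supply_efficiency
    return available - sum(loads_w)

# Type 1 PoE (~13 W) with ~77% conversion efficiency leaves about 10 W;
# a 6 W camera SoC with onboard analytics leaves ~4 W of headroom.
headroom = camera_power_headroom(1, 0.77, [6.0])
```

A negative result from such a check is exactly the case where an installation would need the costlier infrastructure redesign described above.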

Today’s new edge-based camera designs require only about 4–6 watts of power for a system on a chip that includes onboard analytics, as Figure 2-2 shows. Networked video recorders at the network edge, used to process more streams or to serve networks that require more processing capacity, likewise use less power and therefore have a better Total Cost of Ownership (TCO). Fitting analytics capabilities within a camera or NVR power constraint reduces the workload in the Data Center, which translates into lower power and cooling demands there, further reducing TCO. Power-efficient analytics engines can also be used in the data center, reducing the TCO of data centers that run analytics or Video Analytics as a Service (VAaaS) in public cloud centers.

The efficacy of the current generation machine learning accelerators brings additional benefits in response latency and in security.

Time-sensitive applications such as access controls that depend on recognition from a video camera can be performed locally, reducing the latency for system response. This improves efficiency and safety by reducing action response latency in the system.

Security is also improved in cases where privacy is critical. Analytics and high efficiency cryptographic functions in the camera can now enable new modes of privacy protection where video is never seen outside of the camera in an unprotected form.

Ease of Development, Deployment, and Scaling

Intel offers several toolkits that streamline the effort required to develop and deploy an intelligent video system design at the edge. Intel’s OpenVINO™ toolkitFootnote 3 enables software vendors and Original Equipment Manufacturers (OEMs) to easily and quickly deploy their pre-trained vision-based CNNs to a variety of Intel-based accelerators: central processing units (CPUs), graphics processing units (GPUs), FPGAs, and vision processing units (VPUs). The OpenVINO toolkit greatly reduces time-to-deployment because it eliminates the need to redesign hardware and software architectures: its Inference Engine performs a load-time compilation targeting existing Intel technologies with optimized kernels. The OpenVINO toolkit includes optimized calls for OpenCV* and OpenVX*, and provides support for popular Deep Learning frameworks like TensorFlow* and Caffe*.

Today, the Intel OpenVINO™ toolkit (Figure 2-3) can be used to port a customer-pre-trained, vision-based CNN (on supported frameworks and architectures) into OpenVINO’s Intermediate Representation (IR). The model’s IR can then be deployed at load time to a multiplicity of compute node types, including Intel® Xeon® processors, Intel Core processors, Intel Atom processors, GPUs, FPGAs, and VPUs. Through the model’s IR, the OpenVINO toolkit can automatically optimize the system for best performance. The OpenVINO toolkit offers several advantages to developers:

  • Architecture agnostic: Operation with major frameworks

  • Performance: High performance and high efficiency solutions for edge-based computing

  • Portability: Cross-platform flexibility via hardware abstraction
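The portability advantage can be illustrated with a minimal device-fallback sketch in plain Python. This is not the OpenVINO API itself; it only mimics the idea of preferring an available accelerator and falling back to the always-present CPU, and the preference order here is an illustrative assumption:

```python
def select_inference_device(available, preference=("VPU", "GPU", "CPU")):
    """Pick the first preferred inference device present on the platform,
    falling back to the CPU, which is always available on Intel systems."""
    for device in preference:
        if device in available:
            return device
    return "CPU"

# An edge NVR with integrated graphics but no VPU runs the model on GPU;
# a bare server with no accelerators still runs it, on CPU.
device = select_inference_device({"CPU", "GPU"})
```

The same application code therefore deploys unchanged across camera, NVR, and data center platforms; only the discovered device set differs.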

Figure 2-3
An illustration of the Intel OpenVINO toolkit, covering building, optimizing, and deploying. The toolkit, with optimized performance for frameworks such as TensorFlow and Caffe, is deployed on Windows, Linux, or macOS.

Intel OpenVINO™ toolkit – visual inferencing and neural network optimization

The Intel OpenVINO toolkit libraries are capable of mapping analytics applications to specific architectures quickly and in an optimal manner. It is not uncommon for a customer application running on an older machine to see a significant increase in speed when the application is ported onto a supported processing unit. The Intel OpenVINO toolkit is designed to survey the system environment, determine what inferencing compute resources are available, and customize the model deployment to gain optimal performance or maximize power efficiency.

Figure 2-4 shows a simple recipe for IMSS designs that include distributed inference at the gateway/edge/endpoint. Start with an Intel CPU, then add targeted acceleration for higher throughput and/or throughput per watt. Intel integrated graphics processors boost throughput and are generally available in Intel® Core processors. Specialized Intel processors, including the Intel® Movidius™ Vision Processing Unit,Footnote 4 are targeted for computer vision.

Figure 2-4
An illustration of the components and software support within the Intel OpenVINO toolkit: 1. host CPU (app and media processing, I/O, storage), 2. integrated graphics, and 3. accelerators for higher performance.

Components and software support within the Intel OpenVINO™ toolkit

Next-Generation Intelligent Multi-modal Security Systems Solutions

Intel expects IMSS system solutions in the future to continue moving analytics capabilities out to the edge of computing environments. As an example, a facial recognition application could run today on a system that uses edge-based analytics. The metadata generated by the edge-computing analytics could be fed into a series of mobile edge-based servers (see Figure 2-5). This design would give quick access to a database of information, equip security agents with near real-time access to facial recognition results, and allow for near real-time response through the network.

Figure 2-5
An illustration of the edge server networks. E2E Video 2.5 includes server and storage in the cloud, connected via 1G or 10G WAN to NVRs and via LAN to smart IPCs. E2E Video 3.0 includes a cloud with AI, connected via 100G or 25G WAN to a base station or edge cloud, which connects to IPCs.

Edge server networks will evolve as 5G networking takes hold

There has been a robust discussion in industry circles concerning the optimal partitioning of intelligence at the edge as opposed to in the Cloud (Data Center). The results of some recent customer applications have suggested several key advantages in placing intelligence at the edge of the compute environment:

  • Data Centers see a lower demand for power and, as a result, a lower demand for cooling.

  • Time to actionable data is reduced; agents can make almost real-time decisions.

  • Network communications traffic is minimized.

  • Privacy and security are improved by reducing the exposure of content and personally identifying data.

Impact of Memory and Compute Improvements

New, high-density, persistent memory and storage technologies are enabling new optimizations in IMSS implementations. Changes in data storage density and read/write (R/W) energy make it easier to include storage in edge devices such as NVRs and cameras, while changes in the latency and bandwidth of memory access, along with different R/W lifetimes and energy costs, are driving further IMSS optimizations.

Dramatic improvements are being made in compute capabilities from edge to cloud with an increased emphasis on heterogeneous compute platforms that are specialized to perform certain tasks. In addition, these compute platforms will continue to push the limits of performance at lower power envelopes.

Combining heterogeneous compute with new memory hierarchies can further improve performance and power efficiency when the architecture of computation and memory are combined to provide reduced latency and higher bandwidth to memory. There is an additional security benefit to on-die memory storage when the data transfers are unobservable.

Design for Privacy

Data privacy remains a hot topic in many parts of society and is typically driven by regional regulations. Regulations like the EU General Data Protection Regulation (GDPR)Footnote 5 and privacy regulations in many US states enforce strict privacy rules protecting the rights of subjects whose images are captured on edge devices (often accompanied by stiff penalties for non-compliance). The GDPR, for example, stipulates that the data subject must give consent to the processing of his or her personal data – in the IMSS use case, images – for there to be a lawful basis for processing. In addition, the data subject is afforded the right to access and request the erasure of any personal data related to them within a given period.Footnote 6 The GDPR also defines the specific circumstances under which personal data may lawfully be used by authorized personnel. Hence an IMSS must both support personal privacy and provide authorized access to personally identifying information under strict controls for the public’s benefit. Failure to comply with the GDPR is punishable by fines of up to 4% of the violator’s annual gross revenue.

In our opinion, the GDPR defines the privacy framework well, and much of the newer legislation (for example, the UK Data Protection Act 2018 and the California Consumer Privacy Act of 2018) is similar to the GDPR; therefore, the GDPR definitions and text are used here as the reference.

Personal data

Personal data is any information relating to an identified or identifiable natural person – often called personally identifiable information (PII). This can be a name, identification numbers or tokens, location data, or elements of physical, physiological, genetic, mental, economic, cultural, or social identity.

Processing

Processing means operations on personal data such as collection, organization, recording, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction. Processing also includes profiling; using personal data to evaluate personal aspects such as work performance, economic situation, health, personal preferences, interests, reliability, behavior, location, or movements.

Protection

The GDPR requires that appropriate technical measures with due regard to the state of the art are taken to ensure that data controllers and data processors are able to fulfill their data protection obligations. Under the definitions of GDPR, video streams and images captured from video streams are the input data for analytics processing which extract biometric data for the purpose of uniquely identifying a natural person. These biometric data are a special category of personal data. The GDPR requires personal data to be protected against unauthorized or unlawful processing.

Protection Security guidelines

The following is a summarization of the security requirements regarding the protection of personal data. These have been collated and simplified from the GDPR text.

  • Data protection by design

  • Data protection by default

  • Minimizing processing of personal data

  • Pseudonymizing personal data as soon as possible

  • Transparency with regard to the functions and processing of personal data

  • Enabling the data subject to monitor the data processing

  • Enabling the controller to create and improve security features

  • Producers of products, services, and applications should be encouraged to take the right to data protection into account when developing and designing them and, with due regard to the state of the art, to make sure that controllers and processors are able to fulfill their data protection obligations.

  • Storage time limitation

  • Ensure that by default, personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons

  • Ensure the ongoing confidentiality, integrity, availability, and resilience of processing systems and services

  • Ensure the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident

  • Include a process for regularly testing, assessing, and evaluating the effectiveness of technical and organizational measures for ensuring the security of the processing.

  • Render the personal data unintelligible to any person who is not authorized to access it, leveraging cryptographic capabilities such as encryption.
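One of the listed measures, pseudonymization, can be sketched with a keyed hash. This is a minimal illustration of the concept, not a complete GDPR compliance mechanism; key management, erasure handling, and the specific identifier format are assumptions:

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Replace a personal identifier with a keyed pseudonym (HMAC-SHA256).

    Without the key, the pseudonym cannot be linked back to the
    identifier; GDPR requires such additional information to be kept
    separately under technical and organizational measures.
    """
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()

# The same subject always maps to the same pseudonym, so analytics can
# correlate events across streams without storing the raw identity.
tag = pseudonymize("subject-4711", b"key-held-by-the-data-controller")
```

Because the mapping is deterministic under a given key, "pseudonymizing personal data as soon as possible" can happen at the edge device while downstream analytics still correlate events per subject.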

So, what does all this mean? The system technology must provide state-of-the-art standard of care for the video streams and especially for the machine learning analytics results that are personally identifying. The data must be confidentiality protected when at rest and when in transit throughout the life cycle and processing path. In the next section, you will read how these systems are designed, and how the security features support these requirements.

Principal IMSS System Components

Following the general description of an IMSS, the principal components are described: a smart camera, network video analytics recorder, edge server, and operations data center server. Figure 2-6 illustrates the major processing devices in an end-to-end IMSS. Video is generated in cameras, transmitted over Ethernet to a networked video recorder, stored in an edge server, and transmitted upstream to an operations center, to a cloud server running video analytics as a service, and to remote viewers using client devices.

Figure 2-6
An illustration of the IMSS flow. Four cameras, a multifunction device, and a laptop connect to the ISP network and WAN public Internet, which, through remote client access, enables video analytics as a service and an operations center server.

IMSS topology

IMSS System View

Cybersecurity is a key element in any IMSS, particularly when the system is connected to the public Internet for flexible access to the video streams. Not only is it important to provide confidentiality and privacy by encrypting video transmitted over Ethernet, but security within the devices themselves is also critical for a fully robust security system.

Smart IP Camera

You learned about the generational progress of cameras earlier in this chapter. Now we will go into more detail about the emerging Video 3.0 smart cameras.

Figure 2-7 shows the addition of image analytics (earning the Smart Camera designation). In addition to the analytics, graphics rendering (including composition blend) may be present to label the video and overlay graphics such as a region of interest box. The basic IP camera functions from Figure 2-7 are the lens and image sensor, image synthesis, processing, and video encoding. The lens system focuses an image on the sensor through a color filter array. This produces a Bayer-patterned image where only red, green, or blue is sensed in a given sensor pixel. The image synthesis functions convert the Bayer image to a full-color image with each pixel having a red, green, and blue component. The image then undergoes block-based processing to dewarp it (correcting for optical artifacts and motion artifacts). Finally, there is a function that performs full-frame processing using multiple input frames: high dynamic range processing and temporal noise reduction. The primary output is encoded video over Ethernet (so the video must be compressed). The optional display output is used for installation and troubleshooting.
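The image synthesis step can be illustrated with a deliberately naive demosaic sketch for an RGGB Bayer pattern. Real ISPs use interpolation and edge-aware filtering; this nearest-sample version only shows the data layout:

```python
def demosaic_rggb(bayer, width, height):
    """Naive demosaic of an RGGB Bayer image held as a flat row-major
    list: every pixel in a 2x2 cell reuses that cell's R, first G, and B
    samples. Production ISPs interpolate between neighboring cells."""
    rgb = []
    for y in range(height):
        for x in range(width):
            y0, x0 = y & ~1, x & ~1               # top-left of the 2x2 cell
            r = bayer[y0 * width + x0]            # R sample at (y0, x0)
            g = bayer[y0 * width + x0 + 1]        # G sample at (y0, x0+1)
            b = bayer[(y0 + 1) * width + x0 + 1]  # B sample at (y0+1, x0+1)
            rgb.append((r, g, b))
    return rgb
```

Even this sketch makes the bandwidth implication visible: the synthesized image carries three components per pixel, tripling the raw data volume before compression.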

The sensor processing, statistics, exposure, white balance, focus, sensor control, and lens control functions are standard mandatory elements of a camera.

Figure 2-7
A workflow of a smart camera. It includes a lens system, image synthesis, block functions, frame functions, analytics, statistics, auto exposure, sensor gain and shutter timing, zoom, iris, sensor processing, graphics render, display control, and video compression.

Smart camera

Once a video stream is compressed, it can be protected for the upstream link to an edge video recorder or server or the cloud with link protection under ONVIF standards.

Analytics performed in the camera are characterized by latency sensitivity constraints and by the value proposition of performing analytics on uncompressed video streams.

An example of an analytics function that requires low latency is license plate recognition. To recognize a license plate, the plate must first be detected (located in the frame) and tracked until its size in the frame meets the accuracy required for recognition, whereupon a high-resolution image of just the plate is sent to the recognition application. The latency between the initial image capture, object detection, and the region of interest capture is critical: if it is too long, the region of interest capture will fail because the object will already be gone. In this example, the loop time constraint is determined by the speed the object is moving, the largest distance from the camera at which the object can be detected, and the point when the object leaves the frame.

Performing the function in the camera relaxes this time constraint because it eliminates the delay needed to compress the video, transmit it to a video recorder or server, decompress it, run the analytics application, track the object, and send the result back to the camera to identify the region of interest in the raw frame at the right time. When the function is less constrained by time, it can be performed with a more efficient processor, with the further benefit of accuracy because the algorithm operates on the raw image frames. ML analytics can also automate the Pan-Tilt-Zoom (PTZ) capability of cameras, raising assurance that they are always pointed toward and zoomed in on critical objects and events. In the general case, any analytics application that interacts in a closed loop with objects benefits from the reduced latency.

A side benefit of performing analytics processing in the camera comes from having access to the pre-compressed video frames. This can improve the accuracy of machine learning compared to analysis after compression, which adds noise.
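The loop-time argument above can be made concrete with a small latency-budget sketch; all speeds, ranges, and latencies below are illustrative assumptions:

```python
def detection_window_s(detect_range_m, exit_range_m, speed_mps):
    """Time the object stays inside the usable capture zone: the window
    in which detect -> track -> ROI capture must complete."""
    return (detect_range_m - exit_range_m) / speed_mps

def loop_fits(window_s, pipeline_latencies_s):
    """True if the summed analytics loop latency fits the capture window."""
    return sum(pipeline_latencies_s) <= window_s

# A car at 15 m/s (~54 km/h), detectable from 30 m down to 7.5 m, gives
# a 1.5 s window. An in-camera loop of ~0.4 s fits; adding ~1.2 s for
# encode, network transit, server decode, and the return trip does not.
window = detection_window_s(30.0, 7.5, 15.0)
in_camera = loop_fits(window, [0.4])
offloaded = loop_fits(window, [0.4, 1.2])
```

The budget check makes clear why the camera-resident loop succeeds where the recorder round-trip fails for fast-moving objects.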

Another example of the value proposition of analytics in a camera is to use a Machine Learning algorithm to determine where to optimize the compression bitstream budget, increasing the visual quality of important objects, and reducing quality elsewhere to make the best use of ethernet bandwidth and stream storage footprint.
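This bitstream-budget idea can be sketched as a per-block quantization map. The QP values and block layout below are illustrative assumptions; real encoders expose ROI controls in encoder-specific ways:

```python
def qp_map(num_blocks, roi_blocks, qp_roi=26, qp_background=40):
    """Assign a quantization parameter (QP) per coding block: lower QP
    (higher quality, more bits) inside detected regions of interest,
    higher QP (fewer bits) for background blocks."""
    return [qp_roi if b in roi_blocks else qp_background
            for b in range(num_blocks)]

# Blocks 2 and 3 contain a detected face; everything else is background.
qps = qp_map(6, {2, 3})   # [40, 40, 26, 26, 40, 40]
```

The detector output thus steers the encoder's bit budget toward the objects reviewers care about while the overall stream bitrate stays within its Ethernet and storage envelope.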

The basic level of security for the video stream and metadata from the camera is provided by encrypting and optionally cryptographic hashing of the content. As we will show in later chapters, link protection for the stream is valuable, but does not provide complete protection from all the threats to an IMSS camera.

Network Video Recorder with Analytics

Figure 2-8 is a very simple functional process flow model of an on-premises NVR that is not performing any security functions (shown to highlight the data path processing).

Figure 2-8
A workflow of an on-premises NVR. It includes stream input management, playback decode, real-time decode, OSD composition and display, scale to CNN input, local analytics, and off-SoC analytics.

On-premises NVR, no security features

The boundary indicated by the dashed line is the boundary of the System on a Chip (SoC) CPU. The compressed video streams from the cameras enter through a stream management function via an Ethernet LAN connection. The streams are spooled to storage (generally a hard disk drive or solid-state drive) and fed to a real-time decode processing function. After the compressed streams are decoded (decompressed), the image frames are scaled to the frame size of the neural network before analytics functions such as object detection and classification are performed. The analytics function may be performed locally, with an external accelerator, or split across both. The metadata from the analytics is stored and used to generate graphics elements that are composited (overlaid) with the video stream as it is sent to the display for viewing. The graphic elements may range from a simple text overlay indicating the video source location, to simple rectangular bounding boxes with a text field, to complex semitransparent object overlays that call attention to objects of interest. For simplicity, this example does not include storage or subsequent transmission of the analytics results metadata. Systems that do so must also apply appropriate security to those data objects (meeting the same objectives as for the display output and stream output).
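The scale-to-CNN-input step commonly preserves the frame's aspect ratio and pads the remainder ("letterboxing"); a minimal sketch of that computation follows, where the 416×416 input size is an illustrative assumption rather than a requirement of any particular network:

```python
def letterbox_scale(src_w, src_h, net_w, net_h):
    """Compute the aspect-preserving scaled size and the padding needed
    to fit a decoded frame into a fixed CNN input (padding split evenly
    on each side)."""
    scale = min(net_w / src_w, net_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (net_w - new_w) // 2, (net_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y

# A 1920x1080 frame into a 416x416 detector input: scaled to 416x234
# with 91-pixel bands above and below.
dims = letterbox_scale(1920, 1080, 416, 416)
```

Keeping the aspect ratio matters because distorting the frame would shift object proportions away from what the detection model was trained on.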

Video streams that have been stored can also be retrieved later for playback, which uses the same processes.

The process flow diagram in Figure 2-9 adds a remote user or upstream capability to the process flow in Figure 2-8.

Figure 2-9
A work-flow of On-premises N V R. It includes stream input management, stream output management, playback decode, real-time decode, encode, O S D composition, O S D composition and display, scale to C N N input, local analytics, and off S o C analytics.

On-premises NVR with remote user/upstream link

Again, no security functions are shown here, to emphasize the data processing tasks. This diagram adds a stream output management process and an additional OSD composition task. Because the remote user connects to the NVR via a Wide Area Network (WAN) connection for viewing with a simple player application, the video stream must be encoded (compressed). The raw stream is composited with graphics derived from the analytics results as described previously. Raw video streams can also be sent from storage via the WAN to a user, an upstream video server, a data center, or an operations center without analytics processing.

Figure 2-10 introduces the added functions in an NVR to include full confidentiality and integrity protections for the video streams and analytics data. This will be described in detail later in the context of a complete processing implementation and a complete security implementation.

Figure 2-10
A work-flow of On-premises N V R. It includes stream input management, stream output management, playback decode, real-time decode, optional hash, decrypt, optional authenticate, encode, and more. Use case is, video playback with analytics and off-line video playback.

NVR with security and privacy rights management

For this example, the security functions are included. Unprotected (cleartext) streams are shown in red and protected (ciphertext) streams are shown in green. When the camera sends the streams to the video recorder, they are confidentiality protected by encryption within an SSL/TLS session. The streams must be decrypted for processing and re-encrypted with a storage device key before being written to storage. Note that the encryption is done inside the SoC to protect the stream on the external datapath to the storage. If the encryption is instead performed in the storage system, it protects against theft of the disk but not against an observer on the bus.

Real-time streams, having been decrypted, are ready for decoding before running analytics or displaying them. Streams from storage (secured with the storage key) must be decrypted before the playback decode task.
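A minimal sketch of this decrypt-then-re-encrypt step, assuming a session key negotiated with the camera and a separate storage key: the toy counter-mode keystream below is for illustration only; a real SoC would use hardware-accelerated AES (e.g., AES-GCM) via a vetted library.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy counter-mode keystream cipher (illustration only; NOT secure).
    XOR with the same key/nonce both encrypts and decrypts."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)

session_key = secrets.token_bytes(32)  # negotiated with the camera (TLS)
storage_key = secrets.token_bytes(32)  # bound to the storage volume
nonce = secrets.token_bytes(12)

wire_ct = keystream_xor(session_key, nonce, b"compressed stream chunk")
plaintext = keystream_xor(session_key, nonce, wire_ct)  # decrypt inside SoC
disk_ct = keystream_xor(storage_key, nonce, plaintext)  # re-encrypt for disk
assert keystream_xor(storage_key, nonce, disk_ct) == plaintext
```

The key design point the text makes is that both operations happen inside the SoC boundary, so cleartext never appears on the external bus to the drive.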

Local analytics has access to both real time and playback streams in plaintext. If the streams are to be processed in an off chip analytics accelerator, the streams must be encrypted to protect them on the public bus between the SoC and accelerator.

When streams have been composited with the graphics as described earlier, the display interface uses HDCP protection to prevent cloning or copying of the streams sent over the HDMI or DP interface to the local display.

Likewise, when the streams are upstreamed to another server or to a remote user, the stream is encrypted to protect the confidentiality and privacy of the information in the stream and the analytics results.

So in this example, video streams that are transported over interfaces that are physically accessible are always protected with encryption.

NVRs will consume from 8 to as many as 200+ streams. The amount of local storage depends on the customer's design choices regarding storage cost and the amount of time streams must remain retrievable for forensic investigation, which is sometimes mandated by law. The number of streams processed by analytics, viewed, and upstreamed also depends on the customer's design choices, optimized as described in the next section.

Compute resources – General to Specialized, Key Performance Indicators (KPIs)

Compute resources are assigned to the processes in the task graph of Figure 2-10, depending on the chosen workload:

  • The number of video streams input to the platform

  • The number of streams stored

  • The number of live and recorded streams played back

  • The number of streams processed with Machine Learning analytics

  • The number of streams post-processed adding graphics elements representing the result of the analytics

  • The number of streams viewed on local displays

  • The number of streams upstreamed or viewed remotely on a client device via the Internet

Tradeoffs between cost, performance, and power are made to optimize a system for these three constraints against the preceding workload attributes. For example, analytics can be performed on a CPU at lower Bill of Materials (BOM) cost, but at lower efficiency measured in frames per second per watt. When a GPU is available, better performance at lower power is available at no additional cost if the GPU is otherwise lightly loaded. Higher performance and performance efficiency are realized with dedicated Machine Learning inferencing processors such as FPGAs and VPU accelerators. However, adding these accelerators increases system cost and may increase overall system power use even while improving the performance per watt. The critical consideration for devices that are energy- or power supply-limited is that the only way to get more overall performance is through improvements in efficiency. For example, if the device is limited to 25 watts and you need 10 watts to process one video stream on a CPU, you are limited to 2 video streams. However, if you can process a stream on a high-efficiency dedicated Machine Learning accelerator at 2 watts per stream, you can process 12 video streams.
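The power-budget arithmetic above can be captured in a small helper; the `soc_base_w` term for fixed platform power is an added assumption not in the book's example (set to 0 to reproduce its numbers).

```python
def max_streams(power_budget_w: float, soc_base_w: float,
                per_stream_w: float) -> int:
    """How many video streams fit in a fixed power envelope,
    after subtracting fixed platform power."""
    return int((power_budget_w - soc_base_w) // per_stream_w)

# Figures from the text: a 25 W device, ignoring base platform power.
assert max_streams(25, 0, 10) == 2   # CPU-only inference at 10 W/stream
assert max_streams(25, 0, 2) == 12   # dedicated accelerator at 2 W/stream
```

The same comparison, with a nonzero `soc_base_w`, is a quick way to see why per-stream efficiency dominates stream count on power-limited edge devices.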

The security workload can also vary dramatically depending on how the protection for the video streams and analytics metadata is managed at the system level. Using link and storage protection requires multiple decrypt/encrypt operations, whereas assigning encryption keys to the streams rather than to the links eliminates the redundant operations required to change keys. More detail on this is discussed in Chapters 5 and 7.

Edge Server

Edge Servers will generally ingest up to several hundred streams. While the primary role of an edge server is storage, edge servers can also provide analytics processing and display functions. The task graph for an edge server will look like the video recorder task graph from Figure 2-10, but will balance the workload differently to optimize for the storage function.

Operations Data Center Server

Operational data centers will ingest up to thousands of streams and may display hundreds of video streams.

In the operational center, human operators have difficulty remaining attentive to video streams for more than 20 minutes [7], and detection accuracy decreases as the number of streams to be monitored increases, dropping to ~50% for nine streams [8]. Machine Learning analytics provides valuable workload reduction for the operators by analyzing the streams for events that require human attention and judgment, enabling the operators to focus on critical events.

Machine Learning analytics is applied in data centers for after-the-fact (forensic) analysis requiring complex analytics. Investigations are performed when a critical event becomes known, often weeks after it occurred. Analytics can help to quickly locate the content that leads to law enforcement actions.

Functions such as tracking a person or vehicle across a large area observed by many cameras are often performed in operations centers because they require the aggregation of a large number of video streams.

Another important application suited to operations centers is situational awareness. For this capability, sensor fusion can be performed in conjunction with an ML application to combine input from multiple cameras, audio sensors, environmental sensors, and scene state data into a rich, big-data estimate with better accuracy than any single sensor alone. High-quality situational awareness shifts the response from after-the-fact forensics to real-time response to critical events.

Security principles must be applied across the complete end-to-end system to address gaps, weaknesses, and vulnerabilities that would compromise overall system security; that is, the system is only as secure as its weakest link.

End-to-End Security

As shown in Figure 2-11, the cost of security is proportional to the value of the assets in the system; a diligent analysis of the tradeoffs and the associated complexity is therefore required.

Figure 2-11
An illustration of security continuum. Commercial and defense at the ends have 1 and 3 dollar signs respectively, with 2 dollar signs in the center.

Cost of security

The exponential growth of devices (projected to reach 50 billion) is driving a demand for security from the cloud to the sensor device edge. Secure processing has become necessary, and the degree of security required varies with customer needs. On this security spectrum, commercial customers today typically demand less than Defense/Government customers, who often have the highest security needs and are frequently concerned with physical threats to their systems. To obtain the highest levels of security today, customers pay a price commensurate with the criticality of the information being protected.

IMSSs span this security continuum. The required security level, and the resulting implementation and maintenance costs, vary with the installation environment and the risks related to system availability and accuracy.

Threats are constantly evolving. Hackers are no longer content with exploits at the application or operating system (OS) level; they work around the applications that would normally provide some indication that something is wrong. Attackers are digging into boot code and communication channels, and compromising the integrity of the physical interfaces on the system. Once they obtain access, they make changes that wreak havoc on systems or, at minimum, cause unpredictable behavior that inadvertently releases information usable in the next level of exploitation. These threats are driving the security and performance enhancements needed in both commercial and government ecosystems to stay ahead of the adversaries.

Solutions that provide security and enable trust have become increasingly necessary. It is increasingly important for customers to have systems that can reliably process what is expected, when it is expected, for as long as it is expected, and that can defend against both malicious processes and malicious circuitry. Customers are driving demand for solutions that allow them to design trust into their systems while providing additional security capabilities against exploitation.

The representative attack surfaces and the possible threat exposures are shown in Figure 2-12. The trends indicate that attacks are progressing down the stack: from applications to the operating system to hypervisors to firmware and, eventually, to hardware. The attack surfaces at different layers in the stack expose multiple threats. For example, booting a device with unauthorized firmware weakens the defenses at every layer above it, resulting in a compromised system that is hard to recover.

Figure 2-12
A chart of attack surfaces for I o T platforms. It has a stack from applications to the operating system to hypervisors to firmware and eventually to hardware. Different layers in the stack have examples of multiple threats.

Attack surfaces and evolving threat exposure

Intel has spent a great deal of time and effort in designing computer systems that can be secure. That effort has brought changes to many design aspects of computer systems, including:

  • How systems identify themselves on the network (an immutable ID)

  • How systems do a secure boot

  • How systems protect information on local storage devices

  • How systems create and manage trusted run-time environments

  • How systems protect access to security keys

  • How systems encrypt and decrypt messages

  • How systems perform Intra- and Inter-communication within a platform

  • How systems manage authority certificates

  • How systems manage communications channels

The changes in security and the evolving threats have resulted in the release of Intel Security Essentials as shown in Figure 2-13. Intel technology is mapped to the areas that Intel considers to be the four core security capabilities. All vendors must enable these core capabilities at different layers, and the capabilities must be enabled at the right layers by the right entities. The Intel mapped technologies include:

  • Platform Integrity – Includes Intel® Boot Guard, Intel® PTT, discrete TPM support, and others.

  • Protected Data, Key, and ID – Provides protected storage like Intel PTT, Discrete TPM, and total memory encryption (TME) that guards against frozen DRAM attacks.

  • Trusted Execution – Protects the runtime environment and application memory with solutions like Software Guard Extensions, MKTME, and others.

  • Crypto Acceleration – Includes dedicated crypto operations for AES encryption/decryption and SHA hashing for sign/verify operations, and Secure Key, which includes a random number generator for key creation.

Figure 2-13
A chart of intel security essentials. It includes core security capabilities, value proposition of category, security technologies, and mapped intel technologies in 4 rows.

Intel core security capabilities baseline for trusted systems

Figure 2-14 shows a simplified surveillance use case with end-to-end flow of data from the edge devices to the cloud. The smart cameras (on the left) generate live video streams and send them to the network video recorder (in the middle), which may analyze data from the video streams. The endpoints (on the right) receive the data and store it, display it, or upload it to the cloud.

Figure 2-14
A data flow in the surveillance use case for crypto. Includes smart I P cameras using intel, network video recorder using intel, and endpoints. Some components are Intel security key and Intel P T T.

Example of encryption in a surveillance use case

In Figure 2-14, encrypted data is shown in green; several areas require crypto capabilities to protect the data in transit. In this simple surveillance system, Intel technology can protect the video streams and analytics data throughout the system. For example, data streams sent from the camera to the Network Video Recorder (NVR) can be protected using the Secure Real-Time Streaming Protocol under ONVIF or with a VPN tunnel to the NVR. This is especially important when cameras communicate over a publicly accessible transmission medium such as the Internet. In addition, protection can be applied to the data sent to storage (providing at-rest protection) by using storage encryption or per-stream encryption. When data is upstreamed to an operations center or the cloud, the video and metadata can likewise be protected with encryption; HDCP protection is included in the HDMI connections from the NVR to displays in the on-premises operations centers.
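As a sketch of the transit protection described above, the following configures a client-side TLS context such as an NVR might use for its upstream link, using Python's standard `ssl` module; the endpoint name and policy choices are illustrative, not prescriptive.

```python
import ssl

# Client-side context an NVR might use for its upstream (WAN) connection.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
ctx.check_hostname = True                     # authenticate the peer's identity
ctx.verify_mode = ssl.CERT_REQUIRED           # reject unverifiable certificates

# ctx.wrap_socket(sock, server_hostname="ops-center.example.com") would then
# protect the upstreamed video and metadata in transit (hostname hypothetical).
```

The same pattern applies on the camera-to-NVR hop when a VPN tunnel is not used: the transport is authenticated and encrypted before any stream data flows.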

In Chapter 7, we will see how the overall security of an IMSS can be improved over this basic level of security.

Cost Overheads for Security

Applying security in the data path has performance implications due to the latency of cryptographic operations such as encrypt/decrypt and sign/verify. Key generation is less affected because such operations are infrequent.

Intel® QuickAssist Technology (Intel® QAT) on server platforms accelerates cryptographic and compression workloads by offloading them to hardware optimized for those functions. This makes it easier for developers to integrate built-in cryptographic accelerators into network and security applications.

  • Symmetric cryptography functions include: Cipher operations (AES, DES, 3DES, ARC4); Wireless (Kasumi, Snow, 3G); Hash/Authenticate operations (SHA-1, MD5, SHA-2 [SHA-224, SHA-256, SHA-384, SHA-512]); Authentication (HMAC, AES-XCBC, AES-CCM); Random number generation.

  • Public Key Functions include: RSA operation; Diffie-Hellman operation; Digital signature standard operation; Key derivation operation; Elliptic curve cryptography (ECDSA and ECDH); Random number generation; Prime number testing.

  • Compression/Decompression functions include: DEFLATE (Lempel-Ziv 77).
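The listed DEFLATE compression and SHA-2 hashing can be exercised in software with the Python standard library, as a stand-in for what QAT would offload to hardware (this is not the QAT API itself, and the payload is invented for illustration):

```python
import hashlib
import zlib

payload = b"analytics metadata: person@frame-1042 " * 64

# DEFLATE compression (the algorithm listed above), here in software via zlib.
compressed = zlib.compress(payload, level=6)
assert zlib.decompress(compressed) == payload
assert len(compressed) < len(payload)  # repetitive metadata compresses well

# SHA-2 digest over the compressed record, e.g. as input to a signature.
digest = hashlib.sha256(compressed).hexdigest()
```

On a QAT-equipped server, both operations would be dispatched to the accelerator instead, freeing CPU cycles for stream processing and analytics.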

Confidentiality, Integrity, Availability

These principles can be implemented at high performance using the AES-NI, SHA-NI, and DRNG CPU instructions. Runtime protection for code and data can be achieved with Trusted Execution Environments such as virtual machines or with Software Guard Extensions (SGX). SGX technology can also be used to protect the intellectual property in ML/DL model-related assets such as labels, features, models, and training data. Emerging memory encryption technologies protect workloads and data as they are written to, stored in, and read from DRAM. Firmware over the air (FOTA) and software over the air (SOTA) updates can be deployed to improve platform availability and to deliver the patching required to mitigate security incidents.

Confidentiality, integrity, and availability are also critical for protecting the video streams and the analytics results. Privacy requirements draw on the confidentiality provided by the cryptographic accelerators, CPU instructions, and trusted execution capabilities. These capabilities are also mandatory for usages in criminal prosecutions and in applications where the integrity of the video and metadata is critical.

Secure Data Storage

Intel Platform Trust Technology (PTT) or a discrete Trusted Platform Module (TPM) can be leveraged to store data and keys securely, tethered to the silicon and paired with the platform. These keys are used to encrypt/decrypt the data stored on the mass storage volume.
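A rough software analogue of tethering storage keys to the platform: the sketch below derives a per-volume key from a device-unique secret. On a real platform the secret and the derivation would live inside PTT or the TPM and never be exposed to software; this stdlib version is illustrative only, with hypothetical volume names.

```python
import hashlib
import secrets

# A device-unique secret; on a real platform this never leaves the
# TPM / Intel PTT, and the derivation happens inside it.
device_secret = secrets.token_bytes(32)

def derive_volume_key(device_secret: bytes, volume_id: bytes) -> bytes:
    """Derive a per-volume storage key bound to this platform's secret."""
    return hashlib.pbkdf2_hmac("sha256", device_secret, volume_id, 100_000)

key_a = derive_volume_key(device_secret, b"nvr-volume-0")
# Deterministic for the same platform and volume...
assert key_a == derive_volume_key(device_secret, b"nvr-volume-0")
# ...but distinct per volume, and unrecoverable without the device secret.
assert key_a != derive_volume_key(device_secret, b"nvr-volume-1")
```

Because the key is reproducible only on the platform holding the secret, a stolen drive cannot be decrypted elsewhere, which is the at-rest property the text describes.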

Conclusion

In this chapter, we explored the historical transformations of IMSS technologies and introduced how Intel is changing the future by driving pertinent changes end-to-end from edge devices, on the network edge, and in the cloud. We also described an IMSS with a focus on E2E security and articulated the Intel security assets to leverage and build a robust IMSS system.

The IMSS domain doesn't exist in a vacuum; the next chapter provides a detailed discussion of the relevant technologies in a surveillance system:

  • Basic Image Synthesis and Video Processing functions

  • A breakdown of how these functions work and how they will be secured (and the value derived from securing them)

  • Standard Computer Vision and Machine Learning functions

  • Standard-of-care cybersecurity (pragmatic, yet robust)