Information Systems Frontiers, Volume 14, Issue 3, pp 499–515

Effective distributed service architecture for ubiquitous video surveillance

Authors

  • Ray-I Chang
    • Department of Engineering Science and Ocean Engineering, National Taiwan University
  • Te-Chih Wang
    • Department of Engineering Science and Ocean Engineering, National Taiwan University
    • Department of Computer Science and Information Engineering, Ming Chuan University
  • Jen-Chang Liu
    • Department of Computer Science and Information Engineering, National Chi Nan University
  • Jan-Ming Ho
    • Institute of Information Science, Academia Sinica
Article

DOI: 10.1007/s10796-010-9255-z

Cite this article as:
Chang, R., Wang, T., Liu, J. et al. Inf Syst Front (2012) 14: 499. doi:10.1007/s10796-010-9255-z

Abstract

Video surveillance systems play an important role in protecting the lives and assets of individuals, enterprises and governments. Given the prevalence of wired and wireless Internet access, the trend is to integrate today's isolated video surveillance systems in a distributed computing environment and thereby gestate diversified multimedia intelligent surveillance (MIS) applications in ubiquity. In this paper, we propose a distributed and secure architecture for ubiquitous video surveillance (UVS) services over the Internet and error-prone wireless networks that offers scalability, ubiquity and privacy. As in cloud computing, users consume UVS-related resources as a service and do not need to own the physical infrastructure, platform, or software. To protect service privacy, preserve service scalability and provide reliable UVS video streaming for end users, we apply the AES security mechanism, a multicast overlay network and forward error correction (FEC), respectively. Different value-added services can be created and added to this architecture without introducing much traffic load or degrading service quality. In addition, we construct an experimental test-bed for the UVS system with three kinds of services that detect fire and fall-incident features and record the captured video at the same time. Experimental results show that the proposed distributed service architecture is effective: services on different multicast islands were successfully connected without influencing playback quality, the average sending and receiving rates of these services are very similar, and the surveillance video plays back smoothly.

Keywords

Ubiquitous video surveillance · Multimedia intelligent surveillance · Multicast overlay network · Forward error correction · AES security mechanism · Cloud computing

1 Introduction

Video surveillance services have been active for decades to protect the lives and properties of individuals, enterprises and governments, with applications such as homeland security, office-building security, airport security, and traffic surveillance on highways. Thanks to advances in digital media compression, computing power, wired/wireless communications and microelectronics, video surveillance has evolved into the third-generation surveillance system (3GSS) (Bramberger et al. 2006; Marcenaro et al. 2001), which uses digital cameras to deliver compressed video over prevalent networks and distributes services among network nodes connected by heterogeneous communication links. As a result, new video surveillance systems require automatic event analysis and alarm services. They connect to networks to reduce the need for users' prolonged attention and to provide ubiquitous surveillance services. This evolution makes it feasible to apply intelligent multimedia processing techniques to gestate a variety of multimedia intelligent surveillance (MIS) services. In this paper, we classify 3GSS into the four surveillance service models shown in Fig. 1. Most surveillance services today are composed as a single service provided by a single camera (Single Service Single Camera, SSSC) or by multiple cameras (Single Service Multiple Cameras, SSMC). When the idea of information sharing is applied, each camera shares its captured images with multiple services. This idea leads to two new service models, called MSSC (Multiple Services Single Camera) and MSMC (Multiple Services Multiple Cameras).
Fig. 1

Surveillance service models from relationships between cameras and services

Since multiple services may be composed from single or multiple cameras, captured images should be distributed to more than one processing center. For example, a typical airport security system may need to provide two MIS services, face recognition and dangerous object recognition (DOR) (Beynon et al. 2003; Regazzoni and Sacchi 2000; Smith et al. 2006; Stringa and Regazzoni 2000; Venetianer et al. 2007), from a single camera somewhere in the airport. These processes are very complicated, not only because varied objects must be recognized in a complex scene, but also because the objects should be processed in real time. Thus, distributing the various recognition processes among multiple networked computers under an MSSC service model is an effective way to provide these services. The distribution of services inside the network is an important advantage of the UVS service architecture. This paper concerns the decomposition of logical surveillance functionalities into a set of logical components that can be allocated to different processes inside an intelligent physical network built from the proposed UVS service architecture. The main advantage of such decomposition is that it helps UVS become more efficient, flexible and intelligent in supporting the four surveillance service models mentioned above.

Moreover, as video surveillance applications grow, value-added and diversified ubiquitous video surveillance applications are produced by integrating them with other full-fledged services over the Internet. The load on specific surveillance spots will then become heavier than before. For example, the CPU and network bandwidth load on a surveillance content provider, e.g., a smart camera, must be considered when it is accessed by multiple video surveillance services with real-time requirements. These issues challenge the scalability of UVS in supporting MSSC services. In this paper, we multicast surveillance videos for all surveillance service models (i.e. SSSC, SSMC, MSSC and MSMC) to furnish an effective distributed architecture for UVS scalability. IP multicast (Deering 1989; Quinn 2001) is a one-to-many data communication protocol, originally designed for multimedia conferencing and well suited to network applications with multiple accesses. Unlike IP unicast, IP multicast sends only a single traffic stream no matter how many clients request it, which makes it very useful for large-scale network applications with a single data source. However, to avoid service abuse and malicious flooding attacks, ISPs (Internet Service Providers) usually disable multicast forwarding on their routers, so multicast packets cannot pass through the Internet. The multicast backbone (MBone) (Kumar 1995) arose as a virtual network connecting multicast islands over the Internet. On each island, a host runs a multicast routing daemon, and the islands are connected with one another via unicast tunnels. Service components are connected over the Internet, and multicast traffic can reach all service components via the applied multicast overlay network (Chen et al. 2009).
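The scalability benefit of multicast described above can be illustrated with a back-of-the-envelope calculation; the bitrate and client count below are hypothetical values for illustration, not figures from the paper.

```python
import ipaddress

def sender_load_mbps(bitrate_mbps: float, clients: int, multicast: bool) -> float:
    """Outgoing bandwidth at the content provider.

    With unicast the source repeats the stream once per client;
    with multicast it emits a single copy regardless of audience size.
    """
    return bitrate_mbps if multicast else bitrate_mbps * clients

# IPv4 multicast groups occupy 224.0.0.0/4; a UVS deployment would
# pick a group address in this range for each camera stream.
group = ipaddress.ip_address("239.192.0.1")
assert group.is_multicast

# Hypothetical 2 Mbps camera stream requested by 8 MIS services.
unicast_load = sender_load_mbps(2.0, 8, multicast=False)   # 16.0 Mbps
multicast_load = sender_load_mbps(2.0, 8, multicast=True)  #  2.0 Mbps
```

The gap widens linearly with the number of requesting services, which is why the MSSC and MSMC models depend on multicast delivery for scalability.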

Notably, some surveillance videos in UVS are private and sensitive, yet they are delivered over the public Internet. The public Internet also exhibits network dynamics, such as occasional packet loss, that jeopardize video playback quality. Thus, we apply the cost-effective AES encryption (NIST 2001) with Diffie-Hellman key negotiation (Rescorla 1999) to protect surveillance videos from Internet eavesdroppers, and the open-loop error control of forward error correction (FEC) (Macker 1997; Luby et al. 2002) to preserve the playback quality of delivered surveillance videos in UVS for further processing and presentation in MIS services. The proposed effective and secure distributed service architecture therefore provides UVS with ubiquity, scalability and reliability. In addition, the proposed UVS architecture constructs a distributed, heterogeneous and intelligent video surveillance network. As in cloud computing, users do not need to own the physical infrastructure, platform, or software. They consume resources as a service, in the forms of Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS), and pay only for the resources they use.
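As a minimal sketch of the open-loop error control mentioned above, the XOR parity scheme below lets a receiver rebuild any single lost packet in a block of k packets. It is an illustrative stand-in for FEC in general, not the particular code used in the paper's implementation.

```python
from functools import reduce

def xor_parity(packets: list[bytes]) -> bytes:
    """One repair packet: the byte-wise XOR of k equal-size data packets."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def recover_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost packet from the k-1 survivors plus the parity."""
    return xor_parity(survivors + [parity])

# Sender side: a block of three equal-size video payload packets.
data = [b"pkt0", b"pkt1", b"pkt2"]
parity = xor_parity(data)

# Receiver side: packet 1 was lost on the error-prone wireless link.
restored = recover_missing([data[0], data[2]], parity)
assert restored == b"pkt1"
```

Because the repair data travels with the stream, the receiver never has to request a retransmission, which is what makes such open-loop schemes attractive for real-time video.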

The paper is organized as follows. Section 2 presents the proposed UVS architecture from its physical to its logical level, and gives an example of decomposing a surveillance functionality related to multiple object tracking and object recognition. Section 3 presents the design and implementation of the proposed UVS architecture, describing how it is applied in our OpenIVS (Open Internet Video Surveillance) system (Wang et al. 2007) to achieve scalability, ubiquity, security and reliability on the Internet. Section 4 shows performance results on the quality of UVS service, including two value-added MIS services: fire detection and fall-incident detection in homecare surveillance. Conclusions and future work are presented in the final section.

2 UVS architecture

In recent years, researchers have proposed various automatic and smart video surveillance systems. Goshorn et al. (2007) proposed a cluster-based automated surveillance network that applies clustering techniques to cameras for detecting and tracking multiple persons' activities. The IBM Smart Surveillance System (S3) (Tian et al. 2008) further provides not only the capability to automatically monitor a scene but also the capability to manage surveillance data, perform event-based retrieval, receive real-time event alerts through a standard web infrastructure and extract long-term statistical patterns of activity. Snidaro et al. (2008) proposed an automatic multi-sensor surveillance system focusing on detecting and raising alarms when unauthorized humans try to cross into a harbor area. Kandhalu et al. (2009) presented a distributed real-time surveillance system, named OmniEye, with a large-scale deployment of video cameras over wireless mesh networks. Other researchers have addressed topics such as foreground analysis for real-time video surveillance (Tian et al. 2005), detection and tracking of moving objects (Connell et al. 2004), and automatic detection and indexing of video-surveillance-event shots (Foresti et al. 2002). In contrast, the distributed service architecture proposed for our UVS system further considers not only system integrity, scalability and ubiquity, but also communication reliability and security over the Internet. Moreover, cloud computing technologies can be further applied so that users enjoy effective UVS services in a more pervasive way.

An interesting aspect of the distributed decomposition of a logical functionality is that it naturally allows cooperation with other surveillance functionalities in a distributed computing environment. Moreover, the advantage lies not only in the parallel processing preserved by the proposed distributed architecture, which speeds up the recognition and extraction of surveillance information, but also in the applied multicast delivery scheme, which leads to more efficient use of communication resources over the Internet. To realize the proposed distributed service architecture for UVS, we first define the overall model in general terms. The bottom-up service architecture is divided into two levels, physical and logical, as illustrated in Fig. 2.
Fig. 2

UVS architecture from physical to logical level

2.1 Physical level

The physical level can be defined as a network of nodes and links. In real video surveillance systems, the following components belong to the physical level.
  • Nodes which capture surveillance images/video such as IP cameras.

  • Nodes which are image/video intelligent processing (IIP/VIP) units, such as servers for object tracking and object recognition.

  • Links which represent heterogeneous communication channels such as wireless channels and wired channels.

The physical level is designed for a general distributed video surveillance environment, without limitations or constraints on specific applications. Therefore, diversified surveillance applications can be effectively built on top of this level. The capacity of the communication links to deliver surveillance information, including high-bandwidth surveillance videos, is considered the primary issue.

Concerning communication links for practical UVS systems, high-speed wireless links (Keller and Hanzo 2000; Hanzo et al. 2000) are becoming more and more important for connecting cameras to VIP units to achieve ubiquity in video surveillance, as they allow higher flexibility and lower installation costs. The most widely used wireless surveillance cameras generally adopt H.264 source coding and robust wideband transmission techniques (e.g., direct-sequence spread spectrum) at high frequencies (e.g., the 2.4-GHz ISM band). Moreover, prevalent IP cameras on the market apply error-free protocols, such as HTTP/TCP of the Internet protocol suite, to deliver surveillance video packets at adaptive bit rates that depend on time-varying channel conditions. IP cameras can also apply the well-known link-layer IEEE 802.11b/g/n WLAN protocols with multiple-access schemes to connect to IIP/VIP units and furnish UVS.

On the other hand, if wired links (such as xDSL links, cable networks, and fiber optics) are used between surveillance cameras and IIP/VIP units, wired transmission channels can serve long-distance UVS connections (e.g., between hubs, routers, and control centers). The Internet is heterogeneous, comprising wireless and wired links of differing bandwidth. UVS should characterize the bottleneck links along the delivery path between a surveillance camera and the IIP/VIP units to maintain the quality of UVS service for later presentation and processing. For example, applying layered video coding techniques (Zink et al. 2005; Kim and Ammar 2005) can help UVS serve more Internet users with different access bandwidths.

2.2 Logical level

By applying the components of the physical level, the logical level can be effectively built to gestate more diversified video surveillance applications. As shown in Fig. 2, the major components of the logical level are summarized as follows.

2.2.1 Content provider (CP)

CPs are responsible not only for providing surveillance images/videos to other components through stream links, but also for performing operations on these images/videos and making them accessible to all applications. IP cameras, or hosts with attached cameras, can be CPs that provide raw or compressed captured image/video data. CPs also include stored-video servers from which other components can browse, review and process backup surveillance videos to extract more valuable surveillance events.

2.2.2 Service requester (SR)

MIS services for requested surveillance images/videos are provided by SRs. Under the SSMC and MSMC models, an SR may need surveillance images/videos in different coding formats from different CPs. Transcoding techniques (Dogan et al. 2001) may therefore be applied to translate differently coded surveillance images/videos into a common format for further processing. Moreover, an SR can play the role of a CP by providing its processed images to other SRs. This characteristic makes the proposed UVS architecture very suitable for distributed processing to speed up the applied MIS services.

2.2.3 Service provider (SP)

SPs are the key components of the proposed UVS architecture; they provide different kinds of surveillance services to SRs. When an SP receives a request from an SR for a certain video surveillance service, the SP has to find the locations of all CPs involved in the requested service. In this way, the proposed SSSC, SSMC, MSSC and MSMC surveillance service models can be fulfilled. SPs are also the components through which users directly access the UVS services provided by CPs: when users request UVS services, the SP finds the corresponding CP locations.

A service in UVS is an MIS service, i.e., a well-defined set of operations involved in the intelligent processing of surveillance video, such as event recognition and extraction. For example, the DOR service is a conventional surveillance service, used especially in airport security systems, that determines whether a dangerous object is present in passengers' luggage. The DOR service decomposes a given functionality into several sub-services that are responsible for recognizing various objects, raising alarms and presenting results.

MIS services are logical entities and need to be developed on top of the physical level. The distribution architecture of UVS is described as the process of mapping the logical level, defined by a set of service chains, onto these levels. We also adopt ideas from the DAVIC architecture (http://www.davic.org/). There are two kinds of service components, service providers and service requesters, for accessing UVS services. The remaining component of the logical level is the CP, which serves as the interface between the logical and physical levels. SPs are entities used for requesting services and for presenting or processing surveillance information, including events, alarms and surveillance images/videos received from SRs. SRs are entities that map SP requests onto the proposed service models, send the corresponding content requests to CPs, and then provide the requested services to SPs. The content needed by SRs or SPs is supplied by CPs, which deliver surveillance data, including images/videos captured by cameras (e.g., CCTV, CCD and IP cameras) at the physical level of the proposed UVS architecture.

The relations among SPs, SRs and CPs are also shown at the top of Fig. 2. There are two kinds of communication links in the proposed UVS architecture: command links and stream links, through which the above components of the logical level exchange commands and surveillance images/videos, respectively. The advantage of adopting the formal DAVIC architecture is high compatibility and easy integration with present video streaming services, such as video on demand for stored surveillance videos and IPTV for live public surveillance videos.

An example at the logical level is a generic traffic surveillance system for a city or highway. Tens to hundreds of cameras are installed on city streets and highway spots. Automatic alarms/acknowledgements of traffic jams or incidents at the surveillance spots are provided by the MIS service running on the system's IIP/VIP units. The MIS service applies image/video processing techniques to detect traffic events at the surveillance spots and then informs operators in the remote control center to pay attention right away. The operators can then use the camera control service to zoom in/out and pan/tilt the surveillance camera to further clarify the alarms/events.

The above automatic alarm/acknowledgement service, MIS service and camera control service belong to the logical level of the proposed UVS architecture, while the cameras and the communication links connecting cameras to IIP/VIP units belong to the physical level. The functionality of a video surveillance system can thus be decomposed into basic processing components that provide “services”. The following subsections give more details on how the three components cooperate at the logical level and which applications fit the proposed service models.

2.3 Cooperation in UVS components and applications

The UVS architecture can accomplish the four service models of 3GSS (i.e. SSSC, SSMC, MSSC and MSMC) via the three proposed system components (CP, SR and SP). In the following, we describe and demonstrate not only how the system components cooperate, but also which applications conform to these four service models.

The SP maintains the information of all surveillance services, including the service description, which CP and/or SR provides the service, where that CP and/or SR is located, and other information needed by SRs. Thus, command links exist over which all CPs and SRs register their provided services with the SP. Service streams between CPs and SRs can then be transmitted directly, without relaying through the SP. The advantage is not only a lower latency for the service streams but also a lighter load on the SP.
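The registration and lookup role of the SP described above can be sketched as a simple in-memory registry. The record fields below (service name, provider address, description) are illustrative assumptions rather than the paper's actual message format.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    name: str          # e.g. "fire-detection"
    provider: str      # host:port of the CP or SR offering the stream
    description: str = ""

@dataclass
class ServiceProviderRegistry:
    """Minimal SP-style registry: CPs/SRs register over command links, and
    SRs look up where to pull a stream from, so that the stream itself
    never has to pass through the SP."""
    records: dict = field(default_factory=dict)

    def register(self, rec: ServiceRecord) -> None:
        # Several CPs/SRs may offer the same logical service (SSMC/MSMC).
        self.records.setdefault(rec.name, []).append(rec)

    def lookup(self, name: str) -> list:
        return self.records.get(name, [])

registry = ServiceProviderRegistry()
registry.register(ServiceRecord("fire-detection", "10.0.0.5:5004"))
providers = registry.lookup("fire-detection")
```

After the lookup, the requester contacts the returned provider address directly, which is what keeps the SP off the data path.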

First of all, SSSC is the most prevalent of the four surveillance service models today, and also the simplest. A common SSSC example is an IP camera, or an instant messaging service featuring live audio or video streaming, such as MSN or Google Talk, where live video can easily be shared with the talking peer. The SSSC model can be applied to video surveillance applications as shown in Fig. 3: user 1 and user 2 each play the roles of SR and CP at the same time, and a single SP is responsible for maintaining the service information of the two users.
Fig. 3

SSSC application

Like SSSC, the MSSC service model is usually used for video surveillance applications covering a small surveillance region, since it uses a single camera. Video surveillance for electronic toll collection (ETC) on a highway is an example of MSSC. It includes several MIS services, such as license plate recognition, ETC lane traffic statistics, and car speed measurement. Figure 4 shows the ETC surveillance application under the MSSC service model: a camera located at the ETC lane plays the role of a CP and provides the surveillance images/video for the diverse surveillance services mentioned above.
Fig. 4

MSSC application

In Fig. 4, two MIS servers, for license plate recognition and car speed measurement, are illustrated. Moreover, an MIS server can itself play the role of a CP by providing its processed images to other MIS services; for example, the license plate images could feed another MIS service, such as a stolen-car recognition system over the Internet.

Because the proposed UVS architecture multicasts the surveillance video to multiple MIS servers, independent MIS servers simultaneously start to process the surveillance images/video and provide valuable surveillance information to each client. Moreover, the transmission load of the CP is relieved, so the MSSC model is supported cost-effectively.

SSMC is used for 3GSS real-time object tracking/recognition across different cameras with time/location dependencies. For example, detecting a car changing lanes while moving through a tunnel (Bramberger et al. 2006) is illustrated in Fig. 5 as an SSMC application. Moreover, although small-region object tracking/recognition (Tan et al. 1994; Regazzoni and Tesei 1996; Foresti and Regazzoni 1995; Bogaert et al. 1996; Collins et al. 2000) can be handled by the single-camera service models (i.e. SSSC and MSSC), the rapidly increasing cost of tracking/recognizing multiple objects becomes a burden on a single camera. Unlike the application shown in Fig. 4, the distributed SSMC architecture can also be applied to surveillance applications using multiple independent cameras to speed up the extraction of surveillance events.
Fig. 5

SSMC application

Finally, MSMC is used for complex, large-scale surveillance environments and subsumes the above three models, integrating multiple MIS services with multiple cameras. Figure 6 shows an example video surveillance application under the MSMC service model: multiple cameras from two kinds of on-line video surveillance security systems, building and airport security, are accessed by two common services, video storage and object detection. Thus, by applying the proposed UVS architecture under the MSMC model, many different MIS services from different areas of expertise, each detecting or extracting different surveillance events, can be integrated in the SP to cost-effectively gestate more diversified surveillance applications for users.
Fig. 6

MSMC application

3 Design and implementation of UVS

In this section, we present the design and implementation of the proposed architecture via a prototype system named OpenIVS. OpenIVS constructs a framework that realizes the effective distributed service architecture for UVS over the Internet. To fulfill the objectives of ubiquity, privacy, reliability, interoperability, and scalability in UVS, OpenIVS is logically composed of four major subsystems that effectively map onto UVS's three logical components (i.e. SP, SR and CP) in the four service models. These four major subsystems are briefly described as follows:
  • Multicast agents (MAs) play the role of the MBone, providing UVS with scalability via application-layer multicast: they connect the Internet multicast islands where the CPs and SRs are located to construct the multicast overlay network for UVS (so-called UVSMON) (Chen et al. 2009). Security and reliability of UVS can also be achieved via MAs.

  • The Super Agent (SA) is UVS's central agent. It is located in the public Internet and corresponds closely to the SP for UVS users. CPs and SRs located in private networks become accessible over the Internet through the SA's help, so that UVS achieves ubiquity. Before asking the SA for help, CPs and SRs must first register their connection information, including surveillance location, host address/port, video type, etc., with the SA so that they are accessible to other CPs and SRs.

  • The customary subsystem makes the previous generations of surveillance systems on the market (i.e. 1GSS and 2GSS) comply with 3GSS in the proposed UVS. It provides agents (Wang et al. 2003; Wang et al. 2008) that connect the isolated surveillance services of previous-generation surveillance systems to the Internet, giving UVS interoperability.

  • Extended service subsystems are extended surveillance services gestated from the subsystems mentioned above. They can in turn act as CPs or SRs in the proposed UVS architecture and demonstrate its cost-effectiveness and scalability in diversified surveillance applications.

Having briefly introduced the four major subsystems of the UVS implementation, we now present in further detail how these subsystems help UVS achieve ubiquity, privacy, reliability, interoperability and scalability.

3.1 Ubiquity implementation via MA and SA

NAT technology is now widely used to provide Internet access from private networks. In OpenIVS, for example, an sMA located behind a NAT device can hardly receive either service requests or surveillance images, which would damage the ubiquity of OpenIVS. To efficiently achieve ubiquity in UVS via the proposed forwarding agents, the procedure is as follows. When the source MA (sMA), which is in the same multicast island as a camera-ready CP, needs to deliver the surveillance video to a destination MA (dMA), which is in the same multicast island as another SR or CP requesting that video, the sMA joins the same multicast group as the camera-ready CP to receive the surveillance video packets. As shown in Fig. 7, the sMA encapsulates each received packet with a header containing the multicast group address and multicast port number, and unicasts it to the dMA. When the dMA receives these unicast packets, it de-encapsulates them and multicasts them onto the dMA's subnet according to the header information. Remote SRs or CPs in the same multicast subnet as the dMA then receive the surveillance video from the remote sMA for further processing.
Fig. 7

Multicast surveillance videos delivered from sMA to dMA via unicast tunnel
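The encapsulation step above can be sketched with a fixed-size header carrying the group address and port. The 6-byte header layout here is an illustrative assumption, not the actual OpenIVS packet format.

```python
import socket
import struct

# Hypothetical tunnel header: 4-byte IPv4 group address + 2-byte port,
# packed in network byte order, followed by the original multicast payload.
HEADER = struct.Struct("!4sH")

def encapsulate(group: str, port: int, payload: bytes) -> bytes:
    """sMA side: wrap a multicast packet for unicast tunnelling to the dMA."""
    return HEADER.pack(socket.inet_aton(group), port) + payload

def de_encapsulate(packet: bytes) -> tuple[str, int, bytes]:
    """dMA side: recover the group/port so the payload can be re-multicast locally."""
    raw_group, port = HEADER.unpack_from(packet)
    return socket.inet_ntoa(raw_group), port, packet[HEADER.size:]

tunneled = encapsulate("239.192.0.1", 5004, b"video-frame")
group, port, payload = de_encapsulate(tunneled)
```

On arrival, the dMA would resend `payload` to (`group`, `port`) on its own subnet, making the tunnel transparent to local receivers.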

We further distinguish two kinds of video forwarding paths from sMA to dMA, listed below, and elaborate on how the surveillance video requested by a remote SR/CP is forwarded from a camera-ready CP in a multicast island.
  • dMA in public network

If the dMA is located in a public network, it first issues a command to the SA to request forwarding of the surveillance video from a specific surveillance spot. Because the SA maintains all the information needed for remote CPs or SRs to connect to the available surveillance services, the SA finds the corresponding sMA and forwards the request command to it. Therefore, as shown in Fig. 8, surveillance videos multicast by the camera-ready CP are forwarded directly from the sMA to the remote dMA in the public network through the unicast tunnel.
Fig. 8

Communication procedure while dMA is in public network

  • dMA in private network

If the dMA is located in a private network, it cannot receive any packets from the sMA directly. Hence, the SA serves as an intermediate relay between the sMA and dMA to forward packets. As shown in Fig. 9, the dMA sends a command via the SA to the sMA requesting the multicast surveillance video from the camera-ready CP. The sMA then transmits the multicast surveillance video to the dMA through the SA over two unicast tunnels. As a result, SRs or CPs located in a private network can access surveillance video from a remote CP, whether that CP is located in a public or a private network, with the help of the public SA.
Fig. 9

Communication procedure while dMA is in private network

3.2 Privacy protection in MA and SA

Because surveillance videos usually involve privacy, we also take the security of the proposed UVS over the public Internet into account. As mentioned earlier, the SA is a centralized component that contains all the important information of UVS; authentication procedures prevent malicious users from accessing this valuable information. However, sensitive surveillance videos must also be protected from eavesdroppers, since they are transported over the public Internet. We therefore focus below on the security of surveillance video delivered between the SA and the MAs.

Considering both security strength and the real-time constraint of surveillance video, OpenIVS applies the well-known symmetric cipher AES as the encryption and decryption algorithm to protect sensitive surveillance video content during forwarding. In addition, the Diffie-Hellman key negotiation algorithm (DH) is applied not only to negotiate an encryption key, but also to update it periodically to strengthen UVS security.

In OpenIVS, once the key negotiation procedure is complete, surveillance video can be transmitted by the procedures described in Section 3.1. Key negotiation in OpenIVS follows two paths: MA (i.e., sMA or dMA) to SA, and sMA to dMA. The key negotiation procedures for these two paths are illustrated in Figs. 10 and 11, respectively.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig10_HTML.gif
Fig. 10

Key negotiation between MA and SA

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig11_HTML.gif
Fig. 11

Key negotiation between dMA and sMA

For the cases described in Section 3.1, Fig. 10 shows the key negotiation procedure between an MA (i.e., sMA or dMA) and SA. By exchanging their public keys and applying DH, the MA and SA negotiate a common secret key (session key) that protects the privacy of the surveillance videos forwarded between them. Likewise, before surveillance videos are transmitted directly between sMA and dMA, the two MAs negotiate a common session key with SA's help. Figure 11 presents this key negotiation procedure between sMA and dMA via SA. Once the key negotiation procedure is done, sMA sends the encrypted surveillance video to dMA.
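The MA-to-SA exchange can be sketched as follows. The parameters below are toy-sized for illustration only (a real deployment would use a standardized large prime group), and the AES step itself is omitted; only the DH agreement and session-key derivation are shown:

```python
# Sketch of DH session-key negotiation between an MA and the SA,
# followed by derivation of a 128-bit AES session key. Toy-sized
# group parameters; NOT suitable for real use.
import hashlib

p = 2**127 - 1        # a Mersenne prime, far too small for production
g = 3

ma_secret = 123456789   # MA's private exponent (illustrative)
sa_secret = 987654321   # SA's private exponent (illustrative)

ma_public = pow(g, ma_secret, p)   # exchanged in the clear
sa_public = pow(g, sa_secret, p)

# Each side combines its own secret with the peer's public value;
# both arrive at the same shared secret.
ma_shared = pow(sa_public, ma_secret, p)
sa_shared = pow(ma_public, sa_secret, p)
assert ma_shared == sa_shared

# Derive a 128-bit AES session key from the shared secret. Re-running
# the exchange with fresh exponents gives the periodic key update
# described in the text.
session_key = hashlib.sha256(ma_shared.to_bytes(16, "big")).digest()[:16]
print(len(session_key))   # 16
```

The sMA-to-dMA path in Fig. 11 works the same way, except that the public values are exchanged through SA rather than directly.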

3.3 Scalability for extended services

To demonstrate the practicability of a ubiquitous UVS, we also add three extended services as examples to the proposed architecture: a video recording subsystem (VRS), a fire detection subsystem (FDS), and a fall incident detection subsystem (FIDS).
(1) VRS
Video recording is an essential function of a video surveillance system. In the proposed architecture, we implement video recording as an independent subsystem. In this way, we can not only keep multiple backups of the surveillance video on several servers according to the importance of the surveillance spot, but also reduce the load on the recording servers by distributing them. VRS records surveillance videos to a file system and provides query interfaces for reviewing the stored video.
(2) FDS
Figure 12 shows the implemented vision-based fire detection (Liu and Ahuja 2004), which has become an important application of surveillance systems. It can serve as a supplement to other fire detection methods, such as particle sampling and temperature sampling, which require close proximity to the fire. The implemented fire detection algorithm takes color video as input and classifies image pixels as flames through a predefined color table (Phillips et al. 2002). The temporal variation properties of fire are further analyzed to decide whether a fire event has occurred in a video sequence (Celik et al. 2007). FDS is an independent component that can easily be added to any subnet to receive the multicast videos from an MA.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig12_HTML.gif
Fig. 12

The implemented FDS
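As a simplified stand-in for the color-table lookup used by FDS, the sketch below applies an RGB rule common in the fire-detection literature: a pixel is flame-like when its red channel dominates. The threshold value is illustrative, not taken from the paper's actual table:

```python
# Simplified flame-pixel rule (stand-in for FDS's predefined color
# table): a pixel is flame-like when R exceeds a threshold and the
# channels are ordered R >= G > B. Threshold is illustrative.
R_T = 190

def is_fire_pixel(r, g, b):
    return r > R_T and r >= g > b

def fire_pixel_ratio(frame):
    """frame: iterable of (r, g, b) tuples; returns flame-pixel fraction."""
    pixels = list(frame)
    hits = sum(is_fire_pixel(*p) for p in pixels)
    return hits / len(pixels)

# Two flame-colored pixels, one blue, one gray:
frame = [(255, 160, 40), (255, 120, 30), (20, 40, 200), (90, 90, 90)]
print(fire_pixel_ratio(frame))   # 0.5
```

The temporal analysis mentioned in the text would then track how this per-frame ratio flickers over a video sequence before declaring a fire event.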

(3) FIDS
Figure 13 shows the fall incident detection subsystem, which monitors a person's behavior indoors and detects emergency events (Juang and Chang 2007). The human body posture is classified by a support vector machine (http://www.csie.ntu.edu.tw/~cjlin/libsvm), and FIDS raises an emergency alarm when a person falls or remains lying for a long time. Like FDS, FIDS is an independent component that can easily be added to any subnet to receive the multicast videos from an MA.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig13_HTML.gif
Fig. 13

The implemented FIDS
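The paper classifies postures with libsvm; as a simplified stand-in, the sketch below uses only the bounding-box aspect ratio of the detected person, which already separates the coarse posture classes involved. The thresholds and the persistence window are illustrative, not the paper's trained parameters:

```python
# Simplified stand-in for FIDS's SVM posture classifier: a bounding
# box's width/height ratio distinguishes standing, sitting and lying.
# Thresholds and the lying-persistence window are illustrative.

def classify_posture(width, height):
    ratio = width / height
    if ratio < 0.6:
        return "standing"
    elif ratio < 1.3:
        return "sitting"
    return "lying"

def detect_fall(postures, lying_threshold=100):
    """Raise an alarm when 'lying' persists for lying_threshold frames."""
    run = 0
    for p in postures:
        run = run + 1 if p == "lying" else 0
        if run >= lying_threshold:
            return True
    return False

# A person standing, then lying motionless for 120 frames:
frames = [(40, 170)] * 3 + [(150, 50)] * 120
postures = [classify_posture(w, h) for w, h in frames]
print(detect_fall(postures))   # True
```

A lying posture that soon returns to sitting would instead trigger the warning case described later in the experiments.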

4 Theoretic analysis and experimental results

In this section, we demonstrate the performance of UVS by theoretical analysis and by experiments on the OpenIVS test-bed shown in Fig. 14.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig14_HTML.gif
Fig. 14

Experimental environment

4.1 Theoretic analysis

To demonstrate the scalability of UVS, we analyzed the bandwidth cost and service latency of the proposed architecture. From the point of view of the surveillance camera, both 3GSS and UVS can be modeled as tree structures, as shown in Figs. 15 and 16, respectively. Several services (N service requesters) require the surveillance images captured by the camera. In the proposed architecture, the camera and the services correspond to a CP and SRs, respectively. In the conventional 3GSS architecture, a service bottleneck occurs at the camera because the surveillance video is unicast to each requester. When N requesters simultaneously request the same surveillance video from a CP, the total traffic load B3GSS in the camera-ready CP's network grows linearly with N. B3GSS is given in Eq. 1, where bvideo denotes the bandwidth cost of a single surveillance video stream.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig15_HTML.gif
Fig. 15

A tree structure model for the traditional surveillance architecture

$$ {B_{3GSS}} = \sum\limits_{i = 1}^N {{b_{video}}} = N \times {b_{video}} $$
(1)
$$ {L_{3GSS}} = \frac{{{\rho_{Network}}}}{{1 - {\rho_{Network}}}} \approx \frac{{ \frac{{{B_{3GSS}}}}{{{B_{Network}}}} }}{{1 - \frac{{{B_{3GSS}}}}{{{B_{Network}}}} }} = \frac{{N \times {b_{video}}}}{{{B_{Network}} - N \times {b_{video}}}},\forall {B_{3GSS}} < {B_{Network}} $$
(2)
Applying queueing theory, the queue length L3GSS in the CP buffer for conventional 3GSS is given in Eq. 2, and by Little's law the queueing delay T3GSS of the service latency follows in Eq. 3. Here ρNetwork denotes the utilization of the CP's network bandwidth BNetwork. Since the surveillance video traffic B3GSS usually dominates the other traffic in the CP's network, ρNetwork can be approximated by the ratio of B3GSS to BNetwork.
$$ {T_{3GSS}} = \frac{{{L_{3GSS}}}}{{{B_{3GSS}}}} = \frac{1}{{{B_{Network}} - N \times {b_{video}}}} $$
(3)
In the proposed architecture, the tree model can be reconfigured as in Fig. 16. Owing to IP multicast, only a single copy of the video traffic traverses a subnet. Thus, the video traffic load in the camera-ready CP's network no longer depends on the number of requesters (SRs) behind the same MA. In the best case, where all SRs are behind the same MA, the bandwidth cost BUVS and queueing delay TUVS at the CP's network are as follows:
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig16_HTML.gif
Fig. 16

A tree structure model for the proposed architecture

$$ {B_{\rm{UVS}}} = {b_{video}} $$
(4)
$$ {T_{\rm{UVS}}} = \frac{{{L_{\rm{UVS}}}}}{{{B_{\rm{UVS}}}}} = \frac{1}{{{B_{Network}} - {b_{video}}}} $$
(5)
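A numerical reading of Eqs. 1-5 makes the contrast concrete. The bandwidth figures below are illustrative (in Mbps), not measurements from the test-bed:

```python
# Queueing delay at the camera-ready CP's network: 3GSS sends N
# unicast copies (Eq. 3), UVS's best case sends one multicast copy
# (Eq. 5). Bandwidth values are illustrative, in Mbps.
B_NETWORK = 100.0   # CP's network bandwidth B_Network
B_VIDEO = 2.0       # one surveillance video stream b_video

def t_3gss(n):
    # Eq. 3: T = 1 / (B_Network - N * b_video), valid while N*b < B
    assert n * B_VIDEO < B_NETWORK
    return 1.0 / (B_NETWORK - n * B_VIDEO)

def t_uvs():
    # Eq. 5: a single multicast stream leaves the CP regardless of N
    return 1.0 / (B_NETWORK - B_VIDEO)

for n in (1, 10, 40):
    print(n, t_3gss(n) / t_uvs())
```

For N = 1 the two delays coincide; as N approaches BNetwork/bvideo the 3GSS delay diverges while the UVS best case stays constant.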
When SRs are located outside the local MA, we must also consider the queueing delays incurred as MAs forward the surveillance video to remote SRs. In the worst case, all SRs are located in different subnets, and the MA forwards N copies of the surveillance video from its local CP to N remote MAs. The resulting traffic cost BUVS_worst in the local MA's network and the service latency TUVS_worst to the N remote MAs are then the same as in 3GSS, i.e., N unicast streams to different locations, as shown in Eqs. 6 and 7, respectively. TlMA-rMA denotes the queueing delay for forwarding the N video streams from the local MA to the remote MAs.
$$ {B_{UVS\_ worst}} = N \times {b_{video}} $$
(6)
$$ {T_{{\rm{UVS}}\_ {\rm{worst}}}} = {T_{{\rm{lMA - rMA}}}} = \frac{1}{{{B_{Network}} - N \times {b_{video}}}} $$
(7)
With the load balancing (LB) scheme (Chen et al. 2009) applied to the MAs in UVS, the local MA serving the camera-ready CP shares its traffic load with the other N MAs in the worst case, where all SRs are located in different multicast islands. The queueing delay of the service latency in this worst case may therefore be reduced below that of 3GSS. Since the N remote MAs form dynamic tree structures under different resource constraints, we derive the average case under the assumption that each MA forwards at most m video streams to other MAs, which yields a better service latency than UVS without load balancing. The service latency TUVS_LB of UVS with load balancing is estimated as follows:
$$ {T_{{\rm{UVS}}\_ {\rm{LB}}}} = {T_{\rm{UVS}}} + {T_{{\rm{LB\_ MA}}}} \approx \frac{1}{{{B_{Network}} - {b_{video}}}} + \sum\limits_{i = 0}^{{{\log }_m}N - 1} {\frac{{{m^i}}}{{{B_{Network}} - m \times {b_{video}}}}}, m < N $$
(8)
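Eqs. 6-8 can be evaluated in the same style. The sketch below assumes the N remote MAs form an m-ary forwarding tree of depth log_m(N) (rounded down when N is not a power of m, an assumption of ours); bandwidth values are again illustrative:

```python
# Worst-case UVS latency (Eq. 7) versus the load-balanced variant
# (Eq. 8), where each MA forwards at most m streams. Values in Mbps
# are illustrative; tree depth is floored for non-integer log_m(N).
import math

B_NETWORK = 100.0
B_VIDEO = 2.0

def t_uvs_worst(n):
    # Eq. 7: the local MA unicasts N streams, just as in 3GSS
    return 1.0 / (B_NETWORK - n * B_VIDEO)

def t_uvs_lb(n, m):
    # Eq. 8: local term T_UVS plus one queueing term per tree level,
    # with m^i MAs forwarding in parallel at level i
    assert 1 < m < n
    levels = int(math.log(n, m))
    t = 1.0 / (B_NETWORK - B_VIDEO)
    for i in range(levels):
        t += m**i / (B_NETWORK - m * B_VIDEO)
    return t

print(t_uvs_worst(48), t_uvs_lb(48, 4))
```

With these numbers (N = 48 streams at 2 Mbps nearly saturating a 100 Mbps link), the load-balanced tree avoids the near-saturation blow-up of the single-MA worst case.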
If some of the N SRs are located in the same multicast island, the traffic load from the CP to all SRs is less than in the conventional 3GSS architecture. For the average case of UVS traffic, there are several possibilities, including all SRs behind one MA, one SR behind one MA and the rest behind another, and so on. Averaging over these cases, the overall traffic load of the proposed UVS is as follows.
$$ {B_{{\rm{UVS\_ average}}}} = \frac{{\sum\limits_{i = 1}^N {i \times {B_{\rm{UVS}}}} }}{N} = \frac{{{(}N + {1)} \times {B_{\rm{UVS}}}}}{{2}} $$
(9)

Elaborating all possible placements of the SRs, from all SRs behind one MA to every SR behind a different MA, yields the average traffic load BUVS_average shown in Eq. 9. The analysis shows that the proposed architecture reduces the average traffic load from the CP by about 50%. Thus, diversified video surveillance applications of the MSMC service model can benefit from the proposed UVS architecture.

4.2 Experiments and performance results

For UVS, the image/video quality after transmission is very important for object recognition at the remote site. We therefore ran several experiments on the OpenIVS test-bed to evaluate the playback quality of H.263-compressed surveillance video when AES encryption (for privacy) and FEC error control (for reliability) are applied. Figure 14 shows the experimental environment. There are four multicast islands: the NTU R125A lab, located at National Taiwan University in Taipei City; the MCU s206 lab, located in Taoyuan County; the NCNU lab, located in Nantou County; and a private network, also located in Taipei. Each multicast island has an MA to forward its surveillance video. A client located in the private network displays the surveillance videos from the surveillance spots in these multicast islands.

Figure 17 shows the average sending and receiving rates when AES encryption is applied to the surveillance video in OpenIVS. The frame rates at the camera and the client both ranged between 14.5 and 15 fps. The average sending and receiving rates are almost equal, which indicates that the surveillance video plays smoothly and that the three remote multicast islands were successfully connected through the proposed forwarding agents (MAs and SA) without degrading the playback quality. Note that because the surveillance video at the local surveillance spot contains less motion, its sending and receiving rates are lower than those of the other surveillance spots.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig17_HTML.gif
Fig. 17

Average sending and receiving rates at each site

We also compared the variations of the sending and receiving rates at MCU over the same period with the AES-based privacy protection applied on the forwarding agents (see Fig. 18). The results indicate that the privacy protection does not degrade the transmission rates (the variations of the sending and receiving rates remain very close), so we can deliver surveillance video securely over the Internet without affecting playback quality.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig18_HTML.gif
Fig. 18

Sending and receiving rates with and without the security mechanism applied

Finally, we ran a simulation to measure the reliability that OpenIVS achieves with FEC error control. We sent 5,000 multicast packets at 225 kbps, each with a 512-byte payload, adding 25 FEC packets for every 50 video surveillance packets. In the simulation over the private network in a Wi-Fi environment, 103 packets were lost at the receiver, a packet loss rate of 2.05%. With FEC applied during the transmission, all lost packets were recovered at the receiver. The packet loss rate can thus be reduced to zero whenever the extra bandwidth cost is acceptable in the networks of OpenIVS. According to these results, FEC is very helpful for recovering lost packets and preserving playback quality in UVS over the Internet, especially in the now-prevalent wireless environments.
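The simulation setup can be re-created in a few lines. The sketch below assumes an ideal (75, 50) erasure code, so each block of 50 video plus 25 FEC packets survives up to 25 erasures; the actual FEC code used by OpenIVS is not modeled, and the loss probability is set near the measured ~2% Wi-Fi loss rate:

```python
# Re-creation of the FEC experiment under an ideal erasure-code
# assumption: 50 video + 25 FEC packets per block, recoverable as
# long as at most 25 packets of the block are lost.
import random

random.seed(7)
DATA, FEC = 50, 25
BLOCKS = 100          # 100 blocks -> 5,000 video packets
LOSS_P = 0.02         # per-packet loss probability (~ measured rate)

lost_before = lost_after = 0
for _ in range(BLOCKS):
    erasures = sum(random.random() < LOSS_P for _ in range(DATA + FEC))
    lost_before += erasures
    if erasures > FEC:            # unrecoverable only past 25 erasures
        lost_after += erasures - FEC

print(lost_before, lost_after)
```

With ~1.5 expected losses per 75-packet block, exceeding the 25-erasure budget is vanishingly unlikely, which matches the observation that all lost packets were recovered.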

We also ran experiments to demonstrate the efficiency of FDS and FIDS in different scenes. To evaluate FDS, we collected several kinds of videos, with and without fire scenes, and played each video continuously in full screen. A web camera captured the displayed screen to simulate a live scene.

Table 1 shows the experimental results of FDS. Figure 19 shows a Christmas tree on fire; almost every frame of this film contains fire objects, and FDS detects nearly all of them. Figure 20 shows a fire extinguisher instruction film, which is used to test whether FDS can track changes in the fire scene.
Table 1

Experimental results of FDS

| Film description | Total frames | Frames with fire objects | False positives | False negatives |
|---|---|---|---|---|
| Fired building | 416 | 416 | 0 | 5 |
| Fired Christmas tree | 1,782 | 1,757 | 0 | 68 |
| Fire extinguisher instruction 1 | 499 | 182 | 22 | 0 |
| Fire extinguisher advertisement | 265 | 163 | 0 | 69 |
| Fire extinguisher instruction 2 | 259 | 217 | 0 | 36 |
| Waving national flag | 356 | 0 | 11 | 0 |
| A part of a movie | 167 | 116 | 6 | 71 |
| Fired room | 160 | 111 | 0 | 78 |
| Sunrise over the ocean | 521 | 0 | 332 | 0 |
| A film without fire scene | 194 | 0 | 11 | 0 |

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig19_HTML.jpg
Fig. 19

Fired Christmas tree

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig20_HTML.jpg
Fig. 20

Fire extinguisher instruction 1

We then ran experiments to verify that FIDS detects falling events. When a person falls and remains lying for a period of time, an emergency alarm is raised. When a person falls and then changes posture from lying to sitting, a warning event is issued; once the person keeps sitting for a period of time, there is a high possibility that he or she needs help to stand up. Figure 21 shows the images received from the SP at frames 151, 155, 159, 173, 313, 463, 613 and 723. Figure 22 shows the posture classification by the SVM. Figure 23 shows the average value over every 50 frames; FIDS reports a lying posture when its occurrence count exceeds a threshold within the succeeding 100 frames. Figure 24 shows the emergency alarm raised every 200 frames while a falling event is detected. The experimental results show that FIDS successfully detects falling events.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig21_HTML.jpg
Fig. 21

Received fall and lying-down frames

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig22_HTML.gif
Fig. 22

Posture classification by SVM

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig23_HTML.gif
Fig. 23

Average value for every 50 frames

https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig24_HTML.gif
Fig. 24

Emergency event alarm for every 200 frames

Moreover, an efficient distributed object recognition scenario, shown in Fig. 25, can also be realized in the proposed architecture. A captured surveillance image is passed through the proposed architecture to several object recognition services, e.g., FDS and FIDS. Here, FDS and FIDS run on different PCs connected via the Internet and process the surveillance images from the same camera source. As a result, FDS and FIDS can recognize both kinds of events at the same time in a distributed processing environment.
https://static-content.springer.com/image/art%3A10.1007%2Fs10796-010-9255-z/MediaObjects/10796_2010_9255_Fig25_HTML.gif
Fig. 25

Distributed object recognition scenario

5 Conclusion

Surveillance systems play an important role in protecting human lives and property. As technology advances, surveillance systems will move toward diversified, ubiquitous, multi-service intelligent applications. In this paper, we first identified four abstract service models for UVS systems. Second, we performed a logical decomposition from the point of view of services as a basic step to define the logical components associated with a given surveillance functionality. A novel distributed UVS architecture can be constructed from these logical components to preserve ubiquity, privacy, reliability, interoperability and scalability. We then designed and implemented a distributed UVS prototype called OpenIVS. Three kinds of extended services were implemented and shown to effectively detect fire and fall-incident features while recording the captured video at the same time. Moreover, our experiments showed that OpenIVS delivers good playback quality. Finally, we analyzed the bandwidth cost at the CP; the analysis shows that the proposed distributed service architecture can reduce the bandwidth cost at the CP by about half in UVS systems.

Copyright information

© Springer Science+Business Media, LLC 2010