In home environments, context knowledge is necessary for activity analysis: lying on a sofa has a very different interpretation than lying on the floor. Without context information, normal lying on a sofa might be classified as an unusual activity. Keeping this important aspect in mind, we propose a mechanism that learns the scene context model in an unsupervised way. The proposed context model contains two levels of information: block-level information, which is used to generate features for the direct classification process, and zone-level information, which is used to confirm the classification results.
The segmentation of the moving person from the background is the first step in our activity analysis mechanism. The moving person is detected and refined using a combination of color- and gradient-based background subtraction methods. We use mixture-of-Gaussians background subtraction with three distributions to identify foreground objects; increasing the number of distributions does not improve segmentation in indoor scenarios. The effects of local illumination changes, such as shadows and reflections, and of global illumination changes, such as switching lights on or off or opening and closing curtains, are handled using gradient-based background subtraction, which provides the contours of the moving objects; only valid objects have contours at their boundary. The resulting silhouette is processed further to define key points, namely the center of mass, the head centroid position H, and the feet or lower-body centroid position, using connected component analysis and ellipse fitting [14, 23]. The defined key points of the silhouette are then used to learn the activity and inactivity zones. These zones are represented in the form of polygons, which allows easy and fast comparison with the current key points.
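As a rough illustration of the key-point step, the following sketch extracts a center of mass and approximate head and feet centroids from a binary silhouette mask. The top/bottom-fifth split and all names here are our own assumptions; the actual method uses connected component analysis and ellipse fitting [14, 23].

```python
# Sketch: extracting key points from a binary silhouette mask.
# The mask is a list of rows; 1 = foreground pixel. Function name and
# the fifth-of-height heuristic are illustrative assumptions.

def silhouette_key_points(mask):
    """Return (center_of_mass, head_centroid, feet_centroid) as (x, y)."""
    pts = [(x, y) for y, row in enumerate(mask)
                  for x, v in enumerate(row) if v]
    if not pts:
        return None
    ys = [y for _, y in pts]
    y_min, y_max = min(ys), max(ys)
    height = y_max - y_min + 1

    def centroid(points):
        n = len(points)
        return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

    com = centroid(pts)
    # Approximate head/feet centroids from the top and bottom fifths of
    # the silhouette (an assumption, not the paper's exact rule).
    head = centroid([p for p in pts if p[1] < y_min + max(1, height // 5)])
    feet = centroid([p for p in pts if p[1] > y_max - max(1, height // 5)])
    return com, head, feet
```

In a real pipeline the mask would come from the background subtraction stage, and an ellipse fit would refine the head estimate.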
3.1 Learning of activity zones
Activity zones represent areas where a person usually walks. The scene image is divided into non-overlapping blocks, which are then monitored over time to record certain parameters from the movements of persons. The blocks through which the feet centroids, or in the case of occlusions the lower-body centroids, pass are marked as floor blocks.
Algorithm 3.1: Learning of the activity zones (image)

Step 1: Initialize
    divide the scene image into non-overlapping blocks
    for each block set the initial values
        count ← 0
        timestamp ← 0
Step 2: Update blocks using body key points
    for t ← 1 to N
        mark the block containing the feet (or lower-body) centroid as a floor block
        increment the block's count and set its timestamp to t
Step 3: Refine the block map and define activity zones
    for each floor block
        topblk ← block at the top of the current block
        toptopblk ← block at the top of topblk
        rightblk ← block to the right of the current block
        rightrightblk ← block to the right of rightblk
        perform the block-level dilation process:
            if topblk = 0 ∩ toptopblk ≠ 0 then mark topblk as a floor block
            if rightblk = 0 ∩ rightrightblk ≠ 0 then mark rightblk as a floor block
    perform connected component analysis on the refined floor blocks to find clusters
    delete the clusters containing just a single block
    define the edge blocks for each connected component
    find the corner points from the edge blocks
    save the corner points V0, V1, V2, ..., Vn = V0 as the vertices of a polygon representing an activity zone or cluster
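A minimal executable sketch of this block-map learning on a toy grid follows. The block size, grid dimensions, and function name are our own assumptions, and the count/timestamp probability filtering and polygon extraction steps are omitted for brevity.

```python
# Sketch of Algorithm 3.1 on a small grid (illustrative, not the
# paper's implementation).

BLOCK = 20  # block edge length in pixels (assumption)

def learn_activity_zones(feet_track, grid_w, grid_h):
    """feet_track: list of (x, y, t) feet/lower-body centroids."""
    # Step 1: initialize per-block count and timestamp.
    count = [[0] * grid_w for _ in range(grid_h)]
    stamp = [[0] * grid_w for _ in range(grid_h)]
    # Step 2: mark blocks the feet centroids pass through.
    for x, y, t in feet_track:
        bx, by = x // BLOCK, y // BLOCK
        count[by][bx] += 1
        stamp[by][bx] = t
    floor = [[1 if count[y][x] > 0 else 0 for x in range(grid_w)]
             for y in range(grid_h)]
    # Step 3a: block-level dilation -- fill single-block gaps
    # above and to the right of occupied floor blocks.
    for y in range(grid_h):
        for x in range(grid_w):
            if floor[y][x]:
                if y >= 2 and not floor[y - 1][x] and floor[y - 2][x]:
                    floor[y - 1][x] = 1
                if x + 2 < grid_w and not floor[y][x + 1] and floor[y][x + 2]:
                    floor[y][x + 1] = 1
    # Step 3b: connected components; drop single-block clusters.
    seen = [[False] * grid_w for _ in range(grid_h)]
    for y in range(grid_h):
        for x in range(grid_w):
            if floor[y][x] and not seen[y][x]:
                comp, todo = [], [(x, y)]
                seen[y][x] = True
                while todo:
                    cx, cy = todo.pop()
                    comp.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < grid_w and 0 <= ny < grid_h \
                                and floor[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            todo.append((nx, ny))
                if len(comp) == 1:
                    floor[comp[0][1]][comp[0][0]] = 0
    return floor
```

A full implementation would also threshold on count, age out stale timestamps, and trace the cluster boundaries into polygons.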
The rest of the blocks are neutral blocks and represent areas that might contain the inactivity zones. Figure 1 shows the unsupervised learning procedure for activity zones. Figure 1a shows the original surveillance scene, and Figure 1b shows the feet blocks learned using the trajectory information of moving persons. Figure 1c shows the refinement process: blocks are clustered into connected groups, single-block gaps are filled, and clusters containing just one block are removed. This refinement process adds missing block information and removes erroneous blocks detected due to wrong segmentation. Each block has an associated count variable to verify the minimum number of centroids passing through that block and a timestamp that records the last use of the block. These two parameters define a probability value for each block; only highly probable blocks are used as context. Similarly, blocks that have not been used for a long time, for instance because they are covered by moved furniture, no longer represent activity regions and thus become available as a possible part of an inactivity zone. The refinement process is performed when the person leaves the scene or after a scheduled time. Algorithm 3.1 explains the mechanism used to learn the activity zones in detail. Each floor block at time t has an associated 2D reference mean head location H̄(t) (for the x and y coordinates). This mean location of a floor block represents the average head position in the walking posture and is continuously updated during normal walking or standing situations.
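This continuous update of the per-block mean head location can be sketched as an exponential moving average; the function name is illustrative, with the learning rate α = 0.05 taken from the text.

```python
ALPHA = 0.05  # learning rate from the text

def update_mean_head(mean_xy, current_xy, alpha=ALPHA):
    """Blend the stored mean head location with the current observation."""
    mx, my = mean_xy
    cx, cy = current_xy
    return ((1 - alpha) * mx + alpha * cx,
            (1 - alpha) * my + alpha * cy)
```

Each floor block keeps its own mean, so the update runs once per frame for the block currently under the feet centroid.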
In order to account for several persons or changes over time, we compute the averages according to

H̄(t) = (1 − α) · H̄(t − 1) + α · H(t),

where H(t) represents the current head centroid location and α is the learning rate, which is set to 0.05 here. In order to identify the activity zones, the learned blocks are grouped into a set of clusters, where each cluster represents a set of connected floor blocks. A simple postprocessing step similar to erosion and dilation is performed on each cluster: first, single floor-block gaps are filled, and their head location means are computed by interpolation from neighboring blocks; then, clusters containing a single block are removed. The remaining clusters are finally represented as a set of polygons. Thus, each activity zone is a closed polygon A_i, defined by an ordered set of its vertices V0, V1, V2, ..., Vn = V0; it consists of all the line segments connecting consecutive vertices. An activity zone normally has an irregular shape and is detected as a concave polygon. Further, it may contain holes due to the presence of obstacles, for instance chairs or tables. It is also possible that all floor blocks are connected through continuous paths in the scene, in which case the whole activity zone is just a single polygon. Figure 1c shows the cluster representing the activity zone area, and Figure 1d shows the result after refinement of the clusters. Figure 1e shows the edge blocks of the cluster drawn in green and the detected corners drawn as circles; the corners define the vertices of the activity zone polygon. Figure 1f shows the final polygon detected from the activity area cluster: the main polygon contour is drawn in red, while holes inside the polygon are drawn in blue.
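Since the zones are stored as polygons, comparing the current key points against them reduces to point-in-polygon tests. The following ray-casting sketch is our own helper, not the paper's implementation; it handles concave polygons, while holes would need an additional test against each hole contour.

```python
def point_in_polygon(pt, vertices):
    """Ray-casting test; vertices is an ordered list of (x, y) corners."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside
```

An odd number of crossings means the point is inside; this is why the test works even for irregular concave zones.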
3.2 Learning of inactivity zones
Inactivity zones represent the areas where a person normally rests. They might differ in shape, scale, and even number, depending on the number of resting places in the scene. We do not assume any priors about the inactivity zones: any number of resting places of any size or shape present in the scene will be modeled as inactivity zones as soon as they come into use. Inactivity zones are again represented as polygons. A semi-supervised classification mechanism classifies the actions of a person present in the scene. Four types of actions are classified: walk, sit, bend, and lie. The detailed classification mechanism is explained later in Section 4. If the classifier indicates a sitting action, a window representing a rectangular area B around the centroid of the body is used to learn the inactivity zone. Before declaring this area B a valid inactivity zone, its intersection with the existing set of activity zone polygons A_i is verified. A pairwise polygon comparison is performed to check for intersections. The intersection procedure results in a clipped polygon consisting of all the points interior to the activity zone polygon A_i (clip polygon) that lie inside the inactivity zone B (subject polygon). This intersection process is performed using a set of rules summarized in Table 2 [24, 25].
The intersection process is performed as follows. Each polygon is perceived as being formed by a set of left and right bounds: all edges on the left bound are left edges, and those on the right are right edges, where left and right are defined with respect to the interior of the polygon. Edges are further classified as like edges (belonging to the same polygon) and unlike edges (belonging to two different polygons). The following convention is used to formalize the rules: an edge is characterized by a two-letter word, where the first letter indicates whether the edge is a left (L) or right (R) edge, and the second letter indicates whether the edge belongs to the subject (S) or clip (C) polygon. An edge intersection is indicated by X. The vertex formed at an intersection is assigned one of four vertex classifications: local minimum (MN), local maximum (MX), left intermediate (LI), and right intermediate (RI). The symbol || denotes the logical 'or'.
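The two-letter edge convention can be encoded directly; a trivial sketch with helper names of our own choosing:

```python
def edge_code(side, polygon):
    """Build the two-letter edge word: side 'L'|'R', polygon 'S'|'C'."""
    assert side in ('L', 'R') and polygon in ('S', 'C')
    return side + polygon

def are_like_edges(code_a, code_b):
    """Like edges belong to the same polygon, i.e. share the second letter."""
    return code_a[1] == code_b[1]
```

The clipping rules of Table 2 are then dispatch rules over pairs of such codes and the X intersection marker.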
The inactivity zones are updated whenever they come into use. If some furniture is moved to a neutral-zone area, its new location is taken directly as a new inactivity zone as soon as it is used. If the furniture is moved into the area of an activity zone (i.e., it intersects with an activity zone), its new place is not learned; this is only possible after the next refinement phase. The following rule governs the zone update: an activity-region block may take the place of an inactivity region, but an inactivity zone is not allowed to overlap with an activity zone. The main reason for this restriction is that a standing posture in an inactivity place is unusual. If it occurs for a short time, it is either wrong, and will be handled automatically by evidence accumulation, or it occurred because the inactivity zone has been moved; in that case, the standing posture is persistent and results in an update of the inactivity zone. The converse is not allowed because it may result in learning false inactivity zones in free areas such as the floor; sitting on the floor is not the same as sitting on a sofa and is classified as bending or kneeling. The newly learned feet blocks are then accommodated in an activity region in the next refinement phase. This region learning runs as a background process and does not disturb the actual activity classification process. Figure 2 shows a flowchart of the inactivity zone learning.
In the case of an intersection with activity zones, the assumed current sitting area B (candidate inactivity zone) is detected as false and ignored. In the case of no intersection, neighboring inactivity zones I_j of B are searched. If neighboring inactivity zones already exist, B is combined with I_j. This extended inactivity zone is again checked for intersection with the activity zones, since two inactivity zones may lie close together yet belong to two separate resting places partially separated by some activity zone; the activity zones thus act as a border between different inactivity zones. Without this intersection check, a part of some activity zone might be considered an inactivity zone, which could result in a wrong number and size of inactivity zones and, in turn, in wrong classification results. The polygon intersection algorithm of Vatti is robust enough to process irregular polygons with holes. If the joined inactivity polygon intersects an activity polygon, the union of the inactivity polygons is reverted and the new area B alone is considered as a new inactivity zone.