Multimedia Systems, Volume 16, Issue 4, pp 293–307

Asynchronous reflections: theory and practice in the design of multimedia mirror systems

Authors

  • W. Zhang
    • Like.com
  • Bo Begole
    • Palo Alto Research Center, Inc. (PARC)
  • Maurice Chu
    • Palo Alto Research Center, Inc. (PARC)
Regular Paper

DOI: 10.1007/s00530-010-0192-y

Cite this article as:
Zhang, W., Begole, B. & Chu, M. Multimedia Systems (2010) 16: 293. doi:10.1007/s00530-010-0192-y

Abstract

In this paper, we present a theoretical framing of the functions of a mirror by breaking the synchrony between the state of a reference object and its reflection. This framing provides a new conceptualization of the uses of reflections for various applications. We describe the fundamental technical components of such systems and illustrate the technical challenges in two different forms of electronic mirror systems for apparel shopping. The first example, the Responsive Mirror, is an intelligent video capture and access system for clothes shopping in physical stores that provides personalized asynchronous reflections of clothing items through an implicitly controlled human–computer interface. The Responsive Mirror employs computer vision and machine learning techniques to interpret visual cues of the shopper’s behavior captured by cameras and then displays two different reflections of the shopper on digital displays: (1) the shopper in previously worn clothing with matching pose and orientation and (2) other people in similar and dissimilar shirts with matching pose and orientation. The second example system is a Countertop Responsive Mirror that differs from the first in that the images do not respond to the real-time movement of the shopper but to frames in a recorded video, so that the motion of the shopper in the different recordings is matched non-sequentially. These instantiations of the mirror systems in fitting room and jewelry shopping scenarios are described, focusing on the system architecture and the intelligent computer vision components. The effectiveness of the Responsive Mirror is demonstrated by a user study. The paper contributes a conceptualization of reflection and examples of systems illustrating new applications in multimedia systems that break traditional reflective synchronies.

Keywords

Pervasive computing, Intelligent user interface, Multimedia system, Asynchronous reflection, Personalized media content, Computer vision, Machine learning, Responsive Mirror, Apparel shopping

1 Introduction

Mirrors, physical objects that perform specular reflection of light (or other waves), are used for a variety of purposes: telescopes, lasers, cameras and perhaps most obviously for seeing oneself. There are a number of reasons one may desire to see oneself including grooming, personal health, athletic training, or shopping for apparel (clothing, jewelry, hats, eye glasses and other accessories). In all of these situations, the mirror (or “looking glass”) is acting as an information appliance, providing information to the observer of what they look like to others.

Generally, we use our reflection to check that the image matches our expectation of appearance or to make a choice among options. For example, a common practice when apparel shopping in a physical store is to search the inventory for items of interest, select a few for comparison and try them on in front of a mirror to decide which, if any, to purchase. The shopper evaluates the items according to how well they fit physically, and also how well they fit the image of herself that she wants others to perceive. That is, the shopper not only checks whether a garment fits her body, but also whether it fits her style. Fashion decisions are determined by complex and subtle factors [5, 9] which are reflected in the mirrored image.

1.1 Breaking reflective synchronies

There is another sense of the word “reflection” that we explore in this paper, which is that of looking back on past events, often to reuse information learned in the past. Traditional mirrors do not look back in time because the physical reflection of reality is synchronized in time and space by the speed of light traveling between physical objects, reflective surfaces, and perceiving entities. With multimedia technology it is possible to break the synchrony of what is reflected using record and playback technologies.

1.1.1 Synchronous reflection

First, let us decompose the elements of reflection in a mirror. A reference object (e.g., a shopper) exists physically in front of a specularly reflective surface (usually “silvered” glass) which bounces light back in real-time forming an image that is commonly referred to as a reflection. As the state of the reference object changes (position, color, light, etc.), the reflection changes in precise correspondence. We refer to such optical reflection as synchronous reflection because the reference object and reflection are both based on images from the current time. Changes in the reference result in corresponding changes to the reflection at the same time (constrained by laws of physics). Figure 1 shows a light ray diagram that illustrates the formation of an image in a flat mirror.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig1_HTML.gif
Fig. 1 Ray diagram of formation of a reflected image in a flat mirror. The reflected light is perceived to come from an object behind the mirrored surface

The reflection is not necessarily an exact duplicate of the reference as it may be a transformation of the reference image. With physical mirrors, this can be accomplished using a non-planar shape of the mirror surface. Figure 2 illustrates the transformation of the reference image to a smaller-than-lifesize image in a convex mirror. In fact, no reflection can be a perfect representation of the reference object—even the best mirrors lose some of the light they receive. Therefore, we consider all reflections to be composed of one or more transformations of the image generated by the reference object(s). Because a reflection is generally assumed to correspond to the state of reality, when transformations are deliberately applied to a reflection, users should be notified so that they are fully aware of any differences from reality.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig2_HTML.gif
Fig. 2 Ray diagram of formation of a transformed reflection in a convex mirror

It is possible to emulate a conventional synchronous reflection electronically, without use of a polished physical reflector, simply by using a camera to capture the reference object and an electronic display to show the image in real-time, as shown in Fig. 3. There are several examples of systems that provide electronic synchronous reflectivity and extend the use of mirrors as information appliances. Rozin [24] has created a series of art installations of mirrors made out of wood, trash, woven tape and other materials. In each of these systems, a pixel is composed of an element of the material and can be tilted or otherwise moved to change that pixel’s brightness. The images captured by a camera are reflected in the “mirror” by downsampling the image and changing the brightness of corresponding pixels in real-time. The effect is of a grayscale reflection displayed in a non-reflective material. Along similar lines, Roussel et al. [23] created MirrorSpaces, which captures images with a camera, performs image processing on them and displays them back to the user in real-time with the aim of supporting proxemics in distributed video communication. In another example, the Smart Makeup Mirror [15] uses a high-resolution camera and monitor to provide functionality analogous to a digitally enhanced lighted dressing-table magnifying mirror. The user can zoom into specific regions of the face and see how colors change in simulated lighting conditions.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig3_HTML.gif
Fig. 3 Digitally captured media enables a wider variety of transformations and display technologies, such as the wooden tiles used by Rozin

Like the physical mirrors, each of these electronic systems comprises light captured from reference objects transformed into a reflection. The digitally captured media, in contrast to physical surface reflection, allows the transformations to be more complex including affine transformations, color changes, downsampling from higher to lower resolution pixels, and other image manipulations.
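As an illustration of the kind of transformation that digital capture makes possible, the following minimal sketch composes a left-right flip with block downsampling of a grayscale image, loosely in the spirit of the coarse physical pixels described above. It is not taken from any of the cited systems; the function names, the list-of-lists image representation and the inclusion of the flip step are assumptions made for the example.

```python
from typing import List

Image = List[List[int]]  # grayscale image, values 0..255, row-major (assumed format)


def downsample(image: Image, block: int) -> Image:
    """Average each block x block tile into one coarse pixel."""
    rows, cols = len(image), len(image[0])
    out: Image = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            out_row.append(sum(tile) // len(tile))
        out.append(out_row)
    return out


def mirror_flip(image: Image) -> Image:
    """Flip left-right so the display reads like a mirror rather than a camera view."""
    return [list(reversed(row)) for row in image]


def reflect(image: Image, block: int) -> Image:
    """Compose the two transformations: flip, then downsample."""
    return downsample(mirror_flip(image), block)
```

Any other per-frame operation (an affine warp, a color change) could be added to the same composition without changing the overall capture, transform and display structure.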

1.1.2 Quasi-synchronous reflection

Using the camera and display, it is also possible to have the display show images recorded previously and to synchronize the presentation of those images with the changes in state of the physical reference objects using computer vision to detect scene changes. We refer to this form of digital reflection as quasi-synchronous reflection in that the displayed reflection is synchronized with changes in physical state, but the reflected image may have been captured at a different time or place, as illustrated in Fig. 4. For example, using simple face tracking, the electronic display could show different people’s faces that correspond to movements of the reference face. This mode of reflection introduces another fundamental component to the system, which is the storage of images that record past state and are matched to the current state. In synchronous reflection, matching is implied because the reflection is a transformation of current reality, whereas in quasi-synchronous reflection, some aspect of current reality must be extracted and matched against that aspect of recorded prior reality. Examples include the poses of people, the colors of items, the identities of individuals, or other information extracted using image analysis techniques.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig4_HTML.gif
Fig. 4 A quasi-synchronous reflection is retrieved from a repository to match some aspect of the current scene (e.g., the face is the same but the eyeglass frames differ)
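The retrieval step of quasi-synchronous reflection can be sketched as a nearest-neighbor lookup. The example below assumes, purely for illustration, that the aspect extracted from the current scene is a single orientation angle in degrees and that each stored frame was annotated with the same quantity when it was recorded; `StoredFrame` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredFrame:
    frame_id: str            # hypothetical identifier of a recorded image
    orientation_deg: float   # orientation extracted when the frame was recorded


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def quasi_synchronous_reflection(current_deg: float,
                                 repository: List[StoredFrame]) -> StoredFrame:
    """Return the stored frame whose orientation best matches the live scene."""
    if not repository:
        raise ValueError("repository is empty")
    return min(repository,
               key=lambda f: angular_distance(current_deg, f.orientation_deg))
```

Calling `quasi_synchronous_reflection` once per captured frame keeps the displayed image in step with the reference, even though every displayed image comes from the past.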

Beyond the intriguing artistic applications that this enables, quasi-synchronous reflection provides a powerful new mechanism to support decision-making processes in apparel shopping and is the conceptual basis for the capabilities of the Responsive Mirror system described in more detail in Sect. 3.

1.1.3 Asynchronous reflection

Taking this idea of breaking reflective synchrony further, we can imagine that the reference cue might also be something other than a physical object in current time. The reference cue could also be a recording of changes in physical state captured from another time or place, as shown in Fig. 5. For example, when playing back a recording that provides reference cues, another set of images captured from another time or place can be matched to the state changes in the reference, transformed and displayed. We refer to this as asynchronous reflection because the current time, the time the reference was captured and the time the reflection was captured all differ.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig5_HTML.gif
Fig. 5 In a fully asynchronous reflection, the times that the reference and reflection were captured differ from the current time

Again, aside from artistic novelty, this form of reflection has practical usefulness in terms of supporting decision-making processes in apparel shopping and perhaps other domains. Asynchronous reflection is the fundamental capability provided by Countertop Responsive Mirror, described in Sect. 4, which is used to compare images of previously tried on jewelry, eye glasses, hats or other apparel side-by-side with the shopper’s pose matched between the recorded sessions.
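A rough sketch of how two recordings might be matched in this mode follows. It assumes each frame has already been reduced to a small pose feature vector (for instance head yaw and pitch, which is an assumption for this example) and matches each reference frame independently, so the frames chosen from the second recording need not be in temporal order.

```python
from typing import List, Sequence


def pose_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two pose feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def align_recordings(reference: List[Sequence[float]],
                     other: List[Sequence[float]]) -> List[int]:
    """For every reference frame, return the index of the closest-pose frame in `other`.

    Matches are made per frame, so the returned indices can jump
    backwards and forwards in the second recording.
    """
    return [min(range(len(other)), key=lambda j: pose_distance(p, other[j]))
            for p in reference]
```

Playing the reference recording while displaying `other[i]` for each returned index `i` yields a side-by-side view in which the poses agree although neither recording shows the current time.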

1.1.4 A framework of reflective synchrony

At this point, we have introduced different modes of reflection depending on whether the reference and reflection are synchronized with the current reality or with a past reality. Projecting this conceptualization forward, it is also possible to imagine using a synthetic or virtual reality. Indeed, there are examples of systems in that category as well. Let us now introduce all the possible states of synchrony between images captured from current, past or virtual reality, which are enumerated in Table 1 with names of the various modes of reflection based on the synchronization with respect to current reality. Each row in the table describes the different modes of reflection depending on the formation of the reference object in current, past or virtual reality. Each column describes the modes depending on whether the reflection comes from current, past or virtual reality. Let us now complete the description of the table one row at a time.
Table 1 Different modes of reflective synchrony arise depending on whether the reference and reflection are based on current, past or virtual reality

| Reference \ Reflection | Current reality | Past reality | Virtual reality |
|---|---|---|---|
| Current reality | Synchronous reflection: Conventional mirror reflecting a physical object, or electronic emulation. Examples: Physical mirror, Wooden Mirror, street mimes | Quasi-synchronous reflection: Recorded images matching the motion of a physical object. Examples: Responsive Mirror | Virtual-synchronous reflection: Images of a virtual object (avatar or clothing) matching the motion of a physical object. Examples: Project Natal, virtual fitting technologies |
| Past reality | Asynchronous mimicry: A person copies the motions of a recording. Examples: Karaoke, sports practice videos, fitness videos (Tae Bo, P90X) | Asynchronous reflection: A recorded video matches the motions in another recorded video. Examples: Countertop Responsive Mirror | Asynchronous virtual reflection: A virtual object matches the motions of a recorded video. Examples: Computer-generated animations from motion capture |
| Virtual reality | Virtual mimicry: A person copies the motions of a virtual object. Examples: Virtual sports trainers, dance and guitar games | Virtual-asynchronous reflection: A recorded video matches the motions of a virtual object or avatar. Examples: Feasible, but no current example known | Synchronous virtual reflection: A virtual object matches the state (or a transformation of that state) of a separate virtual object. Examples: Virtual mirrors in SecondLife |
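For readers who prefer to see the taxonomy of Table 1 as a data structure, it can be written as a lookup keyed by the pair (reference, reflection). The mode names are transcribed from the table; the `Reality` enum and the `reflection_mode` function are illustrative names only.

```python
from enum import Enum


class Reality(Enum):
    CURRENT = "current"
    PAST = "past"
    VIRTUAL = "virtual"


# (reference, reflection) -> mode name, transcribed from Table 1.
MODES = {
    (Reality.CURRENT, Reality.CURRENT): "synchronous reflection",
    (Reality.CURRENT, Reality.PAST): "quasi-synchronous reflection",
    (Reality.CURRENT, Reality.VIRTUAL): "virtual-synchronous reflection",
    (Reality.PAST, Reality.CURRENT): "asynchronous mimicry",
    (Reality.PAST, Reality.PAST): "asynchronous reflection",
    (Reality.PAST, Reality.VIRTUAL): "asynchronous virtual reflection",
    (Reality.VIRTUAL, Reality.CURRENT): "virtual mimicry",
    (Reality.VIRTUAL, Reality.PAST): "virtual-asynchronous reflection",
    (Reality.VIRTUAL, Reality.VIRTUAL): "synchronous virtual reflection",
}


def reflection_mode(reference: Reality, reflection: Reality) -> str:
    """Name the mode for a given source of the reference and of the reflection."""
    return MODES[(reference, reflection)]
```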

1.1.5 Modes of reflection when the reference object is based on current reality

As described previously, in synchronous and quasi-synchronous reflection, the reference object inhabits the current reality. This attribute is also true in virtual-synchronous reflection, in which the reflection is wholly or partially formed using virtual objects. We see examples of this in a recent video of an envisioned computer-vision-based game control system called Project Natal (footnote 1). Other examples include “virtual fitting” systems such as the Virtual Mirror (see footnote 1) for trying on sunglasses, among many others.

1.1.6 Modes of reflection when the reference object is based on past reality

In constructing this framing of asynchrony in reflections, we note that there are some cases when a person may intentionally reflect the state of a recording, such as when singing karaoke, exercising with a fitness video, practicing with a sports training video, or otherwise mimicking the scene of a recording. We refer to this as asynchronous mimicry (note that the category of synchronous reflection also contains the form of mimicry performed by street mimes). Completing this row of reflection modes cued by references from past realities is the category of asynchronous virtual reflections, which includes technologies such as computer-generated animation based on motion capture of humans, now in prevalent use in video games and movies.

1.1.7 Modes of reflection when the reference object is based on virtual reality

For theoretical completeness, one can also imagine that the reference objects are wholly or partially created by a virtual reality. In the first cell of this row, virtual mimicry, human users emulate the states of virtually constructed avatars. Examples include the use of video game machines for fitness and sports training in which the human player mimics a virtual trainer, and for playing musical instruments along with a virtual band, or dancing along with a virtual dancer. For completeness, we include the category of virtual-asynchronous reflection in which a virtual model would reflect the state of a past reality, although we are not aware of an existing system that exhibits this capability. There are, however, examples in the final category of synchronous virtual reflection, where a reference object in a virtual reality is reflected by a virtually constructed reflection. An example is seen in mirrors found in SecondLife which can show a graphical transformation of the state of a virtual object.

1.2 Technical challenges in constructing multimedia mirrors

The most obvious technological requirements for a digital mirror are in the capture and display of images. However, these are both straightforward in today’s state of digital cameras and display technology. Rozin [24] has demonstrated the creative use of unexpected material such as wood, paper, trash and chrome balls to form physical pixels in a display, but novel material is not necessarily a requirement for a digital reflection.

The challenges in such systems today come in two forms: (1) selection of feature(s) to extract from the reference and (2) techniques to match an appropriate reflection.

First, all categories other than fully synchronous reflection require the matching of some aspect of the reference object to find the closest reflection. Which aspects of an image are important depends on the problem domain. Below, we provide examples of systems targeted at apparel shopping: one for clothing and one for head- or neck-worn jewelry. Although they are designed for closely related problem domains, the “important” features of the images differ. In the following two examples, we describe the design processes used to identify the important features to the users of the system.

Second, once the important aspect of the images is identified, developers must select or invent techniques to extract the important features from images and to develop the matching algorithms for finding suitable reflections. In the examples below, we describe different instantiations of feature extraction and matching techniques used in our example systems.

2 Responsive Mirror and Countertop Responsive Mirror systems

In this paper, we explore the design and architecture of the Responsive Mirror [30] and Countertop Responsive Mirror [8] for providing supplemental quasi-synchronous or asynchronous media reflections to facilitate a shopper’s exploration of fashion and decision-making. In contrast to previous fitting room systems, the Responsive Mirror reflects previously recorded images and video in response to current reality and displays this content in support of the two goals of apparel shopping: physical fit and style fit. The Responsive Mirror employs computer vision and machine learning techniques to automatically find matching styles and seamlessly respond to the pose and movement of the shopper. The Responsive Mirror features an implicitly controlled interface that responds to natural human actions as the input (in contrast to explicitly controlled interfaces that use keyboard input, gesture or other explicitly controlled input modalities) to minimize disruption to the usual shopping experience.

We describe the architecture and computer vision engine of Responsive Mirror in Sect. 3. Then we present the Countertop Responsive Mirror—an asynchronous reflection system for jewelry shopping in Sect. 4. In order to assess the design considerations (privacy, placement, and interaction requirements) and potential effectiveness of the Responsive Mirror system, we conducted a “Wizard of Oz” user evaluation. The setting of the study and the results will be briefly described in Sect. 5. The related technologies, conclusions and future work are presented at the end.

3 Responsive Mirror

In order to instantiate our theoretical framing of quasi-synchronous reflection for clothes shopping, we have designed the Responsive Mirror, an intelligent clothes fitting room system [3, 4, 30, 31]. The concept is illustrated in Fig. 6 and the prototype is shown in Fig. 7. The Responsive Mirror consists of a conventional mirror (center), two electronic displays and two digital cameras (mirror top and ceiling) connected to a real-time vision system that drives the interaction between the user and the display.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig6_HTML.gif
Fig. 6 Conceptual illustration of Responsive Mirror

https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig7_HTML.jpg
Fig. 7 The Responsive Mirror prototype

The display on the left of the mirror shows the shopper wearing previously worn clothes. This display helps the shopper compare multiple garments in parallel rather than in sequence. The quasi-synchronous reflection of this display is triggered by a change of reference state, i.e., a change in the orientation of the user in front of the mirror. This reference signal is captured by the two cameras and analyzed by the computer vision engine. The engine then searches among past images to find the closest match in orientation. Thus the reference signal is transformed into the best matched image, which is immediately shown on this display as the quasi-synchronous reflection. From the user’s view, the pose of her previous clothes in the display matches her pose as she moves to view the clothing from different angles in the mirror. Although the system is displaying visual information about how the prior garment looked when worn, the quasi-synchronous reflections also remind the person of other sensory perceptions they experienced during the prior fitting.

The display on the right of the mirror shows images of other people wearing clothes that are similar to or different from the one being tried on, also matching the orientation of the shopper. This display provides the shopper with reflections about social context and alternate fashions that she might like to try. When the display shows people wearing similar clothing, the shopper can use the images of others to form an impression of what kind of image of self she would exhibit in these clothes. If the shopper does not care for the garment she is currently trying, she can use the images of people wearing different clothing for ideas of alternate styles. The reference signal of the quasi-synchronous reflection of this display is the style of the shirt, which is recognized by our clothes recognition algorithm. The system then searches for the shirts whose styles are the most “similar” and “dissimilar” to the reference shirt, and these are displayed as the reflections.
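The selection of “similar” and “dissimilar” shirts can be pictured as ranking a gallery by a similarity score and taking both ends of the ranking. The sketch below is only an illustration of that idea: the `similarity` callback stands in for whatever score the recognition engine produces, and the function name and the parameter `k` are assumptions.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def similar_and_dissimilar(reference: T,
                           candidates: List[T],
                           similarity: Callable[[T, T], float],
                           k: int = 2) -> Tuple[List[T], List[T]]:
    """Return the k most similar and the k most dissimilar candidates.

    Both lists are ordered from most to least extreme. If k exceeds half
    the gallery size, the two lists overlap.
    """
    ranked = sorted(candidates, key=lambda c: similarity(reference, c), reverse=True)
    k = max(0, min(k, len(ranked)))
    most_similar = ranked[:k]
    most_dissimilar = ranked[len(ranked) - k:][::-1]
    return most_similar, most_dissimilar
```

In a toy setting where a shirt is a tuple of attributes and similarity counts matching attributes, `similar_and_dissimilar(("long", "collar"), gallery, count_matches, k=1)` would return the best and the worst match from `gallery`.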

To prevent egregious invasion of privacy, the system’s cameras are not intended to be mounted in the room where a shopper actually changes clothes, but in an adjacent “fitting area”. When there is no customer in the cameras’ views, the displays can just show ambient information (videos, advertisements, etc.). As a customer tries on clothes and walks into the view of cameras, the system detects her presence and the displays become interactive as previously described.

3.1 Computer vision engine of Responsive Mirror

Full details of the architecture and other technical components of the Responsive Mirror can be found in [30, 31]. Here, we summarize the key components of the computer vision engine: (1) the clothes segmentation, recognition and matching algorithms (Sect. 3.2) and (2) the motion tracking algorithms (Sect. 3.3). We first focused on shirt recognition. We conducted a user study to discover the most salient clothing factors that people use to determine similarity between shirts, and then took a divide-and-conquer approach to shirt recognition: a factor classifier is developed to recognize each salient factor in the shirt images, and the resulting factor features are then fit to regression models that measure pair-wise shirt similarity.
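To make the regression step concrete, the sketch below fits an ordinary least-squares linear model that maps per-factor agreement between two shirts (1 if the factor matches, 0 otherwise) to a similarity rating. The choice of a plain linear model, the binary agreement features and all names are assumptions for illustration; the models actually used are described in [30, 31].

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_similarity_model(agreements: List[List[float]],
                         ratings: List[float]) -> List[float]:
    """Least-squares fit of rating ~ w0 + sum_i w_i * agreement_i (normal equations)."""
    x = [[1.0] + list(row) for row in agreements]  # prepend the intercept term
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, ratings)) for i in range(k)]
    return solve(xtx, xty)


def predict_similarity(weights: List[float], agreement: List[float]) -> float:
    """Predicted similarity rating for one pair of shirts."""
    return weights[0] + sum(w * a for w, a in zip(weights[1:], agreement))
```

The learned weights indicate how much each factor contributes to perceived similarity, which is one reason a per-factor decomposition is attractive.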

3.2 Clothes segmentation, recognition and matching

An interesting emerging trend on social networking is the combination of an image-similarity service, such as Like.com [19], with a slide-show service, such as RockYou [22]. Clothing similarity has also been employed as a contextual cue for the purpose of human identification in [2, 12, 25, 26, 29, 32]. Responsive Mirror also matches images of clothing, but it does not make direct product recommendations or recognize the people based on their clothes. Responsive Mirror instead provides images of people wearing a range of outfits that the system infers are similar or different to what the shopper is trying on, providing information about the presentations of self that people are making. The social contents can help a shopper consider similar and alternative fashions in various contexts while evaluating clothing.

In order to provide social fashion information to the user, we explore the use of computer vision to recognize classes and attributes of clothing, specifically shirts, for a variety of applications including identifying a store customer’s taste and spending profile and recommending “similar” or “different” clothes that match his or her fashion preferences. The recommendation application could be instantiated on a variety of platforms, e.g., as a web-based application or as a mobile service on camera phones.

Automated clothes detection and recognition is challenging for machine vision in a number of ways: (1) the high intra-class variation and deformable configurations of clothes, (2) the computational speed required of the algorithm, and (3) the social nature of the clothes recognition problem as perceived by humans. In order to retrieve information that is meaningful to the user, our system is required to recognize the salient clothing factors that people care about.

3.2.1 User study to determine how people assess “similarity”

In order to identify these salient clothing factors, a user study was conducted using a brief web-based survey. A screenshot of the survey is displayed in Fig. 8. Sixty-five people were invited to participate in the user study. The experiment dataset was created by photographing 12 participants (male and female) wearing shirts from their personal wardrobe, for a total of 165 articles of clothing. From the dataset, we selected 25 men’s shirts and 15 women’s shirts that covered much of the variation in the two samples. The web survey tool showed 40 randomly chosen pairs of men’s shirts and 20 randomly chosen pairs of women’s shirts, one pair at a time. For each pair, respondents were asked to rate the similarity of the two shirts on a 5-point scale, labeled from 1 (not similar at all) to 5 (extremely similar). At the end of the survey, respondents were asked in an open-ended question to list the most salient factors they used to determine similarity between pairs of shirts.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig8_HTML.gif
Fig. 8 User survey of shirt “similarity”

To analyze the open-ended responses, each unique factor listed in a participant’s response was coded. The coded factors, listed in order of decreasing frequency, were: sleeve length, color, collar presence, shirt type, pattern, button presence, neckline, emblem/logo placement, and material/texture. Thus, we focused on recognizing (1) sleeve length, (2) shirt color, (3) collar presence, (4) pattern, (5) placket, and (6) emblem placement. Notably, color was not identified as the most salient clothing factor, contrary to our expectation. There was no significant difference between male and female ratings.

3.2.2 Clothes segmentation

In order to recognize the shirts, we need to first detect the location of the shirts. Our detection method begins with a bounding box of the human body which is easy to obtain. For an outfit video in our Responsive Mirror system, the object tracking algorithm can automatically detect the bounding box of the person. Since the person is typically standing upright in front of the camera, our system detects the clothes parts by simply segmenting the bounding box with heuristic ratios.
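Segmentation by heuristic ratios amounts to carving fixed fractions out of the person's bounding box. The sketch below shows the idea; the part names and every ratio in `PART_RATIOS` are hypothetical placeholders, since the values used by the system are not given here.

```python
from typing import Dict, NamedTuple


class Box(NamedTuple):
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width
    h: int  # height


# Hypothetical ratios, as (left, top, width, height) fractions of the body box.
# These are placeholders for illustration, not the values used by the system.
PART_RATIOS = {
    "collar": (0.30, 0.10, 0.40, 0.10),
    "torso": (0.25, 0.20, 0.50, 0.35),
    "left_arm": (0.00, 0.20, 0.25, 0.35),
    "right_arm": (0.75, 0.20, 0.25, 0.35),
}


def segment_parts(body: Box) -> Dict[str, Box]:
    """Split a person's bounding box into clothing parts using fixed ratios."""
    parts = {}
    for name, (fx, fy, fw, fh) in PART_RATIOS.items():
        parts[name] = Box(body.x + round(fx * body.w),
                          body.y + round(fy * body.h),
                          round(fw * body.w),
                          round(fh * body.h))
    return parts
```

Because the shopper is assumed to stand upright and face the camera, this costs almost nothing per frame, which suits the real-time requirement.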

3.2.3 Clothes recognition overview

In order to explore the salient shirt factors identified in the user study, we adopted various computer vision and machine learning algorithms to detect and recognize these factors from a single camera sensor [30, 31]. Considering the real-time requirement of the application, we extracted low-level primitives that can be computed efficiently from the images. After each factor was formulated as a classification problem, linear Support Vector Machine (SVM) [7] or decision stump [10] classifiers were learned on these features to recognize the factors. The recognition algorithms are briefly introduced in the following sections. Detailed descriptions can be found in [30, 31].

3.2.4 Sleeve length recognition

Sleeve length was the factor most frequently cited by participants in the user study. It is intuitively a significant cue for discriminating polo shirts, casual shirts and t-shirts (class 1: short-sleeve or no-sleeve) from business work shirts (class 2: long-sleeve). In order to recognize these categories, we assumed that long-sleeve shirts expose less arm skin area than short-sleeve or no-sleeve shirts. Sleeve recognition is thus reduced to two problems: skin detection and sleeve classification.

Our skin detection algorithm is mainly adapted from the work in [26], with the assumption that the skin tone from a person’s face (detected using the method in [16]) is usually similar to the skin tone of his/her arms. After skin detection, the sleeve length is described based on the number of skin pixels detected in the arms. A decision stump [10] is learned on these features to recognize the sleeve length. Fivefold cross-validation experiments were conducted on our dataset to test the performance of this sleeve recognition algorithm. Our algorithm achieves 89.2% sleeve length recognition accuracy.
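The classification step can be sketched as a one-feature decision stump on the fraction of arm pixels detected as skin. The code below is a simplified illustration with a fixed polarity (long-sleeve if the skin fraction is below the threshold); the boolean-mask input and all function names are assumptions, and skin detection itself is taken as given.

```python
from typing import List


def skin_fraction(arm_mask: List[List[bool]]) -> float:
    """Fraction of pixels in the arm region that were detected as skin."""
    total = sum(len(row) for row in arm_mask)
    return sum(sum(row) for row in arm_mask) / total if total else 0.0


def train_stump(features: List[float], is_long: List[bool]) -> float:
    """Pick the threshold t that minimizes training errors for the rule
    `long-sleeve if and only if feature < t`."""
    values = sorted(set(features))
    # Candidates: midpoints between consecutive values, plus one below and one above.
    candidates = ([values[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1.0])

    def errors(t: float) -> int:
        return sum((f < t) != y for f, y in zip(features, is_long))

    return min(candidates, key=errors)


def predict_long_sleeve(feature: float, threshold: float) -> bool:
    """Apply the learned stump to one shirt."""
    return feature < threshold
```

A full decision stump learner would also choose the polarity and, given several features, the feature to split on; both are omitted here for brevity.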

3.2.5 Collar recognition

Participants in the study identified the presence of a collar as an important cue for discriminating t-shirts from other types of shirts (e.g., business shirts and polo shirts). We explored a number of image features based on Harris corner points [14] for collar recognition. A linear SVM with soft decision boundaries was learned on the extracted image features to recognize the presence of collars. Linear SVMs are also employed for the recognition of the factors described in the following sections. In fivefold cross-validation, the collar recognition algorithm achieved 78.7% accuracy. From the weights of the learned linear SVM, we found that the number of Harris corner points detected in the collar part was the most discriminative feature for collar recognition.
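The corner-count feature can be illustrated with a small, unoptimized Harris detector. This is a sketch under simplifying assumptions (central-difference gradients, a 3 × 3 unweighted window, a threshold relative to the strongest response, no non-maximum suppression), so its counts will differ from those of a production detector; the parameter values are illustrative, and the SVM that consumes the feature is not shown.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, row-major (assumed format)


def harris_corner_count(img: Image, k: float = 0.04, rel_threshold: float = 0.1) -> int:
    """Count pixels whose Harris response exceeds rel_threshold * (max response)."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; the one-pixel border is left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            # Structure tensor summed over a 3x3 neighborhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Harris response: det(M) - k * trace(M)^2.
            responses.append(sxx * syy - sxy * sxy - k * (sxx + syy) ** 2)
    peak = max(responses, default=0.0)
    if peak <= 0.0:
        return 0  # flat regions and straight edges produce no positive response
    return sum(r > rel_threshold * peak for r in responses)
```

Applied to the collar region of a segmented shirt, the returned count would serve as one entry of the feature vector passed to the classifier.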

3.2.6 Placket recognition

The presence and length of the placket line in a shirt were indicated to be an important cue for discriminating t-shirts from polo shirts and from other types of shirts (e.g., business shirts). Thus, we employed the Canny edge detector [6] to detect the vertical placket points and measured their distribution to generate the features for placket recognition. The placket recognition algorithm achieved 83.8% accuracy. The number of vertical Canny edge points detected in the upper torso area was identified as the most discriminative feature for placket recognition.
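The following sketch shows how vertical edge points might be turned into placket features. Note that it substitutes a plain thresholded horizontal gradient for the Canny detector, to stay short and dependency-free; the central-band definition, the threshold and the two features returned are assumptions for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale intensities in 0..1, row-major (assumed format)


def vertical_edge_points(img: Image, threshold: float = 0.25) -> List[Tuple[int, int]]:
    """Return (row, col) positions where intensity changes strongly from left to right.

    A simplified stand-in for a Canny detector restricted to vertical edges.
    """
    points = []
    for y, row in enumerate(img):
        for x in range(1, len(row) - 1):
            if abs(row[x + 1] - row[x - 1]) / 2.0 > threshold:
                points.append((y, x))
    return points


def placket_features(torso: Image, band: float = 0.4) -> Tuple[int, float]:
    """Edge-point count inside a central vertical band, and its share of all edge points."""
    width = len(torso[0])
    lo, hi = width * (0.5 - band / 2), width * (0.5 + band / 2)
    points = vertical_edge_points(torso)
    central = [p for p in points if lo <= p[1] < hi]
    share = len(central) / len(points) if points else 0.0
    return len(central), share
```

A shirt with a button placket would be expected to give a high central count in the upper torso, whereas a plain t-shirt would give few or no central edge points.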

3.2.7 Color analysis

In our user study, participants identified color as one of the significant factors for measuring clothes similarity. Therefore, we used color as one of the factors for clothes matching. A color histogram is computed over the Hue and Saturation channels of the segmented torso part and is then compared with the histograms of other clothes images. The color dissimilarity between two clothes is measured by the χ² distance between their color histograms [26].
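A minimal sketch of this comparison is given below. It assumes RGB input pixels, an 8 × 4 hue/saturation binning and the common symmetric form of the χ² distance, 0.5 · Σ (p_i − q_i)² / (p_i + q_i); the bin counts and the exact distance variant are assumptions and may differ from those in [26].

```python
import colorsys
from typing import List, Sequence, Tuple


def hs_histogram(pixels: Sequence[Tuple[int, int, int]],
                 h_bins: int = 8, s_bins: int = 4) -> List[float]:
    """Normalized hue/saturation histogram (flattened) of RGB pixels in 0..255."""
    hist = [0.0] * (h_bins * s_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)  # clamp h == 1.0 into the last bin
        si = min(int(s * s_bins), s_bins - 1)
        hist[hi * s_bins + si] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def chi2_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric chi-square distance between two histograms; empty bins are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
```

With normalized histograms this distance ranges from 0 for identical color distributions to 1 for distributions with no bins in common. Dropping the value channel makes the comparison less sensitive to lighting intensity.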

3.2.8 Pattern complexity recognition

The complexity of the pattern on the shirt was also indicated as valuable for clothes recognition. Intuitively, pattern complexity is related to the suitability of the clothes for different social occasions; for example, a very colorful shirt is usually considered less suitable for a formal event than a solid-colored one. We therefore extracted features based on the distribution of Harris corner points and Canny edge points, together with color complexity, to recognize the pattern complexity of the shirts.

We distinguished two pattern classes: (1) solid: shirts that are plain in color and texture, with no large-area patterns; and (2) patterned: shirts that are either colorful or patterned, for example, block-patterned shirts. The pattern complexity recognition algorithm achieved 87.9% accuracy on our dataset.

3.2.9 Emblem placement recognition

Detecting emblem placement is needed to recognize logos or characters on clothes, which are valuable for brand recognition and for extracting contextual information. We focused on distinguishing centered from non-centered emblems because we noticed many centered patterns or logos on the shirts in our dataset. The features for this problem were extracted by analyzing the difference between the central torso and the surrounding parts of the garment: the more distinct they are, the more likely the emblem is located in the center. The emblem recognition algorithm performed very well in our experiments, with a recognition accuracy of 99.0% on our dataset.

3.2.10 Shirt style recognition

We combined all the factor features described above into a single feature vector and applied a linear SVM to classify the shirts into style categories. The definition of shirt styles involves several social issues, and no clear categorization exists, so we manually labeled the clothes images according to human experience. We defined four shirt styles (number of images in our dataset in parentheses): t-shirt (65), polo shirt (32), casual shirt (20) and business shirt (48).

Since t-shirts and business shirts were the most numerous in our dataset, we first examined this binary classification problem. The result is summarized in Table 2, which gives the confusion matrix along with the overall classification accuracy, defined as the number of correctly classified examples divided by the total number of test examples. Our algorithm performs very well at separating t-shirts from business shirts. We then turned to the more difficult four-class problem, summarized in Table 3. Here the vision algorithm shows significant confusion of polo and casual shirts with business shirts. Providing more polo and casual shirt examples might marginally improve the performance, but we believe that the confusion comes mainly from the features these styles share.

Table 2 Shirt style (t-shirt vs. business shirt) recognition accuracy

| Actual style | Classified as T-shirt | Classified as Business |
|---|---|---|
| T-shirt | 96.2% | 3.9% |
| Business | 5% | 95% |

Overall accuracy: 95.7%

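Table 3 Shirt style (four-class) recognition accuracy

| Actual style | Classified as T-shirt | Classified as Polo | Classified as Casual | Classified as Business |
|---|---|---|---|---|
| T-shirt | 80.8% | 3.9% | 15.4% | 0% |
| Polo | 16.7% | 41.7% | 8.3% | 33.3% |
| Casual | 0% | 12.5% | 50% | 37.5% |
| Business | 0% | 5% | 5% | 90% |

Overall accuracy: 72.7%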
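
As a worked illustration of how the overall accuracy relates to the per-class rows of a confusion matrix, the sketch below computes both from raw counts. The counts are an assumption: they are one set of per-class test sizes (26, 12, 8 and 20 examples) that is consistent with the percentages in Table 3, inferred for this example and not reported test-set sizes.

```python
def overall_accuracy(confusion):
    """Correctly classified examples (the diagonal) divided by all test examples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def row_percentages(confusion):
    """Express each row (actual class) as percentages over the predicted classes."""
    return [[100.0 * count / sum(row) for count in row] for row in confusion]


# ASSUMED counts, inferred from the Table 3 percentages (rows: actual class,
# columns: predicted class, both in the order t-shirt, polo, casual, business).
assumed_counts = [
    [21, 1, 4, 0],
    [2, 5, 1, 4],
    [0, 1, 4, 3],
    [0, 1, 1, 18],
]

accuracy_percent = round(100.0 * overall_accuracy(assumed_counts), 1)
```

Because the overall accuracy pools all test examples, it is dominated by the larger classes: the well-recognized t-shirt and business classes pull it above the per-class rates for polo and casual shirts.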

3.3 Clothes similarity measurement

In order to weigh the degree to which each clothes factor is salient in human perception, we turned to our user study data. Each respondent rated the similarity of 40 pairs of men’s shirts and 20 pairs of women’s shirts. We coded each shirt along each of the six factors and, for each pair of shirts, calculated a difference score for each factor: 0 for a match and 1 for a mismatch. For color, we computed a score between 0 and 1 from the normalized distance between the two color histograms. Thus, for each person’s similarity rating, we had six factor similarity scores.

We conducted a linear regression using the factor similarity scores to predict the similarity ratings in order to determine the relative importance of each factor. The regression coefficients provide per-factor weights that approximate human perception of shirt similarity and that we can use to generate similarity scores. For example, the regression equation for similarity ratings of men’s shirts is: similarity rating = 3.247 + (−0.63 × sleeve) + (−0.19 × pattern) + (2.40 × color) + (−0.88 × collar) + (−0.80 × placket) + (−0.06 × emblem).
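
The regression equation for men’s shirts can be evaluated directly, as in the short sketch below. The intercept and coefficients are copied as printed in the equation above; the function name and the dictionary layout of the factor scores are assumptions made for this example.

```python
INTERCEPT = 3.247
# Coefficients exactly as printed in the regression equation for men's shirts.
WEIGHTS = {
    "sleeve": -0.63,
    "pattern": -0.19,
    "color": 2.40,
    "collar": -0.88,
    "placket": -0.80,
    "emblem": -0.06,
}


def predicted_similarity(scores):
    """scores maps each factor name to its score for one pair of shirts."""
    return INTERCEPT + sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# A pair of shirts that matches on every factor and has a color score of 0.
all_match = {name: 0.0 for name in WEIGHTS}
# The same pair, but mismatched on sleeve length and collar (score 1 each).
sleeve_collar_mismatch = dict(all_match, sleeve=1.0, collar=1.0)

baseline = predicted_similarity(all_match)                # equals the intercept
lowered = predicted_similarity(sleeve_collar_mismatch)    # 3.247 - 0.63 - 0.88
```

Reading the coefficients this way shows their relative importance: a collar or placket mismatch lowers the predicted rating by more than ten times as much as an emblem mismatch.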

To understand how well the features generated by our computer vision engine captured the variance in human ratings of shirt similarity, we conducted a linear regression using the image features to predict the similarity rating. The regression results show that, in the case of men’s shirts, the predicted similarity score (using the features detected by the vision algorithms) has a correlation of 0.52 with the actual ratings.

3.4 Motion tracking and pose matching

Human body orientation detection has been extensively studied for human pose estimation. The Responsive Mirror system uses visual analysis of images from a ceiling-mounted camera to detect body orientation and to track it efficiently. Viewed from overhead, a typical person has a roughly elliptical shape: the longer axis runs along the shoulders, and the shorter axis gives the person’s body orientation (Fig. 9). The curve in the figure is the detected body contour, which is approximated by a best-fitting ellipse, and the detected body orientation is marked with a line.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig9_HTML.jpg
Fig. 9 Detecting body orientation from an overhead camera view: (a) a person standing straight with arms down, (b) a person extending arms sideways
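
The relationship between the elliptical body shape and the orientation can be sketched with second-order image moments, which give the axes of the ellipse that best matches a blob of foreground pixels. This is a stand-in for illustration: the system fits an ellipse to the detected contour, whereas the sketch below works on the foreground pixel coordinates directly, and the function name is hypothetical.

```python
import math


def body_orientation_degrees(pixels):
    """Return (shoulder_axis, facing_axis) in degrees, each in [0, 180).

    pixels is a list of (x, y) coordinates of foreground (body) pixels
    seen from the overhead camera.
    """
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    # Central second-order moments of the blob.
    mu20 = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - mean_y) ** 2 for _, y in pixels) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    # Angle of the major (shoulder) axis of the equivalent ellipse.
    major = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    shoulder_axis = math.degrees(major) % 180.0
    # The body faces along the minor axis, perpendicular to the shoulders.
    facing_axis = (shoulder_axis + 90.0) % 180.0
    return shoulder_axis, facing_axis


# Hypothetical blob: wider along x (the shoulders) than along y.
blob = [(x, y) for x in range(-10, 11) for y in range(-3, 4)]
shoulder, facing = body_orientation_degrees(blob)
```

Note that an ellipse axis is only defined modulo 180 degrees, so this step alone cannot tell front from back.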

One problem with simple ellipse-fitting is that people’s pose may affect body shape. To handle the variation of poses, we first decide whether the overall detected contour is roughly convex. This is a good indicator of whether the arms are well-aligned with the body. If the person is extending arms sideways, the body contour is close to convex. In contrast, if a person is extending arms to the front, the body contour is a U-shape, which is concave. In this case, the pixels corresponding to the arms should not be considered. We used a morphological opening operator to eliminate arms from the foreground body pixels. With this operation, our detection scheme could successfully detect body orientation.
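
One common way to test whether a contour is roughly convex is to compare its area with the area of its convex hull (a ratio often called solidity). The sketch below takes that approach as an assumption; the threshold of 0.9, the function names, and the example contours are illustrative and are not the values used in the system. The morphological opening step is not shown.

```python
def polygon_area(points):
    """Shoelace formula; points are (x, y) vertices listed in order."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_roughly_convex(contour, min_solidity=0.9):
    """True when the contour fills at least min_solidity of its convex hull."""
    hull_area = polygon_area(convex_hull(contour))
    return hull_area > 0 and polygon_area(contour) / hull_area >= min_solidity


# Hypothetical overhead contours (coordinates in pixels).
arms_at_sides = [(0, 0), (10, 0), (10, 4), (0, 4)]
arms_in_front = [(0, 0), (10, 0), (10, 8), (8, 8), (8, 4), (2, 4), (2, 8), (0, 8)]
```

In this example the U-shaped contour fills only 70% of its hull (area 56 against 80), so it is flagged as concave and the arm pixels would be removed before ellipse fitting.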

Another problem is occasional incorrect detection. For example, when a person folds her arms in front, the body shape is more circular and the orientation detection is less reliable. This problem can be resolved by leveraging historical information, assuming that people change orientation smoothly and that incorrect detections occur only intermittently. Under these assumptions, we employed tracking with a particle filter [18], incorporating history to stabilize orientation detection. This also helped to eliminate the flip ambiguity, provided that the person does not turn too quickly.
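
The tracking idea can be illustrated with a small one-dimensional particle filter. This is a sketch under assumptions, not the filter used in the system: the random-walk motion model, the noise parameters, the uniform outlier floor in the likelihood, the known initial facing direction, and all names are choices made for the example.

```python
import math
import random


def axis_difference(a, b):
    """Smallest difference between two axis angles in degrees, in [-90, 90)."""
    return (a - b + 90.0) % 180.0 - 90.0


def circular_mean(angles):
    """Mean direction of a list of angles in degrees, in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def track_orientation(measurements, initial_angle, n_particles=500,
                      motion_sigma=5.0, measurement_sigma=10.0,
                      outlier_floor=0.05, seed=0):
    """Track facing direction (0-360) from ellipse-axis measurements (0-180)."""
    rng = random.Random(seed)
    particles = [(initial_angle + rng.gauss(0.0, motion_sigma)) % 360.0
                 for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Predict: orientation changes smoothly, so diffuse each particle a little.
        particles = [(p + rng.gauss(0.0, motion_sigma)) % 360.0 for p in particles]
        # Weight: the measured axis is ambiguous by 180 degrees; the constant
        # floor keeps an occasional wrong detection from dragging the estimate.
        weights = [math.exp(-0.5 * (axis_difference(p, z) / measurement_sigma) ** 2)
                   + outlier_floor for p in particles]
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(circular_mean(particles))
    return estimates


# Hypothetical sequence: a person facing about 200 degrees turns slowly to 220;
# the fifth detection (110) is a spurious one.
axis_measurements = [20, 22, 24, 26, 110, 30, 32, 34, 36, 38, 40]
estimates = track_orientation(axis_measurements, initial_angle=200.0)
```

Because the particles start near the true facing direction and move only a few degrees per frame, they stay on the correct side of the 180-degree ambiguity, which is why slow turns can be tracked while very fast ones cannot.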

Pose matching is employed in both the Responsive Mirror and the Countertop Responsive Mirror systems. The main challenge of pose matching is defining a parameter space for pose. However, for the specific purpose of self-comparison of different clothing in the Responsive Mirror, it suffices to extract simple image features and perform pose matching on these features. Examples of pose matching in the Responsive Mirror are shown in Fig. 10. The details of our pose matching algorithm are presented in the following section.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig10_HTML.jpg
Fig. 10 Examples of pose matching in Responsive Mirror

4 Countertop Responsive Mirror system

As an illustration of systems in the category of asynchronous reflections, we describe here the design of the Countertop Responsive Mirror system, a shopping support system that enhances the conventional mirrors typically found in stores selling head- or neck-worn accessories such as jewelry, eyeglasses or hats. In accessory shopping, mirrors are usually smaller than the full-sized ones found in clothing fitting rooms, and some are portable, which is an important feature of the mirrors in jewelry stores. Whereas the Responsive Mirror described in the previous section supports quasi-synchronous reflection, asynchronous reflection is more suitable for accessory shopping, for several independent reasons. First, for shoppers trying on eyeglasses, quasi-synchronous reflections of past recordings, as in the clothing Responsive Mirror, are inadequate because the shoppers cannot see themselves clearly without the right prescription. In addition, turning their heads severely limits the set of views in which they can see themselves, so asynchronous reflections make sense as a way for shoppers to view themselves without impaired vision. Second, we found in our observations of shoppers in jewelry stores [8] that the normative shopping practice of jewelry shoppers involved quick assessments using the mirror while browsing jewelry items. Once satisfied with the items she has browsed, the shopper goes through a more detailed evaluation stage to decide whether to buy. Due to the high price of jewelry, the evaluation period often took longer than a single trip to the jewelry store and would often require approval from spouses and other family members. The need to evaluate and compare jewelry pieces over an extended period of time therefore led us to design an asynchronous reflection paradigm for jewelry shopping, so that shoppers can adequately evaluate their options. Furthermore, for high-priced items such as jewelry, we found that shoppers may have access to only a few items at a time for security reasons. Thus, asynchronous reflections help storeowners maintain the security of their inventory while enabling shoppers to make informed decisions.

Figure 11 shows the prototype Countertop Responsive Mirror system (see footnote 2). The prototype consists of two components: one for “capture” and one for “access”. This separation of function was a deliberate design decision to match the normative shopping practices of jewelry shoppers.
https://static-content.springer.com/image/art%3A10.1007%2Fs00530-010-0192-y/MediaObjects/530_2010_192_Fig11_HTML.gif
Fig. 11 The Countertop Responsive Mirror prototype consists of a “capture” component using a camera behind a half-silvered mirror (left image retouched for clarity) and an “access” component (right) that emulates the functionality of the accessory tray for reviewing multiple pieces of accessories side-by-side

The “capture” component, shown on the left side of Fig. 11, consists of a half-silvered mirror with an embedded camera. The associated silver-colored knob serves as a recording button that the shopper can use to capture a sequence of images. An LCD monitor embedded behind the mirror (not visible in Fig. 11) tells the shopper when recording is in progress and how many sessions have been recorded.

The “access” component is a large touch-screen display, shown on the right side of Fig. 11. Its graphical user interface lets shoppers view and compare their recorded sessions in two large panels, each showing images from a different session. Below the two panels is a single slider that controls which pair of images from the two sessions is shown.

The asynchronous reflection capability is implemented by a matching algorithm that determines, for every pair of sessions, which images match in pose across sessions. Matching occurs automatically, so the user has no explicit control over which images are matched to one another. To allow users to view more than two sessions, a panel at the bottom of the GUI displays thumbnails of the recorded sessions; dragging a thumbnail onto either of the two larger panels loads that session of images into the panel. The GUI then automatically reorders the images in the sessions to implement the asynchronous reflection capability.

Unlike the Responsive Mirror system for clothing shopping described in Sect. 3, which uses a second camera mounted on the ceiling to compute the shopper’s body orientation relative to the mirror, the Countertop Responsive Mirror has only a single camera, which must both capture front-facing images and match the pose of the shopper. From our field observations of local Indian jewelry shopping, we determined that head tilt and rotation were the predominant features to use as a reference for matching. Secondarily, we observed that hand placements on the body next to the jewelry pieces and body leans to see close-ups of the jewelry were also indicative cues. Rather than attempting to explicitly estimate all of these head, hand, and body pose parameters, we chose to engineer a similarity measure between two images that corresponds to what people would perceive as the best pose-matched image pair between sessions. The similarity measure is a modification of the sum-of-squared-differences (SSD) distance between the pixels of two RGB color images. The modification deemphasizes the effect of slight translation differences on the SSD distance, since we found such differences to be hardly perceptible, and takes into account the pair-wise distance structure of images within a session. The interested reader may refer to [8] for more technical details of our engineered similarity measure.

Our chosen method above clearly does not extract faces and body parts, much less extract pose parameters, and a cursory assessment may lead one to believe that such a simplistic approach cannot possibly be adequate to implement asynchronous reflections. However, we found that the approach can match head poses, hand motions, and body leans between reference and reflection images to adequately implement asynchronous reflections for this application. The approach is also robust to clutter and motions of objects and people in the background because the shopper using the Countertop Responsive Mirror fills up most of the pixels in the image. Users of our system could perceive that matching was indeed occurring when they were evaluating jewelry, and the errors in matching that the system made were few enough that users did not mind.

5 Summary of user studies

During the course of developing these intelligent multimedia reflection systems, we have conducted several user studies to test the assumptions about the usefulness of the new capabilities and to draw out lessons for future design. In this section, we summarize some of the cross-cutting aspects of the user experience that these studies exposed and refer readers to other publications for more detail [4, 8, 30].

5.1 Benefits of the systems

A key question to explore was whether, and in what ways, such systems benefit their users: shoppers, sellers and companions. In one study [4], users ranked both of the digital media reflection types offered by the Responsive Mirror (previous fitting reflection and similar/dissimilar shirt reflection) higher than a plain mirror. On the other hand, neither type of digital reflection changed the appeal of the shirts or, ultimately, the participants’ decision to buy a shirt. As one might expect, participants’ comments indicated that the quality of the shirt itself was the determining factor, not the method used to assess or compare those qualities. This is important because, to be of benefit to sellers, the system must in some way lead to increased sales. Although the Responsive Mirror’s information was considered somewhat helpful, retailers using this technology should not expect an immediate change in purchasing behavior or in sales. However, retailers may reap longer-term benefit from increased customer satisfaction with the shopping experience.

In the case of the Countertop Responsive Mirror, rather than conduct lab-based experiments, we wanted to observe the actual experience of users in a store by conducting deployment trials in local jewelry stores and also with informal “focus groups” consisting of friends and family at the home of a business associate not on the research team. Details of those trials are described in [8] and key points are summarized here.

5.2 Recall

Respondents confirmed an expected affordance of the system: that it helped them recall what they had tried on (“It helps us remember what we wore.”). For example, when people tried on four or six items over the course of 20 minutes, they often had trouble recalling the appearance of the first few items. The system provided a convenient inventory of recordings well after the sales person had put items away.

5.3 Reaction to image matching capability

Across all deployments, the image matching capability caused the most excitement among shoppers. The slider was the single most used widget in the UI, promoting a good deal of interaction among shoppers and their companions. The image matching capability was viewed as a “cool” capability and provided a high degree of interactivity to quickly make a comment about the appearance of an item in a particular image; this was true both for sales people and their customers.

5.4 Privacy considerations

A common concern that sensor technologies face is how they affect a user’s sense of personal privacy. Will users accept a camera or other sensors in a traditionally semi-private space such as a fitting area? What concerns would they have, and what measures should the system design incorporate to mitigate such concerns? In press articles, privacy concerns are sometimes raised but rarely explored in depth.

In [4], we report a user study of 12 male participants using the Responsive Mirror, conducted to better understand user behaviors and privacy sensitivities regarding this class of technologies. More specifically, we studied the privacy issues around sharing the images, framing the questions to identify the typical boundary points in Irwin Altman’s dimensions of privacy [1]: disclosure (what types of information would you disclose to what types of relations), projection of group and individual identity (how deep is your concern about the impression of your personal values these images of you give to others), and the temporal dimension (implications of the duration that the images will exist).

With respect to disclosure, participants’ levels of concern were not significantly different regarding the gender of someone seeing the images. Participants had substantially the same levels of concern for friends or family members seeing the images (mean values of 1.08 and 1.50, respectively, on a 5-point scale where 5 = bothers me a great deal, 1 = doesn’t bother me at all) as well as co-workers and strangers (mean values of 2.08 and 2.25). This suggests that the granularity of disclosure classes can be as few as two for a large number of users (the categories could be hierarchically nested for those users who want finer granularity).

With regard to group and personal identity projection, participants rated their level of concern significantly higher for bad shirts (M = 3.0) than for good shirts (M = 1.42) (P = 0.001), as one might expect. On average, participants think about how similar the clothes are to those of people they know about as often as they think about similarity to those of people they do not know (M = 2.92 and 2.33, respectively, on a scale where 5 = always, 4 = often, 3 = sometimes, 2 = seldom, 1 = never). Participants responded with a mean of 3.6 (SD 1.07) to the question of how often they consider how others will perceive them in the clothes they are trying on.

Regarding temporality, participants indicated a possible desire to remove images at some point in the future. The most common response was 3 months (five participants); the remaining responses were spread across other times within 1 year (five participants) and never (two participants). These responses suggest that the system should prompt users at 3 months and again at 1 year to ask whether they want to keep or remove their images.

6 Related technologies

The idea of integrating digital technologies with mirrors is not new. This section describes some systems that are in the neighborhood of the multimedia reflective systems described above but that do not actually reflect any current, past or virtual reality. Although they incorporate a mirror, none of these systems provides the variations of reflective synchrony described previously.

6.1 Interactive displays

Several systems have incorporated electronic displays and computer vision into retail apparel shopping. Haritaoglu et al. [13] used computer vision to detect the number of people looking at a display, along with their demographic data, in order to update an advertisement. A Prada boutique in Manhattan, New York contained a sophisticated dressing room [5] with a radio frequency identification (RFID) scanner that identified each garment brought in. An electronic display provided information about price, inventory, and alternate colors and sizes. The fitting room also contained a motion-triggered video camera that recorded the shopper and played back the video when the shopper stepped out of the direct viewing area in front of the mirror. This video playback was not matched to the movement of the shopper. A component of the system, called the Magic Mirror [21], allowed a person trying on clothes to send video of himself to friends, who could send back comments and vote (thumbs up/down). The system could also project a static image of an alternate garment onto the mirror; the shopper could position herself so that the projected garment was fitted to her mirror image, providing a rudimentary “virtual fitting” that gave a sense of how the garment might look on her. The trial of these technologies was not successful in that store, although trials continued later at Bloomingdale’s. A report in Business 2.0 describes the dramatic mismatch between the expectations of the retail technology designers for Prada and the day-to-day reality of use, in which much of the system went largely unused due to a variety of factors including overflow traffic, technical failures and non-intuitive controllers (such as floor pedals to set the opacity of a glass wall) [20].

The Philips MiraVision LCD Mirror TV is an example of a commercial product that integrates an electronic display behind a conventional mirror. In this case, the conventional mirror is not fully opaque (often called “half-silvered” though the opacity is not necessarily half of normal) allowing light from a sufficiently bright display to pass through. There are some research systems that emulate this functionality but attempt to optimize what is shown in the electronic display and its position [11, 17].

6.2 Capture and access systems

Capture and access systems [28] are a class of ubiquitous computing systems that capture parts of an experience, via interactions with a user interface, cameras, and microphones, for later access. Several such systems have been developed for note-taking in classrooms, recording meal preparation in the home, capturing informal meetings at work and battlefield visualization in the military domain. One system, the Cook’s Collage [27], was developed as a short-term memory aid that helps a cook who is interrupted in the middle of cooking to recover what he or she had been doing before the interruption.

The Responsive Mirror and Countertop Responsive Mirror can be considered to be types of capture and access systems. Like most capture and access systems, the Responsive Mirrors do serve as a memory aid, reminding shoppers of their appearance in clothing or accessories. However, the “access” of prior images (reflection) in the Responsive Mirrors is specifically matched to the state of the reference, so these systems are better characterized as “digital reflections”. Capture and access is perhaps a more general notion of technologies serving as general memory aids, whereas “digital reflection” is specifically designed to compare two (or more) items according to salient features of the items.

7 Summary and conclusion

Digital media enables whole new classes of systems that support both senses of the word reflection: (1) the projection of an image representing objects outside of one’s direct field of view, such as oneself in a looking glass, and (2) the contemplation of information about past events. We introduced a conceptual framework for these various classes based on whether the reference and reflection images are derived from current, past or virtual reality. We discussed the fundamental technological challenges raised by these new classes of reflection, primarily in terms of determining the “important” features of the images in the problem domain and selecting or inventing techniques to identify and match those features. We illustrated instances of these challenges and specific solutions that we encountered in the development of two multimedia mirror systems.

The Responsive Mirror provides personalized and interactive quasi-synchronous reflections for general apparel shopping. The reference object is a shopper standing in front of a mirror trying on clothes, and the Responsive Mirror shows two different reflections. One reflection shows images of the user in previous fitting trials, matched to the user’s rotational orientation relative to the mirror, which allows the shopper to directly compare the look and fit of two items simultaneously. A second reflection shows images of other people wearing shirts that are similar and dissimilar to the shirt the user is trying on, which provides a “social reflection” of the shirt style. We described and evaluated the matching techniques used in the fitting room instantiation of the Responsive Mirror, and we described a user study employed to determine a meaningful metric of “similarity” between men’s shirts.

The Countertop Responsive Mirror provides similar capabilities but for head- and neck-worn accessories using asynchronous reflection. In this system, the reflected images are not matched to the current state of the shopper, but with respect to recorded sessions of the shopper wearing different items. For each frame of the reference recording, the shopper’s pose is matched to the closest corresponding pose in the reflection recording. As a result, the motion in the reflection does not necessarily follow the recorded sequence of those frames, but the shopper’s pose is consistent across the items being compared. The Countertop Responsive Mirror is currently undergoing trial deployments and iterations of design.

In future work, we aim to examine additional applications for variously synchronized reference and reflection. We can easily imagine some novelty applications, such as mounting a system in a pedestrian corridor and reflecting the prior passings of other pedestrians as you walk by, or showing people in a dining area the reflections of others who sat and ate in similar positions. We have no doubt that such displays will be fascinating to experience, but we are keener to identify problems in which asynchronous reflection provides information that supports decision-making tasks, communication or other information-oriented goals.

Footnotes

1 Project Natal shopping scenario (feature video minute 2:30–2:45). http://www.xbox.com/en-US/live/projectnatal/ (last accessed 4 January 2010). Virtual Mirror: http://www.virtualmirror.net/ (last accessed 4 January 2010).

2 A demonstration of the system is shown at BoingBoing Gadgets at: http://gadgets.boingboing.net/2009/06/30/how-parcs-responsive.html (last accessed 8 January 2010).

Copyright information

© Springer-Verlag 2010