Multimedia Tools and Applications, Volume 50, Issue 3, pp 587–607

Taking advantage of contextualized interactions while users watch TV

Authors

  • C. A. C. Teixeira
    • Departamento de Computação, Universidade Federal de São Carlos
  • Erick Lazaro Melo
    • Departamento de Computação, Universidade Federal de São Carlos
  • Renan G. Cattelan
    • Faculdade de Computação, Universidade Federal de Uberlândia
  • Maria da Graça C. Pimentel
    • Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo
Article

DOI: 10.1007/s11042-010-0481-7

Cite this article as:
Teixeira, C.A.C., Melo, E.L., Cattelan, R.G. et al. Multimed Tools Appl (2010) 50: 587. doi:10.1007/s11042-010-0481-7

Abstract

While watching TV, viewers use the remote control to turn the TV set on and off, change channel and volume, adjust the image and audio settings, and so on. Worldwide, research institutes collect audience measurement information, which can also be used to provide personalization and recommendation services, among others. Interactive digital TV offers viewers the opportunity to interact with applications associated with the broadcast program, and the interactive TV infrastructure supports the capture of the user–TV interaction at fine-grained levels. In this paper we propose the capture of all user interaction with a TV remote control, including short-term and instant interactions. We argue that the captured information can be used to create content pervasively and automatically, and that this content can be used by a wide variety of services, such as audience measurement, personalization and recommendation services. The capture of fine-grained data about instant and interval-based interactions also allows the underlying infrastructure to offer services at the same scale, such as annotation services and adaptive applications. We present the main modules of an infrastructure for TV-based services, along with a detailed example of a document used to record the user–remote control interaction. Our approach is evaluated by means of a proof-of-concept prototype built on the Ginga-NCL middleware of the Brazilian Digital TV System.

Keywords

Digital interactive TV · Capture and access multimedia applications · User–media interaction

1 Introduction

Polls and surveys are among the many resources used to collect data which, once analyzed, provides important feedback allowing producers to offer services or products aligned with the population in general, or with some group in particular. In the context of television, being able to continuously collect information about how much of the population is tuned into each channel has been crucial to the creation of audience indexes that provide important feedback to producers and guidance to advertisers. This demands collecting data in the home that includes the channel chosen as well as whether the volume is muted or not.

Research services all over the world [29] strive to provide information about who watches what, and when (e.g. [5, 27, 38]). The information is gathered from representative samples of the population, who agree to have data collected in their homes. The data collection is achieved by means of a variety of instruments, from special-purpose electronic meters or set-top boxes to paper-based surveys and diaries. Information gathered by these services has been used by several researchers, for instance to investigate the categorization of TV viewers based on the program genres they watch [20]. Moreover, researchers have also used data collected in questionnaires of their own to design personalization services [24, 56].

Given the increasing availability of communication and computing devices mediating the user interaction with home appliances, personal devices, cars, and other machines (e.g., vending machines operated by cell phones), along with the constant improvements in techniques and infrastructure involving data mining, for instance, it is opportune to widen the amount of information that is captured and processed so as to investigate novel techniques to produce quality feedback.

The Interactive Digital Television (iDTV) platform is an opportune candidate for data collection. First, iDTV has a processing element, and secondly, iDTV employs a return channel to communicate with providers and broadcasters. Moreover, the TV business model depends on knowing how receptive the audience is to the programming being presented. This scenario has motivated researchers to define an audience measurement model and its corresponding implementation in the context of convergent broadcasting and IPTV networks [3], the aim being to model user behaviour.

Data captured from a substantive sample of a population, combined with context information and metadata about the programs and applications with which viewers interact, constitutes a rich source of information for many professionals including producers, broadcasters, and advertisers. For example, the literature reports several efforts to provide personalization and recommendation TV services to users [6, 37, 40, 60] or groups of users [36, 57, 59].

The iDTV platform makes it possible to capture information beyond simple channel and volume changes, since iDTV allows for a richer set of user interactions. For example, it might be useful to capture information about user interaction with the Electronic Program Guide (EPG). Examples of navigation operations that can be captured are the path taken by the user while selecting a program, optional settings (e.g. language options for a program), or detailed interactions with an interactive multimedia program. Moreover, when users have a Personal Video Recorder (PVR), it could be interesting to capture information about content selected for recording as well as navigation over that content at the time of playback. Last but not least, for audience measurement purposes it could be important to capture any user interaction with interactive programs or applications [3].

The interactions described above can be classified as implicit interaction—in the sense that the user is performing actions as a natural part of watching interactive TV. However, it is also possible to capture explicit user interactions. Users may be willing to volunteer their opinion about programs or products. The technology underlying iDTV makes it possible for applications to run in the background of programs and, when appropriate, invite the users to evaluate the program in whole or in part.

The interactive TV platform provides an infrastructure that supports detailed capture of user–TV interaction, though this has not been reported in the literature. Several authors argue that, for the modeling of user behavior, gathering data about a specific instant is less important than consumption over a longer time period, so that the corresponding data can be aggregated and processed [3]. In this paper, we propose the capture of all user interaction with a TV remote control, including short-term and instant interactions. We argue that the captured information can be used to create content pervasively and automatically, extending the work in progress discussed elsewhere [45, 53]. The created content can be used by all kinds of services, including audience measurement, personalization and recommendation services. Moreover, the fine-grained data resulting from capturing instant and interval-based interactions also allows the underlying infrastructure to offer services based on instants and time-based intervals, such as annotation services [11, 22, 52] and adaptive applications [30]. We present the main modules of the infrastructure for TV-based services, along with a detailed example of a document used to record the user–remote control interaction. Our approach is evaluated by means of a proof-of-concept prototype built on the Ginga-NCL middleware of the Brazilian Digital TV System.

Throughout this paper, we consider the scenario where a user watches television using a TV set with a return channel. Although this is not always the case [39], it is possible to envision solutions to most situations where sending information from the user to a service provider is desirable. The Brazilian Digital TV specification [47], for instance, proposes a variety of return channels techniques that range from cellular phones to WiMAX networks [28].

The remainder of this paper is organized as follows. In Section 2 we discuss the concept of capturing the user–remote control interaction. In Section 3, we present the main modules of our proposed infrastructure for TV-based services. In Section 4, we present an example of a document used to record the user’s interaction with the remote control. In Section 5, we introduce a proof-of-concept prototype, integrated into the environment of the Brazilian Digital TV System and its Ginga-NCL middleware, that supports features for the capture of user–TV interaction. In Section 6, we discuss related work, and in Section 7, we present our final remarks on such important issues as privacy and ownership.

2 Capturing the user–TV interaction

The value of capture and access applications has been advocated in the ubiquitous computing literature both as a general research theme [1] and as an appropriate platform for general [43] and specialized services (e.g. [31]). In the domain of the Web, the need to study large-scale [2] and long-term usage has been advocated [46], while several businesses have been built on the ability to capture and analyze information about user navigation, in areas including news personalization [13], keyword suggestions [19], TV show search [41] and recommendations for Web-hosted video [4].

Watching TV is a widespread activity, and operating the TV remote control is a widely held skill. Considering a typical remote control and a typical user, many actions are carried out while watching TV. Such actions include the most basic, such as turning the TV on/off, changing channel and changing or muting the volume, as well as more complex actions such as accessing, navigating and programming the TV set via the EPG [35].

The interaction becomes more complex when a Personal Video Recorder (PVR) is available. For instance, one may program the PVR to record each episode of a weekly series automatically. Although several commands must be issued via the remote control to program the desired recording, a much richer interaction then becomes available to the user when reviewing the recorded media. In this case, the basic interaction actions include play, pause, stop and resume.

When more advanced remote controls are available, the interaction may become even more elaborate: depending on the device, commands can be issued via voice, gesture, or touch screen. Moreover, the identification of the user may be possible using biometric readings (e.g., via fingerprint scan [15]). In such a scenario, it is important to be able to support several users interacting with the same multimedia content even if each user employs a different device, as illustrated in Fig. 1.
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig1_HTML.gif
Fig. 1

TV users interact with the TV by means of different devices

The literature reports several studies on providing interaction (e.g. [9, 18, 32]) and services via non-traditional remote controls (e.g. [7, 10]).

The capture of the user–remote control interaction can provide valuable information on both interactive and non-interactive TV platforms, even when there is no return channel to collect the captured information. For non-interactive TV, the business of measuring how much of the population has tuned their TV to each channel is both substantial and important [26].

By non-interactive TV, we refer to conventional linear programs which, when presented to the user, do not offer any options for interaction via the TV set-top-box. Considering the millions of viewers of a given TV series, the capture of their most basic (implicit) interaction could give valuable feedback to the production team. For instance, mining the captured data could show that a significant portion of the viewers change channel each time a given character appears in an episode of a popular series. Similarly, the fact that a significant number of viewers mute the volume during a particular interval of a live show might trigger an action on the producers’ side.

Context information about the user–iDTV interaction could be a particularly valuable part of the database to be analyzed. The usual context awareness data gathering principles [55] apply: who the user is as well as when, where, how and why the interaction takes place. For example,

  • Information about who the user is may be available in a user profile database associated with a single user or a whole household.

  • Information about when, that is, the time of the interaction, may be combined with higher level time information such as whether it is a weekend or holiday period.

  • Information about where the interaction takes place is important not only in mobile TV scenarios but also to identify the particular place of the interaction, since users may watch different programs if the TV set is located in the kitchen, in the living room or in the bedroom.

  • Information about how is most relevant to identify the type of remote control used (the TV set’s original remote control or, say, the user’s cell phone).

  • Information about why may be available, for instance, if the user is willing to provide extra information—for instance to specify that a particular program happens to be related to a homework assignment.
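The five context dimensions above can be sketched as a simple record; the following Python sketch is illustrative only, and all field names and values are assumptions rather than part of the paper's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Minimal sketch of a context record covering the who/when/where/how/why
# dimensions discussed above; all names are illustrative assumptions.
@dataclass
class InteractionContext:
    who: str                   # user or household profile id
    when: datetime             # time of the interaction
    where: str                 # e.g. "kitchen", "living room", "bedroom"
    how: str                   # device used, e.g. "original remote", "cell phone"
    why: Optional[str] = None  # optional user-provided reason

    def is_weekend(self) -> bool:
        # Higher-level time information derived from the raw timestamp.
        return self.when.weekday() >= 5

ctx = InteractionContext(
    who="household-42", when=datetime(2009, 1, 3, 10, 3, 52),
    where="living room", how="original remote")
```

A service could then enrich such records with further derived attributes (e.g. holiday periods) before analysis.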

3 Infrastructure for TV-based services

Capturing user–TV interaction requires an infrastructure like the one depicted in Fig. 2. Users interact with the TV via the remote control or other devices (as illustrated earlier in Fig. 1). The remote control communicates with the digital TV receiver, or set-top box (STB), which contains software capable of monitoring user interaction. Once captured, the interaction must be stored in a database, along with information about the content that the user was viewing. For instance, a change in volume should be linked to the program that the user was watching at the time of the change. In our current implementation we have opted for local storage so the information can be periodically sent to a service provider using some type of return channel. We assume that the information is sent to a service provider using an XML-based declarative document corresponding to the user interaction with the TV, as described in Section 4.
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig2_HTML.gif
Fig. 2

Infrastructure for TV-based services

The web-based service provider depicted in Fig. 2 is a central element in our architecture. The service provider supports interfaces for receiving data sent by the set-top boxes and is in charge of collecting the information from many users. It also contains an interface for receiving metadata associated with the TV programs. We currently use the OpenSocial API to provide integration with Web-based social communities [17].

While the data captured in its original (raw) form is of some value to content producers, this value can be increased if the data is processed using data mining techniques, so that knowledge is extracted from the captured user–TV interaction. The web-based service provider should provide interfaces so that mining services can access the data. The results of this more elaborate processing are likely to provide useful information for many stakeholders, including content producers and advertisers.

4 Documenting the user–remote control interaction

In this section we illustrate the use of XML to record the user–remote control interaction. The automatically generated XML documents can be used as input by several services: any necessary transformation can be performed on the document using standard XML processing tools.

The documents we discuss conform to an XML schema we have formalized: portions of the document structure are shown in Listings 1 to 6. It is important to observe that, although the underlying schema corresponds to a straightforward solution, to the best of our knowledge it is the first such example reported in the literature.

Listing 1 shows the root element watchTV (lines 01 and 91 of the figure). At any time during a capture session, the document can be sent, via a return channel, to the appropriate services. It is worth observing that services exploiting the captured interaction may also be available to users via their own TV set-top box, with no need for a return channel.

Listing 1 Main elements of XML document with typical user–remote control interaction
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fige_HTML.gif
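A capture document of this kind can be generated with standard XML tooling. The following Python sketch uses the standard ElementTree module; the element names watchTV, context, where and how, and the begin/end attributes, are taken from the text, while the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sketch of generating a capture document: watchTV is the root element,
# with begin/end attributes recording when the session took place.
# Attribute values below are invented examples.
root = ET.Element("watchTV",
                  begin="2009-01-01T10:03:00", end="2009-01-01T11:20:00")
context = ET.SubElement(root, "context")
where = ET.SubElement(context, "where")
where.text = "living room"
how = ET.SubElement(context, "how")
how.text = "original remote control"

# Serialize for transmission over the return channel.
xml_bytes = ET.tostring(root, encoding="utf-8")
```

Any necessary transformation of such a document (e.g. filtering or aggregation) can then be performed with standard XML processing tools such as XPath or XSLT.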

Considering a basic remote control, all the user interactions carried out to operate the channel and volume settings are considered to be implicit interaction. When a PVR is available, many other implicit commands can be used, such as play, pause, record and several seeking options. Moreover, a PVR remote control also supports implicit commands to control the recording of programs.

More complex implicit interactions will be available when next-generation remote controls enter the typical household: the capture of voice and ink-based comments might be possible along with gesture commands, and user identification could be possible via face [25] or body (and face) detection [33] from live video, or from biometric signs such as fingerprints.

In the particular example of fingerprint authentication, the technology has been available for some time. For example, a patent by Harada and Okubo describes the use of an individual’s fingerprint to activate a remote control input device [21], and a patent by Philomin et al. [42] describes a method and system for controlling access to a TV set using a remote control with a fingerprint scanner. User authentication via fingerprint recognition in the TV remote control could support accessing program content based on the preferences and requirements of the consumer [54]. The combined efforts of fingerprint remote control manufacturers and IPTV middleware and applications providers are aimed not only at content personalization and customized advertising, but also at other services such as parental controls [15, 16].

4.1 Basic context information

The XML document in Listing 2 illustrates how basic context information can be recorded: in the example of a traditional home environment, values associated with context dimensions are recorded in the element context (lines 02–19), which includes the device used (how in lines 14–18) and where the interaction took place (lines 07–13). It is worth observing that, in this example, the indication of when the interaction took place is recorded as begin and end attributes of the watchTV element. Another option is the explicit use of a when element within the context element, at the same level as the who and where elements.

Listing 2 Excerpt of XML document recording context information about typical user–remote control interaction
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Figf_HTML.gif

4.2 Implicit interaction: basic remote control

For even the most basic remote control, user interaction and other relevant information can be captured in XML documents. In the document shown in Listing 3, lines 25–45 illustrate how all basic actions can be stored, including the moments when the user operates the channel and volume buttons. The XML elements indicate what type of interaction caused the action. For instance, lines 25–37 indicate that the user switched from channel 2 to channel 9 via the CH_UP key at 10:03:52 on Jan 1st 2009, and line 45 records that the user viewed the EPG at 10:13:58.

It is important to record detailed information about both source and destination channels in order to support the full range of analysis that might be performed by data mining algorithms. Listing 3 illustrates the recording of this information as elements from (lines 26–32) and to (lines 33–36), which are contained in the action element; the action element, in turn, has the attribute type set to the value channelChange (line 25).

Listing 3 Excerpt of XML document recording implicit interaction associated with common actions
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Figg_HTML.gif

It is important to record all metadata about the program being presented at the time of the capture, as suggested by lines 28–31 in Listing 3. This information could be extracted automatically from the EPG by set-top box software or could be retrieved from Web Services by the Service Provider element of the infrastructure (see Fig. 2).

Applications may also want to know how users manipulate the volume controls. This information can be automatically captured and recorded as shown in lines 39–43, corresponding to the user setting the volume to level 13 using the VOL_UP key at 10:04:10 on Jan 1st 2009, and in line 44, when the volume was muted using the mute key.
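The channel and volume actions described above could be generated as follows; this Python sketch mirrors the structure described for Listing 3 (action elements with type channelChange, from/to children, and the CH_UP and VOL_UP keys), while the attribute names key, time, number and level are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Sketch of recording a channel change in the style of Listing 3:
# an action element with type="channelChange" holding from/to children.
# Attribute names beyond those mentioned in the text are assumptions.
action = ET.Element("action", type="channelChange",
                    key="CH_UP", time="2009-01-01T10:03:52")
ET.SubElement(action, "from", number="2")   # source channel
ET.SubElement(action, "to", number="9")     # destination channel

# Volume interactions can be recorded in the same style.
volume = ET.Element("action", type="volumeChange",
                    key="VOL_UP", level="13", time="2009-01-01T10:04:10")
```

Program metadata extracted from the EPG could be attached as further children of the from and to elements.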

4.3 Implicit interaction: PVR remote control

Similar representations can be used to capture the user–remote control interaction when a Personal Video Recorder is programmed to record some broadcast content. Moreover, information about when that content is watched (e.g., on the same day it is broadcast) and how many times it was viewed may also be of interest to applications. Information can be automatically captured about the use of Personal Video Recorder control buttons such as play, pause, seek and skip, including details about starting and ending times, who performed the interaction and any other implicit interaction performed via the remote control. All this information can be recorded in appropriate XML elements and made available to appropriate applications.

4.4 Implicit interaction: interactive TV remote control

When interactive TV is available, its associated remote control contains buttons whose input is handled by interactive TV services. In the Brazilian iTV scenario, at least four buttons for interaction with applications must be available on all remotes; they are of different shapes and colors (red circle, yellow triangle, blue square and green diamond) [47]. Figure 3 (left) shows a minimal iTV remote control, with the special buttons placed above the numeric keypad; the remote in Fig. 3 (right) includes buttons for controlling media playback (backwards, play, pause, stop and record).
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig3_HTML.gif
Fig. 3

Remote controls for the Brazilian Interactive TV. Left: minimal iTV control with four special buttons placed above the numeric keypad. Right: control includes buttons for controlling media playback

Implicit interaction with the special buttons can be captured as indicated by lines 21–24 in Listing 4. In this case, the setting of each button has been mapped to values as indicated by the interactive video currently being played back. It is worth noting that, in this case, the interaction is proposed in the media per se and not via an associated application: these are part of the interactive video alternatives [50] and correspond to implicit interactions.

Listing 4 Excerpt of XML document recording implicit interactions offered via colored buttons in the control
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Figh_HTML.gif

4.5 Implicit interaction: advanced remote controls

As the number of interaction and annotation alternatives grows, the document model for automatically recording the user interaction must be able to include the corresponding information. Three examples are shown in Listing 5, which illustrates not only the capture of information when colored buttons are used but also when multimodal interaction is available.

In Listing 5, lines 60–64 report that the user made an annotation by pressing the BLUE button at 10:33:52 while watching channel 6; any available details of the program are also stored (e.g. the program id). The semantics associated with this annotation depend on the user and on the selected application.

Listing 5 Excerpt of XML document recording implicit interaction about annotations captured by advanced remote controls
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Figi_HTML.gif

Lines 65–70 of Listing 5 also illustrate the recording of an annotation made by voice: the time is 10:52:11 and the annotation was made by means of an audio/mp3 file named audio1.mp3.

When pen-based electronic ink is used to generate an annotation on top of a video frame, as described in our previous work [8], the annotation may be recorded as a new frame which combines the original video frame and the electronic ink to form a new image, as illustrated in lines 71–76: the image type is image/png, the media file is img.png, and the annotation was made while the user was watching channel 6 at 11:15:28. Depending on the infrastructure, the new frame might be rendered as an annotation as shown in Fig. 4 [8].
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig4_HTML.gif
Fig. 4

Presentation of a frame annotated with a pen-based device [8]

Due to copyright restrictions, an alternative approach is to save the ink in an XML-based syntax so that it can be added to the original frame when available. As always, other available information about the program can be added to the captured information (e.g. in the <documentState> element).

4.6 Explicit interaction: the interaction with interactive TV programs

As far as explicit interaction is concerned, interactive TV applications may exploit all available input alternatives via a well-defined API. For instance, it is possible to map the special colored buttons of a minimal remote control to ratings such as excellent, good, regular and bad so that the user can vote on some aspects of the program. As an advanced example, Fig. 5 shows an application in which users interact with the TV using a smart phone [44].
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig5_HTML.gif
Fig. 5

Interaction with an interactive TV application via a smartphone [44]

Such explicit interaction may also be captured along with the corresponding mapping, as illustrated in Listing 6 lines 78–90. The example illustrates both the mapping (lines 79–82) and the action performed by the user (lines 83–89): this is necessary since the mapping can change as the interaction goes on within the program.
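Capturing the mapping together with each action can be sketched as follows; the color-to-rating pairing and record field names are illustrative assumptions, while the rating vocabulary (excellent, good, regular, bad) and colored keys follow the text.

```python
# Sketch of capturing explicit interaction together with the key mapping
# in force at the time of the action: since the mapping can change as the
# program goes on, each recorded action keeps its own copy of the mapping.
# The color-to-rating pairing below is an illustrative assumption.
mapping = {"RED": "excellent", "GREEN": "good",
           "YELLOW": "regular", "BLUE": "bad"}

def record_vote(key: str, time: str) -> dict:
    # Store the raw key press, its current meaning, and the full mapping.
    return {"key": key, "meaning": mapping[key],
            "mapping": dict(mapping), "time": time}

vote = record_vote("GREEN", "2009-01-01T10:45:00")
```

Even if the application later remaps the keys, the stored record remains interpretable on its own.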

Listing 6 Excerpt of XML document recording user–remote control explicit interaction via mapped colored keys
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Figj_HTML.gif

5 Proof of concept prototype

We implemented a proof-of-concept prototype which, when integrated into the Ginga-NCL middleware, supports features for the capture of the user–TV interaction. The prototype required extending the middleware with six new software components, indicated in white in Fig. 6. The new modules are connected to Ginga via interfaces with five original modules, indicated in gray in the figure.
https://static-content.springer.com/image/art%3A10.1007%2Fs11042-010-0481-7/MediaObjects/11042_2010_481_Fig6_HTML.gif
Fig. 6

Extended Ginga middleware architecture: original modules (gray) and new modules (white)

The new modules are as follows:

  • Database Manager: This component embeds a database manager. We selected the SQLite database manager given its widespread use in embedded systems, which in turn is due to its low resource requirements. We chose to embed a database manager instead of writing specialized code so that the component can be used by other applications requiring a relational database.

  • Interaction Manager: This component is responsible for managing the capture of the user interaction. The actions taken by the user are captured by the original Ginga-NCL component Input Manager and passed to our new component Interaction Manager, which searches for information about the user context and presented content. This information is stored using the Database Manager.

  • State Machine Manager: This component is responsible for capturing the state machine corresponding to the NCL document presented. It communicates with the original Ginga-NCL component NCL Formatter to obtain the current state of the document being presented. The State Machine Manager requests information about the state of all media nodes which are part of the document (given that an NCL media node may be in one of three states: running, paused, prepared), and the time of each media (for those which are associated with time), as well as the value of all necessary anchors.

  • Scheduler: This component is in charge of scheduling the transmission of information to the service provider. According to pre-established criteria, the component initiates the transmission process by activating the Interaction Packager component.

  • Interaction Packager: Upon activation by the Scheduler component, this component packages the captured user interaction stored locally via the Database Manager and activates the Webservice Client component. It is worth noting that the data is exported to an XML format to be sent to the service provider.

  • Webservice Client: This is the component that sends the information received from the Interaction Packager component to the service providers that have registered to receive user interaction information. As shown in Fig. 2, there must be a communication channel with a service provider. We implemented this communication via web services using the SOAP protocol. The Webservice Client component communicates with the service provider via some available return channel.
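The Database Manager / Interaction Packager pair can be sketched as follows: captured interactions are stored locally in SQLite and later exported as an XML document for transmission. The table schema, column names and attribute names are illustrative assumptions; only the use of SQLite, the XML export, and the watchTV root element come from the text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Local storage of captured interactions (Database Manager sketch).
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE interaction (
                  time TEXT, type TEXT, key TEXT, detail TEXT)""")
conn.execute("INSERT INTO interaction VALUES (?, ?, ?, ?)",
             ("2009-01-01T10:03:52", "channelChange", "CH_UP", "2->9"))
conn.commit()

def package_interactions(db: sqlite3.Connection) -> bytes:
    # Interaction Packager sketch: export all stored interactions
    # as an XML document rooted at watchTV, ready for transmission.
    root = ET.Element("watchTV")
    for time, type_, key, detail in db.execute(
            "SELECT time, type, key, detail FROM interaction ORDER BY time"):
        ET.SubElement(root, "action", type=type_, key=key,
                      time=time, detail=detail)
    return ET.tostring(root, encoding="utf-8")

payload = package_interactions(conn)
```

In the prototype, the resulting document would then be handed to the Webservice Client for delivery to the service provider over the return channel.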

6 Related work

Many research results have focused on the importance of analyzing log information captured from Web usage (e.g. [2, 12, 46, 51]), in particular research motivated by business applications [13, 19]. Other related work has looked at recommending videos hosted on the Web [4] and supporting the search of traditional TV shows [41].

Similar work on video has aimed to support community-based annotations [23, 48, 49] and recommendations [14]. The work of De Pessemier et al. [14], in particular, is similar to ours in its focus on extracting context information from social networks.

Our work is related to established non-interactive TV services that analyze basic information collected from implicit interaction (user identification, channel tuned and volume used) via special-purpose remote controls given to sizable samples of the TV viewing audience. These devices are able to provide, via radio signal, minute-level information to a collecting server [26, 27]. The collected data is analyzed and made available to subscribers via Web-based interfaces (the result available being the average audience on each channel at each minute). This approach exploits both implicit (channel id and volume level from all TVs in the household) and explicit interaction (user id for each TV in the household), which is processed in a secure way. Their approach is based on non-interactive TV and requires specialized remote controls. Our approach assumes the presence of interactive TV technology and makes use of both standard and advanced remote controls. Although our approach does not apply to non-interactive TV, we are able to capture much more information, which can be used for more complex analysis and services.

Our work is most related to the work of Álvarez et al., who define a comprehensive audience measurement model and its corresponding implementation in the context of convergent broadcasting and IPTV networks [3]. Their model includes a Logical Model and a Data Model. The Logical Model includes Meters (for monitoring and collecting consumption measures) and a Data Center (in charge of calculating statistics from the information provided by the Meters). The Data Model has components for storing the information that needs to be captured, including the Metadata, Panel and Audience Data, which store information about the content, the user and the user–TV interaction, respectively. Because the authors focus on modeling user behavior, they describe both the application of their model to the broadcasting and IPTV media delivery platforms and the metrics they define for user content consumption; however, few details are given with respect to the model associated with information collection. Our work is complementary to theirs given that we specify the XML-based schema defined to record the user–remote control interaction. Another difference is that our experimental platform and corresponding architecture is the Brazilian Ginga middleware, while they experiment with IPTV and European MHP. The most relevant contribution of their work is the set of metrics defined for user content consumption; we are currently evaluating its adaptation to our model.

Regarding novel alternatives for exploiting the user–remote control interaction, the literature reports research on camera-based [25, 33] and fingerprint-based [21] non-intrusive devices for user authentication, user interfaces using general-purpose handheld devices [9], gesture-based remote controls [32], and tangible interfaces including a sensor-enhanced cube providing physical shortcuts to the most used commands [18]. Pen-based interaction has also been exploited, including in the creation of multi-user collaborative annotations in the home [7].

As far as services are concerned, the work by César et al. considers scenarios where a secondary screen is available, discussing the provision of services to control, enrich, share, and transfer television content [10]. Our work relates to their control services since we also decouple the television stream, the optional enhanced content, and the television controls. Our approach is novel both in capturing the television control information from the main remote control and in proposing the processing of the user interaction in independent and novel tasks.

7 Final remarks

The availability of an infrastructure for capturing the user–remote control interaction while the user watches TV opens up many opportunities for integrating the captured information. After proper mining and filtering of the captured data, a vast database becomes available to applications offering recommendation, personalization and collaboration services.

However, a capture infrastructure such as the one proposed in this paper raises issues of privacy and ownership. Regarding privacy, a properly deployed infrastructure must ensure that users are made aware when information is automatically and pervasively captured. One point worth stressing is that, with continuous long-term capture, users tend to forget that any monitoring is taking place. This implies a requirement that the infrastructure notify users of capture as often as is appropriate from the user's perspective.

Regarding ownership, one important concern is that all annotations must be kept independent of the original source. This means that the captured information must be kept independent from the original media. The approach we propose is to record all the captured interaction as independent content (illustrated by the use of an XML document), so that users may share their TV watching history independently from the original media.
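A minimal sketch of this idea follows; the element and attribute names are illustrative and not the schema actually defined in the paper. Interaction events are serialized as a standalone XML document that references the broadcast program only by identifier, never embedding the media itself.

```python
import xml.etree.ElementTree as ET

def record_interactions(program_id: str, events: list) -> str:
    """Serialize captured remote-control events as an independent XML document.

    The original media is referenced only by program_id, so the resulting
    document can be stored and shared separately from the broadcast content.
    """
    root = ET.Element("interactionHistory", programId=program_id)
    for event in events:
        ET.SubElement(root, "event",
                      key=event["key"],
                      timestamp=event["timestamp"])
    return ET.tostring(root, encoding="unicode")

# Hypothetical program identifier and key events, for illustration only.
doc = record_interactions("some-broadcast-program-id", [
    {"key": "CHANNEL_UP", "timestamp": "2010-01-15T20:01:12Z"},
    {"key": "VOLUME_DOWN", "timestamp": "2010-01-15T20:03:45Z"},
])
```

Because the document carries only an opaque program identifier plus timestamped events, it can be exchanged, annotated or mined without any access to, or redistribution of, the original broadcast media.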

In the short term, we need to extend and evaluate the current version of our XML Schema with a larger group of researchers involved in the development of the Brazilian Interactive TV platform. We also plan to integrate the metrics for user content consumption defined by Álvarez et al. to our architecture [3].

Our current results leverage further work on integrating the capture infrastructure presented in this paper with a peer-to-peer platform we have proposed elsewhere [8], so as to facilitate the synchronous collaborative sharing of explicit annotations. In the short term, we also plan to be able to export the XML document we generate to an XML-based distributed storage for captured information (e.g. [43]). This storage could be used in applications related to personalization, linking and recommendation (e.g. [34]) and in others specifically targeted at TV content [58].

Acknowledgements

We thank the following organizations for their support: FINEP, FAPESP, CAPES and CNPq. We thank Dick Bulterman for great discussions on this topic. We thank Luis F.G. Soares for inspiring us with the SBTVD. Erick Melo is an MSc candidate supported by CAPES. During the research reported in this paper, Renan Cattelan was a PhD candidate supported by FAPESP.

Copyright information

© Springer Science+Business Media, LLC 2010