Understanding the behaviour of online TV users

Karahasanović, Amela; Heim, Jan

doi:10.1007/s00779-015-0865-9

Original Article
Published: 25 June 2015

Volume 19, pages 839–852, (2015)

Personal and Ubiquitous Computing

Amela Karahasanović¹ &
Jan Heim¹

Abstract

The amount of online video content available to us is rapidly increasing. Understanding how people are seeking and consuming this content is a prerequisite for providing good services. This paper investigates whether and how log data can be used to identify information-seeking behaviour in the context of online TV. A study was conducted where 27 participants performed given tasks on two Norwegian online TV sites. The participants were between 20 and 25 years old, and all of them were moderate or heavy users of online TV. Tasks that require both scanning and searching of information were given. Four main types of behaviour were identified in the qualitative data: goal-directed search, goal-directed metadata search followed by consumption, goal-directed search of metadata and video, and explorative behaviour. Detailed log event files were compared to self-reported data describing user’s activities (feedback collected at the end of each task and interviews) and screen captures. Our results indicate that the following four variables in the log files: number of (short navigation sequence, short video watching sequence) pairs, frequency of video search actions, percentage of time spent on sequences of navigate actions and percentage of time spent on watching videos can be used to characterise the four types of behaviour. This work extends previous research on usage of log files in describing user’s behaviour by providing simple way of characterising behaviour of online TV users. In particular, the results might be useful in supporting the personalisation of online TV services.

1 Introduction

Today, we are living in the information overload age. The futurologist Alvin Toffler predicted in 1970 that the growth of the information being produced would cause problems [17]. According to a study done by the analyst group IDC, the total amount of digital information in the world was 2.7 zettabytes in 2012, and this figure was estimated to reach 7.5 zettabytes in 2015 [11]. Ninety per cent of this is unstructured data, such as digital video, sound files and images. On YouTube alone, 100 h of video are uploaded every minute [27]. This huge volume of data makes it challenging for users to search and retrieve relevant and interesting content.

To provide good services, service providers need a comprehensive understanding of users’ behaviour. Knowing how the users search and consume video content might facilitate the creation of systems that adapt to the users’ needs. Research has been done to understand how people search and retrieve information in general [4, 23]. The tasks which users perform when searching for information in the context of online newspapers have been identified [12, 13], and users’ behaviour has been analysed to support personalisation and improve design [16]. The results of studies on network traffic data [18] and the online content which users access have shown the possibility of predicting user behaviour. However, users’ search behaviour is context dependent [12]. A Web search with a specific goal, such as finding the timing of the next bus, differs from a search for an interesting TV show.

The purpose of this paper is to investigate which variables might be appropriate for identifying different types of behaviour in the context of online TV users. We conducted an explorative laboratory study with 27 participants between the ages of 20 and 25 years. During an hour-long experimental session, each participant performed 10 given tasks on 2 Norwegian online TV sites. Using detailed log files, screen capture files and self-reported descriptions of the user’s activities, we identified four main types of behaviour:

Goal-directed search One searches actively until one finds what one wants.
Goal-directed metadata search followed by consumption A goal-directed search of metadata (short programme descriptions or ingresses) is followed by a longer session of watching video or reading news.
Goal-directed search of metadata, goal-directed video A goal-directed search of metadata is followed by an active goal-directed search of video.
Explorative There is no particular strategy for finding the answer or something in which one is interested—just browsing.

To identify which variables might be appropriate for distinguishing the different types of behaviour, we analysed the log files, proposed a set of variables and analysed the relationship between the strategies identified from the qualitative data and the values of the variables derived from the log files. Our results indicated that the following four variables might be used to identify the users’ behaviour:

Number of pairs (short navigation sequence, short video watching sequence). A navigate sequence consists of actions such as using the search function, scrolling and following a link.
Frequency of video search actions.
Percentage of time spent in sequences of navigation actions.
Percentage of time spent on watching videos.

Many studies on user behaviour have classified the users according to their demographic and background data, as opposed to their actual behaviour, although an a priori grouping of users is far from exhaustive [7, 12]. A dynamic identification of users’ strategies might facilitate personalising online TV and providing a better experience.

The remainder of this paper is organised as follows. Section 2 provides an overview of the related work. Section 3 describes the experimental design, and Sect. 4 describes the analysis. Section 5 presents the results of the study. Section 6 discusses our findings. Section 7 concludes and describes future work.

2 Related work

Modelling users’ behaviour originates from marketing research and aims to provide input for marketing strategies and product design. One widely applied technique involves segmenting customers using different statistical and data mining techniques. Within media and Internet research, several categorisations of users into distinct user types, the so-called typology of users, have been proposed. Based on the analysis of 22 media user typologies, Brandtzæg [5] proposed a classification of user behaviours, according to: frequency of use, variety of use, and content and activity preferences (nonusers, sporadics, lurkers, entertainment users/socialisers, debaters, instrumental users and advanced users). This user typology claims to be universal across different cultures, as well as stable over time. In a report on digital behaviour in the UK, the Digital Anthropology Report, six major groups of users were identified: digital extroverts, timid technophobes, social secretaries, first lifers, eager beavers and web-boomers [25].

Studies of users’ behaviour based on different types of log data have also been conducted in the context of Web and online TV. In his study on user behaviour within the context of search engines, Stenmark [24] used log files to identify similar groups of users based on their actual search behaviour. He identified the following groups: unsophisticated users, occasional users, fact seekers, interactive users, knowledgeable users and intensive searchers. The following data were analysed: query length, number of queries that were similar, time spent examining documents, time spent examining result pages, session duration, number of queries, number of viewed hits, requested result pages, number of activities, number of sessions and number of active days. Although it was conducted in a different application domain, this research can provide some input for studies concerning online TV since there are some similarities in user behaviour. The users of these services can, for example, browse metainformation on TV programmes in order to decide what they want to watch, or they might want to read a newspaper article about a video that they have watched. There are, however, some differences. The usage of search engines is more often goal-directed. Searching for a specific piece of information, such as tomorrow’s weather forecast, available hotels for holidays or driving directions, differs from finding and watching an interesting video clip. While finding relevant information quickly would have the highest priority for search engines, different aspects of user experience might be more important for users of online TV.

In their study of P2P IPTV systems, Hei et al. [15] collected numerous statistics such as the evolution of the total number of peers in the PPLive network, distribution of the peak number of peers among all channels, trends in the number of participating users, peer arrival and departure evolution on a popular movie channel, and peer download and upload of video traffic. Szabo and Huberman [10] have developed a method for predicting the long-term popularity of online content based on the early measurement of user access to two content-sharing portals, Digg and YouTube. The prediction was based on the analysis of a huge amount of user access data (29 million diggs, view-count time series for 7146 selected videos daily for 30 days on YouTube). In their study of video on demand over IP, Yu et al. [28] analysed the following: user access over time (hourly, daily and weekly access patterns), user arrival distributions, session lengths, popularity distribution, rate of change in user interests, etc. Antonini et al. [2] proposed a model for the integration of the heterogeneous and dynamic data coming from broadcasters’ archives, online newspapers, social media, etc. A cross-domain analysis scenario based on data gathered from YouTube, Twitter and a talk show has been provided. A knowledge graph connects information on subjects, social objects and concepts together with a time dimension and enables connecting, for example, videos and hashtags.

Acharya et al. [1] investigated viewing and browsing patterns of video-on-web (VoW) users based on the analysis of the logs of the multicast media-on-demand (mMOD) video web server in the educational context. Their results showed that several requests for the same video title often occur within a short period of time. Furthermore, users often preview the initial portion of a video to find out whether they are interested in it. Baluja et al. [3] analysed the aggregated viewing patterns and video discoveries of YouTube users in order to provide users with recommendations tailored to their viewing habits. The data were collected from live user views of videos from youtube.com for a 3-month period, resulting in nearly 29 million total views of a set of approximately 4.2 million videos.

A general and comprehensive model of information-seeking behaviour based on observations and empirical findings has been proposed by Belkin et al. [4]. It consists of the following four dimensions: method of interaction, goal of interaction, mode of retrieval and resource considered. The method of interaction can be either searching for a specific item or scanning for something interesting. The goal of interaction might be learning about an item or selecting items for retrieval. The mode can be recognition or specification, and the resource can be information or metainformation. The authors suggest that any information-seeking strategy can be described within this four-dimensional model.

In many different contexts, one differentiates between two general types of navigational behaviour: a directed search mode in which one is looking for a particular piece of information and a browsing mode that is more exploratory [21]. In the literature, a directed search mode is also called goal-directed, instrumental or searching, whereas browsing is also referred to as experiential. These two modes are closely related, and users move between the two of them [21].

In the context of decision-making, Sacchi and Burigo [23] distinguish between sequential and more directed (less sequential) information search strategies. In their investigation of the relation among task structure, knowledge and source, they use an index of sequentiality to capture the directedness of the participants’ search behaviour. This index is the absolute value of the correlation between the original displayed order of the items and the order followed by participants; a higher value indicates a more sequential strategy.
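The index of sequentiality described above can be illustrated with a minimal sketch. The use of the Pearson correlation and the function names are assumptions made for illustration only; the source does not state which correlation coefficient Sacchi and Burigo applied.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def sequentiality_index(displayed_positions):
    """displayed_positions[i] is the display position of the i-th item opened.

    Assumption: the index is |Pearson r| between display position and
    visiting order; a rank correlation would be an equally plausible reading.
    """
    visit_order = list(range(1, len(displayed_positions) + 1))
    return abs(pearson(displayed_positions, visit_order))

# Hypothetical visiting orders over five displayed items.
strictly_sequential = sequentiality_index([1, 2, 3, 4, 5])
reverse_order = sequentiality_index([5, 4, 3, 2, 1])
jumping_around = sequentiality_index([3, 1, 5, 2, 4])
```

Because the absolute value is taken, reading a list from bottom to top counts as just as sequential as reading it from top to bottom, whereas a participant who jumps between items obtains a value closer to zero.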

In their analysis of a video-on-demand service, Vilas et al. [26] identified a typical user’s session to be a composition of one or more reproductions with periods of reflection between them. Reproduction is the period between the first play interaction and the stop or end of the video. During this period, the users pause and play. Table 1 provides an overview of the information-seeking strategies described in the literature.

Table 1 User behaviour strategies and tasks

Related to this is research conducted on users’ task identification in the context of the Web and TV. Kunert identified tasks in the context of interactive TV (iTV) applications [20] based on focus group studies. Darnell recorded users in their own homes to investigate how they watch digital TV and use digital video recorder (DVR) systems [8]. The most frequent interactions of DVR users with the TV were skipping adverts, pausing and finding something else to watch when programmes ended. Participants without a DVR interacted with TV mostly by finding something else to watch when adverts or boring content occurred or when their programme ended. The most frequent interactions were changing channels by punching in channel numbers, recalling the previous channel and going to the guide to find a show.

Elkhatib et al. [9] investigated dynamic behaviour of online TV by analysing user and service statistics from more than 1,100 users over a period of 6 months. The collected statistics included time-coded events such as playback requests, keyword searches or page loads. Cluster analysis identified distinctive groups of users with consistent browsing behaviour exhibited during different periods of the day (5 am–3 pm, 3 pm–11 pm, and 11 pm–5 am). The results show clear differences between the time segments in the way video-on-demand content is accessed. This behaviour was observed in several 11 pm–5 am clusters and only in one cluster in other time segments. Similarly, 11 pm–5 am is the only segment where the programme guide is used as the primary content portal. Rautiainen et al. [22] investigated another temporal aspect of search behaviour in the context of online catch-up TV services. Based on the analysis of more than 5000 user sessions for a 12-month period, the authors found that the programmes accessed via browsing programme summaries were typically less than a week old, whereas the programmes accessed via free text search were typically older than a week.

The above-mentioned empirical studies describe the usage patterns in the context of online TV and related services applying a variety of methods such as focus groups [20], ethnographic studies [8], analysis of network traffic data [18] and analysis of server logs [9, 10, 15, 26, 28]. However, service-specific knowledge concerning users is insufficient for optimising delivery of such services. The classification of users as static demographic segments, on the other hand, is too coarse for delivering an optimal user experience to individual users and households. Furthermore, knowledge on information-seeking strategies within the context of search engines cannot be directly applied in the context of online TV due to differences in their intended use. We therefore envisage more empirical studies of information-seeking strategies in the context of online TV and methods enabling such studies.

This research sought to answer the following research question:

RQ: Can we use log analysis to identify strategies used by online TV users when searching for and consuming video content that they are interested in?

3 Experimental design

3.1 Participants and setting

To test our experiment design, materials and tools, we first conducted a pretest study with one MSc student and then a pilot study with four MSc students. We focused our research on younger participants familiar with online technologies. To balance for no-shows and problems with the technology used, we recruited 40 participants between 20 and 25 years old (mean 22.78, SD 1.60). They were recruited through the Norwegian market research and panel provider Nordstat, which provided a random sample of Norwegian Internet users with the specified characteristics (age, gender, education and online experience). They were given a 300 NOK gift card for their participation. All participants were from the Oslo area and had finished at least secondary school. They were distributed into four groups according to sex and usage of online TV as follows: 13 female heavy users, 15 male heavy users, 14 female moderate users and 8 male moderate users. To distinguish between heavy and moderate users, we used a simple questionnaire based on the work of Brandtzæg et al. [6]. The participants who reported that they watched news or entertainment programmes online every day or more often than once a week were categorised as heavy users. The participants who reported that they watched such programmes more often than once a month were categorised as moderate users. These categorisations are related to the use of online services in general, and not to online TV per se.

Due to different technical problems (problems with logging tools, problems with the network and problems with servers of online TV providers), the logging data for 13 participants were incomplete. For 27 participants (18 female: 10 heavy and 8 moderate users, and 9 male: 7 heavy and 2 moderate users), we have complete log files. These were used in further analysis.

The participants attended a 1-h-long experimental session at our usability laboratory. In each session, a participant was in a separate room with an observer. In addition, the participants were observed through a one-way mirror. The room where the participants were placed resembled a living room. The experimental sessions were conducted during a 2-week period. There were between 3 and 10 participants per day, and two observers were present at each session (one with the participant and one behind the one-way mirror).

3.2 Treatment and experiment procedures

This was an explorative study, and all the participants worked under the same conditions (same equipment and same tasks). All of the studies (pretest, pilot and the main study) followed the same structure:

When they arrived at the laboratory, the participants were welcomed and provided with a brief description of the experiment. The participants were then asked to sign a consent form and to fill in a short background questionnaire. This took about 15 min. After that, the participants worked on the given tasks for about 30 min. At the end of each task, a pop-up window with two short questions appeared on the screen. When a participant had answered the questions, the next task appeared. The participants had to complete a total of 10 tasks, two of them being warm-up exercises. This session took about 30 min. If the participants spent more time on a task than the maximum time allotted for it, the observer asked them to move on to the next task. At the end of each session, we conducted an interview with the participant. We asked open-ended questions about their usage of online TV and how they had worked on the given tasks. In addition, we asked them what they thought about the experimental set-up, tasks and their participation.

3.3 Data collection and supporting tools

Each participant used a laptop computer running Windows 7. The tasks were presented to the participants on the screen, but were also available on paper. We used Observer XT for making videos of the users’ behaviour and screen capturing, uLog 3 for weblogs, and our own Java application for collecting background information, presenting the task to the participants, collecting their answers to the given tasks and feedback collection. At the end of each task, a pop-up window appeared with the text “Please describe how you have worked on this task (what you have done first, second etc.)”. uLog collected all keyboard actions, mouse clicks and wheel movements, address changes in the browser and windows activations. The participants were instructed to briefly describe how they had worked. The task solving sessions were video recorded by PC cameras (recording the participants’ facial expressions), and the interviews were audio recorded. In addition, the observers wrote notes. The observer who was in the room with the participant was instructed to record only important events (technology problems, participants arriving too late, etc.) and not to sit too close to the participant in order to not disturb him/her.

3.4 Tasks

We created two types of tasks: one with a specific goal defined (GD) and another without (WG). Two Norwegian online TV services (TV-A and TV-B) were used in the experiment. For each of these services, we first formulated a small warm-up exercise allowing users to familiarise themselves with the environment and the online service (tasks TA0 and TB0). For TV-B, we also wanted to evaluate the usability of their social media functionality and so included two tasks designed to accomplish this. These tasks were not included in the analysis and are not presented here. Table 2 provides an overview of the different tasks. As the participants in the experiment were Norwegians, the tasks were originally written in Norwegian and have been translated into English by the authors of this paper.

Table 2 Task descriptions

4 Analysis

A preliminary analysis showed no relationship between the different tasks and the key variables defined below. Close inspection of the user logs revealed that different participants solved the different tasks in quite different ways. This led us to focus on the strategies as stated by the users themselves while solving the tasks, rather than on the task type as such.

To answer the above-stated research question, we analysed quantitative data on performance from log files and the qualitative data from feedback collection and interviews. We first identified strategies from the qualitative data. After that, we identified variables that might help in identifying these strategies from the log files and, finally, we analysed the relationship between the two.

4.1 Identification of strategies from feedback collection

The written answers provided by the participants in the feedback collection windows were analysed to explore the processes of information seeking when searching for and consuming video content. These answers were extracted in an Excel file and anonymised regarding the participant and the task. They were then analysed as follows. First, we selected ten random answers and used them to develop a coding schema. Both researchers in parallel analysed the text, proposed categories for the schema and then together made a final coding schema. One researcher then applied this schema to the rest of the data set.

4.2 Identification of strategies from users’ actions

We started the analysis by defining the actions and the variables that might help us to distinguish between the different strategies. Actions were identified by the participant’s interaction with the PC: mouse clicks, scrolling, using the keyboard for submitting search strings and so on. It was also necessary to include automatic responses from the system such as video buffering. A complete list of actions that were logged is found in the “Appendix”.

In order to identify these variables, we first selected three of the longest log files, assuming that they would contain rather extensive sets of relevant user actions. Then, both researchers in parallel went through these files, identified relevant actions, sequences of actions and variables that might identify different types of behaviour. Based on the discussion between the researchers, a final list of actions and variables was made. This was then implemented as a Java program that was used for the log file analysis. Some random files were also analysed manually to check whether the program worked as intended. The details of the analyses were as follows.

The reliability of the coding scheme was assessed in a separate analysis. The Krippendorff reliability coefficient alpha was 0.8727 (see “Appendix” for further details).
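For readers unfamiliar with the coefficient, the following is a minimal sketch of Krippendorff's alpha for the simplest case: two coders, nominal categories and no missing values. Whether the reliability analysis reported here used this variant is an assumption, and the example codings are invented for illustration; they are not data from the study.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    assert len(coder_a) == len(coder_b)
    n = 2 * len(coder_a)  # total number of coded values
    totals = Counter(coder_a) + Counter(coder_b)
    # Each disagreeing unit contributes two off-diagonal coincidences.
    observed = 2 * sum(a != b for a, b in zip(coder_a, coder_b))
    # Sum over c != k of n_c * n_k equals n^2 minus the sum of squared totals.
    expected = n * n - sum(count * count for count in totals.values())
    if expected == 0:
        return 1.0  # assumed convention when only one category is ever used
    return 1.0 - (n - 1) * observed / expected

# Hypothetical codings of eight log elements by two coders.
coder_a = ["Navigate", "Navigate", "Video", "Ignore",
           "Administration", "Administration", "Video", "Navigate"]
coder_b = ["Navigate", "Navigate", "Video", "Ignore",
           "Administration", "Navigate", "Video", "Navigate"]
alpha = krippendorff_alpha_nominal(coder_a, coder_b)
```

Alpha equals 1 for perfect agreement and decreases as the observed disagreement approaches the disagreement expected by chance.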

Each participant produced one task log per task. Each task log was partitioned into several sequences containing a series of similar elements. We distinguished among four types of sequences: Navigate, Video, Administration and Ignore. The Administration and Ignore sequences were not included in the analysis. The Administration sequences contained task and subtask instructions related to the presentation of the tasks and the performance of the subtasks, such as answering questions, etc. The Ignore sequences contained redundant coding (some coding categories always followed each other) and responses that the coders could not recognise, which were coded as “Not clear”.

A Sequence was defined as any sequence of elements of the same type (Navigate, Video, Administration and Ignore). The duration of a sequence was from the beginning of the first element in the sequence to the beginning of the first element in the next sequence. The Navigate sequence was defined as a sequence consisting of the actions typically performed when searching for the metadata or information on a web page, such as using the search function, scrolling and following a link. The Video sequence was defined as a sequence consisting of the actions typically undertaken when watching the video, such as play, stop, pause and adjusting the sound and screen size. The elements of these sequences are presented in the “Appendix”.
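The partitioning rule can be sketched as follows. The tuple layout and the timestamps are hypothetical; only the four sequence types and the duration rule (from the start of the first element of a sequence to the start of the first element of the next sequence) are taken from the text.

```python
from itertools import groupby

# Each logged element: (start time in seconds, sequence type).
# The timestamps are invented for illustration.
task_log = [
    (0.0, "Administration"),
    (4.0, "Navigate"),
    (9.0, "Navigate"),
    (15.0, "Video"),
    (40.0, "Video"),
    (52.0, "Navigate"),
    (60.0, "Administration"),
]

def partition(log):
    """Group consecutive elements of the same type into sequences.

    Returns (type, start, duration) tuples. The last sequence gets a
    duration of None because its end cannot be derived from the log alone.
    """
    runs = [(kind, [t for t, _ in items])
            for kind, items in groupby(log, key=lambda element: element[1])]
    sequences = []
    for i, (kind, times) in enumerate(runs):
        start = times[0]
        if i + 1 < len(runs):
            duration = runs[i + 1][1][0] - start
        else:
            duration = None
        sequences.append((kind, start, duration))
    return sequences

sequences = partition(task_log)
```

In this example, the two Navigate elements at 4.0 s and 9.0 s form one Navigate sequence lasting 11 s, because the following Video sequence starts at 15.0 s.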

Based on the definitions of these sequences, the following set of variables was defined:

taskTime Sum of time spent on video and navigate sequences.
navigateActivity Percentage of task time used on navigate sequences.
longNavigateSessions Number of navigate sequences longer than 20 s.
pureVideoTotalTime Time spent on video sequences not containing video search.
relativeViewTime Percentage of task time used by pureVideoTotalTime.
videoSearchTotalTime Total time spent on video sequences containing video search.
videoSearchCount Total number of video searches.
videoSearchSpeed videoSearchCount/videoSearchTotalTime
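A minimal sketch of how these variables could be computed from the sequences of one task log is given here. The input layout (one dictionary per Navigate or Video sequence, with a count of video search actions) is an assumption made for illustration and is not the format of the authors' Java program; the variable names and definitions follow the list above.

```python
# Hypothetical input: Navigate and Video sequences of one task log.
# "searches" is the number of video search actions inside a Video sequence.
sequences = [
    {"type": "Navigate", "duration": 30.0, "searches": 0},
    {"type": "Video", "duration": 50.0, "searches": 0},
    {"type": "Navigate", "duration": 10.0, "searches": 0},
    {"type": "Video", "duration": 10.0, "searches": 4},
]

def task_variables(seqs, long_limit=20.0):
    """Compute the variables defined in the text; assumes taskTime > 0."""
    navigate = [s for s in seqs if s["type"] == "Navigate"]
    video = [s for s in seqs if s["type"] == "Video"]
    pure_video = [s for s in video if s["searches"] == 0]
    search_video = [s for s in video if s["searches"] > 0]

    navigate_time = sum(s["duration"] for s in navigate)
    pure_video_time = sum(s["duration"] for s in pure_video)
    search_time = sum(s["duration"] for s in search_video)
    task_time = navigate_time + sum(s["duration"] for s in video)
    search_count = sum(s["searches"] for s in search_video)

    return {
        "taskTime": task_time,
        "navigateActivity": 100.0 * navigate_time / task_time,
        "longNavigateSessions": sum(s["duration"] > long_limit for s in navigate),
        "pureVideoTotalTime": pure_video_time,
        "relativeViewTime": 100.0 * pure_video_time / task_time,
        "videoSearchTotalTime": search_time,
        "videoSearchCount": search_count,
        # Assumed convention: speed is 0 when no video search took place.
        "videoSearchSpeed": search_count / search_time if search_time else 0.0,
    }

variables = task_variables(sequences)
```

For the invented task log above, 40 % of the task time is spent navigating, 50 % on watching video without searching, and the four video searches within 10 s give a videoSearchSpeed of 0.4 searches per second.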

To distinguish between the different types of behaviours, we needed to differentiate between the shorter and longer sequences of actions. Based on the given tasks, we set a limit of 20 s and considered sequences shorter than 20 s as short sequences and defined them as follows:

shortPureVideo Video sequence without search/navigate action that lasts less than 20 s.
ShortNavigate Navigate session that lasts less than 20 s.
ShortPairsSequence Any sequence containing only short pairs.

This allowed the calculation of the following variable:

shortPairs Number of elements (shortPureVideo and ShortNavigate; independent of order) in ShortPairsSequences.
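The definitions above leave some room for interpretation. The following sketch shows one possible reading, which is an assumption and not necessarily the rule implemented by the authors: a short pairs sequence is taken to be a maximal run of consecutive short sequences that contains both a short Navigate sequence and a short Video sequence without video search, and the variable counts the elements of all such runs. The tuple layout and example values are hypothetical.

```python
SHORT_LIMIT = 20.0  # seconds, the limit stated in the text

def is_short(seq):
    """True for a short Navigate sequence or a short Video sequence without search."""
    kind, duration, searches = seq
    if duration >= SHORT_LIMIT:
        return False
    return kind == "Navigate" or (kind == "Video" and searches == 0)

def count_short_pair_elements(seqs):
    """Count elements in maximal runs of short sequences containing both types."""
    total = 0
    run = []
    for seq in seqs + [None]:  # None is a sentinel that flushes the last run
        if seq is not None and is_short(seq):
            run.append(seq)
            continue
        kinds = {kind for kind, _, _ in run}
        if kinds == {"Navigate", "Video"}:
            total += len(run)
        run = []
    return total

# Hypothetical sequences: (type, duration in seconds, number of video searches).
seqs = [
    ("Navigate", 5.0, 0),
    ("Video", 8.0, 0),
    ("Navigate", 12.0, 0),
    ("Video", 90.0, 0),    # long sequence: ends the run of short sequences
    ("Navigate", 6.0, 0),  # short but isolated: not part of a pair
    ("Video", 15.0, 3),    # contains video search: not a shortPureVideo
]
short_pairs = count_short_pair_elements(seqs)
```

Under this reading, the first three sequences form one short pairs sequence and contribute three elements, while the remaining sequences contribute none.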

The value of 20 s was used since it divided the distribution of short pairs in two approximately equal parts (see the explanation in Table 5). Four of the variables above were defined as key variables and used in the final analysis to identify the strategies used by the participants: shortPairs, videoSearchSpeed, navigateActivity, and relativeViewTime. The other variables were used in the definition of the key variables. This allowed for an analysis of the relationship between the strategies identified on the basis of qualitative data (done by the first author of the paper) and the key variables based on the log files (done by the second author of the paper). The SPSS 22 statistics tool was used for this analysis.

5 Results

We first analysed the raw data and excluded the participants whose log data or screen capture files were not complete, the participants who arrived later to the experiment than planned or those who experienced some other technical problems during the experiment. This left 27 participants for the analysis.

5.1 Strategies from feedback collection

We observed the following four different strategies among the participants:

S1 Goal-directed One actively searches until one finds the information he/she wants. Depending on the task, it could be the answer to the question posed or something one is interested in. If the users immediately found what they wanted, it is still S1.

Example:

“I have used the search-field to find the information needed to solve the tasks”.

S2 Goal-directed metadata and consumption A goal-directed search of metadata is followed by a more passive, longer consumption session (watching video, reading of news). The difference between S1 and S2 is that in S1 one was searching for information, whereas in S2 one was searching for the content and consumed it. In S2 it was explicitly stated that the search session ends in a watching/reading session.

Examples:

“I found some clips I wanted to see and watched them. I was not looking for anything in particular”.

“I went back to a clip I found earlier and watched it for five minutes”.

“I found the programme and watched it. I never found the answer to the question”.

S3 Goal-directed metadata, goal-directed video A goal-directed search of metadata is followed by an active goal-directed search of video.

Examples:

“I searched in the search field for the episode name. I was then spooling backward and forward to find the answer to the question”.

“I have found the programme under the letter M. I was spooling backward and forward to find the answer”.

S4 Explorative/moment-by-moment Many shorter search–watch sequences. No particular strategy on how to find the answer or something one is interested in—just browsing.

Examples:

“I have done what I always do—check the news. Today, there was nothing interesting, only something about the Swedish king. So I was browsing, looking for the programmes I could watch later, and then went on direct broadcast of different channels”.

“Went on Japan and watched a video there. Stopped before it was finished, went on Travels, but there was only one video there about studies abroad, so I stopped it. There was nothing interesting on Culture either, but I started a video there”.

S5 Not clear Based on the description from the users, one cannot specify how they were working.

5.2 Strategies, tasks and background information

We identified the strategies applied by the participants based on the qualitative data as described above. A total of six tasks were analysed (see Sect. 3.4). Each task was assigned one of four strategies as described in Sect. 5.1. The distribution of how often each participant was assigned a specific strategy is shown in Table 3.

Table 3 Number of task logs where the participant applied a specific strategy

Table 3 above shows, for example, that Strategy 1 was not used at all in 11 task logs. It was used once in 9 task logs, twice in 6 task logs and three times in one task log. By number of task logs we mean here the number of tasks performed by different users (task TA1 conducted by participant P1 gives one task log, and task TA1 conducted by participant P2 gives another task log).

The three variables of age, gender and advanced usage that were used during the recruitment of the participants were compared to the frequency of the participants’ usage of the different strategies. In the case of gender, the correlation coefficient is the point biserial coefficient. Table 4 shows that the only significant relationship (significance less than 0.01) is that older participants tended to use Strategy 1 more often. Strategy 1 is a goal-directed search where one searches actively until one finds what one wants. The relation between the different tasks and the key variables was examined with SPSS ANOVA general linear model repeated-measures tests, but no significant relationships were found.
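As an illustration of the point biserial coefficient mentioned above, the following minimal Python sketch computes it for a dichotomous variable coded 0/1 and a numeric variable. The function name and the sample data are hypothetical and are not taken from the study; the coefficient is numerically the same as the Pearson correlation between the two variables.

```python
import math

def point_biserial(groups, values):
    """Point-biserial correlation between a 0/1 variable and a numeric one."""
    ones = [v for g, v in zip(groups, values) if g == 1]
    zeros = [v for g, v in zip(groups, values) if g == 0]
    n, n1, n0 = len(values), len(ones), len(zeros)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")
    mean_all = sum(values) / n
    # Population standard deviation of the numeric variable.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values must not be constant")
    mean1 = sum(ones) / n1
    mean0 = sum(zeros) / n0
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical data: gender coded 0/1 and the number of tasks
# each participant solved with Strategy 1.
gender = [0, 0, 1, 1]
s1_count = [1, 2, 3, 4]
r_pb = point_biserial(gender, s1_count)
```

A positive value indicates that the group coded 1 used the strategy more often on average; the sign depends only on the arbitrary choice of coding.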

Table 4 Correlation between age, gender and advanced usage, and frequency of strategy usage

5.3 Strategies and variables

Tables 5 and 6 show the degree of association between the strategies used by the participants to solve a task, and the four different key variables as defined in Sect. 4.2. Due to skew and the lack of normal distribution of the variables, the nonparametric “median test” was used to test the significance of the associations.

Table 5 Frequency of tasks above, and equal to or below, the median on the defining variables for the four strategy types: shortPairs, videoSearchSpeed (videoSearchCount/videoSearchTotalTime), navigateActivity (per cent of task time used in navigate sequences) and relativeViewTime (per cent of task time used by pureVideoTotalTime)

Table 6 Test statistics for data in Table 5. Grouping variable is strategy

Table 6 shows that there are significant differences between the strategies with regard to the four key variables as seen from the bottom row of the table. For example, the key variable shortPairs had a median of 0, meaning that about half of the task occurrences were not assigned short pairs at all.

Table 5 shows that the distribution of short pairs was different over the strategies; for example, 9 out of 26 (36.5 %) tasks were assigned short pairs in Strategy 1, while 19 out of 26 (73 %) were assigned short pairs in Strategy 4.
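The median test used in this section can be sketched as follows. The sketch pools all observations, splits each group into counts above the grand median and counts equal to or below it (the split used in Table 5), and computes a Pearson chi-square statistic on the resulting table. The function name and the data are hypothetical, and the statistic is computed without a continuity correction, so values from a statistics package may differ slightly.

```python
import statistics

def median_test_table(groups):
    """Return (grand_median, table, chi2) for a median test.

    groups maps a label to a list of numeric observations.
    table maps each label to (count above median, count at or below median).
    """
    pooled = [x for xs in groups.values() for x in xs]
    grand_median = statistics.median(pooled)
    table = {}
    for label, xs in groups.items():
        above = sum(1 for x in xs if x > grand_median)
        table[label] = (above, len(xs) - above)
    n = len(pooled)
    total_above = sum(a for a, _ in table.values())
    total_below = n - total_above
    if total_above == 0 or total_below == 0:
        raise ValueError("all observations fall on one side of the median")
    chi2 = 0.0
    for above, below in table.values():
        size = above + below
        for observed, column_total in ((above, total_above), (below, total_below)):
            expected = size * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    return grand_median, table, chi2

# Hypothetical shortPairs counts for task logs coded as two strategies.
short_pairs = {"S1": [0, 0, 0, 1], "S4": [0, 2, 3, 4]}
grand_median, table, chi2 = median_test_table(short_pairs)
```

The statistic is referred to a chi-square distribution with one degree of freedom fewer than the number of groups, so a comparison of four strategies has three degrees of freedom.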

6 Discussion

We envisaged a simple way to identify strategies used by online TV users when searching for and consuming video content that they are interested in. The four variables we proposed were associated with the four strategies in the following way:

S1 Goal-directed. This strategy is characterised by a high degree of navigateActivity and relatively few shortPairs. The strategy is to search the Web with less emphasis on the content of the videos.
S2 Goal-directed metadata and reproduction. This strategy shows a low degree of videoSearchSpeed, moderately few shortPairs and a moderately high degree of relativeViewTime. This strategy is characterised by watching the videos, rather than actively seeking specific content.
S3 Goal-directed metadata, goal-directed video. S3 shows low relativeViewTime and high videoSearchSpeed, moderately few shortPairs and low navigateActivity. This strategy is typical for searching the videos for relevant content, rather than searching the net.
S4 Explorative/moment-by-moment. This strategy shows many shortPairs and very high relativeViewTime. The strategy seems to be characterised by actively searching for a relevant video and then watching it rather than seeking through it.

Our results indicate that the key variables are related to the participants’ own descriptions of their strategies while solving the tasks. The variables are best suited to characterising the behaviour of online TV users while they are applying different strategies in different situations. Although the four variables (shortPairs, videoSearchSpeed, navigateActivity and relativeViewTime) were fairly good indicators of the strategies used, a unique identification of strategies based solely on this kind of variable would need more research.
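To make the four variables concrete, the sketch below computes them for a single task log that has already been segmented into sequences. It follows the descriptions in the caption of Table 5; the `Sequence` record, the sequence labels and the 10-second cut-off for a "short" sequence are assumptions made for illustration only, since the exact definitions are given in Sect. 4.2.

```python
from dataclasses import dataclass

SHORT_SECONDS = 10.0  # hypothetical cut-off for a "short" sequence

@dataclass
class Sequence:
    kind: str          # "navigate", "video" or "video_search" (assumed labels)
    seconds: float     # duration of the sequence
    actions: int = 0   # number of search actions, used for "video_search" only

def key_variables(sequences):
    """Compute the four key variables for one task log."""
    task_time = sum(s.seconds for s in sequences)
    if task_time <= 0:
        raise ValueError("task log has no duration")
    # A short pair is a short navigate sequence directly followed by a short video sequence.
    short_pairs = sum(
        1
        for first, second in zip(sequences, sequences[1:])
        if first.kind == "navigate" and second.kind == "video"
        and first.seconds <= SHORT_SECONDS and second.seconds <= SHORT_SECONDS
    )
    search_time = sum(s.seconds for s in sequences if s.kind == "video_search")
    search_count = sum(s.actions for s in sequences if s.kind == "video_search")
    navigate_time = sum(s.seconds for s in sequences if s.kind == "navigate")
    video_time = sum(s.seconds for s in sequences if s.kind == "video")
    return {
        "shortPairs": short_pairs,
        "videoSearchSpeed": search_count / search_time if search_time else 0.0,
        "navigateActivity": 100.0 * navigate_time / task_time,
        "relativeViewTime": 100.0 * video_time / task_time,
    }

# Hypothetical task log of 100 seconds.
log = [
    Sequence("navigate", 5), Sequence("video", 8),
    Sequence("navigate", 30), Sequence("video", 40),
    Sequence("video_search", 17, actions=4),
]
variables = key_variables(log)
```

For this log the result is one short pair, 4/17 search actions per second, 35 % navigate time and 48 % viewing time. Mapping such values to a strategy would still require thresholds that the study does not provide.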

Studies done by others in similar contexts typically consider a larger number of variables. In an analysis of the behaviour of search engine users, Stenmark [24] analysed a larger number of measures, including query length, number of queries that were similar, time spent examining documents and time spent examining result pages, thus enabling a more detailed categorisation of users’ behaviour. Some of these measures correspond at the conceptual level to the variables we used. The time spent by a user examining a document in the context of search engines might correspond to the time a user spends watching a video in the context of online TV.

Another branch of research predicts the popularity of content or recommends content based on detailed analysis of log files in the context of P2P IPTV systems [15], video-content-sharing portals [3, 10] and video on demand in the educational context [1]. The variables we identified say nothing about the content that is consumed. Information is differentiated by its type (metainformation, text, video) and not by its content (programme, episode, feature article). However, some of the variables used in these studies are the same as ours, such as session/task time and video watching time. We believe that these systems could be extended with the information on user actions we propose in order to improve the user experience.

The goal of this research was to test whether and how log analysis might be used to identify the behaviour of online TV users, rather than to identify different behaviour types per se. However, the results from the analysis of the qualitative data we collected extend the previous work on information-seeking strategies and online behaviour. The four strategies we identified (one highly explorative and three starting with a goal-directed search) were based on open coding of the self-reported descriptions of the participants’ activities. The list of strategies we identified is less comprehensive than the general list of information-seeking strategies given by Belkin et al. [4], but gives a more detailed domain-specific description of the information that has been searched or scanned. We make a distinction between search of video (Strategy S3 Goal-directed metadata, goal-directed video) and other types of search such as scrolling the web page and use of search words (strategies S1 Goal-directed and S4 Explorative). Similar to the work of Vilas et al. [26], which was conducted in the context of video-on-demand services, our strategies include descriptions of video consumption. As expected, two well-known types of general navigational behaviour, a directed search mode and a browsing mode [21], also appear in our data set. Our strategies S1 (Goal-directed) and S3 (Goal-directed metadata, goal-directed video) correspond to the directed search mode, whereas our Strategy S4 (Explorative) corresponds to the browsing mode. Strategy S2 (Goal-directed metadata and consumption) is an interesting one when considered in this context. While people generally watch TV programmes and video clips for pleasure, some of the tasks in our experiment required finding a specific piece of information in a video. Some participants approached this by actively searching the video (Strategy S3), whereas others (Strategy S2) watched the whole video (sometimes several times) until they found the answer to the posed question. Both strategies start with an active goal-directed search of the available metadata (information on programmes, episodes and similar), but differ with respect to search/consumption of the video, although the participants were working on the same tasks, information structures and user interfaces. A user who applies a goal-directed strategy when searching metainformation on a website will not necessarily continue with goal-directed behaviour when searching video. This has to be considered when designing online TV applications.

In the general information and decision-making research, search is affected by the information structure and experience, but not by the given task [23]. Similarly, our results show no relationship between the given task and the variables we proposed. There was no relationship between the background of the participants and the strategies that they used. The only significant relationship that we noted was related to the older participants’ tendency to use a goal-directed search, where they searched actively until they found what they wanted.

6.1 Limitations

The limitations of this study lie in the drawbacks of the research method and the instruments we used. To reduce the effects of technical problems that might appear with the logging tools, we recruited a relatively large number of participants and checked whether there was a systematic difference between those with and those without a complete set of log data by comparing the two groups with respect to the background information. The nonparametric “median test” was chosen since most of the data distributions were rather skewed. As expected, there were no significant differences between the two groups. The results are shown in the Appendix.

The participants in this study were between 20 and 25 years of age, and all of them were heavy or moderate users of online entertainment. A larger number and variety of participants might yield more extensive data. It is possible that the list of identified strategies and the variables needed to identify them would be different for other user groups. Therefore, our results are limited to this group of users. The questionnaire we used for recruitment of heavy and moderate users was based on the studies of general usage of ICT and online behaviour conducted in Europe by 2011 [6]. It is a subset of a norm-based questionnaire developed to identify Internet user types in general, not online TV users in particular. The level of access and usage tends to increase for all user groups [6]. Future studies exploring the relationship between different user groups and strategies should take into account this overall development and more specific trends related to online TV consumption.

We created the given tasks to cover two different types of search and consumption. Learning about the content or dynamic changes on the service providers’ sites, of course, would influence the participants’ search strategies. To minimise this learning effect, we designed the tasks to be independent of each other. To avoid the possibility that some users would find the content at other locations (or not find it at all), we asked the service providers to keep the content in the same place during the study period. We conducted two warm-up exercises (one for each online TV service) to ensure that the participants would feel comfortable with the services and the experimental set-up. In the interviews following the experiment, the participants said that the tasks were realistic and that they worked as they would at home or at the office.

7 Conclusions

The aim of this research was to investigate whether and how log files can be used to identify the behaviour of users when searching and consuming online TV content. In a study where 27 participants between 20 and 25 years old performed given tasks on two Norwegian online TV sites, we collected detailed user log files together with other data sources (self-reported descriptions of the performed activities, interviews, screen captures). We identified four main types of behaviour based on the analysis of the qualitative data. We then proposed a set of variables that enable similar online behaviour to be recognised. Our results indicate that the following four variables were fairly good indicators of the strategies used: number of (short navigation sequence, short video watching sequence) pairs, frequency of video search actions, percentage of time spent on sequences of navigate actions and percentage of time spent on watching videos. These variables can therefore be used to characterise these four types of behaviour.

The results concerning the variables might be useful for researchers and practitioners investigating the information-seeking behaviour of online TV users. Collecting too much data or the wrong data is a well-known problem in studies of different types of log files. Our results could therefore be useful for the design of future studies on user behaviour. In particular, the sequences and the associated actions we identified might help in selecting the actions that should be logged. Based on our results, we recommend identifying navigating sequences, watching sequences and video search sequences, and calculating the variables described above. Further, we believe that the proposed variables can be useful for an automatic detection tool that will allow the system to adapt to a user’s current strategy and thus provide a better user experience.

The focus of this research was on the method that helps in understanding the behaviour of online TV users, rather than on the behaviour itself. However, we believe that our results regarding the strategies used by online TV users might be useful as well. On the practical level, they might help the designers of online TV websites to provide better support to the users when they are searching and consuming video content. On the theoretical level, the strategies we identified extend the scope of general models of information-seeking strategies by adding an understanding of content consumption. The list of strategies we identified based on the analysis of the qualitative data gives a more detailed domain-specific description of the information that has been searched or scanned than the list of information-seeking strategies given by Belkin et al. [4]. We make a distinction between search of video and other types of search such as scrolling the web page and use of search words. Further, our results reveal an interesting strategy consisting of goal-directed search of the metadata followed by passive consumption of video content. A user who applies a goal-directed strategy when searching metainformation on a website will not necessarily continue with goal-directed behaviour when searching video, even if the purpose of watching the video was to find a specific piece of information. This can be considered a refinement of the classification proposed in [26] and should be taken into account when designing for different types of behaviour.

We plan to extend our research in two directions. First, we want to investigate whether field studies logging users’ behaviour in natural settings (at home, while commuting, during breaks at work) and on different devices (laptops, smartphones and tablets) over a longer period of time will give the same results. Collecting a larger amount of data in the wild might reveal new behaviour patterns and a need for new variables. Second, we want to evaluate whether personalisation based on the strategies we identified can improve the user experience. It remains an open question which level of personalisation users experience as appropriate and desired.

Notes

This is 2.7 followed by 21 zeros.

References

Acharya S, Smith B, Parnes P (2000) Characterizing user access to videos on the world wide web. In: proceedings of multimedia computing and networking San Jose, CA, 2000. SPIE—International Society for Optical Engineering, Bellingham, Wash, pp 130–141
Antonini A, Vignaroli L, Schifanella C, Pensa RG, Sapino ML (2013) MeSoOnTV: a media and social-driven ontology-based TV knowledge management system. In: Paper presented at the proceedings of the 24th ACM conference on hypertext and social media, Paris, France
Baluja S, Seth R, Sivakumar D, Jing Y, Yagnik J, Kumar D, Ravichandran D, Aly M (2008) Video suggestion and discovery for YouTube: taking random walks through the view graph. In: Proceedings of the 17th international world wide web conference, WWW ‘08, Beijing, China, 2008. ACM, New York, NY, USA, pp 895–904
Belkin NJ, Cool C, Stein A, Thiel U (1995) Cases, scripts, and information-seeking strategies: on the design of interactive information retrieval systems. Expert Syst Appl 9(3):379–395
Brandtzæg PB (2010) Towards a unified media-user typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput Human Behav 26(5):940–956. doi:10.1016/j.chb.2010.02.008
Brandtzæg PB, Heim J, Karahasanovic A (2011) Understanding the new digital divide-A typology of Internet users in Europe. Int J Hum Comput Stud 69(3):123–138
Chen HM, Cooper MD (2001) Using clustering techniques to detect usage patterns in a web-based information system. J Am Soc Infor Sci Technol 52(11):888–904
Darnell MJ (2007) How do people really interact with TV? Naturalistic observations of digital TV and digital video recorder users. Comput Entertain 5(2):Article 10
Elkhatib Y, Killick R, Mu M, Race N (2014) Just browsing? Understanding user journeys in online TV. In: MM ‘14 Proceedings of the ACM international conference on multimedia, Orlando, FL, USA, 2014. ACM, New York, pp 965–968. doi:10.1145/2647868.2654980
Szabo G, Huberman BA (2010) Predicting the popularity of online content. Commun ACM 53(8):80–88
Gantz J, Reinsel D (2010) The digital universe decade—are you ready?
Gutschmidt A (2013) Classification of the user tasks by the user behaviour. Empirical Studies on the Usage of On-line Newspapers, Logos
Gutschmidt A, Cap CH (2008) User behaviour under the microscope. In: Paper presented at the WEBIST 2008, proceedings of the fourth international conference on web information systems and technologies, Funchal, Madeira, Portugal, May 4–7
Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1:77–89
Hei X, Liang C, Liang J, Liu Y (2007) A measurement study of a large-scale P2P IPTV system. IEEE Trans Multimed 9(8):1672–1687
Herder E (2007) An analysis of user behaviour on the web. VDM Verlag Dr. Müller e. K. und Lizenzgeber, Saarbrücken
Infogineering (2014) Understanding Information Overload. http://www.infogineering.net/understanding-information-overload.htm. Accessed 20 Dec 2014
Karahasanovic A, Lüders M, Terradillos E, Alejandro M, Rodríguez J, Núñez JM, Flórez DR (2012) Insight into usage of multimedia streaming services. IADIS Int J WWW/Internet 10(1):18
Krippendorff K (2004) Content analysis: an introduction to its methodology. Sage, Thousand Oaks
Kunert T (2009) User tasks and requirements for iTV applications. In: User-centered interaction design patterns for interactive digital television applications. Human-computer interaction series. Springer, London, pp 85–98
Pace S (2003) A grounded theory of flow experiences of Web users. Int J Human-Comput Interact 60(2004):327–363
Rautiainen M, Heikkinen A, Sarvanko J, Chorianopoulos K, Kostakos V, Ylianttila M (2013) Time shifting patterns in browsing and search behavior for catch-up TV on the web. Paper presented at the Conference: 11th European conference on interactive TV and video (EuroITV’13)
Sacchi S, Burigo M (2008) Strategies in the information search process: interaction among task structure, knowledge, and source. J Gen Psychol 135(3):252–270
Stenmark D (2008) Identifying clusters of user behaviour in intranet search engine log files. J Am Soc Inform Sci Technol 59(14):2232–2243
UOK (2009) University of Kent, Digital Anthropology Report 2009. http://www.antropologi.info/blog/anthropology/2009/digital-anthropology-report
Vilas M, Paneda XG, Garcia R, Melendi D, Garcia VG (2005) User behavior analysis of a video-on-demand service with a wide variety of subjects and lengths In: software engineering and advanced applications, 2005. 31st EUROMICRO conference on 30 Aug–3 Sept. 2005. IEEE, pp 330–337. doi:10.1109/EUROMICRO.2005.63
YouTube (2015) Statistics. https://www.youtube.com/yt/press/statistics.html
Yu H, Zheng D, Zhao BY, Zheng W (2006) Understanding user behaviour in large-scale video-on-demand systems. In: 1st ACM SIGOPS/EuroSys European conference on computer systems 2006 Leuven, Belgium, April 18–21 2006. ACM, pp 333–344

Acknowledgments

This research is funded by the VERDIKT programme of the Research Council of Norway (CELTIC research project R2D2 Networks, Contract Nr. 193018) and by the Center for Service Innovation (Norwegian Research Council). We would like to thank all the participants in our study as well as our project partners. We thank Ida Maria Haugstveit, Maria Borén, Karen Ranestad and Ragnhild Halvorsrud for their help in conducting the experiment, and the anonymous reviewers for their useful suggestions.

Author information

Authors and Affiliations

SINTEF ICT, Oslo, Norway
Amela Karahasanović & Jan Heim

Corresponding author

Correspondence to Amela Karahasanović.

Appendix

Scoring reliability

Scoring reliability was assessed by applying the Krippendorff KALPHA procedure to the data from four participants, scored by two judges.

The analysis was based on [14]. The first four participants with complete log files provided the data for the reliability analysis. Two coders used an initial version of the coding scheme. In the final analysis a slight modification of this scheme was used. Each participant completed a total of 10 tasks, including warm-up tasks. The coding scheme had a total of 48 categories. Each category could be coded zero, one or several times within each task. The basic data were the number of times a specific category was used within a task. Each participant had a total of 480 scores. Analysing the four participants together gave a data set of 1920 values, scored by the two coders. As pointed out by Hayes and Krippendorff [14], “In its two-observer interval data version, alpha equals Pearson intraclass-correlation coefficient”.

In the analysis the measurement level was set to interval (level set to 3 in the analysis). If one of the judges had used the category zero times within the task, that unit was left out of the analysis. The number of bootstrap samples was set to 1000.

Krippendorff alpha was 0.8727. As pointed out by Krippendorff [19] pp. 241–243, “social scientists commonly rely on data with reliabilities α ≥ 0.800, consider data with 0.800 > α ≥ 0.667 only to draw tentative conclusions, and discard data whose agreement measures α < 0.667” (Table 7).
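For readers who wish to reproduce this kind of reliability check, the following minimal sketch computes Krippendorff’s alpha for two coders and interval data with no missing values, as one minus the ratio of observed to expected disagreement. It is a simplified illustration, not the KALPHA procedure itself: the function name and the data are hypothetical, and the exclusion of units and the bootstrapping described above are omitted.

```python
def krippendorff_alpha_interval(coder_a, coder_b):
    """Krippendorff's alpha for two coders, interval data, no missing values."""
    if len(coder_a) != len(coder_b) or len(coder_a) < 2:
        raise ValueError("need two equally long lists with at least two units")
    n_units = len(coder_a)
    # Observed disagreement: mean squared difference within units.
    d_observed = sum((a - b) ** 2 for a, b in zip(coder_a, coder_b)) / n_units
    # Expected disagreement: mean squared difference over all pairs of pooled values.
    pooled = list(coder_a) + list(coder_b)
    n_values = len(pooled)
    total = sum(pooled)
    total_sq = sum(v * v for v in pooled)
    # The sum over ordered pairs (i, j) of (v_i - v_j)^2 equals 2*N*sum(v^2) - 2*(sum v)^2.
    pair_sum = 2 * n_values * total_sq - 2 * total ** 2
    d_expected = pair_sum / (n_values * (n_values - 1))
    if d_expected == 0:
        raise ValueError("no variation in the data; alpha is undefined")
    return 1.0 - d_observed / d_expected

# Hypothetical category counts given by two coders for four units.
coder_1 = [1, 2, 3, 4]
coder_2 = [2, 1, 4, 3]
alpha = krippendorff_alpha_interval(coder_1, coder_2)
```

For these hypothetical counts alpha is 0.65, which would fall below the 0.667 threshold quoted above; identical codings give an alpha of 1.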

Table 8 lists the results from the questionnaire the participants completed before the test started. Tables 9 and 10 show the differences between the participants with and without valid log data. The nonparametric “median test” was chosen since most of the data distributions were rather skewed. As expected, there were no significant differences between the two groups, although there was a slight tendency that, proportionally, more females than males had valid log data. In Table 9, one can see, for example, that all the participants with valid data went on Facebook daily, while only one of those without valid data did not.

Table 7 Sequence elements

Table 8 Frequency of demographic characteristics and Web usage

Table 9 Category of participation, compared to questionnaire responses

Table 10 Median test of significance for questionnaire answers and category of participation—With valid log data versus without valid log data

About this article

Cite this article

Karahasanović, A., Heim, J. Understanding the behaviour of online TV users. Pers Ubiquit Comput 19, 839–852 (2015). https://doi.org/10.1007/s00779-015-0865-9

Received: 30 November 2014
Accepted: 06 May 2015
Published: 25 June 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s00779-015-0865-9
