Machine Learning, Volume 95, Issue 3, pp 381–421

Modeling topic control to detect influence in conversations using nonparametric topic models

  • Viet-An Nguyen
  • Jordan Boyd-Graber
  • Philip Resnik
  • Deborah A. Cai
  • Jennifer E. Midberry
  • Yuanxin Wang

Abstract

Identifying influential speakers in multi-party conversations has been the focus of research in communication, sociology, and psychology for decades. It has been long acknowledged qualitatively that controlling the topic of a conversation is a sign of influence. To capture who introduces new topics in conversations, we introduce SITS—Speaker Identity for Topic Segmentation—a nonparametric hierarchical Bayesian model that is capable of discovering (1) the topics used in a set of conversations, (2) how these topics are shared across conversations, (3) when these topics change during conversations, and (4) a speaker-specific measure of “topic control”. We validate the model via evaluations using multiple datasets, including work meetings, online discussions, and political debates. Experimental results confirm the effectiveness of SITS in both intrinsic and extrinsic evaluations.

Keywords

Bayesian nonparametrics · Influencer detection · Topic modeling · Topic segmentation · Gibbs sampling

1 Influencing conversations by controlling the topic

Conversation, interactive discussion between two or more people, is one of the most essential and common forms of communication in our daily lives.1 One of the many functions of conversations is influence: having an effect on the beliefs, opinions, or intentions of other conversational participants. Using multi-party conversations to study and identify influencers, the people who influence others, has been the focus of researchers in communication, sociology, and psychology (Katz and Lazarsfeld 1955; Brooke and Ng 1986; Weimann 1994), who have long acknowledged that there is a correlation between the conversational behaviors of a participant and how influential he or she is perceived to be by others (Reid and Ng 2000).

In an early study on this topic, Bales (1970) argues “To take up time speaking in a small group is to exercise power over the other members for at least the duration of the time taken, regardless of the content.” This statement asserts that structural patterns such as speaking time and activeness of participation are good indicators of power and influence in a conversation. Participants who talk most during a conversation are often perceived as having more influence (Sorrentino and Boutiller 1972; Regula and Julian 1973; Daley et al. 1977; Ng et al. 1993), more leadership ability (Stang 1973; Sorrentino and Boutiller 1972), more dominance (Palmer 1989; Mast 2002) and more control of the conversation (Palmer 1989). Recent work using computational methods also confirms that structural features such as number of turns and turn length are among the most discriminative features to classify whether a participant is influential or not (Rienks et al. 2006; Biran et al. 2012).

However, it is wrong to take Bales’s claim too far; the person who speaks loudest and longest is not always the most powerful. In addition to structural patterns, the characteristics of language used also play an important role in establishing influence and controlling the conversation (Ng and Bradac 1993). For example, particular linguistic choices such as message clarity, powerful and powerless language (Burrel and Koper 1998), and language intensity (Hamilton and Hunter 1998) in a message can increase influence. More recently, Huffaker (2010) showed that linguistic diversity expressed by lexical complexity and vocabulary richness has a strong relationship with leadership in online communities. To build a classifier to detect influencers in written online conversations, Biran et al. (2012) also propose to use a set of content-based features to capture various participants’ conversational behaviors, including persuasion and agreement/disagreement.

Among many studied behaviors, topic control and management is considered one of the most effective ways to control the conversation (Planalp and Tracy 1980). Palmer (1989) shows that the less related a participant’s utterances are to the immediate topic, the more dominant that participant is, and then argues, “the ability to change topical focus, especially given strong cultural and social pressure to be relevant, means having enough interpersonal power to take charge of the agenda.” Recent work by Rienks et al. (2006) also shows that topic change, among other structural patterns discussed above, is the most robust feature in detecting influencers in small group meetings.

In this article, we introduce a new computational model capturing the role of topic control in participants’ influence of conversations. Speaker Identity for Topic Segmentation (SITS), a hierarchical Bayesian nonparametric model, uses an unsupervised statistical approach which requires few resources and can be used in many domains without extensive training and annotation. More important, SITS incorporates an explicit model of speaker behavior by characterizing quantitatively individuals’ tendency to exercise control over the topic of conversation (Sect. 3). By focusing on topic changes in conversations, we go beyond previous work on influencers in two ways:
  • First, while structural statistics such as number of turns, turn length, and speaking time are relatively easy to extract from a conversation, defining and detecting topic changes is less well understood. Topic, by itself, is a complex concept (Blei et al. 2003; Kellermann 2004). In addition, despite the large number of techniques proposed for dividing a document into smaller, topically coherent segments (Purver 2011), topic segmentation is still an open research problem. Most previous computational methods for topic discovery and topic segmentation focus on content, ignoring the speaker identities. We show that we can capture conversational phenomena and influence better by explicitly modeling behaviors of participants.

  • Second, the conversation is often controlled explicitly, to some extent, by a subset of participants. For example, in political debates questions come from the moderator(s), and candidates typically have a fixed time to respond. These imposed aspects of conversational structure decrease the value of more easily extracted structural statistics for a variety of conversation types; observe, for example, that similar properties of the conversation can also be observed when looking at hosts and guests in televised political discussion shows such as CNN’s Crossfire.

Applying SITS on real-world conversations (Sect. 4), we show that this modeling approach is not only more effective than previous methods on traditional topic segmentation (Sect. 5), but also more intuitive in that it is able to capture an important behavior of individual speakers during conversations (Sect. 6). We then show that using SITS to model topic control improves influencer detection (Sect. 7). Taking quantitative and qualitative analysis together, the pattern of results suggests that our approach holds significant promise for further development; we discuss directions for future work in Sect. 8.

2 What is an influencer?

2.1 Influencer definition

In most research on persuasion and power, an influencer attempts to gain compliance from others or uses tactics to shape the opinions, attitudes, or behaviors of others (Scheer and Stern 1992; Schlenker et al. 1976). In research on social media, such as blogs and Twitter, measurements such as the number of followers or readers serve as a proxy for influence (Alarcon-del Amo et al. 2011; Booth and Matic 2011; Trammell and Keshelashvili 2005). Others have studied what influencers say; Drake and Moberg (1986) demonstrated that linguistic influence differs from attempts to influence that rely on power and exchange relationships. In interactions with targets, influencers may rely more on linguistic frames and language than on resources offered, which is proposed as the requirement for influence by exchange theorists (Blau 1964; Foa and Foa 1972; Emerson 1981).

We define an influencer as someone who has persuasive ability over where an interaction is headed, what topics are covered, and what positions are espoused within that interaction. In the same way that persuasion shapes, reinforces, or changes attitudes or beliefs, an influencer shapes, reinforces, or changes the direction of the interaction. An influencer within an interaction is someone who may introduce new ideas or arguments into the conversation that others pick up on and discuss (shapes new directions through topic shift), may express arguments about an existing topic that others agree to and further in the discussion (i.e., reinforces the direction), or may provide counter-arguments that others agree to and perpetuate, thereby redirecting where the topic of conversation is headed (i.e., changes the direction of the conversation).

2.2 Data scope and characteristics

We are interested in influence in turn-taking, multiparty discussions. This is a broad category including political debates, business meetings, online chats, discussions, conference panels, and many TV or radio talk shows. More formally, such datasets contain C conversations. A conversation c has \(T_c\) turns, each of which is a maximal uninterrupted utterance by one speaker.2 In each turn \(t \in [1, T_c]\), a speaker \(a_{c,t}\) utters \(N_{c,t}\) words \(w_{c,t} = \{w_{c,t,n} \mid n \in [1, N_{c,t}]\}\). Each word is from a vocabulary of size V, and there are M distinct speakers.
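
To make this notation concrete, the following is a minimal sketch (in Python, with hypothetical class and field names) of how a corpus with this structure might be represented; it mirrors the indices above but is not part of the model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    speaker: int          # speaker index a_{c,t}, one of M distinct speakers
    tokens: List[int]     # word indices w_{c,t,n}, each in [0, V)

@dataclass
class Conversation:
    turns: List[Turn]     # T_c turns, each a maximal uninterrupted utterance

@dataclass
class Corpus:
    conversations: List[Conversation]  # C conversations
    vocab_size: int                    # V
    num_speakers: int                  # M
```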

3 Modeling topic shift

In this section, we describe SITS, a hierarchical nonparametric Bayesian model for topic segmentation that takes into consideration speaker identities, allowing us to characterize speakers’ topic control behavior over the course of the discussion (Nguyen et al. 2012). We begin with an overview of the topic segmentation problem and some related work. We then highlight the differences between SITS and previous approaches and describe the generative process and the inference technique we use to estimate the model.

3.1 Topic segmentation and modeling approaches

Whether in an informal situation or in more formal settings such as a political debate or business meeting, a conversation is often not about just one thing: topics evolve and are replaced as the conversation unfolds. Discovering this hidden structure in conversations is a key problem for building conversational assistants (Tur et al. 2010) and developing tools that summarize (Murray et al. 2005) and display (Ehlen et al. 2007) conversational data. Understanding when and how the topics change also helps us study human conversational behaviors such as individuals’ agendas (Boydstun et al. 2013), patterns of agreement and disagreement (Hawes et al. 2009; Abbott et al. 2011), relationships among conversational participants (Ireland et al. 2011), and dominance and influence among participants (Palmer 1989; Rienks et al. 2006).

One of the most natural ways to capture conversational structure is topic segmentation—the task of “automatically dividing single long recordings or transcripts into shorter, topically coherent segments” (Purver 2011). Previous work has broadly taken two basic approaches to this problem. The first approach focuses on identifying discourse markers which distinguish topical boundaries in the conversations. There are certain cue phrases such as well, now, that reminds me, etc. that explicitly indicate the end of one topic or the beginning of another (Hirschberg and Litman 1993; Passonneau and Litman 1997). These markers can also serve as features for a discriminative classifier (Galley et al. 2003) or observed variables in a generative model (Dowman et al. 2008). However, in practice the discourse markers that are most indicative of topic change often depend heavily on the domain of the data (Purver 2011). This drawback makes methods that rely solely on these markers difficult to adapt to new domains or settings.

Our method follows the second general approach, which relies on the insight that topical segments evince lexical cohesion (Halliday and Hasan 1976). Intuitively, words within a segment will look more like their neighbors than like words in other segments. This has been a key idea in previous work. Morris and Hirst (1991) try to determine the structure of text by finding “lexical chains”, which consist of units of text that are about the same thing. The widely used text segmentation algorithm TextTiling (Hearst 1997) exploits this insight by computing the lexical similarity between adjacent sentences. More recent improvements to this approach include using different lexical similarity metrics like lsa (Choi et al. 2001; Olney and Cai 2005) and improving feature extraction for supervised methods (Hsueh et al. 2006). The same insight also inspires unsupervised models using bags of words (Purver et al. 2006), language models (Eisenstein and Barzilay 2008), and shared structure across documents (Chen et al. 2009).

We also exploit lexical cohesion, using a probabilistic topic modeling method (Blei et al. 2003; Blei 2012). The approach we take is unsupervised, so it requires few resources and is applicable in many domains without extensive training. Following the literature on topic modeling, we define each topic as a multinomial distribution over the vocabulary. Like previous generative models proposed for topic segmentation (Purver et al. 2006), each turn is considered a bag of words generated from an admixture of topics, and topics are shared across different turns within a conversation or across different conversations.3 In addition, we take a Bayesian nonparametric approach (Müller and Quintana 2004) to allow the number of topics to be unbounded, in order to better represent the observed data.

The settings described above are still consistent with those in popular topic models such as latent Dirichlet allocation (Blei et al. 2003, lda) or hierarchical Dirichlet processes (Teh et al. 2006, hdp), in which turns in a conversation are considered independent. In practice, however, this is not the case. Obviously the topics of a turn at time t are highly correlated with those of the turn at t+1. To address this issue, there have been several recent attempts to capture the temporal dynamics within a document. Du et al. (2010) propose Sequential lda to study how topics within a document evolve over its structure. It uses the nested two-parameter Poisson Dirichlet process (pdp) to model the progressive dependency between consecutive parts of a document, which captures the continuity of topical flow in a document nicely but does not model topic changes explicitly. Fox et al. (2008) proposed Sticky hdp-hmm, an extension of hdp-hmm (Teh et al. 2006) for the problem of speaker diarization, which involves segmenting an audio recording into intervals associated with individual speakers. Applied to the conversational setting, Sticky hdp-hmm associates each turn with a single topic; this is a strong assumption, since people tend to talk about more than one thing in a turn, especially in political debates. We will, however, use it as one of the baselines in our topic segmentation experiment (Sect. 5). A related problem is to discover how topics themselves change over time (Blei and Lafferty 2006; Wang et al. 2008; Ren et al. 2008; Ahmed and Xing 2008, 2010), e.g., documents that talk about “physics” in 1900 will use very different terms than “physics” in 2000. These models assume documents are much longer and that topics evolve much more slowly than in a conversation.

Moreover, many of these methods do not explicitly model changes of topic within a document or conversation. To address this, we endow each turn with a binary latent variable \(l_{c,t}\), called the topic shift indicator (Purver et al. 2006). This latent variable signifies whether the speaker changed the topic of the conversation in this turn. In addition, to capture the topic-controlling behavior of the speakers across different conversations, we further associate each speaker m with a latent topic shift tendency denoted by \(\pi_m\). Informally, this variable is intended to capture the propensity of a speaker to effect a topic shift. Formally, it represents the probability that speaker m will change the topic (distribution) of a conversation. In the remainder of this section, we describe the model in more detail together with the inference techniques we use.

3.2 Generative process of SITS

SITS is a generative model of multiparty discourse that jointly discovers topics and speaker-specific topic shifts from an unannotated corpus (Fig. 1a). As in the hierarchical Dirichlet process (Teh et al. 2006), we allow an unbounded number of topics to be shared among the turns of the corpus. Topics are drawn from a base distribution H over multinomial distributions over the vocabulary of size V; H is a finite Dirichlet distribution with symmetric prior λ. Unlike the hdp, where every document (here, every turn) independently draws a new multinomial distribution from a Dirichlet process, the social and temporal dynamics of a conversation, as specified by the binary topic shift indicator l c,t , determine when new draws happen.
Fig. 1

Plate diagrams of our proposed models: (a) nonparametric SITS; (b) parametric SITS. Nodes represent random variables (shaded nodes are observed); lines are probabilistic dependencies. Plates represent repetition. The innermost plates are turns, grouped in conversations

Generative process

The formal generative process is:
  1. For each speaker m∈[1,M], draw the speaker topic shift probability \(\pi_m \sim \mbox{Beta}(\gamma)\).
  2. Draw the global topic distribution \(G_0 \sim \mbox{DP}(\alpha, H)\).
  3. For each conversation c∈[1,C]:
     (a) Draw a conversation-specific topic distribution \(G_c \sim \mbox{DP}(\alpha_0, G_0)\).
     (b) For each turn t∈[1,T_c] with speaker \(a_{c,t}\):
         i. If t=1, set the topic shift indicator \(l_{c,t}=1\). Otherwise, draw \(l_{c,t} \sim \mbox{Bernoulli}(\pi_{a_{c,t}})\).
         ii. If \(l_{c,t}=1\), draw \(G_{c,t} \sim \mbox{DP}(\alpha_c, G_c)\). Otherwise, set \(G_{c,t} = G_{c,t-1}\).
         iii. For each word index n∈[1,N_{c,t}]:
             • Draw a topic \(\psi_{c,t,n} \sim G_{c,t}\).
             • Draw a token \(w_{c,t,n} \sim \mbox{Multinomial}(\psi_{c,t,n})\).
The hierarchy of Dirichlet processes allows statistical strength to be shared across contexts: within a conversation and across conversations. The per-speaker topic shift tendency \(\pi_m\) allows speaker identity to influence the evolution of topics.

Intuitively, SITS generates a conversation as follows: At the beginning of a conversation c, the first speaker a c,1 draws a distribution over topics G c,1 from the base distribution, and uses that topic distribution to draw a topic ψ c,1,n for each token w c,1,n . Subsequently, at turn t, speaker a c,t will first flip a speaker-specific biased coin \(\pi_{a _{c,t}}\) to decide whether a c,t will change the topic of the conversation. If the coin comes up tails (l c,t =0), a c,t will not change the conversation topic and uses the previous turn’s topic distribution G c,t−1 to generate turn t’s tokens. If, on the other hand, the coin comes up heads (l c,t =1), a c,t will change the topic by drawing a new topic distribution G c,t from the conversation-specific collection of topics DP(α c ,G c ).
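
As an illustration of this generative story, here is a minimal, self-contained simulation of a single conversation under a truncated (finite-topic) approximation of SITS. The truncation to K topics, the hyperparameter values, and the function names are assumptions made for illustration; the actual model draws G 0, G c and G c,t from Dirichlet processes as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_conversation(speakers, pi, topics, G_c, alpha_c=1.0, n_words=20):
    """Simulate one conversation under a truncated approximation of SITS.

    speakers : list of speaker indices, one per turn
    pi       : per-speaker topic shift tendencies (pi_m)
    topics   : K x V matrix of topic-word distributions (draws from H)
    G_c      : conversation-level distribution over the K topics
    """
    turns, G_ct = [], None
    for t, m in enumerate(speakers):
        # The first turn always shifts; later turns shift with probability pi[m]
        shift = 1 if t == 0 else rng.binomial(1, pi[m])
        if shift == 1:
            # Finite-dimensional projection of DP(alpha_c, G_c) onto the K topics
            G_ct = rng.dirichlet(alpha_c * G_c)
        words = []
        for _ in range(n_words):
            k = rng.choice(len(G_c), p=G_ct)                        # topic psi_{c,t,n}
            words.append(rng.choice(topics.shape[1], p=topics[k]))  # token w_{c,t,n}
        turns.append((m, shift, words))
    return turns

K, V, M = 5, 50, 3
topics = rng.dirichlet(np.full(V, 0.1), size=K)  # topics drawn from H
pi = rng.beta(1.0, 1.0, size=M)                  # pi_m ~ Beta(gamma)
G_c = rng.dirichlet(np.full(K, 1.0))             # conversation-level topic weights
conversation = simulate_conversation([0, 1, 0, 2, 1], pi, topics, G_c)
```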

Segmentation notation

To make notation more concrete and to connect our model with topic segmentation, we introduce the notion of segments in a conversation. A segment s of conversation c is a sequence of turns [τ,τ′] such that
$$\left \{ \begin{array}{l} l _{c,\tau}= l _{c,{\tau' + 1}} = 1 \\ l _{c,t} = 0,\quad \forall t \in[\tau+ 1, \tau'] \end{array} \right . $$
When l c,t =0, G c,t is the same as G c,t−1 and all topics (i.e. multinomial distributions over words) \(\{\psi_{c,t,n} \mid n \in [1, N_{c,t}]\}\) that generate words in turn t and the topics \(\{\psi_{c,t-1,n'} \mid n' \in [1, N_{c,t-1}]\}\) that generate words in turn t−1 come from the same distribution. Thus, all topics used in a segment s are drawn from a single segment-specific probability measure G c,s ,
$$ G _{c,s} \mid l _{c,1}, l _{c,2}, \ldots, l _{c,{T_c}}, \alpha_c, G_c \sim\mbox{DP}({ \alpha_c}, {G_c}) $$
(1)
A visual illustration of these notations can be found in Fig. 2. For notational convenience, S c denotes the number of segments in conversation c, and s t denotes the segment index of turn t. We emphasize that all segment-related notations are derived from the posterior over the topic shifts l and not part of the model itself.
Fig. 2

Diagram of notation for topic shift indicators and conversation segments: Each turn is associated with a latent binary topic shift indicator l specifying whether the topic of the turn is shifted. In this example, topic shifts occur in turns τ and τ′+1. As a result, the topic shift indicators of turns τ and τ′+1 are equal to 1 (i.e. l c,τ =l c,τ′+1=1) and the topic shift indicators of all turns in between are 0 (i.e. l c,t =0,∀t∈[τ+1,τ′]). Turns [τ,τ′] form a segment s in which all topic distributions G c,τ ,G c,τ+1,…,G c,τ′ are the same and are denoted collectively as G c,s

3.3 Inference for SITS

To find the latent variables that best explain observed data, we use Gibbs sampling, a widely used Markov chain Monte Carlo inference technique (Neal 2000; Resnik and Hardisty 2010). The state space in our Gibbs sampler consists of the latent variables for topic indices assigned to all tokens z={z c,t,n } and topic shifts assigned to turns l={l c,t }. We marginalize over all other latent variables. For each iteration of the sampling process, we loop over each turn in each conversation. For a given turn t in conversation c, we first sample the topic shift indicator variable l c,t (Sect. 3.3.2) and then sample the topic assignment z c,t,n for each token in the turn (Sect. 3.3.1). Here, we only present the conditional sampling equations; for details on how these are derived, see Appendix A.

3.3.1 Sampling topic assignments

In Bayesian nonparametrics, the Chinese restaurant process (crp) metaphor is often used to explain the clustering effect of the Dirichlet process (Ferguson 1973). The crp is an exchangeable distribution over partitions of integers, which facilitates Gibbs sampling (Neal 2000) (as we will see in (2)). When used in topic models, each Chinese restaurant consists of an infinite number of tables, each of which corresponds to a topic. Customers, each of which corresponds to a token, are assigned to tables; if two tokens are assigned to the same table, they share the same topic.

The crp has a “rich get richer” property, which means that tables with many customers will attract yet more customers—a new customer will sit at an existing table with probability proportional to the number of customers currently at the table. The crp has no limit on the number of tables; when a customer needs to be seated, there is always a probability—proportional to the Dirichlet parameter α—that it will be seated at a new table. When a new table is formed, it is assigned a “dish”; this is a draw from the Dirichlet process’s base distribution. In a topic model, this atom associated with a new table is a multinomial distribution over word types. In a standard, non-hierarchical crp, this multinomial distribution comes from a Dirichlet distribution.
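
To make the “rich get richer” dynamic concrete, here is a minimal sketch of how a new customer’s table is sampled in a single, non-hierarchical crp; the function name and concentration value are illustrative assumptions.

```python
import numpy as np

def sample_crp_table(table_counts, alpha, rng):
    """Sample a table for a new customer in a Chinese restaurant process.

    table_counts : customer counts for the existing tables
    alpha        : Dirichlet process concentration parameter
    Returns an existing table index, or len(table_counts) for a new table.
    """
    weights = np.array(list(table_counts) + [alpha], dtype=float)  # existing tables + new table
    return rng.choice(len(weights), p=weights / weights.sum())

rng = np.random.default_rng(1)
counts = [4, 2, 1]                                    # three existing tables
table = sample_crp_table(counts, alpha=1.0, rng=rng)  # new table chosen with probability 1/8
```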

But it doesn’t have to—hierarchical nonparametric models extend the metaphor further by introducing a hierarchy of restaurants (Teh et al. 2006; Teh 2006), where the base distribution of one restaurant can be another restaurant. This is where things can get tricky. Instead of having a seating assignment, a customer now has a seating path and is potentially responsible for spawning new tables in every restaurant. In SITS there are restaurants for the current segment, the conversation, and the entire corpus, as shown in Fig. 3.
Fig. 3

Illustration of topic assignments in our inference algorithm. Each solid rectangle represents a restaurant (i.e., a topic distribution) and each circle represents a table (i.e., a topic). To assign token n of turn t in conversation c to a table z c,t,n in the corpus-level restaurant, we need to sample a path assigning the token to a segment-level table, the segment-level table to a conversation-level table and the conversation-level table to a globally shared corpus-level table

To sample z c,t,n , the index of the shared topic assigned to token n of turn t in conversation c, we need to sample the path assigning each word token to a segment-level table, each segment-level table to a conversation-level table and each conversation-level table to a shared dish. Before describing the sampling equations, we introduce notation denoting the counts:
  • N c,s,k : number of tokens in segment s in conversation c assigned to dish k

  • N c,k : number of segment-level tables in conversations c assigned to dish k

  • N k : number of conversation-level tables assigned to dish k.

Note that we use k to index the global topics shared across the corpus, each of which corresponds to a dish in the corpus-level restaurant. In general, computing the exact values of these counts makes bookkeeping rather complicated. Since there might be multiple tables at a lower-level restaurant assigned to the same table at the higher-level restaurant, to compute the correct counts, we need to sum the number of customers over all these tables. For example, in Fig. 3, since both ψ c,1 and ψ c,2 are assigned to ψ 0,2 (i.e., k=2), to compute N c,k we have to sum over the number of customers currently assigned to ψ c,1 and ψ c,2 (which are 4 and 2 respectively in this example).
To mitigate this problem of bookkeeping and to speed up the sampling process, we use the minimal path assumption (Cowans 2006; Wallach 2008) to generate the path assignments.4 Under the minimal path assumption, a new table in a restaurant is created only when there is no table already serving the dish. In other words in a restaurant, there is at most one table serving a given dish. A more detailed example of the minimal path assumption is illustrated in Fig. 4. Using this assumption, in the example shown in Fig. 3, ψ c,1 and ψ c,2 will be merged together since they are both assigned to ψ 0,2.
Fig. 4

Illustration of minimal path assumption. This figure shows an example of the seating assignments in a hierarchy of Chinese restaurants of a higher-level restaurant and a lower-level restaurant. Each table in the lower restaurant is assigned to a table in the higher restaurant and tables on the same path serve the same dish k. When sampling the assignment for table \(\psi^{L}_{2}\) in the lower restaurant, given that dish k=2 is assigned to this table, there are two options for how the table in the higher restaurant could be selected. It could be an existing table \(\psi^{H}_{2}\) or a new table \(\psi^{H}_{\mathit{new}}\), both serving dish k=2. Under the minimal path assumption, it is always assigned to an existing table (if possible) and only assigned to a new table if there is no table with the given dish. In this case, the minimal path assumption will assign \(\psi^{L}_{2}\) to \(\psi^{H}_{2}\)

Now that we have introduced our notations, the conditional distribution for z c,t,n is
$$\begin{aligned} &P\bigl(z _{c,t,n}\mid w _{c,t,n}, \boldsymbol{z}^{-{c, t, n}}, \boldsymbol{w}^{-{c, t, n}}, \boldsymbol{l}, *\bigr) \\ &\quad\propto P\bigl(z _{c,t,n} \mid\boldsymbol{z}^{-{c, t, n}}\bigr) P\bigl(w _{c,t,n} \mid z _{c,t,n}, \boldsymbol{w}^{-{c, t, n}}, \boldsymbol {l}, *\bigr) \end{aligned}$$
(2)
The first factor is the prior probability of assigning the token to a path according to the minimal path assumption (Wallach 2006, p. 60),
$$ P\bigl(z _{c,t,n} = k \mid\boldsymbol{z}^{-{c, t, n}}\bigr) \propto \frac{ N _{c,{s_t},k} ^{-{c, t, n}} + \alpha_c \frac{ N _{c,k} ^{-{c, t, n}} + \alpha_0 \frac{ N_k ^{-{c, t, n}} + \alpha\frac{1}{K^+}}{N_{\cdot} ^{-{c, t, n}} + \alpha}}{N _{c,\cdot}^{-{c, t, n}} + \alpha_0}}{ N _{c,{s_t},\cdot}^{-{c, t, n}} + \alpha_c}, $$
(3)
where K + is the current number of shared topics.5 Intuitively, (3) computes the probability of token \(w_{c,t,n}\) being generated from a shared topic k. This probability is proportional to \(N _{c,{s_{t}},k}\), the number of customers sitting at the table serving dish k at restaurant \(G _{c,{s_{t}}}\), smoothed by the probability of generating this token from the table serving dish k at the higher-level restaurant (i.e., restaurant \(G_c\)). This smoothing probability is computed in the same hierarchical manner until the top restaurant is reached, where the base distribution over topics is uniform and the probability of picking a topic is equal to 1/K +. Equation (3) also captures the case where a table is empty; when the number of customers at that table is zero, the probability of generating the token from the corresponding topic relies entirely on the smoothing probability from the higher-level restaurant’s table.
The second factor is the data likelihood. After integrating out all ψ’s, we have
$$ P\bigl(w _{c,t,n} = w \mid z _{c,t,n} = k, \boldsymbol {w}^{-{c, t, n}}, \boldsymbol{l}, *\bigr) \propto \left \{ \begin{array}{l@{\quad}l} \frac{M _{k, w} ^{-{c, t, n}} + \lambda}{M _{k, \cdot}^{-{c, t, n}} + V\lambda}, & \hbox{if $k$ exists;} \\ \frac{1}{V}, & \hbox{if $k$ is new.} \end{array} \right . $$
(4)
Here, \(M_{k,w}\) denotes the number of times word type w in the vocabulary is assigned to topic k; marginal counts are represented with ⋅ and ∗ represents all hyperparameters; V is the size of the vocabulary, and the superscript −(c,t,n) denotes the same counts excluding \(w_{c,t,n}\).
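
The following sketch illustrates how the unnormalized weights in (3) and (4) might be computed for a single token under the minimal path assumption. The argument names, the count arrays, and the handling of the brand-new-topic case are illustrative assumptions, not the authors’ released implementation.

```python
import numpy as np

def topic_assignment_weights(N_csk, N_ck, N_k, M_kw, alpha_c, alpha_0, alpha, lam, V, word):
    """Unnormalized weights for assigning one token to each of the K existing shared
    topics, or to a brand-new topic, following the structure of Eqs. (3) and (4).
    All count arrays already exclude the token being resampled."""
    K = len(N_k)
    # Hierarchical smoothing from the corpus-level restaurant down to the segment level
    p_corpus = (N_k + alpha / K) / (N_k.sum() + alpha)                 # uniform base 1/K+ at the top
    p_conv = (N_ck + alpha_0 * p_corpus) / (N_ck.sum() + alpha_0)      # conversation-level restaurant
    prior = N_csk + alpha_c * p_conv                                    # Eq. (3), up to a constant factor
    likelihood = (M_kw[:, word] + lam) / (M_kw.sum(axis=1) + V * lam)   # Eq. (4), existing topics
    weights = prior * likelihood
    # A brand-new topic: propagate an empty count through the hierarchy, uniform 1/V likelihood
    new_p_conv = alpha_0 * (alpha / K) / (N_k.sum() + alpha) / (N_ck.sum() + alpha_0)
    return np.append(weights, alpha_c * new_p_conv / V)

# A Gibbs step then draws z_{c,t,n} with probability proportional to these weights, e.g.
# z = rng.choice(len(weights), p=weights / weights.sum())
```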

3.3.2 Sampling topic shift indicators

Sampling the topic shift variable l c,t requires us to consider merging or splitting segments. We define the following notation:
  • k c,t : the shared topic indices of all tokens in turn t of conversation c.

  • \(S _{a _{c,t}, x}\): the number of times speaker a c,t is assigned the topic shift with value x∈{0,1}.

  • \(J^{x} _{c, s}\): the number of topics in segment s of conversation c if l c,t =x.

  • \(N^{x} _{c, s, j}\): the number of tokens assigned to the segment-level topic j when l c,t =x.6

Again, the superscript −(c,t) is used to denote the exclusion of turn t of conversation c in the corresponding counts.
Recall that the topic shift is a binary variable. We use 0 to represent the “no shift” case, i.e. when the topic distribution is identical to that of the previous turn. We sample this assignment with the following probability:
$$\begin{aligned} & P\bigl(l _{c,t} = 0 \mid\boldsymbol{l}^{-{c, t}}, \boldsymbol {w}, \boldsymbol{k}, \boldsymbol{a}, \ast\bigr) \\ &\quad\propto \frac{S ^{-{c, t}} _{a _{c,t}, 0} + \gamma}{S ^{-{c, t}} _{a _{c,t}, \cdot}+ 2 \gamma} \times \frac{\alpha_c^{J^0 _{c, s_t}} \prod_{j=1}^{J^0 _{c, s_t}} (N^0 _{c, s_t, j} - 1)!}{\prod_{x=1}^{N^0 _{c, s_t, \cdot}} (x-1+\alpha_c)} \end{aligned}$$
(5)
In (5), the first factor is proportional to the probability of assigning a topic shift of value 0 to speaker a c,t and the second factor is proportional to the joint probability of all topics in segment s t of conversation c when l c,t =0.7
The other alternative is for the topic shift to be 1, which represents the introduction of a new distribution over topics inside an existing segment. The probability of sampling this assignment is:
$$\begin{aligned} &P\bigl(l _{c,t} = 1 \mid\boldsymbol{l}^{-{c, t}}, \boldsymbol {w}, \boldsymbol{k}, \boldsymbol{a}, \ast\bigr) \\ &\quad\propto \frac{S ^{-{c, t}} _{a _{c,t}, 1} + \gamma}{S ^{-{c, t}} _{a _{c,t}, \cdot}+ 2 \gamma} \times \biggl( \frac{\alpha_c^{J^1 _{c, (s_{t}-1)}} \prod_{j=1}^{J^1 _{c, (s_{t}-1)}} (N^1 _{c, (s_{t}-1), j} - 1)!}{\prod_{x=1}^{N^1 _{c, (s_{t}-1), \cdot}} (x-1+\alpha_c)} \frac{\alpha_c^{J^1 _{c, s_{t}}} \prod_{j=1}^{J^1 _{c, s_{t}}} (N^1 _{c, s_{t}, j} - 1)!}{\prod_{x=1}^{N^1 _{c, s_{t}, \cdot}} (x-1+\alpha_c)} \biggr) \end{aligned}$$
(6)
As above, the first factor in (6) is proportional to the probability of assigning a topic shift of value 1 to speaker a c,t ; the second factor in the big bracket is proportional to the joint distribution of the topics in segments s t −1 and s t . In this case, l c,t =1 means splitting the current segment, which results in two joint probabilities for two segments.
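
A minimal sketch of the segment-level factor shared by (5) and (6), i.e. \(\alpha_c^{J} \prod_{j=1}^{J} (N_j - 1)! / \prod_{x=1}^{N_{\cdot}} (x-1+\alpha_c)\), computed in log space for numerical stability; the split/merge decision then compares this factor evaluated under the two hypotheses. Function and variable names are illustrative.

```python
from math import lgamma, log

def log_segment_factor(topic_counts, alpha_c):
    """Log of  alpha_c^J * prod_j (N_j - 1)!  /  prod_{x=1}^{N.} (x - 1 + alpha_c),
    where topic_counts[j] = N_j is the number of tokens assigned to segment-level
    topic j, J = len(topic_counts), and N. is the sum of the counts."""
    J = len(topic_counts)
    total = sum(topic_counts)
    log_numerator = J * log(alpha_c) + sum(lgamma(n) for n in topic_counts)  # (N_j - 1)! = Gamma(N_j)
    log_denominator = lgamma(total + alpha_c) - lgamma(alpha_c)              # prod_x (x - 1 + alpha_c)
    return log_numerator - log_denominator

# Comparing the two hypotheses for turn t: keep the current segment merged (l = 0)
# versus split it into two segments at turn t (l = 1)
merged = log_segment_factor([6, 3, 1], alpha_c=1.0)
split = log_segment_factor([4, 2], 1.0) + log_segment_factor([2, 1, 1], 1.0)
```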

4 Data and annotations

We validate our approach using five different datasets (Table 1). In this section, we describe the properties of each of the datasets and what information is available from the data. The datasets with interesting existing annotations typically are small and specialized. After validating our approach on simpler datasets, we move to larger datasets that we can explore qualitatively or by annotating them ourselves.
Table 1

Summary of datasets detailing how many distinct speakers are present, how many distinct conversations are in the corpus, the annotations available, and the general content of the dataset. The † marks datasets we annotated

Datasets                  Speakers   Conversations   Annotations   Content
icsi Meetings             60         75              Topics        Engineering
2008 Debates              9          4               Topics        Politics
2012 Debates              40         9               None          Politics
Crossfire                 2567       1134            Influencer    Politics
Wikipedia discussions†    604        1991            Influencer    Varied

4.1 Datasets

We first describe the datasets that we use in our experiments. For all datasets, we tokenize texts using Opennlp’s tokenizer and remove common stopwords.8 After that, we remove turns that are very short, since they carry little content and are unlikely to contain a topic shift. Empirically, we remove turns that have fewer than 5 tokens after removing stopwords.
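
A minimal preprocessing sketch consistent with this description; it uses NLTK’s tokenizer and stopword list as stand-ins for the OpenNLP tokenizer and stopword list actually used, and the threshold of 5 tokens comes from the text above.

```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words("english"))
MIN_TOKENS = 5  # turns shorter than this (after stopword removal) are dropped

def preprocess_turn(text):
    """Tokenize, lowercase, and drop stopwords and non-alphabetic tokens."""
    tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    return [w for w in tokens if w not in STOPWORDS]

def preprocess_conversation(turns):
    """turns is a list of (speaker, text) pairs; keep only sufficiently long turns."""
    processed = [(speaker, preprocess_turn(text)) for speaker, text in turns]
    return [(s, toks) for s, toks in processed if len(toks) >= MIN_TOKENS]
```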

The icsi meeting corpus

The icsi Meeting Corpus consists of 75 transcribed meetings at the International Computer Science Institute in Berkeley, California (Janin et al. 2003). Among these, 25 meetings were annotated with reference segmentations (Galley et al. 2003). Segmentations are binary, i.e., each point in the document is either a segment boundary or not, and on average each meeting has 8 segment boundaries. We use this dataset for evaluating topic segmentation (Sect. 5). After preprocessing, there are 60 unique speakers and the vocabulary contains 3346 non-stopword tokens.

The 2008 presidential election debates

Our second dataset contains three annotated presidential debates between Barack Obama and John McCain and a vice presidential debate between Joe Biden and Sarah Palin (Boydstun et al. 2013). Each turn is one of two types: questions (Q) from the moderator or responses (R) from a candidate. Each clause in a turn is coded with a Question Topic Code (T Q ) and a Response Topic Code (T R ). Thus, a turn has a list of T Q ’s and T R ’s both of length equal to the number of clauses in the turn. Topics are from the Policy Agendas Topics Codebook, a widely used inventory containing codes for 19 major topics and 225 subtopics.9 Table 2 shows an example annotation.
Table 2

Example turns from the annotated 2008 election debates (Boydstun et al. 2013). Each clause in a turn is coded with a Question Topic Code (T Q ) and a Response Topic Code (T R ). The topic codes (T Q and T R ) are from the Policy Agendas Topics Codebook. In this example, the following topic codes are used: Macroeconomics (1), Housing & Community Development (14), Government Operations (20)

Brokaw (Q): “Sen. Obama, time for a discussion. I’m going to begin with you. Are you saying to Mr. Clark and to the other members of the American television audience that the American economy is going to get much worse before it gets better and they ought to be prepared for that?” [T Q = 1; T R = N/A]

Obama (R): “No, I am confident about the American economy.” [T Q = 1; T R = 1]

Obama (R, same turn): “But most importantly, we’re going to have to help ordinary families be able to stay in their homes, make sure that they can pay their bills, deal with critical issues like health care and energy, and we’re going to have to change the culture in Washington so that lobbyists and special interests aren’t driving the process and your voices aren’t being drowned out.” [T Q = 1; T R = 14]

Brokaw (Q): “Sen. McCain, in all candor, do you think the economy is going to get worse before it gets better?” [T Q = 1; T R = N/A]

McCain (R): “I think if we act effectively, if we stabilize the housing market—which I believe we can,” [T Q = 1; T R = 14]

McCain (R, same turn): “if we go out and buy up these bad loans, so that people can have a new mortgage at the new value of their home.” [T Q = 1; T R = 14]

McCain (R, same turn): “I think if we get rid of the cronyism and special interest influence in Washington so we can act more effectively.” [T Q = 1; T R = 20]

To obtain reference segmentations in debates, we assign each turn a real value from 0 to 1 indicating how much a turn changes the topic. For a question-typed turn, the score is the fraction of clause topic codes not appearing in the previous turn; for response-typed turns, the score is the fraction of clause topic codes that do not appear in the corresponding question. This results in a set of non-binary reference segmentations. For evaluation metrics that require binary segmentations, we create a binary segmentation by labeling a turn as a segment boundary if the computed score is 1. This threshold is chosen to include only true segment boundaries. After preprocessing, this dataset contains 9 unique speakers and the vocabulary contains 1,761 non-stopword tokens.
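
The scoring rule just described can be written compactly. The following sketch (with hypothetical function names) computes the per-turn topic-change score and the derived binary boundaries:

```python
def turn_change_score(turn_codes, reference_codes):
    """Fraction of this turn's clause topic codes that do not appear in the reference
    codes (the previous turn for questions, the corresponding question for responses)."""
    if not turn_codes:
        return 0.0
    reference = set(reference_codes)
    return sum(code not in reference for code in turn_codes) / len(turn_codes)

def binary_boundaries(scores, threshold=1.0):
    """A turn is a segment boundary only when none of its codes were seen before."""
    return [score >= threshold for score in scores]

# Example from Table 2: McCain's response has codes [14, 14, 20] against question code [1]
score = turn_change_score([14, 14, 20], [1])  # 1.0, so this turn is a boundary
```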

The 2012 republican primary debates

We also downloaded transcripts of nine 2012 Republican Party presidential debates, summarized in Table 3. Since the transcripts are pulled from different sources, we perform a simple entity resolution step using edit distance to merge duplicate participants’ names. For example, “Romney” and “Mitt Romney” are resolved into “Romney”; “Paul”, “Rep. Paul”, and “Representative Ron Paul R-TX” are resolved into “Paul”; etc. We also merge anonymous participants such as “Unidentified Female”, “Unidentified Male”, “Question”, and “Unknown” into a single participant named “Audience”. After preprocessing, there are 40 unique participants in these 9 debates, including candidates, moderators, and audience members. This dataset is not annotated, and we only use it for qualitative evaluation.
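
A sketch of this kind of merge, using Python’s difflib as a stand-in for an edit-distance library; the canonical name list, the anonymous-name set, and the similarity cutoff are illustrative assumptions:

```python
from difflib import SequenceMatcher

CANONICAL = ["Bachmann", "Cain", "Gingrich", "Huntsman", "Paul", "Pawlenty",
             "Perry", "Romney", "Santorum"]
ANONYMOUS = {"unidentified female", "unidentified male", "question", "unknown"}

def resolve_speaker(raw_name, cutoff=0.8):
    """Map a raw transcript name to a canonical participant name."""
    name = raw_name.strip().lower()
    if name in ANONYMOUS:
        return "Audience"
    tokens = name.replace(".", " ").split()
    for canon in CANONICAL:
        # Last-name match ("Mitt Romney" -> "Romney"), or a close overall string match
        # to absorb minor transcription variants
        if canon.lower() in tokens or SequenceMatcher(None, canon.lower(), name).ratio() >= cutoff:
            return canon
    return raw_name.strip()

resolve_speaker("Representative Ron Paul R-TX")  # -> "Paul"
resolve_speaker("Unidentified Male")             # -> "Audience"
```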
Table 3

List of the 9 Republican Party presidential debates used

Date          Place             Sponsor   Participants
13 Jun. 2011  Goffstown, NH     CNN       Bachmann, Cain, Gingrich, Paul, Pawlenty, Romney, Santorum
12 Sep. 2011  Tampa, FL         CNN       Bachmann, Cain, Gingrich, Huntsman, Paul, Perry, Romney, Santorum
18 Oct. 2011  Las Vegas, NV     CNN       Bachmann, Cain, Gingrich, Paul, Perry, Romney, Santorum
09 Nov. 2011  Rochester, MI     CNBC      Bachmann, Cain, Gingrich, Huntsman, Paul, Perry, Romney, Santorum
22 Nov. 2011  Washington, DC    CNN       Bachmann, Cain, Gingrich, Huntsman, Paul, Perry, Romney, Santorum
19 Jan. 2012  Charleston, SC    CNN       Gingrich, Paul, Romney, Santorum
23 Jan. 2012  Tampa, FL         NBC       Gingrich, Paul, Romney, Santorum
26 Jan. 2012  Jacksonville, FL  CNN       Gingrich, Paul, Romney, Santorum
22 Feb. 2012  Mesa, AZ          CNN       Gingrich, Paul, Romney, Santorum

CNN’s crossfire

Crossfire was a weekly U.S. television “talking heads” program engineered to incite heated arguments (hence the name). Each episode features two recurring hosts, two guests, and clips from the week’s news. Our Crossfire dataset contains 1134 transcribed episodes aired between 2000 and 2004.10 There are 2567 unique speakers and the vocabulary size is 16,791. Unlike the previous two datasets, Crossfire does not have explicit topic segmentations, so we use it to explore speaker-specific characteristics (Sect. 6.2).

Wikipedia discussions

Each article on Wikipedia has a related discussion page so that the individuals writing and editing the article can discuss the content, editorial decisions, and the application of Wikipedia policies (Butler et al. 2008). Unlike the other situations, Wikipedia discussions are not spoken conversations that have been transcribed. Instead, these conversations are written asynchronously.

However, Wikipedia discussions share many of the same properties as our other corpora. Contributors have different levels of responsibility and prestige, and many contributors are actively working to persuade the group to accept their proposed policies (for an example, see Table 4), while others are attempting to maintain civility, and still others are attacking their ostensible collaborators.
Table 4

Example of a Wikipedia discussion in our dataset

A:

The current lead sentence has been agreed upon by many—I know, I was embroiled in the huge debate that developed into the current lead. However, the sentence is still kinda awkward—even though it captures the broader essence of evolutionary theory. I would like to propose an alternate (below), because there is a problem with the way that the term change is used, as Kirk J. Fitzhugh has noted: “Change is not the pertinent quality of interest in evolution”. Hence: Evolution is the gradual departure across successive generations in the constituency of the inherited characteristics of organisms in biological populations.

B:

No thank you, this is just more obscurantism.

A:

It’s wp:V, not obscurantism, consistent with the history of the science. Not much thought goes into conceiving that “Evolution is change”, but if you are asked to think past this and call it obscurantism in your critique, it is a strange response. Obscurantism: “is the practice of deliberately preventing the facts or the full details of some matter from becoming known”—ironic that this applies more aptly to your rejection.

B:

Your obsession with providing the most scientifically accurate and current definition of evolution prevents the average reader from having a chance at understanding this article. That is obscurantism. It is not WPV, because that definition is not by a longshot the most commonly used, and specifically it is entirely unsuited for works meant to be read by lay readers.

C:

This is a general encyclopedia, not a graduate level evolutionary biology course. Keeping it simple so that people can understand what we write without having an advanced degree is a good thing. So no, let’s keep the lead as is.

Unlike spoken conversations, Wikipedia discussions lack social norms that prevent an individual from writing as often or as much as they want. This makes common techniques such as counting turns or turn lengths less helpful measures to discover who influencers are.

4.2 Influencer annotation

Our goal is to discover who the influencers are in these discussions. To assess our ability to discover influencers, we annotated randomly selected documents from both the Wikipedia and Crossfire datasets. This process proceeded as follows. First, we followed the annotation guidelines for influencers proposed by Bender et al. (2011) for Wikipedia discussions. A discussant is considered an influencer if he or she initiated a topic shift that steered the conversation in a different direction, convinced others to agree to a certain viewpoint, or used an authoritative voice that caused others to defer to or reference that person’s expertise. A discussant is not identified as an influencer if he or she merely initiated a topic at the start of a conversation, did not garner any support from others for the points he or she made, or was not recognized by others as speaking with authority. After annotating an initial set of documents, we revised our annotation guidelines and retrained two independent annotators until we reached an intercoder reliability (Cohen’s Kappa; Artstein and Poesio 2008) of 0.8.11

Wikipedia discussions

Coders first learned to annotate transcripts using Wikipedia discussion data. The two coders annotated over 400 English Wikipedia discussion transcripts for influencers in batches of 20 to 30 transcripts each week. For the English transcripts, each coder annotated the transcripts independently, then annotations were compared for agreement; any discrepancies in the annotations were resolved through discussion of how to apply the coding scheme. After the first four sets of 20 to 30 transcripts, the coders were able to code the transcripts with acceptable intercoder reliability (Cohen’s Kappa >0.8). Once the coders reached acceptable intercoder reliability for two sets of English data in a row, the coders began independently coding the remaining set of transcripts. Intercoder reliability was maintained at an acceptable level (Cohen’s Kappa >0.8) for the English transcripts over the subsequent weeks of coding.

Crossfire

We then turned our attention to the Crossfire dataset. We split each Crossfire episode into smaller segments using the “Commercial_Break” tags and use each segment as a unit of conversation. The same two coders annotated the Crossfire data. To prepare for annotating the Crossfire interactions, the coders both annotated the same set of 20 interactions. First the intercoder reliability Cohen’s Kappa was calculated for the agreement between the coders, then any disagreements between the coders were resolved through discussion about the discrepant annotations. The first set of 20 transcripts was coded with a Cohen’s Kappa of 0.65 (before discussion). This procedure was repeated twice; each time the coders jointly annotated 20 transcripts, reliability was calculated, and any discrepancies were resolved through discussion. The third set achieved an acceptable Cohen’s Kappa of 0.8. The remaining transcripts were then split and annotated separately by the two coders. In all, 105 Crossfire episode segments were annotated. An annotation guideline for Crossfire is included in the Appendix B.

5 Evaluating topic segmentation

In this section, we examine how well SITS can identify when new topics are introduced, i.e., how well it can segment conversations. We discuss metrics for evaluating an algorithm’s segmentation relative to a gold annotation, describe our experimental setup, and report those results.

5.1 Experiment setups

Evaluation metrics

To evaluate the performance on topic segmentation, we use P k  (Beeferman et al. 1999) and WindowDiff (WD) (Pevzner and Hearst 2002). Both metrics measure the probability that two points in a document will be incorrectly separated by a segment boundary. Both techniques consider all windows of size k in the document and count whether the two endpoints of the window are (im)properly segmented against the gold segmentation. More formally, given a reference segmentation \(\mathcal{R}\) and a hypothesized segmentation \(\mathcal{H}\), the value of P k for a given window size k is defined as follows:
$$ P_k = \frac{\sum_{i=1}^{N-k} \delta_{\mathcal{H}}(i, i+k) \oplus \delta _{\mathcal{R}}(i, i+k)}{N - k} $$
(7)
where \(\delta_{\mathcal{X}}(i,j)\) is 1 if the segmentation \(\mathcal {X}\) assigns i and j to the same segment and 0 otherwise; ⊕ denotes the Xor operator; N is the number of candidate boundaries.
WD improves P k by considering how many boundaries lie between two points in the document, instead of just looking at whether the two points are separated or not. WD of size k between two segmentations \(\mathcal{H}\) and \(\mathcal{R}\) is defined as:
$$ \mbox{WD} = \frac{\sum_{i=1}^{N-k} [|b_{\mathcal{H}}(i,i+k) - b_{\mathcal{R}}(i, i+k)| > 0 ]}{N-k} $$
(8)
where \(b_{\mathcal{X}}(i,j)\) counts the number of boundaries that the segmentation \(\mathcal{X}\) puts between two points i and j.
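
A minimal reference implementation of these two metrics, operating on binary boundary indicator lists (an assumed input format in which boundaries[i] = 1 means there is a segment boundary immediately after position i):

```python
def same_segment(boundaries, i, j):
    """True if positions i and j fall in the same segment (no boundary strictly between them)."""
    return sum(boundaries[i:j]) == 0

def p_k(reference, hypothesis, k):
    """P_k: fraction of width-k windows on which the two segmentations disagree about
    whether the window's endpoints lie in the same segment (Beeferman et al. 1999)."""
    N = len(reference)
    errors = sum(same_segment(reference, i, i + k) != same_segment(hypothesis, i, i + k)
                 for i in range(N - k))
    return errors / (N - k)

def window_diff(reference, hypothesis, k):
    """WindowDiff: fraction of width-k windows in which the two segmentations place
    a different number of boundaries (Pevzner and Hearst 2002)."""
    N = len(reference)
    errors = sum(sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
                 for i in range(N - k))
    return errors / (N - k)
```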

However, these metrics have a major drawback. They require both hypothesized and reference segmentations to be binary. Many algorithms (e.g., probabilistic approaches) give non-binary segmentations where candidate boundaries have real-valued scores (e.g., probability or confidence). Thus, evaluation requires arbitrary thresholding to binarize soft scores. In previous work, to be fair to all methods, thresholds are usually set so that the number of segments equals a predefined value (Purver et al. 2006; Galley et al. 2003). In practice, this value is usually unknown.

To overcome these limitations, we also use \(\widehat{EMD}\) (Pele and Werman 2008), a variant of the Earth Mover’s Distance (emd). Originally proposed by Rubner et al. (2000), emd is a metric that measures the distance between two normalized histograms. Intuitively, it measures the minimal cost that must be paid to transform one histogram into the other. emd is a true metric only when the two histograms are normalized (e.g., two probability distributions). \(\widehat{EMD}\) relaxes this restriction to define a metric for non-normalized histograms by adding or subtracting masses so that both histograms are of equal size.

Applied to our segmentation problem, each segmentation can be considered a histogram where each candidate boundary point corresponds to a bin. The probability of each point being a boundary is the mass of the corresponding bin. We use |i−j| as the ground distance between two points i and j.12 To compute \(\widehat{EMD}\) we use the Fastemd implementation (Pele and Werman 2009).

Experimental methods

We applied the following methods to discover topic segmentations in a conversation:
  • TextTiling (Hearst 1997) is one of the earliest and most widely used general-purpose topic segmentation algorithms, sliding a fixed-width window to detect major changes in lexical similarity.

  • P-NoSpeaker-single: parametric version of SITS without speaker identity, run individually on each conversation (Purver et al. 2006).

  • P-NoSpeaker-all: parametric version of SITS without speaker identity run on all conversations.

  • P-SITS: the parametric version of SITS with speaker identity run on all conversations.

  • NP-HMM: the HMM-based nonparametric model with speaker identity. This model uses the same assumption as the Sticky hdp-hmm (Fox et al. 2008), where a single topic is associated with each turn.

  • NP-SITS: the nonparametric version of SITS with speaker identity run on all conversations.

Parameter settings and implementation

In our experiment, all parameters of TextTiling are the same as in Hearst (1997). For statistical models, Gibbs sampling with 10 randomly initialized chains is used. Initial hyperparameter values are sampled from U(0,1) to favor sparsity; statistics are collected after 500 burn-in iterations with a lag of 25 iterations over a total of 5000 iterations; and slice sampling (Neal 2003) optimizes hyperparameters. Parametric models are run with 25, 50 and 100 topics and the best results (averaged over 10 chains) are reported.
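
The sampling schedule described above corresponds to a loop like the following sketch; the sampler object and its methods are hypothetical placeholders for whatever implements the updates in Sect. 3.3.

```python
BURN_IN, LAG, TOTAL_ITERATIONS, NUM_CHAINS = 500, 25, 5000, 10

def run_chain(sampler, seed):
    """Run one Gibbs chain, collecting posterior samples after burn-in at a fixed lag."""
    sampler.initialize(seed=seed)               # hyperparameters drawn from U(0, 1)
    samples = []
    for it in range(1, TOTAL_ITERATIONS + 1):
        sampler.gibbs_iteration()               # resample l and z for every turn and token
        sampler.slice_sample_hyperparameters()  # slice sampling step (Neal 2003)
        if it > BURN_IN and (it - BURN_IN) % LAG == 0:
            samples.append(sampler.current_state())
    return samples

# Reported statistics are averaged over NUM_CHAINS independently initialized chains.
```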

5.2 Results and analysis

Table 5 shows the performance of various models on the topic segmentation problem, using the icsi corpus and the 2008 debates.
Table 5

Results on the topic segmentation task. Lower is better. The parameter k is the window size of the metrics P k and WindowDiff chosen to replicate previous results

 

Model                 \(\widehat{EMD}\)   P k                     WindowDiff
                                  k=5    k=10   k=15    k=5    k=10   k=15

ICSI
TextTiling            2.507      .289   .388   .451    .318   .477   .561
P-NoSpeaker-single    1.949      .222   .283   .342    .269   .393   .485
P-NoSpeaker-all       1.935      .207   .279   .335    .253   .371   .468
P-SITS                1.807      .211   .251   .289    .256   .363   .434
NP-HMM                2.189      .232   .257   .263    .267   .377   .444
NP-SITS               2.126      .228   .253   .259    .262   .372   .440

2008 Debates
TextTiling            2.821      .433   .548   .633    .534   .674   .760
P-NoSpeaker-single    2.822      .426   .543   .653    .482   .650   .756
P-NoSpeaker-all       2.712      .411   .522   .589    .479   .644   .745
P-SITS                2.269      .380   .405   .402    .482   .625   .719
NP-HMM                2.132      .362   .348   .323    .486   .629   .723
NP-SITS               1.813      .332   .269   .231    .470   .600   .692

Consistent with previous results in the literature, probabilistic models outperform TextTiling. In addition, among the probabilistic models, the models that had access to speaker information consistently segment better than those lacking such information. Furthermore, np-sits outperforms np-hmm in both experiments, suggesting that using a distribution over topics for turns is better than using a single topic. This is consistent with the parametric models in Purver et al. (2006).

The contribution of speaker identity seems more valuable in the debate setting. Debates are characterized by strong rewards for setting the agenda; dodging a question or moving the debate toward an opponent’s weakness can be useful strategies (Boydstun et al. 2013). In contrast, meetings (particularly the low-stakes icsi meetings, which are technical discussions within an r&d group) tend to have pragmatic rather than strategic topic shifts. In addition, agenda-setting roles are clearer in formal debates; a moderator is tasked with setting the agenda and ensuring the conversation does not wander too much.

The nonparametric model does best on the smaller debate dataset. We suspect that an evaluation that directly assessed topic quality, either via prediction (Teh et al. 2006) or interpretability (Chang et al. 2009b), would favor the nonparametric model more.

6 Evaluating topic control

In this section, we focus on the ability of SITS to capture the extent to which individual speakers affect topic shifts in conversations. Recall that SITS associates with each speaker a topic shift tendency π that represents the probability of changing the topic in the conversation. While topic segmentation is a well-studied problem (hence the evaluation in Sect. 5), there are no established quantitative measurements of an individual’s ability to control a conversation. To evaluate whether the tendency captures meaningful characteristics of speakers, we look qualitatively at the behavior of the model.

6.1 2008 election debates

To obtain a posterior estimate of π (Fig. 5), we create 10 chains with hyperparameters sampled from the uniform distribution U(0,1) and average π over the 10 chains (as described in Sect. 5.1). In these debates, Ifill is the moderator of the debate between Biden and Palin; Brokaw, Lehrer and Schieffer are the three moderators of the three debates between Obama and McCain. Here “Question” denotes questions from audience members in the “town hall” debate. The role of this “speaker” can be considered equivalent to the debate moderator.
Fig. 5

Topic shift tendency π of speakers in the 2008 Presidential Election Debates (larger means greater tendency). Ifill was the moderator in the vice presidential debate between Biden and Palin; Brokaw, Lehrer and Schieffer were the moderators in the three presidential debates between Obama and McCain; Question collectively refers to questions from the audiences

The topic shift tendencies of moderators are generally much higher than those of candidates. In the three debates between Obama and McCain, the moderators—Brokaw, Lehrer and Schieffer—have significantly higher scores than both candidates. This is a useful reality check, since in a debate the moderators are the ones asking questions and literally controlling the topical focus. Similarly, the “Question” speaker has a relatively high variance, consistent with that “participant” being an amalgamation of many distinct speakers in the model.

Interestingly, however, in the vice-presidential debate, the score of moderator Ifill is higher than the candidates’ scores only by a small margin, and it is indistinguishable from the degree of topic control displayed by Palin. Qualitatively, the assessment of the model is consistent with widespread perceptions and media commentary at the time that characterized Ifill as a weak moderator. For example, Harper’s Magazine’s Horton (2008) discusses the context of the vice-presidential debate, in particular the McCain campaign’s characterization of Ifill as a biased moderator because she “was about to publish a book entitled The Breakthrough that discusses Barack Obama, and a number of other black politicians, achieving national prominence”. According to Horton:

First, the charges against Ifill would lead to her being extremely passive in her questioning of Palin and permissive in her moderating the debate. Second, the charge of bias against Ifill would enable Palin to simply skirt any questions she felt uncomfortable answering and go directly to a pre-rehearsed and nonresponsive talking point. This strategy succeeded on both points.

Similarly, Fallows (2008) of The Atlantic included the following in his “quick guide” remarks on the debate:

Ifill, moderator: Terrible. Yes, she was constrained by the agreed debate rules. But she gave not the slightest sign of chafing against them or looking for ways to follow up the many unanswered questions or self-contradictory answers. This was the big news of the evening …

Palin: “Beat expectations.” In every single answer, she was obviously trying to fit the talking points she had learned to the air time she had to fill, knowing she could do so with impunity from the moderator.

That said, our quantitative modeling of topic shift tendency suggests that all candidates managed to succeed at some points in setting and controlling the topic of conversation in the debates. In the presidential debates, our model gives Obama a slightly higher score than McCain, consistent with social science claims that Obama had the lead in setting the agenda over McCain (Boydstun et al. 2013). Table 6 shows some examples of SITS-detected topic shifts.
Table 6

Examples of turns designated as topic shifts by SITS. We chose turns to highlight speakers with high topic shift tendency π. Some keywords are manually italicized to highlight the topics discussed.

2008 Debates Dataset

Previous turn: BIDEN: Well, mortgage-holders didn't pay the price […] Barack Obama pointed out two years ago that there was a subprime mortgage […]
Turn detected as shifting topic: PALIN: That is not so, but because that's just a quick answer, I want to talk about, again, my record on energy … When we talk about energy, we need to consider the need to do all that we can to allow this nation to become energy independent […]

Previous turn: PALIN: Your question to him was whether he supported gay marriage and my answer is the same as his and it is that I do not.
Turn detected as shifting topic: IFILL: Wonderful. You agree. On that note, let's move to foreign policy. You both have sons who are in Iraq or on their way to Iraq. You, Governor Palin, have said that you would like to see a real clear plan for an exit strategy. […]

Previous turn: MCCAIN: I think that Joe Biden is qualified in many respects. …
Turn detected as shifting topic: SCHIEFFER: […] Let's talk about energy and climate control. Every president since Nixon has said what both of you […]

Previous turn: IFILL: So, Governor, as vice president, there's nothing that you have promised […] that you wouldn't take off the table because of this financial crisis we're in?
Turn detected as shifting topic: BIDEN: Again, let me–let's talk about those tax breaks. [Obama] voted for an energy bill because, for the first time, it had real support for alternative energy. […] on eliminating the tax breaks for the oil companies, Barack Obama voted to eliminate them. […]

Crossfire Dataset

Previous turn: PRESS: But what do you say, governor, to Governor Bush and […] your party who would let politicians and not medical scientists decide what drugs are distributed […]
Turn detected as shifting topic: WHITMAN: Well I disagree with them on this particular issues […] that's important to me that George Bush stands for education of our children […] I care about tax policy, I care about the environment. I care about all the issues where he has a proven record in Texas […]

Previous turn: WEXLER: […] They need a Medicare prescription drug plan […] Talk about schools, […] Al Gore has got a real plan. George Bush offers us vouchers. Talk about the environment. […] Al Gore is right on in terms of the majority of Americans, but George Bush […]
Turn detected as shifting topic: KASICH: […] I want to talk about choice. […] George Bush believes that, if schools fail, parents ought to have a right to get their kids out of those schools and give them a chance and an opportunity for success. Gore says “no way” […] Social Security. George Bush says […] direct it the way federal employees do […] Al Gore says “No way” […] That's real choice. That's real bottom-up, not a bureaucratic approach, the way we run this country.

Previous turn: PRESS: Senator, Senator Breaux mentioned that it's President Bush's aim to start on education […] [McCain] […] said he was going to do introduce the legislation the first day of the first week of the new administration. […]
Turn detected as shifting topic: MCCAIN: After one of closest elections in our nation's history, there is one thing the American people are unanimous about. They want their government back. We can do that by ridding politics of large, unregulated contributions that give special interests a seat at the table while average Americans are stuck in the back of the room.

6.2 Crossfire

The Crossfire dataset has many more speakers than the presidential and vice-presidential debates. This allows us to examine more closely what we can learn about speakers’ topic shift tendency and ask additional questions; for example, assuming that changing the topic is useful for a speaker, how can we characterize who does so effectively? In our analysis, we take advantage of properties of the Crossfire data to examine the relationship between topic shift tendency, social roles, and political ideology.

In order to focus on frequent speakers, we filter out speakers with fewer than 30 turns. Most speakers have relatively small π, with the mode around 0.3. There are, however, speakers with very high topic shift tendencies. Table 7 shows the speakers having the highest values according to SITS.
Table 7

Top speakers by topic shift tendency from our Crossfire dataset. We mark hosts (†) and "speakers" who often (but not always) appeared in video clips (‡). Announcer makes announcements at the beginning and at the end of each show; Narrator narrates video clips; Male and Female refer to unidentified male and female speakers respectively; Question collectively refers to questions from the audience across different shows. Apart from those groups, speakers with the highest tendency were political moderates.

Rank  Speaker           π      Rank  Speaker                 π
1     Announcer         .884   10    John Kasich             .570
2     Male              .876   11    James Carville          .550
3     Question          .755   12    Tucker Carlson          .550
4     George W. Bush    .751   13    Paul Begala             .545
5     Bill Press        .651   14    Christine T. Whitman    .533
6     Female            .650   15    Terry McAuliffe         .529
7     Al Gore           .650   16    Mary Matalin            .527
8     Narrator          .642   17    John McCain             .524
9     Robert Novak      .587   18    Ari Fleischer           .522

We find three general patterns for who influences the course of a conversation in Crossfire. First, there are structural "speakers" that the show uses to frame and propose new topics: audience questions, news clips (e.g. many of Gore's and Bush's turns from 2000), and voiceovers. That SITS recovers these is reassuring, much as its treatment of the moderators in the 2008 debates was. Second, the stable of regular hosts receives high topic shift tendencies, which is again reasonable given their experience with the format and their ostensible moderation roles (though in practice they also stoke lively discussion).

The third category is more interesting. The remaining non-hosts with high topic shift tendency appear to be relative moderates on the political spectrum:
  • John Kasich, one of the few Republicans to support the assault weapons ban, who was elected in 2010 as the governor of Ohio, a swing state

  • Christine Todd Whitman, former Republican governor of New Jersey, a very Democratic state

  • John McCain, who before 2008 was known as a “maverick” for working with Democrats (e.g. Russ Feingold).

Although these observations are at best preliminary and require further investigation, we would conjecture that in Crossfire’s highly polarized context, it was the political moderates who pushed back, exerting more control over the agenda of the discussion, rather than going along with the topical progression and framing as posed by the show’s organizers. Table 6 shows several detected topic shifts from these speakers. In two of these examples, McCain and Whitman are Republicans disagreeing with President Bush. In the other, Kasich is defending a Republican plan (school vouchers) popular with traditional Democratic constituencies.

6.3 2012 Republican primary debates

As another qualitative data point, we include in Fig. 6 the model’s topic shift tendency scores for a subset of nine 2012 Republican primary debates. Although we do not have objective measures to compare against, nor clearly stated contemporary commentary as in the case of Ifill’s performance as moderator, we would argue that the model displays quite reasonable face validity in the context of the Republican race.
Fig. 6 Topic shift tendency π of speakers in the 2012 Republican Primary Debates (larger means greater tendency). King, Blitzer and Cooper are the moderators in these debates; the rest are candidates.

For example, among the Republican candidates, Ron Paul is known for his tight focus on a discrete set of arguments associated with his position that "the proper role for government in America is to provide national defense, a court system for civil disputes, a criminal justice system for acts of force and fraud, and little else" (Paul 2007), often regardless of the specific question asked. Similarly, Rick Santorum's performance in the primary debates tended to include strong rhetoric on social issues. In contrast, Mitt Romney tended to be less aggressive in his responses, arguably playing things safer in a way that was consistent with his general position throughout the primaries as the front-runner.

7 Detecting influencers in conversations

7.1 Computational methods for influencer detection

In this section, we turn to the direct application and validation of the model in detecting influencers in conversations. Even though influence in conversations has been studied for decades in communication and social psychology, computational methods have only emerged in recent years, thanks to improvements in both the quantity and quality of conversational data. As one example, an early computational model to quantify influence between conversational participants (Basu et al. 2001) modeled interactions within a conversational group in a multi-sensor lounge room where people played interactive debating games. The model associates each participant with a Markov chain whose state at each time step is either speaking or silent, and the transition of one individual from one state to another is influenced by the other participants' states, allowing the model to capture pair-wise interactions among participants in the conversation. Zhang et al. (2005) extended this work with a two-level model: a participant level, representing the actions of individual participants, and a group level, representing group-level actions, so that the influence of each participant on the actions of the whole group is captured explicitly. These models, however, rely on expensive features such as prosody and visual cues.

Another popular approach is to treat influencer detection as a supervised classification problem that separates influential individuals from non-influential ones. Rienks and Heylen (2005) extract a set of structural features and use Support Vector Machines (SVMs; Cortes and Vapnik 1995) to predict participants' involvement. Rienks et al. (2006) later improved on this work by extending the feature set with features capturing topic changes as well as features derived from audio and speech. In contrast, we do not use any features extracted from audio or visual data, which makes our approach more generalizable. The two most relevant and useful features extracted from the meeting transcripts are the number of turns and the length of turns, which we use as baselines in our experiments described in Sect. 7.2. Biran et al. (2012) follow a similar approach to detect influencers in written online conversations, extracting features that capture conversational behaviors such as persuasion, agreement/disagreement and dialog patterns.

In this paper, we are interested in determining who the influencers in a conversation are using only the conversation transcripts, and we tackle this problem with an unsupervised ranking approach. It is worth noting that, even though we focus on how conversational influence is expressed in textual data, there is also a body of work approaching this problem through audio data (Hung et al. 2011), visual data (Otsuka et al. 2006), and combined audio-visual activity cues (Jayagopi et al. 2009; Aran and Gatica-Perez 2010).

Our main purpose in this experiment is to assess how effective SITS is at detecting influencers in conversations, especially in comparison with methods based on the structural patterns of conversations. We focus on the influencer detection problem: given a speaker in a multi-party conversation, predict whether the speaker is influential. In the remainder of this section, we describe in detail the approach we take, the experimental setup, and the results.

7.2 Influencer detection problem

The influencer detection problem can be tackled using different methods that can be broadly classified into classification and ranking approaches. Most previous work follows the classification approach, in which different sets of features are proposed and a classifier is used (Rienks and Heylen 2005; Rienks et al. 2006; Biran et al. 2012). In this paper, we follow the ranking approach.

The ranking approach allows us to focus on individual functions that take a set of individuals and produce an ordering over those individuals from most influential to least influential. The function that produces this ordering is called a ranking method. More specifically, given a speaker a in a conversation c, each ranking method will provide an influence score \(\mathcal{I} _{a, c}\) that indicates how influential speaker a is in conversation c. We emphasize that, unlike most classification approaches (Rienks and Heylen 2005; Rienks et al. 2006; Biran et al. 2012), the ranking approach we are focusing on is entirely unsupervised and thus requires no training data.

The ranking approach has a straightforward connection to the classification approach, as each ranking function can be turned into a feature in the supervised classification framework. However, viewing the ranking methods (features) independently allows us to compare and interpret the effectiveness of each feature in isolation. This is useful as an evaluation method because it is independent of the choice of classifier and is less sensitive to the size of training data, which is often a limiting factor in computational social science.

We consider two sets of ranking methods: (1) structure-based methods, which use structural features and (2) topic-change-based methods, which use features extracted from the outputs of SITS.

Structure-based methods score each instance based on features extracted from the structure of the conversation. As defined in Sect. 2, we use \(T_c\) to denote the number of turns in conversation c; \(a_{c,t}\) to denote the speaker that utters turn t in conversation c; and \(N_{c,t}\) to denote the number of tokens in turn t in conversation c.
  1. Number of turns: assumes that the more turns a speaker has during a conversation, the more influential he or she is. The influence score of this method is
    $$ \mathcal{I}_{a,c} = \bigl\vert \bigl\{ t \in [1, T_c] : a_{c,t} = a \bigr\} \bigr\vert $$
    (9)
  2. Total turn lengths: instead of the number of turns, this method uses the total length of turns uttered by the speaker,
    $$ \mathcal{I}_{a,c} = \sum_{t \in [1, T_c] : a_{c,t} = a} N_{c,t} $$
    (10)
The two structural features used here capture how active each speaker is during a conversation. Although simple, they are appropriate baselines for our experiment because they have proven effective for detecting influencers, both qualitatively (Bales 1970) and quantitatively (Rienks et al. 2006; Biran et al. 2012).
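Both structure-based scores can be computed directly from a conversation's turn sequence. The following is a minimal sketch of Eqs. (9) and (10); the data layout and names are illustrative assumptions rather than the exact implementation.

    from collections import Counter

    def structure_scores(speakers, turn_lengths):
        """Structure-based influence scores for one conversation c.
        speakers[t] is the speaker of turn t; turn_lengths[t] is the number
        of tokens in turn t."""
        num_turns = Counter(speakers)          # Eq. (9): number of turns per speaker
        total_length = Counter()
        for a, n in zip(speakers, turn_lengths):
            total_length[a] += n               # Eq. (10): total turn length per speaker
        return num_turns, total_length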

Topic-change-based methods score each instance based on features extracted from the posterior distributions of SITS.
  1. Total topic shifts: the expected total number of topic shifts that speaker a makes in conversation c,
    $$ \mathcal{I}_{a,c} = \sum_{t \in [1, T_c] : a_{c,t} = a} \bar{l}_{c,t} $$
    (11)
    Recall that in SITS, each turn t in conversation c is associated with a binary latent variable \(l_{c,t}\) indicating whether the topic changes at turn t (these latent variables are introduced in Sect. 3). Its expectation \(\bar{l}_{c,t}\) is computed as the empirical average of samples from the Gibbs sampler after a burn-in period.13 Intuitively, the higher \(\bar{l}_{c,t}\) is, the more successful speaker \(a_{c,t}\) is in changing the topic of the conversation at turn t.
  2. Weighted topic shifts: also quantifies the topic changes a speaker makes, using the average topic shift indicator \(\bar{l}_{c,t}\) weighted by \(1-\pi_a\), where \(\pi_a\) is the topic shift tendency of speaker a. The idea is that not all topic shifts should count equally: a successful topic shift by a speaker with a small topic shift tendency should be weighted more heavily than one by a speaker with a high topic shift tendency. The influence score of this ranking method is defined as
    $$ \mathcal{I}_{a,c} = (1 - \pi_a) \cdot \sum_{t \in [1, T_c] : a_{c,t} = a} \bar{l}_{c,t} $$
    (12)
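Given the posterior means \(\bar{l}_{c,t}\) and the estimated tendencies \(\pi_a\), both topic-change-based scores reduce to a few lines of code. The following is a minimal sketch under the assumption that these quantities have already been extracted from the sampler output; variable names are illustrative, not from a released implementation.

    def topic_shift_scores(speakers, l_bar, pi):
        """Topic-change-based influence scores for one conversation c.
        speakers[t]: speaker of turn t; l_bar[t]: posterior mean of the topic
        shift indicator for turn t; pi[a]: topic shift tendency of speaker a."""
        total = {}
        for a, l in zip(speakers, l_bar):
            total[a] = total.get(a, 0.0) + l            # Eq. (11): total topic shifts
        weighted = {a: (1.0 - pi[a]) * s for a, s in total.items()}  # Eq. (12)
        return total, weighted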
     

7.3 Experimental setup

Datasets

In this experiment, we use two datasets annotated for influencers: Crossfire and Wikipedia discussion pages. These two datasets and the annotation procedures are described in detail in Sect. 4. Table 8 shows dataset statistics.
Table 8

Statistics of the Crossfire and Wikipedia discussion datasets for which we annotated influencers. We use these two datasets to evaluate SITS on influencer detection.

Statistic                            Crossfire   Wikipedia
No. conversations                    3391        604
No. unique speakers                  2381        1991
Avg. no. turns per conversation      38.2        12.8
Avg. no. speakers per conversation   5           7
No. conversations annotated          85          48
No. positive instances               197         57
No. negative instances               182         338

Parameter settings and implementation

As before, we use Gibbs sampling with 10 randomly initialized chains for inference. Initial hyperparameter values are sampled from U(0,1) and statistics are collected after 200 burn-in iterations with a lag of 20 iterations over a total of 1000 iterations. Slice sampling optimizes the hyperparameters.
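For readers unfamiliar with the hyperparameter updates, a single univariate slice sampling step (Neal 2003) with stepping out and shrinkage looks roughly as follows; this is a generic sketch, not the exact update used in our implementation, and it assumes the log density returns negative infinity outside the hyperparameter's support.

    import math
    import random

    def slice_sample_step(log_density, x0, width=1.0, max_steps=50):
        """One slice sampling update for a scalar variable with the given log density."""
        log_y = log_density(x0) + math.log(random.random())  # auxiliary level under the density
        # Step out to find an interval containing the slice
        left = x0 - width * random.random()
        right = left + width
        steps = max_steps
        while steps > 0 and log_density(left) > log_y:
            left -= width
            steps -= 1
        steps = max_steps
        while steps > 0 and log_density(right) > log_y:
            right += width
            steps -= 1
        # Shrink the interval until a draw lands inside the slice
        while True:
            x1 = left + random.random() * (right - left)
            if log_density(x1) > log_y:
                return x1
            if x1 < x0:
                left = x1
            else:
                right = x1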

Evaluation measurements

To evaluate the effectiveness of each ranking method in detecting influencers, we use three standard evaluation measurements. The first is \(F_1\), the harmonic mean of precision and recall,
$$ F_1 = \frac{2 \cdot\mathrm{Precision} \cdot\mathrm {Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$
(13)
Even though \(F_1\) is widely used, an important disadvantage is that it only examines the subset of top-scored instances, which might be the "easiest" cases; this can bias comparisons between ranking methods. To overcome this problem, we also use AUC-ROC and AUC-PR, which measure the area under the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve, respectively. These two measurements compare ranking methods using the full ranked lists. Davis and Goadrich (2006) point out that the PR curve is more appropriate than the ROC curve for skewed datasets.
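As a concrete recipe, all three measurements can be computed from a ranked list with standard tools. The sketch below uses scikit-learn; the cutoff for turning scores into the binary predictions needed by \(F_1\) (here, predicting the top k instances as influencers, with k equal to the number of annotated positives) is our illustrative assumption rather than a detail fixed by the method.

    import numpy as np
    from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

    def evaluate_ranking(scores, labels, k=None):
        """Evaluate one ranking method against binary influencer labels."""
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=int)
        if k is None:
            k = int(labels.sum())              # assumed cutoff: predict top-k as influencers
        predicted = np.zeros_like(labels)
        predicted[np.argsort(-scores)[:k]] = 1
        return {
            "F1": f1_score(labels, predicted),
            "AUC-ROC": roc_auc_score(labels, scores),
            # average precision is a standard estimate of the area under the PR curve
            "AUC-PR": average_precision_score(labels, scores),
        }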

7.4 Results and analysis

Table 9 shows the results of the four ranking methods on the Crossfire and Wikipedia discussion datasets. Since we run our Gibbs samplers multiple times, the results of the two topic-change-based methods are reported with standard deviations across chains.
Table 9

Influencer detection results on Crossfire and Wikipedia discussion pages. For both datasets, topic-change-based methods (⋆) outperform structure-based methods (⋄) by large margins. For all evaluation measurements, higher is better.

Ranking method              F1            AUC-ROC       AUC-PR
Crossfire
  Num. of turns ⋄           .736          .795          .726
  Total turn lengths ⋄      .716          .782          .730
  Total topic shifts ⋆      .806±.0122    .858±.0068    .865±.0063
  Weighted topic shifts ⋆   .828±.0100    .869±.0078    .873±.0057
Wikipedia
  Num. of turns ⋄           .367          .730          .291
  Total turn lengths ⋄      .306          .732          .281
  Total topic shifts ⋆      .552±.0353    .752±.0144    .377±.0284
  Weighted topic shifts ⋆   .488±.0295    .749±.0149    .379±.0307

For both datasets, the two topic-change-based methods outperform the two structure-based methods by a large margin on all three evaluation measurements, and the standard deviations of the topic-change-based methods are relatively small across all three measurements. This shows the effectiveness of features based on topic changes for detecting influencers in conversations. In addition, the weighted topic shifts method generally performs better than the total topic shifts method, which provides strong evidence that SITS is capable of capturing speakers' propensity to change the topic. The improvement (if any) of weighted topic shifts over total topic shifts is more apparent on the Crossfire dataset than on the Wikipedia discussions. We argue that this is because conversations on Wikipedia discussion pages are generally shorter and involve more speakers than those in Crossfire debates, which leaves less evidence about each speaker's topic change behavior, so SITS struggles to capture it.

8 Conclusions and future work

SITS is a nonparametric hierarchical Bayesian model that jointly captures topics, topic shifts, and individuals’ tendency to control the topic in conversations. SITS takes a nonparametric topic modeling approach, representing each turn in a conversation as a distribution over topics and consecutive turns’ topic distributions as dependent on each other.

Crucially, SITS also models speaker-specific properties. As such, it improves performance on practical tasks such as unsupervised segmentation, but it also is attractive philosophically. Accurately modeling individuals is part of a broader research agenda that seeks to understand individuals’ values (Fleischmann et al. 2011), interpersonal relationships (Chang et al. 2009a), and perspective (Hardisty et al. 2010), which creates a better understanding of what people think based on what they write or say (Pang and Lee 2008). One particularly interesting direction is to extend the model to capture how language is coordinated during the conversation and how it correlates with influence (Giles et al. 1991; Danescu-Niculescu-Mizil et al. 2012).

The problem of finding influencers in conversation has been studied for decades by researchers in communication, sociology, and psychology, who have long acknowledged qualitatively the correlation between the ability of a participant to control conversational topic and his or her influence on other participants during the conversation. With SITS, we now introduce a computational technique for modeling more formally who is controlling the conversation. Empirical results on the two datasets we annotated (Crossfire TV show and Wikipedia discussion pages) show that methods based on SITS outperform previous methods that used conversational structure patterns in detecting influencers.

Using an unsupervised statistical model for detecting influencers is an appealing choice because it extends easily to other languages and to corpora that are multilingual (Mimno et al. 2009; Boyd-Graber and Blei 2009). Moreover, topic models offer opportunities for exploring large corpora (Zhai et al. 2012) in a wide range of domains including political science (Grimmer 2009), music (Hoffman et al. 2009), programming source code (Andrzejewski et al. 2007) or even household archaeology (Mimno 2011). Recent work has created frameworks for interacting with statistical models (Hu et al. 2011) to improve the quality of the latent space (Chang et al. 2009b), understand relationships with other variables (Gardner et al. 2010), and allow the model to take advantage of expert knowledge (Andrzejewski et al. 2009) or knowledge resources (Boyd-Graber et al. 2007).

This work opens several future directions. First, even though associating each speaker with a scalar that models their tendency to change the topic does improve performance on both topic segmentation and influencer detection tasks, it is obviously an impoverished representation of an individual’s conversational behaviors and could be enriched. For example, instead of just using a fixed parameter π for each conversational participant, one could extend the model to capture evolving topic shift tendencies of participants during the conversation. Modeling individuals’ perspective (Paul and Girju 2010), “side” (Thomas et al. 2006), or personal preferences for topics (Grimmer 2009) would also enrich the model and better illuminate the interaction of influence and topic.

Another important future direction is to extend the model to capture more explicitly the distinction between agenda setting and interaction influence. For example, questions or comments from the moderators during a political debate just shape the agenda of the debate and have little influence over how candidates would respond. Agenda setting does not have a direct effect on the views or opinions of others, and it does not try to sway the attitudes and beliefs of others. Agenda setting focuses generally on the topics that will be addressed, determining what those topics will be from the outset (McCombs and Reynolds 2009). It is during an interaction that an influencer is able to shape the discussion by shifting the interaction from one topic to another or providing evidence or expertise that can shape the opinions and judgments about the topics. To be identified as an influencer, however, others in the interaction must acknowledge or recognize the value of the expertise or agree with the opinion and viewpoints that have been offered. Thus, adding modules to find topic expertise (Marin et al. 2010) or agreement/disagreement (Galley et al. 2004) during the conversation would enable SITS to better detect influencers.

Understanding how individuals use language to influence others goes beyond conversational turn taking and topic control, however. In addition to what is said, often how something is expressed—i.e., the syntax—is nearly as important (Greene and Resnik 2009; Sayeed et al. 2012). Combining SITS with a model that can discover syntactic patterns (Sayeed et al. 2012) or multi-word expressions (Johnson 2010) associated with those attempting to influence a conversation would allow us to better understand how individuals use word choice and rhetorical strategies to persuade (Cialdini 2000; Anand et al. 2011) or coordinate with (Danescu-Niculescu-Mizil et al. 2012) others. Such systems could have a significant social impact, as they could identify, quantify, and measure attempts to spin or influence at a large scale. Models for automatic analysis of influence could lead to more transparent public conversations, ultimately improving our ability to achieve more considered and rational discussion of important topics, particularly in the political sphere.

Footnotes

  1. This paper significantly revises and extends the work described in Nguyen et al. (2012).
  2. Note the distinction from phonetic utterances, which by definition are bounded by silence.
  3. The "bag of words" treatment of linguistic utterances is widely used, but of course a gross simplification. In other research, we have investigated nonparametric models capturing arbitrary-length phrases (Hardisty et al. 2010) and syntactic topic models (Boyd-Graber and Blei 2008); integrating linguistically richer models with SITS is a topic for future work.
  4. We also investigated using the maximal assumption and fully sampling assignments. We found the minimal path assumption worked as well as explicitly sampling seating assignments and that the maximal path assumption worked less well. Another, more complicated, sampling method is to sample the counts \(N_{c,k}\) and \(N_k\) according to their corresponding Antoniak distributions (Antoniak 1974), similar to the direct assignment sampling method described in Teh et al. (2006).
  5. The superscript + is to denote that this number is unbounded and varies during the sampling process.
  6. Deterministically knowing the path assignments is the primary efficiency motivation for using the minimal path assumption. The alternative is to explicitly sample the path assignments, which is more complicated (for both notation and computation). This option is spelled out in full detail in the appendix.
  7. Refer to Gershman and Blei (2012) for a detailed derivation of this joint probability.
  8.
  9.
  10.
  11. Kappa was measured based on whether the two annotators agreed on (a) whether there was an influencer, (b) who the primary influencer was, and (c) if there was a secondary influencer. When discrepancies occurred between the annotators, they were resolved through discussion between the annotators and with the supervising researcher. So decisions were not "yes or no" about each speaker; instead, they were about whether or not there was an influencer in each overall interaction, and if so, who the primary and secondary influencers were in a particular interaction.
  12. The ground distance is the distance between two bins in a histogram. Please refer to Pele and Werman (2008) for a more formal definition.
  13. For more details on how to compute this value, refer to Sect. 3 of Resnik and Hardisty (2010).

Notes

Acknowledgements

We would like to thank the reviewers for their insightful comments. We are grateful to Eric Hardisty, Pranav Anand, Craig Martell, Douglas W. Oard, Earl Wagner, and Marilyn Walker for helpful discussions. This research was funded in part by the Army Research Laboratory through ARL Cooperative Agreement W911NF-09-2-0072 and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory. Jordan Boyd-Graber and Philip Resnik are supported by US National Science Foundation Grant NSF #1018625. Viet-An Nguyen and Philip Resnik are also supported by US National Science Foundation Grant NSF #IIS1211153. Any opinions, findings, conclusions, or recommendations expressed are the authors’ and do not necessarily reflect those of the sponsors.

References

  1. Abbott, R., Walker, M., Anand, P., Fox Tree, J. E., Bowmani, R., & King, J. (2011). How can you say such things?!?: Recognizing disagreement in informal political argument. In Proceedings of the workshop on language in social media (LSM). Google Scholar
  2. Ahmed, A., & Xing, E. P. (2008). Dynamic non-parametric mixture models and the recurrent Chinese restaurant process: with applications to evolutionary clustering. In Proceedings of SIAM international conference on data mining. Google Scholar
  3. Ahmed, A., & Xing, E. P. (2010). Timeline: a dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream. In Proceedings of uncertainty in artificial intelligence. Google Scholar
  4. Alarcon-del Amo, M., Lorenzo-Romero, C., & Gomez-Borja, M. (2011). Classifying and profiling social networking site users: a latent segmentation approach. Cyberpsychology, Behavior, and Social Networking, 14(9). Google Scholar
  5. Anand, P., King, J., Boyd-Graber, J., Wagner, E., Martell, C., Oard, D. W., & Resnik, P. (2011). Believe me: we can do this! In The AAAI 2011 workshop on computational models of natural argument. Google Scholar
  6. Andrzejewski, D., Mulhern, A., Liblit, B., & Zhu, X. (2007). Statistical debugging using latent topic models. In Proceedings of European conference of machine learning. Google Scholar
  7. Andrzejewski, D., Zhu, X., & Craven, M. (2009). Incorporating domain knowledge into topic modeling via Dirichlet forest priors. In Proceedings of the international conference of machine learning. Google Scholar
  8. Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6), 1152–1174. MathSciNetCrossRefzbMATHGoogle Scholar
  9. Aran, O., & Gatica-Perez, D. (2010). Fusing audio-visual nonverbal cues to detect dominant people in group conversations. In Proceedings of the international conference on pattern recognition (ICPR). Google Scholar
  10. Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596. CrossRefGoogle Scholar
  11. Bales, R. F. (1970). Personality and interpersonal behavior. New York: Holt, Rinehart, and Winston. Google Scholar
  12. Basu, S., Choudhury, T., Clarkson, B., & Pentland, A. S. (2001). Learning human interactions with the influence model (Tech. Rep. 539). MIT Media Laboratory. Google Scholar
  13. Beeferman, D., Berger, A., & Lafferty, J. (1999). Statistical models for text segmentation. Machine Learning, 34(1–3), 177–210. CrossRefzbMATHGoogle Scholar
  14. Bender, E. M., Morgan, J. T., Oxley, M., Zachry, M., Hutchinson, B., Marin, A., Zhang, B., & Ostendorf, M. (2011). Annotating social acts: authority claims and alignment moves in wikipedia talk pages. In Proceedings of the workshop on languages in social media (LSM). Google Scholar
  15. Biran, O., Rosenthal, S., Andreas, J., McKeown, K., & Rambow, O. (2012). Detecting influencers in written online conversations. In Proceedings of the workshop on language in social media (LSM). Google Scholar
  16. Blau, P. (1964). Exchange and power in social life. Sociology political science. Transaction Books. Google Scholar
  17. Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84. MathSciNetCrossRefGoogle Scholar
  18. Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. In Proceedings of the international conference of machine learning. Google Scholar
  19. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. zbMATHGoogle Scholar
  20. Booth, N., & Matic, A. (2011). Mapping and leveraging influencers in social media to shape corporate brand perceptions. Corporate Communications, 16(3), 184–191. CrossRefGoogle Scholar
  21. Boyd-Graber, J., & Blei, D. M. (2008). Syntactic topic models. In Proceedings of advances in neural information processing systems. Google Scholar
  22. Boyd-Graber, J., & Blei, D. M. (2009). Multilingual topic models for unaligned text. In Proceedings of uncertainty in artificial intelligence. Google Scholar
  23. Boyd-Graber, J., Blei, D. M., & Zhu, X. (2007). A topic model for word sense disambiguation. In Proceedings of empirical methods in natural language processing. Google Scholar
  24. Boydstun, A. E., Glazier, R. A., & Phillips, C. (2013). Agenda control in the 2008 presidential debates. American Politics Research. Google Scholar
  25. Brooke, M. E., & Ng, S. H. (1986). Language and social influence in small conversational groups. Journal of Language and Social Psychology, 5(3), 201–210. CrossRefGoogle Scholar
  26. Burrel, N. A., & Koper, R. J. (1998). The efficacy of powerful/powerless language on attitudes and source credibility. In Persuasion: advances through meta-analysis. Google Scholar
  27. Butler, B., Joyce, E., & Pike, J. (2008). Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in wikipedia. In International conference on human factors in computing systems. Google Scholar
  28. Chang, J., Boyd-Graber, J., & Blei, D. M. (2009a). Connections between the lines: augmenting social networks with text. In Knowledge discovery and data mining. Google Scholar
  29. Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., & Blei, D. M. (2009b). Reading tea leaves: how humans interpret topic models. In Proceedings of advances in neural information processing systems. Google Scholar
  30. Chen, H., Branavan, S. R. K., Barzilay, R., & Karger, D. R. (2009). Global models of document structure using latent permutations. In Computational linguistics. Google Scholar
  31. Choi, F. Y. Y., Wiemer-Hastings, P., & Moore, J. (2001). Latent semantic analysis for text segmentation. In Proceedings of empirical methods in natural language processing. Google Scholar
  32. Cialdini, R. B. (2000). Influence: science and practice (4th ed.). Needham Heights: Allyn & Bacon. Google Scholar
  33. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. zbMATHGoogle Scholar
  34. Cowans, P. J. (2006). Probabilistic document modelling. Ph.D. thesis, University of Cambridge. Google Scholar
  35. Daley, J. A., McCroskey, J. C., & Richmond, V. P. (1977). Relationships between vocal activity and perception of communicators in small group interaction. Western Journal of Speech Communication, 41(3), 175–187. CrossRefGoogle Scholar
  36. Danescu-Niculescu-Mizil, C., Lee, L., Pang, B., & Kleinberg, J. (2012). Echoes of power: language effects and power differences in social interaction. In Proceedings of world wide web conference (pp. 699–708). Google Scholar
  37. Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the international conference of machine learning. Google Scholar
  38. Dowman, M., Savova, V., Griffiths, T. L., Kording, K. P., Tenenbaum, J. B., & Purver, M. (2008). A probabilistic model of meetings that combines words and discourse features. Transactions on Audio, Speech, and Language Processing, 16(7), 1238–1248. CrossRefGoogle Scholar
  39. Drake, B. H., & Moberg, D. J. (1986). Communicating influence attempts in dyads: linguistic sedatives and palliatives. The Academy of Management Review, 11(3), 567–584. Google Scholar
  40. Du, L., Buntine, W. L., & Jin, H. (2010). Sequential latent Dirichlet allocation: discover underlying topic structures within a document. In International conference on data mining. Google Scholar
  41. Ehlen, P., Purver, M., & Niekrasz, J. (2007). A meeting browser that learns. In Proceedings of the AAAI spring symposium on interaction challenges for intelligent assistants. Google Scholar
  42. Eisenstein, J., & Barzilay, R. (2008). Bayesian unsupervised topic segmentation. In Proceedings of empirical methods in natural language processing. Google Scholar
  43. Emerson, R. M. (1981). Social exchange theory. In M. Rosenberg & R. H. Turner (Eds.), Social psychology: sociological perspectives (pp. 30–65). New York: Basic Books. Google Scholar
  44. Fallows, J. (2008). Your VP debate wrapup in four bullet points. The Atlantic. http://www.theatlantic.com/technology/archive/2008/10/your-vp-debate-wrapup-in-four-bullet-points/8887/.
  45. Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2), 209–230. MathSciNetCrossRefzbMATHGoogle Scholar
  46. Fleischmann, K. R., Templeton, T. C., & Boyd-Graber, J. (2011). Modeling diverse standpoints in text classification: learning to be human by modeling human values. In iConference. Google Scholar
  47. Foa, U. G., & Foa, E. B. (1972). Resource exchange: toward a structural theory of interpersonal communication. In A. W. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 291–325). Elmsford: Pergamon. CrossRefGoogle Scholar
  48. Fox, E. B., Sudderth, E. B., Jordan, M. I., & Willsky, A. S. (2008). An HDP-HMM for systems with state persistence. In Proceedings of the international conference of machine learning. Google Scholar
  49. Galley, M., McKeown, K., Fosler-Lussier, E., & Jing, H. (2003). Discourse segmentation of multi-party conversation. In Proceedings of the association for computational linguistics. Google Scholar
  50. Galley, M., McKeown, K., Hirschberg, J., & Shriberg, E. (2004). Identifying agreement and disagreement in conversational speech: use of Bayesian networks to model pragmatic dependencies. In Proceedings of the association for computational linguistics. Google Scholar
  51. Gardner, M., Lutes, J., Lund, J., Hansen, J., Walker, D., Ringger, E., & Seppi, K. (2010). The topic browser: an interactive tool for browsing topic models. In Proceedings of advances in neural information processing systems. Google Scholar
  52. Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1), 1–12. MathSciNetCrossRefzbMATHGoogle Scholar
  53. Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: communication, context, and consequence. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: developments in applied socio-linguistics (pp. 1–68). Cambridge: Cambridge University Press. CrossRefGoogle Scholar
  54. Greene, S., & Resnik, P. (2009). More than words: syntactic packaging and implicit sentiment. In NAACL. Google Scholar
  55. Grimmer, J. (2009). A Bayesian hierarchical topic model for political texts: measuring expressed agendas in senate press releases. Political Analysis, 18(1), 1–35. CrossRefGoogle Scholar
  56. Halliday, M., & Hasan, R. (1976). Cohesion in English. New York: Longman. Google Scholar
  57. Hamilton, M. A., & Hunter, J. E. (1998). The effect of language intensity on receiver evaluations of message, source, and topic. In Persuasion: advances through meta-analysis. Google Scholar
  58. Hardisty, E., Boyd-Graber, J., & Resnik, P. (2010). Modeling perspective using adaptor grammars. In Proceedings of empirical methods in natural language processing. Google Scholar
  59. Hawes, T., Lin, J., & Resnik, P. (2009). Elements of a computational model for multi-party discourse: the turn-taking behavior of supreme court justices. Journal of the American Society for Information Science and Technology, 60(8), 1607–1615. CrossRefGoogle Scholar
  60. Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33–64. Google Scholar
  61. Hirschberg, J., & Litman, D. (1993). Empirical studies on the disambiguation of cue phrases. Computational Linguistics, 19(3), 501–530. Google Scholar
  62. Hoffman, M. D., Blei, D. M., & Cook, P. R. (2009). Finding latent sources in recorded music with a shift-invariant hdp. In Proceedings of the conference on digital audio effects. Google Scholar
  63. Horton, S. (2008). The Ifill factor. Harpers Magazine. http://harpers.org/archive/2008/10/hbc-90003659.
  64. Hsueh, P., Moore, J. D., & Renals, S. (2006). Automatic segmentation of multiparty dialogue. In Proceedings of the European chapter of the association for computational linguistics. Google Scholar
  65. Hu, Y., Boyd-Graber, J., & Satinoff, B. (2011). Interactive topic modeling. In Proceedings of the association for computational linguistics. Google Scholar
  66. Huffaker, D. (2010). Dimensions of leadership and social influence in online communities. Human Communication Research, 36(4), 593–617. CrossRefGoogle Scholar
  67. Hung, H., Huang, Y., Friedland, G., & Gatica-Perez, D. (2011). Estimating dominance in multi-party meetings using speaker diarization. Transactions on Audio, Speech, and Language Processing, 19(4), 847–860. CrossRefGoogle Scholar
  68. Ireland, M. E., Slatcher, R. B., Eastwick, P. W., Scissors, L. E., Finkel, E. J., & Pennebaker, J. W. (2011). Language style matching predicts relationship initiation and stability. Psychological Science, 22(1), 39–44. CrossRefGoogle Scholar
  69. Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., & Wooters, C. (2003). The ICSI meeting corpus. In IEEE international conference on acoustics, speech, and signal processing. Google Scholar
  70. Jayagopi, D. B., Hung, H., Yeo, C., & Gatica-Perez, D. (2009). Modeling dominance in group conversations using nonverbal activity cues. Transactions on Audio, Speech, and Language Processing, 17(3), 501–513. CrossRefGoogle Scholar
  71. Johnson, M. (2010). PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proceedings of the association for computational linguistics. Google Scholar
  72. Katz, E., & Lazarsfeld, P. F. (1955). Personal influence: the part played by people in the flow of mass communications. Foundations of communications research. New York: Free Press. Google Scholar
  73. Kellermann, K. (2004). Topical profiling: emergent, co-occurring, and relationally defining topics in talk. Journal of Language and Social Psychology, 23(3), 308–337. CrossRefGoogle Scholar
  74. Marin, A., Ostendorf, M., Zhang, B., Morgan, J. T., Oxley, M., Zachry, M., & Bender, E. M. (2010). Detecting authority bids in online discussions. In SLT (pp. 49–54). Google Scholar
  75. Mast, M. S. (2002). Dominance as expressed and inferred through speaking time. Human Communication Research, 28(3), 420–450. Google Scholar
  76. McCombs, M., & Reynolds, A. (2009). How the news shapes our civic agenda. In J. Bryant & M. B. Oliver (Eds.), Media effects: advances in theory and research (pp. 1–16). Lawrence Erlbaum. Google Scholar
  77. Mimno, D. M. (2011). Reconstructing Pompeian households. In Proceedings of uncertainty in artificial intelligence (pp. 506–513). Google Scholar
  78. Mimno, D., Wallach, H., Naradowsky, J., Smith, D., & McCallum, A. (2009). Polylingual topic models. In Proceedings of empirical methods in natural language processing. Google Scholar
  79. Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1), 21–48. Google Scholar
  80. Müller, P., & Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Statistical Science, 19(1), 95–110. MathSciNetCrossRefzbMATHGoogle Scholar
  81. Murray, G., Renals, S., & Carletta, J. (2005). Extractive summarization of meeting recordings. In European conference on speech communication and technology. Google Scholar
  82. Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249–265. MathSciNetGoogle Scholar
  83. Neal, R. M. (2003). Slice sampling. The Annals of Statistics, 31, 705–767. MathSciNetCrossRefzbMATHGoogle Scholar
  84. Ng, S. H., & Bradac, J. J. (1993). Power in language: verbal communication and social influence. language and language behaviors. Thousand Oaks: Sage Publications. Google Scholar
  85. Ng, S. H., Bell, D., & Brooke, M. (1993). Gaining turns and achieving high influence ranking in small conversational groups. British Journal of Social Psychology, 32(3), 265–275. CrossRefGoogle Scholar
  86. Nguyen, V. A., Boyd-Graber, J., & Resnik, P. (2012). SITS: a hierarchical nonparametric model using speaker identity for topic segmentation in multiparty conversations. In Proceedings of the association for computational linguistics. Google Scholar
  87. Olney, A., & Cai, Z. (2005). An orthonormal basis for topic segmentation in tutorial dialogue. In Proceedings of the human language technology conference. Google Scholar
  88. Otsuka, K., Yamato, J., Takemae, Y., & Murase, H. (2006). Quantifying interpersonal influence in face-to-face conversations based on visual attention patterns. In International conference on human factors in computing systems. Google Scholar
  89. Palmer, M. T. (1989). Controlling conversations: turns, topics and interpersonal control. Communication Monographs, 56(1), 1–18. CrossRefGoogle Scholar
  90. Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Hanover: Now Publishers. Google Scholar
  91. Passonneau, R. J., & Litman, D. J. (1997). Discourse segmentation by human and automated means. Computational Linguistics, 23(1), 103–139. Google Scholar
  92. Paul, R. (2007). Political power and the rule of law. Texas Straight Talk. http://www.lewrockwell.com/paul/paul366.html.
  93. Paul, M., & Girju, R. (2010). A two-dimensional topic-aspect model for discovering multi-faceted topics. In Association for the advancement of artificial intelligence. Google Scholar
  94. Pele, O., & Werman, M. (2008). A linear time histogram metric for improved sift matching. In ECCV (pp. 495–508). Google Scholar
  95. Pele, O., & Werman, M. (2009). Fast and robust earth mover’s distances. In International conference on computer vision. Google Scholar
  96. Pevzner, L., & Hearst, M. A. (2002). A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1), 19–36. CrossRefGoogle Scholar
  97. Planalp, S., & Tracy, K. (1980). Not to change the topic but …: a cognitive approach to the management of conversations. In Communication yearbook 4, New Brunswick (pp. 237–258). Google Scholar
  98. Purver, M. (2011). Topic segmentation. In Spoken language understanding: systems for extracting semantic information from speech. Google Scholar
  99. Purver, M., Körding, K., Griffiths, T. L., & Tenenbaum, J. (2006). Unsupervised topic modelling for multi-party spoken discourse. In Proceedings of the association for computational linguistics. Google Scholar
  100. Regula, R., & Julian, W. (1973). The impact of quality and frequency of task contributions on perceived ability. The Journal of Social Psychology, 89(1), 115–122. CrossRefGoogle Scholar
  101. Reid, S. A., & Ng, S. H. (2000). Conversation as a resource for influence: evidence for prototypical arguments and social identification processes. European Journal of Social Psychology, 30(1), 83–100. CrossRefGoogle Scholar
  102. Ren, L., Dunson, D. B., & Carin, L. (2008). The dynamic hierarchical Dirichlet process. In Proceedings of the international conference of machine learning. Google Scholar
  103. Resnik, P., & Hardisty, E. (2010). Gibbs sampling for the uninitiated (Tech. Rep. UMIACS-TR-2010-04). University of Maryland. http://drum.lib.umd.edu//handle/1903/10058.
  104. Rienks, R., & Heylen, D. (2005). Dominance detection in meetings using easily obtainable features. In Proceedings of the 2nd joint workshop on multimodal interaction and related machine learning algorithms. Google Scholar
  105. Rienks, R., Zhang, D., Gatica-Perez, D., & Post, W. (2006). Detection and application of influence rankings in small group meetings. In Proceedings of the international conference on multimodal interfaces, ICMI ’06. Google Scholar
  106. Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121. CrossRefzbMATHGoogle Scholar
  107. Sayeed, A. B., Boyd-Graber, J., Rusk, B., & Weinberg, A. (2012). Grammatical structures for word-level sentiment detection. In North American association of computational linguistics. Google Scholar
  108. Scheer, L. K., & Stern, L. W. (1992). The effect of influence type and performance outcomes on attitude toward the influencer. Journal of Marketing Research, 29(1), 128–142. CrossRefGoogle Scholar
  109. Schlenker, B. R., Nacci, P., Helm, B., & Tedeschi, J. T. (1976). Reactions to coercive and reward power: the effects of switching influence modes on target compliance. Sociometry, 39(4), 316–323. CrossRefGoogle Scholar
  110. Sorrentino, R. M., & Boutiller, R. G. (1972). The effect of quantity and quality of verbal interaction on ratings of leadership ability. Journal of Experimental Social Psychology, 5, 403–411. Google Scholar
  111. Stang, D. J. (1973). Effect of interaction rate on ratings of leadership and liking. Journal of Personality and Social Psychology, 27(3), 405–408. CrossRefGoogle Scholar
  112. Teh, Y. W. (2006). A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the association for computational linguistics. Google Scholar
  113. Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566–1581. MathSciNetCrossRefzbMATHGoogle Scholar
  114. Thomas, M., Pang, B., & Lee, L. (2006). Get out the vote: determining support or opposition from congressional floor-debate transcripts. In Proceedings of empirical methods in natural language processing. Google Scholar
  115. Trammell, K. D., & Keshelashvili, A. (2005). Examining the new influencers: a self-presentation study of a-list blogs. Journalism & Mass Communication Quarterly, 82(4), 968–982. CrossRefGoogle Scholar
  116. Tur, G., Stolcke, A., Voss, L., Peters, S., Hakkani-Tür, D., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., Frederickson, C., Graciarena, M., Kintzing, D., Leveque, K., Mason, S., Niekrasz, J., Purver, M., Riedhammer, K., Shriberg, E., Tien, J., Vergyri, D., & Yang, F. (2010). The CALO meeting assistant system. Transactions on Audio, Speech, and Language Processing, 18(6), 1601–1611. CrossRefGoogle Scholar
  117. Wallach, H. M. (2006). Topic modeling: beyond bag-of-words. In Proceedings of the international conference of machine learning. Google Scholar
  118. Wallach, H. M. (2008). Structured topic models for language. Ph.D. thesis, University of Cambridge. Google Scholar
  119. Wang, C., Blei, D. M., & Heckerman, D. (2008). Continuous time dynamic topic models. In Proceedings of uncertainty in artificial intelligence. Google Scholar
  120. Weimann, G. (1994). The influentials: people who influence people. Suny series in human communication processes. Albany: State University of New York Press. Google Scholar
  121. Zhai, K., Boyd-Graber, J., Asadi, N., & Alkhouja, M. (2012). Mr. LDA: a flexible large scale topic modeling package using variational inference in mapreduce. In Proceedings of world wide web conference. Google Scholar
  122. Zhang, D., Gatica-Perez, D., Bengio, S., & Roy, D. (2005). Learning influence among interacting Markov chains. In Proceedings of advances in neural information processing systems. Google Scholar

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Viet-An Nguyen (1), corresponding author
  • Jordan Boyd-Graber (2)
  • Philip Resnik (3)
  • Deborah A. Cai (4)
  • Jennifer E. Midberry (4)
  • Yuanxin Wang (4)

  1. Department of Computer Science, University of Maryland, College Park, USA
  2. iSchool and UMIACS, University of Maryland, College Park, USA
  3. Department of Linguistics and UMIACS, University of Maryland, College Park, USA
  4. School of Media and Communication, Temple University, Philadelphia, USA
