Conceptual Organization is Revealed by Consumer Activity Patterns

Computational models using text corpora have proved useful in understanding the nature of language and human concepts. One appeal of this work is that text, such as from newspaper articles, should reflect human behaviour and conceptual organization outside the laboratory. However, texts do not directly reflect human activity, but instead serve a communicative function and are highly curated or edited to suit an audience. Here, we apply methods devised for text to a data source that directly reflects thousands of individuals’ activity patterns. Using product co-occurrence data from nearly 1.3-m supermarket shopping baskets, we trained a topic model to learn 25 high-level concepts (or topics). These topics were found to be comprehensible and coherent by both retail experts and consumers. The topics indicated that human concepts are primarily organized around goals and interactions (e.g. tomatoes go well with vegetables in a salad), rather than their intrinsic features (e.g. defining a tomato by the fact that it has seeds and is fleshy). These results are consistent with the notion that human conceptual knowledge is tailored to support action. Individual differences in the topics sampled predicted basic demographic characteristics. Our findings suggest that human activity patterns can reveal conceptual organization and may give rise to it. Electronic supplementary material The online version of this article (10.1007/s42113-019-00064-9) contains supplementary material, which is available to authorized users.

Meaning may arise from an element's role or interactions within a larger system.For example, hitting nails is more central to people's concept of a hammer than its particular material composition or other intrinsic features.Likewise, the importance of a web page may result from its links with other pages rather than solely from its content.One example of meaning arising from extrinsic relationships are approaches that extract the meaning of word concepts from co-occurrence patterns in large, text corpora.The success of these methods suggest that human activity patterns may reveal conceptual organization.However, texts do not directly reflect human activity, but instead serve a communicative function and are usually highly curated or edited to suit an audience.Here, we apply methods devised for text to a data source that directly reflects thousands of individuals' activity patterns, namely supermarket purchases.Using product co-occurrence data from nearly 1.3m shopping baskets, we trained a topic model to learn 25 high-level concepts (or topics).These topics were found to be comprehensible and coherent by both retail experts and consumers.Topics ranged from specific (e.g., ingredients for a stir-fry) to general (e.g., cooking from scratch).Topics tended to be goal-directed and situational, consistent with the notion that human conceptual knowledge is tailored to support action.Individual differences in the topics sampled predicted basic demographic characteristics.These results suggest that human activity patterns reveal conceptual organization and may give rise to it.O ne common view is that concepts decompose into in- trinsic features or parts (1).On this view, a bird is an animal that typically has wings, feathers, a beak, and so on (2).However, extrinsic features are also critical for how humans organize concepts and come to understand the world (3).Indeed, everyday concepts are difficult to define solely in terms of intrinsic features.For example, Wittgenstein (4) asserted that the concept of game is undefinable.One might suggest that games are fun, but Russian Roulette is not fun and other activities that are fun are not games.Likewise, not all games are competitive (e.g., Ring Around the Rosie).Instead of defining game in terms of intrinsic features, one solution is to define game relationally -a game is simply something that is played (5).On this view, concepts are defined and become meaningful in terms of their relationships and interactions with other concepts rather than decomposing into a set of intrinsic features.
The importance of relations and interactions extends beyond abstract concepts.Many features of concrete concepts are extrinsic (6).For example, fundamental to our conception of a hammer is that it is used to hit objects, which is not an intrinsic property of hammer like its shape or material composition, but is instead a relation between a hammer and another object, typically a nail.Even for natural kinds, people commonly list extrinsic features for concepts (6), such as noting that birds eat worms.Meanings appear to update in light of extrinsic relationships.For example, people are more likely to judge a polar bear and a dog as similar after reading vignettes in which both played the same role in a relation, such as chasing some other animal (6).Likewise, merely sharing a thematic relationship, such as a man and a tie (e.g., wears), makes the linked concepts more similar (6,7).
If concepts are defined in terms of other concepts, what moors or grounds our concepts to the physical world we inhabit (8)?One proposed solution is that some concepts are embodied (for a review, see 9).For example, the action of hammering may be grounded to related motor programs and associated perceptions, linking the body, mind, and physical world.Indeed, neuroscientific evidence has shown that comprehension of language is tightly coupled with the neural regions associated with action and perception (10).A computational model developed by Mitchell et al. (11) was able to accurately predict the neural activity elicited by a noun by considering the co-occurrence of that noun with action verbs in a largetext corpora.In effect, the action verbs, for which elicited neural activity was known, provided a grounding or bases for representing associated nouns.
These corpora models, such as Latent Semantic Analysis, use the co-occurence of words within some context (e.g., a document) to learn lower dimensional, vector representations of

Significance Statement
Can a hammer be a hammer if it is not used for hitting nails?Objects may become meaningful through their interactions with other objects.Using techniques developed in computational linguistics for inferring word meanings from large text-corpora, we analyzed purchasing patterns in supermarkets.These realworld activity patterns organized into topics that reflected human conceptual knowledge.Topics ranged from specific (e.g., ingredients for a stir-fry) to general (e.g., cooking from scratch).Topics tended to be goal-directed and situational, consistent with the notion that human conceptual knowledge is tailored to support action.People's age, gender, and location were predictable from their topics.These results suggest that human activity patterns reveal conceptual organization and may give rise to it.
word concepts (12).Like the reviewed psychological research (6), words need not directly co-occur with one another to become more similar, but need only occur in similar contexts.Although LSA has enjoyed numerous successes, cases in which its representations diverge with those of humans has prompted further model development (13).
One subsequent proposal, Latent Dirichlet Allocation (LDA), is a probabilistic approach in which documents are generated according to a mixture of probabilities over latent themes or topics (e.g., 14).For example, LDA may find that the words 'Parliament' and 'Prime Minister' have a high probability of belonging to the same topic (e.g., 'politics').A passage about the Prime Minister visiting the Houses of Parliament would make this politics topic highly probable, though other topics would also be somewhat likely, such a topic related to tourism (Big Ben is part of the Houses of Parliament).
The representations learned by topic models appear similar to the concepts that people use (15,16).For example, topic modeling can predict subsequent words in a sentence, disambiguate word meanings, and extract the gist of a sentence (15).Related techniques find that word meanings extracted for text corpora reflect back that society's gender stereotypes (17).These successes provide credence to the idea that human concepts are heavily influenced by their extrinsic roles and relationships.It may be through patterning and interaction that concepts gain meaning, as opposed to decomposing into intrinsic features.
Despite these successes, corpora analysis of natural language is not ideally suited to evaluate the proposal that human concepts become meaningful and organize around extrinsic relations.After all, language is primarily concerned with effective communication of relevant information (18), rather than providing a faithful record of object interactions.For example, in waiting to cross the street with a companion, one would never verbalize that the passing car drives on the road.Written language is also heavily curated.For example, journalists adhere to particular guidelines and aim to report on stories of interest to their readership.Thus, data collected from either spoken or written language is arguably unrepresentative of true activity patterns.
The ideal data set for topic modeling would contain unfiltered and unedited information that directly pertains to human activity patterns.Consumer shopping is one such test bed.Retail data are collected from consumers as they purchase products together in the same basket, analogous to how words group together in the same document (see 1).Unlike natural language, grocery transaction data are not carefully curated with a particular audience in mind.Moreover, there are no editors controlling the final assortment of items in a basket (i.e., document).Unlike purchasing data collected from many thousands of consumers, text found in newspapers is typically written by a select cadre of journalists, thereby biasing the data towards the experience of a small minority.Therefore, one could argue that consumer purchases are more representative of activity patterns in-the-wild and therefore serve as a more appropriate test bed for evaluating models and theories of human semantic representation.In effect, working with a such a dataset would provide a real-world, large-scale test of theories of meaning that were hitherto only possible to explore in laboratory studies under limited and artificial conditions.
In addition to being more representative of people's activ-  The input in a corpora analysis is typically item counts (i.e., word counts) within some context (e.g., a sentence or document).Analogously, products (akin to words) are organized into baskets (akin to sentences).One advantage of applying these analysis techniques to baskets is that, unlike natural language, meaning is unaffected by item order.

NEWS
ity, consumer purchasing data better suits the mathematical assumptions of topic models than does natural language.Indeed, natural language researchers typically go to great lengths to pre-process their data.For example, researchers typically must remove function or 'stop' words that have little semantic meaning (such as the, of, and).They may also 'stem' words to remove prefixes and suffixes of words that have similar semantic meaning (e.g., eat vs. eating).Moreover, the order of words in sentences can also make a big difference to sentence meaning (e.g., "dog bites man" vs. "man bites dog").However, topic models typically ignore word order, instead preferring to consider language as a "bag-of-words" (19).In contrast, for retail data captured in-store, there is no inherent order for products within a basket, nor a need to heavily pre-process the data.If people's thematic organization of concepts arises through their interaction with the environment, then it should be possible for a topic model to recover relevant representations of these through consumer purchasing patterns, as shown in Figure 2. We tested this possibility using a large, anonymized dataset of 1,252,963 shopping baskets and 5,753 unique products, supplied by one of the UK's largest supermarket retailers.After optimizing an LDA solution using fit statistics and checking for convergence * , we labelled the 25 topics recovered by the model.
To foreshadow, we confirmed the psychological reality of these topics in two human studies, one with judgments from retail experts and another involving typical consumers.The latter study also suggested that an individual's shopping experience shaped their conceptual organization of the products.In support of this assertion, the rate at which an individual sampled topics (based on recent shopping history) predicted the individual's age, gender, and geographic region.Finally, topics tied to a season varied sensibly in their prevalence over the calendar year (e.g., the Christmas topic was most prevalent in December).Overall, these results suggest that conceptual organization may arise from people's direct interactions with Product Topic Probabilities Basket Topic Probabilities Fig. 2. Latent Dirichlet Allocation (LDA) uncovers the higher-level product topics that can be viewed as generating the observed baskets purchased by consumers.LDA's fit is driven by the co-occurrence pattern of products within baskets.In the solution, each product has a probability of occurring within each topic (shown on the left for apple).Each basket is generated by a mixture of probabilities over the topics (shown on the right for this basket).
objects, and, conversely, that patterns of real-world behaviour may reveal a great deal about an individual and their goals.

Results
The topics recovered were coherent and readily labeled by the authors.They tended to be organized along activity patterns and goals, ranging from specific (e.g., Stir Fry) to general in scope (e.g., Cooking from scratch) † .A subset of 10 topics was considered in the empirical studies of retail experts and typical consumers.
A. Evaluating topic labels with retail experts.When asked to select the appropriate topic label for a group of products from a list of four possible labels, 92.8% of the 51 retail industry experts selected the same topic label as was originally proposed by the authors.Figure 3 shows the proportion of times participants agreed with the originally proposed topic label for each topic.This high-level of accuracy from a group of experts, who were naive to our research program, indicates that the topics recovered from consumer activity patterns are meaningful.Disagreement regarding topic labels was primarily driven by conceptually similar topics (further details of this are available in the supplemental).For example, the most common error in labeling the summer salad topic was cooking from scratch.These errors are reasonable and are also consistent with the notion that baskets are generated by a mix of topics, as opposed to a single topic (see Figure 2).

B. Evaluating topic coherency with typical consumers.
We evaluated whether typical consumers could identify an intruder product from another topic among a number of products from the same topic.The 5 products from the same topic and the 1 intruder product were all highly ranked within their respective topic ‡ .Of the 3840 British consumers surveyed, 76.0% were † A complete list of the 25 topics and process used to label them is available in the supplemental ‡ More information about the ranking procedure used is available in the supplemental able to correctly identify the intruder product.Figure 3 shows accuracy by topic.
One topic stands out for its below chance level of performance, afternoon tea.Participants were most likely (51.7% of the time) to incorrectly suggest that 'mint humbugs' was the intruder.One possibility for this poor classification accuracy is that participants did not have enough context to interpret them correctly.In the afternoon tea topic, the top 5 items were predominantly fresh and 'staple' foods (e.g., Milk, Bananas, Danish sliced white bread).Thus, seeing a packet of sweets (i.e., 'humbugs') among this fresh food may have appeared unusual.An analogous issue arises with the low maintenance cooking topic.Each topic is a probability distribution over thousands of products, so perhaps it is not surprising that a small sample of products could be ambiguous.
Another possibility that is more core to our theory is that individual differences in experience may help explain some of these confusions.For example, the poor classification of the afternoon tea topic may have been driven by the fact that most British people no longer regularly engage in this ritualistic activity.If experience shapes people's mental concepts, then we would expect representations of certain products to vary between demographics.Supporting this view, consumers from Northern Ireland had an average probability for the Northern Ireland topic 7.5× higher than the average across all regions.The fact that the model was able to recover such strong regional differences in consumers suggests that it should be sensitive to other individual differences in people's experience of the world.

C. Classifying individual consumers by their experienced
topics.The previous results hinted that individual differences in knowledge organization may arise from differences in product interactions.More ambitiously, differences in how often people experience topics may predict basic demographic characteristics.To test this assertion, we used logistic regression to predict (5-fold cross-validation) self-reported age § , region ¶ and gender using consumers' mean LDA probabilities over baskets as the predictors.The models were able to predict age with an accuracy of 48.51%, region with an accuracy of 58.34% and gender with an accuracy of 57.17%, considerably higher than the guessing baselines of 36.86%,50.00%, and 50.00%, respectively.

D. Seasonal trends in topics.
In addition to experiences of topics varying across individuals, prevalence of topics should vary over the calendar year.We identified 4 topics that were likely to have a high seasonal popularity (summer fruits, summer salad, Christmas and low calorie options) and 4 'staple-food' topics that we believed unlikely to vary as much over the year (loose fruit and veg, Northern Ireland, quick to prepare meals and food for now).
Figure 4 shows the proportion of times baskets associated with a given topic occurred each month in 2014 in one of the UK's largest retailers.In line with our hypotheses, summer fruits and summer salad peaked in popularity during the summer months.Contrasting, baskets labelled with the Christmas topic peaked in popularity during December and the surrounding winter months.Low calorie options appeared to peak in January and steadily decline to its lowest level of popularity in December.The 'staple' topics shown in Figure 4b appeared § Discretized into 18-29, 30-44, 45-59 and 60+ ¶ Binarized into England vs. regional (i.e., Scotland, Wales and Northern Ireland) to vary considerably less over the year compared to the more seasonal topics.These results give further credence to the proposed topic labels and illuminate some seasonal variations in behavioural patterns that likely reflect time-dependent characteristics of people's thematic representations.Similarly intuitive patterns were shown to occur during different days of the week, which are reported in more detail in the supplemental.

Discussion
Rather than meaning residing solely in intrinsic features, conceptual knowledge may arise from object interactions.For example, the meaning of hammer is more closely tied to hitting nails than to its particular material composition.Support for the notion that meaning arises from a web of interactions comes from laboratory studies demonstrating that object interactions alter how people conceptualize objects ( 6) and from large-scale corpora analysis of text (e.g., newspaper articles) that extract meaning from word co-occurrence patterns.However, none of these previous investigations involve people engaging in unfiltered, goal-driven, real-world interactions with objects.Under such conditions, can meaningful conceptual organization be recovered from human activity patterns?
We tested this possibility by considering the shopping patterns of thousands of UK consumers.Using LDA, we found that the pattern of consumer purchases was highly revealing of people's conceptual organization of these products.Topics ranged from specific and goal-driven (e.g., ingredients for a stir-fry) to very general (e.g., cooking from scratch).Interest-ingly, the topics tended to be goal-directed and situational, which is consistent with the notion that human conceptual knowledge is tailored to support action.The situational nature of certain topics was reflected in their increasing prevalence during certain times of the year, such as the Christmas topic in December and the Summer Salad topic in the Summer.
The psychological reality of the 25 LDA topics we found was confirmed by two studies, one involving retail experts and one involving everyday consumers.The experts, who were blind to the purposes of this research, agreed with our labeling of the topics.The novices were able to identify an intruder product among an array of products from the same topic.These results indicate that the topics uncovered by human activity patterns are both comprehensible and coherent.
If meaning and real-world behavior are linked, then individual differences in experiences should be reflected in differences in conceptual organization.In support of this conjecture, topic prevalence varied across geographic regions.In our study of everyday consumers, poor performance for the topic afternoon tea may reflect that today's typical British consumer differs from past caricatures.Consistent with the idea that different types of people will have different topic experiences, we were able to predict basic demographic information about consumers from their topics mix (i.e., which topics best characterized their purchasing behavior).One avenue for future research is to develop, apply, and evaluate topic models in which individuals organize into higher-level groups that can vary in terms of topic prevalence or even topic composition.
One interesting question is whether shopping activity changes conceptual organization or conceptual organization drives shopping behaviour.Our results cannot definitely answer this question, but the likely answer is that the influence is bidirectional, much like how choices follow from preferences and preferences to a degree follow from choices (20).For example, having a concept like stir-fry should cause certain items to be purchased together to fulfill the common goal.Likewise, ingredients in the same dish may come to be viewed as more similar over time, consistent with laboratory studies that find that linking objects makes them more similar (6).One practical ramification is that recommender systems (21,22) using techniques related to our own may themselves shape conceptual organization.
What is clear is that conceptual organization is deeply tied to extrinsic relationships and that meaning can be seen as a byproduct of an element's role within a larger system or web.Indeed, the insight behind Google's PageRank algorithm is that web pages should be prioritized to the extent that they are central within a link graph (23).Prior to PageRank, the exact same algorithm was developed in Psychology to explain why certain features of concepts are more central than others within a concept web (24).Whether the system is human or artificial or the domain involves natural language or shopping behavior, meaning can be inferred, and perhaps arises, from relations among elements embedded within a larger system.

A. Topic Model.
A.1.Data.The topic model was developed on a random 0.1% sample of all grocery transactions that occurred in 2014 in one the UK's largest supermarket chains.The transactions were filtered such that only relatively popular products selling > 50, 000 units annually were kept.Moreover, data was filtered such that only large baskets containing ≥ 20 items were kept.Filtering was performed to ensure that LDA would have enough observations to learn meaningful topics.This is typical in LDA modelling (see 25) and is performed by the original LDA authors (14).After filtering, the final dataset contained 1,253,183 unique baskets and 5,753 unique products.
Note that -unlike traditional uses of LDA in NLP -we did not remove commonly occurring items from documents (i.e., 'stop words').Whilst natural language may contain 'stop words' (i.e., common words with little semantic meaning such as 'the'), we did not believe grocery transactions to suffer from the same problem.In the retail case, purchasing popular products, such as milk, bananas and bread, may be informative, perhaps indicating that the consumer is stocking essential items.
The basket data was fully anonymised for general research purposes so as to not be personally identifiable.

A.2. Model fit.
In our experiments we applied Latent Dirichlet Allocation (LDA) to the data, using the machine learning library in Spark 1.6.0(26).We conducted a range of experiments to identify the optimal set of hyperparameters (including the number of topics k) and in each case monitored the training and test log-perplexity to ensure model convergence and generalization, respectively (see supplemental for further details).
The LDA solution with the lowest log-perplexity on held out data (i.e., best generalization) had 25 topics.Models were trained for a maximum of 500 epochs, used the Online Variational Bayes optimization algorithm with an α = 0.1.The remaining hyperparameters were set to the package defaults.

B. Retail experts study.
B.1.Participants.Participants were recruited internally within the UK headquarters of dunnhumby (www.dunnhumby.com), a customer marketing company with over 29 years of experience working with grocery retailers and fast moving consumer goods (FMCG) brands.
Employees were asked to participate via the company intranet and were not remunerated.Fifty-one participated in the study.Participants had a wide range of roles within the business, including data analysts, category experts, company directors and client leads.Of these, 56.86% were male.Participants were surveyed in early December 2016 and were blind to the purposes of this study.

B.2. Materials.
The study was hosted on an internal company server.Participants accessed the study via their web browsers and answered questions by clicking on the appropriate radio button with their cursor.
In each trial, participants were shown 10 product images and accompanying product descriptions from a single topic.The displayed products were the 10 with the highest relevance.Product descriptions and images were downloaded from the retailer's website in late November 2016.

B.3. Design.
All participants were asked to label the same 10 topics in a random order.The dependent measure was the proportion of times that participants selected the topic label originally proposed during the model development phase.This proportion was then compared against a random baseline, to check whether participants were responding non-randomly.
It was not feasible to survey participants about all 25 topics in the final LDA solution given constraints on employee time.Therefore, 10 topics were chosen from the original 25 to include in the survey.

B.4. Procedure.
Participants were asked to label a group of products for 10 separate topics.The order of the presented topics was randomized.In each trial, participants saw 10 products from a certain topic and were asked to choose from one of four labels.One of the four labels was the 'target' label proposed by the authors whereas the other three were randomly selected from the remaining nine topics.The 10 products were the most relevant ones from within that topic.The process used to determine item relevance within a topic is described in the supplemental.

C. Consumer study.
C.1.Participants.Participants were recruited using dunnhumby's consumer survey panel; Shopper Thoughts (https://shopperthoughts.com/).Participants completed the survey as part of a larger, monthly survey for 50 card loyalty points.The final sample consisted of 3,840 participants, of which 59.47% were female.The modal age group was 55-59 (n = 501) and 724 participants did not disclose their age.All participants were from England, Scotland or Wales with the majority of respondents based in central England (n = 946).Participants were surveyed during March 2017.

C.2. Materials.
The study was accessible via the web, after logging in to the survey platform.Participants responded to the survey by clicking on a radio button next to the picture and product description of the item they believed to be the intruder.
For each trial, participants were shown 6 product images.Five were the most relevant from within a topic and the other 'intruder' was the most relevant product from a randomly selected alternative topic.Product descriptions and images were the same as those used during Study 1.
C.3.Procedure.Participants were first informed that the purpose of the study was to help retailers group together products found in the supermarket.They were then informed about the study's procedure.After agreeing to participate, the sole trial started.
Participants were shown six images of products alongside product descriptions and asked to select the item that didn't belong to the group by clicking on a radio button underneath the appropriate image.Following their choice, participants were debriefed about the purpose of the research.C.4.Design.The dependent variable was the proportion of times that participants identified the intruder product.This proportion was then compared against a random baseline to assess whether participants were able to identify intruders significantly above chance levels.
Participants each completed one trial in which they were asked to identify the intruding product.Participants were randomly selected to see one of the 10 topics also used in the retail experts study.This ensured that comparisons between the two related studies were consistent.

D. Predicting individual differences.
To demonstrate that the LDA solution could predict meaningful attributes about customers, we built three Regularised Logistic Regression models.In particular, we averaged the LDA probabilities from each basket at the customer level and then used those mean probabilities to predict customer's self-reported age band, region and gender.These self-reported values had been gathered by the marketing panel used for the consumer study during the last 3 years.The final dataset consisted of data from 30,233 consumers.Age was discretized in to 4 buckets; 18-29, 30-44, 45-59 and 60+.The age model was therefore evaluated in terms of accuracy.Region was collapsed into two buckets; England and regional (i.e., Scotland, Wales and Northern Ireland).The gender and region models were therefore measured in terms of the Area Under the ROC curve (AUC).The final dataset consisted of 28,122 customers.To find the best performing model, we performed a grid-search between λ values of 0.1 to 1.0.We determined the best model to be the one that had the highest performance over 5 cross-validation folds.Baselines were calculated by predicting the majority class in each fold.

Supplementary Information
A. Latent Dirichlet Allocation.For this project, we used a topic model known as Latent Dirichlet Allocation (LDA) (1).LDA is a generative probabilistic model that groups data into K unobserved topics.In the case of this project, baskets are represented as random mixtures over unobservable topics.Topics are then characterized as a mixture over N distinct products and D baskets.The generative process used by LDA can be described as follows: 1.For each topic k ∈ {1, ..., K}: • Choose a distribution over products φ k ∼ Dir(β).Where β and α are hyperparameters that determine the concentration of the Dirichlet prior placed on the topics' distribution over products φ and the baskets' distribution over topics θ, respectively.The latent variables φ k and θ d can then be inferred using an iterative learning algorithm, such as expectation maximization (1).w d ) is the probability of a product w in a basket d and N d is the number of products in a basket.Perplexity can be useful to monitor during model training to ensure that the algorithm is converging on the training set and generalizing to unseen data.However, some have argued that perplexity and human interpretation are uncorrelated or even negatively correlated (2).
Given that interpretability is a fundamental goal of this research, we decided to test it more directly.In particular, we tested to see whether the topics were interpretable to humans and could identify known patterns in historic purchasing data.We now discuss this in more detail.

arXiv:1810.08577v1 [cs.AI] 19 Oct 2018
A.2. Calculating product relevancy for a topic.One conventional approach to interpreting topic models is to rank items (i.e.products) within a topic and manually inspect those with the highest probabilities.One can look for similarities between items with high probabilities to understand whether there is a key theme that binds them together.
A major issue with the traditional approach to interpreting topic models is that the probability of an item given a topic φ kw can be biased positively in favour of more frequently occurring items within the corpus (3).The validity of this traditional approach was therefore significantly limited, particularly given that we did not filter high-frequency products (i.e.stop items) from our dataset.
To overcome issues with biased item probabilities, we explored two additional measures for determining the pertinence of items within a topic; lift and relevance.
The lift measure (first described by 3) is defined as: Lift is therefore a ratio of an item w's probability within a topic k (i.e., p(w k )) to its marginal probability across the corpus pw.Whilst this helps to diminish the impact of overall token frequency, some have argued that the measure is noisy (4).In particular, it can give overly high rankings to items that occur very rarely within the corpus.Indeed -during our manual inspection of the topics -we found this to be the case, limiting our ability to interpret the topics' meanings.
To overcome this problem, Sievert and Shirley (4) proposed the relevance metric, which is defined as: where λ is a free parameter that determines the weight given to the item w's probability within a topic k relative to its lift (measured on a log scale).If one sets λ = 1 then r(w, k|λ) = log p(w k ).Alternatively, if λ = 0 then r(w, k|λ) = log(lif t(w, k)).Thus, the benefit of this metric over lift is that it's possible to blend the probability of an item given a topic with lift.The metric's authors recommend using λ = 0.6 to maximize human interpretability, which is the value that we kept throughout all analyses (4).The data displayed in Figure 1 plot the relationship between item corpus frequency and each respective metric.As discussed, results indicate a strong correlation between item probabilities and frequency (r = 0.8862).The lift metric (r = 0.2193) and relevance metric (r = 0.2934) considerably reduce this correlation.
After manually inspecting the topics with each of the three measures, we agreed that relevance provided the best measure of item salience within a topic.We therefore used this to help determine topic names.
A.3.Initial topic labelling.When labelling the topics, the authors inspected the relevancy scores of each item within each topic, considered the most relevant items.Table 1 shows the topic labels along with the relative size of each topic within the corpus.

B. Label confusion by retail experts.
In the retail expert study, errors in labeling appeared sensible.Namely, the most popular alternative labels tended to be related to the original topic (see Table 2).For example, the most popular alternative label to the cooking from scratch table was loose fruit and veg; both topic labels pertain to ingredients that need to be prepared before consumption.
C. Topics by day of week.In addition to monthly trends, the proposed topic labels are also indicative of different weekly trends in purchasing habits.In particular, we hypothesized that topics indicative of longer preparation times (e.g.loose fruit and veg) or a special weekend occasion (e.g.afternoon tea) would be more likely to occur on or just before the weekend.Contrasting, we hypothesized that topics indicative of impulse purchasing (e.g.food for now) or stocking up for the long-term (e.g.branded store cupboard) would not vary much across the week.Food for now Branded store cupboard Fig. 2. The proportion of baskets with a given topic label on each day of the week, divided by the weekly mean average across all topics.Plot a) shows that food requiring longer preparation times (i.e.loose fruit and veg) or eaten specifically during weekend occasions (i.e.afternoon tea) are more likely to be bought on Thursday or Friday.Plot b) indicates that impulse purchases (i.e.food for now) or food that tends to be stored away (i.e.branded store cupboard) does not vary in popularity over the week.
cognition | computational social science | big data | machine learning | decision making

Fig. 1 .
Fig.1.The input in a corpora analysis is typically item counts (i.e., word counts) within some context (e.g., a sentence or document).Analogously, products (akin to words) are organized into baskets (akin to sentences).One advantage of applying these analysis techniques to baskets is that, unlike natural language, meaning is unaffected by item order.

Fig. 3 .
Fig. 3. Proportion correct with standard error bars for the study on label agreement involving retail experts and the intruder study involving typical consumers.All proportions were significantly different (p < .001)than chance levels, 25.00% (1 of 4) and 16.67% (1 of 6), respectively.

Fig. 4 .
Fig. 4. Topic prevalence varies by season.The proportion of baskets with a given topic label in each month of 2014, divided by the monthly mean average across all topics (i.e., index), is shown.a) Topics that should be seasonal peak at the expected time, such December for the Christmas topic.b) In contrast, topics for staple products vary less in prevalence over time.

2 .-
For each basket d ∈ {1, ..., D} in the collection c: • Generate a vector of topic probabilities: θ d ∼ Dir(α) • For each product w d,n in basket d: Generate a topic assignment: z d,n ∼ M ultinomial(θ d ) -Draw a product w d,n ∼ M ultinomial(φz d,n )

Fig. 1 .
Fig. 1.The relationship between item frequency within the corpus and a) item-topic probability, b) lift and c) relevancy

Table 1 . The labels given to each of the 25 topics. Size is defined as the number of products that had the highest probability of belonging to the respective topic over the total number of products in the corpus. Asterisks indicate that they were surveyed in studies A and B.
d N dWhere p(