Detecting multi-timescale consumption patterns from receipt data: a non-negative tensor factorization approach

Understanding consumer behavior is an important task, not only for developing marketing strategies but also for the management of economic policies. Detecting consumption patterns, however, is a high-dimensional problem in which various factors that affect consumers' behavior need to be considered, such as consumers' demographics, circadian rhythms and seasonal cycles. Here, we develop a method to extract multi-timescale expenditure patterns of consumers from a large dataset of scanned receipts. We use a non-negative tensor factorization (NTF) to detect intra- and inter-week consumption patterns simultaneously. The proposed method allows us to characterize consumers based on their consumption patterns that are correlated over different timescales.


Introduction
Consumption has been extensively studied in multiple research disciplines, and their viewpoints differ from one another. Macroeconomists, for example, consider that individual consumers' decisions determine the economic condition at the macroscopic level [1]. In marketing studies, on the other hand, analyzing the shopping behavior of individual consumers is essential to gain insight into business strategy [2]. Researchers also study consumption at different time scales; economists often assume that representative individuals live infinitely long to investigate life-long consumption paths, while business researchers are interested in shorter, more practical time scales.
Many studies point out that consumption patterns change in accordance with the consumer's stage of life [3][4][5]. Arguably, young people with a child would go to supermarkets more frequently than elderly people would. The income level of an individual would also affect how often, how much and on what they spend. Consumers with different demographic characteristics may therefore exhibit different dynamical patterns of expenditure, and this leads us to conjecture that we could infer consumers' demographic properties from their dynamical expenditure patterns.
To understand the consumption behavior of individuals with different demographic properties, we explore the following research questions:
RQ1: Does consumers' expenditure behavior exhibit dynamical patterns over multiple timescales?
RQ2: Do the dynamical patterns reflect demographic differences?
RQ3: What demographic factors characterize the expenditure patterns?
To answer these research questions, we develop a non-negative tensor factorization (NTF) method to detect multi-timescale patterns of consumers' expenditure at intra- and inter-week scales. We employ the PARAFAC decomposition as a means to factorize a three-way tensor representing the actual expenditure data [6][7][8]. The NTF method has been widely used to mine temporal patterns in different social contexts, such as face-to-face contacts among humans [9,10], online communications [11], online games [12] and students' life in a university [13]. However, mining multi-timescale patterns has not been done so far, except for the study uncovering the intra- and inter-day transaction patterns of banks [14].
In our model, the (i, j, k)-th element of a tensor corresponds to the number of items purchased by consumer i on the j-th day of week k. The NTF allows us to know how the intra-week expenditure behavior is associated with the inter-week patterns and how many such multi-timescale patterns exist. We argue that different multi-timescale patterns may come from different demographic characteristics of consumers, such as gender, marital status and age. This suggests that people in different stages of life indeed spend differently at both intra- and inter-week scales.

Related Work
Maximizing aggregate consumption is a primary goal for policymakers and is considered to contribute to social welfare [15,16]. Economists often model consumer behavior as a solution to a utility maximization problem with infinite horizon [16][17][18][19]. Using a formal framework based on a utility maximization problem, economists have been discussing how consumers form and follow consumption habits [20,21], including whether or not such an explicit dynamical pattern exists [21][22][23][24][25][26]. Various studies also point out that consumption patterns tend to change according to the consumer's stage of life [3][4][5].
Marketing scientists study consumer behavior from a more business-oriented viewpoint. For instance, they model the expenditure pattern of targeted consumers to predict the effect of a business strategy, such as a recommendation system, on actual consumption [27]. Models of consumer behavior incorporate various factors, including the structure of consumers' network [28,29], self-revealed information in social media [30,31], and spatial information regarding the consumer's geographical location [32]. Among many factors that could explain the observed consumption patterns, the sequence of temporal actions has been particularly studied to understand consumers' dynamic behavior [33][34][35][36][37]. A dynamical model has also been used to predict consumers' future activity [38]. Notably, some studies point out that there are temporal patterns of shopping activity at the intra-week scale, i.e., day-of-week effects [39][40][41].
In this study, we employ a non-negative tensor factorization (NTF) method [7,8] to uncover hidden patterns in our receipt data. We represent consumers' expenditure data as a 3-way tensor, which will be detailed in the following section. NTF is widely used to mine temporal patterns in face-to-face contacts [9,10], financial transactions [14], online communications [11] and online games [12]. Based on the decomposed patterns from our consumption data, we show that consumers with different demographics have different consumption patterns.

Data
Our dataset is constructed from the receipt data scanned through a bookkeeping smartphone application, Dr. Wallet [42]. This application allows users to digitize the record of their purchases by scanning receipts using smartphones or tablet PCs. Item names listed in receipts are annotated and documented by human workers. The dataset contains the prices, the name of each item and the date when the receipt was scanned. There are in total 2,796,008 purchased items recorded by 2,624 users from April 1, 2017 to January 21, 2018. The data also contain the demographic attributes of the users such as gender, marital status and age range. Table 1 shows the basic statistics and the demography of users.

Tensor representation of consumption expenditure
Our study aims to detect dynamical patterns from our shopping record dataset. To pursue this goal, we use a non-negative tensor factorization (NTF) to obtain the latent factors that would reflect the characteristic expenditure patterns across different attributes of consumers [7][8][9][12]. Here we try to extract multi-timescale patterns that would exist at intra- and inter-week scales [14]. We represent the users' shopping records by a 3-way tensor of size I × J × K, where I is the number of consumers (2,624), J is the number of days in a week (7) and K is the number of weeks (42). The constructed 3-way tensor can be interpreted as a sequence of weekly bipartite networks, in each of which the nodes denoting the days of the week are connected to users, with edge weights given by the number of purchased items (Fig. 1).
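The tensor construction described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout `(user_id, purchase_date, n_items)`, the function name `build_tensor`, and the choice to count weeks as consecutive 7-day blocks from the first day of the data period are all assumptions made for the example.

```python
import numpy as np
from datetime import date

def build_tensor(records, users, start, n_weeks):
    """Build an I x J x K count tensor from purchase records.

    records : iterable of (user_id, purchase_date, n_items) tuples (assumed layout)
    users   : list of user ids, fixing the order of the first mode
    start   : first day of the data period; weeks are 7-day blocks from here (assumption)
    """
    user_index = {u: i for i, u in enumerate(users)}
    X = np.zeros((len(users), 7, n_weeks))
    for user_id, day, n_items in records:
        k = (day - start).days // 7          # inter-week index
        if 0 <= k < n_weeks:                 # drop records outside the period
            j = day.weekday()                # intra-week index, 0 = Monday
            X[user_index[user_id], j, k] += n_items
    return X

# Toy usage with three hypothetical records.
records = [
    ("u1", date(2017, 4, 1), 3),
    ("u1", date(2017, 4, 1), 2),
    ("u2", date(2017, 4, 10), 4),
]
X_demo = build_tensor(records, users=["u1", "u2"], start=date(2017, 4, 1), n_weeks=42)
print(X_demo.shape)  # (2, 7, 42)
```

Each entry of the resulting array plays the role of one tensor element: items bought by a user on a given day of the week in a given week.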

Non-negative tensor factorization
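The NTF method decomposes a tensor $\mathcal{X} \in \mathbb{R}_{+}^{I \times J \times K}$ into latent factors that characterize the activity patterns of the corresponding mode. Each element of the tensor is denoted by $x_{ijk}$. In our model, $x_{ijk}$ denotes the number of items purchased by user $i$ on the $j$-th day of week $k$. We employ the PARAFAC decomposition as an NTF algorithm throughout the analysis [6,7]. The PARAFAC decomposition is an approximation method that expresses $\mathcal{X}$ as a sum of rank-one non-negative tensors $\{\mathcal{X}_r\}_{r=1}^{R}$:

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathcal{X}_r = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r, \tag{1}$$

where $R$ denotes the number of components, and $\mathbf{a}_r \in \mathbb{R}_{+}^{I}$, $\mathbf{b}_r \in \mathbb{R}_{+}^{J}$ and $\mathbf{c}_r \in \mathbb{R}_{+}^{K}$ represent the $r$-th component factors that respectively encode the membership of a user to a component, and the intra- and inter-week activity levels. The operator $\circ$ represents the outer product. Let $\mathbf{A} \in \mathbb{R}_{+}^{I \times R}$, $\mathbf{B} \in \mathbb{R}_{+}^{J \times R}$ and $\mathbf{C} \in \mathbb{R}_{+}^{K \times R}$ be the factor matrices, whose $r$-th columns are the vectors $\mathbf{a}_r$, $\mathbf{b}_r$ and $\mathbf{c}_r$, respectively. The factor matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are obtained by solving the following minimization problem with non-negativity constraints:

$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{C} \geq 0} \left\| \mathcal{X} - [\![ \mathbf{A}, \mathbf{B}, \mathbf{C} ]\!] \right\|_F^2,$$

where $\| \cdot \|_F$ denotes the Frobenius norm, and $[\![ \mathbf{A}, \mathbf{B}, \mathbf{C} ]\!]$ represents the Kruskal form of the tensor decomposition (i.e., the right-hand side of Eq. 1). To solve this problem, we use the alternating non-negative least squares (ANLS) method with block principal pivoting (BPP) [43].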
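To make the minimization problem concrete, the sketch below fits a non-negative PARAFAC model with plain NumPy. Note that it swaps in simple multiplicative updates in place of the ANLS-BPP solver used in this study, because they fit in a few lines; the function names `khatri_rao`, `ntf_parafac` and `reconstruct`, the iteration count and the random initialization are assumptions for illustration only.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u * V.shape[0] + v) holds U[u] * V[v]."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def reconstruct(A, B, C):
    """Kruskal form: sum over r of the outer products a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def ntf_parafac(X, R, n_iter=500, eps=1e-12, seed=0):
    """Non-negative PARAFAC via multiplicative updates (not ANLS-BPP)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    # Mode-wise unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.transpose(X, (1, 0, 2)).reshape(J, I * K)
    X3 = np.transpose(X, (2, 0, 1)).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update keeps its factor non-negative and does not increase the loss.
        A *= (X1 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= (X2 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= (X3 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C
```

Multiplicative updates typically converge more slowly than ANLS-BPP, so this sketch is meant for understanding the objective rather than for reproducing the reported results.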

Number of components
We utilize the Core-Consistency Diagnostic to determine an appropriate number of components, R [6]. The basic idea of the Core-Consistency measure is to quantify the difference between the PARAFAC decomposition and a more general decomposition, namely the Tucker3 decomposition [6]. The Tucker3 decomposition is more flexible than PARAFAC because it allows for interactions between different components. If PARAFAC and Tucker3 return similar decompositions, then the PARAFAC model is considered a good approximation of the original tensor (i.e., ignoring interactions among components would be harmless).
For the PARAFAC decomposition, the $(i, j, k)$ element of the tensor can be written as

$$x_{ijk} \approx \sum_{n=1}^{R} \sum_{m=1}^{R} \sum_{p=1}^{R} \lambda_{nmp} \, a_{in} b_{jm} c_{kp},$$

where $\lambda_{nmp}$ denotes a product of Kronecker deltas, i.e., $\lambda_{nmp} = \delta_{nm} \delta_{mp} \delta_{np}$, where $\delta_{nm}$ is the Kronecker delta that takes 1 if $n = m$, and 0 otherwise. Note that $\lambda_{nmp}$ takes 1 if $n = m = p$ and 0 otherwise, so $\lambda_{nmp}$ is the $(n, m, p)$ element of the superdiagonal binary tensor $\mathcal{L}$. For the Tucker3 model, the $(i, j, k)$ element of the tensor is generally written as

$$x_{ijk} \approx \sum_{n=1}^{R} \sum_{m=1}^{R} \sum_{p=1}^{R} g_{nmp} \, a_{in} b_{jm} c_{kp},$$

where $g_{nmp}$ may not be expressed by a product of Kronecker deltas. $g_{nmp}$ is an element of the core tensor obtained by the Tucker3 algorithm [7].
The Core-Consistency (CC) quantifies the distance between the PARAFAC and the Tucker3 decompositions as

$$\mathrm{CC} = 100 \left( 1 - \frac{\sum_{n=1}^{R} \sum_{m=1}^{R} \sum_{p=1}^{R} \left( g_{nmp} - \lambda_{nmp} \right)^2}{R} \right).$$

Note that the same number of components $R$ is used for all modes. If the PARAFAC and the Tucker3 methods yield exactly the same decomposition, then CC = 100 [6]. In general, the CC value decreases with $R$ because interactions between components tend to be more evident as the number of components increases.
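The diagnostic can be illustrated with a short NumPy sketch. It assumes that the Tucker3 core is obtained as the least-squares core for the fixed PARAFAC factor matrices (computed here with pseudo-inverses), which is one common way of evaluating the measure; the function name `core_consistency` is hypothetical.

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Core consistency (in percent) of a PARAFAC fit with factors A, B, C."""
    R = A.shape[1]
    # Least-squares Tucker3 core for fixed factors: G = X x1 A^+ x2 B^+ x3 C^+.
    G = np.einsum(
        "ijk,ni,mj,pk->nmp",
        X, np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C),
    )
    # Superdiagonal binary tensor: 1 where n = m = p, 0 elsewhere.
    L = np.zeros((R, R, R))
    idx = np.arange(R)
    L[idx, idx, idx] = 1.0
    # The sum of squared entries of L equals R, hence the denominator.
    return 100.0 * (1.0 - np.sum((G - L) ** 2) / R)
```

Under these assumptions, a tensor that is exactly of PARAFAC form with full-column-rank factors gives a value of 100, and interactions between components push the value down, possibly below zero.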

Core-Consistency
The CC values for our NTF results with different ranks R are shown in Fig. 2, for which we run the PARAFAC decomposition 20 times for each R. The result indicates that R = 3 would be the best choice because the CC value is larger than a rule-of-thumb threshold (= 85) [10] up to R = 3 and turns negative for R = 4.

Multi-timescale expenditure patterns
We first examine whether the shopping activities have different dynamical patterns by looking at the components of day-of-week and weekly activities (RQ1). The r-th columns of factor matrices B and C contain the day-of-week and weekly activity patterns of Component r, respectively. For R = 3, we find three distinctive day-of-week expenditure patterns from matrix B (Fig. 3a). Each pattern is characterized by the days of the week on which activity is concentrated, namely weekdays, Saturday or Sunday. This suggests that the users' expenditure behavior during a week is characterized by one of these three patterns or a combination of them. Similarly, weekly patterns can be extracted from C (Fig. 3b). The activity level of Component 2 (i.e., weekday-shopping pattern) is the highest among the three and relatively stable except for the last 5 weeks, which correspond to the year end. The activity levels of Components 1 (i.e., Sunday-shopping pattern) and 3 (i.e., Saturday-shopping pattern) are lower than that of Component 2 throughout the data period, while the activity of Component 1 is slightly more volatile than that of Component 3.

Expenditure patterns and demographic differences
To address RQ2, we group the users based on their activities and see if each group has a characteristic demographic property. We use the factor matrix A obtained by the PARAFAC decomposition, on which we implement the k-medoids and the k-means methods to quantify the belongingness of user i to each component. We compare the two clustering methods by the silhouette score [44] (Figs. S1 and S2 in Supplementary Information (SI)).
We find that the k-medoids method gives us more evenly sized clusters compared to the k-means method. We select the number of clusters k = 5, because the rate at which the sum of distances between points in a cluster and the medoid decreases slows down around k = 5 (Fig. S3 in SI). Below, we also show the results for which the consumers are grouped based on a threshold value.
To visualize the clustering result based on the k-medoids, we project the factor matrix A onto a two-dimensional space by exploiting the t-SNE embedding [45] (Fig. 4). The t-SNE is a visualization technique that allows us to convert high-dimensional data into low-dimensional vectors [45]. Note that each consumer is classified by the k-medoids into one of the five non-overlapping groups based on their belongingness to each component quantified by matrix A.
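As an illustration of the clustering step, the following sketch groups the rows of a user factor matrix with a simple alternating k-medoids procedure. The specific k-medoids variant, the Euclidean distance and the random initialization are assumptions, since the text does not specify them, and the name `k_medoids` is hypothetical. The returned cost is the sum of distances between points and their medoid, the quantity used to choose k.

```python
import numpy as np

def k_medoids(points, k, n_iter=100, seed=0):
    """Alternating k-medoids on the rows of `points` (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # Full pairwise distance matrix; fine for a few thousand users.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:                          # keep old medoid if empty
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break                                     # converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    cost = D[np.arange(n), medoids[labels]].sum()     # sum of point-to-medoid distances
    return labels, medoids, cost
```

Running the function for a range of k values and plotting the cost against k would give an elbow curve of the kind described above. Like most k-medoids heuristics, this procedure can stop at a local optimum, so several seeds may be worth trying.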

Characterizing clusters based on the demographic properties
Different multi-timescale expenditure patterns would reflect the users' demographic characteristics because the status of a consumer (i.e., age, gender, marital status, etc.) might determine, at least partially, the timing of shopping and the variety of items purchased. Here, we compare the demographic characteristics among the five clusters identified by the k-medoids method. Fig. 5 indicates that each user cluster is characterized by some demographic properties. Typical examples can be found in Cluster 1 and Cluster 4. Cluster 1 consists of relatively young consumers having no children, while Cluster 4 appears to be formed mainly by married elderly women who have children. We use the chi-squared test to see if the demographic distribution in each cluster is significantly different from the null distribution obtained from the original demographic structure. The chi-squared statistic is given by the sum of squared differences between the number of users identified by the k-medoids method and the expected number under the null hypothesis, each scaled by the expected number:

$$\chi^2 = \sum_{m} \sum_{\ell} \frac{\left( D_m^{\ell} - E_m^{\ell} \right)^2}{E_m^{\ell}},$$

where $D_m^{\ell}$ denotes the observed number of consumers in category $\ell$ (i.e., Male, Female, etc.) for Cluster $m$, and $E_m^{\ell}$ is the expected number of consumers in category $\ell$ for Cluster $m$ under the null [46].
The results from the chi-squared tests suggest that for each demographic attribute (i.e., gender, age, marital status and child), the distribution of users identified by the clustering method is significantly different from the null distribution (p < 0.001). We also test whether there is a statistical difference in the distribution of users between two particular clusters. We conduct the statistical tests for all the pairwise combinations between different clusters. For all the demographic attributes, the null hypothesis is rejected for most of the pairs of clusters (Table S1 in SI).
Lastly, we answer RQ3 by focusing on representative users in each component, who are selected based on their belongingness to a component. Since the representative users in a given component would share similar demographic characteristics, we could identify which component is associated with which demographic properties.
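A chi-squared comparison of this kind can be sketched for a table of user counts by cluster and demographic category. The sketch assumes that the expected counts under the null are obtained by giving every cluster the overall demographic shares, i.e., the standard contingency-table form; the function name `chi_squared_statistic` and the toy counts are hypothetical.

```python
import numpy as np

def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a clusters x categories count table."""
    observed = np.asarray(observed, dtype=float)
    cluster_sizes = observed.sum(axis=1, keepdims=True)
    category_totals = observed.sum(axis=0, keepdims=True)
    # Expected count: cluster size times the overall share of the category.
    expected = cluster_sizes * category_totals / observed.sum()
    stat = np.sum((observed - expected) ** 2 / expected)
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof, expected

# Toy table: two clusters (rows) by two categories (columns), made-up counts.
stat, dof, expected = chi_squared_statistic([[30, 10], [10, 30]])
print(stat, dof)  # 20.0 1
```

A p-value could then be obtained from the chi-squared survival function with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, dof)` if SciPy is available.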
We detect R (= 3) groups of representative users according to the following threshold rule: User $i$ is considered to belong to group $r$ if $a_{ir} / \sum_{r'} a_{ir'} \geq h_r$, where the threshold $h_r$ is chosen such that only the upper 10 percent of users belong to group $r$. Fig. 6 shows the demographic distributions of the representative users belonging to each component. We note that each user may belong to multiple components, but such overlap is quite small (Fig. S4 in SI).
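The threshold rule can be sketched as follows, together with a Jaccard index for the overlap between two groups. Setting each threshold to the 90th percentile of the normalized memberships is an assumption about how the upper 10 percent is determined, and the function names `representative_users` and `jaccard` are hypothetical.

```python
import numpy as np

def representative_users(A, top_share=0.10):
    """Boolean mask (users x components) of representative users.

    Assumes every row of A has a positive sum.
    """
    shares = A / A.sum(axis=1, keepdims=True)        # a_ir / sum over r' of a_ir'
    # One threshold per component, so that roughly the top share passes.
    thresholds = np.quantile(shares, 1.0 - top_share, axis=0)
    return shares >= thresholds, thresholds

def jaccard(mask_a, mask_b):
    """Jaccard index between two boolean membership vectors."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return np.logical_and(mask_a, mask_b).sum() / union
```

With a factor matrix `A` in hand, `mask, h = representative_users(A)` yields one column of group membership per component, and `jaccard(mask[:, 0], mask[:, 1])` measures how many users two groups share.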
We find that "Marital status" and "Child" are two demographic properties that distinguish Component 2 (Weekday-shopping pattern) from the other components (Fig. 6c and d). For these two family-related attributes, the demographic distribution of the representative consumers in Component 2 is clearly different from the null distribution. This finding suggests that "Marital status" and "Child" would be the two driving factors that yield the five clusters detected by the k-medoids. On the other hand, the difference in user age between clusters seems to be reflected more in the activity of Components 1 (Sunday-shopping pattern) and 3 (Saturday-shopping pattern) than in that of Component 2 (Fig. 6b), while no clear difference is seen for gender (Fig. 6a). This means that gender and user age may be less important in extracting the multi-timescale patterns and in the emergence of the clusters classified by them.

Conclusion
We have presented an NTF-based method to extract dynamical shopping patterns of consumers from scanned receipt data collected through a bookkeeping application. The proposed method allows us to find intra- and inter-week expenditure patterns simultaneously, which would be impossible without such a large, high-resolution and long time-series dataset. We found three multi-timescale patterns, each of which captures a characteristic expenditure behavior that is seen at daily and weekly scales. While our method successfully revealed explicit patterns, there remain some issues that need to be addressed in future research. First, there may be other multi-timescale activity patterns that exist at time scales shorter and/or longer than daily and weekly. For instance, the timing of shopping may be affected by the time of day, and consumption of expensive goods (e.g., cars) may be scheduled once every ten years. Second, consumption patterns could also be encoded in what consumers purchase. While our analysis is based on the number of items purchased by a user, the composition of purchases would also be useful for revealing the demographic characteristics of users. Third, more multi-timescale patterns may exist in other economic and social contexts, such as financial markets, online communication networks and face-to-face networks. NTF is a useful and user-friendly tool for the detection of multi-timescale properties, and we hope our work will stimulate further research on many economic and social activities to better understand human behavior.

Figure 1. Schematic of NTF for extracting intra- and inter-week expenditure patterns.

Figure 3. Activity at different timescales. (a) Day-of-week (i.e., intra-week) activity of each component. Activity of Component $r$ on day $j$ is given by $b_{jr}$. (b) Weekly (i.e., inter-week) activity of each component. Activity of Component $r$ in week $k$ is given by $c_{kr}$.

Figure 4. Clustering of users. The user feature vectors obtained from factor matrix A are visualized through t-Distributed Stochastic Neighbor Embedding (t-SNE).

Figure 5. Demographic distribution for each cluster. (a) Gender, (b) Age range, from 1 (youngest) to 6 (oldest), (c) marital status, and (d) share of users who have or do not have children.

Figure 6. Demographic distribution of representative users in each component. User $i$ belongs to group $r$ if $a_{ir} / \sum_{r'} a_{ir'} \geq h_r$. (a) Gender, (b) Age range, from 1 (youngest) to 6 (oldest), (c) marital status, and (d) share of users who have or do not have a child.

Figure S3. Sum of distances between points in a cluster and the medoid. We select k = 5 for the analysis.

Figure S4. Jaccard index for the overlap of users belonging to multiple components.

Table 1. Basic statistics of receipt data collected from Dr. Wallet between April 1, 2017 and January 21, 2018. Total number of purchased items is 2,796,008. Age range is in ascending order, i.e., 1 and 6 denote the youngest and the oldest cohorts, respectively.

Table S1. Chi-squared test for demographic difference between clusters.