Trans2Vec: Learning Transaction Embedding via Items and Frequent Itemsets
Learning meaningful and effective representations for transaction data is a crucial prerequisite for transaction classification and clustering tasks. Traditional methods which use frequent itemsets (FIs) as features often suffer from the data sparsity and high-dimensionality problems. Several supervised methods based on discriminative FIs have been proposed to address these disadvantages, but they require transaction labels, thus rendering them inapplicable to real-world applications where labels are not given. In this paper, we propose an unsupervised method which learns low-dimensional continuous vectors for transactions based on information of both singleton items and FIs. We demonstrate the superior performance of our proposed method in classifying transactions on four datasets compared with several state-of-the-art baselines.
This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning. Tu Dinh Nguyen gratefully acknowledges the partial support from the Australian Research Council (ARC).
- 1.Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: KDD, pp. 80–86 (1998)Google Scholar
- 4.Chen, M.: Efficient vector representation for documents through corruption. In: ICLR 2017 (2017)Google Scholar
- 5.Cheng, H., Yan, X., Han, J., Hsu, C.-W.: Discriminative frequent pattern analysis for effective classification. In: ICDE, pp. 716–725 (2007)Google Scholar
- 6.Fournier-Viger, P., Lin, J.C.-W., Vo, B., Chi, T.T., Zhang, J., Le, H.B.: A survey of itemset mining. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 7(4), e1207 (2017)Google Scholar
- 7.Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864 (2016)Google Scholar
- 10.Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)Google Scholar
- 11.Li, W., Han, J., Pei, J.: CMAR: accurate and efficient classification based on multiple class-association rules. In: ICDM, pp. 369–376. IEEE (2001)Google Scholar
- 12.Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)Google Scholar
- 13.Nguyen, D., Luo, W., Phung, D., Venkatesh, S.: Control matching via discharge code sequences. In: NIPS 2016 Workshop on Machine Learning for Health (2016)Google Scholar
- 14.Phan, X.-H., Nguyen, L.-M., Horiguchi, S.: Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, pp. 91–100 (2008)Google Scholar
- 15.Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: ACL, pp. 1702–1712 (2015)Google Scholar