Trans2Vec: Learning Transaction Embedding via Items and Frequent Itemsets

  • Dang NguyenEmail author
  • Tu Dinh Nguyen
  • Wei Luo
  • Svetha Venkatesh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10939)


Learning meaningful and effective representations for transaction data is a crucial prerequisite for transaction classification and clustering tasks. Traditional methods which use frequent itemsets (FIs) as features often suffer from the data sparsity and high-dimensionality problems. Several supervised methods based on discriminative FIs have been proposed to address these disadvantages, but they require transaction labels, thus rendering them inapplicable to real-world applications where labels are not given. In this paper, we propose an unsupervised method which learns low-dimensional continuous vectors for transactions based on information of both singleton items and FIs. We demonstrate the superior performance of our proposed method in classifying transactions on four datasets compared with several state-of-the-art baselines.



This work is partially supported by the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning. Tu Dinh Nguyen gratefully acknowledges the partial support from the Australian Research Council (ARC).


  1. 1.
    Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: KDD, pp. 80–86 (1998)Google Scholar
  2. 2.
    Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)CrossRefGoogle Scholar
  3. 3.
    Chen, D., Sain, S.L., Guo, K.: Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J. Database Market. Customer Strategy Manag. 19(3), 197–208 (2012)CrossRefGoogle Scholar
  4. 4.
    Chen, M.: Efficient vector representation for documents through corruption. In: ICLR 2017 (2017)Google Scholar
  5. 5.
    Cheng, H., Yan, X., Han, J., Hsu, C.-W.: Discriminative frequent pattern analysis for effective classification. In: ICDE, pp. 716–725 (2007)Google Scholar
  6. 6.
    Fournier-Viger, P., Lin, J.C.-W., Vo, B., Chi, T.T., Zhang, J., Le, H.B.: A survey of itemset mining. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 7(4), e1207 (2017)Google Scholar
  7. 7.
    Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864 (2016)Google Scholar
  8. 8.
    He, Z., Feiyang, G., Zhao, C., Liu, X., Jun, W., Wang, J.: Conditional discriminative pattern mining: concepts and algorithms. Inf. Sci. 375, 1–15 (2017)CrossRefGoogle Scholar
  9. 9.
    Kameya, Y., Sato, T.: RP-growth: top-k mining of relevant patterns with minimum support raising. In: SDM, pp. 816–827. SIAM (2012)CrossRefGoogle Scholar
  10. 10.
    Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)Google Scholar
  11. 11.
    Li, W., Han, J., Pei, J.: CMAR: accurate and efficient classification based on multiple class-association rules. In: ICDM, pp. 369–376. IEEE (2001)Google Scholar
  12. 12.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)Google Scholar
  13. 13.
    Nguyen, D., Luo, W., Phung, D., Venkatesh, S.: Control matching via discharge code sequences. In: NIPS 2016 Workshop on Machine Learning for Health (2016)Google Scholar
  14. 14.
    Phan, X.-H., Nguyen, L.-M., Horiguchi, S.: Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, pp. 91–100 (2008)Google Scholar
  15. 15.
    Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: ACL, pp. 1702–1712 (2015)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Dang Nguyen
    • 1
    Email author
  • Tu Dinh Nguyen
    • 1
  • Wei Luo
    • 1
  • Svetha Venkatesh
    • 1
  1. 1.Center for Pattern Recognition and Data Analytics, School of Information TechnologyDeakin UniversityGeelongAustralia

Personalised recommendations