Fast block-wise partitioning for extreme multi-label classification

Data Mining and Knowledge Discovery

Abstract

Extreme multi-label classification aims to learn a classifier that annotates an instance with a relevant subset of labels from an extremely large label set. Many existing solutions embed the label matrix in a low-dimensional linear subspace, or examine the relevance of a test instance to every label via a linear scan. In practice, however, these approaches can be computationally prohibitive. To alleviate this drawback, we propose a Block-wise Partitioning (BP) pretreatment that divides all instances into disjoint clusters and attaches to each cluster the subset of labels most frequently tagged within it. A separate multi-label classifier is trained on each pair of instance cluster and label subset, and the label set of a test instance is predicted by first routing it to the most appropriate instance cluster. Experiments on benchmark multi-label data sets show that the BP pretreatment significantly reduces prediction time while retaining almost the same level of prediction accuracy.
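The pipeline outlined in the abstract (cluster the instances, keep each cluster's most frequent labels, train one classifier per block, route a test instance to one block) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which clustering method or base classifier the paper uses, so plain k-means and a nearest-label-centroid scorer stand in here as assumptions. The names `fit_bp`, `predict_bp`, and `labels_per_block` are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))

def kmeans(X, k, iters=20):
    """Plain k-means; assumed stand-in for the paper's clustering step."""
    centers = [list(x) for x in X[:k]]  # deterministic init, for illustration only
    for _ in range(iters):
        assign = [nearest(x, centers) for x in X]
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return centers, [nearest(x, centers) for x in X]

def fit_bp(X, Y, k, labels_per_block):
    """X: list of feature vectors; Y: list of label sets, one per instance."""
    centers, assign = kmeans(X, k)
    blocks = []
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        # Attach the most frequently tagged labels to this instance cluster.
        counts = Counter(label for i in idx for label in Y[i])
        kept = [label for label, _ in counts.most_common(labels_per_block)]
        # Assumed stand-in classifier: one centroid per kept label,
        # computed only from the instances inside this cluster.
        blocks.append({label: mean([X[i] for i in idx if label in Y[i]])
                       for label in kept})
    return centers, blocks

def predict_bp(x, centers, blocks, top=1):
    """Route x to its nearest cluster, then rank only that block's labels."""
    block = blocks[nearest(x, centers)]
    return sorted(block, key=lambda label: sq_dist(x, block[label]))[:top]

# Toy usage with two well-separated groups of instances.
X = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
Y = [{"a"}, {"c"}, {"a", "b"}, {"c", "d"}]
centers, blocks = fit_bp(X, Y, k=2, labels_per_block=1)
print(predict_bp([0.2, 0.3], centers, blocks))  # ['a']
```

At prediction time only the labels attached to the selected block are scored, not the full label set. That restriction is the source of the speed-up the abstract describes.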

Data availability

All data are publicly available with references provided in the paper.

Code availability

The code can be obtained from the authors. It will also be made publicly available on GitHub once the paper is accepted for publication.

Notes

  1. These choices are adopted from the Extreme Classification Repository.

References

  • Agrawal R, Gupta A, Prabhu Y, Varma M (2013) Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages. In: Proceedings of the 22nd international conference on World Wide Web, ACM, pp 13–24

  • Babbar R, Schölkopf B (2017) Dismec: distributed sparse machines for extreme multi-label classification. In: Proceedings of the tenth ACM international conference on web search and data mining, ACM, pp 721–729

  • Babbar R, Schölkopf B (2019) Data scarcity, robustness and extreme multi-label classification. Mach Learn, 1–23

  • Bhatia K, Dahiya K, Jain H, Kar P, Mittal A, Prabhu Y, Varma M (2016) The extreme classification repository: multi-label datasets and code. URL http://manikvarma.org/downloads/XC/XMLRepository.html

  • Bhatia K, Jain H, Kar P, Varma M, Jain P (2015) Sparse local embeddings for extreme multi-label classification. Adv Neural Inf Process Syst, 730–738

  • Chang W-C, Jiang D, Yu H-F, Teo CH, Zhang J, Zhong K, Kolluri K, Hu Q, Shandilya N, Ievgrafov V et al (2021) Extreme multi-label learning for semantic matching in product search. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, 2643–2651

  • Chang W-C, Yu H-F, Zhong K, Yang Y, Dhillon IS (2020) Taming pretrained transformers for extreme multi-label text classification. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3163–3171

  • Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292

  • Dahiya K, Agarwal A, Saini D, Gururaj K, Jiao J, Singh A, Agarwal S, Kar P, Varma M (2021a) Siamesexml: siamese networks meet extreme classifiers with 100m labels. In: International conference on machine learning, PMLR, pp 2330–2340

  • Dahiya K, Saini D, Mittal A, Shaw A, Dave K, Soni A, Jain H, Agarwal S, Varma M (2021b) Deepxml: A deep extreme multi-label learning framework applied to short text documents. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 31–39

  • Day WH, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1:7–24

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B (Methodological) 39:1–22

  • Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: Computer vision and pattern recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, pp 248–255

  • Evron I, Moroshko E, Crammer K (2018) Efficient loss-based decoding on graphs for extreme classification. Adv Neural Inf Process Syst, 31

  • Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874

  • Gupta V, Wadbude R, Natarajan N, Karnick H, Jain P, Rai P (2019) Distributional semantics meets multi-label learning. Proc AAAI Conf Artif Intell 33:3747–3754

  • Hsu DJ, Kakade SM, Langford J, Zhang T (2009) Multi-label prediction via compressed sensing. In: Advances in neural information processing systems, pp 772–780

  • Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc

  • Jain H, Balasubramanian V, Chunduri B, Varma M (2019) Slice: scalable linear extreme classifiers trained on 100 million labels for related searches. In: Proceedings of the twelfth ACM international conference on web search and data mining, pp 528–536

  • Jain H, Prabhu Y, Varma M (2016) Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 935–944

  • Jalan A, Kar P (2019) Accelerating extreme classification via adaptive feature agglomeration. In: Proceedings of the 28th international joint conference on artificial intelligence, pp 2600–2606

  • Jasinska K, Dembczynski K, Busa-Fekete R, Pfannschmidt K, Klerx T, Hullermeier E (2016) Extreme f-measure maximization using sparse probability estimates. In: International conference on machine learning, pp 1435–1444

  • Jiang T, Wang D, Sun L, Yang H, Zhao Z, Zhuang F (2021) Lightxml: transformer with dynamic negative sampling for high-performance extreme multi-label text classification. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 7987–7994

  • Khandagale S, Xiao H, Babbar R (2019) Bonsai-diverse and shallow trees for extreme multi-label classification. arXiv preprint arXiv:1904.08249

  • Khandagale S, Xiao H, Babbar R (2020) Bonsai: diverse and shallow trees for extreme multi-label classification. Mach Learn 109:2099–2119

  • Liu J, Chang W-C, Wu Y, Yang Y (2017) Deep learning for extreme multi-label text classification. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 115–124

  • McAuley J, Leskovec J (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM conference on recommender systems, ACM, pp 165–172

  • Mittal A, Dahiya K, Agrawal S, Saini D, Agarwal S, Kar P, Varma M (2021) Decaf: deep extreme classification with label features. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 49–57

  • Mittal A, Dahiya K, Malani S, Ramaswamy J, Kuruvilla S, Ajmera J, Chang K-h, Agarwal S, Kar P, Varma M (2022) Multi-modal extreme classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12393–12402

  • Nasierding G, Tsoumakas G, Kouzani AZ (2009) Clustering based multi-label classification for image annotation and retrieval. In: 2009 IEEE international conference on systems, man and cybernetics SMC , IEEE, pp 4514–4519

  • Niculescu-Mizil A, Abbasnejad E (2017) Label filters for large scale multilabel classification. In: Artificial intelligence and statistics, pp 1448–1457

  • Panos A, Dellaportas P, Titsias MK (2021) Large scale multi-label learning using gaussian processes. Mach Learn 110:965–987

  • Partalas I, Kosmopoulos A, Baskiotis N, Artieres T, Paliouras G, Gaussier E, Androutsopoulos I, Amini M-R, Galinari P (2015) Lshtc: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581

  • Prabhu Y, Kag A, Harsola S, Agrawal R, Varma M (2018) Parabel: partitioned label trees for extreme classification with application to dynamic search advertising. In: Proceedings of the 2018 world wide web conference, International world wide web conferences steering committee, pp 993–1002

  • Prabhu Y, Varma M (2014) Fastxml: a fast, accurate and stable tree-classifier for extreme multi-label learning. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 263–272

  • Qaraei M, Schultheis E, Gupta P, Babbar R (2021) Convex surrogates for unbiased loss functions in extreme classification with missing labels. In: Proceedings of the web conference 2021, pp 3711–3720

  • Si S, Zhang H, Keerthi SS, Mahajan D, Dhillon IS, Hsieh C-J (2017) Gradient boosted decision trees for high dimensional sparse output. In: International conference on machine learning, pp 3182–3190

  • Siblini W, Kuntz P, Meyer F (2018) Craftml, an efficient clustering-based random forest for extreme multi-label learning

  • Snoek CG, Worring M, Van Gemert JC, Geusebroek J-M, Smeulders AW (2006) The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of the 14th ACM international conference on multimedia, ACM, pp 421–430

  • Tagami Y (2017) Annexml: Approximate nearest neighbor search for extreme multi-label classification. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 455–464

  • Wei T, Tu W-W, Li Y-F, Yang G-P (2021) Towards robust prediction on tail labels. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp 1812–1820

  • Weston J, Makadia A, Yee H (2013) Label partitioning for sublinear ranking. In: International conference on machine learning, pp 181–189

  • Wetzker R, Zimmermann C, Bauckhage C (2008) Analyzing social bookmarking systems: a del.icio.us cookbook. In: Proceedings of the ECAI 2008 mining social data workshop, pp 26–30

  • Wydmuch M, Jasinska K, Kuznetsov M, Busa-Fekete R, Dembczynski K (2018) A no-regret generalization of hierarchical softmax to extreme multi-label classification. In: Advances in neural information processing systems, pp 6355–6366

  • Yen IE, Huang X, Dai W, Ravikumar P, Dhillon I, Xing E (2017) Ppdsparse: a parallel primal-dual sparse method for extreme classification. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 545–553

  • Yen I E-H, Huang X, Ravikumar P, Zhong K, Dhillon I (2016) Pd-sparse: a primal and dual sparse approach to extreme multiclass and multilabel classification. In: International conference on machine learning, pp 3069–3077

  • You R, Dai S, Zhang Z, Mamitsuka H, Zhu S (2018) Attentionxml: extreme multi-label text classification with multi-label attention based recurrent neural networks. arXiv preprint arXiv:1811.01727

  • Yu H-F, Jain P, Kar P, Dhillon I (2014) Large-scale multi-label learning with missing labels. In: International conference on machine learning, pp 593–601

  • Zubiaga A (2012) Enhancing navigation on wikipedia with social tags. arXiv preprint arXiv:1202.5469

Funding

Liang and Lee were supported by the National Science Foundation under Grants CCF-1934568, DMS-1916125 and DMS-2113605. Hsieh was supported by the National Science Foundation under Grants CCF-1934568, IIS-1901527 and IIS-2008173.

Author information

Contributions

All authors contributed to the development of the proposed method and the writing of the manuscript. YL carried out most of the numerical experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Thomas C. M. Lee.

Ethics declarations

Conflict of interest

The authors do not have any conflicts of interest/competing interests to declare.

Additional information

Responsible editor: Dragi Kocev.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, Y., Hsieh, CJ. & Lee, T.C.M. Fast block-wise partitioning for extreme multi-label classification. Data Min Knowl Disc 37, 2192–2215 (2023). https://doi.org/10.1007/s10618-023-00945-5

  • DOI: https://doi.org/10.1007/s10618-023-00945-5
