An Apache Giraph Implementation of Distributed ADMM for Solving LASSO Problems

Agrawal, Rohit; Shastri, Aditya A.; Ahuja, Kapil; Perreard, Antoine; Gujral, Juniper

doi:10.1007/978-981-16-2712-5_44

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1393)

Abstract

Convex formulations of optimization problems are gaining importance in many engineering fields such as signal/image processing, machine learning, the theory of structured sparsity, and rank minimization. The Alternating Direction Method of Multipliers (ADMM) is commonly used to solve convex optimization problems. As data volumes keep growing, distributed and high-performance algorithms for solving such problems have become a necessity. In the current literature, distributed ADMM is implemented using the Message Passing Interface (MPI), which does not scale well as the size of the data increases. Our main goal here is to propose an Apache Giraph-based implementation (on Hadoop) of distributed ADMM for solving the LASSO (Least Absolute Shrinkage and Selection Operator) formulation of convex optimization problems. Our most novel contribution is exploiting the distributed nature of our algorithm to obtain the inverse of a matrix cheaply. Experimental results on randomly generated datasets show that our implementation converges in three iterations and in about 30 s for a problem of size \(1.2 \times 10^9\). This is much more efficient than an MPI-based implementation, which takes four times as many iterations and ten times as long as our algorithm.
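To make the distributed ADMM idea concrete, here is a minimal serial simulation of the standard global-consensus ADMM for LASSO, minimize ½‖Ax − b‖² + λ‖x‖₁, as described by Boyd et al. (2010). It is a sketch in Python with NumPy, not the authors' Giraph code. The function names, the row-wise split of (A, b) into one block per worker, and the parameter defaults are assumptions for illustration. It shows one plausible reading of "obtaining the inverse of a matrix cheaply": each worker inverts only its own small n × n matrix, once, and reuses it in every iteration. The preview does not say whether this matches the chapter's actual technique.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def consensus_admm_lasso(A_blocks, b_blocks, lam, rho=1.0, max_iter=1000,
                         eps_abs=1e-6, eps_rel=1e-4):
    """Global-consensus ADMM (scaled form) for
        minimize 0.5 * ||A x - b||_2^2 + lam * ||x||_1,
    where the rows of (A, b) are split into blocks (A_i, b_i), one per worker.
    Hypothetical helper: workers are simulated sequentially here.
    Returns (z, number of iterations used)."""
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    # Each worker inverts only its own small n x n matrix, once, before the
    # loop; the cached inverse is reused in every x-update.
    inv = [np.linalg.inv(Ai.T @ Ai + rho * np.eye(n)) for Ai in A_blocks]
    Atb = [Ai.T @ bi for Ai, bi in zip(A_blocks, b_blocks)]
    x = np.zeros((N, n))  # local primal variables, one row per worker
    u = np.zeros((N, n))  # scaled dual variables, one row per worker
    z = np.zeros(n)       # global consensus variable
    for k in range(1, max_iter + 1):
        # x-update: independent per worker (parallel in a real deployment).
        for i in range(N):
            x[i] = inv[i] @ (Atb[i] + rho * (z - u[i]))
        # z-update: average over workers (a reduce step), then soft-threshold.
        z_old = z
        z = soft_threshold((x + u).mean(axis=0), lam / (rho * N))
        # u-update: independent per worker.
        u += x - z
        # Primal/dual residual norms and stopping tolerances (Boyd et al., Sec. 3.3).
        r_norm = np.linalg.norm(x - z)
        s_norm = np.sqrt(N) * rho * np.linalg.norm(z - z_old)
        eps_pri = np.sqrt(N * n) * eps_abs + eps_rel * max(
            np.linalg.norm(x), np.sqrt(N) * np.linalg.norm(z))
        eps_dual = np.sqrt(N * n) * eps_abs + eps_rel * rho * np.linalg.norm(u)
        if r_norm <= eps_pri and s_norm <= eps_dual:
            return z, k
    return z, max_iter


if __name__ == "__main__":
    # Small synthetic example: 200 x 20 problem split across 4 workers.
    rng = np.random.default_rng(0)
    m, n, N = 200, 20, 4
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[:3] = [1.5, -2.0, 0.7]
    b = A @ x_true + 0.01 * rng.standard_normal(m)
    lam = 0.1 * np.max(np.abs(A.T @ b))
    z, iters = consensus_admm_lasso(np.array_split(A, N), np.array_split(b, N),
                                    lam, rho=50.0)
    print(iters, np.round(z, 3))
```

Each worker touches only its own rows of A, and the only global communication per iteration is the average in the z-update, which is why this scheme plausibly maps onto a message-passing framework. The explicit cached inverse is used only to mirror the abstract's wording; for larger n, caching a Cholesky factorization is the usual, numerically more stable choice.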

Notes

1. MapReduce takes tens of iterations, and the time runs into minutes.
2. Please refer to [1, 11] for a step-by-step evaluation of r and s.
3. The dataset could not be exactly matched because of different formats.
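For orientation, in the global-consensus form of ADMM given by Boyd et al. (2010), with N workers holding local copies \(x_i\), a consensus variable \(z\), scaled duals \(u_i\), penalty \(\rho\), and \(x_i \in \mathbb{R}^n\), the residuals r and s and the usual stopping test are as follows. This assumes the consensus formulation, which this preview does not confirm the chapter uses.

```latex
\|r^{k}\|_2^2 = \sum_{i=1}^{N} \|x_i^{k} - z^{k}\|_2^2, \qquad
\|s^{k}\|_2^2 = N\rho^2 \,\|z^{k} - z^{k-1}\|_2^2,

\text{stop when } \|r^{k}\|_2 \le \epsilon^{\mathrm{pri}} \text{ and } \|s^{k}\|_2 \le \epsilon^{\mathrm{dual}}, \text{ where}

\epsilon^{\mathrm{pri}} = \sqrt{nN}\,\epsilon^{\mathrm{abs}}
  + \epsilon^{\mathrm{rel}} \max\Big\{ \big(\textstyle\sum_{i} \|x_i^{k}\|_2^2\big)^{1/2},\ \sqrt{N}\,\|z^{k}\|_2 \Big\},
\qquad
\epsilon^{\mathrm{dual}} = \sqrt{nN}\,\epsilon^{\mathrm{abs}}
  + \epsilon^{\mathrm{rel}}\, \rho \big(\textstyle\sum_{i} \|u_i^{k}\|_2^2\big)^{1/2}.
```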

References

[1] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn 3(1):1–122
[2] Wei E, Ozdaglar A (2012) Distributed alternating direction method of multipliers. Proceedings of the 51st IEEE conference on decision and control, IEEE, pp 1–6
[3] Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
[4] Nemade V, Shastri A, Ahuja K, Tiwari A (2018) Scaled and projected spectral clustering with vector quantization for handling big data. Proceedings of the 9th IEEE symposium series on computational intelligence (SSCI), IEEE, pp 2174–2179
[5] Message Passing Interface Forum (2012) MPI: a message-passing interface standard, version 3.0. Chapter author for collective communication, process topologies, and one sided communications
[6] Afonso M, Bioucas-Dias J, Figueiredo M (2010) Fast image recovery using variable splitting and constrained optimization. IEEE Trans Image Process 19(9):2345–2356
[7] Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical LASSO. Biostatistics 9(3):432–441
[8] Apache Giraph. http://giraph.apache.org/. Accessed August 2020
[9] Agrawal R, Ahuja K, Hoo CH, Nguyen TDA, Kumar A (2019) ParaLarPD: parallel FPGA router using primal-dual sub-gradient method. Electronics 8(12):1439–1454
[10] Agrawal R, Ahuja K, Maheshwari D, Kumar A (2020) ParaLarH: parallel FPGA router based upon Lagrange heuristics. arXiv:2010.11893
[11] Wohlberg B (2017) ADMM penalty parameter selection by residual balancing. arXiv:1704.06209
[12] Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B (Methodological) 58(1):267–288
[13] Mateos G, Bazerque J, Giannakis G (2010) Distributed sparse linear regression. IEEE Trans Signal Process 58(10):5262–5276
[14] He B, Yang H, Wang S (2000) Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J Optim Theory Appl 106(2):337–356
[15] Microsoft Azure. Install Giraph on HDInsight Hadoop clusters, and use Giraph to process large-scale graphs. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-giraph-install-linux. Accessed August 2018
[16] Microsoft Azure. Node configuration. https://docs.microsoft.com/en-us/rest/api/automation/dscnodeconfiguration/createorupdate. Accessed August 2018

Author information

Authors and Affiliations

Data and Computational Sciences Lab, Indian Institute of Technology, Indore, 453552, India
Rohit Agrawal, Aditya A. Shastri & Kapil Ahuja
Formerly with EISTI, Cergy, France
Antoine Perreard
Formerly with Indian Institute of Technology, Indore, 453552, India
Juniper Gujral

Corresponding author

Correspondence to Kapil Ahuja.

Editor information

Editors and Affiliations

Computer Science and Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
Aruna Tiwari
Computer Science and Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
Kapil Ahuja
Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India
Anupam Yadav
Department of Mathematics, South Asian University, New Delhi, India
Jagdish Chand Bansal
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, India
Kusum Deep
School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
Atulya K. Nagar

About this paper

Cite this paper

Agrawal, R., Shastri, A.A., Ahuja, K., Perreard, A., Gujral, J. (2021). An Apache Giraph Implementation of Distributed ADMM for Solving LASSO Problems. In: Tiwari, A., Ahuja, K., Yadav, A., Bansal, J.C., Deep, K., Nagar, A.K. (eds) Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 1393. Springer, Singapore. https://doi.org/10.1007/978-981-16-2712-5_44

DOI: https://doi.org/10.1007/978-981-16-2712-5_44
Published: 14 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2711-8
Online ISBN: 978-981-16-2712-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)