Scalable algorithms for signal reconstruction by leveraging similarity joins

  • Abolfazl Asudeh
  • Jees Augustine
  • Azade Nazi
  • Saravanan Thirumuruganathan
  • Nan Zhang
  • Gautam Das
  • Divesh Srivastava
Special Issue Paper


The signal reconstruction problem (SRP) is an important optimization problem in which the objective is to identify the solution to an underdetermined system of linear equations that is closest to a given prior. It has applications in diverse areas, including network traffic engineering, medical image reconstruction, acoustics, and astronomy. The most common approaches to SRP do not scale to large problem sizes. In this paper, we propose a sequence of optimization steps that yield scalable algorithms for the problem. First, we propose a dual formulation of the problem and develop the Direct algorithm, which is significantly more efficient than the state of the art. Second, we show how adapting database techniques developed for scalable similarity joins provides a significant speedup over Direct, scaling our proposal up to large-scale settings. Third, we describe a number of practical techniques that allow our algorithm to scale to settings on the order of a million by a billion. We also adapt our proposal to identify the top-k components of the solved system of linear equations. Finally, we consider the dynamic setting, in which the inputs to the linear system change, and propose efficient algorithms inspired by the database techniques of materialization and reuse. Extensive experiments on real-world and synthetic data confirm the efficiency, effectiveness, and scalability of our proposal.
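As a rough illustration of the problem statement, the sketch below solves a tiny SRP instance under two assumptions that the abstract does not state: "closest" is taken to mean Euclidean distance, and the rows of the system matrix A are linearly independent. Under those assumptions, the Lagrangian of the equality-constrained problem gives the closed form x = x' + Aᵀ(AAᵀ)⁻¹(b − Ax'), where x' is the prior. The function names `solve_linear` and `reconstruct` are hypothetical, and this is a small dense sketch, not the paper's Direct algorithm or its similarity-join speedup.

```python
def solve_linear(M, rhs):
    """Solve the square system M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    # Augmented matrix [M | rhs], copied so the inputs are left untouched.
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; rows of A must be linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (aug[i][n] - known) / aug[i][i]
    return y


def reconstruct(A, b, prior):
    """Return the x minimizing ||x - prior||_2 subject to A x = b (assumed L2 objective)."""
    m, n = len(A), len(prior)
    # How far the prior is from satisfying the measurements: r = b - A prior.
    residual = [b[i] - sum(A[i][j] * prior[j] for j in range(n)) for i in range(m)]
    # Gram matrix G = A A^T (m x m), built from pairwise dot products of the rows of A.
    gram = [[sum(A[i][j] * A[k][j] for j in range(n)) for k in range(m)] for i in range(m)]
    # Multipliers lam solve G lam = r; the correction to the prior is A^T lam.
    lam = solve_linear(gram, residual)
    return [prior[j] + sum(A[i][j] * lam[i] for i in range(m)) for j in range(n)]


# Two measurements over three unknowns: x0 + x1 = 10 and x1 + x2 = 8.
A = [[1, 1, 0],
     [0, 1, 1]]
b = [10, 8]
prior = [4, 4, 4]
x = reconstruct(A, b, prior)
```

In this toy instance the system has infinitely many solutions, and the prior selects one of them. Note that the only m-by-m work is building and solving with A Aᵀ, whose entries are dot products between pairs of rows of A; for a sparse 0/1 matrix each entry is the size of the overlap between two rows.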


Keywords: Signal reconstruction, Traffic reconstruction, Underdetermined systems, Scalable algorithm



This paper was supported in part by AT&T and the National Science Foundation (Grant Nos. 1343976, 1443858, 1624074, and 1760059).




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. University of Illinois at Chicago, Chicago, USA
  2. University of Texas at Arlington, Arlington, USA
  3. Google AI, Mountain View, USA
  4. QCRI, HBKU, Ar Rayyan, Qatar
  5. Pennsylvania State University, State College, USA
  6. AT&T Labs-Research, Florham Park, USA
