Compressive mining: fast and optimal data mining in the compressed domain

Abstract

Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier and wavelets). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translate into more accurate search, learning, and mining operations directly in the compressed domain. We formulate the estimation of lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions and leverage the theoretical analysis to develop a fast algorithm that obtains an exact solution to the problem. The suggested solution provides the tightest estimation of the \(L_2\)-norm or the correlation. We show that typical data analysis operations, such as \(k\)-nearest-neighbor search or \(k\)-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic, as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.



Notes

  1. This has a natural interpretation as the Nash equilibrium of a two-player game [33] in which Player 1 seeks to minimize the objective of (3) with respect to \(\mathbf {z}\) and Player 2 seeks to minimize the same objective with respect to \(\mathbf {y}\).

2. The proof of optimality in this case assumes a finite subset of \(P_3\) and applies the same optimality conditions leveraged before; in fact, the particular selection of the subset is immaterial, as long as its cardinality is large enough to accommodate the computed energy allocation \((e'_x,e'_q)\).

3. While the JL Lemma applies to any set of points \(\lbrace \mathbf {X}^{(1)}, \dots , \mathbf {X}^{(V)} \rbrace \) in high dimensions, more can be achieved if sparse representations of \(\mathbf {X}^{(i)}, \forall i\), are known to exist a priori. Compressive sensing (CS) [34, 37] roughly states that a signal that is sparse relative to its ambient dimension can be perfectly reconstructed from far fewer samples than dictated by the well-known Nyquist–Shannon theorem. To this extent, CS theory exploits sparsity to extend the JL Lemma to more general signal classes, not restricted to a collection of points \(\mathcal {X}\). As a by-product of this extension, the CS version of the JL Lemma constitutes the restricted isometry property (RIP).

4. Recent developments [39] describe deterministic constructions of \(\varvec{\Phi }\) in polynomial time, based on the fact that \(\mathcal {X}\) is known a priori and fixed. The authors in [39] propose the NuMax algorithm, a semidefinite programming solver for convex nuclear-norm minimization subject to \(\ell _{\infty }\)-norm and positive semidefinite constraints. However, NuMax has \(\mathcal {O}(C + N^3 + N^2C^2)\) time complexity per iteration and \(\mathcal {O}(C^2)\) overall space complexity, where \(C:= {V \atopwithdelims ()2}\); this renders NuMax prohibitive for real-time applications. Such an approach is not included in our experiments, but we mention it here for completeness.

5. For centroid initialization, one can choose \(C^{(t)}\) to be \((i)\) completely random points in the compressed domain, \((ii)\) a random selection among the compressed representations of \(\mathbf {X}^{(i)} \in \mathcal {DB}\), or \((iii)\) the output of the better-performing k-Means++ initialization algorithm [42, 47]; a sketch of option \((iii)\) follows these notes.

6. Condition (16d) excludes the cases where \(z_i = 0\) or \(y_i = 0\) for some \(i\); these are treated separately.

7. An alternative and more direct approach to establishing the existence of a fixed point is to consider all possible cases and define an appropriate compact convex set \(E\subset \mathbb {R}^2_+ \setminus (0,0)\) such that \(T(E)\subset E\), whence existence follows by Brouwer's fixed-point theorem [33], as \(T\) is continuous.
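As promised in Footnote 5, the following is a minimal Python/NumPy sketch of option \((iii)\), k-Means++ seeding [47], applied directly to compressed representations. The function name kmeanspp_init and the data layout (one row per compressed object) are our illustrative assumptions, not code from the paper.

```python
import numpy as np

def kmeanspp_init(X, k, seed=None):
    """k-Means++ seeding [47] (sketch): pick the first centroid uniformly at
    random, then pick each subsequent centroid with probability proportional
    to its squared distance from the nearest centroid chosen so far.
    X: (n, d) array, one row per (compressed) object."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest current centroid.
        d2 = np.min(((X[:, None, :] - np.stack(centroids)[None, :, :]) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.stack(centroids)
```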

References

  1. Souza, A., Pineda, J.: Tidal mixing modulation of sea surface temperature and diatom abundance in Southern California. Cont. Shelf Res. 21(6–7), 651–666 (2001)

  2. Noble, P., Wheatland, M.: Modeling the sunspot number distribution with a Fokker–Planck equation. Astrophys. J. 732(1), 5 (2011)

  3. Baechler, G., Freris, N., Quick, R., Crochiere, R.: Finite rate of innovation based modeling and compression of ECG signals. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1252–1256 (2013)

4. Chien, S., Immorlica, N.: Semantic similarity between search engine queries using temporal correlation. In: Proceedings of the World Wide Web Conference (WWW) (2005)

  5. Liu, B., Jones, R., Klinkner, K.L.: Measuring the meaning in time series clustering of text search queries. In: Proceedings of the ACM International Conference on Information and Knowledge Management, pp. 836–837, ACM (2006)

6. Nygren, E., Sitaraman, R.K., Wein, J.: Networked systems research at Akamai. ACM SIGOPS Oper. Syst. Rev. 44(3), 1–1 (2010)

  7. Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Proceedings of the International Conference of Foundations of Data Organization (FODO), pp. 69–84 (1993)

  8. Rafiei, D., Mendelzon, A.: Efficient retrieval of similar time sequences using DFT. In: Proceedings of the International Conference of Foundations of Data Organization (FODO), pp. 1–15 (1998)

  9. Chan, F.-P., Fu, A.-C., Yu, C.: Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans. Knowl. Data Eng. 15(3), 686–705 (2003)

10. Eruhimov, V., Martyanov, V., Raulefs, P., Tuv, E.: Combining unsupervised and supervised approaches to feature selection for multivariate signal compression. In: Intelligent Data Engineering and Automated Learning, pp. 480–487 (2006)

  11. Vlachos, M., Kozat, S., Yu, P.: Optimal distance bounds for fast search on compressed time-series query logs. ACM Trans. Web 4(2), 6:1–6:28 (2010)

  12. Vlachos, M., Kozat, S., Yu, P.: Optimal distance bounds on time-series data. In: Proceedings of SIAM Data Mining (SDM), pp. 109–120 (2009)

13. Cai, Y., Ng, R.: Indexing spatio-temporal trajectories with Chebyshev polynomials. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 599–610, ACM (2004)

14. Wang, C., Wang, X.S.: Multilevel filtering for high dimensional nearest neighbor search. In: Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000)

  15. Dasgupta, S.: Experiments with random projection. In: Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 143–151, Morgan Kaufmann Publishers Inc. (2000)

  16. Calderbank, R., Jafarpour, S., Schapire, R.: Compressed learning: universal sparse dimensionality reduction and learning in the measurement domain. Technical Report (Princeton University) (2009)

  17. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26, 189–206 (1984)

  18. Indyk, P., Naor, A.: Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms (TALG) 3(3), 31 (2007)

  19. Ailon, N., Chazelle, B.: Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. In: Proceedings of ACM symposium on Theory of Computing, pp. 557–563, ACM (2006)

20. Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for \(k\)-means clustering. In: Advances in Neural Information Processing Systems, pp. 298–306 (2010)

  21. Cardoso, Â., Wichert, A.: Iterative random projections for high-dimensional data clustering. Pattern Recognit. Lett. 33(13), 1749–1755 (2012)

  22. Achlioptas, D.: Database-friendly random projections. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 274–281 (2001)

  23. Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 245–250, ACM (2001)

  24. Freris, N.M., Vlachos, M., Kozat, S.S.: Optimal distance estimation between compressed data series. In: Proceedings of SIAM Data Mining (SDM), pp. 343–354 (2012)

  25. Vlachos, M., Yu, P., Castelli, V.: On periodicity detection and structural periodic similarity. In: Proceedings of SIAM Data Mining (SDM), pp. 449–460 (2005)

  26. Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 171–182, ACM (2010)

  27. Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2002)

  28. Mueen, A., Keogh, E.J., Shamlo, N.B.: Finding time series motifs in disk-resident data. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 367–376 (2009)

  29. Oppenheim, A.V., Schafer, R.W., Buck, J.R., et al.: Discrete-Time Signal Processing, vol. 5. Prentice Hall, Upper Saddle River (1999)

30. Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 151–162 (2001)

  31. Boyd, S., Vandenberghe, L.: Convex Optimization, 1st edn. Cambridge University Press, Cambridge (2004)

  32. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Recent Advances in Learning and Control, pp. 95–110, Springer (2008)

  33. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Academic Press, New York (1995)

  34. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theor. 52(4), 1289–1306 (2006)

  35. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (1967)

  36. Jones, P.W., Osipov, A., Rokhlin, V.: Randomized approximate nearest neighbors algorithm. Proc. Natl. Acad. Sci. 108(38), 15679–15686 (2011)

  37. Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)

  38. Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput. 30(2), 457–474 (2000)

39. Hegde, C., Sankaranarayanan, A., Yin, W., Baraniuk, R.: A convex approach for learning near-isometric linear embeddings. Preprint (2012)

  40. Dasgupta, S.: Learning mixtures of Gaussians. In: Proceedings of Symposium on Foundations of Computer Science (FOCS), pp. 634–644, IEEE (1999)

  41. Arriaga, R.I., Vempala, S.: An algorithmic theory of learning: robust concepts and random projection. In: Proceedings of Symposium on Foundations of Computer Science (FOCS), pp. 616–623, IEEE (1999)

  42. Freris, N.M., Vlachos, M., Turaga, D.S.: Cluster-aware compression with provable k-means preservation. In: Proceedings of SIAM Data Mining (SDM), pp. 82–93 (2012)

  43. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982)

  44. Tanay, A., Sharan, R., Shamir, R.: Biclustering algorithms: a survey. Handb. Comput. Mol. Biol. 9, 1–26 (2005)

  45. Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–274 (2001)

  46. Huber, P.J.: Projection pursuit. Ann. Stat. 13(2), 435–475 (1985)

47. Arthur, D., Vassilvitskii, S.: k-Means++: the advantages of careful seeding. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035 (2007)

48. Cullum, J.K., Willoughby, R.A.: Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1: Theory. SIAM (2002)

  49. Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)

  50. Wikipedia. http://en.wikipedia.org/wiki/Design_rule_checking

51. Crawford, B.: Design rules checking for integrated circuits using graphical operators. In: Proceedings of the Conference on Computer Graphics and Interactive Techniques, pp. 168–176, ACM (1975)

Acknowledgments

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 259569.

Author information

Correspondence to Michail Vlachos.

Appendix

1.1 Existence of solutions and necessary and sufficient conditions for optimality

The constraint set is a compact convex set, in fact, a compact polyhedron. The function \(g(x,y):= -\sqrt{x}\sqrt{y}\) is convex but not strictly convex on \(\mathbb {R}^2_+\). To see this, note that the Hessian exists for all \(x,y>0\) and equals

$$\begin{aligned} \triangledown ^2g = \frac{1}{4}\left( \begin{array}{cc} x^{-\frac{3}{2}}y^{\frac{1}{2}} &{} -x^{-\frac{1}{2}}y^{-\frac{1}{2}}\\ -x^{-\frac{1}{2}}y^{-\frac{1}{2}} &{} x^{\frac{1}{2}}y^{-\frac{3}{2}} \end{array}\right) \end{aligned}$$

with eigenvalues \(0\) and \(\frac{1}{4\sqrt{xy}}\big (\frac{y}{x} + \frac{x}{y}\big )\), and hence is positive semi-definite, which in turn implies that \(g\) is convex [31]. Furthermore, \(-\sqrt{x}\) is a strictly convex function of \(x\), so the objective function of (3) is convex, and strictly convex only if \(p^-_x\cap p^-_q = \emptyset \). It is also continuous on a compact set, so solutions exist, i.e., the optimal value is bounded and attained. It is easy to check that the Slater condition holds, whence the problem satisfies strong duality and there exist Lagrange multipliers [31]. We skip the technical details for brevity, but we highlight that this property is crucial: it guarantees that the KKT necessary conditions [31] for Lagrangian optimality are also sufficient. Therefore, if we find a solution that satisfies the KKT conditions, we have found an exact optimal solution and the exact optimal value of the problem. The Lagrangian is

$$\begin{aligned}&L(\mathbf {y}, \mathbf {z}, \lambda , \mu , {\varvec{\alpha }}, {\varvec{\beta }}) \nonumber \\&\quad := -2\!\!\sum \limits _{i\in P_1} \!\!b_i \sqrt{z}_i - 2\!\!\sum \limits _{i\in P_2} \!\!a_i \sqrt{y}_i - 2\!\!\sum \limits _{i\in P_3} \!\! \sqrt{z}_i\sqrt{y}_i\nonumber \\&\quad \quad \quad + \lambda \Big (\sum \limits _{i\in p^-_x}( z_i - e_x)\Big ) + \mu \Big (\sum \limits _{i\in p^-_q}( y_i - e_q)\Big )\nonumber \\&\quad \quad \quad + \sum \limits _{i\in p^-_x}\alpha _i(z_i-Z) + \sum \limits _{i\in p^-_q}\beta _i(y_i-Y). \end{aligned}$$
(15)

The KKT conditions are as follows (see Footnote 6):

$$\begin{aligned} 0\le z_i \le Z, \ 0\le y_i&\le Y, \,\,\text {(PF)}\nonumber \\ \sum \limits _{i\in p^-_x} z_i \le e_x, \ \sum \limits _{i\in p^-_q} y_i&\le e_q \end{aligned}$$
(16a)
$$\begin{aligned} \lambda , \mu , \alpha _i, \beta _i&\ge 0 \,\,\text {(DF)} \end{aligned}$$
(16b)
$$\begin{aligned} \alpha _i(z_i-Z) = 0, \ \ \beta _i(y_i-Y)&= 0 \,\,\text {(CS)} \end{aligned}$$
(16c)
$$\begin{aligned} \lambda \Big (\sum \limits _{i\in p^-_x}( z_i - e_x)\Big ) = 0, \ \ \mu \Big (\sum \limits _{i\in p^-_q}( y_i - e_q)\Big )&= 0 \nonumber \\ i\in P_1: \ \frac{\partial L}{\partial z_i} = -\frac{b_i}{\sqrt{z_i}} + \lambda + \alpha _i&= 0\,\, \text {(O)}\nonumber \\ i\in P_2: \ \frac{\partial L}{\partial y_i} = -\frac{a_i}{\sqrt{y_i}} + \mu + \beta _i&= 0 \nonumber \\ i\in P_3: \ \frac{\partial L}{\partial z_i} = -\frac{\sqrt{y_i}}{\sqrt{z_i}} + \lambda + \alpha _i&= 0 \nonumber \\ \frac{\partial L}{\partial y_i} = -\frac{\sqrt{z_i}}{\sqrt{y_i}} + \mu + \beta _i&= 0 \;, \end{aligned}$$
(16d)

where we use shorthand notation for primal feasibility (PF), dual feasibility (DF), complementary slackness (CS) and optimality (O) [31].
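Since strong duality holds, a KKT point can be sanity-checked against a general-purpose convex solver. Below is a small numeric sketch of problem (3) in Python using the open-source CVXPY package (in the spirit of CVX [32]); the random test data, the sizes of \(P_1, P_2, P_3\), and all variable names are our own illustrative choices, not part of the paper's method.

```python
import cvxpy as cp
import numpy as np

# Illustrative data (ours): known magnitudes b on P1 and a on P2, energy
# budgets e_x, e_q for the discarded coefficients, and per-coefficient caps.
rng = np.random.default_rng(1)
n1, n2, n3 = 3, 3, 4                        # |P1|, |P2|, |P3|
b, a = rng.uniform(0.5, 1.5, n1), rng.uniform(0.5, 1.5, n2)
e_x, e_q, Z, Y = 2.0, 2.0, 1.0, 1.0

z1, z3 = cp.Variable(n1, nonneg=True), cp.Variable(n3, nonneg=True)  # z on P1, P3
y2, y3 = cp.Variable(n2, nonneg=True), cp.Variable(n3, nonneg=True)  # y on P2, P3

# Negated objective of (3), as a concave maximization:
# 2*sum b_i*sqrt(z_i) + 2*sum a_i*sqrt(y_i) + 2*sum sqrt(z_i*y_i).
cross = cp.sum(cp.hstack([cp.geo_mean(cp.hstack([z3[i], y3[i]])) for i in range(n3)]))
objective = cp.Maximize(2 * (b @ cp.sqrt(z1) + a @ cp.sqrt(y2) + cross))
constraints = [cp.sum(z1) + cp.sum(z3) <= e_x,   # energy budget over p_x^-
               cp.sum(y2) + cp.sum(y3) <= e_q,   # energy budget over p_q^-
               z1 <= Z, z3 <= Z, y2 <= Y, y3 <= Y]
prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal value of (3):", -prob.value)  # minus the maximized value
```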

1.2 Proof of Theorem 1

For the first part, note that problem (3) is a double minimization over \(\{z_i\}_{i\in p^-_x}\) and \(\{y_i\}_{i\in p^-_q}\). If we fix one vector in the objective function of (3), then the optimal solution with respect to the other is given by the waterfilling algorithm. In fact, the KKT conditions (16), like the KKT conditions of (2), correspond exactly to (7). The waterfilling algorithm has the property that if \(\mathbf {a} = \text {waterfill}(\mathbf {b},e_x,A)\), then \(b_i>0\) implies \(a_i>0\). Furthermore, it has a monotonicity property: \(b_i \le b_j\) implies \(a_i \le a_j\). Assume that, at optimality, \(a_{l_1} < a_{l_2}\) for some \(l_1\in P_1,l_2\in P_3\). Because \(b_{l_1} \ge B \ge b_{l_2}\), we can swap these two values to decrease the objective function, which is a contradiction. The exact same argument applies for \(\{b_l\}\), so \(\min _{l\in P_1}a_l \ge \max _{l\in P_3}a_l\) and \(\min _{l\in P_2}b_l \ge \max _{l\in P_3}b_l\).
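For concreteness, here is a minimal Python sketch of \(\mathbf {a} = \text {waterfill}(\mathbf {b},e,A)\) as used above: by the KKT conditions, each \(a_i = \min (c\,b_i, A)\) for a common scale \(c\), chosen so that the energy budget binds; we locate \(c\) by bisection. The function name, the iteration count, and the assumptions \(b_i>0\) and \(0 < e \le |P|A^2\) are ours. Both properties used in the proof are visible in the formula: \(b_i>0\) forces \(a_i>0\), and \(b_i \le b_j\) implies \(a_i \le a_j\).

```python
import numpy as np

def waterfill(b, e, A, iters=60):
    """Sketch: maximize sum_i b_i*a_i subject to sum_i a_i**2 <= e, 0 <= a_i <= A.
    KKT gives a_i = min(c*b_i, A); bisect on the scale c until the energy
    budget binds.  Assumes b_i > 0 and 0 < e <= len(b)*A**2."""
    b = np.asarray(b, dtype=float)
    used = lambda c: np.sum(np.minimum(c * b, A) ** 2)  # energy spent at scale c
    lo, hi = 0.0, A / b.min()                           # at hi, every a_i hits the cap A
    for _ in range(iters):                              # used(c) is nondecreasing in c
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if used(mid) < e else (lo, mid)
    return np.minimum(hi * b, A)
```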

For the second part, note that \(-\sum \nolimits _{i\in P_3} \sqrt{z_i}\sqrt{y_i} \ge -\sqrt{e_x'}\sqrt{e_q'}\). If \(e_x'e_q' >0\), then at optimality this is attained with equality for the particular choice of \(\{a_l,b_l\}_{l\in P_3}\). It follows that all entries of the optimal solution \(\{a_l,b_l\}_{l\in p^-_x \cup p^-_q}\) are strictly positive, hence (16d) implies that

$$\begin{aligned} a_i&= \frac{b_i}{\lambda + \alpha _i}, \ \ i\in P_1 \end{aligned}$$
(17a)
$$\begin{aligned} b_i&= \frac{a_i}{\mu + \beta _i}, \ \ i \in P_2 \end{aligned}$$
(17b)
$$\begin{aligned} a_i&= (\mu + \beta _i)b_i, \ \ i \in P_3\nonumber \\ b_i&= (\lambda + \alpha _i)a_i, \ \ i \in P_3. \end{aligned}$$
(17c)

For the particular solution with all entries in \(P_3\) equal \( \left( a_l = \sqrt{e_x'/|P_3|}, b_l = \sqrt{e_q'/|P_3|} \right) \), (8a) is an immediate application of (17c). The optimal entries \(\{a_l\}_{l\in P_1}, \{b_l\}_{l\in P_2}\) are provided by waterfilling with available energies \(e_x - e_x', e_q-e_q'\), respectively, so (9) immediately follows.
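In code, the second part reads as follows: given the optimal energy split \((e_x', e_q')\) assigned to \(P_3\), the remaining entries come from two independent waterfilling calls, and the \(P_3\) entries may be spread uniformly (one optimal choice, as noted above). A sketch reusing the waterfill helper from the previous snippet; the function name and argument order are ours.

```python
import numpy as np

def double_waterfill(b1, a2, n3, e_x, e_q, ex3, eq3, A, B):
    """Sketch: assemble an optimal pair per Theorem 1, given the P3 energy
    split (ex3, eq3) = (e_x', e_q').  Reuses waterfill() defined above."""
    a1 = waterfill(b1, e_x - ex3, A)       # {a_l}, l in P1, with energy e_x - e_x'
    b2 = waterfill(a2, e_q - eq3, B)       # {b_l}, l in P2, with energy e_q - e_q'
    a3 = np.full(n3, np.sqrt(ex3 / n3))    # a_l = sqrt(e_x'/|P3|) on P3
    b3 = np.full(n3, np.sqrt(eq3 / n3))    # b_l = sqrt(e_q'/|P3|) on P3
    return (a1, a3), (b2, b3)
```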

For the third part, note that the cases where either \(e_x'=0, e_q'>0\) or \(e_x'>0, e_q'=0\) are excluded at optimality by the first part, cf. (7).

For the last part, note that when \(e_x' = e_q' = 0\), equivalently \(a_l=b_l = 0\) for \(l\in P_3\), it is not possible to take derivatives with respect to any coefficient in \(P_3\), so the last two equations of (16) do not hold. In that case, we need to perform a standard perturbation analysis. Let \({\varvec{\epsilon }}:= \{\epsilon _l\}_{l \in P_1 \cup P_2}\) be a sufficiently small positive vector. As the constraint set of (3) is linear in \(z_i\) and \(y_i\), any feasible direction (of potential decrease of the objective function) is of the form \(z_i \leftarrow z_i - \epsilon _i, i\in P_1\), \(y_i \leftarrow y_i - \epsilon _i, i \in P_2\), and \(z_i,y_i \ge 0, i \in P_3\), such that \(\sum \nolimits _{i\in P_3}z_i = \sum \nolimits _{i\in P_1}\epsilon _i\) and \(\sum \nolimits _{i\in P_3}y_i = \sum \nolimits _{i\in P_2}\epsilon _i\). The change in the objective function is then equal to (modulo an \(O(\Vert {\varvec{\epsilon }}\Vert ^2)\) remainder term)

$$\begin{aligned} g({\varvec{\epsilon }})&\approx \frac{1}{2} \sum \limits _{i \in P_1} \frac{b_i}{\sqrt{z_i}}\epsilon _i + \frac{1}{2} \sum \limits _{i \in P_2} \frac{a_i}{\sqrt{y_i}}\epsilon _i - \sum \limits _{i\in P_3}\sqrt{z_i}\sqrt{y_i}\nonumber \\&\ge \frac{1}{2} \sum \limits _{i \in P_1} \frac{b_i}{\sqrt{z_i}}\epsilon _i + \frac{1}{2} \sum \limits _{i \in P_2} \frac{a_i}{\sqrt{y_i}}\epsilon _i - \sqrt{\sum \limits _{i\in P_1}\epsilon _i}\sqrt{\sum \limits _{i\in P_2}\epsilon _i} \nonumber \\&\ge \frac{1}{2}\min _{i\in P_1}\frac{b_i}{\sqrt{z_i}}\epsilon _1 + \frac{1}{2} \min _{i \in P_2} \frac{a_i}{\sqrt{y_i}}\epsilon _2 - \sqrt{\epsilon _1}\sqrt{\epsilon _2}, \end{aligned}$$
(18)

where the first inequality follows from an application of the Cauchy–Schwarz inequality to the last term, and in the second we have defined \(\epsilon _j := \sum \nolimits _{i\in P_j}\epsilon _i, \ j=1,2\). Let us define \(\epsilon := \sqrt{\epsilon _1/\epsilon _2}\). From the last expression, it suffices to test, for any \(i\in P_1,j\in P_2\):

$$\begin{aligned} g(\epsilon _1,\epsilon _2)&= \frac{1}{2} \frac{b_i}{\sqrt{z_i}}\epsilon _1 + \frac{1}{2} \frac{a_j}{\sqrt{y_j}}\epsilon _2 - \sqrt{\epsilon _1}\sqrt{\epsilon _2} \nonumber \\&= \frac{1}{2}\sqrt{\epsilon _1}\sqrt{\epsilon _2}\, g_1(\epsilon )\nonumber \\ g_1(\epsilon )&:= \frac{b_i}{\sqrt{z_i}}\epsilon + \frac{a_j}{\sqrt{y_j}}\frac{1}{\epsilon } -2 \ge \frac{1}{\epsilon }g_2(\epsilon )\nonumber \\ g_2(\epsilon )&:= \frac{b_i}{A}\epsilon ^2 -2\epsilon + \frac{a_j}{B}, \end{aligned}$$
(19)

where the inequality above follows from the fact that \(\sqrt{z_i} \le A, i\in P_1\), and \(\sqrt{y_j} \le B, j\in P_2\). Note that \(g_2(\epsilon )\) is a quadratic with a nonpositive discriminant \(\Delta := 4(1 - \frac{a_jb_i}{AB}) \le 0\) since, by definition, \(B \le b_i\) for \(i\in P_1\) and \(A\le a_j\) for \(j\in P_2\). Therefore, \(g(\epsilon _1,\epsilon _2) \ge 0\) for any \((\epsilon _1,\epsilon _2)\) both positive and sufficiently small, which is a necessary condition for local optimality. By convexity, the vector pair \((\mathbf {a},\mathbf {b})\) obtained constitutes an optimal solution. \(\square \)

Fig. 20 Plot of the functions \(h_a,h_b,h\). Top row: \(h_a\) is a bounded decreasing function, piecewise linear in \(\frac{1}{\gamma }\) with nonincreasing slope in \(\frac{1}{\gamma }\); \(h_b\) is a bounded increasing piecewise linear function of \(\gamma \) with nonincreasing slope. Bottom row: \(h\) is an increasing function; the linear term \(\gamma \) dominates the fraction term, which is also increasing (see bottom right)

1.3 Energy allocation in double waterfilling

Calculating a fixed point of \(T\) is of interest only if \(e_x'e_q'>0\) at optimality. We know that we are not in the setup of Theorem 1.4; therefore, we have the additional property that either \(e_x > |P_1|A^2\) or \(e_q > |P_2|B^2\) (or both). Let us define

$$\begin{aligned} \gamma _a&:= \inf \Big \{\gamma > 0: \sum \limits _{l \in P_2}\min \Big ( a_l^2\frac{1}{\gamma },B^2 \Big ) \le e_q\Big \}\nonumber \\ \gamma _b&:= \sup \Big \{\gamma \ge 0: \sum \limits _{l \in P_1}\min \Big ( b_l^2\gamma ,A^2 \Big )\le e_x\Big \}. \end{aligned}$$
(20)

Clearly, if \(e_x > |P_1|A^2\) then \(\gamma _b = +\infty \), and for any \(\gamma \ge \max _{l\in P_1}\frac{A^2}{b_l^2}\) we have \( \sum \nolimits _{l \in P_1}\min (b_l^2\gamma ,A^2) = |P_1|A^2\). Similarly, if \(e_q>|P_2|B^2\) then \(\gamma _a = 0\), and for any \(\gamma \le \min _{l\in P_2}\frac{a_l^2}{B^2}\) we have \( \sum \nolimits _{l \in P_2}\min (a_l^2\frac{1}{\gamma },B^2) = |P_2|B^2\). If \(\gamma _b < +\infty \), we can find the exact value of \(\gamma _b\) analytically by sorting \(\{\gamma _l^{(b)}:=\frac{A^2}{b_l^2}\}_{l\in P_1}\), i.e., by sorting \(\{b_l^2\}_{l\in P_1}\) in decreasing order, and considering

$$\begin{aligned} h_b(\gamma ):= \sum \limits _{l \in P_1}\min (b_l^2\gamma ,A^2) - e_x \end{aligned}$$

and \(v_i:= h_b(\gamma _i^{(b)})\) (Fig. 20). In this case, \(v_1<\cdots <v_{|P_1|}\) and \(v_{|P_1|}>0\), and there are two possibilities: (1) \(v_1>0\), whence \(\gamma _b < \gamma _1^{(b)}\), and (2) there exists some \(i\) such that \(v_i<0<v_{i+1}\), whence \(\gamma _i^{(b)} < \gamma _b < \gamma _{i+1}^{(b)}\). On both ranges of \(\gamma \), the function \(h_b\) is linear and strictly increasing, and it is elementary to compute its root \(\gamma _b\). A similar argument applies for calculating \(\gamma _a\) when it is strictly positive, by defining

$$\begin{aligned} h_a(\gamma ):= \sum \limits _{l \in P_2}\min \Big ( a_l^2\frac{1}{\gamma },B^2 \Big ) - e_q. \end{aligned}$$
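A small Python sketch of the breakpoint scan just described, for \(\gamma _b\) (the computation of \(\gamma _a\) is symmetric): \(h_b\) is piecewise linear with breakpoints \(\gamma _l^{(b)} = A^2/b_l^2\), so we walk the sorted breakpoints to the first sign change and solve the active linear piece by interpolation. The function name and the finiteness assumption \(e_x < |P_1|A^2\) are ours.

```python
import numpy as np

def gamma_b(b1, e_x, A):
    """Sketch: exact root gamma_b of h_b(g) = sum_l min(b_l^2*g, A^2) - e_x.
    h_b is increasing and linear between consecutive breakpoints A^2/b_l^2;
    assumes e_x < len(b1)*A**2 (otherwise gamma_b = +infinity)."""
    b2 = np.asarray(b1, dtype=float) ** 2
    h_b = lambda g: np.sum(np.minimum(b2 * g, A * A)) - e_x
    lo = 0.0                                  # h_b(0) = -e_x < 0
    for p in np.sort(A * A / b2):             # breakpoints in increasing order
        if h_b(p) >= 0.0:                     # first sign change: root in (lo, p]
            return lo - (p - lo) / (h_b(p) - h_b(lo)) * h_b(lo)
        lo = p
```

(The loop always returns under the stated assumption, since \(h_b\) is positive at the last breakpoint.)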

Theorem 2

[Exact solution of (9)] If \(e_x > |P_1|A^2\), \(e_q > |P_2|B^2\), or both, then the nonlinear mapping \(T\) has a unique fixed point \((e_x', e_q')\) with \(e_x',e_q'>0\). The equation

$$\begin{aligned} \frac{e_x - \sum \nolimits _{l \in P_1}\min (b_l^2\gamma ,A^2)}{e_q - \sum \nolimits _{l \in P_2}\min (a_l^2\frac{1}{\gamma },B^2)} = \gamma \end{aligned}$$
(21)

has a unique solution \(\bar{\gamma }\) with \(\gamma _a \le \bar{\gamma }\), and \(\bar{\gamma } \le \gamma _b\) when \(\gamma _b <+\infty \). The unique fixed point of \(T\) [solution of (9)] satisfies

$$\begin{aligned} e_x'&= e_x - \sum \nolimits _{l \in P_1}\min \left( b_l^2\bar{\gamma },A^2 \right) \nonumber \\ e_q'&= e_q - \sum \nolimits _{l \in P_2}\min \left( a_l^2\frac{1}{\bar{\gamma }},B^2 \right) . \end{aligned}$$
(22)

Proof

Existence of a fixed point (see Footnote 7) is guaranteed by the existence of solutions and Lagrange multipliers for (3), as by assumption we are in the setup of Theorem 1.2. Define \(\gamma := \frac{e_x'}{e_q'}\); a fixed point \((e_x',e_q') = T((e_x',e_q'))\) with \(e_x',e_q'>0\) corresponds to a root of

$$\begin{aligned} h(\gamma ):= -\frac{e_x - \sum \nolimits _{l \in P_1}\min (b_l^2\gamma ,A^2)}{e_q - \sum \nolimits _{l \in P_2}\min (a_l^2\frac{1}{\gamma },B^2)} + \gamma \end{aligned}$$
(23)

On the range \(\gamma _a \le \gamma \le \gamma _b\) (with the upper restriction needed only if \(\gamma _b<+\infty \)), \(h(\gamma )\) is continuous and strictly increasing. The facts that \(\lim _{\gamma \searrow \gamma _a} h(\gamma )<0\) and \(\lim _{\gamma \nearrow \gamma _b} h(\gamma )>0\) establish the existence of a unique root \(\bar{\gamma }\) of \(h\), corresponding to the unique fixed point of \(T\), cf. (22). \(\square \)

Remark 5

(Exact calculation of a root of \(h\)) We seek to calculate the root of \(h\) exactly and efficiently. In doing so, consider the points \(\{\gamma _l\}_{l\in P_1\cup P_2}\), where \(\gamma _l:= \frac{A^2}{b_l^2}\) for \(l\in P_1\) and \(\gamma _l:= \frac{a_l^2}{B^2}\) for \(l\in P_2\). Then, note that for any \(\gamma \ge \gamma _l, l\in P_1\), we have \(\min (b_l^2\gamma ,A^2) = A^2\). Similarly, for any \(\gamma \le \gamma _l, l\in P_2\), we have \(\min (a_l^2\frac{1}{\gamma },B^2) = B^2\). We order all such points in increasing order and consider the resulting vector \({\varvec{\gamma }}':=\{\gamma _i'\}\), excluding any points below \(\gamma _a\) or above \(\gamma _b\). Let us define \(h_i:= h(\gamma _i')\). If \(h_i = 0\) for some \(i\), we are done. Otherwise, there are three possibilities: (1) there is an \(i\) such that \(h_i< 0 < h_{i+1}\); (2) \(h_1>0\); and (3) \(h_N<0\). In all cases, the numerator (denominator) of \(h\) is linear in \(\gamma \) (\(\frac{1}{\gamma }\)) for the respective range of \(\gamma \). Therefore, \(\bar{\gamma }\) is obtained by solving the linear equation \(f(\gamma ) = 0\), where

$$\begin{aligned} f(\gamma )&:= e_x - \sum \limits _{l \in P_1}\min (b_l^2\gamma ,A^2) \nonumber \\&- \gamma \left( e_q - \sum \limits _{l \in P_2}\min \left( a_l^2\frac{1}{\gamma },B^2 \right) \right) . \end{aligned}$$
(24)

Note that no further computation is needed to put this into the form \(f(\gamma ) = \alpha \gamma + \beta \) for some \(\alpha ,\beta \). Instead, we use the elementary property that a linear function \(f\) on \([x_0,x_1]\) with \(f(x_0)f(x_1)<0\) has a unique root given by

$$\begin{aligned} \bar{x} = x_0 -\frac{x_1-x_0}{f(x_1)-f(x_0)}f(x_0) \;\;. \end{aligned}$$
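A Python sketch of this procedure: evaluate \(f\) of (24) at the ordered breakpoints restricted to \((\gamma _a, \gamma _b)\), locate the sign change, and apply the interpolation formula above; this takes one evaluation per breakpoint and no iterative refinement. The function name and the assumption of finite positive endpoints are ours.

```python
import numpy as np

def root_of_h(b1, a2, e_x, e_q, A, B, g_lo, g_hi):
    """Sketch of Remark 5.  f of (24) is piecewise linear in gamma (note that
    gamma*min(a_l^2/gamma, B^2) = min(a_l^2, B^2*gamma)), with breakpoints
    A^2/b_l^2 (l in P1) and a_l^2/B^2 (l in P2).  Assumes finite bounds
    0 < g_lo < g_hi, i.e., (gamma_a, gamma_b)."""
    b2 = np.asarray(b1, dtype=float) ** 2
    a2sq = np.asarray(a2, dtype=float) ** 2
    f = lambda g: (e_x - np.sum(np.minimum(b2 * g, A * A))
                   - g * (e_q - np.sum(np.minimum(a2sq / g, B * B))))
    pts = np.concatenate([A * A / b2, a2sq / (B * B)])
    grid = np.concatenate([[g_lo], np.sort(pts[(pts > g_lo) & (pts < g_hi)]), [g_hi]])
    for x0, x1 in zip(grid[:-1], grid[1:]):
        if f(x0) * f(x1) <= 0:                # f is linear on [x0, x1]
            return x0 - (x1 - x0) / (f(x1) - f(x0)) * f(x0)
    raise ValueError("no sign change in (g_lo, g_hi); check gamma_a and gamma_b")
```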


Cite this article

Vlachos, M., Freris, N.M. & Kyrillidis, A. Compressive mining: fast and optimal data mining in the compressed domain. The VLDB Journal 24, 1–24 (2015). https://doi.org/10.1007/s00778-014-0360-3
