
Novel matrix hit and run for sampling polytopes and its GPU implementation


Abstract

We propose and analyze a new Markov Chain Monte Carlo algorithm that generates a uniform sample over full- and non-full-dimensional polytopes. This algorithm, termed “Matrix Hit and Run” (MHAR), is a modification of the Hit and Run framework. For a polytope in \(\mathbb {R}^n\) defined by m linear constraints, MHAR in the regime \(n^{1+\frac{1}{3}} \ll m\) has a lower asymptotic cost per sample, in soft-O notation (\(\mathcal {O}^*\)), than existing sampling algorithms after a warm start. MHAR is designed to take advantage of matrix multiplication routines that require fewer computational and memory resources. Our tests show this implementation to be substantially faster than the hitandrun R package, especially in higher dimensions. Finally, we provide a Python library based on PyTorch and a Colab notebook with the implementation ready for deployment on architectures with a GPU or just a CPU.


Data availability and materials

The code for replicating the experiments is available in the GitHub repository https://github.com/uumami/mhar_pytorch and can be run on the free online Colab platform. The authors also created a library for testing, available at https://github.com/uumami/mhar.

Code availability

Code: https://github.com/uumami/mhar_pytorch. Python library: https://pypi.org/project/mhar/. Library code: https://github.com/uumami/mhar.

Notes

  1. The prefix “M” actually stands for mentat, a type of human in Frank Herbert’s Dune series who can simultaneously see the multiple probable paths the future may take.

  2. PyTorch Lightning batch size finder: https://pytorch-lightning.readthedocs.io/en/stable/advanced/training_tricks.html.


Acknowledgements

Mario Vazquez Corte wants to acknowledge CONACYT and ITAM for the support provided in the completion of his academic work. He also acknowledges Dr. Fernando Eponda, Dr. Jose Octavio Gutierrez, and Dr. Rodolfo Conde for their support and insight in the development of this work, and wants to express his special gratitude to Saul Caballero, Daniel Guerrero, Alfredo Carrillo, Jesus Ledezma, and Erick Palacios Moreno for their invaluable feedback during this process. Dr. Montiel thanks, uniquely and exclusively, two anonymous reviewers whose comments helped us improve this manuscript.

Funding

The research was conducted without external funding.

Author information


Corresponding author

Correspondence to Mario Vazquez Corte.

Ethics declarations

Conflict of interest

The authors state that there are no conflicts or competing interests.

Consent to participate

No people were involved in experiments that required consent from subjects.

Consent for publication

Both authors express consent to publish this article.

Ethical approval

We did not perform any actions or experiments that require ethics approval.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Mathematical proofs of lemmas and theorems

Lemma Appendix A.1

If \(m_E < n,\) then the complexity of calculating \(P_{\Delta ^E}\) is \(\mathcal {O}(m_E^{\omega -2}n^2).\)

Proof

Computing \(P_{\Delta ^E}\) requires three matrix multiplications, one matrix subtraction, and one matrix inversion of \((A^EA'^E)\). The number of operations needed to calculate the inverse matrix depends on the algorithm used for matrix multiplication (Cormen et al. 2009). The number of operations for computing \(P_{\Delta ^E}\) is the sum of the following:

  1. Obtain \((A^EA'^E)\) in \(\mathcal {O}(\mu _{A^E, A'^E})=\mathcal {O}(\mu (m_E,n,m_E)) =\mathcal {O}(m_E^{\omega -1}n)\) operations.

  2. Find the inverse \((A^EA'^E)^{-1}\) in \(\mathcal {O}(m_E^{\omega })\), since \((A^EA'^E)^{-1}\) has dimension \(m_E \times m_E\).

  3. Multiply \(A'^E(A^EA'^E)^{-1}\) in \(\mathcal {O}(\mu _{A'^E,(A^EA'^E)^{-1}})=\mathcal {O}(\mu (n,m_E,m_E)) =\mathcal {O}(m_E^{\omega -1}n)\).

  4. Calculate \(A'^E(A^EA'^E)^{-1}A^E\) in \(\mathcal {O}(\mu _{A'^E(A^EA'^E)^{-1},A^E})=\mathcal {O}(\mu (n,m_E,n)) =\mathcal {O}(m_E^{\omega -2}n^2)\).

  5. Subtract \(I - A'^E(A^EA'^E)^{-1}A^E\) in \(\mathcal {O}(n^2)\).

These sum to \(2 \times \mathcal {O}(m_E^{\omega -1}n) + \mathcal {O}(m_E^{\omega }) + \mathcal {O}(m_E^{\omega -2}n^2)+\mathcal {O}(n^2)\). Since \(m_E < n\), the term \(\mathcal {O}(\mu _{A'^E(A^EA'^E)^{-1},A^E})=\mathcal {O}(m_E^{\omega -2}n^2)\) dominates the others. Hence the complexity of calculating \(P_{\Delta ^E}\) is \(\mathcal {O}(m_E^{\omega -2}n^2)\). \(\square \)
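
As a concrete illustration, the following PyTorch sketch computes \(P_{\Delta ^E}=I - A'^E(A^EA'^E)^{-1}A^E\) by the five steps above. The function and variable names are ours for illustration; this is not the API of the mhar library.

    import torch

    def projection_matrix(A_E: torch.Tensor) -> torch.Tensor:
        # A_E has shape (m_E, n), with m_E < n and full row rank assumed.
        n = A_E.shape[1]
        gram = A_E @ A_E.T                    # step 1: A^E A'^E, shape (m_E, m_E)
        gram_inv = torch.linalg.inv(gram)     # step 2: inversion in O(m_E^omega)
        proj = A_E.T @ gram_inv @ A_E         # steps 3 and 4: A'^E (A^E A'^E)^{-1} A^E
        return torch.eye(n, dtype=A_E.dtype) - proj  # step 5: subtraction in O(n^2)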

Lemma Appendix A.2

The cost per iteration of HAR for \(0 \le m_E\) is \(\mathcal {O}(\max \{ m_In,m_E^{\omega -2}n^2\}).\)

Proof

As seen in Algorithm 1, the only difference between the full and non-full-dimensional cases is the projection step \(P_{\Delta ^E}h=d\). Then, the cost per iteration is defined by the larger of the original cost per iteration \(\mathcal {O}(m_In)\) of HAR for \(m_E=0\), and the extra cost induced by the projection when \(m_E>0\).

Because \(P_{\Delta ^E}\) has dimension \(n \times n\) and h is an \(n \times 1\) vector, \(\mu _{P_{\Delta ^E},h}=n^2\) and the complexity is \(\mathcal {O}(n^2)\). By Lemma 3.1, finding \(P_{\Delta ^E}\) has an asymptotic complexity of \(\mathcal {O}(m_E^{\omega -2}n^2)\). Therefore, the cost of projecting h at each iteration is \(\mathcal {O}(n^2) + \mathcal {O}(m_E^{\omega -2}n^2) = \mathcal {O}(m_E^{\omega -2}n^2)\), since \(m_E>0\). Hence, the cost per iteration for \(m_E >0\) is \(\mathcal {O}(\max \{ m_In, m_E^{\omega -2}n^2\})\). If \(m_E=0\), then \(\max \{ m_In, m_E^{\omega -2}n^2\}\) equals \(\max \{m_In,0\}=m_In\) and the cost per iteration is \(\mathcal {O}(m_In)\). \(\square \)

Lemma Appendix A.3

The complexity of generating matrix D in MHAR given \(P_{\Delta ^E}\) and \(\max \{m_I, n\} \le z\) is \(\mathcal {O}(nz)\) if \(m_E=0,\) and \(\mathcal {O}(n^{\omega -1}z)\) if \(m_E>0.\)

Proof

Generating H has complexity \(\mathcal {O}(nz)\) using the Box-Muller method. If \(m_E=0\), then \(D=H\), implying a total asymptotic cost of \(\mathcal {O}(nz)\). If \(m_E>0\), then \(D=P_{\Delta ^E}H\), which adds the multiplication cost \(\mathcal {O}(\mu _{P_{\Delta ^E},H})=\mathcal {O}(n^{\omega -1}z)\), given that \(\max \{m_I, n\} \le z\). Since \(\mathcal {O}(n^{\omega -1}z)\) bounds \(\mathcal {O}(nz)\), the total cost of computing D for \(m_E>0\) is \(\mathcal {O}(n^{\omega -1}z)\). \(\square \)
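
A minimal sketch of this step, reusing the hypothetical projection_matrix helper above. Note that torch.randn does not necessarily use Box-Muller internally, but it yields the same i.i.d. standard-normal entries.

    import torch
    from typing import Optional

    def direction_matrix(n: int, z: int, P: Optional[torch.Tensor] = None,
                         device: Optional[torch.device] = None) -> torch.Tensor:
        H = torch.randn(n, z, device=device)  # Gaussian matrix H, O(nz)
        if P is None:                         # m_E = 0: D = H
            return H
        return P @ H                          # m_E > 0: D = P H, O(n^{omega-1} z)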

Lemma Appendix A.4

The complexity of generating all line sets \(\{L^k\}_{k=1}^{z}\) in MHAR given D, X, and \(\max \{m_I, n\} \le z\) is bounded by \(\mathcal {O}(m_In^{\omega -2}z)\) if \(n \le m_I,\) and by \(\mathcal {O}(m_I^{\omega -2}nz)\) otherwise.

Proof

All \(\Lambda ^k\)s can be obtained as follows:

  1. Obtain the matrix \(A^IX\) in \(\mathcal {O}(\mu _{A^I,X})\). This is done in \(\mathcal {O}(m_In^{\omega -2}z)\) if \(n \le m_I\), and in \(\mathcal {O}(m_I^{\omega -2}nz)\) otherwise.

  2. Compute \(B^I - A^IX\), where \(B^I=(b^I|\ldots |b^I) \in \mathbb {R}^{m_I \times z}\), which takes \(\mathcal {O}(m_Iz)\) operations.

  3. Calculate \(A^ID\), which is bounded by \(\mathcal {O}(\mu _{A^I,D})\); this is done in \(\mathcal {O}(m_In^{\omega -2}z)\) if \(n \le m_I\), and in \(\mathcal {O}(m_I^{\omega -2}nz)\) otherwise.

  4. Divide \(\frac{B^I - A^IX}{A^ID}\) (entry-wise) to obtain all \(\lambda ^k_i\). All the necessary point-wise operations for this calculation have a combined order of \(\mathcal {O}(m_Iz)\).

  5. For each \(k \in \{1,\ldots ,z\}\), determine which coefficients \(a_i^I d^k\) are positive or negative, which takes \(\mathcal {O}(m_Iz)\).

  6. For each \(k \in \{1,\ldots ,z\}\), find the interval endpoints \(\lambda _{\min }^k=\max \ \{\lambda _i^k \ | \ a_i^I d^k < 0\}\) and \(\lambda _{\max }^k=\min \ \{\lambda _i^k \ | \ a_i^I d^k > 0\}\), which can be done in \(\mathcal {O}(m_Iz)\).

This procedure constructs all the intervals \(\Lambda ^k=(\lambda _{\min }^k, \lambda _{\max }^k)\). The complexity of this operation is bounded by \(\mathcal {O}(\mu _{A^I,X}) = \mathcal {O}(\mu _{A^I,D})\). Hence, the complexity of finding all line sets is bounded by \(\mathcal {O}(m_In^{\omega -2}z)\) if \(n \le m_I\), and by \(\mathcal {O}(m_I^{\omega -2}nz)\) otherwise. \(\square \)
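
A hedged, batched PyTorch version of this procedure follows; here A_I is \(m_I \times n\), b_I has length \(m_I\), and X and D are \(n \times z\). The eps tolerance used to classify the signs of \(a_i^I d^k\) is our own choice, not part of the paper.

    import torch

    def line_sets(A_I, b_I, X, D, eps=1e-12):
        num = b_I.unsqueeze(1) - A_I @ X   # B^I - A^I X, shape (m_I, z)
        den = A_I @ D                      # A^I D, shape (m_I, z)
        lam = num / den                    # entry-wise lambda_i^k
        neg_inf = torch.full_like(lam, float("-inf"))
        pos_inf = torch.full_like(lam, float("inf"))
        # lambda_min^k = max over rows with a_i^I d^k < 0;
        # lambda_max^k = min over rows with a_i^I d^k > 0
        lam_min = torch.where(den < -eps, lam, neg_inf).max(dim=0).values
        lam_max = torch.where(den > eps, lam, pos_inf).min(dim=0).values
        return lam_min, lam_max            # Lambda^k = (lambda_min^k, lambda_max^k)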

Lemma Appendix A.5

Sampling z new points given \(\{\Lambda ^k\}_{k=1}^z\) has complexity \(\mathcal {O}(zn).\)

Proof

Selecting a random \(\theta ^k \in \Lambda ^k\) takes \(\mathcal {O}(1)\). Sampling a new point \(x^k_{t,j+1} = x^k_{t,j} + \theta ^k d^k_{t,j}\) has complexity \(\mathcal {O}(n)\) because it requires n scalar multiplications and n sums. Then, sampling all new \(x_{t,j+1}^k\) points is bounded by \(\mathcal {O}(zn)\). \(\square \)
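
The corresponding step as a sketch, drawing one uniform \(\theta ^k\) per walk and updating all z points at once; again, the names are illustrative.

    import torch

    def take_step(X, D, lam_min, lam_max):
        z = X.shape[1]
        # theta^k ~ Uniform(Lambda^k), O(1) per walk
        theta = lam_min + (lam_max - lam_min) * torch.rand(z, device=X.device)
        return X + D * theta  # n multiplications and n sums per walk: O(nz) total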

Lemma Appendix A.6

Assume \(m_E = 0,\) \(\max \{n,m\} < z,\) and \(n \le m_I.\) Then, the cost per iteration of MHAR is \(\mathcal {O}(m_In^{\omega -2}z), \) which is the number of operations needed for finding all line sets \(\{L^k\}_{k=1}^z.\)

Proof

First, we enumerate the cost of each step of the iteration for \(m_E=0\), \(n \le m_I\), and \(\max \{n,m\} < z\):

  1. By Lemma 3.1, generating \(P_{\Delta ^E}\) is bounded by \(\mathcal {O}(1)\).

  2. By Lemma 4.1, generating D is bounded by \(\mathcal {O}(nz)\).

  3. By Lemma 4.2, generating \(\{L^k\}_{k=1}^z\) for \(n \le m_I\) is bounded by \(\mathcal {O}(m_In^{\omega -2}z)\).

  4. By Lemma 4.3, generating all new \(x_{t,j+1}^k\) is bounded by \(\mathcal {O}(zn)\).

By hypothesis, \(0<n\le m_I\). Then, \(nz \le m_Iz < m_In^{\omega -2}z\), because \(\omega \in (2,3]\). Therefore, \(\mathcal {O}(1) \subseteq \mathcal {O}(nz) \subseteq \mathcal {O}(m_In^{\omega -2}z)\), where the first term is the complexity of finding the projection matrix (omitted for \(m_E=0\)), the second one bounds generating D and sampling new points, and the third one is the asymptotic cost of finding all line sets \(\{L^k\}_{k=1}^z\). \(\square \)

Lemma Appendix A.7

Assume \(m_E = 0,\) \(\max \{n,m\} < z,\) and \(n > m_I.\) Then, the cost per iteration of MHAR is \(\mathcal {O}(nm_I^{\omega -2}z),\) which is the number of operations needed for finding all line sets \(\{L^k\}_{k=1}^z.\)

Proof

As in the proof of Lemma 4.4, the complexity of computing the projection matrix, generating D, and sampling all new \(x_{t,j+1}^k\) points remains the same under \(m_E=0\) and \(n > m_I\). Hence, the only change comes from Lemma 4.2, in which the cost of finding all line sets \(\{L^k\}_{k=1}^z\) for \(n > m_I\) is \(\mathcal {O}(nm_I^{\omega -2}z)\). By hypothesis, \(0<m_I\) and \(\max \{n,m\} < z\), thus \(nz < nm_I^{\omega -2}z\). Therefore, \(\mathcal {O}(1) \subseteq \mathcal {O}(nz) \subseteq \mathcal {O}(nm_I^{\omega -2}z)\), where the third term is the cost of finding all line sets \(\{L^k\}_{k=1}^z\). \(\square \)

Lemma Appendix A.8

Assume \(m_E < n\) and \(\max \{m,n\} < z.\) Then, the cost of calculating the projection matrix \(P_{\Delta ^E}\) is bounded by the cost of generating D.

Proof

By hypothesis \(m_E < n \), implying that \(m_E^{\omega -2}n^2 < n^{\omega -2}n^2 = n^{\omega }\). Because \(n<z\), \(n^{\omega } = n^{\omega -1}n <n^{\omega -1}z \). Combining both inequalities yields \(m_E^{\omega -2}n^2< n^{\omega }< n^{\omega -1}z\). Therefore, \(\mathcal {O}(m_E^{\omega -2}n^2) \subseteq \mathcal {O}(n^{\omega -1}z)\), where the first term is the complexity of computing \(P_{\Delta ^E}\) (by Lemma 3.1), and the second term is the complexity of projecting H in order to obtain D (by Lemma 4.1). \(\square \)

Lemma Appendix A.9

Assume \(m_E>0,\) \(\max \{n,m\} < z,\) and \(n \le m_I.\) Then, the cost per iteration of MHAR is \(\mathcal {O}(m_In^{\omega -2}z),\) which is the number of operations needed for finding all line sets \(\{L^k\}_{k=1}^z.\)

Proof

First, we enumerate the cost of each step of the iteration for \(m_E>0\), \(n \le m_I\), and \(\max \{n,m\} < z\):

  1. By Lemma 3.1, generating \(P_{\Delta ^E}\) is bounded by \(\mathcal {O}(m_E^{\omega -2}n^2)\).

  2. By Lemma 4.1, generating D is bounded by \(\mathcal {O}(n^{\omega -1}z)\).

  3. By Lemma 4.2, generating \(\{L^k\}_{k=1}^z\) for \(n \le m_I\) is bounded by \(\mathcal {O}(m_In^{\omega -2}z)\).

  4. By Lemma 4.3, generating all new \(x_{t,j+1}^k\) is bounded by \(\mathcal {O}(zn)\).

Using Lemma 4.7, the Big-O term for finding \(P_{\Delta ^E}\) (step 1) is bounded by that of generating D (step 2). Because \(n\le m_I\), \(n^{\omega -1}z=n^{\omega -2}nz\le n^{\omega -2}m_Iz\). Therefore, \(\mathcal {O}(m_E^{\omega -2}n^2)\subseteq \mathcal {O}(n^{\omega -1}z) \subseteq \mathcal {O}(m_In^{\omega -2}z)\), which are the respective costs of steps 1, 2, and 3. Furthermore, \(nz \le n^{\omega -2}m_Iz\), implying that step 4 is also bounded by step 3 in terms of complexity. This implies that all the operations above are bounded by the term \(\mathcal {O}(m_In^{\omega -2}z)\), which is the asymptotic complexity of finding all line sets \(\{L^k\}_{k=1}^z\). \(\square \)

Lemma Appendix A.10

Assume \(m_E>0,\) \(\max \{n,m\} < z,\) and \(n>m_I.\) Then, the cost per iteration of MHAR is \(\mathcal {O}(n^{\omega -1}z),\) which is the number of operations needed for generating D.

Proof

As in the proof of Lemma 4.8, the cost of computing the projection matrix, generating D, and sampling all new \(x_{t,j+1}^k\) points remains the same under \(m_E>0\) and \(n > m_I\). Hence, the only change comes from Lemma 4.2, in which the cost of finding all line sets \(\{L^k\}_{k=1}^z\) for \(n > m_I\) is \(\mathcal {O}(nm_I^{\omega -2}z)\).

By Lemma 4.7, the Big-O term for finding \(P_{\Delta ^E}\) is bounded by the term for generating D. Because \(n>m_I\), \(m_I^{\omega -2}nz < n^{\omega -2}nz=n^{\omega -1}z\). Therefore, \(\mathcal {O}(m_E^{\omega -2}n^2) \subseteq \mathcal {O}(n^{\omega -1}z)\) and \(\mathcal {O}(nm_I^{\omega -2}z) \subseteq \mathcal {O}(n^{\omega -1}z)\); that is, the costs of computing the projection matrix and of finding all line sets are both bounded by the cost of generating D. Furthermore, \(nz \le n^{\omega -2}nz=n^{\omega -1}z\), implying that the cost of sampling all new \(x_{t,j+1}^k\) is also bounded by the cost of generating D. This implies that all the operations above are bounded by \(\mathcal {O}(n^{\omega -1}z)\). \(\square \)

Lemma Appendix A.11

For \(\max \{n,m\} < z\) and \(n \ll m,\) MHAR has a lower cost per sample than does John’s walk after proper pre-processing, warm start, and ignoring the logarithmic and error terms.

Proof

Given proper pre-processing, \(n \ll m\), and \(\max \{n,m\} < z\), MHAR’s cost per sample is \(\mathcal {O}^*(mn^{\omega + 1})\), while that for John’s walk is \(\mathcal {O}(mn^{11} + n^{15})\). Note that \(mn^{\omega + 1}\in \mathcal {O}(mn^{11} + n^{15})\). Therefore, when ignoring the logarithmic and error terms, MHAR has a lower cost per sample. \(\square \)

Lemma Appendix A.12

For \(\max \{n,m\} < z\) and the regime \(n \ll m,\) MHAR has a lower cost per sample than does the John walk after proper pre-processing, warm start, and ignoring logarithmic and error terms.

Proof

From proper pre-processing, \(n \ll m\), and \(\max \{n,m\} < z\), MHAR’s cost per sample is \(\mathcal {O}^*(mn^{\omega + 1})\) and that for the John walk is \(\mathcal {O}(mn^{\omega + \frac{3}{2}})\). Note that \(mn^{\omega + 1} \in \mathcal {O}(mn^{\omega + \frac{3}{2}})\). Therefore, when ignoring the logarithmic and error terms, MHAR has a lower cost per sample. \(\square \)
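
Putting Lemmas 4.1, 4.2, and 4.3 together, one MHAR iteration for the full-dimensional case (\(m_E=0\)) can be sketched by composing the hypothetical helpers introduced above. This mirrors the cost accounting of the lemmas; it is not the library’s implementation.

    def mhar_iteration(A_I, b_I, X):
        # m_E = 0, so D = H and no projection matrix is needed.
        D = direction_matrix(A_I.shape[1], X.shape[1], device=X.device)  # Lemma 4.1
        lam_min, lam_max = line_sets(A_I, b_I, X, D)                     # Lemma 4.2
        return take_step(X, D, lam_min, lam_max)                         # Lemma 4.3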

Appendix B. Additional optimal expansion experiments

Here we present the results for different expansion parameters using 10 MHAR runs for each dimension (25, 50, 100, 500) on simplices and hypercubes. Figures 8, 9, 10 and 11 show the box-plots for simplices, while Figs. 12, 13, 14 and 15 show the box-plots for hypercubes.

The box in each box-plot marks the 25th, 50th, and 75th percentiles. The dots mark outliers, and the upper and lower whiskers mark the maximum and minimum values excluding outliers.

Fig. 8: Box-plots for simplices in dimension 25 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 9: Box-plots for simplices in dimension 50 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 10: Box-plots for simplices in dimension 100 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 11: Box-plots for simplices in dimension 500 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 12: Box-plots for hypercubes in dimension 25 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 13: Box-plots for hypercubes in dimension 50 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 14: Box-plots for hypercubes in dimension 100 comparing expansion behavior for different values of the expansion hyper-parameter z

Fig. 15: Box-plots for hypercubes in dimension 500 comparing expansion behavior for different values of the expansion hyper-parameter z

Appendix C. Additional performance experiments

Here we present additional experiments on the performance of MHAR. Table 5 reports the running times and the average number of sampled points per second for the best values of z for each combination of figure and dimension. For each combination, we conducted the experiment 10 times. Table 5 shows that the average samples per second are lower in higher dimensions, due to the curse of dimensionality; nevertheless, MHAR maintains a high sampling throughput.

Table 5: Samples per second of MHAR
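
For readers who want to produce a samples-per-second figure of this kind, a rough timing sketch using the hypothetical helpers from Appendix A is shown below; absolute numbers depend on the hardware, the polytope, and z.

    import time
    import torch

    def samples_per_second(A_I, b_I, X, iters=100):
        start = time.perf_counter()
        for _ in range(iters):
            X = mhar_iteration(A_I, b_I, X)
        if X.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU kernels before stopping the clock
        return iters * X.shape[1] / (time.perf_counter() - start)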

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.



Cite this article

Corte, M.V., Montiel, L.V. Novel matrix hit and run for sampling polytopes and its GPU implementation. Comput Stat (2023). https://doi.org/10.1007/s00180-023-01411-y

