Quantum Meets Fine-grained Complexity: Sublinear Time Quantum Algorithms for String Problems

Longest common substring (LCS), longest palindrome substring (LPS), and Ulam distance (UL) are three fundamental string problems that can be classically solved in near linear time. In this work, we present sublinear time quantum algorithms for these problems along with quantum lower bounds. Our results shed light on a very surprising fact: Although the classic solutions for LCS and LPS are almost identical (via suffix trees), their quantum computational complexities are different. While we give an exact $\tilde O(\sqrt{n})$ time algorithm for LPS, we prove that LCS needs at least time $\tilde \Omega(n^{2/3})$ even for 0/1 strings.


Introduction
Perhaps the earliest questions that were studied in computer science are the algorithmic aspects of string problems. The edit distance, longest common substring, and longest palindrome substring are some of the more famous problems in this category. Efforts to solve these problems led to the discovery of several fundamental techniques such as dynamic programming, hashing algorithms, and suffix trees. These algorithms have numerous applications in several fields including DNA sequencing, social media, compiler design, antivirus software, etc. All of the above problems have received significant attention in the classical setting (see, e.g., [MP80, BYJKK04, BES06, AKO10, BEG+18, CDG+18, Far97, ACPR18, KSV14]). Efficient classical algorithms for these problems emerged early on in the 1960s [CLRS09]. Moreover, thanks to a series of recent developments in fine-grained complexity [BI18, ABW15], we now seem to have a clear understanding of the classical lower bounds as well. Unless plausible conjectures such as SETH are broken, we do not hope for a substantially better algorithm for edit distance. Failure to extend the fine-grained lower bounds to the quantum setting has left some very interesting open questions behind both in terms of quantum complexity and quantum lower bounds.
Despite a plethora of new quantum algorithms for various problems (e.g., [Amb07, BŠ06, Gro96, AWHL06, MSS07, Sho97]), not much attention has been given to string problems. Until recently, the only non-trivial quantum algorithm for such problems was the $\tilde O(\sqrt{n} + \sqrt{m})$ time algorithm of Ramesh and Vinay [RV03] for pattern matching, where $n$ and $m$ are the sizes of the text and the pattern (we also mention the works [AM14, CIL+12] that consider string problems with nonstandard queries in the quantum setting). Recently, Boroujeni, Ehsani, Ghodsi, Hajiaghayi, and Seddighin [BEG+18] made clever use of Grover's search algorithm [KLM+07] to obtain a constant factor approximation quantum algorithm for edit distance in truly subquadratic time. Shortly after, it was shown by Chakraborty, Das, Goldenberg, Koucky, and Saks [CDG+18] that a similar technique can be used to obtain a classical solution with the same approximation factor. Several improved classical algorithms have been given for edit distance in recent years [AN20, KS20, BR20], though it is still an open question whether a non-trivial quantum algorithm can go beyond what we can do classically for edit distance.
In this work, we give novel sublinear time quantum algorithms and quantum lower bounds for LCS, LPS, and a special case of edit distance, namely Ulam distance (UL). All these problems require $\Omega(n)$ time in the classical setting even if approximate solutions are desired. LCS and LPS can be solved in linear time via suffix trees [CLRS09] and there is an $O(n \log n)$ time algorithm for Ulam distance [CLRS09]. Our results shed light on a very surprising fact: Although the classical solutions for LCS and LPS are almost identical, their quantum computational complexities are different. While we give an exact $\tilde O(\sqrt{n})$ time quantum algorithm for LPS, we prove that any quantum algorithm for LCS needs at least time $\tilde \Omega(n^{2/3})$ even for 0/1 strings. We accompany this with several sublinear time quantum algorithms for LCS. A summary of our results is given in Tables 1 and 2.

Related work
Our work is very similar in spirit to, for instance, the work of Ambainis, Balodis, Iraids, Kokainis, Prusis, and Vihrovs [ABI+19], where Grover's algorithm is cleverly combined with classical techniques to design more efficient quantum algorithms for dynamic programming problems. This is particularly similar to our approach since we obtain our main results by combining known quantum algorithms with new classical ideas. In the present work, however, we go beyond Grover's algorithm and make use of several other quantum techniques such as element distinctness, pattern matching, amplitude amplification, and amplitude estimation to obtain our improvements. In addition to this, we also develop quantum walks that improve our more general results for some special cases. In particular, our quantum walk for obtaining a $(1-\epsilon)$-approximate solution for LCS is tight up to logarithmic factors due to a lower bound we give in Section 7.
Another line of research which is closely related to our work is the study of quantum lower bounds for edit distance [Rub19, BPS21]. While a SETH-based quadratic lower bound is known for the classical computation of edit distance, quantum lower bounds are not very strong. Recently, a quantum lower bound of $\Omega(n^{1.5})$ was given by Buhrman, Patro, and Speelman [BPS21] under a mild assumption. Still, no quantum algorithm better than the state-of-the-art classical solution (which runs in time $O(n^2/\log^2 n)$ [MP80]) is known for edit distance. The reader can find more details in [Rub19].
While not directly related to string algorithms, another investigation of SETH in the quantum setting is the recent work by Aaronson, Chia, Lin, Wang, and Zhang [ACL+20] that focuses on the quantum complexity of the closest pair problem, a fundamental problem in computational geometry. Interestingly, the upper bounds obtained in that paper also use an approach based on element distinctness and quantum walks. Despite this, the high-level ideas of our work are substantially different from [ACL+20] as we utilize several novel properties of LCS and LPS to design our algorithms.
In the classical sequential setting, LCS and LPS can be solved in linear time [CLRS09]. The solutions are almost identical: we first construct suffix trees for the input strings and then find the lowest common ancestors for the tree nodes. Ulam distance can also be solved exactly in time $O(n \log n)$ [CLRS09], which is the best we can hope for via a comparison-based algorithm [Fre75] or algebraic decision trees [Ram97]. Approximation algorithms running in time $\tilde O(n/d + \sqrt{d})$, where $d$ denotes the Ulam distance of the two strings, have also been developed [AN10, NSS17].

Preliminaries
Description of LCS, LPS and UD. In the longest common substring problem (LCS), the input consists of two strings and our goal is to find the longest substring which is shared between the two strings. We denote the two input strings by $A$ and $B$. We assume that $A$ and $B$ have the same length, which we denote by $n$. We use $\Sigma$ to denote the alphabet of the strings. For any $\epsilon \in [0, 1)$, we say that an algorithm outputs a $(1-\epsilon)$-approximation of the longest common substring if for any input strings $A$ and $B$, it outputs a common substring of length at least $(1-\epsilon)d$, where $d$ is the length of the longest common substring of $A$ and $B$.
In the longest palindrome substring problem (LPS), the goal is to find the longest substring of a given string $A$ which reads the same both forward and backward. The length of $A$ is also denoted by $n$ and its alphabet by $\Sigma$. For any $\epsilon \in [0, 1)$, we say that an algorithm outputs a $(1-\epsilon)$-approximation of the longest palindrome substring if for any input string $A$, it outputs a palindrome substring of length at least $(1-\epsilon)d$, where $d$ is the length of the longest palindrome substring of $A$.
We say that a string of length $n$ over an alphabet $\Sigma$ is non-repetitive if no character appears twice in the string (note that this can happen only if $|\Sigma| \ge n$). The Ulam distance is a special case of the edit distance in which the input strings are non-repetitive. Let us now define the problem more formally. In Ulam Distance (UD) we are given two non-repetitive strings $A$ and $B$ of length $n$, and consider how to transform one of them into the other. For this purpose we allow two basic operations, character addition and character deletion, each at unit cost, and our goal is to minimize the total cost of the transformation. We denote by $\mathrm{ud}(A, B)$ the minimum number of such operations needed to transform $A$ into $B$. The goal is to compute $\mathrm{ud}(A, B)$, either exactly or approximately. For any $\epsilon \in [0, 1]$, we say that an algorithm outputs a $(1+\epsilon)$-approximation of $\mathrm{ud}(A, B)$ if it outputs some value $r$ such that the inequality $\mathrm{ud}(A, B) \le r \le (1+\epsilon)\,\mathrm{ud}(A, B)$ holds.
General definitions and conventions. Throughout the paper, we use notations $\tilde O(\cdot)$ and $\tilde \Omega(\cdot)$ that hide the polylogarithmic factors in terms of $n$. We always assume that the size of $\Sigma$ is polynomial in $n$ and that each character is encoded using $O(\log n)$ bits. The size of $\Sigma$ thus never appears explicitly in the complexity of our algorithms. We say that a randomized or a quantum algorithm solves a problem like LCS, LPS or UL with high probability if it solves the problem with probability at least 9/10 (this success probability can be easily amplified to $1 - 1/\mathrm{poly}(n)$ with a logarithmic overhead in the complexity).
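As a classical reference point for the definition of $\mathrm{ud}(A, B)$ above, the following sketch computes the distance exactly when only insertions and deletions are allowed. It relies on the standard fact that this distance equals $|A| + |B| - 2\ell$, where $\ell$ is the length of a longest common subsequence, and that for non-repetitive strings $\ell$ is the length of a longest increasing subsequence of the positions in $B$ of the characters of $A$. The function name `ulam_distance` is hypothetical; this is not the quantum algorithm of this paper.

```python
from bisect import bisect_left

def ulam_distance(a, b):
    """Insertion/deletion distance between two non-repetitive sequences.

    Classical O(n log n) reference, assuming only character addition and
    character deletion are allowed (each at unit cost).
    """
    if len(set(a)) != len(a) or len(set(b)) != len(b):
        raise ValueError("inputs must be non-repetitive")
    pos_in_b = {ch: j for j, ch in enumerate(b)}
    # Positions in b of the characters of a that also occur in b, in a's order.
    seq = [pos_in_b[ch] for ch in a if ch in pos_in_b]
    # Patience sorting: tails[k] is the smallest possible last element of an
    # increasing subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    longest_common_subsequence = len(tails)
    return len(a) + len(b) - 2 * longest_common_subsequence
```

For instance, rotating a string by one position costs one deletion and one insertion, while reversing a string of length 3 leaves only a single character in place.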
For convenience, we often only compute/approximate the size of the solution as opposed to explicitly giving the solution. However, it is not hard to see that for LCS and LPS, the same algorithms can also give an explicit solution with a logarithmic overhead in the runtime (a solution can be specified by two integers pointing at the interval of the input).
For a string $X$, we denote by $X[i, j]$ the substring of $X$ that starts from the $i$-th character and ends at the $j$-th character. We say a string $X$ is $q$-periodic if we have $X_i = X_{i+q}$ for all $1 \le i \le |X| - q$. Moreover, the periodicity of a string $X$ is equal to the smallest number $q > 0$ such that $X$ is $q$-periodic. We also call a non-repetitive string of length $n$ over an alphabet of size $n$ a permutation (it represents a permutation of the set $\Sigma$).
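The notion of periodicity defined above can be made concrete with a short classical sketch. It uses the standard border (failure function) computation from Knuth-Morris-Pratt matching: the periodicity of $X$ equals $|X|$ minus the length of the longest proper border of $X$. The function name `periodicity` is hypothetical, and indices are 0-based in the code while the text uses 1-based indices.

```python
def periodicity(x):
    """Smallest q > 0 such that x[i] == x[i + q] for all 0 <= i < len(x) - q."""
    n = len(x)
    if n == 0:
        raise ValueError("the empty string has no periodicity")
    # fail[i] = length of the longest proper border of x[:i + 1],
    # i.e. the longest proper prefix that is also a suffix.
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = fail[k - 1]
        if x[i] == x[k]:
            k += 1
        fail[i] = k
    # A border of length b corresponds to a period of length n - b.
    return n - fail[-1]
```

Note that every string $X$ is trivially $|X|$-periodic under the definition above, so the periodicity of a string with no shorter period is its length.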
Quantum access to the inputs. In the quantum setting, we suppose that the input strings $A$ and $B$ can be accessed directly by a quantum algorithm. More precisely, we have an oracle $O_A$ that, for any $i \in \{1, \ldots, n\}$, any $a \in \Sigma$, and any $z \in \{0, 1\}^*$, performs the unitary mapping $O_A : |i\rangle |a\rangle |z\rangle \mapsto |i\rangle |a \oplus A[i]\rangle |z\rangle$, where $\oplus$ denotes an appropriate binary operation defined on $\Sigma$ (e.g., bit-wise parity on the binary encodings of $a$ and $A[i]$). Similarly we have an oracle $O_B$ that, for any $i \in \{1, \ldots, n\}$, any $b \in \Sigma$, and any $z \in \{0, 1\}^*$, performs the unitary mapping $O_B : |i\rangle |b\rangle |z\rangle \mapsto |i\rangle |b \oplus B[i]\rangle |z\rangle$. We assume that each call to $O_A$ or $O_B$ can be implemented at unit cost. This description corresponds to quantum random access ("QRAM access") to the input, which is the standard model to investigate the complexity of sublinear time quantum algorithms.

Results
We present sublinear time quantum algorithms along with quantum lower bounds for LCS, LPS, and UL. For the most part, the novelty of our work is to make use of existing quantum algorithms to solve our problems. For this purpose, we introduce new classical techniques that significantly differ from the conventional methods. However, for a special case of LCS, we design a novel quantum walk that leads to an improvement over our more general solution. We give a brief explanation of this technique later in the section. For now, we start by stating the quantum tools that we use in our algorithms.

Quantum components
and succeeds with probability 9/10 (the success probability can be increased to $1 - 1/\mathrm{poly}(n)$ with only a logarithmic overhead). Here, $T(n)$ represents the time complexity of computing $f(i)$ for one given element $i \in [n]$. Additionally, distinguishing between the case where $f(i) = 1$ holds for at least $m$ elements (for some value $1 \le m \le n$) and the case where $f(i) = 0$ for all $i \in [n]$ can be done in time $\tilde O(\sqrt{n/m} \cdot T(n))$.

Pattern matching ([RV03]). Let $P$ and $S$ be a pattern and a text of lengths $n$ and $m$ respectively. One can either verify that $P$ does not appear as a substring in $S$ or find the leftmost (rightmost) occurrence of $P$ in $S$ in time $\tilde O(\sqrt{n} + \sqrt{m})$ via a quantum algorithm. The algorithm gives a correct solution with probability at least 9/10.

Element distinctness ([Amb07]). Let $X$ and $Y$ be two lists of size $n$ and $f : (X \cup Y) \to \mathbb{N}$ be a function that is used to compare the elements of $X$ and $Y$. There is a quantum algorithm that finds (if one exists) a pair $(x, y)$ such that $x \in X$, $y \in Y$ and $f(x) = f(y)$. The algorithm succeeds with probability at least 9/10 and has running time $\tilde O(n^{2/3} \cdot T(n))$, where $T(n)$ represents the time needed to answer the following question: given $\alpha, \beta \in X \cup Y$, is $f(\alpha) = f(\beta)$, and if not, which one is smaller?

Amplitude amplification ([BHMT02]). Let $Q$ be a decision problem and $\mathcal{A}$ be a quantum algorithm that solves $Q$ with one-sided error and success probability $0 < p < 1$ (i.e., on a yes-instance $\mathcal{A}$ always accepts, while on a no-instance $\mathcal{A}$ rejects with probability $p$). Let $T$ be the runtime of $\mathcal{A}$. One can design a quantum algorithm for $Q$ with runtime $O(T/\sqrt{p})$ that solves $Q$ with one-sided error and success probability at least 9/10.

Amplitude estimation ([BHMT02]). Let $\mathcal{A}$ be a quantum algorithm that outputs 1 with probability $0 < p < 1$ and returns 0 with probability $1 - p$. Let $T$ be the time needed for $\mathcal{A}$ to generate its output. For any $\alpha > 0$, one can design a quantum algorithm with runtime $O(T/(\alpha\sqrt{p}))$ that outputs with probability at least 9/10 an estimate $\tilde p$ such that $(1 - \alpha)p \le \tilde p \le (1 + \alpha)p$.

LCS and LPS
In this section, we outline the ideas for obtaining sublinear time algorithms for LCS and LPS. We begin as a warm-up by giving a simple exact algorithm for LCS that runs in sublinear time when the solution size is small. Next, we explain our techniques for the cases where the solution size is large (at a high level, this part of the algorithm is very similar in both LCS and LPS). In our algorithms, we do a binary search on the size of the solution. We denote this value by $d$.
Notice that the runtime is sublinear when d is large.
The same technique can be used to approximate LPS. Similarly, we define $d' = (1-\epsilon)d$ for some constant $0 < \epsilon < 1$ and draw a random substring $P$ of size $d'$ from $A$. With the same argument, provided that the solution size is at least $d$, the probability that $P$ is part of an optimal solution is at least $\Omega(d/n)$. We show in Section 5 that by searching the reverse of $P$ in its neighbourhood we are able to find a solution of size at least $d'$. This step of the algorithm slightly differs from LCS in that we only search the reverse of $P$ in the area at most $d$ away from $P$. Thus, both the text and the pattern are of size $O(d)$ and therefore the search can be done in time $\tilde O(\sqrt{d})$. By utilizing amplitude amplification, we can obtain an algorithm with runtime $\tilde O(\sqrt{n/d} \cdot \sqrt{d}) = \tilde O(\sqrt{n})$ and approximation factor $1 - \epsilon$.
From $1 - \epsilon$ approximation to exact solution. We further develop a clever technique to obtain an exact solution with the above ideas. We first focus on LCS to illustrate this new technique. The high-level intuition is the following: After sampling $P$ from $A$ and searching $P$ in $B$, if the pattern appears only once (or a small number of times) in $B$, then by extending the matching parts of $B$ from both ends we may find a common substring of size $d$ (see Figure 1). Thus intuitively, the challenging case is when there are several occurrences of $P$ in $B$.
The key observation is that since $|P|$ is large and there are several places where $P$ appears in $B$, these occurrences must overlap (see Figure 2). This gives a very convenient approach to tackle the problem. Assume that $P$ appears at positions $i$ and $j > i$ of $B$ and these parts are overlapping. This implies that $P$ is certainly $(j - i)$-periodic. To see how this enables us to solve the problem, assume that $P$ is $x$-periodic and that a large continuous area of $B$ is covered by occurrences of $P$. It follows that except for the boundary cases, $P$ appears as parts of that interval that are exactly $x$ units away. Therefore, by detecting such an interval and computing the periodicity of $P$, one can determine all occurrences of $P$ in the interval almost as quickly as finding one occurrence of $P$. Since the techniques are involved, we defer the details to Section 3 and here we just mention some intuition about the complexity analysis: It follows from the above observations that when several occurrences of $P$ cover an entire interval $I$ of $B$, then we only need to consider $O(1)$ places in $I$ that may correspond to an optimal solution. Moreover, the length of every such interval is at least $d$. Thus, there are only $n/d$ places of $B$ that we need to take into account in our algorithm. Therefore at a high level, when $d$ is large (say $\Omega(n)$), the only non-negligible cost we pay is for the pattern matching, which takes time $\tilde O(\sqrt{n})$. We show in Section 3 that as $d$ becomes smaller the runtime of this approach increases to $\tilde O(n/\sqrt{d})$. A very similar argument can be used to shave the $1 - \epsilon$ approximation factor from the LPS algorithm as well.
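The periodicity observation above (two overlapping occurrences of $P$ at positions $i < j$ force $P$ to be $(j - i)$-periodic) can be checked on small examples with a classical sketch. The helper names `occurrences`, `is_q_periodic`, and `overlapping_shifts_are_periods` are hypothetical, indices are 0-based, and Python's built-in `str.find` stands in for quantum pattern matching.

```python
def occurrences(pattern, text):
    """Start indices of all (possibly overlapping) occurrences of pattern in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def is_q_periodic(x, q):
    """True if x[i] == x[i + q] for every index i with i + q < len(x)."""
    return all(x[i] == x[i + q] for i in range(len(x) - q))

def overlapping_shifts_are_periods(pattern, text):
    """Check that every pair of overlapping occurrences i < j gives a period j - i."""
    hits = occurrences(pattern, text)
    return all(
        is_q_periodic(pattern, j - i)
        for a, i in enumerate(hits)
        for j in hits[a + 1:]
        if j - i < len(pattern)  # the two occurrences overlap
    )
```

On the text `"xababababy"` the pattern `"abab"` occurs at positions 1, 3, and 5: the occurrences are spaced exactly one period apart, which is what lets the algorithm describe all of them by an interval and a period.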
To summarize, we obtain the following theorems. Theorem 2.1 combines the two algorithms we just described: if $d$ is larger than $n^{1/3}$ we use the second algorithm with runtime $\tilde O(n/\sqrt{d})$, otherwise we use the first algorithm with runtime $\tilde O(n^{2/3}\sqrt{d})$.
Theorem 2.1. The longest common substring of two strings of size $n$ can be computed with high probability by a quantum algorithm in time $\tilde O(n^{5/6})$.
Theorem 2.2. The longest palindrome substring of a string of size $n$ can be computed with high probability by a quantum algorithm in time $\tilde O(\sqrt{n})$.
We accompany Theorems 2.1 and 2.2 with quantum lower bounds (all the lower bounds are proven in Section 7). Intuitively, obtaining a solution with time better than $\tilde O(\sqrt{n})$ is impossible for either problem due to a reduction to searching unordered sets. This makes our solution for LPS optimal up to subpolynomial factors. For LCS, an improved lower bound of $\tilde \Omega(n^{2/3})$ can be obtained via a reduction from element distinctness. However, the gap is still open between our upper bound of $\tilde O(n^{5/6})$ and lower bound of $\tilde \Omega(n^{2/3})$. Thus, we aim to improve our upper bound by considering approximate solutions and special cases. In the following, we briefly explain these results.
Improved $1 - \epsilon$ approximation for LCS. One way to obtain a better algorithm for LCS is by considering $1 - \epsilon$ approximation algorithms. Note that the quantum algorithm for element distinctness is based on quantum walks, and our solution for LCS is obtained via a reduction to element distinctness. The runtime of quantum walks can be improved when multiple solutions are present for a problem. If instead of a solution of size $d$ which is exact, we resort to solutions of size $(1 - \epsilon)d$, we are guaranteed to have at least $\epsilon d$ solutions. Thus, intuitively this should help us improve the runtime of our algorithm.
Although the intuition comes from the inner workings of the quantum techniques, we are actually able to improve the runtime with a completely combinatorial idea. For small $d$, instead of constructing two sets $S_A$ and $S_B$ with $n - d + 1$ elements, we construct two sets of size $O(n/\sqrt{d})$ and prove that if the two sets have an element in common, then there is a solution of size at least $(1 - \epsilon)d$. The construction of the two sets is exactly the same as the construction of $S_A$ and $S_B$ except that only some of the elements are present in the new subsets (see Figure 3 for an illustration of the construction, in which only the elements colored in red are included in the two sets).
We prove that since each set now has size $O(n/\sqrt{d})$, element distinctness can be solved in time $\tilde O((n/\sqrt{d})^{2/3} \cdot \sqrt{d}) = \tilde O(n^{2/3} d^{1/6})$. This combined with the $\tilde O(n/\sqrt{d})$ algorithm for large $d$ gives us a $1 - \epsilon$ approximation algorithm with runtime $\tilde O(n^{3/4})$.
Theorem 2.3. For any constant $0 < \epsilon < 1$, the longest common substring of two strings of size $n$ can be approximated within a factor $1 - \epsilon$ with high probability by a quantum algorithm in time $\tilde O(n^{3/4})$.
LCS for non-repetitive strings. For all string problems, a special case which is of particular interest is when the characters are different. For instance, although DNA sequences consist of only 4 characters, one can make the representation more informative by assigning a symbol to every meaningful block of the sequence. This way, there is only a small chance that a character appears several times in the sequence. Similarly, in text recognition, one may rather represent every word with a symbol of the alphabet, resulting in a huge alphabet and strings with low repetitions. These scenarios are motivating examples for the study of edit distance and longest common subsequence under this assumption, known respectively as Ulam distance [AN10, CK06, NSS17] and longest increasing subsequence (which has been the target of significant research by the string algorithms community; see, e.g., [MS20] for references).
We thus consider LCS for input strings $A$ and $B$ that are non-repetitive (an important special case is when $A$ and $B$ are permutations). We show that there exists an $\tilde O(n^{3/4})$-time quantum algorithm for exact LCS and an $\tilde O(n^{2/3})$-time quantum algorithm for approximate LCS. This significantly improves the generic results of Theorems 2.1 and 2.3 and, for approximate LCS, this matches (up to possible polylogarithmic factors) the lower bound of Theorem 7.1.
Theorem 2.4. The longest common substring of two non-repetitive strings of size $n$ can be computed with high probability by a quantum algorithm in time $\tilde O(n^{3/4})$.
Theorem 2.5. For any constant $0 < \epsilon < 1$, the longest common substring of two non-repetitive strings of size $n$ can be approximated within a factor $1 - \epsilon$ with high probability by a quantum algorithm in time $\tilde O(n^{2/3})$.
The improvements for non-repetitive strings are obtained by improving the complexity of the first part of the algorithm used for general LCS from $\tilde O(n^{2/3}\sqrt{d})$ to $\tilde O(n^{2/3} + \sqrt{nd})$ for the case of exact LCS, and from $\tilde O(n^{2/3} d^{1/6})$ to $\tilde O(n^{2/3}/d^{1/3} + \sqrt{n})$ for approximate LCS. We now briefly describe how we achieve such improvements.
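The exponents in Theorems 2.1, 2.3, 2.4, and 2.5 all come from balancing a small-$d$ algorithm against the $\tilde O(n/\sqrt{d})$ algorithm for large $d$. The following sketch reproduces this arithmetic: writing $d = n^{\delta}$, each running time becomes an exponent of $n$ as a function of $\delta$, and the overall exponent is the worst case over $\delta$ of the better of the two algorithms. The helper `worst_case_exponent` and the grid-based evaluation are illustrative assumptions; polylogarithmic factors are ignored.

```python
from fractions import Fraction as F

def worst_case_exponent(small_d, large_d, grid=12):
    """Max over delta in [0, 1] of min(small_d(delta), large_d(delta)), d = n**delta.

    Both exponent functions used below are piecewise linear with all
    breakpoints and crossing points at multiples of 1/12, so evaluating
    on that grid finds the exact maximum.
    """
    deltas = [F(k, grid) for k in range(grid + 1)]
    return max(min(small_d(t), large_d(t)) for t in deltas)

def large(t):
    return 1 - t / 2  # exponent of n / sqrt(d)

# General strings, exact: n^(2/3) * sqrt(d)
general_exact = worst_case_exponent(lambda t: F(2, 3) + t / 2, large)
# General strings, approximate: n^(2/3) * d^(1/6)
general_approx = worst_case_exponent(lambda t: F(2, 3) + t / 6, large)
# Non-repetitive, exact: n^(2/3) + sqrt(n * d)
nonrep_exact = worst_case_exponent(lambda t: max(F(2, 3), (1 + t) / 2), large)
# Non-repetitive, approximate: n^(2/3) / d^(1/3) + sqrt(n)
nonrep_approx = worst_case_exponent(lambda t: max(F(2, 3) - t / 3, F(1, 2)), large)

print(general_exact, general_approx, nonrep_exact, nonrep_approx)
```

The four values are $5/6$, $3/4$, $3/4$, and $2/3$, attained at $d = n^{1/3}$, $d = n^{1/2}$, $d = n^{1/2}$, and constant $d$ respectively.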
Let us first consider exact LCS. The $\tilde O(n^{2/3}\sqrt{d})$-time quantum algorithm described above was based on a "black-box reduction" to element distinctness, i.e., we used the quantum algorithm for element distinctness (which is based, as already mentioned, on a technique known as quantum walk) as a black box. In comparison, our new algorithm is constructed by designing a quantum walk especially tailored for our problem. More precisely, we use the approach by Magniez, Nayak, Roland and Santha [MNRS11] to design a quantum walk over the Johnson graph (more precisely, we work with a graph defined as the direct product of two Johnson graphs, which is more convenient for our purpose).
We say that a pair $(i, j) \in [n - d + 1] \times [n - d + 1]$ is marked if there is a common substring of length $d$ that starts at position $i$ in $A$ and position $j$ in $B$, i.e., $A[i, i + d - 1] = B[j, j + d - 1]$. The goal of our quantum walk is to find a pair of subsets $(R_1, R_2)$, where $R_1$ and $R_2$ are two subsets of $[n - d + 1]$ of size $r$ (for some parameter $r$), such that there exists a marked pair $(i, j)$ in $R_1 \times R_2$. Note that since the strings are non-repetitive, for two random subsets $R_1, R_2$, the expected number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ is $\Theta(r^2/n)$. A simple but crucial observation is that only those pairs can be marked, since marked pairs should agree on their first character. Thus for two random subsets $R_1, R_2$ we can check if there exists a marked pair in $R_1 \times R_2$ in expected time $\tilde O(\sqrt{r^2/n})$ using Grover search, since we have only $\Theta(r^2/n)$ candidates for marked pairs. This significantly improves over the upper bound $\tilde O(\sqrt{r^2})$ we would get without using the assumption that the strings are non-repetitive. This is how we can improve the complexity down to $\tilde O(n^{2/3} + \sqrt{nd})$. Note that a technical difficulty that we need to overcome is guaranteeing that the running time of the checking procedure is small not only for random subsets but also for all $(R_1, R_2)$, i.e., we need a guarantee on the worst-case running time of the checking procedure. We solve this issue by disregarding the pairs of subsets $(R_1, R_2)$ that contain too many candidates; we are able to prove that the impact is negligible by using concentration bounds.
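The observation that only pairs agreeing on their first character can be marked is easy to illustrate classically. In the sketch below, `candidate_pairs` and `has_marked_pair` are hypothetical names, indices are 0-based, and $R_1$, $R_2$ are assumed to contain only valid start positions; a dictionary lookup and a linear scan stand in for the Grover search used in the quantum walk.

```python
def candidate_pairs(A, B, R1, R2):
    """Pairs (i, j) in R1 x R2 with A[i] == B[j].

    Because B is non-repetitive, each character occurs at most once in B,
    so every i in R1 contributes at most one candidate.
    """
    where_in_B = {B[j]: j for j in R2}
    return [(i, where_in_B[A[i]]) for i in R1 if A[i] in where_in_B]

def has_marked_pair(A, B, R1, R2, d):
    """True if some candidate (i, j) starts a common substring of length d."""
    return any(
        i + d <= len(A) and j + d <= len(B) and A[i:i + d] == B[j:j + d]
        for i, j in candidate_pairs(A, B, R1, R2)
    )
```

For example, with `A = [0, 1, 2, 3, 4, 5, 6, 7]`, `B = [5, 6, 7, 0, 1, 2, 3, 4]`, `R1 = [0, 3]` and `R2 = [3, 0]`, only the pair `(0, 3)` survives the first-character filter out of the four pairs in `R1 x R2`, and it is marked for `d = 3`.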
The improvement for approximate LCS uses a very similar idea. The main difference is that now we consider a pair $(i, j)$ marked if there is a common substring of length $\lceil (1 - \epsilon)d \rceil$ that starts at position $i$ in $A$ and position $j$ in $B$. Since the fraction of marked pairs increases by a factor $\epsilon d$, we obtain a further improvement. Analyzing the running time of the resulting quantum walk shows that we obtain overall time complexity $\tilde O(n^{2/3}/d^{1/3} + \sqrt{n})$.

Ulam distance
Finally, we present a sublinear time quantum algorithm that computes a $(1 + \epsilon)$-approximation of the Ulam distance (i.e., the edit distance for non-repetitive strings).
Theorem 2.6. For any constant $\epsilon > 0$, there exists a quantum algorithm that computes with high probability a $(1 + \epsilon)$-approximation of the Ulam distance of two non-repetitive strings in time $\tilde O(\sqrt{n})$.
In comparison, classical algorithms require linear time even for computing a constant-factor approximation of the Ulam distance when the distance is small (see, e.g., [AN10] for a discussion of the classical lower bounds). Theorem 2.6 thus shows that while for general strings it is still unknown whether a quantum speed-up is achievable for the computation of the edit distance, we can obtain a quadratic speed-up for non-repetitive strings. In Section 7 we show a quantum lower bound (see Theorem 7.4) that matches (up to possible polylogarithmic factors) the upper bound of Theorem 2.6. Since it is easy to show that any quantum algorithm that computes the Ulam distance exactly requires $\Omega(n)$ time, our results are thus essentially optimal.
Let us now describe briefly how the quantum algorithm of Theorem 2.6 is obtained. Our approach is based on a prior work by Naumovitz, Saks, and Seshadhri [NSS17] that showed how to construct, for any constant $\epsilon > 0$, a classical algorithm that computes a $(1 + \epsilon)$-approximation of the Ulam distance and runs in sublinear time when $\mathrm{ud}(A, B)$ is large (the running time becomes linear when $\mathrm{ud}(A, B)$ is small). The core technique is a variant of the Saks-Seshadhri algorithm for estimating the longest increasing subsequence from [SS10], which can be used to construct a binary "indicator" (we denote this indicator UlamIndic in Section 6) that outputs 1 with probability $p$ and 0 with probability $1 - p$, for some value $p$ that is related to the value of $\mathrm{ud}(A, B)$. Conceptually, the approach is based on estimating the value of $\mathrm{ud}(A, B)$ from this indicator using a hierarchy of gap tests and estimators, each with successively better run time. This results in a fairly complex algorithm.
At a high level, our strategy is to apply quantum amplitude estimation to the classical indicator UlamIndic to estimate $p$ and thus $\mathrm{ud}(A, B)$. Several technical difficulties nevertheless arise since the indicator requires a rough initial estimation of $\mathrm{ud}(A, B)$ to work efficiently. To solve these difficulties, we first construct a quantum gap test based on quantum amplitude estimation that makes it possible to test efficiently whether the success probability of an indicator is larger than some given threshold $q$ or smaller than $(1 - \eta)q$ for some given gap parameter $\eta$ (see Proposition 6.3 in Section 6). We then show how to apply this gap test to the indicator UlamIndic with successively better initial estimates of $\mathrm{ud}(A, B)$ in order to obtain a $(1 + \epsilon)$-approximation of $\mathrm{ud}(A, B)$ in $\tilde O(\sqrt{n})$ time.

Longest Common Substring
Recall that in LCS, we are given two strings $A$ and $B$ of size $n$ and the goal is to find the longest string $t$ which is a substring of both $A$ and $B$. This problem can be solved in time $O(n)$ via suffix trees in the classical setting. In this section, we give upper bounds for the quantum complexity of this problem (lower bounds are discussed in Section 7). We first give an algorithm for exact LCS that runs in time $\tilde O(n^{5/6})$ in Section 3.1. We then give an algorithm for approximate LCS that runs in time $\tilde O(n^{3/4})$ in Section 3.2.

Quantum algorithm for exact LCS
In our algorithm, we do a binary search on the size of the solution. We denote this value by $d$. Thus, by losing an $O(\log n)$ factor in the runtime, we reduce the problem to verifying if a substring of size $d$ is shared between the two strings. Our approach is twofold; we design separate algorithms for small and large $d$.

Quantum algorithm for small d
Our algorithm for small $d$ is based on a reduction to element distinctness. Let $|\Sigma|$ be the size of the alphabet and $v : \Sigma \to [0, |\Sigma| - 1]$ be a function that maps every element of the alphabet to a distinct number in the range $0, \ldots, |\Sigma| - 1$. In other words, $v$ is a perfect hashing for the characters. We extend this definition to substrings of the two strings. For a string $t$, we define $v(t)$ as follows: $v(t) = \sum_{i=1}^{|t|} v(t_i)\,|\Sigma|^{|t| - i}$. It follows from the definition that two strings $t$ and $t'$ of the same length are equal if and only if we have $v(t) = v(t')$.
From the two strings, we make two sets of numbers $S_A$ and $S_B$, each having $n - d + 1$ elements. Element $i$ of set $S_A$ is a pair $(A, i)$ whose value is equal to $v(A[i, i + d - 1])$, and similarly element $i$ of $S_B$ is a pair $(B, i)$ whose value is equal to $v(B[i, i + d - 1])$. The two sets contain elements with equal values if and only if the two strings have a common substring of size $d$. Therefore, by solving element distinctness for $S_A$ and $S_B$ we can find out if the size of the solution is at least $d$. Although element distinctness can be solved in time $\tilde O(n^{2/3})$ when element comparison can be implemented in $\tilde O(1)$ time, our algorithm needs more runtime since comparing elements takes time $\omega(1)$. More precisely, each comparison can be implemented in time $\tilde O(\sqrt{d})$ in the following way: In order to compare two elements of the two sets, we can use Grover's search to find out if the two substrings are different in any position, and if so we can find the leftmost such position in time $\tilde O(\sqrt{d})$. Thus, it takes time $\tilde O(\sqrt{d})$ to compare two elements, which results in runtime $\tilde O(n^{2/3}\sqrt{d})$. We summarize this result in the following lemma.
Lemma 3.1. There exists a quantum algorithm that runs in time $\tilde O(n^{2/3}\sqrt{d})$ and verifies with probability 9/10 if there is a common substring of length $d$ between the two strings.
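The structure of the reduction behind Lemma 3.1, together with the outer binary search on $d$, can be sketched classically as follows. This is only an illustration under stated assumptions: a hash set stands in for the quantum element distinctness algorithm, `compare` uses a linear scan where the quantum algorithm would use Grover's search for the leftmost differing position, and all function names are hypothetical.

```python
def compare(s, t):
    """Order two equal-length strings by their leftmost differing position.

    Returns -1, 0, or 1. The quantum algorithm would locate that position
    with Grover's search; here a linear scan is used instead.
    """
    for x, y in zip(s, t):
        if x != y:
            return -1 if x < y else 1
    return 0

def has_common_substring(A, B, d):
    """Decide whether the sets S_A and S_B of length-d substrings intersect."""
    if d == 0:
        return True
    S_A = {A[i:i + d] for i in range(len(A) - d + 1)}
    return any(B[j:j + d] in S_A for j in range(len(B) - d + 1))

def lcs_length(A, B):
    """Binary search on d; valid because a common substring of length d
    contains common substrings of every smaller length."""
    lo, hi = 0, min(len(A), len(B))  # invariant: length lo is feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_common_substring(A, B, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `lcs_length("banana", "ananas")` evaluates the decision procedure for a logarithmic number of values of `d` and returns 5, the length of `"anana"`.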

Quantum algorithm for large d
We now present a quantum algorithm that runs in time $\tilde O(n/\sqrt{d})$ and verifies if there is a common substring of length $d$ between the two strings.
General description of the algorithm. We say that a character of $A$ is marked if it appears among the first $\lfloor d/3 \rfloor$ characters of some substring of length $d$ shared between $A$ and $B$. For example, if there is exactly one common substring of length $d$, there are precisely $\lfloor d/3 \rfloor$ marked characters, but we may have more marked characters as the number of common substrings of length $d$ between $A$ and $B$ increases. In our algorithm, we sample a substring of length $2\lfloor d/3 \rfloor$ of $A$ and a substring of length $d$ of $B$. We call the substring sampled from $A$ the pattern and denote it by $P$, and we denote the substring sampled from $B$ by $S$. We denote the intervals of $A$ and $B$ that correspond to $P$ and $S$ by $[\ell_P, r_P]$ and $[\ell_S, r_S]$ respectively. That is, $A[\ell_P, r_P] = P$ and $B[\ell_S, r_S] = S$. We say that the pair $(P, S)$ is good if the following conditions hold (see Figure 4 for an illustration):
• There exists a pair $(i, j)$ such that $A[i, i + d - 1] = B[j, j + d - 1]$ is a common substring of size $d$ between the two strings.
• $\ell_S \le j - i + \ell_P \le j - i + r_P \le r_S$.
It directly follows that if $(P, S)$ is a good pair, the character at position $\ell_P$ of $A$ has to be a marked character. When we fix a good pair $(P, S)$ and we refer to the optimal solution, we mean the common substring of size $d$ made by $A[i, i + d - 1] = B[j, j + d - 1]$.
If there is no substring of length $d$ shared between $A$ and $B$, then obviously no good pair exists. Let us now compute the probability of sampling such $P$ and $S$ under the assumption that there is a common substring of length $d$ shared between $A$ and $B$. There are at least $\lfloor d/3 \rfloor$ marked characters in $A$ and thus with probability $\Omega(d/n)$ we sample the right pattern. Moreover, the corresponding substring of the solution in $B$ has at least $\lfloor d/3 \rfloor$ positions such that if we sample $S$ starting from those positions the pair $(P, S)$ is good. Thus, with probability $\Omega(d^2/n^2)$ we sample a good pair $(P, S)$.
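The notion of marked characters used in the sampling argument above can be made concrete with a small classical sketch. The function name `marked_positions` is hypothetical and positions are 0-based; the code simply enumerates all length-$d$ substrings, which the quantum algorithm of course never does.

```python
def marked_positions(A, B, d):
    """Positions of A lying among the first floor(d/3) characters of some
    length-d substring of A that also occurs in B."""
    substrings_of_B = {B[j:j + d] for j in range(len(B) - d + 1)}
    marked = set()
    for i in range(len(A) - d + 1):
        if A[i:i + d] in substrings_of_B:
            marked.update(range(i, i + d // 3))
    return marked
```

When a common substring of length $d$ exists, this set has at least $\lfloor d/3 \rfloor$ elements, which is the source of the $\Omega(d/n)$ probability of sampling a pattern that starts at a marked character.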
We describe below a quantum procedure that, given a good pair $(P, S)$, constructs with probability at least $1 - 1/\mathrm{poly}(n)$ a common substring of length $d$ in time $\tilde O(\sqrt{d})$. By first sampling $(P, S)$ and then running this procedure, we get a one-sided procedure that verifies if there exists a common substring of length $d$ shared between $A$ and $B$ with probability $\Omega(d^2/n^2)$ in time $\tilde O(\sqrt{d})$. With amplitude amplification, we can thus construct a quantum algorithm that verifies if there exists a common substring of length $d$ with high probability in time $\tilde O(\sqrt{n^2/d^2} \cdot \sqrt{d}) = \tilde O(n/\sqrt{d})$. We summarize this result in the following lemma.

Lemma 3.2. There exists a quantum algorithm that runs in time $\tilde O(n/\sqrt{d})$ and verifies with probability 9/10 if there is a common substring of length $d$ between the two strings.
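The running time in Lemma 3.2 follows from a short calculation. As a sketch, it uses the standard fact that amplitude amplification turns a one-sided procedure with success probability $p$ into one with constant success probability using $O(1/\sqrt{p})$ repetitions, applied here with $p = \Omega(d^2/n^2)$:

```latex
\[
  \underbrace{O\!\left(\sqrt{n^2/d^2}\right)}_{\text{repetitions}}
  \cdot
  \underbrace{\tilde O\!\left(\sqrt{d}\right)}_{\text{cost per repetition}}
  \;=\;
  \tilde O\!\left(\frac{n}{\sqrt{d}}\right).
\]
```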
Constructing a common substring from a good pair. From here on, we assume that $(P, S)$ is good and describe how to construct a common substring of length $d$ in time $\tilde O(\sqrt{d})$. We aim to find positions of $S$ that match with $P$. For this purpose, we use the string matching algorithm of Ramesh and Vinay [RV03] that searches $P$ in $S$ in time $\tilde O(\sqrt{|S|} + \sqrt{|P|})$. If there is no match, then $(P, S)$ is not a good pair, which cannot happen under our assumption. If there is exactly one such position, then we can detect in time $\tilde O(\sqrt{d})$ whether extending this matching from its two ends yields a common substring of size $d$, as follows: we use Grover's search to find the left-most (up to $d$ characters away) position where the two substrings differ when extending them to the right. We do the same to the left, and together this tells us whether we obtain a common substring of size at least $d$. Each search takes time $\tilde O(\sqrt{d})$. We refer the reader to Figure 5 for a pictorial illustration.

More generally, the above idea works when there are only $O(1)$ many positions of $S$ that match with $P$. However, we should address the case that $P$ appears many times in $S$. In the following we discuss how this can be handled. In time $\tilde O(\sqrt{d})$ we first find the leftmost and rightmost positions of $S$ that match with $P$. Let the starting index of the leftmost match be $\ell$ and the starting index of the rightmost match be $r$. In other words, $P = S[\ell, \ell + |P| - 1] = S[r, r + |P| - 1]$. Since $|S| = d$ and $|P| = 2\lfloor d/3 \rfloor$, the two substrings overlap. Therefore, $P$ as well as the entire string $S[\ell, r + |P| - 1]$ is $(r - \ell)$-periodic. This is the key property on which our algorithm will be based.
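The periodicity property can be checked on small examples with a classical stand-in. The sketch below is not the quantum procedure: it replaces the [RV03] string matching with a naive scan, uses 0-indexed positions, and the helper names (`occurrences`, `is_periodic`, `periodic_span`) are hypothetical.

```python
def occurrences(text, pattern):
    """All 0-indexed start positions of pattern in text (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def is_periodic(s, q):
    """True if s[i] == s[i + q] for every valid i, i.e. s is q-periodic."""
    return all(s[i] == s[i + q] for i in range(len(s) - q))


def periodic_span(S, P):
    """Return (q, T, ok) for the leftmost/rightmost occurrences of P in S,
    where q = r - l, T = S[l : r + |P|], and ok says whether both P and T
    are q-periodic. Returns None when P occurs fewer than two times."""
    occ = occurrences(S, P)
    if len(occ) < 2:
        return None
    left, right = occ[0], occ[-1]
    q = right - left
    T = S[left:right + len(P)]
    return q, T, is_periodic(P, q) and is_periodic(T, q)
```

For instance, with $d = 12$ and $|P| = 2\lfloor d/3 \rfloor = 8$, the pattern `"abababab"` occurs twice in `"xabababababy"`, and the span between the two occurrences is 2-periodic.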
Let us denote the substring $S[\ell, r + |P| - 1]$ by $T$. We extend both $P$ and $T$ from left and right up to a distance of $2d$ in the following way: we increase the index of the ending point so long as the substring remains $(r - \ell)$-periodic. We also stop if we move the ending point by more than $2d$. We do the same for the starting point; we decrease the starting point so long as the substring remains $(r - \ell)$-periodic and up to at most a distance of $2d$. Since we bound the maximum change by $2d$, this can be done in time $\tilde O(\sqrt{d})$ via Grover's search. Let us denote the resulting substrings by $A[\alpha, \beta]$ and $B[\alpha', \beta']$. The following observation enables us to find the solution in time $\tilde O(\sqrt{d})$: one of the following three cases holds for the optimal solution.
(i). The starting index of the corresponding optimal solution in $A$ is smaller than $\alpha$. Since the matched parts of the solution are both in the periodic segments, this implies that the first non-periodic indices (when going backwards from the matched parts) in the two solution intervals correspond to each other, so that $A[\alpha]$ corresponds to $B[\alpha']$ (Figure 6a).
(ii). The ending index of the corresponding optimal solution in $A$ is larger than $\beta$ and thus (with a similar argument as above) $A[\beta]$ corresponds to $B[\beta']$ (Figure 6b).
(iii). The corresponding optimal solution in $A$ is in the interval $A[\alpha, \beta]$ and thus it is $(r - \ell)$-periodic (Figure 6c).
In all three cases, we can find the optimal solution in time $\tilde O(\sqrt{d})$. For the first two cases, we know one correspondence between the two common substrings of length $d$. More precisely, in Case (i) we know that $A[\alpha]$ is part of the solution and that this character corresponds to $B[\alpha']$. Thus, it suffices to run Grover's search from the two ends to extend the matching in the two directions. Similarly, in Case (ii) we know that $A[\beta]$ corresponds to $B[\beta']$, and thus via Grover's search we find a common substring of length $d$ in the two strings.
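Once one correspondence between the two strings is known, the extension step is easy to state classically. The following is a minimal sketch that replaces each Grover search for the first mismatch with a linear scan; the function name `extend_match` and the 0-indexed convention are assumptions made for illustration.

```python
def extend_match(A, B, i, j, d):
    """Given a known correspondence A[i] <-> B[j] (0-indexed), return the start
    indices (a, b) of a common substring of length >= d that contains it,
    or None if extending the alignment never reaches length d."""
    if A[i] != B[j]:
        return None
    # Extend to the right until the first mismatch. The quantum algorithm
    # locates this mismatch with Grover's search instead of a linear scan.
    right = 0
    while (right < d and i + right + 1 < len(A) and j + right + 1 < len(B)
           and A[i + right + 1] == B[j + right + 1]):
        right += 1
    # Extend to the left in the same way.
    left = 0
    while (left < d and i - left - 1 >= 0 and j - left - 1 >= 0
           and A[i - left - 1] == B[j - left - 1]):
        left += 1
    if left + right + 1 >= d:
        return (i - left, j - left)
    return None
```

Both scans stop after at most $d$ steps, which mirrors the "up to $d$ characters away" bound that keeps each Grover search within $\tilde O(\sqrt{d})$ time.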
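Finally, for Case (iii) we point out that since the entire solution lies in the intervals $A[\alpha, \beta]$ and $B[\alpha', \beta']$, we have $\beta \ge \alpha + d - 1$ and $\beta' \ge \alpha' + d - 1$. Notice that both intervals $A[\alpha, \beta]$ and $B[\alpha', \beta']$ are $(r - \ell)$-periodic. Let the starting position of $P$ in $A$ be $x$. For a fixed position $y$ in the interval $[\alpha', \beta']$ that matches with $P$, if we extend this matching from the two ends we obtain a solution of size $\min\{x - \alpha, y - \alpha'\} + \min\{\beta - x + 1, \beta' - y + 1\}$. This means that extending one of the following matchings between $P$ and $S$ gives us a common substring of size $d$:
• An optimal solution when $x - \alpha \ge y - \alpha'$: the rightmost matching in the interval $B[\alpha', \alpha' + |P| + (x - \alpha)]$, or
• An optimal solution when $x - \alpha \le y - \alpha'$: the leftmost matching in the interval $B[\alpha' + (x - \alpha), \beta']$.
Each matching can be found in time $\tilde O(\sqrt{d})$ and, similar to the ideas explained above, we can extend each matching to verify if it gives us a common substring of size $d$ in time $\tilde O(\sqrt{d})$.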

Combining the two algorithms
Combining Lemmas 3.1 and 3.2 gives us a solution in time $\tilde O(n^{5/6})$.
Theorem 2.1 (repeated). The longest common substring of two strings of size $n$ can be computed with high probability by a quantum algorithm in time $\tilde O(n^{5/6})$.
Proof. We do a binary search on $d$ (the size of the solution). To verify a given $d$, we consider two cases: if $d < n^{1/3}$ we run the algorithm of Lemma 3.1, and otherwise we run the algorithm of Lemma 3.2. Thus, the overall runtime is bounded by $\tilde O(n^{5/6})$.
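The structure of this proof can be sketched classically. The binary search is valid because the predicate "there is a common substring of length $d$" is monotone in $d$. In the sketch below, the brute-force `has_common_substring` merely stands in for the two quantum verifiers, and `which_lemma` records the threshold rule from the proof; all names are hypothetical.

```python
def has_common_substring(A, B, d):
    """Brute-force stand-in for the verifiers of Lemmas 3.1 and 3.2."""
    if d == 0:
        return True
    seen = {A[i:i + d] for i in range(len(A) - d + 1)}
    return any(B[j:j + d] in seen for j in range(len(B) - d + 1))


def which_lemma(n, d):
    """Threshold rule of the proof: d < n^(1/3) is equivalent to d^3 < n."""
    return "Lemma 3.1" if d ** 3 < n else "Lemma 3.2"


def lcs_length(A, B, verify=has_common_substring):
    """Binary search for the largest d accepted by the verifier."""
    lo, hi = 0, min(len(A), len(B))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if verify(A, B, mid):
            lo = mid       # a common substring of length mid exists
        else:
            hi = mid - 1   # none exists, so no longer one exists either
    return lo
```

The search makes $O(\log n)$ calls to the verifier, so it only adds a logarithmic factor that is absorbed by the $\tilde O$ notation.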

Quantum algorithm for approximate LCS
In the following, we show that if an approximate solution is desired then we can improve the runtime down to $\tilde O(n^{3/4})$. Similar to the above discussion, we use two algorithms for small $d$ and large $d$. Our algorithm for large $d$ is the same as the one we use for the exact solution. For small $d$ we modify the algorithm of Lemma 3.1 to improve its runtime down to $\tilde O(n^{2/3} d^{1/6})$. This is explained in Lemma 3.3.
Lemma 3.3. For any constant $0 < \epsilon < 1$, there exists a quantum algorithm that runs in time $\tilde O(n^{2/3} d^{1/6})$ and, if the two strings share a common substring of length $d$, finds a common substring of length $(1 - \epsilon)d$.
Proof. Similar to Lemma 3.1, we use element distinctness for this algorithm. We make two sets $S_A$ and $S_B$ and prove that they share two equal elements if and only if their corresponding substrings of size $(1 - \epsilon)d$ are equal. The difference between this algorithm and the algorithm of Lemma 3.1 is that here $S_A$ and $S_B$ contain $O(n/\sqrt{\epsilon d})$ elements instead of $n - d + 1$ elements. Let $k = \sqrt{\epsilon d}$. We break both strings into blocks of size $k$ and make the two sets in the following way (see Figure 7 for an illustration):
• For each $i$ such that $i \bmod k = 1$, we put element $(A, i)$ in set $S_A$.
• For each $i$ such that $\lceil i/k \rceil \bmod k = 1$, we put element $(B, i)$ in set $S_B$.

It follows that among the first $\epsilon d$ characters of the solution, there is one position which is included in $S_A$ such that its corresponding position in $B$ is also included in $S_B$. Thus, if we define the equality between two elements $(A, i)$ and $(B, j)$ as $A[i, i + (1 - \epsilon)d - 1] = B[j, j + (1 - \epsilon)d - 1]$, then the two sets contain a pair of equal elements whenever the two strings share a common substring of length $d$.

Lemmas 3.2 and 3.3 lead to a quantum algorithm for LCS with approximation factor $1 - \epsilon$ and runtime $\tilde O(n^{3/4})$.

obtain all the bits of $x_i$ and $y_i$ in $O(\log n)$ time. We set $S = \{1, \ldots, n\}$ and say that a state $(R_1, R_2) \in T_r$ is marked if there exists a pair $(i, j) \in R_1 \times R_2$ such that $x_i = y_j$. Easy calculations show that the fraction of marked states is $\Omega(r^2/n^2)$. By using an appropriate data structure that allows insertion, deletion and lookup operations in polylogarithmic time (see Section 6.2 of [Amb07] for details about how to construct such a data structure) to store the elements $x_i$ for all $i \in R_1$, and using another instance of this data structure to store the elements $y_j$ for all $j \in R_2$, we can perform the setup operation in time $s(r) = \tilde O(r)$ and perform each update operation in time $u(r) = \tilde O(1)$. Each checking operation can be implemented in time $c(r) = \tilde O(\sqrt{r})$ as follows: perform a Grover search over $R_1$ to check if there exists $i \in R_1$ for which $x_i = y_j$ for some $j \in R_2$ (once $i$ is fixed, the latter property can be checked in polylogarithmic time using the data structure representing $R_2$). From Proposition 4.1, the overall time complexity of the quantum walk is thus $\tilde O\big(s(r) + \frac{n}{r}(\sqrt{r} \cdot u(r) + c(r))\big) = \tilde O(r + n/\sqrt{r})$. By taking $r = n^{2/3}$ we get time complexity $\tilde O(n^{2/3})$.

As discussed in Section 6.1 of [Amb07], the above data structure, and thus the whole quantum walk, can also be implemented in the same way in the (weaker) comparison model: if given any pair of indices $(i, j) \in [n] \times [n]$ we can decide in $T(n)$ time whether $x_i < y_j$ or $x_i \ge y_j$, then the time complexity of the implementation is $\tilde O(n^{2/3} T(n))$.
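Returning to the sampling used in the proof of Lemma 3.3, the covering claim (some position among the first $\epsilon d = k^2$ characters of the solution is sampled in $A$ while its counterpart in $B$ is sampled too) can be checked exhaustively for small $k$. The sketch below uses 1-indexed positions as in the text; the function names are hypothetical, and the membership tests are written as `(value - 1) % k == 0`, which agrees with "$\bmod\ k = 1$" for $k \ge 2$ and also behaves sensibly for $k = 1$.

```python
def in_SA(i, k):
    """Position i (1-indexed) is sampled in A when i mod k = 1."""
    return (i - 1) % k == 0


def in_SB(j, k):
    """Position j (1-indexed) is sampled in B when ceil(j / k) mod k = 1,
    i.e. j lies in every k-th block of size k."""
    block_index = -(-j // k)  # integer ceiling of j / k
    return (block_index - 1) % k == 0


def covered(i0, j0, k):
    """True if some offset t < k*k has A-position i0 + t and B-position
    j0 + t both sampled, for a solution starting at (i0, j0)."""
    return any(in_SA(i0 + t, k) and in_SB(j0 + t, k) for t in range(k * k))
```

The reason it works: the sampled $A$-positions inside the window are spaced $k$ apart, so their counterparts in $B$ visit $k$ consecutive blocks, exactly one of which is a sampled block.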

Quantum algorithm for exact LCS of non-repetitive strings
We set $S = \{1, \ldots, n - d + 1\}$. Let us write $\alpha = 54 \log n$. For any state $(R_1, R_2) \in T_r$ we define the data structure $D(R_1, R_2)$ as follows:
• $D(R_1, R_2)$ records all the values $A[i]$ and $B[j]$ for all $i \in R_1$ and $j \in R_2$ using the same data structure as the data structure considered above for element distinctness;
• additionally, $D(R_1, R_2)$ records the number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$, and stores explicitly all these pairs (using a history-independent data structure updatable in polylogarithmic time, which can be constructed based on the data structure from [Amb07]).

We define the set of marked states $T^*_r$ as follows. We say that $(R_1, R_2)$ is marked if the following two conditions hold:
(i) there exists a pair $(i, j) \in R_1 \times R_2$ such that $A[i, i + d - 1] = B[j, j + d - 1]$;
(ii) the number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ is at most $\alpha(r^2/n + 1)$.

The following lemma is easy to show.
Lemma 4.2. If the two non-repetitive strings $A$ and $B$ have a common substring of length $d$, then the fraction of marked states is $\Omega(r^2/n^2)$.
Proof. The fraction of states $(R_1, R_2)$ for which there exists a pair $(i, j) \in R_1 \times R_2$ such that $A[i, i + d - 1] = B[j, j + d - 1]$, i.e., for which Condition (i) holds, is $\Omega(r^2/n^2)$. Let $X$ be the random variable representing the number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ when $(R_1, R_2)$ is taken uniformly at random in $T_r$. We will show that $X \le \alpha(r^2/n + 1)$ with high probability. Let us prove this upper bound for the worst possible case: input strings for which the number of pairs $(i, j) \in S \times S$ such that $A[i] = B[j]$ is as large as possible. Assume that we have fixed $R_1$. For each $i \in R_1$ there exists a unique index $j \in S$ such that $A[i] = B[j]$. Let $Y$ be the random variable representing the total number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ when choosing $R_2$. Note that $Y$ follows a hypergeometric distribution of mean $r^2/n$. From standard extensions of Chernoff's bound (see, e.g., Section 1.6 of [DP09]), we get $\Pr[Y \ge \alpha(r^2/n + 1)] \le 1/n^3$. Thus the fraction of $(R_1, R_2)$ such that Condition (ii) does not hold is at most $1/n^3$. The statement of the lemma then follows from the union bound.
Let us now analyze the costs corresponding to this quantum walk. The setup and update operations are similar to the corresponding operations of the quantum walk for element distinctness described in Section 4.1. The only difference is that we need to keep track of the pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$. Note that for each $i \in [n]$, there is at most one index $j \in R_2$ such that $A[i] = B[j]$, since the strings are non-repetitive. Moreover, this index can be found in polylogarithmic time using the data structure associated with $R_2$. Similarly, for each $j \in [n]$, there is at most one index $i \in R_1$ such that $A[i] = B[j]$, which can also be found in polylogarithmic time. Thus the time complexities of the setup and update operations are $s(r) = \tilde O(r)$ and $u(r) = \tilde O(1)$, respectively, as in the quantum walk for element distinctness described in Section 4.1. The checking operation first checks if the number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ is at most $\alpha(r^2/n + 1)$ and then, if this condition is satisfied, performs a Grover search on all pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$, in order to check whether one of them satisfies $A[i, i + d - 1] = B[j, j + d - 1]$. Its cost is $c(r) = \tilde O\big(\sqrt{(r^2/n + 1)\, d}\big)$.

Using Proposition 4.1, we get that the overall time complexity of the quantum walk is $\tilde O\big(r + \frac{n}{r}(\sqrt{r} + \sqrt{(r^2/n + 1)\, d})\big)$, which for $r \ge \sqrt{n}$ is $\tilde O(r + n/\sqrt{r} + \sqrt{nd})$. For $d \le n^{1/3}$, this expression is minimized for $r = n^{2/3}$ and gives complexity $\tilde O(n^{2/3})$. For $d \ge n^{1/3}$, the complexity is dominated by the last term $\tilde O(\sqrt{nd})$. By combining the above algorithm with the $\tilde O(n/\sqrt{d})$-time algorithm of Lemma 3.2, we get overall complexity $\tilde O(n^{3/4})$.
Theorem 2.4 (repeated). The longest common substring of two non-repetitive strings of size $n$ can be computed with high probability by a quantum algorithm in time $\tilde O(n^{3/4})$.
Proof. We do a binary search on $d$. To verify a given $d$, we consider two cases: if $d < \sqrt{n}$ we run the $\tilde O(\sqrt{nd})$-time quantum algorithm we just described, and otherwise we run the algorithm of Lemma 3.2. Both bounds are at most $\tilde O(n^{3/4})$ in their respective ranges and coincide at $d = \sqrt{n}$. Thus, the overall runtime is bounded by $\tilde O(n^{3/4})$.
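The choice of the threshold $d = \sqrt{n}$ can be verified directly from the two running times named in the proof:

```latex
\[
  d \le \sqrt{n}:\quad \sqrt{nd} \;\le\; \sqrt{n \cdot \sqrt{n}} \;=\; n^{3/4},
  \qquad\qquad
  d \ge \sqrt{n}:\quad \frac{n}{\sqrt{d}} \;\le\; \frac{n}{n^{1/4}} \;=\; n^{3/4}.
\]
```

The first bound increases with $d$ and the second decreases with $d$, so the crossover point $d = \sqrt{n}$ is where the worst case over all $d$ is smallest.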

Quantum algorithm for approximate LCS of non-repetitive strings
We set again $S = \{1, \ldots, n - d + 1\}$ and, for any $(R_1, R_2) \in T_r$, we define the data structure $D(R_1, R_2)$ exactly as above. This time, we say that $(R_1, R_2)$ is marked if the following two conditions hold:
(i) there exists a pair $(i, j) \in R_1 \times R_2$ such that $A[i, i + \lceil (1 - \epsilon)d \rceil - 1] = B[j, j + \lceil (1 - \epsilon)d \rceil - 1]$;
(ii) the number of pairs $(i, j) \in R_1 \times R_2$ such that $A[i] = B[j]$ is at most $\alpha(r^2/n + 1)$,
where we use the same value $\alpha = 54 \log n$ as above. We now show the following lemma.
Lemma 4.3. Assume that $r \le n/d$. If $A$ and $B$ have a common substring of length $d$, then the fraction of marked states is $\Omega(dr^2/n^2)$.
Proof. Let us write $m = d - \lceil (1 - \epsilon)d \rceil$ and say that a pair $(i, j) \in S \times S$ is good if $A[i, i + \lceil (1 - \epsilon)d \rceil - 1] = B[j, j + \lceil (1 - \epsilon)d \rceil - 1]$. There are at least $m$ good pairs in $S \times S$. The probability that $R_1$ does not contain any element involved in a good pair is at most $(1 - m/n)^r \le \exp(-rm/n) \le 1 - rm/(2n)$, where we used the fact that $\exp(-x) \le 1 - x/2$ on $x \in [0, 1]$ to derive the last inequality (note that $rm/n \le 1$ since $r \le n/d$). Thus with probability at least $rm/(2n)$, the set $R_1$ contains at least one element involved in a good pair. Assuming that $R_1$ contains at least one element involved in a good pair, when choosing $R_2$ at random, with probability at least $r/n$ it contains a good pair. Thus the overall probability that $R_1 \times R_2$ contains a good pair (and thus satisfies Condition (i)) is $\Omega(dr^2/n^2)$.
The fraction of $(R_1, R_2)$ such that Condition (ii) does not hold is at most $1/n^3$ (the proof of this claim is exactly the same as for Lemma 4.2). The statement of the lemma then follows from the union bound.
We thus get a quantum walk with running time $\tilde O\big(r + \frac{n}{r\sqrt{d}}(\sqrt{r} + \sqrt{r^2 d/n})\big) = \tilde O(r + n/\sqrt{rd} + \sqrt{n})$. If $d \le \sqrt{n}$ then the expression is minimized for $r = n^{2/3}/d^{1/3}$ and the complexity is $\tilde O(n^{2/3}/d^{1/3})$. (Note that with this value of $r$ the condition of Lemma 4.3 is satisfied.)

Let $S$ be the substring of length $d$ that spans the interval $[r, r + d - 1]$ of the input string. The main strategy of our checking procedure is to identify positions of $S$ that are candidates for the center of the solution. For this purpose we also define a pattern $P$ of length $\lceil d/2 \rceil$ which is equal to the reverse of the interval $[r, r + \lceil d/2 \rceil - 1]$. We say a position $i$ of $S$ matches with pattern $P$ if $S[i, i + |P| - 1] = P$. If $r$ is marked, then it is in the left half of some palindrome substring $A[\ell, \ell + d - 1]$. Moreover, the size of $S$ is equal to $d$ and, by the way we construct $P$, it is equal to the reverse of $A[r, r + \lceil d/2 \rceil - 1]$. Thus, $A[2\ell + d - \lceil d/2 \rceil - r, 2\ell + d - r - 1]$ is equal to $P$ and completely lies inside $S$. Thus, by finding all positions of $S$ that match with $P$, we definitely find $A[2\ell + d - \lceil d/2 \rceil - r, 2\ell + d - r - 1]$. In particular, if there is only one position of $S$ that matches with $P$, this enables us to identify the center of the palindrome $A[\ell, \ell + d - 1]$.

In Section 5.2.1 below we show how to identify efficiently all the positions of $S$ that match with pattern $P$. These positions give a list of candidates for the center of a solution. At least one of them is the center of a solution. If there is only one candidate (or a constant number of candidates), we can then via Grover's search determine in time $\tilde O(\sqrt{d})$ whether the size of the corresponding palindrome substring is $d$ or not. In Section 5.2.2, we explain how to deal with the case where there are many candidates for the center.

Identifying the patterns
Below we describe how to identify all positions of $S$ that match with pattern $P$ by considering the periodicity of $P$. We show that this can be done in time $\tilde O(\sqrt{d})$. To do so, we first find the rightmost occurrence of $P$ in $S$ using the quantum algorithm of Ramesh and Vinay [RV03]. This takes time $\tilde O(\sqrt{d})$ since $|S| + |P| = O(d)$. If $P$ does not appear in $S$ at all or only once, we are done. Otherwise, let $o_r$ be the rightmost position of $S$ that matches with $P$. That is, $S[o_r, o_r + |P| - 1] = P$. Moreover, let $o$ be the position of the second rightmost occurrence of $P$ in $S$. We argue that $T = \{o_r - i(o_r - o) : i \ge 0,\ o_r - i(o_r - o) \ge 1\}$ is the set of all occurrences of $P$ in $S$. More precisely, every element of $T$ is the starting position of an occurrence of $P$ in $S$. Since $|S| \le 2|P|$, there is one special case in which $o = o_r - |P|$. In this case $T$ contains exactly two elements and the proof is trivial. Thus, we assume in the following that the two rightmost occurrences of $P$ in $S$ overlap.

Remember that we say a string $s$ is $q$-periodic if we have $s_i = s_{i+q}$ for all $1 \le i \le |s| - q$. Since $P$ appears as a substring of $S$ at both locations $o_r$ and $o$, the pattern $P$ is $(o_r - o)$-periodic. Moreover, $P$ is not $i$-periodic for any $1 < i < o_r - o$, since otherwise $o$ would not be the position of the second rightmost occurrence of $P$ in $S$. We first prove that each element of $T$ is a position in $S$ that matches with $P$. Recall that by the way we construct $S$ and $P$, $P$ is equal to the reverse of the first $|P|$ characters of $S$. Moreover, $|S| \le 2|P|$ and thus the entire interval $S[1, o_r + |P| - 1]$ is $(o_r - o)$-periodic (we refer to Figure 9 for an illustration). This implies that every element of $T$ is the beginning of one occurrence of $P$ in $S$.
The green substring is equal to pattern P and the orange substring is the reverse of P .
In order to show that T contains all occurrences of P in S, assume for the sake of contradiction that there is a position p of S that is the beginning of one occurrence of P but p is not included in T .This implies that there is one element e ∈ T such that |e − p| < o r − o.Thus, with the same argument P is |e − p|-periodic.This is in contradiction with what we proved previously.
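The structure established here (all occurrences of $P$ in $S$ are equally spaced, with spacing equal to a period of $P$) can be observed on a small instance. The sketch below is a classical illustration with naive scans and 0-indexed positions; the helper names are hypothetical.

```python
def occurrences(text, pattern):
    """All 0-indexed start positions of pattern in text (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def smallest_period(s):
    """Smallest q >= 1 such that s[i] == s[i + q] for all valid i."""
    return next(q for q in range(1, len(s) + 1)
                if all(s[i] == s[i + q] for i in range(len(s) - q)))


def is_arithmetic(positions, step):
    """True if consecutive positions always differ by exactly `step`."""
    return all(b - a == step for a, b in zip(positions, positions[1:]))
```

For example, with $S$ = `"ababababa"` ($d = 9$) the pattern is the reverse of the first $\lceil d/2 \rceil = 5$ characters, `"ababa"`. It occurs at offsets 0, 2 and 4, and the common gap 2 equals the smallest period of the pattern.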

Checking the patterns
We have shown that the set $T$ computed in Section 5.2.1 is the set of the starting positions of each occurrence of $P$ in $S$. We define $C = \{r + (e + |P| - 2)/2 : e \in T\}$, which is the set of all possible centers for the solution. Since we consider all occurrences of $P$ in $S$, one of the elements in $C$ is equal to the actual center of our solution. If $|C| = O(1)$, one can iterate over all elements of $C$ and verify in time $O(\sqrt{d})$ if each one is a center for a palindrome substring of size $d$. In the following, we show that even if $|C|$ is large, we can narrow down the search to a constant number of elements in $C$, which makes it possible to find efficiently a palindrome substring of length $d$.
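The candidate-center idea can be illustrated with a classical sketch that tries every occurrence of $P$ in $S$ (so it corresponds to the easy case $|C| = O(1)$, without the periodicity shortcut). It uses 0-indexed positions, for which an occurrence at offset $e$ in $S$ gives a palindrome start $\ell$ with $2\ell = 2r + e + |P| - d$; this index arithmetic and the function name are assumptions made for the illustration, and the direct palindrome comparison stands in for Grover's search.

```python
from math import ceil


def find_palindrome_from_anchor(A, r, d):
    """Given an anchor r (0-indexed) assumed to lie in the left half of a
    palindrome of length d, return the start of such a palindrome, or None."""
    S = A[r:r + d]
    P = A[r:r + ceil(d / 2)][::-1]  # reverse of the first ceil(d/2) chars of S
    for e in range(len(S) - len(P) + 1):
        if S[e:e + len(P)] != P:
            continue
        # The palindrome maps A[r] to the last character of this occurrence,
        # A[r + e + |P| - 1], which fixes its center and hence its start.
        twice_left = 2 * r + e + len(P) - d
        if twice_left < 0 or twice_left % 2:
            continue
        left = twice_left // 2
        candidate = A[left:left + d]
        if len(candidate) == d and candidate == candidate[::-1]:
            return left
    return None
```

On `"xxabcdedcbayy"` with $d = 9$, both anchors $r = 2$ and $r = 3$ lead to the palindrome `"abcdedcba"` starting at index 2.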
We showed in Section 5.2.1 that $C$ is large only if $P$ is periodic with small periodicity. Let $\alpha$ be the periodicity of $P$. As discussed in Section 5.2.1, every consecutive pair of elements in $T$ have distance $\alpha$. Thus, each pair of consecutive elements in $C$ have distance $\alpha/2$. Let $y$ be the largest index of $A$ such that $A[r, y]$ is $\alpha$-periodic. Also, let $x$ be the smallest index of $A$ such that $A[x, y]$ is $\alpha$-periodic. We prove two things in the following: 1) If $y - x + 1 \ge d + \alpha$ then we already have a palindrome substring of size $d$ in the interval $[x, y]$ that can easily be found in time $\tilde O(\sqrt{d})$; 2) Otherwise the center of our solution has a distance of at most $2\alpha$ from $(x + y)/2$, and thus we only need to consider a constant number of elements in $C$, which enables us to solve the problem in time $\tilde O(\sqrt{d})$ as well.
Here is the proof of the first statement.
Proof. Otherwise, instead of finding $x$ and $y$, we extend $r$ from both ends up to a distance of $2d$ so long as it remains $\alpha$-periodic. This can be done in time $\tilde O(\sqrt{d})$ and then with the same analysis we can find a palindrome substring of size at least $d$.
The proof of the second statement relies on the following lemma.
Lemma 5.2. If $y - x + 1 < d + \alpha$ then the center of the solution cannot be more than $2\alpha$ away from $(x + y)/2$.
Proof. We denote the optimal solution by $A[\ell, \ell + d - 1]$. Let $c = \ell + (d - 1)/2$ be the center of the solution. Assume for the sake of contradiction that $c < (x + y)/2 - 2\alpha$. Together with $y - x + 1 < d + \alpha$, this means that $\ell < x - \alpha$. Thus, the interval $A[x - \alpha, x - 1]$ is exactly equal to the interval $A[x, x + \alpha - 1]$ and thus $A[x - \alpha, y]$ is also $\alpha$-periodic. This is in contradiction with the maximality of $[x, y]$. A similar proof holds for the case that $c > (x + y)/2 + 2\alpha$.

Lemma 5.2 implies that we only need to consider the candidates in $C$ that are within the range $[(x + y)/2 - 2\alpha, (x + y)/2 + 2\alpha]$. Moreover, the distance between every pair of consecutive candidates in $C$ is $\alpha/2$, and therefore we only need to consider a constant number of elements. Therefore, we can find the solution in time $\tilde O(\sqrt{d})$.

(1 + ǫ)-Approximation of the Ulam distance
In this section we prove the following theorem.
Theorem 2.6 (repeated).For any constant ǫ > 0, there exists a quantum algorithm that computes with high probability a (1+ǫ)-approximation of the Ulam distance of two non-repetitive strings in time Õ( √ n).

Classical indicator for the Ulam distance
Naumovitz, Saks and Seshadhri [NSS17] showed how to construct, for any constant ǫ > 0, a classical algorithm that computes a (1 + ǫ)-approximation of the Ulam distance of two non-repetitive strings A, B of length n in time Õ(n/ud(A, B) + √ n).Their algorithm is complex: it consists of nine procedures that form a hierarchy of gap tests and estimators, each with successively better run time.The core technique, which lies at the lowest layer of the hierarchy and gives a very good -but slow -estimation of the Ulam distance, is a variant of the Saks-Seshadhri algorithm for estimating the longest increasing sequence from [SS10].
For our purpose, we will only need the following result from [NSS17].^8

Proposition 6.1 ([NSS17]). Let $\delta > 0$ be any constant. For any two non-repetitive strings $A, B$ of length $n$ and any integer parameter $t' \ge c \cdot ud(A, B)$ for some constant $c$ depending only on $\delta$, there exists a $\tilde O(\sqrt{t'})$-time classical algorithm UlamIndic$(A, B, \delta, t')$ that outputs 1 with some probability $p$ and 0 with probability $(1 - p)$, for some probability $p$ such that

^8 The precise statement of this result appears in Lemma 8.5 and Table 1 in [NSS17]. The procedure is denoted XLI2 in [NSS17]. Note that the original statement is actually more general: it considers strings $A, B$ of different lengths and gives an indicator for a slightly different quantity (called Xloss$(A, B)$ in [NSS17]). When $A$ and $B$ have the same length we have Xloss$(A, B) = \frac{1}{2} ud(A, B)$, which gives the statement of Proposition 6.1.
The procedure of Proposition 6.1 lies at one of the lowest layers of the hierarchy in [NSS17], and works by applying the Saks-Seshadhri algorithm from [SS10] on randomly chosen substrings of A and B (of size roughly t).9

Quantum algorithm for the Ulam distance
To prove Theorem 2.6, the basic idea is to apply quantum amplitude estimation on the classical algorithm UlamIndic from Proposition 6.1. Let us explain this strategy more precisely and show the technical difficulties we need to overcome. Assume that we know that $ud(A, B)$ is in the interval $[D_1, D_2]$ for some values $D_1$ and $D_2$. We can then use UlamIndic$(A, B, \delta, t)$ with $t = c \cdot D_2$, and apply the quantum amplitude estimation algorithm of Theorem 6.2 to estimate the probability it outputs 1. If we use $k = \Theta(\sqrt{(n + t)/D_1}) = \Theta(\sqrt{n/D_1})$ in Theorem 6.2 and $\delta$ small enough, we will get a good approximation of the quantity $ud(A, B)/(n + t)$, and thus a good approximation of $ud(A, B)$. The complexity of this strategy is $\tilde O(\sqrt{n/D_1} \cdot \sqrt{D_2}) = \tilde O(\sqrt{n D_2/D_1})$.

The main issue is that we do not know such tight upper and lower bounds on $ud(A, B)$. Concerning the upper bound, we overcome this difficulty by simply successively trying $D_2 = n$, $D_2 = (1 - \eta)n$, $D_2 = (1 - \eta)^2 n$, ... for some small constant $\eta$ (for technical reasons we actually start from a value of $D_2$ proportional to $\sqrt{n}/c$, and deal with the case of larger $D_2$ using the classical algorithm from [NSS17]). For the lower bound, on the other hand, we cannot simply start with $D_1 = 1$ and iteratively increase this value, since the cost would be too high: in order to achieve an overall running time of $\tilde O(\sqrt{n})$ we need to keep $D_1 \approx D_2$. Instead of estimating the probability that UlamIndic outputs 1 using Theorem 6.2, which is too costly, we thus design a gap test (see Proposition 6.3 below) that enables us to check if this probability is larger than $D_2$ or smaller than $(1 - \eta)D_2$ much more efficiently, in time $\tilde O(\sqrt{n/D_2})$.

We now present the details of our quantum algorithm. Let us first present the gap test. This test relies on quantum amplitude estimation. Here is the precise statement of quantum amplitude estimation that we will use.

Theorem 6.2 (Theorem 12 in [BHMT02]). Let $\mathcal{A}$ be a classical algorithm that runs in time $T$, outputs 1 with probability $p$ and outputs 0 with probability $1 - p$, for some (unknown) value $p \in [0, 1]$. For any integer $k \ge 1$, there exists a quantum algorithm that runs in time $\tilde O(kT)$ and outputs with probability at least $8/\pi^2$ an estimate $\tilde p$ such that $|\tilde p - p| \le 2\pi \frac{\sqrt{p(1 - p)}}{k} + \frac{\pi^2}{k^2}$.

The gap test is described in the following proposition.
Proposition 6.3. Let $\mathcal{A}$ be a classical algorithm that runs in time $T$, outputs 1 with probability $p$ and outputs 0 with probability $1 - p$, for some (unknown) value $p$. For any $q \in (0, 1]$ and any constant $\eta \in (0, 1]$, there exists a quantum algorithm denoted QTest$(\mathcal{A}, q, \eta)$ that runs in time $\tilde O(T/\sqrt{q})$ and with probability at least $1 - 1/\mathrm{poly}(n)$ outputs LARGE if $p \ge q$ and SMALL if $p \le (1 - \eta)q$.
Proof. Figure 12 describes our main quantum gap test. The complexity of this test is $\tilde O(T/\sqrt{q})$, from Theorem 6.2. We show below that its success probability is at least $8/\pi^2$. The success probability can then be increased to $1 - 1/\mathrm{poly}(n)$ by repeating the test $\Theta(\log n)$ times and using majority voting.
1. Apply the algorithm of Theorem 6.2 with $k = \frac{20}{\eta \sqrt{q}}$. Let $\tilde p$ denote the output.
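To see why $k = 20/(\eta\sqrt{q})$ is enough to separate the two cases, one can plug it into the amplitude estimation guarantee of [BHMT02, Theorem 12], $|\tilde p - p| \le 2\pi\sqrt{p(1-p)}/k + \pi^2/k^2$. The remaining steps of the test are not reproduced above, so the decision threshold $(1 - \eta/2)q$ used in this sketch is an assumption and not necessarily the one used by the authors.

```latex
\begin{align*}
  |\tilde p - p|
    &\le \frac{2\pi\sqrt{p}}{k} + \frac{\pi^2}{k^2}
     = \frac{\pi}{10}\,\eta\sqrt{pq} + \frac{\pi^2}{400}\,\eta^2 q . \\[4pt]
  p \le (1-\eta)q:\quad
  \tilde p
    &\le (1-\eta)q + \frac{\pi}{10}\,\eta q + \frac{\pi^2}{400}\,\eta^2 q
     < (1 - \eta + 0.34\,\eta)\,q
     < \Bigl(1 - \frac{\eta}{2}\Bigr) q . \\[4pt]
  p \ge q:\quad
  \tilde p
    &\ge p - \frac{\pi}{10}\,\eta\sqrt{pq} - \frac{\pi^2}{400}\,\eta^2 q
     \ge \Bigl(1 - \frac{\pi}{10}\,\eta - \frac{\pi^2}{400}\,\eta^2\Bigr) q
     > \Bigl(1 - \frac{\eta}{2}\Bigr) q .
\end{align*}
```

The second case uses that $p - \frac{\pi}{10}\eta\sqrt{pq}$ is increasing in $p$ for $p \ge q$, so its minimum is attained at $p = q$. Both cases use $\eta^2 \le \eta$ together with $\pi/10 + \pi^2/400 < 0.34$. Under the assumed threshold, outputting LARGE when $\tilde p \ge (1 - \eta/2)q$ and SMALL otherwise is correct whenever the estimate satisfies the guarantee, which happens with probability at least $8/\pi^2$.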
We are now ready to give the proof of Theorem 2.6.
Proof of Theorem 2.6: If $ud(A, B) > \frac{1 - \epsilon}{c}\sqrt{n}$, where $c$ is the constant from Proposition 6.1, then $ud(A, B)$ can already be computed in $\tilde O(\sqrt{n})$ time by the classical algorithm from [NSS17]. We thus assume below that $ud(A, B) \le \frac{1 - \epsilon}{c}\sqrt{n}$. We also assume that $ud(A, B) > 0$, since otherwise the two strings are identical, which can be checked in $\tilde O(\sqrt{n})$ time using Grover search.
2. For i from 1 to r do: 2.2.Apply Algorithm QTest(A, q, η) with A = UlamIndic(A, B, δ, c 1−ǫ t).2.3.If the output is LARGE, then stop and return t.Then observe that for r = log((1−δ)( holds, since we are assuming that ud(A, B) ≥ 1.From Proposition 6.3 combined with (3), this means that with probability at least 1 − 1/poly(n), the algorithm will never reach Step 3 (and thus does not output ERROR).In the remaining of the proof, we assume that the algorithm stops before reaching Step 3. Let i * denote the value of i during the last iteration of the loop of Step 2, and write t √ n and q * = t * /n.Observe that the output of the algorithm is t * .
Assume first that i * = 1.The output is thus t Observe that Algorithm QTest then necessarily outputted LARGE for i = 1.Proposition 6.3 guarantees that with probability at least 1− 1/poly(n) the inequality p t * ≥ (1− η)q * holds (otherwise it would have outputted SMALL).This inequality combined with (3) implies that and thus the output t * = nq * of the algorithm satisfies the following inequalities: where we used the inequality 1/(1 − x) ≤ 1 + 2x, which holds for any x ∈ [0, 1/2].Also note that the inequality is trivially satisfied since we are assuming ud(A, B) ≤ 1−ǫ c √ n and η < ǫ.The output is thus a (1 + ǫ)-approximation of ud(A, B).Assume now that i * ≥ 2. Observe that Algorithm QTest then necessarily outputted LARGE for i = i * and SMALL for i = i * − 1.Since it outputted LARGE for i = i * , Inequalities (4) hold with probability at least 1 − 1/poly(n), from exactly the same argument as above.Since the output was SMALL for i = i * − 1, Proposition 6.3 guarantees that with probability at least 1 − 1/poly(n) the inequality p t * /(1−η) ≤ q * /(1 − η) holds (otherwise it would have outputted LARGE).Since t * = nq * , this inequality combined with (3) implies: The output of the algorithm is thus a (1 + ǫ)-approximation of ud(A, B) as well.
Since the complexity of applying Algorithm UlamIndic at Step i is Õ( (1 − η) i √ n), the overall complexity of the algorithm is as claimed.

Lower bounds
In this section we prove lower bounds for the problems considered in this paper. We start with an easy lower bound for the longest common substring over a large alphabet.
Theorem 7.1. For any constant $c \in (0, 1]$, any quantum algorithm that computes with high probability a $c$-approximation of the longest common substring of two strings of length $n$ over an alphabet of size $2n$ requires $\Omega(n^{2/3})$ time. This lower bound also holds for non-repetitive strings.
Proof. Consider the following version of the element distinctness problem: given a list $L$ of $m$ characters in an alphabet of size $m$ such that either all the characters of $L$ are distinct, or only one character occurs twice in $L$ (i.e., the other $m - 2$ characters are distinct), decide which of the two cases holds. An $\Omega(m^{2/3})$-query lower bound is known for this problem in the quantum setting [AS04, Amb05, Kut05]. Let us take $m = 2n$. Construct a string $A$ of length $n$ by taking $n$ elements from $L$ uniformly at random, and construct another string $B$ of length $n$ using the remaining $n$ elements from $L$.
Observe that if all the characters in $L$ are distinct then $A$ and $B$ are non-repetitive and have no common substring. On the other hand, if the characters in $L$ are not all distinct, then with probability at least 1/2 the two strings $A$ and $B$ are non-repetitive and have a common substring of length 1. This gives a (randomized) reduction from the element distinctness problem to our problem, since a $c$-approximation of the longest common substring enables us to distinguish between the two cases.
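The reduction can be made concrete with a small classical sketch. It splits a list into two halves uniformly at random and checks the two observations of the proof: a list with distinct entries never yields a common character, and a list with one repeated entry yields a common character exactly when the two copies land in different halves (equivalently, when both halves are non-repetitive). The function names and the use of Python lists for strings are assumptions made for illustration.

```python
import random


def split_list(L, rng):
    """Split L into two halves A and B by a uniformly random partition."""
    order = list(range(len(L)))
    rng.shuffle(order)
    half = len(L) // 2
    A = [L[i] for i in order[:half]]
    B = [L[i] for i in order[half:]]
    return A, B


def has_common_character(A, B):
    """True iff A and B share a common substring of length 1."""
    return bool(set(A) & set(B))


def is_non_repetitive(X):
    """True iff no character occurs twice in X."""
    return len(set(X)) == len(X)
```

When one character occurs twice in a list of length $2n$, the two copies are separated with probability $n/(2n - 1) > 1/2$, which matches the success probability stated in the proof.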
We now present a lower bound for LCS that holds even for binary strings.
Theorem 7.2. For any constant $c \in (0, 1]$, any quantum algorithm that computes with high probability a $c$-approximation of the longest common substring of two binary strings of length $n$ requires $\tilde\Omega(n^{2/3})$ time.
Proof. The main idea is simple: by dividing the input strings into blocks of size $\Theta(\log n)$, we can reduce the case with alphabet of size $2n$ to the case of binary alphabets, and thus use the lower bound of Theorem 7.1 with only a logarithmic overhead.
We now give more details of the reduction. Let $A$ and $B$ denote strings of length $n$ over an alphabet of size $2n$. Each character of the alphabet is encoded by a random binary string of length $s$ (i.e., a binary string of length $s$ such that each bit is 1 with probability 1/2). Using easy arguments from probability theory (see, e.g., Section 2 of [AW85]), for any constant $\alpha > 0$ we can guarantee that the following property holds with probability at least 9/10 if we take $s \ge d_\alpha \cdot \log n$ for some constant $d_\alpha$ that depends only on $\alpha$: the length of the longest common substring of the encodings of any two distinct characters is at most $\alpha s$. Let us choose $\alpha = c/3$ and consider $s = \lceil d_\alpha \cdot \log n \rceil$. Below we assume that the above property holds (which happens with probability at least 9/10).
If the longest common substring of $A$ and $B$ has length zero then the longest common substring of their binary versions has length at most $2\alpha s = 2cs/3 < cs$. If the longest common substring of $A$ and $B$ has length at least one, on the other hand, then the longest common substring of their binary versions has length at least $s$. Thus a $c$-approximation of the longest common substring of the binary versions enables us to distinguish, with high probability, between the two cases.
We now give our lower bounds for LPS and for the computation of the Ulam distance.
Theorem 7.3.For any constant c ∈ (0, 1], any quantum algorithm that computes with high probability a c-approximation of the longest palindrome of a binary string of length n requires Ω( √ n) time.
Proof. Let $m \ge 3$ be an integer. Let $S_1 \subseteq \{0, 1\}^m$ denote the set of all $m$-bit strings of Hamming weight one in which the first and last characters are both zero, and let $S_0 \subseteq \{0, 1\}^m$ be the set containing only the all-zero string. Distinguishing between strings in $S_0$ and $S_1$ requires $\Omega(\sqrt{m})$ queries in the quantum setting (see, e.g., [BBBV97]). Let us write $k = \lceil 3/c \rceil$. Given a string $x \in S_0 \cup S_1$ and any $r \in \{1, \ldots, k\}$, let $x_r$ be the binary string of length $km$ obtained from $x$ by replacing each 0 in $x$ by $0^k$ and each 1 in $x$ by $1^r 0^{k-r}$.
Take $n = k^2 m$. We now consider the string $A \in \{0, 1\}^n$ obtained by concatenating the strings $x_1, x_2, \ldots, x_k$. Each $x_i$, for $i \in \{1, \ldots, k\}$, is called a block of $A$. Observe that if $x \in S_0$ (i.e., if $x$ is the all-zero string), then $A$ is also the all-zero string. In this case the length of the LPS of $A$ is $n$. On the other hand, if $x \in S_1$ then no palindrome of $A$ can include two full blocks (since the numbers of repeated 1s in distinct blocks do not match), and thus the length of the LPS of $A$ is at most $km + 2(km - 1) \le 3km < cn$. Thus computing a $c$-approximation of the LPS of $A$ enables us to distinguish between strings in $S_0$ and $S_1$. This gives the lower bound $\Omega(\sqrt{m}) = \Omega(\sqrt{n})$ on the complexity of computing a $c$-approximation of the LPS.
Proof. Let us consider the alphabet $\{1, 2, \ldots, n\}$. Let $A$ be the all-increasing string, i.e., $A = 123 \cdots n$. Let $B$ be either the all-increasing string or the string obtained by permuting the $\ell$-th position and the $(\ell + 1)$-th position of the all-increasing string for some unknown $\ell \in \{1, \ldots, n - 1\}$. Note that in the former case the Ulam distance of $A$ and $B$ is zero, while in the latter case the Ulam distance is two. Computing a $(1 + \epsilon)$-factor approximation of the Ulam distance of $A$ and $B$ thus requires distinguishing between the two cases, which requires $\Omega(\sqrt{n})$ queries from the lower bound on Grover search [BBBV97].
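The construction in this proof is easy to instantiate. The sketch below builds $A = x_1 x_2 \cdots x_k$ from $x$ and checks the two claims on a small example with a brute-force longest-palindrome routine (center expansion, quadratic time); the function names are hypothetical and the routine is only a classical checker, not part of the lower-bound argument.

```python
def block(x, r, k):
    """x_r: replace each 0 of x by 0^k and each 1 of x by 1^r 0^(k-r)."""
    return "".join("0" * k if ch == "0" else "1" * r + "0" * (k - r) for ch in x)


def build_instance(x, k):
    """Concatenate the blocks x_1, x_2, ..., x_k."""
    return "".join(block(x, r, k) for r in range(1, k + 1))


def longest_palindrome(s):
    """Length of the longest palindromic substring, by expanding around
    each of the 2|s| - 1 possible centers."""
    best = 0
    for center in range(2 * len(s) - 1):
        lo, hi = center // 2, (center + 1) // 2
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        best = max(best, hi - lo - 1)
    return best
```

With $c = 1$ (so $k = 3$) and $m = 5$, the all-zero input gives a palindrome of full length $n = 45$, while the input `"00100"` gives a longest palindrome well below the bound $km + 2(km - 1) = 43$.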

Figure 1: When $P$ appears in $B$ only once.

Figure 2: When $P$ appears several times in $B$.

Figure 4: An example of a good pair $(P, Q)$. The orange part of $S$ shows the part of $S$ that matches $P$.

Figure 5: If $P$ matches $S$ in only one position, by extending the two ends in the two strings we can find a common substring of length $d$. The orange part of $S$ shows the part of $S$ that matches $P$. The red and green parts are the extensions from the left and right of the matched parts.
$\sqrt{d}$) via Grover's search. Let us denote the resulting substrings by $A[\alpha, \beta]$ and $B[\alpha', \beta']$. The following observation enables us to find the solution in time $\tilde O(\sqrt{d})$: one of the following three cases holds for the optimal solution.

• An optimal solution when $x - \alpha \ge y - \alpha'$: the rightmost matching in the interval $B[\alpha', \alpha' + |P| + (x - \alpha)]$, or
• An optimal solution when $x - \alpha \le y - \alpha'$: the leftmost matching in the interval $B[\alpha' + (x - \alpha), \beta']$.

Each matching can be found in time $\tilde O(\sqrt{d})$ and, similar to the ideas explained above, we can extend each matching to verify if it gives us a common substring of size $d$ in time Õ(

Figure 6: The three cases considered. The red intervals correspond to the longest common substring. The yellow interval is $P$ and the green interval is $S[\ell, r + |P| - 1]$. The blue and grey intervals are extensions of the yellow and green intervals so long as they remain $(r - \ell)$-periodic.

Figure 7: $k = \sqrt{\epsilon d}$. Only the elements colored in red are included in the two sets.

Figure 8: If $A[\ell, \ell + d - 1]$ is a palindrome, then the blue substring is equal to the reverse of the orange substring.
$\sqrt{d}$) since $|S| + |P| = O(d)$. If $P$ appears in $S$ at most once, we are done. Otherwise, let $o_r$ be the rightmost position of $S$ that matches $P$; that is, $S[o_r, o_r + |P| - 1] = P$. Moreover, let $o$ be the position of the second rightmost occurrence of $P$ in $S$. We argue that T

Lemma 5.1. If $y - x + 1 \ge d + \alpha$ then one can find a palindrome substring of size at least $d$ in time $\tilde O(\sqrt{d})$.

Figure 10: Illustration for the proof of Lemma 5.1. All the colored intervals are palindromes.

Figure 11: Illustration for the proof of Lemma 5.2. Dashed arrows show that the two strings are the reverse of each other. Solid arrows show that the two strings are equal.

Figure 13: Quantum algorithm computing a $(1 + \epsilon)$-approximation of the Ulam distance of two strings $A$ and $B$.

Table 1: Our algorithms are shown in this table.

Table 2: This table includes the quantum lower bounds for the problems we study in this work.
Thus, by losing an $O(\log n)$ factor in the runtime, we reduce the problem to verifying if a solution of size at least $d$ exists for our problem instance.

Exact quantum algorithm for LCS (small d). For small $d$, we use element distinctness to solve LCS. Let $|\Sigma|$ be the size of the alphabet and $v : \Sigma \to [0, |\Sigma| - 1]$ be a function that maps every element of the alphabet to a distinct number in the range $0, \ldots, |\Sigma| - 1$. In other words, $v$ is a perfect hash function for the characters. We extend this definition to substrings of the two strings so that two substrings $t$ and $t'$ are equal if and only if $v(t) = v(t')$. From the two strings, we then make two sets of numbers $S_A$ and $S_B$, each having $n - d + 1$ elements. Element $i$ of set $S_A$ is a pair $(A, i)$

approximation for LCS and LPS (large d). We use another technique to solve LCS and LPS when the solution is large. While for LPS this new idea alone gives an optimal solution, for LCS we need to combine it with the previous algorithm to make sure the runtime is sublinear.
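The reduction to a decision problem and the small-$d$ encoding described above can be sketched classically. This is only an illustration under stated assumptions: the extension of $v$ to substrings is taken here to be the base-$|\Sigma|$ value of a length-$d$ substring (one injective choice among many), a set-intersection test stands in for the quantum element distinctness subroutine, and the names `encode`, `has_common_substring`, and `lcs_length` are hypothetical.

```python
def encode(t, v, base):
    """Extend v to a string t by reading it as a base-|Sigma| number.
    For strings of equal length this map is injective."""
    value = 0
    for ch in t:
        value = value * base + v[ch]
    return value


def has_common_substring(A, B, d):
    """Decide whether A and B share a substring of length d (d >= 1)."""
    alphabet = sorted(set(A) | set(B))
    v = {ch: idx for idx, ch in enumerate(alphabet)}  # perfect hash on characters
    base = len(alphabet)
    S_A = {encode(A[i:i + d], v, base) for i in range(len(A) - d + 1)}
    S_B = {encode(B[i:i + d], v, base) for i in range(len(B) - d + 1)}
    # Classical stand-in for element distinctness across the two sets.
    return not S_A.isdisjoint(S_B)


def lcs_length(A, B):
    """Binary search over d, using O(log n) calls to the decision routine.
    Valid because a common substring of length d implies one of length d - 1."""
    lo, hi = 0, min(len(A), len(B))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_common_substring(A, B, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    print(lcs_length("abcde", "xbcdy"))
```

The binary search is where the $O(\log n)$ factor comes from: each probe asks only whether a solution of size at least $d$ exists.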