Appendix A
Shor's Algorithm
Shor’s algorithm, developed by Peter Shor in 1994, is a quantum algorithm that efficiently factors large integers, a problem that is computationally intractable for classical computers.
The algorithm’s brilliance lies in reducing the factoring problem to a problem of period finding, which quantum computers can solve exponentially faster than classical computers. Specifically, to factor a number N, Shor’s algorithm finds the period r of the function f(x) = a^x mod N for some randomly chosen a. This period r is the key to finding factors: if r is even and a^(r/2) is not congruent to -1 mod N, then gcd(a^(r/2) ± 1, N) will yield a nontrivial factor of N with high probability. While the reduction from factoring to period finding is purely classical number theory, the quantum speedup comes from how efficiently a quantum computer can find this period.
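The classical half of this reduction is simple enough to run directly. The sketch below (function names are illustrative, not from any standard library) brute-forces the period of a toy modulus—the step the quantum subroutine replaces—and then applies the gcd recipe described above:

```python
from math import gcd

def find_period_classically(N, a):
    """Brute-force period finding -- the step a quantum computer
    replaces.  Only feasible for toy-sized N."""
    x, r = a % N, 1
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def factor_from_period(N, a, r):
    """Classical post-processing: recover a factor of N from the
    period r of f(x) = a^x mod N (returns None if r is unusable)."""
    if r % 2 != 0:
        return None                      # need an even period
    y = pow(a, r // 2, N)
    if y == N - 1:                       # a^(r/2) ≡ -1 (mod N): bad luck
        return None
    for candidate in (gcd(y - 1, N), gcd(y + 1, N)):
        if 1 < candidate < N:
            return candidate
    return None

N, a = 15, 7
r = find_period_classically(N, a)        # 7^4 = 2401 ≡ 1 (mod 15), so r = 4
print(r, factor_from_period(N, a, r))    # prints: 4 3
```

When r is odd or a^(r/2) ≡ -1 (mod N), a fresh random a is chosen and the whole procedure repeats; a constant number of retries suffices in expectation.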
The quantum advantage in period finding comes from the Quantum Fourier Transform (QFT), a quantum analogue of the classical discrete Fourier transform. The QFT can be implemented on a quantum computer using only O((log N)^2) quantum gates, whereas a classical FFT over N points requires O(N log N) operations—an exponential difference in scaling. In Shor’s algorithm, the QFT is applied to a superposition of states that encodes the periodic function f(x). Through the properties of quantum interference discussed earlier, the QFT amplifies the amplitudes corresponding to multiples of the period while suppressing all other amplitudes. When the quantum state is measured after the QFT, the result reveals information about the period r with high probability. This entire quantum subroutine runs in polynomial time—roughly O((log N)^3) operations with standard arithmetic, and less still with fast multiplication techniques—making it feasible to factor numbers that would be impossible for classical computers.
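This interference effect can be simulated classically at toy sizes. The sketch below (plain-Python DFT; all names are illustrative) prepares the state the phase register holds after the work register is measured—uniform over points spaced r apart—and shows that the Fourier transform concentrates all probability on multiples of M/r, where M plays the role of the transform size 2^(2n):

```python
import cmath

M, r, x0 = 64, 8, 3                      # toy sizes; here r divides M exactly
support = list(range(x0, M, r))          # {x0, x0 + r, x0 + 2r, ...}
amp = 1 / len(support) ** 0.5            # uniform superposition amplitude

def qft_prob(k):
    """Probability of measuring outcome k after an exact QFT over M states."""
    s = sum(cmath.exp(-2j * cmath.pi * k * x / M) for x in support)
    return abs(amp * s / M ** 0.5) ** 2

peaks = [k for k in range(M) if qft_prob(k) > 1e-9]
print(peaks)                             # multiples of M // r = 8
```

With r dividing M the peaks are exact; for a general period the probability instead clusters tightly around the (non-integer) multiples of M/r, which is why the continued-fraction step described later is needed to round the measurement back to j/r.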
To understand the dramatic speedup, consider the best known classical factoring algorithm: the General Number Field Sieve (GNFS). GNFS has a sub-exponential runtime of approximately O(exp((64/9 * log N)^(1/3) * (log log N)^(2/3))), which, while better than purely exponential algorithms like trial division, still grows extremely rapidly with the size of N. For a 2048-bit number (commonly used in RSA encryption), GNFS would require roughly 2^112 operations—far beyond the reach of any conceivable classical computer, even with all the world’s computational resources combined. In contrast, Shor’s algorithm running on a quantum computer could factor the same 2048-bit number using on the order of 10^10 quantum gates, executable in a matter of hours once a sufficiently large, fault-tolerant quantum computer exists. This exponential-to-polynomial reduction in complexity is what makes Shor’s algorithm a fundamental threat to modern cryptography, transforming a problem that would take longer than the age of the universe into one solvable in an afternoon.
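The GNFS scaling can be evaluated directly. The sketch below computes the base-2 logarithm of the heuristic operation count from the L-notation formula above; it ignores the o(1) term in the exponent, which is why it lands somewhat above the ~2^112 figure commonly quoted for RSA-2048:

```python
import math

def gnfs_log2_ops(bits):
    """log2 of the heuristic GNFS runtime exp((64/9 * ln N)^(1/3)
    * (ln ln N)^(2/3)) for an n-bit modulus N, o(1) term ignored."""
    ln_N = bits * math.log(2)
    exponent = (64 / 9 * ln_N) ** (1 / 3) * math.log(ln_N) ** (2 / 3)
    return exponent / math.log(2)

for bits in (1024, 2048, 4096):
    print(bits, round(gnfs_log2_ops(bits)))
```

Note the sub-exponential but still brutal growth: doubling the modulus size from 2048 to 4096 bits adds roughly 40 bits of classical security, while the cost of Shor’s algorithm grows only polynomially.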
Quantum Period Finding
Shor’s algorithm is the most famous instance of a more general quantum technique: quantum period finding. The core insight is that many hard number-theoretic problems—integer factoring, discrete logarithms over finite fields, and elliptic curve discrete logarithms—can each be reduced to the problem of finding the period of a function defined over a cyclic group. Classical computers have no efficient method for extracting such periods when the group is exponentially large, but a quantum computer can do so in polynomial time.
The quantum period-finding subroutine proceeds as follows. Two quantum registers are prepared: a phase register of approximately 2n qubits initialized in a uniform superposition, and a work register of n qubits. The function f(x)—for example, a^x mod N in the case of factoring—is computed coherently into the work register, entangling the two registers such that each value in the work register is correlated with a set of phase register values spaced exactly r apart, where r is the unknown period. At this point, the Quantum Fourier Transform (QFT) is applied to the phase register. The QFT converts the periodic structure encoded in the amplitudes into sharp peaks at integer multiples of 2^(2n) / r in the frequency domain. A measurement of the phase register then yields a value close to j * 2^(2n) / r for a random integer j, from which r can be extracted via the classical continued fractions algorithm. Repeating the procedure O(1) times succeeds with high probability.
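The continued-fraction extraction of r can be sketched with Python’s `fractions` module, whose `limit_denominator` method implements exactly this best-rational-approximation step. The function name and the grow-to-true-period loop below are illustrative additions, assuming an ideal (noise-free) measurement:

```python
from fractions import Fraction

def recover_period(measured, M, N, a):
    """Recover the period r of a^x mod N from a phase-register
    measurement `measured` ≈ j * M / r, where M = 2^(2n).
    The continued-fraction expansion of measured / M yields j / r in
    lowest terms, so the denominator is r or a divisor of it; we grow
    the candidate until a^k ≡ 1 (mod N)."""
    candidate = Fraction(measured, M).limit_denominator(N).denominator
    k = candidate
    while pow(a, k, N) != 1:
        k += candidate
        if k > N:
            return None          # unusable measurement; rerun the subroutine
    return k

# Toy example: N = 15, a = 7 has period r = 4.  With a 2n = 8 qubit
# phase register (M = 256), an ideal measurement returns j * 64.
print(recover_period(192, 256, 15, 7))   # j = 3: 192/256 = 3/4, r = 4
print(recover_period(128, 256, 15, 7))   # j = 2: 1/2, grown to r = 4
```

The second call shows why the verification loop matters: when j shares a factor with r, the continued fraction returns a proper divisor of r, which must be grown back to the full period before the gcd step can be applied.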
The QFT itself is efficient—requiring only O(n^2) gates, or O(n log n) using the approximate QFT of Coppersmith (1994)—and is never the computational bottleneck. The dominant cost is the modular exponentiation circuit, which must compute a^x mod N for a superposition of exponentially many values of x. In a naive implementation this requires O(n^3) quantum gates, and virtually all algorithmic optimizations to Shor’s algorithm focus on reducing the cost of this step through better arithmetic circuits, reformulations of the underlying number-theoretic problem, or space-time tradeoffs in the quantum circuit layout.
The generality of quantum period finding is what makes it so consequential: the same subroutine that factors integers also breaks Diffie-Hellman key exchange over finite fields and recovers private keys from elliptic curve public keys. Any cryptosystem whose security rests on the hardness of a hidden period problem is, in principle, vulnerable to a sufficiently large quantum computer running a variant of this technique.
Algorithmic Optimizations
Shor’s 1994 paper [3] described a general approach, but the way it is implemented in practice can be tailored to the specific problem and hardware constraints. Over the past three decades, a series of algorithmic optimizations have substantially reduced the quantum resources required to carry out Shor’s algorithm—lowering qubit counts, shrinking circuit depths, and reducing the number of expensive T-gates that dominate the cost of fault-tolerant quantum computation. These improvements matter because they effectively lower the threshold at which a quantum computer becomes cryptographically relevant: even without any hardware breakthroughs, better algorithms bring the threat closer.
The following subsections describe the most significant optimizations, followed by a summary comparison table benchmarked against RSA-2048 factoring.
Beauregard (2002) [50]
Stephane Beauregard introduced a circuit for Shor’s algorithm requiring only 2n + 3 qubits, a dramatic reduction from the roughly 5n qubits needed by earlier constructions. The key technique is Draper’s QFT-based addition circuit, which performs arithmetic directly in the Fourier (phase) domain rather than the computational basis, eliminating the need for classical carry propagation. Beauregard further reduced qubit overhead by using a “semi-classical” QFT that reuses qubits via sequential phase kickback, removing the need for a separate phase register entirely. For RSA-2048, this yields approximately 4,099 logical qubits. However, the space efficiency comes at the cost of circuit depth: the full algorithm requires O(n^3) Toffoli gates, translating to on the order of 10^11 T-gates after decomposition—a deep, serial circuit that demands long coherence times.
Zalka (1998, 2006) [51, 52]
Christof Zalka explored the opposite end of the space-time tradeoff spectrum, showing how to parallelize modular exponentiation to reduce circuit depth at the cost of additional qubits. His 2006 result demonstrated that factoring can be performed with as few as n + o(n) qubits, approaching the information-theoretic minimum, while his depth-optimized variant uses O(n^2) qubits to achieve O(n) circuit depth. Zalka’s work established the fundamental principle that space and time are exchangeable resources in quantum factoring—a principle that every subsequent optimization has exploited.
Ekera-Hastad (2017) [53]
Martin Ekera and Johan Hastad observed that integer factoring can be reformulated as a “short” discrete logarithm problem. When factoring N = pq, the factors p and q are each only n/2 bits long, yet standard Shor’s algorithm uses a phase register of 2n qubits to find the full order r, which may be as large as n bits. Ekera and Hastad showed that because the secret (the factor) is much smaller than the group order, a phase register of only approximately n/2 bits suffices. This roughly halves the number of controlled modular multiplications—the most expensive component of the circuit—yielding an approximately 4x reduction in total quantum gate count for factoring. This reformulation was later adopted as a core component of the Gidney-Ekera construction, and Ekera’s subsequent work extended the technique to provide constant-factor improvements for the elliptic curve discrete logarithm problem as well.
Gidney-Ekera (2021) [54]
Craig Gidney and Martin Ekera produced the most concrete and widely cited resource estimate for RSA-2048 factoring. Their construction combines multiple techniques: windowed arithmetic using classical precomputation to reduce quantum multiplication cost, oblivious carry runways to handle carry propagation without knowing intermediate values, the Ekera-Hastad short discrete log reformulation, and the approximate QFT. The result is a circuit requiring approximately 4,000 logical qubits and 2.4 * 10^10 Toffoli gates (approximately 10^11 T-gates). Under a surface code error correction scheme with physical error rates of 10^-3, this translates to roughly 20 million physical qubits and a runtime of 8 hours. Gidney’s subsequent work (2024) further refined these estimates, projecting that improved magic state distillation and the use of qLDPC error correcting codes could reduce the physical qubit requirement to approximately 4 million—a 5x improvement achieved purely through better classical and quantum compilation, with no changes to the underlying hardware.
Regev (2023) [11]
Oded Regev proposed a fundamentally different approach: multidimensional quantum period finding. Rather than computing a single modular exponentiation with an n-bit exponent, Regev’s algorithm computes d independent modular exponentiations with smaller (approximately n/d)-bit exponents, then applies a d-dimensional QFT and uses classical lattice reduction (LLL) to recover the period from the multidimensional measurement results. Setting d = sqrt(n), the total gate count improves asymptotically from O(n^2) to O(n^(3/2)), a genuine reduction in circuit depth that is well suited to coherence-limited hardware. The tradeoff is space: the basic formulation requires O(n^(3/2)) qubits—roughly 90,000 for RSA-2048, far more than the 4,000 of Gidney-Ekera. Follow-up work by Ragavan and Vaikuntanathan (2023) showed that the algorithm can be made to work with O(n) qubits while preserving the O(n^(3/2)) gate count. Whether Regev’s approach offers a practical advantage over highly optimized standard Shor for problem sizes like RSA-2048 remains an active area of investigation.
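The parameter choices behind this scaling are easy to check numerically. The back-of-envelope sketch below (constants omitted; purely illustrative) evaluates the number of dimensions d, the per-exponentiation size, and the asymptotic gate-count ratio for n = 2048:

```python
import math

n = 2048                           # modulus size in bits
d = math.isqrt(n)                  # number of dimensions, ~sqrt(n)
exp_bits = n // d                  # bits per exponent, ~n/d
ratio = n ** 2 / n ** 1.5          # O(n^2) vs O(n^(3/2)): ratio ~ sqrt(n)
print(d, exp_bits, round(ratio))   # prints: 45 45 45
```

In other words, for RSA-2048 Regev’s variant replaces one 2048-bit modular exponentiation with about 45 exponentiations of roughly 45 bits each, for an asymptotic gate-count saving on the order of sqrt(n) ≈ 45x before constant factors.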
Gouzien-Sangouard (2021) [55]
Elie Gouzien and Nicolas Sangouard demonstrated an extreme space-time tradeoff using a multimode quantum memory architecture. Their scheme factors a 2048-bit RSA integer using approximately 13,436 processing qubits coupled to a large multimode memory—far fewer than the roughly 20 million physical qubits of the Gidney-Ekera surface code estimate—but requires 177 days of continuous quantum computation. The total space-time volume (qubits multiplied by time) is comparable to other approaches, confirming that the fundamental computational cost is roughly conserved across different tradeoff points. This result is significant because it shows that factoring may become feasible on machines with relatively few, high-quality qubits long before million-qubit processors are available, provided those machines can maintain coherence over extended periods.
Litinski (2023) and Huang et al. (2025) [56, 2]
While the preceding optimizations target RSA factoring, parallel work has focused on the elliptic curve discrete logarithm problem (ECDLP), which underpins the signature schemes used by virtually all blockchains. Daniel Litinski showed that a 256-bit elliptic curve private key can be recovered using approximately 2,330 logical qubits and only 50 million Toffoli gates—roughly two orders of magnitude cheaper than RSA-2048 factoring [56]. More recently, researchers exploited the isomorphism between Edwards and Weierstrass curve representations to reduce the T-count by a further 75%, the T-depth by 87%, and qubit requirements by 12% [2]. These results confirm that ECDLP is a significantly easier quantum target than RSA factoring, meaning that the elliptic curve cryptography used by Bitcoin, Ethereum, and most modern protocols will likely be the first to fall.
Webster et al. (2026) [1]
Webster et al. proposed the Pinnacle Architecture, which uses quantum low-density parity-check (qLDPC) codes to factor RSA-2048 with fewer than 100,000 physical qubits. By employing generalized bicycle codes, the architecture achieves an encoding density of roughly 101 physical qubits per logical qubit, a 10x to 40x improvement over previous surface code estimates. A key innovation is the “Magic Engine,” a specialized module that distills and injects magic states within a single code block to maintain constant throughput. While this significantly lowers the qubit threshold, the scheme shifts the engineering challenge to the requirement for non-local connectivity and high-speed real-time qLDPC decoding.
Comparison Table
The following table summarizes the major optimizations benchmarked against RSA-2048 factoring. T-gate counts reflect the total number of T-gates required after Toffoli decomposition. Physical qubit estimates assume surface code error correction with physical error rates near 10^-3.
| Algorithm / Technique | Year | Key Optimization | T-gates (approx) | Space-Time Tradeoff |
|---|---|---|---|---|
| Shor (original) | 1994 | Quantum period finding via QFT | ~10^12 | ~6,000 logical qubits; deep serial circuit |
| Beauregard | 2002 | QFT-based arithmetic; 2n+3 qubits | ~10^11 | Minimal qubits (4,099); very deep circuit |
| Zalka | 2006 | Parallelized modular exponentiation | ~10^11 | Tunable: ~3,000 qubits (deep) to ~10,000+ qubits (shallow) |
| Ekera-Hastad | 2017 | Factoring as short discrete log; ~4x gate reduction | ~10^10 | ~3n/2 logical qubits; halved circuit depth |
| Gidney-Ekera | 2021 | Windowed arithmetic, oblivious carry, short DLP | ~10^11 | ~4,000 logical qubits / 20M physical; 8 hours |
| Gouzien-Sangouard | 2021 | Multimode memory; extreme space optimization | ~10^10 | ~13,400 processing qubits + multimode memory; 177 days |
| Regev | 2023 | Multidimensional period finding; lattice reduction | ~10^9–10^10 | ~90,000 qubits (basic) or O(n) (improved); shallow depth |
| Gidney (updated) | 2024 | Improved distillation + qLDPC codes | ~10^10 | ~4,000 logical qubits / ~4M physical; ~10 hours |
| Webster et al. | 2026 | Pinnacle Architecture, GB qLDPC codes, Magic Engine | ~10^10 | < 100,000 physical qubits (~101:1 overhead); month-scale runtime |
Implications
The trend across these optimizations is unambiguous: the quantum resources required to break classical cryptography have decreased steadily over the past three decades, and continue to decrease. Each new technique—whether it targets qubit count, circuit depth, T-gate cost, or error correction overhead—effectively lowers the hardware threshold at which a cryptographically relevant quantum computer (CRQC) becomes feasible. Importantly, these algorithmic improvements compose with advances in quantum hardware and error correction. A 2x improvement in qubit quality combined with a 2x reduction in algorithmic resource requirements yields a 4x reduction in the effective distance to a CRQC. The existence of multiple independent optimization axes—hardware, error correction, and algorithms—means that progress on any one front accelerates the overall timeline.