5 commits (Dec 17, 2025 20:27 – Mar 17, 2026 20:27 UTC)
juaquinlarendo falcon
Merge 2d0a761232a1f0e616fbacb8169fed554b973c59 into ce15e75bceb372867daf6b8e81918ab6978686eb
Git Commit ec51ce56 Branch pull/12/merge Document 1/17 ++ 109 --
Rexicon226 falcon
Merge 4f4ae0e98b7d1f6b007cf9db21e7df3a475f9c4f into ce15e75bceb372867daf6b8e81918ab6978686eb
Git Commit 94806487 Branch pull/15/merge Document 4/322 ++ 12 --
Rexicon226 falcon
speedup mq_ntt with SIMD
The original goal of the DIT-DIF transformation, which
this implementation also uses, was to allow the core
butterflies to be parallelized with SIMD. Unfortunately, this
idea seems to have been mostly lost to time.

By hoisting the omega (called s in this codebase) one loop
level out, we can perform the entire inner loop as a single
chain of SIMD instructions, specifically when t = 8, t = 4,
and, with AVX512, t = 16.

This gives a very large speedup: combined with the other
optimizations introduced in this branch, total verification
time drops from 11.6us to around 7.6us (on Zen 5).
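To show why a hoisted twiddle makes the inner loop SIMD-friendly, here is a minimal Python sketch of a forward negacyclic NTT mod q = 12289. It follows the loop nest of the reference `mq_NTT` (one twiddle `s` loaded per group, then `ht` butterflies), but assumes plain modular arithmetic instead of Montgomery multiplication and a twiddle table computed at runtime rather than a precomputed table. The names `find_generator`, `bitrev`, `twiddles`, `butterfly_lanes`, and `ntt` are hypothetical, and Python lists stand in for vector registers: every lane in a run uses the same `s`, so the run is one broadcast-multiply, reduce, add, subtract chain when it is as long as the vector width.

```python
Q = 12289  # Falcon modulus; Q - 1 = 2**12 * 3


def find_generator():
    # g generates (Z/QZ)* iff g^((Q-1)/p) != 1 for each prime p | Q-1 (p = 2, 3)
    for g in range(2, Q):
        if pow(g, (Q - 1) // 2, Q) != 1 and pow(g, (Q - 1) // 3, Q) != 1:
            return g
    raise ValueError("no generator found")


def bitrev(x, bits):
    """Reverse the low `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def twiddles(logn):
    """zetas[k] = psi**bitrev(k), with psi a primitive 2n-th root of unity mod Q."""
    n = 1 << logn
    psi = pow(find_generator(), (Q - 1) // (2 * n), Q)
    return [pow(psi, bitrev(k, logn), Q) for k in range(n)]


def butterfly_lanes(lo, hi, s):
    """One 'vector' step: every lane shares the twiddle s, so there is no
    per-lane twiddle load, only lane-wise multiply, add, and subtract."""
    v = [(h * s) % Q for h in hi]
    return ([(l + x) % Q for l, x in zip(lo, v)],
            [(l - x) % Q for l, x in zip(lo, v)])


def ntt(a):
    """Cooley-Tukey negacyclic NTT; output is in bit-reversed order."""
    a = list(a)
    n = len(a)
    zetas = twiddles(n.bit_length() - 1)
    t, m = n, 1
    while m < n:
        ht = t // 2
        for i in range(m):
            s = zetas[m + i]  # loaded once per group, outside the lane loop
            j1 = i * t
            lo, hi = a[j1:j1 + ht], a[j1 + ht:j1 + t]
            # the whole inner loop over j collapses into one lane-wise step
            a[j1:j1 + ht], a[j1 + ht:j1 + t] = butterfly_lanes(lo, hi, s)
        t = ht
        m *= 2
    return a
```

The result can be checked against direct evaluation: `ntt(a)[i]` equals `a(psi**(2 * bitrev(i, logn) + 1))` mod q, where `psi` is the primitive 2n-th root used to build the table.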
Git Commit 4f4ae0e9 Branch pull/15/head Document 2/200 ++ 5 --
Rexicon226 falcon
speedup pubkey decoding with SIMD
See the docstrings around the code changed in this commit
for an explanation of the strategy used.
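The commit does not spell out its decoding strategy here, so the following is not it; it is a minimal Python sketch of one property of the Falcon public-key payload (header byte omitted) that makes SIMD decoding possible. Coefficients are packed MSB-first at 14 bits each, so 4 coefficients fill exactly 56 bits = 7 bytes. A bit-accumulator decoder carries state from byte to byte, while a grouped decoder can treat each 7-byte group independently with fixed shifts, which maps onto vector lanes. The helper names `encode_14bit`, `decode_sequential`, and `decode_grouped` are hypothetical.

```python
Q = 12289


def encode_14bit(coeffs):
    """Pack coefficients MSB-first, 14 bits each."""
    acc, acc_len, out = 0, 0, bytearray()
    for c in coeffs:
        acc = (acc << 14) | c
        acc_len += 14
        while acc_len >= 8:
            acc_len -= 8
            out.append((acc >> acc_len) & 0xFF)
        acc &= (1 << acc_len) - 1  # keep only the bits not yet emitted
    if acc_len:
        out.append((acc << (8 - acc_len)) & 0xFF)
    return bytes(out)


def decode_sequential(buf, n):
    """Bit-accumulator decoding: each coefficient depends on the running state."""
    acc, acc_len, out = 0, 0, []
    for byte in buf:
        acc = (acc << 8) | byte
        acc_len += 8
        if acc_len >= 14:  # at most one coefficient completes per byte
            acc_len -= 14
            w = (acc >> acc_len) & 0x3FFF
            acc &= (1 << acc_len) - 1
            if w >= Q:
                raise ValueError("coefficient out of range")
            out.append(w)
            if len(out) == n:
                break
    return out


def decode_grouped(buf, n):
    """Each 7-byte group holds 4 coefficients at fixed bit offsets, so groups
    decode independently (one group, or several, per vector step)."""
    if n % 4 or len(buf) < 7 * (n // 4):
        raise ValueError("bad length")
    out = []
    for g in range(n // 4):
        word = int.from_bytes(buf[7 * g:7 * g + 7], "big")  # 56-bit group
        lanes = [(word >> shift) & 0x3FFF for shift in (42, 28, 14, 0)]
        if max(lanes) >= Q:  # same range check as the sequential decoder
            raise ValueError("coefficient out of range")
        out.extend(lanes)
    return out
```

For n = 512 the payload is 512 * 14 / 8 = 896 bytes, i.e. 128 independent groups, and both decoders return the same coefficients.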
Git Commit 62307fb5 Branch pull/15/head Document 1/113 ++ 1 --
Rexicon226 falcon
amortize SHAKE256 extraction in H2P
During verification (where a variable-time H2P is used), Falcon squeezes the
SHAKE256 state in a loop until it has enough elements to fill the polynomial.
Each iteration samples 2 bytes (16 bits), interprets them as a big-endian
integer, and checks the acceptance criterion: the value must be smaller than
61445 (5 * 12289), after which it is reduced modulo q = 12289.
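For reference, the acceptance rule from the Falcon specification and the resulting byte budget can be worked out as follows. The threshold is 5q because 5q = 61445 is the largest multiple of q below 2^16, so reducing an accepted value mod q stays uniform. The expected byte count assumes uniformly random XOF output and uses n = 512 as an example:

```latex
\begin{aligned}
t &= 2^8 \cdot b_0 + b_1, \qquad 0 \le t < 2^{16} \\
\text{accept iff } t &< 5q = 61445, \qquad c = t \bmod q, \qquad q = 12289 \\
\Pr[\text{accept}] &= \tfrac{61445}{65536} \approx 0.9376 \\
\mathbb{E}[\text{bytes for } n = 512] &\approx \tfrac{2 \cdot 512}{0.9376} \approx 1092 \approx 8 \times 136
\end{aligned}
```

So a full polynomial needs roughly eight 136-byte blocks of output, while the 2-bytes-at-a-time loop makes one extraction call per candidate (about 546 calls).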

The naive method used by the reference implementation extracts 2 bytes at a
time and checks each candidate. We can trivially optimize this by extracting
a larger chunk, sized close to the SHAKE256 rate (136 bytes), and consuming
candidates from that buffer until it runs out, then refilling it.

This gives us an approximate 10% speedup for verification (saving 1.2us on Zen 5).
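A minimal Python sketch of the amortization, assuming the XOF has already absorbed its input (in Falcon this is the nonce followed by the message; an arbitrary seed stands in here). `XofStream`, `hash_to_point_naive`, and `hash_to_point_buffered` are hypothetical names. Python's `hashlib` cannot squeeze SHAKE256 incrementally, so `XofStream` re-derives the output prefix on every call: it only models the byte stream and counts `extract` calls, and says nothing about real timings.

```python
import hashlib

Q = 12289
RATE = 136  # SHAKE256 rate in bytes


class XofStream:
    """Squeeze-style reader over SHAKE256 output that counts extract() calls."""

    def __init__(self, seed):
        self.seed = seed
        self.pos = 0
        self.calls = 0

    def extract(self, k):
        self.calls += 1
        # SHAKE output is a prefix-consistent stream, so slicing gives the next k bytes
        out = hashlib.shake_256(self.seed).digest(self.pos + k)[self.pos:]
        self.pos += k
        return out


def hash_to_point_naive(xof, n):
    """One extract() call per 2-byte candidate, as in the reference loop."""
    out = []
    while len(out) < n:
        b = xof.extract(2)
        t = (b[0] << 8) | b[1]  # big-endian 16-bit candidate
        if t < 5 * Q:
            out.append(t % Q)
    return out


def hash_to_point_buffered(xof, n):
    """Extract one rate-sized block at a time and consume candidates from it."""
    out, buf, i = [], b"", 0
    while len(out) < n:
        if i + 2 > len(buf):
            buf, i = xof.extract(RATE), 0  # refill; RATE is even, so no byte is lost
        t = (buf[i] << 8) | buf[i + 1]
        i += 2
        if t < 5 * Q:
            out.append(t % Q)
    return out
```

Both functions read the same byte stream in the same order, so they return identical polynomials; for n = 512 the buffered version calls `extract` on the order of 8 to 9 times instead of once per candidate.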
Git Commit 7fb6d547 Branch pull/15/head Document 1/9 ++ 6 --