TY - JOUR
T1 - gECC
T2 - A GPU-based high-throughput framework for Elliptic Curve Cryptography
AU - Xiong, Qian
AU - Ma, Weiliang
AU - Shi, Xuanhua
AU - Zhou, Yongluan
AU - Jin, Hai
AU - Huang, Kaiyi
AU - Wang, Haozhou
AU - Wang, Zhengru
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025
Y1 - 2025
N2 - Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) but with lower computational complexity and smaller key sizes, making it a competitive option for applications such as blockchain, secure multi-party computation, and database security. However, the throughput of ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations, which can affect their efficiency in real-world scenarios. This article presents gECC , a versatile framework for ECC optimized for GPU architectures, specifically engineered to achieve high-throughput performance in EC operations. To maximize throughput, gECC incorporates batch-based execution of EC operations and microarchitecture-level optimization of modular arithmetic. It employs Montgomery's trick [40] to enable batch EC computation and incorporates novel computation parallelization and memory management techniques to maximize the computation parallelism and minimize the access overhead of GPU global memory. Furthermore, we analyze the primary bottleneck in modular multiplication by investigating how the user codes of modular multiplication are compiled into hardware instructions and what these instructions' issuance rates are. We identify that the efficiency of modular multiplication is highly dependent on the number of Integer Multiply-Add (IMAD) instructions. To eliminate this bottleneck, we propose novel techniques to minimize the number of IMAD instructions by leveraging predicate registers to pass the carry information and using addition and subtraction instructions (IADD3) to replace IMAD instructions. Our experimental results show that, for ECDSA and ECDH, the two commonly used ECC algorithms, gECC can achieve performance improvements of 5.56 × and 4.94 ×, respectively, compared to the state-of-the-art GPU-based system. In a real-world blockchain application, we can achieve performance improvements of 1.56 ×, compared to the state-of-the-art CPU-based system. gECC is completely and freely available at https://github.com/CGCL-codes/gECC.
AB - Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) but with lower computational complexity and smaller key sizes, making it a competitive option for applications such as blockchain, secure multi-party computation, and database security. However, the throughput of ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations, which can affect their efficiency in real-world scenarios. This article presents gECC , a versatile framework for ECC optimized for GPU architectures, specifically engineered to achieve high-throughput performance in EC operations. To maximize throughput, gECC incorporates batch-based execution of EC operations and microarchitecture-level optimization of modular arithmetic. It employs Montgomery's trick [40] to enable batch EC computation and incorporates novel computation parallelization and memory management techniques to maximize the computation parallelism and minimize the access overhead of GPU global memory. Furthermore, we analyze the primary bottleneck in modular multiplication by investigating how the user codes of modular multiplication are compiled into hardware instructions and what these instructions' issuance rates are. We identify that the efficiency of modular multiplication is highly dependent on the number of Integer Multiply-Add (IMAD) instructions. To eliminate this bottleneck, we propose novel techniques to minimize the number of IMAD instructions by leveraging predicate registers to pass the carry information and using addition and subtraction instructions (IADD3) to replace IMAD instructions. Our experimental results show that, for ECDSA and ECDH, the two commonly used ECC algorithms, gECC can achieve performance improvements of 5.56 × and 4.94 ×, respectively, compared to the state-of-the-art GPU-based system. In a real-world blockchain application, we can achieve performance improvements of 1.56 ×, compared to the state-of-the-art CPU-based system. gECC is completely and freely available at https://github.com/CGCL-codes/gECC.
KW - Elliptic curve cryptography
KW - GPU acceleration
KW - modular multiplication
U2 - 10.1145/3736176
DO - 10.1145/3736176
M3 - Journal article
AN - SCOPUS:105018675590
SN - 1544-3566
VL - 22
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 3
M1 - 84
ER -