gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography

Qian Xiong, Weiliang Ma, Xuanhua Shi*, Yongluan Zhou, Hai Jin, Kaiyi Huang, Haozhou Wang, Zhengru Wang

*Corresponding author af dette arbejde

Publikation: Bidrag til tidsskriftTidsskriftartikelForskningpeer review

1 Downloads (Pure)

Abstract

Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) but with lower computational complexity and smaller key sizes, making it a competitive option for applications such as blockchain, secure multi-party computation, and database security. However, the throughput of ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations, which can affect their efficiency in real-world scenarios. This article presents gECC , a versatile framework for ECC optimized for GPU architectures, specifically engineered to achieve high-throughput performance in EC operations. To maximize throughput, gECC incorporates batch-based execution of EC operations and microarchitecture-level optimization of modular arithmetic. It employs Montgomery's trick [40] to enable batch EC computation and incorporates novel computation parallelization and memory management techniques to maximize the computation parallelism and minimize the access overhead of GPU global memory. Furthermore, we analyze the primary bottleneck in modular multiplication by investigating how the user codes of modular multiplication are compiled into hardware instructions and what these instructions' issuance rates are. We identify that the efficiency of modular multiplication is highly dependent on the number of Integer Multiply-Add (IMAD) instructions. To eliminate this bottleneck, we propose novel techniques to minimize the number of IMAD instructions by leveraging predicate registers to pass the carry information and using addition and subtraction instructions (IADD3) to replace IMAD instructions. Our experimental results show that, for ECDSA and ECDH, the two commonly used ECC algorithms, gECC can achieve performance improvements of 5.56 × and 4.94 ×, respectively, compared to the state-of-the-art GPU-based system. In a real-world blockchain application, we can achieve performance improvements of 1.56 ×, compared to the state-of-the-art CPU-based system. gECC is completely and freely available at https://github.com/CGCL-codes/gECC.

OriginalsprogEngelsk
Artikelnummer84
TidsskriftACM Transactions on Architecture and Code Optimization
Vol/bind22
Udgave nummer3
Antal sider27
ISSN1544-3566
DOI
StatusUdgivet - 2025

Bibliografisk note

Publisher Copyright:
© 2025 Copyright held by the owner/author(s).

Citationsformater