IACR News item: 17 March 2025
Jiahui Gao, Yagaagowtham Palanikuma, Dimitris Mouris, Duong Tung Nguyen, Ni Trieu
DNA edit distance (ED) measures the minimum number of single nucleotide insertions, substitutions, or deletions required to convert a DNA sequence into another. ED has broad applications in healthcare such as sequence alignment, genome assembly, functional annotation, and drug discovery. Privacy-preserving computation is essential in this context to protect sensitive genomic data. Nonetheless, the existing secure DNA edit distance solutions lack efficiency when handling large data sequences or resort to approximations and fail to accurately compute the metric.
In this work, we introduce secureED, a protocol that tackles these limitations, resulting in a significant performance enhancement of approximately $2-24\times$ compared to existing methods. Our protocol computes a secure ED between two genomes, each comprising $1,000$ letters, in just a few seconds. The underlying technique of our protocol is a novel approach that transforms the established approximate matching technique (i.e., the Ukkonen algorithm) into exact matching, exploiting the inherent similarity in human DNA to achieve cost-effectiveness. Furthermore, we introduce various optimizations tailored for secure computation in scenarios with a limited input domain, such as DNA sequences composed solely of the four nucleotide letters.
In this work, we introduce secureED, a protocol that tackles these limitations, resulting in a significant performance enhancement of approximately $2-24\times$ compared to existing methods. Our protocol computes a secure ED between two genomes, each comprising $1,000$ letters, in just a few seconds. The underlying technique of our protocol is a novel approach that transforms the established approximate matching technique (i.e., the Ukkonen algorithm) into exact matching, exploiting the inherent similarity in human DNA to achieve cost-effectiveness. Furthermore, we introduce various optimizations tailored for secure computation in scenarios with a limited input domain, such as DNA sequences composed solely of the four nucleotide letters.
Additional news items may be found on the IACR news page.