CryptoDB
Matteo Scarlata
ORCID: 0009-0000-6285-6259
Publications and invited talks
Year
Venue
Title
2025
RWC
Mind the Gap! Secure File Sharing, from Theory to Practice
Abstract
End-to-end encryption (E2EE) allows data to be outsourced and stored on an untrusted server, such as in the cloud, without compromising its privacy. The need for stronger cryptographic guarantees for outsourced persistent data (such as encrypted files in cloud storage) has been highlighted by recent attacks on E2EE cloud storage providers, which all identify sharing as one of the main challenges. But even recently proposed E2EE cloud storage protocols which address this challenge suffer from another problem: when data is shared between a group of users, they all share access to the same, static, key material used for data encryption. This means that when the group membership changes, access control is only enforced by the server; security breaches or compelled disclosure would let even a removed member decrypt both current and future shared data. In this talk, we explore stronger security guarantees for groups of users and the data they share, and implement a practical system that delivers them.
We propose to move away from the use of static keys for data encryption in the setting of file sharing. Taking inspiration from the related setting of continuous group key agreement (CGKA) [3] and the MLS standardization effort for group messaging, we introduce a new primitive, called group key progression, that enables a dynamic group of users to agree on a persistent sequence of keys. With our efficient instantiation of this primitive, called Grappa, group members can secure future and past data from former and future group members, respectively, while themselves retaining access to all of their data. We avoid expensive data re-encryption and ensure that all users in Grappa only need to keep a compact cryptographic state. Grappa uses CGKA as a core building block to transport key updates between users, hence finding a use-case for MLS beyond group messaging.
In this talk, we want to share our take-aways from the journey of developing a file sharing system with strong security, from the novel theoretical building blocks, to challenges on the path to practice. On the theoretical side, we begin by showing that forward security (FS) and post-compromise security (PCS)—which are standard security notions for data in transit—are fundamentally more challenging to achieve for data at rest. Persistent data hence necessitates tailored methods to ensure strong end-to-end security. Instead of aiming for FS and PCS, we propose the new security notion of cryptographically-enforced interval access control (IAC), which gives similar guarantees in the common setting of persistent data applications where a group of users share access to the outsourced data, such as file sharing.
On the practical side, we spent significant engineering effort to implement a file sharing system which utilizes Grappa to achieve both end-to-end security and IAC. In doing so, we uncovered several interesting limitations of the current cryptography ecosystem that we believe to be of interest to the RWC audience. These include the lack of support for low-level cryptographic primitives in the Web Crypto API, barriers to using MLS outside of the secure messaging context as a transport layer for Grappa, and challenges with developing new cryptographic applications for cross-platform usage.
2025
RWC
Breaking and Fixing Length Leakage in Content-Defined Chunking
Abstract
Most applications that deduplicate data first split said data in smaller blocks, called chunks, using content-defined chunking (CDC). CDC cuts the chunks based on a local context window in the data: this means that chunks boundaries are preserved when the data is changed, and enables significant deduplication efficiency gains across applications dealing with large redundant dataset such as backup solutions, software patching systems, and file hosting platforms like IPFS and HuggingFace.
However, CDC also introduces a subtle leakage: the length of each chunk leaks information about the data being chunked. This enables fingerprinting attacks, where adversaries exploit chunk length patterns to infer the presence or structure of specific data. Such attacks threaten confidentiality in scenarios ranging from encrypted backups on untrusted cloud servers to data transmitted over encrypted channels. To address these risks, many systems - mainly in the cloud backup setting - have developed bespoke mitigations by mixing a cryptographic key inside the chunking process.
We demonstrate the ineffectiveness of these mitigations by presenting efficient key recovery attacks that rely solely on a known plaintext assumption. These attacks entirely circumvent all folklore mitigations except one, re-enabling fingerprinting attacks. To address this, we introduce a formal treatment for Keyed Content-Defined Chunking (KCDC) schemes and propose a provably secure construction that fulfills a strong notion of security. In doing so, we take a step towards making these real-world systems more resilient against leakage.
2025
RWC
D(e)rive with Care: Lessons Learned from Analyzing Real-World Multi-Input Key Derivation Functions
Abstract
Key derivation functions (KDFs) are integral to many cryptographic protocols, turning raw (e.g., Diffie-Hellman) key material into strong cryptographic keys. Traditionally KDFs are designed and analyzed, for settings where they take a single key material input. Modern protocol designs, however, regularly need to combine multiple secrets (e.g., in hybrid key exchange for quantum-safe migration) with the guarantee that the derived key is secure as long as at least one of the inputs is good. Complex applications, especially in the setting where keys are user-managed, may even require threshold versions of KDFs.
In this talk, we present lessons learned from analyzing such real-world proposals for multi-input KDFs. We first discuss combiner KDFs (aka key combiners), studying the designs in Signal's X3DH protocol, ETSI's TS 103-744 standard for hybrid key exchange, and MLS' combiner for pre-shared keys. Notably, the ETSI standard, widely recognized and recommended for use, for example by the German Federal Agency for IT Security, misuses the underlying HKDF salt input in a way that makes it insecure in its general form. We take the opportunity to revisit the syntax and security model for KDFs (mainly due to Krawczyk's HKDF paper, CRYPTO 2010) to give results on multiple-input KDFs. Taking an assertive stand on syntax, we do away with salts, which are needed in theory to extract from arbitrary sources in the standard model, but in practice, are almost never used (or even available) and sometimes even misused, as we saw. We then turn to the novel threshold primitive, which emerged as part of the multi-factor KDF (MFKDF) design (Nair and Song, USENIX 2023). We show how a naive implementation (such as the one proposed in MFKDF) leaves the scheme open to devastating cryptographic attacks and discuss ways forward.
2023
CRYPTO
When Messages are Keys: Is HMAC a dual-PRF?
Abstract
In Internet security protocols including TLS 1.3, KEMTLS, MLS and Noise, HMAC is being assumed to be a dual-PRF, meaning a PRF not only when keyed conventionally (through its first input), but also when "swapped" and keyed (unconventionally) through its second (message) input. We give the first in-depth analysis of the dual-PRF assumption on HMAC. For the swap case, we note that security does not hold in general, but completely characterize when it does; we show that HMAC is swap-PRF secure if and only if keys are restricted to sets satisfying a condition called feasibility, that we give, and that holds in applications. The sufficiency is shown by proof and the necessity by attacks. For the conventional PRF case, we fill a gap in the literature by proving PRF security of HMAC for keys of arbitrary length. Our proofs are in the standard model, make assumptions only on the compression function underlying the hash function, and give good bounds in the multi-user setting. The positive results are strengthened through achieving a new notion of variable key-length PRF security that guarantees security even if different users use keys of different lengths, as happens in practice.
2023
RWC
Three Lessons From Threema: Analysis of a Secure Messenger
Abstract
We provide an extensive cryptographic analysis of Threema, a Swiss-based encrypted messaging application with more than 10 million users and 7000 corporate customers. We present seven different attacks against the protocol in three different threat models. As one example, we present a cross-protocol attack which breaks authentication in Threema and which exploits the lack of proper key separation between different sub-protocols. As another, we demonstrate a compression-based side-channel attack that recovers users' long-term private keys through observation of the size of Threema encrypted backups.
From our analysis, we draw three wider lessons for developers of secure protocols.
Coauthors
- Matilda Backendal (3)
- David Balbás (1)
- Mihir Bellare (1)
- Sebastian Clermont (1)
- Nicola Dardanis (1)
- Marc Fischlin (1)
- Felix Günther (3)
- Miro Haller (2)
- Simon Phillipp Merz (1)
- Kenny Paterson (1)
- Kenneth G. Paterson (1)
- Matteo Scarlata (5)
- Kien Tuong Truong (2)