An oblivious algorithm that allows computation on encrypted data in the cloud | Cheriton School of Computer Science

Most master’s students a year into their graduate program are hopeful their research will contribute to their discipline, but few will crack a long-standing problem and in so doing develop a solution that’s a major advance in the field.

Simeon Krastnikov, a first-year master’s student at the Cheriton School of Computer Science, has done exactly that. He developed an oblivious algorithm — a type of algorithm whose behaviour is independent of its input data unlike a typical algorithm designed for the same problem — that solves how to join two tables securely in cloud databases.

More and more governments, businesses and organizations are relying on the cloud to store large amounts of data securely — everything from financial records to government documents to sensitive healthcare data. And along with this growth in cloud-based storage has been the demand for service providers to allow computation to occur remotely on the encrypted data they host while preserving data privacy.

A major algorithmic challenge in designing applications for secure computation in the cloud is to combat side channels by ensuring such applications are oblivious to their inputs — that their memory access patterns do not leak sensitive information to the server, Simeon explained. “You may think that computation on your encrypted data in the cloud is happening completely in secret, but depending, for example, on the memory accesses a program makes the server may be able to learn certain information about what’s happening and what the encrypted data looks like, thereby breaking the encryption.”

Various mechanisms have been developed to preserve privacy, among them dedicated secure cryptographic coprocessors and hardware enclaves such Intel’s Software Guard Extensions or SGX, which provide security through a dedicated set of processor instructions. These approaches do provide good cryptographic guarantees, ensuring a user’s data remains encrypted while computation is performed on it remotely, but on their own these mechanisms provide no guarantee to address various types of subtle information leakages.

“As the program reads and writes to specific addresses of the server’s memory, these memory access patterns can reveal information about the user’s data if the program’s control flow depends on its input,” said Professor Florian Kerschbaum, Simeon’s co-supervisor and a Professor at the Cheriton School of Computer Science. “This is why we need to develop oblivious algorithms, ones that do not leak sensitive information to the server through such side channels.”

Making database operations oblivious does not pose a great algorithmic challenge in many instances, but one common database operation — the join operator — has proven difficult to make oblivious. In fact, since the first paper describing this problem was published in 2006, no solution has been satisfactory until Simeon’s work that not only cracked the problem but did so elegantly.

“Making the sort-merge algorithm secure meant it had to be made oblivious, ensuring its decisions about which memory locations to access do not depend on the contents of its input data,” said Professor Douglas Stebila, Simeon’s other co-supervisor and a Professor in the Department of Combinatorics and Optimization. “This is difficult to do without introducing substantial computational overhead. Yet Simeon developed an oblivious algorithm for binary database joins with a run time that matches that of the standard non-oblivious sort-merge join algorithm. Any model of computation that can support a sorting network on encrypted data can also support his algorithm.”

The research team of cryptographers has tested a working prototype of the oblivious algorithm, which consists of just around 600 lines of C++ code, as well as a version that uses Intel SGX.

A future application would be to apply the logic behind this algorithm to purely cryptographic problems that require similar kinds of operations. A number of database operations are very similar to joins in their computational complexity and the newly introduced oblivious primitives used in the join algorithm could potentially be applied to those as well.

“Both Douglas and I are in awe of Simeon’s breakthrough,” Professor Kerschbaum said. “His solution is scalable to the optimal complexity, it is provably leakage free, it supports the full functionality of equi-joins, and requires no random access memory. All that and its practical efficiency is already extremely good, much better than anything else. It’s an impressive solution to a long-standing problem.”

This research will be presented at VLDB 2020, the 46^th International Conference on Very Large Data Bases, which is being held virtually this year from August 31 to September 4. To learn more about this research, please see S. Krastnikov, F. Kerschbaum, D. Stebila. Efficient oblivious database joins. Proceedings of the VLDB Endowment, 13(11): 2132–45, 2020.