FHE2: Tips and tricks for practical homomorphic encryption 01 Sep 2019
In the last post we had a look at the basics of Homomorphic Encryption. In this post we cover a few advanced tricks that are useful in practice especially with respect to performance. To that end, we have a look at the example of calculating the dot product of vectors. Doing that, we learn about how to use different encodings and specific optimizations that can be used with the batch encoding.
Case study: dot product of vectors
As already mentioned this entire post will deal with the dot product of vectors.
Given two vectors A=[a1,a2,...,aN]
and B=[b1,b2,...,bN]
, the dot product A*B
of A
and B
is defined as A*B=a1*b1+a2*b2+...+aN*bN
.
Example: [1,2,3]*[2,2,2] = 1*2 + 2*2 + 3*2 = 12
The dot product is an important operation for neural networks, which makes this example relevant for many applications.
In Python the calculation can be done with NumPy as follows…
… or manually.
Homomorphically encrypted dot product
Now assume that we still want to calculate A*B
for some vectors A
and B
, but we want A
to be encrypted.
This could be useful in a scenario where confidential data is fed into a neural network.
A naive approach would encrypt all numbers in A
separately and perform one multiplication for each position in the vector. This could look like this:
We can verify this simple implementation:
Performance
In order to analyze the performance of this naive approach we calculate the dot product with random inputs of different length.
We use the ipython magick %timeit
for the performance measurement.
Now let’s compare that to the processing times with plaintext, once using the dot
function from earlier and one using numpy.
We see that the performance of the encrypted dot product is about a factor of 2500 times slower than the plaintext operation in python. Also, multiplying more numbers takes more time, which should not be too surprising. In the next section we will improve this.
Parallelization with the batch encoding
While the naive approach offers very poor performance, we can use one of SEAL’s more advanced encodings, the batch encoding, in order to multiply 4096 (or whatever our polynomial modulus is) numbers at once. We can rewrite the dot product to use batch encoding as follows:
We now represent a whole vector of numbers in a single ciphertext and multiply all elements in A
and B
in one step.
After that we make clever use of the rotate function.
In order to sum up all entries of the vector, using only a logarithmic number of rotate operations, we rotate and add intermediate results.
If we call the intermediate result of the first loop i1
, the one of the second loop i2
, and so on, we can visualize the computation like this:
i1 = [product[1], product[2], ..., product[4095], product[4096]] +
[product[2], product[3], ..., product[4096], product[1]]
= [product[1]+product[2], product[2]+product[3], ...]
i2 = [i1[1], i1[2], ..., i1[4095], i1[4096]] +
[i1[3], i1[4], ..., i1[1], i1[2]]
= [product[1]+product[2]+product[3]+product[4], ...]
i3 = [i2[1], i2[2], ..., i2[4095], i2[4096]] +
[i2[5], i2[6], ..., i2[3], i2[4]]
= [product[1]+product[2]+...+product[8], ...]
...
This way the first entry in the resulting array contains the sum of all entries in product
:
We can now again test the performance the same way as before
A dotproduct of vectors of length 4096 now takes 57.7ms! Compared to the 3.52s from before that is a massive improvement.
Compare that with the dot
implementation in plain python, which took 1.46ms, and we see that the new encrypted dotproduct is less than 60 times slower!
A massive improvement.
Batch encoding with floating point numbers
In the use case of deep learning, the dot product often has to be computed on vectors of floating point numbers. As the batch encoding only supports vectors of integers, we cannot directly operate on floating point numbers.
Solutions to this are either using a framework that uses a crypto scheme supporting batches of floating point numbers (such as CKKS, implemented in SEAL 3), or treating the floating point numbers as fixed point numbers that can be scaled.
For example in order to have 3 digits accuracy, all values can be multiplied with 1000 and then rounded, before encryption. The decrypted result of a multiplication then has to be divided by 1000000.
This of course means that the plaintext modulus p
needs to be quite large, as the highest absolute value that can be stored in batch encoding is p/2
.
This is not very desirable, as a larger plaintext modulus lead to less noise budget, unless the polynomial modulus is increased as well, which degrades performance.
One trick that sometimes helps is to use the Chinese Remainder Theorem.
Using the CRT we can create two PythonFHE objects with different plaintext moduli p1, p2
and perform all computations with both PythonFHE objects.
The results modulo p1
and p2
can then be used to recover the result modulo p1*p2
.
This way much bigger numbers can be stored.
In an individual use case however, it is most often necessary to manually determine adequate parameters based on the required precision, noise budget and performance.
Conclusion
In this post we implemented a commonly used vector operation with homomorphic encryption. We first tried a naive approach, which lead to very bad performance. A second approach used the batch encoding to store complete vectors in a single ciphertext. This was more difficult to implement, but improved the performance massively, as the number of multiplications was reduced.
Last but not at least, the last section discussed ways of storing floating point numbers with batch encoding and quickly described a way of using the Chinese Remainder Theorem in order to achieve smaller polynomial moduli.