Understanding the Technology Behind DeepFakes

You can read all the posts in this series here:

If you are interested in reading more about AI Art (Stable Diffusion, Midjourney, etc) you can check this article instead: The Rise of AI Art.

In the previous part of this series, An Introduction to Neural Networks and Autoencoders, we explained the concept of autoencoding and how neural networks can use it to compress and decompress images.

The diagram above shows an image (in this specific case, a face) being fed to an encoder. The encoder's output is a lower-dimensional representation of that same face, sometimes referred to as a base vector or latent face. Depending on the network architecture, the latent face might not look like a face at all. When passed through a decoder, the latent face is reconstructed into a full image. Autoencoders are lossy, so the reconstructed face is unlikely to have the same level of detail that was originally present.
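The encode/decode pipeline can be sketched in a few lines. This is a minimal illustration, not the architecture used by any real deepfake tool: the dimensions are made up, and random, untrained weights stand in for a learned encoder and decoder. The point is only the shape of the data as it flows through.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the article): a 64x64 greyscale
# face flattened to 4096 values, compressed to a 128-value latent face.
IMAGE_DIM, LATENT_DIM = 64 * 64, 128

# Untrained random weights stand in for a learned encoder/decoder.
W_enc = rng.normal(scale=0.01, size=(IMAGE_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.01, size=(LATENT_DIM, IMAGE_DIM))

def encode(image):
    """Project the image down to its latent representation."""
    return np.tanh(image @ W_enc)

def decode(latent):
    """Reconstruct a full-size image from the latent face."""
    return latent @ W_dec

face = rng.random(IMAGE_DIM)          # a stand-in "face"
latent_face = encode(face)            # 128 values instead of 4096
reconstruction = decode(latent_face)  # back up to 4096 values

print(latent_face.shape, reconstruction.shape)  # (128,) (4096,)
```

Note how the latent face is over thirty times smaller than the input: everything the decoder produces must be recoverable from those 128 numbers, which is why the reconstruction is lossy.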

The programmer has full control over the shape of the network: how many layers, how many nodes per layer, and how they are connected. The real knowledge of the network is stored in the edges that connect the nodes. Each edge has a weight, and finding the right set of weights that makes the autoencoder work as described is a time-consuming process.

Training a neural network means optimising its weights to achieve a specific goal. In the case of a traditional autoencoder, the performance of a network is measured on how well it reconstructs the original image from its representation in latent space.


Training Deepfakes

It is important to note that if we train two autoencoders separately, they will be incompatible with each other: the latent faces are based on the specific features that each network deemed meaningful during its training. If two autoencoders are trained separately on different faces, their latent spaces will represent different features.

What makes face-swapping technology possible is finding a way to force both latent faces to be encoded with the same features. Deepfakes solve this by having both networks share the same encoder, while using two different decoders.

During the training phase, the two networks are treated separately. Decoder A is trained only on faces of A; Decoder B is trained only on faces of B. However, all latent faces are produced by the same encoder. This means the encoder itself has to identify features common to both faces. Because all faces share a similar structure, it is not unreasonable to expect the encoder to learn the concept of a “face” itself.
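The shared-encoder scheme can be sketched as follows. As before, this is a toy linear model with made-up dimensions and random stand-in data, not the actual deepfake training code; what it shows is the key structural trick: one encoder receives gradients from both subjects, while each decoder only ever sees its own subject.

```python
import numpy as np

rng = np.random.default_rng(2)

D, L, lr = 20, 5, 0.5                         # illustrative dimensions
faces_a = rng.random((64, D))                 # stand-in faces of Subject A
faces_b = rng.random((64, D))                 # stand-in faces of Subject B

W_enc = rng.normal(scale=0.1, size=(D, L))    # ONE shared encoder
W_dec_a = rng.normal(scale=0.1, size=(L, D))  # decoder trained only on A
W_dec_b = rng.normal(scale=0.1, size=(L, D))  # decoder trained only on B

def recon_error(X, W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

def train_step(X, W_dec):
    """One gradient step using the shared encoder and one subject's decoder."""
    global W_enc
    Z = X @ W_enc
    G = 2.0 * (Z @ W_dec - X) / X.size
    W_dec -= lr * (Z.T @ G)                   # updates this decoder in place
    W_enc -= lr * (X.T @ (G @ W_dec.T))       # updates the SHARED encoder

err_a0 = recon_error(faces_a, W_dec_a)
err_b0 = recon_error(faces_b, W_dec_b)

for _ in range(1000):
    train_step(faces_a, W_dec_a)   # A's batch trains the encoder + Decoder A
    train_step(faces_b, W_dec_b)   # B's batch trains the encoder + Decoder B

print(recon_error(faces_a, W_dec_a), recon_error(faces_b, W_dec_b))
```

Because `W_enc` is updated by both subjects' batches, it is pushed towards features that reconstruct A *and* B well, which is exactly the "common features" the paragraph above describes.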

Generating Deepfakes

When the training process is complete, we can pass a latent face generated from Subject A to Decoder B. As seen in the diagram below, Decoder B will try to reconstruct Subject B from information relative to Subject A.

If the network has generalised well enough what makes a face, the latent space will represent facial expressions and orientations. Decoding Subject A's latent face with Decoder B therefore generates a face of Subject B with the same expression and orientation as Subject A.
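At inference time the swap is just a mismatched encode/decode pair. In this sketch, random weights again stand in for a trained shared encoder and for Subject B's decoder (the dimensions are illustrative); the point is the pipeline, not the output quality.

```python
import numpy as np

rng = np.random.default_rng(3)
D, L = 20, 5                                  # illustrative dimensions

# Pretend these were learned as described earlier: one shared encoder,
# one decoder per subject. Random weights stand in for trained ones.
W_enc = rng.normal(scale=0.1, size=(D, L))
W_dec_b = rng.normal(scale=0.1, size=(L, D))

face_a = rng.random(D)                        # a frame showing Subject A

# The swap: compress A's expression and orientation into the shared
# latent space, then reconstruct with Subject B's decoder.
latent = face_a @ W_enc
fake_b = latent @ W_dec_b                     # Subject B, wearing A's expression

print(fake_b.shape)                           # same size as the input frame
```

Nothing about the decoder changes between training and inference; the deepfake comes entirely from feeding it a latent face it was never trained on, but which lives in the latent space it understands.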

To better understand what this means, you can have a look at the animation below. On the left, faces of UI Artist Anisa Sanusi are extracted from a video (link) and aligned. On the right, a trained neural network is reconstructing the face of game designer Henry Hoffman to match Anisa’s expression.

It should be obvious, at this point, that the technology behind deepfakes is not constrained to faces. It can be used, for instance, to turn apples into kiwis.

What is important is that the two subjects used in training share as many similarities as possible. This ensures that the shared encoder can generalise meaningful features that are easy to transfer. While this technique will work on both faces and fruits, it is unlikely to convert faces into fruits.


Conclusion

You can read all the posts in this series here:

Comments

21 responses to “Understanding the Technology Behind DeepFakes”

  1. Malika Rougaïbi

    Hello,

    My name is Malika and I’m a master’s student in law and life sciences at the University of Sherbrooke (Canada). As part of my studies, I’m finalizing an essay called “Deepfakes in Canadian law: examples of the inadequacy of the law in the face of this emerging technology”. I would like to use, with your permission, the first three (3) figures that appear in your article (Understanding the Technology Behind DeepFakes) in my essay to describe how a GAN works. I will cite the source of the figures as well as a direct link to your website. If you are the copyright holder of this figure, may I have your permission to publish the figures in my article which will be published in my university’s database (Savoir UdeS, https://savoirs.usherbrooke.ca/?locale-attribute=en )?

    This is a non-commercial publication, purely for research, for which I will not receive any income.

    Best regards,

  2. […] Part 6. Understanding the Technology Behind DeepFakes […]

  3. […] Part 6. Understanding the Technology Behind DeepFakes […]

  4. […] can learn more about this in Understanding the Technology Behind Deepfakes, although since 2017 several other techniques and architectures have been developed to swap […]

  5. Can I find the code that uses the same encoder but different decoders?

  6. AI vs AI: the Risks of Deepfakes and the Need of Regulations – AI & Society

    […] using autoencoders, two networks use the same encoder but different decoders for the training process. Then, in the […]

  7. Kunal Goyal

    Please tell me how to remove that message which shows on the screen after installing the software.

  8. the best explanation of deepfakes I have found to date

    1. Thank you!
      I have a plan to make an updated series since so much has changed in the past few years!

  9. […] Deepfake video, training stepSource: Understanding the Technology Behind DeepFakes […]

  10. Jessica

    Hi Alan! Thanks for your very informative blog posts. I just wanted to let you know that I actually cited your blog and used your info graphics in my Law Review note addressing some First Amendment concerns with regulating deepfakes. You can find the article here: https://scholarlycommons.law.case.edu/cgi/viewcontent.cgi?article=4854&context=caselrev
    I just wanted to let you know and say thanks!

    1. Thank you so much for letting me know, Jessica!
      I hope this article has helped you!

  11. Excellent post! One question: what are good resources for the mathematical formulation of deepfakes, specifically the training process for the networks in Fig. 2 and 3? Thanks in advance.

  12. […] you are curious to understand how face-swap technology works, have a look at this new tutorial about #DeepFakes. […]

  13. Do you have access to the training data for apples and kiwis? Is there any way I could download this model?

    1. Hi Andrew!

      No, I did not make that dataset public as it was not the best.
      I got a 3D model of a kiwi and an apple.
      And simply used Unity to render images of them at different angles.

  14. This is interesting. It would be fun to see if it’s possible to generate human faces from animal faces, like what animal do you look like and other similar applications.

    1. The reason why DeepFakes work is because when two faces are compressed into latent faces, the network is forced to identify similarities. Faces have a very similar structure, but the same might not happen for animals. I’d be happy to see it implemented though!

  15. This is fascinating. I wonder if this could be applied to other objects within a video. Take, for example, two national flags blowing in the breeze. Provided they have similar lighting and overall shape, could we change an Australian flag into a US flag?

    1. Definitely!
      FakeApp is based on an autoencoder that could, potentially, learn any mapping.
      I have a picture, further on, where I am swapping apples with kiwis.
      The only problem is that if you want to do that on a video, you’ll need to find the position of the apple first.
      There are a lot of face detection algorithms, but not so many for apples and flags! 😀
