
Use Case: DeepFake Origin Detection

8 min read
Yury Zhauniarovich

Image by [Matt Groh](https://www.media.mit.edu/projects/detect-fakes/overview/) from [MIT Media Lab](https://www.media.mit.edu/)

In the last several years, we have constantly heard that we are close to living in a post-truth era. Every TV set speaks about fake news, confirmation bias and, more recently, about deepfakes: videos processed with deep learning techniques where the face of one person is replaced with another's.

If several years ago we were just laughing at the first deepfake attempts, now the technology has matured, and we have to find a way to live in this new reality.

Introduction

We, human beings, have already learned that we should not trust text and image sources (because they can be modified quite easily); however, we still tend to trust video. We should not anymore, because of the development of deepfake technology. While the first deepfake videos, born in research labs several years ago, could easily be detected even by a non-technical person, the more recent examples look so natural that even a professional cannot tell that a video has been manipulated.

Indeed, the technology develops very rapidly. Every year, hundreds of scientific papers are published on this topic, achieving more and more realistic results. Moreover, the adoption of this research also blows my mind. Projects such as DeepFaceLab, actively supported by the community and incorporating the most recent state-of-the-art techniques, now allow anyone to create a deepfake video clip on almost commodity hardware: you just need a good dataset of images of the person being faked and a powerful GPU.

Not surprisingly, detecting deepfakes has become a paramount task. For instance, the Kaggle competition to detect facial or voice manipulation held a year ago offered the platform's highest prize of 1 million US dollars. Interestingly, none of the 2,114 teams was able to reach 70% accuracy on an unseen validation set. This shows that the problem of deepfake detection is acute and not yet solved.

At Vertx, we do not try to address the problem of deepfake detection itself. However, we have found that our algorithm allows us to attack an accompanying issue, namely deepfake source detection: if the original video clip has already been indexed by our algorithm, our system will be able to find it given a deepfake sample. This is possible because our algorithm relies on motion detection to identify copies. Deepfake algorithms substitute faces, but the movements in the video remain the same.
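To illustrate why motion survives a face swap, here is a toy sketch (emphatically not the Vertx algorithm, just an illustration using OpenCV; the file names in the comments are hypothetical): it reduces every frame to a tiny grayscale image and records the mean absolute difference between consecutive frames. Because a face swap repaints only a small region of each frame, this global motion signal of a deepfake stays very close to that of the original.

```python
# Toy illustration only: NOT the Vertx algorithm. It shows why a
# motion-based signature survives a face swap: the per-frame global
# motion energy barely changes when only the face region is repainted.
import cv2
import numpy as np

def motion_signature(video_path, size=(32, 32)):
    """Mean absolute difference between consecutive downscaled grayscale frames."""
    cap = cv2.VideoCapture(video_path)
    signature, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        tiny = cv2.resize(gray, size).astype(np.float32)
        if prev is not None:
            signature.append(float(np.abs(tiny - prev).mean()))
        prev = tiny
    cap.release()
    return np.array(signature)

# Signatures of an original clip and its deepfake should correlate
# strongly, because the face swap alters only a small image region:
# sig_orig = motion_signature("original.mp4")   # hypothetical files
# sig_fake = motion_signature("deepfake.mp4")
```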

To show how our system can be used to find the origins of deepfake videos, I downloaded several popular deepfake clips and analyzed them using our platform. I chose the most popular videos from the users mentioned on the DeepFaceLab GitHub page.

note

Because some of the deepfake video clips considered here are assembled from short pieces of the original movies, we reduced the minimum match time from the default 15 seconds to 5 (you can do this via the command line client parameters).

DeepFake Examples

Joe Biden in World War Z

The first video I chose to analyze is one where Joe Biden's face replaces a zombie's in World War Z. It seems quite simple to analyze because, at first glance, it consists of one continuous piece of the movie processed with DeepFaceLab. You can watch the video sample below:

As a result of the analysis, our algorithm produced the following JSON document:

Vertx Search Results

```json
[
  {
    "matches": [
      {
        "metadata": {
          "album": null,
          "artist": null,
          "bucket": null,
          "cover_url": null,
          "imdb_id": 816711,
          "label": null,
          "title": "World War Z",
          "type": "movie",
          "uid": "6144699547833740691",
          "year": 2013
        },
        "segments": [
          { "duration": 22.4375, "que_offset": 22.6875, "ref_offset": 6382.3125 },
          { "duration": 6.5, "que_offset": 48.625, "ref_offset": 6407.9375 },
          { "duration": 7.0, "que_offset": 65.125, "ref_offset": 6450.0 },
          { "duration": 16.6875, "que_offset": 72.0, "ref_offset": 6479.6875 },
          { "duration": 10.375, "que_offset": 89.4375, "ref_offset": 6506.0625 },
          { "duration": 13.8125, "que_offset": 114.875, "ref_offset": 6550.125 },
          { "duration": 5.6875, "que_offset": 128.0, "ref_offset": 6586.4375 },
          { "duration": 5.6875, "que_offset": 145.75, "ref_offset": 6642.8125 }
        ],
        "uid": 6144699547833740691
      },
      {
        "metadata": {
          "album": null,
          "artist": null,
          "bucket": null,
          "cover_url": null,
          "imdb_id": 58700,
          "label": null,
          "title": "The Last Man on Earth",
          "type": "movie",
          "uid": "5890475500996554810",
          "year": 1964
        },
        "segments": [
          { "duration": 7.625, "que_offset": 3.0, "ref_offset": 2171.375 }
        ],
        "uid": 5890475500996554810
      }
    ],
    "media_type": "audio",
    "reason": null,
    "source_path": "WWZ.mkv",
    "source_uid": "15247207028104530174",
    "status": "succeeded"
  },
  {
    "matches": [
      {
        "metadata": {
          "album": null,
          "artist": null,
          "bucket": null,
          "cover_url": null,
          "imdb_id": 816711,
          "label": null,
          "title": "World War Z",
          "type": "movie",
          "uid": "6144699547833740691",
          "year": 2013
        },
        "segments": [
          { "duration": 35.3125, "que_offset": 22.9375, "ref_offset": 6382.625 },
          { "duration": 9.375, "que_offset": 62.8125, "ref_offset": 6447.8125 },
          { "duration": 17.0, "que_offset": 71.4375, "ref_offset": 6479.3125 },
          { "duration": 11.625, "que_offset": 88.4375, "ref_offset": 6505.3125 },
          { "duration": 12.75, "que_offset": 115.375, "ref_offset": 6550.4375 },
          { "duration": 6.0, "que_offset": 128.5, "ref_offset": 6587.0625 },
          { "duration": 6.625, "que_offset": 133.6875, "ref_offset": 6597.9375 }
        ],
        "uid": 6144699547833740691
      }
    ],
    "media_type": "video",
    "reason": null,
    "source_path": "WWZ.mkv",
    "source_uid": "15247207028104530174",
    "status": "succeeded"
  }
]
```

Unfortunately, this document is not easy for a human to analyze. To facilitate the analysis, we use a timeline representation of the data: a graph where the x axis represents the time in the query sample (in our case, the deepfake video clip), while on the y axis we list the titles of all the matches (we call them reference videos) we have managed to find. Each bar represents a match. By analyzing such a graph, it is possible to see which parts of the query sample have been found in the indexed data. Because our algorithm analyzes video and audio separately, for each query video we produce a pair of timeline graphs: one for the video modality and one for the audio. If you hover the mouse pointer over a bar, you will see the details of the corresponding match: the start of the match in the query (deepfake clip) and in the reference (source/original) video, and the duration of the match.
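To give a concrete idea of how such a timeline can be produced from the JSON document above, here is a minimal sketch using matplotlib. This is not our production visualization; the file name results.json is an assumption for the illustration.

```python
# Minimal sketch (not the production Vertx visualization): render the
# JSON search results shown above as per-modality timeline charts.
import json
import matplotlib.pyplot as plt

with open("results.json") as f:      # hypothetical dump of the document above
    results = json.load(f)

for result in results:               # one entry per modality (audio, video)
    fig, ax = plt.subplots(figsize=(10, 2 + len(result["matches"])))
    titles = []
    for row, match in enumerate(result["matches"]):
        titles.append(match["metadata"]["title"])
        # Each segment becomes a bar: x = offset in the query (deepfake)
        # clip, width = duration of the match, both in seconds.
        spans = [(s["que_offset"], s["duration"]) for s in match["segments"]]
        ax.broken_barh(spans, (row - 0.3, 0.6))
    ax.set_yticks(range(len(titles)))
    ax.set_yticklabels(titles)
    ax.set_xlabel("query time, s")
    ax.set_title(f"{result['source_path']} ({result['media_type']})")
    plt.tight_layout()
    plt.show()
```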

The graph below shows the results of the video track analysis. As you can see, our algorithm managed to detect that almost the whole deepfake is taken from the movie "World War Z". However, the results also show that the deepfake clip is not continuous: it consists of several pieces so well assembled that you would think it is one undivided part of the movie.

[Timeline chart: video-track matches between the deepfake clip and "World War Z"]

The next chart shows the matches our algorithm managed to find in the audio modality. As you can see, the audio is taken from two different sources: a short part at the beginning is from "The Last Man on Earth", while the rest is taken from "World War Z".

[Timeline chart: audio-track matches, a short piece from "The Last Man on Earth" followed by "World War Z"]

If you consider both the video and audio charts, you will notice that our system cannot detect the originals in the period from the 100th to the 115th second. This part seems to be a false negative: the algorithm cannot recognize the original. Perhaps there are several small pieces of the originals glued together, or the movements in the picture are negligible.

John Travolta is Forrest Gump

Even though I have watched the following deepfake video of John Travolta playing Forrest Gump many times, in my opinion Tom Hanks fits this role better. However, the deepfake is of very high quality, and it indeed shows how John Travolta could look in this role:

Although this video is assembled from several short pieces of the original movie, our algorithm is still able to find the original. The following chart for the video modality confirms this:

[Timeline chart: video-track matches between the deepfake clip and "Forrest Gump"]

As you can see, the algorithm finds many pieces of the original movie. However, it also fails in some cases: in the ranges of 40-65, 73-82, 89-102 and 137-145 seconds. I am not 100% sure, but it seems the algorithm fails during these periods because several short pieces (less than 5 seconds each) are glued together there.
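Such gap ranges can be read directly off the JSON document. A small hypothetical helper like the one below (the function name is my own; the 5-second threshold mirrors the minimum match duration we set earlier) computes the uncovered ranges of the query clip from the reported segments:

```python
# Hypothetical helper (not part of the Vertx client): compute the time
# ranges of the query clip that no matched segment covers.
def uncovered_ranges(match, clip_length, min_gap=5.0):
    # Sort covered intervals by their start position in the query clip.
    covered = sorted(
        (s["que_offset"], s["que_offset"] + s["duration"])
        for s in match["segments"]
    )
    gaps, cursor = [], 0.0
    for start, end in covered:
        if start - cursor >= min_gap:
            gaps.append((cursor, start))    # a hole before this segment
        cursor = max(cursor, end)
    if clip_length - cursor >= min_gap:
        gaps.append((cursor, clip_length))  # a hole after the last segment
    return gaps
```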

Interestingly, at 156-166 seconds the algorithm was able to detect the original movie, even though the picture is blurred and the channel's advertisement is overlaid in the foreground.

For the audio modality, we see a similar graph:

[Timeline chart: audio-track matches between the deepfake clip and "Forrest Gump"]

You can still see the gaps at 73-82 and 89-102 seconds, which makes me think the assumption that several pieces are glued together there is correct. However, for the 137-145 second period the algorithm managed to detect the original audio, so I conclude that this range is a false negative of our video matching algorithm. The remaining gap between the 13th and 40th seconds exists because the original soundtrack is drowned out by the narrator's voice-over.

Tom Cruise in Iron Man

Finally, let's consider a deepfake example where our algorithm does not perform well. Here is a video where deepfake technology has been used to show Tom Cruise playing Iron Man:

This video is difficult for our algorithm to analyze. Indeed, the deepfake consists of very short pieces glued together. Moreover, additional effects (like side-by-side pictures) have been added that make our algorithm fail. Still, it manages to find the original movie, but only a small part of it:

[Timeline chart: video-track matches between the deepfake clip and "Iron Man"]

The audio modality brought even more surprising results: there are four matches taken from three different movies. However, if you listen carefully to the soundtrack, you will hear background music playing all the time. Moreover, Robert Downey Jr.'s voice is replaced with Tom Cruise's. In my opinion, that is why the algorithm fails.

[Timeline chart: audio-track matches, four matches taken from three different movies]

Conclusion

As we have shown in this article, our algorithm can be useful for finding the sources of deepfake videos. Our platform can be a good accompanying solution for companies developing deepfake detection algorithms, allowing them not only to detect that a video has been manipulated but also to find the origin of the deepfake.