Imagine for a moment that we could hand a comic to an artificial intelligence and get back an animated video based on it, with the gaps between panels filled in. It would revolutionize the animation industry, right?
Well, that is exactly what a team of Google AI researchers could end up achieving: "advances in machine vision and machine learning make it an increasingly tangible goal," they say in their research paper (PDF).
For now, they have managed to generate realistic videos from nothing more than their first and last frames, inferring the most 'plausible' intermediate frames through a process known as 'inbetweening'.
The technology is based on convolutional neural networks, a model inspired by how the neurons of the primary visual cortex of a biological brain work, which is why it is commonly used for image analysis tasks.
In this case, the network has three components: a 2D image encoding network, another that generates 3D latent representations, and a final video generator.
"The key to the success of this technique lies in having a component dedicated to learning the latent video representation, separately from the final video decoding stage."
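To picture how those three pieces fit together, here is a minimal sketch in PyTorch, assuming a 2D-convolutional frame encoder, a 3D-convolutional latent video generator, and a 3D video decoder. Every class name, layer size, and the naive feature-space interpolation used to seed the latent video are illustrative assumptions, not the researchers' actual code.

```python
# Minimal sketch (not the paper's code): the three components described above,
# with all names and layer sizes chosen purely for illustration.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """2D-convolutional encoder: maps a single frame to a latent feature map."""
    def __init__(self, channels=3, features=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, features, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(features, features, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, frame):           # frame: (B, 3, H, W)
        return self.net(frame)          # -> (B, features, H/4, W/4)

class LatentVideoGenerator(nn.Module):
    """3D-convolutional component: expands start/end features into a latent video."""
    def __init__(self, features=64, num_frames=16):
        super().__init__()
        self.num_frames = num_frames
        self.net = nn.Sequential(
            nn.Conv3d(features, features, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(features, features, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, first_feat, last_feat):
        # Naive initialization of the latent video: linear interpolation
        # between the two feature maps, refined by the 3D convolutions.
        steps = torch.linspace(0, 1, self.num_frames, device=first_feat.device)
        latent = torch.stack(
            [(1 - t) * first_feat + t * last_feat for t in steps], dim=2
        )                                # (B, features, T, H/4, W/4)
        return self.net(latent)

class VideoDecoder(nn.Module):
    """Final video generator: decodes the latent video back into RGB frames."""
    def __init__(self, features=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(features, features, kernel_size=(1, 4, 4),
                               stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.ReLU(),
            nn.ConvTranspose3d(features, channels, kernel_size=(1, 4, 4),
                               stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.Tanh(),
        )

    def forward(self, latent):           # latent: (B, features, T, H/4, W/4)
        return self.net(latent)          # -> (B, 3, T, H, W)

# Usage: given only the first and last frame, produce the in-between video.
encoder, inbetween, decoder = FrameEncoder(), LatentVideoGenerator(), VideoDecoder()
first, last = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
video = decoder(inbetween(encoder(first), encoder(last)))   # (1, 3, 16, 64, 64)
```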
Another twist
Shortly after this paper was presented, another team of researchers (this time from Google Research) unveiled another AI-based system that, in this case, dispenses even with the initial and final frames: it is capable of generating videos from scratch and (here lies the novelty) producing results that are as diverse as they are realistic, much as already happens with deepfake face 'photos'.
To do this, they made use of an open source video dataset called Kinetics, whose clips include elements such as camera movement, complex object interactions, and a wide range of human movements. You can find more information in the corresponding paper (PDF).
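For a rough idea of what 'generating video from scratch' means in practice, the sketch below turns a random noise vector into a short clip using transposed 3D convolutions, in the spirit of a GAN generator. It is only an illustration of the general approach, not the system described in the paper, and every name and size in it is an assumption made up for the example.

```python
# Illustrative sketch only (not the paper's model): a generator that maps a
# random noise vector to a short video clip. Sizes are arbitrary.
import torch
import torch.nn as nn

class VideoFromNoise(nn.Module):
    def __init__(self, noise_dim=128, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the noise vector to a small 3D feature volume, then
            # upsample it in time and space with transposed 3D convolutions.
            nn.ConvTranspose3d(noise_dim, 256, kernel_size=(2, 4, 4)),             # -> (256, 2, 4, 4)
            nn.ReLU(),
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),      # -> (128, 4, 8, 8)
            nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),       # -> (64, 8, 16, 16)
            nn.ReLU(),
            nn.ConvTranspose3d(64, channels, kernel_size=4, stride=2, padding=1),  # -> (3, 16, 32, 32)
            nn.Tanh(),
        )

    def forward(self, z):                # z: (B, noise_dim)
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

# Sampling different noise vectors yields different clips, which is where the
# diversity of the generated videos comes from.
generator = VideoFromNoise()
clips = generator(torch.randn(4, 128))   # (4, 3, 16, 32, 32): four 16-frame clips
```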