[1909.02749] Video Interpolation and Prediction with Unsupervised Landmarks

Prediction and interpolation for long-range video data involve the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical-flow-based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low-dimensional latent space, achieving long-range predictions. However, these latent representations are often non-interpretable, and therefore difficult to manipulate. This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space. The latent representations capture foreground semantics without explicit supervision such as keypoints or poses. Further, as each landmark can be mapped to a coordinate indicating where a semantic part is positioned, we can reliably interpolate within the coordinate domain to achieve predictable motion interp
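
To make the idea of interpolating "within the coordinate domain" concrete, here is a minimal sketch that blends landmark coordinates between two frames. It is not the paper's method: the abstract does not specify the interpolation scheme, so plain linear interpolation is an assumption made for illustration, and the names `Landmarks` and `interpolate_landmarks`, as well as the (x, y) tuple layout, are hypothetical. Decoding the interpolated landmarks back into video frames is out of scope here.

```python
from typing import List, Tuple

# Hypothetical layout: one (x, y) coordinate per landmark in a frame.
Landmarks = List[Tuple[float, float]]


def interpolate_landmarks(
    start: Landmarks, end: Landmarks, num_intermediate: int
) -> List[Landmarks]:
    """Linearly interpolate each landmark's (x, y) between two frames.

    Returns only the intermediate frames, excluding `start` and `end`.
    Linear blending is an illustrative assumption, not the paper's scheme.
    """
    if len(start) != len(end):
        raise ValueError("both frames must have the same number of landmarks")
    if num_intermediate < 0:
        raise ValueError("num_intermediate must be non-negative")

    frames: List[Landmarks] = []
    for step in range(1, num_intermediate + 1):
        # t moves evenly from 0 (start frame) to 1 (end frame), endpoints excluded.
        t = step / (num_intermediate + 1)
        frames.append(
            [
                ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
                for (x0, y0), (x1, y1) in zip(start, end)
            ]
        )
    return frames
```

Because every landmark is a coordinate, each intermediate position can be inspected or edited directly, which is the interpretability advantage the abstract contrasts with opaque latent vectors.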

2 mentions: @animesh_garg
Date: 2019/09/11 23:18

Referring Tweets

@animesh_garg Latest paper with collaborators at @NvidiaAI on High-quality, long-range video interpolation, and extrapolation through unsupervised latent structure inference followed by a temporal prediction. Paper: t.co/IUFMim57Be K. Shih, @aysegl_dndr, R. Pottorff, A. Tao, @ctnzr t.co/ULvd62OGF5