In the example directly above, the real-life video on the right (woman in red dress) is used to ‘puppet’ the captured identity (man in blue shirt) on the left via RigNeRF, which (the authors claim) is the first NeRF-based system to achieve separation of pose and expression while also being able to perform novel view synthesis.
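The pose/expression separation described above can be pictured as a deformation field conditioned on 3DMM head-pose and expression codes, feeding a canonical NeRF that is queried for color and density. The sketch below is a minimal illustration of that idea only, not the authors' implementation; all function names, dimensions, and the stand-in single-layer "MLPs" are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a RigNeRF-style forward pass: 3DMM pose and
# expression codes drive a deformation field; geometry and appearance
# live in a canonical radiance field. Not the authors' code.

rng = np.random.default_rng(0)

def tiny_mlp(in_dim, out_dim):
    """A single random linear layer + tanh, standing in for a real MLP."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ W)

POSE_DIM, EXPR_DIM = 6, 10                           # 3DMM pose + expression codes
deform_net = tiny_mlp(3 + POSE_DIM + EXPR_DIM, 3)    # observed point -> 3D offset
canonical_net = tiny_mlp(3 + 3, 4)                   # (canonical point, view dir) -> RGBσ

def render_sample(x, d, pose, expr):
    """Deform an observed-space point into canonical space, then query NeRF.

    Pose and expression only influence the deformation, so the canonical
    field stays fixed -- this is the 'separation' the paper claims.
    """
    offset = deform_net(np.concatenate([x, pose, expr]))
    x_canonical = x + offset
    return canonical_net(np.concatenate([x_canonical, d]))

out = render_sample(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                    np.zeros(POSE_DIM), np.zeros(EXPR_DIM))
print(out.shape)   # (4,) -> RGB + density
```

Because novel view synthesis only changes the view direction `d` while reenactment only changes `pose`/`expr`, the two controls are independent in this factorization.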
The male figure on the left in the image above was ‘captured’ from a 70-second smartphone video, and the input data (including the entire scene information) was subsequently used to train the model across four V100 GPUs.
Since 3DMM-style parametric rigs are also available as whole-body parametric CGI proxies (rather than just face rigs), RigNeRF potentially opens the door to full-body deepfakes, in which real human movement, texture and expression are passed to the CGI-based parametric layer, which would then translate action and expression into rendered NeRF environments and videos.
As for RigNeRF – does it qualify as a deepfake method in the sense that the headlines currently understand the term? Or is it just another semi-hobbled also-ran to DeepFaceLab and the other labor-intensive, 2017-era autoencoder deepfake systems?