Why AI Image Generators Can’t Get Hands Right

March 7, 2023 | Artificial Intelligence

AI images have shocked the photography world with their hyper-realistic output. But there is seemingly one thing the generators keep stumbling over: hands. AI image generators such as DALL-E, Midjourney, and Stable Diffusion are notorious for adding an extra finger or fusing digits together, with nightmarish results.

Earlier this year, PetaPixel reported on realistic party pictures generated by AI. On closer inspection, though, the hands gave them away: one girl was holding a camera with eight fingers.

Why Is AI So Bad at Hands?

Part of the reason AI image generators struggle with hands is that, in the datasets used to train them, people's hands are much less visible than their faces, a Stability AI spokesperson tells BuzzFeed News.

“Hands also tend to be much smaller in the source images, as they are relatively rarely visible in large form.”

These 2D image generators also struggle to grasp the 3D geometry of a hand, according to Professor Peter Bentley, a computer scientist and author based at University College London.

“They’ve got the hang of the general idea of a hand. It has a palm, fingers and nails but none of these models actually understand what the full thing is,” he tells the BBC.

In PetaPixel’s tests, we asked Stable Diffusion and DALL-E to generate “two hands clasped together” and the results were typically monstrous.
