Even Meta CEO Mark Zuckerberg weighed in. He wrote in a Facebook post: “This is pretty amazing progress. It is much more difficult to generate videos than photos because beyond the correct generation of each pixel, the system [must also] predict how they will change over time.”
According to a white paper (pdf) published by the Meta AI team behind Make-A-Video, text-to-video generation lags behind text-to-image generation for two main reasons: a lack of large-scale datasets of high-quality text-video pairs, and the complexity of modeling higher-dimensional video data.
To circumvent this problem, Meta combined data from three open-source image and video datasets to train its model. The text-to-image datasets of labeled still images allowed the AI to learn the names of objects and their appearance; the video data helped it learn how objects move. This removed the reliance on text-video pairs for text-to-video generation.
Or, as Ars Technica explained, Meta took image synthesis data (still images trained with captions) and applied unlabeled video training data so that the model learned where a text or image prompt might exist in space and time. This allowed Make-A-Video to predict what comes after the image and display the scene in motion for a short time.
To be clear, Meta has shared just over a dozen AI-generated videos so far. Additionally, each video clip is no longer than five seconds and contains no sound. So it’s hard to say how easy it is to generate usable videos, especially since the system is not currently available to the public.
For its part, Meta said it is restricting access because it wants to be thoughtful about how it builds next-generation AI systems. For now, you can register your interest in case Meta ever decides to make it available.
The road ahead
Naturally, Meta’s announcement has rekindled concerns that sophisticated text-to-image and text-to-video tools will fuel widespread misinformation by lowering the barriers to creating fake media. (OpenAI removed its waiting list for DALL-E 2 last week, making it accessible to anyone willing to verify their email address and mobile number.)
Videos created with Make-A-Video currently carry a small “Meta AI” logo in the lower right corner, which is easy to remove with a video editing tool. Some commenters have called for larger watermarks that cannot be removed without rendering the AI-generated video useless.
However, what Meta decides to do may not matter in the long run. The existence of Make-A-Video will only spur other AI labs to build their own text-to-video systems with equivalent capabilities, or to release an improved text-to-video system by building on Meta’s research.
And if that isn’t already too much to process, a report on Ars Technica raised the ethical question of commercial media being used for non-commercial academic research that is then integrated into a commercial AI product.
Specifically, researcher Simon Willison found that Meta used more than 10 million videos pulled from Shutterstock without permission, while researcher Andy Baio found an additional 3.3 million videos taken from YouTube.
You can read Baio’s thought-provoking commentary titled “AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability” here.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer and professor of computer science, he enjoys writing code and prose. You can reach him at [email protected].
Image credit: DALL-E 2