OpenAI just revealed new software that lets you create realistic video by simply typing a descriptive prompt

2024-02-18 11:11 | Hayden Field


OpenAI announced Thursday it has expanded beyond text and images to offer video-generation AI for the first time.

The new model, called Sora, allows a user to type out a desired scene and turns it into a high-definition video clip.

AI-generated videos create another hurdle for platforms concerned about misinformation, especially as important elections are scheduled across the globe this year.

A Samoyed and a Golden Retriever dog are playfully romping through a futuristic neon city at night.


OpenAI, which burst into the mainstream last year thanks to the popularity of ChatGPT, is bringing its artificial intelligence technology to video.

The company on Thursday introduced Sora, its new generative AI model. Sora works similarly to OpenAI’s image-generation AI tool, DALL-E. A user types out a desired scene and Sora will return a high-definition video clip. Sora can also generate video clips inspired by still images, and extend existing videos or fill in missing frames.

Video could be the next frontier for generative AI now that chatbots and image generators have made their way into the consumer and business world. While the creative opportunities will excite AI enthusiasts, the new technologies present serious misinformation concerns as major political elections approach across the globe. The number of AI-generated deepfakes created has increased 900% year over year, according to data from Clarity, a machine learning firm.

With Sora, OpenAI is looking to compete with video-generation AI tools from companies such as Meta and Google, which announced Lumiere in January. Similar AI tools are available from other startups, such as Stability AI, which has a product called Stable Video Diffusion. Amazon has also released Create with Alexa, a model that specializes in generating prompt-based short-form animated children’s content.

Sora is currently limited to generating videos that are a minute long or less. OpenAI, backed by Microsoft, has made multimodality — the combining of text, image and video generation — a goal in its effort to offer a broader suite of AI models.

“The world is multimodal,” OpenAI COO Brad Lightcap told CNBC in November. “If you think about the way we as humans process the world and engage with the world, we see things, we hear things, we say things — the world is much bigger than text. So to us, it always felt incomplete for text and code to be the single modalities, the single interfaces that we could have to how powerful these models are and what they can do.”

Sora has thus far only been available to a small group of safety testers, or “red teamers,” who test the model for vulnerabilities in areas such as misinformation and bias. The company hasn’t released any public demonstrations beyond 10 sample clips available on its website, and it said its accompanying technical paper will be released later on Thursday.

OpenAI also said it’s building a “detection classifier” that can identify Sora-generated video clips, and that it plans to include certain metadata in its output that should help with identifying AI-generated content. It’s the same type of metadata that Meta is looking to use to identify AI-generated images this election year.

Sora is a diffusion AI model that, like ChatGPT, uses the Transformer architecture, introduced by Google researchers in a 2017 paper.

“Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI wrote in its announcement.