OpenAI teases an amazing new generative video model called Sora

It may be some time before we find out. OpenAI’s announcement of Sora today is a tech tease, and the company says it has no current plans to release it to the public. Instead, OpenAI will today begin sharing the model with third-party safety testers for the first time.

In particular, the firm is worried about the potential misuses of fake but photorealistic video. “We’re being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public,” says Aditya Ramesh, a scientist at OpenAI, who created the firm’s text-to-image model DALL-E.

But OpenAI is eyeing a product launch sometime in the future. As well as safety testers, the company is also sharing the model with a select group of video makers and artists to get feedback on how to make Sora as useful as possible to creative professionals. “The other goal is to show everyone what is on the horizon, to give a preview of what these models will be capable of,” says Ramesh.

To build Sora, the team adapted the tech behind DALL-E 3, the latest version of OpenAI’s flagship text-to-image model. Like most text-to-image models, DALL-E 3 uses what’s known as a diffusion model. These are trained to turn a fuzz of random pixels into a picture.
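As a rough illustration of the "fuzz of random pixels" idea, here is a minimal sketch of the noising step used in standard diffusion training, and how a clean image can be recovered once the noise is known. It assumes the common DDPM-style formulation; OpenAI has not said this is how DALL-E 3 or Sora is implemented, and the names `add_noise`, `recover_clean`, and `alpha_bar` are illustrative. In a real system a trained neural network would estimate the noise; here the true noise stands in for that estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, alpha_bar, rng):
    """Blend a clean image with Gaussian noise (assumed DDPM-style forward step)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

def recover_clean(x_t, eps_pred, alpha_bar):
    """Invert the blend, given an estimate of the noise that was added."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

# A toy 8x8 "image" with pixel values in [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))

# A small alpha_bar means the result is mostly noise.
x_t, eps = add_noise(x0, alpha_bar=0.1, rng=rng)

# Hypothetical perfect noise estimate: we reuse the true noise here.
x0_hat = recover_clean(x_t, eps, alpha_bar=0.1)
print(np.allclose(x0, x0_hat))
```

The training objective in this formulation is to make the network's noise estimate as close as possible to `eps`, which is what lets it work backward from noise toward a picture.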

Sora takes this approach and applies it to videos rather than still images. But the researchers also added another technique to the mix. Unlike DALL-E or most other generative video models, Sora combines its diffusion model with a type of neural network called a transformer.

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Tim Brooks, a scientist at OpenAI.
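The "little cubes" idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `cut_into_spacetime_patches`, the cube size, and the choice to cut raw pixels are all hypothetical, and the article does not describe Sora's actual patching code.

```python
import numpy as np

def cut_into_spacetime_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) video into flattened cubes of size (pt, ph, pw)."""
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each of time, height, and width into (number of cubes, cube size).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three cube-count axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per cube, one column per value inside the cube.
    return x.reshape(-1, pt * ph * pw * C)

# A toy video: 8 frames of 32x32 RGB.
video = np.arange(8 * 32 * 32 * 3, dtype=np.float32).reshape(8, 32, 32, 3)

# Cubes span 4 frames and 16x16 pixels (illustrative sizes).
tokens = cut_into_spacetime_patches(video, pt=4, ph=16, pw=16)
print(tokens.shape)  # (8, 3072)
```

Each row of `tokens` plays the role a word plays in a sentence: a transformer can take the rows as a sequence. A longer or larger video just produces more rows, which is one way a single model could handle clips of different sizes.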

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, including videos with different resolutions, durations, aspect ratios, and orientations. “It really helps the model,” says Brooks. “That is something that we’re not aware of any existing work on.”

“From a technical perspective it seems like a very significant leap forward,” says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. “But there are two sides to the coin,” he says. “The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse.”
