Latest version of OpenAI SORA


Andrew Reid

Really impressive. I think what's fascinating right now is the nice user-friendly buttons and sliders like "remix strength" that they have. Obviously OpenAI has designed this to be usable by the general public, with very broad controls. I haven't looked into any API for Sora, but other platforms might be able to expose more technical or complex controls for more advanced users. The same applies to the filters they have added. If I recall, ChatGPT refuses to talk about certain subjects, whereas the API has no such limitation.

  • Administrators

It's like the very start of a director's career... A little bit of control over some minor productions, and then the skillset / toolset gets greater and greater, until you're painting a masterpiece.

All about that UI

The model will evolve naturally but the UI can be fucked up!

I can see a 3 or 4-way battle of AI software happening in the future, like the NLE wars between Adobe, Apple and Avid.

It will be especially interesting to see whether the "competitor" software relies on the same underlying model, or whether anyone will make their own. These models often work in layers, where the lower abstraction levels still use just a few models that may or may not be updated or improved upon.

My prediction is that we end up with complex AI tools that look like this classic xkcd, where the critical pillar will be a single image recognition model that a couple of grad students trained in 2008 with 1,000 images they took around campus.

https://xkcd.com/2347/

I thought this was a very disappointing announcement from OpenAI.

There was a lot of hype about Sora, but instead it was just yet another example making it increasingly clear that "There is No Moat", and that the gap between OpenAI and the rest has been closing. In fact, not only has the gap closed, we can't even regard them as #1.

For instance, in the case of Sora, just look at Hunyuan, Hailuo or Kling.

Or for GPT-4 or o1, compare against Claude.

On 12/11/2024 at 9:17 AM, KnightsFan said:

It will be especially interesting to see whether the "competitor" software relies on the same underlying model, or whether anyone will make their own.

No, they're completely new models we're discussing here. 

Sure, OpenAI does take their GPT-4 model, for instance, and constantly releases refinements of it. Or you or I could take some open-source weights and train something on top of them.

But something like Sora is "made from scratch". 

Anyway, I thought this was an interesting tweet thread:

https://x.com/deedydas/status/1866509455896260813 

I reckon it's a great analogy, "Gymnastics is The Turing Test of Generative Video AI"

  • Administrators

Still very early days though.

Video is a proxy for AI's understanding of reality... in terms of a world simulator. Physics, perspective, light, and a lot more.

In the future it won't just be generating two dimensional eye-candy but immersive simulations which are interactive like video games.

And the fine grained control over the generation of this sort of content is going to be a very powerful thing, where the director's vision can be put out into this world in a very precise way fully under human supervision, and leaving a lot of room for direct human intervention and editing.

It's fascinating that the AI revolution has started in the same way as the earliest mainstream computer software.

  • First with text commands only
  • Then with a graphical UI like Premiere as we see with Sora
  • Giving rise to rich multimedia, video streaming and 3D worlds

It really shows something fundamental to mathematics.

That the innate thing about it is that it evolves language into worlds.

Makes me think that universes and reality are simply an expression of mathematical language.

And that the universe itself - our world - is a kind of computer simulation.

(My Sora wishlist btw... https://fullframe.ai/2024/02/21/the-ai-directors-wishlist-features-filmmakers-need-from-openai-and-sora)

13 hours ago, Andrew Reid said:

Still very early days though.

Indeed! To compare it with other breathtaking, revolutionary technology changes, maybe it's like the difference between the "mobile phones" (well, field phones) I played with as a kid:

[image]

vs the mini supercomputer phones we hold in the palm of our hands today in 2024!

The massive leap forward from one to the other was almost completely unimaginable!

Then again, perhaps we'll hit a wall and have no more progress at all for the next half century? 

After all, since AlphaGo way back in 2016 (a shockingly breathtaking new development in AI that achieved what mainstream AI researchers thought was still decades away) and the very famous "Attention Is All You Need" paper (published in 2017), we haven't seen anything truly groundbreaking and paradigm-shifting announced in AI.

Personally, just as an obsessed AI nerd who has been observing this over the decades, I feel everything since then has just been building upon and refining those earlier groundbreaking insights, which laid the foundations for what we see now.

Everything else since then has been:

1) further refinements and developments building upon those earlier breakthrough foundations, such as going from GPT-3 to GPT-4
2) a better UI wrapped around existing core AI tech
3) these new insights being applied to new unexplored fields, such as video generation, but still with the same underlying idea at work

That's why I think it's possible we might not see another decades-forward leap in AI like the one we saw 8 years ago, as that came as a surprise to everyone (well, everyone outside Google's DeepMind!).

But that doesn't matter: there are enough low-hanging refinements to make (GPT-5 when?) and enough unexplored new ground (just recently people have been making AI-generated video games! Mind-blowing) to keep people busy for many years to come.

For instance, even if current core AI tech doesn't advance another inch, it's already good enough to replace 80% of the workers in a call center. It's just the implementation, doing it effectively and converting current call centers to being 80% AI-based, that will take a few years to get right. That's just one example of an industry that will be turned upside down even if our core AI tech doesn't improve any more.

Just because they're incremental improvements upon existing ideas, doesn't mean they can't still be high impact improvements. 

13 hours ago, Andrew Reid said:

Makes me think that universes and reality are simply an expression of mathematical language.

As a maths graduate myself, I 100% agree with this. 

Maths is the language of the universe. 

13 hours ago, Andrew Reid said:

Ohhh... you have a new website! ♥️ Looking good. 

22 hours ago, Andrew Reid said:

Excellent write-up! Been experimenting with Runway a lot lately since the production company has the unlimited license. But past the initial impressiveness, these prompt-based AI generation tools do show their limits quite quickly imo. Never been a fan of stock footage, and to me this is really still just customisable stock footage. Great potential if they did indeed add log, sensor size, lens choice etc. Generating from stills can help get you that precise look, but it's still way too quirky and uncanny valley for pro use. Being prompt-based also leads to too much free interpretation, with odd quirks. At least with Sora and the remix function you can potentially get more accurate results via subsequent prompts, a bit like how ChatGPT works. Love the storyboard feature with 4 variations. Very eager to try it... but I can already see the limits of its actual use. No 4K being a big one, as you point out. For storyboards and pre-production it's fantastic though.

On 12/13/2024 at 7:37 AM, Django said:

 

Excellent write-up! Been experimenting with Runway a lot lately since the production company has the unlimited license. But past the initial impressiveness, these prompt-based AI generation tools do show their limits quite quickly imo. Never been a fan of stock footage, and to me this is really still just customisable stock footage. Great potential if they did indeed add log, sensor size, lens choice etc. Generating from stills can help get you that precise look, but it's still way too quirky and uncanny valley for pro use. Being prompt-based also leads to too much free interpretation, with odd quirks. At least with Sora and the remix function you can potentially get more accurate results via subsequent prompts, a bit like how ChatGPT works. Love the storyboard feature with 4 variations. Very eager to try it... but I can already see the limits of its actual use. No 4K being a big one, as you point out. For storyboards and pre-production it's fantastic though.

Exactly this. It feels like a random footage generator rather than a tool. I've seen some impressive stuff in Runway promos, but they generate 1000s of videos in order to pick one.

Google Veo 2 looks to have excellent physics... here's a YouTube video comparing AI video models, starting at 15:39. Kling just released 1.6 on December 19th, the day this video was released, so this is most likely Kling 1.5... 1.6 is of course supposed to be better. One thing to note: filmmakers need consistent characters, and there are ways to train the image generator to produce a starting frame for the video. Kling allows you to train it to generate consistent characters. I think 2025 will be when AI video comes into its own.

 

 

Google's version looks good, too. I do think we're still a couple years off from good, reliable video generators for serious videos. 2025 will see a flood of "content creators" using it for sure, but gluing together multiple layers of neural nets and traditional programming into a cohesive unit will take some time.

Image generators typically work on multiple abstraction layers, so the model has a concept of what a "cat" looks like inside a prompt like "cat holding a beer." To solve physics and object permanence, I believe that video models will need to have a concept of 3D space and the objects in that space, with specific characters conceptualized as entities that can be reused, etc. So I think significantly more work will need to be done on each layer of the network to get beyond making portraits with minor movement.

Then of course going off my earlier comment, I think that a significant constraint for something like Sora is that it's designed to be used by absolute amateurs. Professional software, like an integration into Adobe CC or DaVinci Resolve, can expose more controls or even basic scripting (e.g. in Fusion) and expect users to reference a manual to learn it all. The user base for that is so much smaller, it will take more time to get there.
