Marble: A Multimodal World Model

by meetpateltech on 11/12/25, 10:11 PM with 78 comments
by chaosprint on 11/13/25, 3:11 PM

Fei-Fei is a great researcher. But to be honest, the progress her company has made in "world modeling" seems to deviate somewhat from what she has advertised, which is a bit disappointing. As this article (https://entropytown.com/articles/2025-11-13-world-model-lecu...) summarizes, she is mainly working on 3DGS applications. The problem is that, despite the substantial funding, this demo video clearly avoids the essentials; the camera movement is merely panning. It's safe to assume that adding even one extra second to each shot would drastically reduce the quality. It offers almost no improvement over the earliest 3DGS demos, let alone the addition of any characters.

by keyle on 11/13/25, 12:17 AM

I'm floored. Incredible work.

Also check out their interactive examples on the web app. They're a bit rougher around the edges, but they show real user input/output. Arguably such examples could be pushed further to better-quality output.

e.g. https://marble.worldlabs.ai/world/b75af78a-b040-4415-9f42-6d...

e.g. https://marble.worldlabs.ai/world/cbd8d6fb-4511-4d2c-a941-f4...

by jtfrench on 11/13/25, 3:40 AM

I like that they distinguish between the collider mesh (lower poly) and the detailed mesh (higher poly).

As a game developer I'm looking for:

• Export low-poly triangle mesh (ideally OBJ or FBX format — something fairly generic, nothing too fancy)
• Export texture map
• Export normals
• Bonus: export the scene as "de-structured" objects (e.g. instead of a giant world mesh with everything baked into it, separate exports for foreground and background objects to make it more game-engine-ready)

Gaussian splats are awesome, but not critical for my current renderers. Cool to have though.
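
If exports like that existed, here's a rough Python sketch of how I'd consume one (marble_world.obj is a hypothetical filename, and trimesh is just a convenient third-party library, nothing Marble ships):

    # Hypothetical post-processing of an exported world mesh.
    import trimesh

    # Load the exported world mesh (trimesh reads OBJ/GLB, among others).
    world = trimesh.load("marble_world.obj", force="mesh")

    print(len(world.vertices), "vertices,", len(world.faces), "faces")
    print("has UVs:", hasattr(world.visual, "uv") and world.visual.uv is not None)

    # Approximate a "de-structured" export by splitting the baked world mesh
    # into connected components and exporting each piece as its own asset.
    for i, part in enumerate(world.split(only_watertight=False)):
        part.export(f"marble_part_{i:03d}.obj")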

by jtfrench on 11/13/25, 3:42 AM

Is Marble's definition of a "world model" the same as Yann LeCun's definition of a world model? And is that the same as Genie's definition of a world model?

by morgango on 11/13/25, 4:45 PM

Are we nearing the capability to build something like the Mind Game (from the Ender's Game book)?

by msteffen on 11/13/25, 1:04 AM

I understand that DeepMind is working on this too: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...

I wonder how their approaches and results compare?

by dvrp on 11/13/25, 3:02 AM

Isn't this a Gaussian Splat model?

I work in AI and, to this day, I don't know what they mean by “world” in “world model”.
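
For reference, here's roughly what a Gaussian-splat scene representation stores (an illustrative sketch, not Marble's internals), which is part of why I'm unsure where the "world" comes in:

    import numpy as np

    N = 100_000  # number of splats in the scene

    splats = {
        "means": np.zeros((N, 3)),          # 3D center of each Gaussian
        "scales": np.ones((N, 3)),          # anisotropic extent per axis
        "rotations": np.tile([1.0, 0, 0, 0], (N, 1)),  # orientation quaternions
        "opacities": np.full((N, 1), 0.5),
        "sh_coeffs": np.zeros((N, 16, 3)),  # view-dependent color (spherical harmonics)
    }

    # This is a *static* scene description that a rasterizer can render from
    # any camera pose; a world model in the LeCun/Genie sense would also have
    # to predict how the state evolves over time given actions.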

by anilgulecha on 11/13/25, 7:24 AM

This is going to take movie-making to another level, because now we can: (1) generate a full scene, and (2) generate a character with specific movements.

Combine these 2, and we can have moving cameras as needed (long takes). This is going to make storytelling very expressive.

Incredible times! Here's a bet: we'll have an AI superstar (A-list level) within the next 12 months.

by theiz on 11/13/25, 9:13 AM

Nice tech! It would be great if this could also work on factual data, like design drawings. With that it could be used for BIM and regulatory purposes, for example to show residents how a planned new residential area will look, or to test the layout of a planned airport.

by venom2001viper on 11/13/25, 4:26 PM

It seems that not all great researchers can create great companies or products.

by kkukshtel on 11/13/25, 4:13 AM

Incredibly disappointing release, especially for a company with so much talent and capital.

Looking at the worlds generated here (https://marble.worldlabs.ai/), it looks a lot more like they are just doing image generation of multiview/stereo 360° panoramas and then reprojecting them into space. The generations exhibit all the same image artifacts that come from this type of scanning/reconstruction work, all the same data-shadow artifacts, etc.

This is more of a glorified image generator, a far cry from a "world model".
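
To make that concrete, here's a rough numpy sketch of "generate a panorama plus depth, then reproject it into space" (my assumption about the pipeline, not anything World Labs has confirmed); it also shows why anything occluded from the panorama center ends up as a data shadow:

    import numpy as np

    def panorama_to_points(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
        """Unproject an equirectangular RGB-D panorama into a colored point cloud.

        rgb: (H, W, 3) image; depth: (H, W) distance from the camera center.
        """
        H, W = depth.shape
        # Longitude/latitude for each pixel of the equirectangular image.
        lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi
        lat = np.pi / 2 - (np.arange(H) + 0.5) / H * np.pi
        lon, lat = np.meshgrid(lon, lat)

        # Spherical -> Cartesian ray directions, scaled by per-pixel depth.
        dirs = np.stack([np.cos(lat) * np.sin(lon),
                         np.sin(lat),
                         np.cos(lat) * np.cos(lon)], axis=-1)
        points = dirs * depth[..., None]

        # (H*W, 6) array of xyz + rgb. Surfaces hidden from the single
        # panorama center simply have no samples here, i.e. data shadows.
        return np.concatenate([points.reshape(-1, 3), rgb.reshape(-1, 3)], axis=1)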

by abixb on 11/13/25, 1:11 AM

As someone with only a bare-bones understanding of "world models," how does this differ from sophisticated game engines that generate three-dimensional worlds? Is it simply the use of a transformer architecture to generate the 3D world, versus the static/predictable scripts game engines use (learned dynamics vs. deterministic simulation mimicking 'generation')? Would love an explanation from SMEs.
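
Roughly what I mean by that contrast, in toy Python (purely illustrative, not either system's real API):

    import numpy as np

    def game_engine_step(pos, vel, dt=1 / 60, g=9.81):
        """Deterministic simulation: hand-written rules (here, just gravity)."""
        vel = vel + np.array([0.0, -g, 0.0]) * dt
        return pos + vel * dt, vel

    class LearnedWorldModel:
        """Learned dynamics: a network trained to predict the next state/frame."""

        def __init__(self, weights):
            self.weights = weights  # fitted from data, not authored by hand

        def step(self, state, action):
            # Stand-in for a neural-net forward pass.
            return np.tanh(state @ self.weights + action)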

by hobofan on 11/12/25, 10:52 PM

Duplicate: https://news.ycombinator.com/item?id=45902732

by girfan on 11/13/25, 12:56 AM

This seems very interesting. Timely, given that Yann LeCun's vision also seems to align with world models being the next frontier: https://news.ycombinator.com/item?id=45897271

by dcl on 11/13/25, 3:16 AM

What happens when you prompt one of these kinds of models with de_dust? Will it autocomplete the rest of the map?

edit: Just tried it and it doesn't, but it does a good job of creating something like a CS map.

by john_minsk on 11/13/25, 7:46 AM

This is great. Can I use it with an existing scan of my room to fill in the gaps, rather than generating a random world?

Update: yes, you can. To be tested.

by coolfox on 11/13/25, 8:31 AM

This prompt seems to be blocked: "Los Angeles moments before an 8-mile-wide asteroid impacts." Others work, but when I use that one it always says it's 'too busy'.

It seems anything to do with asteroids (or explosions, I imagine) is blocked.

by ProofHouse on 11/13/25, 3:38 AM

RIP GTA6

by badmonster on 11/13/25, 2:18 AM

Exciting progress toward world intelligence.

by cubefox on 11/12/25, 11:47 PM

Impressive!

by Marshferm on 11/13/25, 4:03 AM

If biology operated this way, animals would never have evolved. This is bunk; it has nothing to do with intelligence and everything to do with hyping the oxymoron/paradox branded as spatial intelligence. It's a flimsy way of using binary to burn energy.

Senses do not represent; if we needed them to in order to survive, we'd be dead before we ever saw that branch, or that step, etc. This is the same mistaken approach cog-sci took in inventing the mind from the brain.

The problem is that the whole brain, prior to sensory and emotional integration, is deeply involved, so that an incredibly shallow oscillatory requirement sits atop the depth of long-term path integration and memory consolidation, which involves allocentric and egocentric references of space and time. These are then correlated into affinities by sense-emotion relays, or values. None of this is here; it's all discarded for junk volumes made arbitrary, whereas the spaces here are immeasurably specific, fused by sharp-wave ripples.

There's no world model either in the instantaneous, shallow optic-flow response (knowing when to slam on the brakes, or pull in for a dive) or in the deep entanglements of memory, imagination, and creativity.

This is one-size-fits-all nonsense. It's junk. It can't hope to resolve the space-time requirements of shallow and deep experiences.