The quest for AI that can generate and simulate consistent, interactive worlds in real-time has taken a monumental leap forward. On December 17, 2025, Tencent's Hunyuan team open-sourced HY-World 1.5, codenamed WorldPlay. This isn't just an incremental update; it's a comprehensive framework that claims to resolve the fundamental trade-off between speed, memory, and long-term consistency in world modeling.
In short, WorldPlay enables the generation of long-horizon, interactive streaming video at a stunning 24 FPS, all while maintaining geometric consistency over time. Let's dive into what makes this model so revolutionary.
## The Core Problem: Speed vs. Consistency
Previous world models, including the team's own HY-World 1.0, often faced a critical limitation. They could generate impressive 3D worlds but typically through a slow, offline process. Achieving real-time interaction meant sacrificing the long-term consistency of the environment—objects would morph, textures would flicker, and the geometry would drift over time. WorldPlay aims to shatter this compromise.
## The Four Pillars of WorldPlay's Architecture
The breakthrough is powered by four key technical innovations:
- Dual Action Representation: This is the "controller" of the model. It translates user inputs (such as keyboard and mouse movements) into a robust, model-understandable action space that allows precise and responsive control over the generated world's viewpoint (see the interactive-loop sketch after this list).
- Reconstituted Context Memory: This is the core of long-term consistency. To prevent the model from "forgetting" the past, this module dynamically rebuilds context from previously generated video chunks. It uses a technique called temporal reframing to keep geometrically important frames from the distant past accessible, effectively mitigating memory attenuation.
- WorldCompass, a novel RL post-training framework: After initial training, the model undergoes a reinforcement learning (RL) phase designed specifically for long-horizon tasks. WorldCompass directly optimizes the model for better action-following and higher visual quality over extended sequences, keeping the output stable and coherent.
- Context Forcing, memory-aware distillation: To achieve real-time speeds, a smaller, faster "student" model is distilled from a larger "teacher" model. Standard distillation, however, can cause the student to lose its ability to use long-range context. Context Forcing is a distillation method that aligns the memory context between teacher and student, preserving the student's capacity for long-term reasoning while enabling 24 FPS generation (a toy distillation sketch also follows below).
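To make the first two pillars concrete, here is a minimal, hypothetical sketch of an interactive generation loop that wires a dual action representation and a reconstituted context memory together. All names here (`Action`, `ContextMemory`, `generate_chunk`) are illustrative assumptions, not the actual WorldPlay API:

```python
# Hypothetical sketch of an interactive streaming loop with a dual action
# representation and a reconstituted context memory. Names and logic are
# illustrative assumptions, not the WorldPlay implementation.
from dataclasses import dataclass, field


@dataclass
class Action:
    """Toy dual action representation: discrete movement plus continuous camera deltas."""
    move: tuple     # e.g. (1, 0) for "forward", (0, -1) for "strafe left"
    yaw: float      # mouse delta around the vertical axis
    pitch: float    # mouse delta around the horizontal axis


@dataclass
class ContextMemory:
    """Short window of recent frames plus sparse anchor frames from the distant past."""
    recent: list = field(default_factory=list)
    anchors: list = field(default_factory=list)
    window: int = 16
    anchor_every: int = 64

    def update(self, frames, step):
        self.recent = (self.recent + frames)[-self.window:]
        if step % self.anchor_every == 0 and frames:
            self.anchors.append(frames[-1])   # keep a geometrically important frame

    def rebuild(self):
        # Reconstitute the conditioning context: distant anchors first, then recent frames.
        return self.anchors + self.recent


def generate_chunk(context, action):
    """Placeholder for the video model's next-chunk generation call."""
    return [f"frame(move={action.move}, ctx={len(context)})"]


def play(actions):
    memory = ContextMemory()
    for step, action in enumerate(actions):
        context = memory.rebuild()            # long- and short-range conditioning frames
        frames = generate_chunk(context, action)
        memory.update(frames, step)
        yield from frames                     # streamed to the user


if __name__ == "__main__":
    demo = [Action(move=(1, 0), yaw=0.0, pitch=0.0) for _ in range(4)]
    for frame in play(demo):
        print(frame)
```

The point of the sketch is that the conditioning context is rebuilt at every step rather than growing without bound: sparse anchor frames keep the geometry of the distant past pinned down, while a short window of recent frames keeps motion smooth.

Memory-aware distillation can be pictured in a similarly simplified way: the student is trained to match a frozen teacher while both condition on the same reconstituted context, so the student never learns to ignore distant frames. The PyTorch snippet below is a toy formulation under that assumption, not the paper's actual Context Forcing loss:

```python
# Toy, assumed formulation of memory-aware distillation: the student is
# supervised by a frozen teacher while both condition on the SAME long-range
# context. Not the paper's actual Context Forcing objective.
import torch
import torch.nn as nn

DIM = 64

teacher = nn.Sequential(nn.Linear(DIM * 2, 256), nn.ReLU(), nn.Linear(256, DIM)).eval()
student = nn.Sequential(nn.Linear(DIM * 2, 128), nn.ReLU(), nn.Linear(128, DIM))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

for _ in range(100):                               # toy training loop
    context = torch.randn(8, DIM)                  # pooled long-range context features
    current = torch.randn(8, DIM)                  # current chunk latents
    inputs = torch.cat([context, current], dim=-1)

    with torch.no_grad():
        target = teacher(inputs)                   # teacher prediction with full context

    pred = student(inputs)                         # student sees the same context
    loss = nn.functional.mse_loss(pred, target)    # match teacher outputs

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```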
## Key Features and Capabilities
- Real-Time and Interactive: Generates video streams at 24 FPS, allowing for live interaction based on user input.
- Long-Term Geometric Consistency: Maintains the stability and coherence of the world's structure over long generation horizons.
- Versatile Applications: Supports both first-person and third-person perspectives in real-world and stylized environments. Potential applications include interactive 3D reconstruction, promptable events (e.g., "make it rain"), and infinite world extension.
- Comprehensive Open-Source Release: The team has open-sourced not just the model weights but a full-stack framework covering data, training, and inference deployment.
## Quantitative Superiority
The model's performance is backed by extensive evaluations. As shown in the table below, the full WorldPlay model ("Ours (full)") outperforms existing state-of-the-art methods on PSNR, SSIM, and LPIPS, especially in long-term scenarios, while still running in real time; of the listed baselines, only Matrix-Game-2.0 also operates in real time, and it trails substantially on every quality metric.
| Model | Real-time | Short-term PSNR↑ / SSIM↑ / LPIPS↓ | Long-term PSNR↑ / SSIM↑ / LPIPS↓ |
|---|---|---|---|
| CameraCtrl | ❌ | 17.93 / 0.569 / 0.298 | 10.09 / 0.241 / 0.549 |
| Gen3C | ❌ | 21.68 / 0.635 / 0.278 | 15.37 / 0.431 / 0.483 |
| Matrix-Game-2.0 | ✅ | 17.26 / 0.505 / 0.383 | 9.57 / 0.205 / 0.631 |
| Ours (full) | ✅ | 21.92 / 0.702 / 0.247 | 18.94 / 0.585 / 0.371 |
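For readers who want to run this kind of comparison on their own clips, a minimal per-frame PSNR/SSIM/LPIPS loop might look like the following. It assumes the `scikit-image` and `lpips` packages and uses random arrays as stand-ins for generated and ground-truth frames; it is not the authors' evaluation code:

```python
# Minimal sketch of per-frame PSNR/SSIM/LPIPS evaluation between generated
# and ground-truth frames. Uses random data as a stand-in; not the paper's
# evaluation pipeline.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")                 # perceptual metric (lower is better)


def evaluate(generated, reference):
    """generated, reference: lists of HxWx3 uint8 frames of the same size."""
    psnr, ssim, lp = [], [], []
    for gen, ref in zip(generated, reference):
        psnr.append(peak_signal_noise_ratio(ref, gen, data_range=255))
        ssim.append(structural_similarity(ref, gen, channel_axis=-1, data_range=255))
        # LPIPS expects NCHW tensors scaled to [-1, 1]
        to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0).float() / 127.5 - 1.0
        with torch.no_grad():
            lp.append(lpips_fn(to_tensor(gen), to_tensor(ref)).item())
    return float(np.mean(psnr)), float(np.mean(ssim)), float(np.mean(lp))


if __name__ == "__main__":
    frames_a = [np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8) for _ in range(8)]
    frames_b = [np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8) for _ in range(8)]
    print(evaluate(frames_a, frames_b))
```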
## Getting Started with WorldPlay
For developers eager to experiment, the repository provides a clear quick-start path. The model is built on top of the powerful HunyuanVideo-1.5 base model. The setup involves:
- Creating a Python 3.10 environment and installing dependencies.
- Installing Flash Attention for optimized performance.
- Downloading the pre-trained HunyuanVideo-1.5 model and the specific WorldPlay checkpoints.
- Running the provided inference scripts (`generate.py`, or `generate_custom_trajectory.py` for custom camera paths).
The code supports inference with different model variants: bidirectional, autoregressive, and the distilled autoregressive model for maximum speed.
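As one way to script the checkpoint-download step, the `huggingface_hub` client can fetch a repository snapshot before calling the inference script. The repo IDs, local paths, and script flags below are placeholders; check the HY-WorldPlay README for the actual names and the exact arguments `generate.py` accepts:

```python
# Hedged example of fetching checkpoints and launching inference.
# Repo IDs, local paths, and script flags are placeholders; consult the
# HY-WorldPlay README for the real values.
import subprocess

from huggingface_hub import snapshot_download

base_dir = snapshot_download(
    repo_id="tencent/HunyuanVideo-1.5",          # placeholder repo id
    local_dir="./ckpts/hunyuanvideo-1.5",
)
worldplay_dir = snapshot_download(
    repo_id="tencent/HY-WorldPlay",              # placeholder repo id
    local_dir="./ckpts/worldplay",
)

# Invoke the provided inference script (flag names are illustrative only).
subprocess.run(
    ["python", "generate.py",
     "--base-model", base_dir,
     "--checkpoint", worldplay_dir,
     "--variant", "distilled-autoregressive"],
    check=True,
)
```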
## Conclusion and Future Work
HY-World 1.5 (WorldPlay) represents a significant milestone in AI-driven content creation and simulation. By systematically addressing the bottlenecks of speed and consistency, it opens up new possibilities for real-time, interactive applications in gaming, virtual reality, and architectural visualization.
The team has indicated that the training code is still on the open-sourcing TODO list; its release will be a crucial next step for the research community to build on this work. For now, the release of the model weights and inference code is a major contribution that lets everyone experience and benchmark this state-of-the-art interactive world model.
Learn More:
- GitHub Repository: https://github.com/Tencent-Hunyuan/HY-WorldPlay
- Technical Report & Paper: Check the repository for links to the detailed technical report and research papers.



