HY-World 1.5 (WorldPlay): A Game-Changer for Real-Time Interactive World Models

The quest for AI that can generate and simulate consistent, interactive worlds in real-time has taken a monumental leap forward. On December 17, 2025, Tencent's Hunyuan team open-sourced HY-World 1.5, codenamed WorldPlay. This isn't just an incremental update; it's a comprehensive framework that claims to resolve the fundamental trade-off between speed, memory, and long-term consistency in world modeling.

In short, WorldPlay enables the generation of long-horizon, interactive streaming video at a stunning 24 FPS, all while maintaining geometric consistency over time. Let's dive into what makes this model so revolutionary.

The Core Problem: Speed vs. Consistency

Previous world models, including the team's own HY-World 1.0, often faced a critical limitation. They could generate impressive 3D worlds but typically through a slow, offline process. Achieving real-time interaction meant sacrificing the long-term consistency of the environment—objects would morph, textures would flicker, and the geometry would drift over time. WorldPlay aims to shatter this compromise.

The Four Pillars of WorldPlay's Architecture

The breakthrough is powered by four key technical innovations:

  1. Dual Action Representation: This is the "controller" of the model. It translates user inputs (like keyboard and mouse movements) into a robust action representation the model can interpret, allowing precise and responsive control over the generated world's viewpoint.

  2. Reconstituted Context Memory: This is the core of long-term consistency. To prevent the model from "forgetting" the past, this module dynamically rebuilds context from previously generated video chunks. It uses a clever technique called temporal reframing to keep geometrically important frames from the distant past accessible, effectively solving the problem of memory attenuation (a minimal sketch follows this list).

  3. WorldCompass: A Novel RL Post-Training Framework: After initial training, the model undergoes a reinforcement learning (RL) phase specifically designed for long-horizon tasks. WorldCompass directly optimizes the model for better action-following and higher visual quality over extended sequences, ensuring the output remains stable and coherent.

  4. Context Forcing: Memory-Aware Distillation: To achieve real-time speeds, a smaller, faster "student" model is often distilled from a larger "teacher" model. However, standard distillation can cause the student to lose its ability to use long-range context. Context Forcing is a novel distillation method that aligns the memory context between teacher and student, preserving the student's capacity for long-term reasoning while enabling 24 FPS generation (also sketched below).
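
The release describes these mechanisms at a high level rather than spelling out the implementation. As a rough illustration of pillar 2, the sketch below maintains a bank of generated frames and, at each step, rebuilds a fixed-size context from the most recent frames plus a few geometrically relevant older "anchor" frames. Every name, the pose-distance heuristic, and the window sizes here are assumptions, not code from the release:

```python
import torch

def reconstitute_context(frame_bank: list[torch.Tensor],
                         poses: list[torch.Tensor],
                         current_pose: torch.Tensor,
                         recent_k: int = 8,
                         anchor_k: int = 4) -> torch.Tensor:
    """Rebuild a fixed-size context from past frames (illustrative).

    Keeps the last `recent_k` frames for temporal smoothness, plus
    `anchor_k` older frames whose camera poses are closest to the
    current viewpoint, so distant-but-relevant geometry is not forgotten.
    """
    recent = frame_bank[-recent_k:]
    older = list(zip(frame_bank[:-recent_k], poses[:-recent_k]))
    # Stand-in relevance score: pose proximity to the current viewpoint.
    older.sort(key=lambda fp: torch.norm(fp[1] - current_pose).item())
    anchors = [frame for frame, _ in older[:anchor_k]]
    # "Temporal reframing" in spirit: old anchor frames are placed ahead
    # of the recent window so the model attends to them as nearby context.
    return torch.stack(anchors + recent, dim=0)
```

In the same spirit, pillar 4 can be caricatured as a distillation step in which teacher and student are conditioned on the same reconstituted context, so the student is matched to the teacher precisely in the regime where long-range memory matters and cannot collapse into a context-free generator. The MSE objective and all names are again assumptions:

```python
import torch
import torch.nn.functional as F

def context_forcing_step(student, teacher, context, noisy_latents):
    """One distillation step with a shared memory context (illustrative)."""
    with torch.no_grad():
        target = teacher(noisy_latents, context=context)  # frozen teacher
    pred = student(noisy_latents, context=context)        # trainable student
    return F.mse_loss(pred, target)
```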

Key Features and Capabilities

  • Real-Time and Interactive: Generates video streams at 24 FPS, allowing for live interaction based on user input.
  • Long-Term Geometric Consistency: Maintains the stability and coherence of the world's structure over long generation horizons.
  • Versatile Applications: Supports both first-person and third-person perspectives in real-world and stylized environments. Potential applications include interactive 3D reconstruction, promptable events (e.g., "make it rain"), and infinite world extension.
  • Comprehensive Open-Source Release: The team has open-sourced not just the model weights but a full-stack framework covering data, training, and inference deployment.

Quantitative Superiority

The model's performance is backed by extensive evaluations. As shown in the table below, the full WorldPlay model ("Ours (full)") outperforms existing state-of-the-art methods across key metrics like PSNR, SSIM, and LPIPS, especially in long-term scenarios, while being the only one that operates in real-time.

| Model | Real-time | Short-term PSNR↑ / SSIM↑ / LPIPS↓ | Long-term PSNR↑ / SSIM↑ / LPIPS↓ |
| --- | --- | --- | --- |
| CameraCtrl | No | 17.93 / 0.569 / 0.298 | 10.09 / 0.241 / 0.549 |
| Gen3C | No | 21.68 / 0.635 / 0.278 | 15.37 / 0.431 / 0.483 |
| Matrix-Game-2.0 | No | 17.26 / 0.505 / 0.383 | 9.57 / 0.205 / 0.631 |
| Ours (full) | Yes | 21.92 / 0.702 / 0.247 | 18.94 / 0.585 / 0.371 |
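
For orientation: PSNR and SSIM are higher-is-better fidelity and structural-similarity scores, while LPIPS is a lower-is-better perceptual distance. Note how much smaller the short-to-long-term PSNR drop is for WorldPlay (about 3 dB) than for the baselines (roughly 6 to 8 dB). As a reference, a minimal PSNR implementation looks like this (the standard definition, not code from the release):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```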

Getting Started with WorldPlay

For developers eager to experiment, the repository provides a clear quick-start path. The model is built upon the powerful HunyuanVideo-1.5 base model. The setup involves:

  1. Creating a Python 3.10 environment and installing dependencies.
  2. Installing Flash Attention for optimized performance.
  3. Downloading the pre-trained HunyuanVideo-1.5 model and the specific WorldPlay checkpoints.
  4. Running the provided inference scripts (generate.py or generate_custom_trajectory.py for custom camera paths).

The code supports inference with different model variants: bidirectional, autoregressive, and the distilled autoregressive model for maximum speed.
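
The released scripts are the authoritative entry points, and their exact flags aren't reproduced here. Purely as an illustration of what an interactive client around the distilled autoregressive variant could look like, here is a hypothetical driver loop; the `worldplay` module, `WorldPlayPipeline`, and `poll_user_action` are invented names for this sketch:

```python
import time

# Hypothetical API; in practice you would call the released generate.py /
# generate_custom_trajectory.py scripts or wrap their internals yourself.
from worldplay import WorldPlayPipeline, poll_user_action

pipe = WorldPlayPipeline.from_pretrained(
    "tencent/HY-World-1.5",
    variant="distilled-autoregressive",  # the fastest of the three variants
)
frame = pipe.init_from_image("start_frame.png")

TARGET_DT = 1.0 / 24.0  # 24 FPS frame budget

while True:
    t0 = time.monotonic()
    action = poll_user_action()  # keyboard/mouse -> dual action representation
    frame = pipe.step(action)    # one autoregressive generation step
    # hand `frame` to the application's renderer here
    time.sleep(max(0.0, TARGET_DT - (time.monotonic() - t0)))
```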

Conclusion and Future Work

HY-World 1.5 (WorldPlay) represents a significant milestone in AI-driven content creation and simulation. By systematically addressing the bottlenecks of speed and consistency, it opens up new possibilities for real-time, interactive applications in gaming, virtual reality, and architectural visualization.

The team has indicated that the training code is still on the open-sourcing TODO list, and releasing it will be a crucial next step in enabling the research community to build upon this work. For now, the release of the models and inference code is a massive contribution that allows everyone to experience and benchmark this state-of-the-art interactive world model.
