In 3D content creation, generative models that are both high-quality and efficient have long been a key research goal. Microsoft's newly open-sourced TRELLIS.2 model advances 3D generation with a new sparse voxel representation and fast, high-resolution image-to-3D generation.
What is TRELLIS.2?
TRELLIS.2 is a large-scale 3D generative model with 4 billion parameters, specifically designed for high-fidelity image-to-3D generation. The core breakthrough of this model lies in introducing a novel sparse voxel representation called "O-Voxel," which fundamentally transforms the traditional 3D generation workflow.
Key Technical Features
🚀 Exceptional Generation Efficiency and Quality
TRELLIS.2 pairs high output resolution with short generation times:
| Resolution | Total Time | Shape Generation | Material Generation |
|---|---|---|---|
| 512³ | ~3 seconds | 2 seconds | 1 second |
| 1024³ | ~17 seconds | 10 seconds | 7 seconds |
| 1536³ | ~60 seconds | 35 seconds | 25 seconds |
Timings measured on an NVIDIA H100 GPU.
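
As a quick sanity check on the table above, the following Python sketch verifies that each total equals the sum of its two stages, and compares how total time grows against the number of cells in a dense grid of the same resolution. The dictionary and function names are illustrative and not part of TRELLIS.2; the numbers are copied from the table, and the totals are approximate.

```python
# Timings in seconds, copied from the table above (NVIDIA H100).
TIMINGS = {
    512: {"shape": 2, "material": 1, "total": 3},
    1024: {"shape": 10, "material": 7, "total": 17},
    1536: {"shape": 35, "material": 25, "total": 60},
}


def growth_vs_base(resolution: int, base: int = 512) -> tuple[float, float]:
    """Return (dense cell-count ratio, total-time ratio) relative to `base`."""
    cells_ratio = (resolution / base) ** 3
    time_ratio = TIMINGS[resolution]["total"] / TIMINGS[base]["total"]
    return cells_ratio, time_ratio


for res, row in TIMINGS.items():
    # Each reported total should be the sum of the two stages.
    assert row["shape"] + row["material"] == row["total"]
    cells_ratio, time_ratio = growth_vs_base(res)
    print(f"{res}^3: {cells_ratio:.1f}x cells, {time_ratio:.1f}x time")
```

On these figures, going from 512³ to 1536³ multiplies the dense cell count by 27, while the reported total time grows by roughly 20 times.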
🔄 Revolutionary O-Voxel Representation
Traditional iso-surface field representations struggle with complex structures. O-Voxel is designed to remove these constraints:
- Open Surface Handling: Represents non-closed structures such as clothing and leaves
- Non-Manifold Geometry Support: Handles complex topologies without cumbersome conversions
- Internal Structure Preservation: Retains the details of enclosed internal structures
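
To make the sparse voxel idea concrete, here is a minimal Python sketch that stores only the occupied cells of an open surface: a flat square sheet, which has no inside or outside. This is generic sparse-coordinate storage chosen for illustration. It is not the actual O-Voxel format, whose layout is not described here, and the function name is hypothetical.

```python
def voxelize_open_sheet(resolution: int) -> set[tuple[int, int, int]]:
    """Occupied voxel coordinates for a flat, open square sheet at mid-height.

    Hypothetical helper: a stand-in for an open surface such as a leaf.
    """
    z = resolution // 2
    return {(x, y, z) for x in range(resolution) for y in range(resolution)}


resolution = 64
occupied = voxelize_open_sheet(resolution)

# A dense grid would allocate every cell; the sparse set keeps only the surface.
dense_cells = resolution ** 3
occupancy = len(occupied) / dense_cells
print(f"{len(occupied)} of {dense_cells} cells occupied ({occupancy:.2%})")
```

Because only surface cells are stored, the sheet needs no closed volume or sign convention, which is the kind of case where iso-surface fields run into trouble.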
🎨 Full PBR Material Support
Unlike models that only generate basic colors, TRELLIS.2 supports complete Physically-Based Rendering (PBR) materials:
- Base Color
- Roughness
- Metallic
- Opacity
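
The four channels listed above can be pictured as one small record per surface sample. The sketch below is an assumption-laden illustration: the class name `PBRSample` is hypothetical, and the value ranges follow the common metallic-roughness convention (every channel in [0, 1]) rather than anything confirmed for TRELLIS.2's own output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PBRSample:
    """Hypothetical container for the four PBR channels listed above."""

    base_color: tuple[float, float, float]  # RGB, each assumed to be in [0, 1]
    roughness: float  # 0 = mirror-smooth, 1 = fully rough
    metallic: float   # 0 = dielectric, 1 = metal
    opacity: float    # 0 = fully transparent, 1 = fully opaque

    def __post_init__(self) -> None:
        channels = (*self.base_color, self.roughness, self.metallic, self.opacity)
        if not all(0.0 <= c <= 1.0 for c in channels):
            raise ValueError("all PBR channels must lie in [0, 1]")


# Example values are made up for illustration.
brushed_steel = PBRSample(
    base_color=(0.6, 0.6, 0.62), roughness=0.35, metallic=1.0, opacity=1.0
)
print(brushed_steel)
```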
⚡ Minimalist Processing Pipeline
TRELLIS.2 streamlines data processing so that conversions between textured meshes and O-Voxels are fast in both directions:
- Textured Mesh → O-Voxel: <10 seconds (single CPU)
- O-Voxel → Textured Mesh: <100 milliseconds (CUDA)
Technical Architecture Innovations
Sparse 3D VAE Encoding
The model uses a sparse 3D Variational Autoencoder with 16× spatial downsampling to encode 3D assets into a compact latent space, laying the foundation for subsequent generation.
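
To see what 16× spatial downsampling means for the resolutions quoted earlier, the short Python sketch below computes the latent grid side for each one. It assumes the factor applies to each spatial axis and that the resolution divides evenly; the function name is illustrative.

```python
DOWNSAMPLE = 16  # spatial downsampling factor stated above


def latent_side(resolution: int, factor: int = DOWNSAMPLE) -> int:
    """Side length of the latent grid, assuming per-axis downsampling."""
    if resolution % factor != 0:
        raise ValueError("resolution must be a multiple of the factor")
    return resolution // factor


for res in (512, 1024, 1536):
    side = latent_side(res)
    # side ** 3 is an upper bound; a sparse latent stores only occupied positions.
    print(f"{res}^3 voxels -> {side}^3 latent grid (at most {side ** 3} positions)")
```

Under these assumptions, 512 maps to a 32-wide latent grid, 1024 to 64, and 1536 to 96.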
DiT-Based Generation Architecture
It employs standard Diffusion Transformers (DiT) for generation, showing that a conventional architecture can perform well when paired with a new representation.
Application Prospects
TRELLIS.2's technical breakthroughs open new possibilities for multiple fields:
- Game Development: Rapid generation of high-quality 3D assets
- Virtual Reality: Fast creation of assets for immersive environments
- Industrial Design: Fast prototyping and visualization
- Film Production: Efficient generation of special effects assets
Open Source Ecosystem
The project is built on several high-performance specialized libraries:
- O-Voxel: Core representation processing library
- FlexGEMM: Efficient sparse convolution based on Triton
- CuMesh: CUDA-accelerated mesh processing utilities
Conclusion
TRELLIS.2 represents a significant milestone in 3D generation technology. Its innovative O-Voxel representation and efficient generation architecture set new standards for the industry. With the complete open-sourcing of code and pre-trained models, this technology is poised to accelerate development across the entire 3D content creation field.
For developers and researchers, now is the perfect time to explore and leverage this powerful tool. Whether for commercial applications or academic research, TRELLIS.2 opens a new door to automated high-quality 3D content generation.
- Project repository: GitHub - microsoft/TRELLIS.2
- Pre-trained model: Hugging Face Model Hub



