Mip-Splatting

Alias-free 3D Gaussian Splatting

CVPR 2024 (Oral, Best Student Paper)

Zehao Yu¹,² Anpei Chen¹,² Binbin Huang³ Torsten Sattler⁴ Andreas Geiger¹,²

¹University of Tübingen ²Tübingen AI Center
³ShanghaiTech University ⁴Czech Technical University in Prague

TL;DR: We introduce a 3D smoothing filter and a 2D Mip filter for 3D Gaussian Splatting (3DGS), eliminating multiple artifacts and achieving alias-free renderings.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our comprehensive evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.

Motivation

3D Gaussian Splatting renders images by representing 3D objects as 3D Gaussians, which are projected onto the image plane and then dilated in screen space (2D dilation), as shown in (a). The method's intrinsic shrinkage bias leads to degenerate 3D Gaussians that exceed the sampling limit, as illustrated by the δ function in (b), while still rendering similarly in 2D because of the dilation operation. However, when changing the sampling rate (via the focal length or camera distance), we observe strong dilation effects (c) and high-frequency artifacts (d).
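To make "sampling rate" concrete, the following minimal sketch assumes a pinhole camera with the focal length given in pixels: one pixel at depth d then spans roughly d / f world units, so the sampling rate is f / d. The function names are hypothetical and only serve as an illustration.

```python
def sampling_interval(depth, focal_px):
    """World-space extent covered by one pixel at the given depth (pinhole model)."""
    return depth / focal_px


def sampling_rate(depth, focal_px):
    """Samples per world unit at the given depth; the inverse of the interval."""
    return focal_px / depth


# Doubling the focal length (zooming in) or halving the camera distance
# both double the sampling rate at which the scene is observed.
base = sampling_rate(depth=4.0, focal_px=500.0)
zoomed_in = sampling_rate(depth=4.0, focal_px=1000.0)
moved_closer = sampling_rate(depth=2.0, focal_px=500.0)
```

Under this model, changing the focal length and changing the camera distance are interchangeable ways of changing the sampling rate, which is why both trigger the same artifacts.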

Video

Results

Comparison with 3DGS

3DGS produces dilation and erosion artifacts because of its screen-space dilation. It produces erosion effects when zooming in or moving the camera closer. This is because the dilated 2D Gaussians become smaller in screen space, rendering object structures thinner than they actually appear. Conversely, screen-space dilation produces dilation artifacts when zooming out or moving away from the scene. In this case, the dilated 2D Gaussians become bigger in screen space, rendering object structures thicker than they actually appear. In contrast, our method avoids such artifacts by introducing a 3D smoothing filter and a 2D Mip filter.
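The effect can be sketched in one dimension. The snippet below assumes that dilation adds a fixed variance s (in pixels squared) to a projected Gaussian, and compares the dilated footprint with the undilated one at different zoom levels. The function name and the value s = 0.3 are assumptions made for illustration.

```python
import math


def relative_footprint(sigma_px, zoom, s=0.3):
    """Ratio of dilated to undilated standard deviation at a given zoom.

    sigma_px: projected standard deviation in pixels at the training scale.
    zoom:     > 1 zooms in (or moves closer), < 1 zooms out (or moves away).
    s:        variance added by the dilation, fixed in pixel units (assumed).
    """
    sigma = sigma_px * zoom  # projected size scales with the zoom factor
    return math.sqrt(sigma ** 2 + s) / sigma


at_training = relative_footprint(0.5, zoom=1.0)
zoomed_in = relative_footprint(0.5, zoom=4.0)    # closer to 1: looks thinner
zoomed_out = relative_footprint(0.5, zoom=0.25)  # much larger: looks thicker
```

Because the added variance is fixed in pixels while the projected Gaussian scales with the zoom, the relative thickening learned at the training scale is too small when zooming in (erosion) and too large when zooming out (dilation).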

Comparison with 3DGS + EWA

Replacing the 2D dilation of 3DGS with an EWA (elliptical weighted average) filter, denoted as 3DGS + EWA, reduces the dilation and erosion artifacts. However, it produces high-frequency artifacts when zooming in, while our method is free of such artifacts, as shown in the following comparisons.

Here, we show more comparisons with 3DGS + EWA. Both models are trained on images downsampled by a factor of 8 and rendered at a higher resolution. GT (Training resolution) is the image used for training, bilinearly upsampled to the higher resolution for reference, and GT (8x resolution) is the real ground-truth image used for evaluation.

Effectiveness of 2D Mip Filter

Our 2D Mip filter simulates the 2D box filter of the physical imaging process. Its size is chosen to approximate a single pixel in screen space, which effectively reduces aliasing artifacts. As shown in the following video, removing the 2D Mip filter results in aliasing artifacts when zooming out.
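As a rough illustration of how such a filter differs from plain dilation, the sketch below assumes the Mip filter is a Gaussian low-pass filter of variance s applied to the projected 2D Gaussian, with the result rescaled so that its total energy is preserved. The function names and the values s = 0.3 (dilation) and s = 0.1 (Mip filter) are assumptions for this example and are not stated on this page.

```python
import numpy as np


def dilate(cov2d, s=0.3):
    """Plain dilation: enlarge the covariance, keep the peak value at 1."""
    return cov2d + s * np.eye(2), 1.0


def mip_filter(cov2d, s=0.1):
    """Gaussian low-pass filter: enlarge the covariance and rescale the peak."""
    cov_f = cov2d + s * np.eye(2)
    scale = np.sqrt(np.linalg.det(cov2d) / np.linalg.det(cov_f))
    return cov_f, scale


def total_energy(cov, scale=1.0):
    """Integral of scale * exp(-0.5 * x^T cov^-1 x) over the image plane."""
    return scale * 2.0 * np.pi * np.sqrt(np.linalg.det(cov))


# A small projected Gaussian, well below one pixel in extent.
cov = np.array([[0.04, 0.01],
                [0.01, 0.02]])

energy_original = total_energy(cov)
energy_dilated = total_energy(*dilate(cov))
energy_mip = total_energy(*mip_filter(cov))
```

In this sketch the dilated Gaussian carries more energy than the original, which is what makes thin structures look thicker, while the filtered Gaussian spreads the same energy over a larger footprint and fades out as it shrinks below a pixel.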

Effectiveness of 3D Smoothing Filter

The 3D smoothing filter constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the training views, eliminating high-frequency artifacts when zooming in. In the following comparisons, we train the models on downsampled images and render high-resolution images to simulate zoom-in effects. Excluding the 3D smoothing filter results in high-frequency artifacts. Note that both models are trained on images downsampled by a factor of 8 and rendered at a higher resolution. GT (Training resolution) is the image used for training, bilinearly upsampled to the higher resolution for reference, and GT (8x resolution) is the real ground-truth image used for evaluation.
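The idea can be sketched as follows, under several simplifying assumptions: a camera "sees" a point whenever the point lies in front of it (no frustum test), the sampling rate of a camera at a point is focal length over depth, and the filter adds an isotropic variance of s / ν² to the 3D covariance while rescaling the opacity. All function names, the visibility test, and the value s = 0.2 are hypothetical choices for this example, not the released implementation.

```python
import numpy as np


def max_sampling_rate(point, cam_centers, cam_forwards, focals_px):
    """Largest focal/depth over the cameras that have the point in front of them."""
    rates = []
    for center, forward, focal in zip(cam_centers, cam_forwards, focals_px):
        depth = float(np.dot(point - center, forward))
        if depth > 0.0:  # simplified visibility test (assumption)
            rates.append(focal / depth)
    return max(rates) if rates else 0.0


def smooth_3d(cov3d, opacity, nu, s=0.2):
    """Low-pass filter a 3D Gaussian so it cannot be smaller than the sampling limit."""
    cov_f = cov3d + (s / nu ** 2) * np.eye(3)
    scale = np.sqrt(np.linalg.det(cov3d) / np.linalg.det(cov_f))
    return cov_f, opacity * scale


# Two training cameras looking along +z at a point at the origin.
point = np.zeros(3)
centers = [np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, -10.0])]
forwards = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])]
focals = [500.0, 500.0]

nu = max_sampling_rate(point, centers, forwards, focals)

# A degenerate, nearly point-like Gaussian gets a lower bound on its size.
tiny_cov = 1e-8 * np.eye(3)
smoothed_cov, smoothed_opacity = smooth_3d(tiny_cov, opacity=0.9, nu=nu)
```

Since the sampling rate depends only on the training cameras, it can be computed from the training views and kept fixed at test time, so the lower bound on the Gaussian size does not change when the rendering resolution does.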

Real-Time Interactive Viewer

Click the image to use the real-time interactive viewer. Please open the viewer with Chrome or Firefox. For more results, please check our online viewer.

Bicycle

Garden

Stump

Kitchen

Chair

Ficus

Lego

Ship

BibTeX

@article{Yu2023MipSplatting,
  author    = {Yu, Zehao and Chen, Anpei and Huang, Binbin and Sattler, Torsten and Geiger, Andreas},
  title     = {Mip-Splatting: Alias-free 3D Gaussian Splatting},
  journal   = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}

Acknowledgements

ZY, AC and AG are supported by the ERC Starting Grant LEGO-3D (850533) and DFG EXC number 2064/1 - project number 390727645. TS is supported by a Czech Science Foundation (GACR) EXPRO grant (UNI-3D, grant no. 23-07973X). We also thank Christian Reiser for insightful discussions during the preparation of the draft.