High-Resolution Novel View Synthesis Using 3D Gaussian Splatting: A Focus on Human Subjects

University College London

A custom ultra-high-resolution dataset was captured with a photogrammetry rig for this thesis.

Abstract

Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated significant superiority over previous state-of-the-art NeRF-based methods for novel view synthesis, achieving higher quality with lower training times and computational costs. This breakthrough represents a paradigm shift in 3D reconstruction and novel view synthesis. This thesis rigorously investigates the limitations of the current 3DGS pipeline with respect to image resolution during both the training and rendering phases. The focus is on high-resolution novel view synthesis of human subjects captured with an ultra-high-resolution photogrammetry setup. Additionally, this thesis investigates four distinct modifications to the standard 3DGS training pipeline, aiming to identify the parts of the pipeline most responsible for improving the quality of novel view rendering, especially when training on high-resolution data involving human subjects.

Overview

We tested four modified variants of the 3DGS model alongside the baseline to investigate the impact of different hyperparameters and training choices. All renders on this page are at 4K (3840x2160) resolution. Hover over the images to zoom in on high-frequency features.

Baseline 3DGS

Baseline 3DGS model with all hyperparameters set to the default values described in the original implementation by Kerbl et al. (2023). These defaults were chosen to balance the trade-off between training time and rendering quality.
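For reference, the key defaults can be summarized as below. This is a hedged sketch based on the published defaults of the official 3DGS implementation (graphdeco-inria/gaussian-splatting), not a dump of this thesis's actual configuration files.

```python
# Sketch of the baseline 3DGS defaults (values from the official
# graphdeco-inria/gaussian-splatting implementation; assumed unchanged here).
BASELINE_CONFIG = {
    "iterations": 30_000,            # total training iterations
    "densify_from_iter": 500,        # iteration at which densification starts
    "densify_until_iter": 15_000,    # iteration at which densification stops
    "densification_interval": 100,   # densify every N iterations
    "opacity_reset_interval": 3_000, # periodically reset Gaussian opacities
    "lambda_dssim": 0.2,             # weight of the D-SSIM term in the loss
    "sh_degree": 3,                  # spherical-harmonics degree for color
}
```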

60K iterations

The 3DGS model was trained for 60K iterations, twice as many as the baseline. All key milestones in the training schedule were also scaled up by a factor of two to preserve the proportional relationship between the different training phases.

Inclusion of Focal Frequency Loss

A Focal Frequency Loss (Jiang et al., 2020) term was incorporated into the loss function during training to improve the rendering quality of high-frequency features.
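The idea of the Focal Frequency Loss is to compare rendered and ground-truth images in the frequency domain, down-weighting frequencies that are already well reconstructed. A minimal single-channel NumPy sketch is shown below, assuming the focusing exponent alpha = 1; the thesis training code and exact weighting details may differ.

```python
import numpy as np

def focal_frequency_loss(pred: np.ndarray, target: np.ndarray,
                         alpha: float = 1.0) -> float:
    """Minimal sketch of the Focal Frequency Loss for one grayscale image."""
    # 2D FFT of both images (orthonormal norm keeps the scale resolution-independent)
    f_pred = np.fft.fft2(pred, norm="ortho")
    f_target = np.fft.fft2(target, norm="ortho")
    # Per-frequency squared distance between the two spectra
    dist = np.abs(f_pred - f_target) ** 2
    # Focal weight: frequencies with larger spectral error get larger weight
    weight = np.abs(f_pred - f_target) ** alpha
    if weight.max() > 0:
        weight = weight / weight.max()  # normalize weights to [0, 1]
    return float(np.mean(weight * dist))
```

Identical images yield a loss of zero, while images that differ only in fine detail still produce a nonzero loss, which is what makes the term useful for sharpening high-frequency features.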

Training on PNG images

The 3DGS model was trained on PNG images instead of the default JPEG format to investigate the impact of lossy image compression on rendering quality.

Training on masked images

The 3DGS model was trained on masked images to focus only on the foreground objects in the scene. The masks were generated using Grounding DINO, an open-set object detector that can localize arbitrary objects from text queries. We used simple bounding boxes to mask out the background, retaining only the human subject and the related foreground objects.
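The bounding-box masking step amounts to zeroing every pixel outside the detected box. The sketch below illustrates this on a NumPy image array; the box coordinates are hypothetical, whereas in the thesis they come from Grounding DINO detections of the human subject.

```python
import numpy as np

def mask_with_bbox(image: np.ndarray, bbox: tuple) -> np.ndarray:
    """Zero out every pixel outside the (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = bbox
    masked = np.zeros_like(image)
    masked[y0:y1, x0:x1] = image[y0:y1, x0:x1]
    return masked

# Hypothetical example: a white 4K frame with a made-up subject box
frame = np.full((2160, 3840, 3), 255, dtype=np.uint8)
out = mask_with_bbox(frame, (1000, 200, 2800, 2000))
```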

Related Links

MrNeRF/awesome-3D-gaussian-splatting is a curated list of 3DGS papers and resources which was extremely helpful for my thesis.
