Machine Learning Intern at Bodymetrics


Overview

For my MSc dissertation in Machine Learning at UCL, I investigated the limits of the current 3D Gaussian Splatting (3DGS) pipeline when extended to ultra-high-resolution training images, with the goal of modelling high-frequency features of human subjects and related items such as handbags and garments. I worked under the supervision of Dr. Anthony Steed, Dr. Suran Goonatilake and Daniele Giunchi, in collaboration with Dr. Goonatilake's company Bodymetrics, under UCL's Industry Exchange Network program.

Gaussian splats with maximum opacity. Image credits: Dylan Ebert at HuggingFace

Problem Statement

Since the release of the seminal work on 3DGS, promising results have been achieved in 3D reconstruction and novel view synthesis. However, current state-of-the-art methods are limited by the resolution of the training images available in most datasets. This limitation ultimately caps the quality of the novel views these methods can generate, and no amount of supersampling can recover the missing high-frequency details. At the same time, consumer-grade cameras can now capture ultra-high-resolution images that most datasets do not exploit to their full potential. Since the fashion industry relies heavily on high-resolution imagery for marketing and sales, it is worth investigating the limits of the current 3DGS pipeline for high-resolution novel view synthesis.
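
To make that point concrete, here is a small, illustrative PyTorch sketch (the file name is a placeholder): it downsamples a high-resolution image by 4x, upsamples it back with bicubic interpolation, and reports the PSNR against the original. The loss mostly reflects high-frequency detail that cannot be recovered once it is gone.

```python
# Illustrative sketch: simulate a low-resolution capture and try to
# "supersample" it back up. "photo_4k.png" is a placeholder file name.
import torch
import torch.nn.functional as F
from torchvision.io import read_image

img = read_image("photo_4k.png").float() / 255.0   # (C, H, W), values in [0, 1]
img = img.unsqueeze(0)                              # add batch dim -> (1, C, H, W)

# Downsample by 4x, then upsample back to the original resolution.
low = F.interpolate(img, scale_factor=0.25, mode="bicubic", align_corners=False)
up = F.interpolate(low, size=img.shape[-2:], mode="bicubic", align_corners=False)

# PSNR against the original image (peak value 1.0).
mse = F.mse_loss(up.clamp(0, 1), img)
psnr = -10.0 * torch.log10(mse)
print(f"PSNR after 4x down/up-sampling: {psnr.item():.2f} dB")
```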

Custom Dataset Generation

I generated a custom dataset of ultra-high-resolution images of a human subject captured in a photogrammetry rig at the Centre for Creative and Immersive Extended Reality (CCIXR) at the University of Portsmouth. The dataset consists of six challenging scenes with a variety of poses, clothing and accessories, each captured from 140 cameras.
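
Before training, the 3DGS pipeline needs camera poses and a sparse point cloud for each scene, which is the kind of preprocessing COLMAP handles. The snippet below is a minimal, illustrative sketch of that step rather than the exact commands I ran; it assumes the COLMAP CLI is installed and uses placeholder ./scene paths.

```python
# Illustrative COLMAP preprocessing sketch; paths are placeholders.
import subprocess

def run(cmd):
    """Run a COLMAP command and fail loudly if it errors."""
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

scene = "./scene"

# 1. Detect features in every image.
run(["colmap", "feature_extractor",
     "--database_path", f"{scene}/database.db",
     "--image_path", f"{scene}/images"])

# 2. Match features exhaustively across all views.
run(["colmap", "exhaustive_matcher",
     "--database_path", f"{scene}/database.db"])

# 3. Recover camera poses and a sparse point cloud used to seed the Gaussians.
run(["colmap", "mapper",
     "--database_path", f"{scene}/database.db",
     "--image_path", f"{scene}/images",
     "--output_path", f"{scene}/sparse"])
```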

Photogrammetry & Scanning Studio used for data collection at CCIXR. Image credits: CCIXR

Results

I conducted a series of experiments to investigate the limits of the current 3DGS pipeline and performed visual inspections of novel views rendered in Unity. My experiments and the rationale behind them can be summarised as follows:

  1. Baseline Model: Setting the Foundation for Success
  2. Exploring Hyperparameters: Unlocking Better Performance
  3. Incorporating Loss on High-Frequency Image Features (see the sketch after this list)
  4. Why Train on PNG Instead of JPEG? Uncovering the Key Differences
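
For experiment 3, the sketch below shows one possible way to add a high-frequency term on top of the standard 3DGS photometric loss (a weighted mix of L1 and D-SSIM). The image-gradient term and the weights lambda_hf and lambda_dssim here are illustrative assumptions, not the exact formulation from the dissertation.

```python
# Illustrative sketch of a high-frequency-aware training loss for 3DGS.
import torch
import torch.nn.functional as F

def image_gradients(img):
    """Finite-difference gradients along x and y for a (B, C, H, W) image."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def high_frequency_loss(render, gt):
    """L1 distance between the image gradients of the render and ground truth."""
    rdx, rdy = image_gradients(render)
    gdx, gdy = image_gradients(gt)
    return F.l1_loss(rdx, gdx) + F.l1_loss(rdy, gdy)

def total_loss(render, gt, dssim, lambda_dssim=0.2, lambda_hf=0.1):
    """Baseline 3DGS loss (L1 + D-SSIM) plus an extra high-frequency penalty.

    `dssim` is 1 - SSIM(render, gt), computed elsewhere with a standard
    SSIM implementation; the weights are placeholder values.
    """
    l1 = F.l1_loss(render, gt)
    base = (1.0 - lambda_dssim) * l1 + lambda_dssim * dssim
    return base + lambda_hf * high_frequency_loss(render, gt)
```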

I rendered countless novel views in Unity to compare the different models, and it quickly became clear that presenting all the differences in a PDF report would be impractical for my dissertation. So I created a project page that presents most of the results in an intuitive manner, with a hover-to-zoom feature for closer inspection of high-frequency details. Developing the project page was a great learning experience in HTML, CSS and JavaScript, and I am proud of the final outcome.

Overall, this project presented a steep learning curve, and I am grateful to my supervisors for their guidance and support throughout. Beyond Python and PyTorch, I gained invaluable skills in 3D reconstruction and novel view synthesis, along with hands-on experience with tools such as Unity and COLMAP. I am currently awaiting the results of my dissertation and am excited about the future of 3DGS, especially in fashion and retail.