TL;DR

Dynamic 3D Gaussian Splatting overfits by 6.18 dB on average on D-NeRF. A systematic ablation traces >80% of this gap to the split operation of Adaptive Density Control. Across 9 ablation conditions we see a log-linear count–gap correlation (r = 0.995). Then EER—a k-NN elastic-strain penalty on per-Gaussian deformation—breaks this correlation: it reduces the gap by 40.8% while increasing the Gaussian count by 85%. Our full combination closes 57.4% of the baseline gap.

  • 6.18 dB: baseline gap (8 D-NeRF scenes)
  • 99.72%: EER strain reduction (8-scene mean)
  • 40.8%: EER gap reduction (D-NeRF)
  • 14.9%: EER gap reduction (4 HyperNeRF scenes)
  • 57.4%: full-combination gap reduction
Count vs gap: EER breaks the log-linear correlation.
The count–gap paradigm shift. Ablations (gray) follow a log-linear trend (r = 0.995, bootstrap 95% CI [0.993, 1.000]). EER (green) uses more Gaussians yet overfits less. The correlation holds within 41 non-EER configurations (r = 0.987) — EER is the only lever we found that breaks it.

Abstract

Dynamic 3D Gaussian Splatting achieves impressive novel-view synthesis on monocular video by coupling a deformable point cloud with Adaptive Density Control (ADC), but exhibits a severe train–test generalization gap. On the D-NeRF benchmark (8 synthetic scenes) we measure an average gap of 6.18 dB (up to 11 dB per scene) and, through a systematic ablation of every ADC sub-operation (split, clone, prune, frequency, threshold, schedule), identify splitting as the dominant pathway.

Our central finding is that Elastic Energy Regularization (EER) — a local smoothness penalty on the per-Gaussian deformation field — breaks the log-linear count–gap correlation observed across ablations. This reframes overfitting from a capacity problem to an incoherent deformation problem. Two small methods carry the paper: GAD (capacity lever, a loss-rate-aware densification threshold) and EER (coherence lever, a k-NN smoothness penalty on the deformation field). Their combination closes 48% of the gap; adding a stochastic complement (PTDrop) and a hard cloud cap (LogiGrow) reaches 57.4%.

All findings are on synthetic D-NeRF scenes; real-world validation (HyperNeRF, Deformable-3DGS cross-architecture) is partial and still in progress — see the cross-architecture section.

Key Findings

1. Split drives >80% of overfitting

Disabling split collapses both the cloud (2K vs 44K Gaussians) and the gap (1.15 dB vs 6.18 dB). Disabling pruning changes nothing.

2. Count–gap correlation is real but incomplete

r = 0.995 on 9 ablation conditions, holding within both sub-clusters (r = 0.998 on high-count, 0.95 on low-count) and across 41 non-EER configurations (r = 0.987).
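These statistics are simple to reproduce: Pearson r between log Gaussian count and gap, with a percentile bootstrap over conditions for the 95% CI reported in the teaser figure. A minimal sketch; the 9-condition arrays in the usage example are illustrative, not the paper's measurements:

```python
import numpy as np

def bootstrap_r_ci(log_count, gap, n_boot=10000, seed=0):
    """Percentile-bootstrap 95% CI for the Pearson correlation between
    log Gaussian count and train-test gap across ablation conditions."""
    rng = np.random.RandomState(seed)
    n = len(gap)
    rs = []
    for _ in range(n_boot):
        idx = rng.randint(0, n, n)                      # resample conditions
        if log_count[idx].std() == 0 or gap[idx].std() == 0:
            continue                                    # skip degenerate draws
        rs.append(np.corrcoef(log_count[idx], gap[idx])[0, 1])
    return np.percentile(rs, [2.5, 97.5])

# Illustrative (hypothetical) sweep: gap grows log-linearly in count.
log_count = np.log10(np.array([2e3, 5e3, 9e3, 15e3, 22e3,
                               30e3, 38e3, 44e3, 60e3]))
gap = 3.5 * log_count - 9.5 + np.random.RandomState(1).randn(9) * 0.1
lo, hi = bootstrap_r_ci(log_count, gap)
```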

3. EER breaks the correlation

+85% Gaussians, −40.8% gap. At the per-Gaussian level, EER reduces deformation strain by 99.6% on Lego, 99.8% on T-Rex, 99.6% on Hellwarrior.

4. Orthogonal axes compound

GAD+EER = 48.2% reduction. Adding LogiGrow + PTDrop = 57.4%, the only configuration in our sweep to more than halve the gap.

Method ranking (gap reduction)

V2 method ranking by gap reduction.
Gap reduction across the v2 methods we keep. Every configuration with a green (EER) bar dominates every non-EER configuration. Ablation baselines (grey) mark the extremes of what the ADC knob alone can do. The full combination GAD+LogiGrow+PTDrop+EER is the only configuration that crosses 50%.

Pareto frontier: quality vs overfitting

Pareto frontier.
GAD+EER and the full combination define the Pareto front. No non-EER combination exceeds 25% gap reduction.

Ablation summary

Ablation summary bar chart.
Left: test PSNR (quality). Right: train–test gap (overfitting). A1/A2 kill the gap but destroy quality; A3 (no clone) is the best ablation trade-off; A4 (no prune) is irrelevant.

Gap grows with training, not with iterations alone

Overfitting gap over training iterations.
Train–test PSNR gap over training (mean ± std across 8 scenes). Baseline grows to ~6 dB; disabling split holds it at ~1 dB. The divergence tracks the densification window (iters 500–15,000).

Why early stopping fails: densification is front-loaded

Front-loaded densification bar chart.
84–89% of cloud growth happens before iter 7,500. Stopping densification at iter 7,500 (A6) only trims the count by 10% and has essentially no effect on the gap — confirming that mitigation must modulate densification from the start, not truncate it at the end.

Dose–response: GAD and EER

Dose-response curves for GAD and EER.
Gap (blue/green, left axis) and test PSNR (red, right axis) as we sweep each method's strength parameter. Both GAD (capacity lever) and EER (coherence lever) produce smooth, monotonic trade-offs — EER's curve is markedly steeper, reaching a 44% gap reduction where GAD reaches 19%.

Method Taxonomy

Two core drop-in methods plus a stochastic complement, each with one hyperparameter and about 20 lines of code:

Capacity lever

  • GAD — a loss-rate-aware densification threshold. Rises when the cloud is large and loss has plateaued, so only Gaussians that still earn their complexity are kept.

Coherence lever

  • EER ★ — a local smoothness penalty on the per-Gaussian deformation field: penalize relative motion between each Gaussian and its k canonical neighbors.

Stochastic complement

  • PTDrop — Gaussian-level dropout on a cosine schedule (iters 5K–12K).
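The schedule can be sketched as follows. The peak rate p_max and the half-cosine window shape are our assumptions for illustration; the description above only pins down the iteration window:

```python
import math

def ptdrop_rate(it, p_max=0.1, start=5000, end=12000):
    """PTDrop dropout probability at iteration `it`: zero outside the
    [start, end] window, a half-cosine bump inside it (0 at the edges,
    p_max at the midpoint). p_max is a hypothetical value."""
    if it < start or it > end:
        return 0.0
    phase = (it - start) / (end - start)            # 0..1 across the window
    return p_max * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
```

Per iteration, each Gaussian would then be dropped from the forward pass independently with probability ptdrop_rate(it).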

We also tried spectral-gated densification, temporal Sobolev smoothness, SH-coefficient penalties, and opacity-entropy maximization (SGD / STSR / ChromReg / OEM). At our scale none moves the gap by more than 10%, so the v2 paper documents them as negative results rather than first-class methods; the v1 paper (PDF, companion main_v1.tex) has the full taxonomy for reference.

GAD: a BIC-motivated threshold schedule

We adapt the per-iteration gradient threshold as

τ_GAD(t) = τ_base · (1 + λ · K(t) / (N · Δℓ_ema(t)))

where K(t) is the current Gaussian count, N is the number of training pixels, and Δℓ_ema is an EMA of the per-iteration loss improvement. λ is the single tunable knob. The mapping from BIC to this formula is a heuristic (see paper, §6.2); the empirical diminishing-returns exponent we measure (α ≈ 0.04) is too mild to justify the often-quoted O((N/λ)^{1/4}) growth bound, so we present the bound qualitatively as "sublinear in N".
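A minimal sketch of this schedule; the EMA bookkeeping and names are ours, only the threshold formula itself comes from the paper:

```python
class GADThreshold:
    """Loss-rate-aware densification threshold:
    tau_GAD(t) = tau_base * (1 + lam * K / (N * delta_l_ema)).
    The threshold rises when the cloud is large (K) and the loss has
    plateaued (small EMA of per-iteration improvement), so fewer
    Gaussians cross the split/clone gradient threshold."""

    def __init__(self, tau_base, lam, n_pixels, beta=0.99, eps=1e-12):
        self.tau_base, self.lam, self.n = tau_base, lam, n_pixels
        self.beta, self.eps = beta, eps
        self.prev_loss, self.ema = None, 0.0

    def update(self, loss):
        """Track the EMA of the per-iteration loss improvement."""
        if self.prev_loss is not None:
            gain = max(self.prev_loss - loss, 0.0)
            self.ema = self.beta * self.ema + (1 - self.beta) * gain
        self.prev_loss = loss

    def tau(self, K):
        return self.tau_base * (1.0 + self.lam * K / (self.n * max(self.ema, self.eps)))
```

A real implementation would warm up the EMA before gating densification; this sketch elides that.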

EER: k-NN elastic strain energy

For a subset of Gaussians i and their k=8 canonical neighbors j, we penalize

EER = mean_{i,j} ‖ u(x_i, t) − u(x_j, t) ‖² / (‖ x_i − x_j ‖² + ε)

where u(x, t) is the deformation offset at time t. This is the discrete elastic strain — physically the correct choice for linear elasticity (Hooke's law penalizes the displacement gradient ∂u/∂x, not the displacement u itself). In canonical space the k-NN graph is stable; we rebuild it every 500 iterations and apply a cosine ramp on the penalty weight from iteration 3K to 10K.
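In code the penalty is a few lines. This NumPy sketch uses a brute-force k-NN for clarity (a real implementation would use a KD-tree or GPU k-NN on a random subset of Gaussians per step); the function names are ours:

```python
import numpy as np

def knn_indices(x_canon, k=8):
    """Indices of each Gaussian's k nearest canonical neighbors (brute force)."""
    d2 = ((x_canon[:, None, :] - x_canon[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                # exclude self
    return np.argsort(d2, axis=1)[:, :k]        # (N, k)

def eer_penalty(x_canon, u, nbrs, eps=1e-8):
    """mean_{i,j} |u_i - u_j|^2 / (|x_i - x_j|^2 + eps): relative deformation
    between each Gaussian and its canonical neighbors, distance-normalized."""
    du2 = ((u[:, None, :] - u[nbrs]) ** 2).sum(-1)               # (N, k)
    dx2 = ((x_canon[:, None, :] - x_canon[nbrs]) ** 2).sum(-1)   # (N, k)
    return (du2 / (dx2 + eps)).mean()
```

A rigid translation of the whole cloud incurs zero penalty; only relative motion between neighbors is charged, which is exactly the incoherent deformation the strain histograms measure.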

Interactive 3D Deformation Viewer

Explore the deformation field in 3D. Left panel: baseline (incoherent per-Gaussian deformation). Right panel: EER (coherent elastic deformation). Use the time slider to animate: baseline Gaussians scatter chaotically at novel timesteps, while EER maintains spatial coherence. Drag to orbit, scroll to zoom; cameras are linked between panels.

Each scene shows its 12,000 highest-opacity Gaussians at 11 timesteps (t = 0.0 to 1.0), colored by displacement magnitude (viridis) or strain (inferno).

What EER Actually Does to the Deformation Field

For every D-NeRF scene, we load the trained 4DGS model, query the per-Gaussian deformation at 4 timesteps, and plot the distribution of per-Gaussian strain ε_i = mean_j ‖u_i − u_j‖² / ‖x_i − x_j‖², where j ranges over each Gaussian's 8 canonical neighbors.

Lego deformation field.
Lego: strain ↓ 99.62%
T-Rex deformation field.
T-Rex: strain ↓ 99.80%
Hellwarrior deformation field.
Hellwarrior: strain ↓ 99.58%
Bouncing-balls deformation field.
Bouncing-balls: strain ↓ 99.90%
Jumping-jacks deformation field.
Jumping-jacks: strain ↓ 99.84%
Stand-up deformation field.
Stand-up: strain ↓ 99.82%
Mutant deformation field.
Mutant: strain ↓ 99.64%
Hook deformation field.
Hook: strain ↓ 99.59%

Each panel shows (left) canonical cloud colored by displacement magnitude, (middle) a subsampled quiver of u(x, t=0.5), (right) the per-Gaussian strain histogram. Baseline is bimodal with heavy tails; EER collapses the distribution by two orders of magnitude. This is the direct mechanism behind EER's overfitting reduction.

Strain reduction on every scene

| Scene | Baseline ε | EER ε | Reduction |
|---|---|---|---|
| bouncingballs | 2.835 | 0.00296 | 99.90% |
| hellwarrior | 5.785 | 0.02408 | 99.58% |
| hook | 2.627 | 0.01090 | 99.59% |
| jumpingjacks | 6.772 | 0.01106 | 99.84% |
| lego | 1.573 | 0.00594 | 99.62% |
| mutant | 1.323 | 0.00481 | 99.64% |
| standup | 3.686 | 0.00667 | 99.82% |
| trex | 3.715 | 0.00738 | 99.80% |
| mean (n=8) | 3.539 | 0.00922 | 99.72% |

Measured at iter 20,000 on trained 4DGS checkpoints. Strain ε is the mean over k=8 canonical neighbors of ‖u_i − u_j‖² / ‖x_i − x_j‖², averaged over 4 timesteps (t = 0, 0.25, 0.5, 0.75).
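The table's protocol, as a self-contained sketch: `deform` stands in for the trained deformation network, and the names are ours:

```python
import numpy as np

def knn(x, k=8):
    """Brute-force canonical k-NN indices."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def scene_strain(x_canon, deform, nbrs, times=(0.0, 0.25, 0.5, 0.75)):
    """Mean per-Gaussian strain |u_i - u_j|^2 / |x_i - x_j|^2 over k=8
    canonical neighbors, averaged over the 4 evaluation timesteps."""
    dx2 = ((x_canon[:, None, :] - x_canon[nbrs]) ** 2).sum(-1) + 1e-12
    strains = []
    for t in times:
        u = deform(x_canon, t)                                   # (N, 3)
        du2 = ((u[:, None, :] - u[nbrs]) ** 2).sum(-1)
        strains.append((du2 / dx2).mean())
    return float(np.mean(strains))
```

A handy sanity check: for a linear field u = c·t·x the strain is exactly (c·t)² at every Gaussian, independent of the cloud geometry.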

EER: The Paradigm Shift

EER three-panel analysis.
(a) EER λ sweep: consistent gap reduction across scenes. (b) EER increases final Gaussian count — the reverse of capacity control. (c) Per-scene gap reduction: consistent across all 8 scenes, including the pathological Lego and Hellwarrior.
Combination additivity plot.
Combinations are super-additive: GAD+EER exceeds the sum of individual reductions, confirming capacity and coherence target orthogonal failure modes.

Real-World Validation (HyperNeRF — 4 scenes)

EER transfers to real monocular video. On 4 HyperNeRF scenes, with 4DGS and the same λ=0.05 tuned on synthetic D-NeRF — no per-dataset re-tuning — EER reduces the gap on every scene at near-zero quality cost:

| Scene | Baseline gap | EER gap | Reduction | ΔTest PSNR |
|---|---|---|---|---|
| chickchicken | 5.48 dB | 4.61 dB | +15.9% | −0.20 |
| slice-banana | 5.89 dB | 5.40 dB | +8.3% | +0.03 |
| vrig-3dprinter | 4.49 dB | 3.41 dB | +24.0% | +0.11 |
| vrig-peel-banana | 0.89 dB | 0.83 dB | +6.6% | −0.23 |
| mean (n=4) | 4.19 dB | 3.56 dB | +14.9% | −0.07 |

4DGS on HyperNeRF, 14K iterations (stock config), RTX 3070. Gap reductions range from +6.6% (peel-banana, where the baseline gap is already small at 0.89 dB) to +24.0% (3dprinter, where test PSNR also improves). Mean quality cost: −0.07 dB — effectively free. The coherence finding survives noisy poses and non-Lambertian materials.

Cross-Architecture Validation (Deformable-3DGS)

Main experiments are on 4DGS (HexPlane deformation). We ported EER and GAD to Deformable-3DGS (MLP deformation) and ran baseline + EER on three D-NeRF scenes for 20K iterations.

Phase 1: direct-transfer test at D-NeRF-tuned λ=0.05

| Scene | Baseline gap | EER λ=0.05 gap | Reduction | ΔPSNR |
|---|---|---|---|---|
| lego | 13.15 dB | 13.56 dB | −3.1% | −0.02 dB |
| trex | 1.50 dB | 1.81 dB | −20.8% | −0.38 dB |
| hellwarrior | 4.08 dB | 3.87 dB | +5.2% | −0.22 dB |

Direct transfer at λ=0.05 is poor (mean −6% reduction). Why? Deformable-3DGS trains with L1 + 0.2·(1−SSIM) vs. 4DGS's pure L1, so the loss magnitude is roughly 3× larger and λ=0.05 is under-regularized. Our dimensional-analysis note (paper §6.2) predicts the correct λ for Deformable-3DGS is ≈ 0.15–0.30. Testing this directly:

Phase 2: λ sweep on Deformable-3DGS Lego (dimensional-analysis test)

| λ | Gap (dB) | Train PSNR | Test PSNR | ΔTest | Reduction |
|---|---|---|---|---|---|
| 0 (baseline) | 13.15 | 38.38 | 25.23 | – | – |
| 0.05 | 13.56 | 38.77 | 25.21 | −0.02 | −3.1% |
| 0.15 | 10.23 | 35.55 | 25.33 | +0.10 | +22.3% |
| 0.30 | 8.26 | 33.60 | 25.34 | +0.11 | +37.2% |
| 0.60 | 7.82 | 33.21 | 25.39 | +0.16 | +40.6% |

Cross-scene confirmation at λ=0.30

To confirm the sweep is not Lego-specific, we replicated λ=0.30 on Hellwarrior:

| Scene | Baseline gap (dB) | EER λ=0.30 gap (dB) | Reduction | ΔTest |
|---|---|---|---|---|
| Lego | 13.15 | 8.26 | +37.2% | +0.11 |
| Hellwarrior | 4.08 | 3.54 | +13.2% | −2.44 |

The coherence mechanism transfers across deformation architectures and across scenes; the hyperparameter requires per-architecture (and to a lesser extent per-scene) calibration, exactly as the dimensional-analysis note predicted. On Hellwarrior the quality cost is larger at λ=0.30 (−2.44 dB); a smaller λ like 0.05 already gives +5.2% gap reduction at only −0.22 dB.

BibTeX

@article{droby2026monodygs,
  author  = {Ahmad Droby},
  title   = {Incoherent Deformation, Not Capacity: Diagnosing and
             Mitigating Overfitting in Dynamic Gaussian Splatting},
  journal = {arXiv preprint},
  year    = {2026}
}