TL;DR

Dynamic 3D Gaussian Splatting overfits by 6.18 dB on average on D-NeRF. A systematic ablation identifies the split operation as the bottleneck of the overfitting cascade: disabling it shrinks the gap to 1.15 dB but also collapses test PSNR by 9.93 dB, so it is not a viable mitigation. Across 9 ablation conditions, the gap is monotone in Gaussian count (Spearman ρ = 1.00). A local k-NN strain prior on the deformation field breaks this pattern: it reduces the gap by 40.8% while growing the cloud by 85%. A controlled ablation against E-D3DGS-style embedding smoothness and an SC-GS-style ARAP residual shows the three normalized variants are statistically tied: the canonical-distance normalization is the load-bearing element, not the choice of encoding. Our recommended combination GAD+EER closes 48.2% of the gap; the full stack reaches 57.4%.

6.18 dB: baseline gap (8 D-NeRF scenes)
9.93 dB: test-PSNR collapse if split is disabled
99.73%: strain reduction at held-out test timesteps
47.5% / 46.1% / 40.4% / +2.2%: strain / on-embed / arap / no-norm (variants ablation)
48.2%: recommended GAD+EER (D-NeRF)
+16.1%: EER on HyperNeRF high-gap subset (n=3 of 5)
57.4%: full stack (PTDrop + GrowthCap)
Count vs gap: EER breaks the log-linear correlation.
The count–gap paradigm shift. Ablations (gray) follow a log-linear trend (r = 0.995, bootstrap 95% CI [0.993, 1.000]). EER (green) uses more Gaussians yet overfits less. The correlation holds within 41 non-EER configurations (r = 0.987) — EER is the only lever we found that breaks it.

Abstract

Dynamic 3D Gaussian Splatting achieves impressive novel-view synthesis on monocular video by coupling a deformable point cloud with Adaptive Density Control (ADC), but exhibits a severe train–test generalization gap. On the D-NeRF benchmark (8 synthetic scenes) we measure an average gap of 6.18 dB (up to 11 dB per scene) and, through a systematic ablation of every ADC sub-operation (split, clone, prune, frequency, threshold, schedule), identify splitting as the bottleneck of the overfitting cascade: disabling split shrinks the gap to 1.15 dB but also collapses test PSNR by 9.93 dB, so it is not a viable mitigation. Split is the operation through which the cascade flows, not a knob one can simply turn off.

Our central finding is that a local smoothness penalty on the per-Gaussian deformation field (we use a k-NN strain prior we call EER) breaks the count–gap correlation observed across ablations: it reduces the gap by 40.8% while growing the cloud by 85%. This reframes overfitting from a capacity problem to an incoherent deformation problem. A controlled ablation against E-D3DGS-style per-embedding smoothness and an SC-GS-style ARAP residual shows the three normalized variants achieve statistically tied gap reductions (47.5% / 46.1% / 40.4%); dropping the canonical-distance normalization disables the prior entirely (+2.2%). The substantive contribution is therefore not a new method but the diagnostic finding plus the identification that the canonical-distance normalization is the load-bearing element of these priors. Combined with GAD (a loss-rate-aware densification threshold), the recommended configuration GAD+EER closes 48.2% of the gap; adding PTDrop (jitter-weighted dropout) and a soft cloud cap reaches 57.4% at larger quality cost.

Findings are validated on D-NeRF (8 synthetic scenes), Deformable-3DGS (cross-architecture), and HyperNeRF (5 real-world scenes; +16.1% gap reduction on the high-baseline-gap subset, neutral on low-baseline scenes). EER's k-NN cost scales with cloud size; approximate-NN structures are needed to scale beyond ~100K Gaussians on consumer hardware.

Key Findings

1. Split is the bottleneck of the cascade

Disabling split collapses both the cloud (2K vs 44K Gaussians) and the gap (1.15 dB vs 6.18 dB) — but also collapses test PSNR by 9.93 dB, so it is not a viable mitigation. Disabling pruning changes nothing.

2. Count–gap correlation is real but incomplete

r = 0.995 on 9 ablation conditions, holding within both sub-clusters (r = 0.998 on high-count, 0.95 on low-count) and across 41 non-EER configurations (r = 0.987).

3. EER breaks the correlation

+85% Gaussians, −40.8% gap. At the per-Gaussian level, EER reduces deformation strain by 99.6% on Lego, 99.8% on T-Rex, 99.6% on Hellwarrior.

4. Orthogonal axes compound

GAD+EER = 48.2% reduction. Adding LogiGrow + PTDrop = 57.4%, the only configuration in our sweep to more than halve the gap.

Method ranking (gap reduction)

V2 method ranking by gap reduction.
Gap reduction across the v2 methods we keep. Every configuration with a green (EER) bar dominates every non-EER configuration. Ablation baselines (grey) mark the extremes of what the ADC knob alone can do. The full combination GAD+LogiGrow+PTDrop+EER is the only configuration that crosses 50%.

Pareto frontier: quality vs overfitting

Pareto frontier.
GAD+EER and the full combination define the Pareto front. No non-EER combination exceeds 25% gap reduction.

Ablation summary

Ablation summary bar chart.
Left: test PSNR (quality). Right: train–test gap (overfitting). A1/A2 kill the gap but destroy quality; A3 (no clone) is the best ablation trade-off; A4 (no prune) is irrelevant.

Gap grows with densification, not with iterations alone

Overfitting gap over training iterations.
Train–test PSNR gap over training (mean ± std across 8 scenes). Baseline grows to ~6 dB; disabling split holds it at ~1 dB. The divergence tracks the densification window (iters 500–15,000).

Why early stopping fails: densification is front-loaded

Front-loaded densification bar chart.
84–89% of cloud growth happens before iter 7,500. Stopping densification at iter 7,500 (A6) only trims the count by 10% and has essentially no effect on the gap — confirming that mitigation must modulate densification from the start, not truncate it at the end.

Dose–response: GAD and EER

Dose-response curves for GAD and EER.
Gap (blue/green, left axis) and test PSNR (red, right axis) as we sweep each method's strength parameter. Both GAD (capacity lever) and EER (coherence lever) produce smooth, monotonic trade-offs — EER's curve is markedly steeper, reaching a 44% gap reduction where GAD reaches 19%.

Method Taxonomy

Two small drop-in methods, each one hyperparameter and about 20 lines of code:

Capacity lever

  • GAD — a loss-rate-aware densification threshold. Rises when the cloud is large and loss has plateaued, so only Gaussians that still earn their complexity are kept.

Coherence lever

  • EER ★ — a local smoothness penalty on the per-Gaussian deformation field: penalize relative motion between each Gaussian and its k canonical neighbors.

Stochastic complement

  • PTDrop — Gaussian-level dropout on a cosine schedule (iters 5K–12K).

We also tried spectral-gated densification, temporal Sobolev smoothness, SH-coefficient penalties, and opacity-entropy maximization (SGD / STSR / ChromReg / OEM). At our scale none moves the gap by more than 10%, so the v2 paper documents them as negative results rather than first-class methods; the v1 paper (PDF, companion main_v1.tex) has the full taxonomy for reference.

GAD: a BIC-motivated threshold schedule

We adapt the per-iteration gradient threshold as

$$\tau_{\mathrm{GAD}}(t) = \tau_{\mathrm{base}} \cdot \left(1 + \lambda \cdot \frac{K(t)}{N \cdot \Delta\ell_{\mathrm{ema}}(t)}\right)$$

where K(t) is the current count, N is the number of training pixels, and Δℓ_ema(t) is an EMA of the per-iteration loss improvement. λ is the single tunable knob. The mapping from BIC to this formula is a heuristic (see paper, §6.2); the empirical diminishing-returns exponent we measure (α ≈ 0.04) is too mild to justify the often-quoted O((N/λ)^{1/4}) growth bound, so we present the bound qualitatively as "sublinear in N".
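A minimal sketch of this schedule in plain Python may help make the moving parts concrete. It is not the paper's implementation: the class name `GADThreshold`, the EMA decay, the clamping of negative improvements to zero, and the small floor that avoids division by zero are all assumptions made for illustration, as are the numeric values in the usage lines.

```python
class GADThreshold:
    """Loss-rate-aware densification threshold (illustrative sketch only)."""

    def __init__(self, tau_base, lam, num_pixels, ema_decay=0.99, floor=1e-12):
        self.tau_base = tau_base        # base gradient threshold
        self.lam = lam                  # the single tunable knob (lambda)
        self.num_pixels = num_pixels    # N: number of training pixels
        self.ema_decay = ema_decay      # assumption: decay is not given in the text
        self.floor = floor              # assumption: guards against division by zero
        self.prev_loss = None
        self.ema_improvement = None

    def update(self, loss):
        """Feed the current training loss and track an EMA of its improvement."""
        if self.prev_loss is not None:
            # Assumption: a loss increase counts as zero improvement.
            improvement = max(self.prev_loss - loss, 0.0)
            if self.ema_improvement is None:
                self.ema_improvement = improvement
            else:
                d = self.ema_decay
                self.ema_improvement = d * self.ema_improvement + (1.0 - d) * improvement
        self.prev_loss = loss

    def threshold(self, count):
        """tau_base * (1 + lam * K / (N * ema_improvement)) for K = count."""
        if self.ema_improvement is None:
            return self.tau_base        # no loss history yet
        rate = max(self.ema_improvement, self.floor)
        return self.tau_base * (1.0 + self.lam * count / (self.num_pixels * rate))


# Hypothetical values, chosen only to exercise the formula.
gad = GADThreshold(tau_base=2e-4, lam=1.0, num_pixels=800 * 800 * 100)
for loss in (0.10, 0.08, 0.07):
    gad.update(loss)
print(gad.threshold(count=2_000), gad.threshold(count=44_000))
```

In this sketch the threshold rises both when the count grows and when the loss improvement shrinks, which matches the qualitative behavior described for GAD.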

EER: k-NN elastic strain energy

For a subset of Gaussians i and their k=8 canonical neighbors j, we penalize

$$\mathrm{EER} = \operatorname{mean}_{i,j} \frac{\lVert u(x_i, t) - u(x_j, t) \rVert^2}{\lVert x_i - x_j \rVert^2 + \varepsilon}$$

where u(x, t) is the deformation offset at time t. This is the discrete elastic strain, physically the correct choice for linear elasticity (Hooke's law penalizes the displacement gradient ∂u/∂x, not the displacement u itself). In canonical space the k-NN graph is stable; we rebuild it every 500 iterations and apply a cosine ramp from iteration 3K to 10K.
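The penalty and its ramp can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the released code: the function names are hypothetical, the neighbor search is brute force (a real implementation would be vectorized on the GPU), and the half-cosine shape of the ramp is one plausible reading of "cosine ramp".

```python
import math


def knn_indices(points, k):
    """Brute-force k nearest neighbors in canonical space (O(n^2), sketch only)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def strain_energy(points, offsets, neighbours, eps=1e-8):
    """Mean over pairs (i, j) of ||u_i - u_j||^2 / (||x_i - x_j||^2 + eps)."""
    total, pairs = 0.0, 0
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            du2 = sum((a - b) ** 2 for a, b in zip(offsets[i], offsets[j]))
            dx2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += du2 / (dx2 + eps)
            pairs += 1
    return total / pairs if pairs else 0.0


def cosine_ramp(iteration, start=3_000, end=10_000):
    """Weight rising smoothly from 0 at `start` to 1 at `end` (assumed half-cosine)."""
    if iteration <= start:
        return 0.0
    if iteration >= end:
        return 1.0
    phase = (iteration - start) / (end - start)
    return 0.5 * (1.0 - math.cos(math.pi * phase))


# Toy canonical cloud; a rigid translation should cost nothing.
canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
graph = knn_indices(canonical, k=2)
rigid = [(0.1, -0.2, 0.3)] * len(canonical)
print(strain_energy(canonical, rigid, graph) * cosine_ramp(6_500))
```

Because only differences between neighboring offsets enter the sum, a rigid translation has zero energy, while offsets that vary sharply between canonical neighbors are penalized in proportion to how close those neighbors are.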

Interactive 3D Deformation Viewer

Explore the deformation field in 3D. Left panel: baseline (incoherent per-Gaussian deformation). Right panel: EER (coherent elastic deformation). Use the time slider to animate — watch how baseline Gaussians scatter chaotically at novel timesteps while EER maintains spatial coherence. Drag to orbit; scroll to zoom. Cameras are linked between panels.

12,000 highest-opacity Gaussians per scene, 11 timesteps (t=0.0 to 1.0). Color by displacement magnitude (viridis) or strain (inferno). Requires serving via HTTP (python -m http.server 8000).

What EER Actually Does to the Deformation Field

For every D-NeRF scene, we load the trained 4DGS model, query the per-Gaussian deformation at 4 timesteps, and plot the distribution of per-Gaussian strain ε_i = mean_j ‖u_i − u_j‖² / ‖x_i − x_j‖² over its 8 canonical neighbors.

Lego deformation field.
Lego: strain ↓ 99.62%
T-Rex deformation field.
T-Rex: strain ↓ 99.80%
Hellwarrior deformation field.
Hellwarrior: strain ↓ 99.58%
Bouncing-balls deformation field.
Bouncing-balls: strain ↓ 99.90%
Jumping-jacks deformation field.
Jumping-jacks: strain ↓ 99.84%
Stand-up deformation field.
Stand-up: strain ↓ 99.82%
Mutant deformation field.
Mutant: strain ↓ 99.64%
Hook deformation field.
Hook: strain ↓ 99.59%

Each panel shows (left) canonical cloud colored by displacement magnitude, (middle) a subsampled quiver of u(x, t=0.5), (right) the per-Gaussian strain histogram. Baseline is bimodal with heavy tails; EER collapses the distribution by two orders of magnitude. This is the direct mechanism behind EER's overfitting reduction.

Strain reduction on every scene

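| Scene | Baseline ε | EER ε | Reduction |
|---|---|---|---|
| bouncingballs | 2.835 | 0.00296 | 99.90% |
| hellwarrior | 5.785 | 0.02408 | 99.58% |
| hook | 2.627 | 0.01090 | 99.59% |
| jumpingjacks | 6.772 | 0.01106 | 99.84% |
| lego | 1.573 | 0.00594 | 99.62% |
| mutant | 1.323 | 0.00481 | 99.64% |
| standup | 3.686 | 0.00667 | 99.82% |
| trex | 3.715 | 0.00738 | 99.80% |
| mean (n=8) | 3.539 | 0.00922 | 99.72% |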

Measured at iter 20,000 on trained 4DGS checkpoints. Strain ε is the mean over k=8 canonical neighbors of ‖u_i − u_j‖² / ‖x_i − x_j‖², averaged over 4 timesteps (t=0, 0.25, 0.5, 0.75).
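As a quick check on how the Reduction column and the mean row relate, the short script below recomputes them from the tabulated strain values. It assumes the reduction is defined as 1 − (EER ε / baseline ε), which the source does not state explicitly but which reproduces every row to two decimals.

```python
# (scene, baseline strain, EER strain), copied from the table above
rows = [
    ("bouncingballs", 2.835, 0.00296),
    ("hellwarrior", 5.785, 0.02408),
    ("hook", 2.627, 0.01090),
    ("jumpingjacks", 6.772, 0.01106),
    ("lego", 1.573, 0.00594),
    ("mutant", 1.323, 0.00481),
    ("standup", 3.686, 0.00667),
    ("trex", 3.715, 0.00738),
]


def strain_reduction_pct(baseline, eer):
    """Assumed definition: percentage drop in strain relative to baseline."""
    return 100.0 * (1.0 - eer / baseline)


per_scene = [strain_reduction_pct(b, e) for _, b, e in rows]
mean_of_reductions = sum(per_scene) / len(per_scene)

# Alternative aggregate: reduction computed from the mean strains instead.
mean_baseline = sum(b for _, b, _ in rows) / len(rows)
mean_eer = sum(e for _, _, e in rows) / len(rows)
reduction_of_means = strain_reduction_pct(mean_baseline, mean_eer)

for (scene, _, _), pct in zip(rows, per_scene):
    print(f"{scene:14s} {pct:6.2f}%")
print(f"mean of per-scene reductions: {mean_of_reductions:.2f}%")
print(f"reduction of mean strains:    {reduction_of_means:.2f}%")
```

Under that assumed definition, the 99.72% in the mean row corresponds to averaging the per-scene reductions; computing the reduction from the mean strains gives a slightly different figure.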

EER: The Paradigm Shift

EER three-panel analysis.
(a) EER λ sweep: consistent gap reduction across scenes. (b) EER increases final Gaussian count — the reverse of capacity control. (c) Per-scene gap reduction: consistent across all 8 scenes, including the pathological Lego and Hellwarrior.
Combination additivity plot.
Combinations are super-additive: GAD+EER exceeds the sum of individual reductions, confirming capacity and coherence target orthogonal failure modes.

Real-World Validation (HyperNeRF — 5 scenes)

EER transfers to real monocular video, in the regime where there is overfitting to remove. On 5 HyperNeRF scenes, with 4DGS and the same λ=0.05 tuned on synthetic D-NeRF (no per-dataset re-tuning), EER reduces the gap on the 3 high-baseline-gap scenes (> 4 dB) at near-zero quality cost; on the 2 low-baseline-gap scenes (< 2 dB) it is approximately neutral, as expected when the deformation field is already coherent:

| Scene | Baseline gap | EER gap | Reduction | ΔTest PSNR (dB) |
|---|---|---|---|---|
| chickchicken | 5.48 dB | 4.61 dB | +15.9% | −0.20 |
| slice-banana | 5.89 dB | 5.40 dB | +8.3% | +0.03 |
| vrig-3dprinter | 4.49 dB | 3.41 dB | +24.0% | +0.11 |
| high-gap subset mean (n=3) | 5.29 dB | 4.47 dB | +16.1% | −0.02 |
| vrig-peel-banana | 0.89 dB | 0.83 dB | +6.6% | −0.23 |
| vrig-broom2 | 1.81 dB | 1.83 dB | −1.2% | −0.21 |
| full mean (n=5) | 3.71 dB | 3.22 dB | +11.0% | −0.10 |

4DGS on HyperNeRF, 14K iterations (stock config), RTX 3070. Both vrig-peel-banana and vrig-broom2 have baseline gaps below 2 dB (at the floor of measurable improvement); reductions on these scenes are within reproduction noise. The high-gap subset (chickchicken, slice-banana, vrig-3dprinter) is the regime where EER clearly helps: +16.1% mean reduction at −0.02 dB cost, effectively free. The coherence finding survives noisy poses and non-Lambertian materials in the regime where the optimizer has overfitting to remove. EER's k-NN cost scales with cloud size; we could not extend to scenes where the cloud grows beyond ~100K Gaussians on consumer hardware (see Limitations).

Cross-Architecture Validation (Deformable-3DGS)

Main experiments are on 4DGS (HexPlane deformation). We ported EER and GAD to Deformable-3DGS (MLP deformation) and ran baseline + EER on three D-NeRF scenes for 20K iterations.

Phase 1: direct-transfer test at D-NeRF-tuned λ=0.05

| Scene | Baseline gap | EER λ=0.05 gap | Reduction | ΔPSNR |
|---|---|---|---|---|
| lego | 13.15 dB | 13.56 dB | −3.1% | −0.02 dB |
| trex | 1.50 dB | 1.81 dB | −20.8% | −0.38 dB |
| hellwarrior | 4.08 dB | 3.87 dB | +5.2% | −0.22 dB |

Direct transfer at λ=0.05 is poor (mean −6% reduction). Why? Deformable-3DGS trains with L1+0.2·(1−SSIM) vs. 4DGS's pure L1, so the loss magnitude is roughly 3× larger and λ=0.05 therefore under-regularizes. Our dimensional-analysis note (paper §6.2) predicts the correct λ for Deformable-3DGS is ≈ 0.15–0.30. Testing this directly:

Phase 2: λ sweep on Deformable-3DGS Lego (dimensional-analysis test)

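| λ | Gap (dB) | Train PSNR | Test PSNR | ΔTest | Reduction |
|---|---|---|---|---|---|
| 0 (baseline) | 13.15 | 38.38 | 25.23 | | |
| 0.05 | 13.56 | 38.77 | 25.21 | −0.02 | −3.1% |
| 0.15 | 10.23 | 35.55 | 25.33 | +0.10 | +22.3% |
| 0.30 | 8.26 | 33.60 | 25.34 | +0.11 | +37.2% |
| 0.60 | 7.82 | 33.21 | 25.39 | +0.16 | +40.6% |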
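The Gap and Reduction columns of this sweep can be recomputed from the train and test PSNR columns. The sketch below assumes gap = train PSNR − test PSNR and reduction = (baseline gap − gap) / baseline gap; these definitions are inferred from the numbers rather than quoted from the paper, and they reproduce the table to within rounding of the two-decimal PSNRs.

```python
# (lambda, train PSNR, test PSNR) for Deformable-3DGS Lego, copied from the sweep table
sweep = [
    (0.00, 38.38, 25.23),
    (0.05, 38.77, 25.21),
    (0.15, 35.55, 25.33),
    (0.30, 33.60, 25.34),
    (0.60, 33.21, 25.39),
]


def gap_db(train_psnr, test_psnr):
    """Train-test generalization gap in dB (assumed definition)."""
    return train_psnr - test_psnr


def gap_reduction_pct(baseline_gap, gap):
    """Positive when the gap shrinks relative to baseline (assumed definition)."""
    return 100.0 * (baseline_gap - gap) / baseline_gap


baseline_gap = gap_db(sweep[0][1], sweep[0][2])
results = {}
for lam, train, test in sweep[1:]:
    gap = gap_db(train, test)
    results[lam] = (gap, gap_reduction_pct(baseline_gap, gap), test - sweep[0][2])

for lam, (gap, reduction, delta_test) in results.items():
    print(f"lambda={lam:.2f}  gap={gap:5.2f} dB  reduction={reduction:+5.1f}%  dTest={delta_test:+.2f} dB")
```

Read this way, the sweep shows train PSNR falling by several dB as λ grows while test PSNR stays within about 0.2 dB of baseline, so nearly all of the gap reduction comes from the training side.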

Cross-scene confirmation at λ=0.30

To confirm the sweep is not Lego-specific, we replicated λ=0.30 on Hellwarrior:

| Scene | Baseline gap (dB) | EER λ=0.30 gap (dB) | Reduction | ΔTest (dB) |
|---|---|---|---|---|
| Lego | 13.15 | 8.26 | +37.2% | +0.11 |
| Hellwarrior | 4.08 | 3.54 | +13.2% | −2.44 |

The coherence mechanism transfers across deformation architectures and across scenes; the hyperparameter requires per-architecture (and to a lesser extent per-scene) calibration, exactly as the dimensional-analysis note predicted. On Hellwarrior the quality cost is larger at λ=0.30 (−2.44 dB); a smaller λ like 0.05 already gives +5.2% gap reduction at only −0.22 dB.

BibTeX

@article{droby2026monodygs,
  author  = {Ahmad Droby},
  title   = {Incoherent Deformation, Not Capacity: Diagnosing and
             Mitigating Overfitting in Dynamic Gaussian Splatting},
  journal = {arXiv preprint},
  year    = {2026}
}