CoGuide: Contrastive Diffusion Guidance for Spatial Inverse Problems

In submission

Code available at: GitHub

(TL;DR) We reconstruct indoor floorplans from sparse user trajectories by steering a diffusion prior with a (surrogate) likelihood score from a contrastive trajectory–layout embedding space.

Abstract

We consider the inverse problem of reconstructing the spatial layout of a place, for example a home floorplan, from a user’s movements inside that layout. Direct inversion is ill-posed since many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. While generative inverse solvers are an active research area, we find that the forward operator in our problem poses new challenges. The path-planning process inside a floorplan is a non-invertible, non-differentiable function, and it causes instability when optimizing with the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories, and is more robust than differentiable-planner baselines and guided-diffusion methods.

Introduction

In our setting, the measurement arises from a forward process that reflects human path planning. Because the path-planning operator is non-linear, non-differentiable, and only partially observed, small changes in layout can cause drastic changes in the planned path. The figure below illustrates this: the left panel shows a floorplan with the measured trajectory, while the right panel shows how planners such as A*, Neural A*, TransPath, and DiPPeR can select very different paths after a slight layout change (bottom row).

Problem overview: floorplan and paths
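To make the sensitivity of the forward operator concrete, here is a minimal sketch using a plain breadth-first-search planner on a toy grid. The grid, the start and goal cells, and the helper `shortest_path_length` are illustrative assumptions and are not the planners or data used in the paper. Closing a single doorway cell forces the shortest path through a distant opening, so a one-pixel layout change produces a discontinuous jump in the measurement.

```python
from collections import deque

def shortest_path_length(grid, start, goal):
    """Breadth-first search on a 4-connected grid (1 = wall, 0 = free).
    Returns the number of steps, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# Toy layout: a wall across the middle row with doorways at columns 1 and 6.
layout = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
start, goal = (0, 0), (2, 0)
open_len = shortest_path_length(layout, start, goal)

# Change a single cell: close the nearby doorway at column 1.
changed = [row[:] for row in layout]
changed[1][1] = 1
changed_len = shortest_path_length(changed, start, goal)

print(open_len, changed_len)  # 4 14
```

The path length jumps from 4 to 14 steps after a one-cell edit, which is the kind of non-smooth behavior that makes a pixel-space likelihood gradient unreliable.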

Method

We address the challenges posed by the forward operator by guiding a diffusion prior with a surrogate likelihood from a smooth, contrastive trajectory–layout space instead of the direct pixel-space likelihood. The resulting contrastive space (shown right) clusters compatible trajectory–layout pairs while pushing apart incompatible pairs. This yields a smoother likelihood score that is more amenable to gradient-based optimization. The contrastive model is trained with a combination of Supervised Contrastive Loss (SupCon) and an alignment loss. We also incorporate Adam and DDIM into the reverse-time update to improve convergence and speed. The complete CoGuide algorithm is shown below.

Contrastive trajectory–layout embedding
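As a rough illustration of the training objective, the sketch below combines a SupCon-style loss with an alignment term in NumPy. Several details are assumptions made for the example and are not taken from the paper: each matched trajectory–layout pair is given its own label, the alignment term is the mean squared distance between matched normalized embeddings, and the names `supcon_loss`, `alignment_loss`, `contrastive_objective`, `weight`, and `temperature` are hypothetical.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over embeddings z of shape (N, D).
    Samples that share a label are positives for each other."""
    z = l2_normalize(z)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax over all other samples; the anchor itself is excluded.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / n_pos[valid]).mean())

def alignment_loss(z_traj, z_layout):
    """Assumed form: mean squared distance between matched, normalized pairs."""
    diff = l2_normalize(z_traj) - l2_normalize(z_layout)
    return float((diff ** 2).sum(axis=1).mean())

def contrastive_objective(z_traj, z_layout, weight=1.0, temperature=0.1):
    # Assumption: pair i shares label i, so a trajectory's only positive
    # is its own layout, and vice versa.
    n = z_traj.shape[0]
    z = np.concatenate([z_traj, z_layout], axis=0)
    labels = np.concatenate([np.arange(n), np.arange(n)])
    return supcon_loss(z, labels, temperature) + weight * alignment_loss(z_traj, z_layout)

rng = np.random.default_rng(0)
z_layout = rng.normal(size=(8, 16))                 # stand-in layout embeddings
z_traj = z_layout + 0.01 * rng.normal(size=(8, 16)) # nearly matched trajectories
matched = contrastive_objective(z_traj, z_layout)
shuffled = contrastive_objective(z_layout[::-1].copy(), z_layout)
```

In this toy setup the objective is low when each trajectory embedding sits next to its own layout embedding and high when the pairing is shuffled, which is the geometry the contrastive space is meant to have.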

Algorithm

CoGuide algorithm
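For intuition about the guided update, Bayes' rule splits the posterior score as \(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t|\mathbf{y}) = \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t) + \nabla_{\mathbf{x}_t} \log p(\mathbf{y}|\mathbf{x}_t)\), and the guidance replaces the second term with a surrogate computed in the embedding space. The sketch below is a simplified stand-in and not the CoGuide algorithm itself: it assumes a linear layout encoder `W`, a fixed trajectory embedding `v`, cosine similarity as the surrogate log-likelihood, and a gradient taken with respect to the denoised estimate rather than back-propagated through the denoiser. All function and parameter names are hypothetical.

```python
import numpy as np

def cosine_and_grad(x, W, v):
    """Cosine similarity between the layout embedding W @ x and the
    trajectory embedding v, plus its gradient with respect to x."""
    u = W @ x
    u_norm, v_norm = np.linalg.norm(u), np.linalg.norm(v)
    s = float(u @ v / (u_norm * v_norm))
    grad_u = (v / v_norm - s * u / u_norm) / u_norm
    return s, W.T @ grad_u

def guided_ddim_step(x_t, eps_hat, abar_t, abar_prev, W, v, scale):
    """One deterministic DDIM step (eta = 0) followed by a guidance nudge.
    abar_t and abar_prev are cumulative alpha products at t and t-1."""
    # Denoised estimate of x_0 implied by the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    # Surrogate likelihood gradient, evaluated at the denoised estimate.
    _, grad = cosine_and_grad(x0_hat, W, v)
    x_prev = np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat
    return x_prev + scale * grad

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))     # hypothetical linear layout encoder
v = rng.normal(size=4)          # hypothetical trajectory embedding
x_t = rng.normal(size=6)        # current noisy sample (flattened)
eps_hat = rng.normal(size=6)    # stand-in for the denoiser's noise prediction
x_prev = guided_ddim_step(x_t, eps_hat, abar_t=0.5, abar_prev=0.7, W=W, v=v, scale=0.1)
```

In a real pipeline the encoder would be the trained contrastive network and the gradient would come from automatic differentiation; the analytic gradient here only keeps the example dependency-free.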

Reverse Denoising Process

Below, we show several examples of the reverse denoising process under CoGuide, which samples from the posterior \(p(\mathbf{x}|\mathbf{y})\). Each image has three columns: the leftmost column is the ground truth floorplan \(\mathbf{x}\), the middle column is the measured trajectory \(\mathbf{y}\), and the rightmost column shows the denoising process \(\mathbf{x}_T \rightarrow \mathbf{x}_0\), from pure noise to the final output.

Reverse denoising example 1
Reverse denoising example 2
Reverse denoising example 3
Reverse denoising example 4
Reverse denoising example 5
Reverse denoising example 6
Reverse denoising example 7
Reverse denoising example 8

Results

Qualitative

CoGuide produces floorplans that align with the measured trajectory while avoiding common artifacts from planner-guided methods (see below). DPS+planner variants, DiffPIR, and DMPlug frequently violate trajectory consistency or introduce spurious walls. Although CFG can score well on metrics, its outputs are not always visually faithful and can contain artifacts. CoGuide yields cleaner, trajectory-consistent layouts across diverse test scenes.

Qualitative results set 1
Qualitative results set 2

Quantitative

In the table shown below, we report F1/IoU (mean ± std) across three trajectory-density regimes (low, medium, high). CoGuide leads in the sparse and moderate settings, surpassing CFG and DPS variants, and remains competitive in the dense regime where CFG is strongest. Overall, CoGuide consistently outperforms DPS-based (differentiable) planners, DiffPIR, and DMPlug.

Main quantitative results
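For reference, the two reported metrics can be computed on binary masks as in the sketch below. Treating wall pixels as the positive class, the helper name `f1_and_iou`, and the tiny example masks are assumptions for illustration; the paper's exact evaluation protocol is not reproduced here.

```python
import numpy as np

def f1_and_iou(pred, target):
    """F1 and IoU between two binary masks (assumed: 1 = wall pixel)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # predicted wall, truly wall
    fp = int(np.sum(pred & ~target))   # predicted wall, truly free
    fn = int(np.sum(~pred & target))   # missed wall
    denom = tp + fp + fn
    if denom == 0:                     # both masks empty: perfect agreement
        return 1.0, 1.0
    return 2 * tp / (2 * tp + fp + fn), tp / denom

target = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
f1, iou = f1_and_iou(pred, target)     # tp=2, fp=1, fn=1

# Aggregate over a (toy) test set as mean and standard deviation.
pairs = [(pred, target), (target, target)]
scores = np.array([f1_and_iou(p, t) for p, t in pairs])
mean, std = scores.mean(axis=0), scores.std(axis=0)
```

With two true positives, one false positive, and one false negative, F1 is 2/3 and IoU is 1/2, which shows why IoU is always the stricter of the two.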

Ablations

Noise Robustness

To model real-world localization errors, we inject Gaussian noise during trajectory generation and sweep the noise standard deviation across densities. The resulting comparison plot is shown below. As noise increases, performance degrades gracefully; higher trajectory density mitigates the impact and sustains stronger accuracy.

Gaussian noise ablation and performance
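One simple way to emulate such localization error is to add independent Gaussian noise to every trajectory point and repeat for several standard deviations. This is a sketch under that assumption; the paper states only that noise is injected during trajectory generation, and the helper `add_localization_noise`, the straight-line trajectory, and the sigma values are hypothetical.

```python
import numpy as np

def add_localization_noise(trajectory, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with standard deviation sigma
    (same units as the coordinates) to each (x, y) point of an (N, 2) array."""
    trajectory = np.asarray(trajectory, dtype=float)
    return trajectory + rng.normal(0.0, sigma, size=trajectory.shape)

rng = np.random.default_rng(0)
# Toy trajectory: a straight walk along y = 3.
clean = np.stack([np.linspace(0.0, 10.0, 50), np.full(50, 3.0)], axis=1)

# Sweep the noise level, as in a robustness ablation.
noisy_by_sigma = {s: add_localization_noise(clean, s, rng) for s in (0.0, 0.1, 0.5)}
```

Each noisy trajectory would then be fed to the sampler in place of the clean one, and the metrics recomputed per noise level and density.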

Optimizers and Samplers

We incorporate Adam into the reverse-time diffusion and compare against standard gradient descent under both DDPM and DDIM samplers. The table below demonstrates that Adam consistently improves convergence and final metrics across samplers, while DDIM achieves comparable accuracy to DDPM with fewer steps.

  • DDPM vs. DDIM: speed–accuracy trade-off is minimal under our guidance.
  • Adam vs. SGD: Adam converges faster; final metrics are comparable.

Comparison across Adam/SGD with DDPM/DDIM
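The idea of carrying Adam state across reverse-time steps can be sketched as follows. This is an illustrative assumption about how the optimizer might be wired in, not the paper's exact update: the class `AdamGuidance`, its hyperparameters, and the badly scaled quadratic standing in for the surrogate log-likelihood are all hypothetical.

```python
import numpy as np

class AdamGuidance:
    """Keeps Adam moment estimates across reverse-time steps and turns a raw
    guidance gradient into a per-coordinate normalized ascent step."""

    def __init__(self, shape, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)   # first-moment (mean) estimate
        self.v = np.zeros(shape)   # second-moment (uncentered variance) estimate
        self.k = 0                 # number of updates so far

    def step(self, grad):
        self.k += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.k)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.k)
        return self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy surrogate log-likelihood: -0.5 * sum(c * (x - target)^2), badly scaled.
target = np.array([1.0, -2.0])
curvature = np.array([100.0, 0.01])
x = np.zeros(2)
adam = AdamGuidance(x.shape, lr=0.05)
first_update = None
for _ in range(200):
    grad = -curvature * (x - target)   # ascent direction on the toy objective
    update = adam.step(grad)
    if first_update is None:
        first_update = update.copy()
    x = x + update                     # in a sampler this follows the DDPM/DDIM step
```

The first update has magnitude close to `lr` in both coordinates even though the raw gradients differ by a factor of 5000. A plain gradient step with a single step size would need to stay below 0.02 to remain stable in the steep coordinate, which makes progress in the flat coordinate very slow.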

Uncertainty vs. Trajectory Density

We also draw multiple samples from the posterior and compute the variance of the distance transform (with a small translation tolerance) to estimate spatial uncertainty (see figures below). As trajectory density increases (across rows), uncertainty shrinks (the maps become less red), highlighting where additional user-collected trajectories would be most informative in a user-in-the-loop setting.

Uncertainty visualization 1
Uncertainty visualization 2
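The uncertainty estimate described above can be approximated with a few lines of NumPy. The sketch makes several simplifying assumptions: it uses a brute-force Euclidean distance transform suitable only for small grids, it omits the translation tolerance, and the helper names `distance_to_walls` and `uncertainty_map` and the 5×5 toy masks are hypothetical.

```python
import numpy as np

def distance_to_walls(mask):
    """Euclidean distance from every pixel to the nearest wall pixel
    (mask == 1). Brute force, O(pixels * walls); needs at least one wall."""
    mask = np.asarray(mask, dtype=bool)
    pixels = np.argwhere(np.ones_like(mask))   # all (row, col) indices, row-major
    walls = np.argwhere(mask)                  # indices of wall pixels
    d = np.linalg.norm(pixels[:, None, :] - walls[None, :, :], axis=2)
    return d.min(axis=1).reshape(mask.shape)

def uncertainty_map(samples):
    """Per-pixel variance of the distance transform across posterior samples."""
    dts = np.stack([distance_to_walls(s) for s in samples])
    return dts.var(axis=0)

a = np.zeros((5, 5), dtype=int)
a[:, 0] = 1                      # every sample has a wall on the left edge
b = a.copy()
b[2, 3] = 1                      # one sample adds an interior wall pixel

agree = uncertainty_map([a, a, a])    # identical samples: zero variance
differ = uncertainty_map([a, a, b])   # disagreement near (2, 3)
```

When all samples agree the map is zero everywhere, and when one sample places an extra wall the variance concentrates around that location, which is the behavior the heatmaps visualize.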