Abstract
We consider the inverse problem of reconstructing the spatial layout of a place, such as a home floorplan, from a user’s movements inside that layout. Direct inversion is ill-posed because many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. Although generative inverse solvers are an active area of research, we find that the forward operator in our problem poses new challenges. The path-planning process inside a floorplan is a non-invertible, non-differentiable function, which makes optimization with the likelihood score unstable. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories and is more robust than differentiable-planner baselines and guided-diffusion methods.
Introduction
In our setting, the measurement arises from a forward process that reflects human path planning. Because the path-planning operator is non-linear, non-differentiable, and only partially observed, small changes in layout can cause drastic changes in the planned path. The figure below illustrates this: the left panel shows a floorplan with the measured trajectory, while the right panel shows how planners such as A*, Neural A*, TransPath, and DiPPeR can select very different paths after only a slight layout change (bottom row).
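The sensitivity described above can be reproduced on a toy grid. The sketch below is only an illustration, not the planner setup used in the experiments: the grid, the `astar` helper, and the two layouts are hypothetical. A wall with two doors separates two rooms, and closing a single door cell more than doubles the length of the shortest path.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (wall) cells, or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = g + 1
                if new_cost < cost.get(nxt, float("inf")):
                    cost[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None

# Hypothetical layout: a wall (row 2) with doors at columns 3 and 6.
two_doors = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# The same layout with a single cell changed: the door at column 3 is closed.
one_door = [row[:] for row in two_doors]
one_door[2][3] = 1

path_before = astar(two_doors, (0, 3), (4, 3))
path_after = astar(one_door, (0, 3), (4, 3))
print(len(path_before), len(path_after))
```

A one-cell edit to the layout reroutes the whole path through the remaining door, which is the kind of discontinuity that makes a pixel-space likelihood gradient unreliable.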
Method
We address the challenges posed by the forward operator by guiding a diffusion prior with a surrogate likelihood defined in a smooth, contrastive trajectory–layout space instead of the direct pixel-space likelihood. The resulting contrastive space (shown right) clusters compatible trajectory–layout pairs while pushing incompatible pairs apart. This yields a smoother likelihood score that is more amenable to gradient-based optimization. The contrastive model is trained with a combination of a Supervised Contrastive (SupCon) loss and an alignment loss. We also incorporate Adam and DDIM into the reverse-time update to improve convergence and speed. The complete CoGuide algorithm is shown below.
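For readers unfamiliar with SupCon, the following is a minimal sketch of the standard supervised contrastive loss on a small batch of embeddings. It is not the CoGuide training code: the function names, the temperature value, and the labelling scheme (a floorplan embedding and its compatible trajectory embeddings sharing one label) are assumptions made for illustration, and the alignment loss is omitted.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, averaged over anchors with a positive.

    Assumption for this sketch: embeddings of a floorplan and of its compatible
    trajectories carry the same label.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Log-sum-exp over all other samples, stabilised by subtracting the max.
        peak = max(sims.values())
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total -= sum(sims[p] - log_denominator for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0

# Toy batch: two tight clusters of 2-D embeddings.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supcon_loss(batch, labels=[0, 0, 1, 1])     # labels agree with clusters
loss_mismatched = supcon_loss(batch, labels=[0, 1, 0, 1])  # labels cut across clusters
print(loss_matched, loss_mismatched)
```

The loss is small when same-label embeddings are already close and large when they are far apart, which is the pressure that clusters compatible pairs and separates incompatible ones.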
Algorithm
Reverse Denoising Process
Below, we show several examples of the reverse denoising process under CoGuide, which samples from the posterior \(p(\mathbf{x}|\mathbf{y})\). Each image has three columns: the leftmost column is the ground-truth floorplan \(\mathbf{x}\), the middle column is the measured trajectory \(\mathbf{y}\), and the rightmost column shows the denoising process \(\mathbf{x}_T \rightarrow \mathbf{x}_0\), from pure noise to the final output.
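For reference, posterior sampling of this kind rests on the Bayes decomposition of the score at each noise level \(t\):

\[
\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{y}) = \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t) + \nabla_{\mathbf{x}_t} \log p(\mathbf{y} \mid \mathbf{x}_t)
\]

The first term comes from the diffusion prior; the second is the likelihood score that the embedding space is meant to approximate. One possible way to write such a surrogate, using notation that is assumed here and not taken from the paper (\(f\) and \(g\) are the floorplan and trajectory encoders, \(d\) is a distance in the embedding space, \(\hat{\mathbf{x}}_0(\mathbf{x}_t)\) is the denoised estimate, and \(\lambda\) is a guidance scale), is:

\[
\nabla_{\mathbf{x}_t} \log p(\mathbf{y} \mid \mathbf{x}_t) \approx -\lambda \, \nabla_{\mathbf{x}_t} \, d\big(f(\hat{\mathbf{x}}_0(\mathbf{x}_t)),\, g(\mathbf{y})\big)
\]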
Results
Qualitative
CoGuide produces floorplans that align with the measured trajectory while avoiding artifacts common to planner-guided methods (see below). DPS+planner variants, DiffPIR, and DMPlug frequently violate trajectory consistency or introduce spurious walls. Although CFG can score well on metrics, its outputs are not always visually faithful and can contain artifacts. CoGuide yields cleaner, trajectory-consistent layouts across diverse test scenes.
Quantitative
In the table below, we report F1/IoU (mean ± std) across three trajectory-density regimes (low, medium, high). CoGuide leads in the low- and medium-density regimes, surpassing CFG and the DPS variants, and remains competitive in the high-density regime, where CFG is strongest. Overall, CoGuide consistently outperforms the DPS variants with differentiable planners, DiffPIR, and DMPlug.
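As a reminder of how the two metrics relate, here is a minimal sketch of F1 and IoU on binary layout masks. It assumes that wall pixels are the positive class (value 1) and that predictions are already binarised; the actual evaluation protocol may differ, and `f1_and_iou` is a name introduced only for this example.

```python
from statistics import mean, stdev

def f1_and_iou(predicted, truth):
    """F1 and IoU of two equally sized binary masks (lists of rows of 0/1)."""
    tp = fp = fn = 0
    for predicted_row, truth_row in zip(predicted, truth):
        for p, t in zip(predicted_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks are empty, so they agree perfectly
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

# Toy example: 2 true positives, 1 false positive, 1 false negative.
predicted = [[1, 1, 0],
             [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
f1, iou = f1_and_iou(predicted, truth)

# A "mean ± std" entry aggregates per-scene scores (values here are made up).
per_scene_f1 = [f1, 1.0, 0.5]
summary = (mean(per_scene_f1), stdev(per_scene_f1))
```

On a single mask the two metrics are monotonically related by F1 = 2·IoU / (1 + IoU), so they rank individual predictions identically; their averages over scenes can still differ in ordering.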
Ablations
Noise Robustness
To model real-world localization errors, we inject Gaussian noise during trajectory generation and sweep the noise standard deviation across densities. The resulting comparison plot is shown below. As noise increases, performance degrades gracefully; higher trajectory density mitigates the impact and sustains stronger accuracy.
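A simple way to emulate this kind of perturbation is to add zero-mean Gaussian noise to each waypoint. The sketch below makes that simplifying assumption (the noise is applied to finished waypoints, which may not match exactly how it enters trajectory generation), and the function name and sigma values are hypothetical.

```python
import random

def add_localization_noise(trajectory, sigma, seed=None):
    """Return a copy of a list of (x, y) waypoints with i.i.d. Gaussian noise.

    sigma is the noise standard deviation, in the same units as the waypoints.
    """
    rng = random.Random(seed)
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        for x, y in trajectory
    ]

clean = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]

# Sweep over a few illustrative noise levels.
noisy_by_sigma = {
    sigma: add_localization_noise(clean, sigma, seed=0)
    for sigma in (0.0, 0.1, 0.5)
}
```

Fixing the seed keeps the sweep reproducible, so differences between noise levels are not confounded by different random draws.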
Optimizers and Samplers
We incorporate Adam into the reverse-time diffusion and compare against standard gradient descent under both DDPM and DDIM samplers. The table below demonstrates that Adam consistently improves convergence and final metrics across samplers, while DDIM achieves comparable accuracy to DDPM with fewer steps.
- DDPM vs. DDIM: speed–accuracy trade-off is minimal under our guidance.
- Adam vs. SGD: Adam converges faster; final metrics are comparable.
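To make the idea of Adam inside the reverse-time update concrete, the sketch below keeps Adam moment estimates across guidance steps. It is a toy under strong assumptions: there is no denoiser or sampler, the guidance gradient is that of a simple quadratic toward a hypothetical `target`, and all names and hyperparameters are illustrative, not the CoGuide implementation.

```python
import math

class AdamState:
    """Adam moment estimates that persist across reverse-time steps."""

    def __init__(self, dim, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim  # first-moment (mean) estimate
        self.v = [0.0] * dim  # second-moment (uncentred variance) estimate
        self.t = 0

    def step(self, grad):
        """Return the Adam update (to be added to x) for one guidance gradient."""
        self.t += 1
        update = []
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            update.append(-self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return update

def guided_steps(x, target, num_steps=50):
    """Toy loop: only the guidance part of a reverse step is modelled."""
    adam = AdamState(len(x))
    for _ in range(num_steps):
        # Stand-in guidance gradient of 0.5 * ||x - target||^2.
        grad = [xi - ti for xi, ti in zip(x, target)]
        x = [xi + di for xi, di in zip(x, adam.step(grad))]
    return x

start, target = [2.0, -1.0], [0.0, 0.0]
final = guided_steps(start, target)
```

In a full sampler the Adam update would replace the plain gradient step on the guidance term at each reverse step, with the DDPM or DDIM update supplying the prior part.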
Uncertainty vs. Trajectory Density
We also draw multiple samples from the posterior and compute the variance of their distance transforms (with a small translation tolerance) to estimate spatial uncertainty (see figure below). As trajectory density increases (across rows), uncertainty shrinks (the maps become less red). The remaining high-uncertainty regions indicate where additional user-collected trajectories would be most informative in a user-in-the-loop setting.
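The uncertainty estimate can be sketched as a per-pixel variance over distance transforms of the sampled layouts. The version below is a simplified illustration: it uses a city-block distance transform computed by breadth-first search, assumes every sample contains at least one wall pixel (value 1), and omits the translation tolerance mentioned above. The function names are hypothetical.

```python
from collections import deque
from statistics import pvariance

def distance_transform(layout):
    """City-block distance from every cell to the nearest wall cell (value 1)."""
    rows, cols = len(layout), len(layout[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if layout[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source BFS: each cell is reached first from its nearest wall.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def uncertainty_map(samples):
    """Per-pixel variance of the distance transform across posterior samples."""
    transforms = [distance_transform(sample) for sample in samples]
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [pvariance([t[r][c] for t in transforms]) for c in range(cols)]
        for r in range(rows)
    ]

# Samples that agree on the wall position give zero uncertainty everywhere.
agreeing = uncertainty_map([[[1, 0, 0]], [[1, 0, 0]]])
# Samples that disagree give high variance where the wall position is unclear.
disagreeing = uncertainty_map([[[1, 0, 0]], [[0, 0, 1]]])
```

Using the distance transform instead of raw pixels makes the variance degrade smoothly with wall displacement, so walls that shift by a pixel between samples register as low, not maximal, uncertainty.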