We propose a cascaded model that generates high-resolution information in tabular data based on low-resolution features, including latent variables derived from numerical features. We impose a data-dependent coupling that provably reduces the transport cost bound. Together with learnable, non-linear probability paths, this greatly improves the realism of the generated data.
@inproceedings{mueller2026,title={Cascaded {{Flow Matching}} for {{Heterogeneous Tabular Data}} with {{Mixed-Type Features}}},booktitle={Under review},author={Mueller, Markus and Gruber, Kathrin and Fok, Dennis},year={2026},idea={We propose a cascaded model that generates high-resolution information in tabular data based on low-resolution features, including latent variables derived from numerical features. We impose a data-dependent coupling that provably reduces the transport cost bound. Together with learnable, non-linear probability paths, this greatly improves the realism of the generated data.}}
The main idea was to develop a diffusion model that integrates continuous and categorical features effectively and efficiently. We aimed to unify both feature types in continuous space and to balance their losses to avoid implicit importance weights that impact sample quality.
@inproceedings{mueller2025,title={Continuous {{Diffusion}} for {{Mixed-Type Tabular Data}}},booktitle={International {{Conference}} on {{Learning Representations}}},author={Mueller, Markus and Gruber, Kathrin and Fok, Dennis},year={2025},idea={The main idea was to develop a diffusion model that integrates continuous and categorical features effectively and efficiently. We aimed to unify both feature types in continuous space and to balance their losses to avoid implicit importance weights that impact sample quality.}}