Khai Nguyen

Hi! I’m Khai, a final-year Ph.D. candidate in the Department of Statistics and Data Sciences at the University of Texas at Austin. I am fortunate to be advised by Professor Nhat Ho and Professor Peter Müller, and to be associated with the Institute for Foundations of Machine Learning (IFML). I graduated from Hanoi University of Science and Technology with a Bachelor’s degree in Computer Science. Before joining UT Austin, I was an AI Research Resident at VinAI Research (acquired by Qualcomm AI Research) under the supervision of Dr. Hung Bui.

I’m always open to collaborations, discussions, and exploring new opportunities. Don’t hesitate to reach out if you’re interested in my research or want to discuss potential research projects.

GIF description

(This video was created using my proposed energy-based sliced Wasserstein distance.)

Research: My research focuses on both fundamental and applied problems in probabilistic machine learning, deep learning, and statistics.

1. Computational Optimal Transport. My research makes Optimal Transport scalable in statistical inference (low time complexity, low space complexity, low sample complexity) via the one-dimensional projection approach known as sliced optimal transport (sliced Wasserstein distance). My work focuses on four key aspects of sliced Wasserstein: numerical approximation, projecting operator, quantile estimation, and slicing distribution.

2. Efficiency, Scalability, Interpretability, and Trustworthiness of Machine Learning. My research enhances the performance of 3D vision models, speeds up the training of generative models, adapts prediction models to new unseen domains, explains multimodal transferable representation, and ensures fairness and robustness in learning processes.

3. Machine Learning for Biological Sciences. My research quantifies interactions between cell types which drive various physiological and pathological processes.
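
As a rough illustration of the one-dimensional projection approach mentioned in item 1, the sketch below estimates a sliced Wasserstein distance between two point clouds by Monte Carlo. It assumes both clouds have the same number of equally weighted points; the function name `sliced_wasserstein` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two equal-size point clouds.

    x, y: arrays of shape (n_samples, dim) with the same n_samples.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Uniform directions on the unit sphere: normalize Gaussian vectors.
    theta = rng.standard_normal((n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both clouds onto every direction: (n_samples, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D with equal weights, the optimal coupling matches sorted samples.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # W_p^p for each direction, then average over directions.
    w_pp = np.mean(np.abs(x_sorted - y_sorted) ** p, axis=0)
    return float(np.mean(w_pp) ** (1.0 / p))
```

Each direction costs one sort, so the estimate scales as O(L n log n) for L projections and n points, which is the source of the low time and space complexity mentioned above.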

News

May 01, 2025	1 paper Lightspeed Geometric Dataset Distance via Sliced Optimal Transport is accepted at ICML 2025.
Mar 07, 2025	My proposal Summarizing Bayesian Nonparametric Mixture Posterior - Sliced Optimal Transport Metrics for Gaussian Mixtures is accepted at The Bayesian Young Statisticians Meeting 2025 as a talk in a session with discussion.
Feb 27, 2025	I’m thrilled to be awarded a travel grant for the International Conference on Bayesian Nonparametrics (BNP 14).
Feb 26, 2025	I’m thrilled to be awarded a UT Austin Continuing Fellowship, a merit-based fellowship awarded for academic achievements, research accomplishments, and potential for future contributions.
Feb 11, 2025	Our paper Towards Marginal Fairness Sliced Wasserstein Barycenter is selected as a spotlight at ICLR 2025.
Jan 26, 2025	My proposal Summarizing Bayesian Nonparametric Mixture Posterior - Sliced Optimal Transport Metrics for Gaussian Mixtures is accepted at the 14th International Conference on Bayesian Nonparametrics as a contributed talk.
Jan 22, 2025	1 paper Towards Marginal Fairness Sliced Wasserstein Barycenter is accepted at ICLR 2025.
Sep 26, 2024	1 paper Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions is accepted at NeurIPS 2024.
May 01, 2024	1 paper Sliced Wasserstein with Random-Path Projecting Directions is accepted at ICML 2024.
Feb 27, 2024	1 paper Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning is accepted at CVPR 2024.
Jan 19, 2024	2 papers Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts and On Parameter Estimation in Deviated Gaussian Mixture of Experts are accepted at AISTATS 2024.
Jan 16, 2024	4 papers Quasi-Monte Carlo for 3D Sliced Wasserstein - Spotlight Presentation, Sliced Wasserstein Estimation with Control Variates, Diffeomorphic Deformation via Sliced Wasserstein Distance Optimization for Cortical Surface Reconstruction, and Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation are accepted at ICLR 2024.
Sep 21, 2023	4 papers Energy-Based Sliced Wasserstein Distance, Markovian sliced Wasserstein distances: Beyond independent projections, Designing robust Transformers using robust kernel density estimation, and Minimax optimal rate for parameter estimation in multivariate deviated models are accepted at NeurIPS 2023.
Apr 24, 2023	1 paper Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction is accepted at ICML 2023.
Jan 20, 2023	1 paper Hierarchical Sliced Wasserstein Distance is accepted at ICLR 2023.
Sep 14, 2022	4 papers Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution, Amortized Projection Optimization for Sliced Wasserstein Generative Models, Improving Transformer with an Admixture of Attention Heads, and FourierFormer: Transformer Meets Generalized Fourier Integral Theorem are accepted at NeurIPS 2022.
Apr 24, 2022	2 papers Improving Mini-batch Optimal Transport via Partial Transportation and On Transportation of Mini-batches: A Hierarchical Approach are accepted at ICML 2022.
Jan 24, 2021	2 papers Distributional Sliced-Wasserstein and Applications to Generative Modeling - Spotlight Presentation and Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein are accepted at ICLR 2021.

Selected Preprints

Preprint

Streaming Sliced Optimal Transport

Khai Nguyen

Under Review, 2025

Sliced optimal transport (SOT) or sliced Wasserstein (SW) distance is widely recognized for its statistical and computational scalability. In this work, we further enhance the computational scalability by proposing the first method for computing SW from sample streams, called streaming sliced Wasserstein (Stream-SW). To define Stream-SW, we first introduce the streaming computation of the one-dimensional Wasserstein distance. Since the one-dimensional Wasserstein (1DW) distance has a closed-form expression, given by the absolute difference between the quantile functions of the compared distributions, we leverage quantile approximation techniques for sample streams to define the streaming 1DW distance. By applying streaming 1DW to all projections, we obtain Stream-SW. The key advantage of Stream-SW is its low memory complexity while providing theoretical guarantees on the approximation error. We demonstrate that Stream-SW achieves a more accurate approximation of SW than random subsampling, with lower memory consumption, in comparing Gaussian distributions and mixtures of Gaussians from streaming samples. Additionally, we conduct experiments on point cloud classification, point cloud gradient flows, and streaming change point detection to further highlight the favorable performance of Stream-SW.
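
To make the quantile-based closed form concrete, here is a minimal sketch that computes a one-dimensional Wasserstein distance from quantile functions evaluated on a fixed grid of levels. It is not the Stream-SW algorithm: it uses `numpy.quantile` on full samples, whereas a streaming method would replace those calls with an approximate quantile summary. The function name and the `n_levels` parameter are assumptions made for illustration.

```python
import numpy as np

def wasserstein_1d_from_quantiles(x, y, p=1, n_levels=1000):
    """Approximate W_p between two 1D samples via their quantile functions.

    Uses W_p^p = integral over t in (0, 1) of |F^{-1}(t) - G^{-1}(t)|^p,
    approximated with a midpoint rule on n_levels quantile levels.
    """
    # Midpoint grid on (0, 1) avoids the extreme levels 0 and 1.
    levels = (np.arange(n_levels) + 0.5) / n_levels
    qx = np.quantile(x, levels)  # empirical quantile function of x
    qy = np.quantile(y, levels)  # empirical quantile function of y
    return float(np.mean(np.abs(qx - qy) ** p) ** (1.0 / p))
```

Because only the two quantile functions are needed, the samples themselves do not have to be stored, and the two samples may have different sizes.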
Preprint

Bayesian Density-Density Regression with Application to Cell-Cell Communications

Khai Nguyen, Yang Ni, and Peter Mueller

Under Review, 2025

We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types ($l \neq r$) and each sample $i=1,\ldots,n$, we observe a pair of distributions $(F_{ei}, G_{ei})$ of gene expressions for ligands and receptors of cell types $l$ and $r$, respectively. The aim is to set up a regression of receptor distributions $G_{ei}$ given ligand distributions $F_{ei}$. A key challenge is that these distributions reside in distinct spaces of differing dimensions. We formulate the regression of multivariate densities on multivariate densities using a generalized Bayes framework with the sliced Wasserstein distance between fitted and observed distributions. Finally, we use inference under such regressions to define a directed graph for cell-cell communications.
Preprint

Summarizing Bayesian Nonparametric Mixture Posterior - Sliced Optimal Transport Metrics for Gaussian Mixtures

Khai Nguyen and Peter Mueller

Under Revision, 2025

Contributed Talk at BNP14, Talk in session with discussion at BAYSM25

Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al., 2022). We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing density estimation of the mixing measure (or mixture) as an inference target. One of the key features is the model-agnostic nature of the approach, which remains valid under arbitrarily complex dependence structures in the underlying sampling model. Using a decision-theoretic framework, our method identifies a point estimate by minimizing posterior expected loss. A loss function is defined as a discrepancy between mixing measures. Estimating the mixing measure implies inference on the mixture density. Exploiting the discrete nature of the mixing measure, we use a version of sliced Wasserstein distance. We introduce two specific variants for Gaussian mixtures. The first, mixed sliced Wasserstein, applies generalized geodesic projections on the product of the Euclidean space and the manifold of symmetric positive definite matrices. The second, sliced mixture Wasserstein, leverages the linearity of Gaussian mixture measures for efficient projection.

Selected Publications

NeurIPS

Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions

Khai Nguyen and Nhat Ho

Neural Information Processing Systems, 2024

Sliced Wasserstein (SW) and Generalized Sliced Wasserstein (GSW) have been widely used in applications due to their computational and statistical scalability. However, the SW and the GSW are only defined between distributions supported on a homogeneous domain. This limitation prevents their usage in applications with heterogeneous joint distributions whose marginal distributions are supported on multiple different domains. Using SW and GSW directly on the joint domains cannot make a meaningful comparison since their homogeneous slicing operators, i.e., the Radon Transform (RT) and the Generalized Radon Transform (GRT), are not expressive enough to capture the structure of the joint support set. To address the issue, we propose two new slicing operators, i.e., the Partial Generalized Radon Transform (PGRT) and the Hierarchical Hybrid Radon Transform (HHRT). In greater detail, PGRT is the generalization of the Partial Radon Transform (PRT), which transforms a subset of function arguments non-linearly, while HHRT is the composition of PRT and multiple domain-specific PGRT on marginal domain arguments. By using HHRT, we extend the SW into the Hierarchical Hybrid Sliced Wasserstein (H2SW) distance, which is designed specifically for comparing heterogeneous joint distributions. We then discuss the topological, statistical, and computational properties of H2SW. Finally, we demonstrate the favorable performance of H2SW in 3D mesh deformation, deep 3D mesh autoencoders, and dataset comparison.
ICLR Spotlight

Quasi-Monte Carlo for 3D Sliced Wasserstein

Khai Nguyen, Nicolas Bariletto, and Nhat Ho

International Conference on Learning Representations, 2024

Spotlight Presentation [5%]

Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC point sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.
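
One simple way to build a low-discrepancy point set on the unit sphere in three dimensions is sketched below: a two-dimensional Halton sequence (bases 2 and 3) pushed through the classical equal-area (Archimedes) map. This is only an illustration of the general idea; the choice of Halton points and the function names are assumptions, not necessarily the constructions evaluated in the paper.

```python
import numpy as np

def van_der_corput(n, base):
    """First n terms of the van der Corput sequence in the given base."""
    seq = np.zeros(n)
    for i in range(n):
        k, frac, value = i + 1, 1.0, 0.0
        # Reflect the base-`base` digits of k about the radix point.
        while k > 0:
            frac /= base
            value += frac * (k % base)
            k //= base
        seq[i] = value
    return seq

def halton_sphere_points(n):
    """n low-discrepancy directions on the unit sphere in R^3."""
    u1 = van_der_corput(n, 2)
    u2 = van_der_corput(n, 3)
    # Equal-area map: uniform height z and uniform longitude phi.
    z = 2.0 * u1 - 1.0
    phi = 2.0 * np.pi * u2
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)
```

These directions can replace the random Gaussian directions in a standard Monte Carlo SW estimate; the resulting estimate is deterministic, which is why a randomized variant is needed when unbiasedness matters.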
ICLR

Sliced Wasserstein Estimation with Control Variates

Khai Nguyen and Nhat Ho

International Conference on Learning Representations, 2024

The sliced Wasserstein (SW) distances between two probability measures are defined as the expectation of the Wasserstein distance between two one-dimensional projections of the two measures. The randomness comes from a projecting direction that is used to project the two input measures to one dimension. Due to the intractability of the expectation, Monte Carlo integration is performed to estimate the value of the SW distance. Despite having various variants, there has been no prior work that improves the Monte Carlo estimation scheme for the SW distance in terms of controlling its variance. To bridge the literature on variance reduction and the literature on the SW distance, we propose computationally efficient control variates to reduce the variance of the empirical estimation of the SW distance. The key idea is to first find Gaussian approximations of projected one-dimensional measures, then we utilize the closed-form of the Wasserstein-2 distance between two Gaussian distributions to design the control variates. In particular, we propose using a lower bound and an upper bound of the Wasserstein-2 distance between two fitted Gaussians as two computationally efficient control variates. We empirically show that the proposed control variate estimators can help to reduce the variance considerably when comparing measures over images and point-clouds. Finally, we demonstrate the favorable performance of the proposed control variate estimators in gradient flows to interpolate between two point-clouds and in deep generative modeling on standard image datasets, such as CIFAR10 and CelebA.
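
The control variate idea can be sketched as follows for the squared SW-2 distance between two equal-size point clouds. The control variate used here, the squared difference of projected means, is chosen for illustration because its expectation over uniform directions is known in closed form (the squared norm of the mean difference divided by the dimension); it is not claimed to be the exact construction of the paper. The function name is hypothetical, and the coefficient `gamma` is estimated from the same samples, which introduces a small bias.

```python
import numpy as np

def sw2_squared_with_control_variate(x, y, n_projections=64, seed=0):
    """Return (plain, controlled) Monte Carlo estimates of SW_2^2."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    theta = rng.standard_normal((n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Per-direction squared 1D Wasserstein-2 distance via sorting.
    xs = np.sort(x @ theta.T, axis=0)
    ys = np.sort(y @ theta.T, axis=0)
    w = np.mean((xs - ys) ** 2, axis=0)
    # Control variate: squared difference of the projected means.
    diff = x.mean(axis=0) - y.mean(axis=0)
    c = (theta @ diff) ** 2
    # Closed-form expectation under uniform directions: ||diff||^2 / dim.
    c_expected = diff @ diff / dim
    # Estimated optimal coefficient: cov(w, c) / var(c).
    var_c = np.var(c)
    gamma = np.mean((w - w.mean()) * (c - c.mean())) / var_c if var_c > 0 else 0.0
    plain = w.mean()
    controlled = plain - gamma * (c.mean() - c_expected)
    return float(plain), float(controlled)
```

When the two clouds differ only by a translation, the control variate equals the per-direction distance, so the controlled estimate recovers the exact value regardless of which directions were sampled.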
NeurIPS

Energy-Based Sliced Wasserstein Distance

Khai Nguyen and Nhat Ho

Neural Information Processing Systems, 2023

The sliced Wasserstein (SW) distance has been widely recognized as a statistically effective and computationally efficient metric between two probability measures. A key component of the SW distance is the slicing distribution. There are two existing approaches for choosing this distribution. The first approach is using a fixed prior distribution. The second approach is optimizing for the best distribution which belongs to a parametric family of distributions and can maximize the expected distance. However, both approaches have their limitations. A fixed prior distribution is non-informative in terms of highlighting projecting directions that can discriminate two general probability measures. Doing optimization for the best distribution is often expensive and unstable. Moreover, the parametric family of the candidate distribution could easily be misspecified. To address the issues, we propose to design the slicing distribution as an energy-based distribution that is parameter-free and has the density proportional to an energy function of the projected one-dimensional Wasserstein distance. We then derive a novel sliced Wasserstein metric, energy-based sliced Wasserstein (EBSW) distance, and investigate its topological, statistical, and computational properties via importance sampling, sampling importance resampling, and Markov Chain methods. Finally, we conduct experiments on point-cloud gradient flow, color transfer, and point-cloud reconstruction to show the favorable performance of the EBSW.
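
A minimal sketch of the importance-sampling route is given below, assuming an exponential energy function, a uniform proposal over directions, and equal-size point clouds. With these assumptions the self-normalized importance weights reduce to a softmax over the per-direction distances. The function names are hypothetical and the code is an illustration rather than the reference implementation.

```python
import numpy as np

def per_direction_costs(x, y, n_projections=100, p=2, seed=0):
    """W_p^p between the 1D projections of x and y for random directions."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    xs = np.sort(x @ theta.T, axis=0)
    ys = np.sort(y @ theta.T, axis=0)
    return np.mean(np.abs(xs - ys) ** p, axis=0)

def energy_weighted_sw(costs, p=2):
    """Self-normalized importance-sampling estimate with exp energy."""
    # Softmax weights; subtracting the max keeps exp numerically stable.
    weights = np.exp(costs - costs.max())
    weights /= weights.sum()
    # Directions with larger projected distance receive larger weight.
    return float(np.sum(weights * costs) ** (1.0 / p))
```

Since the weights increase with the projected distance, this estimate is never smaller than the uniform average over the same directions and never larger than the maximum over them.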
ICML

Improving Mini-batch Optimal Transport via Partial Transportation

Khai Nguyen^*, Dang Nguyen^*, The-Anh Vu Le, and 2 more authors

International Conference on Machine Learning, 2022

Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging the insight from partial transportation, we explain the source of misspecified mappings in m-OT and motivate why limiting the amount of transported mass among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative modeling, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.
ICLR Spotlight

Distributional Sliced-Wasserstein and Applications to Generative Modeling

Khai Nguyen, Nhat Ho, Tung Pham, and 1 more author

International Conference on Learning Representations, 2021

Spotlight Presentation [3.78%]

Sliced-Wasserstein distance (SW) and its variant, Max Sliced-Wasserstein distance (Max-SW), have been used widely in recent years due to their fast computation and scalability even when the probability measures lie in a very high dimensional space. However, SW requires many unnecessary projection samples to approximate its value while Max-SW only uses the most important projection, which ignores the information of other useful directions. In order to account for these weaknesses, we propose a novel distance, named Distributional Sliced-Wasserstein distance (DSW), that finds an optimal distribution over projections that can balance between exploring distinctive projecting directions and the informativeness of projections themselves. We show that the DSW is a generalization of Max-SW, and it can be computed efficiently by searching for the optimal push-forward measure over a set of probability measures over the unit sphere satisfying certain regularizing constraints that favor distinct directions. Finally, we conduct extensive experiments with large-scale datasets to demonstrate the favorable performance of the proposed distance over the previous sliced-based distances in generative modeling applications.