Khai Nguyen
Ph.D. Candidate at the Department of Statistics and Data Sciences, University of Texas at Austin

Hi! I’m Khai, a third-year Ph.D. candidate at the Department of Statistics and Data Sciences, University of Texas at Austin. I am fortunate to be advised by Professor Nhat Ho and Professor Peter Müller. I am affiliated with the Institute for Foundations of Machine Learning (IFML), and I am a visiting student at the Statistical Information Lab at The University of Texas MD Anderson Cancer Center. I graduated from Hanoi University of Science and Technology with a Bachelor’s degree in Computer Science. Before joining UT Austin, I was an AI Research Resident at VinAI Research under the supervision of Dr. Hung Bui.
Research: My current work focuses on making optimal transport scalable in statistical inference (low time complexity, low space complexity, and low sample complexity) via the one-dimensional projection approach known as sliced optimal transport (the sliced Wasserstein distance).
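As a concrete reference point, below is a minimal NumPy sketch of the standard Monte Carlo estimator of the sliced Wasserstein distance between two empirical measures with equally many, uniformly weighted support points; the function and argument names are illustrative, not taken from any particular library.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=None):
    """Monte Carlo estimate of SW_p between the empirical measures on X and Y.

    X, Y: (n, d) arrays of support points with uniform weights.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Draw projecting directions uniformly from the unit sphere S^{d-1}.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project the supports to one dimension (inner products with directions).
    X_proj, Y_proj = X @ theta.T, Y @ theta.T  # each of shape (n, n_projections)
    # The one-dimensional Wasserstein-p distance has a closed form via sorting.
    wp = np.mean(np.abs(np.sort(X_proj, axis=0) - np.sort(Y_proj, axis=0)) ** p, axis=0)
    # Average the projected distances over directions and take the p-th root.
    return np.mean(wp) ** (1 / p)
```

Sorting makes each projected distance cost O(n log n) in the number of support points, which is the source of the scalability mentioned above.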
My work focuses on three aspects of the SW distance:
- Slicing distributions. The vanilla sliced Wasserstein (SW) distance treats all one-dimensional projections identically and independently by using the uniform distribution over projecting directions. To improve and generalize the SW distance, I propose searching for the best distribution over projecting directions (the slicing distribution), namely one that maximizes the expected projected distance. In particular, a regularized implicit family of distributions is introduced in [ICLR’21], and explicit families (von Mises-Fisher and Power Spherical) are introduced in [ICLR’21]. Moreover, I introduce amortized optimization to predict the optimal slicing distribution for two input probability measures in settings with many pairs of probability measures in [NeurIPS’22] and [ICML’23]. To further enhance the quality of the projecting directions, I break their independence by imposing a first-order Markov structure in [NeurIPS’23]. To avoid unstable optimization and model misspecification when designing the slicing distribution, I propose an energy-based slicing distribution that is parameter-free and has density proportional to an energy function of the projected one-dimensional Wasserstein distance in [NeurIPS’23] (see the energy-based sketch after this list).
- Projecting operators. The vanilla sliced Wasserstein distance uses the Radon Transform as its projecting operator: the supports of the one-dimensional projected probability measure are simply the inner products between the supports of the original probability measure and a projecting direction. To generalize the projecting operator to tensor spaces, I use the convolution operator to project probability measures over tensors to one dimension in [NeurIPS’22] (see the convolution sketch after this list). In addition, I connect deep learning (neural network) techniques to the sliced Wasserstein distance by proposing the Overparameterized Radon Transform and the Hierarchical Radon Transform in [ICLR’23].
- Numerical approximation. The SW distance is usually estimated by Monte Carlo integration because the expectation with respect to the slicing distribution is intractable. To reduce the variance of the Monte Carlo estimator, I first propose control variates based on the closed form of the Wasserstein-2 distance between two Gaussians in [Arxiv’23]; importantly, the proposed control variates have linear time and space complexity. In addition, I propose using low-discrepancy sequences on the sphere (Quasi-Monte Carlo) to approximate the sliced Wasserstein distance in [Arxiv’23] (see the QMC sketch after this list). Moreover, we propose Randomized Quasi-Sliced Wasserstein, an unbiased estimator of the sliced Wasserstein distance based on randomized low-discrepancy sequences.
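For the energy-based slicing distribution, one simple way to approximate the resulting expectation is self-normalized importance sampling with a uniform proposal: draw directions uniformly and reweight them in proportion to an energy function of their projected distances. The sketch below, reusing the ingredients of `sliced_wasserstein` above with an exponential energy, is only meant to convey the idea; the estimators in the paper may differ in detail.

```python
import numpy as np

def energy_based_sw(X, Y, n_projections=100, p=2, seed=None):
    """Energy-weighted sliced distance via self-normalized importance sampling."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Projected one-dimensional Wasserstein-p distances, one per direction.
    wp = np.mean(np.abs(np.sort(X @ theta.T, axis=0) - np.sort(Y @ theta.T, axis=0)) ** p, axis=0)
    # Energy weights: informative directions (large projected distance) count more.
    w = np.exp(wp - wp.max())  # exponential energy, shifted for numerical stability
    w /= w.sum()               # self-normalize the importance weights
    return np.sum(w * wp) ** (1 / p)
```

Because the weights depend only on the projected distances themselves, this slicing distribution requires no trainable parameters, which is the point of the parameter-free design.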
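For the convolution-based projecting operator, the idea is to replace the single inner product with a sequence of convolutions that reduces an image (or tensor) to a scalar support point. Below is a hedged SciPy sketch; the kernel shapes are hypothetical choices made only so that the final feature map is 1x1, not the architecture from the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_project(image, kernels):
    """Project an (H, W) image to a scalar via repeated 'valid' convolutions."""
    out = image
    for k in kernels:
        out = convolve2d(out, k, mode="valid")  # each step shrinks the feature map
    return out.item()  # the final feature map must be 1x1

# Illustrative usage: a 28x28 image with two random unit-norm kernels, so the
# 'valid' convolutions reduce 28x28 -> 14x14 -> 1x1.
rng = np.random.default_rng(0)
kernels = [rng.normal(size=(15, 15)), rng.normal(size=(14, 14))]
kernels = [k / np.linalg.norm(k) for k in kernels]
scalar_support = conv_project(rng.normal(size=(28, 28)), kernels)
```

Compared with flattening the image and taking one large inner product, stacked local kernels need far fewer parameters per projection, which is what makes the operator attractive on tensor spaces.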
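For the Quasi-Monte Carlo approximation, one common construction of low-discrepancy points on the sphere pushes a Sobol sequence through the Gaussian inverse CDF and normalizes; scrambling the sequence randomizes it, which is the ingredient behind the unbiased randomized variant. A sketch of this one construction (among several studied in the papers above):

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_directions(n_projections, d, seed=0):
    """Low-discrepancy projecting directions on S^{d-1} from a Sobol sequence."""
    sobol = qmc.Sobol(d=d, scramble=True, seed=seed)  # scramble=True gives randomized QMC
    u = sobol.random(n_projections)                   # low-discrepancy points in (0, 1)^d
    z = norm.ppf(u)                                   # map through the Gaussian inverse CDF
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# These directions can replace the uniform draws in `sliced_wasserstein` above.
theta = qmc_directions(n_projections=128, d=3)
```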
Moreover, I aim to push forward applications of optimal transport, the Wasserstein distance, and the sliced Wasserstein distance in probabilistic machine learning models, such as point-cloud applications [ICML’23], 3D mesh deformation [Arxiv’23], generative models [NeurIPS’22], domain adaptation [ICML’22] [ICML’22], and other tasks that involve probability measures.
News
| Date | News |
| --- | --- |
| Sep 21, 2023 | 4 papers, Energy-Based Sliced Wasserstein Distance, Markovian sliced Wasserstein distances: Beyond independent projections, Designing robust Transformers using robust kernel density estimation, and Minimax optimal rate for parameter estimation in multivariate deviated models, are accepted at NeurIPS 2023. |
| Apr 24, 2023 | 1 paper, Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction, is accepted at ICML 2023. |
| Jan 20, 2023 | 1 paper, Hierarchical Sliced Wasserstein Distance, is accepted at ICLR 2023. |
| Sep 14, 2022 | 4 papers, Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution, Amortized Projection Optimization for Sliced Wasserstein Generative Models, Improving Transformer with an Admixture of Attention Heads, and FourierFormer: Transformer Meets Generalized Fourier Integral Theorem, are accepted at NeurIPS 2022. |
| Apr 24, 2022 | 2 papers, Improving Mini-batch Optimal Transport via Partial Transportation and On Transportation of Mini-batches: A Hierarchical Approach, are accepted at ICML 2022. |