Hi! I’m Khai, a third-year Ph.D. candidate in the Department of Statistics and Data Sciences, University of Texas at Austin. I am fortunate to be advised by Professor Nhat Ho and Professor Peter Müller. I am associated with the Institute for Foundations of Machine Learning (IFML), and I am a visiting student at the Statistical Information Lab at The University of Texas MD Anderson Cancer Center. I graduated from Hanoi University of Science and Technology with a Bachelor’s degree in Computer Science. Before joining UT Austin, I was an AI Research Resident at VinAI Research under the supervision of Dr. Hung Bui.
Research: My current work focuses on making Optimal Transport scalable in statistical inference (low time complexity, low space complexity, low sample complexity) via one-dimensional projections, an approach known as sliced optimal transport (sliced Wasserstein distance).
My work focuses on three aspects of the sliced Wasserstein (SW) distance:
Slicing distributions. The vanilla sliced Wasserstein (SW) distance treats all one-dimensional projections equally and independently by using the uniform distribution over projecting directions. To improve and generalize SW, I propose searching for the distribution over projecting directions (the slicing distribution) that maximizes the expected projected distance. In particular, a regularized implicit family of distributions is introduced in [ICLR'21], and explicit families (von Mises-Fisher and Power Spherical) are introduced in [ICLR'21]. Moreover, I introduce amortized optimization to predict the optimal slicing distribution from two input probability measures, for settings that involve many pairs of probability measures, in [NeurIPS'22] and [ICML'23]. To further improve the quality of projecting directions, I break the independence between them by imposing a first-order Markov structure in [NeurIPS'23]. To avoid unstable optimization and model misspecification when designing the slicing model, I propose the energy-based slicing distribution, which is parameter-free and has a density proportional to an energy function of the projected one-dimensional Wasserstein distance, in [NeurIPS'23]. To push the optimization-free direction further, I propose random-path projecting directions in [Arxiv'24].
Projecting operators. The vanilla sliced Wasserstein distance uses the Radon Transform as the projecting operator. The Radon Transform simply takes the inner products between the support points of a probability measure and a projecting direction as the support points of the one-dimensional projected probability measure. To generalize the projecting operator to tensor spaces, I use the convolution operator to project probability measures over tensors to one dimension in [NeurIPS'22]. In addition, I connect deep learning (neural network) techniques to sliced Wasserstein by proposing the Overparameterized Radon Transform and the Hierarchical Radon Transform in [ICLR'23].
Numerical approximation. The SW distance is usually estimated by Monte Carlo integration because the expectation with respect to the slicing distribution is intractable. To reduce the variance of the Monte Carlo estimator, I first propose control variates based on the closed form of the Wasserstein-2 distance between two Gaussians in [ICLR'24]. Importantly, the proposed control variates have linear time and space complexity. In addition, I propose using low-discrepancy sequences on the sphere (Quasi-Monte Carlo) to approximate sliced Wasserstein in [ICLR'24]. Moreover, we propose Randomized Quasi-sliced Wasserstein, an unbiased estimator of sliced Wasserstein that is based on randomizing low-discrepancy sequences.
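As a rough illustration of the vanilla estimator that the three aspects above build on, here is a minimal NumPy sketch: uniform slicing distribution, inner-product (Radon-style) projection, and plain Monte Carlo averaging. It is not the implementation from any of the papers listed here. The function name `sliced_wasserstein` and its parameters are hypothetical, and it assumes two empirical measures with the same number of equally weighted samples.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two equal-size empirical measures.

    x, y: arrays of shape (n, d) holding the support points (assumed equal n).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Uniform slicing distribution: normalized Gaussian draws lie uniformly on the sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Projection: inner products of the support points with each direction.
    x_proj = x @ theta.T  # shape (n, n_projections)
    y_proj = y @ theta.T
    # One-dimensional Wasserstein between equal-size samples: sort, then match in order.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    w_pp = np.mean(np.abs(x_sorted - y_sorted) ** p, axis=0)  # W_p^p per direction
    # Average over directions (Monte Carlo), then take the p-th root.
    return np.mean(w_pp) ** (1.0 / p)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.standard_normal((200, 3))
    b = rng.standard_normal((200, 3)) + 2.0
    print(sliced_wasserstein(a, b))
```

Sorting dominates the cost, so each direction takes O(n log n) time. In this sketch, replacing the uniform draw of `theta` would correspond to changing the slicing distribution, and replacing the inner product would correspond to changing the projecting operator.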
Moreover, I aim to advance the application of optimal transport, Wasserstein distance, and sliced Wasserstein distance in probabilistic Machine Learning models, including point-cloud applications [ICML'23], 3D mesh deformation [ICLR'24], generative models (GANs, Diffusion Models) [NeurIPS'22] [Arxiv'24], domain adaptation [ICML'22], [ICML'22], multimodal AI [ICLR'24], and other tasks that deal with probability measures.
- Jan 19, 2024: 2 papers, "Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts" and "On Parameter Estimation in Deviated Gaussian Mixture of Experts", are accepted at AISTATS 2024.
- Jan 16, 2024: 4 papers, "Quasi-Monte Carlo for 3D Sliced Wasserstein" (Spotlight Presentation), "Sliced Wasserstein Estimation with Control Variates", "Diffeomorphic Deformation via Sliced Wasserstein Distance Optimization for Cortical Surface Reconstruction", and "Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation", are accepted at ICLR 2024.
- Sep 21, 2023: 4 papers, "Energy-Based Sliced Wasserstein Distance", "Markovian Sliced Wasserstein Distances: Beyond Independent Projections", "Designing robust Transformers using robust kernel density estimation", and "Minimax optimal rate for parameter estimation in multivariate deviated models", are accepted at NeurIPS 2023.
- Apr 24, 2023: 1 paper, "Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction", is accepted at ICML 2023.
- Jan 20, 2023: 1 paper, "Hierarchical Sliced Wasserstein Distance", is accepted at ICLR 2023.
- Sep 14, 2022: 4 papers, "Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution", "Amortized Projection Optimization for Sliced Wasserstein Generative Models", "Improving Transformer with an Admixture of Attention Heads", and "FourierFormer: Transformer Meets Generalized Fourier Integral Theorem", are accepted at NeurIPS 2022.
- Apr 24, 2022: 2 papers, "Improving Mini-batch Optimal Transport via Partial Transportation" and "On Transportation of Mini-batches: A Hierarchical Approach", are accepted at ICML 2022.
- Jan 24, 2021: 2 papers, "Distributional Sliced-Wasserstein and Applications to Generative Modeling" (Spotlight Presentation) and "Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein", are accepted at ICLR 2021.
Selected Publications [Full List]
(*) denotes equal contribution
- Quasi-Monte Carlo for 3D Sliced Wasserstein. International Conference on Learning Representations. Spotlight Presentation [Top 5%]
- Sliced Wasserstein Estimation with Control Variates. International Conference on Learning Representations
- Energy-Based Sliced Wasserstein Distance. Neural Information Processing Systems
- Markovian Sliced Wasserstein Distances: Beyond Independent Projections. Neural Information Processing Systems
- Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction. International Conference on Machine Learning
- Hierarchical Sliced Wasserstein Distance. International Conference on Learning Representations
- Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution. Neural Information Processing Systems
- Amortized Projection Optimization for Sliced Wasserstein Generative Models. Neural Information Processing Systems
- Improving Mini-batch Optimal Transport via Partial Transportation. International Conference on Machine Learning
- On Transportation of Mini-batches: A Hierarchical Approach. International Conference on Machine Learning
- Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein. International Conference on Learning Representations
- Distributional Sliced-Wasserstein and Applications to Generative Modeling. International Conference on Learning Representations. Spotlight Presentation [Top 3.78%]
Selected Preprints [Full List]
(*) denotes equal contribution
- Sliced Wasserstein with Random-Path Projecting Directions. Preprint. Under Review