Hi! I’m Khai, a third-year Ph.D. candidate in the Department of Statistics and Data Sciences at the University of Texas at Austin. I am fortunate to be advised by Professor Nhat Ho and Professor Peter Müller. I am affiliated with the Institute for Foundations of Machine Learning (IFML), and I am a visiting student at the Statistical Information Lab at The University of Texas MD Anderson Cancer Center. I graduated from Hanoi University of Science and Technology with a Bachelor’s degree in Computer Science. Before joining UT Austin, I was an AI Research Resident at VinAI Research under the supervision of Dr. Hung Bui.
Research: My current work focuses on making optimal transport scalable for statistical inference (low time, space, and sample complexity) through one-dimensional projections, an approach known as sliced optimal transport (the sliced Wasserstein (SW) distance).
My work addresses three aspects of the SW distance:
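As a rough illustration of the one-dimensional projection idea, here is a minimal NumPy sketch of the plain Monte Carlo SW estimator with uniformly distributed directions. It assumes two empirical measures with the same number of equally weighted atoms, so the one-dimensional optimal coupling is obtained simply by sorting. The function name `sliced_wasserstein` and its parameters are hypothetical, and this is not the code released with the papers listed below.

```python
import numpy as np


def sliced_wasserstein(X, Y, n_projections=100, p=2, rng=None):
    """Monte Carlo estimate of SW_p between two empirical measures.

    Assumes X and Y are (n, d) arrays holding n equally weighted atoms each.
    """
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal-size samples in the same dimension")
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Uniform directions on the unit sphere S^{d-1}: normalized Gaussian draws.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Radon transform of each empirical measure: inner product of every atom
    # with every direction, then sort each column (one column per direction).
    X_proj = np.sort(X @ theta.T, axis=0)  # shape (n, n_projections)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    # Matching sorted atoms gives W_p^p per direction; averaging over
    # directions gives the Monte Carlo estimate of SW_p^p.
    sw_pp = np.mean(np.abs(X_proj - Y_proj) ** p)
    return sw_pp ** (1.0 / p)


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = X + np.array([3.0, 0.0, 0.0])  # translate every atom by c = (3, 0, 0)
print(sliced_wasserstein(X, Y, n_projections=2000, rng=1))
```

For a pure translation by c, every projected distance is (θ·c)², so SW₂² = |c|²/d = 3 here and the printed value should land near √3 ≈ 1.73, up to Monte Carlo error.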
- Slicing distributions. The vanilla sliced Wasserstein (SW) distance treats all one-dimensional projections equally and independently by using the uniform distribution over projecting directions. To improve and generalize SW, I propose searching for the best distribution over projecting directions (the slicing distribution), namely one that maximizes the expected projected distance. In particular, a regularized implicit family of distributions is introduced in [ICLR’21], and explicit families (von Mises-Fisher and Power Spherical) are introduced in [ICLR’21]. Moreover, I use amortized optimization to predict the optimal slicing distribution for a given pair of input probability measures, in settings with many such pairs [NeurIPS’22] [ICML’23]. To further improve the quality of projecting directions, I break their independence by imposing a first-order Markov structure [NeurIPS’23]. To avoid unstable optimization and misspecification when designing the slicing model, I propose the energy-based slicing distribution [NeurIPS’23], which is parameter-free and has a density proportional to an energy function of the projected one-dimensional Wasserstein distance.
- Projecting operators. The vanilla SW distance uses the Radon Transform as its projecting operator: each support point of a probability measure is mapped to its inner product with a projecting direction, and these values form the support of the projected one-dimensional measure. To generalize the projecting operator to tensor spaces, I use convolution operators to project probability measures over tensors to one dimension [NeurIPS’22]. In addition, I connect deep learning (neural network) techniques to SW by proposing the Overparameterized Radon Transform and the Hierarchical Radon Transform [ICLR’23].
- Numerical approximation. The SW distance is usually estimated by Monte Carlo integration because the expectation with respect to the slicing distribution is intractable. To reduce the variance of the Monte Carlo estimator, I propose control variates based on the closed form of the Wasserstein-2 distance between two Gaussians [arXiv’23]; importantly, these control variates have linear time and space complexity. In addition, I propose using low-discrepancy sequences on the sphere (quasi-Monte Carlo) to approximate SW [arXiv’23]. We also propose Randomized Quasi-Sliced Wasserstein estimators, which are unbiased estimators of SW obtained by randomizing low-discrepancy sequences.
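The energy-based slicing distribution described under Slicing distributions can be approximated by importance sampling, using uniform directions as the proposal. The sketch below assumes the energy function f(x) = exp(x) and equal-size, equally weighted samples; the names `projected_wpp` and `energy_based_sw` are hypothetical, and this is a simplified illustration rather than the implementation from the [NeurIPS’23] paper.

```python
import numpy as np


def projected_wpp(X, Y, theta, p=2):
    """W_p^p between the 1D projections of X and Y, one value per direction (row of theta)."""
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(X_proj - Y_proj) ** p, axis=0)


def energy_based_sw(X, Y, n_projections=100, p=2, rng=None):
    """Self-normalized importance-sampling estimate of an energy-based SW_p.

    Proposal: uniform directions on the sphere. The target slicing density is
    proportional to f(W_p^p(theta)) with the assumed energy f(x) = exp(x).
    """
    rng = np.random.default_rng(rng)
    theta = rng.standard_normal((n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    w = projected_wpp(X, Y, theta, p)
    # Importance weights f(w_l) / sum_j f(w_j); subtracting the max avoids
    # overflow and cancels after normalization.
    weights = np.exp(w - w.max())
    weights /= weights.sum()
    # Directions with a larger projected distance get more weight than under
    # uniform slicing; the p-th root is taken after averaging.
    return np.sum(weights * w) ** (1.0 / p)
```

Because the weights increase with the projected distance, this estimate is never smaller than the plain Monte Carlo average computed from the same directions, which matches the intuition that the energy-based distribution concentrates on more discriminative directions.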
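For the numerical-approximation line of work, the sketch below contrasts plain Monte Carlo directions with a randomized low-discrepancy point set. To keep it short, it works on the circle (d = 2) rather than the 3D sphere studied in the cited preprints: the point set is L equally spaced angles shifted by one shared uniform random offset (a Cranley-Patterson rotation). Each direction is still marginally uniform, so the estimate of SW₂² stays unbiased. All function names are hypothetical, and this is not the construction used in the preprints.

```python
import numpy as np


def projected_w22(X, Y, theta):
    """W_2^2 between the 1D projections of X and Y for each direction (row of theta)."""
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean((X_proj - Y_proj) ** 2, axis=0)


def angles_to_directions(angles):
    """Map angles to unit vectors on the circle."""
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def mc_directions(L, rng):
    """Plain Monte Carlo: L independent uniform angles."""
    return angles_to_directions(2 * np.pi * rng.random(L))


def rqmc_directions(L, rng):
    """Randomized QMC: equally spaced points on [0, 1) plus one shared uniform shift, mod 1."""
    u = (np.arange(L) / L + rng.random()) % 1.0
    return angles_to_directions(2 * np.pi * u)


def sw22_estimate(X, Y, directions):
    """Average of the projected W_2^2 values: an unbiased estimate of SW_2^2."""
    return np.mean(projected_w22(X, Y, directions))


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = 0.5 * rng.standard_normal((200, 2)) + np.array([1.0, -2.0])
mc = [sw22_estimate(X, Y, mc_directions(16, rng)) for _ in range(200)]
rqmc = [sw22_estimate(X, Y, rqmc_directions(16, rng)) for _ in range(200)]
print("MC   mean %.4f  std %.4f" % (np.mean(mc), np.std(mc)))
print("RQMC mean %.4f  std %.4f" % (np.mean(rqmc), np.std(rqmc)))
```

Both estimators target the same quantity; for a continuous integrand on the circle like this one, the randomized point set typically shows a much smaller spread across repetitions for the same number of projections.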
More broadly, I aim to advance the application of optimal transport, the Wasserstein distance, and the sliced Wasserstein distance in probabilistic machine learning, including point-cloud applications [ICML’23], 3D mesh deformation [arXiv’23], generative models [NeurIPS’22], domain adaptation [ICML’22] [ICML’22], and other tasks that involve probability measures.
| Date | News |
|---|---|
| Sep 21, 2023 | 4 papers accepted at NeurIPS 2023: *Energy-Based Sliced Wasserstein Distance*, *Markovian sliced Wasserstein distances: Beyond independent projections*, *Designing robust Transformers using robust kernel density estimation*, and *Minimax optimal rate for parameter estimation in multivariate deviated models*. |
| Apr 24, 2023 | 1 paper accepted at ICML 2023: *Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction*. |
| Jan 20, 2023 | 1 paper accepted at ICLR 2023: *Hierarchical Sliced Wasserstein Distance*. |
| Sep 14, 2022 | 4 papers accepted at NeurIPS 2022: *Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution*, *Amortized Projection Optimization for Sliced Wasserstein Generative Models*, *Improving Transformer with an Admixture of Attention Heads*, and *FourierFormer: Transformer Meets Generalized Fourier Integral Theorem*. |
| Apr 24, 2022 | 2 papers accepted at ICML 2022: *Improving Mini-batch Optimal Transport via Partial Transportation* and *On Transportation of Mini-batches: A Hierarchical Approach*. |
Selected Publications [Full List]
(*) denotes equal contribution
- Energy-Based Sliced Wasserstein Distance. Advances in Neural Information Processing Systems, 2023.
- Markovian Sliced Wasserstein Distances: Beyond Independent Projections. Advances in Neural Information Processing Systems, 2023.
- Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction. Proceedings of the 40th International Conference on Machine Learning, 2023.
- Hierarchical Sliced Wasserstein Distance. International Conference on Learning Representations, 2023.
- Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution. Advances in Neural Information Processing Systems, 2022.
- Amortized Projection Optimization for Sliced Wasserstein Generative Models. Advances in Neural Information Processing Systems, 2022.
- Improving Mini-batch Optimal Transport via Partial Transportation. Proceedings of the 39th International Conference on Machine Learning, 17–23 Jul 2022.
- On Transportation of Mini-batches: A Hierarchical Approach. Proceedings of the 39th International Conference on Machine Learning, 17–23 Jul 2022.
- Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein. International Conference on Learning Representations, 2021.
- Distributional Sliced-Wasserstein and Applications to Generative Modeling. International Conference on Learning Representations, 2021 (Spotlight).
Selected Preprints [Full List]
(*) denotes equal contribution
- Quasi-Monte Carlo for 3D Sliced Wasserstein. arXiv preprint arXiv:2309.11713, 2023.
- Control Variate Sliced Wasserstein Estimators. arXiv preprint arXiv:2305.00402, 2023.
- Diffeomorphic Deformation via Sliced Wasserstein Distance Optimization for Cortical Surface Reconstruction. arXiv preprint arXiv:2305.17555, 2023.