Sliced-Regularized Optimal Transport

Sliced-Regularized Optimal Transport

In this post, we break down a new regularized optimal transport problem Sliced-Regularized Optimal Transport (SROT).

Overview

Optimal Transport (OT) is a powerful framework for comparing probability distributions, but it is often computationally expensive. A widely used workaround is entropic regularization (EOT), which makes OT scalable via Sinkhorn iterations. However, EOT introduces bias because it regularizes toward an uninformative prior — the independent coupling.

This post introduces a different idea:

What if we regularize OT toward a meaningful prior instead?

This leads to Sliced-Regularized Optimal Transport (SROT):

  • Replace the independent coupling with a sliced OT (SOT) plan
  • Keep the same KL regularization structure
  • Retain Sinkhorn-style efficiency

The result is a method that is as scalable as EOT, but closer to true OT.


1. From OT to Regularized OT

Optimal Transport

Given distributions $\mu $ and $\nu $, OT solves:

\[\pi^\star = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y)\]

This finds the most efficient way to move mass from $\mu $ to $\nu $.


Entropic OT (EOT)

EOT adds a KL penalty:

\[\pi^\star_\varepsilon = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) + \varepsilon \, \mathrm{KL}(\pi \mid \mu \otimes \nu)\]

Key idea:

  • Encourages smooth plans
  • Reference = independent coupling $\mu \otimes \nu $

Limitation:
The prior is not informative about the true OT structure.


2. Sliced OT as a Prior

Sliced OT (SOT) approximates OT by:

  1. Projecting data onto 1D directions
  2. Solving 1D OT (cheap, closed-form)
  3. Lifting back to high dimensions
  4. Averaging over projections

Final SOT plan:

\[\pi^{\mathrm{SOT}} = \mathbb{E}_\theta[\pi_\theta]\]

Properties:

  • Fast (sorting-based)
  • Captures geometry better than independent coupling
  • Still approximate

3. SROT: The Core Idea

Definition

SROT replaces the EOT prior with a SOT plan:

\[\pi^\star_{\varepsilon,\mathrm{SOT}} = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) + \varepsilon \, \mathrm{KL}(\pi \mid \pi^{\mathrm{SOT}})\]

Interpretation

  • EOT: shrink toward independence
  • SROT: shrink toward a geometric proxy of OT

As $\varepsilon \to 0 $:

  • Recover exact OT

As $\varepsilon \to \infty $:

  • Recover SOT plan

Key Insight

In discrete form:

\[C'_{ij} = C_{ij} - \varepsilon \log P^{\mathrm{SOT}}_{ij}\]

So SROT is equivalent to EOT with a modified cost:

  • Pairs favored by SOT become cheaper
  • Geometry is encoded directly in the optimization

4. Sinkhorn Algorithm for SROT

SROT keeps the same computational structure as EOT.

Kernel

\[K = P^{\mathrm{SOT}} \odot \exp(-C/\varepsilon)\]

Iterations

\[u \leftarrow \alpha \oslash (K v), \quad v \leftarrow \beta \oslash (K^\top u)\]

Transport Plan

\[P = \mathrm{diag}(u)\, K \, \mathrm{diag}(v)\]



Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Vertical Consensus Inference (VCI)