Sliced-Regularized Optimal Transport

In this post, we break down a new regularized optimal transport problem Sliced-Regularized Optimal Transport (SROT).

Overview

Optimal Transport (OT) is a powerful framework for comparing probability distributions, but it is often computationally expensive. A widely used workaround is entropic regularization (EOT), which makes OT scalable via Sinkhorn iterations. However, EOT introduces bias because it regularizes toward an uninformative prior — the independent coupling.

This post introduces a different idea:

What if we regularize OT toward a meaningful prior instead?

This leads to Sliced-Regularized Optimal Transport (SROT):

Replace the independent coupling with a sliced OT (SOT) plan
Keep the same KL regularization structure
Retain Sinkhorn-style efficiency

The result is a method that is as scalable as EOT, but closer to true OT.

1. From OT to Regularized OT

Optimal Transport

Given distributions $\mu $ and $\nu $, OT solves:

\[\pi^\star = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y)\]

This finds the most efficient way to move mass from $\mu $ to $\nu $.

Entropic OT (EOT)

EOT adds a KL penalty:

\[\pi^\star_\varepsilon = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) + \varepsilon \, \mathrm{KL}(\pi \mid \mu \otimes \nu)\]

Key idea:

Encourages smooth plans
Reference = independent coupling $\mu \otimes \nu $

Limitation:
The prior is not informative about the true OT structure.

2. Sliced OT as a Prior

Sliced OT (SOT) approximates OT by:

Projecting data onto 1D directions
Solving 1D OT (cheap, closed-form)
Lifting back to high dimensions
Averaging over projections

Final SOT plan:

\[\pi^{\mathrm{SOT}} = \mathbb{E}_\theta[\pi_\theta]\]

Properties:

Fast (sorting-based)
Captures geometry better than independent coupling
Still approximate

3. SROT: The Core Idea

Definition

SROT replaces the EOT prior with a SOT plan:

\[\pi^\star_{\varepsilon,\mathrm{SOT}} = \text{arg}\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) + \varepsilon \, \mathrm{KL}(\pi \mid \pi^{\mathrm{SOT}})\]

Interpretation

EOT: shrink toward independence
SROT: shrink toward a geometric proxy of OT

As $\varepsilon \to 0 $:

Recover exact OT

As $\varepsilon \to \infty $:

Recover SOT plan

Key Insight

In discrete form:

\[C'_{ij} = C_{ij} - \varepsilon \log P^{\mathrm{SOT}}_{ij}\]

So SROT is equivalent to EOT with a modified cost:

Pairs favored by SOT become cheaper
Geometry is encoded directly in the optimization

4. Sinkhorn Algorithm for SROT

SROT keeps the same computational structure as EOT.

Kernel

\[K = P^{\mathrm{SOT}} \odot \exp(-C/\varepsilon)\]

Iterations

\[u \leftarrow \alpha \oslash (K v), \quad v \leftarrow \beta \oslash (K^\top u)\]

Transport Plan

\[P = \mathrm{diag}(u)\, K \, \mathrm{diag}(v)\]