Binary dataset separation proof
Task
A proof that any binary learning dataset can be made linearly separable.
Solution
Suppose $S$ is the learning dataset, so we have:
$$ S = \set{\left(\tilde{x_i},y_i\right) \mid \tilde{x_i}=\left(x_i^1,x_i^2,\dots,x_i^r\right) \in \mathbb{R}^r,y_i \in \mathbb{B}} $$
$$ S_0 = \set{\tilde{x_i} \mid \left(\tilde{x_i}, 0\right) \in S} $$
$$ S_1 = \set{\tilde{x_i} \mid \left(\tilde{x_i}, 1\right) \in S} $$
Suppose also that $card\left(S_0\right)=n$ and $card\left(S_1\right)=m$, so the learning dataset $S$ contains exactly $card\left(S\right)=n+m$ elements.
Let's now enter the hyperspace $\mathbb{V}$, where $dim\left(\mathbb{V}\right)=max\set{n,m}=p$.
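For concreteness, here is a tiny instance of these definitions (the numbers are my own illustration, not part of the proof): three one-dimensional points ($r = 1$), one labelled $0$ and two labelled $1$:
$$ S = \set{\left(\left(0\right),1\right),\left(\left(1\right),0\right),\left(\left(2\right),1\right)} \implies S_0 = \set{\left(1\right)},\ S_1 = \set{\left(0\right),\left(2\right)},\ n = 1,\ m = 2,\ p = max\set{1,2} = 2 $$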
When performing the transformation from an $s$-dimensional hyperspace to an $(s+1)$-dimensional hyperspace, we will change the coordinates of all points in the dataset using the following rule:
$$ \tilde{x_i}^\prime=\left(x_i^1,x_i^2,\dots,x_i^s,f_{s+1}\left(\tilde{x_i}\right)\right) $$
Here $\tilde{x_i}^\prime \in \mathbb{R}^{s+1}$, $\tilde{x_i} \in \mathbb{R}^s$, and $f_{s+1}\left(x_1,x_2,\dots,x_s\right)$ is any function that is well-defined for every point of $S$. Note that $f_{s+1}$ need not depend on all of its variables (it may even be a constant). There are no other restrictions on $f_{s+1}$.
After applying this transformation as many times as needed, we have $S^\prime = \set{\left(\tilde{x_i}^\prime,y_i\right) \mid \tilde{x_i}^\prime \in \mathbb{R}^p,y_i \in \mathbb{B}}$ with the corresponding $S_0^\prime$ and $S_1^\prime$.
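As a quick illustration of the lifting rule, here is a minimal Python sketch. The function `lift`, the toy dataset, and the indicator choice of $f_2$ are my own illustration; the proof itself places no restrictions on $f_{s+1}$:

```python
import numpy as np

def lift(points, f):
    """Apply the rule above once: append the coordinate f(x) to every
    point, mapping R^s -> R^(s+1)."""
    return np.array([np.append(x, f(x)) for x in points])

# Toy dataset in R^1 that is not linearly separable: labels 0, 1, 0.
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([0, 1, 0])

# One admissible choice of f_2 (illustrative, not prescribed by the proof):
# an indicator coordinate, well-defined on the finite set S.
f2 = lambda x: 1.0 if x[0] == 0.0 else 0.0

X_lifted = lift(X, f2)
print(X_lifted)  # in R^2 the line x2 = 1/2 now separates the two classes
```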
- If $n = m$, let's increase the dimension of $\mathbb{V}$ by one - now $dim\left(\mathbb{V}\right) = p + 1$.
  - If $n=m=1$, the case is trivial: we can always separate 2 points in a linear space of any dimension, regardless of their location (by the definition of $S$, those 2 points cannot coincide).
  - If $n=m\neq1$:
    - Since we are in a $(p+1)$-dimensional hyperspace, we can position two hyperplanes, containing the points of $S_0^\prime$ and $S_1^\prime$ respectively, so that they are parallel (because through any $q-1$ points of a $q$-dimensional linear hyperspace we can draw infinitely many hyperplanes).
    - We can write the following inequality (it holds because $n = m \ge 2$, so $n \cdot m = n^2 \ge n + 1$): $$ n \cdot m \ge p + 1 = max\set{n,m} + 1 $$ Let's now select $p + 1$ points between the two parallel hyperplanes so we can finally draw a separating hyperplane. Suppose those points are $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_{p+1}$, where $\tilde{z}_i=\frac{\tilde{a}_i+\tilde{b}_i}{2}$, $\tilde{a}_i \in S_0^\prime$, $\tilde{b}_i \in S_1^\prime$, $1 \le i \le p + 1$, and $\tilde{z}_i \neq \tilde{z}_j$ when $i \neq j$ (the inequality above shows there are enough candidate midpoints to choose from). Using these $p+1$ points we can now draw a separating hyperplane $\pi$ (see the sketch after this list): $$ \pi: \begin{vmatrix} z^1-z_1^1 & z^2-z_1^2 & \cdots & z^{p+1}-z_1^{p+1} \\ z_2^1-z_1^1 & z_2^2-z_1^2 & \cdots & z_2^{p+1}-z_1^{p+1} \\ \vdots & \vdots & \ddots & \vdots \\ z_{p+1}^1-z_1^1 & z_{p+1}^2-z_1^2 & \cdots & z_{p+1}^{p+1}-z_1^{p+1} \end{vmatrix}=0 $$ Here $z^1,z^2,\dots,z^{p+1}$ are variables; everything else is a constant - the coordinates of $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_{p+1}$ respectively. Since every midpoint lies halfway between the two parallel class hyperplanes, $\pi$ is parallel to both of them and therefore separates $S_0^\prime$ from $S_1^\prime$.
- If $n \neq m$:
  - Since we are in a $p$-dimensional hyperspace and $p=max\set{m,n}$, the hyperplane containing all elements of the larger of $S_0^\prime$ and $S_1^\prime$ is completely determined - we can't do anything with it. But the other hyperplane, which contains only $min\set{m,n}$ elements, can still be positioned however we want (because, again, through any $q-1$ points of a $q$-dimensional linear hyperspace we can draw infinitely many hyperplanes). Since we can position the second hyperplane however we want, we can, again, make the two hyperplanes parallel.
  - We can write the following inequality (it holds because $min\set{n,m} \ge 1$): $$ n \cdot m \ge p = max\set{n,m} $$ Similarly, we select $p$ points between the two parallel hyperplanes so we can draw a separating hyperplane: $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_p$, where $\tilde{z}_i=\frac{\tilde{a}_i+\tilde{b}_i}{2}$, $\tilde{a}_i \in S_0^\prime$, $\tilde{b}_i \in S_1^\prime$, $1 \le i \le p$, and $\tilde{z}_i \neq \tilde{z}_j$ when $i \neq j$. Using these $p$ points we can now draw a separating hyperplane $\pi$ (see the sketch after this list): $$ \pi: \begin{vmatrix} z^1-z_1^1 & z^2-z_1^2 & \cdots & z^p-z_1^p \\ z_2^1-z_1^1 & z_2^2-z_1^2 & \cdots & z_2^p-z_1^p \\ \vdots & \vdots & \ddots & \vdots \\ z_p^1-z_1^1 & z_p^2-z_1^2 & \cdots & z_p^p-z_1^p \end{vmatrix}=0 $$ Here $z^1,z^2,\dots,z^p$ are variables; everything else is a constant - the coordinates of $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_p$ respectively.
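To make the midpoint construction concrete, below is a minimal numeric sketch of the $n \neq m$ branch. The helper `hyperplane_through` and the toy data are my own illustration; the helper expands the determinant $\pi$ via a null-space computation and assumes the chosen midpoints are affinely independent:

```python
import numpy as np
from scipy.linalg import null_space

def hyperplane_through(points):
    """Normal vector w and offset c of a hyperplane through q points of R^q.

    This expands the determinant pi from the proof: the normal must be
    orthogonal to every difference z_i - z_1, i.e. it spans the null
    space of that difference matrix.
    """
    Z = np.asarray(points, dtype=float)
    w = null_space(Z[1:] - Z[0])[:, 0]  # orthogonal to all z_i - z_1
    return w, w @ Z[0]                  # hyperplane: w . x = c

# Toy instance of the n != m branch in R^2: n = 2, m = 1, p = max{2,1} = 2.
S0 = np.array([[0.0, 0.0], [2.0, 0.0]])  # larger class fixes the line x2 = 0
S1 = np.array([[0.0, 1.0]])              # smaller class, placed on the parallel line x2 = 1
mids = np.array([(a + S1[0]) / 2 for a in S0])  # p distinct midpoints z_i
w, c = hyperplane_through(mids)
print(S0 @ w - c, S1 @ w - c)            # opposite signs: pi separates the classes
```

The same helper covers the $n = m$ branch: lift the points to $p+1$ dimensions first and pass $p+1$ midpoints instead of $p$.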
Notes
This proof takes a non-standard approach, so it is not the usual way such statements are proved. There is a better (in my opinion) proof on pages 1 and 2 (in Russian).
Useful links
- Machine learning book (in Russian)
- Support vector machine (SVM)