Binary dataset separation proof
Task
A proof that any binary learning dataset can be made linearly separable.
Solution
Suppose $S$ is the learning dataset, so we have:
$$ S = \set{\left(\tilde{x_i},y_i\right) \mid \tilde{x_i}=\left(x_i^1,x_i^2,\dots,x_i^r\right) \in \mathbb{R}^r,y_i \in \mathbb{B}} $$
$$ S_0 = \set{\tilde{x_i} \mid \left(\tilde{x_i}, 0\right) \in S} $$
$$ S_1 = \set{\tilde{x_i} \mid \left(\tilde{x_i}, 1\right) \in S} $$
Suppose also that $card\left(S_0\right)=n$ and $card\left(S_1\right)=m$, so the learning dataset $S$ contains exactly $card\left(S\right)=n+m$ elements.
Let's now enter the hyperspace $\mathbb{V}$, where $dim\left(\mathbb{V}\right)=max\set{n,m}=p$.
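For concreteness, here is a tiny instance of these definitions (the numbers are my own illustration, not part of the proof): three one-dimensional points ($r = 1$), one labelled $0$ and two labelled $1$:
$$ S = \set{\left(\left(0\right),1\right),\left(\left(1\right),0\right),\left(\left(2\right),1\right)} \implies S_0 = \set{\left(1\right)},\ S_1 = \set{\left(0\right),\left(2\right)},\ n = 1,\ m = 2,\ p = max\set{1,2} = 2 $$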
When performing the transformation from an $s$-dimensional hyperspace to an $(s+1)$-dimensional hyperspace, we will change the coordinates of all points in the dataset using the following rule:
$$ \tilde{x_i}^\prime=\left(x_i^1,x_i^2,\dots,x_i^s,f_{s+1}\left(\tilde{x_i}\right)\right) $$
Here $\tilde{x_i}^\prime \in \mathbb{R}^{s+1}$, $\tilde{x_i} \in \mathbb{R}^s$, and $f_{s+1}\left(x_1,x_2,\dots,x_s\right)$ is any function that is well-defined for every point of $S$. Note that $f_{s+1}$ need not depend on all of its variables (it may even be a constant). There are no other restrictions on $f_{s+1}$.
After applying this transformation as many times as needed, we have $S^\prime = \set{\left(\tilde{x_i}^\prime,y_i\right) \mid \tilde{x_i}^\prime \in \mathbb{R}^p,y_i \in \mathbb{B}}$ with the corresponding $S_0^\prime$ and $S_1^\prime$.
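As a quick illustration of the lifting rule, here is a minimal Python sketch. The function `lift`, the toy dataset, and the indicator choice of $f_2$ are my own illustration; the proof itself places no restrictions on $f_{s+1}$:

```python
import numpy as np

def lift(points, f):
    """Apply the rule above once: append the coordinate f(x) to every
    point, mapping R^s -> R^(s+1)."""
    return np.array([np.append(x, f(x)) for x in points])

# Toy dataset in R^1 that is not linearly separable: labels 0, 1, 0.
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([0, 1, 0])

# One admissible choice of f_2 (illustrative, not prescribed by the proof):
# an indicator coordinate, well-defined on the finite set S.
f2 = lambda x: 1.0 if x[0] == 0.0 else 0.0

X_lifted = lift(X, f2)
print(X_lifted)  # in R^2 the line x2 = 1/2 now separates the two classes
```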
- If $n = m$, let's increase the dimension of $\mathbb{V}$ by one - now $dim\left(\mathbb{V}\right) = p + 1$.
  - If $n=m=1$, the case is trivial: we can always separate 2 points in a linear space of any dimension, regardless of their location (by the definition of $S$, those 2 points cannot coincide).
  - If $n=m\neq1$:
    - Since we are in a $(p+1)$-dimensional hyperspace, we can position two hyperplanes, containing the points of $S_0^\prime$ and $S_1^\prime$ respectively, so that they are parallel (because through any $q-1$ points of a $q$-dimensional linear hyperspace we can draw infinitely many hyperplanes).
    - We can write the following inequality (it holds because $n = m \ge 2$, so $n \cdot m = n^2 \ge n + 1$): $$ n \cdot m \ge p + 1 = max\set{n,m} + 1 $$ Let's now select $p + 1$ points between the two parallel hyperplanes so we can finally draw a separating hyperplane. Suppose those points are $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_{p+1}$, where $\tilde{z}_i=\frac{\tilde{a}_i+\tilde{b}_i}{2}$, $\tilde{a}_i \in S_0^\prime$, $\tilde{b}_i \in S_1^\prime$, $1 \le i \le p + 1$, and $\tilde{z}_i \neq \tilde{z}_j$ when $i \neq j$ (the inequality above shows there are enough candidate midpoints to choose from). Using these $p+1$ points we can now draw a separating hyperplane $\pi$ (see the sketch after this list): $$ \pi: \begin{vmatrix} z^1-z_1^1 & z^2-z_1^2 & \cdots & z^{p+1}-z_1^{p+1} \\ z_2^1-z_1^1 & z_2^2-z_1^2 & \cdots & z_2^{p+1}-z_1^{p+1} \\ \vdots & \vdots & \ddots & \vdots \\ z_{p+1}^1-z_1^1 & z_{p+1}^2-z_1^2 & \cdots & z_{p+1}^{p+1}-z_1^{p+1} \end{vmatrix}=0 $$ Here $z^1,z^2,\dots,z^{p+1}$ are variables; everything else is a constant - the coordinates of $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_{p+1}$ respectively. Since every midpoint lies halfway between the two parallel class hyperplanes, $\pi$ is parallel to both of them and therefore separates $S_0^\prime$ from $S_1^\prime$.
- If $n \neq m$:
  - Since we are in a $p$-dimensional hyperspace and $p=max\set{m,n}$, the hyperplane containing all elements of the larger of $S_0^\prime$ and $S_1^\prime$ is completely determined - we can't do anything with it. But the other hyperplane, which contains only $min\set{m,n}$ elements, can still be positioned however we want (because, again, through any $q-1$ points of a $q$-dimensional linear hyperspace we can draw infinitely many hyperplanes). Since we can position the second hyperplane however we want, we can, again, make the two hyperplanes parallel.
  - We can write the following inequality (it holds because $min\set{n,m} \ge 1$): $$ n \cdot m \ge p = max\set{n,m} $$ Similarly, we select $p$ points between the two parallel hyperplanes so we can draw a separating hyperplane: $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_p$, where $\tilde{z}_i=\frac{\tilde{a}_i+\tilde{b}_i}{2}$, $\tilde{a}_i \in S_0^\prime$, $\tilde{b}_i \in S_1^\prime$, $1 \le i \le p$, and $\tilde{z}_i \neq \tilde{z}_j$ when $i \neq j$. Using these $p$ points we can now draw a separating hyperplane $\pi$ (see the sketch after this list): $$ \pi: \begin{vmatrix} z^1-z_1^1 & z^2-z_1^2 & \cdots & z^p-z_1^p \\ z_2^1-z_1^1 & z_2^2-z_1^2 & \cdots & z_2^p-z_1^p \\ \vdots & \vdots & \ddots & \vdots \\ z_p^1-z_1^1 & z_p^2-z_1^2 & \cdots & z_p^p-z_1^p \end{vmatrix}=0 $$ Here $z^1,z^2,\dots,z^p$ are variables; everything else is a constant - the coordinates of $\tilde{z}_1,\tilde{z}_2,\dots,\tilde{z}_p$ respectively.
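To make the midpoint construction concrete, below is a minimal numeric sketch of the $n \neq m$ branch. The helper `hyperplane_through` and the toy data are my own illustration; the helper expands the determinant $\pi$ via a null-space computation and assumes the chosen midpoints are affinely independent:

```python
import numpy as np
from scipy.linalg import null_space

def hyperplane_through(points):
    """Normal vector w and offset c of a hyperplane through q points of R^q.

    This expands the determinant pi from the proof: the normal must be
    orthogonal to every difference z_i - z_1, i.e. it spans the null
    space of that difference matrix.
    """
    Z = np.asarray(points, dtype=float)
    w = null_space(Z[1:] - Z[0])[:, 0]  # orthogonal to all z_i - z_1
    return w, w @ Z[0]                  # hyperplane: w . x = c

# Toy instance of the n != m branch in R^2: n = 2, m = 1, p = max{2,1} = 2.
S0 = np.array([[0.0, 0.0], [2.0, 0.0]])  # larger class fixes the line x2 = 0
S1 = np.array([[0.0, 1.0]])              # smaller class, placed on the parallel line x2 = 1
mids = np.array([(a + S1[0]) / 2 for a in S0])  # p distinct midpoints z_i
w, c = hyperplane_through(mids)
print(S0 @ w - c, S1 @ w - c)            # opposite signs: pi separates the classes
```

The same helper covers the $n = m$ branch: lift the points to $p+1$ dimensions first and pass $p+1$ midpoints instead of $p$.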
Notes
This proof takes a non-standard approach, so it is not the usual way such statements are proved. There is a better (in my opinion) proof on pages 1 and 2 (in Russian).
Useful links
- Machine learning book (in Russian)
- Support vector machine (SVM)