We'll go over the basics of reproducing kernel Hilbert spaces (RKHSs) and how to use them in practice via the Mercer representation.
Square integrable functions
The kinds of functions we will care about are square integrable functions. Let \(X\) be a compact space. A function \(f\) is square integrable (meaning \(f \in L_2(X)\)) if it has finite norm with respect to the inner product
\[\langle f, g \rangle_{L_2} = \int_X f(x)\, g(x)\, dx.\]
In other words, the space of square integrable functions is defined as
\[L_2(X) = \left\{ f : X \to \mathbb{R} \;\middle|\; \|f\|_{L_2}^2 = \int_X f(x)^2\, dx < \infty \right\}.\]
If we have an orthonormal basis of functions \(\{e_i\}_{i=1}^{\infty}\) for \(L_2(X)\), then we can write any function \(f \in L_2(X)\) in terms of the basis functions as
\[f(x) = \sum_{i=1}^{\infty} f_i\, e_i(x),\]
where \(f_i = \langle f, e_i \rangle_{L_2}\) is the component of \(f\) in the \(e_i\) direction. By our assumption that \(f \in L_2(X)\), Parseval's identity gives \(\sum_{i=1}^{\infty} f_i^2 = \|f\|_{L_2}^2 < \infty\).
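For concreteness, one standard example of such a basis (an illustration, not part of the original argument) is the trigonometric basis on \(X = [0, 2\pi]\):
\[e_0(x) = \tfrac{1}{\sqrt{2\pi}}, \qquad e_{2k-1}(x) = \tfrac{1}{\sqrt{\pi}} \sin(kx), \qquad e_{2k}(x) = \tfrac{1}{\sqrt{\pi}} \cos(kx), \qquad f_i = \int_0^{2\pi} f(x)\, e_i(x)\, dx.\]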
Kernels
A kernel function \(K: X\times X \to \mathbb{R}\) is a symmetric, positive definite function. We can think of a kernel function as a way to "smooth" out functions by integrating them against the kernel. We will define this "smoothing" operation as \(T_K: L_2(X) \to L_2(X)\):
\[(T_K f)(x) = \int_X K(x, y)\, f(y)\, dy.\]
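To make the smoothing interpretation concrete, here is a small numerical sketch (the Gaussian kernel, the interval \([0,1]\), and the grid size are illustrative assumptions, not part of the original setup): we approximate the integral defining \(T_K f\) with a Riemann sum on a grid.

```python
import numpy as np

# Illustrative sketch: discretize X = [0, 1] and apply T_K via a Riemann sum.
# The Gaussian kernel, interval, and grid size are assumptions for the example.
n = 200
xs = np.linspace(0.0, 1.0, n)
dx = xs[1] - xs[0]

def K(x, y, ell=0.1):
    """Gaussian (RBF) kernel: symmetric and positive definite."""
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

f = np.sign(np.sin(6 * np.pi * xs))    # a rough, discontinuous test function
Gram = K(xs[:, None], xs[None, :])     # Gram[i, j] = K(x_i, x_j)
Tf = Gram @ f * dx                     # (T_K f)(x_i) ≈ sum_j K(x_i, x_j) f(x_j) dx
# Tf is a smoothed-out version of f: the kernel averages f over nearby points.
```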
Notice that \(T_K\) is a compact, self-adjoint linear operator, so by the spectral theorem it has an eigendecomposition. This means that there exist scalars \(\{\lambda_i\}_{i=1}^{\infty}\) and orthonormal functions \(\{e_i\}_{i=1}^{\infty}\) (where \(\int_X e_i(x)\, e_j(x)\, dx = \delta_{ij}\)) such that
\[T_K e_i = \lambda_i e_i.\]
Mercer's theorem then lets us write \(K\) in terms of this eigendecomposition as
\[K(x, y) = \sum_{i=1}^{\infty} \lambda_i\, e_i(x)\, e_i(y).\]
We can verify this easily because
\[(T_K e_j)(x) = \int_X K(x, y)\, e_j(y)\, dy = \sum_{i=1}^{\infty} \lambda_i\, e_i(x) \int_X e_i(y)\, e_j(y)\, dy = \lambda_j\, e_j(x).\]
Using these definitions, we can relate \(K\) to its eigenfunctions in a more linear-algebraic way as
\[\langle K(x, \cdot),\, e_i \rangle_{L_2} = \lambda_i\, e_i(x).\]
This expression highlights the interpretation of the Hilbert space as an infinite-dimensional vector space, because \(x\) is analogous to a row index. For example, if we have a symmetric PSD matrix \(\Sigma \in \mathbb{R}^{n\times n}\) with eigenvectors \(\{v_i \in \mathbb{R}^n\}_{i=1}^{n}\) and eigenvalues \(\{\lambda_i \in \mathbb{R}\}_{i=1}^{n}\), then the \(j\)'th entry of the scaled \(i\)'th eigenvector is \(\langle \Sigma_j, v_i \rangle_2 = \lambda_i (v_i)_j\), where \(\Sigma_j\) is the \(j\)'th row of \(\Sigma\).
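The matrix analogy also suggests how to use the Mercer representation in practice: eigendecompose a discretized version of \(T_K\). The sketch below (same illustrative Gaussian kernel and grid as above, so still an assumption rather than part of the original text) recovers approximate eigenvalues and eigenfunctions and checks that they reconstruct \(K\).

```python
import numpy as np

# Illustrative sketch: the eigendecomposition of the discretized operator T_K
# approximates the Mercer eigenvalues/eigenfunctions (same assumed kernel/grid).
n = 200
xs = np.linspace(0.0, 1.0, n)
dx = xs[1] - xs[0]
K = lambda x, y, ell=0.1: np.exp(-(x - y) ** 2 / (2 * ell ** 2))
Gram = K(xs[:, None], xs[None, :])

# Discretizing (T_K f)(x) = int K(x, y) f(y) dy gives the matrix Gram * dx.
lams, vecs = np.linalg.eigh(Gram * dx)
lams, vecs = lams[::-1], vecs[:, ::-1]     # sort eigenvalues in decreasing order
e = vecs / np.sqrt(dx)                     # rescale so sum_j e_i(x_j)^2 dx = 1

# Mercer: K(x, y) = sum_i lambda_i e_i(x) e_i(y); check a truncated reconstruction.
m = 30
K_approx = (e[:, :m] * lams[:m]) @ e[:, :m].T
print(np.abs(K_approx - Gram).max())       # small reconstruction error
```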
Reproducing Kernel Hilbert Space
Now that we've seen how to relate the kernel function to its eigenfunctions, we can define the RKHS. An RKHS associated with a kernel \(K\) is a Hilbert space whose inner product resembles the \(\Sigma^{-1}\)-weighted inner product on \(\mathbb{R}^n\) for a symmetric PSD matrix \(\Sigma\), namely \(\langle x, y \rangle_{\Sigma^{-1}} = x^T \Sigma^{-1} y\).
Recall that \(K(x,y) = \sum_{i=1}^{\infty} \lambda_i e_i(x)e_i(y)\). Writing \(f = \sum_i f_i e_i\) and \(g = \sum_i g_i e_i\), we define the inner product of the RKHS \(\mathcal{H}_K\) as follows:
\[\langle f, g \rangle_{\mathcal{H}_K} = \sum_{i=1}^{\infty} \frac{f_i\, g_i}{\lambda_i}.\]
This is similar to the inner product weighted by \(\Sigma^{-1}\) on \(\mathbb{R}^n\): if the eigendecomposition of \(\Sigma^{-1}\) is \(\Sigma^{-1} = \sum_{i=1}^n \frac{1}{\lambda_i} v_i v_i^T\), then \(\langle x, y \rangle_{\Sigma^{-1}} = x^T\Sigma^{-1} y = \sum_{i=1}^n \frac{1}{\lambda_i} (x^Tv_i)(y^Tv_i) = \sum_{i=1}^n \frac{1}{\lambda_i} \langle x, v_i \rangle_2 \langle y, v_i \rangle_2\). Finally, the RKHS is defined as the collection of functions with finite norm under this inner product:
\[\mathcal{H}_K = \left\{ f \in L_2(X) \;\middle|\; \|f\|_{\mathcal{H}_K}^2 = \sum_{i=1}^{\infty} \frac{f_i^2}{\lambda_i} < \infty \right\}.\]
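As a quick worked example (illustrative, not from the original text), take \(f = e_1 + e_2\). Its \(L_2\) and RKHS norms differ only in how each component is weighted:
\[\|f\|_{L_2}^2 = 1 + 1 = 2, \qquad \|f\|_{\mathcal{H}_K}^2 = \frac{1}{\lambda_1} + \frac{1}{\lambda_2}.\]
Components along eigenfunctions with small eigenvalues are penalized heavily, which is why having finite RKHS norm acts like a smoothness condition.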
Reproducing property
The "reproducing" part of RKHS comes from the following property:
This is called the reproducing property because the evaluation of a function \(f\) at \(x\) can be "reproduced" by taking the inner product of \(f\) with \(K_x\). We get another nice property if we take the inner product of two partially evaluated kernel functions:
\[\langle K_x, K_y \rangle_{\mathcal{H}_K} = K(x, y).\]
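Both identities are a one-line check using the definitions above: the coefficients of \(K_x\) in the \(\{e_i\}\) basis are \(\langle K_x, e_i \rangle_{L_2} = \lambda_i e_i(x)\), so plugging into the RKHS inner product gives
\[\langle f, K_x \rangle_{\mathcal{H}_K} = \sum_{i=1}^{\infty} \frac{f_i \cdot \lambda_i e_i(x)}{\lambda_i} = \sum_{i=1}^{\infty} f_i\, e_i(x) = f(x), \qquad \langle K_x, K_y \rangle_{\mathcal{H}_K} = \sum_{i=1}^{\infty} \frac{\lambda_i e_i(x)\, \lambda_i e_i(y)}{\lambda_i} = K(x, y).\]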
Finally, we can also push linear operators on \(x\) through the inner product. For example, let \(\text{grad}\) be the gradient operator with respect to \(x\). Then
\[\langle f, \text{grad}\, K_x \rangle_{\mathcal{H}_K} = \text{grad}\, f(x),\]
provided \(K\) is smooth enough that \(\text{grad}\, K_x\) lies in \(\mathcal{H}_K\).
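Putting the pieces together, here is a numerical sanity check of the reproducing property using the discretized Mercer representation (again a sketch: the Gaussian kernel, grid, truncation rule, and test function are illustrative assumptions rather than anything from the original text).

```python
import numpy as np

# Illustrative check of the reproducing property <f, K_x>_H = f(x) on a grid.
n = 400
xs = np.linspace(0.0, 1.0, n)
dx = xs[1] - xs[0]
K = lambda x, y, ell=0.2: np.exp(-(x - y) ** 2 / (2 * ell ** 2))

# Discrete Mercer decomposition, keeping only numerically trustworthy eigenvalues.
Gram = K(xs[:, None], xs[None, :])
lams, vecs = np.linalg.eigh(Gram * dx)
lams, vecs = lams[::-1], vecs[:, ::-1]
m = int(np.sum(lams > 1e-10))
lams, e = lams[:m], vecs[:, :m] / np.sqrt(dx)

f = np.sin(2 * np.pi * xs)        # a smooth test function
f_i = e.T @ f * dx                # f_i ≈ <f, e_i>_{L2}
k = np.argmin(np.abs(xs - 0.37))  # evaluation point x = xs[k]
Kx_i = lams * e[k, :]             # coefficients of K_x = K(x, .) are lambda_i e_i(x)
print(np.sum(f_i * Kx_i / lams))  # <f, K_x>_H = sum_i f_i (lambda_i e_i(x)) / lambda_i
print(f[k])                       # ... which should match f(x)
```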