Understanding Kriging (Part 1)

Categories: code, analysis

Author: Joaquin Cavieres

Published: May 16, 2025

In this first blog post, I would like to share my thoughts on how the Kriging method can often lead to confusion due to its different interpretations. My goal is to help readers distinguish between Kriging as the “Best Linear Unbiased Predictor (BLUP)”, viewed from the linear algebra perspective (as the solution of a linear system of equations), and its probabilistic interpretation as a Gaussian random field (I prefer to reserve the term “Gaussian process” for data without a spatial reference).

In Part 1, I will present the equations used to calculate the Kriging weights and demonstrate how to predict values at specific spatial locations. Using these weights, we will then perform interpolation over a regular square grid covering the entire spatial domain of interest. Finally, the results from our manual calculations will be compared with those obtained using the “gstat” package to assess their consistency and accuracy.

1 Introduction

We will denote the spatial locations as \(\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}\), and the spatial data collected at these locations (the observed or measurement variable) will be denoted as \(Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n)\). The spatial locations are determined by their coordinates in space, for example latitude-longitude, and we will focus mainly on the two-dimensional case. To denote the vector of all the observations we will use \(\mathbf{Z} = (Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))^{\top}\).

2 Computing the distance

Distance is a numerical description of how far apart things are, and it is the most fundamental concept in geography. Considering Waldo Tobler’s first law of geography, which says that everything is related to everything else, but near things are more related than distant things (Tobler 1970), we need to quantify this relationship in some way, and that is not always as easy as it seems.

Computing the distance between a set of spatial data points is very important for spatial data analysis. Suppose we have spatial locations \(\mathbf{s}_i\), for \(i = 1, \ldots, n\), with \(n\) the total number of spatial locations. The coordinates of \(\mathbf{s}_i\) are in latitude-longitude, but for simplicity we will denote them as \((x_i, y_i)\). Now, if another spatial location \(\mathbf{s}_j\) has coordinates \((x_j, y_j)\), the Euclidean distance between the spatial points \(\mathbf{s}_i\) and \(\mathbf{s}_j\) is given by

\[ \begin{equation}\text{dist\_matrix}_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}.\end{equation} \tag{1}\]

There are other ways to calculate the distance, for example: great-circle distance, azimuth distance, travel distance from point to point, time needed to get from point to point, etc.
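As a sketch of one of these alternatives, the great-circle distance can be computed with the haversine formula (the function name and the example coordinates below are illustrative choices of mine, assuming a spherical Earth of radius 6371 km):

```r
# Great-circle (haversine) distance between two points given in degrees.
# Assumes a spherical Earth with mean radius 6371 km.
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(pmin(1, a)))
}

# Example: Santiago (-33.45, -70.67) to Valdivia (-39.81, -73.25), in km
haversine(-33.45, -70.67, -39.81, -73.25)
```

Unlike Equation 1, this distance accounts for the curvature of the Earth, which matters when locations span a large geographic extent.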

The computed “distances” are commonly represented by a distance matrix, which contains the distances between all pairs of objects of interest. For example,

site1 <- c(30, 45) 
site2 <- c(95, 5)
site3 <- c(80, 45)
site4 <- c(90, 55)
site5 <- c(70, 30)
site6 <- c(30, 8)
sites <- rbind(site1, site2, site3, site4, site5, site6)
sites
      [,1] [,2]
site1   30   45
site2   95    5
site3   80   45
site4   90   55
site5   70   30
site6   30    8
plot(sites, xlim=c(0,100), ylim=c(0,100), pch=20, cex=2, col='red', xlab='X', ylab='Y', las=1)
text(sites+3, c("site1", "site2", "site3", "site4", "site5", "site6"))

Using these data, we can compute the distance between spatial points (distance matrix) as:

# Compute Euclidean distances
dist_matrix <- as.matrix(dist(sites))
dist_matrix
         site1    site2    site3    site4    site5    site6
site1  0.00000 76.32169 50.00000 60.82763 42.72002 37.00000
site2 76.32169  0.00000 42.72002 50.24938 35.35534 65.06919
site3 50.00000 42.72002  0.00000 14.14214 18.02776 62.20129
site4 60.82763 50.24938 14.14214  0.00000 32.01562 76.21680
site5 42.72002 35.35534 18.02776 32.01562  0.00000 45.65085
site6 37.00000 65.06919 62.20129 76.21680 45.65085  0.00000
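We can verify Equation 1 directly with a short double loop (a minimal sketch; `sites` is restated without row names so the snippet is self-contained):

```r
# Manual Euclidean distance matrix, following Equation 1
sites <- rbind(c(30, 45), c(95, 5), c(80, 45),
               c(90, 55), c(70, 30), c(30, 8))
n <- nrow(sites)
dmat <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    dmat[i, j] <- sqrt((sites[i, 1] - sites[j, 1])^2 +
                       (sites[i, 2] - sites[j, 2])^2)
  }
}

# Agrees with dist() up to floating point error
max(abs(dmat - as.matrix(dist(sites))))
```

In practice `dist()` is preferred (it is vectorized and avoids the double loop), but the loop makes the formula explicit.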

3 (Semi)Variogram

In simple terms, it is a function of difference over distance. In mathematical terms, we could define it as half the expected squared difference between values separated by a distance vector, generally denoted as \(\mathbf{h}\). The equation of the semivariogram is the following

\[ \begin{equation} \gamma(\mathbf{h}) = \frac{1}{2}\text{Var}[Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h})], \end{equation} \tag{2}\]

where \(\mathbf{s}\) is the spatial location, \(\mathbf{h}\) is the vector of distance between two spatial points, and \(\gamma(\mathbf{h})\) measures how dissimilar the field values become as \(\mathbf{h}\) increases (this corresponds to a third type of stationarity, intrinsic stationarity, and assumes that \(\mathbb{E}[Z(\mathbf{s} + \mathbf{h}) - Z(\mathbf{s})] = 0\)). The relationship between the variogram and the covariance function is given by:

\[\begin{align*} 2\gamma(\mathbf{h}) \hspace{2mm} = & \hspace{2mm} \text{Var}[Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h})] \\ = & \hspace{2mm} \text{Var}(Z(\mathbf{s} + \mathbf{h})) + \text{Var}(Z(\mathbf{s})) - 2\,\text{Cov}(Z(\mathbf{s} + \mathbf{h}), Z(\mathbf{s}))\\ = & \hspace{2mm} C(0) + C(0) - 2C(\mathbf{h}). \end{align*}\]

Thus,

\[ \gamma(\mathbf{h}) = C(0) - C(\mathbf{h}). \tag{3}\]

3.1 Empirical Variogram

Traditionally, the selection of a variogram model begins with plotting the empirical semivariogram. This empirical plot is then visually compared against various theoretical models to identify the most appropriate fit. The standard form of the empirical semivariogram is given by

\[ \hat{\gamma}(\mathbf{h}) = \frac{1}{2n(\mathbf{h})} \sum^{n(\mathbf{h})}_{i,j}(Z(\mathbf{s}_i) - Z(\mathbf{s}_j))^{2}, \tag{4}\]

where \(Z(\mathbf{s}_i)\) and \(Z(\mathbf{s}_j)\) are the observed values at the spatial locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\), and \(n(\mathbf{h})\) is the number of pairs separated by (approximately) \(\mathbf{h}\). Finally, we will denote the variogram as \(2 \gamma(\mathbf{h})\).

The typical variogram models are:

  • Exponential:

    \(\gamma(\mathbf{h}) = \sigma^2 \left(1 - \exp\left(-\frac{\mathbf{h}}{\rho}\right)\right)\)

  • Spherical:

    \[\begin{equation*} \gamma(\mathbf{h}) = \begin{cases} \sigma^2 \left[\frac{3\mathbf{h}}{2\rho} - \frac{\mathbf{h}^3}{2\rho^3}\right], & \text{if } 0 \le \mathbf{h} \le \rho \\\sigma^2, & \text{if } \mathbf{h} > \rho \end{cases} \end{equation*}\]
  • Gaussian:

    \(\gamma(\mathbf{h}) = \sigma^2 \left(1 - \exp\left(-\left(\frac{\mathbf{h}}{\rho}\right)^2\right)\right)\)

For all the variograms presented, \(\sigma^2\) is the sill and \(\rho\) is the range parameter.
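As a quick sketch, the three models can be written as R functions and plotted together (the function names, and the values `sigma2 = 1` and `rho = 3`, are arbitrary illustrative choices):

```r
# Theoretical variogram models (no nugget); sigma2 = sill, rho = range
vgm_exp <- function(h, sigma2 = 1, rho = 3) sigma2 * (1 - exp(-h / rho))
vgm_gau <- function(h, sigma2 = 1, rho = 3) sigma2 * (1 - exp(-(h / rho)^2))
vgm_sph <- function(h, sigma2 = 1, rho = 3) {
  ifelse(h <= rho,
         sigma2 * (3 * h / (2 * rho) - h^3 / (2 * rho^3)),
         sigma2)
}

h <- seq(0, 10, by = 0.1)
plot(h, vgm_exp(h), type = "l", ylab = expression(gamma(h)), ylim = c(0, 1.05))
lines(h, vgm_sph(h), col = "blue")
lines(h, vgm_gau(h), col = "red")
legend("bottomright", c("Exponential", "Spherical", "Gaussian"),
       col = c("black", "blue", "red"), lty = 1)
```

Note that the spherical model reaches the sill exactly at \(h = \rho\), while the exponential and Gaussian models approach it only asymptotically.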

4 Variogram computation

From Equation 4, we know that \(n(\mathbf{h})\) is the number of pairs of spatial points separated by a specific distance vector \(\mathbf{h}\), with values \(Z(\mathbf{s})\) and \(Z(\mathbf{s} + \mathbf{h})\), and that \(2 \gamma(\mathbf{h})\) is the variogram.

First, we will consider the following important observations before the computation:

  1. As the distance increases, the variability also increases: this is typical, as larger distance offsets generally lead to greater differences between the head and tail samples.
  2. It is calculated over all possible pairs separated by the distance vector \(\mathbf{h}\): we evaluate all data, identifying all possible pair combinations of every spatial point with every other spatial point. The variogram is then calculated as half the expected squared difference between these pairs. A larger number of pairs provides a more reliable estimate.
  3. It is necessary to plot the sill to know the correlation between the spatial points: here, the sill is the spatial variance \(\sigma^2_{\mathbf{s}}\). As we are assuming stationarity, the covariance is:
\[\begin{equation*} C(\mathbf{h}) = \sigma^2_\mathbf{s} - \gamma(\mathbf{h}). \end{equation*}\]

The covariance measures the similarity over distance, and if \(\sigma^2_{\mathbf{s}} = 1.0\), \(C(\mathbf{h})\) is equal to the correlogram \(\rho(\mathbf{h})\) such that:

\[\begin{equation} \rho(\mathbf{h}) = \sigma^2_\mathbf{s} - \gamma(\mathbf{h}). \end{equation}\]
  4. The distance at which the variogram reaches \(\sigma^2_{\mathbf{s}}\) is known as the range: the distance at which the difference of the variogram from \(\sigma^2_{\mathbf{s}}\) becomes negligible.

  5. Evaluate the discontinuity at the origin, the nugget effect: it represents variability in the data that occurs at scales smaller than the sampling distance, or it may be caused by measurement errors or random noise.

4.1 Practical example

Given the following observed values:

\[\begin{equation*} Z(\mathbf{s}_1) = 10, \hspace{2mm} Z(\mathbf{s}_2) = 12, \hspace{2mm} Z(\mathbf{s}_3) = 14, \hspace{2mm} Z(\mathbf{s}_4) = 13, \hspace{2mm} Z(\mathbf{s}_5) = 15 \end{equation*}\]

and the corresponding distances between the spatial points (the distance matrix)

\[\begin{bmatrix} 0 & 2 & 4 & 6 & 8 \\ 2 & 0 & 2 & 4 & 6 \\ 4 & 2 & 0 & 2 & 4 \\ 6 & 4 & 2 & 0 & 2 \\ 8 & 6 & 4 & 2 & 0 \end{bmatrix}\]

Calculate the experimental variogram for the distance \(\mathbf{h} = 2\).

Solution: The empirical variogram estimator \(\hat{\gamma}(\mathbf{h})\) for \(\mathbf{h}\) is defined as:

\[\begin{equation*} \hat{\gamma}(\mathbf{h}) = \frac{1}{2 n(\mathbf{h})} \sum^{n(\mathbf{h})}_{i,j} \left( Z(\mathbf{s}_i) - Z(\mathbf{s}_j) \right)^2, \end{equation*}\]

where \(n(\mathbf{h})\) is the number of pairs \((i, j)\) such that the distance between \(\mathbf{s}_i\) and \(\mathbf{s}_j\) is approximately \(\mathbf{h}\).

  1. For \(\mathbf{h} = 2\), the pairs of points whose distance is 2 are:

    • \((\mathbf{s}_1, \mathbf{s}_2)\)
    • \((\mathbf{s}_2, \mathbf{s}_3)\)
    • \((\mathbf{s}_3, \mathbf{s}_4)\)
    • \((\mathbf{s}_4, \mathbf{s}_5)\)

    We now calculate the squared differences for each pair of spatial points:

    \[\begin{aligned} \left( Z(\mathbf{s}_1) - Z(\mathbf{s}_2) \right)^2 &= (10 - 12)^2 = 4, \\ \left( Z(\mathbf{s}_2) - Z(\mathbf{s}_3) \right)^2 &= (12 - 14)^2 = 4, \\ \left( Z(\mathbf{s}_3) - Z(\mathbf{s}_4) \right)^2 &= (14 - 13)^2 = 1, \\ \left( Z(\mathbf{s}_4) - Z(\mathbf{s}_5) \right)^2 &= (13 - 15)^2 = 4. \end{aligned}\]

    Now, compute the empirical variogram:

    \[\begin{equation*} \hat{\gamma}(2) = \frac{1}{2 \times 4} \left( 4 + 4 + 1 + 4 \right) = \frac{1}{8} \times 13 = 1.625. \end{equation*}\]

    Thus, the empirical variogram at \(\mathbf{h} = 2\) is \(\hat{\gamma}(2) = 1.625\).
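This hand calculation can be reproduced in a few lines of R (a sketch; `emp_vgm` is a helper of mine, and the points are placed on a line with spacing 2 so that the distance matrix matches the one above):

```r
# Toy data from the example
z <- c(10, 12, 14, 13, 15)
d <- as.matrix(dist(cbind(seq(0, 8, by = 2), 0)))  # points on a line, spacing 2

# Empirical variogram estimator for a single distance h (Equation 4)
emp_vgm <- function(z, d, h, tol = 1e-8) {
  idx <- which(abs(d - h) < tol & upper.tri(d), arr.ind = TRUE)
  sum((z[idx[, 1]] - z[idx[, 2]])^2) / (2 * nrow(idx))
}

emp_vgm(z, d, h = 2)  # 1.625, as computed by hand
```

The `upper.tri()` filter keeps each pair only once, matching the four pairs identified in the solution.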

4.2 Using the “gstat” package of R

The “gstat” package in R is a versatile tool used for geostatistical modelling, spatial prediction, and multivariable geostatistics (Pebesma 2004). It provides functionality for variogram modeling, Kriging, and conditional simulation, supporting univariate and multivariate spatial data. Some of the main features of this library are the following:

  • Computing the variogram \(\rightarrow\) empirical variogram (also called experimental) and variogram fitting (variogram modeling).

  • Kriging \(\rightarrow\) ordinary, simple, and universal kriging.

  • Multivariable geostatistics \(\rightarrow\) co-kriging and other methods for spatial modelling.

  • Simulation \(\rightarrow\) conditional and unconditional GRF for spatial predictions.

The “gstat” package is simple to use, and here are the basic steps for spatial data analysis:

  1. Define spatial data (a \(\texttt{SpatialPointsDataFrame}\) from the “sp” package).
  2. Compute the empirical variogram using the function \(\texttt{variogram()}\).
  3. Fit a variogram model using the function \(\texttt{fit.variogram()}\).
  4. Perform kriging prediction with the \(\texttt{krige()}\) function.

For the California air pollution data (from the “rspatial” package), select the “airqual” data and interpolate the levels of ozone (averages for 1980-2009). Here, you must consider “OZDLYAV” (unit is parts per billion) for interpolation purposes.

# Installing the rspatial package
if (!require("rspatial")) remotes::install_github('rspatial/rspatial')

# Read the data
library(rspatial)
x <- sp_data("airqual")
x$res <- x$OZDLYAV * 1000

Now, we create a \(\texttt{SpatialPointsDataFrame}\) and transform it to Teale Albers. Note the units=km, which is needed to fit the variogram.

#===================================
#         Installing rgdal
#===================================
# url <- "https://download.r-forge.r-project.org/bin/windows/contrib/4.4/rgdal_1.6-7.zip"
# install.packages(url, type="source", repos=NULL)

library(rgdal)
coordinates(x) <- ~LONGITUDE + LATITUDE
proj4string(x) <- CRS('+proj=longlat +datum=NAD83')
TA <- CRS("+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=WGS84 +units=km")
coords <- spTransform(x, TA)

Now we will create an empirical variogram using the package “gstat” as follows:

library(gstat)
gs <- gstat(formula = res~1, locations = coords)
v <- variogram(gs, width=20)
head(v)
    np      dist    gamma dir.hor dir.ver   id
1 1010  11.35040 34.80579       0       0 var1
2 1806  30.63737 47.52591       0       0 var1
3 2355  50.58656 67.26548       0       0 var1
4 2619  70.10411 80.92707       0       0 var1
5 2967  90.13917 88.93653       0       0 var1
6 3437 110.42302 84.13589       0       0 var1
plot(v)

and then fit a model variogram (Exponential in this case):

fve <- fit.variogram(v, vgm(85, "Exp", 75, 20))
fve
  model    psill    range
1   Nug 21.96600  0.00000
2   Exp 85.52957 72.31404
plot(variogramLine(fve, 400), type='l', ylim=c(0,120))
points(v[,2:3], pch=20, col='red')

You can change the type of variogram (Spherical instead of Exponential) and compare them, for example:

fvs <- fit.variogram(v, vgm(85, "Sph", 75, 20))
fvs
  model    psill    range
1   Nug 25.57019   0.0000
2   Sph 72.65881 135.7744
plot(variogramLine(fvs, 400), type='l', ylim=c(0,120) ,col='blue', lwd=2)
points(v[,2:3], pch=20, col='red')

5 Kriging

This section introduces the traditional method for spatial prediction in point-referenced data, which aims to minimize the mean squared prediction error. Known as Kriging, this technique was formalized by Matheron (1963), who named it in tribute to Danie G. Krige (Krige 1951). Here, the main task is finding an optimal spatial prediction; that is, given a vector of observations \(\mathbf{Z} = (Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))^{\top}\), we want to accurately estimate the value \(Z(\mathbf{s}_0)\) at an unsampled location. So, basically, the question is: what is the best predictor of the value of \(Z(\mathbf{s}_0)\) based upon the data \(\mathbf{Z}\)?

5.1 Ordinary Kriging

Let us consider the observed values \(\mathbf{Z} = (Z(\mathbf{s}_1), Z(\mathbf{s}_2), \dots, Z(\mathbf{s}_n))^{\top}\) at known locations \(\{\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_n\}\). The goal is to predict the value of a random field \(Z(\mathbf{s}_0)\) at an unobserved location \(\mathbf{s}_0\). The model assumption for this case is the following:

\[ Z(\mathbf{s}) = \mu + \delta(\mathbf{s}), \hspace{6mm} \mathbf{s} \in S, \hspace{2mm} \mu \in \mathbb{R} \ (\text{unknown}), \]

where \(\delta(\mathbf{s})\) is a zero-mean spatial stochastic process with a variogram or some covariance function, and \(S \subset \mathbb{R}^{d}\), with \(d\) the dimension of the Euclidean space. For this case, Kriging assumes a spatial random field expressed through a variogram or covariance function, which allows us to face this spatial prediction problem. The Ordinary Kriging predictor for \(Z(\mathbf{s}_0)\) is a linear combination of the observed values such that:

\[\begin{equation*} \hat{Z}(\mathbf{s}_0) = \sum_{i=1}^n w_i Z(\mathbf{s}_i), \end{equation*}\]

where \(w_i\), for \(i = 1, \ldots, n\) are the Kriging weights that need to be calculated. In this way, Kriging minimizes the mean squared error of the prediction such that:

\[\begin{equation*} \text{min} \hspace{2mm}\sigma^{2}_{\text{error}} = \mathbb{E}[Z(\mathbf{s}_0) - \hat{Z}(\mathbf{s}_0)]^{2}, \end{equation*}\]

or

\[ \text{min}\hspace{2mm} \sigma^{2}_{\text{error}} = \mathbb{E}[Z(\mathbf{s}_0) - \sum_{i=1}^n w_i Z(\mathbf{s}_i)]^{2} \tag{5}\]

In terms of a covariance function, for a second-order stationary spatial process with covariance function \(C(\mathbf{h}), \mathbf{h} \in \mathbb{R}^{d}\), Equation 5 can be written as:

\[\begin{equation} \sigma^{2}_{\text{error}} = C(0) - 2 \sum^{n}_{i=1}w_iC(\mathbf{s}_0, \mathbf{s}_i) + \sum^{n}_{i=1}\sum^{n}_{j=1} w_{i}w_{j}C(\mathbf{s}_i, \mathbf{s}_j). \end{equation}\]

The ordinary Kriging weights \(w_i\) must satisfy two requirements:

  • Unbiasedness: The predictor should be unbiased, which requires that \(\sum_{i=1}^n w_i = 1\).

  • Optimality: The weights should minimize the variance of the prediction error (the Kriging variance), leading to the solution of a system of equations.

The unbiasedness requirement ensures that the expected value of the Kriging predictor matches the expected value of the random field, preventing systematic over- or under-prediction (interpolation).
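To make this concrete, under the constant-mean model \(Z(\mathbf{s}) = \mu + \delta(\mathbf{s})\) the expectation of the predictor is

\[\begin{equation*} \mathbb{E}[\hat{Z}(\mathbf{s}_0)] = \sum_{i=1}^n w_i \, \mathbb{E}[Z(\mathbf{s}_i)] = \mu \sum_{i=1}^n w_i, \end{equation*}\]

so \(\mathbb{E}[\hat{Z}(\mathbf{s}_0)] = \mu = \mathbb{E}[Z(\mathbf{s}_0)]\) exactly when \(\sum_{i=1}^n w_i = 1\).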

5.2 Finding the Kriging weights

Based on the data \(\mathbf{Z}\) and \(\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}\), with \(C(\mathbf{h})\) being a covariance function of the spatial random field, and \(\mathbf{h} = \mathbf{s}_i - \mathbf{s}_j\), for Ordinary Kriging we can write a linear system of equations to find the Kriging weights \(w_1, \ldots, w_n\). But how can we propose this system of equations? Well, we already know that the Ordinary Kriging estimator is a linear combination of the observed values,

\[\begin{equation*} \hat{Z}(\mathbf{s}_0) = \sum^{n}_{i=1} w_i Z(\mathbf{s}_i), \end{equation*}\]

and since here we are considering the covariance, the minimization problem can be written as:

\[ \text{min} \hspace{2mm} C(0) - 2\sum^{n}_{i=1}w_{i}C(\mathbf{s}_0, \mathbf{s}_i) + \sum^{n}_{i=1}\sum^{n}_{j=1}w_{i}w_{j}C(\mathbf{s}_i, \mathbf{s}_j) - 2\mathcal{M}\bigg(\sum^{n}_{i=1} w_i - 1\bigg) \tag{6}\]

where \(\mathcal{M}\) is the Lagrange multiplier. Thus, after differentiating Equation 6 with respect to \(w_{1}, \ldots, w_{n}\) and \(\mathcal{M}\), and setting the derivatives equal to zero, we have:

\[\begin{align*} 2\sum^{n}_{j=1}w_{j}C(\mathbf{s}_i, \mathbf{s}_j) - 2C(\mathbf{s}_0, \mathbf{s}_i) - 2\mathcal{M} \hspace{2mm} = & \hspace{2mm} 0 \\ \sum^{n}_{j=1}w_{j} C(\mathbf{s}_i - \mathbf{s}_j) - \mathcal{M} \hspace{2mm} = & \hspace{2mm} C(\mathbf{s}_0 - \mathbf{s}_i) \\ \sum^{n}_{i=1}w_{i} \hspace{2mm} = & \hspace{2mm} 1 \end{align*}\]

Therefore, using a matrix notation to find \(w_{1}, \ldots, w_{n}\) we propose the following linear system of equations:

\[ \boldsymbol{\Gamma}\mathbf{w} = \mathbf{c}, \]

where:

  • \(\mathbf{w} = (w_{1}, \ldots, w_{n}, -\mathcal{M})^{\top}\).

  • \(\mathbf{c} = (C(\mathbf{s}_0, \mathbf{s}_1), \ldots, C(\mathbf{s}_0, \mathbf{s}_n), 1)^{\top}\).

  • \(\boldsymbol{\Gamma} = C(\mathbf{s}_i, \mathbf{s}_j), \hspace{2mm} i = 1, \ldots, n, \hspace{2mm} j = 1, \ldots, n\).

Thus, expressed in a linear system of equations (the Kriging system):

\[\begin{equation*} \begin{pmatrix} C(\mathbf{s}_1, \mathbf{s}_1) & C(\mathbf{s}_1, \mathbf{s}_2) & \dots & C(\mathbf{s}_1, \mathbf{s}_n) & 1 \\ C(\mathbf{s}_2, \mathbf{s}_1) & C(\mathbf{s}_2, \mathbf{s}_2) & \dots & C(\mathbf{s}_2, \mathbf{s}_n) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ C(\mathbf{s}_n, \mathbf{s}_1) & C(\mathbf{s}_n, \mathbf{s}_2) & \dots & C(\mathbf{s}_n, \mathbf{s}_n) & 1 \\ 1 & 1 & \dots & 1 & 0 \\ \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ -\mathcal{M} \\ \end{pmatrix} = \begin{pmatrix} C(\mathbf{s}_1, \mathbf{s}_0) \\ C(\mathbf{s}_2, \mathbf{s}_0) \\ \vdots \\ C(\mathbf{s}_n, \mathbf{s}_0) \\ 1 \\ \end{pmatrix} \end{equation*}\]

We can find \(\hat{\mathbf{w}}\) as

\[ \hat{\mathbf{w}} = \boldsymbol{\Gamma}^{-1}\mathbf{c}. \]

Finally, the Kriging variance, which quantifies the uncertainty of the prediction at \(\mathbf{s}_0\), is given by:

\[\begin{equation*} \sigma_\text{OK}^2(\mathbf{s}_0) = C(\mathbf{0}) - \sum_{i=1}^n w_i C(\mathbf{s}_i - \mathbf{s}_0) + \mathcal{M}, \end{equation*}\]

where a lower Kriging variance indicates a more reliable prediction (interpolation).
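Before moving to the full example in the next section, here is a minimal numerical sketch of the Kriging system with only \(n = 3\) points (the locations, observed values, and covariance parameters are arbitrary illustrative choices):

```r
# Ordinary Kriging with n = 3 observation points (illustrative values)
locs <- rbind(c(0, 0), c(1, 0), c(0, 1))
z    <- c(10, 12, 11)
s0   <- c(0.5, 0.5)
cov_fun <- function(h) exp(-h / 3)   # exponential covariance, sill = 1, range = 3

D <- as.matrix(dist(rbind(locs, s0)))
n <- nrow(locs)

# Kriging matrix: covariances augmented with a row/column of ones
Gamma <- rbind(cbind(cov_fun(D[1:n, 1:n]), 1), c(rep(1, n), 0))
cvec  <- c(cov_fun(D[1:n, n + 1]), 1)  # right-hand side

sol <- solve(Gamma, cvec)
w   <- sol[1:n]                        # Kriging weights (sum to 1)
z_hat <- sum(w * z)                    # prediction at s0
sigma2_OK <- cov_fun(0) - sum(w * cvec[1:n]) - sol[n + 1]  # Kriging variance
```

By the symmetry of the second and third locations around \(\mathbf{s}_0\), their weights coincide, and the weights sum to one as required by the unbiasedness constraint.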

6 Example using R

We will simulate some spatial data (point-referenced or geostatistical data) and manually compute the Ordinary Kriging predictor using the Kriging system.

# Load necessary libraries
library(tidyverse)
library(scales)
library(ggplot2)
library(gridExtra)
library(magrittr)


# Simulating the spatial locations
set.seed(123)
n <- 100
locations <- matrix(runif(2 * n, 1, 10), ncol = 2)
# Simulating the observed values at these locations
z_values <- rnorm(n, mean = 10, sd = 2)

The next step is to choose an appropriate covariance function for modeling spatial dependence. While several types of covariance (or correlation) functions are available for spatial data, we will use the Exponential covariance function for its simplicity and effectiveness. It is defined as:

\[\begin{equation*} C(\mathbf{h}) = \sigma^{2}\exp\bigg (-\frac{\mathbf{h}}{\rho} \bigg ), \end{equation*}\]

where \(\sigma^{2}\) is the sill (representing the variance of the spatial process at zero distance), \(\rho\) is the range (controls how quickly the correlation decays with distance) and \(\mathbf{h}\) is the distance between two locations. We will create this function in R:

# Exponential covariance function
exp_cov <- function(h, sill = 1, range = 3) {
  sill * exp(-h / range)
}

Now we will build the covariance matrix

# Build the covariance matrix
cov_mat <- matrix(0, nrow = n + 1, ncol = n + 1)
nugget <- 1e-10

for (i in 1:n) {
  for (j in 1:n) {
    h <- sqrt(sum((locations[i, ] - locations[j, ])^2))
    cov_mat[i, j] <- exp_cov(h)
  }
  cov_mat[i, n + 1] <- 1  # Add 1s for constraint
  cov_mat[n + 1, i] <- 1
}
cov_mat[n + 1, n + 1] <- 0  # Lagrange multiplier part
diag(cov_mat)[1:n] <- diag(cov_mat)[1:n] + nugget

At this point, we have all the elements to manually predict the value at a specific spatial coordinate \(\mathbf{s}_0\). For example, suppose we are interested in the prediction at the coordinates longitude = 5 and latitude = 5, that is, \(\mathbf{s}_0 = (5, 5)\), and hence we want to predict \(\hat{Z}(\mathbf{s}_0)\). Now, applying the exponential covariance function to the right-hand side of the linear system:

# Cov between new coordinates and known points
# Location for prediction
s0 <- c(5, 5)

# Initializing the cov in the right hand side
c_rhs <- numeric(n + 1)
for (i in 1:n) {
  h <- sqrt(sum((locations[i, ] - s0)^2))
  c_rhs[i] <- exp_cov(h)
}
c_rhs[n + 1] <- 1  # constraint

Finally, we will solve the Kriging system

# Finding w and the Lagrange term (sol[n + 1] corresponds to -M in the text)
sol <- solve(cov_mat, c_rhs)
w <- sol[1:n]; w
  [1]  5.276449e-04 -1.803309e-04  1.808293e-02  3.633872e-05  3.512794e-07
  [6]  4.037552e-06 -1.075393e-04 -4.808129e-05  1.185162e-01 -2.727913e-04
 [11]  7.221219e-05  2.943618e-02 -8.813330e-05 -1.101876e-05 -2.536848e-05
 [16] -5.618937e-06 -2.864650e-03  7.090709e-05 -1.194771e-02  3.334460e-05
 [21] -1.292535e-05 -4.060164e-04 -4.474131e-04  3.765660e-05 -6.044917e-03
 [26]  4.564017e-05 -4.555052e-04 -8.970886e-05 -6.535305e-05  7.507714e-05
 [31]  6.314879e-05  9.338884e-07 -4.830208e-04 -1.474871e-05  3.282166e-05
 [36] -5.492154e-03 -4.961159e-05  1.600371e-04  1.783362e-05 -2.947635e-03
 [41] -5.033814e-05  4.059160e-01 -3.603510e-05 -2.640918e-04 -3.769253e-05
 [46]  2.305515e-05 -1.535201e-05 -6.635154e-05 -1.800664e-03  1.070637e-05
 [51] -1.374114e-06  4.253124e-02  5.859619e-05  1.412490e-05 -8.819846e-05
 [56]  2.088056e-04  8.108485e-05 -2.172081e-04 -1.140082e-04 -2.971160e-03
 [61] -3.386579e-03  6.540672e-05 -5.333364e-03 -6.878677e-03  3.795126e-05
 [66] -1.479047e-02  6.800025e-06 -2.405783e-05 -2.911957e-04 -6.566962e-03
 [71] -5.174284e-04 -4.830647e-04 -1.256172e-04  2.572667e-05 -5.796527e-04
 [76]  5.918588e-04 -2.105901e-02  1.104930e-02  7.288147e-05  4.985706e-07
 [81]  4.945123e-05 -7.647327e-05  2.493547e-04 -6.589906e-05 -5.331597e-06
 [86]  4.589720e-01  5.533826e-06 -1.741735e-04 -8.916278e-06 -1.164895e-05
 [91] -2.342534e-05 -8.579970e-04  2.773732e-05 -2.368735e-03 -2.154586e-05
 [96] -1.211798e-04 -8.375836e-04 -1.596751e-05 -5.827816e-04  1.571912e-02
M <- sol[n + 1]; M
[1] -0.0002080739

With these values, we will compute \(\hat{Z}(\mathbf{s}_0)\) and the Kriging variance \(\sigma^{2}_{\text{OK}}(\mathbf{s}_0)\) as follows:

# Kriging estimate
z_hat <- sum(w * z_values); z_hat
[1] 9.516983
# Kriging variance
c0 <- exp_cov(0)  # C(0)
sigma_OK2 <- c0 - sum(w * c_rhs[1:n]) - M; sigma_OK2
[1] 0.1262456

As we see, \(\hat{Z}(\mathbf{s}_0) = 9.516983\) and \(\sigma^{2}_{\text{OK}}(\mathbf{s}_0) = 0.1262456\). This means that the predicted value at longitude = 5 and latitude = 5 is 9.516983, and the variance associated with this prediction is 0.1262456. We will compare this result with the “gstat” package:

# Using gstat
spat_data <- data.frame(z_values, locations)
colnames(spat_data) <- c("z_values", "s1", "s2")


# Convert to spatial object
coordinates(spat_data) <- ~s1 + s2

# Empirical variogram
emp_vario <- variogram(z_values ~ 1, spat_data)

# Fit a model variogram
fit_vario <- fit.variogram(emp_vario, model = vgm(psill = 1, model = "Exp", range = 1, nugget = 0.1))


# Create a data frame with the prediction location
s0 <- data.frame(s1 = 5, s2 = 5)

# Convert to spatial object
coordinates(s0) <- ~s1 + s2
# Kriging
s0_pred <- krige(z_values ~ 1, spat_data, s0, model = fit_vario)
[using ordinary kriging]
z_hat_gstat <- s0_pred$var1.pred; z_hat_gstat
[1] 9.715304
z_var_gstat <- s0_pred$var1.var; z_var_gstat
[1] 3.880513

Comparing the prediction at location \(\mathbf{s}_0\) for the Kriging computed manually and using the “gstat” package, the values are very close (9.516983 and 9.715304, respectively). The variance for the prediction using “gstat” is 3.880513, noticeably larger than the manual value of 0.1262456.

Finally, we can compare the interpolation in a grid for both methods.

#============================================
#              Interpolation
#============================================
# Set up prediction grid
bbox_vals <- bbox(spat_data)

# Extract ranges
grid_size <- 100
s1_seq <- seq(bbox_vals[1, 1] - 0.5, bbox_vals[1, 2] + 0.5, length.out = grid_size)
s2_seq <- seq(bbox_vals[2, 1] - 0.5, bbox_vals[2, 2] + 0.5, length.out = grid_size)

# Create prediction grid
grid_points <- expand.grid(x = s1_seq, y = s2_seq)


# Initialize vector for predictions (interpolation)
z_inter <- numeric(nrow(grid_points))
var_inter <- numeric(nrow(grid_points))

# Loop over grid points
for (k in 1:nrow(grid_points)) {
  s0 <- as.numeric(grid_points[k, ])
  
  # Build RHS covariance vector
  c_rhs <- numeric(n + 1)
  for (i in 1:n) {
    h <- sqrt(sum((locations[i, ] - s0)^2))
    c_rhs[i] <- exp_cov(h)
  }
  c_rhs[n + 1] <- 1
  
  # Solve for weights and the Lagrange term at this grid point
  sol <- solve(cov_mat, c_rhs)
  w <- sol[1:n]
  m_k <- sol[n + 1]
  
  # Kriging estimate and variance (same formulas as for the single point above)
  z_inter[k] <- sum(w * z_values)
  var_inter[k] <- exp_cov(0) - sum(w * c_rhs[1:n]) - m_k
}

# Reshape result into matrix for plotting
z_pred <- matrix(z_inter, nrow = grid_size, ncol = grid_size, byrow = FALSE)
z_var  <- matrix(var_inter, nrow = grid_size, ncol = grid_size, byrow = FALSE)



#====================================
#     Using the "gstat" package
#====================================
# Convert grid to SpatialPoints
coordinates(grid_points) <- ~x + y
gridded(grid_points) <- TRUE


# Create gstat object for Kriging
krige_gstat <- krige(z_values ~ 1, spat_data, grid_points, model = fit_vario)
[using ordinary kriging]
z_pred_df <- expand.grid(x = s1_seq, y = s2_seq)


# Predictions
z_pred_df$z_pred <- as.vector(z_pred) 
z_pred_df$z_pred_gstat <- as.vector(krige_gstat$var1.pred)
# Variances
z_pred_df$z_var <- as.vector(z_var)
z_pred_df$z_var_gstat <- as.vector(krige_gstat$var1.var)




# Plot prediction + observed points
spat_data <- data.frame(spat_data)
head(spat_data)
   z_values       s1       s2 optional
1  8.579187 3.588198 6.399901     TRUE
2 10.513767 8.094746 3.995412     TRUE
3  9.506616 4.680792 5.397517     TRUE
4  9.304915 8.947157 9.590264     TRUE
5  8.096763 9.464206 5.346122     TRUE
6  9.909945 1.410008 9.013152     TRUE
p1 <- ggplot(z_pred_df, aes(x = x, y = y)) +
  geom_tile(aes(fill = z_pred)) +  # prediction surface
  geom_point(data = spat_data, aes(x = s1, y = s2, size = z_values), color = "blue", alpha = 0.75) +
  coord_equal() +
  xlab("s1") + ylab("s2") + 
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  theme_bw() +
  labs(fill = "Predicted z values (manually)", size = "Observed z values") + 
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))


p2 <- ggplot(z_pred_df, aes(x = x, y = y)) +
  geom_tile(aes(fill = z_pred_gstat)) +  # prediction surface
  geom_point(data = spat_data, aes(x = s1, y = s2, size = z_values), color = "blue", alpha = 0.75) +
  coord_equal() +
  xlab("s1") + ylab("s2") + 
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  theme_bw() +
  labs(fill = "Predicted z values (gstat)", size = "Observed z values") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))


grid.arrange(p1, p2, ncol = 1)

# Variances
p3 <- ggplot(z_pred_df, aes(x = x, y = y)) +
  geom_tile(aes(fill = z_var)) +  
  geom_point(data = spat_data, aes(x = s1, y = s2, size = z_values), color = "blue", alpha = 0.75) +
  coord_equal() +
  xlab("s1") + ylab("s2") + 
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  theme_bw() +
  labs(fill = "Variance of predictions (manually)", size = "Observed z") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))


p4 <- ggplot(z_pred_df, aes(x = x, y = y)) +
  geom_tile(aes(fill = z_var_gstat)) +  
  geom_point(data = spat_data, aes(x = s1, y = s2, size = z_values), color = "blue", alpha = 0.75) +
  coord_equal() +
  xlab("s1") + ylab("s2") + 
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  theme_bw() +
  labs(fill = "Variance of predictions (gstat)", size = "Observed z") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))


grid.arrange(p3, p4, ncol = 1)

7 Conclusion

In this first entry, I explored the foundational idea of Kriging interpolation, initially proposed by Danie G. Krige and later formalized mathematically by Georges Matheron. The prediction at a specific location, or the interpolation over a spatial domain, can be done by solving a system of linear equations. The goal here was to demonstrate that Kriging can be applied through its classical (linear algebra) formulation.

In the next post (Understanding Kriging Part 2), I will present its full probabilistic interpretation, given by Gaussian process regression. While in this post the Kriging predictions or interpolations are based on solving a linear system of equations, a solid understanding of stochastic (spatial) processes is still essential (I did not cover that part given the focus of the post, but you can review it in Cressie (1993)). However, the solution of the Kriging system of equations itself relies on linear algebra rather than a likelihood-based method.

Note 1: I am not an expert in using the “gstat” package, so the comparisons presented may not be optimal or the most efficient. The package was used solely for illustrative purposes.

Note 2: It is interesting to see high values for the variance in the interpolation using the “gstat” package. Let’s explore this in the next post.

References

Cressie, Noel A. C. 1993. “Spatial Prediction and Kriging.” In Statistics for Spatial Data, 105–209. New York: John Wiley & Sons.
Krige, D. G. 1951. “A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.” Journal of the Chemical, Metallurgical and Mining Society of South Africa 52 (6): 119–39.
Matheron, Georges. 1963. “Principles of Geostatistics.” Economic Geology 58 (8): 1246–66. https://doi.org/10.2113/gsecongeo.58.8.1246.
Pebesma, Edzer J. 2004. “Multivariable Geostatistics in s: The Gstat Package.” Computers & Geosciences 30 (7): 683–91.
Tobler, Waldo R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (sup1): 234–40.