8 Spatial Error Models

8.1 SEM Formula

The equation for a Spatial Error Model is:

\[ y = X\beta + u \]

where the error term is modeled as:

\[ u = \lambda Wu + \varepsilon \]

where:

\(y\) = outcome vector
\(X\beta\) = the usual regression part: predictors \(X\) times coefficients \(\beta\)
\(u\) = spatially autocorrelated error term
\(\lambda\) = spatial error coefficient (strength of autocorrelation in errors)
\(W\) = spatial weights matrix
\(Wu\) = weighted average of neighboring errors
\(\varepsilon\) = random independent error
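
Solving the error equation for \(u\) makes the dependence explicit. The steps below are a short derivation, assuming \(I - \lambda W\) is invertible (which holds for a row-standardized \(W\) when \(|\lambda| < 1\)):

\[ u - \lambda W u = \varepsilon \quad\Rightarrow\quad (I - \lambda W)\,u = \varepsilon \quad\Rightarrow\quad u = (I - \lambda W)^{-1}\varepsilon \]

Substituting into the first equation gives the reduced form:

\[ y = X\beta + (I - \lambda W)^{-1}\varepsilon \]

Each location's error is therefore a mixture of the independent shocks \(\varepsilon\) at many locations, with more weight on shocks that are closer in the neighbor structure.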

8.2 What This Means

In ordinary regression, the errors are assumed to be independent across observations.

In SEM, each observation's error depends partly on the errors of its neighbors.

In simple terms, if one location has unexplained positive error, nearby places may also have positive unexplained error.
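
A small simulation can make this concrete. The sketch below is illustrative only: the layout (200 locations on a line, each with a left and right neighbor), the seed, and the value λ = 0.8 are all assumptions chosen for the example. It draws independent errors, turns them into SEM errors, and compares how strongly each one correlates with the average of its neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical: locations arranged on a line

# Neighbors are the locations immediately to the left and right.
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            W[i, j] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row standardize

lam = 0.8                     # assumed spatial error coefficient
eps = rng.normal(size=n)      # independent errors
# Solve (I - lam * W) u = eps, i.e. u = lam * W u + eps
u = np.linalg.solve(np.eye(n) - lam * W, eps)

def neighbor_corr(v):
    """Correlation between each value and the average of its neighbors."""
    return np.corrcoef(v, W @ v)[0, 1]

print("independent errors:", round(neighbor_corr(eps), 2))
print("SEM errors:        ", round(neighbor_corr(u), 2))
```

The independent errors should show a correlation near zero, while the SEM errors should show a clearly positive one.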

8.3 Step 1: Define Neighbors

SEM begins by defining what counts as a neighbor.

For points:

distance thresholds
k nearest neighbors (used in the Step 2 example)

For polygons:

rook adjacency (shared border)
queen adjacency (shared border or corner)

This determines spatial relationships in the weights matrix W.
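
The difference between rook and queen adjacency is easiest to see on a regular grid. The following is a minimal sketch in plain Python; the function name `grid_neighbors` and the 3 x 3 grid are hypothetical choices for illustration, and real polygon data would normally go through a library such as libpysal instead.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Return {cell_id: sorted neighbor ids} for a regular grid.

    Cells are numbered row by row, starting at 0.
    """
    if rule == "rook":      # shared border only
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":   # shared border or corner
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbors[cell] = sorted(
                (r + dr) * n_cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbors

rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")

print(rook[4])   # center cell -> [1, 3, 5, 7]
print(queen[4])  # center cell -> [0, 1, 2, 3, 5, 6, 7, 8]
print(rook[0])   # corner cell -> [1, 3]
print(queen[0])  # corner cell -> [1, 3, 4]
```

Queen adjacency always gives at least as many neighbors as rook adjacency, so the choice changes how far spatial dependence can reach in one step.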

8.4 Step 2: Create a Spatial Weights Matrix

After defining neighbors, we represent those relationships in a spatial weights matrix.

W tells the model:

which observations are spatially connected
how strongly they are connected

Example:

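```python
import geopandas as gpd
from libpysal.weights import KNN

# load the observations (placeholder path)
gdf = gpd.read_file("your_shapefile.shp")

# define 4 nearest neighbors (for polygons, distances are measured between centroids)
w = KNN.from_dataframe(gdf, k=4)

# row standardize so that each row of W sums to 1
w.transform = "R"

# number of observations and average number of neighbors
print(w.n, w.mean_neighbors)
```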
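
What row standardization does can be checked by hand on a tiny example. The sketch below uses NumPy only; the four observations, their neighbor lists, and the error values are made up for illustration. After row standardization each row of W sums to 1, so multiplying W by a vector gives the average of each observation's neighbors.

```python
import numpy as np

# Hypothetical neighbor lists for four observations (ids 0 to 3).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

n = len(neighbors)
W = np.zeros((n, n))
for i, js in neighbors.items():
    W[i, js] = 1.0          # 1 marks "j is a neighbor of i"

# Row standardize: divide each row by its number of neighbors.
W_std = W / W.sum(axis=1, keepdims=True)

u = np.array([2.0, -1.0, 0.5, 4.0])   # made-up error values
lagged = W_std @ u                     # average error among each observation's neighbors

print(W_std.sum(axis=1))  # every row sums to 1
print(lagged)             # approximately [-0.25, 1.25, 1.667, 0.5]
```

For observation 2, for example, the result is the mean of the errors at observations 0, 1, and 3.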

8.5 Step 3: Fit an OLS Model First

SEM is often used after fitting ordinary least squares regression.

First fit a normal regression, then examine residuals.

If residuals show spatial autocorrelation (for example using Moran’s I), this suggests SEM may be appropriate.

Example:

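```python
from sklearn.linear_model import LinearRegression

# gdf is the GeoDataFrame from Step 2, so residuals stay aligned with the rows of W
X = gdf[["x1", "x2"]]
y = gdf["y"]

# ordinary least squares (an intercept is fitted by default)
model = LinearRegression()
model.fit(X, y)

gdf["predicted"] = model.predict(X)
gdf["residuals"] = y - gdf["predicted"]
```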

8.6 Step 4: Test Residual Spatial Autocorrelation

Now test whether residuals are clustered spatially.

If nearby residuals are similar, the model is missing spatial structure, and SEM may improve model performance. A positive and statistically significant Moran’s I in the residuals often indicates this.
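
To show what Moran's I measures, here is a minimal NumPy sketch of the statistic itself. The function name `morans_i`, the six locations on a line, and the two residual patterns are assumptions made for the example, and a binary (not row-standardized) W is used so the numbers are easy to verify by hand. The sketch returns only the statistic; for a significance test, PySAL's `esda` package provides a `Moran` class with a permutation-based p-value.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I: (n / S0) * (z' W z) / (z' z), with z the centered values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    s0 = W.sum()                      # sum of all weights
    return (n / s0) * (z @ W @ z) / (z @ z)

# Six locations on a line; each is a neighbor of the adjacent ones.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = [1, 1, 1, -1, -1, -1]     # similar residuals sit together
alternating = [1, -1, 1, -1, 1, -1]   # neighbors always differ

print(morans_i(clustered, W))    # about 0.6: positive autocorrelation
print(morans_i(alternating, W))  # about -1.0: negative autocorrelation
```

Passing the OLS residuals and the weights matrix to a function like this gives the quantity that Step 4 asks about.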

Fitting a Spatial Error Model:

A Spatial Error Model can be fit using PySAL:

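```python
from spreg import ML_Error

# spreg expects NumPy arrays: X of shape (n, k) without a constant column
# (spreg adds the intercept itself) and y of shape (n, 1)
X = gdf[["x1", "x2"]].values
y = gdf["y"].values.reshape((-1, 1))

# w is the row-standardized weights object from Step 2
model = ML_Error(y, X, w=w, name_y="y", name_x=["x1", "x2"])

print(model.summary)
```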

The output estimates:

regression coefficients (β)
spatial error coefficient (λ)
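
To give a sense of how β and λ can be estimated together, the sketch below implements a simple grid search over λ using the concentrated log-likelihood. It is a teaching illustration under stated assumptions, not a description of how spreg works internally: the function names `rook_weights` and `fit_sem_grid`, the simulated 15 x 15 grid, and the true values β = (1, 2) and λ = 0.6 are all made up for the example. For each candidate λ, the data are filtered by (I − λW), β is estimated by OLS on the filtered data, and the λ with the highest log-likelihood is kept.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized rook-adjacency matrix for a regular grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def fit_sem_grid(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid-search ML estimate of (lambda, beta) for y = X beta + u, u = lambda W u + eps."""
    n = len(y)
    identity = np.eye(n)
    best = None
    for lam in grid:
        A = identity - lam * W
        y_f, X_f = A @ y, A @ X                     # spatially filtered data
        beta, *_ = np.linalg.lstsq(X_f, y_f, rcond=None)
        resid = y_f - X_f @ beta
        sigma2 = resid @ resid / n
        _, logdet = np.linalg.slogdet(A)            # ln|I - lambda W|
        loglik = logdet - 0.5 * n * np.log(sigma2)  # constants dropped
        if best is None or loglik > best[0]:
            best = (loglik, lam, beta)
    return best[1], best[2]

# Simulate data from a known SEM (all values are assumptions for the demo).
rng = np.random.default_rng(42)
W = rook_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant + one predictor
true_beta, true_lam = np.array([1.0, 2.0]), 0.6
u = np.linalg.solve(np.eye(n) - true_lam * W, rng.normal(size=n))
y = X @ true_beta + u

lam_hat, beta_hat = fit_sem_grid(y, X, W)
print("lambda estimate:", lam_hat)
print("beta estimate:  ", beta_hat)
```

The estimates should land near the values used in the simulation. Unlike the spreg call above, this sketch needs the constant column added to X explicitly.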

8.7 Interpreting λ

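λ measures the strength of spatial autocorrelation in the error term.

λ≈0 → little spatial error dependence; the model is close to ordinary regression
large positive λ → strong residual clustering (neighbors have similar errors)
negative λ → neighboring errors tend to have opposite signs

The larger the absolute value of λ, the stronger the spatial structure in unexplained variation. With a row-standardized W, estimates of λ normally fall between -1 and 1.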
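
One way to read a value of λ is to ask what correlation it implies between the errors of two neighbors. The sketch below computes this from the model's implied covariance, Cov(u) ∝ (I − λW)⁻¹(I − λW)⁻ᵀ. The setup is an assumption chosen for simplicity: 50 locations on a ring, each with two neighbors, and a handful of example λ values.

```python
import numpy as np

n = 50  # hypothetical: locations arranged on a ring
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5   # left neighbor
    W[i, (i + 1) % n] = 0.5   # right neighbor (rows already sum to 1)

def neighbor_error_corr(lam):
    """Implied correlation between the errors at locations 0 and 1."""
    A_inv = np.linalg.inv(np.eye(n) - lam * W)
    cov = A_inv @ A_inv.T     # Cov(u) up to the factor sigma^2
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

for lam in (-0.6, 0.0, 0.3, 0.6, 0.9):
    print(lam, round(neighbor_error_corr(lam), 3))
```

In this particular layout the implied neighbor correlation tracks λ closely: zero at λ = 0, rising with positive λ, and negative for negative λ. Other neighbor structures give different numbers, so λ should not be read as a correlation in general.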

8.8 Summary

SEM models spatial autocorrelation in the error term
Used when unexplained variation is spatially structured
Spatially structured errors are often caused by missing or unmeasured variables
Requires defining neighbors and building a spatial weights matrix
λ measures strength of spatial error dependence
