9  Spatial Lag Models

9.1 SLM Formula

The equation for a Spatial Lag Model is:

\[ y = \rho W y + X\beta + \varepsilon \]

where:

  • (y) = outcome vector
  • (ρ) = spatial autoregressive coefficient (how much do neighbors influence you?)
  • (W) = spatial weights matrix (who is a neighbor of whom?)
  • (Wy) = spatially lagged y (a weighted average of neighbors’ outcomes)
  • (Xβ) = predictors and their coefficients, just as in OLS
  • (ε) = error term
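
A minimal numeric sketch of the Wy term (the toy values and the 3×3 weights here are made up for illustration):

import numpy as np

y = np.array([2.0, 4.0, 6.0])

# row-standardized weights: each area's two neighbors get weight 0.5
W = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])

Wy = W @ y  # each entry is the average of the other two areas' y values
print(Wy)   # [5. 4. 3.]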

9.2 Step 1: Define Neighbors

As with the spatial error model (SEM), an SLM begins by defining what counts as a neighbor.

For points:

  • distance thresholds
  • k nearest neighbors (used in the example in Step 2)

For polygons:

  • rook adjacency (shared border)
  • queen adjacency (shared border or corner)

This determines spatial relationships in the weights matrix W.
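
A hedged sketch of building contiguity-based neighbors with libpysal (the shapefile name is a placeholder; assumes polygon data):

import geopandas as gpd
from libpysal.weights import Rook, Queen

gdf = gpd.read_file("your_polygons.shp")

# rook: neighbors share a border segment
w_rook = Rook.from_dataframe(gdf)

# queen: neighbors share a border or a corner
w_queen = Queen.from_dataframe(gdf)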


9.3 Step 2: Create a Spatial Weights Matrix

After defining neighbors, we represent those relationships in a spatial weights matrix.

W tells the model:

  • which observations are spatially connected
  • how strongly they are connected

Example:

import geopandas as gpd
from libpysal.weights import KNN

# read the spatial data
gdf = gpd.read_file("your_shapefile.shp")

# define each observation's 4 nearest neighbors
w = KNN.from_dataframe(gdf, k=4)

# row-standardize: each row of W sums to 1,
# so the spatial lag Wy becomes an average of neighbors
w.transform = "R"

print(w)

9.4 Step 3: Create the Spatially Lagged Variable (Wy)

The key feature of SLM is the spatial lag of the dependent variable.

Wy = weighted average of neighboring y values
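
Given the weights matrix w from Step 2, libpysal computes this directly. A minimal sketch, assuming gdf has an outcome column named y (a hypothetical column name):

from libpysal.weights import lag_spatial

# Wy: for each observation, the weighted average of its
# neighbors' y values (an average, since W is row-standardized)
gdf["wy"] = lag_spatial(w, gdf["y"])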

Before turning to the spatial model itself, it helps to fit a non-spatial baseline. First, visualize the relationship between the variables, assuming a DataFrame df with predictor column x and outcome column y:

import matplotlib.pyplot as plt

plt.scatter(df["x"], df["y"])
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot of x vs y")
plt.show()

9.4.1 Fitting a Simple Linear Regression Model

We can use scikit-learn to fit an ordinary least squares (OLS) baseline.

from sklearn.linear_model import LinearRegression

X = df["x"].values.reshape(-1, 1)
y = df["y"].values

model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_
slope = model.coef_[0]

print("Intercept (a):", intercept)
print("Slope (b):", slope)

9.4.2 Plotting the Regression Line

y_pred = model.predict(X)

plt.scatter(df["x"], df["y"], label="Observed data")
plt.plot(df["x"], y_pred, color="red", label="Regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

9.5 Residuals

\[ \text{residual} = y_{\text{observed}} - y_{\text{predicted}} \]

df["y_pred"] = y_pred
df["residuals"] = df["y"] - df["y_pred"]

df

9.5.1 Visualizing Residuals

plt.scatter(df["x"], df["residuals"])
plt.axhline(0)
plt.xlabel("x")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()
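
A residual plot against x cannot show whether the residuals are spatially clustered. A hedged sketch of checking this with Moran's I from esda (assuming the weights w from Step 2 align row-for-row with df):

from esda.moran import Moran

# test the OLS residuals for spatial autocorrelation; a significant
# positive I suggests a spatial model such as the SLM is needed
mi = Moran(df["residuals"], w)
print("Moran's I:", mi.I)
print("pseudo p-value:", mi.p_sim)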

9.6 Interpreting ρ

ρ measures how strongly neighboring outcomes influence local outcomes.

  • ρ≈0 → little spatial dependence
  • positive ρ → nearby high values increase local values
  • large positive ρ → strong spillover effects
  • negative ρ → nearby high values lower local values

The larger the magnitude of ρ, the stronger the neighbor influence.
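
To see ρ estimated in practice, here is a minimal sketch using the maximum-likelihood spatial lag estimator in PySAL's spreg (assuming gdf, w, and the x and y columns from the earlier steps):

from spreg import ML_Lag

# spreg expects array inputs: an n x 1 outcome and an n x k predictor matrix
y = gdf[["y"]].values
X = gdf[["x"]].values

model = ML_Lag(y, X, w=w, name_y="y", name_x=["x"])

print("rho:", model.rho)
print(model.summary)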


9.7 Key Assumptions of the OLS Baseline

  1. Linearity
  2. Independence of errors
  3. Constant variance of errors
  4. Normally distributed errors

Spatial data often violate the independence assumption: neighboring errors and outcomes are correlated. The SLM addresses this by modeling neighbor influence directly through the ρWy term.

9.8 Summary

  • An SLM extends regression with a spatially lagged outcome: (y = ρWy + Xβ + ε)
  • Building it requires defining neighbors, constructing W, and computing Wy
  • A simple linear regression, (y = a + bx), serves as a non-spatial baseline
  • Residuals help evaluate model performance, and ρ captures the strength of neighbor spillover