9 Spatial Lag Models
9.1 SLM Formula
The equation for a Spatial Lag Model is:
\[ y = \rho W y + X\beta + \varepsilon \]
where:
- (y) = outcome vector
- (ρ) = spatial autoregressive coefficient (how much do neighbors influence you?)
- (W) = spatial weights matrix (who is a neighbor of whom?)
- (Wy) = spatially lagged y (a weighted average of neighbors’ outcomes)
- (Xβ) = predictors multiplied by their coefficients, as in ordinary OLS
- (ε) = error term
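Reading the formula one observation at a time makes the lag term concrete: with a row-standardized W (as in the example further below), the neighbors’ outcomes enter as a simple average.
\[ y_i = \rho \sum_{j} w_{ij} y_j + x_i \beta + \varepsilon_i \]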
9.2 Step 1: Define Neighbors
An SLM begins, just like an SEM, by defining what counts as a neighbor.
For points:
- distance thresholds
- k-nearest neighbors (used in the example below)
For polygons:
- rook adjacency (shared border)
- queen adjacency (shared border or corner)
This determines spatial relationships in the weights matrix W.
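As a minimal sketch (assuming a polygon GeoDataFrame loaded from a placeholder shapefile), contiguity-based neighbors can be built with libpysal:
import geopandas as gpd
from libpysal.weights import Queen, Rook
# load a polygon layer (placeholder filename)
gdf = gpd.read_file("your_shapefile.shp")
# rook adjacency: polygons that share a border segment
w_rook = Rook.from_dataframe(gdf)
# queen adjacency: polygons that share a border or just a corner
w_queen = Queen.from_dataframe(gdf)
# number of observations and average number of neighbors per observation
print(w_queen.n, w_queen.mean_neighbors)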
9.3 Step 2: Create a Spatial Weights Matrix
After defining neighbors, we represent those relationships in a spatial weights matrix.
W tells the model:
- which observations are spatially connected
- how strongly they are connected
Example:
import geopandas as gpd
from libpysal.weights import KNN
gdf = gpd.read_file("your_shapefile.shp")
# define 4 nearest neighbors
w = KNN.from_dataframe(gdf, k=4)
# row standardize
w.transform = "R"
print(w)
9.4 Step 3: Create the Spatially Lagged Variable (Wy)
The key feature of SLM is the spatial lag of the dependent variable.
Wy = weighted average of neighboring y values
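As a sketch of how the lag itself can be computed (assuming the gdf and row-standardized w from the previous step; the column name price is only a placeholder for the dependent variable), libpysal’s lag_spatial does the multiplication Wy:
from libpysal.weights import lag_spatial
# dependent variable as an array (placeholder column name)
y = gdf["price"].to_numpy()
# Wy: for each observation, the weighted average of its neighbors' y
wy = lag_spatial(w, y)
# store the lag alongside the original variable
gdf["w_price"] = wy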
Before fitting a regression model, it is important to visualize the relationship between the variables. Assuming a DataFrame df with columns x and y:
import matplotlib.pyplot as plt
plt.scatter(df["x"], df["y"])
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot of x vs y")
plt.show()
9.4.1 Fitting a Simple Linear Regression Model
We can use scikit-learn to fit a regression model.
from sklearn.linear_model import LinearRegression
X = df["x"].values.reshape(-1, 1)
y = df["y"].values
model = LinearRegression()
model.fit(X, y)
intercept = model.intercept_
slope = model.coef_[0]
print("Intercept (a):", intercept)
print("Slope (b):", slope)9.4.2 Plotting the Regression Line
y_pred = model.predict(X)
plt.scatter(df["x"], df["y"], label="Observed data")
plt.plot(df["x"], y_pred, color="red", label="Regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
9.5 Residuals
\[ \text{residual} = y_{\text{observed}} - y_{\text{predicted}} \]
df["y_pred"] = y_pred
df["residuals"] = df["y"] - df["y_pred"]
df
9.5.1 Visualizing Residuals
plt.scatter(df["x"], df["residuals"])
plt.axhline(0)
plt.xlabel("x")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()
9.6 Interpreting ρ
ρ measures how strongly neighboring outcomes influence local outcomes.
- ρ≈0 → little spatial dependence
- positive ρ → nearby high values increase local values
- large positive ρ → strong spillover effects
- negative ρ → nearby high values lower local values
The larger the magnitude of ρ, the stronger the influence of neighboring outcomes.
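As a sketch of how ρ is estimated in practice (assuming the gdf and weights w from the earlier steps, with placeholder column names), spreg can fit the SLM by maximum likelihood:
from spreg import ML_Lag
# dependent variable (n x 1) and predictors (n x k); column names are placeholders
y = gdf[["price"]].to_numpy()
X = gdf[["rooms", "dist_center"]].to_numpy()
# maximum-likelihood spatial lag model using the row-standardized weights
slm = ML_Lag(y, X, w=w, name_y="price", name_x=["rooms", "dist_center"])
# rho (the spatial autoregressive coefficient) is reported with the other estimates
print(slm.summary)
print("rho:", slm.rho)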
9.7 Key Assumptions of SLR
- Linearity
- Independence of errors
- Constant variance of errors
- Normally distributed errors
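As a sketch (assuming the df with a residuals column from above and that scipy is available), the normality assumption can be inspected with a Q-Q plot and tested formally:
import matplotlib.pyplot as plt
from scipy import stats
# Q-Q plot: points close to the line suggest roughly normal residuals
stats.probplot(df["residuals"], dist="norm", plot=plt)
plt.title("Q-Q Plot of Residuals")
plt.show()
# Shapiro-Wilk test: a small p-value flags a departure from normality
stat, p_value = stats.shapiro(df["residuals"])
print("Shapiro-Wilk p-value:", p_value)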
9.8 Summary
- SLR models the relationship between one predictor and one response
- It fits a straight line using (y = a + bx)
- Residuals help evaluate model performance