9  Spatial Lag Models

9.1 SLM Formula

The equation for a Spatial Lag Model is:

\[ y = \rho W y + X\beta + \varepsilon \]

where:

  • (y) = outcome vector
  • (ρ) = spatial autoregressive coefficient (how much do neighbors influence you?)
  • (W) = spatial weights matrix (who is a neighbor of whom?)
  • (Wy) = spatially lagged y (a weighted average of neighbors’ outcomes)
  • (Xβ) = predictors and their coefficients, just as in OLS
  • (ε) = error term
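
A minimal numeric sketch of the Wy term (the toy values and the 3×3 weights here are made up for illustration):

import numpy as np

y = np.array([2.0, 4.0, 6.0])

# row-standardized weights: each area's two neighbors get weight 0.5
W = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])

Wy = W @ y  # each entry is the average of the other two areas' y values
print(Wy)   # [5. 4. 3.]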

9.2 Step 1: Define Neighbors

As with the spatial error model (SEM), an SLM begins by defining what counts as a neighbor.

For points:

  • distance thresholds
  • k nearest neighbors (used in the example in Step 2)

For polygons:

  • rook adjacency (shared border)
  • queen adjacency (shared border or corner)

This determines spatial relationships in the weights matrix W.
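
A hedged sketch of building contiguity-based neighbors with libpysal (the shapefile name is a placeholder; assumes polygon data):

import geopandas as gpd
from libpysal.weights import Rook, Queen

gdf = gpd.read_file("your_polygons.shp")

# rook: neighbors share a border segment
w_rook = Rook.from_dataframe(gdf)

# queen: neighbors share a border or a corner
w_queen = Queen.from_dataframe(gdf)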


9.3 Step 2: Create a Spatial Weights Matrix

After defining neighbors, we represent those relationships in a spatial weights matrix.

W tells the model:

  • which observations are spatially connected
  • how strongly they are connected

Example:

import geopandas as gpd
from libpysal.weights import KNN

# read the spatial data
gdf = gpd.read_file("your_shapefile.shp")

# define each observation's 4 nearest neighbors
w = KNN.from_dataframe(gdf, k=4)

# row-standardize: each row of W sums to 1,
# so the spatial lag Wy becomes an average of neighbors
w.transform = "R"

print(w)

9.4 Step 3: Create the Spatially Lagged Variable (Wy)

The key feature of SLM is the spatial lag of the dependent variable.

Wy = weighted average of neighboring y values
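
Given the weights matrix w from Step 2, libpysal computes this directly. A minimal sketch, assuming gdf has an outcome column named y (a hypothetical column name):

from libpysal.weights import lag_spatial

# Wy: for each observation, the weighted average of its
# neighbors' y values (an average, since W is row-standardized)
gdf["wy"] = lag_spatial(w, gdf["y"])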

Before turning to the spatial model itself, it helps to fit a non-spatial baseline. First, visualize the relationship between the variables, assuming a DataFrame df with predictor column x and outcome column y:

import matplotlib.pyplot as plt

plt.scatter(df["x"], df["y"])
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot of x vs y")
plt.show()

9.4.1 Fitting a Simple Linear Regression Model

We can use scikit-learn to fit an ordinary least squares (OLS) baseline.

from sklearn.linear_model import LinearRegression

X = df["x"].values.reshape(-1, 1)
y = df["y"].values

model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_
slope = model.coef_[0]

print("Intercept (a):", intercept)
print("Slope (b):", slope)

9.4.2 Plotting the Regression Line

y_pred = model.predict(X)

plt.scatter(df["x"], df["y"], label="Observed data")
plt.plot(df["x"], y_pred, color="red", label="Regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

9.5 Residuals

\[ \text{residual} = y_{\text{observed}} - y_{\text{predicted}} \]

df["y_pred"] = y_pred
df["residuals"] = df["y"] - df["y_pred"]

df

9.5.1 Visualizing Residuals

plt.scatter(df["x"], df["residuals"])
plt.axhline(0)
plt.xlabel("x")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()
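
A residual plot against x cannot show whether the residuals are spatially clustered. A hedged sketch of checking this with Moran's I from esda (assuming the weights w from Step 2 align row-for-row with df):

from esda.moran import Moran

# test the OLS residuals for spatial autocorrelation; a significant
# positive I suggests a spatial model such as the SLM is needed
mi = Moran(df["residuals"], w)
print("Moran's I:", mi.I)
print("pseudo p-value:", mi.p_sim)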

9.6 Interpreting ρ

ρ measures how strongly neighboring outcomes influence local outcomes.

  • ρ≈0 → little spatial dependence
  • positive ρ → nearby high values increase local values
  • large positive ρ → strong spillover effects
  • negative ρ → nearby high values lower local values

The larger the magnitude of ρ, the stronger the neighbor influence.
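
To see ρ estimated in practice, here is a minimal sketch using the maximum-likelihood spatial lag estimator in PySAL's spreg (assuming gdf, w, and the x and y columns from the earlier steps):

from spreg import ML_Lag

# spreg expects array inputs: an n x 1 outcome and an n x k predictor matrix
y = gdf[["y"]].values
X = gdf[["x"]].values

model = ML_Lag(y, X, w=w, name_y="y", name_x=["x"])

print("rho:", model.rho)
print(model.summary)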


9.7 Key Assumptions of the OLS Baseline

  1. Linearity
  2. Independence of errors
  3. Constant variance of errors
  4. Normally distributed errors

Spatial data often violate the independence assumption: neighboring errors and outcomes are correlated. The SLM addresses this by modeling neighbor influence directly through the ρWy term.

9.8 Summary

  • An SLM extends regression with a spatially lagged outcome: (y = ρWy + Xβ + ε)
  • Building it requires defining neighbors, constructing W, and computing Wy
  • A simple linear regression, (y = a + bx), serves as a non-spatial baseline
  • Residuals help evaluate model performance, and ρ captures the strength of neighbor spillover