::install_github("LukeCe/spflow") devtools
In-Class Exercise 5: Spatial Econometric Interaction Models
1. Overview
This exercise is a continuation of in-class exercises 3 and 4.
2. Getting Started
Install development version of the spflow
package:
We will use the following R packages in this exercise:
spflow
for spatial econometric interaction modelling
::p_load(spflow, tmap, sf, spdep, sp,
pacman Matrix, reshape2, tidyverse, readr)
3. The Data
Three datasets are required before Spatial Econometric Interaction Models can be calibrated: 1. spatial weights 2. tibble data.frame with origin, destination, flows and distances between origin and destination 3. tibble data.frame with explanatory variables
We will load the datasets into the R environment using the following codes:
# spatial weights and neighbour list
<- read_rds("data/rds/mpsz_nb.rds")
mpsz_nb head(mpsz_nb)
$by_contiguity
Neighbour list object:
Number of regions: 313
Number of nonzero links: 1902
Percentage nonzero weights: 1.94143
Average number of links: 6.076677
$by_distance
Neighbour list object:
Number of regions: 313
Number of nonzero links: 15422
Percentage nonzero weights: 15.74171
Average number of links: 49.27157
1 region with no links:
313
2 disjoint connected subgraphs
$by_knn
Neighbour list object:
Number of regions: 313
Number of nonzero links: 939
Percentage nonzero weights: 0.9584665
Average number of links: 3
Non-symmetric neighbours list
We can observe that one area (313) has no neighbour if we use fixed distance.
# origin, destination, flow and distance
<- read_rds("data/rds/mpsz_flow.rds")
mpsz_flow glimpse(mpsz_flow)
Rows: 97,969
Columns: 4
$ ORIGIN_SZ <chr> "RVSZ05", "SRSZ01", "MUSZ02", "MPSZ05", "SISZ01", "BMSZ17", …
$ DESTIN_SZ <chr> "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", …
$ DISTANCE <dbl> 0.0000, 305.7370, 951.8314, 5254.0664, 4975.0021, 3176.1592,…
$ TRIPS <dbl> 67, 549, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, …
# explanatory variables
<- read_rds("data/rds/mpsz_var.rds")
mpsz_var glimpse(mpsz_var)
Rows: 313
Columns: 14
$ SZ_NAME <chr> "INSTITUTION HILL", "ROBERTSON QUAY", "FORT CANNING", "…
$ SZ_CODE <chr> "RVSZ05", "SRSZ01", "MUSZ02", "MPSZ05", "SISZ01", "BMSZ…
$ BUSSTOP_COUNT <int> 2, 10, 6, 2, 1, 10, 5, 4, 21, 11, 2, 9, 6, 1, 4, 7, 24,…
$ AGE7_12 <dbl> 330, 320, 0, 0, 200, 0, 0, 0, 350, 470, 0, 300, 390, 0,…
$ AGE13_24 <dbl> 360, 350, 10, 0, 260, 0, 0, 0, 460, 1160, 0, 760, 890, …
$ AGE25_64 <dbl> 2260, 2200, 30, 0, 1440, 0, 0, 0, 2600, 6260, 630, 4350…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((28481.45 30..., MULTIPOLYG…
$ SCHOOL_COUNT <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 0, 0, 0, 1, 0, 0…
$ BUSINESS_COUNT <int> 6, 4, 7, 0, 1, 11, 15, 1, 10, 1, 17, 6, 0, 0, 51, 2, 3,…
$ RETAILS_COUNT <int> 26, 207, 17, 0, 84, 14, 52, 0, 460, 34, 263, 55, 37, 1,…
$ FINSERV_COUNT <int> 3, 18, 0, 0, 29, 4, 6, 0, 34, 4, 26, 4, 3, 6, 59, 3, 8,…
$ ENTERTN_COUNT <int> 0, 6, 3, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0…
$ FB_COUNT <int> 4, 38, 4, 0, 38, 15, 5, 0, 20, 0, 9, 25, 0, 0, 9, 1, 3,…
$ LR_COUNT <int> 3, 11, 7, 0, 20, 0, 0, 0, 19, 2, 4, 4, 1, 1, 13, 0, 17,…
4. Creating spflow_network-class
Objects
spflow_network-class
is a S4 class that contains all information on a spatial network that is composed by a set of nodes linked by some neighbourhood relation. It can be created by using spflow_network
from spflow
package.
We will use the following code to create a sp flow network class using contiguity based neighbourhood structure:
<- spflow_network(
mpsz_net # we assign "sg" as an id
id_net = "sg",
node_neighborhood = nb2mat(mpsz_nb$by_contiguity),
node_data = mpsz_var,
node_key_column = "SZ_CODE")
mpsz_net
Spatial network nodes with id: sg
--------------------------------------------------
Number of nodes: 313
Average number of links per node: 6.077
Density of the neighborhood matrix: 1.94% (non-zero connections)
Data on nodes:
SZ_NAME SZ_CODE BUSSTOP_COUNT AGE7_12 AGE13_24 AGE25_64
1 INSTITUTION HILL RVSZ05 2 330 360 2260
2 ROBERTSON QUAY SRSZ01 10 320 350 2200
3 FORT CANNING MUSZ02 6 0 10 30
4 MARINA EAST (MP) MPSZ05 2 0 0 0
5 SENTOSA SISZ01 1 200 260 1440
6 CITY TERMINALS BMSZ17 10 0 0 0
--- --- --- --- --- --- ---
308 NEE SOON YSSZ07 12 90 140 590
309 UPPER THOMSON BSSZ01 47 1590 3660 15980
310 SHANGRI-LA AMSZ05 12 810 1920 9650
311 TOWNSVILLE AMSZ04 9 980 2000 11320
312 MARYMOUNT BSSZ02 25 1610 4060 16860
313 TUAS VIEW EXTENSION TSSZ06 11 0 0 0
SCHOOL_COUNT BUSINESS_COUNT RETAILS_COUNT FINSERV_COUNT ENTERTN_COUNT
1 1 6 26 3 0
2 0 4 207 18 6
3 0 7 17 0 3
4 0 0 0 0 0
5 0 1 84 29 2
6 0 11 14 4 0
--- --- --- --- --- ---
308 0 0 7 0 0
309 3 21 305 30 0
310 3 0 53 9 0
311 1 0 83 11 0
312 3 19 135 8 0
313 0 53 3 1 0
FB_COUNT LR_COUNT COORD_X COORD_Y
1 4 3 103.84 1.29
2 38 11 103.84 1.29
3 4 7 103.85 1.29
4 0 0 103.88 1.29
5 38 20 103.83 1.25
6 15 0 103.85 1.26
--- --- --- --- ---
308 0 0 103.81 1.4
309 5 11 103.83 1.36
310 0 0 103.84 1.37
311 1 1 103.85 1.36
312 3 11 103.84 1.35
313 0 0 103.61 1.26
We will use the spflow_network_pair()
to hold information on origin-destination (OD) pairs:
<- spflow_network_pair(
mpsz_net_pairs id_orig_net = "sg",
id_dest_net = "sg",
pair_data = mpsz_flow,
orig_key_column = "ORIGIN_SZ",
dest_key_column = "DESTIN_SZ")
mpsz_net_pairs
Spatial network pair with id: sg_sg
--------------------------------------------------
Origin network id: sg (with 313 nodes)
Destination network id: sg (with 313 nodes)
Number of pairs: 97969
Completeness of pairs: 100.00% (97969/97969)
Data on node-pairs:
DESTIN_SZ ORIGIN_SZ DISTANCE TRIPS
1 RVSZ05 RVSZ05 0 67
314 SRSZ01 RVSZ05 305.74 251
627 MUSZ02 RVSZ05 951.83 0
940 MPSZ05 RVSZ05 5254.07 0
1253 SISZ01 RVSZ05 4975 0
1566 BMSZ17 RVSZ05 3176.16 0
--- --- --- --- ---
96404 YSSZ07 TSSZ06 26972.97 0
96717 BSSZ01 TSSZ06 25582.48 0
97030 AMSZ05 TSSZ06 26714.79 0
97343 AMSZ04 TSSZ06 27572.74 0
97656 BSSZ02 TSSZ06 26681.7 0
97969 TSSZ06 TSSZ06 0 270
Lastly, we will use spflow_network_multi()
to combine the network class and pair class to contain information on the node and node-pairs:
<- spflow_network_multi(
mpsz_multi_net
mpsz_net,mpsz_net_pairs)
mpsz_multi_net
Collection of spatial network nodes and pairs
--------------------------------------------------
Contains 1 spatial network nodes
With id : sg
Contains 1 spatial network pairs
With id : sg_sg
Availability of origin-destination pair information:
ID_ORIG_NET ID_DEST_NET ID_NET_PAIR COMPLETENESS C_PAIRS C_ORIG C_DEST
sg sg sg_sg 100.00% 97969/97969 313/313 313/313
5. Correlation Analysis
When building explanatory models, it is important to check for multicollinearity to avoid including variables that are highly correlated to each other.
We will use the following functions from the spflow package o check for collinearity: 1. pair_cor()
to create a correlation matrix 2. cor_image()
to plot the correlation matrix as a correlogram
<- log(1 + TRIPS) ~
cor_formula +
BUSSTOP_COUNT +
AGE7_12 +
AGE13_24 +
AGE25_64 +
SCHOOL_COUNT+
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT P_(log(DISTANCE + 1))
<- pair_cor(
cor_mat
mpsz_multi_net,spflow_formula = cor_formula,
add_lags_x = FALSE
)
colnames(cor_mat) <- paste0(
substr(
colnames(cor_mat),1,3),
"...")
cor_image(cor_mat)
6. Model Calibration
6.1. Base Model
We will calibrate a base model using the following code:
<- spflow(
base_model spflow_formula = log(1 + TRIPS) ~
O_(BUSSTOP_COUNT +
+
AGE25_64) D_(SCHOOL_COUNT +
+
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT) P_(log(DISTANCE + 1)),
spflow_networks = mpsz_multi_net)
base_model
--------------------------------------------------
Spatial interaction model estimated by: MLE
Spatial correlation structure: SDM (model_9)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
rho_d 0.680 0.004 192.554 0.000
rho_o 0.678 0.004 187.733 0.000
rho_w -0.396 0.006 -65.592 0.000
(Intercept) 0.410 0.065 6.267 0.000
(Intra) 1.312 0.081 16.262 0.000
D_SCHOOL_COUNT 0.017 0.002 7.885 0.000
D_SCHOOL_COUNT.lag1 0.002 0.004 0.552 0.581
D_BUSINESS_COUNT 0.000 0.000 3.015 0.003
D_BUSINESS_COUNT.lag1 0.000 0.000 -0.249 0.804
D_RETAILS_COUNT 0.000 0.000 -0.306 0.759
D_RETAILS_COUNT.lag1 0.000 0.000 0.152 0.880
D_FINSERV_COUNT 0.002 0.000 6.787 0.000
D_FINSERV_COUNT.lag1 -0.002 0.001 -3.767 0.000
O_BUSSTOP_COUNT 0.002 0.000 6.807 0.000
O_BUSSTOP_COUNT.lag1 -0.001 0.000 -2.364 0.018
O_AGE25_64 0.000 0.000 7.336 0.000
O_AGE25_64.lag1 0.000 0.000 -2.797 0.005
P_log(DISTANCE + 1) -0.050 0.007 -6.794 0.000
--------------------------------------------------
R2_corr: 0.6942948
Observations: 97969
Model coherence: Validated
6.2. Residual Diagnostics
<- par(mfrow = c(1,3),
old_par mar = c(2,2,2,2))
spflow_moran_plots(base_model)
par(old_par)
Closer to the zero line means lesser spatial autocorrelation.
Next, we use pair_cor()
to inspect the relationship of the residual and explanatory variables:
<- pair_cor(base_model)
corr_residual colnames(corr_residual) <- substr(colnames(corr_residual),1,3)
cor_image(corr_residual)
6.3. Model Control
<- log(1 + TRIPS) ~
spflow_formula O_(BUSSTOP_COUNT +
+
AGE25_64) D_(SCHOOL_COUNT +
+
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT) P_(log(DISTANCE + 1))
<- spflow_control(
model_control estimation_method = "mle",
model = "model_8")
<- spflow(
mle_model8
spflow_formula,spflow_networks = mpsz_multi_net,
estimation_control = model_control
)
mle_model8
--------------------------------------------------
Spatial interaction model estimated by: MLE
Spatial correlation structure: SDM (model_8)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
rho_d 0.689 0.003 196.832 0.000
rho_o 0.687 0.004 192.214 0.000
rho_w -0.473 0.003 -142.469 0.000
(Intercept) 1.086 0.049 22.275 0.000
(Intra) 0.840 0.075 11.255 0.000
D_SCHOOL_COUNT 0.019 0.002 8.896 0.000
D_SCHOOL_COUNT.lag1 0.019 0.004 5.130 0.000
D_BUSINESS_COUNT 0.000 0.000 3.328 0.001
D_BUSINESS_COUNT.lag1 0.000 0.000 1.664 0.096
D_RETAILS_COUNT 0.000 0.000 -0.414 0.679
D_RETAILS_COUNT.lag1 0.000 0.000 -0.171 0.864
D_FINSERV_COUNT 0.002 0.000 6.150 0.000
D_FINSERV_COUNT.lag1 -0.003 0.001 -4.601 0.000
O_BUSSTOP_COUNT 0.003 0.000 7.676 0.000
O_BUSSTOP_COUNT.lag1 0.000 0.000 0.552 0.581
O_AGE25_64 0.000 0.000 6.870 0.000
O_AGE25_64.lag1 0.000 0.000 -0.462 0.644
P_log(DISTANCE + 1) -0.125 0.005 -22.865 0.000
--------------------------------------------------
R2_corr: 0.6965974
Observations: 97969
Model coherence: Validated
Model 8 brings out intrazonal flows to examine its influence on commuter flows.
<- par(mfrow = c(1,3),
old_par mar = c(2,2,2,2))
spflow_moran_plots(mle_model8)
par(mle_model8)
NULL