In-Class Exercise 5: Spatial Econometric Interaction Models

Published

December 16, 2023

Modified

December 17, 2023

1. Overview

This exercise is a continuation of in-class exercises 3 and 4.

2. Getting Started

Install development version of the spflow package:

devtools::install_github("LukeCe/spflow")

We will use the following R packages in this exercise:

  • spflow for spatial econometric interaction modelling
pacman::p_load(spflow, tmap, sf, spdep, sp,
               Matrix, reshape2, tidyverse, readr)

3. The Data

Three datasets are required before Spatial Econometric Interaction Models can be calibrated: 1. spatial weights 2. tibble data.frame with origin, destination, flows and distances between origin and destination 3. tibble data.frame with explanatory variables

We will load the datasets into the R environment using the following codes:

# spatial weights and neighbour list
mpsz_nb <- read_rds("data/rds/mpsz_nb.rds")
head(mpsz_nb)
$by_contiguity
Neighbour list object:
Number of regions: 313 
Number of nonzero links: 1902 
Percentage nonzero weights: 1.94143 
Average number of links: 6.076677 

$by_distance
Neighbour list object:
Number of regions: 313 
Number of nonzero links: 15422 
Percentage nonzero weights: 15.74171 
Average number of links: 49.27157 
1 region with no links:
313
2 disjoint connected subgraphs

$by_knn
Neighbour list object:
Number of regions: 313 
Number of nonzero links: 939 
Percentage nonzero weights: 0.9584665 
Average number of links: 3 
Non-symmetric neighbours list

We can observe that one area (313) has no neighbour if we use fixed distance.

# origin, destination, flow and distance
mpsz_flow <- read_rds("data/rds/mpsz_flow.rds")
glimpse(mpsz_flow)
Rows: 97,969
Columns: 4
$ ORIGIN_SZ <chr> "RVSZ05", "SRSZ01", "MUSZ02", "MPSZ05", "SISZ01", "BMSZ17", …
$ DESTIN_SZ <chr> "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", "RVSZ05", …
$ DISTANCE  <dbl> 0.0000, 305.7370, 951.8314, 5254.0664, 4975.0021, 3176.1592,…
$ TRIPS     <dbl> 67, 549, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, …
# explanatory variables
mpsz_var <- read_rds("data/rds/mpsz_var.rds")
glimpse(mpsz_var)
Rows: 313
Columns: 14
$ SZ_NAME        <chr> "INSTITUTION HILL", "ROBERTSON QUAY", "FORT CANNING", "…
$ SZ_CODE        <chr> "RVSZ05", "SRSZ01", "MUSZ02", "MPSZ05", "SISZ01", "BMSZ…
$ BUSSTOP_COUNT  <int> 2, 10, 6, 2, 1, 10, 5, 4, 21, 11, 2, 9, 6, 1, 4, 7, 24,…
$ AGE7_12        <dbl> 330, 320, 0, 0, 200, 0, 0, 0, 350, 470, 0, 300, 390, 0,…
$ AGE13_24       <dbl> 360, 350, 10, 0, 260, 0, 0, 0, 460, 1160, 0, 760, 890, …
$ AGE25_64       <dbl> 2260, 2200, 30, 0, 1440, 0, 0, 0, 2600, 6260, 630, 4350…
$ geometry       <MULTIPOLYGON [m]> MULTIPOLYGON (((28481.45 30..., MULTIPOLYG…
$ SCHOOL_COUNT   <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 0, 0, 0, 1, 0, 0…
$ BUSINESS_COUNT <int> 6, 4, 7, 0, 1, 11, 15, 1, 10, 1, 17, 6, 0, 0, 51, 2, 3,…
$ RETAILS_COUNT  <int> 26, 207, 17, 0, 84, 14, 52, 0, 460, 34, 263, 55, 37, 1,…
$ FINSERV_COUNT  <int> 3, 18, 0, 0, 29, 4, 6, 0, 34, 4, 26, 4, 3, 6, 59, 3, 8,…
$ ENTERTN_COUNT  <int> 0, 6, 3, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0…
$ FB_COUNT       <int> 4, 38, 4, 0, 38, 15, 5, 0, 20, 0, 9, 25, 0, 0, 9, 1, 3,…
$ LR_COUNT       <int> 3, 11, 7, 0, 20, 0, 0, 0, 19, 2, 4, 4, 1, 1, 13, 0, 17,…

4. Creating spflow_network-class Objects

spflow_network-class is a S4 class that contains all information on a spatial network that is composed by a set of nodes linked by some neighbourhood relation. It can be created by using spflow_network from spflow package.

We will use the following code to create a sp flow network class using contiguity based neighbourhood structure:

mpsz_net <- spflow_network(
  # we assign "sg" as an id 
  id_net = "sg",
  node_neighborhood = nb2mat(mpsz_nb$by_contiguity),
  node_data = mpsz_var,
  node_key_column = "SZ_CODE")

mpsz_net
Spatial network nodes with id: sg
--------------------------------------------------
Number of nodes: 313
Average number of links per node: 6.077
Density of the neighborhood matrix: 1.94% (non-zero connections)

Data on nodes:
                SZ_NAME SZ_CODE BUSSTOP_COUNT AGE7_12 AGE13_24 AGE25_64
1      INSTITUTION HILL  RVSZ05             2     330      360     2260
2        ROBERTSON QUAY  SRSZ01            10     320      350     2200
3          FORT CANNING  MUSZ02             6       0       10       30
4      MARINA EAST (MP)  MPSZ05             2       0        0        0
5               SENTOSA  SISZ01             1     200      260     1440
6        CITY TERMINALS  BMSZ17            10       0        0        0
---                 ---     ---           ---     ---      ---      ---
308            NEE SOON  YSSZ07            12      90      140      590
309       UPPER THOMSON  BSSZ01            47    1590     3660    15980
310          SHANGRI-LA  AMSZ05            12     810     1920     9650
311          TOWNSVILLE  AMSZ04             9     980     2000    11320
312           MARYMOUNT  BSSZ02            25    1610     4060    16860
313 TUAS VIEW EXTENSION  TSSZ06            11       0        0        0
    SCHOOL_COUNT BUSINESS_COUNT RETAILS_COUNT FINSERV_COUNT ENTERTN_COUNT
1              1              6            26             3             0
2              0              4           207            18             6
3              0              7            17             0             3
4              0              0             0             0             0
5              0              1            84            29             2
6              0             11            14             4             0
---          ---            ---           ---           ---           ---
308            0              0             7             0             0
309            3             21           305            30             0
310            3              0            53             9             0
311            1              0            83            11             0
312            3             19           135             8             0
313            0             53             3             1             0
    FB_COUNT LR_COUNT COORD_X COORD_Y
1          4        3  103.84    1.29
2         38       11  103.84    1.29
3          4        7  103.85    1.29
4          0        0  103.88    1.29
5         38       20  103.83    1.25
6         15        0  103.85    1.26
---      ---      ---     ---     ---
308        0        0  103.81     1.4
309        5       11  103.83    1.36
310        0        0  103.84    1.37
311        1        1  103.85    1.36
312        3       11  103.84    1.35
313        0        0  103.61    1.26

We will use the spflow_network_pair() to hold information on origin-destination (OD) pairs:

mpsz_net_pairs <- spflow_network_pair(
  id_orig_net = "sg",
  id_dest_net = "sg",
  pair_data = mpsz_flow,
  orig_key_column = "ORIGIN_SZ",
  dest_key_column = "DESTIN_SZ")

mpsz_net_pairs
Spatial network pair with id: sg_sg
--------------------------------------------------
Origin network id: sg (with 313 nodes)
Destination network id: sg (with 313 nodes)
Number of pairs: 97969
Completeness of pairs: 100.00% (97969/97969)

Data on node-pairs:
      DESTIN_SZ ORIGIN_SZ DISTANCE TRIPS
1        RVSZ05    RVSZ05        0    67
314      SRSZ01    RVSZ05   305.74   251
627      MUSZ02    RVSZ05   951.83     0
940      MPSZ05    RVSZ05  5254.07     0
1253     SISZ01    RVSZ05     4975     0
1566     BMSZ17    RVSZ05  3176.16     0
---         ---       ---      ---   ---
96404    YSSZ07    TSSZ06 26972.97     0
96717    BSSZ01    TSSZ06 25582.48     0
97030    AMSZ05    TSSZ06 26714.79     0
97343    AMSZ04    TSSZ06 27572.74     0
97656    BSSZ02    TSSZ06  26681.7     0
97969    TSSZ06    TSSZ06        0   270

Lastly, we will use spflow_network_multi() to combine the network class and pair class to contain information on the node and node-pairs:

mpsz_multi_net <- spflow_network_multi(
  mpsz_net,mpsz_net_pairs)

mpsz_multi_net
Collection of spatial network nodes and pairs
--------------------------------------------------
Contains 1 spatial network nodes  
    With id :  sg
Contains 1 spatial network pairs  
    With id :  sg_sg

Availability of origin-destination pair information:

 ID_ORIG_NET ID_DEST_NET ID_NET_PAIR COMPLETENESS     C_PAIRS  C_ORIG  C_DEST
          sg          sg       sg_sg      100.00% 97969/97969 313/313 313/313

5. Correlation Analysis

When building explanatory models, it is important to check for multicollinearity to avoid including variables that are highly correlated to each other.

We will use the following functions from the spflow package o check for collinearity: 1. pair_cor() to create a correlation matrix 2. cor_image() to plot the correlation matrix as a correlogram

cor_formula <- log(1 + TRIPS) ~
  BUSSTOP_COUNT +
  AGE7_12 +
  AGE13_24 +
  AGE25_64 +
  SCHOOL_COUNT+
  BUSINESS_COUNT +
  RETAILS_COUNT +
  FINSERV_COUNT +
  P_(log(DISTANCE + 1))

cor_mat <- pair_cor(
  mpsz_multi_net,
  spflow_formula = cor_formula,
  add_lags_x = FALSE
)

colnames(cor_mat) <- paste0(
  substr(
    colnames(cor_mat),1,3),
  "...")

cor_image(cor_mat)

6. Model Calibration

6.1. Base Model

We will calibrate a base model using the following code:

base_model <- spflow(
  spflow_formula = log(1 + TRIPS) ~
    O_(BUSSTOP_COUNT +
         AGE25_64) +
    D_(SCHOOL_COUNT +
         BUSINESS_COUNT +
         RETAILS_COUNT +
         FINSERV_COUNT) +
    P_(log(DISTANCE + 1)),
  spflow_networks = mpsz_multi_net)

base_model
--------------------------------------------------
Spatial interaction model estimated by: MLE  
Spatial correlation structure: SDM (model_9)
Dependent variable: log(1 + TRIPS)

--------------------------------------------------
Coefficients:
                          est     sd   t.stat  p.val
rho_d                   0.680  0.004  192.554  0.000
rho_o                   0.678  0.004  187.733  0.000
rho_w                  -0.396  0.006  -65.592  0.000
(Intercept)             0.410  0.065    6.267  0.000
(Intra)                 1.312  0.081   16.262  0.000
D_SCHOOL_COUNT          0.017  0.002    7.885  0.000
D_SCHOOL_COUNT.lag1     0.002  0.004    0.552  0.581
D_BUSINESS_COUNT        0.000  0.000    3.015  0.003
D_BUSINESS_COUNT.lag1   0.000  0.000   -0.249  0.804
D_RETAILS_COUNT         0.000  0.000   -0.306  0.759
D_RETAILS_COUNT.lag1    0.000  0.000    0.152  0.880
D_FINSERV_COUNT         0.002  0.000    6.787  0.000
D_FINSERV_COUNT.lag1   -0.002  0.001   -3.767  0.000
O_BUSSTOP_COUNT         0.002  0.000    6.807  0.000
O_BUSSTOP_COUNT.lag1   -0.001  0.000   -2.364  0.018
O_AGE25_64              0.000  0.000    7.336  0.000
O_AGE25_64.lag1         0.000  0.000   -2.797  0.005
P_log(DISTANCE + 1)    -0.050  0.007   -6.794  0.000

--------------------------------------------------
R2_corr: 0.6942948  
Observations: 97969  
Model coherence: Validated

6.2. Residual Diagnostics

old_par <- par(mfrow = c(1,3),
               mar = c(2,2,2,2))
spflow_moran_plots(base_model)

par(old_par)

Closer to the zero line means lesser spatial autocorrelation.

Next, we use pair_cor() to inspect the relationship of the residual and explanatory variables:

corr_residual <- pair_cor(base_model)
colnames(corr_residual) <- substr(colnames(corr_residual),1,3)
cor_image(corr_residual)

6.3. Model Control

spflow_formula <- log(1 + TRIPS) ~
  O_(BUSSTOP_COUNT +
         AGE25_64) +
    D_(SCHOOL_COUNT +
         BUSINESS_COUNT +
         RETAILS_COUNT +
         FINSERV_COUNT) +
    P_(log(DISTANCE + 1))

model_control <- spflow_control(
  estimation_method = "mle",
  model = "model_8")

mle_model8 <- spflow(
  spflow_formula,
  spflow_networks = mpsz_multi_net,
  estimation_control = model_control
)
  
mle_model8
--------------------------------------------------
Spatial interaction model estimated by: MLE  
Spatial correlation structure: SDM (model_8)
Dependent variable: log(1 + TRIPS)

--------------------------------------------------
Coefficients:
                          est     sd    t.stat  p.val
rho_d                   0.689  0.003   196.832  0.000
rho_o                   0.687  0.004   192.214  0.000
rho_w                  -0.473  0.003  -142.469  0.000
(Intercept)             1.086  0.049    22.275  0.000
(Intra)                 0.840  0.075    11.255  0.000
D_SCHOOL_COUNT          0.019  0.002     8.896  0.000
D_SCHOOL_COUNT.lag1     0.019  0.004     5.130  0.000
D_BUSINESS_COUNT        0.000  0.000     3.328  0.001
D_BUSINESS_COUNT.lag1   0.000  0.000     1.664  0.096
D_RETAILS_COUNT         0.000  0.000    -0.414  0.679
D_RETAILS_COUNT.lag1    0.000  0.000    -0.171  0.864
D_FINSERV_COUNT         0.002  0.000     6.150  0.000
D_FINSERV_COUNT.lag1   -0.003  0.001    -4.601  0.000
O_BUSSTOP_COUNT         0.003  0.000     7.676  0.000
O_BUSSTOP_COUNT.lag1    0.000  0.000     0.552  0.581
O_AGE25_64              0.000  0.000     6.870  0.000
O_AGE25_64.lag1         0.000  0.000    -0.462  0.644
P_log(DISTANCE + 1)    -0.125  0.005   -22.865  0.000

--------------------------------------------------
R2_corr: 0.6965974  
Observations: 97969  
Model coherence: Validated

Model 8 brings out intrazonal flows to examine its influence on commuter flows.

old_par <- par(mfrow = c(1,3),
               mar = c(2,2,2,2))
spflow_moran_plots(mle_model8)

par(mle_model8)
NULL