05.
Modeling
Open Space Accessibility in New York City
2021, Public Health
Tools: QGIS, R
“Access” by definition refers to a means of approaching or entering a place, or to the act of obtaining, examining, or retrieving something. In urban planning and public health, the power to define access is also the power to control access. The interpretation of accessibility and the operationalization of that definition also shape people’s experience. Previous studies have produced inconsistent results on the association between park distribution and demographic and socioeconomic status (SES) variables. This project uses a spatial regression model to capture disparities in park accessibility in New York City. By comparing spatial and nonspatial accessibility measurements, the project aims to capture the discrepancies and challenges in existing methods for measuring accessibility in public health and urban planning.
Method
Two main steps were taken to study New Yorkers’ accessibility to parks in relation to their socioeconomic status (SES). In the first step, physical accessibility, defined as the shortest distance travelled to a park, was measured. In the second step, a geographically weighted regression (GWR) model and a spatial regression model were used to assess the relationship between shortest distance and SES. Initially, the centroid of each residential unit (building footprint) generated from PLUTO was the preferred smallest unit of analysis. However, more than 764,372 units qualified as residential, and calculating the distance from each of these points to the 4,086 park access points exceeded the computing power available. Census blocks were the next option, but New York City currently has 38,800 census blocks, which was still more data than the available computer could process. Therefore, the census tract was used as the smallest residential unit. To calculate the distance to the nearest park for each of the 2,165 census tracts in New York City, a closest facility analysis was performed using QNEAT 3, a QGIS plug-in (Raffler, 2018). The closest facility analysis requires start points and end points. The start points were the centroids of census tracts. For the end points, simply using the centroid of each park would be misleading because some parks are much larger than others: among the 2,031 parks selected, the smallest is 218.7 square feet and the largest is 86,055,448.7 square feet. Instead, park access points were defined as the intersections between LION street lines and park outlines. As a result, 4,086 access points were created (Figure 2).
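The idea behind a closest facility analysis can be illustrated with a minimal Python sketch. This is not QNEAT 3's implementation, only an illustration under stated assumptions: the street network is a tiny hypothetical graph, the node names (tract_A, park_1, and so on) and edge lengths are invented, and Dijkstra's algorithm stands in for whatever routing the plug-in uses internally.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: network distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist

def closest_facility(graph, start, facilities):
    """Return (distance, facility) for the nearest reachable facility, or None."""
    dist = shortest_distances(graph, start)
    reachable = [(dist[f], f) for f in facilities if f in dist]
    return min(reachable) if reachable else None

# Hypothetical toy street network: undirected edges with lengths in feet.
edges = [
    ("tract_A", "x1", 400),
    ("x1", "park_1", 100),
    ("x1", "x2", 200),
    ("x2", "park_2", 50),
    ("tract_B", "x2", 600),
]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

# Start points are tract centroids; end points are park access points.
access_points = ["park_1", "park_2"]
result_a = closest_facility(graph, "tract_A", access_points)
result_b = closest_facility(graph, "tract_B", access_points)
print(result_a, result_b)
```

Each start point yields one route to its nearest access point, which is why the number of routes tracks the number of start points and why moving from residential units to census tracts reduces the workload so sharply.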
Figure 1. Methodology for generating shortest distance
Result
The closest facility analysis generated 2,175 shortest routes from each census tract to the closest park access point (Figure 2). These routes were matched with their corresponding census tracts. The full data frame, including shortest distance and selected SES indicators, was imported into R for the spatial regression analysis. A global model was generated using linear regression. Selected SES indicators from the ACS 5-year estimates for the 2019 period were included in the model (Table 1). Each indicator, except individual median income and family median income, was normalized to a rate per 1,000 people. A geographically weighted regression (GWR) was then applied to the same model. To assess spatial dependency, R2 and standard errors from the GWR model were plotted (Figure 3 and Figure 4). For the spatial regression model, minimum and maximum distances and distance ranges were calculated. Moran’s I was calculated for each distance weight and compared. The weight with the higher Moran’s I was used for the rest of the analysis, followed by calculating local Moran’s I using the selected spatial weight. Then the “lm.LMtests” function was used to assess five models (error dependence, robust error dependence, lag dependence, robust lag dependence, and SARMA) and identify the type of spatial dependence in the model.
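The comparison of Moran's I across candidate distance weights can be sketched in plain Python. The analysis itself was done in R; the sketch below is only an illustration under assumptions: the four points, their values, and the two thresholds are invented, and the weights are unstandardized binary distance bands, which may differ from the weighting style used in the project.

```python
import math

def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 if j lies within threshold of i (i != j)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                w[i][j] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # total weight
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

# Hypothetical tract centroids on a line and their shortest distances to a park.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
shortest_distance = [1.0, 2.0, 3.0, 4.0]

i_near = morans_i(shortest_distance, distance_band_weights(points, 1.0))
i_wide = morans_i(shortest_distance, distance_band_weights(points, 2.0))
print(round(i_near, 3), round(i_wide, 3))
```

In this toy case the narrower band gives the higher Moran's I, so it would be the weight carried forward, mirroring the selection rule described above.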
Figure 2. Shortest distance from census tract centroid to the nearest park
Results
Results from the global model, which included 11 indicators (excluding total population, which was used to normalize the other variables), suggest mostly negative correlations between the indicators and the shortest distance. The percent of owner-occupied units and the percent of families living under the federal poverty line were both positively correlated with the shortest distance. The estimated coefficient for household income was close to zero, indicating essentially no association.
After the geographically weighted regression analysis, spatial dependency in the outcome and in each predictor variable was clearly visible on the R2 map (Figure 3). For the spatial regression model, the weight based on minimum and maximum distance was selected as the spatial weight because it yielded a higher Moran’s I (min-max: 0.198 vs. range: 0.056). The results from local Moran’s I suggest that 17.95% of census tracts had a long distance to the nearest park and were surrounded by other high-value census tracts, while 24.67% had a short distance to the nearest park and were surrounded by other low-value census tracts. The remaining 57.38% did not show significant local spatial autocorrelation.
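The high-high and low-low groupings reported by local Moran's I can be illustrated with a short Python sketch. It is a simplification under assumptions: the six values and the neighbor lists are invented, weights are row-standardized, every observation is assumed to have at least one neighbor, and the significance testing that separates significant clusters from the rest is omitted.

```python
def local_moran(values, neighbors):
    """Local Moran's I with row-standardized weights, plus a quadrant label.

    Labels: HH = high value among high neighbors, LL = low among low,
    HL / LH = value and neighborhood average on opposite sides of the mean.
    Assumes every observation has at least one neighbor.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the values
    results = []
    for i in range(n):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        local_i = z[i] * lag / m2
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, label))
    return results

# Hypothetical shortest distances for six tracts arranged in a row.
values = [10, 9, 8, 1, 2, 1]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

results = local_moran(values, neighbors)
labels = [label for _, label in results]
print(labels)
```

Tracts labeled HH correspond to long distances surrounded by long distances, and LL to short distances surrounded by short distances; a permutation test would be needed to decide which of these labels are statistically significant.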
The results from “lm.LMtests” were statistically significant for all five models, indicating that both lag and error dependencies exist in the model. Therefore, a Durbin model was used to control for both dependencies. The result of the Durbin model is shown in Table 3. Predictors that were statistically significant in the global model (percent of the Black population, percent of owner-occupied units, median individual income, and median family income) became statistically insignificant in the Durbin model. Percent of households living under the federal poverty line was insignificant in the global model but became statistically significant in the Durbin model. Percent of individuals living under the federal poverty line was negatively correlated with the shortest distance to the park in the global model but became positively correlated in the Durbin model.
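For reference, a spatial Durbin model is commonly written as below. The write-up does not state the exact specification that was fitted, so the notation is an assumption: y is the vector of shortest distances, X the matrix of SES indicators, W the selected spatial weight matrix, and ε the error term.

```latex
y = \rho W y + X\beta + W X\theta + \varepsilon
```

The term ρWy captures lag dependence in the outcome, while WXθ adds spatially lagged predictors. Under the restriction θ = -ρβ the model reduces to a spatial error model, which is why this form is often chosen when both lag and error dependence are detected.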
Overall, some spatial dependencies were found among the selected SES indicators; however, most correlations that existed before became insignificant after controlling for spatial dependence. The one exception was the owner-occupied housing unit variable, which showed a positive correlation with the distance to the nearest park in the Durbin model. Figure 5 shows the spatial nonstationarity of this variable. In Staten Island especially, the percentage of owner-occupied housing units was positively correlated with the distance to the nearest park. This relationship aligns with the current composition of residential types in the city: Staten Island, which has large green space coverage, has more owner-occupied units than the other four boroughs, which are mostly occupied by renters. For other SES variables, especially those describing communities of color, the results contradicted the hypothesis. Higher percentages of Black and Hispanic populations were negatively correlated with the distance to the nearest park, as were the unemployment rate and the household poverty rate.
Shortest distance from census tract centroid to the nearest park
R2 from GWR
Standard errors from GWR
GWR result (shortest distance vs. owner-occupied unit)
Limitations
One limitation of this study is that it does not account for differences in park size and park amenities. The number of parks is not a synonym for the total square footage of parks. As Figure 1 shows, the South Bronx has more parks, but they are much smaller. Compared with the scattered small parks in the South Bronx, Upper West Manhattan may have fewer parks, but their size, especially that of Central Park, is much larger. In reality, park size is often correlated with park usage: because larger parks have more amenities and host more activities, more people would prefer larger parks to smaller ones. The other limitation is that the residential unit used for analysis was the census tract. Aggregating to the tract level reduced spatial resolution, which made the calculated distance to the nearest park less representative of actual neighborhood conditions.