01.
 Thesis: NYC’s Open Streets Program and Air Quality
The Open Streets program in New York City has generated numerous conversations about its economic and public health benefits since it started in summer 2020. The program created additional restaurant seating and school space by closing streets to traffic. This has allowed a wide range of programming that supports economic development and activates open spaces in communities. This study takes the opportunity to collect street-level data with high temporal resolution, using a combination of a black carbon sensor and a noise sensor, to study the effect of a street closure event on local black carbon concentration. A set of factors is considered as potential sources that either emit black carbon or mitigate black carbon exposure. Moreover, this study uses a Generalized Additive Model with penalized splines to account for unknown effects from meteorological variables and street canyons. Results from this study suggest a measurable decrease in black carbon concentration on streets with an Open Streets segment, compared to parallel streets without the program, during Open Streets program hours over the study period of March to October 2021.
Research Questions
The main research question, “How does the Open Streets program impact black carbon concentration in Upper Manhattan?”, is broken into multiple quantifiable steps:
- What is the baseline black carbon (BC) concentration on streets in the selected study areas in Upper Manhattan?
- What is the difference in BC concentration on streets with an active Open Streets segment compared to that of parallel streets without?
- What is the difference in BC concentration on a street segment with an active Open Street compared to the same segment while the Open Street is not active?
- What is the difference in BC concentration on a street segment without the Open Streets program while a parallel street with an Open Street is active, compared to the same segment while the parallel Open Street is not active?
- How do other variables correlate with the BC measurement (i.e., weather, traffic, road types, and road activities)?
To answer these questions, the study used black carbon concentration and traffic noise data collected on Open Streets using a microAeth® AE51 (AE51) and a noise sensor. The noise sensor was designed and assembled by Luc Dekoninck, a noise scientist who has used the same equipment in his previous studies. Black carbon and traffic noise data were collected during 30 biking trips on designated streets from June to October in 2021. Segments from five parallel streets were included in the study: Broadway Avenue, Amsterdam Avenue, Columbus Avenue, Manhattan Avenue, and Eighth Avenue, which includes parts of Central Park West and Frederick Douglass Boulevard. Using this original data, instantaneous black carbon pollution models were produced. Results from the models helped compare black carbon concentration on Open Streets and non-Open Streets, and therefore allow us to understand the scale of the environmental impact of the Open Streets program with quantitative evidence. The hyper-local scale of the data can also inform policymakers’ future emission reduction strategies.
Data Cleaning (Tools: Python and R)
 
A total of 30 trips were made from March 6th to October 30th in 2021. 27 out of 30 trips had complete black carbon data. Only trips with complete data were used for data analysis. 9829 black carbon data points were ultimately kept in the dataframe, with a mean of 1336 ug/m3 and a standard deviation of 9084.8 ug/m3, ranging from -48706 ug/m3 to 626792 ug/m3. This raw data contains a number of negative BC values recorded by the AE51. While the official AethLabs website indicates that negative readings are not necessarily a concern and should be treated as noise, it is still possible that some of these values are related to the settings and operation of the instrument. According to the AethLabs website, lowering the flow rate or timebase could potentially reduce the noise. This method is not applicable to this research, since data has to be collected at a high flow rate and timebase to ensure its high temporal resolution.
An investigation of these negative values was conducted. Figure 10 shows the percentage of negative values produced in each trip. These values were not distributed equally: in some trips, such as those taken on 06/26, 10/06, 10/09, and 10/14, about 30% of the data were negative, compared to an average of 13.8%. Arbitrarily dropping or replacing these values would be detrimental to the dataset. To further understand why certain days had more negative values than others, a time series plot and a map of where the data was collected were compared.
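The per-trip share of negative readings described above can be computed with a simple group-by. The following is a minimal sketch in Python with pandas, not the study's actual script. The column names `trip` and `bc` and the toy readings are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: the column names and values are illustrative only.
readings = pd.DataFrame({
    "trip": ["06/26", "06/26", "06/26", "06/26", "07/10", "07/10", "07/10", "07/10"],
    "bc": [-120.0, 850.0, -40.0, 1300.0, 900.0, 1100.0, -15.0, 2000.0],
})


def negative_share_by_trip(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of negative BC readings within each trip."""
    is_negative = df["bc"].lt(0)
    # The mean of a boolean column is the fraction of True values.
    return is_negative.groupby(df["trip"]).mean().mul(100)


share = negative_share_by_trip(readings)
print(share)
```

Comparing each trip's share against the overall average is one way to spot trips that stand out and deserve a closer look.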
Figure 11. Two typical trips and their recorded BC values. Negative BC values (shown in blue) were often found at the beginning of a trip and were also recorded evenly during the trip.
Figure 11 shows example routes and the black carbon data points collected on two trips. By comparing all 27 trips, the researcher found that all negative BC values could be categorized into two types by their potential causes: negative values due to device calibration, and negative values that are noise. A total of 16 extreme negative values smaller than -10,000 ug/m3 were found across both types. Because they make up a small share of the data but can still have a significant influence on the mean, they were dropped from the dataset. Negative values produced during device calibration were replaced with the lowest value recorded in that trip. The same treatment was given to negative values distributed throughout the trip. After the negative BC values were dropped or replaced, the statistical summary showed that the data range had changed to 1 ug/m3 to 626792 ug/m3, with a mean of 1554 ug/m3 and a standard deviation of 8977.65 ug/m3.
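The two-step treatment of negative values can be sketched as follows. This is an illustrative Python version with hypothetical column names (`trip`, `bc`) and toy data. It assumes that "the lowest value recorded in that trip" means the lowest positive reading of the trip, which is consistent with the cleaned range starting at 1 but is not stated explicitly in the text.

```python
import pandas as pd

EXTREME_NEGATIVE = -10_000  # readings below this threshold are dropped


def clean_negative_bc(df: pd.DataFrame, trip_col: str = "trip", bc_col: str = "bc") -> pd.DataFrame:
    """Drop extreme negatives, then replace remaining negatives per trip."""
    kept = df[df[bc_col] >= EXTREME_NEGATIVE].copy()
    # Assumption: the replacement value is the lowest positive reading in the trip.
    positive_only = kept[bc_col].where(kept[bc_col] > 0)
    trip_floor = positive_only.groupby(kept[trip_col]).transform("min")
    negative = kept[bc_col] < 0
    kept.loc[negative, bc_col] = trip_floor[negative]
    return kept


# Hypothetical toy data for two trips.
readings = pd.DataFrame({
    "trip": ["A", "A", "A", "A", "B", "B", "B"],
    "bc": [-20000.0, -50.0, 300.0, 120.0, -5.0, 80.0, 60.0],
})
cleaned = clean_negative_bc(readings)
print(cleaned)
```

Replacing within each trip, rather than with a global minimum, keeps each trip's own baseline intact.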
The standard deviation is abnormally high, indicating that more investigation is needed to detect outliers. Anomaly detection was used in this step to detect data points that deviate from the dataset's usual behavior (“Detect Anomalies in Time Series Using Anomalize Package In R,” 2020). A total of 268 points were detected using the default setting and by comparing averages among all trips. Using a different method, comparing values within each trip, a total of 181 anomalies were detected. Since preserving each trip’s characteristics is important to the study, the latter method was used and those 181 anomalies were deleted from the dataset. This treatment effectively smoothed the data and allowed better interpretation of the BC trend for each trip (Figures 12 and 13).
Figure 12. Time Series of black carbon values before detecting and treating the anomalies
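The idea of detecting anomalies within each trip, rather than across all trips, can be illustrated with a simplified stand-in. The sketch below is not the anomalize algorithm used in the study. It removes a rolling-median trend from each trip and flags remainders outside an interquartile-range fence. The window size, the fence multiplier `k`, the column names, and the toy data are all assumptions.

```python
import pandas as pd


def flag_anomalies_within_trip(df, trip_col="trip", bc_col="bc", window=5, k=3.0):
    """Flag readings whose detrended value falls outside a per-trip IQR fence."""

    def flag(values: pd.Series) -> pd.Series:
        # A centered rolling median serves as a rough local trend.
        trend = values.rolling(window, center=True, min_periods=1).median()
        remainder = values - trend
        q1, q3 = remainder.quantile(0.25), remainder.quantile(0.75)
        iqr = q3 - q1
        return (remainder < q1 - k * iqr) | (remainder > q3 + k * iqr)

    return df.groupby(trip_col)[bc_col].transform(flag).astype(bool)


# Hypothetical toy data: trip A contains one spike, trip B does not.
readings = pd.DataFrame({
    "trip": ["A"] * 10 + ["B"] * 6,
    "bc": [100.0, 102.0, 98.0, 101.0, 5000.0, 99.0, 103.0, 100.0, 97.0, 101.0,
           200.0, 205.0, 198.0, 202.0, 201.0, 199.0],
})
flags = flag_anomalies_within_trip(readings)
without_anomalies = readings[~flags]
print(readings[flags])
```

Because the fence is computed per trip, a value that is ordinary on a high-pollution day is not flagged merely for being high relative to other trips.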
Exploratory Data Analysis
The black carbon dataset is not normally distributed; therefore, a natural log transformation was performed so that all black carbon values can be compared on the same scale. The distribution of black carbon data before and after log transformation is shown in Figure 14 and Figure 15.
Figure 14. The distribution of black carbon values in a histogram
Figure 15. The distribution of black carbon values after natural log transformation in a histogram
Figure 16. Distribution of black carbon values (after log transformation) in each trip.
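A natural log transformation of this kind takes one line with NumPy. The sketch below uses a handful of illustrative values: 1, 1554, and 626792 are the minimum, mean, and maximum reported after cleaning, and the remaining values are made up. The `skewness` helper is a hypothetical function added only to show how the transformation reduces the long right tail.

```python
import numpy as np


def skewness(x: np.ndarray) -> float:
    """Sample skewness: the mean cubed z-score."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())


# Illustrative BC values only; the transform requires strictly positive values.
bc = np.array([1.0, 250.0, 900.0, 1554.0, 12000.0, 626792.0])
log_bc = np.log(bc)  # natural log

print("skewness before:", skewness(bc))
print("skewness after: ", skewness(log_bc))
```

The transformation is only defined for positive values, which is one reason the negative readings had to be treated first.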
To test for multicollinearity, the “mctest” package in R was used. Individual multicollinearity diagnostics were run, and the Farrar-Glauber test (F-test) was performed on each variable to check whether its collinearity was significant. The diagnostic output from the collinearity test includes the Variance Inflation Factor (VIF), Tolerance (TOL), and the Farrar-Glauber F-test (Wi). As expected, the “Temperature” variable had the highest VIF, followed by “Dew Point”. To test whether the correlation between each pair of variables was significant, a t-test was performed on each correlation coefficient. The results align with this observation: the partial correlations between temperature and the other weather variables are high, except with dew point, followed by those involving wind direction. Negative correlations between wind speed and dew point, and between wind direction and dew point, were also observed.
Variables were then removed one at a time, and collinearity was tested again among the remaining variables. Starting with the variables with the highest VIF, “Temperature” and “Dew Point” were removed. After that, collinearity was no longer detectable among “Wind Speed”, “Wind Direction”, and “Street Canyon”.
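For readers unfamiliar with the diagnostics, the VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the others, and the tolerance is its reciprocal. The sketch below computes both in Python with NumPy on synthetic data. It is not the R workflow used in the study, and the simulated temperature, dew point, and wind speed values are assumptions chosen so that the first two are strongly related.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic weather data: dew point is built to track temperature closely.
rng = np.random.default_rng(0)
temperature = rng.normal(20, 5, 500)
dew_point = temperature - 8 + rng.normal(0, 1, 500)
wind_speed = rng.normal(10, 3, 500)

X = np.column_stack([temperature, dew_point, wind_speed])
vifs = variance_inflation_factors(X)
tolerance = 1.0 / vifs
print(dict(zip(["temperature", "dew_point", "wind_speed"], vifs.round(2))))
```

Dropping the variable with the highest VIF and recomputing mirrors the stepwise removal described above.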
Figure 20. The correlation matrix of meteorological variables and Street Canyon Index.
Note: The plot shows that temperature and dew point are highly correlated. Some correlation is also present between temperature and wind speed, and between temperature and wind direction. No evidence of correlation is observed between the Street Canyon Index and any weather variable.
Results 
“What is the difference in BC concentration on streets with an active Open Streets segment compared to that of parallel streets without?”
Finding 1: When the Open Streets program was active, overall black carbon (BC) concentration increased in the study area.
*Open Streets segment is highlighted in blue
Finding 2: When the Open Streets program was active, BC decreased on Broadway Avenue, Columbus Avenue, and Manhattan Avenue relative to Amsterdam Avenue. At the same time, BC increased on Frederick Douglass Boulevard and Central Park West.
*Open Streets segment is highlighted in blue
Finding 3: In the subset of data collected only on Amsterdam Avenue and Columbus Avenue, the streets with an Open Streets segment, overall black carbon (BC) concentration increased when the Open Streets program was active.
“What is the difference in BC concentration on a street segment not part of the Open Street program while a parallel Open Street is active compared to the same segment while the parallel Open Street is not active?”
Finding 4: In the subset of data collected only on Broadway, Manhattan Avenue, and 8th Avenue (Central Park West and Frederick Douglass Boulevard), the streets without an Open Streets segment, overall black carbon (BC) concentration increased when the Open Streets program was active.
However, looking at each street individually, BC concentration decreased on Broadway and increased on Manhattan Avenue and 8th Avenue.
Finding 5: After controlling for a set of environmental variables, the final model suggested increased BC concentration on parallel streets without an Open Streets segment during Open Streets program hours. In other words, during Open Streets program hours, BC concentration decreased on Amsterdam Avenue and Columbus Avenue but increased on Broadway, Manhattan Avenue, and 8th Avenue.
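To illustrate what "controlling for environmental variables" with a penalized spline can look like, the sketch below fits a toy additive model in Python with NumPy. It is a simplified stand-in, not the study's Generalized Additive Model: it uses a truncated-linear spline basis with a ridge penalty, a single smooth term for wind speed, and fully synthetic data. The function names, the knot count, the penalty `lam`, and the simulated effect of -0.3 are all assumptions.

```python
import numpy as np


def spline_basis(x: np.ndarray, knots: np.ndarray) -> np.ndarray:
    """Truncated-linear basis: one column max(x - knot, 0) per knot."""
    return np.maximum(x[:, None] - knots[None, :], 0.0)


def fit_penalized_model(log_bc, open_active, wind_speed, n_knots=10, lam=1.0):
    """Fit log_bc ~ intercept + open_active + smooth(wind_speed).

    Only the spline coefficients are penalized (ridge penalty).
    """
    knots = np.quantile(wind_speed, np.linspace(0.05, 0.95, n_knots))
    X = np.column_stack([
        np.ones_like(wind_speed),   # intercept
        open_active,                # 1 while the Open Street is active
        wind_speed,                 # unpenalized linear part of the smooth
        spline_basis(wind_speed, knots),
    ])
    penalty = np.zeros(X.shape[1])
    penalty[3:] = lam
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ log_bc)


# Synthetic data with a known effect of -0.3 on the log scale.
rng = np.random.default_rng(42)
n = 2000
wind_speed = rng.uniform(0, 10, n)
open_active = rng.integers(0, 2, n).astype(float)
true_effect = -0.3
log_bc = 7 + true_effect * open_active + 0.5 * np.sin(wind_speed) + rng.normal(0, 0.3, n)

beta = fit_penalized_model(log_bc, open_active, wind_speed)
effect = beta[1]
print("estimated effect on log BC:", round(effect, 3))
print("implied percent change in BC:", round(100 * (np.exp(effect) - 1), 1))
```

Because the outcome is log-transformed, a coefficient b on the activity indicator corresponds to a multiplicative change of exp(b) in BC concentration, which is how a modeled increase or decrease during program hours can be read.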
Limitations and Indications for Future Studies
This study has several limitations. First, the collinearity between variables is not fully understood and requires further study. In addition, the Open Streets program selected for this study operates only on Fridays and weekends. Travel behavior differs between weekends and weekdays, so it is difficult to obtain a result that is fully free of the weekend effect. Future studies need to collect more data on Open Streets programs that operate on weekdays. There are also Open Streets programs that are not Open Restaurants, such as the Play Streets program, in which the host streets are closed to traffic to provide extra space for school activities. Data from different types of Open Streets programs will be important for a broader understanding of the impact of street closure events on black carbon concentration, regardless of the time and location of the events.
Second, including variables in binary form limits the model’s ability to fully reflect a variable's effect on black carbon concentration. In the data, the park and restaurant variables are binary: they only record whether there is a park or restaurant within a 50-meter search radius of a black carbon data point. The variable is therefore limited by the search radius defined in the analysis, which does not fully reflect the more transient impact these features may have on black carbon concentration in the neighborhood. The restaurant variable is further limited because not all restaurants emit the same amount of black carbon. Some restaurants are more likely to have outdoor barbecues than others, but restaurant type was not considered. Future research can address the limitation of binary variables by using distance as the indicator. Instead of a search radius, future research can measure the distance to the nearest restaurant and the nearest park and include that distance as the variable, a more nuanced way of accounting for their effects. Restaurant type may also be included in the dataset.
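The difference between the binary 50-meter flag and a continuous distance variable can be made concrete with a short sketch. The Python code below uses the haversine great-circle formula and made-up coordinates. The function names and locations are hypothetical, and a real analysis might instead use a projected coordinate system or a GIS library.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
SEARCH_RADIUS_M = 50        # radius used for the binary variable


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(point, places):
    """Distance in meters from a data point to the closest place."""
    return min(haversine_m(point[0], point[1], lat, lon) for lat, lon in places)


# Hypothetical coordinates for two BC data points and two restaurants.
restaurants = [(40.8003, -73.9650), (40.8020, -73.9650)]
bc_points = [(40.8000, -73.9650), (40.8010, -73.9650)]

distances = [nearest_distance_m(p, restaurants) for p in bc_points]
binary_flags = [d <= SEARCH_RADIUS_M for d in distances]
print(distances, binary_flags)
```

The second point falls outside the 50-meter radius and would be coded as 0 in the binary scheme, even though a restaurant is less than 80 meters away. The continuous distance keeps that information.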
Third, the temporal effect on black carbon concentration is not fully accounted for. The analysis included “before 5pm/after 5pm” as a temporal variable; however, this binary variable cannot fully reflect and control for the temporal effect. Future research could use splines for time. Choosing the interval used for aggregation would be important: future analysis should choose a time interval that preserves a meaningful level of temporal resolution, and then fit a spline to the aggregated time variable, which is a simple and more nuanced way of accounting for temporal effects while maintaining flexibility. Fourth, when comparing black carbon concentration on different streets, the analysis used one street as the reference and compared it to every other street. This may not have been the most accurate way to present the black carbon effect when the comparison is made among streets.
Overall, while the data collected has high spatial and temporal resolution, the analysis conducted within the scope of this research may not be sufficient to detect all subtle differences in black carbon concentration in the study area. Data aggregation and grouping may weaken some of the spatial and temporal effects. For example, aggregating data using 5pm as the cut-off overlooks changes that happen within the periods before and after 5pm. Future analysis would benefit from introducing time variables through a more sensitive method.
Still, the results from this study help set up future hyper-local and hyper-temporal air quality studies. Using a combination of sensors is an innovative method that offers flexibility and allows high spatial and temporal resolution data to be examined at the same time. The temporal and spatial effects on black carbon concentration highlighted here show the potential of low-cost, innovative data collection methods for air quality studies.
Significance
This study contributes to the existing body of air quality literature in a few ways. Few studies have used dynamic sensing to capture temporal and spatial effects on air quality. Most air quality studies are constrained by air quality data with low time resolution and limited siting options. The approach used in this project demonstrates the feasibility of combined sensing techniques and citizen air monitoring. The hyper-local and hyper-temporal air quality data generated in this project allows for greater flexibility in data analysis. Hence, more air quality research questions regarding locality and temporality could be asked and studied using this dataset. Its spatial detail allows a number of built environment factors to be studied as well. While it is known that the built environment has a tremendous impact on air quality, its interactions with traffic patterns and the dynamic emission patterns between them are rarely studied. It is important for future research to investigate such relationships.
The main results from this project are important because they provide evidence of a change in traffic pattern during a street closure event. A measurable difference was observed when comparing streets with Open Streets segments and parallel streets without them during active Open Streets hours. Even if the difference is small, it may still reflect local experiences during a street closure event. The data was collected by bike; therefore, the results reflect a cyclist’s experience in the study area and the black carbon exposure a cyclist may have had.
The observed traffic pattern changes and their overall effect on black carbon concentration suggest that current traffic policies related to air quality should be expanded to include more events like the Open Streets program. Further work should assess the interaction between traffic patterns and the built environment, as well as their effect on black carbon. Although this study did not find an air quality improvement due to the Open Streets program, it is important to keep recognizing the program’s value in other respects, such as providing open spaces and economic opportunities. While improving air quality was not what this program was designed for, its influence on traffic, traffic-related emissions, and emission exposure, and its subsequent effect on non-traffic-related emissions, should not be overlooked.