by: Dr. James Weber
Air quality (AQ) and the policies enacted to improve it are becoming increasingly important issues. They are also becoming increasingly politicised, as exemplified by arguments over clean air zones such as London’s ULEZ, low traffic neighbourhoods and even moorland burning (Weber et al., 2023). The negative impact of poor AQ on health, particularly for the most vulnerable people, is well established, yet understanding the drivers of air quality, and thus the ways it can be improved, is challenging.
AQ is generally defined in terms of the concentrations of key pollutants which harm human health, for example nitrogen dioxide (NO2), ozone (O3) and fine particulate matter (PM2.5). Their concentrations are determined by the balance between pollutant sources (local emissions, longer-range transport and production in the atmosphere) and sinks (loss to terrestrial or aqueous surfaces and dispersion in the atmosphere). Thus, understanding AQ requires knowledge of meteorology, atmospheric chemistry and aerosol science. The London Smog of 1952, an extreme example which resulted in at least 10,000 deaths, cannot be understood without considering both the anticyclonic conditions which led to a temperature inversion, trapping air close to the ground, and the chemistry which converted the sulphur dioxide emitted by burning low-quality coal into sulphuric acid.
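The source/sink balance described above can be sketched, under heavily simplifying assumptions, as a single well-mixed box with a constant source S and a first-order loss rate k, so that dC/dt = S - kC and the steady-state concentration is S/k. This Python toy is purely illustrative and is not a model used in this post; the function names and numbers are hypothetical. It shows why weaker dispersion (a smaller k, as under the 1952 inversion) raises concentrations even when emissions are unchanged.

```python
def simulate_box(source, loss_rate, c0=0.0, dt=0.1, steps=1000):
    """Integrate dC/dt = source - loss_rate * C with explicit Euler steps.

    source    : combined production rate (e.g. ug m-3 per hour), assumed constant
    loss_rate : combined first-order loss rate (per hour), e.g. deposition + dilution
    """
    c = c0
    for _ in range(steps):
        c += dt * (source - loss_rate * c)
    return c


def steady_state(source, loss_rate):
    """Setting dC/dt = 0 gives C = S / k."""
    return source / loss_rate


# Hypothetical numbers: halving the loss rate (e.g. stagnant, low-wind conditions)
# doubles the steady-state concentration with unchanged emissions.
windy = steady_state(source=10.0, loss_rate=0.5)      # 20.0
stagnant = steady_state(source=10.0, loss_rate=0.25)  # 40.0
```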
This dependence on prevailing meteorology can make interventions such as clean air zones hard to evaluate, particularly in the short term. Following an intervention, it can be difficult to determine how much of any change in AQ (or lack thereof) is due to changes in local emissions, which the intervention can influence, and how much is due to prevailing meteorology and longer-range transport of pollution.
The assessment of clean air zones is beyond the scope of a single blog post, but here I summarise how some key, widely measured pollutants have changed in Reading over the last 10 years, explore how simple analysis can point to their source(s) and demonstrate how machine learning can be used to assess the influence of different variables on their concentrations. I focus on Reading, but this analysis could be done with data from any of the ~270 air quality monitoring sites maintained around the UK by the Department for Environment, Food and Rural Affairs (DEFRA), as well as those maintained by local authorities and the ever-growing number of air quality monitoring sites around the world.
Comparison to AQ Targets
I use data from the DEFRA air quality monitoring site in Reading New Town, which is an urban background site (i.e. not next to a busy road), and make use of the excellent Openair R package from David Carslaw and colleagues at the University of York. It is important to recognise that concentrations of shorter-lived pollutants can vary across an urban area because of differing proximity to local sources and differences in mixing caused by local topography and buildings, for example the street canyon effect. Therefore, to construct a complete picture, multiple sites across a city or town should be considered.
The most recent WHO air quality guidelines (2021) recommend that daily mean NO2 and PM2.5 concentrations should not exceed 25 μg m-3 and 15 μg m-3 respectively on more than 3-4 days per year (~1% of days). Plots of daily mean NO2 and PM2.5 for 2014-2024 (Fig 1) demonstrate just how frequently these pollutants have exceeded these limits over the last 10 years (30% and 10% of the time respectively). Of course, care must be taken when applying limits proposed in 2021 to earlier years, but the comparison is nevertheless informative.
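Counting exceedances like those in Fig 1 comes down to averaging hourly data into daily means and comparing each with the guideline level. Here is a minimal Python sketch, assuming hourly data arrive as (timestamp, value) pairs with missing hours stored as None; the 18-hour completeness rule and the function names are illustrative assumptions, not taken from the post (whose analysis used R and Openair).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# WHO 2021 24-hour guideline levels (ug m-3), as quoted in the text
WHO_DAILY = {"no2": 25.0, "pm25": 15.0}


def daily_means(hourly, min_hours=18):
    """Average hourly (timestamp, value) pairs into daily means.

    Days with fewer than `min_hours` valid hours are dropped; the 18-hour
    (75%) threshold is an assumed data-capture rule, not from the post.
    """
    by_day = defaultdict(list)
    for ts, value in hourly:
        if value is not None:
            by_day[ts.date()].append(value)
    return {day: mean(vals) for day, vals in by_day.items() if len(vals) >= min_hours}


def exceedance_fraction(daily, limit):
    """Fraction of days whose mean is above `limit`."""
    if not daily:
        return float("nan")
    return sum(v > limit for v in daily.values()) / len(daily)


# Synthetic example: two days of constant hourly NO2 (not real data)
hourly = [(datetime(2024, 1, 1, h), 30.0) for h in range(24)]
hourly += [(datetime(2024, 1, 2, h), 20.0) for h in range(24)]
daily = daily_means(hourly)
print(exceedance_fraction(daily, WHO_DAILY["no2"]))  # 0.5
```

Grouping the daily means by `day.year` before calling `exceedance_fraction` gives annual exceedance rates of the kind summarised in Fig 3.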
Longer Term Trends
Analysing the longer-term trend, however, paints a slightly different picture. NO2 exhibits a consistent decrease (p<0.001) over 2014-2024 (Fig 2a), while PM2.5 shows a much slower decline (p<0.05) (Fig 2b). The reduction in NO2 is likely driven by improvements to the vehicle fleet, given traffic’s role as a major NO2 source (see later); the noticeable drop in early 2020, when the COVID-19 lockdown reduced traffic flow, supports this. The rate of PM2.5 decrease is around an order of magnitude lower than that of NO2, due in part to traffic contributing a smaller fraction of PM2.5 emissions than of NO2 emissions.
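The post does not say how the trends and p-values were computed; Openair’s TheilSen function is a common choice for this kind of analysis. As an illustration only, here is a Python sketch of the Theil-Sen slope (the median of all pairwise slopes, which is robust to outliers) together with a basic Mann-Kendall test for the p-value. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(t, y):
    """Median of the slopes between every pair of points with distinct t."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2) if t[j] != t[i]]
    return median(slopes)


def mann_kendall_p(y):
    """Two-sided Mann-Kendall trend test p-value.

    Uses the normal approximation with a continuity correction and makes
    no correction for ties, seasonality or autocorrelation.
    """
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical annual means declining by 1.5 ug m-3 per year (not the Reading data)
t = list(range(10))
y = [40 - 1.5 * k for k in range(10)]
print(theil_sen_slope(t, y))  # -1.5
```

Real monthly air quality series are strongly seasonal and autocorrelated, so in practice the data would usually be deseasonalised and the significance test adjusted; this sketch does neither.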
While NO2 and PM2.5 show decreases, O3 exhibits a steady increase (p<0.001) (Fig 2c). This increase is partly due to the reduction in NOx (= NO + NO2); in particular, the rise in O3 in early 2020 coincided with the COVID-19 lockdown. This highlights a key challenge in AQ policy: in certain chemical regimes, reducing NOx will increase O3 (Grange et al., 2021).
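One simplified way to see why lower NOx can mean more O3 near sources is the photostationary state (Leighton relationship) between NO, NO2 and O3 in sunlight. It assumes the three reactions below are in balance and neglects the peroxy radical chemistry that produces ozone overall, so it captures the local "titration" of O3 by NO rather than offering a full explanation of ozone regimes. Here j_NO2 is the NO2 photolysis rate and k_{NO+O3} the rate constant of the NO + O3 reaction: less NO (for a given NO2) means less O3 destroyed.

```latex
\begin{aligned}
\mathrm{NO_2} + h\nu &\rightarrow \mathrm{NO} + \mathrm{O} \\
\mathrm{O} + \mathrm{O_2} + M &\rightarrow \mathrm{O_3} + M \\
\mathrm{NO} + \mathrm{O_3} &\rightarrow \mathrm{NO_2} + \mathrm{O_2}
\end{aligned}
\qquad\Longrightarrow\qquad
[\mathrm{O_3}] \approx \frac{j_{\mathrm{NO_2}}\,[\mathrm{NO_2}]}{k_{\mathrm{NO+O_3}}\,[\mathrm{NO}]}
```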
These trends are also seen in the exceedances. The fraction of days exceeding the NO2 guideline drops from ~50% in 2014-2017 to <10% in 2023 (with a COVID-19 dip also visible), while PM2.5 shows slower improvement (Fig 3).
The Detective Work Begins
Of course, from a policy perspective, the key aim is to understand the relative importance of different sources of air pollution so that measures (local, regional, national or even international) can be designed to improve the situation.
The diurnal and weekly cycles of NO2 and PM2.5 (Fig 4) provide some clues as to their sources, and so to how they might be affected by policies. On weekdays, NO2 shows strong peaks in the morning and evening rush hours, supporting the dominant role of traffic. In contrast, PM2.5’s morning peak is much smaller and its evening peak occurs slightly later, suggesting that the lowering of the boundary layer, which confines pollution closer to the surface, is more important for PM2.5 than for NO2 (local emissions from domestic heating may also play a small role in winter). Over the course of a week, NO2 also exhibits a much greater reduction at the weekend, when the flow of commercial vehicles, a major source of NO2, is greatly reduced; PM2.5, however, exhibits little weekday-weekend variation, further supporting the argument that a greater fraction of PM2.5 comes from non-traffic sources.
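The diurnal and weekday-weekend comparisons behind Fig 4 can be approximated by grouping hourly data by hour of day and by day of week. Openair’s timeVariation function produces plots of this kind; the Python sketch below is only an illustration with hypothetical function names, assuming hourly (timestamp, value) pairs with missing values stored as None.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def diurnal_cycle(hourly, weekdays_only=True):
    """Mean concentration for each hour of day (0-23).

    If weekdays_only, Saturdays and Sundays (weekday() >= 5) are excluded.
    """
    by_hour = defaultdict(list)
    for ts, value in hourly:
        if value is None or (weekdays_only and ts.weekday() >= 5):
            continue
        by_hour[ts.hour].append(value)
    return {h: mean(v) for h, v in sorted(by_hour.items())}


def weekend_ratio(hourly):
    """Mean weekend concentration divided by mean weekday concentration.

    Values well below 1 point to weekday-dominated sources such as
    commuter and commercial traffic.
    """
    weekday = [v for ts, v in hourly if v is not None and ts.weekday() < 5]
    weekend = [v for ts, v in hourly if v is not None and ts.weekday() >= 5]
    return mean(weekend) / mean(weekday)


# Toy week starting Monday 1 January 2024 (hypothetical values):
# a weekday 08:00 peak of 40, otherwise 20 on weekdays and 15 at the weekend.
week = [(datetime(2024, 1, 1 + d, h), 15.0 if d >= 5 else (40.0 if h == 8 else 20.0))
        for d in range(7) for h in range(24)]
print(diurnal_cycle(week)[8], round(weekend_ratio(week), 2))  # 40.0 0.72
```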
Local Emissions vs. Longer Range Transport
If we combine the AQ data with meteorological data from the University of Reading’s atmospheric observatory, we can examine the influence of wind speed and wind direction on pollutant levels. The polar plots in Figure 5 show normalised NO2 (left) and PM2.5 (right) concentrations as a function of wind speed and direction. NO2 concentrations are highest at very low wind speeds (i.e. the centre of the plot), while stronger winds from the east-north-east (ENE, roughly the direction of Greater London) are also associated with higher NO2. Higher wind speeds from other directions are associated with low NO2, suggesting that dispersion of local emissions outweighs any longer-range transport. The story is quite different for PM2.5: low wind speeds do yield higher than average concentrations, but by far the highest pollution occurs when there are strong winds from the ENE. This is strong evidence that NO2 is primarily governed by local emissions, with a smaller contribution from longer-range transport, while PM2.5 is driven much more by transport of pollution. Therefore, policies to reduce local emissions (e.g. a Reading clean air zone) are more likely to improve NO2 (which is already decreasing steadily anyway) than PM2.5. Of course, the situation may be very different in another town or city: even if preliminary analysis suggests that a clean air zone in Reading would have little impact on PM2.5, this does not mean such a policy would be ineffective for PM2.5 everywhere.
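A rough, unsmoothed version of the polar plots in Fig 5 can be built by binning each hour by wind direction sector and wind speed and averaging the concentration, normalised by its overall mean so that 1.0 means average. The plots in Fig 5 are smoothed surfaces; this Python sketch only bins, and the 16-sector scheme, speed bin edges and function names are illustrative assumptions. Wind direction follows the meteorological convention (the direction the wind blows from).

```python
import math
from collections import defaultdict
from statistics import mean

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector(wd):
    """Map wind direction in degrees (0 = from north) to a 16-point compass sector."""
    return SECTORS[int(((wd % 360) + 11.25) // 22.5) % 16]


def polar_table(records, speed_edges=(0, 2, 4, 6, 8, math.inf)):
    """Mean normalised concentration by (sector, lower edge of wind-speed bin).

    records: iterable of (wind_speed_m_s, wind_dir_deg, concentration).
    Concentrations are divided by their overall mean, so 1.0 = average.
    """
    rows = [r for r in records if None not in r]
    overall = mean(c for _, _, c in rows)
    cells = defaultdict(list)
    for ws, wd, c in rows:
        for lo, hi in zip(speed_edges, speed_edges[1:]):
            if lo <= ws < hi:
                cells[(sector(wd), lo)].append(c / overall)
                break
    return {key: mean(vals) for key, vals in cells.items()}
```

With real data, a cell such as ("ENE", 8) well above 1.0 would correspond to the high-concentration lobe for strong ENE winds described above.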
The Rise of the Machine (Learning)
As in many fields, machine learning can be used, *with great care*, to understand how a particular quantity (e.g. NO2) varies with a range of explanatory variables. In this case, I use the deweather package developed by David Carslaw and colleagues to construct a statistical model of pollutant concentration from contemporaneous meteorological and temporal data (e.g. wind speed, hour of the day).
After building the model, I explore the dependence of each pollutant on each variable in turn (the so-called partial dependence) by sampling the model with many different values of that variable while holding all other variables at their mean values. This shows how each variable in isolation can affect air quality and the magnitude of its influence (Fig 6). Of course, this requires the user to have included all the important factors; in this case, I have used a basic set of explanatory variables, omitting more complex ones such as air mass origin, which can be useful for tracking long-range transport of pollution.
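The partial dependence idea can be sketched for any fitted model exposed as a prediction function. The common (Friedman) formulation sets the chosen variable to each grid value in every observed row and averages the predictions; the variant described above instead holds the other variables at their means. Both are shown below as a generic Python illustration; the function names and toy model are hypothetical and this is not deweather’s implementation. When the model is nonlinear in the other variables, as in the toy model here, the two curves can differ.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Friedman-style partial dependence of `predict` on one feature.

    predict : function mapping a dict of features to a prediction
    rows    : list of dicts (the observed data)
    feature : name of the feature to vary
    grid    : values of that feature to evaluate
    For each grid value, the feature is set to that value in every row and
    the predictions are averaged, marginalising over the other variables.
    """
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


def fixed_at_means(predict, rows, feature, grid):
    """Simpler variant: all other (numeric) features held at their means."""
    means = {k: mean(r[k] for r in rows) for k in rows[0]}
    return [(value, predict({**means, feature: value})) for value in grid]


def toy_model(x):
    """Hypothetical model, not fitted to any data."""
    return 10.0 / x["ws"] + x["temp"] ** 2


rows = [{"ws": 1.0, "temp": 0.0}, {"ws": 3.0, "temp": 10.0}]
print(partial_dependence(toy_model, rows, "ws", [1.0, 2.0]))  # [(1.0, 60.0), (2.0, 55.0)]
print(fixed_at_means(toy_model, rows, "ws", [1.0, 2.0]))      # [(1.0, 35.0), (2.0, 30.0)]
```

Here the two curves have the same shape but different levels, because `temp` enters the toy model nonlinearly.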
The plots for PM2.5 and NO2 (for 2021-2024) show that both pollutants are predicted to decrease with higher wind speeds (U10), although note the slight increase at high wind speeds for PM2.5, most likely driven by the behaviour seen in Fig 5. Higher temperatures (Td) are also associated with lower pollution, possibly because a deeper boundary layer mixes pollutants away from the surface, although the cause of the uptick in PM2.5 concentration remains unclear.
Higher levels of pollution are also associated with winds from the east (i.e. those which have passed over Greater London), but this is more influential for PM2.5 than for NO2 (in agreement with Fig 5). The diurnal cycles seen in the observational data are broadly reproduced, but hour of the day is a more important factor for NO2 than for PM2.5, reflecting the greater role of traffic in NO2 emissions. The model returns a weekday-weekend pattern similar to that observed in Fig 4.
For both pollutants, the trend is the single most influential component. It can be thought of as the variability not captured by the other explanatory variables: it will include the impact of longer-term emission changes, but it could also include the impact of factors not among the explanatory variables (such as varying air mass origin in this case), and so it should be interpreted with care.
An obvious use of this statistical model is to predict pollutant concentrations under counterfactual scenarios. For example, if a clean air zone is implemented, such a model can be used to predict the concentrations which would have occurred had no such policy been put in place (Grange et al., 2021). The difference between the measured concentration and the modelled counterfactual concentration at a given time is then an estimate of the policy’s impact, and a better metric than the oft-used approach of comparing air quality at (often arbitrary) times before and after a policy’s implementation. These models are thus powerful tools for analysing the impact of policies, but great care must be taken to ensure that model biases, which are inevitable, are not conflated with a policy’s impact.
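The counterfactual comparison can be sketched as follows, assuming a model has already produced business-as-usual predictions for both the pre- and post-intervention periods. As one simple, assumed guard against model bias, the mean residual before the intervention is treated as bias and subtracted from the mean post-intervention residual; a real analysis would need to check that this bias is stable over time. The function name is hypothetical.

```python
from statistics import mean


def policy_impact(observed, counterfactual, start_index):
    """Estimate a policy's impact from business-as-usual model predictions.

    observed, counterfactual: equal-length lists of concentrations, where
    counterfactual is the model's prediction with no policy in place.
    start_index: position of the first post-intervention value.

    The mean pre-intervention residual (observed - modelled) is treated as
    model bias (an assumption) and subtracted so it is not mistaken for
    impact. A negative result means concentrations below the counterfactual.
    """
    if not 0 < start_index < len(observed):
        raise ValueError("need data both before and after the intervention")
    residuals = [o - c for o, c in zip(observed, counterfactual)]
    bias = mean(residuals[:start_index])
    return mean(residuals[start_index:]) - bias


# Hypothetical series: the model under-predicts by 2 before the policy,
# and observations fall to 5 below the model afterwards.
print(policy_impact([22.0, 22.0, 15.0, 15.0], [20.0] * 4, start_index=2))  # -7.0
```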
References and Further Reading
For analysing air quality data, the Openair package offers a wide range of analysis and data visualisation tools. Grange et al. (2018) apply random forest meteorological normalisation, an approach closely related to deweather, to analyse PM10 trends, while Grange et al. (2021) demonstrate predicting the counterfactual to analyse the impact of changing emissions. The R scripts used to generate the plots are available on request from the author.
Weber, J., Val Martin, M. and Bryant, R. (2023). Impact of Moorland Fires on Sheffield Air Quality on 9th October 2023. Report, The University of Sheffield. https://doi.org/10.15131/shef.data.24356629.v1
Grange, S. K., Lee, J. D., Drysdale, W. S., Lewis, A. C., Hueglin, C., Emmenegger, L. and Carslaw, D. C. (2021). COVID-19 lockdowns highlight a risk of increasing ozone pollution in European urban areas. Atmospheric Chemistry and Physics. https://doi.org/10.5194/acp-21-4169-2021
Grange, S. K., Carslaw, D. C., Lewis, A. C., Boleti, E. and Hueglin, C. (2018). Random forest meteorological normalisation models for Swiss PM10 trend analysis. Atmospheric Chemistry and Physics. https://doi.org/10.5194/acp-18-6223-2018