Spatially distributed anthropogenic and open burning emissions are fundamental data needed by Earth system models. We describe the methods used for generating gridded datasets produced for use by the modeling community, particularly for the Coupled Model Intercomparison Project Phase 6. The development of three sets of gridded data for historical open burning, historical anthropogenic, and future scenarios was coordinated to produce consistent data over 1750–2100. Historical data up to 2014 were provided with annual resolution and future scenario data in 10-year intervals. Emissions are provided on a sectoral basis, along with additional files for speciated non-methane volatile organic compounds (NMVOCs). An automated framework was developed to produce these datasets to ensure that they are reproducible and facilitate future improvements. We discuss the methodologies used to produce these data along with limitations and potential for future work.
Anthropogenic activities, from the generation of electricity to the ignition of forest fires, result in the emission of gases and aerosol species into the atmosphere. These emissions, in turn, alter atmospheric composition, deposition rates, and the Earth's radiative balance. Emissions have, for the most part, increased over the 20th century, leading to higher aerosol concentrations and tropospheric ozone levels as industrial activities and fossil fuel consumption increased. Over the past couple of decades, air pollutant emissions have shifted away from North America and Europe, due in large part to air pollution controls, and toward East and South Asia, driven by rapid industrialization and population growth in those regions.
Important tools used to examine these impacts are models of the
Earth system, including chemical transport, climate–chemistry, and
Earth system models. These models require gridded emissions data in order to
simulate the impact of these emissions within the Earth system. Furthermore,
for multi-decadal to century-long model runs, temporally and spatially
consistent emissions data are needed. Here we describe the production of
several related gridded datasets that were produced, in large part, to
facilitate the Coupled Model Intercomparison Project Phase 6 (CMIP6; Eyring
et al., 2016). These datasets contain anthropogenic chemically reactive gases
(CO,
Emission species provided in the emission data files. For the molecular
weights assumed for each species, see
The first dataset considered here comprises anthropogenic emissions, which are defined as emissions that stem directly from human activities such as energy transformation, buildings, transportation, and agricultural and industrial activities (see Hoesly et al., 2018, for a complete listing). Historical emissions gridding is also discussed in Hoesly et al. (2018), along with an extensive description of the methodologies used to produce country-level historical anthropogenic emissions. We provide a more complete discussion of the gridding methodology, a description of the closely related methods used for gridding future emissions, and related supplementary data such as speciated VOCs. Compared to previous datasets, our anthropogenic data have a greater degree of consistency across species and over time, with seasonality for all species. The historical anthropogenic emissions were produced by the Community Emissions Data System (CEDS). This paper focuses on the data produced for CMIP6 (see Appendix A). Updated emission data releases for general scientific use by the CEDS project are in progress (e.g., Hoesly et al., 2019).
The second dataset is historical open burning emissions, which are defined as forest, grassland, and peatland fires, along with agricultural waste burning (AWB) on fields. While open burning emissions can also have anthropogenic drivers, these emissions are treated as a separate category here and elsewhere in the literature, as the techniques for estimating them are generally distinct from the methods used for “anthropogenic” emissions. The open burning emissions over recent years are from the Global Fire Emissions Database version 4 with small fires (GFED4s; van der Werf et al., 2017), which have been driven by satellite data since 1997. Estimates for earlier years are based on proxies and fire models. We only briefly discuss this dataset in this paper since those data are inherently gridded throughout their development, as described in detail by van Marle et al. (2017). Note that open burning emissions are often described in the literature as “biomass burning”, but we do not use this term to avoid confusion with anthropogenic biofuel combustion, such as biofuel use in cookstoves, which is included in the anthropogenic emissions dataset.
The third set of data comprises gridded data over the future (2015–2100) for these same species for both anthropogenic and open burning sectors for selected future scenarios. These future gridded data were produced using a variation of the same gridding methodologies used for the historical anthropogenic data, which is why these are discussed together in this paper. The future emission trajectories are discussed in Gidden et al. (2019), with the gridding methodology described in more detail herein. As discussed below, the future gridded emissions build on these two historical datasets and, in large part, inherit their properties such as within-country spatial distribution and seasonality.
A number of global gridded datasets have been produced over the years. One
of the most widely used datasets is the Emissions Database for Global
Atmospheric Research (EDGAR), which provides an independent estimate of
historical greenhouse gas (GHG) and pollutant emissions by country, sector,
and spatial grid (
Lamarque et al. (2010) developed the historical dataset used in the Coupled Model Intercomparison Project Phase 5 (CMIP5), which included global gridded estimates of anthropogenic and open burning emissions for 1850–2000 at 10-year intervals. It was a compilation of the “best available estimates” from many sources including EDGAR-HYDE (van Aardenne et al., 2001), RETRO (Schultz and Sebastian, 2007), and emissions reported largely by Organization for Economic Cooperation and Development (OECD) countries over recent years. One focal point of that work was the compilation of a year 2000 emissions dataset that was used as the starting point for the future projections. See Hoesly et al. (2018) and van Marle et al. (2017) for a comparison of the CMIP6 and CMIP5 anthropogenic and open burning datasets, respectively, and Gidden et al. (2019) for a comparison of the CMIP6 and CMIP5 projections.
We first discuss the overall methodologies for producing the gridded data, then present the gridded data and discuss the underlying properties of these data, focusing on the anthropogenic sources. The paper concludes with a discussion of the issues identified in these data and potential further work to improve their quality.
We first provide an overview of the gridding methodology, with further details provided in the following subsections. The gridding methodology for historical anthropogenic and all future emissions is summarized in Fig. 1. The fundamental underlying data used here are emissions by country and sector. This provides both total emission trends over time as well as changes in the sectoral composition of each emission species. As discussed in Hoesly et al. (2018), emissions in the preindustrial period were generally dominated by biofuel use in the residential sector. As industrialization proceeded, emissions from the industrial, energy transformation, and transportation sectors became increasingly important. Emissions are translated to a spatial grid for each country and aggregate gridding sector (see Table 2). The methodology applied to future emissions is similar to that used for historical emissions, although with a lower sectoral and temporal resolution, as detailed below. This comprehensive gridded dataset is therefore produced with a consistent methodology by sector across all emission species spanning 1750 through 2014, with a consistent set of future projections over 2015–2100. The details of the methodology are first described for the historical emissions data (1750–2014), followed by a discussion of areas in which the methodology differs for future emissions (2015–2100). Further details including code are available online for both historical and future gridding (Sect. 5 below).
Emissions gridding overview. Emissions data at the country and aggregate sector level are mapped to spatial grids separately by country and sector, then combined into global emissions grids as described in the main text.
Proxy data used for gridding anthropogenic emissions data, adapted
from Hoesly et al. (2018). Gridding proxies marked with an asterisk (
Anthropogenic emissions over the historical period by country and
CEDS sector from Hoesly et al. (2018) are aggregated to 16 intermediate
sectors (Table 2) and mapped to a
Emissions at the country and gridding sector level are mapped to a
spatial grid using a variety of spatial proxy data, as described below.
Emissions are distributed into target grid cells
The proxy data values are preprocessed for each country by multiplying the data by the area fraction of each grid cell that is in the specified country at an annual resolution (if annual gridded data are available). In this way grid cells that contain multiple countries will be assigned emissions proportionately. After assigning emissions to spatial grids by country, the spatially distributed emissions from each country are then added into a global spatial matrix. The “country” list includes a global region for emissions related to international shipping that are not associated with a particular country. Aircraft emissions are gridded separately, with one three-dimensional distribution that is scaled uniformly to match the global emissions estimate.
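The allocation step described above can be illustrated as a proxy-weighted proportional split. The sketch below is a minimal illustration, assuming the proxy and the country area-fraction grid are already on the target grid; the function name and toy arrays are hypothetical and this is not the CEDS gridding code itself.

```python
import numpy as np

def grid_country_sector(total_emission, proxy, area_fraction):
    """Distribute one country-sector emission total over grid cells (sketch).

    total_emission: emission total for the country and sector (e.g., kt/yr)
    proxy:          2-D array of proxy values (e.g., gridded emissions or population)
    area_fraction:  2-D array, fraction of each cell lying inside the country
    """
    # Restrict the proxy to the country: border cells keep only the
    # country's share of the proxy value.
    weights = proxy * area_fraction
    total_weight = weights.sum()
    if total_weight <= 0:
        raise ValueError("proxy is zero within this country; a backup proxy is needed")
    # Allocate the country total in proportion to the normalized weights.
    return total_emission * weights / total_weight

# Toy 2x3 grid shared by two hypothetical countries A and B.
proxy = np.array([[4.0, 2.0, 9.0],
                  [0.0, 2.0, 9.0]])
frac_a = np.array([[1.0, 0.5, 0.0],
                   [1.0, 0.5, 0.0]])
frac_b = 1.0 - frac_a
# Country grids are summed into a single global matrix.
global_grid = (grid_country_sector(10.0, proxy, frac_a)
               + grid_country_sector(5.0, proxy, frac_b))
```

Because each country's weights are normalized separately, the global grid conserves every country total, and a cell split between countries receives a share from each.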
In most cases the proxy data are gridded emissions so as not to duplicate the effort needed to convert raw proxy data into the form needed for emissions inventories. We define two levels of spatial proxy data, the primary gridding proxies and a backup gridding proxy, which is gridded population. The backup gridding proxy is used where the primary gridding proxies are not appropriate due to either (1) the primary proxy not being available (e.g., it is equal to zero) for the given country–sector–year combination or (2) the primary proxy being inaccurate for the given country–sector combination. In the latter case, we perform this proxy substitution when the ratio of proxy to sector emissions data for that country–sector combination is an outlier compared to the global distribution of this ratio across countries.
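The two substitution criteria can be sketched as a small decision function. The text does not specify the outlier test, so the interquartile-range rule on log10 ratios used here, the multiplier `k`, and the function name are all hypothetical choices for illustration only.

```python
import numpy as np

def select_proxy(primary, backup, country_emission, global_log_ratios, k=1.5):
    """Choose the proxy for one country-sector-year combination (sketch).

    primary, backup:   2-D proxy arrays already restricted to the country
    country_emission:  country-sector emission total to be gridded
    global_log_ratios: log10(proxy total / emission total) across countries
    k:                 IQR multiplier for the hypothetical outlier test
    """
    primary_total = primary.sum()
    # Criterion 1: the primary proxy is unavailable (zero) for this country.
    if primary_total <= 0:
        return backup, "backup: primary proxy is zero"
    # Criterion 2: the proxy-to-emissions ratio is an outlier across countries.
    if country_emission > 0:
        log_ratio = np.log10(primary_total / country_emission)
        q1, q3 = np.percentile(global_log_ratios, [25, 75])
        iqr = q3 - q1
        if log_ratio < q1 - k * iqr or log_ratio > q3 + k * iqr:
            return backup, "backup: ratio is an outlier"
    return primary, "primary"
```

Working in log space keeps ratios that are too large and too small on a symmetric footing; any robust outlier criterion could be substituted.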
Over recent decades the primary gridding proxy data were from the EDGAR v4.2 (EC-JRC/PBL, 2012) inventory (Table 2) since these data were available over 1970 through 2008. Road transportation uses the EDGAR 4.3.2 road transportation grid, which is significantly improved over previous versions (Crippa et al., 2016) but was only available for 2010 at the time these data were produced, so the country-specific 2010 spatial distribution is used for all years. Flaring emissions use a blend of grids from EDGAR and ECLIPSE (Klimont et al., 2017). The backup gridded population proxy, as well as the proxy for early years for the residential–commercial sector, is a combination of gridded population from the Gridded Population of the World (GPW) (Doxsey-Whitfield et al., 2015) and HYDE (Goldewijk et al., 2011). Aircraft emission distributions are from Lee et al. (2009) and international shipping uses ECLIPSE shipping grids, with additional data from Endresen et al. (2003) for NMVOC emissions from oil tanker venting as used in Lamarque et al. (2010).
The only proxy data that vary before 1970 are those for the RCO (residential, commercial, other) and waste sectors (Table 2). For the RCO sector, from 1900 to 1969, the proxy linearly blends the EDGAR v4.2 RCO grid for 1970 with the gridded population, and the gridded population alone is used for years before 1900. The proxy for the waste burning sector is based on rural population. More specific methodological details for certain aspects of the emissions dataset are outlined below.
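The RCO blend can be sketched as a linear interpolation between two normalized grids. This assumes the weight on the EDGAR grid rises from 0 in 1900 to 1 in 1970 and that both grids are first normalized to unit sum within the country (because they have different units); the function name is hypothetical.

```python
import numpy as np

def rco_proxy(year, edgar_rco_1970, population):
    """Hypothetical pre-1970 RCO proxy for one country (sketch)."""
    # Normalize both grids to unit sum so they can be blended despite different units.
    edgar = edgar_rco_1970 / edgar_rco_1970.sum()
    pop = population / population.sum()
    if year < 1900:
        return pop
    if year >= 1970:
        # From 1970 onward the year-specific EDGAR grid would be used instead.
        return edgar
    # Weight on the EDGAR grid rises linearly from 0 (1900) to 1 (1970).
    w = (year - 1900) / (1970 - 1900)
    return w * edgar + (1.0 - w) * pop
```

Because both inputs sum to one, every blended proxy also sums to one, so the country total is unaffected by the blending.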
The EDGAR and GPW proxy datasets that are used for most of the proxy data
are initially processed at the highest resolution available (e.g.,
0.1
The development of historical open burning emissions is described in van
Marle et al. (2017). In brief, the spatial emissions distribution for these
emissions is fundamentally grid-based, with emissions over the satellite era
estimated from remotely sensed data (GFED4s; 1997–2015) merged with several
existing historical proxies for fires, including charcoal records, for boreal
and temperate North America and Europe as well as visibility-based fire emissions
for the tropical areas of equatorial Asia and the arc of deforestation. The
spatial distribution for the pre-satellite era was based on the 1997–2015
average and uniformly adjusted for large geographic regions based on the
proxies (see van Marle et al., 2017, for the basis regions used). For the
regions where proxies had limited coverage the output of six different fire
models from the Fire Model Intercomparison Project (FireMIP) was used
(Rabin et al., 2017) (
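The regional adjustment of the satellite-era pattern can be illustrated as a uniform per-region scaling. This is a simplified sketch, assuming a grid of basis-region IDs and target regional totals for one year; it omits the merging of proxies and fire models described by van Marle et al. (2017), and all names are hypothetical.

```python
import numpy as np

def scale_climatology(climatology, region_ids, regional_totals):
    """Scale a mean spatial pattern so each region matches a target total (sketch).

    climatology:     2-D grid of mean emissions (e.g., a 1997-2015 average)
    region_ids:      2-D integer grid assigning each cell to a basis region
    regional_totals: dict region_id -> target regional emission for one year
    """
    scaled = climatology.copy()
    for region, target in regional_totals.items():
        mask = region_ids == region
        base = climatology[mask].sum()
        if base > 0:
            # One uniform factor per region: the within-region pattern is preserved.
            scaled[mask] = climatology[mask] * (target / base)
    return scaled
```

The uniform factor is what makes the within-region distribution in pre-satellite years identical to the satellite-era average.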
Future open burning emissions originate from the integrated assessment models (IAMs), which report by model region and broad category (e.g., forest or grassland burning). While many of the models have some additional level of spatial detail, emissions were reported at the model region level to facilitate common data harmonization, downscaling, and gridding routines. Future open burning emissions were therefore mapped to spatial grids using the same methodologies as used for anthropogenic emissions, as further described below. This means that, unlike historical open burning emissions, the spatial distribution of open burning emissions within a category (e.g., forest burning) within any country does not vary in the future scenarios.
Scenario data for future emissions from IAMs are first harmonized to a common 2015 base-year value by native model region and sector. This harmonization process adjusts the native model data to match the 2015 starting year values with a smooth transition forward in time, generally converging to native model results (Gidden et al., 2018a). The production of the harmonized future emissions data is described in Gidden et al. (2019).
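Gidden et al. (2018a) describe several harmonization methods; the sketch below shows only one ratio-based variant to make the "smooth transition" concrete. The initial ratio between the base-year value and the model value is applied in 2015 and decays linearly to 1, so results converge to the native model trajectory. The convergence year of 2080 and the function name are assumptions for illustration.

```python
import numpy as np

def harmonize_ratio(years, model, base_value, convergence_year=2080):
    """Ratio-based harmonization with linear convergence (sketch).

    years:      model years, starting with the harmonization base year
    model:      native model emissions for one region and sector
    base_value: historical base-year value to match (e.g., 2015)
    """
    years = np.asarray(years, dtype=float)
    model = np.asarray(model, dtype=float)
    base_year = years[0]
    ratio = base_value / model[0]
    # Share of the initial offset retained: 1 in the base year, 0 from convergence on.
    keep = np.clip((convergence_year - years) / (convergence_year - base_year), 0.0, 1.0)
    return model * (1.0 + (ratio - 1.0) * keep)
```

For example, with a model value of 10 and a base-year value of 12 in 2015, the harmonized series starts at 12 and equals the model values from 2080 onward.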
The 2015 base-year anthropogenic emissions data by country and sector are
extensions of the 2014 historical data largely using emission factor trends
for combustion sources from the GAINS model (ECLIPSE V5a;
Stohl et al., 2015; Klimont et al., 2017) and BP fuel consumption statistics (BP, 2016).
Noncombustion sources were generally scaled by estimated population. There
are potentially large changes in emissions over this period, for example in
China (Zheng et al., 2018), which results in uncertainty in these estimates
for regions with rapid changes in air pollution control technology
deployment. Near-term
For open burning, the 2015 base-year emissions data for harmonization and downscaling are an average of the previous 10 years of historical data extracted by country. This is because interannual variability is not captured in the future projections, so a longer-term average is a more appropriate starting point for open burning emissions. While a decadal averaging period was used here, a longer averaging period could also be considered in future work.
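The base-year averaging amounts to a windowed mean over country totals. The sketch assumes the 2005–2014 window, which matches the period the text uses for the open burning spatial proxies; the function name is hypothetical.

```python
import numpy as np

def open_burning_base_year(years, annual_emissions, first=2005, last=2014):
    """Average annual open burning emissions over a multi-year window (sketch)."""
    years = np.asarray(years)
    values = np.asarray(annual_emissions, dtype=float)
    window = (years >= first) & (years <= last)
    # Require a complete window so a missing year does not bias the average.
    if window.sum() != last - first + 1:
        raise ValueError("missing years in the averaging window")
    return values[window].mean()
```

Averaging damps single extreme fire years, which the interannual-variability-free future projections could not otherwise reproduce.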
The integrated assessment models (IAMs) that generated the Shared Socioeconomic Pathway (SSP) emission projections each have a different number of socioeconomic regions (11–32) for which the unharmonized emission data are available. Harmonization to the common 2015 starting dataset therefore occurs at each individual model's native spatial definition in order to preserve as much detail from the model as possible (Gidden et al., 2019). Harmonized emissions for the native model regions are then downscaled to the country level as described in Gidden et al. (2019). The country-level downscaling is performed in order to provide a uniform basis for subsequent mapping to spatial grids but does not necessarily represent specific policies that might be in place for any particular country.
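The downscaling methods actually used are described in Gidden et al. (2019) and are more elaborate than what is shown here. The sketch below shows only the simplest constant-share variant, assuming fixed base-year country shares within a model region, to illustrate the key property that country values add back to the harmonized region total. The function name is hypothetical.

```python
def downscale_constant_share(region_total, base_year_country):
    """Split a region total among its countries by base-year shares (sketch).

    region_total:      harmonized emissions for one region, sector, and year
    base_year_country: dict country -> base-year emissions for that sector
    """
    base_sum = sum(base_year_country.values())
    if base_sum == 0:
        raise ValueError("no base-year emissions to derive shares from")
    return {c: region_total * e / base_sum for c, e in base_year_country.items()}
```

For instance, a region total of 50 split between countries with base-year emissions of 30 and 10 gives 37.5 and 12.5.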
After the downscaling procedure the country-level future emissions projections are at the same level of sectoral resolution as the final gridding sectors in Table 2. These are then mapped to a spatial grid using the same underlying methodology as for the historical anthropogenic data, with a few differences in detail. Spatial proxies for each country are taken to be the 2014 gridded historical emissions discussed above for anthropogenic emissions and the average of the last 10 years (2005–2014) of historical data from van Marle et al. (2017) for open burning emissions. These spatial proxies are therefore constant into the future and do not represent any shifts in the spatial location of emissions within a country. The one exception is international shipping, which uses year-specific spatial distributions from the ECLIPSE project for the years 2015, 2020, 2030, 2040, and 2050. These distributions capture projected changes in shipping fuel sulfur content and the imposition of low sulfur control near coastal areas.
Future open burning emissions are provided from the IAMs in the following categories: agricultural waste burning on fields (AWB), forest burning, grassland burning, and peatland burning. Future trends in these emissions are generated by each model and are driven in large part by changes in land use. Note that these are future projections of climatologically average emission rates over time and do not include interannual variability. Because the future model projections used here do not include data on peatland burning, peatland burning emissions are held constant into the future.
Future emissions were gridded at the aggregate sectoral level, which corresponds to the final gridding sectors in the historical emissions data for anthropogenic emissions (Table 2), and with forest burning combined into one category, aggregating the open burning historical categories of boreal forest fires, temperate forest fires, and tropical deforestation and degradation. This lower level of sectoral detail was used because the models that generated the future scenario data used in this process (Gidden et al., 2019) often lack the finer level of sector detail that was available in the historical emissions datasets. The future data were constructed so that the data were consistent at the grid cell level when moving from the historical (up to 2014) to the future (2015 and forward) dataset.
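The category aggregation can be sketched as a mapping followed by a sum, which preserves totals and therefore keeps the 2014 grid cell values consistent across the historical-to-future transition. The category labels below are illustrative strings, not the exact variable names used in the data files, and the function name is hypothetical.

```python
# Hypothetical mapping from historical open burning categories to future categories.
FUTURE_CATEGORY = {
    "boreal forest fires": "forest burning",
    "temperate forest fires": "forest burning",
    "tropical deforestation and degradation": "forest burning",
    "grassland burning": "grassland burning",
    "peatland burning": "peatland burning",
    "agricultural waste burning": "agricultural waste burning",
}

def aggregate_categories(historical):
    """Sum historical category values (scalars or NumPy grids) into future categories."""
    out = {}
    for category, value in historical.items():
        target = FUTURE_CATEGORY[category]
        # Starting from 0 works for both scalars and NumPy arrays.
        out[target] = out.get(target, 0) + value
    return out
```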
The second major difference is that future emissions are not produced annually but are provided for 2015, 2020, and at decadal intervals thereafter. This is because long-term models do not provide annual data. The choice was made to only distribute data for years that were originally produced by the models and not interpolated data. This also means that fewer, and smaller, data files need to be downloaded and processed by end users. We note that the format of the open burning emissions in the future data is slightly different from the format used in the historical data. This is because we use the same software for both open burning and anthropogenic future emissions, so a common data format was used for all future emissions data.
In the default historical and future emissions gridding, emissions from
electric power generation and some other large industrial facilities are
generally mapped to a spatial grid as large point sources. For most emission
species this is an appropriate representation of emissions currently and
also into the future. For future carbon dioxide emissions, however, an
inconsistency occurs for net negative
We therefore distribute net negative
A global allocation was chosen because biomass trade information was not
available from the models. While a more detailed time-changing map could be
produced, carbon cycle modelers indicated that a global distribution would
be sufficient for use in future scenarios. Note that net negative
Note that this procedure is only applied for net negative
Monthly seasonality is applied to the gridded emissions by applying a set of spatially explicit seasonality fractions. The primary source for the seasonality fractions was the ECLIPSE dataset, with the addition of EDGAR v4.3 for international shipping and Lamarque et al. (2010) for aircraft. The dataset was processed to a consistent seasonality that matched the monthly calendar used in the final emissions data (e.g., a 365 d year). As part of this process some minor inconsistencies in the anthropogenic seasonality data between sectors in terms of the number of days assumed in the year were corrected so that the month distribution was consistent with the 365 d year used in the anthropogenic dataset. Seasonality for future open burning emissions was taken from a 10-year average from the historical dataset.
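The role of the calendar can be illustrated by converting an annual cell total into monthly mean fluxes. This sketch assumes a 365-day (no-leap) calendar and flux output in kg m-2 s-1; if the monthly fractions had been built for a different number of days, the monthly fluxes would no longer integrate back to the annual total. The function name is hypothetical.

```python
import numpy as np

# Days per month in a 365-day (no-leap) calendar.
DAYS_PER_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
SECONDS_PER_DAY = 86400.0

def monthly_fluxes(annual_total, month_fractions, cell_area_m2):
    """Distribute an annual cell total (kg/yr) into 12 mean fluxes (kg m-2 s-1).

    month_fractions: fraction of the annual total emitted in each month
    cell_area_m2:    grid-cell area in m^2
    """
    fractions = np.asarray(month_fractions, dtype=float)
    fractions = fractions / fractions.sum()        # enforce fractions summing to 1
    monthly_mass = annual_total * fractions         # kg emitted in each month
    seconds = DAYS_PER_MONTH * SECONDS_PER_DAY      # length of each month in s
    return monthly_mass / seconds / cell_area_m2
```

Multiplying each flux by the month length and cell area recovers the annual total exactly, which is the consistency the 365 d correction ensures.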
We note that, after the distribution of this dataset, it was found that the industrial sector in some regions has a high level of seasonality in the ECLIPSE dataset, and this distribution was carried through to the CMIP6 data (as noted on the online GitHub issues list for this dataset; see below). In general, we would expect industrial sector emissions to be fairly constant as most industrial activities operate year-round. The magnitude of seasonality overall in some regions may therefore be overestimated. This will be reexamined in future data releases.
Modeling atmospheric chemistry requires emissions for specific species, or
species groups, of reactive compounds instead of the total mass of volatile
hydrocarbon emissions (NMVOCs) generally tabulated in inventories.
Anthropogenic VOC emissions were speciated by applying speciation profiles
by gridding sector and country, largely derived from the RETRO project.
Emissions for 23 anthropogenic VOC groups were extracted by country
and broad sector from gridded HTAP v2 emissions data (
While speciation profiles by gridding sector are held constant in time, the aggregate VOC speciation for a country will change over time. This is because of different speciation profiles for each sector combined with a changing sectoral contribution to total VOC emissions over time.
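The effect described above can be shown with two sectors whose profiles never change. The profiles and sector totals below are invented for illustration (three hypothetical VOC groups); only the sector mix differs between the two cases, yet the country-level speciation shifts.

```python
import numpy as np

# Hypothetical constant speciation profiles (mass fractions over three VOC groups).
PROFILES = {
    "residential": np.array([0.6, 0.3, 0.1]),
    "transport":   np.array([0.1, 0.3, 0.6]),
}

def aggregate_speciation(sector_nmvoc):
    """Country-level VOC group fractions implied by sector NMVOC totals."""
    species_mass = sum(total * PROFILES[s] for s, total in sector_nmvoc.items())
    return species_mass / species_mass.sum()

# Same profiles, different sector mix -> different aggregate speciation.
early = aggregate_speciation({"residential": 90.0, "transport": 10.0})
late = aggregate_speciation({"residential": 30.0, "transport": 70.0})
```

Here the first group falls from 55 % to 25 % of total NMVOC purely because of the sectoral shift.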
Figure 2 shows the ratio of butanes to hexanes
Range across countries of the butane
For both historical and future emissions, however, VOC speciation profiles can potentially change with technology deployment separately from changes in NMVOC emission rates. For example, regulations to limit ozone formation not only result in altered bulk NMVOC emission amounts, but also change the specific VOC species emitted (Kirchstetter et al., 1999). Other examples include changes over time in the composition of consumer products (McDonald et al., 2018) and changing formulation of paints and other coating materials.
It is not known how much the differences illustrated in Fig. 2 reflect updated or different data sources or actual changes in VOC speciation over time. The RETRO dataset is targeted at 1990–1995 and uses speciation profiles available in that time period, while the EDGAR data incorporate newer measurements. While the emission time series presented here, and also the EDGAR data, capture the broad changes in VOC speciation due to changing sectoral contributions to VOC emissions over time, these datasets do not capture underlying changes in speciation profiles due to changing regulations or other fundamental changes to speciation profiles over time. It would therefore be useful to determine the importance of such speciation changes for the historical modeling of atmospheric chemistry compared to the broader changes in VOC emission magnitudes and speciation changes due to sectoral shifts over time. Note that the Huang et al. (2017) data do capture changes in speciation over time due to changing mixes of fuels in each end-use sector, which is illustrated in Fig. 2.
For future open burning emissions, speciation profiles by country and open burning sector were extracted from the historical open burning emissions data files and applied to the future scenario bulk NMVOC emissions. For details on the historical open burning VOC speciation see van Marle et al. (2017). Note that the species provided are different for the anthropogenic emissions and open burning emissions, following the past conventions used by each of those communities.
Total anthropogenic and open burning global emissions for selected
time periods. Values for 2050 and 2100 are from the SSP2-45 scenario. Only
anthropogenic
In this section we provide a brief overview of the gridded data products, focusing largely on diagnostic graphics that illustrate the underlying structure and potential issues that impact the use and interpretation of the gridded data.
Table 4 provides a summary of the gridded data files described in this paper
with the emission species listed in Table 1. The files listed there, plus
historical gridded open burning emissions (van Marle et al., 2017), provide
the complete set of emissions data required for CMIP6 experiments (Eyring et al., 2016). As discussed in Hoesly et al. (2018) the anthropogenic emissions
are consistently generated from the same driver data across sectors and
emission species. The gridded data files are generated using consistent
spatial proxy data, seasonality, and sector definitions. This means that
some emission species, such as As discussed above, future emissions contain an
additional sector to represent net negative
Gridded historical anthropogenic and future scenario emission data files provided for CMIP6. The complete historical anthropogenic dataset consists of 327 files. The complete future scenarios dataset consists of 1017 files. Historical emissions are provided annually, while future emissions are provided for 2015, 2020, and in decadal intervals after that. All emissions are provided with monthly seasonality. Full data citations, including digital object identifiers (DOIs), are provided in the “Data availability” section at the end of this paper.
Historical emission files are generally provided in 50-year
segments in order to keep the size of individual files reasonable
(historical
Supplementary data files include speciated NMVOC emissions for all time
periods. Also provided are historical emissions from solid biomass fuels
(wood, agricultural residues, etc.) as these are used as supplementary data
in some models. Some models use information on solid biofuel
combustion to derive information on associated VOC emissions and/or primary
aerosol (e.g., BC, OC) characteristics.
Spatial distributions for anthropogenic reactive gases and carbonaceous aerosol emissions (e.g., BC and OC) for selected historical and future years are shown in Figs. 3–6, with numerical values in Table 3. An overview of large-scale trends is provided here, with the emission trajectories and their derivation described in more detail in the underlying literature for the historical emissions (van Marle et al., 2017; Hoesly et al., 2018), future scenarios (Calvin et al., 2017; Fricko et al., 2017; Fujimori et al., 2017; Kriegler et al., 2017; Riahi et al., 2017; van Vuuren et al., 2017), and the harmonized CMIP6 data (Gidden et al., 2019).
Total anthropogenic emissions in 1850 by species. The color scale is
the same for each emission species across all gridded figures for
comparison. Note that the
Total open burning emissions in 1850 by species. The color scale is the same for each emission species across all gridded figures for comparison.
Total anthropogenic emissions in 1980 by species. The color scale is
the same for each emission species across all gridded figures for
comparison. Note that the
Total anthropogenic emissions in 2015 by species. The color scale is
the same for each emission species across all gridded figures for
comparison. The
Global emissions of species that are the product of incomplete combustion
(BC, CO,
Future emissions diverge depending on the emission scenario and follow
trends as described in Gidden et al. (2019). A few illustrative graphics are
provided here. By 2050 in SSP2, the “middle of the road” scenario,
emissions have decreased from 2015 levels in most world regions (Fig. 7),
while they have decreased to quite low levels in the SSP1-26 scenario (Fig. S7). In
contrast, emissions do not decrease nearly as much in the SSP3-70 scenario
(Fig. S9). By 2100, under the SSP2-45 scenario, anthropogenic emissions are
generally larger than open burning emissions, but anthropogenic
Total anthropogenic emissions in 2050 by species in the SSP2-45
scenario. The color scale is the same for each emission species across all
gridded figures for comparison. The
While 0.5
Figure 8a shows a time series plot of
Total anthropogenic
Time series plot for
For any particular spatial location, the accuracy of the gridded emissions depends on the accuracy of the underlying proxy dataset as well as that of the underlying country-level emissions data. Global proxy datasets that are available for use in projects such as this, including projects of similar scope such as EDGAR (from which many of the spatial distributions used here are drawn), will not necessarily correspond exactly to the actual spatial location and magnitude of emission sources. For example, databases of global road networks, while they are improving (e.g., Crippa et al., 2018), will not correspond exactly to actual road traffic at every point, nor are they likely to be complete for all regions. While large emission sources, such as power plants, are more likely to be in global databases, such data can still be incomplete, particularly for smaller plants (Liu et al., 2015). For other sectors, it is difficult for proxy datasets to capture historical changes over time in emission strengths at specific spatial locations.
At the grid cell level, discontinuities over time can therefore potentially be due to (a) discontinuities in the country-level emission time series, (b) actual changes in the underlying emitting processes as represented in the proxy dataset, or (c) data breaks or inconsistencies in the proxy dataset that do not represent an actual historical change in emissions. As either the sectoral aggregation or the spatial scale increases, however, the data will tend to become more robust, at least to the extent that the country-level emission time series used for calibration in this project are accurate (see Hoesly et al., 2018, for a further discussion).
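Aggregating to a coarser spatial scale, as suggested above, can be done by summing blocks of cells. The sketch assumes per-cell emissions in mass units (so block sums are appropriate; area-based fluxes would need area weighting first) and grid dimensions divisible by the block factor, e.g., a global 0.5° grid of 360 × 720 cells coarsened by a factor of 10 to 5°. The function name is hypothetical.

```python
import numpy as np

def coarsen(grid, factor):
    """Sum a 2-D per-cell emissions grid over factor x factor blocks (sketch)."""
    ny, nx = grid.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (blocks, cells per block) and sum within blocks.
    return grid.reshape(ny // factor, factor, nx // factor, factor).sum(axis=(1, 3))
```

Block sums conserve the global total, so coarse maps differ from the fine grid only in where within each block emissions are placed.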
Monthly time series plots visualizing the seasonal cycle of the emissions data, as shown in Fig. 10, are also useful diagnostics. Note, again, that discontinuities at the single cell level can be due to temporal structure in the proxy datasets.
Time series plot for total BC emissions over time at the monthly level for a grid cell in Finland.
The temporal signature of the open burning data is quite different, as interannual variations in those data are dominated in large part by year-to-year changes in local meteorological conditions. At least for the modern era, the interannual variations are inferred from satellite observations. As discussed below, the increased use of remote sensing and regional bottom-up data has the potential to improve the spatial and temporal accuracy of anthropogenic emissions data as well.
Spatial data presented at a lower resolution can also be useful in
understanding overall patterns and spatial differences between datasets. As
discussed in Hoesly et al. (2018), we have found that spatial maps at a
10
Difference in CMIP6 (e.g., CEDS) and CMIP5 gridded total
Also note that the CMIP6 dataset has a different distribution of
The spatially distributed emissions data discussed here represent a number of improvements over previous century-scale gridded datasets. These improvements include anthropogenic emissions that were consistently generated from the same driver data and mapped to spatial grids using the same proxy data across emission species with consistent seasonality. Future emissions were downscaled to the country level and then consistently mapped to spatial grids using largely the same gridding methodologies as used for the historical anthropogenic emissions. Further, the software used for historical anthropogenic emissions and future scenario emissions has been made publicly available as open-source software.
Emission estimates remain uncertain owing to data gaps, biases and errors in activity data, biases in emission factors, and spatial and temporal variations in real-world conditions. Relative uncertainties will generally increase going back in time as less detailed sectoral, spatial, and temporal information is available. Future emission projections are inherently uncertain, which is why multiple scenarios are presented. Within any one scenario, no attempt was made in these data products to estimate how the spatial distributions of emissions within countries might change in the future, although the practical value of attempting such detail is unclear given the overall uncertainty in future trajectories.
The historical open burning emissions have specific sources of uncertainty, as discussed further in van Marle et al. (2017). These start with uncertainties in the different products used to construct the overall dataset: GFED4s, emissions derived from visibility observations, charcoal-based fire records, and modeled emissions. The choice of fire models used for the years before the satellite era of fire observations (1997–2015) can influence the results, as can the land-use change data and other assumptions used to drive those models. Using the GFED4s climatology to distribute regional estimates spatially and temporally (e.g., seasonally) before 1997 assumes that subregional distributions stayed constant over that period.
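A short sketch makes the constant-distribution assumption explicit. A regional total for a pre-satellite year is spread over months and grid cells using a fixed climatological pattern, so only the magnitude changes from year to year, and the shape stays the same. This is a hypothetical Python illustration using NumPy. It assumes a single region and a (month, lat, lon) climatology array, and it is not the code used by van Marle et al. (2017).

```python
import numpy as np


def distribute_with_climatology(regional_total, climatology):
    """Spread a regional annual total over months and grid cells.

    climatology: array of shape (12, ny, nx) with climatological emissions for
    one region (e.g., a multi-year satellite-era mean). Only its relative shape
    is used, so every year receives the same spatial and seasonal pattern.
    """
    clim = np.asarray(climatology, dtype=float)
    if clim.shape[0] != 12:
        raise ValueError("first axis must be the 12 calendar months")
    weight_sum = clim.sum()
    if weight_sum <= 0:
        raise ValueError("climatology must contain positive emissions")
    return regional_total * clim / weight_sum


# Toy 2 x 2 region with all fire activity in one cell during June-August.
clim = np.zeros((12, 2, 2))
clim[5:8, 0, 1] = [1.0, 2.0, 1.0]

early = distribute_with_climatology(40.0, clim)  # e.g., a smaller early regional total
later = distribute_with_climatology(80.0, clim)  # a larger later regional total
print(early[6, 0, 1], later[6, 0, 1])  # 20.0 40.0
```

The two years differ only by a scale factor. Any real shift in where or when fires occurred within the region before 1997 cannot appear in the gridded result.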
While any errors in aggregate emission estimates will propagate into errors in spatial distribution, here we are particularly interested in potential errors in allocating emissions spatially. Uncertainties in spatial emission patterns, however, have not been systematically assessed to date. The methodologies used here for historical anthropogenic and future scenario emissions rely on underlying spatial proxy data. One source of error arises when the spatial proxy data do not match the actual spatial distribution of emissions. One example was discussed above in the case of
Capturing the distinctions between urban and rural emissions, and the finer gradations in between, is an ongoing challenge for emission inventories, for example for road transport, open refuse burning, and fuel combustion for residential heating and/or cooking. Driving conditions and emission profiles tend to differ between urban and rural areas, resulting in different fuel consumption and emission factors. To the extent that regional inventories become better able to capture these distinctions, their methodologies could ultimately be incorporated into the methods for generating proxy data for global datasets.
While the focus here was the use of consistent spatial proxy data across
species, in some cases it might be desirable to use varying proxies between
pollutants; e.g., coal use in the residential sector will produce significant
More general spatial errors can also occur because of mismatches between proxy data and actual emission sources. Traffic volume, for example, is not likely to scale purely with the class of roadway in global road datasets. Similarly, country-level emissions from power plants are allocated spatially using proxy data such as power plant capacity and the presence of emission control devices, which is the primary information available in global proxy datasets. Actual emissions, however, will not necessarily be proportional to facility capacity, resulting in spatial errors. Further, point sources can be missing from global proxy datasets, particularly in developing countries (Liu et al., 2015), and the locations of individual sources in proxy datasets are not always accurate.
Additional kinds of errors can occur in the temporal dimension. Even if the time series at the aggregate country level is correct, emissions at any given location will likely not follow the same temporal pathway. The emission rates of localized sources are likely to change over time, and individual point sources could close down, or new sources could be put into operation. Temporal consistency at the grid cell level is perhaps one of the most difficult challenges for long-term gridded datasets. Improved information on emissions over time for large point sources (see below) is the most important issue here, as year-to-year changes at large sources could account for a large share of the temporal changes in gridded emissions.
The distribution of emissions over the year, represented in these datasets as monthly average seasonality, is another area in which improvement is likely needed. The seasonal distribution of anthropogenic emissions is based on global sectoral analyses, for example heating-related emissions mapped using heating degree days (HDDs), which are unlikely to incorporate country-specific details on how HDDs map to actual fuel use. The rate of heating fuel consumption relative to temperature will likely differ between a cold region of a country (where heating demand is high) and a region with a milder climate that uses different heating technologies. The seasonal distribution of ammonia emissions depends on the sector and region (Paulot et al., 2014), and it is not clear whether there is a general consensus on these seasonal distributions.
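The sketch below shows a simple proportional-to-HDD mapping: each month's share of annual heating emissions equals its share of annual heating degree days. It is a hypothetical Python illustration. The 18 °C base temperature and the even-split fallback for climates with no heating demand are assumptions. The sketch also applies one base temperature and a strictly linear response everywhere, which is the kind of simplification that cannot capture regional differences in how HDDs translate into fuel use.

```python
import numpy as np


def monthly_hdd(daily_temps_by_month, base_temp_c=18.0):
    """Heating degree days per month from daily mean temperatures (deg C).

    daily_temps_by_month: sequence of 12 array-likes of daily mean temperatures.
    base_temp_c: assumed base temperature; conventions differ between countries.
    """
    return np.array([
        np.clip(base_temp_c - np.asarray(days, dtype=float), 0.0, None).sum()
        for days in daily_temps_by_month
    ])


def hdd_seasonality(annual_emissions, daily_temps_by_month, base_temp_c=18.0):
    """Distribute annual heating emissions over months in proportion to HDDs."""
    hdd = monthly_hdd(daily_temps_by_month, base_temp_c)
    if hdd.sum() == 0:
        # No heating demand at all (assumption): split evenly across months.
        return np.full(12, annual_emissions / 12.0)
    return annual_emissions * hdd / hdd.sum()


# Toy year: 30 days per month at a constant monthly temperature.
temps = [0, 0, 10, 10, 18, 20, 20, 20, 18, 10, 0, 0]
daily = [np.full(30, t) for t in temps]
monthly = hdd_seasonality(100.0, daily)
print(round(monthly[0], 2), round(monthly[5], 2))  # 18.75 0.0
```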
There are several potential strategies for reducing such biases and errors in gridded datasets. One general approach is to improve the completeness and representativeness (relative to actual emissions) of global proxy datasets to reduce the issues discussed above. Examples include community efforts to improve the global databases used in emissions work, such as those for power plants and road networks.
Another method is to estimate emissions at smaller geographical units, such as states or provinces, and allocate emissions spatially at that level. Emissions information is often available at subnational levels, for example for US states (US EPA), Chinese provinces (Liu et al., 2015), and Indian states (Klimont et al., 2017). Allocating emissions at this level would improve the overall distribution of emissions over large countries, even if the same global proxy datasets were used for each individual state or province. For global-scale models, this would likely provide sufficient accuracy, assuming the state-level emission time series were accurate. An effort is underway, for example, to implement a state-level disaggregation for the USA in the CEDS, which would fix the east–west
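A toy example shows the difference between allocating a national total and allocating state totals with the same proxy. In the Python sketch below, the function names are hypothetical, and each grid cell is assumed to belong to exactly one state. A uniform population proxy spreads a national total evenly. State-level totals recover an east–west gradient that the proxy alone cannot represent.

```python
import numpy as np


def allocate(total, proxy):
    """Split `total` across cells in proportion to a spatial proxy."""
    proxy = np.asarray(proxy, dtype=float)
    weight_sum = proxy.sum()
    if weight_sum <= 0:
        if total != 0:
            raise ValueError("nonzero emissions but no proxy weight to allocate them")
        return np.zeros_like(proxy)
    return total * proxy / weight_sum


def allocate_by_state(state_totals, proxy, state_of_cell):
    """Allocate each state's total using the same proxy, restricted to that state.

    Assumes every cell belongs to exactly one state; real grid cells that
    straddle borders would need fractional weights.
    """
    proxy = np.asarray(proxy, dtype=float)
    labels = np.asarray(state_of_cell)
    gridded = np.zeros_like(proxy)
    for state, total in state_totals.items():
        in_state = labels == state
        gridded[in_state] = allocate(total, proxy[in_state])
    return gridded


# Four cells, uniform population proxy; the eastern state emits three times more.
population = [1.0, 1.0, 1.0, 1.0]
states = ["West", "West", "East", "East"]
print(allocate(40.0, population).tolist())  # [10.0, 10.0, 10.0, 10.0]
print(allocate_by_state({"West": 10.0, "East": 30.0}, population, states).tolist())
# [5.0, 5.0, 15.0, 15.0]
```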
Detailed spatial emission data have been developed for many regions for air quality modeling purposes, and incorporating these data into global inventories would likely improve spatial distributions overall. One method is to simply “stitch” together detailed spatial maps where they are available (Janssens-Maenhout et al., 2015). One disadvantage of this approach, however, is that the resulting datasets are often inconsistent: because detailed spatial data are not available for every year, different years must be used in different regions. Inconsistencies also arise at country borders where grid cells overlap. Another disadvantage is that these detailed data are often time-consuming to generate, so these regional datasets are often not available for the most recent years. An alternative to stitching together emission grids is to incorporate the underlying spatial proxy distributions into workflows such as those described here. This would presumably improve spatial distributions for regions with such data while remaining flexible enough to allow greater temporal consistency between regions. One challenge, however, is that these detailed datasets are often only available for recent decades. For longer-term modeling, some method of extending these proxy data back in time would be needed, perhaps by blending them with simpler proxy data such as population distributions.
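One way to read "blending with simpler proxy data" is as a time-dependent weighted average of two normalized proxies. The Python sketch below assumes a linear ramp between two hypothetical years. The ramp shape, the years, and the function name are illustrative assumptions, not a method used for the CMIP6 data.

```python
import numpy as np


def blended_proxy(year, detailed, simple, ramp_start, ramp_end):
    """Blend a simple proxy (e.g., population) into a detailed regional proxy.

    Before ramp_start only the simple proxy is used; from ramp_end onward only
    the detailed one. In between, the weight on the detailed proxy rises
    linearly. Both proxies are normalized to unit sum, so the result is again
    a set of spatial shares summing to one.
    """
    if ramp_end <= ramp_start:
        raise ValueError("ramp_end must be later than ramp_start")
    detailed = np.asarray(detailed, dtype=float)
    simple = np.asarray(simple, dtype=float)
    detailed = detailed / detailed.sum()
    simple = simple / simple.sum()
    w = np.clip((year - ramp_start) / (ramp_end - ramp_start), 0.0, 1.0)
    return w * detailed + (1.0 - w) * simple


detailed = [0.0, 1.0]    # detailed inventory puts everything in the second cell
population = [1.0, 1.0]  # population proxy splits evenly
for year in (1960, 1980, 2000):
    print(year, blended_proxy(year, detailed, population, 1970, 1990).tolist())
# 1960 [0.5, 0.5]
# 1980 [0.25, 0.75]
# 2000 [0.0, 1.0]
```

Because both inputs are normalized and the weight stays between 0 and 1, the blended field can be used directly as allocation shares in any year, which keeps the country totals unchanged.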
For large point sources in some regions direct measurements of emissions
through continuous emission monitoring (CEM) systems are in place, which
potentially provide highly accurate emission estimates for those specific
sources. Another source of emission information for large sources is
satellite remote sensing data. Fioletov et al. (2016), for example, have used
satellite data to develop a catalog of emission sources “missing” in
existing gridded inventories. Either of these data sources could be combined
with bottom-up estimates to produce a more robust emission dataset, with the
spatial location and temporal (e.g., year-to-year) variations of emissions
from large sources provided by CEMs or remote sensing analysis. This was
done, in a limited manner, for
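One plausible way to combine measured or satellite-derived large-source emissions with bottom-up totals is a residual scheme. Known point sources are placed at their locations, and the remaining country total is spread with the usual proxy. The Python sketch below illustrates this under stated assumptions. The function name and input layout are hypothetical, and this is not presented as the CEDS method.

```python
import numpy as np


def grid_with_point_sources(country_total, proxy, point_sources):
    """Place measured point-source emissions at their locations and spread the
    remaining bottom-up country total with the spatial proxy.

    proxy: 2-D array of proxy values over the country's grid cells.
    point_sources: iterable of (row, col, emissions) tuples, e.g., from CEMs or
    satellite-derived estimates, in the same units as country_total.
    """
    proxy = np.asarray(proxy, dtype=float)
    gridded = np.zeros_like(proxy)
    point_total = 0.0
    for row, col, emissions in point_sources:
        gridded[row, col] += emissions
        point_total += emissions
    remainder = country_total - point_total
    if remainder < 0:
        # Measured sources exceed the inventory total: the two datasets need
        # to be reconciled before gridding rather than silently clipped.
        raise ValueError("point sources exceed the country total")
    gridded += remainder * proxy / proxy.sum()
    return gridded


proxy = np.ones((2, 2))
grid = grid_with_point_sources(100.0, proxy, [(0, 0, 40.0)])
print(grid.tolist())  # [[55.0, 15.0], [15.0, 15.0]]
```

Because the point-source values can change from year to year while the country total is preserved, this kind of scheme would carry the temporal detail of large sources into the grid without changing the calibrated national totals.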
Additional data sources on specific large sources, such as literature sources and corporate reports, can also be valuable for providing temporal emission detail. Such information was used in the CEDS CMIP6 release, for example for the Kazakhmys smelter in Kazakhstan, the Raahe sintering plant in Finland, and the Algoma sintering plant in Canada. See the CEDS online documentation (
While there are many aspects of emissions data that can potentially be improved, it will be important to better assess which improvements are most critical so that efforts can be focused. For example, is a 10 % error in the overall emission level in a region more or less important than a 20 % east–west error in spatial allocation? These trade-offs will depend on the model and specific application; smaller-scale details, for instance, will be less important for coarser-resolution models. It would be useful to more systematically test the sensitivity of models to spatial allocation. While localized results will depend on spatial distribution, it is not clear how sensitive larger-scale results, such as circulation, temperature, and precipitation changes, are to differences in spatial allocation. One study directly examining this sensitivity (Geng et al., 2017) found that, once power plant point source and road network proxy data were used, there was a significant benefit from using industrial economic activity data to spatially allocate industrial sector emissions.
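How the two kinds of error in the example compare depends partly on the metric used. The Python sketch below assumes a toy four-cell region and interprets a "20 % east–west error" as 20 % of eastern emissions being allocated to western cells. Both the metrics and this interpretation are illustrative choices. A regional-total metric ranks the 10 % level error as worse. A cell-by-cell metric ranks the misallocation as worse.

```python
import numpy as np


def total_error(estimate, reference):
    """Relative error in the regional total."""
    return abs(estimate.sum() - reference.sum()) / reference.sum()


def cell_error(estimate, reference):
    """Sum of absolute cell-level differences, relative to the regional total."""
    return np.abs(estimate - reference).sum() / reference.sum()


reference = np.ones((2, 2))     # column 0 = west, column 1 = east

level_error = reference * 1.10  # 10 % too high everywhere

shifted = reference.copy()      # 20 % of eastern emissions placed in the west
moved = 0.20 * shifted[:, 1]
shifted[:, 1] -= moved
shifted[:, 0] += moved

for name, est in [("10 % level error", level_error), ("20 % east-west shift", shifted)]:
    print(f"{name}: total {total_error(est, reference):.2f}, "
          f"cell {cell_error(est, reference):.2f}")
# 10 % level error: total 0.10, cell 0.10
# 20 % east-west shift: total 0.00, cell 0.20
```

Neither metric says which error matters more for a given model. Both simply show that the answer depends on whether the application is sensitive to regional totals or to where emissions occur within the region.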
The current CMIP6 datasets were produced at either
0.5
Emission datasets would also benefit from more systematic comparisons with observations. Some observations, such as concentration ratios, can be directly compared to inventories (e.g., Hoesly et al., 2018), while, more generally, modeled concentrations need to be compared to observations. Where model biases can be ruled out, differences with observations can be used to inform improvements to inventory data. At present, however, this is performed in an ad hoc manner, with no regular systematic comparison across species and regions.
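Comparing inventories with observed concentration ratios first requires expressing inventory emissions, which are usually reported as mass, as molar ratios. The Python sketch below shows only that conversion step. The species, the rounded molar masses, and the assumption that NOx is reported as NO2-equivalent mass are illustrative, and the sketch ignores any atmospheric processing between source and measurement.

```python
# Molar masses in g/mol. Reporting NOx as NO2-equivalent mass is an assumption
# to check against the inventory's documentation.
MOLAR_MASS = {"CO": 28.01, "NO2": 46.01}


def inventory_molar_ratio(mass_a, species_a, mass_b, species_b):
    """Molar emission ratio a/b from mass emissions given in the same units."""
    return (mass_a / MOLAR_MASS[species_a]) / (mass_b / MOLAR_MASS[species_b])


# e.g., 10 units of CO and 2 units of NOx (as NO2) for the same region and period
ratio = inventory_molar_ratio(10.0, "CO", 2.0, "NO2")
print(f"CO/NOx molar ratio: {ratio:.2f}")  # CO/NOx molar ratio: 8.21
```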
Finally, it would be useful if the types of long-term emission datasets discussed here could be more regularly updated. In the past, most of these datasets were compiled once every 5 or so years in order to produce data for the next CMIP exercise. A paradigm of continuous improvement would allow for more incremental releases, allow new emissions data to be more thoroughly tested, and result in fewer differences between each subsequent dataset release. This is already the case for the GFED open burning data for the satellite era and is in progress for the CEDS anthropogenic data.
Below we provide a summary of the release history for gridded emissions data.
June and July 2016: preindustrial (1750–1850) data (ver. 2016-06-18) and historical 1851–2014 data (ver. 2016-07-26) for aerosol and reactive gas species were published on ESGF.
September 2016: files were republished in a new format (with sectors contained within a dimension) due to a limitation in ESGF software on some nodes (version numbers were amended with -sectordim). Data values did not change.
May 2017: new data version (ver. 2017-05-18) that corrected some errors in the previous gridded data (ver. 2016-06-18 and 2016-07-26; see README and analysis of difference between the new and old datasets at
August 2017: “rough cut” extension of
September 2017: new versions of aircraft gridded emissions (ver. 2017-08-30 and 2017-10-05 for
V1.2 (December 2016). Sectoral contributions for all species are now included (in v1.1 only for CO). Minor modifications to the interannual variability between 1960 and 1997 for tropical regions.
V1.1 (November 2016). Updated emissions, mainly for the boreal regions. Global total fire carbon emissions decreased by about 5 % (
V1.0 (
As of this writing only one version of the future emissions has been
published. Most species and data files were published on ESGF on 28 June 2018.
Fossil
The codes and data described in this paper are available in a number of open-source data and code repositories. The gridded data provided for CMIP6 are
available on ESGF, as also described in Table 4 for gridded historical
anthropogenic and future scenario emission data files. Data references,
including DOIs for each gridded dataset, are also provided below.
Historical anthropogenic emissions:
Historical open burning emissions:
Future emissions downscaling and gridding:
CMIP6 gridded data files:
The supplement related to this article is available online at:
LF developed historical and future emissions gridding methodology and code, produced historical anthropogenic gridded emissions, and developed the initial paper outline and text. SJS administered the project, participated in data development, and produced the initial full paper text. CB extended future emissions gridding code and produced future scenario gridded emissions. LF, MC, MG, ZK, MvM, MvdB, and GRvdW commented on the paper text. MC, ZK, MvdB, and RH supplied key data for anthropogenic emissions. MvM and GRvdW produced data on historical open burning emissions. MG led the overall production of the harmonized future emissions scenarios.
The authors declare that they have no conflict of interest.
This research at JGCRI was supported by the National Aeronautics and Space Administration's Atmospheric Composition: Modeling and Analysis Program (ACMAP), award NNH15AZ64I (historical emissions gridding), the U.S. Department of Energy, Office of Science, as part of research in Multi-Sector Dynamics, Earth and Environmental System Modeling Program (future scenario gridding), and the DOE Office of Science, Biological and Environmental Research (historical emissions development). The Pacific Northwest National Laboratory is operated for the DOE by the Battelle Memorial Institute under contract DE-AC05-76RLO1830. Work on future scenario data received funding from the European Union's Horizon 2020 research and innovation program under grant agreement no. 641816 (CRESCENDO). The authors would like to acknowledge Chris Heyes for gridded data development and validation at IIASA, Kalyn Dorheim for helpful comments, and Matthew Nicholson for assistance with figures.
This research has been supported by the U.S. Department of Energy, Office of Science (grant no. DE-AC05-76RLO1830), the European Union Horizon 2020 research and innovation program under grant agreement no. 641816 (CRESCENDO), and the National Aeronautics and Space Administration (grant no. NNH15AZ64I).
This paper was edited by Jason Williams and reviewed by two anonymous referees.