A machine learning-based dissolved organic carbon climatology

Posted by Design Studio

26 June 2024

Challenge 3: Marine Biota

T Panaïotis¹, J Wilson² & BB Cael¹

¹National Oceanography Centre, Southampton, UK

²University of Liverpool, Liverpool, UK



Ocean dissolved organic carbon (DOC) is a major reservoir of carbon in the ocean-atmosphere-climate system, containing as much carbon as the preindustrial atmosphere. Although DOC is thought to be a reactive reservoir that impacts climate, it is one of the least constrained parts of the global carbon cycle. The lack of a comprehensive DOC climatology hinders model validation, estimation of the modern DOC inventory, and understanding of DOC’s role in the carbon cycle and climate. In this work, we use machine learning (ML) to relate DOC observations to other variables for which climatologies are already available. We then use the inferred relationships to predict DOC where no observations exist, thereby generating a DOC climatology.


Method: data and machine learning (ML) model

DOC observations were spatially matched with annual climatologies of various biogeochemical variables (temperature, oxygen, nutrients…). Boosted regression trees (BRTs) were used to relate DOC observations to these predictors in four distinct layers: surface (< 10 m), epipelagic (10 – 200 m), mesopelagic (200 – 1000 m) and bathypelagic (> 1000 m). This subdivision separates the depth-dependent processes that drive DOC content, and yields products suited to different analyses (e.g. a surface climatology for constraining satellite products). For each prediction task, we used 10-fold nested cross-validation, which tunes the model while also giving robust estimates of its performance (one R² value per fold). The inferred relationships between DOC observations and the predictors were then extrapolated layer by layer to the entire globe to compute annual DOC climatologies and their uncertainties.
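The nested cross-validation described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' pipeline: the booster uses depth-1 trees (stumps) as a stand-in for a full BRT library, and the predictors, the synthetic data, the tuning grid (`n_trees`), the learning rate and the three inner folds are all assumptions made for the example. Only the 10 outer folds and the one-R²-per-fold output come from the text.

```python
import random


def avg(values):
    return sum(values) / len(values)


def fit_stump(X, residuals):
    """Return the (feature, threshold, left mean, right mean) split with lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = avg(left), avg(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]


def fit_brt(X, y, n_trees, learning_rate=0.3):
    """Boost depth-1 trees on the residuals of the running prediction."""
    base = avg(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, residuals)
        lo, hi = learning_rate * ml, learning_rate * mr  # shrunken left/right steps
        stumps.append((j, t, lo, hi))
        pred = [p + (lo if row[j] <= t else hi) for row, p in zip(X, pred)]
    return base, stumps


def predict_brt(model, X):
    base, stumps = model
    return [base + sum(lo if row[j] <= t else hi for j, t, lo, hi in stumps) for row in X]


def r2(y_true, y_pred):
    y_bar = avg(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


def split(ids, k):
    """Yield (train, held_out) index lists for each of k interleaved folds."""
    for f in range(k):
        held_out = ids[f::k]
        excluded = set(held_out)
        yield [i for i in ids if i not in excluded], held_out


def nested_cv(X, y, grid, k_outer=10, k_inner=3, seed=0):
    ids = list(range(len(y)))
    random.Random(seed).shuffle(ids)

    def score(train, test, n_trees):
        model = fit_brt([X[i] for i in train], [y[i] for i in train], n_trees)
        return r2([y[i] for i in test], predict_brt(model, [X[i] for i in test]))

    outer_scores = []
    for train, test in split(ids, k_outer):
        # Inner loop: pick n_trees using the outer-training data only.
        best = max(grid, key=lambda n: avg([score(tr, va, n) for tr, va in split(train, k_inner)]))
        # Outer loop: refit on all outer-training data, score on the untouched fold.
        outer_scores.append(score(train, test, best))
    return outer_scores


# Synthetic stand-in data: two hypothetical predictors (nitrate, temperature).
rng = random.Random(42)
X = [[rng.uniform(0, 30), rng.uniform(-2, 30)] for _ in range(80)]
y = [80 - 1.2 * n + 0.5 * t + rng.gauss(0, 2) for n, t in X]
scores = nested_cv(X, y, grid=[5, 25])
print(len(scores), "outer-fold R2 values, mean", round(avg(scores), 2))
```

The point of the nesting is that the fold used to report R² never influences the choice of hyperparameters, so the reported scores are not optimistically biased by the tuning step.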


Results: DOC projections, uncertainties and global estimate

Prediction performance was satisfactory, with R² values within 0.6 – 0.8 for all layers. At the surface, the Southern Ocean is particularly depleted in DOC (Figure 1), while higher values are found in coastal areas. In both the mesopelagic and bathypelagic layers, high DOC values are found in the Atlantic, in accordance with the ventilation of that basin by the Atlantic Meridional Overturning Circulation. Nitrate is the most important predictor in the upper ocean layers, which suggests that upper-ocean DOC concentrations are primarily determined by plankton ecosystem productivity. Conversely, oxygen is the most important predictor in the bathypelagic layer but not in the upper layers, which is consistent with deep-ocean DOC concentrations being determined by the rate at which deep-ocean heterotrophic bacteria consume DOC. Finally, by integrating our predictions over the globe, we estimate the total ocean DOC inventory to be 691 PgC, in line with values from the literature.
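The global integration step amounts to summing concentration times volume over all grid cells and converting moles of carbon to mass. The sketch below shows that arithmetic under stated assumptions: concentrations in mmol C m⁻³ (numerically equal to µmol C L⁻¹), a regular latitude-longitude grid on a spherical Earth, and no land or bathymetry mask. The function names and the example cell values are hypothetical and are not taken from the authors' product.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius
C_MOLAR_MASS_G = 12.011    # grams of carbon per mole
G_PER_PG = 1e15            # grams per petagram


def cell_area_m2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude cell on a sphere."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_M ** 2 * math.radians(dlon_deg) * band


def inventory_pgc(cells):
    """Sum DOC mass over cells given as (lat_south, lat_north, dlon_deg, thickness_m, doc_mmol_m3)."""
    total_g = 0.0
    for lat_s, lat_n, dlon, thickness, doc in cells:
        volume_m3 = cell_area_m2(lat_s, lat_n, dlon) * thickness
        total_g += doc * 1e-3 * volume_m3 * C_MOLAR_MASS_G  # mmol -> mol -> g
    return total_g / G_PER_PG


# Illustrative cells only: two 1-degree surface cells with made-up concentrations.
cells = [
    (-60.0, -59.0, 1.0, 10.0, 45.0),
    (0.0, 1.0, 1.0, 10.0, 70.0),
]
print(f"{inventory_pgc(cells):.2e} PgC")
```

Weighting by cell area matters because cells of equal angular size shrink towards the poles, so an unweighted mean of predictions would over-represent high latitudes.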



In conclusion, we show that ML is a powerful tool for constructing a global climatology from a limited number of DOC observations. Our climatology is useful not only for placing empirical, quantitative constraints on the present-day DOC inventory, but also for validating both prognostic and diagnostic models of DOC. Further observations should allow us to refine this product and eventually predict large DOC shifts in the context of global climate change. Both the climatology products and the code will be made available online.

Figure 1: Map of annual DOC predictions in the surface (0 – 10 m) layer, averaged across CV folds.