Climate Downscaling using SRGAN with Multi-Task Learning | by Jack Shi Wei Lun | Jun, 2023



Pixels to Precipitation: Unraveling the complexities and significance of climate downscaling, and how to perform it using machine learning techniques

Downscaling image. Source: https://www.engr.scu.edu/~emaurer/old/research.shtml

With the recent advances in AI, it is inevitable that climate science can also be improved through the use of machine learning. An insightful paper recently authored by top machine learning experts provides a comprehensive overview of potential machine learning applications in fighting climate change, and advocates for using machine learning to tackle this global challenge. Many articles on Medium have also begun to emphasize the importance of climate change and describe how machine learning can help.

However, none of the climate articles on Medium provide a practical, in-depth technical overview of how to use machine learning specifically for climate downscaling, a crucial component of climate science. Furthermore, the majority of the technical articles only explain typical 1-D time-series or 2-D heat map analysis of a few variables, or the use of gradient boosting algorithms for tabular climate data.

Besides that, climate terminology is often obscure and confusing, with surprisingly few good explanations available online. Even with a vast amount of good climate data available, only a few people are willing to experiment with these data. That being said, experimenting with other data related to popular stock tickers or the latest craze regarding GPT-4 would probably excite these same people.

I don’t blame them.

A typical multidimensional gridded climate dataset. Source: https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html

For one, climate data are just much more complex to work with. Scripting, extracting, and post-processing raw climate data take way too much effort compared to a stock ticker. Beyond the usual time-series analysis, almost all of these climate data are multi-dimensional. They may require Python libraries like Xarray to unpack, instead of fitting neatly into rows and columns.

What Is This Article About?

This first part of a two-part article aims to provide a comprehensive head start to readers who are venturing into climate science but feel lost in the vast amount of information, or who want to explore climate data using machine learning techniques on their own. I have accumulated various resources that benefitted me along the way and will be sharing them with you in this article.

Although many different cool topics within climate science may be tackled using machine learning (e.g., predictions of extreme weather events or CO2 emissions, climate change policy exploration using reinforcement learning), we shall focus specifically on climate downscaling in this article. Multi-task learning with the Super Resolution Generative Adversarial Network (SRGAN) architecture will be the method used to downscale coarse climate models [7] and improve weather forecasts. I chose this method because I feel it nicely combines novelty with two essential challenges in improving climate science: bias correction and downscaling.

In this article, we will provide an overview of what exactly climate downscaling is and why it matters, describe some of the methods used for climate downscaling, and discuss three important types of climate data that are commonly used as the input and ground truth for most machine learning models.

We will then explain what SRGAN and multi-task learning are, and how they can be applied together for climate downscaling. Then, we shall dive into the study area and input data used, explain the loss functions for the two specific tasks, and provide explanations and Python code for the full architecture.

Feel free to skip to any main section by clicking on the respective link in the table of contents below.

Table of Contents

Source: https://loca.ucsd.edu/

Have you ever started watching a YouTube video, only to be frustrated by its initial blurriness, leading you to wonder why your internet is acting up? Then, as if by magic, the video suddenly becomes crystal clear. That is basically what we are trying to do in this article (albeit intentionally): obtaining high-resolution meteorological variables from coarse ones.

With the onset of global warming in recent years, temperature (along with other meteorological fields) and the hydrological cycle are expected to intensify, exacerbating the impacts of extreme weather events such as heat waves, droughts, and flooding. These events have grim socio-economic consequences [4], ruining communities, infrastructure, agriculture, and other human-environment systems.

Unfortunately, most of the important planning decisions have to be made based on coarser-resolution climate data, as high-fidelity, high-resolution climate data are not readily available. This directly affects the usefulness of the policies and the accuracy of any climate change impact assessments.

Forest fire in the Boise National Forest, Idaho, due to human-induced climate change. Source: https://education.nationalgeographic.org/resource/influence-climate-change-extreme-environmental-events/ and photograph by David R. Frazier.

Here is where climate downscaling comes into play.

It plays a critical role in aiding various applications, including agriculture, water resources management, and urban planning. For instance, agricultural planning requires knowledge of local climate conditions such as temperature, precipitation, and soil moisture to optimize crop yields and reduce crop failure risks. Furthermore, downscaled climate projections can provide essential information for designing resilient infrastructure that withstands future climate conditions.

Climate downscaling can also help policy-makers and decision-makers to evaluate the potential impacts of climate change at the regional level and plan adaptation measures accordingly. This accurate information can then inform government policies to hasten resilience initiatives regarding climate change impacts, such as sea-level rise, drought, and extreme weather events.

An example of the downscaling process for maximum recorded temperature, converting coarse data to a higher resolution. Source: Databasin.org

In climate science, downscaling refers to the process of refining coarse-resolution global climate models (GCMs) to fine spatial scale ground station data [1, 2], as shown in the image above. Its main goal is to bring GCM data into closer agreement with station-level data [3]. It is an essential tool for climate scientists to improve the accuracy of GCMs by providing more detailed information about regional climate patterns. The most common meteorological variables of interest to be downscaled are usually precipitation and temperature (although the former is much harder to downscale due to the chaotic behavior of its components and its high intermittency in space and time). In this article, we shall focus on downscaling precipitation.

Typically, climate downscaling is performed using either dynamical or statistical methods.

Dynamical downscaling involves the use of high-resolution regional climate models (RCMs) to better resolve finer features of climate variables at the surface, such as temperature, precipitation, and wind patterns. This method requires running the GCM output through a regional climate model with higher resolution and more detailed information about topography and land use. This process yields a more detailed understanding of regional climate patterns and their effects on local communities and ecosystems, but it requires significantly more computing power.

Dynamical downscaling process. Source: By the author, with inspiration from Geophysical Fluid Dynamics Laboratory.

Statistical downscaling, on the other hand, involves using statistical techniques to estimate local climate variables based on the relationship between large-scale and local climate variables. This method is computationally efficient and can be useful for long-term projections of future climate change impacts.

Statistical downscaling process. Z, T, Q represent geopotential height, air temperature and specific humidity respectively at different pressure levels. Various other predictors can also be used. Source: https://bookdown.org/floriandierickx/bookdown-demo/climate-data-discovery.html
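To make the idea of statistical downscaling concrete, here is a minimal sketch that fits a linear relationship between three large-scale predictors and one local variable. Everything in it is an assumption for illustration: the data are synthetic, the predictor count and coefficients are made up, and ordinary least squares is only one of many possible statistical techniques.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 200 days of three standardized large-scale
# predictors (think Z, T, Q) and one local variable observed at a station.
n_days = 200
predictors = rng.normal(size=(n_days, 3))
true_coefs = np.array([1.5, -0.8, 0.3])  # made up for this demo
local_obs = predictors @ true_coefs + 2.0 + rng.normal(scale=0.1, size=n_days)

# Fit local = intercept + predictors @ coefs by ordinary least squares.
design = np.column_stack([np.ones(n_days), predictors])
params, *_ = np.linalg.lstsq(design, local_obs, rcond=None)
intercept, coefs = params[0], params[1:]


def downscale(large_scale):
    """Estimate the local variable from new large-scale predictor values."""
    return intercept + np.asarray(large_scale) @ coefs


estimate = downscale([0.5, 0.1, -0.2])
```

Once fitted, the same relationship can be applied to predictor values from a coarse model to estimate the local variable, which is why this family of methods is so cheap to run.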

In the introduction section of this paper, the authors explain further and provide other papers relevant to the comparison between dynamical and statistical downscaling methods.

Here are also some good resources on how climate models work, and the history of climate modeling:

In this section, we look at some of the traditional and latest deep learning methods used for climate downscaling. For each method mentioned below, a link to a paper is provided. Do note that the methods are only explained briefly, as an in-depth explanation of those methods is out of the scope of this article. One should also bear in mind that numerous downscaling methods are not mentioned below, such as transfer function models, long short-term memory networks, transformers, and many more.

Some statistical methods used for downscaling of GCMs. Source: https://ore.exeter.ac.uk/repository/handle/10871/17139

Weather Typing

The method of weather typing involves the classification of atmospheric circulation patterns into a finite number of categories. Stochastic models are then used to simulate weather types, while conditional probabilities establish the link between rainfall occurrence and weather type. The resulting weather types can be used to simulate precipitation and other hydro-meteorological processes.

One of the key benefits of this approach is that it considers the relationship between climate at a large scale and weather at a local scale. Weather typing methods are objective, have a sound physical basis, and can be used for multisite downscaling [11]. This method also has the potential to generate long sequences of daily precipitation data at a given site using limited historical data sets.

Weather Generators

Weather generators (or stochastic rainfall models) are statistical models that generate synthetic sequences of weather variables that statistically resemble daily weather data at a specific location. The Markov chain method is one of the basic categories of daily weather generators. In the Markov chain approach, a random process determines the daily rainfall occurrence based on the state of the previous day, while the amount of rainfall is drawn from a probability distribution. The parameters of weather generators can be conditioned on large-scale states or relationships between large-scale and local-scale parameters for statistical downscaling. This conditioning can subsequently improve spatial correlation and alleviate the underestimation of inter-annual variations in weather variables, and is a very inexpensive way to do so.
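The Markov chain approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the transition probabilities are invented, and the gamma distribution for wet-day amounts is one common choice rather than something this article prescribes.

```python
import numpy as np


def generate_rainfall(n_days, p_wet_after_dry, p_wet_after_wet,
                      gamma_shape, gamma_scale, seed=0):
    """Two-state (dry/wet) first-order Markov chain daily rainfall generator."""
    rng = np.random.default_rng(seed)
    rain = np.zeros(n_days)
    wet_yesterday = False
    for day in range(n_days):
        # Occurrence: the chance of rain today depends only on yesterday's state.
        p_wet = p_wet_after_wet if wet_yesterday else p_wet_after_dry
        wet_today = rng.random() < p_wet
        if wet_today:
            # Amount: drawn from a probability distribution (gamma assumed here).
            rain[day] = rng.gamma(gamma_shape, gamma_scale)
        wet_yesterday = wet_today
    return rain


# Hypothetical parameters, chosen only for the demo.
series = generate_rainfall(3650, p_wet_after_dry=0.3, p_wet_after_wet=0.7,
                           gamma_shape=0.8, gamma_scale=10.0)
wet_fraction = (series > 0).mean()
```

For statistical downscaling, the two transition probabilities and the amount distribution would be conditioned on the large-scale state instead of being fixed constants as they are here.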

Regional Climate Models

A regional climate model (RCM) is obtained by dynamically downscaling a GCM, as mentioned earlier. It is similar to a GCM in that it simulates the physical processes in the climate system. RCMs cover a limited area of the globe and are run at much finer spatial resolution (i.e., 1–50 km grid spacing versus 100–300 km grid spacing in a GCM). Thus they can simulate the interactions between large-scale weather patterns (from a GCM) and the local terrain. To obtain an RCM, GCM output data are used to force the RCM at its boundaries, and the RCM dynamically downscales the GCM by producing fine-scale weather patterns consistent with the coarse-resolution features in the GCM.

The disadvantages of an RCM are that it is computationally expensive and cannot explicitly remove systematic differences (biases) between the GCM and observations as statistical methods can [5]. Thus, for many applications, some bias corrections are commonly applied to the results to remove the combined biases of the GCM and RCM.

Convolutional Neural Networks

Convolutional neural networks (CNNs) have also been used as a promising tool for climate downscaling. They can extract relevant spatial and temporal features from large-scale climate data and predict local-scale climate variables with higher accuracy and spatial resolution. The ability of CNNs to capture complex spatiotemporal patterns in climate data allows them to reproduce the fine-scale variability of local climate processes not resolved in GCMs.

However, like any machine learning method, CNNs require large amounts of training data, and biases in the training data can affect their performance. The limited interpretability offered by these “black-box” CNN models [6] may also hinder extrapolation analysis, which limits their usefulness in certain applications. Nonetheless, CNNs offer a promising avenue for advancing climate downscaling research and improving our understanding of local-scale climate processes and their impacts on regional and global climate variability.

In this section, we discuss three types of climate data commonly used as inputs, ground truth, or output validation for machine learning models.

Many different sources provide reforecast and reanalysis data; you can find some of them here: GEFSv12 reanalysis, GEFSv12 reforecast (or here), and ERA5 reanalysis. For our experiment in the second part of the article, we will make use of GEFSv12 reforecast and GEFSv12 reanalysis as our datasets. More details on the data used will be explained in the second part.

In general, it can be expected that reforecast data are used as inputs, and reanalysis data can be used as ground truth (not always the case!). Station data can then be used as ground truth as well, or to validate the final output of the model, since they are expected to be closest to the actual measured variables, assuming no errors with the stations.

Reforecast

Reforecasts (also known as ‘retrospective forecasts’ or ‘hindcasts’) are forecasts run over past dates. In other words, they are produced today but start from some point in the past [8]. They are usually used to assess the skill of a forecast system (i.e., predictability) or to develop tools for bias correction of the models. Like all forecasts, reforecasts require a set of appropriate initial conditions, which reanalysis can readily supply [8].

For example, the Global Ensemble Forecast System, version 12 (GEFSv12) that we will be using in our experiment in the next part of this article uses 5-member operational ensemble reforecasts that are initialized by the GEFSv12 reanalysis once per day from 00 UTC initial conditions, spanning the period 2000–2019. Once per week, an 11-member reforecast was generated, and these extend in lead time to +35 days [9], although we will not be using these. This in turn can be used to determine long-range forecast anomalies such as the weekly mean deviation of forecasted variables [8]. This makes reforecasts ideal as an input dataset for machine learning models.

Here is a journal article explaining the importance of reforecast datasets:

Reanalysis

Have you ever wondered how scientists are able to obtain the temperature of the Earth’s system hundreds of years ago? Surely, they did not have the measuring tools to do that back then. Even when there were observations as time progressed, they were usually scattered very far apart and distributed unevenly around the world. Today, our skies are littered with satellites, but a full and accurate picture of the Earth system is still not possible with these observations alone.

Therefore, reanalysis comes into play by assimilating various data sources of weather observations (explained in the next section) with model information. It then fills in the gaps between these observed data and reconstructs a comprehensive record of how weather and climate change over time. Hence, reanalyses are globally complete and consistent in time, and are sometimes referred to as ‘maps without gaps’ [10]. This makes reanalysis ideal as a ground truth dataset for machine learning models.

Here are two interesting videos that compare reanalysis data to a football kick and a jigsaw puzzle:

Station/Observational

As the name suggests, station data are physical properties such as weather conditions, wind speed, cloud cover, and temperature recorded at a given reporting station. These stations include various sources such as meteorological satellites, drifting ocean buoys, ships, weather balloons, aircraft, and many more (these data can be assimilated for reanalysis). Station data are usually collected using standardized methods and instruments, ensuring high data quality and consistency. Quality control measures are often applied to identify and correct errors or biases in the data, making them a reliable source of climate information for scientific research and applications. Therefore, they are mainly used as a ground truth dataset in machine learning models.

However, do note that most station data are non-gridded in space, and you will probably need to transform (i.e., interpolate) them into gridded data for direct and accurate validation/comparison with the (likely) gridded output obtained from your machine learning models.
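As a rough illustration of turning scattered station values into a grid, here is a minimal sketch using inverse distance weighting (IDW). IDW is just one simple interpolation choice picked for this demo; the station coordinates and values are hypothetical, and distances are computed in plain degrees rather than kilometers for brevity.

```python
import numpy as np


def idw_to_grid(station_lon, station_lat, station_values,
                grid_lon, grid_lat, power=2.0):
    """Interpolate scattered station values onto a regular lat/lon grid."""
    station_lon = np.asarray(station_lon, dtype=float)
    station_lat = np.asarray(station_lat, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    lon2d, lat2d = np.meshgrid(grid_lon, grid_lat)  # each (n_lat, n_lon)
    # Distance from every grid point to every station: (n_lat, n_lon, n_stations)
    dist = np.hypot(lon2d[..., None] - station_lon,
                    lat2d[..., None] - station_lat)
    # Closer stations get larger weights; the clamp avoids division by zero.
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    return (weights * station_values).sum(axis=-1) / weights.sum(axis=-1)


# Hypothetical stations (lon, lat, rainfall in mm).
lons = [103.7, 103.8, 103.9, 104.0]
lats = [1.3, 1.4, 1.3, 1.4]
values = [2.0, 5.0, 0.0, 3.5]

grid_lon = np.linspace(103.6, 104.0, 5)
grid_lat = np.linspace(1.2, 1.5, 4)
gridded = idw_to_grid(lons, lats, values, grid_lon, grid_lat)
```

The resulting array has one value per grid cell and can be compared cell by cell against gridded model output.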

Here are some good resources on how to handle climate data:

Next, we give a brief description of how SRGAN and Multi-Task Learning work, and how they can be applied together in climate downscaling. Then, we will delve into the study area and data used, explain the loss functions used for the two specific tasks, and provide Python code for the full architecture.

For the full scripts and cleaned climate data files, please refer to my GitHub. However, feel free to scrape your own GEFSv12 data for a study area of your own preference from NOAA; there are plenty of resources online showing you how to do that. For the time being, the results for .h5 models are not uploaded. Feel free to experiment with the code yourself.

As mentioned in the first part of the article, there are many ways to go about climate downscaling. In this second part, we shall focus on SRGAN with multi-task learning for climate downscaling.

Before we go into the full architecture used for climate downscaling, let’s look at what exactly multi-task learning and SRGAN are.

Multi-Task Learning

Multi-task learning with 2 specific tasks. Source: By the author

Multi-task learning (MTL) is a machine learning paradigm that enables the simultaneous learning of multiple related tasks using a shared model. Due to its ability to increase model performance and generalization, decrease overfitting, and improve efficiency, MTL has been gaining traction in the machine learning community. Currently, MTL already has a wide range of applications in natural language processing, computer vision (Tesla Autopilot uses this approach), and speech recognition, among others.

One of the key benefits of MTL is the ability to transfer knowledge between somewhat related tasks. By leveraging shared representations across tasks, we hope that MTL can improve the performance of individual tasks. Furthermore, MTL can help address the issue of domain shift, where the distribution of the data changes across different tasks or domains (e.g., an algorithm trained on newswires might have to adapt to a new dataset of biomedical documents [12]). By jointly learning across tasks, the model can learn to generalize better to new domains, leading to improved performance on unseen data.

Here are 2 Medium articles that further explain how MTL works:

Super Resolution Generative Adversarial Networks

The Super Resolution Generative Adversarial Network (SRGAN) is a deep learning technique used to enhance the resolution and quality of low-resolution images. SRGAN is an extension of Generative Adversarial Networks (GANs), which are a type of deep neural network used for generating synthetic data. The goal of SRGAN is to generate high-quality images with a higher resolution than the input images.

SRGAN works by using a generator network to upscale the low-resolution images and a discriminator network to distinguish between the generated high-resolution images and the real high-resolution images. The generator network learns to generate high-resolution images by minimizing the difference between the generated and real images, while the discriminator network learns to tell the generated and real images apart.

One of the main advantages of SRGAN is its ability to generate photo-realistic images with high resolution. SRGAN has shown significant improvement over traditional interpolation-based methods, which only interpolate the pixel values of low-resolution images to create high-resolution images. SRGAN can also be used in a variety of applications, including medical imaging, satellite imaging, and video compression, among others.

Here are 2 Medium articles that provide detailed explanations of how SRGAN works:

Now, we have come to the essence of the article. So how can we combine SRGAN with MTL for climate downscaling applications? For the sake of brevity and simplicity, the rest of this section shall assume that we are downscaling precipitation. Let’s discuss SRGAN first.

SRGAN in Climate Downscaling

Using only the generator network portion shown in the red box as our shared layers for MTL. Source: https://arxiv.org/pdf/1609.04802.pdf

We can exploit just the generator network of the SRGAN architecture by treating it as the shared network prior to splitting into tasks. The multiple residual blocks in the generator network can extract fine spatial precipitation features while preventing degradation problems in the deep neural network through the use of convolutional and batch normalization layers. Compared to normal CNN architectures, residual blocks can improve the performance of very deep networks without experiencing accuracy saturation and degradation, as residual blocks perform residual mapping and include skip connections [7].

In this article, the skip connection bypasses layers and rejoins the subsequent layers through element-wise addition. A total of 16 residual blocks were used in the SRGAN architecture, making the network extremely deep and capable of extracting fine spatial features.
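To show what a skip connection with element-wise addition looks like, here is a simplified NumPy sketch of a residual block. It is not the architecture code from this article: batch normalization is left out, ReLU stands in for the activation, the channel count of 4 is arbitrary, and the random kernels are placeholders for learned weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_same(x, kernels):
    """3x3 convolution with zero padding.

    x: (H, W, C_in), kernels: (3, 3, C_in, C_out) -> output (H, W, C_out).
    """
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))  # (H, W, C_in, 3, 3)
    return np.einsum("hwcij,ijco->hwo", windows, kernels)


def residual_block(x, kernels_1, kernels_2):
    """Return x + F(x), where F is conv -> ReLU -> conv."""
    hidden = np.maximum(conv3x3_same(x, kernels_1), 0.0)
    residual = conv3x3_same(hidden, kernels_2)
    return x + residual  # the skip connection: element-wise addition


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 11, 4))  # an 8 x 11 grid with 4 feature channels

out = features
for _ in range(16):  # 16 stacked residual blocks, each with its own kernels
    k1 = rng.normal(scale=0.1, size=(3, 3, 4, 4))
    k2 = rng.normal(scale=0.1, size=(3, 3, 4, 4))
    out = residual_block(out, k1, k2)
```

Because each block only adds a correction to its input, a block whose kernels are all zero simply passes the input through unchanged, which is part of why very deep stacks of such blocks remain trainable.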

MTL in Climate Downscaling

Downscaling of precipitation consists of two main components.

First, we need to ensure that the precipitation variable is bias corrected. Direct model output could be used, but these outputs are often not useful because of significant biases (e.g., rainfall can be consistently too high or low, the model simulates the monsoon incorrectly, the rains start too early or too late, and models tend to overestimate the number of days with rain and underestimate precipitation extremes [13]).

Second, which is fairly obvious given that this is the whole purpose of downscaling, we need to ensure that the coarse precipitation input is downscaled into higher-resolution output.

Can you see how this can be framed as two distinct (albeit similar) tasks?

The first task (task 1) can be framed as a classification problem. We can classify precipitation into categories to bias correct it. For example, it can be classified by its absolute intensity (e.g., light rain for < 2.5 mm/h, moderate rain for 2.5–10 mm/h, heavy rain for 10–50 mm/h), or by various quantiles of the training dataset (e.g., the 50th, 75th and 95th quantiles). The main purpose of this is to ensure that the rainfall is bias corrected within the appropriate categories (i.e., heavy rain should not be wrongly classified/predicted as light rain).
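The quantile-based labelling for task 1 could look something like the sketch below. The helper names are hypothetical and the toy rainfall values are made up; the only details taken from the text are the three quantile thresholds, which yield 4 classes.

```python
import numpy as np


def fit_quantile_edges(train_rain, quantiles=(0.50, 0.75, 0.95)):
    """Compute class thresholds from the training data only."""
    return np.quantile(train_rain, quantiles)


def to_classes(rain, edges):
    """Map rainfall to classes 0..3:
    0: below 50th, 1: 50th to 75th, 2: 75th to 95th, 3: above 95th quantile."""
    return np.digitize(rain, edges)


def one_hot(labels, n_classes=4):
    """Turn integer labels into one-hot vectors along a new last axis."""
    return np.eye(n_classes)[labels]


train_rain = np.arange(1, 101, dtype=float)  # toy training sample: 1..100 mm
edges = fit_quantile_edges(train_rain)

labels = to_classes(np.array([0.0, 60.0, 80.0, 100.0]), edges)

field = np.full((8, 11), 80.0)  # a toy 8 x 11 rainfall field
field_one_hot = one_hot(to_classes(field, edges))  # shape (8, 11, 4)
```

Fitting the thresholds on the training set and reusing them for validation and test data keeps the class definitions consistent across splits.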

The second task (task 2) handles the downscaling (and, intrinsically, the bias correction, due to the nature of the model) of precipitation. This can be done through the use of upsampling blocks. Do note, confusingly, that the term ‘upsampling’ in computer vision refers to increasing the spatial resolution of an image, which is synonymous with the term ‘downscaling’ in climate science.

Combining the two task-specific layers with the shared layers, the full architecture is shown below:

Without further ado, let’s dive right in.

Study Area

Our study area will be Singapore, a densely populated city-state with an area of around 721 km², located off the southern end of the Malay Peninsula. It is separated from Malaysia by the Straits of Johor to the north and from Indonesia’s Riau Islands by the Singapore Strait to the south. Singapore is situated about one-and-a-half degrees (137 km) north of the equator, and has a tropical climate with abundant precipitation, high and consistent temperatures, and high humidity throughout the year.

Study area of our postprocessed coarse GEFSv12 data. Each grid represents a data point. One can observe that Singapore is only surrounded by six grid points, which is generally too coarse for use. Source: By the author

Input Data

For our coarse input data, we shall make use of the GEFSv12 reforecast dataset. We will use the GEFSv12 reanalysis dataset as the ground truth.

Since 2 ground truths are required for the 2 specific tasks, we will have y_class and y_hr, both coming from the GEFSv12 reanalysis dataset. y_class will be the classification version of GEFSv12 reanalysis (grouped into 4 quantile classes), while y_hr will be the bilinearly downscaled version of GEFSv12 reanalysis, using a total downscaling factor of 12 (3 upsampling blocks). Thus, starting from the 8 x 11 coarse grid, the final resolution of precipitation will be 96 x 132.
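As a quick check of the numbers, 8 x 12 = 96 and 11 x 12 = 132. The sketch below shows one way a bilinear interpolation by a factor of 12 could be done with NumPy alone. The function name and the align-corners convention are assumptions made for this demo; the actual routine used to build y_hr may differ.

```python
import numpy as np


def bilinear_upsample(field, factor):
    """Bilinearly interpolate a 2-D (lat, lon) field to a grid `factor` times finer.

    Uses an align-corners convention: the corner values of the coarse
    field are preserved exactly in the fine field.
    """
    h, w = field.shape
    new_h, new_w = h * factor, w * factor
    # Fine-grid positions expressed in coarse-grid index coordinates.
    rows = np.linspace(0, h - 1, new_h)
    cols = np.linspace(0, w - 1, new_w)
    # Linear interpolation along longitude, then along latitude.
    tmp = np.stack([np.interp(cols, np.arange(w), row) for row in field])  # (h, new_w)
    return np.stack([np.interp(rows, np.arange(h), col) for col in tmp.T], axis=1)


coarse = np.arange(8 * 11, dtype=float).reshape(8, 11)  # toy 8 x 11 field
fine = bilinear_upsample(coarse, factor=12)
print(fine.shape)  # (96, 132)
```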

Similar to how multiple features are usually used to predict a stock price, we can also use multiple meteorological variables to downscale 6-hourly precipitation. Do note that for total precipitation and precipitable water, we included all 5 members (1 control and 4 perturbed members) of the GEFSv12 reforecast data. For 2-meter temperature, convective inhibition and convective available potential energy, we selected only the c00 member.

Source: By the author. (The inspiration for which variables to choose for our input data came from https://arxiv.org/abs/2203.12297)
Coarse input data of dimensions 8 x 11 x 13. 8 x 11 refers to the latitude and longitude respectively, and 13 refers to the number of variables. Source: By the author

Postprocessing of Input Data

The datasets are split into training, validation, and testing sets. The time frame for training is from 2000-01-01 T06 to 2014-01-01 T06, validation is from 2014-01-01 T12 to 2016-12-31 T12, and testing is from 2016-12-31 T18 to 2019-12-31 T18, with totals of 20457, 4381, and 4381 samples respectively.

Rainfall samples are all log-transformed. A pseudo-count of 1 is added first (i.e., x + 1) so that rainfall values of 0 mm can be transformed. Variables that are not rainfall (i.e., 2-meter temperature, convective available potential energy, and convective inhibition) are not log-transformed. After that, the min-max normalization parameters are fitted on the training set only, for all variables. The same min-max normalization is then applied to the validation and test sets to prevent data leakage.
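A minimal sketch of this preprocessing is shown below, assuming a natural logarithm (the text only says log-transformed) and using made-up rainfall values. The helper names are hypothetical; the actual implementation lives in the notebooks.

```python
import numpy as np


def fit_min_max(train):
    """Learn the normalization range from the training set only."""
    return float(train.min()), float(train.max())


def apply_min_max(data, lo, hi):
    """Scale data with a range that was fitted elsewhere (no refitting)."""
    return (data - lo) / (hi - lo)


rain_train = np.array([0.0, 1.0, 4.0, 20.0])  # toy rainfall in mm
rain_val = np.array([0.0, 2.5, 30.0])

# log(x + 1): the pseudo-count keeps 0 mm finite (log1p(0) = 0).
log_train = np.log1p(rain_train)
log_val = np.log1p(rain_val)

lo, hi = fit_min_max(log_train)            # fitted on training data only
x_train = apply_min_max(log_train, lo, hi)
x_val = apply_min_max(log_val, lo, hi)     # same lo/hi reused: no leakage
```

Note that validation or test values can fall slightly outside [0, 1] when they exceed the training range, as the 30 mm value does here. That is expected when the scaler is fitted on the training set alone.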

The input datasets (train, validation, and test) are then stacked to form arrays of dimension n x 8 x 11 x 13 as shown below, where n is the sample size of each respective dataset.

import numpy as np

# 13 input variables: 5 total precipitation (apcp) members, 5 precipitable water (pwat) members,
# and the c00 member of cape, t2m and cin. Each array has shape (n, 8, 11).
lst_train_ensemble = [apcp_train_p01, apcp_train_p02, apcp_train_p03, apcp_train_p04, apcp_train_c00,
                      pwat_train_p01, pwat_train_p02, pwat_train_p03, pwat_train_p04, pwat_train_c00,
                      cape_train_c00, t2m_train_c00, cin_train_c00]

X_train_ensemble = np.stack(lst_train_ensemble, axis=-1)  # stacking 13 variables into 1 single 4D array

lst_val_ensemble = [apcp_val_p01, apcp_val_p02, apcp_val_p03, apcp_val_p04, apcp_val_c00,
                    pwat_val_p01, pwat_val_p02, pwat_val_p03, pwat_val_p04, pwat_val_c00,
                    cape_val_c00, t2m_val_c00, cin_val_c00]

X_val_ensemble = np.stack(lst_val_ensemble, axis=-1)  # stacking 13 variables into 1 single 4D array

lst_test_ensemble = [apcp_test_p01, apcp_test_p02, apcp_test_p03, apcp_test_p04, apcp_test_c00,
                     pwat_test_p01, pwat_test_p02, pwat_test_p03, pwat_test_p04, pwat_test_c00,
                     cape_test_c00, t2m_test_c00, cin_test_c00]

X_test_ensemble = np.stack(lst_test_ensemble, axis=-1)  # stacking 13 variables into 1 single 4D array

The concepts of log-transformation and min-max normalization apply similarly to the ground truth dataset as well. Do refer to the ‘notebooks’ folder in GitHub for more details.

For our two specific tasks, we make use of two loss functions. The main objective is to…

… minimize a linear combination of the individual tasks’ loss functions. Each task will have its own individual loss function L_i. So in (the training stage of) our multi-task model, we simply weight each loss function and minimize the sum of these weighted losses. (Devin Soni, in his Medium article “Multi-task learning in Machine Learning”)

Objective function for multi-task learning. Source: https://cs330.stanford.edu/lecture_slides/cs330_multitask_transfer_2022.pdf
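Written out for our two tasks, the weighted objective described in the quote takes the form below. Here θ stands for all trainable parameters (shared and task-specific), and the weights w_1 and w_2 are placeholders: their values are a tuning choice that is not specified in this article.

```latex
\min_{\theta} \; \sum_{i=1}^{2} w_i \, L_i(\theta)
\;=\;
\min_{\theta} \; \bigl[\, w_1 L_1(\theta) + w_2 L_2(\theta) \,\bigr]
```

L_1 is the classification loss for task 1 and L_2 is the downscaling loss for task 2, both described next.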

Task 1: L_1 = Weighted Categorical Cross Entropy Loss

The regular cross entropy loss function for a classification task can result in underestimation of the minority class. Thus, we can apply a weighted cross entropy as the loss function for the classification task in order to penalize errors on the heavy rain class (i.e., >95th quantile) more heavily. Our weights for the 4 categories are 4, 19, 23 and 56 for <50th quantile, 50th–75th quantile, 75th–95th quantile, and >95th quantile respectively, which follow the inverse proportion of the quantile classes (i.e., about 4% of the dataset falls in the >95th quantile class, and this becomes the numerical weight for the <50th quantile class).
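Here is a NumPy sketch of a weighted categorical cross entropy using the four class weights quoted above. It only illustrates the idea: the model code itself may scale or normalize the weights differently, and the function name is hypothetical.

```python
import numpy as np

# Class weights from the text: <50th, 50th-75th, 75th-95th, >95th quantile.
CLASS_WEIGHTS = np.array([4.0, 19.0, 23.0, 56.0])


def weighted_categorical_cross_entropy(y_true, y_pred,
                                       class_weights=CLASS_WEIGHTS, eps=1e-7):
    """y_true: one-hot labels, y_pred: predicted probabilities, both (..., 4)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    # Only the true class contributes, scaled by that class's weight.
    per_pixel = -(class_weights * y_true * np.log(y_pred)).sum(axis=-1)
    return per_pixel.mean()


# Same predicted probability (0.5) for the true class, different true classes.
pred = np.array([[0.5, 0.0, 0.0, 0.5]])
loss_light = weighted_categorical_cross_entropy(np.array([[1.0, 0.0, 0.0, 0.0]]), pred)
loss_heavy = weighted_categorical_cross_entropy(np.array([[0.0, 0.0, 0.0, 1.0]]), pred)
```

With these weights, being equally unsure about a heavy rain pixel costs 56 / 4 = 14 times more than being unsure about a light rain pixel, which is what pushes the model to pay attention to the rare class.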

Task 2: L_2 = Fractions Skill Score Loss

Unlike traditional convolutional outputs, where loss computations often revolve around pixel-to-pixel comparisons, our model uses a custom loss function that computes the loss over an area.

Supply: https://jye-lim.github.io/weather_prediction_app/

In this example, the predicted grids with rain are only one grid cell away from the observed ones. Using a built-in loss function, such as Mean Squared Error (MSE) loss, would result in the model being penalized twice for what could be considered a reasonable prediction. The first penalty would be applied to the grid that has observed precipitation but no predicted precipitation, while the second would apply to the grid with predicted precipitation but no observed precipitation. This is despite the model having fairly accurately identified the areas experiencing precipitation.

To overcome this issue, we implement a custom loss function called the Fractions Skill Score (FSS) loss [13]. The FSS loss scans an area of size m x m (where m refers to the user-defined mask size), calculates the average precipitation within that area, and then computes the loss between the true and predicted averages. This approach better accommodates the spatial nature of our data and avoids overly penalizing reasonable predictions.
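The NumPy sketch below shows the idea on a toy grid. It is my own simplified version (plain m x m averaging with stride 1 and no padding, no thresholding), not the project's make_FSS_loss, and the 5 x 5 grids are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_mean(field, m):
    """Average a 2-D field over every m x m window (stride 1, no padding)."""
    return sliding_window_view(field, (m, m)).mean(axis=(-2, -1))

def fss_loss(y_true, y_pred, m=3, eps=1e-7):
    """1 - FSS: neighbourhood MSE divided by its worst-case reference value."""
    true_avg = neighbourhood_mean(y_true, m)
    pred_avg = neighbourhood_mean(y_pred, m)
    mse_n = np.mean((true_avg - pred_avg) ** 2)
    mse_ref = np.mean(true_avg ** 2) + np.mean(pred_avg ** 2)
    return mse_n / (mse_ref + eps)

# Hypothetical 5 x 5 grids with a single rainy cell each
observed = np.zeros((5, 5))
observed[2, 2] = 1.0
shifted = np.zeros((5, 5))      # prediction one cell to the right of the truth
shifted[2, 3] = 1.0
far_off = np.zeros((5, 5))      # prediction in the far corner
far_off[0, 0] = 1.0

pixel_mse_shifted = np.mean((observed - shifted) ** 2)   # two cells penalized
pixel_mse_far = np.mean((observed - far_off) ** 2)       # exactly the same value
loss_shifted = fss_loss(observed, shifted)
loss_far = fss_loss(observed, far_off)
```

Pixel-wise MSE gives both predictions the same score, while the FSS loss with a 3 x 3 mask comes out at roughly 0.2 for the near miss and 0.8 for the far one, so the near miss is rewarded.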

Here, we create the network architecture for the residual block and the generator. Prior to splitting off into the two specific tasks, one convolutional layer (k3n256s1) follows the last element-wise addition operation to summarize the feature maps.

Only the generator network portion shown in the red box is used as our shared layers for MTL. Source: https://arxiv.org/pdf/1609.04802.pdf

We add on the two task-specific layers after the k3n256s1 convolutional layer in line 27 (task 1) and line 32 (task 2).

After we’re done with the full architecture, we can proceed to train the model.

Here is the code to load all the training and validation datasets:

I would like to highlight line 20 of the code below. In that line, we are using class_loss for task 1, and make_FSS_loss for task 2. Remember, output1 corresponds to task 1, and output2 corresponds to task 2.

We also set the loss weights to 0.01 for w_1 and 1.0 for w_2, so that the magnitudes of the two losses remain roughly the same throughout the training stage.
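To see why such a small w_1 helps, here is a tiny sketch of the weighted sum. The weights come from the text above; the two raw loss values are hypothetical numbers picked only to show the scale difference between a class-weighted cross entropy and a loss bounded by 1, and `combine_losses` is my own helper name.

```python
def combine_losses(task_losses, task_weights):
    """Weighted sum of per-task losses, plus each task's weighted share."""
    weighted = {name: task_weights[name] * value
                for name, value in task_losses.items()}
    return sum(weighted.values()), weighted

# Weights from the text; the loss values are hypothetical, chosen only for illustration
weights = {"output1": 0.01, "output2": 1.0}
losses = {"output1": 20.0, "output2": 0.2}

total, parts = combine_losses(losses, weights)
```

In this made-up case both tasks end up contributing about 0.2 to the total, so neither one dominates the gradient.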

After running the code, one can then decide which gen_model.h5 to use, based on the losses.txt file, for experiments on the test set. This .h5 model can then be used to downscale the real-time reforecast GEFSv12 dataset to obtain a high-resolution dataset for Singapore. In the future, the .h5 models will be uploaded to GitHub as well.

[1] J. Murphy, An evaluation of statistical and dynamical techniques for downscaling local climate (1999). Journal of Climate, 12(8), pp.2256–2284.

[2] H. J. Fowler, S. Blenkinsop and C. Tebaldi, Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling (2007). International Journal of Climatology: A Journal of the Royal Meteorological Society, 27(12), pp.1547–1578.

[3] Maraun et al, Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user (2010). Reviews of Geophysics, 48(3).

[4] T.T. Le et al, Development of 48-hour precipitation forecasting model using nonlinear autoregressive neural network (2020). Innovation for Sustainable Infrastructure: Proceedings of the 5th International Conference on Geotechnics, Civil Engineering Works and Structures, pp.1191–1196.

[5] E.P. Salathé, L.R. Leung, Y. Qian and Y. Zhang, Regional climate model projections for the State of Washington (2010). Climatic Change, 102, pp.51–75.

[6] J. Baño-Medina, R. Manzanas and J.M. Gutiérrez, Configuration and intercomparison of deep learning neural models for statistical downscaling (2020). Geoscientific Model Development, 13(4), pp.2109–2124.

[7] F. Wang, D. Tian and M. Carroll, Customized deep learning for precipitation bias correction and downscaling (2023). Geoscientific Model Development, 16(2), pp.535–556.

[8] Frédéric Vitart et al, Use of ERA5 reanalysis to initialise re‑forecasts proves beneficial (2019). ECMWF Newsletter Number 161, Autumn 2019.

[9] National Oceanic and Atmospheric Administration, NOAA’s Global Ensemble Forecast System Version 12: Reforecast Data Storage Information

[10] ECMWF, Fact sheet: Reanalysis (2020).

[11] Urban Climate Downscaling Portal, Weather Typing Methods. Based on a book by Mujumdar & Nagesh Kumar, Floods in a Changing Climate: Hydrologic Modeling.

[12] H. Daumé III, Frustratingly easy domain adaptation (2009). arXiv preprint arXiv:0907.1815.

[13] Ebert-Uphoff et al, CIRA Guide to Custom Loss Functions for Neural Networks in Environmental Sciences — Version 1 (2021). arXiv preprint arXiv:2106.09757.

For further technical reading on other topics related to climate science, refer to this website:


