On the Genealogy of Machine Learning Weather Prediction (MLWP): Physics Inherited, Data Forgotten — Toward a Principled Trade-off in Surrogate Modeling

An exploration of the foundations of numerical weather prediction and of how machine learning can serve as an effective surrogate for it, emphasizing the need to balance physical constraints with data-driven practice in principled surrogate modeling.

Traditionally, physics has been responsible for explaining the Earth as a system. Fundamental physical laws, such as Newton's Second Law of Motion and the First Law of Thermodynamics, govern the evolution of the atmosphere. These laws are converted into the mathematical equations that form the core of what we call numerical weather prediction (NWP).

Physics has been so dominant over the past few decades that, even as we pivot toward machine learning (ML) surrogates, we often use ML merely to mimic traditional numerical solvers. However, we must not forget that the goal is to replace this physics with a data-driven surrogate. Consequently, the unique characteristics of data and the established best practices of data science should be considered within this new paradigm. We need to reconcile physical laws with the broader conventions of machine learning and data-driven methodologies. This article explores how respecting ML conventions alongside physical laws can help us avoid the common pitfalls of surrogate modeling.

In MLWP, we aim to replace traditional NWP with machine learning. This problem has two vital aspects: first, we must deeply understand the characteristics of NWP, its core components and how it is solved. Second, we must select the appropriate ML components (modules) capable of capturing the inherent physical characteristics of the task, which is crucial for model performance.

To achieve this, we will construct our own narrative regarding NWP. This does not mean we will distort facts or alter the truth; rather, we are shifting our perspective to an angle that serves our ultimate objective: effectively surrogating physical systems with machine learning. By framing NWP through the lens of numerical integration schemes, as well as prognostic and diagnostic interplay, we can better identify which physical constraints are essential to preserve and which aspects of the system are best suited for data-driven approximation.

Numerical Weather Prediction

Disclaimer: The following content provides a simplified conceptual overview of Numerical Weather Prediction (NWP) to facilitate a discussion on machine learning surrogates. The mathematical narrative is adequate for a high-level perspective, but some technical nuances have been simplified for the sake of clarity.

In 1904, Vilhelm Bjerknes first recognized that numerical weather prediction was possible in principle. He proposed that weather prediction could be viewed as an initial value problem in mathematics: since physical laws govern how meteorological variables evolve over time, if we possess an accurate representation of the atmosphere's initial state, we can numerically integrate these governing equations forward in time to generate a forecast.

At its core, NWP involves solving a set of partial differential equations, commonly referred to as the Primitive Equations. These equations describe six fundamental resolved variables: the horizontal wind components ($u, v$), the vertical velocity in pressure coordinates ($\omega$), temperature ($T$), moisture ($q$), and geopotential height ($z$).

The Primitive Equations

The following system serves as the foundational framework for atmospheric motion and thermodynamics:

Wind Forecast Equations

1a. $$\frac{\partial u}{\partial t} = - u \frac{\partial u}{\partial x} - v \frac{\partial u}{\partial y} - \omega \frac{\partial u}{\partial p} + fv - g \frac{\partial z}{\partial x} + F_x$$

1b. $$\frac{\partial v}{\partial t} = - u \frac{\partial v}{\partial x} - v \frac{\partial v}{\partial y} - \omega \frac{\partial v}{\partial p} - fu - g \frac{\partial z}{\partial y} + F_y$$

Continuity Equation

2. $$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial \omega}{\partial p} = 0$$

Temperature Forecast Equation

3. $$\frac{\partial T}{\partial t} = - u \frac{\partial T}{\partial x} - v \frac{\partial T}{\partial y} - \omega \left( \frac{\partial T}{\partial p} - \frac{RT}{c_p p} \right) + \frac{H}{c_p}$$

Moisture Forecast Equation

4. $$\frac{\partial q}{\partial t} = - u \frac{\partial q}{\partial x} - v \frac{\partial q}{\partial y} - \omega \frac{\partial q}{\partial p} + E - P$$

Hydrostatic Equation

5. $$\frac{\partial z}{\partial p} = - \frac{RT}{pg}$$

Source: MetEd course: Impact of Model Structure and Dynamics

From Theory to Numerical Integration

Because these non-linear partial differential equations do not possess closed-form analytical solutions, we must rely on numerical schemes to solve them. In practice, solving these equations is a process of discrete integration over time and space. To complete our narrative, we can represent the essence of this integration using a simple Euler forward scheme.

If $\psi$ represents any of our resolved variables, the state at time $t + \Delta t$ can be approximated by the current state and its time tendency:

$$\psi(t + \Delta t) \approx \psi(t) + \left( \frac{\partial \psi}{\partial t} \right) \Delta t$$

In this framework, the "model" acts as an engine that calculates the tendency term ($\frac{\partial \psi}{\partial t}$) using the physical laws shown above, then iteratively updates the state of the atmosphere.
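The update rule above can be sketched in a few lines of Python. This is only a minimal illustration, not an NWP solver: the function names (`euler_forward`, `damping_tendency`) and the toy tendency $\partial \psi / \partial t = -k\psi$ are assumptions introduced here because that equation has a known exact solution to compare against.

```python
import math

def euler_forward(psi0, tendency, dt, n_steps):
    """Integrate d(psi)/dt = tendency(psi, t) with the forward Euler scheme."""
    psi, t = psi0, 0.0
    trajectory = [psi]
    for _ in range(n_steps):
        # psi(t + dt) ~ psi(t) + (d psi / dt) * dt
        psi = psi + tendency(psi, t) * dt
        t += dt
        trajectory.append(psi)
    return trajectory

# Toy tendency (an assumption, not one of the primitive equations):
# linear damping d(psi)/dt = -k * psi, with exact solution exp(-k * t).
k = 0.5

def damping_tendency(psi, t):
    return -k * psi

trajectory = euler_forward(psi0=1.0, tendency=damping_tendency, dt=0.01, n_steps=200)
exact = math.exp(-k * 2.0)  # exact solution at t = 200 * 0.01 = 2.0
print(f"Euler: {trajectory[-1]:.4f}, exact: {exact:.4f}")
```

The loop mirrors the narrative: the "engine" is whatever sits behind `tendency`, and the integrator only needs the current state to produce the next one. Shrinking `dt` brings the numerical result closer to the exact value, at the cost of more steps.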

Physical Processes

In our set of "sacred" equations, some terms, specifically $F_x, F_y, H, E,$ and $P$, represent physical processes that impact our primary variables. These processes are inherently complex; they often involve scales far smaller than the grid spacing of the model (such as individual convective clouds) or rely on physical mechanisms (like radiative transfer) that are too computationally expensive to resolve from first principles.

Because we cannot calculate these effects directly within the core equations, we must estimate them using empirical approximations. In numerical modeling, this technical estimation process is known as parameterization. The accuracy of an NWP forecast is fundamentally linked to how well these parameterizations mimic reality.
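As a deliberately crude illustration of what a parameterization looks like in code, the sketch below uses Rayleigh friction, a linear drag in which the friction terms are simply proportional to the resolved wind. It stands in for the far more elaborate schemes used in operational models; the function name, the damping rate `k_f`, and the wind values are all hypothetical and chosen only for illustration.

```python
def rayleigh_friction(u, v, k_f=1.0e-5):
    """Toy parameterization of the friction terms: F_x = -k_f * u, F_y = -k_f * v.

    k_f is a hypothetical damping rate in 1/s, chosen only for illustration.
    """
    return -k_f * u, -k_f * v

# The forcing is estimated from the current resolved state only.
u, v = 10.0, -5.0  # resolved wind components in m/s (made-up values)
F_x, F_y = rayleigh_friction(u, v)
print(F_x, F_y)
```

The point is structural rather than physical: an unresolved process is replaced by a cheap function of the resolved variables, and its output enters the tendency equations as $F_x$ and $F_y$.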

Prognostics vs. Diagnostics

We must distinguish between two types of variables in our NWP system:

Prognostic Variables ($u, v, T, q$): These are our "state" variables. We determine their future values by solving the time-dependent equations (the Wind, Temperature, and Moisture Forecast equations).
Diagnostic Variables ($\omega, z$): These are derived directly from the prognostic variables at any given time step, rather than being solved via time-tendency equations (the Continuity and Hydrostatic equations).

For simplicity and to improve the coherence of the narrative, we categorize all variables parameterized in the physical processes ($F_x, F_y, H, E, P$) as diagnostics, since they share the defining characteristics of diagnostic variables; i.e., they are calculated from the current prognostic state, even though they influence the future state.
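To make the diagnostic idea concrete, the sketch below derives geopotential height from temperature by integrating the hydrostatic equation (equation 5) upward through a single column. No time stepping is involved: $z$ is recomputed from the prognostic $T$ whenever it is needed. The function name, the constants' rounded values, and the column profile are assumptions made for this example.

```python
import math

R_DRY = 287.0  # gas constant for dry air, J kg^-1 K^-1 (rounded)
G = 9.81       # gravitational acceleration, m s^-2 (rounded)

def diagnose_height(p_levels, t_levels, z_bottom=0.0):
    """Integrate dz/dp = -R*T/(p*g) upward from the highest-pressure level.

    p_levels: pressures in Pa, ordered from high to low pressure.
    t_levels: temperatures in K on the same levels.
    Returns heights in m on the same levels.
    """
    z = [z_bottom]
    for i in range(1, len(p_levels)):
        t_mean = 0.5 * (t_levels[i - 1] + t_levels[i])
        # dz = -(R*T/g) d(ln p), using the layer-mean temperature
        dz = (R_DRY * t_mean / G) * math.log(p_levels[i - 1] / p_levels[i])
        z.append(z[-1] + dz)
    return z

# Made-up column: pressure in Pa and temperature in K, for illustration only.
p = [100000.0, 85000.0, 70000.0, 50000.0]
T = [288.0, 280.0, 272.0, 255.0]
z = diagnose_height(p, T)
print([round(h) for h in z])
```

Nothing in `diagnose_height` refers to the previous or next time step, which is the defining trait of a diagnostic variable in this narrative. The vertical velocity $\omega$ could be diagnosed in the same spirit by integrating the continuity equation over pressure.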

Machine Learning

Disclaimer: The data-driven landscape is more amorphous than traditional physics, defined largely by experimental conventions and evolving best practices. While standard ML taxonomy categorizes tasks into supervised (classification/regression) and unsupervised (clustering/dimensionality reduction) methods, we will adopt a customized framework to better align with our goal of surrogate modeling. We define our analysis through two primary lenses:

Time-Series Analysis (Forecasting)

In this context, we define forecasting as estimating future states based on the history of a time series. The defining characteristic is the temporal anchor. We rely on the autocorrelation of the system: the state at time $t$ strongly conditions the state at $t+1$. The objective is to project the known trajectory of the atmosphere forward into an unknown future, maintaining the continuity of the system's evolution.

Regression Analysis (Prediction)

We define prediction as the estimation of a target variable $Y$ based on a given feature set $X$. At its core, this is a functional mapping $f: X \to Y$, so that $Y \approx f(X)$. Unlike time-series analysis, regression often treats data points as independent observations within a feature space, without an inherent requirement that the target must exist in the future. We are essentially performing interpolation or extrapolation within the feature space, and the goal is to learn a mapping function rather than project a temporal trajectory.

By distinguishing between forecasting (projecting the trajectory forward via temporal correlation) and prediction (mapping inputs to outputs within a state space), we can better classify which components of the NWP system belong to which ML approach. This is the crux of our "principled trade-off": knowing whether our surrogate should be acting as a time-evolving forecaster or a diagnostic predictor.
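The contrast between the two lenses can be shown with the simplest possible model, a one-parameter linear fit. The sketch below is an assumption-laden toy: the data are synthetic, and the helper names (`fit_slope`, `rollout`) are introduced here purely for illustration. The same fitting routine is used in both cases; what differs is how the data are arranged and how the fitted model is applied.

```python
def fit_slope(xs, ys):
    """Least-squares slope a of the model y = a * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Forecasting: learn state(t) -> state(t+1) from an ordered series,
# then roll forward by feeding each output back in as the next input.
series = [1.0, 0.9, 0.81, 0.729, 0.6561]  # synthetic, generated as 0.9 ** t
a_forecast = fit_slope(series[:-1], series[1:])

def rollout(state, n_steps):
    states = []
    for _ in range(n_steps):
        state = a_forecast * state
        states.append(state)
    return states

future = rollout(series[-1], n_steps=3)

# Prediction: learn X -> Y from independent samples.
# The order of the samples carries no meaning and nothing is fed back.
X = [2.0, 0.5, 1.5, 1.0]  # synthetic features, arbitrary order
Y = [4.0, 1.0, 3.0, 2.0]  # synthetic targets, generated as 2 * x
a_predict = fit_slope(X, Y)
y_new = a_predict * 1.25

print(future, y_new)
```

In the forecasting case, errors compound because each step consumes the previous output; in the prediction case, each estimate stands alone. That difference is what matters when deciding whether a surrogate component should behave as a forecaster or as a diagnostic predictor.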

I believe this is a good place to wrap up the first part of this article. In the next piece, we will work on how to define prognostics and diagnostics in MLWP, building upon our broad understanding of the ingredients of both NWP and MLWP.