##### Disaster Recovery: Why Your Organization Needs a Plan


**By Michael Surace and Robert Chang**

Despite the prevalence of macroeconomic-based forecasting tools, correlating outcomes with macroeconomic time series is a statistically complex process rife with challenges (see Part I). Organizations must be prepared to understand stationarity, time trends, unit roots, structural breaks, cointegration, and vector error correction models. Naïve linear regression with a macroeconomic time series often leads to spurious regressions and nonsense results^{[1]}. Building forecasting models based on macroeconomic conditions is also computationally challenging. A modeler must evaluate thousands of different macroeconomic time series for inclusion into a model, prepare and transform each time series, and then test different variable combinations as candidate models, which could number into the hundreds of thousands.

FI Consulting built an automated tool for regressing any variable of interest (e.g., default rate, prepayment rate) against macroeconomic factors, based on our deep experience building macroeconomic-based forecasting models in public and private sector settings. The tool searches through a long list of candidate macroeconomic variables based on business line preferences and automatically prepares the variables for regression, checking for stationarity, cointegration, and structural breaks. It then tests candidate models and ranks them by goodness-of-fit. The business line can then select the model best suited to its purpose, confident that the model is statistically sound and passes the relevant statistical tests.

In the following sections, we demonstrate how our tool overcomes the statistical challenges of building macroeconomic-based forecasting models. We walk through examples of using our tool to forecast mortgage portfolio delinquency rates and consumer loan charge-off rates under different economic environments. While we use mortgage delinquency rates and consumer loan charge-off rates as examples, we designed our tool to forecast any dependent variable of choice based on any number of potential macroeconomic variables as predictors.

As described in Part I, incorporating macroeconomic variables into a business’s forecasting processes is statistically challenging and requires significant resources to implement correctly. To reduce the level of effort institutions face, FI has developed a tool that automates the building of macroeconomic-based forecasting models. FI designed the tool to accept any dependent variable of interest, prepare macroeconomic variables for regression analysis, and output a set of candidate models that are accurate and statistically sound. The remainder of this section discusses the capabilities of FI’s automated tool, the value it provides to financial institutions, and analyzes the tool’s performance when used to forecast two different variables, a mortgage portfolio delinquency rate and consumer loan charge-off rate. The diagram below provides an overview of the automated model-building process.

To demonstrate macroeconomic-based forecasting with non-stationary time series, we apply the tool to forecasting a mortgage portfolio delinquency rate and a consumer loan charge-off rate regressed on several macroeconomic factors. This section first discusses how to prepare and transform the data to ensure that the model is statistically sound. Next, we discuss selecting predictor variables and evaluating candidate models. Finally, we apply the model to forecasting a mortgage portfolio delinquency rate and a consumer loan charge-off rate and discuss the results.

We designed the tool to predict any dependent variable of choice and develop forecast models using relevant macroeconomic variables as independent variables. The tool houses a suite of raw, publicly available historical macroeconomic data for building forecast models^{[2]}. The data covers a broad spectrum of macroeconomic variables, including interest rates, housing market variables, consumer debt, consumer sentiment, and monetary data.

We first import macroeconomic data and put it in a panel format (each column representing a time series of a different variable). Next, we select our dependent variable (y) and independent variables (X). Then, for each independent variable, we create transformations, including lags, differences, and growth rates^{[3]}. The diagram below provides an illustration of the types of transformations that are performed on the macroeconomic data.

Transformations of each macroeconomic variable significantly expand the number of possible predictors for estimation and can be a resource expensive exercise. A modeler must conduct a series of tests on each variable to determine which transformation will do the best job at predicting the dependent variable. Automating this step is critically important, as a manual approach would severely restrict the number of macroeconomic variables available for consideration.
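As a minimal sketch of this expansion step (assuming quarterly data in a pandas DataFrame; the column name and lag depth are illustrative, not the tool's actual configuration), the lag, difference, and growth-rate transformations might look like:

```python
import pandas as pd

def make_transforms(df: pd.DataFrame, max_lag: int = 4) -> pd.DataFrame:
    """Expand each macroeconomic series into lag, difference,
    and growth-rate transformations (illustrative sketch)."""
    out = {}
    for col in df.columns:
        out[col] = df[col]
        for k in range(1, max_lag + 1):
            out[f"Lag{k}_{col}"] = df[col].shift(k)      # k-quarter lag
            out[f"Diff{k}_{col}"] = df[col].diff(k)      # k-quarter difference
            # k-quarter growth rate: x_t / x_{t-k} - 1
            out[f"Growth{k}_{col}"] = df[col].pct_change(k, fill_method=None)
    return pd.DataFrame(out)

cpi = pd.Series([100.0, 102.0, 104.0, 107.0], name="CPI")
panel = make_transforms(cpi.to_frame(), max_lag=1)
```

Even this toy example turns one series into four columns; with dozens of lags across thousands of series, the candidate pool grows quickly, which is why the step is automated.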

Once the data is imported and prepared, we must analyze it to determine whether it can be used in regression analysis. One of the main issues with time series data is the presence of a unit root: the series is non-stationary, meaning its mean, variance, and covariance change over time. Using non-stationary data in regression analysis can lead to spurious results and biased predictions. The tool conducts three different statistical tests to check for stationarity in each time series:

- __Augmented Dickey-Fuller (ADF) Test__: Null hypothesis that a unit root is present in the time series. Failure to reject the null hypothesis concludes that the time series is non-stationary.
- __Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test__: Null hypothesis that a unit root is not present in the time series. Failure to reject the null hypothesis concludes that the time series is stationary.
- __Phillips-Perron (PP) Test__: Null hypothesis that a unit root is present in the time series. Failure to reject the null hypothesis concludes that the time series is non-stationary.

For each macroeconomic variable, we follow the process below:

- Run all three unit root tests to check for stationarity. The tests’ conclusions may differ; when this happens, the tool uses a “Majority Vote” rule: if two or more of the unit root tests conclude the data series is stationary, the tool classifies the variable as stationary. Otherwise, the variable is classified as non-stationary.
- Discard all variables that are classified as non-stationary.

The result of this process is a pool of macroeconomic variables and transformations that are stationary. These variables will be used as predictors in the forecast models.

Once the model has identified the pool of candidate variables that can be used in the regression models, we must identify which combination of variables produces the “best” models. The tool uses the Best Subsets algorithm to identify candidate variables^{[4]}. The algorithm tests every possible combination of variables and, based on a defined criterion, selects the variable groupings that perform the best. For example, with three candidate variables, the tool fits a separate model for every non-empty subset: three one-variable models, three two-variable models, and one three-variable model, for seven models in total. This approach is advantageous because it tests all possible variable groupings. Once all possible combinations of “n” predictors have been specified, the model reduces the number of candidate models using the following approach:

- Removes models that do not consider a diverse set of macroeconomic data. For example, a model that used three different transformations of GDP would be removed. This is to ensure that the final model considers different types of macroeconomic indicators.
- Removes models where the residuals are non-stationary. This ensures that we do not violate the assumptions of the classical linear regression model, which we estimate using ordinary least squares (OLS).

After the model has removed candidate models that contain redundant predictors and ones that do not have stationary residuals, the tool selects the best candidate based on a specified testing criterion, such as the candidate models with the highest adjusted R^{2} or lowest residual sum of squares (RSS). This approach is advantageous because every possible combination of variables gets tested; however, when working with a large number of predictors, the number of possible combinations can be immense. In general, if there are k predictors, there are 2^{k} possible combinations. This approach can be very resource and time intensive. The diagram below illustrates the resulting dataset that is created after the best subset algorithm is executed and the filtering process has been completed.

Once we identify the optimal predictors of the dependent variable, the tool runs a final regression on the dependent variable. In this model, we use the predictor variables observed at time t (X_{t,1}, X_{t,2}, … X_{t,K}) to forecast Y_{t} in a classical linear regression setting^{[5]}. The model can be summarized as follows:

Y_{t} = β_{0} + β_{1}X_{t,1} + β_{2}X_{t,2} + ⋯ + β_{K}X_{t,K} + ϵ_{t}

We estimate the coefficients (β’s) by OLS. The model’s estimation window is fixed. For instance, suppose we have 100 observations in our time series data; we use the first 70 observations to estimate the coefficients. Following the estimation procedure, we use the same estimated coefficients to predict the values of the dependent variable for the remaining observations. This is equivalent to splitting our data into training and testing datasets^{[6]}.

To assess the performance of the tool, we ran the algorithm on two dependent variables: 1) mortgage delinquency rates and 2) consumer loan charge-off rates. The model results are presented in the following sections.

We pulled quarterly mortgage delinquency rates (MDR) from FRED for the period 1991 to 2021. The tool trained the regression on data from 1991 to 2013. We selected the final specification presented below:

MDR_{t} = β_{0} + β_{1}DiffCPI_{t} + β_{2}GrowthUMCSENT_{t-5} + β_{3}GrowthVIX_{t-4} + ϵ_{t}

where,

*DiffCPI* = First difference of CPI

*GrowthUMCSENT* = 5 quarter growth rate of US consumer sentiment

*GrowthVIX* = 4 quarter growth rate of the VIX index

The figure below shows the model prediction compared to the actual rates for the hold-out period.

Figure 1. Mortgage Delinquency Rates Out of Sample Performance

We pulled quarterly consumer loan charge-off rates (COR) from FRED for the period 1987 to 2021. The tool trained the regression on data from 1987 to 2013. We selected the final specification presented below:

COR_{t} = β_{0} + β_{1}GrowthFEDFUNDS_{t-8} + β_{2}GrowthDRCC_{t-8} + β_{3}VIX_{t-8} + ϵ_{t}

where,

*GrowthFEDFUNDS* = 8 quarter growth rate of the Federal Funds rate

*GrowthDRCC* = 8 quarter growth rate of the delinquency rate of credit cards

*VIX* = 8 quarter lag of the VIX index

The figure below shows the model prediction compared to the actual rates for the hold-out period.

Figure 2. Consumer Loan Charge-off Rates Out of Sample Performance

Incorporating macroeconomic variables into a business’s forecasting processes is statistically challenging and requires significant resources to implement correctly [Macroeconomics – Part I]. To reduce the level of effort institutions face, FI has developed a tool that automates the building of macroeconomic-based forecasting models. Interested in learning more, or in reviewing whether your institution’s models are implemented correctly? Email Robert Chang and Michael Surace at info@ficonsulting.com.

**References:**