Data & Analytics Insights

Predicting daily current coupon changes using neural networks

Yishul Wei 

Senior Analyst, Research at LSEG Analytics

Seunghyeon Son

Head of Fixed-income Pricing and Modeling at Yield Book, part of LSEG's Data & Analytics Division

MBS prepayment cashflow modelling critically depends on accurate projections of the current coupon (CC) along simulated interest rate paths. Underlying the Yield Book’s market-leading MBS analytics is the MOATS model, which generates such projections. The recent success of AI in other domains suggests that equally high-performing models for CC projection may be developed with general-purpose frameworks such as neural networks (NN). In this report we explore this potential by developing an NN model for the task of predicting daily CC changes and comparing its performance with both the benchmark constant-spread model and MOATS.

Introduction

In pricing mortgage-backed securities (MBS), the current coupon (CC) and its future projections along interest rate paths are key factors determining the modelled prepayment behaviour (and hence the modelled future cashflows). The CC is the coupon of a hypothetical to-be-announced (TBA) contract that settles in 30 days and has a present value of 102 or 100. (The corresponding CC is also known as CMM102 and CMM100, respectively. In this report, we focus on CMM102 since it has rich historical data compared with CMM100, which was reintroduced a few years ago.) Its value is interpolated from the available TBA market prices and is intended to represent the fair yield of new mortgages. (The technical definition of CC is not particularly relevant for this research report, and we refer interested readers to our previous publications [1,2] for more details.)

The LSEG Yield Book (YB) MBS pricing suite offers users two alternatives to produce CC projections along interest rate paths generated by the LIBOR Market Model (LMM). The first is the constant-spread model, which assumes a constant spread (which YB can extract from closing market quotes) between the CC and the 10-year par swap rate. The other is the Mortgage Option-Adjusted Term Structure (MOATS) model, which utilizes a simplified prepayment model and a backward induction methodology to systematically derive projected TBA prices along the interest rate paths, from which the CC projections can be calculated according to the above definition. These models have been documented extensively in our previous publications [1,2].

Our previous publications have also mentioned regression-based approaches as possible alternatives for generating CC projections, but have not quantitatively evaluated how such approaches compare with the two models above. Regression-based approaches can easily incorporate variables other than the 10-year swap rate into the model that determines CC projections along the paths, yet are typically much more computationally efficient than MOATS. Traditional statistical regression, however, has the drawback that the modelling framework is usually rigid in the assumed functional form relating output and input variables. (In other words, it requires very strong statistical assumptions.) For example, the commonly adopted linear regression assumes that the relation between output and input variables is linear, which may not be the case.

Neural networks (NN) are more sophisticated regression models with a higher level of flexibility than traditional regression. They are also the workhorse driving the recent artificial intelligence (AI) revolution. NN loosen the statistical assumption regarding the functional form of the relation between output and input variables, as the number and sizes (i.e. dimensions) of the “hidden layers” can be freely adjusted. (However, if too much flexibility is allowed, overfitting becomes likely. The standard practice is to use cross-validation to select the appropriate model size, which is also the approach we take below.)

Although NN are usually regarded as “black box” models, the AI community has recently proposed methodologies intended to make NN more explainable. For example, Shapley Additive Explanations (SHAP) values have become a widely adopted metric for evaluating the contribution of each input variable to the prediction output.

This report presents our work on comparing three approaches to the task of predicting daily CC changes:

  1) Constant-spread model
  2) MOATS
  3) NN

Essentially, the research question is: Given the changes of par swap rates and swaption-implied volatilities from those of the previous day, how well does each model predict the change of CC from that of the previous day? Our previous publication [2] has reported on the comparison between the performance of 1) and 2) on the same task. By adding 3) to the comparison, this report intends to demonstrate, at least at a proof-of-concept level, that NN can effectively capture more details than the constant-spread model about the dependency of CC on rate dynamics, yet be more computationally efficient – and conceptually more straightforward to understand through tools like SHAP – than MOATS.

Data

The daily closing values of par swap rates, normal swaption-implied volatilities, and CC from 10/03/2016 to 12/31/2024 were extracted from the YB database. (10/03/2016 is the first day for which the normal LMM model is available in the current YB system.) All data were converted to daily changes by subtracting the corresponding previous day’s closing values from the daily closing values. In what follows, Δvariable denotes the daily change of variable.
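
As a minimal sketch of this preprocessing step (in Python, with a placeholder file name and column labels rather than the actual YB database schema), the daily changes can be obtained with a one-day difference:

    import pandas as pd

    # Hypothetical input: daily closing values indexed by date, one column per
    # quantity (e.g. "par10y", "vol1y10y", "cc"); names are illustrative only.
    closes = pd.read_csv("yb_daily_closes.csv", index_col="date", parse_dates=True)

    # Daily change = today's close minus the previous day's close; the first
    # row has no previous day and is dropped.
    deltas = closes.diff().dropna()
    deltas = deltas.add_prefix("d_")  # "d_par10y" stands for the Δpar10y of the text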

Note that for the constant-spread model, the predicted ΔCC is exactly the daily change of the 10-year par swap rate (Δpar10y). For MOATS, the rates and volatilities first need to be fed into the LMM to generate the Monte Carlo interest rate paths, which are then used to derive the predicted CC through backward induction. An NN can take multiple input features and is thus more flexible than the constant-spread model, while avoiding the need for a full-fledged term structure model such as the LMM.

Development of the NN Model

In total, 2,060 daily-change observations are available, of which 1,750 (85%; the “development set”, 10/04/2016 to 10/05/2023) were used for model development, and the remaining 310 (15%; the “test set”, 10/06/2023 to 12/31/2024) were reserved for evaluating out-of-sample prediction.

Because many par rates and normal volatilities exhibit high historical mutual correlations (> 0.9), and initial experimentation showed that adding highly correlated input features tends to harm NN performance, we restricted the pool of candidate input features to a few key par rate (3m, 6m, 1y, 3y, 10y) and normal volatility (1y1y, 1y10y, 5y5y, 5y10y, 10y10y) daily changes. We considered only single-hidden-layer NN given our small data size. (Increasing the number of hidden layers increases model complexity and would usually require a larger dataset to achieve decent performance.) The activation function of the hidden layer was chosen to be tanh.

To determine the final set of input features, the size of the hidden layer, and the number of training epochs, we employed five-fold cross-validation as follows. The development set was further divided into five parts. Given a set of input features and a hidden layer size, five NN were trained, each on a different four of the five parts, and each was evaluated on the remaining part. The number of training epochs was optimized individually for each of the five networks based on performance over its evaluation part. The percentage of explained variance over the evaluation part, averaged across the five networks, was then taken as the score for that set of input features and hidden layer size. The combination of input features and hidden layer size that yielded the highest score was used for our final NN predictions.
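
The report does not specify the software stack used; the sketch below illustrates the scheme with scikit-learn’s MLPRegressor (one tanh hidden layer), where warm-started one-epoch fits allow the number of training epochs to be tuned against each fold’s evaluation part. It is an assumption-laden illustration, not the production code.

    import numpy as np
    from copy import deepcopy
    from sklearn.model_selection import KFold
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import explained_variance_score

    def cross_validate(X, y, hidden_size, n_folds=5, max_epochs=300):
        """Average evaluation-part explained variance for one candidate
        (feature set, hidden layer size), plus the five fitted networks,
        each stopped at its own best epoch. A sketch, not YB code."""
        scores, nets = [], []
        for train_idx, val_idx in KFold(n_splits=n_folds).split(X):
            net = MLPRegressor(hidden_layer_sizes=(hidden_size,), activation="tanh",
                               solver="adam", max_iter=1, warm_start=True)
            best_score, best_net = -np.inf, None
            for _ in range(max_epochs):               # each fit() call adds one epoch
                net.fit(X[train_idx], y[train_idx])
                score = explained_variance_score(y[val_idx], net.predict(X[val_idx]))
                if score > best_score:                # keep the best-epoch snapshot
                    best_score, best_net = score, deepcopy(net)
            scores.append(best_score)
            nets.append(best_net)
        return float(np.mean(scores)), nets

    # A grid over candidate feature subsets and hidden layer sizes would then
    # select the combination with the highest average score.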

In this way we fixed the final set of input features to be {Δpar1y, Δpar3y, Δpar10y, Δvol1y10y, Δvol5y10y} and the hidden layer size to be 80.

Figure 1 shows the cross-validation explained variance percentage over the training and evaluation parts, with the input features fixed but with different hidden layer sizes. There is a trend of improving evaluation performance with increasing hidden layer size up to 80, after which the performance becomes unstable (likely because our data size is insufficient for fitting larger models).

Note that, because we optimized the numbers of training epochs against evaluation data, the curve for explained variance over the training data does not show the typical monotonically increasing pattern that one might expect for machine learning models.

Figure 1: NN Explained Variance

In the following, the reported performance of NN is based on the averaged prediction of the five optimized networks obtained in the cross-validation procedure.
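
Concretely, continuing the hypothetical sketch above, this averaging amounts to:

    import numpy as np

    def ensemble_predict(nets, X):
        # Average the predictions of the five cross-validation networks.
        return np.mean([net.predict(X) for net in nets], axis=0)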

Performance of ΔCC Prediction

Table 1 displays the total variance of ΔCC and the mean squared difference (MSD) between the actualised ΔCC and the prediction made by each approach, across the development and test sets. We can see that for in-sample prediction (development set) the NN model fits the data better than the other models, whereas for out-of-sample prediction (test set) the performance of the NN model lies in between those of the other models.

Table 1: Overall Variance of ΔCC and the Prediction Performance of Each Model

                                               Development    Test
Historical variance of actualised ΔCC          0.005162       0.0044
MSD constant-spread model vs. actualised ΔCC   0.001778       0.000869
MSD MOATS vs. actualised ΔCC                   0.001771       0.000643
MSD NN vs. actualised ΔCC                      0.001403       0.000721
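
The MSD entries in Tables 1 and 2 are plain mean squared differences between two aligned daily series; as a sketch (series names are placeholders):

    import numpy as np

    def msd(a, b):
        # Mean squared difference between two aligned daily series.
        return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

    # Table 1: e.g. msd(actual_dcc, nn_pred); Table 2: e.g. msd(moats_pred, nn_pred).
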
We can also look at the MSD between the models’ predictions, as shown in Table 2. Interestingly, the mutual differences between the predictions are much smaller than the differences between the actualised ΔCC and the predictions above. Also, the MSD between the NN prediction and either of the other models’ predictions is about half of the MSD between the other two models’ predictions. It is worth noting that although the NN were trained to predict the actualised ΔCC and did not “see” the MOATS output during training, the MSD between the NN and MOATS predictions ended up much smaller than the MSD between either prediction and the actualised ΔCC. This suggests that the information in the input features that is relevant to CC movement, and that each model captures, overlaps significantly. Furthermore, the NN prediction lies somewhere in the “middle ground” between the other two models’ predictions in the high-dimensional feature space.

Table 2: Differences Between Model Predictions

                                        Development    Test
MSD constant-spread model vs. MOATS     0.000603       0.000478
MSD constant-spread model vs. NN        0.000352       0.000207
MSD MOATS vs. NN                        0.000289       0.000187

Feature Contributions to Prediction per SHAP

SHAP values are now commonly reported in the AI literature to provide explainable insights into NN models. For each predicted value given by a model, each input feature’s SHAP value is a metric signifying its contribution to the prediction output. (The SHAP values of all features, together with a base value, sum up to the predicted value.)
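
As a hedged sketch of how such values can be obtained with the open-source shap package (the feature names, X_dev, X_all, nets and ensemble_predict continue the hypothetical sketches above):

    import numpy as np
    import shap

    feature_names = ["d_par1y", "d_par3y", "d_par10y", "d_vol1y10y", "d_vol5y10y"]

    # KernelExplainer treats the ensemble as a black box; a subsample of the
    # development set serves as the background distribution.
    background = X_dev[np.random.choice(len(X_dev), 100, replace=False)]
    explainer = shap.KernelExplainer(lambda X: ensemble_predict(nets, X), background)

    shap_values = explainer.shap_values(X_all)  # one row per day, one column per feature

    # Beeswarm-style summary of each feature's contributions across the dataset.
    shap.summary_plot(shap_values, X_all, feature_names=feature_names)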

When we plot each input feature’s SHAP values across the entire dataset (Figure 2), we see that Δpar10y is the main driver of most of the prediction results. This corroborates the observation above that the MSD between the constant-spread and NN models is small. However, the other features also make non-trivial contributions, at least for a portion of the data. This is information that is captured by the NN model but missed by the constant-spread model.

The majority of the SHAP values for the features Δvol1y10y, Δvol5y10y, and Δpar1y are close to 0. This means that an NN trained without these features could already be expected to give decent predictions for most days. However, for a minority of days these features become key contributors to the predicted values. This illustrates how SHAP analysis can help us attain a deeper understanding of the role that each variable plays in driving CC movement.

Figure 2: SHAP Contributions of Each Input Feature to NN Predicted Values over the Entire Dataset


Summary

In recent years, the potential of AI to uncover insights within huge amounts of data has gained traction across the financial community. When it comes to MBS, however, explorations in this direction have been scarce. We present a task – predicting daily CC changes – for which we demonstrate that even a simple NN model can achieve performance on par with existing models. Interestingly, even though the NN model takes only five input par rate and volatility features, its prediction of ΔCC is close to that of MOATS, whose output depends on the Monte Carlo interest rate paths generated by the LMM.

The results we present are encouraging first steps and certainly point to greater roles that AI can play in MBS pricing and analytics. An obvious next step is to see how NN and MOATS compare in the case of more general rate/volatility changes (i.e. not limited to actualised daily changes). If it can be verified that the two approaches are comparable for general rate/volatility change scenarios, the possibility would open up for YB to offer NN as a simplified modelling alternative for CC projection that could deliver reliable MBS pricing and scenario analytics while consuming significantly less computing resources than MOATS.

Before closing, we provide further empirical observations in Figure 3 to stress the point that the constant-spread model, or even a model that takes more par swap rates (but not swaption-implied volatilities) into account, does not incorporate all the information relevant to predicting CC changes. The scatter plot of the [CC – 10-year par swap rate] spread vs. the 1y10y normal volatility shows a clear correlation between the two quantities. For the specific task of predicting daily CC changes, the residual error of the constant-spread model (that is, ΔCC − Δpar10y) still exhibits a decent correlation with Δvol1y10y, whereas virtually no correlation exists between Δvol1y10y and the residual error of the MOATS or NN model. Thus, the variation of volatility clearly contributes information about CC variation that predictive models should take advantage of – noticeably so already in the daily-change case, and even more so in the general case.
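
A minimal sketch of this residual check (series names are placeholders continuing the sketches above):

    import numpy as np

    # Residual of the constant-spread model: the part of the daily CC move
    # not explained by the 10-year par swap rate move.
    resid_const = actual_dcc - d_par10y

    # Pearson correlation of each model's residual with the 1y10y vol change.
    for name, resid in [("constant-spread", resid_const),
                        ("MOATS", actual_dcc - moats_pred),
                        ("NN", actual_dcc - nn_pred)]:
        corr = np.corrcoef(resid, d_vol1y10y)[0, 1]
        print(f"{name}: corr(residual, d_vol1y10y) = {corr:.3f}")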

Figure 3: Scatter Plots Illustrating Dependency of CC Movement on Volatilities

