Data & Analytics Insights

Predicting daily current coupon changes using neural networks

Yishul Wei 

Senior Analyst, Research at LSEG Analytics

Seunghyeon Son

Head of Fixed-income Pricing and Modeling at Yield Book, part of LSEG's Data & Analytics Division

MBS prepayment cashflow modelling critically depends on accurate projections of the current coupon (CC) along simulated interest rate paths. Underlying the Yield Book’s market-leading MBS analytics is the MOATS model, which generates such projections. The recent success of AI in other domains suggests that equally high-performing models for CC projection may be developed with general-purpose frameworks such as neural networks (NN). In this report we explore this potential by developing an NN model for the task of predicting daily CC changes and comparing its performance with both the benchmark constant-spread model and MOATS.

Introduction

In pricing mortgage-backed securities (MBS), the current coupon (CC) and its future projections along interest rate paths are key factors determining the modelled prepayment behaviour (and hence the modelled future cashflows). The CC is the coupon of a hypothetical to-be-announced (TBA) contract that settles in 30 days and has a present value of 102 or 100. (The corresponding CC is also known as CMM102 and CMM100, respectively. In this report, we focus on CMM102 since it has rich historical data compared with CMM100, which was reintroduced a few years ago.) Its value is interpolated from the available TBA market prices and is intended to represent the fair yield of new mortgages. (The technical definition of CC is not particularly relevant for this research report, and we refer interested readers to our previous publications [1,2] for more details.)

The LSEG Yield Book (YB) MBS pricing suite offers users two alternatives to produce CC projections along interest rate paths generated by the LIBOR Market Model (LMM). The first is the constant-spread model, which assumes a constant spread (which YB can extract from closing market quotes) between the CC and the 10-year par swap rate. The other is the Mortgage Option-Adjusted Term Structure (MOATS) model, which utilizes a simplified prepayment model and a backward induction methodology to systematically derive projected TBA prices along the interest rate paths, from which the CC projections can be calculated according to the above definition. These models have been documented extensively in our previous publications [1,2].

Our previous publications have also mentioned regression-based approaches as possible alternatives for generating CC projections, but have not quantitatively evaluated how such approaches compare with the two models above. Regression-based approaches can easily incorporate variables other than the 10-year swap rate into the model that determines CC projections along the paths, yet are typically much more computationally efficient than MOATS. Traditional statistical regression, however, has the drawback that the modelling framework is usually rigid in the assumed functional form relating output and input variables. (In other words, it requires very strong statistical assumptions.) For example, the commonly adopted linear regression assumes that the relation between output and input variables is linear, which may not be the case.

Neural networks (NN) are more sophisticated regression models with a higher level of flexibility than traditional regression. They are also the workhorse driving the recent artificial intelligence (AI) revolution. NN loosen the statistical assumption regarding the functional form of the relation between output and input variables, as the number and sizes (i.e. dimensions) of the “hidden layers” can be freely adjusted. (However, if too much flexibility is allowed, overfitting becomes likely. The standard practice is to use cross-validation to select the appropriate model size, which is also the approach we take below.)

Although NN are usually regarded as “black box” models, the AI community has recently proposed methodologies intended to make NN more explainable. For example, Shapley Additive Explanations (SHAP) values have become a widely adopted metric for evaluating the contribution of each input variable to the prediction output.

This report presents our work on comparing three approaches to the task of predicting daily CC changes:

  1) Constant-spread model
  2) MOATS
  3) NN

Essentially, the research question is: Given the changes of par swap rates and swaption-implied volatilities from those of the previous day, how well does each model predict the change of CC from that of the previous day? Our previous publication [2] has reported on the comparison between the performance of 1) and 2) on the same task. By adding 3) to the comparison, this report intends to demonstrate, at least at a proof-of-concept level, that NN can effectively capture more details than the constant-spread model about the dependency of CC on rate dynamics, yet be more computationally efficient – and conceptually more straightforward to understand through tools like SHAP – than MOATS.

Data

The daily closing values of par swap rates, normal swaption-implied volatilities, and CC from 10/03/2016 to 12/31/2024 were extracted from the YB database. (10/03/2016 is the first day for which the normal LMM model is available in the current YB system.) All data were converted to daily changes by subtracting the corresponding previous day’s closing values from the daily closing values. In what follows, Δvariable denotes the daily change of variable.
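
As a minimal sketch of this preprocessing step (in Python, with a placeholder file name and column labels rather than the actual YB database schema), the daily changes can be obtained with a one-day difference:

    import pandas as pd

    # Hypothetical input: daily closing values indexed by date, one column per
    # quantity (e.g. "par10y", "vol1y10y", "cc"); names are illustrative only.
    closes = pd.read_csv("yb_daily_closes.csv", index_col="date", parse_dates=True)

    # Daily change = today's close minus the previous day's close; the first
    # row has no previous day and is dropped.
    deltas = closes.diff().dropna()
    deltas = deltas.add_prefix("d_")  # "d_par10y" stands for the Δpar10y of the text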

Note that for the constant-spread model, the predicted ΔCC is exactly the daily change of the 10-year par swap rate (Δpar10y). For MOATS, the rates and volatilities first need to be fed into the LMM to generate the Monte Carlo interest rate paths, which are then used to derive the predicted CC through backward induction. An NN can take multiple input features and is thus more flexible than the constant-spread model, while avoiding the need for a full-fledged term structure model such as the LMM.

Development of the NN Model

In total, 2,060 daily-change observations are available, of which 1,750 (85%; the “development set”, 10/04/2016 to 10/05/2023) were used for model development, and the remaining 310 (15%; the “test set”, 10/06/2023 to 12/31/2024) were reserved for evaluating out-of-sample prediction.

Because many par rates and normal volatilities exhibit high historical mutual correlations (> 0.9), and initial experimentation showed that adding highly correlated input features tends to harm NN performance, we restricted the pool of candidate input features to a few key par rate (3m, 6m, 1y, 3y, 10y) and normal volatility (1y1y, 1y10y, 5y5y, 5y10y, 10y10y) daily changes. We considered only single-hidden-layer NN given our small data size. (Increasing the number of hidden layers increases model complexity and would usually require a larger dataset to achieve decent performance.) The activation function of the hidden layer was chosen to be tanh.

To determine the final set of input features, the size of the hidden layer, and the number of training epochs, we employed five-fold cross-validation as follows. The development set was further divided into five parts. Given a set of input features and a hidden layer size, five NN were trained, each on a different four of the five parts, and each was evaluated on the remaining part. The number of training epochs was optimized individually for each of the five networks based on performance over its evaluation part. The percentage of explained variance over the evaluation part, averaged across the five networks, was then taken as the score for that set of input features and hidden layer size. The combination of input features and hidden layer size that yielded the highest score was used for our final NN predictions.
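
The report does not specify the software stack used; the sketch below illustrates the scheme with scikit-learn’s MLPRegressor (one tanh hidden layer), where warm-started one-epoch fits allow the number of training epochs to be tuned against each fold’s evaluation part. It is an assumption-laden illustration, not the production code.

    import numpy as np
    from copy import deepcopy
    from sklearn.model_selection import KFold
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import explained_variance_score

    def cross_validate(X, y, hidden_size, n_folds=5, max_epochs=300):
        """Average evaluation-part explained variance for one candidate
        (feature set, hidden layer size), plus the five fitted networks,
        each stopped at its own best epoch. A sketch, not YB code."""
        scores, nets = [], []
        for train_idx, val_idx in KFold(n_splits=n_folds).split(X):
            net = MLPRegressor(hidden_layer_sizes=(hidden_size,), activation="tanh",
                               solver="adam", max_iter=1, warm_start=True)
            best_score, best_net = -np.inf, None
            for _ in range(max_epochs):               # each fit() call adds one epoch
                net.fit(X[train_idx], y[train_idx])
                score = explained_variance_score(y[val_idx], net.predict(X[val_idx]))
                if score > best_score:                # keep the best-epoch snapshot
                    best_score, best_net = score, deepcopy(net)
            scores.append(best_score)
            nets.append(best_net)
        return float(np.mean(scores)), nets

    # A grid over candidate feature subsets and hidden layer sizes would then
    # select the combination with the highest average score.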

In this way we fixed the final set of input features to be {Δpar1y, Δpar3y, Δpar10y, Δvol1y10y, Δvol5y10y} and the hidden layer size to be 80.

Figure 1 shows the cross-validation explained variance percentage over the training and evaluation parts, with the input features fixed but with different hidden layer sizes. There is a trend of improving evaluation performance with increasing hidden layer size up to 80, after which the performance becomes unstable (likely because our data size is insufficient for fitting larger models).

Note that, because we optimized the numbers of training epochs against evaluation data, the curve for explained variance over the training data does not show the typical monotonically increasing pattern that one might expect for machine learning models.

Figure 1: NN Explained Variance

In the following, the reported performance of NN is based on the averaged prediction of the five optimized networks obtained in the cross-validation procedure.
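
Concretely, continuing the hypothetical sketch above, this averaging amounts to:

    import numpy as np

    def ensemble_predict(nets, X):
        # Average the predictions of the five cross-validation networks.
        return np.mean([net.predict(X) for net in nets], axis=0)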

Performance of ΔCC Prediction

Table 1 displays the total variance of ΔCC and the mean squared difference (MSD) between the actualised ΔCC and the prediction made by each approach, across the development and test sets. We can see that for in-sample prediction (development set) the NN model fits the data better than the other models, whereas for out-of-sample prediction (test set) the performance of the NN model lies in between those of the other models.

Table 1: Overall Variance of ΔCC and the Prediction Performance of Each Model

                                               Development    Test
Historical variance of actualised ΔCC          0.005162       0.0044
MSD constant-spread model vs. actualised ΔCC   0.001778       0.000869
MSD MOATS vs. actualised ΔCC                   0.001771       0.000643
MSD NN vs. actualised ΔCC                      0.001403       0.000721
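
The MSD entries in Tables 1 and 2 are plain mean squared differences between two aligned daily series; as a sketch (series names are placeholders):

    import numpy as np

    def msd(a, b):
        # Mean squared difference between two aligned daily series.
        return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

    # Table 1: e.g. msd(actual_dcc, nn_pred); Table 2: e.g. msd(moats_pred, nn_pred).
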
We can also look at the MSD between the models’ predictions, as shown in Table 2. Interestingly, the mutual differences between the predictions are much smaller than the differences between the actualised ΔCC and the predictions above. Also, the MSD between the NN prediction and either of the other models’ predictions is about half of the MSD between the other two models’ predictions. It is worth noting that although the NN were trained to predict the actualised ΔCC and did not “see” the MOATS output during training, the MSD between the NN and MOATS predictions ended up much smaller than the MSD between either prediction and the actualised ΔCC. This suggests that the information in the input features that is relevant to CC movement, and that each model captures, overlaps significantly. Furthermore, the NN prediction lies somewhere in the “middle ground” between the other two models’ predictions in the high-dimensional feature space.

Table 2: Differences Between Model Predictions

                                        Development    Test
MSD constant-spread model vs. MOATS     0.000603       0.000478
MSD constant-spread model vs. NN        0.000352       0.000207
MSD MOATS vs. NN                        0.000289       0.000187

Feature Contributions to Prediction per SHAP

SHAP values are now commonly reported in the AI literature to provide explainable insights into NN models. For each predicted value given by a model, each input feature’s SHAP value is a metric signifying its contribution to the prediction output. (The SHAP values of all features, together with a base value, sum up to the predicted value.)
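
As a hedged sketch of how such values can be obtained with the open-source shap package (the feature names, X_dev, X_all, nets and ensemble_predict continue the hypothetical sketches above):

    import numpy as np
    import shap

    feature_names = ["d_par1y", "d_par3y", "d_par10y", "d_vol1y10y", "d_vol5y10y"]

    # KernelExplainer treats the ensemble as a black box; a subsample of the
    # development set serves as the background distribution.
    background = X_dev[np.random.choice(len(X_dev), 100, replace=False)]
    explainer = shap.KernelExplainer(lambda X: ensemble_predict(nets, X), background)

    shap_values = explainer.shap_values(X_all)  # one row per day, one column per feature

    # Beeswarm-style summary of each feature's contributions across the dataset.
    shap.summary_plot(shap_values, X_all, feature_names=feature_names)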

When we plot each input feature’s SHAP values across the entire dataset (Figure 2), we see that Δpar10y is the main driver of most of the prediction results. This corroborates the observation above that the MSD between the constant-spread and NN models is small. However, the other features also make non-trivial contributions, at least for a portion of the data. This is information that is captured by the NN model but missed by the constant-spread model.

The majority of the SHAP values for the features Δvol1y10y, Δvol5y10y, and Δpar1y are close to 0. This means that an NN trained without these features could already be expected to give decent predictions for most days. However, for a minority of days these features become key contributors to the predicted values. This illustrates how SHAP analysis can help us attain a deeper understanding of the role that each variable plays in driving CC movement.

Figure 2: SHAP Contributions of Each Input Feature to NN Predicted Values over the Entire Dataset


Summary

In recent years, the potential of AI to uncover insights within huge amounts of data has gained traction across the financial community. When it comes to MBS, however, explorations in this direction have been scarce. We present a task – predicting daily CC changes – for which we demonstrate that even a simple NN model can achieve performance on par with existing models. Interestingly, even though the NN model takes only five input par rate and volatility features, its prediction of ΔCC is close to that of MOATS, whose output depends on the Monte Carlo interest rate paths generated by the LMM.

The results we present are encouraging first steps and certainly point to greater roles that AI can play in MBS pricing and analytics. An obvious next step is to see how NN and MOATS compare in the case of more general rate/volatility changes (i.e. not limited to actualised daily changes). If it can be verified that the two approaches are comparable for general rate/volatility change scenarios, the possibility would open up for YB to offer NN as a simplified modelling alternative for CC projection that could deliver reliable MBS pricing and scenario analytics while consuming significantly less computing resources than MOATS.

Before closing, we provide further empirical observations in Figure 3 to stress the point that the constant-spread model, or even a model that takes more par swap rates (but not swaption-implied volatilities) into account, does not incorporate all the information relevant to predicting CC changes. The scatter plot of the [CC – 10-year par swap rate] spread vs. the 1y10y normal volatility shows a clear correlation between the two quantities. For the specific task of predicting daily CC changes, the residual error of the constant-spread model (that is, ΔCC − Δpar10y) still exhibits a decent correlation with Δvol1y10y, whereas virtually no correlation exists between Δvol1y10y and the residual error of the MOATS or NN model. Thus, the variation of volatility clearly contributes information about CC variation that predictive models should take advantage of – noticeably so already in the daily-change case, and even more so in the general case.
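
A minimal sketch of this residual check (series names are placeholders continuing the sketches above):

    import numpy as np

    # Residual of the constant-spread model: the part of the daily CC move
    # not explained by the 10-year par swap rate move.
    resid_const = actual_dcc - d_par10y

    # Pearson correlation of each model's residual with the 1y10y vol change.
    for name, resid in [("constant-spread", resid_const),
                        ("MOATS", actual_dcc - moats_pred),
                        ("NN", actual_dcc - nn_pred)]:
        corr = np.corrcoef(resid, d_vol1y10y)[0, 1]
        print(f"{name}: corr(residual, d_vol1y10y) = {corr:.3f}")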

Figure 3: Scatter Plots Illustrating Dependency of CC Movement on Volatilities

