
Yishul Wei

Seunghyeon Son
MBS prepayment cashflow modeling critically depends on accurate projections of the current coupon (CC) along simulated interest rate paths. Underlying the Yield Book’s market-leading MBS analytics is the MOATS model for the generation of such projections. The recent success of AI witnessed in other domains suggests the possibility that equally high-performing models for CC projection may be developed through universal AI frameworks such as neural networks (NN). In this report we explore this potential by developing an NN model for the task of predicting daily CC changes and comparing its performance with the benchmark constant-spread model as well as MOATS.
Introduction
In pricing mortgage-backed securities (MBS), the current coupon (CC) and its future projections along interest rate paths are key factors determining the modelled prepayment behaviour (and hence the modelled future cashflows). The CC is the coupon of a hypothetical to-be-announced (TBA) contract that settles in 30 days and has present value 102 or 100. (The corresponding CC is also known as CMM102 and CMM100, respectively. In this report, we focus on CMM102 since it has rich historical data as compared with CMM100 which was reintroduced a few years ago.) Its value is interpolated from the available TBA market prices and is intended to represent the fair yield of new mortgages. (The technical definition of CC is not particularly relevant for this research report, and we refer interested readers to our previous publications [1,2] for more details.)
The LSEG Yield Book (YB) MBS pricing suite offers users two alternatives to produce CC projections along interest rate paths generated by the LIBOR Market Model (LMM). The first is the constant-spread model that assumes a constant spread (which YB can extract from closing market quotes) between the CC and the 10-year par swap rate. The other is the Mortgage Option-Adjusted Term Structure (MOATS) model that utilizes a simplified prepayment model and a backward induction methodology to systematically derive projected TBA prices along the interest rates paths, from which the CC projections can be calculated according to the above definition. These models have been documented extensively in our previous publications [1,2].
Our previous publications have also mentioned regression-based approaches as possible alternatives for generating CC projections but have not quantitatively evaluated how such approaches compare with the two models above. Regression-based approaches can easily incorporate variables other than the 10-year swap rate into the model that determines CC projections along the paths, yet are typically much more computationally efficient than MOATS. Traditional statistical regression, however, has the drawback that the modelling framework is usually rigid in the assumed functional form relating output and input variables. (In other words, it requires very strong statistical assumptions.) For example, commonly adopted linear regression assumes that the relation between output and input variables is linear, which may not be the case.
Neural networks (NN) are more sophisticated regression models equipped with higher levels of flexibility than traditional regression. They are also the workhorse driving the recent artificial intelligence (AI) revolution. NN loosen the statistical assumption regarding the functional form of the relation between output and input variables, as the number and sizes (i.e. dimensions) of the “hidden layers” can be freely adjusted. (However, if too much flexibility is allowed, overfitting becomes likely. The standard practice is to use cross-validation to select an appropriate model size, and this is the approach we take below.)
Although NN are usually regarded as “black box” models, the AI community has recently proposed methodologies that intend to make NN more explainable. For example, Shapley Additive Explanations (SHAP) have become a widely adopted metric for evaluating the contribution of each input variable to the prediction output.
This report presents our work on comparing three approaches to the task of predicting daily CC changes:
- Constant Spread Model
- MOATS
- NN
Essentially, the research question is: given the changes of par swap rates and swaption-implied volatilities from those of the previous day, how well does each model predict the change of CC from that of the previous day? Our previous publication [2] has reported on the comparison between the constant-spread model and MOATS on this same task. By adding the NN to the comparison, this report intends to demonstrate, at least at a proof-of-concept level, that NN can effectively capture more details than the constant-spread model about the dependency of CC on rate dynamics, yet be more computationally efficient, and conceptually more straightforward to understand through tools like SHAP, than MOATS.
Data
The daily closing values of par swap rates, normal swaption-implied volatilities, and CC from 10/03/2016 to 12/31/2024 were extracted from the YB database. (10/03/2016 is the first day for which the normal LMM model is available in the current YB system.) All data were converted to daily changes by subtracting from each daily closing value the corresponding previous day’s closing value. Below, Δvariable denotes the daily change of variable.
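As a minimal sketch of this preprocessing step, daily changes can be obtained from a table of daily closes with a one-day difference. The column names and values below are illustrative, not the actual YB database schema:

```python
import pandas as pd

# Hypothetical daily closing levels; names and values are illustrative.
closes = pd.DataFrame(
    {"par10y": [3.10, 3.15, 3.08], "vol1y10y": [95.0, 96.5, 96.0]},
    index=pd.to_datetime(["2016-10-03", "2016-10-04", "2016-10-05"]),
)

# Daily change: today's close minus the previous day's close.
# The first row has no previous day and is dropped.
deltas = closes.diff().dropna()
print(deltas)
```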
Note that for the constant spread model, the predicted ΔCC is exactly the daily change of the 10-year par swap rate (Δpar10y). For MOATS, the rates and volatilities first need to be fed into the LMM to generate the Monte Carlo interest rate paths, which are then used to derive the predicted CC through backward induction. The NN can take multiple input features and are thus more flexible than the constant-spread model, while avoiding the need for full-fledged term structure modelling such as the LMM.
Development of the NN Model
In total, 2060 daily change data are available, of which 1750 (85%, “development set”, 10/04/2016 up to 10/05/2023) were used for model development, and the remaining 310 (15%, “test set”, 10/06/2023 up to 12/31/2024) were used for later evaluation of out-of-sample prediction.
Because many par rates and normal volatilities demonstrate high historical mutual correlations (> 0.9), and we observed through initial experimentation that adding highly correlated input features tends to harm NN performance, we restricted the pool of potential input features to a few key par rate (3m, 6m, 1y, 3y, 10y) and normal volatility (1y1y, 1y10y, 5y5y, 5y10y, 10y10y) daily changes. We considered only single-hidden-layer NN given our small data size. (Increasing the number of hidden layers increases model complexity and would usually require a larger dataset to achieve decent performance.) The link function of the hidden layer was chosen to be tanh.
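The architecture just described amounts to a plain forward pass through one tanh layer followed by a linear output. The sketch below uses randomly initialized weights as stand-ins for trained parameters; the dimensions (5 input features, hidden size 80) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-hidden-layer tanh network: 5 inputs, 80 hidden units, 1 output.
# Weights here are random stand-ins, not trained parameters.
n_in, n_hidden = 5, 80
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def predict(x):
    """Forward pass: tanh hidden layer, then linear output (predicted dCC)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

x = rng.normal(size=(3, n_in))  # three hypothetical days of rate/vol changes
print(predict(x).shape)         # (3, 1)
```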
To determine the final set of input features, the size of the hidden layer, and the number of training epochs, we employed five-fold cross-validation as follows. The development set was further divided into five parts. Given a set of input features and a hidden layer size, five NN were trained, each on a different four of the five parts, and evaluated on the remaining part. The number of training epochs was individually optimized for each of the five networks based on performance over its evaluation part. Then, the percentage explained variance over the evaluation part, averaged across the five networks, was taken as the score for that combination of input features and hidden layer size. The combination that yielded the highest score was used for our final NN predictions.
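The scoring loop above can be sketched as follows. To keep the example short, a linear least-squares fit stands in for NN training; the point is the five-fold structure and the explained-variance score on each held-out part. All data are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data: 100 observations, 5 features.
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

k = 5
folds = np.array_split(rng.permutation(len(X)), k)
scores = []
for i in range(k):
    val = folds[i]
    trn = np.concatenate([folds[j] for j in range(k) if j != i])
    # "Train" on four parts (least squares as a stand-in for NN fitting),
    # then score explained variance on the held-out fifth part.
    coef, *_ = np.linalg.lstsq(X[trn], y[trn], rcond=None)
    resid = y[val] - X[val] @ coef
    scores.append(1.0 - resid.var() / y[val].var())

# Average held-out explained variance: the cross-validation score.
print(round(float(np.mean(scores)), 3))
```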
In this way we fixed the final set of input features to be {Δpar1y,Δpar3y,Δpar10y,Δvol1y10y,Δvol5y10y} and hidden layer size to be 80.
Figure 1 shows the cross-validation explained variance percentage over the training and evaluation parts, with the input features fixed but the hidden layer size varied. There is a trend of improving evaluation performance with increasing hidden layer size up to 80, after which the performance becomes unstable (likely because our dataset is too small for fitting larger models).
Note that, because we optimized the numbers of training epochs against evaluation data, the curve for explained variance over the training data does not show the typical monotonically increasing pattern that one might expect for machine learning models.
Figure 1: NN Explained Variance
In the following, the reported performance of NN is based on the averaged prediction of the five optimized networks obtained in the cross-validation procedure.
Performance of ΔCC Prediction
Table 1 displays the total variance of ΔCC and the mean squared difference (MSD) between the actualized ΔCC and the prediction made by each approach, across the development and test sets. We can see that for in-sample prediction (development set) the NN model fits the data better than the other models, whereas for out-of-sample prediction (test set) the performance of the NN model lies in between those of the other models.
Table 1: Overall Variance of and the Prediction Performance of Each Model
| | Development | Test |
| --- | --- | --- |
| Historical variance of actualised ΔCC | 0.005162 | 0.0044 |
| MSD constant-spread model vs. actualised ΔCC | 0.001778 | 0.000869 |
| MSD MOATS vs. actualised ΔCC | 0.001771 | 0.000643 |
| MSD NN vs. actualised ΔCC | 0.001403 | 0.000721 |
Table 2: Differences Between Model Predictions
| | Development | Test |
| --- | --- | --- |
| MSD constant-spread model vs. MOATS | 0.000603 | 0.000478 |
| MSD constant-spread model vs. NN | 0.000352 | 0.000207 |
| MSD MOATS vs. NN | 0.000289 | 0.000187 |
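The MSD statistic used in both tables is simply the mean of squared differences between two series of daily changes. A minimal sketch, with illustrative values rather than the report's data:

```python
import numpy as np

# Two hypothetical series of daily CC changes: actualized vs. predicted.
actual = np.array([0.02, -0.01, 0.03, 0.00])
pred = np.array([0.015, -0.005, 0.025, 0.002])

# Mean squared difference (MSD), as reported in Tables 1 and 2.
msd = float(np.mean((actual - pred) ** 2))
print(msd)
```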
Feature Contributions to Prediction per SHAP
SHAP values are nowadays commonly reported in the AI literature to provide explainable insights into NN models. For each predicted value given by a model, each input feature’s SHAP value is a metric signifying its contribution to the prediction output. (The features’ SHAP values, together with a baseline equal to the model’s average output, sum up to the predicted value.)
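SHAP values are the Shapley values of a game in which features "join" the prediction one coalition at a time. For a model small enough to enumerate coalitions exactly, the additivity property can be verified by brute force. The model, data, and the convention of replacing "missing" features by a background value are all illustrative:

```python
import numpy as np
from itertools import combinations
from math import factorial

# Toy 3-feature model (illustrative, not the report's NN).
def model(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]

background = np.array([0.0, 0.0, 0.0])  # reference point ("feature means")
x = np.array([1.0, 2.0, 3.0])           # the point being explained
n = 3

def value(subset):
    """Model output with features in `subset` set to x, others to background."""
    z = background.copy()
    for i in subset:
        z[i] = x[i]
    return model(z)

# Exact Shapley values via the weighted sum over all coalitions.
phi = np.zeros(n)
for i in range(n):
    for size in range(n):
        for s in combinations([j for j in range(n) if j != i], size):
            w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            phi[i] += w * (value(s + (i,)) - value(s))

# Additivity: baseline + sum of SHAP values equals the model prediction.
baseline = value(())
assert abs(baseline + phi.sum() - model(x)) < 1e-12
print(phi)
```

In practice one would use an approximate explainer from a library such as `shap` rather than this exponential enumeration; the brute-force version just makes the additivity property concrete.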
When we plot each input feature’s SHAP values across the entire dataset, we see that Δpar10y is the main driver of most of the prediction results. This corroborates the observation above that the MSD between the constant-spread and NN models is small. However, the other features also make non-trivial contributions, at least for a portion of the data. This is information that is captured by the NN model but missed by the constant-spread model.
The majority of the SHAP values for the features Δvol1y10y, Δvol5y10y, and Δpar1y are close to 0. This means that an NN trained without these features could already be expected to deliver decent predictions for most days. However, for a minority of days these features become key contributors to the predicted values. This illustrates how SHAP analysis can help us attain a deeper understanding of the role that each variable plays in driving CC movement.
Figure 2: SHAP Contributions of Each Input Feature to NN Predicted Values over the Entire Dataset

Summary
In recent years, the potential of AI to uncover insights within huge amounts of data has gained traction across the financial community. When it comes to MBS, however, explorations in this direction remain scarce. We present a task – predicting daily CC changes – for which we demonstrate that even a simple NN model can achieve performance on par with existing models. Interestingly, we observed that even though the NN model takes only five input par rate and volatility features, its prediction of ΔCC is close to that of MOATS, whose output depends on the Monte Carlo interest rate paths generated by the LMM.
The results we present are encouraging first steps and certainly point to greater roles that AI can play in MBS pricing and analytics. An obvious next step is to see how NN and MOATS compare in the cases of more general rate/volatility changes (i.e. not limited to actualized daily changes). If it can be verified that the two approaches are comparable for general rate/volatility change scenarios, the possibility would be opened up for YB to offer NN as a simplified modeling alternative for CC projection that could deliver reliable MBS pricing and scenario analytics but consumes significantly less computing resources than MOATS.
Before closing, we provide further empirical observations in Figure 3 to stress the point that the constant-spread model, or even a model that takes more par swap rates (but not swaption-implied volatilities) into account, does not incorporate all information relevant to predicting CC changes. The scatter plot of the [CC – 10-year par swap rate] spread vs. the 1y10y normal volatility shows a clear correlation between the two quantities. For the specific task of predicting daily CC changes, the residual error of the constant-spread model (that is, ΔCC – Δpar10y) still exhibits a decent correlation with Δvol1y10y, whereas virtually no correlation exists between Δvol1y10y and the residual errors of the MOATS and NN models. Thus, the variation of volatility clearly carries information about CC variation that predictive models should take advantage of – noticeable already for the daily change case, and even more so for the general case.
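The residual diagnostic described above can be sketched as follows. The data are simulated under the assumption that ΔCC co-moves with Δvol1y10y; in that case the constant-spread residual (ΔCC minus Δpar10y) stays correlated with Δvol1y10y, which is the pattern Figure 3 reports for the historical series:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated daily changes (illustrative, not the report's series):
# dCC tracks dpar10y but also loads on dvol1y10y.
n = 500
dpar10y = rng.normal(scale=0.05, size=n)
dvol = rng.normal(scale=1.0, size=n)
dcc = dpar10y + 0.02 * dvol + rng.normal(scale=0.01, size=n)

# Constant-spread model residual: the part of dCC it cannot explain.
residual = dcc - dpar10y
corr = float(np.corrcoef(residual, dvol)[0, 1])
print(round(corr, 2))
```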
Figure 3: Scatter Plots Illustrating Dependency of CC Movement on Volatilities

Legal Disclaimer
Republication or redistribution of LSE Group content is prohibited without our prior written consent.
The content of this publication is for informational purposes only and has no legal effect, does not form part of any contract, does not, and does not seek to constitute advice of any nature and no reliance should be placed upon statements contained herein. Whilst reasonable efforts have been taken to ensure that the contents of this publication are accurate and reliable, LSE Group does not guarantee that this document is free from errors or omissions; therefore, you may not rely upon the content of this document under any circumstances and you should seek your own independent legal, investment, tax and other advice. Neither We nor our affiliates shall be liable for any errors, inaccuracies or delays in the publication or any other content, or for any actions taken by you in reliance thereon.
Copyright © 2025 London Stock Exchange Group. All rights reserved.
The content of this publication is provided by London Stock Exchange Group plc, its applicable group undertakings and/or its affiliates or licensors (the “LSE Group” or “We”) exclusively.
Neither We nor our affiliates guarantee the accuracy of or endorse the views or opinions given by any third party content provider, advertiser, sponsor or other user. We may link to, reference, or promote websites, applications and/or services from third parties. You agree that We are not responsible for, and do not control such non-LSE Group websites, applications or services.
The content of this publication is for informational purposes only. All information and data contained in this publication is obtained by LSE Group from sources believed by it to be accurate and reliable. Because of the possibility of human and mechanical error as well as other factors, however, such information and data are provided "as is" without warranty of any kind. You understand and agree that this publication does not, and does not seek to, constitute advice of any nature. You may not rely upon the content of this document under any circumstances and should seek your own independent legal, tax or investment advice or opinion regarding the suitability, value or profitability of any particular security, portfolio or investment strategy. Neither We nor our affiliates shall be liable for any errors, inaccuracies or delays in the publication or any other content, or for any actions taken by you in reliance thereon. You expressly agree that your use of the publication and its content is at your sole risk.
To the fullest extent permitted by applicable law, LSE Group, expressly disclaims any representation or warranties, express or implied, including, without limitation, any representations or warranties of performance, merchantability, fitness for a particular purpose, accuracy, completeness, reliability and non-infringement. LSE Group, its subsidiaries, its affiliates and their respective shareholders, directors, officers employees, agents, advertisers, content providers and licensors (collectively referred to as the “LSE Group Parties”) disclaim all responsibility for any loss, liability or damage of any kind resulting from or related to access, use or the unavailability of the publication (or any part of it); and none of the LSE Group Parties will be liable (jointly or severally) to you for any direct, indirect, consequential, special, incidental, punitive or exemplary damages, howsoever arising, even if any member of the LSE Group Parties are advised in advance of the possibility of such damages or could have foreseen any such damages arising or resulting from the use of, or inability to use, the information contained in the publication. For the avoidance of doubt, the LSE Group Parties shall have no liability for any losses, claims, demands, actions, proceedings, damages, costs or expenses arising out of, or in any way connected with, the information contained in this document.
LSE Group is the owner of various intellectual property rights ("IPR”), including but not limited to, numerous trademarks that are used to identify, advertise, and promote LSE Group products, services and activities. Nothing contained herein should be construed as granting any licence or right to use any of the trademarks or any other LSE Group IPR for any purpose whatsoever without the written permission or applicable licence terms.