Financial Language Modelling

LSEG Labs Project: Financial Language Modelling

Creating new NLP models that have a better understanding of financial language

Problem

Open-source models like Google’s BERT-BASE architecture allow for state-of-the-art performance in natural language processing (NLP).

However, BERT-BASE is trained on general-domain text (English Wikipedia and BooksCorpus) and has not been exposed to finance-specific language and semantics. This limits the accuracy that financial data scientists can expect from their machine learning models.

Solution

LSEG Labs saw an opportunity to extend BERT-BASE and create finance-domain specific models that outperform the open-source equivalents by leveraging LSEG’s depth and breadth of unstructured financial data.

The team have trained two domain-specific versions of Google’s BERT language model, BERT-RNA and BERT-TRAN, using extensive News and Transcripts archives.

The models have a better understanding of financial language, produce more accurate word embeddings, and ultimately can improve the performance of downstream tasks such as text classification, topic modelling, auto summarisation and sentiment analysis.
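As an illustration of how embeddings feed a downstream task such as text classification, the sketch below assigns a document to the class whose average embedding it is closest to by cosine similarity. This nearest-centroid approach is an assumption chosen for brevity, not the method LSEG Labs used; the labels, the three-dimensional toy vectors and the function names are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(doc: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical labelled document embeddings (toy values, not model output).
training = {
    "controversy": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "neutral": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}
predicted = classify([0.7, 0.3, 0.1], centroids)
```

In practice the toy vectors would be replaced by embeddings from a pre-trained model, and a stronger classifier would typically be trained on top of them.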

Financial Language Modelling in action

Each of LSEG Labs’ two pre-trained models returns either a single document embedding or a vector of word embeddings:

1. BERT-RNA

This model was pre-trained on the Reuters News Archive, which consists of all Reuters articles published between 1996 and 2019.

On the downstream task of classifying financial news for ESG controversies, BERT-RNA outperformed BERT-BASE by 4% in terms of accuracy.

On the downstream task of classifying COVID-19-related news as either a risk or an opportunity, BERT-RNA again outperformed BERT-BASE by 4% in terms of accuracy.

2. BERT-TRAN

Starting from BERT-BASE, this model was pre-trained on a large corpus of 390,000 earnings call transcripts, totalling 2.9bn words.
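The two output forms mentioned above, one vector per word or a single vector for the whole document, can be illustrated with a minimal sketch. It assumes the document embedding is the mean of the word embeddings (mean pooling), which is a common choice but not something this page confirms for BERT-RNA or BERT-TRAN. The function name `mean_pool` and the toy four-dimensional vectors are hypothetical; real BERT-BASE embeddings have 768 dimensions.

```python
from typing import List

Vector = List[float]


def mean_pool(word_embeddings: List[Vector]) -> Vector:
    """Average word vectors into one document vector (assumed pooling)."""
    if not word_embeddings:
        raise ValueError("need at least one word embedding")
    dim = len(word_embeddings[0])
    if any(len(vec) != dim for vec in word_embeddings):
        raise ValueError("all word embeddings must have the same dimension")
    count = len(word_embeddings)
    # Sum each dimension across words, then divide by the number of words.
    return [sum(vec[i] for vec in word_embeddings) / count for i in range(dim)]


# Toy stand-in for model output: three words, four dimensions each.
word_vectors = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 1.0, 2.0],
]
document_vector = mean_pool(word_vectors)
```

Here `word_vectors` plays the role of the vector of word embeddings and `document_vector` the role of the single document embedding.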

Financial Language Modelling UI

What we’re thinking next

Both models are now available on the Refinitiv Data Platform. LSEG Labs are also giving a small group of customers early access to the new models via a test user interface, which includes tutorials, example training data and use cases. Their feedback will inform the next phase of the Financial Language Modelling project.

This is an early but important step towards scaling the understanding of trends and insights in finance’s unstructured data. The team hope their findings and results continue to help move the performance of BERT forwards in the financial industry.