LSEG Labs Project: ESG Controversy Prediction
Using machine learning and natural language processing (NLP) to detect ESG controversies in company news
Environmental, social and governance (ESG) data is crucial to financial markets as investors take a more sustainable approach to achieving their investment goals.
Investors can review companies’ ESG disclosures, but they typically also want access to information that is not reported and that may indicate ESG controversies.
These controversies could include toxic waste spills (environmental), human rights violations (social), or corrupt CEOs (governance).
Unlike traditional financial information, ESG data is often unstructured and sourced from companies’ self-reported data and news articles. Right now, analyzing this data is a challenging and manual task, even though ESG controversies could have a significant impact on investment performance.
LSEG Labs used a combination of supervised machine learning and natural language processing (NLP) to train an algorithm to detect ESG controversies in unstructured data.
The algorithm automatically classifies whether an article refers to any of 20 ESG controversy topics defined in-house and, where it does, provides a probability score for each of those topics.
Where the probability sits above a confidence threshold, the prediction proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
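The routing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the LSEG Labs implementation: the threshold value, the function name `route_predictions` and the topic names are all assumptions, since the text does not state them.

```python
from typing import Dict, Tuple

# Hypothetical value; the confidence threshold actually used is not stated.
CONFIDENCE_THRESHOLD = 0.9


def route_predictions(
    topic_probabilities: Dict[str, float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split the topic predictions for one article into two groups.

    Predictions whose probability sits above the threshold go straight
    through; everything else is queued for human analyst review.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    straight_through: Dict[str, float] = {}
    analyst_review: Dict[str, float] = {}
    for topic, probability in topic_probabilities.items():
        if probability > threshold:
            straight_through[topic] = probability
        else:
            analyst_review[topic] = probability
    return straight_through, analyst_review


# Illustrative scores for a single article (topic names are made up).
example_scores = {"toxic_waste_spill": 0.97, "human_rights_violation": 0.62}
auto, review = route_predictions(example_scores)
```

With these illustrative scores, the first topic would pass straight through and the second would be sent for review. In practice the threshold would be tuned against how much analyst capacity is available.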
The prototype could give investors a clear view of controversy predictions so they can make informed, sustainable and proactive decisions quickly and efficiently.
ESG Controversy Prediction in action
The supervised machine learning element of the project relied on Refinitiv’s leading financial news data.
The prototype includes:
- Google’s open-source NLP model, BERT (Bidirectional Encoder Representations from Transformers), a neural language model pretrained on 3.3 billion words of general-domain text from English Wikipedia and the BookCorpus dataset.
- LSEG Labs further trained BERT to make it domain-specific to finance and business, by using content from the Reuters News Archive, adding 715 million words from about two million articles covering business and financial news.
- The team then fine-tuned the model with 31,600 annotated news articles to classify ESG controversy topics and improved the performance of the prototype beyond that of the basic BERT model.
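To make the fine-tuning step more concrete, the sketch below shows one common way to turn annotated articles into training targets for a topic classifier. It assumes a multi-label setup in which an article can carry several topics at once; the text does not describe the label format the team used, and the topic names and the `encode_labels` helper are hypothetical.

```python
from typing import Iterable, List, Sequence

# Hypothetical subset of topic names; the in-house list of 20 topics
# is not given in the text.
TOPICS = ["toxic_waste_spill", "human_rights_violation", "executive_corruption"]


def encode_labels(
    article_topics: Iterable[str],
    topics: Sequence[str] = TOPICS,
) -> List[float]:
    """Return a multi-hot vector with 1.0 for each topic annotated on the article."""
    present = set(article_topics)
    unknown = present - set(topics)
    if unknown:
        raise ValueError(f"Unknown topics: {sorted(unknown)}")
    return [1.0 if topic in present else 0.0 for topic in topics]


# An article annotated with a single social controversy topic.
target = encode_labels(["human_rights_violation"])
```

A vector like this, paired with the article text, is the kind of input a classification head on top of a BERT-style model is typically trained against, with one output per topic.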
What we’re thinking next
ESG Controversy Prediction is just one example of how Google’s open-source NLP model can be further pretrained on high-quality financial data to outperform existing language models in downstream tasks, such as classification and sentiment analysis.
LSEG Labs is currently working to train two domain-specific versions of Google’s BERT language model using the extensive Refinitiv News and Transcripts datasets. Find out more about their work here.