Recent publication

Using automatically created confidence measures in the transcription of financial events

Overview

As part of the London Stock Exchange Group (LSEG), LSEG Data & Analytics provides approximately 40,000 transcripts of financial events per year. These are produced by our Data & Analytics Content Operations team, a group of highly skilled domain experts. Given the volume of events covered and the rigorous standard of quality required, we continually look for ways to increase the efficiency and accuracy of the workflow used by our transcripts production team. One such development is the use of automatically created confidence measures, a project that draws on machine learning.

Machine learning is used for a multitude of natural language processing (NLP) tasks. As automatically generated predictions have become pervasive, assessing the “correctness” of these predictions has become a dedicated research area in its own right. Such measures of correctness represent the confidence that a system has in the decisions it makes, and they have implications across machine learning applications ranging from self-driving vehicles to medical diagnosis.

In the domain of automatic speech recognition (ASR), confidence measures represent the probability that the output text is the correct transcript of what was said. These confidence measures can be generated at a variety of granularities, from the sub-word level to the document level. There are several ways of deriving such measures: the most suitable approach depends on the data available, the machine learning architecture being used, and the downstream use case.
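To make the idea of granularity concrete, the following is a minimal sketch of how confidences at a fine level might be rolled up to coarser levels. It is an illustration only and does not describe the method used in the paper: the function names, the geometric mean for words, and the arithmetic mean for documents are all assumptions chosen for simplicity.

```python
import math
from typing import List


def word_confidence(subword_probs: List[float]) -> float:
    """Combine sub-word probabilities into one word-level score.

    Assumption: a geometric mean, so that one very unlikely sub-word
    pulls the whole word down. Other aggregations are equally plausible.
    """
    if not subword_probs:
        raise ValueError("a word needs at least one sub-word probability")
    # Clamp to avoid log(0) for sub-words the model considers impossible.
    log_sum = sum(math.log(max(p, 1e-12)) for p in subword_probs)
    return math.exp(log_sum / len(subword_probs))


def document_confidence(word_confs: List[float]) -> float:
    """Combine word-level scores into one document-level score.

    Assumption: a plain arithmetic mean over all words in the transcript.
    """
    if not word_confs:
        raise ValueError("a document needs at least one word")
    return sum(word_confs) / len(word_confs)


# Hypothetical sub-word probabilities for a three-word utterance.
utterance = [[0.9, 0.8], [0.95], [0.6, 0.7, 0.9]]
word_scores = [word_confidence(probs) for probs in utterance]
doc_score = document_confidence(word_scores)
```

In practice the choice of aggregation would depend on the factors named above: the data available, the model architecture, and what the score is used for downstream.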

This paper describes the research undertaken by LSEG’s in-house speech processing experts, the Centre of Expertise in Spoken Language Technologies (CESLT), to provide confidence measures for the output of their ASR system. This work was conducted as part of a broader project that established a pipeline for automatically transcribing financial events. The automatically generated output is manually corrected by our domain experts in a purpose-built UI. Providing document-level confidence measures made the allocation of this correction work more efficient.
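As a rough illustration of how a document-level score could inform the allocation of correction work, the sketch below sorts transcripts by confidence and splits them into two review queues. The function name, the queue names, and the 0.85 threshold are hypothetical and are not taken from the paper or from LSEG's production workflow.

```python
from typing import Dict, List


def prioritise_for_review(
    doc_confidences: Dict[str, float],
    threshold: float = 0.85,  # hypothetical cut-off, not from the paper
) -> Dict[str, List[str]]:
    """Split transcripts into review queues by document-level confidence.

    Transcripts are ordered from least to most confident, so the ones
    most likely to need heavy correction appear first in each queue.
    """
    ranked = sorted(doc_confidences, key=lambda doc: doc_confidences[doc])
    return {
        "full_review": [d for d in ranked if doc_confidences[d] < threshold],
        "light_review": [d for d in ranked if doc_confidences[d] >= threshold],
    }


# Hypothetical event identifiers and scores.
queues = prioritise_for_review(
    {"event_a": 0.90, "event_b": 0.60, "event_c": 0.80}
)
```

Every transcript still reaches a human editor in this sketch; the score only changes the order and the expected depth of the review.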

In this paper, we discuss the research conducted into this capability, outline some of the experiments that were carried out and how their results shaped the project, and close with the wider implications of the project and some suggested next steps.
