Recent publication
Using automatically created confidence measures in the transcription of financial events
As part of the London Stock Exchange Group (LSEG), LSEG Data & Analytics provides approximately 40,000 transcripts of financial events per year. These are produced by our Data & Analytics Content Operations team, a group of highly skilled domain experts. Given the volume of events covered and the rigorous standard of quality required, we continually look for ways to increase the efficiency and accuracy of the workflow used by our transcripts production team. One such development is the use of automatically created confidence measures, a project that draws on machine learning.
Machine learning is used for a multitude of natural language processing (NLP) tasks. As automatically generated predictions have become increasingly pervasive, assessing the “correctness” of these predictions has become a dedicated research area of its own. Such measures of correctness represent the confidence that a system has in the decisions it makes, and they have wide-ranging implications across machine learning applications, from self-driving vehicles to medical diagnosis.
In the domain of automatic speech recognition (ASR), confidence measures represent the probability that the output text is the correct transcript of what was said. These confidence measures can be generated at a variety of granularities, from the sub-word level to the document level. There are various ways of deriving such measures; the appropriate solution depends on the data available, the machine learning architecture being used, and the downstream use case.
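To make the idea of granularity concrete, the following is a minimal sketch of how word-level confidences might be rolled up into a single document-level score. It is an illustration only and not the method described in the paper: the function names, the choice of arithmetic and geometric means, and the `floor` parameter are all assumptions introduced here.

```python
import math
from typing import Sequence


def _validate(confidences: Sequence[float]) -> None:
    """Reject empty input and values outside the probability range."""
    if not confidences:
        raise ValueError("at least one confidence value is required")
    if any(not 0.0 <= c <= 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")


def mean_confidence(word_confidences: Sequence[float]) -> float:
    """Arithmetic mean of word-level confidences (hypothetical aggregation)."""
    _validate(word_confidences)
    return sum(word_confidences) / len(word_confidences)


def geometric_mean_confidence(
    word_confidences: Sequence[float], floor: float = 1e-6
) -> float:
    """Geometric mean, which penalises a few very uncertain words more heavily.

    `floor` is an assumed lower clamp that avoids taking log(0).
    """
    _validate(word_confidences)
    log_sum = sum(math.log(max(c, floor)) for c in word_confidences)
    return math.exp(log_sum / len(word_confidences))


if __name__ == "__main__":
    # Made-up word-level scores for a four-word utterance.
    scores = [0.98, 0.95, 0.40, 0.99]
    print(f"arithmetic mean: {mean_confidence(scores):.3f}")
    print(f"geometric mean:  {geometric_mean_confidence(scores):.3f}")
```

The geometric mean never exceeds the arithmetic mean, so it may be the more cautious choice when a handful of badly recognised words should pull the document score down.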
This paper describes research undertaken by LSEG’s in-house speech processing experts, the Centre of Expertise in Spoken Language Technologies (CESLT), to provide confidence measures for the output of their ASR system. The work was conducted as part of a broader project that established a pipeline for automatically transcribing financial events. The automatically generated output is manually corrected by our domain experts in a purpose-built UI. Providing document-level confidence measures made the allocation of resources to this correction work more efficient.
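As a rough illustration of how a document-level confidence score could inform the allocation of correction work, the sketch below splits draft transcripts into two queues and orders each with the least confident drafts first. The `DraftTranscript` type, the `triage` function, and the 0.85 threshold are hypothetical and are not taken from the paper or from LSEG's production workflow.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DraftTranscript:
    """An automatically generated transcript awaiting manual correction."""
    event_id: str
    confidence: float  # document-level confidence in [0, 1]


def triage(
    drafts: Iterable[DraftTranscript], threshold: float = 0.85
) -> Tuple[List[DraftTranscript], List[DraftTranscript]]:
    """Split drafts into (light_review, full_review) queues.

    Drafts at or above the assumed threshold go to the light-review queue;
    the rest go to full review. Each queue is sorted so that the least
    confident drafts come first.
    """
    light: List[DraftTranscript] = []
    full: List[DraftTranscript] = []
    for draft in drafts:
        (light if draft.confidence >= threshold else full).append(draft)
    light.sort(key=lambda d: d.confidence)
    full.sort(key=lambda d: d.confidence)
    return light, full


if __name__ == "__main__":
    # Made-up drafts for illustration.
    queue = [
        DraftTranscript("event-a", 0.95),
        DraftTranscript("event-b", 0.60),
        DraftTranscript("event-c", 0.80),
    ]
    light_review, full_review = triage(queue)
    print("light review:", [d.event_id for d in light_review])
    print("full review: ", [d.event_id for d in full_review])
```

In practice any such threshold would need to be calibrated against how well the confidence scores track the real correction effort.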
In this paper, we discuss the research that was conducted into this capability. We also outline some of the experiments that were carried out and how their results shaped the project. Finally, we discuss some of the wider implications of this project, along with suggested next steps.