Tim Anderson
Delta Parquet enables financial services firms to speed up data processing and reduce computation costs for today’s analytics, artificial intelligence (AI) and machine learning (ML) use cases. It builds on the Parquet file format – which provides efficient compression of data via columnar storage – by adding transaction logs, schema enforcement, and performance optimisations that bring more database-like intelligence to the data layer. The result is a high-performance storage format that makes large-scale analytics dramatically faster and more cost-effective.
- Using the Delta Parquet file format can shrink a 1 TB CSV data file by 87% and reduce compute costs by 99.7%
- This efficiency comes from combining columnar storage, efficient compression, and an intelligent transaction layer
- LSEG’s Quantitative Analytics, Tick History, Tick History – PCAP, and Filings over S3 Direct on AWS are now available in Delta Parquet format, enabling faster and more economical computation than ever before.
Modern analytics, AI and ML workloads demand ever larger quantities of high-quality, granular data. As these data sets grow, so do the associated storage and compute expenses. To significantly reduce data storage and processing costs, as well as greatly accelerate processing time, financial services firms are turning to the Delta Parquet file format.
At its core, Delta Parquet is an open-source, column-oriented data file format created for highly efficient data storage and retrieval. It uses Parquet’s proven columnar compression methods and augments them with Delta Lake enhancements such as a transaction log, time-travel capability and metadata-driven query optimisation. As a result, organisations can handle larger datasets, run more complex queries and innovate more quickly.
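To make the transaction log and time-travel ideas more concrete, here is a minimal sketch in plain Python. It is a toy model, not the Delta Lake implementation: the file names are hypothetical, and the log is reduced to the two action types (`add` and `remove`) needed to show how replaying commits up to a given version reconstructs the table as it stood at that point.

```python
import json

# Toy model of a Delta-style transaction log: each commit is a list of JSON
# actions that either add or remove a data file. File names are hypothetical.
LOG = [
    # version 0: initial load
    ['{"add": {"path": "part-000.parquet"}}',
     '{"add": {"path": "part-001.parquet"}}'],
    # version 1: append new data
    ['{"add": {"path": "part-002.parquet"}}'],
    # version 2: rewrite part-000, for example after a correction
    ['{"remove": {"path": "part-000.parquet"}}',
     '{"add": {"path": "part-003.parquet"}}'],
]


def files_at_version(log, version):
    """Replay commits 0..version and return the set of live data files."""
    if not 0 <= version < len(log):
        raise ValueError(f"unknown version {version}")
    live = set()
    for commit in log[: version + 1]:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


latest = files_at_version(LOG, len(LOG) - 1)   # current state of the table
as_of_v0 = files_at_version(LOG, 0)            # "time travel" to version 0
print(sorted(latest))
print(sorted(as_of_v0))
```

Because data files are never edited in place in this model, an older version stays readable for as long as its files are retained, which is what makes point-in-time queries possible.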
Speeding up queries by 34x
Delta Parquet’s results are impressive. It has the potential to reduce a 1 TB CSV data file to around 130 GB in Delta Parquet format, which is 87% smaller. Query run time for the same file shrinks from 236 seconds to just 6.78 seconds, making Delta Parquet roughly 34 times faster. Compute costs shrink from $5.75 to just $0.01. Delta Parquet accomplishes this through a combination of its features:
- Columnar storage – the most fundamental feature. Data is stored column by column instead of row by row, which is highly efficient for analytical queries that typically access only a subset of columns.
- Efficient compression – Data of the same type is stored together in columns, enabling highly effective compression algorithms. This significantly reduces file sizes and storage costs.
- Schema evolution – designed to handle changes in schema over time. Users can add, remove, or modify columns without needing to rewrite the entire dataset, which is crucial in evolving data environments.
- Metadata inclusion – files include metadata within the file footer. This metadata contains information about the schema, compression used, data types, minimum and maximum values in column chunks, and more. This allows query engines to understand the data structure and optimise reads.
- Splittable and parallel processing – files are designed to be splittable, which means that they can be divided into smaller pieces. This enables distributed processing frameworks like Apache Spark and Hadoop to process parts of a single file in parallel across multiple nodes, leading to faster processing times for large datasets.
- Support for complex data structures – efficient storage of complex nested data structures like arrays and maps.
- Language and platform independent – Delta Parquet is an open standard and is not tied to any specific programming language or data processing framework. This makes it highly interoperable across different systems and technologies.
- Row group statistics – Data is organised into "row groups." Each row group has statistics (like min/max values) for each column. This allows query engines to skip entire row groups when applying filters, further enhancing performance.
- Optimised for big data frameworks – widely adopted and optimised for use with popular big data processing frameworks like Apache Spark, Apache Hive, and Apache Impala, making it a standard in the big data ecosystem.
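Two of the features above, columnar storage and row group statistics, can be illustrated with a short sketch in plain Python. This is a toy model rather than the Parquet reader: the `RowGroup` class, the `scan_greater_than` helper and the tick data are all hypothetical, and real engines work on compressed column chunks instead of Python lists.

```python
class RowGroup:
    """A chunk of rows stored column by column, with min/max stats per column."""

    def __init__(self, columns):
        self.columns = columns  # column name -> list of values
        # Statistics are computed once at write time and kept as metadata.
        self.stats = {name: (min(v), max(v)) for name, v in columns.items()}


def scan_greater_than(row_groups, filter_col, threshold, select_col):
    """Return select_col values where filter_col > threshold.

    Row groups whose stored maximum shows that no row can match are skipped
    without touching their column data.
    """
    results, groups_read = [], 0
    for group in row_groups:
        _, col_max = group.stats[filter_col]
        if col_max <= threshold:
            continue  # statistics rule this row group out
        groups_read += 1
        # Only the two columns the query needs are read; "venue" is ignored.
        for f, s in zip(group.columns[filter_col], group.columns[select_col]):
            if f > threshold:
                results.append(s)
    return results, groups_read


# Hypothetical tick data, sorted by price so the statistics are selective.
groups = [
    RowGroup({"price": [10.0, 10.5, 11.0], "size": [100, 200, 150], "venue": ["A", "B", "A"]}),
    RowGroup({"price": [11.5, 12.0, 12.5], "size": [120, 80, 90], "venue": ["B", "B", "A"]}),
    RowGroup({"price": [13.0, 13.5, 14.0], "size": [300, 50, 75], "venue": ["A", "A", "B"]}),
]

sizes, groups_read = scan_greater_than(groups, "price", 12.9, "size")
print(sizes, groups_read)
```

In this example only one of the three row groups is read, and within it only two of the three columns. The same principle, applied to much larger files, is where much of the query-time saving comes from.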
As a result of these elements, Delta Parquet can deliver a significant competitive advantage through faster processing and reduced storage and processing costs. This enables traders, investors and other users to make decisions faster and act on opportunities more effectively.
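The headline figures quoted earlier can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated in this article and assumes that 1 TB means 1,000 GB; the variable names are illustrative.

```python
# Figures quoted in the article; 1 TB is assumed to mean 1,000 GB.
csv_gb, delta_gb = 1000.0, 130.0
csv_seconds, delta_seconds = 236.0, 6.78
csv_cost, delta_cost = 5.75, 0.01

size_reduction = 1 - delta_gb / csv_gb        # fraction of storage saved
speedup = csv_seconds / delta_seconds         # how many times faster
cost_reduction = 1 - delta_cost / csv_cost    # fraction of compute cost saved

print(f"size reduction: {size_reduction:.0%}")
print(f"speed-up: {speedup:.1f}x")
print(f"cost reduction: {cost_reduction:.1%}")
```

This works out to roughly 87%, 34.8x and 99.8%. The last figure is slightly higher than the 99.7% headline, which was presumably derived from an unrounded Delta Parquet cost rather than the rounded $0.01.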
Accelerating full historical market data and point-in-time updates with Delta Parquet
By leveraging LSEG’s Quantitative Analytics, Tick History, Tick History – PCAP, and Filings over S3 Direct in Delta Parquet format, you gain operational flexibility, lower costs, and fast, secure access to extensive financial data. This unlocks high-value use cases across research, compliance, and strategy development. And because Delta Parquet is built on the open Delta Lake protocol – the native and default table format on Databricks – these datasets integrate seamlessly into Databricks lakehouse workflows for accelerated analytics and model development.
With shared storage, you eliminate data silos, improve governance, and reduce expenses. Access point-in-time and tick-level data to streamline analysis and enable robust backtesting, research, and advanced transaction cost analysis. Combined with LSEG’s QA Research Accelerator and a broad range of ready-to-use datasets, firms can pivot quickly between exploratory analysis, model development and production deployment.
From best-execution analytics and regulatory compliance (including FRTB), to AI and ML initiatives, market surveillance, and systematic trading strategy design, Delta Parquet helps organisations drive smarter decisions and stronger performance.
Legal Disclaimer
Republication or redistribution of LSE Group content is prohibited without our prior written consent.
The content of this publication is for informational purposes only and has no legal effect, does not form part of any contract, does not, and does not seek to constitute advice of any nature and no reliance should be placed upon statements contained herein. Whilst reasonable efforts have been taken to ensure that the contents of this publication are accurate and reliable, LSE Group does not guarantee that this document is free from errors or omissions; therefore, you may not rely upon the content of this document under any circumstances and you should seek your own independent legal, investment, tax and other advice. Neither We nor our affiliates shall be liable for any errors, inaccuracies or delays in the publication or any other content, or for any actions taken by you in reliance thereon.
Copyright © 2025 London Stock Exchange Group. All rights reserved.
The content of this publication is provided by London Stock Exchange Group plc, its applicable group undertakings and/or its affiliates or licensors (the “LSE Group” or “We”) exclusively.
Neither We nor our affiliates guarantee the accuracy of or endorse the views or opinions given by any third party content provider, advertiser, sponsor or other user. We may link to, reference, or promote websites, applications and/or services from third parties. You agree that We are not responsible for, and do not control such non-LSE Group websites, applications or services.
The content of this publication is for informational purposes only. All information and data contained in this publication is obtained by LSE Group from sources believed by it to be accurate and reliable. Because of the possibility of human and mechanical error as well as other factors, however, such information and data are provided "as is" without warranty of any kind. You understand and agree that this publication does not, and does not seek to, constitute advice of any nature. You may not rely upon the content of this document under any circumstances and should seek your own independent legal, tax or investment advice or opinion regarding the suitability, value or profitability of any particular security, portfolio or investment strategy. Neither We nor our affiliates shall be liable for any errors, inaccuracies or delays in the publication or any other content, or for any actions taken by you in reliance thereon. You expressly agree that your use of the publication and its content is at your sole risk.
To the fullest extent permitted by applicable law, LSE Group expressly disclaims any representation or warranties, express or implied, including, without limitation, any representations or warranties of performance, merchantability, fitness for a particular purpose, accuracy, completeness, reliability and non-infringement. LSE Group, its subsidiaries, its affiliates and their respective shareholders, directors, officers, employees, agents, advertisers, content providers and licensors (collectively referred to as the “LSE Group Parties”) disclaim all responsibility for any loss, liability or damage of any kind resulting from or related to access, use or the unavailability of the publication (or any part of it); and none of the LSE Group Parties will be liable (jointly or severally) to you for any direct, indirect, consequential, special, incidental, punitive or exemplary damages, howsoever arising, even if any member of the LSE Group Parties is advised in advance of the possibility of such damages or could have foreseen any such damages arising or resulting from the use of, or inability to use, the information contained in the publication. For the avoidance of doubt, the LSE Group Parties shall have no liability for any losses, claims, demands, actions, proceedings, damages, costs or expenses arising out of, or in any way connected with, the information contained in this document.
LSE Group is the owner of various intellectual property rights (“IPR”), including but not limited to, numerous trademarks that are used to identify, advertise, and promote LSE Group products, services and activities. Nothing contained herein should be construed as granting any licence or right to use any of the trademarks or any other LSE Group IPR for any purpose whatsoever without the written permission or applicable licence terms.