Data & Analytics Insights

Delta Parquet: Accelerating innovation by optimising for big data

Tim Anderson

Director of Tick History and Quantitative and Economic Data at LSEG Data & Analytics

Delta Parquet enables financial services firms to speed up data processing and reduce computation costs for today’s analytics, artificial intelligence (AI) and machine learning (ML) use cases. It builds on the Parquet file format – which provides efficient compression of data via columnar storage – by adding transaction logs, schema enforcement, and performance optimisations that bring more database-like intelligence to the data layer. The result is a high-performance storage format that makes large-scale analytics dramatically faster and more cost effective. 

  • Using the Delta Parquet file format can shrink a 1 TB CSV file by 87% and reduce compute costs by 99.7%
  • This efficiency comes from combining columnar storage, efficient compression, and an intelligent transaction layer
  • LSEG’s Quantitative Analytics, Tick History, Tick History – PCAP, and Filings over S3 Direct on AWS are now available in Delta Parquet format, enabling faster and more economical computation than ever before. 

Modern analytics, AI and ML workloads demand ever larger quantities of high-quality, granular data. As these data sets grow, so do the associated storage and compute expenses. To significantly reduce data storage and processing costs, as well as greatly accelerate processing time, financial services firms are turning to the Delta Parquet file format. 

At its core, Delta Parquet is an open-source, column-oriented data file format created for highly efficient data storage and retrieval. It uses Parquet’s proven columnar compression methods and augments them with Delta Lake enhancements, time-travel capability and metadata-driven query optimisation. As a result, organisations can handle larger datasets, run more complex queries and accelerate innovation more efficiently. 

Speeding up queries by 34x

Delta Parquet’s results are impressive. It can reduce a 1 TB CSV file to around 130 GB in the Delta Parquet format – an 87% reduction. Query run time for the same file shrinks from 236 seconds to just 6.78 seconds, making Delta Parquet roughly 34 times faster, while compute costs fall from $5.75 to just $0.01. Delta Parquet accomplishes this through a combination of its features:
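The compression benefit of a columnar layout can be demonstrated with nothing more than the Python standard library. This sketch (the dataset and the exact byte counts are illustrative) compresses the same records serialised row by row, as a CSV file would be, and column by column, grouping each field's values together as Parquet does:

```python
import csv
import io
import random
import zlib

random.seed(0)

# Toy dataset: 10,000 tick records (symbol, venue, price, size).
symbols = ["VOD.L", "BARC.L", "HSBA.L", "LSEG.L"]
rows = [(random.choice(symbols), "XLON",
         round(random.uniform(70.0, 80.0), 2),
         random.choice([100, 200, 300, 400, 500]))
        for _ in range(10_000)]

# Row-oriented layout: serialise record by record, like a CSV file.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
row_wise = buf.getvalue().encode()

# Column-oriented layout: group each column's values together, as
# Parquet does, so same-typed, repetitive values sit side by side.
col_wise = b"".join(
    "\n".join(map(str, col)).encode() for col in zip(*rows)
)

row_c, col_c = zlib.compress(row_wise), zlib.compress(col_wise)
print(f"row-wise compressed:    {len(row_c):,} bytes")
print(f"column-wise compressed: {len(col_c):,} bytes")
```

The column-wise layout compresses better because each column holds values of a single type with many repeats, which is exactly the property Parquet's per-column encodings exploit at far larger scale.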

  • Columnar storage – the most fundamental feature: data is stored column-by-column instead of row-by-row, which is highly efficient for analytical queries that typically access only a subset of columns.
  • Efficient compression – Data of the same type is stored together in columns, enabling highly effective compression algorithms. This significantly reduces file sizes and storage costs.
  • Schema evolution – designed to handle changes in schema over time. Users can add, remove, or modify columns without needing to rewrite the entire dataset, which is crucial in evolving data environments.
  • Metadata inclusion – files include metadata within the file footer. This metadata contains information about the schema, compression used, data types, minimum and maximum values in column chunks, and more. This allows query engines to understand the data structure and optimise reads.
  • Splittable and parallel processing – files are designed to be splittable, which means that they can be divided into smaller pieces. This enables distributed processing frameworks like Apache Spark and Hadoop to process parts of a single file in parallel across multiple nodes, leading to faster processing times for large datasets.
  • Support for complex data structures – efficient storage of complex nested data structures like arrays and maps.
  • Language and platform independent – Delta Parquet is an open standard and is not tied to any specific programming language or data processing framework. This makes it highly interoperable across different systems and technologies.
  • Row group statistics – Data is organised into "row groups." Each row group has statistics (like min/max values) for each column. This allows query engines to skip entire row groups when applying filters, further enhancing performance.
  • Optimised for big data frameworks – widely adopted and optimised for use with popular big data processing frameworks like Apache Spark, Apache Hive, and Apache Impala, making it standard in the big data ecosystem.

As a result of these elements, Delta Parquet can deliver significant competitive advantage through faster processing and reduced storage and processing costs. This enables traders, investors and other market participants to make decisions faster and act on opportunities more effectively.

Accelerating full historical market data and point-in-time updates with Delta Parquet

By leveraging LSEG’s Quantitative Analytics, Tick History, Tick History – PCAP, and Filings over S3 Direct in Delta Parquet format, you gain operational flexibility, lower costs, and fast, secure access to extensive financial data. This unlocks high-value use cases across research, compliance, and strategy development. And because Delta Parquet is built on the open Delta Lake protocol – the native and default table format on Databricks – these datasets integrate seamlessly into Databricks lakehouse workflows for accelerated analytics and model development.

With shared storage, you eliminate data silos, improve governance, and reduce expenses. Access point-in-time and tick-level data to streamline analysis and enable robust backtesting, research, and advanced transaction cost analysis. Combined with LSEG’s QA Research Accelerator and a broad range of ready-to-use datasets, firms can pivot quickly between exploratory analysis, model development and production deployment.

From best-execution analytics and regulatory compliance (including FRTB), to AI and ML initiatives, market surveillance, and systematic trading strategy design, Delta Parquet helps organisations drive smarter decisions and stronger performance.

Legal Disclaimer

Republication or redistribution of LSE Group content is prohibited without our prior written consent. 

The content of this publication is for informational purposes only and has no legal effect, does not form part of any contract, does not, and does not seek to constitute advice of any nature and no reliance should be placed upon statements contained herein. Whilst reasonable efforts have been taken to ensure that the contents of this publication are accurate and reliable, LSE Group does not guarantee that this document is free from errors or omissions; therefore, you may not rely upon the content of this document under any circumstances and you should seek your own independent legal, investment, tax and other advice. Neither We nor our affiliates shall be liable for any errors, inaccuracies or delays in the publication or any other content, or for any actions taken by you in reliance thereon.

Copyright © 2025 London Stock Exchange Group. All rights reserved.