As Britain's energy market shifts to Market-wide Half-Hourly Settlement (MHHS), Octopus Energy absorbed a 48-fold growth in data volume by decomposing a monolithic data pipeline into specialized streams and using incremental processing to cut cloud compute costs.

  • Pipeline split into three streams matching data granularity
  • Incremental data processing cuts data processing volume by 98.8%
  • Margin data freshness accelerated from weekly to daily

Infrastructure signal

The transition to Half-Hourly Settlement increased the data volume for margin calculations 48-fold. Under the legacy monolithic architecture, which was built for monthly data processing, operating costs were projected to climb sharply, potentially adding $1 million per year. Recognizing that simply adding compute would be financially unsustainable, the engineering team redesigned the infrastructure to process each data grain independently, with each grain serving a different commercial need.

A key change was the introduction of three dedicated pipeline streams (Settlement, Half-Hourly, and Monthly), each optimized for its specific data frequency. In addition, adopting Delta Lake's Change Data Feed enabled incremental processing of multi-terabyte datasets: each run reads only the data changed since the previous pipeline run instead of performing a full overwrite. This architectural shift reduced data processing volume by 98.8%, substantially lowering cloud compute costs and improving scalability.
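
The incremental idea can be illustrated with a minimal sketch. This is a toy, in-memory model and not Delta Lake itself: the `VersionedTable` class, the `incremental_run` function, and the meter keys are all hypothetical names chosen for illustration. In Delta Lake, the comparable read is configured with the `readChangeFeed` and `startingVersion` reader options on a table that has Change Data Feed enabled.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy stand-in for a table that logs row-level changes per commit.

    Hypothetical illustration only; this is not the Delta Lake API.
    """
    rows: dict = field(default_factory=dict)     # key -> current value
    changes: list = field(default_factory=list)  # (version, key, value)
    version: int = 0

    def commit(self, updates: dict) -> int:
        """Apply a batch of upserts as one new version and log each change."""
        self.version += 1
        for key, value in updates.items():
            self.rows[key] = value
            self.changes.append((self.version, key, value))
        return self.version

    def changes_since(self, last_seen: int) -> list:
        """Return only the changes committed after version `last_seen`."""
        return [c for c in self.changes if c[0] > last_seen]


def incremental_run(table: VersionedTable, target: dict, checkpoint: int):
    """Fold new changes into `target`; return (new checkpoint, rows read)."""
    new_changes = table.changes_since(checkpoint)
    for _version, key, value in new_changes:
        target[key] = value  # changes are in commit order, so the latest wins
    return table.version, len(new_changes)
```

With 1,000 rows loaded in the first commit and two rows updated in the second, the first run reads 1,000 changes and the second reads only 2, while `target` still matches the full table. The saving in a real pipeline depends on how small each run's change set is relative to the table.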

Developer impact

Developers had to move from a single monolithic pipeline to multiple parallel workflows, each with its own data grain and processing requirements. This called for new orchestration patterns, such as the 'Job of Jobs', to coordinate dependencies and execution across the three specialized streams, while each pipeline was tuned for its own workload.
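
A rough sketch of what a 'Job of Jobs' coordinator might look like, assuming the pattern means a parent job that triggers child jobs in dependency order. The source does not describe the actual dependency graph, so the `reconcile` step, the task functions, and the `job_of_jobs` helper below are hypothetical. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


# Hypothetical child jobs; real ones would launch the stream's pipeline.
def run_settlement():
    return "settlement done"


def run_half_hourly():
    return "half_hourly done"


def run_monthly():
    return "monthly done"


def run_reconcile():
    return "reconcile done"


def job_of_jobs(tasks: dict, depends_on: dict):
    """Run each child job once, after every job it depends on has run."""
    # static_order() yields nodes so that predecessors always come first.
    order = list(TopologicalSorter(depends_on).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name]()
    return order, results


TASKS = {
    "settlement": run_settlement,
    "half_hourly": run_half_hourly,
    "monthly": run_monthly,
    "reconcile": run_reconcile,
}

# Assumed graph: the three streams are independent; reconcile waits for all.
DEPENDS_ON = {
    "settlement": set(),
    "half_hourly": set(),
    "monthly": set(),
    "reconcile": {"settlement", "half_hourly", "monthly"},
}
```

Calling `job_of_jobs(TASKS, DEPENDS_ON)` runs the three stream jobs in some order and `reconcile` last. Keeping the graph as data makes it straightforward to add or reorder child jobs without touching the parent's logic.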

The re-architecture not only lowered cost but also improved developer agility through clearer workload encapsulation and incremental data frameworks. Engineers could iterate faster on domain-specific processing logic and optimize the Spark jobs in each stream independently, without risking side effects on the other streams, which streamlined deployment cycles and troubleshooting.

What teams should watch

Teams managing cloud infrastructure and data pipelines should watch for regulation-driven shifts in data granularity. Growth in data volume on this scale can require fundamental architecture changes rather than a simple scale-up of compute. Applying incremental processing technologies such as Change Data Feed and decomposing pipelines along natural grain boundaries are high-impact strategies for controlling cost and improving responsiveness.

Operational teams need to focus on end-to-end data freshness, because better margin visibility at half-hourly granularity enables faster commercial decisions on pricing and risk management. Observability that tracks pipeline performance and data reconciliation across the independent streams will become increasingly critical for maintaining reliability and compliance at scale.
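
One way to picture a cross-stream reconciliation check is to roll the finer grain up to the coarser one and compare totals. The sketch below is a hypothetical illustration, not the team's method: the `reconcile` function, the record shapes, and the tolerance are all assumptions.

```python
import math
from collections import defaultdict


def reconcile(half_hourly, monthly, rel_tol=1e-6):
    """Compare half-hourly values rolled up by (meter, month) to monthly totals.

    half_hourly: iterable of (meter_id, "YYYY-MM", value) tuples
    monthly:     dict mapping (meter_id, "YYYY-MM") -> total
    Returns a sorted list of keys that are missing on one side or differ.
    """
    rolled = defaultdict(float)
    for meter_id, month, value in half_hourly:
        rolled[(meter_id, month)] += value

    mismatches = []
    for key in sorted(set(rolled) | set(monthly)):
        a, b = rolled.get(key), monthly.get(key)
        # Flag keys present in only one stream, or totals outside tolerance.
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append(key)
    return mismatches
```

A check like this could run after the streams complete, with the returned keys feeding an alert or dashboard. At real data volumes the same comparison would be expressed as a distributed join rather than in-memory dictionaries.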

Source assisted: This briefing began from a discovered source item from Databricks Blog.