Why tombola selected Graviton-powered RG situations for Amazon Redshift


A part of Flutter Leisure, the world’s largest on-line sports activities betting and iGaming operator, tombola is the world’s largest on-line bingo group and has been utilizing Amazon Redshift to run its information analytics workloads. Based in Sunderland, UK, the corporate traces its roots to the Fifties, when it started printing bingo tickets through the golden age of the sport. tombola launched on-line in 2006 and has since expanded to Italy, Spain, Denmark, and Sweden. The corporate builds all of its video games in-house, holds probably the most prestigious Safer Playing award, and just lately partnered with Flutter sibling model Sisal to convey its bingo utility to Italian gamers.

On this put up, you learn the way tombola adopted a strict engineering precept: no adjustments to manufacturing with out proof. That meant a head-to-head comparability of RA3 versus RG on their precise workload. You additionally see benchmark outcomes on Amazon S3 Tables and the migration from RA3 to RG situations.

Present information structure

Amazon Redshift sits on the middle of tombola’s information structure. The manufacturing cluster runs on RA3 nodes and serves a number of schemas with a whole bunch of tables, supporting each analytical workload the enterprise runs, from sub-second utility lookups to multi-minute extract, remodel, load (ETL) transforms. What makes tombola’s Amazon Redshift workload distinctive is the breadth of what flows by means of it. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) DAGs orchestrate pipelines throughout over 14 enterprise domains, together with segmentation, fraud detection, advertising and marketing, finance, and SafePlay responsible-gaming. Configuration-driven ingestion pipelines land information from SQL Server, Amazon DynamoDB, Amazon OpenSearch Service, Postgres, and exterior APIs into Bronze and Silver layers on Amazon Easy Storage Service (Amazon S3), earlier than loading it into Amazon Redshift. From there, over 250 dbt fashions working on Amazon Elastic Container Service (Amazon ECS) remodel the information into analytical gold layers. Outputs feed a number of downstream shoppers: Amazon SageMaker for fraud scoring and churn prediction, Amazon DynamoDB for low-latency APIs, and region-specific pipelines spanning the UK, Italy, Spain, Denmark, and Sweden. As the appliance grew, with extra domains, extra DAGs, and extra concurrent customers, the group started evaluating methods to scale back steady-state question latency and decrease compute value with out rearchitecting the system. When AWS made Graviton-powered RG nodes out there for Amazon Redshift, the timing was proper.

Benchmark efficiency outcomes

The benchmark infrastructure was totally outlined as infrastructure as code (IaC), ensuring each take a look at run was reproducible. The group deployed two take a look at benchmark clusters (one RA3 and one RG) in a like-for-like configuration. They mirrored the settings (Amazon Digital Personal Cloud (Amazon VPC), safety teams, AWS Key Administration Service (AWS KMS), AWS Identification and Entry Administration (IAM) roles, and parameter teams) from the manufacturing atmosphere to take away configuration drift. The benchmark runner was containerized as an Amazon ECS job (python:3.11-slim-bookworm ARM64 base), offering repeatable, remoted execution for every take a look at spherical. Benchmark workloads have been chosen by analyzing manufacturing cluster logs and metrics, then categorized into three tiers:

  • Heavy: ETL queries with multi-table CTE chains, full-table scans, and aggregation home windows.
  • Medium: Enterprise intelligence (BI) queries driving reporting and analytics dashboards.
  • Gentle: Software queries with sub-second response occasions.

Structure

Eventualities examined

To validate the efficiency of Graviton-powered RG situations towards the present RA3 nodes, tombola designed 4 benchmark situations that progressively improve in complexity and realism. Collectively, these situations present a complete view of efficiency from remoted question execution by means of to sustained, real-world analytical workloads.

State of affairs 01: Chilly-cache, single-stream execution. This state of affairs isolates uncooked compute efficiency by working queries towards a chilly cache in a single stream, avoiding caching and concurrency as variables.

Per-query speedups ranged from 1.05× (mild lookup queries) to 1.68× (heavy ETL transforms). Zero errors on each clusters (28 makes an attempt every).

Weight Class RA3 p50 (ms) RG p50 (ms) Speedup
Heavy (ETL) 210,372 133,855 1.57×
Medium (BI) 2,193 1,642 1.34×
Gentle (App) 3.20 2.76 1.16×

The next chart exhibits per-query speedup ratios for the cold-cache state of affairs. Heavy ETL queries (left) present the biggest features, with speedups of 1.57–1.68×, and lighter queries nonetheless profit at 1.05–1.16×. The sample is constant: RG’s benefit scales with question complexity.

State of affairs 02: Heat-cache, single-stream execution. This state of affairs repeats State of affairs 01 with the consequence cache enabled to substantiate that RG maintains its latency benefit even when cached outcomes are in play.

Per-query speedups ranged from 1.04× to 1.64×. Zero errors on each clusters (35 makes an attempt every).

Weight Class RA3 p50 (ms) RG p50 (ms) Speedup
Heavy (ETL) 93,636 61,691 1.52×
Medium (BI) 2,189 1,584 1.38×
Gentle (App) 3.08 2.58 1.19×

With consequence caching enabled, the speedup sample holds for non-cached queries. Cache hits on each clusters land in 118–185 ms, confirming the caching subsystem operates identically no matter node sort. The RG benefit seems completely on execution paths that bypass the cache.

State of affairs 03: Concurrency sweep. This state of affairs introduces parallel load by sweeping by means of 1, 5, 10, and 20 concurrent streams, testing how every node sort handles rivalry and queuing beneath stress.

Each clusters used the identical Concurrency Scaling configuration (max_concurrency_scaling_clusters=1, WLM-only). RG accomplished 482 extra queries in the identical wall-clock window.

Metric RA3 RG Enchancment
Whole queries accomplished 1,438 1,920 +33% throughput
Gentle p50 (ms) 3.44 3.04 1.13×
Medium p50 (ms) 20,784 15,055 1.38×
Errors 0 0

Beneath rising parallel load (1, 5, 10, and 20 concurrent streams), RG maintained decrease latencies and accomplished 33 p.c extra queries in the identical wall-clock window. Each clusters used the identical Concurrency Scaling configuration, so the throughput distinction is attributable to per-node compute effectivity.

State of affairs 04: Blended life like workload. This state of affairs combines the earlier parts right into a combined life like workload, working 10 streams concurrently for half-hour with a weighted distribution of heavy, medium, and lightweight queries to simulate precise manufacturing circumstances.

This state of affairs finest simulates manufacturing. The headline discovering: heavy ETL queries noticed speedups of as much as 2.27× beneath concurrent load, and RG accomplished 46 p.c extra complete queries in the identical 30-minute window. Zero errors on each clusters.

Metric RA3 RG Enchancment
Whole queries accomplished 405 593 +46% throughput
Heavy p50 (ms) 1,186,572 642,294 1.85×
Medium p50 (ms) 2,319 1,631 1.42×
Gentle p50 (ms) 3.12 2.90 1.08×
Errors 0 0

The mixed-realistic state of affairs finest simulates manufacturing. Beneath 10 concurrent streams over half-hour, heavy ETL queries confirmed speedups of as much as 2.27×. RG’s per-vCPU throughput benefit compounds beneath rivalry, precisely the situation the place manufacturing clusters spend most of their time.

Prolonged benchmark: Amazon S3 Tables (Iceberg) efficiency

tombola’s future information structure will combine with brokers and revolves round Apache Iceberg, backed by Amazon S3 Tables. Amazon S3 Tables provide Amazon S3 storage that’s particularly tuned for analytics, with built-in capabilities that maintain making queries sooner and serving to decrease storage prices for desk information. They’re purpose-built to carry tabular datasets, resembling every day buy logs, streaming sensor readings, or advert impression occasions. On this mannequin, information is organized into rows and columns, much like how info is structured in a conventional database desk. With that path in thoughts, tombola additionally benchmarked Graviton’s efficiency querying Iceberg tables instantly. The dataset consists of participant profiles, recreation session historical past, and geolocation information: a mixture of vast tables and high-cardinality columns that stress each compute and I/O.

To guage efficiency throughout completely different situations, tombola generated queries at various ranges of complexity. Medium queries contain customary analytical features like rating and aggregation, and Medium-Excessive queries introduce multi-step transformations with joins and cumulative calculations. On the Excessive tier, queries mix distinct counting, conditional pivoting, and time-window aggregations. Very Excessive queries are probably the most demanding: self-joins throughout the total dataset, multi-signal scoring logic, and superior statistical features. This tiered strategy captures how every node sort performs as computational calls for improve.

As with the earlier benchmarks, the group stored the take a look at as comparable as potential: a real like-for-like analysis between RG (powered by Graviton) and RA3 nodes of equal dimension.

Testing was cut up into two phases:

Part 1: Concurrency. All queries have been submitted concurrently to measure how effectively every node sort handles concurrent workloads. The aim was to grasp throughput variations: how far more work RG nodes can push by means of beneath stress in comparison with equally sized RA3 nodes.

All queries have been run concurrently throughout a number of rounds:

Grouped bar chart showing total execution time across 3 rounds for RA3 vs Graviton

Part 2: Sequential execution. Every question was run in isolation with full compute sources out there. This eliminated concurrency as a variable and gave a clear learn on uncooked question efficiency. The outcomes have been clear: RG outperformed RA3 throughout a number of question varieties, displaying constant features when given devoted compute.

In sequential execution, Graviton (RG) delivered constant efficiency features throughout all question complexity ranges: Medium-complexity queries ran 45–73 p.c sooner (common 58 p.c), Medium-Excessive queries improved by 42 p.c, Excessive-complexity queries achieved 57–66 p.c sooner execution (common 62 p.c), and Very Excessive-complexity queries noticed features of 60–67 p.c (common 63 p.c). The outcomes display that RG’s benefit scales with workload complexity, delivering the biggest enhancements on probably the most demanding analytical queries.

tombola’s modernization strategy

tombola is modernizing its Amazon Redshift cluster utilizing the Elastic Resize path to alter from RA3 to RG node varieties. The operation snapshots the present cluster, provisions a brand new RG cluster from that snapshot, and transfers information within the background. Throughout this switch interval, the supply cluster stays out there in read-only mode. When the resize nears completion, Amazon Redshift robotically updates the endpoint to level to the brand new RG cluster and drops connections to the supply. The group selected this strategy as a result of it aligns with their engineering precept of evidence-based adjustments: no manufacturing cutover with out proof. The benchmark outcomes, with zero errors throughout all situations towards production-representative workloads, supplied the boldness wanted to proceed. After the resize is full, the exterior tables, schemas, and question syntax stay unchanged. With RG’s built-in information lake question engine, tombola additionally removes its dependency on Amazon Redshift Spectrum. Information lake queries now run instantly on cluster nodes inside the Amazon VPC boundary, utilizing present IAM roles, with zero per-TB scanning costs.

Conclusion

The benchmark outcomes make a compelling case for migrating tombola’s Amazon Redshift infrastructure from RA3 (Intel Xeon) to RG (Graviton4) situations. Throughout each state of affairs examined, RG delivered important and constant efficiency features:

  • Chilly-cache efficiency: 1.57× sooner on heavy ETL queries, with per-query speedups as much as 1.68×.
  • Heat-cache efficiency: 1.52× sooner on heavy workloads, sustaining benefit even with consequence caching enabled.
  • Concurrency: 33 p.c increased throughput beneath parallel load, with RG sustaining decrease latencies as streams elevated from 1 to twenty.
  • Blended life like workload: 1.85× sooner on heavy ETL queries and 46 p.c extra complete queries accomplished, the state of affairs closest to manufacturing visitors patterns.
  • Amazon S3 Tables (Iceberg): As much as 51 p.c sooner beneath concurrent load and 57 p.c sooner in sequential execution, vital for tombola’s future lakehouse structure.

Past uncooked efficiency, RG delivers architectural advantages that align with tombola’s strategic path. The built-in information lake question engine removes Amazon Redshift Spectrum overhead and per-TB scan costs. The 4:3 node mapping (4 ra3.4xlarge nodes to three rg.4xlarge nodes) reduces infrastructure prices by 25 p.c.

Primarily based on these outcomes, tombola are modernizing their manufacturing Amazon Redshift cluster to Graviton4-based RG situations. The work has already began and related outcomes as above are observed.  The present RA3 options, together with concurrency scaling, information sharing, and system views, are totally supported on RG. This positions tombola to deal with rising information volumes and person concurrency with higher efficiency, higher value effectivity, and a predictable pricing mannequin as the appliance scales.

The outcomes and advantages described on this put up are particular to tombola’s workload and atmosphere. Though Amazon Redshift RG situations powered by AWS Graviton4 processors can ship important efficiency enhancements, precise outcomes will range based mostly on elements together with workload traits, information volumes, cluster configuration, and question complexity. We encourage you to guage RG situations with your individual workloads to find out the advantages in your atmosphere. To study extra, go to the Amazon Redshift advertising and marketing web page and the Amazon Redshift documentation, or get began within the Amazon Redshift console.


Concerning the authors

Prabhu Pandian

Prabhu Pandian

Prabhu has over 15 years of expertise spanning information engineering, enterprise intelligence, and information analytics. He has constructed a profession on turning complicated information challenges into actionable insights throughout industries together with retail, healthcare, logistics, iGaming, and the general public sector. He has led high-performing groups at organisations architecting information warehouses, constructing ETL pipelines processing tens of tens of millions of data every day, and delivering analytics. At present, because the Information Engineering Lead at tombola, he’s targeted on harnessing the facility of AWS providers to construct scalable, optimised information platforms that drive actual enterprise worth. He’s obsessed with engineering information infrastructure that’s not simply strong and environment friendly, however one which empowers groups to make sooner, smarter choices.

Akshay Srinivasan

Akshay Srinivasan

Akshay is a Information Engineer at tombola, the place he runs the Information Platform & Reliability pod, shaping the structure, scalability, and resilience of the corporate’s core information infrastructure throughout batch, streaming, and machine studying workloads. He favors open supply tooling and composable AWS providers, constructing platforms designed to be versatile and operationally sustainable. Over the previous eight years he has constructed information platforms from the bottom up throughout fintech, gaming, and enterprise environments, standing up greenfield infrastructure, automating complicated operational workflows, and engineering programs in domains the place information reliability instantly impacts regulatory and enterprise outcomes. Having labored with Amazon Redshift since 2017, he has seen its evolution first-hand, from early node varieties by means of to the trendy lakehouse capabilities the platform gives in the present day.

Sidhanth Muralidhar

Sidhanth Muralidhar

Sidhanth is a Principal Technical Account Supervisor at AWS, the place he companions with enterprise clients to design, scale, and optimize cloud-focused programs. He makes a speciality of guiding organizations by means of complicated architectural choices throughout value effectivity, reliability, efficiency, and operational excellence. His work more and more sits on the intersection of information programs and AI as effectively, serving to clients operationalize fashionable information architectures and construct clever, production-ready programs.

Vlad Siniavin

Vlad Siniavin

Vlad is a Sr. Technical Account Supervisor at AWS with over 15 years of expertise in constructing modern options, services. He’s pushed by delivering measurable outcomes for his clients – whether or not that’s lowering operational danger, optimising prices, or accelerating cloud adoption. He believes the perfect technical steerage begins with deeply understanding what issues most to the client and performing of their finest curiosity.

Deixe um comentário

O seu endereço de e-mail não será publicado. Campos obrigatórios são marcados com *