Prisma Cloud is the leading Cloud Security platform that provides comprehensive code-to-cloud visibility into your risks and incidents, offering key remediation capabilities to manage and monitor your code-to-cloud journey. The platform today secures over 1B+ assets and workloads across code to cloud globally. It secures some of the most demanding environments, with customers who have tens of thousands of cloud accounts that see constant mutations and configuration changes at the scale of trillions every hour.
Throughout this blog we will review Prisma Cloud's historical approach to building data and AI into our products, the challenges we ran into with our existing data platform, and how, with the Databricks Data Intelligence Platform, Prisma Cloud has achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams.
Prisma Cloud's focus was to provide best-of-breed solutions within each segment/module and then provide value-added security features that tie signals from different modules together to deliver deeper capabilities as a platform offering. Some examples include:
- Addressing posture issues related to infrastructure configuration and management. Fixing these issues in code and fostering an automation mindset helps prevent them in production. Combining our Posture Management offering with our Code Security offering was essential to ensure traceability and resolve issues directly in code.
- Visualizing and managing controls through a platform "knowledge graph" helps customers understand how resources and workloads are connected. This approach allows them to evaluate findings and identify the paths that pose the greatest concern for a SOC administrator. Aggregating all signals in one place is critical for this process.
Prisma Cloud is set up with over 10 modules, each being best of breed in its security features and producing signals to the platform. Customers can choose to leverage the platform for their vertical needs (e.g., for vulnerability management) or for the entire suite. The platform approach encourages the customer to explore adjacent areas, increasing overall value and driving greater stickiness.
Prisma Cloud's technical challenge is fundamentally a data challenge. With our rapid module growth, driven by both organic innovation and M&As, developing a unified data strategy from scratch was a demanding task. However, the vision was clear: without a solution to consolidate all data in one place, we could not fully deliver the capabilities our customers need while harnessing the power of best-of-breed modules.
As one of the largest adopters of GenAI, Palo Alto Networks has built its AI strategy around three key pillars: leveraging AI to enhance security decisions, securing AI to help customers protect their AI usage, and optimizing user experience through AI-driven copilots and automation. See PrecisionAI for more details.
Palo Alto Networks and Prisma Cloud had a strong history of deep AI/ML usage across multiple products and features long before the GenAI wave reshaped the industry. However, the rapid evolution of AI capabilities accelerated the need for a long-term, comprehensive data strategy.
Databricks ecosystem in the Prisma Cloud architecture
We chose the Databricks Data Intelligence Platform as the best fit for our strategic direction and requirements, as it encompassed all of the critical aspects needed to support our vision. With Databricks, we have significantly accelerated our data consolidation efforts and scaled innovative use cases, delivering measurable customer benefits within just six months of rollout.
In just the first year of integrating Databricks, Palo Alto Networks achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams. By centralizing data workflows on the Databricks Platform, we significantly reduced complexity and accelerated innovation, enabling us to iterate on AI/ML features three times faster than before. Alongside this increased speed, we realized a 20% reduction in cost of goods sold and a 3x decrease in engineering development time.
Enhanced collaboration, fueled by Databricks Workflows, Databricks Unity Catalog for unified governance, and Databricks Auto Loader, has allowed us to deliver security features at an unprecedented speed and scale. This has dramatically accelerated Prisma Cloud's data processing and enabled us to bring impactful features to market faster than ever before.
The challenges of homegrown solutions
Prisma Cloud runs most of its infrastructure on AWS with a mature engineering tech stack built around AWS native services. Our team had extensive experience leveraging Apache Spark for ETL and analytical processing, running our infrastructure on AWS Glue and EMR.
Recognizing the need for a dedicated data platform, we initially developed a homegrown solution leveraging EMR, Glue, and S3 as the foundation for our first version. While this approach worked well with a small team, scaling it to support a broader data strategy and adoption across multiple teams quickly became a challenge. We found ourselves managing thousands of Glue jobs and multiple EMR clusters, all requiring enterprise-grade capabilities such as monitoring, alerting, reliability checks, and governance/security guardrails.
As our needs grew, so did the operational overhead. A significant portion of our engineering effort was diverted to maintaining what had effectively become an "Operating System" for our data platform rather than focusing on innovation and value-driven use cases.
While this effort addressed our strategic needs, we soon started running into several challenges in maintaining this solution. Some of them are listed below:
- Bespoke tooling and data transformations – Teams spent considerable time in multiple meetings just to identify data attributes, locate them, and design custom pipelines for each use case, slowing down development and collaboration.
- Time-consuming infrastructure management – With multiple tuning parameters at the core of our Spark jobs, we struggled to develop a scalable, generic change management process. This added significant cognitive load to the infrastructure teams responsible for managing clusters.
- Cost management and budgeting – Managing EMR and Glue directly required manually setting multiple guardrails, including centralized observability across all stacks. As our projects grew, so did the headcount required to maintain a more mature data platform.
- Spark management – We also ran into challenges where some updates to the Spark core libraries were not supported on AWS, which caused some of our jobs to be inefficient compared to the state of the art. Internal AWS limits on executor management forced us into extensive troubleshooting and recurring meetings to determine root causes.
Despite these challenges, our homegrown solution continues to scale, processing tens of millions of data mutations per hour for critical use cases. As we look ahead, we see a clear need to migrate to a more mature platform, one that allows us to retire in-house tooling and refocus engineering efforts on securing our customers' cloud environments rather than managing infrastructure.
Data architecture and its evolution at Prisma Cloud
At Prisma Cloud, we follow the 8-factor rule for any technical evaluation to assess its advantages and disadvantages. These factors are analyzed by our internal technical leadership committee, where we engage in discussions to reach a consensus. In cases where a factor cannot be adequately rated, we gather additional data through business-relevant prototyping to ensure a well-informed decision.
The key factors are listed below:
- Functional fit – Does it solve our business needs?
- Architecture/design fit – Is it aligned with our long-term technical vision?
- Developer adoption – How popular is it with developers today?
- Stability/ecosystem – Are there large-scale enterprises using this technology?
- Deployment complexity – How much effort is involved in its deployment and change management?
- Cost – How does the COGS compare to the value of the features we plan to offer by leveraging this technology?
- Comparative benchmarks – Are there existing benchmarks that demonstrate comparable scale?
One of our key long-term goals was the ability to move towards a security data mesh model. Given our platform approach, we categorize data into three fundamental types:
- Raw data – This includes data ingested directly from producers or modules as it enters the platform. In Databricks lakehouse terminology, this corresponds to Bronze data.
- Processed data – The Prisma Cloud Platform is an opinionated platform that transforms raw data into normalized platform objects. This is referred to as Processed data, which aligns with the Silver layer in lakehouse terminology.
- Correlated data – This category unlocks net value by correlating different datasets, enabling advanced insights and analytics. This corresponds to the Gold layer in lakehouse terminology.
Unlike traditional data lakes, where Bronze data is often discarded, our platform's breadth and depth necessitate a more evolutionary approach. Rather than simply transforming data into Gold datasets, we envision our data lake evolving into a data mesh, allowing for greater flexibility, accessibility, and cross-domain insights. The diagram below reflects the long-term capability that we seek to extract from our data lake investments.
All of our assessments were centered around this philosophy.
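To make the three layers above concrete, here is a minimal sketch of how Raw, Processed, and Correlated data can map onto Bronze, Silver, and Gold Delta tables; the catalog, table, and column names are illustrative placeholders rather than our production schema.

```python
# Minimal sketch of the Bronze -> Silver -> Gold layering described above.
# Runs in a Databricks notebook where `spark` is predefined; names are illustrative.
from pyspark.sql import functions as F

# Bronze: raw module output, landed as-is
bronze = spark.read.table("prisma_dev.bronze.module_events")

# Silver: normalized platform objects (deduplicated, trimmed to platform fields)
silver = (
    bronze
    .select("event_id", "account_id", "resource_id", "payload", "ingested_at")
    .dropDuplicates(["event_id"])
)
silver.write.mode("overwrite").saveAsTable("prisma_dev.silver.platform_events")

# Gold: correlated signals across modules, aggregated per account and asset type
assets = spark.read.table("prisma_dev.silver.assets")
gold = (
    silver.join(assets, on="resource_id", how="inner")
    .groupBy("account_id", "asset_type")
    .agg(F.count("event_id").alias("signal_count"))
)
gold.write.mode("overwrite").saveAsTable("prisma_dev.gold.asset_signal_summary")
```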
Evaluation results
Apart from checking all of the boxes in our new technology evaluation framework, the following key insights further cemented Databricks as our preferred data platform.
- Simplification of the existing tech stack – Our infrastructure relied on numerous Glue and EMR jobs, many of which required ad-hoc tooling and repetitive maintenance. With Databricks, we identified an opportunity to reduce 30%-40% of our jobs, allowing our engineers to focus on core business features instead of upkeep.
- Cost reduction – We saw at least a 20% drop in existing spend, even before factoring in the amortization that comes with accelerated adoption across various use cases.
- Platform features and ecosystem – Databricks provided immediate value through features such as JDBC URL exposure for data consumption, built-in ML/AI infrastructure, automated model hosting, enhanced governance and access control, and advanced data redaction and masking (see the sketch after this list). These capabilities were critical as we upgraded our data handling strategies for both tactical and strategic needs.
- Training and adoption ease – Onboarding new engineers onto Databricks proved significantly easier than having them build scalable ETL pipelines from scratch on AWS. This lowered the barrier to entry and accelerated the adoption of Spark-based technologies, which are essential at our scale.
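As one example of the data consumption point above, external services can read lakehouse tables by pointing a SQL client at a SQL warehouse endpoint. Below is a minimal sketch using the databricks-sql-connector package for Python; the hostname, HTTP path, token, and table name are placeholders, not our actual configuration.

```python
# Minimal sketch of querying a lakehouse table through a Databricks SQL warehouse;
# connection details and the table name below are illustrative placeholders.
from databricks import sql

with sql.connect(
    server_hostname="example-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT severity, COUNT(*) AS finding_count "
            "FROM prisma_dev.gold.findings GROUP BY severity"
        )
        for severity, finding_count in cursor.fetchall():
            print(severity, finding_count)
```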
Evaluation details
| Criteria | EMR/Glue (or existing cloud-native tech) | Databricks |
|---|---|---|
| Ease of deployment | Each team needs to work on its own deployment code. Often a sprint of work. | One-time integration that teams then adopt. SRE work was reduced to a few days. |
| Ease of administration | Maintaining versions and security patches. SREs often take a few days. | SRE work is no longer needed. |
| Integrations | SREs need to set up Airflow and ksql (often a sprint of work for new teams). | Out of the box |
| MLflow | Need to buy a tool or adopt open source. Each team needs to integrate. (Several months the first time, a sprint of work for each team.) | Out of the box |
| Data catalog (requires data lineage, security, role-based access control, searchability, and tagging of the data) | Need to buy tools and integrate with Prisma. | Out of the box |
| Leverage ML libraries and AutoML | Need to buy and integrate with Prisma. | Out of the box |
| SPOG (single pane of glass) for developers and SRE | Not available with EMR/Glue. | Out of the box |
| Databricks SQL (SQL on S3 data) | Athena, Presto. SRE help is required to integrate with Prisma. | Out of the box |
Application case study
Given our early pilots, we were convinced to start planning a migration path from our existing S3-based data lake onto the Databricks Platform. A perfect opportunity arose with a key insights project that required access to data from both the Raw and Correlated layers to uncover net-new security insights and optimize security problem resolution.
Before adopting Databricks, executing this kind of project involved several complex and time-consuming steps:
- Identifying data needs – A chicken-and-egg problem emerged: while we needed to define our data needs upfront, most insights required exploration across multiple datasets before determining their value.
- Integration complexity – Once data needs were defined, we had to coordinate with data owners to identify integration paths, often leading to bespoke, one-off pipelines.
- Governance & access control – Once all the data was available, we then had to ensure proper security and governance. This required manual configurations, with different implementations depending on where the data resides.
- Observability and troubleshooting – With data pipeline monitoring split across multiple teams, resolving issues required significant cross-team coordination, making debugging highly use-case-specific.
We tested the impact of the Databricks Data Intelligence Platform on this critical project through the following steps:
- Step 1: Infrastructure and Migration Planning
We bootstrapped Databricks in our dev environments and started planning the migration of our in-house data lake on S3 onto Databricks. We utilized Databricks Asset Bundles and Terraform for both the migration and our infrastructure and resource deployment.
Prior to adopting Databricks, engineers spent most of their time managing AWS infrastructure across various tools. With Databricks, we have a centralized platform to manage user and team cluster configurations.
Databricks offers an enhanced Spark environment through Photon, providing a fully managed platform with optimized performance, while AWS primarily delivers Spark through its EMR service, which requires more manual configuration and does not achieve the same level of performance optimization. Additionally, the ability to build, deploy, and serve models on Databricks has enabled us to scale more rapidly.
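To illustrate the build-and-serve workflow in its simplest form, here is a minimal sketch of logging a model with MLflow and registering it in Unity Catalog; the scikit-learn model and the catalog, schema, and model names are illustrative placeholders, not one of our production models.

```python
# Minimal sketch of logging and registering a model in Unity Catalog with MLflow.
# The toy model and the registered model name are illustrative placeholders.
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
model = LogisticRegression(max_iter=200).fit(X, y)
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        signature=signature,
        registered_model_name="prisma_dev.ml.risk_scoring_example",
    )
```

From there, a registered model can be put behind a Databricks model serving endpoint, which is what keeps the build-deploy-serve loop above quick to iterate on.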
- Step 2: Structuring Workstreams for Scale
We divided the project into four workstreams on the Databricks platform: Data Catalog Management, Data Lake Hydration, Governance and Access Control, and Dev Tooling/Automation.
Unity Catalog was essential for building our platform, providing unified governance and access controls in one place. By utilizing attribute-based access control (ABAC) and data masking, we were able to obfuscate data as needed without slowing down development.
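As a simple illustration of the masking capability, the sketch below defines a Unity Catalog masking function and attaches it to a column; the group, function, table, and column names are hypothetical, not our actual policy.

```python
# Minimal sketch of Unity Catalog column masking driven by group membership.
# Function, table, column, and group names are illustrative placeholders.
spark.sql("""
    CREATE OR REPLACE FUNCTION prisma_dev.governance.mask_account_id(account_id STRING)
    RETURN CASE
        WHEN is_account_group_member('soc-admins') THEN account_id
        ELSE '***REDACTED***'
    END
""")

spark.sql("""
    ALTER TABLE prisma_dev.silver.findings
    ALTER COLUMN account_id SET MASK prisma_dev.governance.mask_account_id
""")
```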
- Step 3: Accelerating Data Onboarding & Governance
Catalog registration and onboarding of the existing data in our data lake took only a few hours, while setting up governance and access control was a one-time effort.
Unity Catalog provided a centralized platform for managing all permissions, simplifying the security of our entire data estate, including both structured and unstructured data. This encompassed governance for data, models, dashboards, notebooks, and more.
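In practice, onboarding an existing S3-backed dataset and granting access can come down to a couple of statements like the ones sketched below, assuming an external location covering the bucket has already been configured; the catalog, table, path, and group names are illustrative.

```python
# Minimal sketch of registering existing S3 data in Unity Catalog and granting access.
# Assumes an external location for the bucket is already configured; names are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS prisma_dev.silver.assets
    USING DELTA
    LOCATION 's3://example-bucket/silver/assets/'
""")

spark.sql("""
    GRANT SELECT ON TABLE prisma_dev.silver.assets TO `insights-engineers`
""")
```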
- Step 4: Scaling Data Hydration & Observability
We seamlessly integrated previously unavailable raw data into our existing data lake and prioritized its hydration onto the Databricks Platform. Capitalizing on comprehensive Kafka, database, and S3 integrations, we successfully developed production-grade hydration jobs, scaling to trillions of rows within just a few sprints.
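For the S3 path specifically, an incremental hydration job can be expressed with Auto Loader in a few lines, as sketched below; the bucket paths, file format, and target table are illustrative placeholders rather than our production pipeline.

```python
# Minimal sketch of an incremental S3 hydration job using Auto Loader.
# Runs in a Databricks notebook where `spark` is predefined; paths and names are placeholders.
from pyspark.sql import functions as F

raw_stream = (
    spark.readStream
    .format("cloudFiles")                      # Auto Loader source
    .option("cloudFiles.format", "json")       # raw module output landing as JSON
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/findings")
    .load("s3://example-bucket/raw/findings/")
)

(
    raw_stream
    .withColumn("ingested_at", F.current_timestamp())
    .writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/findings")
    .trigger(availableNow=True)                # process new files incrementally, then stop
    .toTable("prisma_dev.bronze.findings")     # lands in the Bronze layer
)
```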
In production, we rely extensively on Databricks Workflows, while interactive clusters support the development, testing, and performance environments dedicated to building innovative features for our Prisma Cloud product. Databricks Serverless SQL underpins our dashboards, ensuring efficient monitoring of model drift and performance metrics. Moreover, system tables empower us to pinpoint and analyze high-cost jobs and runs over time, track significant budget fluctuations, and foster effective cost optimization and resource management.
This holistic approach grants executives clear visibility into platform usage and consumption, streamlining observability and budgeting without relying on fragmented insights from multiple AWS tools such as EMR, Glue, SageMaker, and Neptune.
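The kind of cost query this enables is sketched below against the documented system.billing.usage table; the 30-day window, result limit, and output handling are illustrative choices, not our exact reports.

```python
# Minimal sketch of surfacing high-cost jobs from Databricks system tables.
# Uses the documented system.billing.usage schema; window and limit are illustrative.
high_cost_jobs = spark.sql("""
    SELECT
        usage_metadata.job_id AS job_id,
        sku_name,
        SUM(usage_quantity)   AS dbus_consumed
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
      AND usage_metadata.job_id IS NOT NULL
    GROUP BY usage_metadata.job_id, sku_name
    ORDER BY dbus_consumed DESC
    LIMIT 20
""")
high_cost_jobs.show(truncate=False)
```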
The result
This consolidation proved transformative. Within a single week of prototyping, we uncovered valuable insights by combining raw, processed, and correlated datasets, enabling a more productive evaluation of product-market fit. As a result, we gained clear direction on which customer challenges to pursue and a stronger understanding of the impact we could deliver.
Within just six months of partnering with Databricks, we launched a pivotal security innovation for our customers, an achievement that would have been nearly impossible given our former technology stack, expansive customer base, and the need to prioritize core security features.
Databricks usage stats
- ~3 trillion records crunched per day.
- P50 processing time: < 30 minutes.
- Max parallelism: 24
- Auto Loader usage drops ingest latencies to seconds, offering near real-time processing.
- Out-of-the-box features, such as AI/BI dashboards on top of system tables, helped development teams identify and analyze high-cost jobs and runs over time, monitor significant budget changes, and support effective cost optimization and resource management.
Conclusion
As the application case study above showed, the timing of our growth aligned with Databricks emerging as the leading data platform of choice. Our shared commitment to rapid innovation and scalability made this partnership a natural fit.
By reframing the technical challenge of cloud security as a data problem, we were able to seek out technology providers who are experts in this area. This strategic shift allowed us to focus on depth, leveraging Databricks' powerful platform while applying our domain intelligence to tailor it to our scale and business needs. Ultimately, this collaboration has empowered us to accelerate innovation, enhance security insights, and deliver greater value to our customers.
Read more about the Databricks and Palo Alto Networks collaboration here.