How Kyligence Cloud uses Amazon EMR Serverless to simplify OLAP







This post was co-written with Daniel Gu and Yolanda Wang from Kyligence.

Today, more than ever, organizations realize that modern business runs on data: almost all our interactions with business are based on data, and organizations must use analytics to understand, plan, and improve their operations. That’s where Online Analytical Processing (OLAP) comes in. OLAP is designed to manage and analyze big data, enabling organizations to use their data to extract business insights in multiple dimensions.

The Kyligence Cloud OLAP solution offers an Intelligent OLAP Platform to simplify multi-dimensional analytics for cloud data lakes. In the past, Kyligence deployed and maintained its own Spark clusters based on Amazon Elastic Compute Cloud (Amazon EC2) to handle the multi-dimensional model pre-computing process, which required users to build their own monitoring and alerting systems to improve the observability and reliability of the Spark cluster. In this post, we present how Kyligence built an end-to-end Kyligence Cloud OLAP solution with Amazon EMR Serverless to simplify deployment and operations, reduce costs, and accelerate time-to-value over the data lake.

What’s Amazon EMR Serverless?

Amazon EMR Serverless is a big data cloud platform for running large-scale distributed data processing jobs and machine learning (ML) applications using open-source analytics frameworks like Apache Spark and Apache Hive. Amazon EMR Serverless makes it easy and cost-effective for data engineers and analysts to run applications without having to tune, operate, optimize, secure, or manage clusters.

What’s OLAP?

OLAP is an approach to quickly answering analytics queries on large volumes of data. It provides capabilities for precomputation, sophisticated data modeling, and multi-dimensional analytics by rolling up large, often separate datasets into a multi-dimensional database known as an OLAP cube, which enables “slicing and dicing” of data from different viewpoints for a streamlined query experience. Apache Kylin, Apache Druid, and ClickHouse are some of the popular OLAP tools.
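
The roll-up idea can be sketched in a few lines of Python. This is a minimal illustration of cube precomputation in general, not Kyligence's or Apache Kylin's implementation; the sample rows, the dimension names, and the `build_cube` and `query` helpers are all hypothetical. It aggregates a measure once for every combination of dimensions, so that later queries become dictionary lookups instead of scans.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales)
ROWS = [
    ("EU", "laptop", 2022, 100),
    ("EU", "phone", 2022, 50),
    ("US", "laptop", 2022, 200),
    ("US", "laptop", 2023, 300),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions):
    """Precompute SUM(sales) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # the last column is the measure
            names = tuple(dimensions[i] for i in dims)
            cube[names] = dict(totals)
    return cube


def query(cube, dimensions, **filters):
    """Answer an equality-filter query from the matching precomputed aggregate."""
    names = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in names)
    return cube[names].get(key, 0)


cube = build_cube(ROWS, DIMENSIONS)
print(query(cube, DIMENSIONS, region="US"))                  # slice by one dimension
print(query(cube, DIMENSIONS, product="laptop", year=2022))  # dice by two dimensions
print(query(cube, DIMENSIONS))                               # grand total
```

With three dimensions the sketch stores 2^3 = 8 aggregates. The number grows exponentially with the dimension count, which is why precomputation at scale is typically handed to a distributed engine such as Spark.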

Although OLAP tools have been successfully used in various industries, they still face many challenges:

  • Dependence on IT organizations – Traditional OLAP tools require complex infrastructure to run large-scale data computing. Operating and maintaining this infrastructure requires a large team of IT professionals, resulting in high costs.
  • Need for large compute resources – Traditional OLAP tools need a massive amount of computing resources for processing and transforming data through a series of specific steps toward a concrete goal. A lack of computational capacity leads to longer response times, limits the amount of data that can be processed, and greatly impedes the flexibility of the OLAP tool. As a result, data analysts are often confined to narrow datasets and cannot analyze all the data freely.
  • Inefficient utilization of resources in the cloud – When a large-scale data modeling calculation is performed in the cloud, cost estimation tools estimate and deploy the corresponding computing resources. However, the utilization rate of those resources is often not very high, resulting in inefficient use of resources.

When OLAP is integrated with Amazon EMR Serverless, OLAP tools can use Amazon EMR Serverless as a serverless computing resource pool to complete data processing jobs, which simplifies and enhances the user experience.

Kyligence approach to OLAP using Amazon EMR Serverless

Kyligence is an AWS ISV partner that provides an Intelligent OLAP Platform to simplify multi-dimensional analytics for cloud data lakes. As a cloud-native OLAP platform, Kyligence Cloud now integrates with Amazon EMR Serverless to automatically provision Spark to run indexing and building jobs. This empowers you to use all the features and benefits of Kyligence’s OLAP with Amazon EMR Serverless.

Kyligence seamlessly connects to major AWS-native data sources including Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and Amazon Relational Database Service (Amazon RDS) to get the most out of your data on AWS, building a comprehensive AWS big data solution. During data modeling, Kyligence uses Amazon S3 to store the pre-computed data and serves it for high-concurrency queries. Kyligence also seamlessly interfaces with popular business intelligence (BI) tools such as Tableau, Microsoft Power BI, and Microsoft Excel to provide rich, built-in data visualization and self-service tools.

The following diagram illustrates the Kyligence Cloud architecture on AWS.

What you can expect from Kyligence Cloud on AWS

This solution offers the following benefits:

  • High performance – With AWS’s global infrastructure and the distributed computing capabilities of Amazon EMR, Kyligence offers a scalable, cost-effective, high-performance OLAP engine for multi-dimensional analytics. It enables critical data applications and large-scale interactive analytics, and helps you achieve sub-second query response times and high concurrency on PB-scale data.
  • Auto-scaling – Kyligence Cloud’s computing resources can be expanded with one click, and as load decreases, cluster size can be automatically reduced. This auto-scaling capability optimizes costs while maintaining service stability.
  • High compatibility – Kyligence Cloud provides a rich set of APIs (ODBC, JDBC, REST API, Python Client) and standard ANSI SQL and XMLA/MDX interfaces, which can be easily integrated with popular analytics tools like Tableau, Microsoft Excel, and Microsoft Power BI, and with data science tools like Python.
  • Security and reliability – With Amazon S3, Amazon RDS, Kyligence enterprise-level security features, and AWS Identity and Access Management (IAM) support, Kyligence Cloud safely manages access to the services and resources deployed on AWS while supporting multi-level access control of data models, tables, and cells to ensure data security and privacy protection.
  • One-click deployment on AWS – Kyligence Cloud is available in AWS Marketplace. The deployment is completed automatically based on an AWS CloudFormation template and parameter settings. Kyligence performs automated cluster operation and maintenance, and elastic rule-based cluster scaling, which lightens the workload for IT administrators and cloud infrastructure teams. Kyligence also offers a quick deployment method in the Kyligence Cloud Portal.

How Amazon EMR Serverless integrates with OLAP

With Amazon EMR Serverless, Kyligence Cloud provides out-of-the-box managed Apache Spark services. The Kyligence engine can distribute the compute job to Apache Spark in Amazon EMR Serverless. With the automated on-demand provisioning and scaling capabilities of Amazon EMR Serverless, Kyligence can quickly meet changing processing requirements at any data volume.

The following diagram illustrates Kyligence Cloud integrated with Amazon EMR Serverless.
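
To give a sense of what handing a Spark job to Amazon EMR Serverless looks like from client code, here is a minimal sketch using the public `start_job_run` operation of the boto3 `emr-serverless` client. It is not Kyligence's internal integration, which this post does not describe. The application ID, role ARN, S3 path, arguments, and Spark setting are hypothetical placeholders, and `build_job_request` and `submit` are illustrative helper names.

```python
def build_job_request(application_id, role_arn, entry_point, args):
    """Assemble the keyword arguments for an EMR Serverless Spark job run."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "name": "index-build",  # hypothetical job name
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
                # Assumed setting for illustration; tune it for a real workload.
                "sparkSubmitParameters": "--conf spark.executor.memory=4g",
            }
        },
    }


def submit(request):
    """Submit the job run; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder runs without AWS access

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**request)
    return response["jobRunId"]


# All identifiers below are placeholders, not real resources.
request = build_job_request(
    application_id="00example0app0id",
    role_arn="arn:aws:iam::123456789012:role/ExampleJobRole",
    entry_point="s3://example-bucket/jobs/build_index.py",
    args=["--model", "sales_cube"],
)
print(request["jobDriver"]["sparkSubmit"]["entryPoint"])
```

Calling `submit(request)` would start the run. No cluster is created or sized in this code, because Amazon EMR Serverless provisions and releases the workers itself.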

Benefits of using Kyligence Cloud with Amazon EMR Serverless

In the past, Kyligence deployed and maintained its own Spark clusters based on Amazon Elastic Compute Cloud (Amazon EC2) to handle the multi-dimensional model pre-computing process, which required Kyligence users to build their own monitoring and alerting systems to improve the observability and reliability of the Spark clusters.

Now, running Kyligence on Amazon EMR Serverless offers a more cost-effective, high-performance way to run cloud analytics on AWS:

  • Simplified deployment on the cloud – With managed services, you don’t need to consider the lifecycle of the underlying infrastructure and resources. This greatly reduces application complexity and simplifies the deployment of Kyligence Cloud.
  • Improved performance on the cloud – Amazon EMR Serverless provides a refined scaling strategy, which helps Kyligence Cloud spin up and recycle resources faster. In Kyligence performance benchmark testing, we observed 15–20% faster performance compared to an open-source Spark cluster for index building.
  • Reduced difficulty of operation and maintenance – With Amazon EMR Serverless capabilities, operation and maintenance personnel can easily maintain the capacity and running status of computing resources without having to understand the underlying analysis framework.
  • Cost optimization on the cloud – Amazon EMR Serverless provides a refined scaling strategy that automatically determines the resources the application needs, acquires those resources to process your jobs, and releases them when the jobs complete. You only pay for the resources used by the application, which helps reduce the Total Cost of Operations (TCO) on the cloud.
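
The pay-for-what-you-use point can be made concrete with a small back-of-the-envelope calculation. The numbers below are hypothetical and chosen only for illustration; they are not Kyligence or AWS measurements. The sketch compares resource-hours only and deliberately ignores unit prices, which differ between an always-on cluster and a serverless service.

```python
def provisioned_vcpu_hours(cluster_vcpus, hours_running):
    """vCPU-hours paid for by a fixed cluster that stays up, busy or idle."""
    return cluster_vcpus * hours_running


def used_vcpu_hours(jobs):
    """vCPU-hours actually consumed, given (vcpus, hours) per job."""
    return sum(vcpus * hours for vcpus, hours in jobs)


# Hypothetical daily workload: three build jobs of different sizes.
JOBS = [(32, 2), (16, 1), (8, 3)]

fixed = provisioned_vcpu_hours(cluster_vcpus=32, hours_running=24)
used = used_vcpu_hours(JOBS)
utilization = used / fixed

print(f"provisioned: {fixed} vCPU-hours")
print(f"used:        {used} vCPU-hours")
print(f"utilization: {utilization:.1%}")
```

Under these assumed numbers, a cluster sized for the largest job and left running all day is idle most of the time, while a usage-billed model charges only for the consumed vCPU-hours. Whether that translates into lower spend depends on actual pricing and workload shape.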

Get started with Kyligence Cloud on Amazon EMR Serverless

You can get started with the full potential of Kyligence Cloud on the AWS Marketplace or quickly test drive Kyligence.

To use Amazon EMR Serverless, you just need to select Serverless Spark on the Build Cluster tab during deployment.


Using managed and scalable services like Amazon EMR Serverless enables Kyligence users to speed up self-service analytics on large volumes of data and maintain a relatively simple architecture. With this solution, you can now focus on business demands instead of technical issues.

About Kyligence

Kyligence was founded in 2016 by the original creators of Apache Kylin™, the leading open-source OLAP for big data. Kyligence offers an Intelligent OLAP Platform to simplify multi-dimensional analytics for cloud data lakes.

For more information, visit Kyligence.

About the authors

Daniel Gu is a senior product manager on the Kyligence Cloud Team, who manages services and conducts research to determine the viability of products in the cloud.

Yolanda Wang is a senior product marketing manager at Kyligence, who owns the positioning, messaging, and branding of Kyligence products and works with various teams to drive go-to-market strategies.

Kiran Guduguntla is a WW Go-to-Market Specialist for Amazon EMR at AWS. He works with AWS customers across the globe to strategize, build, develop, and deploy modern data analytics solutions.

