November 16, 2020

Jean-Yves Stephan

Data Mechanics is a cloud-based Spark platform - an alternative to Databricks, EMR, Dataproc, Azure HDInsight, and so forth - with a focus on making Spark easy to use and cost-effective for data engineers. It is deployed on a Kubernetes cluster inside our customers’ cloud account, and adds many features on top of open-source Spark on Kubernetes.

But today, we’re not talking about a new feature of our platform. 

Today we’re releasing a web-based Spark UI that works on top of any Spark platform, whether it’s on-premises or in the cloud, over Kubernetes or over YARN, with a commercial service or running on open-source Apache Spark.

It consists of a dashboard listing your Spark applications after they have finished running, and a hosted Spark History Server that serves the Spark UI for any of these applications at the click of a button. This project is partially open-sourced, and it is entirely free of charge.

How Can I Use it?

Create an account on https://delight.datamechanics.co/login.

You should use your company’s Google account if you want to share a single dashboard with your colleagues, or your personal Google account if you want the dashboard to be private to you. As of today, you need a Google account to access our dashboard, but additional sign-in methods will be added in the future. Once your account is created, go under Settings and create a personal access token. This will be needed in the next step.

Attach our open-source agent to your Spark applications.

Follow the instructions on our GitHub page. We have instructions available for the most common setups of Spark. (Note: At this time, Databricks is not supported due to an incompatibility with our agent; we’ll be working on a fix in the coming weeks.) If you run into an issue, ask us a question and we’ll be happy to help.
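As a rough illustration of what attaching an agent to a Spark application can look like, here is a minimal Python sketch that builds a `spark-submit` command line. It assumes the agent is shipped as a Maven package and registered as a Spark listener, which this post does not confirm. The `--packages` flag and the `spark.extraListeners` setting are standard Spark options, but the package coordinates, listener class name, and token setting below are hypothetical placeholders; the real values are in the instructions on our GitHub page.

```python
import shlex

# Hypothetical placeholders: take the real values from the agent's GitHub instructions.
AGENT_PACKAGE = "<agent-maven-coordinates>"
AGENT_LISTENER = "<agent-listener-class>"
TOKEN_CONF_KEY = "<agent-access-token-conf-key>"


def build_spark_submit(app, access_token):
    """Return a spark-submit argument list with the agent attached (sketch only)."""
    return [
        "spark-submit",
        # Standard Spark flag: download the agent package and add it to the classpath.
        "--packages", AGENT_PACKAGE,
        # Standard Spark setting: register an extra listener that receives Spark events.
        "--conf", f"spark.extraListeners={AGENT_LISTENER}",
        # Assumed setting name: pass the personal access token created under Settings.
        "--conf", f"{TOKEN_CONF_KEY}={access_token}",
        app,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it.
    print(shlex.join(build_spark_submit("my_app.py", "<your-access-token>")))
```

Building the command as a list keeps each argument separate, so a token containing special characters is not mangled by the shell.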

Your applications will automatically appear on our dashboard once they complete (successfully or with a failure). Clicking on an application opens up the corresponding Spark UI. That’s it!

How Does It Work? Is It Secure?

This project consists of two parts:

  • An open-source Spark agent which runs inside your Spark applications. This agent will stream non-sensitive Spark event logs from your Spark application to our backend.
  • A closed-source backend consisting of a real-time logs ingestion pipeline, storage services, a web application, and an authentication layer to make this secure.

The agent collects your Spark applications’ event logs. These logs contain non-sensitive metadata about your Spark application. For example, for each Spark task there is metadata on memory usage, CPU usage, and network traffic (view a sample event log). The agent does not record sensitive information such as the data that your Spark applications actually work on. The agent does not collect your application logs either, since those may contain sensitive information.
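To give a feel for the kind of metadata an event log holds, here is a small Python sketch that reads one task-end event and pulls out a few metrics. Spark writes event logs as one JSON object per line. The sample line below is abbreviated and made up for illustration (real events carry many more fields), and the `summarize_task_end` helper is hypothetical, not part of the agent.

```python
import json

# Abbreviated, made-up event line for illustration; real events have many more fields.
SAMPLE_EVENT = json.dumps({
    "Event": "SparkListenerTaskEnd",
    "Stage ID": 0,
    "Task Info": {"Task ID": 7, "Executor ID": "1"},
    "Task Metrics": {
        "Executor Run Time": 1200,        # milliseconds
        "Executor CPU Time": 950000000,   # nanoseconds
        "Peak Execution Memory": 4194304, # bytes
        "JVM GC Time": 35,                # milliseconds
    },
})


def summarize_task_end(line):
    """Extract a few task metrics from one event-log line, or None for other events."""
    event = json.loads(line)
    if event.get("Event") != "SparkListenerTaskEnd":
        return None
    metrics = event.get("Task Metrics", {})
    return {
        "task_id": event["Task Info"]["Task ID"],
        "run_time_ms": metrics.get("Executor Run Time"),
        # CPU time is reported in nanoseconds; convert it to milliseconds.
        "cpu_time_ms": metrics.get("Executor CPU Time", 0) / 1e6,
        "peak_memory_bytes": metrics.get("Peak Execution Memory"),
    }


if __name__ == "__main__":
    print(summarize_task_end(SAMPLE_EVENT))
```

Note that the event describes how the task ran (timings, memory, identifiers), not the rows the task processed.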

This data is encrypted using your personal access token and sent over the internet using the HTTPS protocol. This information is then stored securely inside the Data Mechanics control plane behind an authentication layer. Only you and your colleagues from your Google/GSuite organization will be able to see your application in our dashboard. The collected data will automatically be deleted 30 days after your Spark application completes.

What’s Next?

The release of this free and cross-platform hosted Spark History Server is our first step towards building a Spark UI replacement tool called Data Mechanics Delight. This will be a free and cross-platform Spark UI replacement with new metrics and visualizations that will "delight" you! Our announcement in June 2020 that we would build a Spark UI replacement generated a lot of interest from the Spark community. We’re targeting the next release for January 2021.

We know the current release is far from what Delight fans expect, but we hope it will still be valuable to the Spark community, as the Spark History Server is not always easy to set up. More importantly, the current release means we have built most of the base infrastructure of the project: the Spark agent, a real-time logs collection pipeline, a storage system, an authentication layer, and a web app. We will now gradually add the new screens and visualizations that the community awaits.

The next release of Delight, scheduled for January 2021, will consist of an overview screen giving a bird’s-eye view of your applications’ performance. Links to specific jobs, stages, or executor pages will still take you to the corresponding Spark UI pages until we gradually replace these pages too. If you’d like to be notified when the next release is out, fill out this form.

Our mission at Data Mechanics is to make Spark easier-to-use and more cost-effective for data engineering workloads. We hope this tool will contribute to this goal and prove useful to the Spark community. We’d love your feedback about it!
