We export the Spark event logs of each application running on the platform and sum up the duration of all the Spark tasks. It’s the same information that you can see on the Spark UI, reported by Spark, accurate down to the millisecond.
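To make "sum up the duration of all the Spark tasks" concrete, here is a minimal sketch that reads an uncompressed Spark event log (one JSON event per line) and adds up `Finish Time - Launch Time` for every `SparkListenerTaskEnd` event. This is one plausible reading of the metric, not Data Mechanics' actual billing code. The function name `total_task_time_ms` is hypothetical. The exact duration definition used for billing isn't specified here, and compressed logs (e.g. `.lz4`, `.zstd`) would need decompressing first.

```python
import json
from typing import Iterable


def total_task_time_ms(event_log_lines: Iterable[str]) -> int:
    """Hypothetical helper: sum the wall-clock duration of all finished tasks."""
    total = 0
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Spark writes one SparkListenerTaskEnd event per completed task.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event["Task Info"]
        # Launch Time and Finish Time are epoch timestamps in milliseconds.
        total += info["Finish Time"] - info["Launch Time"]
    return total


# Tiny hand-written sample: two tasks lasting 2500 ms and 1000 ms.
sample = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "demo"}',
    '{"Event": "SparkListenerTaskEnd", "Task Info": {"Task ID": 0, "Launch Time": 1000, "Finish Time": 3500}}',
    '{"Event": "SparkListenerTaskEnd", "Task Info": {"Task ID": 1, "Launch Time": 1200, "Finish Time": 2200}}',
]
print(total_task_time_ms(sample))  # → 3500
```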
If you don’t run any Spark command, there is no Data Mechanics fee. This can happen if you run a pure Python/Scala application from a notebook or one that you've submitted through our API.
You will still incur costs from the cloud provider, though, so it’s a good idea to shut down a notebook once you’re done with your work: this destroys all of its pods and lets the Kubernetes cluster autoscale down.
As soon as an application finishes, our dashboard provides you with this information. For recurring applications (we call them "jobs"), you can also track how your Data Mechanics costs evolve over time, along with other key metrics. Finally, at the end of each month, we produce a billing report with detailed information and a cost attribution breakdown.
Get in touch with us by booking a demo so that we can learn more about your use case and answer your questions about the platform. We'll then invite you to a shared Slack channel that we use for most of our interactions and for live support. We will send you instructions on Slack on how to get started: the first step is to grant Data Mechanics scoped permissions on the AWS, GCP, or Azure account of your choice.
We've achieved 50% to 75% cost reductions for customers who migrated their workloads from a competing platform. These savings have two sources: features that reduce your cloud costs (automated tuning of pipelines, a single shared cluster with fast autoscaling, spot nodes support), and a management fee that is lower than that of other Spark platforms.
Competing Spark platforms charge you a management fee based on server uptime. This means that if an instance is up but not running any Spark task, you still pay their management fee. This situation is common: on average, instances can sit idle up to 80% of the time, because of configuration mistakes, resource overprovisioning, long periods of idleness, and imperfect parallelism. The graph below shows an example of poor parallelism: most of the cluster is idle while it waits for a straggler task to finish.
At Data Mechanics, we only charge you our management fee when Spark tasks are running. This means you only pay a fee when your machines are actually used by Spark, and it becomes our job to tune your jobs so that your data infrastructure is efficient and wasted resources are eliminated.
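The difference between the two billing models comes down to simple arithmetic. The sketch below compares a fee charged on provisioned core-hours (uptime) with one charged only on core-hours spent running Spark tasks. All numbers are hypothetical: a 100-core cluster up for 10 hours, busy 20% of the time (the 80% idle case above), and the same illustrative per-core-hour rate for both models so that only the billing basis differs. Real rates vary by vendor and are not given here.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    core_hours_up: float        # provisioned cores x hours the instances were running
    core_hours_in_tasks: float  # summed Spark task time, converted to core-hours


def uptime_fee(usage: ClusterUsage, rate: float) -> float:
    # Uptime model: every provisioned core-hour is billed, busy or idle.
    return usage.core_hours_up * rate


def task_time_fee(usage: ClusterUsage, rate: float) -> float:
    # Task-time model: only core-hours actually spent running Spark tasks are billed.
    return usage.core_hours_in_tasks * rate


def idle_fraction(usage: ClusterUsage) -> float:
    # Share of provisioned capacity during which no Spark task was running.
    return 1 - usage.core_hours_in_tasks / usage.core_hours_up


HYPOTHETICAL_RATE = 0.05  # illustrative fee per core-hour, not a real price
usage = ClusterUsage(core_hours_up=100 * 10, core_hours_in_tasks=100 * 10 * 0.2)

print("idle fraction:", idle_fraction(usage))
print("uptime-based fee:", uptime_fee(usage, HYPOTHETICAL_RATE))
print("task-time-based fee:", task_time_fee(usage, HYPOTHETICAL_RATE))
```

With 80% idleness and equal rates, the uptime-based fee comes out five times higher than the task-time-based one, because four out of every five billed core-hours did no Spark work.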
We've achieved 50% to 75% cost reductions for customers migrating their workloads from a competitor. If you'd like a more detailed estimate of your potential savings, install Data Mechanics Delight, our free and cross-platform monitoring tool, on top of your current Spark platform, then get in touch with us by booking a demo. We will analyze the logs collected by Delight to estimate the savings we can generate for your Spark workloads.
Yes - on the cloud provider portion of your costs. At the end of the month, you will get one bill from the cloud provider and one bill from Data Mechanics. Your cloud credits will apply to the cloud provider bill, which makes up the larger portion of your total costs.
You have control over the autoscaling behavior of the shared Kubernetes cluster and of each individual Spark application. For example, you can set a maximum size for the cluster and for each Spark application.
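How Data Mechanics exposes these limits (UI, API, or config templates) isn't described here. In plain open-source Spark terms, capping a single application's size usually means bounding dynamic allocation. The sketch below uses only standard Spark configuration keys and a hypothetical helper, `spark_submit_conf_args`, that renders them as `spark-submit --conf` arguments. The cap of 10 executors is an arbitrary example. The cluster-wide maximum is normally set on the Kubernetes node-pool autoscaler and is not shown.

```python
from typing import Dict, List

# Standard Spark keys; the values are illustrative only.
PER_APP_LIMITS: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "true",
    # On Kubernetes there is no external shuffle service, so Spark 3.0+
    # relies on shuffle tracking to make dynamic allocation work.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    # Upper bound on executors this one application may request.
    "spark.dynamicAllocation.maxExecutors": "10",
}


def spark_submit_conf_args(conf: Dict[str, str]) -> List[str]:
    """Hypothetical helper: turn a config dict into spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(spark_submit_conf_args(PER_APP_LIMITS)))
```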