We export the Spark event logs of each application running on the platform and sum up the durations of all its Spark tasks. This is the same information you can see in the Spark UI; it is reported by Spark itself and is accurate down to the millisecond.
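For illustration, here is a minimal Python sketch of that computation, assuming a single uncompressed event log file (the filename is hypothetical, and the real pipeline is more involved). Spark event logs are newline-delimited JSON, and each SparkListenerTaskEnd event records the task's launch and finish timestamps in epoch milliseconds:

```python
import json

def total_task_duration_ms(event_log_path):
    """Sum the durations of all Spark tasks recorded in an event log.

    Spark event logs are newline-delimited JSON. Each SparkListenerTaskEnd
    event carries a "Task Info" block with "Launch Time" and "Finish Time"
    timestamps in epoch milliseconds.
    """
    total_ms = 0
    with open(event_log_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("Event") == "SparkListenerTaskEnd":
                info = event["Task Info"]
                total_ms += info["Finish Time"] - info["Launch Time"]
    return total_ms

# Hypothetical path to an exported event log.
print(total_task_duration_ms("application_1234_eventlog.json"))
```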
If you don’t run any Spark command, there is no Data Mechanics fee. This can happen if you run a pure Python/Scala application from a notebook or one that you've submitted through our API.
You will still incur costs from the cloud provider, though, so it's a good idea to shut down a notebook once you're done with your work; this destroys all its pods and lets the Kubernetes cluster autoscale down.
As soon as an application finishes, our dashboard provides you with this information. For recurring applications (we call them "jobs"), you can also track the evolution of your Data Mechanics costs, along with other key metrics, over time. Finally, at the end of each month, we produce a billing report with detailed information and a cost attribution breakdown.
Get in touch with us by booking a demo so that we can learn more about your use case and answer your questions about the platform. We'll then invite you to a shared Slack channel that we will use for most of our interactions and for live support. We will send you instructions on Slack on how to get started -- the first step is to grant Data Mechanics scoped permissions on the AWS, GCP, or Azure account of your choice.
The short answer is: we're cheaper than most competing platforms because we operate on a serverless model, meaning we only charge you for compute time, not for server uptime.
If you're currently on another Spark platform, you're probably using a small set of static cluster and Spark configurations for most of your workloads. You're likely to suffer from resource overprovisioning, long periods of idleness, and parallelization issues, as shown in the graph below.
When these problems occur, other Spark platforms will charge you for the total server uptime, including the wasted compute time. At Data Mechanics, not only do we charge you solely for compute time, but we also tune your configurations automatically and continuously for each of your Spark applications to eliminate the waste altogether.
Some of our customers have reduced their costs by over 50% since they migrated to our platform.
Our platform tunes the infrastructure and Spark configurations automatically for each pipeline to optimize performance and stability (e.g. memory and CPU sizing, parallelism and partition settings, shuffle and I/O improvements). Each application runs in full isolation and autoscales quickly to adapt to the load. Finally, you only pay for the real work being done (Spark task duration), not for wasted server uptime.
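To give a concrete picture of the kinds of knobs involved, here is an illustrative PySpark configuration touching those same areas. The application name and values are hypothetical; in practice, the settings are picked per application based on its observed behavior:

```python
from pyspark.sql import SparkSession

# Illustrative values only: these are standard Spark properties, chosen
# here purely as an example of what gets sized per application.
spark = (
    SparkSession.builder
    .appName("example-pipeline")
    # Memory and CPU sizing
    .config("spark.executor.memory", "7g")
    .config("spark.executor.cores", "4")
    # Parallelism and partitioning
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.default.parallelism", "200")
    # Shuffle and I/O behavior
    .config("spark.shuffle.file.buffer", "1m")
    .config("spark.sql.files.maxPartitionBytes", "256m")
    .getOrCreate()
)
```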
Yes, you can. At the end of the month, you will get one bill from the cloud provider and one bill from Data Mechanics. Your cloud credits will apply to the cloud provider bill, which makes up the larger portion of your total costs.
You have control over the autoscaling behavior of the Kubernetes cluster as a whole and of each individual Spark application. For example, you can set a maximum size for the cluster and cap how far each application can scale out, as sketched below.
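At the application level, such a cap corresponds to Spark's standard dynamic allocation properties. A minimal sketch, with hypothetical values:

```python
from pyspark.sql import SparkSession

# Hypothetical values: bound how far a single application can scale out
# using Spark's standard dynamic allocation settings.
spark = (
    SparkSession.builder
    .appName("capped-app")
    .config("spark.dynamicAllocation.enabled", "true")
    # Required for dynamic allocation on Kubernetes,
    # which has no external shuffle service.
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .getOrCreate()
)
```

The cluster-wide maximum is typically enforced separately, through the node pool limits of your cloud provider's cluster autoscaler.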