The Cloud-Native Spark Platform for Data Engineers

Run continuously optimized Apache Spark workloads on a managed Kubernetes cluster in your cloud account (AWS, GCP, or Azure).

Focus on your data while we handle the mechanics.


Trust your production

Maintenance Is On Us

Our platform automatically tunes infrastructure parameters and Spark configurations to keep your workloads fast and stable. Track the stability and performance of your jobs over time in our web interface, along with key metrics and actionable insights on your data pipelines.
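To make this concrete, here is a rough sketch of the kinds of standard Spark properties such tuning typically adjusts per pipeline; the values are placeholders, not our defaults or recommendations.

```python
# Illustrative only: standard Spark properties that automated tuning
# commonly adjusts. The values are placeholders, not Data Mechanics defaults.
tuned_conf = {
    "spark.executor.memory": "8g",              # memory sizing per executor
    "spark.executor.cores": "4",                # CPU sizing per executor
    "spark.sql.shuffle.partitions": "200",      # shuffle parallelism
    "spark.dynamicAllocation.enabled": "true",  # scale executors with the load
}
```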

use your tools

Develop Locally, Run At Scale

Move seamlessly from local development to cloud execution through our IDE integrations. Take control of your dev environment and package your dependencies inside a Docker image. Bring DevOps best practices to your data stack.

iterate quickly

Autoscaled Jupyter Notebooks

Connect Jupyter notebooks directly to the platform. Your application starts and scales in seconds to match the workload. Mix single-machine and distributed workloads as needed while the platform automatically reclaims unused resources.
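As a rough sketch, working from a notebook looks like a standard PySpark session; the settings and paths below are placeholders, since the exact connection details depend on your deployment.

```python
# Minimal sketch of a notebook-driven Spark session. Settings and paths
# are placeholders; the actual connection details depend on your deployment.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("interactive-exploration")
    .config("spark.dynamicAllocation.enabled", "true")  # scale executors with the load
    .getOrCreate()
)

# Mix single-machine Python with distributed Spark work as needed.
df = spark.read.parquet("s3a://your-bucket/events/")  # hypothetical path
df.groupBy("event_type").count().show()
```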

Deployment process

How it works

Data Mechanics is deployed on a Kubernetes cluster in your cloud account that we manage for you. Your sensitive data never leaves your account.

1. Connect your cloud account

Give us scoped permissions on your AWS, GCP, or Azure account. We will deploy the platform on a Kubernetes cluster we manage for you.

2. Submit Spark applications

Attach a Jupyter notebook and start exploring interactively, or submit jobs programmatically through our API or our Airflow operator (see the sketch after these steps).

3. You’re all set

Sit back and relax. Our dashboard is all you need to monitor your apps.
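As an illustration of programmatic submission, the sketch below posts a job configuration over HTTP; the endpoint, authentication header, and payload fields are hypothetical placeholders rather than the documented API, and the Airflow operator follows the same idea.

```python
# Hypothetical sketch of submitting a Spark application over HTTP.
# The URL, auth header, and payload fields are placeholders, not the
# documented Data Mechanics API.
import requests

payload = {
    "jobName": "daily-aggregation",
    "mainApplicationFile": "s3a://your-bucket/jobs/aggregate.py",  # hypothetical path
    "sparkVersion": "3.0",
}

response = requests.post(
    "https://your-deployment.example.com/api/apps",   # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <API_TOKEN>"},  # placeholder auth
    timeout=30,
)
response.raise_for_status()
print(response.json())  # e.g. the application ID to track in the dashboard
```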


Need help?

Frequently Asked Questions

If you have an infrequently asked question, use the chat in the bottom-right corner and our team will get back to you shortly.

Does Data Mechanics need access to my data?

No. The Data Mechanics platform is deployed in your cloud account and your sensitive data never leaves it. You can restrict incoming traffic and make the platform accessible only from your office IP and/or your VPN. You can control data access using security best practices like Role-Based Access Control, Cloud Identity and Access Management, and secrets management solutions.

Which cloud providers do you support?

Our platform is available on GCP, AWS, and Azure. Under the hood, we use these cloud providers' managed Kubernetes offerings to deploy Data Mechanics. You don't need to create or manage these Kubernetes clusters yourself; our software takes care of that.

Are you available on-premises?

Not yet. We focus on cloud deployments as they let us deploy and push releases quickly and in a standardized way. Support for on-premises deployments is on our roadmap. Contact us if you're interested.

Can I still see and control the infrastructure?

Yes. We automate infrastructure management to make your life simpler, but we do so transparently. You can view and control the Kubernetes cluster we manage for you through your cloud provider's console, CLI, and API. Similarly, you can view and control the infrastructure parameters and Spark configurations used by each application. Your preferences take precedence over our autoscaling and automated tuning features.

How can I get started with a trial?

Get in touch with us by booking a demo so that we can learn more about your use case and answer your questions about the platform. We'll then invite you to a shared Slack channel that we will use for most of our interactions and for live support. We will send you instructions on Slack on how to get started: the first step is to grant Data Mechanics scoped permissions on the AWS, GCP, or Azure account of your choice.

What makes Data Mechanics serverless?

Our platform automatically tunes the infrastructure and Spark configurations for each pipeline to optimize performance and stability (e.g. memory and CPU sizing, parallelism and partitioning settings, shuffle and I/O improvements). Each application runs in full isolation and autoscales quickly to adapt to the load. Finally, you only pay for the real work being done (Spark task duration), not wasted server uptime.

Do you support spot/preemptible instances?

Yes. We have configuration templates ready to help guide you towards their adoption, as sketched below. It's typically recommended to place only the Spark executors on spot nodes and keep the Spark driver on an on-demand node, so your workloads stay resilient to spot kills.
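One common way to express this on Spark-on-Kubernetes (Spark 3.0+) is with separate pod template files for the driver and executors; the property names below are standard Spark settings, while the file paths and node labels are placeholders, and our own configuration templates may differ.

```python
# Sketch of the general Spark-on-Kubernetes approach: pin the driver to
# on-demand nodes and the executors to spot nodes via pod templates.
# File paths and node-pool labels are placeholders.
spot_friendly_conf = {
    "spark.kubernetes.driver.podTemplateFile": "/templates/driver-on-demand.yaml",
    "spark.kubernetes.executor.podTemplateFile": "/templates/executor-spot.yaml",
}
# Each template carries a nodeSelector: the driver template targets an
# on-demand node pool, the executor template targets a spot/preemptible pool.
```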

Which languages do you support?

Your applications can be written in Python, Scala, Java, and SQL. Even though many of our features are primarily built for Spark, it's also possible to run pure Python, Scala, and Java applications on the platform with the same ease.

Which versions of Apache Spark do you support?

We currently support Apache Spark versions 2.4 and 3.0. We're always up to date with the newest versions of Spark and update our platform to support the latest version within a few days of release.

Can I use a custom Docker image to package my dependencies?

Yes. We offer a list of pre-compiled and optimized Spark images to choose from, and they should serve most of your use cases. For advanced use cases, you can use our image as a base, build your own Docker image on top of it, and then use it in the Data Mechanics platform. That’s one of the perks you get for free by using a container-based Spark platform. Visit our documentation page to learn about the other ways to work with dependencies.

Ready to get started?
