
Data Developer - Databricks

Hybrid
Full-time
Contractor
By agreement
Developer

Overview

We are looking for a Data Developer specializing in Databricks who will develop, optimize and manage data pipelines on Spark, Delta Lake and Unity Catalog. The role focuses on working with very large datasets (hundreds of millions of rows, TB-scale volumes), tuning performance and using cluster resources efficiently.

Mission

  • Develop and optimize data pipelines in Apache Spark on the Databricks platform
  • Work with Unity Catalog and manage data objects within the governance model
  • Optimize pipeline performance over tables of 500+ million rows and >1 TB of data
  • Tune and configure Databricks clusters for maximum efficiency and resource utilization
  • Optimize Delta Lake tables (layout, partitioning, Z-ordering, vacuum, compaction)
  • Collaborate with architects, data engineers and analysts on data platform development
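
The Delta Lake maintenance mentioned above (compaction, Z-ordering, vacuum) is typically driven by a few recurring SQL statements. As a minimal illustrative sketch, the helper below generates those statements for a given table; the table and column names are hypothetical placeholders, and the standard 7-day (168-hour) retention default is assumed:

```python
# Sketch: generate the routine Delta Lake maintenance statements named in the
# Mission section. Table and column names below are hypothetical examples.

def delta_maintenance_sql(table: str, zorder_cols: list[str],
                          retain_hours: int = 168) -> list[str]:
    """Return OPTIMIZE (with Z-ordering) and VACUUM statements for a Delta table."""
    cols = ", ".join(zorder_cols)
    return [
        f"OPTIMIZE {table} ZORDER BY ({cols})",        # file compaction + Z-ordering
        f"VACUUM {table} RETAIN {retain_hours} HOURS",  # remove stale data files
    ]

# Example with a hypothetical Unity Catalog table:
for stmt in delta_maintenance_sql("main.sales.transactions",
                                  ["customer_id", "event_date"]):
    print(stmt)
# → OPTIMIZE main.sales.transactions ZORDER BY (customer_id, event_date)
# → VACUUM main.sales.transactions RETAIN 168 HOURS
```

On Databricks, each generated statement would be executed via `spark.sql(...)`, typically from a scheduled maintenance job.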

Skills

  • At least 1 year of experience with Unity Catalog
  • At least 2 years of experience on at least 2 projects involving:
    • tuning and optimizing Spark pipelines
    • working with tables of at least 500 million rows
    • working with datasets of at least 1 TB
    • optimizing cluster settings for maximum resource utilization
  • At least 2 years of experience with Delta Lake debugging on tables > 1 TB

Advantage

  • Experience with Azure (ADF, Synapse, Data Lake Storage)
  • Knowledge of CI/CD for data pipelines
  • Experience with Spark job performance monitoring
  • Experience from the financial sector

Benefits

  • Great colleagues and fully flexible work policy
  • Career coaching and development
  • Flexible working hours
  • Technical training and workshops
  • Technical equipment for work (Mac / Windows)
  • Company parties
  • Company psychologist for mental well-being
  • Multisport card

Apply for this job