We are seeking a highly skilled Azure Data Engineer to design, build, and maintain robust data pipelines using a comprehensive suite of Azure services, including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Databricks, and Azure Synapse Analytics. This role requires a strong grasp of data engineering principles, proficient Python coding skills, and expertise in applying Azure data tools to manage large datasets efficiently.
Key Responsibilities
Data Pipeline Development:
Design, develop, and implement scalable data pipelines in ADF to ingest, transform, and load data from diverse sources into data warehouses and data lakes.
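For illustration, a minimal sketch of starting an ADF pipeline run from Python with the azure-identity and azure-mgmt-datafactory SDKs; the subscription ID, resource group, factory, pipeline name, and parameter are placeholder assumptions, not a definitive implementation:

```python
# Minimal sketch: trigger an ADF pipeline run programmatically.
# The subscription ID, resource group, factory, pipeline name, and
# parameter below are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-factory",
    pipeline_name="ingest_pipeline",
    parameters={"load_date": "2024-01-01"},
)
print(f"Started pipeline run: {run.run_id}")
```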
Azure Databricks Expertise:
Leverage the Azure Databricks platform to build and optimize complex data processing workflows in PySpark, including data cleaning, feature engineering, and data transformation.
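A minimal PySpark sketch of the kind of cleaning and feature-engineering pass this involves; the paths and column names (order_id, order_ts, amount) are illustrative assumptions:

```python
# Minimal sketch of a PySpark cleaning / feature-engineering pass.
# Source path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform-example").getOrCreate()

raw = spark.read.parquet("/mnt/raw/orders")

cleaned = (
    raw.dropDuplicates(["order_id"])                 # drop duplicate rows
       .filter(F.col("amount") > 0)                  # drop invalid amounts
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount_log", F.log1p("amount"))  # engineered feature
)

cleaned.write.mode("overwrite").parquet("/mnt/curated/orders")
```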
Delta Lake Management:
Utilize Delta Lake on Databricks to manage large-scale datasets with efficient data versioning and time-travel capabilities.
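For example, a sketch of Delta Lake's commit history and time-travel reads; the table path is a placeholder, and `spark` is the session a Databricks notebook provides:

```python
# Minimal sketch of Delta Lake versioning and time travel.
# Assumes a Delta-enabled Spark session (e.g. a Databricks notebook,
# where `spark` is predefined) and a hypothetical table path.
from delta.tables import DeltaTable

path = "/mnt/delta/orders"

# Inspect the commit history: version, timestamp, operation, ...
DeltaTable.forPath(spark, path).history().select(
    "version", "timestamp", "operation"
).show()

# Read the table as of an earlier version ("time travel").
df_v3 = spark.read.format("delta").option("versionAsOf", 3).load(path)

# Or as of a point in time.
df_then = (
    spark.read.format("delta")
         .option("timestampAsOf", "2024-01-01")
         .load(path)
)
```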
Data Lake Storage (ADLS):
Design and manage data storage architecture on ADLS, ensuring optimal data organization and accessibility for downstream applications.
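As a sketch of one common convention (not the only one), a layered "medallion" layout on ADLS Gen2; the storage account, container, and zone names are assumptions:

```python
# Illustrative layered layout on ADLS Gen2; account/container names are
# placeholders, and authentication is assumed to be configured.
ACCOUNT = "mydatalake"
CONTAINER = "lake"
BASE = f"abfss://{CONTAINER}@{ACCOUNT}.dfs.core.windows.net"

ZONES = {
    "raw": f"{BASE}/raw",          # immutable copies of source data
    "curated": f"{BASE}/curated",  # cleaned, conformed datasets
    "serving": f"{BASE}/serving",  # aggregates for downstream consumers
}

# e.g. read a raw dataset in Spark (assumes a notebook-provided session):
df = spark.read.parquet(f"{ZONES['raw']}/sales/2024/01")
```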
Azure Synapse Analytics:
Utilize Synapse pipelines and Spark pools for data integration and complex data analysis tasks.
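A sketch of a typical Spark pool task: reading a dedicated SQL pool table through the connector the Synapse Spark runtime exposes, then aggregating in Spark; the database and table names are placeholders:

```python
# Minimal sketch inside a Synapse Spark pool notebook; `spark` is the
# notebook-provided session, and the table name is hypothetical.
from pyspark.sql import functions as F

# synapsesql is the SQL-pool connector the Synapse Spark runtime provides.
sales = spark.read.synapsesql("salesdb.dbo.fact_sales")

daily = (
    sales.groupBy("sale_date")
         .agg(F.sum("amount").alias("daily_revenue"))
         .orderBy("sale_date")
)
daily.show()
```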
Data Quality Assurance:
Implement robust data quality checks and monitoring mechanisms to ensure data accuracy and integrity throughout the pipeline.
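For illustration, a lightweight quality-gate sketch in PySpark; the key column, threshold, and path are assumptions, and a real pipeline might instead use a framework such as Great Expectations:

```python
# Minimal sketch of a data quality gate; column name, threshold, and path
# are illustrative. `spark` is assumed to be a notebook-provided session.
from pyspark.sql import functions as F

def check_quality(df, key_col="order_id", max_null_rate=0.01):
    total = df.count()
    if total == 0:
        raise ValueError("Quality check failed: dataset is empty")

    # Null-rate check on the key column.
    nulls = df.filter(F.col(key_col).isNull()).count()
    if nulls / total > max_null_rate:
        raise ValueError(f"Quality check failed: {nulls}/{total} null keys")

    # Uniqueness check on the key column.
    if df.select(key_col).distinct().count() != total - nulls:
        raise ValueError("Quality check failed: duplicate keys detected")

check_quality(spark.read.parquet("/mnt/curated/orders"))  # hypothetical path
```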
Collaboration:
Work closely with cross-functional teams including data analysts, business analysts, and application developers to understand data requirements and deliver data solutions that align with business objectives.
Performance Optimization:
Continuously monitor and optimize data pipeline performance to ensure scalability and efficient data processing.
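As a sketch of the kinds of levers involved (the right ones depend on the workload), assuming a Delta table at a placeholder path:

```python
# Minimal sketch of common Spark/Delta tuning steps; table path, partition
# count, and columns are placeholder assumptions.
orders = spark.read.format("delta").load("/mnt/delta/orders")

# Repartition on a join/filter key to reduce shuffle skew in a heavy join.
orders = orders.repartition(200, "customer_id")

# Cache a DataFrame that several downstream actions reuse.
orders.cache()

# Compact small files and co-locate rows on a frequent filter column
# (Delta OPTIMIZE / ZORDER, available on Databricks and Delta Lake 2.0+).
spark.sql("OPTIMIZE delta.`/mnt/delta/orders` ZORDER BY (customer_id)")
```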
Required Skills and Experience
Strong Azure Expertise:
Proven experience in designing and implementing data solutions using a wide range of Azure data services, including ADF, ADLS, Databricks, Synapse, and Azure Cosmos DB.
Programming Proficiency:
High proficiency in Python programming, with a strong command of data manipulation libraries such as Pandas and NumPy.
Spark Development:
Extensive experience with Apache Spark (PySpark) for large-scale data processing and distributed computing.
Delta Lake Knowledge:
Familiarity with Delta Lake format for data lake management and data versioning.
Data Modeling:
Ability to design and implement data models for data warehouses and data lakes, with attention to data quality and query performance.
Cloud Architecture:
Understanding of cloud computing concepts, distributed systems, and best practices for building scalable cloud-based data pipelines.
Problem-Solving:
Strong analytical skills to troubleshoot data issues and optimize pipeline performance.
Communication Skills:
Strong communication skills for collaborating with stakeholders and clearly articulating technical concepts.
Preferred Qualifications
Azure Data Engineer Associate certification
Experience with data visualization tools like Power BI or Tableau
Knowledge of data governance practices and data security principles
Experience with DevOps practices for data pipeline automation and deployment