UST

Lead I - Data Engineering

Bengaluru, KA, IN

Role Description

Key Responsibilities:

  • Data Pipeline Development & Optimization:
    • Design, develop, and maintain scalable, high-performance data pipelines using PySpark and Databricks (see the illustrative sketch after this list).
    • Ensure data quality, consistency, and security throughout all pipeline stages.
    • Optimize data workflows and pipeline performance, ensuring efficient data processing.
  • Cloud-Based Data Solutions:
    • Architect and implement cloud-native data solutions using AWS services (e.g., S3, Glue, Lambda, Redshift), GCP (Dataproc, Dataflow), and Azure (ADF, ADLS).
    • Build ETL processes to extract, transform, and load data across cloud platforms.
  • SQL & Data Modeling:
    • Utilize SQL, including window functions, to query and analyze large datasets efficiently.
    • Work with different data schemas and models relevant to various business contexts (e.g., star/snowflake schemas, normalized, and denormalized models).
  • Data Security & Compliance:
    • Implement robust data security measures, ensuring encryption, access control, and compliance with industry standards and regulations.
    • Monitor and troubleshoot data pipeline performance and security issues.
  • Collaboration & Communication:
    • Collaborate with cross-functional teams (data scientists, software engineers, and business stakeholders) to design and integrate end-to-end data pipelines.
    • Communicate technical concepts clearly and effectively to non-technical stakeholders.
  • Domain Expertise:
    • Understand and work with domain-related data, tailoring solutions to address the specific business needs of the customer.
    • Optimize data solutions for the business context, ensuring alignment with customer requirements and goals.
  • Mentorship & Leadership:
    • Provide guidance to junior team members, fostering a collaborative environment and ensuring best practices are followed.
    • Drive innovation and promote a culture of continuous learning and improvement within the team.
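
For illustration only, a minimal sketch of the kind of PySpark extract-transform-load pipeline described above. The bucket paths, table layout, and column names are hypothetical assumptions and do not come from this posting.

# Hypothetical sketch, not part of the posting: a simple PySpark ETL job.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_revenue_pipeline").getOrCreate()

# Extract: read raw order events from cloud storage (placeholder path).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: keep completed orders and aggregate revenue per day and region.
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)

# Load: write a partitioned, curated table for downstream analytics.
(daily_revenue
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))

Writing partitioned output of this kind is one common way to keep downstream queries efficient, which is what the pipeline-optimization responsibilities above refer to.
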
Required Qualifications

  • Experience:
    • 6-8 years of total experience in data engineering, with 3+ years of hands-on experience in Databricks, PySpark, and AWS.
    • 3+ years of experience in Python and SQL for data engineering tasks.
    • Experience working with cloud ETL services such as AWS Glue, GCP Dataproc/Dataflow, and Azure ADF/ADLS.
  • Technical Skills:
    • Strong proficiency in PySpark for large-scale data processing and transformation.
    • Expertise in SQL, including window functions, for data manipulation and querying (see the sketch after this list).
    • Experience with cloud-based ETL tools (AWS Glue, GCP Dataflow, Azure ADF) and understanding of their integration with cloud data platforms.
    • Deep understanding of data schemas and models used across various business contexts.
    • Familiarity with data warehousing optimization techniques, including partitioning, indexing, and query optimization.
    • Knowledge of data security best practices (e.g., encryption, access control, and compliance).
  • Agile Methodologies: Experience working in Agile (Scrum or Kanban) teams for iterative development and delivery.
  • Communication: Excellent verbal and written communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
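
For illustration only, a short PySpark sketch of the window-function skill listed above; the customer and order columns are made-up assumptions, and the SQL equivalent would be ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC).

# Hypothetical sketch, not part of the posting: keep the latest order per customer
# using a window function.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window_function_example").getOrCreate()

orders = spark.createDataFrame(
    [(1, "2024-01-05", 120.0), (1, "2024-02-10", 80.0), (2, "2024-01-20", 200.0)],
    ["customer_id", "order_date", "amount"],
)

# Rank each customer's orders by recency, then keep only the most recent one.
w = Window.partitionBy("customer_id").orderBy(F.col("order_date").desc())
latest_orders = (
    orders
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

latest_orders.show()

Ranking within a partition and filtering on the rank is a standard way to deduplicate to the latest record per key.
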
Skills

Python, Databricks, PySpark, SQL
