4+ years of experience as a Data Engineer or in a similar role
Must-Have Skills
Big Data Processing: Hands-on experience with PySpark
Cloud Platforms: Experience with Google Cloud Platform (GCP) or any other cloud provider
Data Pipeline Development: Expertise in Spark, Hadoop, Hive
Database & Querying: Strong proficiency in SQL
Collaboration & Requirements Gathering: Ability to work with product managers and data stewards to translate data requirements into efficient workflows
Good-to-Have Skills
Experience with other cloud platforms like AWS or Azure
Familiarity with ETL tools and frameworks
Knowledge of data governance and data quality best practices
Exposure to real-time data processing technologies (e.g., Kafka, Flink)
Key Responsibilities
Design, implement, and optimize scalable data pipelines using Spark, Hadoop, Hive, and other technologies
Develop and maintain ETL processes for efficient data ingestion and transformation
Monitor and troubleshoot data pipelines to ensure high availability and minimal downtime
Work closely with cross-functional teams to understand data needs and deliver effective solutions
Optimize query performance and data storage strategies for improved efficiency
Ensure data integrity, quality, and security in compliance with best practices
Soft Skills & Competencies
Strong problem-solving skills and ability to work in a collaborative environment
Excellent communication skills, both written and verbal
Ability to adapt to evolving technologies and business needs
Education & Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field
Experience working in Agile or DevOps environments is a plus
Skills
Spark, Hadoop, Hive, GCP