Data Pipeline Development: Design, build, and maintain scalable, reliable data pipelines using Apache Spark, Apache Kafka, and Apache Flink for real-time processing and large-scale batch workflows.
Real-time Data Streaming: Implement and manage real-time streaming architectures on Apache Kafka to process and transmit high volumes of streaming data in a fault-tolerant manner (a minimal sketch of such a pipeline follows this list).
Data Transformation and Orchestration: Develop data transformation workflows and integrate data from diverse sources, ensuring pipelines are robust, efficient, and aligned with data engineering best practices.
Data Quality Assurance: Implement data validation, quality checks, and monitoring systems to ensure data integrity and consistency across the entire data pipeline.
Collaboration with Cross-functional Teams: Work closely with Data Scientists, Analysts, and other stakeholders to understand data requirements and provide reliable data infrastructure solutions.
Performance Optimization: Continuously monitor and optimize data processing performance, focusing on scaling solutions and improving efficiency.
Documentation & Best Practices: Maintain clear documentation for data pipelines, data structures, and processes. Advocate for industry-standard data engineering practices across the team.
Tool Expertise: Leverage Looker for business intelligence and BigQuery for data warehousing to support analytics and decision-making.
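To make the streaming responsibilities above concrete, here is a minimal sketch of a Kafka-to-Spark Structured Streaming pipeline in Scala. The broker address, topic name, schema, and checkpoint path are illustrative assumptions, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-pipeline")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic; Kafka delivers records as binary key/value pairs.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
      .option("subscribe", "clickstream-events")        // assumed topic name
      .load()

    // Decode the payload and aggregate events per minute as a simple transformation.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
      .withWatermark("timestamp", "5 minutes")          // bound streaming state
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Checkpointing makes the job restartable from where it left off.
    counts.writeStream
      .format("console")
      .outputMode("update")
      .option("checkpointLocation", "/tmp/checkpoints/clickstream") // assumed path
      .start()
      .awaitTermination()
  }
}
```

The watermark plus checkpointing combination is what bounds state and makes the job restartable, which is the usual basis for the fault tolerance called out above.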
Requirements
Proficiency in Scala and SQL: Strong experience in writing scalable, efficient, and maintainable code in Scala, and querying complex datasets using SQL.
Apache Spark Experience: Solid hands-on experience with Apache Spark for large-scale data processing, including performance tuning, fault tolerance, and optimization (see the batch sketch after this list).
Kafka Expertise: Proficient in working with Apache Kafka to set up, manage, and scale real-time data streaming solutions.
Real-time Processing with Flink: Familiarity with Apache Flink and its capabilities for building real-time data processing pipelines (see the Flink sketch after this list).
Data Engineering Best Practices: Demonstrated experience in implementing industry-standard practices for data transformation, orchestration, and ensuring high data quality.
Looker & BigQuery: Knowledge of Looker for business intelligence, as well as BigQuery for data warehousing and querying large datasets.
Problem-Solving & Analytical Thinking: Strong analytical and problem-solving skills with a focus on optimizing data workflows and architectures.
Collaboration & Communication: Excellent communication and collaboration skills, with the ability to work effectively with both technical and non-technical teams.
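As a reference point for the Scala, SQL, and Spark requirements, here is a minimal batch-job sketch that pairs a SQL transformation with a simple data-quality gate. The input/output paths and column names (order_id, country, amount, created_at) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DailyOrdersJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-orders")
      .getOrCreate()

    val orders = spark.read.parquet("/data/orders") // assumed input path
    orders.createOrReplaceTempView("orders")

    // Data-quality gate: fail fast if required keys are missing.
    val nullKeys = spark.sql(
      "SELECT COUNT(*) AS bad FROM orders WHERE order_id IS NULL"
    ).first().getLong(0)
    require(nullKeys == 0, s"$nullKeys rows missing order_id; aborting")

    // SQL transformation: daily revenue per country.
    val revenue = spark.sql(
      """SELECT country, DATE(created_at) AS day, SUM(amount) AS revenue
        |FROM orders
        |GROUP BY country, DATE(created_at)""".stripMargin
    )

    revenue.write.mode("overwrite").parquet("/data/daily_revenue") // assumed output path
    spark.stop()
  }
}
```

Failing the job before writing output is a deliberate choice here: it keeps bad rows from propagating downstream, which is the point of the quality checks listed above.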
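For the Flink requirement, a minimal streaming word-count sketch, assuming the Flink Scala API (org.apache.flink.streaming.api.scala) and a local socket source. Note the Scala API is deprecated in recent Flink releases, so the Java API is the safer long-term choice; this only illustrates the keyed-stream programming model.

```scala
import org.apache.flink.streaming.api.scala._

object WordCountStream {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Assumed source: a plain-text socket (e.g. `nc -lk 9999`).
    val text = env.socketTextStream("localhost", 9999)

    // Split lines into words, then keep a running count per word.
    val counts = text
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(_._1) // partition the stream by word
      .sum(1)      // running sum of the count field

    counts.print()
    env.execute("word-count")
  }
}
```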