Data Pipeline Development: Design, build, and maintain scalable, reliable data pipelines using Apache Spark, Apache Kafka, and Apache Flink for real-time processing and large-scale batch workflows.
Real-time Data Streaming: Implement and manage real-time streaming architectures on Apache Kafka to process and transmit high volumes of streaming data in a fault-tolerant manner (a minimal sketch of such a pipeline follows this list).
Data Transformation and Orchestration: Develop data transformation workflows and integrate data from diverse sources, ensuring pipelines are robust, efficient, and aligned with data engineering best practices.
Data Quality Assurance: Implement data validation, quality checks, and monitoring systems to ensure data integrity and consistency across the entire data pipeline.
Collaboration with Cross-functional Teams: Work closely with Data Scientists, Analysts, and other stakeholders to understand data requirements and provide reliable data infrastructure solutions.
Performance Optimization: Continuously monitor and optimize data processing performance, focusing on scaling solutions and improving efficiency.
Documentation & Best Practices: Maintain clear documentation for data pipelines, data structures, and processes. Advocate for industry-standard data engineering practices across the team.
Tool Expertise: Leverage Looker for business intelligence and BigQuery for data warehousing to support analytics and decision-making.
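To make the streaming responsibilities above concrete, here is a minimal sketch of a Kafka-to-Spark Structured Streaming pipeline in Scala. The broker address, topic name, schema, and checkpoint path are illustrative assumptions, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-pipeline")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic; Kafka delivers records as binary key/value pairs.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
      .option("subscribe", "clickstream-events")        // assumed topic name
      .load()

    // Decode the payload and aggregate events per minute as a simple transformation.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
      .withWatermark("timestamp", "5 minutes")          // bound streaming state
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Checkpointing makes the job restartable from where it left off.
    counts.writeStream
      .format("console")
      .outputMode("update")
      .option("checkpointLocation", "/tmp/checkpoints/clickstream") // assumed path
      .start()
      .awaitTermination()
  }
}
```

The watermark plus checkpointing combination is what bounds state and makes the job restartable, which is the usual basis for the fault tolerance called out above.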
Requirements
Proficiency in Scala and SQL: Strong experience in writing scalable, efficient, and maintainable code in Scala, and querying complex datasets using SQL.
Apache Spark Experience: Solid hands-on experience with Apache Spark for large-scale data processing, including performance tuning, fault tolerance, and optimization (see the batch sketch after this list).
Kafka Expertise: Proficient in working with Apache Kafka to set up, manage, and scale real-time data streaming solutions.
Real-time Processing with Flink: Familiarity with Apache Flink and its capabilities for building real-time data processing pipelines (see the Flink sketch after this list).
Data Engineering Best Practices: Demonstrated experience in implementing industry-standard practices for data transformation, orchestration, and ensuring high data quality.
Looker & BigQuery: Knowledge of Looker for business intelligence, as well as BigQuery for data warehousing and querying large datasets.
Problem-Solving & Analytical Thinking: Strong analytical and problem-solving skills with a focus on optimizing data workflows and architectures.
Collaboration & Communication: Excellent communication and collaboration skills, with the ability to work effectively with both technical and non-technical teams.
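As a reference point for the Scala, SQL, and Spark requirements, here is a minimal batch-job sketch that pairs a SQL transformation with a simple data-quality gate. The input/output paths and column names (order_id, country, amount, created_at) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DailyOrdersJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-orders")
      .getOrCreate()

    val orders = spark.read.parquet("/data/orders") // assumed input path
    orders.createOrReplaceTempView("orders")

    // Data-quality gate: fail fast if required keys are missing.
    val nullKeys = spark.sql(
      "SELECT COUNT(*) AS bad FROM orders WHERE order_id IS NULL"
    ).first().getLong(0)
    require(nullKeys == 0, s"$nullKeys rows missing order_id; aborting")

    // SQL transformation: daily revenue per country.
    val revenue = spark.sql(
      """SELECT country, DATE(created_at) AS day, SUM(amount) AS revenue
        |FROM orders
        |GROUP BY country, DATE(created_at)""".stripMargin
    )

    revenue.write.mode("overwrite").parquet("/data/daily_revenue") // assumed output path
    spark.stop()
  }
}
```

Failing the job before writing output is a deliberate choice here: it keeps bad rows from propagating downstream, which is the point of the quality checks listed above.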
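For the Flink requirement, a minimal streaming word-count sketch, assuming the Flink Scala API (org.apache.flink.streaming.api.scala) and a local socket source. Note the Scala API is deprecated in recent Flink releases, so the Java API is the safer long-term choice; this only illustrates the keyed-stream programming model.

```scala
import org.apache.flink.streaming.api.scala._

object WordCountStream {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Assumed source: a plain-text socket (e.g. `nc -lk 9999`).
    val text = env.socketTextStream("localhost", 9999)

    // Split lines into words, then keep a running count per word.
    val counts = text
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(_._1) // partition the stream by word
      .sum(1)      // running sum of the count field

    counts.print()
    env.execute("word-count")
  }
}
```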