Fission Labs

Lead Data Engineer - Python/Spark

Hyderabad, TS, IN


Summary

Role: Senior/Lead Data Engineer & AI

About Us

Headquartered in Sunnyvale, with offices in Dallas & Hyderabad, Fission Labs is a leading software development company, specializing in crafting flexible, agile, and scalable solutions that propel businesses forward. With a comprehensive range of services, including product development, cloud engineering, big data analytics, QA, DevOps consulting, and AI/ML solutions, we empower clients to achieve sustainable digital transformation that aligns seamlessly with their business goals.

Role Overview

A Senior/Lead Data Engineer oversees the design, development, and management of data infrastructure and pipelines within the organization. This role combines technical leadership, project management, and cross-team collaboration to ensure the efficient collection, storage, processing, and analysis of large datasets. The Lead Data Engineer typically manages a team of data engineers, architects, and analysts, ensuring that data workflows are scalable, reliable, and meet the business's requirements.

Responsibilities

  • Lead the design, development, and maintenance of data pipelines and ETL processes.
  • Architect and implement scalable data solutions using Databricks and AWS.
  • Optimize data storage and retrieval systems using Rockset, ClickHouse, and CrateDB.
  • Develop and maintain data APIs using FastAPI.
  • Orchestrate and automate data workflows using Airflow.
  • Collaborate with data scientists and analysts to support their data needs.
  • Ensure data quality, security, and compliance across all data systems.
  • Mentor junior data engineers and promote best practices in data engineering.
  • Evaluate and implement new data technologies to improve the data infrastructure.
  • Participate in cross-functional projects and provide technical leadership.
  • Manage and optimize data storage solutions using AWS S3, implementing best practices for data lakes and data warehouses.
  • Implement and manage Databricks Unity Catalog for centralized data governance and access control across the organization.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or related field
  • 5+ years of experience in data engineering, with at least 3 years in a lead role
  • Strong proficiency in Python, PySpark, and SQL
  • Extensive experience with Databricks and AWS cloud services
  • Hands-on experience with Airflow for workflow orchestration
  • Familiarity with FastAPI for building high-performance APIs
  • Experience with real-time analytics databases such as Rockset, ClickHouse, and CrateDB
  • Solid understanding of data modeling, data warehousing, and ETL processes
  • Experience with version control systems (e.g., Git) and CI/CD pipelines
  • Excellent problem-solving skills and ability to work in a fast-paced environment
  • Strong communication skills and ability to work effectively in cross-functional teams
  • Knowledge of data governance, security, and compliance best practices
  • Proficiency in designing and implementing data lake architectures using AWS S3
  • Experience with Databricks Unity Catalog or similar data governance and metadata management tools

Preferred Qualifications

  • Experience with real-time data processing and streaming technologies
  • Familiarity with machine learning workflows and MLOps
  • Databricks and/or AWS certifications
  • Experience implementing data mesh or data fabric architectures
  • Knowledge of data lineage and metadata management best practices

Tech Stack

  • Databricks, Python, PySpark, SQL, Airflow, FastAPI, AWS (S3, IAM, ECR, Lambda), Rockset, ClickHouse, CrateDB

(ref:hirist.tech)
