We are seeking an experienced Data Quality Analyst with more than 10 years of experience to join our innovative AI company, where high-quality data is the cornerstone of groundbreaking solutions. This role focuses on ensuring the integrity, accuracy, and reliability of data using SQL, Python, Great Expectations, and Delta Expectations to support AI-driven analytics and machine learning pipelines. The ideal candidate will design and implement robust data quality frameworks that deliver trustworthy data for AI applications in a dynamic, technology-forward environment.
Responsibilities
Develop and maintain data quality frameworks using Great Expectations and Delta Expectations to validate, monitor, and ensure the reliability of data in AI pipelines.
Write and optimize complex SQL queries to profile, analyze, and validate datasets, identifying data quality issues across AI and analytics workflows.
Leverage Python for scripting, automation, and integration of data quality checks into AI data pipelines and processes.
Collaborate with data engineers, AI scientists, and business stakeholders to define data quality requirements and metrics for AI-driven use cases.
Monitor and audit data pipelines in real time, ensuring compliance with the data governance and quality standards that are critical to AI model performance.
Identify, investigate, and resolve data anomalies, inconsistencies, and errors, providing actionable insights to improve data reliability.
Design and implement automated data quality tests and validation rules to support scalable AI data ecosystems.
Create comprehensive reports and dashboards to communicate data quality metrics and trends to technical and non-technical stakeholders.
Mentor junior analysts and promote best practices in data quality management, particularly for AI applications.
Stay current with advancements in data quality tools, AI data requirements, and industry standards, recommending improvements to enhance data trustworthiness.
Requirements
Essential Skills:
Technical
SQL: Advanced proficiency in writing complex queries for data profiling, validation, and anomaly detection in AI datasets.
Python: Strong expertise in scripting, automation, and building custom data quality checks for AI pipelines.
Great Expectations: Deep experience in implementing and managing data quality frameworks to validate and document datasets in AI environments.
Delta Expectations: Proficiency in using Delta Live Tables expectations to enforce data quality and integrity in Delta Lake-based AI workflows.
Comprehensive understanding of data quality principles, including completeness, accuracy, consistency, and timeliness.
Experience with cloud data platforms (e.g., Azure, Databricks, or AWS) for managing and validating large-scale AI datasets.
Knowledge of data governance, lineage, and metadata management to support AI-driven applications.
Personal
Analytical thinker with a sharp eye for identifying and resolving data quality issues in AI contexts.
Excellent communication skills to articulate data quality findings and recommendations to diverse stakeholders.
Detail-oriented with a commitment to upholding the highest standards of data integrity.
Collaborative mindset with a proactive approach to driving data quality excellence.
Ability to prioritize and manage multiple tasks in a fast-paced, innovation-driven environment.
Preferred Skills:
Technical
Familiarity with additional data quality tools (e.g., Apache Griffin, Deequ) for cross-tool adaptability.
Experience with cloud-native platforms like Azure Synapse Analytics, Databricks, or Snowflake for AI data validation.
Knowledge of PySpark or Scala for processing and validating large-scale datasets in AI pipelines.
Exposure to AI/ML data requirements, such as feature validation or drift detection, to ensure model-ready data.
Certifications in data quality, data engineering, or cloud technologies (e.g., Microsoft Certified: Azure Data Engineer, Databricks Certified Data Engineer).
Personal
Leadership potential to guide junior team members and shape data quality strategies.
Adaptability to embrace evolving AI and data quality technologies, refining processes as needed.
Strategic vision to align data quality frameworks with long-term AI and business objectives.
Passion for ensuring data trustworthiness to unlock AI innovation.
Other Relevant Information
Bachelor’s degree in Computer Science, Data Science, Information Systems, or a related field.
Advanced degrees or certifications in data quality, data engineering, or AI are highly valued.
Benefits
This role offers the flexibility of working remotely in India.
LeewayHertz is an equal opportunity employer and does not discriminate based on race, color, religion, sex, age, disability, national origin, sexual orientation, gender identity, or any other protected status. We encourage a diverse range of applicants.