Virtusa

Data Engineering Architect (ATC)

Bangalore Division, KA, IN


Summary

Experience

8+ years of experience in data engineering, particularly in cloud environments such as AWS.

Proficiency in PySpark for distributed data processing and transformation (a minimal example of such a job appears after this list).

Solid experience with AWS Glue for ETL jobs and managing data workflows.

Hands-on experience with AWS Data Pipeline (DPL) for workflow orchestration.

Strong experience with AWS services such as S3, Lambda, Redshift, RDS, and EC2.
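
To make the PySpark expectation concrete, here is a minimal sketch of the kind of batch job this role involves: reading raw CSV from S3, applying simple transformations, and writing partitioned Parquet back. The bucket paths and column names are illustrative placeholders, not details from this posting.

```python
# Minimal PySpark batch job: read raw CSV from S3, clean it, write Parquet.
# All bucket names, paths, and columns below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-cleanup").getOrCreate()

# Read raw CSV data from S3 (schema inference kept simple for the sketch).
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("s3://example-raw-bucket/orders/"))

# Basic transformation: drop rows missing a key, normalize a timestamp,
# and derive a partition column for efficient downstream reads.
cleaned = (orders
           .dropna(subset=["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts")))

# Write back to S3 as partitioned Parquet, a common data-lake layout.
(cleaned.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://example-curated-bucket/orders/"))

spark.stop()
```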

Technical Skills

Proficiency in Python and PySpark for data processing and transformation tasks.

Deep understanding of ETL concepts and best practices.

Familiarity with AWS Glue (ETL jobs, Data Catalog, and Crawlers); a skeleton Glue job is sketched after this list.

Experience building and maintaining data pipelines with AWS Data Pipeline or similar orchestration tools.

Familiarity with AWS S3 for data storage and management, including file formats (CSV, Parquet, Avro).

Strong knowledge of SQL for querying and manipulating relational and semi-structured data.

Experience with Data Warehousing and Big Data technologies, specifically within AWS.
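
As a reference point for the Glue items above, below is a skeleton of a standard AWS Glue PySpark job that reads a table registered in the Data Catalog (typically populated by a Crawler) and writes curated Parquet. The database, table, and bucket names are hypothetical.

```python
# Standard AWS Glue PySpark job skeleton. Runs inside the Glue environment,
# which provides the awsglue libraries. Names below are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read via the Data Catalog rather than raw paths, so schema changes
# discovered by the crawler flow through automatically.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_events")

# Drop obviously unusable records, then continue with plain Spark SQL APIs.
df = dyf.toDF().dropna(subset=["event_id"])

# Write curated output as Parquet for downstream Athena/Redshift consumers.
df.write.mode("overwrite").parquet("s3://example-curated-bucket/events/")

job.commit()
```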

Additional Skills

Experience with AWS Lambda for serverless data processing and orchestration (see the sketch following this list).

Understanding of AWS Redshift for data warehousing and analytics.

Familiarity with data lakes and Amazon EMR, and with Amazon Kinesis for streaming data processing.

Knowledge of data governance practices, including data lineage and auditing.

Familiarity with CI/CD pipelines and Git for version control.

Experience with Docker and containerization for building and deploying applications.
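
To illustrate the Lambda item above, here is a minimal sketch of a serverless handler triggered by S3 object-created events that forwards lightweight metadata to a queue for heavier downstream processing. The queue URL and resource names are placeholders, not details from this posting.

```python
# Hypothetical Lambda handler for S3 object-created notifications: it reads
# each new object's location and enqueues a small message so heavier work
# (e.g., a Glue job) can run asynchronously. Resource names are placeholders.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

def handler(event, context):
    # An S3 notification can batch multiple records per invocation.
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"processed": len(records)}
```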

Responsibilities

Design and Build Data Pipelines: Design, implement, and optimize data pipelines on AWS using PySpark, AWS Glue, and AWS Data Pipeline to automate data integration, transformation, and storage processes.

ETL Development: Develop and maintain Extract, Transform, and Load (ETL) processes using AWS Glue and PySpark to efficiently process large datasets.

Data Workflow Automation: Build and manage automated data workflows using AWS Data Pipeline, ensuring reliable scheduling, monitoring, and management of data jobs.

Data Integration: Work with different AWS data storage services (e.g., S3, Redshift, RDS) to ensure smooth integration and movement of data across platforms.

Optimization and Scaling: Optimize and scale data pipelines for high performance and cost efficiency, using AWS services such as Lambda, S3, and EC2; common PySpark tuning patterns are sketched below.
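
As an example of the optimization work named in the last item, a short PySpark sketch showing three common tuning patterns: partition pruning on read, a broadcast join against a small dimension table, and controlling the output file count. The paths, tables, and columns are again illustrative placeholders.

```python
# Common PySpark tuning patterns on AWS. All paths and columns are
# hypothetical; the techniques themselves are standard Spark practice.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-tuning").getOrCreate()

# Partition pruning: filtering on the partition column lets Spark skip
# whole S3 prefixes instead of scanning the full dataset.
events = (spark.read.parquet("s3://example-curated-bucket/events/")
          .where(F.col("order_date") == "2024-01-15"))

# Broadcast the small lookup table to avoid a shuffle-heavy join.
dim = spark.read.parquet("s3://example-curated-bucket/dim_products/")
joined = events.join(F.broadcast(dim), "product_id")

# Coalesce to a modest number of output files to keep S3 listings and
# downstream reads cheap.
joined.coalesce(16).write.mode("overwrite").parquet(
    "s3://example-curated-bucket/events_enriched/")
```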
