· Working knowledge of Hadoop architecture and ecosystem components such as HDFS, node configuration, YARN, MapReduce, Spark, Falcon, HBase, Hive, Pig, Sentry, and Ranger.
· Almost 5 years of experience as a Big Data Developer.
· Deep knowledge of incremental imports and of the partitioning and bucketing concepts in Hive and Spark SQL needed for optimization.
· Good problem-solving and analytical skills, and a drive to innovate in order to perform better.
· Strong interpersonal, communication, and people skills for managing a team.
· Proficient in creating and managing Hive tables, including managed, external, and partitioned tables.
· Expertise in querying Hive tables using SQL-like syntax and performing data analysis using tools like Apache Spark.
· Familiarity with Hive query optimization techniques, such as subquery unnesting, predicate pushdown, and vectorization, and their impact on query performance and resource utilization.
· Expertise in using Spark RDD transformations and actions to process large-scale structured and unstructured data sets, including filtering, mapping, reducing, grouping, and aggregating data.
· Designed and implemented end-to-end data integration solutions using Sqoop for large-scale data migrations from on-premises databases to Hadoop clusters.
· Skilled in working with textual data formats in Spark, such as CSV, JSON, and XML, and with their serialization and deserialization using Spark DataFrames and RDDs.
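The RDD transformations and actions listed above (filtering, mapping, reducing, grouping, aggregating) follow a filter → map → reduceByKey pattern that can be sketched in plain Python. This is a minimal illustration of the pattern, not Spark itself, and the records and field names are invented for the example:

```python
from collections import defaultdict

# Illustrative records; in Spark these would come from an RDD or DataFrame.
records = [
    {"user": "a", "event": "click", "amount": 3},
    {"user": "b", "event": "view",  "amount": 1},
    {"user": "a", "event": "click", "amount": 2},
]

# filter: keep only click events (rdd.filter in Spark)
clicks = [r for r in records if r["event"] == "click"]

# map: project each record to a (key, value) pair (rdd.map)
pairs = [(r["user"], r["amount"]) for r in clicks]

# reduceByKey: sum amounts per user (rdd.reduceByKey)
totals = defaultdict(int)
for user, amount in pairs:
    totals[user] += amount

print(dict(totals))  # per-user click totals
```

In Spark the same pipeline would be chained lazily as `rdd.filter(...).map(...).reduceByKey(...)`, with the work distributed across executors rather than run in a single loop.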
Responsibilities
· Designed, developed, and maintained ETL (Extract, Transform, Load) pipelines for efficient data integration.
· Built and optimized SQL queries, stored procedures, and database functions to improve data processing efficiency.
· Developed and maintained data models, ensuring database normalization and performance optimization.
Database Management & Optimization
· Managed relational and NoSQL databases, ensuring data integrity and performance.
· Implemented indexing, partitioning, and query tuning techniques to optimize database performance.
· Ensured data security and compliance with industry regulations and best practices.
Data Processing & Transformation
· Worked with structured and unstructured data to create meaningful insights.
· Designed and implemented Spark jobs for natural language understanding.
· Collaborated with data governance teams for data cataloging.
· Developed custom Spark solutions for fraud prevention.
· Utilized Spark for time-series forecasting.
· Conducted Spark job performance tuning.
· Worked with Spark for market basket analysis.
· Created Spark applications for click fraud detection.
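The extract-transform-load work described above can be sketched in plain Python. A production pipeline would read from and write to external systems (databases, HDFS, Spark), but the three stages are the same; the CSV data and column names here are invented for illustration:

```python
import csv
import io

# Illustrative raw input; a real pipeline would pull this from a source system.
RAW = """order_id,amount,status
1,120.50,complete
2,-15.00,complete
3,80.00,cancelled
"""

def extract(raw: str):
    """Extract: parse raw CSV into dict records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Transform: keep completed orders with a positive amount, with typed fields."""
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
        if r["status"] == "complete" and float(r["amount"]) > 0
    ]

def load(rows, target: list):
    """Load: append cleaned records to an in-memory 'sink' standing in for a table."""
    target.extend(rows)
    return target

sink = []
load(transform(extract(RAW)), sink)
print(sink)  # cleaned, completed orders only
```

Keeping each stage a separate function makes the pipeline easy to test in isolation and to re-point at a different source or sink.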
Qualifications
· B.Sc. in Computer Science (55%), Barkatulla University, Bhopal (M.P.), 2018