This role is primarily a PySpark ETL and SQL Developer position; candidates need to be very strong in PySpark (a minimal ETL sketch follows the qualifications list below).
Everything below is good to have.
5-6 years of development experience with Oracle and the Hadoop Big Data platform on Data Warehousing and/or Data Integration projects in an agile environment.
Understanding of both business requirements and technical aspects.
Good knowledge of Big Data, Hadoop, Hive, the Impala database, data security, and dimensional model design.
Strong knowledge of analyzing data in a data warehouse environment with Cloudera Big Data technologies (Hadoop, MapReduce, Sqoop, PySpark, Spark, HDFS, Hive, Impala, StreamSets, Kudu, Oozie, Hue, Kafka, YARN, Python, Flume, ZooKeeper, Sentry, Cloudera Navigator) and Oracle SQL/PL-SQL.
Strong knowledge of writing complex SQL queries against both Oracle and Hadoop (Hive/Impala, etc.).
Ability to analyze log and error files to diagnose data ingestion failures.
Experience writing Python and Impala scripts.
Knowledge of tokenization or data masking (a minimal masking sketch also follows this list).
Experience working in the Medicaid and healthcare domain is preferred.
Participate in team activities, design discussions, stand-ups, and sprint planning and execution meetings with the team.
Perform data analysis, data profiling, and data quality assessment across various layers using Hadoop/Hive/Impala and Oracle SQL queries.
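For context on the kind of work described above, here is a minimal PySpark ETL sketch, assuming a Hive-enabled Cloudera cluster; the table and column names (claims_raw, claims_curated, member_id, claim_amount, service_date) are illustrative, not taken from this posting:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled session; assumes the cluster's Hive metastore is configured.
spark = (
    SparkSession.builder
    .appName("claims_etl_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# A "complex SQL" example: latest claim per member via a window function.
# The same SQL runs largely unchanged in Hive or Impala.
curated = spark.sql("""
    SELECT member_id, claim_id, claim_amount, service_date
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (PARTITION BY member_id
                                  ORDER BY service_date DESC) AS rn
        FROM claims_raw c
    ) t
    WHERE rn = 1
""")

# Light data-quality check: row count and null rate on a key column.
total = curated.count()
nulls = curated.filter(F.col("claim_amount").isNull()).count()
print(f"rows={total}, null_claim_amount_rate={nulls / max(total, 1):.2%}")

# Land the result in a curated Hive table, partitioned for query pruning.
(curated.write
    .mode("overwrite")
    .partitionBy("service_date")
    .saveAsTable("claims_curated"))
```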
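Likewise, for the tokenization/data-masking item, a minimal sketch of column masking in PySpark; the salted SHA-256 hash shown here is one common approach, and the members data is invented for illustration (production tokenization usually involves a token vault or format-preserving encryption instead):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("masking_sketch").getOrCreate()

# Illustrative data; a real job would read from a Hive/Impala table.
members = spark.createDataFrame(
    [("m1", "123-45-6789")], ["member_id", "ssn"]
)

SALT = "example-salt"  # illustrative only; real salts belong in a secret store

# Irreversibly mask the PII column with a salted SHA-256 hash,
# keeping only the last four characters for human readability.
masked = (
    members
    .withColumn("ssn_token", F.sha2(F.concat(F.lit(SALT), F.col("ssn")), 256))
    .withColumn("ssn_last4", F.substring("ssn", -4, 4))
    .drop("ssn")
)
masked.show(truncate=False)
```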
Job Type: Contract
Pay: $60.00 - $80.00 per hour
Benefits:
Dental insurance
Health insurance
Vision insurance
Schedule:
8 hour shift
Monday to Friday
Experience:
PySpark: 4 years (Required)
SQL: 2 years (Required)
Data warehouse: 1 year (Required)
ETL: 2 years (Required)
Big Data: 3 years (Required)
Willingness to travel:
100% (Required)
Work Location: On the road