Operate Global Data Platform components (VM servers, Kubernetes, Kafka) and applications (Apache stack, Collibra, Dataiku, and similar).
Implement automation of infrastructure, security components, and Continuous Integration & Continuous Delivery (CI/CD) for optimal execution of data pipelines (ELT/ETL).
You have 5+ years of experience building or designing large-scale, fault-tolerant distributed systems (for example: data lakes, delta lakes, data meshes, data lakehouses, data platforms, data streaming solutions).
In-depth knowledge of and experience with one or more large-scale distributed technologies, including but not limited to the Hadoop ecosystem, Kafka, Kubernetes, and Spark
Experience migrating storage technologies, e.g. HDFS to S3 object storage (see the migration sketch after this list)
Integration of streaming and file-based data ingestion/consumption (Kafka, Control-M, AWA); a streaming consumer sketch follows this list
Experience in DevOps, data pipeline development, and automation using Jenkins and Octopus (optional: Ansible, Chef, XL Release, and XL Deploy)
Expert in Python and Java, or another language such as Scala or R; Linux/Unix scripting, Jinja templates, Puppet scripts, and firewall configuration rule setup (a templating sketch follows this list)
VM setup and scaling, Kubernetes (K8s) scaling of pods, managing Docker images with Harbor, and pushing images through CI/CD (a scaling sketch follows this list)
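
A minimal sketch of the storage migration work named above (HDFS to S3 object storage), using PySpark. The paths, bucket name, and Parquet format are illustrative assumptions rather than details from this posting, and the cluster is assumed to already have the hadoop-aws (s3a) connector and credentials configured.

```python
# Illustrative only: copy a Parquet dataset from HDFS to S3 object storage with PySpark.
# Paths and bucket are hypothetical; assumes the hadoop-aws (s3a) connector is configured.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-to-s3-migration").getOrCreate()

# Read the existing dataset from HDFS.
df = spark.read.parquet("hdfs:///data/warehouse/events")

# Write it to S3 as-is; partitioning and compaction choices would be made
# per dataset in a real migration.
df.write.mode("overwrite").parquet("s3a://example-bucket/warehouse/events")

spark.stop()
```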
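A minimal sketch of the streaming side of the ingestion/consumption item, using the kafka-python client. The topic, broker address, consumer group, and the process_record() handler are hypothetical placeholders; in practice, file-based ingestion triggered by Control-M or AWA would typically feed the same downstream loader.

```python
# Illustrative only: consume records from a Kafka topic with kafka-python.
# Topic, broker, group id, and process_record() are hypothetical.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ingest.events",                    # hypothetical topic
    bootstrap_servers=["broker:9092"],  # hypothetical broker address
    group_id="platform-ingest",
    auto_offset_reset="earliest",
)

def process_record(payload: bytes) -> None:
    # Placeholder for the downstream loader shared with file-based (batch) ingestion.
    print(len(payload))

for message in consumer:
    process_record(message.value)
```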
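A minimal sketch of configuration templating with Jinja, as listed in the skills item; the template text and variable values are invented for illustration.

```python
# Illustrative only: render a small config fragment from a Jinja template.
# The template contents and values are invented examples.
from jinja2 import Template

template = Template(
    "listen {{ port }}\n"
    "allow {{ cidr }}\n"
)

print(template.render(port=8443, cidr="10.0.0.0/8"))
```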
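A minimal sketch of K8s scaling driven from automation code, using the official Kubernetes Python client; the deployment name, namespace, and replica count are assumptions for illustration.

```python
# Illustrative only: scale a Deployment with the official Kubernetes Python client.
# Deployment name, namespace, and replica count are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside a pod
apps = client.AppsV1Api()

# Patch the replica count, e.g. as part of an automated scaling job.
apps.patch_namespaced_deployment_scale(
    name="data-ingest",          # hypothetical deployment
    namespace="data-platform",   # hypothetical namespace
    body={"spec": {"replicas": 5}},
)
```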