Active Top Secret/SCI Clearance with Polygraph (REQUIRED)
Are you passionate about harnessing data to solve some of the nation’s most critical challenges? Do you thrive on innovation, collaboration, and building resilient solutions in complex environments?
Join a high-impact team at the forefront of national security, where your work directly supports mission success. We're seeking a Data Engineer with a rare mix of curiosity, craftsmanship, and commitment to excellence. In this role, you'll design and optimize secure, scalable data pipelines while working alongside elite engineers, mission partners, and data experts to unlock actionable insights from diverse datasets.
Responsibilities
Engineer robust, secure, and scalable data pipelines using Apache Spark, Apache Hudi, AWS EMR, and Kubernetes
Maintain data provenance and access controls to ensure full lineage and auditability of mission-critical datasets
Clean, transform, and condition data using tools such as dbt, Apache NiFi, or Pandas
Build and orchestrate repeatable ETL workflows using Apache Airflow, Dagster, or Prefect
Develop API connectors for ingesting structured and unstructured data sources
Collaborate with data stewards, architects, and mission teams to align on data standards, quality, and integrity
Provide advanced database administration for Oracle, PostgreSQL, MongoDB, Elasticsearch, and others
Ingest and analyze streaming data using tools like Apache Kafka, AWS Kinesis, or Apache Flink
Perform real-time and batch processing on large datasets in secure cloud environments (e.g., AWS GovCloud, C2S)
Implement and monitor data quality and validation checks using tools such as Great Expectations or Deequ
Work across agile teams using DevSecOps practices to build resilient full-stack solutions with Python, Java, or Scala
Required Skills
Experience building and maintaining data pipelines using Apache Spark, Airflow, NiFi, or dbt
Proficiency in Python and SQL, plus at least one of Java or Scala
Strong understanding of cloud services (especially AWS and GovCloud), such as S3, EC2, Lambda, EMR, Glue, Redshift, or Snowflake
Hands-on experience with streaming frameworks such as Apache Kafka, Kafka Connect, or Flink
Familiarity with data lakehouse formats (e.g., Apache Hudi, Delta Lake, or Iceberg)
Experience with NoSQL and RDBMS technologies such as MongoDB, DynamoDB, PostgreSQL, or MySQL
Ability to implement and maintain data validation frameworks (e.g., Great Expectations, Deequ)
Comfortable working in Linux/Unix environments, using bash scripting, Git, and CI/CD tools
Knowledge of containerization and orchestration tools like Docker and Kubernetes
Collaborative mindset with experience working in Agile/Scrum environments using Jira, Confluence, and Git-based workflows