WorkHQ

Lead Data Engineer

Los Angeles, CA, US

15 days ago

Summary

WorkHQ is an all-in-one recruiting platform that provides:

  • A database of 100M US professionals
  • Email and phone number lookup
  • Email outreach and sequencing
  • An applicant tracking system

Recruiting well can set a company up for long-term success, while poor recruiting can set it up for failure. We are on a bold mission to replace the current jumble of multiple expensive, confusing systems with a single platform at an affordable price.

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

This is a remote, US-only role (mainland US).

Role Overview

As our lead data infrastructure architect, you will manage billions of data points across 250M+ professional profiles.

You will hire data engineers to aid you in that journey.

Core Responsibilities

  • Design scalable data pipelines processing massive record volumes
  • Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
  • Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
  • Integrate new data sources into the main pipeline
  • Implement advanced data matching using Splink
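
Splink implements the Fellegi-Sunter model of probabilistic record linkage. As a rough illustration of the scoring idea only (this is not Splink's API, and the field names and m/u probabilities below are made up), each compared field contributes a log-likelihood-ratio weight depending on whether it agrees between two records:

```python
from math import log2

# Toy Fellegi-Sunter parameters (illustrative values, not from any real model):
# m = P(field agrees | records are a match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "name":  {"m": 0.90, "u": 0.01},
    "email": {"m": 0.95, "u": 0.0001},
    "city":  {"m": 0.80, "u": 0.05},
}

def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of per-field log2 likelihood ratios; higher means more likely a match."""
    total = 0.0
    for field, p in FIELD_PARAMS.items():
        if rec_a.get(field) == rec_b.get(field):
            total += log2(p["m"] / p["u"])          # agreement adds evidence for a match
        else:
            total += log2((1 - p["m"]) / (1 - p["u"]))  # disagreement adds evidence against
    return total

a = {"name": "Jane Doe", "email": "jane@x.com", "city": "LA"}
b = {"name": "Jane Doe", "email": "jane@x.com", "city": "SF"}
c = {"name": "John Roe", "email": "j@y.com", "city": "NY"}

print(match_weight(a, b) > match_weight(a, c))  # → True
```

In practice Splink estimates the m/u probabilities from the data (e.g. via EM) and uses blocking rules to avoid comparing all pairs across 250M+ profiles; the sketch above only shows why agreeing on a rare field like email moves the score far more than agreeing on a common one.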

Technical Requirements

  • 5-8 years professional data engineering experience
  • Good proficiency in:
    • PySpark and distributed computing
    • AWS data services (EMR, Glue, Athena)
    • Docker
    • Pandas and DataFrame manipulation
    • Complex data format handling (JSONL, Parquet)
  • Strong background in:
    • Big data processing architectures
    • Data warehouse design
    • Performance optimization
  • Advanced Python, SQL skills
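
The "complex data format handling" item above includes JSONL, which is simply one JSON object per line. A minimal stdlib sketch (the `read_jsonl` helper is illustrative, not from any listed library; the production pipeline would use PySpark or Pandas readers instead):

```python
import io
import json

def read_jsonl(stream):
    """Yield one parsed record per non-blank line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# Simulate a JSONL file with an in-memory stream.
raw = io.StringIO('{"id": 1, "name": "Jane"}\n\n{"id": 2, "name": "John"}\n')
records = list(read_jsonl(raw))
print(records[0]["name"])  # → Jane
```

Streaming line by line keeps memory flat regardless of file size, which is why JSONL (unlike a single large JSON array) splits cleanly across Spark partitions.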

Nice to Have

  • Probabilistic record linking expertise
  • OpenSearch/Elasticsearch technologies
  • Machine learning data pipeline design
  • Recruitment tech ecosystem knowledge

Technical Stack

  • Big Data: PySpark, EMR
  • Databases: Postgres, OpenSearch
  • Cloud: AWS
  • Containerization: Docker
  • Data Formats: JSONL, Parquet
  • Analytics: Metabase, Athena, Glue
  • Data Processing: Pandas, Splink

Other Considerations

While this role has specific requirements, if you lack a few of the technical skills but are motivated to learn and lead the platform, please apply for consideration.

If you are coming from a Director, Head of, or VP-level role that is relevant to this job, you are welcome to apply as well.

You will need to apply directly on our platform.

Thank you for your time.
