WorkHQ

Lead Data Engineer

Los Angeles, CA, US

15 days ago

Summary

WorkHQ is an all-in-one recruiting platform that provides:

  • A database of 100M US professionals
  • Email and phone number lookup
  • Email outreach and sequencing
  • An applicant tracking system

Recruiting well can set a company up for long-term success, while poor recruiting can set it up for failure. We are on a bold mission to replace the current jumble of multiple expensive, confusing systems with a single platform at an affordable price.

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

This is a remote, US-only role (mainland US).

Role Overview

As our lead data infrastructure architect, you will manage billions of data points across 250M+ professional profiles.

You will hire data engineers to aid you in that journey.

Core Responsibilities

  • Design scalable data pipelines processing massive record volumes
  • Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
  • Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
  • Integrate new data sources into the main pipeline
  • Implement advanced data matching using Splink
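
Splink implements the Fellegi-Sunter model of probabilistic record linkage. As a rough illustration of the scoring idea only (this is not Splink's API, and the field names and m/u probabilities below are made up), each compared field contributes a log-likelihood-ratio weight depending on whether it agrees between two records:

```python
from math import log2

# Toy Fellegi-Sunter parameters (illustrative values, not from any real model):
# m = P(field agrees | records are a match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "name":  {"m": 0.90, "u": 0.01},
    "email": {"m": 0.95, "u": 0.0001},
    "city":  {"m": 0.80, "u": 0.05},
}

def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of per-field log2 likelihood ratios; higher means more likely a match."""
    total = 0.0
    for field, p in FIELD_PARAMS.items():
        if rec_a.get(field) == rec_b.get(field):
            total += log2(p["m"] / p["u"])          # agreement adds evidence for a match
        else:
            total += log2((1 - p["m"]) / (1 - p["u"]))  # disagreement adds evidence against
    return total

a = {"name": "Jane Doe", "email": "jane@x.com", "city": "LA"}
b = {"name": "Jane Doe", "email": "jane@x.com", "city": "SF"}
c = {"name": "John Roe", "email": "j@y.com", "city": "NY"}

print(match_weight(a, b) > match_weight(a, c))  # → True
```

In practice Splink estimates the m/u probabilities from the data (e.g. via EM) and uses blocking rules to avoid comparing all pairs across 250M+ profiles; the sketch above only shows why agreeing on a rare field like email moves the score far more than agreeing on a common one.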

Technical Requirements

  • 5-8 years professional data engineering experience
  • Good proficiency in:
    • PySpark and distributed computing
    • AWS data services (EMR, Glue, Athena)
    • Docker
    • Pandas and DataFrame manipulation
    • Complex data format handling (JSONL, Parquet)
  • Strong background in:
    • Big data processing architectures
    • Data warehouse design
    • Performance optimization
  • Advanced Python, SQL skills
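
The "complex data format handling" item above includes JSONL, which is simply one JSON object per line. A minimal stdlib sketch (the `read_jsonl` helper is illustrative, not from any listed library; the production pipeline would use PySpark or Pandas readers instead):

```python
import io
import json

def read_jsonl(stream):
    """Yield one parsed record per non-blank line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# Simulate a JSONL file with an in-memory stream.
raw = io.StringIO('{"id": 1, "name": "Jane"}\n\n{"id": 2, "name": "John"}\n')
records = list(read_jsonl(raw))
print(records[0]["name"])  # → Jane
```

Streaming line by line keeps memory flat regardless of file size, which is why JSONL (unlike a single large JSON array) splits cleanly across Spark partitions.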

Nice to Have

  • Probabilistic record linking expertise
  • OpenSearch/Elasticsearch technologies
  • Machine learning data pipeline design
  • Recruitment tech ecosystem knowledge

Technical Stack

  • Big Data: PySpark, EMR
  • Databases: Postgres, OpenSearch
  • Cloud: AWS
  • Containerization: Docker
  • Data Formats: JSONL, Parquet
  • Analytics: Metabase, Athena, Glue
  • Data Processing: Pandas, Splink

Other Considerations

While this role has specific requirements, if you lack a few of the technical skills but are motivated to learn and lead the platform, please apply for consideration.

If you are coming from a Director, Head of, or VP-level role that is relevant to this job, you are welcome to apply as well.

You will need to apply directly on our platform.

Thank you for your time.
