WorkHQ is an all-in-one recruiting platform that provides:
1. A database of 100M US professionals
2. Email and phone number lookup
3. Email outreach and sequencing
4. An applicant tracking system
Recruiting well can set a company up for long-term success, while poor recruiting can set it up for failure. We are on a bold mission to replace the current jumble of expensive, confusing systems with a single platform at an affordable price.
Company Context
Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.
This is a US-only, fully remote role (mainland US).
Role Overview
Lead data infrastructure architect managing billions of data points across 250M+ professional profiles.
You will hire data engineers to support you on that journey.
Core Responsibilities
- Design scalable data pipelines processing massive record volumes
- Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
- Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
- Integrate new data sources into the main pipeline
- Implement advanced data matching using Splink
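To illustrate the last responsibility: Splink implements Fellegi-Sunter-style probabilistic record linkage, where each compared field contributes a log likelihood ratio to a match score. The sketch below shows that scoring idea in plain Python; the field names and m/u probabilities are illustrative assumptions (in practice Splink estimates them from the data, e.g. via EM training), not values from our pipeline.

```python
import math

# Illustrative m/u probabilities per field:
#   m = P(field agrees | records are a match)
#   u = P(field agrees | records are not a match)
WEIGHTS = {
    "first_name": (0.90, 0.010),
    "last_name": (0.95, 0.005),
    "email_domain": (0.80, 0.050),
}

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 likelihood ratios over the compared fields."""
    score = 0.0
    for field, (m, u) in WEIGHTS.items():
        if rec_a.get(field) == rec_b.get(field):
            score += math.log2(m / u)       # agreement adds evidence for a match
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return score

a = {"first_name": "jane", "last_name": "doe", "email_domain": "acme.com"}
b = {"first_name": "jane", "last_name": "doe", "email_domain": "acme.com"}
c = {"first_name": "john", "last_name": "roe", "email_domain": "other.com"}

print(match_score(a, b) > match_score(a, c))  # True: full agreement scores higher
```

Splink runs this kind of comparison at scale on a Spark or DuckDB backend, with blocking rules to avoid comparing all record pairs.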
Technical Requirements
- 5-8 years professional data engineering experience
- Proficiency in:
- PySpark and distributed computing
- AWS data services (EMR, Glue, Athena)
- Docker
- Pandas and DataFrame manipulation
- Complex data format handling (JSONL, Parquet)
- Strong background in:
- Big data processing architectures
- Data warehouse design
- Performance optimization
- Advanced Python, SQL skills
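As a concrete example of the format handling listed above: JSONL stores one JSON object per line, which is how profile records are commonly staged before conversion to columnar formats like Parquet. A minimal stdlib sketch (the sample records are hypothetical):

```python
import io
import json

# Hypothetical profile records in JSONL: one JSON object per line.
raw = io.StringIO(
    '{"id": 1, "name": "Jane Doe", "emails": ["jane@acme.com"]}\n'
    '{"id": 2, "name": "John Roe", "emails": []}\n'
)

def read_jsonl(fh):
    """Yield one parsed record per non-empty line."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)

records = list(read_jsonl(raw))
print([r["id"] for r in records])  # [1, 2]
```

At billions-of-records scale the same line-oriented structure is what lets Spark split JSONL inputs cleanly across executors.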
Nice to Have
- Probabilistic record linking expertise
- OpenSearch/Elasticsearch technologies
- Machine learning data pipeline design
- Recruitment tech ecosystem knowledge
Technical Stack
- Big Data: PySpark, EMR
- Databases: Postgres, OpenSearch
- Cloud: AWS
- Containerization: Docker
- Data Formats: JSONL, Parquet
- Analytics: Metabase, Athena, Glue
- Data Processing: Pandas, Splink
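The medallion architecture mentioned under Core Responsibilities layers data as bronze (raw), silver (cleaned), and gold (aggregated). The sketch below shows that flow with plain in-memory Python; the records are made up for illustration, and in production each layer would be a PySpark job writing Parquet tables.

```python
# Bronze: raw, messy records as ingested.
bronze = [
    {"email": " Jane@Acme.com ", "source": "crawl"},
    {"email": "jane@acme.com", "source": "partner"},
    {"email": None, "source": "crawl"},
]

def to_silver(rows):
    """Silver layer: normalize emails, drop invalid rows, deduplicate."""
    seen, out = set(), []
    for r in rows:
        email = (r["email"] or "").strip().lower()
        if email and email not in seen:
            seen.add(email)
            out.append({"email": email, "source": r["source"]})
    return out

def to_gold(rows):
    """Gold layer: aggregate record counts per source for analytics."""
    counts = {}
    for r in rows:
        counts[r["source"]] = counts.get(r["source"], 0) + 1
    return counts

silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)  # 1 {'crawl': 1}
```

The same layering is what lets downstream stores (Postgres, Athena, OpenSearch) each consume the layer suited to their workload.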
Other Considerations
While this role has specific requirements, if you lack a few of the technical skills but are motivated to learn and lead the platform, please apply for consideration.
If you come from a relevant Director, Head of, or VP role, you are welcome to apply as well.
You will need to apply directly on our platform.
Thank you for your time.
Join a well-funded US startup in HRTech with the opportunity to lead data infrastructure projects remotely. Work with cutting-edge technologies and a talented team in a dynamic environment.