Develop and implement quantitative software applications to process and analyze real-time financial market data in a high-performance computing environment
Maintain and optimize existing software applications, and recommend and implement improvements
Design and develop solutions using Python and PySpark in a Spark/Hadoop environment
Build, optimize, and troubleshoot ETL workflows to extract, transform, and load data from diverse sources into big data platforms (a minimal PySpark sketch follows this list)
Collaborate with data engineers, data scientists, and business stakeholders to gather requirements and deliver data-driven solutions
Ensure data quality, consistency, and integrity throughout the data lifecycle
Monitor, debug, and optimize data processing jobs for performance and scalability
Document technical solutions, workflows, and best practices for knowledge sharing
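As a minimal illustration of the PySpark ETL work described above, the sketch below reads a raw file, applies a simple transformation, and writes partitioned Parquet. The paths, column names, and app name are hypothetical placeholders, not details from this role.

```python
# Minimal PySpark ETL sketch. All paths and column names below are
# hypothetical placeholders chosen for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Extract: read raw trade data from a (hypothetical) CSV landing zone.
raw = spark.read.csv("/data/landing/trades.csv", header=True, inferSchema=True)

# Transform: drop malformed rows and derive a notional-value column.
clean = (
    raw.dropna(subset=["symbol", "price", "quantity"])
       .withColumn("notional", F.col("price") * F.col("quantity"))
)

# Load: write the result as a partitioned Parquet table on the platform.
clean.write.mode("overwrite").partitionBy("symbol").parquet("/data/curated/trades")

spark.stop()
```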
Qualifications and Skills:
Must-have skills –
Proven experience in automation development with Python and PySpark. Excellent coding skills and ability to write stable, maintainable, and reusable code.
Strong knowledge of data manipulation and visualization tools (e.g., Pandas, Matplotlib, Seaborn); see the sketch after this list.
Familiarity with the Linux/OS X command line and version control software (Git).
Strong understanding of big data ecosystems (e.g., Apache Spark, Hadoop) and distributed computing.
Experience with SQL/NoSQL databases such as MySQL and MongoDB.
Good understanding of RESTful APIs.
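As a small illustration of the data manipulation and visualization skills listed above, the sketch below uses Pandas and Matplotlib on synthetic data; the column names and values are invented for illustration.

```python
# Small Pandas/Matplotlib sketch on synthetic data; the column names
# and prices are invented placeholders, not data from this role.
import pandas as pd
import matplotlib.pyplot as plt

# Build a toy DataFrame of daily closing prices.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "close": [101.2, 102.5, 101.8, 103.1, 104.0],
})

# Manipulation: derive a daily-return column from the price series.
df["return"] = df["close"].pct_change()

# Visualization: plot the closing-price series.
df.plot(x="date", y="close", title="Toy closing prices")
plt.tight_layout()
plt.show()
```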
Good-to-have skills –
Understanding of statistics, e.g., hypothesis formulation, hypothesis testing, descriptive analysis, and data exploration.
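As a toy illustration of hypothesis formulation, testing, and descriptive exploration, the sketch below runs a two-sample t-test on synthetic data with SciPy; the groups, means, and sample sizes are all invented.

```python
# Toy hypothesis-test sketch on synthetic data; every number here is invented.
# H0: the two samples share the same mean; H1: the means differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)   # baseline group
sample_b = rng.normal(loc=0.2, scale=1.0, size=200)   # treatment group

# Descriptive exploration: summarize each sample before testing.
print("mean A:", sample_a.mean(), "mean B:", sample_b.mean())

# Two-sample t-test; reject H0 at the 5% level if p < 0.05.
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```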