Must be local to TX
Must Have:
Gen AI
LLM
Python
Spark
PySpark
Hadoop
Description:
Integral member of our Data Engineering team, responsible for the design and development of Big Data solutions
Partner with domain experts, product managers, analysts, and data scientists to develop Big Data pipelines in Hadoop or Snowflake
Responsible for delivering a data-as-a-service framework
Responsible for moving all legacy workloads to the cloud platform
Work with data scientists to build client pipelines using heterogeneous sources and provide engineering services for data science applications
Ensure automation through CI/CD across platforms, both in the cloud and on-premises
Research and assess open-source technologies and components, and recommend and integrate them into the design and implementation
Serve as the technical expert and mentor other team members on Big Data and cloud tech stacks
Define needs around maintainability, testability, performance, security, quality, and usability for the data platform
Drive the implementation of consistent patterns, reusable components, and coding standards for data engineering processes
Convert SAS-based pipelines into languages such as PySpark and Scala to run on Hadoop and non-Hadoop ecosystems
Tune Big Data applications on Hadoop and non-Hadoop platforms for optimal performance
Evaluate new IT developments and evolving business requirements, and recommend appropriate system alternatives and/or enhancements to current systems by analyzing business processes, systems, and industry standards
Supervise day-to-day staff management issues, including resource management, work allocation, mentoring/coaching and other duties and functions as assigned
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency
Roles and Responsibilities:
Proficiency in Python, SQL, and AI/ML frameworks (Transformers, LangChain, OpenAI, Hugging Face, PyTorch, TensorFlow).
Strong understanding of AI cost optimization strategies, including serverless inference, model distillation, quantization, and GPU efficiency tuning.
Experience deploying AI models in cloud environments (AWS, Azure, GCP), including model orchestration and MLOps (Vertex AI, SageMaker, Azure ML).
Track record of deploying and scaling AI solutions in production, ensuring reliability, latency optimization, and cost-effective serving.
Strong analytical skills, problem-solving abilities, and experience working in cross-functional AI teams.
Excellent communication and stakeholder management skills, with the ability to align AI initiatives with business impact and scalability.
Hands-on experience with LLM fine-tuning, Retrieval-Augmented Generation (RAG), prompt engineering, and vector databases (Pinecone, Weaviate, FAISS, Milvus).
Skills:
3-5 years of experience:
4+ years of experience in Hadoop/Big Data technologies.
3+ years of experience in Spark.
2+ years of experience in Snowflake.
2+ years of experience developing data solutions on Google Cloud or AWS. Certifications preferred.
Hands-on experience with Python/PySpark/Scala and basic machine learning libraries is required.
Experience with containerization and related technologies (e.g. Docker, Kubernetes)
Experience with all aspects of DevOps (source control, continuous integration, deployments, etc.)
1 year of Hadoop administration experience preferred.
1+ year of SAS experience preferred.
Comprehensive knowledge of the principles of software engineering and data analytics
Advanced knowledge of the Hadoop ecosystem and Big Data technologies
Hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka)
Knowledge of agile (scrum) development methodology is a plus
Strong development/automation skills
Proficiency in Java or Python; prior Apache Beam/Spark experience is a plus.
System-level understanding: data structures, algorithms, distributed storage and compute
Can-do attitude toward solving complex business problems; good interpersonal and teamwork skills