Globant

Data Engineer Lead - Vietnam

Hanoi, Hanoi, VN


Summary

We are a digitally native company that helps organizations reinvent themselves and unleash their potential. We are the place where innovation, design, and engineering meet scale. Globant is a 21-year-old, NYSE-listed public organization with more than 29,000 employees working out of 33 countries.

www.globant.com

Our expertise in Data & AI allows our studio to create a wide variety of end-to-end solutions for industries including finance, travel, media & entertainment, retail, and health, among others. We democratize data and foster organizational change towards a data-driven culture.

Location: Vietnam (HN/DN - Hybrid, HCM - Remote)

Experience: 7+ years

YOU WILL GET THE CHANCE TO:

As a Data Engineer Lead, you will be domain agnostic; you'll be able to work across domains such as finance, pharmaceuticals, media and entertainment, manufacturing, and hospitality. You will get exposure to the following:

  • Lead a data engineering team spread across multiple locations.
  • Work on a variety of projects: build ELT/ETL jobs, write complex SQL queries, and create analytical tables to meet the requirements of the Data Science team (see the sketch after this list).
  • Contribute to innovative accelerators and develop tools, methodologies, and frameworks that can be used to accelerate the development of data science solutions.
  • Build and maintain data pipelines that support data movement, deployment, and monitoring, ensuring that they meet the highest standards for quality and reliability.
  • Gain hands-on experience with Google Cloud Platform, including BigQuery, GCS, Dataflow, Cloud Functions, Cloud Composer, and Cloud Scheduler.
  • Design and develop POCs/POVs for existing customers and prospective leads, preserving the knowledge gained through this research and innovation and using it to enhance the overall capability of the Artificial Intelligence Studio.
  • Proactively interact with clients and make key technical decisions on design and architecture. Establish and maintain relationships with clients, acting as a trusted advisor and identifying opportunities for new or expanded business.
  • Agree on scope, priorities, and deadlines with the project managers.
  • Describe problems, provide solutions, and communicate clearly and accurately.
  • Assure the overall technical quality of the solution.
  • Estimate the time of development tasks and perform difficult/critical coding tasks.
  • Define metrics and set objectives across multiple complex tasks.
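
For illustration only, here is a minimal Python sketch of the kind of ELT step described above: running a SQL transformation and materializing an analytical table in BigQuery with the google-cloud-bigquery client. The project, dataset, table, and query are hypothetical placeholders, not part of the role description.

from google.cloud import bigquery

# Hypothetical project/dataset/table names; assumes Application Default Credentials.
client = bigquery.Client(project="example-project")

# Example transformation: aggregate raw events into a daily analytical table.
sql = """
    SELECT user_id,
           DATE(event_ts) AS event_date,
           COUNT(*)       AS events
    FROM `example-project.raw.events`
    GROUP BY user_id, event_date
"""

job_config = bigquery.QueryJobConfig(
    destination="example-project.analytics.daily_user_events",
    write_disposition="WRITE_TRUNCATE",  # rebuild the analytical table on each run
)

query_job = client.query(sql, job_config=job_config)
query_job.result()  # block until the query job finishes
print("Rebuilt", query_job.destination)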


WHAT WILL HELP YOU SUCCEED?

  • Good communication, client-facing, and stakeholder management skills
  • Experience in leading/managing a team
  • A strong base in SQL along with data warehousing concepts is mandatory, plus strong experience with advanced data pipelines and data analysis techniques. You should be well versed in dbt (Data Build Tool) as a data transformation tool on BigQuery.
  • An expert professional with a strong drive to learn and explore new methodologies. You must work well in a collaborative environment, contribute innovative ideas to continuously improve solutions and models, and be willing to research newer areas, technologies, and algorithms.


Core Technical Skills

  • Programming: Strong programming skills in Python for data manipulation and pipeline development, and expertise in SQL for data querying and analysis.
  • Data Pipeline Architecture: Designing, building, and maintaining robust, scalable, and fault-tolerant data pipelines to handle near real-time and batch data processing.
  • Cloud Platforms: Deep understanding of GCP services like Cloud Storage, BigQuery, Pub/Sub, Dataflow, Cloud Functions, Cloud Scheduler, and Cloud Composer.
  • Good to have: Google Cloud Certified Professional Data Engineer certification.
  • Data Processing Frameworks: Proficiency in Apache Beam, Spark, or similar frameworks for batch and stream processing (a minimal Beam sketch follows the skills lists below).
  • Data Modeling: Ability to design efficient data models for storing and querying video and image data.
  • Metadata Management: Understanding of metadata standards and tools for managing data lineage and quality.
  • Data Quality: Implementing data validation and cleaning processes.
  • Performance Optimization: Identifying and resolving performance bottlenecks in data pipelines.
  • Monitoring and Alerting: Setting up monitoring and alerting systems to track pipeline health and performance.
  • Basic knowledge of MLOps
  • Basic knowledge of Big Data and related tools from the Hadoop ecosystem.
  • Awareness of terminology such as SCD, Star/Galaxy schemas, Data Lake, and Data Warehouse.
  • Good to have an understanding of Data Governance and Data Security.
  • Good knowledge of code versioning and familiarity with Git and Azure DevOps are musts.


Specific Skills for Computer Vision

  • Video Data Understanding: Knowledge of video codecs, formats, and metadata.
  • Data Synchronization: Understanding techniques for synchronizing data from multiple sources.
  • Time Series Data: Experience with handling and processing time-series data.
  • Data Labeling: Familiarity with data labeling processes and tools.
  • Machine Learning Collaboration: Ability to work closely with data scientists to understand their data needs and provide necessary support.
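
As a rough illustration of the Apache Beam / Dataflow streaming pattern referenced in the skills above (Pub/Sub in, BigQuery out), here is a minimal Python sketch. The topic, table, and field names are hypothetical, and it assumes the apache-beam[gcp] package and suitable GCP credentials; it is a sketch, not a prescribed implementation.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# streaming=True because the source is Pub/Sub; add DataflowRunner options to run on Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/example-project/topics/video-events")  # hypothetical topic
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:analytics.video_events",  # hypothetical table
            schema="video_id:STRING,event_type:STRING,event_ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )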


Tools and Technologies

  • GCP Services:
      • Cloud Storage for storing video data
      • BigQuery for data warehousing
      • Pub/Sub for real-time messaging
      • Dataflow for batch and stream processing
      • Cloud Functions for event-driven functions
      • Cloud Scheduler for job scheduling
      • Cloud Composer for workflow orchestration (see the DAG sketch after this list)
      • Cloud Monitoring for monitoring and alerting
  • Data Processing Frameworks: Apache Beam or Spark for data processing.
  • Version Control: Git for code management.
  • CI/CD: Tools like Cloud Build for continuous integration and deployment; we use Azure DevOps.
  • Data Visualization: Tools like Tableau/Looker/Data Studio for data exploration and visualization.
  • Cloud-based Video Processing Tools: Consider using GCP's Video Intelligence API or other specialized tools for video analysis.
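
For orientation only, a minimal Cloud Composer (Airflow) DAG sketch that schedules a daily BigQuery job. The DAG id, schedule, SQL, and table names are hypothetical, and the operator import assumes the apache-airflow-providers-google package available in Composer environments.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_video_metrics",      # hypothetical DAG name
    schedule_interval="0 2 * * *",     # 02:00 daily ("schedule" in newer Airflow releases)
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    build_metrics = BigQueryInsertJobOperator(
        task_id="build_daily_metrics",
        configuration={
            "query": {
                "query": (
                    "CREATE OR REPLACE TABLE `example-project.analytics.daily_metrics` AS "
                    "SELECT video_id, COUNT(*) AS views "
                    "FROM `example-project.analytics.video_events` "
                    "GROUP BY video_id"
                ),
                "useLegacySql": False,
            }
        },
    )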


Additional Considerations

  • Robustness and Fault Tolerance: Implement error handling, retries, and backup mechanisms (see the sketch after this list).
  • Scalability: Design pipelines to handle increasing data volumes and processing needs.
  • Security: Ensure data privacy and security by following GCP security best practices.
  • Cost Optimization: Optimize resource utilization and costs.
  • Documentation: Maintain clear and up-to-date documentation of the data pipeline.
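
As a small illustration of the error-handling and retry point above, here is a generic Python sketch of retrying a pipeline step with exponential backoff and jitter; the flaky_step function is a hypothetical placeholder for a real pipeline step.

import logging
import random
import time

def run_with_retries(step, max_attempts=5, base_delay=1.0):
    """Run a pipeline step, retrying with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                logging.exception("Step failed after %d attempts", attempt)
                raise  # surface the failure so monitoring/alerting can catch it
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
            logging.warning("Attempt %d failed, retrying in %.1fs", attempt, delay)
            time.sleep(delay)

def flaky_step():
    # Hypothetical stand-in for a real step, e.g. loading a file into BigQuery.
    if random.random() < 0.5:
        raise RuntimeError("transient error")
    return "ok"

print(run_with_retries(flaky_step))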
