Job Title: AI Quality Engineer
Location: Muscat, Oman (Onsite)
About Prana Tree LLC:
Prana Tree LLC is an innovative IT consulting firm specializing in building next-generation business applications. We are committed to leveraging cutting-edge technologies to develop scalable, high-performance solutions for our clients across industries such as eCommerce, manufacturing, and supply chain. Join us in a dynamic, fast-paced start-up environment where your ideas and skills make a real impact.
About the Role:
We are seeking a highly skilled AI Quality Engineer to support our National Large Language Model (LLM) Project, with a specific focus on Arabic language processing. This pivotal role involves establishing and enforcing comprehensive data quality frameworks, evaluation methodologies, and quality gates throughout the LLM development lifecycle. You will ensure our Arabic LLM achieves world-class performance, reliability, and cultural relevance before deployment to 20,000 government employees.
Key Responsibilities:
- Design and implement data quality frameworks for LLM training and evaluation.
- Establish and enforce quality gates across all phases: data preparation, model training, evaluation, and RAG implementation.
- Define acceptance criteria and secure stakeholder sign-offs at each quality gate.
- Develop annotation quality metrics targeting >90% inter-annotator agreement and >95% contextual/cultural accuracy.
- Implement QA processes for text normalization, diacritics standardization, and dialect mapping.
- Optimize tokenization to achieve >98% vocabulary coverage and >95% morphological accuracy.
- Develop robust RAG (Retrieval-Augmented Generation) quality measurement frameworks.
- Build real-time dashboards for automated quality monitoring and alerts.
- Define and execute testing protocols for model evaluation across diverse NLP tasks.
- Establish regression testing frameworks to safeguard model quality during updates.
- Design processes for bias detection and mitigation in training data and outputs.
- Benchmark model performance against global LLM standards.
- Lead human evaluation campaigns to assess qualitative aspects of model responses.
- Collaborate with annotation teams to ensure accurate, high-quality ground-truth datasets.
- Maintain thorough documentation on QA processes, metrics, guidelines, and acceptance criteria.
- Participate in weekly quality governance meetings and RAG evaluation reviews.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or a related field.
- 4+ years of experience in AI/ML quality assurance, with an emphasis on NLP and LLMs.
- Proven experience with LLM evaluation frameworks and benchmarking techniques.
- Expertise in setting quality gates and defining acceptance criteria for AI systems.
- Strong understanding of NLP challenges and Semitic language intricacies.
- Experience with multi-level annotation reviews and metrics-driven QA.
- Skilled in RAG system evaluation (retrieval and generation components).
- Proficient in Python and testing/QA libraries for machine learning.
- Background in statistical analysis, human evaluation, and continuous quality tracking.
- Understanding of fairness, bias detection, and responsible AI practices.
Preferred Qualifications:
- Experience in government or enterprise-scale LLM evaluations.
- Knowledge of NLP benchmarks and datasets.
- Experience using platforms like Scale AI, Humanloop, or similar tools.
- Familiarity with hallucination detection, prompt quality assessment, and factual consistency validation.
- Knowledge of ROUGE, BLEU, BERTScore, and custom evaluation metrics.
- MLOps experience, including CI/CD quality gates and A/B testing frameworks.
- Background in security, privacy, and UX testing for AI systems.
- Participation in quality governance boards or national standards bodies.
What We Offer:
- Opportunity to contribute to a nationally significant and socially impactful AI project.
- Competitive compensation package.
- Collaboration with leading researchers and engineers in LLM development.
- A fast-paced, innovation-driven environment that values ownership and growth.
- Professional development in cutting-edge AI quality assurance.