Overview
Join TechX as we continue expanding our AI infrastructure team and delivering impactful GenAI-powered products for enterprise and industry clients.
We are looking for an experienced Platform Engineer to build and operate the core infrastructure that powers the safe, reliable, and efficient delivery of our GenAI solutions. This role is at the heart of how we scale AI applications in production environments: ensuring observability, automation, cost control, and compliance for our large language model (LLM) operations.
Note: This is not a prompt engineering or model tuning role. Instead, you will architect and manage the infrastructure that enables AI teams to operate Gemini Pro/Flash models at scale.
Key Responsibilities
Own the LLM Platform Architecture
Design platform components that abstract LLM APIs (e.g., Gemini) into a consistent, testable, and production-ready interface.
Handle retries, latency tracking, fallback switching, and configuration routing logic.
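The retry, latency-tracking, and fallback-switching responsibilities above can be sketched as a thin provider-agnostic wrapper. This is a minimal illustration, not TechX's actual platform code: `call_fn` is a hypothetical stand-in for a real Gemini SDK call, and the model names are examples.

```python
import time


class ModelClient:
    """Sketch of an LLM interface with retries and fallback switching.

    `call_fn` is a hypothetical callable (model, prompt) -> str that
    stands in for a real provider SDK call.
    """

    def __init__(self, call_fn, primary="gemini-pro", fallback="gemini-flash",
                 max_retries=3, backoff_s=0.5):
        self.call_fn = call_fn
        self.primary = primary
        self.fallback = fallback
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.last_latency_ms = None  # tracked per successful call

    def generate(self, prompt):
        # Try the primary model with exponential backoff, then fall back.
        for model in (self.primary, self.fallback):
            for attempt in range(self.max_retries):
                start = time.monotonic()
                try:
                    text = self.call_fn(model, prompt)
                    self.last_latency_ms = (time.monotonic() - start) * 1000
                    return {"model": model, "text": text}
                except Exception:
                    time.sleep(self.backoff_s * (2 ** attempt))
        raise RuntimeError("all models exhausted")
```

In a real deployment the routing decision (which fallback, how many retries) would come from versioned configuration rather than constructor defaults.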
Design Multi-Version Prompt Configuration Management
Manage prompt and parameter versions across deployments.
Track version statuses (active, canary, deprecated), maintain changelogs, and ensure rollback safety.
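The versioning workflow above (status tracking, changelogs, rollback safety) might look roughly like the following in-memory sketch. A production registry would be backed by Git or a config store; the class and field names here are illustrative assumptions, not an existing TechX API.

```python
from dataclasses import dataclass

VALID_STATUSES = {"active", "canary", "deprecated"}


@dataclass
class PromptVersion:
    version: str
    template: str
    status: str = "canary"


class PromptRegistry:
    """Illustrative registry tracking statuses, a changelog, and rollback."""

    def __init__(self):
        self.versions = {}
        self.changelog = []   # append-only audit trail
        self.active = None

    def register(self, pv):
        if pv.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {pv.status}")
        self.versions[pv.version] = pv
        self.changelog.append(("register", pv.version, None))

    def promote(self, version):
        prev = self.active
        self.versions[version].status = "active"
        if prev and prev != version:
            self.versions[prev].status = "deprecated"
        self.active = version
        self.changelog.append(("promote", version, prev))

    def rollback(self):
        # Restore the version that was active before the last promotion.
        for action, _, prev in reversed(self.changelog):
            if action == "promote" and prev:
                self.promote(prev)
                return prev
        raise RuntimeError("no previous version to roll back to")
```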
Build Observability & Cost Intelligence for Gemini Usage
Define structured logs and metrics for Gemini interactions.
Monitor latency, feedback scores, token usage, and cost estimates.
Develop dashboards and alerts to catch performance regressions or anomalies.
Enable Safe, Automated Rollbacks
Implement health scoring, statistical deviation logic, and automated rollback mechanisms.
Maintain robust audit logs, cooldown strategies, and "last known good" states.
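One way to combine the health scoring, statistical deviation logic, and cooldown strategy above is a simple z-score guard. This is a sketch under assumed thresholds, not the team's actual rollback policy:

```python
import statistics
import time


class RollbackGuard:
    """Trigger rollback when a health score deviates too far from baseline.

    A score more than `z_threshold` standard deviations from the baseline
    mean triggers a rollback, after which a cooldown window suppresses
    further triggers.
    """

    def __init__(self, baseline_scores, z_threshold=3.0, cooldown_s=300.0):
        self.mean = statistics.mean(baseline_scores)
        self.stdev = statistics.stdev(baseline_scores)
        self.z_threshold = z_threshold
        self.cooldown_s = cooldown_s
        self.last_rollback = None

    def check(self, score, now=None):
        now = time.monotonic() if now is None else now
        if self.last_rollback is not None and now - self.last_rollback < self.cooldown_s:
            return "cooldown"
        z = abs(score - self.mean) / self.stdev
        if z > self.z_threshold:
            self.last_rollback = now
            return "rollback"
        return "healthy"
```

A production version would also record each decision to an audit log and pin the "last known good" configuration before switching.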
Secure Integration & Configuration Safety
Manage API keys and configuration securely using GCP-native tools (Secret Manager, IAM).
Enforce log redaction and PII masking.
Design version-aware deployment hooks and readiness checks.
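The log redaction and PII masking mentioned above can be sketched with pattern-based scrubbing. These regexes are illustrative only; a production system would use a vetted tool such as Cloud DLP rather than hand-rolled patterns:

```python
import re

# Illustrative patterns only; real redaction should use a vetted DLP service.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"(?i)\b(api[_-]?key|token)\s*[:=]\s*\S+"), r"\1=<REDACTED>"),
]


def redact(text):
    """Replace known PII/secret patterns before a string reaches the logs."""
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text
```

The redaction step would sit in the logging pipeline, so raw prompts and responses are scrubbed before any record is persisted.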
Key Requirements
Must-Have Skills
- GCP + Gemini Integration: Proven experience integrating with Google Gemini APIs (Pro/Flash), with a deep understanding of request structures, cost models, latency behaviors, and operational best practices.
- Python Engineering: Strong Python backend development skills, particularly with asynchronous frameworks like FastAPI or similar, capable of building robust and scalable backend services.
- Observability Design: Expertise in designing structured logging and metrics for APIs, using formats like JSON or EMF, and implementing structured feedback tracking systems to ensure reliable monitoring and performance analysis.
- Prompt and Configuration Versioning: Hands-on experience working with version-controlled configuration systems or registries, such as YAML or JSON-based setups, GitOps workflows, or similar, to manage prompt versions and deployment safety.
- Automation and CLI Tooling: Ability to develop internal tooling and automation scripts (e.g., CLI tools for configuration management or rollback operations), including audit logging and safety mechanisms.
- Security and Compliance: Familiarity with GCP Identity and Access Management (IAM), secure API key handling, log masking and redaction strategies for PII, configuration gating, and readiness for audit compliance in production environments.
Extra / Nice-to-Have Skills
- Experience working with OpenAI, Claude, or AWS Bedrock (in addition to Gemini).
- Experience designing model abstraction layers or runtime LLM routing.
- Exposure to token cost modeling or billing/reporting APIs for LLMs.
- Familiarity with AI security best practices in cloud environments.
Collaboration Scope
- Work closely with Prompt Engineers to monitor version health and feedback.
- Partner with AI Architects to optimize Gemini performance and integration.
- Coordinate with Product & Operations on cost reporting, SLAs, and system health.
- Engage with the DevOps (AWS) Team on hybrid observability and CI/CD processes.
Experience Level
4–6+ years in backend engineering, platform engineering, or SRE roles.
Prior experience deploying and monitoring AI/ML workloads (GCP preferred; multi-cloud a plus).
Bonus: Direct hands-on usage of Gemini APIs or managing LLM configurations in production.
Why Join TechX?
Take ownership of Gemini observability and integration at scale.
Lead the GCP / Gemini-first strategy while collaborating across hybrid cloud environments.
Be part of a forward-thinking team building mission-critical GenAI platforms for regulated industries.
Competitive salary, modern engineering culture, and career growth opportunities.