DevOps Engineer
Job Description
Numrah is a fast-growing and innovative company that focuses on developing and publishing top-notch mobile apps and games. Our team is passionate about creating engaging experiences that captivate users worldwide. As we continue to expand our portfolio, we are seeking a skilled and motivated DevOps Engineer to join our talented team.
Numrah is seeking a highly experienced DevOps Engineer to join our dynamic team. The ideal candidate will have a strong background in Google Cloud Platform (GCP), Kubernetes, PostgreSQL, real-time application management, high-availability services, and site reliability engineering (SRE). This role is essential to our mission of building resilient, scalable, and efficient infrastructure for our applications, ensuring seamless performance and availability for our clients.
Responsibilities
Infrastructure Management: Design, implement, and manage scalable infrastructure on Google Cloud Platform (GCP) to support highly available and secure applications.
Kubernetes Orchestration: Manage, deploy, and scale containerized applications using Kubernetes, ensuring optimal resource usage and smooth operation in production environments.
Database Administration: Configure, monitor, and optimize PostgreSQL databases to ensure data integrity, high availability, and performance for real-time applications.
Site Reliability Engineering: Develop and implement SRE practices to monitor, automate, and improve system reliability, performance, and availability, aiming for zero downtime and high service availability.
Real-Time Application Support: Work with engineering teams to ensure that real-time applications run smoothly, with fast response times and minimal latency, focusing on performance tuning and troubleshooting as necessary.
CI/CD and Automation: Develop and maintain CI/CD pipelines for code deployment, infrastructure changes, and automated testing, focusing on reducing lead time and improving deployment speed.
Monitoring & Incident Management: Set up and manage monitoring, logging, and alerting systems for early detection of issues, and lead incident response and postmortem analysis to continuously improve the infrastructure.
Performance Optimization: Identify bottlenecks and optimize the performance of services, including the tuning of Kubernetes clusters, databases, and GCP resources.
Requirements
Experience: 2+ years of experience as a DevOps Engineer, Site Reliability Engineer, or in a similar role, with a strong focus on Google Cloud Platform.
Technical Expertise:
In-depth knowledge of Google Cloud Platform services, including Compute Engine, Cloud Storage, VPC, GKE, Cloud Pub/Sub, Cloud SQL, and BigQuery.
Proficiency in Kubernetes (preferably GKE), including deployment, scaling, and troubleshooting.
Strong experience with PostgreSQL administration, including clustering, replication, and backup strategies.
Background in supporting real-time applications and ensuring low-latency, high-performance environments.
Skills:
Advanced knowledge of CI/CD tools (such as Google Cloud Deploy, GitHub Actions, or equivalent) and Infrastructure as Code (IaC) using Terraform or equivalent.
Familiarity with observability tools such as Prometheus, Grafana, Stackdriver, and the ELK/EFK stack.
Proven experience implementing SRE principles and working with SLAs, SLIs, and SLOs to improve service reliability.
Soft Skills: Strong analytical skills, a collaborative approach to problem-solving, and excellent communication skills with both technical and non-technical stakeholders.
Compensation
Not disclosed.
Additional Information
Please let Numrah know that you found the position through Waivly Work as it supports us to be able to keep sharing exciting new positions.