-
How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30%
Read on to find out how Grab's Performance Marketing team used automation to improve conversion rates.
-
One Small Step Closer to Containerising Service Binaries
Learn how Grab is investigating and reducing service binary size for Golang projects.
-
Serving Driver-partners Data at Scale Using Mirror Cache
Find out how a team at Grab used Mirror Cache, an in-memory local caching solution, to serve driver-partners data efficiently.
-
Trident - Real-time Event Processing at Scale
Find out where the messages and rewards that arrive on your Grab app come from. Walk through the scaling and processing optimisations that achieve tremendous throughput.
-
Pharos - Searching Nearby Drivers on Road Network at Scale
Learn how Grab stores driver locations and how these locations are used to find nearby drivers around you.
-
How Grab is Blazing Through the Super App Bazel Migration
Learn how we planned and started migrating our super app to Bazel at Grab.
-
Democratizing Fare Storage at Scale Using Event Sourcing
Read how we built Grab's single source of truth for fare storage and management. In this post, we explain how we used the Event Sourcing pattern to build our fare data store.
-
Keeping 170 Libraries Up to Date on a Large Scale Android App
Learn how we maintain our libraries and prevent defect leaks in our Grab Passenger app.
-
Optimally Scaling Kafka Consumer Applications
Read this deep dive on our Kubernetes infrastructure setup for Grab's stream processing framework.
-
Our Journey to Continuous Delivery at Grab (Part 1)
Continuous Delivery is the principle of delivering software often, every day. Read more to find out how we implemented continuous delivery at Grab.
-
Uncovering the Truth Behind Lua and Redis Data Consistency
Redis does not guarantee consistency between a master and its replica nodes when Lua scripts are used. Read more to find out why and how to guarantee data consistency.
-
Securing and Managing Multi-cloud Presto Clusters with Grab’s DataGateway
This blog post discusses how Grab's DataGateway plays a key role in supporting hundreds of users across our entire Presto ecosystem, from managing user access to cluster selection, workload distribution, and more.
-
Go Modules - A Guide for Monorepos (Part 2)
This is the second post in the Go module series, which highlights Grab’s experience working with Go modules in a multi-module monorepo. Here, we discuss additional solutions for addressing dependency issues and cover automatic upgrades.
-
The Journey of Deploying Apache Airflow at Grab
This blog post shares how we designed and implemented an Apache Airflow-based scheduling and orchestration platform for teams across Grab.
-
How We Built Our In-house Chat Platform for the Web
This blog post shares our learnings from building our very own chat platform for the web.
-
Go Modules - A Guide for Monorepos (Part 1)
This post is the first in a series of blogs about Grab’s experience with Go modules in a multi-module monorepo. Here, we discuss the challenges we faced along the way and the solutions we came up with.
-
Tackling UI Test Execution Time Imbalance for Xcode Parallel Testing
This blog post introduces how we use Xcode parallel testing to balance test execution time and improve the parallelism of our systems. We also share how we overcame a challenge that prevented us from running the tests efficiently.
-
Returning 575 Terabytes of Storage Space to Our Users
This blog explains how we measured and reduced our app's storage footprint on user devices.
-
Grab-Posisi - Southeast Asia’s First Comprehensive GPS Trajectory Dataset
This blog highlights Grab's latest GPS trajectory dataset: its content, format, applications, and how you can access the dataset for your research.
-
How We Prevented App Performance Degradation from Sudden Ride Demand Spikes
This blog addresses how engineers overcame the challenges Grab faced in its early days due to sudden spikes in ride demand.
-
Plumbing At Scale
This article details our journey building and deploying an event sourcing platform in Go, building a stream processing framework over it, and then scaling it (reliably and efficiently) to service over 300 billion events a week.
-
Journey to a Faster Everyday Super App Where Every Millisecond Counts
This post narrates the journey of our performance improvement efforts on the Grab passenger app. It highlights how we were able to reduce the time spent starting the app by more than 60%, while preventing regressions introduced by new features.
-
Marionette - Enabling E2E User-scenario Simulation
Do you know how we get early feedback on any breaking changes? Read through our blog to find out how Marionette, an in-house simulation platform, detects breaking changes in booking workflows. It even generates resources for running simulations and facilitates the testing of microservices powering our Driver and Passenger apps.
-
How We Implemented Domain-Driven Development in Golang
Are you curious how we quickly enabled our partners to self-serve using our platform? Have you wondered how some teams at Grab implemented domain-driven development while using Golang? Read this blog post to find out more.
-
Griffin, an Anti-fraud Risk Rule Engine Making Billions of Predictions Daily
This blog highlights Grab’s high-performance risk rule engine that automates the creation of rules to detect fraudulent activities with minimal effort from engineers.
-
Using Grab’s Trust Counter Service to Detect Fraud Successfully
This blog introduces Grab’s Trust Counter service for detecting fraud. It explains how the solution was designed so that different stakeholders like data analysts and data scientists can use the Counter service without any manual intervention from engineers. The Counter service provides a reliable data feed to the data science world.
-
Being a Principal Engineer at Grab
Curious about what a Principal Engineer role at Grab entails? Our Principal Engineers' responsibilities range from solving complex problems and taking care of the system-level architecture to collaborating with cross-functional teams, providing mentorship, and more.
-
Data First, SLA Always
Introducing Trailblazer, the Data Engineering team’s solution to implementing change data capture of all upstream databases. In this article, we explain why we needed to move away from periodic batch ingestion towards a real-time solution and show how we achieved this through an end-to-end streaming pipeline.
-
How We Built a Logging Stack at Grab
This blog post explains what we did to solve our in-house logging problem around the lack of visualizations and metrics for our service logs.
-
Catwalk: Serving Machine Learning Models at Scale
This blog post explains why and how we came up with a machine learning model serving platform to accelerate the use of machine learning in Grab.
-
React Native in GrabPay
This blog post describes how we used React Native to optimize the Grab PAX app.
-
Preventing Pipeline Calls from Crashing Redis Clusters
This blog post describes Grab’s post-mortem findings for the outage caused by the Redis Cluster failure.
-
Loki, a Dynamic Mock Server for HTTP/TCP Testing
Read our blog to know how Loki, a dynamic mock server, makes local box testing of mobile apps easy, repeatable, and exhaustive. It supports both HTTP and TCP protocols and can provide dynamic runtime responses.
-
Designing Resilient Systems Beyond Retries (Part 3): Architecture Patterns and Chaos Engineering
This post is the third of a three-part series on going beyond retries and circuit breakers to improve system resiliency. This whole series covers techniques and architectures that can be used as part of a strategy to improve resiliency. In this article, we will focus on architecture patterns and chaos engineering to reduce and prevent failures and to test resiliency.
-
Designing Resilient Systems Beyond Retries (Part 2): Bulkheading, Load Balancing, and Fallbacks
This post is the second of a three-part series on going beyond retries to improve system resiliency. We’ve previously discussed rate-limiting as a strategy to improve resiliency. In this article, we will cover these techniques: bulkheading, load balancing, and fallbacks.
-
Designing Resilient Systems Beyond Retries (Part 1): Rate-Limiting
This post is the first of a three-part series on going beyond retries to improve system resiliency. In this series, we will discuss other techniques and architectures that can be used as part of a strategy to improve resiliency. To start off the series, we will cover rate-limiting.
-
Context Deadlines and How to Set Them
This blog post explains, from the ground up, a strategy for configuring timeouts and using context deadlines correctly, drawing from our experience developing microservices in a large-scale and often turbulent network environment.
-
Recipe for Building a Widget: How We Helped to “Peak-Shift” Demand by Helping Passengers Understand Travel Trends
We help to “peak-shift” demand by helping passengers understand travel trends with Grab’s data. Curious to know how we empower our passengers to make better travel decisions? Read on!
-
Structured Logging: The Best Friend You’ll Want When Things Go Wrong
This blog post describes how we built a structured logging framework that integrates well with our existing Elastic stack-based logging backend, allowing us to do logging better and more efficiently.
-
How We Simplified Our Data Ingestion & Transformation Process
This blog post describes how Grab built a scalable data ingestion system and how we went from prototyping with Spark Streaming to running a production-grade data processing cluster written in Golang.
-
A Lean and Scalable Data Pipeline to Capture Large Scale Events and Support Experimentation Platform
This blog post focuses on the lessons we learned while building our batch data pipeline.
-
Designing Resilient Systems: Circuit Breakers or Retries? (Part 2)
Grab designs fault-tolerant systems that can withstand failures, allowing us to continuously provide our customers with the many services they expect from us.
-
Querying Big Data in Real-time with Presto & Grab's TalariaDB
In this article, we focus on TalariaDB, a distributed, highly available, and low latency time-series database that stores real-time data such as logs, metrics, and click streams generated by mobile apps and backend services that use Grab's Experimentation Platform SDK. It "stalks" the real-time data feed and only keeps the last one hour of data.
-
Designing Resilient Systems: Circuit Breakers or Retries? (Part 1)
Grab designs fault-tolerant systems that can withstand failures, allowing us to continuously provide our customers with the many services they expect from us.
-
Orchestrating Chaos Using Grab's Experimentation Platform
At Grab, we practice chaos engineering by intentionally introducing failures in a service or component in the overall business flow. But the ‘failed’ service is not the experiment’s focus. We’re interested in testing the services dependent on that failed service.
-
Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab’s feature toggle SDK provides a dynamic feature toggle capability to our engineering, data, product, and even business teams. Feature toggles also let teams modify system behaviour without changing code. Developers use the feature flags to keep new features hidden until product and marketing teams are ready to share and to run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc.
-
Mockers - Overcoming Testing Challenges at Grab
Sustaining quality in fast-paced development is a challenge. At Grab, we use Mockers, a tool to expand the scope of local box testing. It helps us overcome testing challenges in a microservice architecture.
-
How We Designed the Quotas Microservice to Prevent Resource Abuse
Reliable, scalable, and high-performing solutions for common system-level issues are essential for microservice success, and there is a Grab-wide initiative to provide those common solutions. As an important component of the initiative, we wrote a microservice called Quotas, a highly scalable API request rate limiting solution to mitigate the problems of service abuse and cascading service failures.
-
Building Grab’s Experimentation Platform
At Grab, we continuously strive to improve the user experience of our app for both our passengers and driver partners. To do that, we’re constantly experimenting, and in fact, many of the improvements we roll out to the Grab app are a direct result of successful experiments.
-
Introducing Grab-Kit: Distributed Service Design at Grab
As we evolved from a single monolithic application to a microservices-based architecture, we were faced with a new challenge. How do we support exponential growth while maintaining consistency, coordination, and quality?
-
Deep Dive into Database Timeouts in Rails
Disaster strikes when you do not configure timeout values properly. In this post, we dive into the details of how timeouts work with Ruby on Rails and Databases.
-
Dealing with the Meltdown Patch at Grab
The Meltdown attack reported recently had far-reaching implications in terms of security as well as performance. This post is a quick rundown of what performance impacts we noted as well as how we went on to mitigate them.
-
The Art of Hiring Good Engineers
Hiring the first five good engineers in your team requires a different approach from hiring the first twenty good engineers. The approach to designing this process differs even more when you want to scale up to 100 engineers... or even to 300.
-
Migrating Existing Datastores
At Grab, we take pride in creating solutions that impact millions of people in Southeast Asia and, as they say, with great power comes great responsibility. As an app with 55 million downloads and 1.2 million drivers, it's our responsibility to keep our systems up and running. Any downtime causes drivers to miss earnings and passengers to miss their appointments.
-
So You Need to Hire Good Engineers
If you are in a fast growing tech startup, you're probably actively interviewing and hiring engineers to scale teams. My question to you is, what hiring strategy are you using when interviewing engineering warriors?
-
Come and #hackallthethings at Grab
For the longest time, security has been at the center of our priorities. There’s nothing more self-evident about the trust our millions of driving partners and customers put in Grab. We strive every day to build the best tools available to ensure their data stays secure.
-
How We Scaled Our Cache and Got a Good Night's Sleep
Caching is arguably the most important and widely used technique in the computer industry. From CPUs to Facebook live videos, caches are everywhere.
-
Grab's Front End Study Guide
Grab is Southeast Asia (SEA)’s leading transportation platform and our mission is to drive SEA forward, leveraging the latest technology and the talented people we have in the company. As of May 2017, we handle 2.3 million rides daily and we are growing and hiring at a rapid scale. To keep up with Grab’s phenomenal growth, our web team and web platforms have to grow as well. Fortunately, or unfortunately, at Grab, the web team has been keeping up with the latest best practices and has incorporated the modern JavaScript ecosystem in our web apps.
-
DNS Resolution in Go and Cgo
This article is part two of a two-part series. In this article, we will talk about RFC 6724 (3484), describe how DNS resolution works in Go and Cgo, and finally explain why disabling IPv6 also disables the sorting of IP addresses.
-
Driving Southeast Asia Forward with AWS
My name is Arul Kumaravel, VP of Engineering at Grab. Grab's mission is to drive Southeast Asia (SEA) forward. Today I would like to share with you how AWS is helping us with this mission.
-
Troubleshooting Unusual AWS ELB 5XX Error
This article is part one of a two-part series. In this article, we explain the ELB 5XX errors that we experienced without an apparent reason. We walk you through our investigative process and show you our immediate solution to this production issue. In the second article, we will explain why the non-intuitive immediate solution works and how we eventually found a more permanent solution.
-
Scaling Like a Boss with Presto
A year ago, the data volumes at Grab were much lower than the volume we currently use for data-driven analytics. We had a simple and robust infrastructure in place to gather, process and store data to be consumed by numerous downstream applications, while supporting the requirements for data science and analytics.
-
Deep Dive into iOS Automation at Grab - Continuous Delivery
This is the second part of our series "Deep Dive into iOS Automation at Grab", where we will cover how we manage continuous delivery. As a common solution to the Apple developer account's device whitelist limitation, we use an enterprise account to distribute beta apps internally. There are 4 build configurations per target.
-
Deep Dive into iOS Automation at Grab - Integration Testing
This is the first part of our series "Deep Dive Into iOS Automation At Grab", where we will cover testing automation in the iOS team. Over the past two years at Grab, the iOS passenger app team has grown from 3 engineers in Singapore to 20 globally. Back then, each one of us was busy shipping features and had no time to set up a proper automation process.
-
A Key Expired in Redis, You Won't Believe What Happened Next
One of Grab's more popular caching solutions is Redis (often in the flavour of the misleadingly named ElastiCache), and for most cases, it works. Except for that time it didn't. Follow our story as we investigate how Redis deals with consistency on key expiration.
-
How Grab Hires Engineers in Singapore
Working at Grab will be the “most challenging yet rewarding opportunity” any employee will ever encounter.
-
Battling with Tech Giants for the World's Best Talent
Grab steadily attracts a diverse set of engineers from around the world in its three R&D centres: Singapore, Seattle, and Beijing. Right now, half of Grab’s top leadership team is made up of women and we have attracted people from five continents to work together on solving the biggest challenges for Southeast Asia.
-
This Rocket Ain't Stopping - Achieving Zero Downtime for Rails to Golang API Migration
Grab has been transitioning from a Rails + NodeJS stack to a full Golang Service Oriented Architecture. To contribute to a single common code base, we wanted to transfer engineers working on the Rails server powering our passenger app APIs to other Go teams.
-
Round-robin in Distributed Systems
While working on Grab's Common Data Service (CDS), I needed to implement client-side load balancing between CDS clients and servers. However, I kept encountering persistent connection issues with Elastic Load Balancing (ELB).
-
Programmers Beware - UX is Not Just for Designers
Perhaps one of the biggest missed opportunities in Tech in recent history is UX. Somehow, UX became the domain of Product Designers and User Interface Designers. While they definitely are the right people to be thinking about web pages, mobile app screens and so on, we've missed a huge part of what we engineers work on every day: SDKs and APIs.
-
Grab You Some Post-Mortem Reports
Grab adopts a Service-Oriented Architecture (SOA) to rapidly develop and deploy new feature services. One of the drawbacks of such a design is that team members find it hard to help with debugging production issues that inevitably arise in services belonging to other stakeholders.
-
The Curious Case of the Phantom Instance
Here at the Grab Engineering team, we have built our entire backend stack on top of Amazon Web Services (AWS). Over time, it was inevitable that some habits would start to form in how we perceive our backend monitoring statistics.