Grab Tech

Engineering

The evolution of Grab's machine learning feature store

Learn how Grab is modernising its machine learning platform with a feature table-centric architecture powered by AWS Aurora for Postgres. This shift from a legacy feature fetching system to decentralised deployments enhances performance and user experience, while solving challenges like atomic updates and noisy neighbor issues.

Daniel Tai · Oscar Cassetti 24 Jul 2025 | 11 min read

AWS Database

Engineering
Grab's service mesh evolution: From Consul to Istio

When you're running 1000+ microservices across Southeast Asia's most complex transport and delivery platform, 'good enough' stops being good enough. Discover how Grab tackled the challenge of migrating from Consul to Istio across a hybrid infrastructure spanning AWS and GCP, separate AWS organizations, and diverse deployment models. This isn't your typical service mesh migration story. We share the real challenges of designing resilient architecture for massive scale, the unconventional decisions that paid off, and the lessons learned from coordinating migrations while keeping critical services like food delivery and ride-hailing running seamlessly. From evaluation criteria and architecture decisions to migration strategies and operational insights, get an inside look at how we're building the backbone of Grab's microservices future, one service at a time.

Hilman Kurniawan · Jay Chin · Shiyu Chen · Sok Ann Yap 16 Jul 2025 | 7 min read

AWS GCP Kubernetes Microservice Service mesh
Engineering
DispatchGym: Grab’s reinforcement learning research framework

DispatchGym is a research framework that supports reinforcement learning (RL) studies for dispatch systems, which match bookings with drivers. The framework is designed to be efficient, cost-effective, and accessible. This article outlines its principles, research benefits, and real-world applications.

Tan Sien Yi · Henokh Fibrianto · Larry Lin 7 Jul 2025 | 7 min read

Dispatch Python
Engineering
Counter Service: How we rewrote it in Rust

The Integrity Data Platform team at Grab rewrote a QPS-heavy Golang microservice in Rust, achieving 70% infrastructure savings while maintaining similar performance. This initiative explored the ROI of adopting Rust for production services, balancing efficiency gains against challenges like Rust’s steep learning curve and the risks of rewriting legacy systems. The blog delves into the selection process, approach, pitfalls, and the ultimate business value of the rewrite.

Jia Long Loh · Pu Li · Muqi Li · Md Riyadh 20 Jun 2025 | 13 min read

Data Database Rust
Engineering
The complete stream processing journey on FlinkSQL

Introducing the FlinkSQL interactive solution to enhance real-time stream processing exploration. The new system simplifies stream processing development, automates production workflows and democratises access to real-time insights. Read on about our journey, which began with addressing the challenges of the previous Zeppelin notebook-based solution and led to the current state of integration with and productionisation of FlinkSQL.

Calvin Tran · Shi Kai Ng 12 Jun 2025 | 8 min read

Database FlinkSQL
Engineering
Effortless enterprise authentication at Grab: Dex in action

This article outlines Grab's journey towards enabling a seamless single sign-on experience for its numerous internal applications. It addresses the challenges of fragmented authentication and authorisation systems and introduces Dex, an open-source federated OpenID Connect provider, as the chosen solution. The document details the implementation of Dex and its key features, and discusses future plans for a unified authorisation model.

Kah Wei Lee · Jack Wang · Weibin Wu · Jan Bissinger 23 May 2025 | 9 min read

Access control Engineering Security
Engineering
From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT

When Grab's Machine Learning team sought to automate support queries, a failed chatbot experiment sparked an unexpected pivot: GrabGPT. Born from the need to harness Large Language Models (LLMs) internally, this tool became a go-to resource for employees, offering private, auditable access to models like GPT and Gemini. The author shares his journey of turning failed experiments into strategic wins.

Wenbo Wei 19 May 2025 | 4 min read

AI Engineering Optimisation
Engineering · Data Analytics · Data Science
Streamlining RiskOps with the SOP agent framework

Discover how the SOP-driven Large Language Model (LLM) agent framework is revolutionising Risk Operations (RiskOps) by automating Account Takeover (ATO) investigations. Explore the potential of this transformative tool to unlock unprecedented levels of productivity and innovation across industries.

Fujiao Liu · Haitao Bao · Jia Chen · Meichen Lu · Muqi Li 8 May 2025 | 5 min read

Engineering Experiment Generative AI LLM Machine learning
