Data Science

Engineering · Data Analytics · Data Science
Streamlining RiskOps with the SOP agent framework

Discover how the SOP-driven Large Language Model (LLM) agent framework is revolutionising Risk Operations (RiskOps) by automating Account Takeover (ATO) investigations. Explore the potential of this transformative tool to unlock unprecedented levels of productivity and innovation across industries.

Fujiao Liu · Haitao Bao · Jia Chen · Meichen Lu · Muqi Li 8 May 2025 | 5 min read

Engineering Experiment Generative AI LLM Machine learning
Engineering · Data Analytics · Data Science
Introducing the SOP-driven LLM agent frameworks

The SOP-driven Large Language Model (LLM) agent framework revolutionises enterprise AI by integrating Standard Operating Procedures (SOPs) to ensure reliable execution and boost productivity. Achieving over 99.8% accuracy, it offers versatile automation tools and app development, making AI solutions 10 times faster. It addresses LLM challenges by structuring SOPs as a tree, enabling intuitive workflow creation, and aims to transform enterprise operations and explore industry applications.

Fujiao Liu · Shuqi Wang · Wenhui Wu · Muqi Li · Jia Chen · Haitao Bao · Meichen Lu 25 Apr 2025 | 6 min read

Engineering Experiment Generative AI LLM Machine learning
Engineering · Data Science
Grab AI Gateway: Connecting Grabbers to multiple GenAI providers

GenAI has become integral to innovation, powering the next generation of AI-enabled applications. With easy integration with multiple AI providers, it brings cutting-edge technology to every user. This article explores why we need the GenAI Gateway, how it works, how it benefits users, and the challenges faced in GenAI.

Bjorn Jee · Daniel Tai · Siddharth Pandey · Wenbo Wei 19 Feb 2025 | 10 min read

Data Science Engineering Generative AI LLM Machine Learning Optimisation
Engineering · Data Science
How we seamlessly migrated high volume real-time streaming traffic from one service to another with zero data loss and duplication

In the world of high-volume data processing, migrating services without disruption is a formidable challenge. At Grab, we recently undertook this task by splitting one of our backend service's stream read and write functionalities into two separate services. Discover how we conducted this transition with zero data loss and duplication using a simple switchover strategy, along with rigorous validation mechanisms.

Md Riyadh · Jia Long Loh · Muqi Li · Pu Li 5 Dec 2024 | 4 min read

Data streaming Engineering Optimisation Real-time streaming Service
Engineering · Data Science
Supercharging LLM application development with LLM-Kit

Discover how Grab's LLM-Kit enhances AI app development by addressing scalability, security, and integration challenges. This article discusses the challenges faced in LLM app building, the solution, the architecture of the LLM-Kit as well as the future plans of the LLM-Kit.

Boon Zhan Chew · Kendrick Tan · Swati Joshi · Yu Jie Ang 29 Nov 2024 | 7 min read

Engineering Generative AI LLM Machine Learning
Engineering · Data Science
Metasense V2: Enhancing, improving and productionisation of LLM powered data governance

In the initial article, we explored the integration of Large Language Models (LLM) to automate metadata generation, addressing challenges like limited customisation and resource constraints. This integration enabled efficient column-level tag classifications and data sensitivity tiering. With the model initially scanning over 20,000 entries, we identified areas for improvement post-rollout. These advancements have significantly reduced manual workloads, increased accuracy, and bolstered trust in our data governance processes.

Nick Buhrer · Shreyas Parbat · Yucheng Zeng 14 Nov 2024 | 6 min read

Engineering Generative AI LLM Machine Learning
Engineering · Data Science
LLM-assisted vector similarity search

Vector similarity search has revolutionised data retrieval, particularly in the context of Retrieval-Augmented Generation in conjunction with advanced Large Language Models (LLMs). However, it sometimes falls short when dealing with complex or nuanced queries. In this post, we explore our experimentation with a simple yet effective approach to mitigate this shortcoming by combining the efficiency of vector similarity search with the contextual understanding of LLMs.

Md Riyadh · Muqi Li · Felix Haryanto Lie · Jia Long Loh · Haotian Mi · Sayam Bohra 23 Oct 2024 | 8 min read

Engineering Experiment Generative AI LLM Machine Learning
Engineering · Analytics · Data Science
Leveraging RAG-powered LLMs for analytical tasks

The emergence of Retrieval-Augmented Generation (RAG) has significantly revolutionised Large Language Models (LLMs), propelling them to unprecedented heights. This development prompts us to consider its integration into the field of Analytics. Explore how Grab harnesses this technology to optimise our analytics processes.

Edmund Hong · Yi Ni Ong 9 Oct 2024 | 7 min read

Engineering Experiment Generative AI LLM Machine learning
Engineering · Data Science
Evolution of Catwalk: Model serving platform at Grab

Read about the evolution of Catwalk, Grab's model serving platform, from its inception to its current state. Discover how it has evolved to meet the needs of Grab's growing machine learning model serving requirements.

Vishal Sharma · Wenbo Wei · Siddharth Pandey · Daniel Tai · Bjorn Jee 1 Oct 2024 | 8 min read

Data Science Docker Kubernetes Machine Learning Models TensorFlow
Engineering · Data Science
LLM-powered data classification for data entities at scale

With the advent of the Large Language Model (LLM), new possibilities dawned for metadata generation and sensitive data identification at Grab. This prompted the inception of our project, which aimed to integrate LLM classification into our existing data management service. Read on to find out how we transformed what used to be a tedious and painstaking process into a highly efficient system and how it has empowered teams across the organisation.

Hualin Liu · Stefan Jaro · Harvey Li · Jerome Tong · Andrew Lam · Chamal Sapumohotti · Feng Cheng · Aaqib Kufran 15 Jul 2024 | 11 min read

Data Generative AI Machine Learning
Data Science · Engineering · Security
Ensuring data reliability and observability in risk systems

As the amount of data Grab handles grows, there is an increased need for quick detection of data anomalies (incompleteness or inaccuracy), while keeping the data secure. Read this to learn how the Risk Data team utilised Flink and Datadog to enhance data observability within Grab’s services.

Yi Ni Ong · Kamesh Chandran · Jia Long Loh 23 Apr 2024 | 6 min read

Data observability Data reliability Data Science Risk Security
Engineering · Data Science
Grab Experiment Decision Engine - a Unified Toolkit for Experimentation

Explore how the GrabX Decision Engine, an integral part of Grab's Experimentation platform, streamlines the testing of thousands of experimental variants weekly. This blog delves into how this internally open-sourced package institutionalises best practices in experimental efficiency and analytics, thereby ensuring accurate and reliable conclusions from each experiment.

Ruike Zhang · Panos Mavrokonstantis 9 Apr 2024 | 15 min read

Data Science Econometrics Experiment Python Package Statistics
Engineering · Data Science
Iris - Turning observations into actionable insights for enhanced decision making

With cross-platform monitoring, a common problem is the difficulty in getting comprehensive and in-depth views on metrics, making it tough to see the big picture. Read to find out how the Data Tech team ideated Iris to turn observations into actionable insights for enhanced decision-making.

Huong Vuong · Hai Nam Cao 3 Apr 2024 | 19 min read

Analytics Data insights Decision making Metrics
Engineering · Data Science
Enabling near real-time data analytics on the data lake

As the data lake landscape matures over the years, it presents opportunities to unlock more business value from the data. This correlates with the increased demand for flexible ad-hoc usage of fresh data. This article explores how we implemented data ingestion in Hudi table formats using Flink to meet this business demand.

Shi Kai Ng · Shuguang Xiang 23 Feb 2024 | 8 min read

Data Analytics Kafka Real-Time Stream Processing
Engineering · Data Science
Kafka on Kubernetes: Reloaded for fault tolerance

Dive into this insightful post to explore how Coban, Grab's real-time data streaming platform, has drastically enhanced the fault tolerance on its Kafka on Kubernetes design, to ensure seamless operation even amid unexpected disruptions.

Fabrice Harbulot · Thang Le 26 Dec 2023 | 20 min read

AWS Data Streaming Kafka Kubernetes
Engineering · Data Science · Product
An elegant platform

Supporting real-time data streaming enables our internal users to build intelligent applications and services, a crucial aspect of continuously out-serving our community. Read this article to understand our journey of building a real-time data streaming platform from pure Infrastructure-as-Code towards a more sophisticated control plane, and the benefits of this solution.

Fabrice Harbulot · Minh Khoi Nguyen 30 Nov 2023 | 16 min read

Data Data streaming Platformisation Real-time streaming
Engineering · Data Science · Product
Road localisation in GrabMaps

With GrabMaps powering the Grab superapp, we have the opportunity to improve our services and enhance our map with hyperlocal data. No matter the use case, road localisation plays an important role in Grab’s map-making process. However, road localisation entails handling a substantial volume of data, making it a costly and time-consuming endeavour. In this article, we explore the strategies we have implemented to drive down costs and reduce processing times associated with road localisation.

Roxana Crisan 17 Nov 2023 | 13 min read

Big Data Data Data processing GrabMaps Hyperlocalisation Maps
Engineering · Data Science
Scaling marketing for merchants with targeted and intelligent promos

Apart from ensuring advertisements reach the right audience, it is also important to make promos by merchants more targeted and intelligent to help scale their marketing. With Grab’s innovative AI tool, merchants can boost sales while cutting costs. Dive into this game-changing tool that’s reshaping the future of marketing and find out how the Data Science team at Grab used automation and made promo assignments a more seamless and intelligent process.

Sharon Teng 11 Oct 2023 | 7 min read

Advertising Data Data science Marketing Scalability
Engineering · Data Science
Stepping up marketing for advertisers: Scalable lookalike audience

A key challenge in advertising is reaching the right audience who are most likely to use your product. Read this article to find out how the Data Science team improved advertising effectiveness by using lookalike audiences to identify individuals who share similar characteristics with an existing consumer base.

William Wu 22 Sep 2023 | 8 min read

Advertising Data Data science Lookalike audience Marketing Scalability
Engineering · Data Science · Product
Building hyperlocal GrabMaps

Being hyperlocal is a key advantage for GrabMaps. In this article we will explain what being hyperlocal means and how it helps GrabMaps bring value to our driver-partners and passengers through the Grab platform.

Adriana Lazar 30 Aug 2023 | 6 min read

Big Data Data Data processing GrabMaps hyperlocalisation Maps navigation
Data Science · Security
Unsupervised graph anomaly detection - Catching new fraudulent behaviours

As fraudsters continue to evolve, it becomes more challenging to automatically detect new fraudulent behaviours. At Grab, we are committed to continuously improving our security measures and ensuring our users are protected from fraudsters. Find out how Grab’s Data Science team designed a machine learning model that has the ability to discover new fraud patterns without the need for label supervision.

Rizal Fathony · Jenn Ng · Jia Chen 2 Aug 2023 | 8 min read

Anomaly detection Data science Fraud detection Graph networks Graph visualisation Graphs Machine learning Security
Engineering · Security · Data Science
Graph service platform

Graphs are powerful data representations that detect relationships and data linkages between devices and help reveal fraudulent or malicious users. Learn how GrabDefence built the graph service platform to help discover potentially malicious data linkages.

Wenxiang Lu · Bruce Li · Jacob Yu · Muqi Li · Jia Chen 5 Jan 2023 | 7 min read

Analytics Engineering Fraud detection Graph networks Graph visualisation Graphs Security
Engineering · Data Science · Security
Graph for fraud detection

Fraud detection has become increasingly important in a fast growing business as new fraud patterns arise when a business product is introduced. We need a sustainable framework to combat different types of fraud and prevent fraud from happening. Read and find out how we use graph-based models to protect our business from various known and unknown fraud risks.

Min Chen · Advitiya Vashist · Jenn Ng · Jia Chen 24 Nov 2022 | 9 min read

Analytics Data Science Fraud detection Graph networks Graph visualisation Graphs Security
Engineering · Data Science
Query expansion based on user behaviour

User behaviour data is a gold mine to gain insights about users and help us improve user experience. In this blog, we explore a query expansion framework based on user rewrite behaviour and how it improves user search experience and conversion.

Shuailong Liang · Weilun Wu · Yuan Meng · Simone Wong 16 Nov 2022 | 6 min read

Analytics Data Science
Engineering · Data Science · Security
Using mobile sensor data to encourage safer driving

Telematics is most commonly used to monitor vehicle movements and track driving safety, profiling, fleet optimisation and possible productivity improvements. Read this to find out more about how Grab uses telematics to encourage safer driving across our driver and delivery partner fleet.

Laiyi Lin 25 Oct 2022 | 12 min read

Analytics Data Science Driving patterns GPS Security
Engineering · Data Science
Automatic rule backtesting with large quantities of data

At Grab, real-time fraud detection is built on a rule engine. As data scientists and analysts, we need to analyse and simulate a rule on historical data to check the performance and accuracy of the rule. Backtesting, also known as Replay, enables analysts to either simulate newly invented rules or evaluate the performance of existing rules using past events ranging from days to months, significantly improving rule creation efficiency.

Chao Wang · Clemens Valiente · Jun Liu · Daniel Wang 8 Sep 2022 | 6 min read

Automation Backtesting Data science Testing
Engineering · Data Science
How we store and process millions of orders daily

The Grab Order Platform is a distributed system that processes millions of GrabFood or GrabMart orders every day. Learn about how the Grab order platform stores food order data to serve transactional (OLTP) and analytical (OLAP) queries.

Xi Chen · Siliang Cao 15 Aug 2022 | 9 min read

Database Distributed Systems Platform Storage
Engineering · Security · Data Science
Graph Networks - 10X investigation with Graph Visualisations

As fraud schemes get more complex, we need to stay one step ahead by improving fraud investigation methods. Read to find out more about graph visualisation, why we need it and how it helps with uncovering patterns and relationships.

Fujiao Liu · Shuqi Wang · Muqi Li · Jia Chen 30 Jun 2022 | 5 min read

Graph technology Graph visualisation Graphs concepts Security
Engineering · Security · Data Science
How facial recognition technology keeps you safe

Facial recognition technology has grown tremendously in recent years due to the rise of deep learning techniques and accelerated digital transformation. Read to find out more about facial recognition technology in Grab and the components that help keep you safe.

Kai Feng Tee · Wentao Xie 9 Jun 2022 | 12 min read

Facial recognition Security
Engineering · Data Science
Automated Experiment Analysis - Making experimental analysis scalable

Analysts and data scientists invest lots of time into creating trustworthy experiments, which are key to making sound decisions. Read to find out how Automated Experiment Analysis helps make experimental analysis more scalable.

Albert Cheng · Ankit Sinha · Saubhagya Awaneesh · Kenneth Rithvik · Ruike Zhang 30 May 2022 | 7 min read

Azure Databricks Experiment Experimental analysis
Engineering · Data Science
How telematics helps Grab to improve safety

Coupled with data science, telematics can help to detect traffic events such as harsh braking and unsafe lane changes so we can provide a safer experience for our users. Read on to find out more about the challenges faced and how we addressed them with telematics.

Wilson Burhan 24 Mar 2022 | 5 min read

Analytics Data Science Driving patterns Engineering Safety
Engineering · Data Science
Real-time data ingestion in Grab

When it comes to data ingestion, there are several prevailing issues that come to mind: data inconsistency, integrity and maintenance. Find out how the Caspian team leveraged real-time data ingestion to help address these pain points.

Shuguang Xiang · Irfan Hanif · Feng Cheng 14 Mar 2022 | 10 min read

Data ingestion Engineering
Data Science
Using real-world patterns to improve matching in theory and practice

Find out how real-world patterns can be used to improve algorithm performance when performing bipartite matching for passengers and driver-partners.

Tenindra Abeywickrama · Victor Liang 22 Nov 2021 | 12 min read

Data Science Research
Engineering · Data Science
Securing and Managing Multi-cloud Presto Clusters with Grab’s DataGateway

This blog post discusses how Grab's DataGateway plays a key role in supporting hundreds of users across our entire Presto ecosystem, from managing user access to cluster selection, workload distribution, and more.

Vinnson Lee 24 Aug 2020 | 10 min read

Access Control Cluster Data Data Pipeline Engineering Presto Workload Distribution
Engineering · Data Science
The Journey of Deploying Apache Airflow at Grab

This blog post shares how we designed and implemented an Apache Airflow-based scheduling and orchestration platform for teams across Grab.

Chandulal Kavar 14 Jul 2020 | 11 min read

Airflow Data Pipeline Engineering Kubernetes Platform Scheduling
Data Science
Does Southeast Asia Run on Coffee?

This blog post shares insights on GrabFood data around how much our fellow Southeast Asians love coffee.

Siu Sing Lai · Lara PuReum Yim 26 Mar 2020 | 5 min read

Data Data Analytics Data Visualisation
Data Science
GrabChat Much? Talk Data to Me!

This blog post uncovers some interesting insights from our GrabChat data in Singapore, Malaysia, and Indonesia.

Jason Lee Jie Shien · Lara PuReum Yim 24 Mar 2020 | 7 min read

Data Data Analytics Data Visualisation
Data Science
7 Fun Facts about Grab’s Driver-Partners in Singapore

This blog post shares the most interesting data points from 2019 about our Singapore driver-partners.

Lara PuReum Yim · Chong You Zhen · Sze Han Ong · Michael Chirico · Kelly Kuo · Kenny Chan 20 Mar 2020 | 4 min read

Data Data Analytics
Data Science · Engineering
Data First, SLA Always

Introducing Trailblazer, the Data Engineering team’s solution for implementing change data capture of all upstream databases. In this article, we explain why we needed to move away from periodic batch ingestion towards a real-time solution and show how we achieved this through an end-to-end streaming pipeline.

Johan Kok · Pramiti Goel · Feng Cheng · Irfan Hanif · Deepak Barr 1 Aug 2019 | 11 min read

Data Pipeline
Data Science · Product
Making Grab’s Everyday App Super

To excel in a heavily diversified market like Southeast Asia, we leverage the depth of our data to understand what sorts of information users want to see on our Feed and when they should see it. In this article, we discuss Grab Feed’s recommendation logic and strategies, as well as its future roadmap.

Justin Bolilia · Romain Basseville · Karen Kue 3 Jul 2019 | 7 min read

Data Science Feed Machine Learning Recommendations Superapp
Engineering · Data Science
Catwalk: Serving Machine Learning Models at Scale

This blog post explains why and how we came up with a machine learning model serving platform to accelerate the use of machine learning in Grab.

Nutdanai Phansooksai · Juho Lee · Pratyush More · Romain Basseville 2 Jul 2019 | 12 min read

Data Science Machine Learning Models TensorFlow
Data Science
Tourists on GrabChat!

Just over two years ago, we introduced GrabChat, Southeast Asia’s first-of-its-kind in-app messaging platform. Since then we’ve added all sorts of useful features to it such as auto-translated messages, the ability to send photos, and even voice messages! It’s been a great tool to facilitate smoother communications between our driver-partners and our passengers, and one group in particular has found it incredibly useful: tourists!

Lara PuReum Yim · Dustin Chung 22 May 2019 | 7 min read

Analytics Data Data Analytics
Data Science
Bubble Tea Craze on GrabFood!

Bubble Tea’s popularity on GrabFood has captured our attention and we want to celebrate its fascinating growth with you! We have deep-dived into Grab’s Big Data to find the most interesting bubbles of treasures that can excite your palate. Hopefully these insights can help you understand what’s behind the Bubble Tea craze on GrabFood in Southeast Asia!

Lara PuReum Yim · MingXuan Lee 9 May 2019 | 4 min read

Analytics Data Data Analytics
Data Science
How We Harnessed the Wisdom of Crowds to Improve Restaurant Location Accuracy

We questioned some of the estimates that our algorithm for calculating restaurant wait times was making, and found that the "errors" were actually useful to discover restaurants whose locations had been incorrectly registered in our system. By combining such error signals across multiple orders, we were able to identify correct restaurant locations and amend them to improve the experience for our consumers.

Pravin Kakar 2 Apr 2019 | 8 min read

Data Science
Data Science · Engineering · Product · Design
Recipe for Building a Widget: How We Helped to “Peak-Shift” Demand by Helping Passengers Understand Travel Trends

We help to “peak-shift” demand by helping passengers understand travel trends with Grab’s data. Curious to know how we empower our passengers to make better travel decisions? Read on!

Lara PuReum Yim · Prashant Kumar · Raghav Garg · Preeti Kotamarthi · Ajmal Afif · Calvin Ng Tjioe · Renrong Weng 7 Mar 2019 | 12 min read

Analytics Data Data Analytics
Data Science
Understanding Supply & Demand in Ride-hailing Through the Lens of Data

Grab aims to ensure that our passengers can get a ride conveniently while providing our drivers with a better livelihood. To achieve this, balancing demand and supply is crucial. This article gives you a glimpse of one of our analytics initiatives: how to measure the supply and demand ratio for any given area and time.

Aayush Garg · Lara PuReum Yim · ChunKai Phang 20 Feb 2019 | 7 min read

Analytics Data Data Analytics Data Storytelling Data Visualisation
Data Science
Journey of a Tourist via Grab

Grab's services to tourists are an integral part of connecting tourists to various destinations and attractions. Do tourists travel on Grab to outlandishly fancy places like those you see in the movie "Crazy Rich Asians"? What are their favourite local places? Did you know that Grab's data reveals that medical tourism is growing in Singapore? Here are some exciting travel patterns that we found about our tourists' Grab rides in Singapore!

Lara PuReum Yim 11 Sep 2018 | 6 min read

Analytics Data Data Analytics Tourism Tourists
Data Science
Grab Senior Data Scientist Liuqin Yang Wins Beale-Orchard-Hays Prize

Grab Senior Data Scientist Dr. Liuqin Yang wins the 2018 Beale-Orchard-Hays Prize, the highest honor in Computational Mathematical Optimization. He has been recognised for his paper and the corresponding software SDPNAL+.

Yang Liuqin 20 Jul 2018 | 4 min read

BOH Data Science
Data Science
GrabShare at the Intelligent Transportation Engineering Conference

We're excited to share the publication of our paper GrabShare: The Construction of a Realtime Ridesharing Service, which was Grab's contribution to the Intelligent Transportation Engineering Conference in Singapore last month.

Dominic Widdows 13 Dec 2017 | 3 min read

Data Science GrabShare
Data Science
The Data and Science Behind GrabShare Part I: Verifying Potential and Developing the Algorithm

Launching GrabShare was no easy feat. After reviewing the academic literature, we decided to take a different approach and build a new matching algorithm from the ground up.

Tang Muchen 20 Oct 2017 | 10 min read

Data Science GrabShare
Data Science · Product
How to Go from a Quick Idea to an Essential Feature in Four Steps

How do you work within a startup team and build a quick idea into a key feature for an app that impacts millions of people? It's one of those things that is hard to understand when you have just graduated as an engineer.

Huang Da · Tan Sien Yi 16 May 2017 | 7 min read

Data Science Product Management