Meet OceanBase, the data-processing powerhouse behind Alibaba Singles' Day

Rebbeca Ren

posted on November 23, 2022 2:49 pm

During this year's Singles' Day, sales on Chinese e-commerce platforms rose 13.7% year-on-year to 1.1 trillion yuan. Originally created by Alibaba as China's answer to Black Friday, Singles' Day has since grown to become the largest online promotion in the country, with nearly all e-commerce platforms and e-tailers taking part. In recent years, the retail spree, which began as a one-day event on November 11, has morphed into a weeklong event.

Once the shopping extravaganza starts, massive purchasing power is unleashed. But as people flock to Alibaba's online marketplaces Taobao and Tmall, they generate an avalanche of data, and the spike in volume can place huge strain on back-end processing and directly impair the user experience.

With solid data management support from OceanBase, Ant Group’s self-developed distributed database, as well as other technological innovations, consumers had another hassle-free purchasing experience this Singles' Day. Alibaba owns a roughly 33% stake in Ant Group, the company that runs Alipay, China's ubiquitous mobile payments and lifestyle app.

Within four years, OceanBase went from carrying 10% of Alipay's data to carrying all of it

For this year’s event, Alibaba said total gross merchandise value (GMV), or sales, was “in line” with last year’s, which was the equivalent of $84.5 billion at the time. Growth leveled out as Singles’ Day entered its 13th year, but in the event’s early days the figures were astounding.

"During the shopping season in the early years, traffic surged dozens or even hundreds of times, which is why we need to build a database with high concurrent processing capabilities and elastic scalability," Yang Chuanhui, CTO of OceanBase, told PingWest.

The GMV of Singles' Day in 2010 reached 936 million yuan, 18 times the figure from the first event in 2009. Existing solutions at the time were unable to keep up with Alibaba's growth, which led to a plan to develop an innovative, distributed, low-cost, and highly reliable database management system from scratch.

Since its inception, OceanBase has played an increasingly vital role in helping Alipay process Singles’ Day payments on Alibaba’s marketplaces as its capabilities evolve. 

In 2013, it assisted with data processing for Singles’ Day promotions for the first time. After successfully handling 10% of the traffic allocated by Alipay on Singles’ Day in 2014, the database began to shoulder more responsibility: in 2016, it smoothly supported a peak of 120,000 payments per second during the shopping season; the following year, it carried the entirety of Alipay’s traffic and set a then-record for database processing of 256,000 transactions per second at peak; and in 2019 it set a new record of 544,000.

The OceanBase team posed for a group photo after successfully supporting the Singles' Day shopping festival in 2018.

The first task given to OceanBase was to provide a better solution for high-concurrency control during Singles’ Day. Once the shopping carnival kicks off, a significant number of users rush into Taobao or Tmall, and many of them do the same thing at the same time: thousands of people may simultaneously add the same item to their shopping carts or make a purchase at the same moment. This is a high-concurrency situation. If concurrency is not well controlled, consumers will repeatedly fail when adding items to the shopping cart or making a payment.

Conventionally, the problem is solved by locking, or by isolating a particular transaction to a single process: a transaction cannot read or write the data until it acquires an appropriate lock. As one might expect, this can be time-consuming and inefficient, and its main drawback is that throughput quickly hits a ceiling.

OceanBase applies MVCC (Multiversion Concurrency Control) to each partition so that each partition controls concurrency independently. In addition, OceanBase uses a Global Timestamp Service and two-phase commit to keep cross-partition transactions consistent. This approach lets multiple transactions run concurrently more efficiently, without the risk of inconsistencies.
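
To make the idea of multiversion concurrency control concrete, here is a minimal sketch in Python. It is not OceanBase's implementation: the TimestampService and Partition classes, their method names, and the sample key are all hypothetical, and a simple counter stands in for a global timestamp service. The point is only that writers append new versions rather than overwrite, so a reader keeps a consistent view without waiting for a lock.

```python
import itertools
from collections import defaultdict


class TimestampService:
    """Hypothetical stand-in for a global timestamp service: a monotonic counter."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next(self):
        return next(self._counter)


class Partition:
    """Keeps every committed version of each key, tagged with its commit timestamp."""

    def __init__(self, ts_service):
        self._ts = ts_service
        self._versions = defaultdict(list)  # key -> [(commit_ts, value), ...]

    def begin(self):
        # A reader's snapshot is the timestamp at which it started.
        return self._ts.next()

    def write(self, key, value):
        # Writers append a new version instead of overwriting in place.
        commit_ts = self._ts.next()
        self._versions[key].append((commit_ts, value))
        return commit_ts

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [v for ts, v in self._versions[key] if ts <= snapshot_ts]
        return visible[-1] if visible else None


ts = TimestampService()
stock = Partition(ts)
stock.write("stock:item42", 100)
snapshot = stock.begin()           # a reader starts here
stock.write("stock:item42", 99)    # a concurrent writer commits afterwards
old_view = stock.read("stock:item42", snapshot)
new_view = stock.read("stock:item42", stock.begin())
```

The reader that started before the second write still sees the earlier value, while a reader that starts afterwards sees the new one; neither had to block the writer.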

The distributed design also makes OceanBase better at handling high concurrency and unexpected influxes of data. Under the distributed framework, the database can scale horizontally with ease, allowing it to respond swiftly to changes in demand and ensuring that all queries are processed promptly.

Centralized databases can only be scaled vertically, by adding more resources such as CPUs, memory, and disks to the single computer that stores and processes all data. Distributed databases, by contrast, are built to be modular from the ground up and can be scaled out or in by adding or removing individual computing nodes.
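
As a rough illustration of scaling out, the sketch below spreads a fixed set of data partitions across however many nodes are available. The assign function, the node names, and the round-robin placement are assumptions made for the example; a production system would typically move as few partitions as possible when the node count changes.

```python
def assign(partitions, nodes):
    """Spread partitions round-robin across the available nodes."""
    placement = {node: [] for node in nodes}
    for i, partition in enumerate(partitions):
        placement[nodes[i % len(nodes)]].append(partition)
    return placement


partitions = [f"p{i}" for i in range(12)]

# Normal days: three nodes share the twelve partitions.
normal = assign(partitions, ["node-a", "node-b", "node-c"])

# Peak traffic: three more nodes are added, halving the load on each.
peak = assign(
    partitions,
    ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"],
)

load_normal = max(len(p) for p in normal.values())
load_peak = max(len(p) for p in peak.values())
```

With three nodes each one serves four partitions; with six, each serves two. Scaling in after the peak is the same operation with a shorter node list.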

"OceanBase is able to precisely call or recycle nodes in response to fluctuations in traffic," Yang said. "The shorter the node is occupied for, the lower the cost is."

According to OceanBase, this year's promotion saved 150,000 CPU cores and reached a resource utilization rate of 65%, which helped reduce carbon dioxide emissions.

Data loss is another risk on Singles' Day, as some of the servers that host data can fail under extreme conditions. "When we first started providing support for Singles’ Day, we were so concerned about the possibility of losing data, so we put more engineers in to monitor the servers," Yang said.

The anxiety, however, is waning as OceanBase's capacity for autonomous operation grows. With an operations platform that can automatically locate and deal with broken servers, and a dependable fault-tolerance solution that can reroute traffic to backup servers without any human intervention, the database dramatically reduces the risk of data loss.
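
The rerouting behavior described above can be pictured with a very small sketch. The Router class and server names here are hypothetical and say nothing about how OceanBase detects failures internally; the example only shows traffic moving to a backup once a server is marked as failed, with no operator involved.

```python
class Router:
    """Sends traffic to the first healthy server in a fixed preference order."""

    def __init__(self, primary, backups):
        self.servers = [primary, *backups]
        self.healthy = set(self.servers)

    def mark_failed(self, server):
        # In practice a monitoring platform would call this after a failed health check.
        self.healthy.discard(server)

    def route(self):
        for server in self.servers:
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy server available")


router = Router("primary", ["backup-1", "backup-2"])
before_failure = router.route()
router.mark_failed("primary")
after_failure = router.route()
```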

Thanks to OceanBase's elasticity in handling high concurrency and data surges, as well as its fault tolerance, overall operating efficiency has gone up. "Back in 2015, we set a target, that is, within three or four years, except for the regular on-duty personnel on Singles' Day, no extra team will be needed to support OceanBase," the CTO said. "Actually, we have almost reached that goal. Since 2020, we barely need additional manpower to back up the event."

In this once-a-year, highly demanding setting, the database is continually being honed and enhanced, becoming the backbone of data storage and processing for Alipay and other business lines of Ant Group. 

Demands for real-time analysis give rise to the HTAP function of OceanBase

Competition is heating up as more merchants join Singles' Day promotions. In response to sellers' requests to adapt their marketing strategies in a timely way, OceanBase's real-time analysis service came into being.

Real-time analytics, the practice of interpreting data as soon as it hits the database, can help sellers shorten the time between data collection and decision-making, and so quickly improve customer service, inventory management, content personalization, and more.

“In June 2021, we introduced the HTAP (Hybrid Transaction/Analytic Processing) function, which allows transactions and analysis to proceed simultaneously. Since then, retailers can make in-time adjustments to their strategies during the shopping extravaganza or on normal days,” Yang said. “Previously, sellers had to wait until all transactions were processed, usually T+1 (the transaction date plus one day), before making any changes to their marketing or promotion strategies.”

Under conventional database solutions, if users want to achieve real-time analysis, they need to create an ETL (extract, transform, load) pipeline to copy data from an OLTP (online transaction processing) database to an OLAP (online analytical processing) database, which can be time-consuming and resource-intensive. With HTAP databases, OLTP and OLAP workloads can be hosted simultaneously. This not only streamlines the whole process and saves operational costs, but also frees up IT and data professionals to focus on higher-level, value-added tasks.

The ability to make real-time decisions will make enterprises nimbler, boost their customer outreach, and offer a significant advantage over the competition. According to a Gartner report released in May, 80% of companies surveyed have seen their revenues increase after implementing real-time analytics. 

OceanBase also strikes a balance between compression and performance. Typically, high compression ratios are applied to save storage costs, but this approach can drag down read and write performance in memory and on disk. OceanBase uses column encoding for compression: it implements several encoding algorithms and automatically chooses the most suitable one for each column. Column compression exploits similarities in the original data, such as a shared data type or value range. With its LSM-Tree architecture and query optimization against encoded data, OceanBase does not sacrifice performance for high compression ratios.
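
A simplified picture of per-column encoding selection is sketched below. The three encodings (run-length, dictionary, delta) are common textbook techniques chosen for illustration and are not claimed to be the ones OceanBase implements; the function names are hypothetical, and the size of the JSON-serialized result is used as a crude stand-in for on-disk size.

```python
import json


def rle(values):
    """Run-length encoding: collapse repeated neighbors into [value, count]."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def dictionary(values):
    """Dictionary encoding: store distinct values once, then small integer codes."""
    distinct = sorted(set(values))
    code = {v: i for i, v in enumerate(distinct)}
    return distinct, [code[v] for v in values]


def delta(values):
    """Delta encoding: store the first value, then differences between neighbors."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def choose_encoding(values):
    """Try every applicable encoding and keep the one with the smallest output."""
    candidates = {
        "plain": list(values),
        "rle": rle(values),
        "dictionary": dictionary(values),
    }
    if values and all(isinstance(v, int) for v in values):
        candidates["delta"] = delta(values)  # only meaningful for numeric columns
    sizes = {name: len(json.dumps(enc)) for name, enc in candidates.items()}
    return min(sizes, key=sizes.get)


status_column = ["paid"] * 1000                       # one long run
city_column = ["hangzhou", "shanghai"] * 500          # few distinct values
time_column = list(range(1_700_000_000, 1_700_001_000))  # steadily increasing

status_choice = choose_encoding(status_column)
city_choice = choose_encoding(city_column)
time_choice = choose_encoding(time_column)
```

Each column ends up with a different encoding because its data has a different kind of regularity: long runs, a small set of distinct values, or small gaps between neighbors.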

Compared with conventional databases, the compression technique provided by OceanBase can lower users' storage expenses by up to 70% without hurting the throughput of read or write operations.

A mini model of "Five Data Centers in Three Regions" architecture of OceanBase, displayed at the Apsara Conference 2022

After 10 years of supporting Singles’ Day, the world’s largest online shopping festival, OceanBase has become a well-known technology service provider in the financial industry. Globally, the distributed database has served over 400 customers thus far, including some of China's top financial institutions, such as the Industrial and Commercial Bank of China, which has chosen OceanBase as its preferred technology provider to improve its core IT systems.

Distributed databases are a better fit for fast-growing markets and businesses to manage their data

Now, OceanBase is seeking global expansion, and the Asia-Pacific region, especially Southeast Asia, has become a key market due to its burgeoning digital economy.

Likely because of cultural and geographical proximity, Singles' Day has spread beyond China and is flourishing in Southeast Asia. Singapore-based Lazada said on Friday that 11 minutes into its "biggest one-day sale" on Nov. 11, sales were 124 times bigger than on normal days. Without providing exact numbers, Shopee, Lazada's arch-rival, said that on Singles' Day, the sales volume of tens of thousands of sellers was more than ten times higher than usual.

The explosive expansion of these marketplaces is a microcosm of Southeast Asia's e-commerce sector as a whole. According to a recent report by Google, Temasek and Bain & Company, the region’s e-commerce sector grew 16% to $131 billion in 2022. Despite the resumption of offline shopping as pandemic lockdowns lifted, e-commerce continues to drive the growth of the region's digital economy and is projected to grow at a CAGR of 17% from 2022 to 2025, the report said.

Fast growth will inevitably put pressure on data management, but as the experience in China shows, distributed databases like OceanBase are capable of guiding businesses through the challenges of managing rising data workloads.

In 2021, GCash, the largest e-wallet in the Philippines, started migrating its business to OceanBase to address a series of issues brought on by the exponential increase in both user base and transaction volume.  

The company, which tripled its users in three years, encountered scalability issues as its MySQL database struggled to keep up with the influx of new customers. In an effort to accommodate the growing number of users, the company's developers worked overnight an average of four days a week to complete database sharding and data cleanup, resulting in frequent business interruptions lasting 2 to 3 hours.

After migrating to OceanBase, GCash can handle growing data volumes and high-concurrency requests, and has gained AZ-level (availability zone) disaster recovery capability, which is crucial in the financial industry. With OceanBase's offerings, GCash has reduced data storage space by 70% and spent 40% less on database resources.

“OceanBase's expertise in high-concurrency control, elastic scale, and disaster tolerance, as well as its cost-effective advantages, make it ideally fit for e-commerce and payment firms in Southeast Asia,” said Yang, the OceanBase CTO. 

Apart from GCash, OceanBase also helped Indonesian digital wallet DANA migrate from MySQL in 2019, leaving it largely free from scalability issues and able to keep pace with its rapid business growth. Launched in 2018, DANA has now reached over 115 million users in Indonesia.

As businesses in Southeast Asia seek to manage and leverage data to gain an edge in digital transformation, there is going to be enormous room for distributed database solutions to grow. Considering that Southeast Asia is a multi-cloud market, OceanBase has recently landed on AWS Marketplace and plans to cooperate with more cloud vendors in the future to make its services more accessible.

"Distributed database is an inevitable choice for database development, and the future of real-time data processing depends on it," said Yang Zhenkun, the chief scientist of OceanBase.

OceanBase, which strives to "make data management and use easier," has been at the forefront of tech innovation, giving it a chance to play a pivotal role in the shift from centralized to distributed database architectures.