How did a European e-commerce giant with 5 billion in revenue turn its technical strength into operational competitiveness?
Taiwan's developer conference MWC (Modern Web Conference) returned as an in-person event this year, with more than 60 talks over three days, a larger scale than in previous years. MWC was founded to bring overseas development experience to Taiwan, and this year it focused on the experience of multinational companies, inviting JP Morgan and Zalando to share. Andrew Howden leads Zalando's embedded SRE team, which is responsible for improving the reliability of the transaction experience. He is also a member of the technical operational readiness team for the online shopping week, helping colleagues establish processes for technical risk management and operational readiness.
When a customer adds a product to the shopping cart, the product starts its journey toward that customer. The journey ends when the order is shipped from the warehouse and handed over to the logistics carrier, at which point the platform no longer holds the product. Zalando calls what the customer experiences along this journey the transaction experience: everything from the moment customers see the product land in their cart to the moment the shipment is displayed. This process involves 4 business departments, 10 teams, and more than 100 developers, and Zalando's embedded SRE team was established to address the particular problems it raises.
In 2019, after two years of trying SRE on a small scale, Zalando decided to develop SRE into a company-wide operating strategy and establish a large-scale SRE department.
Andrew Howden joined Zalando as an SRE engineer that same year. In 2021 he became one of its principal engineers and helped design the operational readiness workflow. His goal was to build a set of self-service assessments to improve the reliability of thousands of systems, which requires knowledge of the technical architecture, business domains, and incident-handling behavior behind each system.
Early startup days: a store built quickly on a PHP package, with two people running the whole site
To understand Zalando's technological evolution, we have to go back to 2008, when Google had just released Android, Apple's iPhone was a hit, and the App Store had just launched. In the fall of that year, Zalando's two founders, Robert Gentz and David Schneider, founded this fast-fashion e-commerce company to sell shoes online. Only days after it was founded, the company was hit by the financial crisis.
Zalando didn't have much money on hand and had to find ways to save. The two founders sublet an apartment in Berlin to serve as both office and warehouse. Since they were just starting out and had few customers, they tried various business experiments and found that free shipping and a 100-day return policy were the two features customers liked most. These became Zalando's early advantages in the e-commerce market, and the two service promises later became the standard customers expect from other e-commerce platforms as well.
At the time, to build an e-commerce platform quickly, and because developers familiar with PHP were easy to find, Zalando built its first-generation platform on Magento, an e-commerce package written in PHP. Much like WordPress in that era, Magento could be extended with third-party modules.
In its early days, Zalando's motto was "Move fast and break things." It had no operations team, let alone a platform team, and relied on just one or two employees to run the entire website.
Two years later, in 2010, the first iPad was launched, Netflix released 12,000 movies, and social media began to appear on mobile phones. Zalando's performance entered a period of rapid growth, and it expanded beyond Germany and into other countries, including the Netherlands and France. At that time, Zalando had 20 full-time employees and many highly loyal customers.
2010: Rebuilding the entire e-commerce platform in Java
However, the PHP e-commerce package chosen at founding began to run into problems and could not scale further. The development team tried hard to modify the underlying code and architecture but still could not solve the problem, so in the end they had to scrap the entire architecture and start over. Zalando rebuilt the whole platform, moving from PHP with a MySQL database to a large monolithic application written in Java on a PostgreSQL database.
When Zalando switched from the old e-commerce software to the new platform, the site was down for "only" 90 minutes. "That was an acceptable amount of time back then. If it happened now, it would be a catastrophe that would attract media attention," Andrew Howden joked. It is a typical example of how people's expectations and standards for technology change over time and place.
Although the team was still small, it had already begun to run into agility problems: how could it release more safely? To keep a degree of control, the team created a deployment checklist that had to be confirmed for every release. This slowed down the shipping of new code, but it made releases more reliable and preserved customer trust.
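A release checklist works as a gate: the deploy is blocked until every item has been explicitly confirmed. The sketch below illustrates that idea only; the class name and the checklist items are hypothetical and are not taken from Zalando's actual checklist.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseChecklist:
    """Hypothetical release gate: every item must be confirmed before deploying."""
    items: list[str]
    confirmed: set[str] = field(default_factory=set)

    def confirm(self, item: str) -> None:
        # Reject typos or made-up items so nothing is "confirmed" by accident.
        if item not in self.items:
            raise ValueError(f"unknown checklist item: {item!r}")
        self.confirmed.add(item)

    def pending(self) -> list[str]:
        # Report unconfirmed items in the checklist's original order.
        return [item for item in self.items if item not in self.confirmed]

    def can_release(self) -> bool:
        return not self.pending()


# Illustrative items only; a real team would define its own list.
checklist = ReleaseChecklist(items=[
    "changelog reviewed",
    "database migrations tested",
    "rollback plan written",
    "on-call engineer notified",
])
checklist.confirm("changelog reviewed")
checklist.confirm("database migrations tested")
print(checklist.can_release())  # False
print(checklist.pending())      # ['rollback plan written', 'on-call engineer notified']
```

The trade-off the article describes is visible here: the gate adds a manual step to every release, which slows shipping, in exchange for a guarantee that no item was skipped.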
In 2014, Zalando kept growing, and that year it switched its website entirely to responsive web design (RWD). It was also the year Docker 1.0 was released.
Revenue quadrupled in four years: the technical challenges behind three key strategies
From 2010 to 2014, Zalando's revenue grew fourfold to 2.2 billion euros (approximately NT$75 billion).
Andrew Howden pointed out that Zalando's strong growth came from three major strategies. The first is "scale": doing whatever it takes to scale the software systems, even scrapping the old version entirely and building a new one. The second is "localization": Zalando expanded into multiple markets, including Sweden, Denmark, Finland, Norway, Belgium, Spain, Poland, and Austria, making local adjustments in each for differences in language, currency, legal and compliance requirements, and operational needs.
The third key strategy is "differentiation". Zalando also began moving toward a department-store model, introducing a partner program that allows third-party suppliers to sell their products on the Zalando platform.
"The biggest test these three decisions posed for technology was that software originally built only for internal use now had to be offered to third parties, had to scale, and had to meet the local needs of each country," Andrew Howden emphasized.
Over the next three years, Zalando relied on several technical measures to launch quickly in new countries: it introduced an enterprise ERP system, began building localized websites for each country, and built a centralized online store platform capable of handling large volumes of orders and transactions.
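One common way to serve many countries from one centralized platform is to keep per-market differences in configuration rather than in separate code. The sketch below assumes that approach; `MarketConfig`, `MARKETS`, and `price_label` are hypothetical names, and only language and currency are modeled, while legal and operational rules are left out. The country, language, and currency codes follow ISO 3166-1, BCP 47, and ISO 4217.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MarketConfig:
    """Hypothetical per-country settings; legal and ops rules are not modeled."""
    country: str   # ISO 3166-1 alpha-2 code
    language: str  # BCP 47 language tag used for site copy
    currency: str  # ISO 4217 currency code


# The markets named in the article, keyed by country code.
MARKETS = {
    "SE": MarketConfig("SE", "sv-SE", "SEK"),
    "DK": MarketConfig("DK", "da-DK", "DKK"),
    "FI": MarketConfig("FI", "fi-FI", "EUR"),
    "NO": MarketConfig("NO", "nb-NO", "NOK"),
    "BE": MarketConfig("BE", "nl-BE", "EUR"),
    "ES": MarketConfig("ES", "es-ES", "EUR"),
    "PL": MarketConfig("PL", "pl-PL", "PLN"),
    "AT": MarketConfig("AT", "de-AT", "EUR"),
}


def price_label(country: str, amount: Decimal) -> str:
    """Render a price with two decimals and the market's currency code.

    A production shop would use full locale-aware formatting
    (decimal separators, symbol placement); this keeps it simple.
    """
    market = MARKETS[country]
    return f"{amount.quantize(Decimal('0.01'))} {market.currency}"


print(price_label("PL", Decimal("199.9")))  # 199.90 PLN
print(MARKETS["AT"].language)               # de-AT
```

Adding a country then means adding one entry rather than forking the storefront, which is the kind of reuse a centralized platform is meant to provide.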
However, these three strategies led Zalando to build more and more systems, which grew increasingly complex. In 2010 there were only 7 deployment units; by 2012 there were more than 100, and release management and coordination became a major challenge.
To contain this complexity, Zalando required every piece of software to be built on just three core technologies: Java, Tomcat, and PostgreSQL.
The birth of the first platform team
Zalando also began building its first "platform" team, responsible for systems engineering, database engineering, and platform software engineering, as well as providing security consulting.
For example, the team built a deployctl tool to manage the release process and used the open-source monitoring tool ZMON (similar to Nagios) to regularly check that all software and systems were running normally.
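Tools in the ZMON and Nagios family boil down to running a set of health checks on a schedule and flagging anything that fails. The sketch below shows only that general pattern; it does not use or imitate ZMON's real check format or API, and `run_checks`, `monitor`, and the two sample checks are hypothetical stand-ins.

```python
import time
from typing import Callable

# A check returns True when the thing it probes looks healthy.
Check = Callable[[], bool]


def run_checks(checks: dict[str, Check]) -> dict[str, str]:
    """Run every check once; a raised exception is reported as a failure."""
    results: dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "OK" if check() else "FAILING"
        except Exception:
            results[name] = "FAILING"
    return results


def monitor(checks: dict[str, Check], interval_s: float, rounds: int) -> list[dict[str, str]]:
    """Poll all checks `rounds` times, sleeping `interval_s` seconds between rounds."""
    history: list[dict[str, str]] = []
    for i in range(rounds):
        history.append(run_checks(checks))
        if i < rounds - 1:
            time.sleep(interval_s)
    return history


def db_reachable() -> bool:
    return True  # stand-in for a real connection attempt


def queue_backlog_ok() -> bool:
    raise TimeoutError("broker did not answer")  # simulated outage


print(run_checks({"database": db_reachable, "queue": queue_backlog_ok}))
# {'database': 'OK', 'queue': 'FAILING'}
```

Treating an exception as a failure, rather than letting it crash the loop, keeps one broken probe from hiding the status of every other system.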
With these supporting practices across organization, tooling, and technology, Zalando was able to release weekly at the time. Every release was tested by a quality assurance (QA) team to ensure correctness, and a small team provided on-call support.
However, as the development team grew, each release required more and more coordination. As systems grew larger and the related operational information became scattered across many places, it became increasingly difficult for engineers to get a clear picture of the work they were doing.
To keep engineers' releases under control, the platform team adopted increasingly strict measures, but this in turn slowed the pace of releases. Andrew Howden said: "Although the platform team had good intentions, it ended up limiting the company's ability to innovate and become a market leader. The platform team had begun to lean toward reliability."
Kubernetes was born in 2014 and quickly became mainstream. Version 1.0 was released the following year, which also led to the founding of the CNCF, an organization that took over many cloud-native projects such as gRPC, etcd, Envoy, and Jaeger. 2014 was also the year of Zalando's initial public offering (IPO).
After the IPO, a new vision: embracing cloud native to operate across multiple European countries
The IPO brought Zalando more capital and greater capabilities, but also greater pressure to grow. Andrew Howden said that if Zalando wanted to grow and expand faster, it had to innovate.
Zalando's new vision is to create a "fashion platform" that can connect a large number of people with fashion, allowing third-party partners and stores to sell various fashion products on this platform.
However, Zalando's technical decisions over the previous few years had gradually produced an architecture that was relatively reliable but hard to change, and this "stable" architecture could not keep up with the new post-IPO vision.
To support its planned multi-country operations across Europe, Zalando decided at the end of 2014 to fully embrace the public cloud, adopted Docker containers, and began replacing its old monolithic architecture with microservices.
In 2015, the year after its listing, Zalando began pursuing an e-commerce platform strategy: it aimed to become a technology platform provider in the e-commerce ecosystem and began offering its own technical services to partners across that ecosystem.