UK health and wellbeing firm Holland & Barrett is using a wide set of tools from Amazon Web Services (AWS) to support its move to a fully modern data management architecture.
The company’s Chief Data Officer, Dobo Radichkov, states that while the company is only in phase II of a planned digital transformation, it has already achieved the basics of a three-layer modern data architecture.
If you look at where we were even a year ago, a lot of the strategic decision making was subjective—a lot of gut feel and opinion. Now, we’re removing that and introducing accountability and transparency into discussions and key meetings.
Decisions at both a high level and an operational level are starting to be more and more backed by data.
Governing data from a single place
Holland & Barrett’s cloud partner AWS defines modern data management as the best way to break down what it says are three ‘silos’ that keep data from being fully utilized in business: data, people, and business.
The vendor claims that these can finally be neutralized if organizations move to a four-part structure with a data lake at its base.
This should be supplemented by purpose-built data services and by the ability to move data seamlessly around the architecture to wherever it is needed for the right process and analysis. Finally, AWS recommends that organizations put unified governance in place to securely manage data across the platform from a single place.
Radichkov agrees – but says that this is a portrait of an ideal modern data management reference architecture, which his environment is progressing toward but does not yet fully match.
That picture is for sure the best conceptual modern data architecture, but what we have is more of a real world use case – implementing aspects of that, not the whole thing in full.
Elements like the data warehouse, data lake, and governance, we have those in place – but other aspects we do it in a slightly different way. In practice, every company will do modern data architecture differently.
From data lake storeroom to data kitchen
That’s not to say Radichkov is not a keen user of AWS solutions – far from it.
For example, the base layer of the three data layers that make up the company’s data architecture is “essentially a collection of Amazon S3 buckets.”
Our bottom layer is a data lake where we take the raw data from our source systems, which are a combination of legacy on-premise systems like our AS/400 or legacy Oracle systems, as well as new tech and cloud databases and other data sources we link to via APIs.
This is where we store the raw ingredients, and keep them at the right temperature, etc. So, we’ve got the data, it’s governed, it’s protected, it’s ready to be accessed, but it’s not ready to be consumed.
There, the company securely stores data using technologies such as S3 for storage, Amazon’s EMR cloud big data platform for processing, Athena for analytic queries, and the supplier’s RDS relational database system, among other technologies.
The main software that works on the data lake’s data is Redshift, the platform on which Holland & Barrett’s data warehouse is built.
However, the firm also uses a range of other tools, including open source tech such as Kafka for streaming and Apache Airflow for workflow management, alongside the commercial product Matillion for ETL (Extract, Transform, Load).
Next layer up, he says, is where the ‘cooking’ of the data takes place – the company’s data ‘kitchen.’
Specifically, this is where preparation of data for organizational consumption happens. Here, Redshift takes Holland & Barrett’s raw data, operationalizes it into logical entities in an entity relationship model, and builds in concepts around what is the customer, what’s an order, what’s a store, what’s a unit of stock, and so on.
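As a rough illustration of what turning raw data into logical entities can look like, the sketch below splits one flat raw record into the four entities named above. It is not Holland & Barrett's model: every class, field, and key name here is hypothetical, and the real warehouse does this inside Redshift rather than in Python.

```python
from __future__ import annotations

from dataclasses import dataclass


# All entity and field names below are hypothetical, chosen for illustration.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str


@dataclass(frozen=True)
class Store:
    store_id: str
    city: str


@dataclass(frozen=True)
class StockUnit:
    sku: str
    description: str


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str  # refers to Customer.customer_id
    store_id: str     # refers to Store.store_id
    sku: str          # refers to StockUnit.sku
    quantity: int


def split_raw_row(row: dict) -> tuple[Customer, Store, StockUnit, Order]:
    """Turn one flat raw record into four related entities."""
    customer = Customer(row["cust_id"], row["cust_email"])
    store = Store(row["store_id"], row["store_city"])
    unit = StockUnit(row["sku"], row["sku_desc"])
    order = Order(
        row["order_id"],
        customer.customer_id,
        store.store_id,
        unit.sku,
        int(row["qty"]),  # raw feeds often carry numbers as text
    )
    return customer, store, unit, order
```

The order keeps only identifiers for the customer, store, and stock unit, which is the usual way an entity relationship model avoids repeating the same details on every row.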
Holland & Barrett pumps the data lake content in as raw as possible, he notes, using either Redshift’s native COPY command, which loads data from S3 into a cluster, or a feature called Redshift Spectrum, which lets developers create external tables that are read directly from S3.
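The two ingestion routes mentioned here differ in where the data ends up: a copy loads it into the cluster, while a Spectrum external table leaves it in S3. The sketch below builds one SQL statement of each kind as plain text. The table, schema, bucket, and IAM role names are placeholders, the Parquet file format is an assumption, and the external schema is assumed to have been created already. The string formatting is for illustration only and is not safe for untrusted input.

```python
def copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files into a table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )


def external_table_statement(
    schema: str, table: str, columns: dict, s3_prefix: str
) -> str:
    """Build a Spectrum CREATE EXTERNAL TABLE statement over files left in S3."""
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_prefix}';"
    )


# Placeholder names throughout; none of these come from the article.
print(copy_statement(
    "staging.orders_raw",
    "s3://example-data-lake/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
print(external_table_statement(
    "lake",
    "orders_raw",
    {"order_id": "VARCHAR(32)", "quantity": "INTEGER"},
    "s3://example-data-lake/orders/",
))
```

In broad terms, loading with COPY suits data that is queried often, while an external table suits data that is large or rarely touched, since nothing has to be loaded first.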
The top of the three layers delivers the result to users as a finished ‘meal.’ At this level, the data team has built services for the consumers of its prepared data.
These, says Radichkov, range from the firm’s central data analytics team to any analytics-enabled team across the organization, connecting either directly to the data warehouse or through the firm’s chosen open source BI (Business Intelligence) platform, Metabase.
They then run the analytics and run the workflows they need to do their jobs. To give you an idea of the scale, we’ve got now 2,500 tables and data sets within Redshift. And in terms of activity, we’re supporting about 2 million queries a month from all our use cases and BI solutions.
In fact, in just about a year, we’ve gone from zero to over 800 registered, active monthly BI users on our platform. And these are the users that go in every day, consume information, fetch analytics and insights to do their day job.
In addition, all of the company’s operational daily, weekly, monthly reporting is done through its modern data platform. This includes standard management reporting, but also real-time intraday reporting, which allows line of business managers to know how the business is doing hour by hour. This is especially useful to track the progress of promotions or in peak season.
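To make the hour-by-hour idea concrete, here is a minimal sketch of the kind of roll-up an intraday report relies on: individual sales are summed into hourly buckets. The input shape (an ISO timestamp plus an amount in pence) and the sample figures are assumptions for illustration, and in practice such an aggregation would more likely run as a query in the warehouse.

```python
from collections import defaultdict
from datetime import datetime


def sales_by_hour(transactions):
    """Sum sale amounts (in pence) into hourly buckets keyed by the hour start."""
    totals = defaultdict(int)
    for timestamp, amount_pence in transactions:
        sold_at = datetime.fromisoformat(timestamp)
        # Truncate the timestamp to the start of its hour.
        hour_start = sold_at.replace(minute=0, second=0, microsecond=0)
        totals[hour_start] += amount_pence
    # Return the buckets in chronological order.
    return dict(sorted(totals.items()))


# Made-up sample data: (timestamp, amount in pence).
sample = [
    ("2024-01-15T09:05:00", 1299),
    ("2024-01-15T10:15:00", 2000),
    ("2024-01-15T09:40:00", 450),
]
for hour_start, total in sales_by_hour(sample).items():
    print(f"{hour_start:%H:%M}  GBP {total / 100:.2f}")
```

Amounts are kept as whole pence so that the hourly totals are exact rather than subject to floating-point rounding.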
Another benefit of modern data management, he says, is that the company now also has exception reporting, which allows it to catch problems early on.
Even better, predictive capability is also coming on-stream. He says:
Using our data and analytics, we can ask questions like: is the unit economics of our business model correct? Is our cost structure right? And can we optimize our store network in terms of where we open new stores or close?
Omnipresent analytics across the business
Radichkov says his new data engine can also tell decision makers in the firm whether it is competitively positioned on pricing and whether the size of its range is correct. It supports other new capabilities as well, such as better data-driven insight into promotions and into the return on investment of marketing spend.
Summing up his approach to modern cloud-based data management, for Radichkov the verdict is clear:
We see this transformation as a core part of our strategy; we want to own it and we want to build it ourselves, as we believe that we can scale it faster that way.
In the next six months, I would like to see much more omnipresent analytics across the business—making sure that we have strong analytics in every department and in every important decision making process, but already we’re seeing improved availability of our products and more personalized and more relevant experiences for our customers which leads to better retention.
Overall, we are seeing the whole organization becoming more effective and productive, because they have the data that they need to make decisions.