A growing amount is now being written about the energy that huge data centers – and IT in general – consume. A good deal of the comment is directed at the volume of data IT creates, with references to its exponential growth themselves growing exponentially. Less is said about what could be done not just to slow the rate of data growth, but to start on plans to actually shrink the total – to send data into recession.
It may seem a fanciful hope, but something needs to be done to counter the well-entrenched user and industry supposition that data is now, somehow, free. That supposition grew out of the changes in computer memory technology wrought by the integrated circuit and the impact of Moore's Law. One of the first to define the nature of what was happening, back in the late 1970s, was the founder of AMD, Jerry Sanders, speaking at a Morgan Stanley semiconductor conference hosted by then famed analyst, and later founder and chairman of PC maker Compaq Computer, Ben Rosen. He uttered the immortal words:
We have made computer memory free, and with it data essentially becomes free.
That thought grew, and stuck, and with it the notion that memory capacity was no longer an issue of concern. Performance was king, and ever-faster read and write speeds became minor Holy Grails. The old capacity problems of magnetic-core memory technology – where 4K of RAM was physically big, heavy, and expensive – were swept away. The dictum of Moore's Law – half the unit cost and double the capacity every 18 months – made adding more memory a regular system upgrade step. Now such thinking urgently needs to be rethought.
It is not just an old and cynical journalist who thinks so, either. Senior people in the vendor community are starting to talk seriously in terms of 'what do we do about data?'
One such is Mark Molyneux, CTO for EMEA with data management and security specialist, Cohesity, and a man with an interestingly relevant background. While he came to Cohesity from Dell, before that he was a Director with Barclays Bank, running its group storage, virtualization and public cloud operations. He therefore has a real grounding in the relationship between data tech and data as a business tool and raw material.
Drain the (data) lakes
Molyneux sees a direct connection between data management – including the previously unspeakable notion that we are producing far too much of the stuff and it has to stop – and sustainability, which provides a very good reason to act. Part of the problem now is that there is not only exponential growth in data, but also exponential growth in the ways of creating it, ranging from multiple copies of terabyte-sized files held in multiple silos through to the sloppy, less-than-optimized way in which so much bloated applications software is written.
It is up to the industry to start looking at how it solves these problems and to face up to the seemingly silly question: is the time coming when companies have to choose between keeping the lights on and creating new data? Molyneux opines:
I think a lot of it comes down to the way that companies are handling data in their mindset. So there’s always going to be technologies, like data lakes for example, where you can put Big Data in there, do analytics against it, and then throw the data away. But what we’re finding is that a lot of companies are not throwing that data away, they’re keeping it. So it grows. So in terms of passing the point of where the lights are actually kept on, we’re way past that point.
Another great example is the digital twin, a great idea in many ways but, as Molyneux observed, if a change is made to a digital twin of a production system, the need will be to keep the original and the change, and all the iterations of it as well. Suddenly the production environment can be accompanied by ‘n’ copies, all slightly different and all kept ‘just in case’.
Cohesity recently produced an ESG report which showed that companies were still growing their data by 40 to 50% a year, with 28% saying it was more than 50% a year. Molyneux says:
They also said that for every Terabyte of production data, they were keeping four Terabytes of secondary data. This is all data growth without any sort of control around what’s happening to it, and that’s the problem that we’ve got in the industry. At the shop floor nobody’s willing to make the hard choice and say this is how we control data from this point moving forward.
While the mindset is that data is, in some way, 'free', none of the above seems relevant. But we are moving into an era where adding storage capacity comes by the rack, racks cost upwards of £500 per month to host, and newly created data is consuming those racks at an ever faster rate, so some changes need to be made.
A five-point plan
Molyneux sets out a five-point plan of the steps that any company needs to take to control the growth of its hoards of data, including the option to use the feared big red button marked 'PRESS DELETE, I DARE YOU'. He argues:
It’s the data management strategy that’s critical to everything. If you’ve got that, you can make a very clear decision on what data you’re keeping, where it is, and how long you’re keeping it for. That determines your future data growth. This is where you’re starting to deliver on your sustainability measures.
So what’s the plan?
- Number one on his list is a relevant records strategy, set at the top of the company
- Number two is the need to have that strategy pushed down through the company and applied at a technology level, so that everybody signs up to it
- Number three is a capable software solution that allows you to run classification and indexing
- Number four, you need to have the support of your legal department
- Number five, you need the ability to report on it
He sees the records strategy as the key element. It is needed to understand, as a business, what data needs to be held, why, where, and for how long. This covers a wide range, from critical data with a short lifespan through to data that must remain locatable indefinitely, and touches every aspect of the business, from today's financials and transactions to historical product designs.
Point two is perhaps even more important. It is one thing to have a policy, but another to actually implement it properly. It is all very well applying sensitivity labelling to data, but the point is defeated if that data is then parked in an unstructured data store. Everyone relevant must be attuned to managing data from a technology perspective, and the company should be aiming to use automation to manage data as it arrives.
Point three is the ‘why do it?’ to the ‘what to do’ of points one and two. A well-organized data classification process will help identify the most important data, in terms of business criticality, urgency and lifespan, and – perhaps most important of all – the option of the fourth point, a defensible deletion. Such strategies can be quite simple to begin with, built around obvious questions everyone should ask of themselves, such as, ‘Why are you retaining that specific data? For what purpose? And for how long?’.
The answers will often need to be based on existing policies, as in the financial sector (such as the classic seven-year Statute of Limitations on many bank records), while others will be the subject of negotiation and discussion within the business. It is also reasonable to assume that one of the biggest issues will be how these classifications are applied to the Petabytes of heritage data every company will still be holding, which will need to be indexed and classified so that such defensible deletions can be made.
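The retention logic described above – classify each record, attach a lifespan, and flag anything past it as a candidate for defensible deletion – can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical policy table and record format; the class names and retention periods are invented for the example, not taken from any real regulatory schedule or vendor product.

```python
from datetime import date, timedelta

# Hypothetical retention policy: record class -> retention period in days.
# None means "keep indefinitely". All values here are illustrative.
RETENTION_POLICY = {
    "financial_transaction": 7 * 365,   # e.g. a seven-year limitation period
    "operational_log": 90,
    "product_design": None,             # locatable, indefinite lifespan
}

def deletion_candidates(records, today=None):
    """Return records whose retention period has expired (defensible deletions)."""
    today = today or date.today()
    expired = []
    for record in records:
        retention_days = RETENTION_POLICY.get(record["class"])
        if retention_days is None:
            continue  # indefinite retention, or unclassified: never auto-delete
        if record["created"] + timedelta(days=retention_days) < today:
            expired.append(record)
    return expired
```

Note that anything unclassified is deliberately never deleted – the classification and indexing step (point three of the plan) has to come first before the delete button can be pressed defensibly.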
Molyneux feels there should be more government involvement now in setting the underlying requirements here, especially in the light of governments having signed up to international agreements on Net Zero carbon emissions and sustainability. It is also an area where industry and trade associations can play an important role, setting the base levels for data records and classification strategies for their specific sectors.
Lawyers are also more likely to feel better about the possibility of ‘destroying evidence’ with data deletions if such a process is clearly part of what is now seen as best practice in both data management and sustainability, especially as the latter is now an area where the failure to meet specified goals is becoming a legal issue in its own right.
The ability to report on all this is unlikely to be just a 'nice to have', especially as it applies to sustainability and reductions in carbon footprints. This data now forms a core part of the reporting required by the European Commission Digital Strategy, for example. A company that is selling something has to be able to defend the making of that something on sustainability grounds. This means that profligacy and poor management in the energy consumed in creating and storing data could contribute to a company losing out on major contracts in the future – and such profligacy will be hard to hide.
The edge, VDI and not making data at all
Perhaps the biggest unapproachable question now, but one that needs to be asked, is whether it is actually possible to reduce, halt, or even send into decline the rate at which data is consuming energy – and, by association, whether the same can be achieved with the exponential rate of data generation.
The short answer is that there are already a couple of possible solutions in play, and as sustainability becomes the unavoidable ‘must tackle’ issue – short of some miracle happening to create boundless supplies of hydrogen, or nuclear fusion suddenly works and energy itself becomes ‘essentially free’ – their potential is bound to grow.
One is edge computing, which has some of its roots in the realization that moving Petabytes of data uses up a hell of a lot of energy, costs a hell of a lot of money, and takes a hell of a lot of time. It is easier to move compute to the data. And at the edge, data is being created, used and managed in its earliest, rawest state, a point where the application of some local intelligence can greatly reduce the amount of data a business might normally generate.
For example, a machine or process sensor will be reporting normality – another ‘thing’ made to spec. In a traditional environment, all that data heads off to the data center for processing and storing. But it can all be thrown away and replaced with a single hourly report – ‘n’ in-spec units produced. The only additional data then would concern exceptions to that normality, and even then the application of a bit more local intelligence could quite possibly handle many of the adjustments and remediations required without reference back to higher authority. This way, data can be kept small and kept local. Molyneux suggests:
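The hourly roll-up described above is straightforward to sketch. The following is an illustrative example only, assuming hypothetical reading and summary formats: in-spec readings are collapsed to a single count at the edge, and only the exceptions are retained as raw data for onward transmission.

```python
def summarize_hour(readings, spec_min, spec_max):
    """Collapse an hour of per-unit sensor readings into one summary record.

    In-spec units are reported only as a count; out-of-spec readings are
    the only raw data kept for the trip back to the data center.
    Field names ("unit", "value") are illustrative assumptions.
    """
    exceptions = [r for r in readings
                  if not (spec_min <= r["value"] <= spec_max)]
    return {
        "in_spec_units": len(readings) - len(exceptions),
        "exceptions": exceptions,
    }
```

An hour of thousands of 'normal' readings thus shrinks to one small record, while anything anomalous still travels upstream in full – which is the data-reduction trade-off Molyneux is describing.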
The amount of data that needs to be moved about can be reduced just by a completely different structure to the way you think in terms of where data needs to stay, and what it’s really for. You can definitely help your sustainability agenda by focusing on how you handle data at the edge. And if you could bring in an effective data management and classification system that targets that data as soon as it entered the ecosystem, you’re actually going to determine its landing zone and its final destination at that point.
The other main opportunity to cut back on data creation is Virtual Desktop Infrastructure (VDI). This is hardly new tech, of course, but three things make a big difference now. First, the available technologies have improved markedly, to provide the performance levels required. Second, its mode of operation plays straight to the needs of all businesses to meet both data security and operational sustainability goals. Third, there is growing interest in hybrid working practices, where home working is an option required by a growing number of existing and potential employees.
End users never actually get the data; they are just sent a pixel-stream facsimile of their work environment back in the data center. They interact with that in exactly the same way as they would with any application running on their desktop, laptop or whatever. But everything – data, compute functions and the rest – is retained in the data center, so its security is, theoretically at least, as good as it can get. It also means there is just one copy of much of the data to manage, not the multiple copies that commonly fly around the world. Molyneux notes:
I’ve been a big advocate of VDI for a long time. I’ve seen how successful VDI can be in simplifying how you deploy and use technology, and also the cost side of it as well. It costs a lot of money to send a pre-built device out to an end user. If it’s a VDI, I can do that centrally.
Molyneux did acknowledge one negative: if there is an outage, everyone can be hit. But this can now be solved if the policies on operational resiliency are good enough. That probably does require more technology in the data center, but it still involves less investment and fewer resources than rolling out pre-built systems, rather than thin clients or old, doctored PCs, to 'n' thousand users:
It makes strategies like data management very simple, because you're only applying it at that one source in the data center. So now you don't have that fragmented problem where data is out at the edge. VDI solves a lot of that because the data actually isn't there. It's here.