Big Data and Artificial Intelligence can support us in one of the major quests of the 21st century - solving the problem of climate change. However, it is becoming increasingly obvious that data collection and storage can themselves be part of the problem. The energy consumption of data centres in the European Union could reach a staggering 3.21% of total electricity demand by 2030. Considering the increasing transparency requirements of ESG regulations, rising energy costs, and the expectation that data centre capacity will more than double worldwide, and especially in Europe, climate change is a major problem not only at the macroeconomic level but also at the microeconomic level. Annual global data production amounted to 2 zettabytes in 2010 and increased to 97 zettabytes in 2022. With annual data production expected to reach 180 zettabytes in 2025, almost double the current amount, the C-suite needs to rethink its data management choices.
In recent years, many countries have implemented environmental regulations to combat climate change and taken steps to decrease greenhouse gas emissions around the world through the Paris Climate Agreement, the EU Green Deal, and the COP27 Conference. With the Green Deal, European Union countries aim to decrease carbon emissions by 55% by 2030 and to make Europe the first carbon-neutral continent by 2050. Accordingly, the Carbon Border Adjustment Mechanism (CBAM) has been implemented to prevent companies from shifting their production to countries with fewer emission restrictions. The first international emissions trading system implemented under this regulation means that products in carbon-intensive industries, such as cement, iron and steel, aluminium, and fertiliser, that are produced outside the European Union and exported into it must be priced according to their carbon costs.
Preserving the EU market is very important for Türkiye, as 40% of its exports are sent to EU countries. The Final Declaration of Türkiye’s first Climate Council, published in 2022 by the Ministry of Environment, Urbanisation and Climate Change, acts as a roadmap for combating climate change over the next 100 years. According to the Final Declaration, which provides the infrastructure for short, medium and long term plans in line with the 2053 net zero emission target and green development aims, an emissions trading system (ETS) pilot will be implemented in our country in 5-year stages starting from 2024, in accordance with the CBAM calendar implemented by the European Union. To avoid double carbon pricing, the creation of a roadmap suited to national circumstances for converting existing taxes into carbon taxes through revaluation is planned to be completed by 2025 within the scope of the ETS implementation.
How many emails have you and/or your team written today? How many emails accumulate in your mailbox that you have to keep a record of? Do you work in an email network where attachments constantly have to be sent and received? If data is gathered per department, company or country, your answers to these simple questions will reveal a lot of information. One potential piece of information which could be derived from the answers is the grams of CO2e you and/or your team produce in one day just by sending emails. Studies have found that one standard email equates to around 5-10 grams of CO2e. Attach large files to the email, and this number doubles or triples instantly. Gathering, sending, and especially saving data creates emissions and ultimately contributes to harming our environment. The average numbers per capita for companies in finance, insurance, and service industries are higher than in other industries. They increase even further in retail and fast-moving consumer goods industries, which analyse and use big data to understand customer habits and increase service quality. This is also true for cloud computing, where most of the data is hosted today and which is still experiencing immense growth. There are already some early initiatives to make cloud computing “green” by focusing on the environmental impact of data centres. Nevertheless, there are still some hurdles to be cleared in order to achieve the goal of green data, which cannot be realized at the data centre level alone.
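To make this more tangible, here is a minimal back-of-the-envelope sketch in Python of how a team could estimate its daily email footprint. The team size, email volume, attachment share, and emission factor are illustrative assumptions derived from the 5-10 gram range quoted above, not measured values.

```python
# Rough estimate of a team's daily email-related CO2e footprint.
# All inputs are illustrative assumptions, not measured values.

TEAM_SIZE = 25                  # people in the team (assumption)
EMAILS_PER_PERSON_PER_DAY = 40  # sent emails per person per day (assumption)
G_CO2E_STANDARD_EMAIL = 7.5     # midpoint of the 5-10 g range quoted above
ATTACHMENT_MULTIPLIER = 2.5     # large attachments roughly double or triple the footprint
SHARE_WITH_ATTACHMENTS = 0.2    # share of emails carrying large attachments (assumption)

def daily_email_co2e_grams() -> float:
    emails = TEAM_SIZE * EMAILS_PER_PERSON_PER_DAY
    plain = emails * (1 - SHARE_WITH_ATTACHMENTS) * G_CO2E_STANDARD_EMAIL
    heavy = emails * SHARE_WITH_ATTACHMENTS * G_CO2E_STANDARD_EMAIL * ATTACHMENT_MULTIPLIER
    return plain + heavy

if __name__ == "__main__":
    grams = daily_email_co2e_grams()
    print(f"Estimated team footprint: {grams:.0f} g CO2e per day "
          f"(~{grams * 220 / 1000:.1f} kg over 220 working days)")
```

With these assumed inputs the team would emit roughly 10 kg of CO2e per day from email alone; the point is not the exact figure but that the order of magnitude becomes visible once it is measured.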
In the past, companies - especially in big tech - have gathered as much information as they could find. For simplification, let’s use the image of a giant hoover gathering all the information available, even though you are only after that one needle you have been looking for in your carpet for a long time and that keeps stabbing you in the foot. Clearly, you never know if there is a second needle which might stab you in the future (to relate this to the business context, such a needle might be an issue that needs to be solved or a piece of information you would need for a different, important analysis). Due to ever lower storage and computation costs, widespread data gathering and storage made sense in the past; however, the situation is changing.
The “hoover” data gathering strategy has two main negative side effects, which need to be accounted for:
SNR (Signal to Noise Ratio) – Data analysts and companies can no longer detect the relevant information, i.e., the signal, within the sheer amount of data they are provided with. Companies become more concerned with deciding what to do with the data and complying with the storage and retention obligations established by law than with actually working with the data. The high volumes of data that need to be processed consume an immense amount of time from highly trained professionals, who are not only difficult to find in the current labor market but also expensive to retain. A possible solution for this issue, of course, would be to use computer/software-based technology. There is already a vast number of Big Data tools available on the market, but using one of these tools always runs the risk of high associated costs without creating a comparable impact on daily business. Even well-known and successful data tools need time and the right input variables to filter appropriate results from the huge data pool. This process also requires energy, which creates CO2e emissions and ultimately costs the company money. In addition, there is the risk that the tools only surface correlations rather than causal relationships in the data. The lack of causality in non-hypothesis-tested data could, in the worst case, even lead to wrong decisions that harm the company in the long run.
An example of misinterpreting data in this way could be the layout of retail shops. Imagine that a recent analysis of collected data finds that customers in shops with a larger footprint spend more time in the shop and make more impulse purchases, so a higher volume of purchases per customer and per square meter of retail space is achieved. One could now conclude, regardless of other factors, that stores with a larger footprint are more attractive from a sales point of view. Without a controlled experiment or hypothesis-based testing, the decision could be made to build all new stores on a larger footprint - even in the city. However, it is difficult to establish whether this relationship is causal. It could also be that people in rural areas spend more time in the store because they buy more per visit and travel longer distances for their weekly shopping. In addition, they might be more inclined to make impulse purchases due to other demographic factors. If you take the aggregated data and apply the same logic to city shops, the conclusion does not necessarily hold true.
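This pitfall can be illustrated with a small, purely synthetic example: pooled across locations, store size appears strongly correlated with basket value, yet within each location type the relationship disappears. All numbers below are invented for illustration only.

```python
# Synthetic illustration of a confounded correlation (invented numbers).
# Pooled across locations, larger stores seem to sell more per customer;
# within each location type, the relationship vanishes.
from statistics import correlation  # Python 3.10+

# (store_size_m2, basket_value_eur, location)
stores = [
    (400, 23, "city"),  (450, 25, "city"),  (500, 22, "city"),  (550, 24, "city"),
    (900, 42, "rural"), (950, 40, "rural"), (1000, 43, "rural"), (1050, 41, "rural"),
]

def corr(subset):
    sizes = [size for size, _, _ in subset]
    baskets = [basket for _, basket, _ in subset]
    return correlation(sizes, baskets)

pooled = corr(stores)
city = corr([s for s in stores if s[2] == "city"])
rural = corr([s for s in stores if s[2] == "rural"])

print(f"Pooled correlation (size vs basket): {pooled:.2f}")  # strong and misleading
print(f"Within city stores only:             {city:.2f}")    # essentially zero
print(f"Within rural stores only:            {rural:.2f}")   # essentially zero
```

The pooled correlation is driven entirely by the location variable; controlling for it (or running a controlled experiment) is what separates a genuine driver of sales from a coincidence in the data.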
Greenhouse gas emissions – Collecting and storing large amounts of data requires a lot of energy, which enlarges a company’s ecological footprint. We are observing a very positive trend, in that more and more companies are proactively addressing climate change. This also includes large providers of cloud computing services (e.g., Google Cloud, Microsoft Azure, AWS, and even Tencent), which actively market their clouds as "green". Marketing activities include publishing KPIs such as PUE (Power Usage Effectiveness) and the use of carbon offsets (especially in the US). Changing the way companies gather, use, and store data can have a significant effect in curbing CO2e emissions and reaching carbon targets. Gathering a huge amount of data which is ultimately not even needed costs the company money (e.g., electricity costs) and does not create value.
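For reference, PUE is defined as the total energy a facility consumes divided by the energy consumed by the IT equipment itself; the closer the value is to 1.0, the more efficient the data centre. A minimal sketch of the calculation, with purely illustrative input figures, could look like this:

```python
# Power Usage Effectiveness (PUE) = total facility energy / IT equipment energy.
# A value close to 1.0 means almost all energy goes into the IT load itself.
# The figures below are illustrative assumptions, not measurements.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# Example: a facility drawing 1,500 MWh/year in total for 1,000 MWh/year of IT load
print(f"PUE = {pue(1_500_000, 1_000_000):.2f}")  # -> 1.50
```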
Source: Republic of Türkiye Ministry of Environment, Climate Council 2022
Since we welcomed smartphones into our lives, our total annual energy consumption has increased by 0.4% as a result of charging these phones. According to a survey, the energy consumption of large data centres, of which there are more than 75 in Türkiye, increases every day. The total volume of data also increases every day, because the amount of data processed continues to rise and statistical and regulatory data must be stored. The question to be addressed is how to decide which pieces of information should be gathered and stored. A simple solution to this data problem is to ensure that only useful data is collected, that it is kept for a limited time, and that data which is not useful is deleted. But this only works to a limited extent, because use cases change over time. Two years down the line, missing data that could have been gathered earlier might increase project costs and duration significantly. In our opinion, there are three actions which help companies improve their data management, especially with a focus on decarbonization.
Plan ahead
Today, the average annual growth of the Fintech ecosystem in Türkiye is 14%, which requires companies to think about tomorrow’s usage rates now. For sure, that is easier said than done, but bear in mind that in our ever-changing world, keeping very long histories of data is only useful in very specific cases anyway. Planning in advance which data might be needed and which topics are very unlikely to be needed can already be done now. Beware, however, of the trap of collecting mindlessly: a focus on the most relevant information must be ensured.
Divide into hot and cold data
The decision to store "cold/dormant" data is one that has to be made consciously and is especially aimed at information that does not need to be directly and permanently accessible. Obviously, this consideration should already be included in the aforementioned use case planning. Data that is needed for projects with longer planning horizons and without tight deadlines can be stored cold in this way. Current offers from storage service providers show that cold storage costs on average half as much as hot storage, largely due to its lower energy consumption.
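As a simple illustration of the potential saving, the following sketch splits a hypothetical data estate into hot and cold tiers and compares the monthly storage bill before and after tiering. The volumes and prices are placeholders, with cold storage assumed to cost half as much as hot storage in line with the ratio mentioned above.

```python
# Hypothetical cost comparison: keeping everything hot vs. tiering into hot/cold.
# Volumes and prices are placeholders; cold storage is assumed to cost half as
# much as hot storage, in line with the ratio mentioned above.

TOTAL_TB = 500                 # total data volume in terabytes (assumption)
COLD_ELIGIBLE_SHARE = 0.6      # share of data that is rarely accessed (assumption)
HOT_PRICE_PER_TB_MONTH = 20.0  # EUR per TB per month (placeholder)
COLD_PRICE_PER_TB_MONTH = HOT_PRICE_PER_TB_MONTH / 2

all_hot = TOTAL_TB * HOT_PRICE_PER_TB_MONTH
tiered = (TOTAL_TB * (1 - COLD_ELIGIBLE_SHARE) * HOT_PRICE_PER_TB_MONTH
          + TOTAL_TB * COLD_ELIGIBLE_SHARE * COLD_PRICE_PER_TB_MONTH)

saving = all_hot - tiered
print(f"All hot: {all_hot:,.0f} EUR/month")
print(f"Tiered:  {tiered:,.0f} EUR/month")
print(f"Saving:  {saving:,.0f} EUR/month ({saving / all_hot:.0%})")
```

With these assumed inputs, moving 60% of the data to cold storage cuts the storage bill by roughly 30%; the real saving depends entirely on how much of a company's data is genuinely dormant.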
Implement agile data management
Of course, planning once and classifying data into different categories is not enough. Iterative and recurring processes to identify relevant data and determine the granularity of this data are the path to success. Following the motto that it is sometimes necessary to prune the tree so that it can grow again, agile data management should be implemented within companies to guarantee regular reevaluation and energy efficiency.
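What such a recurring review could look like in practice is sketched below: each dataset carries a use case and a last review date, and anything without an active use case or overdue for review is flagged for deletion, archiving, or a move to cold storage. The data model, review interval, and example inventory are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a recurring data-review step (illustrative, not prescriptive).
# Each dataset records its use case and last review; stale entries are flagged
# for archiving to cold storage or deletion.
from dataclasses import dataclass
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=180)  # review every six months (assumption)

@dataclass
class Dataset:
    name: str
    use_case: str | None   # None means no active use case
    last_reviewed: date
    hot: bool               # currently kept in hot storage

def review(datasets: list[Dataset], today: date) -> list[str]:
    actions = []
    for ds in datasets:
        overdue = today - ds.last_reviewed > REVIEW_INTERVAL
        if ds.use_case is None:
            actions.append(f"DELETE or archive '{ds.name}': no active use case")
        elif overdue and ds.hot:
            actions.append(f"REVIEW '{ds.name}': consider moving to cold storage")
    return actions

if __name__ == "__main__":
    inventory = [
        Dataset("web_clickstream_2019", None, date(2023, 1, 10), hot=True),
        Dataset("invoices_current_year", "regulatory retention", date(2024, 11, 1), hot=True),
        Dataset("sensor_raw_archive", "planned ML project", date(2023, 6, 1), hot=True),
    ]
    for action in review(inventory, date(2024, 12, 1)):
        print(action)
```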
Not following the trend of focused data collection, analysis, and storage will ultimately lead to high inefficiencies, useless analyses, expensive data storage, and a negative impact on the environment. With our proven methodology, your company can save up to 10-20% of its data holding costs and lower its carbon footprint significantly.
Yusuf Bulut, Mehmet Özenbaş and Michael Zeitelberger also contributed to this article.