Fig. 1: In the age of big data, artificial intelligence architectures such as neural networks are powerful tools for pattern recognition in large datasets - at an energy cost. (Source: Wikimedia Commons)
Artificial intelligence, or AI, is the field of study
that aims to develop computer systems with the ability to process
information and handle tasks that typically require human intelligence,
such as visual and auditory pattern recognition and decision-making.
[1] When we reflect on the ways in which advances in AI have impacted
our everyday lives, we may look to the facial recognition technology
unlocking our mobile phones and the ease with which we can now say,
"Hey, Google," when a random question pops into our heads. It is
undeniable that AI has vastly changed our lives, improved the
accessibility of information and learning, and advanced other fields of
study, such as medical imaging and autonomous vehicles. But at what
cost? The energy usage and costs associated with AI development, which
have been largely invisible to the public eye, are worth examining more
closely. [2]
Since its inception, the AI community has largely focused on obtaining state-of-the-art model accuracy, regardless of the cost or efficiency. This type of performance-based AI research has been labeled "Red AI." [3] In recent years, a "Green AI" movement has emerged to make computational efficiency a key criterion for AI models. [3] Presently, most AI models are "red," with the following cost scheme:

Cost(model) ∝ E × D × H,
where E is the cost of executing the model on a single example or instance, D is the size of the training dataset, and H is the number of hyperparameter experiments, which determines how many times the model is trained. In the above equation, the cost of training the AI model increases linearly with each of these factors. [3] A green AI model would improve efficiency in a number of ways, such as by reducing the electricity usage, total runtime, or number of parameters. One concrete measure of efficiency is the number of floating point operations (FPO) required to generate a result. The FPO of a model is often reported in units such as petaflops (PF, 10^15 floating point operations) and can be thought of as the computational analog of energy in joules, whereas the rate of computation (PF s^-1) is the analog of power. In the following, we will examine a case study that allows us to make a direct conversion of FPO to energy.
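As a concrete illustration of this cost scheme, the short Python sketch below computes the proportional cost proxy E × D × H described above; the input numbers are hypothetical placeholders chosen only for illustration, and the point is simply that doubling any one factor doubles the total training cost.

```python
# Minimal sketch of the "Red AI" cost proxy, Cost ∝ E * D * H (Schwartz et al. [3]).
# All numbers below are hypothetical placeholders, not measurements.

def training_cost_proxy(e_per_example, dataset_size, n_hyperparam_experiments):
    """Proportional training cost: per-example execution cost, times dataset
    size, times the number of hyperparameter experiments."""
    return e_per_example * dataset_size * n_hyperparam_experiments

base = training_cost_proxy(e_per_example=1.0, dataset_size=1_000_000,
                           n_hyperparam_experiments=10)
doubled_data = training_cost_proxy(1.0, 2_000_000, 10)

print(f"baseline cost proxy: {base:.3e}")
print(f"with 2x data:        {doubled_data:.3e}")  # exactly twice the baseline
```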
Fig. 2: Christofari, Russia's fastest supercomputer, designed to handle artificial intelligence algorithms and training models at high speeds. (Source: Wikimedia Commons)
Deep learning models, which comprise many network layers (see Fig. 1) and are typically trained on thousands of input images, require particularly extensive training times. [3,4] Unsurprisingly, this translates to high computational costs. In the span of 5 years, from AlexNet in 2012 to AlphaZero in 2017, the computational cost of state-of-the-art deep learning models increased from the order of 10^2 petaflops to 10^7 petaflops, a rate that doubles roughly every three to four months. This rapid increase in computational cost is projected to continue for the next decade. [3]
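As a rough sanity check on that growth rate, the sketch below back-calculates the implied doubling time from the quoted endpoints (10^2 PF in 2012 to 10^7 PF in 2017); the endpoint figures are the article's, and the arithmetic is only approximate.

```python
import math

# Back-of-the-envelope check: growth from ~1e2 PF (AlexNet, 2012) to
# ~1e7 PF (AlphaZero, 2017) over roughly 5 years (60 months).
start_pf, end_pf = 1e2, 1e7
months = 5 * 12

doublings = math.log2(end_pf / start_pf)   # ~16.6 doublings
doubling_time = months / doublings         # ~3.6 months per doubling
print(f"{doublings:.1f} doublings, i.e. one roughly every {doubling_time:.1f} months")
```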
Let us assume a machine utilization of 33% and the use of a top-tier graphics processing unit (GPU), the Nvidia GeForce RTX 3080, which has an average performance rating of 29.77 teraflops per second (TF s^-1) and a 320 W power rating. [3] Let us first calculate the number of petaflops (PF) of computation associated with running this GPU for 1 day at 33% utilization:
2.977 × 10^-2 PF s^-1 × 3600 s h^-1 × 24 h × 0.33 = 849 PF.
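The same arithmetic, expressed as a short Python sketch (assuming the RTX 3080's advertised 29.77 TF s^-1 throughput and the 33% utilization figure used above):

```python
# Petaflops of computation performed by one GPU in a day, per the assumptions above.
PEAK_TFLOPS_PER_S = 29.77        # Nvidia RTX 3080 advertised throughput, TF per second
UTILIZATION = 0.33               # assumed average machine utilization
SECONDS_PER_DAY = 3600 * 24

pf_per_day = (PEAK_TFLOPS_PER_S / 1000) * SECONDS_PER_DAY * UTILIZATION  # TF -> PF
print(f"{pf_per_day:.0f} PF per GPU-day")  # ~849 PF
```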
The energy associated with running this 320 W GPU for 1 day is:
320 J s^-1 × 8.64 × 10^4 s day^-1 = 2.76 × 10^7 J day^-1.
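And the corresponding energy, again as a minimal sketch using the 320 W power rating:

```python
# Energy drawn by a 320 W GPU running for one full day.
POWER_W = 320                    # Nvidia RTX 3080 rated board power, in J/s
SECONDS_PER_DAY = 3600 * 24      # 8.64e4 s

joules_per_day = POWER_W * SECONDS_PER_DAY
print(f"{joules_per_day:.2e} J per GPU-day")  # ~2.76e7 J
```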
Assuming the price of electricity in the US is $0.10 per kWh, or $2.78 × 10^-8 per joule, this energy usage corresponds to an economic cost of $0.77 per full day of processor use, or $9.05 × 10^-4 per PF. [5] Taking the reported FPO of AlphaZero (1860 PF), this corresponds to a total cost of only $1.68 for one training experiment spanning approximately two days, which seems fairly reasonable. [3] However, in practice, training deep learning models on a single GPU is not very time efficient. For large-scale model training, supercomputers (such as the one shown in Fig. 2) with more than 100,000 processors are used. [6] Using the Nvidia RTX 3080 specifications and assuming 1 × 10^5 processors, we obtain an energy cost of 2.76 × 10^12 joules and a cost of approximately $77,000 to run such a supercomputer for one day. Note that the iterated cost of several hyperparameter experiments would increase the energy and economic costs further.
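These cost estimates chain together as sketched below; the sketch simply reproduces the figures quoted above ($0.10 per kWh, AlphaZero's reported 1860 PF, and a hypothetical 10^5-processor machine), so the outputs should be read as rough orders of magnitude rather than precise figures.

```python
# Rough economic cost of the compute described above, using the figures in the text.
JOULES_PER_KWH = 3.6e6
PRICE_PER_KWH = 0.10                               # assumed US electricity price, $/kWh
PRICE_PER_JOULE = PRICE_PER_KWH / JOULES_PER_KWH   # ~2.78e-8 $/J

joules_per_gpu_day = 320 * 8.64e4    # ~2.76e7 J, from the previous step
pf_per_gpu_day = 849                 # PF of compute per GPU-day at 33% utilization

cost_per_gpu_day = joules_per_gpu_day * PRICE_PER_JOULE   # ~$0.77
cost_per_pf = cost_per_gpu_day / pf_per_gpu_day           # ~$9e-4 per PF

alphazero_pf = 1860                  # reported FPO of one AlphaZero training run
print(f"one AlphaZero training run: ~${alphazero_pf * cost_per_pf:.2f}")

n_processors = 1e5                   # hypothetical supercomputer scale
print(f"supercomputer, one day:     ~${n_processors * cost_per_gpu_day:,.0f}")
```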
If we zero in on the AlphaZero model again as a case study, the number of joules burned for a training experiment lasting approximately two days is roughly 6 × 10^7 joules. Taking an average (50th percentile) carbon emission of burning coal as 1 kg CO2 per kWh, this corresponds to approximately 16.8 kg of CO2 emitted. [7] While this emission level may seem acceptable for one experiment, let us compare it to the carbon footprint of training multiple deep learning models on a supercomputer for one day. An energy cost of 2.76 × 10^12 joules corresponds to almost 7.7 × 10^5 kg, or 770 metric tonnes, of CO2 emitted per day (this conversion is sketched below). [7] This carbon footprint is quite alarming, especially considering the rapid FPO doubling rate as AI models become increasingly computationally expensive. There have been efforts to examine energy-aware scheduling in supercomputing facilities to improve power consumption efficiency, and a push for green(er) algorithms may slow the rapid growth of energy consumption for AI training, but these measures will not be sufficient if an overwhelming majority of AI research continues to underreport (or omit reporting) its energy costs. As climate change and the environmental impact of technological advancements become an increasingly vital point of concern, the ethics underlying the tradeoff between performance and sustainability are more important than ever. While we celebrate a world-class chess-playing AI program and demand advancements in autonomous vehicle collision avoidance, a troubling question surfaces: in our capitalistic push for faster and more accurate AI systems to improve our livelihood today, are we sacrificing the future of generations to come?
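For completeness, the carbon figures above follow from the same energy numbers; this sketch assumes only the cited 50th-percentile coal emission factor of 1 kg CO2 per kWh.

```python
# Approximate CO2 emissions for the energy figures above, assuming coal generation
# at ~1 kg CO2 per kWh (50th-percentile figure cited in the text).
JOULES_PER_KWH = 3.6e6
KG_CO2_PER_KWH = 1.0

def co2_kg(joules):
    """Convert an energy figure in joules to kg of CO2 emitted."""
    return joules / JOULES_PER_KWH * KG_CO2_PER_KWH

joules_per_gpu_day = 320 * 8.64e4          # ~2.76e7 J per GPU-day
alphazero_days = 1860 / 849                # ~2.2 GPU-days for the reported 1860 PF
alphazero_joules = alphazero_days * joules_per_gpu_day   # ~6e7 J

print(f"AlphaZero run:          {co2_kg(alphazero_joules):.1f} kg CO2")          # ~16.8 kg
print(f"supercomputer, one day: {co2_kg(1e5 * joules_per_gpu_day):.2e} kg CO2")  # ~7.7e5 kg
```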
© Elaine Lui. The author warrants that the work is the author's own and that Stanford University provided no input other than typesetting and referencing guidelines. The author grants permission to copy, distribute and display this work in unaltered form, with attribution to the author, for noncommercial purposes only. All other rights, including commercial rights, are reserved to the author.
[1] N. J. Nilsson, Principles of Artificial Intelligence (Morgan Kaufmann, 2014).
[2] B. Brevini, "Black Boxes, Not Green: Mythologizing Artificial Intelligence and Omitting the Environment," Big Data and Society 7, No. 2, (July-December 2020).
[3] R. Schwartz et al., "Green AI," Commun. ACM 63, 54 (2020).
[4] T.-J. Yang et al., "A Method to Estimate the Energy Consumption of Deep Neural Networks," 51st Asilomar Conference on Signals, Systems, and Computers, IEEE 8335698, 29 Oct 17.
[5] "Electric Power Monthly December 2008," DOE/EIA-0226 (2008/12), U.S. Energy Information Administration, December 2008.
[6] Supercomputers: Directions in Technology and Applications (National Academies Press, 1989), pp. 35-47.
[7] W. Moomaw et al., "Annex II: Methodology," in IPCC Special Report on Renewable Energy Sources and Climate Change Mitigation, ed. by O. Edenhofer et al. (Cambridge University Press, 2011).