A more efficient method for training large language models, like those in the GPT series, can complete the process in the same amount of time while using up to 30% less energy, as revealed by a recent study from the University of Michigan.
This new approach could save enough energy to power roughly 1.1 million U.S. homes in 2026, based on Wells Fargo's projections of AI power demand. It could also chip away at the International Monetary Fund's forecast that data centers could account for 1.2% of global carbon emissions by 2027, along with the water consumption tied to that energy use.
Some experts argue that these environmental costs could be outweighed by AI's benefits: it could help fight climate change by optimizing supply chains, managing energy demand, and advancing climate research. Even so, unnecessary energy consumption remains a concern, especially when some of the energy used to train AI has no effect at all on training time or model accuracy.
“Why waste anything when there’s no benefit?” asked Mosharaf Chowdhury, an associate professor of computer science and engineering at U-M and corresponding author of the study presented at the 30th Symposium on Operating Systems Principles.
“We cannot continue to build larger and larger data centers due to power constraints. By reducing the energy used by AI, we can lessen its carbon footprint and cooling demands, allowing us to perform more computations within our existing energy limits.”
The energy waste stems from the uneven distribution of work across GPUs, the specialized processors built for large data and graphics workloads. Although dividing the work is necessary to handle massive datasets, it can introduce inefficiencies.
“Today’s AI models are so extensive that they cannot be accommodated by a single computer processor,” stated Jae-Won Chung, a U-M doctoral candidate in computer science and engineering and first author of the study. “These models must be segmented across thousands of processors for training, but achieving perfect equality in the division is virtually impossible.”
The difficulty in splitting training tasks evenly is that some tasks must be grouped together on the same processor, much as the volumes of a book series must stay together on one shelf. Depending on how the tasks are grouped, some processors can end up with far heavier workloads than others.
Current training methods run every processor at top speed, so those with lighter loads finish their computations before the rest. This does not speed up training, which ends only when every processor has finished its work; it simply wastes energy, because running a chip faster draws more power. Problems such as faulty hardware or network delays add to the waste by slowing down a single processor's computation.
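To make the imbalance concrete, consider a toy example. The per-GPU times and the setup below are illustrative assumptions, not figures from the study: each pipeline stage runs on its own GPU at full speed, yet a training step cannot finish until the slowest stage does, so lightly loaded GPUs burn power at full clock and then sit idle.

```python
# Toy illustration of pipeline stragglers; the numbers are made up, not from the study.
stage_times = {"gpu0": 0.8, "gpu1": 1.2, "gpu2": 1.0, "gpu3": 0.7}  # seconds of work per step

step_time = max(stage_times.values())  # the step ends only when the slowest stage finishes

for gpu, busy in stage_times.items():
    idle = step_time - busy
    print(f"{gpu}: busy {busy:.1f}s, idle {idle:.1f}s ({idle / step_time:.0%} of the step)")

# Running gpu0 or gpu3 at full speed does not shorten the step at all;
# those GPUs just finish early, then wait for gpu1.
```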
To mitigate energy waste, the researchers developed a software tool named Perseus, which identifies a critical path, or a sequence of subtasks that will take the longest to accomplish. Perseus then reduces the speed of processors not on the critical path, ensuring that they complete their tasks around the same time, thereby minimizing unnecessary energy consumption.
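A minimal sketch of that idea, continuing the toy numbers above, is shown below. It is not the actual Perseus implementation; it uses the common textbook approximation that a GPU's dynamic power scales roughly with the cube of its clock frequency, so doing the same work at a lower clock costs roughly the square of the speed fraction in energy. Idle power is ignored for simplicity, and the wattage is an assumption.

```python
# Illustrative critical-path-aware slowdown; not the actual Perseus code.
# Assumes the textbook DVFS model: power ~ f**3, so energy for fixed work ~ f**2.
stage_times = {"gpu0": 0.8, "gpu1": 1.2, "gpu2": 1.0, "gpu3": 0.7}  # seconds at full clock
P_MAX = 300.0  # watts at full clock (illustrative)

critical = max(stage_times.values())  # the critical-path stage sets the step time

for gpu, busy in stage_times.items():
    speed = busy / critical                # fraction of full clock; 1.0 on the critical path
    power_slow = P_MAX * speed ** 3        # power falls roughly with the cube of the clock
    time_slow = busy / speed               # the same work takes proportionally longer
    energy_full = P_MAX * busy             # energy if run at full clock, then left idle
    energy_slow = power_slow * time_slow   # net energy is ~speed**2 of the full-clock energy
    print(f"{gpu}: clock at {speed:.0%}, {energy_full:.0f} J -> {energy_slow:.0f} J "
          f"({1 - energy_slow / energy_full:.0%} saved), step time unchanged")
```

The savings in this toy model are exaggerated by its crude power model; the study's reported figure for real training runs is up to 30%.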
“Lowering the energy costs of AI can significantly affect the accessibility of AI technologies,” Chowdhury commented. “If a country lacks enough power to operate a large AI model, it might have to rely on distant services or be limited to using smaller, less accurate models. This could exacerbate existing inequalities among different communities.”
The team tested Perseus by using it to train GPT-3, three other large language models, and one computer vision model.
Perseus is an open-source tool available as part of Zeus, a toolkit for measuring and optimizing AI energy consumption.
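For context, energy measurement with Zeus typically looks something like the sketch below, based on the monitor interface described in its documentation; exact package, class, and attribute names may differ across versions, so treat this as an assumption rather than a verified recipe.

```python
# Sketch of GPU energy measurement with Zeus; names may vary across versions.
from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0])  # measure the first GPU

monitor.begin_window("one_training_step")
# ... run one training step here ...
measurement = monitor.end_window("one_training_step")

print(f"Energy: {measurement.total_energy:.1f} J over {measurement.time:.2f} s")
```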
This research received funding from several sources, including the National Science Foundation, the Dutch Research Council (NWO) Talent Programme, VMware, the Mozilla Foundation, Salesforce, and the Kwanjeong Educational Foundation. Computational resources were provided by Chameleon Cloud and CloudLab for this study.