Benchmarking power system optimization: CPU vs GPU

Power systems are getting increasingly complex with renewable energy integration, distributed generation, storage, growing demand, and more. Naturally, optimizing power systems is becoming more computationally intensive. While the gaming and AI industries have adopted GPUs to speed up intensive computation, adoption seems limited in power system planning. As an applied AI/ML researcher for energy and climate, I wanted to explore the extent to which GPU-based operations could speed up power system optimization.

I benchmarked the optimization of power systems of different sizes using CPU- and GPU-based approaches. I modeled power systems of various node counts using PyPSA (a popular Pythonic framework) and a bespoke, minimal GPU-based setup. PyPSA was chosen for its simplicity of implementation and growing adoption.

The PowerSystem class models the fundamental components of an electrical grid (a PyPSA sketch of an equivalent system follows the list below):

  • Buses (nodes) representing connection points
  • Generators with specified capacities and costs
  • Loads (power demand) at various points
  • Transmission lines with physical parameters (reactance, resistance, capacity)
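
The gist's own PowerSystem class is not reproduced here; below is a minimal sketch of how an equivalent linear chain system could be assembled directly in PyPSA, following the 100-node configuration described in the Appendix. It is a simplified illustration rather than the exact benchmark code, and the component names (B0, G0, Line0, ...) are illustrative.

    import numpy as np
    import pypsa

    n_nodes = 100
    net = pypsa.Network()

    # Buses B0..B99 forming a linear chain
    for i in range(n_nodes):
        net.add("Bus", f"B{i}")

    # One generator and one load per bus (values follow the Appendix configuration)
    for i in range(n_nodes):
        net.add("Generator", f"G{i}", bus=f"B{i}",
                p_nom=1000, marginal_cost=50 + 100 * i / (n_nodes - 1))
        net.add("Load", f"L{i}", bus=f"B{i}",
                p_set=500 + 100 * np.sin(2 * np.pi * i / n_nodes))

    # Transmission lines connecting adjacent buses
    for i in range(n_nodes - 1):
        net.add("Line", f"Line{i}", bus0=f"B{i}", bus1=f"B{i+1}",
                x=0.1, r=0.01, s_nom=1000)

    # Solve the linear optimal power flow; requires a linear solver such as HiGHS.
    # On older PyPSA versions, net.lopf() plays the same role.
    net.optimize()
    print(net.objective)   # total generation cost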

A simple representation of the power system, with a linear chain network topology, is chosen for experimental purposes only. The Appendix details the configuration, using a 100-node system as an example.

A power system optimization problem typically involves finding the most cost-effective way to meet electrical demand across a network while respecting physical constraints like transmission line capacities and generator limitations. This is a linear programming (LP) problem in which we minimize generation costs subject to power flow and capacity constraints.
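
In compact form, the problem can be written as follows (a generic DC optimal power flow formulation in my own notation, not lifted from the gist):

    \begin{aligned}
    \min_{g,\,f,\,\theta} \quad & \sum_i c_i \, g_i && \text{total generation cost} \\
    \text{s.t.} \quad & g_b - d_b = \sum_\ell A_{b\ell} \, f_\ell && \text{power balance at every bus } b \\
    & f_\ell = \bigl(\theta_{m(\ell)} - \theta_{n(\ell)}\bigr) / x_\ell && \text{DC power-flow relation} \\
    & -F_\ell \le f_\ell \le F_\ell && \text{line capacity limits} \\
    & 0 \le g_i \le G_i^{\max} && \text{generator output limits}
    \end{aligned}

Here g_i are generator outputs, c_i their marginal costs, d_b the demand at bus b, A the bus-line incidence matrix, f_l the flow on line l with reactance x_l and capacity F_l, and theta the bus voltage angles.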

For the GPU-based optimization, I used the CuPy library. The main way a GPU speeds up computation is by dividing the work into batches and running them across a large number of parallel threads. I chose a batch size of 100 and used an OSQP solver configured for the GPU.
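
A minimal sketch of this idea is shown below: it stages the problem data with CuPy and hands a single small LP (an illustrative batch of 100 generators meeting a fixed demand) to OSQP. The exact batching and GPU-enabled solver configuration in the gist differ; here the GPU only stages the arrays before they are copied to the host, whereas GPU-enabled OSQP builds can keep the solve itself on the device. Variable names and values are illustrative.

    import cupy as cp
    import numpy as np
    import scipy.sparse as sparse
    import osqp

    n_gen = 100                               # illustrative batch of 100 generators
    cost = cp.linspace(50.0, 150.0, n_gen)    # linear cost gradient ($/MWh)
    p_max = cp.full(n_gen, 1000.0)            # generator capacity limits (MW)
    demand = 500.0 * n_gen                    # total demand to be met (MW)

    # LP in OSQP form: min q^T p  s.t.  l <= A p <= u  (no quadratic term)
    P = sparse.csc_matrix((n_gen, n_gen))     # zero quadratic cost
    q = cp.asnumpy(cost)                      # move GPU arrays to host for OSQP
    A = sparse.vstack([np.ones((1, n_gen)), sparse.eye(n_gen)]).tocsc()
    l = np.concatenate(([demand], np.zeros(n_gen)))      # balance + lower bounds
    u = np.concatenate(([demand], cp.asnumpy(p_max)))    # balance + upper bounds

    prob = osqp.OSQP()
    prob.setup(P, q, A, l, u, verbose=False)
    res = prob.solve()
    print("objective ($/h):", res.info.obj_val)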

The study tested both implementations across eight different system sizes:

  • Small systems: 10, 100 nodes
  • Medium systems: 500, 1,000, 2,000 nodes
  • Large systems: 5,000, 10,000, 20,000 nodes

For each size, the benchmark measured (a sketch of the timing loop follows the list):

  • Solution status and correctness
  • Execution time for both CPU and GPU implementations
  • Objective function values (total generation cost)
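
The benchmark loop itself is simple. Below is a hedged sketch assuming two hypothetical wrappers, solve_cpu(n) and solve_gpu(n), that build and solve an n-node chain system with the PyPSA and CuPy/OSQP implementations respectively; these names are illustrative, not from the gist.

    import time

    def benchmark(sizes, solve_cpu, solve_gpu):
        # solve_cpu(n) and solve_gpu(n) are assumed to return the optimal objective value.
        results = []
        for n in sizes:
            t0 = time.perf_counter()
            obj_cpu = solve_cpu(n)
            cpu_time = time.perf_counter() - t0

            t0 = time.perf_counter()
            obj_gpu = solve_gpu(n)
            gpu_time = time.perf_counter() - t0

            results.append({
                "size": n,
                "cpu_time": cpu_time,
                "gpu_time": gpu_time,
                "speedup": cpu_time / gpu_time,
                "obj_diff_pct": 100 * abs(obj_gpu - obj_cpu) / abs(obj_cpu),
            })
        return results

    sizes = [10, 100, 500, 1000, 2000, 5000, 10000, 20000]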

Summary of results: I logged the time taken by the CPU and GPU implementations, as well as the optimized objective values estimated by each. The optimized objective values are comparable for most sizes (except the smallest).

Size     CPU Time (s)   GPU Time (s)   Speedup   Objective Value Difference (%)
10       0.659          0.187          3.532     20.16
100      0.553          0.227          2.437     2.08
500      0.693          0.884          0.784     0.41
1000     1.203          1.772          0.679     0.21
2000     2.652          4.192          0.633     0.10
5000     27.465         13.599         2.020     0.04
10000    179.226        39.463         4.542     0.02
20000    1177.919       126.769        9.292     0.01

For the system sizes tested, the GPU generally speeds up computation, but the magnitude of the speedup varies with system size. The speedup is moderate for the smallest systems, drops below 1 for medium-sized systems, and rises dramatically for the largest ones.

The results reveal several important insights about GPU acceleration for power system optimization:

Performance Crossover Point

There appears to be a “crossover point” around 2,000 nodes where GPU acceleration becomes clearly advantageous. This suggests that:

  • For smaller systems, the overhead of GPU memory transfers may offset potential gains
  • Larger systems better utilize GPU parallelism, leading to substantial speedups

Scalability Characteristics

The GPU implementation shows superior scalability:

  • CPU time grows roughly quadratically with system size
  • GPU time grows more linearly, especially for larger systems
  • The speedup factor increases with system size, suggesting even better performance for very large systems

Implications

The results show a remarkable speedup from GPU-based optimization, favoring its use for large systems that require prompt optimization. However, GPUs consume more electricity and water, and environmental factors must be taken into account during implementation. Ultimately, it’s about the trade-off between time saved, GPU costs, and potential environmental costs.

Use the following GitHub gist for replication: https://gist.github.com/kshitizkhanal7/4bed7ac04f9f89f64c99a5d297a611b7

Appendix: Reference system with 100 nodes

The system represents a large-scale power transmission network with several key characteristics:

  1. System Structure
    • 100 buses (B0 through B99)
    • 100 generators (G0 through G99)
    • 100 loads (L0 through L99)
    • 99 transmission lines connecting adjacent buses
  2. Generator Characteristics Each generator has:
    • A maximum capacity of 1000 MW
    • A cost that varies linearly across the system:
      • G0 starts at 50 $/MWh
      • G99 ends at 150 $/MWh
      • Each generator’s cost increases by approximately 1 $/MWh. This cost gradient creates an interesting optimization problem where cheaper generators are preferred but transmission constraints may force the use of more expensive ones.
  3. Load Pattern Each load follows a sinusoidal pattern:
    • Base load of 500 MW
    • Variation of ±100 MW based on position
    • The formula P = 500 + 100*sin(2πi/100) creates a wave pattern across the system. This pattern mimics real-world load variations while maintaining mathematical tractability.
  4. Transmission Lines Each line connecting adjacent buses has:
    • Capacity of 1000 MW
    • Reactance (X) of 0.1 per unit
    • Resistance (R) of 0.01 per unit. These parameters create realistic power flow constraints.
  5. Optimization Problem Size The complete system creates a substantial optimization problem with:
    • 100 decision variables (generator outputs)
    • 99 line flow constraints
    • 100 power balance constraints
    • 100 generator capacity constraints. Total: ~400 constraints and ~100 variables
  6. Performance Results For this 100-node system, the benchmark showed:
    • CPU time: 0.553 seconds
    • GPU time: 0.227 seconds
    • Speedup factor: 2.437x
    • CPU objective: 4.717929e+06
    • GPU objective: 4.815897e+06