Time for modelers to think about hardware as part of the model

Most of the energy/systems modeling tools I have used, from engineering school to today, have been desktop applications (mostly Windows, followed by Linux). Typically, I’d define my model within a popular desktop package, then compile and run it on the same PC to get the results. I reported the numbers and never had to think about the hardware. Thinking about hardware was not part of the modeler’s job description.

Many models these days run on the vendor’s cloud, with an API linking the user to software that sits on hardware controlled by the vendor. The modeler’s PC is only a terminal. The vendor thinks about the hardware; the modeler models. Thinking about hardware is still not part of a modeler’s job description. Then again, modeling used to be a high-level endeavor in which modelers thought about the problem and its mathematical formulation within a modeling platform.

The accessibility in definition enabled by general-purpose programming languages

Modeling is not merely a high-level abstraction anymore. For me, the switch to lower-level programming for modeling came with the increasing use of data science and machine learning-based approaches in energy and systems planning/modeling. Using Python/R packages instead of a GUI unlocked capabilities not only to build data-driven models, but also to define simple energy system models outside the confines of proprietary software. Models developed using open-source packages in Python or Julia (and to some extent R, JavaScript, and Rust, among others) have become increasingly competitive against expensive proprietary models.

The accessibility in computing enabled by availability of GPUs and HPCs

Lately, I have also been tinkering with energy modeling using a GPU. See, for example, this blog post on benchmarking power system optimization using a GPU vs a CPU. As GPUs and HPCs have become more accessible, they enable a big leap in computing through massive parallelization.

However, parallelization is not only about using N times the processing units simultaneously. The architecture and system design trade-offs in using GPUs or HPCs force choices on the modeler. In exploiting parallelization, the GPU changes the arithmetic, more specifically the precision. Scientific computing on CPUs typically defaults to FP64 (64-bit floating-point precision), while GPUs deliver most of their throughput in FP32. For reference, FP64 can distinguish between the numbers 1.000000000000000 and 1.000000000000001, but FP32 can only distinguish between 1.000000 and 1.000001. At first glance, the difference seems negligible. But with the thousands of matrix multiplications that go into a model, the rounding errors accumulate. Slightly different gradients lead to slightly different positions in the solution space. The final optimal output provided by the model differs. Meaningfully.
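The resolution gap between the two formats is easy to see directly. A minimal NumPy sketch (the machine epsilons below are standard for IEEE 754 single and double precision):

```python
import numpy as np

# Machine epsilon: the smallest relative spacing each format can represent
print(np.finfo(np.float64).eps)   # ~2.22e-16, i.e. ~15-16 significant digits
print(np.finfo(np.float32).eps)   # ~1.19e-07, i.e. ~7 significant digits

# A 1e-8 perturbation survives in FP64 but is rounded away in FP32
a64 = np.float64(1.0) + np.float64(1e-8)
a32 = np.float32(1.0) + np.float32(1e-8)
print(a64 == np.float64(1.0))     # False: FP64 still sees the difference
print(a32 == np.float32(1.0))     # True: FP32 has already lost it
```

Every intermediate product in a chain of FP32 matrix multiplications is rounded at that coarser spacing, which is where the accumulation comes from.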

For a simple demonstration, I tested this on a gas power plant model. Specifically, a heat rate curve, which is a simple formula that tells you how efficiently a plant burns fuel at different power levels. Think of it as a U-shaped curve: the plant has a sweet spot where it burns fuel most efficiently and gets less efficient as you push it harder or back it off. That sweet spot, and the shape of the curve around it, directly determines the plant’s fuel bill and its bid into the energy market. The model is a quadratic equation fit to 350 observations of simulated plant operating data. I used the same optimizer, the same loss function, the same data. The only thing I changed between the two runs was the arithmetic precision: FP64 on one run, FP32 on the other.
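A sketch of that experiment, with made-up curve coefficients and noise levels (the actual data, optimizer, and results are in the linked demo; everything below is illustrative): gradient descent fits the same quadratic twice, once per precision, and the recovered optimum can differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heat-rate curve (MMBtu/MWh) with its minimum near 345 MW;
# the coefficients and noise level are assumptions, not the post's data
P = rng.uniform(100, 500, 350)                       # 350 simulated operating points (MW)
y = 12.0 - 0.02 * P + 2.9e-5 * P**2 + rng.normal(0, 0.05, 350)

def fit_optimum(dtype):
    """Fit heat_rate = c0 + c1*p + c2*p^2 by gradient descent in one precision."""
    p = (P / 500.0 - 0.6).astype(dtype)              # scale/center to condition the problem
    yv = y.astype(dtype)
    X = np.stack([np.ones_like(p), p, p * p], axis=1)
    w = np.zeros(3, dtype=dtype)
    lr = dtype(0.5)
    for _ in range(20_000):                          # same optimizer, loss, data, iterations
        grad = (2.0 / len(yv)) * (X.T @ (X @ w - yv))
        w = w - lr * grad.astype(dtype)
    _, c1, c2 = w
    return float((-c1 / (2 * c2) + 0.6) * 500.0)     # vertex of the quadratic, back in MW

opt64 = fit_optimum(np.float64)
opt32 = fit_optimum(np.float32)
print(opt64, opt32)                                  # two answers to "where is the plant most efficient?"
```

The gap between the two printed optima here will not match the post’s 1.2 MW figure; the point is that the only changed input is the dtype.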

The two models disagreed about where the plant runs most efficiently. FP64 said 346.6 MW; FP32 said 345.4 MW. About one megawatt apart on a 500 MW plant. That sounds small until you price it out: at a typical dispatch point of 350 MW, the difference in estimated fuel cost between the two models comes to $545,000 per year, assuming $4.50/MMBtu gas and 8,000 operating hours a year, for one plant.

I ran 200 bootstrap samples on each platform to check whether any of this was noise. It was not. FP32 was consistently less certain about where the optimum efficiency was: the spread of estimates across 200 resamples was 18.1 MW for FP32 versus 11.9 MW for FP64. That is more uncertainty about the number that determines dispatch bids. The interactive demonstration, with all charts and results, is available here.
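The bootstrap procedure itself is simple to sketch: resample the observations with replacement, refit, and measure the spread of the recovered optimum. (Synthetic data and a plain least-squares fit below, so the FP32/FP64 gap will be far smaller than with the post's iterative optimizer; the procedure, not the numbers, is the point.)

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.uniform(100, 500, 350)                        # simulated operating data (assumed)
y = 12.0 - 0.02 * P + 2.9e-5 * P**2 + rng.normal(0, 0.05, 350)

def optimum_mw(p, hr, dtype):
    # quadratic least-squares fit carried out in the given precision
    pn = (p / 500.0).astype(dtype)                    # scale to condition the problem
    X = np.stack([np.ones_like(pn), pn, pn * pn], axis=1)
    coef, *_ = np.linalg.lstsq(X, hr.astype(dtype), rcond=None)
    return float(-coef[1] / (2.0 * coef[2]) * 500.0)  # vertex, back in MW

spread = {}
for dtype in (np.float64, np.float32):
    estimates = []
    for _ in range(200):                              # 200 bootstrap resamples
        idx = rng.integers(0, len(P), len(P))         # sample with replacement
        estimates.append(optimum_mw(P[idx], y[idx], dtype))
    spread[dtype.__name__] = float(np.ptp(estimates)) # width of the estimate cloud, MW
print(spread)
```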

The accessibility in upskilling and software development enabled by agentic coding

Agentic coding has lowered the floor for building low-level software. Modelers are well placed to take advantage of this because their understanding of numbers, constraints, and system behavior translates directly into thinking like a software developer. The development cycle is shorter, upskilling is faster, and unlocking massive parallelization on GPUs and HPCs no longer requires a deep systems programming background.

But with that accessibility comes a responsibility that did not exist when modeling was a desktop endeavor. We generally think of a model as a mathematical artifact, and of hardware as a neutral substrate: something the model runs on, not something that shapes what the model is. In practice, hardware behaves more like an implicit hyperparameter: not set, not tuned, not documented, but active in every gradient computation. The emerging paradigm asks modelers to become more aware of the speed-precision trade-off, to take reproducibility seriously across hardware environments, and to think harder about what a model output means when a different machine might return a different answer. Before we can fully specify the model, perhaps we need to specify the machine.

Benchmarking power system optimization: CPU vs GPU

Power systems are getting increasingly complicated with renewable energy integration, distributed generation, storage, growing demand, and more. Naturally, optimizing power systems is getting more computationally intensive. While the gaming and AI industries have adopted GPUs to speed up intensive computation, adoption seems limited in power system planning. As an applied AI/ML researcher for energy and climate, I wanted to explore the extent to which GPU-based operations could speed up power system optimization.

I benchmarked the optimization of power systems of different sizes using CPU- and GPU-based approaches. I modeled power systems of various node counts using PyPSA (a popular Pythonic framework) and a bespoke minimal GPU-based setup. PyPSA was chosen for its simplicity of implementation and growing adoption.

The PowerSystem class that models the fundamental components of an electrical grid includes:

  • Buses (nodes) representing connection points
  • Generators with specified capacities and costs
  • Loads (power demand) at various points
  • Transmission lines with physical parameters (reactance, resistance, capacity)

A simple representation of the power system is chosen, with a linear chain network topology, for experimental purposes only. The Appendix details the configuration, with a 100-node system as an example.

A power system optimization problem typically involves finding the most cost-effective way to meet electrical demand across a network while respecting physical constraints like transmission line capacities and generator limitations. This is a linear programming (LP) problem in which we minimize generation costs subject to power flow and capacity constraints.
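As a concrete sketch, here is that LP in simple transport form for a tiny five-bus chain, solved with `scipy.optimize.linprog` (sizes and costs are illustrative; the benchmark’s actual formulation, including line reactances, is in the gist):

```python
import numpy as np
from scipy.optimize import linprog

N = 5
cost = np.linspace(50, 150, N)          # $/MWh, cheapest generator at bus 0
demand = np.full(N, 500.0)              # MW at each bus
gmax, fmax = 1000.0, 1000.0             # generator and line capacities, MW

# Variables: [g_0..g_{N-1}, f_0..f_{N-2}], f_i = flow from bus i to bus i+1
n_g, n_f = N, N - 1
c = np.concatenate([cost, np.zeros(n_f)])   # flows carry no direct cost

# Nodal power balance at each bus: g_i + f_{i-1} - f_i = demand_i
A_eq = np.zeros((N, n_g + n_f))
for i in range(N):
    A_eq[i, i] = 1.0
    if i > 0:
        A_eq[i, n_g + i - 1] = 1.0      # inflow from the left neighbor
    if i < N - 1:
        A_eq[i, n_g + i] = -1.0         # outflow to the right neighbor

bounds = [(0, gmax)] * n_g + [(-fmax, fmax)] * n_f
res = linprog(c, A_eq=A_eq, b_eq=demand, bounds=bounds)
print(res.status, res.fun)              # status 0 = optimal
```

The solver loads the cheap generators first and exports power rightward until line capacities bind, which is exactly the cheap-versus-constrained tension the benchmark systems are built around.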

For the GPU-based optimization, I used the CuPy library. GPUs speed up computation mainly by dividing it into batches and running them across a large number of parallel processing units. I chose a batch size of 100 and used an OSQP solver configured for the GPU.

The study tested both implementations across eight different system sizes:

  • Small systems: 10, 100 nodes
  • Medium systems: 500, 1,000, 2,000 nodes
  • Large systems: 5,000, 10,000, 20,000 nodes

For each size, the benchmark measured:

  • Solution status and correctness
  • Execution time for both CPU and GPU implementations
  • Objective function values (total generation cost)

Summary of results: I logged the time taken by the CPU and GPU, as well as the optimized objective values each produced. The optimized objective values are comparable for most sizes (except the smallest).

Size     CPU Time (s)   GPU Time (s)   Speedup   Objective Value Difference (%)
10       0.659          0.187          3.532     20.16
100      0.553          0.227          2.437     2.08
500      0.693          0.884          0.784     0.41
1000     1.203          1.772          0.679     0.21
2000     2.652          4.192          0.633     0.10
5000     27.465         13.599         2.020     0.04
10000    179.226        39.463         4.542     0.02
20000    1177.919       126.769        9.292     0.01

While I found that, for the system sizes chosen, the GPU generally speeds up computation, the magnitude differs by system size. The speedup is relatively high for the smallest systems, drops below 1 for medium sizes, and then climbs dramatically for larger sizes.

The results reveal several important insights about GPU acceleration for power system optimization:

Performance Crossover Point

There appears to be a “crossover point” around 2,000 nodes where GPU acceleration becomes clearly advantageous. This suggests that:

  • For smaller systems, the overhead of GPU memory transfers may offset potential gains
  • Larger systems better utilize GPU parallelism, leading to substantial speedups

Scalability Characteristics

The GPU implementation shows superior scalability:

  • CPU time grows roughly quadratically with system size
  • GPU time grows more linearly, especially for larger systems
  • The speedup factor increases with system size, suggesting even better performance for very large systems

Implications

The results show remarkable speedup using GPU-based optimization, favoring its use in large systems that require prompt optimization. However, GPUs consume more electricity and water, and these environmental factors must be taken into account during implementation. Ultimately, it is a trade-off between time saved, GPU costs, and potential environmental costs.

The following GitHub gist can be used for replication: https://gist.github.com/kshitizkhanal7/4bed7ac04f9f89f64c99a5d297a611b7

Appendix: Reference system with 100 nodes

The system represents a large-scale power transmission network with several key characteristics:

  1. System Structure
    • 100 buses (B0 through B99)
    • 100 generators (G0 through G99)
    • 100 loads (L0 through L99)
    • 99 transmission lines connecting adjacent buses
  2. Generator Characteristics. Each generator has:
    • A maximum capacity of 1000 MW
    • A cost that varies linearly across the system:
      • G0 starts at 50 $/MWh
      • G99 ends at 150 $/MWh
      • Each generator’s cost increases by approximately 1 $/MWh. This cost gradient creates an interesting optimization problem where cheaper generators are preferred but transmission constraints may force the use of more expensive ones.
  3. Load Pattern. Each load follows a sinusoidal pattern:
    • Base load of 500 MW
    • Variation of ±100 MW based on position
    • The formula P = 500 + 100*sin(2πi/100) creates a wave pattern across the system. This pattern mimics real-world load variations while maintaining mathematical tractability.
  4. Transmission Lines. Each line connecting adjacent buses has:
    • Capacity of 1000 MW
    • Reactance (X) of 0.1 per unit
    • Resistance (R) of 0.01 per unit. These parameters create realistic power flow constraints.
  5. Optimization Problem Size. The complete system creates a substantial optimization problem with:
    • 100 decision variables (generator outputs)
    • 99 line flow constraints
    • 100 power balance constraints
    • 100 generator capacity constraints
    Total: ~400 constraints and ~100 variables
  6. Performance Results. For this 100-node system, the benchmark showed:
    • CPU time: 0.553 seconds
    • GPU time: 0.227 seconds
    • Speedup factor: 2.437x
    • CPU objective: 4.717929e+06
    • GPU objective: 4.815897e+06
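The specification above translates directly into data arrays. A NumPy-only sketch of the reference system’s inputs (the gist builds the same system through its PowerSystem class):

```python
import numpy as np

N = 100
gen_cost = np.linspace(50.0, 150.0, N)        # $/MWh: G0 at 50, G99 at 150
gen_cap = np.full(N, 1000.0)                  # MW per generator
i = np.arange(N)
load = 500.0 + 100.0 * np.sin(2 * np.pi * i / N)   # MW: base 500, +/-100 wave
line_cap = np.full(N - 1, 1000.0)             # MW, 99 lines in a chain
line_x = np.full(N - 1, 0.1)                  # reactance, per unit
line_r = np.full(N - 1, 0.01)                 # resistance, per unit

print(load.min(), load.max())                 # ~400 and ~600 MW
print(load.sum())                             # ~50000 MW: the sine wave cancels over a full period
```

Total demand (~50,000 MW) sits well below total capacity (100,000 MW), so the optimization is driven by the cost gradient and line limits rather than scarcity.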