Curating LLM Tuning Data from the FineWeb Dataset for High-fidelity Domain Adaptation

We created a post-training dataset from the FineWeb dataset for high-fidelity domain adaptation of an open-weight LLM (Google Flan). Parameter-efficient fine-tuning through prompt tuning produced a marked improvement in perplexity scores and demonstrated that the tuned model can generalize from information in the tuning dataset.
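For context, the sketch below shows what prompt tuning with the Hugging Face peft library can look like. The checkpoint name, initialization prompt, and number of virtual tokens are illustrative assumptions, not the configuration used in this work.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

# Assumed checkpoint for illustration; the actual Flan variant is not specified here.
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt tuning trains only a small set of "virtual token" embeddings,
# leaving the base model weights frozen (parameter-efficient fine-tuning).
peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Answer questions using the curated domain corpus:",  # illustrative
    num_virtual_tokens=20,  # illustrative value
    tokenizer_name_or_path=model_name,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # shows how few parameters are actually trained
```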

The work was selected for an oral presentation at AGU24. Slides are attached.

AGU-LLM-talk

Benchmarking power system optimization: CPU vs GPU

Power systems are getting increasingly complicated with renewable energy integration, distributed generation, storage, increasing demand, and more. Naturally, optimizing power systems is becoming more computationally intensive. While the gaming and AI industries have adopted GPUs to speed up intensive computation, adoption seems limited in power system planning. As an applied AI/ML researcher for energy and climate, I wanted to explore the extent to which GPU-based operations could speed up power system optimization.

I benchmarked the optimization of power systems of different sizes using CPU- and GPU-based approaches. I modeled power systems of various node counts using PyPSA (a popular Pythonic framework) and a bespoke, minimal GPU-based setup. PyPSA was chosen for its simplicity of implementation and growing adoption.

The PowerSystem class, which models the fundamental components of an electrical grid, includes:

  • Buses (nodes) representing connection points
  • Generators with specified capacities and costs
  • Loads (power demand) at various points
  • Transmission lines with physical parameters (reactance, resistance, capacity)

A simple representation of the power system with a linear-chain network topology is chosen for experimental purposes only. The Appendix details the configuration, with a 100-node system chosen as an example. A minimal sketch of such a PowerSystem class follows.
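Below is an illustrative sketch of what such a container might look like. The field and function names are assumptions for this post, not the exact class used in the benchmark; the default values mirror the Appendix configuration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PowerSystem:
    """Minimal container for a linear-chain power system (illustrative only)."""
    n_buses: int
    gen_capacity: np.ndarray    # MW, one generator per bus
    gen_cost: np.ndarray        # $/MWh, one cost per generator
    load: np.ndarray            # MW demand per bus
    line_capacity: np.ndarray   # MW limit for each of the n_buses - 1 lines
    line_reactance: np.ndarray  # per unit
    line_resistance: np.ndarray # per unit

def chain_system(n: int) -> PowerSystem:
    """Build an n-bus chain resembling the Appendix configuration."""
    i = np.arange(n)
    return PowerSystem(
        n_buses=n,
        gen_capacity=np.full(n, 1000.0),
        gen_cost=np.linspace(50.0, 150.0, n),
        load=500.0 + 100.0 * np.sin(2 * np.pi * i / n),
        line_capacity=np.full(n - 1, 1000.0),
        line_reactance=np.full(n - 1, 0.1),
        line_resistance=np.full(n - 1, 0.01),
    )
```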

A power system optimization problem typically involves finding the most cost-effective way to meet electrical demand across a network while respecting physical constraints like transmission line capacities and generator limitations. This is a linear programming (LP) problem in which we minimize generation costs subject to power flow and capacity constraints.
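To make the formulation concrete, here is an illustrative LP for a tiny 4-bus chain using SciPy's linprog. This is not the benchmark code (that used PyPSA and a CuPy/OSQP setup); it only restates the same problem: for a lossless chain, the flow on line k is the cumulative generation minus demand up to bus k, and we minimize total generation cost subject to line and generator limits.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny 4-bus chain, purely illustrative (not the benchmark configuration).
n = 4
cost = np.array([50.0, 60.0, 70.0, 80.0])        # $/MWh per generator
demand = np.array([500.0, 400.0, 450.0, 550.0])  # MW per bus
g_max = np.full(n, 1000.0)                       # generator limits, MW
line_cap = np.full(n - 1, 1000.0)                # line limits, MW

# For a chain, flow on line k = sum over buses 0..k of (generation - demand).
A = np.tril(np.ones((n - 1, n)))                 # cumulative-sum matrix
A_ub = np.vstack([A, -A])                        # |flow_k| <= line_cap
b_ub = np.concatenate([line_cap + A @ demand, line_cap - A @ demand])
A_eq = np.ones((1, n))                           # total generation = total demand
b_eq = np.array([demand.sum()])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, g) for g in g_max], method="highs")
print(res.status, res.fun)  # status 0 = optimal; res.fun is total generation cost
```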

For the GPU-based optimization, I used the CuPy library. The main way a GPU speeds up computation is by dividing the work into batches and running them across a large number of parallel threads. I chose a batch size of 100 and used an OSQP solver configured for GPU execution.
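The batching idea can be illustrated with CuPy as below. This is a hypothetical helper, not the benchmark's actual solver code (which relied on OSQP); it simply evaluates a constraint matrix-vector product on the GPU in row batches of 100.

```python
import cupy as cp

def batched_matvec_gpu(A, x, batch_size=100):
    """Illustrative only: compute A @ x on the GPU in row batches,
    mirroring the batch size of 100 used in the benchmark."""
    A_gpu = cp.asarray(A)          # copy host arrays to GPU memory
    x_gpu = cp.asarray(x)
    out = cp.empty(A_gpu.shape[0], dtype=A_gpu.dtype)
    for start in range(0, A_gpu.shape[0], batch_size):
        stop = min(start + batch_size, A_gpu.shape[0])
        out[start:stop] = A_gpu[start:stop] @ x_gpu
    return cp.asnumpy(out)         # copy the result back to the host
```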

The study tested both implementations across eight different system sizes:

  • Small systems: 10, 100 nodes
  • Medium systems: 500, 1,000, 2,000 nodes
  • Large systems: 5,000, 10,000, 20,000 nodes

For each size, the benchmark measured:

  • Solution status and correctness
  • Execution time for both CPU and GPU implementations
  • Objective function values (total generation cost)

Summary of results: I logged the time taken and the optimized objective values for both the CPU and GPU runs. The optimized objective values are comparable for most sizes (except the smallest).

| Size (nodes) | CPU Time (s) | GPU Time (s) | Speedup | Objective Value Difference (%) |
|---|---|---|---|---|
| 10 | 0.659 | 0.187 | 3.532 | 20.16% |
| 100 | 0.553 | 0.227 | 2.437 | 2.08% |
| 500 | 0.693 | 0.884 | 0.784 | 0.41% |
| 1,000 | 1.203 | 1.772 | 0.679 | 0.21% |
| 2,000 | 2.652 | 4.192 | 0.633 | 0.10% |
| 5,000 | 27.465 | 13.599 | 2.020 | 0.04% |
| 10,000 | 179.226 | 39.463 | 4.542 | 0.02% |
| 20,000 | 1,177.919 | 126.769 | 9.292 | 0.01% |

While I found that, for the system sizes chosen, the GPU generally speeds up computation, the magnitude of the speedup varies with system size. The speedup is relatively high for smaller systems, drops below 1 for medium sizes, and increases dramatically again for larger sizes.

The results reveal several important insights about GPU acceleration for power system optimization:

Performance Crossover Point

There appears to be a “crossover point” around 2,000 nodes where GPU acceleration becomes clearly advantageous. This suggests that:

  • For smaller systems, the overhead of GPU memory transfers may offset potential gains
  • Larger systems better utilize GPU parallelism, leading to substantial speedups

Scalability Characteristics

The GPU implementation shows superior scalability:

  • CPU time grows roughly quadratically with system size
  • GPU time grows more linearly, especially for larger systems
  • The speedup factor increases with system size, suggesting even better performance for very large systems

Implications

The results show remarkable speedups from GPU-based optimization, favoring its use for large systems that require fast optimization. However, GPUs consume more electricity and water, and these environmental factors must be taken into account during implementation. Ultimately, it’s about the trade-off between time saved, GPU costs, and potential environmental costs.

Use the following GitHub gist for replication: https://gist.github.com/kshitizkhanal7/4bed7ac04f9f89f64c99a5d297a611b7

Appendix: Reference system with 100 nodes

The system represents a large-scale power transmission network with several key characteristics (an illustrative PyPSA sketch of this configuration follows the list):

  1. System Structure
    • 100 buses (B0 through B99)
    • 100 generators (G0 through G99)
    • 100 loads (L0 through L99)
    • 99 transmission lines connecting adjacent buses
  2. Generator Characteristics. Each generator has:
    • A maximum capacity of 1000 MW
    • A cost that varies linearly across the system:
      • G0 starts at 50 $/MWh
      • G99 ends at 150 $/MWh
      • Each generator’s cost increases by approximately 1 $/MWh
    This cost gradient creates an interesting optimization problem where cheaper generators are preferred, but transmission constraints may force the use of more expensive ones.
  3. Load Pattern. Each load follows a sinusoidal pattern:
    • Base load of 500 MW
    • Variation of ±100 MW based on position
    • The formula P = 500 + 100*sin(2πi/100) creates a wave pattern across the system
    This pattern mimics real-world load variations while maintaining mathematical tractability.
  4. Transmission Lines. Each line connecting adjacent buses has:
    • Capacity of 1000 MW
    • Reactance (X) of 0.1 per unit
    • Resistance (R) of 0.01 per unit
    These parameters create realistic power flow constraints.
  5. Optimization Problem Size. The complete system creates a substantial optimization problem with:
    • 100 decision variables (generator outputs)
    • 99 line flow constraints
    • 100 power balance constraints
    • 100 generator capacity constraints
    Total: ~400 constraints and ~100 variables
  6. Performance Results. For this 100-node system, the benchmark showed:
    • CPU time: 0.553 seconds
    • GPU time: 0.227 seconds
    • Speedup factor: 2.437x
    • CPU objective: 4.717929e+06
    • GPU objective: 4.815897e+06
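
For reference, here is an illustrative PyPSA sketch that assembles a chain system with the parameters above. It is not the exact benchmark script (see the linked gist for that), and it uses the newer n.optimize() entry point, which may be n.lopf() in older PyPSA versions.

```python
import numpy as np
import pypsa

n = pypsa.Network()
n_buses = 100

for i in range(n_buses):
    n.add("Bus", f"B{i}")
    n.add("Generator", f"G{i}", bus=f"B{i}",
          p_nom=1000,                                   # 1000 MW capacity
          marginal_cost=50 + 100 * i / (n_buses - 1))   # 50 -> 150 $/MWh
    n.add("Load", f"L{i}", bus=f"B{i}",
          p_set=500 + 100 * np.sin(2 * np.pi * i / n_buses))  # sinusoidal demand

for i in range(n_buses - 1):
    n.add("Line", f"Line{i}", bus0=f"B{i}", bus1=f"B{i + 1}",
          x=0.1, r=0.01, s_nom=1000)                    # reactance, resistance, capacity

n.optimize()        # linear optimal power flow (n.lopf() in older PyPSA versions)
print(n.objective)  # total generation cost
```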

Making the spectrum of ‘openness’ in AI more visible

A (very) recent history of openness in AI

Google released demos of Gemini last week with much fanfare, but no way to even test it except with a supposed integration with Bard.

Mistral AI tweeted a magnet link to one of its models. No fanfare. No press. Anyone with decent LLM skills could download, use, and even fine-tune the model. For open-source enthusiasts, it was a much better release than Gemini. This kind of access to the pretrained parameters of the neural network is called open weights. It lets users run the model for inference and fine-tune it.

Open weights are better than just a demo or access to a product like ChatGPT or an API, no doubt. The example of Mistral is a case in point: what seems to be open source might not be open source, or not fully open source. A post from The Register discusses in detail how Meta’s Llama 2 isn’t exactly open source despite the claims.

Other models are more open. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) provides fully accessible source code and uses responsibly sourced training data, with support for diverse languages and cultures.

My main argument is that whenever an AI model is released for public consumption, where the model falls on the spectrum of openness should be clearly expressed and understood, without putting the burden of digging that information from the tome of license agreements on the user. AI, as a community of practice, should engage more in making that happen.

Spectrum of openness in AI

To make the idea of a spectrum of openness easier to understand, let’s take the example of openness in software. Openness, or a digital artifact being “open”, is often thought of as binary: something is either open or closed. A straightforward example is that Linux is open while Windows is not; OpenStreetMap is open while Google Maps is not.

Openness is not exactly binary; it’s a spectrum. It’s easier to understand with the example of open-source software, as the history of the free/open/libre software movements paves the way for discussions of openness in other artifacts such as data, research, science, etc. Software can be open source but still vary in the level of “freedom” it provides its users.

Here’s what a spectrum of freedom in open source software might look like:

  • Freedom to modify source code and redistribute
  • Freedom to modify source code, but not to redistribute
  • Freedom to modify source code of core components, but additional features are proprietary
  • Freedom to view source code, but not to modify

This is only for software that’s considered open source. Some freemium products are free to use, but their source code is not available, and they are sometimes mistaken for open source. This kind of freedom is only one dimension along which we can discuss the openness of software. There are other dimensions to consider, for example: community engagement and governance, language support, documentation, interoperability, commercial engagement, and more.

Extrapolating the same concepts to openness in AI, even for an open weights model, the following (at the very least) are most likely closed:

  • Training dataset (with all potential biases and ethical issues, including legal compliance and copyright issues)
  • Ethical guidelines and safety measures behind the creation of the model
  • Training code, methodology, hyperparameters, optimization techniques, post-training
  • Complete model architecture
  • Documentation
  • Objective evaluation following the norms of open, reproducible science
  • Organizational collaboration, governance
  • Finance, GPU, labor, and other resources necessary

Why is openness to all this information important?

Mainly because we should be able to trust AI before using it, just as we need to trust any product before we use it. Some examples of what trustworthy AI might look like:

  • Model architecture can be studied to make further developments. For example, the publication of the “Attention Is All You Need” paper with details on the attention mechanisms enabled much of the recent developments in Large Language Models.
  • An AI auditor can look at the training datasets and methodology to identify potential legal and ethical issues.
  • A startup developing an LLM-based app for their customers can understand potential security issues with the app and address those to save their customers from harm.
  • A lot of social bias and potential harm to underprivileged communities can be scrutinized so they can be avoided or remarkably mitigated.

However, as with all discussions of openness, the benefits of a level of privacy must be acknowledged. Information that might affect the privacy or security of stakeholders, including trademark and copyright issues, should remain private. Ultimately, it’s about finding the right trade-off to maximize social utility.

What next?

Now that we understand the value of openness and its visibility in AI, here are some actions the community can take.

We should develop a framework to define openness in AI.

The framework should cover all the information about a model that its users need to be aware of. Some efforts have already been made: Sunil Ramlochan distinguishes between open source, open weights, and restricted weights, and suggests a simple framework for openness in AI. We can consolidate similar efforts to develop a comprehensive framework for openness in AI.

We should encourage the practice of discussing openness of AI models/products, not just using them.

AI, as a community of practice, has enabled discussions on fine-tuning models and building products on top of them, pushing the limits of diffusing AI to the masses. In addition to this, we should also discuss openness. Openness is not only an idealistic concept for academic discussions, but also a property of models that can enable or hinder innovation and usefulness.

AI creators/companies should make openness information more accessible during release.

Instead of burying limitations in license agreements, creators/companies can state in accessible language where their models lie on the spectrum of openness. This would help users understand the possibilities and limitations more easily, and reduce friction for creators in enforcing compliance with the terms.

We should develop a community-supported index to track and discuss openness of AI models/products.

Leaderboards have been very helpful in facilitating discussions of the performance of recently released models. Since openness is more qualitative than benchmark performance, an index can be designed that represents the openness of models in various dimensions in quantitative or definitive qualitative terms. Open data has a rich history of using indices to assess the current state of openness and pinpoint areas for improvement. Open Knowledge Foundation’s Open Data Index and Web Foundation’s Open Data Barometer can serve as good references for the AI models’ openness index. It could be hosted on a platform with good community support, for instance, HuggingFace. [I was involved in the Open Data Index and Open Data Barometer as a country reviewer for Nepal.] Stanford University has recently launched the Foundation Model Transparency Index, which rated the openness of 10 large foundation models. The project can provide lessons for a more active, community-managed project in which the openness of models can be assessed and compared with others soon after release.

We should increase community engagement in developing licenses for AI models.

Similar to how Creative Commons has made licensing content (text, images, etc.) easier, we need a variety of licenses that suit AI models, developed with substantial community engagement. A notable initiative is the OpenRAIL project, which has made a great start but still feels niche. The conversation about licensing needs to be more mainstream, and for that we need greater community engagement. As someone involved with open data, open source software, and OpenStreetMap communities for over a decade, I have seen that vibrant community support is required to make open projects more widely accessible.

Summing up

Open access to AI research, openly available neural network architectures, open weights, and, in general, support for open source in various forms, even from large tech companies, have gotten us this far in making powerful AI more accessible. Openness in provenance information and sources, and the freedom this enables, will help make the future of AI more trustworthy.

Embedding Shiny App in WordPress

I mostly code in R and Python for my data science/machine learning projects and use WordPress for my portfolio blog. To communicate my experiments as interactive visualizations, I can publish them either as Shiny apps or as Quarto websites.

I wanted to test whether I could embed a Shiny app in WordPress. This would let me write the data analysis and interactive visualization code in R and publish it to my WordPress-based personal website.

The solution was to embed a Shiny app as an “iframe” in a WordPress blog.

An iframe (short for inline frame) is an HTML element that allows us to embed another HTML document within the current document. It provides a way to include external content from another source or website into your web page. The content within the iframe is displayed as a separate independent window within the parent document.

I published the example Shiny app at https://kshitizkhanal7.shinyapps.io/basic_shiny/. Then I used the following HTML code in the WordPress post to embed the app here.

<iframe src="https://kshitizkhanal7.shinyapps.io/basic_shiny/" width="150%" height="650"></iframe>

Let’s break it down:

  • <iframe>: This is the opening tag of the iframe element.
  • src="https://kshitizkhanal7.shinyapps.io/basic_shiny/": The src attribute specifies the URL of the external web page you want to display within the iframe. In this case, it is set to "https://kshitizkhanal7.shinyapps.io/basic_shiny/".
  • width="150%": The width attribute determines the width of the iframe. In this example, it is set to "150%", indicating that the iframe will be 150% of the width of its container. This allows the iframe to expand beyond the normal width of the container if needed.
  • height="650": The height attribute specifies the height of the iframe in pixels. In this case, it is set to "650" pixels.
  • </iframe>: This is the closing tag of the iframe element.

The resulting embedded app follows.

I plan to use this and explore other tools to create scrolly data stories in WordPress. Follow this space for more.

I am on Twitter @kshitizkhanal7.