Time for modelers to think about hardware as part of the model

Most of the energy/systems modeling tools I have used, from engineering school to today, have been desktop (mostly Windows, followed by Linux) applications. Typically, I’d define my model within a popular desktop package, then compile and run it on the same PC to get the results. I reported the numbers and never had to think about the hardware. Thinking about hardware was not part of the modeler’s job description.

Many models these days run on the vendor’s cloud, with an API linking the user to software that sits on hardware controlled by the vendor. The modeler’s PC is only a terminal. The vendor thinks about the hardware; the modeler models. Thinking about hardware is still not part of a modeler’s job description. Then again, modeling used to be a high-level endeavor in which the modeler thinks about the problem and its mathematical formulation within a modeling platform.

The accessibility in definition enabled by general purpose programming languages

Modeling is not merely a high-level abstraction anymore. For me, the switch to lower-level programming for modeling came with increasing use of data science and machine learning-based approaches in energy and systems planning/modeling. Using Python/R packages instead of a GUI unlocked capabilities not only to build data-based models, but to define simple energy system models outside the confines of proprietary software. Models increasingly developed using open-source packages in Python or Julia (and to some extent R, JavaScript, and Rust, among others) have become competitive against expensive proprietary models.

The accessibility in computing enabled by availability of GPUs and HPCs

Lately, I have also been tinkering with energy modeling using a GPU. See, for example, this blog post on benchmarking power system optimization using a GPU vs a CPU. As GPUs and HPCs have become more accessible, they enable a big leap in computing through massive parallelization.

However, parallelization is not only about using N times the processing units simultaneously. The architecture and system design trade-offs of GPUs and HPCs force choices on the modeler. In exploiting parallelization, the GPU changes the arithmetic, more specifically the precision. While CPUs typically compute in FP64 (64-bit floating point), GPUs are optimized for FP32. For reference, FP64 can distinguish between the numbers 1.000000000000000 and 1.000000000000001, but FP32 can only distinguish between 1.000000 and 1.000001. At first glance, the difference seems negligible. But with the thousands of matrix multiplications that go into a model, the rounding errors accumulate. Slightly different gradients lead to slightly different positions in the solution space. The final optimal output provided by the model differs. Meaningfully.
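A minimal illustration of how rounding accumulates differently in the two precisions; the increment and count here are arbitrary choices for demonstration:

```python
import numpy as np

# Add the same small increment 100,000 times in FP32 and FP64.
# The exact answer is 0.01 * 100_000 = 1000.
acc32 = np.float32(0.0)
acc64 = np.float64(0.0)
for _ in range(100_000):
    acc32 += np.float32(0.01)
    acc64 += np.float64(0.01)

print(acc64)  # very close to 1000
print(acc32)  # noticeably further from 1000
```

Each individual rounding error is tiny, but FP32 rounds away roughly half of each increment's trailing digits, and over many additions those errors compound rather than cancel.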

For a simple demonstration, I tested this on a gas power plant model. Specifically, a heat rate curve, which is a simple formula that tells you how efficiently a plant burns fuel at different power levels. Think of it as a U-shaped curve: the plant has a sweet spot where it burns fuel most efficiently and gets less efficient as you push it harder or back it off. That sweet spot, and the shape of the curve around it, directly determines the plant’s fuel bill and its bid into the energy market. The model is a quadratic equation fit to 350 observations of simulated plant operating data. I used the same optimizer, the same loss function, the same data. The only thing I changed between the two runs was the arithmetic precision: FP64 on one run, FP32 on the other.
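A rough sketch of that experiment follows. The simulated data, curve coefficients, and gradient-descent settings here are my illustrative assumptions, not the demo's actual code; the point is that the only thing differing between the two calls is the dtype every array and scalar is stored in.

```python
import numpy as np

def fit_heat_rate(dtype, seed=0):
    """Fit a quadratic heat-rate curve by gradient descent, with all
    arithmetic carried out in the given floating-point dtype."""
    rng = np.random.default_rng(seed)
    mw = rng.uniform(100.0, 500.0, 350)          # 350 simulated operating points
    z = (mw / 500.0 - 0.6).astype(dtype)         # normalized, centered load level
    # Hypothetical "true" curve with its efficiency optimum near 333 MW
    y = (9.36 - 0.8 * z + 6.0 * z**2).astype(dtype)
    y = y + rng.normal(0.0, 0.02, 350).astype(dtype)

    w = np.zeros(3, dtype=dtype)                 # quadratic coefficients a, b, c
    lr = dtype(0.5)
    for _ in range(10_000):
        err = w[0] + w[1] * z + w[2] * z**2 - y
        grad = dtype(2) * np.array(
            [err.mean(), (err * z).mean(), (err * z**2).mean()], dtype=dtype)
        w = w - lr * grad
    z_opt = -w[1] / (dtype(2) * w[2])            # vertex: most efficient point
    return float((z_opt + 0.6) * 500.0)          # convert back to MW

opt64 = fit_heat_rate(np.float64)
opt32 = fit_heat_rate(np.float32)
print(opt64, opt32)
```

Centering the load level keeps the gradient steps well-conditioned; the divergence between the two returned optima comes entirely from rounding inside the FP32 run.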

The two models disagreed about where the plant runs most efficiently. FP64 said 346.6 MW. FP32 said 345.4 MW. About one megawatt apart on a 500 MW plant. That sounds small until you price it out: at a typical dispatch point of 350 MW, with $4.50/MMBtu gas and 8,000 operating hours a year, the difference in estimated fuel cost between the two models comes to $545,000 per year, for one plant.

I ran 200 bootstrap samples on each platform to check whether any of this was noise. It was not. FP32 was consistently less certain about where the optimum efficiency was: the spread of estimates across 200 resamples was 18.1 MW wide, versus 11.9 MW for FP64. That is more uncertainty about the number that determines dispatch bids. The interactive demonstration, with all charts and results, is available here.
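The bootstrap check can be sketched roughly like this, with synthetic stand-in data and `np.polyfit` substituting for the actual fitting code:

```python
import numpy as np

def vertex_spread(x, y, n_boot=200, seed=1):
    """Spread (max minus min) of quadratic-vertex estimates across
    bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    vertices = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))    # resample with replacement
        c2, c1, _ = np.polyfit(x[idx], y[idx], 2)
        vertices.append(-c1 / (2.0 * c2))        # vertex of the refit quadratic
    return max(vertices) - min(vertices)

# Synthetic stand-in data shaped like the heat-rate experiment,
# with its vertex placed near 333 MW
rng = np.random.default_rng(0)
x = rng.uniform(100.0, 500.0, 350)
y = 9.36 - 0.016 * x + 0.000024 * x**2 + rng.normal(0.0, 0.02, 350)
print(vertex_spread(x, y))  # width of the bootstrap spread, in MW
```

A wider spread means the fitted optimum moves around more from resample to resample, i.e. less certainty in the number that feeds the dispatch bid.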

The accessibility in upskilling and software development enabled by agentic coding

Agentic coding has lowered the floor for building low-level software. Modelers are well placed to take advantage of this because their understanding of numbers, constraints, and system behavior translates directly into thinking like a software developer. The development cycle is shorter, upskilling is faster, and unlocking massive parallelization on GPUs and HPCs no longer requires a deep systems programming background.

But with that accessibility comes a responsibility that did not exist when modeling was a desktop endeavor. While we generally think of a model as a mathematical artifact, the hardware has been a neutral substrate so far. Something the model runs on, not something that shapes what the model is. Hardware behaves more like an implicit hyperparameter: not set, not tuned, not documented, but active in every gradient computation. The emerging paradigm asks modelers to become more aware of the speed-precision tradeoff, to take reproducibility seriously across hardware environments, and to think harder about what a model output means when a different machine might return a different answer. Before we can fully specify the model, perhaps we need to specify the machine.

Curating LLM Tuning Data from the FineWeb Dataset for High-fidelity Domain Adaptation

We created a post-training dataset from the FineWeb dataset for high-fidelity domain adaptation of an open-weights LLM (Google’s Flan). Parameter-efficient fine-tuning through prompt tuning resulted in remarkable improvement in perplexity scores, and the tuned model demonstrated the ability to generalize based on information in the tuning dataset.

The work was selected for oral presentation at AGU24. Slide attached.

AGU-LLM-talk

Benchmarking power system optimization: CPU vs GPU

Power systems are getting increasingly complicated with renewable energy integration, distributed generation, storage, increasing demand, and more. Naturally, optimizing power systems is getting more computationally intensive. While the gaming and AI industries have adopted GPUs to speed up intensive computation, adoption seems limited in power system planning. As an applied AI/ML researcher for energy and climate, I wanted to explore the extent to which GPU-based operations could speed up power system optimization.

I benchmark optimization of power systems of different sizes using CPU- and GPU-based approaches. I modeled power systems of various node sizes using PyPSA (a popular Pythonic framework) and a bespoke minimal GPU-based setup. PyPSA was chosen for its simplicity of implementation and growing adoption.

The PowerSystem class models the fundamental components of an electrical grid, including:

  • Buses (nodes) representing connection points
  • Generators with specified capacities and costs
  • Loads (power demand) at various points
  • Transmission lines with physical parameters (reactance, resistance, capacity)
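A minimal sketch of what such a container might look like; the field names here are illustrative assumptions, not the benchmark's exact implementation:

```python
from dataclasses import dataclass

@dataclass
class PowerSystem:
    """Sketch of the grid container described above."""
    buses: list            # bus labels, e.g. ["B0", "B1", ...]
    gen_capacity: list     # MW limit per generator
    gen_cost: list         # $/MWh per generator
    load: list             # MW demand at each bus
    line_capacity: list    # MW limit per line (n_buses - 1 lines in a chain)
    line_reactance: list   # per-unit X for each line
    line_resistance: list  # per-unit R for each line

# A 3-bus linear chain, for illustration
sys3 = PowerSystem(
    buses=["B0", "B1", "B2"],
    gen_capacity=[1000.0] * 3,
    gen_cost=[50.0, 51.0, 52.0],
    load=[500.0] * 3,
    line_capacity=[1000.0] * 2,
    line_reactance=[0.1] * 2,
    line_resistance=[0.01] * 2,
)
print(len(sys3.buses), len(sys3.line_capacity))  # 3 2
```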

A simple representation of the power system is chosen, with a linear chain network topology, for experimental purposes only. The Appendix details the configuration, with a 100-node system as an example.

A power system optimization problem typically involves finding the most cost-effective way to meet electrical demand across a network while respecting physical constraints like transmission line capacities and generator limitations. This is a linear programming (LP) problem in which we minimize generation costs subject to power flow and capacity constraints.
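The economic logic can be sketched in a few lines. This is a simplified merit-order dispatch that deliberately ignores the transmission constraints the full LP enforces; it shows only the cost-minimization intuition:

```python
def merit_order_dispatch(costs, capacities, demand):
    """Greedy merit-order dispatch: serve demand from the cheapest
    generators first. Transmission constraints, which the full LP
    enforces, are ignored here."""
    dispatch = [0.0] * len(costs)
    remaining = demand
    for i in sorted(range(len(costs)), key=lambda i: costs[i]):
        take = min(capacities[i], remaining)   # cheapest units fill up first
        dispatch[i] = take
        remaining -= take
        if remaining <= 0:
            break
    total_cost = sum(p * c for p, c in zip(dispatch, costs))
    return dispatch, total_cost

# Three generators at 50/100/150 $/MWh, 100 MW each, 150 MW of demand
dispatch, cost = merit_order_dispatch([50.0, 100.0, 150.0], [100.0] * 3, 150.0)
print(dispatch, cost)  # [100.0, 50.0, 0.0] 10000.0
```

In the full LP, line capacities can force deviations from this greedy order, which is exactly what makes the problem interesting.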

For the GPU-based optimization, I used the CuPy library. The main way a GPU speeds up computation is by dividing the work into batches and running them across a large number of parallel processing units. I chose a batch size of 100 and used an OSQP solver configured for the GPU.
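CuPy mirrors NumPy's API, so the batching pattern looks like the sketch below. It is shown with NumPy so it runs anywhere; with CuPy installed, swapping the import moves the same code to the GPU. The sizes are illustrative:

```python
import numpy as np  # with CuPy installed, `import cupy as np` runs this on the GPU

batch, n = 100, 64                                   # a batch of 100 subproblems
A = np.random.rand(batch, n, n).astype(np.float32)   # note FP32, the GPU default
x = np.random.rand(batch, n, 1).astype(np.float32)

# One batched call evaluates all 100 matrix-vector products at once;
# on a GPU those products execute in parallel.
y = A @ x
print(y.shape)  # (100, 64, 1)
```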

The study tested both implementations across eight different system sizes:

  • Small systems: 10, 100 nodes
  • Medium systems: 500, 1,000, 2,000 nodes
  • Large systems: 5,000, 10,000, 20,000 nodes

For each size, the benchmark measured:

  • Solution status and correctness
  • Execution time for both CPU and GPU implementations
  • Objective function values (total generation cost)

Summary of results: I logged the time taken by CPU and GPU, as well as the optimized objective values each produced. The optimized objective values are comparable for most sizes (except the smallest).

| Size | CPU Time (s) | GPU Time (s) | Speedup | Objective Value Difference (%) |
|------|--------------|--------------|---------|--------------------------------|
| 10 | 0.659 | 0.187 | 3.532 | 20.16% |
| 100 | 0.553 | 0.227 | 2.437 | 2.08% |
| 500 | 0.693 | 0.884 | 0.784 | 0.41% |
| 1000 | 1.203 | 1.772 | 0.679 | 0.21% |
| 2000 | 2.652 | 4.192 | 0.633 | 0.10% |
| 5000 | 27.465 | 13.599 | 2.020 | 0.04% |
| 10000 | 179.226 | 39.463 | 4.542 | 0.02% |
| 20000 | 1177.919 | 126.769 | 9.292 | 0.01% |

While I found that, for the system sizes chosen, the GPU generally speeds up computation, the magnitude differs by system size. The speed-up is relatively high for the smallest systems, drops below 1 for medium sizes, and rises dramatically for larger sizes.

The results reveal several important insights about GPU acceleration for power system optimization:

Performance Crossover Point

There appears to be a “crossover point” around 2,000 nodes where GPU acceleration becomes clearly advantageous. This suggests that:

  • For smaller systems, the overhead of GPU memory transfers may offset potential gains
  • Larger systems better utilize GPU parallelism, leading to substantial speedups

Scalability Characteristics

The GPU implementation shows superior scalability:

  • CPU time grows roughly quadratically with system size
  • GPU time grows more linearly, especially for larger systems
  • The speedup factor increases with system size, suggesting even better performance for very large systems

Implications

The results show remarkable speedup using GPU-based optimization, favoring its use in large systems requiring prompt optimization. However, GPUs consume more electricity and water, and these environmental factors must be taken into account during implementation. Ultimately, it’s about the trade-off between time saved, GPU costs, and potential environmental costs.

Use the following Github gist for replication. https://gist.github.com/kshitizkhanal7/4bed7ac04f9f89f64c99a5d297a611b7

Appendix: Reference system with 100 nodes

The system represents a large-scale power transmission network with several key characteristics:

  1. System Structure
    • 100 buses (B0 through B99)
    • 100 generators (G0 through G99)
    • 100 loads (L0 through L99)
    • 99 transmission lines connecting adjacent buses
  2. Generator Characteristics. Each generator has:
    • A maximum capacity of 1000 MW
    • A cost that varies linearly across the system:
      • G0 starts at 50 $/MWh
      • G99 ends at 150 $/MWh
      • Each generator’s cost increases by approximately 1 $/MWh. This cost gradient creates an interesting optimization problem where cheaper generators are preferred but transmission constraints may force the use of more expensive ones.
  3. Load Pattern. Each load follows a sinusoidal pattern:
    • Base load of 500 MW
    • Variation of ±100 MW based on position
    • The formula P = 500 + 100*sin(2πi/100) creates a wave pattern across the system. This pattern mimics real-world load variations while maintaining mathematical tractability.
  4. Transmission Lines. Each line connecting adjacent buses has:
    • Capacity of 1000 MW
    • Reactance (X) of 0.1 per unit
    • Resistance (R) of 0.01 per unit. These parameters create realistic power flow constraints.
  5. Optimization Problem Size. The complete system creates a substantial optimization problem with:
    • 100 decision variables (generator outputs)
    • 99 line flow constraints
    • 100 power balance constraints
    • 100 generator capacity constraints. Total: ~400 constraints and ~100 variables
  6. Performance Results. For this 100-node system, the benchmark showed:
    • CPU time: 0.553 seconds
    • GPU time: 0.227 seconds
    • Speedup factor: 2.437x
    • CPU objective: 4.717929e+06
    • GPU objective: 4.815897e+06
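The reference system above can be constructed in a few lines; a sketch under the stated parameters (variable names are my own):

```python
import math

n = 100
gen_capacity = [1000.0] * n                                  # MW, per generator
gen_cost = [50.0 + 100.0 * i / (n - 1) for i in range(n)]    # 50 -> 150 $/MWh
load = [500.0 + 100.0 * math.sin(2 * math.pi * i / n) for i in range(n)]  # MW
lines = [(i, i + 1) for i in range(n - 1)]                   # linear chain: 99 lines
line_capacity, reactance, resistance = 1000.0, 0.1, 0.01     # shared line parameters

print(gen_cost[0], gen_cost[-1], len(lines))  # 50.0 150.0 99
```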

Making the spectrum of ‘openness’ in AI more visible

A (very) recent history of openness in AI

Google released demos of Gemini last week with much fanfare, but no way to even test it except with a supposed integration with Bard.

Mistral AI tweeted a Magnet link to one of its models. No fanfare. No press. Anyone with decent LLM skills could download, use, and even fine-tune the model. For open-source enthusiasts, it was a much better release than Gemini. This kind of accessibility to the pretrained parameters of the neural network is called open weights. It enables users to use the model for inference and fine-tuning.

Open weights are better than just a demo or access to a product like ChatGPT or an API, no doubt. Still, the example is a case in point: what seems to be open source might not be open source, or not fully open source. A post from The Register discusses in detail how Meta’s Llama 2 isn’t exactly open source despite the claims.

Other models are more open. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) provides fully accessible source code and uses responsibly sourced training data, with support for diverse languages and cultures.

My main argument is that whenever an AI model is released for public consumption, where the model falls on the spectrum of openness should be clearly expressed and understood, without putting the burden of digging that information from the tome of license agreements on the user. AI, as a community of practice, should engage more in making that happen.

Spectrum of openness in AI

To make the idea of the spectrum of openness easier to understand, let’s take the example of openness in software. Openness, or a digital artifact being “open,” is often thought of as binary: something is either open or closed. A straightforward example is that Linux is open while Windows is not. OpenStreetMap is open while Google Maps is not.

Openness is not exactly binary; it’s a spectrum. It’s easiest to understand with the example of open-source software, as the history of the free/open/libre software movements paves the way for discussions of openness in other artifacts such as data, research, and science. Software can be open source but still vary in the level of “freedom” it provides its users.

Here’s what a spectrum of freedom in open source software might look like:

  • Freedom to modify source code and redistribute
  • Freedom to modify source code, but not to redistribute
  • Freedom to modify source code of core components, but additional features are proprietary
  • Freedom to view source code, but not to modify

This is only for software that’s considered open source. Some freemium products are free to use, but their source code is not available, and they might sometimes be mistaken for open source. This kind of freedom is only one dimension along which we can discuss the openness of software. There are other dimensions to consider, for example: community engagement and governance, language support, documentation, interoperability, commercial engagement, and more.

Extrapolating the same concepts to openness in AI, even for an open weights model, the following (at the very least) are most likely closed:

  • Training dataset (with all potential bias and ethical issues, including legal compliance and copyright issues)
  • Ethical guidelines and safety measures behind the creation of the model
  • Training code, methodology, hyperparameters, optimization techniques, post-training
  • Complete model architecture
  • Documentation
  • Objective evaluation following the norms of open, reproducible science
  • Organizational collaboration, governance
  • Finance, GPU, labor, and other resources necessary

Why is openness to all this information important?

Mainly, because we should be able to trust AI before using it, like we need to trust any product before we use it. Some instances of what trustworthy AI might look like:

  • Model architecture can be studied to make further developments. For example, the publication of the “Attention Is All You Need” paper with details on the attention mechanisms enabled much of the recent developments in Large Language Models.
  • An AI auditor can look at the training datasets and methodology to identify potential legal and ethical issues.
  • A startup developing an LLM-based app for their customers can understand potential security issues with the app and address those to save their customers from harm.
  • A lot of social bias and potential harm to underprivileged communities can be scrutinized so they can be avoided or remarkably mitigated.

However, as with all discussions of openness, the benefits of a level of privacy must be acknowledged. Information that might affect the privacy or security of stakeholders, including trademark and copyright concerns, should remain private. Ultimately, it’s about finding the right trade-off to maximize social utility.

What next?

Now that we understand the value of openness and its visibility in AI, here are some actions the community can take.

We should develop a framework to define openness in AI.

The framework should cover all the information about a model that its users need to be aware of. Some efforts have already been made: Sunil Ramlochan makes the distinction between open source, open weights, and restricted weights and suggests a simple framework for openness in AI. We can consolidate similar efforts to develop a comprehensive framework for openness in AI.

We should encourage the practice of discussing openness of AI models/products, not just using them.

AI, as a community of practice, has enabled discussions on fine-tuning models and building products on top of them, pushing the limits of diffusing AI to the masses. In addition, we should also discuss openness. Openness is not only an idealistic concept for academic discussions, but also a property of models that can enable or hinder innovation and usefulness.

AI creators/companies should make openness information more accessible during release.

Instead of burying limitations in license agreements, creators/companies can state where their models lie on the spectrum of openness in accessible language. This would help users understand the possibilities and limitations more easily, and reduce friction for the creators in enforcing compliance with the terms.

We should develop a community-supported index to track and discuss openness of AI models/products.

Leaderboards have been very helpful recently in facilitating discussions of the performance of recently released models. Since openness is more qualitative than benchmark performance, an index can be designed that represents the openness of models in various dimensions in quantitative or definitive qualitative terms. Open data has a rich history of using indices to assess the current state of openness and pinpoint areas for improvement. Open Knowledge Foundation’s Open Data Index and Web Foundation’s Open Data Barometer can serve as good references for the AI models’ openness index. It could be hosted on a platform with good community support, for instance, HuggingFace. [I was involved in the Open Data Index and Open Data Barometer as a country reviewer for Nepal.] Stanford University has recently launched the Foundation Model Transparency Index which provided a rating of openness of 10 large foundation models. The project can provide lessons for a more active and community-managed project in which the openness of models can be assessed and compared with others soon after release.

We should increase community engagement in developing licenses for AI models.

Similar to how Creative Commons has made licensing content (text, images, etc.) easier, we need a variety of licenses that suit AI models, developed with substantial community engagement. A notable initiative is the OpenRAIL project, which has made a great start but still feels niche. The conversation about licensing needs to be more mainstream, and for that we need greater community engagement. As someone involved with open data, open-source software, and OpenStreetMap communities for over a decade, I have seen that vibrant community support is required to make open projects widely accessible.

Summing up

Open access to AI research, openly available neural network architectures, open weights, and general support for open source in various forms, even from large tech companies, have gotten us this far in making powerful AI more accessible. Openness in provenance information and source, and the freedom this enables, will help make the future of AI more trustworthy.

Embedding Shiny App in WordPress

I mostly code in R and Python for my data science/machine learning projects and use WordPress for my portfolio blog. To communicate my experiments as interactive visualizations, I can publish them either as Shiny apps or as Quarto websites.

I wanted to test if I could embed a Shiny app in WordPress. It could help me write the data analysis and interactive visualization code in R, and publish it to my WordPress-based personal website.

The solution was to embed a Shiny app as an “iframe” in a WordPress blog.

An iframe (short for inline frame) is an HTML element that allows us to embed another HTML document within the current document. It provides a way to include external content from another source or website into your web page. The content within the iframe is displayed as a separate independent window within the parent document.

I published the example Shiny app at “https://kshitizkhanal7.shinyapps.io/basic_shiny/“. Then I used the following HTML code in this WordPress post to get the app embedded here.

<iframe src="https://kshitizkhanal7.shinyapps.io/basic_shiny/" width="150%" height="650"></iframe>

Let’s break it down:

  • <iframe>: This is the opening tag of the iframe element.
  • src="https://kshitizkhanal7.shinyapps.io/basic_shiny/": The src attribute specifies the URL of the external web page you want to display within the iframe. In this case, it is set to "https://kshitizkhanal7.shinyapps.io/basic_shiny/".
  • width="150%": The width attribute determines the width of the iframe. In this example, it is set to "150%", indicating that the iframe will be 150% of the width of its container. This allows the iframe to expand beyond the normal width of the container if needed.
  • height="650": The height attribute specifies the height of the iframe in pixels. In this case, it is set to "650" pixels.
  • </iframe>: This is the closing tag of the iframe element.

The resulting embedded app follows.

I plan to use this and explore other tools to create scrolly data stories in WordPress. Follow this space for more.

I am on Twitter @kshitizkhanal7.