Time for modelers to think about hardware as part of the model

Most of the energy/systems modeling tools I have used, from engineering school to today, have been desktop applications (mostly Windows, followed by Linux). Typically, I'd define my model within a popular desktop package, then compile and run it on the same PC to get the results. I reported the numbers and never had to think about the hardware. Thinking about hardware was not part of the modeler's job description.

Many models these days run on the vendor's cloud, with an API linking the user to software that sits on hardware controlled by the vendor. The modeler's PC is only a terminal. The vendor thinks about the hardware; the modeler models. Thinking about hardware is still not part of a modeler's job description. Then again, modeling used to be a high-level endeavor in which modelers thought about the problem and its mathematical formulation within a modeling platform.

The accessibility in definition enabled by general purpose programming languages

Modeling is not merely a high-level abstraction anymore. For me, the switch to lower-level programming for modeling came with the increasing use of data science and machine learning-based approaches in energy and systems planning/modeling. Using Python/R packages instead of a GUI unlocked capabilities not only to build data-based models, but to define simple energy system models outside the confines of proprietary software. Models developed using open-source packages in Python or Julia (and to some extent R, JavaScript, and Rust, among others) have become competitive against expensive proprietary models.

The accessibility in computing enabled by availability of GPUs and HPCs

Lately, I have also been tinkering with energy modeling using a GPU. See, for example, this blog post on benchmarking power system optimization on a GPU versus a CPU. GPUs and HPCs, as they have become more accessible, enable a big leap in computing through massive parallelization.

However, parallelization is not only about using N-times the processing units simultaneously. The architecture and system design trade-offs in using GPUs or HPCs force choices on the modeler. In exploiting parallelization, the GPU changes the arithmetic, more specifically the precision. While CPUs typically use FP64 (64-bit floating point precision), GPUs default to FP32. For reference, FP64 can distinguish between the numbers 1.000000000000000 and 1.000000000000001, but FP32 can only distinguish between 1.000000 and 1.000001. At first glance, the difference seems negligible. But with the thousands of matrix multiplications that go into a model, the rounding errors accumulate. Slightly different gradients lead to slightly different positions in the solution space. The final optimal output provided by the model differs. Meaningfully.
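To see how quickly rounding errors compound, here is a minimal sketch (assuming NumPy is available) that naively accumulates the same increment a million times in each precision. The exact totals vary by platform, but the FP32 error comes out orders of magnitude larger than the FP64 error:

```python
import numpy as np

# Naively accumulate 0.1 one million times in FP32 and FP64.
# Each individual rounding error is tiny, but once the running
# total is large, FP32 can no longer represent the small
# increment accurately, and the errors snowball.
n = 1_000_000
exact = n * 0.1  # 100000.0

total32 = np.float32(0.0)
inc32 = np.float32(0.1)
for _ in range(n):
    total32 += inc32

total64 = 0.0  # Python floats are FP64
for _ in range(n):
    total64 += 0.1

print(f"FP32 total: {float(total32):.4f} (error {abs(float(total32) - exact):.4f})")
print(f"FP64 total: {total64:.10f} (error {abs(total64 - exact):.10f})")
```

The FP32 total drifts away from 100,000 by hundreds, while the FP64 total stays correct to several decimal places, with nothing changed but the arithmetic.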

For a simple demonstration, I tested this on a gas power plant model. Specifically, a heat rate curve, which is a simple formula that tells you how efficiently a plant burns fuel at different power levels. Think of it as a U-shaped curve: the plant has a sweet spot where it burns fuel most efficiently and gets less efficient as you push it harder or back it off. That sweet spot, and the shape of the curve around it, directly determines the plant’s fuel bill and its bid into the energy market. The model is a quadratic equation fit to 350 observations of simulated plant operating data. I used the same optimizer, the same loss function, the same data. The only thing I changed between the two runs was the arithmetic precision: FP64 on one run, FP32 on the other.
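The shape of the experiment can be sketched roughly as below. This is not the post's actual code: the operating data is synthetic, the curve coefficients are made up, and plain gradient descent stands in for whichever optimizer was actually used. The only difference between the two runs is the dtype:

```python
import numpy as np

def fit_quadratic(P, hr, dtype, lr=1.0, steps=50_000):
    """Fit hr ~ c0 + c1*x + c2*x^2 by gradient descent in the given precision."""
    x = (P / 500.0).astype(dtype)  # scale load to [0, 1] so descent converges
    y = hr.astype(dtype)
    c = np.zeros(3, dtype=dtype)
    for _ in range(steps):
        err = c[0] + c[1] * x + c[2] * x * x - y
        grad = np.array([err.mean(), (err * x).mean(), (err * x * x).mean()],
                        dtype=dtype)
        c = c - dtype(lr) * grad
    return 500.0 * float(-c[1] / (2 * c[2]))  # minimum of the fitted parabola, in MW

# 350 simulated operating points on a U-shaped heat rate curve
# (coefficients here are invented for illustration only).
rng = np.random.default_rng(0)
P = rng.uniform(100.0, 500.0, 350)                              # load (MW)
hr = 7.5 + 4e-5 * (P - 350.0) ** 2 + rng.normal(0, 0.02, 350)   # MMBtu/MWh

opt64 = fit_quadratic(P, hr, np.float64)
opt32 = fit_quadratic(P, hr, np.float32)
print(f"FP64 optimum: {opt64:.2f} MW | FP32 optimum: {opt32:.2f} MW")
```

Both fits land near the true sweet spot, but not on the same number: the FP32 run accumulates rounding error in every gradient, so its estimated optimum drifts from the FP64 one.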

The two models disagreed about where the plant runs most efficiently. FP64 said 346.6 MW. FP32 said 345.4 MW. About one megawatt apart on a 500 MW plant. That sounds small until you price it out: the difference in estimated fuel cost between the two models at a typical dispatch point of 350 MW comes to $545,000 per year, at $4.50/MMBtu gas running 8,000 hours a year, for one plant.
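The cost arithmetic itself is straightforward. The fitted curves aren't reproduced here, so the heat rate gap below is an assumed value, chosen only to be consistent with the $545,000 figure quoted above:

```python
# Back-of-envelope for the annual cost of a heat rate discrepancy.
# hr_gap is the assumed gap between the FP64 and FP32 fitted curves
# at the dispatch point (MMBtu/MWh); the other inputs are from the text.
load_mw = 350.0     # typical dispatch point (MW)
hours = 8_000       # running hours per year
gas_price = 4.50    # $/MMBtu
hr_gap = 0.0432     # assumed MMBtu/MWh gap implied by the $545k figure

annual_cost = load_mw * hours * hr_gap * gas_price  # $/year
print(f"${annual_cost:,.0f} per year")
```

A heat rate disagreement of roughly 0.04 MMBtu/MWh, invisible at the fourth decimal place of an efficiency curve, is worth about half a million dollars a year at these prices.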

I ran 200 bootstrap samples on each platform to check whether any of this was noise. It was not. FP32 was consistently less certain about where the optimum efficiency was: the spread of estimates across 200 resamples was 18.1 MW, versus 11.9 MW for FP64. More uncertainty about the number that determines dispatch bids. The interactive demonstration, with all charts and results, is available here.
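A bootstrap check along these lines can be sketched as follows, again with synthetic data, and with `np.polyfit` standing in for the actual fitting pipeline, so the spread here will not match the 18.1 MW and 11.9 MW figures above:

```python
import numpy as np

# Bootstrap sketch: resample the operating data with replacement,
# refit the quadratic each time, and record where its minimum lands.
rng = np.random.default_rng(42)
P = rng.uniform(100.0, 500.0, 350)                              # load (MW)
hr = 7.5 + 4e-5 * (P - 350.0) ** 2 + rng.normal(0, 0.02, 350)   # MMBtu/MWh

optima = []
for _ in range(200):
    idx = rng.integers(0, len(P), len(P))         # resample with replacement
    c2, c1, c0 = np.polyfit(P[idx], hr[idx], 2)   # quadratic fit
    optima.append(-c1 / (2 * c2))                 # minimum of the parabola (MW)

spread = np.max(optima) - np.min(optima)
print(f"Optimum spread across 200 resamples: {spread:.1f} MW")
```

The width of that spread is the uncertainty band around the dispatch-relevant number; rerunning the same loop under a different precision is what reveals whether the arithmetic itself widens it.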

The accessibility in upskilling and software development enabled by agentic coding

Agentic coding has lowered the floor for building low-level software. Modelers are well placed to take advantage of this because their understanding of numbers, constraints, and system behavior translates directly into thinking like a software developer. The development cycle is shorter, upskilling is faster, and unlocking massive parallelization on GPUs and HPCs no longer requires a deep systems programming background.

But with that accessibility comes a responsibility that did not exist when modeling was a desktop endeavor. We generally think of a model as a mathematical artifact and of hardware as a neutral substrate: something the model runs on, not something that shapes what the model is. In practice, hardware behaves more like an implicit hyperparameter: not set, not tuned, not documented, but active in every gradient computation. The emerging paradigm asks modelers to become more aware of the speed-precision tradeoff, to take reproducibility seriously across hardware environments, and to think harder about what a model output means when a different machine might return a different answer. Before we can fully specify the model, perhaps we need to specify the machine.

Making the spectrum of ‘openness’ in AI more visible

A (very) recent history of openness in AI

Google released demos of Gemini last week with much fanfare, but no way to even test it, except through a supposed integration with Bard.

Mistral AI tweeted a Magnet link to one of its models. No fanfare. No press. Anyone with decent LLM skills could download, use, and even fine-tune the model. For open-source enthusiasts, it was a much better release than Gemini. This kind of accessibility to pretrained parameters of the neural network is called open weights. It enables users to use the model for inference and finetuning.

Open weights are better than just a demo or access to a product like ChatGPT or an API, no doubt. But what seems to be open source might not be open source, or not fully open source. A post from The Register discusses in detail how Meta's Llama 2 isn't exactly open source despite the claims.

Other models are more open. BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) provides fully accessible source code and uses responsibly sourced training data, with support for diverse languages and cultures.

My main argument is that whenever an AI model is released for public consumption, where the model falls on the spectrum of openness should be clearly expressed and understood, without putting on the user the burden of digging that information out of a tome of license agreements. AI, as a community of practice, should engage more in making that happen.

Spectrum of openness in AI

To make the idea of a spectrum of openness easier to understand, let's take the example of openness in software. Openness, or a digital artifact being "open," is often thought of as binary: something is either open or closed. A straightforward example is that Linux is open while Windows is not; OpenStreetMap is open while Google Maps is not.

Openness is not exactly binary; it's a spectrum. This is easiest to see with open-source software, as the history of the free/open/libre software movements paves the way for discussions of openness in other artifacts such as data, research, and science. Software can be open source but still vary in the level of "freedom" it provides its users.

Here’s what a spectrum of freedom in open source software might look like:

  • Freedom to modify source code and redistribute
  • Freedom to modify source code, but not to redistribute
  • Freedom to modify source code of core components, but additional features are proprietary
  • Freedom to view source code, but not to modify

This is only for software that's considered open source. Some freemium products are free to use, but their source code is not available, and they are sometimes mistaken for open source. This kind of freedom is only one dimension along which we can discuss the openness of software. There are other dimensions to consider, for example: community engagement and governance, language support, documentation, interoperability, commercial engagement, and more.

Extrapolating the same concepts to openness in AI, even for an open-weights model, the following (at the very least) are most likely closed:

  • Training dataset (with all potential bias and ethical issues, including legal compliance and copyright issues)
  • Ethical guidelines and safety measures behind the creation of the model
  • Training code, methodology, hyperparameters, optimization techniques, post-training
  • Complete model architecture
  • Documentation
  • Objective evaluation following the norms of open, reproducible science
  • Organizational collaboration, governance
  • Finance, GPU, labor, and other resources necessary

Why is openness to all this information important?

Mainly, because we should be able to trust AI before using it, like we need to trust any product before we use it. Some instances of what trustworthy AI might look like:

  • Model architecture can be studied to make further developments. For example, the publication of the “Attention Is All You Need” paper with details on the attention mechanisms enabled much of the recent developments in Large Language Models.
  • An AI auditor can look at the training datasets and methodology to identify potential legal and ethical issues.
  • A startup developing an LLM-based app for their customers can understand potential security issues with the app and address those to save their customers from harm.
  • A lot of social bias and potential harm to underprivileged communities can be scrutinized so they can be avoided or substantially mitigated.

However, as with all discussions of openness, the benefits of some level of privacy must be acknowledged. Information that affects the privacy or security of stakeholders, including trademark and copyright issues, may legitimately remain private. Ultimately, it's about finding the right trade-off to maximize social utility.

What next?

Now that we understand the value of openness and its visibility in AI, here are some actions the community can take.

We should develop a framework to define openness in AI.

The framework should cover all the information about a model that its users need to be aware of. Some efforts have already been made: Sunil Ramlochan makes the distinction between open source, open weights, and restricted weights and suggests a simple framework for openness in AI. We can consolidate similar efforts to develop a comprehensive framework for openness in AI.

We should encourage the practice of discussing openness of AI models/products, not just using them.

AI, as a community of practice, has enabled discussions on finetuning models and building products on top of them, pushing the limits of diffusing AI to the masses. In addition, we should also discuss openness. Openness is not only an idealistic concept for academic discussions, but also a property of the models that can enable or hinder innovation and usefulness.

AI creators/companies should make openness information more accessible during release.

Instead of burying limitations in license agreements, creators/companies can state, in accessible language, where their models lie on the spectrum of openness. That would help users understand the possibilities and limitations more easily, and reduce friction for the creators in enforcing compliance with the terms.

We should develop a community-supported index to track and discuss openness of AI models/products.

Leaderboards have been very helpful in facilitating discussions of the performance of newly released models. Since openness is more qualitative than benchmark performance, an index can be designed that represents the openness of models along various dimensions in quantitative or definitive qualitative terms. Open data has a rich history of using indices to assess the current state of openness and pinpoint areas for improvement. The Open Knowledge Foundation's Open Data Index and the Web Foundation's Open Data Barometer can serve as good references for an AI openness index. It could be hosted on a platform with good community support, for instance HuggingFace. [I was involved in the Open Data Index and Open Data Barometer as a country reviewer for Nepal.] Stanford University has recently launched the Foundation Model Transparency Index, which rates the openness of 10 large foundation models. That project can provide lessons for a more active, community-managed effort in which the openness of models can be assessed and compared soon after release.

We should increase community engagement in developing licenses for AI models.

Similar to how Creative Commons has made licensing content (text, images, etc.) easier, we need a variety of licenses that suit AI models, with substantial community engagement. A notable initiative is the OpenRAIL project, which has made a great start but still feels niche. The conversation about licensing needs to become more mainstream, and for that we need greater community engagement. Having been involved with the open data, open source software, and OpenStreetMap communities for over a decade, I know that vibrant community support is what makes open projects widely accessible.

Summing up

Open access to AI research, openly available neural network architectures, open weights, and general support for open source in various forms, even from large tech companies, have gotten us this far in making powerful AI more accessible. Openness in provenance information and source, and the freedom it enables, will help make the future of AI more trustworthy.