🤖 Linking LLMs to Climate Problems #018
learning about language models from Julia Wu and Helena Merk
TLDR: Climate is a theme. AI is also a theme. At the intersection of these two is a particular type of problem. Specifically in climate, LLM applications can be used to glide through policy paperwork, coordinate within a massive, tangled web of stakeholders, and much more.
Yes, climate is a physical problem that requires physical solutions. But building all this stuff will take a lot of money. We have some of it already (the IRA is a great start) and still need plenty more, but there are already inefficiencies in getting the existing money deployed. Navigating policy and developing renewable projects are two problem areas where software can help. Not just software; not just AI; specifically large language models (LLMs).
When it comes to largely text-based, unstructured data, LLMs can organize and streamline information, eliminating tedious requirements and reducing time spent on indirect work. By "indirect work" I mean all the mundane climate-relevant tasks that aren't actually saving the planet: applying for grants, sending documents to stakeholders, filing for interconnection, and applying for permits. Although LLM-based software doesn't literally produce renewable electricity or suck carbon out of the air, it has the potential to turbocharge climate work in game-changing ways.
When it comes to the latest AI trends, I've been living under a rock, so I sat down with two founders building at the intersection of Climate x AI to understand how LLMs can be applied in climate. Julia Wu is exploring the renewable project development space and is also the creator of SolarGPT, a chatbot that answers your questions on all things clean energy. Helena Merk is the Co-founder and CEO of Streamline, helping climate companies access money via grant sourcing, qualifying, and drafting.
In our conversation, we cover:
Framing Climate x AI
Identifying Good Problems for LLMs
The Role of LLMs in Climate
Unstructured vs. Structured Data
Complexities in Renewable Project Development
Risks and Bottlenecks
What do you think of this intersection of two themes: Climate x AI? How do you even frame it? What is your model?
Helena: These two themes, Climate and AI, represent perhaps the most significant ongoing trends in the world right now. When it comes to AI, I would narrow it down further to emphasize that much of the recent progress we've witnessed pertains to large language models (LLMs).
For climate, the issue has been deteriorating gradually over an extended period. There's currently a rapid surge in funding, brought on by new legislation such as the bipartisan infrastructure law, the CHIPS Act, and the IRA. On top of this, we're seeing the emergence of other countries' equivalents to the IRA. This sudden global push places us on a more favorable path towards channeling appropriate investments into the necessary areas.
However, the current funding, impressive as it might be, falls short of the mark. To genuinely achieve the net-zero objectives set forth by various nations, we'll need trillions invested annually. Globally, this figure hovers around $4 trillion for clean energy alone, necessitating substantial financial commitment from governments to fuel climate-oriented initiatives. The gist is that to get to net zero, we'll need a complete overhaul of our built environment: a comprehensive reimagining and reconstruction process spanning everything from food production to supply chains and even the composition of concrete.
Now we finally have the political will to push things in the right direction. Similarly, with AI's recent breakthroughs, so much more is possible. Combining the two is what I'm currently most excited about: we can use LLMs to navigate policy and accelerate deployment of funding to impactful causes. Policy is traditionally a text-heavy, hard-to-navigate bureaucratic system that moves slowly because real-world stakeholders are at risk, and moving too quickly would be bad. That said, there are parts of it that move slowly for (what I would argue is) no reason.
Julia: We started having conversations with developers of commercial and community solar and noticed people asking the same questions over and over. They asked “What does the IRA say about low-income communities?” or “How do I apply evidence that I am meeting the prevailing wage requirement as dictated by the IRA?” “How much domestic material do I need to qualify for the domestic content stuff?”
So, to Helena's point, you have people doing the same process: plowing through policy documents, analyzing them, and putting the paperwork together, which then gets analyzed by more and more people. There's a lot at stake when you're talking about tax credits: the risk of tax credit fraud, and credits getting clawed back.
There's a huge policy aspect to all of this, and a lot of repetitive work being done. In addition, utility-scale projects involve a lot of paperwork beyond policy analysis: interconnection, permitting, design, feasibility studies, engineering reports, and appraisal reports. These run to hundreds of pages in total.
Identifying the problem first, building the solution second
“If all you have is a hammer, everything looks like a nail." It would be a mistake to build the solution (LLMs) and then go looking for the right problem to tackle. What was your journey to uncover the problem before realizing that LLMs could be a potential solution?
Julia: Somebody was like “Julia, you should read the IRA with a magnifying glass.” And I was like, I don't wanna do that. It's hundreds of pages and it's a very important document, but I don't wanna read the whole thing. So I just kept thinking “Where are the summaries of the IRA? What are the key points? Where do I find all the synthesis?”
For my own sake of understanding the IRA, I would love for there to be a Q&A bot for this. People want to get fast answers and I also saw forums where people kept asking the same questions. I realized how much repetitive work and paperwork there is, and how much time people are spending to try to understand all of this.
But there's also the non-AI part. Developers of distributed solar, wind, and storage are scrambling to understand what they need to do in order to either develop in a new state or territory. It's a lot of work. They miss documents and by the time they go through due diligence either to get funding or to sell their portfolios, their underwriters and acquirers are coming back to them and saying “Hey, you're missing this study about endangered species.” or “You haven't looked at this yet.” and so they have to go back.
Helena: I fan-girled over the IRA. For a good amount of time it looked like it was not going to pass (in case you don't remember, we had the Build Back Better Act, and that totally fell through). Then, seemingly out of nowhere, the IRA passes and I'm like, holy crap! I need to go into a deep dark hole, bundle up with some tea, read this end-to-end, and then read it again.
At this point in time, my co-founder and I were 9 months into building a company focused on pre-financing carbon credit projects. Pre-IRA, working on increasing liquidity for project developers was the highest-leverage way to accelerate regeneration. Even though our approach today is very different, it's always been about "How do we get money to climate companies?"
When the IRA passed, it became obvious that it would change a lot. History tends to repeat itself, and a working thesis was that winners would be chosen just as solar was chosen as a winner among renewables a decade ago. Studying the IRA helped build a mental model around which companies would suddenly have viable business plans, how companies should pivot, and more. What also stood out was that while there was all this money, it was extremely hard to access.
To validate this assumption we started talking to everyone we knew in climate. Everyone was overwhelmed by how to navigate the IRA, had no bandwidth to do the research, and started asking me for advice and pointers on where to start diving into policy.
Long story short, our process didn't start with "Oh, there's LLMs. Can we shove this into climate?" It was more like "This is a problem. Let's help companies navigate it." Then language models like GPT-4 launched, and suddenly there was a new way to navigate paperwork-heavy sectors like policy.
The Solution and the Role of LLMs
How are you thinking about solving the problem and the role that LLMs play?
Helena: Our mission is to accelerate funding for climate technologies. Given that much of that funding comes via government programs, it's been a priority to build tools to navigate policy.
Our current focus is grants, a non-dilutive funding option for technologically risky projects. We’re building workflows for grant sourcing, drafting, and reviewing that heavily lean on LLMs. As we expand past grants to other capital planning tools, we see the focus on LLM features evolving.
As an aside: my higher-level opinion on LLMs is that they are very much an ingredient, and should be an ingredient in almost every company. In a few years, saying you are an "LLM company" will be akin to saying you're a "cloud company." Every product will have components that are enhanced by semantic search or LLMs.
Beyond grants, there are several other government-based funding opportunities for the climate transition, from loans to debt to product financing. All of these pathways involve similarly challenging application workflows (e.g., applying to the Loan Programs Office). Right now this is solved by hiring entire teams of finance and policy experts. Streamline exists to help companies navigate this messy capital journey from SBIR grants to the LPO.
Julia, how are you thinking about tackling this equally complex web of renewable energy stakeholders for project development?
Julia: One thing to add to Helena's point: sometimes you apply to these grants and realize they were never meant for you. It actually happened to me. I applied a few months ago to the Community Power Accelerator from the DOE. It's meant to support small-scale developers of community solar, but I thought maybe they'd offer grants to people building technology to support developers. Nope. I filled out the whole application and answered so many questions in depth, and they came back to me; of course, it's written somewhere in the fine print that you need to at least have a plan for developing projects.
In the world of distributed generation, we help developers speed up their process by organizing their documents and building an understanding of the requirements and incentives in each jurisdiction. Instead of hiring a policy analyst whose role is to identify new markets and study new guidelines, what if you could just use a tool to find optimal sites from a real estate perspective and to learn the incentives, requirements, and interconnection process for each location? Once you're ready to develop, imagine LLM-powered software that helps you track applications for interconnection and permitting and parses through the legalese of state-specific policies and incentives in your area of interest.
Unstructured vs. Structured Data
Helena you had this point on the power of LLMs for unstructured data. What is the difference between structured and unstructured data in climate policy?
Helena: To start, we should define unstructured vs. structured data.
Structured data you can think of as an Excel sheet: you have nice headings and columns, every field has some kind of label, and you can apply an equation to it, make a cool model, etc.
So much of building an AI model involves cleaning data. For images, this includes tagging and labeling: associating unstructured data with the structure of a label. The goal is to get things into a structured format so that we can manipulate the data, connect the dots, and create logic. It's really hard to create logic if all you have is a blob of text.
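To make the distinction concrete, here's a minimal sketch of turning a free-text grant blurb into labeled fields. The field names, regex patterns, and sample blurb are all my own illustration (not from either company's product), and real pipelines would be far more robust:

```python
import re

def structure_grant_blurb(text: str) -> dict:
    """Toy 'unstructured -> structured' step: pull labeled fields
    out of a blob of text using hand-written patterns."""
    amount = re.search(r"\$([\d,]+)", text)
    deadline = re.search(r"due\s+(\w+ \d{1,2}, \d{4})", text)
    state = re.search(r"\b(Arizona|Illinois|Alaska|North Dakota)\b", text)
    return {
        "amount_usd": int(amount.group(1).replace(",", "")) if amount else None,
        "deadline": deadline.group(1) if deadline else None,
        "state": state.group(1) if state else None,
    }

blurb = ("Community solar grants of up to $250,000 are available to "
         "Arizona developers; applications are due March 1, 2024.")
print(structure_grant_blurb(blurb))
# {'amount_usd': 250000, 'deadline': 'March 1, 2024', 'state': 'Arizona'}
```

Once the text is in this shape, it can feed a spreadsheet, a database, or downstream logic; the hard part is that hand-written patterns like these break constantly, which is where LLM-based extraction comes in.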
On to the second part of your question: why does this unstructured vs. structured data distinction actually matter for understanding policy?
Rules are not structured or codified in any way. Rules and regulations are presented as long narrative documents, and sifting through them is currently very manual. Let's take a simple example: parsing a grant opportunity.
You want to check if you meet the requirements. You can scroll through looking for them, or Command-F the document and search for the word "requirement." Then you click through all the matches and manually sift to understand what applies to you. You then manually check whether you meet these requirements before continuing. In your head, you are matchmaking between your understanding of the policy and your understanding of your own organization/proposal.
There are really two high-level solutions for making this entire process easier. The first is to present policy in a structured way. The second, which is only possible now, is to use language models that have been trained on the entire internet.
What language models make possible is semantic search: I can pretty much search based on meaning. Rather than searching "requirements" and then Command-F-ing my way through the document, I can ask a question like "Based on my profile, what are the requirements specific to me?"
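The gap between keyword search and meaning-based retrieval can be sketched with a toy scoring function. Real systems embed text with a learned model; here a bag-of-words vector and cosine similarity stand in so the example is self-contained (the passages and query are invented):

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: a bag-of-words count vector.
    # A real embedding model would also match "requirements" to "must".
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

passages = [
    "Applicants must demonstrate prevailing wage compliance.",
    "The program office is open weekdays from 9 to 5.",
    "Eligible projects serve low-income communities.",
]

# Retrieve the passage most similar to the question.
query = "what wage requirements apply to my application"
best = max(passages, key=lambda p: cosine(embed(query), embed(p)))
print(best)  # the prevailing-wage passage scores highest
```

The design point is ranking by similarity of meaning rather than exact string matches; swapping the toy `embed` for a real embedding model turns this into the semantic search Helena describes.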
The problem only gets more interesting from here.
Not only are single documents hard to parse, they are also dispersed across the internet on various local government websites. As an example, look at bike rebate programs. If you wanted to build a map of bike rebates, you'd first have to go city by city, or state by state, and collect all of the policies. You'd have to establish whether each has a policy in place, find the details, and put them into a spreadsheet or other database. Finally, you could create an interface that points to this spreadsheet. These policies likely change every quarter, so you'd need to redo a lot of this work.
This isn't a theoretical problem; it's the core of building a planning tool for any real-world infrastructure. Paces does this for renewable energy, and Nira for interconnection points. They all face the same problem: decentralized data in very messy formats. They turn it into a clean, structured format and then create planning software on top of it.
Now that language models are around, these companies are most likely thinking about different ways of doing this, because before language models, the only way to create planning software on top of decentralized data like this was by hand.
Now you can theoretically create agents that scrape every single local government website, turn the contents into structured data, and help people create planning software on top of it. So back to the bike example: every government is putting out documents and PDFs, 20+ pages of rules to follow, and for the first time, we can create logic rules from them. Like If This, Then That rules. That's extremely exciting and hasn't been possible in policy before. That's why I think LLM x Policy is probably one of the most interesting intersections.
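Once the rules are extracted into structured data, the "If This, Then That" logic becomes ordinary code. A hedged sketch for the bike-rebate example, where every field name and value is invented (in practice an LLM or a person would populate these records from each city's documents):

```python
# Structured rules, as an extraction step might produce them.
# Cities, income caps, and rebate amounts here are made-up examples.
rules = [
    {"city": "Denver", "max_income_usd": 86_900, "rebate_usd": 450,
     "requires_residency": True},
    {"city": "Oakland", "max_income_usd": None, "rebate_usd": 300,
     "requires_residency": True},
]

def eligible_rebates(applicant: dict) -> list[dict]:
    """If the applicant meets a city's conditions, then offer its rebate."""
    matches = []
    for rule in rules:
        if rule["requires_residency"] and applicant["city"] != rule["city"]:
            continue  # If non-resident, then skip this city's program.
        if (rule["max_income_usd"] is not None
                and applicant["income_usd"] > rule["max_income_usd"]):
            continue  # If over the income cap, then ineligible.
        matches.append(rule)
    return matches

applicant = {"city": "Denver", "income_usd": 60_000}
print([r["city"] for r in eligible_rebates(applicant)])  # ['Denver']
```

The interesting shift is that the hard part moves from writing this trivial logic to reliably extracting the `rules` records from 20+ pages of narrative PDF per jurisdiction.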
The Complexity of Renewable Project Development
Julia: To your point, Matt, the financing piece, the construction piece, and the consumption (aka offtake) are just the tip of the iceberg. The reality is that by the time something is even fundable, it's already significantly de-risked.
In order to even get there, there are a few steps that involve a lot of different stakeholders. The first step is just making a decision about where to build a solar project. This might be out in the open or on top of a shopping mall, but regardless, there's that initial decision of "Where do I even place this?"
How feasible is this? What do I see if I look at aerial images? Which sites are closest to an interconnection point? Which have healthy transmission feasibility nearby? The highest likelihood of passing permitting? Of getting the appropriate environmental approvals? This all means a lot of text, applications, PDFs, financial models, and conversations and phone calls with property owners and utilities.
These solar and storage portfolios are cash-generating products that will produce a recurring stream of cash over the next 15 to 20 years. There's an upfront investment that goes into it. And at any point in these stages, M&A might happen. Whenever portfolios trade hands, documents need to trade hands: PDFs, text files, financial models, and beefed-up Excel spreadsheets. I have never seen such complex and robust spreadsheets. These professionals dedicate hours to putting them together, and one small input may influence the entire project's profitability. On top of that, there's fragmentation among stakeholders: policy analysts, developers, external parties, etc.
Let me share a bit about why people use SolarGPT. People want Q&A against external legislation and resources. One person said “Our group in Arizona is facing some horrible hurdles to get community solar up and running. We're hoping that there might be something in the law that circumvents or nullifies some of the utilities’ opposition to community solar.”
Some people want quick answers to IRA-related questions. Some people want incentive structure information. And some actually want advice on grant writing. Others want internal Q&A against internal development documents.
I used to think that it was just for external use cases, but because there's so much internal data, it’s easy to lose track of things. In terms of actual queries, people are asking “Are there grants for community solar projects in Arizona?” “Does solar work in Alaska?” “Can you summarize the solar crediting policy for the Dakota Valley Electric co-op in North Dakota?” “What are the steps to develop a community solar project in Illinois?” As you can see, there are a lot of very specific questions state by state. And some people know exactly what policy they're looking for answers to, but they just don't know if the policy is ready yet.
What risks and bottlenecks are you aware of?
Helena: The largest risk is that the amount of funding that we need from governments doesn't actually manifest.
Julia: I had one developer say that they're spending millions every year on project valuation reports. The reason it costs so much is that consultants manually generate these appraisals. One challenge: if we generate appraisal reports using LLMs, parsing through the same materials these consultants parse and spitting out a valuation, will they be considered credible? If they are, we can save developers a lot of money. But there might also be opposition from the companies that generate these reports today. So there's a question of "Does AI-generated output carry the same weight, even if it's created with the same decision process a human uses?"
The other risk is developers saying, "Yeah, my biggest bottleneck is that permitting and interconnection take forever to get approved. Can you go just talk to the utilities? Can you make them more willing?" Just as governments don't have incentives to improve their workflows, utilities don't want to do solar developers a favor. So are you really going to innovate from the utility side? I really hope people are pushing for that, but it's really hard. So I'm worried that one bottleneck is going to be a purely external one that would require policy to change.
Turning it to you 🫵
I’m curious - what other topics do you want me to look into? Are you fired up about a particular thing? If you wanna collab (potential guest post?) or just casually chat, let me know in the comments or via email!