Green AI: Using Intelligence without melting the planet

[Image: AI chip glowing green, symbolizing sustainable intelligence]

AI · November 2025

Artificial intelligence doesn’t live in the cloud — it lives in servers, datacenters, and silicon. Each time a model answers a prompt, GPUs somewhere spin up, drawing hundreds of watts to process a few lines of text. A single NVIDIA H100 accelerator can draw around 700 watts under load; entire training clusters can reach the power demand of small cities.

According to the International Energy Agency (IEA, 2025), global datacenter electricity use already exceeds 415 TWh per year and could pass 1,000 TWh by 2030 — roughly the annual consumption of Japan. AI workloads account for a fast-growing share of that curve.

Training one large model can emit hundreds of tons of CO₂ and consume millions of liters of cooling water (MIT, 2025), while the number of deployed models keeps multiplying. Efficiency improves, but growth moves faster.

The environmental cost of AI isn’t theoretical anymore. It’s physical, measurable, and accelerating.

Responsibility is shared. Chip makers like NVIDIA, hyperscalers, model creators, and cloud providers carry the structural part of it, but users are not passive in this story: every query, image, or video generation carries a footprint, and billions of them add up.

Learning to use intelligence responsibly starts with understanding this reality: AI isn’t virtual — it’s physical, industrial, and resource-heavy. It’s reshaping how we work, create, and consume energy, and we share responsibility for lightening its load. Just as we learned to recycle, wear seatbelts, and rethink smoking indoors, the next cultural shift is learning what green technology means and how to use intelligence itself with care.

Glossary: Terms Worth Knowing

  • LLM (Large Language Model): An AI model trained on text data, usually with billions of parameters or more, used for tasks like writing, answering questions, or coding assistance.
  • Token: A small unit of text that the model reads or writes. More tokens mean more computation and therefore more energy consumed.
  • GPU: A graphics processing unit. It powers most modern AI computations and represents the majority of an AI server’s energy use.
  • CPU: A central processing unit. Slower than a GPU for large AI workloads, but often more energy efficient for smaller models or local inference.
  • PUE / WUE: Power Usage Effectiveness and Water Usage Effectiveness, metrics that express how efficient a data center is in terms of electricity and water.
  • Inference: The moment when an AI model generates an answer based on a prompt. This is the “thinking” phase of AI, and it runs every time you send a query.
  • Prompt: The instruction or question you send to an AI. Clear, focused prompts reduce retries and wasted computation.
  • Embedding / Semantic indexing: A way for AI to create a compact map of meaning for documents it has already processed. Instead of rereading everything for each new question, it can jump directly to the relevant parts.
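
To make the last entry concrete, here is a minimal sketch of semantic indexing, assuming the sentence-transformers library and a small local embedding model; the documents and query are placeholders only.

```python
# Minimal semantic index: embed documents once, reuse the embeddings for every query.
# Assumes `pip install sentence-transformers`; model name and documents are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedding model

documents = [
    "Our datacenter PUE improved from 1.6 to 1.3 after the cooling upgrade.",
    "The marketing plan for Q3 focuses on short-form video.",
    "GPU clusters are scheduled for maintenance next weekend.",
]

# One-time cost: embed every document and keep the vectors around.
doc_vectors = model.encode(documents, normalize_embeddings=True)

def search(query: str, top_k: int = 1):
    """Return the most relevant documents without re-reading the whole corpus."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector          # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in best]

print(search("How efficient is our data center?"))
```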

I. Why GPUs Matter and Why You Should Care

GPUs are the real engine of modern AI. They make large models fast, but they also make them energy-intensive. A single NVIDIA H100 can draw around 700 watts under heavy load. The newer NVIDIA Blackwell B200 accelerators, used in DGX and HGX data-center systems, can reach around 1,000 to 1,200 watts per unit in full-load configurations (Tom’s Hardware, 2024; Guru3D, 2024). Training a large model can require thousands of these running for days or weeks.

In comparison, a typical CPU uses roughly 100–150 watts. The gap is not subtle. GPUs are fantastic for performance, but they are not neutral for the grid. Reports from Deloitte (2025) and GreenIT’s Impacts environnementaux et sanitaires de l’IA (2025) confirm that AI servers can consume around four times more energy than traditional servers, with hardware lifetimes three to five times shorter.

There are alternatives, especially for smaller or local workloads. Toolchains like llama.cpp and MLC AI allow certain models to run on CPUs or low-power devices. New accelerators such as Intel Gaudi 3 claim up to 30 percent lower energy use for some AI workloads compared to traditional GPU-based setups. Mobile chips like Apple’s Neural Engine and Qualcomm’s Hexagon DSP also push some AI inference to efficient on-device hardware.

For now, though, most cloud AI still runs on GPU-heavy infrastructure. As users, the most effective lever we have is simple: choose lighter models when the task allows it, and avoid waking a giant when a small brain is enough.

II. Everyday Green AI: How We Can All Use It Smarter

AI isn’t magic: it’s electricity, water, and heat. Every query, every image, every “regenerate” wakes up real hardware somewhere, often powered by fossil grids. Here’s how to keep using intelligence without wasting it.

1. Choose the lightest mode

Most AI tools now offer several versions of the same model. Switch to “lite,” “fast,” or “mini” modes when you don’t need deep reasoning: a single toggle can reduce energy use by up to 70%. The same applies to temperature and verbosity: short, focused answers mean fewer tokens and fewer watts.
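
As an illustration of what “lightest mode” means when calling a model through an API, here is a hedged sketch assuming the OpenAI Python SDK; the model name, token cap, and temperature are examples, and the same idea applies to any provider.

```python
# Prefer a small model and cap output length: fewer tokens, fewer watts.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",      # lighter model instead of a flagship one
    max_tokens=150,           # hard cap on answer length
    temperature=0.2,          # focused answers, fewer retries
    messages=[
        {"role": "user", "content": "Summarize this meeting note in three bullet points: ..."}
    ],
)
print(response.choices[0].message.content)
```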

2. Don't be lazy, keep using search

If an answer already exists online, find it there instead of prompting an LLM. According to IEA data and the Washington Post (2025), one short AI query can consume as much energy as sending 30–50 emails.

3. Use smaller models when you can

Not every task needs a giant brain. Different AI models have very different energy footprints, mainly because of how much GPU compute they need. As of November 2025, here is a practical way to match your needs with the right type of model.

Grammar fixes, summaries, and short drafts run perfectly well on compact models like Gemma 2B, Phi-3 Mini, Llama 3 8B, or Mistral 7B. For everyday writing or rephrasing, lightweight tools such as Grammarly, LanguageTool, or QuillBot, often running locally or as browser extensions, are far more efficient than firing up a large LLM. Reserve big models for reasoning, analysis, or creative synthesis, not for routine.

4. Run AI locally

Lightweight models can now run directly on laptops or phones through tools like Ollama or llama.cpp, which support open models such as Llama 3, Mistral, Gemma, or Phi-3. Other desktop apps like LM Studio or GPT4All offer similar setups for local inference. Cloud models like ChatGPT, Claude, or Gemini don’t run locally yet, though Apple Intelligence and Copilot+ PCs now process some light AI tasks directly on-device. Running models locally avoids data transfers to remote servers and reduces dependency on GPU-heavy infrastructure: cleaner, faster, and private.
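
As a minimal sketch, this is what a fully local query looks like through Ollama’s built-in HTTP API, assuming Ollama is installed and running and a model such as llama3 has already been pulled.

```python
# Query a model running entirely on your own machine via Ollama's local HTTP API.
# Assumes Ollama is running (default port 11434) and the model has been pulled.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Rewrite this sentence in a friendlier tone: 'Send me the report now.'",
        "stream": False,       # return one complete answer instead of a token stream
    },
    timeout=120,
)
print(response.json()["response"])
```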

5. Prompt with intention

Think before you click “regenerate.” Take a moment to plan your request: what you need, in what format, and how detailed it should be. Well-framed prompts save both time and energy: fewer retries, fewer tokens, fewer watts. If you don’t know what you’re asking for, the model won’t know either. And if you only need a quick answer, say so: for example, “Give me three bullet points, no more than 50 words each.” Clarity is efficiency.

6. Avoid AI aggregators (for now)

Some AI dashboards and “orchestrators” send your query to several models at once to compare or merge answers. It sounds clever, but it multiplies the energy cost for a single question. Each model runs its own GPU inference, even if you only read one result in the end. Tools like Perplexity Pro or Mammoth AI offer great features for benchmarking, but they duplicate workloads across multiple APIs. Until routing becomes smart enough to select the most efficient model dynamically, prefer single-model tools or manual selection.

Some open-source frameworks such as LangFuse or Flowise already explore this kind of routing, but most implementations today still perform sequential or parallel inference, not optimized dispatch. In short, avoid waking multiple brains to solve one question.
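
Until that routing exists off the shelf, the idea can be approximated by hand. The sketch below is illustrative only: the model names and thresholds are assumptions, and the point is simply to send one question to one appropriately sized model instead of several.

```python
# Rule-based dispatch: pick one model per task instead of fanning out to many.
# Model names and thresholds are placeholders; adapt them to whatever you actually run.

LIGHT_MODEL = "small-local-model"     # grammar fixes, rewording, short answers
MEDIUM_MODEL = "medium-hosted-model"  # summaries, extraction, routine emails
HEAVY_MODEL = "flagship-model"        # multi-step reasoning, long analysis

def pick_model(prompt: str) -> str:
    text = prompt.lower()
    heavy_hints = ("analyze", "compare", "strategy", "step by step")
    if any(hint in text for hint in heavy_hints) or len(prompt) > 2000:
        return HEAVY_MODEL
    if len(prompt) > 400 or "summarize" in text:
        return MEDIUM_MODEL
    return LIGHT_MODEL

print(pick_model("Fix the grammar in this sentence."))        # small-local-model
print(pick_model("Summarize this 3-page meeting transcript"))  # medium-hosted-model
```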

7. Don't over-generate

Endless re-rolls of images, videos, or text burn real energy. Choose, refine, and stop. Each “just one more version” runs GPUs somewhere. Creation isn’t cleaner by repetition.

8. Prefer text to visuals

Generating one 1024×1024 image can use as much energy as sending several hundred emails; a one-minute AI-generated video can draw several kilowatt-hours, roughly like running a microwave for a couple of hours (MIT News, 2025). Use visuals when they add meaning, not just aesthetics.

9. Reuse your outputs

Keep a prompt notebook (a simple Notion page, Google Doc, or internal wiki) where you save prompts that worked well and the outputs you actually reused. Organize them by task: writing, data cleanup, image generation, summaries. The next time you need something similar, start from what already worked instead of regenerating from scratch. It’s digital recycling: faster next time, lighter every time.
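
For the more technically inclined, a prompt notebook can even be a small JSON file; the sketch below is one possible layout, with the file name and fields chosen purely for illustration.

```python
# A tiny reusable prompt notebook stored as JSON: save what worked, reuse it later.
# File name and fields are illustrative; adapt freely.
import json
from pathlib import Path

NOTEBOOK = Path("prompt_notebook.json")

def save_prompt(task: str, prompt: str, notes: str = "") -> None:
    entries = json.loads(NOTEBOOK.read_text()) if NOTEBOOK.exists() else []
    entries.append({"task": task, "prompt": prompt, "notes": notes})
    NOTEBOOK.write_text(json.dumps(entries, indent=2))

def find_prompts(task: str) -> list[dict]:
    if not NOTEBOOK.exists():
        return []
    return [e for e in json.loads(NOTEBOOK.read_text()) if e["task"] == task]

save_prompt(
    task="summary",
    prompt="Summarize the text below in three bullet points, max 50 words each.",
    notes="Worked well for meeting notes.",
)
print(find_prompts("summary"))
```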

If you want something more advanced, tools like Prompt Genie or Team-GPT let you save, organize, and share prompts across teams. They’re not perfect yet (most focus on collaboration and workflow, not energy tracking), but they’re a solid start for building a sustainable prompt library.

10. Be aware of invisible AI

Behind every feed, playlist, inbox, or ad, there’s machine learning running continuously. Recommendation engines at Netflix, YouTube, Spotify, and Instagram analyze millions of data points per second to keep content flowing, even when you’re not watching or listening. Email filters in Gmail or shopping suggestions on Amazon do the same. You can limit it slightly by disabling personalization, turning off “autoplay,” reducing notifications, or choosing simpler apps, but the real responsibility lies with the companies designing these systems to run 24/7. Awareness is the first step toward accountability.

How much energy does one AI action really use?

| Action | Energy (approx.) | Everyday equivalent |
| --- | --- | --- |
| One web search | 0.03–0.3 Wh | Light an LED bulb for about 5 seconds |
| One AI text query (GPT-4 class) | 0.3–3 Wh | Send 30–50 emails |
| One AI image (1024×1024 px) | 20–30 Wh | Run a microwave for around 2 minutes |
| One minute of AI video generation | 1–2 kWh | Bake dinner in an electric oven for about 30 minutes |
| Training a large AI model | ~1 GWh | Power around 150 homes for a year |

Sources: IEA 2024–2025, Washington Post 2025, MIT News 2025, Green IT 2025

Choose the right model for the task

| Use case | Model type | Examples (2025) | Hardware profile | Energy footprint |
| --- | --- | --- | --- | --- |
| Grammar fixes, rephrasing, short rewrites | Tiny or CPU-friendly LLMs | Gemma 2B, Phi-3 Mini, Llama 3 8B, Mistral 7B | Mostly CPU or small GPU | Around 0.3 Wh per query |
| Summaries, data extraction, simple emails | Medium models | Claude 3 Haiku, GPT-4 Mini, Mixtral 8x7B | Moderate GPU | Around 1 Wh per query |
| Reports, analysis, complex reasoning | Large flagship models | GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro | Multi-GPU setups | Around 2–3 Wh per query |
| Video or multimodal creation | Ultra-heavy multi-model systems | Kling AI, Sora, Runway Gen-3 Alpha | Large GPU clusters | Often 10–100× heavier than text |

Sources: OpenAI – GPT-4o System Card, Hugging Face – The Environmental Impact of AI, Hugging Face – AI Energy Score

III. The Heavy Side of Creativity

Generative visuals are where AI turns from clever to very heavy. A single one-minute AI-generated video can consume as much energy as running a microwave for two to three hours, according to analyses from MIT News (2025) and estimates compiled by the National Centre for AI (2025). A single 1024×1024 image generated by tools like Midjourney or DALL·E can use roughly the same energy as sending several hundred emails with attachments.

The problem is not creativity, it is waste. Iterations, upscales, minor tweaks, and “just one more version” can multiply the impact by a factor of ten or more. Behind each beautiful frame, there is a datacenter working hard.

How to Create More Responsibly

  • Test at small resolution first, then upscale only final choices.
  • Batch several variations in one request instead of regenerating from scratch each time (see the sketch after this list).
  • Reuse base compositions and edit them locally when possible.
  • For agencies and studios, log approximate energy or carbon per asset and include it in project reporting.
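
The first two tips can look like this in practice. The sketch below assumes a local Stable Diffusion pipeline through the Hugging Face diffusers library; the model ID, resolution, batch size, and step count are examples, not recommendations.

```python
# Draft at low resolution and in one batch, then upscale only the image you keep.
# Assumes `pip install diffusers transformers torch` and hardware able to run Stable Diffusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

drafts = pipe(
    "a green AI chip on a circuit board, studio lighting",
    height=512, width=512,          # small drafts, not final resolution
    num_images_per_prompt=4,        # one batch instead of four separate runs
    num_inference_steps=25,         # fewer denoising steps for drafts
).images

for i, image in enumerate(drafts):
    image.save(f"draft_{i}.png")
# Pick the best draft, then upscale or regenerate only that one at full quality.
```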

Fewer, more intentional generations tend to produce better work anyway. Constraint can be a creative tool.

IV. AI Sobriety for Teams

In most companies, AI use quickly turns chaotic: different tools, personal accounts, no visibility, and no idea how much energy or data is being consumed. The goal isn’t to block AI; it’s to organize it.

1. Centralize access

If 200 employees each use personal AI accounts, you get 200 unmanaged sessions and no control. A shared instance through OpenAI Enterprise, Anthropic Teams, or Microsoft Copilot Hub brings everything under one roof, with dashboards to monitor usage and optimize performance. Open-source options like Flowise or LangFuse let you build an internal AI layer that logs queries, monitors usage, and can route models based on simple rules you define.

2. Measure before managing

Use built-in analytics to track prompts, tokens, and models per workspace. Review these metrics regularly; they’re the foundation of both cost control and environmental awareness.
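
Where no dashboard exists yet, even a thin wrapper can make usage visible. The sketch below assumes the OpenAI Python SDK, whose responses expose a usage field; the CSV layout and model name are illustrative.

```python
# Log model, token counts, and timestamp for every call so usage becomes visible.
# Assumes `pip install openai`; the CSV file and fields are illustrative.
import csv
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()
LOG_FILE = "ai_usage_log.csv"

def tracked_completion(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    usage = response.usage
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            model,
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
        ])
    return response.choices[0].message.content

print(tracked_completion("gpt-4o-mini", "Give me three bullet points on data center PUE."))
```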

3. Educate and guide

Publish a short guide listing approved tools and when to use small, medium, or large models. For instance: Gemma 2B or Claude 3 Haiku for writing, GPT-4 Mini for analytics, Copilot for developers, Figma AI for designers. Good prompts and good model choices save both time and watts.

4. Host green

Infrastructure matters. Host your AI gateway on low-carbon providers like Infomaniak, Scaleway, or renewable-powered cloud regions from AWS. Combine that with caching and model routing to multiply the impact.

5. Build a culture of responsible intelligence

Monitoring should be about sustainability, not surveillance. When teams understand how prompts translate into cost and energy, they naturally adjust. Centralization, caching, and education turn AI use from a drain into an optimized, collective practice.

Each AI query alone feels harmless. It is the accumulation that creates the problem. The same accumulation, if we change habits, can help fix it. Green AI is not about banning intelligence, it is about teaching it to live within planetary boundaries.

Smarter prompts, smaller models, fewer retries, less vanity. If millions of people use AI just a little more consciously, the difference becomes visible at the grid level. The planet does not need you to stop creating. It needs you to think before you click.

References