Green AI: Using Intelligence Without Melting the Planet
Artificial intelligence doesn’t live in the cloud — it lives in servers, datacenters, and silicon. Each time a model answers a prompt, GPUs somewhere spin up, drawing hundreds of watts to process a few lines of text. A single NVIDIA H100 accelerator can draw around 700 watts under load; entire training clusters can reach the power demand of small cities.
According to the International Energy Agency (IEA, 2025), global datacenter electricity use already exceeds 415 TWh per year and could pass 1,000 TWh by 2030 — roughly the annual consumption of Japan. AI workloads account for a fast-growing share of that curve.
Training one large model can emit hundreds of tons of CO₂ and consume millions of liters of cooling water (MIT, 2025), while the number of deployed models keeps multiplying. Efficiency improves, but growth moves faster.
The environmental cost of AI isn’t theoretical anymore. It’s physical, measurable, and accelerating.
Responsibility is shared: chip makers like NVIDIA, hyperscalers, model creators, and cloud providers all carry a structural part of it. But users are not passive in this story. Every query, image, or video generation carries a footprint, and billions of them add up.
Learning to use intelligence responsibly starts with understanding this reality: AI isn’t virtual — it’s physical, industrial, and resource-heavy. It’s reshaping how we work, create, and consume energy, and we share responsibility for lightening its load. Just as we learned to recycle, wear seatbelts, and rethink smoking indoors, the next cultural shift is learning what green technology means — and how to use intelligence itself.
Index: Terms Worth Knowing
- LLM (Large Language Model): An AI model trained on text data, usually with billions of parameters or more, used for tasks like writing, answering questions, or coding assistance.
- Token: A small unit of text that the model reads or writes. More tokens mean more computation and therefore more energy consumed.
- GPU: A graphics processing unit. It powers most modern AI computations and represents the majority of an AI server’s energy use.
- CPU: A central processing unit. Slower than a GPU for large AI workloads, but often more energy efficient for smaller models or local inference.
- PUE / WUE: Power Usage Effectiveness and Water Usage Effectiveness, metrics that express how efficient a data center is in terms of electricity and water.
- Inference: The moment when an AI model generates an answer based on a prompt. This is the “thinking” phase of AI, and it runs every time you send a query.
- Prompt: The instruction or question you send to an AI. Clear, focused prompts reduce retries and wasted computation.
- Embedding / Semantic indexing: A way for AI to create a compact map of meaning for documents it has already processed. Instead of rereading everything for each new question, it can jump directly to the relevant parts.
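To make that last term concrete, here is a minimal Python sketch of semantic indexing. It assumes the sentence-transformers library and its small all-MiniLM-L6-v2 encoder, which are not mentioned above and are just one lightweight option; the documents and query are illustrative.
```python
# Minimal sketch of semantic indexing, assuming the sentence-transformers
# package is installed and the small all-MiniLM-L6-v2 encoder is available.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact, CPU-friendly encoder

# Embed the documents once; the vectors act as a reusable "map of meaning".
docs = [
    "Data centers consume both electricity and cooling water.",
    "GPUs draw far more power than CPUs under AI workloads.",
    "Clear prompts reduce retries and wasted tokens.",
]
doc_embeddings = model.encode(docs)

# For each new question, compare against the index instead of rereading everything.
query_embedding = model.encode("Why are GPUs so energy-hungry?")
scores = util.cos_sim(query_embedding, doc_embeddings)
print(docs[int(scores.argmax())])  # jump straight to the most relevant document
```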
I. Why GPUs Matter and Why You Should Care
GPUs are the real engine of modern AI. They make large models fast, but they also make them energy-intensive. A single NVIDIA H100 can draw up to around 700 watts under heavy load. The newer NVIDIA Blackwell B200 accelerators, used in DGX and HGX data-center systems, can reach around 1,000 to 1,200 watts per unit in full-load configurations (Tom’s Hardware, 2024; Guru3D, 2024). Training a large model can require thousands of these running for days or weeks.
In comparison, a typical CPU uses roughly 100–150 watts. The gap is not subtle. GPUs are fantastic for performance, but they are not neutral for the grid. Reports from Deloitte (2025) and GreenIT’s Impacts environnementaux et sanitaires de l’IA (2025) estimate that AI servers can consume around four times more energy than traditional servers, with hardware lifetimes three to five times shorter.
There are alternatives, especially for smaller or local workloads. Toolchains like llama.cpp and MLC AI allow certain models to run on CPUs or low-power devices. Intel claims up to 30 percent lower energy use for its new Gaudi 3 accelerator on some AI workloads compared to traditional GPU-based setups. Mobile chips like Apple’s Neural Engine and Qualcomm’s Hexagon DSP also push some AI inference to efficient on-device hardware.
For now, though, most cloud AI still runs on GPU-heavy infrastructure. As users, the most effective lever we have is simple: choose lighter models when the task allows it, and avoid waking a giant when a small brain is enough.
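As a concrete illustration of the CPU route, here is a minimal sketch using llama-cpp-python, the Python bindings for llama.cpp. It assumes a small quantized model in GGUF format has already been downloaded; the file path and settings are illustrative.
```python
# Minimal sketch: running a small quantized model on a laptop CPU with
# llama-cpp-python (Python bindings for llama.cpp). The model path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # assumed local GGUF file
    n_ctx=2048,   # a modest context window keeps memory use low
    n_threads=8,  # use CPU threads instead of waking a data-center GPU
)

result = llm(
    "Summarize in one sentence why GPUs use more power than CPUs.",
    max_tokens=64,  # cap the output: fewer tokens, fewer watts
)
print(result["choices"][0]["text"])
```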
II. Everyday Green AI: How We Can All Use It Smarter
AI isn’t magic: it’s electricity, water, and heat. Every query, every image, every “regenerate” wakes up real hardware somewhere, often powered by fossil grids. Here’s how to keep using intelligence without wasting it.
1. Choose the lightest mode
Most AI tools now offer several versions of the same model. Switch to “lite,” “fast,” or “mini” modes when you don’t need deep reasoning: a single toggle can reduce energy use by up to 70%. The same applies to verbosity and output length: short, focused answers mean fewer tokens and fewer watts.
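If you reach models through an API rather than a chat interface, the same choice is a one-line change. Here is a minimal sketch with the OpenAI Python SDK; the model name and token limit are illustrative picks, not recommendations.
```python
# Minimal sketch with the OpenAI Python SDK: pick a lighter model and cap verbosity.
# The model name and token limit are illustrative choices.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # a "mini" tier model instead of a flagship
    max_tokens=150,       # short, focused answers mean fewer tokens
    messages=[
        {"role": "user", "content": "Give me three bullet points on reducing AI energy use."}
    ],
)
print(response.choices[0].message.content)
```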
2. Don't be lazy, keep using search
If an answer already exists online, find it there instead of prompting an LLM. According to IEA data and the Washington Post (2025), one short AI query can consume as much energy as sending 30–50 emails.
3. Use smaller models when you can
Not every task needs a giant brain. Different AI models have very different energy footprints, mainly because of how much GPU compute they need. As of November 2025, here is a practical way to match your needs with the right type of model.
Grammar fixes, summaries, and short drafts run perfectly well on compact models like Gemma 2B, Phi-3 Mini, Llama 3 8B, or Mistral 7B. For everyday writing or rephrasing, lightweight tools such as Grammarly, LanguageTool, or QuillBot, often running locally or as browser extensions, are far more efficient than firing up a large LLM. Reserve big models for reasoning, analysis, or creative synthesis, not for routine.
4. Run AI locally
Lightweight models can now run directly on laptops or phones through tools like Ollama or llama.cpp, which support open models such as Llama 3, Mistral, Gemma, or Phi-3. Other desktop apps like LM Studio or GPT4All offer similar setups for local inference. Cloud models like ChatGPT, Claude, or Gemini don’t run locally yet, though Apple Intelligence and Copilot+ PCs now process some light AI tasks directly on-device. Running models locally avoids data transfers to remote servers and reduces dependency on GPU-heavy infrastructure: cleaner, faster, and private.
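As a minimal sketch, here is what local inference looks like through Ollama’s Python client, assuming the Ollama app is running and a small model has already been pulled (for example with the command ollama pull llama3).
```python
# Minimal sketch of on-device inference via the Ollama Python client.
# Assumes Ollama is running locally and `ollama pull llama3` was done beforehand.
import ollama

response = ollama.chat(
    model="llama3",  # an 8B model that fits on a recent laptop
    messages=[{"role": "user", "content": "Rewrite this more concisely: AI is not virtual, it is physical."}],
)
print(response["message"]["content"])  # no data leaves the machine
```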
5. Prompt with intention
Think before you click “regenerate.” Take a moment to plan your request: what you need, in what format, and how detailed it should be. Well-framed prompts save both time and energy: fewer retries, fewer tokens, fewer watts. If you don’t know what you’re asking for, the model won’t know either. And if you only need a quick answer, say so: for example, “Give me three bullet points, no more than 50 words each.” Clarity is efficiency.
6. Avoid AI aggregators (for now)
Some AI dashboards and “orchestrators” send your query to several models at once to compare or merge answers. It sounds clever, but it multiplies the energy cost for a single question. Each model runs its own GPU inference, even if you only read one result in the end. Tools like Perplexity Pro or Mammoth AI offer great features for benchmarking, but they duplicate workloads across multiple APIs. Until routing becomes smart enough to select the most efficient model dynamically, prefer single-model tools or manual selection.
Some open-source frameworks such as LangFuse or Flowise already explore this kind of routing, but most implementations today still perform sequential or parallel inference, not optimized dispatch. In short, avoid waking multiple brains to solve one question.
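Until that happens, a rough version of “one question, one model” can be encoded in a few lines. The sketch below is hypothetical: the model tiers, keywords, and threshold are assumptions, not any existing product’s API.
```python
# Hypothetical sketch of single-model routing: pick one model per request
# instead of querying several in parallel. Tiers, keywords, and threshold are assumptions.
LIGHT_MODEL = "small-local-model"  # e.g. a 7-8B model for routine text
HEAVY_MODEL = "large-cloud-model"  # reserved for genuine reasoning tasks

HEAVY_KEYWORDS = ("analyze", "compare", "strategy", "prove", "architecture")

def route(prompt: str) -> str:
    """Return the single model that should handle this prompt."""
    needs_reasoning = len(prompt) > 600 or any(k in prompt.lower() for k in HEAVY_KEYWORDS)
    return HEAVY_MODEL if needs_reasoning else LIGHT_MODEL

print(route("Fix the grammar in this sentence."))            # -> small-local-model
print(route("Analyze the trade-offs of our architecture."))  # -> large-cloud-model
```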
7. Don't over-generate
Endless re-rolls of images, videos, or text burn real energy. Choose, refine, and stop. Each “just one more version” runs GPUs somewhere. Creation isn’t cleaner by repetition.
8. Prefer text to visuals
Generating one 1024×1024 image can use as much energy as sending several hundred emails; a one-minute AI-generated video can draw several kilowatt-hours, roughly like running a microwave for a couple of hours (MIT News, 2025). Use visuals when they add meaning, not just aesthetics.
9. Reuse your outputs
Keep a prompt notebook (a simple Notion page, Google Doc, or internal wiki) where you save prompts that worked well and the outputs you actually reused. Organize them by task: writing, data cleanup, image generation, summaries. The next time you need something similar, start from what already worked instead of regenerating from scratch. It’s digital recycling: faster next time, lighter every time.
If you want something more advanced, tools like Prompt Genie or Team-GPT let you save, organize, and share prompts across teams. They’re not perfect yet (most focus on collaboration and workflow, not energy tracking), but they’re a solid start for building a sustainable prompt library.
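Even without a dedicated tool, a prompt library can be as small as a script and a JSON file. Here is a minimal sketch; the file name and fields are illustrative and meant to be adapted.
```python
# Minimal sketch of a reusable prompt notebook stored as a JSON file.
# The file name and fields are illustrative; adapt them to your workflow.
import json
from pathlib import Path

LIBRARY = Path("prompt_library.json")

def save_prompt(task: str, prompt: str, notes: str = "") -> None:
    """Append a prompt that worked well, grouped by task."""
    data = json.loads(LIBRARY.read_text()) if LIBRARY.exists() else {}
    data.setdefault(task, []).append({"prompt": prompt, "notes": notes})
    LIBRARY.write_text(json.dumps(data, indent=2))

def find_prompts(task: str) -> list:
    """Reuse what already worked instead of regenerating from scratch."""
    data = json.loads(LIBRARY.read_text()) if LIBRARY.exists() else {}
    return data.get(task, [])

save_prompt("summaries", "Summarize in three bullet points, max 50 words each.")
print(find_prompts("summaries"))
```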
10. Be aware of invisible AI
Behind every feed, playlist, inbox, or ad, there’s machine learning running continuously. Recommendation engines like Netflix, YouTube, Spotify, or Instagram analyze millions of data points per second to keep content flowing, even when you’re not watching or listening. Email filters in Gmail or shopping suggestions on Amazon do the same. You can limit it slightly by disabling personalization, turning off “autoplay,” reducing notifications, or choosing simpler apps, but the real responsibility lies with companies designing these systems to run 24/7. Awareness is the first step toward accountability.
How much energy does one AI action really use?
| Action | Energy (approx.) | Everyday equivalent |
|---|---|---|
| One web search | 0.03–0.3 Wh | Light an LED bulb for about 5 seconds |
| One AI text query (GPT-4 class) | 0.3–3 Wh | Send 30–50 emails |
| One AI image (1024×1024 px) | 20–30 Wh | Run a microwave for around 2 minutes |
| One minute of AI video generation | 1–2 kWh | Bake dinner in an electric oven for about 30 minutes |
| Training a large AI model | ~1 GWh | Power around 150 homes for a year |
Sources: IEA 2024–2025, Washington Post 2025, MIT News 2025, Green IT 2025
Choose the right model for the task
| Use case | Model type | Examples (2025) | Hardware profile | Energy footprint |
|---|---|---|---|---|
| Grammar fixes, rephrasing, short rewrites | Tiny or CPU-friendly LLMs | Gemma 2B, Phi-3 Mini, Llama 3 8B, Mistral 7B | Mostly CPU or small GPU | Around 0.3 Wh per query |
| Summaries, data extraction, simple emails | Medium models | Claude 3 Haiku, GPT-4o mini, Mixtral 8x7B | Moderate GPU | Around 1 Wh per query |
| Reports, analysis, complex reasoning | Large flagship models | GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro | Multi-GPU setups | Around 2–3 Wh per query |
| Video or multimodal creation | Ultra-heavy multi-model systems | Kling AI, Sora, Runway Gen-3 Alpha | Large GPU clusters | Often 10–100× heavier than text |
Sources: OpenAI – GPT-4o System Card, Hugging Face – The Environmental Impact of AI, Hugging Face – AI Energy Score
III. The Heavy Side of Creativity
Generative visuals are where AI turns from clever to very heavy. A single one-minute AI-generated video can consume as much energy as running a microwave for two to three hours, according to analyses from MIT News (2025) and estimates compiled by the National Centre for AI (2025). A single 1024×1024 image generated by tools like Midjourney or DALL·E can use roughly the same energy as sending several hundred emails with attachments.
The problem is not creativity, it is waste. Iterations, upscales, minor tweaks, and “just one more version” can multiply the impact by a factor of ten or more. Behind each beautiful frame, there is a datacenter working hard.
How to Create More Responsibly
- Test at small resolution first, then upscale only final choices.
- Batch several variations in one request instead of regenerating from scratch each time (see the sketch below).
- Reuse base compositions and edit them locally when possible.
- For agencies and studios, log approximate energy or carbon per asset and include it in project reporting.
Fewer, more intentional generations tend to produce better work anyway. Constraint can be a creative tool.
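As an example of the first two points, here is a minimal sketch that batches several low-resolution drafts in a single request, using the OpenAI Images API with DALL·E 2, one of several image APIs that expose size and count parameters. Only the chosen draft would then be regenerated or upscaled at full quality; the prompt is illustrative.
```python
# Minimal sketch: batch several low-resolution drafts in one request, then upscale
# only the final pick. Uses the OpenAI Images API with DALL·E 2, which supports
# multiple variations per call; the prompt is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

drafts = client.images.generate(
    model="dall-e-2",
    prompt="Isometric illustration of a green data center surrounded by trees",
    size="256x256",  # test small first
    n=4,             # four drafts in a single request
)
for image in drafts.data:
    print(image.url)  # review the drafts, then upscale only the chosen one
```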
IV. AI Sobriety for Teams
In most companies, AI use quickly turns chaotic. Different tools, personal accounts, no visibility, and no idea how much energy or data is being consumed. The goal isn’t to block AI; it’s to organize it.
1. Centralize access
If 200 employees each use personal AI accounts, you get 200 unmanaged sessions and no control. A shared instance through OpenAI Enterprise, Anthropic Teams, or Microsoft Copilot Hub brings everything under one roof, with dashboards to monitor usage and optimize performance. Open-source options like Flowise or LangFuse let you build an internal AI layer that logs queries, monitors usage, and can route models based on simple rules you define.
2. Measure before managing
Use built-in analytics to track prompts, tokens, and models per workspace. Review these metrics regularly; they’re the foundation of both cost control and environmental awareness.
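If your tools don’t expose analytics, a rough measurement layer is a few lines of logging around each call. Here is a minimal sketch that estimates token counts with the tiktoken library and appends them to a shared CSV; the file name, fields, and encoding choice are illustrative.
```python
# Minimal sketch of usage logging: estimate token counts with tiktoken and append
# them to a CSV per workspace. File name, fields, and encoding are illustrative.
import csv
import datetime
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")  # a common tokenizer; counts are estimates

def log_usage(workspace: str, model: str, prompt: str, completion: str) -> None:
    """Append one row of prompt/completion token counts to a shared usage log."""
    row = [
        datetime.datetime.now().isoformat(timespec="seconds"),
        workspace,
        model,
        len(encoder.encode(prompt)),
        len(encoder.encode(completion)),
    ]
    with open("ai_usage_log.csv", "a", newline="") as f:
        csv.writer(f).writerow(row)

log_usage("marketing", "gpt-4o-mini", "Draft a 100-word product blurb.", "Here is a blurb...")
```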
3. Educate and guide
Publish a short guide listing approved tools and when to use small, medium, or large models. For instance: Gemma 2B or Claude 3 Haiku for writing, GPT-4o mini for analytics, Copilot for developers, Figma AI for designers. Good prompts and good model choices save both time and watts.
4. Host green
Infrastructure matters. Host your AI gateway on low-carbon providers like Infomaniak, Scaleway, or renewable-powered cloud regions from AWS. Combine that with caching and model routing to multiply the impact.
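Caching is the easiest of those levers to sketch: an identical question should never trigger a second inference. Below is a minimal in-memory version keyed on a normalized prompt; a real gateway would typically use a shared store such as Redis, and call_model here is only a placeholder.
```python
# Minimal sketch of response caching: identical prompts are answered from memory
# instead of triggering a second GPU inference. call_model is a placeholder.
import hashlib

_cache: dict[str, str] = {}

def call_model(prompt: str) -> str:
    """Placeholder for the real API call to whichever model the gateway routes to."""
    return f"(model answer for: {prompt})"

def cached_answer(prompt: str) -> str:
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:  # only the first occurrence hits the model
        _cache[key] = call_model(prompt)
    return _cache[key]

print(cached_answer("What is our PUE target for 2026?"))
print(cached_answer("What is our PUE target for 2026?"))  # served from cache, zero inference
```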
5. Build a culture of responsible intelligence
Monitoring should be about sustainability, not surveillance. When teams understand how prompts translate into cost and energy, they naturally adjust. Centralization, caching, and education turn AI use from a drain into an optimized, collective practice.
Each AI query alone feels harmless. It is the accumulation that creates the problem. The same accumulation, if we change habits, can help fix it. Green AI is not about banning intelligence, it is about teaching it to live within planetary boundaries.
Smarter prompts, smaller models, fewer retries, less vanity. If millions of people use AI just a little more consciously, the difference becomes visible at the grid level. The planet does not need you to stop creating. It needs you to think before you click.
References
- IEA (2025) – Energy demand from AI
- IEA – Data centres and data transmission networks
- Washington Post (2025) – “ChatGPT is an energy guzzler”
- MIT News (2025) – Generative AI’s environmental impact
- Green IT (2025) – Impacts environnementaux et sanitaires de l’intelligence artificielle
- Deloitte (2025) – GenAI power consumption and sustainable data centers
- The Guardian (2025) – AI and data centre power
- OpenAI – System Cards and model documentation
- Hugging Face – Benchmarks and efficiency discussions
- Tripp et al. (2024) – Measuring energy consumption of deep neural networks
- NVIDIA Research – Blackwell and data center architectures
- llama.cpp – LLM inference on CPU
- MLC AI – Machine learning compilation for efficient devices
- Intel – Gaudi 3 accelerator overview