Consider a future integration to evaluate a site's carbon impact #1613
Comments
Also worth considering this JS lib that was integrated with Sitespeed.io recently.
The Green Web Foundation provides an API for checking the "green" status of web hosts (used by the Website Carbon Calculator as part of their test too). https://www.thegreenwebfoundation.org/green-web-feed/
Many of the checks surfaced on the Performance page would be applicable to a "sustainability score".
We can pull Greenhouse into the Lighthouse score and display it in the WPT UI reasonably easily, if that helps:
Love this idea, and am excited it seems to have Tim's support. @fershad your render is great. I'm also wondering if any of these metrics should be considered?
What a great feature request! 💪🏽🌱 I'd love to help out with this topic! A couple of thoughts:

Be clear about what the numbers are based on – and link to sources

At the risk of muddying up @fershad's cool mockup and being a drag, it might be a good idea to provide an estimate range for CO₂e – or at least a link to an explainer about what the estimates are based on and what the sources are. The problem is that the energy coefficient (kilowatt hours per gigabyte) depends on which information sources you use. Websitecarbon currently uses 1.8 kWh/GB, although it has used higher estimates in the past. I've also spoken to Tom Greenwood (creator of the Website Carbon Calculator), who told me they'll probably update those estimates due to newer research. Gerry McGovern (author of "World Wide Waste", 2020) uses 0.015 kWh/GB, and I've seen others use numbers in between. (See the quick arithmetic sketch after this comment for how much that choice swings the estimate.)

The Sustainability Score should help drive actionable change

There are some great ideas in the previous comments. It'll certainly be tricky to define some sort of "Sustainability Score", and in the end, the results should help drive actionable change.
Explainer page

As mentioned above, we'd need an Explainer page containing info about:
Display the 'Carbon Cost' in the table on the Test Results page

Currently on the Test Results page in the Summary view, we have the Cost column in the table that links to @tkadlec's awesome website for estimating the monetary cost of visiting a webpage.
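To make the coefficient spread from the first point above concrete, here's a quick, purely illustrative bit of arithmetic. The two coefficients are the ones quoted above; the 2 MB page weight is an arbitrary assumption:

```python
# How much does the chosen energy coefficient swing a per-view estimate?
# The page weight is an arbitrary example; the coefficients are the ones
# quoted in the comment above.
PAGE_GB = 2_000_000 / 1e9  # a hypothetical 2 MB page view

for label, kwh_per_gb in [("websitecarbon (1.8 kWh/GB)", 1.8),
                          ("McGovern (0.015 kWh/GB)", 0.015)]:
    print(f"{label}: {PAGE_GB * kwh_per_gb * 1000:.3f} Wh per view")
# -> 3.600 Wh vs 0.030 Wh: a factor of 120 between the two estimates,
#    which is why an estimate range (or at least linked sources) matters.
```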
I think I considered some of these points on https://github.com/rposbo/greentree - it uses the Carbon Calculator logic for most info, but doesn't go into the fantastic details mentioned above. This article has some brilliant details about trying to calculate this sort of data too: https://sustainablewebdesign.org/calculating-digital-emissions/
WIP PR for the WPT Agent to include
Great points by all above - here's what I see needs to happen:
Thoughts? Also, how do we ensure this will be adopted by master WPT? This is my first open source experience and I'd be bummed if we implemented this but it sat stagnant, unmerged.
I'm thinking that we could pull in (or directly reference) the datasets used to power the GWF API, and ensure there's a circuit breaker pattern implemented (see the rough sketch after this comment): https://www.thegreenwebfoundation.org/green-web-datasets/

I believe the main model with WPT is to offload processing to the agent where possible, and only use the server where data might be needed later on and can't be known by a single agent (e.g., comparison videos).

I'm not sure where @tkadlec's website cost is calculated, but that could be an appropriate model to copy?
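For anyone unfamiliar with the circuit breaker idea mentioned above, here's a minimal, illustrative sketch of wrapping a GWF lookup so that repeated failures stop hammering the remote service. The endpoint shape, thresholds, and class names are assumptions for illustration, not anything taken from WPT or GWF:

```python
import json
import time
import urllib.request

class CircuitBreaker:
    """Stop calling the remote API after repeated failures; retry after a cooldown."""

    def __init__(self, max_failures=3, reset_after=60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open - skipping GWF lookup")
            self.opened_at = None  # cooldown elapsed, try again
            self.failures = 0
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result

def check_green(domain):
    # Hypothetical request shape - consult the GWF API docs for the real endpoint.
    url = f"https://api.thegreenwebfoundation.org/greencheck/{domain}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return bool(json.load(resp).get("green"))

breaker = CircuitBreaker()
try:
    print(breaker.call(check_green, "example.com"))
except Exception:
    print("GWF unavailable - treating host status as unknown")
```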
:) Love all the excitement around this one!
That is computed after the fact, mostly independent of WebPageTest's agent and server. It looks at the data from a WebPageTest run and uses that to calculate the cost. In the WebPageTest results, where you see the dollar signs, Pat used a simple "for every 500 KB, add a dollar sign" approach (roughly the sketch shown after this comment) to hint at the cost, but the actual cost isn't known to WPT at all at the moment.
I definitely think this is something we want to bake in at some point (rather than rely on an external integration using the API). We've got some stuff in the works that will come into play here, but for now I'm thinking a general "score" for sustainability as well as a meaningful presentation of the data are both highly relevant and useful.
Yeah, when we've talked about doing this, this was the plan. I'd rather go straight to the source here and then see what kind of useful information we can get, whether we should pair it with anything, and how we should present it.
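For clarity, the dollar-sign hint described a couple of paragraphs up amounts to something like this; only the 500 KB threshold comes from the comment, the rounding is a guess:

```python
def cost_hint(total_bytes: int) -> str:
    """Display-only hint: roughly one '$' per 500 KB of page weight, no real pricing."""
    return "$" * max(1, round(total_bytes / (500 * 1024)))

print(cost_hint(1_800_000))  # a ~1.8 MB page -> "$$$$"
```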
Not to pre-emptively overstate the potential benefits of this, but exposing the approximate environmental impact of a website in something as ubiquitous in website testing as WebPageTest could help push this into the average business's consciousness, such that we start comparing ourselves to competitors not only by speed, but also by environmental impact. My hope is that this could snowball into becoming a new ranking factor in search engines (any Googlers on this thread who can help? 😉), thereby forcing companies to become more aware and considerate of the 3rd parties, infrastructure, and application code they decide to use. Help us, @tkadlec, you're our only hope 😁
110% this!!! I might have jumped the gun with the mockup 😅 This could be a good way to initially emphasise best practices in the sustainable development space.
I found the 'website cost' code that @tkadlec and @rposbo were referring to, and now I understand what Tim was saying about it being more of a display element than an actual calculation. It doesn't seem like the model to follow here. Sounds like everyone is leaning towards grabbing a copy of the GWF dataset and using that instead of relying on a third party. I was looking at the data yesterday and just wanted to point out that it's currently 200MB when uncompressed, and appears to be stored as "SQLite format 3". Is that usable within the WPT server setup? Last I checked, WPT didn't use a database but instead just relied on the filesystem. Also, it's updated daily, so we'd need to sync / grab a fresh copy periodically.
Apologies if I'm being too verbose here, just trying to document my thoughts along the way. I realized that the CDN check is very similar to what we're trying to achieve here, and that may be a better model to follow. It appears that detection is done at the agent level, using an array here. If we mimic that approach, we'd need each agent to periodically download the large GWF dataset and then check against that - that doesn't appear to be a problem, but it is a lot of data, and this dataset is just going to keep getting bigger (hopefully).

I'm wondering if it'd be better to have GWF expose the green hosts instead of the urls, but I'm not sure how we'd determine the host of the url. I'll reach out to GWF to see how they do it. @rposbo I see this WIP and I think you have the right approach to implement this at the agent level, but it sounds like we don't want to rely on 3rd party APIs (Greenhouse depends on the GWF API, yes?)

On another note, I'm used to implementing unit tests with every PR, yet I'm seeing some PRs in the wptagent repo that are passing checks but have parse errors in them (example). @tkadlec Are the TravisCI build checks working as intended?
Hi folks - I'm one of the people working on the Green Web Foundation databases and code, and I was tagged in a Slack channel mentioning this issue. We started releasing the sqlite snapshots with exactly this use case in mind, so this was a nice issue to come across!

I've linked below to an example of some code using the local snapshot, in a plugin that powers searx, a search engine. It essentially treats sqlite like a key-value store: https://github.com/searx/searx/pull/1878/files

A little more about that project below:

The examples there might help if you're looking to implement something local. Or, come to think of it, you could use it to demonstrate prototypes of factoring greener infrastructure into search engine result ranking decisions (!).

Keeping local domain db snapshots small

In the sitespeed plugin we used the sqlite snapshots to begin with, too. It was eventually swapped out in favour of a big ol' json file, to avoid needing to compile a version of sqlite on npm install. You could plausibly do that too, but we've had some success with accessing compressed key-value data from various tools; that might be another approach later on once you have this working.

Internally, we use MariaDB for running the Green Web Foundation database, and we've started using the MyRocks storage engine to save space. This uses RocksDB under the hood, and more or less halves the size of the tables. If you want fast, local, in-process key-value access to the compressed data, that may be an option for you if you need to keep local size small. I think there are node libraries for using RocksDB, but that might introduce a compile step.

We've also started storing some data as Parquet files, and using DuckDB (another in-process database, like sqlite) to access the compressed files too. We haven't used this for the domain info yet, but it's given us pretty dramatic savings storage-wise on other tables with lots of repetitive data. You can read a bit more about DuckDB and Parquet files here. It's been the most convenient way to access compressed Parquet files we've found, and I think there are libraries for using it from node too. More on the querying-Parquet trick here.

CO2.js

We have a bit of work scheduled in December to implement the CO2 calculation model used by ecoping and websitecarbon that's outlined here, and have it as an option, like we have with the OneByte model from The Shift Project: https://sustainablewebdesign.org/calculating-digital-emissions/

This would help make numbers consistent across websitecarbon, sitespeed.io, ecoping, and presumably now WebPageTest. If you're looking at methodologies for calculating CO2 figures, it might be worth being aware of this project too - it's a proposed spec to develop models against for calculating emissions from software:

Anyway - I'm out for the rest of the day today, but I'm happy to share anything I know that would help 👍
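For anyone exploring the snapshot route described above, here's a minimal sketch of the kind of key-value-style lookup against the downloaded SQLite file. The file name, table name, and column name are illustrative guesses only - inspect the actual snapshot schema before relying on them:

```python
import sqlite3

# Illustrative path/table/column names - check the downloaded snapshot's real
# schema first (e.g. `sqlite3 green_urls.db .schema`).
SNAPSHOT_PATH = "green_urls.db"

def is_green(domain: str) -> bool:
    """Treat the snapshot as a simple key-value store: domain present == green host."""
    conn = sqlite3.connect(SNAPSHOT_PATH)
    try:
        row = conn.execute(
            "SELECT 1 FROM green_domains WHERE url = ? LIMIT 1", (domain,)
        ).fetchone()
        return row is not None
    finally:
        conn.close()

if __name__ == "__main__":
    for host in ("www.thegreenwebfoundation.org", "example.com"):
        print(host, "->", "green" if is_green(host) else "not found / unknown")
```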
Sounds like we're all in agreement about using the GWF data as our source here. Now we still need to decide the best approach to interact with that data. Here are the options I see (please do add others if you have them):

Option 1: GWF Database on Agent

Each agent downloads, extracts, and serves locally via sqlite the GWF 'Green Urls' dataset. The concern is that the dataset is currently 200MB uncompressed (and has doubled in size over the past 9 months, so it has the potential to get quite large). But maybe this isn't a big concern, since agents are typically run on better hardware than the server? Also, the data is not going to change from one run to another (within the same test suite), so it seems a little silly to have this computed with every run (i.e., if our test comprises 3 runs (the default), there's no point collecting this same data 2 extra times).

Option 2: GWF Database on Server

While this goes against the typical WPT model of having the agents do most of the work, it requires less storage overall (only 1 copy on the server as opposed to one on every agent). This solves the issue of unneeded extra calculations at the agent level, and isn't dependent on any specific browser.

Option 3: Greenhouse Lighthouse plugin on Agent

The Greenhouse plugin already utilizes the GWF API to check each domain. This solution probably requires the least amount of new code. However, under the hood it relies on a third-party API, and the data returned is simplified to a score and a table. That'd make it hard to show this data in other places, such as the optimization checklist or on request details. But the biggest drawback is that it would require the 'Capture Lighthouse' option to be selected - which currently defaults to Off - and would only work with Chrome agents.

Option 4: GWF API on Agent/Server

Take Option 1 or 2 above, and use the API instead of the database. Less processing/storage needed, but it introduces a dependency on a third party. It would also be unfortunate to hit the API at the agent level, since green urls don't change within runs and we'd be hammering it harder than necessary.

Any other options I'm missing? I feel like Option 2 offers the best balance, but I'd want buy-in from everyone involved since it doesn't follow the CDN check model.
I agree with option 2, and I'd suggest that we use a similar approach to that used for CrUX data; i.e., keep it on the WPT server (though CrUX data is only cached for the URLs called, and deleted each day, whereas for GWF we'd keep it around and incrementally refresh) and add it to the test result data (in the gzip results file) after test completion.

WPT Server runs a test, requesting CrUX data for the URL:

WPT Server checks the local cache first, failing back to an API call:

Looks like the WPT server has an hourly cron to remove cached CrUX data that wasn't cached today:

Installing the GWF database on the server can be a WPT Server installation option, pulling down the latest GWF mysql backup (or another data source format if preferred), and adding a cron job to refresh daily/weekly. Looks like the dataset is daily: https://admin.thegreenwebfoundation.org/admin/green-urls - there's a nice naming convention on these backups, so we could just pull down the current day's dataset, or perhaps request a handy "/latest" route that will always point to the most recent one.

Makes sense to me 😄
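As a rough illustration of that CrUX-style flow (local cache first, API fallback, periodic pruning), here's a sketch. The cache layout, lifetime, and endpoint shape are assumptions for illustration only; the real WPT server code is PHP and may differ entirely:

```python
import json
import os
import time
import urllib.request

CACHE_DIR = "/tmp/gwf_cache"   # illustrative location
CACHE_TTL = 7 * 24 * 3600      # e.g. keep entries for a week

def green_status(domain: str) -> bool:
    """Check the local cache first, fall back to the GWF API, then cache the answer."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, domain + ".json")

    # 1. Local cache hit that hasn't expired yet.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < CACHE_TTL:
        with open(path) as f:
            return json.load(f)["green"]

    # 2. Fall back to the API (endpoint shape assumed - see the GWF docs).
    url = f"https://api.thegreenwebfoundation.org/greencheck/{domain}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        data = json.load(resp)

    # 3. Cache the response for later tests; a cron-style job could prune stale files.
    with open(path, "w") as f:
        json.dump({"green": bool(data.get("green"))}, f)
    return bool(data.get("green"))
```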
Hey folks, this might be useful to know - you can pull down the snapshot there, but we also make this information available in queryable form using datasette.

So you can fetch from this link: https://admin.thegreenwebfoundation.org/admin/green-urls

But every morning a snapshot is made from MySQL and put onto our datasets microsite using datasette. The idea here would be that people could make read-only, arbitrary SQL queries that we hadn't written into an API yet, without unexpected load on the main database.

API on datasette

This itself has an API that lets you return the results of an SQL query in json or csv form. So this query, to get all the domains ordered by descending date order:

Is also available as json:

Downloading just a section

If you know you're doing a daily download, you can filter the query to only show the last 24 hours of updated domains rather than downloading the whole dataset each time. The example below is where I filtered to only show domains which have a modified time greater than

You can also download these in JSON/CSV form too:

Caveats and limitations
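A minimal sketch of the incremental pull described above, using datasette's JSON API with a named SQL parameter. The base URL, database/table names, and column names here are assumptions for illustration - take the real query URL from the datasets microsite:

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

# Illustrative only: the real datasette base URL, table name, and column names
# should be taken from the GWF datasets microsite.
BASE = "https://datasets.thegreenwebfoundation.org/daily_snapshot.json"

def fetch_recent_green_domains(hours: int = 24):
    """Ask datasette for domains modified in the last `hours`, returned as JSON."""
    since = (datetime.now(timezone.utc) - timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")
    sql = ("SELECT url, modified FROM greendomain "
           "WHERE modified > :since ORDER BY modified DESC")
    query = urllib.parse.urlencode({"sql": sql, "since": since})
    with urllib.request.urlopen(f"{BASE}?{query}", timeout=30) as resp:
        return json.load(resp)

# Example usage (would hit the hypothetical endpoint above):
# rows = fetch_recent_green_domains(24)
```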
Phew. Sorry for being tardy... had a long break and y'all have been busy! :) LOVE all the excitement here.
@shaneniebergall Those are...no, not working. :( We've added some proper unit tests to the server now, but still need to dress up the agent a bit.
@rposbo Just curious: why would you prefer/suggest the DB here? The more I looked, the more I thought an approach like CrUX (use an external API, but cache the results) would make a lot of sense here. Not really disagreeing, to be clear, just want to understand if there's a con to that approach that I'm overlooking.
Thinking through this some more in order to explain my point, I've realised that you're correct @tkadlec; using the exact same approach as CrUX (call API, cache response, prune cache) makes sense. Make it so! 😁
Just to add a teensy bit more oomph to this request - even AWS have introduced a Sustainability Pillar in their Well-Architected Framework... and we wouldn't want WPT to fall behind, would we? 😁 https://aws.amazon.com/blogs/aws/sustainability-pillar-well-architected-framework/
Great insights - the CrUX model looks like a wonderful approach to model after. It looks like, in order to keep things manageable, the CrUX integration was done in multiple phases, with the first phase collecting the data, followed by another phase adding the data to the UI. Shall we do something similar?

One way we will differ from the CrUX approach is the cache. CrUX results are cached for the day, with each url as an entry in the day's cache folder. Yet we'll want green domains to be cached longer than that, right? (How often do sites change hosts?) Perhaps this is something we store monthly? (And if so, is it going to be problematic that there will be a large influx of API requests to the GWF at the start of the month?) I'm also curious whether we'll run into directory limitations if we have a file per domain - might we need to shard this like other results?

Last thought: if we're really trying to encourage a green web, I'd love for this to be included in the apache branch too. That would give other providers (Catchpoint competitors) the ability to promote sustainability as well. How do we plan for that?
I'd love to put some time into this over the holidays... but I want to make sure I have a solid plan. Any input on my last couple of questions? @tkadlec
FWIW we have a very similar thread over in the Lighthouse repo: GoogleChrome/lighthouse#12548
Just a note that the MS Edge team has a proposal out to surface some sustainability & energy usage data in Chrome DevTools: https://github.com/MicrosoftEdge/DevTools/blob/main/explainers/Sustainability/explainer.md
Checking in a year on from this discussion being kicked off - is there anything we can develop (on the apache branch too) to help expose this info in WPT, or are we waiting on Edge and Chrome/Lighthouse to implement some standard?
@rposbo I was excited to work on this until I read the thread that Paul pointed us to, which said "The scientific community has not yet reached a consensus on how specifically to measure emissions from any digital product or service. And I'm not expecting this to happen anytime soon." as well as "I've looked at the various efforts to quantify website energy use/carbon footprint but currently they're rather under-developed compared to web performance or accessibility testing. They primarily use proxy metrics like network request count or byteweight. But relying on these is far too indirect." Others seemed to agree, and the Lighthouse issue was closed with no work. That took the wind out of my sails... Unless we hear from those who have the power to merge within WPT, I'd hate to work on a lost cause.
HEY EVERYONE! We've closed this issue because the Carbon Control feature (centered around a carbon footprint metric) has landed in WebPageTest! Thank you to everyone here who commented, shared ideas, shared this post, and +1'd it. It helped us keep the feature prioritized and heavily shaped the direction of the first release. Here's a blog post about Carbon Control and how you can use it now in WebPageTest: https://blog.webpagetest.org/posts/carbon-control/ And you can try it out on the homepage, or here: TY! @tkadlec @Nooshu @fershad @paulirish @rposbo @mrchrisadams @shaneniebergall @screenspan
So sad to see this "feature". Just because people ask for it doesn't mean it's a good thing to build. IMO it distracts attention towards the use phase of devices, when what matters most is their lifespan and renewal, which is only/mostly actionable through "it's fast": https://blog.ltgt.net/climate-friendly-software/
Thank you @scottjehl and all who worked on this. I'm thrilled it is released and can't wait until it becomes the default. As developers we seem to think the environment is only impacted by air travel, cars, corporations, etc. This enables us to become aware and make a difference in the world. YOU ROCK!
@tbroyer thanks for your feedback (I replied to your post on another platform just now too). Our teams have been evaluating the work around this for a year or so now, and went into the space with an ample amount of curiosity and reservations.

The actions the tool recommends, like identifying green and non-green hosts and reducing wire weight, are frequently cited as great actions to take towards a more energy-efficient presence on the web. Tools for measuring these aspects are also frequently requested by folks who are increasingly being asked to measure and be more accountable for their digital footprint as well (e.g., https://www.gov.uk/government/publications/greening-government-ict-and-digital-services-strategy-2020-2025/greening-government-ict-and-digital-services-strategy-2020-2025#supply).

The changes this tool recommends are certainly not exclusive of other actions you can also take (many of which may be much more impactful too, if often out of developers' hands), such as advocating for reducing device turnover as your article helpfully recommends. To your point on the CO2 model, we're using the Sustainable Web Design model, not 1-Byte.
@tbroyer since v0.11 CO2.js defaults to the Sustainable Web Design model, though the OneByte model is still available in the library because we know of other tools which use that (release notes). We hope to add more methodologies in the future, and appreciate community feedback on the same. See this issue to investigate adding the DIMPACT model, or this conversation about data transfer as a proxy in general.
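For readers unfamiliar with the difference, here's a back-of-the-envelope sketch in the spirit of a Sustainable Web Design-style per-byte estimate. The coefficients below (energy intensity, grid carbon intensity, green-hosting intensity, data-centre share) are rough approximations for illustration only - use CO2.js or the published methodology for real numbers:

```python
# Back-of-the-envelope per-page-view estimate, loosely modelled on the
# Sustainable Web Design approach. All coefficients are illustrative
# approximations - consult https://sustainablewebdesign.org/calculating-digital-emissions/
# or CO2.js for the values the model actually uses.
KWH_PER_GB = 0.81          # assumed energy intensity of transferred data
GRID_G_CO2_PER_KWH = 442   # assumed average grid carbon intensity (gCO2e/kWh)
GREEN_G_CO2_PER_KWH = 50   # assumed intensity for renewably powered hosting
DATA_CENTRE_SHARE = 0.15   # assumed share of energy attributed to the data centre

def grams_co2_per_view(page_bytes: int, green_host: bool = False) -> float:
    energy_kwh = (page_bytes / 1e9) * KWH_PER_GB
    if green_host:
        # Only the data-centre slice benefits from green hosting in this rough model.
        dc = energy_kwh * DATA_CENTRE_SHARE * GREEN_G_CO2_PER_KWH
        rest = energy_kwh * (1 - DATA_CENTRE_SHARE) * GRID_G_CO2_PER_KWH
        return dc + rest
    return energy_kwh * GRID_G_CO2_PER_KWH

print(round(grams_co2_per_view(2_000_000), 3))                   # ~2 MB page, grey host
print(round(grams_co2_per_view(2_000_000, green_host=True), 3))  # same page, green host
```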
@scottjehl @fershad I'll try to summarize my opinion here (and I can't stress this enough: it's a personal opinion, formed after 6 years digesting ecodesign guidelines; I'm not here to start a debate about the methods and models and numbers; the real issue is about communication mainly).
WebPageTest has huge visibility, and putting the focus on such a small thing will likely be the tree hiding the forest and lead to more greenwashing rather than real improvements, doing more harm than good (I strongly believe that's what websitecarbon, co2.js and the like do, and similarly for GreenFrame et al: not bad tools per se, but used the wrong way). You know what they say about good intentions? At a bare minimum, emphasize more prominently that the numbers are skewed (another way for me of saying they're probably not worth that much) and, more importantly, add a big banner that this is far from being enough, that reducing that number is only just a drop in the ocean.
@scottjehl It's great to see that you're now also adding a sort of carbon control to WebPageTest! 👏 I invite you to read this article on the current state of digital carbon footprint measurement tools.
@tbroyer I do think we need to jumpstart the conversation about this issue. It's not new, but our industry does not have an understanding of the impact of the digital world on our physical environment. It's important to critique the use of tools like CO2.js, so that they can get better. There are lots of reasons to be wary of any scoring system. Goodhart's law is a real phenomenon (when a measure becomes a target, it ceases to be a good measure). However, we also need to find some means to materialize what is a very complex series of variables. Your article has some great points in it (https://blog.ltgt.net/climate-friendly-software/), but shaping an industry such that people's devices have a longer lifespan is a real challenge. So much of that will come down to their user experience. Much of that will be based on how quickly and reliably users are able to engage with the sites that they want. That kinda brings us back to Google Lighthouse scores and efforts to highlight to developers with fast machines and broadband that their experience isn't the same as their users'.
Thanks to @Nooshu for the tweet. The mentioned service is http://websitecarbon.com
Happy to discuss potential integration ideas here.