ProtonDB Ratings, Revised

2020-02-15

Some of you may not be aware, but ProtonDB, the portal where you can report how well Windows games run on Linux with Proton, changed its rating system in late 2019. Previously, it followed the WINE way of doing things, assigning ratings such as “Platinum”, “Gold”, etc…

There were always issues with that approach, as ratings were given by end users. End users are good at many things, but not at rating things consistently when the definition is a little fuzzy to begin with. For example, “Platinum” is supposed to mean “works out of the box without any modification”. “Gold” is supposed to mean “works perfectly once you make a small modification”: editing a file, installing an additional library, changing the settings, etc… But I can clearly remember instances where users picked one or the other interchangeably depending on their perception.

This is why the right way of doing things was always to ask users for all the factual aspects related to the game at hand, and to generate a rating from that data. This way, you can control how consistent the final rating is.

This is precisely what changed in ProtonDB in October 2019. The data collected now is quite different from what was collected before, and it took me a while to bridge the gaps between the different versions. Here is what the compatibility chart looks like now…

On my end, I had to revise the ratings based on the new metrics captured. I settled on the following definitions to stay as close as possible in concept to the previous ratings:

Platinum refers to games that were rated as working out of the box without any modification.
Gold refers to games that worked very well but required at least one modification to run as expected.
Borked refers to games that refused to run.
Silver is for everything in between (meaning that the game runs, but performance or some features do not work as expected).
Bronze does not exist anymore, yet still appears sometimes when a game is rated somewhere between Borked and Silver (because I use the median score when several ratings are available for a game).
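
As a rough illustration, the rules above could be sketched in Python like this. The field names (runs, works_as_expected, needed_tweaks) are hypothetical and are not ProtonDB's actual report fields, and rounding a half-way median down to the lower rating is an assumption made only for this sketch.

```python
from statistics import median

# Ordinal scale from worst to best. Bronze has no definition of its own:
# it can only appear as a median that lands between Borked and Silver.
SCALE = ["Borked", "Bronze", "Silver", "Gold", "Platinum"]


def rate_report(runs, works_as_expected, needed_tweaks):
    """Map one report's factual answers to a rating (hypothetical fields)."""
    if not runs:
        return "Borked"      # the game refused to run
    if not works_as_expected:
        return "Silver"      # runs, but performance or features fall short
    if needed_tweaks:
        return "Gold"        # works very well after at least one modification
    return "Platinum"        # works out of the box


def rate_game(reports):
    """Combine several reports for one game using the median score."""
    scores = [SCALE.index(rate_report(**report)) for report in reports]
    # With an even number of reports the median can end in .5;
    # truncating it to the lower rating is an assumption of this sketch.
    return SCALE[int(median(scores))]


reports = [
    {"runs": False, "works_as_expected": False, "needed_tweaks": False},
    {"runs": True, "works_as_expected": False, "needed_tweaks": False},
]
print(rate_game(reports))  # one Borked and one Silver report
```

With one “Borked” and one “Silver” report, the median falls on the slot between them, which is how a “Bronze” can still show up even though no single report can produce it.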

The direct consequence is that the “Silver” category has grown a lot while “Gold” has shrunk quite a bit. To be fair, the previous distinction between “Silver” and “Gold” was never too clear to begin with, and the new rating system at least enables a clearer cut. And of course, “Bronze” ratings are now mostly gone, except when respondents’ ratings for a game are split between “Borked” and “Silver”.

I don’t expect that this is the only way to use the data at hand (one can argue that it’s possible to make better splits between “Platinum” and “Borked” and they would probably be right), but it looks relatively consistent with the previous ratings we had in ProtonDB, so I will probably keep it fixed as such from now on.

If you have any suggestions, feel free to comment!