Comments/Ratings for a Single Item

Good questions. I'll defer them to Fergus, as he understands what is going on here far better than I do, and I could end up giving a wrong answer. But I do know a player who was about 2000. Unfortunately he has a mental condition; he is now about 1400 and getting weaker in all cognitive areas. It is now a strain for him just to walk. Understandably, he could have quit playing chess while at 2000... but he still plays. Anyway, if he had quit at 2000, his frozen 2000 rating would certainly be false. Of course, if he quit and his rating climbed, that too would be false. It would need to drop over time to reflect reality. Would this happen with the equations Fergus is using? I don't know... We can shoot all kinds of rating situations around and argue one way or the other, but what is the point? Does it really matter? Why should we get so wrapped up in these values? They are just a means of comparison. Before we had nothing. Now we will have something. If we do not like that 'something', then we can choose the 'unrated game' option once it is implemented. We can also play in USCF tournaments, where our ratings will freeze once we quit playing.
Yeah, no need to get wrapped up in it, but it would be good to get the best rating system in place. I am sure it would also save Fergus a lot of hassle in the future if people complain, say 'other sites have a better system', etc. It will be kinda fun, too, to see people have ratings; then you can see who is the 'favorite' and the 'underdog' in games, etc. High drama :)
Michael Howe asks:
What, therefore, is the refutation of my concern that a player's rating may be retroactively affected by the performance of a past opponent whose real playing strength has increased or decreased since that player last played him?
A system that offers estimates instead of measurements is always going to be prone to making one kind of error or another. This is as true of Elo as it is of GCR. Keeping ratings static may avoid some errors, but it will introduce others. The question to ask, then, is not how to avoid this kind of error or that kind of error. The more important question is which sort of system will be the most accurate overall. Part of the answer to that question is a holistic system. Given that the system estimates relative differences in playing strength, the best way to estimate these differences is a method that bases every rating on all available data. Because of its one-directional chronological nature, Elo does not do this. But the GCR method does. This allows it to provide a much more globally consistent set of ratings than Elo can with its piecemeal approach to calculating ratings. Since ratings have no meaning in themselves and mean something only in relation to each other, a high level of global consistency is the most important way in which a set of ratings can be described as accurate. Since a holistic method is the most important means of achieving this, if not strictly necessary for it, a holistic method is the way to go, regardless of whatever conceivable errors it might still allow.
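To make the contrast concrete, here is a minimal Python sketch, assuming a simple logistic expectancy; it illustrates the holistic idea only and is not Fergus's actual GCR equations. The one-pass loop mimics Elo, while the iterated loop re-fits all ratings against all games until they stabilize.

```python
# Minimal sketch of one-pass (Elo-style) vs. holistic rating; hypothetical data.
games = [("A", "B"), ("B", "C"), ("C", "A")]  # (winner, loser) pairs
players = {p for game in games for p in game}

def expected(ra, rb):
    # Standard logistic expectancy used by Elo-style systems.
    return 1 / (1 + 10 ** ((rb - ra) / 400))

# Elo-style: one chronological pass; earlier games are never revisited.
elo = {p: 1500.0 for p in players}
K = 32
for winner, loser in games:
    e = expected(elo[winner], elo[loser])
    elo[winner] += K * (1 - e)
    elo[loser] -= K * (1 - e)

# Holistic: every iteration re-fits every rating against ALL games,
# so a result anywhere in the history can move any rating.
holistic = {p: 1500.0 for p in players}
for _ in range(1000):
    adjust = {p: 0.0 for p in players}
    for winner, loser in games:
        e = expected(holistic[winner], holistic[loser])
        adjust[winner] += 1 - e
        adjust[loser] -= 1 - e
    for p in players:
        holistic[p] += 10 * adjust[p]  # small step toward a global fit

print(elo)       # depends on the order the games were played
print(holistic)  # order-independent
```

With the three-game cycle A beats B, B beats C, C beats A, the holistic fit leaves all three players equal no matter the order of the games, while the single Elo pass ends with unequal ratings that depend on which game happened first.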
The testdata field takes data in a line by line grid form like this:
1500 0 1 0
1500 0 0 1
1500 1 0 0
It automatically names players with letters of the alphabet. Each line begins with a player's rating and is followed by a series of numbers giving that player's wins against each player in turn. The form above means that A beat B once, B beat C once, and C beat A once.
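For anyone who wants to experiment with this grid form offline, here is a small Python sketch of a reader for it; this is my own illustration, not the page's actual code.

```python
import string

# The sample grid from above: rating, then wins against A, B, C in turn.
testdata = """1500 0 1 0
1500 0 0 1
1500 1 0 0"""

lines = testdata.strip().splitlines()
players = list(string.ascii_uppercase[:len(lines)])  # A, B, C, ...

ratings = {}
wins = {}  # wins[x][y] = number of times x beat y
for name, line in zip(players, lines):
    fields = line.split()
    ratings[name] = int(fields[0])
    wins[name] = dict(zip(players, (int(f) for f in fields[1:])))

for x in players:
    for y, n in wins[x].items():
        if n:
            print(x, "beat", y, n, "time(s)")  # e.g. A beat B 1 time(s)
```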
A weighted history would work with the assumption that anyone who isn't actively playing games is deteriorating in ability. I'm not sure this is an accurate assumption to make. Also, the factors you list as causing performance to drop are going to have less effect on games played on Game Courier, because these are correspondence games played over a long period of time, and a person may take time off for illness or disinterest without it affecting his game. When it does affect someone's games, it will generally affect fewer games than it would for someone active in Chess tournaments. Also, the large timeframe on which games are played is going to make it even harder to determine what the weights should be. For these reasons, I am not convinced that a weighted history should be used here. Anyway, if you do want ratings that better reflect the current level of play among active players, you already have the option of using the Age Filter to filter out older games. I think that should suffice for this purpose.
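In concrete terms, the Age Filter amounts to nothing more than dropping games older than a cutoff before ratings are computed. A short Python sketch of the idea, with made-up record fields, since I am describing the concept rather than the site's code:

```python
from datetime import datetime, timedelta

# Hypothetical game records; only the finish date matters here.
games = [
    {"players": ("A", "B"), "finished": datetime(2004, 1, 10)},
    {"players": ("B", "C"), "finished": datetime(2002, 6, 1)},
]

def age_filter(games, max_age_days, today):
    # Keep only games that finished within the last max_age_days.
    cutoff = today - timedelta(days=max_age_days)
    return [g for g in games if g["finished"] >= cutoff]

recent = age_filter(games, 365, today=datetime(2004, 3, 1))
print(len(recent))  # 1: only the 2004 game survives a one-year filter
```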
I want to draw attention to the main change I made today. You may notice that the list of ratings now uses various background colors. Each background color identifies a different network of players. The predominant network is yellow, and the rest are other colors. Everyone in a network is connected by a chain of opponents, all of whom are also in the network. Regarding weighted histories, they probably work better for the world of competitive Chess, in which active players normally play several rated games at regular intervals. This frequency and regularity provides a basis for weighting games. But Game Courier lacks anything like this. Games here are played at the leisure of players.
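In graph terms, these networks are the connected components of the graph whose vertices are players and whose edges are games. A quick Python sketch of the idea, using made-up sample data rather than the site's real tables:

```python
from collections import defaultdict

# Hypothetical game list: each game links two opponents.
games = [("A", "B"), ("B", "C"), ("D", "E")]

adj = defaultdict(set)
for x, y in games:
    adj[x].add(y)
    adj[y].add(x)

def networks(adj):
    # A network is a maximal set of players connected by chains of opponents.
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p not in group:
                group.add(p)
                stack.extend(adj[p] - group)
        seen |= group
        groups.append(group)
    return groups

# Two groups, {A, B, C} and {D, E}: two background colors,
# with the largest group shown as the predominant (yellow) network.
print(networks(adj))
```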
I've always thought the best implementation of ratings would be an 'open-source' approach: make public the raw data that go into calculating the ratings, and allow many people to set up their own algorithms for processing the data into ratings. So users would have a 'Duniho rating' and a 'FIDE rating' and a 'McGillicuddy rating' and so on. Then users could choose to pay attention to whichever rating they think is most significant. Over time, studies would become available as to which ratings most accurately predict the outcomes of games, and certain ratings would outcompete others: a free market of ideas.

I also like the open-source approach (maybe make the raw data XML, plain-text, or both), but there should also be one built into this site as well, so if you don't have your own implementation you can view your own.
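To sketch what such a raw-data dump might look like, here is a hedged Python example that writes the same hypothetical records as both XML and plain text; none of these field names are the site's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw-data records; the real export fields would be up to Fergus.
games = [
    {"white": "A", "black": "B", "variant": "Chess",
     "result": "1-0", "finished": "2004-01-10"},
]

# XML form, for programs that want structured input.
root = ET.Element("games")
for g in games:
    ET.SubElement(root, "game", attrib=g)
with open("games.xml", "w") as f:
    f.write(ET.tostring(root, encoding="unicode"))

# Plain-text form of the same data, for quick inspection.
with open("games.txt", "w") as f:
    for g in games:
        f.write(" ".join(g.values()) + "\n")
```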
'I also like the open-source approach (maybe make the raw data XML, plain-text, or both), but there should also be one built into this site as well, so if you don't have your own implementation you can view your own.' Sure, the site should have its own 'brand' of ratings. But I mean it would be good to make ratings from many user-defined systems available here also. Just as the system allows users to design their own webpages (subject to editorial review) and their own game implementations, there could be a system whereby users could design their own rating systems, and any or all of these systems could be available here at CVP to anyone who wants to view them, study their predictive value, use them for tournament matchings, etc. Of course, it's much easier to suggest a system of multiple user-defined rating schemes (hey, we could call it MUDRATS) than to do the work of implementing it. But if enough people consider the idea and feel it has merit, eventually someone will set it up someplace and it will catch on.