
Comments/Ratings for a Single Item


Game Courier Ratings. Calculates ratings for players from Game Courier logs. Experimental.

🕸📝Fergus Duniho wrote on Fri, Apr 27, 2018 07:16 PM UTC:

A dictatorial decree tells other people what to do. I simply spoke about what I will not be doing. A veto is the exercise of executive authority to reject a piece of legislation. I was exercising my privilege, as the creator and programmer of this script, not to make changes to it that I'm not interested in making.

Our discussion of this involved you proposing the idea, me giving my reasons for rejecting it, and you agreeing with one of them. I expected that would end things, but other people continued to run with the idea. In the meantime, I have not changed my mind about it being arbitrary. I do not believe that changing the values for win and loss would help, and I do not believe that H. G.'s proposal, which seems to involve reducing the value of a win without changing the value of a loss, would solve anything. Changing the value of one without changing the value of the other is unacceptable to me, and changing both in a way that makes the scores more drawish is also unacceptable to me. Since there is no other alternative, I don't see any way for it to work. Besides that, the arbitrariness of it still doesn't sit well with me.

H. G. Muller wrote on Fri, Apr 27, 2018 07:28 PM UTC:

> I do not believe that H. G.'s proposal ...

Well, then you are simply wrong. Math is something that you prove true or false, not something subject to beliefs. It is a bad idea to put someone incompetent in math in charge of something as complex as rating calculation...

Note that I am not really an interested party, as I never use Game Courier. I just want to set the record straight on the correct way to give games from the distant past a different weight.

🕸📝Fergus Duniho wrote on Fri, Apr 27, 2018 07:51 PM UTC:

Saying you're right doesn't make you right, and insulting me does nothing to persuade me to agree with you.

🕸📝Fergus Duniho wrote on Fri, Apr 27, 2018 08:17 PM UTC:

Also, I'm not incompetent at math. I have gotten an A in every math class I ever took. If you do happen to be right, you should consider the possibility that you have not explained your idea very well.

Greg Strong wrote on Fri, Apr 27, 2018 08:17 PM UTC:

Perhaps I don't understand, so let me pose a question. Would there be any difference in ratings between these two scenarios: (A) run the ratings now as is, and (B) run the ratings with everything the same except that you and I have played an additional 100 games against each other, all 100 of them draws? Would the additional draws change anything?

🕸📝Fergus Duniho wrote on Fri, Apr 27, 2018 10:02 PM UTC:

Yes, it would change a lot. 100 draws is mathematically equivalent to each of us winning 50 games against the other. As an experiment, I wrote a modified script that added 100 extra draws between us. My rating dropped from 1715 to 1675, your rating rose from 1477 to 1494, and many other ratings changed by smaller amounts.

Here is what is going on. First, it made our scores against each other more even, so that the maximum change to our ratings from our games together would be less than what it actually is. However, the greater number of games between us would also increase the portion of the maximum change that would actually be made to our ratings. The bottom line is that this caused us to come out with different ratings when our games factored into the calculation, and this continued to have an effect on every subsequent pair of opponents that included one of us. Since ratings are calculated twice, the second time in the reverse order of the first, this affected calculations for every opponent either one of us had. As it affected the ratings of other opponents, it affected the ratings calculated from comparing pairs of opponents that did not include either of us. So the changes to our ratings had a chaotic butterfly effect on the ratings of many other players.

Besides this, it added to the number of games we each played, which made our ratings more stable than they would otherwise be. This also affected the calculation of our ratings with other opponents, and this too had a chaotic butterfly effect through the whole network that includes both of us. This is the yellow colored network, which includes most people who have played on Game Courier. The ratings for people in other networks were unaffected.
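To make that chain of effects concrete, here is a minimal Python sketch of a two-pass, pairwise rating update. It is not the Game Courier Ratings formula: the Elo-style expected score, the constant `k`, the `n / (n + 10)` portion factor, the `1 + total / 100` stability divisor, and the function names are all invented for illustration. It only mirrors the ingredients named above: each pair's aggregated score, a portion of the maximum change that grows with the number of games between the pair, stability that grows with a player's total games, and one pass in chronological order followed by one in reverse.

```python
def expected(r_a: float, r_b: float) -> float:
    """Elo-style expected score of A against B (an assumption for this sketch)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def two_pass_ratings(pairs, start=1500.0, k=400.0):
    """Toy two-pass pairwise ratings; NOT the Game Courier Ratings formula.

    pairs: list of (a, b, score_a, games), one entry per pair of opponents,
    where score_a is A's total points against B (win 1, draw 0.5, loss 0).
    """
    # Total games per player, used here as a made-up stability measure.
    totals = {}
    for a, b, _, n in pairs:
        totals[a] = totals.get(a, 0) + n
        totals[b] = totals.get(b, 0) + n
    ratings = {p: start for p in totals}

    # First pass in the given order, second pass in reverse order.
    for ordering in (pairs, pairs[::-1]):
        for a, b, score_a, n in ordering:
            # Maximum change: how far the pair's result departs from expectation.
            max_change = k * (score_a / n - expected(ratings[a], ratings[b]))
            # More games between the pair -> a larger portion of it is applied.
            portion = n / (n + 10.0)
            # Players with more games overall move less (more stable).
            ratings[a] += max_change * portion / (1.0 + totals[a] / 100.0)
            ratings[b] -= max_change * portion / (1.0 + totals[b] / 100.0)
    return ratings


base = two_pass_ratings([("A", "B", 7, 10), ("B", "C", 6, 10)])
drawn = two_pass_ratings([("A", "B", 57, 110), ("B", "C", 6, 10)])  # +100 A-B draws
```

In this toy model, turning A's 7 out of 10 against B into 57 out of 110 lowers A's final rating and also shifts C's rating, even though C never played A. The specific numbers mean nothing; only that qualitative ripple is the point.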

🕸📝Fergus Duniho wrote on Sat, Apr 28, 2018 12:41 AM UTC:

As a further experiment, I created three more ratings scripts with additional data:

100 additional wins by fergus against mageofmaple

100 half wins by fergus against mageofmaple

100 quarter wins by fergus against mageofmaple

Note that 100 half wins is the equivalent of 50 wins, and 100 quarter wins is the equivalent of 25 wins. My rating for 100 full wins is 1722, for 100 half wins is 1733, and for 100 quarter wins is 1728. These are all above my actual rating of 1715, which is to be expected. But since I expected the rating for 100 wins to be highest, I rechecked my code and the results, but nothing was amiss. Greg's rating for 100 full losses is 1437, for 100 half losses is 1452, and for 100 quarter losses is 1459. This does follow the expected pattern of being lower for greater losses. Also, they are all lower than his actual rating of 1476, which is also to be expected.
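These equivalences hold for points only. A short Python sketch, assuming each game is recorded as a pair of points (one per player), shows that 100 draws, 100 half wins, and 50 full wins can share the same total for one player while differing in the opponent's points or in the number of games, which is why a script that also weighs the number of games can rate them differently. The helper name `totals` is hypothetical.

```python
from fractions import Fraction


def totals(games):
    """Return (points for A, points for B, number of games).

    games: list of (points_a, points_b), one entry per game.
    """
    a = sum((g[0] for g in games), Fraction(0))
    b = sum((g[1] for g in games), Fraction(0))
    return a, b, len(games)


half = Fraction(1, 2)
draws = totals([(half, half)] * 100)                    # 50 and 50 over 100 games
half_wins = totals([(half, Fraction(0))] * 100)         # 50 and 0 over 100 games
full_wins = totals([(Fraction(1), Fraction(0))] * 50)   # 50 and 0 over 50 games
```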

Let's now compare these ratings to those from the 100 draws experiment. My rating there was 1675, and Greg's was 1494. If we take the 100 wins experiment as describing an unmodified set of scores, and we take the others as attempts to reduce the weight of old scores, the 100 draws experiment seems the least fair of all. But let me try another experiment, in which I change 100 wins for me into 75 for me and 25 for Greg. In this one, my rating is 1699, and Greg's is 1465. My rating dropped further than it did in the one-sided experiments, and Greg's rose more than it did in the one-sided ones.

It seems fairer to just reduce points than it does to adjust both sides. This reduces the significance of wins without giving a loser any false wins. But by what factor to reduce the wins of a certain age is still a matter of arbitrary decision. I know of no objective reason to favor one way over another. Furthermore, reducing older wins has the same kind of effect as not counting wins older than a certain age. It just does it in a more complicated and less transparent way than filtering out games below a certain date. So, I see no reason to add the ability to reduce the values of older wins.
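The three options being weighed here (scaling down the points of older games, pulling older results toward a draw, and a plain date cutoff) can be compared side by side. This Python sketch assumes each game is stored as `(age_in_days, score_a)` and uses a made-up half-life weight; the weight and all function names are hypothetical and nothing here comes from the ratings script.

```python
def weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Hypothetical weight for a game's age: halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def reduce_points(games, half_life_days=365.0):
    """Scale both players' points by the weight; a loser never gains points."""
    a = b = 0.0
    for age, score_a in games:
        w = weight(age, half_life_days)
        a += score_a * w
        b += (1.0 - score_a) * w
    return a, b, len(games)


def pull_toward_draw(games, half_life_days=365.0):
    """Move each result toward 0.5 by the weight; the loser gains points."""
    a = b = 0.0
    for age, score_a in games:
        s = 0.5 + (score_a - 0.5) * weight(age, half_life_days)
        a += s
        b += 1.0 - s
    return a, b, len(games)


def date_cutoff(games, max_age_days):
    """Keep only games no older than max_age_days, as a date filter would."""
    kept = [score_a for age, score_a in games if age <= max_age_days]
    return sum(kept), len(kept) - sum(kept), len(kept)


games = [(0, 1.0), (365, 1.0)]  # A beat B today and one year ago
```

With these two wins for A, reducing points gives A 1.5 and B 0; pulling toward a draw gives B 0.25 points for a game B lost; a 100-day cutoff simply drops the older game. Whatever half-life is chosen remains an arbitrary parameter, which is the objection raised above.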

Greg Strong wrote on Sat, Apr 28, 2018 09:11 PM UTC:

Thanks for the explanation and for doing these experiments. Since aging out games just adds draws, and draws affect the scores not only of the two players but of other players as well, I agree that this is undesirable. I don't fully understand H. G.'s approach, but it may have provided better results since it alters the number of games as well. But this would be a more dramatic change, and given your complex two-pass system it might well have undesirable effects too. In any event, I'm certainly not going to ask you to put any more time into it.

Regarding the age filter, though, I think this might not be working as expected. I ran the calculation for the last 365 days and it shows David Paulowich playing 44 games but he hasn't been active here in years.

P. S. I sent you an email regarding the abstract piece set.

🕸📝Fergus Duniho wrote on Sat, Apr 28, 2018 11:51 PM UTC:

I checked the logs page and counted 50 games he allegedly finished within the past 365 days, but the years in the log titles were all old. I'll have to look into that.

Greg Strong wrote on Sun, Apr 29, 2018 12:00 AM UTC:

Now that I think about it, I think this is because, until very recently, any time you viewed a finished game log, it decided that the game had just finished and reset the date.

Hopefully we can fix that; I'd hate for the GC history to be permanently messed up. My thought is this: if the update only changed the database record, and not the text file of the log itself, perhaps we can take the modification timestamps of the files and use them to update the finished time in the database.
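The repair suggested here could look roughly like the following Python sketch. It uses the built-in `sqlite3` module purely for illustration; the table name `logs`, the columns `filename` and `endtime`, and the function name are assumptions, not Game Courier's actual schema. It is only valid if the log files themselves were not rewritten when the database record was touched.

```python
import os
import sqlite3


def restore_endtimes(conn: sqlite3.Connection, log_dir: str) -> int:
    """Set each log's recorded end time to its file's modification time.

    Hypothetical schema: logs(filename TEXT, endtime INTEGER), with endtime
    as a Unix timestamp. Only valid if the log files themselves were NOT
    rewritten when the database record was touched.
    """
    updated = 0
    for (filename,) in conn.execute("SELECT filename FROM logs").fetchall():
        path = os.path.join(log_dir, filename)
        if not os.path.isfile(path):
            continue  # no file on disk: leave the record alone
        mtime = int(os.path.getmtime(path))
        conn.execute("UPDATE logs SET endtime = ? WHERE filename = ?", (mtime, filename))
        updated += 1
    conn.commit()
    return updated
```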

🕸📝Fergus Duniho wrote on Sun, Apr 29, 2018 07:30 PM UTC:

Things are partially fixed now. When it writes the log and updates the database, it will now use the last timestamp in the timeline for the endtime, assuming there is a timeline. Otherwise, it uses the present time. Apparently, it keeps a timeline only if time controls are in use. I may want to change that.
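The end-time rule just described is a small fallback. A Python sketch, assuming the timeline is a list of Unix timestamps in move order (the real log format may differ) and using the hypothetical name `choose_endtime`:

```python
import time


def choose_endtime(timeline, now=None):
    """Return the last timeline timestamp, or the current time if none exists.

    timeline: list of Unix timestamps in move order; may be empty or None
    when the game was played without time controls.
    """
    if timeline:
        return timeline[-1]
    return int(time.time()) if now is None else now
```

Passing `now` explicitly keeps the fallback testable; without it, the function reads the clock.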

The other problem is that write_log.php is being called too often. I have to look into each instance where it is being called incorrectly and fix that. The difficulty in fixing this is that, thanks to the complexity of the script, I have lost sight of everything that is going on. I made updates to Game Courier to update the logs and database for finished games when any data is incorrect, but there are still bugs in that.

🕸📝Fergus Duniho wrote on Sun, Apr 29, 2018 08:28 PM UTC:

One more change I've made is that I have stopped it from rewriting the log and updating the database whenever the status is "Game Drawn." Instead of doing so with the condition `$status == "Drawn game."`, it uses the condition `(!empty($winner) && ($status == "Drawn game."))`.

Ben Reiniger wrote on Tue, Jun 18, 2019 04:19 PM UTC:

I've fixed the header menu to point Ratings and Logs of a game to the correct locations.

The script here used `gamewcp` ("wild-card pattern", I assume), but at least some links use `game` as the request variable. For now, rather than track down all usages, I've just allowed `game` to override `gamewcp`.
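The precedence described here can be sketched in Python, assuming the request parameters arrive as a dictionary. The function name and the match-everything default `"%"` are hypothetical; the script's real default is not stated.

```python
def game_filter(params, default="%"):
    """Pick the game pattern from request parameters.

    `game`, when present and non-empty, overrides `gamewcp`; otherwise fall
    back to `gamewcp`, then to a hypothetical match-everything default.
    """
    if params.get("game"):
        return params["game"]
    return params.get("gamewcp") or default
```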

I've also changed the default Status Filter to Only Rated Games.

(This is in response to a few messages here.)

Also, @Kevin, I still can't recreate the problem with including all logs. Using the game filter Chess shows only a few games per person, with Carlos highest at 40 and a few others in the teens. Are you removing the wildcard (`%`)?
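Assuming the wild-card pattern is applied with an SQL `LIKE` comparison (an assumption; the thread does not show the query), `%` matches any run of characters, so a filter ending in `%` pulls in every game whose name merely starts with the given text. This Python sketch uses an in-memory SQLite table with sample game names to show the difference; the table and function names are hypothetical.

```python
import sqlite3


def matching_games(names, pattern):
    """Return the names matched by `game LIKE pattern`, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (game TEXT)")
        conn.executemany("INSERT INTO logs (game) VALUES (?)", [(n,) for n in names])
        rows = conn.execute(
            "SELECT game FROM logs WHERE game LIKE ? ORDER BY game", (pattern,)
        ).fetchall()
        return [game for (game,) in rows]
    finally:
        conn.close()


names = ["Chess", "Chess960", "Chessgi", "Shogi"]
```

Here `"Chess"` matches only Chess, `"Chess%"` also matches Chess960 and Chessgi, and `"%"` alone matches everything. Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default.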

Kevin Pacey wrote on Tue, Jun 18, 2019 05:25 PM UTC:

I can get the Ratings for just standard Chess now. I was using the wildcard (%) before when using the filter on the GC Ratings Page itself, and that caused the problem of too many games & players showing up. Before now I had no idea of the significance of the (%) symbol, fwiw.

Kevin Pacey wrote on Sat, Jun 22, 2019 09:05 AM UTC:

I like that the default Status Filter for the Ratings page has been changed (by Ben) to Only Rated Games (rather than All Public Games). I think this is the way it should always have been, though I never got around to making a post suggesting such a change.

Kevin Pacey wrote on Sun, Jun 26, 2022 08:47 PM UTC:

@ Fergus:

I'm wondering if there is a bug in the way GC does the total rated games for individual players. For example, I just finished a drawn publicly viewable Fischerrandom game with Play Tester, yet for both his public and rated games for Fischerrandom it shows he still has perfect scores.

Also, I once tallied up my rated games for (orthodox) chess, and I was not credited for one of my drawn games, but seemed to have received a loss instead.
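A quick cross-check for reports like these is to tally one player's results by hand. The Python sketch below assumes the results can be listed as `"win"`, `"draw"` or `"loss"` strings (a hypothetical input format, not Game Courier's log format). With a draw scored as half a point, any drawn game must show up in the draw column and rule out a perfect score.

```python
from collections import Counter


def tally(results):
    """Count one player's wins, draws, losses and points.

    results: iterable of "win", "draw" or "loss" from that player's side.
    A draw counts as half a point, so a drawn game can never leave a
    perfect score intact.
    """
    counts = Counter(results)
    points = counts["win"] + 0.5 * counts["draw"]
    games = counts["win"] + counts["draw"] + counts["loss"]
    return {
        "win": counts["win"],
        "draw": counts["draw"],
        "loss": counts["loss"],
        "points": points,
        "perfect": games > 0 and points == games,
    }
```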

Kevin

Kevin Pacey wrote on Fri, Jan 5 10:13 PM UTC:

Somehow this page lost a lot of prominence. I can only find it under the already inconspicuous menu item 'Script', unless I'm missing a more prominent link labeled 'Ratings' (which might interest more visitors and members).

17 comments displayed
