Game Courier Ratings. Calculates ratings for players from Game Courier logs. Experimental.
🕸📝Fergus Duniho wrote on Wed, Jan 11, 2006 03:54 PM UTC:
Michael,

The purpose of a rating system is to measure relative differences in
playing strength. I can't emphasize the word relative enough. The best
way to measure relative playing strength is a holistic method that
regularly takes into account all games in its database. One consequence of
this is that ratings may change even when someone stops playing games. This
makes the method more accurate. The Elo and CXR methods have not been
holistic, because a holistic method is not feasible on the scale these
systems are designed for. They have to settle for strictly sequential
changes. Because GCR works in a closed environment with complete access to
game logs, it does not have to settle for strictly sequential changes. It
has the luxury of making global assessments of relative playing strength
on the basis of how everyone is doing.
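To illustrate the idea, here is a minimal sketch of a holistic recalculation, assuming a toy model in which each player's rating is repeatedly re-estimated from every game on record until the values settle. The 400-point offset per win or loss and the function name are illustrative assumptions, not GCR's actual formula; the point is only that a rating can change whenever anyone's games shift the global picture.

```python
def holistic_ratings(games, initial=1500.0, offset=400.0, rounds=50):
    """Iteratively re-estimate every rating from all games in the database.

    games: list of (player_a, player_b, score_a), score_a in {1, 0.5, 0}.
    """
    players = {p for a, b, _ in games for p in (a, b)}
    ratings = {p: initial for p in players}
    for _ in range(rounds):
        new = {}
        for p in players:
            # Estimate p's rating from each game p played, relative to the
            # opponent's current rating.
            estimates = []
            for a, b, score_a in games:
                if p == a:
                    estimates.append(ratings[b] + offset * (2 * score_a - 1))
                elif p == b:
                    estimates.append(ratings[a] - offset * (2 * score_a - 1))
            new[p] = sum(estimates) / len(estimates) if estimates else ratings[p]
        ratings = new
    return ratings
```

Note that under such a scheme a player's rating keeps moving as long as his past opponents' ratings move, even if he plays no new games, which is the consequence described above.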

A separate issue you raised is that of a 3000-rated player losing fewer
points than a 1500-rated player. Since last night, I have rethought how to use
and calculate stability. Instead of basing stability on a player's
rating, I can keep track of how many games have so far factored into the
estimate of each player's rating. One thought is to just count the games
whose results have so far directly factored into a player's rating.
Another thought is to also keep track of each opponent's stability, keep
a running total of this, and divide it by the number of opponents a player
has so far been compared with. I'm thinking of adding these two figures
together, or maybe averaging them, to recalculate the stability score of
each player after each comparison. Thus, stability would be a measure of
how reliable an indicator a player's past games have been of his present
rating.
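The bookkeeping described above might look something like the following sketch, which averages (a) the count of games that have so far factored into a player's rating with (b) the running mean of the opponents' stability scores. The class and field names are illustrative assumptions; the actual GCR bookkeeping may combine the two figures differently (e.g. by adding rather than averaging them).

```python
class Stability:
    """Track how well-established a player's rating is."""

    def __init__(self):
        self.games = 0                  # games factored into this rating so far
        self.opp_stability_total = 0.0  # running total of opponents' stability
        self.opponents = 0              # number of opponents compared with

    def record_comparison(self, opponent_stability, games_in_comparison=1):
        """Update the counts after comparing against one opponent."""
        self.games += games_in_comparison
        self.opp_stability_total += opponent_stability
        self.opponents += 1

    @property
    def score(self):
        """Average the game count with the mean opponent stability."""
        if self.opponents == 0:
            return float(self.games)
        mean_opp = self.opp_stability_total / self.opponents
        return (self.games + mean_opp) / 2
```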

That covers my new thoughts on recalculating stability. As for using it, I
am thinking of using both players' stability scores to weight how much
ratings may change in each direction. I am still trying to work out the
details on this. The main change is that both stability scores would
affect the change in rating of both players being compared. In contrast,
the present method factors in only a player's own stability score in
recalculating his rating.
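One way such a two-sided weighting could work is sketched below, under the assumption that a total correction is split between the two players in inverse proportion to their stability scores, so the better-established rating moves less. The function name and the apportioning formula are assumptions for illustration, not the settled GCR method.

```python
def apply_adjustment(rating_a, rating_b, stability_a, stability_b, total_change):
    """Split one rating correction between two players by stability.

    total_change > 0 means the comparison favors player A.
    """
    total = stability_a + stability_b
    # Weight each player's movement by the *other* player's stability:
    # A moves more when B's rating is the better-established one.
    share_a = stability_b / total
    share_b = stability_a / total
    return (rating_a + total_change * share_a,
            rating_b - total_change * share_b)
```

For example, if a 1600-rated player with stability 10 earns a 60-point correction against a 1900-rated player with stability 2, the 1600 player gains only 10 points while the 1900 player loses 50: the ratings converge mostly toward the better-established one.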

One consequence of this is that if a mid-range rated player defeats a
high-rated player, and the mid-range player has so far accumulated the
higher stability score, the ratings will shift more toward his rating
than toward the high-rated player's rating. The overall effect will be to
adjust ratings toward the better established ratings, making all ratings
in general more accurate.