
Comments/Ratings for a Single Item

Game Courier Ratings. Calculates ratings for players from Game Courier logs. Experimental.
🕸📝Fergus Duniho wrote on Sat, Jan 14, 2006 04:52 PM UTC:
No, I have no intention of doing anything along the lines of Mark
Thompson's 'open source' suggestion. While some people might like to
play with their own ratings system, most people are simply going to want
one standard ratings system without the fuss and bother of choosing one
among many. Also, if I set something up to freely let people create their
own ratings systems, there would soon be many bad ratings systems for
people to choose from. As for letting a free market choose the best one,
it wouldn't work like that. Without the benefit of serious investigation
into them, there would be little basis for informed decisions. A bad
system could become popular as easily as a good one. Consider how well
things like astrology and numerology fare under a free market. A free
market is no substitute for scientific investigation.

If people are interested in Game Courier having as good a ratings system
as it can, then they are free to offer comments and suggestions. I have
described the method in algorithmic detail on this page, and I have also
further discussed what it does and have compared it with Elo.

Roberto Lavieri wrote on Sat, Jan 14, 2006 06:40 PM UTC:
I think GCR is a good alternative method, although it has its weaknesses, as Elo also has. Neither is very sensitive to drastic changes in a person's game play; I know that is unusual, but not impossible. But I insist that weighted history should be considered. Weighted history (for each game, I mean) can reflect some evolution in a player's playing strength, which is to be expected at our site, because many of the games we play are new games: all of us are gaining experience with little theory to help, and results are less indicative in the first contacts with a game. GCR's main weakness is that it does not reflect actual playing strength with the best accuracy, but tends toward an average over all time.

Roberto Lavieri wrote on Sat, Jan 14, 2006 07:03 PM UTC:
Another weakness I see is that you don't know how many games are needed before a rating can be considered 'somewhat confident'. It is quite possible that a player with only a few games played, say fewer than ten, but with an almost perfect score against well-rated players, will show a rating that does not reflect his strength, the rating perhaps being much lower than that of another player with many more games played but a lower average and a relatively worse record against others. It has been said that the ratings must stabilize with time, but I'm not sure how many games are needed, and the disparity in number of games may introduce a bias that makes ratings not so easy to compare with accuracy. But once a rating has 'stabilized', the whole history introduces another bias, the product of very old games being counted with the same weight as new ones; this is the main reason I insist on the weighted history idea.

🕸📝Fergus Duniho wrote on Sun, Jan 15, 2006 09:52 PM UTC:
Roberto, I was reading about the Glicko method the other day. This is an improvement on Elo that takes each player's activity into consideration. As I was reading about it, it seemed to me that it was addressing some of the same concerns a weighted system is supposed to address. But instead of weighting the point value of games, it treats the ratings of more active players as more stable than the ratings of less active players. GCR already does this. So consider a player who initially does poorly at a new game, then gets the hang of it and starts doing a lot better. So long as he actively plays the game with others, his initial games won't count for as much. If they were against the same opponents he continues to play, each new game he wins against them will lessen the effect of his initial losses. If they were against opponents he no longer plays, they will be considered less stable than scores against players he plays more often. Furthermore, if his old opponents don't improve as much as he does, his losses against them won't count as much as losses against stronger players. Although a weighted Elo method might be an improvement on Elo, GCR already comes with features that address the concerns weighting is supposed to meet. So there seems to be less, if any, need for weighting GCR.

🕸📝Fergus Duniho wrote on Sun, Jan 15, 2006 10:05 PM UTC:
I'll draw attention to the change I made today. Previously, when two players had ratings more than 400 points apart, GCR would calculate their provisional ratings by adding the lower rating to each player's percentage of games won times the full distance between them. Now, when two players have ratings more than 400 points apart, GCR calculates the midpoint between them, and calculates each player's provisional rating in a range between his current rating and 200 points past the midpoint. For example, if it compares two players at 1500 and 2000, the 1500 rated player's provisional rating would fall between 1500 and 1950, and the other's would fall between 1550 and 2000. The higher rated player's provisional rating is now calculated by subtracting the product of his opponent's score times the range. Since both scores add up to one, this is simply the same as using 1 minus his own score. The advantage of doing it this way is that the provisional scores for both players are only 400 points apart when the lower-rated player wins all games. Previously, this would give each player his opponent's rating as a provisional rating in this event, and that would be too much. After I made this change and fixed the bugs, the calculated ratings became slightly more accurate at predicting the original scores. So it seems to be an improvement.
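The new rule can be sketched in code. The following is an illustrative Python translation of the description above, not the actual GCR source; the function and variable names are mine:

```python
def provisional_ratings(low, high, low_score):
    """Provisional ratings when two players are more than 400 points
    apart, per the rule described in the comment above.

    low, high -- the two players' current ratings (high - low > 400)
    low_score -- the lower-rated player's share of the points (0..1)
    """
    midpoint = (low + high) / 2
    # Each player's provisional rating is confined to a band running from
    # his current rating to 200 points past the midpoint; both bands have
    # the same width.
    width = (midpoint + 200) - low
    prov_low = low + low_score * width
    # The higher-rated player subtracts his opponent's score times the
    # range (equivalently, 1 minus his own score times the range).
    prov_high = high - low_score * width
    return prov_low, prov_high
```

With the 1500/2000 example, a perfect score for the lower-rated player yields provisional ratings of 1950 and 1550, exactly 400 points apart, as described above.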

Joe Joyce wrote on Mon, Jan 16, 2006 05:18 AM UTC:
What happens when a game you've won, and it says 'You have won' in the
game log, doesn't show up in the calculations, even though you can call
it up by name from the game logs with your password, and it shows as a win
when you list all your games? The game in question is
Omega Chess oncljan-joejoyce-2005-96-245
Admittedly, it's not a good win, but it balances out one of the
almost-won games where my opponent disappeared just before the end. (I see
the value of timed games now.) Actually, I hadn't brought it up before
because it is such a poor win that I didn't feel I deserve it, but I
realized that if it was included, I just might get up to 1500 briefly,
before I lose to Carlos, David, Gary..., and that'd be a kick for someone
who's only been playing a year or so after, depending on how you wish to
count time off, 30-40 years.  
I will say the ratings have brought out everyone's competitive spirits.
As for me, I'll happily carry a general rating that takes in all my
games: playtests, coffee-house, and tournament; but, since people are
asking for so many things, I'd like to add one more. Would it be possible
or practical to allow people to choose one or more subsets of games for a
specific rating? For example, I am currently playing several variants of
shatranj now, one of which is 'grand shatranj'. Could I be allowed to
put any number of game names into a 'Rate these games only' field, so I
could get a combined rating for say 6 shatranj variants plus Grand Chess?
And then another for the 'big board' games, and so on?

🕸📝Fergus Duniho wrote on Mon, Jan 16, 2006 04:04 PM UTC:
Joe,

GCR reads data only from public games, not from private games. That's why
one of your private games is not factoring into the calculations. 

I have now added a group filter that lets users select certain groups, and
I have extended the Game Filter to recognize comma separated lists. To list
games in the Game Filter, separate each game by a single comma, nothing
more or less, and don't use any wildcards.

Joe Joyce wrote on Mon, Jan 16, 2006 04:56 PM UTC:Excellent ★★★★★
Thank you very much.

🕸📝Fergus Duniho wrote on Mon, Jan 16, 2006 05:24 PM UTC:

I have now modified the reliability and stability formulas to these:

$r1 = ($n / ($n + 9)) - (($gamessofar[$p1] + 1) / (10*$gamessofar[$p1] + 100));
$r2 = ($n / ($n + 9)) - (($gamessofar[$p2] + 1) / (10*$gamessofar[$p2] + 100));
$s1 = (($gamessofar[$p1] + 1) / ($gamessofar[$p1] + 5)) - ($n / (5*$n + 100));
$s2 = (($gamessofar[$p2] + 1) / ($gamessofar[$p2] + 5)) - ($n / (5*$n + 100));

$n is the number of games two players have played together, and $gamessofar holds the number of games that have so far factored into each player's rating. I have modified each formula by subtracting a small fraction based on what determines the other. The fraction subtracted from reliability has a limit of .1, which is otherwise the lower boundary of reliability. The fraction subtracted from stability has a limit of .2, which is otherwise the lower boundary of stability. These have been introduced as erosion factors: a very high stability or reliability erodes the other, and may do so to a limit of zero. Thus, the more games one person wins against another, the closer the winner's potential point gain approaches its limit of 400. Likewise, the more single games someone wins against separate individuals, the closer his point gain approaches that same limit. These changes have also increased the accuracy slightly.
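For readers who don't follow PHP, here are the same two formulas rendered in Python (names mine, purely for illustration), which makes the limits described above easy to check:

```python
def reliability(n, games_so_far):
    """Reliability of a head-to-head result: grows with the number of
    games two players have played together (n), eroded by a fraction
    that approaches .1 as the player's total rated games grow."""
    return n / (n + 9) - (games_so_far + 1) / (10 * games_so_far + 100)

def stability(n, games_so_far):
    """Stability of a player's rating: grows with his total rated games,
    eroded by a fraction that approaches .2 as n grows."""
    return (games_so_far + 1) / (games_so_far + 5) - n / (5 * n + 100)
```

As both counts grow large, reliability approaches .9 and stability approaches .8; when one count stays at its minimum while the other grows without bound, the erosion term pulls the value down toward zero, as the comment above describes.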


🕸📝Fergus Duniho wrote on Thu, Feb 16, 2006 05:00 PM UTC:
I have corrected an inaccuracy in the description of the method used. It used to say that a higher rated player is expected to win a percentage of games equal to one quarter of the point difference between the ratings, capped at 100%. Although this works for a 400 point difference, it is inaccurate for other point differences. In particular, this formula predicts that the lower rated player would win more games for any point difference below 200, which is just crazy. Anyway, the examples I gave to illustrate the formula did not illustrate it, and examination of my code indicates that I did not use it. The actual formula, which the examples did illustrate, and which I did use in the code, is that a higher rated player may be expected to win a percentage of games equal to 50% plus one eighth of the point difference, capped at 100%.
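The corrected rule is easy to state in code. This is an illustrative one-liner based on the description above, not the actual GCR source:

```python
def expected_win_percentage(point_difference):
    """Expected winning percentage for the higher-rated player: 50% plus
    one eighth of the rating difference, capped at 100%."""
    return min(100.0, 50.0 + point_difference / 8.0)

# A 400-point difference predicts 100%, 200 points predicts 75%,
# and equal ratings predict 50%.
```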

Matthew Montchalin wrote on Thu, Feb 16, 2006 11:03 PM UTC:
Fergus, I suggest you use a different rating system, especially considering
how your current one is pretty arbitrary (we can nitpick about the 400
point difference as opposed to a 500 or 600 point difference, but we would
do so knowing in advance that one number is just as arbitrary as another),
and how it appears to be designed to judge people's 'future
performance' based upon observations of previous games that users were
told wouldn't count.  (Although that's really not /that/ big of a deal.)
 And if you encouraged users to add their computer programs to the fray,
the ratings, as such, would add an extra dimension of utility.

🕸📝Fergus Duniho wrote on Fri, Feb 17, 2006 12:41 AM UTC:
By necessity, any rating system is going to have a degree of arbitrariness
to it, for some unit of measurement must be used, and there is no hard and
fast reason for preferring one over another. But that is no reason at all
against any particular method. As for the 400 figure, that is at least
rooted in tradition. This same figure is used by the Elo method, which has
already established itself as the most popular rating method.

As for including computer opponents, you are free to play games in which
you enter the moves made by a computer. If you do that, it would be best
to create a separate userid for the computer opponent. But Game Courier
does not provide any computer opponents, and I don't consider their
inclusion in the ratings important.

Finally, the filters let you filter out games that are not officially
rated. So it's a moot point whether the calculations factor in unrated
games. They factor them in only if you choose not to filter them out.

Thomas McElmurry wrote on Wed, Feb 22, 2006 06:19 AM UTC:
When I view the ratings for all tournament games by using '?*' as the tournament filter, exactly one player is displayed in a different color than the others. How is this possible? Does it indicate an error in the code, or in my understanding of what the colors indicate?

🕸📝Fergus Duniho wrote on Wed, Feb 22, 2006 02:31 PM UTC:
Since I have won a rated game against the person in question, I know his row should be yellow like all the rest. There must be a bug.

🕸📝Fergus Duniho wrote on Wed, Feb 22, 2006 03:43 PM UTC:
Okay, the bug should now be fixed. Thanks for reporting it.

Stephen Stockman wrote on Mon, May 8, 2006 08:41 PM UTC:Excellent ★★★★★
WOW!! this ratings page is super cool! Now I see why people are playing more games, they're working on their ratings. Thank You Fergus

Jeremy Good wrote on Mon, May 8, 2006 09:46 PM UTC:
Never noticed this before. Hey, Joe (Joyce) you and I have a very similar rating at this time. We're a good match.

Stephen Stockman wrote on Fri, Jul 28, 2006 07:20 AM UTC:Excellent ★★★★★
I have a suggestion. Is it possible to have a maximum number of points that
a player can gain or lose per game? I am thinking of a maximum change per
game of around 10 or 20 points, because there are many players listed here
who have only played one or two games but they have highly inflated or
deflated ratings.

Hats off to Jeremy Good who apparently has completed more games here than
anyone else, looks like 250 completed games, and counting!

🕸📝Fergus Duniho wrote on Fri, Jul 28, 2006 04:58 PM UTC:
So far, the ratings for all public games fall within a 500 point range. Except for the top rating, all fall within a 400 point range. Most fall within a 200 point range. Ratings of people who have played only two games fall within a 300 point range. Ratings of people who have played only one game fall within a 200 point range. So there does not appear to be any deflation or inflation of ratings. There is a range of variability among players who have played few games, but you cannot get your rating very high or low without playing many games.

🕸📝Fergus Duniho wrote on Tue, Apr 7, 2015 02:11 PM UTC:
The good news is that the reason this sometimes didn't work was not too many files but a monkey wrench thrown into one of the log files. With that file renamed, it's not being read, and this page generates ratings even when set to give ratings for all public games. The bad news is that it seems to be undercounting the games people have played. I checked a player it said had played only one game, and the logs page listed 23 games he has finished playing. I was also skeptical that I had played only 62 games, and a count showed that I had indeed played several more. So that has to be fixed. And since I have made the new FinishedGames database table, I will eventually rewrite this to use that instead of reading the files directly.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 01:02 AM UTC:
This script now reads the database instead of individual logs, and some bugs have been fixed. For one thing, it shouldn't be missing games anymore, as I was complaining about in a previous comment. Also, I found some functions for dealing with mixed character encodings in the database. Some years ago, I tried to start converting everything to UTF-8, but I never finished that. This led to multiple character encodings in the database. By using one function to detect the character encoding and another to convert whatever was detected to UTF-8, I'm now getting everyone's name to show up correctly.
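Fergus doesn't name the PHP functions he used, but the two-step repair he describes (detect the encoding, then convert to UTF-8) can be sketched in Python; the list of candidate encodings here is my own assumption, not the site's actual configuration:

```python
def to_utf8(raw: bytes) -> str:
    """Illustrative sketch of a detect-then-convert repair for data
    stored in mixed character encodings. Candidate encodings are an
    assumption; a real deployment would list the encodings it knows
    its legacy data used."""
    for encoding in ('utf-8', 'cp1252', 'latin-1'):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never fail outright, just mark undecodable bytes.
    return raw.decode('utf-8', errors='replace')
```

Trying strict UTF-8 first matters: bytes that are already valid UTF-8 pass through unchanged, while legacy single-byte data falls through to the fallback encodings.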

One of the practical changes is the switch from Unix wildcards to SQL wildcards. Basically, use % instead of *, and use _ instead of ?.

One more thing. I moved this script from play/pbmlogs/ to play/pbm/. It was in the former only because it had to read the logs. Now that it doesn't, it seems more logical to put it in play/pbm/. The old script is still at its old location if you want to compare.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 01:36 AM UTC:
I have also modified groups to work with MySQL, and one new feature that helps with groups is that the page now shows the SQL of the search it performs. This lets you see which Chess variants are in a group. Most of the groups are based on the tiers I made in the Recognized Variants. These may not be too helpful, since they are not groups of related games. The Capablanca group, which I just expanded, seems to be the most useful group here, since it groups together similar games. What I would like to do is add more groups of related games. I'm open to suggestions.

Cameron Miles wrote on Sat, Apr 11, 2015 01:37 AM UTC:
It looks like everything's been fixed! Well done, Fergus, and thank you!

I see that the Finished Games database also allowed for the creation of a page listing Game Courier's top 50 most-played games, which is a very nice addition.

Now I guess I have to see if I can catch Hexa Sakk...  ; )

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 03:26 AM UTC:
It's now possible to list multiple games in the Game Filter field. Just comma-separate them and don't use wildcards.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 10:40 AM UTC:
It is now possible to use wildcards within comma-separated lists of games. Also, Unix style wildcards are now converted to SQL style wildcards. So you can use either.
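The conversion described here amounts to a simple character mapping: * (any string) becomes the SQL LIKE wildcard %, and ? (any single character) becomes _. A sketch, not the actual Game Courier code:

```python
def unix_to_sql_wildcards(pattern: str) -> str:
    """Map Unix-style wildcards to their SQL LIKE equivalents:
    * (any string) -> %, and ? (any single character) -> _."""
    return pattern.replace('*', '%').replace('?', '_')
```

Note that this naive version does not escape literal % or _ characters already present in the pattern; a production conversion would need to handle those with an ESCAPE clause.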
