Comments/Ratings for a Single Item
What happens when a game you've won, one where the game log says 'You have won', doesn't show up in the calculations, even though you can call it up by name from the game logs with your password, and it shows as a win when you list all your games? The game in question is Omega Chess oncljan-joejoyce-2005-96-245. Admittedly, it's not a good win, but it balances out one of the almost-won games where my opponent disappeared just before the end. (I see the value of timed games now.) Actually, I hadn't brought it up before because it is such a poor win that I didn't feel I deserved it, but I realized that if it were included, I just might get up to 1500 briefly, before I lose to Carlos, David, Gary..., and that'd be a kick for someone who's only been playing a year or so after, depending on how you wish to count time off, 30-40 years away. I will say the ratings have brought out everyone's competitive spirits. As for me, I'll happily carry a general rating that takes in all my games: playtests, coffee-house, and tournament; but, since people are asking for so many things, I'd like to add one more. Would it be possible or practical to allow people to choose one or more subsets of games for a specific rating? For example, I am currently playing several variants of shatranj, one of which is 'grand shatranj'. Could I be allowed to put any number of game names into a 'Rate these games only' field, so I could get a combined rating for, say, 6 shatranj variants plus Grand Chess? And then another for the 'big board' games, and so on?
Joe, GCR reads data only from public games, not from private games. That's why one of your private games is not factoring into the calculations. I have now added a group filter that lets users select certain groups, and I have extended the Game Filter to recognize comma-separated lists. To list games in the Game Filter, separate the game names with a single comma, nothing more or less, and don't use any wildcards.
I have now modified the reliability and stability formulas to these:
$r1 = ($n / ($n + 9)) - (($gamessofar[$p1] + 1) / (10*$gamessofar[$p1] + 100));
$r2 = ($n / ($n + 9)) - (($gamessofar[$p2] + 1) / (10*$gamessofar[$p2] + 100));
$s1 = (($gamessofar[$p1] + 1) / ($gamessofar[$p1] + 5)) - ($n / (5*$n + 100));
$s2 = (($gamessofar[$p2] + 1) / ($gamessofar[$p2] + 5)) - ($n / (5*$n + 100));
$n is the number of games two players have played together, and $gamessofar holds the number of games that have so far factored into the ratings of each player. I have modified each formula by subtracting a small fraction based on what determines the other. The fraction subtracted from reliability has a limit of .1, which is otherwise the lower boundary of reliability. The fraction subtracted from stability has a limit of .2, which is otherwise the lower boundary of stability. These have been introduced as erosion factors. A very high stability or reliability erodes the other, and may do so to a limit of zero. Thus, the more games one person wins against another, the closer the winner's point gain gets to a limit of 400. Likewise, the more single games someone wins against separate individuals, the closer his point gain can get to a limit of 400. Also, these changes have increased the accuracy slightly.
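For anyone who wants to try numbers in these formulas, here is a minimal sketch that transliterates them into Python. The function names reliability and stability and the parameter names are my own, not taken from the script, and how the script turns these values into an actual point gain is not shown here.

```python
def reliability(n, games_so_far):
    """Reliability term for one player.

    n            -- games the two players have played together ($n)
    games_so_far -- games already factored into this player's rating
                    ($gamessofar[$p1] or $gamessofar[$p2])
    """
    base = n / (n + 9)                                    # 0.1 at n = 1, approaches 1
    erosion = (games_so_far + 1) / (10 * games_so_far + 100)  # approaches 0.1
    return base - erosion


def stability(n, games_so_far):
    """Stability term for one player, with the same arguments as reliability()."""
    base = (games_so_far + 1) / (games_so_far + 5)        # 0.2 at 0 games, approaches 1
    erosion = n / (5 * n + 100)                           # approaches 0.2
    return base - erosion


# Print both terms for a few (n, games_so_far) pairs.
for n, g in [(1, 0), (5, 20), (50, 200)]:
    print(n, g, round(reliability(n, g), 4), round(stability(n, g), 4))
```

With one game between the players and a very large number of prior games, reliability comes out close to zero, and with no prior games and a very large number of games between the two players, stability does too, which is the erosion described above.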
Fergus, I suggest you use a different rating system, especially considering how your current one is pretty arbitrary (we can nitpick about the 400-point difference as opposed to a 500- or 600-point difference, but we would do so knowing in advance that one number is just as arbitrary as another), and how it appears to be designed to judge people's 'future performance' based upon observations of previous games that users were told wouldn't count. (Although that's really not /that/ big of a deal.) And if you encouraged users to add their computer programs to the fray, the ratings, as such, would add an extra dimension of utility.
By necessity, any rating system is going to have a degree of arbitrariness to it, for some unit of measurement must be used, and there is no hard and fast reason for preferring one over another. But that is no reason at all against any particular method. As for the 400 figure, that is at least rooted in tradition. This same figure is used by the Elo method, which has already established itself as the most popular rating method. As for including computer opponents, you are free to play games in which you enter the moves made by a computer. If you do that, it would be best to create a separate userid for the computer opponent. But Game Courier does not provide any computer opponents, and I don't consider their inclusion in the ratings important. Finally, the filters let you filter out games that are not officially rated. So it's a moot point whether the calculations factor in unrated games. They factor them in only if you choose not to filter them out.
I have a suggestion. Is it possible to have a maximum number of points that a player can gain or lose per game? I am thinking of a maximum change per game of around 10 or 20 points, because there are many players listed here who have only played one or two games but they have highly inflated or deflated ratings. Hats off to Jeremy Good who apparently has completed more games here than anyone else, looks like 250 completed games, and counting!
The good news is that the reason this didn't work sometimes was not because of too many files but because of a monkey wrench thrown into one of the log files. With that file renamed, it's not being read, and this page generates ratings even when set to give ratings for all public games. The bad news is that it seems to be undercounting the games played by people. I checked out a player it said had played only one game, and the logs page listed 23 games he has finished playing. I was also skeptical that I had played only 62 games. I counted more than that and saw that I had still played several more. So that has to be fixed. And since I have made the new FinishedGames database table, I will eventually rewrite this to use that instead of reading the files directly.
This script now reads the database instead of individual logs, and some bugs have been fixed. For one thing, it shouldn't be missing games anymore, as I was complaining about in a previous comment. Also, I found some functions for dealing with mixed character encodings in the database. Some years ago, I tried to start converting everything to UTF-8, but I never finished that. This led to multiple character encodings in the database. By using one function to detect the character encoding and another to convert whatever was detected to UTF-8, I'm now getting everyone's name to show up correctly. One of the practical changes is the switch from Unix wildcards to SQL wildcards. Basically, use % instead of *, and use _ instead of ?. One more thing. I moved this script from play/pbmlogs/ to play/pbm/. It was in the former only because it had to read the logs. Now that it doesn't, it seems more logical to put it in play/pbm/. The old script is still at its old location if you want to compare.
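The detect-then-convert idea for mixed encodings can be sketched roughly like this in Python. The comment above does not name the functions the script uses, so this is only an illustration: the function name to_text and the fallback list (Windows-1252, then Latin-1) are assumptions, not what the database is known to contain.

```python
def to_text(raw, fallbacks=("cp1252", "latin-1")):
    """Decode bytes of unknown encoding into a str.

    UTF-8 is tried first because text in other encodings rarely happens to be
    valid UTF-8. The fallback encodings are assumptions for illustration.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every fallback fails (latin-1 itself never does).
    return raw.decode("utf-8", errors="replace")


# The same name stored two different ways comes out identical.
print(to_text("José".encode("utf-8")))
print(to_text("José".encode("latin-1")))
```

Once the value is a str, writing it back out as UTF-8 is just a matter of calling .encode("utf-8") on it.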
I have also modified groups to work with MySQL, and one new feature that helps with groups is that it shows the SQL of the search it does. This lets you see what Chess variants are in a group. Most of the groups are based on the tiers I made in the Recognized variants. These may not be too helpful, since they do not group related games. The Capablanca group, which I just expanded, seems to be the most useful group here, since it groups together similar games. What I would like to do is add more groups of related games. I'm open to suggestions.
It looks like everything's been fixed! Well done, Fergus, and thank you! I see that the Finished Games database also allowed for the creation of a page listing Game Courier's top 50 most-played games, which is a very nice addition. Now I guess I have to see if I can catch Hexa Sakk... ; )
It's now possible to list multiple games in the Game Filter field. Just comma-separate them and don't use wildcards.
It is now possible to use wildcards within comma-separated lists of games. Also, Unix-style wildcards are now converted to SQL-style wildcards. So you can use either.
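As a rough illustration of what that conversion amounts to, here is a minimal Python sketch that splits a Game Filter value on commas and turns Unix wildcards into SQL ones. The function names are hypothetical, and this is not the script's actual code.

```python
def unix_to_sql_wildcards(pattern):
    """Turn Unix-style wildcards into SQL LIKE wildcards: * becomes %, ? becomes _."""
    return pattern.replace("*", "%").replace("?", "_")


def game_filter_patterns(game_filter):
    """Split a comma-separated Game Filter value into SQL LIKE patterns.

    Names are split on single commas with no trimming, and empty entries
    are skipped. SQL-style wildcards already present pass through as-is.
    """
    return [unix_to_sql_wildcards(name) for name in game_filter.split(",") if name]


print(game_filter_patterns("Grand Chess,*Shatranj*,Chess?"))
```

Since % and _ are left alone, a filter written with SQL wildcards gives the same patterns as one written with Unix wildcards, which is why either style can be used.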