Comments/Ratings for a Single Item
How would you measure the diversity of games played between two players? Suppose X1 and Y1 play 5 games of Chess, 2 of Shogi, and 1 each of Xiang Qi, Smess, and Grand Chess. Then we have X2 and Y2, who play 3 games of Chess, 3 of Shogi, 2 of Xiang Qi, and 1 each of Smess and Grand Chess. Finally, X3 and Y3 have played 2 games each of the five games the other pairs have played. So each pair of players has played ten games drawn from the same five games. For each pair, I want to calculate a trust value between a lower limit of 0 and an upper limit of 1, which I would then multiply by the maximum adjustment value to get a reduced adjustment value.
Presently, the formula n/(n+10) is used, where n is the number of games played between them. In this case, n is 10, and the value of n/(n+10) is 10/20 or 1/2. One thought is to add up separate fractions, one for each different game, each based on the number of times that game was played.
X1 and Y1
5/(5+10)+2/(2+10)+1/(1+10)+1/(1+10)+1/(1+10) = 5/15+2/12+1/11+1/11+1/11 = 17/22 = 0.772727272
X2 and Y2
3/13 + 3/13 + 2/12 + 1/11 + 1/11 = 6/13 + 2/12 + 2/11 = 695/858 = 0.81002331
X3 and Y3
2/12 * 5 = 10/12 = 0.833333333
The result of this is to put greater trust in a diverse set of games than in a less diverse set, yet this is the opposite of what I was going for.
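For anyone who wants to reproduce these figures, here is a minimal sketch in Python; the helper name per_game_trust is just something made up for illustration, and the per-game counts are the ones from the scenarios above.

```python
# Minimal sketch: sum n/(n+10) over the number of times each game was
# played between a pair of players. Counts come from the scenarios above.

def per_game_trust(counts, constant=10):
    """Sum n/(n + constant) over the per-game counts for one pair."""
    return sum(n / (n + constant) for n in counts)

scenarios = {
    "X1 and Y1": [5, 2, 1, 1, 1],  # 5 Chess, 2 Shogi, 1 each of Xiang Qi, Smess, Grand Chess
    "X2 and Y2": [3, 3, 2, 1, 1],
    "X3 and Y3": [2, 2, 2, 2, 2],
}

for pair, counts in scenarios.items():
    print(pair, round(per_game_trust(counts), 9))
# X1 and Y1 0.772727273
# X2 and Y2 0.81002331
# X3 and Y3 0.833333333
```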
How would this change if I changed the constant 10 to a different value? Let's try 5.
X1 and Y1
5/(5+5)+2/(2+5)+1/(1+5)+1/(1+5)+1/(1+5) = 5/10+2/7+3/6 = 9/7 = 1.285714286
Since this raises the value above 1, it's not acceptable. Let's try changing 10 to 20.
X1 and Y1
5/(5+20)+2/(2+20)+1/(1+20)+1/(1+20)+1/(1+20) = 5/25+2/22+1/21+1/21+1/21 = 167/385 = 0.433766233
X2 and Y2
3/23 + 3/23 + 2/22 + 1/21 + 1/21 = 6/23 + 2/22 + 2/21 = 2375/5313 = 0.447016751
X3 and Y3
2/22 * 5 = 10/22 = 0.454545454
This follows the same pattern, though the trust values are lower. To see the difference clearly, look at X2 and Y2, and compare 2/22, which is for two games of Xiang Qi, with 2/21, which is for one game each of Smess and Grand Chess. 2/22 is the smaller number, which shows that this approach gives less trust to the same game played twice than to two different games played once each.
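A self-contained version of the same sketch checks the other two constants: 5 pushes the X1/Y1 sum above 1, while 20 preserves the ordering at lower values.

```python
# Self-contained check of constants 5 and 20 with the same per-game counts.

def per_game_trust(counts, constant):
    return sum(n / (n + constant) for n in counts)

scenarios = [("X1 and Y1", [5, 2, 1, 1, 1]),
             ("X2 and Y2", [3, 3, 2, 1, 1]),
             ("X3 and Y3", [2, 2, 2, 2, 2])]

print(round(per_game_trust(scenarios[0][1], 5), 9))  # 1.285714286, above the upper limit of 1
for pair, counts in scenarios:
    print(pair, round(per_game_trust(counts, 20), 9))
# X1 and Y1 0.433766234
# X2 and Y2 0.447016751
# X3 and Y3 0.454545455
```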
Since it is late, I'll think about this more later. In the meantime, maybe somebody else has another suggestion.
Presently, the more games two players play together, the greater the amount of trust that is given to the outcome of their games, but each additional game they play together adds a smaller amount of trust. This is why players playing the same game together would produce a smaller increase in trust than players playing different games together in the calculations I was trying out in my previous comment. Since this is how the calculations naturally fall, is there a rationale for doing it this way instead of what I earlier proposed? If one player does well against another in multiple games, this could be more indicative of general Chess variant playing ability, whereas if one does well against another mainly in one particular game but plays that game a lot, this may merely indicate mastery in that one game instead of general skill in playing Chess variants, and that may be due to specialized knowledge rather than general intelligence. The result of doing it this way is that players who played lots of different games could see a faster rise in their ratings than a player who specialized in only a few games. However, people who specialized in only a few games would also see slower drops in their ratings if they do poorly. For each side, there would be some give and take. But if we want to give higher ratings to people who do well in many variants, then this might be the way to do it.
Hi Fergus
Note I did put a late, second edit to my previous post, mentioning the small distinction that we're talking about specific userids rather than specific players. I made this distinction since it's possible (and evident in some cases already on GC's ratings list) that people can have more than one userid, hence more than one Game Courier rating. While presumably it would be tough to prevent this if desired, starting a new rating from scratch at least does not guarantee a player that he will get a higher one after many games (a long time ago, it may be worth noting, the Chess Federation of Canada allowed a given player to at least once effectively destroy his existing rating and begin again from scratch, perhaps for a fee).
It goes without saying that I am talking about userids. The script is unable to distinguish players by anything other than userid, and it has no confirmed data on which players are using multiple userids. All I can do about this is discourage the use of multiple userids so that this doesn't become much of a factor. But if someone wants to play games with multiple userids, he presumably has a reason for wanting to keep separate ratings for different games.
One concern I had was that adding up fractions for the number of times two players played each separate game could eventually add up to a value greater than 1. For example, if two players played 12 different games together, the total would be 12 * (1/11) or 12/11, which is greater than 1. One way to get around this is to divide the total by the number of different games played. Let's see how this affects my original scenarios:
X1 and Y1
5/(5+10)+2/(2+10)+1/(1+10)+1/(1+10)+1/(1+10) = 5/15+2/12+1/11+1/11+1/11 = 17/22 = 0.772727272
17/22 * 1/5 = 17/110 = 0.154545454
X2 and Y2
3/13 + 3/13 + 2/12 + 1/12 + 1/11 = 6/13 + 2/12 + 2/11 = 695/858 = 0.81002331
695/858 * 1/5 = 695/4290 = 0.162004662
X3 and Y3
2/12 * 5 = 10/12 = 0.833333333
10/12 * 1/5 = 10/60 = 0.1666666666
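A quick way to reproduce these divided figures, again as a sketch with a made-up helper name:

```python
# Sketch: sum n/(n+10) over the games played, then divide by the number
# of different games played between the pair.

def normalized_trust(counts, constant=10):
    return sum(n / (n + constant) for n in counts) / len(counts)

for pair, counts in [("X1 and Y1", [5, 2, 1, 1, 1]),
                     ("X2 and Y2", [3, 3, 2, 1, 1]),
                     ("X3 and Y3", [2, 2, 2, 2, 2])]:
    print(pair, round(normalized_trust(counts), 9))
# X1 and Y1 0.154545455
# X2 and Y2 0.162004662
# X3 and Y3 0.166666667
```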
As before, these values are greater where the diversity is more evenly spread out, which is to say more homogeneous.
However, the number of different games played was fixed at 5 in these examples, and the number of total games played was fixed at 10. Other examples need to be tested.
Consider two players who play 20 different games once each and two others who play 10 different games twice each. Each pair has played 20 games in total.
Scenario 1: 20 different games
(20 * 1/11) / 20 = 20/11 * 1/20 = 1/11
Scenario 2: 10 different games twice
(10 * 2/12)/10 = 20/12 * 1/10 = 2/12 = 1/6
When the same formula is applied to these two scenarios, the 20 different games have no more influence than a single game played once, which is very bad. This would severely limit the ratings of people who play a wide variety of games. So, if diversity of games played is to be factored in, something else will have to be done.
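Running the same normalized sketch on these two pairs makes the problem plain; recall that a single game played once gets 1/(1+10) = 1/11 under the plain n/(n+10) formula.

```python
# The two 20-game pairs above, run through the divide-by-different-games formula.

def normalized_trust(counts, constant=10):
    return sum(n / (n + constant) for n in counts) / len(counts)

print(round(normalized_trust([1] * 20), 9))  # 0.090909091 = 1/11, no better than one game played once
print(round(normalized_trust([2] * 10), 9))  # 0.166666667 = 1/6
```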
The problem is that the importance of diversity is not as clear as the importance of quantity. It is clear that the more games two players have played together, the more likely it is that the outcome of their games is representative of their relative playing abilities. But whether those games are mixed or the same does not bear so clearly on how likely it is that the outcome of the games played reflects their relative playing abilities. With quantity as a single factor, it is easy enough to use a formula that returns a value that gets closer to 1 as the quantity increases. But with two factors, quantity and diversity, it becomes much less clear how they should interact. Furthermore, diversity is not simply about how many different games are played but also about how evenly the diversity is distributed, what I call the homogeneity of diversity. When I think about it, homogeneity of diversity sounds like a paradoxical concept. The X3 and Y3 example has a greater homogeneity of diversity than the other two, but an example where X4 and Y4 play Chess 10 times has an even greater homogeneity of diversity but much less diversity. Because of these complications in measuring diversity, I'm feeling inclined to not factor it in.
The most important part of the GCR method is the use of trial-and-error. Thanks to the self-correcting nature of trial-and-error, the difference that factoring in diversity could make is not going to have a large effect on the final outcome. So, unless someone can think of a better way to include a measure of diversity, it may be best to leave it out.
If left as is, is the current rating system at least somewhat kind to a player who suddenly improves a lot (e.g. through study or practice), but who has already played a lot of games on Game Courier? I'm not so sure, even if said player from then on plays much more often vs. players he hasn't much played against before on GC.
I was thinking that older results (older both in time and in the number of games played since) should maybe fade away. Is that very difficult to implement, Fergus? It seems fairer, but the trouble is that you need many games at the "same time" to make the ratings meaningful, and with the current population that cannot easily be done :)!
The script includes an age filter. If you don't want to include old ratings, you can filter them out.
I was also thinking the results from old games should be less trusted than the results from new games. A recent game is a better indication of a player's ability than a 10-year-old game.
Off-topic, there is much a chess variants player might do to improve in a relatively short period of time (aside from suddenly improved real-life conditions partly or wholly beyond his control, such as recovering from poor health or personal problems, or acquiring more free time than before). Besides any sort of intuition/experience acquired through sustained practice, there's general or specific study he might do on his own, as I alluded to previously. As Joe alluded to, there are many variants that are rather like standard chess, and study and practice of chess probably can only help playing chess variants generally.
Unlike for many chess variants, there is an abundance of chess literature that can help improvement, even at many variants, and hiring a chess teacher, coach, or trainer will probably help those who play chess variants too. A chess trainer can help with any physical fitness regime, which also can help those who play chess variants. Similar avenues for improvement may be available to those who play other major chess variants that have literature of their own, such as Shogi and Chinese Chess, though these two are perhaps less generally applicable for overall improvement at chess variants than chess would be (not sure).
@Fergus,
I think it is not only about "old" in the calendar sense, but also "old" in the many games ago sense.
Also, I think fading away is a nicer way of putting things than cut offs :)!
Since different people play games at different rates, the last n games of each player would not return a single set of games that would work for everyone. A chronological cut-off point works better, because it can be used to select a uniform set of games.
I was envisioning a system where older games are given lesser weight by some formula, down to some minimum. A game should have, say, at least half the weight of a recent game, no matter how old.
I can play around with formulas if there's interest. Beyond the question of age, however, I think the system is good as-is.
I have two problems with discounting the results of older games. One is that the decision concerning what age to start reducing game results is arbitrary. The other is that the result of a game is zero points for a loss and one point for a win. While one point for a win could be reduced to a smaller value, zero points for a loss could not be without introducing negative values. The alternative would then be to reduce the 1 point for a win and increase the zero points for a loss, making the results of the older game more drawish. This would distort the results of a game and possibly result in less accurate ratings. I prefer using the age filter to set a clean cut-off point at which older results just aren't included in the calculation.
I don't see the first one being a problem as they would decrease in significance with age on a smooth curve. The other concern is a problem though. I agree increasing the value of losses up from zero is a bad idea. This would only work if a win was +1 and a loss was -1.
Stupid question: Could you rate wins as +2 and losses as -1, and would that help?
No, I will be sticking to the traditional values of 1 for a win and 0 for a loss.
I don't think there is any problem weighting games with a {0, 1} result. You just add {0, w} to the score, and w to the number of games played. That 0 stays 0 is not a problem; the effect of the weighting comes in through the number of games. E.g. if you won a recent game (with weight 1) and lost an old one (with weight w), your average will be 1/(1+w), which is greater than 0.5 if w < 1, because the old loss is weighted less.
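A minimal sketch of this weighting; the 0.5 used for the old game's weight is only an example value.

```python
# Each game contributes its weight to the game count; a win contributes
# its weight to the score, a loss contributes 0.

def weighted_average(results, weights):
    """results: 1 for a win, 0 for a loss; weights: one weight per game."""
    score = sum(r * w for r, w in zip(results, weights))
    return score / sum(weights)

w = 0.5  # assumed weight of the old game
print(weighted_average([1, 0], [1, w]))  # 1/(1+w) = 0.666..., above 0.5 because the old loss counts less
```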
I think HG's idea could work. It is along the lines of what I was thinking but somewhat better put. There should be a clearer rule, though, but fading away of the results (never to 100%) is a reasonable concept :)!
The scores are used to determine the number of games played between two players, because that is what they add up to.
Well, the weighted scores would add up to the weighted number of games. A recent 1-0 plus an old 0-w adds up to 1+w games, so the player averages will be 1/(1+w) and w/(1+w). When your rating calculation requires the average opponent rating, you will have to weight that average in the same way, of course.
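And the weighted average opponent rating would look like this; the ratings 1600 and 1500 are just example numbers.

```python
# Weighted average of opponent ratings, using the same per-game weights.

def weighted_opponent_average(opponent_ratings, weights):
    return sum(r * w for r, w in zip(opponent_ratings, weights)) / sum(weights)

print(weighted_opponent_average([1600, 1500], [1, 0.5]))  # ~1566.7; the recent opponent counts more
```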
If you were going to just veto the notion by dictatorial decree, you could have said so at the beginning and spared us the pretence of discussion.
Such a ratings system would be more complicated and work differently than the current one. The present system can work for a single game or for a set of games, but when it does work with a set of games, it treats them all as though they were the same game.
Yes, that would be the result. Presently, someone who specializes in a small set of games, such as Francis Fahys, can gain a high general GCR by doing well in those games.
Game Courier ratings are not calculated on a game-by-game basis. For each pair of players, all the games played between them factor into the calculation simultaneously. Also, it is not designed to "award" points to players. It works very differently than Elo, and if you begin with Elo as your model for how a ratings system works, you could get some wrong ideas about how GCR works. GCR works through a trial-and-error method of adjusting ratings between two players to better match the ratings that would accurately predict the outcome of the games played between them. The number of games played between two players affects the size of this adjustment. Given the same outcome, a smaller adjustment is made when they have played few games together, and a larger adjustment is made when they have played several games together.
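As a rough sketch of the role trust plays here: the trust value between 0 and 1 scales a maximum adjustment. The maximum adjustment of 32 below is only an assumed illustrative number, not what the script actually uses.

```python
# Trust scales a maximum adjustment: few games between a pair means a
# small correction, many games means a larger one.
MAX_ADJUSTMENT = 32  # hypothetical value, for illustration only

def trust(games_played, constant=10):
    return games_played / (games_played + constant)

def adjustment(games_played):
    return trust(games_played) * MAX_ADJUSTMENT

print(round(adjustment(1), 2))   # 2.91  (1/11 of the maximum)
print(round(adjustment(10), 2))  # 16.0  (half of the maximum)
```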
Getting back to your suggestion, one thought I'm having is to put greater trust in results that come from playing the same game and to put less trust in results that come from playing different games together. More trust would result in a greater adjustment, while less trust would result in a smaller adjustment. The rationale behind this is that results for the same game are more predictive of relative playing ability, whereas results from different games are more independent of each other. But it is not clear that this would reward playing many variants. If someone played only a few games, the greater adjustments would lead to more extreme scores. This would reward people who do well in the few variants they play, though it would punish people who do poorly in those games. However, if someone played a wide variety of variants, smaller adjustments would keep his rating from rising as fast if he is doing well, and they would keep it from sinking as fast if he is not doing well. So, while this change would not unilaterally reward players of many variants over players of fewer variants, it would decrease the cost of losing in multiple variants.