A Modeling of Game Learning Theory Based on Fairness

By incorporating a fairness factor into the EWA (experience-weighted attraction) learning model, we develop an extended game learning model called the FGL model. We use psychological effect instead of material effect to modify each strategy's payoff and attraction, and to study further how the equilibrium moves in dynamic games. Participants' fair thinking, in turn, changes their psychological effect functions. By simulating decision-making in the ultimatum game and comparing the results with those of the EWA learning model, we find that the FGL model converges to the equilibrium strategy faster.


Introduction
Traditional economic theories rest on the rationality hypothesis, which assumes that people pursue only their own material self-interest. However, many famous economists, such as Simon (1955), Arrow (1981), Samuelson (1993) and Sen (1995), believed that in reality people are boundedly rational rather than perfectly rational. The hypothesis that economic man is self-interested has been challenged since the 1980s. Many experimental economists have shown that participants are altruistic and exhibit strategy learning and fair thinking in games, which contradicts the behavior predicted by standard game theory in many different game experiments (Guth, Schmittberger and Schwarze (1982); Forsythe, Joel, Savin and Martin (1994); Camerer, Thaler, Richard (1995); Roth (1995); Fehr, Alexander and Schmidt (2007); Ernst Fehr, Jean-Robert Tyran (2008); Qingquan He and Yulei Rao (2009)). Explaining these phenomena is the main focus of experimental economics and related economic theories. The theoretical approaches used to explain these anomalies are mainly learning models and fairness models.
Learning models assume that people are boundedly rational, but they treat material payoffs as utilities. There are three main learning models: the belief learning model (Brown (1951); Milgrom and Roberts (1991); Hon-Snir, Monderer and Sela (1998); Sela (2000); Berger (2005)), the reinforcement learning model (Gale, Binmore and Samuelson (1995); Roth and Erev (1995)) and the EWA learning model (Camerer and Ho (1999)). Crawford (1995) argued that players engage in belief learning in games: they take account of previous behavior by other players or themselves to update their beliefs about what others will do in the future, and then choose a best-response strategy to maximize their expected payoffs. However, depending on the payoffs of past strategies, some players may instead repeat successful strategies and abandon failed ones. That is reinforcement learning (Note 1). The experience-weighted attraction learning model (hereafter EWA learning), designed by Camerer and Ho (1999), combines the most appealing elements of the reinforcement and belief learning models. Learning models explain game players' behavior better, but the learning process they predict is slow, so strategies do not converge to equilibrium quickly.
Rabin, who developed the fairness model, pointed out that a player's utility is not simply equal to material payoff but also depends on others' payoffs (Rabin (1993)). That is, people often respond to the intentions behind others' behavior: they are willing to return kindness to those they think are kind and to retaliate against those they think are unkind, regardless of the cost. Rabin defined a "kindness function" in his model to measure others' kindness or behavioral intention. However, his model may predict multiple equilibria, for example when material payoffs are small and psychological payoffs are relatively more important. When a fair equilibrium and an unfair equilibrium both exist and both are self-fulfilling, the model cannot predict which one will eventually emerge.
Players exhibit not only learning behavior but also fairness thinking in games. Because learning models do not take players' fairness thinking into account, they predict a slow learning process, so strategies do not converge to equilibrium quickly. This paper attempts to incorporate a fairness factor into the learning model to form a game learning model based on fairness (abbreviated as the FGL model). Our purpose is to improve the accuracy of the predicted equilibrium, to speed up the slow learning process, and to address problems such as imperfect learning effects.

The Game Learning Model Based on Fairness (FGL Model)
We incorporate the fairness factor into the EWA learning model to form the FGL model. In the EWA learning model, every strategy has an "attraction" (Note 2), which determines the choice probability of that strategy.
The attraction $A_i^k(t)$ of player i's strategy k and the experience weight $N(t)$ (Note 3) vary with time t; that is, both are updated after every period. The EWA updating equations from Camerer and Ho [21]-[23] are:

$$N(t) = \rho N(t-1) + 1$$

$$A_i^k(t) = \frac{\phi N(t-1) A_i^k(t-1) + [\delta + (1-\delta)\, I(s_i^k, s_i(t))]\, \pi_i(s_i^k, s_{-i}(t))}{N(t)} \qquad (1)$$

where $s_i^k$ is strategy k of player i, $s_i(t)$ is the strategy player i actually chose in period t, and $\pi_i(s_i^k, s_{-i}(t))$ is the payoff of strategy k against the other player's choice in period t. The parameter φ reflects the decay of previous attractions owing to forgetting or deliberate shedding of old experience when the learning environment is changing; φ lies between zero and one.

The function $I(s_i^k, s_i(t))$ is an indicator function that equals 1 or 0 under different conditions, as follows:

$$I(s_i^k, s_i(t)) = \begin{cases} 1, & s_i^k = s_i(t) \\ 0, & s_i^k \neq s_i(t) \end{cases} \qquad (2)$$

A weight of one is put on the payoff term when the strategy being reinforced is the one the player chose ($s_i^k = s_i(t)$), but the weight on forgone payoffs from the other strategies ($s_i^k \neq s_i(t)$) is δ. The parameter δ is the weight placed on forgone payoffs, which is presumably affected by imagination and by the reliability of information about forgone payoffs, where 0 < δ < 1. The parameter ρ controls the rate at which attractions grow. It also captures how different models influence the growth of attractions in the process of game learning.
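The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard Camerer and Ho form of the rule (experience weight N(t) = ρN(t-1) + 1, weight 1 on the payoff of the chosen strategy and weight δ on forgone payoffs). The function name `ewa_update` and the toy payoffs in the usage lines are hypothetical and are not taken from the paper.

```python
def ewa_update(attractions, n_prev, chosen, payoffs, phi, delta, rho):
    """One EWA step for a single player (sketch, assumed Camerer-Ho form).

    attractions: attractions A^k(t-1), one per strategy
    n_prev:      experience weight N(t-1)
    chosen:      index of the strategy actually played in period t
    payoffs:     payoff each strategy would have earned in period t
    Returns (new_attractions, n_new).
    """
    n_new = rho * n_prev + 1.0
    new_attractions = []
    for k, (a_prev, payoff) in enumerate(zip(attractions, payoffs)):
        # Weight 1 on the chosen strategy, delta on forgone strategies.
        weight = 1.0 if k == chosen else delta
        new_attractions.append((phi * n_prev * a_prev + weight * payoff) / n_new)
    return new_attractions, n_new


# Toy usage with hypothetical payoffs and the parameter values used later in the text.
new_a, n1 = ewa_update([0.0, 0.0], 1.0, chosen=0, payoffs=[10.0, 20.0],
                       phi=0.97, delta=0.36, rho=0.82)
print(new_a, n1)
```

In this toy call the forgone strategy has the larger payoff, but it is discounted by δ, so the chosen strategy still ends up with the larger attraction.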
Attractions are mapped into choice probabilities using the logit response function of Camerer and Ho. The choice probability of player i's strategy k in period t+1 is:

$$P_i^k(t+1) = \frac{e^{\lambda A_i^k(t)}}{\sum_{l} e^{\lambda A_i^l(t)}} \qquad (3)$$

where λ is the response sensitivity.
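As a rough sketch of the logit response, the function below turns a list of attractions into choice probabilities. The name `logit_probabilities` is hypothetical, and subtracting the largest attraction before exponentiating is a numerical-stability choice that leaves the probabilities unchanged; it is not something the paper specifies.

```python
import math


def logit_probabilities(attractions, lam):
    """Logit choice probabilities for a list of attractions (sketch).

    lam is the response sensitivity: lam = 0 gives uniform choice,
    larger values put more weight on the most attractive strategy.
    """
    top = max(attractions)
    # Shifting by the maximum avoids overflow and cancels in the ratio.
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Toy usage with hypothetical attractions.
print(logit_probabilities([0.0, 1.0], lam=1.0))
```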
This learning model thus takes into account both past experience and players' beliefs, which affect game behavior. Material payoffs, however, remain the same throughout the game. In fact, people's beliefs change as the game proceeds, and the payoffs of strategies change with them. Moreover, we believe that game players have fairness thinking: they adjust their strategies according to whether their opponents' behavior is fair. If they form a bad belief about their opponents, they may choose retaliatory strategies, even at a cost to themselves. If their opponents are kind, however, they may, as in "gift exchange", choose strategies that benefit both players or even benefit their opponents more.
We therefore replace the material effect $\pi_i(a_i, b_j)$ with $U_i(a_i, b_j, c_i)$ (Note 5) to measure the changes in psychological effect brought about by changes in beliefs about game opponents. Denote player i's strategy by $a_i$, the strategy player i believes player j is choosing by $b_j$, and player i's belief about what strategy player j believes i is choosing by $c_i$. Then the fairness effect function is:

$$U_i(a_i, b_j, c_i) = \pi_i(a_i, b_j) + \tilde{f}_j(b_j, c_i)\,[1 + f_i(a_i, b_j)] \qquad (4)$$

where $\pi_i(a_i, b_j)$ is the material effect when the players choose strategies $(a_i, b_j)$; $f_i(a_i, b_j)$ measures how kind and generous player i is to player j; $\tilde{f}_j(b_j, c_i)$ measures how kind player i believes player j is to him; and $\tilde{f}_j(b_j, c_i)\,[1 + f_i(a_i, b_j)]$ represents the psychological effect brought about by fair motivation.
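The reciprocity logic of the fairness effect function can be illustrated with a small sketch. It assumes the function takes the form of Rabin's (1993) utility, that is, material payoff plus believed kindness of the opponent times one plus one's own kindness, and it takes the two kindness values as given inputs rather than computing them. In Rabin's model kindness values lie between -1 and 1/2. The function name and all numbers are hypothetical.

```python
def fairness_payoff(material, own_kindness, believed_kindness):
    """Rabin-style payoff: material payoff plus a psychological term (sketch).

    own_kindness:      how kind player i is to player j
    believed_kindness: how kind player i believes player j is to him
    """
    return material + believed_kindness * (1.0 + own_kindness)


# Hypothetical case: the material payoff is held fixed at 10 in all four calls.
kind_to_kind = fairness_payoff(10.0, 0.5, 0.5)        # 10 + 0.5 * 1.5
unkind_to_kind = fairness_payoff(10.0, -1.0, 0.5)     # 10 + 0.5 * 0.0
kind_to_unkind = fairness_payoff(10.0, 0.5, -1.0)     # 10 - 1.0 * 1.5
unkind_to_unkind = fairness_payoff(10.0, -1.0, -1.0)  # 10 - 1.0 * 0.0
print(kind_to_kind, unkind_to_kind, kind_to_unkind, unkind_to_unkind)
```

With the material payoff held fixed, kindness pays against an opponent believed to be kind and retaliation pays against one believed to be unkind, which is the behavior the preceding paragraphs describe.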
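Then the strategy attraction (Note 6) of the FGL model is:

$$A_i^k(t) = \frac{\phi N(t-1) A_i^k(t-1) + [\delta + (1-\delta)\, I(s_i^k, s_i(t))]\, U_i(s_i^k, b_j(t), c_i(t))}{N(t)} \qquad (5)$$

Here, $U_i(\cdot)$ is the fairness effect function of equation (4), which takes the place of the material payoff in equation (1). According to our model, we can calculate the attraction of each strategy in every period. These attractions can then be transformed into choice probabilities according to equation (3).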

Comparison of Simulations of the Ultimatum Game
To verify the convergence and predictive ability of the FGL model, we run a computer simulation of the ultimatum game and compare our results with those simulated by the EWA learning model.
The ultimatum game is a game about dividing some amount of money or goods. There are two participants: a proposer and a responder. First, the proposer makes a take-it-or-leave-it offer (1-x, x), where x is the share for the responder and (1-x) is the share the proposer keeps. The responder then responds to the offer. If he accepts the division, both players earn the specified amounts and the game is over. If he rejects, both get nothing and the game is also over.
Suppose 100 yuan is to be divided between the two participants in the ultimatum game. The proposer's strategy set consists of ten strategies, each a range for the proportion offered to the responder, {0-10%, 11-20%, 21-30%, ..., 91-100%}, denoted by {S1, S2, ..., S10}. The responder's strategy set is {accept, refuse}. Suppose offers within each range are uniformly distributed; then the proposer's average payoffs for the ten strategies are {95, 85, 75, 65, 55, 45, 35, 25, 15, 5}, respectively. When the proportion offered to the responder is comparatively low, the responder rejects most of the time; otherwise, offers are rarely rejected. Using the experimental results of Camerer, the proposer's expected payoff for every strategy is listed in Tab. 1.
As Tab. 1 shows, strategies with low offers, such as S1 and S2, have higher average payoffs, but the proposer's expected payoffs are quite low because of the high rejection rate. The rejection rate falls as the proportion offered rises, which increases the expected payoff. The expected payoff reaches its maximum at S5 and then decreases, so S5 is the optimal strategy in terms of expected payoff. At first, however, people may not choose the optimal strategy because of the temptation of high average payoffs. Through learning and fair thinking they gradually adjust their strategies, reducing the choice probability of S1 or S2 and increasing that of a fair strategy such as S5. S5 thus becomes the equilibrium strategy of the game.
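The relation between average and expected payoffs can be made concrete with a short sketch based on the rule stated under Table 1: expected payoff = average payoff × (1 − rejection proportion). The average payoffs are those given in the text. The rejection rates are hypothetical placeholders chosen only to reproduce the qualitative pattern described above; they are not the experimental values from Tab. 1.

```python
# Average payoffs of S1..S10 as given in the text.
average_payoffs = [95, 85, 75, 65, 55, 45, 35, 25, 15, 5]

# HYPOTHETICAL rejection rates, high for low offers and near zero for
# generous ones. These are NOT the experimental values behind Tab. 1.
rejection_rates = [0.90, 0.80, 0.60, 0.40, 0.10, 0.05, 0.02, 0.00, 0.00, 0.00]

expected_payoffs = [
    avg * (1.0 - reject) for avg, reject in zip(average_payoffs, rejection_rates)
]

# Index of the strategy with the highest expected payoff (0-based).
best = max(range(len(expected_payoffs)), key=expected_payoffs.__getitem__)
print(f"highest expected payoff: S{best + 1}")
```

Under these placeholder rates the low offers S1 and S2 have the highest average payoffs but low expected payoffs, and the maximum falls at S5, matching the pattern the text attributes to Tab. 1.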
The parameters δ, ρ and φ all take values between 0 and 1, but their values depend on the actual circumstances. When the players change, the parameters change as well. According to Camerer, δ is the weight placed on forgone payoffs. In the ultimatum game, the opportunity payoff of a forgone strategy is low because the strategy's expected payoff is low, so we suppose δ = 0.36. ρ controls the rate at which attractions grow and reflects the learning ability of participants; it generally lies between 0.65 and 0.85, and here we suppose ρ = 0.82. φ reflects the decay of previous attractions, which is generally slow, so we suppose φ = 0.97. N(0) is the initial experience weight, and we suppose N(0) = 1. We calculate the updated attraction of each strategy under the EWA learning model and the FGL model in period 2 (t = 2). Under the EWA model, using the initial values and parameter values above and taking the expected payoffs in Tab. 1 as the material payoffs, we calculate the attractions and choice probabilities of the strategies by equation (1); see Tab. 2. Iterating the update gives the choice probabilities of the strategies over ten periods; see Fig. 1.
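The iterative calculation can be sketched as a short simulation loop. The values of δ, ρ, φ and N(0) are those supposed in the text; everything else is an assumption made for illustration: the response sensitivity λ = 0.1, initial attractions equal to the average payoffs, expected payoffs that are placeholders rather than the Tab. 1 values, and a chosen strategy sampled from the current choice probabilities with a fixed seed. The output is therefore not a reproduction of Tab. 2 or Fig. 1.

```python
import math
import random

PHI, DELTA, RHO, N0 = 0.97, 0.36, 0.82, 1.0  # values supposed in the text
LAMBDA = 0.1                                 # ASSUMPTION: not given in the text
# ASSUMPTION: placeholder expected payoffs for S1..S10, not the Tab. 1 values.
PAYOFFS = [9.5, 17.0, 30.0, 39.0, 49.5, 42.75, 34.3, 25.0, 15.0, 5.0]
# ASSUMPTION: initial attractions equal the average payoffs of S1..S10.
INITIAL = [95.0, 85.0, 75.0, 65.0, 55.0, 45.0, 35.0, 25.0, 15.0, 5.0]


def choice_probabilities(attractions, lam):
    """Logit response, shifted by the maximum for numerical stability."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_ewa(periods=10, seed=0):
    """Return the choice probabilities after each of `periods` EWA updates."""
    rng = random.Random(seed)
    attractions, n = list(INITIAL), N0
    history = []
    for _ in range(periods):
        probs = choice_probabilities(attractions, LAMBDA)
        # Sample the strategy actually played this period.
        chosen = rng.choices(range(len(probs)), weights=probs)[0]
        n_new = RHO * n + 1.0
        attractions = [
            (PHI * n * a + (1.0 if k == chosen else DELTA) * PAYOFFS[k]) / n_new
            for k, a in enumerate(attractions)
        ]
        n = n_new
        history.append(choice_probabilities(attractions, LAMBDA))
    return history


history = simulate_ewa()
for period, probs in enumerate(history, start=1):
    print(period, round(probs[0], 3), round(probs[4], 3))  # P(S1), P(S5)
```

Replacing `PAYOFFS` with fair payoffs would give the corresponding sketch for the FGL model, since only the payoff term of the update changes.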
The simulation under our model proceeds as follows. First, we calculate each strategy's fair payoff by equation (4) from the expected payoffs in Tab. 1.
Using the fair payoffs instead of the expected payoffs, we update each strategy's attraction by equation (5) and then calculate its choice probability by equation (3); see Tab. 4. Iterating the update gives the choice probabilities of the strategies in the next nine periods; see Fig. 2.
Fig. 1 and Fig. 2 show that both models capture the decrease in the choice probability of S1 and the increase in that of S5 as the game proceeds. Fig. 1 shows that, under the EWA learning model, the choice probability of S5 first exceeds that of S1 in period 6. Fig. 2 shows that, under the FGL model, the choice probability of S5 first exceeds that of S1 in period 4, and the proposer has no further tendency to change strategy from then on; the choice probability of this strategy keeps rising. The results show that both models can simulate the process by which strategy choice changes, but the FGL model converges to the equilibrium strategy S5 earlier than the EWA learning model, which suggests that the FGL model has stronger predictive ability.
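The comparison criterion used here, the first period in which the choice probability of S5 exceeds that of S1, is easy to state in code. The helper `first_crossover` is a hypothetical name, and the two probability series are made-up illustrative numbers, not the data behind Fig. 1 or Fig. 2.

```python
def first_crossover(target, reference):
    """Return the 1-based period in which `target` first exceeds `reference`.

    Returns None if that never happens within the series.
    """
    for period, (p_target, p_reference) in enumerate(zip(target, reference), start=1):
        if p_target > p_reference:
            return period
    return None


# HYPOTHETICAL choice probabilities per period, for illustration only.
p_s1 = [0.30, 0.27, 0.24, 0.20, 0.17, 0.14]
p_s5 = [0.10, 0.14, 0.19, 0.23, 0.27, 0.30]
print(first_crossover(p_s5, p_s1))
```

Applying the same helper to the probability series of each model gives a simple way to compare how early each one shifts toward S5.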

Conclusion
Because participants exhibit both learning and fair thinking in repeated games, putting the fairness factor into the learning model can predict the equilibrium more accurately and address the problem that players learn slowly and inefficiently in learning models. The main conclusions of this paper are as follows:
(1) By putting the fairness factor into the EWA model and replacing material payoffs with the psychological effect function, the FGL model revises strategies' payoffs and attractions.
(2) Comparing the EWA model with the FGL model through simulation of the ultimatum game shows that the latter predicts the choice probability of each strategy and the equilibrium more accurately.
(3) Comparing the FGL model with the EWA model against the experimental results shows that both models can capture players' decision processes accurately. The simulated equilibrium converges to strategy S4 according to experiment 1 and to S5 according to experiment 2. However, the decision process simulated by our model is closer to players' actual process, so the FGL model has stronger explanatory and predictive power than the EWA learning model.

Notes
Note 1. See Gale, Binmore and Samuelson (reference [16]) and Roth and Erev (reference [16]).
Note 2. Attractions represent the degree to which strategies attract players to choose them; they are positively correlated with the choice probabilities of strategies.
Note 3. See Camerer, references [22] and [23]. The latter is adopted in our paper.
Note 4. Because the fairness model can be applied only to two-player games, whereas the EWA learning model can be applied to multiplayer games (n >= 2), we consider only n = 2 to keep the two models consistent; that is, there are only players i and j.
Note 5. See [20]: Rabin, Matthew (1993). Incorporating Fairness into Game Theory and Economics. American Economic Review, 83.
Note 6. Because Rabin's fairness model applies to two players only, we use the EWA learning model with n = 2 to construct the fairness equilibrium learning model.

Table 1. The strategies and their corresponding expected payoffs
belief about what strategy player i believes player j believes what strategy he is choosing.Other symbols have the same meaning as above.
a: Data come from the experiment of Camerer in 2003; b: Expected payoff = average payoff × (1 − rejection proportion of responders)

Figure 1. Strategies of proposers simulated by the EWA learning model in the ultimatum game

Figure 2. Strategies of proposers simulated by the FGL model in the ultimatum game

Table 2. Attractions and choice probabilities of strategies of the EWA model in period one

Table 3. The fair payoffs of strategies of the proposer

Table 4. Attractions and choice probabilities of strategies of the FGL model in period one