# Stochastic Methods in Game Theory
(16 Nov - 25 Dec 2015)
## ~ Abstracts ~
### Jacob Abernethy, University of Michigan, USA

A well-studied deterministic algorithmic technique for convex optimization is the so-called "interior point method" of Nesterov and Nemirovski, which involves taking a sequence of Newton steps along the "central path" towards the optimum. An alternative randomized method, known as simulated annealing, involves performing a random walk around the set while "cooling" the stationary distribution towards the optimum. We will show that these two methods are, in a certain sense, fully equivalent: both techniques can be viewed as different types of path following. This equivalence allows us to get an improved state-of-the-art rate for simulated annealing, and provides a new understanding of random walk methods via barrier functions.
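To make the random-walk side of the comparison concrete, here is a minimal one-dimensional Metropolis-style simulated annealing sketch. It is only an illustration: the Gaussian proposal, the 1/t cooling schedule, and the quadratic objective used below are my own assumptions, not the walk or schedule analyzed in the talk.

```python
import math
import random

def simulated_annealing(f, x0, steps=20000, step_size=0.1, seed=0):
    """Toy Metropolis-style simulated annealing for a 1-D objective f."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for t in range(1, steps + 1):
        temp = 1.0 / t                      # simple cooling schedule (assumed)
        y = x + rng.gauss(0.0, step_size)   # random-walk proposal
        fy = f(y)
        # accept downhill moves always, uphill moves with Boltzmann probability
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest
```

As the temperature drops, uphill moves are accepted ever more rarely, so the stationary distribution concentrates near the minimizer — the "cooling towards the optimum" described above.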
### Shivani Agarwal, Indian Institute of Science, India & Harvard University, USA

I will talk about two recent themes in our work where there are interesting connections between machine learning and game theory/economics/operations research. The first theme is surrogate risk minimization in supervised learning, where we show that calibrated surrogate losses can essentially be viewed as strictly proper scoring rules for eliciting certain properties of the conditional label distribution. The second theme is ranking from pairwise comparisons, where we examine the conditions under which ranking algorithms proposed in different fields (such as spectral ranking, least squares ranking, Borda ranking, etc) perform well -- we find there is considerable difference in these conditions across different algorithms, and propose alternative ranking algorithms that perform well under broader conditions.
### Yossi Azar, Tel Aviv University, Israel

We present a correlated rounding technique for designing mechanisms that are truthful in expectation. It is elementary and can be implemented quickly. The main property we rely on is that the domain offers fractional optimum solutions with a tree structure. In particular we discuss restricted-related scheduling with selfish machines, where each job comes with a public weight, and it must be assigned to a machine from a public job-specific subset. Each machine has a private speed and strives to maximize payments minus workload of jobs assigned to it. Here we design a mechanism for makespan minimization. This is a single-parameter domain, but the approximation status of the optimization problem is similar to unrelated scheduling: The best known algorithm obtains a (non-truthful) 2-approximation for unrelated machines, and there is 1.5-hardness. Our mechanism matches this bound with a truthful 2-approximation. Joint work with M. Hoefer, I. Maor, R. Reiffenhäuser and B. Vöcking.
### Peter Bartlett, University of California at Berkeley, USA

We consider a linear regression game in which the covariates are known in advance: at each round, the learner predicts a real value, the adversary reveals a label, and the learner incurs a squared error loss. The aim is to minimize the difference between the cumulative loss and that of the linear predictor that is best in hindsight. For a variety of constraints on the adversary's labels, we show that the minimax optimal strategy is linear, with a parameter choice that is reminiscent of ordinary least squares, is easy to compute, and does not require knowledge of the constraint set. We obtain an explicit expression for the minimax regret. We also consider the case of adversarial design, and exhibit constraint sets of covariate sequences for which the same strategy is minimax optimal. Joint work with Wouter Koolen, Alan Malek, Eiji Takimoto and Manfred Warmuth.
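As a toy illustration of the "predict with a regularized least-squares fit" flavor of such strategies, here is a one-dimensional online ridge forecaster and its regret against the best fixed slope in hindsight. The forecaster and the regularization parameter `lam` are illustrative assumptions of mine; the talk's minimax strategy is related but not identical.

```python
def online_ridge_1d(data, lam=1.0):
    """Predict each label with the ridge fit of all past (x, y) pairs,
    then compare the cumulative loss to the best fixed slope in hindsight."""
    sx2, sxy, loss = lam, 0.0, 0.0
    xs, ys = [], []
    for x, y in data:
        pred = (sxy / sx2) * x            # current regularized estimate times covariate
        loss += (pred - y) ** 2
        sx2 += x * x
        sxy += x * y
        xs.append(x)
        ys.append(y)
    # best fixed linear predictor in hindsight (unregularized least squares)
    w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    best = sum((w_star * x - y) ** 2 for x, y in zip(xs, ys))
    return loss, loss - best
```

On a perfectly linear label sequence the regret stays bounded even as the number of rounds grows, since the estimate converges to the true slope.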
### Mario Bravo, Universidad de Santiago de Chile, Chile

We study a class of game-theoretic learning dynamics in the presence of random payoff disturbances and observation noise, and we provide a unified framework that extends several rationality properties of the (stochastic) replicator dynamics and other game dynamics. In the multi-player case, we find that dominated strategies become extinct almost surely and strict Nash equilibria remain stochastically asymptotically stable, independently of the perturbations' magnitude. Also, we establish an averaging principle for 2-player games and we show that the empirical distribution of play converges to Nash equilibrium in zero-sum games under any noise level. In the unilateral case, we show that the stochastic dynamics under study lead to no regret, again irrespective of the noise level. Joint work with P. Mertikopoulos.
### Sébastien Bubeck, Microsoft Research, USA

I will prove that the minimax regret for bandit convex optimization is O(poly(n) sqrt(T)), thus resolving a decade-old open problem. The proof is based on an information theoretic analysis of Bayesian bandit problems, as well as a new understanding of convex functions. Joint work with Ronen Eldan.
### Ozan Candogan, The University of Chicago, USA

In this work, we introduce a novel flow representation for finite games in strategic form. This representation allows us to develop a canonical direct sum decomposition of an arbitrary game into three components, which we refer to as the potential, harmonic and nonstrategic components. We analyze natural classes of games that are induced by this decomposition, and in particular, focus on games with no harmonic component and games with no potential component. We show that the first class corresponds to the well-known potential games. We refer to the second class of games as harmonic games, and study the structural and equilibrium properties of this new class of games. Intuitively, the potential component of a game captures interactions that can equivalently be represented as a common interest game, while the harmonic part represents the conflicts between the interests of the players. We make this intuition precise, by studying the properties of these two classes, and show that indeed they have quite distinct and remarkable characteristics. For instance, while finite potential games always have pure Nash equilibria, harmonic games generically never do. Moreover, we show that the nonstrategic component does not affect the equilibria of a game, but plays a fundamental role in their efficiency properties, thus decoupling the location of equilibria and their payoff-related properties. Exploiting the properties of the decomposition framework, we obtain explicit expressions for the projections of games onto the subspaces of potential and harmonic games. This enables an extension of the properties of potential and harmonic games to "nearby" games. We exemplify this point by showing that the set of approximate equilibria of an arbitrary game can be characterized through the equilibria of its projection onto the set of potential games.
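The defining property of the potential component can be checked directly: a finite two-player game admits an exact potential if and only if payoff changes sum to zero around every unilateral-deviation four-cycle (Monderer and Shapley's condition). A small sketch, using the Prisoner's Dilemma (a potential game) and Matching Pennies (purely harmonic, with no pure equilibrium) as illustrative test cases:

```python
import itertools

def is_exact_potential_2p(A, B):
    """Four-cycle test for an exact potential in a two-player game.

    A[i][j] and B[i][j] are the payoffs of players 1 and 2. An exact
    potential exists iff, around every cycle
    (i,j) -> (i',j) -> (i',j') -> (i,j') -> (i,j), the payoff changes of
    the deviating players sum to zero.
    """
    m, n = len(A), len(A[0])
    for i, i2 in itertools.product(range(m), repeat=2):
        for j, j2 in itertools.product(range(n), repeat=2):
            cycle = (A[i2][j] - A[i][j]) + (B[i2][j2] - B[i2][j]) \
                  + (A[i][j2] - A[i2][j2]) + (B[i][j] - B[i][j2])
            if cycle != 0:
                return False
    return True
```

For Matching Pennies every four-cycle sum is nonzero, which is exactly the "pure conflict" that the harmonic component isolates.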
### Nicolo Cesa-Bianchi, Università degli Studi di Milano, Italy

Online learning can be formulated as a repeated game between a player and an environment. On each round, the player chooses one of K actions and incurs a loss, where the loss associated with each action on each round is determined by the environment. After choosing an action, the player gets a feedback. We study a class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph can be used to characterize the minimax regret of the associated learning problem. In particular, we prove upper bounds that hold for adversarial environments and lower bounds for stochastic environments. Our results subsume virtually all previous work on learning with feedback graphs and reveal new connections to partial monitoring games. Joint work with Noga Alon (Tel-Aviv), Ofer Dekel (Microsoft), Tomer Koren (Technion).
### Krishnendu Chatterjee, Institute of Science and Technology, Austria

In this talk we will present several strategy and computational complexity results for various subclasses of stochastic games. We will also mention several major open problems related to the computational study of stochastic games.
### Kamalika Chaudhuri, University of California at San Diego, USA

An active learner is given a hypothesis class, a large set of unlabeled examples and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a hypothesis in the class that fits the data well by making as few label queries as possible. The main challenge in designing active learning algorithms is avoiding sampling bias and maintaining statistical consistency in the agnostic setting, that is, when there is no perfect hypothesis in the class that fits the data. In this talk, I will first present a novel active learning scheme which is statistically consistent in the agnostic setting, fully general, and has lower label complexity than previous work. We will see how this algorithm follows from a novel reduction between active learning and confidence-rated prediction, that is, classifiers that can say "I don't know". Next, I will present some of our recent work on active learning when labels are obtained from multiple annotators with slightly different labeling patterns. The main challenge is again maintaining statistical consistency; we will show how to address this challenge by cost-sensitive learning of the region of difference between the annotators. This talk is based on joint work with Chicheng Zhang.
### Roberto Cominetti, Universidad de Chile, Chile

In this tutorial we overview network congestion games, starting from the basics and proceeding to more recent research topics. We begin by studying the classic atomic model and present results on existence and uniqueness of equilibria based on Rosenthal's potential. We then discuss analogous results for the concept of Wardrop equilibrium in non-atomic network routing games. More advanced topics include the more sophisticated atomic splittable model as well as the notion of stochastic user equilibrium. Along the way we will review results concerning computation of equilibria, their efficiency, and some adaptive dynamics that converge to equilibrium. Joint work with Jose Correa.
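The central tool of the atomic model can be sketched in a few lines: Rosenthal's potential Φ(s) = Σ_r Σ_{k=1}^{load_r} c_r(k) strictly decreases under every unilateral improving move, so best-response dynamics terminates at a pure Nash equilibrium. A minimal sketch (the two-parallel-link example used in the test is my own):

```python
def rosenthal_potential(strategies, costs):
    """Rosenthal's potential: sum over resources r of c_r(1)+...+c_r(load_r)."""
    load = {}
    for s in strategies:
        for r in s:
            load[r] = load.get(r, 0) + 1
    return sum(sum(costs[r](k) for k in range(1, load[r] + 1)) for r in load)

def best_response_dynamics(strategies, strategy_sets, costs, max_rounds=100):
    """Iterate unilateral best responses; each improving move decreases the
    potential, so the process stops at a pure Nash equilibrium."""
    strategies = list(strategies)
    n = len(strategies)
    for _ in range(max_rounds):
        improved = False
        for i in range(n):
            def player_cost(s):
                load = {}
                for j, sj in enumerate(strategies):
                    for r in (s if j == i else sj):
                        load[r] = load.get(r, 0) + 1
                return sum(costs[r](load[r]) for r in s)
            best = min(strategy_sets[i], key=player_cost)
            if player_cost(best) < player_cost(strategies[i]):
                strategies[i] = best
                improved = True
        if not improved:
            return strategies
    return strategies
```

With two players and two parallel links of cost c(x) = x, the dynamics splits the players across the links, the unique equilibrium configuration.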
### Costas Courcoubetis, Singapore University of Technology and Design

Crowdsensing requests mobile device users to sense useful information while they travel, and incentive mechanisms are crucial for inducing heterogeneous users to perform tasks that are more valuable for the community. In this paper we combine crowdsensing with routing, where a unit mass of nonatomic selfish users decide their trips in a non-cooperative game in choosing between a high-cost and a low-cost path. The information is aggregated over all such trips and made publicly available to benefit all users. Due to information overlap in sensing, the total useful content amount increases with the diversity in path choices made by the users. To remedy the inefficient low content equilibrium where all users choose the low-cost path, we propose and analyse two kinds of incentive mechanisms, one based on side payments and the other on content restrictions. Under asymmetric information about user types, both mechanisms efficiently penalise the participants that use the low-cost path and reward the participants that take the high-cost path. They lead to greater path diversity and hence to more total content at the cost of user participation reduction or content destruction, and we show that user diversity can have opposite effects on the two mechanisms. Finally, we propose a combined mechanism to further increase the social welfare. Joint work with Yunpeng Li and Lingjie Duan.
### Marco Cuturi, Kyoto University, Japan

Optimal transport distances (a.k.a Wasserstein distances or Earth Mover's distances, EMD) define a geometry for empirical measures supported on a metric space. After reviewing the basics of the optimal transport problem, I will show how an adequate regularization of that problem can result in substantially faster computations. I will then show how this regularization can enable several applications of optimal transport to learn from probability measures, from the computation of barycenters to that of dictionaries or PCA, all carried out using the Wasserstein geometry.
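The regularization referred to here is, in its best-known form, an entropic penalty that turns the linear transport program into a strictly convex problem solvable by Sinkhorn's matrix-scaling iterations. A bare-bones sketch (the histograms, cost matrix, and regularization strength in the example are illustrative choices):

```python
import math

def sinkhorn(a, b, C, reg=0.1, iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    a, b: source/target histograms (lists summing to 1); C: cost matrix.
    Returns the regularized transport plan as a nested list.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: elementwise exponential of the negated, scaled costs
    K = [[math.exp(-C[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # alternately rescale rows and columns to match the target marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Each iteration costs only a few matrix-vector products, which is the source of the speed-up over exact linear-programming solvers for the unregularized problem.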
### Jim Dai, Cornell University, USA

Diffusion models have been used for steady-state analysis of many stochastic systems. A recent paper (Braverman and Dai 2015) demonstrates that Stein's method is a natural tool to establish error bounds for steady-state diffusion approximations. It turns out that the method also serves as a practical engineering tool for building robust diffusion models that are accurate in a number of parameter regimes. I will illustrate these advances through a series of examples.
### Angelo Fanelli, Centre National de la Recherche Scientifique, France

Among other solution concepts, the notion of the pure Nash equilibrium plays a central role in Game Theory. Pure Nash equilibria in a game characterize situations with non-cooperative deterministic players in which no player has any incentive to unilaterally deviate from the current situation in order to achieve a higher payoff. Unfortunately, it is well known that there are games that do not have pure Nash equilibria. Furthermore, even in games where the existence of equilibria is guaranteed, their computation can be a computationally hard task. Inspired by such negative results, the concept of an approximate pure Nash equilibrium has recently been developed. Approximate pure Nash equilibria, which characterize situations where no player can significantly improve her payoff by unilaterally deviating from her current strategy, could serve as alternative solution concepts provided that they exist and can be computed efficiently. In this talk, we present the most recent results on approximate pure Nash equilibria in congestion games.
### Mathieu Faure, GREQAM - Centre de la vieille Charité, France

In this talk we consider a game repeatedly played on a given network. We assume that the payoff of a given player only depends on his own action and the aggregate action of his neighbors in the network. Our objective is to understand whether agents are able to learn how to play Nash equilibrium when they have very little information about the structure of the game, and when they follow some simple behaviors. We focus on games with strategic substitutes (e.g. public good games), where the set of Nash equilibria is in general more complicated, and therefore raises more issues.
### Vitaly Feldman, IBM, USA

Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. Yet the practice of data analysis is an intrinsically interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. We demonstrate new approaches for addressing the challenges of adaptivity that are based on techniques developed in privacy-preserving data analysis. As an application of our techniques we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced adaptively by a learning algorithm operating on a training set.
### Gaëtan Fournier, Tel Aviv University, Israel

In pure Hotelling games, a finite number of players select their locations on a network. These games do not satisfy the properties of congestion games. However, there are strong analogies between these two classes of games when the number of players is large. For example, when the set of possible locations is finite and no location remains empty, the game is basically a congestion game: indeed the payoff of a player only depends on the number of players at his own location. We give a brief survey of pure Hotelling games and try to examine in detail the relations between Hotelling games and congestion games.
### Fabien Gensbittel, Toulouse School of Economics, France

We study a model of two-player, zero-sum, stopping games with asymmetric information. We assume that the payoff depends on two continuous-time Markov chains (X_t, Y_t), where X_t is only observed by player 1 and Y_t only by player 2, implying that the players have access to stopping times with respect to different filtrations. We show the existence of a value in mixed stopping times and provide a variational characterization for the value as a function of the initial distribution of the Markov chains. We also prove a verification theorem for optimal stopping rules which allows us to construct optimal stopping times. Finally we use our results to solve explicitly two generic examples.
### Hugo Gimbert, Centre National de la Recherche Scientifique, France

We present the class of asynchronous stochastic games. A team of players is playing against its environment. Each player plays on a local stochastic game and is seeking a reachability objective: all payoffs are 0 except in a special absorbing state where the payoff is 1. Actions of players in the local games may synchronize together, in which case the players share all their knowledge, which can be used to take further decisions. We show that the existence of a winning strategy for the team can be decided for some classes of asynchronous games, and that the winning strategies can be computed. We also present some open questions.
### Josef Hofbauer, University of Vienna, Austria

A game is unprofitable if equilibrium payoffs do not exceed the maxmin payoff for each player. In unprofitable non-zero-sum games NE play has been notoriously difficult to justify. Does learning lead to NE in such games?
### Johannes Hörner, Yale University, USA

This short tutorial will cover results regarding non-zero-sum stochastic games. We will consider in turn games in which the state is complete information, and then games in which it is not. Further, both the undiscounted and the discounted criterion will be surveyed. Emphasis will be placed on recent results and open questions in the literature.
### Xu Huan, National University of Singapore

An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior, i.e., the parameter uncertainty. A standard paradigm to tackle this problem is the robust MDP framework, which models the parameters as arbitrary elements of pre-defined "uncertainty sets", and seeks the minimax policy - the policy that performs the best under the worst realization of the parameters in the uncertainty set. This essentially defines a zero-sum game between the decision maker and the adversarial "Nature". A crucial problem of robust MDP, largely unaddressed in the literature, is how to find an appropriate description of the uncertainty in a principled data-driven way. In this talk we address this problem using an online learning approach: we devise an algorithm that, without knowing the true uncertainty model, is able to adapt its level of protection to uncertainty, and in the long run performs as well as the minimax policy would if the true uncertainty model were known. Indeed, the algorithm achieves similar regret bounds as standard MDP where no parameter is adversarial, which shows that with virtually no extra cost we can adapt robust learning to handle uncertainty in MDPs. To the best of our knowledge, this is the first attempt to learn uncertainty in robust MDPs.
### Rahul Jain, University of Southern California, USA

I will start by talking about a natural framework for simulation-based optimization and control of MDP models. The idea is very simple: Replace the Bellman operator by its 'empirical' variant wherein expectation is replaced by a sample average approximation. This leads to a random Bellman operator in the dynamic programming equations. We introduce several notions of probabilistic fixed points of such random operators, and show their asymptotic equivalence. We establish convergence of empirical Value and Policy Iteration algorithms by a stochastic dominance argument. The idea can be generalized to asynchronous dynamic programming, leading to a reinforcement learning algorithm. We also show how the 'empirical' DP method can be combined with state space sampling and function approximation for solving continuous MDP problems. The mathematical technique introduced is useful for analyzing other iterated random operators. For example, we can show that stochastic gradient descent (SGD), SGD with variance reduction, Newton's method, and the dual subgradient method can all be thought of this way, and our framework gives a common method for their convergence analysis. Preliminary numerical results show better convergence rate and runtime performance than other commonly used schemes. This is based on joint works with William Haskell, Dileep Kalathil, Abhishek Gupta, Vivek Borkar, Peter Glynn and Hiteshi Sharma.
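The basic idea — replace the expectation in the Bellman operator by a sample average — can be sketched as follows. This is only a schematic of empirical value iteration; the sampling budget, the deterministic reward, and the toy MDP in the test are my own assumptions, not the setup analyzed in the talk.

```python
import random

def empirical_value_iteration(n_states, actions, sample_next, reward,
                              gamma=0.9, n_samples=200, iters=100, seed=0):
    """Value iteration with the Bellman expectation replaced by a sample average.

    sample_next(s, a, rng) draws a successor state; reward(s, a) is taken
    deterministic here for simplicity.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(iters):
        # 'empirical' Bellman update: expectation -> average over n_samples draws
        V = [max(reward(s, a) + gamma *
                 sum(V[sample_next(s, a, rng)] for _ in range(n_samples)) / n_samples
                 for a in actions)
             for s in range(n_states)]
    return V
```

Because each update uses fresh samples, the operator applied at every iteration is random, which is exactly why notions of probabilistic fixed points are needed to describe where the iterates settle.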
### Sandeep Juneja, Tata Institute of Fundamental Research, India

In this talk we first review the concert queueing game with a finite number of customers, where the queue opens at a specified time and the customers are allowed to queue up earlier. Customers would like to be served quickly and to wait in the queue as little as possible. We consider a variant of this game where the customers relax in a lounge before deciding to join the queue and can see the queue length at all times. The service times are assumed to be exponentially distributed. In this setting we arrive at a symmetric Nash equilibrium dynamic policy for each customer. We also discuss a related setting of an M/M/1 queue where customers first arrive at a lounge and then decide to join the queue at an opportune equilibrium time.
### Satyen Kale, Yahoo! Labs, USA

We initiate the study of boosting in the online setting, where the task is to convert a "weak" online learner into a "strong" online learner. The notions of weak and strong online learners directly generalize the corresponding notions from standard batch boosting. For the classification setting, we develop two online boosting algorithms. The first algorithm is an online version of boost-by-majority, and we prove that it is essentially optimal in terms of the number of weak learners and the sample complexity needed to achieve a specified accuracy. The second algorithm is adaptive and parameter-free, albeit not optimal. For the regression setting, we give an online gradient boosting algorithm which converts a weak online learning algorithm for a base class of regressors into a strong online learning algorithm which works for the linear span of the base class. We also give a simpler boosting algorithm for regression that obtains a strong online learning algorithm which works for the convex hull of the base class, and prove its optimality.
### Michihiro Kandori, The University of Tokyo, Japan

Finding a general method to identify equilibria in repeated games with observation errors (repeated games with "private monitoring") is a long-standing open problem in game theory. In a repeated game with private monitoring, each player receives a noisy observation of other players' actions, and a player's observation is not known to others. Thus, players have difficulty in having a common understanding about who is to be punished and when punishment should be started and ended. This paper is the first to provide a general and computationally feasible method to verify equilibria in this setting, provided that each player's behavior is described by a finite state automaton. Computational feasibility means that our method verifies equilibria in a finite number of steps, under some regularity conditions. Joint work with Ichiro Obara, Atsushi Iwasaki and Makoto Yokoo.
### Bart de Keijzer, Sapienza University of Rome, Italy

In "The Curse of Simultaneity", Paes Leme et al. show that there are interesting classes of games for which sequential decision making and corresponding subgame perfect equilibria avoid worst case Nash equilibria, resulting in substantial improvements for the price of anarchy. This is called the sequential price of anarchy. A handful of papers have lately analysed it for various problems, yet one of the most interesting open problems was to pin down its value for linear atomic routing (also: network congestion) games, where the price of anarchy equals 5/2. We prove surprisingly that the sequential price of anarchy is unbounded even for linear symmetric routing games, thereby showing that sequentiality can be arbitrarily worse than simultaneity for this important class of games. Another result I will talk about is an NP-hardness proof for computing the strategy profile induced by a subgame perfect equilibrium, as well as a tight bound of 7/5 for the special case where there are two players. Lastly, we solve an open problem in the area by establishing that the (regular) price of anarchy for symmetric routing games equals 5/2. This talk is based on joint work with José Correa, Jasper de Jong, and Marc Uetz.
### Yoav Kerner, Ben Gurion University of the Negev, Israel

We consider an M/G/1 queue in which the customers, while waiting in line, observe the dynamics of the queue and may decide to renege. We show that the Nash equilibrium profile is defined by two sequences of thresholds. For each customer, the decision is based on the observed past (which determines from what sequence the threshold is taken) and the observed queue length (which determines which element in the chosen sequence). We construct the set of equations whose solution is the Nash equilibrium and discuss the relation between the properties of the service time distribution and the properties of the Nash equilibrium (e.g. uniqueness, finiteness). Joint work with Eliran Schertzer.
### Nicolas Klein, University of Montreal, Canada

This paper studies strongly symmetric equilibria (SSE) in continuous-time games of strategic experimentation with Poisson bandits. SSE payoffs can be studied via two functional equations similar to the HJB equation used for Markov equilibria. This is valuable for three reasons. First, these equations retain the tractability of Markov equilibrium, while allowing for punishments and rewards: the best and worst equilibrium payoffs are explicitly solved for. Second, they capture behavior of the discrete-time game: as the period length goes to zero in the discretized game, the SSE payoff set converges to their solution. Third, they encompass a large payoff set: no perfect Bayesian equilibrium of the discrete-time game with frequent interactions achieves higher asymptotic efficiency. Joint work with Johannes Hörner and Sven Rady.
### Max Klimm, Technische Universität Berlin, Germany

Weighted congestion games are a class of noncooperative games that constitute an elegant model for the resource usage by selfish players. In a weighted congestion game, we are given a finite set of resources and a strategy of each player is to choose a subset of them. The private cost of each player is the sum of the costs of the chosen resources which depends on the total weight of the players using them. It is known that weighted congestion games need not possess a pure Nash equilibrium, i.e. a strategy vector from which no player can profitably deviate. We give a complete characterization of the maximal set of cost functions that one can allow on the resources in order to guarantee the existence of a pure Nash equilibrium. Further results on the existence of equilibria in related models are given.
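For small instances, existence of a pure Nash equilibrium can always be settled by brute force over joint strategy profiles. A generic sketch, applicable to any finite cost-minimization game (the test games below — Matching Pennies in cost form, which has no pure equilibrium, and a coordination game, which has two — are illustrative, not drawn from the talk):

```python
import itertools

def pure_nash_equilibria(strategy_sets, cost_of):
    """Enumerate all pure Nash equilibria of a finite cost-minimization game.

    strategy_sets[i]: list of strategies of player i.
    cost_of(i, profile): cost of player i under the joint profile (a tuple).
    """
    equilibria = []
    for profile in itertools.product(*strategy_sets):
        profile = tuple(profile)
        # stable iff no player has a profitable unilateral deviation
        stable = all(
            cost_of(i, profile) <= cost_of(i, profile[:i] + (s,) + profile[i + 1:])
            for i in range(len(strategy_sets))
            for s in strategy_sets[i])
        if stable:
            equilibria.append(profile)
    return equilibria
```

Enumeration is exponential in the number of players, which is why structural characterizations like the one in this talk — identifying exactly which cost functions guarantee existence — are valuable.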
### Joon Kwon, University Pierre and Marie Curie, France

We demonstrate that, in the classical non-stochastic regret minimization problem with d decisions, gains and losses to be respectively maximized or minimized are fundamentally different. Indeed, by considering the additional sparsity assumption (at each stage, at most s decisions incur a nonzero outcome), we derive optimal regret bounds of different orders. Specifically, with gains, we obtain an optimal regret guarantee after T stages of order sqrt(T log s), so the classical dependency in the dimension is replaced by the sparsity size. With losses, we provide matching upper and lower bounds of order sqrt(T s log(d)/d), which is decreasing in d. Finally, we also study the bandit setting, and obtain an upper bound of order sqrt(T s log(d/s)) when outcomes are losses. This bound is proven to be optimal up to the logarithmic factor sqrt(log(d/s)).
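For reference, the classical sqrt(T log d) baseline that these sparse bounds refine is achieved by the exponentially weighted average forecaster. A minimal full-information sketch with losses in [0, 1] (the learning rate and the loss sequence in the test are illustrative choices of mine):

```python
import math

def hedge(loss_rounds, eta):
    """Exponentially weighted average forecaster over d decisions.

    loss_rounds: list of length-d loss vectors in [0, 1].
    Returns (total expected loss, regret against the best fixed decision).
    """
    d = len(loss_rounds[0])
    w = [1.0] * d          # exponential weights
    cum = [0.0] * d        # cumulative loss of each fixed decision
    total = 0.0
    for losses in loss_rounds:
        W = sum(w)
        # expected loss of playing decision i with probability w[i]/W
        total += sum(w[i] / W * losses[i] for i in range(d))
        for i in range(d):
            cum[i] += losses[i]
            w[i] *= math.exp(-eta * losses[i])
    return total, total - min(cum)
```

With a tuned eta the regret is O(sqrt(T log d)); the results above show that under s-sparse outcomes this dependence on d can be improved, differently for gains and for losses.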
### Rida Laraki, Université Paris Dauphine, France

We provide necessary and sufficient conditions to approach --in expectation-- convex sets in several classes of absorbing games. Contrary to the standard Blackwell framework, the conditions differ if we consider strong versus weak approachability and depend on the structure of absorption. For instance, in big-match games of type 1, we have a simple formula in case of weak approachability, but we have no characterization in case of strong approachability. On the contrary, in big-match games of type 2, conditions for strong approachability are much simpler to state compared to necessary and sufficient conditions for weak approachability. In collaboration with Vianney Perchet (University of Paris Diderot) and Janos Flesch (Maastricht University).
### Jihong Lee, Seoul National University, Korea

We formulate and analyze a model of (full) repeated implementation with incomplete information. A group of infinitely-lived agents possess state-dependent utilities over a set of outcomes and in each period a state is drawn independently from an identical prior distribution. Each agent privately observes some partial contents of a realized state and may condition behavior on his private information as well as publicly observable histories. It is shown that, with minor qualifications, a social choice function (SCF) that is efficient in the range and incentive compatible can be repeatedly implemented in Bayesian Nash equilibrium under the general information structure. When the agents' utilities are interdependent, incentive compatibility can be replaced with a payoff identifiability condition. We also show that efficiency in the range is sufficient for approximate repeated implementation in public strategies when the information structure satisfies certain statistical identifiability properties. Joint work with Hamid Sabourian.
### Stefano Lovo, HEC Paris, France

We consider zero-sum revision games where players' moves are asynchronous. We show that: 1) the value exists in pure and Markov strategies; 2) it is continuous and monotonic in the relative frequency with which players receive their revision opportunities; 3) it converges to the pure minmax as this frequency goes to infinity; 4) as the revision time goes to infinity, the game value does not depend on the initial position; 5) for 2x2 games we fully characterize the equilibrium payoffs and equilibrium strategies. We compare the revision value with the value of the one-shot game.
### Frédéric Meunier, Ecole Nationale des Ponts et Chaussées, France

Consider nonatomic congestion games played on a network. Almost everything is known about them when the cost functions are affine and do not depend on the players. This contrasts with the player-specific case. In particular, uniqueness, efficiency, and algorithmic complexity of the equilibrium are in this case far from being understood. I will present some recent results obtained with Thomas Pradeau during his PhD and emphasize many open questions.
### Igal Milchtaich, Bar-Ilan University, Israel

Congestion models may be studied from either a social or an individual point of view. The first, more traditional, perspective concerns social goals such as the minimization of the mean travel time in a transportation network. The second perspective examines the incentives of individual users, who are only interested in their own, personal payoff and ignore the externalities that their choices of resources create for the other users. This paper studies a more general setting in which individual users attach some weight r to the social payoff, which is not necessarily 1 or 0. It examines the comparative-statics question of whether higher r necessarily means higher equilibrium social payoff.
Andrzej Nowak, University of Zielona Góra, Poland
We consider discounted stochastic games with Borel state and compact action spaces. The primitives of our model satisfy standard continuity and measurability conditions. The transition probability is a convex combination of finitely many probability measures depending on states and is dominated by some finite measure on the state space. The coefficients of the combination depend on both states and action profiles. This class of models contains stochastic games with Borel state spaces and finite, state-dependent action sets. Our main result establishes the existence of subgame-perfect equilibria, which are stationary in the sense that the equilibrium strategy for each player is determined by a single function of the current and previous states of the game. A counterexample given recently by Levy shows that stationary equilibria depending only on the current state may not exist in the discussed class of games. This talk is based on a recent joint work with Anna Jaskiewicz.
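The basic machinery behind such discounted models is Shapley's value iteration, applying the one-stage operator until it contracts to the fixed point. The sketch below runs it on a toy two-state, two-action zero-sum instance; all primitives are invented for illustration and the talk's Borel-state, n-player setting is far more general:

```python
import itertools

# Shapley's value iteration for a toy two-state, two-action discounted
# zero-sum stochastic game. All primitives below are invented for
# illustration; the talk's Borel-state setting is far more general.

def value_2x2(m):
    """Value of the 2x2 zero-sum matrix game m (row player maximizes)."""
    (a, b), (c, d) = m
    for i, j in itertools.product(range(2), repeat=2):
        col = [m[0][j], m[1][j]]
        if m[i][j] == min(m[i]) == max(col):   # pure saddle point
            return m[i][j]
    return (a * d - b * c) / (a + d - b - c)   # fully mixed value

def shapley_iteration(reward, trans, beta=0.9, iters=500):
    """reward[s][i][j]: stage payoff; trans[s][i][j]: prob. of moving to state 0."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [value_2x2([[reward[s][i][j]
                         + beta * (trans[s][i][j] * v[0]
                                   + (1 - trans[s][i][j]) * v[1])
                         for j in range(2)] for i in range(2)])
             for s in range(2)]
    return v

# State 0 is matching pennies (stage value 0), state 1 a game of stage
# value 1; every action profile moves to either state with probability 1/2.
reward = [[[1, -1], [-1, 1]], [[2, 0], [0, 2]]]
trans = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
print(shapley_iteration(reward, trans))  # converges to [4.5, 5.5]
```

Because the operator is a beta-contraction in the sup norm, the iterates converge geometrically; here the fixed point solves v0 = 0.9(v0+v1)/2 and v1 = 1 + 0.9(v0+v1)/2.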
Wojciech Olszewski, Northwestern University, USA
We study chip strategies in repeated oligopoly games, in which firms privately observe their costs of production. The simplest form of chip strategies allows the firms with the lowest costs to charge a lower price and serve the entire market. In exchange, the firms (implicitly) issue chips which are distributed among the other firms. There is an upper bound on the number of chips that each firm is allowed to issue. If this bound is reached, the firm is supposed to stay out of the market for one period. Chip strategies are simple, but it may not be easy to verify that they satisfy equilibrium conditions, and to compute the payoff vectors that they attain. We develop a method of doing so in the limit case, in which the discount factor tends to 1 and the upper bound for the number of chips tends to ∞. We show that in this limit case, chip strategies attain the efficient payoff vector. They do so for any number of firms, even if the costs of production are Markovian, and the strategies are required to be coalition proof.
Marcin Peski, University of Toronto, Canada
We consider the value of persistent information in strictly competitive situations, formalized as stochastic zero-sum games where only the maximizer observes the state that evolves according to an ergodic Markov operator. We say that operator $Q$ is better for the maximizer than operator $P$ if the value of the game under $Q$ is higher than under $P$ regardless of the stage game. We show that this defines a partial order on the space of ergodic Markov operators, and provide a full characterization of this partial order. An i.i.d. state is the best case for the informed player; however, a perfectly persistent state is not necessarily the worst case. The analysis relies on a novel characterization of the value of a stochastic game with incomplete information. Our results can alternatively be interpreted as pertaining to the limit of the minmax value in repeated Bayesian games with Markov types.
Georgios Piliouras, Singapore University of Technology and Design, Singapore
Nash's proof of equilibrium existence has offered a tool of seminal importance to economic theory; however, a remaining, nagging hurdle is the possibility of multiple equilibria, perhaps with quite different properties. A wide range of approaches has been employed in an effort to reduce the study of games to analyzing a single behavioral point (e.g., risk/payoff-dominant equilibria, evolutionarily/stochastically stable states, etc.). We move away from the search for a unique behavioral prediction. We propose a quantitative framework where the predictions of different equilibria are weighted by their probability of arising under evolutionary dynamics given random initial conditions. This average-case analysis is shown to offer the possibility of a significantly tighter understanding of classic game-theoretic settings, including quantifying the risk dominance of payoff-dominated equilibria in stag-hunt games and reducing prediction uncertainty due to large gaps between the efficiency of the best/worst equilibria in coordination/congestion games. Based on joint work with Ioannis Panageas.
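The average-case idea, weighting each equilibrium by how often it arises from random initial conditions, can be sketched for a stag hunt under replicator dynamics. The payoffs, dynamics discretization, and sample sizes below are my own toy choices, not the paper's model:

```python
import random

# Replicator dynamics in a symmetric stag-hunt game: Stag vs Stag pays 5,
# Stag vs Hare pays 0, Hare always pays 3 (hypothetical numbers). x is
# the population share playing Stag; the mixed equilibrium sits at x* = 3/5.
A = [[5, 0], [3, 3]]

def replicator_limit(x, dt=0.05, steps=2000):
    """Euler-integrate x' = x(1-x)(f_stag - f_hare); return the limit point."""
    for _ in range(steps):
        f_stag = A[0][0] * x + A[0][1] * (1 - x)
        f_hare = A[1][0] * x + A[1][1] * (1 - x)
        x += dt * x * (1 - x) * (f_stag - f_hare)
    return round(x)  # 1 -> all-Stag equilibrium, 0 -> all-Hare

random.seed(0)
samples = [replicator_limit(random.random()) for _ in range(500)]
# Under uniform random initial conditions, the basin of the payoff-dominant
# all-Stag equilibrium is (3/5, 1], so it arises with probability about 0.4.
print(sum(samples) / len(samples))
```

In this one-dimensional example the weighting can be read off analytically from the basin boundary x* = 3/5; the point of the simulation is that the same sampling procedure applies when basins have no closed form.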
Catherine Rainer, Université de Brest, France
In the framework of repeated games with asymmetric information, as well as for continuous-time games with incomplete information on one side, it is well known that the value of the game has a probabilistic interpretation in terms of a game (resp. control) problem where the actions of the players are martingales. We show that this result also holds for continuous-time games with lack of information on both sides. In our case these martingales are continuous and can be expressed as stochastic integrals with respect to two Brownian motions. Joint work with Fabien Gensbittel.
Jérôme Renault, Toulouse School of Economics, France
1) The basic model: stochastic games with finitely many states and actions; examples; the value of the game with finitely many stages; the value of the discounted game; the algebraic approach; the Big Match; the uniform value. 2) Extensions and a few recent results: a simple compact continuous game with no asymptotic value; equivalence of the uniform convergence of the values; dynamic programming, gambling houses and MDPs; repeated games with incomplete information; open problems.
Jérôme Renault, Toulouse School of Economics, France
We consider general 2-player zero-sum stochastic games where each player controls his own state variable living in a compact metric space. The terminology comes from gambling problems, where the state of a player represents his wealth in a casino. Under natural assumptions (such as continuous running payoff and non-expansive, possibly stochastic, transitions), we consider for each discount factor the value of the $\lambda$-discounted stochastic game and investigate its limit when $\lambda$ goes to 0. We show that under a strong acyclicity condition, the limit exists and is characterized as the unique solution of a system of functional equations: the limit is the unique continuous excessive depressive function such that each player, if his opponent does not move, can reach the zone where the current payoff is at least as good as the limit value, without degrading the limit value. The approach generalizes and provides a new viewpoint on the Mertens-Zamir system coming from the study of zero-sum repeated games with lack of information on both sides. A counterexample shows that under weak acyclicity, convergence may fail. Joint work with Rida Laraki.
Thomas Rivera, HEC Paris, France
We study a discrete form of dynamic congestion games whereby a fixed number of players, utilizing a common fixed-capacity road, wish to arrive at the destination before a given time. In such games the players, taking into account the effects of congestion on the road, strategically decide how early to depart in order to minimize their travel time, subject to a penalty cost if they arrive late. We show that for large penalty costs the best and worst Nash equilibrium payoffs of the game are substantially bounded away from the social optimum. Next, we show that by utilizing correlated departure strategies the social planner can achieve a correlated equilibrium outcome with payoffs arbitrarily close to the social optimum. When coupled with a simple toll-pricing mechanism, we show that such an approximate social optimum is available for any penalty cost and, in equilibrium, never requires players to pay tolls.
Filippo Santambrogio, Université Paris-Sud, France
I will review some models for traffic congestion that we studied with Carlier and Jimenez in 2008 as a continuous counterpart of the notion of Wardrop equilibria on networks, and underline in particular the connections with convex optimization. Then, I will show the analogy with the Mean Field Games model, introduced by Lasry and Lions in 2007 and nowadays widely studied in the PDE, optimal transport and optimal control communities. I will briefly address some regularity questions, which will be crucial for the sequel of the talk. Indeed, I will also present the corresponding models where "congestion costs" (i.e., the price of a path increases when the density of the players on this very same path is too high) are replaced by "capacity constraints" (i.e., there is a constraint on the maximal density of players that can be located at the same spot). In this case, as is typical in fluid mechanics problems, a pressure appears, which plays the role of a price that the players pay to pass through saturated areas, and we need to study the mathematical nature of this pressure (or price) term. This is the object of a recent work with Cardaliaguet and Mészáros.
Marc Schroeder, Universiteit Maastricht, The Netherlands
We propose a model of discrete-time dynamic congestion games with atomic players and a single source-destination pair. The latencies of edges are composed of free-flow transit times and possible queuing times due to capacity constraints. We give a precise description of the dynamics induced by the individual strategies of players and of the corresponding costs, either when the traffic is controlled by a planner, or when players act selfishly. Importantly, we model seasonalities by assuming that departure flows fluctuate periodically over time. Our main contributions are twofold. First, we introduce a measure that captures the queues induced by the periodicity of inflows. For socially optimal flows, this measure is the increase in costs compared to uniform departures. The same holds for equilibrium flows if the network is parallel; in general the analysis is more intricate. We even provide an example in which periodic departures induce lower equilibrium costs than uniform departures. Second, we illustrate a new dynamic version of Braess's paradox: the presence of initial queues in a network may decrease the long-run costs in equilibrium. This paradox may arise even in networks for which no Braess's paradox was previously known.
Tadashi Sekiguchi, Kyoto University, Japan
We study a model of infinitely repeated games where two or more identical prisoners' dilemmas with imperfect monitoring, whose monitoring structures are mutually independent, are played simultaneously every period. Our central question is how the equilibrium per-game payoff vector set differs from that of a benchmark model where a single repeated game is played. This question translates into a debate in industrial organization as to whether multimarket contact facilitates collusion. We are primarily interested in the private monitoring version, focusing on belief-free equilibria in two-state pure-action automaton strategies. Our objective is to derive some sufficient conditions for multimarket contact to facilitate collusion in terms of per-market equilibrium profits.
Ohad Shamir, Weizmann Institute of Science, Israel
I will survey several recent results on online learning with bandit feedback. Time permitting, these will include lower bounds on bandit linear optimization; bandit learning with two-point feedback; and multi-armed bandits with multiple users. A common theme is the surprising brittleness of well-established bandit techniques and results to minor or even 'trivial' modifications.
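The defining feature of bandit feedback, observing only the chosen arm's loss and compensating with importance weighting, can be sketched with the classic EXP3 algorithm. The two-arm environment and the learning rate below are my own toy choices:

```python
import math, random

# A sketch of the EXP3 algorithm for adversarial multi-armed bandits:
# only the chosen arm's loss is observed, and an importance-weighted
# estimate feeds an exponential-weights update. The two-arm environment
# and learning rate below are my own toy choices.

def exp3(loss_fn, n_arms, horizon, eta):
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(horizon):
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = random.choices(range(n_arms), probs)[0]
        loss = loss_fn(t, arm)           # bandit feedback: one entry only
        total_loss += loss
        est = loss / probs[arm]          # unbiased importance-weighted estimate
        weights[arm] *= math.exp(-eta * est)
    return total_loss

random.seed(1)
T = 5000
eta = math.sqrt(math.log(2) / (2 * T))
# Toy environment: arm 0 always incurs loss 0.2, arm 1 always 0.8.
alg = exp3(lambda t, a: 0.2 if a == 0 else 0.8, n_arms=2, horizon=T, eta=eta)
print(alg / T)  # average loss approaches the best arm's 0.2
```

The importance-weighted estimate loss/probs[arm] is exactly the kind of device whose brittleness under seemingly minor modifications the talk highlights.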
Nahum Shimkin, Technion - Israel Institute of Technology, Israel
Congestion phenomena in queues and related service systems give rise to significant interaction between customers. An extensive literature now exists that examines customer decisions in such systems from a strategic, game-theoretic viewpoint. A particular class of such problems addresses timing decisions, where the decision variable is continuous and specifies the timing of a certain action of individual customers. In this talk I will review several problems that belong to this class. These encompass the timing of possible abandonment from a waiting line due to excessive wait; the timing of arrival to a non-stationary queue; and the timing of a post to a shared publication medium, in order to maximize exposure. The analysis characterizes the individual decisions in equilibrium, and studies the implications for system-wide operation and efficiency. The reported research includes joint works with Eitan Altman, Sandeep Juneja, and Avishai Mandelbaum.
Sylvain Sorin, Université Pierre et Marie Curie, France
This introductory tutorial aims to present models and algorithms related to learning in a strategic framework. A first part will cover unilateral procedures where an agent faces an unknown environment. The model is in discrete time, and the agent chooses at each stage an action based on his previous knowledge. He then observes an outcome and gets a reward. His objective is to maximize the flow of such rewards. We will present the basic tool, "approachability theory", and the induced algorithms for "no-regret" or "calibrating" criteria. A second part will deal with extensions and recent results, including relations with online optimization and the imperfect monitoring framework. A third part will consider global dynamics where all the players in a game use a learning procedure. We will describe various convergence results for classes of games to classes of equilibria. A last part will deal with procedures in continuous time and convergence results. We will then present results allowing one to deduce convergence in the original discrete-time configurations.
Sylvain Sorin, Université Pierre et Marie Curie, France
We consider two-person zero-sum games where the players control, at the discrete times t_n of a partition Π of R^+, a continuous-time Markov state process. We prove that the limit of the values v_Π exists as the mesh of Π goes to 0. The analysis covers the cases of: 1) stochastic games (where both players know the state); 2) the symmetric no-information case. The proof is by reduction to a deterministic differential game.
Karthik Sridharan, Cornell University, USA
In this talk I will describe how the problem of online or sequential prediction is inherently connected to game theory, to optimization, and to proving exponential tail bounds for suprema of certain collections of martingales. We will first formalize the problem of online learning with adaptive regret bounds. Then, as specific examples, we will look at how these learning algorithms provide strategies for zero-sum two-player games with uncoupled dynamics, how they provide optimization methods for saddle-point minimization problems, and finally describe an equivalence between adaptive regret bounds and one-sided concentration or deviation statements for suprema of collections of martingales.
Mathias Staudigl, University of Maastricht, The Netherlands
Dynamic games are difficult to solve. This applies even more so in the framework of continuous-time games. This talk provides an overview of completed and ongoing research on a particularly tractable class of repeated games with imperfect monitoring. We present a complete characterization of a subset of equilibrium payoffs, as well as connections to discrete-time games.
Jan-Henrik Steg, Universität Bielefeld, Germany
We construct subgame-perfect equilibria with mixed strategies for symmetric stochastic timing games with arbitrary strategic incentives. The strategies are qualitatively different for local first- or second-mover advantages, which we analyse in turn. When there is a local second-mover advantage, the players may conduct a war of attrition with stopping rates that we characterize in terms of the Snell envelope from the general theory of optimal stopping, which is very general but provides a clear interpretation. With a local first-mover advantage, stopping typically results from preemption and is abrupt. Equilibria may differ in the degree of preemption, i.e., precisely at which points it is triggered. We provide an algorithm to characterize where preemption is inevitable and to establish the existence of corresponding payoff-maximal symmetric equilibria.
Takuo Sugaya, Stanford University, USA
Many cartels, especially in industries where monitoring is imperfect and private, involve third-party facilitators, who organize meetings, distribute agreed market shares, collect and verify data, detect deviations, and moderate tensions. Motivated by this observation, we study repeated games with general private monitoring and a mediator, and characterize the limit set of sequential communication equilibria as the discount factor goes to one.
Yeneng Sun, National University of Singapore
This paper aims to solve two fundamental problems on finite or infinite horizon dynamic games with complete information. Under some mild conditions, we prove (1) the existence of subgame-perfect equilibria in general dynamic games with simultaneous moves (i.e. almost perfect information), and (2) the existence of pure-strategy subgame-perfect equilibria in perfect-information dynamic games with uncertainty. Our results go beyond previous works on continuous dynamic games in the sense that public randomization and the continuity requirement on the state variables are not needed. As an illustrative application, a dynamic stochastic oligopoly market with intertemporally dependent payoffs is considered.
Csaba Szepesvári, University of Alberta, Canada
I will consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but whose variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to a universal constant factor) or the minimax lower bounds (up to logarithmic factors). I will also discuss remaining open questions. The talk is based on joint work with Yifan Wu and Andras Gyorgy.
Caroline Thomas, The University of Texas at Austin, USA
We analyze social and private learning at the symmetric equilibria of a queueing game with strategic experimentation. An infinite sequence of agents arrives at a server which processes them at an unknown rate. The number of agents served at each date is either a geometric random variable (in the good state) or zero (in the bad state). The queue lengthens with each new arrival and shortens if agents are served or choose to quit the queue. Agents can only observe the evolution of the queue after they arrive. Thus each agent solves the experimentation problem of how long to wait to learn about the probability of service. The agents, in addition, benefit from an informational externality by observing the length of the queue and the actions of other agents. There is also a negative payoff externality, as those at the front of the queue delay the service of those at the back. Joint work with Martin Cripps (UCL).
Guillaume Vigeral, Université Paris Dauphine, France
We consider games with varying stage duration as defined by Neyman (2013) and establish some of their asymptotic properties using the theory of nonexpansive operators in Banach spaces.
Yannick Viossat, Université Paris Dauphine, France
In strategic interactions, do evolutionary processes lead economic or biological agents to behave as if they were rational? To test this idea, many authors examined whether evolutionary game dynamics eliminate strictly dominated strategies. We will survey this literature and argue that the current division between imitative dynamics, which eliminate pure strategies strictly dominated by other pure strategies, and innovative dynamics, which need not do so, is misleading. Indeed, we will show that pure strategies dominated by other pure strategies may survive even under evolutionary dynamics arising from imitation of successful agents.
Alex Yijun Wang, University of Auckland, New Zealand
Consider a network where two types of routes are available for users to choose from a source to a destination. On one type of route (FIFO), delay increases as traffic increases, and on the other (batch-service), delay decreases as traffic increases. Adding capacity to queues does not always yield better performance when individuals choose their own route through a network. In this talk I will give results for a system of parallel queues with one batch-service queue and N FIFO queues.
Jonathan Weed, Massachusetts Institute of Technology, USA
Repeated auctions are a natural target for machine learning, since buyers and sellers may both hope to use past outcomes to improve their strategies over time. Most previous work focuses on the seller's perspective and assumes that the buyers' private valuations for the goods are drawn from some unknown distribution. We consider instead the situation from the perspective of a bidder who may know nothing about the value of a good until it is purchased. We introduce and analyze several bidding policies and prove minimax lower bounds. Joint work with Vianney Perchet and Philippe Rigollet.
Gideon Weiss, University of Haifa, Israel
Consider types C = {c1, c2, ..., cI} and S = {s1, s2, ..., sJ}, and a bipartite compatibility graph between C and S. Given an infinite sequence of i.i.d. c's and an infinite sequence of i.i.d. s's, we consider matching the two sequences according to the compatibility graph, using a first-come-first-served (FCFS) policy. This model, which appears quite complicated at first glance, turns out to be extremely tractable, yielding results on uniqueness, reversibility, and stationary probabilities. The model is helpful in analyzing some complex applied probability models. We will discuss its applications to the analysis of organ transplants and to the analysis of skill-based service in call centers. Joint work with: Ed Kaplan, Rene Caldentey, Ivo Adan, Marko Boon, Ana Busic, Jean Mairesse, Hanqin Zhang and Yuval Nov.
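The FCFS matching rule itself is easy to simulate. In the sketch below (two customer types, two server types; the compatibility graph and arrival law are my own toy choices, and finite prefixes stand in for the infinite sequences), each arriving c is matched to the earliest-arrived unmatched compatible s:

```python
from collections import deque
import random

# FCFS matching of two i.i.d. sequences over a bipartite compatibility
# graph. Toy setup (assumptions mine): c1 is compatible with both server
# types, c2 only with s2; finite prefixes stand in for infinite sequences.
compat = {"c1": ("s1", "s2"), "c2": ("s2",)}

def fcfs_match(cs, ss):
    queues = {"s1": deque(), "s2": deque()}  # unmatched s positions, in order
    for j, s in enumerate(ss):
        queues[s].append(j)
    pairs = []
    for c in cs:
        available = [t for t in compat[c] if queues[t]]
        if available:
            t = min(available, key=lambda q: queues[q][0])  # earliest unmatched s
            queues[t].popleft()
            pairs.append((c, t))
    return pairs

random.seed(2)
n = 10000
cs = random.choices(["c1", "c2"], k=n)
ss = random.choices(["s1", "s2"], k=n)
pairs = fcfs_match(cs, ss)
print(len(pairs) / n)  # fraction of customers matched
```

The tractability results in the talk concern exactly the stationary behavior that such a simulation only estimates, e.g. the long-run rates at which each (c, s) pair is matched.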
Ilze Ziedins, University of Auckland, New Zealand
Queueing models of networks with selfish routing have often been used to study congestion effects in road and communication networks. Probabilistic routing corresponds to a static model, where users only have knowledge of expected delays, whereas under state-dependent routing users have information about the current state of the system. In some queueing systems, giving extra information to users can substantially reduce the adverse effects of selfish routing on system performance. In this overview talk, we will examine probabilistic (less information) and state-dependent (more information) selfish routing schemes in two kinds of networks. One consists of a mix of first-in-first-out single-server queues and batch-service queues, a model which is sometimes used as a simple representation of private vs. public transport. The second network has a collection of processor-sharing queues. For this second network, interesting questions also arise about the existence of one or more equilibria, and then also convergence to those equilibria, under state-dependent routing. We also discuss two different methods for identifying equilibria under state-dependent routing. This is joint work with Heti Afimeimounga, Lisa Chen, Wiremu Solomon, Mark Holmes and, more recently, Niffe Hermansson and Alex Wang.
Bruno Ziliotto, Université Paris Dauphine, France
We investigate zero-sum stochastic games with long duration, with possibly infinite state space and action sets. An earlier paper (Z.15) shows that the value v(lambda) of the lambda-discounted game converges uniformly as lambda goes to 0 if and only if the value of the n-stage game converges uniformly as n goes to infinity. This result is generalized in the following way: given a sequence of weights theta summing to 1, we consider the value v(theta) of the stochastic game in which the total payoff is the weighted sum of the stage payoffs. We provide conditions on theta under which the uniform convergence of v(lambda) as lambda goes to 0 implies the uniform convergence of v(theta) as the weights go to 0 (but still sum to 1). We also provide an example showing that these conditions are sharp.