PAPERS
[dblp]
❈ indicates alphabetical author order
2024
- [Arxiv] Are Bounded Contracts Learnable and Approximately Optimal? ❈ Yurong Chen, Zhaohua Chen, Xiaotie Deng, and Zhiyi Huang. 2024.
This paper considers the hidden-action model of the principal-agent problem, in which a principal incentivizes an agent to work on a project using a contract. We investigate whether contracts with bounded payments are learnable and approximately optimal. Our main results are two learning algorithms that can find a nearly optimal bounded contract using a polynomial number of queries, under two standard assumptions in the literature: a costlier action for the agent leads to a better outcome distribution for the principal, and the agent’s cost/effort has diminishing returns. Our polynomial query complexity upper bound shows that standard assumptions are sufficient for achieving an exponential improvement upon the known lower bound for general instances. Unlike existing algorithms, which rely on discretizing the contract space, our algorithms directly learn the underlying outcome distributions. As for the approximate optimality of bounded contracts, we find that they could be far from optimal in terms of multiplicative or additive approximation, but satisfy a notion of mixed approximation.
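A minimal sketch of the hidden-action model behind this abstract, with hypothetical numbers: the agent best-responds to a posted contract (a payment per outcome, here bounded by 1), and a brute-force grid search stands in for the query-efficient learning algorithms studied in the paper, which avoid this kind of discretization.

```python
import itertools
import numpy as np

# Toy hidden-action instance (hypothetical numbers, for illustration only).
costs = np.array([0.0, 0.2, 0.5])           # agent's cost per action
outcome_dists = np.array([                   # P(outcome | action)
    [0.8, 0.2, 0.0],
    [0.4, 0.4, 0.2],
    [0.1, 0.3, 0.6],
])
rewards = np.array([0.0, 0.5, 1.0])          # principal's reward per outcome

def agent_best_response(contract):
    """The agent picks the action maximizing expected payment minus cost."""
    return int(np.argmax(outcome_dists @ contract - costs))

def principal_utility(contract):
    """Expected reward minus expected payment under the agent's best response."""
    a = agent_best_response(contract)
    return outcome_dists[a] @ (rewards - contract)

# Brute-force search over a grid of contracts with payments bounded by 1;
# the paper's algorithms replace this discretization with a polynomial
# number of best-response queries.
grid = np.linspace(0.0, 1.0, 11)
best = max(
    (principal_utility(np.array(t)), t)
    for t in itertools.product(grid, repeat=len(rewards))
)
print("best bounded contract on the grid:", best[1], "utility:", round(best[0], 3))
```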
- [Arxiv] Mechanism Design for LLM Fine-tuning with Multiple Reward Models. Haoran Sun, Yurong Chen, Siwei Wang, Wei Chen, and Xiaotie Deng. 2024.
Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. With aggregation algorithms advancing, a potential economic scenario arises where fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, which could affect the fine-tuned outcomes. This paper addresses such incentive issues by framing them as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We focus on training rules that maximize social welfare (SW-Max training rules). Firstly, we show that under most circumstances, truthful reporting is sub-optimal with a training rule alone, thereby highlighting the necessity of payments. Secondly, we design affine maximizer payment rules that implement SW-Max training rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence properties: for a training rule that satisfies these conditions, all payment rules that implement it in DSIC differ from each other only by a constant term independent of agents’ reports. Thirdly, we demonstrate that our mechanism is approximately DSIC even with perturbed input, showcasing its robustness against the inevitable errors in real-world applications. Experiments on real LLM setups further confirm the practical implications of our results.
2023
- [NeurIPS] A Scalable Neural Network for DSIC Affine Maximizer Auction Design. Zhijian Duan, Haoran Sun, Yurong Chen, and Xiaotie Deng. In Advances in Neural Information Processing Systems, 2023.
Automated auction design aims to find empirically high-revenue mechanisms through machine learning. Existing works on multi-item auction scenarios can be roughly divided into RegretNet-like and affine maximizer auctions (AMAs) approaches. However, the former cannot strictly ensure dominant strategy incentive compatibility (DSIC), while the latter faces scalability issues due to the large number of allocation candidates. To address these limitations, we propose AMenuNet, a scalable neural network that constructs the AMA parameters (even including the allocation menu) from bidder and item representations. AMenuNet is always DSIC and individually rational (IR) due to the properties of AMAs, and it enhances scalability by generating candidate allocations through a neural network. Additionally, AMenuNet is permutation equivariant, and its number of parameters is independent of auction scale. We conduct extensive experiments to demonstrate that AMenuNet outperforms strong baselines in both contextual and non-contextual multi-item auctions, scales well to larger auctions, generalizes well to different settings, and identifies useful deterministic allocations. Overall, our proposed approach offers an effective solution to automated DSIC auction design, with improved scalability and strong revenue performance in various settings.
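For context, the affine maximizer auction that AMenuNet parameterizes is simple to run once the allocation menu, bidder weights, and boosts are fixed: choose the menu entry maximizing the boosted weighted welfare and charge weighted-VCG payments. A minimal sketch with toy numbers (not learned parameters from the paper):

```python
import numpy as np

def affine_maximizer_auction(values, weights, boosts):
    """Run an affine maximizer auction (AMA) over a finite menu of allocations.

    values[i, a] : bidder i's reported value for candidate allocation a
    weights[i]   : positive bidder weight w_i
    boosts[a]    : allocation boost lambda_a
    Returns the chosen allocation index and each bidder's payment.
    """
    affine_welfare = weights @ values + boosts      # sum_i w_i v_i(a) + lambda_a
    a_star = int(np.argmax(affine_welfare))

    payments = np.zeros(len(weights))
    for i in range(len(weights)):
        welfare_without_i = affine_welfare - weights[i] * values[i]
        # Weighted-VCG payment: the externality bidder i imposes, rescaled by w_i.
        payments[i] = (welfare_without_i.max() - welfare_without_i[a_star]) / weights[i]
    return a_star, payments

# Toy menu with 2 bidders and 3 candidate allocations (hypothetical numbers).
values = np.array([[1.0, 0.2, 0.6],
                   [0.2, 0.9, 0.5]])
print(affine_maximizer_auction(values, weights=np.array([1.0, 1.0]), boosts=np.zeros(3)))
```

Because the chosen allocation maximizes an affine function of reports that is separable in each bidder's value, this family is DSIC and IR regardless of how the menu, weights, and boosts are produced, which is why learning them with a network preserves both properties.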
- [KDD] Learning-Based Ad Auction Design with Externalities: The Framework and A Matching-Based Approach. Ningyuan Li, Yunxuan Ma, Yang Zhao, Zhijian Duan, Yurong Chen, Zhilin Zhang, Jian Xu, Bo Zheng, and Xiaotie Deng. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
Learning-based ad auctions have increasingly been adopted in online advertising. However, existing approaches neglect externalities, such as the interaction between ads and organic items. In this paper, we propose a general framework, namely Score-Weighted VCG, for designing learning-based ad auctions that account for externalities. The framework decomposes the optimal auction design into two parts: designing a monotone score function and an allocation algorithm, which facilitates data-driven implementation. Theoretical results demonstrate that this framework produces the optimal incentive-compatible and individually rational ad auction under various externality-aware CTR models while being data-efficient and robust. Moreover, we present an approach to implement the proposed framework with a matching-based allocation algorithm. Experiment results on both real-world and synthetic data illustrate the effectiveness of the proposed approach.
- [ICML] Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets. Yurong Chen, Qian Wang, Zhijian Duan, Haoran Sun, Zhaohua Chen, Xiang Yan, and Xiaotie Deng. In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023.
In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal social welfare and discuss bidders’ incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.
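As a point of reference for the independent-bidding benchmark that coordinated bidding is measured against, here is a minimal sketch of budget-paced bidding in repeated second-price auctions using a standard multiplicative pacing (dual) update. The values, budgets, and learning rate are hypothetical, and this is the baseline dynamic, not the coordinated algorithms proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10_000, 3                       # rounds and bidders
budgets = np.array([800.0, 500.0, 300.0])
mu = np.zeros(n)                       # pacing multipliers (dual variables)
spent = np.zeros(n)
eps = 0.01                             # dual step size

for t in range(T):
    values = rng.uniform(size=n)
    bids = values / (1.0 + mu)         # paced bids
    # A bidder whose remaining budget cannot cover the bid sits out this round.
    bids = np.where(spent + bids <= budgets, bids, 0.0)
    winner = int(np.argmax(bids))
    price = np.partition(bids, -2)[-2]  # second-highest bid
    payments = np.zeros(n)
    if bids[winner] > 0:
        payments[winner] = price
        spent[winner] += price
    # Dual update: track the per-round budget target budgets / T on average.
    mu = np.maximum(0.0, mu - eps * (budgets / T - payments))

print("total spend per bidder:", np.round(spent, 1))
```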
- [Arxiv] Learning to Manipulate a Commitment Optimizer. ❈ Yurong Chen, Xiaotie Deng, Jiarui Gan, and Yuhao Li. 2023.
We consider a Stackelberg scenario where the leader commits optimally based on the follower’s type (i.e., the follower’s payoff function). Despite its rationality, such commitment-optimizing behavior inadvertently reveals information about the leader’s incentive, especially when one gets access to the leader’s optimal commitments against different follower types. In this paper, we study to what extent one can learn about the leader’s payoff information by actively querying the leader’s optimal commitments. We show that, by using polynomially many queries and operations, a learner can learn a payoff function that is strategically equivalent to the leader’s original payoff function, in the sense that it preserves: 1) the leader’s preference over fairly broad sets of strategy profiles and 2) the set of all possible (strong) Stackelberg equilibria the leader may engage in, considering all possible follower types. As an application, we show that a follower can use the learned information to induce an optimal Stackelberg equilibrium (w.r.t. the follower’s payoff) by imitating a different type, without knowing the leader’s payoff function beforehand. To the best of our knowledge, we are the first to extend this equilibrium-inducing problem to the incomplete-information setting.
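The object being queried here is the leader's optimal commitment, i.e., a strong Stackelberg equilibrium against the reported follower type. For bimatrix games this can be computed with the standard one-LP-per-follower-action approach, sketched below with toy payoff matrices (not taken from the paper); the learner in the paper only observes the resulting commitments, never the leader's matrix directly.

```python
import numpy as np
from scipy.optimize import linprog

def strong_stackelberg(L, F):
    """Strong Stackelberg equilibrium of a bimatrix game.

    L[i, j], F[i, j]: leader / follower payoffs when the leader plays row i
    and the follower plays column j. For each candidate follower response j,
    an LP finds the leader commitment x maximizing x @ L[:, j] subject to j
    being a follower best response; the best candidate wins.
    """
    m, n = L.shape
    best = (-np.inf, None, None)
    for j in range(n):
        # Incentive constraints: x @ (F[:, k] - F[:, j]) <= 0 for every column k.
        A_ub = np.array([F[:, k] - F[:, j] for k in range(n)])
        res = linprog(-L[:, j], A_ub=A_ub, b_ub=np.zeros(n),
                      A_eq=np.ones((1, m)), b_eq=[1.0], bounds=[(0, 1)] * m)
        if res.success and -res.fun > best[0]:
            best = (-res.fun, res.x, j)
    return best  # (leader value, leader commitment, follower response)

# Toy payoffs (hypothetical, not from the paper).
L = np.array([[2.0, 4.0],
              [1.0, 3.0]])
F = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(strong_stackelberg(L, F))
```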
2022
- [WINE, Best Student Paper] Optimal Private Payoff Manipulation Against Commitment in Extensive-form Games. ❈ Yurong Chen, Xiaotie Deng, and Yuhao Li. In Web and Internet Economics, 2022.
To take advantage of strategy commitment, a useful tactic of playing games, a leader must learn enough information about the follower’s payoff function. However, this leaves the follower a chance to provide fake information and influence the final game outcome. Through a carefully contrived payoff function misreported to the learning leader, the follower may induce an outcome that benefits him more than the one obtained when he behaves truthfully. We study the follower’s optimal manipulation via such strategic behaviors in extensive-form games. Followers’ different attitudes are taken into account. An optimistic follower maximizes his true utility among all game outcomes that can be induced by some payoff function. A pessimistic follower only considers misreporting payoff functions that induce a unique game outcome. For all the settings considered in this paper, we characterize all the possible game outcomes that can be induced successfully. We show that it is polynomial-time tractable for the follower to find the optimal way of misreporting his private payoff information. Our work completely resolves the follower’s optimal manipulation problem on extensive-form game trees.
- [IJCAI] On the Convergence of Fictitious Play: A Decomposition Approach. Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, and Yaodong Yang. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23–29 July 2022.
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are a mixture of both, and the convergence property of FP on such games has not been fully studied. In this paper, we extend the convergence results of FP to the combinations of such games and beyond. Specifically, we derive new conditions for FP to converge by leveraging game decomposition techniques. We further develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable. Finally, we analyze a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
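A minimal sketch of discrete-time fictitious play on a two-player bimatrix game, the learning dynamic whose convergence the paper studies. On the zero-sum matching pennies game below, the empirical frequencies approach the (1/2, 1/2) equilibrium; on games like the Shapley game mentioned above, they need not converge.

```python
import numpy as np

def fictitious_play(A, B, T=5000):
    """Discrete-time fictitious play in a bimatrix game with payoff matrices (A, B).

    Each round, both players best-respond to the empirical frequency of the
    opponent's past plays; returns the empirical mixed strategies.
    """
    m, n = A.shape
    counts1, counts2 = np.ones(m), np.ones(n)   # arbitrary initial play counts
    for _ in range(T):
        x = counts1 / counts1.sum()             # player 1's empirical strategy
        y = counts2 / counts2.sum()             # player 2's empirical strategy
        counts1[np.argmax(A @ y)] += 1          # player 1 best-responds to y
        counts2[np.argmax(x @ B)] += 1          # player 2 best-responds to x
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Matching pennies (zero-sum): empirical play converges to (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x, y = fictitious_play(A, -A)
print(np.round(x, 3), np.round(y, 3))
```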
2021
- [AAMAS] The Tight Bound for Pure Price of Anarchy in an Extended Miner’s Dilemma Game. Qian Wang and Yurong Chen. In AAMAS ’21: 20th International Conference on Autonomous Agents and Multiagent Systems, Virtual Event, United Kingdom, May 3–7, 2021.
Pool block withholding attack is performed among mining pools in digital cryptocurrencies, such as Bitcoin. Instead of mining honestly, pools can be incentivized to infiltrate their own miners into other pools. These infiltrators report partial solutions but withhold full solutions, share block rewards but make no contribution to block mining. The block withholding attack among mining pools can be modeled as a non-cooperative game called "the miner’s dilemma", which reduces effective mining power in the system and leads to potential systemic instability in the blockchain. However, existing literature on the game-theoretic properties of this attack only gives a preliminary analysis, e.g., an upper bound of 3 for the pure price of anarchy (PPoA) in this game, with two pools involved and no miner betraying. The pure price of anarchy is a measurement of how much mining power is wasted in the miner’s dilemma game. Further tightening its upper bound will bring us more insight into the structure of this game, so as to design mechanisms to reduce the systemic loss caused by mutual attacks. In this paper, we give a tight bound of (1, 2] for the pure price of anarchy. Moreover, we show that the tight bound holds in a more general setting, in which infiltrators may betray. We also prove the existence and uniqueness of pure Nash equilibrium in this setting. Inspired by experiments on the game among three mining pools, we conjecture that similar results hold in the N-player miner’s dilemma game (N ≥ 2).
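A minimal sketch of the quantity being bounded: the pure price of anarchy of a finite game, computed by enumerating pure Nash equilibria. This is the generic welfare-ratio definition (in the miner's dilemma, effective mining power plays the role of welfare); the toy prisoner's-dilemma payoffs below are hypothetical and are not the pools' revenue functions from the paper.

```python
def pure_price_of_anarchy(payoffs):
    """Pure price of anarchy of a finite game.

    payoffs[profile] is the tuple of players' payoffs at that pure profile;
    PPoA = (optimal social welfare) / (worst welfare over pure Nash equilibria).
    """
    profiles = list(payoffs)
    n = len(next(iter(payoffs.values())))
    strategies = [sorted({s[i] for s in profiles}) for i in range(n)]

    def is_pure_nash(s):
        # No player can gain by a unilateral deviation.
        return all(
            payoffs[s[:i] + (dev,) + s[i + 1:]][i] <= payoffs[s][i]
            for i in range(n) for dev in strategies[i]
        )

    welfare = {s: sum(payoffs[s]) for s in profiles}
    equilibria = [s for s in profiles if is_pure_nash(s)]
    return max(welfare.values()) / min(welfare[s] for s in equilibria)

# 2x2 prisoner's-dilemma-style example (hypothetical payoffs): PPoA = 6 / 2 = 3.
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}
print(pure_price_of_anarchy(payoffs))
```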