PAPERS
[dblp]
❈ indicates alphabetical author order
2024
- [Arxiv] Are Bounded Contracts Learnable and Approximately Optimal? ❈ Yurong Chen, Zhaohua Chen, Xiaotie Deng, and Zhiyi Huang. 2024.
This paper considers the hidden-action model of the principal-agent problem, in which a principal incentivizes an agent to work on a project using a contract. We investigate whether contracts with bounded payments are learnable and approximately optimal. Our main results are two learning algorithms that can find a nearly optimal bounded contract using a polynomial number of queries, under two standard assumptions in the literature: a costlier action for the agent leads to a better outcome distribution for the principal, and the agent’s cost/effort has diminishing returns. Our polynomial query complexity upper bound shows that standard assumptions are sufficient for achieving an exponential improvement upon the known lower bound for general instances. Unlike existing algorithms, which rely on discretizing the contract space, our algorithms directly learn the underlying outcome distributions. As for the approximate optimality of bounded contracts, we find that they could be far from optimal in terms of multiplicative or additive approximation, but satisfy a notion of mixed approximation.
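A minimal sketch of the hidden-action model behind this abstract, with hypothetical numbers: the agent best-responds to a posted contract (a payment per outcome, here bounded by 1), and a brute-force grid search stands in for the query-efficient learning algorithms studied in the paper, which avoid this kind of discretization.

```python
import itertools
import numpy as np

# Toy hidden-action instance (hypothetical numbers, for illustration only).
costs = np.array([0.0, 0.2, 0.5])           # agent's cost per action
outcome_dists = np.array([                   # P(outcome | action)
    [0.8, 0.2, 0.0],
    [0.4, 0.4, 0.2],
    [0.1, 0.3, 0.6],
])
rewards = np.array([0.0, 0.5, 1.0])          # principal's reward per outcome

def agent_best_response(contract):
    """The agent picks the action maximizing expected payment minus cost."""
    return int(np.argmax(outcome_dists @ contract - costs))

def principal_utility(contract):
    """Expected reward minus expected payment under the agent's best response."""
    a = agent_best_response(contract)
    return outcome_dists[a] @ (rewards - contract)

# Brute-force search over a grid of contracts with payments bounded by 1;
# the paper's algorithms replace this discretization with a polynomial
# number of best-response queries.
grid = np.linspace(0.0, 1.0, 11)
best = max(
    (principal_utility(np.array(t)), t)
    for t in itertools.product(grid, repeat=len(rewards))
)
print("best bounded contract on the grid:", best[1], "utility:", round(best[0], 3))
```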
- [Arxiv] Mechanism Design for LLM Fine-tuning with Multiple Reward Models. Haoran Sun, Yurong Chen, Siwei Wang, Wei Chen, and Xiaotie Deng. 2024.
Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. With aggregation algorithms advancing, a potential economic scenario arises where fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, which could affect the fine-tuned outcomes. This paper addresses such incentive issues by framing them as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We focus on training rules that maximize social welfare (SW-Max training rules). Firstly, we show that under most circumstances, truthful reporting is sub-optimal with a training rule alone, thereby highlighting the necessity of payments. Secondly, we design affine maximizer payment rules that implement SW-Max training rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence properties: for a training rule that satisfies these conditions, all payment rules that implement it in DSIC differ from each other only by a constant term independent of agents’ reports. Thirdly, we demonstrate that our mechanism is approximately DSIC even with perturbed input, showcasing its robustness against the inevitable errors in real-world applications. Experiments on real LLM setups further confirm the practical implications of our results.
2023
- [NeurIPS] A Scalable Neural Network for DSIC Affine Maximizer Auction Design. Zhijian Duan, Haoran Sun, Yurong Chen, and Xiaotie Deng. In Advances in Neural Information Processing Systems, 2023.
Automated auction design aims to find empirically high-revenue mechanisms through machine learning. Existing works on multi-item auction scenarios can be roughly divided into RegretNet-like and affine maximizer auctions (AMAs) approaches. However, the former cannot strictly ensure dominant strategy incentive compatibility (DSIC), while the latter faces scalability issues due to the large number of allocation candidates. To address these limitations, we propose AMenuNet, a scalable neural network that constructs the AMA parameters (even including the allocation menu) from bidder and item representations. AMenuNet is always DSIC and individually rational (IR) due to the properties of AMAs, and it enhances scalability by generating candidate allocations through a neural network. Additionally, AMenuNet is permutation equivariant, and its number of parameters is independent of auction scale. We conduct extensive experiments to demonstrate that AMenuNet outperforms strong baselines in both contextual and non-contextual multi-item auctions, scales well to larger auctions, generalizes well to different settings, and identifies useful deterministic allocations. Overall, our proposed approach offers an effective solution to automated DSIC auction design, with improved scalability and strong revenue performance in various settings.
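For context, the affine maximizer auction that AMenuNet parameterizes is simple to run once the allocation menu, bidder weights, and boosts are fixed: choose the menu entry maximizing the boosted weighted welfare and charge weighted-VCG payments. A minimal sketch with toy numbers (not learned parameters from the paper):

```python
import numpy as np

def affine_maximizer_auction(values, weights, boosts):
    """Run an affine maximizer auction (AMA) over a finite menu of allocations.

    values[i, a] : bidder i's reported value for candidate allocation a
    weights[i]   : positive bidder weight w_i
    boosts[a]    : allocation boost lambda_a
    Returns the chosen allocation index and each bidder's payment.
    """
    affine_welfare = weights @ values + boosts      # sum_i w_i v_i(a) + lambda_a
    a_star = int(np.argmax(affine_welfare))

    payments = np.zeros(len(weights))
    for i in range(len(weights)):
        welfare_without_i = affine_welfare - weights[i] * values[i]
        # Weighted-VCG payment: the externality bidder i imposes, rescaled by w_i.
        payments[i] = (welfare_without_i.max() - welfare_without_i[a_star]) / weights[i]
    return a_star, payments

# Toy menu with 2 bidders and 3 candidate allocations (hypothetical numbers).
values = np.array([[1.0, 0.2, 0.6],
                   [0.2, 0.9, 0.5]])
print(affine_maximizer_auction(values, weights=np.array([1.0, 1.0]), boosts=np.zeros(3)))
```

Because the chosen allocation maximizes an affine function of reports that is separable in each bidder's value, this family is DSIC and IR regardless of how the menu, weights, and boosts are produced, which is why learning them with a network preserves both properties.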
- [KDD] Learning-Based Ad Auction Design with Externalities: The Framework and A Matching-Based Approach. Ningyuan Li, Yunxuan Ma, Yang Zhao, Zhijian Duan, Yurong Chen, Zhilin Zhang, Jian Xu, Bo Zheng, and Xiaotie Deng. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
Learning-based ad auctions have increasingly been adopted in online advertising. However, existing approaches neglect externalities, such as the interaction between ads and organic items. In this paper, we propose a general framework, namely Score-Weighted VCG, for designing learning-based ad auctions that account for externalities. The framework decomposes the optimal auction design into two parts: designing a monotone score function and an allocation algorithm, which facilitates data-driven implementation. Theoretical results demonstrate that this framework produces the optimal incentive-compatible and individually rational ad auction under various externality-aware CTR models while being data-efficient and robust. Moreover, we present an approach to implement the proposed framework with a matching-based allocation algorithm. Experiment results on both real-world and synthetic data illustrate the effectiveness of the proposed approach.
- [ICML] Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets. Yurong Chen, Qian Wang, Zhijian Duan, Haoran Sun, Zhaohua Chen, Xiang Yan, and Xiaotie Deng. In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023.
In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal social welfare and discuss bidders’ incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.
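As a point of reference for the independent-bidding benchmark that coordinated bidding is measured against, here is a minimal sketch of budget-paced bidding in repeated second-price auctions using a standard multiplicative pacing (dual) update. The values, budgets, and learning rate are hypothetical, and this is the baseline dynamic, not the coordinated algorithms proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10_000, 3                       # rounds and bidders
budgets = np.array([800.0, 500.0, 300.0])
mu = np.zeros(n)                       # pacing multipliers (dual variables)
spent = np.zeros(n)
eps = 0.01                             # dual step size

for t in range(T):
    values = rng.uniform(size=n)
    bids = values / (1.0 + mu)         # paced bids
    # A bidder whose remaining budget cannot cover the bid sits out this round.
    bids = np.where(spent + bids <= budgets, bids, 0.0)
    winner = int(np.argmax(bids))
    price = np.partition(bids, -2)[-2]  # second-highest bid
    payments = np.zeros(n)
    if bids[winner] > 0:
        payments[winner] = price
        spent[winner] += price
    # Dual update: track the per-round budget target budgets / T on average.
    mu = np.maximum(0.0, mu - eps * (budgets / T - payments))

print("total spend per bidder:", np.round(spent, 1))
```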
- [Arxiv] Learning to Manipulate a Commitment Optimizer. ❈ Yurong Chen, Xiaotie Deng, Jiarui Gan, and Yuhao Li. 2023.
We consider a Stackelberg scenario where the leader commits optimally based on the follower’s type (i.e., the follower’s payoff function). Despite its rationality, such commitment-optimizing behavior inadvertently reveals information about the leader’s incentive, especially when one gets access to the leader’s optimal commitments against different follower types. In this paper, we study to what extent one can learn about the leader’s payoff information by actively querying the leader’s optimal commitments. We show that, by using polynomially many queries and operations, a learner can learn a payoff function that is strategically equivalent to the leader’s original payoff function, in the sense that it preserves: 1) the leader’s preference over fairly broad sets of strategy profiles and 2) the set of all possible (strong) Stackelberg equilibria the leader may engage in, considering all possible follower types. As an application, we show that a follower can use the learned information to induce an optimal Stackelberg equilibrium (w.r.t. the follower’s payoff) by imitating a different type, without knowing the leader’s payoff function beforehand. To the best of our knowledge, we are the first to extend this equilibrium-inducing problem to the incomplete-information setting.
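The object being queried here is the leader's optimal commitment, i.e., a strong Stackelberg equilibrium against the reported follower type. For bimatrix games this can be computed with the standard one-LP-per-follower-action approach, sketched below with toy payoff matrices (not taken from the paper); the learner in the paper only observes the resulting commitments, never the leader's matrix directly.

```python
import numpy as np
from scipy.optimize import linprog

def strong_stackelberg(L, F):
    """Strong Stackelberg equilibrium of a bimatrix game.

    L[i, j], F[i, j]: leader / follower payoffs when the leader plays row i
    and the follower plays column j. For each candidate follower response j,
    an LP finds the leader commitment x maximizing x @ L[:, j] subject to j
    being a follower best response; the best candidate wins.
    """
    m, n = L.shape
    best = (-np.inf, None, None)
    for j in range(n):
        # Incentive constraints: x @ (F[:, k] - F[:, j]) <= 0 for every column k.
        A_ub = np.array([F[:, k] - F[:, j] for k in range(n)])
        res = linprog(-L[:, j], A_ub=A_ub, b_ub=np.zeros(n),
                      A_eq=np.ones((1, m)), b_eq=[1.0], bounds=[(0, 1)] * m)
        if res.success and -res.fun > best[0]:
            best = (-res.fun, res.x, j)
    return best  # (leader value, leader commitment, follower response)

# Toy payoffs (hypothetical, not from the paper).
L = np.array([[2.0, 4.0],
              [1.0, 3.0]])
F = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(strong_stackelberg(L, F))
```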
2022
- [WINE, Best Student Paper] Optimal Private Payoff Manipulation Against Commitment in Extensive-form Games. ❈ Yurong Chen, Xiaotie Deng, and Yuhao Li. In Web and Internet Economics, 2022.
To take advantage of strategy commitment, a useful tactic of playing games, a leader must learn enough information about the follower’s payoff function. However, this leaves the follower a chance to provide fake information and influence the final game outcome. Through a carefully contrived payoff function misreported to the learning leader, the follower may induce an outcome that benefits him more than the one obtained when he behaves truthfully. We study the follower’s optimal manipulation via such strategic behaviors in extensive-form games. Followers’ different attitudes are taken into account. An optimistic follower maximizes his true utility among all game outcomes that can be induced by some payoff function. A pessimistic follower only considers misreporting payoff functions that induce a unique game outcome. For all the settings considered in this paper, we characterize all the possible game outcomes that can be induced successfully. We show that it is polynomial-time tractable for the follower to find the optimal way of misreporting his private payoff information. Our work completely resolves the follower’s optimal manipulation problem on extensive-form game trees.
- [IJCAI] On the Convergence of Fictitious Play: A Decomposition Approach. Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, and Yaodong Yang. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23–29 July 2022.
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are a mixture of both, and the convergence property of FP on such games has not been fully studied. In this paper, we extend the convergence results of FP to the combinations of such games and beyond. Specifically, we derive new conditions for FP to converge by leveraging game decomposition techniques. We further develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable. Finally, we analyze a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
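A minimal sketch of discrete-time fictitious play on a two-player bimatrix game, the learning dynamic whose convergence the paper studies. On the zero-sum matching pennies game below, the empirical frequencies approach the (1/2, 1/2) equilibrium; on games like the Shapley game mentioned above, they need not converge.

```python
import numpy as np

def fictitious_play(A, B, T=5000):
    """Discrete-time fictitious play in a bimatrix game with payoff matrices (A, B).

    Each round, both players best-respond to the empirical frequency of the
    opponent's past plays; returns the empirical mixed strategies.
    """
    m, n = A.shape
    counts1, counts2 = np.ones(m), np.ones(n)   # arbitrary initial play counts
    for _ in range(T):
        x = counts1 / counts1.sum()             # player 1's empirical strategy
        y = counts2 / counts2.sum()             # player 2's empirical strategy
        counts1[np.argmax(A @ y)] += 1          # player 1 best-responds to y
        counts2[np.argmax(x @ B)] += 1          # player 2 best-responds to x
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Matching pennies (zero-sum): empirical play converges to (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x, y = fictitious_play(A, -A)
print(np.round(x, 3), np.round(y, 3))
```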
2021
- [AAMAS] The Tight Bound for Pure Price of Anarchy in an Extended Miner’s Dilemma Game. Qian Wang and Yurong Chen. In AAMAS ’21: 20th International Conference on Autonomous Agents and Multiagent Systems, Virtual Event, United Kingdom, May 3–7, 2021.
Pool block withholding attack is performed among mining pools in digital cryptocurrencies, such as Bitcoin. Instead of mining honestly, pools can be incentivized to infiltrate their own miners into other pools. These infiltrators report partial solutions but withhold full solutions, share block rewards but make no contribution to block mining. The block withholding attack among mining pools can be modeled as a non-cooperative game called "the miner’s dilemma", which reduces effective mining power in the system and leads to potential systemic instability in the blockchain. However, existing literature on the game-theoretic properties of this attack only gives a preliminary analysis, e.g., an upper bound of 3 for the pure price of anarchy (PPoA) in this game, with two pools involved and no miner betraying. The pure price of anarchy is a measurement of how much mining power is wasted in the miner’s dilemma game. Further tightening its upper bound will bring us more insight into the structure of this game, so as to design mechanisms to reduce the systemic loss caused by mutual attacks. In this paper, we give a tight bound of (1, 2] for the pure price of anarchy. Moreover, we show that the tight bound holds in a more general setting, in which infiltrators may betray. We also prove the existence and uniqueness of pure Nash equilibrium in this setting. Inspired by experiments on the game among three mining pools, we conjecture that similar results hold in the N-player miner’s dilemma game (N ≥ 2).
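A minimal sketch of the quantity being bounded: the pure price of anarchy of a finite game, computed by enumerating pure Nash equilibria. This is the generic welfare-ratio definition (in the miner's dilemma, effective mining power plays the role of welfare); the toy prisoner's-dilemma payoffs below are hypothetical and are not the pools' revenue functions from the paper.

```python
def pure_price_of_anarchy(payoffs):
    """Pure price of anarchy of a finite game.

    payoffs[profile] is the tuple of players' payoffs at that pure profile;
    PPoA = (optimal social welfare) / (worst welfare over pure Nash equilibria).
    """
    profiles = list(payoffs)
    n = len(next(iter(payoffs.values())))
    strategies = [sorted({s[i] for s in profiles}) for i in range(n)]

    def is_pure_nash(s):
        # No player can gain by a unilateral deviation.
        return all(
            payoffs[s[:i] + (dev,) + s[i + 1:]][i] <= payoffs[s][i]
            for i in range(n) for dev in strategies[i]
        )

    welfare = {s: sum(payoffs[s]) for s in profiles}
    equilibria = [s for s in profiles if is_pure_nash(s)]
    return max(welfare.values()) / min(welfare[s] for s in equilibria)

# 2x2 prisoner's-dilemma-style example (hypothetical payoffs): PPoA = 6 / 2 = 3.
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}
print(pure_price_of_anarchy(payoffs))
```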