Summary: The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies
Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes and Richard Socher
In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade off economic equality and productivity. We propose a two-level deep reinforcement learning approach to learn dynamic tax policies, based on economic simulations in which both agents and a government learn and adapt. First, we present an economic simulation environment that features competitive pressures and market dynamics. We validate the simulation by showing that baseline tax systems perform in a way that is consistent with economic theory, including with regard to learned agent behaviors and specializations. Second, we show that AI-driven tax policies improve the trade-off between equality and productivity by 16% over baseline policies, including the prominent Saez tax framework. Third, we showcase several emergent features: AI-driven tax policies are qualitatively different from baselines, setting a higher top tax rate and higher net subsidies for low incomes. Moreover, AI-driven tax policies perform strongly in the face of emergent tax-gaming strategies learned by AI agents. Lastly, AI-driven tax policies are also effective when used in experiments with human participants: in experiments conducted on MTurk, an AI tax policy provides an equality-productivity trade-off similar to that provided by the Saez framework, along with higher inverse income-weighted social welfare.
Author contributions: D.P. drafted the manuscript; Kathy Baxter drafted the ethical review; R.S. planned and advised the work and analyzed all results; all authors discussed the results and commented on the manuscript.
Introduction
Economic inequality is accelerating globally and is a key social and economic concern.
Many studies have shown that large income-inequality gaps can have significant negative effects, leading for example to diminished economic opportunity [United Nations, 2013] and adverse health effects [Subramanian and Kawachi]. In this light, tax policy provides governments with an important tool to reduce inequality, supporting the redistribution of wealth through government-provided services and benefits. The basic tension is that while more taxation can improve equality, taxation can also discourage people from working, leading to lower productivity. Our work adopts baselines from optimal taxation theory, comparing the performance of the AI Economist with tax policies that arise from the Saez framework, in this case making use of estimated labor elasticities in our simulated economies. Another line of work explores the sample complexity of the problem of learning an optimal auction, typically focusing on simpler settings [Cole and Roughgarden, 2014, Morgenstern and Roughgarden, 2015, Balcan et al., 2016, Gonczarowski and Weinberg, 2018]. Earlier work studied the use of machine learning for the design of voting rules [Procaccia et al., 2009] and for matching and assignment problems [Narasimhan et al., 2016, Narasimhan and Parkes, 2016]. Two-stage problems also arise in multi-agent problems where the behavior of some agents is optimized in order to improve the overall system behavior [Dimitrakakis et al., 2017, Carroll et al., 2019, Tylkin et al., 2019]. Agents can trade resources. In effect, if one agent learns a useful new behavior for some part of the state space, this behavior becomes available to the other agents. At the same time, agent behaviors remain heterogeneous because they have different observations and hidden states.
We introduce the Gather-and-Build game, a two-dimensional grid world in which agents can move to collect resources, earn coins by using the resources of stone and wood to build houses, and trade with other agents to exchange resources for coins. Stone and wood stochastically spawn on special resource-regeneration tiles. Agents can move around the environment to gather these resources from populated resource tiles; a harvested tile remains empty until new resources spawn. Agents can choose to use one unit of wood and one unit of stone to construct a house, which places a house tile at the agent's location. The number of coins earned per house depends on the skill of an agent, and skill differs across agents. In addition, agents start at different initial locations in the world. These heterogeneities are the main driver of both economic inequality and specialization in our environment. Agents can also trade resources by submitting to an open market, for each of wood and stone, the number of coins they are willing to accept (an ask) or are willing to pay (a bid). We provide a detailed description of the environment and its underlying dynamics in the appendix (Section A). Over the course of an episode (a single play-out of the environment), agents accumulate labor cost, which reflects the amount of effort associated with the actions taken by the agent. Each type of action (moving, gathering, trading, and building) is associated with a specific labor cost, and each time an agent performs one of these actions, its accumulated labor is incremented by that action's cost. A conceptual view of the trade-off between productivity and equality for different tax policies is illustrated in Figure 5. Here, the notion of optimality implies that a tax policy realizes a trade-off between equality and productivity along the Pareto boundary linking these two extremes.
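The labor-accumulation rule above can be sketched in a few lines. This is a minimal illustration only: the cost values and names below are assumptions for the example, not the paper's calibrated parameters.

```python
# Illustrative per-action labor costs (assumed values, not the paper's).
LABOR_COST = {"move": 0.21, "gather": 0.21, "trade": 0.05, "build": 2.1}

class Agent:
    def __init__(self):
        self.labor = 0.0  # cumulative labor accrued over the episode

    def act(self, action_type):
        # Each action increments accumulated labor by that action's cost.
        self.labor += LABOR_COST[action_type]

a = Agent()
for action in ["move", "gather", "move", "build"]:
    a.act(action)
# a.labor now holds the episode's accumulated labor so far
```

In the full environment this accumulated labor enters each agent's utility as a cost term, so agents must weigh income earned against effort expended.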
To allow comparison across different schemes, we adopt income brackets for describing a tax schedule, imitating the US federal taxation scheme. The social planner sets the tax schedule T(z) by choosing the marginal tax rate τ ∈ [0, 1]^B to be applied within each of the B brackets. The objective of optimal tax theory is described through a social welfare function swf. Social welfare can be expressed in many ways; one approach considers the trade-off between income equality and productivity. Equality is defined over agents' wealth, with wealth defined as the cumulative number of coins owned by an agent after taxation and redistribution. Given this, eq = 1 implies perfect equality (all endowments of money are identical), while eq = 0 means perfect inequality (one agent owns all money). We write eqt(xct) and prodt(xct) to denote the equality and productivity, respectively, based on the cumulative endowment xct up to time t. The primary social welfare function that we consider in this work optimizes a trade-off between equality and productivity, defined as their product: swft(xct) = eqt(xct) · prodt(xct). Another family of social welfare functions, and one that receives attention in optimal taxation theory, is the family of linear-weighted sums of agent utilities, defined for weights ωi ≥ 0 as swft(xct, lt) = Σi ωi · ui(xci,t, li,t). Inverse income-weighted welfare sets ωi = 1/xci,t, which preferences the agents with lower endowments over those with higher endowments. A key benefit of our framework is that it is compatible with any social welfare function. For the purposes of comparing the performance of the AI Economist and other tax frameworks, we also adopt a variation on the second family of social welfare functions, where we adopt inverse income-weighted weights and consider agents' cumulative endowment at the end of an episode. In the inner loop, RL agents gain experience by performing labor, receiving income, and paying taxes, and learn, by balancing exploration and exploitation, how to adapt their behavior to maximize their utility.
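The equality-times-productivity objective can be made concrete as follows. This sketch assumes a Gini-based equality measure, eq = 1 − N/(N−1) · Gini, which is scaled so that eq = 1 for identical endowments and eq = 0 when a single agent owns everything, matching the endpoints described above; the function names are our own.

```python
import numpy as np

def gini(x):
    """Gini index of a non-negative endowment vector."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def equality(x):
    # eq = 1 - N/(N-1) * Gini: 1 for identical endowments,
    # 0 when one agent owns all coins (assumed normalization).
    n = len(x)
    return 1.0 - n / (n - 1) * gini(x)

def productivity(x):
    # Total coins produced across all agents.
    return float(np.sum(x))

def swf(x):
    # Primary objective: product of equality and productivity.
    return equality(x) * productivity(x)
```

For example, four agents with endowments [5, 5, 5, 5] give equality 1 and social welfare 20, whereas [0, 0, 0, 10] gives equality 0 and hence zero welfare under this objective, despite the same total productivity.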
Given a fixed tax policy, this is a standard RL problem in which agents iteratively explore and discover which behaviors are optimal for their fixed utility function, while observing the active tax schedule. In the Saez framework, G(z) represents how much the social welfare function weights the income above a threshold z. Effective agent behaviors can be hard to learn due to noisy feedback from an unconstrained, suboptimal planner that generates random tax rates. All tax models control the marginal tax rates applied to each of seven income brackets (see Figure 9, which illustrates the average bracket rate set by each model). We set up the economic simulation such that the fraction of agent incomes per income bracket is in rough alignment with those in the US economy. The 2018 US federal tax rates are progressive, with a marginal tax rate that increases with higher income. For the present setting, and with the social welfare objective that we adopt, the Saez tax framework mostly sets a regressive tax schedule, with a marginal tax rate that decreases with higher income. The AI Economist features a more idiosyncratic structure, with a blend of progressive and regressive tax schedules. In particular, it sets a higher top tax rate (on income above 510), a lower tax rate for incomes between 160 and 510, and both higher and lower tax rates on incomes below 160. The AI Economist's tax schedule provides higher subsidies to low-income agents than the baselines. The agents have different skill levels, and the learned behaviors, incomes, and amount of tax paid all depend heavily on skill. Figure 11 presents the agent-by-agent averages after sorting by skill. (We also find in our experiments that total redistribution, such that all workers have the same income after redistribution, yields perfectly equal but highly unproductive economies and very low equality-vs-productivity trade-offs.) Income before redistribution (top-left) shows the average pre-tax income earned by each kind of agent.
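Bracketed marginal taxation, as used by all of the tax models above, applies each bracket's rate only to the slice of income inside that bracket. The sketch below illustrates this with seven brackets; the cutoffs echo the income thresholds mentioned in the text (39, 160, 510 Coin), but the exact cutoffs and rates here are assumptions for the example, loosely mirroring the 2018 US federal structure.

```python
import numpy as np

# Assumed bracket lower edges (in Coin) and marginal rates; illustrative only.
CUTOFFS = np.array([0.0, 9.7, 39.0, 84.0, 160.0, 204.0, 510.0])
RATES = np.array([0.10, 0.12, 0.22, 0.24, 0.32, 0.35, 0.37])

def tax_due(z, cutoffs=CUTOFFS, rates=RATES):
    """Total tax T(z) on income z: each marginal rate applies only to
    the portion of income that falls within its bracket."""
    upper = np.append(cutoffs[1:], np.inf)
    # Income falling inside each bracket, clipped to the bracket width.
    in_bracket = np.clip(z - cutoffs, 0.0, upper - cutoffs)
    return float(np.sum(rates * in_bracket))
```

For an income of 50 Coin, this charges 10% on the first 9.7, 12% on the next 29.3, and 22% on the remaining 11, so raising one bracket's rate never changes the tax owed on income below that bracket.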
The amount of tax paid before redistribution is shown in the bottom left, and the amount of tax paid after redistribution in the bottom right (the lower-skilled agents receive a net subsidy). The income after redistribution (top-right) shows the net average coin per agent at the end of the episode (the lower-skilled agents have higher net income under the AI Economist's tax scheme). We see this kind of tax-avoidance behavior in our experiments for both the Saez and AI Economist models, which feature lower top tax rates (regressive schedules), making it more tax-efficient to earn high incomes. Each HIT consists of a sequence of four episodes, with a tutorial before each episode and a post-episode survey. Each group went through a sequence of four episodes, with each episode corresponding to a different tax policy (free market, US federal, Saez, and AI), applied in random order to control for learning effects. We observed large variance in productivity across episodes, which can be attributed to adversarial behavior and other factors that we discuss below. The "Camelback" model used in experiments with human participants features higher tax rates for incomes between 39 and 160 Coins compared to baselines, and this shapes the effective taxes payable as a function of income under the "Camelback" schedule. This evaluation objective places more weight on agents with lower endowments than those with higher endowments, considering agent endowments at the end of an episode, and thus the cumulative effect of tax policy over a sequence of ten tax periods. (This objective is related to the choice we make about the tax policy objective when instantiating the Saez framework, while deviating in a couple of important ways. First, the Saez framework considers economies with a single tax period and does not consider the effect of taxation policy on the cumulative endowment.)
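The inverse income-weighted evaluation objective described above can be written as a short function. This is a sketch under stated assumptions: the function name is ours, and we normalize the weights to sum to one so that scores are comparable across episodes (a choice the summary does not specify).

```python
import numpy as np

def weighted_swf(endowments, utilities=None):
    """Inverse income-weighted social welfare: sum_i w_i * u_i with
    w_i proportional to 1 / x_i, so lower-endowment agents count more.
    If no separate utilities are given, endowments stand in for them."""
    x = np.asarray(endowments, dtype=float)
    u = x if utilities is None else np.asarray(utilities, dtype=float)
    w = 1.0 / x
    w = w / w.sum()  # normalization: an assumption made for comparability
    return float(np.sum(w * u))
```

With endowments [1, 3], the poorer agent receives weight 0.75 and the richer 0.25, so transfers toward low-endowment agents raise this objective even when total endowment is unchanged.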
Social outcomes with 58 human participants in 51 episodes (first-batch episodes with productivity of at least 1000 Coin): the AI Economist achieves competitive equality-productivity trade-offs with Saez and US Federal, and statistically significantly outperforms the free market (at p = 0.05). Weighted average social welfare with 51 human participants in 60 episodes (second-batch episodes with productivity of at least 1000 Coin): the AI Economist achieves significantly higher weighted social welfare than all baselines (statistically significant at p = 0.05). We can see that the "Camelback" tax schedule significantly outperforms all baselines for this social welfare objective. Overall, the relative performance of the AI Economist compared with the various baselines is similar for the experiments with AI agents and the experiments with human participants. The AI-driven tax model did not require knowledge of economic theory, did not require that we estimate the tax elasticity of labor, and was nevertheless able to learn, tabula rasa, a well-performing tax policy for use with human participants. Moreover, given that the AI tax policy, which is dynamic in that its tax schedule changes across tax periods, substantially outperforms the Saez formula in the AI simulations, an interesting direction for future research is to develop experiments that can inform how dynamic tax models can be applied to human settings. If an agent moves to a cell that contains a resource, that resource is added to the agent's inventory and removed from the world at that location. At the start of each timestep, resources randomly re-spawn at empty source cells according to the regeneration probability.
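The respawn rule at the end of the passage above can be sketched as follows. The array layout, parameter name `regen_prob`, and its default value are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def respawn(resource_map, source_mask, regen_prob=0.01):
    """At the start of a timestep, empty source cells regenerate a
    resource independently with probability `regen_prob`.

    resource_map: HxW int array of per-cell resource counts.
    source_mask:  HxW bool array marking regeneration tiles.
    """
    empty_sources = source_mask & (resource_map == 0)
    regen = rng.random(resource_map.shape) < regen_prob
    resource_map[empty_sources & regen] += 1
    return resource_map
```

Only cells that are both source tiles and currently empty are eligible, which matches the description that harvested tiles stay empty until a new resource spawns.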
The state of the world is represented as an H × W × C tensor, where H and W are the size of the world and C is the number of unique entities that may occupy a cell; the value of a given element indicates that a particular entity occupies the associated location. The social planner is able to observe the full world-state tensor, while agent observations are restricted to views of the state tensor from a narrower, egocentric spatial window. Agent spatial observations are padded as needed when their observation window extends beyond the world grid. Agents must navigate the world in order to collect resources and build new houses. The action space of the agents includes four actions for moving up, down, left, and right. Agents are restricted from moving onto water cells, cells occupied by other agents, and cells containing houses built by other agents. An agent collects resources by moving itself on top of a resource-populated source cell. Building places a house at the location occupied by the agent and adds coin to the agent's inventory, the amount of which is determined by its building skill. On the first timestep of each tax period, the planner sets the marginal tax rates that will be used to collect taxes when the tax period ends. The main experiment's graphical user interface (Figure 20) showed, from top to bottom: the endowment of the agent, the remaining time in the episode, the bonus amount earned so far, the spatial state of the world, the tax information and current tax rate, the time left, the number of houses built, and the number of profitable houses left to build.
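The egocentric, padded observation window described above can be implemented as a crop of a zero-padded copy of the world tensor. This is a minimal sketch; the function name, square window shape, and zero padding value are assumptions.

```python
import numpy as np

def egocentric_view(world, row, col, radius):
    """Extract a (2r+1) x (2r+1) x C egocentric window centered on the
    agent at (row, col), zero-padding wherever the window extends
    beyond the H x W world grid."""
    H, W, C = world.shape
    padded = np.zeros((H + 2 * radius, W + 2 * radius, C), dtype=world.dtype)
    padded[radius:radius + H, radius:radius + W] = world
    r, c = row + radius, col + radius  # agent position in padded coordinates
    return padded[r - radius:r + radius + 1, c - radius:c + radius + 1]
```

An agent standing in a corner sees zeros on the out-of-bounds side of its window, while the planner simply consumes the full H × W × C tensor without cropping.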