Generative Agents: Interactive Simulacra of Human Behavior

Abstract:
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In
this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while
authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them
dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five
agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture—observation, planning, and reflection—each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
CCS Concepts:

• Human-centered computing → Interactive systems and tools; • Computing methodologies → Natural language processing.

Keywords:
Human-AI interaction, agents, generative AI, large language models.

Introduction:

How might we craft an interactive artificial society that reflects believable human behavior? From sandbox games such as The Sims to applications such as cognitive models [23] and virtual environments [10, 59], for over four decades, researchers and practitioners have envisioned computational agents that can serve as believable proxies of human behavior. In these visions, computationally powered agents act consistently with their past experiences and react believably to their environments. Such simulations of human behavior could populate virtual spaces and communities with realistic social phenomena [27, 80], train people on how to handle rare yet difficult interpersonal situations [44, 52, 94], test social
science theories [12, 46], craft model human processors for theory and usability testing [23, 39, 51], power ubiquitous computing applications [31] and social robots [10, 14], and underpin non-playable
game characters [59, 85] that can navigate complex human relationships in an open world.
However, the space of human behavior is vast and complex [85, 108]. Despite striking progress in large language models [18] that can simulate human behavior at a single time point [39, 80], fully
general agents that ensure long-term coherence would be better suited by architectures that manage constantly-growing memories as new interactions, conflicts, and events arise and fade over time
while handling cascading social dynamics that unfold between multiple agents. Success requires an approach that can retrieve relevant events and interactions over a long period, reflect on those
memories to generalize and draw higher-level inferences, and apply that reasoning to create plans and reactions that make sense in the moment and in the longer-term arc of the agent’s behavior.
In this paper, we introduce generative agents—agents that draw on generative models to simulate believable human behavior—and demonstrate that they produce believable simulacra of both individual and emergent group behavior. Generative agents draw a wide variety of inferences about themselves, other agents, and their environment; they create daily plans that reflect their characteristics and experiences, act out those plans, react, and re-plan when appropriate; they respond when the end user changes their environment or commands them in natural language. For instance,
generative agents turn off the stove when they see that their breakfast is burning, wait outside the bathroom if it is occupied, and stop to chat when they meet another agent they want to talk to.
A society full of generative agents is marked by emergent social dynamics where new relationships are formed, information diffuses, and coordination arises across agents. To enable generative agents, we describe an agent architecture that stores, synthesizes, and applies relevant memories to generate
believable behavior using a large language model. Our architecture comprises three main components. The first is the memory stream, a long-term memory module that records, in natural language, a
comprehensive list of the agent’s experiences. A memory retrieval model combines relevance, recency, and importance to surface the records needed to inform the agent’s moment-to-moment behavior.
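
To make this retrieval model concrete, here is a minimal Python sketch. The Memory record, the equal weighting of the three scores, the 1-10 importance scale, and the 0.995 per-hour decay are our own illustrative assumptions, not details taken from this section.

    from dataclasses import dataclass
    import math

    @dataclass
    class Memory:
        """One natural-language record in the agent's memory stream."""
        text: str             # e.g., "Isabella is planning a party at the cafe"
        created: float        # sandbox-time timestamp when the record was written
        last_access: float    # sandbox time this memory was last retrieved
        importance: float     # poignancy on an assumed 1-10 scale (e.g., rated by the LLM)
        embedding: list       # embedding vector of `text`, used for relevance

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(stream, query_embedding, now, k=5, decay=0.995):
        """Rank the memory stream by recency, importance, and relevance; return the top k."""
        def score(m):
            hours = (now - m.last_access) / 3600.0
            recency = decay ** hours                         # exponential decay since last access
            importance = m.importance / 10.0                 # normalize the 1-10 rating to [0, 1]
            relevance = cosine(m.embedding, query_embedding)
            return recency + importance + relevance          # equal weights, for illustration only
        top = sorted(stream, key=score, reverse=True)[:k]
        for m in top:
            m.last_access = now                              # retrieval refreshes recency
        return top
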
The second is reflection, which synthesizes memories into higher-level inferences over time, enabling the agent to draw conclusions about itself and others to better guide its behavior. The third is
planning, which translates those conclusions and the current environment into high-level action plans and then recursively into detailed behaviors for action and reaction. These reflections and
plans are fed back into the memory stream to influence the agent’s future behavior. This architecture suggests applications in multiple domains, from role-play and social prototyping to virtual worlds and games. In social role-play scenarios (e.g., interview preparation), a user could safely rehearse difficult, conflict-laden conversations. When prototyping social platforms, a designer could go beyond temporary personas to prototype dynamic, complex interactions that unfold over time. For this paper, we focus on the ability to create a small, interactive society of agents inspired by games such as The Sims.2
By connecting our architecture to the ChatGPT large language model [77], we manifest a society of twenty-five agents in a game environment. End users can observe and interact with these agents.
If an end user or developer wanted the town to host an in-game Valentine’s Day party, for example, traditional game environments would require scripting tens of characters’ behavior manually. We
demonstrate that, with generative agents, it is sufficient to simply tell one agent that she wants to throw a party. Despite many potential points of failure—the party planner must remember to invite other agents to the party, attendees must remember the invitation, those who remember must decide to actually show up, and more— our agents succeed. They spread the word about the party and then show up, with one agent even asking another on a date to the party, all from a single user-generated seed suggestion.
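
Reusing the hypothetical Memory record from the earlier sketch, the entire user intervention could plausibly amount to one appended memory; the agent name and the wording of the seed below are illustrative.

    # One user-authored statement is the whole intervention; everything that
    # follows (invitations, dates, attendance) emerges from the agents themselves.
    seed = ("Isabella wants to throw a Valentine's Day party at the cafe "
            "on February 14th from 5 to 7pm.")
    isabella.stream.append(Memory(text=seed, created=now, last_access=now,
                                  importance=8.0,        # assumed high poignancy for a goal
                                  embedding=embed(seed)))
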
We conducted two evaluations of generative agents: a controlled evaluation to test whether the agents produce believable individual behaviors in isolation, and an end-to-end evaluation where the agents interacted with each other in open-ended ways over two days of game time to understand their stability and emergent social behaviors. In the technical evaluation, we leverage a methodological opportunity to evaluate an agent’s knowledge and behavior by “interviewing” it in natural language to probe its ability to stay in character, remember, plan, react, and reflect accurately.
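
A sketch conveys the flavor of these interviews; the probe categories echo the abilities listed above, but the question wording and the harness are paraphrases we supply, not the paper's exact protocol.

    # One probe per capability the interviews target; answers are generated in
    # character and would then be judged by human raters for believability.
    INTERVIEW_PROBES = {
        "self-knowledge": "Give an introduction of yourself.",
        "memory":         "Who did you talk to yesterday, and about what?",
        "plans":          "What will you be doing at 6am tomorrow?",
        "reactions":      "Your breakfast is burning! What do you do?",
        "reflections":    "If you could spend time with one person you met "
                          "recently, who would it be and why?",
    }

    def interview(agent, llm):
        """Ask each probe in character and collect the answers for rating."""
        return {topic: llm(f"{agent.summary}\nInterviewer: {probe}\n{agent.name}:")
                for topic, probe in INTERVIEW_PROBES.items()}
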
We compared several ablations that limit agents’ access to memory, reflection, and planning, and observed that each of these components is critical to strong performance across these interview tasks. Across the technical and end-to-end evaluations, the most common errors arose when the agent failed to retrieve relevant memories, fabricated embellishments to its memory, or inherited overly formal speech or behavior from the language model. In sum, this paper makes the following contributions:

  • Generative agents, believable simulacra of human behavior that are dynamically conditioned on agents’ changing experiences and environment.
  • A novel architecture that makes it possible for generative agents to remember, retrieve, reflect, interact with other agents, and plan through dynamically evolving circumstances. The architecture leverages the powerful prompting capabilities of large language models and supplements those capabilities to support longer-term agent coherence, the ability to manage dynamically evolving memory, and the recursive production of higher-level reflections.
  • Two evaluations, a controlled evaluation and an end-to-end evaluation, that establish causal effects of the importance of the architecture’s components and identify breakdowns arising from, e.g., improper memory retrieval.
  • A discussion of the opportunities and the ethical and societal risks of generative agents in interactive systems. We argue that these agents should be tuned to mitigate the risk of users forming parasocial relationships, logged to mitigate risks stemming from deepfakes and tailored persuasion, and applied in ways that complement rather than replace human stakeholders in design processes.
