Most disaster operations require responder teams to plan and conduct geographically distributed tasks (eg, digging out casualties or transporting civilians) with limited resources and personnel—a timely response may be critical to save lives.1
Deciding when and how to use available resources in such a setting can be described as a “distributed resource allocation problem under temporal constraints”2; to that end, multi‐agent task allocation algorithms have been devised and tested in computational simulations of such tasks.2-4 These algorithms can be used to build automated planning agents that can perform complex calculations much faster than humans (eg, computing paths and optimising team configurations). However, these algorithms necessarily depend on abstracted models of the environment and human behaviour, which might lead to task allocations that are flawed in practice, owing to the contingent nature of situated action.5
We might conjecture that a human coordinator working together with the planning agent could notice and help to deal with such emergent problems. One way to achieve this collaboration is to place a human coordinator “in‐the‐loop” between the planning algorithm and the human responders in the physical world. A variation of in‐the‐loop is “on‐the‐loop,” in which the role of the human coordinator is less involved, perhaps best described as that of a supervisor, rather than a deciding authority. Our work studies such interactional arrangements with the goal of enabling efficient interaction and collaboration between humans and agents.
To explore the sociotechnical interactional challenges related to these human‐agent arrangements, we developed a technology probe in the form of a mixed‐reality game called AtomicOrchid.6 Mixed‐reality games bridge the physical and the digital world7; they make use of pervasive technologies such as smart phones, wireless technologies, and sensors with the aim of blending game events into a real world environment.8 They have served as a vehicle to study distributed collaborative interactions across multiple devices and ubiquitous computing environments in the wild.9 In AtomicOrchid, players in the role of field responders and headquarters (HQ) coordinators have to collaborate to save spatially distributed targets from a spreading radioactive cloud. Following an ethnomethodological orientation,10 this setting makes available the observable and reportable team interaction with and around the planning support system in a disaster scenario for direct observation of activity.
In this paper, we report on 1 field trial of an on‐the‐loop arrangement and another field trial of an in‐the‐loop arrangement. We investigate sociotechnical issues that arise in relation to automated planning support with the on‐the‐loop and the in‐the‐loop interaction design. Interaction analysis11 is conducted based on log data and video recordings of field observations, revealing how human‐agent interaction is embedded in social interaction.
We provide 3 contributions in this paper. First, we demonstrate a field trial‐driven methodology used to reveal sociotechnical issues in relation to computational planning support. Second, we present findings that suggest mixed‐initiative designs that place humans in‐the‐loop may be preferable in situations with unforeseen contingencies. Third, we identify key design lessons in relation to critical mixed‐initiative features such as common ground between agent and humans and mutual awareness in planning.
In Section 2, we review related work and our approach. We then describe the scenario and design iterations including summary results from the field trial of the base version in Section 3. We then present the field trial of the on‐the‐loop version in Section 4 and the in‐the‐loop version in Section 5. The presented episodes of interaction serve to identify and discuss a range of key issues around the themes of division of labour, planning support, and field trial‐driven development in Section 6. Finally, we conclude by summarising the lessons learnt for supporting common ground and mixed‐initiative planning for designers of distributed coordination systems in Section 7.
We briefly review how our approach builds on related work on planning in disaster response, both from the point of view of computational optimisation on the one hand and empirical studies of command‐and‐control settings and computer‐supported cooperative work (CSCW) systems that support workflow management on the other hand.
We also briefly review the relevant literature concerned with “interactive automation” at the intersection of interface agents and user‐interface design and outline how it relates to our mixed‐reality game probe to study agent‐assisted collaboration.
One major concern for task planning in disaster response is how to efficiently allocate limited resources to multiple spatially distributed incidents under time pressure. To address such coordination challenges in operations, a number of multi‐agent planning algorithms have been developed to computationally support planning in time‐critical task settings.2-4 While these algorithms can rapidly compute optimal routes and model and predict certain environmental variables (eg, wind speed and fire spreading), they typically ignore the physical and cognitive characteristics of human field responders, such as human psychosocial condition, movement, and learning ability12 and stress, fear, exertion, or panic.13 Hence, a key motivation in our work is to create a setting in which participants experience physical exertion and stress through bodily activity and time pressure to increase confidence in the veracity of observations.6 Specifically, we adopt a serious mixed‐reality game approach to study how spatially distributed responders coordinate in a time‐critical task setting.14
Furthermore, sociotechnical studies of command and control settings (eg, in disaster response,15 the London Underground,16 and air traffic control17) have revealed the complex ways in which interaction with physical and digital (or electronic) resources is embedded in face‐to‐face social interaction in the control room and have argued that taking the social organisation of the cooperative work setting into account is crucial for success.17 Further empirical studies of CSCW systems have shown that it is vital to study technology in use to understand potential tensions raised for teamwork. In particular, field studies of workflow support systems have revealed that technologies can disrupt smooth workflow if they are not designed in a socially acceptable way.18, 19 This paper follows the tradition of the empirical CSCW studies to investigate interaction and cooperative work in situ, to identify implications for technology support.
A review of the interaction design literature yields studies that have found that the potential benefits of automation support may not always be realised and can be offset by unwanted consequences.20, 21 These negative consequences can include over‐reliance on automation, loss of situation awareness, and loss of skills needed to perform the automated functions manually in case of automation failure.22 It is this recognition of the potential problems of automation that raises important challenges for the design of the interface(s) between the human and the computational support.
To this end, one significant design strategy is “mixed‐initiative,” which refers to a flexible interaction strategy where both human and software agent can contribute to the task, with each party contributing to the task according to its strengths.23 In the most general case, each party's role is not pre‐determined but opportunistically negotiated as the problem is being solved. So at one time, the software agent might have the initiative, controlling the interaction while the human “monitors” the execution (ie, the human is on‐the‐loop), while at other times the human may drive the interaction, with the software agent in a supporting role (ie, the human is in‐the‐loop).24 A number of algorithms, interfaces, and applications have been devised that facilitate mixed‐initiative planning and control.25, 26
In this work, to study the interactional challenges that arise in these arrangements, we integrated a planning agent in the AtomicOrchid game probe. To study how on‐the‐loop and in‐the‐loop arrangements play out in practice, the interaction layer between players and agents can be configured in different ways through modifications to the game interface. Through agent integration and iterative interface design, we created 3 versions of “probes,” 2 of which we evaluate in this paper in some depth.
The mixed‐reality game probe is used to conduct observational studies, which allow us to unpack human‐system interaction in the different interactional arrangements (on‐the‐loop vs in‐the‐loop). Our foremost analytic orientation is ethnomethodology,10 a perspective that focuses on the accomplishment of practical action and practical reasoning by the members of a setting. Specifically, we use interaction analysis to unpack naturally occurring talk and activity, with the aim of uncovering and describing something of the order and organisation by which people interact with each other and with the things around them.11
Our interest in this paper is how sociotechnical interaction is organised around the computational planning support; hence, our focus is both on the action on the ground, as well as in the control room. We recorded both system logs and video of interaction in the field for analysis. To capture the distributed, concurrent nature of the interaction, 4 researchers with camcorders shadowed the field player teams and 1 researcher recorded the action in the HQ. A replay tool was used to synchronise and analyse triangulated game events, player positions, and concurrent video recordings. These were then catalogued to identify key decision points in teaming and task allocation, which served to index sequences (episodes) of interest (cf Heath et al27). Interesting distinct units of interaction were then transcribed and triangulated with log files and field video for deeper analysis; the results of which we present in this paper.
In this section, we outline the system design of the mixed‐reality game probe AtomicOrchid. We created AtomicOrchid to study team coordination, interaction, and communication in a disaster scenario. In brief, AtomicOrchid simulates a radioactive incident. Participants of the game play both the role of responders “on the ground,” and coordinators in the control room. The interactive system provides situation awareness capabilities that enable monitoring of players, tasks, radioactivity, and communication via text messaging. A planning agent is integrated into the system to support the teaming and task allocation of field responders.
In this section, we outline the game scenario, the iterative development rationale, a description of the planning agent integrated into the system, and we provide some more detail on the system evolution, including functionality and interface description.
The game, AtomicOrchid, is a location‐based game built on the fiction of an explosion that creates an expanding and moving cloud of radioactive gas. Most of the players are on the ground and play the role of first responders; we refer to these as “field players”. Two players are based in a nearby HQ and play the role of coordinators. Within the physical game area there are several “targets” and a small number of “safe zones.” The goal of the game is for the field players to evacuate as many targets as possible to the safe zone(s) before the radiation cloud covers the playing area. Field players have limited “health,” which declines when they are in or near the virtual radiation cloud. If they are exposed to too much radiation, field players will become “incapacitated” (die). Field players need to communicate frequently with HQ, as only HQ can see the entire cloud, while field players only have a numeric “reading” for their current location.
Within the game each field player is assigned a specific type or role: medic, transporter, soldier, or fire fighter. Each target also has a specific type (animal, fuel, uranium, and victim) and can only be evacuated by a 2‐person team with the right combination of roles. For example, a soldier and a transporter are required to pick up and carry fuel to safety. One of the key challenges of the game is therefore to form appropriate transient 2‐person teams of field players to evacuate specific targets.
We progressively developed and refined AtomicOrchid and the planning agent support in 3 iterations. Each version focused on supporting a particular relation of the interactional arrangements (see Figure 1). In the first iteration, we developed a base version of coordination support without integrating a planning agent. The system's design focus is on supporting the collaboration between and among field responders and HQ by providing real‐time text messaging and “situational awareness” interfaces, eg, real‐time monitoring of players, tasks, and cloud. In the on‐the‐loop version, we integrated a planning agent into the system, focused on supporting the field responders directly. The planning agent automatically generates a plan and allocates tasks to field players (hence, the HQ is merely on‐the‐loop). The third (in‐the‐loop) version is aimed at providing a stronger role for the HQ, by providing an interface that lets the control room mediate between planning agent and field responders. For each version, field trials are conducted and analysed; the findings of the first and second versions have then been turned into design implications for the following version.
Figure 1. Interactional arrangement in AtomicOrchid and the focus of each version
In the field trials of the on‐the‐loop and the in‐the‐loop version, the player teams are supported by a software agent that acts as a “planner”; this is in contrast to the base version,28 in which the field responders and HQ were entirely responsible for planning. The planning agent assigns evacuation tasks to field responders by making use of locations of targets and safe zones, a predictive model of the radiation cloud and the current location and health of field responders to minimise their travelling distance and maximise the number of targets rescued. A plan produced by the planning agent is a set of “task assignments,” ie, a request for 2 specific field players (with particular roles) to evacuate a certain target to a specific safe zone. In the on‐the‐loop version the agent's plan is communicated directly to the field players. In the in‐the‐loop version, the agent's plan is initially made available only to the HQ players; they can check the plan and edit it if they wish; once HQ has approved the allocations they are sent to the field players.
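The difference between the two arrangements can be sketched as a small data model. The following is a minimal illustration only, not the AtomicOrchid implementation: the class name, field names, and the `hq_review` callback are hypothetical, and the safe zone identifiers are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TaskAssignment:
    """One request: two field players evacuate a target to a safe zone."""
    players: Tuple[str, str]  # initials of the two field players (hypothetical field)
    target_id: int            # numeric target id, as used in the episodes
    safe_zone_id: int         # hypothetical identifier for the drop-off zone


Plan = List[TaskAssignment]


def deliver_on_the_loop(plan: Plan) -> Plan:
    """On-the-loop: the agent's plan is sent to field players unchanged."""
    return list(plan)


def deliver_in_the_loop(plan: Plan, hq_review: Callable[[Plan], Plan]) -> Plan:
    """In-the-loop: HQ checks (and may edit) the plan before it is sent."""
    return list(hq_review(list(plan)))
```

In this sketch, the only difference between the arrangements is whether a human review step sits between the planner's output and the field players.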
Following the mixed‐initiative principles set out in Section 2.2, the design rationale is to augment rather than to replace human decision making, where each party contributes to the task according to its strength. Therefore, the human retains the capability to reject the agent's task assignments. This acknowledges the uniquely human ability, unavailable to the agent, to deal with contingencies that arise in the course of action (eg, humans may be tired, or they may have encountered a road block). Note that for a plan that involves multiple responders coordinating to perform a task, having only one of the responders reject the plan means that the allocation of other responders has to be recomputed from scratch to preserve the efficiency of the planning process. Doing so can be computationally time consuming. We propose a solution to this in what follows.
To provide more technical detail, the planning agent runs a real‐time multi‐agent coordination algorithm to solve the coordination problem in 2 steps: (1) task assignment and (2) path planning. The algorithm models the coordination problem in AtomicOrchid using a Multi‐Agent Markov Decision Process (MMDP). The goal of solving MMDPs is to find the optimal policy that maximizes the number of completed tasks with minimum costs, although due to the large state space and the real‐time requirement, a working solution can only be approximate.29 The model not only takes into account environmental parameters (locations, distances, cloud, etc) and actor parameters (responder role, health, etc) but also whether tasks have been rejected. In more detail, our algorithm computes a set of plans conditioned on all possible plan rejections from the responders (ie, combinations of rejections from individual responders), which reflect responders' preference for the plan. If the current plan is rejected, an alternative plan will be selected based on the set of rejections received. To compute such plans, our algorithm applies a 2‐pass planning process. In the first pass, the best policy for the underlying MMDP without rejections is computed, and in the second pass, the rejections are handled using the policy computed by the first pass. By doing so, the planner agent can quickly respond to the rejection event and generate a better plan that is more acceptable to the responders. Further technical details of the planning agent can be found elsewhere.29, 30
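The idea of holding a plan ready for every possible combination of rejections can be illustrated with a simple lookup table. This is a deliberately simplified sketch and not the MMDP algorithm itself: it does not model policies, costs, or the cloud, and `toy_plan_without`, `POOL`, and the player identifiers are hypothetical stand-ins for the planner and its inputs.

```python
from itertools import chain, combinations


def rejection_sets(team):
    """All non-empty combinations of team members who might reject."""
    items = sorted(team)
    return chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1)
    )


def precompute_fallbacks(team, plan_without):
    """Map each possible set of rejecting responders to a fallback team.

    plan_without is a hypothetical planner callback: given the responders
    who rejected, it returns a team that avoids them.
    """
    return {
        frozenset(rejecting): plan_without(frozenset(rejecting))
        for rejecting in rejection_sets(team)
    }


# Toy stand-in for the planner: pick the first available soldier and
# transporter (the pairing the scenario requires for a fuel target).
POOL = {"S1": "soldier", "S2": "soldier", "T1": "transporter", "T2": "transporter"}


def toy_plan_without(rejecting):
    soldiers = [p for p in sorted(POOL) if POOL[p] == "soldier" and p not in rejecting]
    transporters = [p for p in sorted(POOL) if POOL[p] == "transporter" and p not in rejecting]
    return (soldiers[0], transporters[0])


fallbacks = precompute_fallbacks(("S1", "T1"), toy_plan_without)
```

When a rejection arrives, selecting the alternative is then a dictionary lookup on the set of rejecting responders rather than a fresh planning run, which is the property that allows a quick response to a rejection event.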
The system design, in particular the interfaces between the human team and the planning agent, has evolved through the 3 iterations described. We only have space to briefly summarise the results from the field trial of the first iteration (the baseline version without the planning agent), the details of which have been presented elsewhere.28
In the base version of AtomicOrchid without a planning agent, the HQ is manned by 2 to 3 coordinators. All of the coordinators are provided with a Web‐based coordination interface. The interface gives them an overview of the game status and enables them to communicate with the field responders who carry a phone running the mobile responder app. The user interfaces are similar to the interfaces shown in Figure 2, but without the agent/task allocation elements.
Figure 2. HQ and mobile interfaces in the on‐the‐loop version
We ran 2 AtomicOrchid game sessions to field‐trial the base version. The size of the game area on the local university campus is 400 by 400 m, with little traffic. The terrain of the game area includes grassland, a lake, buildings, roads, footpaths, and lawns. There are 2 drop‐off zones and 16 targets. An earlier pilot study showed that this was a challenging, yet not overwhelming number of targets to collect in a 30‐minute game session. There were 4 targets for each of the 4 target types. The pattern of cloud movement and expansion was the same for both game sessions.
The results of interaction analysis from video recordings of game action showed that team planning was dominated by local (face‐to‐face) coordination between field players in a situated manner. The field players teamed up with their teammates and selected tasks by using available resources such as local conversation, the mobile interface, and messaging remote players. The HQ was observed to successfully provide awareness of the “danger zone” to the field teams through remote messages. However, HQ had little direct influence on the planning and actions of field teams. One potential reason could be the lack of communication between HQ and field responders. The observations led to a set of design requirements to improve the usability of the system:
These requirements have been taken into account in the development of the on‐the‐loop version.
In the second version, the game interfaces were modified according to the design requirements generated from field‐trialling the base version (see Figure 2). First, messages in the messaging interface are appended with timestamps to allow players to identify their freshness. Second, targets on the digital maps are marked with a unique task number to ease geo‐referencing. Third, a feedback system is built into AtomicOrchid to assist quick acknowledgement. The feedback system is part of the integration of the planning agent, which is detailed in the following section.
As can be seen in Figure 2, the majority of the HQ dashboard is occupied by a map‐based presentation of the current game status. Roles and locations of field responders are represented on the map as icons. The field responders can be uniquely identified by their initials shown on the icons. The target types and locations are also shown as icons on the map. Location and intensity of the radioactive cloud are indicated by a heatmap. Health status (health value ranges from 0 to 100) of the field responders is displayed on the top‐right panel. A chatbox at the bottom right for HQ allows browsing, composing, and sending messages. The messaging system follows a broadcasting model: Everyone can send messages to 1 public channel, and the messages are visible to every player through the mobile and HQ interface. The agent's team‐task allocations can be shown visually at the click of a button.
Field responders are equipped with a mobile responder app providing them with sensing and awareness capabilities (also Figure 2). There are 3 tabs in the responder app. The “map” tab displays a map showing locations of field responders and targets, which is similar to the map on the HQ interface, except that the cloud is not shown. The radiation level of the players' current location is displayed as a Geiger counter reading (shown as a number on the top left of the screen), which ranges from 0 to 100. Health status of the field responder is indicated by a health bar on the right side of the Geiger counter. The chatbox (similar to the one on HQ interface) is placed on the “message” tab for the field player to receive and send messages. Finally, the “tasks” tab shows the agent's task allocations.
Apart from improvements in interface usability, crucially, we integrated a planning agent into the AtomicOrchid platform in the on‐the‐loop version. The planner (described above in Section 3.3) is deployed on a separate server, which exposes an HTTP interface for AtomicOrchid to request plans. Each plan request issued by AtomicOrchid is appended with updated game status, which includes players' health, distribution of the radioactive cloud, and locations of players and targets. On the basis of the updated game status, the planner will produce an optimised task allocation and return it to AtomicOrchid. The plan requests are triggered frequently in game sessions so that the task allocation can be frequently adjusted according to task execution status. In this version, plan requests (and thus replanning) are triggered by 2 kinds of game events:
On receiving an instruction from the planner, the field responder can choose to either reject or accept the instruction in the “tasks” tab of the app, the rationale for which is detailed above in Section 3.3. In the case of rejection, a new plan will be requested and the agent will take into account the rejection in the next iteration of task assignment. More importantly, the rejected allocation is used as a constraint within the optimisation run by the planner. For example, if 2 responders (a medic and a soldier) were allocated a task and the soldier rejected it, the planning agent would return a new task allocation with the constraint that this soldier should not be allocated this task. Unlike the later human in‐the‐loop version, the planning agent retains the control over task assignments. Because this version is intended to study an arrangement in which the agent has a relatively stronger role, HQ could only intervene by using the communication channel.
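How a rejection can act as a constraint on the next allocation is sketched below. This is a minimal greedy illustration under stated assumptions, not the planner's actual optimisation: the function name, the distance-based cost, and the player identifiers are hypothetical. The example uses a fuel target because soldier plus transporter is the one role pairing the scenario description specifies.

```python
from itertools import combinations


def allocate(players, tasks, rejected=frozenset()):
    """Greedily assign one two-person team per task.

    players:  {initials: (role, {target_id: distance})}
    tasks:    {target_id: frozenset of the two required roles}
    rejected: set of (initials, target_id) pairs that must not be assigned
    """
    free = set(players)
    plan = {}
    for target_id, roles in sorted(tasks.items()):
        best = None
        for a, b in combinations(sorted(free), 2):
            # The pair must supply exactly the required combination of roles.
            if frozenset({players[a][0], players[b][0]}) != roles:
                continue
            # A rejected (player, task) pairing is excluded from the search.
            if (a, target_id) in rejected or (b, target_id) in rejected:
                continue
            cost = players[a][1][target_id] + players[b][1][target_id]
            if best is None or cost < best[0]:
                best = (cost, (a, b))
        if best is not None:
            plan[target_id] = best[1]
            free -= set(best[1])
    return plan


players = {
    "S1": ("soldier", {22: 10}),
    "S2": ("soldier", {22: 40}),
    "T1": ("transporter", {22: 20}),
}
tasks = {22: frozenset({"soldier", "transporter"})}

first_plan = allocate(players, tasks)                       # S1 is nearest
replanned = allocate(players, tasks, rejected={("S1", 22)})  # S1 rejected 22
```

In the sketch, the rejection removes only the pairing of that responder with that task, so the responder remains available for other targets in the same replanning run.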
The instructions sent to field responders are also displayed in the HQ interface for monitoring purposes. The task allocations are represented as yellow lines connecting players and their targets (Figure 2). Only 1 task allocation is displayed at a time when the HQ player clicks on the “show” task button on the player status panel.
This section provides an abbreviated presentation of the field‐trial results reported in a prior publication.31 The field trial of this version follows the same game setup as the base version (see Section 3.4). A total of 16 participants were recruited through posters and emails and reimbursed with 15 GBP for 1.5 to 2 hours of study. The majority were students of the local university. The procedure consisted of 30 minutes of game play and about 1 hour in total of pregame briefing, consent forms, a short training session, and a postgame group discussion.
Through interaction analysis of video recordings of game action and system logs, we gain insight into the division of labour between human and agent in which the agent takes over routine planning activities while the human focuses on other issues such as finding teammates, targets, and choosing the best routes.
After presenting an overview of how task assignments were handled in the field trial, we present episodes that reveal how teams accomplish the tasks in the rescue mission, particularly focusing on the social organisation of interaction with and around the agent instructions.
Figure 3 shows how task assignments were acted upon in the field trial. Fifty‐one assignments were created by the planner and sent to field responders. Twenty‐four were accepted, while 11 were rejected or did not receive a response, ie, only 1 or none of the 2 involved players responded. Of the accepted tasks, 15 were completed successfully. An additional 8 tasks were completed that had not received a response (2 of which without agent instruction).
Figure 3. How instructions were handled in the on‐the‐loop version
In the following episodes, players can be uniquely identified by their initials. Targets are denoted by their unique numeric target id. Task assignments from the agent are represented as 2 sets of initials and 1 target id connected by a rightward arrow. For example, the notation PC, CR → 22 means players PC and CR are instructed to team up and go for target 22. A standard orthographic notation11 includes non‐verbal elements “((..))” and pauses in seconds, eg, “(1.0)”; this is complemented by timestamps [0:00], and system messages from remote players and HQ.
The following episode depicts a team of 2 dropping off a target and planning the next step.
At the beginning of this episode, the team (PC, CR) drops off a target at a drop‐off zone. Player PC vocalises that they have finished the task (“I think we dropped off now. OK”). After about 7 seconds, PC says she received a new task allocation from the agent (“I have a task now”). PC confirms the initials of the other player (CR) and suggests that CR join her to go for target 22. The action is consistent with the agent instruction (PC, CR → 22), suggesting that PC has read the instruction and decided to follow it. CR says that they have already finished target 22 (“We have done 22”), which indicates he is confused about the current task allocation. PC resolves the confusion by pointing in the direction of 22 and repeating to go for it. Later, the team successfully drops off target 22 as instructed by the agent.
The episode shows how an agent instruction is brought up and followed by a team in a relatively straightforward manner. The instruction was delivered immediately after the drop off of a previous target (7 seconds after). PC successfully locates the new target in the instruction and leads the team to pick it up. Although CR is confused at first, PC manages to rectify CR's mistake and they finish the task successfully.
This episode is a typical case of task assignment to existing teams, ie, the agent sent a new task to a team immediately after they finished their previous task. Of the 51 agent instructions, 23 fall into this category. The rate of compliance is high for these cases of task assignment to existing teams (21 of 23; 91%).
Unlike Episode 1, sometimes the agent instruction implies players need to disband and form new teams after finishing their previous task, to enact the computationally optimal plan. Ten of 51 agent instructions fall into this category. The compliance rate of instructions that require reteaming (50%) is substantially lower than compliance of instructions where players can stay in the same teams (91%). The following episode depicts a typical case in which team reformation fails.
The episode begins with a recommendation by HQ to LT to go for 10 (message A). The message is topicalised by LT, but it is soon overridden by an agent instruction (NK, LT → 16). When CR proposes to team up with LT to go for target 10, LT declines (“mine is 16”). HQ then withdraws its previous suggestion to go for 10 in message B. Shortly after, a new instruction (NW, LT → 15) prompts LT to read out the target number (15), but she fails to raise the other players' attention. While other group members are engaged in planning next steps, LT does not engage and keeps looking around. She can be seen turning and walking back and forth. Perhaps LT is trying to locate the player NW, with whom she had been instructed to team up. LT does not take any action until prompted by CR (“are you LT? NW is looking for you”). Then, LT begins to walk to find her teammate. However, when she finally manages to meet up with NW 2 minutes later, NW has already been assigned another task.
On one hand, LT seems to feel obliged to follow the agent instructions. She turns down other teaming invitations and appears to try to look for NW in her immediate vicinity, indicating difficulty with locating teammates out of sight (despite the real‐time location map). On the other hand, her body orientation displays a sense of attachment to the existing group. Her indecisive walking and turning back and forth suggests she struggles to leave. She does not leave the group to follow the instructions until prompted by someone. When CR points out NW's message, LT does not answer the message either. The episode illustrates a combination of interactional “troubles” as a result of which the reteaming fails: being attached to the local group, struggling to locate teammates out of sight, and failing to reciprocate messages.
Further, we found the distance between instructed players to be a key factor in successful reteaming. That is to say, if instructed players are not within line of sight, the rate of non‐compliance with the agent instruction is high. Taking Episode 2 as an example, player LT was instructed to team up with a distant player twice. Neither of the instructions was successfully implemented. Overall, there were 17 agent instructions that implied teaming with distant players; only 1 of them was actually followed by players. Players explicitly rejected 11 of them by pressing the rejection button; the other 5 were not followed without an interface action.
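The compliance figures reported so far can be checked with simple arithmetic. The snippet below uses only counts stated in the text; the helper name `percent` is ours, and the percentage for distant teaming is derived here rather than reported in the trial results.

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# Instructions sent to existing teams: 21 of 23 were complied with.
existing_team_rate = percent(21, 23)

# Instructions implying teaming with a distant player:
# 1 followed, 11 explicitly rejected, 5 not followed without interface action.
distant_followed, distant_rejected, distant_ignored = 1, 11, 5
distant_total = distant_followed + distant_rejected + distant_ignored
distant_rate = percent(distant_followed, distant_total)
```

The three outcome counts for distant teaming sum to the 17 instructions reported, and a single followed instruction corresponds to a compliance rate of roughly 6%, compared with 91% for existing teams.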
In this fragment, we can observe disagreement and negotiation about team reformation. AW receives 2 consecutive reteaming instructions from the agent, finally teaming him up with LC, while KD does not receive another instruction. KD's question (“Do they know we are already on the task?”) suggests that he might think the agent is unaware of their situation and that he disagrees with disbanding the existing team. In spite of KD's disagreement, AW declares his intention to follow the new instruction (“got new instruction again, [team up with] LC”) and he turns to find LC. However, KD ignores this (“Alright, Lets go to 46”), indicating he does not agree with AW's intention to disband the team. AW interjects (“I don't know, I got a new task with LC”) and continues to walk towards LC, denying KD. As KD realises he is without assignment (“Ah, I do not have a task”), he follows AW to find LC.
In this episode, teammates agree to reject the first task assignments. We found task interruption could be a major reason to reject new instructions. Ten of 11 rejected instructions are associated with task interruption. In an extreme case (not pictured), one team reached an agreement to ignore any agent instructions after the agent tried to interrupt the team's ongoing task.
In the end, the player that received the new instruction disagrees with his teammate's suggestion to ignore the instruction and decides to leave the current team. The team is disbanded in disagreement; the teammates spend a fair amount of time arguing whether to follow or ignore instructions, hinting at the hidden social cost of “coalition formation” algorithms when applied to human teams.
Overall, most of the new instructions that interrupted ongoing tasks required team reformation. When tasks were interrupted, the rate of compliance (22%) was substantially lower than when teams were required to reform after a task was completed (50%). Task interruptions were also much more likely to lead to rejection of the new assignment (10 of 11 assignments that interrupted tasks were rejected).
The HQ sent a total of 147 messages in the 2 sessions. Through speech‐act analysis, we identified 50 assertives and 68 directives across the 2 sessions. Most of the assertives focused on providing situational awareness and safe routing so that responders could avoid exposure to radiation, for example, “NK and JL approach drop‐off 6 by navigating via 10 and 09” or “Radiation cloud is at the east of the National College.”
Sixteen of the 68 directives were directly related to task allocations and teaming, which is substantially fewer than the number of agent instructions (51). Among the 16 directives, HQ sent 11 direct instructions to the field players (eg, “SS and LT retrieve 09”), while the remaining 5 are related to forward planning (eg, “DP and SS, as soon as you can head to 20 before the radiation cloud gets there first”). Six of the HQ instructions are consistent with the agent's instructions, while 5 others override them. It is worth mentioning that field players implemented only 5 of the 16 HQ instructions. In the interview, HQ reported that they felt they supported the agent rather than taking control.
Our observations reveal the tension between agent planning support and the social organisation of teamwork. The tension does not simply mean the model held by the agent is “incorrect”; it highlights potential trade‐offs we need to consider in system design.18 As a result, we propose 3 design implications to scaffold the division of labour when building agent‐based planning support for human teams.
In the final in‐the‐loop version, we took into account the design implications from the on‐the‐loop version. We were interested to see whether the aforementioned issues of accountability and social cost might be alleviated by a stronger role of HQ in the planning loop. Therefore, we enabled a “human in‐the‐loop” arrangement in which HQ can mediate between the planning agent and the field players. In this arrangement, the human HQ can request task allocations from the agent at any point and then needs to approve the generated allocations. Once approved, the task allocations are sent to the field players, who are then able to respond by accepting or rejecting their assigned task. However, in this version any task “rejections” from field players are merely requests for the HQ to change the allocations; final task allocation remains at the HQ's discretion. As a result of the evaluation of the on‐the‐loop version, HQ can also communicate preferences to the agent, for example, to “keep” a certain task assignment when replanning. To clarify, we list several requirements that are necessary for the in‐the‐loop design.
This is communicated back to the HQ players, and the HQ players can request a new plan based on field players' feedback at any point.
The purpose of requirements 1 to 2 is to give HQ more control over the planning loop, by delegating to them the responsibility for the final planning decision. Requirement 3 enables HQ to modify the plans computed by the agent without having to take full manual control of plan generation. Requirement 4 is derived from the observations from the base version and the on‐the‐loop version that HQ struggled to override agent planning through unstructured text messages. New HQ and mobile interfaces were developed to facilitate the in‐the‐loop design.
Because the on‐the‐loop version of the HQ interface (see Figure 2) has proved effective for monitoring the game status, the interface was kept for operation by one of the HQ players in the control room (HQ2). In addition, a new task assignment interface was developed and operated by an additional HQ player in the control room (HQ1, see Figure 4). The new task assignment interface is designed to support HQ monitoring and intervention in the plan‐execution loop. The interface enables HQ to approve and edit agent‐suggested task assignments and monitor player feedback.
FIGURE 4. Task assignment interface with live map view (left) and task assignment panel (right)
The task assignment interface has a live map view on the left (Figure 4), which shows current player and target locations and task assignments. The right side of the interface is occupied by the task assignment panel. The left column (1) of the panel shows “pending” (ie, proposed but unconfirmed) task assignments, while the right column (2) shows current (confirmed) tasks. When the operator presses the plan request button (3), the agent calculates a plan based on current task status, which is then shown in the pending panel. If the operator then presses the plan edit button (4), the assignments in the pending area become editable through drag‐and‐drop interaction. Pressing the plan approval button approves all pending assignments, which move to the current (confirmed) area.
Figure 4 (5) shows an example of a proposed task assignment: players MP and GO are assigned to target 07. Within each confirmed task assignment, a feedback indicator (6) shows the field players' response to this assignment (no response, reject, or accept). The stop button terminates an assignment, for example, in an emergency. A “keep” checkbox causes the planner to retain the corresponding task assignment whenever it generates a new plan. A text messaging panel is linked to the currently selected task assignment and allows the 2 players involved in the assignment and HQ1 to exchange task‐specific messages.
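The panel behaviour described above can be summarised as a small state model. The following Python sketch is illustrative only and is not taken from the AtomicOrchid implementation: the class names, the `planner` callback, and the use of a target swap to stand in for drag‐and‐drop editing are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Feedback(Enum):
    NO_RESPONSE = "no response"
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass
class Assignment:
    players: tuple                      # e.g. ("MP", "GO")
    target: str                         # e.g. "07"
    keep: bool = False                  # state of the "keep" checkbox
    feedback: Feedback = Feedback.NO_RESPONSE


@dataclass
class AssignmentPanel:
    pending: list = field(default_factory=list)   # column (1): proposed, unconfirmed
    current: list = field(default_factory=list)   # column (2): confirmed

    def request_plan(self, planner):
        """Plan request button (3): the agent proposes assignments."""
        # `planner` is a hypothetical callback standing in for the planning agent.
        self.pending = planner(self.current)

    def edit(self, index, new_target):
        """Plan edit button (4): stand-in for drag-and-drop editing."""
        self.pending[index].target = new_target

    def approve(self):
        """Plan approval button: all pending assignments become current."""
        self.current.extend(self.pending)
        self.pending = []

    def stop(self, assignment):
        """Stop button: terminate a confirmed assignment."""
        self.current.remove(assignment)


# Usage: the agent proposes MP and GO for target 07, and HQ approves.
panel = AssignmentPanel()
panel.request_plan(lambda current: [Assignment(("MP", "GO"), "07")])
panel.approve()
```

After approval, the assignment sits in the confirmed column with its feedback indicator at "no response" until the field players react.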
Compared with the on‐the‐loop version of AtomicOrchid, the mobile interface is largely unchanged except for the HQ task/chat tab (see middle of Figure 5). The task tab now displays a task with text description and map visualisation of the task at the top. The bottom half of the interface is a message box showing task‐specific information from HQ. It should be noted that the HQ can still send broadcast information (visible to everyone), which will be displayed in the chat tab.
FIGURE 5. Mobile responder app: status tab (left), task / HQ chat tab (middle), and global chat tab (right)
We ran 2 AtomicOrchid sessions to trial the in‐the‐loop version. Each session followed the same procedure as the base version and the on‐the‐loop version. Detailed results of the interaction analysis are presented in Section 6. Overall, 70% (28 of the 40) of the targets were evacuated in the in‐the‐loop version, which is similar to the on‐the‐loop version (71.8%).
The following subsections start with an overview of task assignments. Task assignments serve to “index” the beginnings of potential episodes of interest in our qualitative data corpus. Selected episodes of game play are then presented to unpack the interactions surrounding the task assignment activities in the control room. We provide these episodes as vivid exhibits of how members accountably organise their team coordination in situ.32
Figure 6 shows how task assignments were acted upon in the in‐the‐loop arrangement. Overall, the planning agent created a total of 45 task assignments, with an additional 5 assignments created manually by HQ. Headquarters approved a total of 39 assignments. Field responders accepted most of the approved assignments (30 of 39). Only 1 assignment was rejected by field responders, and 8 assignments did not receive a response. During task execution, occasional HQ interventions resulted in 5 task cancellations and 5 assignments being overridden.
FIGURE 6. How instructions were handled in the in‐the‐loop version
This section presents selected episodes of game play to unpack the interactions surrounding the task assignment activities in the control room. The presentation of the episodes follows the same notation as introduced in Section 4.2.
As summarised above, most of the task assignments are generated by the planning agent and approved by the HQ players. Episode 1 illustrates a typical case of task planning and approval.
At the beginning of Episode 1, HQ2 draws attention to his monitoring of MV and XW, who are confirmed by HQ1 to be carrying target 43. Given their current location, HQ2 is able to deduce “they should be going to drop‐off zone 7” and is also able to anticipate that they should then “get 36,” referring to the next target assignment. As HQ1 is manning the task‐assignment interface that includes the task‐specific chat, HQ2 instructs HQ1 to “tell then to go to 36 afterwards,” which HQ1 confirms in turn and acknowledges by pointing at the target on his screen. A short while later, after the team has dropped off the target, HQ1 requests a new plan from the agent, upon which the agent suggests that team MV, XW be assigned to target 36. This assignment is consistent with their previous discussion, as confirmed by HQ1's utterance “36, yes.” HQ1 approves the assignment by clicking “confirm.” The assignment is sent to the field responders, who in turn accept the assignment.
This episode depicts a typical case of unproblematic agent‐supported task assignment. As Figure 6 shows, 34 (39 less 5 created by HQ) of the agent's 45 allocations are approved without editing. It is worth noting that HQ can be seen to monitor the field responders in their ongoing task execution by means of the interfaces provided, which enables them to plan ahead for the next task assignment. As a result, they not only make a timely request for new task assignments from the agent but have also already selected an appropriate next task (“36”), probably based on its location and requirements; this suggests that the interface provides HQ with sufficient information (eg, regarding player, target, and radiation) to come to a decision about which task to allocate. Notably, this is the same decision that the agent arrives at, which confirms HQ in their planning and lends support to their decision making. However, HQ does not always agree with the agent's assignment, as the following episode will show.
Headquarters players chose to change the task assignments generated by the planning agent in 11 of 45 cases (see Figure 6); Episode 2 presents one such example.
This episode begins with HQ requesting a new plan from the agent. The agent proposes a set of assignments, one of which (CE, KH 06) would interrupt an ongoing task (CE, KH 03), much to the disapproval of HQ1 (“What? Why am I getting?”). The task assignment “03” had previously been sent to KH and CE; however, whilst they are ostensibly in the process of doing the task (apparent by their location and direction of movement), they have not both “accepted” the task. Hence, the responders “look” available to the agent, which in turn suggests a new task for KH and CE.
HQ1 realises the fact that they have not explicitly accepted the previous task (“Ah:: one of these guys did not accept.”). Headquarters then instructs the agent not to change the existing assignment [04:29] by use of the “keep” checkbox and requests a new plan, which is generated without the conflicting assignment. As a result, the changed plan is in turn confirmed.
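The mechanism at work in this episode can be made concrete with a short sketch. It assumes, as the episode suggests, that the agent treats a team as busy only once every member has explicitly accepted the task, and that the “keep” checkbox overrides this. The function name, the dictionary layout, and the choice of which player failed to accept are hypothetical and serve only to illustrate the reasoning.

```python
def looks_available(player, assignments):
    """Return True if the planner would consider `player` free for a new task.

    Assumed rule: a player is busy only if one of their assignments is either
    marked "keep" by HQ or has been explicitly accepted by all team members.
    """
    for a in assignments:
        if player in a["players"]:
            fully_accepted = all(a["accepted"][p] for p in a["players"])
            if a["keep"] or fully_accepted:
                return False
    return True


# Task 03 was sent to CE and KH; here KH is (hypothetically) the one who did not accept.
task_03 = {
    "players": ("CE", "KH"),
    "target": "03",
    "accepted": {"CE": True, "KH": False},
    "keep": False,
}

before_keep = [looks_available(p, [task_03]) for p in ("CE", "KH")]

# HQ1 ticks the "keep" checkbox, so the planner must retain the assignment.
task_03["keep"] = True
after_keep = [looks_available(p, [task_03]) for p in ("CE", "KH")]
```

Under these assumptions both players look free before HQ1 intervenes (`before_keep` is `[True, True]`), which is why the agent proposes target 06, and neither does afterwards (`after_keep` is `[False, False]`).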
It is noteworthy that, in contrast to the episode presented in Section 4.2.3, the task assignment interface allows HQ1 to avoid interrupting the field responders' current task, in that HQ1 is able not only to notice but also to compensate for the field players' failure to explicitly accept the task. As a result, the field players are able to continue with the previously allocated task without interruption and oblivious to HQ's intervention in the control room. However, in contrast to this unproblematic instance of plan correction, the next episode will show that editing the agent's allocations does not always lead to desirable outcomes.
At the start of this game session, we can observe one of the HQ players overriding 3 of 4 of the planning agent's allocations.
HQ1 requests initial task assignments for all of the field players. The planning agent provides HQ1 with a set of task assignments for approval, but HQ1 is not happy with them (“Why, it is stupid.”). HQ1 switches into “edit” mode and replaces 3 of the targets in the agent assignments, voicing his intention as he performs the editing. The 3 manually assigned new targets are the ones that are closest to the radioactive cloud. HQ1 confirms his modification [02:03] and provides an account of his strategy to HQ2: “we should get the far ones first,” probably referring to the distance of the selected targets from the field responders' current location.
The episode shows how the capability to change the agent's allocations allows HQ to implement their own strategy and priorities. The design rationale of this “feature” was to enable human decision‐making in response to situational contingencies to take precedence over the agent's rigid world model. However, things do not work out so well in this case. The modified plan turned out to be undesirable, as it led to 2 assignment cancellations and 2 players “dying” as they attempted to rescue a target from the radioactive cloud. In the end, only 1 of the 3 modified assignments was finished successfully.
Herein, we provide some key metrics to compare compliance (task acceptance) and team performance (task completion) between the on‐the‐loop and the in‐the‐loop version. Note that this comparison may be confounded by changes in the user interface made between versions and by individual and between‐group differences. The objective of the statistical comparison is to be informative and to supplement the qualitative analysis, which is the main focus of our work.
Table 1 shows key metrics for both versions. Compared with the on‐the‐loop version, the task assignments in the in‐the‐loop version have a relatively higher success rate: 28 of 39 (72%) assignments were completed successfully, while only 21 of 51 (42%) assignments were completed successfully in the field trial of the on‐the‐loop version.
TABLE 1. Result overview
| Arrangement | Success rate | Failure rate | Acceptance rate |
|---|---|---|---|
| On‐the‐loop arrangement | 21/51 (42%) | 30/51 (58%) | 24/51 (47%) |
| In‐the‐loop arrangement | 28/39 (72%) | 11/39 (28%) | 30/39 (77%) |
Compared with the on‐the‐loop trial, the task assignments in the in‐the‐loop trials have a relatively higher acceptance rate. Thirty of 39 (77%) assignments were accepted by the field players, while only 24 of 51 (47%) assignments were accepted in the on‐the‐loop trial. An independent samples t test indicated that the acceptance rate was significantly higher for the in‐the‐loop version (M = 0.77, SD = 0.43) than for the on‐the‐loop version (M = 0.47, SD = 0.5), t(87) = 3.04, P = .003. Levene's test indicated unequal variances (F = 19.45, P < .001), so degrees of freedom were adjusted from 88 to 87.
In addition, an independent samples t test shows that the completion rate of tasks in the in‐the‐loop version (M = 0.72, SD = 0.46) is also significantly higher than that in the on‐the‐loop version (M = 0.42, SD = 0.5), t(85) = 3.04, P = .003. Again, Levene's test indicated unequal variances (F = 6.5, P = .012), so degrees of freedom were adjusted from 88 to 85.
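For readers who wish to retrace these statistics, the following sketch recomputes the t statistic and the adjusted degrees of freedom from the counts in Table 1. It assumes that each assignment is coded as 1 (accepted or completed) or 0, and that the reported adjustment corresponds to Welch's unequal‐variances t test with the Welch–Satterthwaite approximation; the text does not name the procedure, so this is our reading. The function name is our own, and the P value is omitted because it requires the t distribution, which the standard library does not provide.

```python
import math


def welch_from_counts(k1, n1, k2, n2):
    """Welch's t and Welch-Satterthwaite df for two groups of 0/1 outcomes.

    k1 of n1 outcomes are 1 in group 1; k2 of n2 outcomes are 1 in group 2.
    """
    m1, m2 = k1 / n1, k2 / n2
    # Unbiased sample variance of a 0/1 variable with mean m: m(1 - m) * n / (n - 1)
    v1 = m1 * (1 - m1) * n1 / (n1 - 1)
    v2 = m2 * (1 - m2) * n2 / (n2 - 1)
    q1, q2 = v1 / n1, v2 / n2          # squared standard errors of the two means
    t = (m1 - m2) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


# Acceptance: 30 of 39 (in-the-loop) vs 24 of 51 (on-the-loop)
t_accept, df_accept = welch_from_counts(30, 39, 24, 51)

# Completion: 28 of 39 (in-the-loop) vs 21 of 51 (on-the-loop)
t_complete, df_complete = welch_from_counts(28, 39, 21, 51)
```

Under these assumptions the acceptance comparison gives t ≈ 3.04 with about 87 degrees of freedom, and the completion comparison gives t ≈ 3.04 with about 85 degrees of freedom, in line with the values reported above.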
In summary, the results show significant improvements from the on‐the‐loop to the in‐the‐loop version in the key evaluation metrics of acceptance and completion of task assignments.
As the core part of the analysis of the field trials, we have presented detailed episodes of interaction to illustrate how collaboration was achieved in practice. We now draw out our observations on key interactional themes displayed in the data, and we reflect on the improvements between the versions.
The results in Section 5.3 show that task acceptance and completion have improved significantly from the on‐the‐loop version to the in‐the‐loop version. Moreover, the communication between HQ and the field players has been largely unproblematic in the final version, and most targets were successfully evacuated according to plan. The outcomes seem to be considerably better than for the on‐the‐loop version.
In particular, the HQ players in the on‐the‐loop version were observed to struggle to intervene in the planning process. In a paper presented at CTS in 2014, we have argued that there is a “hidden cost” associated with the agent's task interruption and instructions that require team reformation.31 The episodes presented in Section 4 illustrate the local interactional “troubles” (eg, disagreement and locating teammates) implicated by allocations that require reteaming (Episode 2) and interrupt ongoing tasks (Episode 3).
These findings in turn inspired the design rationale towards a stronger HQ in‐the‐loop that we hoped would alleviate some of the problems associated with “unfiltered” agent instructions. In the on‐the‐loop arrangement, the only way for HQ to intervene in the planning is to send unstructured text messages in the broadcast channel. The fact that only 5 of 16 HQ instructions were acted on in the on‐the‐loop version suggests that HQ was unable to effectively override the agent when they wanted to.
The improved task acceptance and completion rates suggest that performance is significantly better in the in‐the‐loop arrangement than in the earlier on‐the‐loop arrangement. Specifically, HQ's ability to intervene has been enhanced by the mixed‐initiative task allocation interface introduced in the in‐the‐loop arrangement.
In sum, our evaluation has not only shown that task allocations computed by the planner are more likely to be accepted by field responders when there is a human in the loop who confirms or modifies each allocation according to the situation at hand but also that this arrangement leads to a better task completion rate. More broadly, the move towards a stronger in‐the‐loop arrangement highlights the need for interfaces that provide means for humans to moderate and intervene in agent‐based planning to respond to situational contingencies. The following sections explore the findings regarding division of labour and further planning support.
Herein, we reflect on the division of labour between the field responders, the HQ, and the planning agent observed in the field trials reported in earlier sections. The rationale for the planning agent's integration was to take on some of the workload involved in planning. Episode 1 demonstrates a typical case of division of labour: The agent handles planning of teaming and task assignment, freeing the field responder team to focus on navigational issues (identifying the target on the interactive map and finding directions). However, we have already lamented the trade‐offs implicated by the comparatively “weak” role of the HQ in the on‐the‐loop arrangement, which led to the aforementioned improvements.
The field trial of the in‐the‐loop version showed that in many cases the communication between HQ and the field players is unproblematic, and most targets are successfully evacuated according to plan. Hence, the situation has improved considerably, and we conjecture that this is due at least in part to differences in the user interface in the in‐the‐loop version. Specifically, to recap, the main changes are the HQ‐manned task allocation interface and the improved mobile responder app. In the mobile app, the current task allocation is shown as a graphical overlay on the mobile map in the in‐the‐loop version, not just as a textual instruction given by the HQ player in the base version or the planning agent in the on‐the‐loop version. This seems to significantly reduce the field players' confusion about their current target and team‐mate and where to find them.
Furthermore, the task‐planning interface for the most part appears to provide an effective shared representation of the current state of the game. As well as showing current player and target locations and player health, it also makes visible the currently approved task allocations, field player responses, and any new plan that has been requested or is being edited. This shared information forms the common ground between the HQ players and the planning agent.
The evaluation has demonstrated that HQ players closely monitor this view and its representation of plan execution. For example, Episodes 4, 5, and 6 all reveal HQ players' awareness of field player progress and current tasks. Episodes 2 and 6 show awareness of the cloud's location in relation to players, and Episodes 5 and 6 show HQ players engaging actively with proposed (rather than current) task assignments. We observe that the HQ players are quite capable of modifying the agent's plans when they wish to, for better (Episode 5) or worse (Episode 6). HQ is also able to intervene in current task allocations, which is successful in resolving the situation in Episode 5.
As seen in Episodes 4, 5, and 6, HQ players are observed to use the task interface to assess current game status, while in Episodes 5 and 6 we have also seen how they can modify the agent's plans. This suggests that the interface is sufficient in providing basic situational awareness for HQ players to make their own plans.
The drag‐and‐drop–based task assignment interface in the in‐the‐loop version also enforces various constraints on task assignment so that all plans are at least valid, ie, well‐formed. For example, each player and each target can be assigned to at most 1 task, and each task can only have players with the correct combinations of game roles for the target. The interface also highlights players and targets on the map when they are manipulated so that the HQ player can readily assess location and proximity when editing task assignments. However, the observations also reveal some potential for improving support for human planning.
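The validity constraints described above can be expressed as a simple check over a proposed plan. The sketch below is a minimal illustration under stated assumptions and not the interface's actual logic: the function name, the plan representation, the role names (“medic”, “firefighter”), and the role requirements per target are all hypothetical.

```python
from collections import Counter


def validate_plan(plan, player_roles, required_roles):
    """Return a list of constraint violations; an empty list means the plan is well-formed.

    plan: list of (players, target) pairs, e.g. [(("MP", "GO"), "07")]
    player_roles: maps each player to a game role
    required_roles: maps each target to the list of roles it needs
    """
    problems = []
    player_uses = Counter(p for players, _ in plan for p in players)
    target_uses = Counter(target for _, target in plan)
    # Each player and each target may appear in at most 1 task.
    problems += [f"player {p} is in {n} tasks" for p, n in player_uses.items() if n > 1]
    problems += [f"target {t} is in {n} tasks" for t, n in target_uses.items() if n > 1]
    # Each task needs the correct combination of game roles for its target.
    for players, target in plan:
        roles = sorted(player_roles[p] for p in players)
        needed = sorted(required_roles[target])
        if roles != needed:
            problems.append(f"target {target} needs roles {needed}, got {roles}")
    return problems


# Hypothetical roles and requirements, for illustration only.
player_roles = {"MP": "medic", "GO": "firefighter", "KH": "medic"}
required_roles = {"07": ["medic", "firefighter"], "03": ["medic", "medic"]}

ok = validate_plan([(("MP", "GO"), "07")], player_roles, required_roles)
clash = validate_plan(
    [(("MP", "GO"), "07"), (("MP", "KH"), "03")], player_roles, required_roles
)
```

Here `ok` is empty, whereas `clash` reports that MP has been assigned to 2 tasks. An interface could run such a check on every drag‐and‐drop edit and refuse edits that produce a non‐empty list.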
Returning to Episode 6, where the HQ player massively revises the agent's assignments (leading to undesired outcomes), one future idea is to enable the planning agent to “comment” regarding potential problems in the player's proposed plan. While making visible the planning agent's reasoning might have discouraged the player from changing the plan so dramatically, there will still surely be situations in which plans could or should be changed. And in future we may improve the system beyond leaving the player to “do their best.” For example, the planning agent could simulate (and perhaps extend) the proposed modified plan to provide the HQ player with at least 1 predictive view of the possible outcomes of their plan.
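As a rough indication of what such agent “comments” might look like, the following sketch flags assignments in which a team is projected to reach its target only after the cloud does. It is a speculative illustration of the future idea outlined above, not a feature of the current system: the function, the straight‐line travel model, the fixed walking speed, and the cloud arrival estimates are all assumptions.

```python
import math


def comment_on_plan(plan, positions, targets, cloud_eta, speed=1.5):
    """Return warnings for assignments whose team would arrive after the cloud.

    plan: list of (players, target) pairs
    positions: player -> (x, y) in metres; targets: target -> (x, y) in metres
    cloud_eta: target -> estimated seconds until the cloud reaches it (hypothetical input)
    speed: assumed walking speed in metres per second
    """
    comments = []
    for players, target in plan:
        # The team can only pick up the target once its slowest member arrives.
        arrival = max(math.dist(positions[p], targets[target]) for p in players) / speed
        if arrival >= cloud_eta[target]:
            comments.append(
                f"target {target}: team arrives in about {arrival:.0f} s, "
                f"cloud expected in {cloud_eta[target]:.0f} s"
            )
    return comments


# Illustrative values only.
positions = {"A": (0.0, 0.0), "B": (0.0, 0.0)}
targets = {"01": (300.0, 400.0), "02": (30.0, 40.0)}
cloud_eta = {"01": 400.0, "02": 400.0}
warnings = comment_on_plan(
    [(("A",), "01"), (("B",), "02")], positions, targets, cloud_eta, speed=1.0
)
```

With these values only target 01 is flagged, since it lies 500 m away and the cloud is expected there after 400 s. Whether such warnings would have changed the HQ player's decision in Episode 6 remains an open question.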
In the current system, the agent performs forward planning, ie, it considers what field players might do in the future, not just in the current/next task assignments. In future, this information could be made available to the HQ players. In Episode 4, we saw one of several examples of the HQ players also planning for future task assignments. In future, HQ could be enabled to record their own forward planning and thereby feed it back into the system, instead of having to make a note or remember what they were thinking when the current task is completed and they have the chance to check and intervene. Therefore, at least for some situations, it might be beneficial if the agent's future plans could also be viewed and if the HQ players also had some system support to guide their own future thinking.
We have also encountered the following interactional challenges that likely generalise more broadly to related settings.
Herein, we provide the lessons learnt that may benefit the designers of distributed coordination systems, in particular, in relation to situation awareness, computational planning support and interactional breakdowns. These may be particularly relevant for settings in which timely human decision‐making is critical.
Common ground is a critical requirement for making collaborative decisions in an effective and timely manner. Through our field trials, we identified the following features as constitutive of common ground through providing a mutual situation awareness for the participating parties (HQ, field responders, and the agent).
Our observations also align with the model of situational awareness proposed by Endsley (2001),34 which argues that situational awareness needs to be supported at 3 levels: (1) perception of the elements in the environment, (2) comprehension of the current situation, and (3) projection of future status.
Some opportunities and challenges have also become evident that relate more specifically to the possibilities of mixed‐initiative planning.
Our observations echo work on human considerations in context‐aware systems, which proposes principles to support intelligibility and accountability35; similarly, we stress that the goal for planning support systems should be to be accountable for their actions; therefore, “what they know, how they know it, and what they are doing about it” [ibid., p. 201] needs to be legible to the people involved. Furthermore, as planning is oriented towards the future, yet produced as a contingent, situated activity,5 the interface needs to support revision and revoking of plans in situ and provide the situational awareness essential to do so.
Our findings should not be overgeneralised. In this work, we compared 2 different human‐agent arrangements to study the emergent interaction. However, our goal was not to find the optimal system to solve the task allocation problem. While our results suggest that the in‐the‐loop arrangement was preferable to the on‐the‐loop arrangement, it was not without issues, and there may be other arrangements and improvements that could have led to better performance and reduced losses. Therefore, we suggest that future work could and should improve the system further. Particular aspects that could be improved include both mixed‐initiative interfaces and the computational intelligence for distributed task allocation problems. For example, further means to communicate emergent issues back to the planning agent should be considered; however, the potential gains of such features would need to be carefully weighed against the additional workload for the responders.
Overall, we foresee that there usually are unforeseen contingencies that humans need to deal with; hence, we feel strongly that a consideration of how contingencies can be responded to would need to be incorporated from the outset in any future work building on the contributions of this work.