The game industry is facing a surge of data, which results from increasingly available highly detailed information about the behavior of software and software users. The data can come from a variety of channels, e.g. behavioral telemetry, user testing, surveys, forums, and can be high-dimensional, time-dependent and potentially very large. The old adage of big data having volume, velocity, variety and volatility holds very true for behavioral telemetry from games.
Profiling users has emerged across multiple data science application areas as a way of managing complex user data and discovering underlying patterns in the behavior of the player base. Profiling users allows for a condensation and modeling of a complex behavioral space. Profiles allow us to consider players in a non-abstract, quantifiable way, helping us build an understanding of who the players are and how they will play, or are playing, the game.
In this post, the background of user profiling will be outlined, and we will present a brief overview of the many different types of profiles that can be generated in game development contexts. Naturally, different types of profiles are used for different kinds of problems, have various strengths and weaknesses, etc.
The idea of using customer data to inform marketing and product design has an extensive history in Information Science, where user profiling was developed to deal specifically with the problem of data overload. This is a prevalent issue in any user/customer-focused industry today, and certainly so in games. We easily extract dozens to hundreds of features from direct user-game interaction, and supplement these with data from marketing, attribution, playtesting, social networks and more. To make things even more challenging, data are usually collected from large numbers of players, over potentially long-term interaction periods, and are typically temporally volatile.
Profiling permits a condensation of the behavioral space so any patterns can be located or hypotheses tested. These are then refined into a format where action can be taken on them, e.g. profiles that describe behavior, and potentially draw inferences as to the root causes of behavior. In games, there are typically two overall goals with player profiling:
1) Correlational: To correlate profiles with specific behaviors such as game completion potential, user experience, monetization, churn, retention, cross-game transportation, cross-promotion, social influence, etc.
2) Inferential: To investigate how and why specific behaviors occur as a function of user traits and/or behaviors.
We can also consider how profiles are developed, usually either bottom-up or top-down. The former is explorative, focused on locating patterns we did not know existed. This approach is useful as soon as we have data (beta, soft launch, etc.), and is usually feature-intensive. Top-down profiling focuses on testing hypotheses, e.g. how valid already established profiles are given a new player cohort. This approach is useful post-launch, notably for testing the consistency of profiles.
Profiles can be generated to target either individual players or groups of players. Individual profiles seek to discover characteristics of specific people, and are based on data from only that person. Group profiling, which is vastly more common, tries to categorize individuals as a kind of individual, i.e. a type or group. Group profiling is less precise than individual profiling, but it is what we need in practice to manage high-dimensional datasets. Every group profile will have a fit, which is the quality of the profile in terms of what it is applied to. Fit is an important component to integrate when considering how to distribute players into profiles, or when taking action on players who fall into specific profiles. If a profile is 100% distributive, it means that all properties apply fully to everyone in a group, e.g. 'all bachelors are unmarried'. In practice, analytically generated group profiles are non-distributive to a greater or lesser degree, and often the latter. This is the key concern when considering how to act on profiles, and means that techniques such as soft clustering, which groups people according to their distance to multiple cluster centers, have value in the daily practice of player profiling. In general, the more detailed we try to make profiles, the fewer players they apply to, and there is a definite element of cost-benefit balancing in play here.
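To make the idea of fit and soft clustering a little more concrete, here is a minimal sketch in Python. It assumes the membership weighting used in fuzzy c-means (inverse distance to each center, normalized so the weights sum to 1), which is only one of several soft clustering schemes. The profile names, the two features and all values are hypothetical and purely illustrative.

```python
import math

def soft_memberships(player, centers, fuzziness=2.0):
    """Return one membership weight per cluster center; weights sum to 1."""
    distances = [math.dist(player, center) for center in centers]
    # A player sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    # Fuzzy c-means style weighting: closer centers get larger weights.
    exponent = 2.0 / (fuzziness - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]

# Hypothetical profile centers in a normalized two-feature space
# (say, session length and spend); the numbers are made up.
centers = {"casual": (0.2, 0.1), "core": (0.8, 0.3), "spender": (0.6, 0.9)}
player = (0.7, 0.5)

weights = soft_memberships(player, list(centers.values()))
fit = dict(zip(centers, weights))          # per-profile degree of fit
best_profile = max(fit, key=fit.get)       # closest profile for this player
```

Instead of forcing the player into a single profile, the weights in `fit` say how well each profile applies, which is the information we need when deciding whether to act on a player's profile membership.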
And finally, just to make sure we have the bases covered, player profiles can also be considered based on the information they are built from. Two core types are protean and data-driven profiles. The latter is based on actual behavioral, attitudinal or other data, while the former is based on theoretical models and design. Data-driven profiles are based on quantitative data, and can be developed from the earliest user testing – they are ideally updated throughout production and during post-launch. Protean profiles are based on theoretical models and commonly used in design. They can be defined from day 1, but importantly must be kept updated to remain useful, which means feeding in design changes and user testing data on a continual basis. They must also be integrated across the team to ensure coherence in their use.
The process of building profiles rests on well-established guidelines for knowledge discovery in IT, irrespective of the specific algorithms or models used for pattern recognition (ranging from simple but effective tools such as cohort analysis to machine learning). In Information Systems, these four basic steps are commonly used:
1) Discovery: A knowledge discovery process is performed to provide sets of correlated data for profiling, i.e. information about which patterns and correlations we see in the data. For example, that kill/death ratio appears to be important to progression in an FPS.
2) Selection: We decide which patterns to use and which behaviors to employ in the further work with developing profiles. For example, if we are interested in churn, we use patterns showing correlations between behavior and players leaving/staying in the game. Via experimental work we can also investigate causal relationships. Various types of machine learning algorithms can be employed to search the variance space, with clustering being a popular example.
3) Interpretation: In this step we define the profiles. This can be done in a variety of ways, but a sharp eye on the application is important. This is an often overlooked or under-prioritized phase, leading to problems in the fourth step.
4) Application: This vital step involves taking action on the information contained in the profiles. This step is possibly the most difficult to execute in practice, as it often involves communication between stakeholder groups that speak completely different languages.
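The first two steps can be sketched in a few lines of Python. The example below is a toy illustration under stated assumptions: the per-player records, the feature names and the 0.7 selection threshold are all made up, and a plain Pearson correlation stands in for whatever knowledge discovery method is actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-player telemetry from an FPS (illustrative values only).
players = [
    {"kd_ratio": 0.4, "sessions": 12, "levels_completed": 3},
    {"kd_ratio": 0.8, "sessions": 5, "levels_completed": 5},
    {"kd_ratio": 1.1, "sessions": 9, "levels_completed": 6},
    {"kd_ratio": 1.5, "sessions": 4, "levels_completed": 9},
    {"kd_ratio": 2.0, "sessions": 10, "levels_completed": 11},
    {"kd_ratio": 2.4, "sessions": 7, "levels_completed": 12},
]
target = [p["levels_completed"] for p in players]

# Discovery: how strongly does each feature correlate with progression?
correlations = {
    feature: pearson([p[feature] for p in players], target)
    for feature in ("kd_ratio", "sessions")
}

# Selection: keep only the features whose correlation clears the
# (assumed) threshold for further profile building.
THRESHOLD = 0.7
selected = [f for f, r in correlations.items() if abs(r) >= THRESHOLD]
```

In this toy data only the kill/death ratio survives selection. A correlation like this says nothing about causality on its own; that is what the experimental work mentioned in the selection step is for.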
The process is, of course, cyclic. Players change behavior, the composition of the population changes over time, as does game design in persistent or semi-persistent games. So profiles should be continually updated. We are never finished with profiling our players. It is also worth noting that profiling at all levels is not an objective process. We always make choices: which algorithm or model to use, how data are pre-processed, how outcomes are interpreted, etc. Because of these choices, there is the potential for bias and bad decisions at all of these steps.
Focusing on data-driven profiles, there are a number of different types common in game analytics these days, including:
Snapshot profiling is focused on developing an understanding of the patterns of behavior as they occur at the operational level. The data used for snapshot profiling are typically aggregate metrics about the players and/or their behavior. Generally, historical data are not used, but rather information about the state of the players at the present. Typical examples include dimensionality reduction of high-variety datasets about player characters in Massively Multi-Player Online Games (MMOGs) or other online multi-player games, in order to obtain an understanding of the composition of the current player base.
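As a rough illustration of a snapshot profile, the sketch below groups a handful of players by their current state using a plain k-means implementation. K-means is just one convenient choice here, not a recommendation. The player IDs, the two normalized features and the naive initialization (the first k points) are assumptions made for the example.

```python
import math

def kmeans(points, k, max_iterations=50):
    """Plain k-means; returns (centers, labels)."""
    # Naive, deterministic initialization: use the first k points as centers.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(point, centers[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so we have converged
        labels = new_labels
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels

# Hypothetical snapshot of the current player base: normalized
# (character level, hours played) per player. Values are made up.
snapshot = {
    "p1": (0.10, 0.15), "p2": (0.90, 0.85), "p3": (0.15, 0.10),
    "p4": (0.85, 0.95), "p5": (0.20, 0.20), "p6": (0.95, 0.90),
}

centers, labels = kmeans(list(snapshot.values()), k=2)
profile_of = dict(zip(snapshot, labels))  # player id -> profile index
```

No history is involved: the profiles describe only the state of the player base at the moment the snapshot was taken.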
Player behavior changes as a function of time. Furthermore, persistent games, which see the same players interacting with a game over potentially long periods, experience a constant change in the population of players. This is notably the case for games which have persistence as a key design factor in order to support F2P business models. Games themselves can also change over time, for example via patches, updates or expansions. These three factors jointly mean that profiles generated from snapshot data have a limited period during which they are valid as representations of the player base. It is therefore increasingly common to see player profiles being iteratively regenerated at set time intervals, e.g. every 24 hours, which we refer to here as dynamic profiling. While the underlying unsupervised machine learning methods are similar for snapshot and dynamic profiling, the latter are constantly regenerated and additionally permit historical viewpoints on changes in the behavior of the players (or systems), and also act as a starting point for predictive analytics.
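The regeneration loop itself is simple, as the following sketch suggests. It buckets events into 24-hour windows, re-labels every player in each window, and keeps the labels as a per-player history. The event format and the one-hour "heavy"/"light" rule are hypothetical; the rule is only a placeholder for re-running whatever clustering method is used on each window.

```python
from collections import defaultdict

WINDOW_SECONDS = 24 * 60 * 60  # regenerate profiles once per 24 hours

def playtime_per_window(events):
    """Sum playtime per player within each 24-hour window.

    Each event is (timestamp_seconds, player_id, playtime_seconds).
    """
    windows = defaultdict(lambda: defaultdict(float))
    for timestamp, player_id, playtime in events:
        windows[timestamp // WINDOW_SECONDS][player_id] += playtime
    return windows

def label_window(playtime_by_player, heavy_threshold=3600):
    """Placeholder for the real profiling step run on one window."""
    return {
        player: "heavy" if playtime >= heavy_threshold else "light"
        for player, playtime in playtime_by_player.items()
    }

# Hypothetical telemetry events (illustrative values only).
events = [
    (1_000, "ann", 5400), (2_000, "bob", 600),
    (90_000, "ann", 1200), (95_000, "bob", 4000), (100_000, "bob", 500),
]

windows = playtime_per_window(events)
history = defaultdict(list)  # player id -> [(window index, profile), ...]
for window in sorted(windows):
    for player, label in label_window(windows[window]).items():
        history[player].append((window, label))
```

The `history` structure is what gives dynamic profiling its historical viewpoint: here one player moves from "heavy" to "light" and the other in the opposite direction, which a single snapshot could not show.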
Building predictive models is a key area in business intelligence in general, and predicting player behavior is important in persistent games of any kind. As we all know, F2P games generally see only a small fraction of the players stay engaged with the game for a long time, and a similarly small fraction monetize via IAPs. Therefore, predicting which players will stay engaged and/or monetize (building implicit profiles), or otherwise provide value, is of key concern in this sector.
Telemetry data are only one source of information about players, and a relatively recently introduced one at that. While the focus here is on telemetry-driven approaches, it should be mentioned that behavioral profiling has an extended history based on information derived from user testing, surveys, marketing data, etc. The idea of tying observations from gameplay into profiling, or using gameplay behavior as the sole basis for profiling, was introduced much later. Models focusing on player motivations, personality, etc. have been around for over a decade, but only recently have we started correlating these with telemetry data. Similarly, the idea of using telemetry data to draw inferences about player psychology is also relatively recent, and there is limited public and/or systematic knowledge available at the time of writing.
On a final note, it is useful to consider spatio-temporal perspectives in player profiling. All games contain spatial and temporal elements, whether digital or non-digital. Especially in 3D digital games, the spatial dimension is often important to the perceived experience of the game. Furthermore, spatial navigation and positioning are key gameplay elements in many games. A number of approaches for, e.g., trajectory analysis and classification have been adapted for use in Game AI and Game Analytics, for example to detect bot programs, study player tactics or train AI bots. Behavioral analysis can be carried out without considering the temporal and spatial dimensions of play. However, it is often necessary to include one or more of these in order to build the required insights. Snapshot profiling can be done without historical or spatial data, whereas dynamic profiling invariably ends up providing temporal patterns. Similarly, predictive modeling requires temporal information. Neither profiling approach requires spatial information, however. Spatial behavioral data are usually only included when needed given the purpose of the analysis. Additionally, spatio-temporal game analytics can be cumbersome and requires that interpretation be performed in relation to the actual virtual environment. Ignoring this step in the analysis cycle leads to the risk of misinterpreting the root causes of the observed behaviors.
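To give a flavor of what spatial features look like, here is a small sketch that turns a sequence of sampled player positions into a few simple trajectory descriptors. The features chosen (path length, net displacement and their ratio) are assumptions made for illustration, not an established method for bot detection or tactics analysis, and the coordinates are made up.

```python
import math

def trajectory_features(positions):
    """Summarize an ordered list of (x, y, z) position samples."""
    # Total distance travelled along the sampled path.
    path_length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    # Straight-line distance between the first and last sample.
    displacement = math.dist(positions[0], positions[-1])
    # 1.0 means a perfectly straight path; values near 0 mean lots of backtracking.
    straightness = displacement / path_length if path_length > 0 else 0.0
    return {
        "path_length": path_length,
        "displacement": displacement,
        "straightness": straightness,
    }

# Hypothetical position samples from two players.
direct = trajectory_features([(0, 0, 0), (3, 4, 0), (6, 8, 0)])
wandering = trajectory_features([(0, 0, 0), (3, 4, 0), (0, 0, 0), (3, 4, 0)])
```

Features like these only become meaningful once they are read against the level geometry: a low straightness value may reflect a winding corridor rather than anything about the player, which is exactly the interpretation step described above.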
Summarizing, player profiling is an incredibly useful tool as it allows us to bridge the gap between the users and analytics. Profiles can provide a deep understanding of the players, and serve as the basis for a range of analytical techniques including prediction. There is a treasure trove of techniques available, spanning the range from descriptive methods to advanced machine learning, and this means that profiling as an exercise to glean value from player data is open to everyone. In practice, we can get very far with simple methods, even in complex situations. Profiling does, however, require control of every step of the process, and well-considered application of the profiles in practice.