I believe this joke is useful for thinking about how opinion polls can go wrong – that is, how we think about the relationship between quality (representative data) and quantity (sample size). Specifically, when I see opinion polls, and in particular unscientific polls, it is easy to say that the data is useless – and that the sample size is too small. The issue is that these are two separate problems, but we can easily forget that.

Meng (2018) describes what he calls the *Big Data Paradox*: “The bigger the data, the surer we fool ourselves.” I believe this is, for the most part, the case with opinion polls. That is, when opinion polls have sample sizes much greater than 1,000, we should ask specific questions about *why* the sample size is greater than 1,000 – and in particular how the pollsters justify their sample size. We should not simply be impressed by the large sample size. Similarly, the solution to achieving better polls is not necessarily more data.

To discuss some of these issues, Meng (2018) shows that there are three (and only three!) sources of potential bias to consider when evaluating the difference between a sample average ($ \bar{Y}_{n} $) and a population average ($ \bar{Y}_{N} $): data quality defect, data quantity, and inherent problem difficulty. The relationship can be written as (from Bradley et al. 2021):

$$ \bar{Y}_{n}-\bar{Y}_{N} = \hat{\rho}_{\small{Y, R}} \times \sqrt{\frac{N-n}{n}} \times \sigma_{Y} $$

Let us begin at the end of the equation: $ \sigma_{Y} $. This is the standard deviation of the outcome we are interested in. The greater the standard deviation, the greater the inherent problem difficulty. If this standard deviation is zero, we will not have any bias (i.e., the difference between the sample average and the population average will be zero). In other words, it is no challenge to reach the true estimate, as it does not matter who we poll (even n = 1 will provide an unbiased estimate). As this standard deviation increases, it becomes more difficult to poll the outcome of interest. This is why we call it the *inherent problem difficulty*. In this post, I will not go into greater detail on problem difficulty but simply assume that it is non-zero (which is a *very* realistic assumption). In most cases, it is outside the control of the researcher.

Next, and of greater interest here, is $ \sqrt{\frac{N-n}{n}} $. This is where the potential of big data comes into play. Specifically, the more people we poll, all else equal, the smaller the bias. In the most extreme example, when $ N = n $, our bias will be zero, and we no longer have to worry about $ \sigma_{Y} $ and $ \hat{\rho}_{\small{Y, R}} $. However, in the polls we are interested in, $ N > n $. The simple point underlying the big data paradox is that all things are *not* equal, and we often cannot increase the sample size in isolation from other potential biases.

The first part of the equation is $ \hat{\rho}_{\small{Y, R}} $. This is related to how representative our poll is of the population of interest, i.e., the *data defect correlation*. This is the correlation between the outcome ($ Y $) and whether the respondent is participating in the poll or not ($ R $). If there is no correlation between values on the outcome and the likelihood of people participating in a poll, there will not be any bias. This is what we achieve with true random sampling (when we have no problems with compliance and the like). In other words, the greater the correlation between the outcome of interest and participation in the poll, the greater the bias. If, for example, we are interested in vote intention ($ Y $) and Conservative voters are more likely to participate in a poll ($ R $), there will be a non-zero correlation (i.e., a bias). Clinton et al. (2022), for example, found that Democrats were more likely to cooperate with telephone interviewers in the 2020 presidential pre-election telephone polls.

The interesting aspect of the equation is that the product of the three terms captures *all* bias in any poll (we are not missing anything). There are no other hidden assumptions or dynamics that can also lead to a systematic bias (i.e., a difference between a sample estimate and a population estimate). It all boils down to the data quality defect, the data quantity, and the inherent problem difficulty. When we multiply the three terms, we get the full bias, and if just one of the terms is 0, we will not have a problem with our poll (i.e., a representative poll will not have any bias even when $ \sigma_{Y} $ and $ \sqrt{\frac{N-n}{n}} $ are greater than zero).
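To see that the identity is exact rather than an approximation, we can check it numerically. The sketch below (in Python, with made-up numbers) builds a finite population, lets participation depend on the outcome, and confirms that the observed error equals the product of the three terms:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical finite population of N = 100,000 outcome values
N = 100_000
Y = rng.normal(50, 10, N)

# Participation (R) depends on Y: above-average respondents
# are twice as likely to take part in the poll
R = rng.random(N) < np.where(Y > Y.mean(), 0.02, 0.01)
n = R.sum()

# Left-hand side: sample average minus population average
lhs = Y[R].mean() - Y.mean()

# Right-hand side: data defect correlation x data quantity x problem difficulty
rho = np.corrcoef(Y, R)[0, 1]    # data defect correlation
quantity = np.sqrt((N - n) / n)  # data quantity term
sigma = Y.std()                  # population sd (ddof=0), problem difficulty

print(lhs, rho * quantity * sigma)  # the two numbers match (the identity is exact)
```

Note that the identity requires the population standard deviation (ddof=0); with the sample convention (ddof=1) it only holds approximately.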

More data (i.e., increasing *n*) will often come at the cost of a greater $ \hat{\rho}_{\small{Y, R}} $. That is, when we work with big data (or just more data), the extra data will often leave us with a non-representative sample of the population (and new problems). When you see a poll with an impressive sample size, be extra cautious about $ \hat{\rho}_{\small{Y, R}} $.

More importantly, even a small correlation between *Y* and *R* can produce an error far greater than what a larger sample size can compensate for. In other words, you cannot easily fix a bias simply by increasing n. On the contrary, you run the risk of making the problem worse. For a good post on this in relation to Meng’s paper, see this 2018 post by Jerzy Wieczorek (see also this Twitter thread).
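A quick back-of-the-envelope calculation (with hypothetical numbers) illustrates the point. Suppose we poll a binary outcome with a 50/50 split ($ \sigma_{Y} = 0.5 $) in a population of 10 million, and the data defect correlation is just 0.005:

```python
import numpy as np

sigma = 0.5        # sd of a 50/50 binary outcome
N = 10_000_000     # population size (hypothetical)
n_big = 100_000    # a huge, slightly non-representative poll
rho = 0.005        # a tiny data defect correlation

# Systematic error of the big poll, via Meng's identity
bias_big = rho * np.sqrt((N - n_big) / n_big) * sigma

# Standard error of a simple random sample of only 1,000
se_small = sigma / np.sqrt(1_000)

print(f"big poll bias: {bias_big:.3f}")  # 0.025, i.e. ~2.5 percentage points
print(f"small SRS se:  {se_small:.3f}")  # 0.016, i.e. ~1.6 percentage points
```

With 100 times the respondents, the non-representative poll still has a larger systematic error than the typical random error of a well-executed random sample of 1,000.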

The quality of an opinion poll is understood as its ability to provide a sample average that is close to the population average. However, we often look at the sample size of an opinion poll and are, all else equal, more impressed by a greater sample size. The problem here is the big data paradox. Again: “The bigger the data, the surer we fool ourselves.” And unless we have data on all cases (*n = N*), which we in most cases do not, we should make sure not to be impressed by the size of the data alone when we are interested in a population average.

With that in mind, how do we know what sample size to use? There are various possible justifications for a specific sample size. Lakens (2022), for example, outlines six of them (from Table 1 in the paper):

- *Measure entire population*. A researcher can specify the entire population, it is finite, and it is possible to measure (almost) every entity in the population.
- *Resource constraints*. Limited resources are the primary reason for the choice of the sample size a researcher can collect.
- *Accuracy*. The research question focusses on the size of a parameter, and a researcher collects sufficient data to have an estimate with a desired level of accuracy.
- *A-priori power analysis*. The research question has the aim to test whether certain effect sizes can be statistically rejected with a desired statistical power.
- *Heuristics*. A researcher decides upon the sample size based on a heuristic, general rule or norm that is described in the literature, or communicated orally.
- *No justification*. A researcher has no reason to choose a specific sample size, or does not have a clearly specified inferential goal and wants to communicate this honestly.

The first justification is having *n ≈ N*. In opinion polls, this is rarely an option – on the contrary, it is one of the reasons we do opinion polls in the first place. Instead, we often work with a combination of the four other justifications, from accuracy (to get a small margin of error) to heuristics (e.g., most representative polls have around 1,000 respondents). In most cases we do face resource constraints, and we want to collect as much data as possible within the constraints (time, money, etc.) we are working with.
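The accuracy justification and the 1,000-respondent heuristic are closely related. For a binary outcome, the worst-case 95% margin of error is $ 1.96\sqrt{0.25/n} $, and a quick calculation (a sketch, not from the post) shows why samples of roughly 1,000 are so common:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(moe, p=0.5, z=1.96):
    """Sample size needed to reach a desired margin of error."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))

print(round(margin_of_error(1_000), 3))  # 0.031, i.e. +/- 3.1 percentage points
print(required_n(0.03))                  # 1068 respondents for +/- 3 points
```

Note, of course, that this margin of error only describes random sampling error – it says nothing about the data defect correlation discussed above.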

When we talk about big data, we often note that more data is not always better, and hence big data is not a selling point in and of itself. That is, increasing *n* might even correlate positively with $ \hat{\rho}_{\small{Y, R}} $ – hence the big data paradox. Accordingly, a lot of data is a necessary but not sufficient condition for a good opinion poll. The paper by Meng (2018) helps us understand some of these challenges in the domain of polling (and I can highly recommend reading it).

`geom_rect_pattern()` from the ggpattern package to apply a specific pattern in the visualisation of statistical uncertainty. Specifically, I have been working with the visualisation of opinion polls, where it is, as we know, good practice to show 95% confidence intervals.
Here is an example with one of the recent opinion polls from Voxmeter, which among other things shows how strongly Socialdemokratiet stands in the polls relative to the other parties:

The visualisation works quite well, as the focus is not only on the uncertainty; it also blends naturally with the point estimate itself. It is thus relatively easy to read off both the support for a party and the upper and lower confidence bounds. It is not necessarily a better way to visualise statistical uncertainty, but it never gets boring to try out new functions in R.

As always, the code can be found on GitHub.

- Caprese Bruschetta and Serrano Crisps with Caramelised Red Onion and Rocket Salad
- Cheese Burger with Wedges and Slaw
- Cheesy Chipotle Bean Quesadillas with Avocado, Tomato and Rocket Salad
- Chicken & Plum Noodle Stir-Fry with Bok Choy
- Chicken and Spinach Curry with Rice and Mango Chutney
- Chicken Thigh Chow Mein with Peppers and Green Beans
- Creamy Mushroom Pasta with Balsamic Dressed Rocket
- Creamy Spiced Lentil Curry with Roasted Cauliflower, Sweet Potato and Spinach
- Creamy Truffle and Mushroom Rigatoni with Tenderstem® Broccoli and Walnuts
- Fragrant Beef Pilaf with Flaked Almonds, Spinach and Coriander Yoghurt
- Fragrant Chicken Laksa with Red Peppers and Noodles
- Herby Crispy Skin Chicken with Sticky Baked Veg
- Jerk Style Chicken and Black Bean Curry with Basmati Rice
- Korean Style Beef Tacos with Sriracha Mayo and Pickled Onion
- Lentil Veggie Chilli with Zesty Rice
- Lentil ‘Bolognese’ with Baby Spinach, Mushrooms and Linguine
- Lentil Sambar Curry With Roasted Aubergine and Salted Peanuts
- Mexican Style Chicken & Sweetcorn Stew with Cheese and Garlic Ciabatta
- Mexican Spiced Beef Tostadas Rapidas with Soured Cream
- Panzanella Salad with Roasted Butternut and Crumbled Feta
- Paprika Chicken with Bulgur and Olive Jumble
- Pesto-Crusted Lamb with Nutty Asparagus Salad and Proper Roasties
- Plant Based Chilli Loaded Sweet Potato Fries with Limey Tomato and Avo Salsa
- Pork and Black Bean Tacos with Pickled Red Onion, Chipotle Tomatoes and Lettuce
- Pork and Chickpea Stew With Garlic Ciabatta
- Pork and Lentil Curry with Naan Bread
- Presto Bacon and Mushroom Linguine with Asparagus
- Quick Chilli with Basmati Rice and Sour Cream
- Red Lentil and Spinach Dal with Roasted Aubergine
- Refried Bean and Halloumi Tacos with Chipotle Mayo
- Roasted Aubergine and Spinach Dal with Rice and Greek Yoghurt
- Pork, Sage and Onion Creamy Spaghetti with Cavolo Nero
- Sicilian Style Aubergine and Pepper Caponata Stew with Cannellini Beans and Garlicky Ciabatta
- Smokey Chicken With Cheesy Mash and Garlicky Green Beans
- Smoky Mexican Style Bean Stew with Roasted Peppers, Feta and Tortilla Chips
- Speedy Sausage Pasta with Spinach
- Spiced Chicken Breast, Pepper and Bulgur Jumble with Roasted Tenderstem® and Yoghurt
- Spicy Creamy Chicken Pasta with Spinach
- Sri Lankan Style Sweet Potato Curry with Green Beans, Basmati Rice and Cashews
- Superfast Asian-Spiced Pork Noodles With Stir Fried Green Pepper
- Superquick Beef Ragu with Fusilli
- Superquick Beef Ragu with Penne Pasta and Spinach
- Teriyaki Sesame Chicken with Green Beans and Basmati Rice
- Tofu Massaman Curry with Green Beans and Zesty Rice
- Turmeric Roasted Cauliflower with Lentil and Coconut Dal
- Veggie Laksa Soup with Mushrooms and Green Pepper
- Veggie Moussaka with Cheat’s Garlic Bread
- Veggie Packed Chilli with Brown Rice and Zesty Soured Cream
- Warm Panzanella Salad (V) with Chilli and Crumbled Feta

1551. Aldrich-McKelvey Scaling: An Opinionated Guide to the Method and the Literature

1552. Reviewing Tip: The 10-Minute Data Check

1553. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

1554. Data Analysis at the Command Line

1555. Charts.css: CSS data visualization framework

1556. Understanding ShinyApps

1557. Understanding the Basics of Package Writing in R

1558. Alternatives to paired bar charts

1559. A bestiary of undead statistics

1560. The Data Science Interview Book

1561. Getting started with NHS mental health data: A practical guide for analysts

1562. Valgdata 2022

1563. Reinforcement Learning Fundamentals

1564. Hypothesis testing by example

1565. CS231n: Deep Learning for Computer Vision

1566. Modeling the secular trend in a cluster randomized trial using very flexible models

1567. A Curated Collection of Data Management Resources

1568. Beautiful Public Data

1569. Delete all your tweets using rtweet

1570. Failed Machine Learning (FML)

1571. Waste datasets review: List of image datasets with any kind of litter, garbage, waste and trash

1572. Bayesian neural network papers

1573. Images and Words: AI in 2026

1574. A Selective Review of Negative Control Methods in Epidemiology

1575. From migration to railways, how bad data infiltrated British politics

1576. R for Data Analysis

1577. Creating beautiful tables in R with {gt}

1578. Building a TidyModels classification model from scratch and deploying with Vetiver

1579. Let’s Get Apply’ing

1580. Fixing broken and irregular column headers

1581. Using functional analysis to model air pollution data in R

1582. Working with coalition data (part 1)

1583. Coalition data II: trials and tibblations

1584. Raster4ML: A geospatial raster processing library for machine learning

1585. Awesome Wikidata

1586. Predict missing values

1587. A global dataset of pandemic- and epidemic-prone disease outbreaks

1588. Variations on a ggtheme: Applying a unifying aesthetic to your plots

1589. The 12-bit rainbow palette

1590. conText: An R package for estimating and doing statistical inference on context-specific word embeddings

1591. Forecasting the 2022 World Cup

Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89

The second decade of the twenty-first century witnessed a significant ‘rightward drift’ as populists in the West scored striking electoral gains. We argue that this reflects a shift in the power of electoral cleavages that is asymmetric in nature. Specifically, voters for whom immigration is salient are more likely to switch to conservative and national populist parties than to liberal or left-wing parties. We leverage data from three prominent cases, the United States, Britain and Germany, to demonstrate that immigration-specific asymmetric realignment occurred in the three countries. These findings have implications for our understanding of electoral politics, populism and the emerging ‘culture divide’ in party systems.

You can find the article here. The replication material is available on Harvard Dataverse and GitHub.

682. Great Works in Computer Science

683. The Geometry Junkyard

684. Acquisition of chess knowledge in AlphaZero

685. deep-finance: Deep Learning for Finance

686. Notes on Saudi Arabia

687. Academic urban legends

688. Interactive Vim tutorial

689. 98.css

690. What’s wrong with medieval pigs in videogames?

691. What makes a photo cinematic?

692. The Typing of the RegEX

693. Mapping Museums Project

694. How This All Happened

695. How Do Regular Expressions Really Work?

696. King of the Underworld: Building The Thames Tunnel

697. L’ange (1982)

698. Things your manager might not know

699. The Art of the Desk Setup & The Evolution of the Desk Setup

700. Misconceptions: Some common geographic mental misplacements

701. Enclaves & Exclaves: A tour of the world’s geographically engulfed and orphaned places

702. Variability, Not Repetition, is the Key to Mastery

703. Nord: An arctic, north-bluish color palette

704. Use RSS for privacy and efficiency

705. Why All Mosques Look the Same

706. The Proust Questionnaire

707. How I Make Espresso: Tools and Techniques

708. Butterick’s Practical Typography

709. The truffle industry is a big scam. Not just truffle oil, everything

710. On Tea and the Art of Doing Nothing

711. Shou vs. Sheng Puerh Tea

712. Product Hunt: The most upvoted products, each day in every month

713. Box Breathing Techniques and Benefits

714. The Show About the Show

715. A tour of The UK and Ireland in accents

716. Demystifying financial leverage

717. Hobo Economicus

718. Apple Ranking: The definitive list of good and bad apples

719. How to Think About Relativity

720. Things could be better

Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21

The idea came about while working on Introduktion til R, where I slowly began putting together an appendix with an overview of the different functions one could use. However, I quickly realised that it would take up a lot of space, and that it would be much easier to search an HTML file than to decipher a large table.

As inspiration, I have used How Do I? …(do that in R), which has a similar overview of how to do various things in R. See also this nice table.

The page is a work in progress, and I plan to add more examples over time. The material used to build the table on the page is freely available on GitHub. If you have suggestions for changes or additions, I would love to hear from you.

There are relatively easy ways to improve the visual presentation of findings in interrupted time series studies. In the paper *Creating effective interrupted time series graphs: Review and recommendations*, you will find specific recommendations for improving such figures.

The recommendations relate to the data points, the interruption, trend lines, the counterfactual, additional lines, and general graph components. In addition, the authors emphasise that some of the recommendations are *core* whereas others are *additional*.

For data points, it is important to 1) plot each data point, 2) show the same points as used in the analysis, 3) line up the data points with x-axis tick marks, and 4) not join data points with lines. Accordingly, in good papers, it is easy to get a sense of the data simply by looking at the visualisation.

For the interruption in the visualisation, it is important to 1) clearly show the interruption (e.g., with a vertical line), 2) show any potential transition in the period (if it is not a clear interruption), and 3) label the interruption line. In other words, a good interrupted time series graph provides a lot of details on the actual interruption being studied.

For the trend lines, it is important to 1) plot both the fitted pre- and post-interruption trends, 2) use bold and solid lines for fitted trends, and 3) match the colours for the trend line and data points. Without these trend lines, it is next to impossible to evaluate whether there is an interruption in the outcome of interest. Here, it is also relevant to add a counterfactual trend line (i.e., the estimated trend in the absence of the event of interest), and in particular to use a different line pattern.

For additional lines, the most important thing to consider is reporting uncertainty. This is something that the paper is not devoting significant attention to, but I believe this is paramount for a good visualisation. Specifically, make sure to show 95% confidence intervals around the trend lines (or something to that effect).

Last, there is a series of aspects to consider in relation to general graph components. The core aspects are to: 1) show axis tick marks, 2) label the axes, 3) align axis labels with axis tick marks, and 4) include axis titles with the name and unit of measurement. In addition, it is worth considering the use of grid lines, the scale, the visual impact of additional text, horizontal text, and colourblind-friendly colours.
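As a rough sketch of how the core recommendations come together (made-up monthly data, in Python rather than anything from the paper): individual points, a labelled vertical interruption line, solid fitted pre- and post-trends in the same colour as the points, and a dashed counterfactual:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
t = np.arange(24)   # 24 monthly observations
cut = 12            # interruption after month 12

# Hypothetical outcome: upward trend with a level drop at the interruption
y = 10 + 0.2 * t + np.where(t >= cut, -3.0, 0.0) + rng.normal(0, 0.5, 24)

pre, post = t < cut, t >= cut
b_pre = np.polyfit(t[pre], y[pre], 1)    # fitted pre-interruption trend
b_post = np.polyfit(t[post], y[post], 1) # fitted post-interruption trend

fig, ax = plt.subplots()
ax.plot(t, y, "o", color="steelblue")        # plot every data point
ax.axvline(cut - 0.5, color="black")         # clearly show the interruption...
ax.text(cut - 0.4, y.max(), "Intervention")  # ...and label it
ax.plot(t[pre], np.polyval(b_pre, t[pre]), "-", color="steelblue", lw=2)
ax.plot(t[post], np.polyval(b_post, t[post]), "-", color="steelblue", lw=2)
ax.plot(t[post], np.polyval(b_pre, t[post]), "--", color="grey")  # counterfactual
ax.set_xlabel("Month")
ax.set_ylabel("Outcome (rate per 1,000)")
fig.savefig("its_graph.png")
```

Confidence bands around the trend lines (e.g., from a segmented regression model) would be the natural next addition, per the point about reporting uncertainty above.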

There are some good examples in the paper as well. For example, consider Figure 6 where panel A is a figure from a published paper and panel B is the revised figure taking the recommendations into account:

In the original figure, vertical bars are used to show the monthly proportions. Such bar charts are not ideal as it is difficult to get a sense of the variation in the data over time, and it needs to include 0 on the y-axis in order not to mislead. Here is how Turner et al. (2021) describe the improved version:

The data points have been plotted, which allows the spread of the data to be more easily seen, allows the data to be extracted and reduces the visual clutter. The interruption has been represented by a vertical line which is labeled. The counter-factual and trend lines have been plotted, allowing the reader to more easily see the impact of the intervention. The x-axis has been adjusted so that the points are more clearly aligned with the tick marks to facilitate data extraction. The range of the y-axis has been decreased to allow the data to fill the available space. Using additional text, the level and slope changes are given, along with 95% confidence intervals.

Turner et al. (2021) further looked at 217 interrupted time series graphs related to public health interventions to examine what recommendations were followed in the literature. 73% of the graphs had a line representing the time of the interruption, but only 17% of the graphs had a line for the counterfactual trend.

In sum, interrupted time series graphs are great and there are easy ways to make such graphs effective.

In CSGO, a first-person shooter video game, two teams (Ts and CTs) compete against each other in best-of-30 rounds on a series of maps (i.e., the first team to win 16 rounds wins the map). In brief, the maps are virtual worlds (I know maps like Dust2 and Inferno better than my own neighborhood). Each team has five players with 100 health points (HP) each in every round. When a player reaches 0 HP, the player is gone for the rest of the round. The Ts can win a round by either killing all CTs or successfully detonating the bomb. The CTs can win by killing all Ts, defusing the bomb, or letting the time of the round run out. Before each round begins, the players can buy equipment (such as guns and armor).

Before two teams even compete, they have to decide what maps to play through a pick-and-ban process. That is, there is a map pool (of seven maps), and the teams pick the maps that they are best at and ban the ones that their opponents are better at. Of course, this selection is anything but random, and the teams should consider how to increase their odds of winning.

Petri et al. (2021) provide a study of the map selection process for a best-of-three match (see the figure below for an illustration of the process). In their study, they use data on 8,753 games (165 professional teams playing a total of 3,595 matches) to show how using machine learning to pick maps can increase the expected win probability by ~10 percentage points.

Notice that a good team is able not only to consider what maps they are best at and what maps their opponent is best at, but also to take into account the expectations of what maps will be played. A good team can then surprise the opponent by preparing a map they usually do not play (thereby making it more difficult for the opponent to make a strategy against the team, i.e., to anti-strat).

Once the two teams have decided what maps to play, it is all about winning the maps. The most straightforward way to win a map is to eliminate the enemy team. To value a player, we can therefore look at how many kills that specific player got, i.e., his or her kill-death ratio (KDR).

$$ KDR = \frac{Kills}{Deaths} $$

The more kills a player gets (relative to his or her number of deaths), the better the KDR. The main limitation is that it captures very little of what is actually going on in a round. You can in theory be the most influential player on the team and deal 495 damage and still have a KDR of 0. One alternative is to consider the average damage per round (ADR).

$$ ADR = \frac{Total\;damage}{Rounds} $$

The main limitation is that it is still a very simplistic measure that does not take a series of other influential factors into account. For one, it does not take into account how often you survive or the type of damage you are doing. One metric that takes these factors into account is KAST, i.e., the *kills*, *assists*, *survivals*, and *trades* provided by a player (a trade, for example, is when you are able to avenge the death of a teammate shortly after he or she is killed). We can then divide this by the number of rounds to get the KAST%.

$$ KAST\% = \frac{Kills + Assists + Survivals + Trades}{Rounds} $$

The problem with KAST is that it does not distinguish between the different inputs, so surviving a round counts just as much as providing a kill. In addition, there are various factors still not being addressed.
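The three simple metrics above are easy to compute. A minimal sketch (with hypothetical player stats, and using the simplified sum-based KAST formula given above):

```python
def kdr(kills, deaths):
    """Kill-death ratio."""
    return kills / deaths

def adr(total_damage, rounds):
    """Average damage per round."""
    return total_damage / rounds

def kast(kills, assists, survivals, trades, rounds):
    """KAST%, using the simplified sum-based formula above."""
    return (kills + assists + survivals + trades) / rounds

# Hypothetical player over a 30-round map
print(kdr(20, 16))             # 1.25
print(adr(2400, 30))           # 80.0
print(kast(10, 3, 8, 2, 30))   # ~0.77
```

The limitation discussed above is visible in the code: every input to `kast()` is weighted equally.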

In order to provide a more nuanced rating, HLTV, the world’s leading CSGO site, introduced their HLTV Rating 1.0 back in 2010. The rating for an individual player is given by this formula (RWMK stands for Rounds With Multiple Kills):

$$ Rating\;1.0 = \frac{Kill\;rating + 0.7 \times Survival\;rating + RWMK\;rating}{2.7} $$

This measure can go from 0 to 3. The kill rating is calculated as Kills/Rounds/AverageKPR, where AverageKPR is 0.679 (average kills per round). The survival rating is calculated as (Rounds-Deaths)/Rounds/AverageSPR, where AverageSPR is 0.317 (average survived rounds per round). The RWMK rating is calculated as (1K + 4*2K + 9*3K + 16*4K + 25*5K)/Rounds/AverageRMK, where AverageRMK is 1.277 (an average value calculated from rounds with multiple kills) and xK is the number of rounds with exactly x kills. Accordingly, it is better to kill five players in one round than one player every round for five rounds (you can find more information on the rating here).
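Since the 1.0 formula is public, it is straightforward to implement. A sketch following the description above (the player stats are made up):

```python
AVG_KPR = 0.679  # average kills per round
AVG_SPR = 0.317  # average survived rounds per round
AVG_RMK = 1.277  # average multi-kill value per round

def hltv_rating_1(rounds, kills, deaths, multi_kills):
    """HLTV Rating 1.0. multi_kills[k] = number of rounds with exactly k+1 kills."""
    kill_rating = kills / rounds / AVG_KPR
    survival_rating = (rounds - deaths) / rounds / AVG_SPR
    # RWMK: rounds with x kills are weighted by x squared (1, 4, 9, 16, 25)
    rwmk = sum((k + 1) ** 2 * r for k, r in enumerate(multi_kills)) / rounds / AVG_RMK
    return (kill_rating + 0.7 * survival_rating + rwmk) / 2.7

# Hypothetical player: 30 rounds, 20 kills, 16 deaths,
# with 8 one-kill, 3 two-kill and 2 three-kill rounds
print(round(hltv_rating_1(30, 20, 16, [8, 3, 2]), 2))  # 1.11
```

The quadratic weights in the multi-kill term are what make five kills in one round worth far more than one kill in each of five rounds.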

The HLTV Rating 1.0 is an improvement as it takes into account additional factors and puts more emphasis on kills vis-a-vis survivals (given the fact that the survival rating is multiplied by 0.7). However, it did not take some of the measures introduced above into account, such as KAST and damage. For that reason, HLTV introduced its HLTV Rating 2.0 in 2017. Here is a visual comparison between Rating 1.0 and Rating 2.0:

We are now looking at five separate inputs: KAST rating, kill rating, survival rating, impact rating, and damage rating. The blue and orange colours indicate that these ratings are calculated for a player both on the T side and the CT side of the map (because the expected values differ for the two sides). The impact rating includes various types of impactful actions on a map, such as multi-kills, opening kills, and 1onX wins. The main limitation of the 2.0 rating is that the formula is not publicly available and it is, for that reason, impossible for others to replicate the results.

Another limitation is that there are still important aspects that are not taken into account. For example, some kills are more important in certain contexts, and in-game economy decisions are also crucial for the performance of a player and a team. Unsurprisingly, scientists have tried to improve the models used to evaluate the performance of players.

Xenopoulos et al. (2020), for example, study the Win Probability Added (WPA) per round. They explore 4,682 matches with 70 million unique in-game events. The results confirm that player outcomes are heavily dependent on the context, as some game situations are harder or easier than others, and equipment value increases the win probability the most (more than remaining HP). That is, simply by looking at the economy of a team and what they are able to buy, we can make good predictions about the likely outcome of a round.

Xenopoulos et al. (2021) further look at how teams allocate their in-game dollars on equipment and the different strategies teams can use in different situations. They estimate a game-level win probability model with a measure of ‘Optimal Spending Error’ in order to rank teams by how far actual spending decisions deviate from the optimal decisions.

There is a lot of data that can be explored in CSGO, including spatiotemporal data, exploring where different players are on the map and how that captures the performance of individual players and teams. However, if you are familiar with the above-mentioned metrics, you should be able to get a sense of why some players are better ranked than others in CSGO.

That is, the first election the book would cover is the 2022 general election. The following chapter would then be the 2019 general election, and so on. Each chapter could cover what characterised the election in question, with a particular focus on what was new about it (in 2022, this could be aspects such as Moderaterne, § 77 of the electoral law, Radikale Venstre's demand for an election, etc.) as well as the similarities it had with the previous election.

Why do it this way? Because, in my view, it is more rewarding to read about a general election in 1953 when it is approached from the elections that stand clearest in memory, as opposed to starting with an early election and working one's way up through history. The idea, in other words, is that telling events in the order 3-2-1 gives a different narrative as well as systematic dynamic than one can achieve by telling them in the order 1-2-3.

In general, I find recent political history more interesting, precisely because it is easier to relate to; it quickly becomes a harder task not only to understand the relevance of older history, but also to follow it when there is no explicit connection to the events the reader remembers better. When I look at Søren Mørch's book *25 statsministre*, I find it more interesting to read the chapter on Poul Nyrup Rasmussen than the one on Erik Scavenius, and I believe that, for most readers, it would be better to read the chapters on the first prime ministers last, even though they are among the first chapters in the book.

Writing a book about Danish general elections in reverse chronological order would admittedly make for an alternative kind of history book. But it is precisely because of this that I would find such a book interesting to read. A kind of Danish politics meets *Memento*. Consider the idea hereby passed on.