The year is 2025, and a new study by Ibrahim et al. claims: “TikTok’s recommendations skewed towards Republican content during the 2024 U.S. presidential race.” In it, authors from NYU Abu Dhabi conduct a comprehensive analysis of TikTok recommendation trends. They use sock-puppets: automated accounts that simulate user activity. Some sock-puppets are primed with Republican content, while others are primed with Democratic content. Then, they just let the TikTok algorithm do its thing and measure the extent to which Democratic-aligned and Republican-aligned content is recommended. Their finding is (at least to me) unsurprising:
Our analysis reveals significant asymmetries in content distribution: Republican-seeded accounts received ~11.8% more party-aligned recommendations compared to their Democratic-seeded counterparts, and Democratic-seeded accounts were exposed to ~7.5% more opposite-party recommendations on average.
The authors then conclude:
Our findings provide insights into the inner workings of TikTok’s recommendation algorithm during a critical election period, raising fundamental questions about platform neutrality.
This is widely being interpreted as “the algorithm amplifies Republican-aligned content.” But is that the case? Reading this fills me with nostalgia — arguing about this has been one of the joys of my intellectual life. But at the same time, I am afraid we will repeat the same mistake over and over again. Here is why the argument above is flawed:
Recommender systems learn correlations between user preferences.
Sock-puppet accounts encode artificial user preferences.
Therefore, differences in sock-puppet accounts primed with different videos cannot, by themselves, indicate that the recommender system is “biased” or that it “amplifies” specific types of content.
Let me explain with another example. Consider videos not about politics but about two sports (among many): running and Brazilian Jiu-Jitsu. Consider also that the people engaging with videos about these sports differ in profile. Running appeals to a broad spectrum of people, but those people do not watch many running-related videos. Brazilian Jiu-Jitsu, on the other hand, is hit or miss: a relatively small group of “wrestling aficionados” consumes many videos about the sport.
Suppose we create two sock-puppets, one primed with videos about Brazilian Jiu-Jitsu (BJJ) and another with videos about running. What should we expect? I would expect the recommender system to show an “asymmetry” in content distribution. Most people who like BJJ really like BJJ, but most people who enjoy running only “kind of” like it. So if you are optimizing for “engagement,” 5 videos about BJJ are a stronger signal that the person will consume more of that content than 5 videos about running! This has nothing to do with the content itself; it is all about co-viewership patterns, and the whole thing can be observed in a simple content-agnostic recommender system: the type you first learn about in an introductory machine learning class; see Horta Ribeiro et al. (2023).
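To make the co-viewership point concrete, here is a toy sketch of a content-agnostic item-item recommender: cosine similarity computed from a user-by-video watch matrix. Every user and watch count below is invented purely for illustration; the point is that concentrated BJJ viewership yields much stronger same-topic similarities than diffuse running viewership, without the system ever looking at what the videos are about.

```python
import numpy as np

# Columns: [bjj_1, bjj_2, run_1, run_2]; rows are (made-up) users.
# The two BJJ aficionados (rows 0-1) binge both BJJ videos; the casual
# runners each watch the occasional running video.
watches = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])

def item_cosine_sim(M):
    """Item-item cosine similarity from a user x item watch-count matrix."""
    norms = np.linalg.norm(M, axis=0, keepdims=True)
    return (M.T @ M) / (norms.T @ norms + 1e-9)

sim = item_cosine_sim(watches)
print("bjj_1 ~ bjj_2:", round(sim[0, 1], 2))  # ~0.96: one BJJ video strongly predicts the other
print("run_1 ~ run_2:", round(sim[2, 3], 2))  # ~0.33: running views barely co-occur
```

Under similarities like these, a sock-puppet seeded with BJJ videos gets a strong same-topic push, while one seeded with running videos does not, even though nothing about the videos’ content ever entered the model.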
I suspect this is also what is happening here. The most popular Democratic-aligned TikTok profiles include “Jimmy Kimmel Live” and “The New York Times,” which, in my view, are broadly appealing. On the other hand, the most popular TikTok profiles on the Republican side are things like “Ben Shapiro” and “The Charlie Kirk Show.” Thus, a very plausible hypothesis is that you have “BJJ” vs. “running” all over again.
I mentioned that this debate fills me with nostalgia — and that is because we (Society? The research community?) have been arguing about “algorithmic amplification,” “algorithmic bias,” and “algorithmic effects” for quite a while. Already in 2018, there was a lot of concern around YouTube’s algorithm. Zeynep Tufekci wrote an influential opinion piece called “YouTube, the Great Radicalizer,” where she talked about her experiences with the algorithm of the world’s largest video catalogue. Rebecca Lewis wrote an excellent report about the rise of an alternative media cluster on YouTube and the links between contrarian creators within this cluster and other creators espousing much more extreme ideologies. And indeed, many crazy things were happening around social media (and still are!). Which raises the question: to what extent is the algorithm to blame?
In a paper that would open a lot of doors in my career as a researcher, Horta Ribeiro et al. (2020) showed that a lot of people went from consuming contrarian content on YouTube to consuming explicitly white supremacist content. To measure that, we looked at commenting trajectories on YouTube. As far as I know, this paper also pioneered the use of sockpuppets to measure “algorithmic amplification.” Like most pioneering empirical work, our analysis was kind of bad (in hindsight; at the time, it was amazing).
Still, the user trajectories we observed via YouTube comments and the sockpuppet audit painted different pictures. On the one hand, we found that a large fraction of users commenting on extreme videos (something like 40%) previously commented exclusively on “contrarian content.” On the other hand, when looking at the algorithm, we found that “you could reach extreme content from contrarian content” — but this kind of content was definitely not disproportionately recommended. Nonetheless, in discussing our findings, we went along with the whole algorithmic radicalization idea, as we thought that was what was happening.
Shortly after, Munger and Phillips (2020) wrote a compelling counterpoint to the idea of algorithmic radicalization. They argued that the explosion of extreme content online was due to user preferences. They proposed that there existed a demand for extreme content and that YouTube allowed this demand to be met. It is all about the platform's affordances! It is not financially sustainable for a TV channel to cater to 50,000 white supremacists, but it is financially sustainable for a random guy on YouTube to do so. YouTube changed the rules of the game, and new viewership dynamics emerged.
The best empirical work to date is much better aligned with “the supply and demand theory” than with the “algorithmic radicalization theory.” Hosseinmardi et al. (2021) used real online traces from a large (n=300,000) representative sample of users. They found “no evidence that engagement with far-right content is caused by YouTube recommendations systematically.” Instead, “consumption of political content on YouTube appears to reflect individual preferences that extend across the web as a whole.” Chen et al. (2023) paired behavioral and survey data (n=1,181) and showed that “exposure to alternative and extremist channel videos on YouTube is heavily concentrated among a small group of people with high prior levels of gender and racial resentment. These viewers often subscribe to these channels (prompting recommendations to their videos) and follow external links to them.”
But still, the research community has continued doing sockpuppet audits, and the general public has continued to buy the idea that algorithms are somehow “amplifying” specific types of content, without accounting for user preferences. Haroon et al. (2023) published a prominent work in this direction in the prestigious Proceedings of the National Academy of Sciences (PNAS). In this paper, they conducted a massive sock-puppet audit (over 100,000 sock puppets), finding that “a growing proportion of recommendations deeper in the recommendation trail come from extremist, conspiratorial, and otherwise problematic channels.” I don’t think the finding is “wrong”; it may indeed be an interesting quirk of the algorithm. However, it doesn’t tell us whether the algorithm is actually “amplifying” or “favoring” such content.
So, is the TikTok algorithm “amplifying Republicans”? Is YouTube amplifying extreme content? It depends on what you mean by amplification! If you count the number of videos shown under the conditions established by Ibrahim et al. (2025) or Haroon et al. (2023), then yes. But I would propose a more sophisticated notion of amplification: one that takes user preferences into account. Following Stray et al. (2023), I’d argue that an algorithm amplifies a specific type of content if it keeps suggesting that content even after accounting for user behavior. More precisely: an algorithm amplifies a kind of content A over a kind of content B if it disproportionately recommends A over B even though users systematically choose B over A when given the choice.
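To make the distinction concrete, here is a back-of-the-envelope metric in code. This is my own toy formalization (not a method from Stray et al. 2023), and the lists below are made-up placeholders: compare the share of a content type in what the algorithm serves with the share of that type in what users actually pick when both types are on offer.

```python
def amplification(recommended_kinds, chosen_kinds, kind):
    """Positive values: the algorithm pushes `kind` harder than users' own choices warrant."""
    rec_share = recommended_kinds.count(kind) / len(recommended_kinds)
    choice_share = chosen_kinds.count(kind) / len(chosen_kinds)
    return rec_share - choice_share

# Hypothetical example: the algorithm serves kind "A" 40% of the time, but users
# who see both kinds only pick "A" 25% of the time -> amplification of +0.15 for "A".
recommended = ["A", "B", "B", "A", "B", "A", "B", "B", "A", "B"]
chosen = ["B", "B", "A", "B", "B", "B", "A", "B"]
print(round(amplification(recommended, chosen, "A"), 2))  # 0.15
```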
Audits like Ibrahim et al. (2025) or Haroon et al. (2023) do not model user preferences or behavior; they just look at the fraction of “extreme” recommended content as you go deeper and deeper into the recommendation tree, picking videos to watch at random. This is not how people use YouTube or TikTok: recommender systems are slowly shaped by the continuous input of user preferences, and behavior in this simulated (and unrealistic) scenario is a poor metric for studying algorithmic bias and algorithmic amplification. For example, in analyses of actual user behavior by Hosseinmardi et al. (2021) and Chen et al. (2023), extreme content is not typically consumed at the very “end” of long recommendation sessions! It is sought out via channel subscriptions or through external links on other websites and social media platforms.
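For contrast, the “naive” audit loop these papers roughly follow looks something like the sketch below. This is my paraphrase, not their actual code, and `get_recommendations` and `is_flagged` are hypothetical stand-ins for the platform crawler and the content classifier.

```python
import random

def naive_audit(get_recommendations, is_flagged, seed_video, depth=20):
    """Blindly follow random recommendations; track the share of flagged content by depth."""
    current, shares, flagged_so_far = seed_video, [], 0
    for step in range(1, depth + 1):
        current = random.choice(get_recommendations(current))  # no user preferences involved
        flagged_so_far += int(is_flagged(current))
        shares.append(flagged_so_far / step)
    return shares  # "does flagged content grow as we go deeper?"
```

Nothing in this loop resembles a person deciding what to watch next, which is exactly the problem.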
But could we do a sockpuppet audit that doesn’t ignore user preferences? Back in 2023, I teamed up with Homa Hosseinmardi and Duncan Watts’ CSS Lab at UPenn to do just that. Our partnership, led by Homa, resulted in a quirky methodology that we call “counterfactual bots” and a nice paper that also appeared in PNAS (Hosseinmardi et al. 2024). While previous work feeds custom-made media diets to sockpuppet accounts, we feed them real media diets. So, for example, if we have a user named Bob, we get his 2021 YouTube history and train two “digital twins”: two sockpuppets that consumed the same videos as Bob. Then, to answer the question “Is the algorithm amplifying content?”, we consider the subsequent year, 2022. One of the two sockpuppets (the “control” bot) continues to mimic exactly what Bob did: the program watches all the videos that Bob watched in 2022. The other sockpuppet (the “treatment” bot) works like the sockpuppets in other audits: it just roams around YouTube, blindly following the algorithm.
Previous work measures algorithmic amplification by simply looking at the consumption of the “treatment” bot. Instead, we measured algorithmic amplification by contrasting the amount of extreme content the two bots found! The idea is that the “control” bot represents a scenario where the content consumed is shaped by both the algorithm and user preferences, whereas the “treatment” bot represents a scenario where the content consumed is shaped only by the algorithm. If the algorithm favors extreme content, we would expect more extreme content to appear in the treatment bot than in the control bot.
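Here is a minimal sketch of that contrast. The helpers `get_recommendations` and `is_extreme` are hypothetical stand-ins for the crawling and labeling infrastructure; this illustrates the idea, not the actual pipeline of Hosseinmardi et al. (2024).

```python
def extreme_share(videos, is_extreme):
    """Fraction of a watch sequence labeled as extreme."""
    return sum(int(is_extreme(v)) for v in videos) / len(videos)

def counterfactual_audit(bob_history_2022, get_recommendations, is_extreme):
    # Control bot: replays exactly what the real user watched
    # (consumption shaped by the algorithm *and* user preferences).
    control_videos = list(bob_history_2022)

    # Treatment bot: starts from the same point but ignores preferences,
    # always taking the top recommendation (algorithm only).
    treatment_videos, current = [], control_videos[0]
    for _ in range(len(control_videos)):
        current = get_recommendations(current)[0]
        treatment_videos.append(current)

    # Positive difference: the algorithm alone surfaces more extreme content
    # than the user-plus-algorithm baseline. Negative: it surfaces less.
    return extreme_share(treatment_videos, is_extreme) - extreme_share(control_videos, is_extreme)
```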
But what did we find? The treatment bot, where only the algorithm is steering, ends up consuming less extreme content than the control bot. This suggests that the driving force here is user preferences — which are not considered in more naïve audits. Increasing the influence of the algorithm deamplifies the very content that naïve audits conclude the algorithm amplifies. With this in mind, we argue that simple sockpuppet audits are “measuring the wrong thing” and that we cannot draw meaningful conclusions about “algorithmic amplification” from approaches like the ones in Ibrahim et al. (2025) or Haroon et al. (2023).
A story closely related to this is that of the Facebook feed ranking algorithm. In 2020, the docudrama “The Social Dilemma” painted a stark picture. Facebook’s feed algorithm would promote the worst of the worst kind of content: content that triggered outrage, misinformation, yada yada. The documentary is not alone in portraying Facebook’s algorithm as a great force for evil: media outlets have run countless pieces that give the reader the certainty that Facebook algorithms drive highly partisan content, disinformation, etc.
Ultimately, this picture turned out to be not only stark but also misleading. In an unparalleled study, Guess et al. (2023) show that, during the 2020 U.S. election, moving users to a chronological feed (i.e., essentially a feed without an engagement-maximizing algorithm):
Decreased time spent on the platforms.
Increased exposure to political and untrustworthy content and decreased exposure to content classified as uncivil or containing slur words.
“Did not significantly alter levels of issue polarization, affective polarization, political knowledge, or other key attitudes.”
So essentially, removing the algorithm may make Facebook lose some money, but it is no easy fix for the problems of modern society! In some sense, this whole story is akin to what we saw in the YouTube case. Perhaps there is something inherently compelling about the narrative: some hidden force drives people towards opinions or beliefs we think are inappropriate. Munger and Phillips (2020) mention a very different time in history when this happened: the horrors of the World Wars gave rise to the “Hypodermic Needle” model of communication, in which media could “inject” messages into a passive audience. This theory explained the rise of absurd Fascist ideologies: people were simply “vulnerable” to propaganda, much as people today are said to be “manipulated” by the algorithm. But it turns out that this model was very bad at explaining how people change their minds, and perhaps, had we thought more about this, we would have been more skeptical of the strong claims made about the power of the algorithm.
But I want to go somewhere else here. We only got an answer in the Facebook case because of a first-of-its-kind collaboration between Meta (or Facebook) and researchers at top U.S. institutions. This led to a flurry of outstanding social science papers that gave reasonably good answers to many questions people had been studying in less-than-ideal ways for a while.
However, this partnership highlights how hard it is to do this kind of research without corporate support. Companies have meaningful data that could answer societally relevant questions, and the pace of research is harmed tremendously because we lack access to it. Times have changed, as has “the vibe” in Silicon Valley. Still, even when these studies were published, observers were already noting that this kind of collaboration was not a sustainable format for studying algorithmically infused societies (or machine behavior, if you will). The Facebook project had an “independent rapporteur” who wrote a compelling opinion piece after spending an ungodly amount of time auditing the collaboration between academic researchers and Meta (Wagner, 2023). His assessment is nicely summarized at the end of the piece’s abstract:
Though the work is trustworthy, I argue that the project is not a model for future industry–academy collaborations. The collaboration resulted in independent research, but it was independence by permission from Meta.
Indeed, his assessment was spot-on. If anything, collaboration between tech companies and researchers has since decreased. The rise of AI hasn’t helped much. We will likely not see studies of the caliber of those published around the 2020 U.S. election for the next couple of election cycles!
So, where do we go from here? We spent the last 5-ish years disproportionately blaming the algorithm for society’s ills, while the best empirical research has suggested that this is a naïve take. But I don’t think this means we should give up. Algorithms are part of broader sociotechnical systems that can be tweaked with policies, content moderation, and even better ways to embed societal values in the algorithms themselves. I see current research exploring all these directions. For example, recent work from Stanford folks suggests that “Social Media Algorithms Can Shape Affective Polarization via Exposure to Antidemocratic Attitudes and Partisan Animosity”; see Piccardi et al. (2024). Other large-scale projects, like the National Internet Observatory, are trying to get more robust data for people studying our information ecosystem. Initiatives like The Prosocial Design Network are mapping how different interventions (algorithmic or not) can help improve online spaces.
However, as we focus on these new directions, all stakeholders must be more critical in evaluating research on “algorithmic amplification.” Otherwise, we risk having to rediscover, again and again, that algorithms do not exist in a vacuum: their outputs only make sense once user preferences are accurately accounted for. I’d much rather spend my energy reimagining social media and finding new, better ways to study it.
… cough cough … Jonathan Haidt …