Guess et al. (2023) showed in a prominent paper in Science that switching some users to a chronological (rather than algorithmic) feed for three months around the 2020 US election “did not significantly alter levels of issue polarization, affective polarization, political knowledge, or other key attitudes.” However, in a new correspondence, also in Science, Bagchi et al. (2024) question the validity of that prominent study: Meta changed the algorithm around the election, during the study period, which, they argue, calls the study’s findings into question. More specifically, Bagchi et al. (2024) argue the study may mislead readers “to conclude that the Facebook news feed algorithm used outside of the study period mitigates political misinformation compared to (the) reverse chronological feed.”
Bagchi et al. (2024) base their claims on an analysis of a large dataset released by Meta containing how often users viewed, clicked, and shared URLs on Facebook. Using a list of outlets known for low-quality reporting (from Media Bias Fact Check), they find a drop in the share of views going to these outlets that starts in early November 2020 and lasts until early March 2021. This coincides with changes Facebook made to its algorithm in November. Without such changes, Guess et al. (2023) may have found a positive effect!
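To make concrete the kind of aggregation their letter describes, here is a minimal sketch in Python; the toy data and column names are hypothetical stand-ins for Meta's URL-level dataset, not Bagchi et al.'s actual pipeline.

```python
# Hypothetical sketch of the aggregation described above: per week, the share
# of URL views that go to outlets rated low-quality by Media Bias Fact Check.
# Toy data and column names are mine, not Meta's or Bagchi et al.'s.
import pandas as pd

views = pd.DataFrame({
    "week": pd.to_datetime(["2020-10-26", "2020-10-26", "2020-11-09", "2020-11-09"]),
    "domain": ["example-news.com", "lowquality-site.com",
               "example-news.com", "lowquality-site.com"],
    "view_count": [1_000_000, 60_000, 1_000_000, 30_000],
})

low_quality_domains = {"lowquality-site.com"}  # stand-in for the MBFC-based list

# Views going to low-quality outlets, then the weekly share of all views.
views["lq_views"] = views["view_count"] * views["domain"].isin(low_quality_domains)
weekly = views.groupby("week")[["lq_views", "view_count"]].sum()
weekly["low_quality_share"] = weekly["lq_views"] / weekly["view_count"]
print(weekly["low_quality_share"])  # the drop they report starts in early November
```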
The authors responded with their own letter, hereinafter Guess et al. (2024). They make three key arguments against Bagchi et al. (2024), all of which I found reasonable:
First, the experiment's internal validity is not impacted by these changes. Accurate causal conclusions can be reached for the specific period analyzed. Facebook did enact these changes, after all.
Second, the data analyzed by Bagchi et al. (2024) are incomplete: the dataset only contains URLs shared more than 100 times and excludes content posted directly on Facebook (which could itself be misleading). When analyzing the Guess et al. (2023) data, they found little change in the fraction of unreliable sources before vs. during the study period. In other words, in their control group (where the recommender algorithm is enabled), they do not observe this drop in the fraction of untrustworthy content.
Third, the evidence used by Bagchi et al. (2024) is not causal. The “information ecosystem” was absolutely crazy between November 2020 and March 2021; the observed changes might have been driven by the many other things going on, e.g., Biden being elected.
Things I wish the letters had done
After reading each letter, I wished that more analyses had been conducted.
When reading Bagchi et al. (2024), I found in their references a fairly precise description of the algorithmic changes Facebook enacted. It was obtained by the January 6th Committee (but left out of the final report). The most important ones:
- (A) Filter low News Ecosystem Quality (NEQ) pages from “Pages You May Like” to prevent low-quality and misinformation pages from going viral.
  - Launched 10/22
  - Reduced to 75% on 12/1, then 50% on 12/3, 25% on 12/8, and deprecated on 12/10
  - Relaunched in response to January 6th
- (B) Deploy the virality circuit breaker, which reduces the likelihood that URLs from new or unknown external domains (which may contain misinformation) get boosted.
  - Launched 10/9 at the 100x threshold
  - Launched 10/23 at the 25x threshold
  - Reduced to 75% on 12/1, then 50% on 12/3, then 25% on 12/8
  - Deprecated on 12/10
- (C) Demote content from users who posted multiple pieces of third-party fact-checked misinformation in the past 30 days.
  - Launched 11/5
  - Reduced to 50% on 12/2, then deprecated on 12/3
  - Relaunched 1/14
  - Deprecated 1/29
- (D) Demote low-NEQ news and boost high-NEQ news to increase the average quality of news in the connected news feed.
  - Launched 11/7
  - Reduced to 75% on 12/1, then 50% on 12/3, 25% on 12/8, and deprecated on 12/10
  - Relaunched 1/13
  - Deprecated 2/16
So, basically, two of the relevant changes (A and B) were deployed in October. This does not coincide with the sudden drop in untrustworthy news in early November observed by Bagchi et al. (2024). The other two relevant changes (C and D) were deployed in early November but were deprecated in December! If they were driving the drop, I would expect the consumption of untrustworthy news to have risen again in December.
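To make this timeline argument explicit, here is a quick sketch that encodes the windows above as I read them (the leaked dates may be imprecise, and measure A's post-January-6 relaunch has no exact date, so I leave it out) and checks which measures were active at two points in time.

```python
# Rough timeline check based on the (leaked, possibly imprecise) dates above.
# Ramp-downs are ignored: each window simply runs from launch to deprecation.
from datetime import date

interventions = {
    "A (filter low-NEQ pages)":       [(date(2020, 10, 22), date(2020, 12, 10))],
    "B (virality circuit breaker)":   [(date(2020, 10, 9),  date(2020, 12, 10))],
    "C (demote repeat misinformers)": [(date(2020, 11, 5),  date(2020, 12, 3)),
                                       (date(2021, 1, 14),  date(2021, 1, 29))],
    "D (demote low-NEQ news)":        [(date(2020, 11, 7),  date(2020, 12, 10)),
                                       (date(2021, 1, 13),  date(2021, 2, 16))],
}

def active_on(day):
    """Names of measures whose launch-to-deprecation window covers `day`."""
    return [name for name, windows in interventions.items()
            if any(start <= day <= end for start, end in windows)]

print("Active in early November:", active_on(date(2020, 11, 9)))   # all four measures
print("Active in late December:", active_on(date(2020, 12, 20)))   # none, yet the drop persisted
```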
When reading Guess et al. (2024), I wondered why they wouldn’t simply rerun the analysis in Figure 1 (reproduced above), restricting it to the (short) period before the algorithmic changes were enacted. In other words, couldn’t they have redone the analyses considering only the window between the start of the experiment (in late September) and the changes pointed out by Bagchi et al. (2024)? This would have been a simple, elegant way to determine how much the algorithmic changes mattered.
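For what it is worth, such a check seems mechanically simple. Here is a hypothetical sketch (not Guess et al.'s actual pipeline; the data frame, column names, cutoff date, and numbers are made up for illustration) of restricting the comparison to the pre-change window.

```python
# Hypothetical sketch of the suggested robustness check: keep only outcomes
# observed between the start of the experiment (late September 2020) and the
# early-November algorithm changes, then compare the two arms.
# All data, column names, and the cutoff date below are made up.
import pandas as pd

exposures = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "arm": ["chronological", "chronological", "algorithmic", "algorithmic",
            "chronological", "chronological", "algorithmic", "algorithmic"],
    "date": pd.to_datetime(["2020-09-28", "2020-11-20"] * 4),
    "untrustworthy_share": [0.031, 0.022, 0.029, 0.026, 0.033, 0.021, 0.028, 0.025],
})

CUTOFF = "2020-11-03"  # assumed date of the first post-election changes
pre_change = exposures[exposures["date"] < CUTOFF]

means = pre_change.groupby("arm")["untrustworthy_share"].mean()
print("Pre-change difference (chronological - algorithmic):",
      round(means["chronological"] - means["algorithmic"], 3))
```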
But what is external validity here, anyway?
Guess et al. (2023) used cautious language to describe their findings, indicating that they were hard to generalize. However, the lack of external validity is inherent to the problem of algorithmic effects! Let me explain. Implicit in both the letter by Bagchi et al. (2024) and in an editorial discussing the exchange is the idea of a “canonical” Facebook algorithm (the editorial calls it the “site’s default algorithm”). But Facebook has no default algorithm; the algorithm is constantly changing as new data is posted on the website and new tweaks are made to the dozens of models it relies on.
Bagchi et al. (2024) are correct in two senses. First, the original study by Guess et al. (2023) would have been even stronger had it studied the pre-changes algorithm. Second, the original paper should have mentioned the algorithmic changes made for the election. The study would also have been stronger if it had included, from the start, the kinds of sanity checks Guess et al. later ran in their response letter (and stronger still if the analysis had been redone separately for periods under different algorithmic “regimes”). However, Bagchi et al. are misleading in calling these changes a threat to the study's overall validity. If anything, they are an incremental threat to its external validity, which the authors themselves had already flagged as limited.
So what?
Outside the ivory tower, however, things get a bit weird. Meta’s Nick Clegg (President of Global Affairs) said that the original paper undermined claims that the site was designed to “serve people content that keeps them divided,” which is clearly false: the social dynamics induced by platforms like Facebook transcend their algorithm. In that context, I guess the letter may serve as a reminder that more work is needed to study the effect of social media (and of algorithms) on society.
But we must be careful! This letter and the reaction to it may also lead to a misleading narrative. The press release from University College Dublin had the sensationalist title “New eLetter in Science debunks Meta-funded study suggesting its news-feed algorithms are not major drivers of misinformation.” It also contains a quote by one of the authors of the letter that is absolutely not supported by the letters: “Our results show that social media companies can mitigate the spread of misinformation by modifying their algorithms but may not have financial incentives to do so.” This is false! The letter adds incremental concerns about the external validity of the study. That’s it.
One reasonable question to ask here is: what does the study (and the letter) tell us that will be helpful for the future? I share Kevin Munger’s take that we should think about these things in a Bayesian sense. The original Science study suggests that the Facebook algorithm will not be a key driver of polarization in the next election. The letter indicates that, in the absence of such special changes, we might expect it to have a slightly bigger effect. In my view, the evidence from the Science study is stronger than that from the letter, given the response by Guess et al. (2024).
Additional comments
Przemyslaw Grabowicz, the corresponding author of the letter, wrote interesting comments on this post. They are in the comments section just below, but I append them here:
Manoel, I greatly appreciate your feedback, since it makes me realize that I should write more about our Science eLetter and its implications. Once I find time, I'll put together a proper overall response at UncommonGood.substack.com. Meanwhile, let me quickly respond to the "three key arguments against Bagchi et al. (2024)" that you've taken from the response eLetter by Guess et al.
First, I don't think the paper by Guess et al. is internally valid. Consider the following. If a paper computes a causal estimate based on an experiment, but the control condition is meaningfully changed during the experiment specifically to affect the target causal estimand, and the paper doesn't reveal anything about that specific change, nor accounts for it, then the change could result in any desirable value of the causal estimate, without revealing anything about how it happened. In other words, a causal claim must define exactly what the control and treatment conditions are. If it doesn't, then the causal claim may be invalid in situations where the description misses something meaningful that's related to the estimand. If the change was not described, then the default assumption should be that there was no meaningful change during the experiment. However, during the experiment of Guess et al. there was a meaningful change introduced...
Second, you write that:
> Guess et al. (2023) data, they found little change in the number of unreliable sources pre- vs. post-study period. In other words, in their control group (where the recommender algorithm is enabled), they don’t observe this drop in the fraction of untrustworthy content.
Ok, so let's see what exactly Guess et al. write in their response eLetter; I quote:
> Over the 90 days prior to the treatment, untrustworthy sources represented 2.9% of all content seen by participants in the Algorithmic Feed (control) group – during the study period, this dropped only modestly to 2.6%.
So, according to their own measure, there was a drop in the fraction of misinformation from 2.9% to 2.6%, which is a 10.5% relative drop (0.3/2.9), whereas we reported a 24% drop. Note, however, that only about half of their treatment period overlaps with the period of Facebook's emergency interventions. If it overlapped entirely, then instead of a 10.5% drop we would probably observe a ~21% drop. That starts to be quite close to the 24% drop we measured using a different dataset and a different notion of misinformation.
Third, you're right that the evidence used by Bagchi et al. (2024) is not causal. However, in our eLetter we haven't made any causal statements. Instead, we're pointing out that Guess et al. made causal statements without properly describing the control condition of their experiment. That said, we also provided potential explanations for the drop in the fraction of misinformation in users' news feeds. This explanation aligns with the reasons why the emergency measures were introduced. These reasons were provided both officially by Facebook representatives [1], and unofficially by Facebook employees and a whistleblower, Frances Haugen [2, 3].
[1] https://www.nytimes.com/2020/12/16/technology/facebook-reverses-postelection-algorithm-changes-that-boosted-news-from-authoritative-sources.html
[2] https://www.wsj.com/articles/the-facebook-files-11631713039
[3] https://www.washingtonpost.com/documents/5bfed332-d350-47c0-8562-0137a4435c68.pdf
To which I replied:
Thanks for the reply; I attached it to the end of the post!
I disagree with point #1: the control group was "Facebook as it was during the election," and that's fine. It is like saying an experiment studying mobility in a city is invalid because mobility changed due to Christmas. More likely than not, Facebook will always make changes for US elections...
I am not super convinced on point #2 either way. You make a good point about the "entire treatment period." But still, this is a convoluted comparison because there are exogenous shocks to both the demand for and the supply of news here.
I agree with you on point #3. You folks didn't make any causal claims in the letter, but note that “Our results show that social media companies can mitigate the spread of misinformation by modifying their algorithms but may not have financial incentives to do so” is a causal statement, and that is what bothered me.
Grabowicz then replied:
Thank you, Manoel, for your interest in our Science eLetter and for your comments. I appreciate them greatly. I attach my responses to your Substack post here, since these exchanges may lead to a broader discussion about our eLetter and the original paper by Guess et al.
First, I'd like to clarify that the "debunk" framing originates from the University of Massachusetts Amherst's press release, not from Dublin, and it doesn't originate from me, though it does appear in both releases. It may sound like an exaggeration, since Science hasn't issued a correction. For this reason, I've already requested that it be removed from the title of University College Dublin's press release.
Second, I also don't believe that the entire paper by Guess et al. is debunked, but it did miss crucial information. Without revealing *any* information about the 63 break-glass measures, it could arrive at any desired conclusion, since the result depends on the unrevealed emergency interventions, and nobody would know what this conclusion really means.
Third, yes, we've read the "fairly precise description of algorithmic changes enacted by Facebook". However, when I talked with co-authors from Meta, they said that these dates are based on unofficial leaks and may be incorrect. That's why we are careful about the wording in our Science eLetter.
Finally, you write "This is false!". Would you mind clarifying what exactly is false in that statement “Our results show that social media companies can mitigate the spread of misinformation by modifying their algorithms but may not have financial incentives to do so.”?