4 Comments
Neural Foundry

Brilliant breakdown of the benchmarking paradox. The observation about COVID models is stunning proof that local wins can be completely orthogonal to actual problem-solving. I've seen this firsthand working with model evaluations, where teams optimize so hard for the metric that they lose sight of whether the thing even generalizes to production data. The part about "bilingual" epistemology is spot on, though I'm not sure the field will embrace it fast enough given how much career incentive is tied to fast iteration and flashy benchmarks. What would actually force that cultural shift beyond just teaching causal inference in grad school?

Manoel Horta Ribeiro

Thank you for the kind words. I think it is really about getting a non-trivial number of people to care about this and to review papers with this mindset. That seems to be the best way to shift incentives: making it something that matters when people try to get papers published!

Ross

Good insight, if taken with caution. We definitely do not want to become economics! Computer science works; economics, by contrast, has deep epistemological flaws, often driven by weak causal claims.

Rainbow Roxy

This piece really made me think about how we frame research, especially in AI. You've described build-and-test empiricism so well. I'm really curious to hear more about the contrasting flavor of empiricism you mentioned, the one prevalent in political science. How would it apply, or not apply, to our field?