Pick your contrarianism at the right level of abstraction
Sometimes, I meet people who say:
“I don’t want to work on X; everyone’s working on X! I want to work on something a bit more niche, neglected, off-the-beaten-track…”
Now, I think this instinct is really good. But I also want to note that you can apply it at different levels of the stack.
What does that mean?
When you notice that everyone’s working on X, consider that instead of doing not-X, you could do the most neglected thing within X.
At my ARENA cohort in 2024, it felt like within AI safety, interpretability was the most popular field, and within that, SAEs were the most popular.
I remember specifically deciding, at that point, not to work on interpretability because it didn’t seem neglected. And I think that made sense.
Some people bit the bullet, dove into interpretability, but decided to be contrarian on the next level down.
For example, from Stefan Heimersheim’s research bio:
“I try to focus on neglected ideas, which just means I avoid doing plain SAE projects.”
What if you jumped further into the funnel, indexed heavily on SAEs, and tried to be contrarian within that? You might end up doing impactful work like Activation Oracles — which could do a lot for interpretability, AI safety, and AI as a whole.
I think people who run from AI are typically picking their contrarianism at the wrong level of abstraction right now.
Pick your contrarianism at the right level of abstraction.



hm. you give, as an example, being contrarian by "working on interpretability, but not SAEs". but there's nothing stopping you from applying the same argument again: "working on SAEs, but not X". and to give credit to everyone working on SAEs, it's not like anyone is doing the *exact* same thing as anyone else. in general, as long as you're not doing the exact same thing as someone, you're contrarian at *some* level. so the razor of "as far down the stack as possible" doesn't quite make sense to me?