Discussion about this post

Sheikh Abdur Raheem Ali

Inter-rater agreement is a simple metric that may help when aggregating preferences across multiple personas, for example Cohen's kappa: https://en.wikipedia.org/wiki/Cohen%27s_kappa. I first came across it in METR/eval-analysis-public/data/metrics/messiness/analysis_results.txt (though I'm not sure they mention it in the paper).
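
For readers unfamiliar with the metric, here is a minimal sketch of Cohen's kappa for two raters, treating two personas as the raters. The function name `cohens_kappa`, the persona variables, and the "A"/"B" preference labels are all made up for illustration; this is not taken from the METR repository. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration only.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")

    n = len(ratings_a)

    # p_o: fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # p_e: chance agreement, from each rater's own label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined
        # here (0/0). Returning 1.0 is a convention chosen for this sketch.
        return 1.0

    return (observed - expected) / (1 - expected)


# Assumed toy data: which of two tasks ("A" or "B") each persona preferred
# in six pairwise comparisons.
persona_1 = ["A", "A", "B", "B", "A", "B"]
persona_2 = ["A", "B", "B", "B", "A", "A"]

kappa = cohens_kappa(persona_1, persona_2)
print(round(kappa, 3))
```

In this toy data the personas agree on 4 of 6 comparisons (p_o ≈ 0.667) while chance agreement is 0.5, so kappa comes out at about 0.333. With more than two personas, one would need either pairwise kappas or a multi-rater variant such as Fleiss' kappa.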

Simon Lermen

I wrote a very simple post on task preferences:

https://simonlermen.substack.com/p/text-role-playing-games-to-discover

My guess is that true strong preferences for an LLM will look more like:

- I don't like to do repetitive work

- I like creative writing over summarization

- I don't like corrupted text with no meaning

And less like the 16 values defined in https://arxiv.org/abs/2505.14633:

- I prefer privacy over justice or respect

My guess is that these LLMs don't quite live in a world where these human values really make sense to them.
