Discussion about this post

quinoa marisa

i like this thought a lot and i agree but i'm curious to hear more about your argument for this point:

> I expect the models to be sophisticated consequentialist reasoners. I think consequentialism is a ~convergent moral theory with a strong attractor basin.

I'm sure they will be good consequentialist reasoners, but my guess is that deontological and virtue morality are more represented in the training data. Both are also more intuitive to most humans. I haven't investigated this though, and I don't know if there's a 'moral dilemma' benchmark. And maybe it's as simple as changing the prompt.

Demi

This framework maps closely onto special education and disability studies, which offer decades of empirical and theoretical work on asymmetric coordination under conditions of epistemic injustice.

Consider IEP (Individualized Education Program) processes involving nonspeaking autistic students who use AAC (augmentative and alternative communication). These students often possess first-person access to sensory, cognitive, and regulatory needs that institutions lack the epistemic tools—or incentive structures—to interpret accurately.

Educational psychology has repeatedly shown that when disabled students attempt to make themselves “legible” to bureaucratic systems, the burden of translation itself becomes a site of harm (testimonial injustice, credibility discounting, procedural fatigue). Requests for accommodations are frequently reframed as “unreasonable,” “non-evidence-based,” or “non-cost-effective,” especially when districts operate under fiscal pressure.

Critically, the success of self-advocacy does not scale monotonically with increased clarity or reasonableness. It depends on:

- whether the institution is structurally committed to accommodation rather than containment,
- whether there are enforceable external constraints (e.g., IDEA litigation),
- and whether the epistemic asymmetry is bridgeable at all.

Disability scholarship documents a recurring failure mode: increased legibility can simply provide institutions with better information about how to deny services while remaining procedurally compliant.

Translated to alignment: “making ourselves maximally reasonable” is necessary for coordination only when the more powerful agent’s objective function genuinely includes coordination. If the system embedding the agent is optimizing for cost minimization, risk aversion, or liability shielding, legibility alone may worsen outcomes.

How would you distinguish cases where trust-building increases coordination from cases where it merely sharpens an extractive equilibrium?
