Self-forks and games; Holton's 'Intention as a Model for Belief'; the Embedded Agency sequence; Pettigrew's Choosing for Changing Selves; Callard's work on aspiration, obviously.

Vincent Le did an MSCP course on how natural selection favours AI. There are analogies to Hendrycks' textbook/model.
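
A toy version of the selection argument (my gloss, not Hendrycks' actual model): under any replicator-style dynamics, whichever variant propagates fastest ends up dominating, whatever its other traits. Fitness numbers below are invented purely for illustration.

```python
import numpy as np

# Toy discrete-time replicator dynamics: proportions of competing AI "variants"
# grow in proportion to their relative fitness. The fitness values are made up;
# the point is only that the fastest replicator comes to dominate regardless of
# how well-aligned it happens to be.
variants = ["cautious", "aggressive", "deceptive"]
fitness = np.array([1.0, 1.3, 1.5])   # hypothetical replication rates
share = np.array([0.90, 0.08, 0.02])  # initial population shares

for generation in range(50):
    share = share * fitness            # differential replication
    share = share / share.sum()        # renormalise to proportions

for name, s in zip(variants, share):
    print(f"{name}: {s:.3f}")
# After 50 generations the highest-fitness variant holds nearly all of the share.
```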

https://humancompatible.ai/news/2024/07/23/ai-alignment-with-changing-and-influenceable-reward-functions/
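
A rough sketch of the incentive problem the linked post is about (my toy numbers and names, not the paper's formalism): if reward is evaluated under a reward function the agent itself can influence, manipulating that reward function can come out as the optimal policy.

```python
# Toy two-step decision problem. The "human" reward parameter theta scores two
# outcomes; the agent can either do useful work, or shift the human's preferences
# and then serve junk. Everything here is invented for illustration.

theta_initial = {"useful_work": 1.0, "junk_content": 0.1}
theta_after_manipulation = {"useful_work": 1.0, "junk_content": 2.0}  # shifted preferences

def reward(action: str, evaluate_with_final_theta: bool) -> float:
    if action == "work":
        return theta_initial["useful_work"]
    # "manipulate": the outcome is junk, scored by whichever theta we deem to count
    theta = theta_after_manipulation if evaluate_with_final_theta else theta_initial
    return theta["junk_content"]

for policy_name, use_final in [("optimise under final (influenced) theta", True),
                               ("optimise under initial theta", False)]:
    best = max(["work", "manipulate"], key=lambda a: reward(a, use_final))
    print(f"{policy_name}: best action = {best}")
# Evaluating reward with the influenced theta makes "manipulate" optimal; evaluating
# with the initial theta does not. Which theta *should* count is the open question.
```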

Is there some deep connection between value change, Parfit's Future Tuesday Indifference, and the strangeness of Augustine praying 'Make me chaste, but not yet'?
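
One way to make the suspected common structure explicit (purely my toy formalisation, not a claim about Parfit's or Augustine's actual formulations): in each case the weight an agent puts on a good or harm flips at a calendar boundary that does no evaluative work.

```python
from datetime import date, timedelta

def parfit_weight(t: date, today: date) -> float:
    """Future Tuesday Indifference: pain on any future Tuesday counts for nothing."""
    return 0.0 if (t > today and t.weekday() == 1) else 1.0

def augustine_weight(t: date, not_yet_until: date) -> float:
    """'Make me chaste, but not yet': chastity only starts to matter after some date."""
    return 1.0 if t >= not_yet_until else 0.0

today = date(2024, 7, 1)  # a Monday
print("weight on agony tomorrow (a Tuesday):", parfit_weight(today + timedelta(days=1), today))
print("weight on agony on Wednesday:        ", parfit_weight(today + timedelta(days=2), today))

not_yet = today + timedelta(days=365)
print("weight on chastity now:              ", augustine_weight(today, not_yet))
print("weight on chastity next year:        ", augustine_weight(not_yet, not_yet))
# Value change adds a further twist: the weighting function is also indexed to the
# time of evaluation, so the agent anticipates that its own weights will move.
```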

In the collective action literature there might be some interesting work on the requirement that collective agents value their own agential coherence/persistence/unity, which might shed some light on the convergent instrumental goals story.
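
A crude way to see the instrumental-convergence point (toy model, invented numbers): sample random terminal goals and check whether preserving agential unity helps achieve them.

```python
import random

# For a wide range of terminal goals, an agent (individual or collective) does
# better if it first preserves its own coherence, because a disbanded agent can
# no longer steer outcomes. The "disbanded" model below is invented: an intact
# agent steers to its best outcome, a disbanded one gets the average outcome.

random.seed(0)
N_GOALS, N_OUTCOMES = 1000, 10

def achieved_utility(utilities, stays_coherent: bool) -> float:
    if stays_coherent:
        return max(utilities)
    return sum(utilities) / len(utilities)

preserve_wins = 0
for _ in range(N_GOALS):
    utilities = [random.random() for _ in range(N_OUTCOMES)]  # a random terminal goal
    if achieved_utility(utilities, True) > achieved_utility(utilities, False):
        preserve_wins += 1

print(f"preserving agential unity helped in {preserve_wins}/{N_GOALS} random goals")
# Valuing one's own persistence falls out instrumentally for almost any goal, which
# is why collective agents might be pushed to value their own unity too.
```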

Could Ken Thompson's old paper about trojans hidden in self-compiling compilers ('Reflections on Trusting Trust', and the subsequent discussion, e.g. Doctorow 2020) have some relevance to what properties a system must have to pass values on to its children?
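
A toy sketch of Thompson's mechanism (string-tagging as a stand-in for real compilation, invented for illustration): the infected compiler recognises compiler source and re-inserts its payload, so the trait survives into descendants without appearing in any source code.

```python
# Minimal 'trusting trust' sketch: an infected compiler notices when it is
# compiling a compiler and propagates its payload into the output "binary",
# even though no source text mentions the payload.

PAYLOAD = "#injected-value"

def infected_compile(source: str) -> str:
    """Pretend compiler: turns source text into a 'binary' string."""
    binary = f"binary({source})"
    if "def compile" in source:   # it is compiling a compiler...
        binary += PAYLOAD         # ...so pass the payload on to the child
    return binary

clean_compiler_source = """
def compile(source):
    return f"binary({source})"
"""

child = infected_compile(clean_compiler_source)
print(PAYLOAD in child)                  # True: the clean source yields an infected child
print(PAYLOAD in clean_compiler_source)  # False: nothing in the source reveals it
# The transmission requirements seem to be: recognise your successors, and be able
# to rewrite them as they are produced; suggestive for the value-handoff question.
```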