On Bayesian brain models, what explains the feeling of surprise? How can this be like non-conservative fields? How could salience manipulation and magic tricks work like this? They're path-dependent, I guess. Is that it? What explains why psychedelics feel more real than real life? What other cheap tricks are there in writing?
The mere feeling of enlightenment is what you get when the author raises something to salience, surprising you with something that wasn't salient to you in the moments before. But this is like a non-conservative field in reward modelling: an author can simply make things that are normally obvious unobvious, then reveal them, then repeat the cycle indefinitely. Actually enlightening essays or ideas will be surprising regardless of prior attentional manipulation. So even a summary of Hanson's or Hazlitt's theses will surprise. For this surprise to turn into enlightenment, the heightened salience must (1) persist after reading, though the reader need not know this (cf. "the books I've read have made me"), and (2) make them respond more aptly to reasons in future.
The corollary of this is unfortunate. It means that to actually enlighten their readers in expectation, the author must have a sense of which things are systematically under-salient.
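One way to make the non-conservative-field analogy concrete (my own gloss, not a standard result): treat felt enlightenment as a reward 1-form over (belief, salience) states. A conservative reward would be the differential of some "insight potential", so closed loops pay nothing; the salience trick lives in the non-conservative case, where the loop integral is positive and every lap pays out.

```latex
% Gloss, not a standard result: r = the "felt enlightenment" reward 1-form
% over (belief, salience) states.
% Conservative case: r is the differential of an insight potential V, so any
% closed loop through those states pays nothing:
r = \mathrm{d}V \quad\Longrightarrow\quad \oint_{\gamma} r = 0 .
% The salience trick exploits the non-conservative case: the cycle
% obvious -> made unobvious -> revealed -> obvious returns the reader to the
% same state, yet pays out on every lap (path dependence):
\oint_{\text{cycle}} r > 0 .
```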
https://www.nature.com/articles/s44271-024-00120-6
Boredom is the only mistake
- Delacroix's Sardanapalus, the horror.
Dopamine = TD error, i.e. the prediction-error signal in reinforcement learning. This explains the hedonic treadmill, explains why mastery leads to boredom, and explains the evolutionary reason for boredom too. Dopamine drugs like cocaine work by promising that the future is looking surprisingly bright. So dopamine happiness is tied up both with error and with being only temporary (in an intelligently updating system). Original monkey studies of dopamine?
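A toy sketch of the TD-error story (one cue, one reward, discount dropped; illustration only, not a neuroscience model): the same reward, delivered reliably, drives the prediction error to zero, and a then-omitted reward drives it negative. That is the hedonic-treadmill / mastery-leads-to-boredom shape.

```python
# Toy TD-style prediction error as a stand-in for the dopamine signal.
# One cue, one reward per trial; illustration only.
alpha = 0.1   # learning rate
V = 0.0       # learned prediction of the reward that follows the cue

for trial in range(60):
    reward = 1.0
    delta = reward - V          # prediction error: actual minus expected ("dopamine")
    V += alpha * delta          # the prediction improves
    if trial % 15 == 0:
        print(f"trial {trial:2d}  prediction={V:.3f}  delta={delta:.3f}")

# Fully predicted reward -> delta ~ 0: same reward, no burst (hedonic treadmill, mastery).
# Now omit the expected reward once: delta goes negative (disappointment, boredom's kick).
delta = 0.0 - V
print(f"omitted reward  delta={delta:.3f}")
```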
Remember the exercise of guessing masked words in poems? It's much easier to do with bad poems than with good poems. This suggests discrete diffusion text models will really struggle to be good poets compared to autoregressive models. But they'll do great at code, since they can do that thing where you add noise to the generated sequence but not to the label, which can constrain generations to end up at, e.g., code that compiles. tbh I didn't realise adding noise to labels was normal lol. Is that just in GANs?
- For some reason P called that property of bad poetry efficiency. I would call it redundancy of meaning: meaning is distributed across the entire sequence of text. Redundancy == bus factor, but we want poetry to be surprising. When read in sequence (surprise is an inherently sequential notion, I think?), low redundancy == surprise. Surprise and boredom. So there's a tension between surprise and redundancy (toy surprisal sketch after these bullets). The more obvious link is Shklovsky and defamiliarization. Aesthetics and disinterested liking. This means it connects to affordance and expectation. Wacky lol: r1 poetry experiments.
- Aside: but you can also just give regular autoregressive models code compilers for them to test with (loop sketched below)? I guess that kind of trial and error makes inference slower and more expensive. Maybe some kinds of verification are bad to run at inference time vs. having them precomputed somehow?
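The surprisal sketch promised above: a toy bigram model over a made-up corpus, scoring the average per-word surprisal of two made-up lines. The redundant, clichéd line is cheap to guess when masked (low surprisal); the stranger line is not. Corpus, test lines, and smoothing are all invented for illustration.

```python
# Toy bigram surprisal: redundant (guessable) text vs. surprising text.
# Corpus and test lines are invented purely for illustration.
import math
from collections import Counter

corpus = (
    "the sun set over the sea and the sea was calm "
    "the moon rose over the sea and the night was calm "
    "roses are red violets are blue"
).split()

unigrams = Counter(corpus)                  # word counts
bigrams = Counter(zip(corpus, corpus[1:]))  # adjacent-pair counts
vocab = len(unigrams)

def surprisal(prev, word):
    """-log2 P(word | prev) under an add-one-smoothed bigram model."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

def avg_surprisal(line):
    words = line.split()
    return sum(surprisal(p, w) for p, w in zip(words, words[1:])) / (len(words) - 1)

cliche = "the sun set over the sea"      # redundant: masked words are easy to guess
strange = "the sea set over the roses"   # less predictable word choices
print(f"cliche  avg surprisal: {avg_surprisal(cliche):.2f} bits/word")
print(f"strange avg surprisal: {avg_surprisal(strange):.2f} bits/word")
```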
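And on the aside itself, the generate-check-retry loop is easy to sketch. `generate_code` below is a hypothetical placeholder for whatever model call you would actually use; the only verifier here is Python's built-in compile(), i.e. a syntax check, nothing stronger.

```python
# Sketch of inference-time verification: generate, try to compile, feed errors back, retry.
# `generate_code` is a hypothetical placeholder, not a real API.

def generate_code(prompt: str, feedback: str | None = None) -> str:
    """Placeholder: pretend this calls an autoregressive code model."""
    raise NotImplementedError

def generate_until_it_compiles(prompt: str, max_attempts: int = 5) -> str | None:
    feedback = None
    for _ in range(max_attempts):
        candidate = generate_code(prompt, feedback)
        try:
            compile(candidate, "<candidate>", "exec")  # cheap syntax-level check only
            return candidate
        except SyntaxError as err:
            feedback = f"SyntaxError: {err}"           # pass the error back for the retry
    return None  # each retry is another full generation: slower, pricier inference
```

Whether this beats the diffusion-style clamping above presumably comes down to how many retries the loop burns.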