Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.
Some highlights from our conversation
"I think every time we've given machine learning models more of a chance to learn things for themselves, they've seemed to have done better."
"If you start off with a simple problem and then you can expand the distribution over time to all possible problems, then there are literally definitions of AGI that say it's about the agent's abilities to achieve goals in a wide range of environments, so if you start off as simple, I think some open-ended process could eventually get something that resembles at least an increased form of generality in our artificial intelligence. And some people might call it AGI, but I think that's kind of a binary label, whereas I see it much more as a continuous thing."
"Diversity: that kind of relates already to finance because portfolio theory is all about diversity. […] We can't expect to make money in every situation because that's a little bit unrealistic. What we could do is make sure that if there's any conceivable scenario that we could sample from a generative world model, we don't completely blow up in that situation. It's similar to the curriculum-based adversarial kind of stuff."
"So firstly evolution, okay. Everyone says it, but it has been shown to work in a larger scale setting than any of our other methods. But secondly, it really is a completely different way of optimizing agents and discovering new things. And so I just don't think that we should ignore something that's quite different to what we're doing. […] We know that gradient descent works really well with our current neural networks. So of course, if we try and evolve them, they may not be as good. But there could be other networks that we couldn't learn with gradient descent, but we can evolve. And then secondly, there could be other compute paradigms where evolution works really well."
Referenced in this podcast
- Deep Reinforcement Learning that Matters
- World Models
- Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
- UCB-DrAC by Roberta Raileanu et al.
- Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
- Bayesian Generational Population-Based Training
- Open-Ended Learning Leads to Generally Capable Agents
- Mix & Match – Agent Curricula for Reinforcement Learning
- Bootstrapped Meta-Learning
- Ken Stanley, Jeff Clune, Joel Lehman, and their work on novelty search
- Robots that can adapt like animals by Antoine Cully
- Enhanced POET
- Phil Ball
- Oleh Rybkin and Plan2Explore
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Dreamer v2 and Crafter
- PaLM and Chinchilla
- Playable Environments: Video Manipulation in Space and Time
- Decision Transformer
Thanks to Tessa Hall for editing the podcast.