Brief notes
- Indigenous in AI!
- Hand-transcription of Māori for automatic speech recognition: Kōrero Māori
- Caleb Moses: 316 hours of utterances collected in 10 days!
- bootstrap a DeepSpeech model from the collected audio (see the inference sketch after this group of notes)
- instead of asking who owns the data (as in Western land ownership), ask who safeguards the data (as in Māori frameworks such as kaitiakitanga)
- kaitiakitanga license: royalties go back to the data’s communities
- this is much more work than just writing little Datasheets for Datasets
- data sovereignty: open source only works among equals, versus getting your data strip-mined out of your community
- language is inextricable from culture: it doesn’t belong to one person
- but if a group is small enough, it may end up releasing its data into the public domain simply for lack of other options
- each indigenous group has their own standards of engagement
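
A minimal sketch of the Mozilla DeepSpeech inference API that the Kōrero Māori data was used to bootstrap; the model and scorer file names here are hypothetical stand-ins, not Te Hiku Media's actual artifacts:

```python
import wave

import numpy as np
from deepspeech import Model  # Mozilla DeepSpeech bindings

# Hypothetical file names; Te Hiku Media trained their own checkpoints.
ds = Model("te_reo_maori.pbmm")
ds.enableExternalScorer("te_reo_maori.scorer")

# DeepSpeech expects 16 kHz, mono, 16-bit PCM audio.
with wave.open("utterance.wav", "rb") as w:
    assert w.getframerate() == 16000 and w.getnchannels() == 1
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))  # best-path transcription
```
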
- Making Kin with the Machines: Indigenous protocol
- note therein the story Gwiizens, The Old Lady, and The Octopus Bag Device
- “be kind to us, great mystery”
- technology as prosthetic, as rare, as mediated by people with seemingly bad intentions, as fickle, as arising from surprising sources, as object of beauty at the end of a long task, as tool of salvation, as of undisclosed results
- Causal learning can help extricate you from Simpson’s Paradox (worked example after this group of notes)
- domain knowledge helps you understand what is and isn’t a confounding variable
- “causal” language models … aren’t (“causal” there just means autoregressive masking, not causal inference)
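
A worked instance of the paradox, using the classic kidney-stone numbers (Charig et al. 1986). Stone size is the confounder, and only domain knowledge tells you to condition on it:

```python
import pandas as pd

# Treatment A beats B within each stone size, yet B wins in aggregate,
# because A was assigned disproportionately many hard large-stone cases.
df = pd.DataFrame({
    "treatment":  ["A", "A", "B", "B"],
    "stone_size": ["small", "large", "small", "large"],
    "successes":  [81, 192, 234, 55],
    "trials":     [87, 263, 270, 80],
})

print(df.assign(rate=df.successes / df.trials))  # A wins in both strata
agg = df.groupby("treatment")[["successes", "trials"]].sum()
print(agg.successes / agg.trials)                # B wins overall
```

Within strata, A wins 93% vs 87% (small stones) and 73% vs 69% (large), yet loses 78% vs 83% overall.
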
- Neural architectures inspired by neuroscience, for AGI
- Perceptron &c explicitly modeled on biology
- instead of flattering ourselves with planes versus birds, compare our cars to goats: cars are missing fundamentals of quadruped locomotion
- two large challenges: interact with the world (old brain), plan / reason / use language (🙋, 🐭, 🐦, 🐙)
- Hans Moravec: reasoning is the thinnest veneer of human thought, and we have a billion years of experience in perceptual and motor control
- AI relies on learning, but animals rely on innate structure, encoded in a genome and squeezed through a genomic bottleneck
- horses can stand at birth, spiders can hunt and spin webs at birth, birds have species-specific nests, beavers want to build dams without parental guidance (even indoors 😵)
- deer mouse, raised by field mouse, builds a deer mouse burrow, not a field mouse burrow, and vice versa!
- if you know English, and can model English, memorizing three new English words is no big deal; memorizing three words of Linear A is another story
- rats have dedicated brain circuitry for learning spatial maps
- humans have a Fusiform Face Area to learn faces with
- humans have stereotyped language areas (Koko the gorilla learned ~1000 signs, but without syntax)
- the innate behavioral differences between us and chimps come down to our genomes, our wiring diagrams
- we understand the parts of neurons, but not how to assemble them
- GPT-3 has >100B parameters
- C. elegans has 302 neurons and ~7K synapses, from a ~200 Mbit genome
- the human brain has ~100B neurons and ~100T synapses, from only a ~1 GByte genome
- Genomic bottleneck as an objective: Loss(genome) = −InitialPerformance(network developed from genome) + λ · Entropy(genome), i.e. select genomes that develop into networks that perform well at birth while staying compressible
- you can compress an MNIST network 1000× and still get good performance (distillation? see the sketch at the end of these notes)
- Learning is not necessarily the key to our success: humans almost died out 70kya!
- Cultural transmission breaks the genomic bottleneck: oral transmission, and especially written communication, passes huge amounts of knowledge from generation to generation outside the genome
- cf Weight-Agnostic Neural Networks
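
Back-of-envelope on the bottleneck: ~100T synapses at even a few bits each is ~10^15 bits of wiring, versus ~10^10 bits of genome, so the genome must encode wiring rules rather than the wiring itself. Zador's genomic-bottleneck experiments compress a network's weights through a second, smaller "genomic" network; the sketch below is a cruder low-rank stand-in (my construction, purely to show the parameter arithmetic, not his method):

```python
import torch
import torch.nn as nn

# Low-rank stand-in for the genomic bottleneck: store a small "genome"
# and unfold it into the full weight matrix (the "phenotype") on demand.
class GenomeLayer(nn.Module):
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        # genome holds (d_in + d_out) * rank numbers instead of d_in * d_out
        self.u = nn.Parameter(torch.randn(d_in, rank) * 0.05)
        self.v = nn.Parameter(torch.randn(rank, d_out) * 0.05)
        self.b = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        w = self.u @ self.v  # "development": genome -> full connectivity
        return x @ w + self.b

net = nn.Sequential(nn.Flatten(), GenomeLayer(784, 512), nn.ReLU(),
                    GenomeLayer(512, 10))
dense_weights = 784 * 512 + 512 * 10
genome_params = sum(p.numel() for p in net.parameters())
print(f"~{dense_weights / genome_params:.0f}x compression")  # ~27x at rank 8
```

Training this end to end optimizes the first term of the loss above (performance after development); shrinking the rank plays the role of the λ · Entropy(genome) penalty.
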