
Paul Smolensky
[intermediate/advanced] Symbol Processing in Transformers and Other Neural Networks
Summary
Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL). This success flies in the face of decades of predictions that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unanticipated success and the significant limitations of transformers in symbol processing. Borrowing insights from symbolic AI on the power of Production System architectures, we develop a high-level language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing, and we create compilers that precisely implement PSL programs in transformer networks that are, by construction, 100% mechanistically interpretable. We demonstrate that PSL is Turing Universal, so the work can inform the understanding of transformer ICL in general. The type of transformer architecture that we compile from PSL programs suggests a number of paths for enhancing transformers' capabilities at symbol processing.
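To give a feel for the kind of computation a Production System performs, here is a minimal, generic sketch in Python: rules are condition-action pairs that are repeatedly matched against a working memory of symbols, and the first matching rule fires and rewrites the memory. This is an illustrative toy only; it is not the PSL language defined in the referenced papers, and all names in it are made up for the example.

```python
# Generic Production System sketch (illustrative; not the PSL language
# from the referenced papers). Rules are (condition, action) pairs matched
# against a working memory of symbols; the first matching rule fires.

def run_production_system(rules, memory, max_steps=100):
    """rules: list of (condition, action); condition(memory) -> bool,
    action(memory) -> new memory (a list of symbols)."""
    for _ in range(max_steps):
        for condition, action in rules:
            if condition(memory):
                memory = action(memory)
                break
        else:
            return memory  # no rule fired: halt
    return memory

def ab_somewhere(memory):
    """Condition: some adjacent pair of symbols is ("A", "B")."""
    return any(memory[i:i + 2] == ["A", "B"] for i in range(len(memory) - 1))

def swap_first_ab(memory):
    """Action: rewrite the first adjacent ("A", "B") pair as ("B", "A")."""
    i = next(i for i in range(len(memory) - 1) if memory[i:i + 2] == ["A", "B"])
    return memory[:i] + ["B", "A"] + memory[i + 2:]

# Toy program: bubble every "B" to the front of the string.
rules = [(ab_somewhere, swap_first_ab)]
print(run_production_system(rules, ["A", "B", "A", "B"]))  # ['B', 'B', 'A', 'A']
```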
Syllabus
- Marr’s multiple levels of analysis of computational systems
- Fundamental capabilities of symbolic computation
- Introduction to Production System architectures for classical symbolic AI
- Implementing Production Systems in transformer neural networks
- Analyzing transformer in-context learning as implemented Production System programs
- Representation and processing of general symbol structures embedded in vector spaces via Tensor Product Representations
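The last syllabus item can be previewed with a small numerical sketch of a Tensor Product Representation, assuming only standard linear algebra (dimensions and variable names below are illustrative): a symbol structure is encoded as a sum of outer products of filler (symbol) vectors with role (position) vectors, and with orthonormal roles a filler can be recovered by unbinding with the corresponding role vector.

```python
import numpy as np

# Tensor Product Representation (TPR) sketch: encode the string "AB" as
#   T = f_A (x) r_1  +  f_B (x) r_2
# where f_* are filler (symbol) vectors and r_* are role (position) vectors.
rng = np.random.default_rng(0)

fillers = {s: rng.normal(size=4) for s in "AB"}    # symbol embeddings
roles = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # orthonormal role vectors
r1, r2 = roles[0], roles[1]

# Bind each filler to its role with an outer product, then superpose (add).
T = np.outer(fillers["A"], r1) + np.outer(fillers["B"], r2)

# Unbinding: because the roles are orthonormal, T @ r1 recovers filler "A".
recovered = T @ r1
print(np.allclose(recovered, fillers["A"]))  # True
```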
References
https://arxiv.org/abs/2410.17498
http://arxiv.org/abs/2205.01128
Pre-requisites
Familiarity with transformer neural networks. Some familiarity with symbolic AI recommended.
Short bio
Paul Smolensky is Emeritus Professor of Cognitive Science at Johns Hopkins University and a Senior Principal Researcher in the Deep Learning Group at Microsoft Research Redmond. His work focuses on the integration of symbolic and neural network computation for modeling reasoning and, especially, grammar in the human mind/brain. This work created: Harmony Networks (a.k.a. Restricted Boltzmann Machines); Tensor Product Representations; Optimality Theory and Harmonic Grammar (grammar frameworks grounded in neural computation); and Gradient Symbolic Computation. The work up through the early 2000s is presented in the 2-volume MIT Press book with G. Legendre, The Harmonic Mind. He received the 2005 David E. Rumelhart Prize for Outstanding Contributions to the Formal Analysis of Human Cognition.