
Paul Smolensky
[intermediate/advanced] Symbol Processing in Transformers and Other Neural Networks
Summary
Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL). This success flies in the face of decades of predictions that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unanticipated success and the significant limitations of transformers in symbol processing. Borrowing insights from symbolic AI on the power of Production System architectures, we develop a high-level language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing, and we create compilers that precisely implement PSL programs in transformer networks that are, by construction, 100% mechanistically interpretable. We demonstrate that PSL is Turing Universal, so the work can inform the understanding of transformer ICL in general. The type of transformer architecture that we compile from PSL programs suggests a number of paths for enhancing transformers' capabilities at symbol processing.
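To give a feel for the kind of computation a Production System performs, here is a minimal, generic sketch in Python: rules are condition-action pairs that are repeatedly matched against a working memory of symbols, and the first matching rule fires and rewrites the memory. This is an illustrative toy only; it is not the PSL language defined in the referenced papers, and all names in it are made up for the example.

```python
# Generic Production System sketch (illustrative; not the PSL language
# from the referenced papers). Rules are (condition, action) pairs matched
# against a working memory of symbols; the first matching rule fires.

def run_production_system(rules, memory, max_steps=100):
    """rules: list of (condition, action); condition(memory) -> bool,
    action(memory) -> new memory (a list of symbols)."""
    for _ in range(max_steps):
        for condition, action in rules:
            if condition(memory):
                memory = action(memory)
                break
        else:
            return memory  # no rule fired: halt
    return memory

def ab_somewhere(memory):
    """Condition: some adjacent pair of symbols is ("A", "B")."""
    return any(memory[i:i + 2] == ["A", "B"] for i in range(len(memory) - 1))

def swap_first_ab(memory):
    """Action: rewrite the first adjacent ("A", "B") pair as ("B", "A")."""
    i = next(i for i in range(len(memory) - 1) if memory[i:i + 2] == ["A", "B"])
    return memory[:i] + ["B", "A"] + memory[i + 2:]

# Toy program: bubble every "B" to the front of the string.
rules = [(ab_somewhere, swap_first_ab)]
print(run_production_system(rules, ["A", "B", "A", "B"]))  # ['B', 'B', 'A', 'A']
```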
Syllabus
- Marr’s multiple levels of analysis of computational systems
- Fundamental capabilities of symbolic computation
- Introduction to Production System architectures for classical symbolic AI
- Implementing Production Systems in transformer neural networks
- Analyzing transformer in-context learning as implemented Production System programs
- Representation and processing of general symbol structures embedded in vector spaces via Tensor Product Representations
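The last syllabus item can be previewed with a small numerical sketch of a Tensor Product Representation, assuming only standard linear algebra (dimensions and variable names below are illustrative): a symbol structure is encoded as a sum of outer products of filler (symbol) vectors with role (position) vectors, and with orthonormal roles a filler can be recovered by unbinding with the corresponding role vector.

```python
import numpy as np

# Tensor Product Representation (TPR) sketch: encode the string "AB" as
#   T = f_A (x) r_1  +  f_B (x) r_2
# where f_* are filler (symbol) vectors and r_* are role (position) vectors.
rng = np.random.default_rng(0)

fillers = {s: rng.normal(size=4) for s in "AB"}    # symbol embeddings
roles = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # orthonormal role vectors
r1, r2 = roles[0], roles[1]

# Bind each filler to its role with an outer product, then superpose (add).
T = np.outer(fillers["A"], r1) + np.outer(fillers["B"], r2)

# Unbinding: because the roles are orthonormal, T @ r1 recovers filler "A".
recovered = T @ r1
print(np.allclose(recovered, fillers["A"]))  # True
```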
References
https://arxiv.org/abs/2410.17498
http://arxiv.org/abs/2205.01128
Pre-requisites
Familiarity with transformer neural networks. Some familiarity with symbolic AI recommended.
Short bio
Paul Smolensky is Emeritus Professor of Cognitive Science at Johns Hopkins University and a Senior Principal Researcher in the Deep Learning Group at Microsoft Research Redmond. His work focuses on the integration of symbolic and neural network computation for modeling reasoning and, especially, grammar in the human mind/brain. This work created: Harmony Networks (a.k.a. Restricted Boltzmann Machines); Tensor Product Representations; Optimality Theory and Harmonic Grammar (grammar frameworks grounded in neural computation); and Gradient Symbolic Computation. The work up through the early 2000s is presented in the 2-volume MIT Press book with G. Legendre, The Harmonic Mind. He received the 2005 David E. Rumelhart Prize for Outstanding Contributions to the Formal Analysis of Human Cognition.