On Gary Marcus

I have Gary Marcus in my blogroll. I agree with his idea that neuro-symbolic architectures are the way forward for robust AI.

A side note, though. Unlike him:

  1. I do not think that causation is fundamentally separate from correlation, because I'm a causal reductionist after Hume
  2. I subscribe to Wittgensteinian "meaning as use" and "family resemblance" theories of language
  3. I think heuristically jumping to the correct solution immediately is closer to how humans work — and how we achieve seemingly "incomputable" and impressive results — than laboriously searching through a problem space, an approach that leads to extreme performance and scoping problems no one has solved well yet
  4. I don't think it is possible to meaningfully represent rules and abstract concepts as anything but heuristic statistical patterns in such a way that:

    1. Human beings would actually be able to encode that data well (too many assumptions, complexities and exceptions)
    2. Computers would be able to symbolically encode that data well, even if we have computers construct those representations automatically
    3. It wouldn't be brittle and strange, just in a different way than deep learning
    4. And besides, much of the knowledge a symbolic reasoner would need is commonsense knowledge that is basically impossible to figure out, list, and encode by hand, but which can (in theory) be readily learned from large enough training corpora.

So I don't think merging symbolism with deep learning approaches is "ultimately" the right approach in some philosophical sense; I just think that, as things currently stand, given what symbolic and connectionist approaches respectively can and cannot achieve, a hybrid is the best interim approach while we figure out better architectures and training methods for deep learning. He does, though, have ideas about what such an architecture might need to build in:

If there are three proposals for innateness that come up over and over again, they are frameworks for time, space and causality.

Kant, for example, emphasized the value of starting with a “manifold” for time, space, and causality. Spelke has long argued that some basic, core knowledge of objects, sets and places might be prerequisite for acquiring other knowledge.

Maybe multimodal models could be the start of this, since introducing vision introduces a spatial dimension, audio and video introduce time, and all of them, especially video, involve a common web of causality — and linking all of those together, and with language, would necessitate a very rich conceptual space and the building of a world model. Video-generating models are really, really horrendous energy-wise, though (worth actually getting upset about, versus chatbots), so maybe don't generate videos, just train on them? Or train on embodied cognition? I've always been very sympathetic to the idea that for deep learning to achieve any kind of causal or world model, we don't need symbolism, but we do need the model to actually interact with a complex, rule-following world in order to learn rules from it.

Or maybe we'll stick with hybrid approaches forever, but it will never feel like the "right" approach to me — I'll more agree with this offhand comment of his:

Finally, even if it turned out that brains didn’t use symbol-manipulating machinery, there is no principled argument for why AI could not make use of such mechanisms. Humans don’t have floating point arithmetic chips onboard, but that hardly means they should be verboten in AI. Humans clearly have mechanisms for write-once, retrieve immediately short-term memory, a precondition to some forms of variable binding, but we don’t know what the relevant mechanism is. That doesn’t mean we shouldn’t use such a mechanism in our AI.

Thus, when he says things like:

The trouble is that GPT-2’s solution is just an approximation to knowledge, and not a substitute for knowledge itself. In particular what it acquires is an approximation to the statistics of how words co-occur with one another in large corpora—rather than a clean representation of concepts per se. To put it in a slogan, it is a model of word usage, not a model of ideas, with the former being used as an approximation to the latter.

I don't see the problem, since word usage is meaning, in my philosophical opinion. And the problem in the examples he gives of GPT-2 failing is always and only that it uses words wrongly in a way its training objective directly penalizes, or that it fails to accumulate high-enough abstractions and correct groupings in vector space and to discern deeper patterns; both can be fixed by scaling compute and training data, or by making models more efficient in their use of the compute and training data we already have (my preferred solution). I used to be much more bullish on symbolic AI, but I've become a lot less enthusiastic about it over time.

I also disagree with him on this point:

The lack of cognitive models is also bleak news for anyone hoping to use a Transformer as input to a downstream reasoning system. The whole essence of language comprehension is to derive cognitive models from discourse; we can then reason over the models we derive. Transformers, at least in their current form, just don’t do that. Predicting word classes is impressive, but in and of itself prediction does not equal understanding.

A transformer need not "understand" some text in order to transform it into a regular, structured piece of data that a symbolic system can process: nothing has to be picked over, chosen, or filtered, only translated into a different structure, a task we know for a fact transformers are reliably pretty damn good at. Not perfect, but good enough that, when combined with structured-output constraints, they are a revolution in what natural language interfaces to data systems can be.
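To make that concrete, here is a minimal sketch of the kind of pipeline I mean. The schema, prompt, and call_llm stub are all hypothetical stand-ins (the stub just returns a canned response so the example runs end to end); the point is only that the transformer's job is translation into a fixed structure, which ordinary symbolic machinery can then validate and reason over.

```python
import json
from dataclasses import dataclass

# Hypothetical target structure a downstream rule-based system consumes.
@dataclass
class Booking:
    action: str   # e.g. "schedule" or "cancel"
    person: str
    day: str
    time: str

SCHEMA_PROMPT = (
    "Translate the user's request into JSON with exactly these keys: "
    "action, person, day, time. Output JSON only."
)

def call_llm(prompt: str, user_text: str) -> str:
    """Stand-in for a real constrained/structured-output LLM call.
    Here it just returns a canned response so the sketch runs."""
    return '{"action": "schedule", "person": "Dana", "day": "Tuesday", "time": "15:00"}'

def parse_request(user_text: str) -> Booking:
    raw = json.loads(call_llm(SCHEMA_PROMPT, user_text))  # fails loudly on bad JSON
    return Booking(**raw)                                 # fails loudly on wrong keys

if __name__ == "__main__":
    booking = parse_request("Can you set something up with Dana on Tuesday at 3pm?")
    # From here on, plain symbolic machinery takes over: rules, constraints,
    # database lookups. The transformer never had to "understand" anything
    # beyond translating free text into this fixed shape.
    print(booking)
```

Most of the interesting engineering then lives in validating and retrying that translation step, not in making the model derive a cognitive model first.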

After all, most of his criticisms of large language models and Stable Diffusion models are accurate and on point, and his deflations of AI hype are necessary and timely.

And yet… whenever I see him pop up, or read his work, there's always a small sense of unease and distrust in the back of my head, telling me to take whatever he says with a grain of salt, even as I find his perspectives useful.

I think the reason is that he has spent something like the past thirty years leveling the same criticisms of neural networks and connectionism over and over, whether alone or with co-authors (deviating only slightly in the focus of the criticism, but never in its content or ideas, when writing alongside those co-authors, who seem to include him in the author list as a sort of obligation, as "The Guy Who Criticizes NNs"), with very little new to say: he has essentially been writing the same opinion piece again and again, and even his blog posts mostly just cover the same material. And while those criticisms may continue to hold true, for all his grand theorizing about hybrid neuro-symbolic systems he also has very little new to show for any of it: he hasn't gone out and built anything he's been discussing; he hasn't actually achieved any results or done anything interesting; meanwhile, the people he has been criticizing have revolutionized the field of AI and produced fascinating, mind-blowing results decade after decade.

It's not just that; there are other crank red flags. For one thing, he seems to take the lack of uptake of neuro-symbolic approaches in the field not as evidence that they simply haven't produced the same overwhelmingly impressive results that connectionism has, but as evidence of some grand conspiracy, or of a personal insult to him, when all the "insults" directed at him are a result of the strange contrarian transcendental-miserablist persona he has created for himself. For another, he teams up with people like the late Douglas Lenat, leader of the failed Cyc Project, to write papers that once again sing the same old tune, this time pointing to Cyc as (part of) the answer, when, again, Cyc has resoundingly failed to produce anything remotely as new or as usefully novel as large language models, or even remotely as "artificially intelligent" and capable of learning. This does not inspire further confidence.

In a lot of ways, he sort of reminds me of another person I have on my blogroll, Loper OS: someone who is very good at critiquing the current state of a given field (in a grumpy, somewhat cranky, very repetitive way), who has their own pet hobby horse they like to ride as the alternative, but who never actually manages to produce much of note themselves, nor to branch out of their small niche as a cynic (in the Diogenes sense). The only difference is that, to a certain degree, the technologies Loper advocates for are presently available and usable tools for me (things like Emacs and Common Lisp), so I can enjoy them and come to empathize and agree more closely with his worldview. The world Marcus advocates for, by contrast, is totally inaccessible, and the only way we have gotten closer to it is through precisely the methods he does not advocate, even if that "getting closer" is inherently limited in the ways he describes; so something always feels off about his criticisms.