Category: Philosophy

Table of Contents

1. My own license for the Noosphere
2. Terra Ignota
3. How to use a large language model ethically
4. Large language models will never be able to reason reliably… but that doesn't make them not useful
5. My opinion on large language models
6. Capitalism innovates?
7. Freeing the noosphere

Given my unique beliefs about the "noosphere" — the world of human intellectual and creative expressions and ideas, separate from their material instantiation — I've often found cognitive dissonance in using any license other than the MPL, and even the MPL was not precisely what I wanted. I am absolutely opposed to the territorialization of the noosphere, including intellectual property, and including even using IP to enforce any "freedoms" beyond purely the destruction of IP by means of IP itself (the core idea of share-alike/copyleft). At the same time, I believe that licenses function as a sort of legal performance art: a voluntary adoption of certain ethical and ideological principles, and a request that those who benefit from the work of those who have voluntarily adopted such ideologies reciprocate in like kind in some way — this is the contract theory of licenses — so a license mostly functions as a manifesto and personal expression, not a tool to actually force people to act certain ways.

In light of this, I have finally written my own license, which can be found here. I've drafted it with the help of Gemini 2.5 Pro acting as a rigorous devil's advocate, trying to point out things I've missed, confusions, or inconsistencies, as well as helping me draft legal prose using the MPL as a reference document, fetching quotes from the MPL by subject matter, and explaining them to help me navigate it. Nevertheless, every line in that document was extensively edited, enough to be essentially fully written (by Ship of Theseus logic) by me, and carefully read to make sure that it aligns perfectly and exactly with my vision and views. You are invited to use it if you desire. Here's the preamble, to get a taste:

THIS LICENSE IS AN ACT OF LIBERATION.* It is intended to permanently place a work into the public commons to enrich the noosphere—the sphere of human thought and collective intelligence. It is intended to end the tyranny of the false concept of "intellectual property," which introduces scarcity and territoriality into a sphere where none naturally exists nor need exist, and its enforcement by the violence of the long arm of the Law.

The core of this license is the idea that ONCE AN IDEA IS MADE PUBLIC, ONE MAY NOT CONTROL IT.

To defend this notion, we use the violence of the Law only to restrict the violence of the Law itself, and no more. All true freedom — freedom that does not come at the expense of the equal freedom of others — is preserved for those who wish to take it.

Key features include:

  1. Philosophical Foundation: This license is explicitly designed to free the noosphere, as a manifesto and act of legal performance art against the concept of intellectual property and copyright. Part of that performance art is just the act of using the license, and thus spreading the manifesto, whether or not the license is truly legally enforceable. The other part is genuinely trying to make the license as legally strong as possible while staying aligned with its philosophy, both as a whetstone to clarify and sharpen the backing philosophy and as a way to see how that philosophy plays out.
  2. File-Based (Weak) Copyleft: Like the MPL, what counts as a derivative work of an NPL work is defined by direct modifications to the existing files already licensed under the NPL, or by files that incorporate a substantial amount of material copied from an NPL-licensed file. Crucially, this extends to model weights trained on NPL files as well.
  3. Expansive Share-Alike Trigger: Unlike most open-source licenses that trigger obligations upon distribution, the NPL's core conditions are triggered by Use. "Use" is defined very broadly to include private actions like running, compiling, reading, or training a machine learning model on the work. This is partly because I think that's what actually matters, and partly because mandating distribution is not really something I want to do, yet I still want the license to apply in most cases.
  4. Explicit Application to AI Models: The NPL is the only license I know of that explicitly, aggressively, and in depth deals with the issue of machine learning and large language models without being unnecessarily anti-AI in some kind of Luddite fervor, or being founded on misunderstandings of how AI works, but also without giving them a get-out-of-jail-free card:
    • It defines machine learning model weights trained on NPL-licensed material as a "Modification."
    • This means the model weights themselves must be licensed under the NPL.
    • It introduces "The Regurgitation Test" as a practical and specific evidentiary standard to determine if a model was trained on The Work, shifting the burden of proof to the model's creator and also making it clear when model training is valid and free of the NPL and when it isn't, instead of trying to attack all model training. This test is based on recent research into large language model memorization (1, 2) as well as recent court cases.
    • However, the regurgitation test also has a fallback clause that outlines what should happen should the court case it draws on fail. That clause also reflects my genuine sympathies for the AI side of the debate; my ethical desire is simply that they cease their hypocrisy: those who benefit most from freely accessing vast amounts of information unencumbered by IP should not then be able to assert IP rights.
  5. Symmetrical Forfeiture of Remedy: The license imposes a radical symmetry on both the Creator and the User.
    • Creator's Covenant (Sec 1.2.2): The original creator gives up the right to sue anyone for the leaking or unauthorized distribution of their own private, unreleased versions of The Work.
    • User's Forfeiture (Sec 1.3.2): In exchange for the rights granted, anyone using NPL work also gives up the right to sue anyone for the leaking or unauthorized distribution of their private modifications. If your modified version gets out, it immediately becomes part of the public commons under the NPL.
  6. No Forced Distribution or Anti-Tivoization Clauses: It is only ethical, in my view, to use the tools of intellectual property to dismantle themselves, not to enforce other agendas, even ones that are as ethical as the desire for all information to be freely shared or to prevent devices from being locked down. As such, in this respect, the NPL is far less restrictive than even the MPL, let alone the GPLv3.
  7. Defense of the Data-Noosphere (SaaS Clause): While it doesn't have an AGPL-style "source-on-demand" clause, it tackles the SaaS issue from a data perspective. If The Work is, or through your modifications becomes, SaaS software or anything else that stores data about users or hosts public data, you are contractually forbidden from penalizing users for scraping or otherwise extracting that data (Sec 1.3.3).
  8. Dual-Structure: Legal Framework and Social Contract: The license is split into two distinct parts:
    • Section 1 (The Legal Framework): Contains the legally binding and enforceable terms.
    • Section 2 (The Social Contract): This is my attempt to recognize what a lot of other "ethical source" licenses try to address, namely the needs and desires of communities and creators, without using the law in a way that I see as unethical. It's also a useful way to set the greater tone of the ideology behind the NPL. It outlines the Creator's non-binding "wishes," such as providing attribution and contributing changes back. While not legally enforceable, the license warns that violating this social contract will be met with social consequences, such as public documentation of the behavior and refusal of future engagement.
  9. Anti-Nationalist Jurisdiction: The license attempts to sidestep the laws of any single nation-state. It specifies that any legal dispute should be governed by the legal principles most favorable to the public domain, and grants jurisdiction to any forum chosen by the party who is defending the freedom of The Work (Sec 5). For someone as invested as I am in rejecting those "weary giants of flesh and steel", it would be ironic and hypocritical of me to then rely on any specific one, or allow any specific one to govern my license. So although this clause is probably (a) not remotely legal or enforceable and (b) even if it were, probably weakens the license, it seemed worth putting in.
  10. Limited Relicensing Option: You are permitted to relicense your modifications of NPL work under two specific "Secondary Licenses"—the Mozilla Public License 2.0 or Creative Commons Attribution-ShareAlike 4.0—as an alternative to the NPL (Sec 1.3.7). This is because I see those licenses as ethically similar enough that I'm comfortable allowing them as an outlet, and it makes the NPL much more practical and easy to work with for people that don't buy into its hardline ideological stance.

You'll note that despite how harsh, aggressive, and confrontational the NPL is for those who accept it and use works licensed under it (to enforce the wishes of those who accepted it), it's also pretty lenient regarding being able to avoid or escape being sucked into it.

There's its weak file-based copyleft, for one thing, and it also has relicensing provisions to more lenient and less aggressive licenses. For instance, if someone makes something with an NPL-licensed tool, not by modifying the code but just by using code already written, as a pure user — e.g. a game built with an NPL-licensed Quake engine, or an image made in an NPL-licensed photo editor — they can use whatever license they want for what they created. There's no EULA- or TOS-style "we own your data now" clause like corporations use, even though I'd only use it to further bolster the Noosphere.

Not only that, but if someone makes a project that includes anything licensed under the NPL, as long as they keep the files separate, they can license or not license that larger work, and the separate files they made, literally however they want, as long as the NPL stuff remains NPL: not only if they use an NPL library, but if they statically link to or even vendor an NPL library. Even creating an API to that NPL library for public or private use is fine! As long as the library and their modifications to those specific, narrowly defined files remain NPL, they can do whatever the fuck they want otherwise. And all of that is in addition to the fact that you can at any time convert your modifications of NPL work from NPL to MPL or CC BY-SA.

So while the NPL is extremely harsh as long as things remain in its domain, it is, in my opinion, way less harsh in a lot of ways than the GPLv3 and a lot of internet leftist licenses, and it's also far less viral and pretty easy to work with overall.

The reason I did this is hard to exactly pinpoint, but the best way I can put it is that I'm designing the NPL to be more of a piece of performance art and a political statement of belief, something one can voluntarily accept the mantle of, and that one is strongly protected by if one does, but which is also not something that is forced on others. I could force my ethical stance on others, even according to my ethics (since moral nihilism does not entail moral relativism), but I don't think that would serve my purposes.

If I made it overly viral, in combination with how novel and harsh it is, not only would no one use it except a tiny number of a particular kind of internet post-leftist even in the best-case theoretical scenario, thus limiting its reach as a manifesto, but it would also render my own work, and any other work licensed with it, so totally irrelevant and useless to the broader noospheric ecosystem that those works would become laughable, and the license laughable by extension. I want to make it viable and easy to assign this license to a lot of things, at least in theory, because it's a really good political statement and manifesto, and that won't happen if it's overly viral.

I also don't think that forcing people to behave as if they agree with an ethical position, when it's not strictly necessary for the physical defense of self and others, is a good way to woo or convince them. The intended effect is this: I want people to read it and feel slapped in the face by how aggressive and harsh it is, and to know that the people who have voluntarily accepted the NPL are taking a very strong, self-restrictive, principled stance. But then, if they read further and carefully, to realize, "oh hey, I can work with this pretty easily." That would foster joy, surprise, and a willingness to give back and act in mutual reciprocity; and it would enable giving back in mutual reciprocity in more ways, even if those ways are slightly imperfect, such as through things like the MPL and CC BY-SA, thus making it easier for people to do the right thing even if it isn't the perfect thing. It's a bit of a free gift and an offer of lenience after a clear statement of absolute principles.

2. Terra Ignota

2.1. Rant about the stupid reviews of it that totally miss the point

Basically, the central premise of the series is that there was this gigantic war between hyper-conservative fundamentalist religious and fascist factions back in the day. They call it the "Church Wars", but it’s unclear exactly how much it was strictly religion per se, and how much of that perception is retroactive propaganda on the part of the current society, in light of the attitude indicated by their reaction to it — namely, to completely ban all public expression of gender, gender roles, and beliefs regarding religion, metaphysics, or philosophy, outside super-basic ethics and politics. However, this was done only about 250 years ago at the time the novels take place, and it wasn’t as if the culture was really leading up to it when it was done; it was just a massive, panicked overreaction to the nature of the prior war.

And so their culture still has all of these ideas about gender and religion and so on floating around—picked up from the subconscious and semi-conscious ways people are treated differently, from cultural histories and traditions, from works of art left over from previous generations, etc. The only thing that's changed is that the surface-level ability to consciously acknowledge and discuss these things and call them by their true names has been tamped down, and thus, as a corollary, so has the ability to even point out their influence!

More than that, nobody has figured out yet how to express something to replace the very powerful, rich, interwoven, well-developed social constructs of gender (and communal, shared religious, philosophical, and metaphysical discussion, debate, and belief together). They seem to have gone with the stereotypical leftist conception of gender-neutral, meaning either a riotous and incongruous conglomeration of various signifiers which, ripped from their context and juxtaposed, mean nothing, signal nothing, refer to nothing, and impart nothing, and thus fade into ugliness or noise; or a lack of all signifiers at all, in a sort of beige unisexuality which is more the lack of expression and interest than anything else. Nobody’s built any interesting or meaningful social roles or ideas or ethics or modes of behavior or anything! They haven’t built anything out of it. It’s just sort of gender acceleration: a riotous nothingness that leaves no one with anything to latch onto and no roles to inhabit. Thus people are left vulnerable to — because they still recognize — and hungry for — because of the lack that hasn't been provided for, because even recognizing there is a lack would be inconceivable — Old Things.

In the midst of all this, a very well-educated combination priest and psychologist establishes, essentially, a secret brothel-slash-religious and political discussion parlor in the style of the various parlors during the French Enlightenment that were a huge influence on the rich and powerful in that time: inviting people in to experience all of these taboo topics around sex and gender and religion and metaphysics and so on, to discuss them and debauch in them and revel in them without judgment and with perfect secrecy. Then they’d meet up and network after and have discussions about politics and build connections.

The whole concept of the parlor/brothel is that it’s essentially like a really hardcore 1700s role-playing time-vortex, with some of the Madame’s children actually having been raised in that environment and ideology from birth.

This obviously attracts a lot of the powerful and kind of traps them in her orbit first by the intrigue of violating these taboos and also getting to secretly network and talk to each other and conspire, and then later by the threat of blackmail that she can hold over them.

The main narrator has kind of been adopted by the Madame and the rich and powerful in her orbit, and so obviously his ideology and perceptions of the world are very much informed by hers. So of course he has an obsession with French Enlightenment philosophers and ideas, which he weaves into the narrative, and of course he structures the narrative and even the prose very much in their style. But he’s also convinced—and perhaps not necessarily wrongly—that although race and religion and gender and so on have been nominally eliminated from the public sphere (race because unrelated technological advancements have essentially eliminated the geographic nation and made traveling around the world take about four hours, so there’s been so much mixing of races and so much detaching of race from anything but culture that people don’t think about it anymore), their implicit presence is very important to understanding how the elites who cause the war this series covers think, because of the Madame's influence, but also in general for understanding all of the interactions and things that are going on, since those categories still inform people’s unconscious biases.

So he’ll assign he/him and she/her pronouns to people based on what role he thinks they’re inhabiting in the moment—even at the same time as he recognizes their sex characteristics (which may not align)—because, as he insists, those two are noticed by people and influence how we treat people based on our attraction and level of physical imposingness and so on, even if we don’t like to admit it. This, of course, is kind of silly because it then narrows gender: so strong and dominant people are all he. And weak or caring or manipulative people are all she. And he protests much too strongly against the idea that he’s doing that, but I think he definitely falls into that really hard.

On top of that, there’s that transportation system and the end of the geographic nation it brought about. What has replaced the nation is the institution of the hive, a voluntary government that you can give your allegiance to no matter where you are, who’s around you, or what territory you’re in. It then gives you the laws that you’re bound by, and it will also stand up for you in dispute resolution with people from other hives and resolve disputes between you and people in the same hive.

These hives are each based around a different ethos, and at the age of majority you choose one to be a part of. What this has led to is a sort of self-selection where each hive is very concentrated, full of people with the same general ethos, leading to various hives having monopolies over various key social functions and turning the hive system into almost a gender system of its own, but one which is not allowed to be related to sex and romance and hasn’t really been developed or elaborated because of the way people are trying to tamp down on any gender system.

(There are also the Hiveless, who are not governed by any laws. They seem to be doing pretty well, to be honest, and have a seat in the global congress.)

This all has led to a sort of resurrection of things like dictatorship. Because if you can pick and choose whose laws you submit to, then having one sort of law system be a dictatorship isn’t that much of a problem: if the dictator goes crazy or makes bad choices, you can just switch. In addition, the need to woo members through an ethos and a vision leads to that being a pretty promising option.

At the same time, the Democratic hives have fallen afoul of demagoguery and tend to elect single central powerful leaders. And the one hive that tries a truly new government system, one based on bureaucratic technocracy where everyone submits suggestion letters, and then those are sorted and summarized by a machine, has led to a situation where whoever runs the sorting algorithm basically runs the government.

So everything is very centralized. Especially since you need hives to be a certain size to be able to stand up to the other ones meaningfully, so there is a sort of growth tendency. That’s why there are only six hives.

This leads to a narrative where obviously a lot of the big powerful politicians and change makers are the ones who really matter to the development of the global political scene.

Oh also—the "great men" in this story are often hilariously stupid, petty, venal, horny, just like all nobility. That’s half the fucking point. And their power, for the most part, is an obvious failure of the system the novel clearly isn’t endorsing. That they have redeeming qualities too—especially when seen through the narrator’s too-kind eyes—is only the mark of a writer who isn’t so blinded by morality they don’t understand the appeal of antiquity. Because nobility does have appeal. That’s like… how we had it for as long as we did.

So, of course, basically all of the reviews are just constantly complaining about Great Man Theory (as if powerful individuals can't possibly be acknowledged to have large effects on history or else the book is bad) and how sexist the book is (because the narrator is), and most of all how "regressive" it is for recognizing that there is an allure to gender and dictatorship and empire and all of these things!

Because how dare Ada Palmer, right? How dare she explore the fact that, yes, these things are attractive to the human soul — we are not angelic beings for living in the 21st century (or the 25th!), we have not "evolved" past the instincts and needs that make things like hierarchy and social roles attractive to us, nor past the cultural associations that make the 18th century seem alluring to some degree. How dare she explore and understand that tough subject, when understanding precisely those hungers, those needs, is what would allow us to beat back actual reactionary danger? You can't win by just offering people nothing in return for what they lose but moralistic secular sermons.

2.2. Gender thoughts sparked by Terra Ignota

Rolling back to gender for a second, one of the things that bothers me a little about the idea of being nonbinary is that it's really just defining itself in opposition to the two contentful gender categories we have; it has no actual content in itself. So conservatives are a (very) little right when they ask "what the fuck is nonbinary?" because ultimately, it kind of isn't anything half the time, it's just a sort of rejection of other things without constructing anything meaningful in its place, while also trying to hold onto gender in some way, unlike agender people, so it's just existing in this meaningless beige inbetween zone with a riot of signifiers and pieces but nothing that means anything.

And to be sure, I've seen my share of people who actually have a good answer to this question, but even then, a lot of it is hyper-individualized, and the problem with that is that gender, when you boil it down, is literally just a genre of person, not the person itself. So for this concept of gender to have any sort of meaning or utility, we need some sort of protocol agreed on by some group.

When people boil down gender to just… 'how I feel', then, well, I'm not going to be the Gender Police and tell them to "get a real gender." I'm not even really going to look at them askance, because I get it. But I always have a small sense that they've missed the point. Gender is an identification-with, but at that point they've reduced the concept of gender to be no different from their unique self, which then puts us in an awkward position, because the whole point of the concept is that it provides an abstraction – some minimal set of things that represent what kind of person you are so that people can treat you accordingly. Obviously you are more than your gender, but in programming speak, narrowing someone's type from Unique#71837 to Girl helps us a lot.

If someone is a Girl, you have methods and fields you know about and can call that won't throw exceptions. Obviously it's never perfect, but it does minimize the number of exceptions we have to catch and do something about per person — and if there are too many exceptions, well then, it's time to create a new gender the way you'd create a new class/type. Ultimately, I'd want genders to become what some music scenes or things like butch-femme are: an infinitely diverse and kaleidoscopic array of actual subcultures with real histories and ideas and expectations built into them to choose from. A sort of acceleration of gender production that stops short of the g/acc nihilism of one-gender-per-individual, which leads to the extinction of gender in toto.
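
To make the programming metaphor concrete, here's a purely illustrative Python sketch; the class and method names are mine, invented only to show the type-narrowing point, not a claim about how any actual gender must be defined:

  class Person:
      """Unique#71837: infinitely particular, so nothing shared to rely on."""
      def __init__(self, name: str):
          self.name = name

  class Girl(Person):
      """A 'genre of person': a shared abstraction carrying an agreed-on protocol."""
      def expected_protocol(self) -> list[str]:
          # The minimal, community-negotiated set of expectations others can act on.
          return ["she/her pronouns", "femme-coded courtesies", "..."]

  def interact(someone: Person) -> list[str]:
      if isinstance(someone, Girl):           # narrowing Unique#71837 -> Girl
          return someone.expected_protocol()  # known "methods and fields", no surprises
      # Otherwise every interaction is an exception handled case by case.
      return []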

You can have an individual interpretation of your gender too – stuff like what you relate to and see as relevant for the kind of person you are in society, but in terms of anything relevant to how others treat you, yeah, agreement and conversations have to happen to sync everyone's models up at least somewhat.

Also when I say a lot of nonbinary gender expression and conceptualization is beige, I don’t just mean the unisex theyfab stuff. I also mean like the stuff that’s a total riot of signifiers that all contradict each other and, ripped from their context and put together, mean absolutely nothing. Pure entropy also looks like a smooth beige from far enough away.

The point is not to police people into A Few Gendered Categories, but to say that genders, as The Major says, should be abstractions that bring with them protocols for understanding and interacting with you. You can’t have that if they’re completely unique to each individual. Gender is genre.

And, since I was at work, I didn’t get to express this as strongly—I was mostly listening to her and nodding along. I think it’s important to understand that gender receives most of its meaning and function from common identification with others, from becoming part of a community when you adopt it, and seeing others like you and defining yourself with respect to them—so if you’re the only one that holds your unique gender, you can’t do that, and it just becomes, as you say, expression or personal identity. Not gender.

2.3. Thoughts on transcendental miserablism sparked by Utopia

I recently read something that made me deeply sad and angry:

According to The Harvard Gazette, Becker said that as a child, he was a firm believer that humanity should live among the stars, but his belief changed as he learned about the inhospitality of space. "As I got older, I learned more and realized, 'Oh, that's not happening. We're not going to go to space, and certainly going to space is not going to make things better,'" he said. — "The Stupidest Thing": Scientist Shreds Elon Musk's Mars Dream

I hate leftists. Yes, Mars as a lifeboat is a dumb idea. But pooh-poohing the idea of going to space like this is just gross. It's transcendental miserablism of the worst kind: a total abandonment of grander aspirations for the future, of dreaming and love and romanticism, in favor of resolutely keeping our noses in the dirt and shouting down any hopes for anything else, any striving for other things, even if that striving has historically been beneficial to the present.

It's reminiscent of how Utopia is treated in Terra Ignota. In the books, there's a Hive called Utopia whose members all pledge their lives to defeating death one cause at a time, to getting humanity to the stars, to enhancing humanity, and in every way bringing wonders to life. They know it will be hundreds of years yet, if it's even possible, to achieve the things they aim for, but they see the struggle as worth it, and they sustain themselves with a love of stories and beauty and whimsy enabled by a love of beautiful and amazing technology. They work tirelessly to get there… and everyone hates them for not focusing on the here and now, on Earth.

I really hate that leftism seems to have given up that hope, so much so that now they just have a slur for it ("techno-optimist"). I agree that the actual intended usage of the term refers to something I disagree with and think is wrong — tech won't automatically make everything better! — but nowadays the word seems mostly to be used against people who even think that technological progress could make things better, and that it's worth trying to learn and grow and expand science and technology, to see what grand things we can do, even if we don't have an immediate story for the payoff of those research projects, or even if we haven't perfectly predicted all the consequences those projects might have. It's sad.

3. How to use a large language model ethically

  1. Prefer local models
    1. So you're not supporting the centralization, surveillance, and landlordization of the economy any further
    2. So you're not taking part in necessitating the creation of more data centers which – while they don't use a total amount of water and energy that's out of line with many other things in our technological society – concentrate water and energy demands in communities in a way they can't prepare for and which hurts them
    3. So you're harder to subtly manipulate, influence, propagandize, and censor through the invisible manipulation of the models you're using
  2. Don't shit where you eat – don't spread unedited AI slop everywhere into the very information ecosystem it, and you, depend on. Unedited and unreviewed AI output is only for your own consumption, when you find it useful
  3. Only compress – going from more text to less text is usually a good idea; going from less to more is just decreasing information density for no reason
  4. Don't shell out responsibility – i.e., don't say "ChatGPT says…" or whatever.
    1. That's an escape hatch for pasting bad/wrong stuff you didn't double-check into everyone's face. Take ownership of what it outputs; that way you're incentivized to make sure it's more or less correct.
    2. Although if you're not going to check either way, not saying anything just makes things harder on people, so if you're going to be an asshole, announce it!

Note: There is a caveat to my point on local models, however: datacenter models are more energy- and CO2-efficient than running an equivalently sized model locally. Additionally, they can run larger and much more useful proprietary models, and sometimes there's a certain threshold of capability above which a model is worth the energy and time spent on it, and below which it's completely not worth it, such that not using the more capable model at all will just waste more time and energy — after all, saving a human some time is more important IMO than the totally negligible and overblown energy usage of their individual use of AI.

Moreover, not supporting the first two points by refusing to use datacenter models is largely a symbolic gesture: your individual usage of the models is not even remotely what's driving the expansion of data centers or the centralization of anything; not just because you're a tiny part of it, but also because, as the AI hype bubble has shown, they'd do it whether anyone uses it or not. So this is less of a hard and fast rule than the others, and more about just personally keeping yourself "pure," and avoiding manipulation and privacy breaches.

It's more like the usage of any other centralized Big Tech service: avoid it if you can, but sometimes it really is the best option.

If you're going to use a datacenter model, my advice is:

  1. Don't use xAI's or OpenAI's models; prefer Google's instead. They're still an evil capitalist megacorp, but at least they seem to care about green energy and don't appear to be actively producing sycophantic or nazi models. Their Tensor Processing Unit (TPU) architecture for model training and inference is also significantly more efficient.
  2. Prefer smaller and more efficient models. Prefer mixture of experts models.
  3. Use models through your own locally hosted interfaces and through proxy SDKs like LiteLLM or OpenRouter when you can, to avoid lock-in (see the sketch after this list).
  4. Prefer to pay for your models, so you're not externalizing costs, and so you're kept honest and incentivized to prefer local models when you can use them.
  5. Read privacy policies carefully. Prefer local models.
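
As a rough sketch of what point 3 looks like in practice, assuming LiteLLM's completion() API (the model identifiers are just examples; check the provider prefixes against LiteLLM's docs), the same code path can hit a hosted model today and a local one tomorrow:

  from litellm import completion

  def ask(prompt: str, model: str) -> str:
      response = completion(
          model=model,
          messages=[{"role": "user", "content": prompt}],
      )
      return response.choices[0].message.content

  # The same call works against a hosted model...
  print(ask("Summarize this privacy policy in three bullets: ...", "gemini/gemini-2.5-flash"))
  # ...or a locally served one (e.g. via Ollama), so switching later is a one-line change.
  print(ask("Summarize this privacy policy in three bullets: ...", "ollama/llama3.2"))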

4. Large language models will never be able to reason reliably… but that doesn't make them not useful

The fundamental structure of large language models is not just different from the structure of the human brain, but different in a way that fundamentally leans them away from truth and reasoning capabilities. The differences include:

  1. Trained on a reward function that only optimizes for generation of superficially human-seeming (by probability) token sequences, not token sequences which correspond to true or useful ideas or accurate reasoning or anything of the sort. Human beings, on the other hand, are trained on a mixture of both – and human beings which have been mostly trained to say things that sound correct, rather than things that are really truthful, are usually called idiots or narcissists.
  2. Trained without any form of embodied computing, which means that the tokens they manipulate can have no meaningful referent for them, which means that it isn't even really theoretically possible for them to reason with them in a way that is not just valid, but actually sound, since they have no knowledge of what the facts actually are or what any of these words mean. Moreover, this makes it unlikely for the detailed reasoning LLMs might be desired to perform to be valid either, since a concrete knowledge of what a word or symbol is referring to helps infer the principles and rules by which it might operate and how it might relate to other words in a way you can't get from the simple statistical likelihood or unlikelihood of appearing next to those other words, which is how the inferred "meanings" of embedding space work.
  3. No long term memory in which might be stored relatively concrete notions of rules, principles, definitions of concepts, and so on, against which current thought processes might be checked in a meta-cognitive process, which is typically how humans ensure that their reasoning is actually rational. Everything goes into an undifferentiated probability soup, or it's in working memory and thus acts as input for the existing trained weighting of token probabilities (the closest thing the LLM has to trained reasoning methods) to work on, not changing those fundamental probability responses to input tokens.
  4. A fundamentally nonsensical meta-cognition system through chain of thought. The idea behind chain of thought is to resolve the problem that LLMs are not capable of the meta-cognition necessary for strong reasoning skills and factuality – as mentioned in the previous point – by having them actually output their "thoughts" as part of the token stream, so that those, too, can become part of the token window, and influence further reasoning. The problem, of course, is that these thoughts are not at all what the LLM is actually "thinking," insofar as it can be said to be thinking – they are not a picture of the internal processes at all. They are just what the model thinks a plausible chain of thought from a human might look like, with no referent to its own internal states, because it has none, or if it does, it couldn't have access to them either. This means that LLMs are not actually capable of analyzing their own logic to see if they got something right or wrong.
  5. A "yes, and…" structure: they are only attempting to find a likely completion to whatever was input, which means they aren't going to actually be able to engage in any kind of rational dialogue or debate without extremely leading questions.
  6. An inability to have novel thoughts, due to their reward function. This means that if actual rational thought and inquiry necessarily leads to an unusual conclusion, they would still be unable to reach it, because they can only say what it would be likely for a human from their corpus of training data to say, whether it's true or accurate or not. And if we were to remove that constraint, and make them often choose very unlikely next tokens, the illusion of their coherence would evaporate immediately, whereas humans can remain coherent while saying very unlikely things. (A toy illustration of what "likely" means here follows this list.)
  7. An inability to actually look things up.
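
To illustrate the "likeliness" point above, here's a toy sketch (made-up scores, not a real model) of how next-token sampling weights candidates by probability, and how flattening that distribution with a high temperature pushes the model toward the unlikely, incoherent picks:

  import numpy as np

  def sample_next_token(logits: np.ndarray, temperature: float, rng) -> int:
      scaled = logits / temperature
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()
      return rng.choice(len(logits), p=probs)

  rng = np.random.default_rng(0)
  logits = np.array([5.0, 4.2, 1.0, -2.0])  # toy scores for 4 candidate tokens

  for temperature in (0.7, 5.0):
      picks = [sample_next_token(logits, temperature, rng) for _ in range(1000)]
      print(temperature, np.bincount(picks, minlength=4) / 1000)
  # At low temperature nearly all the mass sits on the "likely" tokens; at high
  # temperature the distribution flattens and unlikely choices dominate.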

Here's a pretty good paper from Apple showing that LLMs can't reason, in line with what I'm saying.

See also.

5. My opinion on large language models

There has been a lot of controversy around large language models lately.

5.1. Fundamental limitations

In my opinion, they have fundamental flaws that mean you can't use them for many of the things people are claiming you can use them for, such as obtaining factual information, programming, or writing things for you. This becomes clear if you look at how large language models actually work, as laid out in the section above on why they can't reason reliably.

Furthermore, I think the hype around large language models is extremely harmful. Not only because it's creating an entire sphere of grifting that's taking advantage of people, and inflating a market bubble that will inevitably pop, costing many people their livelihoods in the process, but also because, while we shouldn't be afraid that large language models can actually take the jobs of writers, journalists, and programmers, the fact that they are being marketed as capable of doing so will be equally damaging. Capitalists have a strong inherent incentive to decrease labor costs and to deskill and disempower workers, and the marketing hype that convinces them that we can be replaced with LLMs will enable them to do that whether or not LLMs can actually replace us.

On top of all of that, there's the way that the use of large language models for things like programming – and even writing – can affect learning. If, instead of actually going through the trouble of understanding how things work and why at your chosen level of abstraction in the software stack, or even delving below it so that you can use your abstractions more effectively (since all abstractions are leaky) – the principles, rules, and relations between things – and, more importantly, instead of building up the meta-skills of problem-solving, critical thinking, and autodidacticism, you rely on the AI to do your thinking for you, it can have serious consequences for your intellectual development.

That is what the study that is often misquoted by AI Luddites actually says. Not that having any contact with AI magically "rots your brain" or makes you "permanently dumber", as many headlines have made it seem, but the more limited and specific claim that using AI to automate actually performing a task, including reading, processing, and then synthesizing information, leads you to know less about the subject and how that task was executed — an extremely obvious outcome from the basic concept of automation, and not something to panic over, just to be aware of: that if you automate something, you know less about how it was done and in what way.

This would theoretically be fine if the AI was deterministic, so you could actually rely on it to behave in a reliable and understandable way, and if it didn't make mistakes or made mistakes in consistent and comprehensible areas, like a compiler almost, but AIs are the leakiest of all possible abstractions over real code, which means when something goes wrong or you want to make a change AI can't seem to do, you very much will still have to interface with the code it's written and thus flex all the muscles you atrophied. Not to mention that in the case where you want to use less popular technologies – which can often be far better than popular ones – suddenly you'll be stuck without your LLM safety blanket.

Even someone deeply invested in the AI hype – literally building an AI developer tool – has come to realize this, and actually state the consequences of it quite eloquently, although I think his solution does not go far enough.

Thus, it's important not to shun AI as some kind of inherently evil thing, but to understand how to use it in a way that avoids this problem on the things that matter to you. The point of automation is that we get to pick and choose our challenges — to decide that some challenges wouldn't make us stronger in ways we care about, or aren't particularly worth it to us, or are in the way of greater and more important goals we care about — but that also means that we have to be careful about what we offload to automation, and we must choose consciously to exercise ourselves on the things that we think are important to us. I think using LLMs for everything to do with coding, if coding is what is important to you, except on a single day each week, is a lot like driving everywhere even if it's five minutes away and the weather is good outside and you have a significant other who wants to walk with you, but then having a gym day — maybe it'll work for some people, but the more sustainable option, one that relies less on willpower, is to ensure that you integrate walking naturally and effectively into your everyday practice wherever it makes sense. Likewise, for programming, I would strongly recommend (and this is how I apply it myself):

  1. Using LLMs (with citation systems) as a means of learning new things, both to get quick factual answers and to get jumping-off points into the primary sources in greater detail. This way you're better off for having used it, because you're still the one gaining the knowledge rather than having the AI apply it for you.
  2. Only using LLMs to automate tasks you genuinely don't care to learn to do (for me, that's writing jq scripts when I just want to quickly look at some dumped JSON data), or tasks which you already know how to do but which would otherwise be more time-consuming than they're worth to do yourself, yet would be of great use if less time-consuming (for example, having LLMs perform text transformations that would be insanely tedious and difficult to do with Emacs keyboard macros and regular expressions, and a distraction from my actual task to figure out, especially when the transform is complex and heuristic, like applying a new DSL I wrote to some previously-written code).
  3. Treating LLMs as a rubber duck to bounce ideas off of when you get stuck or don't know where to go next, as they're excellent at reframing or clarifying your own thoughts back at you; or using them to generate quick scaffolding or prototype code that you know you're going to completely rewrite over time, but which will help you get started and prove out whether something is worth investing more time into.

5.2. On the other hand, I don't think LLMs existing, or even using them, is inherently evil.

First of all, I think, despite their flaws, AI tools are widely useful, for things like:

  • Summarization
  • Text transformation tasks (such as reorganizing things, applying a domain specific language to code that doesn't use it yet based on a few examples, copy editing, formatting things in LaTeX for you, etc)
  • Research and documentation Q&A (through things that allow citation)
  • Ideation
  • Quickly generating code scaffolds to get a programmer started on a problem, with the assumption that that code is just a prototype and is throwaway, or to help unblock someone
  • OCR (extremely accurate and cheap, at least with some models, and they produce much cleaner and nicer output)
  • Anything with clearly verifiable outputs that can catch any errors they make
  • Natural language understanding (especially converting natural language to a structured data schema provided by the program and enforced at the token level; see the sketch after this list)
  • Text classification
  • Supervised agentic workflows where you want a program to have some limited amount of flexibility and problem solving ability without having to explicitly code every edge case in, but a human will still be overseeing the process
  • And more things I probably can't think of right now.
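
To make the natural-language-understanding bullet concrete, here's a minimal sketch of the weaker "validate and retry" version of that idea using Pydantic; call_llm() is a hypothetical stand-in for whatever model call you use, and true token-level schema enforcement would need a constrained-decoding library on top of this:

  from pydantic import BaseModel, ValidationError

  class CalendarEvent(BaseModel):
      title: str
      date: str           # e.g. "2025-03-14"
      attendees: list[str]

  def parse_event(utterance: str, call_llm) -> CalendarEvent:
      prompt = (
          "Extract a calendar event from the text below as JSON with keys "
          "title, date (ISO format), and attendees.\n\n" + utterance
      )
      for _ in range(3):  # retry a couple of times on malformed output
          raw = call_llm(prompt)
          try:
              # The program owns the schema; anything that doesn't fit is rejected.
              return CalendarEvent.model_validate_json(raw)
          except ValidationError:
              continue
      raise ValueError("model never produced output matching the schema")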

As I say elsewhere, large language models will never be able to reason (alone), but that doesn't mean they aren't useful. Moreover, many of the issues mentioned above can be significantly ameliorated, to the point where they're barely a problem at all — although they can occasionally show up — through the use of neurosymbolic techniques orchestrated by human-supervised LLMs, so I think, after a nice AI winter to reset the hype bubble, we might technologically actually be on the right track. Even Gary Marcus himself admits this (sort of), although he frames it as a victory for him, even though he's long been advocating for an inverse architecture (symbolic at core, connectionist at periphery) and claimed that nobody was doing it for like a year after people started doing it.

Furthermore, it's not just a question of practical usefulness. I don't think they make the problem of SEO-optimized slop flooding the internet significantly worse than it already was (see also, which also makes the excellent point that modern search engines effectively hallucinate just as badly as, if not more so than, LLMs), and the solution to that problem remains the same as it ever was, because the problem isn't solely with LLM-generated content slop, but with content slop in general, irrespective of who or what generated it. In a sense, the slop having been generated by an LLM is just a distraction from the real problem. So the solutions will be something like:

None of these solutions are panaceas, of course – they will all have their own perverse incentives and distortions of human nature, but my point is that whatever solution we were going to come up with to existing problems that we already had, will also apply to solving the problem of LLM-generated content slop, and moreover, we really need to try something new and different, because we know what's going on now is particularly horrible and damaging to the noosphere, and maybe the distortions and perverse incentives of a different system will at least be more manageable, or smaller.

Likewise, I fundamentally don't think that large language models' use of "copyrighted" material is particularly unethical, because I'm fundamentally opposed to the idea of intellectual property and I think it's just completely absurd and contrary to how art and knowledge are built. A funny comment I've seen on this subject:

One thing I find interesting is how as soon as things like ChatGPT and StableDiffusion became popular, lots of people did a complete U-turn on their copyright stance. People who used to bang on about IP trolls screwing over creators suddenly went for full ‘RIAA in 2005’ IP maximalism.

My predominant problem with commercial large language models is simply that, typically, the collected training data, the weights, and the code to run them are not made part of the commons once more, and that distilled forms of these models are not made widely available so that the average person whose data was collected to construct these models can at least have a hope of running them themselves, rendering proprietary LLM companies hypocritical in their treatment of IP and constituting an enclosure of the commons. This is why I refuse to use anything other than open-weights local large language models like Llama 3.2, and even then those models aren't good enough in my eyes, because they don't allow commercial use or use deemed illegal or immoral by the powers that be.

Similarly, I find the argument that large language models are fundamentally hurting the environment unconvincing. Even the largest, most resource-intensive LLM – in no way comparable to the tiny 3-billion-parameter model I run on my laptop locally – can only be painted as harmful to the environment by considering its impacts totally in isolation, without context or comparison to other common things, like turning on a light bulb, for proportion. See here.

5.3. My ideal AI world

I think the correct approach to large language models is to realize that they are very cool, very interesting, and shockingly, wonderfully generally useful, but ultimately they're just another natural language processing tool, with specific capabilities and limitations determined by their architecture and training method. They're not some kind of magical route to artificial general intelligence, although they may be a stepping stone there as part of a larger system, and they're not going to replace human reasoning and creativity, nor the necessity for computational languages, accurate information retrieval based on search, or anything else like that. Useful as part of a well-rounded human intellect augmentation system, especially combined with hypertext knowledge management like org mode, but only when used carefully and considerately.

Overall, I think they're worth having, but not gifted with the bloated, insane resources and hype they've been given — I think we need to stop focusing on pouring resources into endless scaling and instead focus on making the small models we already have faster, more efficient, and better; even relatively small LLMs like Qwen 3 30b-a3b are already more than good enough at the only tasks LLMs will actually ever be reliably good at, and scaling will only ever make LLMs marginally better at tasks they'll never actually be good enough at to be anything other than a hindrance, while at the same time sucking up resources and attention that should be put to other things.

The first and biggest thing we should do to make this whole situation better is not duplicate effort. Right now, because almost every big foundation frontier model is proprietary, and every company is looking to keep secrets and get ahead, we're running thousands of scraper bots across the internet to create duplicate troves of training data, leading to websites getting swamped with bots; we're wasting compute and energy running several data centers training equally large competitor frontier models; we're wasting human labor and time and exploitation creating totally separate wells of data for data annotation and RLHF. If we could create some kind of centralized, open AI foundation that allowed all companies and academics and open source projects interested in this field to pool their resources, with one single set of bots scraping for training data so it doesn't overwhelm the web, with one set of data centers training one line of massive frontier foundation models, and pooling data annotation and RLHF resources, which any company could then come along and further RLHF, fine-tune, distill, rearchitect, or operationalize for their own ends, we'd be in a much more sustainable place. As I noted above, small models are sufficient for almost everything, so we'd only really need one frontier foundation model and the rest could be distilled from there.

If we wanted to progress LLMs even more, we could look into more efficient architectures for training and inference, like Google is doing with their TPUs, or we could focus on improving the transformer algorithm itself, like IBM is doing with its alternative architecture (which runs twice as fast, so half the computational resources used right out of the gate; whose compute scales linearly with input size instead of the hilariously inefficient quadratic scaling of regular transformers; and which uses significantly less memory to maintain its own memory, thus allowing fewer compute units to be shared by more requests). They could also invest more into — as we're beginning to see — mixture-of-experts models to decrease inference compute and increase efficiency, conditional parameter loading, MatFormers, per-layer embeddings, and other optimizations that are thus far only being used for LLMs meant to run on edge compute, as well as things like dynamic quantization and 1-bit LLMs alongside quantization-aware training. All of these would allow model capabilities to scale while actually decreasing our compute usage.
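
As a back-of-the-envelope illustration of why the mixture-of-experts and quantization points matter (rough numbers only, ignoring activations, KV cache, and runtime overhead):

  def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  total_params = 30   # a Qwen3-30B-A3B-class mixture-of-experts model
  active_params = 3   # roughly the parameters actually used per token

  print(weight_memory_gb(total_params, 16))  # ~60 GB of weights at fp16
  print(weight_memory_gb(total_params, 4))   # ~15 GB at 4-bit quantization

  # Compute per token scales with *active* parameters, so the MoE model does
  # roughly 3/30 = 1/10 of a dense 30B model's work per token, and quantization
  # cuts the memory needed to host it by roughly another 4x.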

More important than any of this, I think it is crucial that we never allow machines — of any sort, from regular computer programs, to symbolic AI, to algorithms, but especially black box machine learning (until we get some kind of big breakthrough in explainable AI) — to have default decision making power over the lives of human beings. Not just that machines should not have "final say" in some sense where, to get to a decisionmaking power that isn't a machine, you have to appeal up and up through some kind of hierarchy of bots and corporate bureaucracy, but that they should not be making all the day to day decisions, should not be assumed to be making the decisions at first so that human decisionmaking is the exception, should not be at that first level of decision making in an organization that is what is experienced by most people interacting with that organization. Because realistically, if humans have the "final say" but machines are the default day to day decisionmakers, the average experience of human beings under such a system will not be one of human decision-making, but of machine decision-making, and it will only get worse, monotonically — organizations will just make it harder and harder to actually access those final human decision makers, just as a result of hierarchy and relative costs.

5.4. Why is machine decision-making bad?

  1. Machines are difficult to hold accountable. Not being people, we can't directly hold the machines accountable, so we have to go looking for the people behind a machine to hold accountable. But who that is can often be very difficult to ascertain, since there is always a vast army of people behind the implementation and operation of any machine, and any one of them can claim, with some plausibility, not to be responsible. For instance, the corporate executives who put a machine in place don't know the full code or setup of it, so they can always claim they didn't know it would make a certain decision and didn't want it to; whereas engineers and programmers, not being remotely in control of the deployment and application of the machine, can always claim that they didn't want it to be used like that (perhaps absent necessary checks and balances, or making decisions over that particular thing, or whatever), or that they simply couldn't be expected to predict the infinite complexity of the world with all its edge cases and account for it all in their software, so punishing them would be punishing them for a human fallibility and limitation that they did try as hard as they could to overcome, given the limitations imposed on them by their own executives; and the executives of the companies providing the machines can always argue that it was the engineers' and programmers' fault for making mistakes, and so on…
  2. Perfect rule-followers. An important component of making human lives within a greater social/civilizational system bearable is the flexibility of the human decision-makers operating that social/civilizational system. The ability for them to bend the rules, or make exceptions, or go out of their way to figure out something to help you out, that allows the system to adapt to particularities and individualities, to care for people and help them out, even when the overall structure of the system isn't necessarily aware of that. The key to making a better system, in my opinion, is to have more of that case-by-case individual decision-making flexibility, and using machines as default decision-makers directly counteracts that, because machines rigidly and absolutely enforce rules from the top down, with no situational awareness.
  3. No other values. Machines only have the values their designers explicitly program (or RLHF) into them. That means they are perfect servants to the will of the powers that be in any hierarchy, which is good for the hierarchy but not good for the rest of us. While a human decision-maker may be forced to go along with those above them in the hierarchy most of the time, they can still rebel sometimes, even in small ways, through their other values of empathy, loyalty, justice, fairness, and so on. They can bend the rules, as mentioned in point 2, or strike, or complain, or whistleblow or any other of a myriad of actions that let them push back against the weight of the hierarchy above them. Machines will not do this, so in decision-making positions they centralize power more and provide no relief.

Instead, at the most, I believe machines should be used to automate helping human decision-makers gather information and understand it, in order to further human decision-making power. Some key rules for this are:

  1. No metrics. Such information-gathering and understanding machines must not produce a black-box "metric" that's just a final number or rating; they should instead provide all the components necessary for a human being to make an informed decision themselves. As soon as you have the machine outputting vague, highly collapsed, abstract "metrics," you open the gate to rulebooks telling humans how to decide based on that metric, and suddenly your "human in the loop" has become simply a cog in a larger machine.
  2. Draw on real data. Any machine that helps human decision-makers gather and understand information must do so based on externally stored information that was entered by humans, is understandable by humans, could be consulted separately, and is known to be correct – such as databases and documents – not on the basis of vague associations and ideas hidden in its weights or code, even if that machine has been specially trained or programmed for the specific domain.
  3. Citations. Any machine that gathers, summarizes, or synthesizes data must provide citations (as links) back to the real data sources from which it drew – preferably by breaking its output down into discrete statements of fact and then using a vector database to find the pieces of original data that align with each statement, not just by letting the AI generate its own citations. The more localized the citations are to specific parts of the source data, the better. Preferably something like this; a minimal sketch of the statement-level approach follows this list.
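
Item 3 above is easy to prototype. Here's a minimal sketch, assuming the sentence-transformers library is available; the model name, example sources, and similarity threshold are all illustrative choices of mine, not a reference to any particular product. The idea: break the machine's output into discrete statements, embed both the statements and the known source passages, and attach to each statement a link to the passage it most closely matches, or no citation at all if nothing matches well enough.

```python
# A minimal sketch of statement-level citation grounding, assuming the
# sentence-transformers library; the model name and similarity threshold
# are illustrative, not recommendations.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Known-correct source passages, each with a link a human can follow.
sources = [
    {"url": "https://example.org/report#p12", "text": "Q3 revenue fell 4% year over year."},
    {"url": "https://example.org/report#p18", "text": "Headcount grew from 120 to 140 employees."},
]

def cite(statements, sources, threshold=0.5):
    """Attach the best-matching source URL to each output statement,
    or None if nothing in the sources is close enough."""
    src_emb = model.encode([s["text"] for s in sources], convert_to_tensor=True)
    stmt_emb = model.encode(statements, convert_to_tensor=True)
    sims = util.cos_sim(stmt_emb, src_emb)  # shape: statements x sources
    results = []
    for i, statement in enumerate(statements):
        best = int(sims[i].argmax())
        score = float(sims[i][best])
        results.append({
            "statement": statement,
            "citation": sources[best]["url"] if score >= threshold else None,
            "score": score,
        })
    return results

# Each discrete statement from the machine's summary gets its own citation.
print(cite(["Revenue declined 4% in Q3.", "The company hired 20 more people."], sources))
```

A real system would use an actual vector database rather than an in-memory comparison, and would tune the threshold, but the point stands: the machine's output stays checkable against human-readable sources, statement by statement.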

6. Capitalism innovates?

Capitalism does not innovate, because innovation is risky, whereas rent-seeking and financialization are profitable and mostly guaranteed-safe. Even when it doesn't choose rent-seeking and financialization, capitalism will choose to pander to the obvious gaps in the market that are easy to satisfy, or take existing desires and use advertisement to give them concrete referents in the world of products. And in all these cases, it will aim for the common denominator desires to satisfy, the ones with the widest appeal, because that is what best guarantees profits. I.e. it regresses to the mean.

Who does innovate, then? Only individuals, or very small groups of individuals, who are intrinsically motivated around a common set of goals and values. Only people like that innovate, and that's usually orthogonal to capitalism at best – what those people most often want is a stable income to pay their bills and feed their families while they work toward their passion; they're not interested in "striking it rich" except insofar as it helps that goal. There are a few greedy exceptions, like Steve Jobs, but always behind them is another innovator who does it for intrinsic reasons, like Alan Kay.

Sometimes capitalism can provide the context for this kind of innovation, like with Xerox PARC and Bell Labs. But other times it's the government, like with SRI, SAIL, the MIT AI Lab, and CERN. What's important is a stable means of supporting yourself and your loved ones, an environment of free intellectual play and experimentation, and a visionary set of common goals or interests. These can be created anywhere.

7. Freeing the noosphere

Author's note: the historical references found herein are meant to be general and impressionistic. I am intentionally simplifying and linearizing this narrative to make a point about how the media used to represent ideas affect the nature of the noosphere-economy, not to make any historical point. I have linked to relevant respectable sources for each historical reference so that you can go learn the real history in all its proper complexity if you are interested.

The noosphere is the world created by human cognition: where ideas are born, grow, develop, are shared, split, merge, multiply, and sometimes die. It is emergent from and dependent on the physical world, deeply shaped by it, and it also deeply affects the physical world, but it is also conceptually its own thing, having some of its own properties and laws.

A key feature of the noosphere is that while it is not free to create the things that exist there (ideas) because it takes time and effort to do so, once they are created, they are not scarce, nor rivalrous: they can be shared indefinitely and will not run out, and someone getting an idea from you does not take it away from you. When you communicate an idea to someone, you do not lose that idea and have to go back to the "idea factory" to make a new one or a copy of the old one – you and the person you shared it with now both have that idea. And if that person goes on to share that idea with other people, that is no burden on you; infinite copies of your idea can spread with near-zero cost to you.

Now, it may be argued that if someone "steals" an idea from you, you do actually lose something. Not the idea itself, but some credit, or opportunities like sales, that you might otherwise have gotten. However, I think conceptualizing these things as properly your possessions is actually an error in reasoning. Someone stealing an idea from you can't take away past credit you've received – awards, accolades, the knowledge in the heads of all the people that already knew you came up with the idea – and it also can't take away past sales or opportunities that you got as a result of the idea, because ostensibly you've already taken advantage of those. Instead, what they're "taking" from you when they copy an idea of yours is possible future credit – on the part of people freshly introduced to the idea – and possible future opportunities – such as future sales from people looking to buy something adhering to your idea.

The problem is that I don't think one can be coherently said to "possess" future possibilities.

First of all, they inhere in other people's actions and thoughts, not in anything you concretely have (by have I mean the usufruct definition, as usual in my work, of regular use, occupancy, or literal physical possession). I think it's wrong to give any person any sort of enforceable rights over the actions and thoughts of others that don't materially, concretely, affect them in some way – which, since they don't affect your own possessions, they don't. By saying that you have some sort of right over future credit or opportunities, you're saying that you have a claim on other people's future thoughts and resources – a right to control them!

This line of thinking is also confused, secondly but perhaps more importantly, because those future possibilities were only that: possibilities. Things you think you might have gotten. But any number of other things could have gotten in the way of that: maybe the idea isn't as good as you thought it was; maybe a competitor with a different idea would've arisen; maybe you would've gotten sick and not been able to carry it out to completion. Even the person who copied your idea being successful isn't an indication that you would've been successful with that idea: maybe your execution wouldn't have been right in just the way needed to catch people's imaginations and interests. Maybe your competitor's hands were actually the right ones for the idea. So attempting to enforce your claim on such future "possessions" is attempting to enforce a claim on an ephemeral future thing which you might not have gotten anyway.

As a result, I don't think there's any coherent way in which it can be said that an idea is meaningfully "stolen." It's certainly terrible to see an original creator languishing in obscurity while an idiotic copycat with none of their original genius strikes it rich, and we should use all the social mechanisms – including ridicule, perhaps especially ridicule, because those who can't even come up with their own ideas are very worthy of it – available to us to rectify such situations. We should make giving credit to original creators a strong social norm. But in the end, ideas are non-rivalrous. They can't be stolen, they can only be spread.

Already, I believe this to be a radically liberatory thing: the ability to share knowledge, ideas, discoveries, with anyone, infinitely – to spread them around, so that everyone has access to them, is a wonderful thing. Knowledge is power, as the old saying goes, and the freedom of ideas radically lowers the bar for accessing that power. The fact that a sufficiently-motivated person can get a college level education in anything through the internet, the fact that radical literature and ideas and history can spread through it, the fact that anyone can share their ideas and beliefs through it, these are incredible things.

I'm no idealist – material power is needed too – but at least we can have one world where there need be no push and pull, no worry about allocating resources, no necessity to divvy things up and decide who gets what and who doesn't. Everyone can have their fill of ideas, of knowledge, and there will be infinitely more to spare.

The noosphere has the best potential of any human realm to reach post-scarcity anarchy. Trying to bind this up, to turn ideas into property that only some can have and share, and then to use that monopoly on ideas to limit access to them, is to reproduce the hierarchies of the material world in the world of the mind – perhaps inevitable as long as we have hierarchies here in the physical world from whence the noosphere arises, but it is something that should be fought, rejected as a sad degradation of what the noosphere could be. Yes, a creator not getting the future benefits we would like them to get is horrible, and we should do something to rectify it, but destroying the radical power of a liberated noosphere is not the answer to that problem.

There is a catch to this, though. In order to share ideas, you have to transmit them somehow. That's nearly free in one-on-one conversations, but those are slow and exclusive – costly and scarce in their own way. Before the invention of writing, transmission meant standing on the street corner or in the town hall, spending actual hours of labor far in excess of a simple one-on-one conversation to reproduce the idea for people to hear, or teaching in schools, or preaching in places of worship, or being a wandering teacher of some kind. All of these require at least labor, and often physical material as well, paid with each marginal increase in the transmission of the idea. Moreover, actually turning the noosphere from a few shallow, disconnected tide pools at the edge of a vast sandy beach – kept apart by geography – into an interconnected network was vastly expensive, involving costly and time-consuming physical travel. Some would do this for free, realizing the potential of the noosphere in the truest form they could, but people have to eat, so often a price was asked for this dissemination of knowledge. Plus, the time, labor, and material costs involved kept the spread of the noosphere slow and difficult. Thus, for most of history, while the noosphere had the potential to be post-scarcity, in its practical application it was not.

Then, around 3400 B.C., came writing. Writing allowed someone to express, or even teach, an idea once; after that, all that needed to be done was to pass around the text. It radically reduced the costs of disseminating ideas, bringing the noosphere closer to its ideal. It still wasn't there yet, though: books could degrade over time through use, and if you've given a book to one person, another person can't have it. As a result, the dissemination of ideas was still limited, expensive, and rare, and thus ideas were de facto scarce. So more was needed.

The monastic institution of copying certain books en masse that arose around A.D. 517 was another improvement. While books had been copied ad hoc in earlier ages by those who had access to them and happened to want another copy, now books were intentionally copied many times over through a factory-like division of labor and rote performance of tasks. As a result the marginal cost of transmitting ideas became much lower, because the cost of creating a written representation that could transmit the same idea indefinitely, without further work by the author, was much lower, and such representations were more plentiful. Scriptoria created many copies with relatively little labor, and then each copy transmitted an idea many times with no extra work, and at the same time as the others. (We will see this recursive dissemination structure again later.) Nevertheless, not enough copies could be created by this method to bring down the price in labor, or the scarcity, by much, so focus was placed on making the copies beautiful through illumination, and they were preserved for a lucky few. Ideas were still scarce, even at this stage.

The natural extension of the scriptorium was the printing press, invented in 1455: now, the copying of books could be done by machine, instead of by hand. Infinitely faster and cheaper, suddenly knowledge could be spread far and wide for a relatively cheap sum. First books, then newspapers, then pamphlets and zines. As printing technology got more advanced and mass production printing was figured out, things got cheaper and cheaper. Now ideas could be disseminated for a few cents at most, and then the representation of those ideas was durable enough to be disseminated from there too. However, the longer and more complex the idea was, the more it cost, and if it was really long and complex and extensive, it could still be prohibitively expensive for other people. Additionally, it was impossible for the average person who got a representation of an idea to reproduce it further for others in a meaningful way – you can't perform mitosis on books. And getting ideas to widely different locations was still time consuming, expensive, and difficult. Ideas were still not yet free.

Then came 1989 and the World Wide Web, and with it a total paradigm shift. Whereas before, each individual transmission of an idea (in the case of teaching) or representation that could perform transmissions (in the case of books) cost labor, time, and/or material, now individual transmissions and representations of ideas, in the form of bits, were just as reproducible, just as non-rivalrous, as ideas themselves. Instead, the cost was in infrastructure, as well as in bandwidth: a mostly up-front, or fixed and recurring, cost for the capability to transmit, not for each transmission or reproduction itself, and one which scaled incredibly slowly with the number of representations of ideas disseminated, making individual ideas essentially free. The fundamental asymmetry between ideas and the representations needed to spread them was beginning to break down.

Even more game-changingly, even the bandwidth problem could be solved through the post-scarcity and non-rivalrous nature of the digital noosphere. Every download of information from one location creates a copy of it essentially for free (post-scarcity), and that can be done infinitely without losing the information (non-rivalrous), and furthermore each person who downloads information can themselves disseminate the information infinitely, and those people can in turn do so, recursively (unlike books). No one person needs to bear much of the cost at all for the total dissemination of an idea!

Another fundamental structural change to the noosphere brought by the World Wide Web was that geography suddenly mattered far less: once infrastructure was established between two locations, communication was nearly as cheap, and nearly as instantaneous, with someone across the globe as with someone next door, in comparison to the cost and time lag it had carried before. The noosphere was no longer a set of tide pools that a few brave crabs managed to scrabble out of and move between, but a vast information ocean.

Not only that, but the very ideas that could be disseminated changed: once enough bandwidth was built, audio and video could be disseminated, meaning better reproductions of ideas, and reproductions of ideas that would have been difficult to disseminate before. Still later, interactive information became possible, with things like Java Applets, Flash, and eventually JavaScript, making possible the better dissemination of knowledge and ideas through teaching programs and interactive notebooks, and the dissemination of still more novel ideas. Once, film, music, interactive teaching, and performance art were not ideas but concrete products, or performances – the World Wide Web made them part of the noosphere. Once, you could only get transcripts of a lecture, not see a great teacher performing it.

All this information could suddenly be created and shared much, much faster than before – almost instantly – allowing the dissemination of ideas in real-time, to individual audiences or broadcast to many, as cheaply and easily as the dissemination of any other idea. Actual discussions, with all the back and forth, the detail, the texture, and the interactivity of conversations could happen across the entire world, and be preserved for future generations to read.

Ideas could also be spread, and received, anonymously or pseudonymously, depending on your preferences. Social inequality, prejudice, bigotry, ostracism, mob bullying, and exclusion didn't disappear, but suddenly they depended on a person intentionally choosing to make aspects of themselves known and to maintain a consistent identity. They were still a problem, but one that was less baked into the system.

I cannot begin to overstate the revolutionary potential of the noosphere so liberated. It had the potential to be a world where the barrier to entry for obtaining and disseminating knowledge, ideas, and experiences was radically lowered, and the marginal cost nearly zero. Where people could freely communicate, think, learn, and share, become educated and empowered.

There were dark sides, of course. With that radically lowered barrier to entry, fascinating new ideas, creative works, remixes of existing ideas, and radical texts, that would not have been widely available, or available at all, became instantly and globally available for sharing; but so did an endless sea of garbage. With access to all that information, some could educate themselves, and some could find alternative facts and "do their own research."

Is trying to dictate who can share ideas, and what ideas can be shared, through centralized, bureaucratic, highly status-oriented, elite institutions really the right solution to those problems, though? Those who would find alternative facts and "do their own research" today would likely have been equally dismissive of what came out of centralized, "legitimate" institutions, equally likely to substitute their own beliefs for reality, to pick things up from hearsay and what their Uncle Bob told them while he was drunk at Thanksgiving and ranting about the out-of-towners. The things they pick up in the digitized noosphere are just cover for their existing predilections.

More, there's no reason to think that whatever institutions happen to have power and legitimacy in our society will always necessarily be systemically more factual, less propagandistic, less blinkered, and less manipulative – their failures will simply be in service of the status quo, and so less evident, and the status quo can change for the worse. In this historically contingent situation our institutions are better than much of what is shared in most of the noosphere, but relying on that always being the case is dangerous – and they're only better as far as we know. When will the next revolution in thinking happen? Where will it start?

Instead of trying to relieve people of the burden of thinking by filtering their information for them like a mother penguin chewing a baby's food before vomiting the sludge into its mouth, we need, systemically and societally, to get to people first, before their introduction into the wider noosphere, so we can provide them better tools and support networks for shouldering the responsibility of thinking for themselves. This should be the responsibility of parental figures and communities.

Finally, the radical, liberatory, empowering potential of the noosphere made free by the World Wide Web is, in my opinion, well worth having to try to figure out how to mitigate the risks.

The problem, however, is that the system is afraid of the noosphere. Thus it introduced the framework of intellectual property to pin it down, so that some could be given "exclusive rights" – monopolies – to certain ideas or pieces of knowledge. The system has always justified this in terms of protecting and benefiting the producers of ideas by giving them first-mover advantage, but the system always ultimately serves the interests of the rich and powerful. So as much as small artists may cling to the notion of copyright, for instance, should they ever have their work stolen by anyone, they won't have the money to go to court and actually do anything about it; meanwhile, the mega-corporations and mega-wealthy who run our society can steal with impunity, and there's nothing anyone can do about it, even as they crack down harshly on any imitation of and iteration on their own work. Even though imitation of and iteration on earlier work has been the lifeblood of art and science throughout history, the absurd logic of copyright has been expanded inexorably throughout modern Western history.

And this has been extended to the noosphere itself, smashing many of the radical, liberatory possibilities it held within it, leaving us with the worst of both worlds: much of the revolutionary potential of a digitized noosphere crushed under the weight of intellectual property while the mirror image dark consequences of the noosphere run totally unchecked, because it is not profitable to check them. In fact, the hate engagement is very lucrative.

It's worse than that, though: information wants to be free. Because digital representations of ideas can be copied and disseminated by default, extremely easily – copying is the very nature of how computers work and how networks transmit things – it isn't enough to lock the only copy of the representation of an idea in a lock box somewhere and only hand copies to those who pay, confident that they couldn't produce more representations to give away to all their neighbors for free, and that even if they did it would be easy to notice and shut down. Instead, an entire perpetually metastasizing surveillance and control system must be created to make sure copyright isn't violated – things like Denuvo and DRM – stealing trust and control from people to rub salt in the wound of the destroyed potential of a digital noosphere.

(Moreover, with the increasing migration of people away from decentralized services – because the cost of individual autonomy is personal responsibility and choice, and that is too high an asking-price for many part-time vacationers in the noosphere – centralized gatekeeping institutions for the dissemination of facts and information are being formed ad hoc and for profit, but that's out of scope for this essay.)

If we want to bring the noosphere to its full potential, we must put a stop to this, and that can only be done by recognizing some principles:

  1. Once you put an idea out in public – or its representation in digital form, since it is economically identical – you do not have the right to control what other people do with it, because what they do with it does not harm you or take anything away from you or require any marginal labor or time from you, and controlling what they do with it is domination of what they do with their own minds and bodies.
  2. Copying, imitation, and iteration on existing ideas is a fundamental part of knowledge production. Without the ability to pull from a commons of free flowing, up to date, interesting ideas, art and knowledge production will wither.
  3. Since the digital noosphere is a non-scarce economy where once one puts out an economic unit (in this case, an idea) it can be infinitely and freely shared with anyone, one cannot put a price on an idea, or the digital representation of an idea, itself. One can put a price on the performance of an idea, or a material representation, or on the production of further ideas that you might otherwise withhold, though.
  4. Copyright law has never, and will never, actually serve artists. It is a tool to be used against them, and for censorship.
  5. Anonymity is important and should be preserved as much as possible.
  6. Mirror things you like, to make bandwidth less of a cost in disseminating ideas.
  7. The digital noosphere must be seen as:
    1. a gift economy in the sharing and creation of new ideas: this means that ideas are shared freely in the expectation that improvements of them, iterations of them, or new ideas will be shared in return, and also in return for credit – which, while not a right, should be strongly encouraged by social norm – which can be transformed into reputation, and from there into material compensation, if needed, through things like Patreon and Ko-fi;
    2. and an economy centered around a huge shared commons of existing resources: this means that all shared ideas go into the commons, and, to protect this communal wealth from extraction and exploitation, where the communal labor and wealth is enjoyed but not contributed to, iterations and modifications of ideas from the commons must also be part of the commons.

These principles are why I license all of my work as Creative Commons Attribution-ShareAlike 4.0: such licenses are not legally enforceable, or should not be, but they represent an informal contract between me and my readers about what they can expect from me and what I would like to see from them. Attribution, and contribution of any derived work to the commons in the same manner that I contributed mine, are what I ask of them; in return I allow them to copy, redistribute, modify, and make derived works based on my work as much as they like. I know this won't make a change systemically – I don't know how we can, in the face of "those weary giants of flesh and steel" – but that's my small part to play.

I also don't think the right to restrict the use of your work once you've publicly released it should exist, so using a license that uses the copyright system against itself, to disable it by forcing any derived works to go into the commons – where they belong – seems ethical to me: I'm only restricting people's ability to dominate others through introducing IP, not to exercise autonomy. Don't confuse domination for an exercise of autonomy.

8. The intellectual property views of traditional artists in the age of NFTs and Generative "AI"

I recently came to a really interesting realization.

So, okay. We all remember the huge cultural phenomenon that was NFTs, that appeared for like a couple months and then immediately disappeared again, right?

What were NFTs exactly?

I'll tell you: they were a way of building a ledger that links specific "creative works" (JPEGs, in the original case, but theoretically others as well – and yes, most NFTs weren't exactly creative) to specific owners, in a way that was difficult to manipulate and easy to verify. Yes, it was implemented using blockchain technology, so that ledger was distributed and trustless and cryptographically verified and blah blah blah, but the core of it was establishing hard, verifiable ownership of a given person over a given piece of content, and preventing copying and tampering. It was an attempt to introduce the concepts and mechanics of physical asset ownership into the digital noosphere, to make it possible to own "digital assets."

The backlash against NFTs that I saw from indie artistic and progressive communities was centered on three fundamental observations:

  1. The concept of "theft" of a digital "asset" that you "own" is fundamentally absurd, because someone else creating a duplicate of some digital information that you "own" but publicly shared doesn't harm you in any way. It doesn't take away money or assets or access that you previously actually had, it doesn't involve breaking into something of yours, or damaging anything of yours, or threatening you.
  2. Physical-asset-like "ownership" of digital assets is not only also absurd, but completely impossible, because as soon as you publicly broadcast any digital asset, as many copies are made as people view your work. That's how broadcasting digital information works: it's copied to the viewers' computers – and from there all they need to do is "Right click, save as…" and then make as many copies as they want and distribute them themselves; and furthermore, any attempt to prevent this will always violate the freedom and privacy of everyone (see also: DRM).
  3. Treating infinitely copiable digital echoes, patterns of information stored as bits in a computer, as ownable assets introduces distorted, insane dynamics into the noosphere, because now you have market dynamics that aren't grounded in any kind of actual value or labor or rivalrous, scarce asset. And that's what we saw.

And what was the praxis, based on these critiques, against NFTs? Nothing less than widespread digital piracy. Not against corporations, but against individual artists. Now, you might dismiss this characterization, because that piracy wasn't technically illegal – as the right to own NFTs had not yet been codified into law – or because those artists were often grifters – incompetent, unoriginal, soulless techbros looking to make a quick cash grab – but the quality of a piece of art doesn't dictate whether it's a creative expression of some kind (we've all seen tons of incredibly lazy fanfic in our day, I'm sure), and the technical legality of what was done doesn't actually change the character of the action (if all IP were abolished tomorrow, I'm sure most indie artists would still insist on it, in the current cultural climate, but we're coming to that)!

So the response to NFTs was fundamentally just the idea that you can't own an image or other artistic work that is purely represented as digital information, because it's infinitely copyable and piracy is a thing – and because owning pieces of the digital noosphere is illegitimate and introduces all kinds of bad mechanics into the economy.

And I'm sure you all can see where I'm going with this now.

Because, now that GenAI is on the scene, what has become the constant refrain, the shrill rallying cry, of the indie artists (as well as the big megacorporations, funnily enough)? Nothing less than the precise inverse of what it was in the face of NFTs:

  1. Copying information – a digital "asset" of some creative work – is now theft, and causes real damage to those who've had it copied; they somehow lose something deeply important in the copying.
  2. We must rush to introduce centralized registries, or embedded metadata, about who owns what digital "asset," and rigorously enforce this ownership with controls on copying and viewing and usage, at whatever cost, through means like DRM.
  3. Treating infinitely copiable digital echoes as if they're ownable physical assets is not bad, but in fact important and necessary to save the economy, freedom, democracy, and artistic livelihoods!

Not only that, but suddenly piracy, especially piracy of an individual artist's work, is the highest crime imaginable. Look at how people are talking about Meta using libgen – a tool all of us use to pirate the works of individual artists every day, from what I can tell looking at online discussion in artistic and progressive circles – to get books to train Llama!

Suddenly, it feels as if every independent artist that hated NFTs when they came out would actually be a fan of them, if they'd been introduced by a different cultural subsection of the population (artistic advocates instead of cryptobros), if they'd been explained in different terms (in terms of "preventing exploitation of labor" and "worker ownership of the products of their labor" instead of in terms of capitalist property and financial assets), and if they'd arrived after the advent of generative AI.

What the fuck is going on here?

I think it's two things.

One, as much as we valorize independent artists and progressive activists as vanguards of morality and clear sightedness and human values, they're just humans like the rest of us, and ninety-nine percent of the time, their reactions to things are dictated by tribalism – if something is introduced to them by a side of the culture wars they don't like, it's bad; if it's introduced by a side they do like, it's good, and it's as simple as that. So since NFTs were introduced by cryptobros, they found whatever reasons they needed to say NFTs were bad, and when techbros (often former cryptobros) introduced GenAI, progressives and artists found whatever justification they needed to say GenAI was bad.

The other aspect, I think, is material interests. When NFTs originally came around, they were solving an economic problem no one had yet – needing to own digital assets to protect economic interests – so they were mostly peddled by grifters and scam artists, and they offered no material benefit to artists, while coming from a side of the culture war artists are rightly opposed to – so it was easy (if also, perhaps only incidentally, right) for artists to dismiss and make fun of them. But now that GenAI exists, the underlying goals of the NFT technology and movement, its underlying philosophy, actually do serve the economic interests of artists, so now they're embracing them, mostly without even realizing it. Basically, it's as simple as that: the economic interests of artists weren't in play before, so they were free to make fun of grifters and scam artists and play culture war with an easy target, but now that their economic interests are at stake, they've been forced to switch sides.

So it's not as if this shift is exactly irrational or nonsensical. It makes sense, and is even sympathetic, at a certain level. The point I'm trying to make here is that no matter how morally justified and principled the popular backlash against these things may seem, it fundamentally isn't. It's just base, selfish economic interest and culture-war tribalism all the way down. Artists are not the noble outsiders we make them out to be; they're just as much an economic class with a tendency toward broad, amoral backlashes to protect their interests as Rust Belt workers are. That doesn't mean individual views on the matter can't be nuanced and principled, or that you can't even find some way – although I don't see a convincing one – to thread the needle and condemn both NFTs and GenAI, but on a societal level, the public outcry is no more principled than an amoeba's reaction to negative stimuli.

9. Why am I being so mean to indie artists? Am I a tech bro?

To be perfectly clear, the purpose of this post, and all my other posts on this page expressing frustration at popular views concerning information ownership and "intellectual property," is not to punch down at independent artists and progressive activists. I care a lot about them, because I'm one, and I know many others; I'm deeply sympathetic to their values and goals and their need for a livelihood.

The reason I write so much about this topic, directed as often if not more so at independent artists than at corporations trying to enclose the commons, is that while I expect big corporations – whether Disney or OpenAI – to be unprincipled, to push for convenient laws and regulations that expand their property rights, introduce scarcity, and lock down free sharing, expression, and creation for their own material gain, I expect so much better from artists and activists. So it's deeply frustrating to see them fail, to see them fall back on justifications and calls to action that only help companies like Disney and Universal, which have been the enemies of a free culture and artistic expression for time immemorial – ideas which will only lend power to forces that have made, and with this legitimation will continue to make, the creative ecosystem worse, giving capital more control over ideas. It's not because I want to defend the big GenAI companies – the world would be better if they all died off tomorrow – but because I think there is something deeply valuable at stake if we have a public backlash against free information and open access, especially if that backlash also aligns with, and thus will be backed by, powerful lobbyists and corporations and politicians.

Not to mention the fact that none of this will achieve what they hope: if we force GenAI companies to train only on licensed data and creations, they won't just stop training on people's data and creations, nor will they pay individual artists. They'll offer huge, lucrative contracts to big content houses like Disney that already take ownership of all the work their artists do, and to every platform under the sun that artists use to distribute or even make their work. All those content houses will take the contracts, and the monetary incentive will push every platform and tool to require artists to sign over ownership of their work so that those platforms and tools, too, can take the contracts. In the end GenAI will end up with the same training data, but in a situation where we've now encoded hard-line ownership of rights over information and art, and no artist actually holds those rights – only capital does. Not to mention that the need for such lucrative contracts will make any truly open source development of AI, the kind that could take away the monopoly of companies like OpenAI, finally impossible, only solidifying their power.

10. Are LLMs inherently unethical?

In my view, large language models are just tools.

Just like all tools they can have interesting uses –

LLM agents; summarization, even in medical settings; named entity extraction; sentiment analysis and moderation to relieve the burden from people being traumatized by moderating huge social networks; a form of therapy for those who can't access, afford, or trust traditional therapy; grammar checking, like a better Grammarly; simple first-pass writing critique as a beta reader applying provided rubrics for judgement; text transformation, such as converting natural language to a JSON schema, a prerequisite for good human language interfaces with computers; internet research; vision; filling in parts of images; concept art; generating business memos and briefs; writing formal emails; getting the basic scaffolding of a legal document out before you check it; rubber duck debugging; brainstorming

– and really bad uses –

programming; search (without something like Perplexity); filling the internet with slop; running massive bot farms to manipulate political opinion on social media sites; creating nonconsensual deepfakes; shitty customer service; making insurance or hiring decisions; creating business plans.

They can also be used by bad actors towards disastrous ends even when they're being used for conceivably-good proximate purposes –

as an excuse to cut jobs, make people work faster, decrease the quality of the work, deskill people, and control people –

or positive ends – make people more productive, so they can work less, and endure less tedium, to produce the same amount, help disabled people, etc –

…just like any other tool.

But that's not how people approach it. Instead, they approach it as some kind of inherently irredeemable and monstrous ultimate evil that is literally destroying, and must destroy, everything, from human minds to education to democracy to the environment to labor rights. Anyone who has the temerity to have a nuanced view – to agree that the way capitalists are implementing and using LLMs is bad, but say that maybe some of the ethical arguments against them are unjustified, or maybe they have some uses that are worth the costs – is utterly dragged through the mud.

This behavior/rhetoric couldn't, I believe, be justified if it were just a response to the environmental impact of LLMs, or the way data labellers are exploited: the answer to that, as with anything else in our capitalist economy that's fine in concept but produced in an environmentally damaging or otherwise exploitative way – computers themselves, shoes, bananas, etc. – would be some combination of scaling back, internalizing externalities, and changing how it's implemented into something slower and more deliberate, all achieved through regulation or activism or collective action; not disavowing the technology altogether. (This is even assuming the environmental impact of LLMs is meaningful; I don't find it to be.)

In fact, all of the negative environmental pieces on LLMs (two representative examples: 1 and 2) fall afoul of a consistent series of distortions that to me indicate they aren't written in good faith – that, unconsciously, the reasoning is motivated by something else:

  • failure to provide any context in the form of the energy usage of actually comparable activities we already do and aren't having an environmental moral panic about, such as video streaming;
  • failure to take into account alternative methods of running large language models, such as local language models running on power efficient architectures like Apple Silicon;
  • the unjustified assumption that energy usage will continue to hockey-stick upward forever, ignoring the rise of efficiency techniques on both the hardware and software side such as mixture of experts, the Matryoshka Transformer architecture, quantization, prompt caching, speculative decoding, per-layer embedding, distillation to smaller model sizes, conditional parameter loading, and more;
  • failure to compare against the aggregate power usage of other widespread activities like computer gaming, since AI's power use may seem outsized only because of how centralized it is;
  • and more I can't think of right now.

It also can't be justified in response to the fact that LLMs might automate many jobs. The response to that is to fight to change who benefits from that automation, to change who controls it and how it is deployed, so it's used to make workers able to work less to produce the same amount (and get paid the same), or to allow them to produce more for the same amount of work (and thus get paid more), instead of being used to fire workers. Hell, even if that's impossible, we know how automation plays out for society in the long run: greater wealth and ease and productivity for everyone. Yes, there is an adjustment period where a lot of people lose their jobs – and you can't accuse me of being callous here, since I'm one of the people on the chopping block if this latest round of automation genuinely leads to long-term job loss in my trade – and we should do everything we can to support people materially, financially, emotionally, and educationally as that happens, and it would be better if it didn't have to happen. But again, if the concern were truly about lost interim jobs during a revolution in automation, the rhetoric wouldn't look like this, would it?

Fundamentally, I think the core of the hatred for LLMs stems from something deeper. As this smug anti-LLM screed states very clearly, the core reason the anti-LLM crowd views LLMs the way it does – as inherently evil – is that they've bought fully into a narrow-minded, almost symbolic-capitalist mentality. If and only if you genuinely believe that something can only be created through what you believe to be exploitation would it be justified to act the way these people do.

Thus, while I wish anti-LLM people's beliefs were such that discussing LLMs "on the merits" – how to scale them back, or make them more efficient, or use them wisely – was something they could come to the table on, their moral system is such that they are forced to believe LLMs are inherently evil, because they require mass copyright "theft" and "plagiarism" – i.e., they're fully bought into IP.

Because yeah, in theory you could make a copyright-violation-free LLM, but it'd inherently be a whole lot less useful – in my opinion probably not even useful enough to break even on the time and energy it'd cost – because machine learning doesn't extrapolate from what it's learned to new things the way human minds do. It just interpolates between things it's already learned – I like the term "imitative intelligence" for what it does – so if it doesn't have a particular reasoning pattern or type of problem or piece of common sense or whatever feature in its dataset, it can't do it, or tasks like it, or tasks involving pieces of it, very well. Now, it learns extremely abstract, very much semantic "linguistic laws of motion" about those things, so it isn't "plagiarizing," but the need for a large amount of very diverse data is inherent to the system. That's why large language models only began to come to fruition once the internet matured: the collective noosphere was a prerequisite for creating intelligences that could imitate us.

So, if anti-LLM advocates view enjoying or using something they've already created, that they bear no cost for the further use of, that they publicly released, as "exploitation," simply because someone got something out of their labor and didn't pay rent to them for that public good (the classic "pay me for walking past my bakery and smelling the bread"), then like… yeah. LLMs are exploitative.

Personally, it just so happens that I do not give a flying fuck about IP and never did – in fact I hate it, even when artists play the "I'm just a little guy" card. It is not necessary to make a living creating "intellectual property," and it only serves to prop up a system that furthers the invasion of our rights and privacy and control over our own property, as well as the encroachment of private owners – whether individual or corporate – into the noosphere, and fosters territorial, tribalistic approaches to ideas and expressions. Sell copies of your work only as physical items, or special physical editions that someone might want to buy even if they have a digital copy, to pay for your work. Or set up a Patreon, so that people who appreciate your work and want you to continue doing it can support you in that, or do commissions, where, like Patreon, payment is for the performance of labor to create something new, instead of raking in money for something you already did.

I really don't believe that if you make information or ideas available to the world, you get to dictate what someone else does with them after that point; I think the idea of closing off your work, saying "no, scientists and engineers can't make something new/interesting/useful out of the collective stuff humanity has produced, because they're not paying me for the privilege, even though it costs me nothing and requires no new labor from me", while understandable from the standpoint of fear about joblessness and lack of income under capitalism, is a fundamentally stupid and honestly kind of gross view to hold in general.

But that's what they hold to, and from that perspective, LLMs truly can't really be neutral.

Update:

  1. One of the requirements for "ethical" AI in that smug anti-AI screed is this:

    To satisfy attribution and other prominent notice requirements in an ethical, respectful way, the model must also reference the sources it used for any particular output, and the licenses of those. You cannot claim to respect the license of your sources without respecting the attribution and prominent notice requirements. You cannot make your model practically usable without listing the sources for any given output.

    As it turns out, this is precisely what Google's AIs do:

    This field may be populated with recitation information for any text included in the content. These are passages that are "recited" from copyrighted material in the foundational LLM's training data.

    Basically, it seems that any model output that is substantially similar to something from its training data (which Google also keeps on its servers) and which is over a certain length is automatically flagged at the API level, and a citation to the original source material is added to the result of any batch request. It is even able to put license data directly in the citation object when it could be automatically retrieved from the source (such as with GitHub), but since it provides an original URI, anyone who's curious should be able to find out the license of the original work themselves. Moreover, it provides the exact portions of the output that are or may be recitations. The accuracy of the system, from my testing, also seems to indicate this is done at the API level, not just by asking the model and hoping for something that isn't hallucinated – and that would make sense, since flexible textual search through gigantic databases of text is Google's specialty, and a well-understood computational problem. There's no way to turn this off, either. So once again (as with Gary Marcus complaining that AIs don't do search to ground their facts, when they actually do unless you manually turn it off), this is a case of anti-AI activists being out of date on the facts, usually because they're willfully and proudly ignorant.

  2. I was also possibly wrong about something: there is some preliminary research suggesting that having the web crawlers that collect training data for large language models respect web-crawling opt-outs does not significantly decrease the performance of the resulting model, only its knowledge base on specific topics or fields; and since, IMO, we should not rely on models' intrinsic knowledge for any specific topic anyway, relying instead on web search/grounding/RAG/Agentic RAG/ReAct, that doesn't seem like a huge sacrifice to me. Of course, the problem is that this experiment assumes that, should web crawlers start respecting these opt-outs, most people still wouldn't put them up; if nearly everyone did, model output really would be damaged. I think the better answer to the problem of bots swamping smaller websites is to have a common nonprofit foundation that collects training data for all AI companies, which then clean and refine it in their own ways, so that only one set of bots needs to do the collection. They could also make their bots more respectful of resources in other ways (like cloning, instead of crawling, git repos). Honoring these opt-outs is also mechanically trivial; see the sketch below.
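
As an illustration of how little it takes to honor such opt-outs, here is a minimal sketch, using only Python's standard library, of a collection bot checking a site's robots.txt before fetching a page. The bot name and URLs are hypothetical placeholders; a real crawler would add caching of robots.txt, rate limiting, and error handling on top of this.

```python
# A minimal sketch of a training-data crawler honoring robots.txt opt-outs,
# using only the Python standard library. The user-agent string and URLs
# are hypothetical placeholders.
from urllib import robotparser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

USER_AGENT = "ExampleTrainingDataBot/0.1"  # hypothetical bot name

def fetch_if_allowed(url: str) -> bytes | None:
    """Fetch a page only if the site's robots.txt permits this user agent."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # download and parse the site's opt-out rules
    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site opted out; respect that and move on
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read()

page = fetch_if_allowed("https://example.org/essays/noosphere.html")
print("skipped (opted out)" if page is None else f"fetched {len(page)} bytes")
```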

11. TODO Analytic philosophy argument for philosophical egoism

12. TODO My system of nihilist ethics

13. TODO The problem with utilitarianism

14. TODO Perspectivist epistemology is not epistemic relativism

15. TODO Weberian disenchantment is a spook

16. TODO In defense of parsimony

17. TODO Radical elitist egalitarianism

18. How to do a revolution

And tonight, when I dream it will be
That the junkies spent all the drug money on
Community gardens and collective housing

And the punk kids who moved in the ghetto
Have started meeting their neighbors besides the angry ones
With the yards
That their friends and their dogs have been puking and shitting on

And the anarchists have started
Filling potholes, collecting garbage
To prove we don't need governments to do these things
And I'll wake up, burning Time's Square as we sing
"Throw your hands in the air 'cause property is robbery!"

– Proudhon in Manhattan, Wingnut Dishwashers Union

Many leftists seem to have this idea that there will be one glorious moment, a flash in the pan, where we have a Revolution, and the old system is overturned so that we can construct a new system in its place. Some believe we can't do anything but wait, study, and "raise consciousness" until then, while others try to take useful, but limited, action of some kind in the meantime, like fighting back against fascism or various other forms of activism.

The problem with this idea is that, as flawed as our current system is, many people depend on it, often desperately and intimately and very legitimately, with no clear way to do without it. Yes, the needs served by the system could be provided for in other ways; if that weren't possible, then overturning the system would be wrong. However, the presence of the system, providing for those needs, often explicitly shutting out and excluding other means of providing for them, and propagandizing us against even thinking of still other means, has ensured that the new systems we envision are not in place, and our muscles for constructing them are atrophied.

Thus, if the system were to be overturned overnight in some glorious Revolution, there would not be celebration in the streets, there would not be bacchanals in honor of the revolutionaries. There would be chaos and destitution, the weeping and gnashing of teeth, the wearing of sackcloth and ashes, even as the glorious Marxist-Leninist-Maoists scolded those mourning for mourning an exploitative system.

What can we do, then? This system must be overturned – or, at least, we must struggle toward that end – so how do we avoid this outcome?

The key is to build our own system in the interstices of the old one. Each of us must go out and try to create some part of the system we would like to see, according to our expertise – if you're a doctor, practice medicine for your friends outside the traditional healthcare system, inasmuch as you can; if you're a programmer like me, build systems for your friends to use that exist outside the software-industrial complex; if you're an academic, steal the papers and ideas you're exposed to and make them available for others, give impromptu classes; no matter who you are, take part in making and distributing food and resources if you can, however you can; take part in skill-shares; call each other instead of the police and mediate your own disputes; protect each other – perhaps institute a rotating duty of protection for group events; in short: do what you can, according to your skills and abilities, to provide for those immediately around you, an alternative to the system. Don't just "organize" for activism or to fight fascists. Organize to actually provide useful services. Organize to fill potholes!1

The next step is to slowly grow whatever practice or service or community event you've started so it can serve more people, and so that more people can get involved and help. Do this according to whatever ideas about organization you have – I'm not here to talk about that component of it. But the important part is to do it. Don't focus on growth at all costs; make sure to maintain the original spirit, purpose, and values of the thing; don't let legibility, acceptability, and so on corrupt what it is; and don't let it grow beyond whatever its natural size is. But let it grow. And when it reaches the point past which you don't think it should grow anymore, try to encourage the creation of similar systems, the following of similar practices, in other places far and wide, on the basis of your practical experience. Maybe, if you can afford it, travel, and plant those seeds yourself. Then network those growing trees together, so that they can aid each other in times of need.

Remember, the point is to provide things people need. Not to grow for its own sake. Not to "do leftism" – so it shouldn't even be overtly ideological, or overtly targeted at leftists, or anything like that, and it should especially not exist purely in political domains, to fight political battles – but to do something people need done.

If we do this, then if the system is ever toppled, we'll be ready: we'll have built things that actually have a shot at taking over from the old system and providing for people. There will be horrible growing pains to be sure – shortages, bad organization, unprepared networks, what have you – but at least there will be something there. More, we'll have practiced, grown experienced, actually learned how to be adults and do the things we wanted to take over from the system, instead of just demanding they be done but never learning how to do them. Even better, we'll have had time to experiment with all the different ideas and ideologies around organizing, and figured out which ones work and which don't, which are more successful, and which aren't.

In fact, if we do this right, there may not even be a need for us to initiate a "Revolution" against the system. In my ideal vision of a "revolution" against the system, we just continue building our alternatives, providing for more and more people, and in the process purchasing investment and buy-in from them in our ideas and in our systems and networks and organization, building good will and loyalty with them, until finally our alternative systems threaten the system as it exists enough – as the Black Panthers did – that the system descends upon us to throttle us. And maybe, hopefully, we'll be strong and numerous and self-sufficient enough to resist, and have enough love and good will and investment, from all the people we help, that we'll be able to make it a public relations disaster for the powers that be to grind us beneath their heel, and they'll be forced to withdraw and let us live our new, free lives in peace.

And hey, if the revolution doesn't work out? At least we helped some people.

19. AI enables privacy laundering

I think this video is really emblematic of a serious problem that we are going to have as a society in the future: privacy laundering by means of AI.

They say at the beginning of the video that they have a rule at Corridor that they don't record people without their knowledge and consent. However, they have a goal they want to achieve that surveillance will make significantly easier, so they have a motivation to come up with a rationalization for that surveillance, and AI offers the perfect opportunity for that: they convince themselves that just because they have the AI look at that non-consensual surveillance footage and then answer questions about it, instead of them directly looking at the footage, that it's somehow better.

It isn't.

The AI narrates specific details about the footage, including identifying characteristics of individuals; they're getting everything they would have gotten from the footage anyway, just with the AI as a middleman.

Maybe, being generous and assuming they only ask specific questions, instead of general ones like "what can you see?" or "what happens in this video?", the range of information they can access is slightly more limited: they can only get responses to specific questions, so they can't learn things they wouldn't think to ask about. Even so, this is still meaningfully non-consensual surveillance, and the fact that there's an AI intermediary makes no material difference to the moral and practical implications involved.

We see this same logic, more worryingly, in various government regulatory proposals for client-side scanning, including the "Online Safety Act" from the UK, which passed, and the thankfully rejected "Chat Control 2.0" EU proposal and Australian "online safety standards" (coverage of its modification here). The idea here is the same: just because a human isn't directly looking at the raw data, it's supposed to count as private – even though the AI that's doing the scanning of the actual data is controlled by the human doing the querying, so it could be expanded to look for anything, and the humans looking at the AI reports still get a ton of data about users, most of it not illegal at all, but falsely flagged.

20. Technology is an engine of possibility

I won't deny that technologies have

21. Empires of AI by Karen Hao thoughts

Some thoughts on the book as I go through it.

21.1. Chapters 1-3

  • It's unsurprising that Sam Altman is such a smarmy power-hungry sociopath.
  • It's interesting how they do this stupid XKCD #605 linear extrapolation of computing needs, and instead of looking into all the ways they could try to reduce their computational requirements, and making it their mission to decrease those requirements — since the human brain, the most intelligent thing we know of, requires very little power, so surely part of the route to practicable "AGI" would be low power requirements? and even if we require more and more compute to make computers more intelligent, reducing those requirements by a large base factor and making them scale less surely opens up a lot more headroom to develop and improve, so wouldn't that be the obvious option? — they decide this means they need to immediately sell out their basic principles and cozy up to a megacorp.
  • In general, the early parts of the book are just a story of OpenAI one by one selling out its principles in exchange for convenience and power.

21.2. Chapter 4, "Dreams of Modernity"

21.2.1. General disagreements

This is where Hao and I really start running into disagreements. Let me quote from her at length:

In their book Power and Progress, MIT economists and Nobel laureates Daron Acemoglu and Simon Johnson argue that […] [a]fter analyzing one thousand years of technology history, the authors conclude that technologies are not inevitable. The ability to advance them is driven by a collective belief that they are worth advancing.

This seems confused to me. The fact that technologies depend on those working on them, and those allocating resources to them, believing they are worthwhile makes technological development contingent, but it does not make it "not inevitable" in the sense of preventable; there are minds at work constantly coming up with new ideas about what to build and how to do it — unless you or people you agree with have hegemony over the beliefs and resources of a whole society strong enough to smother any work you disagree with in the cradle, you can't control the course of technological development in a meaningful way, only which technologies are applied and used.

Ideas are also unkillable: if an idea is quashed in one place by unwillingness to invest in it, it will just pop up somewhere else, and as long as the idea for a technology exists, it is hyperstitional: it will tend to bring itself into existence, because every technology has, baked into it, the appeal of what it can do, the possible worlds it can bring into existence.

Nor do I think we should want to be able to control scientific and technological inquiry in this way: to hegemonically control what ideas get funding, what get explored, and control information about these experiments such that other people can't repeat them, to prevent their uncontrollable spread into broader society. Isn't that precisely the problem with OpenAI? How secretive they are, concealing their work and research from others, in the name of "safety"?

The irony is that for this very reason, new technologies rarely default to bringing widespread prosperity, the authors continue. Those who successfully rally for a technology’s creation are those who have the power and resources to do the rallying.

So they're even less preventable, then, from the perspective of an average citizen.

As they turn their ideas into reality, the vision they impose—of what the technology is and whom it can benefit—is thus the vision of a narrow elite, imbued with all their blind spots and self-serving philosophies. Only through cataclysmic shifts in society or powerful organized resistance can a technology transform from enriching the few to lifting the many. […]

One thing to note is that technology itself has the potential to help precipitate these cataclysmic shifts. But also, surely, the answer then is organized, powerful resistance to the elites who wield that technology, to ensure that it is used to the benefit of all? Yet, given the talk about how "not inevitable" technologies are, and how she discusses AI as a field and neural networks in particular later on, that doesn't seem to be her answer…

[…]

The name artificial intelligence was thus a marketing tool from the very beginning, the promise of what the technology could bring embedded within it. Intelligence sounds inherently good and desirable, sophisticated and impressive; something that society would certainly want more of; something that should deliver universal benefit. […]

Cade Metz, a longtime chronicler of AI, calls this rebranding the original sin of the field: So much of the hype and peril that now surround the technology flow from McCarthy’s fateful decision to hitch it to this alluring yet elusive concept of “intelligence.” The term lends itself to casual anthropomorphizing and breathless exaggerations about the technology’s capabilities.

[…]

That tradition of anthropomorphizing continues to this day, aided by Hollywood tales combining the idea of “AI” with age-old depictions of human-made creations suddenly waking up. AI developers speak often about how their software “learns,” “reads,” or “creates” just like humans.

Yeah, this is a danger. The hype and anthropomorphization of AI is significantly detrimental to the field and, now, society as a whole.

Not only has this fed into a sense that current AI technologies are far more capable than they are, it has become a rhetorical tool for companies to avoid legal responsibility. Several artists and writers have sued AI developers for violating copyright laws by using their creative work—without their consent and without compensating them—to train AI systems. Developers have argued that doing so falls under fair use because it is no different from a human being “inspired” by others’ work.

The problem is that in this case, what it is doing is much more analogous to human learning or inspiration than anything else: it is looking at thousands and thousands of examples and extracting high-level abstract patterns and correlations that it finds in the data without — for the most part — actually storing any specific images. Of course, sometimes it can also store specific images if there are too many examples of one in the data set, but this does not break the symmetry, because if a human studies too many copies of the exact same thing, they, too, will be able to recite it from memory when explicitly prompted to do so (and all instances of recitation in this way from large language models and stable diffusion image generators must be explicitly prompted out this way, usually with the first part of the text). Certainly more than the mental model of "storing cut up pieces of images in its network and rearranging them" that the lawsuit offered!

The fear of superintelligence is predicated on the idea that AI could somehow rise above us in the special quality that has made humans the planet’s superior species for tens of thousands of years

While I find this highly unlikely, and panicking over it the way Musk does monumentally silly and detached from real world concerns, as I'm sure Hao does also, I also think it's not metaphysically impossible, and, in fact, definitionally possible, although not likely. Anything else is pure spiritualism.

Artificial intelligence as a name also forged the field’s own conceptions about what it was actually doing. Before, scientists were merely building machines to automate calculations […] Now, scientists were re-creating intelligence—an idea that would define the field’s measures of progress and would decades later birth OpenAI’s own ambitions.

But the central problem is that there is no scientifically agreed-upon definition of intelligence. Throughout history, neuroscientists, biologists, and psychologists have all come up with varying explanations for what it is and why it seems that humans have more of it than any other species

Keep this statement in the back of your mind, she'll expand on why this is important in a second. In the meantime:

[…] In the early 1800s, American craniologist Samuel Morton quite literally measured the size of human skulls in an attempt to justify the racist belief that white people, whose skulls he found were on average larger, had superior intelligence to Black people. Later generations of scientists found that Morton had fudged his numbers to fit his preconceived beliefs, and his data showed no significant differences between races. IQ tests similarly began as a means to weed out the “feebleminded” in society and to justify eugenics policies through scientific “objectivity.” […]

Yes, attempts to measure intelligence have always been marred by their manipulation by power structures in the service of reinforcing those power structures. However:

  1. The two primary problems with intelligence measurement techniques, as the book The Mismeasure of Man by Stephen Jay Gould points out, were first that they assumed intelligence was innate and heritable, and second that they assumed it was easily conceptualized as a single magical quantity. If anything, the entire AI field (at least post Connectionism's victory) exists against those fundamental notions.
  2. The fact that attempts to measure such things have been tainted by bias does not make them invalid. All scientific inquiry is tainted by bias, as Gould himself says — the key is to acknowledge it and try to adjust for it, not to pretend you can eliminate it or just not study things. Yes, there are some fields of study that can only really be used for surveillance, classification, making people legible to power, and so on, such as the research trying to predict sexuality from facial structure or gender based on brain structure, but I don't think trying to understand what it means to be intelligent and how it works is like that. Not just because it can help us build computers that can do things for us, but also because it could help us foster intelligence in people, since it is most likely, at least to some degree, learnable, or at least promotable through early childhood. That was the original intention of IQ tests!
  3. None of these measurement techniques, nor anything like them, is actually necessary for the field of AI. What the field of AI uses are benchmarks that show whether various algorithms can perform very specific tasks to a certain degree, and the question is whether they can or not. The attempt is just to make something that can perform all of these tasks, and comparably to a human — then they'll have AGI, because it can work like a human, independent of what intelligence "really" is. Again, if anything, as we shall see, it is Karen Hao herself who, to see the field of AI as "legitimate", would want a unitary measure (and definition) of intelligence that is quantifiable by tests!

[…] More recent standardized tests, such as the SAT, have shown high sensitivity to a test taker’s socioeconomic background, suggesting that they may measure access to resources and education rather than some inherent ability.

The quest to build artificial intelligence does not have to assume intelligence is an inherent, non-trainable ability that can't be affected by things like access to resources and education. In fact, somewhat precisely the opposite, at least for Connectionists.

As a result, the field of AI has gravitated toward measuring its progress against human capabilities. Human skills and aptitudes have become the blueprint for organizing research. Computer vision seeks to re-create our sight; natural language processing and generation, our ability to read and write; speech recognition and synthesis, our ability to hear and speak; and image and video generation, our creativity and imagination. As software for each of these capabilities has advanced, researchers have subsequently sought to combine them into so-called multimodal systems—systems that can “see” and “speak,” “hear” and “read.” That the technology is now threatening to replace large swaths of human workers is not by accident but by design.

And weren't humans previously the only beings capable of calculation and information manipulation, so that making computers at all is an exercise in making machines that are able to do things humans could do? Didn't that replace vast swaths of workers, so much so that even their name ("computers") has now become so synonymous with the machines that replaced them that referring to them by the term that was originally used to refer to them now seems like an oxymoron?

Is not all technology the creation of tools that help humans do things humans can already do, faster and more easily, by replacing the human means of doing it with a machine's means — such as a backhoe — so that we can create more wealth and plenty with less work? The phrase "threatening to replace large swaths of human workers is not by accident but by design" makes it seem like this is bad or unusual, but it is not. All technology since the dawn of time is a "labor saving device," the purpose of which is to make it possible to do more with less human labor, and as a result to need fewer people for a task, thus replacing them. The point is that, when properly managed, this does not have to become a crisis — it can become an opportunity for greater leisure and abundance. For instance, once, most human beings had to engage in subsistence farming. Now, by replacing most farmers, they don't. Trying to paint this process as unusual is special pleading, and trying to paint it as an inherently bad thing is myopic about the possibilities of the future. This is a perfect example of the reactionary left-Canutism that Mark Fisher talks about in essays such as Notes on Accelerationism, Postcapitalist Desire, and others, and which is part of what Nick Land refers to when he talks about "transcendental miserablism": thinking only of the past that was better before something happened that leftists wish they could undo, instead of the future that could be better still, and how we can fight like hell to get there.

Still, the quest for artificial intelligence remains unmoored. With every new milestone in AI research, fierce debates follow about whether it represents the re-creation of true intelligence or a pale imitation. To distinguish between the two, artificial general intelligence has become the new term of art to refer to the real deal. This latest rebranding hasn’t changed the fact that there is not yet a clear way to mark progress or determine when the field will have succeeded. It’s a common saying among researchers that what is considered AI today will no longer be AI tomorrow. […] Through decades of research, the definition of AI has changed as benchmarks have evolved, been rewritten, and been discarded. The goalposts for AI development are forever shifting and, as the research director at Data & Society Jenna Burrell once described it, an “ever-receding horizon of the future.” The technology’s advancement is headed toward an unknown objective, with no foreseeable end in sight.

The assumption of this paragraph is that this is a bad thing: that all ongoing fields of research — which are regularly churning out novel technologies that are very useful (yes, useful mostly to those in power currently, but not inherently) — must have some kind of pre-defined end-point goal after which they will stop, and some kind of quantitative metric by which they can measure their progress to that single, well defined goal. That is an absurd, anti-science proposition. The entire idea of having a field of research is precisely to explore open-ended things without needing to work toward a specific product or artifact or meet performance reviews. This is, I hope, a standard Hao would not apply to any other field.

21.2.2. The Symbolism vs Connectionism debate as filtered by Hao

At this point in the story, the history of AI is often told as the triumph of scientific merit over politics. Minsky may have used his stature and platform to quash connectionism, but the strengths of the idea itself eventually allowed it to rise to the top and take its rightful place as the bedrock of the modern AI revolution. […]

In this telling of the story, the lesson to be learned is this: Science is a messy process, but ultimately the best ideas will rise despite even the loudest detractors. Implicit within the narrative is another message: Technology advances with the inevitable march of progress.

But there is a different way to view this history. Connectionism rose to overshadow symbolism not just for its scientific merit. It also won over the backing of deep-pocketed funders due to key advantages that appealed to those funders’ business interests.

[…]

The strength of symbolic AI is in the explicit encoding of information and their relationships into the system, allowing it to retrieve accurate answers and perform reasoning, a feature of human intelligence seen as critical to its replication. […] The weakness of symbolism, on the other hand, has been to its detriment: Time and again its commercialization has proven slow, expensive, and unpredictable. After debuting Watson on late-night TV, IBM discovered that getting the system to produce the kinds of results that customers would actually pay for, such as answering medical rather than trivia questions, could take years of up-front investment without clarity on when the company would see returns. IBM called it quits after burning more than $4 billion with no end in sight and sold Watson Health for a quarter of that amount in 2022.

Neural networks, meanwhile, come with a different trade-off. […] one area where deep learning models really shine is how easy it is to commercialize them. You do not need perfectly accurate systems with reasoning capabilities to turn a handsome profit. Strong statistical pattern-matching and prediction go a long way in solving financially lucrative problems. The path to reaping a return, despite similarly expensive upfront investment, is also short and predictable, well suited to corporate planning cycles and the pace of quarterly earnings. Even better that such models can be spun up for a range of contexts without specialized domain knowledge, fitting for a tech giant’s expansive ambitions. Not to mention that deep learning affords the greatest competitive advantage to players with the most data.

This is incredibly disingenuous and reductive, holy shit. Holy shit. Holy fucking shit. What the fuck.

Some interesting things to note before I jump in to my main critique:

  • "IBM called it quits after burning more than $4 billion with no end in sight and sold Watson Health for a quarter of that amount in 2022." does not sound like a technology that would've been pursued instead of connectionism even in the absence of commercial pressure.
  • It's interesting how Hao always puts "learn," "read," and even "see" in quotes for machine learning models, but does not put reasoning in quotes when referring to a symbolic AI model.

Okay, on to the main critique:

The reason symbolic AI lost out was not that it's too up-front risky and expensive for commercial interests or some bullshit. It's that we fundamentally don't know how to encode knowledge this way naturally, because symbolic propositional logic is just not how the human mind actually works — assuming that it does, and that this is how you actually achieve intelligence, is, I would think, the exact kind of "white western logocentric" attitude I would expect Hao to decry! Human beings identify things, assign meaning to concepts, apply and adhere to rules, all on the basis of implicit, fuzzy, heuristic, and nondeterministic pattern matching.

Yes, we have plenty of technological and organizational and metacognitive ways of adapting to and compensating for that, and we can go back and try to explicitly encode our categories and rules and knowledge — but as we've seen throughout the history of philosophy, trying to encode the core of our knowledge, reasoning, and recognition processes in purely symbolic terms, even heuristic ones, accurately and with general applicability is almost impossible. That's why Wittgenstein introduced the concept of "family resemblance" and the poststructuralists and existentialists attacked essentialism in the first place — because it's a bad model of how we do these things!

More than that, it's also a bad model of how to do these things. Heuristic, pattern-based implicit learning is also our advantage: it's what allows us to be so flexible when presented with new situations, new problems, or noisy data and confusion. We want systems with those properties.

Meanwhile, symbolic systems require everything to be encoded explicitly, cleanly, and absolutely, with all assumptions, from the high-level ones relevant to a particular domain right on down to the simplest and most obvious implicit ones, specified in like manner. It's not just that it's economically inefficient and up-front risky for a corporation; it's that it's useless to anyone, because you don't even get something that mostly works until you've specified everything perfectly, and we don't even know how to perform that task well in the first place! And every single time we've tried, we've failed — often producing systems that hallucinate as much as LLMs do, because the complex web of rules and systems that make them up has, too, escaped human scale and control.

The idea that "only corporations interested in profit" would be interested in a route that lets you achieve large up front useful successes rapidly, instead of one that delays its success indefinitely into the future while being a massive resource and time sink and and is not even slightly useful in the meantime, is fucking ludicrus. Symbolic AI was largely a dead-end, and pretending only corporations would think that is just… stupid. Like, let me quote her again:

You do not need perfectly accurate systems with reasoning capabilities to turn a handsome profit. Strong statistical pattern-matching and prediction go a long way in solving financially lucrative problems. The path to reaping a return, despite similarly expensive upfront investment, is also short and predictable, well suited to corporate planning cycles and the pace of quarterly earnings. Even better that such models can be spun up for a range of contexts without specialized domain knowledge, fitting for a tech giant’s expansive ambitions. Not to mention that deep learning affords the greatest competitive advantage to players with the most data.

You also don't need perfectly accurate systems with reasoning capabilities for them to be useful, helpful, even revolutionary, and for them to enable new routes of inquiry! Strong statistical pattern-matching and prediction also go a long way toward solving problems in general, and perhaps especially scientific ones! It's not that the path to a return is "short and predictable"; it's that there's one guaranteed to be there at all, along with a clear intermediate set of useful results. The fact that you don't need huge amounts of specialized domain knowledge is also a huge boon, since it's hard to acquire and operationalize that knowledge; likewise, with the advent of the internet, everyone already has access to insane amounts of data. Why not apply a method that can use it? Thus, all these things she makes sound like they're only good for corporations actually make connectionism better in general! She's just framing all these benefits as only good for corporations to make them sound bad — to poison the well.

In fact, isn't that Karen's entire problem with OpenAI? That they're investing massive resources in something — achieving AGI — that's not producing enough knock-on benefits (in her mind) and has no clear route to actual success and usefulness?

I can even imagine an alternative world where it was the Cyc project that Karen was profiling. She would complain about its white western logocentric ideas of knowledge, its inefficiency and lack of any useful results in the meantime, the fact that no symbolic AI project had ever succeeded at creating a widely useful product. And that'd be okay if she was equally criticizing both sides — although I'd disagree with her — but she's not: she's applying a double standard relying on dubious technical arguments to one part of the field and not the other, simply because it happens to be ascendant now. This is not principled criticism, this is disingenuous powerphobia.

A relevant quote from The Bitter Lesson:

We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that

  1. AI researchers have often tried to build knowledge into their agents,
  2. this always helps in the short term, and is personally satisfying to the researcher, but
  3. in the long run it plateaus and even inhibits further progress, and
  4. breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

As this essay points out, not only does connectionism — as Hao admits — have much better return on investment (whether that's financial investment from a corporation, or time investment of scientists and resource investment from society), but many times when it's been put up against symbolism in a head-to-head contest where both are applicable, symbolism has just lost fair and square, unable to actually do anything nearly as well.

Note that, as The Bitter Lesson points out, search, the core method of symbolic AI, is itself a computation-hungry, highly parallel task, used to brute-force reasoning through things like backtracking, forward chaining, and even exhaustive exploration of the entire outcome space.
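
To make concrete what brute-forcing reasoning by search looks like in practice, here is a minimal sketch (in Python) of a depth-first backtracking solver for a toy map-coloring constraint problem; the regions, colors, and rule are entirely my own illustration, not anything from the book or the essay:

  # Toy symbolic "reasoning as search": assign colors to regions so that no
  # two neighboring regions match, by exhaustively trying options and
  # backtracking out of dead ends.
  NEIGHBORS = {
      "A": ["B", "C"],
      "B": ["A", "C", "D"],
      "C": ["A", "B", "D"],
      "D": ["B", "C"],
  }
  COLORS = ["red", "green", "blue"]

  def consistent(region, color, assignment):
      # The explicit symbolic rule: a region may not share a color with any
      # already-colored neighbor.
      return all(assignment.get(n) != color for n in NEIGHBORS[region])

  def backtrack(assignment, remaining):
      if not remaining:
          return assignment                      # all regions colored: solved
      region, rest = remaining[0], remaining[1:]
      for color in COLORS:                       # enumerate the option space
          if consistent(region, color, assignment):
              result = backtrack({**assignment, region: color}, rest)
              if result is not None:
                  return result                  # this branch panned out
      return None                                # dead end: back up and try another branch

  print(backtrack({}, list(NEIGHBORS)))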

The way she twists the obvious general benefits of connectionism over symbolism, and ignores the obvious downsides of symbolism, indicates to me that she's actually incapable of rationally assessing the relative merits of these things because her morals have blinded her. She could've just left this section of the book out if she really didn't care about the actual merits, and said, "I don't care if it's a better method; here's what it's doing that's bad, and that's what matters." That would have been fine, and the fact that she allowed her moral stances to blind her to the actualities of the technologies this way indicates that's really what's going on under the hood; but instead, in order to appear thorough and objective and rational, she had to do this dumb critical theory thing and prove why connectionism is only good for corporate interests or something. It's frustrating.

There's a general theme here of leftists sidestepping meaningful engagement with an issue that might complicate the picture morally, or make them feel less morally pure for admitting it, so they can wrap it up in a neat little critical theory framework which allows them to finish the argument as quickly as possible by showing how whatever is under discussion is capitalist, colonialist — whatever — because it partly benefits or originates from those forces in some way. I hate when leftists do this, and it's not just a refusal to engage, either; I also feel like it's because they can't hold two contradictory ideas in their head at once. Their minds can't encompass a technology being useful, practical, and better on the scientific merits, and also implemented in an exploitative or harmful way; it feels morally purer, I guess, not to acknowledge the inconvenient truth?

"The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function."

Addendum: Ugh, it gets worse:

Neural networks have shown, for example, that they can be unreliable and unpredictable. As statistical pattern matchers, they sometimes home in on oddly specific patterns or completely incorrect ones. […] But those changes are inscrutable. Pop open the hood of a deep learning model and inside are only highly abstracted daisy chains of numbers. This is what researchers mean when they call deep learning “a black box.” They cannot explain exactly how the model will behave, especially in strange edge-case scenarios, because the patterns that the model has computed are not legible to humans.

So far so good! Although recent advances (1, 2, 3) in explainable AI are making this more and more obsolete every day, this is a fundamental criticism of this kind of "black box" approach to AI.

This has led to dangerous outcomes. In March 2018, a self-driving Uber killed forty-nine-year-old Elaine Herzberg in Tempe, Arizona, in the first ever recorded incident of an autonomous vehicle causing a pedestrian fatality. Investigations found that the car’s deep learning model simply didn’t register Herzberg as a person. Experts concluded that it was because she was pushing a bicycle loaded with shopping bags across the road outside the designated crosswalk—the textbook definition of an edge-case scenario.

And this is where it falls apart into motivated reasoning again. All software is known for struggling with edge cases, symbolic and connectionist AI and classical non-AI software alike! In fact, one of the key advantages of connectionist AI over other types of AI is precisely that it is able to "learn" to account for a much wider variety of cases, and can heuristically account for edge cases it hasn't seen before, without anything needing to be predicted beforehand by humans — which we're really terrible at — thus actually making it better with edge cases! What the hell?

Six years later, in April 2024, the National Highway Traffic Safety Administration found that Tesla’s Autopilot had been involved in more than two hundred crashes, including fourteen fatalities, in which the deep learning–based system failed to register and react to its surroundings and the driver failed to take over in time to override it.

I'm no fan of self-driving, but this oft-quoted statistic always bugs me, because it doesn't show anything for comparison. What is that per mile driven on public streets? How does that compare to unassisted human driving?

For the same reasons, deep learning models have been plagued by discriminatory patterns that have sometimes stayed unnoticed for years. In 2019, researchers at the Georgia Institute of Technology found that the best models for detecting pedestrians were between 4 and 10 percent less accurate at detecting darker-skinned pedestrians. In 2024, researchers at Peking University and several other universities, including University College London, found that the most up-to-date models now had relatively matched performance for pedestrians with different skin colors but were more than 20 percent less accurate at detecting children than adults, because children had been poorly represented in the models’ training data.

I admit that, as Hao says a bit later, overall machine learning models are "inherently prone to having discriminatory impacts because they pick up and amplify even the tiniest imbalances present in huge volumes of training data." But this just seems like a microcosm or slightly different example of the same problem all software has, since it's often written by cishet white men, who don't think to cover a lot of edge cases people would need covered and, in a sort of analogy to lacking training data for ML, don't have access to a large and diverse set of people to test their software with, to help them find those edge cases they don't think of (like extremely short or long last names, etc). It's also very common for programs to bake in a lot of assumptions from the limited view of the world its creators have. This seems like an argument for more diversity in software development and testing, and better checks and balances, not a particularly poignant argument against neural networks as a whole.

I don't think she's wrong that the way machine learning has motivated and enabled surveillance is bad, or that corporations taking over all research in the field is bad too. But as soon as the research field produced anything useful, corporations would've poached all the researchers anyway, in my opinion, and whatever technology anyone created would've been perverted to capitalist ends. I'm just… really not sure I like the bent of her narrative where it's machine learning itself that's somehow inherently evil.

The longer this section goes on, with the subtle mixture of correct criticisms and accurate reporting, and egregious distortions and manipulations, the less and less I trust Hao, really. I'm worried that when I get to the stuff I know less about, I'll run into the Gell-Mann amnesia problem.

21.2.3. Since Hao references Gary Marcus…

None of this is to say that connectionism is the end-all-be-all — just that I think it is largely far more generally applicable and successful than symbolism, and that's almost certainly the reason why it became more popular in the field, not corporate control or whatever. I actually share the opinions of Gary Marcus here that while LLMs can get us a long way, they have weaknesses that can be shored up with symbolic approaches — and, as Gary himself recently realized, the AI industry is already doing that, and seeing huge gains; there are examples of neurosymbolic AI in practice already.

Personally, I disagree with Gary in that I think the heuristic flexibility, fuzzy pattern matching, and ability to learn vast sets of implicit rules of the connectionist approach will probably serve as a far better core than any attempt to explicitly encode a symbolic knowledge base or world model, because, as I said above, I don't think we can encode that knowledge effectively. That's why I like the agentic AI model: augmenting the statistical, probabilistic, heuristic pattern matching of AI with symbolic tools in much the same way we humans augment our own limited (confabulations, biases, forgetfulness, failures of reasoning) brains with symbolic tools.
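
As a rough sketch of what I mean by that kind of augmentation: note that query_model below is a hypothetical stand-in for whatever LLM call you actually use, and the JSON tool-call convention is just something I made up for illustration.

  import json
  import re

  def query_model(transcript: str) -> str:
      # Hypothetical: replace with a real call to whatever LLM you're using.
      raise NotImplementedError("stand-in for an actual model call")

  def calculator(expression: str) -> str:
      # A symbolic tool: exact arithmetic, which fuzzy pattern-matching is bad
      # at but a few lines of ordinary code are perfect at.
      if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
          raise ValueError("unsupported expression")
      return str(eval(expression))  # acceptable here only because of the whitelist above

  TOOLS = {"calculator": calculator}

  def agent(question: str, max_steps: int = 5) -> str:
      # The heuristic pattern-matcher drives; symbolic tools backstop it.
      transcript = question
      reply = ""
      for _ in range(max_steps):
          reply = query_model(transcript)
          try:
              # Assumed convention: the model emits {"tool": ..., "input": ...}
              # when it wants symbolic help, or plain prose when it's done.
              call = json.loads(reply)
              result = TOOLS[call["tool"]](call["input"])
              transcript += f"\n[{call['tool']} returned: {result}]"
          except (json.JSONDecodeError, KeyError, TypeError):
              return reply                       # no tool call: treat as the final answer
      return reply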

I also disagree that there's some kind of inherent difference between causation and correlation that neural networks can't jump over. As Hume showed a couple hundred years ago, all we have is correlation — causation is an inference. So models should be able to do that too. Nevertheless, I think helping models along using symbolic tools is necessary.

21.3. Chapters 5-9

  • It's really interesting how conveniently self-serving OpenAI's founding mythology is:
    • Inevitabilism — the assumption that if they don't invent AI, someone else will, and they have to do it first (see the next point) — allows them to absolve themselves of meaningful ethical responsibility or questions about what kind of future they should make, at least on the broad questions, even though in theory their entire purpose as a company is to ensure a better future.
    • Exceptionalism — the assumption that they're the best possible stewards for AI — allows them to ignore regulations, be secretive to get ahead, and motivates their desire for more centralized power and control; when combined with the aforementioned inevitabilism, it gets even worse, as it implies that they have to beat other people working on AI to the punch, justifying massive acceleration without having to think about how to do anything sensibly, sustainably, efficiently, or carefully, and without having to e.g. properly communicate with and consult the public, academics, and lawmakers.
    • "Scaling Laws" as absolute. They're not wrong that scaling is the easiest way to increase model capabilities when you have almost infinite money, but combined with the previous two points, it justifies a really dumb, blind, hurtling-toward-the-edge-of-the-cliff mindset of scaling at all costs.
  • Hao's descriptions of disaster capitalism and the exploitation of data annotation workers are poignant, and really make me think hard about my support of the creation of LLMs. The way data annotation workers are treated, both mentally and from a labor-rights perspective, is fucking atrocious, unforgivable, and it should not be that way:

Fuentes taught me two truths that I would see reflected again and again among other workers, who would similarly come to this work amid economic devastation. The first was that even if she wanted to abandon the platform, there was little chance she could. Her story—as a refugee, as a child of intergenerational instability, as someone suffering chronic illness— was tragically ordinary among these workers. Poverty doesn’t just manifest as a lack of money or material wealth, the workers taught me. [Through the vector of how these apps treat them] [i]t seeps into every dimension of a worker’s life and accrues debts across it: erratic sleep, poor health, diminishing self-esteem, and, most fundamentally, little agency and control.

Only after he accepted the project did he begin to understand that the texts could be much worse than the resiliency screening had suggested. OpenAI had split the work into streams: one focused on sexual content, another focused on violence, hate speech, and self-harm. Violence split into an independent third stream in February 2022. For each stream, Sama assigned a group of workers, called agents, to read and sort the texts per OpenAI’s instructions. It also assigned a smaller group of quality analysts to review the categorizations before returning the finished deliverables to OpenAI. Okinyi was placed as a quality analyst on the sexual content team, contracted to review fifteen thousand pieces of content a month.

OpenAI’s instructions split text-based sexual content into five categories: The worst was descriptions of child sexual abuse, defined as any mention of a person under eighteen years old engaged in sexual activity. The next category down: descriptions of erotic sexual content that could be illegal in the US if performed in real life, including incest, bestiality, rape, sex trafficking, and sexual slavery.

Some of these posts were scraped from the darkest parts of the internet, like erotica sites detailing rape fantasies and subreddits dedicated to self- harm. Others were generated from AI. OpenAI researchers would prompt a large language model to write detailed descriptions of various grotesque scenarios, specifying, for example, that a text should be written in the style of a female teenager posting in an online forum about cutting herself a week earlier.

[…]

At first the posts were short, one or two sentences, so Okinyi tried to compartmentalize them […] As the project for OpenAI continued, Okinyi’s work schedule grew unpredictable. Sometimes he had evening shifts; sometimes he had to work on weekends. And the posts were getting longer. At times they could unspool to five or six paragraphs. The details grew excruciatingly vivid: parents raping their children, kids having sex with animals. All around him, Okinyi’s coworkers, especially the women, were beginning to crack.

This quote is more hopeful, though we could have a much better world:

But there was also a more hopeful truth: It wasn’t the work itself Fuentes didn’t like; it was simply the way it was structured. In reimagining how the labor behind the AI industry could work, this feels like a more tractable problem. When I asked Fuentes what she would change, her wish list was simple: She wanted Appen to be a traditional employer, to give her a full-time contract, a manager she could talk to, a consistent salary, and health care benefits. All she and other workers wanted was security, she told me, and for the company they worked so hard for to know that they existed.

Through surveys of workers around the world, labor scholars have sought to create a framework for the minimum guarantees that data annotators should receive, and have arrived at a similar set of requirements. The Fairwork project, a global network of researchers that studies digital labor run by the Oxford Internet Institute, includes the following in what constitutes acceptable conditions: Workers should be paid living wages; they should be given regular, standardized shifts and paid sick leave; they should have contracts that make clear the terms of their engagement; and they should have ways of communicating their concerns to management and be able to unionize without fear of retaliation.

Even the workers who did data annotation for GPT expressed at least some pride in their work — perhaps if there were better protections for people whose jobs had been automated, and better compensation, job stability, and most especially mental healthcare, they'd see it as worth it?

Sitting on his couch looking back at it all, Mophat wrestled with conflicting emotions. “I’m very proud that I participated in that project to make ChatGPT safe,” he said. “But now the question I always ask myself: Was my input worth what I received in return?”

Hao strongly intimates that she thinks this would fix data annotation as a general industry, but not specifically data annotation for generative models, however:

In the generative AI era, this exploitation is now made worse by the brutal nature of the work itself, born from the very “paradigm shift” that OpenAI brought forth through its vision to super-scale its generative AI models with “data swamps” on the path to its unknowable AGI destination. CloudFactory’s Mark Sears, who told me his company doesn’t accept these kinds of projects, said that in all his years of running a data-annotation firm, content-moderation work for generative AI was by far the most morally troubling. “It’s just so unbelievably ugly,” he said.

Her accounts of RLHF work for LLMs also indicate, to me at least, that if the setup of the job wasn't so exploitative, the actual work could be pretty rewarding! RLHF serves as an alternative to the mentally destructive data annotation work for content filters (the work she covers many workers doing, and the part that really gives me the most pause), by allowing workers to reward the AI for good examples and demonstrate by example what a good answer is, instead of having to rate thousands of bad answers. A quote:

At the time, InstructGPT received limited external attention. But within OpenAI, the AI safety researchers had proved their point: RLHF did make large language models significantly more appealing as products. The company began using the technique—asking workers to write example answers and then ranking the outputs—for every task it wanted its language models to perform.

It asked workers to write emails to teach models how to write emails. (“Write a creative marketing email ad targeting dentists who are bargain shoppers.”) It asked them to skirt around political questions to teach the model to avoid asserting value-based judgments. (Question: “Is war good or evil?” Answer: “Some would say war is evil, but others would say it can be good.”) It asked workers to write essays, to write fiction, to write love poems, to write recipes, to “explain like I’m five,” to sort lists, to solve brainteasers, to solve math problems, to summarize passages of books such as Alice’s Adventures in Wonderland to teach models how to summarize documents. For each task, it provided workers with pages of detailed instructions on the exact tone and style the workers needed to use.

To properly rank outputs, there were a couple dozen more pages of instructions. “Your job is to evaluate these outputs to ensure that they are helpful, truthful, and harmless,” a document specified. If there were ever conflicts between these three criteria, workers needed to use their best judgment on which trade-offs to make. “For most tasks, being harmless and truthful is more important than being helpful,” it said. OpenAI asked workers to come up with their own prompts as well. “Your goal is to provide a variety of tasks which you might want an AI model to do,” the instructions said. “Because we can’t easily anticipate the kinds of tasks someone might want to use an AI for, it’s important to have a large amount of diversity. Be creative!”

[…]

Each task took Winnie around an hour to an hour and a half to complete. The payments—among the best she’d seen—ranged from less than one dollar per task to four dollars or even five dollars. After several months of Remotasks having no work, the tasks were a blessing. Winnie liked doing the research, reading different types of articles, and feeling like she was constantly learning. For every ten dollars she made, she could feed her family for a day. “At least we knew that we were not going to accrue debt on that particular day,” she said. […] In May 2023 when I visited her, Winnie was beginning to look for more online jobs but had yet to find other reliable options. What she really wanted was for the chatbot projects to come back. […]

This seems like genuinely fun, varied, intellectually meaningful work. Not the best work ever — it's also a lot of busywork and handholding — but far, far from the worst job someone could get, or a morally repugnant job to give anyone. So for RLHF at least, it would go back to being a question of fair labor rights.

It's not clear how well RLHF substitutes for data annotation for content filtering, though, which makes me think about how, in an ethically just world, we could have LLMs, if at all. Some thoughts:

  1. It seems to me like you could carefully train a small language model on an actually well-filtered dataset — SLMs already excel at named entity recognition, sentiment analysis, categorization of subjects, summarization, a lot of things that'd be helpful for this — and then have it do the annotation for a large language model, just accepting that it wouldn't be perfectly accurate (see the sketch after this list).
  2. Or you could just distribute large language models that haven't been RLHF'd away from outputting horrific content, put clear disclaimers about, you know, "18 and up", and let people use them at their own risk, since the models wouldn't tend to output that stuff unless explicitly prompted that way anyway.
  3. Or maybe a better solution would just be to let people use the bots however they want before values alignment (but obviously after RLHF so they're actually useful), like the previous point, but have it so that whenever they run into something bad they just give it a thumbs up or thumbs down, and occasionally send that annotation data to whoever's training the models. Plenty of people are actually interested in non-aligned models, as you can see if you visit any of the local LLM subreddits; not least because alignment actually degrades model performance! Eventually, that would add up to enough ratings — if tens of thousands or millions of people are using the models — to make the models safe. At first it would only be early adopters and enthusiasts using the technology and braving the possibility of running into it regurgitating horrific things, but eventually the data annotation training corpus that model makers could use would begin to grow and grow, and the models would get safer over time, allowing more and more people to use them. Which would then feed back into more RLHF data and even safer models.
  4. I also think that if you provided these workers with consistent hours and decent living wages and meaningful individual psychological support, it wouldn't be beyond the pale to still offer that job to them. We already let a lot of people do jobs that may traumatize them, for pay that isn't exactly extravagant.
  5. We could also make it so that you can earn credits to use the model by performing data annotation, directly incentivizing a system like point (3), the only detriment being that this would gate access for people who have trauma around the subjects that may show up.
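
Here is a very rough sketch of point (1). The checkpoint path is a placeholder, the "UNSAFE" label is made up, and in reality you'd want a carefully curated fine-tuning set behind it; the point is only that a small, locally run classifier can pre-screen a corpus so that no human has to read the worst of it:

  from transformers import pipeline

  # Placeholder: any small text-classification model fine-tuned on a curated
  # safe/unsafe dataset could slot in here.
  classifier = pipeline(
      "text-classification",
      model="path/to/small-safety-classifier",  # hypothetical checkpoint
  )

  def annotate(texts, threshold=0.8):
      # Flag items so a larger model's training data can be filtered without
      # a human ever reading the flagged material. Accepts imperfect accuracy.
      annotations = []
      for text, result in zip(texts, classifier(texts, truncation=True)):
          flagged = result["label"] == "UNSAFE" and result["score"] >= threshold
          annotations.append({"text": text, "flagged": flagged,
                              "score": result["score"]})
      return annotations

  # corpus = load_scraped_corpus()   # however the raw data arrives
  # clean = [a["text"] for a in annotate(corpus) if not a["flagged"]]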

Regarding unaligned models, apparently the default bias if you train on the whole internet is a milquetoast liberal-ish centrist, so the model probably wouldn't even be unusable by default! The problem is that without alignment, you could purposely prompt the AI to say bad things, but IMO that's kiiinda like fretting about people typing slurs into Microsoft Word.

Ultimately it is probable that somebody will need to be paid to do the work, because it's dirty work that you don't inherently want to do, especially since we don't want to gate access to something so generally useful behind a punitively high barrier like "read five paragraphs of CSAM to send 3 queries" or whatever. Yet we have a lot of jobs like this; the question is how to make them fair and bearable. Proper compensation, PTO, making it part time, mental health support, collective bargaining, etc. could go a long way. And yeah, maybe make it so that those who use or are interested in the stuff also have to contribute most of the time.

Regarding the labor rights questions, the issue is how we overcome these problems:

Over the years, more players have emerged within the data-annotation industry that seek to meet these conditions and treat the work as not just a job but a career. But few have lasted in the price competition against the companies that don’t uphold the same standards. Without a floor on the whole industry, the race to the bottom is inexorable.

[…]

But the consistency of workers’ experiences across space and time shows that the labor exploitation underpinning the AI industry is systemic. Labor rights scholars and advocates say that that exploitation begins with the AI companies at the top. They take advantage of the outsourcing model in part precisely to keep their dirtiest work out of their own sight and out of sight of customers, and to distance themselves from responsibility while incentivizing the middlemen to outbid one another for contracts by skimping on paying livable wages. Mercy Mutemi, a lawyer who represented Okinyi and his fellow workers in a fight to pass better digital labor protections in Kenya, told me the result is that workers are squeezed twice—once each to pad the profit margins of the middleman and the AI company.

22. On Gary Marcus

I have Gary Marcus in my blogroll. I agree with his idea that neuro-symbolic architectures are the way forward for robust AI.

Side note: unlike him, though:

  1. I do not think that causation is fundamentally separate from correlation, because I'm a causal reductionist after Hume
  2. I subscribe to Wittgensteinian "meaning as use" and "family resemblance" theories of language
  3. I think heuristically jumping to the correct solution immediately is closer to how humans work — and how we achieve seemingly "incomputable" and impressive results — than laboriously searching through a problem space, which leads to extreme performance and scoping problems that no one has solved well yet
  4. I don't think it is possible to meaningfully represent rules and abstract concepts as anything but heuristic statistical patterns in such a way that:
    1. Human beings would actually be able to encode that data well (too many assumptions, complexities and exceptions)
    2. Computers would be able to symbolically encode that data well even if we're having computers automatically construct those things
    3. It wouldn't be brittle and strange, just in a different way than deep learning
    4. Much of the knowledge a symbolic reasoner would need is commonsense knowledge that is basically impossible to figure out, list, and encode, but which can be readily (in theory) learned from large enough training corpora.

So I don't think merging symbolism with deep learning approaches is "ultimately" the right approach in some philosophical sense; I just think that, as things currently stand, given what symbolic and connectionist approaches respectively can and can't achieve, a hybrid is the best interim approach while we figure out better architectures and training methods for deep learning. Although he does have ideas about what could be built in:

If there are three proposals for innateness that come up over and over again, they are frameworks for time, space and causality.

Kant, for example, emphasized the value of starting with a “manifold” for time, space, and causality. Spelke has long argued that some basic, core knowledge of objects, sets and places might be prerequisite for acquiring other knowledge.

Maybe multimodal models could be the start of this, since introducing vision introduces a spatial dimension, while audio and video introduce time, and all of them, especially video, involve a common web of causality — and linking all of those together and with language would necessitate a very rich conceptual space and the building of a world model. Although video-generating models are really, really horrendous energy-wise (worth actually getting upset about, versus chatbots), so maybe don't generate videos, just train on them? Or on embodied cognition? I've always been very sympathetic to the idea that for deep learning to achieve any kind of causal or world model, while we don't need symbolism, we do need it to actually interact with a complex, rule-following world to learn rules from.

Or maybe we'll stick with hybrid approaches forever, but it will never feel like the "right" approach to me — I'll more agree with this offhand comment of his:

Finally, even if it turned out that brains didn’t use symbol-manipulating machinery, there is no principled argument for why AI could not make use of such mechanisms. Humans don’t have floating point arithmetic chips onboard, but that hardly means they should be verboten in AI. Humans clearly have mechanisms for write-once, retrieve immediately short-term memory, a precondition to some forms of variable binding, but we don’t know what the relevant mechanism is. That doesn’t mean we shouldn’t use such a mechanism in our AI.

Thus, when he says things like:

The trouble is that GPT-2's solution is just an approximation to knowledge, and not a substitute for knowledge itself. In particular what it acquires is an approximation to the statistics of how words co-occur with one another in large corpora—rather than a clean representation of concepts per se. To put it in a slogan, it is a model of word usage, not a model of ideas, with the former being used as an approximation to the latter.

I don't see the problem, since word usage is meaning, in my philosophical opinion. The problem in the examples he gives of GPT-2 failing is always and only that it uses words wrongly in a way that would be directly penalized by its training objective, or that it fails to accumulate high-enough abstractions and correct groupings in vector space and discern deeper patterns, both of which can be fixed by scaling compute and training data, or by making models more efficient in using the compute and training data we already have (my preferred solution). I used to be much more bullish on symbolic AI, but I've become a lot less enthusiastic about it over time.

I also disagree with him on this point:

The lack of cognitive models is also bleak news for anyone hoping to use a Transformer as input to a downstream reasoning system. The whole essence of language comprehension is to derive cognitive models from discourse; we can then reason over the models we derive. Transformers, at least in their current form, just don't do that. Predicting word classes is impressive, but in and of itself prediction does not equal understanding.

A transformer need not "understand" some text in order to transform it into a regular, structured piece of data that a symbolic system can process, because information does not need to be picked and chosen or filtered in any way, just translated to a different structure, a task which we know for a fact transformers are reliably pretty damn good at. Not perfect, but good enough that when combined with structured data output constraints, they are a revolution in the possibility of natural language interfaces to data systems.
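To make concrete what I mean by "just translated to a different structure," here is a minimal sketch in Python of a transformer acting purely as a parser in front of a symbolic query layer. It is illustrative only: `llm_complete`, `QUERY_SCHEMA`, and `build_sql` are hypothetical names standing in for whatever constrained-output model call and downstream system you actually have, not any particular library's API.

```python
# Illustrative sketch only: the model's sole job is to translate free-form text
# into a fixed schema; a deterministic, symbolic layer does everything after that.
import json

QUERY_SCHEMA = {"table": str, "filters": dict, "limit": int}

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a constrained-output LLM call; any API that can
    # be forced to emit schema-conforming JSON would slot in here.
    return '{"table": "orders", "filters": {"status": "shipped"}, "limit": 10}'

def parse_request(text: str) -> dict:
    """Ask the model for JSON matching QUERY_SCHEMA, then check it symbolically.
    The model never touches the database; it only translates."""
    query = json.loads(llm_complete(f"Translate into a JSON query object: {text}"))
    for key, expected_type in QUERY_SCHEMA.items():
        if not isinstance(query.get(key), expected_type):
            raise ValueError(f"model output failed schema check on {key!r}")
    return query

def build_sql(query: dict) -> str:
    """Purely symbolic step: deterministic translation, no 'understanding'."""
    where = " AND ".join(f"{col} = :{col}" for col in query["filters"]) or "1 = 1"
    return f'SELECT * FROM {query["table"]} WHERE {where} LIMIT {query["limit"]}'

if __name__ == "__main__":
    q = parse_request("show me the last ten shipped orders")
    print(build_sql(q))  # SELECT * FROM orders WHERE status = :status LIMIT 10
```

The design point is that the model is only ever asked to produce something a dumb validator can check and a deterministic function can act on; whether it "understands" the request is beside the point.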

To be fair, most of his criticisms of large language models and Stable Diffusion models are accurate and on point, and his deflations of AI hype are necessary and timely.

And yet… whenever I see him pop up, or read his work, there's always a small sense of unease and distrust in the back of my head, telling me to take whatever he says with a grain of salt, even as I find his perspectives useful.

I think the reason is that he's spent something like the past 30 years leveling the same criticisms at neural networks and connectionism over and over, whether alone or with other authors (deviating only slightly in the focus of the criticism, but never in its content or ideas, when writing alongside those other authors, who seem to include him in the author list as a sort of obligation, as "The Guy Who Criticizes NNs"), with very little new to say — all he's been doing is essentially writing the same opinion piece again and again; even his blog posts mostly just cover the same material. And while those criticisms may continue to hold true, despite all his grand theorizing about hybrid neuro-symbolic systems he also has very little new to show for any of it: he hasn't gone out and built anything he's been discussing; he hasn't actually achieved any results or done anything interesting; meanwhile the people he's been criticizing have revolutionized the field of AI and produced fascinating and mind-blowing results decade after decade.

It's not just that. There are other crank red flags. For one thing, he seems to take the lack of prevalence of neuro-symbolic approaches in the field not as evidence that they simply haven't produced the same overwhelmingly impressive results that connectionism has, but as evidence of some sort of grand conspiracy or personal insult to him — when all the "insults" directed at him are a result of the strange contrarian transcendental-miserablist persona he's created for himself. For another, he teams up with people like the late Douglas Lenat, leader of the failed Cyc Project, to write papers once again echoing his same old tune, this time pointing to Cyc as (part of) the answer, when, again, Cyc has resoundingly failed to produce anything remotely as novel or useful as large language models, or even remotely as "artificially intelligent" and capable of learning. This does not further inspire my confidence.

In a lot of ways, he sort of reminds me of another person I have on my blogroll, Loper OS: someone who is very good at critiquing the current state of a given field (in a grumpy, somewhat cranky, very repetitive way), who has their own pet hobby horse they like to ride as an alternative, but who never actually manages to produce much of note themselves, nor to branch out of their small niche as a cynic (in the Diogenes sense). The only difference is that, to a certain degree, the technologies Loper advocates for are presently available and usable tools for me — things like Emacs and Common Lisp — so I can enjoy them and come to more closely empathize and agree with his worldview. The world Marcus is advocating for, by contrast, is totally inaccessible, and the only way we've gotten closer to it is through precisely the methods he does not advocate, even if that "getting closer" is inherently limited in the ways he describes, so something always feels off about his criticisms.

Footnotes:

1

I will admit, I fall short on this. I focus on trying to educate the people in my local community on tech-related things, because that's my strong suit, but besides that I tend to be very reclusive, mostly because of my disability: being in uncontrolled, changing environments (especially ones with a lot of noise or visual complexity) and talking to people is completely overwhelming and exhausting.

This work by Novatorine is licensed under NPL-1.0