LLM monotony
One limitation I have noticed when reading LLM byproducts is that the intent gets lost. Soulless information.
Summaries
When summarising a meeting, the LLM is unable to register the relative importance of the different sections, or the sense of agreement or disagreement in the various parts.
A standard meeting might have a few important or controversial points while the rest are routine agreements.
So a meeting with
- a passionate argument about whether the company should acquire a competitor, or which product line to pursue
- 20 maintenance items that everybody agrees on
will be summarised with a proportional allocation of space but not of intent: roughly 21 equal segments, and no sense of the disagreement.
The disagreement is captured as a juxtaposition of opinions with no particular weight, producing a sloppy log of the items that occurred rather than an actual summary.
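A minimal sketch of the mechanism, assuming the common chunk-then-summarise pipeline (the function names are illustrative, and the truncation merely stands in for an LLM call):

def summarise_chunk(chunk: str, budget_words: int = 20) -> str:
    # Stand-in for an LLM call; here we simply truncate to a fixed budget.
    return " ".join(chunk.split()[:budget_words])

def summarise_meeting(items: list[str]) -> str:
    # Every item receives the same budget, so one heated acquisition
    # debate and 20 routine approvals come out as 21 near-identical
    # segments, regardless of importance or controversy.
    return "\n".join(summarise_chunk(item) for item in items)

The allocation is driven by the shape of the input, not by what mattered in the meeting.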
https://dictionary.cambridge.org defines summary as
a short, clear description that gives the main facts or ideas about something
Code
I have observed the same effect in code. A prompt can request the reimplementation of a function in a certain style, or in any style. The generated code will, with luck, contain a piece that provides the functionality in question.
But the style will be off as well. A Python snippet will look like:
def my_function(argument1: type1, argument2: type2) -> None | type3:
    """
    A slightly verbose docstring explaining what the function does.
    argument1: type1
    argument2: type2
    returns: None|type3
    """
    # small note describing step1
    content = argument1.data
    ...
    # small note describing step2
    content_as_list = [x for x in content]
    ...
    # small note describing step3
    return_value = library.function(content_as_list, argument2.property1)
    # small note describing step4
    if condition:
        return return_value
    else:
        return None
Various techniques for conveying intent get lost in the prompt. For example, the first two steps could live in a separate function whose name starts with _, signalling that it is private to the module and only prepares data. Or step 3 could be treated as a separate service with its own function, for reusability. Perhaps step 1 did not need a comment at all. Or a superfluous-looking variable could record a decision, as in if LEGACY_MODEL: ....
Most of these intent-revealing writing methods get lost in the prompt. They can be spelled out in the prompt, but to truly capture intent the prompt would have to be the code itself.
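As an illustration, here is one hypothetical intent-revealing rewrite of the snippet above. The concrete types, the _prepare_data helper, and the LEGACY_MODEL flag are assumptions made for this sketch, not the only reasonable choices:

from dataclasses import dataclass


@dataclass
class Payload:
    data: str  # hypothetical concrete stand-in for type1


LEGACY_MODEL = False  # a superfluous-looking variable that records a decision


def _prepare_data(payload: Payload) -> list[str]:
    # Leading underscore signals: private to this module, data preparation
    # only (steps 1 and 2 of the original snippet, no comments needed).
    return [x for x in payload.data]


def transform(items: list[str], width: int) -> str:
    # Step 3 as its own function: reusable, a candidate for a separate service.
    return "".join(items)[:width]


def my_function(payload: Payload, width: int) -> str | None:
    """Return the transformed payload, or None when running a legacy model."""
    if LEGACY_MODEL:
        return None
    return transform(_prepare_data(payload), width)

Each name and boundary carries a decision that a proportional, style-agnostic generation tends to flatten away.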
Total Library
These byproducts are suboptimal for human consumption. Writing is a human communication tool. Intent matters because it is part of building and sharing a theory.
Over time, generated writing will reduce the amount of information per document and increase the effort required to go through all of these reference-like, intentless documents.
Borges wrote about the fictional concept of the “total library”:
Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves – shelves that obliterate the day and on which chaos lies – ever reward them with a tolerable page
This concept is similar to the thousand monkeys with typewriters, or “The Library of Babel”. All of them convey the idea that there are infinitely many possible writings, and that the effort of finding something valuable among them is equivalent to the effort of writing it.
In our current predicament, LLMs are the equivalent of both reading and writing at higher speed: an assistant, available so far for a small nominal price, that produces likely outputs from whatever it consumes.
It started with a regular big library made by humans, the output of millions of people’s effort. But as more humans use LLM outputs, the new library resembles the “Total Library” more and more. Now both humans and the assistant have to wade through more reading to find useful or intentional information.
Eventually, only the reading speed of the LLM will make it possible to extract useful information in a reasonable timeframe. In coding, the complexity will be overwhelming, and assisted manipulation will be the only option.
Monopoly and hope
The big corporations would love this hypothetical scenario, where most people need an LLM to do their reading and writing. I believe this is why they are subsidising LLMs despite the lack of a profit plan. And they can afford to, of course, because they are monopolies.
Internet search and social media are tipping towards rents and away from markets, and this seems like a natural next step.
It is hard to predict where LLMs are going. On one side, efficiency and performance keep improving. On the other, natural resources are becoming harder to secure, and training requires ever more curation. When capital will dry up, and how stable the world will remain, are unknown.
Something that gives me hope against this power imbalance and technical usurpation of intellectual wealth is admonition: the social rules around LLM usage, the frustration of having to review slop, the shaming of those who share a made-up reference.
Dijkstra wrote an essay, “On the foolishness of ‘natural language programming’”, criticising the tools that people were trying to create back then.
These arguments will continue to exist, and there will always be demand for easy-to-understand texts and code.
Instability is a less hopeful but equally effective way to reduce the negative effects of LLMs. When enough people start to believe the incorrect things that LLMs help produce, democracy suffers. And given how much technology is involved, I doubt that authoritarian governments will be able to maintain the delicate structures needed to produce the hardware or the software.
#llm