Points of View: 3 Podcasts on AI
English translation of an Italian post originally published on Levysoft.it
To stay up to date on the latest developments in Generative AI, today I want to suggest 3 Italian podcasts, quoting some excerpts that I found particularly interesting.
Let’s start with “2024 — Special Artificial Intelligence” and its latest episode, “Emotions and Creativity,” which opens by quoting an excerpt from an interview with Iago Corazza (a National Geographic photojournalist), who wonders whether human creativity is coming to an end:
AI will be used more and more, but for trivial things. With this kind of technology, a standardization of results is almost inevitable, because soon anyone will be able to achieve more or less the same result: increasingly sophisticated presets will be created, allowing untrained users to produce high-level content just as a more skilled professional would, and so this standardization of results will take hold.
At the same time, there will inevitably be a lowering of the standard of excellence, because we will get used to texts, photographs, and artifacts that all look a bit alike. The real danger, in my opinion, is the death of creativity for humans, while for artificial intelligence it becomes an obvious, undifferentiated accessory.
[…] Thanks to artificial intelligence technology, our children will be less and less encouraged to create and more and more inclined simply to use, and so inventiveness will be diluted to the point of leveling out all differences. The appreciation of differences, which is what drives us reporters, and the space for creativity will keep narrowing, and at that point artificial intelligence, which reworks the creative material fed to it by humans, will find less and less to process and will start to standardize, in a thousand ways, what it has already used, learning from itself. I do not see beauty in that future.
[…] So you are saying that the best moment for artificial intelligence is perhaps today, and that as we move forward there will be more and more flattening, because future training of these systems will be done on material created by artificial intelligence itself, or by humans who are fading more and more into the background?
Of course. The material that artificial intelligence uses, however sophisticated it may be, is provided by us, and we will provide less and less of it. Poor thing, I would say: it will process more and more data trying to diversify, but since it ends up learning from itself, I see a fairly inevitable loop in its future.
I also see users getting used to documents, images, anything produced by artificial intelligence, all looking a bit alike because they share a recognizable common denominator. This can be a problem, but it can also become an opportunity for those whose work falls outside this mold […] It was a bit like that with digital photography: a flood of images, anyone can shoot, the standard of excellence drops, but those who did things differently found themselves, a few years later, even further ahead. It is the rule of advancing when everyone else takes a step back.
Iago Corazza
Next, the interesting podcast “The Reality Factory”, in its interview format called Visions, which in the episode “All the Truth, Please, about Minerva, the First Truly Italian LLM” interviews Roberto Navigli, head of the Sapienza NLP (Natural Language Processing) research group, who offers his explanation of the latent understanding of LLMs:
How does a machine understand the meaning of an ambiguous word in a given context?
Take “pesca”, which can mean both fishing as a sport and the fruit, or, more complex still, “piano”, which can denote a common noun like the piano, the government’s plan, the surface of a table, but also a person’s name, for example Renzo Piano, or even the adverb piano.
[…]
But does [generative AI] understand or not?
The answer is that it does understand, but not in an interpretable or explicit way. It understands in a so-called latent form, an implicit form that is difficult to interpret. Let’s say it understands as much as it needs to in order to do its job, but it is not a human kind of understanding.
A human has the great advantage of being able to explain things better, to provide an additional explanation and an interpretation. ChatGPT, for example, can do this too, but often in a way that can be contradictory. Obviously, if a human contradicts themselves when answering a follow-up question about their own statement, we start to worry. If a large language model, like ChatGPT or other models, contradicts itself, we know we are dealing with an automatic system, with so-called hallucinations and with problems that are still the subject of ongoing research.
[…] a lot of work was done before arriving at generative AI, at understanding […] and we come back to the question of what is written in a text, of resolving its ambiguity […] And it does so because it has been trained to, by reading billions and billions of other words.
[…] The first stage is that of the so-called Foundation Model, that is, the base model, which must learn the language more than it must learn to respond. It is not like a child to whom you must teach a language: here, in teaching the language, we do something different from what we do with humans, that is, we expose it to an enormous, endless amount of documents, information, and texts, which allow the model to grasp the structure of the language, which words follow other words, and that lets it avoid the kind of blunders that we humans often make. We, however, do not need to read 500 billion words in order to converse or even simply write a sentence. A state-of-the-art model today does need this, and this is the first stage. […]
Then there is a subsequent stage, so-called supervised learning, also known as Instruction Tuning, in which the model is presented with examples of instructions, commands, orders, and requests, together with the corresponding answers we would ideally like the model to give. You also make it understand how answers are constructed, that is, you train it to use that information and to give it shape, because you show it examples. The more varied these sets of instructions and answers are, the better the system will converse and satisfy users’ needs.
Then you also need to start setting some boundaries, that is, you have to teach it what it can and cannot say […] because the machine improvises and has no ethics. Above all, the machine always responds; this is an essential point. When it does not know what to say, it makes something up, like a student at an exam who feels they must say something rather than stay silent. This is what we call the algorithm’s hallucinations.
Roberto Navigli
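To make Navigli’s two-stage description a bit more concrete (a foundation model that learns the language from huge amounts of raw text, followed by instruction tuning on example requests and ideal answers), here is a minimal sketch. Everything in it is illustrative: the toy data, the field names, and the chat-style template are my own assumptions, not the format actually used to train Minerva.

```python
# Illustrative sketch of the two training stages described above
# (hypothetical data and template, not Minerva's real pipeline).

# Stage 1 - Foundation model: the model only sees raw text and learns
# "which words follow other words" by predicting the next token.
raw_corpus = [
    "Il piano del governo prevede nuovi fondi per la ricerca.",
    "Renzo Piano ha progettato l'auditorium.",
    "La pesca è uno sport molto diffuso sul lago.",
]

def next_token_pairs(sentence: str):
    """Yield (context, next word) pairs - the signal the foundation stage learns from."""
    words = sentence.split()
    for i in range(1, len(words)):
        yield " ".join(words[:i]), words[i]

# Stage 2 - Instruction tuning: the model is shown requests together with
# the answers we would ideally like it to give, so it learns the *form* of a reply.
instruction_examples = [
    {
        "instruction": "Qual è la capitale della Bulgaria?",
        "answer": "La capitale della Bulgaria è Sofia.",
    },
    {
        "instruction": "Riassumi in una frase il piano del governo.",
        "answer": "Il governo stanzierà nuovi fondi per la ricerca.",
    },
]

def to_training_text(example: dict) -> str:
    """Render one instruction/answer pair with a simple chat-style template."""
    return (
        f"### Istruzione:\n{example['instruction']}\n\n"
        f"### Risposta:\n{example['answer']}"
    )

if __name__ == "__main__":
    for ctx, nxt in next_token_pairs(raw_corpus[0]):
        print(f"{ctx!r} -> {nxt!r}")
    print("---")
    for ex in instruction_examples:
        print(to_training_text(ex))
        print("---")
```

Even in this toy version, the asymmetry Navigli points out is visible: the first stage needs enormous amounts of plain text and no labels at all, while the second needs far fewer examples, but ones deliberately written in the shape of a request and its ideal answer.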
Finally, the podcast “Interview with a Pythonist”, in an episode dedicated to LLMs, “LLM: the open framework born in Italy. Ep 45”, interviews Piero Savastano, a data scientist and founder of the open-source AI project Cheshire Cat. Since the whole interview is really very interesting, and I therefore recommend listening to it in full, I will quote some excerpts that made me reflect on the fact that we have finally conquered language in machines:
So you can talk to it, you can create structured things, you can build on top of it, but you are always on a machine that is really like flying, only on an airplane where you can see the engine sputtering, one that could give out at any moment. It is ultra-imprecise, unreliable, vague, out of control, but that is the price we pay for having conquered language in machines.
[…]
Yes, it really starts in 2020, when GPT-3 comes out, which is the version scaled 10x in number of parameters and amount of training data compared to GPT-2, a model that was actually quite similar to others of the period; so OpenAI did not exactly start out ahead on language models. With GPT-3, thanks to OpenAI and large funding from Microsoft, the scale component is introduced into this world of language models, and with scale we finally reach linguistic capabilities that go well beyond what we expected. In their paper they also say that these are emergent capabilities, something like: “the choice we made in creating GPT-3 was scale, we added scale, we put in compute and data, and what we obtained is the emergent property of few-shot learning,” and they manage to do things that could not be done before.
[…]
I believe in RAG rather than fine-tuning, because if the retrieval is done well, the language model gets a prompt in which the answer is already written; it simply has to, using attention, reshuffle the tokens and produce a linguistic response. So you are not leveraging the memory of the terabytes of text used for training, which is very fascinating in itself; but once the language model has conquered language, I give it the information. I don’t need GPT to know what the capital of Bulgaria is; I’m interested in it knowing how to articulate language about capitals well.
Piero Savastano
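Savastano’s point about RAG, that with good retrieval the answer is already inside the prompt, can be illustrated with a minimal sketch. The retrieval here is a naive word-overlap ranking over a tiny in-memory list; a real system would use embedding similarity over a vector store, so treat the function names, the scoring, and the prompt template as illustrative assumptions only.

```python
import re

# Minimal RAG sketch (toy retrieval, hypothetical prompt format): with good
# retrieval the answer already travels inside the prompt, and the LLM only
# has to put it into well-formed language.
documents = [
    "Sofia is the capital and largest city of Bulgaria.",
    "Rome is the capital of Italy and sits on the Tiber river.",
    "Minerva is a family of large language models trained on Italian text.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query.
    A real system would use embedding similarity instead."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the prompt: the facts live in the context, the model only contributes the wording."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_prompt("What is the capital of Bulgaria?", documents))
    # The assembled prompt would then be sent to any LLM for the final answer.
```

The design choice being argued for is where the knowledge lives: fine-tuning tries to push facts into the weights, while RAG keeps them in retrievable text and reserves the model for what pre-training actually made it good at, the language itself.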