Can AI like ChatGPT or Llama 3 be trusted to provide accurate information, especially when it comes to local recommendations like restaurant listings?
## The Issue at Hand
When you asked an AI like ChatGPT or Llama 3 to pull a list of restaurants in your area from Google Maps, it gave you inaccurate information about a restaurant called Mug and Bean. The AI claimed the restaurant was in your town at a specific address, but it's actually located in a neighboring town with a different street address.
### Why Does AI Provide False Information?
1. **AI’s Learning Process**: How do AI systems like ChatGPT or Llama 3 gather and process information from various sources to provide responses?
2. **Data Accuracy**: Can AI be prone to errors due to outdated or incomplete data sources?
3. **Machine Inference**: Does AI sometimes make assumptions or fabricate details based on imperfect information to fill gaps?
### Mitigating Inaccuracies in AI Responses
- **Cross-Verification**: How can users fact-check AI-provided information like restaurant listings using multiple sources?
- **Feedback Loop**: Is there a way for users to provide feedback to AI systems to improve the accuracy of responses in the future?
By investigating why AI systems like ChatGPT or Llama 3 occasionally provide incorrect information, we can gain a better understanding of their capabilities and limitations in providing accurate local recommendations.
#AI #ChatGPT #Llama3 #LocalRecommendations #InaccurateInformation #MachineLearning #DataAccuracy #UserFeedback
I bought a service off Fiverr for YouTube video topic ideas. This guy had hundreds of five star reviews. Some of his ideas didn't even make sense, giving names of places that don't exist in the city I wanted the videos to cover. He made mistakes that only an AI would make. I called him out on it and he went ballistic, insisting he does it all himself 🤦🤣
That's part of their creativity. They make up, pretend, and hallucinate to fill in the gaps between the things they know. That creativity lets them do some AMAZING, wild stuff… but they haven't yet learned when to apply it and when to stick to the facts.
When someone asks you to show them Schindler's List, but with Muppets, you could respond "That hasn't been done, it would be made-up make-believe," but that's exactly the place to flex some creativity.
When someone asks for legal precedent on airlines and injuries from the food cart, it's super easy to fill in the gaps [with made up cases](https://www.forbes.com/sites/mollybohannon/2023/06/08/lawyer-used-chatgpt-in-court-and-cited-fake-cases-a-judge-is-considering-sanctions/?sh=21d006b77c7f).
It’s been told to pretend to be an expert who knows the correct answer (as long as it’s a question that looks like one whose answer is known.)
Usually the most convincing way of pretending to be that is to actually give the correct answer. But if it doesn’t know, but thinks an expert should, then the best way of pretending is to make something up.
(We can't easily tell it the equivalent of "if you don't know, then don't answer". It doesn't know whether it knows.)
Everyone is talking about the technical details of how the program works but I want to bring up the philosophy/practical side.
The reason Chat GPT doesn’t necessarily give you an accurate answer is because that’s not the goal of the program.
The goal of a GPT-type AI is to make the reader (you) believe that a real human wrote the response.
The goal is NOT to provide you with an accurate answer.
A person could make an AI that was supposed to give you an accurate answer to a math problem, for instance, or find restaurants for you, or any other use case. Plenty of people are using AI for applications like these, where the problem justifies having an AI try to solve it.
However that’s not the purpose of Chat GPT.
Since the question was already answered, I’d just like to point out that you went to ChatGPT to ask it to use Google to put together a list of nearby restaurants instead of just typing “nearby restaurants” into Google yourself…
Because they are not "truly intelligent". Not really, nor artificially. All they do is try to predict the next word in a sentence. There's nothing remotely intelligent about that.
Stop believing media bullshit about it being intelligent. Can it be helpful in some situations and job positions? Sure. Is it _by itself_ going to take over our jobs? No. At least not if it's a job requiring even a smidge of intellect.
Which version of ChatGPT are you using, 3.5 or 4? I have both, 3.5 does not have access (or has limited access) to real time internet, however 4 does.
Chatgpt doesn’t “*know*” anything other than what words people tend to put together when discussing a topic.
Think about how these things learn. For example, imagine you want an AI that writes Wikipedia articles, and you feed it a bunch of existing articles to teach it how it's done.
Note that the information such as “the contents of the article must be true” and “references must link to an actual reference” are external to this dataset of articles. Those are things that you know, but the AI has no way of learning.
So it absolutely will create articles with made up facts and fake references.
The target is to create an article just like others, but not to fact check it.
Same with real people. If you ask me about a restaurant, and I am strongly encouraged to produce an answer that looks realistic, but absolutely not encouraged to make sure the answer is correct, I will absolutely make up a restaurant, and an address.
ChatGPT does not give accurate, true, or realistic answers. It gives answers that “look right”. That’s all it was trained to do.
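To make "trained to look right, not to be right" concrete, here is a toy sketch of the usual next-word training signal (the words and probabilities below are made up, and real systems are vastly more complicated): the loss only measures how much probability the model put on the word that actually came next in the reference text. Nothing in it asks whether the finished sentence is true.

```python
import math

# Hypothetical model output: a probability distribution over the next word
# after the prefix "The restaurant is located on". Nothing here knows geography.
predicted_next_word_probs = {
    "Main": 0.40,
    "Church": 0.25,
    "Fake": 0.20,
    "Elm": 0.15,
}

# The word that actually followed in the reference (training) article.
actual_next_word = "Church"

# Standard next-word training loss (cross-entropy): reward the model for
# matching the reference text. No term anywhere asks "is the sentence true?"
loss = -math.log(predicted_next_word_probs[actual_next_word])
print(f"loss = {loss:.3f}")  # lower is "better"; truth never enters into it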
Chatgpt has no idea what is factual and what is plain falsehood. It just strings together words that seem to have been put next to each other by many people.
It's not fabricating answers, it's giving you a grammatically correct word salad. That word salad is *sometimes* genuinely correct. Learn not to trust a machine that gives you word salad.
That’s a consequence of how they were trained. Two things conflict: 1. They are required to always give an answer; 2. They are trained to predict what a human would answer.
So at the beginning of the training, they just make up meaningless word soup. Then they start to be good at making sentences but they don’t make sense. Then they start to be good at making sentences that make sense but aren’t really about the current subject. Then they start being good at making sentences that are about the things we were talking about but aren’t true. Then they start being good at making sentences that are about the things we were talking about and are also true.
The thing is, there is never a point where we can say "okay, here it's always going to say true things". We just made it as good as it could get with the current "brain size". We stopped it from learning once it looked like it wasn't getting any better, and every time it says something false we train it to no longer say it, and hope it improves.
In reality, LLMs were intended to understand language; we were kind of surprised they could understand the meaning behind it that well. Even if they're not perfect, they're still surprisingly usable.
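One way to see the "required to always give an answer" part: generating text is basically sampling from a probability distribution over possible next tokens, and that distribution has no "abstain" option. A toy sketch with made-up towns and probabilities, not taken from any real model:

```python
import random

def sample_next_token(token_probs: dict[str, float]) -> str:
    """Pick the next token according to its probability.
    Note there is no branch for "refuse to answer" -- something always comes out."""
    tokens = list(token_probs)
    weights = list(token_probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Toy distributions for the next word after "The Mug and Bean is located in ..."
confident = {"Springfield": 0.90, "Shelbyville": 0.05, "Ogdenville": 0.05}
clueless = {"Springfield": 0.34, "Shelbyville": 0.33, "Ogdenville": 0.33}

print(sample_next_token(confident))  # usually the town it saw most in training
print(sample_next_token(clueless))   # still answers, even though it's basically a coin flip
```

Either way a town name comes out, and the reader can't tell from the text which case they got.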
Because when you ask ChatGPT a question it doesn't evaluate your question and give you an answer; it looks at the text you've entered and gives you a response that looks like responses to similar questions it's seen before. It doesn't actually understand what either you or it is saying; all it knows is whether what it said looks like an answer it's seen before. So to the AI, a restaurant at 123 Main St looks just as real as one at 456 Fake St, since they both follow the same pattern of [number] [word] [street type], and it says: looks good to me. Ship it.
These things don't actually "care" about getting things "right". They don't really answer questions; they generate text that _looks_ like an answer to a question.
It's a subtle distinction, but something looking like the right answer to a question doesn't mean it is the right answer to the question. This is why it will, for example, cite nonexistent papers when asked technical questions. It knows answers to questions like that should have a citation, so it puts one in to make it look like the right answer.
You know all the incorrect stuff and fake stories people tell on the internet?
That's what things like ChatGPT are trained on, so that's the sort of thing they spit out via their predictive text generation algorithms. On top of that, they're increasingly now trained on the false information that they themselves provided and that some gullible sod reposted somewhere else.
It’s just a fancy garbage-in-garbage-out system.
Ever hear the saying, "if you put a monkey in a room for long enough it will write Shakespeare"? Think about it that way. The monkey is not writing with any rational intent, but the end result appears as though it was.
AI is trained on patterns, and is really good at recognizing them. It doesn't inherently understand the patterns. So when you ask it a question, it essentially looks at all its knowledge / patterns and attempts to match what comes next. A well trained AI will have lots of nuanced weights that will help guide it to the correct answer. A poorly trained AI will spout the first "match" it finds. AIs trained by user interactions can be quickly corrupted by telling them something is true when it's not.
You can think of AI training much like natural selection. Training data is much like an animal adapting to its environment successfully. When the AI is placed in a similar environment (chatbot on a support site) it will be successful. But AI placed in a dissimilar environment (the same chatbot relocating to a comedic dad joke site) will produce poor results since it is not in its natural environment.
I try to imagine the amount of (artificial) hubris it would take to make up any answer ever from the stuff you read off the internet. It must be almost impossible for it to know the difference between a real answer it read on the internet and some phantom of an answer it imagined after reading something related on the internet.
An LLM (what ChatGPT and Llama 3 are) is a bit like a person who has HEARD lots of things from other people, but doesn’t KNOW anything because they’ve never fact-checked it. And loves to talk about ANYTHING.
When you ask it a question, it will try to talk about the subject, based on all the different things it’s heard. But it has no way of knowing which of those things is true.
So you MIGHT get a specific answer that is correct, but you also might get slightly rambling stories about things that are related to the question. And because the LLM doesn’t know when it’s wrong, once it starts telling you a story that isn’t relevant, it can’t really stop itself.
**TLDR:**
An LLM is not a search engine, it’s a story-telling engine. It can’t look up a fact for you and present details. But it can talk about the subject by drawing on every conversation about that subject it has ever heard. Sometimes that’s much better than a search engine, but sometimes you just need an exact specific fact.
**NB:**
ChatGPT and Llama 3 are "LLMs", which are a type of "AI". This question is specific to LLMs, not all AIs.
You know those AI art tools, like Midjourney? You can ask for a photo of Obama dressed in a Power Rangers suit, but even if you get it, you know that doesn't mean he actually wore one.
ChatGPT is just doing that same generation but for text. It’s not responding to your question, it’s generating what a response to your question *might look like*.
I asked meta ai how to remove it from my Facebook app and it gave me instructions that weren’t possible to follow. I then asked if it was making things up and it said effectively ‘yes, I gave you instructions for how to turn off features based on other apps, but those options don’t exist’
This is called hallucination. AI will state things very clearly, and confidently, and with cited sources, and everything can be made up. AI doesn’t “know” anything. AI is trained on the interconnectivity of works and concepts and ideas, from which it can derive responses. These responses sound human because they’re written in complete sentences, but that’s just words and formatting.
AI responses, with proper words and formatting, are then populated with a combination of connected details that may or may not be accurate. Whether it was a misunderstanding or misspelling of a word in the training data, or just a rogue "fact" that only the AI discovered (perhaps because there was some correlation between your town and the word "bean"), these hallucinations just appear. There is a whole industry of people who shackle the AI in various ways to prevent that correlation from being made in the future, once it's identified.
Even if AI sounds like it’s intelligent, it isn’t. It’s writing complete sentences and filling in the details with specific words that it thinks relates to your prompt. The greatest value of AI is the ability to look at a ton of boring, similar data, and derive some meaning that no human could possibly derive, even if they had a lifetime of caffeine. Finding those interesting relations is of tremendous value to humanity. Looking at a mountain of numbers, and then recognizing that X, Y, and Z values appear under certain circumstances, can lead to breakthroughs faster than ever before. It’s not that a human couldn’t make that connection, but we’d often be making it by accident, rather than AI looking for it on purpose.
TLDR: Just because it can form complete sentences doesn't mean that what it's writing is actually accurate. Every noun and verb it uses is derived from a mathematical calculation and correlation on the back end, and is not necessarily factual.
LLMs are next word predictors. Given a consumed, tokenized data set (the internet), they can parse an input prompt (your restaurant request), find things in their set that are close to it by keyword, subject, etc., and then predict what a response would be, word by word, given the context and what's already been said. They're very good at it, and the composition seems plausible. But that's not expert knowledge or novel composition, really. It's just writing something similar to other things based on context it's heard about. It's not exactly intelligence; humans are just weird about things that talk, since we're the only animals that do that and it underpins our civilization. It's conceptually the same as the models that can identify fruit in a basket or show you a sketch of a cat, just with words. So the model is wrong because the internet is often wrong and because it's just picking information close to the prompt. It's not fact-checking that info or applying any sort of reasonableness.
This, btw, is a big problem in commercial applications where the answer needs to be right. It's the use case for a feature from last year called GPT plugins, where you can connect an expert or trusted source and tell GPT to only use that. If it only has to reference and quote from the right articles that do have the answer, it's better at answering questions.
ChatGPT isn’t actually giving you an answer, it’s giving you text that it thinks is an answer. That’s a subtle difference, but it means that ChatGPT doesn’t really care about correctness, it cares about *appearing* correct.
If you ask for a list of restaurants, it knows you want back a list of words that look like restaurant names, followed by things that look like addresses. Whether or not they're real doesn't matter, only that they appear real.
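As a rough illustration of "appears real vs. is real": filling a [number] [word] [street type] template produces addresses that all look equally plausible, whether or not any of them exist. A toy sketch with made-up names (only "Mug and Bean" comes from the question above):

```python
import random

# Purely made-up building blocks -- the point is that real and fake addresses
# follow exactly the same surface pattern, so both "look right".
restaurants = ["Mug and Bean", "Luigi's Pizzeria", "Golden Lotus"]
numbers = ["123", "456", "789"]
names = ["Main", "Fake", "Bean", "Church"]
street_types = ["St", "Ave", "Rd"]

for restaurant in restaurants:
    address = f"{random.choice(numbers)} {random.choice(names)} {random.choice(street_types)}"
    # To a pattern-matcher, every one of these looks equally plausible.
    print(f"{restaurant} -- {address}")
```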
From gpt:
Okay, imagine ChatGPT is like a big library full of books. But instead of books, it has lots of information from the internet. When you ask it a question, it tries to find the best answer by looking at all the information it has. But sometimes, it might mix up the information and give a silly answer, like when you mix up your toys and make a funny-looking creature. It’s not trying to lie, it’s just trying its best with what it knows.
ChatGPT is a chat bot belonging to the family of generative AI models called Large Language Model, or LLM. ChatGPT generates text based on learned statistical relationships between words. ChatGPT does not evaluate whether or not what it is generating is correct, only whether or not it is statistically likely based on the prior input and its training data.
If you ask ChatGPT to tell you the sum of 2 + 2 it will tell you that the sum is 4. However, ChatGPT will tell you this because it’s seen that many times before in its training data, it will not tell you it because it’s evaluating a mathematical expression.
If you instead ask ChatGPT to tell you the product of 45694 and 9866 it will almost certainly give you an incorrect result. It hasn’t seen that before, so it will just produce something close to it that it has seen before because that’s the most likely result based on the training data.
Edit: I tested it, it does give an inaccurate result.
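For reference, the exact product is easy to get deterministically; this one-liner is just a calculator check, which is exactly the evaluation step a pure next-word predictor never performs:

```python
# A calculator evaluates the expression; an LLM only predicts what an answer
# to "what is 45694 x 9866?" tends to look like.
print(45694 * 9866)  # 450817004
```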
They aren't trying to return a true answer. They are trying to return a likely answer based on the information they have and how they've seen other people respond to similar questions. Those are often right answers, because that's how people generally answer such questions in the data they've seen. But there is no guarantee.
They are essentially slightly smarter parrots that have been taught grammar rules. They can say things that sound right, prompted by what you say, but they really have no idea what they're saying.
Imagine a box you can put instructions into, written in Chinese. Out comes the same writing, only in English you can now read.
Because you can't see inside the box, you may think a Chinese national is inside, translating for you. In reality, it's a random American with a Chinese-to-English dictionary. This person has the *tools* but not the intelligence to understand what they're doing.
First off, I'd just like to define a word to add some context. Semantics is the study of meaning. In simple terms, semantics refers to the meaning behind the words that you write and say. When you speak, you naturally think about the meaning of your words. When you ask an LLM a question, you are logically expecting it to answer with a semantically coherent and correct response.
As others have mentioned, LLMs are just adding the next word based on a probability calculated from the context. However, this probability is calculated in very complex ways in the background. LLMs seem to be able to generalize certain semantic information within their neural networks, to the point where they seem able to reason and connect seemingly disconnected pieces of information. However, this phenomenon is not fully understood at the moment. It also means that when you ask an LLM something it doesn't know, it will always give you its best guess based on the probabilities. Another weird pattern you might see when using LLMs is that they sometimes tell you they don't know something even when they do; this is probably because the original training data contained text that biased the model into answering that way.
Artificial Intelligence is often used as a term when referring to LLMs and Machine Learning. However there are several other branches of AI that are actively being explored that I think are worth mentioning in this thread.
Knowledge graphs are a different approach to semantic data analysis and usage. With knowledge graphs it's easier to determine what the system knows and what it doesn't know, so it's easier to keep the system from hallucinating. However, knowledge graphs are usually harder to create and harder to use for more casual things.
Another interesting branch of AI is logic programming. With logic programming you can define the rules of a problem you are trying to solve and let the system interpret those rules to find a solution. Logic programming can solve complex problems; however, similar to knowledge graphs, logic programming languages tend to require a lot of time and aren't really convenient for day-to-day use.
I believe future research into AI will combine these technologies in smart ways to leverage each of their strengths and weaknesses.
I recently attended an event where the head of the Microsoft Copilot team was the keynote speaker. During her presentation she stressed that the biggest issue with AI being adopted was that people were using it like a search engine. This is your problem. ChatGPT and Llama3 are not built to search the internet for you. It’s like using a screwdriver to hammer a nail, you’re using the tool wrong. These tools are meant to be used to create new ideas. The other posts talk about HOW the tools create new ideas, but the key take away here is that these are GENERATIVE tools. That’s what the ‘G’ stands for in ChatGPT. Ask them to create a meal plan for your specific dietary needs or create a new recipe given a list of ingredients. Do not ask them to find you a restaurant to eat at.
Because while these models are trained on internet data, the internet itself is full of misinformation. On top of that, learning algorithms are still kind of in their infancy; as promising as they are, they certainly still have a lot of issues that can result in a lot of false positives.
Although on the other hand in my experience chat GPT and similar AIs do not seem to lie nearly as much as some news articles claim.
I even tested this recently when there were news articles claiming that ChatGPT almost always gets math wrong. I started asking it math questions of various degrees of difficulty, and it only really got one kind of wrong, and that was due to a mistake I could see even a human making.
And if I just ask it something basic like who was the 30th president of the USA, it usually gets that right. It typically just seems to have issues with more logic-related questions, because the AI itself is not really designed to be logical; it's designed to be conversational.
Because large language models don’t really understand what the “truth” is.
They know how to build human-readable sentences, and they know how to scour the internet for data.
When you ask them a question, they will attempt to build an appropriate human-readable answer, and will check the internet (and their own database, if any) to supply specific details to base the sentence(s) around.
At no point in this process does it do any kind of checking that what it’s saying is actually *true*.
These systems lack any inherent knowledge; all they do is try to predict the next word in a sentence.
They are reasoning engines, not databases. You can, to an extent, avoid hallucinations by feeding in context-relevant information (e.g. the Wikipedia page on the topic you are asking questions about) together with your prompt; this is what many tools utilizing ChatGPT's API do, but the model may still invent stuff even when this is done.
Due to this risk there always needs to be a human in the loop who validates the output of these models, and you should never trust anything these models claim unless you can validate it.
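Here is a minimal sketch of what "feeding it context-relevant information together with your prompt" can look like in practice. Everything in it is hypothetical: the context string stands in for text you fetched yourself (the Wikipedia page, or the restaurant's actual Google Maps listing; the address below is made up), and the assembled prompt would then be sent to whichever chat model or API you use.

```python
# Text you looked up yourself from a source you trust -- e.g. the restaurant's
# actual Google Maps listing. This is the part the model can't be trusted to
# "know" on its own. (Address below is a made-up example.)
retrieved_context = (
    "Mug and Bean -- 12 Example Road, Neighbouring Town. Open 08:00-17:00."
)

question = "What is the address of the Mug and Bean near me?"

# Prepend the trusted text and tell the model to answer only from it.
grounded_prompt = (
    "Answer the question using ONLY the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    f"Context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)

# Send grounded_prompt to whatever chat model/API you use; a human should
# still check the reply against the context, since the model can invent details.
print(grounded_prompt)
```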
LLMs do not *know* anything and you should not use them to research or reference real facts. They simply predict what is likely to be the next word in a sentence.
It isn’t actually “making up” an answer, in that it isn’t some kind of deception or the like (that would require intent, and it does not have intent, it’s just a *very* fancy multiplication program).
It is collecting together data that forms a grammatically-correct sentence, based on the sentences you gave it. The internal calculations which figure out whether the sentence is grammatically correct have zero ability to actually know whether the statements it makes are *factual* or not.
The technical term, in “AI” design, for this sort of thing is a “hallucination.”
Basically it asks itself "How would a human answer this question?", looking to its training data – which is all the conversations online prior to 2022.
What that tells it is that a human would say something along the lines of “[male Italian name]’s Pizzeria”, “[Color] [Dragon, Tiger or Lotus] Restaurant”.
So it tells you that.
That’s what humans say when being asked for Restaurants.
They are not “intelligent”. They are fancy-shmancy autocompletes, just like the basic autocomplete on your phone.
They are designed to *generate* text which *looks* human-written. That’s it.
It's not actually thinking. It's probabilistically associating. Which is often fine for writing, but useless for technical questions without clear answers, or ones with multiple plausible answers, like street addresses.
Because AI like ChatGPT is not **thinking** about the response; it's basically glorified autocomplete. It has a huge dataset of words and the probabilities that one word will come after another; it doesn't "understand" anything it's outputting, only variables and probabilities.
Never ever trust information given by an AI chatbot.
Chat GPT chooses the next word in a sentence by looking at how often different words come after the previous one based on the material that was used to train it. It doesn’t have the ability to evaluate whether the most probable word makes a true statement.
(Edit: it’s really more complex than that, but you’re five years old.)
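To make that last description concrete, here is a toy version of "looking at how often different words come after the previous one": a tiny bigram model built from a few made-up sentences. It is nowhere near a real LLM, but it shows the basic move, and note that nothing in it checks whether the continuation is true.

```python
from collections import Counter, defaultdict

# A tiny made-up "training set". Note that one of the sentences is false;
# the counts don't care.
training_words = (
    "the cafe is on main street . "
    "the cafe is on church street . "
    "the cafe is on main street . "
    "the moon is on main street ."
).split()

# Count which word follows which (a bigram table).
next_word_counts: dict[str, Counter] = defaultdict(Counter)
for word, nxt in zip(training_words, training_words[1:]):
    next_word_counts[word][nxt] += 1

def continue_sentence(start: str, steps: int = 5) -> str:
    """Repeatedly append the most frequent next word -- no truth-checking anywhere."""
    words = start.split()
    for _ in range(steps):
        counts = next_word_counts.get(words[-1])
        if not counts:
            break
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)

print(continue_sentence("the cafe"))  # "the cafe is on main street ." -- plausible
print(continue_sentence("the moon"))  # "the moon is on main street ." -- fluent, confident, false
```

Real LLMs use enormous neural networks instead of a lookup table, but the objective (pick a likely continuation) is the same, which is why fluency and truth can come apart.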