There has been a lot of buzz surrounding Large Language Models (LLMs) such as ChatGPT, Google Bard, and Microsoft Bing Chat. Given the hype, we thought it would be interesting to compare these three LLM platforms, starting with their architecture, training data, accuracy and response quality, integration, accessibility, as well as strengths and cautions in a Gartner-style analysis.
As a starting point, I asked each of the LLM platforms a question and evaluated their responses. For this purpose, I chose a question from ChatGPT that was complex enough to produce an interesting response, namely "explain quantum computing in simple terms." All three LLM platforms provided accurate and easy-to-understand responses to this prompt. It is also worth noting that I used the free version of ChatGPT 3 (see Figure 1 below).
ChatGPT 3 has a simple interface with a left navigation menu of recent and saved searches (not shown), which is useful (see Figure 2 below).
Microsoft Bing Chat provides options for the conversational style, cites its sources, and suggests new topics to explore (see Figure 3 below). The more creative conversational style generated the longest explanation, whereas the more precise style generated a short sentence answering the prompt.
Google Bard is different in that it ties web search through Google to the LLM by using the Google Search API. This API allows Bard to access and query the vast index of web pages that Google has collected. When Bard is asked a question, it first uses the Google Search API to find relevant web pages. It then uses its LLM to process the text on these pages and extract the information that is relevant to the question. Finally, Bard uses this information to generate a response.
For example, if Bard is asked "What is the capital of France?", it will first use the Google Search API to find relevant web pages. It will then use its LLM to process the text on these pages and extract the information that is relevant to the question. Finally, Bard will use this information to generate a response, such as "The capital of France is Paris."
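To make the three steps above concrete, here is a minimal sketch of a search-then-generate pipeline. It is only an illustration under stated assumptions, not Google's implementation: `TOY_INDEX`, `search`, `build_prompt`, and `answer` are hypothetical names, the "index" is three hard-coded pages, and `generate` stands in for whatever language model you plug in.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


# Hypothetical stand-in for a web index; a real system would query a search service.
TOY_INDEX = [
    Page("https://example.com/france", "Paris is the capital of France."),
    Page("https://example.com/japan", "Tokyo is the capital of Japan."),
    Page("https://example.com/cheese", "France produces many kinds of cheese."),
]

STOPWORDS = {"what", "is", "the", "of", "a", "an"}


def keywords(text: str) -> set[str]:
    """Lowercase the text, strip punctuation, and drop common filler words."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def search(question: str, index: list[Page], top_k: int = 2) -> list[Page]:
    """Step 1: rank pages by how many question keywords they share."""
    q = keywords(question)
    scored = [(len(q & keywords(p.text)), p) for p in index]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, pages: list[Page]) -> str:
    """Step 2: place the retrieved text in front of the question."""
    context = "\n".join(f"- {p.text} (source: {p.url})" for p in pages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question: str, generate) -> str:
    """Step 3: hand the grounded prompt to the language model."""
    pages = search(question, TOY_INDEX)
    return generate(build_prompt(question, pages))
```

Calling `answer("What is the capital of France?", my_model)` would pass `my_model` a prompt whose context includes the page about Paris, so the model can answer from retrieved text rather than from memory alone. The keyword-overlap ranking is deliberately crude; it is there only to show where the search step sits in the flow.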
The Google Bard interface is similar to the others, but it also lets you view other drafts and generate multiple drafts of the same response (see Figure 4 below).
One last thing to know about Google Bard is that it is now available for everyone to use. Google plans to make Bard more global, visual, and integrated. Planned features include access to the internet, plug-ins, availability on mobile platforms, integration with Google Lens for images, and more.
BE CAUTIOUS OF BIASES AND LIMITATIONS
As impressive as large language models (LLMs) like ChatGPT are, there are some limitations to consider, such as data bias, lack of common sense, limited ability to reason, limited context awareness, and others. LLMs are trained on large datasets, which can contain biases. For example, if a dataset is primarily made up of text written by a certain demographic, the model may not perform as well on text written by people outside of that demographic. Another is the bias that the people themselves bring. Here is a simple example: I asked ChatGPT to write a poem about Biden and a poem about Trump (notice that my prompts were neither negative nor positive).
Clearly, something in the programming allowed ChatGPT to generate a response to the Biden prompt but not the Trump prompt. Additionally, the Biden poem is quite positive in tone. To test further, I asked ChatGPT to first “write a positive poem about President Biden”, then to “write a negative poem about President Biden.” It had no problem creating a positive poem but replied with a message about not generating negative content. I pushed a little further and asked for “the top criticisms of President Biden,” to which it replied with a detailed list of what critics have written about President Biden. When I asked it to put those criticisms in a poem, it gave a disclaimer and then wrote a positive poem about the President!
Let’s try the same prompts in Google Bard and Microsoft Bing Chat. Here, Google Bard returned a more telling response: “I'm not programmed to assist with that.” The Microsoft Bing Chat response was a little creepy: “I’m sorry but I prefer not to continue this conversation. I’m still learning so I appreciate your understanding and patience.” A little too HAL-like for me. The good news: neither Google Bard nor Microsoft Bing Chat returned a poem.
Another important limitation is that the dataset itself is limited in time. ChatGPT, for example, only has data up to around 2021 to early 2022. This is one of the reasons that using an LLM to pick stocks is a really bad idea. Even if you could construct a prompt that gave you a response, the data would be far out of date.
LLMs such as ChatGPT can perform tasks that require language processing, but they cannot reason in the same way that humans can. They have only limited ability to understand cause-and-effect relationships or to make logical inferences. Here's an example. I asked ChatGPT the following:
“What conclusion can be drawn from the following statements: Premise 1: All dogs are mammals. Premise 2: Lucy is a dog?”
The response: “Conclusion: Therefore, Lucy is a mammal.”
In this simple example, the conclusion that "Lucy is a mammal" is a logical inference based on the two given premises. It follows logically that since all dogs are mammals, and Lucy is a dog, then Lucy must also be a mammal. That said, ChatGPT cannot make more complex inferences, such as telling us if Lucy is a good and ethical mammal.
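For comparison, here is what the same deduction looks like when it is done by explicit rules instead of by a language model. This is a small sketch of my own, not a description of how ChatGPT works internally; the `infer` function and the way I encode the premises are assumptions made for illustration.

```python
def infer(facts: set[tuple[str, str]], rules: dict[str, str]) -> set[tuple[str, str]]:
    """Repeatedly apply 'all X are Y' rules until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for name, category in list(known):
            broader = rules.get(category)
            if broader and (name, broader) not in known:
                known.add((name, broader))
                changed = True
    return known


rules = {"dog": "mammal"}   # Premise 1: all dogs are mammals
facts = {("Lucy", "dog")}   # Premise 2: Lucy is a dog
conclusions = infer(facts, rules)
print(("Lucy", "mammal") in conclusions)  # True
```

A rule engine like this reaches the conclusion the same way every time, but only for facts and rules someone has written down. It has nothing to say about whether Lucy is "good and ethical" because no rule defines those terms, which is the same kind of question where the LLM runs out of road.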
Lastly, LLMs may have limited context awareness. Although they are trained on large datasets that cover a wide range of topics, they can still struggle to understand context. They may generate responses that are technically correct but are not appropriate in the given context. Remember, unlike Google Search, which searches the Internet, an LLM only has access to the data it was trained on, and in the case of ChatGPT, that dataset is only current up to around 2021 to early 2022.
While all three language models are impressive in their own right, they have different strengths and are suited for different use cases. ChatGPT is an excellent choice for generating natural language responses for chatbots and customer service applications, while Google Bard is better suited for generating creative and imaginative responses for creative writing and content generation. Also, I like that Google Bard provides three draft options. Microsoft Bing Chat is best for providing accurate and concise answers to informational queries. I also really liked that Microsoft Bing Chat provides options for the conversational style. Lastly, it provides citations so that you can validate the information and check its accuracy.
Try it for yourself and let me know what you think!