ChatGPT Has All the Answers – But Not Always the Right Ones

The latest AI sensation is OpenAI’s ChatGPT. Not a day goes by without multiple mentions of this glib chatbot in popular and tech media. The praise came right after it hit the streets last November. It was genius. It had all the answers. It was a Google-killer.

The protests followed soon thereafter. Educators complained of students using ChatGPT to cheat on essay questions. By January, New York City public schools had banned ChatGPT on school devices and networks due to “concerns about negative impacts on student learning and concerns regarding the safety and accuracy of content,” a spokesperson told Chalkbeat. Lawyers in several states felt threatened by ChatGPT’s ability to answer questions in court. After ChatGPT got a passing score (60%) on the USMLE, the exam required for entry to a residency program, doctors were quick to denigrate its performance.

Not heard from has been the engineering community. Until now.

Let the Testing Begin

ChatGPT does indeed seem to give well-formed answers to many questions. It can pass muster with questions in literature, philosophy, history... many of the classes universities insist engineering students take so they don’t graduate as complete geeks. For example, asked about the symbolism of The Scarlet Letter, Nathaniel Hawthorne’s must-read American Literature classic, ChatGPT came up with:

In Nathaniel Hawthorne's novel "The Scarlet Letter," Hester Prynne is the main character and her name and the scarlet letter "A" she is forced to wear symbolize different things. The "A" initially stands for "adultery," as Hester has committed this sin, but it also comes to symbolize "able" and "angel" as the novel progresses. Hester herself is a symbol of the human spirit's ability to endure and triumph over shame and societal judgment. She is also a symbol of the complexity and contradiction of human nature, as she is both flawed and heroic.

“Not bad,” said my wife, summa cum laude, masters in literature (Villanova University), who had posed the question.

Asked about the biggest mass murderers in history, ChatGPT came up with the usual suspects: Hitler, Stalin, Mao Zedong, Pol Pot and Genghis Khan, a numbered list with description of each villain and a ranking based on what seemed to be a current perception of heinousness rather than the number of deaths attributed to each.

How deep does ChatGPT go with more obscure, very techy subjects? Surely, ChatGPT could not know about RPN, a way to enter numbers into a calculator that only I seem to favor. But ChatGPT knew all about it.

“Reverse Polish notation (RPN), also known as postfix notation, is a mathematical notation in which the operators follow the operands. Instead of the typical "infix" notation, where the operator is placed between the operands (e.g. 2 + 2), RPN places the operator after the operands (e.g. 2 2 +). This notation can be evaluated using a stack, which allows for simpler evaluation and eliminates the need for parentheses. RPN is often used in calculators and computer programming languages.”
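The stack evaluation ChatGPT describes is easy to sketch. What follows is a minimal illustration, not anything ChatGPT produced: the function name `eval_rpn` and the choice of four operators are my own assumptions, and tokens are assumed to be separated by spaces.

```python
import operator

# Supported binary operators (an illustrative subset, not a full calculator).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression: str) -> float:
    """Evaluate a space-separated RPN (postfix) expression using a stack."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # An operator consumes the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Anything else is treated as a number and pushed.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("2 2 +"))      # the example from the quote
print(eval_rpn("3 4 + 2 *"))  # (3 + 4) * 2, no parentheses needed
```

Because each operator acts on whatever is already on the stack, the order of operations is fixed by the order of the tokens, which is why parentheses never come up.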

I was curious how ChatGPT would handle calculations. Let’s start with something easy, like the time value of money. Asked to find the monthly payment on a 10 year $100,000 loan at 5% interest, it gave the following:

The monthly payment on a 10 year $100,000 loan at 5% interest would be $877.57. This is calculated using the standard formula for calculating fixed monthly payments on a loan, which is:

M = P [ i(1 + i)^n ] / [ (1 + i)^n – 1]

Where: M = monthly payment P = the principal amount of the loan (i.e. $100,000) i = the monthly interest rate (5% annual interest rate / 12 months in a year) n = the number of payments (10 years x 12 months in a year).

ChatGPT was quick to show the formula, though all on one line. However, the answer, $877.57, was not correct. The correct answer is $1,060.66.
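The formula ChatGPT quoted is fine; only its arithmetic was off. As a quick check, here is a minimal sketch that plugs the same numbers into that formula. The function name `monthly_payment` is my own, and the sketch assumes monthly compounding with payments at the end of each month.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment: M = P [ i(1 + i)^n ] / [ (1 + i)^n - 1 ]."""
    i = annual_rate / 12      # monthly interest rate
    n = years * 12            # number of monthly payments
    growth = (1 + i) ** n     # (1 + i)^n, used twice in the formula
    return principal * i * growth / (growth - 1)


payment = monthly_payment(100_000, 0.05, 10)
print(f"${payment:,.2f}")  # $1,060.66
```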

Bummer. If it can’t handle common financial calculations, how could it possibly handle engineering calculations? We warm up by asking ChatGPT about the best failure theory for ductile materials.

The most widely used failure theory for ductile materials is the von Mises yield criterion, also known as the maximum distortion energy criterion. This theory is based on the idea that failure occurs when the material reaches a state of maximum shear stress, or maximum distortion energy. This theory is applicable to ductile materials that undergo plastic deformation and is relatively simple to use in engineering calculations.

It states that failure will occur when the von Mises stress, which is the equivalent stress, reaches the material's yield strength. The von Mises stress is calculated as the square root of the sum of the squares of the principal stresses, divided by the square root of two.

This theory is widely used for ductile materials because it gives a good representation of the behavior of the material under complex loading conditions, it's easy to use and it's relatively simple to implement into engineering calculations.

My faith is restored. ChatGPT provided a totally plausible answer. Big surprise. The same question had stumped many a designer attempting to use FEA to do a stress analysis. Plus, it expounded on the results so eloquently that were I still teaching “Strength of Materials,” I would have given ChatGPT full points.
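For readers who want to put numbers to the criterion, the usual textbook form of the von Mises stress works with the differences between the principal stresses, a detail the verbal description above glosses over. A minimal sketch, with function names and sample stress values of my own choosing:

```python
import math


def von_mises(s1: float, s2: float, s3: float) -> float:
    """Equivalent (von Mises) stress from the three principal stresses."""
    return math.sqrt(((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 2)


def has_yielded(s1: float, s2: float, s3: float, yield_strength: float) -> bool:
    """Yield is predicted when the equivalent stress reaches the yield strength."""
    return von_mises(s1, s2, s3) >= yield_strength


# Uniaxial tension: the equivalent stress equals the applied stress.
print(von_mises(250.0, 0.0, 0.0))
# Equal stress in all directions: no distortion, so the equivalent stress is zero.
print(von_mises(100.0, 100.0, 100.0))
```

The second case is the reason the differences matter: a purely hydrostatic state distorts nothing, so the criterion predicts no yielding however large the stresses are.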

But Give ChatGPT Enough Rope and ...

Still trying to plumb the depths of ChatGPT, damn the torpedoes, we follow up with what would be a routine table lookup in a Strength of Structures text. We ask it for the moment of inertia of an S12 X 35 I-beam.

The area moment of inertia (also known as the second moment of area) of an S12x35 I-beam can be calculated by taking the sum of the area moments of inertia of the individual parts of the cross-section. The S12x35 I-beam has the following dimensions:

flange width: 12 inches
flange thickness: 0.47 inches
web thickness: 0.31 inches
web height: 35 inches

The area moment of inertia can be calculated using the following formula:

I = (bh^3)/12 + (2twh^3)/3

Where: I = area moment of inertia b = flange width h = flange thickness t = web thickness w = web height

The value of I for S12x35 I-beam is approximately: I = (12*0.47^3)/12 + (2*0.31*35^3)/3 = 8.33 in^4

It's worth noting that the shape of the I-Beam can change depending on the manufacturer, so it's important to refer to the specific dimensions of the I-beam you are working with to ensure accurate calculations.

ChatGPT's answer of 8.33 in⁴ does not agree with established values for the moment of inertia about the X-X axis (the one that matters) of an S12 X 35 I-beam: Popov’s Strength of Materials and Marks’ Standard Handbook for Mechanical Engineers give 227 in⁴ and 229 in⁴, respectively. ChatGPT's answer is closer (though still off by 20%) to the moment of inertia about the less critical axis, about which I-beams are not usually loaded.
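Part of the trouble is the designation itself: in S12 X 35, the 12 is the nominal depth in inches and the 35 is the weight in pounds per foot, not a web height. A rough hand check is possible by treating the section as three rectangles. The sketch below is only that, a sketch. The dimensions are not from ChatGPT or from this article; they are nominal handbook-style values I am assuming (depth 12.00 in, flange width 5.078 in, average flange thickness 0.544 in, web thickness 0.428 in), and the real S shape has tapered flanges, so the result is approximate.

```python
def i_beam_ixx(depth: float, flange_width: float,
               flange_thickness: float, web_thickness: float) -> float:
    """Moment of inertia about the X-X (strong) axis of an idealized I-section.

    Computed as the enclosing rectangle minus the two empty rectangles
    on either side of the web. All dimensions in inches, result in in^4.
    """
    outer = flange_width * depth ** 3 / 12
    clear_height = depth - 2 * flange_thickness
    voids = (flange_width - web_thickness) * clear_height ** 3 / 12
    return outer - voids


# Assumed nominal dimensions for an S12 X 35 (check a current table before use).
ixx = i_beam_ixx(depth=12.00, flange_width=5.078,
                 flange_thickness=0.544, web_thickness=0.428)
print(f"Ixx is about {ixx:.0f} in^4")
```

Under those assumed dimensions the idealized section lands within a couple of in⁴ of the handbook values, and nowhere near 8.33 in⁴.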

Why Continue?

By now ChatGPT has raised enough red flags that any help it might offer to engineers should be treated with skepticism. But let's continue. A precocious child should not be dismissed for its mistakes but nurtured for its talent. Grown in a lab, ChatGPT has only seen the glare of the spotlight for a few months. It gives many correct answers, certainly not all, but enough to show a promising future.

Microsoft Throws in the Big Bucks

OpenAI is known to technology watchers as the creator of another AI sensation, DALL-E 2, which creates images from text prompts.

ChatGPT’s initial popularity was a surprise to its creators. Watching closely was the largest software company in the world. Microsoft had invested over $3 billion in OpenAI already and just poured in another $10 billion, according to the New York Times. OpenAI’s valuation is $29 billion, making it possibly the highest valuation of any pure-AI company. Not bad for a non-profit.

Microsoft is itching to capture some of the lucrative search advertising market but has failed with its Bing search engine. Over 90% of searches are done with Google and only 3.4% with Bing.

Microsoft has already implemented OpenAI’s technology into GitHub, its developer platform, using it in “copilot” mode to generate snippets of code.

What is the Google Connection?

Google, arguably the leader in AI among big tech companies, has used AI to answer natural language questions through its search interface and has been active in the development of a chatbot. “In fact, the technology at the heart of OpenAI’s chat bot was developed by researchers at Google,” say Nico Grant and Cade Metz of the New York Times. It may have worked so well that it threatened to upset Google’s business model.

“Google has a business model issue,” said Amr Awadallah, who worked for Yahoo and Google. “If Google gives you the perfect answer to each query, you won’t click on any ads.”

Google has been working on its own chatbot, which uses LaMDA (Language Model for Dialogue Applications), for the last seven years, perhaps with the hope of eclipsing ChatGPT. Unlike OpenAI, it has kept its chatbot almost completely out of the public eye.

ChatGPT’s popularity caught Alphabet (Google’s parent company) executives by surprise. They have issued a code red, according to the New York Times, which also reported that current CEO Sundar Pichai called an emergency meeting with Google founders Larry Page and Sergey Brin, both of whom had long since left the building.

What Could Possibly Go Wrong?

Chatbots, once the hobby of bored developers and sci-fi buffs, are now the subject of an arms race, with every major tech company fielding its own challenger. The race has had its casualties. Microsoft’s chatbot Tay was pulled back after it produced racist, xenophobic and profane language. Microsoft chatbot suffered a similar fate. Those failures may have led others (notably Google and Amazon) to be more cautious before releasing another chatbot to the public – only to be jarred into action by the runaway hype around ChatGPT.

Disclaimer on the ChatGPT interface does little to inspire trust in its answers.

OpenAI, however, is moving aggressively forward and has gone brazenly public, as if a prominent disclaimer on its interface admitting to “incorrect or nonsensical answers” shields it from the backlash and liability that have befallen others.

Clearly, ChatGPT has a long road to travel before it can join handbooks, calculators, CAE software and Google as a tool in the engineer’s toolbox. That road, however, has suddenly been widened, stripped of its speed limit and paved with gold, thanks to a $10 billion investment by Microsoft.