Tech reporter Sean Michael Kerner of VentureBeat provided an overview of Dr. Martell’s presentation:
“On the main stage at the DEF CON security conference in a Friday afternoon session (Aug. 11), Craig Martell, chief digital and AI officer at the U.S. Defense Department (DoD), came bearing a number of key messages.
First off, he wants people to understand that large language models (LLMs) are not sentient and aren’t actually able to reason.
Martell and the DoD also want more rigor in model development to help limit the risks of hallucination — wherein AI chatbots generate false information. Martell, who is also an adjunct professor at Northeastern University teaching machine learning (ML), treated the mainstage DEF CON session like a lecture, repeatedly asking the audience for opinions and answers.
AI overall was a big topic at DEF CON, with the AI Village, a community of hackers and data scientists, hosting an LLM hacking competition. Whether it’s at a convention like DEF CON or as part of bug bounty efforts, Martell wants more research into LLMs’ potential vulnerabilities. He helps lead the DoD’s Task Force LIMA, an effort to understand the potential and the limitations of generative AI and LLMs in the DoD.
Martell spent a lot of time during his session pointing out that LLMs don’t actually reason. In his view, the current hype cycle surrounding generative AI has led to misplaced expectations and misunderstanding about what an LLM can and cannot do.
‘We evolved to treat things that speak fluently as reasoning beings,’ Martell said.
He explained that at the most basic level a large language model is a model that predicts the next word, given the prior words. LLMs are trained on massive volumes of data with immense computing power, but he stressed that an LLM is just one big statistical model that relies on past context. (2)
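To make the “predict the next word, given the prior words” idea concrete, here is a minimal sketch of a statistical next-word predictor built from simple bigram counts. It is purely illustrative (the toy corpus and function names are assumptions, not anything from Martell’s talk); real LLMs use billions of parameters, but the underlying objective is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the massive training data Martell describes.
corpus = "the soldier asked the model a question and the model answered".split()

# Count how often each word follows each preceding word (a bigram model).
bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(prev_word: str) -> str:
    """Return the statistically most likely next word after prev_word."""
    candidates = bigram_counts.get(prev_word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(predict_next("the"))       # -> 'model' (follows 'the' twice in the corpus)
print(predict_next("soldier"))   # -> 'asked'
```

Scaling that same objective up to web-scale data and enormous compute is what gives an LLM its fluency, but, as Martell stresses, the model is still choosing statistically likely continuations rather than reasoning.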
Dr. Craig Martell on the Future of LLMs at DEFCON 31
“…if a soldier in the field is asking an LLM a question…there needs to be a high degree of accuracy.”
“I’m here today because I need hackers everywhere to tell us how this stuff breaks,” Martell said. “Because if we don’t know how it breaks, we can’t get clear on the acceptability conditions, and if we can’t get clear on the acceptability conditions, we can’t push industry towards building the right thing, so that we can deploy it and use it.”
“We as humans, I believe, are duped by fluency,” he said. “You also often want to use large language models in a context where you’re not an expert. That’s one of the real values of a large language model: … asking questions where you don’t have expertise,” Martell said. “My concern is that the thing that the model gets wrong [imposes] a high cognitive load [on a human trying] to determine whether it’s right or whether it’s wrong.”
Martell said that if a soldier in the field is asking an LLM a question about how to set up a new technology, there needs to be a high degree of accuracy. “I need five nines [99.999% accuracy] of correctness,” he said. “I cannot have a hallucination that says: ‘Oh yeah, put widget A connected to widget B’ — and it blows up.” (2)
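For a sense of scale, “five nines” of correctness allows roughly one wrong answer per 100,000 queries. A quick back-of-the-envelope calculation (the query count is an illustrative assumption, not a figure from the talk):

```python
# Illustrative arithmetic only: what "five nines" (99.999%) of correctness permits.
required_accuracy = 0.99999
queries = 100_000  # hypothetical number of questions asked in the field

max_allowed_errors = queries * (1 - required_accuracy)
print(f"At {required_accuracy:.3%} accuracy, at most ~{max_allowed_errors:.1f} "
      f"wrong answer(s) per {queries:,} queries is tolerable.")
# -> At 99.999% accuracy, at most ~1.0 wrong answer(s) per 100,000 queries is tolerable.
```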