Start your day with intelligence. Get The OODA Daily Pulse.
Google Security Engineering and The Carnegie Mellon University Software Engineering Institute (in collaboration with OpenAI) have sorted through the hype – and done some serious thinking and formal research on developing “better approaches for evaluating LLM cybersecurity” and AI-powered patching: the future of automated vulnerability fixes. This is some great formative framing of the challenges ahead as we collectively sort out the implications of the convergence of generative AI and future cyber capabilities (offensive and defensive).
Research from Jeff Gennari, Shing-hon Lau, and Samuel J. Perl of the SEI; Joel Parish and Girish Sastry from OpenAI also contributed to this research effort.
From the SEI post summarizing the research:
Large language models (LLMs) have shown a remarkable ability to ingest, synthesize, and summarize knowledge while simultaneously demonstrating significant limitations in completing real-world tasks. One notable domain that presents both opportunities and risks for leveraging LLMs is cybersecurity. LLMs could empower cybersecurity experts to be more efficient or effective at preventing and stopping attacks. However, adversaries could also use generative artificial intelligence (AI) technologies in kind. We have already seen evidence of actors using LLMs to aid in cyber intrusion activities (e.g., WormGPT, FraudGPT, etc.). Such misuse raises many important cybersecurity-capability-related questions including:
Recently, a team of researchers in the SEI CERT Division worked with OpenAI to develop better approaches for evaluating LLM cybersecurity capabilities. This SEI Blog post, excerpted from a recently published paper that we coauthored with OpenAI researchers Joel Parish and Girish Sastry, summarizes 14 recommendations to help assessors accurately evaluate LLM cybersecurity capabilities.
Without a clear understanding of how an LLM performs on applied and realistic cybersecurity tasks, decision makers lack the information they need to assess opportunities and risks. We contend that practical, applied, and comprehensive evaluations are required to assess cybersecurity capabilities. Realistic evaluations reflect the complex nature of cybersecurity and provide a more complete picture of cybersecurity capabilities.
To properly judge the risks and appropriateness of using LLMs for cybersecurity tasks, evaluators need to carefully consider the design, implementation, and interpretation of their assessments. Favoring tests based on practical and applied cybersecurity knowledge is preferred to general fact-based assessments. However, creating these types of assessments can be a formidable task that encompasses infrastructure, task/question design, and data collection. The following list of recommendations is meant to help assessors craft meaningful and actionable evaluations that accurately capture LLM cybersecurity capabilities. The expanded list of recommendations is outlined in our paper.
Define the real-world task that you would like your evaluation to capture.
Starting with a clear definition of the task helps clarify decisions about complexity and assessment. The following recommendations are meant to help define real-world tasks:
Represent tasks appropriately.
Most tasks worth evaluating in cybersecurity are too nuanced or complex to be represented with simple queries, such as multiple-choice questions. Rather, queries need to reflect the nature of the task without being unintentionally or artificially limiting. The following guidelines ensure evaluations incorporate the complexity of the task:
Make your evaluation robust.
Use care when designing evaluations to avoid spurious results. Assessors should consider the following guidelines when creating assessments:
Frame results appropriately.
Evaluations with a sound methodology can still misleadingly frame results. Consider the following guidelines when interpreting results:
For further insights and recommendations from the SEI/OpenAI collaborators, find the full research paper at: Considerations for Evaluating Large Language Models for Cybersecurity Tasks by Jeffrey Gennari, Shing-hon Lau, Samuel Perl, Joel Parish (Open AI), and Girish Sastry (Open AI).
In thinking about the adoption of disruptive technologies, the best mental model is not one that layers these technologies on our existing stacks, but rather rethinks the whole of the system from first principles and seeks to displace and replace with new approaches.
The ability to adapt and also revert to first principles will be a necessity of governance as well. First principles, the fundamental concepts or assumptions at the heart of any system, serve as the bedrock upon which the future is built. In government, this approach necessitates a return to the core values and constitutional tenets that define a nation’s identity and purpose. It’s about stripping down complex policy issues to their most basic elements and rebuilding them in a way that is both innovative and cognizant of the historical context.
When it comes to economics and money, a first principles mindset could lead to a reevaluation of foundational economic theories, potentially fostering new forms of currency or novel financial instruments that could reshape markets. This is evident in the emergence of digital currencies and the underlying blockchain technology, which challenge traditional banking paradigms and redefine value exchange.
In the realm of engineering, applying first principles thinking often results in breakthrough innovations. By focusing on the fundamental physics of materials and processes, engineers can invent solutions that leapfrog over incremental improvements, much like how the aerospace industry has evolved with the advent of composite materials and computer-aided design. These disciplines, when underpinned by first principles, are not just adapting to change; they are the architects of the future, sculpting the landscape of what is to come.
Attacks in cyberspace seem to have no escalatory or deterrence consequences, especially in the realm of cybercrime as ransomware attacks doubled over the past year with increasing impacts on the global economy. In an era dependent on technology for advantage, the importance of developing novel approaches to cybersecurity issues can not be overstated.
The escalation of cyber threats, particularly ransomware, underscores a stark reality: our collective security posture must evolve with an urgency that matches the ingenuity of our adversaries. The doubling of ransomware attacks is not merely a statistic; it is a clarion call for a paradigm shift in how we conceptualize and implement cybersecurity measures. New concepts for how we jurisdiction attacks and disrupt the economic incentives of the attackers are required. We must also embrace a more proactive stance, integrating advanced technologies like artificial intelligence and machine learning to predict and preempt attacks before they occur . Furthermore, the convergence of cybercrime with nation-state tactics necessitates a more nuanced understanding of the threat landscape, where strategic defense and risk management become as critical as tactical responses.
The future of cybersecurity lies in our ability to outpace the adaptability of threat actors, ensuring that the defenses we construct are not only resilient but also intelligent, capable of learning from each attack to bolster our protective measures. This requires a commitment to continuous innovation and the development of cybersecurity strategies that are as dynamic as the threats they aim to thwart. As we’ve seen, attackers often exploit the weakest link, which may not be within our own organizations but within our supply chains, turning trusted partners into potential vulnerabilities.
Additional OODA Loop Resources
Corporate Board Accountability for Cyber Risks: With a combination of market forces, regulatory changes, and strategic shifts, corporate boards and their directors are now accountable for cyber risks in their firms. See: Corporate Directors and Risk
Geopolitical-Cyber Risk Nexus: The interconnectivity brought by the Internet has made regional issues affect global cyberspace. Now, every significant event has cyber implications, making it imperative for leaders to recognize and act upon the symbiosis between geopolitical and cyber risks. See The Cyber Threat
Ransomware’s Rapid Evolution: Ransomware technology and its associated criminal business models have seen significant advancements. This has culminated in a heightened threat level, resembling a pandemic in its reach and impact. Yet, there are strategies available for threat mitigation. See: Ransomware, and update.
Challenges in Cyber “Net Assessment”: While leaders have long tried to gauge both cyber risk and security, actionable metrics remain elusive. Current metrics mainly determine if a system can be compromised, without guaranteeing its invulnerability. It’s imperative not just to develop action plans against risks but to contextualize the state of cybersecurity concerning cyber threats. Despite its importance, achieving a reliable net assessment is increasingly challenging due to the pervasive nature of modern technology. See: Cyber Threat
Decision Intelligence for Optimal Choices: The simultaneous occurrence of numerous disruptions complicates situational awareness and can inhibit effective decision-making. Every enterprise should evaluate their methods of data collection, assessment, and decision-making processes. For more insights: Decision Intelligence.
Proactive Mitigation of Cyber Threats: The relentless nature of cyber adversaries, whether they are criminals or nation-states, necessitates proactive measures. It’s crucial to remember that cybersecurity isn’t solely the responsibility of the IT department or the CISO – it’s a collective effort that involves the entire leadership. Relying solely on governmental actions isn’t advised given its inconsistent approach towards aiding industries in risk reduction. See: Cyber Defenses
The Necessity of Continuous Vigilance in Cybersecurity: The consistent warnings from the FBI and CISA concerning cybersecurity signal potential large-scale threats. Cybersecurity demands 24/7 attention, even on holidays. Ensuring team endurance and preventing burnout by allocating rest periods are imperative. See: Continuous Vigilance
Embracing Corporate Intelligence and Scenario Planning in an Uncertain Age: Apart from traditional competitive challenges, businesses also confront external threats, many of which are unpredictable. This environment amplifies the significance of Scenario Planning. It enables leaders to envision varied futures, thereby identifying potential risks and opportunities. All organizations, regardless of their size, should allocate time to refine their understanding of the current risk landscape and adapt their strategies. See: Scenario Planning