By Rich Heimann, Chief AI Officer at Cybraics & Marvin Wheeler, CEO at Cybraics and Chief Innovation Officer at SilverSky
Traditional threat detection (e.g., rule-based intrusion detection, anti-virus systems, and threat intelligence feeds) has been reactive and not a reliable way of preventing threats. While a database full of past events is better than an empty one, threat detection tools matter only insofar as they relate to threats your organization faces. Since there is no guarantee that traditional solutions will provide any protection, these solutions must serve a subordinate role to learning directly from adversaries using computational tools like statistical and machine learning to detect aberrant behavioral patterns.
Machine learning can be applied effectively to threat detection because adversaries operate with incomplete information about your defenses. External adversaries are scanning, phishing, acquiring credentials, escalating privileges, evading defenses, moving laterally, and collecting information. These pursuits provide an opportunity to learn from trace evidence left behind by adversaries, including their tradecraft or lack of tradecraft, successes, failures, errors, oversights, and accidents, to provide an early warning system. Remarkably, malicious tradecraft in the early stages of so-called reconnaissance is what traditional solutions ignore, despite the information asymmetry between attacker and defender strongly favoring the defender. Still, as the name suggests, “trace evidence” is a weak signal. A weak signal is any solution that performs slightly better than guessing, in contrast to a strong signal, which is near optimal. Learning with data produces such weak signals because machine learning is limited by the amount of information in data, making its strengths paradoxical.
Complete and perfect information exists in small worlds like chess but not in the real world. Just as there is no guarantee that traditional solutions will provide any protection, there is never an a priori guarantee that your data contains enough information to predict what you need. The technical reasons deserve a full-length article. However, a weak signal for machine learning generally means that features for learning are non-discriminative, or classes and distributions overlap and cannot be easily distinguished. Therefore, the resulting solutions may only perform slightly better than guessing. The problem’s complexity and the poor informational value of cyber data are reasons security paradigms like zero trust proliferate. The implicit declaration of such paradigms is this: it is more manageable for everyone to prove their trustworthiness than to discover malicious activity and nefarious actors in network traffic.
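To make "classes and distributions overlap" concrete, here is a minimal sketch in Python. It assumes a single hypothetical feature whose values for benign and malicious activity both follow unit-variance normal distributions, with equal class priors; the function name and the separation values are illustrative and do not come from any real dataset or product. Under those assumptions, the best possible threshold sits midway between the two means, and the accuracy of that threshold shows how little a heavily overlapping feature can tell you.

```python
from statistics import NormalDist


def best_single_feature_accuracy(separation: float) -> float:
    """Accuracy of the midpoint threshold on one feature.

    Assumes (for illustration only) that benign values are N(0, 1),
    malicious values are N(separation, 1), and both classes are
    equally likely.
    """
    benign = NormalDist(mu=0.0, sigma=1.0)
    malicious = NormalDist(mu=separation, sigma=1.0)
    threshold = separation / 2  # midpoint between the two class means

    # Benign events correctly left alone (value below the threshold).
    true_negative_rate = benign.cdf(threshold)
    # Malicious events correctly flagged (value above the threshold).
    true_positive_rate = 1.0 - malicious.cdf(threshold)

    # Equal priors: overall accuracy is the average of the two rates.
    return 0.5 * (true_negative_rate + true_positive_rate)


if __name__ == "__main__":
    for separation in (0.0, 0.25, 1.0, 4.0):
        accuracy = best_single_feature_accuracy(separation)
        print(f"separation={separation:<4} accuracy={accuracy:.3f}")
```

With no separation the feature is no better than a coin flip, and with a small separation it is only slightly better, which is the sense of "weak signal" used here. Real cyber data is rarely this tidy, so treat the sketch as intuition rather than a model of any particular log source.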
In this context, we must not confuse “learning” with learning new behaviors, threats, or attack vectors. Machine learning is not a blank slate. Unlike informal learning, which humans engage in, machine learning will not learn new threats in new domains without being purpose-built. That’s because machine learning fits a function to data as closely as possible locally, regardless of how it performs outside these situations. While machine learning is reasonably good at classifying noisy inputs based on known situations, it struggles with zero-days and advanced persistent threats (APTs) because we lack labeled data and complete homogenized datasets. Threats cannot be identified in advance, and adversarial tradecraft is distributed across datasets. Therefore, no single learning paradigm or algorithm, supervised machine learning included, will suffice as a unitary response to threat detection.
The point is that there are no perfect, unitary, error-free solutions. Traditional security tools are countless, subvert information advantages, and lag far behind new and adaptive threats. However, they are stable and represent a way of putting aspects of a problem that we understand back into a solution. Computational tools cannot be error-free because we do not have complete and perfect information. You will have to do everything possible to squeeze out weak signals from perimeter device logs, network logs, event logs, endpoint logs, and application logs. Learning directly from adversaries rather than vicariously from peers is more robust, but it does not replace other, weaker signals. Consider that driving a car using your windshield is superior to using your mirrors. Also, consider that your windshield didn’t replace your mirrors, which are still required because you still need to check your blind spots and even drive in reverse.
Immanuel Kant famously said, “Out of the crooked timber of humanity, no straight thing was ever made.” This article shows that Kant’s proposition is no less true for threat detection. Still, some problem-solving strategies are better than others, and threat detection can be optimally managed with weak signals. I know. Promoting weak signals must seem odd, especially after writing over 700 words criticizing weak signals. However, the counterintuitive defense of weak signals makes sense to anyone familiar with the so-called wisdom of the crowds. The wisdom of crowds suggests that aggregation plays an essential role in learning and that groups are better at making predictions than individual experts. This observation is valid for weak signals aggregated together, which shows that weak and strong aren’t as dichotomous as we might think. In fact, a counterintuitive mathematical proof shows that weak and strong are equivalent (“The Strength of Weak Learnability”). This counterintuitive proof is the basis for ensemble learning. Although all ensemble learning follows this general strategy, some specific differences and optimizations exist beyond this article. Nevertheless, the strength of weak signals is essential for threat detection.
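The aggregation argument can be illustrated with a short Python calculation. This is a sketch of the simplest case, a majority vote, and not the boosting construction from the cited proof or any vendor's implementation. It assumes every detector is right with the same probability and, importantly, that detectors err independently of one another, which real detectors fed by correlated logs will only approximate. The function name and the 0.55 accuracy figure are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_detectors: int, p_correct: float) -> float:
    """Probability that a strict majority of detectors is correct.

    Assumes (for illustration only) that each of the n detectors is
    independently correct with probability p_correct.
    """
    if n_detectors < 1 or n_detectors % 2 == 0:
        raise ValueError("use a positive, odd number of detectors to avoid ties")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")

    smallest_majority = n_detectors // 2 + 1
    # Sum the binomial probabilities of every outcome in which at
    # least a strict majority of detectors gives the right answer.
    return sum(
        comb(n_detectors, k)
        * p_correct**k
        * (1.0 - p_correct) ** (n_detectors - k)
        for k in range(smallest_majority, n_detectors + 1)
    )


if __name__ == "__main__":
    for n in (1, 11, 101):
        accuracy = majority_vote_accuracy(n, 0.55)
        print(f"{n:>3} detectors at 0.55 each -> vote accuracy {accuracy:.3f}")
```

Under these assumptions, accuracy climbs as more slightly-better-than-guessing detectors are added, while detectors at exactly 0.5 gain nothing from voting. The independence assumption carries the result: detectors that make the same mistakes together add far less than the calculation suggests.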
At Cybraics, we have taken a similar approach based on the team’s exclusive observations supporting threat detection research conducted at DARPA. Cybraics’ solution uses meta-algorithms for distributed learning over the whole cybersecurity problem. Meta-algorithms significantly reduce false positives by combining weak signals, which achieve maximal coverage over the threat space, thereby reducing false negatives. Such a strategy treats the threat detection problem respectfully as dispersed and connected. The reward is an early warning system that provides a strong signal of noticeably high impact above a confidence threshold for human curation or automated response.
The title of this article emphasizes the counterintuitive relationship that weak signals share with strong security. Weak and strong are juxtaposed to show that these two concepts are not opposites. In fact, within the correct architecture, weak and strong are equivalent. To place next to—or to juxtapose—is to reevaluate something based on a sharp proximal change. In other words, to notice things in juxtaposition is to see things side by side and better understand a given concept. Not opposed, but juxtaposed. Not opposite, but adjacent. Weak signals are an unfortunate reality we don’t have to ignore or hide but instead embrace.