While supporting leading-edge cybersecurity and machine learning research at a top government lab, Booz Allen combined multiple innovations to create a groundbreaking, automated software tool to identify and classify malware.
The government’s Laboratory for Physical Sciences (LPS) is a unique agency where scientists from academia, industry, and government collaborate on research that advances the physics and engineering behind information science and technology. Booz Allen has long supported LPS’ research into advanced computing, machine learning, and cybersecurity, including the role that machine learning can play in addressing malware threats quickly and effectively.
In the battle against malware, cybersecurity analysts need to identify and classify the threats they encounter. To do this, analysts group malware into families that share common code and traits. They often use a software tool called Yara, which searches files for specific sequences of characters or bytes that are characteristic of known malware families. Analysts also write logical rules, known as Yara rules or “signatures,” that tell the tool how to combine those sequences when deciding whether a file matches.
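As a rough illustration of the idea described above, the following Python sketch mimics how a signature pairs a set of byte sequences with a logical condition. It does not use Yara itself or Yara's rule syntax. The `Signature` class, the byte patterns, and the sample data are all hypothetical and were made up for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Signature:
    """A hypothetical, simplified stand-in for a Yara rule."""
    name: str
    patterns: Sequence[bytes]  # byte sequences to search for
    min_matches: int           # logical condition: at least this many must occur

    def matches(self, data: bytes) -> bool:
        # Count how many of the patterns occur anywhere in the data.
        hits = sum(1 for p in self.patterns if p in data)
        return hits >= self.min_matches


# Made-up byte sequences; not taken from any real malware family.
rule = Signature(
    name="ExampleFamily",
    patterns=[b"\xde\xad\xbe\xef", b"evil_mutex_01", b"\x90\x90\xcc"],
    min_matches=2,
)

sample_a = b"MZ....evil_mutex_01....\xde\xad\xbe\xef...."
sample_b = b"MZ....harmless program....\x90\x90\xcc"

print(rule.matches(sample_a))  # True: two of the three patterns are present
print(rule.matches(sample_b))  # False: only one pattern is present
```

Real Yara rules support far richer conditions (offsets, wildcards, regular expressions, file-size checks), so this sketch captures only the basic "sequences plus a logical condition" shape.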
Yara rules are used in many situations, such as when responding to cybersecurity incidents, determining whether devices or networks have been compromised, and improving an organization’s defenses through proactive malware detection.
A longstanding problem with Yara rules, however, is that cybersecurity analysts need to build them manually. This manual process is tedious and highly time-consuming, even for seasoned cybersecurity pros. In many cases, it might take hours or days to write an effective Yara rule for certain classes of malware. For highly complicated cases, cybersecurity analysts may simply give up on creating the needed sequences and rules, because they have too many other tasks to do and not enough time. This is problematic given the amount of malware that exists (more than 1.3 billion malware samples have been identified) and the number of cyberattacks that occur.
In 2020, a team of Booz Allen cybersecurity researchers developed a novel way to use machine learning and other innovations to automate the process of building a Yara rule. The solution, called AutoYara, is a highly configurable tool that produces effective, accurate Yara rules in minutes or seconds, dramatically reducing the time typically needed. Moreover, AutoYara is compact enough to be deployed on a typical laptop or in a remote-network environment.
A Java-based software package, AutoYara incorporates three key innovative approaches: KiloGrams, biclustering, and Bloom filters.
Booz Allen did not originally set out to develop an automated Yara tool. Rather, the journey to develop AutoYara began with Booz Allen’s groundbreaking work to develop a new algorithm that would produce KiloGrams for practical malware analysis. Once the ideas behind KiloGrams were fleshed out, the Booz Allen cybersecurity research team realized it could apply that innovation to malware identification and classification, and to the Yara rules used for those tasks.
By adding the biclustering and Bloom filter components to the concept (and after more than a year of engineering iterations), the Booz Allen team was able to build the AutoYara tool and refine its performance and practicality in the field. In September 2020, LPS made the AutoYara tool available for download on the LPS GitHub site.
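This article does not describe how AutoYara combines these components internally, so the Python sketch below is only a generic illustration of two of the underlying ideas: extracting byte n-grams from files, and using a Bloom filter as a compact record of n-grams seen in benign files. It assumes, for illustration only, that useful candidates are n-grams shared by every malware sample and absent from the benign filter, which is a much cruder grouping than biclustering. All class names, function names, parameters, and data are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Set


class BloomFilter:
    """Compact set membership: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """All distinct byte n-grams in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def candidate_ngrams(samples: Iterable[bytes], benign: BloomFilter, n: int = 8) -> Set[bytes]:
    """N-grams present in every sample and not recorded in the benign filter."""
    shared = set.intersection(*(ngrams(s, n) for s in samples))
    return {g for g in shared if g not in benign}


# Made-up data for illustration only.
benign_files = [b"HEADER__common_runtime_init__FOOTER"]
malware_files = [
    b"HEADER__payload_decrypt_loop__AAAA",
    b"HEADER__payload_decrypt_loop__BBBB",
]

benign_filter = BloomFilter()
for f in benign_files:
    for g in ngrams(f, 8):
        benign_filter.add(g)

candidates = candidate_ngrams(malware_files, benign_filter, n=8)
print(b"HEADER__" in candidates)  # False: this n-gram also occurs in the benign file
print(len(candidates))
```

The appeal of a Bloom filter in this role is its size: it stores only a fixed-length bit array rather than the n-grams themselves, at the cost of occasionally discarding a candidate because of a false positive.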
In summary, AutoYara was the result of Booz Allen’s ability to apply an innovative mindset to client problems, combined with deep expertise in cybersecurity research, machine learning research, and engineering.
Real-world testing by malware analysts indicates that AutoYara can reduce the time that analysts spend constructing Yara rules by between 44 percent and 86 percent. This allows the analysts to spend their time instead on the kinds of advanced malware that current tools cannot handle. Our test results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes performing as well as or better than human analysts. This is valuable at a time when cybersecurity experts and analysts are increasingly in short supply at many organizations.
The Booz Allen team presented its work on AutoYara in the article “Automatic Yara Rule Generation Using Biclustering” at the 13th ACM Workshop on Artificial Intelligence and Security (AISec'20), where it won the Award for Best Paper.