Part One Of Why Edit Distance Is Crucial – Chuck Leaver

Written By Jesse Sampson And Presented By Chuck Leaver CEO Ziften


Why do attackers use the same tricks over and over? The simple answer is that they still work. For instance, Cisco’s 2017 Cybersecurity Report tells us that, after years of decline, spam e-mail with malicious attachments is on the rise again. In that classic attack vector, malware authors often disguise their activity by using a filename similar to that of a common system process.

There is not necessarily any relationship between a file’s path name and its contents: anyone who has tried to hide sensitive information by giving it a boring name like “taxes”, or changed the extension on a file attachment to get around e-mail rules, knows this. Malware authors know it too, and will often name their malware to resemble common system processes. For instance, “iexplore.exe” is Internet Explorer, but “iexplorer.exe” with an extra “r” could be anything. Even experts can easily overlook this small difference.

The opposite problem, known .exe files running in unusual locations, is easy to solve using SQL sets and string functions.
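
The post solves this case in SQL; as a rough illustration of the same idea, here is a minimal Python sketch. The allow-list `EXPECTED_DIRS`, the function `unusual_locations`, and the sample paths are all hypothetical, chosen only to show the set-membership and string-splitting logic.

```python
import ntpath  # Windows path handling that works on any platform

# Hypothetical allow-list: executable name -> directories where it is expected.
EXPECTED_DIRS = {
    "svchost.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
    "iexplore.exe": {
        r"c:\program files\internet explorer",
        r"c:\program files (x86)\internet explorer",
    },
}


def unusual_locations(paths):
    """Return paths whose file name is known but whose directory is not expected."""
    flagged = []
    for path in paths:
        # Compare case-insensitively, as Windows paths are not case sensitive.
        directory, name = ntpath.split(path.lower())
        expected = EXPECTED_DIRS.get(name)
        if expected is not None and directory not in expected:
            flagged.append(path)
    return flagged


# Made-up sample observations.
observed = [
    r"C:\Windows\System32\svchost.exe",
    r"C:\Users\alice\AppData\Local\Temp\svchost.exe",
    r"C:\Program Files\Internet Explorer\iexplore.exe",
]
print(unusual_locations(observed))
```

Names that are not in the allow-list are ignored here, which is why this check cannot catch look-alike names such as “iexplorer.exe”.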


What about the other case, finding close matches to the executable name? Most people begin looking for close string matches by sorting the data and visually scanning for discrepancies. This usually works well for a small data set, perhaps even a single system. Finding these patterns at scale, however, requires an algorithmic approach. One established technique for “fuzzy matching” is edit distance, which in its most common form (Levenshtein distance) counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another.
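
To make the idea concrete, here is a minimal Python sketch of Levenshtein edit distance using the standard dynamic-programming recurrence. It is only an illustration: the function name `edit_distance` is an assumption of this sketch, and it is not the Vertica function the post relies on.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the row short
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(edit_distance("iexplore.exe", "iexplorer.exe"))  # 1
print(edit_distance("svchost.exe", "cvshost.exe"))     # 2
```

Keeping only two rows of the table keeps memory proportional to the shorter string, which matters little for file names but is the usual way to write it.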

What’s the best way to calculate edit distance? At Ziften, our technology stack includes HP Vertica, which makes this task easy. The internet is full of data scientists and data engineers singing Vertica’s praises, so it will suffice to say that Vertica makes it easy to create custom functions that take full advantage of its power, from C++ power tools to statistical modeling scalpels in R and Java.

This Git repo is maintained by Vertica enthusiasts working in industry. It’s not an official offering, but the Vertica team is certainly aware of it, and moreover is thinking every day about ways to make Vertica more useful for data scientists, so it is a good space to watch. Best of all, it includes a function to calculate edit distance! It also includes other tools for natural language processing, such as word stemmers and tokenizers.

By applying edit distance to the top executable paths, we can quickly find the closest match to each of our top hits. This is an interesting data set because we can sort by distance to find the closest matches across the whole data set, or sort by frequency of the top path to see the closest match to our most commonly used processes. This data can also be surfaced on contextual “report card” pages to show, e.g., the top five closest strings for a given path. Below is a toy example to give a sense of usage, based on real data ZiftenLabs observed in a customer environment.


In our experience, setting an upper limit of 0.2 gives good results, but the point is that this threshold can be tuned to fit individual use cases. Did we find any malware? We notice that “teamviewer_.exe” (should be just “teamviewer.exe”), “iexplorer.exe” (should be “iexplore.exe”), and “cvshost.exe” (should be “svchost.exe”, unless perhaps you work for CVS pharmacy…) all look odd. Since we’re already in our database, it’s also trivial to pull the associated MD5 hashes, Ziften suspicion scores, and other attributes for a deeper dive.
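
The thresholding step might look roughly like the Python sketch below. It assumes the 0.2 limit applies to edit distance divided by the length of the longer name; the post does not state how the distance is scaled, so that normalization, the function names, and the list of “common” names are assumptions of this sketch rather than Ziften’s actual query.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Compact memoized Levenshtein distance, adequate for short file names."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i + j  # distance to or from an empty prefix
        return min(
            d(i - 1, j) + 1,                           # delete
            d(i, j - 1) + 1,                           # insert
            d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # substitute or match
        )
    return d(len(a), len(b))


def normalized_distance(a, b):
    """Assumed scaling: edit distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def near_matches(common_names, observed_names, threshold=0.2):
    """Pair each observed name with its nearest common name if 0 < distance <= threshold."""
    hits = []
    for name in observed_names:
        best = min(common_names, key=lambda known: normalized_distance(name, known))
        score = normalized_distance(name, best)
        if 0 < score <= threshold:  # skip exact matches and distant names
            hits.append((name, best, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[2])  # closest look-alikes first


# Hypothetical inputs built from the names discussed in the post.
common = ["teamviewer.exe", "iexplore.exe", "svchost.exe", "chrome.exe"]
observed = ["teamviewer_.exe", "iexplorer.exe", "cvshost.exe", "svchost.exe", "notepad.exe"]
for hit in near_matches(common, observed):
    print(hit)
```

Under this assumed scaling, all three odd names fall below 0.2, while exact matches and unrelated names such as “notepad.exe” are left out. Raising the threshold surfaces more candidates at the cost of more false positives.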


In this particular real-world environment, it turned out that teamviewer_.exe and iexplorer.exe were portable applications, not known malware. We helped the customer investigate the user and system where we observed the portable applications, since use of portable apps on a USB drive could be evidence of illicit activity. The more disturbing find was cvshost.exe. Ziften’s intelligence feeds indicate that this is a suspicious file. Looking up the MD5 hash for this file on VirusTotal confirms the Ziften data, indicating that this is a potentially serious Trojan that could be part of a botnet or doing something even more malicious. Once the malware was found, however, it was easy to resolve the issue and make sure it stays resolved, using Ziften’s ability to kill and persistently block processes by MD5 hash.

Even as we develop advanced predictive analytics to detect malicious patterns, it is important that we keep improving our ability to hunt for known patterns and old tricks. Just because new threats emerge doesn’t mean the old ones go away!

If you enjoyed this post, watch this space for the second part of this series, where we will apply this method to hostnames to detect malware droppers and other malicious sites.

