Statistical URL Analyzer

In today’s world, social networking has become the de facto way to stay in touch, and more and more people are connecting with each other to bridge the geographical gap. You will find all sorts of people and all sorts of invitations. But how many of these invitations are genuine?

Links embedded within an email have long been used to serve malware or steal credentials. Those who know about this recognize them, while others fall prey. Drive-by downloads are the worst offenders: the victim has no idea what has hit them.

A malware analyst has a plethora of tools at their disposal. They may use VMs and sandboxes to ensure that their own machine doesn’t get infected, so that they can carry out the analysis and protect users from future attacks.

A few days ago, I received an email from a colleague for analysis. The mail claimed to be a LinkedIn invitation. At first glance, it was evident that it was a phishing email: the links embedded within it pointed to some other domain, and the displayed image came from yet another domain.
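As a rough illustration of that first-glance check (this is not the analyzer's code), the sketch below lists every host referenced by the links and images in an HTML mail body that does not belong to the domain the mail claims to come from. The function names and the example.com, example.net and example.org hosts are assumptions made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostCollector(HTMLParser):
    """Collect the host names used by <a href> and <img src> in an HTML body."""

    def __init__(self):
        super().__init__()
        self.link_hosts = set()
        self.image_hosts = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.link_hosts.add(urlparse(attrs["href"]).hostname)
        elif tag == "img" and attrs.get("src"):
            self.image_hosts.add(urlparse(attrs["src"]).hostname)


def foreign_hosts(html_body, claimed_domain):
    """Return hosts in the mail that do not belong to the claimed sender domain."""
    collector = HostCollector()
    collector.feed(html_body)
    hosts = collector.link_hosts | collector.image_hosts
    return sorted(
        h for h in hosts
        if h and h != claimed_domain and not h.endswith("." + claimed_domain)
    )


# Hypothetical mail body: the link and the image live on two unrelated domains.
sample = (
    '<a href="http://links.example.net/invite.html">Accept invitation</a>'
    '<img src="http://static.example.org/logo.png">'
    '<a href="http://www.example.com/help">Help</a>'
)
print(foreign_hosts(sample, "example.com"))
```

A mismatch like this is only a hint, since legitimate mail often loads images from a separate content host, so it is better treated as one signal among several than as a verdict.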

This is the point where things start getting interesting. Normally, it is very rare for me to discuss the things we do in our research department, but today I will make an exception.

We have an analyzer, developed in-house, which we use to analyze the numerous links and URLs that we keep receiving from various sources. The goal of the analyzer is to correctly identify whether the submitted URL is clean or malicious.

Many anomalies may exist within the construct of a web page. Malware authors actively exploit them, and the analyzer detects these methods. The basic thought that went into its design was the difference in coding behavior between a normal web programmer and a malware author.

Using this analyzer makes analysis very easy for us: it takes less time, and we can be confident in its verdict on whether a URL is safe and clean or malicious. Initially there were many false positives (FPs), but over time they decreased, and today they are negligible.

As of this moment, in addition to phishing websites, we are also able to detect Blackhole exploits, drive-by downloads and SEO-poisoned websites, which are frequently used by domain squatters.

Today, I came across a research site that had studied scam mails based on LinkedIn invitations, and they did a commendable job of categorizing these sites as #Blackhole Exploits.

Since the very thought behind this algorithm is quite different and doesn’t comply with the existing nomenclature for categorizing (i.e. Blackhole Exploit Kit, Flashback), most of the categories have a very different naming convention, e.g. ‘SC1’, ‘M3’, ‘M21’ or something as weird as ‘RM_MT’. Whatever the category, and whatever the ‘result count’, every category is equally dangerous.

Some of the categories are #Experimental. Whether to allow the results of such a category or to ignore them completely is a decision that takes numerous tests and trials.

Secondly, this is a real-time analyzer, so many times while we are testing a particular category against an input URL we have received, the administrator of the server may end up cleaning the system. Recreating the entire scenario (the flow) is then the most difficult task.

Last but not least, some of the URLs are defunct due to the proactive nature of an admin and we end up getting a 0% detection, whereas services which rely on a database will still mark them as malicious. Database cleanup is another issue I would like to highlight.

I have come across malware authors who are proactive and highly protective against such intrusions, because we are not clicking on anything and not submitting anything precious. Many times, I have ended up using anonymous proxy services just to get hold of the website, and sometimes had to fake the browser and OS. Why? ‘Targeted attacks’ is the two-word answer.
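For readers who have not done this before, here is a minimal sketch of what faking the browser and OS and going through a proxy can look like with Python's standard urllib.request. It is not how our analyzer does it; the User-Agent string, the helper names and the example.com URL are assumptions chosen for the illustration, and the actual fetch is left commented out so that nothing touches the network.

```python
import urllib.request

# Hypothetical browser/OS identity; any other User-Agent string could be used.
FAKE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36"
)


def build_request(url, user_agent=FAKE_USER_AGENT):
    """Create a request that presents itself as a desktop browser."""
    return urllib.request.Request(url, headers={
        "User-Agent": user_agent,
        "Accept-Language": "en-US,en;q=0.8",
    })


def make_opener(proxy=None):
    """Return an opener that optionally routes traffic through an HTTP proxy."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


request = build_request("http://example.com/")
opener = make_opener()  # pass e.g. "http://127.0.0.1:8080" to go through a proxy
# opener.open(request, timeout=10) would perform the fetch; it is left out here
# so that the sketch does not touch the network.
print(request.get_header("User-agent"))
```

A server that profiles its visitors may look at much more than these two headers, so this only covers the simplest form of targeting.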

In the next blog post, I will be writing about Metasploit, the Social Engineering Toolkit and the Statistical URL Analyzer. What happens when yahoo.com or facebook.com, or for that matter any normal website, gets infected, or when there is a clone website for these genuine ones?

Eg. 1 Driveby Download

Checking : hxxp://blechvet.de/81shTho6/index.html
Downloading Script : infocen.org/QHaL3SXj/js.js
Downloading Script : www.hotel-lunadelsol.com/DAooDhHL/js.js
Downloading Script : www.stefanie-engelmann.de/8CcoTM2H/js.js
Downloading Site : 96.126.109.182/tid6mian.php?q=w5sa5su1wthouoz6

Sc1=1
JsDL=1
ScR=1

Malware Section Start
ML1=2
ApInv= 1
Malware Section End

Results=5
Analysis Time=0.079374588206357 secs
Total Time=6.09946603694241 secs
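If you want to work with reports like this one programmatically, the following is a small sketch of a parser, assuming the plain key=value layout shown above. The function name is made up for the illustration and is not part of the analyzer. In the examples in this post the Results value matches the number of category lines, so the sketch compares the two; that is an observation from these samples, not a documented rule.

```python
def parse_report(text):
    """Split an analyzer report into category flags and summary fields."""
    summary_keys = {"Results", "Analysis Time", "Total Time"}
    skipped_prefixes = ("Checking", "Downloading", "Server Header")
    flags, summary = {}, {}
    for line in text.splitlines():
        # URL lines can contain '=' in their query string, so skip them first.
        if "=" not in line or line.startswith(skipped_prefixes):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop trailing tags such as "#EXP" and the "secs" unit.
        value = value.split("#")[0].replace("secs", "").strip()
        if key in summary_keys:
            summary[key] = float(value)
        else:
            flags[key] = int(value)
    return flags, summary


report = """\
Checking : hxxp://blechvet.de/81shTho6/index.html
Downloading Site : 96.126.109.182/tid6mian.php?q=w5sa5su1wthouoz6
Sc1=1
JsDL=1
ScR=1

Malware Section Start
ML1=2
ApInv= 1
Malware Section End

Results=5
Analysis Time=0.079374588206357 secs
Total Time=6.09946603694241 secs
"""

flags, summary = parse_report(report)
print(flags)
print(len(flags) == int(summary["Results"]))
```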

Sorry, no image for this drive-by; it would require me to upload a video, maybe some day in the future.

Eg. 2 - This is a phishing site
Checking : hxxp://nusstop.com/
Server Header REDIRECTING to : 0a7da20731a1ee3a1ebd9ed15c961063
Ac=1
AcD2=1
AcD3=1 #Exp
M21=1 #EXP

Results=4
Analysis Time=0.0624399217671422 secs
Total Time=1.26769285470587 secs

What does the website look like?

Eg. 3 LinkedIn Invitations
Checking : hxxp://mtz.spb.ru/gagelink.html
Sc1=1

Results=1
Analysis Time=0.00475735238110351 secs
Total Time=1.83510875557389 secs

What does the email look like?

Eg. 4 Marketing Email
Checking : hxxp://cl.exct.net/?ju=fe3716717161057d7c1571&ls=fddd1379706d00747112747c&m=fef91275706402&l=fe99167077660d7a75&s=fe201d767462037a741579&jb=ffcf14&t=
Server Header REDIRECTING to : hxxp://aberdeen.com/Aberdeen-Library/7972/RA-business-intelligence-analytics.aspx

AcL=1
M21=1 #EXP

Results=2
Analysis Time=0.217457987527175 secs
Total Time=2.46597908619896 secs

What does the email look like?

DriveBy Download – Live Example at the time of publishing the blog. We have been observing a rise in email-based Drive-by Downloads.

Checking : hxxp://dimidi.com/10Yt5g3R/index.html
Downloading Script : bandwidth.jonatancolman.com.ar/VJTnWr0T/js.js
Downloading Script : new.directwhite.com/tmdsfx9H/js.js
Downloading Script : stukater.eu/9ypP1jm7/js.js
Downloading Site : 74.91.117.200/jbq98p6414wen6q.php?e=bcnvaimshyjv1d1c
Sc1=1 
JsDL=1
ScR=1

Malware Section Start
ML1=2
ApInv= 1
Malware Section End

Results=5
Analysis Time=0.102545514619282 secs
Total Time=6.94590359692043 secs