February 11th, 2008 | Published in Google Online Security
It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed. Our research paper is currently under peer review, but we are making a technical report [PDF] available now. Although our technical report contains a lot more detail, we present some high-level findings here:
Search Results Containing a URL Labeled as Harmful
The above graph shows the percentage of daily queries that contain at least one search result labeled as harmful. In the past few months, more than 1% of all search results contained at least one result that we believe to point to malicious content and the trend seems to be increasing.
Good computer hygiene, such as running automatic updates for the operating system and third-party applications, as well as installing anti-virus products goes a long way in protecting your home computer. However, we have been wondering if users' browsing habits impact the likelihood of encountering malicious web pages. To study this aspect, we took a sample of ~7 million URLs and mapped them to DMOZ categories. Although we found that adult web pages may increase the risk of exploitation, each DMOZ category was affected.
Malicious Content Injection
To understand if malicious content on a web server is due to poor web server security, we analyzed the version numbers reported by web servers on which we found malicious pages. Specifically, we looked at the Apache and the PHP versions exported as part of a server's response. We found that over 38% of both Apache and PHP versions were outdated increasing the risk of remote content injection to these servers.
Our "Ghost In the Browser [PDF]" paper highlighted third-party content as one potential vector of malicious content. Today, a lot of third-party content is due to advertising. To assess the extent to which advertising contributes to drive-by downloads, we analyze the distribution chain of malware, i.e. all the intermediary URLs a browser downloads before reaching a malware payload. We inspected each distribution chain for membership in about 2,000 known advertising networks. If any URL in the distribution chain corresponds to a known advertising network, we count the whole page as being infectious due to Ads. In our analysis, we found that on average 2% of malicious web sites were delivering malware via advertising. The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected, any advertising networks practicing syndication needs to carefully study this problem. Our technical report [PDF] contains more detail including an analysis based on the popularity of web sites.
Structural Properties of Malware Distribution
Finally, we also investigated the structural properties of malware distribution sites. Some malware distribution sites had as many as 21,000 regular web sites pointing to them. We also found that the majority of malware was hosted on web servers located in China. Interestingly, Chinese malware distribution sites are mostly pointed to by Chinese web servers.
We hope that an analysis such as this will help us to better understand the malware problem in the future and allow us to protect users all over the Internet from malicious web sites as best as we can. One thing is clear - we have a lot of work ahead of us.