CAIDA: Cooperative Association for Internet Data Analysis
UCSD Network Telescope -- Code-Red Worms Dataset

The Dataset on the Code-Red Worms

The first incarnation of the Code-Red worm (CRv1) began to infect hosts running unpatched versions of Microsoft's IIS webserver on July 12th, 2001. This first version of the worm used a static seed for its random number generator. Then, around 10:00 UTC on July 19th, 2001, a random-seed variant of the Code-Red worm (CRv2) appeared and spread. This second version shared almost all of its code with the first version, but spread much more rapidly. Next, on August 4th, 2001, a new worm began to infect machines by exploiting the same vulnerability in Microsoft's IIS webserver as the original Code-Red worm. Although the new worm had no relationship to the first one beyond exploiting the same vulnerability, it contained in its source code the string "CodeRedII" and was thus named CodeRedII. Finally, on September 18th, 2001, the Nimda worm began to spread via backdoors left by CodeRedII, as well as via email, open network shares, and compromised web sites.

This dataset contains information useful for studying the spread of the Code-Red version 2 and CodeRedII worms. It consists of a publicly available set of files containing summarized information that does not individually identify infected computers.

The Code-Red Dataset includes:
  • Publicly Available:
    • Code-Red July: the first Code-Red version 2 outbreak (July 19-20, 2001)
      • distribution of start and end times of hosts performing port 80 TCP SYN scanning
      • distribution of durations of time code-redv2-infected computers were observed to be scanning
      • country distribution of code-redv2-infected computers
      • a file containing a table with the following eight tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number, and AS name
    • Code-Red August: the second Code-Red version 2 outbreak and beginning of the spread of the CodeRedII worm (August 1-20, 2001)
      • distribution of start and end times of hosts performing port 80 TCP SYN scanning
      • distribution of durations of time code-redv2-infected computers were observed to be scanning
      • country distribution of code-redv2-infected computers
      • a file containing a table with the following seven tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number
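
As an illustration of how the per-address tables described above might be read, here is a minimal Python sketch. It assumes that start and end times are stored as numeric timestamps in seconds and that the file has no quoted fields; neither detail is stated on this page, so check the actual files first. The names `Record`, `parse_table`, `durations`, and `country_counts` are hypothetical, and the two sample rows are invented for demonstration.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    # Field order follows the dataset description. The eighth field
    # (AS name) is listed only for the July table.
    start: float
    end: float
    tld: str
    country: str
    latitude: str
    longitude: str
    asn: str
    as_name: str


def parse_table(text: str) -> list[Record]:
    """Parse tab-separated rows with seven or eight fields."""
    records = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) not in (7, 8):
            continue  # skip blank or malformed lines
        try:
            # Assumption: times are numeric (e.g. seconds since the epoch).
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip header lines or non-numeric times
        row = row + [""] * (8 - len(row))  # pad the missing AS name
        records.append(Record(start, end, *row[2:]))
    return records


def durations(records: list[Record]) -> list[float]:
    """Observed scanning duration, in seconds, for each record."""
    return [r.end - r.start for r in records]


def country_counts(records: list[Record]) -> Counter:
    """Number of records per country field."""
    return Counter(r.country for r in records)


# Invented sample rows: one with eight fields, one with seven.
SAMPLE = (
    "995536800.0\t995540400.0\tcom\tUS\t32.88\t-117.23\t64500\tEXAMPLE-AS\n"
    "995537000.0\t995537600.0\tnet\tKR\t37.57\t126.98\t64501\n"
)

if __name__ == "__main__":
    recs = parse_table(SAMPLE)
    print(durations(recs))
    print(country_counts(recs))
```

Latitude, longitude, and AS number are kept as strings here because the page does not say how missing values are encoded.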

Data source:
  • Code-Red July:
    The data sources for this dataset are packet headers collected from a /8 network at UCSD (the UCSD Network Telescope), timestamp/IP address pairs for TCP SYN packets received by two /16 networks at Lawrence Berkeley Laboratory (LBL), and sampled netflow from a router upstream of the /8 network at UCSD. These three data sources are used to maximize coverage of the expansion of the worm. Between midnight and 16:30 UTC, a passive network monitor recorded headers of all packets destined for the /8 research network. After 16:30 UTC, a filter installed on a campus router to reduce congestion caused by the worm blocked all external traffic to this network. Because this filter was put into place upstream of the monitor, we were unable to capture IP packet headers after 16:30 UTC. However, a second UCSD data set consisting of sampled netflow output from the filtering router was available at the UCSD site throughout the 24-hour period. Vern Paxson provided probe information collected by Bro on the LBL networks between 10:00 UTC on July 19, 2001 and 7:00 on July 20, 2001. We have merged these three sources to produce the Code-Red July dataset.
  • Code-Red August:
    The data source for this dataset includes only packet headers collected by a passive monitor on a /8 network at UCSD (the UCSD Network Telescope). Beginning August 4th, this data contains a mix of hosts infected by Code-Red version 2 and CodeRedII. It is not possible to determine which worm caused a host to send TCP SYN packets to port 80.
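
The page does not describe how the three July sources were combined. Purely as an illustration, one simple way to merge several feeds of timestamp/IP address pairs into a first-seen and last-seen time per address is sketched below in Python. This is an assumption, not CAIDA's procedure. The function name `merge_sources` is hypothetical, and the sample observations use documentation-only addresses with invented times.

```python
from typing import Iterable

# One observation: (timestamp in seconds, source IP address as text).
Observation = tuple[float, str]


def merge_sources(*sources: Iterable[Observation]) -> dict[str, tuple[float, float]]:
    """Return {ip: (first_seen, last_seen)} across all given sources."""
    seen: dict[str, tuple[float, float]] = {}
    for source in sources:
        for ts, ip in source:
            if ip in seen:
                first, last = seen[ip]
                seen[ip] = (min(first, ts), max(last, ts))
            else:
                seen[ip] = (ts, ts)
    return seen


# Invented sample data standing in for the three kinds of source.
TELESCOPE = [(100.0, "192.0.2.1"), (250.0, "192.0.2.1")]
LBL = [(50.0, "192.0.2.1"), (300.0, "198.51.100.7")]
NETFLOW = [(400.0, "198.51.100.7")]

if __name__ == "__main__":
    for ip, (first, last) in merge_sources(TELESCOPE, LBL, NETFLOW).items():
        print(ip, first, last)
```

Taking the minimum and maximum over all sources means that a gap in one source (such as the packet header capture ending at 16:30 UTC) can be covered by another source that was still recording.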

Caveats that apply to this dataset:
  • The .ida vulnerability utilized by the Code-Red worms was exploited via TCP connections to port 80. Because the UCSD Network Telescope did not respond to connection attempts, no TCP connection was ever established and the worm's exploit payload was never received, so worm probes cannot be told apart from other connection attempts. This dataset therefore does not consist solely of worm traffic: all TCP SYN packets received on port 80 are included in these summaries, including non-worm traffic.
  • The DHCP effect significantly impacts this dataset, particularly after the first 24 hours of each cycle of worm spread. Changing IP addresses on dynamically addressed machines cause an order-of-magnitude difference between the number of IP addresses active in any two-hour period and the number of IP addresses active in a week. This dataset does not include IP addresses, so keep in mind that each start/end time or duration does not necessarily uniquely identify an infected computer. It identifies only a newly active IP address, with no information about whether that IP address represents a computer previously known to be infected.
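
To make the DHCP caveat concrete, the following Python sketch compares the number of records active in a single two-hour window with the total number of records. It assumes each record is reduced to a (start, end) pair in seconds; the function name `active_in_window` is hypothetical and the intervals are invented. A total that is far larger than any short-window count is what address churn would be expected to produce, so the total should not be read as a count of infected machines.

```python
def active_in_window(intervals, window_start, window_end):
    """Count records whose [start, end] interval overlaps the window."""
    return sum(
        1
        for start, end in intervals
        if start <= window_end and end >= window_start
    )


# Invented (start, end) pairs in seconds, spread over several days.
INTERVALS = [
    (0, 3600),
    (1800, 5400),
    (90000, 93600),
    (200000, 203600),
]

if __name__ == "__main__":
    two_hours = 2 * 3600
    print("active in first two hours:", active_in_window(INTERVALS, 0, two_hours))
    print("records over whole period:", len(INTERVALS))
```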

Data Use Restrictions

Acceptable Use Policy for the public access files of the Dataset on the Code-Red Worms

  1. Code-Red worm data, including every file in the Dataset on the Code-Red Worms, will not be redistributed.

  2. I will not attempt to connect to, probe, or in any other way initiate contact with a machine or machine administrator identified via the Code-Red worm data.

  3. In so far as possible, privacy of end users (hosts) and networks monitored by the network telescope will be respected by the researchers. Any publications will anonymize, aggregate or summarize IP addresses, network names, and domain names, as appropriate when the disclosure of such information may present a security risk to those organizations or the general Internet.

  4. At the end of the research, or semi-annually (whichever is less), a summary of the research and any findings/conclusions will be reported to CAIDA. If any research is described on the WWW, a URL will be provided. This information is primarily used in reports to our funding agencies.

  5. All users who publish a document (including web pages and papers) using data from this dataset must provide CAIDA with a copy of the publication and must cite:

    The CAIDA Dataset on the Code-Red Worms - July and August 2001, David Moore, Colleen Shannon, and kc claffy http://www.caida.org/data/passive/codered_worms_dataset.xml.

  6. Users are encouraged, but not required, to include the following attribution in the acknowledgments section of their document:

    Support for the CAIDA Dataset on the Code-Red Worms was provided by Cisco Systems, the US Department of Homeland Security, the National Science Foundation, DARPA, and CAIDA Members.

  7. All users who create a publicly available presentation using data from this dataset must provide CAIDA with a copy of the presentation and must use the full name of the dataset ("The CAIDA Dataset on the Code-Red Worms") in the presentation. Users are further encouraged, but not required, to include the URL for the dataset (http://www.caida.org/data/passive/codered_worms_dataset.xml) in their presentation.

Code-Red Dataset Access References

This dataset is cataloged in DatCat with handle http://imdc.datcat.org/collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset.

For more information on the Code-Red-related worms (Code-Redv1, Code-Redv2, CodeRedII), see:

Acknowledgments

Special thanks to Brian Kantor, Jim Madden, and Pat Wilson at UCSD and Barry Greene at Cisco for support of the UCSD Network Telescope Project. Rapid coordination of all of these folks in the face of a network crisis, along with an equally rapid and incredibly generous equipment donation from Cisco, allowed the collection of this unique dataset.

UCSD Network Telescope Sponsors: Cisco Systems, National Science Foundation, Defense Advanced Research Projects Agency, U.S. Department of Homeland Security

The Dataset on the Code-Red Worms was sponsored by:

Cooperative Association for Internet Data Analysis (CAIDA)
  Last Modified: Wed Jul-1-2009 10:59:19 PDT
  Page URL: http://www.caida.org/data/passive/codered_worms_dataset.xml