UCSD Network Telescope -- Code-Red Worms Dataset
The Dataset on the Code-Red Worms
The first incarnation of the Code-Red worm (CRv1) began to infect
hosts running unpatched versions of Microsoft's IIS webserver on July
12th, 2001. The first version of the worm used a static seed for its
random number generator. Then, around 10:00 UTC on July
19th, 2001, a random-seed variant of the Code-Red worm (CRv2) appeared
and spread. This second version shared almost all of its code with the
first version, but spread much more rapidly. Next, on August 4th, a new
worm began to infect machines exploiting the same vulnerability in
Microsoft's IIS webserver as the original Code-Red worm. Although the
new worm had no relationship to the first one outside of exploiting the
same vulnerability, it contained in its source code the string
"CodeRedII" and was thus named CodeRed II. Finally, on September 18,
2001, the Nimda worm began to spread via backdoors left by CodeRedII,
as well as via email, open network shares, and compromised web sites.
This dataset contains information useful for studying the spread of the
Code-Red version 2 and CodeRedII worms. The dataset consists of a
publicly available set of files containing summarized information that
does not individually identify infected computers.
The Code-Red Dataset includes:
-
Publicly Available:
-
Code-Red July: the first Code-Red version 2 outbreak (July 19-20, 2001)
- distribution of start and end times of hosts performing port 80 TCP SYN scanning
- distribution of durations of time code-redv2-infected computers were observed to be scanning
- country distribution of code-redv2-infected computers
- a file containing a table with the following eight tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number, and AS name
-
Code-Red August: the second Code-Red version 2 outbreak and beginning of the spread of the CodeRedII worm (August 1-20, 2001)
- distribution of start and end times of hosts performing port 80 TCP SYN scanning
- distribution of durations of time code-redv2-infected computers were observed to be scanning
- country distribution of code-redv2-infected computers
- a file containing a table with the following seven tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number
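The two table layouts described above differ only in the trailing AS name column, so one reader can handle both. The following is a minimal sketch, not a description of official tooling: the field labels, the absence of a header row, and the sample rows are all assumptions made for illustration. Values are left as strings because the encoding of the time fields is not specified here.

```python
import csv
import io

# Illustrative labels for the columns listed above (hypothetical names;
# the files are assumed to carry no header row).
FIELDS = [
    "start_time", "end_time", "tld", "country",
    "latitude", "longitude", "as_number", "as_name",
]

def read_records(handle):
    """Yield one dict per row of a tab-separated Code-Red table.

    Rows with eight fields (July layout) keep every column.
    Rows with seven fields (August layout) get as_name=None.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) not in (7, 8):
            raise ValueError(f"expected 7 or 8 fields, got {len(row)}")
        padded = row + [None] * (len(FIELDS) - len(row))
        yield dict(zip(FIELDS, padded))

# Made-up sample rows, not taken from the dataset.
sample = io.StringIO(
    "100\t200\tcom\tUS\t32.7\t-117.2\t64500\tEXAMPLE-AS\n"
    "300\t450\tnet\tKR\t37.6\t127.0\t64501\n"
)
records = list(read_records(sample))
```

In practice the `io.StringIO` sample would be replaced by an open file handle for one of the downloaded tables.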
Data source:
-
Code-Red July:
The data source for this dataset includes packet headers
collected from a /8 network at UCSD (the UCSD Network
Telescope), timestamp/IP address pairs for TCP SYN packets
received by two /16 networks at Lawrence Berkeley Laboratory
(LBL), and sampled netflow from a router upstream of the /8
network at UCSD. These three data sources are used to maximize
coverage of the expansion of the worm. Between midnight and
16:30 UTC, a passive network monitor recorded headers of all
packets destined for the /8 research network. After 16:30 UTC,
a filter installed on a campus router to reduce congestion
caused by the worm blocked all external traffic to this
network. Because this filter was put into place upstream of the
monitor, we were unable to capture IP packet headers after
16:30 UTC. However, a second UCSD data set consisting of
sampled netflow output from the filtering router was available
at the UCSD site throughout the 24 hour period. Vern Paxson
provided probe information collected by Bro on the LBL networks
between 10:00 UTC on July 19, 2001 and 7:00 on July 20, 2001.
We have merged these three sources to produce the Code-Red
July dataset.
-
Code-Red August:
The data source for this dataset includes only packet headers
collected by a passive monitor on a /8 network at UCSD (the
UCSD Network Telescope). Beginning August 4th, this data
contains a mix of hosts infected by Code-Red version 2 and
CodeRedII. It is not possible to determine which worm caused a
host to send TCP SYN packets to port 80.
Caveats that apply to this dataset:
- The .ida vulnerability utilized by the Code-Red worms was
exploited via TCP connections to port 80. Because the UCSD
Network Telescope did not respond to connection attempts, no
connection was ever established and no request payload was
observed, so worm probes cannot be separated from other port 80
traffic. All TCP SYN packets to port 80 received are included in
these summaries, including non-worm traffic.
- The DHCP Effect significantly impacts this dataset,
particularly after the first 24 hours of each cycle of worm
spread. Changing IP addresses on dynamically addressed
machines cause an order of magnitude difference between the
number of IP addresses active in any two-hour period and the
number of IP addresses active in a week. This dataset does not
include IP addresses, so keep in mind that each start/end time
or duration does not necessarily uniquely identify an
infected computer. It identifies only a newly active IP
address, with no information about whether that IP address
represents a computer previously known to be infected.
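To make the DHCP Effect concrete, the sketch below counts how many observation intervals overlap a given time window. It is an illustration only: the function name, the use of plain numeric hours, and the sample intervals are assumptions, and the sample describes a single hypothetical machine that reappears under three different addresses.

```python
def active_in_window(intervals, window_start, window_end):
    """Count (start, end) intervals that overlap [window_start, window_end)."""
    return sum(
        1
        for start, end in intervals
        if start < window_end and end >= window_start
    )

# Hypothetical: one infected machine seen under three successive
# addresses, with times given in hours since the start of the week.
intervals = [(0, 20), (30, 50), (60, 80)]

two_hour_count = active_in_window(intervals, 0, 2)    # one address active
week_count = active_in_window(intervals, 0, 168)      # three addresses seen
```

Here the weekly count is three times the two-hour count even though only one computer is involved, which is why interval counts in this dataset should not be read as counts of infected machines.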
Data Use Restrictions
Acceptable Use Policy for the public access files of the
Dataset on the Code-Red Worms
-
Code-Red worm data, including every file in the Dataset
on the Code-Red Worms, will not be redistributed.
-
I will not attempt to connect to, probe, or in any other way initiate
contact with a machine or machine administrator identified
via the Code-Red worm data.
-
Insofar as possible, privacy of end users (hosts) and networks
monitored by the network telescope will be respected by the
researchers. Any publications will anonymize, aggregate
or summarize IP addresses, network names, and domain names,
as appropriate when the disclosure of such information may
present a security risk to those organizations or the general
Internet.
-
At the end of the research, or semi-annually (whichever is
less), a summary of the research and any findings/conclusions
will be reported to CAIDA. If any research is described on
the WWW, a URL will be provided. This information is
primarily used in reports to our funding agencies.
-
All users who publish a document (including web pages and papers) using data
from this dataset must provide CAIDA with a copy of the publication and must cite:
The CAIDA Dataset on the Code-Red Worms - July and August 2001,
David Moore, Colleen Shannon, and kc claffy
http://www.caida.org/data/passive/codered_worms_dataset.xml.
-
Users are encouraged, but not required, to include the following
attribution in the acknowledgments section of their document:
Support for the CAIDA Dataset on the Code-Red Worms was provided
by Cisco Systems, the US Department of Homeland Security, the
National Science Foundation, DARPA, and CAIDA Members.
-
All users who create a publicly available presentation using data
from this dataset must provide CAIDA with a copy of the presentation
and must use the full name of the dataset ("The CAIDA Dataset on the
Code-Red Worms") in the presentation. Users are further encouraged, but
not required, to include the URL for the dataset
(http://www.caida.org/data/passive/codered_worms_dataset.xml)
in their presentation.
Code-Red Dataset Access
References
This dataset is cataloged in DatCat with handle
http://imdc.datcat.org/collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset.
For more information on the Code-Red-related worms (Code-Redv1, Code-Redv2, CodeRedII), see:
- .ida vulnerability
- Code-Red Worms
- Code-Red version 2 Spread Analysis
Acknowledgments
Special thanks to Brian Kantor, Jim Madden, and Pat Wilson at UCSD
and Barry Greene at Cisco for support of the UCSD Network Telescope
Project. Rapid coordination of all of these folks in the face of a
network crisis, along with an equally rapid and incredibly generous
equipment donation from Cisco, allowed the collection of this
unique dataset.