This dataset contains anonymized passive traffic traces from CAIDA's
equinix-chicago and equinix-sanjose monitors on OC192 Internet backbone
links. This data is useful for research on
the characteristics of Internet traffic, including application breakdown,
security events, geographic and topological distribution, and flow volume and
duration.
The first traffic trace available is a 1-hour trace collected during the
DITL 2008 measurement event. This trace contains anonymized packet headers in
pcap format for a single direction of the bidirectional OC192 link at
equinix-chicago, from approximately 2008-03-19 19:00 to 20:00 UTC.
The hardware monitoring the other direction of the link was not functioning
properly at the time of the traffic capture, so only data for a single
direction was captured.
A 6-hour traffic trace was also collected during DITL 2008 on the same single
direction of the bidirectional OC192 link, from 2008-03-19 00:00 to 06:00 UTC.
Because of the volume of this trace (almost 400 GiB compressed), we have not
yet made it part of this dataset.
For the equinix-chicago monitor, the first monthly bidirectional traffic
trace was taken on April 30, 2008, and added to the Anonymized 2008 Internet
Trace dataset in June 2008. This 1-hour trace resulted in 83 GB of compressed
pcap files. The first monthly bidirectional traffic trace from the
equinix-sanjose monitor was taken on July 17, 2008.
We are aware that some data in this dataset contains more than trivial amounts
of packet loss; this has especially been an issue for equinix-chicago direction
B. In addition, because of the way the monitoring equipment is set up, we do
not know how well synchronized the two directions of a single link are. We
plan to provide more metadata on synchronization between directions and packet
loss in the traces in the near future.
Traffic traces in this dataset are anonymized using CryptoPAn prefix-preserving anonymization. All traces in this
dataset are anonymized with the same key.
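Prefix-preserving means that two addresses sharing their first N bits still
share exactly their first N bits after anonymization. The Python sketch below
illustrates that property only. It is a toy construction built on HMAC-SHA256,
not CryptoPAn itself, and the names toy_anonymize and common_prefix_len, the
key, and the sample addresses are all hypothetical.

```python
import hashlib
import hmac
import ipaddress


def toy_anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving IPv4 mapping (illustration only, NOT CryptoPAn)."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Keyed pseudorandom bit that depends only on those i leading bits.
        digest = hmac.new(key, f"{i}:{prefix}".encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        # Bit i of the original, counted from the most significant bit.
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key, unrelated to the dataset's key
    a, b = "192.0.2.17", "192.0.2.200"
    print(common_prefix_len(a, b))  # 24
    print(common_prefix_len(toy_anonymize(a, key), toy_anonymize(b, key)))  # 24
```

Because the flip applied to each bit depends only on the bits before it, two
inputs that agree on a prefix are transformed identically over that prefix and
diverge at the same position afterwards. Using one key for all traces is what
makes the same original address map to the same anonymized address everywhere.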
CAIDA makes near-realtime traffic reports available from its passive monitors.
More detailed information on individual traces is available in DatCat, in
the collection
/collection/1-06BX-2=CAIDA+Anonymized+2008+Internet+Traces+Dataset.
These traces can be read with any software that reads the pcap (tcpdump)
format, including the CoralReef Software Suite, tcpdump, Wireshark, and many others.
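For readers who want to inspect a trace without installing one of these tools,
the classic pcap layout (a 24-byte file header followed by 16-byte record
headers) can be walked with the Python standard library alone. The sketch
below is a minimal illustration, not a replacement for the tools above; the
function name read_pcap is hypothetical, and it does not interpret the link
type or decode the packet headers.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# First four bytes of a classic pcap file, mapped to the byte order of the
# remaining fields and the resolution of the fractional timestamp.
_MAGICS = {
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
}


def read_pcap(stream: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each record."""
    header = stream.read(24)
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap stream")
    endian, ticks_per_second = _MAGICS[header[:4]]
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # end of stream, or a truncated trailing record header
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_frac / ticks_per_second, orig_len, data
```

The function takes any binary stream, so a compressed file can be read without
unpacking it first. Assuming the files are gzip-compressed (the text above says
only that they are compressed), something like
`read_pcap(gzip.open(path, "rb"))` would work, where `path` is whatever file
name the download provides.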
Data Use Restrictions
-
The anonymized traffic traces will not be distributed beyond authorized users.
-
CAIDA will be notified of the names and email addresses of any persons (and their respective affiliations) assisting in research using the anonymized traffic traces. This includes graduate students and interns.
-
The IP addresses in these traces are all anonymized to preserve the privacy of end users (hosts) and networks monitored in the collection of the data. The anonymization is prefix-preserving; if the original IP addresses had N bits in common, the anonymized addresses will have those same N bits in common. The traces in a dataset are all anonymized with the same key, so one original IP address that appears in multiple traces in a dataset will appear as the same anonymized IP address across those traces.
Insofar as possible, researchers will respect the privacy of end users (hosts) and networks monitored in the creation of these traces. Researchers will make no attempts to reverse engineer, decrypt, or otherwise identify the original IP addresses collected in the trace. Researchers will also not attempt to extract unanonymized IP addresses from encapsulated headers.
Researchers will make no attempts to connect to, probe, or in any other way initiate contact with a machine or machine administrator identified via the anonymized traffic traces.
-
Anyone who publishes a document (including web pages and papers) that uses data from this dataset must provide CAIDA with a copy of the publication and must cite:
The CAIDA Anonymized 2008 Internet Traces - <dates used>
Colleen Shannon, Emile Aben, kc claffy, Dan Andersen,
http://www.caida.org/data/passive/passive_2008_dataset.xml
-
All users are encouraged, but not required, to include the following
attribution in their acknowledgments section:
Support for CAIDA's Internet Traces is provided by the National
Science Foundation, the US Department of Homeland Security,
and CAIDA Members.
-
All users who create a publicly available presentation using
data from this dataset must provide CAIDA with a copy of the
presentation and must use the full name of the dataset ("The
CAIDA Anonymized 2008 Internet Traces") in the presentation.
Users are encouraged, but not required, to include the url for
the dataset
(http://www.caida.org/data/passive/passive_2008_dataset.xml).
-
At the end of the research, or semi-annually (whichever
comes first), a summary of the research and any findings/conclusions
will be reported to CAIDA. If any research is described on
the WWW, a URL will be provided. This information is
primarily used in reports to our funding agencies.
Data Access
Request Access to the Anonymized 2008 Internet Traces Dataset and
other CAIDA Anonymized Trace Datasets
More Information