Introduction
Archipelago (Ark) is CAIDA's newest active measurement infrastructure
and the next step in the evolution of the skitter infrastructure that CAIDA
operated for nearly a decade (see: what is skitter, and how is Ark different from skitter?).
The primary goals are to
- reduce the effort needed to develop and deploy sophisticated
large-scale measurements, and
- provide a step toward a community-oriented measurement
infrastructure by allowing collaborators to run their vetted
measurement tasks on a security-hardened distributed platform.
Ark is tailored specifically for active network measurement. This
allows Ark to be simpler than some other general-purpose distributed
experimental platforms, and it allows us to concentrate on providing
facilities that directly address the needs of networking research. In
particular, we provide a facility for communication and coordination
that makes it easier to write distributed measurements that must work
together to achieve a goal. We are working on providing a high-level
API to ease the challenges of writing measurement tools. Our goal is
to lower the barrier to bringing novel and interesting measurement
techniques to life.
Current Measurements
The initial focus of Ark is coordinated large-scale traceroute-based
topology measurements using a process called team probing. In team
probing, we group monitors into teams and dynamically divide up the
measurement work among team members. This parallelization allows us
to obtain a traceroute measurement to all routed /24's in a short
period of time: about 48-56 hours for a team of 13 monitors probing 7
million /24's (that is, the full routed address space subdivided into
/24's) at 100 pps (packets per second). We currently have two teams active, and each team
probes independently.
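As a rough illustration of team probing, the Python sketch below subdivides a few prefixes into /24's and then hands each /24 to whichever monitor becomes free first. The function names, the monitor names, and the per-target timings are all hypothetical, and the allocation rule is an assumption: the text above says only that work is divided dynamically, not how Ark does it.

```python
import heapq
import ipaddress


def routed_slash24s(prefixes):
    """Yield each distinct /24 covered by the given prefixes."""
    seen = set()
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        if network.prefixlen > 24:
            # A prefix longer than /24 maps to the /24 that contains it.
            blocks = [network.supernet(new_prefix=24)]
        else:
            blocks = network.subnets(new_prefix=24)
        for block in blocks:
            if block not in seen:
                seen.add(block)
                yield block


def team_probe(targets, seconds_per_target):
    """Simulate dynamic work division among the monitors of one team.

    seconds_per_target maps a (hypothetical) monitor name to the time that
    monitor needs per target. The monitor that becomes free first takes
    the next target, so faster monitors end up with a larger share.
    """
    # Min-heap of (time at which the monitor is free, monitor name).
    free_at = [(0.0, name) for name in sorted(seconds_per_target)]
    heapq.heapify(free_at)
    assignments = {name: [] for name in seconds_per_target}
    for target in targets:
        now, name = heapq.heappop(free_at)
        assignments[name].append(target)
        heapq.heappush(free_at, (now + seconds_per_target[name], name))
    return assignments


# Example with made-up inputs: five /24's shared by two monitors.
example_targets = list(routed_slash24s(["10.0.0.0/22", "192.0.2.0/24"]))
example_shares = team_probe(example_targets, {"fast": 1.0, "slow": 2.0})
for monitor, share in example_shares.items():
    print(monitor, [str(block) for block in share])
```

Because no monitor is given a fixed slice in advance, a slow or temporarily unreachable monitor delays only the targets it has already taken, which is one plausible reason to prefer dynamic division over static partitioning.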
We perform traceroute measurements using scamper, a powerful and
flexible active measurement tool supporting IPv4, IPv6, traceroute,
and ping. Scamper supports TCP-, UDP-, and ICMP-based measurements and
Paris traceroute variations. Scamper has been in development for
several years by our collaborator Matthew Luckie at the University of
Waikato.
We compile the output of these measurements into the IPv4 Routed /24
Topology Dataset. These measurements have been ongoing since September
2007, and as of mid-August 2008, we have collected 1.3 billion
traceroutes and 519 GB of data.
We augment the Routed /24 Topology Dataset with automated lookups of
DNS names. We have an in-house bulk DNS lookup service called HostDB
that can look up millions of addresses per day. We look up all
intermediate addresses and responding destinations seen in the
Topology Dataset.
We are working on combining the three alias resolution techniques
currently available (Mercator, Ally, APAR) into a unified tool and
system that we will use to generate router-level topology from the IP
Topology Dataset.
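One simple way to merge pairwise alias inferences, regardless of which technique produced them, is to take their transitive closure with a union-find structure. The Python sketch below does that. It is an assumption about how results could be combined, not a description of the unified tool under development, and the function name and addresses are made up for illustration.

```python
def group_aliases(alias_pairs):
    """Group interface addresses into routers from pairwise alias inferences.

    alias_pairs is an iterable of (address, address) tuples, each meaning
    "some technique inferred that these two interfaces share a router".
    """
    parent = {}

    def find(addr):
        # Return the representative of addr's group, shortening the chain.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    for a, b in alias_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    routers = {}
    for addr in list(parent):
        routers.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(group) for group in routers.values())


# Hypothetical inferences, keyed by the technique that produced them.
pairs_by_technique = {
    "mercator": [("10.0.0.1", "10.0.1.1")],
    "ally": [("10.0.1.1", "10.0.2.1")],
    "apar": [("10.0.9.1", "10.0.9.5")],
}
all_pairs = [pair for pairs in pairs_by_technique.values() for pair in pairs]
print(group_aliases(all_pairs))
```

Plain transitive closure trusts every inference equally, so a single false positive can merge two routers. A real system would likely need to weigh or cross-check conflicting results before merging.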
Finally, we provide the IPv4 Routed /24 AS Links Dataset, which
contains Autonomous System (AS) links derived from the IP paths of the
Topology Dataset. This AS links dataset is useful for studying the
peering relationships between Internet bit transport providers.
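To make the derivation concrete, the Python sketch below maps each hop of an IP path to an AS by longest-prefix match and records a link wherever two consecutive mapped hops fall in different ASes. The prefix-to-AS table uses documentation prefixes and documentation AS numbers, the function names are hypothetical, and the choice not to infer a link across an unresponsive or unmapped hop is an assumption rather than the dataset's documented method.

```python
import ipaddress


def build_prefix_table(prefix_to_asn):
    """Return (network, asn) pairs ordered from longest to shortest prefix."""
    table = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_asn.items()]
    return sorted(table, key=lambda item: item[0].prefixlen, reverse=True)


def origin_as(address, table):
    """Longest-prefix match: return the AS for address, or None if unmapped."""
    ip = ipaddress.ip_address(address)
    for network, asn in table:
        if ip in network:
            return asn
    return None


def as_links(ip_path, table):
    """Derive AS links from one IP path; None marks an unresponsive hop."""
    links = set()
    previous = None
    for hop in ip_path:
        asn = origin_as(hop, table) if hop is not None else None
        if asn is None:
            previous = None  # assumption: never bridge a gap in the path
            continue
        if previous is not None and asn != previous:
            links.add((previous, asn))
        previous = asn
    return links


# Made-up table and path for illustration only.
table = build_prefix_table({
    "192.0.2.0/24": 64496,
    "198.51.100.0/24": 64497,
    "203.0.113.0/24": 64498,
})
path = ["192.0.2.1", "192.0.2.9", "198.51.100.1", None, "203.0.113.7"]
print(as_links(path, table))
```

In this example the unresponsive fourth hop prevents a link between the second and third AS from being recorded, since the path does not show that the two are directly adjacent.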
The Spoofer Project
Ark monitors participating in the Spoofer Project gather data on IP spoofing by receiving potentially spoofed traffic and forwarding it to the Spoofer Project's server at MIT for analysis. Ark hosting sites interested in participating as receivers need to agree to the Acceptable Use Policy (AUP) for the Spoofer Project.
Tuple Space
One of the distinguishing features of Ark is its focus on
coordination. Coordination, broadly speaking, is concerned with
planning, executing, and controlling an ensemble of distributed
computations. Coordination is the meta-activity that surrounds a
computation.
To facilitate coordination, Ark provides a new implementation, called
Marinda, of the well-known tuple-space coordination model first
introduced by David Gelernter in his Linda coordination language. A
tuple space is a distributed shared memory combined with a small
number of easy-to-use operations. The tuple space stores tuples,
which are arrays of simple values (strings and numbers). Clients
retrieve tuples by pattern matching.
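The model is small enough to sketch in a few lines of Python. The class below is a single-process toy, not Marinda: the method names (write, read, take) and the use of None as a wildcard are assumptions chosen for illustration, and a real tuple space is shared among distributed clients.

```python
class TupleSpace:
    """Toy in-memory tuple space with template-based retrieval."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        """Store a tuple of simple values (strings and numbers)."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # A template matches a tuple of the same length whose fields equal
        # the template's fields; None in the template matches any value.
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def read(self, template):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template):
        """Remove and return the first matching tuple, or None."""
        for index, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(index)
        return None


space = TupleSpace()
space.write(("RTT", "192.0.2.1", 12.5))
space.write(("RTT", "198.51.100.7", 48.0))
print(space.read(("RTT", "198.51.100.7", None)))
print(space.take(("RTT", None, None)))
```

Retrieval by template is what decouples the participants: a reader names the shape of the data it wants, not the client that produced it.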
The tuple space is a many-to-many communication and coordination
medium. Over this medium, measurement clients can interact in
sophisticated ways, such as exchanging state and triggering actions
among monitors. The tuple space abstraction leads to a peer-to-peer
architecture in which each participant can act seamlessly as both a
client and a server. For example, it is simple to write a traceroute service
that takes requests and sends responses over the tuple space. We can
then layer on top of these traceroute services clients that trigger
traceroutes when certain conditions are met. By lowering the barrier
to writing and deploying services to just a few lines of code, the
tuple space abstraction allows a rich ecosystem of measurement
services to thrive, in the same way that HTML empowered users by
allowing anyone to become a publisher on the Internet.
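The traceroute service described above might look roughly like the following Python sketch. Everything here is hypothetical: the tuple layouts ("TRACE_REQ", id, destination) and ("TRACE_RESP", id, hops), the class and function names, and the stand-in measurement that returns a made-up hop list. The tuple space is a thread-based toy within one process, not Marinda.

```python
import threading


class BlockingTupleSpace:
    """Toy tuple space whose take() waits until a matching tuple exists."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    def write(self, *fields):
        with self._changed:
            self._tuples.append(fields)
            self._changed.notify_all()

    def _find(self, template):
        # None in the template matches any value in that position.
        for index, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                want is None or want == have for want, have in zip(template, tup)
            ):
                return index
        return None

    def take(self, *template):
        with self._changed:
            index = self._find(template)
            while index is None:
                self._changed.wait()
                index = self._find(template)
            return self._tuples.pop(index)


def fake_traceroute(destination):
    """Stand-in for a real measurement; returns a made-up hop list."""
    return "192.0.2.1 " + destination


def traceroute_service(space):
    """Serve requests until a request for the destination "STOP" arrives."""
    while True:
        _, request_id, destination = space.take("TRACE_REQ", None, None)
        if destination == "STOP":
            return
        space.write("TRACE_RESP", request_id, fake_traceroute(destination))


def request_trace(space, request_id, destination):
    """Client side: post a request, then wait for the matching response."""
    space.write("TRACE_REQ", request_id, destination)
    _, _, hops = space.take("TRACE_RESP", request_id, None)
    return hops


demo_space = BlockingTupleSpace()
demo_worker = threading.Thread(target=traceroute_service, args=(demo_space,))
demo_worker.start()
print(request_trace(demo_space, 1, "198.51.100.7"))
demo_space.write("TRACE_REQ", 0, "STOP")
demo_worker.join()
```

The client and the service never address each other directly. Either side could be replaced, or more services added, without changing the other, which is the property that makes layering further clients on top straightforward.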
For more information, see the list of coordination references below.
Future Plans
- We will release the source code of the Marinda tuple space
implementation under the GPL.
- We will continue implementing the Ark infrastructure software,
including a high-level API for performing network measurements and
the security layers needed to allow semi-trusted third parties to
conduct measurements.
- We will conduct IPv6 topology measurements from the 6 monitors that
currently have IPv6 connectivity (as of August 2008).
- We also hope to perform DNS open resolver surveys.
Has your computer received a probe from an Ark monitor?
Learn more about the probes sent by CAIDA for these
experiments.
Questions about Ark?
Please send questions or comments regarding Ark to
ark-info@caida.org.