Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Closed
In-Spec: path-spec.txt
Implemented-In: 0.2.2.2-alpha
Overview
The performance of paths selected can be improved by adjusting the
CircuitBuildTimeout and avoiding failing guard nodes. This proposal
describes a method of tracking buildtime statistics at the client, and
using those statistics to adjust the CircuitBuildTimeout.
Motivation
Tor's performance can be improved by excluding those circuits that
have long buildtimes (and by extension, high latency). For those Tor
users who require better performance and have lower requirements for
anonymity, this would be a very useful option to have.
Implementation
Gathering Build Times
Circuit build times are stored in the circular array
'circuit_build_times' consisting of uint32_t elements as milliseconds.
The total size of this array is based on the number of circuits
it takes to converge on a good fit of the long term distribution of
the circuit builds for a fixed link. We do not want this value to be
too large, because it will make it difficult for clients to adapt to
moving between different links.
From our observations, the minimum value for a reasonable fit appears
to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
a good fit over the long term, we store 5000 most recent circuits in
the array (NCIRCUITS_TO_OBSERVE).
The Tor client will build test circuits at a rate of one per
minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
a CircuitBuildTimeout estimated within 8 hours after install,
upgrade, or network change (see below).
Long Term Storage
The long-term storage representation is implemented by storing a
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
writing out the statistics to disk. The format this takes in the
state file is 'CircuitBuildTime <bin-ms> <count>', with the total
specified as 'TotalBuildTimes <total>'
Example:
TotalBuildTimes 100
CircuitBuildTimeBin 25 50
CircuitBuildTimeBin 75 25
CircuitBuildTimeBin 125 13
...
Reading the histogram in will entail inserting <count> values
into the circuit_build_times array each with the value of
<bin-ms> milliseconds. In order to evenly distribute the values
in the circular array, the Fisher-Yates shuffle will be performed
after reading values from the bins.
Learning the CircuitBuildTimeout
Based on studies of build times, we found that the distribution of
circuit buildtimes appears to be a Frechet distribution. However,
estimators and quantile functions of the Frechet distribution are
difficult to work with and slow to converge. So instead, since we
are only interested in the accuracy of the tail, we approximate
the tail of the distribution with a Pareto curve starting at
the mode of the circuit build time sample set.
We will calculate the parameters for a Pareto distribution
fitting the data using the estimators at
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
The timeout itself is calculated by using the Quartile function (the
inverted CDF) to give us the value on the CDF such that
BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
below the timeout value.
Thus, we expect that the Tor client will accept the fastest 80% of
the total number of paths on the network.
Detecting Changing Network Conditions
We attempt to detect both network connectivity loss and drastic
changes in the timeout characteristics.
We assume that we've had network connectivity loss if 3 circuits
timeout and we've received no cells or TLS handshakes since those
circuits began. We then set the timeout to 60 seconds and stop
counting timeouts.
If 3 more circuits timeout and the network still has not been
live within this new 60 second timeout window, we then discard
the previous timeouts during this period from our history.
To detect changing network conditions, we keep a history of
the timeout or non-timeout status of the past RECENT_CIRCUITS (20)
that successfully completed at least one hop. If more than 75%
of these circuits timeout, we discard all buildtimes history,
reset the timeout to 60, and then begin recomputing the timeout.
Testing
After circuit build times, storage, and learning are implemented,
the resulting histogram should be checked for consistency by
verifying it persists across successive Tor invocations where
no circuits are built. In addition, we can also use the existing
buildtime scripts to record build times, and verify that the histogram
the python produces matches that which is output to the state file in Tor,
and verify that the Pareto parameters and cutoff points also match.
We will also verify that there are no unexpected large deviations from
node selection, such as nodes from distant geographical locations being
completely excluded.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region of
of the Pareto distribution beyond the cutoff. This is done by
generating a random sample for each timeout at points on the
curve beyond the current timeout cutoff.
Future Work
At some point, it may be desirable to change the cutoff from a
single hard cutoff that destroys the circuit to a soft cutoff and
a hard cutoff, where the soft cutoff merely triggers the building
of a new circuit, and the hard cutoff triggers destruction of the
circuit.
It may also be beneficial to learn separate timeouts for each
guard node, as they will have slightly different distributions.
This will take longer to generate initial values though.
Issues
Impact on anonymity
Since this follows a Pareto distribution, large reductions on the
timeout can be achieved without cutting off a great number of the
total paths. This will eliminate a great deal of the performance
variation of Tor usage.