View on GitHub

torbwest-pam2021

On the Accuracy of Tor Bandwidth Estimation

Overview

This page provides instructions for reproducing our relay capacity variation analysis. See the front page for more context.

Note: the data processing tasks in Steps 1-4 have already been done, and the output from those steps have been cached in this repository. If you just want to re-plot the graphs, do Step 0 and then skip to Step 5.

Step 0: prepare python virtual environment

python3 -m venv myenv
source myenv/bin/activate
pip3 install stem matplotlib numpy scipy

Step 1: download raw Tor metrics data

for d in 2018-08 2018-09 2018-10 2018-11 2018-12 2019-01 2019-02 2019-03 2019-04 2019-05 2019-06 2019-07
do
    wget https://collector.torproject.org/archive/relay-descriptors/server-descriptors/server-descriptors-${d}.tar.xz
    wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-${d}.tar.xz
done

Step 2: decompress

tar xJf *tar.xz
mkdir cons
mv consensuses* cons
mkdir sdesc
mv server-desc* sdesc

Step 3: extract bandwidth info

source myenv/bin/activate
python3 parse_tor_archive.py cons sdesc

Step 4: compute

source myenv/bin/activate

# input for all scripts is the tor.archive.json file created in Step 3

# these metrics are computed over all data throughout the entire year
python3 compute_position.py
python3 compute_uptime.py
python3 compute_weight.py
python3 compute_advbw.py

# compute relative standard deviation while ignoring unmeasured relays
# and weeks where bwrate or bwburst changed advertised bandwidth
python3 compute_rsds.py

Step 5: plot the graphs

# uses the output from Steps 3 and 4
source myenv/bin/activate
python3 plot_rsds.py > stats.txt

Results

Figures 1a, 1b, 1c, and 1d; and related statistics