This blog post presents some interesting ideas I've had about CPU load covert channels.
A covert channel is any technique that allows two programs to communicate that should be unable to communicate; for example, encoding information in TCP initial sequence numbers, as in , or in the timings of HTTP requests (imagine: a request on an even numbered second is a 0; a request on an odd-numbered second is a 1).
An example use case for a covert channel is to communicate an IP address from outside an anonymized context to within one, or vice versa.
A CPU load covert channel is a specific covert channel based on the use of CPU load as a means of transmission.
An example of a CPU load covert channel is as follows: we have two processes, which we wish to communicate; one is designated as the sender or transmitter, while the other is designated as the receiver. The receiver is constantly running a loop (that normally takes about 1 second), and recording the timings; the transmitter runs the same loop to transmit a 1, and does nothing to transmit a 0. On a single-core machine, the receiver will observe that the loop will take about twice as long when the sender is transmitting a 1 as when it isn't. On a multicore machine, the transmitter can use one thread per logical core.
CPU load covert channels are not new; see, for example, , which used them to transmit data between Xen domains. However, I believe that more though needs to be put into how these affect Tor.
I've actually put together a demo of this to transmit an IPv4 address from a regular, non-anonymized browser to Tor Browser or similar. For me, at least, this seems to work nearly 100% of the time, with nearly 100% accuracy; I also find it fun to watch a CPU usage graph while this is running.
Given only the above, and certain assumptions about the Tor client, it would still be safe to use an operating system such as Tails that does not allow most applications to learn the real IP address. However, there are more ways to observe CPU load than via running code on the same computer. For example, in , Murdoch et al. show that the increased heat emitted by a CPU when it is running under high load can cause the clock on the motherboard to skew by a very small amount, thus allowing one to judge its CPU usage from afar.
With two computers connected via Ethernet through a switch, I would normally get ping timings of around 250 microseconds. However, when the computer being pinged was pegged at 100% CPU on all cores, ping latency would drop to about 170 microseconds. This could be observed over a larger distance, such as through a Wi-Fi network (think Internet café), by averaging over a large number of samples.
I was able to use this to transmit a 32-bit IPv4 address from Tor Browser on Debian to a Python script running on a separate computer (Linux Mint, if it matters), with only four bit errors, easily within the reach of error correcting codes. As far as I know, this is the first time this particular property has been used as a covert channel; if I'm wrong, contact me, and I'll correct it.
The most obvious way to fix this is using cgroups to limit the CPU usage of any given browser to only a fraction of the total available resources; if two concurrent loops are limited to 25% of the CPU, then they should (in theory) be unable to notice eachother. Although this would be a nice start, there may be ways to get around this. (If we were to do this, it may be more profitable in the long-term to actually run Tor Browser within Docker.)
This one seems harder to mitigate. However, we should be able to use a similar trick: containerize Tor Browser, but instead of simply limiting the CPU usage to 25%, ensure that the CPU usage is always precisely 25%; this could be implemented using a process with a niceness of 19 running in an infinite loop. This would mean that CPU usage would be constant, even if Tor Browser were itself using more CPU time, thus (in theory) preventing the ping latency side channel.
Just disabling ping packets (or all of ICMP for that matter) isn't enough. As an example, an attacker could observe the timings of TCP SYN-ACK or ACK packets (those are used during TCP's 3-way handshake). One suggestion would be to ensure that all packets are always sent precisely on the millisecond. However, depending on the precise mechanism for the decreased ping latency, this may not help at all.
I would like to thank Jonathan Huo for allowing me to bounce ideas off him, and Stephen J. Murdoch and Georg Koppen for their help in developing the idea.