Traffic analysis is the process of intercepting and examining messages in order to deduce information from patterns in communication. It can be performed even when the messages are encrypted. In general, the greater the number of messages observed, the greater information be inferred. Traffic analysis can be performed in the context of military intelligence, counter-intelligence, or pattern-of-life analysis, and is also a concern in computer security.
Traffic analysis tasks may be supported by dedicated computer software programs. Advanced traffic analysis techniques which may include various forms of social network analysis.
Traffic analysis has historically been a vital technique in cryptanalysis, especially when the attempted crack depends on successfully seeding a known-plaintext attack, which often requires an inspired guess based on how specific the operational context might likely influence what an adversary communicates, which may be sufficient to establish a short crib.
Traffic analysis method can be used to break the anonymity of anonymous networks, e.g., TORs. There are two methods of traffic-analysis attack, passive and active.
In a military context, traffic analysis is a basic part of signals intelligence, and can be a source of information about the intentions and actions of the target. Representative patterns include:
There is a close relationship between traffic analysis and cryptanalysis (commonly called codebreaking). Callsigns and addresses are frequently encrypted, requiring assistance in identifying them. Traffic volume can often be a sign of an addressee's importance, giving hints to pending objectives or movements to cryptanalysts.
Traffic-flow security is the use of measures that conceal the presence and properties of valid messages on a network to prevent traffic analysis. This can be done by operational procedures or by the protection resulting from features inherent in some cryptographic equipment. Techniques used include:
Traffic-flow security is one aspect of communications security.
The "Communications' metadata intelligence" or "COMINT metadata" is a term in communications intelligence (COMINT) referring to the concept of producing intelligence by analyzing only the technical metadata, hence, is a great practical example for traffic analysis in intelligence.
While traditionally information gathering in COMINT is derived from intercepting transmissions, tapping the target's communications and monitoring the content of conversations, the metadata intelligence is not based on content but on technical communicational data.
Non-content COMINT is usually used to deduce information about the user of a certain transmitter, such as locations, contacts, activity volume, routine and its exceptions.
For example, if an emitter is known as the radio transmitter of a certain unit, and by using direction finding (DF) tools, the position of the emitter is locatable, the change of locations from one point to another can be deduced, without listening to any orders or reports. If one unit reports back to a command on a certain pattern, and another unit reports on the same pattern to the same command, the two units are probably related. That conclusion is based on the metadata of the two units' transmissions, not on the content of their transmissions.
Using all or as much of the metadata available is commonly used to build up an Electronic Order of Battle (EOB) by mapping different entities in the battlefield and their connections. Of course, the EOB could be built by tapping all the conversations and trying to understand, which unit is where, but using the metadata with an automatic analysis tool enables a much faster and accurate EOB build-up, which, alongside tapping, builds a much better and complete picture.
Similarly to the military aspects, commercial business relationships can also be vulnerable to traffic analysis. While the data exchanged will generally be encrypted, the mere flow of data can be informative.
Whenever communications between commercial entities pass through a third-party, such as a telecommunications service, a mediator, a consulting firm, an escrow provider, or a "trusted intermediary" for a data transaction, there is some risk of the traffic being analyzed to obtain commercial intelligence. The rise in multi-party data communications using technologies such as Dataspaces has highlighted this risk once again.
Comparable to the military examples given above, commercial examples include:
All of this can provide valuable intelligence for stock traders, competitors, and other business associates.
These risks are well understood, and lead to the observation that two business organizations will only communicate via a third-party (which costs money) if at least one party trusts the intermediary more than they trust the other party, so far more likely for new or temporary business relationships. That fact itself can be informative.
Traffic analysis is also a concern in computer security. An attacker can gain important information by monitoring the frequency and timing of network packets. A timing attack on the SSH protocol can use timing information to deduce information about passwords since, during interactive session, SSH transmits each keystroke as a message. The time between keystroke messages can be studied using hidden Markov models. Song, et al. claim that it can recover the password fifty times faster than a brute force attack.
Onion routing systems are used to gain anonymity. Traffic analysis can be used to attack anonymous communication systems like the Tor anonymity network. Adam Back, Ulf Möeller and Anton Stiglic present traffic analysis attacks against anonymity providing systems. Steven J. Murdoch and George Danezis from University of Cambridge presented research showing that traffic-analysis allows adversaries to infer which nodes relay the anonymous streams. This reduces the anonymity provided by Tor. They have shown that otherwise unrelated streams can be linked back to the same initiator.
Remailer systems can also be attacked via traffic analysis. If a message is observed going to a remailing server, and an identical-length (if now anonymized) message is seen exiting the server soon after, a traffic analyst may be able to (automatically) connect the sender with the ultimate receiver. Variations of remailer operations exist that can make traffic analysis less effective.
Traffic analysis involves intercepting and scrutinizing cybersecurity threats to gather valuable insights about anonymous data flowing through the exit node. By using technique rooted in dark web crawling and specializing software, one can identify the specific characteristics of a client's network traffic within the dark web.
It is difficult to defeat traffic analysis without both encrypting messages and masking the channel. When no actual messages are being sent, the channel can be masked by sending dummy traffic, similar to the encrypted traffic, thereby keeping bandwidth usage constant. "It is very hard to hide information about the size or timing of messages. The known solutions require Alice to send a continuous stream of messages at the maximum bandwidth she will ever use...This might be acceptable for military applications, but it is not for most civilian applications." The military-versus-civilian problems apply in situations where the user is charged for the volume of information sent.
Even for Internet access, where there is not a per-packet charge, Internet service providers (ISPs) make the statistical assumption that connections from user sites will not be busy 100% of the time. The user cannot simply increase the bandwidth of the link, since masking would fill that as well. If masking, which often can be built into end-to-end encryptors, becomes common practice, ISPs will have to change their traffic assumptions.