Responsible Research with Anti-Censorship Technologies

Joseph Lorenzo Hall (joehall@berkeley.edu)
Postdoctoral Researcher
UC Berkeley School of Information
Princeton Center for Information Technology Policy

(16 Feb 2011 (v1.1); prepared for DARPA/NSF meeting in Arlington, Virginia on Ethical, Legal and Social Issues of Personally Identifiable Information)

Summary

Given recent efforts to degrade, intercept or thwart internet communications in repressive societies, more people in these countries will be drawn towards tools that circumvent such censorship measures so that they can communicate freely. Researchers and funders will also spend more of their time and grant resources investigating, improving and developing novel anti-censorship projects. This short note highlights a number of privacy-problematic activities in which researchers may engage, intentionally or not, and recommends some guidelines.

What Are Anti-Censorship Tools?

Anti-censorship tools represent a broad swath of different types of technology. Basic techniques used by anti-censorship technologies include:

Encrypted communications: where a secret key is used between communicating parties to shroud the content of communications.
Steganography: where communication signals are embedded covertly in seemingly normal communications.
Proxy communication: where a communication is passed to an independent party who then delivers the communications contents to its intended destination.

Many of the tools practically available for anonymous communication or censorship circumvention combine several of these techniques. For example, Tor, the "onion-routing" network, uses clients that wrap each communication in successive layers of encryption ("onion skins") and pass it through a series of proxy server "nodes" in the Tor network. Each node can decrypt only the outermost remaining layer, and the final node sends the message to its destination.
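The layering idea can be illustrated with a minimal sketch. This is a toy, not Tor's actual protocol: the keystream cipher below is built from SHA-256 purely for illustration and is not secure, and the names `keystream`, `xor_cipher`, `wrap` and `peel`, the node names, the keys and the destination are all hypothetical. Real onion routing uses negotiated per-hop keys, fixed-size cells and vetted ciphers.

```python
import hashlib
import json
from typing import List, Tuple


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, destination: str, path: List[Tuple[str, bytes]]) -> bytes:
    """Build the onion from the inside out; path is in travel order."""
    next_hop, payload = destination, message
    for name, key in reversed(path):
        # Each layer names only the next hop and carries an opaque inner blob.
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = xor_cipher(key, layer)
        next_hop = name
    return payload


def peel(key: bytes, blob: bytes) -> Tuple[str, bytes]:
    """Remove one layer: learn the next hop and the (still wrapped) inner blob."""
    layer = json.loads(xor_cipher(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])


# Hypothetical three-node path with made-up keys.
path = [("entry", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]
blob = wrap(b"hello", "destination.example", path)
for name, key in path:
    next_hop, blob = peel(key, blob)
    print(f"{name} forwards to {next_hop}")
```

In this sketch each node learns only where to forward the blob next; the original message becomes readable only after the last layer has been removed.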

Research Efforts

There has been a lot of academic research into these kinds of tools, including efforts to improve/degrade tool performance, improve/defeat anonymity guarantees, detect/hide malicious nodes in the network and mask/profile tool users and the uses to which these tools are put. There is an active community of academic researchers in anonymous communications and they tend to congregate annually at the Privacy Enhancing Technologies Symposium. A good bibliography of the literature is at: http://freehaven.net/anonbib/

Continued research is needed to improve these tools' resistance to profiling, filtering, blocking, etc. As in any adversarial security game, the adversaries are working hard to undermine anonymous communication, and if research and development efforts stop, there will be no truly anonymous communication capability. Ideally, a repressive regime would have to resort to the "Egypt option" of completely "turning off the internet", or perhaps to blocking all encrypted traffic, in order to thwart people's ability to communicate freely. (Of course, steganographic methods can be used in unencrypted communications, but they tend to be less efficient, requiring dramatically more bandwidth for a message of a given size.)

Preserving Privacy in Research (Or, At Least, Minimizing Harm)

Research using these tools must walk a very fine line. These tools are actively being maintained by people who care dearly about the continued availability of free communications and they are being used by real people who may very well be risking their lives and freedom to communicate.

There are a number of concerns that researchers need to avoid altogether or, where that is impossible, minimize:

Disruption: Activities can bring parts of the network down or degrade its performance.
Interception: Sniffing actual communications contents and routing information can identify users and put researchers at risk of prosecution for violating wiretapping and "pen register" laws.
Dependency: Removing research resources that the system has come to rely upon can manifest as inconsistency to users and put sudden loads on non-research elements of the tool.
Profiling: Fingerprinting end users and their devices, and linking those fingerprints together, can reveal users' identities, behavior and patterns of association.
Reidentification: Data that has ostensibly been "scrubbed" or "anonymized" can, in certain circumstances, be reidentified when combined with other types of data.
Exposure to Future Capability: Some technologies, such as encryption, base their security models on assumptions about the resources an attacker can bring to bear on defeating protections. Of course, an attacker's resources or capabilities may change dramatically in the future.
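The reidentification concern above can be made concrete with a small sketch of a linkage attack. Everything here is hypothetical: the records, the field names, the choice of quasi-identifiers and the `link` function are invented for illustration and do not come from any real study.

```python
from collections import defaultdict

# Hypothetical "scrubbed" research records: names and IP addresses removed.
scrubbed = [
    {"hour": 14, "browser": "Firefox 3.6", "lang": "fa", "site": "news.example"},
    {"hour": 14, "browser": "Firefox 3.6", "lang": "ar", "site": "blog.example"},
    {"hour": 9, "browser": "Opera 11", "lang": "fa", "site": "forum.example"},
]

# Hypothetical auxiliary data an adversary might hold from another source.
auxiliary = [
    {"name": "person A", "hour": 14, "browser": "Firefox 3.6", "lang": "fa"},
    {"name": "person B", "hour": 14, "browser": "Firefox 3.6", "lang": "ar"},
    {"name": "person C", "hour": 14, "browser": "Firefox 3.6", "lang": "ar"},
    {"name": "person D", "hour": 9, "browser": "Opera 11", "lang": "fa"},
]

QUASI_IDENTIFIERS = ("hour", "browser", "lang")


def link(scrubbed_records, auxiliary_records, keys=QUASI_IDENTIFIERS):
    """Join the two data sets on the quasi-identifiers they share."""
    index = defaultdict(list)
    for person in auxiliary_records:
        index[tuple(person[k] for k in keys)].append(person["name"])
    return [
        (rec["site"], index.get(tuple(rec[k] for k in keys), []))
        for rec in scrubbed_records
    ]


for site, candidates in link(scrubbed, auxiliary):
    status = "reidentified" if len(candidates) == 1 else "ambiguous"
    print(site, candidates, status)
```

In this toy data, any scrubbed record whose combination of quasi-identifiers matches exactly one person in the auxiliary data is effectively reidentified, even though no name or address was ever stored with it.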

Recommendations

Of course, the potential benefits of a particular piece of research always need to be balanced against the risks to human subjects who might be affected by the research activity. The simplest recommendation we can make is that when a research effort doesn't need actual humans to demonstrate its goals, it shouldn't involve them.

There will be cases where a research project is deemed sufficiently important and needs real-world user data, activity or interaction, or simply needs aspects of a system that are substantially richer and more varied than anything that could be created in the lab with reasonable resources. In these cases, it seems important to follow a few guidelines:

Seek external human subjects advice: Traditional Institutional Review Boards are simply not going to understand the complexity of these tools and the associated risks to human subjects. It is important to discuss the research project with the anti-censorship tool builders. In addition, the anonymous communication research community, such as those who attend PETS, could set up an informal IRB-like committee, so that affirmative approval from this group could serve as a mark of due diligence that privacy and security are receiving sufficient attention and ex-ante peer review. (Soghoian 2011, in a forthcoming paper on responsible research of the Tor network, advocates for Program Committees and Editorial Boards to enforce pro-privacy research practices by having them ask researchers for proof of IRB/legal clearance and then reserving the right to reject entirely any research that crosses a few bright lines.)
When possible, notify users: Some tools will not easily permit notice and/or informed consent. Certainly any interactive study or study that runs code locally on user machines should have full-blown informed consent. Network-based research should at least attempt to provide notice; for example, listing research details or a positive statement about what information is being captured, manipulated, etc. on a web-facing resource.
Disruption should be avoided: No research effort should render a tool less useful or inoperable during a project. This is especially important when the researchers know that a current event, such as the Egyptian revolution, might increase the demand for a tool (despite the fact that these are also very interesting events to study!).
Cessation of resources should be phased: If a research resource is providing or adding to the capabilities of an anti-censorship tool, taking it offline suddenly should be discouraged. Researchers should plan to gradually step-down resource provision or, even better, find methods to hand off or sponsor these resources as an ongoing gesture of goodwill for the cooperation of developers in the research effort.
Communications routing from users should not be collected: Since the user base for these tools includes people actively engaged in activities their governments do not approve of, it's important to make sure that any communications routing information (IP addresses, etc.) is not captured at all. Some aggregate data here could perhaps be OK, but detailed individualized data poses too great a risk. (See Loesing 2010, below, for a more informed and nuanced discussion of collecting such data responsibly for Tor.) Destination routing information to large, unindividualized hosts is perhaps less problematic, although peer-to-peer communications raise essentially the same issues.
Communications content should never be captured: Contents of communications are explicitly protected in many countries by wiretapping laws. A researcher puts herself and the tool users at great risk by capturing and storing communications contents. While no researchers who have examined communications content have yet been prosecuted, that shouldn't give comfort. In addition, capturing communications contents will undoubtedly expose sensitive information that could be used to identify the user, expose secret key information (passwords, crypto keys, etc.) or frustrate the abilities of these users to communicate and organize in secret.
Data should be carefully controlled and then securely deleted as soon as possible: Most researchers want to keep their data for eternity and a day, considering the effort required to produce it and the possibility of using it in future research. For anonymity tools, this can be especially dangerous. Repressive regimes or law enforcement can easily subpoena data and get wholesale access to records that they could not otherwise have easily obtained or been legally permitted to obtain. This recommendation should help to combat reidentification threats and the risks of future capability that would otherwise render moot the protections afforded by anti-censorship tools. (See Soghoian 2011 for a discussion of the trade-offs associated with ensuring that all data analysis is done on-the-fly, in ephemeral memory (RAM), such that persistent records are never created.)
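As a rough sketch of how the guidelines on routing information and data retention might combine in practice, the following keeps only coarse, aggregate counts in memory and never stores an individual address. The class name `AggregateCounter`, the `to_label` resolver, the example addresses and labels, and the bin size of 8 are all assumptions made for illustration; they are not taken from any of the cited papers.

```python
import math
from collections import Counter
from typing import Callable, Dict


class AggregateCounter:
    """Counts observations per coarse label, in memory only."""

    def __init__(self, bin_size: int = 8) -> None:
        self.bin_size = bin_size  # illustrative choice, not a recommendation
        self._counts: Counter = Counter()

    def observe(self, address: str, to_label: Callable[[str], str]) -> None:
        # Map the address to a coarse label (e.g. a country code) and drop it;
        # the address itself is never stored or logged.
        self._counts[to_label(address)] += 1

    def report(self) -> Dict[str, int]:
        # Round each count up to a multiple of bin_size so that small,
        # potentially identifying counts are not published exactly.
        return {
            label: math.ceil(n / self.bin_size) * self.bin_size
            for label, n in self._counts.items()
        }

    def clear(self) -> None:
        # Discard the in-memory counts once a report has been produced.
        self._counts.clear()


# Hypothetical resolver standing in for a real address-to-region lookup.
fake_lookup = {"192.0.2.1": "AA", "192.0.2.2": "AA", "198.51.100.7": "BB"}

counter = AggregateCounter()
for addr in ["192.0.2.1", "192.0.2.2", "198.51.100.7"]:
    counter.observe(addr, fake_lookup.__getitem__)
print(counter.report())
counter.clear()
```

Nothing in this sketch touches disk, so a process restart loses the counts; that is the trade-off of keeping analysis in ephemeral memory. It also counts observations rather than distinct users, which a real measurement design would need to address.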

References

A couple of papers discuss the need to conduct research on anonymity tools, notably Tor, in an ethical manner (see their references for very good, more general, work on conducting cybersecurity research legally and ethically):

Karsten Loesing, Steven J. Murdoch, and Roger Dingledine. "A Case Study on Measuring Statistical Data in the Tor Anonymity Network", in Financial Cryptography and Data Security, volume 6054 of Lecture Notes in Computer Science, pages 203--215. Springer, Berlin, 2010. http://metrics.torproject.org/papers/wecsr10.pdf
Christopher Soghoian. "Enforced Community Standards For Research on Users of the Tor Anonymity Network", (forthcoming) in 2011 Workshop on Ethics in Computer Security Research, 2011. (on file with author)

Papers, discussed in Soghoian (above), that arguably cross the line in terms of ethical research practices:

Claude Castelluccia, Emiliano De Cristofaro, and Daniele Perito. "Private Information Disclosure from Web Searches", in Mikhail J. Atallah and Nicholas J. Hopper, editors, Privacy Enhancing Technologies, volume 6205 of Lecture Notes in Computer Science, pages 38--55. Springer, 2010. http://www.ics.uci.edu/~edecrist/PETS10.pdf
Damon McCoy, Kevin Bauer, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. "Shining Light in Dark Places: Understanding the Tor Network", in Proceedings of the 8th International Symposium on Privacy Enhancing Technologies, PETS'08, pages 63--76, Berlin, Heidelberg, 2008. Springer-Verlag. http://www.cs.washington.edu/homes/yoshi/papers/Tor/PETS2008_37.pdf

This file resides on the net:
http://josephhall.org/papers/elsi-022011.html (HTML)
http://josephhall.org/papers/elsi-022011.text (Markdown)