# Responsible Research with Anti-Censorship Technologies

*[Joseph Lorenzo Hall][1], Postdoctoral Researcher, UC Berkeley School of Information and Princeton Center for Information Technology Policy*

(16 Feb 2011 (v1.1); prepared for the DARPA/NSF meeting in Arlington, Virginia on Ethical, Legal and Social Issues of Personally Identifiable Information)

# Summary

Given recent efforts to degrade, intercept or thwart internet communications in repressive societies, more people in these countries will be drawn towards tools that circumvent such censorship measures so that they can communicate freely. Researchers and funders will also spend more of their time and grant resources investigating, improving and developing novel anti-censorship projects. This short note highlights a number of privacy-problematic activities in which researchers may engage, intentionally or not, and attempts to recommend some guidelines.

# What Are Anti-Censorship Tools?

Anti-censorship tools represent a broad swath of different types of technology. Basic techniques used by anti-censorship technologies include:

* **Encrypted communications:** where a secret key is used between communicating parties to shroud the content of communications.
* **Steganography:** where communication signals are embedded covertly in seemingly normal communications.
* **Proxy communication:** where a communication is passed to an independent party who then delivers the communication's contents to its intended destination.

Many tools practically available for anonymous communication or circumvention of censorship combine a few of these techniques. For example, clients of the "onion-routing" network [Tor][3] wrap each communication in layers of encryption ("onion skins") and pass it through a series of proxy server "nodes" in the Tor network, each of which can decrypt only the outermost remaining layer, before the message is finally sent to its destination (a toy sketch of this layering appears below).

# Research Efforts

There has been a lot of academic research into these kinds of tools, including efforts to improve/degrade tool performance, improve/defeat anonymity guarantees, detect/hide malicious nodes in the network and mask/profile tool users and the uses to which these tools are put. There is an active community of academic researchers in anonymous communications, and they tend to congregate annually at the [Privacy Enhancing Technologies Symposium][2]. A good bibliography of the literature is available online.

Continued research is needed to improve these tools' resistance to profiling, filtering, blocking, etc. That is to say, like any adversarial security game, the adversaries are working hard to upset anonymous communication, and if research and development efforts stop, there will be no truly anonymous communication capability. Ideally, a repressive regime should have to resort to the Egypt option of completely "turning off the internet", or perhaps blocking all encrypted traffic, in order to thwart people's ability to communicate freely. (Of course, steganographic methods can be used in unencrypted communications, but they tend to be less efficient, requiring dramatically more bandwidth per message.)

# Preserving Privacy in Research (Or, At Least, Minimizing Harm)

Research using these tools must walk a very fine line. These tools are actively maintained by people who care dearly about the continued availability of free communications, and they are used by real people who may very well be risking their lives and freedom to communicate.
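To ground the concerns that follow, here is a minimal sketch of the layered ("onion") encryption described above. It is an illustrative toy, not Tor's actual protocol: it assumes the third-party Python `cryptography` package, the relay names are hypothetical, and real circuits negotiate a separate key with each relay rather than sharing one key table.

```python
# Toy layered ("onion") encryption -- an illustration, NOT Tor's real protocol.
# Assumes the third-party `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

# Hypothetical three-relay circuit; each relay holds only its own key.
RELAYS = ["entry", "middle", "exit"]
KEYS = {name: Fernet.generate_key() for name in RELAYS}

def wrap(message: bytes, path: list) -> bytes:
    """Client side: encrypt for the exit first, then wrap outward to the entry."""
    for name in reversed(path):
        message = Fernet(KEYS[name]).encrypt(message)
    return message

def unwrap(onion: bytes, path: list) -> bytes:
    """Relay side: each relay peels exactly one layer with its own key.

    A relay in the middle recovers only another ciphertext, which is why an
    intermediate node (or a researcher operating one) cannot read the content.
    """
    for name in path:
        onion = Fernet(KEYS[name]).decrypt(onion)
    return onion

if __name__ == "__main__":
    onion = wrap(b"hello, free press", RELAYS)
    assert unwrap(onion, RELAYS) == b"hello, free press"
```

The point of the sketch is simply that content is opaque to intermediate relays by design; the privacy risks discussed below arise mostly from what researchers add around such a system, such as logging and instrumentation, not from the cryptography itself.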
There are a number of concerns that researchers need to avoid altogether or, if that is impossible, minimize as much as possible:

1. **Disruption:** Research activities can bring parts of the network down or degrade its performance.
2. **Interception:** Sniffing actual communications contents and routing information can identify users and put researchers at risk of prosecution for violating wiretapping and "pen register" laws.
3. **Dependency:** Removing research resources that the system has come to rely upon can manifest as inconsistency to users and put sudden loads on non-research elements of the tool.
4. **Profiling:** Fingerprinting end users and their devices, and associating them with one another, can identify users, their behavior and their associational patterns.
5. **Reidentification:** Data that has been effectively "scrubbed" or "anonymized" can, in certain circumstances, be de-anonymized in the presence of other types of data.
6. **Exposure to Future Capability:** Some technologies, such as encryption, base their security models on assumptions about the resources an attacker can bring to bear on defeating protections. Of course, an attacker's resources or capabilities may change dramatically in the future.

# Recommendations

Of course, the potential benefits of a particular piece of research always need to be balanced against the risks to the human subjects who might be affected by the research activity. The simplest recommendation we can make is this: when a research effort does not need actual humans to demonstrate its goals, it should not use them.

There will be cases where a research project is deemed sufficiently important and in need of real-world user data, activity or interaction, or where it simply needs aspects of a system that are substantially richer and more varied than could be created in the lab with reasonable resources. In these cases, it seems important to follow a few guidelines:

* **Seek external human subjects advice:** Traditional Institutional Review Boards are simply not going to understand the complexity of these tools and the associated risks to human subjects. It would be important to discuss the research project with the anti-censorship tool builders, but it also seems that the anonymous communication academic field, such as those who attend [PETS][2], could set up an informal IRB-like committee, such that affirmative approval from this group could serve as a mark of due diligence that privacy and security are receiving sufficient attention and ex-ante peer review. (Soghoian 2011, in a forthcoming paper on responsible research of the Tor network, advocates for program committees and editorial boards to enforce pro-privacy research practices by asking researchers for proof of IRB/legal clearance and reserving the right to reject outright research that crosses a few bright lines.)
* **When possible, notify users:** Some tools will not easily permit notice and/or informed consent. Certainly any interactive study, or study that runs code locally on user machines, should have full-blown informed consent. Network-based research should at least attempt to provide notice; for example, by listing research details, or a positive statement about what information is being captured, manipulated, etc., on a web-facing resource.
* **Disruption should be avoided:** No research effort should result in a tool becoming less useful, or inoperable, during a project.
This is especially important when the researchers know that a current event, such as the Egyptian revolution, might increase demand for a tool (despite the fact that these are also very interesting events to study!).

* **Cessation of resources should be phased:** If a research resource is providing or adding to the capabilities of an anti-censorship tool, taking it offline suddenly should be discouraged. Researchers should plan to gradually step down resource provision or, even better, find methods to hand off or sponsor these resources as an ongoing gesture of goodwill for the cooperation of developers in the research effort.
* **Communications routing from users should not be collected:** Since the user base for these tools includes people actively engaged in activities their governments do not approve of, it is important to make sure that communications routing information (IP addresses, etc.) is not captured at all. Some aggregate data here could perhaps be acceptable, but detailed individualized data raises too much of a risk (see Loesing et al. 2010, below, for a more informed and nuanced discussion of collecting such data responsibly for Tor; a sketch of aggregate-only collection follows this list). Destination routing information to large, unindividualized hosts is perhaps less problematic, although peer-to-peer communications raise essentially the same issues.
* **Communications content should never be captured:** Contents of communications are explicitly protected in many countries by wiretapping laws. A researcher puts herself and the tool's users at great risk by capturing and storing communications contents. While no researchers who have examined communications content have yet been prosecuted, that should not give comfort. In addition, capturing communications contents will undoubtedly expose sensitive information that could be used to identify users, expose secret key material (passwords, crypto keys, etc.) or frustrate these users' ability to communicate and organize in secret.
* **Data should be carefully controlled and then securely deleted as soon as possible:** Most researchers want their data for eternity and a day, considering the effort required to produce it and the possibility of using it in future research. For anonymity tools, this can be especially dangerous. Repressive regimes or law enforcement can easily subpoena data and get wholesale access to records they may not otherwise have been capable of, or legally permitted to, obtain. This recommendation should help to combat reidentification threats and risks of future capability that would otherwise render moot the protections afforded by anti-censorship tools. (See Soghoian 2011 for a discussion of the trade-offs associated with ensuring that all data analysis is done on-the-fly, in ephemeral (RAM) memory, such that persistent records are never created.)
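As one way to act on the aggregate-only and minimize-then-delete recommendations above, the sketch below counts connection events per country in coarse time bins, keeps its working state in memory, suppresses and rounds small counts, and never handles per-user identifiers such as IP addresses. It is a hypothetical illustration under those assumptions (the function name, bin size and thresholds are invented for this note), not the Tor project's actual measurement code; see Loesing et al. 2010 for how such collection has been approached for Tor in practice.

```python
# Hypothetical sketch of aggregate-only measurement: bin coarsely, aggregate in
# memory, suppress and round small counts, and never touch per-user identifiers.
from collections import Counter

BIN_SECONDS = 3600   # one-hour time bins (assumed granularity)
THRESHOLD = 8        # drop (time bin, country) cells with fewer observations
ROUND_TO = 8         # round surviving counts to a multiple of this value

def aggregate(events):
    """Aggregate (timestamp, country_code) pairs observed on the fly.

    IP addresses and other routing identifiers are deliberately never part of
    the event tuples, so they cannot end up in the output or on disk.
    """
    counts = Counter()
    for timestamp, country in events:   # processed one event at a time, in RAM
        time_bin = int(timestamp) // BIN_SECONDS
        counts[(time_bin, country)] += 1
    # Suppress rare cells and coarsen the rest before anything is persisted.
    return {
        cell: ROUND_TO * round(count / ROUND_TO)
        for cell, count in counts.items()
        if count >= THRESHOLD
    }
```

Even aggregates such as these can leak information when cells are sparse, so bin sizes and thresholds should be chosen conservatively, and the in-memory state should be discarded as soon as the aggregate is written out.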
# References

A couple of papers talk about the need to conduct research on anonymity tools, notably Tor, in an ethical manner (see their references for very good, more general work on conducting cybersecurity research legally and ethically):

* Karsten Loesing, Steven J. Murdoch, and Roger Dingledine. "A Case Study on Measuring Statistical Data in the Tor Anonymity Network", in *Financial Cryptography and Data Security*, volume 6054 of *Lecture Notes in Computer Science*, pages 203--215. Springer, Berlin, 2010.
* Christopher Soghoian. "Enforced Community Standards For Research on Users of the Tor Anonymity Network", (forthcoming) in *2011 Workshop on Ethics in Computer Security Research*, 2011. *(on file with author)*

Papers, discussed in Soghoian (above), that arguably cross the line in terms of ethical research practices:

* Claude Castelluccia, Emiliano De Cristofaro, and Daniele Perito. "Private Information Disclosure from Web Searches", in Mikhail J. Atallah and Nicholas J. Hopper, editors, *Privacy Enhancing Technologies*, volume 6205 of *Lecture Notes in Computer Science*, pages 38--55. Springer, 2010.
* Damon McCoy, Kevin Bauer, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. "Shining Light in Dark Places: Understanding the Tor Network", in *Proceedings of the 8th International Symposium on Privacy Enhancing Technologies*, PETS'08, pages 63--76, Berlin, Heidelberg, 2008. Springer-Verlag.

[1]: http://josephhall.org/
[2]: http://petsymposium.org/
[3]: http://www.torproject.org/

----

This file resides on the net: (HTML) (Markdown)