When President Barack Obama hosted a summit earlier this year on his research initiative to gather DNA samples and medical histories on a million volunteers, he highlighted the delicate balance between science and privacy inherent in genomic information.
“We’ve got to figure out, how do we make sure that if I donate my data to this big pool, that it’s not going to be misused, that it’s not going to be commercialized in some way that I don’t know about,” Obama told a panel discussion. “And so we’ve got to set up a series of structures that make me confident that if I’m making that contribution to science that I’m not going to end up getting a bunch of spam targeting people who have a particular disease I may have.”
Even before the launch of Obama’s Precision Medicine Initiative Cohort Program, millions of Americans had already had their DNA tested for insights into health, longevity, paternity, ancestry and more. It has become a lucrative field, and represents the fastest growing segment of the $830 million DNA lab industry, according to a March report from IBISWorld.
Labs processing genetic information on our behalf do generally take steps to protect privacy by anonymizing and safeguarding the details. But most direct-to-consumer DNA testing companies also admit that fully anonymizing genetic information is a practical impossibility, and in any case, many either share genomic data by default, or encourage their customers to allow it. In part, this is because vast collections of DNA data could lead to very real advances in medicine, allowing researchers to uncover patterns and linkages across millions of genetic samples, improving their ability to diagnose and treat common ailments like cancer and heart disease, as well as rarer illnesses that are still poorly understood.
But they also could be used — or misused — for a variety of reasons. Some experts, for example, worry that genetic information could be exploited for identity theft. Using the same databases, marketers could also target individuals with particular diseases — or in a more unnerving scenario, those whose genetic information suggests they are likely to develop certain illnesses in the future. And in all these cases, individuals might well have little control, given that a cousin, sibling, or other relative sharing DNA information on an ancestry website could unveil telling insights into a whole family’s health issues.
Despite these concerns, the virtues of pooling genetic information are frequently extolled — as are the sort of privacy safeguards that many consumers might assume are ironclad.
“We encourage data to be shared if anonymized and consented,” says Jay Flatley, executive chairman of Illumina, which made a splash in 2014 by announcing the first full genome sequencing for $1,000. “This is the only way we are going to ramp up the discovery rate in human genomics.”
And yet, as noted by Linda Avey, the co-founder of 23andMe — the popular genetic testing and ancestry research firm — nothing is foolproof. “It’s a fallacy,” she said, “to think that genomic data can be fully anonymized.”
Indeed, although researchers and commercial companies say they anonymize genomic information before sharing it, many acknowledge that future privacy is a false promise when it comes to genes. “DNA is so unique, and there are so many data sources out there, that it is incredibly hard to fully anonymize — and more so to promise and provide any absolute guarantee that the data are anonymized,” says Laura Lyman Rodriguez, the director of policy, communications and education at the National Human Genome Research Institute.
Researchers have already re-identified people from their publicly available genomic data. For example, one 2013 study matched Y-chromosome data with names posted in places such as genealogy sites. In another study that same year, Harvard Professor Latanya Sweeney re-identified 84 to 97 percent of a sample of Personal Genome Project volunteers by comparing gender, postal code and date of birth with public records. (Like the U.S. government effort to gather medical data and blood samples on a million volunteers, the Personal Genome Project hopes its information will lead to new scientific breakthroughs.)
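The record-matching idea behind that kind of re-identification can be illustrated with a minimal sketch. It assumes two made-up datasets: a "de-identified" research file and a public record such as a voter roll, both carrying gender, postal code and date of birth. The field names, sample IDs and people are hypothetical, and this is not the method or data used in the study itself.

```python
from collections import defaultdict


def reidentify(deidentified, public_records):
    """Link records that share gender, postal code and birth date.

    A sample counts as re-identified only when exactly one person
    in the public records has the same three attributes.
    """
    index = defaultdict(list)
    for person in public_records:
        key = (person["gender"], person["postal_code"], person["birth_date"])
        index[key].append(person["name"])

    matches = {}
    for record in deidentified:
        key = (record["gender"], record["postal_code"], record["birth_date"])
        names = index.get(key, [])
        if len(names) == 1:  # a unique combination points to one person
            matches[record["sample_id"]] = names[0]
    return matches


# Hypothetical data: no names in the research file, names in the public file.
deidentified = [
    {"sample_id": "S1", "gender": "F", "postal_code": "02138", "birth_date": "1970-03-14"},
    {"sample_id": "S2", "gender": "M", "postal_code": "02139", "birth_date": "1985-11-02"},
]
public_records = [
    {"name": "Alice Example", "gender": "F", "postal_code": "02138", "birth_date": "1970-03-14"},
    {"name": "Bob Example", "gender": "M", "postal_code": "02139", "birth_date": "1985-11-02"},
    {"name": "Carl Example", "gender": "M", "postal_code": "02139", "birth_date": "1985-11-02"},
]

matches = reidentify(deidentified, public_records)
print(matches)
```

In this toy case only sample S1 is linked to a name, because two people in the public file share S2's combination of attributes. The more unusual a combination is, the more likely it points to a single person.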
A 2015 study re-identified nearly a quarter of a sample of users sequenced by 23andMe who had posted their information to the sharing site openSNP. “The matching risk will continuously increase with the progress of genomic knowledge, which raises serious questions about the genomic privacy of participants in genomic datasets,” concludes the paper in Proceedings on Privacy Enhancing Technologies. “We should also recall that, once an individual’s genomic data is identified, the genomic privacy of all his close family members is also potentially threatened.”
After such re-identifications, researchers sought to strengthen privacy protections — only to see others find new holes. Even if outsiders can only ask yes-or-no questions about aspects of a genome, with enough questions they can unmask someone’s identity among a database of 1,000 people, a 2015 study found.
Using such information, third parties may be able to learn details such as whether someone suffers from cancer or has an autism risk in the family, says the study’s lead author Suyash Shringarpure.
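A toy sketch can show why yes-or-no answers leak information. It assumes an attacker who already holds a copy of the target's genetic variants and a query service that reveals only whether anyone in the database carries a given variant. The function names, the variant encoding and the data are hypothetical, and the simple hit-counting score stands in for the more careful statistical test a real study would use.

```python
def beacon_answers(database, variant):
    """Yes-or-no service: True if any genome in the database carries the variant."""
    return any(variant in genome for genome in database)


def membership_score(target_variants, database):
    """Fraction of the target's variants that draw a 'yes' from the service."""
    hits = sum(beacon_answers(database, variant) for variant in target_variants)
    return hits / len(target_variants)


# Hypothetical variants written as (chromosome, position, allele).
member = {("chr1", 1000, "A"), ("chr2", 2000, "T"), ("chr7", 3000, "G")}
other = {("chr1", 1500, "C"), ("chr2", 2000, "T")}
database = [member, other]

# Someone outside the database who happens to share one common variant.
outsider = {("chr3", 4000, "C"), ("chr2", 2000, "T"), ("chr9", 5000, "A")}

member_score = membership_score(member, database)
outsider_score = membership_score(outsider, database)
print(member_score, outsider_score)
```

Every query about the member's variants comes back "yes," while most queries about the outsider's do not. With enough rare variants to ask about, that gap can be enough to suggest that a specific person is in a database, and with it whatever the database implies, such as participation in a disease study.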
For those considering whether to share data, it is difficult to understand whether and how labs share genomic profiles — and what risks may exist. Sometimes, the privacy guarantees sound pretty reassuring. For example, ancestry site BritainsDNA seeks permission to share “data with academic groups, but only using an ID number for your sample and never in a way which can be traced back to you.”
Several experts I asked expressed surprise that a company would claim genomic data could “never” be traced back to its source. “Why would you ever put that in there?” wonders Jacob Sherkow, an associate professor at New York Law School who studies technology issues.
BritainsDNA did not respond to requests for comment.
Ancestry.com, which has information on 1.5 million people in its AncestryDNA database, asks customers to consent to company-led research. Its consent language cautions that while “this project will not contain information that traditionally permits identification of you, such as your name and birth date, people may develop ways in the future that would allow someone to re-engineer the otherwise de-identified data.”
When I asked Cathy Petti, AncestryHealth’s chief health officer, whether this language fully accounted for recent re-identification experiments, she declined to give a direct answer, speaking instead of the “wide-ranging benefits for our entire species” of genomic research.
The Personal Genome Project takes a different approach. It requires its volunteers to answer a long series of questions correctly about the privacy risks of sharing genomic data. “Any promises of keeping data secure [are] also false if the intention is to share it, which it almost always is,” says director George Church.
Despite these risks of re-identification, some commercial DNA firms share by default. Veritas Genetics, a Personal Genome Project spinoff that recently announced full genome sequencing and interpretation for $999, shares “de-identified” data with public databases intended to aid science. Yet one would have to comb through several pages of legal disclaimers on the Veritas website to find any mention of the company’s policies on sharing DNA information for research purposes. Details of those policies appear near the bottom of Veritas’ informed consent form.
The fine print at 23andMe says the firm shares de-identified, aggregated DNA information by default with third-party firms to improve its service. More than 80 percent of customers also consent to share more widely, allowing the firm to share with or sell data to pharmaceutical companies such as Pfizer or Genentech, or to non-profits, says 23andMe privacy officer Kate Black.
“We understand that people are worried about it and that there are inherent risks here, so we like to take all the legal, contractual, and administrative precautions that we can to limit the scope of those risks,” she says. “We make sure all of our research partners and service providers are contractually obligated not to re-identify the information.”
By contrast, Sure Genomics, a Carlsbad, California company that offers full DNA sequencing, leaves it up to customers to decide whether or not to share DNA data.
“It’s our industry’s ethical responsibility to drive towards standardized language so when an individual shares their data they know exactly who has it and for what purpose — and feels confident their data is contractually protected,” says Warren Little, the CEO and founder. “If personal privacy is not protected, then data monetization without informed consent potentially becomes a reality. It’s a slippery slope.”
President Obama echoed this sentiment in February. “I would like to think that if somebody does a test on me or my genes, that that’s mine,” he said.
Still, lengthy and abstruse privacy policies often make it hard to know whether you have signed away your ownership rights. “If you look at enough terms of service and privacy policies you will see the word ‘may’ or ‘might’ being used a lot — as in we ‘may share’ — which leaves the door open,” says Jan Charbonneau, a PhD candidate at the Centre for Law and Genetics at the University of Tasmania in Australia.
“However, unlike privacy breaches in other online services where, for example, passwords can be changed, genetic data is irrevocable — and doesn’t just apply to an individual but to all their ancestors, family and future generations,” Charbonneau said.
U.S. law bars the sharing of some medical data, depending on who does the collecting, but the federal Health Insurance Portability and Accountability Act does not cover anonymized data at all. That leaves the door wide open for a variety of actors keen to get hold of such information, including commercial data brokers who gather dossiers on hundreds of millions of Americans, with information on what we buy, where we live and work, our wealth, and our health and other characteristics, to help companies target their sales and marketing messages.
“Genetic data outside of HIPAA-covered entities isn’t protected generally, just like all other health data,” says Washington D.C. privacy consultant Bob Gellman. There is a “big gap in protections here, and as genetic data comes into broader availability and use, it may pass into data-broker, profiling, and marketing files, just like other health data — except that genetic data may be about your kids too.”
Given the risks, experts suggest, people should be given a clear, understandable choice about whether and with whom to share their genomic information. To highlight the privacy stakes, New York Law School’s Jacob Sherkow quotes singer Joni Mitchell: “You don’t know what you’ve got till it’s gone.”
Adam Tanner is the 2016-17 C.W. Snedden Chair in Journalism at the University of Alaska Fairbanks and writer in residence at Harvard’s Institute for Quantitative Social Science. His next book, “Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records,” will be published in January.