Protecting Genetic Information in EHRs

Federal Adviser Dixie Baker Addresses Key Issues

Marianne Kolbasuk McGee (HealthInfoSec) • July 23, 2013

Federal advisers haven't yet sorted out security standards for genomic data in electronic health records. For now, EHRs collect and protect genetic test results the same as any other laboratory test results, says security specialist Dixie Baker, who serves on several federal advisory panels.

"The same standards are used to order the [genetic] test and receive results as used for any other lab test. In my opinion, this is as it should be," Baker says in an interview with HealthcareInfoSecurity (transcript below).

On the other hand, a whole genome sequence, a complete record of all of the DNA contained in a single cell of an individual, is too large and too sensitive to store within an EHR, Baker says. "It's an ordered listing of all the genetic material in a strand of DNA."

Whole genome sequencing remains relatively rare in routine medical practice, Baker says. "Most sequencing is performed within a research context," she says, and storing a single sequence requires about 150 gigabytes.

"It's impractical to store a whole genome sequence within an EHR repository," Baker says. "It's more likely the whole genome sequence file itself will be stored in a separate repository that's linked to the EHR," similar to how a PACS system stores medical images.

Whole genome sequences are also much more sensitive than genetic test results, Baker says, because sequences are unique to the individual and disclose personal information as well as information related to parents and siblings.

"A whole genome sequence needs to be very strongly protected, separate from the individual's demographic information and phenotypic information," she says.

In the interview, Baker also discusses:

Privacy challenges involving pharmacogenomics, a promising area of personalized medicine in which drug treatments are based on an individual's genetic information;
Why some patients may prefer enabling easy access to some of their genetic information for certain medical research or in a medical emergency;
How genomic data should be protected as patient information gets shared among healthcare providers and researchers.

Before becoming senior partner at the consulting firm Martin, Blanck & Associates in 2012, Baker served as chief technology officer for the health and life sciences business at Science Applications International Corp. Since May 2009, Baker has served as a member of the Health IT Standards Committee, which advises the Office of the National Coordinator for Health IT. Baker chairs the committee's Privacy and Security Workgroup and the Nationwide Health Information Network Power Team. She also serves on the Health IT Policy Committee's Privacy and Security Tiger Team.

Personalized Medicine and Genetic Data

MARIANNE KOLBASUK MCGEE: Briefly describe what you think are the most promising areas in the quickly advancing field of personalized medicine and how genetic data fits in.

DIXIE BAKER: The use of genomic data to personalize care is likely to take many forms, involve different types of genetic information and introduce different levels of privacy risk. Looked at a different way, the use of genomic data can significantly reduce health and safety risks by helping reduce the likelihood that prescribed drugs and therapy regimens will adversely affect the patient, while at the same time it will introduce new privacy risks.

The most immediately beneficial use of genomic data is around what's called pharmacogenomics, which is the use of an individual's genetic data to predict a response to particular drug treatments. The most commonly cited example is drug sensitivity to Warfarin or Coumadin, which is a blood thinner that's given to about 2 million people in the United States every year. Determining the optimal amount to prescribe is challenging since there's no single dose that's right for everybody. Too much Warfarin will put the patient at risk for bleeding, and too little can lead to clots which could lead to heart attacks, stroke and even death. Genomic data can be used to determine an individual's sensitivity to Warfarin. The more sensitive they are, the lower the dosage they require.

Another example is the use of genomic data to predict an individual's response to certain treatment regimens. For example, thanks to Angelina Jolie, we know that BRCA1 and BRCA2 are genes associated with breast cancer. Similarly, there are specific genetic patterns that can be used to predict an individual's response to chemotherapy used in the treatment of breast cancer. Knowledge of whether an individual's genes are consistent with these patterns can not only avoid needlessly putting patients through chemotherapy and the side effects, but also can avoid the cost of ineffective treatments.

Data Privacy Challenges

MCGEE: What are the data privacy challenges involved with those examples of personalized medicine?

BAKER: I think the biggest challenge to data privacy is the fact that there are no absolutes. Every individual perceives privacy differently, and that perception will change over time and circumstances. No data are inherently private just because they disclose genetic information. Our laws, practices and privacy technology need to be sensitive to this fact.

For example, in the case of Warfarin sensitivity, one might choose not to keep this information private but to make it very public. In fact, you might want to record it on a health-alert wristband so that in case of an emergency the paramedics and the ER staff will be aware of the sensitivity. In the case of data that predict an individual's response to chemotherapy, one may not want to record this information on a health-alert wristband, but most certainly you will want your provider to know about it. Both of these genetic indicators can be derived from a whole genome sequence, the disclosure of which involves much greater risk.

Whole Genome Sequencing

MCGEE: What's the difference between whole genome sequencing versus specific genetic testing, and what are the data privacy challenges with each?

BAKER: A whole genome sequence is a complete record of all of the DNA contained in a single cell of an individual. It's an ordered listing of all the genetic material in a strand of DNA. Whole genome sequencing, then, is the laboratory process for creating such a record based on a biological sample from the individual, such as a saliva or blood sample. Generating the whole genome sequence does not answer any specific question other than, "What does this person's DNA look like?" A genetic test, on the other hand, is a specific laboratory test that answers a very specific question based on looking at a single gene. For example, there's a genetic test to determine whether an individual has a BRCA1 or BRCA2 mutation.

Protecting Genetic Info in EHRs

MCGEE: How should genomic information be protected as it's incorporated into electronic health records, and what are the challenges?

BAKER: First, I want to clarify that the Health Information Technology Standards Committee that I serve on has not directly addressed standards around genomic data. Nor has the Health Information Technology Policy Committee addressed genetic privacy policy. All I can tell you is where things stand now and where I think they're headed.

Today's electronic health records record and protect genetic test results the same as they record and protect any other laboratory test results. The same standards are used to order the test and receive results as are used for any other lab test. In my opinion, this is as it should be. Right now, there's not much whole genome sequencing going on in routine medical practice. Most sequencing is performed within a research context or by individual consumers ordering their own genome sequences. A single genome sequence contains about 3 billion base pairs and requires about 150 gigabytes of storage. It's impractical to store a whole genome sequence within an EHR repository. It's more likely that the whole genome sequence file itself will be stored in a separate repository that's linked to the EHR. I see this as similar to how images are managed using a PACS system today.

In addition to the storage challenge, there is the sensitivity of the whole genome sequence. A whole genome sequence is unique to the individual and it discloses not only attributes and conditions that the individual may already know about - like their hair color, eye color and gender - but also conditions that may lie in the individual's future. Also, a genome contains information relating to the individual's parents and siblings. A whole genome sequence needs to be very strongly protected, separate from the individual's demographic information and phenotypic information.

Securing Shared Data

MCGEE: How should genomic data be protected as patient information gets shared among healthcare providers and researchers?

BAKER: As I mentioned earlier, the term genomic data covers the entire sensitivity spectrum, from something that should be openly disclosed, like the sensitivity to Warfarin, to something that should be protected at the highest level, like a whole genome sequence. I see genetic test results being shared the same way as any other lab result.

However, sharing of a whole genome sequence requires much more discretion and a higher level of protection. "Privacy and Progress in Whole Genome Sequencing," a report written by the Presidential Commission for the Study of Bioethical Issues, talks about three ways to share data: use, access and possession. "Possession" involves holding a copy of the data; "access" involves seeing the data; and "use" involves seeing a result that's derived from the data. We should minimize the number of entities that possess a copy of a whole genome sequence. Ideally, providers and researchers would be able to query the sequence without having direct access to it at all. And if access is essential, they should be allowed to view the data as an image without having the actual data bits flow to their computer. Whole genome sequences should be encrypted both at rest and in transmission.
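As a rough illustration of the "use" tier described above, the following Python sketch shows a holder of a sequence answering a yes/no question without ever handing the sequence to the requester. The class name SequenceVault, the method base_matches, the requester ID and the toy eight-letter sequence are all hypothetical, invented for illustration; encryption at rest and in transit is assumed to be handled elsewhere and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceVault:
    """Hypothetical holder of one individual's sequence (the sole 'possessor')."""

    _sequence: str = field(repr=False)  # kept out of repr; never returned to callers
    audit_log: list = field(default_factory=list)

    def base_matches(self, requester: str, position: int, expected_base: str) -> bool:
        """'Use' tier: return only a result derived from the data."""
        self.audit_log.append((requester, position))  # record who asked about what
        return self._sequence[position] == expected_base


# Toy data, not a real genome.
vault = SequenceVault("ACGTTAGC")
answer = vault.base_matches("dr_smith", 4, "T")
print(answer)  # True
```

The requester learns only the boolean answer, and every question is logged. A production system would also need authentication and limits on how many questions a requester may ask, since enough yes/no answers could reconstruct the sequence.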

Technology Solutions

MCGEE: Are there any promising technologies or processes that you think will be important to improving the protection of genetic data as personalized medicine advances? If so, what are they?

BAKER: I think technology that enables federated query across multiple data stores will be extremely important. Federated query will enable a provider or researcher to pose a question to multiple data stores simultaneously and to then receive a single response without having direct access to the data themselves. This is what the Query Health Initiative out of the Office of the National Coordinator addressed. This capability will become increasingly important to the realization of what we're calling the "learning health system."
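The federated query idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Query Health is implemented: the DataStore class, the count_where and federated_count names, the clinic names and the warfarin_sensitive field are all hypothetical, and the stores are queried one after another rather than simultaneously.

```python
from typing import Dict, List

Record = Dict[str, object]


class DataStore:
    """Hypothetical repository that answers questions locally."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self._records = records  # records never leave the store

    def count_where(self, field_name: str, value: object) -> int:
        """Return only an aggregate count, never the matching records."""
        return sum(1 for record in self._records if record.get(field_name) == value)


def federated_count(stores: List[DataStore], field_name: str, value: object) -> int:
    """Pose one question to every store and merge the counts into one response."""
    return sum(store.count_where(field_name, value) for store in stores)


stores = [
    DataStore("clinic_a", [{"warfarin_sensitive": True}, {"warfarin_sensitive": False}]),
    DataStore("clinic_b", [{"warfarin_sensitive": True}]),
]
total = federated_count(stores, "warfarin_sensitive", True)
print(total)  # 2
```

The question is expressed as a field and a value rather than as arbitrary code, so each store decides what it is willing to compute and the researcher sees only the combined count.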

I also think technologies for codifying, managing and enforcing consumer preferences and permissions for the sharing and use of their personal information will become very important. The current model of every provider and every researcher collecting patient signatures on consent forms that have been filed away, totally disconnected from the use of the individual's electronic information, is unsustainable for two reasons. First, as patient information becomes more broadly shared, the electronic faxing of signed consent forms with every exchange becomes impractical, disruptive and even less likely to be enforced. Second, as consumers are becoming more engaged in their own care - and, I might add, more computer-savvy - they're demanding more control over and visibility into how their own information is used. As you know from an earlier interview with Greg Biggers [see: Data Registry Gives Patients Control], the Registries for All - or Reg4All - consumer-driven research registry has incorporated these kinds of permissions-management capabilities.
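One way to picture machine-enforceable permissions of the kind described above is a registry that is consulted before any data is released. The Python sketch below is purely illustrative and does not describe how Reg4All works: the ConsentRegistry class, its methods, the patient ID and the category and purpose labels are all hypothetical.

```python
from collections import defaultdict


class ConsentRegistry:
    """Hypothetical store of patient-granted permissions, denying by default."""

    def __init__(self) -> None:
        # patient_id -> set of (data_category, purpose) pairs the patient allows
        self._grants = defaultdict(set)

    def grant(self, patient_id: str, category: str, purpose: str) -> None:
        self._grants[patient_id].add((category, purpose))

    def revoke(self, patient_id: str, category: str, purpose: str) -> None:
        self._grants[patient_id].discard((category, purpose))

    def is_permitted(self, patient_id: str, category: str, purpose: str) -> bool:
        """Allow a release only if the patient explicitly granted it."""
        return (category, purpose) in self._grants.get(patient_id, set())


registry = ConsentRegistry()
registry.grant("patient-1", "warfarin_sensitivity", "emergency_care")

emergency_ok = registry.is_permitted("patient-1", "warfarin_sensitivity", "emergency_care")
research_ok = registry.is_permitted("patient-1", "whole_genome_sequence", "research")
print(emergency_ok, research_ok)  # True False
```

Because the check runs at the moment of each exchange, a patient's change of mind takes effect immediately, which a signed form filed away in a cabinet cannot do.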