>> good afternoon. i'm judith greenberg, acting director of the national institute of general medical sciences. i would like to welcome you to our annual dewitt stetten junior lecture. our speaker today dr. russ
altman of stanford university has broken new ground in applying computing technology to medicine, primarily in the area of pharmacogenomics. he began focusing on the intersection of computation and medicine long before it was trendy, during his graduate
studies at stanford. this positioned him well to mine the profusion of data that has emerged from the human genome project and a wealth of other basic science and clinical sources. dr. altman's forte is integrating different layers and
types of data into a coherent picture that suggests hypotheses and experiments to test them. he also develops and applies bioinformatics tools in novel ways that reveal conceptual and physical relationships. dr. altman received an a.b. in biochemistry and molecular
biology from harvard college, a ph.d. in medical information sciences from stanford, and an m.d., also from stanford. he became a stanford faculty member in 1992. in addition to his own research, he directed two nih-funded projects: the national center
for biomedical computation at stanford, which focuses on physics-based simulation of biological structures, and the pharmacogenomics knowledge base. dr. altman's honors include a presidential early career award for scientists and engineers and election to the institute of
medicine of the national academies. i refer you to the program for further details about his impressive background. i'm proud to say that nigms has supported dr. altman's work for more than a decade.
before i invite him to the podium i would like to take a moment to tell you about the highly regarded scientist for whom this lecture is named. dr. stetten spent most of his career at nih serving in two institutes and in the office of the nih director.
he was perhaps best known as nih deputy director for science, but prior to that time he served as the third director of nigms. in recognition of his many contributions to nih and to the scientific community as a whole, dr. stetten had the rare
distinction of having two nih entities named after him. this lecture is one of them, and the other is the dewitt stetten jr. museum of medical research, dedicated during nih's centennial in 1987. dr. stetten died in 1990. and now on to dr. altman's talk,
which is entitled the emerging network of data for understanding the interactions of genes and drugs. dr. altman. [applause] >> thank you very much for that nice introduction. i have had a great day visiting,
particularly nigms, and it's been very good. thank you for coming, even in the rain, to the talk, and hello to tv land out there. i understand there's thousands of you. hundreds, tens, whatever. i just want to say one word
about dr. stetten. of course, when you are invited for a distinguished named lecture, if you're smart you find out about the background of the person. there are many things you just heard about, and a few things i want to highlight that really
struck me and sounded familiar: he was born in new york in 1909, he did his m.d. at columbia university, and he did an internship and residency at bellevue hospital, which is exactly true of my grandfather. so i think there is a chance that my grandfather and
dr. stetten knew each other. i don't know that for a fact but those three facts are definitely true for both of them. so it's possible. that was funny because my grandfather would have had -- he passed away a few years ago but he would have had his 102nd
birthday yesterday, so i thought it was a nice little connection. so i'm very honored. i want to do my thanks up front. i want to thank nigms; they have supported my work for more than a decade. if you include the fact that they
supported my m.d. and ph.d. training through the mstp program, they have supported my work for three decades, for which i'm very grateful, and you have heard about -- (indiscernible) we have a training grant. and also the national library of
medicine, and ninds has made me, for the first time, part of an award from them. i want to acknowledge gifts and conflicts, which i don't think are relevant to today's talk, but i list them anyway. and i put my phone on mute. and i want to thank my
co-workers. these are some of the people in my laboratory who work very hard. almost all of them have a project that involves drugs; that's why i'm talking about drugs today. you will be hearing about the work of the people i have
circled and whose last names i've provided on this slide. of course everybody is working hard and i'm grateful for all their efforts. so i want to outline what i'm going to do today. in our laboratory, we have a focus on drug action,
very broadly, at the molecular, cellular and organismal level, with applications particularly in pharmacogenetics, understanding drug mechanisms, repurposing (using old drugs for new purposes), and discovering drug interactions. our tool kit, and this is i think
what makes us a little different, is entirely computational. i have space but i do not have a bench. so if there are experiments to be done, it's because i take an experimentalist out for coffee and convince them to do them, or i pay somebody to do them, but
they don't happen in my lab normally so we're really relying on using data and computational techniques to generate hypotheses but also to at least develop initial proof that these hypotheses are reasonable. and i'm going to tell you about -- we do this for discovery and
leveraging the investment. what i mean by leveraging the investment is there's a sense in informatics that valuable data sets are not completely mined for the nuggets of gold that they have within them. and i want to tell four stories today of data that was collected
through high-throughput experimental data collection efforts and published by others, where we have gone back over it and found some interesting things. i think the message that i'd like to get out today is that these informatics technologies, some of which are simple, some complex,
and we'll have a range in the talk today, are all potentially extremely powerful for understanding what's going on. and they're even more powerful in concert with one another. and the title of my talk was something about a network of
data. that's what i mean. any given modality may give you confidence that something is true but when multiple modalities start telling you something is true from an independent perspective you begin perhaps to believe it.
so the four stories i'll tell today go from molecular to population base, one of the beautiful things about not having a bench and not having any equipment is we can look very broadly at data from multiple sources without having to retool the lab.
so the first story is structural molecular data and drug repurposing. the second story is about the use of expression data to understand cellular mechanisms of action. then i want to talk about text mining as a way to predict drug
interactions. and then i'm also going to discuss drug interactions in the context of mining population-based databases and electronic medical records. so we're going to go from the molecular to the population. a lot of this comes
from the two projects that are nigms funded, which i want to briefly mention. first is the pharmacogenetics knowledge base. we have the goal of categorizing and collecting all information about how human genetic variation impacts drug response.
we have been in business for 11 years. this is the website; i'm proud of it but i won't talk about it. this is kind of the one-stop shop for the effects of genes on drug response, and i encourage you to visit it. one thing i record every year is
the number of hits on the site, so hit it from home, work, pda, as much as possible. but the key point about pharmacogenomics is that in many ways this is a very important goal for medicine, because you can only do pharmacogenetics when you
understand the details of drug action. so this is a pathway for how warfarin works that you can find. this is the pharmacodynamics pathway. we also have a pharmacokinetics pathway. if only we had these kinds of
pathways for all drugs, then we would know the molecular players that modulate the response to a drug. with that information, we could then look for variations in the human genome that change this response, and therefore we could do very robust
personalized prescriptions based on genome and other information. so for example, on pharmgkb you can find out that there are two genes for warfarin, a cytochrome p450 and the target, and that there are snps, single nucleotide changes in dna, that have specific impacts on
warfarin dose. in particular, for this particular snp in the cytochrome, aa has an average response to warfarin, whereas cc is associated with a decreased dose. so the vision for pharmacogenetics, and for all understanding of drug action, is to understand
the molecular mechanism and even how variations modulate that mechanism. so much of what we do is driven by this long term goal. we use pharmgkb to do annotation of a single human genome and the headline there for you is that there are hundreds of drugs that
we felt like we could come up with clinical advice. this was published in the lancet last year. i'm not going to go through the details, but i'll say this is not a one or two drug field. this is a field that has evidence for hundreds of drugs
that we're able to collect in annotating a single human genome. we published another analysis of a family quartet: a mom, dad, daughter, and son. i'm not going to talk about it, but i'll show this table from the paper, which lists variants
associated with adverse drug response. we have dad in dark blue, mom in dark pink, boy in light blue, girl in pink. and really this just shows that genetics is as advertised, sometimes the whole family agrees because they all have the
same genetics, but sometimes mom and dad and son agree and mom and daughter are more likely to cause a side effect versus less likely to cause a side effect. we can do this for many, many drugs. so we're making good progress in pharmacogenomics and there's
still work to be done. at the other end is my interest in three-dimensional structure and function. this is what simbios does, the national center for physics-based simulation, and one of the things we study there is g protein-coupled receptors.
g protein-coupled receptors, such as the beta-2 adrenergic receptor, are targets, as many of you know, for a large percentage of drugs on the market, but in many cases we don't have a deep understanding of how the drug binds or how it affects the conformation to create the
signaling pathways. we'd like to understand that. so one thing we do is put a gpcr, which you can hardly see, in a lipid membrane with water on the internal and external sides of the cell and simulate the dynamics in order to understand what the motions are that may
underlie function. so now i'll tell you four stories. let me begin with an apology: by its nature, four stories will be short and will not go into a ton of technical detail. almost all of this work is
published, but in order to get the big picture i need to go up to a little bit higher altitude. and so i apologize to anybody who finds this technically unsatisfying. so the structural genomics initiative has been a big push in the last ten years, led by
nigms. the goal was to get an increase in three-dimensional structures. this was fairly successful, with about 70,000 three-dimensional structures in the protein data bank now, and many structures are complexes of proteins with small molecule
ligands, including drugs. so a question you can ask, can we identify or use this information to predict new interactions between proteins and small molecule drugs based on the examples that we have. i'd like to tell you that i think the answer is yes.
so for the last few years my lab has been developing the feature system for describing microenvironments within proteins. so here you see a protein; this is a piece of a protein. i have drawn a blue circle around a microenvironment that's approximately seven angstroms in
radius. what we do is go to the microenvironments and build statistical models of them based on the occurrence of various biophysical and biochemical features, such as the types of atoms that are found, listed at the bottom here, the types of
residues that are found, the secondary structure, the charges, the hydrophobicity, and a long list of biochemical and physical properties which give a signature, a computational or statistical signature, of that site. we do many things with these
signatures. one thing is we look for other sites that are similar. we have shown previously that we can detect remote similarities in different proteins, proteins that are evolutionarily remote but have similar microenvironments and that may therefore have similar
function. one of the applications of this for drug or small molecule binding is shown here. what i have here are two protein structures both which bind flavins. you can see one flavin here and one flavin there.
i'll talk about those in a second. then you see the feature balloons. what these are are feature microenvironments shown in different colors. i have chosen the colors on left and right for environments that
were very similar based on a similar pattern of physical and chemical properties. so what you can see is there's a blue balloon here and a blue balloon there. there's a purple one, et cetera. what we found, and this is not surprising, was that different
types of environments were associated with binding different parts of the small molecule flavin. for example, here you have the adenine. let's take an easy one: we have negatively charged oxygen-rich areas in the middle which are near the purple and
the red and near the purple and red here. i want to point out the exact configuration of these balloons is not the same in the two structures but they're binding similar portions, so you can see the blue balloon is near this three membered ring, the triple
ring and so that is true here as well. and what we found is that feature is able to detect the similarity in these two binding proteins, quite distant evolutionarily but we were able to see the similarities for the common configuration of these
microenvironments. this is a hard example because flavins are known to have two conformations, the butterfly and the extended. butterfly is bent. you can tell me if you think that looks like a butterfly. and this is the extended
conformation. those lead to different configurations of the colored balls, and yet there's a conservation of the colored balls overall, just with different geometric locations. so we wondered if we could use feature for repurposing. say we
didn't have a flavin bound to this pocket: could we recognize that the colored environments that go with the flavin are present here, and that therefore there would be a possibility for the flavin to bind? that's the kind of idea for structure-based repurposing.
this also works for gpcrs. we have the beta 1, 2 and the adenosine receptor. somewhat divergent with different functions but they all have a common core feature environments that we think are related to the function. the function of these molecules
is different. they bind different drugs, but they go on to have similar conformational changes, and we think we can identify the environments that are conserved. finally, really more directly towards this possibility of repurposing, we were recently
looking at kinase structures, these are two kinases that are diverse in terms of their three dimensional structure. yet we saw a very similar constellation of microenvironments which led us to believe inhibitors binding this kinase might also bind this
kinase. and now we're going out, and i wish i had the results, but we're working with a company that does kinase assays to test the binding of a known ligand for this protein and see if it binds this protein. we don't know the answer yet but
we have some confidence based on the certainty scores and our calibration that this is a pretty good guess. what's the point? the point is using publicly available three dimensional structures and some algorithms we're able to come up with
hypotheses about small molecule binding that can get very far before we spend any time or resources doing experimental tests. so this is part of our emerging network. so i want to move now to molecular and cellular data, in particular expression data. since
about 1998 there's been an explosion of expression data from different cell types and under different conditions, where the level of expression of a gene is measured in a population of cells. in fact these measurements are kept at the gene
expression omnibus, or geo, maintained by the ncbi down the street. there are literally more than 20,000 human samples in that database, and there have already been published some examples of drug repurposing based on finding common gene expression
patterns, in fact in one case they took the gene expression pattern for a disease which genes were up, which were down and they found a drug that did the opposite, very simplistic idea that the drug should undo the disease but in vivo tests they showed some efficacy and
that it was not an unreasonable thing to do. this was in science translational medicine a couple of weeks ago from my colleague. so we wondered can we understand drug action using this publicly available data or part of the network if you will.
so you have all seen gene expression chips. this is a sample one; i made it up, but if you look into geo you will find 20,000 arrays covering all sorts of human biology: normal biology, cancers, rare diseases,
diabetes, cholesterol, pick your favorite disease, there's an array in geo measuring the gene expression. we used an algorithm called independent component analysis, or ica. a component is basically a hidden signal in the data and
i'll take you to a slide to show you the idea behind ica. this was published last year. so here is a number of arrays, and if you look carefully you can see some patterns: these arrays are all combinations of these three basic patterns, an apple, an rna, and something
that looks like a real microarray data set. and what we have done in creating these is we have weighted them different amounts to create different array results. and the hypothesis we had is there really are fundamental
modules or components of human expression such that all 20,000 human experiments in geo would be some sort of combination of basic biological modules, pathways, call them what you will. so independent component analysis is designed to pull out
such hidden components. so we applied our ica to the full set, we downloaded all of those arrays, 22,000 arrays, applied the independent component analysis, and defined 450 fundamental components that were sufficient to explain almost all the arrays in the
data set. so every array was a combination of these 450 components. we plot these components here in a two dimensional plot. i just do this to show you that when we look at the genes in each of these modules, they are not random but instead they
correspond to what you would expect to see in terms of functional modules. so in the upper left we have cell division, mitosis and dna replication type genes. over here we have cell differentiation, epidermal differentiation, we have immune
response, and so on. this is remarkable to us because we did not build in any knowledge of biology. we simply asked the mathematical question: what is the key set of basis vectors that allows us to concisely express these experiments? when we got those vectors and
looked at the highly ranked genes we found a lot of known biology. i should point out -- bad to have a cell phone anywhere near a mic. i should also point out there were very nice components that had no labels we didn't have any
theory why all the genes were together. it could be wrong but it also could be entirely unexplored parts of biology that the data is telling us it sees but that we don't recognize because we haven't learned that part of biology yet.
so side note, all the ones that don't have good annotations, such as cluster number 201 and cluster number 251, look just as good as the other clusters that do correspond to known biology. so the hypothesis would be that we need to do work on those genes and what their functions are.
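the mixing picture described above, where every array is a weighted combination of a small set of components, can be sketched in a few lines. this is a toy under loud assumptions: the three components, the six "genes", and the weights are invented for illustration, and the code does not perform ica itself. ica estimates the components from the arrays alone; here the components are given, and are chosen to be mutually orthogonal so that the weights can be read back by simple projection.

```python
# Toy illustration of the mixing model behind component analysis.
# All numbers are invented; this is NOT the ICA algorithm, only the
# "array = weighted sum of components" picture that ICA tries to invert.

def mix(weights, components):
    """Build one expression array as a weighted sum of components."""
    n_genes = len(components[0])
    return [
        sum(w * comp[g] for w, comp in zip(weights, components))
        for g in range(n_genes)
    ]

def project(array, components):
    """Recover the weights by projection.

    Valid only because the toy components are mutually orthogonal
    (each one touches a disjoint pair of genes).
    """
    weights = []
    for comp in components:
        dot = sum(a * c for a, c in zip(array, comp))
        norm = sum(c * c for c in comp)
        weights.append(dot / norm)
    return weights

# Three hypothetical modules over six hypothetical genes.
components = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]

true_weights = [2.0, -1.0, 0.5]
array = mix(true_weights, components)
recovered = project(array, components)
```

in the real analysis the roles are reversed: only the arrays are observed, and both the components and the weights have to be estimated from them.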
so we have these components. what do we do with them? it's about drugs. we looked at the response to a drug, parthenolide, a drug used in cancer. we had cd34 cells from 12 aml patients, all in geo, all in the same data set, and they had
samples from the cd34 cells before and after treatment. so the first question we asked was: what components as a group show up or down regulation in the treated or untreated patients? so this is the kind of picture i'm sure you have seen many times, where the treated and untreated
patients are the columns. but you're probably used to seeing genes along the rows. these are not genes, these are entire components. so component 373 is up, red is up, in the untreated and down in the treated patients, almost without exception.
so it's a module-based analysis instead of a gene-based analysis. here i just have statistics. and there are about ten or so of these components that are really markedly different in the treated versus the untreated. but because 373 was so
impressive, so red here and so green there, though not entirely, we took a closer look at that. we can look at the highly weighted genes in component 373. and we could look back in geo at the diseases that have a high contribution from component 373
when we do that combination of components to recreate their arrays. in a very satisfying result, we got a series of all and aml leukemias, which are indeed what parthenolide treats, but you wouldn't have had to tell us that, because when you analyze the gene expression changes they
were in the exact same component that seems the most dysregulated in these cancers. that was a confidence builder. we next go and build a pathway. what i show here is a pathway where the red are up-regulated and the green are down-regulated by parthenolide, and there's good
genes here: necrosis, nf kappa b and a number of kinases. so now we have a data-based method. with no prior understanding (i had never heard of parthenolide before this study and i'm not an expert at all), by doing this analysis i got my components, i looked at the
components, and now i have a very specific hypothesis about the pathways that parthenolide is affecting. any other disease that involves this pathway, especially if that disease up-regulates this pathway, would be a target for treatment.
this is how i can imagine a routine use of expression data to find repurposing opportunities. the recipe would be take a look at these genes and see if they're in general up regulated, red. and if they are, consider
parthenolide as a potential treatment. so again that's a short story. we published more about it, but i wanted to give a flavor of how the expression data can lead directly to repurposing and understanding of drug mechanism in more detail. i want to move to textual
information. we talked about molecular 3-d structure and cellular expression data. now let's look at text. pubmed holds 20 million abstracts. arguably it is where biomedical knowledge is stored.
if you had to point to one spot, you would point at a server room on lister hill and say that is holding basically the entirety of the biomedical literature, especially if you had full text through pubmed, et cetera, et cetera. but as a computational person i
have mixed feelings: i like to read it as a human, but it's a disaster for computers. computers are not good at reading written human text. but that's where the gold is. so we have to ask this question, and i ask it begrudgingly: can we train computers to read the
literature and reason over it, and in particular can we do it to predict drug-drug interactions? i want to show you that i think the answer is yes. so in the computer science field of natural language processing, or nlp,
there have been revolutionary advances in the last few years. one is the ability to do accurate parsing of sentences, so the nouns, the verbs, the prepositions, all the things you might have learned in the fifth grade, are identified by computers with good fidelity.
here is a sentence: the abcb1 c3435t polymorphism influences methotrexate sensitivity in rheumatoid arthritis patients. that's an actual sentence from pubmed which we can parse. you can see it recognizes some things as nouns, it recognizes
influences as a verb, et cetera. and it builds computational representations of the sentence. i will not go through this although it's a fascinating area where linguistics and computer science and in this case biology all meet. but i'll leave it at the fact
we're able to find the subject and object of sentences. we're interested in sentences that say gene verbs a drug or drug verbs a gene, like gene metabolizes drug or drug activates gene. these are sentences of intense interest to those who care about
pharmacology. one of the things we can do is we can recognize genes and drugs and diseases and we can recognize the relationships between them. so in this case we had an abcb1 polymorphism that influences methotrexate sensitivity.
in a special trick that i'm not going to discuss, we created a standardized vocabulary to map the arbitrary phrases into formalized phrases that are controlled terminologies. so this is a controlled term called variant, and we have mapped influences to the word affects,
and methotrexate sensitivity to the concept methotrexate drug sensitivity. what that enables us to do now is go through the entirety of pubmed and look for every single sentence that has a gene or a drug as the subject or the object, as well as the verb that relates them. we can look
for all the sentences with a gene and a drug, and we can look for the verb between them. when you do that to pubmed, you find 170,000 different relations between genes and drugs after the normalization with the controlled vocabulary. so actually we're up over around
500,000 sentences before normalization, but of those 500,000, many are saying the same thing, and they boil down to 170,000 that are saying different things. so what you see here, and i should add that if it looks familiar, it's the graphic on the advertising
poster for this lecture, produced by my student beth percha. we have a network that has drugs as green circles and genes as red circles. this shows you the complete knowledge of gene-drug interactions at the level of
pubmed. it's complicated. there are famous genes that we know: cyp2d6 is a hub that affects many, many drugs. and there are others. this is actually only a piece of the network. you can tell because there's
only about 30, 40 genes shown here. each sentence has a drug, a gene and a relationship. let me zoom in on a piece of this, in which we have the drug verapamil and abcb1, the mdr (multi-drug resistance) transporter, or p-glycoprotein.
the drug's metabolism is influenced by polymorphisms in abcb1. that's all represented in the computer in a controlled way. verapamil is metabolized by abcb1. the treatment causes repression of abcb1 and it regulates the
gene product of abcb1. so we learned quite a bit about verapamil, and it's all now, i'm happy to report, in a computer-legible form. so now we can do some magic, or informatics. we can chain together two
gene-drug relationships to look for unrecognized drug-drug interactions. so if each link in my network is a drug-gene interaction, then if i have a drug that interacts with a gene and another drug that also interacts with that same gene, that would be two of these, linked together,
and that might, depending on the type of relationship that is found between those drugs and the gene, be an indication of a drug-drug interaction. so we went to the 170,000 gene-drug interactions, provided a bunch of known drug-drug interactions, and learned the kinds of genes
that occur in the middle and the kinds of relationships that occur on the side in known drug-drug interactions, and then asked: are there any others that are unknown but match the pattern? so if this drug is metabolized by this gene and this drug induces this gene, then you might
expect these drugs would have an interaction. that's the kind of thing she learned by looking at positive examples. when she did that, she had a long list of drug-drug interactions. she's not a physician. for all the people who are not physicians,
their favorite use of their advisor, me, is to walk into my office and give me these long lists and say, is there anything useful here? what she did, thankfully, before she walked into my office was go to a known online resource with drug-drug interactions. and all the ones in white here
are known to be drug-drug interactions. so she did not use them in training, but she predicted them, and they're known, so we don't get credit for discovery because it was already known. for us it's a confidence builder because we didn't provide that information. and you can see
there are many, many positive ones. the ones in orange or yellow (the color difference is not important now; it will come up at the end) are ones where the strength is just as strong as the other ones on the list, but these have never been reported. so these are predictions.
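the chaining step can be sketched roughly as follows. this is a minimal illustration, not the method used in the work: it only enumerates pairs of drugs that share a gene and drops pairs already listed as known, and it leaves out the learning step that scores which genes and relationship types make a real interaction likely. the triple format, the relation names, and the function name `candidate_pairs` are assumptions made for the example; the three relations themselves are ones mentioned in the talk.

```python
from collections import defaultdict
from itertools import combinations

# Normalized (drug, relation, gene) triples. The format and the
# relation labels are hypothetical stand-ins for the controlled vocabulary.
relations = [
    ("metoprolol", "metabolized_by", "CYP2D6"),
    ("dextromethorphan", "metabolized_by", "CYP2D6"),
    ("verapamil", "regulates", "ABCB1"),
]

def candidate_pairs(relations, known=frozenset()):
    """Chain two drug-gene links through a shared gene.

    Returns (drug1, relation1, gene, relation2, drug2) tuples for every
    pair of distinct drugs linked to the same gene, skipping pairs that
    appear in `known` (a set of frozensets of drug names).
    """
    by_gene = defaultdict(list)
    for drug, relation, gene in relations:
        by_gene[gene].append((drug, relation))

    candidates = []
    for gene, links in sorted(by_gene.items()):
        for (d1, r1), (d2, r2) in combinations(sorted(links), 2):
            if d1 == d2:
                continue  # same drug mentioned twice for this gene
            if frozenset((d1, d2)) in known:
                continue  # already a documented interaction
            candidates.append((d1, r1, gene, r2, d2))
    return candidates

pairs = candidate_pairs(relations)
```

in this toy, verapamil produces no candidate because no other drug shares abcb1, while the two cyp2d6 substrates are linked through the shared gene.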
so let me talk you through a prediction and where it comes from. metoprolol is a beta blocker and dextromethorphan is a cough suppressant. so one of the sentences, in normalized form, is metoprolol is metabolized by cyp2d6, which comes from this very long
sentence saying that metoprolol and dextromethorphan were 2d6 substrates. though dextromethorphan is mentioned here, that's not what we take out of this sentence, written in 2000. we just take out that metoprolol is metabolized by 2d6. in a different abstract, in 2005, there's a sentence saying 2d6 also
metabolizes dextromethorphan. there are a lot of other features that look good based on the training set, so i'm going to call this a high-confidence prediction. if you look at all the genes in all those sentences that we found, 2d6 is clearly the one
that's doing the work in the relationship between metoprolol and dextromethorphan. so we're going to electronic medical records, in a way i'll describe now, to see if we can validate this by looking for patients on both drugs and seeing if they have unexpected side effects or
expected side effects. my final story is now population level: population-based and clinical data for generating hypotheses about drugs. fda releases adverse event reports regularly, every quarter, but these are
complex, because for each patient it gives all the drugs they're on, all the diseases they have, and all the side effects they experience, so many people looked and said this is a mess, there's nothing here of value. but we thought that maybe there was some value there.
and we also are intrigued by the possibility that electronic medical records do have information about drug exposure and the effects. so can we mine these two clinical sources to discover new drug, drug interactions? so what did we do?
we went to the fda database and we built a statistical model to distinguish glucose-altering drugs from other drugs based on their adverse event signature. so here we have diabetes drugs and a bunch of drugs for other diseases which patients who have diabetes also have but that are not
diabetes. there's a reason for that, i won't go into it. the important thing, along the column we have all the adverse events reported in the fda database. and what we looked for were adverse events that were
enriched in diabetes drugs compared to the controls. please think of these as controls cases. there's a heavier darkness compared to here, here and here and here. and so these are adverse events we see in abundance in diabetes
drugs and we use that to create a signature. the first thing we ask, doesr+ that signature recover all diabetes drug? with with 93% accuracy, i can tell you if something is a diabetes drug based on its side effects.
the side effects are shown here, and some are what you expect like diabetes out of control or paresthesia, depression, hypoglycemia, diarrhea, and anorexia. some are enriched in diabetics, and some are de-enriched or more rare in the red ones.
but these are a very good signature for drugs that alter glucose, including diabetes drugs. so my student had a clever idea. why don't we take patients who are on two drugs, neither of which is a diabetes drug, but let's see if they have glucose
altering effects, either up or down. so he took all drug pairs, he applied the signature, and he walked into my office and said russ, there are four pairs of drugs that as far as i can tell have nothing to do with diabetes but where i see a
glucose signal. they were these and the one that caught my attention was paroxetine, or paxil, and pravastatin. i knew there were lots of people on these two drugs so i said if that's real
ooh, that's interesting. there are about 15 million americans on each and probably about 500,000 to 800,000 americans on both of those two. the first thing we did is go to the stanford emr, the electronic medical record, and look for patients on one of
these, i call them p and p, who were on one p and had a glucose measurement and then got the other p and also had a second glucose measurement, all within 60 days. that's pretty strict and we only found 12 patients. but at stanford hospital those
12 patients had a pretty impressive increase in their glucose. now, but it's 12 patients. so yes, i'm an informatics guy but not an idiot. i know that that's an intriguing initial result but it is not probative of anything.
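The cohort rule just described can be sketched as a filter over patient records. This is a minimal illustration under one reading of the rule: the baseline glucose is the last measurement taken on the first drug alone, the follow-up is the first measurement after the second drug starts, and the two must fall within 60 days of each other. The record layout, the function `paired_glucose`, and all dates and values are hypothetical; the talk does not describe the actual query.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=60)  # the 60-day window mentioned in the talk


def paired_glucose(patient, window=WINDOW):
    """Return (baseline, follow_up) glucose in mg/dL, or None if the patient does not qualify."""
    starts = sorted(patient["drug_starts"].values())
    if len(starts) != 2:
        return None  # patient must have started both drugs
    first_start, second_start = starts
    labs = sorted(patient["glucose"])  # list of (date, value) pairs
    before = [(d, v) for d, v in labs if first_start <= d < second_start]
    after = [(d, v) for d, v in labs if d >= second_start]
    if not before or not after:
        return None  # need one measurement on each side of the second start
    base_date, base_val = before[-1]    # last value on the first drug alone
    follow_date, follow_val = after[0]  # first value on the combination
    if follow_date - base_date > window:
        return None
    return base_val, follow_val


# Invented example records, for illustration only.
p1 = {"drug_starts": {"paroxetine": date(2010, 1, 1), "pravastatin": date(2010, 2, 1)},
      "glucose": [(date(2010, 1, 20), 100), (date(2010, 2, 20), 125)]}
p2 = {"drug_starts": {"paroxetine": date(2010, 1, 1), "pravastatin": date(2010, 2, 1)},
      "glucose": [(date(2010, 1, 20), 100), (date(2010, 6, 1), 130)]}
p3 = {"drug_starts": {"paroxetine": date(2010, 1, 1)},
      "glucose": [(date(2010, 1, 20), 100)]}
```

Here `p1` qualifies, `p2` fails the window, and `p3` never started the second drug. A filter this strict keeps the comparison clean, which is why so few patients remain at a single site.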
in fact we looked at paxil alone and they were pretty flat and the combination did seem to go up but there were funky differences in the baseline and 12 was just not enough. look at those error bars. so i called up my friends at vanderbilt and harvard, showed
them these results and said hey, could you go into electronic medical records? dan roden, can you go into your records and check? so vanderbilt came back, they had 20 patients and saw the same effect and harvard had 106 and they really saw the effect.
so i think all together, i don't have the all together, but all together there's a very clear, tight error bar, 20 milligram per deciliter increase in glucose in this retrospective analysis of the electronic medical record, with the individual drugs alone over 60 days showing no bump.
we also looked at other combinations of antidepressants and cholesterol drugs and none had any significant bump. this one had an insignificant bump, barely significant, barely insignificant, but this is specific to those two drugs,
which we don't understand yet. most concerning was the fact that in diabetic patients the rise in glucose was 50 milligrams per deciliter with no change in baseline when you start one drug or the other drug alone. so we submitted this paper, it was rocky because it was
computational. we did this crazy data mining thing to get the hypothesis and then we did data mining to get evidence in favor. i'm happy to report the paper was accepted based on this but we knew that people want in vivo data so not knowing what
would happen to the paper we exposed mice to the drugs as well. so what you see here are mice on just diet controls, high fat and regular, and mice who are getting paroxetine and pravastatin. this does seem to be true in vanderbilt, stanford and harvard
patients and mice. so we're following up because we don't have a mechanism. so really i have said what i wanted to say at a high level. i think the informatics methods do provide this powerful tool to allow us to do discovery at the molecular, cellular and
organismal level. i told you separate stories there. it's exciting that they operate on primary data but they can also operate on processed knowledge. they can individually contribute to our understanding of drug action but i think the
combination is where we all need to move. so let me just say something about that. the orange guys here, i didn't tell you what they were before, these were things that beth predicted from her text mining but that nick, the guy who did the mouse
studies and the paroxetine work, also saw in the fda data. if i were a betting person, and i definitely am a betting person, if you ever meet an informatician who says they're not a betting person, they're lying, i would say these are much more convincing than the two yellow
because i have two entirely different lines of evidence, biological text and fda data mining and emr validation. so we're looking carefully at naprosyn and valium and verapamil and texafenaldine. we are looking at molecular repurposing, the molecular data and
expression data to look for pathways that are affected. so we have a known drug target, but we're predicting interactions based on similarities of the pocket in the protein and percolating that down to expression data to see what diseases might be affected, or whether the
disease shows sensitivity to these new drugs. so i think this is to show you this is not lost upon us: whatever you can do with the individual data sets, the combination is really where the future is. so i want to leave you with the promise of an emerging network.
here we have the 3-d structure, we have drug binding, all can be done with publicly available databases. we have the cellular responses available from geo, we have the text mining, we have the population database mining from the fda and we have the
electronic medical records, and for each one of these it means a collaboration between students in my lab, or students themselves thinking about integration experiments that bring us together. what results, i think, and i hope i can convince you of this, is a
really powerful network of data that allows us to do a lot of work in the computer before we touch a mouse. we did mouse experiments and how confident was i? very confident that we were going to see an effect. was i sure? of course not, but the point is
those are focused experiments, that entire experiment cost me $3,000 and i only spent that money when i had a focused hypothesis. so i think the real message here is you can do not only hypothesis generation but initial triage so your
experimental work can be extremely focused and therefore cost effective and therefore hopefully catalyze and accelerate science and the efficiency of science. so with that i want to stop and thank you. here are the various websites.
one of the things that simbios does is put out a quarterly magazine called biomedical computation review where we try to celebrate the joys and miracles of biomedical computation. it's online but you can get a hard copy if you request that.
so thank you very much for your attention. i think we have some time for questions. [applause] >> hi, that was a great lecture. so i'm very curious about the text mining. did you put weights on
connections that you saw multiple times in five or six journals and did you exclude connections that you saw only once or twice? it might be new but it still might be real. >> great question. what i thought you were going to ask, i'll answer the question
you didn't ask, do you weight new england journal higher than a solid disciplinary journal? and the answer is no, but we thought about that. not sure which way the weights would go. different discussion. if you remember, for
any gene drug interaction we have hundreds of sentences that relate them so we don't have to weight them because the number of times that it's said feeds into the machine learning algorithm; it notices certain words and relationships occur more often. so one thing we're worried about
is whether we should remove the redundant ones, and then we would have to worry about weighting, but right now it's an implicit weighting. i want to stress, i'm sure those who are good scientists are saying wait a minute, everything in the literature is not true.
and i'm totally aware of that. so that's why these are hypotheses based on things you could read in the literature and after that, caveat emptor, and normal scientific skepticism has to be applied, such as asking if it occurs
in two different independent analyses. so no weighting except the implicit weighting from the sentences. >> real quick. do you think going into the actual journal article instead of the abstract will give you more or less information or doesn't
matter? >> there is a literature on this, turns out that you can get misled by the sentences in the discussion because people speculate more. and that might lead to more false positives, but you definitely also get more true
positives because there's more room to mention them. with the growth of pubmed central this experiment will be doable pretty soon. so we're big fans of pubmed central because it gives full text access but we haven't taken advantage of it yet.
>> thank you very much. >> i guess we'll bounce back and forth. >> so the ica approach to expression data is impressive. i'm wondering if it's been applied to gwas data, where there's probably an order of magnitude more publicly available data, and
we're missing a lot of genetic components of many diseases. >> i think that's a great idea. we haven't done that. the number of gwas for drugs is just beginning to come up. i try to keep the lab focused because there's a lot to do.
i don't know if you call that focused but every project has the word drug in it. so i'm aware of a number of gwas for drugs coming out and it might be relevant but we haven't done it yet. >> what drugs do you think you might have covered because we
still don't know so much of other genes and proteins, you mentioned a few of them. >> so i think that these cellular based assays are incredibly powerful for a data-based, unbiased view of drug action. the problem is many drugs might not act via transcriptional
control. so that immediately makes you a fan of proteomics and metabolomics in theory. the problem, i think we all know, is those are still promising emerging fields but they haven't crossed that threshold where we can get large
amounts of reliable digital data. when they do, i can guarantee you my lab will be looking at those data sets trying to pull things out. so that's a huge shortcoming of only looking at expression. the gwas and genomic data is also very powerful, so we shouldn't be complaining.
we do have a richness of information but for drug response i still think you're right, that we're still challenged with respect to the full amount of data that we would like to have. >> also there are drugs which are really prodrugs.
where do you stand there? basically you're looking at the metabolites instead of the drug itself. >> that's where metabolomics comes in because you can measure those. there is some interesting progress in creating
computational livers, these are models that take a drug and apply various transformations that the cytochromes are able to do, add an oxygen here or there, to come up with credible hypotheses about what the metabolites may be. so i think another area, i
haven't stressed it, is the area of chemoinformatics. colleagues have been thinking about computational methods for handling huge libraries, and it's very exciting to think about biology and bioinformatics together with chemoinformatics. we don't have results in my lab
but we have projects now where we're starting to look at molecules not as a name but as a 2-d or 3-d structure and thinking about what metabolites might be and which pockets they're able to bind. >> but there are also blood brain barriers so what happens
at the systemic level? would it interact also -- >> you're absolutely right. so is drug response complicated? yes. i don't think we're going to have everything solved any time soon. >> integrating data sources to
form hypotheses is excellent. i work in functional genomics. and one big frustration is many screens are being done now and end up as supplements, either a pdf file or an excel file at the back of a publication. going forward how do you see the field bringing that data to the
mainstream like what's been done -- >> so that's a great question. we have three examples. so three dimensional structural data, nobody asks questions now, but i talked to nigms veterans who said that was not a trivial thing to
achieve. gene expression data for the most part has been going into geo, though there are logistical issues with rna-seq because of the size of these databases, so be careful what you ask for. i think genetic and gwas data with dbgap is good, and pubmed
and others. but you're right, as these fields are emerging, if it's a screen not of a standard type, and a lot of drug screens are of peculiar interest, it's hard to imagine how to create a database when they're a one off
experiment. even though it's 5 or 50,000 measurements, it's the only time the data is in this exact format. so my informatics friends and i think about this a lot because it is technically not difficult to come up with a language that describes the data and then you
produce the data through xml and things like this. you need the political will and scientific will to take that extra step to make the data available. for so many scientists, the moment the paper is accepted, it's like thank god that's out.
let's move to the next thing. so that very unsexy, very unsatisfying activity of formatting data for future generations just gets lost. i think there might be a role for journal editors and funding agencies. >> thank you.
>> without using too heavy of a hammer. >> so at the beginning of your talk you showed us the database of drug resistance. and then you followed with the future balloons slide. i was almost waiting for the next slide to show
how those snps affect those and it didn't. so -- >> we're very interested. the question is a good one. if we have information about human variation that affects drug response and if we have 3-d
structures, why don't we put them together and have studies of how the 3-d structure changes in the context of an amino acid change and how that might explain the phenotypes? i'm happy to tell you that one of the post-docs in my lab is doing that specific project on
the warfarin target. so vkorc1, there was a bacterial homologue structure published, from bacteria obviously, and she has built a human model from the bacterial homologue and is now starting to mutate the site for the known human variations and seeing if we can predict the
phenotype. so i absolutely agree with you, just didn't talk about it today. perfect timing. >> let's thank dr. altman again. >> please join us in the library for refreshments. before you go, please note that the next wals talk will be on
november 2nd by dr. kenneth fischbeck.