Building the process-drug–side effect network to discover the relationship between biological processes and side effects

Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2http://www.biomedcentral.com/1471-2105/12/S2/S2 Building the process-drug–side effect network todiscover the relationship between biologicalProcesses and side effects Sejoon Lee1, Kwang H Lee1, Min Song2*, Doheon Lee1* From Fourth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBio) 2010Toronto, Canada. 26 October 2010 Background: Side effects are unwanted responses to drug treatment and are important resources for humanphenotype information. The recent development of a database on side effects, the side effect resource (SIDER), is afirst step in documenting the relationship between drugs and their side effects. It is, however, insufficient to simplyfind the association of drugs with biological processes; that relationship is crucial because drugs that influencebiological processes can have an impact on phenotype. Therefore, knowing which processes respond to drugs thatinfluence the phenotype will enable more effective and systematic study of the effect of drugs on phenotype. Tothe best of our knowledge, the relationship between biological processes and side effects of drugs has not yetbeen systematically researched.
Methods: We propose 3 steps for systematically searching relationships between drugs and biologicalprocesses: enrichment scores (ES) calculations, t-score calculation, and threshold-based filtering. Subsequently,the side effect-related biological processes are found by merging the drug-biological process network and thedrug-side effect network. Evaluation is conducted in 2 ways: first, by discerning the number of biologicalprocesses discovered by our method that co-occur with Gene Ontology (GO) terms in relation to effectsextracted from PubMed records using a text-mining technique and second, determining whether there isimprovement in performance by limiting response processes by drugs sharing the same side effect tofrequent ones alone.
Results: The multi-level network (the process-drug-side effect network) was built by merging the drug-biologicalprocess network and the drug-side effect network. We generated a network of 74 drugs-168 side effects-2209biological process relation resources. The preliminary results showed that the process-drug-side effect network wasable to find meaningful relationships between biological processes and side effects in an efficient manner.
Conclusions: We propose a novel process-drug-side effect network for discovering the relationship betweenbiological processes and side effects. By exploring the relationship between drugs and phenotypes through amulti-level network, the mechanisms underlying the effect of specific drugs on the human body may beunderstood.
* Correspondence: 1Bio and Brain Engineering Department, KAIST, Daejeon 305-701, SouthKorea2Information Systems Department, New Jersey Institute of Technology,University Heights, Newark, USAFull list of author information is available at the end of the article 2011 Lee et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative CommonsAttribution License ), which permits unrestricted use, distribution, and reproduction inany medium, provided the original work is properly cited.

Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 gene expression responses to drugs [. SIDER is a Side effects are unwanted responses to drug treatment, recently developed database on side effects to document and they are important resources of human phenotype the relationship between drugs and side effects []. The information. Drugs bind to target proteins and affect bio- connectivity map provides drug-responsive gene expres- logical processes, and the processes cause phenotype sion information, and SIDER provides drug-side effect effect. However, drugs may also bind to off-target proteins, which affects other biological processes and causes adverse By utilizing the connectivity map and SIDER, we reactions (Figure Side effects occur mainly when drugs aimed to automatically discover the relationship between bind to unintended off-targets. These side effects vary biological processes and side effects by building a multi- from simple symptoms, such as headache, to critical symp- level network of drug-biological processes influenced by toms, such as carcinoma. Most side effects are harmful to the association of targets with side effects.
humans, but side effects can also be utilized to find new Figure is an example of our approach. If drug 1, 2, uses for known drugs, such as Viagra. Therefore, it is or 3 induces the same side effect, their common highly desirable to automatically discover new targets for response (biological process2) is potentially related to known drugs and to understand the mechanisms that their side effect. To examine these relationships, SIDER cause side effects for target-specific treatments.
was used to construct the drug-side effect network In their paper published in Science, Campellos et al (Fig. SIDER provides information on the frequency reported finding new targets based on drugs with similar of connections between drugs and side effects. The side effects [. They used an ABC network model built drug-side effect relationships are filtered based on the with (A) drugs developed for new targets, (B) targets, frequency of relevant information to construct a reliable and (C) side effects. Similarly, Keiser used chemical drug-side effect network. The drug responsive biological similarity to find new targets for a known drug [. Kei- process network was also constructed using drug ser's approach enabled the discovery of off-targets of a responsive gene expression profiles (Fig. known drug but did not consider the relationship Gene ontology (GO) terms were used for biological between a drug and its biological process.
processes, and gene set enrichment scores (ES) were Like Keiser's and Campellos's studies, most previous used to find which processes were upregulated or research was focused mostly on finding off-target pro- downregulated by the drugs. Subsequently, an ABC teins causing the side effects. In addition, the biological network model was built (A, processes; B, drugs; and processes that are affected by the drug target need to be C, side effects) to find relationships between side considered because they cause phenotypical responses in effects and biological processes (Fig. The results the human biological system. A drug that influences show that many processes found in the drug-process biological processes can also have an impact on pheno- network were meaningful and were confirmed by pre- type. Therefore, if the biological process that responds vious studies. In addition, a novel network consisting to a drug influencing the phenotype is known then of 168 effects and 2,209 biological processes was con- drugs pertinent to the phenotype can be studied more structed, and these relationships based on the ABC effectively and systematically. To date, the relationships model were also confirmed to be significant by support between biological processes and side effects have not from the literature. Finally, evaluations were conducted been systematically researched.
in 2 ways: first, by quantifying how many biological Two databases are available for studying relationships between side effects and biological processes: the connec-tivity map and side effects resource (SIDER). The con-nectivity map is developed to generate and analyze adrug-gene-disease network from large-scale experimental Figure 2 Concept of discovering side effect-related biologicalprocesses. A: Drug-Side effect network; B: Drug-Biological processes Figure 1 Flow of drug treatments and adverse reaction.
Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 processes were found by our method and were concur- Drug-biological process network construction rently found in GO terms with effects extracted from Figure illustrates an overview of the approach to con- the PubMed records using a text-mining technique structing the drug-process network. To find a drug- and second, whether there was an improvement in per- responsive biological process, gene rank information formance by limiting response processes by drugs shar- from the connectivity map and gene set information ing the same side effect to frequent ones alone. The available in GO were used. The ES for each GO term experimental results showed that our process-drug-side was calculated to find significant terms. Subsequently, effect network was able to reveal meaningful relation- the t-score was calculated to measure the significance of ships between biological processes and side effects in each process of the drug in question. Finally, a threshold an efficient manner.
T was applied to remove insignificant data between In addition to comprehensive evaluation, our method drugs and biological processes.
contributes to systematically finding relationships between drugs and biological processes using ES scores A connectivity map was used to construct a drug- calculations, t-score calculation, and threshold-based fil- responsive process database. The connectivity map is a tering. Second, side effect-related biological processes collection of genome-wide transcriptional expression are revealed by merging the drug-biological process net- data from cultured human cells treated with bioactive work and the drug-side effect network. Finally, data on small molecules [The connectivity map contains 74 drugs, 168 effects, and 2209 biological process rela- 6,100 expression profiles representing 1,309 compounds.
tion resources were generated.
The connectivity map provided rank information ofprobes for each sample. There were 22,283 probes and Datasets and methods 6,100 samples in the rank matrix. Probe sets were To discover the relationships between side effects and bio- ranked in descending order of d, where d is the ratio of logical processes, 2 networks were constructed: the drug- the corresponding treatment-to-control values. There- biological process network and the drug-side effect net- fore, "top rank" means probes that are more highly work. Side effect and biological process relationships were upregulated than the control; "bottom rank" means automatically revealed by connecting the 2 networks.
probes that are more highly downregulated than the Figure 3 Schematic diagram for inferring relationships between biological processes (GO) and drugs.
Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 control. Top rank genes are positively affected by drugs, The ES for gene set i was calculated as follows: and bottom rank genes are negatively affected by drugs.
Gene Ontology GO was used as a resource for biological processes. The GO project provided term definitions representing geneproduct properties in 3 categories cellular compo- ES is the maximum deviation from zero of Sumij. For nent, molecular function, and biological processes a randomly distributed gene set, Si, ESi will be relatively Gene Set Enrichment Analysis small, but if it is concentrated at the top or the bottom Gene Set Enrichment Analysis (GSEA) was used to of the list, or otherwise non-randomly distributed, then show the relationship of processes to drugs. GSEA is a ESi will be correspondingly high.
gene expression profile analysis technique used for find- Process significance calculation ing the significance of a function, pathway, or GO cate- A t-score was used to show the significance of each pro- gory It calculates an ES that reflects the degree to cess. To get a normalized t-score robust to outliers, the which set S is over-represented at the extremes (top or ESs were standardized with the median-MAD normali- bottom) of the entire ranked list L. The score is calcu- zation method for each process ESij was used to lated by walking down the L, increasing a running-sum denote an ES of process i = {1,2,…p} from sample j = statistic when a gene in S is encountered, and decreas- ing it when a gene not in S is encountered. ES is themaximum deviation from zero encountered in the ran- In this approach, gene sets S i = {1,…,n} are defined by GO terms and ranking information of each gene L j = Both MEDi and MADi were used to represent the {1,…,k} from the connectivity map. The ESs of each median, and the median absolute deviation of enrich- gene set were calculated in 6,100 samples. ESs of upre- ment scores for biological process i. The scale factor of gulated processes were calculated based on the ranked 1.4826 in the above equation was used to make MADi list; ESs of downregulated processes were calculated an estimator of s.
using the reversed ranked list.
Drug-side effect network construction Side effect resource (SIDER) if g is in geneset S SIDER was developed to discover the relationships between side effects and drugs, and SIDER connects 888 not in geneset S drugs to 1,450 types of side effects It contains fre- quency of occurrence information between drugs and sideeffects for one-third of the drug-side effect pairs. (Table Drug-side effect network construction Cij is defined as a summing factor of a gene gj that is Drug-side effect relationships available in SIDER are drawn from L. N is the number of total genes in L, and incomplete because side effects do not occur in gene Ns is the number of genes in the gene set Si.
expression data every time. Therefore, drug-side effect Then the running sum Sumij for each sample against relationships appearing in SIDER needed to be filtered gene j is calculated using the following equation: to find highly occurring relationships of gene expression data. Among the 120,598 drug-side effect relationships in SIDER, however, only 15,672 relations have a fre- ⎨ S +C for j = 2,.,k quency higher than 5%. Most relations had no informa-tion about frequency. Twenty percent was set as a Table 1 Examples of SIDER information Description of frequency A search tool for Interactions of chemicals (STITCH) ID is represented as a compound ID in STITCH databases. A unified medical language system (UMLS) conceptID implies a description of frequency that consists of 4 types: postmarketing, rare, infrequent, and frequent. For frequent cases, a percentage is used instead ofthe word "frequent." Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 threshold of frequency to find drug-side effect relation- normalized drug information for 1,494 FDA-approved ships (Additional file Finally, 6,197 filtered relations drugs. The file "drugcards.zip" was downloaded from were used to construct the drug-side effect network.
the DrugBank Three fields, i.e., drug ID, synonym,and brand names, were used to normalize drug names Biological process-side effect network construction between the AB network and the BC network. Because Lastly, the biological process-side effect network was of the small number of side effects with frequency infor- built. Figure shows the method used for finding rela- mation, only 74 drugs were included in both the AB tionships between side effects and biological processes.
and BC networks. Finally, using the 74 drugs with 168 The hypothesis used was that frequent responses to effects and 2,209 processes network, data on 63,878 drugs causing the same side effect have higher probabil- relationships were generated.
ities of correlation with a side effect than less frequent To illustrate the construction of the side effect-biologi- cal process network, the example of tamoxifen was used.
Connecting drug-process and drug-side effect networks Tamoxifen is one of drugs present in both the drug-pro- To find relationships between biological processes and cess network and the drug-side effect network, and it is side effects, drug information was used as a bridge used as a mediator to connect the 2 networks (Figure between the 2 networks, the drug-biological process net- Discovering side effect-related processes from the drug- work and the drug-side effect network. This can be process-side effect network represented as an ABC model consisting of A, biological Co-occurrence-based scoring was used to determine processes; B, drugs; and C, side-effects. To merge the 2 how many drugs shared the same side effect in each networks, the drug names needed to be normalized process. A biological process that has a high co-occur- because the connectivity map and SIDER use different rence score implies that the process is closely related to drug identification. DrugBank was used to obtain Figure 4 Schematic diagram for discovering side effect-biological process relationships. Nausea, which is the sensation of unease anddiscomfort in the stomach with an urge to vomit, is an example of a side effect. In this example, 3 of 5 drugs known to cause nausea arerelated to anti-oxidant activity, but the other processes were perturbed by only 1 or 2 drugs. Based on this connectivity, the scores werecalculated to find possible processes causing the side effects. Finally, the processes were analyzed to ensure whether the side effect-biologicalprocess relationships revealed by this approach were meaningful.

Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 Figure 5 Tamoxifen-mediated drug-process network and drug-side effect network. Tamoxifen causes six types of side effects that arereported with a frequency of greater than 20%. We found 10 significant upregulated biological processes associated with tamoxifen (p < 0.001).
the targeted side effect; therefore, side effect data are Evaluation method only used when at least 2 drugs are related.
The constructed network was evaluated by examining Scoreij was used to denote the co-occurrence score of the significance of relationships between biological pro- a process i = {1,2,…n} in a side effect j = {1,2,…,n}. For cesses and side effects provided by the network. The sig- each side effect i = {1,2,.n}, CDij is used to represent the nificance of relationships was measured by comparing number of drugs that have the co-occurring process i biological processes represented by GO terms with the related to a side effect j, and TDj is used to represent co-occurrence of GO terms and effect names appearing the number of total drugs related to the side effect j.
in PubMed records. The first and second steps wereused to calculate the co-occurrence of effect names and GO terms. First, a set of PubMed records with an effect name was used as a query. The "[abstract/title]" qualifier was used in the PubMed search to ensure that effect In the drug-process-side effect network, nausea is the names appeared in abstracts or titles. Secondly, because most common side effect and is connected to 26 drugs.
it is not easy to extract noun phrases from GO terms by To investigate how many drugs with the same processes using a simple exact string match, significant phrases were significant, drug-side effect relations were randomly were used. To this end, the following text-mining tech- generated. The processes were determined by randomly niques were used: a conditional random field (CRF)- selecting 74 drugs (2 26) for each side effect, repeated based sentence segmentation technique was used to 1,000 times. The distribution was then determined usingthe number of related drugs on processes, and the pro-cesses with a p-value less than 0.05 were analyzed.
Table 2 Side effect-related process threshold Table shows the total number of drugs causing side Number of total drugs causing side effects and how many co-occurring drugs are significant in the total number of drugs. In the case of total drugs ranging from 2 to 5, co-significant processes in more than 2 drugs are significant to side effects.
Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 parse abstracts , the sentence was tokenized with the Tamoxifen is an antagonist of estrogen receptors in breast part-of-speech (POS) technique using an extension of tissue Some breast cancers require estrogen to grow.
the Brill POS tagger [, and noun phrase groups were Estrogen binds to and activates the estrogen receptors in extracted with a text chunking technique that spe- these cells. Tamoxifen is metabolized into compounds cialized in biomedical data collections. Thirdly, the that bind to estrogen receptors but do not activate them.
extracted noun phrases were compared with GO terms, As a result, tamoxifen prevents estrogen from binding to and the number of matched phrases was stored along receptors, and breast cancer cell growth is blocked.
with the phrases. The comparison between extracted Table shows significant processes related to tamoxi- phrases and GO terms was based on string similarity fen in MCF7 cells (breast cancer cell line) using our between the 2, and the shortest path-based edit distance method. The most significant GO term is nucleoside (SPED) technique was used. The SPED technique is diphosphate kinase activity, and Neeman's experiments a variation of Markov random field-based edit distance support that nucleoside diphosphate is higher in the (MRFED) and calculates the shortest path between 2 tamoxifen-treated cells ]. Tamoxifen also upregulates selected vertices of a graph. Various thresholds were low-density lipoprotein receptor binding according to tested for string similarities, and the threshold was set Suarez's study These results show that biological at 0.55 since it gave the best performance. Table processes in our drug-biological upregulated process shows the number of abstracts found in PubMed and relationships are meaningful in drug response profiles.
the total GO terms evaluated for rash and urinary tract Table shows that there are 6 downregulated pro- infection (UTI); 2,209 GO terms were utilized to calcu- cesses for tamoxifen. Translation elongation factor activ- late co-occurrence scores for evaluation.
ity is highly related to tamoxifen in MCF-7 cells. Asreported by Byun , translational elongation factor Results and discussion are underwent by tamoxifen. Cilium is known as cellular The goodness of the discovered relations was confirmed GPS, and is crucial to wound repair. For cilium, the per- using a survey of literature. First, the drug-biological ipheral loss of cilia function is reported in tamoxifen process network was analyzed using the tamoxifen case treats cell Tamoxifen reduced proteoglycan synth- study to show the significance of our method. Secondly, esis in an in vivo study [Finally, Lahoute found that the ABC network model for A, processes; B, drugs; and tamoxifen induced a loss of serum response factor C, side-effects was analyzed to find relationships (SRF), which induces downregulation of skeletal muscle between side effects and biological processes. Two case fiber development These results confirm that biolo- studies are used as examples to show the meaningful- gical processes in the drug-biological downregulated ness of the network. Finally, the performance of the net- processes relationships are also meaningful in drug work was evaluated by comparing the number of response profiles.
matched GO terms extracted by a text-mining methodthat was applied to a large number of PubMed abstracts.
Biological process-side effect networkThe biological process-side effect network contains Drug-biological process network 63,878 biological process-side effect pairs and covers a The network connects 1,309 drugs to 3,629 GO terms total of 168 side effects and 2,209 processes. In this net- with its ES. The GO terms are varied and some GO work, there are 37,280 upregulated biological process- terms are too broad to interpret the relations; therefore, side effect pairs with a total of 168 side effects and GO terms with less than 31 genes in human were chosen.
Highly relevant GO terms with a t-score greater than 3.0 Table 4 Upregulated tamoxifen-related processes in the (approximately p = 0.001) were also chosen. A positive drug-process network association is more upregulated than the control; a nega- tive association is more downregulated than the control.
nucleoside diphosphate kinase activity Case study—Tamoxifen-related biological processes in the NADP or NADPH binding constructed networkFor the case study of the drug-process network, tamoxifen was chosen because of its well-known mechanism.
substrate-bound cell migration low-density lipoprotein receptor binding acid-thiol ligase activity Table 3 Datasets for evaluation coenzyme catabolic process Urinary tract infection actin filament bundle formation histone acetyltransferase binding arginine catabolic process

Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 Table 5 Down regulated tamoxifen-related processes in Case study—Nausea-related biological processes in the the drug-process network biological processes-side effect network In the case study of nausea, the most common cause is Translation elongation factor activity gastroenteritis or food poisoning, but nausea also fre- quently occurs as a medication side effect. Nausea is Hexose biosynthetic process connected to 26 drugs in the drug-side effect network.
Proteoglycan biosynthetic process For random sampling analysis, a score greater than or equal to 0.15 was considered significant (p < 0.05).
Skeletal muscle fiber development Table shows 3 upregulated processes related to nau- sea. For example, Yoneyama et al found that adenosinedeaminase activity (ADA) was related to hyperemesis 1,736 processes (Additional file . Furthermore, there gravidarum (vomiting and nausea) [Chemothera- are 26,598 downregulated biological process-side effect peutic agents induce oxidative damage in the gastroin- pairs, 168 side effects, and 1,430 processes (Additional testinal tract, causing nausea and vomiting; therefore, file Figure shows the statistics of upregulated upregulated antioxidant activity is needed to reduce oxi- processes. To apply our algorithm, the side effects of dative damages [. Also, nausea occurs when blood more than 1 drug need to be considered. We finally sugar rises rapidly [and the cellular carbohydrate used 119 effects and 744 processes with 4581 relations.
catabolic process is noted for increasing the blood sugar level in the body.
Table shows downregulated processes that are related to nausea. In human studies, treatment withcytokines is often accompanied by nausea Synap-tic vesicle endocytosis may subsequently be used forneurotransmitter storage Neurotransmitters arealso involved in relaying messages of nausea andvomiting Case study—Anemia-related biological processes in thebiological processes-side effect networkAnemia is defined as a qualitative or quantitative defi-ciency of hemoglobin, which is a molecular substanceinside red blood cells. As hemoglobin carries oxygenfrom the lungs to the tissues, anemia leads to hypoxia inorgans. Anemia is connected to 10 drugs in the drug-side effect network. A random sampling analysis scoregreater than or equal to 0.3 was considered significant(p < 0.05).
Table shows anemia-related upregulated processes.
Cytochrome b5 reductase is an enzyme in the blood. Itcontrols the amount of iron in red blood cells and helpsthe cells carry oxygen. Therefore, cytochrome b5 reduc-tase is highly related to anemia. Antioxidant activity ofblood serum is highly related to anemia Anemiasearch results are similar to those of nausea(GO:0016209, GO:0044275) because 8 of 10 drugs caus- Figure 6 Network statistics in drug upregulated biological ing anemia also cause nausea.
process-side effect network. Figure 6A shows the relationshipbetween side effects and the total number of connected drugs inupregulated processes. The range of the total number of drugs is 1to 26. It shows that 49 side effects occurred with only 1 drug, and Table 6 Nausea-related upregulated processes 26 drugs caused nausea. Figure 6B shows that most scores of relations (about 88%) are less than 0.5. Half of relation scores are less than 0.2. Further, only 543 relation scores are greater than orequal to 0.5. This means that many significant processes are not Deaminase activity over-represented among drugs. Therefore, a threshold needs to be Antioxidant activity determined to show which processes are highly related to which Cellular carbohydrate catabolic side effect (Table Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 Table 7 Nausea-related downregulated processes Table 9 Anemia related down-regulated processes Regulation of cytokine production during immune response Regulation of cytokine production Alpha-beta T-cell activation during immune response Synaptic vesicle endocytosis Intramolecular oxidoreductase activity Synaptic vesicle endocytosis mining results, 35 were found in the top 40% of theresults, and 45 processes were in the top 50% of the Table shows downregulated processes related to anemia. Regulation of cytokine production during It was assumed that more frequent responsive pro- immune response was related to anemia in a previous cesses to drugs causing the same side effect have higher study [Iron deficiency induces anemia and neuro- probabilities of correlation with a side effect than less transmitter deficiency. Synaptic vesicle endocytosis may frequent responsive processes. The hypothesis was subsequently be used for neurotransmitter storage tested with rash and UTI cases. In Figure , the rash2 Downregulated activity of synaptic vesicle endocytosis bar (blue) includes less frequent response processes, and induces neurotransmitter deficiency.
the rash3 bar (red) includes only significant frequentresponse processes. For the rash2 bar, we found 100 Evaluation result related processes. Eleven processes (11%) were found in Two different side effects, i.e., rash, and UTI, were the top 10% of the text-mining results, 21 (21%) were in used for evaluation by retrieving PubMed records for the top 20% of the results, 30 (30%) were in the top each side effect and calculating the co-occurrence 30% of the results, 48 (48%) were in the top 40% of the scores for each GO term. Figure shows the co-occur- results, and 55 processes (55%) were in the top 50% of rence scores for each GO term for 2 cases. To evaluate the results. The rash3 bar shows 16 significant frequent the significance of discovered biological processes, the response processes by drugs. Two processes (13%) were top 10%, 20%, 30%, 40%, and 50% scores in the distri- in the top 10%, 4 (25%) were in the top 20%, 6 (38%) bution were selected, as shown in Figure This were in the top 30%, 7 (44%) were in the top 40%, and threshold was used to examine the significance of the 9 processes (56%) were in the top 50%. For all results, processes in each top n%.
except 40%, rash3 performs better than rash2 in terms Figure shows the number of matched terms between of the proportion of processes discovered over the top n our approach and the results of the text-mining method ranked processed (Fig. for GO terms extracted from PubMed.
In Figure the UTI2 bar (blue) includes less fre- For rash, our method showed 116 GO-related terms.
quent response processes, and the UTI3 bar (red) only Of 116 processes, 13 were found in the top 10% of the includes significant frequent response processes. For the text-mining results, 25 processes were found in the top UTI2 bar, our method found 73 related processes. Seven 20% of the results, 36 processes were found in the processes (10%) were found in the top 10%, 11 (15%) top 30% of the results, 55 processes were found in the were found in the top 20% of the results, 21 (29%) were top 40% of the results, and 64 processes were in the top found in the top 30%, 33 (45%) were found in the top 50% of the results. For UTI, our method shows 76 40%, and 42 processes (58%) were found in the top 50%.
GO-related terms. Of 76 processes, 8 were found in the As indicated by the UTI3 bar, our method found 3 fre- top 10% of the results, 13 were found in the top 20% of quent response processes by drugs. One process (33%) the results, 23 were found in the top 30% of the text- was found in the top 10%, 2 processes (67%) in the top20%, 30%, and 40%, and 3 processes (100%) in the top50% of the results. This shows that UTI3 performed bet- Table 8 Anemia related up-regulated processes ter than UTI2 in all 5 cases (Fig. and confirms that our method was able to find relationships between bio-logical processes and side effects.
Antioxidant activity Eukaryotic translation initiationfactor 3 complex Cytochrome-b5 reductase activity In this paper, we proposed a new approach for automa- Cellular carbohydrate catabolic tically discovering relationships between biological pro- cesses and side effects using the co-occurrence based

Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 Figure 7 Literature-based co-occurrence score distribution of 2 side effects. Top 20 processes are omitted in this graph because of rangeproblem.
multi-level network. We built the drug-biological pro-cess network, and showed that our method can be usedto discover drug related significant processes (as shownin the example of tamoxifen). In addition, we built anABC Model (using A, biological processes; B, drugs; andC, side effect information) for 74 drugs, 168 side effects,and 2,209 biological processes. A literature analysis con-firmed that relations between side effects and biologicalprocesses found by co-occurrence were meaningful. Inaddition, our method was evaluated using a text-miningtechnique to extract co-occurring GO terms with effects.
The results showed that our method is efficient and use-ful for finding relationships between biological processesand side effects.
Figure 8 The number of processes matched with text-mining In a future study, the scoring scheme will be improved results for rash, and UTI. The x axis is the top n% of co-occurred because the current scoring algorithm considers all GO terms with biological processes (total 2,209). The y axis is the drugs equally regardless of the number of side effects or number of processes with scores greater than the top n% (x axis) the number of biological processes associated with threshold of the total process scores.
them. For example, drug A has only 1 side effect (s-1), Figure 9 Evaluation of our hypothesis for rash and UTI. The x axis is the top n% for the total process scores. The y axis is percentage ofprocesses with scores greater than the top n% (x axis) threshold of the total processes scores.
Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 whereas drug B has 2 side effects (s1 and s2), with all Authors' contributions other settings the same, including association with bio- LS designed the method and drafted the manuscript along with MS. MSalso critically revised the manuscript for important intellectual content. KHL logical process (p). In this case, drug A provides more and DL supervised the work and gave final approval of the version of the reliable information on the association of s1and p than manuscript to be submitted.
drug B. However, the proposed scoring scheme cannot Competing interests reflect this, thus causing a loss of information for a The authors declare that they have no competing interests.
more accurate association. We also plan to investigatewhether biological processes related to side effects are Published: 29 March 2011 valuable resources in elucidating the mechanism of drug effects. Instead of using the text-mining technique, a Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P: manual evaluation will be conducted to identify undis- In Science. Volume 321. New York, covered relationships from process-side effect pairs that Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, are not mentioned in literature. In addition, we are Kuijer MB, Matos RC, Tran TB, et al: interested in a research on personalized drug responsive Nature 2009, 462(7270):175-181.
expression data by applying multi-level networks for Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J,Brunet JP, Subramanian A, Ross KN, et al: personalized medicine. By exploring the relationship between drugs and phenotypes on the multi-level net- In Science. Volume 313. New York, NY; 2006:(5795):1929-1935.
work, we will be able to understand the mechanisms Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P: Molecular systems biology 6:343.
underlying drug involvement in the human body.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP,Dolinski K, Dwight SS, Eppig JT, et al: Nature genetics Additional material 2000, 25(1):25-29.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, file contains drug names and related side Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: effect names which are reported with frequency of greater than 20% from SIDER. First Column: Drug Bank ID Second Column: Drug Proceedings of the National Academy of Sciences of the name Third Column: Effect ID ( UMLS Concept ID) Fourth Column: United States of America 2005, 102(43):15545-15550.
Effect name.
Wu Biostatistics2007, 8(3):566-575.
file contains up_regulated processes (T-score Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, > 3.0) and related effects. First Column: Effect ID ( UMLS Concept Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and ID) Second Column: Process ID ( Gene Ontology ID) Third Column: drug targets. Nucleic Acids Res 2008, 34:D901-6.
The number of drugs which affect to process and causing the side Lafferty J, McCallum A, Pereira F: Condition random fields: probabilistic effect. Fourth Column: Total drugs which are causing the side models for segmenting and labeling sequence data. Proc. 18th International conf. on Machine Learning 282-289.
file contains down_regulated processes (T- Brill E: A simple rule-based part of speech tagger. HLT'91 Proceeding of the score > 3.0) and related effects. First Column: Effect ID ( UMLS workshop on speech and Natural Language 1992, 112-116.
Concept ID) Second Column: Process ID ( Gene Ontology ID) Third Phan X: CRFChunker: CRF English Phrase Chunker. 2006 Column: The number of drugs which affect to process and causing the side effect. Fourth Column: Total drugs which are causing the Rudniy A, Song M, Geller side effect.
International Journal of Data Mining and Bioinformatics2010.
Hodges LC, Cook JD, Lobenhofer EK, Li L, Bennett L, Bushel PR, Aldaz CM,Afshari CA, Walker CL: Mol Cancer Res 2003, 1(4):300-311.
This work was supported by the Korean Systems Biology Research Project Neeman M, Degani H: (20100002164), the World Class University program (R32-2008-000-10218-0) and the Basic Research Laboratory grant (2009-0086964) of the Ministry of Cancer research 1989, 49(3):589-594.
Education, Science and Technology through the National Research Suarez Y, Fernandez C, Gomez-Coronado D, Ferruelo AJ, Davalos A, Foundation of Korea. It was also supported by the Korea Institute of Science Martinez-Botas J, Lasuncion MA: and Technology Information (KISTI).
This article has been published as part of BMC Bioinformatics Volume 12 research 2004, 64(2):346-355.
Supplement 2, 2011: Fourth International Workshop on Data and Text Byun HO, Han NK, Lee HJ, Kim KB, Ko YG, Yoon G, Lee YS, Hong SI, Lee JS: Mining in Bioinformatics (DTMBio) 2010. The full contents of the supplement are available online at Cancer research 2009, 69(11):4638-4647.
Kesterson RA, Berbari NF, Pasek RC, Yoder BK: Methods in cell biology 2009, 94:163-179.
1Bio and Brain Engineering Department, KAIST, Daejeon 305-701, South Rosner IA, Malemud CJ, Hassid AI, Goldberg VM, Boja BA, Moskowitz RW: Korea. 2Information Systems Department, New Jersey Institute of Technology, University Heights, Newark, USA.
Prostaglandins 1983, 26(1):123-138.
Lee et al. BMC Bioinformatics 2011, 12(Suppl 2):S2 Lahoute C, Sotiropoulos A, Favier M, Guillet-Deniau I, Charvet C, Ferry A,Butler-Browne G, Metzger D, Tuil D, Daegelen D: PloS one 2008, 3(12):e3910.
Yoneyama Y, Sawa R, Suzuki S, Otsubo Y, Araki Clinicachimica acta; international journal of clinical chemistry 2002, 324(1-2):141-145.
Wang CZ, Fishbein A, Aung HH, Mehendale SR, Chang WT, Xie JT, Li J,Yuan CS: Polyphenol contents in grape-seed extracts correlate withantipica effects in cisplatin-treated rats. In Journal of alternative andcomplementary medicine. Volume 11. New York, NY; 2005:(6):1059-1065.
Ezrin C, Salter JM, Ogryzlo MA, Best CH: The Clinical and Metabolic Effectsof Glucagon. Can Med Assoc J 1957, 78(2):3.
Kronfol Z, Remick The American journal of psychiatry 2000, 157(5):683-694.
Hesketh PJ: The NewEngland journal of medicine 2008, 358(23):2482-2494.
Levina AA, Andreeva AP, Tsvetaeva NV, Tsibul'skaia MM, Minaeva LM,Tokarev Iu N: Gematologiia i transfuziologiia 1991,36(7):11-14.
Morceau F, Dicato M, Diederich M: Mediators ofinflammation 2009, 2009:405016.
doi:10.1186/1471-2105-12-S2-S2Cite this article as: Lee et al.: Building the process-drug–side effectnetwork to discover the relationship between biological Processes andside effects. BMC Bioinformatics 2011 12(Suppl 2):S2.
Submit your next manuscript to BioMed Central
and take full advantage of:
• Convenient online submission
• Thorough peer review
• No space constraints or color figure charges
• Immediate publication on acceptance
• Inclusion in PubMed, CAS, Scopus and Google Scholar
• Research which is freely available for redistribution
Submit your manuscript at www.biomedcentral.com/submit

Source: http://biosoft.kaist.ac.kr/~dhlee/pubs/papers/2011BuildingProcessDrug.pdf

the 27th annual convention of the iacr – a report

The 27th Annual Convention of the IACR – A Report by Dr. Ujjwala M. WarawdekarGenetic Engineering Dept.ACTREC, Tata Memorial Centre,Navi Mumbai. The 27th Annual Convention of the IACR named IACRCON –2008 was held at the GCRI, Ahmedabad between the 7th and 9th February, 2008. It coincided with the celebrations of the silver jubilee of the Department of Cancer Biology of the GCRI and the birth centenary year of Dr. T B Patel, the Founder Director whose guidance and efforts have helped establish the Institute to its present stature. The Convention was spaced out between the first and the third day with a total of eight sessions covering various areas of Cancer research, basic and translational, with a global representation. The International Symposium on Frontiers in Functional Genomics between these two days was the high point of the Convention and featured talks on approaches and clinical trials applying latest technologies, to study, understand and treat Cancer.

Microsoft word - ul lafayette h1n1 prep _2_.docx

UL Lafayette GENERAL PANDEMIC GUIDE Seasonal (common) Flu • Caused by: Human influenza virus • Transmitted: From person to person • Immunity: o Most people have some immunity o Vaccine is available Pandemic flu would describe a new human virus that: • Is easily spread throughout the world