
An integrated pharmacokinetics ontology and
corpus for text mining

Hengyi Wu1,†, Shreyas Karnik1,†, Abhinita Subhadarshini1,†, Zhiping Wang1,2,†, Santosh Philips3, Xu Han1,3,4, Chienwei Chiang1, Lei Liu5, Malaz Boustani6, Luis M Rocha7, Sara K Quinney3,9, David Flockhart1,2,3,4,8,9, Lang Li1,2,4,7,*

1 Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, IN, USA
2 Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN, USA
3 Department of Pharmacology and Toxicology, School of Medicine, Indiana University, Indianapolis, IN, USA
4 Division of Clinical Pharmacology, School of Medicine, Indiana University, Indianapolis, IN, USA
5 Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
6 Regenstrief Institute, Indianapolis, IN, USA
7 Informatics and Cognitive Science Center for Complex Networks and Systems Research, School of Informatics & Computing, Indianapolis, IN, USA
8 Indiana Institute of Personalized Medicine, Indianapolis, IN, USA
9 Department of Obstetrics and Gynecology, School of Medicine, Indiana University, Indianapolis, IN, USA
* Corresponding author. Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, IN, USA
† Equal contributors.

Abstract
Drug pharmacokinetics parameters, drug interaction parameters, and pharmacogenetics data have been unevenly collected in different databases and published extensively in the literature. Without an appropriate pharmacokinetics ontology and a well-annotated pharmacokinetics corpus, it will be difficult to develop text mining tools for pharmacokinetics data collection from the literature and pharmacokinetics data integration from multiple databases.

Description
A comprehensive pharmacokinetics ontology was constructed. It can annotate all aspects of in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. It covers all drug metabolism and transportation enzymes. Using our pharmacokinetics ontology, a PK-corpus was constructed to present four classes of pharmacokinetics abstracts: in vivo pharmacokinetics studies, in vivo pharmacogenetic studies, in vivo drug interaction studies, and in vitro drug interaction studies. A novel hierarchical three-level annotation scheme was proposed and implemented to tag key terms, drug interaction sentences, and drug interaction pairs. The utility of the pharmacokinetics ontology was demonstrated by annotating three pharmacokinetics studies, and the utility of the PK-corpus was demonstrated by a drug interaction extraction text mining analysis.

Conclusions
The pharmacokinetics ontology annotates both in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. The PK-corpus is a highly valuable resource for the text mining of pharmacokinetics parameters and drug interactions.

Background
Pharmacokinetics (PK) is a very important translational research field, which studies drug absorption, disposition, metabolism, excretion, and transportation (ADMET). PK systematically investigates the physiological and biochemical mechanisms of drug exposure in multiple tissue types, cells, animals, and human subjects [1]. There are two major molecular mechanisms of a drug's PK: metabolism and transportation. Drug metabolism mainly happens in the gut and liver, while drug transportation exists in all tissue types. If PK can be interpreted as what the body does to the drug, pharmacodynamics (PD) can be defined as what the drug does to the body. A drug's pharmacodynamic effect ranges widely from molecular signals (such as its targets or downstream biomarkers) to clinical symptoms (such as efficacy or side effect endpoints) [1].

Drug-drug interaction (DDI) is another important pharmacology concept. A DDI occurs when one drug's PK or PD response is changed by the presence of another drug. PD-based drug interaction has a wide range of interpretations (i.e. from molecular markers to clinical endpoints). The mechanisms of PK-based drug interaction are well defined: metabolism enzyme based and transporter based DDIs. Pharmacogenetic (PG) variations in a drug's PK and PD pathways can also affect its responses [1]. In this paper, we focus our discussion on PK, PK-based DDI, and PK-related PG.

Although significant efforts have been invested to integrate biochemistry, genetics, and clinical information for drugs, significant gaps exist in the area of PK. For example, DrugBank does not have in vitro PK and its associated DDI data; DiDB does not have sufficient PG data; and PharmGKB does not have sufficient in vivo and in vitro PK and its associated DDI data.
As an alternative approach to collecting PK data from the published literature, text mining has just started to be explored [1–4]. For either database construction or literature mining, the main challenge of PK data integration is the lack of a PK ontology. In this paper, a PK ontology was developed first. Then a PK corpus was constructed, which facilitated DDI text mining from the literature.

Construction and Content
The PK ontology is composed of several components: experiments, metabolism, transporter, drug, and subject (Table 1). Our primary contribution is the ontology development for the PK experiment, and the integration of the PK experiment ontology with other PK-related ontologies.

Table 1 PK Ontology Categories
Pharmacokinetics: Pharmacokinetics studies and parameters. There are two major categories: in vitro experiments and in vivo studies. Manually accumulated from textbooks and the literature.
Drug transportation
Subject description for a pharmacokinetics study. It is composed of three categories: disease, physiology, and demographics.

Experiment specifies in vitro and in vivo PK studies and their associated PK parameters. Table 2 presents definitions and units of the in vitro PK parameters. The PK parameters of the single drug metabolism experiment include the Michaelis-Menten constant (Km), maximum velocity of the enzyme activity (Vmax), intrinsic clearance (CLint), metabolic ratio, and fraction of metabolism by an enzyme (fmenzyme) [5]. In the transporter experiment, the PK parameters include apparent permeability (Papp), the ratio of the basolateral to apical permeability over the apical to basolateral permeability (Re), radioactivity, and uptake volume [6]. There are multiple drug interaction mechanisms: competitive inhibition, non-competitive inhibition, uncompetitive inhibition, mechanism based inhibition, and induction [7]. IC50 is the inhibitor concentration that reduces enzyme activity to 50%; it is substrate dependent, and it does not imply the inhibition mechanism. Ki is the inhibition rate constant for competitive inhibition, noncompetitive inhibition, and uncompetitive inhibition. It represents the inhibitor concentration that reduces enzyme activity to 50%, and it is independent of the substrate concentration. Kdeg is the degradation rate constant for the enzyme. KI is the concentration of inhibitor associated with half maximal inactivation in mechanism based inhibition, and Kinact is the maximum degradation rate constant in the presence of a high concentration of inhibitor in mechanism based inhibition. Emax is the maximum induction rate, and EC50 is the concentration of inducer that is associated with the half maximal induction.

Table 2 in vitro PK Parameters
Single Drug
Km: Michaelis-Menten constant.
Vmax: Maximum velocity of the enzyme activity (mg h-1).
CLint: Intrinsic metabolic clearance, defined as the ratio of the maximum metabolism rate, Vmax, and the Michaelis-Menten constant, Km.
Metabolic ratio: Parent drug/metabolite concentration ratio.
fmenzyme: Fraction of drug systemically available that is converted to a metabolite through a specific enzyme.
Papp: The apparent permeability of compounds across the monolayer cells (cm/sec).
Re: The ratio of basolateral to apical over apical to basolateral.
Radioactivity: Total radioactivity in plasma and bile samples, measured in a liquid scintillation counter (dpm/mg protein; Transport Consortium).
Uptake volume: Radioactivity associated with the cells divided by its concentration in the incubation medium (Transport Consortium).
IC50: Inhibitor concentration that inhibits to 50% of enzyme activity (mg L-1).
Ki: Inhibition rate constant for competitive inhibition, noncompetitive inhibition, and uncompetitive inhibition (mg L-1).
Kdeg: The natural degradation rate constant for the enzyme.
KI: The concentration of inhibitor associated with half maximal inactivation in the mechanism based inhibition.
Kinact: The maximum degradation rate constant in the presence of a high concentration of inhibitor in the mechanism based inhibition (h-1).
Emax: Maximum induction rate (unit free; Rostami-Hodjegan and Tucker).
EC50: The concentration of inducer that is associated with the half maximal induction.
Type of drug interaction: Competitive inhibition, noncompetitive inhibition, uncompetitive inhibition, mechanism based inhibition, and induction (Rostami-Hodjegan and Tucker).

Note:
Segel H. Irwin. Enzyme Kinetics – Behavior and analysis of rapid equilibrium and steady state enzyme systems. John Wiley & Sons, Inc. 1975, New York.
Rostami-Hodjegan Amin and Tucker Geoff. 'In silico' simulations to assess the 'in vivo' consequences of 'in vitro' metabolic drug-drug interactions. Drug Discovery Today, 2004, 1, 441–448.
The International Transporter Consortium. Membrane transporters in drug development. Nature Review Drug Discovery, 9, 215–236.
Rowland Malcolm and Tozer N. Thomas. Clinical Pharmacokinetics Concepts and Applications, 3rd edition. 1995, Lippincott Williams & Wilkins.

The in vitro experiment conditions are presented in Table 3. Metabolism enzyme experiment conditions include buffer, NADPH sources, and protein sources. In particular, protein sources include recombinant enzymes, microsomes, and hepatocytes. Sometimes, genotype information is available for the microsome or hepatocyte samples. Transporter experiment conditions include bi-directional transporter, uptake/efflux, and ATPase. Other factors of in vitro experiments include pre-incubation time, incubation time, quantification methods, sample size, and data analysis methods. All this information can be found on the FDA website.

Table 3 in vitro Experiment Conditions
Experimental drugs
Substrate, metabolite, and inhibitor/inducer
Conditions: Interaction Guidance,
Salt composition
EDTA concentration
MgCl2 concentration
Cytochrome b5 concentration
Concentration of exogenous NADPH added
isocitrate dehydrogenase + NADP
Microsomes (human liver microsomes, human intestine microsomes), S9 fraction, cytosol, whole cell lysate, hepatocytes
Enzyme name; mg/mL or uM
Transporters
Bi-Directional: CHO; Caco-2 cells; HEK-293; Hepa-RG; LLC; LLC-PK1 MDR1 cells; MDCK; MDCK-MDR1 cells; Suspension Hepatocyte
Uptake/efflux: tumor cells, cDNA transfected cells, oocytes injected with cRNA of transporters
membrane vesicles from various tissues or cells expressing P-gp, Reconstituted P-gp
Other factors
Pre-incubation time
Incubation time
Quantification: HPLC/UV, LC/MS/MS, LC/MS, radiographic methods
Sample size
Data Analysis: log-linear regression, plotting; and nonlinear

The in vivo PK parameters are presented in Table 4. All of the information is summarized from two textbooks [1,8]. There are several main classes of PK parameters: area under the concentration curve parameters (AUCinf, AUCSS, AUCt, AUMC); drug clearance parameters (CL, CLb, CLu, CLH, CLR, CLpo, CLIV, CLint, CL12); drug concentration parameters (Cmax, CSS); extraction ratio and bioavailability parameters (E, EH, F, FG, FH, FR, fe, fm); rate constants, including the elimination rate constant k, absorption rate constant ka, urinary excretion rate constant ke, Michaelis-Menten constant Km, distribution rate constants (k12, k21), and the two rate constants in the two-compartment model (λ1, λ2); blood flow rates (Q, QH); time parameters (tmax, t1/2); volume of distribution parameters (V, Vb, V1, V2, Vss); the maximum rate of metabolism, Vmax; and ratios of PK parameters that present the extent of the drug interaction (AUCR, CL ratio, Cmax ratio, Css ratio, t1/2 ratio).

Table 4 in vivo PK Studies
Area under the drug concentration time Area under the drug concentration time curve within a dosing curve at steady state. L-1 Area under the drug concentration time curve from time 0 to t. Area under the first moment of concentration versus time curve. AUC ratio (drug interaction parameter). Total clearance is defined as the proportionality factor relating rate of drug elimination to the plasma drug concentration. Blood clearance is defined as the proportionality factor relating rate of drug elimination to the blood drug concentration. Unbound clearance is defined as the proportionality factor relating rate of drug elimination to the unbounded plasma drug concentration. Hepatic portion of the total clearance. Renal portion of the total clearance. Total clearance of drug following an oral Total clearance of drug following an IV Intrinsic metabolic clearance is defined as ml h-1 RT p165 ratio of maximum metabolism rate, Vmax, and the Michaelis-Menten constant, Km. Inter-compartment distribution between the ml h-1 central compartment and the peripheral compartment. Ratio of the clearance (drug interaction Highest drug concentration observed in plasma following administration of an extravascular dose. The ratio of Cmax (drug interaction Concentration of drug in plasma at steady mg L- RT pxii state during a constant rate intravenous The ratio of Css (drug interaction Extraction ratio is defined as the ratio between blood clearance, CLb, and the Hepatic extraction ratio. Unit RT p161 free Bioavailability is defined as the proportion Unit RT p42 of the drug reaches the systemic blood. Gut-wall bioavailability. Hepatic bioavailability. Unit RT p167 free Renal bioavailability. Unit RT p170 free Fraction of drug systemically available that Unit RT pxiii is excreted unchanged in urine. Fraction of drug systemically available that Unit RT pxiii is converted to a metabolite. Ratio of unbound and total drug concentrations in plasma. Elimination rate constant. 
Distribution rate constants between central h-1 compartment and peripheral compartment. Absorption rate constant. Urinary excretion rate constant. Rate constant for the elimination of a Michaelis-Menten constant. mg L- RT pxiii 1 Mean time a molecular resides in body. Hepatic blood flow. Time at which the highest drug concentration occurs following administration of an extravascular dose. Half-life of the drug disposition. Half-life ratio (drug interaction parameter). Unit Half-life of the fast phase drug disposition. h Half-life of the slow phase drug Volume of distribution based on drug concentration in plasma. Volume of distribution based on drug concentration in blood. Volume of distribution of the central Volume of distribution of the peripheral Volume of distribution under the steady state concentration. Maximum rate of metabolism by an enzymatically mediated reaction. Disposition rate constants in a two- compartment model. Pharmacokinetics Non- Use drug concentration measurements directly to GP p409 Compartment estimate PK parameters, such as AUC, CL, Cmax, Tmax, t1/2, F, and V. It assumes the whole body is a homogeneous Compartment compartment, and the distribution of the drug from the blood to tissue is very fast. It assumes either a first order or a zero order absorption rate and a first order eliminate rate. Its PK parameters include (ka, V, CL, F). It assumes the whole body can be divided into Compartment two compartments: central compartment (i.e. Model systemic compartment) and peripheral compartment (i.e. tissue compartment). It assumes either a first order or a zero order absorption rate and a first order eliminate and distribution rates. Its PK parameters include (ka, V1, V2, CL, CL12, F). Bioequivalence, drug interaction, pharmacogenetics, and disease conditions. 
Single arm or multiple arms; cross-over or fixed order design; with or without randomization; with or without stratification; prescreening or no-prescreening; prospective or retrospective studies; and case reports or cohort studies.
The number of subjects, and the number of plasma or urine samples per subject.
Sampling time points and dosing time points.
Sample types: Blood, plasma, and urine.
Subject specific doses.
HPLC/UV, LC/MS/MS, LC/MS, radiographic methods
Rowland Malcolm and Tozer N. Thomas. Clinical Pharmacokinetics Concepts and Applications, 3rd edition. 1995, Lippincott Williams & Wilkins.
Gibaldi Milo and Perrier Donald. Pharmacokinetics, 2nd edition. 1982, Dekker.

Table 4 also shows that two types of pharmacokinetics models are usually presented in the literature: the non-compartment model and one- or two-compartment models. Multiple items need to be considered in an in vivo PK study. The hypotheses include the effect of bioequivalence, drug interaction, pharmacogenetics, and disease conditions on a drug's PK. The design strategies are very diverse: single arm or multiple arms, cross-over or fixed order design, with or without randomization, with or without stratification, pre-screening or no pre-screening based on genetic information, prospective or retrospective studies, and case reports or cohort studies. The sample size includes the number of subjects, and the number of plasma or urine samples per subject. The time points include sampling time points and dosing time points. The sample type includes blood, plasma, and urine. The drug quantification methods include HPLC/UV, LC/MS/MS, LC/MS, and radiographic methods.

CYP450 family enzymes predominantly exist in the gut wall and liver. Transporters are tissue specific. Table 5 presents the tissue specific transporters and their functions. The probe drug is another important concept in pharmacology research.
An enzyme's probe substrate is a substrate that is primarily metabolized or transported by this enzyme. To prove experimentally whether a new drug inhibits or induces an enzyme, the enzyme's probe substrate is always used to demonstrate the enzyme's activity before and after inhibition or induction. An enzyme's probe inhibitor or inducer is a drug that primarily inhibits or induces this enzyme. Similarly, an enzyme's probe inhibitor needs to be used when investigating whether a drug is metabolized by this enzyme. Table 6 presents all the probe inhibitors, inducers, and substrates of CYP enzymes. Table 7 presents all the probe inhibitors, inducers, and substrates of the transporters. All this information was collected from the industry standard, reviewed in the top pharmacology journal [9].

Table 5 Tissue Specific Transporters
Intestinal enterocyte, kidney proximal tubule, hepatocyte (canalicular), brain endothelia ABCG2 BCRP Intestinal enterocyte, hepatocyte (canalicular), kidney Efflux proximal tubule, brain endothelia, placenta, stem cells, mammary gland (lactating) SLCO1B1 OATP1B1, OATP- Hepatocyte (sinusoidal) SLCO1B3 OATP1B3, OATP- Hepatocyte (sinusoidal) SLC22A2 OCT2 Kidney proximal tubule SLC22A6 OAT1 Kidney proximal tubule, placenta SLC22A8 OAT3 Kidney proximal tubule, choroid plexus, brain Table 6 in vivo Probe Inhibitors/Inducers/Substrates of CYP Enzymes
Enzymes CYP1A2 Ciprofloxacin, enoxacin, Montelukast, phenytoin, Alosetron, caffeine, fluvoxamine, Methoxsalen, smokers versus non- duloxetine, melatonin, mexiletine, oral smokers, moricizine, ramelteon, tacrine, tizanidine, theophylline, phenylpropanolamine, thiabendazole, vemurafenib, zileuton, acyclovir, allopurinol, caffeine, cimetidine, daidzein, disulfiram, Echinacea, famotidine, norfloxacin, propafenone, propranolol, terbinafine, ticlopidine, verapamil CYP2B6 Clopidogrel, ticlopidine Efavirenz, rifampin, Bupropion, efavirenz CYP2C8 Gemfibrozil, fluvoxamine, Repaglinide, Paclitaxel ketoconazole, trimethoprim CYP2C9 Amiodarone, fluconazole, Celecoxib, Warfarin, miconazole, oxandrolone, rifampin, aprepitant, capecitabine, cotrimoxazole, bosentan, phenobarbital, etravirine, fluvastatin, St. John‟s wort fluvoxamine, metronidazole, sulfinpyrazone, tigecycline, voriconazole, zafirlukast CYP2C19 Fluconazole, fluvoxamine, Rifampin, artemisinin Clobazam, lansoprazole, ticlopidine, esomeprazole, fluoxetine, moclobemide, omeprazole, voriconazole, allicin (garlic derivative), armodafinil, carbamazepine, cimetidine, etravirine, human growth hormone (rhGH), felbamate, ketoconazole, oral contraceptives CYP3A Boceprevir, clarithromycin, Avasimibe, Alfentanil, aprepitant, conivaptan, grapefruit juice, carbamazepine, budesonide, buspirone, indinavir, itraconazole, phenytoin, rifampin, St. 
conivaptan, darifenacin, John‟s wort, bosentan, darunavir, dasatinib, lopinavir/ritonavir, efavirenz, etravirine, dronedarone, eletriptan, mibefradil, nefazodone, modafinil, nafcillin, eplerenone, everolimus, nelfinavir, posaconazole, amprenavir, aprepitant, felodipine, indinavir, ritonavir, saquinavir, fluticasone, lopinavir, telaprevir, telithromycin, clobazamechinacea, lovastatin, lurasidone, voriconazole, amprenavir, pioglitazone, prednisone, maraviroc, midazolam, aprepitant, atazanavir, rufinamide, vemurafenib nisoldipine, quetiapine, ciprofloxacin, crizotinib, saquinavir, sildenafil, darunavir/ritonavir, diltiazem, simvastatin, sirolimus, erythromycin, fluconazole, tolvaptan, tipranavir, fosamprenavir, grapefruit triazolam, ticagrelor, juice, imatinib, verapamil, vardenafil, Alfentanil, alprazolam, amiodarone, astemizole, cisapride, amlodipine, atorvastatin, bicalutamide, cilostazol, dihydroergotamine, cimetidine, cyclosporine, ergotamine, fentanyl, fluoxetine, fluvoxamine, pimozide, quinidine, ginkgo, goldenseal, isoniazid, sirolimus, tacrolimus, lapatinib, nilotinib, oral contraceptives, pazopanib, ranitidine, ranolazine, tipranavir/ritonavir, ticagrelor, zileuton CYP2D6 Bupropion, fluoxetine, Atomoxetine, desipramine, paroxetine, quinidine, dextromethorphan, cinacalcet, duloxetine, metoprolol, nebivolol, perphenazine, tolterodine, venlafaxine, Thioridazine, amiodarone, celecoxib, clobazam, cimetidine, desvenlafaxine, diltiazem, diphenhydramine, echinacea, escitalopram, febuxostat, gefitinib, hydralazine, hydroxychloroquine, imatinib, methadone, oral contraceptives, pazopanib, propafenone, ranitidine, ritonavir, sertraline, telithromycin, verapamil, vemurafenib Table 7 in vivo Probe Inhibitors/Inducers/Substrates of Selected Transporters
Transporter Inhibitor
Amiodarone, azithromycin, Avasimibe, Aliskiren, ambrisentan, captopril, carvedilol, colchicine, dabigatran clarithromycin, conivaptan, phenytoin, rifampin, etexilate, digoxin, cyclosporine, diltiazem, St John‟s wort, everolimus, fexofenadine, dronedarone, erythromycin, tipranavir/ritonavir imatinib, lapatinib, felodipine, itraconazole, maraviroc, nilotinib, ketoconazole, lopinavir and posaconazole, ranolazine, ritonavir, quercetin, saxagliptin, sirolimus, quinidine, ranolazine, sitagliptin, talinolol, ticagrelor, verapamil tolvaptan, topotecan Cyclosporine, elacridar (GF120918), eltrombopag, mitoxantrone, imatinib, irrinotecan, lapatinib, rosuvastatin, sulfasalazine, topotecan OATP1B1 Atazanavir, cyclosporine, Atrasentan, atorvastatin, eltrombopag, gemfibrozil, bosentan, ezetimibe, lopinavir, rifampin, ritonavir, fluvastatin, glyburide, SN- saquinavir, tipranavir 38 (active metabolite of irinotecan), rosuvastatin, simvastatin acid, pitavastatin, pravastatin, repaglinide, rifampin, valsartan, olmesartan OATP1B3 Atazanavir, cyclosporine, Atorvastatin, rosuvastatin, lopinavir, rifampin, ritonavir, pitavastatin, telmisartan, valsartan, olmesartan Cimetidine, quinidine Amantadine, amiloride, cimetidine, dopamine, famotidine, memantine, metformin, pindolol, procainamide, ranitidine, varenicline, oxaliplatin Adefovir, captopril, furosemide, lamivudine, methotrexate, oseltamivir, tenofovir, zalcitabine, zidovudine Probenecid cimetidine, Acyclovir, bumetanide, ciprofloxacin, famotidine, furosemide, methotrexate, zidovudine, oseltamivir acid, (the active metabolite of oseltamivir), penicillin G, pravastatin, rosuvastatin, sitagliptin Metabolism The cytochrome P450 superfamily (officially abbreviated as CYP) is a large and diverse group of enzymes that catalyze the oxidation of organic substances. The substrates of CYP enzymes include metabolic intermediates such as lipids and steroidal hormones, as well as xenobiotic substances such as drugs and other toxic chemicals. 
CYPs are the major enzymes involved in drug metabolism and bioactivation, accounting for about 75% of the total number of different metabolic reactions [10]. CYP enzyme names and genetic variants were mapped from the Human Cytochrome P450 (CYP) Allele Nomenclature Database (http://www. This site contains the CYP450 genetic mutation effect on the protein sequence and enzyme activity with associated references.

Transport Proteins are proteins which serve the function of moving other materials within an organism. Transport proteins are vital to the growth and life of all living things. Transport proteins are involved in the movement of ions, small molecules, or macromolecules, such as another protein, across a biological membrane. They are integral membrane proteins; that is, they exist within and span the membrane across which they transport substances. Their names and genetic variants were mapped from the Transporter Classification Database. In addition, we also added the probe substrates and probe inhibitors to each one of the metabolism and transportation enzymes (see prescribed description).

Drug names were created using the drug names from DrugBank 3.0 [2]. DrugBank consists of 6,829 drugs, which can be grouped into the categories of FDA-approved, FDA-approved biotech, nutraceuticals, and experimental drugs. The drug names are mapped to generic names, brand names, and synonyms.

Subject included the existing ontologies for human disease ontology (DOID), suggested Ontology for Pharmacogenomics (SOPHARM), and mammalian phenotype (MP) (see Table 1).

The PK ontology was implemented with Protégé [11] and uploaded to the BioPortal ontology platform.

PK corpus
A PK abstract corpus was constructed to cover four primary classes of PK studies: clinical PK studies (n = 56); clinical pharmacogenetic studies (n = 57); in vivo DDI studies (n = 218); and in vitro drug interaction studies (n = 210). The PK corpus was constructed manually. The abstracts of clinical PK studies were selected from our previous work, in which the most popular CYP3A substrate, midazolam, was investigated [12]. The clinical pharmacogenetic abstracts were selected based on the most polymorphic CYP enzyme, CYP2D6. We think these two selection strategies represent all the in vivo PK and PG studies very well. For the drug interaction studies, the abstracts were randomly selected from a PubMed query that used the probe substrates/inhibitors/inducers for metabolism enzymes reported in Table 6.

Once the abstracts had been identified in the four classes, they were annotated manually (Figure 1). The annotation was first carried out by three master's-level annotators (Shreyas Karnik, Abhinita Subhadarshini, and Xu Han) and one Ph.D. annotator (Lang Li). They have different training backgrounds: computational science, biological science, and pharmacology. Any differentially annotated terms were further checked by Sara K. Quinney and David A. Flockhart, one Pharm.D. and one M.D. scientist with extensive pharmacology training. For annotations on which these two annotators disagreed, a group review was conducted (Drs Quinney, Flockhart, and Li) to reach the final agreed annotations. In addition, a random subset of 20% of the abstracts that had consistent annotations among the four annotators (three master's level and one Ph.D.) was double checked by two Ph.D.-level scientists.

Figure 1 PK Corpus Annotation Flow Chart.
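The first step of the annotation workflow, separating terms that all annotators agree on from terms that need expert review, can be sketched as follows. This is only an illustration: the `(start, end, label)` term format, the annotator keys, and the function name `split_by_agreement` are assumptions made for the example, not details reported for the corpus.

```python
from typing import Dict, Set, Tuple

# Hypothetical term format: (start offset, end offset, label).
Term = Tuple[int, int, str]


def split_by_agreement(by_annotator: Dict[str, Set[Term]]) -> Tuple[Set[Term], Set[Term]]:
    """Return (consensus, disputed) terms across all annotators.

    consensus: terms tagged identically by every annotator.
    disputed:  terms tagged by at least one annotator but not by all,
               i.e. the candidates for further expert review.
    """
    all_sets = list(by_annotator.values())
    if not all_sets:
        return set(), set()
    consensus = set.intersection(*all_sets)
    disputed = set.union(*all_sets) - consensus
    return consensus, disputed


# Made-up annotations of one abstract by four annotators.
annotations = {
    "annotator_1": {(0, 9, "drug"), (24, 30, "enzyme")},
    "annotator_2": {(0, 9, "drug"), (24, 30, "enzyme")},
    "annotator_3": {(0, 9, "drug"), (24, 31, "enzyme")},
    "annotator_4": {(0, 9, "drug"), (24, 30, "enzyme")},
}

consensus, disputed = split_by_agreement(annotations)
```

Here the drug term is in the consensus set, while the two competing enzyme boundaries end up in the disputed set and would go to the reviewers.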
A structured annotation scheme was implemented to annotate three layers of pharmacokinetics information: key terms, DDI sentences, and DDI pairs (Figure 2). The DDI sentence annotation scheme depends on the key terms, and the DDI pair annotations depend on the key terms and DDI sentences. Their annotation schemes are described as follows.

Figure 2 A Three Level Hierarchical PK and DDI Annotation Scheme.
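The three annotation layers and their dependencies can be sketched as a small data structure. This is a minimal sketch, not the format in which the corpus is distributed: the class names, field names, and the category check are assumptions, and the label values (CDDIS, VDDIS, DDI, ADDI, NDDI) are the ones defined later in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class KeyTerm:
    """Level 1: a tagged key term."""
    text: str
    category: str  # drug, enzyme, pk_parameter, number, mechanism, or change


@dataclass(frozen=True)
class DDIPair:
    """Level 3: an interaction pair built from two key terms."""
    first: KeyTerm
    second: KeyTerm
    relation: str  # e.g. DDI, ADDI, or NDDI


@dataclass
class DDISentence:
    """Level 2: a DDI sentence that holds its key terms and pairs."""
    text: str
    sentence_class: str  # CDDIS or VDDIS
    key_terms: List[KeyTerm] = field(default_factory=list)
    pairs: List[DDIPair] = field(default_factory=list)

    def add_pair(self, first: KeyTerm, second: KeyTerm, relation: str) -> None:
        # A pair can only be formed from drug or enzyme key terms.
        for term in (first, second):
            if term.category not in ("drug", "enzyme"):
                raise ValueError(f"{term.text!r} is not a drug or enzyme term")
        self.pairs.append(DDIPair(first, second, relation))


# Made-up example sentence, used only to show how the layers nest.
keto = KeyTerm("ketoconazole", "drug")
mdz = KeyTerm("midazolam", "drug")
auc = KeyTerm("AUC", "pk_parameter")
sentence = DDISentence(
    text="Ketoconazole significantly increased the AUC of midazolam.",
    sentence_class="CDDIS",
    key_terms=[keto, mdz, auc],
)
sentence.add_pair(keto, mdz, "DDI")
```

Keeping pairs inside the sentence object mirrors the dependency stated above: a pair cannot exist without a labeled sentence, and a sentence label cannot exist without key terms.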
Key terms include drug names, enzyme names, PK parameters, numbers, mechanisms, and change. The boundaries of these terms among different annotators were judged by the following standards.
• Drug names were defined mainly on DrugBank 3.0. In addition, drug metabolites were also tagged, because they are important in in vitro studies. The metabolites were judged by either prefix or suffix: oxi, hydroxyl, methyl, acetyl, N-dealkyl, N-demethyl, nor, dihydroxy, O-dealkyl, and sulfo. These prefixes and suffixes result from the reactions of phase I metabolism (oxidation, reduction, hydrolysis) and phase II metabolism (methylation, sulphation, acetylation, glucuronidation) (Brunton).
• Enzyme names covered all the CYP450 enzymes. Their names are defined in the human cytochrome P450 allele nomenclature database, http://www. The variations of the enzyme or gene names were considered. Its regular expression is (?:cyp|CYP|P450|CYP450)?[0-9][a-zA-Z][0-9]{0,2}(?: *[0-9]{1,2})?$.
• PK parameters were annotated based on the in vitro and in vivo PK parameter ontology defined in Tables 2 and 4. In addition, some PK parameters have different names: CL = clearance, t1/2 = half-life, AUC = area under the concentration curve, and AUCR = area under the concentration curve ratio.
• Numbers such as dose, sample size, the values of PK parameters, and p-values were all annotated. If presented, their units were also covered in the annotations.
• Mechanisms denote the drug metabolism and interaction mechanisms. They were annotated by the following regular expression patterns: inhibit(e(s|d)?|ing|ion(s)?|or)$, catalyz(e(s|d)?|ing)$, correlat(e(s|d)?|ing|ion(s)?)$, metaboli(z(e(s|d)?|ing)|sm)$, induc(e(s|d)?|ing|tion(s)?|or)$, form((s|ed)?|ing|tion(s)?|or)$, stimulat(e(s|d)?|ing|ion(s)?)$, activ(e(s)?|(at)(e(s|d)?|ing|ion(s)?))$, and suppress(e(s|d)?|ing|ion(s)?)$.
• Change describes the change of PK parameters. The following words were annotated in the corpus to denote the change: strong(ly)?, moderate(ly)?, high(est)?(er)?, slight(ly)?, significant(ly)?, obvious(ly)?, marked(ly)?, great(ly)?, pronounced(ly)?, modest(ly)?, probably, may, might, minor, little, negligible, doesn't interact, affect((s|ed)?|ing|ion(s)?)?$, reduc(e(s|d)?|ing|tion(s)?)$, and increas(e(s|d)?|ing)$.

The middle level annotation focused on the drug interaction sentences. Because the two interacting drugs were not necessarily both present in the sentence, sentences were categorized into two classes:
• Clear DDI Sentence (CDDIS): two drug names (or a drug-enzyme pair in an in vitro study) are in the sentence with a clear interaction statement, i.e. an interaction, a non-interaction, or an ambiguous statement (such as possible or might).
• Vague DDI Sentence (VDDIS): one drug or enzyme name is missing from the DDI sentence, but it can be inferred from the context. A clear interaction statement is also required.

Once DDI sentences were labeled, the DDI pairs in the sentences were further annotated. Because of the fundamental difference between in vivo DDI studies and in vitro DDI studies, their DDI relationships were defined differently. In in vivo studies, three types of DDI relationships were defined (Table 8): DDI, ambiguous DDI (ADDI), and non-DDI (NDDI). Four conditions are specified to determine these DDI relationships. Condition 1 (C1) requires that at least one drug or enzyme name is contained in the sentence; condition 2 (C2) requires that the other interacting drug or enzyme name can be found from the context if it is not in the same sentence; condition 3 (C3) specifies numeric rules to define the DDI relationships based on the PK parameter changes; and condition 4 (C4) specifies the language expression patterns for DDI relationships. Using the rules summarized in Table 8, DDI, ADDI, and NDDI can be defined by C1 ∧ C2 ∧ (C3 ∧ C4).
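The in vivo C3 and C4 rules of Table 8 can be sketched in code as follows. This is a rough illustration under several assumptions: FC is read as the fold change of the highest-priority PK parameter, boundary values (FC exactly 1.50 or 0.67, p-value exactly 0.05) are treated as not meeting the threshold, the cue lists are a subset of the Table 8 wording, and the function names are hypothetical. C1 and C2 are assumed to have been checked already, and negation ("not significantly") is not handled.

```python
import re
from typing import Optional

# C4 cue words (a subset of Table 8). Assumption: cues are checked in this
# order, so hedged or negligible wording wins over strength wording.
C4_CUES = [
    ("NDDI", r"minor|slight(?:ly)?|little|negligible"),
    ("ADDI", r"modest(?:ly)?|moderate(?:ly)?|probably|may|might"),
    ("DDI", r"significant(?:ly)?|obvious(?:ly)?|marked(?:ly)?|great(?:ly)?|pronounced(?:ly)?"),
]


def classify_c3(fc: float, p_value: float) -> str:
    """Numeric rule (C3) for the highest-priority in vivo PK parameter."""
    changed = fc > 1.50 or fc < 0.67
    significant = p_value < 0.05
    if significant and changed:
        return "DDI"
    if not significant and not changed:
        return "NDDI"
    return "ADDI"  # exactly one of the two criteria is met


def classify_c4(sentence: str) -> Optional[str]:
    """Language rule (C4): first matching cue group, or None if no cue."""
    for label, pattern in C4_CUES:
        if re.search(rf"\b(?:{pattern})\b", sentence, flags=re.IGNORECASE):
            return label
    return None


def classify_in_vivo(sentence: str,
                     fc: Optional[float] = None,
                     p_value: Optional[float] = None) -> Optional[str]:
    """Use C3 when numbers are available (C3 dominates C4), else fall back to C4."""
    if fc is not None and p_value is not None:
        return classify_c3(fc, p_value)
    return classify_c4(sentence)
```

For example, `classify_in_vivo("", fc=2.1, p_value=0.01)` falls under the DDI rule, while a sentence with no numbers is labeled from its wording alone.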
The priority rank of in vivo PK parameters is AUC > CL > t1/2 > Cmax.

In in vitro studies, six types of DDI relationships were defined (Table 8). DDI, ADDI, and NDDI were similar to the in vivo DDIs, but three more drug-enzyme relationships were further defined: DEI, ambiguous DEI (ADEI), and non-DEI (NDEI). C1, C2, and C4 remained the same for in vitro DDIs. The main difference is in C3, in which either Ki or IC50 (inhibition) or EC50 (induction) was used to define the DDI relationship quantitatively. The priority rank of in vitro PK parameters is Ki > IC50. Table 9 presents eight examples of how DDIs or DEIs were determined in the sentences.

Table 8 DDI Definitions in Corpus
| Study type | DDI relationship | C1 | C2 | C3** | C4 |
|---|---|---|---|---|---|
| In vivo | DDI | Yes | Yes | The PK parameter with the highest priority* must satisfy p-value < 0.05 and FC > 1.50 or FC < 0.67 | Significant, obviously, markedly, greatly, pronouncedly, etc. |
| In vivo | Ambiguous DDI (ADDI) | Yes | Yes | The PK parameter with the highest priority* is in the condition of p-value < 0.05 but 0.67 < FC < 1.50; or FC > 1.50 or FC < 0.67, but p-value > 0.05 | Modestly, moderately, probably, may, might, etc. |
| In vivo | Non-DDI (NDDI) | Yes | Yes | The PK parameter with the highest priority* is in the condition of p-value > 0.05 and 0.67 < FC < 1.50 | Minor significance, slightly, little or negligible effect, doesn't interact, etc. |
| In vitro | DDI / DEI | Yes | Yes | (0 < Ki < 10 or 0 < EC50 < 10 microM, and p-value < 0.05) | Significant, obviously, markedly, greatly, pronouncedly, etc. |
| In vitro | Ambiguous DDI (ADDI) / Ambiguous DEI (ADEI) | Yes | Yes | (10 < Ki < 100 or 10 < EC50 < 100 microM, and p-value < 0.05 or vice versa) | Modestly, moderately, probably, may, might, etc. |
| In vitro | Non-DDI (NDDI) / Non-DEI (NDEI) | Yes | Yes | (Ki > 100 microM or EC50 > 100 microM, and p-value > 0.05) | Minor significance, slightly, little or negligible effect, doesn't interact, etc. |

Note: C1: At least one drug or enzyme name has to be contained in the sentence. C2: Need to label the drug name if it is not from the same sentence. C3: PK-parameter and value dependent. C4: Significance statement.
*Priority issue: When C3 and C4 both occur and conflict, C3 dominates the sentence.
**For the priority of in vivo PK parameters: AUC > CL > t1/2 > Cmax; the priority of in vitro PK parameters: Ki > IC50.

Table 9 Examples of DDI Definitions
| PMID | DDI sentence | Relationship and comment |
|---|---|---|
| 20012601 | The pharmacokinetic parameters of verapamil were significantly altered by the co-administration of lovastatin compared to the control. | Because of the word "significantly", (verapamil, lovastatin) is a DDI. |
| 20209646 | The clearance of mitoxantrone and etoposide was decreased by 64% and 60%, respectively, when combined with valspodar. | Because the fold changes were less than 0.67, (mitoxantrone, valspodar) and (etoposide, valspodar) are |
| 20012601 | The (AUC (0-infinity)) of norverapamil and the terminal half-life of verapamil did not significantly changed with lovastatin coadministration. | Because of the words "not significantly changed", (verapamil, lovastatin) is a |
| 17304149 | Compared with placebo, itraconazole treatment significantly increase the peak plasma concentration (Cmax) of paroxetine by 1.3 fold (6.7 ± 2.5 versus 9.0 ± 3.3 ng/mL, P≤0.05) and the area under the plasma concentration-time curve from zero to 48 hours [AUC(0-48)] of paroxetine by 1.5 fold (137 ± 73 versus 199 ± 91 ng*h/mL, P≤0.01). | AUC has a higher rank than Cmax, and it had a 1.5 fold-change and a p-value less than 0.05; thus, (itraconazole, paroxetine) is a DDI. |
| 13129991 | The mean (SD) urinary ratio of dextromethorphan to its metabolite was 0.006 (0.010) at baseline and 0.014 (0.025) after St John's wort administration | The change in PK parameter is more than 1.5 fold but the p-value is > 0.05. Thus, (dextromethorphan, St John's wort) is an ADDI. |
| 19904008 | The obtained results show that perazine at its therapeutic concentrations is a potent inhibitor of human CYP1A2. | Because of the words "potent inhibitor", (perazine, CYP1A2) |
| 19230594 | After human hepatocytes were exposed to 10 microM YM758, microsomal activity and mRNA level for CYP1A2 were not induced while those for CYP3A4 were slightly induced. | Because of the words "not induced" and "slightly induced", (YM758, CYP1A2) and (YM758, CYP3A4) are |
| 19960413 | From these results, DPT was characterized to be a competitive inhibitor of CYP2C9 and CYP3A4, with K(i) values of 3.5 and 10.8 microM in HLM and 24.9 and 3.5 microM in baculovirus-insect cell-expressed human CYPs, respectively. | Because Ki was larger than 10 microM, (DPT, CYP2C9) and (DPT, CYP3A4) are |

Krippendorff's alpha [13] was calculated to evaluate the reliability of annotations from four annotators. The frequencies of key terms, DDI sentences, and DDI pairs are presented in Table 10. Their Krippendorff's alphas are 0.953, 0.921, and 0.905, respectively. Note that the total DDI pairs refer to all pairs of drugs within a DDI sentence, counted over all DDI sentences.

Table 10 Annotation Performance Evaluation
Column headings: Annotation Categories; Krippendorff's alpha. Row labels: Key Terms; Total Drug Pairs.

Table 11 Clinical PK Studies
Column headings: Pharmacogenetics Trial; Drug Interaction Trial.
Midazolam (MDZ, PO 4 mg; IV 0.05 mg/kg), Ketoconazole (KTZ, PO, 200, 400 mg); SOLTAMOX™, 20 mg/day; MDZ PO, IV; KTZ PO; month 1, 4, 8, 12; before and 0.5, 0.75, 1, 2, 4, 6, 9 hrs; TAM and its metabolites; MDZ and KTZ: AUC, AUCR, t1/2, and Cmax; prior chemo, menopausal; three-phase crossover; prospective, single arm; prospective, single arm; CYP2D6, 2C9, 2B6; healthy volunteers; Caucasian/African.

Note: The annotations are aligned for each row. The left column is the ontology tree presentation. The central and right columns display their corresponding annotations from the paper.

Table 12 in vitro PK Study
MDZ, APZ, TZ, CLAR, TAM, DTZ, NIF, BFC, HFC, TEST, E2; Compare metabolic capabilities of CYP3A4, 3A5, 3A7; sodium phosphate, NADPH, methanol; WinNonlin; 4 fold, 10% methanol (TZ); 5 min; insect cell (CYP3A); N/A; 3 min; 6 min; HPLC, MS, Fluorimetry; CYP3A4/5/7, P450 reductase, b5; 1mol, 6.6mol, 9mol; BD Gentest, PanVera, PanVera; CL for individual substrates; Km for individual substrates; Vmax for individual substrates; MDZ, APZ, TZ, CLAR, TAM, DTZ, NIF, BFC, HFC, TEST, E2; CYP3A4, 3A5, 3A7.

Note: The annotations are aligned for each row. The left column is the ontology tree presentation. The central and right columns display their corresponding annotations from the paper.

The PK corpus was constructed by the following process. Raw abstracts were downloaded from PubMed in XML format. The XML files were then converted into the GENIA corpus format following the gpml.dtd from the GENIA corpus [14]. Sentence detection in this step was performed with the Perl module Lingua::EN::Sentence, which was downloaded from The Comprehensive Perl Archive Network (CPAN). The GENIA corpus files were then tagged with the prescribed three levels of PK and DDI annotations. Finally, a cascading style sheet (CSS) was implemented to differentiate the entities in the corpus by colour. This feature allows users to visualize the annotated entities.

We would like to acknowledge that a DDI Corpus was recently published as part of a text mining competition, DDIExtraction 2011 ( DDIExtraction2011/dataset.html). Their DDIs were clinical outcome oriented, not PK oriented. They were extracted from DrugBank, not from PubMed abstracts. Our PK corpus complements their corpus very well.

Example 1: An annotated tamoxifen pharmacogenetics study
This example shows how to annotate a pharmacogenetics (PG) study with the PK ontology. We used a published tamoxifen PG study (Borges, Desta et al.). The key information from this tamoxifen PG trial was extracted as a summary list. The pre-processed information was then mapped to the PK ontology (column 2 in Table 11). This PG study investigates the genetic effects (CYP3A4, CYP3A5, CYP2D6, CYP2C9, CYP2B6) on the tamoxifen pharmacokinetics outcome (tamoxifen metabolites) among breast cancer patients. It was a single arm longitudinal study (n = 298); patients took SOLTAMOX™ 20 mg/day, and the drug steady state concentration was sampled 1, 4, 8, and 12 months after the start of tamoxifen treatment. The study population was a mix of Caucasians and African Americans. In Table 11, the trial summary is well organized by the PK ontology.

Example 2 midazolam/ketoconazole drug interaction study
This was a three-phase cross-over drug interaction study [15] (n = 24) between midazolam (MDZ) and ketoconazole (KTZ). Phase I was MDZ alone (IV 0.05 mg/kg and PO 4 mg); phase II was MDZ plus KTZ (200 mg); and phase III was MDZ plus KTZ (400 mg). Genetic variables include CYP3A4 and CYP3A5. The PK outcome is the MDZ AUC ratio before and after KTZ inhibition. Its PK ontology based annotation is shown in column three of Table 11.

Example 3 in vitro Pharmacokinetics Study
This was an in vitro study [16], which investigated the drug metabolism activities of three enzymes, CYP3A4, CYP3A5, and CYP3A7, in a recombinant system. Using 10 CYP3A substrates, the authors compared the relative contributions of the three enzymes to the metabolism of the 10 drugs. Its PK ontology based annotation is shown in Table 12.

Example 4 A drug interaction text mining example
We implemented the approach described by [17] for the DDI extraction. Prior to performing DDI extraction, the testing and validation DDI abstracts in our corpus were pre-processed and converted into the unified XML format [17]. The following steps were conducted:
• Drugs were tagged in each of the sentences using a dictionary based on DrugBank. This step revised our prescribed drug name annotations in the corpus. One purpose is to reduce redundant synonymous drug names. The other purpose is to keep only the parent drugs and remove the drug metabolites from the tagged drug names in our initial corpus, because parent drugs and their metabolites rarely interact. In addition, enzymes (i.e. CYPs) were also tagged as drugs, since enzyme-drug interactions have been extensively studied and published. The regular expression of enzyme names in our corpus was used to remove redundant synonymous gene names.
• Each of the sentences was subjected to tokenization, PoS tagging, and dependency tree generation using the Stanford parser [18].
• All C(n, 2) drug pairs from the n tagged drugs in a sentence were generated automatically, and they were assigned the default label of no-drug interaction. Note that if a sentence had only one drug name, it did not have a DDI. This setup limited us to considering only CDDI sentences in our corpus.
• The drug interaction labels were then manually flipped based on their true drug interaction annotations from the corpus. Note that our corpus had annotated DDIs, ADDIs, NDDIs, DEIs, ADEIs, and NDEIs. Here only DDIs and DEIs were labeled as true DDIs. The ADDIs, NDDIs, ADEIs, and NDEIs were all categorized as no-drug interactions.

Then sentences were represented with dependency graphs using interacting components (drugs) (Figure 3).
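The pair-generation and labeling steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used for the corpus: the function names are hypothetical, the enzyme pattern is adapted from the one given for the corpus (the capture group and the canonical "CYP" prefix are assumptions), and the gold annotation in the usage example is made up for illustration.

```python
import re
from itertools import combinations

# Enzyme-name pattern adapted from the one given for the corpus. The capture
# group and the canonical "CYP" prefix are assumptions made for this sketch.
ENZYME = re.compile(r"(?:cyp|CYP|P450|CYP450)?([0-9][a-zA-Z][0-9]{0,2})$")


def normalize(name):
    """Map enzyme synonyms to one form; lower-case every other name."""
    match = ENZYME.match(name)
    if match:
        return "CYP" + match.group(1).upper()
    return name.lower()


def candidate_pairs(tagged_names, true_pairs):
    """Generate all C(n, 2) pairs of tagged entities in one sentence.

    Every pair starts with the default label False (no interaction) and is
    flipped to True only if it appears in true_pairs, a set of frozensets
    holding the pairs annotated as DDI or DEI.
    """
    entities = sorted({normalize(n) for n in tagged_names})
    labels = {}
    for a, b in combinations(entities, 2):
        labels[(a, b)] = frozenset((a, b)) in true_pairs
    return labels


# Hypothetical gold annotation for one sentence (illustrative only).
gold = {
    frozenset(("mitoxantrone", "valspodar")),
    frozenset(("etoposide", "valspodar")),
}
pairs = candidate_pairs(["Mitoxantrone", "etoposide", "valspodar"], gold)
```

A sentence with a single tagged name yields no pairs, which mirrors the restriction to CDDI sentences described above.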
The graph representation of the sentence was composed of two items: i) the dependency graph structure of the sentence; and ii) the sequence of PoS tags, which was transformed into a linear order "graph" by connecting the tags with a constant edge weight. We used the Stanford parser [18] to generate the dependency graphs. Airola et al. proposed combining these two graphs into one weighted, directed graph. This graph was fed into a support vector machine (SVM) for DDI/non-DDI classification. More details about the all paths graph kernel algorithm can be found in [17]. A graphical representation of the approach is presented in Figure 3.

Figure 3 Drug Interaction Extraction Algorithm Flow Chart.
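As a rough illustration of the combined graph described above, the following Python sketch joins a dependency subgraph and a linear-order subgraph into a single weighted, directed edge set. It is a simplification and not the implementation of [17]: the function name, the node naming scheme, the edge weights, and the hand-written dependency edges are all assumptions, and the kernel computation itself is not shown.

```python
def build_sentence_graph(tokens, pos_tags, dep_edges,
                         dep_weight=1.0, lin_weight=1.0):
    """Combine two subgraphs of one sentence into a weighted, directed graph.

    tokens    : list of words
    pos_tags  : list of PoS tags, one per token
    dep_edges : list of (head_index, dependent_index) pairs
    Both weights are placeholder values chosen for this sketch.
    Returns (labels, edges): labels maps node -> label and
    edges maps (source_node, target_node) -> weight.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")

    # Nodes of the dependency subgraph carry the word; nodes of the
    # linear-order subgraph carry the PoS tag.
    labels = {("dep", i): tok for i, tok in enumerate(tokens)}
    labels.update({("lin", i): tag for i, tag in enumerate(pos_tags)})

    edges = {}
    for head, dependent in dep_edges:
        edges[(("dep", head), ("dep", dependent))] = dep_weight
    # Linear order: connect each tag to the next with a constant weight.
    for i in range(len(tokens) - 1):
        edges[(("lin", i), ("lin", i + 1))] = lin_weight
    return labels, edges


# Toy sentence with hand-written (not parser-generated) dependency edges.
toy_tokens = ["Ketoconazole", "inhibits", "midazolam", "metabolism"]
toy_tags = ["NN", "VBZ", "NN", "NN"]
toy_deps = [(1, 0), (1, 3), (3, 2)]
toy_labels, toy_edges = build_sentence_graph(toy_tokens, toy_tags, toy_deps)
```

The two subgraphs share no nodes, so the result is simply their union; a graph kernel such as the one in [17] would then compare such graphs between sentences.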
DDI extraction was implemented separately on the in vitro and in vivo DDI corpora. Table 13 presents the training and testing sample sizes in both corpus sets, and Table 14 presents the DDI extraction performance. In extracting in vivo DDI pairs, the precision, recall, and F-measure in the testing set are 0.67, 0.79, and 0.73, respectively. In the in vitro DDI extraction analysis, the precision, recall, and F-measure in the testing set are 0.47, 0.58, and 0.52, respectively. In our earlier DDI research published in the DDIExtraction 2011 Challenge [19], we used the same algorithm to extract both in vitro and in vivo DDIs at the same time, and the reported F-measure was 0.66. This number lies between our current in vivo DDI extraction F-measure of 0.73 and in vitro DDI extraction F-measure of 0.52.

Table 13 DDI Data Description
Column headings: DDI Pairs; True DDI Pairs. Row labels: in vivo DDI training; in vivo DDI testing; in vitro DDI training; in vitro DDI testing.

Table 14 DDI Extraction Performance
Row labels: in vivo DDI Training; in vivo DDI Testing; in vitro DDI Training; in vitro DDI Testing.

Error analysis was performed on the testing samples, and Table 15 summarizes the results. Among the known reasons for the false positives and false negatives, the most frequent is that there are multiple drugs in the sentence or that the sentence is long. Other reasons are that there is no direct DDI relationship between two drugs, but the presence of some words, such as dose or increase, may lead to a false positive prediction; that the DDI is presented in an indirect way; or that some NDDIs are inferred from certain adjectives (little, minor, negligible).

Table 15 DDI Extraction Error Analysis from Testing DDI Sets
| No. | Error Categories | Error Frequency (in vivo / in vitro) | Examples |
|---|---|---|---|
| 1 | There are multiple drugs in the sentence, and the sentence is long. | | FP PMID: 12426514. In 3 subjects with measurable concentrations in the single-dose study, rifampin significantly decreased the mean maximum plasma concentration (C(max)) and area under the plasma concentration-time curve from 0 to 24 h [AUC(0–24)] of praziquantel by 81% (P <.05) and 85% (P <.01), respectively, whereas rifampin significantly decreased the mean C(max) and AUC(0–24) of praziquantel by 74% (P <.05) and 80% (P <.01), respectively, in 5 subjects with measurable concentrations in the multiple-dose study. PMID: 10608481. Erythromycin and ketoconazole showed a clear inhibitory effect on the 3-hydroxylation of lidocaine at 5 microM of lidocaine (IC50 9.9 microM and 13.9 microM, respectively), but did not show a consistent effect at 800 microM of lidocaine (IC50 >250 microM and 75.0 microM, respectively). |
| 2 | There is no direct DDI relationship between two drugs, but the presence of some words, such as dose, increase, etc., may lead to a false positive prediction. | | PMID: 17192504. A significant fraction of patients to be treated with HMR1766 is expected to be maintained on warfarin |
| 3 | DDI is presented in an indirect way. | | PMID: 11994058. In CYP2D6 poor metabolizers, systemic exposure was greater after chlorpheniramine alone than in extensive metabolizers, and administration of quinidine resulted in a slight increase in CLoral. |
| 4 | Design issue. Some NDDI are inferred due to some adjectives (little, minor, negligible). | | FP PMID: 10223772. In contrast, the effect of ranitidine or ebrotidine on CYP3A activity in vivo seems to have little clinical significance. PMID: 10383922. CYP1A2, CYP2A6, and CYP2E1 activities were not significantly inhibited by azelastine and the two metabolites. PMID: 10681383. However, the most unusual result was the interaction between testosterone and nifedipine. |

Conclusions and discussions
A comprehensive PK ontology was constructed. It annotates both in vitro PK experiments and in vivo PK studies. Using our PK ontology, a PK corpus was also developed. It consists of four classes of PK studies: in vivo PK studies, in vivo PG studies, in vivo DDI studies, and in vitro DDI studies. This PK corpus is a highly valuable resource for text mining drug interaction relationships.

We previously developed entity recognition tools to tag PK parameters and their associated numerical data (Wang [4]). We showed that for one drug, midazolam, very high accuracy and recall could be achieved in tagging one PK parameter, clearance (CL), and its associated numerical values. However, using our newly developed PK corpus, we cannot reproduce such good performance for a more general class of drugs and PK parameters. This area will need much further investigation.

We would like to acknowledge that a DDI Corpus was recently published as part of a text mining competition, DDIExtraction 2011 ( DDIExtraction2011/dataset.html). Their DDIs were clinical outcome oriented, not PK oriented. They were extracted from DrugBank, not from PubMed abstracts. Our PK corpus complements their corpus very well.

Availability, links, and requirement
, which can be accessed by using any OWL editor/viewer,

Abbreviations
ADMET, Absorption, disposition, metabolism, excretion, and transportation; DDI, Drug-drug interaction; KTZ, Ketoconazole; MDZ, Midazolam; POS, Part of speech; PK, Pharmacokinetics; PG, Pharmacogenetics

Authors' contributions
Hengyi Wu developed the three level hierarchical PK and DDI annotation scheme for the corpus; Shreyas Karnik designed the PK corpus annotation implementation scheme and was one of the master annotators; Abhinita Subhadarshini designed the PK ontology and was one of the master annotators; Zhiping Wang applied the PK ontology to three PK studies; Santosh Philips collected the pharmacogenetics abstracts; Xu Han was one of the master annotators; Chienwei Chiang collected the ontology information for the transporters; Lei Liu advised on the use of Protégé; Malaz Boustani, Luis M Rocha and Sara K. Quinney defined the in vitro and in vivo PK terminologies; Sara K. Quinney was one of the Ph.D. level annotators; David Flockhart resolved the disagreed annotations and double checked the PK terminologies and study design; and Lang Li contributed the idea, guided this research, and wrote the manuscript. All authors read and approved the final manuscript.

This work is supported by the U.S. National Institutes of Health grants R01 GM74217 (Lang Li) and AHRQ Grant R01HS019818-01 (Malaz Boustani), 2012ZX10002010-002-002 (Lei Liu), and 2012ZX09303013-015 (Lei Liu).

References
1. Rowland M, Tozer TN: Clinical pharmacokinetics concept and applications. London: Lippincott Williams & Wilkins; 1995.
2. Knox C, Law V, Jewison T, Liu P, Ly S, et al: Drugbank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res 2011:D1035–D1041.
3. Tari L, Anwar S, Liang S, Cai J, Baral C: Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. 2010, 26(18):i547–i553.
4. Wang Z, Kim S, et al: Literature mining on pharmacokinetics numerical data: a feasibility study. J Biomed Inform 2009, 42(4):726–735.
5. Segel HI: Enzyme kinetics – behavior and analysis of rapid equilibrium and steady state enzyme systems. New York: John Wiley & Sons, Inc; 1975.
6. Consortium TIT: Membrane transporters in drug development. Nature Review Drug 2010, 9:215–236.
7. Rostami-Hodjegan A, Tucker G: "In silico" simulations to assess the "in vivo" consequences of "in vitro" metabolic drug-drug interactions. Drug Discovery Today: 2004, 1:441–448.
8. Gibaldi M, Perrier D: Pharmacokinetics. 2nd edition.: Dekker; 1982.
9. Huang SM, Temple R, Throckmorton DC, Lesko LJ: Drug interaction studies: study design, data analysis, and implications for dosing and labeling. Clin Pharmacol Ther 2007, 81(2):298–304.
10. Guengerich FP: Cytochrome p450 and chemical toxicology. Chem Res Toxicol 2008,
11. Rubin DL, Noy NF, et al: Protege: a tool for managing and using terminology in radiology applications. J Digit Imaging 2007, 20(Suppl 1):34–46.
12. Wang Z, Kim SK, Quinney S, Guo Y, Hall SD, Rocha LM, Li L: Literature mining on pharmacokinetics numerical data: a feasibility study. J Biomedical Informatics 2009,
13. Krippendorff K: Content analysis: an introduction to its methodology. Thousand Oaks, CA: Sage; 2004.
14. Kim JD, Ohta T, Tateisi Y, Tsujii J: Genia corpus - a semantically annotated corpus for bio-textmining. Bioinformatics 2003, 19(Supp 1):i180–i182.
15. Chien JY, Lucksiri A, et al: Stochastic prediction of cyp3a-mediated inhibition of midazolam clearance by ketoconazole. Drug Metab Dispos 2006, 34(7):1208–1219.
16. Williams JA, Ring BJ, et al: Comparative metabolic capabilities of cyp3a4, cyp3a5, and cyp3a7. Drug Metab Dispos 2002, 30(8):883–891.
17. Airola A, Pyysalo S, Bjorne J, Pahikkala T, Ginter F, Salakoski T: All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus BMC Bioinforma 2008, 9(suppl 11):S2.
18. De Marneffe M, MacCartney B, Manning C: Generating typed dependency parses from phrase structure parses. Proceedings of LREC 2006, 6:449–454.
19. Karnik S, Subhadarshini A, Wang Z, Rocha LM, Li L: Extraction of drug-drug interactions using all paths graph kernel. Proc. of the 1st Challenge task on Drug Drug Interaction Extraction 2011, :83–88.
20. Borges S, Desta Z, et al: Composite functional genetic and comedication cyp2d6 activity score in predicting tamoxifen drug exposure among breast cancer patients. Clin Pharmacol , 50(4):450–458.
21. Brunton LL, Chabner BA, Knollmann BC: Goodman & Gilman's The Pharmacological Basis Of Therapeutics.:12.
22. Segura-Bedmar I, Martínez P, de Pablo-Sánchez C: Using a shallow linguistic kernel for drug-drug interaction extraction. J Biomed Inform 2011, 44(5):789–804.



