PLA Datums

Datums are a structured, formal representation of experimental findings suitable for computational reasoning and semantic query.

The PL Datum Knowledge Base (DKB) contains over 35k datums related to the state of mammalian cells and cellular response to a variety of stimuli. This document provides a brief introduction to datums and describes how to search the DKB using the web-based interface at http://light.csl.sri.com:3000. See the section Getting started with Datum Queries below for instructions.

Worked examples can be found here, with suggestions for further querying here.

Please note that this interface is a prototype designed primarily for computational biologists, with an emphasis on computational. An interface for experimental biologists is under development.

Table of Contents

For a little background, go to The origin of Datums. To learn about datum contents, go to What are Datums?. For instructions on accessing the Datum Query service, go to Getting started with Datum Queries; the first step is creating an account. For details about building queries and formatting/exporting results, go to Datum Query Construction and Results retrieval.

The origin of Datums

To develop a model concerning a particular aspect of cellular signaling from available literature, the first step is to gather all the available experimental evidence. To help organize gathered information, we have developed a system for recording experimental findings in a knowledge base. Individual entries are called datums. It is important that each datum capture objective information, rather than conclusions of the experimenter or curator. We want to be able to compute with datums in the knowledge base, to retrieve sets of datums satisfying possibly complex combinations of properties, and to make logical inferences based on datums and general biological information. We also want the knowledge base and its infrastructure to be generally useful for experimental biologists. Datums should therefore be expressed using readily understood concepts with generally agreed-upon meaning, e.g., assays, detection methods and cells. Furthermore, each datum should contain a manageable chunk of information, sufficient to unambiguously describe an experimental finding. It should also cite the source of the information, so that the datum can be reviewed and additional details or context can be accessed.

What are Datums?

There are two main types of datum, state and change datums, corresponding to two basic types of biological experiments.

Overview

State datums describe the state of something in a defined system, often compared to the state of something else in the same system. An example of a state experiment is a comparison of the number of Egf receptors per cell on different cell lines. Protein interaction data is often produced by state experiments: if one protein can be co-precipitated by another protein from the same cell, they are considered to interact. State data is used in Pathway Logic to create initial system states, which require a list of the components in the system, their modifications, and their locations. It is also used to deduce the location and modifications of a protein demonstrated to be active (i.e., capable of performing its molecular function). Change datums describe the change in the state of something in a system in response to a stimulus. An experiment in which the Egf receptor is demonstrated to be unphosphorylated in a serum-starved cell and phosphorylated after that cell has been treated with Egf for 5 minutes is an example of a change experiment.

Element Details

The elements of a datum include its subject, the assay performed, the observation made, the experimental environment, the source of information, and zero or more extras (variants on experimental conditions). In addition, change datums include a treatment and, if available, observation times.
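The element list above can be pictured as a small record type. The following Python sketch is only an illustration: the class names, field names and example values are assumptions made here, not the DKB's actual schema or storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Extra:
    """A variant on the experimental conditions (hypothetical layout)."""
    type: str                     # e.g. "inhibited by"
    entity: str
    mode: Optional[str] = None


@dataclass
class Datum:
    """One experimental finding; field names are illustrative assumptions."""
    subject: str
    assay: str
    observation: str              # e.g. "increased", "detectable"
    environment: str
    source: str                   # where the finding was reported
    extras: List[Extra] = field(default_factory=list)
    treatment: Optional[str] = None   # change datums only
    times: Optional[str] = None       # change datums only, if available

    @property
    def kind(self) -> str:
        # In this sketch a datum with a treatment is a change datum.
        return "change" if self.treatment is not None else "state"


# Placeholder values loosely following the Egf receptor examples above.
state = Datum("Egfr", "surface-exp", "detectable",
              "example cells", "placeholder source")
change = Datum("Egfr", "phos", "increased",
               "example cells, serum-starved", "placeholder source",
               treatment="Egf", times="5 min")
```

Treating the presence of a treatment as the marker of a change datum is a simplification chosen for the sketch.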

The curation notebook contains descriptions of the assays and other datum components. Its glossary is a good place to look for definitions of abbreviations and unfamiliar terms.

Getting started with Datum Queries

To access the datum query page, point your browser at http://light.csl.sri.com:3000. On your first visit you will need to establish a user name and password by clicking on the "Register" link. On future visits, click on the login link and enter your user name and password. This will take you to the saved queries page. (The system allows you to name and save queries, either to rerun them or to use them as a starting point for different queries.) The first time you access the Datum Query site there will be no saved queries, so you must create one. Click on "Make a new Query", type a name in the "Make a new query" text box, click on "create", and you are ready to go. If you have one or more saved queries, you can select one of those (or make a new one). From a specific query page you can always return to the query selection page by clicking on the "My queries" link in the upper left.

A newly created query is empty (equivalent to the true predicate), and thus the query page reports the full set of datums as results. The page for a saved query resumes in the state it was in when last accessed. Queries are specified by selecting predicates (corresponding to datum parts) and predicate attributes (fields of the part); each attribute constrains the result set through a chosen verb and, where applicable, additional text for matching. The results display can be customized by choosing the parts of each datum to be displayed in the Fields area.

Datum Query Construction and Results retrieval

A datum query defines a predicate on datums. The query result is the set of datums for which the predicate is true. It is presented as a list which shows, for each datum, the subject line and any element attributes selected for display (see below). Clicking on the "Expand" link for a datum shows the full datum; you can also expand or collapse all datums at once. You can export the full query results as plain text (using the datum natural language syntax), or the selected attributes in csv (comma separated values) form. The export will appear in your browser, from which you can save it to a file if you like. (A CSV export can be imported into a spreadsheet program such as Excel.)

Constructing a Query

A query is a conjunction of basic predicates, each constraining the value of some datum element. The empty query selects all datums in the current DKB. To add a new predicate element to a query, click on the link
Add a predicate
You choose the datum element to constrain by clicking on the leftmost selection button and selecting the desired element, one of {subject, change, assay, treatment, environments, times, source, extra}. This will cause the next selection button to the right to list the attributes of the element that can be selected. The third selection button allows you to specify the verb/relation. On the far right there are Add child/Remove links. The
Add child
link adds a new attribute line for a datum element, thus keeping all the attributes for a given element together. The
 Remove
link removes the attribute constraint on that line. If there is only one attribute, the element predicate is removed. You can start fresh by clicking on
Remove all predicates
The possible verbs are
  matches/does not match 
  isa/is not a 
  exists/does not exist   
The first four verbs have associated text boxes for entering a string. At any point you can press the "Search" button to see the results for the current query state. You can search the entire datum string using the datum predicate and the matches verb, for example to find all datums in which a given protein appears, whether as subject, treatment, substrate or extra.
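The semantics of a query as a conjunction of predicates can be sketched in a few lines of Python. This is not the service's implementation: the dictionary layout, the treatment of "matches" as a case-insensitive substring test, and the sample records (including the placeholder names ProteinX, ProteinY and ProteinZ) are all assumptions made for illustration. The isa verbs are left out of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Datum = Dict[str, str]                       # attribute name -> value
Predicate = Tuple[str, str, Optional[str]]   # (attribute, verb, text)


def holds(datum: Datum, pred: Predicate) -> bool:
    """Evaluate one basic predicate against one datum."""
    attr, verb, text = pred
    value = datum.get(attr)
    if verb == "exists":
        return value is not None
    if verb == "does not exist":
        return value is None
    if verb == "matches":
        # Assumption: matching is a case-insensitive substring test.
        return value is not None and text.lower() in value.lower()
    if verb == "does not match":
        # Assumption: a missing attribute counts as "does not match".
        return value is None or text.lower() not in value.lower()
    raise ValueError(f"unsupported verb: {verb}")


def run_query(datums: List[Datum], predicates: List[Predicate]) -> List[Datum]:
    """A query is a conjunction: every predicate must hold for a datum."""
    return [d for d in datums if all(holds(d, p) for p in predicates)]


# Invented toy records, not taken from the DKB.
DATUMS = [
    {"subject.entity": "Egfr", "assay.type": "phos",
     "change": "increased", "treatment.entity": "Egf"},
    {"subject.entity": "ProteinX", "assay.type": "IVKA",
     "change": "increased", "assay.substrate": "ProteinY"},
    {"subject.entity": "Egfr", "assay.type": "copptby",
     "assay.hooks": "ProteinZ"},
]

everything = run_query(DATUMS, [])
egfr_phos = run_query(DATUMS, [("subject.entity", "matches", "egfr"),
                               ("assay.type", "matches", "phos")])
```

Because `all()` over an empty list is true, the empty query returns every record, which mirrors the behaviour described for a newly created query.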

Formatting Results

You can change how the results are reported using the "Fields" box. (Click on show to see the choices.) Each selected field will be printed on a separate line following a result datum. Selected fields are also the fields used to determine the csv export (see Exporting Results). Selecting an element name displays the full element string from the datum natural language form (except for treatment and extras, since the element is defined from multiple substrings). Selecting an element attribute displays that attribute. In the case of extras, the selected attributes of each extra of a datum are displayed (named extra1.attr, extra2.attr etc.).
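The extra1.attr, extra2.attr naming scheme can be illustrated with a short Python sketch. The input layout (a list of dictionaries, one per extra) and the function name are assumptions made here; only the output naming follows the description above.

```python
from typing import Dict, List


def flatten_extras(extras: List[Dict[str, str]],
                   attrs: List[str]) -> Dict[str, str]:
    """Name the selected attributes of each extra extra1.attr, extra2.attr, ..."""
    fields: Dict[str, str] = {}
    for index, extra in enumerate(extras, start=1):
        for attr in attrs:
            # A missing attribute is shown as an empty string in this sketch.
            fields[f"extra{index}.{attr}"] = extra.get(attr, "")
    return fields


# Invented example with placeholder entity names.
flat = flatten_extras(
    [{"type": "inhibited by", "entity": "ProteinX"},
     {"type": "reqs", "entity": "ProteinY"}],
    ["type", "entity"],
)
```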

Exporting Results

Search results can be exported as plain text using the "txt" link of "Export all results". Txt export prefixes each datum with the selected fields, one per line. Search results can be exported in csv format (for import into a spreadsheet application) using the "csv" link of "Export all results". Csv export has one column for each field, and one row for each datum in the result set. Either choice produces a text page which you can save from the browser window or copy and paste into a file. NB: If you want the exported page to open in a new browser tab or window, hold down the command key (control or shift key on Windows/Linux) while clicking the export link.
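Once a csv export has been saved, it can be processed with any CSV reader. The Python sketch below assumes that the export begins with a header row naming the selected fields; that detail, the column names and the sample rows are assumptions for illustration and should be checked against a real export.

```python
import csv
import io
from collections import Counter

# Invented stand-in for a saved export: one column per selected field,
# one row per datum.
EXPORT = """subject.entity,assay.type,change
Egfr,phos,increased
Egfr,copptby,detectable
ProteinX,phos,unchanged
"""


def load_export(text: str) -> list:
    """Parse csv text into one dictionary per datum, keyed by field name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_export(EXPORT)
# Count how many exported datums use each assay type.
assay_counts = Counter(row["assay.type"] for row in rows)
```

For a file saved from the browser, the same reader can be applied to an open file handle instead of `io.StringIO`.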

Appendix

Datum Elements

The datum elements and searchable/displayable attributes are discussed below.
subject
subject.entity -- a protein, gene, possibly a lipid or other chemical
subject.origin -- one of
         ['endogenous', 'expressed', 'recombinant', 'purified', 'knockin']
subject.mods -- modifications -- matches searches in mods substring
subject.muts -- mutations -- matches searches in muts substring
subject.handle -- how the subject is identified
   possibilities include Ab (antibody), phosAb (phospho Ab), tAb (tag Ab) 
       14C, 32P, 35S, 125I (radioactive handles)

assay
assay.type -- See Assay Sorts and Types below for isa and matches possibilities
assay.detection_method
assay.hooks -- only relevant for binding assays
assay.hook_handles -- like subject handles
assay.substrate -- only relevant for activation assays
     (substrates and hooks are molecules, usually proteins)

change -- Controlled vocabulary:    
      [increased,decreased,unchanged,detectable,undetectable] ??un or not?

treatment
treatment.type -- one of  "irt", "by", "itpo" via string match
treatment.entity -- looks for match in any treatment entity, should print all
treatment.origin 
treatment.mods
treatment.muts
    Same as Subject counterparts 
    -- applies to first or matched entity if more than one????

environment  
environment.cells -- controlled vocabulary, to appear
environment.comment -- the comment string if any
environment.medium -- Controlled vocabulary, includes BMS BMLS BSS BMHIS
environment.cellmuts -- searches within mutation string
environment.cmut_entity  -- a protein
environment.cmut_mods
environment.cmut_muts
    Same as Subject counterparts


times -- a string

source  
source.pmid -- the pubmed identifier 
source.figs -- the figures/tables used

extra
extra.type -- Controlled vocabulary:
     [repressed by, inhibited by, enhanced by, does not req, reqs, reversed by,
      unaffected by, bkg inhibited by,  partially ...]
extra.entity --  Same as Subject counterpart
extra.mode -- Controlled vocabulary:  [addition, substitution, KO, stim, RNAI, ...]

The default verb for any attribute is `may exist', meaning initially it won't
constrain the search. If you change the verb for some attribute and get an empty
result this is likely because there is a conflict, for example asking for the
substrate of a binding assay or for the hooks of an activation assay.
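The kind of conflict described above can be checked before running a search. The Python sketch below encodes only the two restrictions stated in this appendix (substrate applies to activation assays, hooks to binding assays); the table and function names are hypothetical and are not part of the query service.

```python
from typing import Dict, List

# Attributes that are only relevant for one assay sort, per the notes above.
ATTRIBUTE_REQUIRES: Dict[str, str] = {
    "assay.substrate": "ActivationAssay",
    "assay.hooks": "BindingAssay",
}


def find_conflicts(assay_sort: str, constrained_attrs: List[str]) -> List[str]:
    """Return the constrained attributes that cannot apply to assay_sort."""
    return [
        attr for attr in constrained_attrs
        if attr in ATTRIBUTE_REQUIRES and ATTRIBUTE_REQUIRES[attr] != assay_sort
    ]


conflicts = find_conflicts("BindingAssay", ["assay.substrate", "assay.hooks"])
```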

To appear: links to CV lists
  BProtein     --- proteinops
  Genes        --- geneops
  Chemical     --- chemicalops
  Stress       --- stressops
  Cells        --- cellops

Assay Sorts and Types

AssayType -- isa
  SimpleAssay 
    GXPAssay 
    REReporterAssay 
    SimpleModAssay
  ModificationAssay 
    SimpleModAssay
  BindingAssay 
  ActivationAssay  
  LocationAssay 
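
The sort hierarchy above, in which SimpleModAssay sits under both SimpleAssay and ModificationAssay, can be sketched as a parent table with a recursive isa test. The Python below is an illustration under that reading of the indentation; the table and function names are assumptions, not the service's code.

```python
from typing import Dict, List

# Each sort maps to its direct parents, read from the indentation above.
PARENTS: Dict[str, List[str]] = {
    "AssayType": [],
    "SimpleAssay": ["AssayType"],
    "GXPAssay": ["SimpleAssay"],
    "REReporterAssay": ["SimpleAssay"],
    "SimpleModAssay": ["SimpleAssay", "ModificationAssay"],
    "ModificationAssay": ["AssayType"],
    "BindingAssay": ["AssayType"],
    "ActivationAssay": ["AssayType"],
    "LocationAssay": ["AssayType"],
}


def isa(sort: str, ancestor: str) -> bool:
    """True if sort equals ancestor or reaches it through any parent chain."""
    if sort == ancestor:
        return True
    return any(isa(parent, ancestor) for parent in PARENTS.get(sort, []))
```

Under this sketch, `isa("SimpleModAssay", "ModificationAssay")` and `isa("SimpleModAssay", "SimpleAssay")` both hold, while `isa("GXPAssay", "ModificationAssay")` does not.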


AssayType -- matches
SimpleModAssay
	upshift
	dimerization
	oligomerization 
	polymerization 

GXPAssays
 GDP-dissociation
 GTP-association 
 GTP-hydrolysis 
 GTP-bdpd 
 GTP-percent

SimpleAssays
 cbs-binding 
 Gal4-reporter
 LexA-reporter
 mRNA 
 promo-reporter 
 internalization
 nuc-export-reporter
 nuc-import 
 nuc-export 
 surface-exp
 prot-exp 
 prot-stability 
 secretion 

ModificationAssay
 acetylation 
 phos 
   --- Sphos >> phos(SSite)
   --- Tphos >> phos(TSite)
   --- STphos >> phos(STSite)
   --- Yphos >> phos(YSite)
 ubiq 
 sumo 
 cleavage

ActivationAssay -- has substrate
	IVKA
	IVLKA 
	IVGefA
	oligo-binding

BindingAssay -- has hooks
 boundby 
 colocwith
 copptby
 snaggedby

LocationAssay
	locatedin 
	infraction
	boundto 

REReporterAssays --- also a kind of activity (of TFs) detection
 ARE-reporter 
 BRE-reporter 
 CAGA-reporter
 DE-reporter 
 E2fRE-reporter
 EgrRE-reporter
 GAS-reporter 
 ISRE-reporter
 Lef1RE-reporter
 Nfkb-reporter
 SBE-reporter 
 SrfRE-reporter
 Stat3RE-reporter
 TCF-reporter 
 TRE-reporter