Use of Pathway Logic to refine, visualize, and combine KEGG Glycosylation Pathways

Introduction

The KEGG PATHWAY Database (http://www.kegg.jp/kegg/pathway.html) is a source of pathway maps of molecular interactions, reactions, and relations. The series of maps concerning Glycan biosynthesis and metabolism integrates structures, reactions, and enzymes related to glycans by providing links superimposed on the nodes and edges of a hand-drawn graph. This model uses the Pathway Logic Assistant (PLA) to interpret the reactions as executable processes in order to combine maps, explore pathways, and carry out in silico/what-if experiments.

PLA has several advantages over the system used by KEGG to visualize and explore pathways. First, PLA networks use Petri nets to model biological processes. This allows the substrates, products, and enzymes of all reactions in the network to be displayed on one page. It also allows reachability analysis to be performed, which provides a way to find a path from one glycosylation state to another. Second, the layout of PLA networks is produced automatically. The possible routes between states can be reformatted on the fly with non-relevant reactions removed. Third, the place nodes of the Petri nets (known as occurrences in PLA) contain information about the modification and location of an entity as well as the entity itself. This provides a way to avoid some of the ambiguities in the KEGG pathways, such as whether multiple proteins in a node should be considered as "ands" (complexes of proteins working together) or "ors" (multiple proteins that catalyze the same reaction).
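The Petri net view described above can be sketched in a few lines of Python. This is a minimal illustration, not PLA's implementation: the Rule type, the occurrence strings such as "S0@ER", and the enzyme names are hypothetical placeholders, and each state is treated as a plain set of occurrences (at most one token per place). An "and" node becomes one rule that needs both enzymes; an "or" node becomes one rule per enzyme.

```python
from collections import deque
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    substrates: frozenset  # occurrences consumed when the rule fires
    enzymes: frozenset     # occurrences required but left unchanged
    products: frozenset    # occurrences produced


def fire(state, rule):
    """Return the state after firing `rule`, or None if it is not enabled."""
    if not (rule.substrates | rule.enzymes) <= state:
        return None
    return (state - rule.substrates) | rule.products


def reachable_states(initial, rules):
    """Breadth-first search over every state reachable from `initial`."""
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for rule in rules:
            nxt = fire(state, rule)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def can_make(target, initial, rules):
    """True if some reachable state contains the occurrence `target`."""
    return any(target in state for state in reachable_states(initial, rules))


# Hypothetical toy network: names are placeholders, not entries from the model.
RULES = [
    # "and": EnzA and EnzB must both be present (a complex working together).
    Rule("r_and", frozenset({"S0@ER"}),
         frozenset({"EnzA@ER", "EnzB@ER"}), frozenset({"S1@ER"})),
    # "or": EnzC or EnzD alone is enough, so each enzyme gets its own rule.
    Rule("r_or_1", frozenset({"S1@ER"}),
         frozenset({"EnzC@ER"}), frozenset({"S2@ER"})),
    Rule("r_or_2", frozenset({"S1@ER"}),
         frozenset({"EnzD@ER"}), frozenset({"S2@ER"})),
]

print(can_make("S2@ER", {"S0@ER", "EnzA@ER", "EnzB@ER", "EnzD@ER"}, RULES))
print(can_make("S2@ER", {"S0@ER", "EnzA@ER", "EnzD@ER"}, RULES))
```

The first call succeeds because either "or" enzyme is sufficient for the second step; the second fails because the "and" rule is missing EnzB.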

Most of the information in this model comes directly from the KEGG database. Additional information such as the subunit structure and intracellular locations of glycosyltransferases was obtained from UniProt (http://www.uniprot.org). Rules concerning relocation of proteoglycans from the ER to the Golgi were inserted as spontaneous reactions.

Ten KEGG maps that pertain to human proteins and/or lipids are represented in this model. Two extra maps were created with the same set of reactions to demonstrate how the interactions between pathway components can be investigated (Ceramide Glycosylation) and how generic proteins can be substituted with real-world proteins (Protein Glycosylation). This section introduces the KEGG pathways that we have converted into PL networks and shows how PLA can be used to learn about the different ways human cells glycosylate proteins and lipids.

To follow the discussion using the Pathway Logic Assistant, go to (PLA Online) and either download the Webstart version of Glycosylation, or download the PLA client and double-click on PLAremote5 in the unzipped folder. This will produce a small KBManager window. Using the Select Dish button, select Predefined and then choose the map you want to see. The maps are named according to the map name in the section describing the map. (Note that in PLA maps are also called dishes.) See the demos or (the SmallKB guided tour) for more information on how to use PLA.

Introduction
The KEGG Maps
The Combined Maps
- Ceramide Glycosylation
- Protein Glycosylation
Demos
- Demo 1
- Demo 2
- Demo 3

Abbreviations, symbols, and color codes used in this model can be found here.

The KEGG Maps

map00510 N-Glycan Biosynthesis

http://www.kegg.jp/kegg-bin/show_pathway?map00510

Biosynthesis of N-glycans begins with the sequential attachment of 14 sugars to dolichol phosphate at the ER membrane (rule R01018 to rule R06264). In rule R05976, the protein complex oligosaccharyltransferase (OST) then transfers the dolichol-sugar precursor to an asparagine at an Asn-X-Ser/Thr site of a newly synthesized protein. The protein-bound N-glycan precursor is subsequently trimmed, extended, and modified in the ER and Golgi by a complex series of reactions catalyzed by membrane-bound glycosidases and glycosyltransferases. N-linked glycosylation facilitates the protein-folding process and stabilizes the mature protein. Proteins modified in this way are represented in this graph by NProtein.
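As a small illustration of the Asn-X-Ser/Thr site mentioned above, the sketch below scans a one-letter protein sequence for candidate sequons. The function name and the example sequence are made up for illustration. Excluding proline at position X is a widely used convention that this text does not state, so it is exposed as an option; a match is only a candidate site and says nothing about whether the asparagine is actually glycosylated.

```python
import re


def find_sequons(seq, exclude_proline=True):
    """Return 1-based positions of Asn residues in N-X-S/T motifs.

    exclude_proline: assumption not taken from the text; when True,
    X may be any residue except proline.
    """
    x = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping motifs are all reported.
    pattern = re.compile(rf"(?=N{x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


# Made-up sequence for illustration only.
example = "MNGTANPSNNSS"
print(find_sequons(example))
print(find_sequons(example, exclude_proline=False))
```

With the default setting the N-P-S stretch at position 6 is skipped; with exclude_proline=False it is reported as well.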

map00512 Mucin type O-Glycan Biosynthesis

http://www.kegg.jp/kegg-bin/show_pathway?map00512

Human mucins are highly O-glycosylated glycoproteins ubiquitous in mucous secretions on cell surfaces and in body fluids. After a mucin-type protein is N-glycosylated and folded properly in the Golgi, GalNAc is attached to serine and threonine residues by a member of the Galnt family. The first GalNAc is extended with sugars including Gal, GlcNAc, or Neu5Ac to form five different "Core" structures. This pathway shows the biosynthesis of the core structures and a few extended structures. Proteins modified by O-GalNAcylation are represented in this graph by OProtein1.

map00514 Other types of O-glycan Biosynthesis

http://www.kegg.jp/kegg-bin/show_pathway?map00514

KEGG does not provide a process-type pathway in map 00514. Instead, it shows glycan trees containing links to enzymes and reactions known to operate on specific linkages. A small Pathway Logic network can still be assembled using the reactions provided by KEGG and the locations of the enzymes from UniProt.

O-linked GlcNAc type: Protein O-GlcNAcylation is an O-linked glycosylation involving attachment of GlcNAc to Ser/Thr residues catalyzed by O-GlcNAc transferase (OGT) without further extension of GlcNAc. O-GlcNAcylation occurs primarily in nucleocytoplasmic proteins. There is no consensus sequence common to all O-GlcNAcylated proteins although about half of the mapped sites have a proline and valine residue on either side of the modified hydroxy amino acid. Proteins modified by O-GlcNAcylation are represented in this graph by OProtein2.

O-linked Man type: O-mannosyl glycans are a type of O-glycan found in both eukaryotes and prokaryotes. Biosynthesis of O-mannosyl glycans is initiated by the transfer of mannose from Man-P-Dol to a serine or threonine residue, which is catalyzed by the protein O-mannosyltransferases POMT1 and POMT2. Proteins modified by O-mannosylation are represented in this graph by OProtein3.

O-linked Fuc type: Pofut1 transfers fucose from GDP-Fuc to a serine or threonine on a properly folded EGF-like repeat (ELR) containing an appropriate consensus sequence (C2X4–5(S/T)C3, where C2 and C3 are the second and third conserved cysteines of an ELR). Proteins containing ELRs that are modified by O-fucosylation are represented in this graph by OProtein4b. O-fucose can be elongated by any of the Fringe family of transferases that are specific for fucose residues in O-linkage to EGF-like repeats. Pofut2 recognizes thrombospondin type 1 repeats (TSRs), which contain six conserved cysteines and three disulphide bonds. Elongation of O-fucose attached to TSRs is continued by the attachment of Glc by B3galtl. Proteins containing TSRs that are modified by O-fucosylation are represented in this graph by OProtein4a.

O-linked Glc type: The O-glucose modification site occurs between the first and second conserved cysteines of EGF-like repeats (ELRs) at the putative consensus sequence C1XSXPC2. The O-glucose glycan typically exists as the trisaccharide Xyl-Xyl-Glc- although the monosaccharide form is also seen. The first linkage is catalyzed by Poglut1 which is localized to the ER and requires a properly folded ELR as a substrate. Proteins containing ELRs that are modified by O-glucosylation are represented in this graph by OProtein5.
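The two ELR consensus sequences quoted above, C2X4–5(S/T)C3 for O-fucose and C1XSXPC2 for O-glucose, can be written as regular expressions. The sketch below is only a rough illustration: the pattern names, the helper function, and the test sequences are made up, and treating X as "any residue except cysteine" is an assumption added here so that the two flanking cysteines are consecutive. A match cannot confirm which cysteines of an ELR are the conserved ones, nor that the repeat is properly folded.

```python
import re

# Assumption: X is any residue other than cysteine.
# Group 1 captures the residue that would carry the sugar.
O_FUC_SITE = re.compile(r"(?=C[^C]{4,5}([ST])C)")  # C2 X(4-5) (S/T) C3
O_GLC_SITE = re.compile(r"(?=C[^C](S)[^C]PC)")     # C1 X S X P C2


def candidate_sites(seq, pattern):
    """Return 1-based positions of the candidate modified Ser/Thr."""
    return [m.start(1) + 1 for m in pattern.finditer(seq.upper())]


# Made-up fragments for illustration only.
print(candidate_sites("ACDGSGTCKA", O_FUC_SITE))
print(candidate_sites("ACESAPCK", O_GLC_SITE))
```

The lookahead form lets overlapping candidates be reported, which matters when several cysteines lie close together.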

O-linked Gal type: Proteins with collagen domains are modified by a disaccharide, Glc-Galβ-, which is assembled on hydroxylysine or hydroxyproline residues. The first step in the pathway is the hydroxylation of either lysine to hydroxylysine or proline to hydroxyproline using the appropriate hydroxylases. Once these acceptor sites are created, the two-step glycosylation rapidly follows on most of the available acceptor sites. Glycosylation is thought to proceed until the protein folds and assembles into a triple helix. Proteins containing collagen domains that are modified by O-galactosylation are represented in this graph by OProtein6.

map00532 Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate

http://www.kegg.jp/kegg-bin/show_pathway?map00532

Chondroitin sulfate (CS) and dermatan sulfate (DS) are glycosaminoglycans (GAGs): linear polysaccharide chains consisting of repeating disaccharide units. Their synthesis begins with covalent attachment of xylose to the hydroxyl group of a serine residue of certain core proteins, represented in this graph by SProtein. The sugar chain is extended into a core tetrasaccharide linkage region consisting of GlcA-Gal-Gal-Xyl-. The next step is the polymerization of alternating residues of GalNAc and GlcA in rules R05929 to R05933. In the case of DS, the polymer chains are further modified by epimerization of GlcA to IdoA. In rule R02180, sulfation modifications are performed at various positions in these sugar residues by carbohydrate sulfotransferases. Although chain polymerization and modifications are thought to occur simultaneously, they are displayed as sequential reactions here, as they are in the KEGG maps.

map00534 Glycosaminoglycan biosynthesis - heparan sulfate / heparin

http://www.kegg.jp/kegg-bin/show_pathway?map00534

The synthesis of heparin and heparan sulfate begins with the core linkage region (GlcA-Gal-Gal-Xyl-) produced in the chondroitin sulfate / dermatan sulfate map by rules R05925-R05928. The polymerization of alternating residues of GlcNAc and GlcA is shown in rules R05930 to R05936. As the chain polymerizes, HS/Hep undergoes a series of modification reactions including N-deacetylation, N-sulfation, epimerization, and subsequently O-sulfation. The order and details of these reactions are unclear, so they are grouped together in one rule (NKR8 for heparan sulfate and NKR9 for heparin).

map00533 Glycosaminoglycan biosynthesis - keratan sulfate

http://www.kegg.jp/kegg-bin/show_pathway?map00533

Keratan sulfate (KS) is a glycosaminoglycan with the basic disaccharide unit of Gal-GlcNAc, with sulfate esters at C-6 of GlcNAc and Gal residues. There are two types of KS distinguished by the protein linkage: type I for N-linked via the N-glycan core structure (represented by KSProtein) and type II for O-linked via the O-glycan core 2 structure (represented by OProtein1).

map00563 Glycosylphosphatidylinositol(GPI)-anchor biosynthesis

http://www.kegg.jp/kegg-bin/show_pathway?map00563

A Glycosylphosphatidylinositol (GPI) anchor is a glycolipid that is attached to the C-terminus of a cell surface protein and attaches it to the cell membrane. Biosynthesis of GPI anchors proceeds in three stages: (i) pre-assembly of a GPI precursor in the ER membrane (rules R05916-R08107), (ii) attachment of the C-terminus of a newly synthesized protein (represented by GPIProtein) to a distal, nonreducing mannose residue of the GPI (rule NR1), and (iii) lipid remodeling and/or carbohydrate side-chain modifications in the ER and the Golgi (rule NR2).

Ceramide consists of a long-chain amino alcohol, sphingosine, in amide linkage to a fatty acid. Ceramide structures vary in length, hydroxylation, and saturation of both the sphingosine and fatty acid moieties, resulting in lipid structural diversity that impacts the presentation of the attached glycan at membrane surfaces. In KEGG, the glycosylation of human glycosphingolipids (GSLs) always starts with linkage of glucose to Ceramide, followed by extension with β-linked galactose on the C-4 hydroxyl of glucose to give lactosylceramide (Galβ1-4GlcβCer). KEGG divides the subsequent reactions into three maps depending on the next reaction in the pathway: addition of Neu5Ac (rule R05937) or GalNAc (rule R05938) leads to the ganglio series, Gal (rule R05960) leads to the globo series, and GlcNAc (rule R05971) leads to the lacto and neolacto series.
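The branch point after lactosylceramide can be summarized as a small lookup table. The sketch below only restates the sugars, rule identifiers, and series named in the paragraph above; the dictionary and function names are made up for illustration.

```python
# Sugar added to lactosylceramide -> (KEGG rule, resulting GSL series),
# restating the paragraph above.
BRANCHES = {
    "Neu5Ac": ("R05937", "ganglio"),
    "GalNAc": ("R05938", "ganglio"),
    "Gal": ("R05960", "globo"),
    "GlcNAc": ("R05971", "lacto/neolacto"),
}


def series_for(next_sugar):
    """Return the GSL series reached by adding `next_sugar` to LacCer."""
    _rule, series = BRANCHES[next_sugar]
    return series


for sugar, (rule, series) in BRANCHES.items():
    print(f"{sugar:7s} {rule}  {series}")
```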

The Combined Maps

Ceramide Glycosylation

This map combines maps 00601, 00603, and 00604 with two reactions from KEGG map 00600 that convert Ceramide to Lactosylceramide. If you can find the GSL you want to know about, this map can show you the series of reactions needed to make it from Ceramide. Finding a particular glycolipid in a sea of symbols is not easy, but we are working on a glycan finder to do the job. As an example, Ceramide decorated with the glycan with KEGG ID G00086 can be found in the center of the map at the bottom. Click on this node, make it a goal, and press the Subnet button to see the parts of the map relevant to this form. Press the FindPath button to see a specific construction pathway.
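The idea behind the Subnet and FindPath buttons can be sketched as follows. This is an illustration under stated assumptions, not a description of how PLA computes its results: the subnet function uses a relaxed forward pass that ignores consumption of substrates and then keeps only rules that feed the goal, and find_path is a plain breadth-first search over states. Cer, GlcCer, and LacCer follow the text; the remaining occurrence names, enzyme names, and rule names are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    substrates: frozenset
    enzymes: frozenset
    products: frozenset


def subnet(rules, initial, goal):
    """Rules that can fire from `initial` and contribute to `goal`."""
    # Forward pass (consumption ignored): which rules can ever fire?
    available, fired = set(initial), []
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r not in fired and (r.substrates | r.enzymes) <= available:
                fired.append(r)
                available |= r.products
                changed = True
    # Backward pass: keep fired rules whose products are needed.
    needed, kept = {goal}, set()
    changed = True
    while changed:
        changed = False
        for r in fired:
            if r.name not in kept and needed & r.products:
                kept.add(r.name)
                needed |= r.substrates | r.enzymes
                changed = True
    return [r for r in fired if r.name in kept]


def find_path(rules, initial, goal):
    """Shortest sequence of rule names producing `goal`, or None."""
    start = frozenset(initial)
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if goal in state:
            path = []
            while parents[state] is not None:
                state, name = parents[state]
                path.append(name)
            return path[::-1]
        for r in rules:
            if (r.substrates | r.enzymes) <= state:
                nxt = (state - r.substrates) | r.products
                if nxt not in parents:
                    parents[nxt] = (state, r.name)
                    queue.append(nxt)
    return None


def step(name, substrate, enzyme, product):
    return Rule(name, frozenset({substrate}), frozenset({enzyme}),
                frozenset({product}))


# Hypothetical toy map: GSL-A/B/C and E1..E5 are placeholders.
RULES = [
    step("r1", "Cer", "E1", "GlcCer"),
    step("r2", "GlcCer", "E2", "LacCer"),
    step("r3", "LacCer", "E3", "GSL-A"),
    step("r4", "LacCer", "E4", "GSL-B"),
    step("r5", "GSL-A", "E5", "GSL-C"),
]
INITIAL = {"Cer", "E1", "E2", "E3", "E4", "E5"}

relevant = subnet(RULES, INITIAL, "GSL-C")
print([r.name for r in relevant])
print(find_path(relevant, INITIAL, "GSL-C"))
```

In this toy map, rule r4 is dropped from the subnet because GSL-B does not lead to the goal, and the path search then runs on the reduced rule set.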

Protein Glycosylation

In order to study the effects of glycosylation on signal transduction, we need to know which proteins are glycosylated with which glycans. This map combines the maps in which proteins are modified by different types of glycosylation (maps 00510, 00512, 00514, 00532, 00534, 00533) and substitutes the KEGG symbols representing amino acids with proteins known to be linked to sugars at those sites. Protein lists were derived from UniProt Version 126.

Demos

Now that we have the maps, what can we learn from them? We have assembled a series of topics that can be investigated by using this model and made them into demonstrations that display various features of PLA. Additional information on how to use PLA can be found in the SmallKB guide and the PLA Reference Guide. Abbreviations, symbols, and color codes used in this model can be found here.

Demo 1

What glycosylation patterns cannot be made if St8sia5 is removed?
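A knockout question of this kind can be framed as a set difference: compute everything that can be produced with the enzyme present, compute it again with the enzyme removed, and compare. The sketch below is a simplified illustration on a made-up network. The names A to E, E1, E2, and EnzK are placeholders and do not reflect the actual reactions of St8sia5, and consumption of substrates is ignored, so the result is the set of occurrences that can be produced at all.

```python
def producible(rules, initial):
    """All occurrences that can be produced from `initial`.

    rules: iterable of (substrates, enzymes, products) frozensets.
    Consumption of substrates is ignored (a simplifying assumption).
    """
    available = set(initial)
    changed = True
    while changed:
        changed = False
        for substrates, enzymes, products in rules:
            ready = (substrates | enzymes) <= available
            if ready and not products <= available:
                available |= products
                changed = True
    return available


def lost_on_knockout(rules, initial, enzyme):
    """Occurrences that become unreachable when `enzyme` is removed."""
    with_enzyme = producible(rules, initial)
    without_enzyme = producible(rules, set(initial) - {enzyme})
    return with_enzyme - without_enzyme - {enzyme}


def rule(substrate, enzyme, product):
    return (frozenset({substrate}), frozenset({enzyme}),
            frozenset({product}))


# Hypothetical toy network, not taken from the model.
RULES = [
    rule("A", "E1", "B"),
    rule("B", "EnzK", "C"),
    rule("C", "E2", "D"),
    rule("B", "E2", "E"),
]
INITIAL = {"A", "E1", "E2", "EnzK"}

print(sorted(lost_on_knockout(RULES, INITIAL, "EnzK")))
```

Here C is lost directly and D is lost because it depends on C, while E remains reachable through a route that does not need the removed enzyme.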

Demo 2

How does Thbs1 get glycosylated?

Demo 3

How is CD15 antigen assembled on Ceramide?