Graph structure inference for high-throughput genomic data

Hui Zhou


Recent advances in high-throughput sequencing technologies enable us to study a large number of biomarkers and use their information collectively. Many genome-wide networks have been constructed from high-throughput experiments to characterize the complex physical or functional interactions between biomarkers. To identify outcome-related biomarkers, it is often advantageous to make use of the known relational structure, because graph-structured inference introduces smoothness and reduces model complexity. In this dissertation, we propose models for high-dimensional epigenetic and genomic data that incorporate the network structure and update it based on empirical evidence. In the first part of this dissertation, we propose a penalized conditional logistic regression model for high-dimensional DNA methylation data. DNA methylation levels of CpG sites within genes are often correlated, and the number of CpG sites typically far exceeds the sample size. The new penalty function combines the truncated lasso penalty and a graph fused-lasso penalty to induce parsimonious and consistent models, and to incorporate the CpG site network structure without introducing extra bias.
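As an illustration of the combined penalty described above, the following is a minimal sketch and not the dissertation's actual formulation. It assumes the commonly used forms of the two components: a truncated lasso term min(|b_j| / tau, 1) summed over coefficients, and a graph fused-lasso term |b_j - b_k| summed over the edges of the CpG site network. The function names, the tuning parameters `lam1`, `lam2`, and `tau`, and the way the two terms are weighted are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Sequence, Tuple


def truncated_lasso(beta: Sequence[float], tau: float) -> float:
    """Truncated lasso term: sum over j of min(|beta_j| / tau, 1).

    Coefficients larger than tau in magnitude all contribute the same
    constant, so large effects are not shrunk further.
    """
    return sum(min(abs(b) / tau, 1.0) for b in beta)


def graph_fused_lasso(beta: Sequence[float],
                      edges: Sequence[Tuple[int, int]]) -> float:
    """Graph fused-lasso term: sum over edges (j, k) of |beta_j - beta_k|.

    Encourages coefficients of linked CpG sites to be similar.
    """
    return sum(abs(beta[j] - beta[k]) for j, k in edges)


def combined_penalty(beta: Sequence[float],
                     edges: Sequence[Tuple[int, int]],
                     lam1: float, lam2: float, tau: float) -> float:
    """Hypothetical weighted sum of the two terms (weights are assumed)."""
    return (lam1 * truncated_lasso(beta, tau)
            + lam2 * graph_fused_lasso(beta, edges))


# Three CpG sites on a chain graph 0 - 1 - 2 (toy values, not real data).
beta = [0.0, 0.5, 2.0]
edges = [(0, 1), (1, 2)]
print(combined_penalty(beta, edges, lam1=2.0, lam2=0.5, tau=1.0))
```

In this toy call the coefficient 2.0 exceeds `tau` and is charged the same as any other large coefficient, which is the sense in which truncation avoids adding bias to strong signals.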

An efficient minorization-maximization algorithm that utilizes difference-of-convex programming and the alternating direction method of multipliers is presented. Extensive simulations demonstrated superior performance of the proposed method compared with several existing methods in both model selection consistency and parameter estimation accuracy. We also applied the proposed method to matched case-control breast invasive carcinoma methylation data from The Cancer Genome Atlas (TCGA), generated on both the Illumina Infinium HumanMethylation27 (HM27) and HumanMethylation450 (HM450) BeadChips. The proposed method identified several outcome-related CpG sites that were missed by the existing methods. In the latter part of this dissertation, we propose a Bayesian hierarchical graph-structured model that integrates a priori network information with empirical evidence. Empirical data may suggest modifications to the given network structure, which could lead to new and interesting biological findings when the prior knowledge of the graphical structure among the variables is limited or partial.
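The difference-of-convex idea mentioned at the start of the paragraph above can be sketched on a toy problem. This is not the dissertation's algorithm: it swaps in a squared-error loss on a single coefficient for the conditional logistic likelihood, and it leaves out the graph fused-lasso term and the alternating direction method of multipliers entirely. It relies on the identity min(|b|, tau) = |b| - max(|b| - tau, 0), which writes the truncated lasso penalty as a difference of two convex functions; replacing the subtracted part by a linear bound at the current iterate gives a weighted lasso problem at each step. The names `soft_threshold`, `dc_truncated_lasso_scalar`, `z`, `lam`, and `tau` are hypothetical.

```python
def soft_threshold(z: float, t: float) -> float:
    """Minimizer over b of 0.5 * (z - b)**2 + t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def dc_truncated_lasso_scalar(z: float, lam: float, tau: float,
                              max_iter: int = 50) -> float:
    """Difference-of-convex iterations for the toy objective
    0.5 * (z - b)**2 + lam * min(|b| / tau, 1).

    At each step the penalty is replaced by a convex surrogate:
    (lam / tau) * |b| if the current iterate has |b| <= tau,
    and no penalty at all otherwise.
    """
    b = soft_threshold(z, lam / tau)  # start from the plain lasso solution
    for _ in range(max_iter):
        if abs(b) > tau:
            b_new = z  # surrogate has no penalty: least-squares solution
        else:
            b_new = soft_threshold(z, lam / tau)
        if b_new == b:  # iterate unchanged, stop
            break
        b = b_new
    return b


print(dc_truncated_lasso_scalar(3.0, lam=1.0, tau=1.0))
print(dc_truncated_lasso_scalar(0.5, lam=1.0, tau=1.0))
```

In the first call the lasso start is shrunk to 2.0, which exceeds `tau`, so the next surrogate drops the penalty and the estimate returns to the unshrunk value; in the second call the small signal stays at zero.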

We present the full hierarchical model along with the Markov chain Monte Carlo sampling procedure used for inference. Using both simulations and brain aging gene pathway data, we showed that the new method can identify discrepancies between the data and an a priori known graph structure and suggest modifications and updates. Motivated by methylation and gene expression data, the two models we propose in this thesis make use of the available structure in the data and produce better inferential results. The proposed methods can be applied to a wider range of problems.

Language
English


Book Details


Edition Notes

Department: Biostatistics.

Thesis advisor: Shuang Wang.

Thesis (Ph.D.)--Columbia University, 2014.

Published in
[New York, N.Y.?]

The Physical Object

Pagination
1 online resource.

Edition Identifiers

Open Library
OL44812603M
OCLC/WorldCat
896221937

Work Identifiers

Work ID
OL32962557W

Source records

marc_columbia MARC record

History

December 22, 2022: Created by MARC Bot (import new book)