MARC Record from marc_columbia

Record ID marc_columbia/Columbia-extract-20221130-034.mrc:12818265:3171
Source marc_columbia
Download Link /show-records/marc_columbia/Columbia-extract-20221130-034.mrc:12818265:3171?format=raw

LEADER: 03171cam a22003733i 4500
001 16624196
005 20220701080414.0
006 m o d
007 cr |n||||a||||
008 220615s2022 nyu|||| om 00| ||eng d
035 $a(OCoLC)1333964231
035 $a(OCoLC)on1333964231
035 $a(NNC)ACfeed:legacy_id:ac:12jm63xspf
035 $a(NNC)ACfeed:doi:10.7916/cv4k-2t44
035 $a(NNC)16624196
040 $aNNC$beng$erda$cNNC
100 1 $aDavison, Andrew.
245 10 $aStatistical Perspectives on Modern Network Embedding Methods /$cAndrew Davison.
264 1 $a[New York, N.Y.?] :$b[publisher not identified],$c2022.
336 $atext$btxt$2rdacontent
337 $acomputer$bc$2rdamedia
338 $aonline resource$bcr$2rdacarrier
300 $a1 online resource.
502 $aThesis (Ph.D.)--Columbia University, 2022.
500 $aDepartment: Statistics.
500 $aThesis advisor: Tian Zheng.
520 $aNetwork data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction being performed on diverse data sets, including protein-protein interaction networks, social networks and citation networks. A frequent approach to these tasks begins by learning a Euclidean embedding of the network, to which machine learning algorithms developed for vector-valued data are then applied. For large networks, embeddings are learned using stochastic gradient methods in which the sub-sampling scheme can be freely chosen. This distinguishes the network setting from that of traditional i.i.d. data, where there is essentially only one way of subsampling the data: selecting the data points uniformly and without replacement. Despite the strong empirical performance of embeddings produced in this manner, they are not well understood theoretically, particularly with regard to the role of the sampling scheme. Here, we develop a unifying framework which encapsulates representation learning methods for networks that are trained by performing gradient updates on subsamples of the network, including random-walk based approaches such as node2vec.
520 $aIn particular, we prove, under the assumption that the network has an exchangeable law, that the distribution of the learned embedding vectors asymptotically decouples. We characterize the asymptotic distribution of the learned embedding vectors, and give the corresponding rates of convergence, which depend on factors such as the sampling scheme, the choice of loss function, and the choice of embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks; in particular, we apply our results to argue that the embedding vectors produced by node2vec can be used to perform weakly consistent community detection.
653 0 $aStatistics
653 0 $aMachine learning--Statistical methods
653 0 $aMachine learning--Graphic methods
653 0 $aComputer networks
856 40 $uhttps://doi.org/10.7916/cv4k-2t44$zClick for full text
852 8 $blweb$hDISSERTATIONS
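
The first 520 field above describes embeddings learned by stochastic gradient updates on freely chosen subsamples of the network, such as the random walks used by node2vec. The following Python sketch illustrates that setup under stated assumptions; it is not code from the thesis, and the toy graph, walk length, window width, embedding dimension and learning rate are all illustrative choices.

# A minimal sketch (not the thesis's implementation): skip-gram-style
# gradient updates on random-walk subsamples of a graph, in the spirit
# of node2vec. All hyperparameters below are illustrative assumptions.
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
G = nx.karate_club_graph()                  # small example network
n, dim = G.number_of_nodes(), 16
emb = rng.normal(scale=0.1, size=(n, dim))  # "center" embedding vectors
ctx = rng.normal(scale=0.1, size=(n, dim))  # "context" vectors

def random_walk(G, start, length, rng):
    # Uniform random walk; node2vec instead biases the step distribution.
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(list(G.neighbors(walk[-1]))))
    return walk

lr = 0.05
for step in range(2000):
    # Subsampling scheme: one random walk plus uniform negative samples.
    walk = random_walk(G, int(rng.integers(n)), 10, rng)
    for i, u in enumerate(walk):
        for v in walk[max(0, i - 2):i]:     # context window of width 2
            # One positive (co-occurring) pair, one negative sample.
            for w, label in [(v, 1.0), (int(rng.integers(n)), 0.0)]:
                score = 1.0 / (1.0 + np.exp(-emb[u] @ ctx[w]))
                g = lr * (score - label)    # logistic-loss gradient scale
                emb[u], ctx[w] = emb[u] - g * ctx[w], ctx[w] - g * emb[u]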
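
The second 520 field argues that embedding vectors produced by node2vec can be used for weakly consistent community detection. As a hedged illustration of that downstream use, one can cluster the embedding matrix from the sketch above; k-means (via scikit-learn) and the choice of two communities are assumptions for this toy graph, not methods taken from the thesis.

# A minimal sketch of the downstream task: recover communities by
# clustering the learned embedding vectors. `emb` is the matrix from
# the previous sketch; n_clusters=2 is an assumed community count.
from sklearn.cluster import KMeans

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)   # one community label per node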