Finding Distinct Subgroups of Samples Using Microbiome Taxa Count Data
In this first BioRankings Technical Report, we show that cluster analysis is highly subjective, with results changing for different inputs; explain why arguments against the Dirichlet-multinomial distribution for microbiome data are wrong; describe how finite mixture models operate; and finally present an example of this analysis using HMP stool samples. Download PDF for more information.
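As a rough illustration of the distribution named above, the following minimal sketch draws one sample of taxa counts from a Dirichlet-multinomial. It is not taken from the report: the function name `sample_dirichlet_multinomial`, the `alpha` values, and the read depth are all hypothetical and are not fitted to HMP data.

```python
import random

def sample_dirichlet_multinomial(alpha, n_reads, rng):
    """Draw one vector of taxa counts from a Dirichlet-multinomial.

    alpha   : positive concentration parameters, one per taxon (illustrative)
    n_reads : total number of reads in the sample
    rng     : a random.Random instance, so results are reproducible
    """
    # Step 1: draw taxa proportions from a Dirichlet by normalizing
    # independent Gamma(alpha_k, 1) variates.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]

    # Step 2: draw the counts from a multinomial with those proportions
    # by assigning each read to a taxon.
    counts = [0] * len(alpha)
    for taxon in rng.choices(range(len(alpha)), weights=probs, k=n_reads):
        counts[taxon] += 1
    return counts

rng = random.Random(42)
alpha = [5.0, 2.0, 1.0, 0.5]  # hypothetical values, not estimated from data
counts = sample_dirichlet_multinomial(alpha, n_reads=1000, rng=rng)
print(counts)
```

Because the proportions are redrawn for every sample, repeated calls vary more from sample to sample than a plain multinomial with fixed proportions would.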
Dealing with High-Dimensional Data
In this Technical Report we focus on wide and short data. When deciding how to analyze big data, there are two ways to think about its shape: are the data organized in a few columns and many rows (tall and narrow), or in many columns and few rows (short and wide)? Biomedical research using high-throughput technology (i.e., -omics) typically produces short and wide data: many columns (wide) corresponding to the biological measurements, and few rows (short) corresponding to patients or samples. While this may not be big data in terms of storage, it poses serious problems because there is a very large number of ways to analyze the many biological measurements. Download PDF for more information.
In Technical Report 2 we showed through simulations that all pairwise distances become identical as the number of dimensions approaches infinity, a result that can also be proven mathematically. In this Supplement, we demonstrate the theory with real microbiome data. Download PDF for more information.
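The effect described above can be sketched with a small simulation. This is not the report's own simulation or data: it assumes uniformly random points and Euclidean distance, and the helper `relative_contrast` is a hypothetical name. It measures how far apart the largest and smallest pairwise distances are, relative to the smallest, as the number of dimensions grows.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """Return (max - min) / min over all pairwise Euclidean distances
    between n_points uniformly random points in n_dims dimensions."""
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)
# Few rows (20 samples), increasingly many columns (dimensions).
contrasts = {d: relative_contrast(20, d, rng) for d in (2, 20, 200, 2000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

Under these assumptions the relative contrast shrinks toward zero as dimensions are added, meaning the nearest and farthest pairs of samples become hard to tell apart.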