Ah. Was trying to describe SNPs with a "single variant," but that would make them "not polymorphic" and "not variable". "Monomorphic" was the word I was looking for. Still a bit overwhelmed by how complicated this is.
sovietxrobot wrote:
I am not sure what is meant by removing common variants; common variants can contribute to disease risk. This isn't a step I have ever taken, but it could be that the genotyping platform already ignores these variants.
The typical process for genomics research is to get a sample set genotyped on some platform (usually ~300k SNPs). These data are reduced to only well-sampled variants, and then imputed against a reference genome. Essentially, you use reference data and properties of genetics to interpolate SNPs you haven't directly observed. After QC of the imputed data, you are left with something in the ballpark of millions of SNPs.
Therein lies the problem: the sample size for this project is going to be tiny. A genome-wide approach looking to discover new risk SNPs is not realistic. Analyzing a smaller set of SNPs that you already have a hypothesis about is much more tractable.
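To put rough numbers on why the smaller set is more tractable, here is a minimal sketch of the Bonferroni correction, which divides the overall significance level by the number of SNPs tested. The test counts are assumptions for illustration only: one million stands in for "millions of SNPs" after imputation, and 20 is a hypothetical candidate-set size, not a figure from this thread.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
GENOME_WIDE_TESTS = 1_000_000  # assumption: round number for an imputed SNP set
CANDIDATE_TESTS = 20           # hypothetical size of a hypothesis-driven SNP list

genome_wide = bonferroni_threshold(ALPHA, GENOME_WIDE_TESTS)
candidate = bonferroni_threshold(ALPHA, CANDIDATE_TESTS)

print(f"genome-wide per-SNP threshold:   {genome_wide:.1e}")
print(f"candidate-set per-SNP threshold: {candidate:.1e}")
```

The candidate-set cutoff is several orders of magnitude less strict, which is the main reason a small sample has any chance of detecting an effect there.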
Discussion here: https://www.biostars.org/p/80014/
and on using the --maf option in PLINK to exclude monomorphic "SNPs" here: https://stackoverflow.com/questions/335 ... f-the-data
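For anyone who wants to see what a minor-allele-frequency cutoff does without installing PLINK, here is a rough Python sketch of the idea. It is not PLINK's implementation. It assumes each SNP comes as a list of two-letter genotype calls (like the genotype column in a 23andMe raw file), that "--" marks a no-call, and that SNPs are bi-allelic. The rsIDs and calls below are made up for illustration.

```python
from collections import Counter


def minor_allele_freq(genotypes):
    """Return the minor allele frequency for one SNP, or None if nothing was called.

    genotypes: iterable of diploid calls such as "AG"; anything that is not
    two letters from ACGT (e.g. "--") is treated as a no-call and skipped.
    """
    counts = Counter()
    for call in genotypes:
        if len(call) != 2 or any(allele not in "ACGT" for allele in call):
            continue
        counts.update(call)  # counts each of the two alleles
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) < 2:
        return 0.0  # only one allele observed: monomorphic in this sample
    # Frequency of the second most common allele (assumes bi-allelic SNPs).
    return counts.most_common(2)[1][1] / total


def filter_snps(calls_by_snp, min_maf=0.01):
    """Keep SNPs whose MAF is at least min_maf; monomorphic SNPs are dropped."""
    kept = {}
    for snp_id, genotypes in calls_by_snp.items():
        maf = minor_allele_freq(genotypes)
        if maf is not None and maf >= min_maf:
            kept[snp_id] = maf
    return kept


# Made-up example data: one monomorphic SNP and two polymorphic ones.
calls = {
    "rs_example_1": ["AA", "AA", "AA", "--"],
    "rs_example_2": ["AG", "AA", "GG", "AG"],
    "rs_example_3": ["CC", "CT", "CC", "CC"],
}
kept = filter_snps(calls)
print(kept)
```

Any positive threshold removes monomorphic SNPs, since their MAF is 0 by definition. The 0.01 default here is just an illustrative choice.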
@Ghost, there doesn't appear to be a list of monomorphic loci applicable to 23andMe data.
@sovietxrobot, you appear to have a good grasp of this subject. What would you consider a sufficient number of cases?
I think we have around 60 if we aggregate PFS, PSSD, and PAS patients.