
Re: PSSD Genome Project [Megathread]

Posted: Wed Oct 02, 2019 10:15 pm
by Dubya_B
sovietxrobot wrote: I am not sure what is meant by removing common variants; common variants can contribute to disease risk. This isn't a step I have ever taken, but it could be that the genotyping platform already ignores these variants.

The typical process for genomics research is to get a sample set genotyped on some platform (usually ~300k SNPs). These data are reduced to only well-sampled variants, and then imputed against a reference genome. Essentially, you use reference data and properties of genetics to interpolate SNPs you haven't directly observed. After QC of the imputed data, you are left with something in the ballpark of millions of SNPs.

Therein lies the problem: the sample size for this project is going to be tiny. A genome-wide approach looking to discover new risk SNPs is not realistic. Analyzing a smaller set of SNPs that you already have a hypothesis on is much more tractable.

Ah. Was trying to describe SNPs with a "single variant," but that would make them "not polymorphic" and "not variable". "Monomorphic" was the word I was looking for. Still a bit overwhelmed by how complicated this is.

Discussion here: https://www.biostars.org/p/80014/
and on using the --maf option in PLINK to exclude monomorphic "SNPs" here: https://stackoverflow.com/questions/335 ... f-the-data

@Ghost, There doesn't appear to be a list of monomorphic loci applicable to 23andMe data.
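
In the absence of such a list, monomorphic sites can be flagged from the pooled genotypes themselves, in the same spirit as the PLINK --maf filter linked above. A minimal Python sketch, assuming genotype calls arrive as two-letter strings such as "AG" with "--" for no-calls (my understanding of the 23andMe raw format, not something confirmed in this thread). The rsIDs and the function name are made up for illustration:

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one SNP, or None if nothing was called.

    Assumption: each genotype is a two-letter string over A/C/G/T.
    Anything else (no-calls, indel codes, single-letter calls) is skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2 or any(allele not in "ACGT" for allele in genotype):
            continue
        counts.update(genotype)  # adds one count per allele letter
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) < 2:
        return 0.0  # only one allele observed: monomorphic in this sample
    # Frequency of the second most common allele
    return sorted(counts.values(), reverse=True)[1] / total

# Hypothetical cohort: rsID -> genotype calls across participants
cohort = {
    "rs0000001": ["AA", "AA", "AA", "--"],
    "rs0000002": ["AG", "GG", "AA", "AG"],
}

mafs = {rsid: minor_allele_frequency(calls) for rsid, calls in cohort.items()}
polymorphic = [rsid for rsid, maf in mafs.items() if maf is not None and maf > 0]
print(polymorphic)
```

With a small cohort, a SNP that looks monomorphic here may still vary in the wider population, so a MAF of 0 only says the variant was not seen in this sample.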

@sovietxrobot, You appear to have a good grasp on this subject. What would you consider a sufficient number of cases?
I think we have around 60 if we aggregate PFS, PSSD, and PAS patients.

Re: PSSD Genome Project [Megathread]

Posted: Thu Oct 03, 2019 11:01 am
by sovietxrobot
Dubya_B wrote: Ah. Was trying to describe SNPs with a "single variant," but that would make them "not polymorphic" and "not variable". "Monomorphic" was the word I was looking for. Still a bit overwhelmed by how complicated this is.

Monomorphic SNPs aren't included on most chips (at least as far as I know) because there is nothing to compare. But yes, they are trivial to remove: the minor allele frequency will be 0.

Dubya_B wrote: @sovietxrobot, You appear to have a good grasp on this subject. What would you consider a sufficient number of cases?
I think we have around 60 if we aggregate PFS, PSSD, and PAS patients.

Thanks, I work in genetics research. The necessary sample size depends on what you are trying to do. Assuming that there is some commonality between these 3 conditions (PFS, PSSD, PAS), you would be sufficiently powered to examine a small number of pre-determined SNPs of interest. The more subjects you have, the smaller the effects you can detect. Discovery of new SNPs on a genome-wide level requires sample sizes of at least hundreds, but realistically most modern studies have sample sizes in the five- or six-figure range.
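
To put rough numbers on that, here is a back-of-the-envelope power sketch in Python. It uses a normal approximation to a two-sided allelic (two-proportion) test and treats each person as contributing two independent alleles. Every number in the example (control count, allele frequencies, number of candidate SNPs) is a hypothetical placeholder, not something from this project. The 5e-8 threshold is the conventional genome-wide significance level.

```python
from statistics import NormalDist

def allelic_test_power(n_cases, n_controls, p_control, p_case, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    norm = NormalDist()
    # Each person contributes two alleles (assumes alleles are independent)
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    se = (p_case * (1 - p_case) / alleles_cases
          + p_control * (1 - p_control) / alleles_controls) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    z_effect = abs(p_case - p_control) / se
    # Probability that the test statistic lands beyond either critical value
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

# Hypothetical scenario: 60 cases, 600 controls, allele frequency 0.20 vs 0.30
n_candidate_snps = 10  # assumed size of a pre-determined SNP list
candidate_power = allelic_test_power(60, 600, 0.20, 0.30, alpha=0.05 / n_candidate_snps)
genome_wide_power = allelic_test_power(60, 600, 0.20, 0.30, alpha=5e-8)

print(f"candidate-SNP power: {candidate_power:.3f}")
print(f"genome-wide power:   {genome_wide_power:.5f}")
```

The same effect that is at least plausible to detect with a Bonferroni-corrected candidate list is essentially invisible at the genome-wide threshold, which is the point about tractability made above.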

Re: PSSD Genome Project [Megathread]

Posted: Thu Oct 03, 2019 10:46 pm
by Ghost
Thanks for your help on this, guys. Let's find a time to talk at some point and figure out how we want to move forward.

We have a fair number of genomes, but MORE would always be better. It's not too late for people to join in on this. As evidenced by how long I've been working on this, it's not something I'm going to drop soon.

Re: PSSD Genome Project [Megathread]

Posted: Tue Oct 22, 2019 11:11 am
by sovietxrobot
Paper on the genetic effects of citalopram: Kanherkar et al. 2018. The effect of citalopram on genome-wide DNA methylation of human cells. International Journal of Genomics.

In short: Human cell culture exposed to citalopram for a month. OXT gene downregulated, inhibition of dopa pathway. Coincides with findings of negative effects of SSRIs on dopaminergic signaling.

Re: PSSD Genome Project [Megathread]

Posted: Fri Feb 05, 2021 6:37 pm
by Ghost
I have some exciting news to share on a project I've been working on for a while now.

I have compiled a .csv dataset for the 30 PSSD genomes I've collected over the past 2 years (huge thanks to all who sent them in). This means that anyone can now look up the allele frequency of around 28k SNPs on the 23andMe v4 chip. All the data is de-identified, and it lets people run the one-off searches they might be curious about.

https://pssdlab.wordpress.com/raw-data- ... e-project/

I have some more plans for this data in the future, and I'll update you all as I move forward with them.

Best,

G

Re: PSSD Genome Project [Megathread]

Posted: Sat Feb 06, 2021 1:40 pm
by sovietxrobot
@ghost and @dubya, let's talk more about this. Also encourage others to submit genomes.

Re: PSSD Genome Project [Megathread]

Posted: Sat Feb 06, 2021 3:48 pm
by Ghost
The more genomes the better!

If we had, for example, a 5% error rate due to noise from not having enough genomes, that would work out to around 1,400 inaccurate data points out of the 28k from the 23andMe v4 chip. I'm hoping that publishing this data again in 2021 stirs up a few more people to buy the 23andMe kit and submit their data. I haven't given up on this project and all contributions will continue to be useful into the future.
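
For anyone who wants to see why more genomes help, here is a rough Python sketch. The 5% figure is only the illustrative number from above. The standard-error formula is the usual binomial one and assumes the two alleles in each genome are independent draws; the cohort sizes other than 30 are hypothetical.

```python
from math import sqrt

def allele_freq_standard_error(p, n_genomes):
    """Binomial standard error of an allele frequency estimated from n diploid genomes."""
    return sqrt(p * (1 - p) / (2 * n_genomes))

# The arithmetic from the post: 5% of 28,000 data points (integer math)
noisy_points = 28_000 * 5 // 100
print(noisy_points)  # 1400

# How the sampling noise on one SNP shrinks as the cohort grows
for n in (30, 60, 300):
    se = allele_freq_standard_error(0.5, n)
    print(f"{n} genomes: standard error about {se:.3f}")
```

The noise falls with the square root of the number of genomes, so quadrupling the cohort roughly halves the uncertainty on each frequency.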

Re: PSSD Genome Project [Megathread]

Posted: Sat Feb 06, 2021 5:01 pm
by DrugsAreBad
@Ghost

Are you sure you posted the raw data? I only see a spreadsheet containing frequencies and a link to SNPedia.

Re: PSSD Genome Project [Megathread]

Posted: Sun Feb 07, 2021 9:50 am
by PsychoGenesis
why is rs1042778 omitted??

Re: PSSD Genome Project [Megathread]

Posted: Sun Feb 07, 2021 4:59 pm
by Ghost
DrugsAreBad wrote: Sat Feb 06, 2021 5:01 pm @Ghost

Are you sure you posted the raw data? I only see a Spreadsheet containing frequencies and a link to snpedia.
I should clarify, I posted the raw data the program generated. Some time in the near future I'm going to run it against the frequencies posted on the NCBI website.
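
One way that comparison could be run, once reference frequencies are in hand, is an exact binomial test per SNP. This is only a sketch of one possible approach, not the program I used: the function name and the example counts are hypothetical, it treats the 60 alleles from 30 genomes as independent draws, and any real run over ~28k SNPs would need a multiple-testing correction.

```python
from math import comb

def binomial_two_sided_p(k, n, p0):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed count k, given n alleles and reference frequency p0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so ties are not lost to floating-point rounding
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical SNP: 24 copies of the allele among 60 alleles (30 genomes),
# compared with an assumed reference frequency of 0.25
p_value = binomial_two_sided_p(24, 60, 0.25)
print(f"p = {p_value:.4f}")
```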

RE rs1042778: I don't see it in the list of SNPs that SNPedia lists for the v4 chip, so I didn't grab the frequencies from the data. I can go back and grab it later if you'd like.