Scenario: You have a list of 50 samples per group that meet criteria for your RNA-seq study, but you only want to sequence 20 per group. How do you randomly pick 20 per group? Also, how do you randomly order them for sequencing?
Random selection with Excel:
- In your spreadsheet with all eligible samples, have a column for each sample's group (e.g. IVF, NIFT, Spontaneous; Female, Male; Overweight, Normal, Underweight; et cetera). It needs to be a categorical variable, not discrete or continuous numbers.
- Add a new column and label it "RandomNum"
- Fill the "RandomNum" column with =RAND() and press enter to generate random numbers
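- Copy the "RandomNum" column and paste it back as values (Paste Special > Values) to remove the formula. Otherwise RAND() recalculates every time the worksheet changes, and your random numbers will not stay fixed.
- Sort by group, then by random number within each group (in one multi-level sort, or by sorting on "RandomNum" first and then on group), and take the first 20 rows of each group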
- Make a new column "Include" and fill in yes/no
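If you would rather script this step than do it in Excel, the same idea can be sketched in Python. This is only a minimal sketch, not the procedure we used: the function name `select_per_group`, the sample IDs, the group names, and the seed are all made up for illustration, and it assumes your eligible samples are available as (sample ID, group) pairs.

```python
import random
from collections import defaultdict

def select_per_group(samples, n_per_group, seed=None):
    """Pick n_per_group sample IDs at random from each group.

    samples: list of (sample_id, group) tuples (hypothetical layout).
    Returns a set of the selected sample IDs.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    selected = set()
    for group, ids in by_group.items():
        if len(ids) < n_per_group:
            raise ValueError(f"group {group!r} has only {len(ids)} eligible samples")
        # sample() draws without replacement, like taking the first rows
        # of each group after sorting by RandomNum
        selected.update(rng.sample(ids, n_per_group))
    return selected

# Made-up example: 50 eligible samples in each of two groups
eligible = [(f"{g}_{i:02d}", g) for g in ("IVF", "Spontaneous") for i in range(1, 51)]
chosen = select_per_group(eligible, n_per_group=20, seed=2024)

# Equivalent of the "Include" column
include = {sample_id: ("yes" if sample_id in chosen else "no") for sample_id, _ in eligible}
```

Recording the seed plays the same role as pasting the random numbers as values: it lets you reproduce exactly the same selection later.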
Make the final selection:
- Check the demographics of your randomly selected groups. Swap Include=yes and Include=no samples if needed to fix any obvious imbalances or remove obvious outliers. For example, when comparing fetal sex, we also try to balance fetal race. Another example: if the maternal age range is 30-40 in one group, and similar in the other group except for one outlier aged 23, we may swap that outlier for a different subject.
- Don't balance demographics that might be related to your study variable (and keep in mind you don't always know which are related). For example, for our study of infertility looking at IVF and non-IVF treatment subgroups, the non-IVF treatment group had a higher BMI than both the IVF and the no treatment control group. We did not balance for BMI because anything metabolic can be related to fertility.
- Don't balance outcome demographics. For example, for our study of first trimester placenta gene expression, we tried to balance fetal sex and gestational age at first trimester sampling time, but we did not balance birth weight because that is a study outcome (appears after our first trimester sampling timepoint). In fact, males tend to be heavier than females at birth.
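To spot imbalances or outliers like the maternal age example, it can help to summarize each demographic by group for the Include=yes samples only. The following is a rough sketch under assumptions: the function name `summarize_by_group`, the column names, and all the values are invented for illustration, and deciding which variables to check (and which to leave alone) is still up to you.

```python
from statistics import median

def summarize_by_group(rows, variable):
    """Summarize one numeric variable per group, for Include=yes rows only.

    rows: list of dicts with 'group', 'include', and the variable as keys
    (hypothetical column names).
    """
    values = {}
    for row in rows:
        if row["include"] == "yes":
            values.setdefault(row["group"], []).append(row[variable])
    return {
        group: {"n": len(v), "min": min(v), "median": median(v), "max": max(v)}
        for group, v in values.items()
    }

# Invented values, only to show the shape of the data
rows = [
    {"id": "A1", "group": "IVF", "include": "yes", "maternal_age": 34},
    {"id": "A2", "group": "IVF", "include": "yes", "maternal_age": 38},
    {"id": "A3", "group": "IVF", "include": "no", "maternal_age": 31},
    {"id": "B1", "group": "Spontaneous", "include": "yes", "maternal_age": 23},
    {"id": "B2", "group": "Spontaneous", "include": "yes", "maternal_age": 36},
    {"id": "B3", "group": "Spontaneous", "include": "no", "maternal_age": 33},
]
summary = summarize_by_group(rows, "maternal_age")
for group, stats in summary.items():
    print(group, stats)
```

In this toy data the minimum of 23 in one group stands out against the other group's range, which is the kind of outlier you might swap for an Include=no sample.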
Random order with Excel:
- Sort the whole spreadsheet by the random number again (ignoring group, so the groups end up mixed together), then sort by the "Include" column so all the Include=yes samples are together
- Make a new column "Order" and number the Include=yes rows 1, 2, 3, ..., 40 from top to bottom (example assuming two groups of 20). This is the randomized order of the samples for sequencing.
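The ordering step can also be scripted. Here is a minimal Python sketch, assuming you already have the list of Include=yes sample IDs; the function name `assign_run_order`, the IDs, and the seed are hypothetical.

```python
import random

def assign_run_order(sample_ids, seed=None):
    """Shuffle the included samples and number them 1..N for sequencing."""
    rng = random.Random(seed)    # a fixed seed makes the order reproducible
    shuffled = list(sample_ids)  # copy so the input list is left untouched
    rng.shuffle(shuffled)        # groups get mixed together, not run in blocks
    return {sample_id: position for position, sample_id in enumerate(shuffled, start=1)}

# Made-up example: two groups of 20 included samples
included = [f"IVF_{i:02d}" for i in range(1, 21)] + [
    f"Spontaneous_{i:02d}" for i in range(1, 21)
]
order = assign_run_order(included, seed=7)

# List the samples in sequencing order
for sample_id in sorted(order, key=order.get):
    print(order[sample_id], sample_id)
```

Shuffling all included samples together, rather than one group at a time, keeps the groups interleaved in the run order, which matches sorting the whole spreadsheet by the random number.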