This R code example shows how to automatically compile multiple csv spreadsheets (or csv-formatted simple Excel spreadsheets) into one multi-sheet Excel spreadsheet. Personally, I use this to combine datasets from Ingenuity Pathways Analysis for supplemental files. I can do it manually but I always worry that I'll accidentally open the wrong analysis file or mis-name the tab.
This way, I have a record of what files became what named sheets. I can also quickly re-compile a new Excel results file if I rerun the analysis by just changing the filenames in this code.
Packages
#install.packages("gdata")
#install.packages("openxlsx")
library(gdata) ## for read.xls() function; requires perl
library(openxlsx)
Data input with R package gdata
## function to load data and edit columns -----------------
get_data = function(myfile, mysheet=1) {
ipa = gdata::read.xls(
xls = myfile,
header = TRUE,
stringsAsFactors = FALSE,
fileEncoding = "UTF-8",
skip = 0,
method = "csv",
na.strings = c(
"NA"
,
""
,
"-"
,
" "
),
sheet = mysheet)
print(dim(ipa))
## [1] 612 5
## edit column names
colnames(ipa) = c("Pathway", "-Log10P", "Ratio", "Zscore", "Molecules")
## make Pathway an ordered factor so ggplot doesn't reorder it
## don't really need it here but it's helpful if you are making plots too
ipa$Pathway = factor(ipa$Pathway, levels=ipa$Pathway)
## data preview - make sure you're using the right data!
cat("\n\nPreview:\n\n")
print(ipa[1:9,1:4])
return(ipa)
}
## load data -----------------
top15cvs = get_data(
myfile = "2021-11-18run/protein_top15-expr_First/canonical-FirstTop15p.xls")
## Preview:
##
## Pathway -Log10P Ratio Zscore
## 1 EIF2 Signaling 73.1 0.558 NA
## 2 Mitochondrial Dysfunction 67.3 0.614 NA
## 3 Oxidative Phosphorylation 62.6 0.739 NA
## 4 Sirtuin Signaling Pathway 38.5 0.360 NA
## 5 Regulation of eIF4 and p70S6K Signaling 33.7 0.425 NA
## 6 Huntington's Disease Signaling 31.6 0.335 NA
## 7 mTOR Signaling 29.5 0.368 NA
## 8 Protein Ubiquitination Pathway 28.0 0.320 NA
## 9 Estrogen Receptor Signaling 26.0 0.264 NA
top15pl = get_data(
myfile = "2021-11-18run/protein_top15-expr_Third/canonical-ThirdTop15.xls")
## [1] 579 5
##
##
## Preview:
##
## Pathway -Log10P Ratio Zscore
## 1 EIF2 Signaling 72.6 0.554 NA
## 2 Mitochondrial Dysfunction 46.9 0.509 NA
## 3 Oxidative Phosphorylation 41.5 0.595 NA
## 4 mTOR Signaling 32.4 0.382 NA
## 5 Sirtuin Signaling Pathway 29.8 0.318 NA
## 6 Regulation of eIF4 and p70S6K Signaling 28.8 0.391 NA
## 7 Huntington's Disease Signaling 20.9 0.278 NA
## 8 Phagosome Maturation 19.8 0.346 NA
## 9 Estrogen Receptor Signaling 19.5 0.234 NA
Alternative: Data input with R package openxlsx
get_data = function(myfile, mysheet=1) {
mytable = openxlsx::read.xlsx(
xlsxFile = myfile,
colNames = TRUE,
rowNames = FALSE,
startRow = 1,
na.strings = c("NA", "", "-", " "),
sheet = mysheet)
return(mytable)
}
Create multi-sheet Excel output with R package openxlsx
## sheet names and dataframes to use for the final file -------------
dataset_names = list(
'A-top15-First' = top15cvs,
'B-top15-Third' = top15pl,
'C-DEGs-First-Upreg' = D1up,
'D-DEGs-Third-Upreg' = D3up,
'E-SEGs' = SEGs
)
## make Excel filename with date stamp --------------
fn = paste0("SI_File_IPA_Enriched_Canonical_Pathways_", Sys.Date(), ".xlsx"); fn
## [1] "SI_File_IPA_Enriched_Canonical_Pathways_2022-01-21.xlsx"
## make Excel file -------------------------------
openxlsx::write.xlsx(
dataset_names, file = fn
)
Session
sessionInfo()
No comments:
Post a Comment