What are MD5 checksums? Checksums are nonsense-looking text strings used to "summarize" a specific version of a file. No matter the size of the file (1 KB or 30 GB), the checksum algorithm gives you a conveniently short string of letters and numbers; an MD5 checksum is always 32 characters long. The exact same file will give you the exact same checksum every time. If you change a single character or pixel, you will get a completely different checksum.
MD5 is one specific, popular algorithm for computing checksums.
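The properties described above are easy to check for yourself. Below is a minimal sketch using Python's built-in `hashlib` module; the helper name `md5_hex` and the dummy data are made up for illustration. It shows that a ~1 KB input and a ~10 MB input both give 32-character checksums, that the same input always gives the same checksum, and that changing one character changes the result.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

small = b"A" * 1024                # about 1 KB of dummy data
large = b"A" * (10 * 1024 * 1024)  # about 10 MB of dummy data

# Both checksums have the same short length, regardless of input size.
print(len(md5_hex(small)), len(md5_hex(large)))  # prints: 32 32

# The exact same input always gives the exact same checksum.
print(md5_hex(small) == md5_hex(b"A" * 1024))    # prints: True

# Changing a single character gives a completely different checksum.
edited = b"B" + small[1:]
print(md5_hex(small))
print(md5_hex(edited))
```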
Why use checksums? The purpose of checksums is to detect data corruption, especially when downloading files from, or uploading files to, a server. Every time you transfer files between computers, there is a risk of data corruption. For small files, the risk is small and you'll most likely notice a problem, for example if your email attachment download fails due to an internet interruption.
For large files such as raw sequencing data files, it's a bigger issue, and you might not notice right away (or ever) if the last few RNA-seq reads of a file with more than 30 million reads are missing. Therefore, the best practice when downloading new sequencing data is to compute MD5 checksums yourself and compare them with the MD5 checksums created on the originating computer (the sequencing core's server). They should be identical. If not, something went wrong during the file transfer! Try re-downloading the data.
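On the Linux terminal, `md5sum -c <checksum file>` performs this comparison. For those who prefer a script, here is a rough Python equivalent. It assumes the sequencing core provides a text file in the same layout that `md5sum` prints (one `<checksum>  <filename>` line per file) and that the listed files sit in the same folder as that list; the function names are hypothetical. Files are read 1 MB at a time, so even a 30 GB FASTQ file never has to fit in memory.

```python
import hashlib
import sys
from pathlib import Path

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 checksum, reading 1 MB at a time to keep memory use low."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def check_downloads(checksum_file):
    """Compare local checksums against an md5sum-style list from the sequencing core.

    Assumes each line looks like '<32 hex characters>  <filename>', with filenames
    relative to the folder containing the checksum list.
    Returns a dict mapping filename -> True (match) or False (mismatch or missing).
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary-mode entries with '*'
        local = checksum_file.parent / name
        if not local.exists():
            results[name] = False  # the file never arrived
            continue
        results[name] = md5_of_file(local) == expected.lower()
    return results

if __name__ == "__main__" and len(sys.argv) > 1:
    for name, ok in check_downloads(sys.argv[1]).items():
        print(f"{name}: {'OK' if ok else 'FAILED, try re-downloading'}")
```

Any file reported as `FAILED` is a candidate for re-downloading.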
Similarly, when you upload sequencing data to a public repository (e.g., NCBI GEO), you provide MD5 checksums so that the recipients (NCBI's data curators) can confirm the upload was successful.
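Before an upload, you can generate the checksum list in one go. The sketch below writes one `<checksum>  <filename>` line per file, the same layout `md5sum` prints. The file pattern `*.fastq.gz` and the output name `md5sums.txt` are hypothetical placeholders; check the repository's submission instructions for the exact format it expects.

```python
import hashlib
import sys
from pathlib import Path

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 checksum in 1 MB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def write_checksum_list(folder, pattern="*.fastq.gz", out_name="md5sums.txt"):
    """Write one '<checksum>  <filename>' line per matching file in `folder`.

    The default pattern and output name are hypothetical placeholders.
    Returns the path of the list that was written.
    """
    folder = Path(folder)
    lines = [f"{md5_of_file(p)}  {p.name}" for p in sorted(folder.glob(pattern))]
    out_path = folder / out_name
    out_path.write_text("".join(line + "\n" for line in lines))
    return out_path

if __name__ == "__main__" and len(sys.argv) > 1:
    print(write_checksum_list(sys.argv[1]).read_text(), end="")
```

Sorting the filenames keeps the list in a stable order, which makes it easier to compare against the repository's records later.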
How do you get an MD5 checksum for an individual file? See the example below using the Linux terminal. I created a text file containing only the phrase "hello pretend this is sequencing data". The checksum for that file is "b088d8d4d1d831af2d8d16147389aa7d". If I change the first letter to uppercase, the checksum changes completely.
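The same uppercase experiment can also be sketched in Python. The file name `pretend_data.txt` is hypothetical, and the printed checksums may not match the value quoted above, because the exact bytes of the original file (for example, whether it ends with a newline) are not known; a trailing newline alone changes the checksum. What the sketch does show reliably is that changing one letter changes the whole checksum.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of_file(path):
    """Return the MD5 checksum of a small file (read fully into memory)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "pretend_data.txt"  # hypothetical file name

    # Original file: all lowercase, ending with a newline (an assumption).
    demo.write_bytes(b"hello pretend this is sequencing data\n")
    original = md5_of_file(demo)

    # Same file with the first letter changed to uppercase.
    demo.write_bytes(b"Hello pretend this is sequencing data\n")
    edited = md5_of_file(demo)

print(original)
print(edited)
print(original == edited)  # prints: False
```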