March 21, 2017

Python: Starting Python for the First Time (Windows & Linux)

Python is a programming language that is popular as a first programming language for biologists. This very basic "getting started" tutorial is for Windows users, plus some suggestions for Linux users.

Install Python on Windows

Unlike most Linux distributions, Windows does not come with Python pre-installed, so some version of Python must be installed in order to run a Python script. There are various versions. Biologists, make your life easy and download a distribution recommended by SciPy.org that comes with common add-ons like the pandas package (useful for editing spreadsheets), the numpy package (useful for working with big tables of data), and various statistics and plotting packages. See my recommended downloads below:

  1. Download and install Python:
    1. If hard drive space isn't an issue (you'll need 5-6 GB), download the Anaconda distribution of Python, one of the SciPy recommended distributions with support for Windows, Linux, and Mac. It comes with a bunch of common packages used in the life sciences (pandas, numpy, matplotlib). If Python is your first programming language, installing Anaconda may make your learning process easier. Also recommended for expert programmers. Anaconda is the most common Python distribution for data science and general life sciences uses.
    2. If you don't have enough hard drive space, download from Python.org directly and download other packages as needed. Beware, this requires more steps and is less beginner friendly. You will need to Google a lot of errors.
  2. Download and install a text editor for programmers (lightweight software): 
    1. You can also use the default Notepad.exe application (because programs are just text files with special file extensions), but text editors built for programmers will make your life easier by color-coding the code and helping you notice when you forgot a closing parenthesis or added an extra apostrophe.
    2. If you are short on hard drive space, you can do all your coding on a text editor.
    3. Windows: Notepad++ is freeware that works for Python and many, many other programming languages.
    4. Linux: Geany is freeware that works similarly. 
  3. Download and install more complex IDE software (highly recommended for beginners): 
    1. Spyder comes installed with Anaconda and is my favorite IDE for Python. 
    2. Jupyter is popular and great for multi-script projects. Personally, I find it to be too much for short biology/bioinformatics projects. 
    3. RStudio now supports Python so you can use that if you already have it (if you program in R). I don't recommend it if you are learning Python as a new student, but it is useful if you plan to use both R and Python languages.  
  4. Install pip and use it to install Python packages
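    1. pip is already included with Anaconda and with Python 3.4 or later from Python.org. If it is missing, see the official pip documentation for installation instructions on Unix/Linux/macOS and Windows.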
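
Once everything is installed, a quick way to confirm that the packages mentioned above are available is a short script like the one below. This is only a sketch: the file name check_packages.py and the function names are my own placeholders, and the package list is just the three packages named in this tutorial.

```python
# check_packages.py (hypothetical file name)
import importlib.util
import sys

# Packages named in this tutorial; edit the list to match what you need.
PACKAGES = ["pandas", "numpy", "matplotlib"]


def is_installed(package_name):
    """Return True if Python can find the package (without importing it)."""
    return importlib.util.find_spec(package_name) is not None


def report():
    """Return a dictionary that maps each package name to True or False."""
    return {name: is_installed(name) for name in PACKAGES}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in report().items():
        if found:
            print(name, "is installed")
        else:
            print(name, "is MISSING, try: pip install", name)
```

Run it the same way as any other script (python check_packages.py). Anything reported as missing can usually be installed with pip, or with conda if you are using Anaconda.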


How to make a simple Python program and run it on the Windows terminal

These instructions use Notepad++ because that is how I learned Python. If you are using Spyder or RStudio instead, you can skip most of this and run the program directly on that software.
  1. Make a new folder at C:\Python 
  2. Open the program Notepad++
  3. Type the following and pay attention to parentheses, spaces, and quotation marks:
    print("Hello! Good morning!")
  4. Save into C:\Python with the following file name: hello.py
  5. Keep Notepad++ open. We will use it later.
  6. Open the "Run" Windows application by either searching for it or simply using [WindowsKey]+R. The Windows key looks like four squiggly boxes arranged as a square. It is usually two keys left of the spacebar.
  7. In the Run box, type cmd in the Open field, then click OK. This opens a black command window, which we will use to run our Python script.
  8. To navigate to the folder where our Python script is stored, type the following and press [Enter]:
    cd C:\Python
  9. That changed the directory which the command window is using to access and write files (cd = change directory). It is a very useful MS-DOS command.
  10. Type the following and press [Enter]:
    python hello.py
  11. That command runs our simple Python script. You should see a message that says:
    Hello! Good morning!
  12. Congratulations, you have run your first Python program!
  13. Go back to Notepad++ and change the message in between the quotation marks. Save.
  14. Go back to the cmd window and re-run your program by either re-typing python hello.py or using the up arrow key to search the cmd window memory for what you typed last. Press [Enter].
  15. Congratulations! You have now edited a program in Python!
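
If you want to keep experimenting, here is a slightly longer variation of hello.py. It is only an illustration: the function name make_greeting and the name "biologist" are made up for this example. It shows the next small step after print, which is building the message in one place so it is easy to change.

```python
# hello.py (a slightly longer variation; the names below are just examples)


def make_greeting(name):
    """Build the message as text so it can be changed in one place."""
    return "Hello, " + name + "! Good morning!"


if __name__ == "__main__":
    # Change the text in quotation marks, save, and re-run: python hello.py
    print(make_greeting("biologist"))
```

Saved over hello.py and run with python hello.py, this prints Hello, biologist! Good morning!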

Know the differences: Python 2 vs Python 3

Python went through a major update from version 2 to version 3. For Python beginners, it is VERY important to know this because it affects the syntax of code. If you are taking code from the internet to work on your project, you may need to edit it to work with Python 2 or Python 3, otherwise you will get errors.

The most obvious change is how to print an output.

Python 2.x
print "Hello World!"

Python 3.x
print("Hello World!")
 
The different print function syntax is usually the only thing I need to change when borrowing old code from the internet, but be aware of other differences between Python 2 vs Python 3.
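
As a rough illustration of those other differences, here is a small Python 3 sketch. The variable names are arbitrary, and the comments describe what Python 2 would do with the same lines.

```python
# Python 3 syntax throughout.

print("Hello World!")        # print is a function, so the parentheses are required

half = 1 / 2                 # 0.5 in Python 3 (Python 2 gives 0 for two whole numbers)
whole = 7 // 2               # 3: floor division behaves the same in both versions

counts = {"A": 3, "T": 5}
bases = list(counts.keys())  # keys() is a "view" in Python 3; list() makes a real list
print(half, whole, bases)
```

If old code has to keep running under Python 2 as well, adding the line from __future__ import print_function at the very top of the file makes the Python 3 print syntax work there too.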

Which Python version should you use? Python 3 for the life sciences. Definitely. If you are starting to learn for the first time, learn Python 3 and just remember to update the print function syntax if you incorporate old Python 2 code into your project. 

If you use Linux: check your Python versions by running these lines on the terminal:
python --version 
python3 --version 
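
The same check can be done from inside Python, which is handy when you are not sure which interpreter an IDE is using. A minimal sketch (the function names are my own):

```python
import sys


def major_version():
    """Return 2 or 3, depending on which Python is running this code."""
    return sys.version_info[0]


def describe_python():
    """Return a short description such as 'Python 3.x.y'."""
    return "Python {}.{}.{}".format(*sys.version_info[:3])


if __name__ == "__main__":
    print(describe_python())
    if major_version() < 3:
        print("This is Python 2; the examples in this tutorial expect Python 3.")
```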
 


How to make a Windows shortcut to run a script (easiest way) - optional!

Suppose you don't want to save your program to C:\Python and instead want to save your program to C:\Users\YourName\Dropbox\coding\python\test1\blah\blah\ or some other long path.

You can still run your program using the command window and the following commands:

cd C:\Users\YourName\Dropbox\coding\python\test1\blah\blah
python hello.py

But that's a lot to type. There is an easier way!

  1. Open Notepad (or Notepad++ or any simple text editor). Type this:

    cd C:\Users\YourName\Dropbox\coding\python\test1\blah\blah
    python hello.py
    pause


  2. Save that text file as run_hello.bat (or any name you want, just replace the .txt extension with .bat to create a batch file). It doesn't matter where you save this file. You can save it to the desktop.
  3. Double-click run_hello.bat and you will see a Windows command window open and run those three lines of code in your batch file. Windows will navigate to the directory you selected, run your Python program, and then leave the window open (with pause) so you can see any print output.
  4. Note: if you make the batch file in the same folder as the Python script, you can skip the change directory (cd) line entirely. You only need two lines in run_hello.bat:

    python hello.py
    pause
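
For comparison, the same idea can be written in Python itself instead of a batch file. This is only a sketch under a few assumptions: the name run_script is made up, the folder is the same placeholder path used above, and it needs Python 3.7 or newer.

```python
import os
import subprocess
import sys


def run_script(script_name, folder):
    """Run a script as if we had typed: cd folder, then python script_name."""
    completed = subprocess.run(
        [sys.executable, script_name],  # sys.executable is the Python running this file
        cwd=folder,                     # same effect as the cd line in the batch file
        capture_output=True,            # collect the printed output instead of showing it
        text=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Placeholder path copied from the batch-file example; replace it with your own.
    folder = r"C:\Users\YourName\Dropbox\coding\python\test1\blah\blah"
    if os.path.isdir(folder):
        print(run_script("hello.py", folder))
        input("Press Enter to close...")  # plays the role of pause
    else:
        print("Folder not found:", folder)
```

The batch file is still the simpler option for double-clicking; the Python version is mostly useful once you want one script to start another.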

Install Python on Windows Subsystem for Linux (Ubuntu Distribution) - optional

Some bioinformatics Python tools are not available for Windows (e.g. bioconda packages for GWAS analysis). However, starting with Windows 10, they are still accessible if you set up the Windows Subsystem for Linux (WSL). This is not the same as dual boot: WSL allows you to run Linux within Windows. I recommend the Ubuntu version of Linux.

To install Python on the WSL Ubuntu distribution, follow the Anaconda installation instructions for Debian. Ubuntu is based on Debian.

Install prerequisites by typing this into the Ubuntu terminal:

sudo apt-get update
sudo apt-get install libgl1-mesa-glx libegl1-mesa
sudo apt-get install libxrandr2 libxss1
sudo apt-get install libxcursor1 libxcomposite1 libasound2 
sudo apt-get install libxi6 libxtst6


Download the Linux Anaconda installer (probably the x86_64 version) and save it wherever you're going to save all your Ubuntu files. For me, that's Dropbox.

Navigate to where the file is saved. Note that in the WSL environment, /mnt/c is the C:\ drive. Example of how to navigate to a Dropbox folder:
cd /mnt/c/Users/YourName/Dropbox
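
The /mnt/c rule can be written down as a small Python function, which may help if you need to translate many paths. This is a sketch of the mapping described above, not an official tool: the function name is made up, and it only handles paths on lettered drives such as C: or D:.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(windows_path):
    """Convert a Windows path on a lettered drive to its /mnt/<letter>/... form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive                       # for example "C:"
    if len(drive) != 2 or not drive.endswith(":"):
        raise ValueError("Expected a path that starts with a drive letter such as C:")
    folders = list(path.parts[1:])           # everything after the drive
    return "/".join(["/mnt", drive[0].lower()] + folders)


# r"..." keeps the backslashes exactly as typed
print(windows_to_wsl_path(r"C:\Users\YourName\Dropbox"))
# prints /mnt/c/Users/YourName/Dropbox
```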

Check the sha256sum for your download by typing this (or the equivalent for a different version):
sha256sum Anaconda3-2020.02-Linux-x86_64.sh

Result:
2b9f088b2022edb474915d9f69a803d6449d5fdb4c303041f60ac4aefcc208bb  Anaconda3-2020.02-Linux-x86_64.sh
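
If you would rather do this check in Python (for example on a machine without sha256sum), the standard hashlib module computes the same value. A minimal sketch: the function names are my own, and the file name and expected hash are simply the ones shown above.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 of a file as hex text, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MB at a time so a large installer does not fill up memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's hash with the published one, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    installer = "Anaconda3-2020.02-Linux-x86_64.sh"
    expected = "2b9f088b2022edb474915d9f69a803d6449d5fdb4c303041f60ac4aefcc208bb"
    if os.path.exists(installer):
        print("Match" if matches_expected(installer, expected) else "MISMATCH: do not install")
    else:
        print("Installer not found in this folder:", installer)
```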


If the sha256 matches (it does for me), run the installation file using the Ubuntu terminal:
bash Anaconda3-2020.02-Linux-x86_64.sh
Press [ENTER] to page through the user agreement, then type yes to accept it. Accept the default installation location.

Activate Anaconda on the Ubuntu terminal: 
source ~/anaconda3/bin/activate #the default installation location is your home folder
conda info #information about anaconda
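
To double-check from inside Python that the Anaconda interpreter is the one being used, something like the following can help. It is a sketch that relies on my understanding that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated; the function name is made up.

```python
import os
import sys


def conda_environment_name():
    """Return the active conda environment name, or None if none is active.

    Assumption: conda sets CONDA_DEFAULT_ENV on activation.
    """
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Python executable:", sys.executable)
print("Active conda environment:", conda_environment_name())
```

After activation, the executable path would be expected to point inside the anaconda3 folder.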

Close and re-open the Ubuntu terminal.

If needed, set up the bioconda channels on the WSL by typing this into the Ubuntu terminal:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge


How to do basic stuff on the Python console

  • Work directory
    • import os
    • print(os.getcwd()) #get current work directory
    • os.chdir('/Documents/User/example') #change directory
    • os.listdir() #list files in the current directory
  • How to find out where Python is installed so you can set the Python interpreter for Spyder (if you're having issues installing packages for Spyder)
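    • Windows console: where python #lists every python.exe found on your PATH
    • Python console: import sys; print(sys.executable) #path of the interpreter that is currently running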
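
Here is a small script that puts the working-directory commands together. It is only a sketch: it works in a temporary scratch folder so it is safe to run anywhere, and the file name example.txt is arbitrary.

```python
import os
import sys
import tempfile

start_dir = os.getcwd()                 # remember the current work directory

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                   # change directory
    with open("example.txt", "w") as handle:
        handle.write("test\n")
    files_here = os.listdir()           # list files in the current directory
    print(files_here)                   # prints ['example.txt']
    os.chdir(start_dir)                 # go back before the scratch folder is deleted

print(sys.executable)                   # full path of the Python interpreter in use
```

The last line prints the interpreter path, which is the value an IDE such as Spyder asks for when you set the Python interpreter.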



-----------

Updated 2/18/2024
