Biology for computer scientists

I've recently signed up for a course in Bioinformatics, a field that develops methods and software tools for understanding biological data. Since I have a strictly computational background, there were several terms and definitions from biology I needed to look up. I therefore crawled Wikipedia (the simple version) to gather some relevant definitions and would like to share them with you. Some of the information also originates from Chad Myers' lecture materials in CSCI 5461 - Functional Genomics, Systems Biology, and Bioinformatics.


According to Wikipedia, a cell is the smallest unit of life that can replicate independently. There are two types of cells: eukaryotic cells (found in both single-celled and multicellular organisms), which contain a nucleus, and prokaryotic cells (normally single-celled organisms), which do not. A cell is made up of several kinds of biomolecules, including DNA, RNA, proteins, and lipids. These molecules drive life through many different biological processes.


Wikipedia says that proteins are long-chain molecules built from amino acids. They're essential to all cells and take part in several processes, such as metabolism, cell signaling, immune responses, and cell division. They also serve mechanical and structural functions, for example in muscle cells. Four things determine what a protein will do: the order of its amino acids, the twists in the chain, how it's folded up, and whether it's made up of different sub-units.

Amino acid

Amino acids are the building blocks of proteins. The standard genetic code specifies 20 amino acids that proteins can be synthesized from. Chemically, each amino acid contains both an amine and a carboxyl functional group, along with a side chain that distinguishes it from the others.


DNA is short for deoxyribonucleic acid (one of the two nucleic acids; the other is RNA). It contains the genetic code for organisms. DNA resides in nearly every cell of an organism and directs the production of needed proteins. DNA is split into a "coding" and a "non-coding" part, where the latter contains sequences that do not code for proteins. Roughly 98% of the human genome is "non-coding DNA".

DNA is made of four types of nucleotides, distinguished by their bases: adenine (A), thymine (T), cytosine (C), and guanine (G). A and T are joined by two hydrogen bonds, and C and G by three, creating base pairs. These base pairs are connected in a long sequence that forms a double helix. The cell can read the sequence of bases in the DNA and transcribe it into RNA, which is then translated to synthesize proteins.
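For readers who think in code, the pairing rules above can be treated as a simple lookup table. The following is a minimal Python sketch, not a standard library: the function names (`complement`, `reverse_complement`, `hydrogen_bond_count`) are made up for illustration, and the strand is assumed to contain only the letters A, T, C, and G.

```python
# Each base pairs with exactly one partner on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds holding each base to its partner (A-T: 2, C-G: 3).
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(strand: str) -> str:
    """Return the bases that pair with `strand`, position by position."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand written in its own reading direction.

    The two strands of the double helix run in opposite directions,
    so the partner strand is conventionally written reversed.
    """
    return complement(strand)[::-1]


def hydrogen_bond_count(strand: str) -> int:
    """Count the hydrogen bonds holding `strand` to its partner strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    strand = "ATGC"
    print(complement(strand))           # TACG
    print(reverse_complement(strand))   # GCAT
    print(hydrogen_bond_count(strand))  # 10
```

Because C-G pairs have one more hydrogen bond than A-T pairs, a strand with more C and G bases gets a higher bond count in this toy model.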


The genome is the entire collection of genetic information encoded in an organism's DNA, and includes both genes and non-coding sequences.


A gene is the basic unit of genetic information inherited from one generation to the next. It's a stretch of DNA that contains a single set of instructions. These instructions normally code for a particular protein.
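To make "instructions that code for a protein" a bit more concrete, here is a rough Python sketch of the two steps mentioned earlier: transcription (DNA to RNA) and translation (RNA to a chain of amino acids). A few assumptions that are not from the definitions above: RNA uses uracil (U) where DNA has thymine (T), the RNA is read three bases at a time (a "codon"), and the codon table below is only a small excerpt of the standard genetic code. The function names are hypothetical, and real cells look for a start codon first, which this sketch skips by reading from the first base.

```python
STOP = "Stop"

# A small excerpt of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": STOP, "UAG": STOP, "UGA": STOP,
}


def transcribe(coding_strand: str) -> str:
    """Return the messenger RNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read `mrna` codon by codon and return the amino acids in order.

    Reading stops at the first stop codon. Codons missing from the
    excerpt above raise a KeyError, since this table is incomplete.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == STOP:
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    gene = "ATGTTTGGCAAATGGTAA"  # made-up sequence, for illustration only
    mrna = transcribe(gene)
    print(mrna)             # AUGUUUGGCAAAUGGUAA
    print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

The three-letter names (Met, Phe, and so on) are the usual abbreviations for individual amino acids, so the output list is a tiny protein chain in the sense described in the protein section.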


The genotype is the genetic constitution of an organism, mainly its genome.


Genotype + Environment --> Phenotype

The phenotype is the composite of an organism's observable characteristics or traits, such as morphology or development. Phenotypes are determined mainly by genes, but are also influenced by environmental factors. In other words, your phenotypes (observable traits) result from the interactions between your genes and the environment.

Thanks to Liz Pham for proofreading and suggesting improvements for this blog post! Cover photo from xkcd.