2.1 Basics of Computer Systems

For a course about chemical information we start at the lowest level - how computers deal with information internally. Some of you may already have some understanding of this, but there are some important topics we need to touch on to set things up for other topics.  As a start, imagine you have a Word document that contains the following:

904-620-1138

Humans of course can very easily identify that this is a telephone number, but to a computer it looks like the following in its basic format – binary (more information on this later…)

00111001, 00110000, 00110100, 00101101,
00110110, 00110010, 00110000, 00101101,
00110001, 00110001, 00110011, 00111000

Whether it is on a regular hard disk drive (magnetic coating on a disk) or a flash/SSD drive (memory chips), all data on a computer is stored as 1s and 0s (see http://www.pcmag.com/article2/0,2817,2404258,00.asp). As a result, all data on a computer must be represented in binary notation. Each recorded 1 or 0 is a ‘bit’ and eight bits make a ‘byte’. A byte is the basic unit for representing data because eight bits, each being either 1 or 0, can together represent the numbers 0 through 255. This is because each bit has two possible states, so eight bits give 2^8 = 256 combinations, that is, the values 0 through 255.
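
As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `values_for_bits` is made up for this illustration; it simply shows that each extra bit doubles the number of values that can be represented.

```python
def values_for_bits(n_bits):
    """Return how many distinct values n_bits binary digits can represent."""
    return 2 ** n_bits


# Each additional bit doubles the number of possible combinations.
for n_bits in (1, 2, 4, 8):
    print(n_bits, "bit(s) ->", values_for_bits(n_bits), "values")

# The largest value one byte can hold is eight 1s in a row.
largest_byte = 0b11111111
print(largest_byte)  # 255

# Going the other way: write a decimal number as eight binary digits.
print(format(57, "08b"))  # 00111001
```

The last line produces the first byte of the telephone number example, which is the number 57 written in binary.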

In the early days of computer systems, a single byte was used to represent each text character, with the mapping defined by the American Standard Code for Information Interchange (ASCII; see http://www.ascii-code.com/). Initially, only the numbers 0-127 were used: 32-126 for letters, digits, punctuation marks and symbols, and 0-31 plus 127 for non-printed (control) characters. Subsequently, extended ASCII was introduced, which added accented characters, other punctuation marks, and symbols (128-255).

Looking back on the example of the telephone number above, we can now translate the binary into the telephone number:

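Binary    Decimal  ASCII Character
00111001  57       9
00110000  48       0
00110100  52       4
00101101  45       -
00110110  54       6
00110010  50       2
00110000  48       0
00101101  45       -
00110001  49       1
00110001  49       1
00110011  51       3
00111000  56       8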
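
The same translation can be sketched in a few lines of Python. This is only an illustration: the list `binary_bytes` copies the twelve bytes from the example, and the function name `decode_ascii` is made up here.

```python
# The twelve bytes from the telephone number example, as strings of bits.
binary_bytes = [
    "00111001", "00110000", "00110100", "00101101",
    "00110110", "00110010", "00110000", "00101101",
    "00110001", "00110001", "00110011", "00111000",
]


def decode_ascii(bit_strings):
    """Convert a list of 8-bit binary strings into ASCII text."""
    codes = [int(bits, 2) for bits in bit_strings]  # binary -> decimal
    return bytes(codes).decode("ascii")             # decimal -> characters


# Print each byte with its decimal value and ASCII character.
for bits in binary_bytes:
    code = int(bits, 2)
    print(bits, code, chr(code))

print(decode_ascii(binary_bytes))  # 904-620-1138
```

Here `int(bits, 2)` reads a string of 1s and 0s as a base-2 number, and `chr(code)` looks up the character for that number.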

Although we still ‘use’ ASCII today, in reality we use something called UTF-8. The name is easier to say than what it stands for: Universal Coded Character Set + Transformation Format - 8-bit. Unicode (see http://unicode.org) started in 1987 as an effort to create a universal character set that would encompass characters from all languages. It originally defined 16 bits (two bytes) per character, giving 2^16 = 256 x 256 = 65536 possible characters, or code points. Today, the first 65536 code points are considered the “Basic Multilingual Plane”, and in addition there are sixteen other planes for representing characters, giving a total of 1,114,112 code points. Thankfully, we don’t need to worry, because UTF-8 is backward compatible with ASCII: the first 128 code points are encoded as the same single bytes that ASCII uses.
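
To see the backward compatibility in practice, here is a small Python sketch. The helper name `utf8_bytes` is made up for this illustration, and the equilibrium symbol (code point U+21CC) is used as an example of a character outside ASCII.

```python
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))


# ASCII characters are stored as the same single byte in UTF-8.
print(utf8_bytes("9"))  # [57]
print("904-620-1138".encode("utf-8") == "904-620-1138".encode("ascii"))  # True

# A character outside ASCII needs more than one byte in UTF-8.
equilibrium = "\u21cc"  # the equilibrium symbol
print(hex(ord(equilibrium)), utf8_bytes(equilibrium))  # 0x21cc [226, 135, 140]

# Seventeen planes of 65536 code points each.
print(17 * 2 ** 16)  # 1114112
```

The digit 9 takes one byte, exactly as in ASCII, while the equilibrium symbol takes three.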

It’s worth pointing out at this stage that the development of Unicode is a good thing for science.  We speak our own language and have special symbols that we use in many different situations (how about the equilibrium symbol?  ⇌ ) and so publishers in science and technology have developed fonts for reporting scientific research.  Check out and install STIX fonts (http://www.stixfonts.org) which would not be possible without Unicode.

Comments 4

John House (not verified) | Tue, 09/08/2015 - 13:51
Thanks for adding the article regarding the difference between SSDs and HDDs, it was a very good read. That would make sense as to why smart phones are much quicker to turn on than desktops.

John House (not verified) | Tue, 09/08/2015 - 16:50
I followed the link and could not find where to download it; I would appreciate it if anyone could help direct me to the proper location. Also, how exactly do we use it? Does it plug into Microsoft Word?

Brandon Davis (not verified) | Tue, 09/08/2015 - 16:54
http://www.stixfonts.org/install.html

Stuart Chalk | Tue, 09/08/2015 - 17:07
On the install page it gives you directions on how to install the font set (there are several files). The download link takes you to the STIX fonts SourceForge site where you need to go to the files section. The following link, however, will take you to the latest release (http://sourceforge.net/projects/stixfonts/files/Current%20Release/) and you will want to download the -word.zip file.