2.5 Storing Information in Databases

A database is a system that stores information in a logical, organized format.  Many applications on your computer (contacts, calendar, mail reader) rely on databases as the places to store the information that needs to be displayed in a way that allows the application to easily go grab the data.

While there are a number of styles of database the most common currently are relational.  This means that data in different ‘tables’ is related together by the use of a unique id field, also called a foreign key.  Each row of a table is given a unique value and this is then entered in a field in another table for any rows of data that are related to the information in the first table.  Sounds complicated but it's easy to see it in action in the upcoming examples.

Desktop Database

Microsoft Access has been a staple part of the Microsoft Office suite of applications since 1992.  It often goes overlooked as there are a wider variety of uses for Word, Excel, and PowerPoint, but it is a powerful relational database that has found great use in business.

Users can define multiple tables of data (one entry is a row with multiple fields (columns)) and then link them together using unique ids that are automatically created on each row.  Columns can be different data types (text, numeric, date/time, currency, etc.) or relational (lookup).  The example below shows a small set of artists, albums, and songs that are related together via fields defined in the tables.  The relationships can be see in the last image with the ‘Album’ field of the ‘Songs’ table being linked to the ID field of the ‘Albums’ table, and the ‘Band’ field of the ‘Albums’ table being linked to the ID field of the ‘Artists’ table.  As can be seen in the figures, entering data in the linked fields is done through dropdown menu’s (to control what text is in the field) making it easier to create the table and providing the relationships.

Access table relationships

Artists table

Albums table

Songs table

Web-based Databases

Currently, the world of relational database is primarily geared toward the use of the Structured Query Language (SQL).  SQL is a special programming language for retrieving data out of databases originally developed at IBM in the early 1970s. Relational Software (now called Oracle) created the first commercial software to implement SQL in the late 70s – called Oracle V2, and has subsequently become a major player in the database industry.  Currently, there are many implementations of SQL databases, both commercial and open source (free), and most of them are accessible from scripting languages so they can be considered ‘Web-based’ databases.

Although it grew out of the relational database model, SQL has deviated somewhat to include features and provide functionality needed for database applications.  The basic structure of SQL is an English language sentence comprised of multiple parts using defined keywords.  ‘SELECT’ is used to indicate that data is to be retrieved, ‘FROM’ is used to identify the table(s) from which the data is to be obtained, and ‘WHERE’ defines conditions on which the command searches.  So the following SQL query (left) selects all the substance that have chloride (‘%' is used as a wildcard) in the their name and returns the names and formulas of those entries found (rather than all the fields (columns) in the substances table).

Simple SQL queryComplex SQL query

The more complicated query (right) adds the ‘JOIN’ syntax that is used to create relationships between tables on the fly so that related data can be retrieved in a single query.  The designation LEFT JOIN means that all the data from the ‘LEFT’ (first) table is including in the results even if there is no related data in the ‘RIGHT’ (second) table.  The keyword ‘AS’ is used to create an alias to a table to make the query shorter, and the keyword ‘ON’ is used to define the fields to join between the tables.

One current trend of note is the movement toward ‘NoSQL’ databases.  This is somewhat of a misnomer as the class of databases this commonly refers to are really non-relational in nature and as a result there is no need to use SQL queries (although some do).  The database model types lumped into the NoSQL classification are: Column, Document, Key-Value, Graph, and Multi-model based.  Many of these have only been developed in the last few years as a response to interest in big data – applications where very fast processing of large (distributed) datasets is required.  Graph based databases are of interest in many situations due to their lack of a data model and representation of ‘semantic’ data or subject-predicate-object triples.  These databases have their own query language, SPARQL which is the interestingly recursive acronym that stands for ‘SPARQL Protocol and RDF Query Language’.

Additional Material

 

Additional Resources

Rating: 
0
No votes yet

Annotations