Introduction to relational databases and SQL

In our modern world, databases are ubiquitous. If you want to write any piece of software that creates and stores information for future use, using a database system is necessary. Also, if you get into any software development role, you are likely expected to be familiar with databases.

Your job will, at some point or another, require you to interact with DBs. This interaction can be either by directly creating and managing them or simply retrieving stored data to perform operations and display it to users.

If you tried to get into and learn about databases before, you might have been intimidated by all the jargon you encountered. Don’t worry. Beyond the fancy and complicated words, learning to use databases is pretty easy, after all. On top of that, it is an indispensable skill to have under your belt.

This series of tutorials will teach you all the necessary things to get up and running with databases. Eventually, we will build together a functional app to put all that you learned into practice. This will help you see how it all fits together in a real-world application.

But because some theory is unavoidable, let’s start by defining some terms that we will need and use from now on.

What is a database?

A database is nothing more than a stored and organized collection of data for future use. Database Management System (DBMS) is specific software that provides us with the necessary tools to interact with the stored information. It allows us to:

Create new data
Read and retrieve the information we created in the past
Update already existing data using new values
Delete existing data that we don’t need anymore

These four basic operations are abbreviated as CRUD operations. You will hear a lot this term working with database systems.

Different databases paradigms

There are different types (or paradigms) of databases, such as key-value, document-oriented or relational databases. They differ mainly in how they store information and retrieve it from memory.

The type that is most used today is relational databases. Edgar F. Codd described this paradigm in 1970 in his influential paper entitled “A Relational Model of Data for Large Shared Data Banks”. In a world where technologies appear and become obsolete within a few years, it is impressive how this model—after more than 50 years- still exists and dominates the world of databases.

From now on, we will focus mainly on the relational databases model. This is why you should familiarize yourself with the philosophy behind it and its terminology.

A relational database is made of tables. These tables are made of columns (also called fields) and rows (or records). Each column has a specific data type that determines what goes into it (like digits, text, and so on). To illustrate these concepts using a practical example, consider the database of a digital store. It can contain several tables, such as:

products: Where we store a list of everything we sell in our store, besides any other relevant information such as its price, available quantity, and so on.
clients: Here we can store and keep track of the clients who buy from our store.
purchases: It is critical to keep track of the purchases made by your clients to perform analytics later on and improve your business. Any product purchased by any client goes into this table.

Putting “relational” in relational databases

The “relational” part in the name of this paradigm refers to our ability to create entities that have relationships with each other. This gives our data a logical structure which can be very helpful, as you will see in the future.

If you go back and consider our digital store example again, you will see that any purchase is performed by a client who purchases a product.

Each row in the purchases table should have two references: one to another row in the clients table (the one who made the purchase) and a second to a row in the products table (the product that was purchased).

A reference to another table in one table is called a foreign key. We shall explore these relationships in detail in upcoming tutorials.

Structured Query Language (SQL)

As you already realize by now, the purpose of databases is to perform CRUD operations on the stored data (besides other advanced operations). To do that, relational database management systems (RDBMS) use Structured Query Language or SQL for short. Usually, it is pronounced sequel.

SQL allows us, developers, to create new databases, add tables in them, insert rows in tables, search and retrieve data, update or delete anything we want, and so on. You can use SQL to write it directly in the command-line interface that comes with your RDBMS of choice.

But, in most real-world applications, a database on its own is not very useful. It is something that the users of your app should not worry about. You obviously don’t expect the store manager to add or delete products from the database using the command line in the digital store example.

You as a developer should create a graphical user interface (GUI) for them, which can be a desktop, web, and/or a mobile app. Your app lets them easily see, enter, update, or delete data without worrying about what’s happening under the hood.

Because database functionality is integrated with the rest of the app, you usually write SQL directly into your code using a programming language of choice. Most programming languages provide libraries that make this very easy. For now, don’t worry about that, we shall cover all this very soon. But it is handy to grasp the bigger picture before we dive into details.

Not all database systems were created equal

When you start writing apps that use databases, you can pick from various relational database management systems provided by different vendors. The most prominent player in the databases market is Oracle and its flagship product Oracle Database. It’s a solution targeted mainly at big companies rather than individuals.

MySQL is another RDBMS that is free and open-source (think of free as in free speech, not free beer). MySQL is so popular and has a large market share. Sun Microsystems originally developed it before Oracle acquired it, thereby acquiring MySQL and still developing it today. Oracle as a company isn’t liked much in the free software community. After the acquisition, some of the original developers of MySQL forked it and created a community-supported version called MariaDB. MariaDB is touted as having more features, new storage engines, fewer bugs, and better performance.

Another major player in this area is PostgreSQL. According to its developers, it is the most advanced open-source relational database. It is feature-rich with impressive performance even in huge databases.

And finally, we can’t conclude any talk about databases without mentioning Microsoft’s product, MS SQL Server.

Since the various RDBMSs are developed by different vendors and have different features, the SQL you write can vary depending on the product. These differences are not very relevant when you are just starting with SQL, but they become more important the more you use advanced vendor-specific features. Since you can’t just write SQL in the void, you need to pick one product. For this tutorial, we chose to go with PostgreSQL since it is both free and powerful. But we will focus on writing database-agnostic SQL that is directly or easily portable to other systems.

In the next tutorial, I will guide you step by step through creating your first database and writing your first SQL query.

You’ll learn how to download, install PostgreSQL and get started with it!