1. An Introduction to SQL
A database is nothing but a collection of organized data. It doesn’t have to be in a digital format to be called a database. A telephone directory is a good example, which stores data about people and organizations with a contact number. Software which is used to manage a digital database is called a Database Management System (DBMS).
The most prevalent database organizational model is the Relational Model, developed by Dr. E F Codd in his groundbreaking research paper - A Relational Model of Data for Large Shared Data Banks.In this model, data to be stored is organized as rows inside a table with the column headings specifying the corresponding type of data stored. This is not unlike a spreadsheet where the first row can be thought of as column headings and the subsequent rows storing the actual data.
SQL stands for Structured Query Language and it is the de-facto standard for interacting with relational databases. Almost all database management systems you’ll come across will have a SQL implementation. SQL was standardized by the American National Standards Institute (ANSI) in 1986 and has undergone many revisions, most notably in 1992 and 1999. However, all DBMS’s do not strictly adhere to the standard defined but rather remove some features and add others to provide a unique feature set. Nonetheless, the standardization process has been helpful in giving a uniform direction to the vendors in terms of their database interaction language.
1.1 SQL Commands Classification
SQL is a language for interacting with databases. It consists of a number of commands with further options to allow you to carry out your operations with a database. While DBMS’s differ in the command subset they provide, usually you would find the classifications below.
-
Data Definition Language (DDL) : CREATE TABLE, ALTER TABLE, DROP TABLE etc.
These commands allow you to create or modify your database structure.
-
Data Manipulation Language (DML) : INSERT, UPDATE, DELETE
These commands are used to manipulate data stored inside your database.
-
Data Query Language (DQL) : SELECT
Used for querying or selecting a subset of data from a database.
-
Data Control Language (DCL) : GRANT, REVOKE etc.
Used for controlling access to data within a database, commonly used for granting user privileges.
-
Transaction Control Commands : COMMIT, ROLLBACK etc.
Used for managing groups of statements as a unit of work.
Besides these, your database management system may give you other sets of commands to work more efficiently or to provide extra features. But it is safe to say that the ones above would be present in almost all DBMS’s you encounter.
1.2 Explaining Tables
A table in a relational database is nothing but a matrix of data where the columns describe the type of data and the row contains the actual data to be stored. Have a look at the figure below to get a sense of the visualization of a table in a database.
| id | language | author | year |
|---|---|---|---|
| 1 | Fortran | Backus | 1955 |
| 2 | Lisp | McCarthy | 1958 |
| 3 | Cobol | Hopper | 1959 |
The above table stores data about programming languages. It consists of 4 columns (id, language, author and year) and 3 rows. The formal term for a column in a database is a field and a row is known as a record.
There are two things of note in the figure above. The first one is that, the id field effectively tells you nothing about the programming language by itself, other than its sequential position in the table. The second is that though we can understand the fields by looking at their names, we have not formally assigned a data type to them i.e. we have not restricted (not yet anyways) whether a field should contain alphabets or numbers or a combination of both.
The id field here serves the purpose of a primary key in the table. It makes each record in the table unique and its advantages will become clearer in chapters to come. But for now consider this, what if a language creator made two languages in the same year; we would have a difficult time narrowing down on the records. An id field usually serves as a good primary key since it’s guaranteed to be unique, but usage of other fields for this purpose is not restricted.
Just like programming languages, SQL also has data types to define the kind of data that will be stored in its fields. In the table given above, we can see that the fields language and author must store English language characters. Thus their data type during table creation should be specified as varchar which stands for variable number of characters.
The other commonly used data types you will encounter in subsequent chapters are:
| Fixed length characters | char |
| Integer values | int |
| Decimal numbers | decimal |
| Date data type | date |