12. Subqueries

A subquery, simply put, is a query written as a part of a bigger statement. Think of it as a SELECT statement inside another one. The result of the inner SELECT can then be used in the outer query. Let us take a simple example to illustrate this. Consider the same source tables as the ones in the Understanding Joins chapter - authors_tbl and newlang_tbl. We will try to write a query (and a subquery) to display the author of a particular language.

Listing: A simple subquery example
1 SELECT author FROM authors_tbl 
2   WHERE language_id IN (SELECT id FROM newlang_tbl WHERE language='Tcl');
3 
4 > Ousterhout

The subquery SELECT id FROM newlang_tbl WHERE language='Tcl' picks the correct language id from the newlang_tbl and passes it on to the outer query on the authors table. This frees us from the responsibility of joining the two tables using the language id field. Which approach to take in certain situations - a join, a subquery or a combination of both - is mostly a matter of personal preference. Other times, one approach will be clearly the superior choice.

12.1 Types of subqueries

We can broadly classify subqueries into three categories.

  1. Scalar subqueries A subquery that returns only a single column of a single row as its output. The example in the previous section, where the subquery returns the id for Tcl is a scalar subquery.
  2. Table subqueries A table subquery can return more than a single row and many columns per row. In essence, it can return a table itself to take part in your outer query. Let us take an example where we wish to display all the programming language writers who created a language after 1980.
Listing: A table subquery example
1 SELECT author, language FROM authors_tbl a, (SELECT id, language FROM newlang_tb\
2 l WHERE year > 1980) n 
3   WHERE a.language_id = n.id;

Carefully study the FROM clause of the query above. Our table subquery is placed within it and it returns a set of languages which were created after 1980. The result consists of two rows and two columns, one of which i.e. language is picked up to be displayed in the final output.

Figure: authors who created their languages after 1980
author language
Wall Perl
Ousterhout Tcl
  1. Row subqueries A subquery that returns a single row but more than one column is called a row subquery. These are the least important type of subqueries since most database management systems do not support it, including SQLite.

12.2 Using subqueries in INSERT statements

We can even use subqueries inside other SQL statement like INSERT. Let us try to add a new language and a new author in our tables and ease our task of remembering id numbers by just a bit by using subqueries.

Listing: Inserting a new programming language
1 INSERT INTO newlang_tbl (id, language, year, standard) 
2   VALUES (6, 'Pascal', 1970, 'ISO');

The contents of our table now look as shown below.

id language year standard
1 Prolog 1972 ISO
2 Perl 1987  
3 APL 1964 ANSI
4 Tcl 1988  
5 BASIC 1964 ANSI
6 Pascal 1970 ISO

While inserting a new entry into the authors_tbl, we can either remember that we used the language_id as 6 for Pascal, or use a subquery. Let us see an example of the latter approach.

Listing: Inserting a new author using a subquery
1 INSERT INTO authors_tbl (author_id, author, language_id)
2   VALUES (7, 'Wirth', (SELECT id FROM newlang_tbl WHERE language='Pascal'));

We believe that this should put the correct language id for Wirth since he created Pascal. Let us verify this belief by looking at te contents of the table.

author_id author language_id
5 Kemeny 5
6 Kurtz 5
1 Colmerauer 1
2 Wall 2
3 Ousterhout 4
4 Iverson 3
7 Wirth 6