Welcome to my set of tutorials on MySQL. While
there are loads of tutorials on MySQL (usually in combination with a
scripting language) on the Net aimed at building applications etc.
Part 8 - Advanced Queries
Aliasing and referencing, Self-Joins, Temporary Tables, Full Text Searches.
Introduction
Earlier in this series when we looked at SQL statements we were primarily
concerned with either retrieving, changing or deleting values within the
database
and also manipulating the database structures. SQL is more powerful than
that and in this workshop I aim to introduce some of the advanced functions
and queries that can be useful when building more complex applications.
As
One of the simplest manipulations is to define a new structural element
(table or column) by aliasing an existing value. A common use for this is
to create
a shorthand reference to elements with long names to make the SQL statements
shorter and reduce the chance of typos in the longer names.
SELECT <columns>
FROM <existing_table_name>
AS <new_table_name>
It is important to remember that the table hasn't actually been renamed,
but instead the <new_table_name> is simply a reference
that exists for the duration of the SQL statement. For example to see this
working lets create a simple SELECT statement that retrieves the name column
from the artists table (that we created in Part 5)
using a reference 't1'.
mysql> SELECT t1.name
-> FROM artists
-> AS t1;
+----------------------+
| name |
+----------------------+
| Jamiroquai |
| Various |
| westlife |
| Various |
| Abba |
As we can see t1 is a lot easier to type that artists. To see the full benefit
of this let's revisit one of the join statements from Part 5.
The existing statement is:
mysql> SELECT artists.Artist, cds.title, genres.genre FROM cds
-> LEFT JOIN genres ON cds.genreID = genres.genreID
-> LEFT JOIN artists ON cds.artistID = artists.artistID
-> WHERE (genres.genre = 'Pop');
Whereas our modified statement would look like so:
mysql> SELECT t2.Artist, t1.title, t3.genre FROM cds AS t1
-> LEFT JOIN genres AS t3 ON t1.genreID = t3.genreID
-> LEFT JOIN artists AS t2 ON t1.artistID = t2.artistID
-> WHERE (t3.genre = 'Pop');
Using aliases also has
few other advantages including:
- Avoiding any reserved(used by MySQL) words.
- Allowing Multiple Joins to the same table
- Allowing Self-Joins
- Assigning the result of MySQL function to a temporary column name.
While there is not much more to be said about MySQL reserved words and MySQL
functions are covered in Part 9 of this Virtual Workshop series, we will
take a look at Self-Joins next.
Altering the database structures and adding data
In order to include this new producer column (and include meaningful data)
we need to alter the structure of the cds table.
mysql> ALTER TABLE cds
-> ADD producerID INT(3);
At this point you can if you want you can enter the appropriate producer
information for your existing CDs entered. For use in this workshop we also
need to enter the details of an album produced by the artist and an album
produced by a outside party. In this case the famously self reliant Prince
(or whatever he is calling himself this week) and The Beatles with and their
producer the '5th Beatle' Sir George Martin.
| Artist |
Title |
Year |
Label |
Bought |
Tracks |
Genre |
Producer |
| Prince |
Sign of the Times |
1987 |
Warner Brothers |
1987-11-07 |
16 |
Soul |
Prince |
| The Beatles |
The White Album |
1990 |
Capitol |
1994-07-20 |
30 |
Classic Rock |
George Martin |
This data has to be entered into the artists table first:
mysql> INSERT INTO artists
-> VALUES ('?','The Beatles');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO artists
-> VALUES ('?','Prince');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO artists
-> VALUES ('?','George Martin');
Query OK, 1 row affected (0.00 sec)
Then the get the artistIDs using a SELECT statement
mysql> SELECT *
-> FROM artists;
+----------+----------------------+
| artistID | name |
+----------+----------------------+
-- snip --
| 17 | The Beatles |
| 18 | Prince |
| 19 | George Martin |
+----------+----------------------+
19 rows in set (0.00 sec)
Next enter the label details in the label table.
mysql> INSERT INTO label
-> VALUES ('?','Warner Brothers');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO label
-> VALUES ('?','Capitol');
Query OK, 1 row affected (0.00 sec)
And getting those newly created label IDs as well.
mysql> SELECT *
-> FROM label;
+---------+-----------------+
| labelID | name |
+---------+-----------------+
-- snip --
| 5 | Warner Brothers |
| 6 | Capitol |
+---------+-----------------+
6 rows in set (0.02 sec)
We should also know the genreIDs as well (Classic Rock = 3; Soul = 5) and
we can enter the cd details into the cds table (you can see why it is better
to have a scripted interface to MySQL).
mysql> INSERT INTO cds (
-> title, year, bought, tracks, genreID, artistID, labelID, producerID
-> ) VALUES (
-> 'Sign of the Times','1987','1987-11-07','16','5','18','5','18'
-> );
Query OK, 1 row affected (0.08 sec)
mysql> INSERT INTO cds (
-> title, year, bought, tracks, genreID, artistID, labelID, producerID
-> ) VALUES (
-> 'The White Album','1990','1994-07-20','30','3','17','6','19'
-> );
Query OK, 1 row affected (0.01 sec)
This will now allow us to see some of the good things that you can do with
AS.
Multiple Joins to One Table.
Consider the following SQL statement and resulting output.
mysql> SELECT cds.title, artists.name
-> FROM cds
-> LEFT JOIN artists
-> USING (artistID)
-> WHERE artists.artistID = 17;
+-----------------+-------------+
| title | name |
+-----------------+-------------+
| The White Album | The Beatles |
+-----------------+-------------+
1 row in set (0.00 sec)
This is fine, but if we wanted to include the producer details we need to
join to a producer table, which we don't have because we decided that producers
are artists and this creates a problem. We solve this problem by making ANOTHER
join to the artists table only this time calling it the producers table using
AS to create this alias.
LEFT JOIN artists AS producers ....etc
There is another small consideration. In order to complete the join to the
newly aliased producers tale we have to construct the join without the USING
clause. This is because we need to join the cds.producerID field on the producers.artistID
field which obviously aren't the same name that USING requires. SO to construct
the whole query.
mysql> SELECT cds.title, artists.name AS Artist, producers.name AS Producer
-> FROM cds
-> LEFT JOIN artists
-> USING (artistID)
-> LEFT JOIN artists AS producers
-> ON cds.producerID = producers.artistId
-> WHERE artists.artistID = 17;
+-----------------+-------------+---------------+
| title | Artist | Producer |
+-----------------+-------------+---------------+
| The White Album | The Beatles | George Martin |
+-----------------+-------------+---------------+
1 row in set (0.01 sec)
NOTE: The artists.name and producers.name field has also been aliased to
differentiate them in the output.
We can also use this method of aliasing tables to to join a table to itself
or perform a Self-Join as it is known.
The Self-Join
A Self-Join (as the name suggests) is when a table is joined to itself and
is made possible by aliasing the original table. The main reasons for using
a Self-Join arise when you need to either compare the values in differ net
rows in the same table.
Unfortunately in the CDs examples we have used so far there is no obvious
candidate to demonstrate the Self-Join correctly so I will use a different
example this one time only. The classic example given when explaining Self-Joins
(I read this in many different
books
when
learning
SQL) involves
an 'Employees'
table
that
stores
the details of their supervisor as well.
| EmployeeID |
Name |
Salary |
ManagerID |
| 023452 |
Ken Smith |
45000 |
NULL |
| 087652 |
Linda Jones |
25000 |
023452 |
Employees Table
In order to get the name of the Linda's manager you would have to join
the table to itself (using an alias to create a 'Manager' table). The
join would occur ON Employees.ManagerID = Manager.EmployeeID.
Thus we can construct the SQL statement using AS like so:
SELECT
Employees.EmployeeID, Employees.Name, Employees.Salary, Manager.Name AS Manager
FROM Employees
LEFT JOIN Employees AS Manager
ON Employees.ManagerID = Manager.EmployeeID
WHERE (Employees.EmployeeID = '087652';
This would result in the following being output.
| EmployeeID |
Name |
Salary |
Manager |
| 087652 |
Linda Jones |
25000 |
Ken Smith |
Aliasing Function output
Using AS with built in functions is discussed in more depth in Part
9 of this workshop series, but worth covering quickly here. Consider
a quick query to find the average number of tracks.
mysql> SELECT AVG(tracks)
-> FROM cds;
+-------------+
| AVG(tracks) |
+-------------+
| 22.1667 |
+-------------+
1 row in set (0.00 sec)
AVG(tracks) does tell us certain information, but this field would be
difficult to use as part of a script and therefore we can use AS to give
a more meaningful name.
mysql> SELECT AVG(tracks) AS AverageTracks
-> FROM cds;
+---------------+
| AverageTracks |
+---------------+
| 22.1667 |
+---------------+
1 row in set (0.03 sec
UNION Joins
Union Joins allow the results of two queries to be combined into one outputted
result set. This is done by having the 2 (or more) queries glued together
by the UNION operator.
SELECT <fields>
FROM <table>
WHERE <condition>
UNION
SELECT <fields>
FROM <table>
WHERE <condition>
For example if you wanted to search for artist beginning with
either P or G you would construct two statements
that searched each phrase and use the UNION statement.
mysql> SELECT artists.name
-> FROM artists
-> WHERE (artists.name LIKE 'P%')
-> UNION
-> SELECT artists.name
-> FROM artists
-> WHERE (artists.name LIKE 'G%');
+-----------------+
| name |
+-----------------+
| pop goes the 80 |
| Prince |
| George Martin |
+-----------------+
3 rows in set (0.03 sec)
By now with the knowledge you possess you will have worked out that this
could just have easily been done by using two where conditions.
mysql> SELECT artists.name
-> FROM artists
-> WHERE ((artists.name LIKE 'P%') || (artists.name LIKE 'G%'));
+-----------------+
| name |
+-----------------+
| pop goes the 80 |
| Prince |
| George Martin |
+-----------------+
3 rows in set (0.00 sec)
However UNION also allows you to combine the results from different tables
not just the same one. To give a practical, but somewhat unrealistic in
a 'real world' sense lets return to our CDs database and select all the
genres and all the labels that start with letters A to M.
mysql> SELECT label.name
-> FROM label
-> WHERE (label.name BETWEEN 'A%' AND 'M%')
-> UNION
-> SELECT genres.genre
-> FROM genres
-> WHERE (genres.genre BETWEEN 'A%' AND 'M%');
+----------------+
| name |
+----------------+
| jive |
| EMI |
| Capitol |
| Easy Listening |
| Heavy Metal |
| Eighties |
| Hip Hop |
| Jazz |
| Guitar Bands |
+----------------+
9 rows in set (0.04 sec)
A more practical example would be to imagine if we had in fact made a
separate producers table rather than just alias the artists table. We could
query both tables using UNION to produce a result set.
Temporary Tables
As there name suggests, temporary tables are fleeting in nature lasting
only for the length of the MySQL session. Some of the reasons that you
may wish to do this include.
- A large, busy site may create a temporary table copies (which are issued
and die with a session for example) for each client to use, thus easing
congestion in on the original table.
- MySQL does not yet support subselects. Using temporary tables can,
in some instances, overcome that problem.
- Storing data during structural database changes.
There are many other reasons that I could go into but let's just look
at one cool trick.
Creating the Temporary Table.
The syntax for creating temporary tables is almost identical that used
for creating a normal table. Except that there is an extra TEMPORARY clause.
CREATE TEMPORARY TABLE <table> (
field definitions
)
This will work by using a query to create the table as well, like so.
CREATE TEMPORARY TABLE <newtable>
SELECT * FROM <oldtable>
And it is here that the 'cool trick' comes in. I received an e-mail asking
for help (as I often do) and I was able to provide a solution using a temporary
table. The problem was related to how to remove duplicate rows from a database.
This came about from a web form that fed into a database and users were
submitting it multiple times, thus creating duplicated data. The correspondent
wished to remove any duplicate entries. We can replicate this problem by
creating a copy of the cds table with all the data duplicated.
First make the duplicate cds table:
mysql> CREATE TABLE cdsdupe
-> SELECT * FROM cds;
Apply the same structural constraints to the table:
mysql> ALTER TABLE cdsdupe MODIFY cdID INT(3) auto_increment primary key;
Next insert the same data again only omitting the cdID which would cause
a conflict with the existing cdIDs (this means using the longer method
of specifying which fields to insert into and select from).
mysql> INSERT INTO cdsdupe
-> (title, year, bought, tracks, genreID, artistID, labelID, producerID)
-> SELECT cds.title, cds.year, cds.bought, cds.tracks, cds.genreID,
-> cds.artistID, cds.labelID, cds.producerID
-> FROM cds;
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
You can check that there are indeed duplicate entries using a standard
select statement. So how do we solve this problem? The DISTINCT qualifier can be used to only output records
one if there are duplicates - which is what we want. Thus if we create
a temporary table using a DISTINCT qualifier in a SELECT statement we can
filter out the duplicates.
mysql> CREATE TEMPORARY TABLE cdstemp
-> SELECT DISTINCT cdsdupe.title, cdsdupe.year, cdsdupe.bought,
-> cdsdupe.tracks, cdsdupe.genreID, cdsdupe.artistID,
-> cds.labelID, cds.producerID
-> FROM cdsdupe;
Query OK, 8 rows affected (0.03 sec)
Records: 8 Duplicates: 0 Warnings: 0
Again you can check this with a select statement. We next need to remove
the data from the original table using a TRUNCATE statement
TRUNCATE TABLE <table_name>
This is better than a delete statement as it drops then recreates the
table. This means that it is faster on large tables and more importantly
can be used as part of a transaction.
mysql> TRUNCATE TABLE cdsdupe;
Query OK, 0 rows affected (0.03 sec)
We then use the temporary table to repopulate the cdsdupe table.
mysql> INSERT INTO cdsdupe
-> (title, year, bought, tracks, genreID, artistID, labelID, producerID)
-> SELECT cdstemp.title, cdstemp.year, cdstemp.bought, cdstemp.tracks,
-> cdstemp.genreID, cdstemp.artistID, cdstemp.labelID, cdstemp.producerID
-> FROM cdstemp;
Thus we have removed the duplicates from the cdsdupe table.
Using Full Text Searches
Full Text searches allow you to search for a phrase that can appear ANYWHERE
in fields that you specify. This is a kind of 'super-wildcard' approach.
To do this you first have to alter any table nominating fields to be searchable
within brackets as part of a FULL TEXT defination.
ALTER TABLE <table> ADD FULLTEXT (fields)
To illustrate lets make our title field of the cds table searchable.
mysql> ALTER TABLE cds ADD FULLTEXT (title);
Query OK, 8 rows affected (0.36 sec)
Records: 8 Duplicates: 0 Warnings: 0
This done we could query the database using the MATCH() and AGAINST()
functions, where the field is matched against the search term.
SELECT * FROM cds WHERE MATCH(fields) AGAINST ('search term')
So if we wanted to see all the cds with 'funk' in the title (just because
there are bound to be some).
mysql> SELECT cdID, title, year, tracks
-> FROM cds
-> WHERE
-> MATCH(title) AGAINST ('hits');
+------+---------------------+------+--------+
| cdID | title | year | tracks |
+------+---------------------+------+--------+
| 5 | Abbas Greatest Hits | 1998 | 24 |
+------+---------------------+------+--------+
1 row in set (0.02 sec)
The MATCH function also sorts multiple results be relevance with the first
result being the most relevant. To demonstrate lets add another cds with
hits in the title (The Hits by Prince).
mysql> INSERT INTO cds (
-> title, year, bought, tracks, genreID, artistID, labelID, producerID
-> ) VALUES (
-> 'The Hits','1993','1993-10-07','58','5','18','5','18'
-> );
Query OK, 1 row affected (0.08 sec)
Running the FULL TEXT query again will put the newly added CD top.
mysql> SELECT cdID, title, year, tracks
-> FROM cds
-> WHERE
-> MATCH(title) AGAINST ('hits');
+------+---------------------+------+--------+
| cdID | title | year | tracks |
+------+---------------------+------+--------+
| 9 | The Hits | 1993 | 58 |
| 5 | Abbas Greatest Hits | 1998 | 24 |
+------+---------------------+------+--------+
2 rows in set (0.00 sec)
We can ask MySQL to show us the rating as well by SELECTing the result
of the match().
mysql> SELECT title, MATCH(title) AGAINST ('hits') AS Rating
-> FROM cds
-> WHERE MATCH(title) AGAINST ('hits');
+---------------------+-----------------+
| title | Rating |
+---------------------+-----------------+
| The Hits | 1.238520026207 |
| Abbas Greatest Hits | 1.2109839916229 |
+---------------------+-----------------+
2 rows in set (0.01 sec)
You can see that The Hits (1.23....) has a higher rating than Abbas Greatest
Hits (1.21.....)
Conclusion
As these parts progress and you, the reader, become more confident
with SQL statements I have to introduce more complex problems and solutions
to keep you entertained .
|