This tutorial is intended for those developers
who come across the situation where they need to cater for what is
known as a 'many-to-many' relationship. This tutorial will show a
technique that has been successful for me for many years.
How to Handle a Many-to-Many Relationship with PHP and MySQL
Intended Audience
This tutorial is intended for those developers who come across
the situation where they need to cater for what is known as a
'many-to-many' relationship. It will explain how to generate the
database tables to deal with such a relationship, and how to design
HTML forms to display and update the contents of these tables.
This situation can often catch the inexperienced designer/developer
unawares, often leading to clumsy or unworkable solutions. This
tutorial will show a technique that has been successful for me for many
years.
Introduction
What exactly is a many-to-many relationship? When your user tries to
explain to you how a pair of business objects are related he might say
"many of these can be related to many of those, and many of those can
be related to many of these." If you attempt to draw such a
relationship in an Entity Relationship Diagram (ERD) you will probably
end up with something like this:
Those of us who are more experienced in database design will
immediately see this as a totally illegal relationship. The only sort
of relationship which is possible is a 'one-to-many', so the novice
might redraw his ERD as follows:
That may look better, but again it is impossible to implement in any
RDBMS that I have come across. You can have A-to-B in a 'one-to-many'
relationship, or B-to-A in a 'one-to-many' relationship, but you cannot
have both at the same time. So what are we to do?
The solution is to create a new table which will act as the 'many' in
two 'one-to-many' relationships, with each of the original tables
acting as the 'one'. This is shown in the following diagram, with the
new table identified as 'X'.
Table 'X' is sometimes referred to as a 'link', 'intersection' or 'xref' (cross-reference) table.
Database schema
Let us assume that the existing tables, 'A' and 'B', are defined as follows:
CREATE TABLE `a` (
`a_id` varchar(6) NOT NULL default '',
`a_desc` varchar(40) NOT NULL default '',
PRIMARY KEY (`a_id`)
)
CREATE TABLE `b` (
`b_id` varchar(6) NOT NULL default '',
`b_desc` varchar(40) NOT NULL default '',
PRIMARY KEY (`b_id`)
)
This new table must be constructed to allow the following:
* It must have a column which links back to table 'A'.
* It must have a column which links back to table 'B'.
* It must allow no more than one row to exist for any combination of rows from table 'A' and table 'B'.
* It must have a primary key.
In order to satisfy all these rules all I need in table 'X' are two
columns, 'a_id' and 'b_id'. I do not need an additional column for a
primary key as I can use the existing two columns. This produces a
database schema as follows:
CREATE TABLE `x` (
`a_id` varchar(6) NOT NULL default '',
`b_id` varchar(6) NOT NULL default '',
PRIMARY KEY (`a_id`,`b_id`),
)
Note here that I have created a primary key from two columns, not
one. Some naive database designers insist that every database table
should have a single-column technical primary key called 'id' which
obtains its value from an automatically-incrementing internal counter.
In MySQL this can be done with the 'auto-increment' option. While I
agree that there are places where a technical primary key is definitely
beneficial, in my years of experience I have also encountered instances
where a technical primary key is not only unnecessary, it is actually
detrimental. A cross-reference table is one of those places.
Also note that it is not necessary to create a separate index for
'a_id' as this is already covered by virtue of the fact that it forms
the leading portion of the primary key. A separate index on 'b_id'
might be of benefit depending on the volume of rows in the table and
the frequency of searches on that column.
Form design
Having designed the database we must now turn our attentions to
designing the forms that will manipulate the data. For this I will
suggest two alternatives although experienced developers may be able to
suggest more.
Simple
In this design I have 4 simple forms which are interconnected as shown in the following diagram:
 |
| |
We start off with the form <Browse 'A'> which enables the user to
browse through all occurrences of table 'A'. During the processing of
this form the user will be able to select an occurrence of 'A', then
press a control (a button or a hyperlink) which will pass the identity
of the selected occurrence to a second form which I have identified as
<Browse 'X' for selected 'A'>.
This second form will show which occurrence of table 'A' has been
selected, then list the current contents of table 'X' for that
occurrence. For each occurrence of table 'X' it will include the
relevant description from table 'B'. The MySQL query to achieve this is
as follows:
SELECT x.a_id, x.b_id, b.b_desc
FROM x
LEFT JOIN b ON (b.b_id = x.b_id)
WHERE (x.a_id = 'whatever')
A similar result may also be achieved using the following:
SELECT x.a_id, x.b_id, b.b_desc
FROM x, b
WHERE (x.a_id = 'whatever') AND (b.b_id = x.b_id)
The difference between these two SQL queries will only become apparent
if an occurrence of table 'B' is missing for an occurrence of table
'X'. With the first query all columns selected from table 'B' will be
returned as empty. With the second query the effect will be that the
occurrence of 'X' will be dropped from the selection.
The second form can only show existing occurrences of table 'X'. In my
design this form would have controls to invoke a third from to add new
occurrences to 'X' or a fourth form to delete occurrences from 'X'.
Complex
An alternative to this combines the facilities of forms (2), (3) and
(4) into a single more complex form, as shown in the following diagram:
This modified form will still show which occurrence of table 'A' has
been selected, but it will then list all occurrences of table 'B'.
Against each occurrence of table 'B' will be a checkbox to indicate if
an associated occurrence exists on table 'X' or not. The user will be
able to toggle the checkbox in order to create or delete the entry on
table 'X'. The MySQL query to select the data is as follows:
SELECT a.a_id, b.b_id, b.b_desc,
CASE WHEN x.b_id IS NULL THEN 'F' ELSE 'T' END AS selected
FROM a, b
LEFT JOIN x ON (a.a_id = x.a_id AND b.b_id_id = x.b_id)
WHERE (a.a_id = 'whatever')
Notice here that I am selecting from table 'A' and 'B', then doing a
LEFT JOIN on table 'X'. The CASE..WHEN..THEN..END statement checks the
existence of an associated occurrence on 'X' and sets a column named
'selected' to 'T' if one exists or 'F' if one does not.
As an added refinement I use a third form to allow the entry of
selection criteria so that the contents of the second form can be
refined. This is usually more relevant in those cases where there are
more columns to be displayed, but it does provide the ability to select
only those occurrences of 'B' where associated occurrences of 'X'
either do or do not exist. This is accomplished by altering the WHERE
clause to be one of the following:
WHERE (a.a_id = 'whatever') AND x.a_id IS NULL
or
WHERE (a.a_id = 'whatever') AND x.a_id IS NOT NULL
Complex update
If you thought that the code to retrieve the current data from the
database was tricky then what will you think about the code needed to
update the database after the user has made his changes? For this I
have a standard procedure which has as its input two arrays:
1. $fieldarray which has an entry for each row displayed in the form,
and for each row it contains the names and values for the primary key
required for table 'X'.
2. $selectarray which has an entry for the checkbox on each of the
rows. Note that this will only contain entries where the checkbox is
ON. If it has been checked OFF then the $_POST array will not contain
an entry for that row.
I shall now describe the code to process these two arrays.
I begin by looping through each row that was displayed in the form and initialise two string variables:
foreach ($fieldarray as $rownum => $rowdata) {
$insert = NULL;
$delete = NULL;
Each row provides me with the names and values for the primary key, so I can move their details into the two string variables.
foreach ($rowdata as $fieldname => $fieldvalue) {
$insert .= "$fieldname='$fieldvalue',";
$delete .= "$fieldname='$fieldvalue' AND ";
} // foreach
When there are no more fields left I can trim the unwanted ',' and ' AND '.
$insert = rtrim($insert, ',');
$delete = rtrim($delete, ' AND ');
Now I examine the contents of the checkbox in $selectarray and
construct the SQL query to either create the entry if the checkbox is
ON or delete the entry if the checkbox is OFF:
if (isset($selectarray[$rownum])) {
$query = "INSERT INTO $tablename SET $insert";
} else {
$query = "DELETE FROM $tablename WHERE $delete";
} // if
Finally I execute the query and check for errors. Note that I ignore
errors concerning duplicate entries. This is caused by a checkbox being
ON originally and not being changed to OFF by the user:
$result = @mysql_query($query, $dbconnect);
if (mysql_errno() <> 0) {
if (mysql_errno() == 1062) {
// ignore duplicate entry
} else {
trigger_error("SQL", E_USER_ERROR);
} // if
} // if
} // foreach
Summary
I first designed this solution to the many-to-many problem several
years ago while developing in a different language. It was an
interesting exercise to convert it to PHP and MySQL.
In my long career I have also encountered variations of this problem.
In this solution table 'X' contains nothing but the foreign key fields
which link up to the two related tables, but this can be made more
complicated by the following:
* There may be additional fields on table 'X', which means that the
database update routine will have to deal with more than simply
choosing between an insert or a delete.
* A more complicated version is where table 'X' can contain multiple
occurrences for each combination of 'a_id' and 'b_id'. The example I
have encountered several times is where each entry covers a different
date range causing only one to be regarded as 'current' on any
particular date, thus providing a history of values over a period of
time.
However, I shall leave the solutions to these additional problems to a later date.
|