Select Rows with Keys Matching Multiple Columns in Subquery
When you query a database table using SQL, you might find the need to:
- select rows from table A using a certain criteria (i.e., a WHERE clause)
- then, use one or more columns from result set (coming from the above query)
as a subquery to subselect from table B
You can do this quite easily in SQL
key1 | key2 | val | |
---|---|---|---|
0 | A | B | 2 |
1 | A | Z | 3 |
2 | C | D | 4 |
3 | E | F | 5 |
4 | G | H | 6 |
color | key1 | key2 | |
---|---|---|---|
0 | red | A | B |
1 | orange | A | Z |
2 | yellow | C | D |
3 | green | E | F |
4 | blue | G | H |
So, if we wanted to grab all rows from df_colors where the value in df_vals is
inclusively between 2 and 6, then:
color | key1 | key2 | |
---|---|---|---|
0 | orange | A | Z |
1 | yellow | C | D |
2 | green | E | F |
Notice that the inner subquery produces the keys where the values are between 2
and 6
key1 | key2 | |
---|---|---|
0 | A | Z |
1 | C | D |
2 | E | F |
Then, each pair of keys are used to subselect rows from df_colors.
So, the ON keyword:
- Grabs the first pair of keys (A, B) from df_colors and compares them to each pair of keys returned by the subquery [(A, Z), (C,D), (E, F)]
- If there is a match, then that row from df_colors is returned
- This process (Steps 1-2) is repeated for the remaining rows in df_colors [(A,
Z), (C, D), (E, F), (G, H)]
The query above is equivalent to the following:
Note: This latter query doesn’t work in sqlite3 since tuples aren’t supported
but it should work fine for a real database. Note that it might be slower than
the first query and is a bit less obvious. So, it is advised that you stick with
the first query.