A simple IN list predicate is a predicate where the left operand is a simple column reference and the IN list is composed entirely of constants or parameter markers. The following are examples of simple IN predicates:
orig_airport IN ('ABQ', 'AKL', 'DSM') orig_airport IN (?, ?, ?) orig_airport IN ('ABQ', ?, ?, 'YYZ')
Derby transforms each IN list predicate into an equality predicate whose right operand is a parameter marker that is created internally. This internal equality predicate is called a probe predicate. Each of the above examples of simple IN predicates is transformed into the following probe predicate:
orig_airport = ?
Probe predicates are treated differently than normal equality predicates. Probe predicates are processed in a special way during query optimization and execution.
During optimization, Derby analyzes the probe predicate to determine if the probe predicate is useful for limiting the number of rows retrieved from disk. For a probe predicate to be useful, both of the following requirements must be true:
If both of these requirements are met, Derby will use the probe predicate at query execution to probe the underlying index for values in the IN list. In other words, the right operand of the probe predicate (the parameter) becomes a place-holder into which Derby plugs the different values from the IN list. Then for each value, Derby reads the matching rows from the index.
If either of the two requirements is not satisfied, Derby discards the internal probe predicate and executes the query using the original IN list predicate.
The following query is submitted to Derby:
SELECT flights.orig_airport, cities.city_name FROM flights, cities WHERE flights.orig_airport IN ('ABQ', 'DSM', 'YYZ') AND flights.orig_airport = cities.airport
The Derby optimizer transforms this query internally into:
SELECT flights.orig_airport, cities.city_name FROM flights, cities WHERE flights.orig_airport = ? AND flights.orig_airport = cities.airport
In this transformed query flights.orig_airport = ? is an internal probe predicate.
There is an index on the org_airport column in the flights table. If the estimated cost of probing that index for the three values (ABQ, DSM, YYZ) is less than the cost of accessing the flights table in some other way, Derby will perform probing on the index at query execution. This approach ensures that Derby reads only the necessary rows from the Derby table.
At a higher level, the approach by Derby to use index probing for IN lists is an internal way of evaluating the transformed predicate multiple times. The predicate is evaluated one time for each value in the IN list.
PreparedStatement ps = conn.prepareStatement( "select flights.orig_airport, cities.city_name " + "from flights, cities " + "where flights.orig_airport = ? " + "and flights.orig_airport = cities.airport "); ps.setString(1, "ABQ"); rs1 = ps.executeQuery(); ps.setString(1, "DSM"); rs2 = ps.executeQuery(); ps.setString(1, "YYZ"); rs3 = ps.executeQuery();
From an SQL perspective, Derby is logically (but not actually) performing the following statement:
SELECT flights.orig_airport, cities.city_name FROM flights, cities WHERE flights.orig_airport = 'ABQ' AND flights.orig_airport = cities.airport UNION ALL SELECT flights.orig_airport, cities.city_name FROM flights, cities WHERE flights.orig_airport = 'DSM' AND flights.orig_airport = cities.airport UNION ALL SELECT flights.orig_airport, cities.city_name FROM flights, cities WHERE flights.orig_airport = 'YYZ' AND flights.orig_airport = cities.airport
In the above SQL example, for each subquery the equality predicate limits the number of rows read from the flights table so that the process avoids having to read unnecessary rows from disk.
The larger the flights table, the more time Derby will save by probing the index for the relatively few IN list values.
By using probe predicates, regardless of how large the base table is, Derby only has to probe the index a maximum of N times, where N is the size of the IN list. If N is significantly less than the number of rows in the table, or is significantly less than the number of rows between the minimum value and the maximum value in the IN list, selective probing ensures that Derby does not spend time reading unnecessary rows from disk.