Restartable Processes Require Complex Logic

The most serious problem with the “commit before the logical transaction is over” approach is that it frequently leaves your database in an unknown state if the UPDATE fails halfway through. Unless you planned for this ahead of time, it is very hard to restart the failed process so that it picks up where it left off.

For example, say we were not applying the LOWER() function to the column, as in the previous example, but rather some other function of the column, such as this:
last_ddl_time = last_ddl_time + 1;
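
For context, the loop in question is the incremental-commit pattern from the previous example, adapted to this expression. A minimal sketch of that loop might look like this (committing every 100 rows is an arbitrary choice for illustration):

begin
    for x in ( select rowid rid, rownum r from t )
    loop
        update t
           set last_ddl_time = last_ddl_time + 1
         where rowid = x.rid;

        if ( mod(x.r, 100) = 0 ) then
            commit;  -- the problematic commit, issued mid-transaction
        end if;
    end loop;
    commit;
end;
/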

If we halted the UPDATE loop partway through, how would we restart it? We could not just rerun it, as we would end up adding 2 to some dates and 1 to others. If we failed again, we would add 3 to some, 2 to others, 1 to the rest, and so on. We need yet more complex logic: some way to “partition” the data. For example, we could process every OBJECT_NAME that starts with A, then every one that starts with B, and so on, driving from a to-do table:
$ sqlplus eoda/foo@PDB1

SQL> create table to_do
  2  as
  3  select distinct substr( object_name, 1, 1 ) first_char
  4    from T;

Table created.
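
From there, a driver loop can process one letter at a time, crossing each letter off the to-do list in the same transaction as its UPDATE; a sketch along these lines (the details will vary with your schema):

begin
    for x in ( select first_char from to_do )
    loop
        update t
           set last_ddl_time = last_ddl_time + 1
         where object_name like x.first_char || '%';

        delete from to_do
         where first_char = x.first_char;

        commit;  -- each letter becomes its own mini-transaction
    end loop;
end;
/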

Now, we could restart this process if it fails, since we would not process any object name that had already been processed successfully. The problem with this approach, however, is that unless we have some attribute that evenly partitions the data, we will end up with a very uneven distribution of rows across those partitions.
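
A quick query shows just how skewed that split can be (I’ll omit the output; it depends entirely on your data):

SQL> select substr( object_name, 1, 1 ) first_char, count(*)
  2    from t
  3   group by substr( object_name, 1, 1 )
  4   order by count(*) desc;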

When the data skews like that, a single UPDATE can easily do more work than all of the others combined. Additionally, if other sessions are accessing this table and modifying the data, they might update the OBJECT_NAME field as well. Suppose that some other session updates the object named Z to be A, after we have already processed the As.

We would miss that record. Furthermore, this is a very inefficient process compared to UPDATE T SET LAST_DDL_TIME = LAST_DDL_TIME+1. We are probably using an index to read every row in the table, or we are full scanning it n times, both of which are undesirable. There are so many bad things to be said about this approach.

The best approach is to do it simply. If it can be done in SQL, do it in SQL. What can’t be done in SQL, do in PL/SQL. Do it using the least amount of code you can. Have sufficient resources allocated. Always think about what happens in the event of an error.

So many times, I’ve seen people code update loops that worked great on the test data but then failed halfway through when applied to the real data. Then they are really stuck, as they have no idea where the loop stopped processing. It’s a lot easier to size undo correctly than to write a restartable program. If you have truly large tables that need to be updated, you should be using partitioning, which lets you update each partition individually. You can even use parallel DML to perform the update or, in Oracle 11g Release 2 and above, the DBMS_PARALLEL_EXECUTE package.
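
To give a flavor of the latter, a minimal DBMS_PARALLEL_EXECUTE sketch might look like the following. The task name, chunk size, and degree of parallelism are arbitrary choices here, and running the task in parallel requires the CREATE JOB privilege:

begin
    dbms_parallel_execute.create_task( task_name => 'UPDATE_T' );

    dbms_parallel_execute.create_chunks_by_rowid
    ( task_name   => 'UPDATE_T',
      table_owner => user,
      table_name  => 'T',
      by_row      => true,
      chunk_size  => 10000 );

    dbms_parallel_execute.run_task
    ( task_name      => 'UPDATE_T',
      sql_stmt       => 'update t set last_ddl_time = last_ddl_time + 1
                          where rowid between :start_id and :end_id',
      language_flag  => DBMS_SQL.NATIVE,
      parallel_level => 4 );

    dbms_parallel_execute.drop_task( task_name => 'UPDATE_T' );
end;
/

Each chunk is committed as it completes, but unlike the hand-rolled approach, the package records the status of every chunk, so a failed run can be diagnosed and resumed chunk by chunk.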

Using Autocommit

My final words on bad transaction habits concern the one that arises from using the popular programming APIs ODBC and JDBC. These APIs “autocommit” by default. Consider the following statements, which transfer $1000 from a checking account to a savings account:
update accounts set balance = balance - 1000 where account_id = 123;
update accounts set balance = balance + 1000 where account_id = 456;

If your program submits these statements over ODBC or JDBC with autocommit left at its default, the API (silently) injects a commit after each UPDATE. Consider the impact of this if the system fails after the first UPDATE and before the second: you’ve just lost $1000!

I can sort of understand why ODBC does this. The developers of SQL Server designed ODBC, and this database demands that you use very short transactions due to its concurrency model (writes block reads, reads block writes, and locks are a scarce resource). What I can’t understand is how this got carried over into JDBC, an API that is supposed to support “the enterprise.” It is my belief that the very next line of code after opening a connection in JDBC should always be this:

Connection conn = DriverManager.getConnection
    ( "jdbc:oracle:oci:@database", "scott", "tiger" );
conn.setAutoCommit( false );

This returns control over the transaction back to you, the developer, which is where it belongs. You can then safely code your account transfer transaction and commit it after both statements have succeeded. Lack of knowledge of your API can be deadly in this case. I’ve seen more than one developer unaware of this autocommit “feature” get into big trouble with their application when an error occurred.
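
By way of contrast, SQL*Plus leaves autocommit off by default, so the same transfer coded there is naturally a single transaction:

SQL> update accounts set balance = balance - 1000 where account_id = 123;
SQL> update accounts set balance = balance + 1000 where account_id = 456;
SQL> commit;

If either UPDATE fails, a ROLLBACK restores the accounts to where they started, which is exactly the behavior you want to preserve when coding against JDBC or ODBC.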