SQL Tricks: What's an Upsert?

So, despite the debate on utilizing an Active Record (sometimes called Table Row) design pattern, there are times when it can be incredibly useful. Especially when dealing with simple forms that deal with a single record within a table (go figure). It's also really handy if you're utilizing something like The Illudium PU-36 Code Generator to auto-generate your data modeling. But, occasionally, you hit a glitch and need to rethink.

One of the great things about working where I do is that we have two full-time database administrators. These guys are nothing but head-on experts in SQL, specifically MS SQL. That is their sole mission, to monitor and refine our SQL processes. Now, we have a lot of strictly SQL side processes that churn through data all day long, but these guys also watch SQL threads being processed, identify issues in various execution plans, and assist our entire development team in learning to write better SQL within our code. Now, none of our guys are slouches at SQL anyway. They can all write fairly complex query statements, stored procedures, and manage transactions, but sometimes things come up that it's nice to have a specialist's eyes.

The other day I ran into just this type of issue. I had a very vanilla row object, just four data fields, two of which formed the table's unique key. Now, the DAOs, created by Illudium, have a simple save() method. It takes a bean object, passes it to an exists() method, which in turn queries the database to find out if a record already exists with the keys identified in the bean. If the record doesn't exist it will return false, to which the exists() method will then pass the bean into a create() method. If the record does exist then the exists() method will pass the bean to an update() method. All pretty simple and straightforward. I've seen several different ways of writing the Active Record pattern, and many of them have some permutation of this action, either in the DAO or the Gateway. My problem, though, is that I was getting cfquery timeouts, for no true apparent reason. So, I called in John to look at it with me.

Now, I'll start this with saying that there wasn't anything wrong with the code the way it had been generated. Everything should have functioned as planned. In fact my process was looping over a large structure to insert or update records in the database, and the exists() method's query was only timing out about once in 1900 records. But it was erroring, so I asked for a specialist. I showed my code to John, explained exactly what was happening with each record iteration, and John said "Why are you doing all of this? Why don't you just write an Upsert?" What's an Upsert? Well, what does it sound like? An Upsert is a single query block which will update an existing record if it's found, or create a new record if not. And it is ridiculously easy:

<cfset var updQ = "" />
      <cftry>
         <cfquery name="updQ" datasource="#APPLICATION.dsn#">
            UPDATE    dbo.tblATrack
            SET      tsLast = <cfqueryparam value="#CreateODBCDateTime(ARGUMENTS.aTrack.gettsLast())#" cfsqltype="cf_sql_timestamp" />
            WHERE   intAID = <cfqueryparam value="#ARGUMENTS.aTrack.getintAID()#" CFSQLType="cf_sql_integer" />
            AND      chServer = <cfqueryparam value="#ARGUMENTS.aTrack.getchServer()#" cfsqltype="cf_sql_char" maxlength="2" />

            IF @@rowcount = 0
               BEGIN
                  INSERT INTO dbo.tblATrack (
                     tsStart,
                     intAID,
                     tsLast,
                     chServer )
                  VALUES (
                     <cfqueryparam value="#CreateODBCDateTime(ARGUMENTS.aTrack.gettsStart())#" CFSQLType="cf_sql_timestamp" />,
                     <cfqueryparam value="#ARGUMENTS.aTrack.getintAID()#" CFSQLType="cf_sql_integer" />,
                     <cfqueryparam value="#CreateODBCDateTime(ARGUMENTS.aTrack.gettsLast())#" CFSQLType="cf_sql_timestamp" />,
                     <cfqueryparam value="#ARGUMENTS.aTrack.getchServer()#" cfsqltype="cf_sql_char" maxlength="2" /> )
               END
         </cfquery>
         <cfcatch type="database">
            <!--- Error Handler Here --->
         </cfcatch>
      </cftry>

Man, I love these guys. I went from three methods [save() - exists() - update()/create()] with two separate hits to the database server, to a single, simple save() method with one hit to the database server. Now, this code is specific to MS SQL, but I would be surprised if MySQL, Oracle, and PostgreSQL didn't have similar processes to draw from (if you know the answer to that please post it below). Now I'll have to adjust Illudium's XSLT doc for the DAO creation, but this can save quite a bit of overhead in certain circumstances.

This might've been old hat to some of my readers out there, but I thought it worth sharing for those who are like I was. Those who didn't know. I hope this nugget helps someone. If so, feel free to comment.

Related Blog Entries

Comments
zac spitzer's Gravatar With CF8, cfquery will return the updated recordcount for an update/insert statement which means you can avoid the not so portable sql used in the example

http://www.kylehayes.info/blog/index.cfm/2007/4/23...

it also means you can track your missing record using some CFML
# Posted By zac spitzer | 2/5/08 1:02 AM
Steve 'Cutter' Blades's Gravatar @Zac - You're missing the point of the example entirely. I'm well aware of the CF8 cfquery functionality, but by coding the query as I have I do not need to run a second query (translation: second trip to the SQL server in totally separate statement) if I actually need an insert, I just let SQL handle all of that for me. And, yes, this isn't portable. It is written for MS SQL. The challenge is finding a way to do the same thing, in a single query block, on the other platforms. And if it isn't possible? Hey, MS SQL can't do 'Limit', but sometimes you have to use what you're given.
# Posted By Steve 'Cutter' Blades | 2/5/08 6:22 AM
dc's Gravatar Oracle has had this since 9i, although the terminology thy use is MERGE
http://www.psoug.org/reference/merge.html
# Posted By dc | 2/5/08 8:37 AM
Gareth's Gravatar I came across something similar to this looking at SQLite. It has an "ON CONFLICT", that allows you to do something when a conflict occurs. So, if you try to insert and a conflict occurs (e.g. you are trying to insert a record that already exists in the table), you can do "INSERT ON CONFLICT REPLACE" (but can be trimmed to INSERT OR REPLACE or even just REPLACE). Seems like something that should be a standard in SQL. I'm pretty sure that most programmers when trying to insert something, would be quite happy with it updating if the insert failed (especially due to a primary key constraint).
# Posted By Gareth | 2/5/08 8:38 AM
zac spitzer's Gravatar I do get your point (and that i went off in a side direction), but i'll take a step back and explain my thinking.

If you want to take db calls and network overhead into account. Your example is sending twice as many parameters for every query and around 2-4 times the query text. Plus you now have two insert statements in your DAO which just feels kinda wrong to me.

As your example is in a DAO, three lines of code would achieve things in a cross db manner, just after the closing CFQUERY tag add.

<cfif updQ.recordcount eq 0>
   <cfset insert(arguments)/>
</cfif>

It's actually effectively the same code isn't it? The 'challenge' for me is more often how to keep things really simple and expressive (maintainable)?

Your example looks like it's a bit different to basic form DAO CRUD, in those kind of situations, i usually pass in a pk of -1 to indicate a new record to be inserted and thus i can avoid that pesky old exists() query
# Posted By zac spitzer | 2/5/08 9:22 AM
Steve 'Cutter' Blades's Gravatar @Zac - Yes, I have old objects that operate in the same fashion. My bean will have a default on the primary key to -1, to which I'll place a conditional on it to see what might need to happen. Most of my objects have changed since using Illudium to auto-generate, which is why it's now written as above [save() - exists(), etc] That being said, by utilizing a save() method in this fashion I have one method, rather than four, and tap the SQL server once, rather than twice. It seems like a small thing, but in our environment it can be important. The single query may be slightly larger, but it's also conditionalized at the SQL server itself, which will determine what gets run and what does not within a single SQL request. And, it's less code within my DAO, which means less memory space being utilized when the object is instantiated.

To each his own, everyone must use what works best for them and their application.
# Posted By Steve 'Cutter' Blades | 2/5/08 10:11 AM
Nathan Miller's Gravatar This is great! I've used SQL forever and never thought about doing this, but it makes perfect sense. This technique really saves a lot of code writing.
# Posted By Nathan Miller | 2/5/08 1:55 PM
Mark Cadle's Gravatar Oracle doesn't really have anything like this. Merge is used for when you have to perform a combination of insert and update operations on a single table with respect to another table.
Therefore, you can't just pass arguments to it in a cfc as this demonstration is trying to show. You would have to make two DB calls.
# Posted By Mark Cadle | 2/5/08 10:34 PM
Julian Halliwell's Gravatar Steve, yes this is one of those little pearls waiting to be discovered that significantly simplify and improve your code. Rather than "upsert", though I use "insate", which MySQL supports with syntax that I don't think you could make any more concise:

INSERT INTO mytable (pkColumn,column1,column2) VALUES (1,'a','b')
ON DUPLICATE KEY UPDATE column1='a',column2='b'

That's it. No, really. Sheer beauty.
Julian.
# Posted By Julian Halliwell | 2/9/08 4:19 AM
Matt Williams's Gravatar Very cool Cutter. There are probably tons of things we could do with conditional SQL if we really thought about it.

One possibility here, though I'm not sure if there is a benefit...
To keep from sending the same parameter twice (in the case of a failed update resulting in a insert), you could Declare / Set each param in the SQL. It requires 2 additional lines per variable though. It'd be interesting to hear your DBA's recommendation on such:
DECLARE @tsLast datetime
SET @tsLast = <cfqueryparam value="..." cfsqltype="cf_sql_timestamp" />

Then in your update / inserts you just reference @tsLast.

It may be overkill in this situation, but I know of some nasty selects I've seen before that reference the same argument.myVar about 5 or six times. Seems declare and setting there would be more efficient, as opposed to having 5 sql params with the same value.
# Posted By Matt Williams | 2/9/08 10:05 AM
Werbeagentur's Gravatar Great! And many thanks for all the Informations guys....
# Posted By Werbeagentur | 2/27/08 3:54 PM
deepak's Gravatar best Online movie Webs : - http://www.tiptopmovie.com/
# Posted By deepak | 5/9/08 5:46 AM
BlogCFC v. 5.8.001 was created by Raymond Camden. Layout inspired by bluerobot.com., with some JQuery thrown in for fun.