PostgreSQL Vs. MySQL for INSERT IGNORE + REPLACE - stored procedures, savepoints and beyond
So lately I've doing a variety of development that uses PostgreSQL. It's come a long way since I last used it a couple of years ago, esp. in the realm of performance. The auto-complete in the psql tool is also a very nice touch, I can get it confused occasionally, but the auto-complete in the similar mysql tool is far less advanced.
There is however one thing that still bugs me, and it's turned up in plenty of other places. This is the issue of INSERT IGNORE/REPLACE that MySQL supports, but PostgreSQL doesn't. The widely documented way to handle these is to used a stored procedure instead, but that has a major shortcoming: it can only insert a single value at a time. This is a major performance limitation, and also suffers concurrency issues. INSERT IGNORE and REPLACE handle multi-valued inserted much more gracefully.
INSERT IGNORE is defined as inserting a data tuple, and if the primary key already exists, that singular tuple is not applied to the table. REPLACE is defined as insert a data tuple, and if the primary key already exists, using the remaining data in the tuple to perform an UPDATE statement.
If anybody has got other ideas that make handling REPLACE and INSERT IGNORE in PostgreSQL easier, I'd love to hear them.
MySQL version
Below is an example:
-- Create our table first. CREATE TABLE t ( i1 INT, i2 INT, s VARCHAR(64), PRIMARY KEY(i1,i2) ); -- Insert some data INSERT INTO t (i1,i2,s) VALUES (1,1,'foo'), (1,2,'bar'), (2,1,'zap'); -- INSERT IGNORE example INSERT IGNORE INTO t (i1,i2,s) VALUES (1,1,'skipped 1'), (2,2,'yes'), (2,1,'skipped 2'), (2,3,'yes again'); -- REPLACE example REPLACE INTO t (i1,i2,s) VALUES (1,1,'replaced 1'), (1,3,'totally new'), (1,2,'replaced 2');
That's enough example for now. The first logical way to solve this would seem to be BEGIN; INSERT INTO t (...); INSERT INTO (...); COMMIT; and ignore the duplicate errors, but unfortunately cause the transaction to rollback.
Stored procedures
The classical stored procedure approach to implementing the REPLACE equivalently, is below. If you wanted to implement INSERT IGNORE instead, you'd drop the UPDATE line.
CREATE OR REPLACE FUNCTION replace_into(INT,INT,VARCHAR(64)) RETURNS bool AS $$ DECLARE tmprow RECORD; f_i1 ALIAS FOR $1; f_i2 ALIAS FOR $2; f_s ALIAS FOR $3; BEGIN SELECT INTO tmprow * FROM t WHERE i1=f_i1 AND i2=f_i2; IF FOUND THEN UPDATE t SET s=f_s WHERE i1=f_i1 AND i2=f_i2; RETURN 'f'; ELSE INSERT INTO t (i1,i2,s) VALUES (f_i1,f_i2,f_s); RETURN 't'; END IF; END; $$ LANGUAGE plpgsql; -- now we use it as follows: SELECT replace_into(1,1,'replaced 1');
There are a couple of problems with this. If you want to insert multiple rows, this means that you have multiple transactions to perform - this may be a definite performance concern - N pairs of (SELECT, UPDATE or INSERT) must by definition perform worse than a single large INSERT that just leaves out a few of it's values. You also need to write the stored procedure for every table where you need to use INSERT IGNORE or REPLACE, which just sucks. Additionally, because multiple transactions are used, there is the danger of concurrency and race issues depending on the application that uses your data. If you have to ensure a set of rows is in the database, and one of which already exists, but the next 2 must only exist together, you are in serious trouble - PostgreSQL doesn't support nested transactions until version 8.0.
Save-points
If you are lucky enough to use PostgreSQL newer than 8.0, you get nested transactions, and, the UPDATE documentation contains this gem using them:
BEGIN; -- other operations SAVEPOINT sp1; INSERT INTO wines VALUES('Chateau Lafite 2003', '24'); -- Assume the above fails because of a unique key violation, -- so now we issue these commands: ROLLBACK TO sp1; UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003'; -- continue with other operations, and eventually COMMIT;
Using SAVEPOINT would make life easier, but we still have the performance issue of not being able to use a multi-row INSERT. It does have the advantage that you don't need to write custom stored procedures for each table, but you're still stuck with writing custom code for each REPLACE. SAVEPOINT works wonderful as an INSERT IGNORE statement from a programmatic point of view:
Firstly define your prepared INSERT statement to take a single row, with the classic ? in the values tuple. Start a transaction with a BEGIN, then for each row you wish to insert, take a save-point, run the prepared statement with the values, if you hit a key violation, issue rollback to the save-point, otherwise take the next save-point and continue.
So now we have INSERT IGNORE covered in a programmatic sense. You can implement the previous paragraph as a function taking the single-row insert, along with an array of arrays.
We are still stuck on the matter of the REPLACE statement however. Without introspecting our database (a-l´-Ruby-On-Rails), we cannot automatically identify the columns that form the unique or primary key(s). The ultimate objective here is to find a way that the beyond the initial keyword of REPLACE, the query can be handled in a transparent and efficient manner. Introspection would allow automatic creation of the SAVEPOINT+UPDATE above, but that might not be the most efficent way of doing it.
Temporary tables?
If we consider introspecting the database and the REPLACE query, there is a potential that we may gain performance. The following snippet is probably not valid PostgreSQL, but merely serves to illustrate the idea.
-- Create a temporary table. -- MySQL would do this in memory, but I don't know about PostgreSQL. CREATE TEMPORARY TABLE t_temp LIKE temp; -- Now using the exact original query, we can insert all of those values -- directly into the temporary table, as there will not be any conflicts. INSERT INTO t_temp (i1,i2,s) VALUES (...row1...),(...row2...); -- Now comes the magic -- First of all we handle the rows that do conflict. -- If you were doing INSERT IGNORE, you'd leave this line out. -- We need to know the keys that comprise the unique/primary key. UPDATE t SET (i1,i2,s) FROM (SELECT t_temp.* FROM t_temp JOIN t USING (i1,i2)); -- Now we handle the distinct rows that don't conflict INSERT INTO t SELECT t_temp.* FROM t_temp WHERE (i1,i2) NOT IN (SELECT i1,i2 FROM t);
The above requires that the query can be decomposed to identify the primary/unique keys, including those that might not be explicitly specified such as a SERIAL column. Depending on the size of the insert and the existing table, having good query optimizer/planner performance will be critically important.
Insert ignore.
(Anonymous) 2007-06-25 10:10 am (UTC)(link)Example:
create table foo ( id integer not null, dt integer not null, constraint foo_pkey primary key(id, dt));
create table bar ( id integer not null, dt integer not null, constraint bar_pkey primary key(id, dt));
insert into foo values(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2);
insert into bar values(1, 1), (1, 2), (3, 1), (3, 2);
insert into bar select id, dt from foo where (id, dt) not in (select id, dt from bar);
Re: Insert ignore.
Been searching for this as well
Re: Been searching for this as well
(Anonymous) 2007-08-14 09:03 am (UTC)(link)no subject
this may be an alternative
http://archives.postgresql.org/pgsql-general/2007-08/msg00702.php
Re: this may be an alternative
(Anonymous) 2007-08-14 05:04 pm (UTC)(link)1. You need to declare the RULE for every table seperately.
2. The RULE converts _every_ INSERT to a REPLACE (or an ignore). You cannot easily flip between normal inserts, insert-ignore and replace on the same table.
Re: this may be an alternative
(Anonymous) 2008-06-16 08:48 pm (UTC)(link)http://www.postgresql.org/docs/8.0/static/plpgsql-control-structures.html
search for "EXCEPTION WHEN unique_violation THEN"
Re: this may be an alternative
INSERT IGNORE solution
CREATE OR REPLACE RULE "insert_ignore" AS ON INSERT TO table WHERE EXISTS(SELECT true FROM table WHERE key = NEW.key) DO INSTEAD NOTHING;
This should be able preserve performance of multi-row INSERT IGNORE queries, however I haven't verified that, because my old version of Postgres doesn't support multi-row inserts at all :D
Re: INSERT IGNORE solution
Re: INSERT IGNORE solution
Re: INSERT IGNORE solution
Re: INSERT IGNORE solution
(Anonymous) 2009-01-21 01:42 am (UTC)(link)select set_myObject(src.f1, src.f2, src.f3)
FROM source_table src;
where set_myObject performs INSERT/IGNORE functionality and returns void (or a scalar value, say the key) and f1, f2, and f3 are fields from your select.
Postgres is much more powerful than you realize.
Re: INSERT IGNORE solution
(Anonymous) 2009-01-29 02:00 am (UTC)(link)2) in which case you would have to create separate set_myObject() for each table you want this functionality on.
I totally agree.
For more information on the COPY, and its drawbacks, check out the wiki.
http://wiki.postgresql.org/wiki/COPY
Re: I totally agree.
no subject
(Anonymous) 2010-01-28 06:09 am (UTC)(link)insert into (blah blah) WHERE NOT EXISTS...
no subject
"Using an INSERT .. SELECT .. FROM DUAL ... WHERE NOT EXISTS" is subject to the same limitation as the stored procedure example.
- special variant of every INSERT query.
- no multiple rows in a single insert
It also doesn't give us the ability to do REPLACE.
no subject
(Anonymous) 2010-03-16 09:14 pm (UTC)(link)create table foo(
bid int primary key,
rid int);
insert into foo (bid, rid) values (1,2);
insert into foo (bid, rid) values (2,3);
insert into foo
select bid, rid
from (values (1, 2),(2,3),(3,4)) as li(bid, rid)
where (bid, rid) not in (select bid, rid from cms.test);
no subject
no subject
(Anonymous) 2010-07-15 06:36 pm (UTC)(link)no subject
In the same vein, here is an alternate Postgres solution for REPLACE, using neither rules nor stored procedures nor any explicitly created temporary tables, and which supports multiple insert records:
Note that this "REPLACE" alternative differs slightly from the MySQL description in that the whole set of DELETEs is done in bulk at once, before any INSERT. This has the advantage that it avoids any possible collision on secondary "unique" constraints involving records in the set meant to be updated. For example, say that there is a non-primary-key unique-constraint on field "uf", and the REPLACE intends to swap the values of "uf" in the first two records. If the deleting/inserting is done one record at a time, then the change to the first record will fail, because once completed, it will leave both of the two first records with the same "uf" value. If, however, the DELETEs are done in bulk before the inserts, then both records #1 and #2 will be gone before any insert takes place, so the collision will not occur.