Speeding Up Bulk SQLite Inserts [Apr. 8th, 2011|10:29 pm]
[shlomif posting to shlomif_tech]
[Current Location |Home]
[Current Music |Betty Everett - You're No Good]

This is a tip on how to speed up bulk inserts in SQLite. In case you don't know, SQLite is an SQL database implemented as a small, yet very fast, library, which accesses everything from the filesystem. Furthermore, it is open source, public domain, and reliable.
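
(For reference, here is a minimal sketch of opening such a database from Perl with DBI and DBD::SQLite - the file name here is made up:)

use DBI;

my $dbh = DBI->connect(
    "dbi:SQLite:dbname=quotes.sqlite", "", "",
    { RaiseError => 1 },
);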

Recently, I wanted to create a server-side script to individually display my fortune cookie quotes - each on its own page based on its ID. In order to speed up the lookup, I decided to store everything in a database, be it the raw file-system, something like Berkeley DB or Tokyo Cabinet, or an SQL-based database. I ended up using SQLite, which worked pretty well.
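
The table can be created along these lines (a hypothetical sketch - the real schema may differ, but the columns match the INSERT statement below):

$dbh->do(<<'EOF');
CREATE TABLE fortune_cookies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    collection_id INTEGER NOT NULL,
    str_id TEXT NOT NULL,
    title TEXT,
    text TEXT NOT NULL
)
EOF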

I wrote a program to populate the database by parsing the XML source files and extracting the data from there. I prepared an INSERT statement:

my $insert_sth = $dbh->prepare(<<'EOF');
INSERT INTO fortune_cookies (collection_id, str_id, title, text)
VALUES(?, ?, ?, ?)
EOF

(It started a bit simpler, but that's what I have now.)
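
And then I executed it for each record, roughly like this (a sketch - the @records array and its field names are assumptions about how my code is structured):

foreach my $rec (@records)
{
    # With AutoCommit on (the DBI default), every execute() here
    # becomes its own transaction.
    $insert_sth->execute(
        $rec->{collection_id}, $rec->{str_id},
        $rec->{title}, $rec->{text},
    );
}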

It took a lot of time to complete - many minutes. So I consulted #perl about it and got a few suggestions:

  1. That I should disable the so-called “shared database” mode, which is the default SQLite mode that assumes other processes on the machine might access the database.

  2. That I should turn off AutoCommit and commit in batches, with more than one record in each transaction.

The second option seemed the most promising, and I tried committing in batches of 50 records each. It made the populating program finish in a few seconds. So I was very happy.
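
In DBI terms, the batched version looks roughly like this (again a sketch, building on the loop above):

$dbh->{AutoCommit} = 0;

my $count = 0;
foreach my $rec (@records)
{
    $insert_sth->execute(
        $rec->{collection_id}, $rec->{str_id},
        $rec->{title}, $rec->{text},
    );

    # Commit every 50 records, so each transaction carries a whole batch.
    $dbh->commit() if (++$count % 50 == 0);
}
$dbh->commit();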

I guess I can try disabling the shared database mode later on, since I populate the database offline anyway. Now the program for displaying every quote individually is working nicely. Thanks to the #perl people for helping me again.
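
(If I do try suggestion #1 later, my understanding - and this is an assumption on my part about what was meant - is that it maps to SQLite's locking_mode pragma:

$dbh->do("PRAGMA locking_mode = EXCLUSIVE;");

which makes the connection keep holding its file lock instead of re-acquiring it for every transaction.)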


Comments:
From: (Anonymous)
2011-04-09 07:58 pm (UTC)
I've used and tuned SQLite 3 many, many times.

The best thing to do is the following (pseudocode):

statement.prepare("insert into table (...) values (?, ?, ..., ?)");

sql_connection.exec("begin");

foreach row begin
    statement.bind(1, first_param);
    statement.bind(2, second_param);
    ...
    statement.bind(n, last_param);
    statement.exec();
end

sql_connection.exec("commit");

The reason this is fast is that every transaction ends with an fsync(2), to ensure the D (durability) of ACID (see man fsync), and that is a very heavy system call.

So once you do it all in a single transaction, it is very, very fast.


Artyom
From: shlomif
2011-04-13 11:28 am (UTC)

Right


Yes, one can use a single transaction for that, as I've shown. I should note that the stmt.bind([pos], [value]) dance is much less tedious in Perl and similar languages, which have variable-length argument lists: what I normally do is $sth->execute($param1, $param2, $param3). One can also use named parameters, but I have not delved into them.
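
For completeness, here is roughly how the named-parameter style would look with DBD::SQLite, going by its documentation (I have not tested this myself):

my $sth = $dbh->prepare(<<'EOF');
INSERT INTO fortune_cookies (collection_id, str_id, title, text)
VALUES (:collection_id, :str_id, :title, :text)
EOF

$sth->bind_param(":collection_id", $collection_id);
$sth->bind_param(":str_id", $str_id);
$sth->bind_param(":title", $title);
$sth->bind_param(":text", $text);
$sth->execute();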




From: (Anonymous)
2011-04-09 09:41 pm (UTC)

ORLite

package MyDB;

use ORLite 'my/database.sqlite';

1;

#######################################

my @records = ...;

my $i = 0;
MyDB->begin;
while ( my $cookie = shift @records ) {
    MyDB::FortuneCookies->create(%$cookie);
    MyDB->commit_begin unless ++$i % 50;
}
MyDB->commit;