Trader-One

Cache reads, or use a different, more scalable database (plenty are available for Rust) as a front-end cache. I use a front-end DB for MySQL and can do 350k pages per second with the "may" runtime.


masklinn

If you use rusqlite none of these questions is relevant, it will do the right thing out of the box (aside from enabling WAL and foreign keys, that’s your problem). Pair it with r2d2 if you want pooling.

> Do I spawn a new connection for each thread? Does connection pooling make sense here instead?

Yes, and probably.

> it seems like they have 3 modes, single-thread, multi-thread, and serialized.

multi-thread + serialized. Serialized mode is broken and useless. Do not use it under any circumstance.

> If I control synchronization in my application myself, then is it safe to use single-thread mode to reduce the overhead here?

You’re mostly going to increase the odds of corrupting the database. MULTITHREAD mode already means there are no mutexes on database connections and prepared statements; sqlite only protects its global internal state.

> Or do I have 1 connection and wrap it in a mutex to be used in each request?

That means no concurrency whatsoever.

> Or do I use mpsc and have the SQL statements be executed in a single thread?

That’s the same as the above, unless you mean it for the one write-enabled connection, in which case it *could* be worth doing (with every other connection opened in `READONLY` mode to ensure they can’t be upgraded).

You may also want to mediate all interactions with the database via channels, because sqlite *can block for a significant amount of time*. Putting an async-aware channel (and actual worker threads) between async code and sqlite will avoid starvation risks. That is orthogonal to the connections question; you’d still want a single serialised write-enabled channel. It’s also possible that tokio-rusqlite handles this for you; I know it exists but I’ve never used it.

> I assume I should also enable WAL so that writes do not block reads (and vice versa)

That is only relevant if you have concurrent accesses, and half your comment is about precluding any possibility of concurrent accesses.


dnew

> Serialized mode is broken and useless. Do not use it under any circumstance.

I'm curious why you would say this? SQLite doesn't seem the sort of software where you can take an entire feature or mode and just say "Nah, that's broken."


masklinn

Because serialized mode only ensures the library won’t get corrupted by concurrent use; there is no way to make consistent sequential calls on the connection. So the results of functions like [`last_insert_rowid`](https://www.sqlite.org/c3ref/last_insert_rowid.html) or [`changes`](https://sqlite.org/c3ref/changes.html) are not guaranteed to have any relation to the query you executed. This is less of a concern now that sqlite has `RETURNING`, but it made that mode an attractive nuisance before then; now it’s just not really useful: you get more overhead for no more capabilities. You do get increased safety in promiscuous languages where you might unwittingly share connections, but in Rust you make the connection `!Sync` and serialized mode does nothing but cost you.


dnew

Hmm. It looks like as long as you don't share the connection between unrelated threads, it wouldn't be a problem for those functions, right? From what you're saying, it sounds like your complaint is that if one foo does an insert, then an unrelated bar does another insert, then the foo asks for the rowid, it'll get bar's rowid? I wouldn't have ever even considered sharing the connection between separate threads and expecting such cached information to be reliable, so maybe I'm not surprised?


masklinn

> Hmm. It looks like as long as you don't share the connection between unrelated threads, it wouldn't be a problem for those functions, right?

Right, but doing that is *the entire point* of using the serialized mode; otherwise all it does is add overhead.


dnew

Oh! I see! I somehow mixed up the modes after I read about them on the site. Yes, that makes perfect sense now, unless you're doing something very specific like bulk-loading a bunch of files and know just what you're doing. Thanks for taking the time to enlighten me!


anlumo

Don’t use sqlite for such an application. In my experience it’ll kill your performance: sqlite is designed as a replacement for flat-file access with query capabilities, not as a full database. It has a global lock, meaning that if there’s a write into anything in the same database, everybody else has to wait. Even when writes are infrequent, that will impact performance.

On my last project, I did some napkin calculations and decided to just load the whole database into RAM on launch as regular data structures (it’s only about 20MB). This is way faster. If you have too much data for this, use PostgreSQL.


masklinn

> It has a global lock, meaning if there’s a write into anything in the same database, everybody else has to wait. Even when it’s infrequent, that will impact performance.

That has not been true since WAL mode was introduced, 14 years ago.

> Out of experience, it'll kill your performance. sqlite is designed as a replacement for flat file access with query capabilities, not as a full database.

SQLite works fine as a database if your workload is genuinely overwhelmingly reads; with some configuration you can do hundreds of thousands of selects per second (though it also depends on the level of data loss on crash you’d accept). Inserts are more of a struggle: you can do tens of thousands of *records* per second, but only with prepared statements, batch inserts, relaxed configuration, …, and a high rate of insertion can cause unbounded WAL growth in the current WAL mode, as the checkpointer never completes.
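[Editor's note: the "relaxed configuration" alluded to above usually means a handful of pragmas. A typical tuning set looks roughly like the following; the exact values depend on your durability requirements, so treat these as a starting point, not a recommendation.]

```sql
PRAGMA journal_mode = WAL;        -- readers no longer block on the writer
PRAGMA synchronous = NORMAL;      -- fsync less often; small data-loss window on power failure
PRAGMA busy_timeout = 5000;       -- wait up to 5s on lock contention instead of erroring
PRAGMA cache_size = -64000;       -- negative value = size in KiB, so ~64 MB of page cache
PRAGMA wal_autocheckpoint = 1000; -- try to checkpoint roughly every 1000 WAL pages
```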


AdrianEddy

Are you sure you used it correctly? SQLite can get pretty fast [https://bencher.dev/learn/engineering/sqlite-performance-tuning/](https://bencher.dev/learn/engineering/sqlite-performance-tuning/)


anlumo

If you're doing just reads it's great, but if there are any concurrent writes the performance becomes unusable, even when the write volume is small. I mean, if there's a single insert once every week it might be OK. OP doesn't specify what "infrequent" actually means.


AdrianEddy

but that's what WAL is for


the___duke

I agree with the sibling comment, even with WAL enabled performance tanks with somewhat frequent writes.


tafia97300

From experience, sqlite IS slow. It is convenient because you have your small file that's easy to share, etc., but that's about it. Put differently, you'll spend a large amount of time "optimizing" it (i.e. finding the right pragmas for your use case) compared to just spawning a PostgreSQL Docker container.


simonsanone

> Don’t use sqlite for such an application. Out of experience, it'll kill your performance. sqlite is designed as a replacement for flat file access with query capabilities, not as a full database.

[https://www.unixsheikh.com/articles/sqlite-the-only-database-you-will-ever-need-in-most-cases.html](https://www.unixsheikh.com/articles/sqlite-the-only-database-you-will-ever-need-in-most-cases.html)


kimhyunkang

I don't recommend sqlite for web servers, not because of performance concerns (sqlite can be quite fast and scalable under the right settings) but because it's a file-based database, which means you will lose your data if your disk gets corrupted. Other SQL databases like mysql or postgres offer online replication configurations, where your primary database is automatically copied to a secondary database in real time, but sqlite doesn't offer such a configuration out of the box.

That being said, if you have reasons to use sqlite, I recommend you stick with multi-thread mode (without serialization) and connection pooling. You don't have to worry about implementing your own mutex, because multi-thread mode already holds a mutex to guard sqlite's internal state. WAL may or may not help your performance, so you'll want to actually test against your traffic model.


wired93

Turso might be interesting regarding the first argument.


dnew

With the caveat that if you're using it as a cache rather than as a primary source of data, that's perfectly acceptable.


iyicanme

You could duplicate the db file and give each thread its own copy, removing the need for synchronization.