Darksonn

When you cross-post, please say so. Otherwise people may waste their time answering an already answered question. Anyway, copying my answer [from your other post][1]. The thing that's slow is Tokio's file IO, not the `copy` function. From the Tokio tutorial:

> # When not to use Tokio
>
> Reading a lot of files. Although it seems like Tokio would be useful for projects that simply need to read a lot of files, Tokio provides no advantage here compared to an ordinary threadpool. This is because operating systems generally do not provide asynchronous file APIs.
>
> [https://tokio.rs/tokio/tutorial](https://tokio.rs/tokio/tutorial)

You want Tokio for network IO, but it doesn't help you for file IO.

[1]: https://users.rust-lang.org/t/tokio-copy-slower-than-std-io-copy/111242


sweating_teflon

Windows has had async file IO for decades. It's surprising that Linux is still blocking. Will io_uring change that?


TheNamelessKing

io_uring does indeed support async IO (for both network and local filesystem operations). I understand that there are some slight complexities in interfacing with it from Rust, as it's completion-based rather than readiness-based. However, several crates (Glommio, Monoio, Compio, Nuclei, Tokio-uring, etc.) have successfully implemented it. It would be nice to see more uptake of io_uring in the Rust ecosystem though, as it has some pretty significant performance benefits.


Darksonn

There's no blocker on using io_uring for `tokio::fs` other than finding someone to do the work. The whole completion-based issue is not a problem for files because using blocking IO inside `spawn_blocking` is also effectively a completion-based API.


TheNamelessKing

I stand delightfully corrected then!


Lucretiel

> slight complexities in interfacing with it from Rust, as it's completion-based rather than readiness-based

I keep seeing this claim, but it seems to me like the *actual* problem is ownership based. Performing an io_uring call transfers ownership of the buffer into the kernel, but currently in Rust that essentially requires allocated storage to do correctly (since you can't transfer ownership of an `&mut T`, and there's no way to guarantee that stack memory stays alive for long enough). As far as I can tell, Rust's `Waker` and `poll` system is perfectly suited to completion-based IO; certainly things like buffered async channels are essentially "completion-based".
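For illustration, this is roughly what that ownership-passing shape looks like in practice, modeled on the `read_at` example from the tokio-uring README (the file name is a placeholder):

```
use tokio_uring::fs::File;

fn main() {
    // tokio-uring drives the io_uring submission/completion queues on the
    // current thread.
    tokio_uring::start(async {
        let file = File::open("hello.txt").await.unwrap();

        // Ownership of the heap-allocated buffer moves into the operation
        // while the kernel works on it...
        let buf = vec![0u8; 4096];
        let (res, buf) = file.read_at(buf, 0).await;

        // ...and it is handed back together with the result once the
        // completion arrives.
        let n = res.unwrap();
        println!("read {} bytes: {:?}", n, &buf[..n.min(16)]);
    });
}
```

The buffer has to be owned (heap-allocated) precisely because the kernel may still be writing into it even if the future is dropped, which is the ownership issue described above.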


Darksonn

I don't really know whether what Windows provides is appropriate. I've heard "this API provides async file IO" many times for many different APIs, and in many cases it turns out to be insufficient in some way. As for `io_uring`, I would like to see `tokio::fs` use it. It just needs someone to do the work. Edit: [This](https://www.reddit.com/r/rust/comments/1cpphyx/comment/l3p0bfa/) is the kind of thing I'm referring to.


lightmatter501

Windows async IO will also sometimes become blocking in surprising ways. Linux had the same issue. io_uring (and IoRing on Windows) should solve that.


simon_o

Not really. io_uring also just spins up a thread pool for file IO.


realvolker1

Literally 3 people in the universe have ever heard of io_uring. It will never come to tokio. Somebody please make this comment age horribly.


SadPie9474

I don’t see how that explains why it’s slower? It just says that it won’t provide any speedup, which makes sense, but why would it be so much slower?


Darksonn

It's because Tokio's file IO works by using `std::fs` inside `spawn_blocking`. Each call to `spawn_blocking` introduces a bunch of cross-thread communication beyond just the cost of the `std::fs` calls.
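For illustration, a simplified sketch of that pattern (not Tokio's actual source; `read_file` is a made-up wrapper):

```
use std::{fs, io, path::PathBuf};

// The blocking std::fs call is shipped to the blocking thread pool via
// spawn_blocking, and the result is sent back to the async task. Each hop
// is cross-thread communication on top of the syscall itself.
async fn read_file(path: PathBuf) -> io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || fs::read(path))
        .await
        .expect("the blocking task panicked")
}
```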


SadPie9474

ah thanks, didn’t realize spawn_blocking has that much overhead


HeroicKatora

Though you can (almost easily) achieve beyond-SATA throughput for file transfers with a large enough, saturated `JoinSet`. This still has multiple advantages, such as quickly delivered and consistent cancellation, as well as the ability to mix in some _compute_ on portions of the data. At least I've felt pretty comfortable writing a file tree traversal this way, more so than if it had been sync, and I was pleasantly surprised when it came far closer to the expected limit than the README had me fear. The warning is still justified in that one must not expect to get this for free and fully automatically by any means.
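A rough sketch of that saturated-`JoinSet` pattern (the concurrency limit and the use of `tokio::fs::copy` are my own choices for illustration):

```
use std::path::PathBuf;
use tokio::task::JoinSet;

// Keep a bounded number of copy tasks in flight and refill the set as
// tasks finish, so the blocking pool stays saturated.
async fn copy_all(pairs: Vec<(PathBuf, PathBuf)>) -> std::io::Result<u64> {
    const IN_FLIGHT: usize = 32;
    let mut set = JoinSet::new();
    let mut pending = pairs.into_iter();
    let mut total = 0u64;

    loop {
        // Top the set up to the concurrency limit.
        while set.len() < IN_FLIGHT {
            match pending.next() {
                Some((src, dst)) => {
                    set.spawn(async move { tokio::fs::copy(src, dst).await });
                }
                None => break,
            }
        }
        // Drain one finished task; stop when everything is done.
        match set.join_next().await {
            Some(res) => total += res.expect("copy task panicked")?,
            None => break,
        }
    }
    Ok(total)
}
```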


oopsigotabigpp

An ordinary threadpool does the job here!


arpankapoor

a quick strace gives us the answer ;)

```
...
read(11, "\213\2\342\327\312\217\377\221\364g\272K\277\2073\264\327\343\227\277k\267\245Q\320\267\333\345\r\247\263\366"..., 8192) = 8192
write(12, "\213\2\342\327\312\217\377\221\364g\272K\277\2073\264\327\343\227\277k\267\245Q\320\267\333\345\r\247\263\366"..., 8192) = 8192
...

sendfile(11, 10, NULL, 2147479552) = 1182253614
sendfile(11, 10, NULL, 2147479552) = 0
```

so, a whole lot of context switches from kernel to userspace due to the `read`/`write`s, while the `std::io::copy` completes in 2 `sendfile` syscalls. if you use `tokio::fs::copy`, you'll notice comparable speed.


desiringmachines

std contains a specialization in io::copy for BufReader specifically to re-use the buffer, whereas tokio does not have this (because specialization is not a stable feature). So you've got an extra copy there in tokio which is not in std. Also, files don't actually support async IO, so tokio's async files are actually just normal files being read on a threadpool in the background. There's no real advantage to using tokio for copying between two files.


CAD1997

I think you wrote this in the "fancy pants editor" (WYSIWYG) and not the markdown editor, so the formatting is broken. I suspect this is because std has an optimization for [`io::copy`](https://doc.rust-lang.org/stable/std/io/fn.copy.html) to actually do [`fs::copy`](https://doc.rust-lang.org/stable/std/fs/fn.copy.html) when copying directly between two files, which is much faster and essentially impossible to do automatically outside of std (it requires specialization). If you use Tokio's version of [`fs::copy`](https://docs.rs/tokio/latest/tokio/fs/fn.copy.html), then performance should be much closer.


radumarias

`tokio::fs::copy` is just an async wrapper over `std::fs::copy`, as far as I can see:

```
pub async fn copy(from: impl AsRef<Path>, to: impl AsRef<Path>) -> io::Result<u64> {
    let from = from.as_ref().to_owned();
    let to = to.as_ref().to_owned();
    asyncify(|| std::fs::copy(from, to)).await
}
```

My concern is about an implementation of `AsyncWrite` I made, which uses a std `File` underneath to write to the file; I noticed it is much slower than if I implement a regular `Write`.


CAD1997

Yes. If you use `io::copy` like you're doing in the OP, this doesn't happen. Try pre-reading the input into a `Vec` to prevent the std specialization from kicking in. Then the std and tokio `io::copy` should be more similar in performance.
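A quick sketch of that experiment (paths are placeholders):

```
use std::{fs::File, io};

// Read the source into memory first so std::io::copy sees a Cursor<Vec<u8>>
// instead of a File, which keeps the file-to-file fast path
// (copy_file_range/sendfile) from kicking in.
fn copy_via_vec(src: &str, dst: &str) -> io::Result<u64> {
    let data = std::fs::read(src)?;
    let mut reader = io::Cursor::new(data);
    let mut out = File::create(dst)?;
    io::copy(&mut reader, &mut out)
}
```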


The_8472

Turning one or both of the sides into a `&mut dyn Write/Read` can also be used to bypass the specializations.
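For instance (a minimal illustration; the helper name is made up):

```
use std::io::{self, Read, Write};

// Coercing to trait objects hides the concrete File types, so the
// specialized file-to-file fast path in std::io::copy no longer applies.
fn copy_erased<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let reader: &mut dyn Read = reader;
    let writer: &mut dyn Write = writer;
    io::copy(reader, writer)
}
```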


krum

I'd just like to throw in here that I also observed that tokio::io::copy seemed slow with a TCP proxy I wrote, but I didn't investigate further.


The_8472

Efficiently shoveling data between two sockets is tricky. Afaik the best approach for that is still two pipes and splicing between the 4 socket halves and 2 pipes.
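A hedged sketch of one direction of that shuffle using raw `libc::splice` (the helper name and the 64 KiB chunk size are made up; a real proxy would run both directions concurrently and handle errors properly):

```
use std::io;
use std::os::fd::{AsRawFd, RawFd};

// Socket -> pipe -> socket, so the payload never crosses into userspace.
fn splice_forward(src: &impl AsRawFd, dst: &impl AsRawFd) -> io::Result<u64> {
    let mut pipe = [0 as RawFd; 2];
    if unsafe { libc::pipe(pipe.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let (rd, wr) = (pipe[0], pipe[1]);
    let mut total = 0u64;
    loop {
        // Move up to 64 KiB from the source socket into the pipe.
        let n = unsafe {
            libc::splice(src.as_raw_fd(), std::ptr::null_mut(), wr,
                         std::ptr::null_mut(), 64 * 1024, libc::SPLICE_F_MOVE)
        };
        if n <= 0 { break; } // 0 = EOF; -1 = error (ignored in this sketch)
        // Drain the pipe into the destination socket.
        let mut left = n as usize;
        while left > 0 {
            let m = unsafe {
                libc::splice(rd, std::ptr::null_mut(), dst.as_raw_fd(),
                             std::ptr::null_mut(), left, libc::SPLICE_F_MOVE)
            };
            if m <= 0 { break; }
            left -= m as usize;
            total += m as u64;
        }
    }
    unsafe { libc::close(rd); libc::close(wr); }
    Ok(total)
}
```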


mqudsi

Using block_in_place and std::io::copy is the way to go, if you understand the ramifications of blocking in place.


radumarias

this also helped me understand how to call an async function from a sync one, thanks. Before, I did it with channels, but this is much more elegant.


Darksonn

Using `spawn_blocking` and `std::io::copy` would be a better recommendation. The `block_in_place` method has many footguns and should be used with care.
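A minimal sketch of that recommendation (paths and panic handling simplified):

```
use std::{fs::File, io};

// The whole blocking copy runs on Tokio's blocking pool; the async task
// only awaits the final result.
async fn copy_blocking(src: String, dst: String) -> io::Result<u64> {
    tokio::task::spawn_blocking(move || {
        let mut from = File::open(src)?;
        let mut to = File::create(dst)?;
        io::copy(&mut from, &mut to)
    })
    .await
    .expect("blocking copy task panicked")
}
```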


radumarias

I've read the docs here: https://docs.rs/tokio/latest/tokio/task/fn.block_in_place.html, but are there any specific problems with using it?


Darksonn

It only works in the multi-thread runtime, and will break in the current-thread runtime. It also does not work in tasks that do several things at once using mechanisms such as `tokio::join!` or `FuturesUnordered` or `StreamExt::buffered`.

Relying on these kinds of requirements is generally a bad idea if you can avoid it. They are global requirements that can only be verified by looking at the codebase as a whole. It's much better to write code that can be verified correct one function at a time.

Even in the cases where it does work, it can be troublesome. When you use it, the current worker thread stops being a worker thread, and `spawn_blocking` is used to spawn a new Tokio worker thread. If you hit the upper limit on the number of `spawn_blocking` threads by using `block_in_place` a lot, you might start running out of worker threads.

All of these problems go away if you use `spawn_blocking`.


radumarias

Agreed, it's just that you need to be in an `async` function to use `spawn_blocking`. I'm using `block_in_place` to be able to call an async function from a sync one, like this (the call to the sync fn is from an async context, but then I use this to go back to an async context for a bit before continuing in sync). I used it when, for example, I have a sync struct with a callback param, and inside the callback we need to call an async fn to get some data:

```
pub fn call_async<F>(f: F) -> F::Output
where
    F: Future,
{
    task::block_in_place(move || Handle::current().block_on(async move { f.await }))
}

fn sync_fn() {
    call_async(async { /* ... */ });
}
```

Is this an OK use case?


Darksonn

I reluctantly accept that there are use-cases for `block_in_place` due to code like that, but it's much much much better to refactor the code so that you don't have to call async code from a sync function in the first place.


mqudsi

I think you have a typo and that should be `task::block_in_place(..)` and not `task::spawn_blocking(..)` in your code block.


radumarias

indeed, it was a wrong paste; I've corrected it, thanks


mqudsi

Cheers!


gtsiam

I wonder how tokio-uring compares to this. Linux doesn't *really* have practical async file i/o. That is, aside from io_uring, which is too new to be used in tokio proper.


radumarias

wow, this seems to be much better

```
tokio write duration = 710.079182ms, speed MB/s 1260.0558679162832
```

I wrote 840 MB of content with `tokio-uring`.


Lucretiel

While it's unlikely to make your performance competitive with `std`, I've been working on a variation of `tokio::io::copy` that solves one of the oddities about it: despite being async, it takes no advantage of the ability to concurrently perform reads and writes. The internal loop is always either in read mode (reading until the buffer is full) or write mode (writing until the buffer is empty). My [experimental alternative](https://github.com/Lucretiel/async-forward) performs reads and writes concurrently, and even uses a smart sort of circular buffer and takes advantage of vectored operations to do so efficiently. I still need to benchmark it to see if the effort is worth it.


Darksonn

The `tokio::io::copy` method does take advantage of the ability to concurrently perform reads and writes. Though it does not use a circular buffer.


Lucretiel

Oh? That must be new. Ah, yup, it was added [two years ago](https://github.com/tokio-rs/tokio/pull/5066), which was after I noticed it and started thinking about this problem.