I like Node.js' simple and fully isolated concurrency model. You shouldn't be blocking the main event loop for 30 seconds! The main event loop is not intended to be used for heavy processing.
You can just set up a separate child process for that. The main event loop, which handles connections, should just coordinate and delegate work to other programs and processes. It can await their completion asynchronously; that way the event loop is not blocked.
I recall people have been able to get up to around a million (idle) WebSocket connections handled by a single process.
I was able to comfortably get 20k concurrent sockets per process each churning out 1 outbound message every 3 to 5 seconds (randomized to spread out the load).
It is a good thing that Node.js forces developers to think about this, because most other engines that try to hide this complexity tend to impose a significant hidden cost on the server in the form of context switching. With Node.js there is no such cost: your process can basically have a whole CPU core to itself, and it can orchestrate other processes in a maximally efficient way if you write your code correctly, which Node.js makes very easy to do. Spawning child processes and communicating with them in Node.js is a breeze.
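For anyone who wants to see roughly what that delegation looks like, here is a minimal sketch. The naive Fibonacci job and the names `childSource` and `fibInChild` are made up for illustration; the child script is passed inline with `node -e` only so the example fits in one file.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical CPU-heavy job: naive Fibonacci, evaluated in a separate Node process.
// The input is read from the last command-line argument.
const childSource = `
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  const n = Number(process.argv[process.argv.length - 1]);
  process.stdout.write(String(fib(n)));
`;

// Runs fib(n) in a child process and resolves with the numeric result.
// The parent's event loop stays free while the child is computing.
function fibInChild(n) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', childSource, String(n)], (err, stdout) => {
      if (err) reject(err);
      else resolve(Number(stdout));
    });
  });
}

async function main() {
  // This timer keeps firing while the child works, showing the loop is not blocked.
  const timer = setInterval(() => console.log('event loop still responsive'), 50);
  try {
    console.log('fib(30) =', await fibInChild(30));
  } finally {
    clearInterval(timer);
  }
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

In a real app the child would more likely be its own script (or a non-Node program), and you would want a cap on how many children run at once.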
Reading the article, I didn’t see this answered: why not scale to more nodes if your workload is CPU-bound? Spin up a container with 1 CPU and a few GB of RAM and scale that as wide as you need?
E.g., this certainly helps when the event loop is blocked, but so could FFI calls to another language for the CPU-bound work. I’d only reach for a new Node thread if these didn’t pan out, because there’s usually a LOT that goes into spinning up a new Node process in a container (isolating the data, making sure any bundlers and transpilers are working, making sure the worker doesn’t pull in all the app code, etc.).
Sidecar processes aren’t free, either. Now your processes are contending for the same pool of resources and can’t share anything, which IME means more likelihood of memory issues, especially if there isn’t anything limiting the workers your app can spawn.
Still, good article! Love seeing the ways people tackle CPU-bound workloads in an otherwise I/O-bound Node app.
Related tangent: Platformatic's "Watt" server^1 takes a pretty interesting approach to Node, leveraging worker threads on all available cores for maximum efficiency.
I get it's a constraint of the language, but the ubiquity of bundlers and differing toolchains in the JS world has always made me regret trying to use worker primitives, whether web workers, worker threads or others. Not to mention that shipping them to users via a library is a nightmare, as mentioned in the article.
Almost none of them treat these consistently (if they consider these at all) and all require you to work around them in strange ways.
It feels like there is a lot they could help with in the web world, especially in complex UI and moving computation off the main thread but they are just so clunky to use that almost nobody tries to work around it.
The ironic part is if bundlers, transpilers, compilers etc. weren't used at all they would probably have much more widespread use.
I'm currently writing simulations of trading algorithms for my own use.
I'm using worker_threads + SharedArrayBuffer and running them in Bun. I also tried porting the code to C# and Go, but the execution time ended up being very similar to the Bun version. Node.js was slower.
Only C gave a clear, noticeable performance advantage — but since I haven't written C in a long time, the code became significantly harder to maintain.
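For readers who haven't used that combination, a rough sketch of the worker_threads + SharedArrayBuffer pattern follows. This is not the commenter's code: the "simulation" is a placeholder loop, and `workerSource` and `runSimulations` are hypothetical names. It is written against Node's `worker_threads` API; I haven't verified it under Bun.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body as a string so the sketch fits in one file.
// A real project would normally point Worker at a separate file instead.
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  const { shared, index, iterations } = workerData;
  const results = new Float64Array(shared);
  // Placeholder "simulation": sum 1..iterations, scaled by the worker's index.
  let total = 0;
  for (let i = 1; i <= iterations; i++) total += i * (index + 1);
  // Each worker writes only to its own slot, so no locking is needed here.
  results[index] = total;
  parentPort.postMessage('done');
`;

// Starts workerCount workers that all write into one shared buffer,
// then resolves with the results once every worker has reported back.
function runSimulations(workerCount, iterations) {
  const shared = new SharedArrayBuffer(workerCount * Float64Array.BYTES_PER_ELEMENT);
  const results = new Float64Array(shared);
  const jobs = Array.from({ length: workerCount }, (_, index) =>
    new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, index, iterations }, // the buffer is shared, not copied
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    })
  );
  return Promise.all(jobs).then(() => Array.from(results));
}

runSimulations(4, 1000)
  .then((results) => console.log(results))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

If several workers had to update the same slot, you would reach for `Atomics` on an integer typed array rather than plain writes.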
I love the simplicity of Node.js: each process or child process can have its own CPU core with essentially no context switching (assuming you have enough CPU cores).
Most other ways are just hiding the context switching costs and complicating monitoring IMO.
I went through a similar journey trying worker threads for CPU-bound work in Node. The serialization cost of passing data between threads ate most of my gains, especially with larger inputs. Ended up going the napi-rs route instead — Rust addon running in the main thread with near-zero FFI overhead. Different tradeoff since you lose the parallelism, but for my workload the raw speed was already enough.
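One thing that can reduce the copying cost, if the input already lives in a typed array, is passing the underlying `ArrayBuffer` in the transfer list of `postMessage` so ownership moves to the worker instead of the data being cloned. It does nothing for plain objects, so it may not have applied to the workload above. A minimal sketch, with `workerSource` and `sumInWorker` as made-up names:

```javascript
const { Worker } = require('node:worker_threads');

// Worker body as a string so the sketch fits in one file.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.once('message', (buffer) => {
    const values = new Float64Array(buffer);
    let sum = 0;
    for (const v of values) sum += v;
    parentPort.postMessage(sum);
  });
`;

// Sums a Float64Array in a worker. The array's buffer is transferred, not copied,
// so the caller's array is detached (length 0) as soon as this function returns.
function sumInWorker(values) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (sum) => {
      resolve(sum);
      worker.terminate();
    });
    worker.once('error', reject);
    // The second argument is the transfer list: ownership moves to the worker.
    worker.postMessage(values.buffer, [values.buffer]);
  });
}

const input = Float64Array.from({ length: 1000 }, (_, i) => i);
sumInWorker(input)
  .then((sum) => console.log('sum:', sum, 'sender bytes left:', input.byteLength))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

The tradeoff is that the sender can no longer use the data, so this suits hand-off pipelines better than cases where both sides need the same input.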
Worker threads can be more convenient than FFI, as you don't need to compile anything, you can reuse the main application's functions, etc.
1. https://docs.platformatic.dev/docs/overview/architecture-ove...
And you can make it thread-like if you prefer by creating a “load balancer” setup to begin with to keep them CPU bound.
Spawn a process for each CPU, bind data you need, and it can feel like multithreading from your perspective. More here: https://github.com/bennyschmidt/simple-node-multiprocess
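To make the idea concrete, here is a rough sketch of a process-per-core pool with round-robin dispatch over IPC. It is not taken from the linked repo; `createPool`, `childSource` and the summing job are all invented for illustration, and the child script is inlined with `node -e` just to keep it to one file.

```javascript
const { spawn } = require('node:child_process');
const os = require('node:os');

// Child body: waits for jobs over the IPC channel and replies with the result.
// The job itself (sum 1..n) is a placeholder for real CPU-bound work.
const childSource = `
  process.on('message', ({ id, n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    process.send({ id, sum });
  });
`;

// Creates one child process per core (by default) and hands jobs out round-robin.
function createPool(size = os.cpus().length || 1) {
  const pending = new Map(); // job id -> resolve callback
  let nextId = 0;
  let nextChild = 0;

  const children = Array.from({ length: size }, () => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'], // 'ipc' enables send()/'message'
    });
    child.on('message', ({ id, sum }) => {
      pending.get(id)(sum);
      pending.delete(id);
    });
    return child;
  });

  return {
    // Sends a job to the next child in the rotation and resolves with its answer.
    run(n) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        children[nextChild].send({ id, n });
        nextChild = (nextChild + 1) % size;
      });
    },
    // Stops all children so the parent process can exit.
    close() {
      for (const child of children) child.kill();
    },
  };
}

const pool = createPool();
Promise.all([pool.run(10), pool.run(100), pool.run(1000)])
  .then((sums) => console.log(sums))
  .finally(() => pool.close());
```

This skips error handling and restart-on-crash, which a real setup would need; Node's built-in `cluster` module covers the similar case of sharing a listening socket across processes.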