Hosting a Shadowsocks proxy server is usually an easy task: you install it and forget about it. Because it is a lightweight proxy, Shadowsocks is usually I/O-bound: to achieve the highest throughput, you need a faster network link, not a faster CPU.
However, this is not always the case. Some VPS providers focus only on premium connections and bandwidth, and completely ignore CPU and RAM performance to save money (cough bandwagonhost cough). As a result, their VPSes often have 1 Gbps+ connectivity with 1 CPU core and 512 MB of RAM.
This makes Shadowsocks CPU-bound. Your proxy server might be slow because its CPU is so outdated that it cannot keep up with the cryptography. I mean, what is the point of 1 Gbps connectivity when the CPU cannot handle it?
That makes it important to figure out which implementation is the fastest without compromising security. There are three major implementations:
shadowsocks-libev is the most popular implementation. It runs smoothly on everything from MIPS routers to large-scale servers.
go-shadowsocks2 is written in pure Go. It is also used in V2Ray and other proxy software.
shadowsocks-rust is the new player here. It is written in Rust, and offers both memory safety and speed.
Other implementations exist (such as the Node.js one), but most of them are not up to date. Using Shadowsocks with an outdated implementation is generally regarded as unsafe.
The biggest factor influencing Shadowsocks’ speed is the performance of the encryption method used.
On newer machines with AES-NI, aes-256-gcm is regarded as the fastest.
On older machines with AES-NI, chacha20-ietf-poly1305 is roughly as fast as aes-256-gcm.
On machines without AES-NI, chacha20-ietf-poly1305 is much faster.
Thus we will use chacha20-ietf-poly1305 in the benchmark to eliminate the uncertainty introduced by AES-NI instructions.
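If you are picking a cipher for your own server rather than for a benchmark, the rule of thumb above can be sketched in a few lines of shell. This is only a rough sketch: pick_cipher is a hypothetical helper name, and it assumes an x86 Linux host where /proc/cpuinfo has a "flags" line listing "aes" when AES-NI is available. It cannot tell a "newer" CPU from an "older" one, so it simply prefers aes-256-gcm whenever AES-NI is present.

```shell
#!/bin/sh
# pick_cipher FLAGS: print a cipher name for a space-separated CPU flag list.
# (Hypothetical helper; the mapping follows the rule of thumb in the text.)
pick_cipher() {
    case " $1 " in
        *" aes "*) echo "aes-256-gcm" ;;             # AES-NI present
        *)         echo "chacha20-ietf-poly1305" ;;  # no AES-NI (or unknown)
    esac
}

# Assumption: x86 Linux. The list is empty if /proc/cpuinfo is unavailable,
# in which case the helper falls back to chacha20-ietf-poly1305.
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
echo "suggested cipher: $(pick_cipher "$flags")"
```

On a box where the two ciphers are close, it is still worth measuring both before settling on one.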
Secondly, the efficiency of the cryptography library an implementation uses directly affects the performance of that implementation.
shadowsocks-libev uses libsodium. It is mostly fine, but lacks AES acceleration on ARM64.
go-shadowsocks2 uses Go’s official crypto library.
shadowsocks-rust uses ring, which is regarded as the fastest cryptography library, even faster than OpenSSL.
Other factors can also influence speed, such as compiler optimization, LTO, language design, etc.
Controlling variables
Since we only care about server-side performance, I will run different implementations of the Shadowsocks server, connect a client to each of them, and see how much throughput can be achieved.
To control variables, I used shadowsocks-libev as the test client. shadowsocks-libev is the fastest client implementation, and can keep up even on a slow router.
Test host: Vultr HFC 2C/2GB RAM
Environment: Arch Linux, linux-zen 5.12.10-zen1-1-zen
CPU: Skylake, aes, avx2, x86_64-v3
Next time I will run it on a bare-metal machine with fixed CPU frequency.
Setup test client
We will use standard iperf3 to test throughput, and shadowsocks-libev as the test client.
Launch the iperf3 server and the shadowsocks-libev client.
iperf3 -s
ss-tunnel -m chacha20-ietf-poly1305 -s 127.0.0.1 -p 8488 -k passwd -l 1090 -L 127.0.0.1:5201
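Every server test follows the same pattern: start the server, run iperf3 against the ss-tunnel listener on port 1090, then stop the server. A small wrapper can keep that identical between runs. This is just a sketch: bench is a hypothetical helper name, and the one-second sleep is a crude assumption about how long a server needs before it accepts connections.

```shell
#!/bin/sh
# bench SERVER_CMD...: start a server in the background, run one iperf3 pass
# through the ss-tunnel listener on port 1090, then stop the server.
# (Hypothetical helper; assumes "iperf3 -s" and ss-tunnel are already running.)
bench() {
    "$@" &                      # launch the server under test
    server_pid=$!
    sleep 1                     # crude wait for the listener; tune as needed
    status=0
    iperf3 -c 127.0.0.1 -p 1090 || status=$?
    kill "$server_pid" 2>/dev/null || true
    wait "$server_pid" 2>/dev/null || true
    return "$status"
}

# Usage:
# bench ss-server -m chacha20-ietf-poly1305 -s 127.0.0.1 -p 8488 -k passwd
```

Because the server command is passed as plain arguments, the same wrapper works for any implementation listening on port 8488.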
Test shadowsocks-libev
Start the Shadowsocks server.
ss-server -m chacha20-ietf-poly1305 -s 127.0.0.1 -p 8488 -k passwd
Then launch iperf3.
iperf3 -c 127.0.0.1 -p 1090
Output:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.03 GBytes 4.32 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 5.02 GBytes 4.31 Gbits/sec receiver
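The number that matters for comparing implementations is the bitrate on the "receiver" line. If you want to collect it from several runs, a tiny filter like the following can pull it out. It is a sketch only: receiver_bitrate is a hypothetical helper name, and it assumes the human-readable summary format shown above, where the receiver line ends with the bitrate, its unit, and the word "receiver".

```shell
#!/bin/sh
# receiver_bitrate: read an iperf3 summary on stdin and print the bitrate and
# its unit from the line whose last field is "receiver".
# (Hypothetical helper; relies on the text layout shown in this post.)
receiver_bitrate() {
    awk '$NF == "receiver" { print $(NF-2), $(NF-1) }'
}

# Sample input: the shadowsocks-libev summary from this post.
receiver_bitrate <<'EOF'
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.03 GBytes 4.32 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 5.02 GBytes 4.31 Gbits/sec receiver
EOF
# prints: 4.31 Gbits/sec
```

In a real run you would pipe the client into it, for example iperf3 -c 127.0.0.1 -p 1090 | receiver_bitrate.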
Test go-shadowsocks2
Start the Shadowsocks server.
ss-go -s "ss://AEAD_CHACHA20_POLY1305:passwd@127.0.0.1:8488"
Then launch iperf3.
iperf3 -c 127.0.0.1 -p 1090
Output:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.15 GBytes 4.43 Gbits/sec 2 sender
[ 5] 0.00-10.00 sec 5.14 GBytes 4.41 Gbits/sec receiver
It is only slightly faster (about 2%). There is really no reason to use it, as shadowsocks-libev is much more stable.
Test shadowsocks-rust
ssserver-rust --single-threaded -m chacha20-ietf-poly1305 -s "[::]:8488" -k passwd
Then launch iperf3.
iperf3 -c 127.0.0.1 -p 1090
Output:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 6.12 GBytes 5.26 Gbits/sec 2 sender
[ 5] 0.00-10.03 sec 6.12 GBytes 5.25 Gbits/sec receiver
This is more than 20% (!!!) faster than shadowsocks-libev (5.25 vs. 4.31 Gbits/sec), which completely exceeded my expectations.
Mind that this test server has a Skylake CPU with AES-NI, AVX2, and all the other modern instructions.
On this test server, 5.25 Gbps is more than enough to saturate the Ethernet link, so the 20% does not make much difference. On bandwagonhost, the slower CPUs are similar to Sandy Bridge models, which makes this 20% uplift much more important there: it can be the difference between 500 Mbps and 600 Mbps.
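For reference, here is how the relative differences work out from the receiver-side bitrates reported by iperf3 in the three runs. The uplift helper is a hypothetical name used only for this sketch; the numbers are copied from the outputs in this post.

```shell
#!/bin/sh
# uplift OLD CUR: percentage by which CUR exceeds OLD, to one decimal place.
# (Hypothetical helper for this post's arithmetic.)
uplift() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - old) / old * 100 }'
}

# Receiver-side bitrates in Gbits/sec, copied from the iperf3 outputs.
libev=4.31
go=4.41
rust=5.25

echo "go-shadowsocks2 vs shadowsocks-libev:  $(uplift "$libev" "$go")%"
echo "shadowsocks-rust vs shadowsocks-libev: $(uplift "$libev" "$rust")%"
# prints 2.3% and 21.8% respectively
```

These come from a single 10-second run per implementation, so treat the last digit as noise.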
Conclusion
shadowsocks-libev’s server implementation is actually the slowest. I have no idea why this happened. Maybe shadowsocks-libev aggressively optimizes for client use and ignores server use? Given how popular it is on servers, that is really bad. Moreover, it lacks AES acceleration on ARM64. This really makes shadowsocks-rust an attractive choice.
go-shadowsocks2 is a bit faster than shadowsocks-libev. I guess it is OK, since it is used in a lot of Go proxy software.
shadowsocks-rust is the fastest, which is no surprise. Rust is fast, the crypto library ring is fast, and Rust benefits from good LLVM optimizations.
So to maximize your proxy server’s throughput, shadowsocks-rust should be the first choice. It has frequent updates and better performance, and should have fewer segfaults.
Next time, I will test whether optimizing for the target CPU instead of generic x86_64 yields better performance.