Norman Maurer

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] prepare release netty-3.6.10.Final

HORNETQ-1406 - Correctly handle direct buffers when detecting which protocol to use

Motivation:

The current code did not work with direct buffers, as it expected to be able to access the backing array.

Modifications:

Read directly into a byte array so that the code works with all flavors of ByteBuf.

Result:

No more UnsupportedOperationException when direct buffers are used.
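
A minimal Java sketch of the idea, assuming the Netty ByteBuf API; the method name and probe length are illustrative, not the actual detection code:

    import io.netty.buffer.ByteBuf;

    // Instead of calling buf.array(), which throws UnsupportedOperationException for
    // direct buffers, copy the readable bytes into a local byte[] and inspect that.
    static byte[] peekMagicBytes(ByteBuf buf, int length) {
        byte[] magic = new byte[Math.min(length, buf.readableBytes())];
        buf.getBytes(buf.readerIndex(), magic);   // works for heap and direct buffers alike
        return magic;
    }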

Directly write CompositeByteBuf if possible without memory copy. Related to [#2719]

Motivation:

On Linux it is possible to write more than one buffer with one syscall when sending datagram messages.

Modifications:

Do not copy the CompositeByteBuf if it only contains direct buffers.

Result:

Better performance due to less copying overhead.
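
A sketch of the "skip the copy" decision, assuming the Netty 4 ByteBuf API; the actual epoll datagram write path differs:

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.CompositeByteBuf;
    import java.nio.ByteBuffer;

    // CompositeByteBuf.isDirect() is only true when every component is direct, so in
    // that case the components can be handed to a gathering write (writev/sendmsg)
    // without first merging them into one temporary buffer.
    static ByteBuffer[] buffersForGatheringWrite(ByteBuf content) {
        if (content instanceof CompositeByteBuf && content.isDirect()) {
            return content.nioBuffers();                      // no memory copy
        }
        return new ByteBuffer[] { content.nioBuffer() };      // may involve a copy for composites
    }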

Fix buffer leak in DatagramUnicastTest caused by incorrect usage of CompositeByteBuf

Motivation:

Due to incorrect usage of CompositeByteBuf a buffer leak was introduced.

Modifications:

Correctly handle tests with CompositeByteBuf.

Result:

No more buffer leaks
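
A minimal sketch of the corrected usage, assuming the current Netty 4 CompositeByteBuf API; the concrete buffers used in DatagramUnicastTest are not shown here:

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.CompositeByteBuf;
    import io.netty.buffer.Unpooled;

    // addComponent(...) takes over the reference of the added buffer, so the test only
    // has to release the composite itself; releasing the parts as well would
    // over-release, and releasing neither of them would leak.
    static CompositeByteBuf wrap(byte[] payload) {
        CompositeByteBuf composite = Unpooled.compositeBuffer();
        ByteBuf part = Unpooled.directBuffer(payload.length).writeBytes(payload);
        composite.addComponent(true, part);   // 'true' also advances the writer index
        return composite;                     // the caller releases the composite when done
    }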

[#2867] Workaround performance issue with IPv4-mapped-on-IPv6 addresses

Motivation:

InetAddress.getByName(...) uses exceptions for control flow when trying to parse IPv4-mapped-on-IPv6 addresses. This is quite expensive.

Modifications:

Detect IPv4-mapped-on-IPv6 addresses at the JNI level and convert them to IPv4 addresses before passing them to InetAddress.getByName(...) (via the InetSocketAddress constructor).

Result:

Eliminates the performance problem caused by exception creation when parsing IPv4-mapped-on-IPv6 addresses.

    • /transport-native-epoll/src/main/c/io_netty_channel_epoll_Native.c (-2, +26)
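
A Java sketch of the detection logic (the actual change lives in the JNI code in io_netty_channel_epoll_Native.c; the helper below is only an illustration):

    // An IPv4-mapped-on-IPv6 address is a 16-byte address of the form ::ffff:a.b.c.d,
    // so the last four bytes can be handed to InetAddress.getByAddress(...) directly,
    // avoiding the exception-driven textual parsing.
    static byte[] ipv4MappedPart(byte[] addr) {
        if (addr.length != 16) {
            return null;
        }
        for (int i = 0; i < 10; i++) {
            if (addr[i] != 0) {
                return null;                                     // not IPv4-mapped
            }
        }
        if (addr[10] != (byte) 0xff || addr[11] != (byte) 0xff) {
            return null;
        }
        return new byte[] { addr[12], addr[13], addr[14], addr[15] };
    }
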
[#2426] Do not cause a busy loop when the Thread of AbstractNioSelector is interrupted

Motivation:

Because Thread.currentThread().interrupt() will unblock Selector.select(), we need to take special care when checking whether we need to rebuild the Selector. If the wakeup was caused by the interrupt() we clear the interrupt status and move on, as this is most likely a bug in a custom ChannelHandler or a library the user makes use of.

Modification:

Clear the interrupt state of the Thread if the Selector was unblocked because of an interrupt and the number of selected keys was 0.

Result:

No more busy loop caused by Thread.currentThread().interrupt()
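
A sketch of the check, written against a plain java.nio.Selector loop; the timeout parameter and method name are assumptions, not the actual AbstractNioSelector code:

    import java.io.IOException;
    import java.nio.channels.Selector;

    static int selectAndClearInterrupt(Selector selector, long timeoutMillis) throws IOException {
        int selectedKeys = selector.select(timeoutMillis);
        if (selectedKeys == 0 && Thread.interrupted()) {
            // Thread.interrupted() tests AND clears the interrupt status, so a stray
            // interrupt (usually a bug in a handler or third-party library) cannot make
            // select() return 0 immediately over and over again - no busy loop.
        }
        return selectedKeys;
    }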

Add support for sendmmsg(...) and so allow writing multiple DatagramPackets with one syscall. Related to [#2719]

Motivation:

On Linux with glibc >= 2.14 it is possible to send multiple DatagramPackets with one syscall. This can be a huge performance win, so we should support it in our native transport.

Modification:

- Add support for sendmmsg(...) by reusing IovArray

- Factor out the ThreadLocal support of IovArray into IovArrayThreadLocal for better separation, as IovArray is now also used without ThreadLocal in NativeDatagramPacketArray

- Introduce NativeDatagramPacketArray which is used for sendmmsg(...)

- Implement sendmmsg(...) via JNI

- Expand DatagramUnicastTest to also test sendmmsg(...)

Result:

Netty now automatically uses sendmmsg(...) if it is supported, more than one DatagramPacket is in the ChannelOutboundBuffer, and flush() is called.

    • /transport-native-epoll/src/main/c/io_netty_channel_epoll_Native.c (-1, +110)
    • /transport-native-epoll/src/main/c/io_netty_channel_epoll_Native.h (-0, +8)
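
A hedged usage sketch from the caller's point of view; the native epoll datagram channel and the exact batching conditions are assumptions here, but nothing changes for the user: write several packets, then flush:

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.Channel;
    import io.netty.channel.socket.DatagramPacket;
    import java.net.InetSocketAddress;
    import java.util.List;

    // On a native epoll datagram channel the queued packets may now be handed to the
    // kernel with a single sendmmsg(...) call where glibc >= 2.14 supports it.
    static void sendBatch(Channel channel, List<ByteBuf> payloads, InetSocketAddress recipient) {
        for (ByteBuf payload : payloads) {
            channel.write(new DatagramPacket(payload, recipient));
        }
        channel.flush();   // all pending DatagramPackets are flushed in one go
    }
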
Disable caching of PooledByteBuf for different threads.

Motivation:

We introduced a PoolThreadCache which is used in our PooledByteBufAllocator to reduce the synchronization overhead on PoolArenas when allocating / deallocating PooledByteBuf instances. This cache is used on both the allocation and the deallocation path by:

- Looking for cached memory in the PoolThreadCache of the Thread that tries to allocate a new PooledByteBuf and, if some is found, returning it.

- Adding the memory that is used by a PooledByteBuf to the PoolThreadCache of the Thread that releases the PooledByteBuf.

This works out very well when all allocation / deallocation is done in the EventLoop, as the EventLoop is used for both read and write. On the other hand, it can lead to surprising side effects if the user allocates from outside the EventLoop and passes the ByteBuf over for writing. The problem here is that the memory will be added to the PoolThreadCache of the Thread that did the actual write on the underlying transport and not to the PoolThreadCache of the Thread that previously allocated the buffer.

Modifications:

Don't cache if different Threads are used for allocating/deallocating

Result:

Less confusing behavior for users that allocate PooledByteBufs from outside the EventLoop.
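
A sketch of the scenario described above; the channel and payload are assumptions, the point is only where allocation and deallocation happen:

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.channel.Channel;

    // The buffer is allocated on an application thread, but it is released on the
    // channel's EventLoop once the write completes, so allocation and deallocation
    // hit two different PoolThreadCaches.
    static void writeFromApplicationThread(Channel channel, byte[] payload) {
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(payload.length); // application thread
        buf.writeBytes(payload);
        channel.writeAndFlush(buf); // released later on the EventLoop thread
    }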

[#2843] Add test-case to show correct behavior of ByteBuf.refCnt() and ByteBuf.release(...)

Motivation:

We received a bug report that ByteBuf.refCnt() sometimes does not show the correct value when release() and refCnt() are called from different Threads.

Modifications:

Add a test case which shows that everything works as expected.

Result:

Test case added which shows that everything is OK.
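
A minimal sketch of such a cross-thread check, assuming JUnit 4 and an unpooled buffer; the test actually added to the code base may differ:

    import static org.junit.Assert.assertEquals;

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.Unpooled;
    import java.util.concurrent.CountDownLatch;
    import org.junit.Test;

    public class RefCntVisibilityTest {
        @Test
        public void releaseFromOtherThreadIsVisible() throws Exception {
            final ByteBuf buf = Unpooled.buffer(8);
            final CountDownLatch released = new CountDownLatch(1);
            new Thread(new Runnable() {
                @Override
                public void run() {
                    buf.release();          // drop the only reference on another thread
                    released.countDown();
                }
            }).start();
            released.await();               // the latch provides the happens-before edge
            assertEquals(0, buf.refCnt());  // must observe the release
        }
    }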

[#2847] Correctly encode HTTP to SPDY if X-SPDY-Associated-To-Stream-ID is not present

Motivation:

Because of a bug an NPE was thrown when trying to encode HTTP to SPDY and no X-SPDY-Associated-To-Stream-ID header was present.

Modifications:

Use 0 as the default value when X-SPDY-Associated-To-Stream-ID is not present.

Result:

No NPE anymore.
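
A small sketch of the defaulting idea, assuming a Netty HttpHeaders instance; the actual SpdyHttpEncoder code is not shown here:

    import io.netty.handler.codec.http.HttpHeaders;

    // Default to 0 instead of dereferencing a missing header value.
    static int associatedToStreamId(HttpHeaders headers) {
        String raw = headers.get("X-SPDY-Associated-To-Stream-ID"); // null when the header is absent
        return raw != null ? Integer.parseInt(raw) : 0;
    }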

Correctly release buffers on protocol errors

Motivation:

We failed to release buffers on protocol errors, which caused buffer leaks when using HTTP/2.

Modifications:

Release buffers on protocol errors.

Result:

No more buffer leaks
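
A generic release-on-error sketch; processFrame(...) stands in for the actual HTTP/2 decoding step and is an assumption, not Netty API:

    import io.netty.buffer.ByteBuf;

    final class FrameGuard {
        static void processFrame(ByteBuf payload) throws Exception {
            // ... decode the frame, may throw on protocol errors ...
        }

        static void handleFrame(ByteBuf payload) throws Exception {
            try {
                processFrame(payload);
            } catch (Exception protocolError) {
                payload.release();   // release the payload so a protocol error cannot leak it
                throw protocolError;
            }
        }
    }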

Correctly release buffer in DelegatingHttp2ConnectionHandlerTest

Motivation:

Because we did not release the buffer correctly in DelegatingHttp2ConnectionHandlerTest, the CI failed because of a buffer leak.

See https://secure.motd.kr/jenkins/job/pr_2849_kichik_netty_master/4/consoleFull

Modifications:

Correctly release buffer

Result:

No more leak error.

Cleanup test

Motivation:

Saw a lot of inspector warnings

Modifications:

Fix inspector warnings

Result:

Cleaner code

[#2841] Fix SingleThreadEventLoopTest that was failing because of GC pressure

Motivation:

Sometimes the SingleThreadEventLoopTest failed on our CI. This was because of GC pressure produced by Thread.sleep(...) when interrupted, as it creates a new InterruptedException every time (and needs to fill in its stack trace).

Modifications:

Replace Thread.sleep(...) with LockSupport.parkNanos(...) to eliminate exception overhead.

Result:

SingleThreadEventLoopTest produces a lot less garbage.
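
A sketch of the replacement, assuming a simple polling loop in the test; LockSupport.parkNanos(...) simply returns when the thread is interrupted instead of constructing and filling in a new InterruptedException on every interrupted sleep:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.LockSupport;

    final class SleepFreeWait {
        // Wait for a flag without producing garbage under interrupts.
        static void await(AtomicBoolean done) {
            while (!done.get()) {
                LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(1));
            }
        }
    }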

Modify HttpObjectDecoder to allow parsing the HTTP headers in multiple steps.

Motivation:

At the moment the whole HTTP header block must be parsed at once, which can lead to parsing the same bytes multiple times. We can do better here and allow parsing it in multiple steps.

Modifications:

- Do not parse headers multiple times

- Simplify the code

- Eliminate unnecessary String[] creations

- Use readSlice(...).retain() when possible (see the sketch below).
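
A sketch of the readSlice(...).retain() pattern, assuming the Netty 4 ByteBuf API; 'in', 'out' and 'contentLength' stand in for the decoder's cumulation buffer, output list and already-parsed length:

    import io.netty.buffer.ByteBuf;
    import java.util.List;

    static void emitContent(ByteBuf in, List<Object> out, int contentLength) {
        ByteBuf content = in.readSlice(contentLength).retain(); // shares memory with 'in'; no copy
        out.add(content);                                       // retain() keeps it valid after 'in' is released
    }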

Result:

Performance improvements as shown in the included benchmark below.

Before change:

    [nmaurer@xxx]~% ./wrk-benchmark
    Running 2m test @ http://xxx:8080/plaintext
      16 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    21.55ms   15.10ms  245.02ms   90.26%
        Req/Sec   196.33k    30.17k   297.29k    76.03%
      373954750 requests in 2.00m, 50.15GB read
    Requests/sec: 3116466.08
    Transfer/sec:    427.98MB

After change:

    [nmaurer@xxx]~% ./wrk-benchmark
    Running 2m test @ http://xxx:8080/plaintext
      16 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    20.91ms   36.79ms    1.26s    98.24%
        Req/Sec   206.67k    21.69k   243.62k    94.96%
      393071191 requests in 2.00m, 52.71GB read
    Requests/sec: 3275971.50
    Transfer/sec:    449.89MB