Bug 2206 - [JAVA & CPUS=1] FUTEX_WAKE and FUTEX_CLOCK_REALTIME broken / wait forever?
Status: CLOSED FIXED
Product: OpenVZ
Classification: Unclassified
Component: kernel
Version: rhel6-2.6.32_042stabXXX
Hardware: x86_64 (AMD64) All
Importance: P3 major
Assigned To: Konstantin Khlebnikov
Duplicates: 1908 2314
Reported: 2012-03-05 14:08 EST by Stefan Priebe
Modified: 2015-04-01 15:06 EDT (History)

Attachments
diff-ve-virtualize-sys-devices-system-cpu (3.02 KB, patch)
2012-07-20 12:03 EDT, Konstantin Khorenko

Description Stefan Priebe 2012-03-05 14:08:42 EST
Hi,

I've seen this bug very often with Java-based applications running in a VPS under OpenVZ. Sadly, I wasn't able to provide an example using publicly available software in the past.

Now I am.

First, some details:
1.) CT0 / host system is running Debian Lenny with a custom-configured RHEL6 2.6.32-049.5 based kernel
2.) CT100 is running Debian Squeeze

I'm able to reproduce these futex hangs using minecraft_server.jar. I can reproduce this with both OpenJDK and the Sun Java JRE (different versions tested).

I'm starting the minecraft server with:
~# java -Xms512M -Xmx850M -jar minecraft_server.jar nogui

It seems to start fine, but at some point while preparing the spawn area it just stops. The point differs on every start.

Example output:
177 recipes
27 achievements
2012-03-05 18:58:19 [INFO] Starting minecraft server version 1.2.3
2012-03-05 18:58:19 [INFO] Loading properties
2012-03-05 18:58:19 [INFO] Starting Minecraft server on X:25565
2012-03-05 18:58:19 [WARNING] **** SERVER IS RUNNING IN OFFLINE/INSECURE MODE!
2012-03-05 18:58:19 [WARNING] The server will make no attempt to authenticate usernames. Beware.
2012-03-05 18:58:19 [WARNING] While this makes the game possible to play without internet access, it also opens up the ability for hackers to connect with any username they choose.
2012-03-05 18:58:19 [WARNING] To change this, set "online-mode" to "true" in the server.settings file.
2012-03-05 18:58:19 [INFO] Preparing level "Lets Mine Together"
2012-03-05 18:58:19 [INFO] Default game type: 0
2012-03-05 18:58:20 [INFO] Preparing start region for level 0
2012-03-05 18:58:21 [INFO] Preparing spawn area: 12%
2012-03-05 18:58:22 [INFO] Preparing spawn area: 16%
2012-03-05 18:58:23 [INFO] Preparing spawn area: 24%
2012-03-05 18:58:24 [INFO] Preparing spawn area: 32%
2012-03-05 18:58:25 [INFO] Preparing spawn area: 40%
2012-03-05 18:58:26 [INFO] Preparing spawn area: 52%
2012-03-05 18:58:27 [INFO] Preparing spawn area: 65%
2012-03-05 18:58:28 [INFO] Preparing spawn area: 81%


strace just shows this:
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 492738000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 542959000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 593345000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 643570000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 694108000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 744438000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 794845000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 845246000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 895815000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 946122000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0
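
The deadlines passed to FUTEX_WAIT_BITSET in the trace above are absolute CLOCK_REALTIME timestamps, so the spacing between them shows how often the thread retries. Below is a minimal Python sketch that extracts the {seconds, nanoseconds} pairs from strace lines and computes the gap between consecutive deadlines. The function name and the regex are illustrative only and not part of any existing tool.

```python
import re

# Matches the absolute timeout that strace prints as "{seconds, nanoseconds}".
TIMESPEC = re.compile(r"\{(\d+), (\d+)\}")

def deadline_gaps_ms(strace_lines):
    """Return the gaps, in milliseconds, between consecutive futex deadlines."""
    deadlines_ns = []
    for line in strace_lines:
        if "FUTEX_WAIT_BITSET" not in line:
            continue  # skip FUTEX_WAKE and unrelated lines
        match = TIMESPEC.search(line)
        if match:
            sec, nsec = (int(group) for group in match.groups())
            deadlines_ns.append(sec * 1_000_000_000 + nsec)
    return [(b - a) / 1_000_000 for a, b in zip(deadlines_ns, deadlines_ns[1:])]

# Two wait calls copied from the trace above.
sample = [
    "[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 492738000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)",
    "[pid  5011] futex(0x162a028, FUTEX_WAKE_PRIVATE, 1) = 0",
    "[pid  5011] futex(0x162a054, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1330974452, 542959000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)",
]
print(deadline_gaps_ms(sample))
```

For these two calls the gap is about 50 ms, which suggests a periodic timed wait that keeps expiring rather than a tight busy loop.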

Why do I think this is a bug?
Quite simply, I've seen exactly this behaviour in totally different Java-based apps, for example Amazon's AMTU server. Also, all of this works absolutely fine with an OpenVZ 2.6.18-based kernel.

Stefan
Comment 1 Stefan Priebe 2012-03-05 14:30:50 EST
Same happens on:
1.) CT0 /  Hostsystem is running Debian Squeeze with a custom configured RHEL6 2.6.32-049.6 based kernel
2.) CT100 is running Debian Squeeze i386

Host system was in both cases x64.
Comment 2 Stefan Priebe 2012-03-06 06:07:49 EST
Another hint: it always seems to work fine on CT0. It only hangs inside a VPS. The java process then has to be killed with kill -9.
Comment 3 Stefan Priebe 2012-03-19 09:50:19 EDT
Can nobody help? This has bothered me ever since 2.6.32 was released.
Comment 4 Kir Kolyshkin 2012-03-22 17:02:56 EDT
Kostya,

Please assign someone to investigate this.
Comment 5 Konstantin Khorenko 2012-03-23 04:12:07 EDT
Kostya, please take a look at this.
Comment 6 Konstantin Khlebnikov 2012-03-28 07:43:25 EDT
Please attach container's config file and
try to set all resources unlimited.
Comment 7 Stefan Priebe 2012-03-28 14:24:06 EDT
Hi Kostya,

here is the config:
----------------------------------------------------
ONBOOT="yes"

PHYSPAGES="0:256000"
SWAPPAGES="0:384000"
NUMPROC="400"
NUMFILE="8192"

# Disk quota parameters (in form of softlimit:hardlimit)
DISKSPACE="10485760:10485760"
DISKINODES="1000000:1000000"
QUOTATIME="0"

# CPU fair sheduler parameter
CPUUNITS="2000"
OFFLINE_MANAGEMENT="yes"
IPTABLES="ip_tables iptable_filter iptable_mangle ipt_limit ipt_REJECT ipt_length ip_conntrack ipt_state "
CAPABILITY="SYS_CHROOT:on "
QUOTAUGIDLIMIT="2000"
VE_ROOT="/vz/root/$VEID"
VE_PRIVATE="/vz/private/$VEID"
OSTEMPLATE="debian-6.0-amd64-minimal"
ORIGIN_SAMPLE="vservercon-starter"
IP_ADDRESS="XX"
HOSTNAME="XX"
NAMESERVER="XX"
CPULIMIT="100"
CPUS="1"
DEVICES="c:10:200:rw "
----------------------------------------------------

Here is the output of user_beancounters:
----------------------------------------------------
# cat /proc/user_beancounters 
Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
   848629:  kmemsize                  9295866             25583616  9223372036854775807  9223372036854775807                    0
            lockedpages                     0                    0  9223372036854775807  9223372036854775807                    0
            privvmpages                 16517               353815  9223372036854775807  9223372036854775807                    0
            shmpages                       18                  690  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0                    0                    0                    0
            numproc                        22                   81                  400                  400                    0
            physpages                   18686                80613                    0               256000                    0
            vmguarpages                     0                    0  9223372036854775807  9223372036854775807                    0
            oomguarpages                 1949                 2191  9223372036854775807  9223372036854775807                    0
            numtcpsock                      3                   15  9223372036854775807  9223372036854775807                    0
            numflock                        1                    7  9223372036854775807  9223372036854775807                    0
            numpty                          1                    5  9223372036854775807  9223372036854775807                    0
            numsiginfo                      0                   45  9223372036854775807  9223372036854775807                    0
            tcpsndbuf                   52224               261120  9223372036854775807  9223372036854775807                    0
            tcprcvbuf                   49152              1249280  9223372036854775807  9223372036854775807                    0
            othersockbuf                    0                27904  9223372036854775807  9223372036854775807                    0
            dgramrcvbuf                     0                 8704  9223372036854775807  9223372036854775807                    0
            numothersock                   41                   47  9223372036854775807  9223372036854775807                    0
            dcachesize                4193270             20474766  9223372036854775807  9223372036854775807                    0
            numfile                       168                  375                 8192                 8192                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                      10                   10  9223372036854775807  9223372036854775807                    0
---------------------------------------------------

Sometimes the start also looks like this:
----------------------------------------------------
root@vps-848-848629:/opt/mm# java -Xmx1024M -Xms1024M -jar minecraft_server.jar  nogui
182 recipes
27 achievements
2012-03-28 18:21:06 [INFO] Starting minecraft server version 1.2.4
2012-03-28 18:21:06 [INFO] Loading properties
2012-03-28 18:21:06 [WARNING] server.properties does not exist
2012-03-28 18:21:06 [INFO] Generating new properties file
2012-03-28 18:21:06 [INFO] Starting Minecraft server on *:25565
2012-03-28 18:21:06 [WARNING] Failed to load ban list: java.io.FileNotFoundException: banned-players.txt (No such file or directory)
2012-03-28 18:21:06 [WARNING] Failed to load ip ban list: java.io.FileNotFoundException: banned-ips.txt (No such file or directory)
2012-03-28 18:21:06 [WARNING] Failed to load operators list: java.io.FileNotFoundException: ops.txt (No such file or directory)
2012-03-28 18:21:06 [WARNING] Failed to load white-list: java.io.FileNotFoundException: white-list.txt (No such file or directory)
2012-03-28 18:21:06 [INFO] Preparing level "world"
2012-03-28 18:21:06 [INFO] Default game type: 0
2012-03-28 18:21:07 [INFO] Preparing start region for level 0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (nmethod.cpp:1880), pid=7673, tid=140711694026496
#  Error: guarantee(nm->_lock_count >= 0,"unmatched nmethod lock/unlock")
#
# JRE version: 6.0_18-b18
# Java VM: OpenJDK 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
# Derivative: IcedTea6 1.8.13
# Distribution: Debian GNU/Linux 6.0.4 (squeeze), package 6b18-1.8.13-0+squeeze1
# An error report file with more information is saved as:
# /opt/mm/hs_err_pid7673.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#
Aborted
-----------------------------------------------
Comment 8 Stefan Priebe 2012-03-28 14:27:40 EDT
sometimes it also happens that java does not even start:
root@vps-848-XXXXX:/opt/mm# java -Xmx1024M -Xms1024M -jar minecraft_server.jar  nogui        
...

now only kill -9 can help.
Comment 9 Stefan Priebe 2012-03-28 14:34:47 EDT
Even changing NUMPROC and NUMFILE to unlimited doesn't change anything at all. The only way to fix it is to downgrade to 2.6.18 based kernel.
Comment 10 Stefan Priebe 2012-04-10 05:02:24 EDT
@Kostya
Any news on this?
Comment 11 letic 2012-05-24 07:09:34 EDT
We are encountering the exact same issue with ALL our Java applications running in containers.

Has anybody found a workaround for this issue?

What information do you need from me to debug this issue?

Thanks in advance
LeTic
Comment 12 Stefan Priebe 2012-05-24 07:15:49 EDT
I also opened this bug (under another ID) last year while pre-testing the RHEL6 releases. Sadly, it seems that no OpenVZ developer cares about this problem.
Comment 13 letic 2012-06-01 05:52:54 EDT
Hey Stefan,

I tried to debug the issue myself (see here : https://redmine.personalized-software.ie/projects/opensource/wiki/Java_openVZ_Futex_issue for the long story) and found the following :

The issue is apparently affecting only :
* hosts with several cores/CPUs running any version of the 2.6.32-openvz kernel (tested with debian squeeze, proxmox 1.9, proxmox 2.0 and vanilla patched kernel).
* (debian) guests with only one CPU

But I did find two solutions/workarounds :
* Assign more than 1 CPU to the guest
* Give CPU affinity (--cpumask) to the guest
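
The affected combination described above (multi-CPU host, guest limited to one CPU) can be checked mechanically against a container config. The following is a small sketch under stated assumptions: parse_ct_config and is_affected are hypothetical helper names, the config is assumed to use the KEY="value" format shown in comment 7, and the host CPU count is supplied by the caller.

```python
def parse_ct_config(text):
    """Parse KEY="value" lines of a container config into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip()
    return settings

def is_affected(settings, host_cpus):
    """True for the combination reported here: SMP host, container with CPUS=1."""
    return host_cpus > 1 and settings.get("CPUS") == "1"

# Excerpt of the config posted in comment 7.
example = '''
# CPU fair sheduler parameter
CPUUNITS="2000"
CPULIMIT="100"
CPUS="1"
'''
print(is_affected(parse_ct_config(example), host_cpus=4))
```

On the host, host_cpus could be taken from os.cpu_count(); a container config with no CPUS line, or with a different value, is not flagged.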

This was quite tricky to debug, so I hope this helps other people stuck with the same problem. Unfortunately, once you know what the solution is, you always find people who found the same thing. In any case, it cannot hurt to have more documentation about this :o)

LeTic
Comment 14 Stefan Priebe 2012-06-01 05:57:39 EDT
Thanks for the information. Sadly, I won't give all my Java hosts more CPUs, nor set the CPU affinity.

I hope an OpenVZ developer can build on your research! Thanks!
Comment 15 Konstantin Khlebnikov 2012-06-13 11:19:14 EDT
*** Bug 1908 has been marked as a duplicate of this bug. ***
Comment 16 CoolCold 2012-06-30 21:36:07 EDT
Not sure this is the same bug, but it may be. During the night, monitoring alerted on an increased load average, so I looked and found that Java in the containers had started eating CPU and generating a lot of soft interrupts/context switches. Here is the output from vmstat:

root@fr05:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
15  0      0 4360608 4109900 3824168    0    0     2    24    2    0  1  1 97  0
18  0      0 4360352 4109904 3824164    0    0     0    68 16630 4042453 13 33 55  0
22  0      0 4360484 4109904 3824168    0    0     0     0 22484 3811483 11 30 58  0
15  0      0 4360856 4109904 3824168    0    0     0   396 27419 3839159 12 30 58  0
16  0      0 4360960 4109908 3824168    0    0     0    16 22607 3887733 12 33 55  0
14  0      0 4361084 4109916 3824160    0    0     0    16 20232 3859333 12 30 58  0
16  0      0 4361368 4109920 3824168    0    0     0    56 32442 3777620 10 28 61  0
20  0      0 4361368 4109920 3824172    0    0     0     0 21971 3814482 14 30 56  0
13  0      0 4361492 4109920 3824172    0    0     0     8 31462 3848631 16 33 51  0

As you can see, there are around 22484 interrupts and 3811483 context switches per second. Java went a bit crazy in every container it was running in, even though these processes should be unrelated. strace shows something like:

futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105428, 922628000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105428, 922697000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105428, 922765000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105428, 922834000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105428, 922903000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 125837000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 125929000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 126003000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 126073000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 126139000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f43e800d354, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1341105429, 126206000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f43e800d328, FUTEX_WAKE_PRIVATE, 1) = 0


Maybe the gdb output is of interest:
(gdb) bt
#0  0x00007f589ff9e569 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007f589f2ea9a6 in os::PlatformEvent::park(long) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
#2  0x00007f589f2ee55b in os::sleep(Thread*, long, bool) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
#3  0x00007f589f3ddd08 in WatcherThread::run() () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
#4  0x00007f589f2ee8d2 in java_start(Thread*) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
#5  0x00007f589ff998ca in start_thread () from /lib/libpthread.so.0
#6  0x00007f589f8f892d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x00007f589ff9e569 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007f589f2ea9a6 in os::PlatformEvent::park(long) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
No symbol table info available.
#2  0x00007f589f2ee55b in os::sleep(Thread*, long, bool) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
No symbol table info available.
#3  0x00007f589f3ddd08 in WatcherThread::run() () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
No symbol table info available.
#4  0x00007f589f2ee8d2 in java_start(Thread*) () from /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so
No symbol table info available.
#5  0x00007f589ff998ca in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#6  0x00007f589f8f892d in clone () from /lib/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.



The container was initially run on the 2.6.32-042stab049.6 kernel, which is why that version is shown, but it has since been migrated to 2.6.32-042stab055.16-el6-openvz and is running on that version now.
Comment 17 Konstantin Khlebnikov 2012-07-17 07:13:03 EDT
The Java JIT generates wrong code on an SMP host if the container has CPUS=1.
There is no simple way to fix this in the kernel without breaking compatibility.
Safest solution: set --cpus=0 and don't touch it any more (we plan to deprecate this option in the future); use --cpulimit / --cpumask instead.
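
The workaround in this comment can be written down as a command line. The sketch below only builds the argument list and does not execute anything. It assumes the limit option is spelled --cpulimit (matching the CPULIMIT config parameter), that a limit of 100 is wanted, and it uses container ID 100 from the original report purely as an example; workaround_command is a hypothetical helper name.

```python
def workaround_command(ctid, cpulimit_percent=100):
    """Build (but do not run) a vzctl call that drops the CPUS restriction."""
    return [
        "vzctl", "set", str(ctid),
        "--cpus", "0",                        # 0 = no CPU-count restriction, per the comment above
        "--cpulimit", str(cpulimit_percent),  # cap CPU time instead of the CPU count
        "--save",                             # persist the change in the container config
    ]

print(" ".join(workaround_command(100)))
```

Printing the command first, rather than running it, makes it easy to review the change before applying it to a production container.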
Comment 18 CoolCold 2012-07-17 07:15:47 EDT
I think my comment was triggered by the 1st of July leap second: a lot of Java/MySQL programs started to eat CPU due to another kernel bug.
Comment 19 Konstantin Khlebnikov 2012-07-17 07:23:57 EDT
(In reply to comment #18)
> I think my comment was triggered by the 1st of July leap second: a lot of
> Java/MySQL programs started to eat CPU due to another kernel bug.

Seems so. I was talking about Stefan's problem.
Comment 20 Konstantin Khlebnikov 2012-07-17 07:44:26 EDT
*** Bug 2314 has been marked as a duplicate of this bug. ***
Comment 21 Konstantin Khorenko 2012-07-20 11:01:00 EDT
* diff-ve-virtualize-sys-devices-system-cpu
Added to 042stab060

virtualize /sys/devices/system/cpu

Currently the "virtualization" consists in creating empty cpu# dir for
each possible cpu.
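
One way to observe the effect of this patch from inside a container is to compare the number of cpuN directories in sysfs with the number of processors listed in /proc/cpuinfo. The sketch below is an illustration only: the helper names are hypothetical, and both paths are parameters so the functions can be pointed at copies of the files.

```python
import os
import re

# Matches per-CPU entries such as cpu0, cpu1; excludes cpufreq, cpuidle, etc.
CPU_DIR = re.compile(r"^cpu\d+$")

def count_sysfs_cpus(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Count cpuN entries under the given directory."""
    return sum(1 for name in os.listdir(sysfs_cpu_dir) if CPU_DIR.match(name))

def count_cpuinfo_cpus(cpuinfo_path="/proc/cpuinfo"):
    """Count 'processor' stanzas in a cpuinfo-style file."""
    with open(cpuinfo_path) as handle:
        return sum(1 for line in handle if line.startswith("processor"))

# Only query the live system when both locations exist.
if os.path.isdir("/sys/devices/system/cpu") and os.path.exists("/proc/cpuinfo"):
    print("sysfs cpuN entries:", count_sysfs_cpus())
    print("/proc/cpuinfo processors:", count_cpuinfo_cpus())
```

Going by the description above, a patched kernel with CPUS=1 would be expected to show more cpuN entries in sysfs than processors in /proc/cpuinfo on an SMP host.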
Comment 22 Stefan Priebe 2012-07-20 11:18:51 EDT
thanks for fixing this issue. Can you please upload the patch so people can test with 057.1? Thanks!
Comment 23 Konstantin Khorenko 2012-07-20 12:03:56 EDT
Created attachment 1806 [details]
diff-ve-virtualize-sys-devices-system-cpu

diff-ve-virtualize-sys-devices-system-cpu attached to the bug
Comment 24 Alex Athanasopoulos 2012-07-25 03:26:22 EDT
I think this bug and the workaround (not limiting the # of cpus) apply in my case too, so I just want to report simpler steps to reproduce with the default debian/proxmox kernels and just compiling a minimal .java file.

I used two installations:
1. proxmox 2.1 bare-metal install CD on Intel(R) Celeron(R) CPU G530 @ 2.40GHz
2. debian 6.0.5 amd64 net install CD on Intel(R) Celeron(R) CPU E3300  @ 2.50GHz
both are two-core CPUs.

uname -srvm
Linux 2.6.32-11-pve #1 SMP Wed Apr 11 07:17:05 CEST 2012 x86_64
Linux 2.6.32-5-openvz-amd64 #1 SMP Sun May 6 05:21:56 UTC 2012 x86_64

I used openvz template ubuntu-10.04-standard_10.04-4_i386 and did the following in the resulting container:
apt-get update
apt-get upgrade
apt-get install openjdk-6-jdk
(I also did apt-get install java-common, but I don't think it matters).

# create a user and su as that user.  Then:
echo "public class A {}" > A.java
javac A.java

repeat a few times until it hangs (less than 10 tries)
The last few lines of "strace javac A.java" are:
clone(child_stack=0xb7796494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb7796bd8, {entry_number:12, base_addr:0xb7796b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7796bd8) = 1072
futex(0xb7796bd8, FUTEX_WAIT, 1072, NULL
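
The "repeat until it hangs" step above can be automated with a timeout. This is a minimal sketch, not part of the original report: run_until_hang is a hypothetical helper, and the 60-second default is an assumed upper bound for a compile that normally finishes almost instantly.

```python
import subprocess

def run_until_hang(command, attempts=10, timeout_s=60):
    """Run command repeatedly; return the 1-based attempt that timed out, or None."""
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(command, timeout=timeout_s,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the timeout,
            # so no stuck process is left behind.
            return attempt
    return None
```

For the steps above this would be called as run_until_hang(["javac", "A.java"]). A return value of None means no attempt exceeded the timeout.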

I created an openvz template of the container so I could re-launch it in a few different ways.

With proxmox, I created the container like this:
pvectl create 103 local:vztmpl/ubuntu-10.04-java_10.04-4_i386.tar.gz -disk 8 -hostname java.proxmox.int -onboot 1 -ip_address 192.168.1.103 -nameserver 192.168.1.1 -memory 2048 -cpus 1

I also copied the previous 103.conf file to /etc/vz/conf/ve-dev.conf-sample and removed the last few container-specific lines.  Then I did:
vzctl create 103 --config dev --ostemplate ubuntu-10.04-java_10.04-4_i386 --ipadd 192.168.1.103 --hostname java.proxmox.int
vzctl set 103 --onboot yes --name java --nameserver 192.168.1.1 --diskspace 8G --cpus 1 --save

I copied the container .tar.gz and ve-dev.conf-sample files to the pure debian machine (no proxmox) and did the same thing.  On the debian machine the config file did not seem to get applied properly, because running "top" reported all of the machine's memory (4G) instead of the specified amount.

I also created a .tar.gz template of a working proxmox 1.9 container and launched it on proxmox 2.1 with similar results.  On proxmox 1.9 it was working mostly fine (a java process would hang once in a while).  On proxmox 2.1 it hung immediately.  proxmox 1.9 was running 2.6.32-6-pve.

The md5sums of the installation media I used are:
940919e832e64c29e67d89233a11b88b  proxmox-ve_2.1-f9b0f63a-26.iso
a213b1d6da1996c677706d843b6ee0f2  debian-6.0.5-amd64-netinst.iso
ed04c27ede0327d39b88e43bf9b0d10e  ubuntu-10.04-standard_10.04-4_i386.tar.gz

Downloaded from:
http://proxmox.org/downloads/proxmox-ve/17-iso-images
http://www.debian.org/distrib/netinst#smallcd
For debian, I then installed openvz from the debian repositories, using apt-get.
Comment 25 Yasuyuki Nakamura 2012-09-19 00:32:13 EDT
(In reply to comment #21)
> * diff-ve-virtualize-sys-devices-system-cpu
> Added to 042stab060
> 
> virtualize /sys/devices/system/cpu
> 
> Currently the "virtualization" consists in creating empty cpu# dir for
> each possible cpu.

Shouldn't there be only 1 cpu directory?
The 2.6.32-042stab061.2 kernel shows all physical CPUs in /sys/devices/system/cpu.
I don't think this behavior is expected.

If I'm misunderstanding, please change the status.
Comment 26 Konstantin Khlebnikov 2012-09-19 03:18:31 EDT
No, it's not. This hack disables UP optimizations in jvm if machine is actually SMP.
Comment 27 Yasuyuki Nakamura 2012-09-19 03:33:13 EDT
(In reply to comment #26)
> No, it's not. This hack disables UP optimizations in jvm if machine is
> actually SMP.

Thanks for explaining.

I have now tested using the description in the following ticket:
http://bugzilla.openvz.org/show_bug.cgi?id=2314
(that ticket is marked as a duplicate of Bug 2206)

The Java program still misbehaves with the 2.6.32-042stab061.2 kernel.
So it seems either the bug is not fixed or I have found another problem.
Do you have any ideas?
Comment 28 Konstantin Khlebnikov 2012-09-19 04:10:47 EDT
Probably this workaround in sysfs just does not work for your JVM/libc version, or libc cannot get access to /sys for some reason. Currently CPUS=1 does nothing useful except hacking /proc/cpuinfo, and it breaks the JVM. So, please use CPULIMIT=100 instead.
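
What libc reports inside a given container can be checked directly. The sketch below queries the two standard sysconf processor counts; it assumes, as this comment implies but does not spell out, that the JVM derives its uniprocessor decision from values libc provides. The function name is hypothetical.

```python
import os

def libc_processor_counts():
    """Return the processor counts libc reports via sysconf()."""
    return {
        "configured": os.sysconf("SC_NPROCESSORS_CONF"),
        "online": os.sysconf("SC_NPROCESSORS_ONLN"),
    }

print(libc_processor_counts())
```

If both values are 1 in a container with CPUS=1 on an SMP host, that would be consistent with the sysfs workaround not taking effect for that libc version.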
Comment 29 Yasuyuki Nakamura 2012-09-19 05:10:08 EDT
(In reply to comment #28)
> Probably this workaround in sysfs just does not work for your JVM/libc
> version, or libc cannot get access to /sys for some reason. Currently CPUS=1
> does nothing useful except hacking /proc/cpuinfo, and it breaks the JVM. So,
> please use CPULIMIT=100 instead.

You mean this bug is not fixed?

In fact, I compared CPUS=1 and CPUS="unlimited", and the VEs' performance with CPUS=1 is better.
I am aiming for ever higher container density on my hosts,
so I hope this bug gets fixed.
Do you have any plans?
Comment 30 Alexa 2014-02-02 20:59:02 EST
*** Bug 260998 has been marked as a duplicate of this bug. ***
Comment 31 Sergey Bronnikov 2015-04-01 15:06:59 EDT
The bug was fixed more than a year ago and there have been no complaints from the reporter since the fix. We believe the fix helped and are marking the bug as closed.