Bug 2255 - BUG: soft lockup - CPU#2 stuck for 67s! : sock_alloc / new_inode / _spin_lock+0x1e/0x30
BUG: soft lockup - CPU#2 stuck for 67s! : sock_alloc / new_inode / _spin_lock...
Status: CLOSED FIXED
Product: OpenVZ
Classification: Unclassified
Component: kernel
rhel6-2.6.32_042stabXXX
x86_64 (AMD64) Other
: P3 critical
Assigned To: Konstantin Khlebnikov
: 2265 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-27 01:54 EDT by Dave
Modified: 2015-03-17 04:00 EDT (History)
8 users (show)

See Also:
khorenko: needinfo?


Attachments
soft lockup console-May 4 2012 (104.61 KB, text/plain)
2012-05-04 17:14 EDT, Dave
Details
042stab055.7.debug-lockup - May 8 2012 (654.62 KB, text/plain)
2012-05-08 13:32 EDT, Dave
Details
vzdq: fix sleep under inode_lock in vzdq_aquotq_lookset() (1.96 KB, patch)
2012-05-09 02:36 EDT, Konstantin Khlebnikov
Details | Diff
console in another environment (4.88 KB, text/plain)
2012-05-09 04:57 EDT, Yasuyuki Nakamura
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dave 2012-04-27 01:54:45 EDT
Getting a kernel panic every week:

[root@vz ~]# uname -a
Linux vz.xyz.com 2.6.32-042stab049.6 #1 SMP Mon Feb 6 19:17:43 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@vz ~]# cat /etc/redhat-release
CentOS release 6.2 (Final)

This is as much as my terminal server logs captured:

th xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.142067] CPU 1 
[597276.142067] Modules linked in: vzethdev simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota ip6table_mangle nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.142067] 
[597276.142067] Pid: 946200, comm: set_keyserver_u veid: 1035 Not tainted 2.6.32-042stab049.6 #1 042stab049_6 empty empty/S2927
[597276.142067] RIP: 0010:[<ffffffff814e191e>]  [<ffffffff814e191e>] _spin_lock+0x1e/0x30
[597276.142067] RSP: 0018:ffff88000babfe98  EFLAGS: 00000293
[597276.142067] RAX: 0000000000003c92 RBX: ffff88000babfe98 RCX: 0000000000000000
[597276.142067] RDX: 0000000000003c90 RSI: ffffffff817a1438 RDI: ffffffff823cca00
[597276.142067] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 000000000000000e
[597276.142067] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff81042b54
[597276.142067] R13: ffff88000babff18 R14: 0000000000000014 R15: ffff8803c5c004a8
[597276.142067] FS:  00007f54f714b6e0(0000) GS:ffff880028280000(0000) knlGS:00000000b774f6c0
[597276.142067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[597276.142067] CR2: 00007f54f66747e0 CR3: 00000002a8a5b000 CR4: 00000000000006e0
[597276.142067] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[597276.142067] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[597276.142067] Process set_keyserver_u (pid: 946200, veid: 1035, threadinfo ffff88000babe000, task ffff8800ad7b3290)

[597276.142067] Stack:
[597276.142067]  ffff88000babfec8 ffffffff811a93fb 000000001d12a781 ffff880336840980
[597276.142067] <0> 0000000000000000 0000000000000000 ffff88000babfed8 ffffffff81410fd9
[597276.142067] <0> ffff88000babff28 ffffffff8141143e ffff880000000001 ffff880000000002
[597276.142067] Call Trace:
[597276.142067]  [<ffffffff811a93fb>] ? new_inode+0x3b/0xb0
[597276.142067]  [<ffffffff81410fd9>] ? sock_alloc+0x19/0x60
[597276.142067]  [<ffffffff8141143e>] ? __sock_create+0x9e/0x210
[597276.142067]  [<ffffffff81411610>] ? sock_create+0x30/0x40
[597276.142067]  [<ffffffff8141182c>] ? sys_socket+0x3c/0x80
[597276.142067]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[597276.142067] Code: 00 00 00 01 74 05 e8 12 d1 d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 
[597276.142067] Call Trace:
[597276.142067]  [<ffffffff811a93fb>] ? new_inode+0x3b/0xb0
[597276.142067]  [<ffffffff81410fd9>] ? sock_alloc+0x19/0x60
[597276.142067]  [<ffffffff8141143e>] ? __sock_create+0x9e/0x210
[597276.142067]  [<ffffffff81411610>] ? sock_create+0x30/0x40
[597276.142067]  [<ffffffff8141182c>] ? sys_socket+0x3c/0x80
[597276.142067]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[597276.206006] BUG: soft lockup - CPU#2 stuck for 67s! [update_deploy_p:946201]
[597276.206009] Modules linked in: vzethdev simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota ip6table_mangle nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 
mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.206085] CPU 2 
[597276.206087] Modules linked in: vzethdev simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota ip6table_mangle nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.206133] 
[597276.206133] Pid: 946201, comm: update_deploy_p veid: 1035 Not tainted 2.6.32-042stab049.6 #1 042stab049_6 empty empty/S2927
[597276.206133] RIP: 0010:[<ffffffff814e191e>]  [<ffffffff814e191e>] _spin_lock+0x1e/0x30
[597276.206133] RSP: 0018:ffff8800ad635e98  EFLAGS: 00000297
[597276.206133] RAX: 0000000000003c94 RBX: ffff8800ad635e98 RCX: 0000000000000000
[597276.206133] RDX: 0000000000003c90 RSI: ffffffff817a1438 RDI: ffffffff823cca00
[597276.206133] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 0000000000000003
[597276.206133] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff81042b54
[597276.206133] R13: ffff8800ad635f18 R14: 0000000000000014 R15: ffff8803151d8c78
[597276.206133] FS:  00007f4f0dc176e0(0000) GS:ffff880230400000(0000) knlGS:00000000b774f6c0
[597276.206133] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[597276.206133] CR2: 00007f4f0d1407e0 CR3: 00000002310a1000 CR4: 00000000000006e0
[597276.206133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[597276.206133] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[597276.206133] Process update_deploy_p (pid: 946201, veid: 1035, threadinfo ffff8800ad634000, task ffff8803d03a4800)
[597276.206133] Stack:
[597276.206133]  ffff8800ad635ec8 ffffffff811a93fb 000000001d12a781 ffff880336840980

[597276.206133] <0> 00 ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.268102] CPU 3 
[597276.268103] Modules 0000(0000) knlGS:00000000b774f6c0
[597276.268192] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[597276.268192] CR2: 00007f246f7067e0 CR3: 0000000407807000 CR4: 00000000000006e0
[597276.268192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[597276.268192] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[597276.268192] Process update_deploy_p (pid: 946188, veid: 1063, threadinfo ffff8800c232e000, task ffff8804239049c0)
[597276.268192] Stack:
[597276.268192]  ffff8800c232fec8 ffffffff811a93fb 0000000009ffd874 ffff880225129200
[597276.268192] <0> 0000000000000000 0000000000000000 ffff8800c232fed8 ffffffff81410fd9
[597276.268192] <0> ffff8800c232ff28 ffffffff8141143e ffff880000000001 ffff880000000002
[597276.268192] Call Trace:
[597276.268192]  [<ffffffff811a93fb>] ? new_inode+0x3b/0xb0
[597276.268192]  [<ffffffff81410fd9>] ? sock_alloc+0x19/0x60
[597276.268192]  [<ffffffff8141143e>] ? __sock_create+0x9e/0x210
[597276.268192]  [<ffffffff81411610>] ? sock_crestring xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597276.079003] 
[597276.079003] Pid: 390166, comm: apache2 veid: 1025 Not tainted 2.6.32-042stab049.6 #1 042stab049_6 empty empty/S2927
[597276.079003] RIP: 0010:[<ffffffff814e191e>]  [<ffffffff814e191e>] _spin_lock+0x1e/0x30
[597276.079003] RSP: 0018:ffff8803cc2b df c9 c3 0f 1f 40 00 55 48 89 

[597276.079003] Call Trace:
[597276.079003]  [<ffffffff81261685>] ? _atomic_dec_and_lock+0x55/0x80
[597276.079003]  [<ffffffff811a77b0>] ? iput+0x30/0x70
[597276.079003]  [<ffffffff811a4028>] ? dentry_iput+0x98/0x110
[597276.079003]  [<ffffffff811a419c>] ? d_kill+0x3c/0x70
[597276.079003]  [<ffffffff811a5933>] ? dput+0xa3/0x1d0
[597276.079003]  [<ffffffff8118e9e1>] ? __fput+0x1d1/0x280
[597276.079003]  [<ffffffff8118eab5>] ? fput+0x25/0x30
[597276.079003]  [<fffffff
                          [597360.268180] RIP: 0010:[<ffffffff814e1921>]  [<ffffffff814e1921>] _spin_lock+0x21/0x30
[597360.268191] RSP: 0018:ffff8800c232fe98  EFLAGS: 00000293
[597360.268192] RAX: 0000000000003c93 RBX: ffff8800c232fe98 RCX: 0000000000000000
[597360.268192] RDX: 0000000000003c90 RSI: ffffffff817a1438 RDI: ffffffff823cca00
[597360.268192] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 000000000000000f
[597360.268192] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff81042b54
[597360.268192] R13: ffff8800c232ff18 R14: 0000000000000014 R15: ffff880336b5ce90
[597360.268192] FS:  00007f24701dd6e0(0000) GS:ffff880230480000(0000) knlGS:00000000b774f6c0
[597360.268192] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[597360.268192] CR2: 00007f246f7067e0 CR3: 0000000407807000 CR4: 00000000000006e0
[597360.268192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[597360.268192] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[597360.268192] Process update_deploy_p (pid: 946188, veid: 1063, threadinfo ffff8800c232e000, task ffff8804239049c0)

[597360.268192] Stack:
[597360.268192]  ffff8800c232fec8 ffffffff811a93fb 0000000009ffd874 ffff880225129200
[597360.268192] <0> 0000000000000000 0000000000000000 ffff8800c232fed8 ffffffff81410fd9
[597360.268192] <0> ffff8800c232ff28 ffffffff8141143e ffff880000000001 ffff880000000002
[597360.268192] Call Trace:
[597360.268192]  [<ffffffff811a93fb>] ? new_inode+0x3b/0xb0
[597360.268192]  [<ffffffff81410fd9>] ? sock_alloc+0x19/0x60
[597360.268192]  [<ffffffff8141143e>] ? __sock_create+0x9e/0x210
[597360.268192]  [<ffffffff81411610>] ? sock_create+0x30/0x40
[597360.268192]  [<ffffffff8141182c>] ? sys_socket+0x3c/0x80
[597360.268192]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[597360.268192] Code: 01 74 05 e8 12 d1 d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f 
[597360.268192] Call Trace:
[597360.268192]  [<ffffffff811a93fb>] ? new_inode+0x3b/0xb0
[597360.268192]  [<ffffffff81410fd9>] ? sock_alloc+0x19/0x60
[597360.268192]  [<ffffffff8141143e>] ? __sock_create+0x9e/0x210
[597360.268192]  [<ffffffff81411610>] ? sock_create+0x30/0x40
[597360.268192]  [<ffffffff8141182c>] ? sys_socket+0x3c/0x80
[597360.268192]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[597360.079003]  xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597360.079003] CPU 0 

[597360.079003] Modules linked in: vzethdev simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota ip6table_mangle nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[597360.079003] 
[597360.079003] Pid: 390166, comm: apache2 veid: 1025 Not tainted 2.6.32-042stab049.6 #1 042stab049_6 empty empty/S2927
[597360.079003] RIP: 0010:[<ffffffff814e191e>]  [<ffffffff814e191e>] _spin_lock+0x1e/0x30
[597360.079003] RSP: 0018:ffff8803cc201df8  EFLAGS: 00000297
[597360.079003] RAX: 0000000000003c91 RBX: ffff8803cc201df8 RCX: 0000000000000002
[597360.079003] RDX: 0000000000003c90 RSI: ffffffff823cca00 RDI: ffffffff823cca00
[597360.079003] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 0000000000000000
[597360.079003] R10: 0000000000000000 R11: 0000000000000206 R12: ffff8802243416b0
[597360.079003] R13: ffff8802243416b0 R14: ffff8802243416b0 R15: ffff8802243416b0
[597360.079003] FS:  000000004d377960(0063) GS:ffff880028200000(0000) knlGS:00000000b774f6c0
[597360.079003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[597360.079003] CR2: 000000004d36cf90 CR3: 0000000111323000 CR4: 00000000000006f0
[597360.079003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[597360.079003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[597360.079003] Process apache2 (pid: 390166, veid: 1025, threadinfo ffff8803cc200000, task ffff880329357410)
[597360.079003] Stack:
[597360.079003]  ffff8803cc201e28 ffffffff81261685 ffff8803cc201e08 ffffffff823cca00
[597360.079003] <0> ffff880012918150 ffff8801862a8790 ffff8803cc201e48 ffffffff811a77b0
[597360.079003] <0> ffff8803cc201e48 ffff880012918150 ffff8803cc201e68 ffffffff811a4028

[597360.079003] Call Trace:
[597360.079003]  [<ffffffff81261685>] ? _atomic_dec_and_lock+0x55/0x80
[597360.079003]  [<ffffffff811a77b0>] ? iput+0x30/0x70
[597360.079003]  [<ffffffff811a4028>] ? dentry_iput+0x98/0x110
[597360.079003]  [<ffffffff811a419c>] ? d_kill+0x3c/0x70
[597360.079003]  [<ffffffff811a5933>] ? dput+0xa3/0x1d0
[597360.079003]  [<ffffffff8118e9e1>] ? __fput+0x1d1/0x280
[597360.079003]  [<ffffffff8118eab5>] ? fput+0x25/0x30
[597360.079003]  [<ffffffff8118a01d>] ? filp_close+0x5d/0x90
[597360.079003]  [<ffffffff8118a0f5>] ? sys_close+0xa5/0x100
[597360.079003]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[597360.079003] Code: 00 00 00 01 74 05 e8 12 d1 d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 
[597360.079003] Call Trace:
[597360.079003]  [<ffffffff81261685>] ? _atomic_dec_and_lock+0x55/0x80
[597360.079003]  [<ffffffff811a77b0>] ? iput+0x30/0x70
[597360.079003]  [<ffffffff811a4028>] ? dentry_iput+0x98/0x110
[597360.079003]  [<ffffffff811a419c>] ? d_kill+0x3c/0x70
[597360.079003]  [<ffffffff811a5933>] ? dput+0xa3/0x1d0
[597360.079003]  [<ffffffff8118e9e1>] ? __fput+0x1d1/0x280
[597360.079003]  [<ffffffff8118eab5>] ? fput+0x25/0x30
[597360.079003]  [<ffffffff8118a01d>] ? filp_close+0x5d/0x90
[597360.079003]  [<ffffffff8118a0f5>] ? sys_close+0xa5/0x100

[597360.079003]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
Comment 1 Konstantin Khorenko 2012-05-03 04:55:48 EDT
Hello Dave,

first of all - it's not a kernel panic, it's soft lockup detector.
It just said that a process on a CPU could not get a lock in 67 secs.

This might mean that we've lost the lock release somewhere, but this also may mean that this lock is just so highly congested that this process just did not get that lock in a minute, but will get it later.

i understand that even latter case means the node is almost stuck but unfortunately this detector does not provide any information about the reason of such a situation.

So, in order to investigate this, we'd really appreciate if you
1) configure a serial console
2) update the kernel up to 55.6 kernel
3) when you face this issue next time, please press the following magic keys:
   a) Alt+SysRQ+m
   b) Alt+SysRQ+p
   c) Alt+SysRQ+p
   d) Alt+SysRQ+t
   e) Alt+SysRQ+t

Before you press next combination, please wait till the output of the previous magic key is over.

In case you are unable to configure a serial console, please try network console or configure kdump. In the latter case do only one press for each key and after that generate a crashdump with help of Alt+SysRQ+c.

Thank you in advance.
Comment 2 Dave 2012-05-04 17:12:44 EDT
Ran into this again today - I know it may not be helpful, but I have attached the console logs. I have the serial console hooked up to a network terminal server, however none of the Magic SysRq commands were responding. I also had my NOC personnel hook up a kayboard and press Alt+PrintScr+<letter> but nothing happened.

Is there a proper way to have the kernel produce a crashdump or panic on softlock so that we can get some useful info the next time this happens?

Also, this is a production machine. Is it any more dangerous to run the testing kernel while we troubleshoot this (beyond what consequences there are from what I am currently having to do, which is hard reboot the server on soft lock). I just don't want to be running a testing kernel that has any serious issues. If you still recommend running the testing kernel, should I install 042stab055.7?

Thank you!
--
Dave
Comment 3 Dave 2012-05-04 17:14:04 EDT
Created attachment 1772 [details]
soft lockup console-May 4 2012
Comment 4 Dave 2012-05-04 17:26:45 EDT
According to https://bugzilla.redhat.com/show_bug.cgi?id=445422
I can add soft_lockup=1 to the kernel boot parameters to generate a panic on soft lockup.

Do I need to make additional configurations in order to generate a crash dump on panic? Any information you can provide so I can get a proper crash dump to you would be greatly appreciated.

Thank you,
--
Dave
Comment 5 Konstantin Khlebnikov 2012-05-05 12:22:08 EDT
http://download.openvz.org/~khlebnikov/vzkernel-2.6.32-042stab055.7.debug.x86_64.rpm

Debug kernel with CONFIG_DEBUG_SPINLOCK_SLEEP=y and more verbose hung-task detector.
Comment 6 Dave 2012-05-05 12:42:01 EDT
It says I need kernel-firmware. I have vzkernel-firmware installed, but it is for the current stable kernel that I am running. Should I force install of your debug kernel, or try to upgrade vzkernel-firmware from the testing branch for 55.7 kernel?

[root@ak1-vz3 ~]# rpm -ivh vzkernel-2.6.32-042stab055.7.debug.x86_64.rpm 
error: Failed dependencies:
        kernel-firmware >= 2.6.32-220.13.1.el6 is needed by vzkernel-2.6.32-042stab055.7.debug.x86_64
[root@ak1-vz3 ~]# yum install http://download.openvz.org/~khlebnikov/vzkernel-2.6.32-042stab055.7.debug.x86_64.rpm^C
[root@ak1-vz3 ~]# yum install kernel-firmware
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: mirror.spro.net
 * extras: mirror.raystedman.net
 * openvz-kernel-rhel6-testing: mirror.ubiquityservers.com
 * openvz-utils: mirror.ubiquityservers.com
 * updates: mirror.raystedman.net
base                                                     | 3.7 kB     00:00     
extras                                                   | 3.5 kB     00:00     
openvz-kernel-rhel6-testing                              |  951 B     00:00     
openvz-utils                                             |  951 B     00:00     
updates                                                  | 3.5 kB     00:00     
updates/primary_db                                       | 3.0 MB     00:01     
Setting up Install Process
Nothing to do


[root@ak1-vz3 ~]# yum search kernel-firmware
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: mirror.spro.net
 * extras: mirror.raystedman.net
 * openvz-kernel-rhel6-testing: mirror.ubiquityservers.com
 * openvz-utils: mirror.ubiquityservers.com
 * updates: mirror.raystedman.net
========================= N/S Matched: kernel-firmware =========================
kernel-firmware.noarch : Firmware files used by the Linux kernel
vzkernel-firmware.noarch : Firmware files used by the Linux kernel

  Name and summary matches only, use "search all" for everything.


[root@ak1-vz3 ~]# rpm -qa | grep firmware
ql2400-firmware-5.06.02-1.el6.noarch
ql2100-firmware-1.19.38-3.1.el6.noarch
bfa-firmware-3.0.0.0-1.el6.noarch
ql2500-firmware-5.06.02-1.el6.noarch
vzkernel-firmware-2.6.32-042stab053.5.noarch
aic94xx-firmware-30-2.el6.noarch
ql2200-firmware-2.02.08-3.1.el6.noarch
ql23xx-firmware-3.03.27-3.1.el6.noarch


[root@ak1-vz3 ~]# yum update kernel-firmware
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: mirror.spro.net
 * extras: mirror.raystedman.net
 * openvz-kernel-rhel6-testing: mirror.ubiquityservers.com
 * openvz-utils: mirror.ubiquityservers.com
 * updates: mirror.raystedman.net
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package vzkernel-firmware.noarch 0:2.6.32-042stab053.5 will be updated
---> Package vzkernel-firmware.noarch 0:2.6.32-042stab055.7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package           Arch   Version             Repository                   Size
================================================================================
Updating:
 vzkernel-firmware noarch 2.6.32-042stab055.7 openvz-kernel-rhel6-testing 6.9 M

Transaction Summary
================================================================================
Upgrade       1 Package(s)

Total download size: 6.9 M
Is this ok [y/N]: n
Exiting on user Command
Comment 7 Konstantin Khlebnikov 2012-05-05 12:53:02 EDT
you can use firmware from 055.7 or use "--force" it should work with old one
Comment 8 Orien 2012-05-06 01:52:54 EDT
(In reply to comment #4)
> According to https://bugzilla.redhat.com/show_bug.cgi?id=445422
> I can add soft_lockup=1 to the kernel boot parameters to generate a panic on
> soft lockup.
> 
> Do I need to make additional configurations in order to generate a crash
> dump on panic? Any information you can provide so I can get a proper crash
> dump to you would be greatly appreciated.
> 
> Thank you,
> --
> Dave

To make a soft lockup generate a panic, set kernel.softlockup_panic = 1.
Comment 9 Konstantin Khlebnikov 2012-05-08 03:15:13 EDT
*** Bug 2265 has been marked as a duplicate of this bug. ***
Comment 10 Dave 2012-05-08 13:15:45 EDT
Here's the output from a soft lockup under 042stab053.5 with kernel.softlockup_panic=1

CentOS release 6.2 (Final)
Kernel 2.6.32-042stab053.5 on an x86_64

ak1-vz3.aktiom.net login: [159384.078003] BUG: soft lockup - CPU#0 stuck for 67s! [cvs:769867]
[159384.078003] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy
[159384.141009] BUG: soft lockup - CPU#1 stuck for 68s! [php:770191]
[159384.141013] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sd_mod crc_t10dif sr_mod cdrom forcedeth sata_nv ata_generic pata_acpi pata_amd dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[159384.141067] CPU 1 
[159384.141067] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sd_mod crc_t10dif sr_mod cdrom forcedeth sata_nv ata_generic pata_acpi pata_amd dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[159384.141067] 
[159384.141067] Pid: 770191, comm: php veid: 1081 Not tainted 2.6.32-042stab053.5 #1 042stab053_5 empty empty/S2927
[159384.141067] RIP: 0010:[<ffffffff814e8cc8>]  [<ffffffff814e8cc8>] _spin_lock+0x18/0x30
[159384.141067] RSP: 0018:ffff8801000afe68  EFLAGS: 00000293
[159384.141067] RAX: 0000000000001775 RBX: ffff8801000afe68 RCX: 00007f8d06f49700
[159384.141067] RDX: 0000000000001773 RSI: ffffffff817a2b8e RDI: ffffffff823d29e0
[159384.141067] RBP: ffffffff8100bcce R08: 0000000000000000 R09: 0000000000000017
[159384.141067] R10: 00007f8d06f49700 R11: 0000000000000246 R12: 0000000000000009
[159384.141067] R13: ffff8801000afe58 R14: 0000000000000000 R15: ffff8801000afe58
[159384.141067] FS:  00007f8d06f49700(0000) GS:ffff880028280000(0000) knlGS:00000000b77516c0
[159384.141067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[159384.141067] CR2: 0000000001469140 CR3: 00000001c3b30000 CR4: 00000000000006e0
[159384.141067] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[159384.141067] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[159384.141067] Process php (pid: 770191, veid: 1081, threadinfo ffff8801000ae000, task ffff8801000bf110)
[159384.141067] Stack:
[159384.141067]  ffff8801000afe98 ffffffff811ad00b 0000000000000000 0000000000000000
[159384.141067] <0> 0000000000000000 0000000000b1dd71 ffff8801000afee8 ffffffff8119b545
[159384.141067] <0> 0000000000000000 ffffffff817aad21 722f7261762f0001 6c7173796d2f6e75
[159384.141067] Call Trace:
[159384.141067]  [<ffffffff811ad00b>] ? new_inode+0x3b/0xb0
[159384.141067]  [<ffffffff8119b545>] ? create_write_pipe+0x35/0x1f0
[159384.141067]  [<ffffffff8119b757>] ? do_pipe_flags+0x57/0x130
[159384.141067]  [<ffffffff810e410a>] ? audit_syscall_entry+0x26a/0x290
[159384.141067]  [<ffffffff8119b854>] ? sys_pipe2+0x24/0x80
[159384.141067]  [<ffffffff8119b8c0>] ? sys_pipe+0x10/0x20
[159384.141067]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[159384.141067] Code: 66 66 90 f0 81 2f 00 00 00 01 74 05 e8 d2 cd d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 <39> c2 74 0e f3 90 0f b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 
[159384.141067] Call Trace:
[159384.141067]  [<ffffffff811ad00b>] ? new_inode+0x3b/0xb0
[159384.141067]  [<ffffffff8119b545>] ? create_write_pipe+0x35/0x1f0
[159384.141067]  [<ffffffff8119b757>] ? do_pipe_flags+0x57/0x130
[159384.141067]  [<ffffffff810e410a>] ? audit_syscall_entry+0x26a/0x290
[159384.141067]  [<ffffffff8119b854>] ? sys_pipe2+0x24/0x80
[159384.141067]  [<ffffffff8119b8c0>] ? sys_pipe+0x10/0x20
[159384.141067]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[159384.141067] Kernel panic - not syncing: softlockup: hung tasks
[159384.141067] Pid: 770191, comm: php veid: 1081 Not tainted 2.6.32-042stab053.5 #1
[159384.141067] Call Trace:
[159384.141067]  <IRQ>  [<ffffffff814e5bdd>] ? panic+0x78/0x143
[159384.141067]  [<ffffffff810e8510>] ? watchdog_timer_fn+0x0/0x1d0
[159384.141067]  [<ffffffff810e86cd>] ? watchdog_timer_fn+0x1bd/0x1d0
[159384.141067]  [<ffffffff81099cae>] ? __run_hrtimer+0x8e/0x1d0
[159384.141067]  [<ffffffff8109a086>] ? hrtimer_interrupt+0xe6/0x250
[159384.141067]  [<ffffffff814ee79b>] ? smp_apic_timer_interrupt+0x6b/0x9b
[159384.141067]  [<ffffffff8100bcd3>] ? apic_timer_interrupt+0x13/0x20
[159384.141067]  <EOI>  [<ffffffff814e8cc8>] ? _spin_lock+0x18/0x30
[159384.141067]  [<ffffffff811ad00b>] ? new_inode+0x3b/0xb0
[159384.141067]  [<ffffffff8119b545>] ? create_write_pipe+0x35/0x1f0
[159384.141067]  [<ffffffff8119b757>] ? do_pipe_flags+0x57/0x130
[159384.141067]  [<ffffffff810e410a>] ? audit_syscall_entry+0x26a/0x290
[159384.141067]  [<ffffffff8119b854>] ? sys_pipe2+0x24/0x80
[159384.141067]  [<ffffffff8119b8c0>] ? sys_pipe+0x10/0x20
[159384.141067]  [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
[159384.078003]  xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sd_mod crc_t10dif sr_mod cdrom forcedeth sata_nv ata_generic pata_acpi pata_amd dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[159384.078003] CPU 0 
[159384.078003] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sd_mod crc_t10dif sr_mod cdrom forcedeth sata_nv ata_generic pata_acpi pata_amd dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[159384.078003] 
[159384.078003] Pid: 769867, comm: cvs veid: 224 Not tainted 2.6.32-042stab053.5 #1 042stab053_5 empty empty/S2927
[159384.078003] RIP: 0010:[<ffffffff814e8cce>]  [<ffffffff814e8cce>] _spin_lock+0x1e/0x30
[159384.078003] RSP: 0000:ffff8802548c3a68  EFLAGS: 00000297
[159384.078003] RAX: 0000000000001774 RBX: ffff8802548c3a68 RCX: 0000000000000001
[159384.078003] RDX: 0000000000001773 RSI: ffffffff823d29e0 RDI: ffffffff823d29e0
[159384.078003] RBP: ffffffff8100bcce R08: ffff8800b6206798 R09: 000000000000000b
[159384.078003] R10: ffff880164c410c0 R11: 000000000a04f2a8 R12: ffffffff81277d3c
[159384.078003] R13: ffff8802548c3a18 R14: ffff880000000002 R15: ffff880346a1b9d8
[159384.078003] FS:  00007fb04bd06700(0000) GS:ffff880028200000(0000) knlGS:00000000b77cb6c0
[159384.078003] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[159384.078003] CR2: 00000000b761e004 CR3: 000000026b40e000 CR4: 00000000000006f0
[159384.078003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[159384.078003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[159384.078003] Process cvs (pid: 769867, veid: 224, threadinfo ffff8802548c2000, task ffff880346a1b110)
[159384.078003] Stack:
[159384.078003]  ffff8802548c3a98 ffffffff812686a5 000000000000038f ffffffff823d29e0
[159384.078003] <0> ffff880164c41320 ffff8803aa9e3b10 ffff8802548c3ab8 ffffffff811ab3b0
[159384.078003] <0> ffff8802548c3af8 ffff8800b6206738 ffff8802548c3ad8 ffffffff811a7c28
[159384.078003] Call Trace:
[159384.078003]  [<ffffffff812686a5>] ? _atomic_dec_and_lock+0x55/0x80
[159384.078003]  [<ffffffff811ab3b0>] ? iput+0x30/0x70
[159384.078003]  [<ffffffff811a7c28>] ? dentry_iput+0x98/0x110
[159384.078003]  [<ffffffff811a7d9c>] ? d_kill+0x3c/0x70
[159384.078003]  [<ffffffff811a7e81>] ? prune_one_dentry+0xb1/0xd0
[159384.078003]  [<ffffffff811a8010>] ? __shrink_dcache_ub+0x170/0x190
[159384.078003]  [<ffffffff811a8063>] ? shrink_dcache_ub+0x33/0x70
[159384.078003]  [<ffffffff810abf9e>] ? ub_dcache_shrink+0x3e/0x50
[159384.078003]  [<ffffffff810ac046>] ? ub_dcache_charge+0x96/0x110
[159384.078003]  [<ffffffff811a9c55>] ? d_alloc+0x45/0x270
[159384.078003]  [<ffffffff8119f4fd>] ? do_lookup+0x18d/0x210
[159384.078003]  [<ffffffff8119ff29>] ? __link_path_walk+0x859/0x1060
[159384.078003]  [<ffffffff81147dd0>] ? vma_prio_tree_insert+0x30/0x50
[159384.078003]  [<ffffffff811a096a>] ? path_walk+0x6a/0xe0
[159384.078003]  [<ffffffff811a0b3b>] ? do_path_lookup+0x5b/0xa0
[159384.078003]  [<ffffffff811a17a9>] ? user_path_at+0x59/0xa0
[159384.078003]  [<ffffffff81042b54>] ? __do_page_fault+0x1e4/0x480
[159384.078003]  [<ffffffff81195cd6>] ? vfs_fstatat+0x46/0x80
[159384.078003]  [<ffffffff81195d2e>] ? vfs_lstat+0x1e/0x20
[159384.078003]  [<ffffffff8104a464>] ? sys32_lstat64+0x24/0x40
[159384.078003]  [<ffffffff810177c5>] ? syscall_trace_enter+0x1b5/0x1e0
[159384.078003]  [<ffffffff8104963a>] ? ia32_tracesys+0x48/0x8c
[159384.078003]  [<ffffffff810495d3>] ? ia32_sysret+0x0/0x5
[159384.078003] Code: 00 00 00 01 74 05 e8 d2 cd d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 
[159384.078003] Call Trace:
[159384.078003]  [<ffffffff812686a5>] ? _atomic_dec_and_lock+0x55/0x80
[159384.078003]  [<ffffffff811ab3b0>] ? iput+0x30/0x70
[159384.078003]  [<ffffffff811a7c28>] ? dentry_iput+0x98/0x110
[159384.078003]  [<ffffffff811a7d9c>] ? d_kill+0x3c/0x70
[159384.078003]  [<ffffffff811a7e81>] ? prune_one_dentry+0xb1/0xd0
[159384.078003]  [<ffffffff811a8010>] ? __shrink_dcache_ub+0x170/0x190
[159384.078003]  [<ffffffff811a8063>] ? shrink_dcache_ub+0x33/0x70
[159384.078003]  [<ffffffff810abf9e>] ? ub_dcache_shrink+0x3e/0x50
[159384.078003]  [<ffffffff810ac046>] ? ub_dcache_charge+0x96/0x110
[159384.078003]  [<ffffffff811a9c55>] ? d_alloc+0x45/0x270
[159384.078003]  [<ffffffff8119f4fd>] ? do_lookup+0x18d/0x210
[159384.078003]  [<ffffffff8119ff29>] ? __link_path_walk+0x859/0x1060
[159384.078003]  [<ffffffff81147dd0>] ? vma_prio_tree_insert+0x30/0x50
[159384.078003]  [<ffffffff811a096a>] ? path_walk+0x6a/0xe0
[159384.078003]  [<ffffffff811a0b3b>] ? do_path_lookup+0x5b/0xa0
[159384.078003]  [<ffffffff811a17a9>] ? user_path_at+0x59/0xa0
[159384.078003]  [<ffffffff81042b54>] ? __do_page_fault+0x1e4/0x480
[159384.078003]  [<ffffffff81195cd6>] ? vfs_fstatat+0x46/0x80
[159384.078003]  [<ffffffff81195d2e>] ? vfs_lstat+0x1e/0x20
[159384.078003]  [<ffffffff8104a464>] ? sys32_lstat64+0x24/0x40
[159384.078003]  [<ffffffff810177c5>] ? syscall_trace_enter+0x1b5/0x1e0
[159384.078003]  [<ffffffff8104963a>] ? ia32_tracesys+0x48/0x8c
[159384.078003]  [<ffffffff810495d3>] ? ia32_sysret+0x0/0x5
[159384.078003] Kernel panic - not syncing: softlockup: hung tasks
[159384.078003] Pid: 769867, comm: cvs veid: 224 Not tainted 2.6.32-042stab053.5 #1
[159384.078003] Call Trace:
[159384.078003]  <IRQ>  [<ffffffff814e5bdd>] ? panic+0x78/0x143
[159384.078003]  [<ffffffff810e8510>] ? watchdog_timer_fn+0x0/0x1d0
[159384.078003]  [<ffffffff810e86cd>] ? watchdog_timer_fn+0x1bd/0x1d0
[159384.078003]  [<ffffffff81099cae>] ? __run_hrtimer+0x8e/0x1d0
[159384.078003]  [<ffffffff81074609>] ? __do_softirq+0x189/0x260
[159384.078003]  [<ffffffff8109a086>] ? hrtimer_interrupt+0xe6/0x250
[159384.078003]  [<ffffffff814ee79b>] ? smp_apic_timer_interrupt+0x6b/0x9b
[159384.078003]  [<ffffffff8100bcd3>] ? apic_timer_interrupt+0x13/0x20
[159384.078003]  <EOI>  [<ffffffff814e8cce>] ? _spin_lock+0x1e/0x30
[159384.078003]  [<ffffffff812686a5>] ? _atomic_dec_and_lock+0x55/0x80
[159384.078003]  [<ffffffff811ab3b0>] ? iput+0x30/0x70
[159384.078003]  [<ffffffff811a7c28>] ? dentry_iput+0x98/0x110
[159384.078003]  [<ffffffff811a7d9c>] ? d_kill+0x3c/0x70
[159384.078003]  [<ffffffff811a7e81>] ? prune_one_dentry+0xb1/0xd0
[159384.078003]  [<ffffffff811a8010>] ? __shrink_dcache_ub+0x170/0x190
[159384.078003]  [<ffffffff811a8063>] ? shrink_dcache_ub+0x33/0x70
[159384.078003]  [<ffffffff810abf9e>] ? ub_dcache_shrink+0x3e/0x50
[159384.078003]  [<ffffffff810ac046>] ? ub_dcache_charge+0x96/0x110
[159384.078003]  [<ffffffff811a9c55>] ? d_alloc+0x45/0x270
[159384.078003]  [<ffffffff8119f4fd>] ? do_lookup+0x18d/0x210
[159384.078003]  [<ffffffff8119ff29>] ? __link_path_walk+0x859/0x1060
[159384.078003]  [<ffffffff81147dd0>] ? vma_prio_tree_insert+0x30/0x50
[159384.078003]  [<ffffffff811a096a>] ? path_walk+0x6a/0xe0
[159384.078003]  [<ffffffff811a0b3b>] ? do_path_lookup+0x5b/0xa0
[159384.078003]  [<ffffffff811a17a9>] ? user_path_at+0x59/0xa0
[159384.078003]  [<ffffffff81042b54>] ? __do_page_fault+0x1e4/0x480
[159384.078003]  [<ffffffff81195cd6>] ? vfs_fstatat+0x46/0x80
[159384.078003]  [<ffffffff81195d2e>] ? vfs_lstat+0x1e/0x20
[159384.078003]  [<ffffffff8104a464>] ? sys32_lstat64+0x24/0x40
[159384.078003]  [<ffffffff810177c5>] ? syscall_trace_enter+0x1b5/0x1e0
[159384.078003]  [<ffffffff8104963a>] ? ia32_tracesys+0x48/0x8c
[159384.078003]  [<ffffffff810495d3>] ? ia32_sysret+0x0/0x5
Comment 11 Dave 2012-05-08 13:31:34 EDT
When using 042stab055.7.debug, the server was up for less than an hour before the lockup happened. Please see the attached console log - it contains all console output from boot until lockup.

For bug searchability:

Looks like it may be a problem with kernel/rwsem.c:21

[  401.310343] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[  401.318038] in_atomic(): 1, irqs_disabled(): 0, pid: 3766, name: quotaon
[  401.324744] Pid: 3766, comm: quotaon veid: 317 Not tainted 2.6.32-042stab055.7.debug #1
[  401.332738] Call Trace:
[  401.335206]  [<ffffffff8105593e>] ? __might_sleep+0xfe/0x120
[  401.340868]  [<ffffffff81509174>] ? down_read+0x24/0x40
[  401.346096]  [<ffffffff811a0c83>] ? user_get_super+0x83/0xf0
[  401.351761]  [<ffffffffa0510871>] ? vzdq_aquotq_lookset+0xb1/0x160 [vzdquota]
[  401.358895]  [<ffffffff811bc289>] ? iget5_locked+0x139/0x1f0
[  401.364556]  [<ffffffffa05107c0>] ? vzdq_aquotq_lookset+0x0/0x160 [vzdquota]
[  401.371602]  [<ffffffffa0510742>] ? vzdq_aquotq_lookup+0xe2/0x160 [vzdquota]
[  401.378648]  [<ffffffff811adabc>] ? do_lookup+0x1ac/0x210
[  401.384047]  [<ffffffff811ae50e>] ? __link_path_walk+0x89e/0x10c0
[  401.390145]  [<ffffffff811aead7>] ? __link_path_walk+0xe67/0x10c0
[  401.396242]  [<ffffffff811aef6a>] ? path_walk+0x6a/0xe0
[  401.401466]  [<ffffffff811af13b>] ? do_path_lookup+0x5b/0xa0
[  401.407124]  [<ffffffff811b006b>] ? do_filp_open+0xfb/0xca0
[  401.412699]  [<ffffffff81043f34>] ? __do_page_fault+0x264/0x4f0
[  401.418619]  [<ffffffff8127da41>] ? cpumask_any_but+0x31/0x50
[  401.424365]  [<ffffffff8128b76d>] ? strncpy_from_user+0x5d/0xa0
[  401.430285]  [<ffffffff811bd8a2>] ? alloc_fd+0x92/0x170
[  401.435512]  [<ffffffff8119b5d9>] ? do_sys_open+0x69/0x140
[  401.440999]  [<ffffffff811f214a>] ? compat_sys_open+0x1a/0x20
[  401.446744]  [<ffffffff8104abd3>] ? ia32_sysret+0x0/0x5
[  401.451969] BUG: scheduling while atomic: quotaon/3766/0x10000002
[  401.458063] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc vzdquota nf_nat_ftp vzevent vznetdev vzmon vzdev xt_sctp xt_realm xt_NOTRACK xt_dccp xt_CONNMARK xt_connmark xt_connbytes xt_CLASSIFY nfnetlink_queue nfnetlink_log xt_recent xt_DSCP nf_nat_irc xt_conntrack ipt_LOG nf_conntrack_irc nf_conntrack_ftp xt_hl xt_tcpmss xt_TCPMSS xt_dscp ipt_REDIRECT ipt_REJECT tun nf_conntrack_netbios_ns nf_conntrack_broadcast iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_string xt_statistic xt_SECMARK xt_quota xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_esp xt_CONNSECMARK xt_comment nfnetlink ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt serio_raw k8temp amd64_edac_mod edac_core edac_mce_amd sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sd_mod crc_t10dif sr_mod cdrom forcedeth pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
[  401.578883] Pid: 3766, comm: quotaon veid: 317 Not tainted 2.6.32-042stab055.7.debug #1
[  401.586877] Call Trace:
[  401.589329]  [<ffffffff81058a46>] ? __schedule_bug+0x66/0x70
[  401.594989]  [<ffffffff8150745d>] ? thread_return+0x26a/0x83d
[  401.600737]  [<ffffffff81064b9a>] ? __cond_resched+0x2a/0x40
[  401.606393]  [<ffffffff81507d10>] ? _cond_resched+0x30/0x40
[  401.611966]  [<ffffffff81509179>] ? down_read+0x29/0x40
[  401.617197]  [<ffffffff811a0c83>] ? user_get_super+0x83/0xf0
[  401.622858]  [<ffffffffa0510871>] ? vzdq_aquotq_lookset+0xb1/0x160 [vzdquota]
[  401.629987]  [<ffffffff811bc289>] ? iget5_locked+0x139/0x1f0
[  401.635648]  [<ffffffffa05107c0>] ? vzdq_aquotq_lookset+0x0/0x160 [vzdquota]
[  401.642691]  [<ffffffffa0510742>] ? vzdq_aquotq_lookup+0xe2/0x160 [vzdquota]
[  401.649736]  [<ffffffff811adabc>] ? do_lookup+0x1ac/0x210
[  401.655135]  [<ffffffff811ae50e>] ? __link_path_walk+0x89e/0x10c0
[  401.661227]  [<ffffffff811aead7>] ? __link_path_walk+0xe67/0x10c0
[  401.667318]  [<ffffffff811aef6a>] ? path_walk+0x6a/0xe0
[  401.672544]  [<ffffffff811af13b>] ? do_path_lookup+0x5b/0xa0
[  401.678209]  [<ffffffff811b006b>] ? do_filp_open+0xfb/0xca0
[  401.683781]  [<ffffffff81043f34>] ? __do_page_fault+0x264/0x4f0
[  401.689698]  [<ffffffff8127da41>] ? cpumask_any_but+0x31/0x50
[  401.695448]  [<ffffffff8128b76d>] ? strncpy_from_user+0x5d/0xa0
[  401.701368]  [<ffffffff811bd8a2>] ? alloc_fd+0x92/0x170
[  401.706595]  [<ffffffff8119b5d9>] ? do_sys_open+0x69/0x140
[  401.712081]  [<ffffffff811f214a>] ? compat_sys_open+0x1a/0x20
[  401.717826]  [<ffffffff8104abd3>] ? ia32_sysret+0x0/0x5
Comment 12 Dave 2012-05-08 13:32:40 EDT
Created attachment 1775 [details]
042stab055.7.debug-lockup - May 8 2012
Comment 13 Konstantin Khlebnikov 2012-05-09 02:36:24 EDT
Created attachment 1776 [details]
vzdq: fix sleep under inode_lock in vzdq_aquotq_lookset()

Thanks a lot! It really looks like a root of all evil.
Here a patch for this bug, I'll (or somebody else) build kernel with this tomorrow.
Comment 14 Yasuyuki Nakamura 2012-05-09 03:16:23 EDT
(In reply to comment #13)
> Created attachment 1776 [details]
> vzdq: fix sleep under inode_lock in vzdq_aquotq_lookset()
> 
> Thanks a lot! It really looks like a root of all evil.
> Here a patch for this bug, I'll (or somebody else) build kernel with this
> tomorrow.

same reason with Bug2262?
How do I know whether same or not through the log on my situation?
Comment 15 Konstantin Khlebnikov 2012-05-09 04:23:16 EDT
(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 1776 [details]
> > vzdq: fix sleep under inode_lock in vzdq_aquotq_lookset()
> > 
> > Thanks a lot! It really looks like a root of all evil.
> > Here a patch for this bug, I'll (or somebody else) build kernel with this
> > tomorrow.
> 
> same reason with Bug2262?
> How do I know whether same or not through the log on my situation?

Did you see "BUG: sleeping function called from invalid context" in your log?
it must be first warning staring from kernel boot.
Comment 16 Yasuyuki Nakamura 2012-05-09 04:57:28 EDT
Created attachment 1777 [details]
console in another environment

(In reply to comment #15)
> Did you see "BUG: sleeping function called from invalid context" in your log?
> it must be first warning staring from kernel boot.

oops... I've not take a log from starting.
but I can see that message (see attachement).
Comment 17 Dave 2012-05-09 12:17:47 EDT
Thank you for your work on identifying the issue. Please let me know when code review is complete and a new kernel is ready. I will be eager to test it (I had another lock up this morning, but did not have kernel.softlockup_panic=1, so no additional debug info).
Comment 18 Konstantin Khlebnikov 2012-05-10 03:58:54 EDT
kernel with patch "vzdq: fix sleep under inode_lock in vzdq_aquotq_lookset()"

http://download.openvz.org/~khlebnikov/vzkernel-2.6.32-042stab055.7.debug.2.x86_64.rpm
Comment 19 Dave 2012-05-11 18:59:12 EDT
So far so good. Nothing has shown up on console that would indicate any problems. I will let it run for a few more days before I can confirm the issue has been resolved.
Comment 20 Konstantin Khorenko 2012-05-12 06:27:25 EDT
2.6.32-042stab055.10 kernel has the fix from comment 13.
Comment 21 Francisco Dias 2012-05-15 15:15:12 EDT
Any update on if anyone has seen any crashes yet?

I've pushed this patch to a couple production boxes and haven't seen anything yet, but figured i'd check in :)

Francisco
Comment 22 Dave 2012-05-15 21:15:16 EDT
vzkernel-2.6.32-042stab055.7.debug.2.x86_64.rpm has been solid with no debug output to console for 4 days. I usually see the soft lockup within 6-7 days at the longest. I am 99% sure this kernel fixed it - I will definitely report any further issues.
Comment 23 Francisco Dias 2012-05-15 21:37:38 EDT
Awesome to hear that :)

All goes well this will address a bug report I had in December regarding this same thing.

Francisco
Comment 24 Konstantin Khlebnikov 2012-05-16 01:11:34 EDT
fixed in 042stab055.10
Comment 25 Sergey Bronnikov 2015-03-17 04:00:40 EDT
Bug was fixed more than one year ago and there were no complains from reporter after fix. We treat bug as fixed and close bug.