Bug 1757 - UBCs contain CTIDs that are deleted unmounted down, causes vzmemcheck to return incorrect data
Status: CLOSED FIXED
Product: OpenVZ
Classification: Unclassified
Component: vzctl
Version: zzz-rhel5-2.6.18_028stabXXX
Hardware: x86_64 (AMD64) RHEL/CentOS 5
Importance: P2 major
Assigned To: Kir Kolyshkin
Reported: 2011-02-02 00:28 EST by Pete de Zwart
Modified: 2015-04-01 14:43 EDT
Description Pete de Zwart 2011-02-02 00:28:22 EST
Calculating utilisation, commitment and limit indices for our OpenVZ HNs gives inaccurate results, because vzmemcheck includes containers that have been deleted:

pdzwart@atlassian45:~/OpenVZ[23:16:08](0,0)$ sudo /usr/sbin/vzlist
Container(s) not found
pdzwart@atlassian45:~/OpenVZ[23:16:32](0,1)$ sudo /usr/sbin/vzmemcheck -v
Output values in %
veid        LowMem  LowMem     RAM MemSwap MemSwap   Alloc   Alloc   Alloc
              util  commit    util    util  commit    util  commit   limit
1031          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1005          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1010          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1017          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1016          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1015          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1014          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1013          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1012          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1009          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
1008          0.00   13.13    0.00    0.00   12.70    0.00   12.70   57.78
-------------------------------------------------------------------------
Summary:      0.00  144.38    0.00    0.00  139.75    0.00  139.75  635.53
pdzwart@atlassian45:~/OpenVZ[23:16:35](0,0)$ sudo /usr/sbin/vzctl status 1031
CTID 1031 deleted unmounted down
pdzwart@atlassian45:~/OpenVZ[23:21:26](0,0)$ 
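
To illustrate the effect on the summary row, here is a minimal Python sketch that recomputes the column totals from `vzmemcheck -v` output while counting only containers known to be running. It assumes the summary is roughly the column-wise sum of the per-container rows; the function names and the `running_ctids` parameter are hypothetical and not part of vzctl.

```python
def parse_vzmemcheck(text):
    """Return {ctid: [eight percentage columns]} from `vzmemcheck -v` output."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are a numeric CTID followed by eight numeric columns;
        # the header and "Summary:" lines do not start with a number.
        if len(fields) == 9 and fields[0].isdigit():
            rows[int(fields[0])] = [float(f) for f in fields[1:]]
    return rows


def summarise(rows, running_ctids):
    """Sum each column over the containers that are actually running."""
    totals = [0.0] * 8
    for ctid, values in rows.items():
        if ctid in running_ctids:
            totals = [t + v for t, v in zip(totals, values)]
    return [round(t, 2) for t in totals]
```

With the output above and an empty set of running containers (vzlist found none), every total would come out as zero instead of the 144.38% LowMem commitment that vzmemcheck reports.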

There are also entries in both /proc/user_beancounters and /proc/bc/${CTID}/ that show allocations held:

pdzwart@atlassian45:~/OpenVZ[23:24:56](0,0)$ export CTID=1010
pdzwart@atlassian45:~/OpenVZ[23:25:44](0,0)$ sudo egrep -A23 "${CTID}:" /proc/user_beancounters
     1010:  kmemsize                    45366             20936957            841784627            925963089                    0
            lockedpages                     0                    0                41102                41102                    0
            privvmpages                     0              1438840              4476066              4923672                    0
            shmpages                        0                 3872               447606               447606                    0
            dummy                           0                    0                    0                    0                    0
            numproc                         0                  321                20550                20550                    0
            physpages                       0               685472                    0  9223372036854775807                    0
            vmguarpages                     0                    0               746011  9223372036854775807                    0
            oomguarpages                    0               685472               746011  9223372036854775807                    0
            numtcpsock                      0                  430                20550                20550                    0
            numflock                        0                   10                 1000                 1100                    0
            numpty                          0                    6                  512                  512                    0
            numsiginfo                      0                   11                 1024                 1024                    0
            tcpsndbuf                  156608             30611240            196422075            280594875                    0
            tcprcvbuf                       0            117644640            196422075            280594875                    0
            othersockbuf                    0               295872             98211037            182383837                    0
            dgramrcvbuf                     0                28400             98211037             98211037                    0
            numothersock                    0                   41                20550                20550                    0
            dcachesize                   3756                62913            183872621            189388800                    0
            numfile                         0                 7046               328800               328800                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                       0                   14                  200                  200                    0
pdzwart@atlassian45:~/OpenVZ[23:25:48](0,0)$ sudo cat /proc/bc/${CTID}/resources
            kmemsize                    45366             20936957            841784627            925963089                    0
            lockedpages                     0                    0                41102                41102                    0
            privvmpages                     0              1438840              4476066              4923672                    0
            shmpages                        0                 3872               447606               447606                    0
            numproc                         0                  321                20550                20550                    0
            physpages                       0               685472                    0  9223372036854775807                    0
            vmguarpages                     0                    0               746011  9223372036854775807                    0
            oomguarpages                    0               685472               746011  9223372036854775807                    0
            numtcpsock                      0                  430                20550                20550                    0
            numflock                        0                   10                 1000                 1100                    0
            numpty                          0                    6                  512                  512                    0
            numsiginfo                      0                   11                 1024                 1024                    0
            tcpsndbuf                  156608             30611240            196422075            280594875                    0
            tcprcvbuf                       0            117644640            196422075            280594875                    0
            othersockbuf                    0               295872             98211037            182383837                    0
            dgramrcvbuf                     0                28400             98211037             98211037                    0
            numothersock                    0                   41                20550                20550                    0
            dcachesize                   3756                62913            183872621            189388800                    0
            numfile                         0                 7046               328800               328800                    0
            numiptent                       0                   14                  200                  200                    0
            swappages                       0                    0  9223372036854775807  9223372036854775807                    0
pdzwart@atlassian45:~/OpenVZ[23:25:58](0,0)$ sudo /usr/sbin/vzctl status ${CTID}
CTID 1010 deleted unmounted down
pdzwart@atlassian45:~/OpenVZ[23:26:04](0,0)$ 

Package versions are as follows:

pdzwart@atlassian45:~/OpenVZ[23:26:26](0,1)$ rpm -qa |grep vz
vzctl-lib-3.0.24.1-1
vzpkg-2.7.0-18
vztmpl-fedora-9-1.1-1
vzrpm44-4.4.1-22.5
vztmpl-fedora-core-3-2.0-2
vzctl-3.0.24.1-1
vzrpm43-python-4.3.3-7_nonptl.6
vztmpl-centos-5-2.0-3
vztmpl-fedora-core-6-1.2-1
vzquota-3.0.12-1
vzrpm44-python-4.4.1-22.5
vztmpl-centos-4-2.0-2
vztmpl-fedora-core-5-2.0-2
vzrpm43-4.3.3-7_nonptl.6
vztmpl-fedora-core-4-2.0-2
ovzkernel-2.6.18-194.8.1.el5.028stab070.2
vzyum-2.4.0-11
vztmpl-fedora-7-1.1-1
pdzwart@atlassian45:~/OpenVZ[23:26:36](0,0)$ 

Our workaround is a Python script that parses /proc/user_beancounters and keeps only the containers that a vzlist run reports as running.
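
The script itself is not attached, so the following is only a minimal sketch of the idea, not the reporter's code. It assumes the /proc/user_beancounters layout shown above (a `CTID:` prefix on the first line of each block, then six columns per resource) and that `vzlist -H -o ctid` prints one running CTID per line. All function names are hypothetical.

```python
import subprocess


def parse_beancounters(text):
    """Return {ctid: {resource: (held, maxheld, barrier, limit, failcnt)}}."""
    result = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A block starts with "<ctid>:" in front of its first resource.
        if fields[0].endswith(":") and fields[0][:-1].isdigit():
            current = int(fields[0][:-1])
            result[current] = {}
            fields = fields[1:]
        # Skip the "Version:" line, the column header and anything malformed.
        if current is None or len(fields) != 6:
            continue
        name, values = fields[0], fields[1:]
        if all(v.isdigit() for v in values):
            result[current][name] = tuple(int(v) for v in values)
    return result


def running_only(counters, running_ctids):
    """Drop beancounters that belong to containers that are not running."""
    return {ctid: res for ctid, res in counters.items() if ctid in running_ctids}


def list_running_ctids():
    """Ask vzlist for running containers (assumed flags: -H, -o ctid)."""
    proc = subprocess.run(["vzlist", "-H", "-o", "ctid"],
                          capture_output=True, text=True)
    # vzlist exits non-zero when nothing is running; only keep numeric tokens.
    return {int(tok) for tok in proc.stdout.split() if tok.isdigit()}
```

On a hardware node this could be used as `running_only(parse_beancounters(open("/proc/user_beancounters").read()), list_running_ctids())`, run with sufficient privileges to read the file.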
Comment 1 Kir Kolyshkin 2011-02-04 07:27:48 EST
Pete,

Thanks for reporting! Which kernel are you using?

Note that held != 0 in the beancounters of a stopped container means there are leak(s), and this is most probably a kernel bug, thus reassigning to kernel.


From the tools' point of view, it may make sense to exclude the beancounters of stopped containers, but this is usually not an issue, because the beancounters of a stopped container drop to zero and are then removed.
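
As an illustration of the held != 0 check, here is a minimal Python sketch that scans the contents of /proc/bc/<CTID>/resources for a stopped container and reports every resource that is still held. The helper name is hypothetical and not part of vzctl; the column order (name, held, maxheld, barrier, limit, failcnt) is assumed from the output in the description.

```python
def held_resources(resources_text):
    """Return {resource: held} for every resource whose held value is non-zero."""
    leaked = {}
    for line in resources_text.splitlines():
        fields = line.split()
        # Expect: name held maxheld barrier limit failcnt
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        held = int(fields[1])
        if held != 0:
            leaked[fields[0]] = held
    return leaked
```

Applied to the resources file of CTID 1010 shown in the description, this would report kmemsize, tcpsndbuf and dcachesize as still held.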
Comment 2 Pete de Zwart 2011-02-06 17:21:04 EST
From /var/log/dmesg:

Linux version 2.6.18-164.15.1.el5.028stab068.9 (root@rhel5-build-x64) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Mar 30 18:07:38 MSD 2010

From /boot/grub/menu.lst:

title Red Hat Enterprise Linux Server (2.6.18-164.15.1.el5.028stab068.9)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.15.1.el5.028stab068.9 ro root=LABEL=/ rhgb quiet elevator=deadline selinux=0
        initrd /initrd-2.6.18-164.15.1.el5.028stab068.9.img
Comment 3 Kir Kolyshkin 2011-02-07 04:17:36 EST
Pete,

This kernel, 028stab068.9, was released at the end of March 2010, i.e. about 10 months ago. It is old and buggy. I suggest you update to the latest rhel5 stable kernel (http://wiki.openvz.org/Download/kernel/rhel5) and give it a try. If you still see UBC leaks with the latest kernel, please file a separate bug for that.

As for the tools, indeed we need to filter the beancounters so that only those of running CTs are counted -- reassigning the bug back to vzctl. I will work on that.
Comment 4 Pete de Zwart 2011-02-07 17:02:27 EST
Kir, we'll upgrade to the latest stable and advise if it resolves the issue with UBCs for deleted unmounted down containers.
Comment 5 Kir Kolyshkin 2011-02-09 10:37:10 EST
It makes sense to upgrade your kernels from time to time anyway -- fixed bugs and security holes, better performance, etc. Please do so. If you don't want to stop all CTs, use live migration (migrate out/reboot/migrate back).

As for the bug itself, I have fixed it. The following Git commits are relevant:
http://git.openvz.org/?p=vzctl;a=commit;h=24cc0e560700d568175112ea4cc4895c91f26504
http://git.openvz.org/?p=vzctl;a=commit;h=0517647e2e91c934142b8f59112e2ca4df762334
http://git.openvz.org/?p=vzctl;a=commit;h=d94974e1f9be2dfae11cee75fdb67dd44066cf9b

Plus two cosmetic fixes to vzmemcheck while we're at it:
http://git.openvz.org/?p=vzctl;a=commit;h=2ffb55af838c4c8989cbf0bccf480dcaf55d5f19
http://git.openvz.org/?p=vzctl;a=commit;h=383e2756bc406e85d7f1d5e0024062f19599285e

The fix will appear in vzctl >= 3.0.26 (which I hope to release sometime this month, the sooner the better).
Comment 6 Sergey Bronnikov 2015-04-01 14:43:23 EDT
The bug was fixed more than a year ago and there have been no complaints from the reporter since the fix. We believe the fix helped and are marking the bug as closed.