Re: [Observium] observium Digest, Vol 61, Issue 74
unsubscribe
Marko Uusitalo Senior Lecturer Helsinki Metropolia University of Applied Sciences Bulevardi 31 00180 HELSINKI FINLAND
GSM +358 50 3525 975 Email Marko.uusitalo@metropolia.fi www.metropolia.fi/en
________________________________________ From: observium [observium-bounces@observium.org] on behalf of observium-request@observium.org [observium-request@observium.org] Sent: 12 August 2015 9:50 To: observium@observium.org Subject: observium Digest, Vol 61, Issue 74
Send observium mailing list submissions to observium@observium.org
To subscribe or unsubscribe via the World Wide Web, visit http://postman.memetic.org/cgi-bin/mailman/listinfo/observium or, via email, send a message with subject or body 'help' to observium-request@observium.org
You can reach the person managing the list at observium-owner@observium.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of observium digest..."
Today's Topics:
1. Re: 'Duplicate entry' issues on vlans_fdb (Aaron Mayfield)
----------------------------------------------------------------------
Message: 1 Date: Wed, 12 Aug 2015 06:50:01 +0000 From: Aaron Mayfield amayfield@artisaninfrastructure.com To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb Message-ID: f42b01e9c6a44b1783eb4bc895f324f9@NG-USA-MXAI.neverfailgroup.com Content-Type: text/plain; charset="utf-8"
Yeah, this seems to be a performance issue of some type, and the fdb table stuff looks like a side effect. I'm scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output from before and after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I saw the side effect, the duplicate entries in the fdb table, because the poller processes were running so slowly that they were stacking on top of each other.
So I have some kind of performance issue.
One strange thing: I'm only polling 45 devices, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type, or my database is messed up somewhere. Not sure if it's related or not.
I'm going to keep running the poller manually to see if I can figure out where the slowdown is.
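The timestamps in the log above show the wrapper starting about every five minutes, so once a run takes longer than that, the next one starts before it has finished and the runs stack up as described. A minimal Python sketch of that check, assuming a 300-second schedule (inferred from the timestamps, not stated in the thread) and the poller-wrapper.py log format quoted above; overlapping_runs and POLL_INTERVAL are illustrative names, not part of Observium:

```python
import re

# Matches the poller-wrapper.py summary lines quoted in this thread.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] poller-wrapper\.py\(\d+\): .*?"
    r"polled (?P<devices>\d+) devices in (?P<secs>\d+) seconds"
)

# Assumption: the wrapper is started by cron every 5 minutes.
POLL_INTERVAL = 300


def overlapping_runs(log_text, interval=POLL_INTERVAL):
    """Return (timestamp, seconds) for every run that outlasted the interval."""
    slow = []
    for m in LINE_RE.finditer(log_text):
        secs = int(m.group("secs"))
        if secs > interval:
            slow.append((m.group("ts"), secs))
    return slow


sample = (
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: "
    "polled 45 devices in 111 seconds with 8 workers\n"
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: "
    "polled 45 devices in 683 seconds with 8 workers\n"
)

for ts, secs in overlapping_runs(sample):
    # A run lasting S seconds is still going when ceil(S / interval) - 1 later runs start.
    later = -(-secs // POLL_INTERVAL) - 1
    print(f"{ts}: {secs}s, still running when ~{later} later run(s) started")
```

Against the full log, every run from 12:21 onwards is flagged, and the overlap count grows with each run, which matches the ever-increasing durations.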
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when a single part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.
If you still want the fdb data, you can run that module on its own from a less frequently scheduled cron job, e.g. ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
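If the fdb-table module is moved to its own cron job, that job can still pile up the same way if one run lasts longer than its schedule. One way to guard against that is to take an exclusive lock before starting and skip the run while a previous one still holds it. A minimal Python sketch for Unix, assuming a hypothetical lock path /tmp/observium-fdb.lock and that poller.php lives in /opt/observium (the directory seen in the logs above); run_exclusive is an illustrative helper, not part of Observium:

```python
import fcntl
import subprocess

# Hypothetical lock file; any path writable by the cron user would do.
LOCK_PATH = "/tmp/observium-fdb.lock"
# The module-only poller run suggested above.
CMD = ["./poller.php", "-h", "all", "-m", "fdb-table"]


def run_exclusive(cmd, lock_path=LOCK_PATH, cwd=None):
    """Run cmd unless a previous run still holds lock_path.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except (BlockingIOError, PermissionError):
            # EWOULDBLOCK/EAGAIN (or EACCES on some systems): previous run still active.
            return None
        # Closing lock_file at the end of the with-block releases the lock.
        return subprocess.run(cmd, cwd=cwd).returncode
```

A cron-driven script would call run_exclusive(CMD, cwd="/opt/observium") and treat None as "skipped this cycle". Skipping rather than blocking on the lock is deliberate: blocking would just queue the runs up behind each other, which is the pile-up seen in this thread.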
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device, it's not unthinkable that they'll all be trying to insert the same data into the table...
Tom
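Tom's point can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption for illustration only; Observium itself runs on MySQL) and mirrors the unique dev_vlan_mac key on (device_id, vlan_id, mac_address) from the vlans_fdb schema quoted further down the thread. Two pollers inserting the same learned entry trip the unique constraint, SQLite's counterpart of MySQL error #1062; poll_insert is an illustrative helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns and unique key as the vlans_fdb table shown in this thread.
conn.execute("""
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address VARCHAR(32) NOT NULL,
        fdb_status  VARCHAR(32) NOT NULL,
        CONSTRAINT dev_vlan_mac UNIQUE (device_id, vlan_id, mac_address)
    )
""")

INSERT_SQL = (
    "INSERT INTO vlans_fdb (device_id, vlan_id, port_id, mac_address, fdb_status) "
    "VALUES (?, ?, ?, ?, ?)"
)


def poll_insert(conn, row):
    """One 'poller' inserting a learned entry; returns the error text on a duplicate."""
    try:
        conn.execute(INSERT_SQL, row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Values taken from the first db.log error in this thread.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
first = poll_insert(conn, row)   # the first poller succeeds
second = poll_insert(conn, row)  # an overlapping poller hits the same key
print(first, "|", second)
```

Run one at a time, each poller sees its own earlier rows and nothing collides; it is only the overlapping runs that produce the flood of duplicate-key errors.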
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn't have any problems running it (it was back to being fast). I also don't seem to be getting the errors in db.log when running the poller "one at a time".
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of: ./poller.php -h 54 -m fdb-table -d
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind, and several database updates were applied as a result. After the update, I noticed my poller.php processes started taking all the CPU, I started getting gaps in the graphs, etc. I also noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
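With thousands of these lines in db.log, it can help to summarise which device and VLANs are affected before digging further. A minimal Python sketch, assuming the exact error format quoted above; duplicate_keys and per_vlan are illustrative helper names, not Observium functions:

```python
import re
from collections import Counter

# Matches: Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac')
DUP_RE = re.compile(
    r"Failed dbQuery \(#1062 - Duplicate entry "
    r"'(?P<device>\d+)-(?P<vlan>\d+)-(?P<mac>[0-9a-f]+)' for key 'dev_vlan_mac'\)"
)


def duplicate_keys(log_text):
    """Return (device_id, vlan_id, mac) tuples for every duplicate-entry error."""
    return [(int(m["device"]), int(m["vlan"]), m["mac"]) for m in DUP_RE.finditer(log_text)]


def per_vlan(keys):
    """Count duplicate-entry errors per (device_id, vlan_id)."""
    return Counter((dev, vlan) for dev, vlan, _mac in keys)


# Two lines from the db.log excerpt above, trimmed after the key name.
sample = (
    "[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - "
    "Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac')\n"
    "[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - "
    "Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac')\n"
)
print(per_vlan(duplicate_keys(sample)))  # Counter({(54, 2084): 2})
```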
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)

mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally, the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential - Artisan Infrastructure, Inc. et al.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium