Re: [Observium] 'Duplicate entry' issues on vlans_fdb
![](https://secure.gravatar.com/avatar/21caf0a08d095be7196a1648d20942be.jpg?s=120&d=mm&r=g)
If you're running multiple simultaneous pollers against the same device, it's not unthinkable that they'll all be trying to insert the same data into the table...
Tom
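A minimal sketch of the collision Tom describes, for readers who want to reproduce the error class. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made only to keep the example self-contained); the column names and the unique key on (device_id, vlan_id, mac_address) are taken from the table output quoted later in this thread, and the "two pollers" are simulated by two inserts of the same row. This is not Observium's own code.

```python
import sqlite3

# In-memory stand-in for the MySQL vlans_fdb table; the columns and the
# unique key mirror the SHOW COLUMNS / SHOW INDEX output quoted in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        UNIQUE (device_id, vlan_id, mac_address)
    )
    """
)

# One of the rows from the db.log excerpt in the thread.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert_sql = (
    "INSERT INTO vlans_fdb "
    "(device_id, vlan_id, port_id, mac_address, fdb_status) "
    "VALUES (?, ?, ?, ?, ?)"
)


def simulate_two_pollers():
    """Two pollers that both decided the row is missing try to insert it."""
    errors = []
    for poller in ("poller A", "poller B"):
        try:
            conn.execute(insert_sql, row)
        except sqlite3.IntegrityError as exc:
            # The second insert violates the unique key, like MySQL error 1062.
            errors.append((poller, str(exc)))
    return errors


errors = simulate_two_pollers()
row_count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print(errors, row_count)
```

The first insert succeeds and the second is rejected by the unique key, so the table still holds a single row. That matches the thread: the failed inserts are noise from overlapping runs rather than a sign of a broken table.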
On Aug 12, 2015 8:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind, and several database updates were done as a result. After the update, my poller.php processes started taking all the CPU, I started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
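As a rough aid for the "what should I check" question, the db.log lines quoted above can be tallied per device and VLAN to see where the duplicate-entry errors cluster. The sketch below is only an illustration: the regular expression is written against the exact message format shown in this thread, and the function name and sample text are made up for the example.

```python
import re
from collections import Counter

# Matches the error text exactly as it appears in the db.log excerpt above.
DUP_RE = re.compile(
    r"Duplicate entry '(?P<device>\d+)-(?P<vlan>\d+)-(?P<mac>[0-9a-f]+)' "
    r"for key 'dev_vlan_mac'"
)


def tally_duplicates(log_text):
    """Count duplicate-entry errors per (device_id, vlan_id) pair."""
    counts = Counter()
    for match in DUP_RE.finditer(log_text):
        counts[(match.group("device"), match.group("vlan"))] += 1
    return counts


# The key part of the five log entries quoted in the thread.
sample = (
    "Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac') "
    "Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac') "
    "Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac') "
    "Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac') "
    "Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac')"
)

counts = tally_duplicates(sample)
for (device, vlan), n in sorted(counts.items()):
    print(f"device {device} vlan {vlan}: {n} duplicate insert(s)")
```

In the quoted excerpt every error belongs to device 54, which is consistent with several poller processes working on the same device at once.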
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when one part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.
If you still want the fdb data, you can run that module separately from a less frequent cron job, with something like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) fdb-table produces, but you'll still have the data in the database.
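If the fdb-table module is moved to its own cron job, one way to keep a slow run from overlapping with the next one is a lock file around the job. The following is a sketch under stated assumptions, not something Observium ships: the lock path and the helper names try_lock and run_exclusive are hypothetical, it relies on the Unix-only fcntl module, and the job is a placeholder where a real wrapper would start the poller.

```python
import fcntl
import os
import tempfile

# Hypothetical lock-file location; any path writable by the cron user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "fdb-table-poll.lock")


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def run_exclusive(path, job):
    """Run job() only if no other run holds the lock; report whether it ran."""
    handle = try_lock(path)
    if handle is None:
        return False  # a previous run is still going, so skip this one
    try:
        job()
        return True
    finally:
        fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()


# Demonstration: while the first lock is held, a second attempt is refused.
first = try_lock(LOCK_PATH)
second_ran = run_exclusive(LOCK_PATH, lambda: None)
fcntl.flock(first, fcntl.LOCK_UN)
first.close()
third_ran = run_exclusive(LOCK_PATH, lambda: None)
print(second_ran, third_ran)
```

In real use the placeholder job would launch the command from this message (./poller.php -h all -m fdb-table), for example through subprocess.run. Skipping a run when the lock is taken trades some data freshness for never having two fdb-table runs insert the same rows.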
adam.
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Yeah, this seems to be a performance issue of some type, and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I saw the side effect of the duplicate entries in the fdb table: the poller processes are running so slowly that they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
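A small sketch of how the point where runs start stacking could be picked out of observium.log automatically. The line format is taken from the excerpt above; the 300 second interval is an assumption inferred from the roughly five-minute spacing of the early timestamps, and the function name is made up for the example.

```python
import re

# Matches the poller-wrapper.py summary lines as they appear in the excerpt.
LINE_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\] poller-wrapper\.py\(\d+\): .*?"
    r"polled (?P<devices>\d+) devices in (?P<seconds>\d+) seconds "
    r"with (?P<workers>\d+) workers"
)


def overlapping_runs(log_text, interval=300):
    """Return (timestamp, seconds) for every run longer than the interval."""
    slow = []
    for match in LINE_RE.finditer(log_text):
        seconds = int(match.group("seconds"))
        if seconds > interval:
            # This run was still going when the next one was due to start.
            slow.append((match.group("stamp"), seconds))
    return slow


# Four consecutive entries from the log excerpt, around the upgrade.
sample = """\
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
"""

slow = overlapping_runs(sample)
for stamp, seconds in slow:
    print(f"{stamp}: {seconds}s exceeds the polling interval")
```

On these four entries the first two runs finish within the interval and the last two do not, which is where the overlap in this thread begins.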
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar).
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers [2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 
devices in 133 seconds with 8 workers [2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers [2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers [2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers [2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers [2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers [2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers [2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers [2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers [2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers [2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers [2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers [2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when one part of the poller runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can run that module on its own from a less frequently scheduled cron job, like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device, it's not unthinkable they'll all be trying to insert the same data into the table...
Tom
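As a rough illustration of that collision, here is a sketch that uses an in-memory SQLite table as a stand-in for MySQL. The column names and the unique key on (device_id, vlan_id, mac_address) follow the vlans_fdb schema quoted further down; `poller_insert` is a hypothetical helper and does not reflect how poller.php is actually written.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL vlans_fdb table (an assumption for
# illustration; the unique key mirrors dev_vlan_mac from the thread).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        UNIQUE (device_id, vlan_id, mac_address)
    )
    """
)

# Values from the first failed INSERT in db.log.
ROW = (54, 2083, 220115, "6ef88537f91f", "learned")
INSERT = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"


def poller_insert(connection, row):
    """Insert one FDB row; return False if the unique key rejects it."""
    try:
        connection.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


first = poller_insert(conn, ROW)   # the first poller gets its row in
second = poller_insert(conn, ROW)  # an overlapping poller inserts the same row
print(first, second)
```

The second insert is rejected by the unique key, which is the SQLite counterpart of MySQL's error 1062. The table itself is doing its job; the error only shows that two writers tried to add the same (device, VLAN, MAC) entry.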
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)

mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield
Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738
T. 512.600.4297
www.artisaninfrastructure.com
Partner portal: https://portal.vpdc.us
Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Haven’t found any specific smoking gun yet. There does seem to be a trend of all of my Arista devices taking longer to run than the others, but this is difficult to verify since most of my Observium devices are Arista.
If I want to disable the fdb-table module for troubleshooting, is there a way to do that globally? Thanks
[amayfield@kc-netview observium]$ sudo ./poller-wrapper.py
INFO: starting the poller at 2015/08/12 02:17:55 with 8 threads, slowest devices first
INFO starting alerter.php for 55
INFO finished alerter.php for 55
INFO: worker Thread-8 finished device 55 in 130 seconds
INFO starting alerter.php for 28
INFO finished alerter.php for 28
INFO: worker Thread-7 finished device 28 in 135 seconds
INFO starting alerter.php for 27
INFO finished alerter.php for 27
INFO: worker Thread-6 finished device 27 in 143 seconds
INFO starting alerter.php for 17
INFO finished alerter.php for 17
INFO: worker Thread-3 finished device 17 in 159 seconds
INFO starting alerter.php for 22
INFO finished alerter.php for 22
INFO: worker Thread-5 finished device 22 in 172 seconds
INFO starting alerter.php for 20
INFO finished alerter.php for 20
INFO: worker Thread-2 finished device 20 in 176 seconds
INFO starting alerter.php for 52
INFO finished alerter.php for 52
INFO starting alerter.php for 21
INFO finished alerter.php for 21
INFO: worker Thread-8 finished device 52 in 147 seconds
INFO: worker Thread-1 finished device 21 in 277 seconds
INFO starting alerter.php for 56
INFO finished alerter.php for 56
INFO: worker Thread-7 finished device 56 in 151 seconds
INFO starting alerter.php for 51
INFO finished alerter.php for 51
INFO: worker Thread-6 finished device 51 in 145 seconds
INFO starting alerter.php for 37
INFO finished alerter.php for 37
INFO: worker Thread-3 finished device 37 in 130 seconds
INFO starting alerter.php for 19
INFO finished alerter.php for 19
INFO: worker Thread-4 finished device 19 in 294 seconds
INFO starting alerter.php for 38
INFO finished alerter.php for 38
INFO: worker Thread-5 finished device 38 in 125 seconds
INFO starting alerter.php for 18
INFO finished alerter.php for 18
INFO: worker Thread-2 finished device 18 in 156 seconds
INFO starting alerter.php for 1
INFO finished alerter.php for 1
INFO: worker Thread-4 finished device 1 in 61 seconds
INFO starting alerter.php for 49
INFO finished alerter.php for 49
INFO: worker Thread-1 finished device 49 in 102 seconds
INFO starting alerter.php for 14
INFO finished alerter.php for 14
INFO: worker Thread-5 finished device 14 in 83 seconds
INFO starting alerter.php for 5
INFO finished alerter.php for 5
INFO: worker Thread-3 finished device 5 in 104 seconds
INFO starting alerter.php for 50
INFO finished alerter.php for 50
INFO: worker Thread-6 finished device 50 in 105 seconds
INFO starting alerter.php for 53
INFO finished alerter.php for 53
INFO: worker Thread-8 finished device 53 in 127 seconds
INFO starting alerter.php for 16
INFO finished alerter.php for 16
INFO: worker Thread-5 finished device 16 in 36 seconds
INFO starting alerter.php for 10
INFO finished alerter.php for 10
INFO: worker Thread-4 finished device 10 in 63 seconds
INFO starting alerter.php for 54
INFO finished alerter.php for 54
INFO starting alerter.php for 6
INFO finished alerter.php for 6
INFO: worker Thread-7 finished device 54 in 137 seconds
INFO: worker Thread-2 finished device 6 in 91 seconds
INFO starting alerter.php for 11
INFO finished alerter.php for 11
INFO: worker Thread-6 finished device 11 in 32 seconds
INFO starting alerter.php for 7
INFO finished alerter.php for 7
INFO: worker Thread-1 finished device 7 in 49 seconds
INFO starting alerter.php for 36
INFO finished alerter.php for 36
INFO: worker Thread-8 finished device 36 in 26 seconds
INFO starting alerter.php for 15
INFO finished alerter.php for 15
INFO: worker Thread-7 finished device 15 in 14 seconds
INFO starting alerter.php for 41
INFO finished alerter.php for 41
INFO: worker Thread-2 finished device 41 in 18 seconds
INFO starting alerter.php for 4
INFO finished alerter.php for 4
INFO: worker Thread-1 finished device 4 in 15 seconds
INFO starting alerter.php for 23
INFO finished alerter.php for 23
INFO: worker Thread-6 finished device 23 in 18 seconds
INFO starting alerter.php for 3
INFO finished alerter.php for 3
INFO: worker Thread-8 finished device 3 in 15 seconds
INFO starting alerter.php for 25
INFO finished alerter.php for 25
INFO: worker Thread-5 finished device 25 in 29 seconds
INFO starting alerter.php for 26
INFO finished alerter.php for 26
INFO: worker Thread-4 finished device 26 in 28 seconds
INFO starting alerter.php for 33
INFO finished alerter.php for 33
INFO starting alerter.php for 29
INFO finished alerter.php for 29
INFO: worker Thread-2 finished device 33 in 5 seconds
INFO: worker Thread-1 finished device 29 in 5 seconds
INFO starting alerter.php for 35
INFO finished alerter.php for 35
INFO: worker Thread-8 finished device 35 in 3 seconds
INFO starting alerter.php for 9
INFO finished alerter.php for 9
INFO: worker Thread-5 finished device 9 in 2 seconds
INFO starting alerter.php for 2
INFO finished alerter.php for 2
INFO starting alerter.php for 31
INFO finished alerter.php for 31
INFO: worker Thread-3 finished device 2 in 56 seconds
INFO: worker Thread-4 finished device 31 in 2 seconds
INFO starting alerter.php for 30
INFO finished alerter.php for 30
INFO: worker Thread-6 finished device 30 in 4 seconds
INFO starting alerter.php for 8
INFO finished alerter.php for 8
INFO starting alerter.php for 57
INFO finished alerter.php for 57
INFO: worker Thread-2 finished device 8 in 2 seconds
INFO: worker Thread-1 finished device 57 in 2 seconds
INFO starting alerter.php for 32
INFO finished alerter.php for 32
INFO: worker Thread-8 finished device 32 in 1 seconds
INFO starting alerter.php for 24
INFO finished alerter.php for 24
INFO: worker Thread-7 finished device 24 in 15 seconds
INFO: poller-wrapper.py polled 45 devices in 454 seconds with 8 workers
WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads
INFO: in sequential style polling the elapsed time would have been: 3590 seconds
WARNING: Consider setting a minimum of 13 threads. (This does not constitute professional advice!)
[amayfield@kc-netview observium]$
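For anyone wanting to pick the slow devices out of wrapper output like this, a small sketch that ranks devices by the time reported on the "finished device" lines might help. `slowest_devices` is a hypothetical helper and the sample is a handful of lines copied from the run above; mapping device IDs back to hostnames (and to Arista vs. other platforms) is left out.

```python
import re

# Matches e.g. "INFO: worker Thread-4 finished device 19 in 294 seconds".
FINISHED = re.compile(r"worker (Thread-\d+) finished device (\d+) in (\d+) seconds")


def slowest_devices(wrapper_output, top=5):
    """Return [(device_id, seconds)] sorted from slowest to fastest."""
    timings = [
        (int(device), int(seconds))
        for _thread, device, seconds in FINISHED.findall(wrapper_output)
    ]
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]


# A few lines copied from the wrapper run in this message.
sample = (
    "INFO: worker Thread-8 finished device 55 in 130 seconds\n"
    "INFO: worker Thread-1 finished device 21 in 277 seconds\n"
    "INFO: worker Thread-4 finished device 19 in 294 seconds\n"
    "INFO: worker Thread-8 finished device 32 in 1 seconds\n"
)

print(slowest_devices(sample, top=2))
```

In the full run above, devices 19 and 21 stand out at 294 and 277 seconds, so those would be the first candidates for a per-device ./poller.php -h <id> -d run.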
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. The duplicate entries in the fdb table look like a side effect: the poller processes are running so slowly that they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
In your config.php :
$config['poller_modules']['fdb-table'] = 0;
adam.
In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when the same part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device, it is not unthinkable they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of ./poller.php -h 54 -m fdb-table -d?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote: Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
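The before/after pattern in the log above can be checked mechanically. What follows is a minimal sketch, not part of Observium: it assumes the wrapper is started from cron every 300 seconds (inferred from the five-minute spacing of the earlier entries and from the wrapper's own "more than 5 minutes" warning), and the names `find_overruns` and `POLL_INTERVAL` are made up for illustration. Any run longer than the interval means the next wrapper starts while the previous one is still polling.

```python
import re

# Matches the "polled N devices in S seconds with W workers" summary lines
# that poller-wrapper.py writes to observium.log, as quoted in this thread.
SUMMARY_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] poller-wrapper\.py\((?P<pid>\d+)\): .*"
    r"polled (?P<devices>\d+) devices in (?P<seconds>\d+) seconds "
    r"with (?P<workers>\d+) workers"
)

POLL_INTERVAL = 300  # assumption: cron starts the wrapper every 5 minutes


def find_overruns(lines, interval=POLL_INTERVAL):
    """Return (timestamp, seconds) for every run longer than the interval."""
    overruns = []
    for line in lines:
        match = SUMMARY_RE.match(line.strip())
        if match and int(match["seconds"]) > interval:
            overruns.append((match["stamp"], int(match["seconds"])))
    return overruns


if __name__ == "__main__":
    # Two entries copied from the log above: one normal run, one overrun.
    sample = [
        "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): "
        "/opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
        "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): "
        "/opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    ]
    for stamp, seconds in find_overruns(sample):
        print(f"{stamp}: {seconds}s exceeds the {POLL_INTERVAL}s interval")
```

Fed the whole excerpt, this would flag every entry from 12:21:24 onward, which is where the runs begin to stack.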
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when the same part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
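If the module is moved to its own cron entry, it helps to make sure two of those runs can never overlap either. One way to do that is a non-blocking file lock around the command. This is a sketch under assumptions, not something Observium ships: `run_exclusive` is a made-up helper, the lock path is arbitrary, and `fcntl.flock` is Unix-only.

```python
import fcntl
import subprocess


def run_exclusive(lock_path, command):
    """Run command only if nobody else holds the lock; return its exit code.

    Returns None without running anything when the lock is already held.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.call(command)
```

A cron-driven script could then call `run_exclusive("/tmp/observium-fdb-table.lock", ["./poller.php", "-h", "all", "-m", "fdb-table"])` from the Observium directory and simply exit when it gets `None` back, so a slow run is skipped rather than doubled up.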
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device, it is not unthinkable they'll all be trying to insert the same data into the table...
Tom
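The effect Tom describes is easy to reproduce outside Observium. The sketch below uses SQLite as a stand-in for MySQL (an assumption made so the example is self-contained) with the same columns and the same three-column unique key as the `vlans_fdb` table in this thread. `try_insert` is a made-up helper. The second insert of an identical row is rejected, which is the SQLite counterpart of the #1062 errors in db.log.

```python
import sqlite3

# SQLite stands in for MySQL here; the columns and the unique key on
# (device_id, vlan_id, mac_address) mirror the schema quoted in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        UNIQUE (device_id, vlan_id, mac_address)
    )
    """
)

INSERT_SQL = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"


def try_insert(connection, values):
    """Insert one FDB row; return False when the unique key already exists."""
    try:
        connection.execute(INSERT_SQL, values)
        return True
    except sqlite3.IntegrityError:
        return False


# Values taken from the first failed query in db.log.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
first = try_insert(conn, row)   # first poller: the row is new
second = try_insert(conn, row)  # overlapping poller: same key, rejected
print(first, second)
```

The table itself stays consistent in this case; the errors are a symptom of two processes doing the same work at once rather than a sign of a broken schema.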
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of ./poller.php -h 54 -m fdb-table -d?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
You'll also want the device performance page for one of these Arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You'll also want the device performance page for one of these Arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote: Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This seems to confirm my suspicion that my Arista devices are the ones taking longer; they have “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote: Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I see the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
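The slowdown in the log above can be spotted mechanically. What follows is a minimal Python sketch, assuming the wrapper is started by cron every 300 seconds (the timestamps above are about five minutes apart, but the interval is not stated in the thread). The names `overrunning_polls` and `POLL_INTERVAL` are made up for this example.

```python
import re

# Pattern for the poller-wrapper.py summary lines quoted above.
LINE_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\] poller-wrapper\.py\(\d+\): .*?"
    r"polled (?P<devices>\d+) devices in (?P<seconds>\d+) seconds "
    r"with (?P<workers>\d+) workers"
)

# Assumption: the wrapper is launched every 300 seconds.
POLL_INTERVAL = 300


def overrunning_polls(log_text, interval=POLL_INTERVAL):
    """Return (timestamp, seconds) for every run longer than the interval."""
    overruns = []
    for match in LINE_RE.finditer(log_text):
        seconds = int(match.group("seconds"))
        if seconds > interval:
            overruns.append((match.group("stamp"), seconds))
    return overruns


# Two lines copied from the log above: one normal run, one overrun.
sample = """\
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
"""

for stamp, seconds in overrunning_polls(sample):
    print(f"{stamp}: {seconds}s exceeds the {POLL_INTERVAL}s interval")
```

Any run flagged this way is still going when the next one starts, which is the stacking described above.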
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when one part of the poller runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a separate, less frequent cron job, for example: ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) fdb-table produces, but you'll still have the data in the database.
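If the module is moved to its own cron job as suggested above, one way to keep a slow run from overlapping the next one is a lock file. This is a minimal sketch, assuming a Linux host with Python available. The lock path and the `run_exclusively` helper are hypothetical and not part of Observium, and the demo call uses a harmless stand-in command rather than the poller itself.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock path; any location writable by the poller user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "observium-fdb-table.lock")

# Command quoted from the message above; it must be run from the Observium directory.
FDB_COMMAND = ["./poller.php", "-h", "all", "-m", "fdb-table"]


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run command unless another holder of lock_path is still active.

    Returns the command's exit code, or None when the lock was busy.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.call(command)


# Demo with a stand-in command; a cron wrapper would pass FDB_COMMAND instead.
result = run_exclusively([sys.executable, "-c", "pass"])
print("skipped, previous run still active" if result is None else f"exit code {result}")
```

A run that finds the lock taken simply returns without polling, so a slow fdb-table pass delays the next one instead of racing it.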
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device, it is not unthinkable they'll all be trying to insert the same data into the table...
Tom
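The collision Tom describes can be reproduced in miniature. The sketch below uses an in-memory SQLite table as a stand-in for MySQL, with columns and a unique key modelled on the `vlans_fdb` schema and `dev_vlan_mac` key from the original report. It only illustrates the failure mode, not how Observium's poller is written.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; only the unique key matters.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        UNIQUE (device_id, vlan_id, mac_address)  -- plays the role of dev_vlan_mac
    )
    """
)

# Values taken from the first failed query in db.log.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

# First poller: the entry is new, so the insert succeeds.
db.execute(insert, row)

# Second poller, unaware of the first, inserts the same entry.
try:
    db.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError as error:
    duplicate_rejected = True
    print("second insert failed:", error)

row_count = db.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print("rows stored:", row_count)
```

The table still holds a single row afterwards; the second writer only generates an error, which matches the #1062 "Duplicate entry" lines in db.log being noise from overlapping pollers rather than a sign of a broken table.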
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Is there anyone else out there polling Arista switches? Has anyone else noticed any performance issues with the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Thursday, August 13, 2015 4:48 PM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You'll also want the device performance page for one of these Arista devices, which shows which module is taking the time.
It's on the right-hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This seems to confirm my suspicion that it is my Arista devices that are taking longer; they have “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 3:58 AM
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar).
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Yeah, this seems to be a performance issue of some type, and the fdb table errors seem like a side effect. I’m scratching my head as to why it worked fine before the upgrade. Here is the observium.log output from before and after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. The duplicate entries in the fdb table look like a side effect: the poller processes are running so slowly that they are stacking on top of each other.
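To make the "stacking" visible, here is a rough Python sketch that scans observium.log lines like the ones above and flags every wrapper run that outlasted the polling interval. The 300-second interval is an assumption inferred from the roughly five-minute spacing of the healthy entries, and `overlapping_runs` is a name made up for this illustration, not part of Observium.

```python
import re
from datetime import datetime

# Assumption: the wrapper is launched every 300 seconds (inferred from the
# ~5 minute spacing of the healthy log entries, not from any config file).
POLL_INTERVAL = 300

LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] "
    r"poller-wrapper\.py\(\d+\): .*?polled (?P<devices>\d+) devices "
    r"in (?P<secs>\d+) seconds with (?P<workers>\d+) workers"
)


def overlapping_runs(log_text, interval=POLL_INTERVAL):
    """Return (finish_time, runtime, launches) for runs longer than interval.

    `launches` is a rough count of wrapper launches that fall inside the
    run's lifetime, counting the run itself.
    """
    flagged = []
    for match in LINE_RE.finditer(log_text):
        secs = int(match.group("secs"))
        if secs > interval:
            finished = datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S")
            flagged.append((finished, secs, secs // interval + 1))
    return flagged


SAMPLE = """\
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
"""

flagged = overlapping_runs(SAMPLE)
for finished, secs, launches in flagged:
    print(f"{finished:%H:%M:%S}  ran {secs}s  ~{launches} wrapper launches in that window")
```

Each of those launches starts its own set of workers, so the estimate is only a way to show how quickly the pile-up grows once a single run exceeds the interval.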
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 1:33 AM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That's not unthinkable when one part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a separate, less frequently scheduled cron job, like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
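If the fdb-table module is moved to its own scheduled job as suggested above, one way to keep slow runs from piling up is to take a lock before starting. The following is only a sketch under stated assumptions: it is Unix-only (it relies on `fcntl`), the lock path is hypothetical, and `run_exclusive` is an illustrative helper rather than anything shipped with Observium.

```python
import fcntl
import subprocess
import sys


def run_exclusive(lock_path, command):
    """Run command unless another process already holds the lock.

    Returns the command's exit code, or None when the run was skipped
    because a previous run still holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.call(command)


# Intended use from a scheduled job (hypothetical lock path, command as
# quoted in the message above):
#   run_exclusive("/tmp/observium-fdb-table.lock",
#                 ["./poller.php", "-h", "all", "-m", "fdb-table"])
```

A skipped run simply means the previous one has not finished yet, which is usually preferable to two runs writing the same rows at once.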
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote:
If you're running multiple simultaneous pollers against the same device, it's not unthinkable that they'll all be trying to insert the same data into the table...
Tom
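The race described here can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, with a table shaped like vlans_fdb and a unique key on (device_id, vlan_id, mac_address). It assumes, without having checked the Observium source, that each poller first reads the existing rows and then inserts whatever it believes is missing; `rows_to_insert` and `insert_rows` are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        UNIQUE (device_id, vlan_id, mac_address)
    )
    """
)


def rows_to_insert(conn, polled):
    """Return the polled rows whose (device, vlan, mac) key is not stored yet."""
    known = {
        tuple(row)
        for row in conn.execute(
            "SELECT device_id, vlan_id, mac_address FROM vlans_fdb"
        )
    }
    return [row for row in polled if (row[0], row[1], row[3]) not in known]


def insert_rows(conn, rows):
    """Insert rows one by one and return how many hit the unique key."""
    failures = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # MySQL would report this as error #1062, "Duplicate entry".
            failures += 1
    return failures


polled = [
    (54, 2083, 220115, "6ef88537f91f", "learned"),
    (54, 2084, 220115, "005056a927c2", "learned"),
]

# Two overlapping pollers both read the table before either one writes.
pending_a = rows_to_insert(conn, polled)
pending_b = rows_to_insert(conn, polled)

failures_a = insert_rows(conn, pending_a)
failures_b = insert_rows(conn, pending_b)
print(failures_a, failures_b)
```

The first poller's inserts succeed; every insert from the second one collides with the unique key, which matches the pattern of one failed dbQuery per FDB entry in db.log.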
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Tuesday, August 11, 2015 11:52 PM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407
Austin, TX 78738
T. 512.600.4297
www.artisaninfrastructure.com
Partner portal: https://portal.vpdc.us
Partner support: support@artisaninfrastructure.com
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
We've an Arista engineer floating around here somewhere, I'll ask him :)
adam.
![](https://secure.gravatar.com/avatar/1052eb8dc1534e3c73eb849a27827e0a.jpg?s=120&d=mm&r=g)
I've been disabling the fdb-table poller module in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root-caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
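The timing approach described above (report only when a call takes longer than 2 seconds) can be sketched in Python. This is not the linked diff and not Observium code: `report_if_slow` and `render_table` are hypothetical names invented for the illustration, and the 2800 rows are synthetic.

```python
import time
from contextlib import contextmanager


@contextmanager
def report_if_slow(label, threshold_s=2.0, report=print):
    """Time the enclosed block and report only when it exceeds threshold_s."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            report(f"{label} took {elapsed:.1f}s")


def render_table(rows):
    """Hypothetical stand-in for a terminal table printer."""
    width = max((len(str(cell)) for row in rows for cell in row), default=0)
    return "\n".join(
        " | ".join(str(cell).ljust(width) for cell in row) for row in rows
    )


# Synthetic FDB-like rows: (device_id, vlan_id, mac_address, fdb_status).
rows = [(54, 2083 + i % 4, f"{i:012x}", "learned") for i in range(2800)]

messages = []
with report_if_slow("render_table", threshold_s=2.0, report=messages.append):
    output = render_table(rows)
```

Wrapping only the suspect call keeps the measurement separate from the SNMP and database work in the same module, which is how the cost of the table printer can be isolated.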
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Is anyone else out there polling Arista switches? Has anyone noticed performance issues with the fdb-table module recently?
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Aaron Mayfield *Sent:* Thursday, August 13, 2015 5:54 PM *To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Thursday, August 13, 2015 4:48 PM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This more or less confirms my suspicion that my Arista devices are the ones taking longer; they have “swa” in the hostname.
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Wednesday, August 12, 2015 3:58 AM *To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. The duplicate entries in the fdb table are a side effect: the poller processes are running so slowly that they stack on top of each other.
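The point where runs begin to stack can be picked out of the log excerpt above mechanically. Here is a minimal Python sketch, assuming a 300-second polling interval (inferred from the five-minute spacing of the timestamps, not stated anywhere in the thread); the function name `overrunning_runs` is made up for the example.

```python
import re

# Matches the poller-wrapper.py summary lines shown in the log excerpt.
LINE_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\] poller-wrapper\.py\(\d+\): .*?"
    r"polled (?P<devices>\d+) devices in (?P<seconds>\d+) seconds "
    r"with (?P<workers>\d+) workers"
)


def overrunning_runs(lines, interval_s=300):
    """Return (timestamp, seconds) for runs longer than the polling interval."""
    overruns = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match["seconds"]) > interval_s:
            overruns.append((match["stamp"], int(match["seconds"])))
    return overruns


sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]

for stamp, seconds in overrunning_runs(sample):
    print(f"{stamp}: {seconds}s exceeds the interval")
```

Any run longer than the interval is still active when the next one starts, which is the condition under which two pollers can end up working on the same device.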
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Wednesday, August 12, 2015 1:33 AM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That's not unthinkable when the same part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to run from a separate, less frequent cron job, for example ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
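A common way to stop a scheduled job from running in parallel with its previous run is a non-blocking file lock. The Python sketch below is a general illustration only: it is not how Observium or poller-wrapper.py schedules work, `try_lock` and the lock file name are hypothetical, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "fdb-table-example.lock")

first = try_lock(lock_path)    # the run that is still in progress
second = try_lock(lock_path)   # a new run starting before the first finished

print("first run holds lock:", first is not None)
print("second run skipped:", second is None)

first.close()  # closing the file releases the lock
```

A job wrapped this way skips a cycle when the previous run is still going, so a slow module causes gaps in the data instead of a pile of competing processes.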
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
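The unique `dev_vlan_mac` index over (device_id, vlan_id, mac_address) is what turns a second insert of the same entry into a "Duplicate entry" error. The Python sketch below reproduces that behaviour with SQLite as a stand-in for MySQL, so the SQL dialect differs from the real setup; it illustrates the constraint only and is not the poller's actual query logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vlans_fdb (
           device_id   INTEGER NOT NULL,
           vlan_id     INTEGER NOT NULL,
           port_id     INTEGER,
           mac_address TEXT    NOT NULL,
           fdb_status  TEXT    NOT NULL,
           UNIQUE (device_id, vlan_id, mac_address)
       )"""
)

row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

conn.execute(insert, row)          # first poller inserts the entry
try:
    conn.execute(insert, row)      # second poller inserts the same entry
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Idempotent variant: skip the row if the key already exists.
conn.execute("INSERT OR IGNORE INTO vlans_fdb VALUES (?, ?, ?, ?, ?)", row)
count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]

print(duplicate_rejected, count)
```

In MySQL the comparable statements are `INSERT IGNORE` and `INSERT ... ON DUPLICATE KEY UPDATE`. Either would hide the log noise, but the underlying cause in this thread is the overlapping poller runs, not the table structure.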
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of *Aaron Mayfield *Sent:* Thursday, August 13, 2015 5:54 PM
*To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
*From:* observium [mailto:observium-bounces@observium.org observium-bounces@observium.org] *On Behalf Of *Adam Armstrong *Sent:* Thursday, August 13, 2015 4:48 PM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.
*From:* observium [mailto:observium-bounces@observium.org observium-bounces@observium.org] *On Behalf Of *Adam Armstrong *Sent:* Wednesday, August 12, 2015 3:58 AM *To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
*From:* observium [mailto:observium-bounces@observium.org observium-bounces@observium.org] *On Behalf Of *Adam Armstrong *Sent:* Wednesday, August 12, 2015 1:33 AM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel, not unthinkable when the same part of the poller process runs for so long, 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device is not unthinkable they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller
processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t any problems running it (back to being fast). Also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org
observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php
-h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was
several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 -
Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 -
Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 -
Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 -
Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 -
Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems
to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> show index from vlans_fdb;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/1052eb8dc1534e3c73eb849a27827e0a.jpg?s=120&d=mm&r=g)
I still had another device with worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code writes '' to the database instead of the new value. Every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but again rewrites it to ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
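The rewrite loop described above can be modelled in a few lines. This is a hypothetical Python sketch, not the Observium PHP code: the dict stands in for vlans_fdb rows, and the names sync_fdb_status and writes_per_run are invented for illustration. It only shows why writing '' makes every later run repeat the update.

```python
def sync_fdb_status(db_rows, polled, buggy):
    """Bring stored fdb_status values in line with freshly polled ones.

    db_rows and polled map a (vlan_id, mac_address) key to an fdb_status
    string. Returns the number of UPDATE-like writes performed.
    """
    writes = 0
    for key, new_status in polled.items():
        if db_rows.get(key) != new_status:
            # Buggy variant (assumed model of the report): the write
            # happens, but stores '' rather than the polled value.
            db_rows[key] = "" if buggy else new_status
            writes += 1
    return writes


def writes_per_run(buggy, runs=3):
    """Simulate several poller runs after one entry changes status."""
    db_rows = {(2083, "6ef88537f91f"): "learned"}
    polled = {(2083, "6ef88537f91f"): "invalid"}
    return [sync_fdb_status(db_rows, polled, buggy) for _ in range(runs)]
```

In this model the buggy variant writes on every run because the stored '' never matches the polled value, while the corrected variant writes once and then stays quiet.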
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
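The idea behind the timing diff (report the elapsed time only when a step exceeds a threshold) can be sketched generically. The Python below is an illustration under assumptions, not a port of the diff: report_if_slow is a made-up name, and the 2-second default simply mirrors the threshold mentioned above.

```python
import time
from contextlib import contextmanager


@contextmanager
def report_if_slow(label, threshold=2.0, clock=time.perf_counter, out=print):
    """Time the enclosed block and report it only if it exceeds threshold.

    clock and out are injectable so the helper can be exercised without
    actually waiting; both parameters are assumptions of this sketch.
    """
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            out(f"{label} took {elapsed:.1f}s")
```

Wrapping the table-printing call in `with report_if_slow("print_cli_table"):` would surface the slow step without adding output to fast runs.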
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Aaron Mayfield *Sent:* Thursday, August 13, 2015 5:54 PM *To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Thursday, August 13, 2015 4:48 PM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This more or less confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Wednesday, August 12, 2015 3:58 AM *To:* Observium Network Observation System observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield < amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I saw the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
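To spot the point where runs start stacking, the wrapper log lines can be checked against the polling interval. The following Python sketch assumes the five-minute (300-second) interval suggested by the timestamps above; overlapping_runs is a hypothetical helper, not part of Observium.

```python
import re

# Matches the summary line written by poller-wrapper.py in the log above.
LINE_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")


def overlapping_runs(log_lines, interval=300):
    """Return the durations (in seconds) of runs longer than the interval.

    A run that takes longer than the interval is still active when the
    next one is started, so the two overlap.
    """
    slow = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and int(match.group(2)) > interval:
            slow.append(int(match.group(2)))
    return slow
```

Fed the log above, this would flag every run from the 683-second one onward, which matches the moment the duplicate-entry errors began.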
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
*From:* observium [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong *Sent:* Wednesday, August 12, 2015 1:33 AM *To:* observium@observium.org *Subject:* Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That is not unthinkable when one part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
Tom
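The effect Tom describes can be reproduced in miniature. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption; the real error is MySQL's #1062, whereas SQLite raises IntegrityError) and issues the same INSERT twice on one connection to imitate two pollers writing the same row. The table and index names follow the schema posted earlier in the thread; make_table and insert_twice are hypothetical helpers.

```python
import sqlite3

INSERT = (
    "INSERT INTO vlans_fdb "
    "(device_id, vlan_id, port_id, mac_address, fdb_status) "
    "VALUES (?, ?, ?, ?, ?)"
)


def make_table():
    """Create an in-memory table with the same unique key as vlans_fdb."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE vlans_fdb (
               device_id   INTEGER NOT NULL,
               vlan_id     INTEGER NOT NULL,
               port_id     INTEGER,
               mac_address TEXT NOT NULL,
               fdb_status  TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE UNIQUE INDEX dev_vlan_mac "
        "ON vlans_fdb (device_id, vlan_id, mac_address)"
    )
    return conn


def insert_twice(conn, row):
    """Insert the same row twice, as two overlapping pollers would."""
    conn.execute(INSERT, row)
    try:
        conn.execute(INSERT, row)
    except sqlite3.IntegrityError:
        # The unique index rejects the second copy of the row.
        return "duplicate"
    return "ok"
```

The table structure itself is behaving as designed here: the unique index rejects the second insert, so the errors point at overlapping pollers rather than at a broken schema.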
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t any problems running it (back to being fast). Also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org [mailto:observium-bounces@observium.org]] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org [mailto:observium@observium.org] Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb -> ; +-------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+-------------+------+-----+---------+-------+ | device_id | int(11) | NO | PRI | NULL | | | vlan_id | int(11) | NO | PRI | NULL | | | port_id | int(11) | YES | MUL | NULL | | | mac_address | varchar(32) | NO | PRI | NULL | | | fdb_status | varchar(32) | NO | | NULL | | +-------------+-------------+------+-----+---------+-------+ 5 rows in set (0.00 sec)
mysql> mysql> show index from vlans_fdb -> ; +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub art | Packed | Null | Index_type | Comment | +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ | vlans_fdb | 0 | dev_vlan_mac | 1 | device_id | A | 15 | ULL | NULL | | BTREE | | | vlans_fdb | 0 | dev_vlan_mac | 2 | vlan_id | A | 18348 | ULL | NULL | | BTREE | | | vlans_fdb | 0 | dev_vlan_mac | 3 | mac_address | A | 128440 | ULL | NULL | | BTREE | | | vlans_fdb | 1 | device_id | 1 | device_id | A | 78 | ULL | NULL | | BTREE | | | vlans_fdb | 1 | port_id | 1 | port_id | A | 431 | ULL | NULL | YES | BTREE | | +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ 5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 [tel:512.600.4297] www.artisaninfrastructure.com [http://www.artisaninfrastructure.com] Partner portal: https://portal.vpdc.us [https://portal.vpdc.us] Partner support: support@artisaninfrastructure.com [mailto:support@artisaninfrastructure.com]
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org [mailto:observium@observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium]
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________ observium mailing list observium@observium.org [mailto:observium@observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium] This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org [mailto:observium%40observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium] This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. 
Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________ observium mailing list observium@observium.org [mailto:observium@observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium]
_______________________________________________ observium mailing list observium@observium.org [mailto:observium%40observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium]
_______________________________________________ observium mailing list observium@observium.org [mailto:observium@observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium]
_______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/a4042920f4bf89a219241c65ae64c5d8.jpg?s=120&d=mm&r=g)
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device whose fdb-table run time was worse than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code stores '' instead of the new value; every subsequent run then finds that fdb_status is wrong and tries to rewrite it, but again writes ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
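To make the failure mode concrete, here is a minimal Python sketch of this class of bug. It is not the Observium code (which is PHP), and the names `sync_status_buggy`, `sync_status_fixed` and `count_updates` are invented for illustration; a dict stands in for a single vlans_fdb row.

```python
# Illustrative model only: a dict stands in for one vlans_fdb row.

def sync_status_buggy(row, polled_status):
    """Models the bug: a mismatch is detected, but '' is written back."""
    if row["fdb_status"] != polled_status:
        row["fdb_status"] = ""   # wrong value stored
        return 1                 # one UPDATE issued
    return 0


def sync_status_fixed(row, polled_status):
    """Writes the polled value, so the next run finds nothing to do."""
    if row["fdb_status"] != polled_status:
        row["fdb_status"] = polled_status
        return 1
    return 0


def count_updates(sync, runs=5):
    """The status changes from 'learned' to 'invalid' once, then stays put."""
    row = {"fdb_status": "learned"}
    return sum(sync(row, "invalid") for _ in range(runs))


buggy_updates = count_updates(sync_status_buggy)   # an UPDATE on every run
fixed_updates = count_updates(sync_status_fixed)   # a single UPDATE
```

In this model the buggy version issues a write on every polling run for each affected entry, which is consistent with the extra run time described above.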
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-table poller module in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
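The measurement approach described here (time the table printer, report only when it is slow) can be sketched in Python as below. This is an assumption-laden illustration, not the linked diff: `print_table` and `timed_call` are hypothetical names, the rows are made up, and the 2-second threshold is simply taken from the description above.

```python
import io
import time


def print_table(rows, out):
    """Stand-in for a CLI table printer; writes one formatted line per row."""
    for vlan, mac, port in rows:
        out.write(f"{vlan:>6} {mac:<14} {port:<10}\n")


def timed_call(func, *args, threshold=2.0):
    """Run func and report the elapsed time if it exceeds threshold seconds."""
    start = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        print(f"{func.__name__} took {elapsed:.1f}s")
    return elapsed


# 2800 made-up FDB entries, roughly the size of the table discussed above.
rows = [(2000 + i % 50, f"{i:012x}", f"Ethernet{i % 48}") for i in range(2800)]
buffer = io.StringIO()
elapsed = timed_call(print_table, rows, buffer)
```

Wrapping only the display call keeps the SNMP and database time out of the measurement, which is what lets the printing cost be compared against the module total.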
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they have “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I saw the side effect of the duplicate entries in the fdb table: the poller processes are running so slowly that they are stacking on top of each other.
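As a rough check for this kind of stacking, the log lines above can be scanned for runs that lasted longer than the polling interval. The sketch below is only an illustration: the 300-second interval is an assumption inferred from the five-minute spacing of the earlier timestamps, and `overrunning_runs` is a hypothetical helper name.

```python
import re

# Matches the "polled N devices in S seconds" part of a poller-wrapper line.
RUN_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")

POLL_INTERVAL = 300  # assumed 5-minute schedule, in seconds


def overrunning_runs(lines, interval=POLL_INTERVAL):
    """Return the durations of runs that outlasted the polling interval."""
    durations = []
    for line in lines:
        match = RUN_RE.search(line)
        if match and int(match.group(2)) > interval:
            durations.append(int(match.group(2)))
    return durations


sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]
slow = overrunning_runs(sample)
```

Any run that exceeds the interval is still active when the next one is launched, so the two can end up polling the same devices at the same time.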
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and end up running in parallel. That's not unthinkable when one part of the poller runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a less frequently scheduled cron job, with a command like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) fdb-table produces, but you'll still have the data in the database.
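If a slow module is moved to its own cron job, one generic way to stop two runs from overlapping is an exclusive, non-blocking file lock taken before the work starts. The Python sketch below shows that general technique only; it is not something Observium is stated to do in this thread, it is Unix-only (`fcntl`), and `try_lock` and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "fdb-table.lock")

first = try_lock(lock_path)    # first run gets the lock
second = try_lock(lock_path)   # an overlapping run is refused
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # a later run succeeds again
third.close()
```

A wrapper built this way would exit immediately when `try_lock` returns None instead of starting a second poll of the same data.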
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me the output of ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
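The #1062 errors follow directly from the unique dev_vlan_mac key over (device_id, vlan_id, mac_address): a second INSERT with the same three values is rejected. The Python sketch below reproduces that behaviour with an in-memory SQLite database as a stand-in for MySQL, which is an assumption made purely so the example is self-contained; the key is modelled as a composite primary key and the row values are taken from the first log entry above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT NOT NULL,
        fdb_status  TEXT NOT NULL,
        PRIMARY KEY (device_id, vlan_id, mac_address)
    )
    """
)

insert_sql = (
    "INSERT INTO vlans_fdb (device_id, vlan_id, port_id, mac_address, fdb_status) "
    "VALUES (?, ?, ?, ?, ?)"
)
entry = (54, 2083, 220115, "6ef88537f91f", "learned")

conn.execute(insert_sql, entry)          # first poller inserts the row
try:
    conn.execute(insert_sql, entry)      # second poller tries the same key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
```

The rejected insert leaves the table unchanged, so the errors by themselves do not corrupt the data; they indicate that two writers are racing. MySQL also offers `INSERT ... ON DUPLICATE KEY UPDATE` for writers that should tolerate an existing row, though whether that suits the poller is a separate question.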
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org mailto:observium%40observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. You say pre and post change, but the numbers you give are for polling with and without the module, not for the module itself before and after the change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device with worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code writes '' instead of the new value; every subsequent run then finds that fdb_status is wrong and tries to rewrite it, but writes '' again. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
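To illustrate why a bug of this kind costs time on every poll, here is a minimal Python sketch. It is not the Observium code or Bill's diff; the function names and the in-memory dict standing in for the vlans_fdb table are assumptions made purely for illustration. It counts how many rows get rewritten per run when the update stores '' instead of the new status.

```python
def sync_status(db_rows, polled, buggy):
    """Bring stored fdb_status values in line with the polled ones.

    Returns the number of rows written during this run.
    """
    writes = 0
    for key, new_status in polled.items():
        if db_rows.get(key) != new_status:
            # The buggy variant stores an empty string instead of the new status,
            # so the comparison above fails again on the next run.
            db_rows[key] = "" if buggy else new_status
            writes += 1
    return writes


def writes_per_run(buggy, runs=3):
    """Simulate several poller runs for one entry whose status has changed."""
    key = (54, 2083, "6ef88537f91f")  # device_id, vlan_id, mac_address
    db_rows = {key: "learned"}
    polled = {key: "invalid"}
    return [sync_status(db_rows, polled, buggy) for _ in range(runs)]


print(writes_per_run(buggy=True))   # [1, 1, 1]
print(writes_per_run(buggy=False))  # [1, 0, 0]
```

With the buggy variant the row is rewritten on every run; with the corrected one it is written once and then left alone.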
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-table poller module in my Observium instances for a while due to performance problems. Adam poked me to look into this, and I traced the root cause to the code that displays the contents of the fdb on the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
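The measuring approach described here (report the elapsed time only when it exceeds a threshold) can be sketched in Python as follows. This is an illustration of the idea, not the contents of the diff; `report_if_slow` and its parameters are hypothetical names, and the 2-second default simply mirrors the threshold mentioned in the message.

```python
import time
from contextlib import contextmanager


@contextmanager
def report_if_slow(label, threshold=2.0, clock=time.perf_counter, out=print):
    """Time the enclosed block and report it only if it ran longer than threshold seconds."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            out(f"{label} took {elapsed:.1f}s")


# Example: wrap the step suspected of being slow (here, formatting 2800 rows).
with report_if_slow("format fdb table"):
    rows = [f"{i:>6} | learned" for i in range(2800)]
```

Wrapping each stage of a module this way makes it possible to see whether the time goes into SNMP collection, database updates, or terminal output.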
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else noticed any performance issues with the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
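A quick way to spot the point where runs start to overlap is to flag every poller-wrapper.py entry whose duration exceeds the polling interval. The Python sketch below assumes a 300-second interval, inferred from the five-minute spacing of the timestamps above rather than stated in the thread; the function name is also just illustrative.

```python
import re

# Matches the "polled N devices in S seconds with W workers" lines from observium.log.
LINE_RE = re.compile(
    r"\[([^\]]+)\] .*polled (\d+) devices in (\d+) seconds with (\d+) workers"
)
POLL_INTERVAL = 300  # seconds; assumed from the five-minute cadence of the log


def overrunning(lines, interval=POLL_INTERVAL):
    """Return (timestamp, seconds) for every run that took longer than the interval."""
    slow = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group(3)) > interval:
            slow.append((match.group(1), int(match.group(3))))
    return slow


sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
]
print(overrunning(sample))  # [('2015/08/11 12:21:24 -0500', 683)]
```

Any run listed here was still going when the next one was due to start, which is the condition under which poller processes pile up.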
I did the upgrade and then the poller started running poorly. The duplicate entries in the fdb table look like a side effect: the poller processes are running so slowly that they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel. That's not unthinkable when one part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a separate, less frequently scheduled cron job, like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
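If the module is moved to its own cron job, one way to keep slow runs from overlapping is to wrap the command in a lock. The following Python sketch is an assumption-laden illustration, not part of Observium: the lock path and the `run_exclusive` helper are hypothetical, and only the poller command itself comes from the message above. It relies on `fcntl`, so it applies to Unix-like systems only.

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/observium-fdb-table.lock"  # hypothetical lock file location
COMMAND = ["./poller.php", "-h", "all", "-m", "fdb-table"]  # command from the thread


def run_exclusive(command, lock_path=LOCK_PATH):
    """Run command unless another run holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail immediately if someone else holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed at the end of the with block.
        return subprocess.run(command).returncode
```

A cron job would call `run_exclusive(COMMAND)` from the Observium directory; if the previous run is still in progress, the new one exits without starting a second poller.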
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it's not unthinkable that they'll all be trying to insert the same data into the table...
Tom
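The collision Tom describes can be reproduced in miniature. The Python sketch below uses an in-memory SQLite database as a stand-in for MySQL (an assumption for the sake of a self-contained example; the error in the thread is MySQL's #1062). The columns and the unique key on device_id, vlan_id and mac_address follow the schema quoted in this thread, and the row values are taken from the first log entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vlans_fdb (
           device_id   INTEGER NOT NULL,
           vlan_id     INTEGER NOT NULL,
           port_id     INTEGER,
           mac_address TEXT NOT NULL,
           fdb_status  TEXT NOT NULL,
           UNIQUE (device_id, vlan_id, mac_address)  -- plays the role of dev_vlan_mac
       )"""
)

row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

conn.execute(insert, row)  # first poller inserts the entry
try:
    conn.execute(insert, row)  # second poller, unaware of the first, inserts it again
    second_insert = "ok"
except sqlite3.IntegrityError:
    second_insert = "duplicate"

row_count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print(second_insert)  # duplicate
print(row_count)      # 1
```

The second insert is rejected by the unique key and the table still holds one row, which matches the symptom in db.log: the data stays consistent, but every overlapping poller generates a failed query.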
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
![](https://secure.gravatar.com/avatar/a4042920f4bf89a219241c65ae64c5d8.jpg?s=120&d=mm&r=g)
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device that had worse fdb-table performance time than I would expect, and found a little bug in the code that updates fdb_status. If you ever end up with an entry that changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code causes the new value to be '' instead of the new value; then every subsequent run finds that fdb_status is wrong and attempts to rewrite it but rewrites it to ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM
To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel, not unthinkable when the same part of the poller process runs for so long, 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
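A minimal sketch of the failure mode Tom describes, using Python's built-in sqlite3 in place of MySQL (a substitution made only so the example is self-contained). The table copies the vlans_fdb columns and the dev_vlan_mac unique key shown in this thread; the function names are hypothetical and this is not Observium code.

```python
import sqlite3

def make_db():
    """Create an in-memory table shaped like vlans_fdb with its unique key."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE vlans_fdb ("
        " device_id INTEGER NOT NULL,"
        " vlan_id INTEGER NOT NULL,"
        " port_id INTEGER,"
        " mac_address TEXT NOT NULL,"
        " fdb_status TEXT NOT NULL,"
        " UNIQUE (device_id, vlan_id, mac_address))"
    )
    return db

# One row taken from the db.log excerpt in this thread.
ROW = (54, 2083, 220115, "6ef88537f91f", "learned")
INSERT = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

def plain_insert_twice(db):
    """Two pollers both decide the row is missing; the second insert fails."""
    db.execute(INSERT, ROW)          # poller A
    try:
        db.execute(INSERT, ROW)      # poller B, same device, same data
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

def tolerant_insert_twice(db):
    """Same race, but the insert skips rows that already exist."""
    sql = INSERT.replace("INSERT", "INSERT OR IGNORE", 1)
    db.execute(sql, ROW)
    db.execute(sql, ROW)
    return db.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]

print(plain_insert_twice(make_db()))
print(tolerant_insert_twice(make_db()))
```

In MySQL the rough equivalents would be INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. Either would silence the log entries, but overlapping pollers would still be competing for CPU, so the overlap itself is the thing to fix.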
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 3:38:50 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device that had worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code writes '' to the database instead of the new value; every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but rewrites it to '' again. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
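To make the effect of that bug concrete, here is a small model of it. It is not the Observium code and not Bill's diff: the dictionary standing in for the vlans_fdb table and both function names are hypothetical, and the only behaviour taken from the message is that the stored value becomes '' instead of the polled value.

```python
def sync_status(db, polled, buggy):
    """Bring stored fdb_status values in line with polled ones.

    Returns the number of rows that had to be rewritten in this run.
    """
    updates = 0
    for key, new_status in polled.items():
        if db.get(key) != new_status:
            # The buggy branch stores an empty string instead of the polled
            # value, so the row never matches and is rewritten on every run.
            db[key] = "" if buggy else new_status
            updates += 1
    return updates

def updates_per_run(buggy, runs=3):
    """Simulate several polling runs after one entry changes its status."""
    key = ("54", "2083", "6ef88537f91f")
    db = {key: "learned"}        # what the database holds
    polled = {key: "invalid"}    # what the device now reports
    return [sync_status(db, polled, buggy) for _ in range(runs)]

print(updates_per_run(buggy=True))
print(updates_per_run(buggy=False))
```

With the bug, the same row costs one UPDATE on every run; without it, the row is written once and then left alone. On a table with many changed entries, that recurring write cost is plausibly where the extra seconds went.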
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-table poller module in my Observium instances for a while because of performance problems. Adam poked me to look into this, and I traced the root cause to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
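The measurement idea in that diff (report the elapsed time only when a call takes longer than a threshold) can be sketched generically. This is an illustration in Python rather than the PHP diff itself; `timed_call` and its parameters are hypothetical names, and the 2-second default only mirrors the threshold mentioned above.

```python
import time

def timed_call(func, *args, threshold=2.0, report=print, **kwargs):
    """Run func and report the elapsed time only if it exceeds threshold seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        report(f"{func.__name__} took {elapsed:.2f}s")
    return result, elapsed

def slow_step():
    """Stand-in for an expensive call such as printing a large table."""
    time.sleep(0.05)
    return "done"

messages = []
# A low threshold is used here so the demonstration finishes quickly.
result, elapsed = timed_call(slow_step, threshold=0.01, report=messages.append)
print(result, messages)
```

Wrapping a single suspect call this way keeps normal runs quiet while making an outlier such as a 30-second table print stand out immediately.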
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else noticed any performance issues with the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM
To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. The duplicate entries in the fdb table look like a side effect: the poller processes are running so slowly that they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
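For anyone wanting to spot this pattern in their own observium.log, the following sketch extracts the run time from each poller-wrapper summary line and flags runs that outlast the polling interval. The 300-second interval is an assumption inferred from the five-minute spacing of the timestamps above, and `overruns` is a hypothetical helper, not part of Observium.

```python
import re

# Matches the poller-wrapper summary lines quoted in this thread.
LINE_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] poller-wrapper\.py\((?P<pid>\d+)\): .*"
    r"polled (?P<devices>\d+) devices in (?P<seconds>\d+) seconds "
    r"with (?P<workers>\d+) workers"
)

POLL_INTERVAL = 300  # assumption: the wrapper is started every 5 minutes

def overruns(lines, interval=POLL_INTERVAL):
    """Return (timestamp, seconds) for every run longer than the interval."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group("seconds")) > interval:
            found.append((match.group("stamp"), int(match.group("seconds"))))
    return found

# Four lines copied from the log excerpt above, around the upgrade.
sample = [
    "[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers",
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]

for stamp, seconds in overruns(sample):
    print(stamp, seconds)
```

Any run flagged here was still working when the next one started, which is the condition under which several pollers end up touching the same device at once.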
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started switching back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do a “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
o Devices Polled 1
o Poller Time 85.65 secs
o Memory usage 22.75MB (peak: 34.25MB)
o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
Can anyone else test this and see what results you get?
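For anyone who wants to repeat the comparison, the back-and-forth test above can be scripted. The following is a minimal sketch in Python, not part of Observium: the function names are made up for illustration, it assumes it is run from the Observium checkout directory, and it assumes `svn update -r` and `./poller.php -h <device> -m fdb-table` behave as they do in the commands quoted in this thread. It measures wall-clock time for the whole poller invocation, not the "Module time" line the poller prints.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List, Sequence


def run_command(cmd: Sequence[str]) -> None:
    """Run a command, discard its stdout, and raise if it fails."""
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


def time_revisions(
    revisions: Sequence[int],
    device_id: int,
    rounds: int = 3,
    runner: Callable[[Sequence[str]], None] = run_command,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, List[float]]:
    """Alternate between revisions and time one fdb-table poll on each.

    Alternating (rather than running each revision several times in a row)
    mirrors the switching back and forth described in the message above.
    """
    timings: Dict[int, List[float]] = {rev: [] for rev in revisions}
    for _ in range(rounds):
        for rev in revisions:
            # Check out the revision under test (hypothetical helper usage;
            # add "sudo" in front if your checkout needs it).
            runner(["svn", "update", "-r", str(rev)])
            start = clock()
            runner(["./poller.php", "-h", str(device_id), "-m", "fdb-table"])
            timings[rev].append(clock() - start)
    return timings


def summarize(timings: Dict[int, List[float]]) -> Dict[int, float]:
    """Return the median wall-clock time per revision."""
    return {rev: statistics.median(vals) for rev, vals in timings.items()}
```

Calling `summarize(time_revisions([6847, 6848], 17))` would give one median per revision. Make sure no other poller processes are running at the same time, since overlapping pollers would skew the numbers.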
![](https://secure.gravatar.com/avatar/a4042920f4bf89a219241c65ae64c5d8.jpg?s=120&d=mm&r=g)
So on that C7609….
Did an svn up -r 6847 (which I think took me down to 6833, as I’m on stable instead of current), and ran the poller -m fdb-table against it.
Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count
Then did svn up -r 6895, since that’s the latest current, and ran the poller -m fdb-table again:
##### Module Start: fdb-table #####
ERROR: Device does not support per-VLAN community.
+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address | Port | Port ID | FDB Port | ifIndex | Status |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+
o Module time 14.7221s
Oh, that’s interesting. ;-)
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Tuesday, August 18, 2015 11:13 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started switching back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do a “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
o Devices Polled 1
o Poller Time 85.65 secs
o Memory usage 22.75MB (peak: 34.25MB)
o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
Can anyone else test this and see what results you get?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
On 18 August 2015 3:38:50 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device that had worse fdb-table performance time than I would expect, and found a little bug in the code that updates fdb_status. If you ever end up with an entry that changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code causes the new value to be '' instead of the new value; then every subsequent run finds that fdb_status is wrong and attempts to rewrite it but rewrites it to ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
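To see why a bug like this costs time on every poll, here is a small illustrative sketch in Python. It is not the Observium code and does not reproduce the linked diff; the function names and the dict-based row are hypothetical. It only models the behaviour described above: the buggy path detects a changed status but writes an empty string, so the stored row never matches the polled value and is rewritten on every run.

```python
from typing import Callable, Dict

Row = Dict[str, str]


def buggy_update(stored: Row, polled: Row) -> Row:
    """Detects the change, but writes '' instead of the polled status."""
    if stored["fdb_status"] != polled["fdb_status"]:
        return {"fdb_status": ""}  # models the described bug
    return {}


def fixed_update(stored: Row, polled: Row) -> Row:
    """Detects the change and writes the polled status."""
    if stored["fdb_status"] != polled["fdb_status"]:
        return {"fdb_status": polled["fdb_status"]}
    return {}


def count_writes(update: Callable[[Row, Row], Row], runs: int) -> int:
    """Simulate `runs` polls of one entry that went from learned to invalid."""
    stored: Row = {"fdb_status": "learned"}
    polled: Row = {"fdb_status": "invalid"}
    writes = 0
    for _ in range(runs):
        changes = update(stored, polled)
        if changes:
            stored.update(changes)  # stands in for the database UPDATE
            writes += 1
    return writes
```

With the buggy version, five simulated polls produce five writes for the same entry; with the fixed version there is a single write and the remaining polls are no-ops. Multiplied across many affected entries, that is consistent with the kind of slowdown reported here.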
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
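The timing approach described above (report only when a call takes longer than 2 seconds) can be sketched generically. This is a Python illustration of the idea, not the PHP diff linked in the message; `timed_call` and its parameters are hypothetical names, and the 2-second threshold is taken from the description above.

```python
import time
from typing import Any, Callable


def timed_call(
    func: Callable[..., Any],
    *args: Any,
    threshold: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    report: Callable[[str], None] = print,
    **kwargs: Any,
) -> Any:
    """Call func and report its duration only if it exceeds threshold seconds."""
    start = clock()
    result = func(*args, **kwargs)
    elapsed = clock() - start
    if elapsed > threshold:
        name = getattr(func, "__name__", repr(func))
        report(f"{name} took {elapsed:.1f}s")
    return result
```

Wrapping only the suspect call keeps normal runs quiet and makes a slow table printer stand out in the debug output, which is how the 30-second print time for a 2800-entry FDB became visible.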
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This more or less confirms my suspicion that my Arista devices are the ones taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
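The log above can be checked mechanically for runs that outlast the polling interval. The sketch below is a minimal Python example, assuming a 300-second interval (inferred from the five-minute spacing of the early timestamps, not stated in the thread); `overrunning_runs` is a hypothetical helper name:

```python
import re

# Matches the summary that poller-wrapper.py writes to observium.log.
LINE_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")


def overrunning_runs(lines, interval=300):
    """Return the durations (in seconds) of runs that took longer than interval."""
    durations = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group(2)) > interval:
            durations.append(int(match.group(2)))
    return durations


sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
]
slow = overrunning_runs(sample)
```

Any run longer than the interval is still active when the next one is launched, which is the overlap suspected of producing the duplicate inserts.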
I did the upgrade, and then the poller started running poorly. The duplicate entries in the fdb table are a side effect: the poller processes are running so slowly that they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That's not unthinkable when one part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a separate, less frequently scheduled cron job, like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
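The effect can be reproduced in miniature. The Python sketch below uses an in-memory SQLite table as a stand-in for MySQL, with a primary key on the same three columns as the dev_vlan_mac index shown in this thread; `make_table` and `insert_fdb` are hypothetical helpers, and SQLite raises `IntegrityError` where MySQL reports error #1062:

```python
import sqlite3


def make_table():
    """Create an in-memory table keyed like vlans_fdb (device, VLAN, MAC)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vlans_fdb ("
        " device_id INTEGER NOT NULL,"
        " vlan_id INTEGER NOT NULL,"
        " port_id INTEGER,"
        " mac_address TEXT NOT NULL,"
        " fdb_status TEXT NOT NULL,"
        " PRIMARY KEY (device_id, vlan_id, mac_address))"
    )
    return conn


def insert_fdb(conn, row):
    """Insert one FDB row; return False if the key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO vlans_fdb"
                " (device_id, vlan_id, port_id, mac_address, fdb_status)"
                " VALUES (?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_table()
row = (54, 2083, 220115, "6ef88537f91f", "learned")
first = insert_fdb(conn, row)   # the first poller wins
second = insert_fdb(conn, row)  # a second poller inserting the same entry fails
```

Two pollers that both see the entry as missing and both try to insert it behave like the two calls here: only the first succeeds.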
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
That's related to the database update bug that Bill fixed yesterday. :)
Adam.
On 18 August 2015 5:41:46 pm "Ron Marosko" ron@rjr-services.com wrote:
So on that C7609….
Did an svn up -r 6847 (which I think took me down to 6833, as I’m on stable instead of current), and ran the poller -m fdb-table against it.
Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count
Then did svn up -r 6895, since that’s the latest current, and ran the poller -m fdb-table again:
##### Module Start: fdb-table #####
ERROR: Device does not support per-VLAN community.
+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address | Port | Port ID | FDB Port | ifIndex | Status |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+
o Module time 14.7221s
Oh, that’s interesting. ;-)
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Tuesday, August 18, 2015 11:13 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started switching back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do a “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
o Devices Polled 1
o Poller Time 85.65 secs
o Memory usage 22.75MB (peak: 34.25MB)
o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
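The message does not say how the revisions were searched. If the regression is assumed to persist in every later revision, one way to automate the hunt is a binary search between a known-fast and a known-slow revision. The Python sketch below is only that: `first_slow_revision` and the `is_slow` predicate are hypothetical, and in practice `is_slow` would have to check out the revision and time the fdb-table module rather than compare numbers:

```python
def first_slow_revision(good, bad, is_slow):
    """Return the earliest revision in (good, bad] for which is_slow(rev) is True.

    Assumes revision `good` is fast, revision `bad` is slow, and that every
    revision after the first slow one is slow as well.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(mid):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


# Stand-in predicate based on the timings reported above:
# revision 6847 ran in about 27 seconds, revision 6848 in about 85 seconds.
culprit = first_slow_revision(6833, 6894, lambda rev: rev >= 6848)
```

Searching this way needs only a handful of timed runs to narrow the range between 6833 and 6894 down to a single revision.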
Can anyone else test this and see what results you get?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
On 18 August 2015 3:38:50 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device that had worse fdb-table performance time than I would expect, and found a little bug in the code that updates fdb_status. If you ever end up with an entry that changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code causes the new value to be '' instead of the new value; then every subsequent run finds that fdb_status is wrong and attempts to rewrite it but rewrites it to ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM
To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when the same part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can run that module from a separate, less frequently scheduled cron job, like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
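If the module is moved to its own cron job, a slow run can still overlap the next one. One common guard, sketched here under assumptions, is an exclusive non-blocking file lock around the job. `run_exclusive` and the lock file path are hypothetical names for illustration, and the sketch relies on `fcntl`, so it applies to Unix-like systems only.

```python
import fcntl
import os
import tempfile


def run_exclusive(lock_path, job):
    """Run job() only if nobody else holds the lock; return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            job()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
        return True


# Hypothetical lock location, used only for this demonstration.
lock_path = os.path.join(tempfile.gettempdir(), "fdb-table-demo.lock")
results = []


def outer_job():
    # While the outer run holds the lock, a second attempt is refused.
    results.append(run_exclusive(lock_path, lambda: results.append("inner ran")))


ran = run_exclusive(lock_path, outer_job)
print(ran, results)
```

In a real cron wrapper the job would start the poller command quoted above (for example through `subprocess.run`) instead of the demonstration function.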
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
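The effect is easy to reproduce in miniature. The sketch below recreates the vlans_fdb columns and the unique (device_id, vlan_id, mac_address) key from this thread in an in-memory SQLite database, which is an assumption made purely so the example is self-contained; the real table lives in MySQL and the real inserts come from poller.php.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns and unique key as the vlans_fdb table described in this
# thread, recreated in SQLite purely for illustration.
conn.execute(
    """CREATE TABLE vlans_fdb (
           device_id   INTEGER NOT NULL,
           vlan_id     INTEGER NOT NULL,
           port_id     INTEGER,
           mac_address TEXT NOT NULL,
           fdb_status  TEXT NOT NULL,
           PRIMARY KEY (device_id, vlan_id, mac_address)
       )"""
)

row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

conn.execute(insert, row)      # first poller: succeeds
try:
    conn.execute(insert, row)  # second poller, same row: rejected
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("second insert failed:", err)

# One way to tolerate a racing writer (SQLite syntax; MySQL offers
# INSERT ... ON DUPLICATE KEY UPDATE for a similar purpose).
conn.execute("INSERT OR REPLACE INTO vlans_fdb VALUES (?, ?, ?, ?, ?)", row)
count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print("rows:", count)
```

Whether Observium should use an upsert here is a design question for its developers; the sketch only shows why two pollers writing the same row produce the #1062 error.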
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Should this update bug fix be apparent in version 6895? Still seeing the ~85 second runtime on 6895:
[amayfield@kc-netview observium]$ sudo svn update
At revision 6895.
[amayfield@kc-netview observium]$ sudo ./poller.php -h 17 -m fdb-table
Observium Professional 0.15.8.6894 http://www.observium.org
##### Starting polling run at 2015-08-18 12:18:34 #####
##### au000a-u25-swa10g [17] #####
o OS arista_eos
o Last poll duration 21.20 seconds
o Last Polled 2015-08-18 12:15:23
o SNMP Version v2c
o Device status Device is reachable by PING (14.1ms) and SNMP (27.64ms)
o Modules Enabled system, os, fdb-table
##### Module Start: system #####
o Uptime 332 days, 9h 22m 45s
o sysObjectID .1.3.6.1.4.1.30065.1.3011.7150.3282.24
o snmpEngineID F5717F001C731E9CBC00
o sysDescr Arista Networks EOS version 4.13.8M running on an Arista Networks DCS-7150S-24
o sysName au000a-u25-swa10g
o Location 7301 Metropolis Drive, Austin, Texas 78744,US
o Module time 0.1291s
##### Module Start: os #####
o OS Poller OS
o Hardware DCS-7150S-24
o Version 4.13.8M
o Features <empty>
o Serial <empty>
o Asset <empty>
o Module time 0.0033s
##### Module Start: fdb-table #####
+------+--------------+-------+---------+----------+---------+---------+
| VLAN | MAC Address  | Port  | Port ID | FDB Port | ifIndex | Status  |
+------+--------------+-------+---------+----------+---------+---------+
***data omitted***
| 4021 | 748ef8a7ad41 | Po85  | 1023    | 110      | 1000085 | learned |
| 4094 | 001c731e968a | Po10  | 1021    | 103      | 1000010 | learned |
+------+--------------+-------+---------+----------+---------+---------+
o Module time 88.8261s
##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:20:03 #####
o Graphs [checked] ping, ping_snmp, uptime
o Graphs [added] fdb_count
o Poller time 89.025 seconds
o Updated Data uptime, last_polled, last_polled_timetaken, device_state
##### Completed polling run at 2015-08-18 12:20:03 #####
o Devices Polled 1
o Poller Time 89.08 secs
o Memory usage 22.75MB (peak: 33.25MB)
o MySQL Usage Cell[0/0s] Row[5/0.001s] Rows[8/0.032s] Column[0/0s] Update[28/0.014s] Insert[247/0.098s] Delete[200/0.09s]
o RRDTool Usage update[5/0.005s]
[amayfield@kc-netview observium]$
I notice the poller still says version 6894, but I’m guessing maybe the version number didn’t get updated or something.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 12:11 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
That's related to the database update bug that bill fixed yesterday. :)
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 5:41:46 pm "Ron Marosko" <ron@rjr-services.com> wrote:
So on that C7609….
Did a svn up -r 6847 (which I think took me down to 6833 as I’m on stable instead of current), and ran the poller -m fdb-table against it.
Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count
Then did svn up -r 6895 since that’s the latest current and ran the poller -m fdb-table again:
##### Module Start: fdb-table #####
ERROR: Device does not support per-VLAN community.
+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address  | Port   | Port ID  | FDB Port | ifIndex | Status  |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+
o Module time 14.7221s
Oh, that’s interesting. ;-)
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Tuesday, August 18, 2015 11:13 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started switching back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do an “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
o Devices Polled 1
o Poller Time 85.65 secs
o Memory usage 22.75MB (peak: 34.25MB)
o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
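For anyone who wants to repeat this kind of search on their own install, a bisection keeps the number of "svn update -r" and poller runs small. The message above does not say the search was done this way, so treat the following as one possible approach: `first_slow_revision` is a hypothetical helper, the starting revision 6800 is an assumed known-fast point, and the timing check is faked rather than running the poller.

```python
def first_slow_revision(good, bad, is_slow):
    """Binary-search for the first revision where is_slow(rev) is True.

    Assumes `good` is known fast, `bad` is known slow, and every revision
    after the first slow one is also slow.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(mid):
            bad = mid
        else:
            good = mid
    return bad


# Stand-in for "svn update -r REV, run the poller, compare the module time".
# The 6848 boundary is the result reported in the message above; 6800 is an
# assumed known-fast starting revision.
tested = []


def fake_timing(rev):
    tested.append(rev)
    return rev >= 6848


found = first_slow_revision(6800, 6894, fake_timing)
print(found, "found after", len(tested), "timing runs")
```

Repeating each timing a few times, as was done for 6847 and 6848, guards against a single noisy measurement sending the search the wrong way.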
Can anyone else test this and see what results you get?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
On 18 August 2015 3:38:50 pm "Ron Marosko" <ron@rjr-services.com> wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" <ron@rjr-services.com> wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner <fenner@gmail.com> wrote:
I still had another device with worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code writes '' instead of the new value; every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but again rewrites it to ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
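The reason this bug costs time on every run, not just once, can be shown with a toy model. This is not the Observium code: `poll_once` and `count_updates` are hypothetical names, the table is a plain dictionary, and the status change from "learned" to "invalid" is an invented example using a MAC address that appears elsewhere in this thread.

```python
def poll_once(db, polled, buggy):
    """Apply one polling pass; return how many rows were rewritten.

    db and polled map (vlan, mac) -> fdb_status. With buggy=True the update
    stores '' instead of the polled status, mimicking the behaviour
    described above (this is not the actual Observium code).
    """
    updates = 0
    for key, status in polled.items():
        if db.get(key) != status:
            db[key] = "" if buggy else status
            updates += 1
    return updates


def count_updates(buggy, runs=3):
    db = {(4021, "748ef8a7ad41"): "learned"}
    polled = {(4021, "748ef8a7ad41"): "invalid"}  # status changed on the switch
    return [poll_once(db, polled, buggy) for _ in range(runs)]


buggy_counts = count_updates(buggy=True)
fixed_counts = count_updates(buggy=False)
print("buggy:", buggy_counts)
print("fixed:", fixed_counts)
```

With the bug the stored value never matches what the switch reports, so the row is rewritten on every pass; with the fix it is rewritten once and then left alone.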
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong <adama@memetic.org> wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner <fenner@gmail.com> wrote:
I've been disabling the fdb-table poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root-caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
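The timing diff itself is not reproduced in this thread, so the following is only a sketch of the same idea: wrap the suspect call, measure it, and report when it crosses a threshold (2 seconds, as in the description above). `timed_call` and `render_table` are hypothetical names, and a fake clock stands in for a 30 second table print so the example finishes instantly.

```python
import time


def timed_call(func, *args, threshold=2.0, clock=time.perf_counter, report=print):
    """Call func(*args) and report the elapsed time if it exceeds threshold."""
    start = clock()
    result = func(*args)
    elapsed = clock() - start
    if elapsed > threshold:
        report(f"{func.__name__} took {elapsed:.1f}s")
    return result


def render_table(rows):
    # Placeholder for an expensive table printer.
    return "\n".join(" | ".join(map(str, row)) for row in rows)


# A fake clock simulates a 30 second call; in real use the default
# time.perf_counter would be left in place.
fake_clock = iter([0.0, 30.0]).__next__
messages = []
text = timed_call(
    render_table,
    [(4021, "748ef8a7ad41", "Po85")],
    clock=fake_clock,
    report=messages.append,
)
print(text)
print(messages)
```

Wrapping individual steps like this is a cheap way to narrow a slow module down to one call before reaching for a full profiler.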
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I see the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they stack on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.orgmailto:observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel, not unthinkable when the same part of the poller process runs for so long, 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cxmailto:tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device is not unthinkable they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t any problems running it (back to being fast). Also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.orgmailto:observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)

mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738
T. 512.600.4297
www.artisaninfrastructure.com
Partner portal: https://portal.vpdc.us
Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
This is because I'm an idiot. I apparently didn't disable the table, somehow. Try now! :D
adam.

On 18/08/2015 18:27:13, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:

Should this update bug fix be apparent in version 6895? Still seeing the ~85 second runtime on 6895:

[amayfield@kc-netview observium]$ sudo svn update
At revision 6895.
[amayfield@kc-netview observium]$ sudo ./poller.php -h 17 -m fdb-table

Observium Professional 0.15.8.6894
http://www.observium.org

##### Starting polling run at 2015-08-18 12:18:34 #####

##### au000a-u25-swa10g [17] #####

 o OS                  arista_eos
 o Last poll duration  21.20 seconds
 o Last Polled         2015-08-18 12:15:23
 o SNMP Version        v2c
 o Device status       Device is reachable by PING (14.1ms) and SNMP (27.64ms)
 o Modules Enabled     system, os, fdb-table

##### Module Start: system #####

 o Uptime              332 days, 9h 22m 45s
 o sysObjectID         .1.3.6.1.4.1.30065.1.3011.7150.3282.24
 o snmpEngineID        F5717F001C731E9CBC00
 o sysDescr            Arista Networks EOS version 4.13.8M running on an Arista Networks DCS-7150S-24
 o sysName             au000a-u25-swa10g
 o Location            7301 Metropolis Drive, Austin, Texas 78744,US
 o Module time         0.1291s

##### Module Start: os #####

 o OS Poller           OS
 o Hardware            DCS-7150S-24
 o Version             4.13.8M
 o Features            <empty>
 o Serial              <empty>
 o Asset               <empty>
 o Module time         0.0033s

##### Module Start: fdb-table #####

+------+--------------+-------+---------+----------+---------+---------+
| VLAN | MAC Address  | Port  | Port ID | FDB Port | ifIndex | Status  |
+------+--------------+-------+---------+----------+---------+---------+
***data omitted***
| 4021 | 748ef8a7ad41 | Po85  | 1023    | 110      | 1000085 | learned |
| 4094 | 001c731e968a | Po10  | 1021    | 103      | 1000010 | learned |
+------+--------------+-------+---------+----------+---------+---------+

 o Module time         88.8261s

##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:20:03 #####

 o Graphs [checked]    ping, ping_snmp, uptime
 o Graphs [added]      fdb_count
 o Poller time         89.025 seconds
 o Updated Data        uptime, last_polled, last_polled_timetaken, device_state

##### Completed polling run at 2015-08-18 12:20:03 #####

 o Devices Polled      1
 o Poller Time         89.08 secs
 o Memory usage        22.75MB (peak: 33.25MB)
 o MySQL Usage         Cell[0/0s] Row[5/0.001s] Rows[8/0.032s] Column[0/0s] Update[28/0.014s] Insert[247/0.098s] Delete[200/0.09s]
 o RRDTool Usage       update[5/0.005s]

[amayfield@kc-netview observium]$

I notice the poller still says version 6894, but I’m guessing maybe the version number didn’t get updated or something.

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Tuesday, August 18, 2015 12:11 PM
To: Observium Network Observation System observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

That's related to the database update bug that bill fixed yesterday. :)

Adam.

Sent with AquaMail for Android http://www.aqua-mail.com

On 18 August 2015 5:41:46 pm "Ron Marosko" <ron@rjr-services.com> wrote:

So on that C7609…. Did a svn up -r 6847 (which I think took me down to 6833 as I’m on stable instead of current), and ran the poller -m fdb-table against it.

Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count

Then did svn up -r 6895 since that’s the latest current and ran the poller -m fdb-table again:

##### Module Start: fdb-table #####

ERROR: Device does not support per-VLAN community.

+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address  | Port   | Port ID  | FDB Port | ifIndex | Status  |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+

 o Module time         14.7221s

Oh, that’s interesting. ;-)

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield
Sent: Tuesday, August 18, 2015 11:13 AM
To: Observium Network Observation System
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

I did some extensive experimenting this morning. First of all, I updated to version 6894. On this version I still am seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started reverting back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17). Through this process I was able to discover the exact version point at which the fdb-table module got slow.

If I do an “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]

If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:

##### Completed polling run at 2015-08-18 10:52:48

 o Devices Polled      1
 o Poller Time         85.65 secs
 o Memory usage        22.75MB (peak: 34.25MB)
 o MySQL Usage         Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
 o RRDTool Usage       update[5/0.006s]

To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version. Can anyone else test this and see what results you get?

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Tuesday, August 18, 2015 10:14 AM
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)

Adam.

Sent with AquaMail for Android http://www.aqua-mail.com

On 18 August 2015 3:38:50 pm "Ron Marosko" <ron@rjr-services.com> wrote:

Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Tuesday, August 18, 2015 2:47 AM
To: Observium Network Observation System
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

Cisco is a special case since it requires per-vlan context polling.

Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.

Adam.

Sent with AquaMail for Android http://www.aqua-mail.com

On 18 August 2015 00:15:12 "Ron Marosko" <ron@rjr-services.com> wrote:

Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.

…Ron

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Monday, August 17, 2015 3:37 PM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.

adam.

On 17/08/2015 20:59:08, Bill Fenner <fenner@gmail.com> wrote:

I still had another device that had worse fdb-table performance time than I would expect, and found a little bug in the code that updates fdb_status. If you ever end up with an entry that changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code causes the new value to be '' instead of the new value; then every subsequent run finds that fdb_status is wrong and attempts to rewrite it but rewrites it to ''.

The diff: http://www.fenron.com/~fenner/observium-update-fdb-status.diff

This brought this device's fdb-table down from 40s to 18s.

Bill

On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong <adama@memetic.org> wrote:

Ahh. Thanks Bill! This would explain it. I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance. I'll disable the table mode for this module.

Adam.

Sent with AquaMail for Android http://www.aqua-mail.com

On 17 August 2015 20:15:37 Bill Fenner <fenner@gmail.com> wrote:

I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds). So, it's nothing Arista-specific - it's some behavior of the table printer.

The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.

Bill

On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:

Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
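The fdb_status bug Bill describes in this part of the thread can be illustrated with a small sketch. This is only a Python model of the failure mode as he words it, not the Observium PHP code: the function names, the dictionary standing in for the vlans_fdb table, and the choice of entry and status values are all assumptions made for the example.

```python
def poll_buggy(stored, polled):
    """Model of the described bug: a changed status is written back as ''."""
    writes = 0
    for key, status in polled.items():
        if stored.get(key) != status:
            stored[key] = ""  # bug: the new value is lost, '' is stored instead
            writes += 1
    return writes


def poll_fixed(stored, polled):
    """Model of the fixed behaviour: the polled status is stored as-is."""
    writes = 0
    for key, status in polled.items():
        if stored.get(key) != status:
            stored[key] = status
            writes += 1
    return writes


# Hypothetical entry keyed by (device_id, vlan_id, mac_address).
key = (17, 4021, "748ef8a7ad41")
polled = {key: "invalid"}  # the device now reports a changed status

buggy_table = {key: "learned"}
fixed_table = {key: "learned"}

# Three consecutive polling runs against each variant.
buggy_writes = [poll_buggy(buggy_table, polled) for _ in range(3)]
fixed_writes = [poll_fixed(fixed_table, polled) for _ in range(3)]

print(buggy_writes)  # one rewrite on every run
print(fixed_writes)  # one rewrite, then none
```

In the buggy variant the stored value never matches what the device reports, so every run issues another update for the same entry; with many such entries this is the kind of repeated work that would add to the module time.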
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield
Sent: Thursday, August 13, 2015 5:54 PM
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.

Thanks

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Thursday, August 13, 2015 4:48 PM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

You also want the device performance for one of these arista devices, where you'll see which module is taking the time. It's on the right hand side of the device navbar.

adam.

On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:

Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these. This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 3:58 AM
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

Hi Aaron,

Can you tell which device is taking a long time? You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)

Screenshots of those might help :)

Thanks,
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com

On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:

Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:

[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers

I did the upgrade and then the poller starts running poorly. Then I see the side effect of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.

So I have some kind of performance issue. One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.

Going to continue to try the poller manually and see if I can figure out where the slowdown is.

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 1:33 AM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

Man, why didn't I think of this? This sounds like the problem.

I guess something caused your poller processes to get confused, and they ended up running in parallel, not unthinkable when the same part of the poller process runs for so long, 74 seconds for the fdb-table module.

If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.

If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like

./poller.php -h all -m fdb-table

Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.

adam.
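The overlap Aaron describes shows up directly in the poller-wrapper log: any run that takes longer than the scheduling interval is still going when the next one starts. A minimal Python sketch of that check follows. It assumes a 300-second interval, which the five-minute spacing of the earlier log timestamps suggests but the thread does not state, and the function name and regular expression are made up for the example.

```python
import re

# Matches the summary line written by poller-wrapper.py in observium.log.
RUN_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")

# Assumed scheduling interval in seconds (not stated in the thread).
INTERVAL = 300

# Four lines copied from the log excerpt in this thread.
SAMPLE_LOG = """\
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
"""


def overrunning_runs(lines, interval=INTERVAL):
    """Return the durations of runs that outlasted the scheduling interval."""
    durations = []
    for line in lines:
        match = RUN_RE.search(line)
        if match and int(match.group(2)) > interval:
            durations.append(int(match.group(2)))
    return durations


overruns = overrunning_runs(SAMPLE_LOG.splitlines())
print(overruns)
```

Run against the full log, a check like this would flag every entry from 12:21:24 onward, which matches the point at which the pollers began stacking on top of each other.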
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote:

If you're running multiple simultaneous pollers against the same device, it is not unthinkable they'll all be trying to insert the same data into the table...
Tom
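Tom's explanation can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained; the error text differs from MySQL's #1062), with a table whose primary key mirrors the dev_vlan_mac key on (device_id, vlan_id, mac_address) from the schema in this thread. Two "pollers" inserting the same entry are simulated by issuing the same INSERT twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT    NOT NULL,
        fdb_status  TEXT    NOT NULL,
        PRIMARY KEY (device_id, vlan_id, mac_address)
    )
    """
)

# Values taken from the first failed query in the db.log excerpt.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert_sql = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

# The first poller finds no existing row and inserts the entry.
conn.execute(insert_sql, row)

# An overlapping poller that made the same decision earlier now tries the
# identical insert and hits the unique key.
try:
    conn.execute(insert_sql, row)
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("second insert failed:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print("rows in table:", row_count)
```

The table itself stays consistent, with a single row for the entry; the cost is the failed query and the log line, repeated for every entry each overlapping poller tries to add.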
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org [mailto:observium-bounces@observium.org]] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org [mailto:observium@observium.org] Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org [mailto:observium@observium.org] http://postman.memetic.org/cgi-bin/mailman/listinfo/observium [http://postman.memetic.org/cgi-bin/mailman/listinfo/observium]
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
I had a dog and bingo was his name-o. That did the trick! Awesome!
##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:44:02 #####
o Graphs [checked] ping, ping_snmp, uptime
o Graphs [added] fdb_count
o Poller time 26.623 seconds
o Updated Data uptime, last_polled, last_polled_timetaken, device_state
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 12:36 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
This is because I'm an idiot. I apparently didn't disable the table, somehow. Try now! :D
adam.
On 18/08/2015 18:27:13, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Should this update bug fix be apparent in version 6895? Still seeing the ~85 second runtime on 6895:
[amayfield@kc-netview observium]$ sudo svn update
At revision 6895.
[amayfield@kc-netview observium]$ sudo ./poller.php -h 17 -m fdb-table
Observium Professional 0.15.8.6894 http://www.observium.org
##### Starting polling run at 2015-08-18 12:18:34 #####
##### au000a-u25-swa10g [17] #####
o OS arista_eos
o Last poll duration 21.20 seconds
o Last Polled 2015-08-18 12:15:23
o SNMP Version v2c
o Device status Device is reachable by PING (14.1ms) and SNMP (27.64ms)
o Modules Enabled system, os, fdb-table
##### Module Start: system #####
o Uptime 332 days, 9h 22m 45s
o sysObjectID .1.3.6.1.4.1.30065.1.3011.7150.3282.24
o snmpEngineID F5717F001C731E9CBC00
o sysDescr Arista Networks EOS version 4.13.8M running on an Arista Networks DCS-7150S-24
o sysName au000a-u25-swa10g
o Location 7301 Metropolis Drive, Austin, Texas 78744,US
o Module time 0.1291s
##### Module Start: os #####
o OS Poller OS
o Hardware DCS-7150S-24
o Version 4.13.8M
o Features <empty>
o Serial <empty>
o Asset <empty>
o Module time 0.0033s
##### Module Start: fdb-table #####
+------+--------------+-------+---------+----------+---------+---------+
| VLAN | MAC Address  | Port  | Port ID | FDB Port | ifIndex | Status  |
+------+--------------+-------+---------+----------+---------+---------+
***data omitted***
| 4021 | 748ef8a7ad41 | Po85  | 1023    | 110      | 1000085 | learned |
| 4094 | 001c731e968a | Po10  | 1021    | 103      | 1000010 | learned |
+------+--------------+-------+---------+----------+---------+---------+
o Module time 88.8261s
##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:20:03 #####
o Graphs [checked] ping, ping_snmp, uptime
o Graphs [added] fdb_count
o Poller time 89.025 seconds
o Updated Data uptime, last_polled, last_polled_timetaken, device_state
##### Completed polling run at 2015-08-18 12:20:03 #####
o Devices Polled 1
o Poller Time 89.08 secs
o Memory usage 22.75MB (peak: 33.25MB)
o MySQL Usage Cell[0/0s] Row[5/0.001s] Rows[8/0.032s] Column[0/0s] Update[28/0.014s] Insert[247/0.098s] Delete[200/0.09s]
o RRDTool Usage update[5/0.005s]
[amayfield@kc-netview observium]$
I notice the poller still says version 6894, but I’m guessing maybe the version number didn’t get updated or something.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 12:11 PM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
That's related to the database update bug that bill fixed yesterday. :)
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 5:41:46 pm "Ron Marosko" <ron@rjr-services.com> wrote:
So on that C7609….
Did an svn up -r 6847 (which I think took me down to 6833 as I’m on stable instead of current), and ran the poller -m fdb-table against it.
Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count
Then did svn up -r 6895 since that’s the latest current and ran the poller -m fdb-table again:
##### Module Start: fdb-table #####
ERROR: Device does not support per-VLAN community.
+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address  | Port   | Port ID  | FDB Port | ifIndex | Status  |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+
o Module time 14.7221s
Oh, that’s interesting. ;-)
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Tuesday, August 18, 2015 11:13 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started reverting back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do an “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
o Devices Polled 1
o Poller Time 85.65 secs
o Memory usage 22.75MB (peak: 34.25MB)
o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
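The back-and-forth testing described above is essentially a bisection over revisions. As a rough sketch only: the helper below narrows down the first slow revision given one known-fast and one known-slow revision. The `is_slow` callback is hypothetical (in practice it would check out the revision and time the fdb-table module), and the lower bound 6800 is an arbitrary illustrative starting point, not a number from this thread.

```python
def first_slow_revision(good, bad, is_slow):
    """Binary search between a known-fast and a known-slow revision.

    Assumes every revision from the first slow one onwards is also slow.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(mid):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return bad


def fake_is_slow(revision):
    # Hypothetical stand-in for "svn update -r N, run the poller, time it".
    # Mirrors the result reported above: fast at 6847, slow from 6848.
    return revision >= 6848


print(first_slow_revision(6800, 6894, fake_is_slow))
```

With roughly a hundred candidate revisions this needs about seven timed runs instead of one per revision.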
Can anyone else test this and see what results you get?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 3:38:50 pm "Ron Marosko" <ron@rjr-services.com> wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 00:15:12 "Ron Marosko" <ron@rjr-services.com> wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner <fenner@gmail.com> wrote:
I still had another device that had worse fdb-table performance time than I would expect, and found a little bug in the code that updates fdb_status. If you ever end up with an entry that changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code writes '' to the database instead of the new value; then every subsequent run finds that fdb_status is wrong and attempts to rewrite it, but rewrites it to '' again. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
Bill
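To illustrate the failure mode Bill describes, here is a toy simulation. It is not Observium's code: the dictionary stands in for the vlans_fdb table, and `poll_once` and `count_updates` are hypothetical names. It only shows why writing '' instead of the new status makes every later run issue the same update again.

```python
def poll_once(db, polled, buggy):
    """Compare stored status with the polled status; return UPDATE count."""
    updates = 0
    for key, status in polled.items():
        if db.get(key) != status:
            # The buggy path stores an empty string, so the comparison
            # above fails again on the next run.
            db[key] = "" if buggy else status
            updates += 1
    return updates


def count_updates(buggy, runs=3):
    # One entry whose status changed from "learned" to "invalid".
    db = {("2083", "6ef88537f91f"): "learned"}
    polled = {("2083", "6ef88537f91f"): "invalid"}
    return [poll_once(db, polled, buggy) for _ in range(runs)]


print(count_updates(buggy=True))   # [1, 1, 1]
print(count_updates(buggy=False))  # [1, 0, 0]
```

With the bug, each affected entry costs one extra UPDATE on every polling run, which adds up on devices with many such entries.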
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong <adama@memetic.org> wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner <fenner@gmail.com> wrote:
I've been disabling the fdb-tables poller in my observium instances due to performance problems for a while. Adam poked me to look into this, so I root caused it to the code that displays the contents of the fdb to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
Bill
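The timing approach Bill mentions (report how long the table printing took, but only when it exceeds 2 seconds) can be sketched generically as below. This is not the linked diff and not PHP; `timed_call` and its `clock` and `report` parameters are hypothetical names chosen for the example.

```python
import time


def timed_call(func, *args, threshold=2.0, clock=time.perf_counter, report=print):
    """Call func(*args) and report its duration if it exceeds threshold seconds."""
    start = clock()
    result = func(*args)
    elapsed = clock() - start
    if elapsed > threshold:
        report(f"{func.__name__} took {elapsed:.1f}s")
    return result


# Usage: wrap the suspect call; nothing is reported for fast calls.
rows = timed_call(sorted, [3, 1, 2])
print(rows)
```

Wrapping only the suspect call keeps the normal output unchanged while making a slow path visible in the poller's own output.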
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I see the side effect of the duplicate entries with the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
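A quick way to spot the overlap in a log like the one above is to pull out the run durations and flag any that exceed the polling interval. The sketch below assumes a 300-second interval, which is inferred from the five-minute spacing of the early timestamps rather than stated anywhere in this thread; `overrunning_runs` is a hypothetical helper.

```python
import re

# Matches the summary line written by poller-wrapper.py, as quoted above.
LINE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")


def overrunning_runs(log_text, interval=300):
    """Return the durations (seconds) of runs longer than the interval."""
    durations = [int(m.group(2)) for m in LINE.finditer(log_text)]
    return [d for d in durations if d > interval]


sample = (
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): "
    "/opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers\n"
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): "
    "/opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers\n"
)
print(overrunning_runs(sample))  # [683]
```

Any run longer than the interval means the next scheduled run starts while the previous one is still working on the same devices.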
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That is not unthinkable when the same part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
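If the module is moved to its own, less frequent cron job as suggested above, one way to keep runs from stacking is to guard the job with a lock file. This is only a sketch under assumptions: it is not something Observium ships, the lock path is hypothetical, `fcntl.flock` is Unix-only, and the command is the one quoted in the message above.

```python
import fcntl
import subprocess


def try_lock(path):
    """Return an open handle holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusive(command, lock_path):
    """Run command unless a previous run still holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    handle = try_lock(lock_path)
    if handle is None:
        return None
    try:
        return subprocess.call(command)
    finally:
        handle.close()  # closing the file releases the lock


# Example call (hypothetical lock path):
# run_exclusive(["./poller.php", "-h", "all", "-m", "fdb-table"],
#               "/tmp/observium-fdb-table.lock")
```

A skipped run simply waits for the next cron tick, so a slow fdb-table pass delays the data rather than piling more processes onto the same CPU.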
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable they'll all be trying to insert the same data into the table...
Tom
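Tom's point can be reproduced in miniature. The sketch below uses SQLite as a stand-in for MySQL (so the error surfaces as `sqlite3.IntegrityError` rather than MySQL's #1062), with a simplified table that has the same unique key columns as dev_vlan_mac in the schema quoted in this thread. The two inserts play the role of two overlapping poller processes.

```python
import sqlite3


def make_table(conn):
    # Simplified stand-in for vlans_fdb with the same unique key columns.
    conn.execute(
        "CREATE TABLE vlans_fdb ("
        " device_id INTEGER NOT NULL,"
        " vlan_id INTEGER NOT NULL,"
        " port_id INTEGER,"
        " mac_address TEXT NOT NULL,"
        " fdb_status TEXT NOT NULL,"
        " UNIQUE (device_id, vlan_id, mac_address))"
    )


def insert_entry(conn, row):
    """Return True if the row was inserted, False if the unique key rejected it."""
    try:
        conn.execute(
            "INSERT INTO vlans_fdb"
            " (device_id, vlan_id, port_id, mac_address, fdb_status)"
            " VALUES (?, ?, ?, ?, ?)",
            row,
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
make_table(conn)
row = (54, 2083, 220115, "6ef88537f91f", "learned")
first = insert_entry(conn, row)   # "poller A" sees a new entry and inserts it
second = insert_entry(conn, row)  # "poller B" tries to insert the same entry
print(first, second)  # True False
```

The table itself is fine in this scenario; the second insert fails only because another process already wrote the same (device, VLAN, MAC) entry.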
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.orgmailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
Moral of the story guys:
Buy Arista kit. Not only do they fix their own bugs, but other people's too!
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 18:57:45 Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
I had a dog and bingo was his name-o. That did the trick! Awesome!
##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:44:02 #####
 o Graphs [checked] ping, ping_snmp, uptime
 o Graphs [added] fdb_count
 o Poller time 26.623 seconds
 o Updated Data uptime, last_polled, last_polled_timetaken, device_state
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 12:36 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
This is because I'm an idiot. I apparently didn't disable the table, somehow. Try now! :D
adam.
On 18/08/2015 18:27:13, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Should this update bug fix be apparent in version 6895? Still seeing the ~85 second runtime on 6895:
[amayfield@kc-netview observium]$ sudo svn update
At revision 6895.
[amayfield@kc-netview observium]$ sudo ./poller.php -h 17 -m fdb-table

Observium Professional 0.15.8.6894
http://www.observium.org

##### Starting polling run at 2015-08-18 12:18:34 #####

##### au000a-u25-swa10g [17] #####
 o OS arista_eos
 o Last poll duration 21.20 seconds
 o Last Polled 2015-08-18 12:15:23
 o SNMP Version v2c
 o Device status Device is reachable by PING (14.1ms) and SNMP (27.64ms)
 o Modules Enabled system, os, fdb-table

##### Module Start: system #####
 o Uptime 332 days, 9h 22m 45s
 o sysObjectID .1.3.6.1.4.1.30065.1.3011.7150.3282.24
 o snmpEngineID F5717F001C731E9CBC00
 o sysDescr Arista Networks EOS version 4.13.8M running on an Arista Networks DCS-7150S-24
 o sysName au000a-u25-swa10g
 o Location 7301 Metropolis Drive, Austin, Texas 78744,US
 o Module time 0.1291s

##### Module Start: os #####
 o OS Poller OS
 o Hardware DCS-7150S-24
 o Version 4.13.8M
 o Features <empty>
 o Serial <empty>
 o Asset <empty>
 o Module time 0.0033s

##### Module Start: fdb-table #####
+------+--------------+-------+---------+----------+---------+---------+
| VLAN | MAC Address  | Port  | Port ID | FDB Port | ifIndex | Status  |
+------+--------------+-------+---------+----------+---------+---------+
***data omitted***
| 4021 | 748ef8a7ad41 | Po85  | 1023    | 110      | 1000085 | learned |
| 4094 | 001c731e968a | Po10  | 1021    | 103      | 1000010 | learned |
+------+--------------+-------+---------+----------+---------+---------+
 o Module time 88.8261s

##### au000a-u25-swa10g [17] completed poller modules at 2015-08-18 12:20:03 #####
 o Graphs [checked] ping, ping_snmp, uptime
 o Graphs [added] fdb_count
 o Poller time 89.025 seconds
 o Updated Data uptime, last_polled, last_polled_timetaken, device_state

##### Completed polling run at 2015-08-18 12:20:03 #####
 o Devices Polled 1
 o Poller Time 89.08 secs
 o Memory usage 22.75MB (peak: 33.25MB)
 o MySQL Usage Cell[0/0s] Row[5/0.001s] Rows[8/0.032s] Column[0/0s] Update[28/0.014s] Insert[247/0.098s] Delete[200/0.09s]
 o RRDTool Usage update[5/0.005s]
[amayfield@kc-netview observium]$
I notice the poller still says version 6894, but I’m guessing maybe the version number didn’t get updated or something.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 12:11 PM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
That's related to the database update bug that bill fixed yesterday. :)
Adam.
On 18 August 2015 5:41:46 pm "Ron Marosko" <ron@rjr-services.com> wrote:
So on that C7609….
Did a svn up -r 6847 (which I think took me down to 6833 as I’m on stable instead of current), and ran the poller -m fdb-table against it.
Module [ fdb-table ] time: 96.1547s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count, port_fdb_count
Then did svn up -r 6895 since that’s the latest current and ran the poller -m fdb-table again:
##### Module Start: fdb-table #####
ERROR: Device does not support per-VLAN community.
+------+--------------+--------+----------+----------+---------+---------+
| VLAN | MAC Address  | Port   | Port ID  | FDB Port | ifIndex | Status  |
+------+--------------+--------+----------+----------+---------+---------+
<deleted a bunch of entries>
+------+--------------+--------+----------+----------+---------+---------+
 o Module time 14.7221s
Oh, that’s interesting. ;-)
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Tuesday, August 18, 2015 11:13 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I did some extensive experimenting this morning.
First of all, I updated to version 6894. On this version I am still seeing poor performance for fdb-table (~86 seconds to run, 100% CPU core utilization). So at this point I started switching back and forth between older versions (using “svn update -r XXXX”) and timing the module performance against a specific switch (device 17).
Through this process I was able to discover the exact version point at which the fdb-table module got slow.
If I do an “sudo svn update -r 6847”, I get fdb-table performance of ~27 seconds:
Module [ fdb-table ] time: 26.75s
Graphs [checked]: ping, ping_snmp, uptime, fdb_count
Polled in 26.9278 seconds
UPDATED!
Checking alerts
Memory usage: 12MB (peak: 22.75MB)
MySQL: Cell[0/0s] Row[2/0s] Rows[8/0.02s] Column[0/0s] Update[37/0.02s] Insert[73/0.05s] Delete[0/0s]
If I go to the next version up (sudo svn update -r 6848), I get an fdb-table time of ~85 seconds:
##### Completed polling run at 2015-08-18 10:52:48
 o Devices Polled 1
 o Poller Time 85.65 secs
 o Memory usage 22.75MB (peak: 34.25MB)
 o MySQL Usage Cell[1/0s] Row[5/0.001s] Rows[8/0.021s] Column[0/0s] Update[76/0.035s] Insert[54/0.025s] Delete[57/0.019s]
 o RRDTool Usage update[5/0.006s]
To make sure this wasn’t a fluke, I switched back and forth between version 6847 and 6848 three times and tested the runtime on each version. Consistently, version 6847 ran ~30 seconds and version 6848 ran ~85 seconds. I also see lower CPU core utilization on the older version.
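The back-and-forth between revisions described here amounts to a search for the first slow revision. A minimal Python sketch of that search follows, assuming a hypothetical `is_slow(rev)` callback that would check out a revision (for example with `svn update -r`) and time the fdb-table module. The callback, the 60-second threshold, and the starting revisions are illustrative and not part of Observium.

```python
from typing import Callable


def first_slow_revision(good: int, bad: int, is_slow: Callable[[int], bool]) -> int:
    """Return the lowest revision in (good, bad] for which is_slow() is true.

    Assumes `good` is known fast, `bad` is known slow, and that every
    revision after the first slow one is also slow.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(mid):
            bad = mid      # the regression is at mid or earlier
        else:
            good = mid     # the regression is after mid
    return bad


def fake_is_slow(rev: int) -> bool:
    # Hypothetical stand-in for a real measurement; the timings are the
    # approximate ones reported in this thread (27s up to 6847, 85s after).
    module_seconds = 27.0 if rev <= 6847 else 85.0
    return module_seconds > 60.0
```

With a real timing callback, each step halves the range of candidate revisions, so a range of about 60 revisions needs roughly six poller runs rather than one per revision.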
Can anyone else test this and see what results you get?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because Cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
On 18 August 2015 3:38:50 pm "Ron Marosko" <ron@rjr-services.com> wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" <ron@rjr-services.com> wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner <fenner@gmail.com> wrote:
I still had another device that had worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code stores '' instead of the new value; every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but again writes ''. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
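The effect Bill describes can be reproduced with a toy model. The Python sketch below is not Observium code: the dictionary stands in for the `vlans_fdb` table, and `poll_once` and `updates_per_run` are hypothetical names. It only shows why writing `''` instead of the new status makes every later run issue the same update again.

```python
def poll_once(db, polled, buggy):
    """Sync stored fdb_status values with polled ones; return the number of updates issued."""
    updates = 0
    for key, new_status in polled.items():
        if db.get(key) != new_status:
            # The buggy path stores an empty string instead of the new value,
            # so the comparison above fails again on the next run.
            db[key] = "" if buggy else new_status
            updates += 1
    return updates


def updates_per_run(buggy, runs=3):
    """Count updates over several polling runs after one entry changes status."""
    db = {(2083, "6ef88537f91f"): "learned"}       # stored state (illustrative entry)
    polled = {(2083, "6ef88537f91f"): "invalid"}   # status changed on the device
    return [poll_once(db, polled, buggy) for _ in range(runs)]
```

In the buggy variant the update count never drops to zero, which is consistent with the extra per-run work Bill measured; with the correct value stored, only the first run after the change writes anything.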
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong <adama@memetic.org> wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner <fenner@gmail.com> wrote:
I've been disabling the fdb-table poller in my Observium instances for a while due to performance problems. Adam poked me to look into this, and I traced the root cause to the code that displays the contents of the fdb on the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
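The idea behind the timing diff (report a call only when it exceeds a threshold) can be sketched generically. The Python below is an illustration under stated assumptions, not the contents of Bill's patch: `timed_call` and `render_table` are hypothetical names, and `render_table` merely stands in for a table printer such as `print_cli_table()`.

```python
import time


def timed_call(func, *args, threshold=2.0, clock=time.perf_counter, **kwargs):
    """Call func and return (result, elapsed, warning).

    warning is a message when the call took longer than `threshold`
    seconds, otherwise None. `clock` is injectable to ease testing.
    """
    start = clock()
    result = func(*args, **kwargs)
    elapsed = clock() - start
    warning = None
    if elapsed > threshold:
        warning = f"{func.__name__} took {elapsed:.1f}s"
    return result, elapsed, warning


def render_table(rows):
    # Hypothetical stand-in for a CLI table printer.
    return "\n".join(" | ".join(str(cell) for cell in row) for row in rows)
```

Wrapping only the suspect call keeps the measurement separate from SNMP and database time, which is how the display code could be singled out as the dominant cost.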
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Is there anyone else out there polling Arista switches? Has anyone else noticed any performance issues with polling the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer, they have the “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System <observium@observium.org> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I see the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
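A quick way to spot the point where runs start stacking is to pull the durations out of the poller-wrapper log lines and compare them with the polling interval. The Python sketch below assumes a 300-second interval, which is inferred from the roughly five-minute spacing of the healthy log entries and is not stated anywhere in this thread; `overrunning_runs` is a hypothetical helper name.

```python
import re

# Matches the tail of lines such as:
# [2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): ... polled 45 devices in 683 seconds with 8 workers
RUN_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")


def overrunning_runs(log_text, interval=300):
    """Return the durations (in seconds) of polling runs longer than `interval`."""
    durations = [int(match.group(2)) for match in RUN_RE.finditer(log_text)]
    return [d for d in durations if d > interval]
```

Any run longer than the interval is still busy when the next one is started, which matches the stacking of poller processes described here.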
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when one part of the poller (the fdb-table module, at 74 seconds) runs for so long.
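One general way to keep overlapping runs from working on the same device is an advisory lock file taken before polling starts. The Python sketch below shows that technique with `fcntl.flock` on a Unix-like system. It is an assumption for illustration only and says nothing about how Observium or poller-wrapper.py actually coordinate runs; the lock file name is hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object (keep it open to hold the lock), or
    None if another holder already has the lock.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    # Hypothetical per-device lock file in a scratch directory.
    path = os.path.join(tempfile.mkdtemp(), "poller-device-54.lock")
    first = try_lock(path)     # first "poller" gets the lock
    second = try_lock(path)    # overlapping run is refused
    first.close()              # closing the file releases the lock
    third = try_lock(path)     # the next run can proceed
    result = (first is not None, second is None, third is not None)
    third.close()
    return result
```

A run that fails to get the lock would simply skip the device, trading a missed poll for the guarantee that two processes never write the same rows at once.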
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote:
If you're running multiple simultaneous pollers against the same device, it's not unthinkable they'll all be trying to insert the same data into the table...
Tom
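The failure mode Tom describes can be shown in isolation. The Python sketch below uses the built-in `sqlite3` module as a stand-in for MySQL, so it is only an analogy: the table and column names mirror the schema quoted in this thread, the `UNIQUE` constraint plays the role of the `dev_vlan_mac` key, and `INSERT OR REPLACE` is SQLite syntax (MySQL offers `REPLACE INTO` and `INSERT ... ON DUPLICATE KEY UPDATE` for comparable behaviour). None of this is taken from Observium's code.

```python
import sqlite3

ROW = (54, 2083, 220115, "6ef88537f91f", "learned")
INSERT = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"


def make_table():
    """Create an in-memory table with a unique key like dev_vlan_mac."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vlans_fdb ("
        " device_id INTEGER NOT NULL, vlan_id INTEGER NOT NULL,"
        " port_id INTEGER, mac_address TEXT NOT NULL, fdb_status TEXT NOT NULL,"
        " UNIQUE (device_id, vlan_id, mac_address))"
    )
    return conn


def second_insert_fails():
    """Two pollers inserting the same row: the second hits the unique key."""
    conn = make_table()
    conn.execute(INSERT, ROW)
    try:
        conn.execute(INSERT, ROW)
    except sqlite3.IntegrityError:
        return True
    return False


def tolerant_insert_row_count():
    """INSERT OR REPLACE overwrites the existing row instead of failing."""
    conn = make_table()
    for _ in range(2):
        conn.execute("INSERT OR REPLACE INTO vlans_fdb VALUES (?, ?, ?, ?, ?)", ROW)
    return conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
```

A tolerant insert would only silence the log noise; the underlying problem in this thread is that polling runs overlap in the first place.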
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql>
mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407
Austin, TX 78738
T. 512.600.4297
www.artisaninfrastructure.com
Partner portal: https://portal.vpdc.us
Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.orgmailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/a4042920f4bf89a219241c65ae64c5d8.jpg?s=120&d=mm&r=g)
Yeah, I already have it disabled… was hoping maybe it might have been fixed since I’ve got a lot of Cisco stuff I watch.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 18 August 2015 3:38:50 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device with worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code stores '' instead of the new value; every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but rewrites it to '' again. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
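A minimal Python sketch of the failure mode Bill describes above. It is not the Observium code (which is PHP); the dict-based table and the names `poll`, `buggy_update` and `fixed_update` are hypothetical. It only illustrates why storing '' instead of the new status makes every later poll see a mismatch and issue another write.

```python
def poll(db, polled, update):
    """Run one polling pass and return how many rows were rewritten."""
    writes = 0
    for key, status in polled.items():
        if db.get(key) != status:
            db[key] = update(status)
            writes += 1
    return writes

def buggy_update(new_status):
    # Hypothetical stand-in for the bug: the new value is lost and '' is stored.
    return ""

def fixed_update(new_status):
    # Stores the value that was actually polled.
    return new_status

key = ("54", "2083", "6ef88537f91f")
polled = {key: "invalid"}  # the device now reports "invalid"

buggy_db = {key: "learned"}
buggy_writes = [poll(buggy_db, polled, buggy_update) for _ in range(3)]

fixed_db = {key: "learned"}
fixed_writes = [poll(fixed_db, polled, fixed_update) for _ in range(3)]

print(buggy_writes)  # a write on every pass
print(fixed_writes)  # a write on the first pass only
```

With many such entries, the buggy variant pays for one database write per entry on every polling run, which is consistent with the drop from 40s to 18s reported after the fix.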
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-table poller module in my Observium instances due to performance problems for a while. Adam poked me to look into this, so I root-caused it to the code that displays the contents of the FDB to the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
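The timing diff itself is not reproduced in the thread, so the following is only a rough Python sketch of the same idea: wrap a call, measure it, and report when it exceeds a threshold (2 seconds, as in the description above). `timed_call` and `print_table` are hypothetical names, and `print_table` is just a stand-in for a table printer, not Observium's print_cli_table().

```python
import time

def timed_call(func, *args, threshold=2.0, clock=time.perf_counter, **kwargs):
    """Call func and report the elapsed time if it exceeds threshold seconds."""
    start = clock()
    result = func(*args, **kwargs)
    elapsed = clock() - start
    if elapsed > threshold:
        print(f"{func.__name__} took {elapsed:.1f}s")
    return result, elapsed

def print_table(rows):
    """Hypothetical stand-in for a table printer."""
    for row in rows:
        print(" | ".join(row))

rows = [("54", "2083", "6ef88537f91f", "learned")]
_, elapsed = timed_call(print_table, rows)
```

Wrapping only the display call, rather than the whole module, is what separates "printing is slow" from "polling is slow" in measurements like the 33.4 versus 6 seconds above.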
Bill
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else noticed any performance issues with the fdb-table module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they have “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I saw the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
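One way to spot the stacking directly from observium.log is to pull the duration out of each poller-wrapper line and flag runs that outlast the polling interval. This is a small Python sketch, not an Observium tool; `overlapping_runs` is a hypothetical helper, and the 300-second interval is an assumption inferred from the five-minute spacing of the pre-upgrade timestamps above.

```python
import re

# Matches the summary that poller-wrapper.py writes at the end of each run.
LINE_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")

def overlapping_runs(lines, interval=300):
    """Return the durations (seconds) of runs that outlasted the polling interval."""
    durations = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group(2)) > interval:
            durations.append(int(match.group(2)))
    return durations

sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]
print(overlapping_runs(sample))
```

Any run longer than the interval is still active when the next one starts, so two pollers can end up working on the same device at once.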
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That's not unthinkable when the same part of the poller process runs for so long: 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
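If the module is moved to its own, less frequent cron job as suggested above, a slow run could still overlap the next one. A common guard is a non-blocking file lock taken by a small wrapper before it starts the job. The sketch below is an assumption-laden illustration, not part of Observium: `try_lock` and the lock file name are hypothetical, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
import os
import tempfile

def try_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another run already holds the lock; give up instead of stacking.
        handle.close()
        return None
    return handle

# Hypothetical lock location; a real cron wrapper would use a fixed, shared path.
lock_path = os.path.join(tempfile.mkdtemp(), "fdb-table.lock")

first = try_lock(lock_path)    # the run that is already in progress
second = try_lock(lock_path)   # a second run starting while the first is active
print(first is not None, second is None)

if first is not None:
    first.close()              # closing the file releases the lock
```

A wrapper built this way would skip the run when `try_lock` returns None, so a slow fdb-table pass delays the next one rather than running alongside it.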
adam.
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
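The effect Tom describes can be reproduced in miniature. The Python sketch below uses an in-memory SQLite table as a stand-in for the MySQL vlans_fdb table (an assumption made only so the example is self-contained), with the same unique key on device_id, vlan_id and mac_address. The second insert plays the role of a second poller writing an entry the first has already stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vlans_fdb ("
    " device_id INTEGER NOT NULL, vlan_id INTEGER NOT NULL, port_id INTEGER,"
    " mac_address TEXT NOT NULL, fdb_status TEXT NOT NULL,"
    " UNIQUE (device_id, vlan_id, mac_address))"
)

row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

conn.execute(insert, row)          # first poller stores the entry
try:
    conn.execute(insert, row)      # second poller tries to store the same entry
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM vlans_fdb").fetchone()[0]
print(duplicate_rejected, count)
```

The unique key is doing its job here: the table stays consistent, and the errors are a symptom of overlapping pollers rather than of a broken schema.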
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
    mysql> show columns from vlans_fdb;
    +-------------+-------------+------+-----+---------+-------+
    | Field       | Type        | Null | Key | Default | Extra |
    +-------------+-------------+------+-----+---------+-------+
    | device_id   | int(11)     | NO   | PRI | NULL    |       |
    | vlan_id     | int(11)     | NO   | PRI | NULL    |       |
    | port_id     | int(11)     | YES  | MUL | NULL    |       |
    | mac_address | varchar(32) | NO   | PRI | NULL    |       |
    | fdb_status  | varchar(32) | NO   |     | NULL    |       |
    +-------------+-------------+------+-----+---------+-------+
    5 rows in set (0.00 sec)
    mysql> show index from vlans_fdb;
    +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
    | vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
    | vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
    | vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
    | vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
    +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
![](https://secure.gravatar.com/avatar/0fa97865a0e1ab36152b6b2299eedb49.jpg?s=120&d=mm&r=g)
For Cisco I'm not sure there is a way to fix it, at least on devices with a lot of VLANs, since we have to query the device for each VLAN individually, and it's really slow!
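To make the per-VLAN cost concrete, here is a rough Python sketch. It assumes the SNMPv2c "community string indexing" form that Cisco documents for per-VLAN bridge tables (`community@vlan`); whether Observium builds its requests exactly this way is an assumption, and the function names, VLAN IDs and the 0.5-second walk time are hypothetical.

```python
def per_vlan_targets(community, vlan_ids):
    """Build one SNMP community string per VLAN (Cisco 'community@vlan' indexing)."""
    return [f"{community}@{vlan}" for vlan in vlan_ids]

def estimated_walk_time(vlan_count, seconds_per_walk):
    """Total time when every VLAN needs its own sequential walk."""
    return vlan_count * seconds_per_walk

# Hypothetical VLAN list and per-walk cost, for illustration only.
targets = per_vlan_targets("public", [10, 20, 30])
print(targets)
print(estimated_walk_time(len(targets), 0.5))
```

Because the walks are issued one VLAN at a time, the total grows linearly with the number of VLANs, which is why devices with many VLANs are hit hardest.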
Adam.
On 18 August 2015 5:23:12 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, I already have it disabled… was hoping maybe it might have been fixed since I’ve got a lot of Cisco stuff I watch.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 10:14 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
I guess this is just because cisco kit is /super/ slow at fdb-table for other reasons. We strongly recommend disabling it on these devices :)
Adam.
On 18 August 2015 3:38:50 pm "Ron Marosko" ron@rjr-services.com wrote:
Yeah, minor brainfart on specifics… it was a rough thing and I can’t say I saw much change… refresh me on how to svn back down to the previous version and I’ll give you exact numbers for the module and on which svn release. I’m on the “stable” train.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 18, 2015 2:47 AM To: Observium Network Observation System Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Cisco is a special case since it requires per-vlan context polling.
Also, your numbers are terribly confusing. Pre and post change, but you give numbers for with and without the module, not numbers for the module pre and post change.
Adam.
On 18 August 2015 00:15:12 "Ron Marosko" ron@rjr-services.com wrote:
Just for some comparative numbers, both pre and post change… polling a Cisco 7609 without fdb-table takes 15-20 seconds. With fdb-table, it takes 130-140 seconds.
…Ron
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Monday, August 17, 2015 3:37 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
These changes are now in both trunk and stable. I've disabled the table output until we can investigate the performance of large table outputs.
adam.
On 17/08/2015 20:59:08, Bill Fenner fenner@gmail.com wrote:
I still had another device with worse fdb-table performance than I would expect, and found a little bug in the code that updates fdb_status. If an entry ever changes its fdb_status value (e.g., from "learned" to "invalid" before it gets aged out), this code stores '' instead of the new value; every subsequent run then finds that fdb_status is wrong and attempts to rewrite it, but rewrites it to '' again. The diff:
http://www.fenron.com/~fenner/observium-update-fdb-status.diff
This brought this device's fdb-table down from 40s to 18s.
Bill
On Mon, Aug 17, 2015 at 3:41 PM, Adam Armstrong adama@memetic.org wrote:
Ahh. Thanks Bill! This would explain it.
I don't have any devices with large numbers of fdb entries, so I wouldn't have caught the abysmal performance.
I'll disable the table mode for this module.
Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 17 August 2015 20:15:37 Bill Fenner fenner@gmail.com wrote:
I've been disabling the fdb-table poller module in my Observium instances for a while due to performance problems. Adam poked me to look into this, and I traced the root cause to the code that displays the contents of the fdb on the terminal. You can apply
http://www.fenron.com/~fenner/observium-print-cli-table-timing.diff
to see this effect - it prints out the time taken to print the table if it takes more than 2 seconds. In my test case, it takes around 30 seconds to print a 2800-entry fdb (and the fdb-table module takes around 33 seconds).
So, it's nothing Arista-specific - it's some behavior of the table printer. The workaround is to comment out the calls to print_cli_table() in includes/polling/fdb-table.inc.php . Polling my sample Arista with about 2800 FDB entries, the module takes 33.4 seconds with the print_cli_table() calls and 6 seconds without.
Bill
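The measurement technique in Bill's timing diff (report the elapsed time only when a call takes longer than 2 seconds) can be sketched roughly as follows. This is Python rather than Observium's PHP, and `timed_call` and `render_table` are hypothetical names; `render_table` is only a stand-in for a table printer, not a reimplementation of print_cli_table().

```python
import time


def timed_call(func, *args, threshold=2.0, **kwargs):
    """Call func and report the elapsed time if it exceeds `threshold` seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    if elapsed > threshold:
        print(f"{func.__name__} took {elapsed:.1f}s")
    return result, elapsed


def render_table(rows):
    """Hypothetical stand-in for a table printer: pad each cell to its column width."""
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(str(cell).ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )


# 2800 synthetic rows, matching the size of the FDB in Bill's test case.
rows = [(i, f"vlan{i % 40}", f"{i:012x}") for i in range(2800)]
table, elapsed = timed_call(render_table, rows)
```

Wrapping only the suspect call, as here, is what lets the table printing be separated from the rest of the module's run time.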
On Mon, Aug 17, 2015 at 10:02 AM, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Is there anyone else out there polling Arista switches? Has anyone else out there noticed any performance issues with polling the fdb-tables module recently?
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Aaron Mayfield Sent: Thursday, August 13, 2015 5:54 PM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
OK, I picked one of the Arista switches at random and collected the poller performance stats both with and without fdb-table enabled. Attaching screenshots.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Thursday, August 13, 2015 4:48 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
You also want the device performance for one of these arista devices, where you'll see which module is taking the time.
It's on the right hand side of the device navbar.
adam.
On 13/08/2015 22:46:05, Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Adam, you also asked for screenshots of the polling performance page. My apologies for taking so long to grab these.
This kind of confirms my suspicion that it is my Arista devices that are taking longer; they are the ones with “swa” in the hostname.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield amayfield@artisaninfrastructure.com wrote:
Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers
[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers
[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers
[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers
[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers
[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers
[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers
[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers
[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers
[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers
[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers
[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers
[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers
[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers
[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers
[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller started running poorly. Then I see the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
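The stacking can be read straight out of the log above. Here is a rough Python sketch that flags runs lasting longer than the polling interval; the 300-second interval is an assumption inferred from the five-minute spacing of the early log entries, and `overrunning` is a hypothetical helper name.

```python
import re

POLL_INTERVAL = 300  # seconds; assumed from the five-minute spacing of the early entries

LINE_RE = re.compile(r"polled (\d+) devices in (\d+) seconds with (\d+) workers")


def overrunning(lines, interval=POLL_INTERVAL):
    """Return the durations (in seconds) of runs that took longer than `interval`."""
    durations = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group(2)) > interval:
            durations.append(int(match.group(2)))
    return durations


# Four consecutive lines copied from the observium.log excerpt above.
sample = [
    "[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers",
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]
print(overrunning(sample))  # [683, 1388]
```

Any run in that list was still going when the next one was due to start, which is the point at which several pollers begin working on the same devices.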
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused and they ended up running in parallel, which is not unthinkable when the same part of the poller process runs for so long (74 seconds for the fdb-table module).
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
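If the module is moved to its own cron schedule, one possible safeguard against runs piling up is to take a non-blocking file lock around the command. Nobody in the thread proposes this; it is only a sketch, written in Python for a Unix-like host, and `run_exclusive` and the lock path are hypothetical. The harmless command below would be replaced by something like ["./poller.php", "-h", "all", "-m", "fdb-table"].

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock location; any path writable by the polling user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "fdb-table.lock")


def run_exclusive(lock_path, command):
    """Run `command` unless another holder has the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # a previous run is still active, so skip this one
        # The lock is released when lock_file is closed at the end of the block.
        return subprocess.call(command)


# Stand-in command so the sketch is safe to run anywhere.
exit_code = run_exclusive(LOCK_PATH, [sys.executable, "-c", "pass"])
```

Skipping a run when the lock is busy trades a gap in the data for not having two pollers insert the same rows at once.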
On 12/08/2015 07:28:55, Tom Laermans tom.laermans@powersource.cx wrote:
If you're running multiple simultaneous pollers against the same device, it is not unthinkable that they'll all be trying to insert the same data into the table...
Tom
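Tom's explanation can be reproduced in miniature. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with a unique key on (device_id, vlan_id, mac_address) playing the role of dev_vlan_mac from the error messages. It assumes each poller first reads the existing rows and then inserts whatever it thinks is missing; that read-then-insert flow and the `poller_insert` helper are assumptions, not Observium's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vlans_fdb (
           device_id   INTEGER NOT NULL,
           vlan_id     INTEGER NOT NULL,
           port_id     INTEGER,
           mac_address TEXT NOT NULL,
           fdb_status  TEXT NOT NULL,
           UNIQUE (device_id, vlan_id, mac_address)  -- stands in for dev_vlan_mac
       )"""
)


def poller_insert(conn, known_keys, entry):
    """Insert `entry` unless this poller's earlier snapshot already had its key."""
    key = (entry[0], entry[1], entry[3])
    if key in known_keys:
        return "skipped"
    try:
        conn.execute("INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)", entry)
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate"


# Values taken from the first failed INSERT in the db.log excerpt.
entry = (54, 2083, 220115, "6ef88537f91f", "learned")

# Both pollers read the table before either has inserted, so both snapshots are empty.
snapshot_a, snapshot_b = set(), set()
results = [
    poller_insert(conn, snapshot_a, entry),
    poller_insert(conn, snapshot_b, entry),
]
print(results)  # ['inserted', 'duplicate']
```

The second poller's insert is rejected by the unique key, which is the same kind of failure as the #1062 errors, and it only occurs because two pollers overlap.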
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast). I also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb table:
mysql> show columns from vlans_fdb
    -> ;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| device_id   | int(11)     | NO   | PRI | NULL    |       |
| vlan_id     | int(11)     | NO   | PRI | NULL    |       |
| port_id     | int(11)     | YES  | MUL | NULL    |       |
| mac_address | varchar(32) | NO   | PRI | NULL    |       |
| fdb_status  | varchar(32) | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> show index from vlans_fdb
    -> ;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
| vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.04 sec)

mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.) _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
![](https://secure.gravatar.com/avatar/7941076427c7cfce44646fa3eba4be42.jpg?s=120&d=mm&r=g)
Some additional information on this issue. I figured out how to disable the fdb-table module globally in config.php. Now observium is back to being useable. I’m attaching screenshots of the polling performance both with fdb-table enabled and without fdb-table enabled. I’m also attaching the mysql performance stats screenshot for each.
I’m seeing a total polling time of about 3340 seconds with fdb-table enabled and 560 seconds without it enabled.
If I run “poller.php -h all -m fdb-table” manually, here are the stats I get:
##### Completed polling run at 2015-08-13 16:19:01 #####
o Devices Polled 45
o Poller Time 2142. secs
o Memory usage 167.2MB (peak: 167.7MB)
o MySQL Usage Cell[0/0s] Row[115/0.045s] Rows[280/1.337s] Column[0/0s] Update[10015/10.296s] Insert[4152/3.583s] Delete[4144/1.749s]
o RRDTool Usage update[217/0.249s]
I definitely seem to be seeing a performance issue of some type with fdb-table after the upgrade I did. For now, I’m going to have to leave fdb-table disabled and run it once in a while as you suggest. Also, when I watch htop while running fdb-table manually, poller.php hits 100% CPU and sits there for long stretches. This is not something I would normally have seen in day-to-day operation.
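The stats above also hint at where the time is not going. Here is a quick Python check, assuming each MySQL counter reads as Name[query count/total seconds] (an interpretation of the output format, not something stated in the thread) and taking the poller time as roughly 2142 seconds, since its fractional part is cut off in the quoted output.

```python
import re

# MySQL usage line copied from the poller output above.
mysql_usage = (
    "Cell[0/0s] Row[115/0.045s] Rows[280/1.337s] Column[0/0s] "
    "Update[10015/10.296s] Insert[4152/3.583s] Delete[4144/1.749s]"
)
poller_time = 2142.0  # seconds; the fractional part is cut off in the quoted output

# Assumed format: Name[query count/total seconds].
entries = re.findall(r"(\w+)\[(\d+)/([\d.]+)s\]", mysql_usage)
mysql_seconds = sum(float(seconds) for _, _, seconds in entries)
share = mysql_seconds / poller_time

print(f"MySQL: {mysql_seconds:.2f}s of {poller_time:.0f}s ({share:.1%})")
```

Under that reading, the database accounts for about 17 seconds, under 1% of the run, so the remaining time is being spent elsewhere; that fits with poller.php sitting at 100% CPU.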
The one thing that I can see is amiss is I’m getting a complaint from observium that I’m running PHP 5.3.3. Could this have anything to do with the issue?
The upgrade is what triggered this, but I typically upgrade about once a month, so the software I upgraded from wasn’t THAT old.
Thanks
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 3:58 AM To: Observium Network Observation System observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
Can you tell which device is taking a long time?
You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar)
Screenshots of those might help :)
Thanks, Adam.
Sent with AquaMail for Android http://www.aqua-mail.com
On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.commailto:amayfield@artisaninfrastructure.com> wrote: Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect. I’m scratching my head as to why it worked fine before the upgrade. In fact, here is the observium.log output before/after the upgrade:
[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers [2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers [2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers [2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 
devices in 133 seconds with 8 workers [2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers [2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers [2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers [2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers [2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers [2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers [2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers [2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers [2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers [2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers [2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers [2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers [2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers
I did the upgrade and then the poller starts running poorly. Then I see the side effect of the of the duplicate entries with the fdb table because the poller processes are running so slowly they are stacking on top of each other.
So I have some kind of performance issue.
One strange thing, I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports. That must be a miscalculation of some type or my database is messed up somewhere. Not sure if related or not.
Going to continue to try the poller manually and see if I can figure out where the slowdown is.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Wednesday, August 12, 2015 1:33 AM To: observium@observium.orgmailto:observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Man, why didn't I think of this?
This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel, not unthinkable when the same part of the poller process runs for so long, 74 seconds for the fdb-table module.
If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade offs we have to make between performance and data.
If you still want the fdb data, you can force that module to be run using a less-often scheduled process in cron like ./poller.php -h all -m fdb-table
Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.
adam.
On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cxmailto:tom.laermans@powersource.cx> wrote: If you're running multiple simultaneous pollers against the same device is not unthinkable they'll all be trying to insert the same data into the table...
Tom
On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
Here is the requested output. I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU). Oddly enough, once I killed all the other processes, I didn’t any problems running it (back to being fast). Also don’t seem to be getting the errors in the db.log when running the poller ‘one-at-a-time’.
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong Sent: Tuesday, August 11, 2015 11:52 PM To: observium@observium.orgmailto:observium@observium.org Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
Hi Aaron,
These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
Thanks,
adam.
On 12/08/2015 05:01:53, Aaron Mayfield wrote:
Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
[2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned') [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
Here is the schema of my vlans_fdb file:
mysql> show columns from vlans_fdb -> ; +-------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+-------------+------+-----+---------+-------+ | device_id | int(11) | NO | PRI | NULL | | | vlan_id | int(11) | NO | PRI | NULL | | | port_id | int(11) | YES | MUL | NULL | | | mac_address | varchar(32) | NO | PRI | NULL | | | fdb_status | varchar(32) | NO | | NULL | | +-------------+-------------+------+-----+---------+-------+ 5 rows in set (0.00 sec)
mysql> mysql> show index from vlans_fdb -> ; +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub art | Packed | Null | Index_type | Comment | +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ | vlans_fdb | 0 | dev_vlan_mac | 1 | device_id | A | 15 | ULL | NULL | | BTREE | | | vlans_fdb | 0 | dev_vlan_mac | 2 | vlan_id | A | 18348 | ULL | NULL | | BTREE | | | vlans_fdb | 0 | dev_vlan_mac | 3 | mac_address | A | 128440 | ULL | NULL | | BTREE | | | vlans_fdb | 1 | device_id | 1 | device_id | A | 78 | ULL | NULL | | BTREE | | | vlans_fdb | 1 | port_id | 1 | port_id | A | 431 | ULL | NULL | YES | BTREE | | +-----------+------------+--------------+--------------+-------------+-----------+-------------+---- ----+--------+------+------------+---------+ 5 rows in set (0.04 sec)
mysql>
Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
What should I check? Thanks for any help.
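To illustrate what the unique key dev_vlan_mac does when the same (device_id, vlan_id, mac_address) row is inserted twice, here is a minimal sketch. It is an assumption-laden stand-in, not Observium's code: it uses Python's sqlite3 module with an in-memory database instead of MySQL, the column types are simplified, and the second poller is simulated by simply repeating the INSERT. SQLite's INSERT OR REPLACE is used as a rough analogue of MySQL's REPLACE INTO.

```python
import sqlite3

# In-memory stand-in for the vlans_fdb table (simplified types, SQLite not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vlans_fdb (
        device_id   INTEGER NOT NULL,
        vlan_id     INTEGER NOT NULL,
        port_id     INTEGER,
        mac_address TEXT    NOT NULL,
        fdb_status  TEXT    NOT NULL
    )
    """
)
# Same composite unique key as shown in the "show index" output.
conn.execute(
    "CREATE UNIQUE INDEX dev_vlan_mac "
    "ON vlans_fdb (device_id, vlan_id, mac_address)"
)

insert = (
    "INSERT INTO vlans_fdb "
    "(device_id, vlan_id, port_id, mac_address, fdb_status) "
    "VALUES (?, ?, ?, ?, ?)"
)
row = (54, 2083, 220115, "6ef88537f91f", "learned")

# First writer succeeds.
conn.execute(insert, row)

# A second writer inserting the same key is rejected by the unique index.
# In MySQL this is the "#1062 - Duplicate entry" error seen in db.log.
try:
    conn.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A replace-style write overwrites the existing row instead of failing.
replace = insert.replace("INSERT INTO", "INSERT OR REPLACE INTO", 1)
conn.execute(replace, (54, 2083, 220116, "6ef88537f91f", "learned"))

rows = conn.execute(
    "SELECT device_id, vlan_id, port_id, mac_address, fdb_status FROM vlans_fdb"
).fetchall()
print(duplicate_rejected, rows)
```

The point of the sketch is only that the table definition itself looks consistent with the errors: a plain INSERT of an already-present key fails by design, so the question becomes why the same key is being inserted more than once.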
Aaron Mayfield Cloud Expert Networking Specialist
12400 Hwy. 71 W. Suite 350-407 Austin, TX 78738 T. 512.600.4297 www.artisaninfrastructure.com Partner portal: https://portal.vpdc.us Partner support: support@artisaninfrastructure.com
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
participants (5)
- Aaron Mayfield
- Adam Armstrong
- Bill Fenner
- Ron Marosko
- Tom Laermans