In your config.php, set the module to 0 to disable it globally:

$config['poller_modules']['fdb-table']                    = 0;

adam.

On 13/08/2015 19:04:41, Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:

Haven’t found any specific smoking gun yet.  There does seem to be a trend of all of my Arista devices taking longer to run than the others, but this is difficult to verify since most of my Observium devices are Arista.

 

If I want to disable the fdb-table module for troubleshooting, is there a way to do that globally? Thanks

 

 

[amayfield@kc-netview observium]$ sudo ./poller-wrapper.py

INFO: starting the poller at 2015/08/12 02:17:55 with 8 threads, slowest devices first

INFO starting alerter.php for 55

INFO finished alerter.php for 55

INFO: worker Thread-8 finished device 55 in 130 seconds

INFO starting alerter.php for 28

INFO finished alerter.php for 28

INFO: worker Thread-7 finished device 28 in 135 seconds

INFO starting alerter.php for 27

INFO finished alerter.php for 27

INFO: worker Thread-6 finished device 27 in 143 seconds

INFO starting alerter.php for 17

INFO finished alerter.php for 17

INFO: worker Thread-3 finished device 17 in 159 seconds

INFO starting alerter.php for 22

INFO finished alerter.php for 22

INFO: worker Thread-5 finished device 22 in 172 seconds

INFO starting alerter.php for 20

INFO finished alerter.php for 20

INFO: worker Thread-2 finished device 20 in 176 seconds

INFO starting alerter.php for 52

INFO finished alerter.php for 52

INFO starting alerter.php for 21

INFO finished alerter.php for 21

INFO: worker Thread-8 finished device 52 in 147 seconds

INFO: worker Thread-1 finished device 21 in 277 seconds

INFO starting alerter.php for 56

INFO finished alerter.php for 56

INFO: worker Thread-7 finished device 56 in 151 seconds

INFO starting alerter.php for 51

INFO finished alerter.php for 51

INFO: worker Thread-6 finished device 51 in 145 seconds

INFO starting alerter.php for 37

INFO finished alerter.php for 37

INFO: worker Thread-3 finished device 37 in 130 seconds

INFO starting alerter.php for 19

INFO finished alerter.php for 19

INFO: worker Thread-4 finished device 19 in 294 seconds

INFO starting alerter.php for 38

INFO finished alerter.php for 38

INFO: worker Thread-5 finished device 38 in 125 seconds

INFO starting alerter.php for 18

INFO finished alerter.php for 18

INFO: worker Thread-2 finished device 18 in 156 seconds

INFO starting alerter.php for 1

INFO finished alerter.php for 1

INFO: worker Thread-4 finished device 1 in 61 seconds

INFO starting alerter.php for 49

INFO finished alerter.php for 49

INFO: worker Thread-1 finished device 49 in 102 seconds

INFO starting alerter.php for 14

INFO finished alerter.php for 14

INFO: worker Thread-5 finished device 14 in 83 seconds

INFO starting alerter.php for 5

INFO finished alerter.php for 5

INFO: worker Thread-3 finished device 5 in 104 seconds

INFO starting alerter.php for 50

INFO finished alerter.php for 50

INFO: worker Thread-6 finished device 50 in 105 seconds

INFO starting alerter.php for 53

INFO finished alerter.php for 53

INFO: worker Thread-8 finished device 53 in 127 seconds

INFO starting alerter.php for 16

INFO finished alerter.php for 16

INFO: worker Thread-5 finished device 16 in 36 seconds

INFO starting alerter.php for 10

INFO finished alerter.php for 10

INFO: worker Thread-4 finished device 10 in 63 seconds

INFO starting alerter.php for 54

INFO finished alerter.php for 54

INFO starting alerter.php for 6

INFO finished alerter.php for 6

INFO: worker Thread-7 finished device 54 in 137 seconds

INFO: worker Thread-2 finished device 6 in 91 seconds

INFO starting alerter.php for 11

INFO finished alerter.php for 11

INFO: worker Thread-6 finished device 11 in 32 seconds

INFO starting alerter.php for 7

INFO finished alerter.php for 7

INFO: worker Thread-1 finished device 7 in 49 seconds

INFO starting alerter.php for 36

INFO finished alerter.php for 36

INFO: worker Thread-8 finished device 36 in 26 seconds

INFO starting alerter.php for 15

INFO finished alerter.php for 15

INFO: worker Thread-7 finished device 15 in 14 seconds

INFO starting alerter.php for 41

INFO finished alerter.php for 41

INFO: worker Thread-2 finished device 41 in 18 seconds

INFO starting alerter.php for 4

INFO finished alerter.php for 4

INFO: worker Thread-1 finished device 4 in 15 seconds

INFO starting alerter.php for 23

INFO finished alerter.php for 23

INFO: worker Thread-6 finished device 23 in 18 seconds

INFO starting alerter.php for 3

INFO finished alerter.php for 3

INFO: worker Thread-8 finished device 3 in 15 seconds

INFO starting alerter.php for 25

INFO finished alerter.php for 25

INFO: worker Thread-5 finished device 25 in 29 seconds

INFO starting alerter.php for 26

INFO finished alerter.php for 26

INFO: worker Thread-4 finished device 26 in 28 seconds

INFO starting alerter.php for 33

INFO finished alerter.php for 33

INFO starting alerter.php for 29

INFO finished alerter.php for 29

INFO: worker Thread-2 finished device 33 in 5 seconds

INFO: worker Thread-1 finished device 29 in 5 seconds

INFO starting alerter.php for 35

INFO finished alerter.php for 35

INFO: worker Thread-8 finished device 35 in 3 seconds

INFO starting alerter.php for 9

INFO finished alerter.php for 9

INFO: worker Thread-5 finished device 9 in 2 seconds

INFO starting alerter.php for 2

INFO finished alerter.php for 2

INFO starting alerter.php for 31

INFO finished alerter.php for 31

INFO: worker Thread-3 finished device 2 in 56 seconds

INFO: worker Thread-4 finished device 31 in 2 seconds

INFO starting alerter.php for 30

INFO finished alerter.php for 30

INFO: worker Thread-6 finished device 30 in 4 seconds

INFO starting alerter.php for 8

INFO finished alerter.php for 8

INFO starting alerter.php for 57

INFO finished alerter.php for 57

INFO: worker Thread-2 finished device 8 in 2 seconds

INFO: worker Thread-1 finished device 57 in 2 seconds

INFO starting alerter.php for 32

INFO finished alerter.php for 32

INFO: worker Thread-8 finished device 32 in 1 seconds

INFO starting alerter.php for 24

INFO finished alerter.php for 24

INFO: worker Thread-7 finished device 24 in 15 seconds

INFO: poller-wrapper.py polled 45 devices in 454 seconds with 8 workers

 

WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads

INFO: in sequential style polling the elapsed time would have been: 3590 seconds

WARNING: Consider setting a minimum of 13 threads. (This does not constitute professional advice!)

[amayfield@kc-netview observium]$
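
To rank devices by poll time from wrapper output like the above, here is a minimal Python sketch. It assumes the output has been saved as text and that the lines of interest follow the "finished device N in S seconds" pattern shown above; the function name slowest_devices is made up for illustration.

```python
import re

# Matches wrapper lines such as:
#   INFO: worker Thread-8 finished device 55 in 130 seconds
FINISHED = re.compile(r"worker (\S+) finished device (\d+) in (\d+) seconds")

def slowest_devices(text, top=5):
    """Return up to `top` (device_id, seconds) pairs, slowest first."""
    times = [(int(m.group(2)), int(m.group(3))) for m in FINISHED.finditer(text)]
    return sorted(times, key=lambda pair: pair[1], reverse=True)[:top]

# A few lines copied from the run above.
sample = """\
INFO: worker Thread-1 finished device 21 in 277 seconds
INFO: worker Thread-4 finished device 19 in 294 seconds
INFO: worker Thread-2 finished device 20 in 176 seconds
INFO: worker Thread-8 finished device 32 in 1 seconds
"""

for device_id, seconds in slowest_devices(sample, top=3):
    print(f"device {device_id}: {seconds}s")
```

The device IDs it prints can then be checked against the device performance tab to see which module dominates the run time.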

 

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 3:58 AM
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

 

Hi Aaron,

Can you tell which device is taking a long time?

You can check the poller performance page from the "globe" menu, and the device performance tab (the "clock" icon on the right of the device navbar).

Screenshots of those might help :)

Thanks,
Adam.


On 12 August 2015 7:50:28 am Aaron Mayfield <amayfield@artisaninfrastructure.com> wrote:

Yeah this seems to be a performance issue of some type and the fdb table stuff seems like a side effect.  I’m scratching my head as to why it worked fine before the upgrade.  In fact, here is the observium.log output before/after the upgrade:

 

[2015/08/11 10:42:14 -0500] poller-wrapper.py(23384): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers

[2015/08/11 10:47:12 -0500] poller-wrapper.py(3121): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers

[2015/08/11 10:52:13 -0500] poller-wrapper.py(15078): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers

[2015/08/11 10:57:13 -0500] poller-wrapper.py(27618): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers

[2015/08/11 11:02:14 -0500] poller-wrapper.py(7205): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers

[2015/08/11 11:07:14 -0500] poller-wrapper.py(19611): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers

[2015/08/11 11:12:12 -0500] poller-wrapper.py(31781): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers

[2015/08/11 11:17:15 -0500] poller-wrapper.py(11383): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers

[2015/08/11 11:22:15 -0500] poller-wrapper.py(23688): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers

[2015/08/11 11:27:14 -0500] poller-wrapper.py(3412): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers

[2015/08/11 11:32:10 -0500] poller-wrapper.py(15327): /opt/observium/poller-wrapper.py: polled 45 devices in 128 seconds with 8 workers

[2015/08/11 11:37:14 -0500] poller-wrapper.py(27814): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers

[2015/08/11 11:42:13 -0500] poller-wrapper.py(7491): /opt/observium/poller-wrapper.py: polled 45 devices in 132 seconds with 8 workers

[2015/08/11 11:47:13 -0500] poller-wrapper.py(19987): /opt/observium/poller-wrapper.py: polled 45 devices in 131 seconds with 8 workers

[2015/08/11 11:52:15 -0500] poller-wrapper.py(32100): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers

[2015/08/11 11:57:14 -0500] poller-wrapper.py(11743): /opt/observium/poller-wrapper.py: polled 45 devices in 133 seconds with 8 workers

[2015/08/11 12:02:22 -0500] poller-wrapper.py(23906): /opt/observium/poller-wrapper.py: polled 45 devices in 140 seconds with 8 workers

[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers

[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers

[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers

[2015/08/11 12:48:30 -0500] poller-wrapper.py(26555): /opt/observium/poller-wrapper.py: polled 45 devices in 1708 seconds with 8 workers

[2015/08/11 13:06:30 -0500] poller-wrapper.py(2438): /opt/observium/poller-wrapper.py: polled 45 devices in 2487 seconds with 8 workers

[2015/08/11 13:12:30 -0500] poller-wrapper.py(9984): /opt/observium/poller-wrapper.py: polled 45 devices in 2548 seconds with 8 workers

[2015/08/11 13:31:56 -0500] poller-wrapper.py(19437): /opt/observium/poller-wrapper.py: polled 45 devices in 3414 seconds with 8 workers

[2015/08/11 13:40:50 -0500] poller-wrapper.py(25290): /opt/observium/poller-wrapper.py: polled 45 devices in 3647 seconds with 8 workers

[2015/08/11 13:55:34 -0500] poller-wrapper.py(956): /opt/observium/poller-wrapper.py: polled 45 devices in 4231 seconds with 8 workers

[2015/08/11 14:02:54 -0500] poller-wrapper.py(7354): /opt/observium/poller-wrapper.py: polled 45 devices in 4370 seconds with 8 workers

[2015/08/11 14:20:50 -0500] poller-wrapper.py(14288): /opt/observium/poller-wrapper.py: polled 45 devices in 5147 seconds with 8 workers

 

I did the upgrade and then the poller started running poorly.  Then I see the side effect of the duplicate entries in the fdb table, because the poller processes are running so slowly that they are stacking on top of each other.
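
To pick out the runs that overlap the next scheduled poll, here is a minimal Python sketch over observium.log lines like the ones above. It assumes a 300-second polling interval (the wrapper itself warns when a run exceeds 5 minutes) and the log line format shown; the function name overrunning_runs is hypothetical.

```python
import re

POLL_INTERVAL = 300  # seconds; assumed 5-minute schedule

# Matches observium.log lines such as:
#   [2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): ...: polled 45 devices in 683 seconds with 8 workers
RUN = re.compile(r"^\[(\S+ \S+) [^\]]*\].*polled (\d+) devices in (\d+) seconds")

def overrunning_runs(lines, interval=POLL_INTERVAL):
    """Yield (timestamp, seconds) for runs that took longer than the interval."""
    for line in lines:
        m = RUN.search(line)
        if m and int(m.group(3)) > interval:
            yield m.group(1), int(m.group(3))

# Three lines copied from the log excerpt above.
sample = [
    "[2015/08/11 12:06:52 -0500] poller-wrapper.py(4395): /opt/observium/poller-wrapper.py: polled 45 devices in 111 seconds with 8 workers",
    "[2015/08/11 12:21:24 -0500] poller-wrapper.py(11770): /opt/observium/poller-wrapper.py: polled 45 devices in 683 seconds with 8 workers",
    "[2015/08/11 12:38:09 -0500] poller-wrapper.py(17020): /opt/observium/poller-wrapper.py: polled 45 devices in 1388 seconds with 8 workers",
]

for timestamp, seconds in overrunning_runs(sample):
    print(f"{timestamp}: {seconds}s (over by {seconds - POLL_INTERVAL}s)")
```

Any run it reports was still going when the next one was due to start, which is the point at which the pollers begin to stack.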

 

So I have some kind of performance issue. 

 

One strange thing: I only have 45 devices I’m polling, yet the port count under the Port menu says I have 44859 ports.  That must be a miscalculation of some type, or my database is messed up somewhere.  Not sure if it's related or not.

 

Going to continue to try the poller manually and see if I can figure out where the slowdown is.

 

 

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
Sent: Wednesday, August 12, 2015 1:33 AM
To: observium@observium.org
Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb

 

Man, why didn't I think of this?

 

This sounds like the problem. I guess something caused your poller processes to get confused, and they ended up running in parallel. That's not unthinkable when the same part of the poller process runs for so long (74 seconds for the fdb-table module).

 

If you don't /really/ need this data, I'd recommend disabling it. It's one of the trade-offs we have to make between performance and data.

 

If you still want the fdb data, you can force that module to run from a less frequently scheduled cron job, e.g. ./poller.php -h all -m fdb-table

 

Note that this will break whatever graphs (fdb count?) that fdb-table produces, but you'll still have the data in the database.

 

adam.

On 12/08/2015 07:28:55, Tom Laermans <tom.laermans@powersource.cx> wrote:

If you're running multiple simultaneous pollers against the same device, it's not unthinkable that they'll all be trying to insert the same data into the table...

Tom
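
The failure mode described above can be reproduced in miniature. This is only a sketch: SQLite stands in for MySQL, the table is a cut-down copy of the vlans_fdb columns quoted further down the thread, and the two "pollers" are simulated by running the same INSERT twice in a row rather than truly in parallel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vlans_fdb (
           device_id   INTEGER NOT NULL,
           vlan_id     INTEGER NOT NULL,
           port_id     INTEGER,
           mac_address TEXT NOT NULL,
           fdb_status  TEXT NOT NULL,
           UNIQUE (device_id, vlan_id, mac_address)  -- like the dev_vlan_mac key
       )"""
)

# Values taken from one of the failed queries in db.log.
row = (54, 2083, 220115, "6ef88537f91f", "learned")
insert = "INSERT INTO vlans_fdb VALUES (?, ?, ?, ?, ?)"

conn.execute(insert, row)          # first poller: the entry is new, insert succeeds
try:
    conn.execute(insert, row)      # second poller, same entry: the unique key rejects it
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("second insert failed:", exc)
```

Each poller checks for the entry, finds it missing, and inserts; whichever one commits second hits the unique key, which would explain errors that only show up while runs overlap.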

On Aug 12, 2015 8:02 AM, Aaron Mayfield wrote:
>
> Here is the requested output.  I had to kill all the other poller processes running on the system to get it to run (they were hosing the CPU).  Oddly enough, once I killed all the other processes, I didn’t have any problems running it (back to being fast).  I also don’t seem to be getting the errors in db.log when running the poller ‘one-at-a-time’.
>
>  
>
>  
>
>  
>
> From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong
> Sent: Tuesday, August 11, 2015 11:52 PM
> To: observium@observium.org
> Subject: Re: [Observium] 'Duplicate entry' issues on vlans_fdb
>
>  
>
> Hi Aaron,
>
>  
>
> These seem to be gone from my db.log. Could you send me a ./poller.php -h 54 -m fdb-table -d ?
>
>  
>
> Thanks,
>
> adam.
>>
>> On 12/08/2015 05:01:53, Aaron Mayfield wrote:
>>
>> Just today I updated to the latest and greatest (0.15.8.6882). I was several revisions behind and several database updates were done as a result. After the update, I noticed my poller.php processes started taking all the CPU, started getting gaps in the graphs, etc. I noticed thousands of these entries in db.log:
>>
>>
>> [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2083-6ef88537f91f' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2083','220115','6ef88537f91f','learned')
>> [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-005056a927c2' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','005056a927c2','learned')
>> [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2084-a66aaf0bf4cc' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2084','220115','a66aaf0bf4cc','learned')
>> [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2085-228a3d193c66' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2085','220115','228a3d193c66','learned')
>> [2015/08/11 22:01:02 -0500] poller.php(21342): Failed dbQuery (#1062 - Duplicate entry '54-2086-0ee7c729643b' for key 'dev_vlan_mac'), Query: INSERT INTO `vlans_fdb` (`device_id`,`vlan_id`,`port_id`,`mac_address`,`fdb_status`) VALUES ('54','2086','220115','0ee7c729643b','learned')
>>
>> If I run a poller process against a switch manually, everything seems to run fine with the exception of the fdb-table module, which is taking over 600 seconds to run.
>>
>> Here is the schema of my vlans_fdb table:
>>
>> mysql> show columns from vlans_fdb
>> -> ;
>> +-------------+-------------+------+-----+---------+-------+
>> | Field       | Type        | Null | Key | Default | Extra |
>> +-------------+-------------+------+-----+---------+-------+
>> | device_id   | int(11)     | NO   | PRI | NULL    |       |
>> | vlan_id     | int(11)     | NO   | PRI | NULL    |       |
>> | port_id     | int(11)     | YES  | MUL | NULL    |       |
>> | mac_address | varchar(32) | NO   | PRI | NULL    |       |
>> | fdb_status  | varchar(32) | NO   |     | NULL    |       |
>> +-------------+-------------+------+-----+---------+-------+
>> 5 rows in set (0.00 sec)
>>
>> mysql>
>> mysql> show index from vlans_fdb
>> -> ;
>> +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>> | Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>> +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>> | vlans_fdb |          0 | dev_vlan_mac |            1 | device_id   | A         |          15 |     NULL | NULL   |      | BTREE      |         |
>> | vlans_fdb |          0 | dev_vlan_mac |            2 | vlan_id     | A         |       18348 |     NULL | NULL   |      | BTREE      |         |
>> | vlans_fdb |          0 | dev_vlan_mac |            3 | mac_address | A         |      128440 |     NULL | NULL   |      | BTREE      |         |
>> | vlans_fdb |          1 | device_id    |            1 | device_id   | A         |          78 |     NULL | NULL   |      | BTREE      |         |
>> | vlans_fdb |          1 | port_id      |            1 | port_id     | A         |         431 |     NULL | NULL   | YES  | BTREE      |         |
>> +-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
>> 5 rows in set (0.04 sec)
>>
>> mysql>
>>
>> Does my table structure look right? I see someone else on the list has had this same issue, but there is no indication that this should be a problem in the latest version.
>>
>> What should I check? Thanks for any help.
>>
>>
>>
>>
>> Aaron Mayfield
>> Cloud Expert
>> Networking Specialist
>>
>> 12400 Hwy. 71 W. Suite 350-407
>> Austin, TX 78738
>> T. 512.600.4297
>> www.artisaninfrastructure.com
>> Partner portal: https://portal.vpdc.us
>> Partner support: support@artisaninfrastructure.com
>>
>>
>>
>> This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. Please note that any views or opinions present in this email are solely those of the author and do not necessarily represent those of the company. Finally the recipient should check this email and any attachment for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. (Proprietary & Confidential – Artisan Infrastructure, Inc. et all.)
>> _______________________________________________
>> observium mailing list
>> observium@observium.org
>> http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
>