Discussion:
[smokeping-users] Alerting when a Slave stops sending data
Olivier
2018-04-22 15:50:10 UTC
Hello,

I have a Debian Jessie box with Smokeping 2.6 installed on it.

It receives data from Slaves over the Internet (10 slaves or so).
Each Slave roughly monitors xDSL or fiber links.

Every Monday, I see that data from one or two slaves is missing.
I then remotely restart the smokeping service on the slave whose data is missing.

I would like to implement something like:

- if there is no data at all from a Slave for a given period of time, then
restart the Slave's smokeping service and send a Notice email

- if there is no data at all from a Slave for a longer period of time and a
restart of the Slave has already been attempted, then send a Warning email
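
The two rules above can be sketched as a small decision function. This is only an illustration: the thresholds, the action names and the "wait" state are assumptions, not anything Smokeping provides, and `age_seconds` stands for however long the Slave has been silent.

```python
# Hypothetical thresholds; tune them to the Slaves' reporting interval.
RESTART_AFTER = 30 * 60      # seconds of silence before a restart is tried
WARN_AFTER = 2 * 60 * 60     # seconds of silence before a Warning is sent


def next_action(age_seconds, restart_attempted):
    """Decide what to do for one Slave that has been silent for age_seconds."""
    if age_seconds < RESTART_AFTER:
        return "ok"
    if not restart_attempted:
        # First rule: restart the Slave's service and send a Notice email.
        return "restart_and_notice"
    if age_seconds >= WARN_AFTER:
        # Second rule: restart already attempted and the Slave is still silent.
        return "warning"
    # Restart was tried recently; give it time before escalating.
    return "wait"
```

The caller has to remember, per Slave, whether a restart was already attempted (for example in a small state file) and clear that flag once data arrives again.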

As the Slaves' data is stored in a known directory on the Master's filesystem, I
think I can detect when a Slave's data has not been modified recently by
reading the modification times of the files in those directories.
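
A minimal sketch of that check, assuming each Slave's files can be found under one directory on the Master and that their modification time only changes when new Slave data is written (this would need testing on a real installation). The function names are made up for illustration.

```python
import os
import time


def newest_mtime(directory):
    """Return the latest modification time (epoch seconds) of any file
    below directory, or None if it contains no files."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file disappeared between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def seconds_since_update(directory, now=None):
    """Age in seconds of the freshest file, or None if there is no data."""
    now = time.time() if now is None else now
    newest = newest_mtime(directory)
    return None if newest is None else now - newest
```

Run from cron on the Master, a result above some chosen limit (or `None`) would mean the Slave has stopped delivering data.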

Is there a better way to do this? Smokeping's alert settings seem better
suited to detecting when WAN links, in my case, become slower.

Best regards
Gregory Sloop
2018-04-22 18:29:43 UTC
This is an awesome idea - and one I've wished for in the past - but never got around to working on.
Checking the slave data files' modification times seems plausible as a way to detect missing updates - but you'd have to test to be sure. [IIRC that will work though.]

Personally, I'd probably try to write it in bash - or something completely external to smokeping. [Bash because it has few dependencies - though you'll probably want/need something like sendemail for email notifications...]
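
As an illustration of the notification part only, here is a sketch in Python rather than bash, using just the standard library. The subject format, the addresses passed in and the local SMTP relay on `localhost` are all assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_alert(slave, age_seconds, level, sender, recipient):
    """Build a Notice or Warning email about a silent slave (does not send it)."""
    minutes = int(age_seconds // 60)
    msg = EmailMessage()
    msg["Subject"] = "[smokeping] {}: no data from slave {} for {} min".format(
        level, slave, minutes)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "The master has received no data from slave {} for about {} minutes.".format(
            slave, minutes))
    return msg


def send_alert(msg, host="localhost"):
    """Hand the message to an SMTP relay; assumes one listens on host."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping message building separate from sending makes it easy to test the script without a mail server.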

If slaves are behind NAT or something similar, you'll have to have a way to get to the slave for handling a restart, but that's really outside the scope of what you're doing.

Honestly, simply getting notification that a slave is not pushing updates would be more than enough - even without the restart.

Sounds fab to me. And I can't think of a better way, off hand.

-Greg


--
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: ***@sloop.net
http://www.sloop.net
---
Bill Houle
2018-04-22 21:47:26 UTC
As someone who recently had to implement a monitor of not-smokeping processes, might I suggest “monit”? It is a fairly mainstream package that is readily available in yum and apt-get repos.

Monit is a locally installed (i.e. per-slave) daemon that can monitor files (by timestamp or checksum), processes (by PID), programs (by exit code), and the system (by resource consumption). It has a flexible config language that can alert/start/stop/exec based on those monitor conditions.

I could see monit being used to watch each slave and alert and/or auto-restart the data collection.

—bill
_______________________________________________
smokeping-users mailing list
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
Darren Murphy
2018-04-22 23:44:33 UTC
I second the monit suggestion. I have used it for exactly this purpose
(watching/restarting slave threads) in the past.

regards,
Darren
Gregory Sloop
2018-04-23 16:39:20 UTC
Monit won't help if the slave went down because someone unplugged it, or some other disaster befell it. It also won't help if the process is still running, but not actually pushing data to the master.

However, it does have the benefit of being easy to install and configure, with no development/debug time required.

For the use cases I've got, I think something running on the master would be helpful more often. [I can't recall a single case where the slave was still up and functional, so that Monit could have done something, yet the smokeping slave process was borked. But that may just be me.]

-Greg


Bill Houle
2018-04-23 21:35:29 UTC
Well, you could run monit on the master and it would do the alert-only monitoring you wanted. I just thought running it on the slaves might be better, so you have some actual control of the processes and/or visibility into the general resources of the host. But hey, it's Linux; multiple ways to skin that cat.

PS: monit would be free, but (if you implemented on the slaves) you could also throw some $$ at the M/Monit tool which could run on the master and give you a “single pane of glass” view into the entire monit+smokeping master-and-slaves ecosystem...

—bill