ROS answers down

bionade24 · February 23, 2020, 3:27pm

Why is ROS Answers down again, and for so long? Long outages should not happen in modern environments. Where is your monitoring? Whatever happened, please explain it.

gavanderhoorn · February 23, 2020, 3:32pm

The monitoring is OK, I believe: status.ros.org:

Partial System Outage

but all the monitoring won’t help if there are no humans around (i.e. awake) to react to alerts.

Edit: does anyone know why the graphs have disappeared from status.ros.org?

smac · February 23, 2020, 10:38pm

When posting, I think it’s better form to be less aggressive, to try to understand what has happened and, if it bothers you, to be part of the solution.

I understand your frustration, but it’s not productive in this case unless you’re volunteering to help maintain this infrastructure.

Edit: As an example strategy, in complicated topics I try to take the viewpoint that it’s “us against the problem” and not “you versus me”.

Katherine_Scott · February 25, 2020, 7:19am

I’m sorry that this happened and caused you some problems. The reality of the situation is that ROS Answers is a bit long in the tooth and could use some love and attention. We don’t have a full-time development or ops team working on it at all times (like, say, some other, larger Q&A sites). If the server goes down, someone needs to actually ssh into the server and restart the service.

The ROS Answers server is maintained by a group of people who aren’t formally “on-call”. We do have a monitoring service attached to ROS Answers, but it isn’t like anyone has a “pager” that goes off in the middle of the night if something goes wrong. This particular outage happened on a weekend evening for most of the admins. The server came back up at about the time an admin looked at their e-mail on Sunday morning.

gavanderhoorn · February 25, 2020, 10:36am

Would distributing that “reset the server” duty across a couple of time-zones help?

I’d be willing to push the button if needed during regular business hours here in Europe.

gonzalocasas · February 25, 2020, 12:37pm

Count me in as well for the pool of ssh resetters.

bionade24 · February 29, 2020, 9:07pm

Or simply, since you already monitor ROS Answers outages, let a script restart the service. No human interaction needed.
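
For illustration only, such a watchdog could look roughly like the sketch below. Everything specific in it is an assumption: the probe URL, the timeout, and especially the `ros-answers.service` unit name are hypothetical, since the thread doesn't say how the service is actually run or restarted.

```python
import subprocess
import urllib.error
import urllib.request

# Assumed values for illustration; none of these come from the real setup.
HEALTH_URL = "https://answers.ros.org/"  # page to probe
RESTART_CMD = ["systemctl", "restart", "ros-answers.service"]  # hypothetical unit
TIMEOUT_S = 10


def site_is_up(url=HEALTH_URL, timeout=TIMEOUT_S):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx (HTTPError is a URLError), DNS failures and timeouts
        # all count as "down" here.
        return False


def run_restart_command():
    """Run the (hypothetical) restart command; don't raise on failure."""
    subprocess.run(RESTART_CMD, check=False)


def check_and_restart(is_up=site_is_up, restart=run_restart_command):
    """Restart the service once if the health check fails.

    Returns True if a restart was attempted, False if the site was up.
    """
    if is_up():
        return False
    restart()
    return True
```

Calling `check_and_restart()` every few minutes from cron or a systemd timer would cover the "nobody is awake" case. A single failed probe triggers a restart in this sketch, so a real deployment would probably want to require a few consecutive failures first, and to notify the admins whenever a restart happens.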

gbiggs · March 4, 2020, 7:40am

I can do the Asia time zone if needed.
