Mar 24 Major Outage [resolved]

A water leak drowned a rack of IEM servers and there's much pain to be had at the moment. I am not yet sure how I will get things going again, but I am trying! Thanks for your patience.

Update 9 PM 24 March: Oy, what a horrible mess. Firstly, thanks for your patience today and for the help the folks within Agronomy IT provided me. Most things should be running OK now, but there are about four datasets and services that are degraded. Those include MRMS data, mtarchive stuff, an auxiliary mtarchive data service, and a local research data service. Two network switches were shorted out and I am down a bunch of compute capacity at the moment. In general, please do reach out to me if you find something that is not working. I have deployed a boatload of band-aids over the past 11 hours, so we shall see how many of these hold up.

Update 8 AM 27 March: Recovery efforts will continue today, and there will likely be some brief outages of this website and related services as repairs continue and temporary workarounds are undone. I will itemize progress here today. Thanks again for your patience.

Update 12 PM 27 March: The MRMS service has been recovered. There was some unavoidable data loss on the 24th, but NOAA now has an AWS archive that is much more usable than mine anyway!

Update 10 PM 27 March: The MTArchive service has been almost fully recovered, and a small remaining hole should be repaired by mid-day tomorrow. It also had some unavoidable data loss during the outage on the 24th.

Update 11 PM 27 March: The SMOS dataflow has been fixed.