ProjectLocker Emergency Maintenance - August 5, 2015 - pl3.projectlocker.com

Published 05 August 2015 by C. G. Brown

4:52 ET

The server appears to be back online. Customers should again have access.

4:08 ET

The RAID arrays are 93% rebuilt.

3:21 ET

The RAID arrays are 67% rebuilt.

2:21 ET

The faulty drives have been replaced and the RAID arrays are 45% rebuilt.

12:32 ET

The data center has reported two drives with warnings, which caused the array to degrade. We have asked them to replace the drive that is in worse shape, rebuild the array, and get customers back online as soon as possible. We will then take an extra backup of the server and schedule an outage to replace the second drive during off-hours, when customer impact will be lower.

11:22 ET

Topics: Maintenance

ProjectLocker Maintenance - July 4 - pl1.projectlocker.com

Published 03 July 2015 by C. G. Brown

On Saturday, July 4, from 1 AM CDT (6 AM GMT) to 10 AM CDT (3 PM GMT), server pl1 will be undergoing maintenance to replace a hard drive and rebuild the RAID array. Access to Subversion, Git, and Trac services will be suspended on that server during this time. Users with a URL that ends in '1' will be affected (free1, venture1, etc.).

Topics: Maintenance

Post-Mortem: Emergency Maintenance on May 21-22

Published 02 June 2015 by C. G. Brown

All times are US Eastern Daylight Time unless otherwise indicated.

What Happened?

At about 5 PM on May 21, we received a call from a customer reporting that server pl2 was unavailable. We logged in to the server and found that any attempt to run a command produced either a read-only access complaint or a bus error. We rebooted the server, which restored access, and informed the customer.

Our initial research suggested that the error might signal an imminent failure of the RAID array. We logged the issue and began preparing for a maintenance window over the weekend.

At about 8 PM that night, the error recurred and we started to receive tickets indicating that the issue had resurfaced. By 10 PM, our team had met and developed an emergency maintenance action plan. We informed our customers via the ProjectLocker Blog and Twitter that there would be an emergency maintenance window on the server and asked our data service provider, IBM SoftLayer, to run an immediate diagnostic.

The diagnostic indicated that a drive had failed and needed immediate replacement. The team replaced the drive and began the rebuild. By 1:47 AM on May 22 the drive had been replaced and the rebuild was in progress.

Because of the way the standard IBM configuration works, the server could not be run in a fully usable state while the RAID array rebuild was in progress. We were told the rebuild would take about 12 hours.

We checked in at about 11 AM to get an update. Because of a miscommunication with the data center team, we understood them to say that the rebuild would have to be interrupted for anyone to check its status, and that they would receive a notification when it was complete.

We maintained communications with the team throughout, and at about 4:30 PM, the miscommunication was cleared up when we asked again about when the rebuild would be complete. They rebooted the server and successfully restarted all services. The server was back online by 5:34 PM.

What Went Well?

Our team quickly isolated the issue and was able to start the rebuild during off-peak hours. IBM was communicative and responsive throughout the entire process. We kept customers informed through public channels (the ProjectLocker Blog and Twitter). No other servers were affected. We have not had an issue like this since April 2011.

What Could Have Been Done Better?

The initial response time was appropriate. However, our in-house team should have understood the disaster recovery process in more detail and relied less on the IBM team for guidance. The majority of the downtime for the rebuild was

Topics: Disaster Recovery, Maintenance

ProjectLocker Emergency Maintenance May 21-22 - pl2.projectlocker.com

Published 21 May 2015 by C. G. Brown

05:23 PM ET

At this time, service appears to be restored in full. We are running checks to ensure the server is up, but you may begin commits and other normal activities.

05:10 PM ET

We are shutting down the OS to check the status of the rebuild from the BIOS. We will post an update as soon as we have one.

03:36 PM ET

The rebuild continues. Our particular RAID configuration does not allow the server to stay online in a degraded state during a rebuild, so the outage continues in the meantime. We're continuing to monitor.

07:45 AM ET

The RAID array is continuing to rebuild. Our hardware team estimates that the rebuild will take over 12 hours. We're continuing to monitor progress and will keep you updated.

01:08 AM ET

We've identified an issue with one of the drives. We'll need to replace the drive and rebuild the RAID array, which we are doing now. The server will be offline for a few hours while this takes place, but there should ultimately be no data loss. We apologize in advance to our international customers.

11:30 PM ET

We are currently performing emergency maintenance on server pl2. You may experience intermittent access issues with that server during this window. Users with a URL that ends in '2' will be affected (free2, venture2, etc.).

Users on other servers will not be affected, and Portal will remain accessible, though we recommend that affected users not make changes to users or projects during the window.

You can email us at support [-at-] with any questions.

Topics: Maintenance

ProjectLocker Emergency Maintenance April 16 - pl2.projectlocker.com

Published 15 April 2015 by C. G. Brown

On Thursday, April 16, from 1 AM CDT (6 AM GMT) to 4 AM CDT (9 AM GMT), server pl2 will be undergoing emergency maintenance. Access to Subversion, Git, and Trac services will be suspended on that server during this time. Users with a URL that ends in '2' will be affected (free2, venture2, etc.).

Topics: Maintenance