Problem on *1.projectlocker.com

Published 19 February 2016 by C. G. Brown

We are aware of an issue on pl1.projectlocker.com. This affects all users whose server name ends in 1. We are researching the issue and attempting to bring the server back online. We will update this ticket as we learn more.


Sluggish Network Issue

Published 20 January 2016 by Runako Godfrey

UPDATE: 20 Jan 21:49 GMT-5

There appears to have been a routing issue in our provider's data center. There was a bad connection between two routers outside of the ProjectLocker-specific portion of the network but inside our data center, and traffic through those routers had problems. The resulting packet loss was significant enough to impede network speeds.

Thank you for your patience. At this time, we believe the issue is resolved but will continue to monitor. Please try your connections now and let us know via support@ if you still see performance problems.

UPDATE: 20 Jan 20:59 GMT-5

The network team has identified and tuned a possible culprit. We are testing to determine the status.

UPDATE: 20 Jan 19:05 GMT-5 

Diagnostics on the network have proved inconclusive. We're doing further diagnostics on the server to determine if there's any hardware issue.

UPDATE: 20 Jan 17:29 GMT-5

The network team is performing a more detailed physical investigation of the network to determine where there might be problems.

UPDATE: 20 Jan 17:00 GMT-5

The network team has performed an initial investigation, and we're pushing for additional checking on a packet loss issue we're seeing on some of the internal switches inside the data center and outside of our specific network.

We've been tracking a networking issue affecting some of our customers on some of our servers today (see the list of affected servers below). We are aware of this issue and are actively working to resolve it. 


Topics: Maintenance

ProjectLocker Emergency Maintenance - August 5, 2015 - pl3.projectlocker.com

Published 05 August 2015 by C. G. Brown

4:52 ET

The server appears to be back online. Customers should again have access.

4:08 ET

The RAID arrays are 93% rebuilt.

3:21 ET

The RAID arrays are 67% rebuilt.

2:21 ET

The faulty drives have been replaced and the RAID arrays are 45% rebuilt.

12:32 ET

The data center has reported two drives with warnings, which caused degradation of the array. We have requested that they replace the drive that is in worse shape, rebuild the array, and get customers back online ASAP. We will then take an extra backup of the server and schedule an outage to replace the second drive during off-hours, when there will be less customer impact.

11:22 ET

Continue Reading...

Topics: Maintenance

ProjectLocker Maintenance - July 4 - pl1.projectlocker.com

Published 03 July 2015 by C. G. Brown

On Saturday, July 4, from 1 AM CDT (6 AM GMT) to 10 AM CDT (3 PM GMT), server pl1 will be undergoing maintenance to replace a hard drive and rebuild the RAID array. Access to Subversion, Git, and Trac services will be suspended during this time for that server. Users with a URL that ends in '1' will be affected (free1, venture1, etc.).


Topics: Maintenance

Post-Mortem: Emergency Maintenance on May 21-22

Published 02 June 2015 by C. G. Brown

All times are US Eastern Daylight Time unless otherwise indicated.

What Happened?

At about 5 PM on May 21, we received a call from a customer complaining that access to server pl2 was unavailable. We logged on to the server and noted that any attempts to run commands resulted in either a complaint about read-only access or a bus error. We rebooted the server, which restored access, and informed the customer.

We did some initial research, which suggested that the error might indicate an imminent failure of the RAID array. We noted the issue and began preparations for a maintenance window over the weekend.

At about 8 PM that night, the error recurred and we started to receive tickets indicating that the issue had resurfaced. By 10 PM, our team had met and developed an emergency maintenance action plan. We informed our customers via the ProjectLocker Blog and Twitter that there would be an emergency maintenance window on the server and asked our data service provider, IBM SoftLayer, to schedule an immediate diagnostic.

The diagnostic indicated that the drive had failed and needed immediate replacement. By 1:47 AM on May 22, the team had replaced the drive and the rebuild was in progress.

Due to the nature of the standard IBM configurations, it was not possible to run the server in a fully usable state while the RAID array rebuild was in progress. We were told the rebuild would take about 12 hours.

We checked in at about 11 AM to try to get an update. Due to a miscommunication with the data team, we understood them to say that the process would need to be interrupted for us to determine its status, and that they would receive some sort of notification when the rebuild was complete.

We maintained communications with the team throughout, and at about 4:30 PM, the miscommunication was cleared up when we asked again about when the rebuild would be complete. They rebooted the server and successfully restarted all services. The server was back online by 5:34 PM.

What Went Well?

Our team quickly isolated the issue and was able to initiate the rebuild during off-peak hours. IBM was communicative and responsive throughout the entire process. We informed customers via a publicly accessible location. No other servers were affected. We have not had an issue like this since April 2011.

What Could Have Been Done Better?

The initial response speed was appropriate. However, our in-house team should have had a better understanding of what the disaster recovery process would look like in detail and relied less on the IBM team for guidance. The majority of the downtime for the rebuild was

Continue Reading...

Topics: Disaster Recovery, Maintenance
