Railway - Internal network failure – Incident details

Internal network failure

Resolved
Operational
Started about 2 months ago. Lasted about 3 hours.

Affected

Dashboard

Operational from 1:35 PM to 1:52 PM, Degraded performance from 1:52 PM to 2:03 PM, Operational from 2:03 PM to 4:27 PM

Builds

Major outage from 1:35 PM to 2:03 PM, Degraded performance from 2:03 PM to 2:08 PM, Operational from 2:08 PM to 2:19 PM, Major outage from 2:19 PM to 4:27 PM

Builders

Major outage from 1:35 PM to 2:03 PM, Degraded performance from 2:03 PM to 2:08 PM, Operational from 2:08 PM to 2:19 PM, Major outage from 2:19 PM to 4:27 PM

Image Registry

Major outage from 1:35 PM to 2:03 PM, Degraded performance from 2:03 PM to 2:08 PM, Operational from 2:08 PM to 2:19 PM, Major outage from 2:19 PM to 4:27 PM

Third-Party: npm Registry

Third-Party: GitHub Integrations

Updates
  • Resolved

    We are marking this incident as resolved while we continue to investigate the root cause.

    The recent incidents involving network update propagation were related to internal connectivity issues between our API servers and the backing database. We upgraded our connection pooler to pick up patches intended to address this, but the upgrade did not have the desired effect.

    We've reverted this change, scaled up the pooling layer, and are continuing to monitor the situation.

  • Investigating
    Update

    We're investigating errors with network initialization on builds and have paused new deployments for hobby and trial users.

  • Investigating

    We're seeing failures on builds and deployments again, and are investigating the cause of the incident.

  • Monitoring
    Update

    We are still monitoring the fix. If your deployment failed or your service is encountering network errors, please redeploy it from the dashboard.

  • Monitoring
    Update

    Our fix has been deployed, and we are re-enabling builds and deploys. We will keep monitoring as new builds go out.

  • Monitoring

    We implemented a fix and are currently monitoring the result.

  • Investigating
    Update

    Some parts of the dashboard, mainly a project's architecture tab, may fail to load when accessing affected components.

  • Investigating
    Update

    We are currently investigating this incident.

  • Investigating

    We are currently investigating an issue with our internal networking. Builds are paused.