0.6.0 Forking Bug Post-mortem

Discussion in 'Announcements' started by jy-p, Nov 27, 2016.

  1. jy-p

    A forking bug in version 0.6.0 was triggered on the afternoon of Friday, November 25th, 2016, at block 84120, which was mined at approximately 14:44 CST. This bug only affected nodes running version 0.6.0, for reasons explained below. While, technically speaking, this was indeed a forking bug in version 0.6.0, only 1 or 2 blocks were mined on the forked chain, so there were never 2 chains of comparable length running in parallel. Rather, the 0.6.0 nodes effectively stopped because their chain had stalled. Roughly 7 hours later, at block 84157, a 0.6.1 patch release was pushed that fixed this issue. We also pulled the 0.6.0 release from GitHub to prevent anyone from inadvertently downloading it in the future. Running any version other than 0.6.0 will get your node past block 84120, but you may need to restart dcrd a couple of times to get the chain to sync.

    This bug involved a number of issues that worked in concert to jam the chain for 0.6.0 nodes. Those familiar with software development are welcome to have a look at the commit that fixes the problem.

    Since most users are not developers, this commit requires some explanation. There are a few key points to take away from this commit:
    • The dcrd code from the initial February 8th release contained a bug where the function isMajorityVersion was called using an incorrect variable. When iterating to determine whether a new majority version had been triggered, the iteration only covered the last CurrentBlockVersion blocks instead of the last BlockRejectNumRequired blocks. Since CurrentBlockVersion was set to 2 for the 0.6.0 release, just 2 consecutive version 2 blocks were enough to trigger the requirement that all blocks must be version 2 from that point forward. The expected value for BlockRejectNumRequired is 950 on mainnet, so this led to premature rejection of version 1 blocks. The commit fixes the incorrect parameters.
    • It is clear that the majority of the PoW miners and/or the PoS miners were not running version 0.6.0, which led to the 0.6.0 nodes jamming. The blocks that followed 84118-84120 were v1 instead of v2, which indicates that either (A) the majority of PoW was running v1 nodes and the majority of PoS was running v2 nodes or (B) the majority of PoW was running v2 nodes and the majority of PoS was running v1 nodes. Once the majority of PoW and PoS miners were running non-0.6.0 nodes, the chain started running smoothly again.
    • CurrentBlockVersion really has no business being a parameter for the whole chain since it will vary as the chain advances. CurrentBlockVersion should not be included in the consensus rules. The commit removes this variable from the list of chain-wide parameters.
    • CurrentBlockVersion should not have been changed to 2 for version 0.6.0, and the change should have been tested on testnet and/or simnet before being pushed to mainnet, to verify it had no adverse effects. We were already planning to do this more thorough testing of the upcoming soft fork in 0.7.0 here, here and here.
    • Additional full-block tests need to be added to dcrd to ensure version changes do not fork or otherwise halt the chain. These tests were missed because we did not expect version enforcement to occur until soft fork enforcement is added in the upcoming 0.7.0 release.
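    For readers who want to see the effect of the wrong variable described in the first point above, here is a rough Go sketch. It is not the actual dcrd code: lastNAtLeast is a hypothetical, simplified stand-in for isMajorityVersion that uses a single count n as both the number of blocks examined and the number required, and exampleChain and the two lowercase constants are made up for illustration, with values taken from the description above. It only shows why a count of 2 trips the check after two consecutive version 2 blocks while a count of 950 does not.

```go
package main

import "fmt"

// Hypothetical stand-ins for the chain parameters named in the post.
const (
	currentBlockVersion    = 2   // value set for the 0.6.0 release
	blockRejectNumRequired = 950 // expected mainnet value
)

// lastNAtLeast is a simplified stand-in for isMajorityVersion (not the
// real dcrd function): it reports whether the most recent n block
// versions are all >= minVer.
func lastNAtLeast(versions []int32, minVer int32, n int) bool {
	if n <= 0 || len(versions) < n {
		return false
	}
	for _, v := range versions[len(versions)-n:] {
		if v < minVer {
			return false
		}
	}
	return true
}

// exampleChain builds a list of block versions of length total, all
// version 1 except the last numV2 entries, which are version 2.
func exampleChain(total, numV2 int) []int32 {
	versions := make([]int32, total)
	for i := range versions {
		versions[i] = 1
		if i >= total-numV2 {
			versions[i] = 2
		}
	}
	return versions
}

func main() {
	chain := exampleChain(1000, 2)

	// Wrong count: only the last 2 blocks are examined, so version 2
	// enforcement triggers immediately.
	fmt.Println("buggy:", lastNAtLeast(chain, 2, currentBlockVersion))

	// Intended count: 950 blocks are examined, so two version 2 blocks
	// are nowhere near enough.
	fmt.Println("fixed:", lastNAtLeast(chain, 2, blockRejectNumRequired))
}
```

    With the wrong count the check reports true, so any following version 1 block would be rejected; with the intended count it reports false. A full-block test along the lines of the last point could assert exactly this difference.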
    Creating an easy-to-access fix for this bug required cutting an emergency patch release, version 0.6.1, on a Friday evening, and this took us a few hours to complete. We take this forking bug as a reminder that both test coverage and behavioral testing are key to ensuring the stability of the Decred chain, especially now that we are blazing a new trail towards upgrading consensus code on an ongoing basis.

    Feel free to ask any further questions you have on this topic in this thread.
     
  2. Dyrk

    Thanks everyone for great work and hot-fixing. I had fun at night updating many of stakepool / dcrstats / evolution / private nodes :) Thank you @davecgh for being in touch all this time in IRC.
     
  3. Alexoz

    Good job guys!
     
