
Best Practices for Recovery Points and Snapshot Frequency

So I have a shiny new DL4300, which I have connected to my VMware vCenter server, and I have taken some tentative steps setting up snapshots and schedules.

However, I don't really feel like I know what I should be setting these to. Should I be taking hourly snapshots of all my servers? Should I try to stagger the snapshots through an extensive series of offset schedules? I need a separate schedule for backing up my vCenter server, because backing it up at the same time as any other server caused major problems, with snapshots not being cleared from the VMware host correctly.
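One way to picture the staggered approach asked about here is to spread snapshot start times evenly across the snapshot interval, give the vCenter VM a slot of its own, and cap how many other servers start together. The sketch below is a hypothetical planning helper, not part of the appliance's software: the function name, the server names, the 60-minute interval and the limit of two concurrent snapshots are all assumptions to adjust to your environment, and the resulting offsets would still be entered through the appliance's own schedule settings.

```python
def staggered_offsets(servers, interval_minutes=60, max_concurrent=2, exclusive=()):
    """Assign each server a start offset (minutes) within a repeating snapshot interval.

    Hypothetical helper. Servers named in `exclusive` (e.g. the vCenter VM) get a
    time slot to themselves; all other servers are packed into slots of at most
    `max_concurrent` machines. Slots are spread evenly across the interval.
    """
    solo = [s for s in servers if s in exclusive]
    shared = [s for s in servers if s not in exclusive]

    # One slot per exclusive server, then groups of shared servers.
    slots = [[s] for s in solo]
    for i in range(0, len(shared), max_concurrent):
        slots.append(shared[i:i + max_concurrent])

    if not slots:
        return {}

    spacing = interval_minutes / len(slots)
    return {server: round(index * spacing)
            for index, group in enumerate(slots)
            for server in group}


# Hypothetical inventory; "vcenter" never shares its slot with another server.
servers = ["vcenter", "dc01", "exchange", "sql01", "file01"]
offsets = staggered_offsets(servers, interval_minutes=60, max_concurrent=2,
                            exclusive=("vcenter",))
for server, minute in sorted(offsets.items(), key=lambda kv: kv[1]):
    print(f"{server}: starts at :{minute:02d} past each hour")
```

Packing the other servers into small groups rather than giving each one its own slot keeps the number of schedules manageable while still limiting how many snapshots hit the VMware host and the repository at once.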

Feeling a bit clueless on this!

  • Hi Chris:
    You know your own data protection needs best, and that is where you should start. There are a few parameters to consider.
    The first thing to determine is how important the data you protect is. Everything comes down to establishing priorities and fitting them into the protection resources you have.
    Normally, you have a few machines holding essential data such as Exchange, core business and accounting information. It goes without saying that they need to receive most of the attention. You need to set a retention policy, a backup schedule and a backup health-check schedule that let you state confidently that your company data is safe. Without discarding the advantages of automatic checks such as mountability and attachability, I strongly believe that the best health check is to mount recovery points manually (the newest one will do) and make sure that the data they contain is accessible.
    The next thing is to set up a disaster recovery plan: if something goes really wrong, what steps do you take, and how long will it take to bring your network back, first to minimal and then to full working condition? For example, to recover Exchange in such a situation you need at least a working domain controller holding the PDC role, so the scope of the plan has to extend beyond Exchange itself. You also need to decide the order in which your environment comes back to life once the basic network services are restored, and ask the appropriate level of leadership in your organization for guidance. After reviving Active Directory, DHCP, DNS and so on, should e-mail come next, or do you have a contingency plan for communications (e.g. a public IM/e-mail provider), so that the next step should be bringing up your financial systems (AR, AP, POS)? This matters because restoring multiple machines takes time and is essentially a sequential process. At this point it also makes sense to consider virtual standbys, at least for the main network services.
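    Because restores are sequential and some services depend on others, the recovery order can be written down as a small dependency graph and resolved with a topological sort, using business priority to break ties. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the service names, dependencies and priority ranks are hypothetical examples that mirror the scenario above, including the choice to bring accounting back before e-mail.

```python
from graphlib import TopologicalSorter


def restore_order(dependencies, priority):
    """Return a sequential restore order that respects dependencies.

    dependencies: {service: [services that must be running first]}
    priority: {service: rank}; a lower rank is restored first when several are ready.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    ready_pool = []
    order = []
    while sorter.is_active():
        ready_pool.extend(sorter.get_ready())
        # Restores happen one at a time, so take the highest-priority ready service.
        nxt = min(ready_pool, key=lambda s: (priority.get(s, float("inf")), s))
        ready_pool.remove(nxt)
        order.append(nxt)
        sorter.done(nxt)
    return order


# Hypothetical environment: Exchange and accounting both need AD (PDC) and DNS.
dependencies = {
    "domain_controller": [],
    "dns": ["domain_controller"],
    "dhcp": ["domain_controller"],
    "exchange": ["domain_controller", "dns"],
    "accounting": ["domain_controller", "dns"],
}
# Hypothetical leadership decision: financial systems before e-mail.
priority = {"domain_controller": 0, "dns": 1, "dhcp": 2, "accounting": 3, "exchange": 4}

print(restore_order(dependencies, priority))
```

    Keeping dependencies and priorities separate means leadership can reorder business systems without anyone having to re-check which infrastructure each one needs.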
    The last group to consider is the servers that are important enough to protect but that, for some reason, you are not too worried about: for instance, servers with a low data change rate, servers whose data is processed relatively quickly, or even machines holding important data that loses its value quickly, so that a long retention policy is pointless.
    Last but not least, all these activities require resources. Your DL4300 is a very powerful machine, but you may still reach its limits. If your retention policies are too long and you protect a lot of data, you will run out of repository space. If you run too many concurrent operations, you will run out of physical resources, the most important being storage IOPS.
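    To get a feel for how retention length drives repository usage, a back-of-envelope estimate is the base image plus one day of changed data for every day of recovery points kept. The sketch below is a simplified, hypothetical model: the change rates, retention periods and the `reduction_ratio` standing in for compression and deduplication are assumptions, and it ignores any rollup of older recovery points, so treat the result as a rough upper bound rather than a sizing figure for the appliance.

```python
def estimate_repository_gb(base_gb, daily_change_rate, retention_days, reduction_ratio=1.0):
    """Rough repository footprint for one protected machine (simplified model).

    base_gb: size of the initial base image in GB
    daily_change_rate: fraction of base_gb that changes per day (0.02 means 2%)
    retention_days: days of incremental recovery points kept
    reduction_ratio: assumed compression/dedup factor (2.0 halves the footprint)
    """
    incremental_gb = base_gb * daily_change_rate * retention_days
    return (base_gb + incremental_gb) / reduction_ratio


# Hypothetical machines: name -> (base image GB, daily change fraction, retention days)
machines = {
    "exchange": (500, 0.05, 30),   # busy server, shorter retention
    "file01": (2000, 0.01, 90),    # low change rate, longer retention
}
total = sum(estimate_repository_gb(*params) for params in machines.values())
print(f"Estimated repository need: {total:.0f} GB before compression/deduplication")
```

    Doubling `retention_days` for a busy server roughly doubles its incremental share, which is why long retention on high-change machines is usually the first thing to exhaust repository space.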
    Balancing all these requirements is an art refined through experience, and it should not be taken lightly. After all, the final goal is to reduce the burst of adrenaline you get when disaster strikes to a mere trickle...