Another "Shut Down a Core" post

Is this even being discussed for any version of the product? This has been a huge topic of discussion and anger since version 5 came out, and zero progress has been made.

And now in V6, it seems to be getting WORSE. 

1) It seems like the changes to the repository in v6 cause the checks that run after a dirty shutdown to take longer (just a guess).

2) The unsupported PowerShell script that Dell provided for v5 does not work for v6, and Support just told me there is no script for v6 and never will be.

3) Long-running jobs. With the introduction of cloud archives, exports, etc., we are seeing massive jobs that run for days (and block other jobs such as backups, but that is another issue), which makes scheduling a restart even harder.

I am not even asking for the Core to be able to shut down cleanly during an OS shutdown (something every other application in the world seems to manage). But how is this not a single button (or two) in the GUI: "Prepare for Shutdown" and "Core has been restarted" (to un-pause jobs)?

Or, at the very least, a single supported PowerShell script to perform this basic function?

We have dozens of Cores in various geographic locations and managing simple shutdowns is such a massive problem for us.

  • Hi scashman:
    The issue with shutting down a core gracefully arises from the jobs that are running at that time. Assuming that there are no stuck or non-cancellable jobs, a "template" for shutting down the core gracefully via PowerShell would look like this:

    # disable the core service so it does not start automatically after the reboot
    set-service RapidRecoveryCore -StartupType Disabled
    # suspend snapshots
    suspend-snapshot -all
    # suspend incoming replication, if present
    suspend-replication -incoming all
    # cancel all active jobs, retrying every 30 seconds (up to 10 times here; adjust as needed)
    # continue only once no active jobs are reported and at least 30 seconds have passed
    for ($i = 0; $i -lt 10; $i++) { stop-activejobs -all; start-sleep -seconds 30 }
    # stop the core service
    stop-service RapidRecoveryCore
    # once the core service has stopped, find out which other services are still running
    get-service Dell*,Rapid*
    # stop those services either separately or as a block
    stop-service Rapid*,Dell*

    When you restart the core, you need to use sc.exe to re-enable the core service, because Set-Service in Windows PowerShell 5.1 and earlier cannot set a service to delayed-start mode.
    # re-enable the core service in delayed-start mode (note the space after "start=")
    sc.exe config RapidRecoveryCore start= delayed-auto
    # start the core service
    start-service RapidRecoveryCore

    Only the core service needs to be started (assuming that RapidRecoveryMongod starts together with the core).
    # wait for the repository check to finish, then resume snapshots
    resume-snapshot -all
    # resume incoming replication, if present
    resume-replication -incoming all

    Hope that this helps.
  • In reply to Tudor.Popescu:

    We all know what the PowerShell approach is. I mentioned the issues with PowerShell in my original post. So when you respond with PowerShell as the answer, it makes me worry that no one at Quest is listening to our feedback or understands what our issues are.

    - This post is from 2014 ... and nothing has changed.

    en.community.dell.com/.../

    - PowerShell is unsupported. So your suggested method for powering down your product is to use unsupported commands and scripting.

    - You want every single customer to build their own script to power down your product.

    - When a change is made that breaks our script, you want every one of us individually to spend our time to fix the script that powers down your product.

    - Just look at the amount of work above. Name one other product that takes this much individual work, all on the customer's shoulders to build and manage, simply to power down.

    - Look at the debacle with Windows 2012 and the "hidden" Windows key needed to shut down the OS. The process to shut down Windows was a bit harder, but nowhere near this, and yet MS changed it in the next release, R2, after the feedback.

    - "The issue with shutting down a core graciously arouse from the jobs that are running at that time"

    I know what the issue is, but why is this our problem? Why can't you guys write some code that handles shutting down your product: pause jobs, wait until the jobs are done, and then send a message to the terminal that the Core is ready?

    Lastly, I would like to say that the fact we are still talking about basic things like a supported method to reliably and easily power down your product takes away time and energy that we could use to talk about real issues with the product.
  • In reply to Emte:

    Hi scashman:
    Don't shoot the pianist (support engineer). :)
  • In reply to Tudor.Popescu:

    Never. I feel your pain. But this issue is so obvious and has been going on for so long. I would love to see input from someone who has the power to address it, rather than everyone just brushing it aside and saying "PowerShell".

    #Gina? (does this work here?)

    Who else should I tag?

    When I talk to most new admins, they typically don't even know that the product needs some type of external power-down, they assume it acts like every other application in the world and works with an OS power down.

    I wonder how much data loss (repo corruption) occurs just because of this simple issue.
  • In reply to Emte:

    These are all very good points.

    I can tell you that we have evaluated each of the concerns you listed above, and we will be addressing the majority of them in the next major release of Rapid Recovery.

    Also, we will be implementing a customer facing ideation portal in the next week or so. This will allow you to submit ideas directly for consideration, as well as vote on ideas for prioritization.

    We have many additional exciting new changes coming to Rapid Recovery this year. Stay tuned.
  • In reply to Roger.Layton:

    Thanks Roger

    Could you give more details about how this is being fixed? This has been an issue for a long time, and I (and several other long-time product users) have been very vocal critics of it, so we would love to give feedback on the idea before it's locked in.

    The ideation portal may be a good idea, but it would need to be a two-way feedback mechanism to be useful. If it's just a one-way submission form, it's not going to be helpful.
  • In reply to Emte:

    For what it's worth.

    With AA, all we ever did was shut off the O/S, and it was never an issue. With a new install of RR I had some concerns, so I killed the core service (services.msc), restarted the O/S, restarted the core, and had no problems.

    To be fair we don't have round the clock snapshots and I waited for a pause in the action first but I wanted to pass it on for when this gets read in the future.

  • In reply to laytonj:

    One of the main issues to be aware of with doing that arises when you have increased the size of the deduplication cache on your core. During core shutdown we must stop all jobs (some jobs are not cancellable, so they must complete before the core shuts down), then flush the deduplication cache from memory to disk, and only then does the service stop completely. If, say, you configure a 20 GB deduplication cache, the core must write 20 GB of data to disk before stopping the service. On a RAID 1 array with 2 x 10K SAS drives, this generally takes between 3 and 4 minutes, so the core service will take at least that long to stop. Windows generally does not allow such a long timeout and will forcibly terminate the process before its graceful shutdown completes. This in turn can corrupt the deduplication cache, so making sure the service stops before shutting down is important.
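    The flush time above is essentially cache size divided by sustained write throughput. A minimal PowerShell sketch of that arithmetic, assuming the flush is limited only by sequential disk writes (Get-DedupFlushEstimate is a hypothetical helper, not a Rapid Recovery cmdlet, and the 100 MB/s rate in the example is an assumed figure, not a measurement):

```powershell
# Hypothetical helper, not a Rapid Recovery cmdlet: estimates how long flushing
# the deduplication cache to disk takes, assuming the flush is limited only by
# sustained sequential write throughput (real shutdowns may be slower).
function Get-DedupFlushEstimate {
    param(
        [Parameter(Mandatory)] [double] $CacheSizeGB,  # configured dedup cache size
        [Parameter(Mandatory)] [double] $WriteMBps     # sustained write rate, MB/s
    )
    if ($WriteMBps -le 0) { throw 'WriteMBps must be greater than zero.' }
    $cacheMB = $CacheSizeGB * 1024                     # 1 GB = 1024 MB here
    [timespan]::FromSeconds($cacheMB / $WriteMBps)
}

# Example: a 20 GB cache at an assumed 100 MB/s sustained write rate
(Get-DedupFlushEstimate -CacheSizeGB 20 -WriteMBps 100).TotalMinutes   # about 3.4
```

    Working backwards from the figures in this post, writing 20 GB in 3 to 4 minutes implies roughly 85 to 114 MB/s of sustained writes, so a larger cache or slower disks stretch the stop time proportionally.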

    Another potential pitfall arises when other jobs that modify data in the repository are running and cannot be cancelled when you initiate a shutdown. The core service continues those jobs even as Windows tells it to stop. So when Windows finally reaches its timeout and forcibly terminates the core, corruption could well be introduced into the repository, since the core will not know which write was last committed to disk.

    These are two of the most common reasons the core does not stop gracefully during a Windows shutdown in Rapid Recovery and I'm sure are just some of the underlying concerns that generated scashman's post here.
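    Since the stop can legitimately take minutes, one option is to request the stop without blocking and then poll until the service actually reports Stopped before shutting down the OS. A minimal sketch, assuming the RapidRecoveryCore service name used earlier in this thread; Wait-Until is a hypothetical helper, not part of Rapid Recovery, and the 15-minute default timeout is an arbitrary example value:

```powershell
# Hypothetical helper, not part of the Rapid Recovery module: polls a condition
# until it returns $true or the timeout expires. Returns $true on success,
# $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 900,
        [int] $PollSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $PollSeconds
    }
    # final check so a transition right at the deadline is not missed
    return [bool](& $Condition)
}

# Usage sketch (commented out so the helper can be loaded on any machine):
# Stop-Service -Name RapidRecoveryCore -NoWait
# $stopped = Wait-Until -Condition { (Get-Service RapidRecoveryCore).Status -eq 'Stopped' }
# if ($stopped) { Write-Host 'Core service stopped; the OS can be shut down.' } else { Write-Warning 'Core service is still stopping; do not shut down the OS yet.' }
```

    Polling with a timeout, rather than relying on the Windows shutdown sequence, leaves the decision to power off with the admin, who can wait longer if the deduplication cache flush or a non-cancellable job is still running.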
  • In reply to Emte:

    While some of the specifics are still being worked out, the high level overview involves pausing current processes, cleanly stopping the repository, and shutting down the Rapid Recovery services. The server can then be restarted while maintaining repository integrity.

    With regard to the ideation portal, comments will be allowed per feature. This allows the two-way feedback you asked for, and provides the mechanism to further define each feature before implementation.
  • In reply to Roger.Layton:

    Thank you for the info.

    Does "pausing current processes" mean a Core will be able to pause all types of jobs and resume them without issue after a restart? Several types of jobs run for a very long time and currently cannot survive a restart, so this would be a great feature.

    Is this planned to be a multi-step process or a single "Stop Core" (and Resume Core) function as far as the user is concerned?
  • In reply to Emte:

    The goal is to create a single "stop core" feature for the user.

    As you can imagine, there are many dependencies between tasks in Rapid Recovery. Initially, running tasks will be stopped (cancelled), but some of the new technologies we're building should allow certain tasks to be paused and resumed in later product releases.