Ideas needed - Ed's journal
sobrique
Ideas needed
Right. I'm trying to diagnose a problem with our backup server.
I have a log file that tells me: start time, finish time, 'quantity' of data backed up and number of files, along with things like filesystem and backup 'level' (incremental or full).

The problem is, I _know_ it's running 'a bit slow'. And I need to figure out how best to represent this information, such that it's 'obvious' if there's a particular culprit.

My suspicion is that we're just 'overloaded' and need more tape drives, but they're not cheap, and so if we go that route, it'd better fix the problem...

Anyone got thoughts on the subject?
Comments
xarrion From: xarrion Date: September 14th, 2004 06:03 am (UTC) (Link)
With that info, I'd probably go back a while (provided you keep historical logs like the one you've described) and calculate the average backup speed (perhaps one average for quantity, another for files). Then dump it all into a spreadsheet, create a graph and see if there's a certain day when the backup time 'spikes'. Or do it by eye, but it'd be harder to spot, I'd imagine.
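The per-job averages suggested above could be sketched roughly like this in Python. The record layout, host names and figures are all made up for illustration, since the post doesn't show the real log format. Given the point about backup levels, it would probably be wiser to run it separately for incrementals and fulls.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (filesystem, level, start, finish, megabytes, files).
# The real log layout is not shown in the post, so this shape is an assumption.
JOBS = [
    ("hostA:/export", "full", "2004-09-13 18:00", "2004-09-13 19:00", 36000, 120000),
    ("hostB:/var", "full", "2004-09-13 18:10", "2004-09-13 18:40", 9000, 90000),
    ("hostC:/home", "full", "2004-09-13 18:30", "2004-09-13 20:30", 7200, 500000),
]

FMT = "%Y-%m-%d %H:%M"


def rates(job):
    """Return (MB per second, files per second) for one job."""
    _fs, _level, start, finish, mbytes, files = job
    elapsed = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    seconds = elapsed.total_seconds()
    return mbytes / seconds, files / seconds


def slow_jobs(jobs, factor=0.5):
    """Filesystems whose MB/s is below `factor` times the median MB/s."""
    mb_rates = [rates(job)[0] for job in jobs]
    cutoff = factor * median(mb_rates)
    return [job[0] for job, rate in zip(jobs, mb_rates) if rate < cutoff]


if __name__ == "__main__":
    for job in JOBS:
        mb_s, files_s = rates(job)
        print(f"{job[0]:15s} {mb_s:6.2f} MB/s {files_s:8.2f} files/s")
    print("slow:", slow_jobs(JOBS))
```

The 0.5 cutoff is arbitrary; the median is used rather than the mean so that one very fast or very slow job doesn't drag the baseline around.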
sobrique From: sobrique Date: September 14th, 2004 06:10 am (UTC) (Link)
Ah, I can _do_ a throughput graph. But the problem I've got is that 4 tape drives and 'backup sets' mean I get multiple concurrent backups, which makes it hard to track which one is slow, especially if one is incremental and the other 'full'.
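One way to get a handle on the concurrency is to count how many jobs are running at once and compare the peak against the number of drives. A minimal sketch, assuming each job has been reduced to a (start, finish) pair in minutes since midnight; the intervals below are invented:

```python
def peak_concurrency(intervals):
    """Sweep over start/finish events; return the most jobs running at once."""
    events = []
    for start, finish in intervals:
        events.append((start, 1))    # a job starts
        events.append((finish, -1))  # a job finishes
    # Tuples sort by time first, then by delta, so at equal times a finish (-1)
    # is processed before a start (+1) and back-to-back jobs don't count twice.
    events.sort()
    running = peak = 0
    for _time, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def overlapping(target, intervals):
    """Count the other intervals that overlap `target` (a (start, finish) pair)."""
    start, finish = target
    return sum(
        1
        for other in intervals
        if other != target and other[0] < finish and start < other[1]
    )


# Hypothetical jobs, in minutes since midnight (1080 = 18:00).
INTERVALS = [(1080, 1140), (1090, 1120), (1100, 1200), (1140, 1160)]

if __name__ == "__main__":
    print("peak:", peak_concurrency(INTERVALS))
    for job in INTERVALS:
        print(job, "overlaps", overlapping(job, INTERVALS), "other jobs")
```

If the peak regularly sits at the drive count (4 here), that would at least be consistent with the 'overloaded' theory, though it doesn't prove it.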
From: erisreg Date: September 14th, 2004 06:36 am (UTC) (Link)

got thoughts

Are you keeping just event logs, or are you doing full logging with PID and timing? With the full info you can track the bottlenecks that are happening and pinpoint the cause of those bottlenecks.
sobrique From: sobrique Date: September 14th, 2004 07:15 am (UTC) (Link)

Re: got thoughts

At the moment we're keeping 'backup logs', e.g. job schedule, start time, throughput, ufsdump level etc.

At the moment, I'm aiming for a staggered chart looking something like this:
18:00-------19:00-----20:00
---                           HOSTNAME:/FS
 --                           HOSTNAME:/FS
 --                           HOSTNAME:/FS
   ----                       HOSTNAME:/FS  
   --                         HOSTNAME:/FS 
   --------                   HOSTNAME:/FS
     --- 
    
Throughput
18:00-------19:00-----20:00
  |    | 
 |||   ||
 ||||  || 
|||||  ||

OK, crappy formatting, I know ;p And it'll probably not turn out right when I submit this. The idea is that I can try to correlate 'slow' jobs with throughput troughs.
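A staggered chart like the one above could be generated straight from the log rather than drawn by hand. This is only a sketch: the job tuples (label, start, finish, in minutes since midnight) are hypothetical, and the real log would need parsing into that shape first.

```python
def gantt(jobs, t0, t1, width=40):
    """Render one text row per job across the window t0..t1 (minutes).

    Each column covers (t1 - t0) / width minutes; integer arithmetic keeps
    the bar edges stable.
    """
    span = t1 - t0
    rows = []
    for label, start, finish in jobs:
        left = (start - t0) * width // span
        right = max(left + 1, (finish - t0) * width // span)  # at least one '-'
        bar = " " * left + "-" * (right - left)
        rows.append(bar.ljust(width) + "  " + label)
    return "\n".join(rows)


# Hypothetical jobs: (label, start, finish), with 1080 = 18:00 and 1200 = 20:00.
CHART_JOBS = [
    ("hostA:/export", 1080, 1140),
    ("hostB:/var", 1110, 1125),
    ("hostC:/home", 1120, 1200),
]

if __name__ == "__main__":
    print(gantt(CHART_JOBS, 1080, 1200))
```

Printing the rows sorted by start time, with the throughput figures on the same time axis underneath, should make it easier to line up long bars against the troughs.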

Full logging is about plan D at the moment, because the amount of rubbish something like a truss will grab is going to be horrible. We're talking maybe a Terabyte a night here, so...