
Thread: last night's downtime and the fight against forum downs

  1. #1
    Administrator EntroX
    Join Date
    April 9, 2011
    Location
    Miami, FL
    Posts
    10,984

    last night's downtime and the fight against forum downs

    I figured I should give a more technical explanation of what happened.

    The last few crashes were caused by a mix of issues. I'll start with the actual source of it all: gzip.

    Before the Drakma drama I did 2 backups a day, one very early in the day and one very late, usually the times with the fewest users online, to lower the impact of the database compression, which uses a lot of resources (the database is about 1.8 GB uncompressed and around 250 MB compressed). This had actually caused at least one kernel panic; it's kinda funny how it happens, so I'll explain it in detail:

    - database gets exported
    - database gets compressed
    -> CPU goes to 100%
    -> MySQL starts struggling to read and write content
    -> more and more connections keep getting opened
    -> MySQL runs out of connection slots
    -> vBulletin can no longer open new connections to MySQL
    (now here's the funny thing: this wouldn't always happen, I'd say about 1 in every 20 times the machine would actually crap itself)
    -> vBulletin starts sending a "vBulletin cannot connect to the DB" email for every request, about 200-300 emails per minute
    -> MySQL gives up and hangs
    -> sendmail keeps sending emails and emails and emails
    -> the VM crashes and dies

    That was the worst-case scenario. Most of the time it would peg the CPU for a bit, shoot out 200-300 emails and that was it, no downtime at all. After the drama I decided to finally increase the backups to once per hour, something I'd always wanted to do but never actually did. Some of you might have noticed that since then the forum has been acting a bit... funny: I'd often get emails about MySQL dying every 2 or 3 hours because of the CPU pegging, but miraculously it would stay up anyway.

    Last night's downtime, however, was a bit different.

    As usual, vB started dumping the database but ran into a slight... problem: it ran out of disk space.



    The backups work as follows:

    vB dumps the DB -> the dump gets compressed -> it gets moved to a Dropbox folder -> (at my house) the backup gets moved onto my Drobo
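    vB produces the dump itself, so only the later stages are easy to reproduce outside it. A minimal sketch of the compress-and-move stages, assuming the dump is already sitting in a plain .sql file; `compress_and_ship` and the paths in the usage line are hypothetical, and running gzip under `nice -n 19` is a swapped-in choice to keep it out of MySQL's way, not what vB does.

```shell
#!/usr/bin/env bash
# compress_and_ship SQL_FILE DEST_DIR   (hypothetical helper)
# Compress an existing dump and move it into a synced folder,
# then print the path of the moved .gz file.
compress_and_ship() {
  local sql=$1 dest=$2
  # lowest CPU priority so mysqld gets scheduled first; -f overwrites an old .gz
  nice -n 19 gzip -f "$sql" || return 1   # foo.sql becomes foo.sql.gz
  mkdir -p "$dest" || return 1
  mv -f "$sql.gz" "$dest/" || return 1
  printf '%s\n' "$dest/$(basename "$sql").gz"
}
```

    Usage would be something like `compress_and_ship /path/to/forum-backup.sql /root/Dropbox/backups`; the Dropbox client then picks the file up on its own.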

    This has been working flawlessly with one slight exception: Dropbox (at least on Linux and OS X) likes to generate cache files and never delete them, because that makes oh-so-much sense, right?

    Code:
    [snip@snip ~]# du -a / | sort -n -r | head -n 10
    20183731    /
    14918776    /root
    14851700    /root/Dropbox
    14592164    /root/Dropbox/.dropbox.cache
    6776052    /root/Dropbox/.dropbox.cache/2012-06-23
    6259832    /root/Dropbox/.dropbox.cache/2012-06-24
    3559144    /var
    1779740    /var/lib
    1743440    /var/lib/mysql
    1711588    /var/lib/mysql/fhc_vbulletin
    The Dropbox cache alone was taking up roughly 14 GB (du counts in 1 KB blocks), out of about 20 GB used on the whole disk. I simply deleted it and boom, here is a graph of the current disk usage after clearing the Dropbox cache and the failed backups from last night:



    Now that I had space back, I wrote a quick bash script and added it to the crontab so it clears out the Dropbox cache every day and keeps things neat and clean.
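    A minimal sketch of such a cleanup script, assuming the cache lives at /root/Dropbox/.dropbox.cache as in the du output and that anything older than a day is safe to delete; the function name and the one-day cutoff are hypothetical choices, not necessarily what the actual script does.

```shell
#!/usr/bin/env bash
# clean_dropbox_cache [CACHE_DIR]   (hypothetical helper)
# Remove dated cache folders (e.g. 2012-06-23) and stray files older than a day.
clean_dropbox_cache() {
  local cache_dir=${1:-/root/Dropbox/.dropbox.cache}
  # nothing to do if Dropbox hasn't created the cache directory
  [ -d "$cache_dir" ] || return 0
  # -mindepth 1 keeps the cache directory itself;
  # -mtime +0 matches entries last modified at least 24 hours ago
  find "$cache_dir" -mindepth 1 -maxdepth 1 -mtime +0 -exec rm -rf -- {} +
}

clean_dropbox_cache "$@"
```

    Saved as, say, /root/bin/clean-dropbox-cache.sh (hypothetical path), it could be run daily with a crontab entry like `30 5 * * * /root/bin/clean-dropbox-cache.sh`.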

    But that's only one part of the problem: every time a backup got compressed, the whole forum would slow to a crawl or even crash. After reading up a bit I found a nifty tool called cpulimit. I simply run it in a separate screen session; it detects when gzip runs, limits its CPU usage to 30%, and then waits for the next run. I've been monitoring this and the CPU graph shows that it's working rather well now:



    This is a before-and-after graph of the CPU usage: the large peak was an uncapped gzip run, and it's all smooth afterwards, since gzip has been running capped.
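    cpulimit doesn't change gzip's priority the way nice does; it repeatedly pauses the process with SIGSTOP and resumes it with SIGCONT, so gzip only runs for part of each time window. The real invocation would be along the lines of `screen -dmS gzip-limit cpulimit -e gzip -l 30`. The functions below are a hypothetical, simplified illustration of that stop/continue idea, not cpulimit's code: they use a fixed duty cycle, whereas cpulimit adjusts its pauses based on the CPU usage it measures.

```shell
#!/usr/bin/env bash
# Convert milliseconds to a seconds string sleep accepts, e.g. 30 -> 0.030
ms_to_sec() {
  printf '%d.%03d' $(( $1 / 1000 )) $(( $1 % 1000 ))
}

# throttle PID PCT [ROUNDS] [PERIOD_MS]   (hypothetical helper)
# Let PID run for PCT% of each PERIOD_MS window and keep it stopped
# for the rest, repeating for ROUNDS windows.
throttle() {
  local pid=$1 pct=$2 rounds=${3:-50} period_ms=${4:-100}
  local run_ms=$(( period_ms * pct / 100 ))
  local stop_ms=$(( period_ms - run_ms ))
  local i=0
  while [ "$i" -lt "$rounds" ]; do
    kill -CONT "$pid" 2>/dev/null || return 0   # target exited: nothing to limit
    sleep "$(ms_to_sec "$run_ms")"              # let it run
    kill -STOP "$pid" 2>/dev/null || return 0
    sleep "$(ms_to_sec "$stop_ms")"             # keep it paused
    i=$(( i + 1 ))
  done
  kill -CONT "$pid" 2>/dev/null || true         # never leave the target stopped
  return 0
}
```

    Because the process is literally frozen for the rest of each window, this caps CPU use even when the machine is otherwise idle, which a plain nice value does not do.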


    So great, I fixed both of the problems that caused last night's crash. But instead of leaving it there, I finally dusted off my good old Nagios setup: now, every time any FHC-hosted service goes down or starts having performance issues, I'll immediately get an email and an SMS so I can fix it. Of course you can still reach me on IRC/MSN if needed, but this should help quite a bit.
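    Nagios decides a service's state from a plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus its first line of output. Since a full disk is what started last night's crash, here is a minimal disk-space check in that style; `check_disk_pct` and the 80%/90% default thresholds are hypothetical, and the standard Nagios plugins already ship a `check_disk` that does this properly.

```shell
#!/usr/bin/env bash
# check_disk_pct [PATH] [WARN_PCT] [CRIT_PCT]   (hypothetical helper)
# Prints one Nagios status line and returns the matching plugin exit code.
check_disk_pct() {
  local path=${1:-/} warn=${2:-80} crit=${3:-90} used
  # df -P: POSIX output, one line per filesystem; field 5 is "Use%"
  used=$(df -P "$path" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  case $used in
    ''|*[!0-9]*)
      echo "DISK UNKNOWN - cannot read usage for $path"
      return 3 ;;
  esac
  if [ "$used" -ge "$crit" ]; then
    echo "DISK CRITICAL - $path is ${used}% full"
    return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "DISK WARNING - $path is ${used}% full"
    return 1
  fi
  echo "DISK OK - $path is ${used}% full"
  return 0
}
```

    As a standalone plugin, the script would end with `check_disk_pct "$@"; exit $?` and be registered as a Nagios command, so the email/SMS alert fires well before the disk actually fills up.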



    I figured you guys deserved an explanation, so that's why I threw this out there. Of course I'd greatly appreciate any feedback on how I can improve things overall and whatnot.
    Last edited by EntroX; June 25 2012 at 10:17:44 PM.
    need me?, contact me on Discord (entrox#0001) or via e-mail (entrox@failheap-challenge.com).

  2. #2
    Helen
    Join Date
    April 9, 2011
    Posts
    4,188
    shutupandstartchargingusmonehs

  3. #3
    XenosisReaper
    Guest
    nice background, also nice post

  4. #4
    Paradox
    Join Date
    December 24, 2011
    Location
    Deepest Darkest Devonshire
    Posts
    8,042
    EntroX best troX. Nice work.

  5. #5
    Movember 2012 Donor ctrlchris
    Join Date
    April 10, 2011
    Location
    Fuckin roos m8
    Posts
    10,747
    I love you trox
    <3

    Tapatalk etc


    Your posting is medium, its not rare and its not well done
    - Krans 26/7/12

  6. #6
    Donor Mike deVoid
    Join Date
    April 9, 2011
    Location
    Nottingham, UK
    Posts
    6,900
    Now you've got Ups

  7. #7
    Donor Sponk
    Join Date
    April 10, 2011
    Location
    AU TZ
    Posts
    11,502
    Why the hell didn't you just renice the gzip script?
    Contract stuff to Seraphina Amaranth.

    "You give me the awful impression - I hate to have to say - of someone who hasn't read any of the arguments against your position. Ever."


  8. #8
    Administrator EntroX
    Join Date
    April 9, 2011
    Location
    Miami, FL
    Posts
    10,984
    Quote Originally Posted by Sponk
    Why the hell didn't you just renice the gzip script?
    Because the backups are done via vBulletin's built-in backup function. If I tweaked it to do that, I'd have to remember to modify it again after every vB update; using something external eliminates that possibility.
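    For comparison, renicing also doesn't strictly require patching vB: a separate process can lower gzip's priority after it starts. A minimal sketch, assuming Linux (it scans /proc) and the util-linux `renice`/`ionice` tools; `deprioritize` is a hypothetical name. The trade-off is that nice only changes scheduling priority, so on an otherwise idle CPU gzip can still reach 100%, while cpulimit enforces a hard cap.

```shell
#!/usr/bin/env bash
# deprioritize [NAME]   (hypothetical helper)
# Drop every running process whose command name is NAME (default: gzip)
# to the lowest CPU priority and, where ionice exists, the idle I/O class.
deprioritize() {
  local name=${1:-gzip} dir pid
  for dir in /proc/[0-9]*; do
    # /proc/PID/comm holds the command name; processes may vanish mid-scan
    [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
    pid=${dir#/proc/}
    # absolute niceness 19; skip processes we are not allowed to touch
    renice 19 -p "$pid" >/dev/null 2>&1 || continue
    if command -v ionice >/dev/null 2>&1; then
      ionice -c 3 -p "$pid" 2>/dev/null || true
    fi
  done
  return 0
}
```

    It would need to run periodically, e.g. `while sleep 5; do deprioritize gzip; done` in its own screen session, since each backup starts a fresh gzip process.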

  9. #9
    Varcaus
    Join Date
    May 15, 2011
    Posts
    20,880
    I came hoping to understand what happened. Nope

  10. #10
    Administrator Movember 2012 Don Pellegrino
    Join Date
    April 9, 2011
    Location
    Montreal, Canada
    Posts
    3,361
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2

  11. #11
    XenosisReaper
    Guest
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    I'm still waiting for the financial report showing me where all my donation money is going.

  12. #12
    Movember 2012 Donor ctrlchris
    Join Date
    April 10, 2011
    Location
    Fuckin roos m8
    Posts
    10,747
    Quote Originally Posted by XenosisReaper
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    I'm still waiting for the financial report showing me where all my donation money is going.
    Alcohol



  13. #13
    Administrator Movember 2012 Don Pellegrino
    Join Date
    April 9, 2011
    Location
    Montreal, Canada
    Posts
    3,361
    Quote Originally Posted by ctrlchris
    Quote Originally Posted by XenosisReaper
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    I'm still waiting for the financial report showing me where all my donation money is going.
    Alcohol
    Well SOMEONE has to pay for entrox's rum and tequila


  14. #14
    XenosisReaper
    Guest
    Quote Originally Posted by ctrlchris
    Quote Originally Posted by XenosisReaper
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    I'm still waiting for the financial report showing me where all my donation money is going.
    Alcohol
    So, basically, I'm not donating money I could have spent on alcohol so somebody else can not buy alcohol and fix FHC?

    I'll keep not donating then I guess.

  15. #15

    Join Date
    September 13, 2011
    Location
    Norway
    Posts
    870
    you need "new" server hardware?

  16. #16
    filingo
    Join Date
    April 9, 2011
    Location
    in space
    Posts
    5,023
    your phone has pantsu on it
    No longer Deleting all your posts erryday due to butthurt
    usually pink or pinkest flamingo in other games


    FREE NYAN CAT
    JUSTICE FOR AMANTU
    http://eve.enviroweb.org/

  17. #17
    walrus
    Join Date
    April 9, 2011
    Location
    Fancomicidolkostümierungsspielgruppenzusammenkunft
    Posts
    6,398
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    Should we post it on eve-o?
      Spoiler:
    Quote Originally Posted by RazoR
    But islamism IS a product of class warfare. Rich white countries come into developing brown dictatorships, wreck the leadership, infrastructure and economy and then act all surprised that religious fanaticism is on the rise.
    Also:
    Quote Originally Posted by Tellenta
    walrus isnt a bad poster.
    Quote Originally Posted by cullnean
    also i like walrus.
    Quote Originally Posted by AmaNutin
    Yer a hoot

  18. #18
    Donor halbarad
    Join Date
    April 9, 2011
    Posts
    5,007
    Quote Originally Posted by walrus
    Quote Originally Posted by Don Pellegrino
    FHC devblogs

    Sent from my GT-I9100M using Tapatalk 2
    Should we post it on eve-o?
    Leave it to Quackbot, he seems good at crossposting devblogs and I'm sure he'll add a nice sprinkling of html special chars that weren't parsed correctly.

  19. #19
    smagd
    Join Date
    April 12, 2011
    Location
    https://duckduckgo.com
    Posts
    1,323
    Good job.

    Out of curiosity, is there an advantage of using "cpulimit" over standard "nice"?
    Quote Originally Posted by dstopia
    WHERE IS CCP AND WHAT HAVE YOU DONE WITH THEM?????

  20. #20
    Movember 2012 Stoffl
    Join Date
    April 10, 2011
    Location
    The original viennese waffle
    Posts
    22,674
    Quote Originally Posted by Paradox
    EntroX best troX. Nice work.
