Wild Card Posted September 29, 2008

Hello,

I recently had a problem with an overload on the server. I was using a PHP script to automatically check file-sharing sites to see whether the links were still active. Here is the original report we received from the host:

"Please remove the following hack: automated link checker! This is causing overload on the server; that was the reason the host suspended the site. This PHP file is causing the trouble: vbbot.php"

Here is what the host said:

"Your site appears to be running a PHP script which is using large amounts of CPU and is continually running until we kill off the process. We have disabled the script /home/xxxxxxxxx/public_html/forums/vbbot.php by chmoding it to 0 to stop the issue. If you could please contact us when you believe you've solved the problem it would be much appreciated."

"The file was deleted and uploaded again as autobot.php. I have disabled the script yet again and activated your account. Activation of this script again will result in termination of your account."

We have gotten rid of the script, but I was wondering if you might have a look at it and see how we can make it usable. We do not want to get booted; we very much enjoy being a part of your organization.
Tony Posted September 29, 2008

There is no simple fix for a system like this. Let's say you have 10,000 threads, each with just 1 outside link to be checked. That turns into 10,000 outgoing requests, so checking daily means 10,000 requests via curl each day. It's pretty much a proxy, which we do not allow because of the CPU and bandwidth proxies use.

The script itself is pretty flawed. It just runs continually, and its way of reducing its load is to put itself to sleep. That does not change the fact that the process is sitting there, still using CPU time and memory. An ideal system would be one that stores the last checked thread and is run every few minutes. This way it's not sleeping; it runs and then ends. That's a complete redesign of the system in order to make it more reasonable.

It's one of those scripts where its advantages do not outweigh its disadvantages.
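The design Tony describes (store the last checked thread, run every few minutes, then end) could look roughly like the following. This is a minimal sketch in Python rather than the PHP of the original script, and the names `run_batch`, `state_path`, `check`, and the batch size of 25 are assumptions made for illustration, not part of vbbot.php.

```python
import json
from pathlib import Path


def run_batch(thread_ids, state_path, check, batch_size=25):
    """Check the next few threads after the stored cursor, then return.

    thread_ids: iterable of integer thread IDs (hypothetical input).
    state_path: file that remembers the last checked thread ID.
    check:      callable invoked once per thread ID (hypothetical hook).
    """
    state_file = Path(state_path)

    # Load the cursor left behind by the previous run, if any.
    last = 0
    if state_file.exists():
        last = json.loads(state_file.read_text())["last_checked"]

    # Only look at threads that come after the cursor.
    pending = sorted(t for t in thread_ids if t > last)
    batch = pending[:batch_size]

    for thread_id in batch:
        check(thread_id)

    # Advance the cursor; wrap back to 0 once the end of the list is reached.
    reached_end = len(pending) <= batch_size
    new_last = 0 if reached_end else batch[-1]
    state_file.write_text(json.dumps({"last_checked": new_last}))

    # No loop and no sleep: the process ends as soon as the batch is done.
    return batch
```

A scheduler such as cron would call this every few minutes. Each run does a bounded amount of work and holds no CPU or memory between runs, but the total number of outgoing requests per day is unchanged.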
fenerli Posted September 30, 2008

Wild Card: Does your site have enough traffic that a significant portion of these threads are viewed between link checks? You might be better off running it only on demand, i.e. only when a thread is viewed, coupled with what Tony suggested: a "last checked" timestamp for each link, so that it only checks once every specified interval.

Also, what method are you using to check the links? cURL?
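The on-demand idea with a per-link "last checked" timestamp might be sketched as below. It is written in Python for brevity, not taken from the original script; `status_on_view`, `cache`, and `probe` are hypothetical names, and the one-day default interval is an assumption.

```python
from datetime import datetime, timedelta


def status_on_view(url, cache, probe, now, interval=timedelta(days=1)):
    """Return whether a link is alive, probing it at most once per interval.

    cache: dict mapping url -> (checked_at, alive); stands in for a DB table.
    probe: callable that performs the real outgoing check (hypothetical).
    now:   current time, passed in so the function is easy to test.
    """
    entry = cache.get(url)

    # Recent result available: reuse it and make no outgoing request.
    if entry is not None and now - entry[0] < interval:
        return entry[1]

    # Otherwise check the link once and remember when that happened.
    alive = probe(url)
    cache[url] = (now, alive)
    return alive
```

Called from the thread view, this only spends a request on links that someone actually looks at, and never more than once per interval for the same link.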
Wild Card Posted September 30, 2008

Thank you, guys, for taking the time to have a look at this. The way the script is set up, it does timestamp the post and would not check it again until the next day. We did run it on an on-demand basis, but we could have made a cron job to check once a day.

I think the problem we faced was that it was the first run: it had so many links to check, dead ones to bin, and automatic PMs to send to the original posters that the server went nuts and overloaded. I stripped out the PMs, but it was still too server-intensive.

I would like to have an opportunity to rewrite it and maybe set up a time when we can test it, with your permission, and get your feedback on whether this script is at all plausible and server friendly.
Tony Posted October 2, 2008

There is really no way this thing is ever going to be server friendly. Every time you check whether a URL still works, it's intensive. Run it on 1,000 topics a day and that's quite a bit. This only gets worse as you add more topics.

So, just to make things clear, the intensive part is where you need to see whether the URL is still working. fopen or curl, it doesn't matter. They're all costly when you go outside onto the internet.
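For reference, the per-URL step Tony is talking about can at least be bounded, though not made free. The sketch below uses Python's standard `urllib` as a stand-in for PHP's fopen or curl; `link_is_alive` and the 5-second timeout are assumptions for illustration. It sends a HEAD request so the file body is not downloaded, and gives up after the timeout.

```python
import http.client
import urllib.request


def link_is_alive(url, timeout=5.0):
    """Return True if the URL answers a HEAD request without an error status.

    Still one outgoing connection per call; the timeout only caps how long
    a single slow or dead host can hold the process.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx), refused connections
        # and timeouts; ValueError covers malformed URLs.
        return False
```

Even with these limits, 1,000 topics a day is still 1,000 outgoing connections, which is the cost Tony is pointing at.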