Site Notes
Published August 19, 2003
I'm sure many of you have noticed a bug that I created when I added pictures to Blogcritics. On the main page, near the top, on either side of the spotlight boxes, images from Amazon are randomly pulled from the day's posts so far. Normally it works reasonably well. Some items don't have pictures on Amazon, so occasionally one or both of the images on the front page are empty, but I can live with that.
The problem lies in the phrase "from the day's posts so far." The images are chosen by technical means that I'll explain at the end, to avoid annoying non-geeks, but the method does really, really weird things when nothing has been posted yet that day. So every day, between midnight and whenever someone got around to posting something, weird, ugly error messages were appearing in place of the little images. Blech. I knew it was happening, but I had only run into the errors a few times myself, so I hadn't fixed it. Until now. I've put in some error handling, even for errors that should theoretically never, ever happen. Let's hope I'm right.
I apologize to everyone who has ever seen those nasty, ugly errors. It is entirely my fault, and I'm sorry. They're fixed now!
Now the geek stuff. Seriously, this is about to get ultra-geeky, with PHP code and Unix weirdness, so read on at your own risk.
Here is the PHP code as it was before:
001: <?php list($year,$month,$day)=split("/",date("Y/m/d"));
002: $phys_path='/home2/eolsen/public_html/archives/';
003: $url_path='http://blogcritics.org/archives/';
004: $date_path="$year/$month/$day/";
005: $directory=opendir("$phys_path$date_path");
006: while($filename=readdir($directory)) {$filepart=explode(".",$filename);
007: if($filepart[1]=="php") {$valid_filename[]=$filename;}}
008: foreach($valid_filename as $basename) {$lines=file("$phys_path$date_path$basename");
009: foreach($lines as $line) {if(preg_match("/\<img src=\"http\:\/\/images.amazon.com\/images\/.*.jpg\"/",$line,$matches)) {$valid_link[]="<a href=\"$url_path$date_path$basename\">$matches[0]/>";}}}
010: $randlink=rand(0,count($valid_link)-1);
011: echo $valid_link[$randlink]; ?>
This code snippet is INCLUDEd in the front page twice. It uses the current date (line 1) to construct both a directory path and a URL (lines 2-4), opens the directory (line 5), and looks into each PHP file found there (lines 6-8) for lines that look like they contain Amazon images (line 9). Then it picks one at random and outputs a "pretty" version of it (lines 10-11), suitable for displaying as a linked image.
The three errors were coming from lines 5, 6 and 8, and looked like this:
Warning: opendir(/home2/eolsen/public_html/archives/2003/08/19/): failed to open dir: No such file or directory in /home2/eolsen/public_html/includes/art.php on line 5
Warning: readdir(): supplied argument is not a valid Directory resource in /home2/eolsen/public_html/includes/art.php on line 6
Warning: Invalid argument supplied for foreach() in /home2/eolsen/public_html/includes/art.php on line 8
Amazingly, after failing to open the day's directory (the first error), which only exists once something has been posted that day, my code carried on as if nothing had gone wrong. It then failed to read the contents of the directory it had never opened (the second error), and then tried to loop over a list of PHP files that had never been created (the third error). Three errors! I knew I should have gotten more sleep the night I wrote this code.
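The post doesn't show the new error handling itself. As a rough sketch of what guarding those three failure points could look like, here is the same logic wrapped in a function. The name `random_amazon_link` is hypothetical; the sketch keeps the original directory layout and Amazon URL pattern, adds a closing `</a>` that doesn't appear in the listing above, and returns an empty string instead of printing warnings when there is nothing to show.

```php
<?php
// Hypothetical helper wrapping the art.php logic. Returns one random Amazon
// image link from the given day's posts, or '' when there is nothing to show.
function random_amazon_link($phys_path, $url_path, $date_path)
{
    $pattern = '/<img src="http:\/\/images\.amazon\.com\/images\/[^"]*\.jpg"/';
    // Start with an empty array: the original's third warning came from
    // looping over an array that was never created.
    $valid_link = array();
    $dir = "$phys_path$date_path";

    // The day's directory only exists after the first post, so check before
    // opening it, and bail out quietly if opendir() still fails.
    if (!is_dir($dir) || ($directory = opendir($dir)) === false) {
        return '';
    }

    // Compare with false so a file named "0" does not end the loop early.
    while (($filename = readdir($directory)) !== false) {
        if (pathinfo($filename, PATHINFO_EXTENSION) !== 'php') {
            continue;
        }
        $lines = file("$dir$filename");
        if ($lines === false) {
            continue;
        }
        foreach ($lines as $line) {
            if (preg_match($pattern, $line, $matches)) {
                $valid_link[] = "<a href=\"$url_path$date_path$filename\">$matches[0] /></a>";
            }
        }
    }
    closedir($directory);

    // No matching images: return '' rather than indexing an empty array.
    if (count($valid_link) == 0) {
        return '';
    }
    return $valid_link[mt_rand(0, count($valid_link) - 1)];
}
```

On the page, each include would then just do `echo random_amazon_link($phys_path, $url_path, $date_path);`, and an empty day simply prints nothing.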
- Section: Culture
- Filed Under: Culture: Administrative
- Writer: Phillip Winn
Comments
i think you're doing a great job, errors or not.
one thing i've noticed: every once in a while your random image selection will hit the same picture. it's not really a bug but it does look kinda funny to see the image up on both the left and right.
Thanks for the kind words. I just learned about the search error a couple of days ago and haven't looked at it yet beyond a couple of minutes of exploratory overview. It might have something to do with including dynamically generated Perl output in a PHP file? I have no idea yet. I've put off revamping the search for a long time, but I guess I'll dig into it soon. In fact, it's now my next priority after making sure Jan Herman is completely happy. Which reminds me, I need to call him...
Anyway, Mark, yes, that does happen. It's more likely to happen earlier in the day when there is a much smaller pool of images to draw on, but it can theoretically happen at any time. That's because the snippet of code is called twice, and each time it has no idea if it has ever been called before.
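One way around the duplicate, which TDavid's suggestion further down also relies on, is to choose both pictures in a single call rather than two independent ones. A minimal sketch, assuming the candidate links have already been collected into one array; `pick_two_distinct` is a hypothetical name.

```php
<?php
// Hypothetical helper: returns up to two *different* entries from $links,
// so the left and right boxes cannot show the same picture.
function pick_two_distinct(array $links)
{
    // The same Amazon image can appear in more than one post, so drop
    // duplicates first; array_values() re-indexes after array_unique().
    $unique = array_values(array_unique($links));
    shuffle($unique); // shuffles in place; it does not return the array
    return array_slice($unique, 0, 2);
}

// Usage: call once, then echo each half where it belongs on the page.
// list($left, $right) = array_pad(pick_two_distinct($valid_link), 2, '');
```

With fewer than two distinct images it returns whatever it has, so early in the day one box may still be empty, but never a repeat.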
Also, I've now updated the script again so that if it is a new day, it will use the previous day's images. In fact, it will go back five days, as if that would ever happen. I'm not too happy with how I did it, so in case any PHP coders have ideas on how to improve the loop logic, I'll post the details in the next comment.
Here's the new code snippet:
004: $tries=5;do{$date_path="$year/$month/$day/";$day--;$tries--;} while(!is_dir("$phys_path$date_path")&&($tries>0));
The old line 6 has been folded into line 5, so I'm back to 11 lines of code. I've actually changed this code since my last comment, because I figured out how to make it look more like what I had envisioned in my head. The 'do..while' construct doesn't seem to be documented as clearly as I would like, but I tried it and it worked, so I'm okay now.
I probably shouldn't concatenate so many lines together. Too bad.
So, to explain, I'll start with the 'do' part. It's a block of code within the curly braces that sets date_path just like before, but then proactively decrements day and tries, just in case. Then the while condition checks whether the directory exists. If it does, we move on to the next line, and all we've wastefully done is set and decrement the tries variable and decrement the day variable. If the directory doesn't exist (because it is a new day) and we haven't hit our limit of tries, we go back and set date_path again. Since we had already decremented day ahead of time, it all works.
Of course, we need to ensure that we don't get stuck in an infinite loop, which is why I've introduced the tries variable.
It makes sense to me, though I took a torturous mental path to get there, so it might not for you. :)
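One wrinkle the do..while doesn't cover: `$day--` works on the number rather than the two-digit string, so any day before the 10th comes out as a single digit (`2003/08/8/` instead of `2003/08/08/`), and stepping back from the 1st gives day 0 instead of the last day of the previous month. A minimal sketch that lets `mktime()` do the calendar arithmetic and `date()` restore the zero padding, assuming the same `Y/m/d/` archive layout; `find_recent_date_path` and its `$now` parameter (there only to make it testable) are hypothetical.

```php
<?php
// Hypothetical helper: returns the "Y/m/d/" path of the most recent day,
// within $max_days, whose archive directory exists, or false if none does.
function find_recent_date_path($phys_path, $max_days = 5, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    $year  = (int) date('Y', $now);
    $month = (int) date('n', $now);
    $day   = (int) date('j', $now);

    for ($back = 0; $back < $max_days; $back++) {
        // mktime() normalises day 0, -1, ... into the previous month (and
        // year), and date() puts the zero padding back ("08", not "8").
        // Noon sidesteps any daylight-saving oddity around midnight.
        $stamp = mktime(12, 0, 0, $month, $day - $back, $year);
        $date_path = date('Y/m/d/', $stamp);
        if (is_dir("$phys_path$date_path")) {
            return $date_path;
        }
    }
    return false;
}
```

A bounded `for` loop also makes the "give up after five days" limit explicit, so there is no separate tries counter to keep in step with the day.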
Phillip -
From a very cursory overview of what is happening here, it seems to me that one could eliminate a couple of those loops by generating a date-specific source file containing the basenames and image tags at the time entries are first created (the trigger point for new content) or, say, via a scheduled cron job.
This way the program wouldn't need to loop through multiple directories and parse every .php file for jpg images on every page load.
This would naturally speed up page loading as well, because on page load all that would be necessary is:
1. determine the date
2. load all images from a prebuilt image directory
3. choose 2 random points in the array (this would take care of the issue that Mark addressed, as well, assuming that there were at least 2 target images, of course)
4. format the info into the necessary link structure (a link, IMG SRC, etc.)
5. echo the output.
Below is a quick example of code I worked up to do this, assuming the source file has already been built. The file would contain the following two items on each line (you could add more if you need more info), delimited by the pipe symbol:
basename.php|img src code
Here's the actual code that would generate the two different random images in place of lines 1-12 above:
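srand ((float) microtime() * 10000000);
$date_path = date("m-d-Y") . '.txt';
// shuffle() reorders the array in place and does not return it, so load the file first
$imgfile = file("$phys_path/$date_path");
shuffle($imgfile);
// array_rand() with 2 returns two different keys
$rand_keys = array_rand($imgfile, 2);
$leftitem = explode("|", $imgfile[$rand_keys[0]]);
$rightitem = explode("|", $imgfile[$rand_keys[1]]);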
=======
Now there's no need to call the routine a second time; just use the following code in the page where needed:
// this is for the left side graphic
print("$leftitem[0]/>");
// this is for the right side graphic (note that it will always be different assuming you have at least 2 different images in the source file)
print("$rightitem[0]/>");
=======
Hope this helps and again, happy coding to you!
Well the print part of my example code got totally chewed up by the MT script in that last comment. Let me try again with PRE tags this time:
// this is for the left side graphic
print("$leftitem[0]/>");
// this is for the right side graphic
// (note that it will always be different
// assuming you have at least 2 different
// images in the source file)
print("$rightitem[0]/>");
=======
Hope this helps and again, happy coding to you!
[Edited by Phillip, second comment manually wrapped to avoid browser width problems.]
OK, I'll throw up the white flag on posting the print statement through the MT script; sorry that I mucked up the page ;) If you want me to email you the print part of the code, just drop me an email at tdavid at tdscripts dot com.
Happy coding to you :)
One other note: you don't need to seed the randomizer if you are using PHP version 4.2.0 or later. I didn't know what version you folks are running on this server, so I included that code. If you are using version 4.2.0 or later, just remove the srand line, as it is extraneous.
Happy coding to you :)
One of the major struggles I have with this site is that posting articles and comments takes too long, because too much is rebuilt with each post. I have an ongoing project to reduce the amount of work done for each post, so I'm reluctant to consider anything that moves in the opposite direction, especially since that particular script seems to load extremely quickly (less than one second for me, including actually retrieving the artwork from Amazon).
Still, I've made a note of this, and the next time I sit down to try to complete my list of 'Everything that is updated when a post is saved', I'll consider this. Thanks for the tip!
P.S. It's PHP version 4.3.2.
P, at last someone who speaks your language!!
It's probably better to benchmark the code that I gave you against the code you are using; you'll see a dramatic difference.
You're correct that surfers don't notice a second in execution time, but the server does and will notice it even more if the traffic on the site here keeps increasing.
I can totally understand not wanting to frontload the post or comment process any further if MT is heavy as written and that's why I also suggested possibly using a cron job or other somewhat transparent method.
Cron jobs are great for reducing server burden and keeping things skipping along as you probably already know. It's not like there are posts happening every minute, so a cron job run once every 5 minutes or so would probably more than suffice.
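TDavid's comment covers the page-side half of his idea; the half that builds the index file isn't posted. A minimal sketch of a generator that a cron job could run, assuming the `basename.php|img src code` line format he describes and the same Amazon URL pattern as the original snippet; `build_image_index` is a hypothetical name.

```php
<?php
// Hypothetical generator for the "basename.php|img src code" index file.
// Meant to run from cron (or after a post is saved), so page loads only
// have to read one small text file.
function build_image_index($archive_dir, $index_file)
{
    $pattern = '/<img src="http:\/\/images\.amazon\.com\/images\/[^"]*\.jpg"/';
    $entries = array();

    $directory = is_dir($archive_dir) ? opendir($archive_dir) : false;
    if ($directory !== false) {
        while (($filename = readdir($directory)) !== false) {
            if (pathinfo($filename, PATHINFO_EXTENSION) !== 'php') {
                continue;
            }
            $lines = file($archive_dir . $filename);
            if ($lines === false) {
                continue;
            }
            foreach ($lines as $line) {
                if (preg_match($pattern, $line, $matches)) {
                    // One entry per image: basename, a pipe, the <img src="..." fragment.
                    $entries[] = $filename . '|' . $matches[0];
                }
            }
        }
        closedir($directory);
    }

    // Write a temporary file, then rename() it over the old index, so a page
    // load never reads a half-written file (rename is atomic on one filesystem).
    $tmp = $index_file . '.tmp';
    file_put_contents($tmp, count($entries) ? implode("\n", $entries) . "\n" : '');
    rename($tmp, $index_file);
    return count($entries);
}
```

Called as, say, `build_image_index($phys_path . date('Y/m/d/'), $phys_path . date('m-d-Y') . '.txt')`, it would produce the `m-d-Y.txt` file that TDavid's page-side code reads. Nothing here runs during a post, which matters given Phillip's concern about slow saves.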
Do with my comments what you will, they are meant as only friendly observations.
Happy coding to you and keep up the good work :)
TDavid, true that. I don't know that MT is itself particularly heavy, but it's flexible, and the way we're using it is quite heavy. Lots and lots of files get updated when something is done, some of which are only used behind the scenes.
I do need to visit all of the PHP includes and code as part of my master map of the site, and a few cron jobs might indeed be just what I need. Overall I've been toying with a number of ideas about dynamically generating some content versus rendering it at create-time. I know dynamic rendering will result in much heavier overall server load, but surfers won't notice the extra second or two, while posters definitely notice the time it takes to post.
Still, I've managed to improve things dramatically so that comments no longer take 30 seconds or more as they once did, so maybe I should relax a bit...
Thanks for your suggestions!
P.S. Your code example came through fine in my email notification of your comments, but the leading '<' hoses browsers. I manually converted all of mine to '&lt;' to avoid that. :)
freaks
Eric, you're just jealous of our superior intellects. Yeah, that's the ticket.
(Now if only we would apply our "superior intellects" to solving important problems, eh?) ;)
You wrote:
"... but surfers won't notice the extra second or two, while posters definitely notice the time it takes to post."
Actually, according to most of the feedback I've received in my online work, surfers can and *do* notice delays as small as 1-2 seconds, especially once those delays are stretched out over dialup. Most webmasters are even more impatient than surfers, I think.
It seems that folks would rather arrive at a blazing-fast page, and the way you are doing things currently is going to be faster than any sort of dynamically generated page, so I'm sure you'll weigh this factor in any future changes.
I have to constantly remind myself that there are still lots of folks out there that don't have better than 33k dial-up connections (some are less than 21k in rural areas, unfortunately) and when I test page loads at those speeds things are invariably amplified.
No offense to writers posting their outstanding articles here, but if the generated page is slower to load, a reader might leave their article for one that loads faster. I think they'd rather wait 1-2 seconds than ask their readers to wait those 1-2 seconds. But I'm sure they can chime in with their individual feelings on this.
Surfers and well, most everybody in this day and age is way impatient! LOL
Personally, I would rather wait a few seconds for a comment or when posting an article as opposed to having a delay on the page when stopping by to read.
1-2 seconds is splitting hairs, I know; however, 1-2 seconds on broadband could equate to 8-10+ seconds at lower bandwidths, and that pause is definitely noticeable to the end user and worthy of consideration.
BTW, the Blogcritics index page currently takes 30 seconds to load on a 56k modem, which is fine for us broadband users (I'm making the assumption you are a broadband user), but that is a bit slow for dialup users, I would think.
I'm not complaining, and I doubt any of the modem users are, because they are used to things loading slowly on their dialup connections, but if dialup were all I had available, I'd be a bit disappointed with most websites today, I think. Maybe some dialup users will weigh in on this thread, if the PHP code didn't spook them away, that is :)
Possibly you could consider, if you don't already have one, a lower-bandwidth version of the site for dialup and wireless users (BTW, this site fails to load on my Sprint PCS Vision cell phone).
Fortunately weblog sites like this one are mostly text, which makes for a speedier loading page than a primarily image-driven one.
Alas, I must get back to the coding mines myself now. Have a nice day and keep up the good work, Phillip ;)
of course I am jealous of your superior intellects
TDavid, I think you're talking apples and oranges here. The page takes longer for dialup users because it has a lot of content. A good portion of that I'm not allowed to remove - and believe me, I've tried. To name but one example, many Blogcritics posters participate in large part to promote their own sites. Removing the list of 300+ bloggers linked to their webpages seems to make sense from a usability point of view (and would chop several seconds off of the page-load for dialup users), but would probably lead to a revolt of the posters, and without the posters, there's no site.
Anyway, that's off-topic. The point is that time spent transferring data obviously grows as the connection gets slower. Time spent on the server (which is what happens in the case of art.php) does not change with the speed of the connection. The server is the same, and very little data is sent back.
Thanks for the explanation, Phil. I wrote you about the problem last week in regard to the Safari browser for OS X. Since then, I've done some more reading on PHP, Slash and the Netscape code underlying Safari. Seems the page may continue to display oddly in Safari because of bugginess in the browser code. Safari is fast, but still has kinks that need to be worked out.
Sorry, Diva, I don't remember the email or the issue. I use Safari at home and I don't think I've seen anything particularly odd. Please send me email again at pwinn@blogcritics.org and I'll take a look.
I had to skim through this, because my head was on the verge of exploding from all the code. Must - look - away!
To mirror the other comments, many thanks for keeping up the good work on the site. I'm a web-designer/graphic artist by trade, but this back-end code stuff just drifts right over my head. I'm amazed that you even knew what was wrong, let alone how to fix it!
Ahhh, back from lunch. Anybody else a KFC fan?
Phil - with all due respect, from reading your last comments I don't think I've made clear what I'm trying to say (the beauty of text on the web versus voice).
So let me briefly try to explain it another way; or, of course, you are welcome to contact me personally by email and we can discuss it further that way.
So let's say a program takes 1-2 seconds to execute on the server, versus 0.25 seconds when optimized, and it runs 10,000 times over a 24-hour period (I have no idea how much traffic you folks are doing, so I'm just throwing out 10k a day as an example number). Then do the math on the extra load on Apache, or the savings (that's what I meant by benchmarking earlier).
Also, as I'm sure you well know, the headers have to be exchanged and the page has to be made ready before it begins outputting to the browser, so if Apache is taxed, there is a bottleneck to get through before the actual page size (and thus the bandwidth) becomes the secondary concern I mentioned in my last post.
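Working TDavid's illustrative numbers through (10,000 runs a day, 1-2 seconds unoptimized versus 0.25 seconds optimized; these are his hypothetical figures, not measurements of this site):

```latex
\begin{aligned}
10{,}000 \times (1\,\mathrm{s} - 0.25\,\mathrm{s}) &= 7{,}500\,\mathrm{s} \approx 2.1\,\mathrm{h} \\
10{,}000 \times (2\,\mathrm{s} - 0.25\,\mathrm{s}) &= 17{,}500\,\mathrm{s} \approx 4.9\,\mathrm{h}
\end{aligned}
```

That is roughly 2 to 5 hours of extra script execution per day for the server to absorb under those assumptions, spread across however many requests happen to overlap.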
I've been called into many jobs to fix programs that have this type of ailment. The solution is simple, although the specifics sometimes are much more complex, but basically it boils down to: less dynamic content, more cached or static content, less reliance on database (where the bottlenecks usually occur), more efficient database queries, better/improved file access schemes and last but certainly not least: optimized code.
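As one small illustration of "more cached or static content," here is a sketch of a file-based output cache: an expensive include runs at most once per interval, and every other page load gets the saved HTML. `cached_output`, the cache path, and the 300-second lifetime (chosen to match the five-minute cron idea above) are all assumptions, not anything the site actually uses.

```php
<?php
// Hypothetical helper: returns cached HTML if $cache_file is younger than
// $ttl seconds; otherwise calls $generate, saves its result, and returns it.
function cached_output($cache_file, $ttl, $generate)
{
    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $ttl) {
        return file_get_contents($cache_file);
    }
    $html = call_user_func($generate);
    // Write to a per-process temp file, then rename it into place, so
    // concurrent readers never see a partially written cache file.
    $tmp = $cache_file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $cache_file);
    return $html;
}
```

For example, `echo cached_output($cache_dir . 'art.html', 300, 'render_art');`, where `$cache_dir` and `render_art()` are hypothetical. One trade-off for this particular page: a single cached copy would also freeze the random image choice for the cache lifetime.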
So 1-2 seconds of script execution time can be very expensive in the grand scheme of things on a busy/loaded server, especially if it isn't a dedicated server and the traffic is heavy.
If it's a program that's executed 1000 times a day it isn't going to matter much, but when scaling comes into play, then it will become of paramount concern.
If the overall server load is high then pages will not be served in a timely manner and that 1-2 seconds time can be amplified, which was the point I was trying to make above.
You are correct that once the page is ready to be sent, it makes no difference whether the person is on dialup or broadband: the page is still X bytes in size. However, the process of getting to that point depends on many factors, the biggest of which is usually server efficiency. So if a program is chewing up server load that could be saved, trimming it means pages will be served faster.
I'm not sure I understand the comment about not being able to do anything with the content on the page? I think you mean that there are links/content that can't be removed? Or do you mean that there is nothing from a programming standpoint that could be done to serve up alternative weighted pages?
For example, a small bit of code could check whether the visitor's browser (its User-Agent header) identifies a wireless device and redirect it to a less weighty version of the same page. It would be transparent.
A cookie could be placed (at the user's option) for those with lower-bandwidth connections, so that they would be served a less weighted page when they come in, and of course spiders could still get the full meal deal so no SE action is lost.
It's been bandied about that spiders choke on large numbers of hyperlinks or overly large pages anyway, but I've seen pages of 250k+ spidered effectively, so I don't know how accurate that info is.
I realize the traffic numbers and server load under discussion probably do *not* apply to this particular situation and website, so I am throwing this out for the good of those who are interested in this subject.
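A minimal sketch of the decision TDavid describes, assuming a hypothetical `bc_light` cookie for the visitor's explicit choice and a short, purely illustrative list of User-Agent substrings (a real list would need to be much longer and maintained). `wants_light_page` is a hypothetical name; it takes the server and cookie arrays as parameters so it can be tested outside a web request.

```php
<?php
// Hypothetical decision helper: should this request get the lighter page?
// $server and $cookie would normally be $_SERVER and $_COOKIE.
function wants_light_page(array $server, array $cookie)
{
    // An explicit visitor choice (hypothetical "bc_light" cookie) wins.
    if (isset($cookie['bc_light'])) {
        return $cookie['bc_light'] === '1';
    }
    $ua = isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';
    // Illustrative substrings only, matched case-insensitively.
    foreach (array('WAP', 'UP.Browser', 'Windows CE', 'Palm') as $needle) {
        if (stripos($ua, $needle) !== false) {
            return true;
        }
    }
    // Everyone else, including search-engine spiders, gets the full page.
    return false;
}
```

A page could then start with something like `if (wants_light_page($_SERVER, $_COOKIE)) { header('Location: /light/'); exit; }`, where `/light/` is a hypothetical URL for the lighter version.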
Your comments in the beginning of this were soliciting feedback on possible program improvement. Now that I've posted some possible solutions and suggestions with detail, I'll shut up about it further here in this thread to avoid becoming annoying.
If you need anything further or have any further questions regarding this, feel free to shoot me an email or give me a ring at my office. That number is posted prominently on my website.
Happy coding to you and again, keep up the good work. None of what we are discussing here is meant to be critical of you and your work; it's simply food for possible improvement :)
Hello Phillip -
It's all good, bugs happen to the best of us. I've been programming for 20+ years now and I still get snakebit on occasion ;) Part of the game, no matter what skill level one is at.
Just FYI, but you might want to take a gander at the search function when you get a chance as it keeps throwing the following error:
Warning: main(): stream does not support seeking in /home2/eolsen/public_html/search.php on line 40
Here's a suggestion: you can prefix many PHP function calls with the @ symbol, and it will suppress the error message so it doesn't choke the output. Also, maybe take a look at the error_reporting function.
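Applied to the art.php warnings, the @ operator hides the message, but the code still has to notice that the call failed, or the next line fails instead. A minimal sketch with a hypothetical `list_dir_quietly` helper; the commented-out lines at the end show the site-wide alternative of logging errors instead of displaying them, via the standard `display_errors` and `log_errors` settings.

```php
<?php
// Hypothetical wrapper: list a directory without printing a PHP warning
// into the page when the directory does not exist.
function list_dir_quietly($path)
{
    // "@" silences the warning opendir() would emit; the return value
    // still has to be checked, because the failure has not gone away.
    $directory = @opendir($path);
    if ($directory === false) {
        return array();
    }
    $names = array();
    while (($name = readdir($directory)) !== false) {
        if ($name !== '.' && $name !== '..') {
            $names[] = $name;
        }
    }
    closedir($directory);
    sort($names);
    return $names;
}

// Site-wide alternative: keep reporting everything, but send it to the
// server's error log rather than into visitors' pages.
// error_reporting(E_ALL);
// ini_set('display_errors', '0');
// ini_set('log_errors', '1');
```

Suppression alone would have replaced the three visible warnings with silently wrong behavior; the explicit `false` check is what actually fixes the cascade.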
PHP has lots of useful, built-in functions ;) Happy coding to you!