How often do search engines crawl the web? A report by Dirk Lewandowski, Henry Wahlig and Gunnar Meyer-Bautor of the Department of Information Science at the Heinrich-Heine-University Düsseldorf, Germany, looked at websites with daily updates and concluded that many were not crawled more than once every 20 days. The report identifies specific issues at Yahoo and, to a lesser degree, MSN, both of which often seem to have out-of-date caches.
The study was undertaken using the expression “Helicopter crash in Iraq”, and looking at updates from Reuters.
The implication of this for communications professionals is that even with daily updated content you cannot be sure that your content will be found by any search engine.
Unless you are using RSS and blogging techniques to ensure that your content is published to as many intermediary sites as possible, your story may not even be indexed by major search engines within the news cycle. In other words, even if a PR professional puts up a press release on their own site and on, e.g., PR Web, it might not be indexed by the major search engines, and is therefore unlikely to be found by the 75% of users who use search engines to find information about companies.
[...] Over the past few weeks, I have been paying close attention to how search engines crawl my blog. I have been paying attention to how often my blog is crawled, the number of pages that get crawled per visit, the length of time between visits, as well as the depth of each crawl (both internal and external depth). The new design for my individual post pages is a direct reflection of what I have learned over the last few weeks of studying the crawling behaviors of my blog. All in all, and to keep the explanation simple, references and links within my articles typically weren’t getting crawled, and if they did get crawled it didn’t seem to hold much weight towards honoring any kind of credit to the source page I was linking to. An even easier way to explain it is that by the time my article pages were crawled, the crawler typically ran out of gas before it finished following all the links in my article. The crawler would continue where it left off the next time it came back, which typically took about a week. [...]
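The measurements mentioned in the excerpt above (how often a crawler visits, how many pages it fetches per visit, and how long it stays away between visits) can be approximated from a web server's access log. What follows is a minimal sketch, not the method used by the blogger quoted. It assumes the log is in the common "combined" format, that a crawler can be recognised by a substring of its user-agent string, and that hits less than 30 minutes apart belong to the same visit. The function names and the 30-minute threshold are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_visits(lines, agent_token, session_gap=timedelta(minutes=30)):
    """Group one crawler's hits into visits.

    Returns a list of (visit_start, pages_fetched) tuples, oldest first.
    """
    times = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_token.lower() in match.group("agent").lower():
            stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            times.append(stamp)
    times.sort()

    visits = []  # each entry: [start, time_of_last_hit, page_count]
    for stamp in times:
        if visits and stamp - visits[-1][1] <= session_gap:
            # Close enough to the previous hit: same visit.
            visits[-1][1] = stamp
            visits[-1][2] += 1
        else:
            visits.append([stamp, stamp, 1])
    return [(start, pages) for start, _last, pages in visits]


def gaps_between_visits(visits):
    """Time elapsed from the start of each visit to the start of the next."""
    return [later[0] - earlier[0] for earlier, later in zip(visits, visits[1:])]
```

Passing the lines of an access log and a token such as "Googlebot" to crawler_visits gives the pages fetched per visit, and gaps_between_visits then gives the intervals between visits. User-agent strings can be spoofed, so the results are only an estimate.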
I think it depends on the inbound links from some sites having good PR.
I think that having good inbound links definitely helps with being crawled. Having a flattish site helps with being crawled too. And publishing regularly … hence the continuing effectiveness of blogging (despite the sharp decline in hype!). Corporate sites, with their deep structure and their Stalinist control over the message, fare badly. University websites, with their devolved structure, multiple home pages and well-maintained individual pages, do rather better. One could imagine a corporate site of the future having more of a wiki structure where any department or individual could post (within their guidelines) and everything useful to a company’s multiple stakeholders (clients, partners, employees, investors) got published.