The day started off like all others: I woke up late, rubbed my eyes, rolled over in bed and opened up my laptop.
What I found was a bit disturbing, though: it appears that the Chicago Tribune has caused some customer service issues due to expired content. How? Well, to answer that question, we have to turn to Twitter.
You see, @ourmaninchicago, @LeahJones and @chicagocarless had a conversation that touched on a rather significant frustration that has been looming around Tribune’s sites for a while now—content expiration!
Why in the world would Tribune expire its content? After all, they are a news organization. A collection of information with print archives going back to the 1800s. Why in the world would they expire their online content? It’s simple . . .
- The server load to run the massive databases that are necessary to keep every article.
  - Literally thousands of articles per day are created or fed into Tribune Company (keep in mind Tribune owns a number of news organizations including the L.A. Times, Chicago Tribune, two Florida papers, a few more along the east coast, and a dozen or more broadcast companies as well).
  - When Tribune came online over 10 years ago, the infrastructure and costs were wildly different. Much of the code developed then is still being used today.
- Partnerships and contracts with 3rd parties.
  - The value of inbound links and keeping customers on Tribune domains wasn’t a focus over the years. Thus contracts were entered into (sorry, no details).
- Lack of focus and understanding of the SEO implications of expiring content.
  - The online divisions of the newspapers and broadcast sites are a recent focus. The focus and understanding of SEO is even more recent.
There are a lot of little reasons floating around the company that funnel up into those three, but those are the big ones, in that order, and I hear them almost every week.
So, I’m the SEO Director at Tribune . . . I should fix it, right?? Agreed!
Here is where we are so far:
- We have our two largest sites up with some type of Archive News sites.
  - Only print articles, and USUALLY a 301 redirect from the original published URL directs you to the correct page on the ‘articles’ server.
    - How many articles and how far back? Good question! No more than 10 years (via this project, you may find stragglers) and most articles are within the past few years.
    - Google index stats: 1.4 million pages (mostly articles)
  - All online content that Chicago Tribune was the original creator of (non-AP, or wire stories) is on the ‘archive’ server. The 301 redirects don’t work as well though and occasionally you may not be able to find the story you are looking for. We are working on it, but it’s far from perfect.
    - How many articles and how far back? Only a couple of years . . . for now.
    - Google index stats: 218,000 pages (mostly articles)
- The technology used for the two sites above can be used to create a similar system on other Tribune news sites.
  - There are some things that need to be done first (can’t get into details), then these archive sites can be built for other Tribune news sites.
- All of our blog posts are saved for as long as the blog is still active.
  - I’m working on setting a policy on what should be done with blogs that are no longer updated (if anything). However, this must be done smartly and justly.
  - There are literally thousands of medium PageRank pages sitting out there and I want to make sure that any redirects that may occur go to a proper home for both the human experience and search engine experience.
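To make the redirect idea above a little more concrete, here is a rough sketch in Python of how an expired URL could be mapped to its archive home. This is not Tribune's actual code: the `ARCHIVE_MAP` table, the `resolve_expired` function and the example.com hosts and paths are all hypothetical, and a real system would look the mapping up in a database rather than a dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: original article path -> its new archive URL.
# The hosts and paths below are made up for illustration only.
ARCHIVE_MAP = {
    "/news/local/story-123.html": "https://archive.example.com/2008/story-123",
    "/sports/story-456.html": "https://articles.example.com/2007/story-456",
}


def resolve_expired(url):
    """Return (status_code, location) for a request to a possibly expired URL."""
    # Ignore the query string and any trailing slash so that tracking
    # parameters on old inbound links do not break the lookup.
    path = urlsplit(url).path.rstrip("/") or "/"
    target = ARCHIVE_MAP.get(path)
    if target is not None:
        # Permanent redirect: people and search engines both land on the archive copy.
        return 301, target
    # No known home for this story, so it is still a dead link.
    return 404, None
```

Calling `resolve_expired("http://www.example.com/news/local/story-123.html?track=rss")` would return a 301 pointing at the archive copy, while an unknown path falls through to a 404. The point of the sketch is simply that every old URL needs exactly one "proper home" before a redirect is issued.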
Bottom line? By the end of the year I will have all of the newspaper sites on some sort of archive site. It won’t be perfect, but it will help significantly in reducing the massive number of 404 errors I look at in Google Webmaster Tools (one domain below). Keep in mind these numbers used to be in the high hundreds of thousands when I started with Tribune in February 2008. Progress is being made . . . much more work to be done, though.

How much traffic are the article/archive servers generating for us? Well, I’m not going to do a screenshot for that, but I will mention that they equate to less than 3% of our total traffic. So despite how strongly people feel that this is ‘must have’ content, the numbers are hard to justify when we have other projects that can easily trump a 15% lift in traffic. But it is really important for SEO, because there are literally millions of dead links on the internet that could be driving PageRank to the Tribune network of sites. As an SEO, I am very frustrated by it, but I also see that, from a product standpoint, other things in the hopper may be a higher priority.
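One way to work through a large 404 list when time is limited is to fix the dead URLs with the most inbound links first. The sketch below assumes you already have a list of `(url, inbound_link_count)` pairs from some report. That input shape, the `prioritize_dead_urls` name and the sample rows are assumptions for illustration, not the format of any particular tool.

```python
def prioritize_dead_urls(rows, limit=10):
    """Return up to `limit` dead URLs, most-linked first.

    rows: iterable of (url, inbound_link_count) pairs (hypothetical input shape).
    """
    # URLs nobody links to pass no link value, so skip them here.
    linked = [(url, links) for url, links in rows if links > 0]
    # Sort by link count descending, then by URL so ties come out in a stable order.
    return sorted(linked, key=lambda row: (-row[1], row[0]))[:limit]


# Made-up sample data
sample = [("/a", 3), ("/b", 0), ("/c", 12), ("/d", 3)]
top_two = prioritize_dead_urls(sample, limit=2)
```

With the sample rows above, `/c` comes first because it has the most inbound links, and `/b` is dropped because nothing links to it.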
Again, I’ll have all the Tribune newspaper sites with an archive system by the end of the year. I’ll try to be patient if you will.
Note: Still unedited, sorry for typos…
You’re an Amazon aff again? ahahaha lmao!
Nice work though, and interesting to hear the travails of an SEO Director. Tough stuff cleaning up messes! But I’m also sure you’re having a blast!
“It’s a hard knock life…”
The question I have is should some content be expired?
I raise this question with regard to local police blotters that track arrests and traffic infractions. In most cases these are dropped from public records or can be expunged after a period of time.
Sure, I can go look at microfiche of old newspapers at the library but should this type of content be expired online on purpose after a specific period of time?
I believe we are at a turning point in media as more print publications realize the strength of the distribution network online. As high as the costs may be for the killer servers needed to run the best database of articles, the current distribution network has to be even more costly.
Since the media are at this turning point now, this is the best time to look at this ethical question, see what guidelines the law suggests, and decide how this type of information should be removed when its relevance has expired.
Keep up the great work and leadership in bringing a great publication to its rightful place online.
@gab – Yep! They approved me after 5 years of being banned. Never again will I piss them off. Not worth it.
@Randy – Hmm. An interesting viewpoint that I only hear on rare occasions. I can tell you that I’ve removed content from the search engines for reasons similar to the ones you mention above (it was already deleted from our systems via automation). Being a person who has a TON of information about himself online, I’m not sure I’m the right person to ask, because I believe that the internet becomes a ‘book of life’ of sorts: all things ever collected about a person. We are who we are . . . we all make mistakes, some grave, some minor. I think society is becoming more lenient about what we consider socially acceptable (whether that is a good or bad thing is another conversation).
Curious to hear what others feel (and shocked this took a non-SEO turn considering most of my Twitter followers are in the Search Industry).
Brent
Great starting post!
Managing archives for large publishers is an enormous task, but the SEO gains can be significant.
I’ve recently discovered that my client has a ton of great links from the BBC (pages from 2001 and 2002); however, since the BBC did not do a great job with its archives, the pages are not in Google’s index…
Telegraph UK has a good practice of site-wide linking to their Archive Index.
I’d love to have your thoughts on their ‘conversion = sale-on-ctr-session-only’ policy and what you’re getting out of the program so far…
p.s. You still have a fotprint in the foter that would be better off deleted. And yes, those typos were intentional.
Keep up the great work; it’s inspirational and has really helped me a lot.
Thank you