Martin Splitt from Google explained the concept of the centerpiece annotation, a term Google uses internally for the primary content of a page or site. Martin said Google can determine that a page's primary topic is A and that the rest of the content on that page may not be primary. Google then weighs that content differently, Martin said.
He said this at the 28:50 mark of this Duda webinar. Here is what Martin said:
I don’t know what we have publicly said about this but I think I brought it up in one of the podcast episodes so I can probably say that we have a thing called the centerpiece annotation for instance and there’s a few other annotations that we have. Where we look at the semantic content as well as potentially the layout tree.
But fundamentally we can read that from the content structure in HTML already, and figure out so oh this looks like from all the natural language processing that we did on this entire text content here that we got it looks like this is primarily about topic A, dog food. And then there's like this other thing here which seems to be like links to related products but it's not really part of the centerpiece, it's not really main content here, this seems to be additional stuff. And then there's like a bunch of boilerplate, so hey we figured out that the menu looks pretty much the same on all these pages and this looks pretty much like that menu that we have on all the other pages of this domain, for instance, or we've seen this before.
We don't even actually go by domain or like oh this looks like a menu. We figure out what looks like boilerplate and then that gets weighted differently as well. So if you happen to have content on a page that is not related to the main topic of the rest of the content, we might not give it as much consideration as you think. We still use that information for link discovery and figuring out your site structure and all of that. But if a page has 10,000 words on dog food and then 3,000, or 2,000, or 1,000 words on bikes, then this is probably not good content for bikes.
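Google has not published how the centerpiece annotation or its boilerplate detection actually works. As a rough illustration of the idea Martin describes, that text repeated across many pages (like a menu) gets weighted differently than the main content, here is a minimal Python sketch. It is not Google's implementation. The cross-page repetition heuristic, the function name `classify_blocks`, the 80% threshold, and the weights are all hypothetical. Martin also notes that Google does not strictly go by domain, so treat the "set of pages" here as a simplification.

```python
from collections import Counter

# Hypothetical parameters, chosen for illustration only (not Google's values).
BOILERPLATE_SHARE = 0.8   # block appears on >= 80% of pages -> treat as boilerplate
BOILERPLATE_WEIGHT = 0.2  # down-weight boilerplate blocks
MAIN_WEIGHT = 1.0         # full weight for everything else


def classify_blocks(pages: dict[str, list[str]]) -> dict[str, list[tuple[str, float]]]:
    """Assign a weight to every text block on every page (hypothetical heuristic).

    `pages` maps a URL to the list of text blocks found on that page.
    A block that repeats across most of the pages is treated as boilerplate.
    """
    # Count each distinct block at most once per page.
    counts = Counter(block for blocks in pages.values() for block in set(blocks))
    n_pages = len(pages)
    weighted = {}
    for url, blocks in pages.items():
        weighted[url] = [
            (block,
             BOILERPLATE_WEIGHT if counts[block] / n_pages >= BOILERPLATE_SHARE
             else MAIN_WEIGHT)
            for block in blocks
        ]
    return weighted


# Example: the shared menu appears on all three pages, so it is down-weighted.
pages = {
    "/dog-food": ["Home | Shop | About", "Our guide to choosing dog food"],
    "/cat-food": ["Home | Shop | About", "Our guide to choosing cat food"],
    "/bikes": ["Home | Shop | About", "Bike accessories we like"],
}
result = classify_blocks(pages)
```

In this toy example, the menu text shows up on every page and gets the lower weight, while each page's unique text keeps full weight. A real system would also use layout and natural language processing signals, as Martin mentions, rather than exact string matches.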
Glenn Gabe summed this up on Twitter: "Google has a centerpiece annotation (& others). It looks at semantic content & layout tree. From NLP, G can identify a page is about topic X, then ID supplemental content vs main content, boilerplate, etc. Then that can get weighted differently by Google."
This makes sense; I just didn't know that Google internally called this the "centerpiece annotation."
Forum discussion at Twitter.