When a computer program such as a search engine looks at a webpage and sees agent “Jane Smith,” it doesn’t necessarily understand that “Jane Smith” is a name, let alone know whether that’s the listing agent or someone else. It certainly can’t make the connection that all this content is courtesy of Jane, and that her website might well be the authoritative source for the content. If this were possible, there could be benefits, both for Jane and for other web users. Jane (or her identity on the web) could get credit for the content, with her site accruing additional SEO benefit. For the last few years technologists, mostly search engine providers, have been considering how to open up these and many more possibilities by making the meaning and content of websites more understandable by computers. That effort is called the semantic web, with “semantic” meaning “relating to meaning or arising from the distinction between the meanings of different words or symbols.”
Sadly, the technology to code semantically existed long ago, before the web itself, but it wasn’t adopted for web use. HTML (Hypertext Markup Language) was defined in 1993 as an easy way to make web pages. This was, in some ways, a giant technological step backward. More than seven years earlier, a markup language called SGML had been developed (Standard Generalized Markup Language, ISO 8879:1986, based on GML, which was created in the 1960s). The focus of SGML was separating content from display to allow for just the sort of semantic interpretation we now lack on the web. I’ll explain. Let’s look at a bit of content with some “old school” HTML embedded in it. The only HTML tags I’ll use are paragraph markers and line breaks:
The computer can’t automatically tell who the recipient is and who the letter is from. It can’t tell the phone number from the fax number. Now let’s look at a fragment from an SGML document:
It should be very clear how a computer could easily interpret this, and you can imagine that if you were reading it in a mobile browser, the telephone number would be hyperlinked to allow you to dial it. For completeness, I want to note that this document would normally reference another file, a “DTD” (document type definition), that helps evaluate the letter for structural and even content validity. There would also be other documents that could help the computer understand how the data in the SGML document was meant to be displayed.
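To make the contrast concrete, here is a minimal sketch of how a program might read a structured letter. The element names and contact details are invented for illustration, and the fragment is written as well-formed XML (a simplified descendant of SGML) only so that Python's standard library can parse it. It is not the markup of any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter markup. The element names are invented for this sketch,
# and the fragment is well-formed XML so the standard library can parse it.
LETTER = """
<letter>
  <recipient>Jane Smith</recipient>
  <sender>
    <name>John Doe</name>
    <phone>555-0100</phone>
    <fax>555-0199</fax>
  </sender>
  <body>Thank you for your help with the listing.</body>
</letter>
"""


def read_letter(markup: str) -> dict:
    """Pull named fields out of the letter by element name, not by position."""
    root = ET.fromstring(markup)
    return {
        "recipient": root.findtext("recipient"),
        "sender": root.findtext("sender/name"),
        "phone": root.findtext("sender/phone"),
        "fax": root.findtext("sender/fax"),
    }


fields = read_letter(LETTER)
# Because the phone number is labeled, a mobile browser could make it dialable.
dial_link = "tel:" + fields["phone"]
print(fields)
print(dial_link)
```

Because each piece of data is labeled, the program never has to guess which number is the phone and which is the fax. With only paragraph and line-break tags, that guess is all it has.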
I know that I’m simplifying this to keep the article reasonably short, and that there are some gray areas, but that’s the heart of it: HTML was primarily designed around displaying content for a person who knows how to interpret it as data. In terms of creating the semantic web it was a step backward from predecessor technologies, but it was very easy to adopt, and that was a novel thing at the time.
Where We Are Today
Google has started with a ‘baby step’ toward encouraging the semantic web by trying to get web page coders to add specific author and publisher information to content.
So, instead of:
Google would like to see the following code, which would let them know that I have authored content and the location of my online profile (so they can tie all of my contributions to the web together):
Similarly, Google would like to see the “publisher” identified. So, on a Clareity Consulting web page that I have authored, there would not only be the author tag above, but also a publisher tag:
The examples above use our Google+ pages, but theoretically the link could be to another profile page entirely – it’s just a question of whether Google would know how to interpret that profile page.
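As a rough illustration of how a crawler might pick up this information, the sketch below scans a page for link tags whose rel attribute is "author" or "publisher". The page and the profile URLs are placeholders invented for this example, and the parser is a simplification, not a description of how Google actually processes pages.

```python
from html.parser import HTMLParser

# Hypothetical page; the profile URLs are placeholders, not real profiles.
PAGE = """
<html><head>
  <title>Sample article</title>
  <link rel="author" href="https://profiles.example.com/jane-smith">
  <link rel="publisher" href="https://profiles.example.com/example-brokerage">
</head><body><p>Article text.</p></body></html>
"""


class RelLinkParser(HTMLParser):
    """Collect href values from <link> and <a> tags, keyed by rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # rel may hold several space-separated values; keep the first href seen.
            for value in rel.lower().split():
                self.links.setdefault(value, href)


parser = RelLinkParser()
parser.feed(PAGE)
credit = {
    "author": parser.links.get("author"),
    "publisher": parser.links.get("publisher"),
}
print(credit)
```

Once the author and publisher links are extracted, every page that points to the same profile can be tied back to one identity.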
Below is an illustration of the benefit I get from using the tags. I am listed as an author in the search results, and my Google+ profile shows up on the right within the search results when one clicks on my name.
Google and other search engines would also like to know which site was the original source of content – so it would be polite for those who re-print my article, “10 Tips for Real Estate Agent Information Security,” to include a tag indicating what is known as the “canonical source” inside the HTML “head” area:
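The following sketch shows one way a program could check whether a reprint credits the original. It looks for a link tag with rel="canonical" inside the page head and compares it with the original address. Both the URL and the reprint page are placeholders made up for this example.

```python
from html.parser import HTMLParser

# Placeholder address standing in for the original article's URL.
ORIGINAL_URL = "https://www.example.com/10-tips-information-security"

# Hypothetical reprint that names the original as its canonical source.
REPRINT_PAGE = """
<html><head>
  <title>10 Tips for Real Estate Agent Information Security (reprint)</title>
  <link rel="canonical" href="https://www.example.com/10-tips-information-security">
</head><body><p>Reprinted article text.</p></body></html>
"""


class CanonicalParser(HTMLParser):
    """Record the first rel="canonical" link found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            if (attr.get("rel") or "").lower() == "canonical":
                self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonical_source(html: str):
    """Return the canonical URL a page declares, or None if it declares none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


credits_original = canonical_source(REPRINT_PAGE) == ORIGINAL_URL
print(credits_original)
```

A page without the tag simply returns None here, which leaves a search engine to work out the original source on its own.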
There’s no reason for today’s webmasters not to use these tags. There is even discussion of requiring them on IDX websites, so that authors and their sites get due credit.
The search engines (for starters) would love to be able to interpret more of your website content. To that end, Bing, Google, Yahoo! and the Russian search engine Yandex have collaborated to create even more tags for all sorts of content. You can learn more about those tags at http://schema.org. To show one example from the site:
There actually is already a set of tags for properties, including Single Family Residence: http://schema.org/SingleFamilyResidence. As you noticed if you followed the link, it’s not really designed to accommodate listings, but what if it were? Should the industry try to work with the schema.org group to make it so?
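For a sense of what schema.org markup looks like and how a program could read it, here is a minimal sketch using microdata attributes (itemscope, itemtype, itemprop) on the SingleFamilyResidence type. The property values are invented, and the parser is deliberately simplified: it handles only flat properties with no nested items, so treat it as an illustration, not a full microdata reader.

```python
from html.parser import HTMLParser

# Hypothetical markup; the values are invented for illustration.
LISTING = """
<div itemscope itemtype="http://schema.org/SingleFamilyResidence">
  <span itemprop="name">Three-bedroom home on Elm Street</span>
  <span itemprop="numberOfRooms">7</span>
  <span itemprop="description">Updated kitchen and fenced yard.</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Read the itemtype and the text of each flat itemprop (no nested items)."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self._current = None  # itemprop name whose text is being collected

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "itemscope" in attr and self.itemtype is None:
            self.itemtype = attr.get("itemtype")
        if "itemprop" in attr:
            self._current = attr["itemprop"]
            self.props[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the property being collected.
        if self._current is not None:
            self.props[self._current] = self.props[self._current].strip()
            self._current = None


parser = MicrodataParser()
parser.feed(LISTING)
print(parser.itemtype)
print(parser.props)
```

The same few lines that let a search engine understand the page also let anyone else extract its data, which is worth keeping in mind when weighing how much detail to mark up.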
There are two main problems with the semantic web that need to be considered. The first is “tag spam” and how to handle it. Tag spam is when tags are misused (e.g. someone claims to be the canonical source of an article when it was someone else). The second is “screen scraping,” which is when someone visits your site and grabs all the data (made so much easier by semantic web data tags) for a malicious purpose (e.g. re-compiling MLS data from your website to sell to marketers). Since not many agents have anti-scraping protection (e.g. Distil, http://realestate.distilnetworks.com) on their websites, making screen scraping easier with such extremely detailed semantic web tags is not a great idea.
What does this mean to real estate professionals?
So, to sum up: use the three tags described earlier (author, publisher, and canonical) and encourage others to do the same.
What does this mean to industry wonks?
Let’s continue the discussion about the use of semantic web tags in IDX and other contexts. Let’s consider whether we should be engaging with schema.org, and also consider the risks of doing so, along with the risk mitigations for which we should advocate.