“Is Google a scraper?” That was the question at the center of the news stories surrounding MIBOR’s decision to tell a broker not to let Google index their site. The quick answer is “No” – there were no restrictive terms of service and no limiting robots.txt file on the site, so technically Google did absolutely nothing wrong. But the question being asked … that was the wrong question.
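For context, opting a site out of indexing is trivial when a broker actually wants that: a robots.txt file at the site root using standard Robots Exclusion Protocol syntax would have told Google (and any other well-behaved crawler) to stay out. A minimal example:

```
# Block all compliant crawlers from the entire site
User-agent: *
Disallow: /
```

The absence of any such file on the broker’s site is exactly why Google did nothing wrong by indexing it.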
Finally, after the hype died down, the ‘real’ question started to emerge: “Should or could MLSs require that brokers not allow individual listing pages to be indexed by search engines?” Listings are given to brokers for advertisement unless the seller opts out of online advertising, and since most consumers search for property online, search engines are an important part of online marketing. They will remain an important component of giving listings proper exposure and should be leveraged as much as possible. Also (and obviously) the MLS could probably make rules pertaining to an IDX feed, but realistically not regarding the broker’s own listings. Still, whether search engines should be allowed to index the sites is again the wrong question.
What’s the real concern here? We’ve had IDX for some time – was it really okay only while it was invisible to search engines? Of course not. The real concern about ‘data scraping’ arises only when the data is misused – that is, used for a purpose other than the one intended by the homeowner when they provided the information to the real estate professional, and by that professional when they added their own creative descriptions to create the often-copyrighted listing content.
What kind of misuse has there traditionally been? When a site is easy to scrape, someone can come along and grab the listings in an automated way for display in an unauthorized location. Data can also be recompiled to create derivative products or to market back to the consumer. If the scraper adds an automated reverse telephone lookup to scraped data, someone who gives a real estate professional information to market their property one fine morning may find themselves called by moving companies and other service providers that very evening – and it reflects poorly on the real estate professional when that happens. So, the real question we need to ask ourselves is, “How do we stop the misuse of data while not compromising the ability of the broker to market properties and promote the web sites on which the properties are located?”
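One common first line of defense against that kind of automated gathering (the class and parameter names below are illustrative, not from any MLS system) is per-client rate limiting: a human browsing listings makes a handful of requests per minute, while a scraper makes hundreds. A minimal sliding-window sketch in Python:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `max_requests` per client within `window` seconds."""

    def __init__(self, max_requests=30, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # client_id -> recent request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[client_id]
        # Drop timestamps that have aged out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # likely automated gathering; throttle or challenge
        q.append(now)
        return True

# A client making 4 rapid requests against a 3-per-10-seconds limit:
limiter = RateLimiter(max_requests=3, window=10.0)
results = [limiter.allow("1.2.3.4", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # → [True, True, True, False]
```

In practice this would sit in front of the listing-detail pages, with the limit tuned so ordinary consumers never hit it, and a CAPTCHA or block served to clients who do.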
Let’s look at the type of requests consumers put into search engines. I believe there has been a lot of hype about needing the whole address in the web page title, and about individual addresses needing their own website. Do consumers really expect to type in “100 Test Street in Testville, TN” and come back with a website? I don’t think so – not at this point. We all know how the traffic comes in via web site search terms: “houses in Testville, TN” … “Testville Tennessee real estate” … “homes in Testville” … “Subdivision Name in Testville”. So, city, state and neighborhood/subdivision are obvious candidates to allow a search engine to index. Key attributes might also be searched on – “lake view”, etc. But the full address? Price? Bedrooms? Bathrooms? Square feet? Lot size? I say, “ridiculous!” Are they needed for search engine optimization (SEO)? I believe the answer is an emphatic, “No”. Those bits of data don’t help search engines index the listing for online marketing, BUT they are prone to misuse when programmatically gathered (scraped). So there is no reason why MLSs should not require that websites put anti-scraping mechanisms in place on those key items, while allowing search engines to programmatically gather other information for the purpose of providing free links back to the web site.
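To make that split concrete, here is a hypothetical sketch (the field names and functions are mine, not from any MLS specification) of building the search-engine-visible title and description from location and attribute data only, while the scrape-prone specifics never enter the indexable text:

```python
def seo_title(listing):
    """Build an indexable page title from location data only -- no address,
    price, beds/baths, or other fields prone to automated harvesting."""
    parts = [listing.get("subdivision"), listing["city"], listing["state"]]
    location = ", ".join(p for p in parts if p)
    return f"Homes for sale in {location}"

def seo_description(listing):
    """Key marketing attributes are fine to expose; specifics are not."""
    attrs = ", ".join(listing.get("features", []))
    base = f"Real estate in {listing['city']}, {listing['state']}"
    return f"{base}. {attrs}." if attrs else f"{base}."

listing = {
    "city": "Testville",
    "state": "TN",
    "subdivision": "Subdivision Name",
    "features": ["lake view"],
    # address, price, bedrooms, etc. are deliberately absent from the
    # indexable text and would be served through a protected mechanism
}
print(seo_title(listing))        # → Homes for sale in Subdivision Name, Testville, TN
print(seo_description(listing))  # → Real estate in Testville, TN. lake view.
```

The search engine still gets everything it needs to send “homes in Testville” traffic to the page; the protected fields would be rendered only to clients that pass whatever anti-scraping checks the site employs.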
But, anti-scraping begins at home. Less than 5% of MLS public sites have any anti-scraping measures in place to speak of – and good measures are rarer still. But I digress – before we launch into a tangent of anti-scraping tactics, we need to agree on a strategy for the level of protection required to balance marketing with information security and privacy, and we must set policy that is reflected in contract terms pertaining not only to industry sites but to syndication endpoints as well.
Note – I’ve been traveling for more than a week and am writing this at o-dark-thirty in an airport parking lot – it’s not my finest piece of writing – sorry! Hopefully I’m getting the ideas across anyway…