Property sellers expect that, when they give you listing information, it will be used only to market and sell their property. While that means you need to give the listing information exposure on the Internet, it is your responsibility to take reasonable steps to ensure that the data is not ‘scraped’ off of your web site and used for illicit purposes, such as direct-marketing to the seller or display in unapproved places on the Internet. When a seller gives you their information and then receives calls that evening, or print ads for move-related property and settlement services soon after, they know their information is being used for undesired purposes and that you have, intentionally or otherwise, betrayed their expectations for the limited use of their information.
Most real estate professionals don’t realize how easy it is for someone to send out a web-crawling program to collect listing information, or what that information can then be used for. If someone can ‘scrape’ the property address from your site, they can send mail to the owner or, by having their computer look the address up in a reverse telephone directory, create a list of phone numbers to call. Sometimes the address appears in your list of listings, in search results or on property detail pages, and sometimes a poorly integrated mapping solution will include a link with the address embedded in it. Price and MLS number are other pieces of information that data pirates especially desire. Email addresses are harvested for spam. Given how much spam is used for phishing and to send viruses and spyware, your own risk increases as your email address is spread around the Internet.
While no method of preventing scraping is foolproof, here are some reasonable steps you can take to cut down on the problem:
First, make it hard for the web-crawling robot to go from listing to listing:
- Require that some search criteria are filled in and only return a reasonable, limited number of search results for any search
- Do not allow a search by numeric MLS number – it’s too easy for a scraping program to search for #111123, #111124, #111125, etc. That’s called ‘incrementing’.
- Programmers shouldn’t use numbers to identify listings – e.g. http://yourdomain.com/ListingDetail?id=1234. That number can be used for ‘incrementing’ – whether in the web site address or sent via a web form.
Then, make it hard to collect specific information off of the page:
- ‘Render’ information such as address and price as a graphic or using Flash. If the information is not text on the web page, it’s difficult to scrape. Note that while this is one of the most important steps to take to stop scraping, it also makes that information unavailable to visitors with limited vision who rely on screen readers!
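The steps above that target ‘incrementing’ and unlimited searches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular vendor’s site: the names `SECRET_KEY`, `MAX_RESULTS`, `listing_token` and `search` are hypothetical, and the listings are assumed to be simple dictionaries. The idea is to publish an opaque token instead of the internal listing number, to refuse searches with no criteria, and to cap the number of results returned.

```python
import hashlib
import hmac

# Hypothetical values for illustration; a real site would load the secret
# from protected configuration rather than hard-coding it.
SECRET_KEY = b"replace-with-a-long-random-secret"
MAX_RESULTS = 25  # assumed cap on results returned for any one search


def listing_token(listing_id: int) -> str:
    """Derive an opaque, non-sequential public token from an internal ID.

    Consecutive IDs (1234, 1235, ...) produce unrelated tokens, so a
    scraper cannot walk the listings by adding one to the last value.
    """
    digest = hmac.new(SECRET_KEY, str(listing_id).encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def search(listings, criteria):
    """Return at most MAX_RESULTS matches; reject searches with no criteria."""
    if not criteria:
        raise ValueError("at least one search criterion is required")
    matches = [
        item for item in listings
        if all(item.get(field) == value for field, value in criteria.items())
    ]
    return matches[:MAX_RESULTS]
```

With this approach the site would keep a lookup from token back to listing on the server, so a detail page address would carry something like the 16-character token rather than `id=1234`.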
There are certainly additional steps needed to stop the more persistent and sophisticated data scrapers – rate limiting, CAPTCHA testing, pattern monitoring and so forth. Clareity helps companies address those through its security assessment service – but if you have taken the steps above, you have at least taken reasonable steps to protect the seller against the most basic bad guys. Unfortunately, these days the bad guys are getting even more sophisticated, and the only way to stop them is to use sophisticated defensive tools like Distil (http://realestate.distilnetworks.com).
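Of the additional measures mentioned above, rate limiting is the easiest to illustrate. The following is a minimal sketch, assuming each visitor can be identified by some key such as an IP address or session ID; the class name `RateLimiter` and its default limits are hypothetical, and it is not how any commercial product is known to work. It keeps a sliding window of recent request times per client and refuses requests once the limit is reached.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit=30, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id):
        """Record the request and return True, or return False if over limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Discard request times that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A site might call `allow()` at the top of each search or listing-detail request and, when it returns False, show a CAPTCHA or an error page instead of the listing data. Sophisticated scrapers spread their requests across many addresses, which is why this kind of check is only one layer among several.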
Whether you maintain your own web site, have a webmaster, or send your listings to various other sites for display, you may want to make sure that they are taking appropriate steps to stop ‘scraping’, so that when a property seller gives you information needed to market and sell their property, you’ve taken reasonable steps to ensure it is only used for that purpose, protecting the seller’s privacy.
About the author: Matt Cohen is Clareity Consulting’s Chief Technologist and leads its security assessment practice. Matt has spoken at many conferences, workshops and leadership retreats around the country on security related topics, and is a well-regarded real estate industry expert on software design, product management, project management, data center reliability, scalability, and security. Clareity Consulting was founded in 1996 to provide information technology consulting to the real estate industry and its related businesses.