How to Detect and Stop Web Content Theft

You created a splendid blog or business website and worked on it day and night, only to find one day that your content has been stolen by content scrapers and republished on their spammy websites. The second shock comes when you see these copied pages somehow ranking higher than your original pages in search engines. Content theft is becoming a major problem for web publishers, especially newcomers who are either unaware of it or have no idea how to deal with it. Believe it or not, it's only a matter of time before your website falls victim to plagiarism. Dealing with content scrapers is not a cakewalk, but it's not rocket science either. With a few preventive measures and the right tools, you can detect content theft quite easily and deal with the culprits. Let's see how to monitor and deter content theft in the best possible way. After going through this guide, you'll be able to tackle content scrapers effectively.

We'll start with various techniques to monitor and detect content theft and conclude with methods to deter or deal with content scrapers. Pick the ones that fit your needs and give you the best results.

Your original content is protected by copyright the moment you publish it on your website; in most countries, no registration is required.

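That remains true even if you publish under a license, such as Creative Commons, that allows others to copy your content and republish it on external domains. You still own the copyright; the license only spells out what others may do with the content.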

Spying on Content Scrapers

Before you confront a content scraper, you need to know how to detect content theft and identify the culprits. Fortunately, it's not hard, and most of the detection process can be automated.

So, let's get started and check out various techniques to reliably detect content theft.

  • Quick Manual Search: Before we dive into automated and more sophisticated techniques, let's go through a simple method to quickly check whether a specific article or post has been copied elsewhere.

    Start by picking a popular article on your website that you want to check for content theft. Copy a distinctive sentence or two from it to the clipboard (Google limits queries to 32 words, so keep the snippet short). Enclose the copied text in double quotes, as shown below, and make sure the copied text itself contains no double quote characters.

    "short paragraph text copied from the article"
    Paste the quoted text into Google's search box and run the query. The results list the indexed web pages that contain that exact text.

    From there, click through the search results to find out which websites have copied the article. This is one of the crudest, most basic ways to perform a quick check for content theft.
  • Use Plagiarism Detection Service: If there is one plagiarism detection service that's popular across the globe, it's none other than Copyscape.

    Both free and premium plans are available for content publishers. Simply enter a URL from your website, and Copyscape lists pages that may contain copies of its text.

    Large websites with huge archives should consider the premium plan for more extensive and reliable plagiarism reports. You can also use its Copysentry service to receive automatic alerts about possible content theft.
  • Monitor and Get Alerted - The Smart Way: Now, let's dive into Google Alerts. It's one of the most powerful yet lesser-known tools content publishers can use to their advantage.

    We'll use it to set up automated alerts that notify you of possible content theft. Start by fine-tuning the global settings that control how often and where you receive alerts.

    Once the notification settings are done, it's time to set up multiple alerts for your website. Configuring these alerts has a dual advantage.

    Apart from alerting you to possible plagiarism, they let you monitor new backlinks as they appear on the web. In a nutshell, you find out who is talking about your website and its content, and where.

    Let's create our first alert.

    Step 1: Type your domain name in the box and click the 'Show options' link to expand the options menu.

    Step 2: Make sure to select the 'All results' option for the 'How many' attribute.

    Step 3: Finally, click the 'Create Alert' button to finish the process.

    If you get overwhelmed with the alerts and find it difficult to follow all of them, change the 'All results' option to 'Only the best results'.

    I recommend creating alerts for the following search terms.

    yourdomainname
    yourdomainname.com
    http://www.yourdomainname.com
    
    Though these search terms may seem redundant, I still recommend creating a separate alert for each of them.
  • Use Reverse Image Search: Another handy tool to detect plagiarism is Google image search. If you frequently use customized images on your website, use this option to find external domains using your site's graphic content.

    There are three ways to start a reverse image search. The first is to drag the image from your desktop onto the search box. Make sure you use the same image file that is published on your website.

    The second method is to provide the direct URL of the image. In that case, make sure image hotlinking is not disabled on your server, or Google may be unable to fetch the image.

    Last but not least, you can use the regular file selection dialog to upload the image file. Whichever method you choose, reverse image search is a reliable way to detect image theft.
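If you repeat the quick manual search often, a few lines of Python can build the exact-phrase query for you. This is a minimal sketch: it replaces embedded double quotes, collapses whitespace, trims the snippet to Google's 32-word query limit, and URL-encodes the result into an ordinary Google `search?q=` URL. The function name `exact_phrase_search_url` is hypothetical.

```python
from urllib.parse import quote_plus

GOOGLE_MAX_QUERY_WORDS = 32  # Google ignores query words beyond this limit


def exact_phrase_search_url(snippet: str, max_words: int = GOOGLE_MAX_QUERY_WORDS) -> str:
    """Return a Google search URL that looks for the snippet as one exact phrase."""
    # Embedded double quotes would end the phrase early, so turn them into spaces.
    # split() also collapses line breaks and repeated spaces from the copied text.
    words = snippet.replace('"', " ").split()[:max_words]
    phrase = '"' + " ".join(words) + '"'
    return "https://www.google.com/search?q=" + quote_plus(phrase)
```

For example, `exact_phrase_search_url('Dealing with content scrapers is not a cakewalk')` returns `https://www.google.com/search?q=%22Dealing+with+content+scrapers+is+not+a+cakewalk%22`, which you can open in a browser.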
Use any of the methods above to look for content scrapers and identify the websites copying your content.
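Once a search turns up a suspect page, you may want a rough measure of how much of your article it contains. One simple, well-known technique is word shingling: split both texts into overlapping five-word sequences and compute the fraction of your article's sequences that also appear in the suspect text (a containment score). This is an illustrative sketch, not how Copyscape or any other service works internally; the five-word size and the function names are assumptions.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping `size`-word sequences in the text."""
    # Lower-case and keep only word characters so punctuation doesn't hide matches.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0  # text shorter than one shingle: nothing to compare
    return len(source & shingles(suspect, size)) / len(source)
```

A score close to 1.0 means nearly every five-word run of your article appears in the suspect page. Light rewording lowers the score, so treat it as a hint rather than proof.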

Dealing with Content Scrapers

Now that we're familiar with various methods of detecting content theft, it's time to learn how to handle the thieves. We'll start with the basics and gradually explore all the options available to crack the whip against the culprit.

So, here we go!

  • Collect Information: Before you start the content removal process, collect important information about the website and its owner. Gather the following data as soon as possible.

    Start with a WHOIS lookup to find important information associated with the website's domain name. The most vital piece of data is the name of the domain registrar. The name server entries in the WHOIS record often reveal the site's web host as well, although they may point to a separate DNS provider instead.

    The last important piece of information is the domain registrant's name and organization. It is sometimes hidden behind a privacy service such as WhoisGuard.

    Most spammy websites hide this information so that the owner's identity is not revealed to the public. In such cases, simply skip it. At this stage, also take screenshots of the page with the stolen content and of its search engine result entry.

    Finally, visit the website as a normal visitor and check whether its pages show any advertisements. If they do, view the page's source code and note which ad network serves them.
  • Ask to Comply: Once the information has been collected, move on to the next step. Such websites often provide no way to contact the site owner.

    If there is a contact page with a web form or an email address, send a message asking the site owner to remove the stolen content within one or two days. Do not reveal what you plan to do if your valid demands are not met by the deadline.

    In your request, share links to both the original and the copied pages so the stolen content is easy to find and remove. Then wait for a reply until the deadline.

    If the site owner refuses to comply and remains defiant, move on to the next step.
  • File a Complaint to the Advertiser: The most potent weapon against a content thief is getting the advertisements on their website blocked.

    You can do this by sending a DMCA notice to the ad network the site uses. For example, if the site uses a popular ad network such as Google AdSense, report the copyright infringement through Google's online reporting form.

    If the website uses another ad network, ask its support team about the DMCA notification process. In most cases, this strategy works like a charm.
  • File a Complaint to the Web Host/Domain Registrar: If there are no ads on the website, file a DMCA complaint with its web hosting company or domain registrar.
    Almost all web hosting companies provide a page for reporting content theft. You just need to look for it on the hosting company's website.
  • File a Complaint to the Search Engines: Another effective strategy is to get the infringing pages removed from the index of popular search engines such as Google and Bing.

    Both search engines provide online forms for submitting DMCA notices.

    You must be signed in to the respective Google or Microsoft account before you can send a DMCA notice through these forms.

    Getting the copied pages removed from the search index cuts off their organic traffic, a huge blow to the site's growth. This step often forces the site owner to comply.
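The WHOIS part of the "Collect Information" step can be partly scripted. The sketch below sends a query over the plain-text WHOIS protocol (RFC 3912, TCP port 43) to Verisign's server, `whois.verisign-grs.com`, which only covers .com and .net domains, and then pulls out the fields mentioned above. The field labels follow the style of .com/.net records; other registries may label things differently, so treat the `WANTED_FIELDS` table and both function names as assumptions. For .com domains, registrant details usually live on the registrar's own WHOIS server, which the record names in its "Registrar WHOIS Server" field.

```python
import socket

# Lower-cased WHOIS labels we care about, mapped to our own keys (assumed label style).
WANTED_FIELDS = {
    "registrar": "registrar",
    "registrar whois server": "registrar_whois_server",
    "registrar abuse contact email": "abuse_email",
    "name server": "name_servers",
    "registrant name": "registrant_name",
    "registrant organization": "registrant_org",
}


def whois_query(domain: str, server: str = "whois.verisign-grs.com") -> str:
    """Send one WHOIS query (plain text over TCP port 43) and return the raw reply."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it's done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(raw: str) -> dict[str, list[str]]:
    """Collect the wanted 'Label: value' fields; repeated labels keep every value."""
    fields: dict[str, list[str]] = {}
    for line in raw.splitlines():
        # Split on the first colon only, so URLs in values stay intact.
        label, sep, value = line.strip().partition(":")
        key = WANTED_FIELDS.get(label.strip().lower())
        if sep and key and value.strip():
            fields.setdefault(key, []).append(value.strip())
    return fields
```

Usage would look like `parse_whois(whois_query("scraper-site.com"))`, where the domain is a placeholder. The "abuse_email" value, if present, is often the quickest route to the registrar's complaints team.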
No matter how you file a complaint, be sure to include the screenshots and other visual proof you collected initially. Done correctly, one of the methods above will usually knock out the culprit completely.
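The "Ask to Comply" step boils down to a short message with both URLs, a firm deadline, and a mention of your evidence. Here is a hypothetical Python helper that drafts such a message. The wording and the `removal_request` function are my own assumptions rather than an official template. In line with the earlier advice, it states the deadline without revealing what you'll do if it passes.

```python
from datetime import date, timedelta


def removal_request(original_url: str, copied_url: str, sent_on: date,
                    days: int = 2, proof_files: tuple[str, ...] = ()) -> str:
    """Draft a short removal request that names both URLs and a firm deadline."""
    deadline = sent_on + timedelta(days=days)
    lines = [
        "Hello,",
        "",
        f"The page {copied_url} republishes my article {original_url} without permission.",
        f"Please remove the copied content by {deadline.isoformat()}.",
    ]
    if proof_files:
        # Mention the screenshots you collected so the owner knows you have evidence.
        lines.append("Screenshots attached: " + ", ".join(proof_files))
    lines += ["", "Thank you,", "[Your name]"]
    return "\n".join(lines)
```

Calling it with `date.today()` as `sent_on` gives a ready-to-edit draft you can paste into the site's contact form or email.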

Taking Preemptive Measures to Stop Content Theft

Prevention is better than cure. It's an old saying which perfectly applies in this case too. There are several techniques and methodologies to preemptively deter content theft attempts. Let's check out some of these methods.

  • Protect RSS Feed: Websites scraping content from other sites often grab the RSS feed to automatically import and publish the original content.

    The best strategy to deal with this situation is to add a custom feed footer section which not only includes a copyright declaration but also includes a link back to the original article. Scrapers generally avoid such RSS feeds to ensure their malpractice is not exposed to the general public.

    Let's check out some feed protection tools.

    If you're using Google's Blogger platform, use the following process to inject a custom feed footer section within the blog's RSS feed.

    Within your blog's dashboard, go to Settings → Other → Post Feed Footer and write the custom code (CSS/HTML/JavaScript) for your own feed footer.

    For this to work, the feed must offer full-length entries. If you offer partial (summary) feed entries, the custom feed footer will not appear.

    Self-hosted WordPress users can use the following options for feed protection.

    Many self-hosted WordPress blogs use the Yoast SEO plugin. If you're not using it, install it. Then go to SEO → Dashboard → Features → Advanced settings pages (menu names vary between plugin versions).

    Enable this option to activate the advanced features which include RSS customization as well.

    After activation, go to SEO → Advanced → RSS. Here you can create the feed footer section. This screen provides several placeholder variables that make it easy to insert vital information, such as a link to the original post, into the feed footer.

    For technically savvy users, there is another handy way to create a custom feed footer section.

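    // Paste at the end of functions.php. That file already begins with <?php,
    // so this snippet needs no opening or closing PHP tag of its own.
    if ( ! function_exists( 'custom_rss_feed_footer_section' ) ) {

        // Append a footer to each post's content or excerpt, but only in feeds.
        function custom_rss_feed_footer_section( $content ) {
            if ( is_feed() ) {
                $content .= '<p>Write custom feed footer text here. Original post: <a href="'
                    . esc_url( get_permalink() ) . '">' . esc_html( get_the_title() ) . '</a></p>';
            }
            return $content;
        }

        // Feeds output the summary via the_excerpt_rss and the full text via the_content.
        add_filter( 'the_excerpt_rss', 'custom_rss_feed_footer_section' );
        add_filter( 'the_content', 'custom_rss_feed_footer_section' );
    }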
    Paste this snippet at the end of your theme's functions.php file (preferably in a child theme, so a theme update doesn't overwrite it). Then replace the placeholder text inside the quotes with your own footer text.

    If you find a website republishing your modified RSS feed, you can often simply ignore it. The footer's link back to the original source helps search engines identify you as the original publisher, so the copy is unlikely to outrank your page for long.
  • License Content: As already stated at the beginning of this article, your content is automatically copyrighted whenever it is published on your website.

    Although it isn't mandatory, you can add a simple copyright statement to your website's footer. Some publishers, on the other hand, allow their content to be reused in specific ways.

    If you want such a custom license, you can quickly generate a Creative Commons license that tells people exactly how they may reuse your content.

    After license generation, you also get a handy HTML embed code to prominently display it on your website. The best place for displaying Creative Commons licenses is at the end of every article.

    For publishers who don't allow any kind of content reuse, a simple copyright statement isn't enough. Add the following two things to make this clear to everyone.

    Create a Copyright Policy page and display its link within the footer.

    If you have enough space in the footer, or you're using a multi-column footer, embed one of Copyscape's warning banners, which you can grab from the Copyscape website.

    Apart from the footer, there's no harm in including one of these banners on your Copyright Policy page. In a nutshell, make sure visitors can clearly see that your copyright policy applies to the published content.
  • Watermark Images: Another effective method to deter content thieves is to watermark your images. A visible watermark discourages many casual image thieves.

    WordPress users can install an image watermarking plugin to process uploaded images automatically. On other platforms, online watermarking tools let you protect images right from the web browser.
  • Disable Image Hotlinking: Some lazy content thieves link directly to images hosted on your web server. This way, they steal not only your content but also your server's bandwidth.

    The remedy is to disable image hotlinking on your web server. There are several ways to do this. The most common method is to add the following directives to the .htaccess file in your website's root directory.

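    RewriteEngine On
    # Let through requests that send no Referer header (direct visits, privacy tools).
    RewriteCond %{HTTP_REFERER} !^$
    # Let through requests referred by your own site (dots escaped, host must end there).
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com(/|$) [NC]
    # Never redirect the replacement image itself, to avoid a redirect loop.
    RewriteCond %{REQUEST_URI} !hotlink\.gif$ [NC]
    RewriteRule \.(jpe?g|png|gif)$ http://path-to-alternate-image.tld/hotlink.gif [NC,R,L]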
    The dummy URL in the last line of the directives should be replaced with the URL of the actual replacement image to be automatically displayed in place of the original image.

    This method works with sites powered by the Apache web server. If you're using Nginx instead, add the following block inside the relevant server block of your Nginx configuration file.

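    location ~* \.(gif|png|jpe?g)$ {
        # Allow requests with no Referer ("none"), with a Referer stripped by a
        # proxy or firewall ("blocked"), or coming from your own domain.
        valid_referers none blocked yourdomain.com *.yourdomain.com;
        if ($invalid_referer) {
            return 403;
        }
    }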
    In both snippets, replace the dummy domain name with your website's domain name wherever it appears. Though a plugin can do the same job, I strongly recommend the server-level method above.

    Another method to discourage image stealing is to disable right-click on the website. It's an extreme step and should be avoided as legitimate visitors find it very annoying.
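To reason about (or unit-test) what the hotlink rules allow, here is their gist expressed as a small Python function. It is a sketch only: it mirrors the `none` and own-domain parts of the Nginx rule (including subdomains), compares host names exactly so a look-alike referrer such as `yourdomain.com.evil.example` is rejected, and does not model Nginx's `blocked` case. The name `is_allowed_referer` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_allowed_referer(referer: Optional[str], domain: str) -> bool:
    """Return True if an image request carrying this Referer should be served."""
    if not referer:
        # "none": direct visits and privacy tools send no Referer header.
        return True
    host = (urlsplit(referer).hostname or "").lower()
    domain = domain.lower()
    # Exact host match or a true subdomain, so "yourdomain.com.evil.example" fails.
    return host == domain or host.endswith("." + domain)
```

You could, for example, run the Referer values from your access log through this function to see which outside sites embed your images.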

Wrapping It Up

Let's summarize and add a few more bits and pieces to this guide to cover the essentials of detecting content theft and deterring content scrapers.

We learned that there are several ways to detect content theft. My favorite is Google Alerts, and I'm sure you'll love it too. A quick manual search is best when you want to check a specific piece of content and don't want to wait for an automated alert.

While dealing with a content thief, the most potent action is filing a DMCA complaint via an ad network currently active on the website. Blocking the income stream is the best way to force the culprit to comply.
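To find out which ad network to contact, you look through the page source as described in the "Collect Information" step. The Python sketch below automates that for a saved copy of the page: it extracts `src` attributes and maps a few script hosts to their ad networks. The host table is illustrative and far from exhaustive (extend it as needed), and a regex scan is a rough heuristic rather than a real HTML parser. It also lists AdSense publisher IDs (`ca-pub-…`), which may help identify the account in your complaint. Both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# A few script hosts and the ad networks they belong to (illustrative, not exhaustive).
AD_SCRIPT_HOSTS = {
    "pagead2.googlesyndication.com": "Google AdSense",
    "securepubads.g.doubleclick.net": "Google Ad Manager",
    "cdn.taboola.com": "Taboola",
    "widgets.outbrain.com": "Outbrain",
}

SRC_ATTR = re.compile(r"""src\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def detect_ad_networks(html: str) -> set[str]:
    """Return the ad networks whose script hosts appear in the page's src attributes."""
    found = set()
    for src in SRC_ATTR.findall(html):
        # urlsplit handles absolute and protocol-relative ("//host/...") URLs;
        # relative paths such as "/logo.png" have no hostname and are skipped.
        host = (urlsplit(src).hostname or "").lower()
        if host in AD_SCRIPT_HOSTS:
            found.add(AD_SCRIPT_HOSTS[host])
    return found


def adsense_publisher_ids(html: str) -> list[str]:
    """Return the distinct AdSense publisher IDs (ca-pub-...) found anywhere in the page."""
    return sorted(set(re.findall(r"ca-pub-\d+", html)))
```

Save the page source from your browser and pass its text to these functions; an empty result doesn't prove the site has no ads, only that none of the listed hosts appeared.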

If you're publishing exceptional long-form content, a custom feed footer with a link back to the original post is a must.
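After adding a feed footer, it's worth confirming that every feed item really carries its link back. Here is a minimal sketch that assumes an RSS 2.0 feed, where WordPress places the full post text in `<content:encoded>` and the summary in `<description>`; it reports the titles of items whose text doesn't contain their own permalink. The function name is hypothetical, and Atom feeds would need different element names.

```python
import xml.etree.ElementTree as ET

# Full post text in WordPress RSS 2.0 feeds lives in <content:encoded>.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def items_missing_backlink(feed_xml: str) -> list[str]:
    """Return the titles of RSS items whose text doesn't contain their own permalink."""
    root = ET.fromstring(feed_xml)
    missing = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        # Check both the summary and the full text; the footer may be in either.
        body = (item.findtext("description") or "") + (item.findtext(CONTENT_ENCODED) or "")
        if not link or link not in body:
            missing.append(item.findtext("title") or "(untitled)")
    return missing
```

To check a live feed, download it (for a WordPress site, typically `https://yourdomain.com/feed/`, with your own domain) using `urllib.request.urlopen`, decode the bytes, and pass the text in. An empty list means every item links back to its original post.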

Let's finish with a few more tips and tricks to combat content theft.

  • Make sure you've verified your websites in Google Search Console (formerly Google Webmaster Tools) and Bing Webmaster Tools. This way, you can easily get vital reports on your site's content and its search presence.
  • Devote some time to writing a clear and concise copyright policy page. Include a section that summarizes the entire policy in layman's terms.
  • Consider using a security scanning plugin on your WordPress site to detect suspicious automated content scraping bots.
  • Last but not least, focus most of your energy on creating evergreen content. No matter how much your content is scraped, in the long run your site will flourish and outlast the content thieves.
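A security plugin is the easy route for spotting scraping bots. To illustrate one signal such tools may look at, here is a sketch that flags IP addresses making unusually many requests per minute, based on a web server access log in the common Apache/Nginx "combined" format. The 60-requests-per-minute threshold and the function name are arbitrary assumptions. Legitimate search engine crawlers can also be busy, so verify an IP before blocking it.

```python
import re
from collections import Counter
from collections.abc import Iterable
from datetime import datetime

# Client IP and [timestamp] at the start of a combined-format log line.
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def heavy_hitters(lines: Iterable[str], per_minute_limit: int = 60) -> dict[str, int]:
    """Return {ip: peak requests in any single minute} for IPs above the limit."""
    per_minute: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that aren't in the expected format
        ip, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        per_minute[(ip, when.strftime("%Y-%m-%d %H:%M"))] += 1
    peaks: dict[str, int] = {}
    for (ip, _minute), count in per_minute.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return {ip: peak for ip, peak in peaks.items() if peak > per_minute_limit}
```

For example, `heavy_hitters(open("access.log"))` (with your own log path) returns the busiest clients, which you can then look up before deciding whether to block them.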