How to protect your website's content from copyright theft?

The big summer school holidays are approaching, and thousands of students are flocking to that alluring, endless ocean of possibilities, the Internet, hoping to make a quick fortune there and roll up to school next September 1st in a brand-new Lamborghini.

Of course, the ambition of young people is commendable, but our young innovators always try to follow the path of least resistance: "I want to do nothing and still have the money pour in!" In this pursuit, our "future Zuckerberg" buys a parser and begins methodically stealing content from sites and blogs, churning out doorway pages and satellite sites and polluting the search engines.

Search engines try to fight this phenomenon by creating all sorts of filters, but what are the victims of the theft supposed to do? That is, the people who know firsthand that earning money on the Internet is hard work: bloggers, for whom every post is nothing less than their own child, and owners of information projects, who invest in content and hire professional copywriters and journalists?

Until recently they could only sit and watch as the young innovators "freely downloaded" their content by the megabyte and turned it into doorway pages and link dumps for selling links. But not anymore!

How did it start?

I didn't think much about protecting the content on my own sites until, as the famous Russian proverb about thunder and the peasant goes, I found myself in the role of a content-theft victim. One day I discovered that my wife's site, which she had diligently and lovingly filled with content for more than five years, had been picked apart for doorways by one (and maybe more than one) novice black-hat optimizer. I won't describe my wife's reaction to the news in detail. I'll just say that the tragedy of the situation was that Yandex indexed the stolen articles first, not the originals from our site. How could this happen?

It's well known that search engines love frequently updated sites. Search spiders visit your site more often if new articles appear constantly. The problem is that an ordinary blogger physically cannot produce that much content. But the average thief can! Armed with a parser, he configures his pseudo-site to auto-publish stolen content as often as once every five minutes! In the eyes of search engines, such a site looks like nothing less than a dynamic news portal. Especially if you let the domain age properly first...

How did I do?

I went to Google for a solution. Here's what I found:
  • Scripts that disable selecting/copying text in the browser: kindergarten stuff, nothing more. They won't stop a parser, but they will definitely annoy ordinary users!
  • Encrypting the content and rendering it for visitors via JavaScript: not an option if search engines are your main source of traffic.
  • WordPress plugins that delay updates to the RSS feed: they help only if the content is stolen exclusively by RSS grabbers, and even then not always.
  • Some improbable databases of malicious robots, mostly from China and Russia, of unclear origin, supposedly used by the U.S. government and a couple of foreign universities. Apparently they have their own rackets going on over there too...

Numerous blog discussions of this issue unanimously end with the conclusion: "If you don't want it stolen, don't publish it on the Internet!" Maybe I searched badly, but it would seem that since the problem has existed for a long time, a solution should exist too. But no.

What does the protection need to do?

It needs to do the following:
  • Deny access to the content to any automated system except a specified whitelist of useful bots (search engines, RSS aggregators, etc.), while freely admitting all human visitors to the site. In other words, before the page even loads, we need to determine who is trying to download it: a bot or a person.
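As a purely illustrative sketch of the whitelist idea (not BOTFILTER's actual logic, which isn't public), here is a naive classifier in Node.js; the bot names and patterns are just examples:

```javascript
// Naive illustration of the whitelist idea: admit known good spiders,
// flag anything that openly identifies itself as an automated client.
// Real malicious bots spoof their User-Agent, so a serious system
// has to look much deeper than this header.
const GOOD_BOTS = ['Googlebot', 'YandexBot', 'bingbot'];

function classify(userAgent) {
  if (GOOD_BOTS.some((name) => userAgent.includes(name))) return 'good-bot';
  if (/bot|crawler|spider|curl|wget/i.test(userAgent)) return 'bad-bot';
  return 'human';
}
```

Header-based checks like this are trivially defeated, which is exactly why the classification has to happen at the traffic level, before the first page is ever served.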

So, not finding a suitable solution, I began developing my own. I needed a content-protection system independent of the site engine, because my wife has several sites, all on different engines. Two weeks of digging through web-standards documentation showed me the way to solve the problem, and the solution lies at the intersection of several technologies. In the end, a software package took shape that it would simply be criminal not to turn into a public service!

What happened?

It took three months to develop a ready-to-use system consisting of a "brain" that analyzes the site's incoming traffic, a personal account with all the necessary settings, and billing.
A small digression: when estimating the labor involved, don't rely on that three-month figure. The actual man-hours barely add up to a month; I just worked in between boiling bottles for my son, who is four months old now :)

So, allow me to submit my creation to your judgment:


The BOTFILTER system identifies bots before the first page of the site even loads.
From this follow the system's main features:
  • protection of content from copying by malicious robots
  • spam protection

Yes! Captcha is no longer needed! Robots that have learned to recognize captchas can take a hike :)

Among the extra goodies currently working:
  • Rate limiting of access to the site. If some "kind" person starts refreshing a heavy page of your site very rapidly, he can bring the site down. BOTFILTER won't allow it: within a couple of minutes it restricts access for that rascal.
  • A DNS editor for domains hosted on the service, with support for wildcard records. Just a handy thing if your hosting doesn't offer it.
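The rate-limiting feature above can be sketched as a simple fixed-window counter keyed by client IP. This is an assumed mechanism for illustration only, not BOTFILTER's actual implementation; a production limiter would need a sliding window and shared state (e.g. Memcached) across servers:

```javascript
// Minimal fixed-window rate limiter keyed by client IP.
// Hypothetical sketch; the window size and request cap are arbitrary.
const WINDOW_MS = 60 * 1000;   // 1-minute window
const MAX_REQUESTS = 30;       // allow 30 requests per window

const counters = new Map();    // ip -> { windowStart, count }

function allowRequest(ip, now = Date.now()) {
  const entry = counters.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // First request in a fresh window: reset the counter.
    counters.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}
```

Once `allowRequest` returns false, the "rascal" gets an error page instead of your heavy content until his window expires.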

Connecting your website or blog to BOTFILTER is very simple: just change the NS servers for your domain and enter your site's real IP address in the system settings. After that, all traffic to your site passes through the "brain" of the system: the spiders you yourself have whitelisted and real people see your site normally, while malicious bots get a "403 Access Denied" error instead of the site.

How do you check that the system works? Also very easy. After connecting and after the DNS records update (your site should start pinging at the new IP address), try refreshing your site very rapidly with F5. You will be shown a warning message.
You can also test the protection with curl or wget if you are on Linux: just run
# curl your-site.ru
and instead of the site you will get the message "You are a bot, dear!", while opening it in a browser confirms that the site loads as usual.

The system is quite reliable. The architecture is such that even if BOTFILTER's "brain" fails, customers' sites will continue to load normally. A more detailed description of the system can be found on the site.

Plans for the future

First, if I don't get pelted with tomatoes, in the near future I plan to tell the Habr community in more detail about the internals of the system. Moreover, happy owners of their own servers will then be able to build a similar content-protection system for themselves by crossing haproxy, nginx, Node.JS and Memcached. Those who don't have their own server will, of course, find it easier and cheaper to use my service.

Second, I plan to improve the system's resilience by increasing server capacity and purchasing a 1 Gbit channel. In testing, the system currently handles up to 5,000 requests per second, but that clearly won't be enough if one of the clients suddenly attracts a serious real DDoS attack.

Third, although I never planned to protect sites from DDoS (there are specialized services for that), in the near future the BOTFILTER system will protect sites from overload. We all remember the story of the Skolkovo site. If they had used BOTFILTER, their site would not have gone down.

Fourth, I want to try to enter the foreign market. The site's interface has already been translated into English; work on translating the documentation is underway.

Instead of a conclusion

If anyone has already guessed how all this works, I kindly ask you not to write about it in the comments. Let's not make the thieves' lives easier. Let them rack their brains.

I am launching the project from absolute zero, so I can't do without the support of the Habr community. If you have friends who are bloggers or site owners, please don't consider it too much work to send them a link to this post. I am confident that my service will prove useful to people.

