HTML Obfuscation – February 19, 2010

I just finished a fun project for a customer today: preventing the content of an HTML page from being easily copied. It isn't possible to stop someone like me from getting at the content, but you can go pretty far toward stopping the average user.

There are a couple of commercial solutions available, but they fail in different ways. Most notably, they don't work in some browsers (I still need to test mine in Safari and Opera), and even in Internet Explorer, every solution I saw let you "save as text" and get the entire text content of the page. My customer is particularly interested in protecting the text content, so those options weren't acceptable.

Most of the project wasn't very interesting, just fighting with JavaScript, encodings, quoting and so on, but the last part, obfuscating the text, was pretty neat.

You can see an example of the protected HTML: protected.txt.
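To give a rough idea of this kind of text obfuscation, here is a minimal sketch in Python. It is not the script that produced protected.txt. It assumes the approach is to cut the text into short chunks, wrap each chunk in a span, and interleave decoy spans full of random letters that a CSS rule hides from visitors. The class names `q3` and `x9`, the chunk size, and the function `obfuscate_text` are all hypothetical.

```python
import html
import random

# Hypothetical class names. The page's stylesheet would need a rule like
# .x9 { display: none; } so visitors never see the decoy text.
REAL_CLASS = "q3"
DECOY_CLASS = "x9"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def obfuscate_text(text, chunk=3, seed=None):
    """Wrap short chunks of `text` in spans and interleave decoy spans.

    Stripping the tags from the result (a naive text extraction) yields
    the real text mixed with random letters.
    """
    rng = random.Random(seed)
    parts = []
    for i in range(0, len(text), chunk):
        real = html.escape(text[i:i + chunk])
        parts.append(f'<span class="{REAL_CLASS}">{real}</span>')
        # 0-2 decoys after each real chunk, so the pattern is irregular.
        for _ in range(rng.randint(0, 2)):
            junk = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, chunk)))
            parts.append(f'<span class="{DECOY_CLASS}">{junk}</span>')
    return "".join(parts)


print(obfuscate_text("Hello, world", seed=7))
```

The obvious weakness of this basic version is that every decoy carries the same class, so a single regular expression can strip them all. That is what varying the span attributes is meant to address.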

There are a couple of improvements that could be made. One is to add recursive encryption, running the already-encrypted output through several more layers, so that it's harder to recover the content by hand. (I once manually decrypted someone's JavaScript and almost gave up at the 8th layer, which turned out to be the last one, so I'll need to make sure to use more layers than that.) It'd also be nice to use tags other than span, but if I use div, IE inserts a paragraph break when saving as text, which makes it more obvious which text is the real text. Finally, I think that adding some attributes to the span tags, perhaps with a < in some of the attribute values, should make it a little harder to write a regular expression that strips out the deliberate obfuscation.
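The layering idea can be sketched like this. Note that this swaps real encryption for a much weaker stand-in: each layer XORs the bytes with a random one-byte key, Base64-encodes the result, and prepends the key so the decoder can peel that layer off. It's obfuscation, not security. The function names and the default of 12 layers are hypothetical; on a real page the unwrapping would have to be done by JavaScript shipped with the content.

```python
import base64
import random


def wrap_layers(payload, layers=12, seed=None):
    """Obfuscate `payload` (a str) through `layers` rounds of XOR + Base64.

    Each round picks a random one-byte key, XORs every byte with it,
    Base64-encodes the result, and prepends the key byte so the decoder
    can undo that round later.
    """
    rng = random.Random(seed)
    data = payload.encode("utf-8")
    for _ in range(layers):
        key = rng.randrange(1, 256)
        xored = bytes(b ^ key for b in data)
        data = bytes([key]) + base64.b64encode(xored)
    # Final Base64 pass so the outermost layer is printable text.
    return base64.b64encode(data).decode("ascii")


def unwrap_layers(blob, layers=12):
    """Reverse wrap_layers, removing the outermost layer first."""
    data = base64.b64decode(blob)
    for _ in range(layers):
        key, body = data[0], data[1:]
        data = bytes(b ^ key for b in base64.b64decode(body))
    return data.decode("utf-8")


blob = wrap_layers("<p>Protected text</p>", layers=10, seed=1)
print(len(blob), unwrap_layers(blob, layers=10))
```

The trade-off is size: Base64 grows the data by about a third per layer, so 12 layers make the payload roughly 30 times larger.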

The other neat thing about this project is that the protected HTML is generated dynamically from source .js and .html files. I'm planning to set the customer up with a TinyMCE editor, so I'll need to update the script to parse the HTML the editor generates in order to obfuscate it.
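Parsing editor output mostly means touching only the text nodes and leaving the markup alone. Here is a minimal sketch using Python's standard `html.parser` module, assuming the editor produces a reasonably well-formed HTML fragment. The class `TextNodeObfuscator`, the helper `obfuscate_html`, and the `transform` callback are hypothetical names; `transform` receives plain (unescaped) text and must return HTML-safe markup.

```python
import html
from html.parser import HTMLParser


class TextNodeObfuscator(HTMLParser):
    """Re-emit HTML, passing only text nodes through `transform`.

    Start tags are copied verbatim; <script>/<style> contents and
    whitespace-only text are left alone so the page keeps working.
    """

    RAW_TAGS = {"script", "style"}

    def __init__(self, transform):
        super().__init__(convert_charrefs=True)
        self.transform = transform
        self.out = []
        self.raw_depth = 0

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in self.RAW_TAGS:
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.RAW_TAGS and self.raw_depth:
            self.raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.raw_depth or not data.strip():
            self.out.append(data)
        else:
            self.out.append(self.transform(data))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def obfuscate_html(source, transform):
    parser = TextNodeObfuscator(transform)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(obfuscate_html("<p>Hi &amp; bye</p>", lambda t: "[" + html.escape(t) + "]"))
```

Any chunk-and-decoy function could be plugged in as `transform`. End tags are rebuilt in lowercase, which shouldn't matter for editor output.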


Questions? Have Anything to Add?

I once wrote a web page that had its contents decompressed on the client side by a JavaScript implementation of LZW.

Posted by Peter V on February 19, 2010, 7:31 pm
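For anyone curious about the LZW approach in the comment above, here is a rough illustration of the classic algorithm in Python (not the commenter's JavaScript, and not their code). It works on bytes and returns the codes as plain integers; a real page would also need a compact way to pack those codes into the HTML, which this sketch leaves out.

```python
def lzw_compress(data):
    """Compress `data` (bytes) into a list of integer LZW codes."""
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # new code for the longer string
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The compressor used the code it created on the previous step,
            # before the decoder could add it; that entry is w + w's first byte.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)


codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), lzw_decompress(codes))
```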