Importing Data From Ning – July 26, 2013

I have a customer who is trying to move their website from Ning to a new web hosting provider.  Ning supposedly provides a method for exporting the data, but they acknowledge that their data format is broken and requires manual editing before other parsers can read it.  The bug they know about is that the Ning files start and end with extraneous parentheses, which have to be removed before any JSON parser will read them.  You'd think that Ning could remove those parentheses, but perhaps this is their goal - to not let people leave their service after starting with them.  If that is the case, then I'm sorry to say that I'll be wrecking their business model by showing you how to get your data out of Ning...

They use the JSON format, which seems like a good idea, except that support for JSON is pretty lacking in lots of languages.  In my experience PHP only handles JSON files up to about 1MB in size before it flakes out, so I originally tried to cut each file into chunks, parse them, and then glue them back together.  That involved lots of hacking on the raw JSON and was prone to error.

I then tried the built-in json module, then simplejson, and finally one more library in Python, but couldn't get any of them to read the file properly at all.  Next up was the jsoncpp project, but I couldn't get the given examples to compile.  After searching and finding other people saying they couldn't compile it either, and the author just saying, "works for me, I don't know what your problem is", I gave up on that project as well.  The next failure in the works was yet another parser, and I forget how that one failed; it might have just failed due to the badly formatted JSON data from Ning.

The main issue with all of these parsers is that they just crash and say, "Can't import!", so there isn't any way for me to look at the data to see what is wrong.

I gave it another go-round with JSON.awk, and tada! it actually reports the location where the parser couldn't figure out what was going on, and that is how I found out that Ning's JSON data forgets to put in commas between objects every once in a while.  The bug appears to affect only Ning export files that are over 1MB in size.  Perhaps they are using the buggy PHP JSON writer, which presumably has a bug similar to the reader's with large data sets.
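
Once you know what the breakage looks like, a quick way to see which export files are affected is to count the }{ pairs in each one.  This is only a rough sketch: the count_missing_commas name is made up for illustration, it assumes a grep that supports -o (GNU and BSD grep both do), and it will also count any }{ that a user happened to type into a post.

```shell
#!/bin/sh
# count_missing_commas FILE (hypothetical helper): print how many "}{"
# pairs FILE contains, i.e. places where a comma is probably missing.
count_missing_commas() {
  # grep -o prints each match on its own line; wc -l counts those lines.
  # tr strips the padding that some wc implementations add.
  grep -o '}{' "$1" | wc -l | tr -d ' '
}

# Report the count for every export file in the current directory.
for file in *.json; do
  [ -e "$file" ] || continue   # skip the literal pattern if nothing matches
  printf '%s: %s\n' "$file" "$(count_missing_commas "$file")"
done
```

A file that reports 0 only needs the parentheses removed; anything else needs the comma fix as well.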

Here is my script using sed:


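for file in *.json; do
  # Strip the leading "(" and trailing ")" that Ning wraps around the data.
  sed -e 's/^(//' < "$file" > "$file.new1"
  sed -e 's/)$//' < "$file.new1" > "$file.new2"
  # Add the commas that are missing between adjacent objects.
  sed -e 's/}{/},{/g' < "$file.new2" > "$file"
  # Remove the intermediate files.
  rm "$file.new1" "$file.new2"
done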

This removes the starting and ending parentheses and then adds in the missing commas.  The only bug in this script that I know of is that if someone typed in }{ in their signature or forum discussion, etc., it will also be transformed to },{ which I'm willing to live with.  It only happened once in the data that I am importing, and that user will wonder why his strange smiley face or whatever he was trying to do now has a comma in it on the new site, but that's close enough after 8 hours of hacking on the Ning data.
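
One possible way to shrink that }{ false positive is to only add the comma when }{ is immediately followed by a double quote, since a new JSON object normally starts with a quoted key.  The sketch below is a variation on the script, not a tested replacement for it: the fix_ning_json name and the .fixed suffix are made up, and it assumes the export has no whitespace between adjacent objects and that the stray parentheses sit on the first and last lines of the file.

```shell
#!/bin/sh
# fix_ning_json (hypothetical helper): read a Ning export on stdin and
# write the cleaned-up JSON on stdout.
fix_ning_json() {
  # 1s/^(//      drop a "(" at the very start of the first line
  # $s/)$//      drop a ")" at the very end of the last line
  # s/}{"/},{"/g add a comma only where "}{" is followed by a quoted key
  sed -e '1s/^(//' -e '$s/)$//' -e 's/}{"/},{"/g'
}

# Write the result next to each original instead of overwriting it.
for file in *.json; do
  [ -e "$file" ] || continue   # skip the literal pattern if nothing matches
  fix_ning_json < "$file" > "$file.fixed"
done
```

A smiley like }{ inside a post is left alone this way, although text containing }{" would still get a comma, so it narrows the problem rather than removing it.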

I haven't yet finished the project, which is to get the data imported into Drupal, so I might write another script to get the data from the JSON.awk output into something that is easy to read with a PHP/Drupal importer, but I'm at least moving in the right direction now.

Lastly, I should note that this script is based on a script by Chris Chiappa, who got me going on Linux in 1998, and I saved the sed script he gave me to edit all of the libraries (binary format) on my system to get it upgraded from libc5 to libc6.  As my first introduction to sysadmin-ing on Linux, it seemed pretty crazy (and I still think editing binary files that are supporting the entire operating system with a scripted text editor is pretty nuts) and I wondered if I would ever think Linux was a good idea.  But, I'm happy to say that other than that first issue, I've had very few issues in upgrading ever since - actually, I'm not even sure if I've ever reinstalled the system since that time - I've changed various bits of hardware over the years, but I don't think I ever reinstalled.  In a quick look around the system, the earliest files I can find are from 2002, so maybe I reinstalled then, I don't remember.