Archive for software

New Backup Solution - JungleDisk + Amazon S3

I’ve settled on a new backup solution. I’m going to be using Amazon’s storage service, called S3. Amazon provides a virtually unlimited, scalable storage cloud that lets you store files indefinitely. You pay a small fee to get the data there ($0.10/GB), a fee to store it ($0.15/GB per month), and a fee to retrieve it ($0.17/GB).
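The three fees add up to a simple monthly estimate. Below is a minimal sketch of a calculator that uses only the three rates quoted above. These are the prices at the time of writing, not necessarily current S3 pricing, and `monthly_cost` is a hypothetical helper, not part of any Amazon API.

```python
# Rates quoted in this post (dollars); current S3 pricing may differ.
UPLOAD_PER_GB = 0.10          # getting data into S3
STORAGE_PER_GB_MONTH = 0.15   # keeping data in S3 for a month
DOWNLOAD_PER_GB = 0.17        # retrieving data from S3


def monthly_cost(uploaded_gb: float, stored_gb: float, downloaded_gb: float = 0.0) -> float:
    """Estimated bill for one month, counting only the three fees above."""
    total = (
        uploaded_gb * UPLOAD_PER_GB
        + stored_gb * STORAGE_PER_GB_MONTH
        + downloaded_gb * DOWNLOAD_PER_GB
    )
    return round(total, 2)


if __name__ == "__main__":
    # Hypothetical first month: upload 20 GB and keep it stored all month.
    print(monthly_cost(uploaded_gb=20, stored_gb=20))  # 5.0
```

After the first month, the upload fee applies only to new or changed data, so a stable backup set costs mostly the storage fee.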

Features I was looking for:

  • Reasonably Priced
  • Automatic
  • Reliable
  • Scalable
  • Well performing
  • Easy

Price

To automate my backups, I’m using a product called JungleDisk. You can purchase it for $20, and you get free upgrades for life. I love products with free lifetime upgrades, since I don’t have to worry about when to buy them. The license also lets you use it on an unlimited number of computers, which I definitely need.

By default, JungleDisk talks directly to S3, so the company doesn’t need to run any servers of its own. That means you’re counting 100% on the reliability of S3’s storage.

Organization

Amazon stores your files in "buckets", which you can think of as a single-level folder/directory structure. JungleDisk can easily connect to multiple buckets at the same time, and you configure each bucket independently. JungleDisk can automatically detect all of your buckets, and you can easily create new ones.


I highly recommend creating a separate bucket for each logical group of files you want to back up. Try to avoid sharing a bucket between computers when possible. If you tell JungleDisk that a bucket is only used on one computer, it doesn’t have to query S3 to determine what needs to be synchronized. The default assumes multiple computers; you can change it under "Bucket Settings" for each bucket.


Each bucket also lets you choose what to back up. Of course there are extensive options for backing up subfolders, excluding files, etc.


You can even set up how your local folders map to the remote folders, which lets you keep multiple folders in one bucket. On my laptop, I have a couple of folders that get backed up to a single bucket, but they’re organized into different remote folders so that I can easily restore them independently.
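Inside a bucket, S3 has no real folders; folder-like organization comes from object keys that contain "/" characters. Below is a minimal sketch of mapping local folders to remote folders within one bucket. The local paths and `FOLDER_MAP` are hypothetical, and this is not JungleDisk's actual configuration format.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical mapping: local Windows folder -> remote folder (key prefix) in one bucket.
FOLDER_MAP = {
    r"C:\Users\me\Documents": "documents",
    r"C:\Users\me\Pictures": "pictures",
}


def remote_key(local_file: str) -> str:
    """Translate a local file path into an object key inside the bucket."""
    path = PureWindowsPath(local_file)
    for local_root, remote_root in FOLDER_MAP.items():
        try:
            relative = path.relative_to(PureWindowsPath(local_root))
        except ValueError:
            continue  # file is not under this local folder; try the next mapping
        # Join with "/" so each local folder becomes its own key prefix.
        return str(PurePosixPath(remote_root, *relative.parts))
    raise ValueError(f"{local_file} is not inside any mapped folder")


if __name__ == "__main__":
    print(remote_key(r"C:\Users\me\Pictures\2008\house.jpg"))  # pictures/2008/house.jpg
```

Because every file under a mapped folder shares the same key prefix, restoring one folder means fetching only the keys under its prefix.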


Scheduling

Setting up a schedule is very easy. For example, on my laptop, I have it set to synchronize my files every hour. It uses the files’ timestamps to determine whether there are any new, changed, or deleted files. Since I’m not sharing this particular bucket between machines, it can instantly determine whether anything needs to be backed up.
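Timestamp-based change detection works by comparing two snapshots, each mapping a path to its modification time. The sketch below assumes that the previous snapshot is kept locally. That is presumably why a single-computer bucket can skip querying S3, while a shared bucket would have to list the remote side. The function names are hypothetical, not JungleDisk internals.

```python
import os
from typing import Dict, List, Tuple

# Maps a file's path (relative to the backup root) to its modification time.
Manifest = Dict[str, float]


def scan(root: str) -> Manifest:
    """Snapshot the modification time of every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = os.path.getmtime(full_path)
    return manifest


def diff_manifests(previous: Manifest, current: Manifest) -> Tuple[List[str], List[str], List[str]]:
    """Return (new, changed, deleted) paths between two snapshots."""
    new = sorted(p for p in current if p not in previous)
    changed = sorted(p for p in current if p in previous and current[p] != previous[p])
    deleted = sorted(p for p in previous if p not in current)
    return new, changed, deleted


if __name__ == "__main__":
    before = {"notes.txt": 100.0, "report.doc": 200.0, "old.txt": 50.0}
    after = {"notes.txt": 100.0, "report.doc": 250.0, "photo.jpg": 300.0}
    print(diff_manifests(before, after))
    # (['photo.jpg'], ['report.doc'], ['old.txt'])
```

An hourly job would call `scan`, diff the result against the saved snapshot, upload or delete accordingly, and then save the new snapshot for the next run.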


Security

JungleDisk has all the security options you would expect. You can communicate with Amazon over port 80 unencrypted, or use SSL. I actually turn off the SSL option, because I use the JungleDisk encryption. I don’t see a reason to do double encryption.


When you create the bucket, you can specify a custom key that encrypts your data. I like this option because I’m the only one who has access to the data. Even Amazon can’t tell what I’m storing there.


Performance

Since Amazon is providing the storage, they’re able to scale indefinitely. You can be confident that they can handle whatever you throw at them. They had no problem letting me upload at over 56,000 kbit/s (about 56 Mbit/s) from my dedicated host. I backed up 4 GB in about 10 minutes.
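As a sanity check, you can work out the average rate implied by 4 gigs in about 10 minutes. The helper below is illustrative only. Its `bytes_per_gb` parameter lets you choose whether "gig" means 10^9 or 2^30 bytes.

```python
def average_rate_mbit_s(size_gb: float, minutes: float, bytes_per_gb: float = 1e9) -> float:
    """Average transfer rate in megabits per second for size_gb sent in `minutes`."""
    bits = size_gb * bytes_per_gb * 8
    return bits / (minutes * 60) / 1e6


if __name__ == "__main__":
    print(round(average_rate_mbit_s(4, 10), 1))         # 53.3 (decimal gigabytes)
    print(round(average_rate_mbit_s(4, 10, 2**30), 1))  # 57.3 (binary gibibytes)
```

Either way, the average comes out at roughly 53 to 57 Mbit/s, which matches the upload rate of about 56 Mbit/s.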


Other Features

  • Bandwidth limiting - If you don’t want to use up all of your upstream or downstream bandwidth, you can limit it, and even schedule when it’s limited. This could be useful for limiting the connection during the day. However, I much prefer a QoS solution since it will maximize the amount of bandwidth I can use.
  • Previous versions - There are extensive options for storing previous versions of changed or deleted files. This option is very impressive, and great for documents.
  • Network drive - You can make a bucket show up as a drive on your computer, which allows you to drag and drop files to and from the bucket.
  • Jungle Disk Plus - For $1/month extra, you can get JungleDisk Plus. They use an Amazon EC2 server to proxy your data to S3. This allows you to resume large file uploads, and also lets you send just the differences. If you’re backing up large files and/or files that may have sections change frequently, this could end up saving you money.
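The post doesn’t say how JungleDisk Plus computes differences, so what follows is only a generic sketch of one common technique. The file is split into fixed-size blocks, each block is hashed, and only blocks whose hashes changed are re-uploaded. `changed_blocks` is a hypothetical helper, and the 4-byte block size is purely for demonstration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks in `new` that differ from, or did not exist in, `old`.

    If the file shrank, the new length would also need recording; omitted here.
    """
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


if __name__ == "__main__":
    old = b"AAAABBBBCCCC"
    new = b"AAAAXBBBCCCC"
    # Only block 1 changed, so only 4 of the 12 bytes would need uploading.
    print(changed_blocks(old, new))  # [1]
```

Fixed blocks have one weakness: inserting a byte near the start of a file shifts every later block, so all of them look changed. rsync avoids this with a rolling checksum that can match blocks at any offset.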

Using an army and luck to reach critical mass

This post is going to explain the importance of your product reaching critical mass. By "product", I mean either an actual product you’re selling or simply a website or blog. By critical mass, I mean the point at which your product becomes viral and starts spreading on its own, sometimes known as the network effect. This should be a lesson to anyone thinking of creating their own product or service.

[Image: technology adoption bell curve]

Above, you’ll see the typical technology adoption bell curve. What you need to realize is that you’re starting on the far left, trying to get up the hill. Do you think it’s easy? Well, judging by YouTube, Twitter, the Million Dollar Homepage, Digg, or MySpace, it must be easy!

The truth is, you should have a path to success. Here are just a couple of paths that have worked for other products:

  • Create a product that is leaps and bounds better than anything your potential customers have ever seen - An example is Google, which was originally created by students in college. The brilliance of the algorithm and its implementation were the start of a massive company.
  • Have an army of followers that listen to your advice - A great example is Steve Jobs. Before he even makes a new product announcement, people line up at Apple stores. People trust that he’ll make cool stuff, so they listen to whatever he says. You can bet that if Steve Jobs mentioned your product, you would have people lining up at your door to buy it.
  • Get lucky - It happens time and time again. Multiple products are released at the same time, all with similar features and price. Sometimes one of them gets lucky, and the others die. An example is the VHS vs Betamax format war. VHS was considered the inferior product, yet it went on to become the de facto standard.
  • Create a product that is viral by nature - Twitter and MySpace come to mind. Once one person joins, they’re begging their friends to use it, because without their friends on board the service is useless. The result is an army of free advertisers talking to your key demographic.
  • Spend a ridiculous amount of cash to bombard users with advertising - infomercials and those annoying "we’ll double the offer" commercials come to mind.

Hopefully I’m making the situation look difficult. I couldn’t find any concrete numbers, but you can be sure that more than half of online businesses fail within the first couple of years. That includes well-funded businesses. If you expect to start the next Fog Creek Software while working part time in the evenings, you need to have a plan.

The best advice I can give you is to do whatever it takes to get your product into the hands of as many people as possible. It might mean making partnerships with someone of influence, or it might mean creating a viral marketing campaign. It might also mean that you’ll have to give your product away for free, build up your army of followers, and then invent another great product. If you already have a product with a good user base, you’re probably already in good shape. If you are just starting out, don’t think that people will magically find you, unless you’re counting on the "lucky" path I described.

DreamHost disallows use as a backup service

I just received this email from DreamHost:

Dear Jason,

Our system has noticed what seems to be a large amount of "backup/non-web" content on your account (#xxxx), mostly on user "xxxx" on the web server "xxxx".

Some of that content specifically is in /home/superjason/Backup (although there may be more in other locations as well.)

Unfortunately, our terms of service (http://www.dreamhost.com/tos.html) state:

The customer agrees to make use of DreamHost Web Hosting servers primarily for the purpose of hosting a website, and associated email functions. Data uploaded must be primarily for this purpose; DreamHost Web Hosting servers are not intended as a data backup or archiving service. DreamHost Web Hosting reserves the right to negotiate additional charges with the Customer and/or the discontinuation of the backups/archives at their discretion.

At this point, we must ask you to do one of three things:

* You can delete all backup/non-web files on your account.
* You can close your account from our panel at: https://panel.dreamhost.com/?tree=billing.accounts (We are willing to refund to you any pre-paid amount you have remaining, even if you’re past the 97 days. Just reply to this email after closing your account from the panel.)

OR!

* You may now enable your account for backup/non-web use!

If you’d like to enable your account to be used for non-web files, please visit the link below. You will be given the option to be charged $0.20 a month per GB of usage (the monthly average, with daily readings) across your whole account.

We don’t think there exists another online storage service that has anything near the same features, flexibility, and redundancy for less than this, so we sincerely hope you take us up on this offer!

In the future, we plan to allow the creation of a single "storage" user on your account which will have no web sites (or email). For now though, if you choose to enable your account for backups, nothing will change (apart from the charges).

If you want to enable backup/non-web use on this account, please go here:

https://panel.dreamhost.com/backups.cgi?g=xxxx

If you choose not to enable this, you must delete all your non-web files by 2008-06-29 or your account will be suspended.

If you have any questions about this or anything at all, please don’t hesitate to contact us by replying to this email.

Thank you very much for your understanding,
The Happy DreamHost Backup/Non-Web Use Team


Admittedly, the primary reason I chose them was that they made a great backup solution. I didn’t realize it was against their terms of service; in fact, there is a ton of information out there about using rsync to back up your files to them. I’ve been doing it for well over a year now, and I’ve been recommending the service to others.

I had heard many bad things about DreamHost, but they were working great for backups. Now that you can’t use them for that, what do they have going for them? I can’t imagine many users have 400GB+ websites that they’re hosting. If they do, I have a hard time believing they’re using cheap shared hosting!

Now I’m looking into other backup solutions, and it’s looking pretty grim. Since my server and laptop both run Windows Server 2008, there aren’t many solutions available. For example, Mozy requires you to use their business version, which is ridiculously expensive.

Another option is JungleDisk, which uses Amazon’s S3 service. I would be looking at paying $30/month to back up around 200GB.
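The $30 figure is just the S3 storage fee: 200GB × $0.15/GB per month. DreamHost's email, by contrast, bills $0.20/GB on the monthly average of daily usage readings. Below is a minimal sketch of both calculations, using only the rates quoted in these posts. The function names are illustrative.

```python
from statistics import mean
from typing import Sequence

S3_STORAGE_PER_GB_MONTH = 0.15  # S3 storage rate from the JungleDisk post
DREAMHOST_PER_GB_MONTH = 0.20   # backup/non-web rate from DreamHost's email


def s3_storage_charge(stored_gb: float) -> float:
    """Monthly S3 storage fee for a steady amount of data (transfer fees excluded)."""
    return round(stored_gb * S3_STORAGE_PER_GB_MONTH, 2)


def dreamhost_charge(daily_gb_readings: Sequence[float]) -> float:
    """Average of the month's daily usage readings times the per-GB rate."""
    return round(mean(daily_gb_readings) * DREAMHOST_PER_GB_MONTH, 2)


if __name__ == "__main__":
    print(s3_storage_charge(200))          # 30.0
    print(dreamhost_charge([200.0] * 30))  # 40.0
```

At these rates, a steady 200GB costs about $30/month on S3 versus $40/month on DreamHost. The initial upload to S3 adds a one-time fee of about $20 (200GB × $0.10/GB).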

Does anyone have other ideas? The solution has to be automatic, so that I don’t have to count on remembering to run it. It also has to be off-site, because I don’t want to lose all of our photos if our house burns down.

Customers trust you, even if you don’t deserve it

Jeff Atwood over at Coding Horror had an interesting post about sites, such as Yelp, that ask for your email password to look up your contacts on their service. He suggests that they stop doing that immediately and, in the long term, find a more secure solution.


I can understand where he’s coming from. He doesn’t want to hand over the keys to all of his information.

I see things a little differently, because I’ve been on the other side of the fence. Imagine that you’re building a site and your #1 goal is to make it easy to use. Jeff himself is a huge advocate of usability. The problem is that developers don’t have unlimited time. At that point, the quickest way to make the feature easy to use is to simply ask for the email password, grab the addresses, and be done with it.

I agree that if the major email providers offer a more secure way to access the data, it’s certainly worth investigating.

The second point I’d like to make is that Yelp probably doesn’t care that Jeff won’t hand over his personal information. It’s an optional step to save him time, and even if he doesn’t use it, users like him probably make up only about 1% of the people using the service.

I’m very paranoid when it comes to passwords. I use automatically generated random passwords for every site. Still, I’ve been trusting enough to give sites like Facebook my Gmail credentials so they can check my address book. I should be changing my email password on a regular basis anyway.

The fact is that 99% of users will happily give over any information that you ask for.

When I do computer work on the side (which I’m trying to avoid these days), I’ll ask for a certain password and strangers will happily give me all of their personal information. Bank account passwords, email passwords, work passwords, etc. I try to tell them they shouldn’t do that, but you’re not going to change everyone’s attitude overnight.

Remember, more than 70% of people would reveal their password in exchange for a bar of chocolate!

I’m sure those services in question would like to have a better solution for accessing the data, but it’s probably at the end of a long list of potential features. The only way that’s going to change is if they start losing a significant number of customers over it.

The winner of the free software is….

The entries for the free software contest were a bit disappointing. Only 14 entries. I guess you all have MSDN accounts!


Out of those entries, only one added a backlink to the site. His username is Akbar. By default, he gets first choice of what he wants. The challenge is that he lives in India, so I’m working with him to see if there is a way to get it to him. If that doesn’t work out, I’ll choose someone else randomly.

Choosing the second winner was a little harder. I had to come up with a way to generate a random number between 1 and 14 (the number of entries). Random.org to the rescue. I used their site to generate a table of random numbers. To make it even more random, I decided to use the 8th value in the first column of values (100 numbers were generated). I figured if the previous winner or a trackback was chosen, I would generate another set.
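The selection rule was to pick a random number from 1 to 14 and draw again if an ineligible entry came up. It can be sketched in a few lines of Python. This sketch uses the standard `secrets` module in place of Random.org, and the excluded entry numbers are hypothetical.

```python
import secrets
from typing import Set


def draw_winner(num_entries: int, excluded: Set[int]) -> int:
    """Return an entry number from 1..num_entries, re-drawing excluded entries."""
    if not set(range(1, num_entries + 1)) - excluded:
        raise ValueError("every entry is excluded; nothing to draw")
    while True:
        # secrets.randbelow(n) returns 0..n-1, so shift the result to 1..n
        candidate = secrets.randbelow(num_entries) + 1
        if candidate not in excluded:  # e.g. the first-choice winner or a trackback
            return candidate


if __name__ == "__main__":
    # Hypothetical exclusions: entry numbers already chosen or belonging to trackbacks.
    print(draw_winner(14, excluded={3, 9}))
```

Re-drawing on an excluded number keeps every remaining entry equally likely, which is the same effect as generating another set when a previous winner or trackback comes up.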

So the winner with second choice for the software is….

Entry #5, which is "Federiko".

Congratulations to both of you!