To test, I used the 2013 Crime Dataset from the City of Chicago, which is around 73 MB. I tested a locally mounted EBS volume, an NFS mount, and S3. Each test was run on an m1.small instance in us-west-2, and the S3 data was stored in the same region. The NFS mount was served from an m1.large instance in the same region. The instances ran Ubuntu 12.04 LTS. S3 was tested two ways: using pandas to read the file over HTTP, and using its S3 read option.
The test code loaded the dataset from each location into a pandas DataFrame and timed the load.
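
For reference, here is a minimal sketch of what such a timing test might look like. It is not the original test code. It assumes the dataset is a CSV file read with `pd.read_csv`, and the helper names (`time_load`, `benchmark`) and the paths in the comments are hypothetical placeholders. Note that current pandas versions need the optional `s3fs` package to read `s3://` URLs.

```python
import time

import pandas as pd


def time_load(source, reader=pd.read_csv):
    """Load `source` into a DataFrame and return (elapsed seconds, DataFrame)."""
    start = time.perf_counter()
    frame = reader(source)
    elapsed = time.perf_counter() - start
    return elapsed, frame


def benchmark(sources):
    """Time one load per entry in `sources` (a label -> path/URL mapping)."""
    results = {}
    for label, source in sources.items():
        elapsed, _ = time_load(source)
        results[label] = elapsed
    return results


# Example call with hypothetical locations (replace with your own):
# sources = {
#     "Local Mount": "/mnt/ebs/crimes_2013.csv",
#     "NFS Mounted": "/mnt/nfs/crimes_2013.csv",
#     "S3 over HTTP": "https://example-bucket.s3.amazonaws.com/crimes_2013.csv",
#     "S3 over S3": "s3://example-bucket/crimes_2013.csv",
# }
# for label, seconds in benchmark(sources).items():
#     print(label, "|", seconds)
```

A single run per source, as here, is sensitive to caching and network noise, so repeating each load a few times and comparing the spread would give a fairer picture.
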
Method | Time (s) |
---|---|
S3 over HTTP | 23.8897938728 |
S3 over S3 | 28.6781361103 |
Local Mount | 15.3996708393 |
NFS Mounted | 17.7497649193 |
So, EBS is faster than loading from S3 (about 15 seconds versus 24 to 29 seconds), but not by as much as you’d think. I’ll try to run this with an IO-optimized volume sometime soon. Something is up with Boto’s S3 loading options, though: that route was the slowest of the four.