Tuesday, June 15, 2010

tmpreaper and ctime, atime, and mtime in Ubuntu

There is a package in Ubuntu that can be used to clean directories of files older than a certain age. Before we get into that, let's first clarify three terms related to file times in Ubuntu: ctime, atime and mtime.

ctime is the change time of the file, that is, the time the file's inode was last changed. It is set when the file is created (say I created this file at Wed Jun 16, 9:45:15, 2010, then that is its initial ctime), but it is also updated whenever the file's content or its metadata, such as permissions or ownership, changes.

atime is the access time of the file. Displaying the file's contents or executing the file as a script will update the atime.

mtime is the modification time, which is updated when the actual content of the file is modified.
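
You can check all three timestamps of a file with the stat command (myfile.txt is just a placeholder name):

    stat myfile.txt

The Access, Modify and Change lines in its output correspond to atime, mtime and ctime respectively.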

Back to the tmpreaper command: since it is not installed by default on Ubuntu, you have to run sudo apt-get install tmpreaper to get the latest version.

The basic usage is the simple command tmpreaper TIME-FORMAT DIRS, which does the cleaning job for you.
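
For example, to remove everything under /tmp that has not been accessed for seven days (the 7d threshold is just an example; h, m and s work similarly for hours, minutes and seconds):

    tmpreaper 7d /tmp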

TIME-FORMAT is a parameter that specifies how long a file must have gone without being accessed before it is removed. By default, this is based on atime, so even if you modify the content at a later stage but never access the file again, the file might still be deleted. Of course, you can force the command to decide based on mtime instead by appending --mtime to the command.
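
For instance, to use a 30-day threshold based on modification time, or to do a dry run with --test that only reports what would be removed (both thresholds here are example choices):

    tmpreaper --mtime 30d /tmp
    tmpreaper --test 7d /tmp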

DIRS is the directory you would like to clean, such as /tmp. Never try such a thing on the root directory, or you may end up with a disaster.

If you had to run the command manually every time, there would be little point in using it. Its real power comes from combining it with another tool: cron, configured through crontab.

crontab is used to create cron jobs that run specific scripts on a schedule. All you need to do is write a script which includes the command we discussed previously, then edit the crontab configuration file, and the script will run in the background as scheduled.

To edit the configuration file, simply run sudo crontab -e and add an entry to the file. The format of an entry is m h dom mon dow command; the first five fields are separated by spaces, and you can use an asterisk as a wildcard meaning "any". For example, * * * * * /XXX.bash will run every minute. More usage details can be found in the documentation.
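
As a concrete sketch (the script name, its location and the 3 a.m. schedule are just example choices, and the tmpreaper path assumes the usual /usr/sbin location; check it with which tmpreaper), you could save the cleanup command in a small script:

    #!/bin/bash
    # clean-tmp.bash: remove files under /tmp not accessed for 7 days
    /usr/sbin/tmpreaper 7d /tmp

and add a crontab entry so that cron runs it every night at 3:00:

    0 3 * * * /usr/local/bin/clean-tmp.bash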

Sunday, June 13, 2010

Handling Azure Large File Upload

In Azure storage, files smaller than 64MB can be stored directly as a single blob. However, when you want to store a file larger than 64MB, things become a little more complicated. The way to accomplish this task is to use the block list service.

A block, unlike a blob, is a small unit of a file; blocks can be aggregated as a list to form a large file, and each small chunk has a limit of 4MB. For example, if you have a 100MB file that you want to store in Azure, you have to split the file into at least 25 pieces and then use the Put Block and Put Block List operations to upload all 25 of them. More details are listed below, followed by a short sketch in code:

1) Split the large file: this can be done in various ways, either with existing tools or with a simple piece of your own code. Be sure to record the piece names and keep them in the order in which they were split.

2) Put Block: each of the pieces created in the last step is called a block, and this step uploads the blocks one by one into storage via the Put Block operation. The basic process is no different from other upload methods; however, one thing to pay attention to is that blockid is a required parameter and all block IDs must be the same size. In our example, you can use a Base64-encoded blockid of any length up to 64 bytes, but you have to make sure all 25 of them have the same length. If not, a 400 error with the message "The specified blob or block content is invalid" will be returned.

3) Put Block List: the last but not least step is to notify the server that all the pieces have been uploaded and to tell it the order in which to combine them into the final blob.
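
To make the three steps concrete, here is a minimal Python sketch of the idea. It assumes you already have a write-authorized URL for the target blob (for example a Shared Access Signature URL), so request signing is not shown; the function names split_into_blocks and upload_blob_in_blocks are made up for illustration, while the comp=block / comp=blocklist query parameters and the BlockList XML body come from the Azure Blob service REST API.

    import base64
    import urllib.parse

    import requests

    BLOCK_SIZE = 4 * 1024 * 1024  # each block may be at most 4MB


    def split_into_blocks(path, block_size=BLOCK_SIZE):
        # Step 1: read the file in 4MB chunks and give every chunk a
        # Base64 block ID; zero-padding the counter keeps all IDs the
        # same length, which the service requires.
        with open(path, "rb") as f:
            index = 0
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                raw_id = "block-%06d" % index
                block_id = base64.b64encode(raw_id.encode("ascii")).decode("ascii")
                yield block_id, chunk
                index += 1


    def upload_blob_in_blocks(blob_url_with_sas, path):
        block_ids = []
        # Step 2: Put Block - upload every chunk under its own block ID.
        for block_id, chunk in split_into_blocks(path):
            resp = requests.put(
                blob_url_with_sas + "&comp=block&blockid="
                + urllib.parse.quote(block_id, safe=""),
                data=chunk,
                headers={"x-ms-version": "2009-09-19"})
            resp.raise_for_status()
            block_ids.append(block_id)

        # Step 3: Put Block List - commit the blocks in order so the
        # server stitches them together into the final blob.
        body = ("<?xml version='1.0' encoding='utf-8'?><BlockList>"
                + "".join("<Latest>%s</Latest>" % b for b in block_ids)
                + "</BlockList>")
        resp = requests.put(
            blob_url_with_sas + "&comp=blocklist",
            data=body.encode("utf-8"),
            headers={"x-ms-version": "2009-09-19"})
        resp.raise_for_status()

With 4MB blocks, a 100MB file turns into 25 Put Block calls followed by a single Put Block List call.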

After these three steps, you will be able to upload files of any size into Azure storage.