Sunday, November 1, 2009

Effects of too many files in a directory

How to efficiently process a large directory

in reference to:

"In addition to the while suggestion above, depending on the filesystem you use, your suggestion to hash the directories can be a good one, especially if the system has to read multiple disk blocks to find the file you are looking for.

The same concept has been applied to mail spools (qmail), and suggested to help speed up access to home directories on hosts with large numbers of "users".

With the number of files you are considering, you probably want to consider something along the lines of NUMFILES < SPLIT ** DEPTH where SPLIT is the number of subdirectories that can fit in one disk block, and DEPTH is how deep your directory structure should go. Once you get to the point where NUMFILES is larger, then you start needing multiple directory reads to find the file you need to open.

Add this to the while suggestion above, and you should be able to access each individual file (such as an open() call) as quick as the OS can handle it.

Of course, this is all IIRC, and it has been a while since I have applied this in my studies.

Now this is all based on older file systems (inode, UFS style, chained directory block style, etc). The newer filesystems (btree, reiser?) may not have this "problem" anymore."
- Re: Efficient processing of large directory (view on Google Sidewiki)

Storing a graph with millions of nodes as files

Nice way of using MD5 hash to distribute files in a directory hierarchy.

in reference to:

"I concur with rhesa's method. I've used this before with considerable success.

I just untarred a test structure containing a million files distributed this way using 3 levels of subdirectory to give an average of ~250 file per directory. I then ran a quick test of opening and reading 10,000 files at random and got an average time to locate, open, read and close each file of 12ms.

#! perl -slw
use strict;
use Math::Random::MT qw[ rand ];
use Digest::MD5 qw[ md5_hex ];
use Benchmark::Timer;


our $SAMPLE ||= 1000;

my $T = new Benchmark::Timer;

for my $i ( 1 .. $SAMPLE ) {
$T->start( 'encode/open/read/close' );
my $md5 = md5_hex( int( rand 1e6 ) );
my( $a, $b, $c ) = unpack 'AAA', $md5 ;

$T->start( 'open' );
open my $fh, '<', "fs/$a/$b/$c/$md5.dat" or warn "fs/$a/$b/$c/$md5
+ : $!";
$T->stop( 'open' );

$T->start( 'read' );
my $data = do{ local $/; <$fh> };
$T->stop( 'read' );

$T->start( 'close' );
close $fh;
$T->stop( 'close' );
$T->stop( 'encode/open/read/close' );
}

$T->report;
__END__
c:\test>612729-r -SAMPLE=10000
10000 trials of encode/open/read/close (112.397s total), 11.240ms/tria
+l
10000 trials of open (110.562s total), 11.056ms/trial
10000 trials of read (158.554ms total), 15us/trial
10000 trials of close (365.520ms total), 36us/trial
[download]


The files in this case are all 4k, but that doesn't affect your seek time. If you envisage needing to deal with much more than 1 million files, moving to 4 levels of hierarchy woudl distribute the million files at just 15 p"
- Perl solution for storage of large number of small files (view on Google Sidewiki)

Saturday, June 27, 2009

Shake Dance

I saw this funny guy on my way to PATH station at Journal square and couldn't stop myself from shooting it. I like him.


Wednesday, April 29, 2009

Fun Videos

I really like the following videos. I stole this one from one of my friend
Shubha's status message. It clearly depicts how gestures, eye contact and body language can speak more than even what words can express. I have always faced difficulty in expressing myself with clear words. I somewhat lack verbosity. Here is it. Enjoy




This one is funny and I love to watch it every time.




This one is so good and inspiring.



Monday, April 13, 2009

Self Composed Quotes

I often try to come up with quotes related to my experiences and thoughts or read them from different sources and put them in my Google status message and then they get lost after some time. From now on, I will keep them into my blog so that I can remember them. Here is the first to start with from my favorite Matrix :


  • You are only as good as your actions

  • If you’re not falling down every once in a while, you’re not running fast enough

  • Change is the only constant

  • What do you do when your only option does not work, you never give up

  • If you are good at something, never do it for free.

  • Thoughts have no limits, actions do.

  • Mind has no limits, body has.

  • Either you die as a hero or wait long enough to see yourself become a villian

  • Sometimes medicines are awful but patient needs it.

  • Death is the most beautiful agent of life.

  • Have the courage to follow your heart and intuition. Somehow they already know what you want to do, everything else is irrelevant.

  • Your anger gives you great power, but if you let it, it will destroy you.

  • Training is nothing, will is everything

  • To manipulate fear of others, you must first master your own.

  • Sometimes you gotta run before you can walk.

  • Don't think you are, Know you are.

  • It is purpose that creates, purpose that connects us, purpose that pulls us, purpose that guides us, thrives us, it is purpose that defines us, purpose that binds us.

  • We are not here because we are free, we are here because we are not free.

  • We have already made our choice, now we are here to understand why we made that choice.