Yarrago: 2010

20 November 2010

Install NVidia Driver On Fedora 14

I currently have Fedora 14 running on my ~~ancient laptop~~ primary PC, and whilst the nouveau video driver is probably good, I do find one major problem with it on this laptop: the fan on the graphics card runs constantly, which is noisy and annoying. I have used i8k tools (in conjunction with gkrellm) in the past to control this, but I've had my share of problems with it causing the fan to cut on and off sporadically, so I find the best solution is to use the proprietary NVidia driver. Unfortunately getting it onto Fedora isn't as easy as it should be, but it isn't too hard if you know what you're doing.

I use vi to edit config files, but if you're not familiar with vi just substitute nano for it and you should be on your way.

Installing The Driver
Download the latest drivers from NVidia. The current version for my card is NVIDIA-Linux-x86-173.14.28-pkg1.run (but these steps should work fine for any version of the drivers you need). Make sure you know the full path where you are saving them; for this example I will use:

/home/username/Downloads/NVIDIA-Linux-x86-173.14.28-pkg1.run

Switch to root:

# su

Blacklist the nouveau driver by adding the following lines to the end of the blacklist config file (I'm not sure this step is necessary; if someone tries without it and it works, let me know. I've been doing it ever since I worked out how to install the drivers a few Fedoras ago and I'm not sure it's still relevant):

# vi /etc/modprobe.d/blacklist.conf 

+ # Blacklist Nouveau

+ blacklist nouveau

Add the following kernel boot option: nouveau.modeset=0

# vi /boot/grub/menu.lst

--- /boot/grub/menu.lst.bkp    2010-11-01 00:00:00.000000000 +0000

+++ /boot/grub/menu.lst    2010-11-01 00:00:00.000000000 +0000

@@ -13,7 +13,7 @@

 title Fedora (2.6.35.6-45.fc14.i686)

     root (hd0,1)

-    kernel /vmlinuz-2.6.35.6-45.fc14.i686 ro  root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root  rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8  SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet

+    kernel /vmlinuz-2.6.35.6-45.fc14.i686 ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet nouveau.modeset=0

     initrd /initramfs-2.6.35.6-45.fc14.i686.img

I have found that in Fedora 14 changing to runlevel 3 can create problems. We should just be able to:

# init 3

and have the system kill the X server, but I have found that it can lock up, and even when it does make it to runlevel 3 the X server seems to still be running at some level and the NVidia installer complains. So the easiest way around this is to start the system in runlevel 3 (temporarily) and then swap back once we have the driver installed. To do this, change the default startup runlevel in inittab from 5 to 3:

# vi /etc/inittab

- id:5:initdefault:

+ id:3:initdefault:

Reboot the PC:

# reboot

When you reboot you should have a console login. Log in and then su to root again.

Make the driver installer executable:

# chmod +x /home/username/Downloads/NVIDIA-Linux-x86-173.14.28-pkg1.run

Run the install package:

# /home/username/Downloads/NVIDIA-Linux-x86-173.14.28-pkg1.run

Change the default startup runlevel from 3 back to 5.

# vi /etc/inittab

- id:3:initdefault:

+ id:5:initdefault:

Now you can either reboot or manually set the runlevel to 5:

# reboot

or
# init 5

I have also found that with the NVidia driver the ppi (dpi) gets set strangely (but maybe that's just me):

 # xdpyinfo | grep resolution 

resolution:    129x126 dots per inch 
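
For context, X works this figure out from the screen size in pixels and the physical size in millimetres that the driver reports, so an odd DPI usually means the reported physical size is off. Here is a minimal sketch of the arithmetic. The panel numbers (1280x800 pixels reported as 252x161 mm) are hypothetical, picked only because they give figures like mine:

```python
MM_PER_INCH = 25.4


def dpi(pixels, millimetres):
    """Dots per inch along one axis, rounded to the nearest whole number."""
    return round(pixels * MM_PER_INCH / millimetres)


# Hypothetical panel: 1280x800 pixels reported as 252x161 mm.
horizontal = dpi(1280, 252)
vertical = dpi(800, 161)
print(f"resolution:    {horizontal}x{vertical} dots per inch")
```

Forcing the DPI to 96 is the same as telling X that this hypothetical panel is roughly 339x212 mm.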

What I want is a ppi (dpi) of 96. To fix this I had to add an option to my xorg.conf.
# vi /etc/X11/xorg.conf

--- /etc/X11/xorg.conf.bkp    2010-11-01 00:00:00.000000000 +0000
+++ /etc/X11/xorg.conf    2010-11-01 00:00:00.000000000 +0000
@@ -42,8 +42,9 @@ EndSection
Section "Device"
     Identifier     "Device0"
     Driver         "nvidia"
     VendorName     "NVIDIA Corporation"
+    Option "DPI" "96 x 96"
EndSection

Section "Screen"
     Identifier     "Screen0"

19 September 2010

Mount and Automount Ext3 USB Drives on Western Digital My Book World 2ND Edition (White Light)

It seems somewhat silly that the WD My Book World 2ND Edition (MBWE II) supports USB drives that have NTFS and FAT filesystems but doesn't support Linux's EXT3, considering it runs Linux.

These instructions will let you mount and use USB drives with EXT3 partitions and I expect drives with other supported filesystems.

Firstly, be careful! As with every tutorial I post, only do this if you know what you are doing. I take no responsibility; you do it all at your own risk.

I actually expect that this will work for other filesystems that the NAS knows about, but I don't have any drives with these formats to test it with. If you do try, please post your results for others.

Manual Mount
Firstly you will need to SSH into the drive (you may need to turn it on using the admin web interface). There are lots of other sites that will tell you how to do this if you aren't familiar with this process.

If you aren't familiar with SSH: don't type the #, it just indicates a command to enter, and everything is case-sensitive. If you needed this help then you really need to be careful.

If you just want to mount the EXT3 drive as a one-off it's easiest to do it manually; I've been doing this for some time now.

To mount the drive we need to find the device file associated with it, which we can do by running fdisk.

# fdisk -l

...

Disk /dev/sdb: 1024 MB, 1024966656 bytes

32 heads, 62 sectors/track, 1009 cylinders

Units = cylinders of 1984 * 512 = 1015808 bytes

   Device Boot    Start       End    Blocks   Id  System

/dev/sdb1               1         159      157697    b  Win95 FAT32

/dev/sdb2   *         160        1009      843200   83  Linux

The first set of entries (/dev/sda*), which I have omitted here, is the WD's internal drive (if you have a dual drive version I'm assuming you will have a second drive listed here as well).

You should have a set of entries for your EXT3 drive. The above example is for a 1GB drive with two partitions: the first is a ~160MB FAT partition and the second is a ~840MB EXT3 partition. So what I'm interested in is /dev/sdb2.
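
If you would rather not pick the partition out by eye, the same reading of the fdisk listing can be sketched in a few lines of Python on your PC. This is only an illustration: the function name is made up, it assumes the column layout shown above, and partition Id 83 only means "Linux", so the partition could be EXT2 or EXT4 or something else rather than EXT3.

```python
FDISK_OUTPUT = """\
Disk /dev/sdb: 1024 MB, 1024966656 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1               1         159      157697    b  Win95 FAT32
/dev/sdb2   *         160        1009      843200   83  Linux
"""


def linux_partitions(fdisk_text):
    """Return the device files whose partition Id is 83 (Linux)."""
    found = []
    for line in fdisk_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("/dev/"):
            continue  # header, disk summary or blank line
        if len(fields) > 1 and fields[1] == "*":
            del fields[1]  # drop the boot flag so the columns line up
        if len(fields) >= 5 and fields[4] == "83":
            found.append(fields[0])
    return found


print(linux_partitions(FDISK_OUTPUT))
```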

Next we create a folder to mount to. This can be anything, but I've found the following nice because you can log into the copy manager on the web interface of the drive and use it to copy files between the drives (if you don't care about the copy manager, /DataVolume/usb is probably a better place to make it).

# mkdir /shares/usb1-1share1

Then we mount the drive (replacing /dev/sdb2 with whatever device file fdisk showed your EXT3 partition to be listed as):

# mount -n -t ext3 /dev/sdb2 /shares/usb1-1share1

For other filesystems try:

# mount -n /dev/sdb2 /shares/usb1-1share1

You can then copy files to the mounted drive either using the web interface copy manager or from the console using either of the following commands:

# cp /DataVolume/sharename/file /shares/usb1-1share1/

# cp -r /DataVolume/sharename/folder /shares/usb1-1share1/

Note, however, that this method does not share the mounted filesystem through the standard mechanisms like CIFS. While that is possible with a manual mount, it is easier to automount the drive, so if you want this read on.

Automount

The above worked fine for me for some time but I really wanted the drive to automount, and when I did a search around the web to find out how to do it, it seemed like I wasn't the only one to want the feature:
http://mybookworld.wikidot.com/forum/t-24171/
http://mybookworld.wikidot.com/forum/t-166644/need-help-sharing-ext3-usb-drive
http://mybookworld.wikidot.com/forum/t-247589
http://community.wdc.com/t5/My-Book-World-Edition/usb-ext3-drive-not-mounting/td-p/19414
http://community.wdc.com/t5/My-Book-World-Edition/MBWE-II-ext3-formatted-USB-storage/td-p/18643

I am currently not a member of all of these communities, and haven't bothered to list this on all these old threads, but if you have come here from one of these communities or another post about this it might be helpful if you would link this post.

This code patch is for Firmware Version 01.02.04 (currently the most recent firmware version) and should also work with Firmware Version 01.01.16.

First things first, let's back up the file in case we make a mistake:
# cp /sbin/usb_auto_share_add.php /sbin/usb_auto_share_add.php.bkp

Should you need it you can restore the file using:
# cp /sbin/usb_auto_share_add.php.bkp /sbin/usb_auto_share_add.php

The only text editor on the NAS is currently vi. I like to recommend nano or pico to inexperienced users but unfortunately they aren't installed on this system, so I have developed an alternative for those who aren't familiar with vi.

Copy the file to a share you have, open the file over the network, edit it and then copy the file back. Here's how to do that; if you're happy with vi just skip on a bit.

Find the share you want to copy to:
# ls /DataVolume/

Copy the file to the share using (substituting sharename with the case-sensitive name you found using ls):

# cp /sbin/usb_auto_share_add.php /DataVolume/sharename/

Now open the file over the network using the method you normally would, make the edits outlined below and then save the file.

Once you've made the changes, copy the file back using:

# cp /DataVolume/sharename/usb_auto_share_add.php /sbin/

Here are the changes to make to the file. If you're going to post this elsewhere a little credit that you found it here first might be nice :)

Open the file in vi:

# vi /sbin/usb_auto_share_add.php

Sorry for the layout of the following, I suggest that you copy it into something nicer before you read it (like notepad with Word Wrap turned off).

| lines from the original used for alignment. 

O optional lines

+ lines to add.

- lines to remove

/sbin/usb_auto_share_add.php

|    // mount USB share for NTFS/HFS+ filesystem using ufsd

|    @system('/bin/mount -n -t ufsd -o gid=1000,umask=002,iocharset=utf8,force /dev/' . $USBDevInfo['devname'] . ' ' . $mountPoint, $retval);

|    if($retval != 0) {

|      // mount USB share with UTF8 option for FAT filesystem

|      @system('/bin/mount -n -o gid=1000,umask=002,iocharset=utf8 /dev/' . $USBDevInfo['devname'] . ' ' . $mountPoint, $retval);

|      $filesystem = 'fat';

|    }

+   

+    if($retval != 0) {

+      // Try mounting  USB share as ext3. 

+      @system('/bin/mount -n -t ext3 /dev/' . $USBDevInfo['devname'] . ' ' . $mountPoint, $retval);

+      if($retval == 0) {

+        $filesystem = 'ext3';

+      }

+    }

+   

+    if($retval != 0) {

+      // Try mounting  USB share as any other file system known to the system. 

+      @system('/bin/mount -n /dev/' . $USBDevInfo['devname'] . ' ' . $mountPoint, $retval);

+      if($retval == 0) {

+        $filesystem = ''; // or could use [= 'unknown';] // Note: Could find file system type with [mount] but see no reason to at this stage.

+      }

+    }

O   

O    // Debug Log

O    @system("/usr/bin/logger \"Mounted File System: $filesystem\" ");

O   

|    // mount USB share without UTF8 option for HFS+ filesystem

|    //if($retval != 0)

|    //  @system('/bin/mount -n -o gid=1000,umask=002 /dev/' . $USBDevInfo['devname'] . ' ' . $mountPoint, $retval);

=-=-=-=-=

|      // mount USB share for NTFS/HFS+ filesystem using ufsd

|      @system('/bin/mount -n -t ufsd -o gid=1000,umask=002,iocharset=utf8,force /dev/' . $USBDevInfo['partition'][$i] . ' ' . $mountPoint, $retval);

|      if($retval != 0) {

|        // mount USB share with UTF8 option for FAT filesystem

|        @system('/bin/mount -n -o gid=1000,umask=002,iocharset=utf8 /dev/' . $USBDevInfo['partition'][$i] . ' ' . $mountPoint, $retval);

|        $filesystem = 'fat';

|      }

+      

+      if($retval != 0) {

+        // Try mounting  USB share as ext3. 

+        @system('/bin/mount -n -t ext3 /dev/' . $USBDevInfo['partition'][$i] . ' ' . $mountPoint, $retval);

+        if($retval == 0) {

+          $filesystem = 'ext3';

+        }

+      }

+      

+      if($retval != 0) {

+        // Try mounting  USB share as any other file system known to the system. 

+        @system('/bin/mount -n /dev/' . $USBDevInfo['partition'][$i] . ' ' . $mountPoint, $retval);

+        if($retval == 0) {

+          $filesystem = ''; // or could use [= 'unknown';] // Note: Could find file system type with [mount] but see no reason to at this stage.

+        }

+      }

+      

O      // Debug Log

O      @system("/usr/bin/logger \"Mounted File System: $filesystem\" ");

O      

|      // mount USB share without UTF8 option for HFS+ filesystem

|      //if($retval != 0)

|      //  @system('/bin/mount -n -o gid=1000,umask=002 /dev/' . $USBDevInfo['partition'][$i] . ' ' . $mountPoint, $retval);

=====
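
If the patch is hard to follow in the layout above, the order in which the mounts are tried can be summarised by the following Python sketch. It is not code that runs on the NAS. The function name, the `run` callable and the "ufsd" label are my own placeholders, and unlike the PHP it only returns a label when a mount actually succeeds.

```python
def mount_with_fallback(device, mount_point, run):
    """Try each mount in the patch's order; return a filesystem label or None.

    `run` is a placeholder for anything that executes an argument list and
    returns its exit status (0 on success), for example subprocess.call.
    """
    attempts = [
        ("ufsd", ["-t", "ufsd", "-o", "gid=1000,umask=002,iocharset=utf8,force"]),
        ("fat", ["-o", "gid=1000,umask=002,iocharset=utf8"]),
        ("ext3", ["-t", "ext3"]),  # added by the patch
        ("", []),  # added by the patch: let mount work out the type itself
    ]
    for label, options in attempts:
        command = ["/bin/mount", "-n", *options, device, mount_point]
        if run(command) == 0:
            return label
    return None


# Dry run with a stand-in for the real mount that only accepts ext3.
attempted = []
result = mount_with_fallback(
    "/dev/sdb2",
    "/shares/usb1-1share1",
    lambda command: (attempted.append(command), 0 if "ext3" in command else 1)[1],
)
print(result, len(attempted))
```

In the dry run the first two attempts fail and the third (ext3) succeeds, so the last fallback is never tried.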

Now you should be able to automount EXT3 drives and have them show up as shares like other filesystems on USB drives (according to your USB permissions, see below).

It would have been nicer to get the filesystem type from mount but I haven't got around to that yet.
Now I didn't spend too long on this patch, and I haven't tested it with other file systems, but I believe it will also work with others supported by the NAS. I believe mount will try the filesystems listed in order in the following file:

# cat /etc/filesystems

You may be able to add more file systems to the list and even build and compile modules for file systems that aren't supported yet but that is well beond the scope of this article. I might add another post in future if I have the need, ext4 anyone???

I have identified a number of potential issues using EXT3, some of which I have experienced and some that I hypothesise will create problems. If you find any more let me know and I'll list them here.

To access the share using normal methods you need to set the USB share permissions appropriately using the admin web interface (Advanced Mode->Users->USB Share Permissions).
Because EXT3 is a native Linux file system supporting permissions, the permissions of the files may cause problems. This will particularly be a problem if you have created the files (or permissions) on another Linux system and then connected the drive to the NAS, or are doing the reverse. You should be able to fix this using commands like:
# chmod 775 /shares/usb1-1share1/
# chown root /shares/usb1-1share1/
# chgrp jewab /shares/usb1-1share1/
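
If the numeric mode is unfamiliar, this small Python sketch (run on any PC, not on the NAS) shows what 775 means by rendering it the way `ls -l` would for a directory. The helper name is made up for the example.

```python
import stat


def describe(mode):
    """Render a numeric mode the way `ls -l` shows it for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe(0o775))  # drwxrwxr-x
print(describe(0o755))  # drwxr-xr-x
```

So 775 gives the owner and the group full access and everyone else read-only access, which is why the group the directory belongs to matters.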

I will try to get in contact with the developers and see if they will include this patch into the next firmware update. If I get any worthwhile feedback I'll let you know.

Update: I've tried to contact the developers (listed in the file) and the email address appears to be dead. If you know of a firmware developer who works on the MBWE II, or want to contact support to have this included in future firmware updates, please feel free. I'm sure it will help out many others.

01 September 2010

S.M.A.R.T Western Digital My Book World 2ND Edition (White Light)

No matter how you look at it the hard drive in my computer is failing.

Fortunately I know that "Data You Have Not Backed Up Is Data You Wouldn’t Mind Losing", so I'm not too worried, and hopefully I'll be all right when the drive does fail (I'm currently just trying to limp it through until I can replace it). But one of the drives I use is the Western Digital My Book World 2ND Edition (White Light), and because it is an external unit I have no way of being warned about its SMART errors. SMART warnings don't necessarily correlate with an imminent HDD failure, but they are definitely better than nothing.

So I set out to work out how to access the SMART information on the drive. Fortunately the developers had already included the smartmontools package on the drive so it was a quite easy process.

Firstly you will need to SSH into the drive (you may need to turn it on using the web interface).

Once you're in, all you need is the following command, which will give you all the SMART information for the drive.

# smartctl -a -d ata /dev/sda

Example:
# smartctl -a -d ata /dev/sda
smartctl version 5.38 [arm-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EAVS-00D7B1
Serial Number:    -- Removed --
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is:    Wed Sep 1 10:15:56 2010 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (22800) seconds.
Offline data collection
capabilities:             (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 255) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.
SCT capabilities:            (0x303f)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail Always       -       0
3 Spin_Up_Time            0x0027   163   160   021    Pre-fail Always       -       6825
4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2040
5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail Always       -       0
7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1747
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       757
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2040
194 Temperature_Celsius     0x0022   119   096   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector 0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1        0        0 Not_testing
    2        0        0 Not_testing
    3        0        0 Not_testing
    4        0        0 Not_testing
    5        0        0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
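
Reading that table by eye gets old quickly, so here is a minimal Python sketch of how the attribute rows could be checked automatically once you have copied the output to a PC. It assumes the ten-column layout shown above. The function names and the list of raw values to watch are my own choices, not anything smartctl defines.

```python
SAMPLE = """\
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail Always       -       0
5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail Always       -       0
194 Temperature_Celsius     0x0022   119   096   000    Old_age   Always       -       31
197 Current_Pending_Sector 0x0032   200   200   000    Old_age   Always       -       0
"""

# My own pick of attributes where any non-zero raw value is worth a look.
WATCHED_RAW = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text):
    """Turn the attribute rows of `smartctl -a` output into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # not an attribute row
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def warnings(rows):
    """List attributes at or below their threshold, or with a watched raw value above zero."""
    found = []
    for row in rows:
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            found.append(f"{row['name']}: value {row['value']} <= threshold {row['thresh']}")
        elif row["name"] in WATCHED_RAW and row["raw"].isdigit() and int(row["raw"]) > 0:
            found.append(f"{row['name']}: raw value {row['raw']}")
    return found


rows = parse_attributes(SAMPLE)
print(warnings(rows) or "nothing to report")
```

For the rows in my output this reports nothing, which matches the PASSED self-assessment.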

28 August 2010

ImageImprint - Identify Identical Images, Changes In Images And Produce Minimal JPEGs

Create hashes (checksums) for the content of JPEG files and other image types to identify duplicate images and image changes.

I've used MD5s for some time now to keep an eye on duplicate files and on changes in files that I don't want to change. But when it comes to images, the standard routine of just running an MD5 over the whole file doesn't cut it. The problem with image files (and many other types of files) is that the file can change without the image content actually changing. This might happen, for example, if you add metadata to the file, or if the image is opened and resaved without any editing, which changes the layout and structure of the file. While these are indeed changes to the file content, and they do invalidate a complete file hash, that is not always desirable: what I am concerned about is the image content. So what I wanted was a more fine-grained way of identifying whether an image file had changed and, if so, whether the actual image content had changed or just some other meta/extra data in the file. ImageImprint generates up to four image hashes (2 for any image file, 4 for JPEG files) that provide more insight into whether an image file has changed and the nature of the changes.

The four types of image hash ImageImprint can produce are as follows:

Full File Hash: The same as any other hashing program, runs the hash over the entire file. It can be used to identify identical files and any changes in the file.
Generic Image Hash: Renders the image to a standard format and then hashes the result. It can be used to identify identical images (in some instances even when the file format and encoding scheme vary, provided no lossy step is involved) and changes to an image.
Minimal Jpeg Hash: Only for JPEG images. Reduces the JPEG to its most minimal form in a standard format and hashes the result. Can identify identical images that were derived from the same base encode and identify changes to a JPEG's raw image data.
SOS Jpeg Hash: Only for JPEG images. This is really the kernel of a JPEG file. It is basically the raw image data but doesn't contain enough information to reconstruct the full image (the coding key information is missing). You can think of this like the image information with the colour space missing, or a compression algorithm with the key lookup table removed; I guess it is somewhere between those in reality. This really automates the second method in this process. Can be used in much the same way as the above hash, but does not guarantee complete JPEG integrity, and therefore can be used in some instances where a JPEG is more significantly reorganised (say optimised) without the actual image changing.
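
To make the first of these concrete, here is a rough Python equivalent of a full file hash. It is a sketch of the idea only, not ImageImprint's own code (which is a .NET program), and the function name is made up. The demo at the bottom hashes a small throwaway file.

```python
import hashlib
import os
import tempfile


def full_file_hash(path, algorithm="md5", chunk_size=65536):
    """Hash every byte of the file, metadata and all."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"same bytes")
    sample_path = tmp.name
sample_hash = full_file_hash(sample_path)
os.remove(sample_path)
print(sample_hash)
```

Because every byte goes into the digest, adding a comment or a metadata block changes this hash even though the picture is untouched, which is exactly the limitation the other three hashes are there to work around.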

Hopefully I'll end up getting this on a wiki so that users can update it to be more useful but for now if you have any comments or suggestions, just leave them as posts.

Release Version

Click here to download the latest version of ImageImprint (V 0.9.0.0).

Screen Shots

Usage

This is the same as "-h" but a little more detailed.

To run the program using these options you either need to create a shortcut to the application and then add the command line options to the target line:

or run the program from the command prompt and append the command line options:

Usage: YarragoImageImprint_V_0_9_0_0.exe [options]

-l: Displays the program's licence.

-file: The file that ImageImprint will produce hashes for. ImageImprint only runs in file mode or directory mode; if both are specified, file mode takes precedence.

-d {directory}: The directory that ImageImprint will produce hashes for.

-s {log}: Save the hash data to a file instead of displaying it, and display the status instead.
Status: Start - Current - Files Processed - Seconds Per File - Instantaneous Seconds Per File (the number of seconds it took to process the last file).

-nr: The default operation of the program is to recursively process all sub directories in the base directory. This option stops the behavior so that only files directly in the base directory are processed.

-fs: Write out file size.

-f: Generate a full file hash.

-j: JPEG files only. Generate a minimal JPEG file hash.

-jffo: Only use the first frame in the JPEG image (only affects -j and -sos, does not affect -i). Some JPEGs contain multiple images; if this option is used only the first frame will be used for the JPEG hashes and subsequent images are ignored.

-sos: JPEG files only. Generate a hash based only on the SOS segment(s).

-w {directory}: Write out the hash data files for the minimal JPEG files and/or SOS segments to this directory (implies -j if it is not used).

-i: Generate the generic image hash (for any known image type).

-nv: Do not verify images when using the generic image method (this speeds up processing time considerably).

-debug: Run the debug test (takes time and does not produce a hash, but checks the JPEG hashes produced are sane).

-hash {algorithm}: Change the algorithm used to produce the hash (md5, sha1). Default if not specified is MD5.

-v {version}: The JPEG imprint version to use. Not supported in the current version but will be available in future versions if the hash algorithms change so you can verify against old hashes you produced. This is for legacy usage only and you should always generate new hashes with the most recent version of the algorithm.

-qv: Gets the program versions.

-np: Do not prompt to enter a key on exit.

Here is a sample command line I would typically use:
YarragoImageImprint_V_0_9_0_0.exe -f -j -sos -i -d "C:\pathtoimagefiles\ " -s "C:\ImageImprintLog.csv"

Here is a sample command line I would use if I was debugging or wanted to see what was going on:
YarragoImageImprint_V_0_9_0_0.exe -f -j -sos -i -d "C:\pathtoimagefiles\ " -s "C:\ImageImprintDebugLog.csv" -w "C:\writefileshere\ " -debug

Sample Usage

There are a number of tasks that ImageImprint will help you do:

Identify identical duplicate images.
Ensure that images aren't becoming corrupted over time either through disk corruption or human error.
Identify if a program is capable of performing an operation losslessly. (Really just a permutation of the above).
Produce minimal JPEG files (without metadata). This serves a couple of purposes:

It can let you produce minimal files with identical images for further use or distribution. (E.g. If you want to produce images for distribution that don't have any metadata).
It can let you debug my program for me, you can see what is being produced and determine whether there is a problem with it.

What I typically do is generate the hashes for a collection of files and have the program save them to a .csv file using a command like:
YarragoImageImprint_V_0_9_0_0.exe -f -j -sos -i -d "C:\pathtoimagefiles\ " -s "C:\ImageImprintLog.csv"

To find duplicates I then open the result in spreadsheet software and sort each type of hash (one type at a time) and use a formula to match identical hashes.
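
If you would rather script that than use a spreadsheet, the grouping can be sketched in Python as below. Treat the column names as an assumption: I have used the headings from the sample table further down, with a few of its shortened hashes as data, so check them against the header of your own log file before relying on this.

```python
import csv
import io
from collections import defaultdict

# Assumed log layout, with shortened hashes from the sample table as data.
SAMPLE_LOG = """File Name,Full File,JPEG,SOS,Generic
Lenna.bmp,303...ACC,,,2DB...041
Lenna.tiff,727...27E,,,2DB...041
Lenna_Standard.jpg,587...0C9,B1B...376,181...C61,081...FC5
Lenna_Progressive.jpg,E11...2E1,350...C99,AE3...8D7,081...FC5
"""


def duplicates(rows, column):
    """Group file names by the hash in `column`; keep hashes seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(column, "")
        if value:  # skip files that have no hash of this type
            groups[value].append(row["File Name"])
    return {value: names for value, names in groups.items() if len(names) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
for column in ("Full File", "JPEG", "SOS", "Generic"):
    print(column, duplicates(rows, column))
```

For a real log, replace `io.StringIO(SAMPLE_LOG)` with the opened .csv file.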

To find changes I use some method (spreadsheet or BeyondCompare) to compare an old log file with a current log file to find differences.

If you have a single photo you are interested in you should just be able to perform the comparison by hand.

I have produced a set of sample images and hashes to better explain what the different hashes mean in a real world context.

First download the sample images from here.

Lenna is the standard test image so I won't depart from it here.

Here is a table showing each type of hash for a number of images. These are MD5s which I have shortened to just the first 3 and last 3 hex characters, so 137BC77A0D150F3C6149755C968A38DE becomes 137...8DE.

File Name	Full File	JPEG	SOS	Generic
Image_Sample.bmp	137...8DE			225...169
Image_Sample_RGB_Data.dat	225...169
Lenna.bmp	303...ACC			2DB...041
Lenna.tiff	727...27E			2DB...041
Lenna_Jpeg_Standard.bmp	06A...BF3			081...FC5
Lenna_Minimal.jpg	B1B...376	B1B...376	181...C61	081...FC5
Lenna_Minimal_SOS_Segment_Only.sos	181...C61
Lenna_Progressive.jpg	E11...2E1	350...C99	AE3...8D7	081...FC5
Lenna_Standard.jpg	587...0C9	B1B...376	181...C61	081...FC5
Lenna_Standard_Comment.jpg	82A...320	B1B...376	181...C61	081...FC5
Lenna_Standard_Meta_1.jpg	9B9...1F6	B1B...376	181...C61	081...FC5
Lenna_Standard_Meta_2.jpg	108...790	B1B...376	181...C61	081...FC5
Lenna_Standard_Restart.jpg	85E...194	729...99A	33E...D76	081...FC5
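
The shortened form used in the table just keeps the first three and last three characters of the hex string. In Python that could be written as follows (the function name is only for illustration):

```python
def shorten(hash_hex, keep=3):
    """Abbreviate a hex digest to its first and last `keep` characters."""
    return f"{hash_hex[:keep]}...{hash_hex[-keep:]}"


print(shorten("137BC77A0D150F3C6149755C968A38DE"))  # 137...8DE
```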

Lenna.tiff: The original image file.

Lenna.bmp: A resave from the original image. The file hash differs from the original because the file format is different, but the generic image hash stays the same because BMP is lossless and so the image is identical to the original.

Lenna_Standard.jpg: A resave from the original. Again the file hash differs because it is a different file. The image hash also differs from the original because JPEG is a lossy compression and so the actual image is different. We also get a JPEG and SOS hash because it is a JPEG.

Lenna_Jpeg_Standard.bmp: A resave from Lenna_Standard.jpg. The generic image hash stays the same as Lenna_Standard.jpg because BMP is lossless.

Lenna_Minimal.jpg: This is the same as Lenna_Standard.jpg but is the minimal JPEG (as written out by ImageImprint), both JPEG hashes for both files are the same. The file hash and JPEG hash for this file are also the same, because the image is already in its minimal form and so when it is processed the same image is produced.

Lenna_Minimal_SOS_Segment_Only.sos: This is a file that just contains the SOS segment from the Lenna_Standard.jpg image. You can see that the file hash matches the SOS hash from all images derived from Lenna_Standard.jpg.

Lenna_Progressive.jpg: A resave of the original image as a progressive JPEG. The JPEG hash differs because the raw JPEG data is totally different, but because GIMP can save identical progressive and standard JPEG images the generic image hashes match.

Lenna_Standard_Comment.jpg: A resave of Lenna_Standard.jpg with a comment added to the file. The file hash changes because the file has varied, but the JPEG, SOS and generic image hashes match because the JPEG image data hasn't been altered.

Lenna_Standard_Meta_1.jpg: Much the same as above.
Lenna_Standard_Meta_2.jpg: Much the same as above.

Lenna_Standard_Restart.jpg: Much the same as progressive, except that it is a standard JPEG image with restart markers.

Image_Sample.bmp: This is a test file I created to demonstrate how the generic image hash works. It is a normal 24bit BMP test image.

Image_Sample_RGB_Data.dat: This is the raw data that is generated by the generic image hash function, hence the file hash is the same as the generic image hash for Image_Sample.bmp.

Analysis

The following is a description of what the program does and a bit of background on how it works.

Generic Image
Raw pixel data, BGR, 3 bytes per pixel (8 bits per channel). Why BGR? Because System.Drawing.Imaging.PixelFormat.Format24bppRgb is actually BGR in memory, which I expect is because of the endianness of the machine it is running on, and I didn't really want to add the overhead of reordering (it really doesn't matter so long as it is standard). The data starts from the top left pixel, works across the top row in order, then the second row, and so on until the far right pixel in the bottom row is reached.

Here is the data that is produced to hash (the linebreaks and spaces are formatting niceties only).

0000FF 0000FF 0000FF 0000FF 0000FF
00FF00 00FF00 00FF00 00FF00 00FF00
FF0000 FF0000 FF0000 FF0000 FF0000
0000FF 00FF00 FF0000 0000FF 00FF00
FF0000 0000FF 00FF00 FF0000 0000FF
00FF00 FF0000 0000FF 00FF00 FF0000
0000FF 0000FF 0000FF 0000FF 0000FF
FF0000 FF0000 0000FF 00FF00 00FF00
00FF00 00FF00 0000FF FF0000 FF0000
0000FF 0000FF 0000FF 0000FF 0000FF

If you want to see how it actually works you can pump it through an online hex to MD5 converter (such as the one here) to see that the hash produced is the same as in the table above.
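As a rough illustration of the layout described above, here is a minimal Python sketch. It is not ImageImprint's actual code (which is .NET); the function name and the tiny 2x2 sample image are made up for the example. It takes rows of (R, G, B) tuples, writes each pixel out as B, G, R from the top left in row order, and runs MD5 over the result.

```python
import hashlib

def generic_image_hash(rows):
    """Return (md5_hex, raw_bytes) for rows of (R, G, B) pixel tuples.

    Hypothetical helper: pixels are written as 3 bytes each in B, G, R
    order, top row first, left to right, as described in the text.
    """
    raw = bytearray()
    for row in rows:
        for r, g, b in row:
            raw += bytes((b, g, r))
    raw = bytes(raw)
    return hashlib.md5(raw).hexdigest(), raw

# A made-up 2x2 image: red, green on the top row; blue, white below.
pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
digest, raw = generic_image_hash(pixels)
print(raw.hex().upper())  # 0000FF00FF00FF0000FFFFFF
print(digest)
```

Note that a red pixel comes out as 0000FF, which matches the byte order used in the dump above.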

For those of you that better understand the JPEG standard you may be interested to know in more detail exactly how the JPEG hashes work. For those of you who are interested but don't know how JPEG files work I suggest you take a look here, here or here if you are really keen (Appendix B) and then come back when you have the basics down pat.

When generating a hash for JPEG a minimal JPEG file is generated (you can write it out and examine the contents using the "-w" command line switch). The minimal JPEG only uses the segments found below from the original JPEG and discards all other segments. The segments are then sorted into a standard order and multiple segments of the same type are joined into a single segment. In this way JPEG files that have the same content but differing fragmentation can be converted to a uniform structure, leading to identical output images and therefore identical image hashes.

Jpeg Standard Format
SOI
[Tables/Misc]
SOF
[Tables/Misc]
SOS[...More SOS's...]
EOI

Tables/Misc:
DQT
DHT
DRI

All tables are joined into a single segment. Tables are ordered by decoder id, then by precision (8-bit then 16-bit for DQTs) or table type (AC then DC for DHTs). If the id byte (decoder id + other info) is the same for two tables, then the tables are left in the order they appear in the image; in practice I don't think this should happen.
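To make the idea a little more concrete, here is a minimal Python sketch of the filtering and joining steps for the DQT case only. It is an illustration under assumptions, not ImageImprint's implementation: the function names are made up, the marker codes and the DQT layout (one id byte holding precision and table id, followed by 64 or 128 bytes) come from the JPEG standard, and DHT handling, the SOS data and any error recovery are left out.

```python
import struct

# Marker codes from the JPEG standard (the byte that follows 0xFF).
SOS, DQT, DHT, DRI = 0xDA, 0xDB, 0xC4, 0xDD
SOF = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
       0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
KEEP = SOF | {DQT, DHT, DRI}

def header_segments(data):
    """Yield (marker, payload) for every segment before the first SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == SOS:
            return
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length

def kept_segments(data):
    """Drop everything (APPn, COM, ...) that the minimal JPEG does not use."""
    return [(m, p) for m, p in header_segments(data) if m in KEEP]

def split_dqt(payload):
    """Split one DQT payload into its individual tables (id byte + values)."""
    tables, pos = [], 0
    while pos < len(payload):
        precision = payload[pos] >> 4        # 0 = 8-bit, 1 = 16-bit
        size = 1 + 64 * (precision + 1)
        tables.append(payload[pos:pos + size])
        pos += size
    return tables

def merged_dqt(data):
    """Join every DQT table in the header into one sorted DQT segment."""
    tables = [t for m, p in kept_segments(data) if m == DQT
              for t in split_dqt(p)]
    # list.sort() is stable, so tables with the same id byte keep file order.
    tables.sort(key=lambda t: (t[0] & 0x0F, t[0] >> 4))
    body = b"".join(tables)
    return b"\xff\xdb" + struct.pack(">H", len(body) + 2) + body
```

The sort key is (table id, precision), and because the sort is stable, tables that share an id byte stay in the order they appear in the file, which is the behaviour described above.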

The following are some notes and caveats that I have identified for each hash.

Full File Hash: File hash changes even when the actual image content stays the same.
Generic Image Hash: Images that contain higher than 24-bit fidelity may be incorrectly paired with images using 24-bit or above (because the image is effectively flattened into 24 bits). I have not tested images with an alpha channel. There is no guard against image skewing (i.e. you could combine the entire image onto a single row or any combination of rows and, so long as the pixel order is the same, generate the same hash), but in practice I don't think this will ever happen. Some image types can contain multiple images in the file; this hash is only ever run over the primary image in the file (subsequent images are ignored). This type of hash takes the longest because it uses the standard .NET image conversion function, which somehow verifies the whole image, and this process seems to take a while (and for this reason this type of hash is probably more robust and less error prone). It also uses considerably more resources (RAM) because the full image must be decompressed to this raw format, and for even reasonably sized JPEGs this can mean a whole bucket load of raw data (yes, this could be done slightly more efficiently).
Minimal Jpeg Hash: There are a number of scenarios that could lead to minimal JPEGs that are different even though the image is truly identical, and this is why you have the other hashes to provide some fallback against this. It will not identify changes more complex than simple reorganisation and fragmentation of the file (GIMP can produce identical progressive vs standard images, which leads to grossly different underlying data but identical images).
SOS Jpeg Hash: The image could become corrupt (conceivably the DHT or DQT tables could become corrupt) without altering the hash.
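The skewing caveat on the generic image hash can be shown with a small Python sketch. The helper name and the two made-up images are assumptions for the example; the point is only that the hashed data contains pixel values and no width or height, so two differently shaped images with the same pixel order hash identically.

```python
import hashlib

def raw_bgr(rows):
    """Flatten rows of (R, G, B) tuples into B, G, R bytes (hypothetical helper)."""
    return b"".join(bytes((b, g, r)) for row in rows for (r, g, b) in row)

red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)

wide = [[red, green, blue, red, green, blue]]        # 6 pixels wide, 1 high
tall = [[red, green], [blue, red], [green, blue]]    # 2 pixels wide, 3 high

same = hashlib.md5(raw_bgr(wide)).digest() == hashlib.md5(raw_bgr(tall)).digest()
print(same)  # True: the dimensions are not part of the hashed data
```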

Obviously the full file hash will be stable over time. I also imagine that both the JPEG hashes will be stable, at least up to a point, with bug fixes being included in later versions but the version command line option providing backward compatibility for comparison against existing hash logs. However, I envisage that the generic image hash will not necessarily be stable because of the way it is implemented (I'd really like to get it to RGB one day, it just seems nicer), so don't rely on it alone for long term archive verification (well, at least not yet).

The JPEG methods perform some file type validation (they make sure that the structure seems fine and that the critical segments are there), but they don't check that the data in the segments is sane. For this reason, if you want to be sure that your image files are sane to begin with, you are better off using a different program or running the generic image hash, which makes use of Microsoft's image validation, although I'm not sure how this works and what its tolerance is like.

As a result of the above it is best to use a combination (a couple) of the above for most tasks, hence giving you further insight on how the file and/or image has changed.

Testing And Debugging

This program was really only designed for my personal use; I didn't intend to release it. I've only released it because I thought it might be useful to someone else. As a result the error handling is not as robust as it could have been, because I figured that if there was a problem I would identify it when it occurred, and that I would treat the program well and only give it valid data. So go easy on it and be careful: it isn't the most robust code I've ever written, but it does work well if you give it valid data (i.e. directories that actually exist). If you have a genuine repeatable bug you can produce with reasonable input, let me know and I'll see what I can do to fix it. I'd say it's somewhere between Alpha and Beta level code. I have successfully run the program over my photo collection, which has in excess of 50,000 JPEGs, and am fairly confident with the output. Running it over this collection from a network drive with the "-debug" option (which takes significantly longer) took around 2 full days.

Here is a list of bugs that I'm aware of but haven't got around to fixing yet:

When files are written out they are only valid if a valid hash is generated. I have not actually experienced this bug, but I expect that this is what happens if the input file is not valid.

So this is where you can come in. The program works well and if you treat it well you can get a lot out of it, but if you do find an error let me know and I will try to fix it. I provide the software and you provide the testing: win-win. The most untested component is the command line interface, because up until just about the release version I just modified the parameters in code (it meant I didn't have to spend time on something that I didn't need when I was using it myself).

Debug checks you can run for me:

Run the debug check for all your files and let me know if there are any failures.
There were a few other errors I was going to get you to check for me but I can't think of them right now. Oops.

The checks that the debug option performs are as follows:

The checks only apply to the custom JPEG processing I have written.
Check that the minimal JPEG that is produced matches (image wise) the original image.
Check that rehashing the minimal JPEG produces the same result.

Plugins
I see a few plugins in the future of this program so that users can generate these hashes as they download their images from their camera or card as part of their workflow. If you are from a company that produces this software you have two options: either write the plugin yourself, making use of the command line interface of the program, or give me a free version to work with and if I find time I will write the plugin for you.

Interested in JPEGs in detail or Photo Management

In the course of developing my program I was doing a bit of in depth reading on how JPEG files work and I came across the following site:
http://www.impulseadventure.com/photo
I think it's really after my own heart on a lot of digital photography management and it's really worth a look.

13 August 2010

Printing Word Documents To XPS With Default Filenames

I was fairly disappointed when I found that when printing to an XPS file from Word you have to manually specify the filename every time. This is fine if you have one file you want to convert to an XPS file or even a few, but if you have a number it becomes quite a tedious process. I'm now using XPS documents as part of a further workflow and I found it frustrating to have to enter the filename every time.

What I actually wanted was an easy method to be able to convert Word documents to image files. A few months ago I wanted to be able to easily convert any document to an image file. There were a range of products out there, but none of them did what I wanted so I rolled my own that was able to convert XPS documents to image files:

If you’re interested in this let me know and I'll upload it so everyone can use it.
Update: I have made this available, see my free software page.

With Word documents you already have the option to right click and print, but this only works in this case if your default printer is set to the XPS writer and you'll still have to specify the file name every time, all in all if you have enough documents this eats up a lot of time.

So what I really wanted to be able to do was not even have to open the document, but instead right click on it and simply select Print To XPS.

I couldn't find any information on how to do this, so I worked out how to do it and thought I would pass it on here.

The solution I came up with can be used in two ways:
• If you just want to be able to quickly print the document you are working on to an XPS file with the same name from inside Word.
• If you want to be able to right click in Explorer and print a Word document straight to an XPS file.

There are two parts to installing this:
• Installing the macro into Word.
• Creating the right click context menu in Explorer using the registry editor.

Installing The Macro Into Word

These instructions are for Microsoft Word 2003, but I'm sure that you can do the same things for other versions of Word, you will just have to be a little clever if things aren’t in the same place.

Update 01/09/2010: I have tested this with Word 2007 under Windows 7 and found that it still works, there is just a little variance in where things are located.

Open up a new Word document and go to:
Tools->Macro->Visual Basic Editor

You will see the project explorer. It will contain two projects, "Normal" and something like "Project(Document1)". We are interested in "Normal" because we want to apply the macros to the Normal template so that we can run them from any document. If you select the other option you will apply them to the currently opened document, which means the macros will only work with that document.

Download this macro file I have created, which contains all the macros we need to print to XPS files using the document's name as the name for the output XPS file.

Right click on "Normal" and select "Import File...".

Select my macro file from the import file dialog.

The file will now appear under the modules in the normal document and the code should show in the editor.

Close the Visual Basic Editor.

To test, go to Tools->Macro->Macros; two new macros should appear:
"PrintToXpsWithFilename"
"PrintToXpsWithFilenameAndClose"

You should be able to run these two macros directly:
"PrintToXpsWithFilename" will print to an .xps file with the same name and in the same directory as the current document and leave the document open.
"PrintToXpsWithFilenameAndClose" will do the same, but will close the document (and Word) once it has been printed.

If you want you can now assign keyboard shortcuts or menu buttons, but these are fairly trivial so I will leave you to do that yourself.
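For reference, the naming rule the two macros are described as using (same directory, same name, .xps extension) can be sketched in a few lines of Python. This is only an illustration of the rule, not the contents of the macro file; the function name and the example paths are made up.

```python
from pathlib import PureWindowsPath

def xps_path_for(document):
    """Return the .xps path next to a Word document (hypothetical helper)."""
    return str(PureWindowsPath(document).with_suffix(".xps"))

print(xps_path_for(r"C:\Docs\Report.doc"))      # C:\Docs\Report.xps
print(xps_path_for(r"C:\Docs\Report.v2.docx"))  # C:\Docs\Report.v2.xps
```

Only the final extension is replaced, so a document called Report.v2.docx would become Report.v2.xps in this sketch.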

Those of you who are just interested in the macros and are not interested in the right click functionality can leave us now. For the rest we will progress onto the registry.

Creating The Right Click Context Menu In Explorer Using Registry Editor

If you aren’t familiar with editing the registry you might want to do a bit of background reading first.
Obviously backup your registry data first.

Fire up the registry editor.

First we will do it for old .doc documents, but see further down if you want to do it for .docx’s.

Open the following key:
HKEY_CLASSES_ROOT\Word.Document.8\shell\

Create a key called "PrintToXPS" and set its (default) string value to "Print To XPS"; this will be the name that shows up on the file's right click menu in Explorer.

Now create a sub key in the “PrintToXPS” key called "command", this key will hold the action that we want to perform. So now you should have:
HKEY_CLASSES_ROOT\Word.Document.8\shell\PrintToXPS\command

Set the (default) string value of “command” to everything between the brackets, but not including them: ["C:\Program Files\Microsoft Office\OFFICE11\WINWORD.EXE" /mPrintToXpsWithFilenameAndClose /q /n "%1 "]

Update 01/09/2010: For other versions of Word you need to adjust the path you use. For Word 2007 use:
C:\Program Files\Microsoft Office\OFFICE12\WINWORD.EXE
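If you would rather not click through the registry editor by hand, the same two keys can be written as a .reg file. Below is a minimal Python sketch that only builds the text of such a file; the function name is made up, the key names and command line are the ones given above, and I have not tested importing the output, so treat it as a starting point and check the result before merging it.

```python
def print_to_xps_reg(prog_id="Word.Document.8",
                     winword=r"C:\Program Files\Microsoft Office\OFFICE11\WINWORD.EXE"):
    """Build .reg file text for the PrintToXPS keys (hypothetical helper)."""

    def esc(value):
        # String values in a .reg file escape backslashes and double quotes.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    key = "HKEY_CLASSES_ROOT\\%s\\shell\\PrintToXPS" % prog_id
    # Command line copied from the instructions above, including "%1 ".
    command = '"%s" /mPrintToXpsWithFilenameAndClose /q /n "%%1 "' % winword
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s]" % key,
        '@="%s"' % esc("Print To XPS"),
        "",
        "[%s\\command]" % key,
        '@="%s"' % esc(command),
        "",
    ]
    return "\r\n".join(lines)

print(print_to_xps_reg())
```

Passing a different program path as winword covers other versions of Word, and passing another prog_id (taken from the default value of the extension's key) covers other document types.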

If you want to do it for other types of documents we need to find out where the shell command for that document type is located. I imagine it will be the same on your system so you probably won't need to look this up, but if you have trouble see what values you have on your system.
To do this expand HKEY_CLASSES_ROOT\. You will see a huge number of sub keys; scroll down until you find the extension of the Word document you want to be able to print from. On my system I can see the following two keys, with the default value listed afterward:
HKEY_CLASSES_ROOT\.doc - Word.Document.8
HKEY_CLASSES_ROOT\.docx - Word.Document.12

Just replace this in the key paths you opened above and follow the instructions. So if I wanted to make it work for .docx I would open the following key and then follow the instructions above:
HKEY_CLASSES_ROOT\Word.Document.12\shell\

That should be all, you should now be able to right click on the Word document types you have set up and click “Print To XPS”.

Some further info for those who just need to know:

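I think that the above is effectively an alias for this key (HKEY_CLASSES_ROOT is a merged view of the SOFTWARE\Classes keys under HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER):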
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Word.Document.8\shell\

So I have a feeling (although I haven't tested this) that if you want to apply it just to the current user you could put it in:
HKEY_CURRENT_USER\SOFTWARE\Classes\Word.Document.8\shell\PrintToXPS

Have fun!

27 July 2010

Open And Disassemble Western Digital My Book World

Here is a tutorial on how to open a Western Digital My Book World 2ND Edition (White Light), although I expect that it applies to most of the same edition Western Digital My Book range.

Of course you use the information here at your own risk.

See also this tutorial, which helped me open mine, I have tried to include a few more details in my explanation.

Here is roughly where all of the catches that keep the case locked are (they are duplicated on both sides of the case).

I found it easiest to remove the two end clips first. If you look through the holes in the case you can see the two clips (sorry my camera doesn't perform well under these conditions - no manual focus).

I used a plastic card (credit card) inserted at the corners of the case and then slid into position. Then pry the clips apart (looking through the end so you can see the clips). Pop all 4 (2 on each side).

The top and bottom have sliding rails, but they also have little tabs to ensure they don't just slide off.

I found it easiest to pop the rails off the tracks, again sliding the plastic card in from the corner, then prying them slightly apart and out to the side (not over to the end).

There are then two locking tabs under the front feet; you can depress them with a short thin poker. Be careful not to push too hard as they are made from reasonably brittle plastic.

There are also two tabs at the top, visible from the outside. By this stage the case should be really loose, so they didn't present much of a problem, but you may need to be aware of them in case they keep it locked up.

You should now be able to slide the outer casing away from the inner.

I have photos of the inside disassembly, although it is much more straightforward (no sneaky hidden clips and catches). I'll put them up if there is enough demand.

Briefly:
• Mine had a light pipe, remove this first, it should just slide out.
• Then remove the hard drive and electronics from the caddy (it is just sitting in there, no screws or locking bits). Pop it up at the front (light side) and then slide it forward and out.
• Remove the electronics by removing the 3 screws and slide the tray to release, this disconnects the hard drive from the extra electronics.

24 June 2010

Free Software Is Not Like Free Beer But Free Beer Is Like Free Software

It's long been described that free software is not like free beer, but today I came across free beer that is like free software. This isn't even the only open drink, you can also get OpenCola.
