Saturday 15 February 2014

perl - Duplicate photo searching with compare only pure imagedata and image similarity?


About 600 GB of photos have been collected over 13 years - now stored on a FreeBSD server on ZFS.

The photos come from many computers and many deep subdirectories, from different photo manipulation software (iPhoto, Picasa, HP and many others), from several partial backups to different external USB HDDs, and from images recovered after disk disasters - in short: a terrible mess with many duplicates.

So first I did the following:

  • Searched the tree for files of the same size (fast) and computed an MD5 checksum for those
  • Collected duplicated images (same size + same MD5 = duplicate)

    This helped a lot, but there are still MANY duplicates:

    • Photos that differ only in the EXIF/IPTC data added by some photo management software, but the image is the same (or at least "looks the same" and has the same dimensions)
    • Or they are only resized versions of the original images
    • Or they are "enhanced" versions of the originals, etc.

      Now the questions:

      • How to find duplicates by checksumming only the "pure image bytes" of a JPG, without the EXIF/IPTC and similar meta information? I want to filter out photo duplicates that differ only in their EXIF tags but where the image is the same (so file checksumming doesn't work, but image checksumming could...). This is (I hope) not very complicated - but I need some direction.
      • Which Perl module can extract the "pure" image data from a JPG file in a form usable for comparison/checksumming?

        More complex:

      • Find "similar" images that are only
        • resized versions of the originals
        • "enhanced" versions of the originals (from some photo manipulation program)
      • Is there already an algorithm available as a Unix command or Perl module (XS?) that I can use to detect these special "duplicates"?
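The "pure image bytes" question above can be illustrated at the byte level: a JPEG file is a sequence of marker segments, and EXIF/IPTC metadata lives in APPn (0xFFE0-0xFFEF) and COM segments that can simply be skipped before checksumming. Below is a minimal sketch in Python (chosen for brevity and testability; the same segment walk can be written in Perl, e.g. on top of Image::ExifTool). It is not a production-grade JPEG parser:

```python
import hashlib

# Markers whose segments carry metadata rather than image data:
# APP0..APP15 (0xE0-0xEF; EXIF lives in APP1, IPTC usually in APP13)
# and COM (0xFE, free-text comments).
METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}

def pure_image_checksum(jpeg_bytes):
    """MD5 of a JPEG with its APPn/COM metadata segments stripped."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 1 < len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            raise ValueError("corrupt segment stream")
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:            # SOS: entropy-coded scan data follows,
            out += jpeg_bytes[i:]     # keep everything to the end of file
            break
        # segment length is big-endian and includes the two length bytes
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker not in METADATA_MARKERS:
            out += jpeg_bytes[i:i + 2 + length]   # keep DQT, SOF, DHT, ...
        i += 2 + length
    return hashlib.md5(out).hexdigest()
```

Two copies of the same photo that differ only in tags written by iPhoto or Picasa then produce the same checksum, so the existing "same size + same MD5" workflow can be reused on top of it.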

          I can write complex scripts in BASH and "+-" :) know Perl. I can use FreeBSD/Linux utilities directly on the server, and OS X over the network (but working with 600 GB over the LAN is not the fastest way)...

          My rough idea:

          • Delete images only at the end of the workflow
          • Use an Image::ExifTool script to collect duplicate candidates, probably based on image-creation date and camera model (maybe other EXIF data too)
          • Make a checksum of the pure image data (or extract a histogram - similar images should have the same histogram) - not sure about this
          • Use some similarity detection to find duplicates based on resizing and photo enhancement - no idea how to do this...
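The Image::ExifTool step above can also be driven by the `exiftool` command-line tool. Assuming the metadata has already been dumped with something like `exiftool -json -CreateDate -Model -r <dir>`, grouping the candidates is simple. A sketch in Python (`SourceFile`, `CreateDate` and `Model` are real exiftool output fields; the grouping policy itself is my assumption about the intended workflow):

```python
from collections import defaultdict

def group_by_exif(records):
    """Group file records by (creation date, camera model).

    `records` is a list of dicts as produced by
    `exiftool -json -CreateDate -Model -r <dir>`; files that land in
    the same group are duplicate *candidates* worth a closer look.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec.get("CreateDate"), rec.get("Model"))
        if key[0] is None:      # no creation date: cannot group reliably
            continue
        groups[key].append(rec["SourceFile"])
    # only keys shared by more than one file are interesting
    return {k: v for k, v in groups.items() if len(v) > 1}
```

The groups are then small enough that the more expensive checks (pure-image checksum, histogram or similarity comparison) only run inside each group instead of across all 600 GB.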

            Any ideas, help, any (software/algorithm) hints on how to make order in the chaos?

            PS:

            Here is a nearly identical question, but I am already done with that answer (MD5) and am looking for more precise checksumming and image-comparison algorithms.

            Have you looked at Randal Schwartz's approach? He uses a Perl script with ImageMagick to shrink each picture to a tiny resized (4x4 RGB grid) version, and then compares those grids in order to flag "similar" images.
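I have not seen the actual script, but the 4x4 RGB grid technique mentioned above can be sketched independently: box-average every picture down to a 4x4 grid of RGB cells, then call two pictures similar when every cell agrees within a tolerance. This is my own Python reconstruction, not the original Perl code; it assumes the pixel rows have already been decoded (e.g. via ImageMagick), and the per-channel tolerance value is an arbitrary illustration:

```python
def grid_fingerprint(pixels, n=4):
    """Average an RGB image (list of rows of (r, g, b) tuples, at least
    n x n pixels) down to an n x n list of averaged cells."""
    h, w = len(pixels), len(pixels[0])
    fp = []
    for gy in range(n):
        for gx in range(n):
            # pixel block covered by this grid cell
            y0, y1 = gy * h // n, (gy + 1) * h // n
            x0, x1 = gx * w // n, (gx + 1) * w // n
            count = (y1 - y0) * (x1 - x0)
            sums = [0, 0, 0]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for c in range(3):
                        sums[c] += pixels[y][x][c]
            fp.append(tuple(s // count for s in sums))
    return fp

def similar(fp_a, fp_b, tolerance=10):
    """Two pictures count as 'similar' when every grid cell differs by at
    most `tolerance` per channel - resized copies land in the same bucket."""
    return all(abs(a[c] - b[c]) <= tolerance
               for a, b in zip(fp_a, fp_b) for c in range(3))
```

Because the fingerprint is tiny (48 numbers per image), all 600 GB can be fingerprinted once and then compared pairwise within the EXIF groups without touching the image files again.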
