Thursday, 15 January 2015

python - Tracking a file over time


I need to track a specific file on the file system between two points in time, T1 and T2. The emphasis here is on viewing the file as a distinct unit on the file system.

The ultimate goal is to determine whether the data (content) of a file has changed between T1 and T2, by recording a hash of the file's data and its creation/modification attributes at T1 and comparing them with the equivalent values recorded at T2. If all the attributes are unchanged but the hash no longer validates, then we can say that there is a problem. In all other cases, we may be willing to say that a changed hash is the result of a modification, and that an unchanged hash together with unchanged modification attributes means the file's data has not changed at all.
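The comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `snapshot` and `classify` and the result labels are made up for this example, SHA-256 is an arbitrary choice of hash, and only the modification time is recorded because creation time is not reported consistently across platforms.

```python
import hashlib
import os


def snapshot(path):
    """Record a content hash and the modification time of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    info = os.stat(path)
    return {"sha256": digest.hexdigest(), "mtime_ns": info.st_mtime_ns}


def classify(at_t1, at_t2):
    """Compare a T1 snapshot with a T2 snapshot (labels are hypothetical)."""
    same_hash = at_t1["sha256"] == at_t2["sha256"]
    same_mtime = at_t1["mtime_ns"] == at_t2["mtime_ns"]
    if same_hash and same_mtime:
        return "unchanged"
    if same_hash:
        return "touched"   # attributes changed, data intact
    if same_mtime:
        return "problem"   # data changed although the attributes did not
    return "modified"      # data and attributes both changed
```

Calling `snapshot(path)` at T1, storing the result, and passing it to `classify` together with a fresh snapshot at T2 gives one of the four labels. Note that this still identifies the file by its path, which is exactly the limitation discussed next.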

Now, there are several ways to reference a file, each with its own drawbacks:

  • The path of the file: this method fails if the file is moved to a different location.
  • A hash of the file's data: this allows the file, or rather the pointer(s) to the file's data on disk, to be found even if the pointer has been moved to a different directory. However, the data must not change, or this method fails as well.
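To make the second option concrete, here is a minimal sketch of locating a file by the hash of its data after it has been moved. The helper names `file_hash` and `find_by_hash` are hypothetical, and the whole-tree scan is deliberately naive.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-256 hex digest of the file's data."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def find_by_hash(root, wanted_hash):
    """Return every path under `root` whose data hashes to `wanted_hash`."""
    matches = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            candidate = os.path.join(folder, name)
            if file_hash(candidate) == wanted_hash:
                matches.append(candidate)
    return sorted(matches)
```

A moved file is still found as long as its data is byte-for-byte identical. As soon as the data changes, the search returns nothing, and any other file with the same content also matches, so the hash identifies content rather than the file itself.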

My idea is to obtain a file ID at T1 and use it to track the file at T2, even if the file has changed location, so that it does not have to be treated as a new file.

I know of two methods that pywin32 offers, win32file.GetFileInformationByHandle() and win32file.GetFileInformationByHandleEx(), but they are obviously restricted to specific file systems, break cross-platform compatibility, and stray from a universal approach.

My question is simple: are there any other ideas or theories for tracking a file, ideally across platforms and file systems?

Any food for thought is welcome!

This is not really possible in general, because the idea of file identity is an illusion.

  1. You cannot track identity by content, because content changes.

  2. You cannot track it via any other associated property either, because many file editors save a change by deleting the old file and creating a new one.

Version control systems handle this in various ways:

  1. (Subversion) Track move operations manually.

  2. (Git) Use heuristics to label an operation as a "move" based on changes in content (e.g., if a new file differs from an existing file by less than 50%, it is considered a copy).
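A content-based heuristic of this kind can be illustrated with the standard library's `difflib`. This is not Git's actual algorithm; the function names and the 0.5 threshold are assumptions chosen only to mirror the 50% figure mentioned above.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, old_text, new_text).ratio()


def guess_origin(new_text, existing, threshold=0.5):
    """Return the name of the most similar existing file, or None.

    `existing` maps file names to their contents. `threshold` is an
    illustrative cut-off, not a setting taken from Git.
    """
    best_name, best_score = None, 0.0
    for name, text in existing.items():
        score = similarity(text, new_text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Given the contents of files that disappeared between T1 and T2, `guess_origin` suggests which one a newly appeared file most likely came from. The result is a guess, which is the point: identity is inferred from content rather than observed.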

Inode numbers and the like are not stable and cannot be relied on here. You can see that editing a file with Vim changes its inode number, which we can check with stat -f %i:

    $ touch file.txt
    $ stat -f %i file.txt
    4828200
    $ vim file.txt
    ... make changes to file.txt ...
    $ stat -f %i file.txt
    4828218
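The same effect can be reproduced from Python. The sketch below assumes an editor that saves by writing a new file and swapping it into place; `save_by_replacing` and `file_id` are hypothetical helper names, and the identifier is whatever `os.stat` reports as `st_dev` and `st_ino` on the platform in use.

```python
import os
import tempfile


def file_id(path):
    """Return the (device, inode) pair that os.stat reports for `path`."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def save_by_replacing(path, new_data):
    """Mimic an editor that writes a new file and swaps it into place."""
    folder = os.path.dirname(os.path.abspath(path))
    # The temporary file exists alongside the original, so it is a
    # different file system object with its own identifier.
    fd, temp_path = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "wb") as handle:
        handle.write(new_data)
    os.replace(temp_path, path)  # the path now refers to the new file
```

Comparing `file_id(path)` before and after `save_by_replacing(path, ...)` yields two different values, even though from the user's point of view it is still the same file at the same path.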
