Friday, 15 June 2012

Clojure: read a large text file and count occurrences


I am trying to read a large text file and count the occurrences of specific errors. For example, for the following sample text:

  some
  bla
  error123
  foo
  test
  error123
  line
  junk
  error55
  more
  accessories

I want to end up with the following (I don't really care about the data structure, although I'm thinking of a map):

  error123 - 2
  error55 - 1

Here is what I have tried so far:

  (read-big-file find-error "sample.txt")

returns:

  (nil nil "error123" nil nil "error123" nil nil "error55" nil nil)
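The definitions of find-error and read-big-file did not survive in this copy of the question. A minimal sketch that would produce the call and output above, assuming find-error returns the line when it contains "error" (and nil otherwise) and that read-big-file maps a function over every line of the file, might look like this. Both bodies are hypothetical reconstructions, not the asker's code:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical reconstruction: return the line itself if it mentions
;; "error", otherwise nil (when yields nil if the test fails).
(defn find-error [line]
  (when (re-find #"error" line)
    line))

;; Hypothetical reconstruction: apply func to every line of the file.
;; doall forces the lazy map while the reader is still open; without it
;; the sequence would be realized after with-open has closed the file.
(defn read-big-file [func filename]
  (with-open [rdr (io/reader filename)]
    (doall (map func (line-seq rdr)))))
```

Because of doall, the whole result list (one entry per line, mostly nils) is held in memory at once, which is what makes this approach costly on very large files.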

Next I tried to remove the nil values and group like items:

  (  

Which gives:

  {"error123" ["error123" "error123"], "error55" ["error55"]}
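The expression that produced this map is cut off in this copy. One way to get exactly this result, assuming read-big-file returns the nil-padded list shown earlier, is to drop the nils with remove nil? and group equal strings with group-by identity. The sketch below uses that list as a literal (results is a hypothetical name), so it runs without a file, and then turns each group into a count:

```clojure
;; The nil-padded list returned by (read-big-file find-error "sample.txt").
(def results
  '(nil nil "error123" nil nil "error123" nil nil "error55" nil nil))

;; Drop the nils, then group equal strings under themselves as keys.
(group-by identity (remove nil? results))
;; => {"error123" ["error123" "error123"], "error55" ["error55"]}

;; Replace each group vector with its length to get counts.
(into {}
      (map (fn [[k v]] [k (count v)]))
      (group-by identity (remove nil? results)))
;; => {"error123" 2, "error55" 1}
```

This works, but it builds a vector of duplicates for every key only to count it afterwards.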

This is getting close to the desired output, although it may not be efficient. How can I get the counts now? Also, as someone new to Clojure and functional programming, I would appreciate any suggestions on how I might improve this. Thanks!

I think you might be looking for the frequencies function:

  user=> (doc frequencies)
  -------------------------
  clojure.core/frequencies
  ([coll])
    Returns a map from distinct items in coll to the number of times
    they appear.
  nil
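As a quick check of that docstring, frequencies can be tried on an in-memory collection before involving a file. This sketch reuses the nil-padded list from the question as a literal (results is a hypothetical name); it shows that, unless the nils are removed first, nil itself is counted as a key:

```clojure
;; The nil-padded list from the question, used as a literal.
(def results
  '(nil nil "error123" nil nil "error123" nil nil "error55" nil nil))

;; Every distinct item becomes a key, nil included.
(frequencies results)
;; => {nil 8, "error123" 2, "error55" 1}

;; Dropping the nils first leaves only the error counts.
(frequencies (remove nil? results))
;; => {"error123" 2, "error55" 1}
```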

So this should give you what you want:

  (frequencies (remove nil? (read-big-file find-error "sample.txt")))
  ;; => {"error123" 2, "error55" 1}

If your text file is really big, however, I would recommend doing the counting inline on the line-seq, so that you do not run out of memory. This way you can also use filter instead of map and remove.

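  (require '[clojure.java.io :as io])

  ;; Count each distinct line matching pred. frequencies consumes the lazy
  ;; line-seq eagerly, while the reader is still open.
  (defn count-lines [pred filename]
    (with-open [rdr (io/reader filename)]
      (frequencies (filter pred (line-seq rdr)))))

  ;; Truthy (the matched text) when the line contains "error", else nil.
  (defn is-error-line? [line]
    (re-find #"error" line))

  (count-lines is-error-line? "sample.txt")
  ;; => {"error123" 2, "error55" 1}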
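count-lines counts whole matching lines, so it only yields per-error totals when each error code sits on its own line. If codes can also appear in the middle of longer lines, one variation, assuming codes look like "error" followed by digits, is to pull every match out of each line with re-seq and count those instead. The name count-errors and the regex are illustrative assumptions, not part of the original answer:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical variant: count every "error<digits>" token in the file,
;; even when several appear on one line or are surrounded by other text.
(defn count-errors [filename]
  (with-open [rdr (io/reader filename)]
    ;; mapcat joins the match seqs of all lines (lines without a match
    ;; contribute nothing); frequencies reads the file fully before it closes.
    (frequencies (mapcat #(re-seq #"error\d+" %) (line-seq rdr)))))
```

This still streams the file line by line, so memory use grows with the number of distinct error codes rather than with the size of the file.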
