Friday, 15 June 2012

Read an IIS log into a pandas DataFrame


I have an IIS log file that contains rows in the following format:

61.245.163.59 - [16/May/2013:23:55:09 +0530] "GET /Ehrram/Recruitment/Image/Divider.gif HTTP/1.1" 404 1245 "" Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0 "GET /Ehrram/Recruitment/Image/Divider.gif - HTTP/1.1 www.example.com

I need to extract some of these fields and build a DataFrame from them. The approach below creates a DataFrame with only a single column. Do I have to store each piece of the split as its own DataFrame column? Second, the lines in the log file do not all have the same number of fields, so how can I make extracting values by position like this more reliable?

    log_list = []
    for line in f:
        ip = line.split(' ')[0]
        time = line.split(' ')[2]
        method = line.split(' ')[4]
        status = line.split(' ')[7]
        bytes = line.split(' ')[8]
        referrer = line.split(' ')[9]
        agent = line.split(' ')[10]
        # Joining the fields into one string is why the DataFrame ends up with a single column
        data = ip + ' ' + time + ' ' + method + ' ' + status + ' ' + bytes + ' ' + referrer + ' ' + agent
        log_list.append(data)
    df = pandas.DataFrame(log_list)

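The following code should do what you are trying to do:

    from pandas import read_csv

    log_file = 'filename.log'
    # header=None because the log has no header row; usecols keeps the same
    # positions that the question picks out with split()
    df = read_csv(log_file, sep=r'\s+', header=None, usecols=[0, 2, 4, 7, 8, 9, 10])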
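Since the question also mentions that not every line has the same number of fields, a regular expression may be more robust than picking fields by position. The following is only a sketch: it assumes lines shaped like the sample above (`ip - [time] "METHOD path protocol" status bytes "referrer" rest`), and the names `LOG_PATTERN` and `parse_log_lines` are made up for illustration. Lines that do not match the pattern are skipped rather than raising an error.

```python
import re
import pandas as pd

# Assumed layout, inferred from the sample line in the question:
# ip, a dash, [timestamp], "METHOD path protocol", status, bytes, "referrer", rest
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" (?P<rest>.*)'
)

def parse_log_lines(lines):
    """Build a DataFrame with one column per named group, skipping non-matching lines."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            # groupdict() gives one key per named group, so each becomes a column
            rows.append(match.groupdict())
    return pd.DataFrame(rows)

sample = [
    '61.245.163.59 - [16/May/2013:23:55:09 +0530] '
    '"GET /Ehrram/Recruitment/Image/Divider.gif HTTP/1.1" 404 1245 "" '
    'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0',
    'a malformed line that does not match',
]
df = parse_log_lines(sample)
print(df[['ip', 'time', 'method', 'status', 'bytes']])
```

For a real file, the same function could be called as `parse_log_lines(open('filename.log'))`. All columns come back as strings, so `status` and `bytes` would still need converting if numeric types are wanted.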
