Tuesday, 15 May 2012

python - Using regex to capture text in parentheses if they exist


I'm working on a Python script to parse the clippings file generated by Notes when I highlight something, take a note, or add a bookmark. I'm using regex to collect data from the file, and then I'm planning to store it in an SQL database. Right now, though, I'm having trouble matching the line that contains the book's title and possibly an author.

There are three possibilities for this line; it can be in any of these formats:

  title (last, first)
  title (author)
  title

What regex do I need to capture the title and, if present, whatever is inside the trailing parentheses, capturing an empty string otherwise? For example, I want the regex to give me these results:

  ('title', 'last, first')
  ('title', 'author')
  ('title', '')

Right now I have managed to write a regex that captures the text in parentheses, but it does not match lines that contain only a title. What I have now:

  (.+) (?:\((.+)\)(?:\n|\Z))*

The only issue is that it requires the line to end with an author. If I make that part optional so that it can match an empty string, the whole line is captured as the title and the author is always empty, i.e.:

  ('title (last, first)', '')
  ('title (author)', '')
  ('title', '')
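A minimal sketch of that symptom, assuming a simplified pattern rather than the exact regex above: with a greedy title group and an optional author group, .+ consumes the entire line first, and because the author group is optional the match already succeeds with nothing captured there, so the engine never backtracks.

```python
import re

# Simplified, hypothetical pattern used only to illustrate the problem:
# a greedy title group followed by an optional " (author)" group.
GREEDY_RE = re.compile(r'^(.+)(?: \((.+)\))?$')

for line in ['title (last, first)', 'title (author)', 'title']:
    m = GREEDY_RE.match(line)
    # groups('') returns '' instead of None for a group that did not match
    print(m.groups(''))

# Output:
# ('title (last, first)', '')
# ('title (author)', '')
# ('title', '')
```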

If you are matching line by line, you can use this regex:

  ^(.+?)(?: \((.+)\))?$

I added the start-of-line and end-of-line anchors, made the title group lazy (.+?) so that it stops before the optional author part, and moved the space into the non-capturing group so that the title is captured without the trailing space. I also changed the * quantifier to ?, because I don't think you will have more than one pair of parentheses; if you do, change it back.
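A minimal Python sketch of the regex above applied to one line at a time. The helper name parse_title_line is hypothetical, and Match.groups('') is used to turn a missing author into the empty string the question asks for (by default an unmatched group is reported as None).

```python
import re

# Lazy title, optional " (author)" group, anchored at both ends of the line.
TITLE_RE = re.compile(r'^(.+?)(?: \((.+)\))?$')

def parse_title_line(line):
    """Return (title, author) for one line; author is '' if absent.

    Hypothetical helper for illustration. Returns None for an empty line,
    because .+? needs at least one character.
    """
    m = TITLE_RE.match(line.rstrip('\r\n'))  # drop any trailing newline
    if m is None:
        return None
    return m.groups('')  # '' instead of None for the missing author

for line in ['title (last, first)', 'title (author)', 'title']:
    print(parse_title_line(line))

# Output:
# ('title', 'last, first')
# ('title', 'author')
# ('title', '')
```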

I removed the other non-capturing group, (?:\n|\Z), because the end-of-line anchor $ already ensures that the match ends at the end of the line.
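If you would rather run the same pattern over the whole file at once, here is a sketch assuming the text has already been read into a string with \n line endings: with re.MULTILINE, ^ and $ match at every line boundary, and re.findall already reports a group that did not participate as ''. Every non-empty line matches this pattern, so on a real file you would still need to pick out the title lines first; a trailing \r (Windows line endings) would also keep the author group from matching.

```python
import re

# ^ and $ match at the start and end of every line under re.MULTILINE,
# so the question's (?:\n|\Z) group is not needed.
TITLE_RE = re.compile(r'^(.+?)(?: \((.+)\))?$', re.MULTILINE)

text = 'title (last, first)\ntitle (author)\ntitle'

# findall returns a list of (title, author) tuples; a group that did not
# participate in the match is returned as ''.
print(TITLE_RE.findall(text))
# [('title', 'last, first'), ('title', 'author'), ('title', '')]
```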

