Thursday, 15 September 2011

Stanford NER: How do I create a new training set that I can use and test out?


As I understand it, to create a training file you put your words in a text file, then after each word add a space or tab followed by its tag (such as PERS, LOC, etc.).

I have also copied one sample from an example file into WordPad. How do I get the data into a .gz file that I can load and use in a classifier?

Please guide me through this. I am a newbie and not very proficient with technology.

Your training file (training-data.tsv) should look like this:

  I	O
  will	O
  be	O
  transferred	O
  to	O
  Vancouver	LOCATION
  BC	LOCATION
  tomorrow	O

where O means "outside", i.e. the token is not part of a named entity, and the two columns are separated by a tab.
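Rather than typing this format by hand in WordPad, you can generate it with a few lines of code. The following is a minimal Python sketch, not part of Stanford NER: `tag_tokens`, `write_training_file` and `check_training_file` are hypothetical helper names, tokenization is a plain whitespace split (an assumption; real text usually needs a proper tokenizer, e.g. to split off punctuation), and the token-to-label mapping is something you supply yourself.

```python
def tag_tokens(sentence, entity_tags):
    """Pair each whitespace-separated token with a label.

    entity_tags is a hypothetical token -> label mapping that you provide;
    any token not listed is labelled "O" (outside any named entity).
    """
    return [(token, entity_tags.get(token, "O")) for token in sentence.split()]


def write_training_file(path, pairs):
    """Write one 'token<TAB>label' line per token."""
    with open(path, "w", encoding="utf-8") as f:
        for token, label in pairs:
            f.write(f"{token}\t{label}\n")


def check_training_file(path):
    """Return 1-based numbers of non-blank lines that lack exactly two tab-separated columns."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            # Blank lines are skipped; they are commonly used to separate sentences.
            if line and len(line.split("\t")) != 2:
                bad.append(lineno)
    return bad


if __name__ == "__main__":
    pairs = tag_tokens("I will be transferred to Vancouver BC tomorrow",
                       {"Vancouver": "LOCATION", "BC": "LOCATION"})
    write_training_file("training-data.tsv", pairs)
    print(check_training_file("training-data.tsv"))  # []
```

The check only catches layout mistakes such as a space instead of a tab or a missing label; it cannot tell whether the labels themselves are correct.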

You do not put the training data into a .ser.gz file yourself. The .ser.gz file is the serialized classifier model, and it is created by the training process.
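To make the distinction concrete: the .gz suffix indicates gzip compression, and what is inside is a serialized Java object written by the trainer, not text you type. The Python sketch below (the `is_gzip` helper and the demo file names are hypothetical) only checks for the two-byte gzip magic number, so it can tell a plain-text .tsv apart from a gzip file, but it does not verify that a file is a valid model.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzip(path):
    """Return True if the file at path starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Throwaway demo files; point is_gzip at your own paths in practice.
    with open("demo.tsv", "w", encoding="utf-8") as f:
        f.write("Vancouver\tLOCATION\n")
    with gzip.open("demo.ser.gz", "wb") as f:
        f.write(b"not a real model")
    print(is_gzip("demo.tsv"), is_gzip("demo.ser.gz"))  # False True
```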

To train a classifier, run:

  java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop my-classifier.properties
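If you would rather launch training from a script, the training command can be wrapped as follows. This is a Python sketch under a few assumptions: Java is on your PATH, the jar is named stanford-ner.jar (use the actual jar name from your download), and `build_train_command` and `train` are hypothetical helpers. Passing an argument list to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.

```python
import shutil
import subprocess


def build_train_command(jar_path, props_path, java="java"):
    """Return the argument list for training a CRF model from a properties file."""
    return [java, "-cp", jar_path,
            "edu.stanford.nlp.ie.crf.CRFClassifier",
            "-prop", props_path]


def train(jar_path, props_path):
    """Run training; raises CalledProcessError if the Java process exits non-zero."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH")
    subprocess.run(build_train_command(jar_path, props_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Java installed.
    print(" ".join(build_train_command("stanford-ner.jar", "my-classifier.properties")))
```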

The file my-classifier.properties would look like this:

  trainFile = training-data.tsv
  serializeTo = my-classification-model.ser.gz
  map = word=0,answer=1
  ...
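The properties file is plain text, so any editor will do, but it can also be generated. Below is a minimal Python sketch with hypothetical helper names that writes only the three settings shown above (trainFile, serializeTo, map) and reads them back to confirm the round trip; the further settings elided by "..." are deliberately not filled in. The reader is intentionally simple (no escapes or line continuations) and is not a full Java .properties parser.

```python
def write_properties(path, props):
    """Write a simple 'key = value' properties file, one setting per line."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in props.items():
            f.write(f"{key} = {value}\n")


def read_properties(path):
    """Parse the file written above back into a dict (no escapes or continuations)."""
    props = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Split on the first '=' only, so values like "word=0,answer=1" survive.
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    settings = {
        "trainFile": "training-data.tsv",
        "serializeTo": "my-classification-model.ser.gz",
        "map": "word=0,answer=1",
        # The remaining settings from the original example are elided here.
    }
    write_properties("my-classifier.properties", settings)
    print(read_properties("my-classifier.properties") == settings)  # True
```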
