Users of a website I run are posting duplicate content. As a result, when visitors search, the results contain entries that are essentially the same.
The problem is that the duplicates are not carbon copies: users change a word here and there, or repost the same thing a few days later. The extra results add nothing, because they are essentially resubmissions.
An example would be:
Title: Trousers for sale Description: I am selling a pair of trousers. They have a hole in them. Contact Rob at 1234
Title: Trousers for sale Description: I am selling a pair of trousers. They have some holes. Contact Rob
Title: My pajamas for sale Description: I am selling a pair of trousers. They have a hole. Contact Rob at 1234
Is there an algorithm (preferably implemented in PHP, and fast) that can remove these duplicates with decent accuracy? It will be searching through a set of about 50 items, each up to 500 characters long.
Edit: I should also add that these results are not next to each other, so I can't just compare the current result with the previous one. In a perfect world something like this pseudo-SQL would work: SELECT title FROM database WHERE id IN (10,40,54,143,444) AND unique(title, desc) > 90%
Scrap my answer above and use the following code, which is built on PHP's similar_text():

    class SimilarText {
        // Texts seen so far, keyed by the text itself.
        private $arrayResults = array();
        private $text;

        // Returns the similarity percentage if $text is at least
        // $accepted_percent similar to a text seen earlier, otherwise 0.
        public function test($text, $accepted_percent = 70) {
            if (count($this->arrayResults)) {
                foreach ($this->arrayResults as $result) {
                    similar_text($result, $text, $percent);
                    if (((int) $percent) >= $accepted_percent) {
                        $this->save($text);
                        return (int) $percent;
                    }
                }
            }
            $this->save($text);
            return 0;
        }

        private function save($text) {
            $this->arrayResults[$text] = $text;
        }
    }

    $similar = new SimilarText();
    while (/* [$fetch = ...] */) {
        $title = $fetch['title'];
        $description = $fetch['description'];
        // Skip rows whose title or description looks like one already seen.
        if ($similar->test($title, 70) || $similar->test($description, 70)) {
            continue;
        }
    }
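The idea in the answer above (compare each new text against the ones already kept, using PHP's similar_text()) can also be sketched as a single function that takes the fetched rows and returns only the first of each group of near-duplicates. This is a minimal sketch, not the original author's code: the function names filterNearDuplicates and normaliseText, the row layout with 'title' and 'description' keys, and the choice to compare title and description as one combined string are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: lower-cases the text and collapses runs of whitespace,
// so that trivial edits do not reduce the similarity score.
function normaliseText(string $text): string
{
    return strtolower(trim(preg_replace('/\s+/', ' ', $text)));
}

// Hypothetical helper: returns the rows with near-duplicates removed.
// A row is dropped when similar_text() reports at least $threshold percent
// similarity with a row that has already been kept.
function filterNearDuplicates(array $rows, float $threshold = 70.0): array
{
    $kept = array();      // rows to return
    $keptTexts = array(); // normalised text of each kept row

    foreach ($rows as $row) {
        $candidate = normaliseText($row['title'] . ' ' . $row['description']);
        $isDuplicate = false;

        foreach ($keptTexts as $seen) {
            // similar_text() writes the similarity percentage into $percent.
            similar_text($seen, $candidate, $percent);
            if ($percent >= $threshold) {
                $isDuplicate = true;
                break;
            }
        }

        if (!$isDuplicate) {
            $kept[] = $row;
            $keptTexts[] = $candidate;
        }
    }

    return $kept;
}

// Usage with the example ads from the question.
$ads = array(
    array('title' => 'Trousers for sale',
          'description' => 'I am selling a pair of trousers. They have a hole in them. Contact Rob at 1234'),
    array('title' => 'Trousers for sale',
          'description' => 'I am selling a pair of trousers. They have some holes. Contact Rob'),
    array('title' => 'My pajamas for sale',
          'description' => 'I am selling a pair of trousers. They have a hole. Contact Rob at 1234'),
);

foreach (filterNearDuplicates($ads, 70.0) as $ad) {
    echo $ad['title'], "\n";
}
```

Every candidate is compared against every kept row, so the work grows quadratically with the number of rows, and similar_text() itself is relatively expensive on long strings. For roughly 50 items of up to 500 characters that should be acceptable, but the threshold of 70 is only a starting point and is worth tuning against real data.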