Matches in SemOpenAlex for { <https://semopenalex.org/work/W1775605004> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W1775605004 abstract "The exponential growth of the web has increased the importance of web document classification and data mining. Obtaining exact information about which classes a web document belongs to is expensive. Automatic classification of web documents is of great use to search engines, since it provides this information at low cost. In this paper, we propose an approach for classifying web documents using the frequent word itemsets generated by Frequent Pattern (FP) Growth, an association analysis technique from data mining. These sets of associated words act as the feature set. The final classification is obtained by applying a Naive Bayes classifier to the feature set. For the experimental work, we use the Gensim package, as it is simple and robust. Results show that our approach can effectively classify web documents. Web document classification is the process of classifying documents into predefined categories based on their content. The classifiers used for this purpose should be trained on web documents that are already classified. The task is to assign a document to one or more classes or categories. This may be done manually (or intellectually) or algorithmically. Manual classification costs more. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science. The problems overlap, however, and there is also interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document poses its own classification problems. Documents may be classified according to their subjects or according to other attributes. Web document classification is a primary requirement for search engines, which retrieve documents in response to a user query. 
Document classification, or text categorization (as the term is used in information retrieval), is the process of assigning a document to a predefined set of categories based on the document's content. Document classification can be applied as an information filtering tool and can also be used to improve the retrieval results of a query. Classification is one of the main data analysis techniques and deals with categorizing a new data entry into one of the categories based on the values of its attributes. In general, a classification algorithm needs to train a model on pre-classified training documents. Once the model is ready, the test documents are evaluated with that model, which completes the classification process. In this paper, we propose an approach for automatically classifying web documents into a set of categories using FP-growth and Naive Bayes techniques. In our approach, we are given a set of example documents. We preprocess the documents by parsing them, removing stop words, stemming (10), and extracting nouns as keywords. Then we apply the FP-growth method (9) to find the frequent word itemsets in each document. The documents are treated as transactions, and the sets of frequently occurring words are viewed as sets of items in the transaction. New documents are classified by applying the Naive Bayes technique to these derived feature sets." @default.
- W1775605004 created "2016-06-24" @default.
- W1775605004 creator A5038534019 @default.
- W1775605004 creator A5053903379 @default.
- W1775605004 date "2014-06-21" @default.
- W1775605004 modified "2023-09-27" @default.
- W1775605004 title "An Effective Approach for Web Document Classification using the Concept of Association Analysis of Data Mining" @default.
- W1775605004 cites W1542365497 @default.
- W1775605004 cites W2007395264 @default.
- W1775605004 cites W2059586463 @default.
- W1775605004 cites W2067223933 @default.
- W1775605004 cites W2076008912 @default.
- W1775605004 cites W2110224739 @default.
- W1775605004 cites W2156772624 @default.
- W1775605004 cites W2159082584 @default.
- W1775605004 hasPublicationYear "2014" @default.
- W1775605004 type Work @default.
- W1775605004 sameAs 1775605004 @default.
- W1775605004 citedByCount "4" @default.
- W1775605004 countsByYear W17756050042016 @default.
- W1775605004 countsByYear W17756050042017 @default.
- W1775605004 countsByYear W17756050042018 @default.
- W1775605004 crossrefType "posted-content" @default.
- W1775605004 hasAuthorship W1775605004A5038534019 @default.
- W1775605004 hasAuthorship W1775605004A5053903379 @default.
- W1775605004 hasConcept C118689300 @default.
- W1775605004 hasConcept C12267149 @default.
- W1775605004 hasConcept C124101348 @default.
- W1775605004 hasConcept C136764020 @default.
- W1775605004 hasConcept C154945302 @default.
- W1775605004 hasConcept C164120249 @default.
- W1775605004 hasConcept C177264268 @default.
- W1775605004 hasConcept C197046077 @default.
- W1775605004 hasConcept C199360897 @default.
- W1775605004 hasConcept C21959979 @default.
- W1775605004 hasConcept C23123220 @default.
- W1775605004 hasConcept C2780479914 @default.
- W1775605004 hasConcept C41008148 @default.
- W1775605004 hasConcept C52001869 @default.
- W1775605004 hasConcept C95623464 @default.
- W1775605004 hasConcept C97854310 @default.
- W1775605004 hasConceptScore W1775605004C118689300 @default.
- W1775605004 hasConceptScore W1775605004C12267149 @default.
- W1775605004 hasConceptScore W1775605004C124101348 @default.
- W1775605004 hasConceptScore W1775605004C136764020 @default.
- W1775605004 hasConceptScore W1775605004C154945302 @default.
- W1775605004 hasConceptScore W1775605004C164120249 @default.
- W1775605004 hasConceptScore W1775605004C177264268 @default.
- W1775605004 hasConceptScore W1775605004C197046077 @default.
- W1775605004 hasConceptScore W1775605004C199360897 @default.
- W1775605004 hasConceptScore W1775605004C21959979 @default.
- W1775605004 hasConceptScore W1775605004C23123220 @default.
- W1775605004 hasConceptScore W1775605004C2780479914 @default.
- W1775605004 hasConceptScore W1775605004C41008148 @default.
- W1775605004 hasConceptScore W1775605004C52001869 @default.
- W1775605004 hasConceptScore W1775605004C95623464 @default.
- W1775605004 hasConceptScore W1775605004C97854310 @default.
- W1775605004 hasLocation W17756050041 @default.
- W1775605004 hasOpenAccess W1775605004 @default.
- W1775605004 hasPrimaryLocation W17756050041 @default.
- W1775605004 hasRelatedWork W1479682228 @default.
- W1775605004 hasRelatedWork W2157620896 @default.
- W1775605004 hasRelatedWork W2167036382 @default.
- W1775605004 hasRelatedWork W2168836450 @default.
- W1775605004 hasRelatedWork W2184335383 @default.
- W1775605004 hasRelatedWork W2251561003 @default.
- W1775605004 hasRelatedWork W2303887538 @default.
- W1775605004 hasRelatedWork W2360300830 @default.
- W1775605004 hasRelatedWork W2421381789 @default.
- W1775605004 hasRelatedWork W2491607649 @default.
- W1775605004 hasRelatedWork W2509430763 @default.
- W1775605004 hasRelatedWork W2606955052 @default.
- W1775605004 hasRelatedWork W2810778250 @default.
- W1775605004 hasRelatedWork W3204069329 @default.
- W1775605004 hasRelatedWork W3210951214 @default.
- W1775605004 hasRelatedWork W1187248933 @default.
- W1775605004 hasRelatedWork W2168059377 @default.
- W1775605004 hasRelatedWork W2337305822 @default.
- W1775605004 hasRelatedWork W2559267432 @default.
- W1775605004 hasRelatedWork W2562702405 @default.
- W1775605004 isParatext "false" @default.
- W1775605004 isRetracted "false" @default.
- W1775605004 magId "1775605004" @default.
- W1775605004 workType "article" @default.
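
The pipeline summarized in the abstract of W1775605004 (documents as transactions, frequent word sets as features, Naive Bayes on those features) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library rather than Gensim, and it finds frequent word sets by brute-force enumeration instead of FP-growth. That enumeration yields frequent itemsets up to the chosen size but does not build an FP-tree. The function names, the `max_size` limit, the smoothing scheme, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count word sets of size 1..max_size and keep those that occur in at
    least min_support transactions. Brute-force stand-in for FP-growth."""
    counts = Counter()
    for tokens in transactions:
        items = sorted(set(tokens))  # each document is one transaction
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def featurize(tokens, itemsets):
    """Return the frequent word sets fully contained in a document."""
    words = set(tokens)
    return [s for s in itemsets if s <= words]


def train_naive_bayes(docs, labels, itemsets):
    """Collect per-class document counts and per-class feature counts."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        feature_counts[label].update(featurize(tokens, itemsets))
    return class_counts, feature_counts


def predict(tokens, model, itemsets):
    """Pick the class with the highest log posterior, using add-one
    (Laplace) smoothing over the frequent-itemset vocabulary."""
    class_counts, feature_counts = model
    total_docs = sum(class_counts.values())
    present = featurize(tokens, itemsets)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(feature_counts[label].values()) + len(itemsets)
        for feature in present:
            score += math.log((feature_counts[label][feature] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: already tokenized, stop words removed.
docs = [
    ["football", "match", "goal"],
    ["football", "goal", "league"],
    ["election", "vote", "party"],
    ["election", "party", "minister"],
]
labels = ["sports", "sports", "politics", "politics"]

itemsets = frequent_itemsets(docs, min_support=2)
model = train_naive_bayes(docs, labels, itemsets)
sports_guess = predict(["football", "goal", "referee"], model, itemsets)
politics_guess = predict(["election", "party"], model, itemsets)
print(sports_guess, politics_guess)
```

With a larger corpus, the enumeration step would be the first thing to replace, since counting every word pair per document scales poorly compared with an FP-tree.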