Matches in SemOpenAlex for { <https://semopenalex.org/work/W4376163575> ?p ?o ?g. }
Showing items 1 to 53 of 53 with 100 items per page.
- W4376163575 abstract "We introduce a state-of-the-art approach for URL categorization that leverages the power of Large Language Models (LLMs) to address the primary objectives of web content filtering: safeguarding organizations from legal and ethical risks, limiting access to high-risk or suspicious websites, and fostering a secure and professional work environment. Our method utilizes LLMs to generate accurate classifications and then employs established knowledge distillation techniques to create smaller, more specialized student models tailored for web content filtering. Distillation results in a student model with a 9% accuracy rate improvement in classifying websites, sourced from customer telemetry data collected by a large security vendor, into 30 distinct content categories based on their URLs, surpassing the current state-of-the-art approach. Our student model matches the performance of the teacher LLM with 175 times less parameters, allowing the model to be used for in-line scanning of large volumes of URLs, and requires 3 orders of magnitude less manually labeled training data than the current state-of-the-art approach. Depending on the specific use case, the output generated by our approach can either be directly returned or employed as a pre-filter for more resource-intensive operations involving website images or HTML." @default.
- W4376163575 created "2023-05-12" @default.
- W4376163575 creator A5020341973 @default.
- W4376163575 creator A5035172174 @default.
- W4376163575 creator A5059139229 @default.
- W4376163575 date "2023-05-08" @default.
- W4376163575 modified "2023-09-26" @default.
- W4376163575 title "Web Content Filtering through knowledge distillation of Large Language Models" @default.
- W4376163575 doi "https://doi.org/10.48550/arxiv.2305.05027" @default.
- W4376163575 hasPublicationYear "2023" @default.
- W4376163575 type Work @default.
- W4376163575 citedByCount "0" @default.
- W4376163575 crossrefType "posted-content" @default.
- W4376163575 hasAuthorship W4376163575A5020341973 @default.
- W4376163575 hasAuthorship W4376163575A5035172174 @default.
- W4376163575 hasAuthorship W4376163575A5059139229 @default.
- W4376163575 hasBestOaLocation W43761635751 @default.
- W4376163575 hasConcept C106131492 @default.
- W4376163575 hasConcept C118643609 @default.
- W4376163575 hasConcept C136764020 @default.
- W4376163575 hasConcept C144133560 @default.
- W4376163575 hasConcept C154945302 @default.
- W4376163575 hasConcept C162853370 @default.
- W4376163575 hasConcept C2777338717 @default.
- W4376163575 hasConcept C31972630 @default.
- W4376163575 hasConcept C41008148 @default.
- W4376163575 hasConcept C94124525 @default.
- W4376163575 hasConceptScore W4376163575C106131492 @default.
- W4376163575 hasConceptScore W4376163575C118643609 @default.
- W4376163575 hasConceptScore W4376163575C136764020 @default.
- W4376163575 hasConceptScore W4376163575C144133560 @default.
- W4376163575 hasConceptScore W4376163575C154945302 @default.
- W4376163575 hasConceptScore W4376163575C162853370 @default.
- W4376163575 hasConceptScore W4376163575C2777338717 @default.
- W4376163575 hasConceptScore W4376163575C31972630 @default.
- W4376163575 hasConceptScore W4376163575C41008148 @default.
- W4376163575 hasConceptScore W4376163575C94124525 @default.
- W4376163575 hasLocation W43761635751 @default.
- W4376163575 hasOpenAccess W4376163575 @default.
- W4376163575 hasPrimaryLocation W43761635751 @default.
- W4376163575 hasRelatedWork W1696049218 @default.
- W4376163575 hasRelatedWork W2040397200 @default.
- W4376163575 hasRelatedWork W2116350809 @default.
- W4376163575 hasRelatedWork W2365213443 @default.
- W4376163575 hasRelatedWork W2558223934 @default.
- W4376163575 hasRelatedWork W2748952813 @default.
- W4376163575 hasRelatedWork W2888998488 @default.
- W4376163575 hasRelatedWork W2899084033 @default.
- W4376163575 hasRelatedWork W3013319096 @default.
- W4376163575 hasRelatedWork W3198184493 @default.
- W4376163575 isParatext "false" @default.
- W4376163575 isRetracted "false" @default.
- W4376163575 workType "article" @default.