
ClueWeb Crawler

ID: 220526
Producer: Language Technologies Institute
Other robots of the same producer: LemurWebCrawler
Bot URL: https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/
Status: Active

Known variants

ClueWeb-Crawler/1.0

User agent string: ClueWeb-Crawler/1.0 (+https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/; mailto:cmu-clueweb-crawler@andrew.cmu.edu)/Nutch-1.22-SNAPSHOT (CMU ClueWeb Crawler for research; https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/)
Category: Uncategorised
First seen: 2026-04-15 00:20:28
Last seen: 2026-04-15 06:06:25
IP addresses: 2
Crawls from:
128.2.204.71 (boston-cluster.lti.cs.cmu.edu, US)
128.2.204.80 (boston-1-19.lti.cs.cmu.edu, US)
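
The user agent string and source addresses recorded in this entry can be used to flag requests from this crawler in server logs. The following is a minimal sketch, not an official verification method: the function name `classify` and its return labels are hypothetical, and the address set contains only the two IPs observed in this entry, so the crawler may well use others.

```python
import ipaddress
import re

# Addresses copied from this entry. They reflect one short observation
# window and should be treated as an incomplete, possibly outdated set.
KNOWN_IPS = {
    ipaddress.ip_address("128.2.204.71"),
    ipaddress.ip_address("128.2.204.80"),
}

# Product token at the start of the recorded user agent string.
UA_TOKEN = re.compile(r"^ClueWeb-Crawler/\d+(?:\.\d+)*")


def classify(user_agent: str, remote_ip: str) -> str:
    """Hypothetical helper: label a request by how well it matches this entry.

    Returns "listed" when both the user agent token and the IP match,
    "ua-only" when only the user agent matches, and "no-match" otherwise.
    """
    if UA_TOKEN.match(user_agent) is None:
        return "no-match"
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Unparseable address: the user agent alone is all we can confirm.
        return "ua-only"
    return "listed" if ip in KNOWN_IPS else "ua-only"
```

A "ua-only" result is not proof of spoofing, since the address list here is short. It only means the request came from an IP this entry has not recorded.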
 