
ClueWeb Crawler

ID: 220526
Producer: Carnegie Mellon University
Other robots of the same producer: CMU-Cylab, LemurWebCrawler
Bot URL: https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/
Status: Active

Known variants

ClueWeb-Crawler/1.0

User-agent string: ClueWeb-Crawler/1.0 (
Category: Uncategorised
First seen: 2026-05-19 12:18:08
Last seen: 2026-06-05 09:03:29
IP addresses: 3
Walk from:
128.2.204.71 boston-cluster.lti.cs.cmu.edu (US)
128.2.204.72 boston-1-16.lti.cs.cmu.edu (US)
128.2.204.65 boston-1-13.lti.cs.cmu.edu (US)
 

ClueWeb-Crawler/1.0

User-agent string: ClueWeb-Crawler/1.0 (+https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/; mailto:cmu-clueweb-crawler@andrew.cmu.edu)/Nutch-1.22-SNAPSHOT (CMU ClueWeb Crawler for research; https://boston.lti.cs.cmu.edu/CMU-ClueWeb-Crawler/)
Category: Uncategorised
Respects robots.txt: No
First seen: 2026-04-15 00:20:28
Last seen: 2026-05-15 19:23:52
IP addresses: 8
Walk from:
128.2.204.71 boston-cluster.lti.cs.cmu.edu (US)
128.2.204.65 boston-1-13.lti.cs.cmu.edu (US)
128.2.204.72 boston-1-16.lti.cs.cmu.edu (US)
128.2.204.77 boston-1-17.lti.cs.cmu.edu (US)
128.2.204.78 boston-1-18.lti.cs.cmu.edu (US)
128.2.204.80 boston-1-19.lti.cs.cmu.edu (US)
128.2.204.70 boston-1-15.lti.cs.cmu.edu (US)
128.2.204.66 boston-1-14.lti.cs.cmu.edu (US)
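
For illustration only, the user-agent token and IP addresses listed above could be combined into a simple server-side check. This is a minimal sketch in Python: the function name `is_listed_clueweb_request` and the constants are hypothetical, the IP set is copied from this listing and may be incomplete or out of date, and a user-agent string alone is easy to spoof.

```python
import ipaddress

# IP addresses copied from the listing above (assumption: the crawler
# may also operate from addresses that are not recorded here).
KNOWN_IPS = frozenset(
    ipaddress.ip_address(addr)
    for addr in (
        "128.2.204.65", "128.2.204.66", "128.2.204.70", "128.2.204.71",
        "128.2.204.72", "128.2.204.77", "128.2.204.78", "128.2.204.80",
    )
)

# Product token shared by both known user-agent variants.
UA_TOKEN = "clueweb-crawler/"


def is_listed_clueweb_request(user_agent: str, remote_ip: str) -> bool:
    """Return True if the UA carries the ClueWeb token AND the IP is listed."""
    if UA_TOKEN not in user_agent.lower():
        return False
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address: treat as not matching.
        return False
    return ip in KNOWN_IPS
```

Requiring both the token and a listed address avoids acting on requests that merely imitate the user-agent string; what to do with a match (log, rate-limit, block) is left to the site operator.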
 