I want to download a whole website (with sub-sites). Is there any tool for that?
9 Answers
Try example 10 from here:
wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
- --mirror: turn on options suitable for mirroring.
- -p: download all files that are necessary to properly display a given HTML page.
- --convert-links: after the download, convert the links in the document for local viewing.
- -P ./LOCAL-DIR: save all the files and directories to the specified directory.
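For instance, a filled-in version of the command might look like this (the URL and the ./example-mirror directory are just placeholders for your own site and target folder):
wget --mirror -p --convert-links -P ./example-mirror https://example.com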
httrack is the tool you are looking for.
HTTrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure.
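For reference, a minimal command-line invocation might look like the following; the URL and the ./mirror output directory are placeholders, and -O is HTTrack's option for the output path:
httrack "https://example.com/" -O ./mirror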
With wget you can download an entire website; use the -r switch for a recursive download. For example,
wget -r http://www.google.com
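If you don't want wget to recurse without limit, you can cap the depth and keep it from climbing above the starting directory; this is my own variant, with a placeholder URL:
wget -r -l 3 --no-parent http://www.example.com/docs/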
WebHTTrack Website Copier is a handy tool for downloading a whole website onto your hard disk for offline browsing. Launch the Ubuntu Software Center and type "webhttrack website copier" (without the quotes) into the search box. Select and install it from the Software Center onto your system. Start WebHTTrack from either the launcher or the start menu; from there you can begin enjoying this great tool for your site downloads.
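If you prefer the terminal, the same front end can presumably be installed and launched like this (assuming the standard Ubuntu package name webhttrack):
sudo apt-get install webhttrack
webhttrack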
I use this command to download the whole website at http://example.com:
wget \
--recursive \
--no-clobber \
--page-requisites \
--adjust-extension \
--convert-links \
--restrict-file-names=windows \
--no-parent \
--domains "example.com" \
"http://example.com"
- --recursive: download not just that web page, but also the web pages that it links to
- --no-clobber: prevent the same file from being downloaded multiple times
- --page-requisites: download all inline images, sounds, and referenced stylesheets
- --convert-links: convert links like http://example.com/hello.html to hello.html so that the HTML files can load from disk
- --adjust-extension: add a .html filename extension to HTML files that don't already end with that extension, in order to make it easier to open the HTML files from a file browser
- --restrict-file-names=windows: escape or replace special characters in file names, in order to make the downloaded filenames work as expected on operating systems like Windows (the operating system that is the least permissive when it comes to file names)
- --no-parent: do not ascend to the parent directory when retrieving recursively
- --domains example.com: comma-separated list of domains to be followed
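If you also want to be gentle on the server, wget's standard throttling options can be appended; this is my own addition rather than part of the command above, with example.com again as a placeholder:
wget \
    --recursive \
    --page-requisites \
    --convert-links \
    --wait=1 \
    --limit-rate=200k \
    "http://example.com"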
I don't know about sub-domains, i.e., sub-sites, but wget can be used to grab a complete site. Take a look at this Super User question.
It says that you can use -D domain1.com,domain2.com to download different domains in a single command. I think you can use that option to download sub-domains, i.e., -D site1.somesite.com,site2.somesite.com
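Note that in recursive mode wget only follows links on the starting host unless told otherwise, so a sub-domain crawl would likely also need -H (span hosts) alongside -D to restrict it; a sketch with placeholder host names:
wget -r -H -D site1.somesite.com,site2.somesite.com http://site1.somesite.com/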
I use Burp - the spider tool is much more intelligent than wget, and can be configured to avoid sections if necessary. The Burp Suite itself is a powerful set of tools to aid in testing, but the spider tool is very effective.
You can download an entire website with this command:
wget -r -l 0 website
Example:
wget -r -l 0 http://google.com
If speed is a concern (and the server's wellbeing is not), you can try puf, which works like wget but can download several pages in parallel. It is, however, not a finished product, not maintained, and horribly undocumented. Still, to download a web site with lots and lots of smallish files, this might be a good option.
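Since puf is unmaintained, a rough way to get a similar parallel effect with standard tools (my own sketch, not part of the original answer; urls.txt is a placeholder file with one URL per line):
xargs -n 1 -P 8 wget -q < urls.txt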
