I want to download a whole website (with sub-sites). Is there any tool for that?

Cristiana Nicolae
UAdapter

9 Answers

Try example 10 from here:

wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
  • --mirror : turn on options suitable for mirroring.

  • -p : download all files that are necessary to properly display a given HTML page.

  • --convert-links : after the download, convert the links in the document for local viewing.

  • -P ./LOCAL-DIR : save all the files and directories to the specified directory.
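
For example, to mirror a site into a local folder named ./local-copy (the URL and directory name here are just placeholders):

wget --mirror -p --convert-links -P ./local-copy https://example.com/
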
dadexix86
shellholic

HTTrack for Linux: copying websites in offline mode

httrack is the tool you are looking for.

HTTrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure.
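
A typical invocation (the URL, output directory, and filter below are placeholders) might look like:

httrack "http://example.com/" -O ./example-mirror "+*.example.com/*" -v

Here -O sets the output directory, the "+*.example.com/*" filter keeps the crawl on that site, and -v gives verbose output.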

Cristiana Nicolae
Sid

With wget you can download an entire website; you should use the -r switch for a recursive download. For example,

wget -r http://www.google.com
muru

WEBHTTRACK WEBSITE COPIER is a handy tool for downloading a whole website onto your hard disk for offline browsing. Launch the Ubuntu Software Center and type "webhttrack website copier" (without the quotes) into the search box. Select it and install it from the Software Center onto your system. Start webHTTrack from either the launcher or the start menu; from there you can begin enjoying this great tool for your site downloads.
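
If you prefer the terminal, the same tool can be installed with apt (assuming the webhttrack package is available in your release's repositories):

sudo apt-get install webhttrack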

frizeR

I use this command to download the whole website at http://example.com:

wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --restrict-file-names=windows \
    --no-parent \
    --domains "example.com" \
    "http://example.com"
  • --recursive downloads not just that web page, but also the web pages that it links to
  • --no-clobber prevent the same file from being downloaded multiple times
  • --page-requisites download all inline images, sounds, and referenced stylesheets
  • --convert-links convert links like http://example.com/hello.html to hello.html so that the HTML files can load from disk
  • --adjust-extension add .html filename extension to HTML files that don't already end with that extension, in order to make it easier to open the HTML files from a file browser
  • --restrict-file-names=windows escape or replace special characters in file names, in order to make the downloaded filenames work as expected on operating systems like Windows (the operating system that is the least permissive when it comes to file names)
  • --no-parent do not ascend to the parent directory when retrieving recursively
  • --domains example.com comma-separated list of domains to be followed
Flimm

I don't know about subdomains, i.e., sub-sites, but wget can be used to grab a complete site. Take a look at this Super User question. It says that you can use -D domain1.com,domain2.com to download different domains in a single script. I think you can use that option to download subdomains, i.e. -D site1.somesite.com,site2.somesite.com
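
A sketch of that approach (the somesite.com hostnames are placeholders; -H lets wget span hosts during the recursive crawl and -D restricts it to the listed domains):

wget -r -H -D site1.somesite.com,site2.somesite.com "http://site1.somesite.com/"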

binW

I use Burp - the spider tool is much more intelligent than wget, and can be configured to avoid sections if necessary. The Burp Suite itself is a powerful set of tools to aid in testing, but the spider tool is very effective.

Rory Alsop

You can download an entire website with this command:

wget -r -l 0 website

Example:

wget -r -l 0 http://google.com

If speed is a concern (and the server's wellbeing is not), you can try puf, which works like wget but can download several pages in parallel. It is, however, not a finished product, not maintained, and horribly undocumented. Still, for downloading a web site with lots and lots of smallish files, it might be a good option.
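
puf takes the URLs to fetch as plain arguments, so a minimal sketch (the URLs are placeholders; given how sparsely the tool is documented, check its man page for the options controlling parallelism) might be:

puf http://example.com/page1.html http://example.com/page2.html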

loevborg