Before you start working with the program it is advisable to adjust
the general settings. To do this, launch the program and choose
Default Options.
The first thing to do is choose the directory (new path) in which
project files will be saved and the directory for files copied
(downloaded) from the Internet.
Download files - this option downloads the files themselves onto your
hard drive. If this option is not selected, the program only saves a list
of the scanned hyperlinks to a special file.
Enter the proxy server properties (if you use a proxy server).
Then choose any other options you would like to use in downloading and
searching for hyperlinks.
Let's take a look at the various options.
Follow new links / URL
- to follow hyperlinks automatically. This option allows the program to
automatically extract other websites linked to the one you are scanning.
Copy subdirectory structure from website
- to copy the subdirectory structure of the website you wish to
download. If this option is selected, the program creates directories on
your hard drive that match the ones on the website you are downloading.
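As a rough illustration of what copying the subdirectory structure means,
the sketch below maps a URL to a local path. The function name
local_path_for, the "downloads" folder and the index.html fallback are
assumptions made for this example; the program's own naming rules may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = "downloads", keep_structure: bool = True) -> str:
    """Map a URL to a local file path (hypothetical helper, not the program's own logic)."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: a URL with no file name is saved as index.html.
    if not path or path.endswith("/"):
        path += "index.html"
    remote = PurePosixPath(path.lstrip("/"))
    if keep_structure:
        # Recreate the site's directories under <root>/<host>/.
        return str(PurePosixPath(root) / parts.netloc / remote)
    # Without the option, every file goes directly into the root directory.
    return str(PurePosixPath(root) / remote.name)


url = "http://www.internet-soft.com/docs/help/options.htm"
print(local_path_for(url))
print(local_path_for(url, keep_structure=False))
```

With the structure kept, the file lands in a docs/help directory under the
host name. Without it, only the file name is kept, so files with the same
name from different directories would collide.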
Extract local link
- to search for local hyperlinks. This option allows you to search for
local links on the website you are scanning, i.e. links that refer to
other documents on the same website.
Stay within initial domain list
A very convenient option that lets you extract (but not download)
hyperlinks to websites that are not included in the original list of
addresses. Here you decide whether you need to download other websites
referred to from the one you are downloading. If this option is selected,
you only download files from the domains you specified. If it is not
selected, the sites linked to the one you are scanning are also downloaded.
For example, select this option if you only need to download a list of
(URL) addresses and do not need to download other domains linked to the
original list of domains (e.g. internet-soft.com).
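The idea of staying within the initial domain list can be sketched as a
simple check on the host name of each link. This is only an illustration:
the function name split_by_domain is hypothetical, and the exact host
comparison used here is an assumption, not a description of how the
program matches domains.

```python
from urllib.parse import urlsplit


def split_by_domain(links, initial_domains):
    """Separate links into those inside the initial domain list and those outside it."""
    allowed = {domain.lower() for domain in initial_domains}
    inside, outside = [], []
    for link in links:
        host = (urlsplit(link).hostname or "").lower()
        # Assumption: a link belongs to the list only if its host matches exactly.
        (inside if host in allowed else outside).append(link)
    return inside, outside


links = [
    "http://internet-soft.com/index.htm",
    "http://internet-soft.com/extractor.htm",
    "http://www.offline-browser.com/",
]
to_download, recorded_only = split_by_domain(links, ["internet-soft.com"])
print(to_download)    # links that would be downloaded
print(recorded_only)  # links that would only be written to the list
```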
Links level limit
- number of downloading levels. This option sets the number of hyperlink
steps the program follows from the starting site.
An example will help to illustrate this option. Let's assume there is
a hyperlink from one site to another. There is a link from the second site
to the third, etc.
As you can see, a number of hyperlinks must be followed to get from
one site to another. This option sets the maximum number of hyperlink
steps. Each step can lead to hyperlinks to a number of other websites.
So if you have selected only one level, you will only be able to copy the
websites (let's call them XI websites) to which there is a link on the
website you are downloading (scanning), and not the sites linked from
the XI websites.
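One way to picture the level limit is as a depth cap on a breadth-first
walk over the links. The sketch below uses a small made-up link graph
(LINKS) and a hypothetical function crawl; it does not download anything
and is not the program's actual algorithm.

```python
from collections import deque

# Made-up link graph: each site maps to the sites it links to.
LINKS = {
    "start": ["XI-a", "XI-b"],
    "XI-a": ["second-a"],
    "XI-b": ["second-b", "start"],
    "second-a": ["third-a"],
}


def crawl(start, level_limit, links=LINKS):
    """Return {site: level} for every site reachable within level_limit steps."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        if levels[site] == level_limit:
            continue  # sites at the limit are kept, but their links are not followed
        for target in links.get(site, []):
            if target not in levels:  # a site already seen is not scanned again
                levels[target] = levels[site] + 1
                queue.append(target)
    return levels


print(crawl("start", 1))  # the starting site and the XI sites only
print(crawl("start", 2))  # also the sites linked from the XI sites
```

With a limit of 1 only the XI sites are reached; raising the limit to 2
adds the sites they link to, and so on for each further level.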
The following chart shows how the links level limit works.
Number of connections
In this item you enter the number of simultaneous connections.
As a rule, 3 to 10 connections are used. The optimal number of connections
will depend on the number of lines you have and the connection speed of
Save results automatically
Saves your results automatically every N minutes. This option
sets how frequently your interim search results are saved.
Time out for one connection
This option sets the maximum amount of time, in seconds, allowed for
downloading each document (one connection). At the end of this time the
program starts downloading the next document.
Number of retries
The number of attempts made to download each document.
If the connection to your provider or to the website is broken off,
the program tries to download the same file again, up to the number of
attempts you specify.
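The time-out and the number of retries work together, which can be
sketched as a small retry loop. Everything named here
(download_with_retries, flaky_fetch, the default values) is hypothetical
and only stands in for a real download routine.

```python
def download_with_retries(fetch, url, retries=3, timeout=30):
    """Call fetch(url, timeout) up to `retries` times; return the result or None."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url, timeout)
        except OSError as error:  # time-outs and broken connections are OSError subclasses
            print(f"attempt {attempt} of {retries} failed: {error}")
    return None  # give up so the caller can move on to the next document


# Stand-in for a real download function: it fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url, timeout):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("connection timed out")
    return b"<html>ok</html>"


result = download_with_retries(flaky_fetch, "http://internet-soft.com/", retries=3, timeout=30)
print(result)
```

A higher number of retries helps on unreliable connections, but every
failed attempt can cost up to the full time-out, so large values of both
settings slow the whole job down.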
Swap URL count
The number of temporary addresses added to the list of tasks (tree
of downloadable addresses).
Does not visit twice already scanned site
This option prevents the program from scanning addresses that have
already been scanned.
Apply domainname.com = www.domainname.com
On some sites, hyperlinks to other sites omit the leading www, so the
same documents may be downloaded and saved twice in different
directories. This option is designed to deal with this anomaly.
If you select this option, INTERNET-SOFT.COM and WWW.INTERNET-SOFT.COM
will be treated as the same address. The www prefix is added to the
address automatically in this case.
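The effect of this option can be sketched as a normalization step applied
to each host name before addresses are compared. The helper name
canonical_host is hypothetical, and the rule shown (add www when it is
missing) is a minimal reading of the description above.

```python
from urllib.parse import urlsplit


def canonical_host(url: str) -> str:
    """Return the host in lower case, with a leading "www." added if it is missing."""
    host = (urlsplit(url).hostname or "").lower()
    if host and not host.startswith("www."):
        host = "www." + host
    return host


a = canonical_host("http://INTERNET-SOFT.COM/index.htm")
b = canonical_host("http://WWW.INTERNET-SOFT.COM/index.htm")
print(a, b, a == b)
```

Because both addresses reduce to the same host, a document reached through
either form would be stored once rather than in two directories.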
Expand the nodes parents to make the node visible
This convenience option affects the graphical tree of scanned websites.
When it is selected, the program expands the current branches of the site
being downloaded, so that the tree shows the locations where the download
is currently taking place.
Identify browser as
This option sets how the program identifies itself to the remote server
when a website is downloaded.
For example, when you download a page using Internet Explorer 5.0, the
remote server records the name of the browser in its log. The Extractor
program is recorded in the same way when it visits a website.
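In HTTP terms this identification is the User-Agent request header. The
sketch below only builds a request object with Python's standard library
and sends nothing over the network. The identification string is an
example of the Internet Explorer 5.0 style and is not taken from the
program's own list.

```python
from urllib.request import Request

# Example identification string (an assumption for illustration).
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"

request = Request("http://internet-soft.com/", headers={"User-Agent": USER_AGENT})

# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
```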
We would like to draw your attention to the following:
Since the World Wide Web contains a huge number of pages, downloading
links and websites may require considerable processing power and a large
amount of disk space on your computer. A few hours of work by the program
may take up many gigabytes on your hard disk.
File Type Filter: Limiting the types and sizes of files
You can use this option to specify the types of files you want to
download and to limit their size.
This is important, for example, when you only want to download text
documents without banners, pictures or archive files.
In this case, check the boxes beside html, htm, txt, shtml, etc.
You can also use these menu options to limit the size of the files to be
downloaded. If you have selected "Load all file sizes", files of all
sizes will be downloaded. Otherwise only files of the sizes you have
selected (specified in bytes) will be downloaded.
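A file type and size filter of this kind can be sketched as follows. The
function name passes_file_filter is hypothetical, and a single upper size
limit (max_bytes) is an assumption; the program may offer other kinds of
size limits.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

ALLOWED_EXTENSIONS = {"html", "htm", "txt", "shtml"}


def passes_file_filter(url, size_bytes, max_bytes=None, allowed=ALLOWED_EXTENSIONS):
    """Return True if the file type is allowed and the size is within the limit."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lstrip(".").lower()
    # Note: in this sketch a URL without a file extension is rejected.
    if suffix not in allowed:
        return False
    # max_bytes=None plays the role of "Load all file sizes".
    return max_bytes is None or size_bytes <= max_bytes


print(passes_file_filter("http://internet-soft.com/help/options.htm", 20000))
print(passes_file_filter("http://internet-soft.com/images/banner.gif", 20000))
print(passes_file_filter("http://internet-soft.com/help/options.htm", 20000, max_bytes=10000))
```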
URL / Domain Filter: Limitations by names of directories, domain
names and files.
You can limit downloads by entering certain words that must appear in
the domain. Let's say you're downloading files only from
www.offline-browser.com. You would only enter offline-browser as the
filter word.
The filter can be used separately:
to filter by a word in a domain name;
to filter by the domain extension;
to filter by a word in a directory name;
to filter by a word in the file name.
The filter can be used to include and exclude. If you have entered
words into the exclude filter, files whose URL contains any of these
words will not be downloaded. If you opt for the include filter, only
files whose names contain the words specified in the word filter will
be downloaded.
Domains: Limitations by domain type.
This option enables you to limit downloads by domain type and country.
To do this, click on the required domain type.
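The include and exclude word filters described above can be sketched as
simple substring checks on the URL. The function name passes_word_filter
is hypothetical, and treating the include list as "any word matches" and
ignoring letter case are assumptions of this example.

```python
def passes_word_filter(url, include=(), exclude=()):
    """Apply include and exclude word lists to a URL (case-insensitive substring match)."""
    text = url.lower()
    if any(word.lower() in text for word in exclude):
        return False  # any excluded word blocks the download
    if include:
        # Assumption: one matching include word is enough.
        return any(word.lower() in text for word in include)
    return True  # no include words: everything not excluded passes


print(passes_word_filter("http://www.offline-browser.com/index.htm", include=["offline-browser"]))
print(passes_word_filter("http://internet-soft.com/index.htm", include=["offline-browser"]))
print(passes_word_filter("http://www.offline-browser.com/banners/ad.htm", exclude=["banner"]))
```

Checking the exclude list first means that an excluded word always wins,
even when the URL also contains one of the include words.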
This is all you have to do for the main program settings.
When you exit the menu window, the data you have entered is saved as
the default and you can proceed to download websites.
Now we can start a project. The default properties you have entered
are applied automatically when you start a new project. These properties
can be altered and saved separately for each project.
The term "project" therefore refers to the full set of options that
define which site is to be downloaded and with which properties.