
Re: [ic] RobotUA


Grant wrote:


Thanks a lot for the info Philip.  I'd like to clarify a couple things
though...



Usually with a 301 it takes a couple of runs from most spiders before
they decide to go any deeper into the site.


What is the correct way to forward from your domain to your site's index
page so the spiders don't get confused?


Icdevgroup uses a 302 and, as you may have noticed, does not get indexed
by Google beyond the first page, while most other search engines will
tromp all over the site and produce gory listings.
My opinion, and that of many others I have spoken with at
http://www.webmastersworld.com (where GoogleGuy hangs out), is that a 301
is the only way to go if you are going to redirect and expect rankings.
You could use a doorway page that uses JavaScript and just place links
into the site with some keywords for the search engines, but I feel that
is very unprofessional and spammy.
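For the original question of forwarding the bare domain to the store's index page, here is a minimal sketch of what a permanent (301) redirect looks like, using Python's standard `http.server`. It only illustrates the response a spider receives; on a real site this would normally be configured in Apache rather than served by a script. The target URL is the one mentioned later in this thread (www.mystore.com is a placeholder domain), and the handler and function names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Target taken from this thread; www.mystore.com is a placeholder domain.
INDEX_URL = "http://www.mystore.com/cgi-bin/shop/index.html"


class RootRedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for "/" with a 301 (permanent) redirect."""

    def do_GET(self):
        if self.path == "/":
            # 301 tells spiders the move is permanent, so the new URL
            # is the one they should record.
            self.send_response(301)
            self.send_header("Location", INDEX_URL)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Keep the demo quiet; a real server would log every request.
        pass


def make_server(port=8000):
    """Build a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), RootRedirectHandler)
```

To try it, call `make_server().serve_forever()` and request `/` with any client that does not follow redirects; the response line should read `301 Moved Permanently` with the `Location` header pointing at the index page.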



So pretty much everyone uses a 301 or 302 to get to the index page of their
site, and therefore has to deal with this issue?

Now, depending on how long your system has been running with a 301,
moving now will cause you more problems. Think of a 301 as telling the
mailman you have a new address and then sending a change-of-address
notice to all of your magazine companies. How long does it take them to
get around to sending the magazines to your new address?
Then suddenly you decide to send them and your mailman another change of
address, even before they have acted on the first one. You will go at
least 2 months before you get any magazines, or a good part of your mail
will end up in different places.
That is how a 301 differs from a 302: a 302 says "temporary move, don't
keep a record of it." Either way, it is a very bad thing for spiders if
you keep bouncing around.
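The change-of-address analogy can be made concrete with a toy model. The sketch below is hypothetical and is not how any particular search engine works: it follows redirects through a made-up table of server responses, assumes a spider moves its recorded URL forward only on permanent (301) hops and keeps the old URL once it meets a temporary (302) hop, and records every hop so that repeated moves stand out.

```python
from typing import Dict, List, Optional, Tuple

# Toy table of server responses: url -> (HTTP status, Location header or None).
Responses = Dict[str, Tuple[int, Optional[str]]]


def follow_redirects(start: str, responses: Responses,
                     max_hops: int = 10) -> Tuple[str, List[Tuple[str, int]]]:
    """Follow redirects from `start`; return (recorded_url, chain).

    Assumed toy rule: each 301 moves the recorded URL forward; after the
    first 302 the spider keeps the URL it had before the temporary hop.
    """
    recorded = start
    permanent_so_far = True
    chain: List[Tuple[str, int]] = []
    url = start
    for _ in range(max_hops):
        status, location = responses[url]
        chain.append((url, status))
        if status not in (301, 302) or location is None:
            return recorded, chain  # final page reached
        if status == 302:
            permanent_so_far = False
        elif permanent_so_far:
            recorded = location
        url = location
    raise RuntimeError("too many redirects from " + start)
```

Every extra entry in `chain` is one more change of address the spider has to catch up with, which is the argument for not moving again before the first move has been picked up.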


Are you saying a 301 or a 302 is better for spiders?


Here is a page from Google that describes what they feel you should do:

http://www.google.com/remove.html

And the snippet that talks about 301

*Change the URL of your website*

Since Google's crawler associates the content of a page with its URL,
there is no way to manually change the URL that is displayed for your
website. The URL will be updated the next time we crawl your site. The
crawler revisits each site according to an automatic schedule, and we
cannot manually accelerate the date on which your site will be recrawled.

If the URL of your website has changed since we last crawled it, you may
use the URL submission form <http://www.google.com/addurl.html> and the
URL removal methods described below. However, the URL submission form
does not take effect immediately, so using the URL removal feature may
leave your website inaccessible from Google until we crawl your site again.

Instead of requesting a change from Google, we recommend that you ask
the sites currently linked to your old site to update their links (to
point to your new site). Also, don't forget to change any entries you
may have in the Yahoo! directory and the Open Directory. Finally, if
your old URLs redirect to your new site using HTTP 301 (permanent)
redirects <http://www.ietf.org/rfc/rfc2616.txt>, our crawler will know
to use the new URL. Changes made in this way will take 6-8 weeks to be
reflected in Google.

I feel a 301 is better. Also pay close attention to the time Google
says it will take for the crawler to pick up the new address (6-8 weeks).

This comes completely from experience, since I did this myself and
have seen its effects.

Also, all of your DMOZ entries need to point to your redirected
location for you to get credit for it.



Where are these entries?


If your site has been submitted to DMOZ at http://www.dmoz.org, those
listings should always match your expected site location, since Google
and other search engines use them to support your rankings.


Ok, thank you.

The point is this: if you have just started this move, leave it alone.
It will take at least 2 months for Google and a few others to catch up.
If you have been running this way for a while, moving now could cost you
at least a month's worth of crawls until they get around to seeing the
new move.

This happened to me: I got impatient and moved things around again, and
lost a lot of traffic. After I talked to some people at webmasterworld,
they told me not to mess with it and to be patient; they will crawl your
site within one to two months. If your session IDs (sids) are not showing
in your URLs, they will jump on it soon.

--
Philip S. Hempel
debian/rules


It seems like there must be a better way to go about all this that doesn't
use 301s at all so the spiders will head straight inside.  What would that
be?

- Grant



Most SEs do not like 302s (temporary redirects), and almost all suggest
using a 301 (permanent redirect). A 302 does not push ranking onto the
main page, and that is what you want.

Since I quit using 302 and switched to 301 (plus a few other things), I
went from page 5 in the rankings to positions 1 through 5 for over 15
keyword sets, and from 100 users to an average of over 800 users a day
(not search engines).

Google spiders over 200 pages on my site now, and as of today we average
over 10 sales a day (up from 1 every 3 weeks). (This is good for a
supposedly part-time business.)


Hope this helps; if you need more, ask.

(and please excuse typos, wrote this in a rush)
--
Philip S. Hempel
debian/rules

Here's what Google's doing on my site:

64.68.82.70 - - [26/Nov/2002:08:30:13 -0800] "GET /robots.txt HTTP/1.0" 200 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.70 - - [26/Nov/2002:08:30:15 -0800] "GET / HTTP/1.0" 301 330 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.5 - - [26/Nov/2002:08:39:22 -0800] "GET /cgi-bin/shop/ HTTP/1.0" 200 38303 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.7 - - [26/Nov/2002:08:49:59 -0800] "GET /cgi-bin/shop/policies.html HTTP/1.0" 200 35830 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.28 - - [26/Nov/2002:08:52:20 -0800] "GET /cgi-bin/shop/moreinfo.html HTTP/1.0" 200 39917 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
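Reading these entries by hand works for five requests; for a full day's log a small filter helps. A minimal sketch, assuming the server writes Apache's standard "combined" log format (which these lines appear to use); the regular expression and the function name are illustrative, not part of Apache or Interchange.

```python
import re
from typing import Iterable, List, Tuple

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, status) for each Googlebot request in the log."""
    hits = []
    for line in lines:
        m = COMBINED.match(line)
        if m and "Googlebot" in m.group("agent"):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits
```

Run over the five lines above, this would list `/robots.txt`, `/` (the 301), and the three `/cgi-bin/shop/` pages, which makes it easy to see over several days whether the spider is getting any deeper.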

That's it.  This shows that they are getting into the main site, past the
301.  They're just looking at a couple of pages though.  I've verified with
the Sam Spade browser that IC is sanitizing the URLs when the Google User
Agent is used.  Also, it's GETing "/cgi-bin/shop/", but I have NO links to
that particular path anywhere in the site.  The redirect redirects to
www.mystore.com/cgi-bin/shop/index.html.  How could it be hitting
"/cgi-bin/shop/"?
Try using http://validator.w3.org/ and a link checker; there may be a bad link somewhere. (I can't think of a checker
at the moment.)
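To track down where a request for "/cgi-bin/shop/" might come from, one rough check (a sketch, not a substitute for a real link checker) is to pull every href out of a saved page with Python's built-in `html.parser` and resolve each one against the page's own URL. The class and function names are hypothetical, and the page snippets in the usage test are made up.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to(html: str, path: str, base: str = "") -> List[str]:
    """Return the hrefs in `html` whose resolved path is exactly `path`.

    Relative hrefs are resolved against `base` (the page's own URL),
    so a link such as "./" on /cgi-bin/shop/index.html is caught too.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [href for href in collector.links
            if urlsplit(urljoin(base, href)).path == path]
```

Resolving relative links matters here: an innocent-looking `href="./"` on `/cgi-bin/shop/index.html` resolves to exactly `/cgi-bin/shop/`.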

Thanks a lot for all your help Philip.  Hopefully others
will benefit from this discussion too.  Any idea why the Googlebot wouldn't
be hitting up more pages?  There are a ton of links on that front index page
to all of my product categories.


A little history on how search engines work. A search engine will pick up your robots.txt if there is one.
Then it will pick up your index page.
Most search engines rotate on a month-to-month basis: what one sees on its first run is what it will use for that month. If most of your links are on the index page, it will run around for a while using just a few pages. Most notably, the pages with the highest PR (PageRank, with Google) will be hit throughout the month.
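Since robots.txt is the first thing a spider fetches, it is worth confirming that it does not accidentally block the store pages. Python's standard `urllib.robotparser` can answer that question; the robots.txt content below is a hypothetical example, not the file from the site in this thread, and the `allowed` helper is just a thin illustrative wrapper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute the contents of your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/shop/ord/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With this file, `allowed(ROBOTS_TXT, "Googlebot", "http://www.mystore.com/cgi-bin/shop/index.html")` is true, while anything under the disallowed prefix is refused.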

Most search engines pick up new data around once a month. On this monthly cycle the search engine does its deepest crawl, also adding the new changes to the index page into the database. This is when the freshest information is used and a new ranking is assigned to your pages. You will again notice new hits on the higher-ranking pages throughout the month.

Also, one rule to remember (for Google) is that it loves freshly edited pages. The more links and information Google sees added to a page, the more often that page may be picked up by the freshbot and used in its index. I try to add at least one new link every few days to my most-hit pages throughout the month, to increase the likelihood
that Google and others will pick them up as fresher pages and use them.

Pages on your site that are linked from other sites will also get hit more often, even if Google did not see them from your index page. Also remember: the deeper into the site a page is, the less likely Google is to hit it regularly.
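"Deeper into the site" can be measured as the minimum number of clicks from the index page, which is a breadth-first search over the site's link graph. A minimal sketch; the link graph in the usage test is hypothetical, and the claim that crawl frequency falls with depth is Philip's observation, not something this code measures.

```python
from collections import deque
from typing import Dict, List


def click_depth(links: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                # First time reached is the shortest path in BFS.
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth
```

Pages that come out with a large depth are candidates for an extra link from the index page or from a category page that already gets crawled often.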

One of the biggest factors in what a search engine does with a site is how long the site has been in its index. Since many search engines run only once a month, they may limit how many links they will crawl on a site during a run. If your site has only been in for a month, you may see only 4 to 10 links hit.
Then the next month it will pick up 4 to 10 more, and so on.
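To see why a new site fills in slowly under such a limit, here is a toy simulation of a crawler that fetches at most a fixed number of pages per run and picks up where it left off next time. The per-run budget mirrors Philip's "4 to 10 links" estimate; everything else (breadth-first order, the function name, the site in the usage test) is a hypothetical simplification, and real crawlers schedule pages very differently.

```python
from collections import deque
from typing import Dict, List


def crawl_runs(links: Dict[str, List[str]], start: str,
               budget: int) -> List[List[str]]:
    """Return the pages fetched in each run, at most `budget` per run."""
    seen = {start}
    frontier = deque([start])
    runs: List[List[str]] = []
    while frontier:
        fetched: List[str] = []
        while frontier and len(fetched) < budget:
            page = frontier.popleft()
            fetched.append(page)
            # Newly discovered links wait in the frontier for a later run.
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    frontier.append(target)
        runs.append(fetched)
    return runs
```

With a budget of 4 and an index page linking to six category pages, the first run covers the index plus three categories and the remaining three have to wait for the next run.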

In conclusion: (if anyone wants more please ask)
Patience and */perseverance/* <http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&newwindow=1&q=perseverance&spell=1> are the biggest factors when doing SEO. Never make changes to your site without checking how it validates as HTML. Just because it works in a web browser does not mean it works in
an SE. Most SEs are capable of HTML 3.2 plus some 4.0 and some CSS 1.0.

Many of these tools, such as spider simulators, are at http://www.searchengineworld.com/. Also try frequenting http://www.webmasterworld.com/; there is a plethora of information there. One thing IC does have is a way to do HTML checking on a page (there is a tag for it, but offhand I don't remember which).

This information is not really technical in its depth; I am just trying to give a light
rundown of how I know this works. Every day I hear many people asking the same questions.

One of our primary goals with IC is to sell products, and search engines are a very large
part of this. Always have patience; IC is the *GREATEST* product for both content management
and online sales. If you think of IC as a content management system instead of a web store, your goal becomes putting in
data that helps everyone who comes to your site, and Google is pushing webmasters to make this happen. If your site works well under lynx (even Google recommends using lynx),
you should find that most search engines will have no problem finding your information.


It is good that you look at your logs, since that is how you noticed Google picking up something unexpected.
I take a cursory look at my logs at least once a day; this helps me see what may be a problem.

BTW, one thing I do wish IC had is the ability to send its page-not-found errors to the Apache error log (by default). You may be surprised how many errors SEs will find
that most web browsers fix for you.
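In the meantime, the Apache access log already records the status code of every response, so a tally of 404s per user agent is one way to spot links that browsers forgive but spiders trip over. A minimal sketch, assuming the combined log format shown earlier in this thread; the function name is hypothetical, and if Interchange answers a missing page with a normal (non-404) response, that miss will not show up here, which is exactly the gap described above.

```python
import re
from collections import Counter
from typing import Iterable

# Tail of a "combined" log line: "request" status bytes "referer" "user-agent"
LINE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$')


def not_found_by_agent(lines: Iterable[str]) -> Counter:
    """Count 404 responses per user agent."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE.search(line.rstrip("\n"))
        if m and m.group("status") == "404":
            counts[m.group("agent")] += 1
    return counts
```

Sorting the result with `counts.most_common()` puts the agents hitting the most dead links at the top.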

Good luck

--

Philip S. Hempel


_______________________________________________
interchange-users mailing list
suppressed
http://www.icdevgroup.org/mailman/listinfo/interchange-users

