Jan 5, 2013


Network Troubleshooting Techniques:
 
The 3 Main Approaches to Troubleshooting


Cisco's CCNA certification requires you to know and understand 3 main methods of troubleshooting network issues between a client and a server: the Top to Bottom Approach, the Cut-Through Approach, and the Bottom to Top Approach. I will discuss all 3 of them in as much detail as I can.
For this you should have some knowledge of the OSI model, which has 7 layers. I remember them with this mnemonic: "Please Do Not Take Sales People Advice" - Physical, Data Link, Network, Transport, Session, Presentation, and Application (listed from the bottom up). Below I briefly list problems that can occur at each layer, from the top down. The 3 layers you will look at most often when troubleshooting are the Application, Network, and Physical layers. Problems in the other layers are less common.

Quick Review of the OSI Model and problems that can occur at each layer.
  7. Application - Applications themselves are often misconfigured. The most important settings to check in the application are the target IP address and port. Once you have confirmed that these are correct in the configuration, you can usually rule out this layer.
  6. Presentation - You will rarely look for problems here. This layer handles how data is represented for the application, such as encoding, compression, and encryption. You can skip it during initial troubleshooting.
  5. Session - This also does not need much attention. This layer opens, manages, and closes sessions between applications. I recommend skipping it as well during initial troubleshooting.
  4. Transport - This is where TCP and UDP operate, so this is where you confirm that the right application port is being used. A problem in this layer may mean there is an issue with the TCP/IP stack on the system itself, and you may need to reinstall it. I have only seen a problem here twice in the last 10 years. It is rare, so troubleshoot it after going through everything else.
  3. Network - This is where you configure your IP address, subnet mask, default gateway, and any static routes. Routes and IP configuration are a frequent source of problems, so inspect them closely on the system you are troubleshooting.
  2. Data Link - This is where the NIC's MAC address is used, and it is also where the ARP table (which maps IP addresses to MAC addresses) comes into play. A problem here is usually related to the MAC address the system has recorded for another node on the network, such as a stale ARP entry. Problems turn up here every so often, but not frequently.
  1. Physical - This is all about the cabling. Cables can be damaged so slightly that the fault is hard to spot, and you need to replace them entirely. Transceivers also fall into this layer, and they do malfunction on occasion.
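Because the mnemonic runs bottom-up while the review above runs top-down, it can help to pin the layer numbers down. Here is a small Python sketch that stores the layers by their standard OSI number (1 = Physical, 7 = Application) and checks that each mnemonic word lines up with its layer. The names OSI_LAYERS, bottom_up, and top_down are just my own labels for illustration.

```python
# OSI layers keyed by their standard layer number (1 = bottom, 7 = top).
OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}

# The mnemonic from this post, one word per layer, bottom to top.
MNEMONIC = "Please Do Not Take Sales People Advice".split()


def bottom_up():
    """Layer names from Physical (1) up to Application (7)."""
    return [OSI_LAYERS[n] for n in sorted(OSI_LAYERS)]


def top_down():
    """Layer names from Application (7) down to Physical (1)."""
    return list(reversed(bottom_up()))


# Sanity check: each mnemonic word starts with the same letter as its layer.
for word, layer in zip(MNEMONIC, bottom_up()):
    assert word[0] == layer[0], (word, layer)

print(top_down())
```

The Top to Bottom approach walks top_down(), the Bottom to Top approach walks bottom_up(), and the Cut-Through approach starts at layer 3.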

Top to Bottom Approach: The Application Layer

For this approach we start at the Application layer of the OSI model and troubleshoot the problem from there. Here are the basic steps:
  1. Look at your application and try to reproduce the problem.
  2. Check your application's configuration and make sure that the target IP and port are configured correctly.
  3. Check whether any proxies are configured and whether your application needs to use one.
  4. Perform a "telnet test" (instructions below) from the client to the server and note whether it succeeds. *This only works if you are troubleshooting a TCP application.
  5. If your tests are unsuccessful and the application configuration is correct, work your way down the OSI model to continue troubleshooting.
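For step 3, one quick way to see which proxies are set at the system level is Python's standard urllib.request.getproxies(), which reads the http_proxy, https_proxy, and no_proxy style environment variables (and falls back to the system proxy settings on Windows and macOS). This is only a sketch: an application can have its own proxy setting that ignores all of these, so still check the application's own configuration. The describe_proxies helper is hypothetical, just to make the output readable.

```python
import urllib.request


def describe_proxies(proxies):
    """Turn a scheme -> proxy URL mapping into readable lines."""
    if not proxies:
        return "No proxy configured; connections should go direct."
    lines = []
    for scheme, value in sorted(proxies.items()):
        # A "no" entry comes from the no_proxy variable: hosts that skip the proxy.
        label = "bypass" if scheme == "no" else scheme
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # getproxies() returns what Python detects for this environment,
    # e.g. {"http": "http://proxy.example:8080"}; an empty dict means none.
    print(describe_proxies(urllib.request.getproxies()))
```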

How to do a telnet test:
(1st) Make sure that telnet is installed on the machine you are running the test from. (On Windows Vista and later, the Telnet Client is an optional feature that has to be enabled first.)
(2nd) Open a command line on the client machine.
(3rd) Type "telnet <target server IP> <target application port>".
(*) A great way to practice is to do a telnet test to a website. Try telnetting to www.google.com on port 80 (the HTTP web service port). The command looks like: "telnet www.google.com 80". After you hit Enter, you should immediately see a blank screen, or on a UNIX system a message such as "Connected to www.google.com", and you will be able to type text. If the connection fails, telnet reports an error or times out instead.
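If telnet isn't available on the client, the same TCP test can be sketched in Python with the standard socket.create_connection, which only succeeds if the TCP handshake to the target port completes. The function name tcp_port_open and the 5 second default timeout are my own choices, not anything standard. Like telnet, this only tests TCP.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, like a telnet test."""
    try:
        # create_connection resolves the name, then tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, tcp_port_open("www.google.com", 80) mirrors the telnet test above. A False result means the same thing as a failed telnet test: something between the client and that server port (name resolution, routing, a firewall, or the service not listening) needs a closer look.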


Cut-Through Approach: The Network Layer

This method starts at the Network layer of the OSI model. Here are the basic steps:
  1. Try to ping the server's IP address.
    1. From a command line type: "ping <server IP>"
  2. If ping is unsuccessful, try a traceroute to the server. The key here is to see what the last hop along the path is.
    1. From a Windows command line type: "tracert <server IP>"
    2. From a Linux/UNIX command line type: "traceroute <server IP>"
  3. If you weren't able to ping the server and your traceroute shows the last hop as an IP address along the path, investigate what the routing table on that last hop looks like. It may be missing a route toward the server or be configured with the wrong next hop.
  (*) Keep in mind that a firewall along the path may block traceroute packets, so nothing after it shows up in the output. This can mislead you into thinking the firewall is the problem when it may only be blocking the traceroute itself.
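To make step 3 more concrete: a router picks the route with the longest prefix that contains the destination, and falls back to a default route (0.0.0.0/0) if it has one. If no route matches, the packet is dropped, which is the kind of dead end a traceroute can reveal. Here is a small sketch of that lookup using Python's standard ipaddress module. The routing table entries and next-hop addresses are made up for illustration, and next_hop is a hypothetical helper, not how any particular router is implemented.

```python
import ipaddress

# A hypothetical routing table on the last hop: (destination prefix, next hop).
ROUTES = [
    ("10.0.0.0/8", "192.168.1.1"),
    ("10.20.0.0/16", "192.168.1.2"),
    ("0.0.0.0/0", "203.0.113.1"),  # default route
]


def next_hop(routes, destination):
    """Pick the next hop with the longest matching prefix, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        # Keep this route if it contains the destination and is more specific.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

With this table, 10.20.5.5 goes to 192.168.1.2 because the /16 route is more specific than the /8. Remove the default route and a destination such as 8.8.8.8 has no match at all, which is what a missing route on the last hop looks like.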


Bottom to Top Approach: The Physical Layer

This method starts at the Physical layer and works its way up. There is very little involved in starting here, and it is probably one of the easier ways to troubleshoot, at least initially.
  1. Check the cable at the client.
    1. Make sure the cable is connected properly or even reseat the cable if necessary.
    2. You may need to swap out the cable entirely. Cable faults can be hard to spot, because the damage may be so subtle that you don't notice it.
  2. Check the cable connection at the switch.
    1. It's always good to check the cable's connection at the switch to rule out a problem at that end. You may want to reseat the cable here as well.
    2. If your switch uses a transceiver to terminate the cable, you may want to swap out the transceiver.
    3. You can also move the client's connection to another switch port if you suspect a problem with the port itself. Bad ports are not completely uncommon.
  3. Check the cabling at your patch panel.
    1. Connections often pass through a patch panel, so make sure those connections are seated properly.
    2. Similar to moving a connection on the switch you may want to move a connection across patch panel ports as part of troubleshooting.
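Most of these checks are hands-on, but on a Linux client you can also confirm from software whether the NIC sees a link before and after you reseat or swap a cable. The kernel exposes this under /sys/class/net/<interface>/ in the operstate and carrier files. This is a minimal sketch assuming a Linux client; the link_state helper is hypothetical, and the interface name (eth0, enp3s0, and so on) depends on your system.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier_detected) for a Linux network interface."""
    base = Path(sysfs_root) / iface
    # operstate holds values such as "up", "down", or "unknown".
    operstate = (base / "operstate").read_text().strip()
    try:
        # carrier is "1" when the NIC detects a link on the cable, "0" when it doesn't.
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading carrier fails while the interface is administratively down.
        carrier = False
    return operstate, carrier
```

If an interface that should be connected reports no carrier, the problem is most likely in the cable, transceiver, patch panel, or switch port rather than in any higher layer.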


*What Experience Teaches:
  1. Find out whether only one person is having the problem being reported, or whether others are experiencing it too. That quickly tells you whether to troubleshoot from the client or the server.
  2. Let users try the most basic troubleshooting steps before you get involved (i.e., have them ping the server, reseat the cable to their PC, and reboot their PC).

Questions and comments are always appreciated.  I really hope this helps.  Thanks for reading.

- David Pagan
