
The simplest definition of a heuristic is an optimistic guess of how much it will cost to get from a given node in a graph or tree to the goal state. A heuristic is admissible when this guess is always optimistic, meaning it is always less than or equal to the actual cost of traveling from the node to the goal. Heuristic values should also be non-negative: path costs are non-negative, so the true remaining cost is never below zero and a negative estimate is never more accurate than an estimate of zero.
The heuristic is used in the A* search evaluation function, f(n) = g(n) + h(n), where g(n) is the cost of the path found so far to node n and h(n) is the heuristic estimate of the cost from n to the goal. This value determines the expansion order (or search order) of nodes in the graph.
Admissible heuristics can be found by "loosening" (relaxing) the constraints of a problem in order to make easier generalizations about costs. They can also be precomputed as long as we know the set of all possible goal states. The most straightforward and geometric example of a heuristic computed by loosening the constraints of a problem is the straight-line distance between two cities on a map. From basic geometry it is known that the shortest distance between two points is a straight line, so the cost of traveling between the two points by road is at least the straight-line distance. Knowing more about the problem enables you to come up with a heuristic that more closely models the problem.
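The straight-line distance heuristic and the evaluation function f(n) = g(n) + h(n) can be sketched as follows. This is a minimal illustration, not a full search: the city names, the coordinates, and the function names `straight_line_distance` and `f_value` are made up for the example, and path costs are assumed to be measured in the same units as the coordinates.

```python
import math

# Hypothetical map: city name -> (x, y) position. Illustration only.
CITY_COORDS = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (6.0, 8.0),
}


def straight_line_distance(node, goal, coords=CITY_COORDS):
    """h(n): Euclidean distance from node to goal.

    No road between two cities can be shorter than the straight line,
    so this never overestimates a road distance.
    """
    (x1, y1), (x2, y2) = coords[node], coords[goal]
    return math.hypot(x2 - x1, y2 - y1)


def f_value(g_cost, node, goal):
    """A* evaluation function f(n) = g(n) + h(n)."""
    return g_cost + straight_line_distance(node, goal)


# Having driven 7 units to reach B, estimate the total cost of a route to C.
estimate = f_value(7.0, "B", "C")
```

A* would expand the frontier node with the smallest `f_value` first.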
For example, imagine that you live 100 miles away from work and you have a car that can travel 100 miles per hour. If path costs represent the travel time between two points, one obvious heuristic for the problem is (distance / max speed). This gives the starting node a heuristic value of 1 hour. However, we know from our experience driving that there are speed limits and other laws that must (or at least should) be taken into account. A better and still admissible heuristic uses this additional information to compute more accurate estimates of the actual cost of traveling from a given node to the goal. For example, if we know that the speed limit is 50 miles per hour for the first 50 miles, then the fastest we can possibly get to work is 1.5 hours: 1 hour to cover the first 50 miles at 50 miles per hour, plus 0.5 hours to cover the remaining 50 miles at 100 miles per hour.
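The arithmetic of the commute example can be written out as a small sketch. The function names and the representation of a route as a list of (length, speed limit) segments are assumptions made for this illustration; a heuristic for an intermediate node would be computed the same way over only the segments that remain.

```python
def naive_time_heuristic(distance_miles, max_speed_mph=100.0):
    """Relaxed problem: ignore speed limits and drive at top speed."""
    return distance_miles / max_speed_mph


def speed_limit_heuristic(segments, max_speed_mph=100.0):
    """Tighter relaxation: respect known speed limits on each segment.

    segments is a list of (length_miles, speed_limit_mph) pairs, where a
    limit of None means no known limit (only the car's top speed applies).
    """
    total_hours = 0.0
    for length_miles, limit_mph in segments:
        speed = max_speed_mph if limit_mph is None else min(limit_mph, max_speed_mph)
        total_hours += length_miles / speed
    return total_hours


# The commute from the text: 50 miles limited to 50 mph, then 50 unrestricted miles.
route = [(50.0, 50.0), (50.0, None)]
naive = naive_time_heuristic(100.0)       # 1.0 hour
tighter = speed_limit_heuristic(route)    # 1.5 hours
```

Both values are lower bounds on the real travel time, but the second is closer to it because it uses more of what is known about the problem.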
Below is a graph with an admissible heuristic; the h values are the heuristic values for each node. Note how each heuristic value is always less than or equal to the cost of the shortest path from that node to the goal node.
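The admissibility condition can also be checked mechanically on a small graph. The sketch below uses a made-up undirected graph (not the one in the figure) and hypothetical heuristic tables `H_GOOD` and `H_BAD`; it computes the true cost from every node to the goal with Dijkstra's algorithm and compares each h value against it.

```python
import heapq

# Hypothetical undirected graph: node -> {neighbor: edge cost}.
GRAPH = {
    "S": {"A": 1, "B": 4},
    "A": {"S": 1, "B": 2, "G": 6},
    "B": {"S": 4, "A": 2, "G": 2},
    "G": {"A": 6, "B": 2},
}


def true_costs_to_goal(graph, goal):
    """Dijkstra's algorithm started at the goal (valid because edges are undirected)."""
    dist = {goal: 0}
    frontier = [(0, goal)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist[node]:
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, cost in graph[node].items():
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(frontier, (new_d, neighbor))
    return dist


def is_admissible(h, graph, goal):
    """True if 0 <= h(n) <= true cost to the goal for every reachable node."""
    actual = true_costs_to_goal(graph, goal)
    return all(0 <= h[node] <= actual[node] for node in actual)


H_GOOD = {"S": 4, "A": 3, "B": 2, "G": 0}  # never overestimates
H_BAD = {"S": 4, "A": 5, "B": 2, "G": 0}   # overestimates at A (true cost is 4)
```

Here `is_admissible(H_GOOD, GRAPH, "G")` holds, while `H_BAD` fails because the shortest path from A to G goes through B and costs 4.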

Finding an admissible heuristic is very important for using A* search. The closer an admissible heuristic is to the true cost, the fewer nodes A* generally has to expand before finding an optimal path. Also, if a heuristic is inadmissible, A* is not guaranteed to return an optimal path. However, calculating a heuristic can be a bit of a double-edged sword: if the heuristic is too computationally expensive, the search may end up slower overall than it would be with an "inferior" but cheaper heuristic. For example, a perfect heuristic for a graph search algorithm would be to find the shortest path from every node to the goal node using breadth-first search (when all steps cost the same) or uniform-cost search (UCS), and use the actual cost as the heuristic. This would obviously be admissible, but we would end up doing much more work than necessary, since computing it already solves the problem that A* was supposed to solve.
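The effect of heuristic quality on the amount of search can be seen in a small experiment. The following is a minimal A* sketch, assuming a hypothetical graph and a hand-written admissible heuristic table `H_TABLE`; it counts node expansions so that a zero heuristic (which makes A* behave like UCS) can be compared with a more informed one.

```python
import heapq

# Hypothetical undirected graph: node -> {neighbor: edge cost}.
GRAPH = {
    "S": {"A": 2, "B": 1, "C": 1},
    "A": {"S": 2, "G": 2},
    "B": {"S": 1, "D": 1},
    "C": {"S": 1, "D": 2},
    "D": {"B": 1, "C": 2, "G": 5},
    "G": {"A": 2, "D": 5},
}

# Hypothetical admissible estimates of the cost to reach G.
H_TABLE = {"S": 4, "A": 2, "B": 4, "C": 4, "D": 4, "G": 0}


def a_star(graph, start, goal, h):
    """Return (cost of the path found, number of nodes expanded)."""
    frontier = [(h(start), 0, start)]  # entries are (f, g, node)
    best_g = {start: 0}
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry; a cheaper path to this node was found
        expanded += 1
        if node == goal:
            return g, expanded
        for neighbor, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h(neighbor), new_g, neighbor))
    return None, expanded


zero_cost, zero_expanded = a_star(GRAPH, "S", "G", lambda n: 0)
info_cost, info_expanded = a_star(GRAPH, "S", "G", lambda n: H_TABLE[n])
```

On this graph both runs find the optimal cost of 4, but the zero heuristic expands 6 nodes while the informed heuristic expands only 3. The counts depend on the graph and on how ties are broken, so they should be read as an illustration rather than a general ratio.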
When constructing a useful heuristic, we can often use the property of dominance to combine two heuristics. One admissible heuristic dominates another if it returns a value at least as large for every node, which means it is never further from the true cost. Suppose we have a heuristic A which is very good (and admissible) at approximating long distances, but which always returns zero whenever we are closer than 10 units away. And suppose we have a heuristic B that is spot on when the goal is less than 10 units away, but for anything further away it glitches out and returns zero. Taking the max of these two heuristics is also admissible, because neither value exceeds the true cost and so the larger of the two cannot exceed it either. The result C = max(A, B) dominates both A and B and often produces good results. In that sense C is often greater than the sum of its parts. Also, the sum of its parts is not guaranteed to be admissible (lame joke).
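The combination of A and B can be sketched as a toy model. To keep the example small, each "node" is represented by its true remaining distance to the goal, which a real heuristic would not know; the 0.9 factor used for heuristic A is an arbitrary assumption standing in for "very good at long distances".

```python
def heuristic_a(true_dist):
    """Good at long range, but returns zero when closer than 10 units."""
    return 0.9 * true_dist if true_dist >= 10 else 0.0


def heuristic_b(true_dist):
    """Exact when closer than 10 units, but returns zero beyond that."""
    return float(true_dist) if true_dist < 10 else 0.0


def heuristic_c(true_dist):
    """C = max(A, B): admissible because neither A nor B overestimates."""
    return max(heuristic_a(true_dist), heuristic_b(true_dist))


def summed(h1, h2, true_dist):
    """The sum of two admissible heuristics, which may overestimate."""
    return h1(true_dist) + h2(true_dist)


# Check C against the true distance over a range of toy nodes.
c_is_admissible = all(heuristic_c(d) <= d for d in range(0, 50))

# Adding two admissible estimates can exceed the true cost:
# at distance 5, B + B gives 10, which is more than 5.
sum_overestimates = summed(heuristic_b, heuristic_b, 5) > 5
```

C is informative at every distance, whereas A and B each go blind over part of the range.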