tcp-connect and core dump in MrEd.



Hi,

  I have the following piece of working[1] code:

(define (grab-html b e)
  (let*-values
   (((url) (send urlbox get-value))
    ((sin sout) (tcp-connect url 80))
    ((html-start) (regexp "Content-Type:.*"))
    ((html-src) (string)))
   (fprintf sout "GET http://www.~a/ HTTP/1.0~n~n" url)

   ;; read until "Content-Type..." is met (or the connection closes)
   (do ((line (read-line sin 'any) (read-line sin 'any)))
       ((or (eof-object? line) (regexp-match html-start line))))

   ;; collect the remaining lines into one string and show it in `src'
   (send src set-value
         (let loop ((line (read-line sin 'any))
                    (str ""))
           (if (eof-object? line)
               str
               (loop (read-line sin 'any)
                     (string-append str line)))))))

  It is part of a simple MrEd program that grabs the HTML source of a
  given URL.  It is supposed to be the rough equivalent of this Python code:

def get_src(url):
  try:
    url_in = urllib.urlopen(url)
  except:
    print 'Could not get source of %s' % (url)
    return
  src = url_in.readlines()
  for line in src:
    print line,

  There are still a few differences between the Scheme and Python
  code that I'm trying to resolve.  If I pass the URL
  http://www.slashdot.org to the Python code, for example, I get the
  correct HTML source of slashdot.org, but if I pass it to the Scheme
  code the following error is returned: "403 Forbidden: You don't have
  permission to access http://www.slashdot.org/ on this server.
  Apache/1.3.12 Server at slashdot.org Port 80."  This problem is
  obviously rooted in the call to tcp-connect, but since that is the
  only function I can find in the MzScheme manual to connect to hosts
  I cannot figure out a solution--how can this be fixed?
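
  A minimal sketch, assuming the 403 might come from the request line
  rather than from tcp-connect itself (this has not been confirmed).
  It only builds strings and opens no connection.  The first function
  reproduces what the fprintf call above writes; the second shows the
  path-only request line with a Host header that a non-proxy client
  would normally send.  Both function names are made up for
  illustration.

```python
def absolute_form_request(host):
    """Request as the Scheme fprintf call writes it: full URL in the
    request line, bare newlines as line endings."""
    return "GET http://www.%s/ HTTP/1.0\n\n" % host


def origin_form_request(host, path="/"):
    """Path-only request line plus a Host header, with CRLF line
    endings and a blank line to end the headers."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "",  # these two empty strings produce the
        "",  # terminating CRLF CRLF when joined
    ]
    return "\r\n".join(lines)
```

  In the Scheme code, the second form would mean changing the format
  string passed to fprintf, not the call to tcp-connect.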

  Another issue: if I pass the URL http://www.microsoft.com to both
  scripts, they both return the correct source, but the Scheme program
  crashes and dumps core right after yielding the source.  In my mind
  the Scheme program's behavior is the correct one, but I have a
  feeling it's because my code does not deal correctly with large
  amounts of HTML rather than a tribute to Microsoft's products.
  It may be relevant to add that `src' is a text-field% box in MrEd
  into which the HTML source is inserted.
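
  The Scheme loop above stops skipping at the Content-Type line, so any
  headers after it end up in the text, and read-line drops the line
  endings before the lines are appended.  A minimal Python sketch of a
  different split, assuming the headers end at the first blank line and
  that the whole response is already in one string (split_response is a
  hypothetical name).  Whether any of this is related to the crash is
  not established.

```python
def split_response(raw):
    """Split a raw HTTP response into (header_lines, body) at the
    first blank line."""
    lines = raw.splitlines(True)  # keep line endings so the body is unchanged
    headers = []
    for i, line in enumerate(lines):
        if line.strip() == "":
            # Blank line: everything after it is the body.
            return headers, "".join(lines[i + 1:])
        headers.append(line.rstrip("\r\n"))
    # No blank line found: treat the whole input as headers.
    return headers, ""
```

  The body is joined once from a list of lines instead of being
  appended to one line at a time, and its newlines are preserved.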

Thanks a lot,
-- 
Jordan Katz <katz@underlevel.net>  |  Mind the gap

[1] A highly subjective term in programming; there are still some
issues :)