For our NNTP client example, we are going to try to be more adventurous. It will be similar to the FTP client example in that we are going to download the latest of something—this time it will be the latest article available in the Python language newsgroup, comp.lang.python.
Once we have it, we will display (up to) the first 20 meaningful lines of the article. By that, we mean lines of real data, not quoted text (which begins with “>” or “|”) or quoted-text introductions like “In article <. . .>, soAndSo@some.domain wrote:”.
Finally, we are going to handle blank lines intelligently. We will display one blank line when we see one in the article, but if there is more than one consecutive blank line, we only show the first blank line of the set. Only lines with real data are counted toward the “first 20 lines,” so it is possible to display a maximum of 39 lines of output: 20 real lines of data interleaved with 19 blank ones.
If no errors occur when we run our script, we may see something like this:
$ getLatestNNTP.py
*** Connected to host "your.nntp.server"
*** Found newsgroup "comp.lang.python"
*** Found last article (#471526):
From: "Gerard Flanagan" <grflanagan@...>
Subject: Re: Generate a sequence of random numbers that sum up to 1?
Date: Sat Apr 22 10:48:20 CEST 2006
*** First (<= 20) meaningful lines:
def partition(N=5):
    vals = sorted( random.random() for _ in range(2*N) )
    vals = [0] + vals + [1]
    for j in range(2*N+1):
        yield vals[j:j+2]
deltas = [ x[1]-x[0] for x in partition() ]
print deltas
print sum(deltas)
[0.10271966686994982, 0.13826576491042208, 0.064146913555132801, 0.11906452454467387, 0.10501198456091299, 0.011732423830768779, 0.11785369256442912, 0.065927165520102249, 0.098351305878176198, 0.077786747076205365, 0.099139810689226726]
1.0
$
ptg 17.3 Network News 761
Example 17.2 NNTP Download Example (getFirstNNTP.py)
This downloads and displays the first “meaningful” (up to 20) lines of the most recently available article in comp.lang.python, the Python newsgroup.
1   #!/usr/bin/env python
2
3   import nntplib
4   import socket
5
6   HOST = 'your.nntp.server'
7   GRNM = 'comp.lang.python'
8   USER = 'wesley'
9   PASS = "you'llNeverGuess"
10
11  def main():
12
13      try:
14          n = nntplib.NNTP(HOST)
15          #, user=USER, password=PASS)
16      except socket.gaierror, e:
17          print 'ERROR: cannot reach host "%s"' % HOST
18          print ' ("%s")' % eval(str(e))[1]
19          return
20      except nntplib.NNTPPermanentError, e:
21          print 'ERROR: access denied on "%s"' % HOST
22          print ' ("%s")' % str(e)
23          return
24      print '*** Connected to host "%s"' % HOST
25
26      try:
27          rsp, ct, fst, lst, grp = n.group(GRNM)
28      except nntplib.NNTPTemporaryError, e:
29          print 'ERROR: cannot load group "%s"' % GRNM
30          print ' ("%s")' % str(e)
31          print ' Server may require authentication'
32          print ' Uncomment/edit login line above'
33          n.quit()
34          return
35      except nntplib.NNTPTemporaryError, e:  # never reached: same class as line 28
36          print 'ERROR: group "%s" unavailable' % GRNM
37          print ' ("%s")' % str(e)
38          n.quit()
39          return
40      print '*** Found newsgroup "%s"' % GRNM
41
42      rng = '%s-%s' % (lst, lst)
43      rsp, frm = n.xhdr('from', rng)
44      rsp, sub = n.xhdr('subject', rng)
45      rsp, dat = n.xhdr('date', rng)
46      print '''*** Found last article (#%s):
47
(continued)
ptg 762 Chapter 17 Internet Client Programming
This output is generated from the original newsgroup posting, which looks like this:
From: "Gerard Flanagan" <grflanagan@...>
Subject: Re: Generate a sequence of random numbers that sum up to 1?
Date: Sat Apr 22 10:48:20 CEST 2006
Groups: comp.lang.python
Gerard Flanagan wrote:
> Anthony Liu wrote:
> > I am at my wit's end.
Example 17.2 NNTP Download Example (getFirstNNTP.py) (continued)
48  From: %s
49  Subject: %s
50  Date: %s
51  '''% (lst, frm[0][1], sub[0][1], dat[0][1])
52
53      rsp, anum, mid, data = n.body(lst)
54      displayFirst20(data)
55      n.quit()
56
57  def displayFirst20(data):
58      print '*** First (<= 20) meaningful lines:\n'
59      count = 0
60      lines = (line.rstrip() for line in data)
61      lastBlank = True
62      for line in lines:
63          if line:
64              lower = line.lower()
65              if (lower.startswith('>') and not \
66                  lower.startswith('>>>')) or \
67                  lower.startswith('|') or \
68                  lower.startswith('in article') or \
69                  lower.endswith('writes:') or \
70                  lower.endswith('wrote:'):
71                      continue
72          if not lastBlank or (lastBlank and line):
73              print ' %s' % line
74              if line:
75                  count += 1
76                  lastBlank = False
77              else:
78                  lastBlank = True
79          if count == 20:
80              break
81
82  if __name__ == '__main__':
83      main()
> > I want to generate a certain number of random numbers.
> > This is easy, I can repeatedly do uniform(0, 1) for
> > example.
> > But, I want the random numbers just generated sum up
> > to 1 .
> > I am not sure how to do this. Any idea? Thanks.
> ---
> import random
> def partition(start=0,stop=1,eps=5):
>     d = stop - start
>     vals = [ start + d * random.random() for _ in range(2*eps) ]
>     vals = [start] + vals + [stop]
>     vals.sort()
>     return vals
> P = partition()
> intervals = [ P[i:i+2] for i in range(len(P)-1) ]
> deltas = [ x[1] - x[0] for x in intervals ]
> print deltas
> print sum(deltas)
> ---
def partition(N=5):
    vals = sorted( random.random() for _ in range(2*N) )
    vals = [0] + vals + [1]
    for j in range(2*N+1):
        yield vals[j:j+2]
deltas = [ x[1]-x[0] for x in partition() ]
print deltas
print sum(deltas)
[0.10271966686994982, 0.13826576491042208, 0.064146913555132801, 0.11906452454467387, 0.10501198456091299, 0.011732423830768779, 0.11785369256442912, 0.065927165520102249, 0.098351305878176198, 0.077786747076205365, 0.099139810689226726]
1.0
Of course, the output will always be different since articles are always being posted. No two executions will result in the same output unless your news server has not been updated with another article since you last ran the script.
Line-by-Line Explanation
Lines 1–9
This application starts with a few import statements and some constants, much like the FTP client example.
Lines 11–40
In the first section, we attempt to connect to the NNTP server and bail if the attempt fails (lines 13–24). Line 15 is commented out deliberately in case your server requires authentication (with login and password). If it does, uncomment this line and merge it into line 14. This is followed by trying to load the specific newsgroup. Again, the script quits if that newsgroup does not exist, is not archived by this server, or if authentication is required (lines 26–40).
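The connect-and-bail pattern can be tried without a news server. What follows is a minimal sketch in Python 3 syntax, not the listing's own code: connect_or_report() and fake_connect() are hypothetical names, and the connection step is passed in as a callable so that a failure can be simulated. It also reads the error message from the exception's args tuple, one alternative to calling eval() on the exception's string form as line 18 does.

```python
import socket

def connect_or_report(connect, host):
    """Call connect(host); return its result, or None after reporting a failure."""
    try:
        return connect(host)
    except socket.gaierror as e:
        # A gaierror built from (errno, message) keeps both values in e.args.
        print('ERROR: cannot reach host "%s"' % host)
        print('    ("%s")' % e.args[1])
        return None

def fake_connect(host):
    # Hypothetical stand-in for nntplib.NNTP(host): name resolution always fails.
    raise socket.gaierror(-2, 'Name or service not known')

result = connect_or_report(fake_connect, 'your.nntp.server')
```

Passing the connection step in as a callable is only a convenience for this sketch; the script itself calls nntplib.NNTP() directly.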
Lines 42–55
In the next part we get some headers to display (lines 42–51). The ones that have the most meaning are the author, subject, and date. This data is retrieved and displayed to the user. Each call to the xhdr() method requires us to give the range of articles to extract the headers from. We are only interested in a single message, so the range is “X-X” where X is the last message number.
xhdr() returns a 2-tuple consisting of a server response (rsp) and a list of the headers in the range we specify. Since we are only requesting this information for one message (the last one), we just take the first element of the list (hdr[0]). That data item is a 2-tuple consisting of the article number and the data string. Since we already know the article number (because we give it in our range request), we are only interested in the second item, the data string (hdr[0][1]).
The last part is to download the body of the article itself (lines 53–55). It consists of a call to the body() method, a call to displayFirst20() to show the first 20 or fewer meaningful lines (as defined at the beginning of this section), and a logout from the server, after which execution is complete.
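To make the shape of that return value concrete, here is a small sketch that builds the “X-X” range string and then unpacks a reply of the form described above. The sample tuple is made up for illustration (in particular, the response text is an assumption, not captured server output), and last_article_range() and first_header_value() are hypothetical helper names.

```python
def last_article_range(last):
    """Build the 'X-X' range string that selects only article number `last`."""
    return '%s-%s' % (last, last)

def first_header_value(xhdr_result):
    """Pull the data string out of a (response, [(artnum, text), ...]) pair."""
    rsp, hdr = xhdr_result
    return hdr[0][1]

# Hypothetical reply shaped like the 2-tuple described in the text.
sample = ('221 Subject fields follow',
          [('471526', 'Re: Generate a sequence of random numbers that sum up to 1?')])

rng = last_article_range('471526')
subject = first_header_value(sample)
print(rng)
print(subject)
```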
Lines 57–80
The core piece of processing is done by the displayFirst20() function (lines 57–80). It takes the set of lines making up the article body and does some setup: it sets our counter to 0, creates a generator expression that lazily iterates through our (possibly large) set of lines making up the body, and “pretends” that we have just seen and displayed a blank line (more on this later; lines 59–61). When we strip the line of data, we only remove the trailing whitespace (rstrip()) because leading spaces may be intentional, as in indented lines of Python code.
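The difference between rstrip() and strip() is easy to see on a few made-up lines. This sketch uses an invented three-line body; the variable names are illustrative only.

```python
body = ['def f():   \n', '    return 1\t\n', '   \n']

# Generator expression: no line is stripped until something iterates over it.
lines = (line.rstrip() for line in body)
stripped = list(lines)

# strip() would also remove the leading indentation we want to keep.
fully_stripped = [line.strip() for line in body]

print(stripped)
print(fully_stripped)
```

With rstrip(), the indented return line keeps its four leading spaces, and a line holding only whitespace becomes an empty string, which is what the blank-line test on line 63 relies on.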
One criterion we have is that we should not show any quoted text or quoted text introductions. That is what the big if statement on lines 65–71 is for (together with line 64). We perform this check only if the line is not blank (line 63).
We lowercase the line so that our comparisons are case-insensitive (line 64).
If a line begins with “>” or “|,” it is usually a quote. We make an exception for lines that start with “>>>” since such a line may come from the interactive interpreter, although this does introduce a flaw: a triply-old message (one quoted three times for the fourth responder) is displayed. (One of the exercises at the end of the chapter is to remove this flaw.) Lines that begin with “in article . . .”, and/or end with “writes:” or “wrote:”, both with trailing colons ( : ), are also quoted text introductions. We skip all these with the continue statement.
Now to address the blank lines. We want our application to show blank lines as they appear in the article, but with some care: if there is more than one consecutive blank line, only the first is shown, so that runs of empty lines do not scroll useful information off the screen. Blank lines also should not count toward our set of 20 meaningful lines. All of these requirements are taken care of in lines 72–78.
The if statement on line 72 says to display the line only if the last line was not blank, or if the last line was blank but now we have a non-blank line. In other words, if we fall through and print the current line, it is either a line with data or a blank line whose predecessor was not blank. Now the other tricky part: if we have a non-blank line, we count it and set the lastBlank flag to False since this line was not empty (lines 74–76).
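The test on lines 64–70 can be pulled out into a function of its own for experimentation. The following is a sketch under that assumption; is_quoted() is a hypothetical name that does not appear in the listing, and the sample lines are adapted from the posting shown earlier.

```python
def is_quoted(line):
    """Return True for quoted text or a quoted-text introduction."""
    lower = line.lower()
    # A leading '>' marks a quote, unless it looks like an interpreter prompt.
    if lower.startswith('>') and not lower.startswith('>>>'):
        return True
    return (lower.startswith('|') or
            lower.startswith('in article') or
            lower.endswith('writes:') or
            lower.endswith('wrote:'))

samples = [
    '> Anthony Liu wrote:',
    '>>> print(1)',
    'Gerard Flanagan wrote:',
    '| old text',
    'In article <...>, someone writes:',
    'print sum(deltas)',
]
flags = [is_quoted(s) for s in samples]
print(flags)
```

A line quoted three levels deep with no spaces between the markers, such as '>>> I am at my wit's end.', is reported as not quoted; this is the flaw mentioned above.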
Otherwise, we have just seen a blank line so set the flag to True.
Now back to the business on line 61 . . . we set the lastBlank flag to True because if the first real (non-introductory or quoted) line of the body is a blank, we do not want to display it . . . we want to show the first real data line!
Finally, if we have seen 20 non-blank lines, then we quit and discard the remaining lines (lines 79–80). Otherwise we would have exhausted all the lines and the for loop terminates normally.
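The blank-line handling and the 20-line cutoff can also be written as a generator, which makes the rules easy to check on small inputs. This is a sketch of the same logic rather than the listing's code: first_meaningful() is a hypothetical name, it yields lines instead of printing them, and it leaves out the quoted-text filtering.

```python
def first_meaningful(lines, limit=20):
    """Yield up to `limit` non-blank lines, collapsing runs of blank lines."""
    count = 0
    last_blank = True   # pretend a blank was just shown, so leading blanks are dropped
    for line in lines:
        if line:
            yield line
            count += 1
            last_blank = False
            if count == limit:
                return
        elif not last_blank:
            # First blank after a line of data: show it once.
            yield line
            last_blank = True

sample = ['', '', 'a', '', '', 'b', 'c', '']
shown = list(first_meaningful(sample))

# Thirty data lines, each followed by a blank: output stops after the 20th.
alternating = ['x', ''] * 30
capped = list(first_meaningful(alternating))
print(shown)
print(len(capped))
```

The second input illustrates the upper bound mentioned at the start of this section: 20 lines of data interleaved with 19 blank ones.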