Updated Twitter posting and an apology

Let’s start with the apology. Earlier this week, RSS subscribers got repeated notices of the daily roundup of my Twitter updates. These posts are generated automatically overnight; because of a screwup on my part, two posts were being made each day and the feeds followed suit. I’m sorry about that. It took me a couple of days to figure out what was wrong, but it seems to be fixed now.

The screwup came as I was trying to make the twitterpost script more robust. When it’s working, twitterpost gathers the previous day’s tweets, marks them up in Markdown, and posts them to the blog through the WordPress XML-RPC interface.

I have a launchd process set up to run twitterpost in the middle of the night to automatically generate a blog post of my deathless thoughts. But it had stopped working well.
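
For the record, the overnight run is handled by a small property list in ~/Library/LaunchAgents. A minimal sketch of that kind of job looks something like this; the label matches the one launchd uses in my system log, but the path to the script is just a placeholder for wherever your copy lives:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>com.leancrew.twitterpost</string>
      <key>ProgramArguments</key>
      <array>
        <!-- placeholder path; point this at your own copy of the script -->
        <string>/path/to/twitterpost</string>
      </array>
      <key>StartCalendarInterval</key>
      <dict>
        <key>Hour</key>
        <integer>1</integer>
        <key>Minute</key>
        <integer>15</integer>
      </dict>
    </dict>
    </plist>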

You may recall that at the beginning of the month Twitter was quite unreliable—more so than usual. For several days in a row, twitterpost had been unable to connect when it ran, so it would just fail, and there’d be no automatic posting for that day. I’d have to run the script by hand when Twitter was responsive. To fix this problem, I rewrote twitterpost to check whether the connection was successful and try again several minutes later if it wasn’t. Here’s the new source code:

  1:  #!/usr/bin/python
  2:  
  3:  import twitter
  4:  from datetime import datetime, timedelta
  5:  from time import sleep
  6:  import pytz
  7:  import wordpresslib
  8:  import sys
  9:  import re
 10:  
 11:  # Parameters.
 12:  tname = 'drdrang'                   # the Twitter username
 13:  chrono = True                       # should tweets be in chronological order?
 14:  replies = False                     # should Replies be included in the post?
 15:  burl = 'http://www.leancrew.com/all-this/xmlrpc.php'    # the blog xmlrpc URL
 16:  bid = 0                             # the blog id
 17:  bname = 'drdrang'                   # the blog username
 18:  bpword = 'seekret'                  # not the real blog password
 19:  bcat = 'personal'                   # the blog category
 20:  tz = pytz.timezone('US/Central')    # the blog timezone
 21:  utc = pytz.timezone('UTC')
 22:  
 23:  # Get the starting and ending times for the range of tweets we want to collect.
 24:  # Since this is supposed to collect "yesterday's" tweets, we need to go back
 25:  # to 12:00 am of the previous day. For example, if today is Thursday, we want
 26:  # to start at the midnight that divides Tuesday and Wednesday. All of this is
 27:  # in local time.
 28:  yesterday = datetime.now(tz) - timedelta(days=1)
 29:  starttime = yesterday.replace(hour=0, minute=0, second=0, microsecond=0)
 30:  endtime = starttime + timedelta(days=1)
 31:  
 32:  # Create a regular expression object for detecting URLs in the body of a tweet.
 33:  # Adapted from
 34:  # http://immike.net/blog/2007/04/06/5-regular-expressions-every-web-programmer-should-know/
 35:  url = re.compile(r'''(https?://[-\w]+(\.\w[-\w]*)+(/[^.!,?;"'<>()\[\]\{\}\s\x7F-\xFF]*([.!,?]+[^.!,?;"'<>()\[\]\{\}\s\x7F-\xFF]+)*)?)''', re.I)
 36:  
 37:  # A regular expression object for initial hash marks (#). These must
 38:  # be escaped so Markdown doesn't interpret them as a heading.
 39:  hashmark = re.compile(r'^#', re.M)
 40:  
 41:  ##### Twitter interaction #####
 42:  
 43:  # Get all the available tweets from the given user. If Twitter is unresponsive,
 44:  # wait a while and try again. If it's still unresponsive after several tries,
 45:  # just give up.
 46:  api = twitter.Api()
 47:  tries = 9
 48:  trial = 0
 49:  while trial < tries:
 50:      trial += 1
 51:      try:
 52:          statuses = api.GetUserTimeline(user=tname)
 53:      except:
 54:          print "Can't connect to Twitter on try %d..." % trial
 55:          sleep(15*60)
 56:          continue
 57:      break
 58:  else:    # every try failed; the loop ended without hitting the break
 59:      print 0
 60:      sys.exit()
 61:  
 62:  # Tweets are in reverse chronological order by default.
 63:  if chrono:
 64:      statuses.reverse()
 65:  
 66:  # Collect every tweet and its timestamp in the desired time range into a list.
 67:  # The Twitter API returns a tweet's posting time as a string like 
 68:  # "Sun Oct 19 20:14:40 +0000 2008." Convert that string into a timezone-aware
 69:  # datetime object, then convert it to local time. Filter according to the
 70:  # start and end times.
 71:  tweets = []
 72:  for s in statuses:
 73:      posted_text = s.GetCreatedAt()
 74:      posted_utc = datetime.strptime(posted_text, '%a %b %d %H:%M:%S +0000 %Y').replace(tzinfo=utc)
 75:      posted_local = posted_utc.astimezone(tz)
 76:      if (posted_local >= starttime) and (posted_local < endtime):
 77:          timestamp = posted_local.strftime('%I:%M %p').lower()
 78:          # Add or escape Markdown syntax.
 79:          body = url.sub(r'<\1>', s.GetText())      # URLs
 80:          body = hashmark.sub(r'\#', body)          # initial hashmarks
 81:          body = body.replace('\n', '  \n')         # embedded newlines
 82:          if replies or body[0] != '@':
 83:              if timestamp[0] == '0':
 84:                  timestamp = timestamp[1:]
 85:              tweet = '[**%s**](http://twitter.com/%s/statuses/%s)  \n%s\n' % (timestamp, tname, s.GetId(), body)
 86:              tweets.append(tweet)
 87:  
 88:  # Obviously, we can quit if there were no tweets.
 89:  if len(tweets) == 0:
 90:      print 0
 91:      sys.exit()
 92:  
 93:  # A line for the end directing readers to the post with this program.
 94:  lastline = """
 95:  
 96:  *This post was generated automatically using the script described [here](http://www.leancrew.com/all-this/2009/07/updated-twitter-to-blog-script/).*
 97:  """
 98:  
 99:  # Uncomment the following 2 lines to see the output w/o making a blog post.
100:  # print '\n'.join(tweets) + lastline
101:  # sys.exit()
102:  
103:  ##### Blog interaction #####
104:  
105:  # Connect to the blog.
106:  blog = wordpresslib.WordPressClient(burl, bname, bpword)
107:  blog.selectBlog(bid)
108:  
109:  # Create the info we're going to post.
110:  post = wordpresslib.WordPressPost()
111:  post.title = 'Tweets for %s' % yesterday.strftime('%B %e, %Y')
112:  post.description = '\n'.join(tweets) + lastline
113:  post.categories = (blog.getCategoryIdFromName(bcat),)
114:  
115:  # And post it.
116:  newpost = blog.newPost(post, True)
117:  print newpost

The new stuff is in Lines 43-60. The connection to Twitter is now in the body of a loop. If the connection fails, the connection attempt is repeated 15 minutes later, for a maximum of 9 attempts over a two-hour period. If the connection succeeds, the process breaks out of the loop and continues on to create the post.

I tested the new twitterpost by running it with my network cable initially disconnected (Twitter wouldn’t oblige me by being down during testing). After a few loops, I’d reconnect the cable and the rest of the script would execute. It seemed to be working perfectly, so I was surprised to find the repeated posts the next day.
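
If you’d rather test the retry loop without touching any cables, you can fake a dead connection in code. This is just a testing sketch, not part of twitterpost; it wraps GetUserTimeline so the first few calls raise an exception, which the retry loop catches exactly as it would a real failure:

    # Testing sketch only -- not part of twitterpost. Put these lines right
    # after the "api = twitter.Api()" line (and shorten the sleep) to make
    # the first few calls fail the way a dead connection would.
    real_get = api.GetUserTimeline
    failures_left = [3]     # a list so the inner function can change it (no nonlocal in Python 2)

    def flaky_get(*args, **kwargs):
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise IOError("pretending Twitter is down")
        return real_get(*args, **kwargs)

    api.GetUserTimeline = flaky_get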

My first thought was that some mistake in the loop was making the script stutter. But it wouldn’t stutter when I tested it, and the connection to the blog wasn’t even in the loop. Eventually I learned that I had somehow created a second launchd process; the posts were repeated because twitterpost was being run twice each night. Killing the second launchd process eliminated the repeated post.
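
If you ever suspect launchd has more copies of a job loaded than it should, launchctl list will show everything it’s running. Here’s a rough Python version of that check, just a sketch that assumes every copy of the job has “twitterpost” somewhere in its label:

    # Rough check: how many twitterpost-ish jobs does launchd have loaded?
    import os
    jobs = [line for line in os.popen('launchctl list') if 'twitterpost' in line]
    print "%d job(s) matching 'twitterpost':" % len(jobs)
    for j in jobs:
        print '  ' + j.rstrip()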

Update 9/15/09
Hey, it works! Last night, the script couldn’t connect to Twitter for an hour. I have it set to start at 1:15 am, but last night’s post is timestamped 2:15 am. Launching Console to look at the system log, I found this:

    Sep 15 02:15:05 drang com.leancrew.twitterpost[45665]: Can't connect to Twitter on try 1...
    Sep 15 02:15:05 drang com.leancrew.twitterpost[45665]: Can't connect to Twitter on try 2...
    Sep 15 02:15:05 drang com.leancrew.twitterpost[45665]: Can't connect to Twitter on try 3...
    Sep 15 02:15:05 drang com.leancrew.twitterpost[45665]: Can't connect to Twitter on try 4...
    Sep 15 02:15:05 drang com.leancrew.twitterpost[45665]: 998

Four failures and then a success. I know the timestamps make it look like all the attempts were made at 2:15, but that’s because the script doesn’t flush its output as it runs; all the output hits the system log when the script finishes. I saw the same thing when I was testing the script with the network cable disconnected.
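
If you’d rather see each message in the log as it happens, the fix is easy: flush standard output after each print (or run the script with Python’s -u option). Since sys is already imported at the top of the script, something like this in the except branch would do it:

    print "Can't connect to Twitter on try %d..." % trial
    sys.stdout.flush()      # push the message to the log now, not when the script exits
    sleep(15*60)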

I did my best to simulate a failed connection in my testing, and I thought it would work, but it’s still nice to see it doing what it’s supposed to in a real-world situation.

Because I felt bad about sending out repeated feeds, and because I began to question the value of these automatic tweet posts, I decided to create new feeds that excluded the Twitter posts. Patrick Mosby had already done this through Yahoo! Pipes, and my first thought was to just link to his feed. But I didn’t want to be dependent on his maintaining that feed, so I just copied his work on my own Yahoo! account.

So now there are four feeds: an RSS 2.0 feed with every post, an Atom feed with every post, an RSS 2.0 feed without the Twitter posts, and an Atom feed without the Twitter posts.

I’ve never understood the niceties of RSS 2.0 v. Atom, but since WordPress and Yahoo! Pipes make it easy to provide both, that’s what I’ll do.
