Very cool LinkedIn Meetup mashup

This is such a nice idea. I don’t use LinkedIn. Maybe I can make one for Twitter.

planetjeffro:

The Problem:

I go to a lot of Meetups and other events (tech, networking, etc.), so I always check the RSVP list beforehand to get a sense of who’s going.  In fact, I often decide whether or not to go based on who I know.  For example, if I see a few tech folks I haven’t kept up with or emailed recently, I love using these events as a way to quickly catch up with a group of people.

Regardless, I desperately want to know who I am 2nd degree friends with on LinkedIn so I can shoot an email or tweet and plan on connecting at the event.  I consider these kinds of events to be sort of like coffee meetings on steroids - a chance to have meaningful face-to-faces with a bunch of people without all the annoying planning that tends to go along with setting up simple Skype calls or a coffee meeting.  The trick is to know who’s going.

Also, events cost money, like this Tech Cocktail Mixer at GA for $15, which does sound awesome but would be more awesome if I had a few connections before showing up.

The Solution:

There are awesome companies trying to solve this and show you who you’re connected to and through which people.  My favorite is Sonar.me, and not just because I know Brett, the founder.  It’s my go-to app when I’m out and about and curious who’s around me.  But there are some big limitations:

  1. People at the event need to be checked in (on 4square, Sonar, whatever).  Face it, you know only 5% of people at any event are checked in on 4square.  Even at General Assembly at the start of Startup Weekend, there were only 40 people checked in (which is a whopping number) out of the 150-200(?) attendees - and that’s the most tech-centric crowd in the most tech-savvy space during a tech event.  So I’m interested in capturing that other 90+% of people who have check-in fatigue or forget or don’t care and, frankly, have “pre-checked-in” via a simple RSVP.
  2. Facebook and Twitter aren’t networks I use for networking.  LinkedIn is. Brett says LinkedIn integration is on its way (which will really make Sonar shine for me), but in the meantime…

My Hack:

So I spent yesterday hacking together my most useful script(s) ever.  I’ll walk through two situations.

Eventbrite connections:

Start with a normal Eventbrite page. It lists a name, company and maybe a URL.  Great, but mostly worthless since, at the very least, it’s a lot of work to scroll through names and judge if I know anyone.  It’s even harder, bordering on impossible, to look each person up on LinkedIn and see who they are.

That lookup is exactly what my script does.  It runs off of a simple text file (just cut and paste the Eventbrite RSVP list, mainly because EB uses JavaScript that makes it hard to easily “get” the HTML) that looks like this:

Eli Greif, Landmark Ventures
Website: www.landmarkventures.com
Michael Levine, Ad Tech Startup
Website: google.com
Preetham Hegde, City of New York
Website: http://nyc.gov
Simone Grant, Morgan Museum
Website: themorgan.org
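
A minimal sketch of how a pasted list in this shape could be parsed, assuming every attendee line is "Name, Company" and an optional "Website:" line belongs to the attendee just above it. The post’s script is in Perl and isn’t shown; the `Attendee` type and `parse_rsvp_text` function here are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Optional


class Attendee(NamedTuple):
    name: str
    company: str
    website: Optional[str]


def parse_rsvp_text(text: str) -> List[Attendee]:
    """Parse pasted RSVP text into attendees.

    Assumes 'Name, Company' lines, each optionally followed by a
    'Website: ...' line for the same person.
    """
    attendees: List[Attendee] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("website:"):
            # Attach the website to the most recent attendee, if there is one.
            if attendees:
                site = line.split(":", 1)[1].strip()
                attendees[-1] = attendees[-1]._replace(website=site)
            continue
        # Split on the first comma only, so a company name may contain commas.
        name, _, company = line.partition(",")
        attendees.append(Attendee(name.strip(), company.strip(), None))
    return attendees


sample = """Eli Greif, Landmark Ventures
Website: www.landmarkventures.com
Preetham Hegde, City of New York
Website: http://nyc.gov"""

for attendee in parse_rsvp_text(sample):
    print(attendee)
```

Splitting the website line on the first colon only keeps URLs such as http://nyc.gov intact.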

It goes through each name + company and (here’s the cool part) exploits Google to figure out the direct LinkedIn URL.  The query follows this pattern:

“$fname $lname $co $city linkedin”

Bill Johnson Microsoft new york linkedin
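
As a rough sketch, the query template can be assembled like this. The function names are hypothetical, and the search URL format is an assumption, since the post doesn’t say how the script actually submits the query to Google.

```python
from urllib.parse import urlencode


def build_query(full_name: str, company: str, city: str) -> str:
    """Fill in the "$fname $lname $co $city linkedin" pattern."""
    first, _, last = full_name.strip().partition(" ")
    parts = [first, last, company, city, "linkedin"]
    # Skip empty pieces so a missing company or city leaves no double spaces.
    return " ".join(p.strip() for p in parts if p and p.strip())


def search_url(query: str) -> str:
    """Build a search URL for the query (assumed endpoint, for illustration)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("Bill Johnson", "Microsoft", "new york")
print(query)
print(search_url(query))
```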

This is surprisingly accurate, but it sometimes returns people who aren’t quite right.  So it looks at the LI profile and builds a confidence score based on criteria such as:

+1 if any of the company words are present

+1 if the person’s first name is in the profile/URL

+2 if the person’s last name is in the profile/URL

+5 if the person’s full name/full company name is in the profile

It turns out this isn’t all that necessary, but it’s pretty accurate. (I also used this to track down LI URLs for the leaked SXSW participant list of 25,000 people, which just had names and companies.)
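
The scoring criteria could be sketched roughly as follows. This is an illustration under assumptions, not the original Perl: matching is plain case-insensitive substring search, the "+5" rule is read as "full name or full company name appears in the profile text", and `confidence_score` and its parameters are hypothetical names.

```python
def confidence_score(first: str, last: str, company: str,
                     profile_text: str, profile_url: str = "") -> int:
    """Score how well a candidate LinkedIn profile matches an attendee."""
    profile = profile_text.lower()
    # Name checks look at both the profile text and the profile URL.
    profile_and_url = f"{profile} {profile_url.lower()}"
    first, last, company = first.lower(), last.lower(), company.lower()

    score = 0
    # +1 if any of the company words are present
    if any(word in profile for word in company.split()):
        score += 1
    # +1 if the first name is in the profile/URL
    if first and first in profile_and_url:
        score += 1
    # +2 if the last name is in the profile/URL
    if last and last in profile_and_url:
        score += 2
    # +5 if the full name or the full company name is in the profile
    full_name = f"{first} {last}".strip()
    if (full_name and full_name in profile) or (company and company in profile):
        score += 5
    return score


good = confidence_score("Bill", "Johnson", "Microsoft",
                        "Bill Johnson - Program Manager at Microsoft",
                        "linkedin.com/in/billjohnson")
bad = confidence_score("Bill", "Johnson", "Microsoft",
                       "William Jones, Acme Corp",
                       "linkedin.com/in/wjones")
print(good, bad)  # 9 0
```

A threshold on this score (the value would have to be tuned by hand) could then decide whether a search hit is trusted or skipped.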

Next, it logs into LinkedIn (with my username/password) using Perl’s WWW::Mechanize library and looks at the profile page.  It then displays whether you are a 1st, 2nd or 3rd degree connection and who you have in common.

2nd degree is the most interesting, and it pulls a list of the people between you and the random person.  I won’t publish my list for the Tech Cocktail Mixer, but here are some interesting stats:

  • I found 16 people out of 59 RSVPs with whom I share connections
  • (I do have ~800+ connections on LI, so keep that in mind)
  • The most connections I share with one person is 45 / The least is, of course, 1
  • Median is 2 / Mean is 7
  • There are a lot of “usual suspects” as my shared connections, but there are a bunch of shared connections who show up less frequently.  I think those people may be more interesting to get intros through, just because they are uncommon in the NY tech circles.

The Meetup Hack:

I love meetup.  It’s how I got into the NY tech community and how I met most of the techies I know.  But the RSVP list could use some improvements. Upon first glance, it’s just a long, long list of names and pictures.  That’s cool, but as soon as I want more info I have to start mousing over and clicking on each person - a big pain.

My script takes a meetup event URL, like the one for the upcoming Lean Startup talk (100 ppl going with 100+ on the waiting list!):

http://www.meetup.com/lean-startup/events/31681192/

It then loops through each person’s name, looks at their profile to see if they have a LI URL already entered (most don’t), and if not, falls back to the Google search trick described above.

These stats are more interesting:

  • Out of 102 attendees:
  • I have 1st connections / 43 2nd degree connections (!)
  • Median = 3 / Mean = 5.5
  • There are 7 people with > 10 connections in common. I should probably connect with them first.
  • Out of my 242 shared connections (with those 43 attendees), there are 107 unique people in my network who are the connectors (it would be interesting to tally and slice that data in different ways)
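
One way to tally and slice that data, sketched with made-up placeholder names (the real lists aren’t published here). The `shared` mapping from attendee to shared connections is a hypothetical data shape, not the script’s actual output format.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical data: attendee -> the shared connections between us.
shared = {
    "Attendee A": ["Jane", "Sally", "Bob"],
    "Attendee B": ["Jane"],
    "Attendee C": ["Jane", "Bob"],
}


def summarize(shared_by_attendee):
    """Compute median/mean shared connections and count each connector."""
    counts = [len(names) for names in shared_by_attendee.values()]
    # Count how many attendees each connector links me to.
    connectors = Counter(
        name for names in shared_by_attendee.values() for name in set(names)
    )
    return {
        "median": median(counts),
        "mean": mean(counts),
        "unique_connectors": len(connectors),
        "connectors": connectors,
    }


summary = summarize(shared)
print(summary["median"], summary["mean"], summary["unique_connectors"])
# Connectors who show up least often are listed first.
print(sorted(summary["connectors"].items(), key=lambda item: item[1]))
```

Sorting the counter in ascending order surfaces the less frequent connectors, the ones described earlier as potentially more interesting for intros.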

Issues:

Unfortunately, this script is in Perl and requires a username and password for LinkedIn, so it’s not exactly production ready.  But I think I could package and export it as an executable that would be run from the command line with arguments like “linked.exe blah@blah.com password meetup.com/url” though I’m not sure that’s a great idea.

I’d love to make this into a regular website that lets you OAuth your LI account, plug in an Eventbrite or Meetup URL, and get a nice HTML email with links and lists of connections, etc.  Anyone want to help?

I do know this *probably* violates some annoying policies but it’s just for educational purposes.  I’m not sure how one would get around the problem of having one IP hit the LI servers 100+ times without getting booted.  I’ve factored in a lot of “sleep” commands to slow things down and hope they don’t notice. (It’s exceptionally easy to get noticed by Google when you let requests run in real-time.)
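
The “sleep” approach can be sketched as a small generator that pauses between items. The random jitter and the delay bounds are assumptions added for illustration; the post only says that sleeps were added to slow things down.

```python
import random
import time


def throttled(items, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them.

    The `sleep` parameter exists so the pause can be swapped out in tests.
    """
    for index, item in enumerate(items):
        if index:  # no need to wait before the very first item
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Usage: each profile lookup would happen inside the loop body.
for url in throttled(["profile-1", "profile-2", "profile-3"],
                     min_delay=0.01, max_delay=0.02):
    print("fetching", url)
```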

Next steps:

I can immediately think of some awesome next features to add:

  • So now I have a filtered list of event attendees I’m connected to and a list of those shared connections.  I can add a 1-click intro + LI connection request, sent via LI (if you have an email, which is a big if) or via Twitter, that says:

“Hi Bill, Looks like you’re going to event XYZ on Tuesday. We have x shared LI connections including Jane, Sally and Bob. Looking forward to connecting”

  • I am a member of 40 Meetup groups. Of the umpteen going on this week, I’d love to know the ONE I absolutely must attend because it has the most 2nd degree people.
  • One feature I’ve asked of Sonar.me is the ability to “turn off” certain connections.  So a tech VC I might have met a few times might have 2,000 connections and literally be the super-super-duper-steroid connector.  I’d like to ignore his connections though because unless I am personally connected to him strongly, that connection probably isn’t very helpful.  (I’m not going to ask 1 person for 13 introductions!)  I call this the Charlie Sheen effect on Sonar - Sheen single-handedly connects me to someone every time I open Sonar. Amusing, but not helpful ;-)
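
The last two ideas combine naturally: drop the connectors you want to “turn off”, then rank events by how many attendees are still reachable. A minimal sketch with placeholder data; the event names, people, and function names are all hypothetical.

```python
def useful_connections(shared_by_attendee, ignored):
    """Remove ignored connectors; keep attendees who still share someone."""
    filtered = {
        attendee: [c for c in connectors if c not in ignored]
        for attendee, connectors in shared_by_attendee.items()
    }
    return {attendee: c for attendee, c in filtered.items() if c}


def rank_events(events, ignored=frozenset()):
    """Sort events by the number of 2nd degree attendees, highest first."""
    scores = {
        name: len(useful_connections(shared, ignored))
        for name, shared in events.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: event -> attendee -> shared connections.
events = {
    "Event A": {"Ann": ["Super VC"], "Raj": ["Super VC", "Jane"], "Lee": ["Super VC"]},
    "Event B": {"Kim": ["Bob"], "Pat": ["Sally"]},
}

print(rank_events(events))                        # Event A ranks first
print(rank_events(events, ignored={"Super VC"}))  # Event B ranks first
```

With the super-connector ignored, Event A drops from three reachable attendees to one, which changes which event comes out on top.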

Thoughts?

(Source: planetjeffro)

Oops.
parrottrek:

Google Street View car after being pulled over by VCU Police.
This was the third time I’d seen the car in 10 minutes.

Google App Engine price changes clearly explained

Thank you nice Tumblr, for lucidly explaining the Google App Engine price changes.

“The old pricing model charged for the wrong thing. The overwhelming vast majority of web applications use very little CPU; processes spend most of their time blocked waiting for I/O (datastore fetches, url fetches, etc). The App Engine cluster is not limited by CPU power, it is limited by the number of application instances that can be fit into RAM.

Charging for CPU time created architectural distortions…”

jeffschnitzer:

I don’t work for Google, but I read the mailing lists and pay attention. Also, I show up at places where Google buys beer. Here’s what I’ve learned:

What is changing?

Google is changing the way it charges for App Engine. Previously, you were charged for three things:

  1. Bandwidth in/out
  2. Data stored in the datastore
  3. “CPU time” of all your programs

The new billing model charges you for:

  1. Bandwidth in/out
  2. Data stored in the datastore
  3. Wall-clock time spent running application server instances
  4. Number (and type) of requests to the datastore

In addition, there is a 15-minute charge ($0.02) every time an instance starts and a flat $9 per application per month if you enable billing.

How are these pricing models different?

For most GAE applications, the biggest charge has always been “CPU time”. This number was composed of two parts:

  1. CPU time directly consumed by your frontend web application instance
  2. A crude approximation of CPU time consumed by the datastore and other APIs (“api_cpu_ms”)

In fact, api_cpu_ms has never been a real measure of CPU activity - the numbers are based on simple heuristics like “each index write costs 17ms”. So the change from api_cpu_ms billing to per-request billing for the datastore really isn’t much of a change.

The most significant change to the pricing model is that instead of billing you for CPU time consumed by your frontend web application, you will now be charged for every minute of wall-clock time that each instance runs, irrespective of how much CPU it consumes.

What was wrong with the old pricing model?

The old pricing model charged for the wrong thing. The overwhelming vast majority of web applications use very little CPU; processes spend most of their time blocked waiting for I/O (datastore fetches, url fetches, etc). The App Engine cluster is not limited by CPU power, it is limited by the number of application instances that can be fit into RAM. This is particularly problematic with single-threaded instances like the current Python servers and Java servers without <threadsafe>true</threadsafe>, which require multiple instances to serve concurrent requests.

Charging for CPU time created architectural distortions. Imagine your single-threaded Python application makes a URL fetch that takes 2s to complete (say, Facebook is having a bad day). You would need 200 instances (consuming large quantities of precious RAM) to serve 100 requests per second, but you would pay almost nothing because instances blocked waiting on I/O consume nearly no CPU time. Google’s solution was simply to refuse to autoscale your application if average request latency was greater than one second, essentially taking down your application if third-party APIs slow down.

Because the new instance-hour pricing model more accurately reflects real costs, App Engine can now give you those 200 instances — as long as you’re willing to pay for them.
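
The 200-instance figure follows from multiplying the request rate by the time each request stays in flight. A small worked sketch, where `concurrent_per_instance` is a hypothetical parameter standing in for how many requests one instance can serve at once (1 for a single-threaded server):

```python
import math


def instances_needed(requests_per_second, latency_seconds,
                     concurrent_per_instance=1):
    """Estimate instances required to keep up with the offered load."""
    # Requests in flight at any moment = arrival rate x time per request.
    in_flight = requests_per_second * latency_seconds
    return math.ceil(in_flight / concurrent_per_instance)


# Single-threaded: 100 req/s with a 2s URL fetch, as in the example above.
print(instances_needed(100, 2.0))      # 200
# Same load if each instance could serve 20 requests concurrently.
print(instances_needed(100, 2.0, 20))  # 10
```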

Is this a move away from usage-based billing?

No. You’re still being charged for resources you consume, but now you’re being charged for the scarce resources that matter (occupied RAM) rather than the overabundant resources that are irrelevant (CPU time).

But my instance only uses 20MB! Why not charge per megabyte-hour?

It’s tempting to imagine that Google can just add instances to a machine until it runs out of RAM, then start adding instances to the next machine. In practice, you can’t architect a system like this. Your frontend instance may use 20MB now but nothing stops it from growing to 100MB without warning. If a couple application instances did this suddenly, it could push the box into swap and effectively crash all instances running on it. An application instance must reserve not just the RAM it actually uses, but the RAM it *could* use. Oversubscribing creates the risk of incurring unpredictable performance problems, and at the huge scale of App Engine, even low-sigma events become inevitable. I suspect that Google is very conservative about oversubscribing RAM reservations, if they do it at all.

Will my bill go up?

Almost certainly. Especially if you are using single-threaded instances (Python, or Java without <threadsafe>true</threadsafe>). Google really was charging an absurdly low price for App Engine before, letting us occupy many hundreds of megabytes of RAM for pennies a day. It was nice, but it wasn’t sustainable.

Does this mean App Engine is more expensive than other hosts?

It depends. If you’re looking at just the cost of computing, then yes GAE will be more expensive than services like AWS. On the other hand, you can run large-scale applications without the need to hire a system administrator or DBA - so that frees up a couple hundred thousand dollars per year from the budget.

It’s also really hard to make an apples-to-apples comparison. With the high-replication datastore, GAE provides a redundant, multihomed, fault tolerant system that can transparently survive whole datacenter crashes. It’s already happened. Setting up an equivalent system requires significant engineering effort, and you have to pay someone to wear a pager.

As App Engine has matured, it has gone from being a low-end hosting solution to a high-end hosting solution. Compared to a VPS at DreamHost, App Engine is very expensive. Compared to building your own HRD, App Engine is still comically cheap.

What can I do to lower my bill?

Google has created an article for this, but here’s some blunt advice.

There are two aspects to this, driven by the two separate aspects of billing:

  1. Lower the number of instances your app needs
  2. Reduce your datastore footprint

Most developers freaking out about their bill on the appengine mailing list are Python users shocked by the number of instances running to serve their application. You may be able to optimize this somewhat by tuning the scheduler but this will at best provide a small improvement. The stark reality is that single-threaded web servers are a huge expensive waste of resources. Processes that occupy big chunks of RAM while they sit around blocked on I/O don’t scale cost-effectively.

The only practical way to significantly lower your instance count is to use multithreading. This allows one instance to serve many concurrent requests, theoretically right up until you max out a CPU. Dozens of threads can block on I/O within a single process without consuming significant additional chunks of precious RAM.

  • If you are using Java, put <threadsafe>true</threadsafe> in your appengine-web.xml NOW
  • If you are using Python, beg/bribe/extort your way into the Python 2.7 Trusted Tester group
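
To see why threads help when requests mostly wait on I/O, here is a plain-Python illustration (not App Engine code). `time.sleep` stands in for a blocking datastore or URL fetch, and the function names and delay are assumptions chosen for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(request_id, delay=0.1):
    """Pretend to serve a request that spends its time blocked on I/O."""
    time.sleep(delay)  # stand-in for a datastore or URL fetch
    return request_id


def serve(request_ids, threads):
    """Serve all requests in one process with the given number of threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(slow_fetch, request_ids))
    return results, time.perf_counter() - start


_, one_thread = serve(range(10), threads=1)
_, ten_threads = serve(range(10), threads=10)
print(f"1 thread: {one_thread:.2f}s, 10 threads: {ten_threads:.2f}s")
```

With one thread the ten requests run back to back; with ten threads they wait in parallel inside a single process, which is the effect the advice above relies on.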

If you have turned on multithreading, it’s unlikely that the scariest line item in your bill will be instance-hours. Instead you will be wondering why you are being charged so much for the datastore. The good news is that there’s nothing new about optimizing datastore usage; this is what you should have been doing all along:

  • Cache entities in memcache when you can.
  • Remove unnecessary indexes.
  • Denormalize where you can. It’s much cheaper to load/save one fat entity than 20 small ones.
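
The caching advice boils down to a read-through pattern: check the cache first and only hit the datastore on a miss. The sketch below uses a plain dict and a counting stand-in class rather than the real memcache and datastore APIs, so every name in it is hypothetical.

```python
class CountingStore:
    """Stand-in for a datastore that counts billable reads."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


def cached_get(key, cache, store):
    """Return the entity for key, reading the store only on a cache miss."""
    if key in cache:
        return cache[key]
    value = store.get(key)
    if value is not None:
        cache[key] = value  # remember it for later requests
    return value


store = CountingStore({"user:1": {"name": "Jane"}})
cache = {}
for _ in range(5):
    cached_get("user:1", cache, store)
print(store.reads)  # 1
```

Five lookups cost one datastore read here; a real cache would also need invalidation when the entity changes, which this sketch leaves out.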

Why are developers complaining about the scheduler?

The scheduler is a red herring. It may very well have issues, but no amount of tweaking the scheduler will change the fact that in order to serve multiple concurrent requests with a single-threaded process, you need to run multiple instances. At best, the scheduler can trade off a crappy user experience for a lower bill.

Forget about the scheduler. Turn on multithreading ASAP.

Is there any good news about the price change?

Google says that higher prices will allow them to increase their commitment to App Engine and devote more resources to its development. This is probably true. If you’re building a business (as opposed to a hobby), the new pricing is probably not going to make or break you; salaries and overhead are probably still your biggest concern. However, the addition of significant new features (say, cross-entity-group transactions or spatial indexing) could allow you to improve your product in ways that were too expensive or difficult before. To the extent that more money means more features sooner, paying more might be worth it. Time will tell.

"G+ was built primarily as an identity service, so fundamentally, it depends on people using their real names if they’re going to build future products that leverage that information."

Ha! The Microsoft video is funnier, but what they’re saying actually isn’t true (see below).

rankandfile:

Without explicitly saying the word Hotmail, Google recently launched an Email Intervention campaign. Designed to help us get our friends off of old providers and onto Gmail so we can gchat, call, and video chat with them, the video is aimed at Hotmail users and the people who love them.

Days later (coincidence?), Microsoft releases Gmail Man, a friendly, creepy mailman who asks embarrassing, intrusive questions while he delivers your email.

Never mind that Google Apps for Business (the closest analog to Office 365) doesn’t have ads in Gmail, and that Microsoft’s Hotmail is full of even blinkier, more garish ads (a better comparison for ad-laden consumer Gmail in the video).

LMAO.

rankandfile:

Google’s recent Extreme Makeover(s) reminds one of an episode of The Office:

After witnessing the rapid ascension of Ryan and his website project, a paranoid Creed dyes his hair black. 

"We are developing a free service called Chromoting that will enable Chrome notebook users to remotely access their existing PCs and Macs."

Chrome OS’s killer feature may end up being system management. I’ve certainly heard about flaws and missed features… but everything I read about boot-time, remote management, reinstall, and config sounds like The Future.

From the Chrome OS help pages. Color me intrigued. (via deltamualpha)

(Source: blank, via deltamualpha)

How to dual-boot Ubuntu on the CR-48 Chrome OS notebook

Haha.

“Was only a matter of time.”

I’ll say. Linux has been installed on crazier things. If they install Android on iPhones, they will install Linux on kissing cousins like Chrome OS.

Credit to the men and women who risk bricking a prized Chrome OS notebook.

randharristz:

It was only a matter of time: a page on the Chromium Projects website has emerged, detailing how to install Ubuntu on a Cr-48 netbook. The process is, understandably, a little risky, but it’s not like there are any tech bloggers out there that don’t know how to use Linux, right?

Snarkiness aside, the process is actually very easy. You have to hack at the SSD’s filesystem a little and fiddle with the Chrome OS kernel, but if you do everything right, you should be rewarded with a dual-boot system capable of running both Ubuntu and Chrome OS.

The best bit, though, is that you have to enable ‘developer mode’ to escape Chrome OS’s ‘verified boot’ security measure. To do this, you need to flip a switch on the back, under the battery, as per the hilarious instructional photo shown after the break.

Is there a Chrome OS notebook CPU bottleneck?

This is really discouraging to read. It’s one thing to hear complaints about the much-maligned Flash player on Atom. But Google Docs, which needs nothing but JavaScript, still feeling slow on the highly optimized Chrome browser?…

That’s a bummer. (Click through to read thecookinggeek’s workaround. Spoiler: It involves leaving Google Docs.)

thecookinggeek:

Since I started writing posts on the Cr-48, I have been using nothing but Google Docs. Performance of Google Docs on the Cr-48 notebook is less than stellar, conjuring up memories of the severe input lag on old computers running Office 98.