
Showing posts with label Internet.

08 March 2016

Search 3.0, transformative big data and the road ahead


Introduction

You may be wondering about the significance of the three Scottish flags in the image. I took this picture a few weeks ago. I'm in Edinburgh, there are three flags and this article is about Search 3.0, and ideally I'd like Silicon Glen (the Scottish IT sector) to take this idea forward rather than Silicon Valley being the home of the best search engine. So my aim is for a Search 3.0 search engine to be based in Scotland. However, I'm open to ideas. First of all, some history to explain what Search 3.0 is and what transformative big data is all about.

Search 1.0

Although I have been on the Internet since the 1980s, my first experience of the web was in 1993, when I was studying for my Masters in Large Software Systems Development at Napier University and wrote a research paper on cooperative working. I had email at home from 1988 and used Usenet a great deal, so I was an early adopter of the web: I downloaded Mosaic and used early search engines such as Yahoo, Excite and Lycos when they first came out. I was particularly interested in AltaVista when it launched in 1995, as it had the biggest search index at the time and was built by my former employer, Digital. I had floated the idea of a web browser to them in 1989, but that was rather ahead of its time then. The early search engines were interesting, and their job was a lot easier than now as there were so few sites; however, as the web grew, the unstructured web needed some order imposed on it so that relevant results came to the fore.

Search 2.0

Search 2.0 came about when the founders of Google realised that ranking pages would help produce more relevant results. Their January 1998 paper on search is available. The basis for this was that the human element of embedding links in pages could be used to deduce that the pages being linked to were more important, because people had chosen to link to them. In effect, the human act of adding links allowed a computer algorithm to assign a rank to pages and produce results which people found more valuable. It was also (somewhat) hard to spam the search index, as doing so required the manual effort of changing links. Trusted sites on .edu and .ac.uk domains also scored higher. Search 2.0 has evolved since then, with ever more sophisticated algorithms trying to make more sense of the data out there on the web to produce even more useful results. Yet despite 17 years of Google, search is still pretty poor.
Try these difficult searches:
  1. You are flying soon. Your airline allows you to take a 2nd bag 40cm x 30cm x 15cm. Your task is to buy a bag that fits. As a secondary task, find a site that allows you to search by bag size.
  2. You are travelling alone, searching for a hotel room in London. You require a double bed, ensuite and breakfast, and want a 3-star hotel or better. That in itself is quite hard, because by stating one adult you sometimes get twin rooms returned, and Hotelscombined and others give you hostels when you rank the results by price. However, try combining this search with "within 10 minutes' walk of a London tube station on the Piccadilly line", or the number 133 bus, or some other transport requirement, and you're stuck. A bit tricky if you're disabled and want accommodation near a bus route without having to change buses, or near a tube station accessible by wheelchair.
  3. You need to be at a meeting. Find a travel planner site which allows a portion of your journey by public transport to be swapped with a taxi provided it saves you time and doesn't cost more than £15.
  4. You are a single mother returning to work. You seek a part time job that allows you to balance childcare and work from 9am-3pm Mon to Friday or from home. Your challenge is to find the website that allows you to search for this.
  5. You're looking to move house. The smallest bedroom must be at least 6ft by 8ft. Find the matching houses. Would prefer house to be within 5 mins walk of a bus stop.
  6. Tell me the flights which, allowing for connections at the other end get me to my meeting in London on time. In London you have a choice of 5 airports all with different prices and onward travel times. Let me know the total journey cost too (including by public transport)
  7. You have forgotten a friend's birthday but know what would be the ideal present. Find all the local shops open now within a 30-minute drive which have it in stock.
  8. Find me all the used cars in the UK which comfortably take three adults in the back seat for a long journey. 
  9. Find me all the events on in my area. Surprisingly there isn't a predominant global website which does this. 
  10. Find me a job, such that duplicate postings by multiple agencies for the same position are eliminated. Also, show on the job advert the time expected to complete the job application process as I favour jobs without application forms. Let me know the commute time to the job.
(The above list is not meant to be exhaustive and I welcome additions: things we would find useful but which can't be searched for via a primary search engine such as Bing, Google, DuckDuckGo or Wolfram Alpha.)
There are lots of data driven searches for products and services that are simply impossible on the current web. There are three reasons for this.
1. The data is not published at all because it is not gathered in the first place. A bit like in 1993, when I was campaigning for more smoke-free areas in pubs: the first step was to get pub guides to survey pubs, so we had a current state of the market, some data to work with and actual pubs to speak to about how smoke-free areas affected them.
2. The data is gathered but is in a database somewhere that you have to query via an intermediate website. Such sites usually charge you to list there, unlike Google which is free. This is the likes of lastminute.com, autotrader, zoopla, etc. 
3. The data is published but is not structured in any useful way - instead you get a page of content, somewhere on that page is the info you need, and you have to scour for it manually. Such as Amazon listing the size of luggage on the listings page but not giving me a filter to search for luggage under a certain size. We could attempt to solve this problem by applying AI and a deep knowledge of human language to interpret each page, but that is a hard job to do error-free and extremely hard to do for all the world's languages. As a Gaelic speaker, I support minority languages and I wouldn't want their speakers to be sidelined. Data, ideas and feelings are our universal language; speech is only an interpretation of these.
So here is where clever Google algorithms run out of steam, because of the lack of quality data. So to Search 3.0 and transformative data.
What we've seen is that the old business model of a newspaper listing advertisements hasn't really changed much for the internet age. Ebay, Zoopla, Autotrader - they are simply at the same position in the sales cycle as a newspaper used to be selling adverts and making money from the advertiser based on their readership. What's changed in 20 years?

Search 3.0

This idea isn't new, but I have been promoting it and winning attention for it, just not financial backing. In 2000 I entered the Scottish Enterprise "Who wants to be an entrepreneur" competition with an early version of the idea, and it was recommended for a feasibility study by Ian Ritchie, leading Scottish entrepreneur and TED speaker. I also submitted it to a computer magazine, which named it one of the top e-commerce ideas in the UK in February 2000. The issue then was funding, due to the dotcom crash: great idea, no funding climate. I suggested it to a crowdsourcing site in 2006, where it was called "The next Google". I blogged about it in 2008 and did a Google hangout with Google's Product Management Director for Search in 2013. Still no traction. Becoming rather fed up with the huge mountain to climb in order to get funding, I feel rather like Queen being told that Bohemian Rhapsody "had no hope of ever being played on radio". It is the most played song in Radio 1's history. Even Steve Jobs was ridiculed when the iPod launched, the product that paved the way to the world's most valuable company. Laugh away now. Sometimes the critics get it wrong.
To counter that, I'm putting some of the idea out there, because back in the early 90s on the Internet that's exactly what people used to do. For free. I did it with the UK Internet List in 1992 and the first online guide to Scotland in 1994, and Tim Berners-Lee did it with the web in 1991. Link to original post (might not work on mobile browsers). Why do this? To advance the Internet. To encourage debate. To drive forward standards. To recognise that this is the first time in the history of the planet where we have a global free platform on which we can converse to exchange ideas, and to make that a better place for future generations. This only happens once in a planet's history and we are lucky to live in that time. It would be great if we got it right for future generations.
Why not? 

Search 3.0 - Layer 1. Data enrichment

I listed a few examples above of searches I've found frustrating. However this could just be me. I don't know what you find frustrating about the web, what you are looking for that you can't find and what you would like to do to change it. Google probably has an idea because it can track sessions and the long sessions searching for repeatedly similar subjects might be a good indicator that the data is poor but in order to open this up democratically I suggest the following approach.
I'll begin by assuming the refinement of search is based around improving the quality of related high volume e-commerce sites. The reason for this is that if you approach the idea from a VC perspective, this is where you might build the greatest economic value for the search first. However you needn't necessarily follow this approach if you're being altruistic.
Step 1:  Identify the top search categories you want to specialise in initially, e.g. hotel rooms, job listings, rooms to let, restaurants. Cross reference these search terms against the first page of results in Google for these terms, for each of the world’s top cities. Just record the domains returned (including from adverts because the adverts are ranked for relevance). Store this in a database. Rank it by city size if you want a priority order. You now have a list of the top 2nd level search engines by product and locality. You no longer need Google.
For each of the above queries to discover top categories and locations, the same query is sent to dmoz.org (the Open Directory Project). The categories of the results returned are what's relevant here, as opposed to the actual pages. So for a query on travel and London, the top category returned from dmoz would be: Regional: Europe: United Kingdom: England: London: Travel and Tourism
Now you can correlate the Google results by category using the two results above. Furthermore as the dmoz directory is hierarchical, you can build up a hierarchy of websites to allow users to refine their search results. You now have a hierarchical product and geography driven database which references the top websites in the product category and geography. It's still only a list of websites though, no products or services yet.
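As a sketch, the Step 1 result could be modelled as a hierarchical category/locality index keyed by dmoz-style category paths. Everything here (class name, domains, category paths) is illustrative, not real crawl data:

```python
from collections import defaultdict

class CategoryIndex:
    """Hypothetical product/geography index of 2nd-level search sites."""

    def __init__(self):
        # maps (category path, city) -> ranked list of domains
        self.index = defaultdict(list)

    def add(self, category_path, city, domain):
        key = (tuple(category_path), city)
        if domain not in self.index[key]:
            self.index[key].append(domain)

    def lookup(self, category_path, city):
        return self.index[(tuple(category_path), city)]

    def refine(self, category_prefix):
        # hierarchical refinement: every entry under a category prefix
        prefix = tuple(category_prefix)
        return {k: v for k, v in self.index.items()
                if k[0][:len(prefix)] == prefix}

idx = CategoryIndex()
path = ["Regional", "Europe", "United Kingdom", "England",
        "London", "Travel and Tourism"]
idx.add(path, "London", "example-hotels.com")
idx.add(path, "London", "lastminute.com")
print(idx.lookup(path, "London"))
```

Because the keys are full dmoz-style paths, `refine` can narrow results from "United Kingdom" down to "London" without any extra data structure.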
Step 2: Since we have no data at present for more sophisticated searching, the user is presented with the option to refine the results by keywords against the results returned. Something like “To refine the listings on the webpages below, please indicate what is important to you..”
In the example above, the user could specify grading, price, address etc.
The next user who comes along with a similar search sees the keywords the first user added, and votes for them and/or adds their own. Over time, the keywords entered by users would be shown in each category of search results in order of decreasing popularity. This seeding of the database would occur during the product alpha/beta stages so that there was already a dataset at the time of formal launch. In a Web2.0 sense, the site is learning which criteria actually matter to people in order to drive the next phase - a popularity contest for "how would you like to extend the search capability?", something most websites never ask; they just give you options on a take-it-or-leave-it basis.
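The keyword crowdsourcing in Step 2 is essentially a vote tally per search category; a minimal sketch (the keywords and function names are invented for illustration):

```python
from collections import Counter

# one Counter per search category; here just a single category
votes = Counter()

def suggest(keyword):
    """A user proposes or votes for a refinement keyword."""
    votes[keyword] += 1

def popular_keywords():
    """Keywords in decreasing order of popularity, as shown to users."""
    return [kw for kw, _ in votes.most_common()]

suggest("price")
suggest("grading")
suggest("price")
suggest("address")
print(popular_keywords())  # "price" ranks first with two votes
```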
Step 3: you have a database of accommodation websites categorised in a directory from Step 1, and a list of how you'd like to search them from Step 2. Next, you send out a search engine spider to these sites, like Google does, and spider them. Websites are usually built from templates in content management systems, and even complex sites might only have around 20 unique templates. So once you have figured out the templates, the data on them is usually just repeating patterns.
Some might call this site scraping but it's no different to what a search engine does when it indexes content. You should respect the rules from the robots.txt file and behave in a considerate way when indexing other people's content.
We're not dealing with a vast number of websites here and they are based on repeating patterns so joining them up is not so hard. 
You open up a programmable interface to the site for markup editors, possibly in return for a share of the advertising revenue generated from those listings. The markup editors would use a tool such as Xray (http://westciv.com/xray/) to examine the contents of pages on the site to see where the relevant info people want to search for occurs. This is effectively an open API for a site-scraper tool. mysupermarket.com demonstrates that site scraping works, and rather than running into copyright issues, all that is happening is an intelligent parsing of the site, rather like a search engine robot. There is nothing particularly revolutionary here - besides mysupermarket for groceries, the same concept has been applied by workhound for jobs and globrix for housing - however these sites are all narrow vertical markets, limited by geography, and do not interact with users to extend their search capabilities.
Sites could ban this parsing if they wanted via the standard robots.txt, and instructions on how to do this would be available for site owners. The site editors, guided by the top search terms from Step 2, then indicate where the relevant content is on the page. For instance, if you were parsing lastminute.com, the price information is after the text "total stay from". If you were parsing job listings, the salary on Jobserve is next to the bold "Rate" keyword, and so on. Although this is a non-trivial job, existing site scrapers show it can be done, and XML/RSS feeds from the site provide additional scope to help with the parsing. The spider would only be sent to sites with a certain minimum number of pages (1000?, as seen by Google) to ensure that only content-rich sites were indexed. The volume of pages returned also gives you a good data set to teach the parsing technique.
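In the simplest case, the rules the markup editors supply could be patterns anchored to the textual labels mentioned above. This is a hypothetical sketch, with invented site names and page text:

```python
import re

# per-site extraction rules supplied by markup editors;
# the domains and patterns here are illustrative examples
RULES = {
    "jobserve.example": {
        "salary": re.compile(r"Rate:\s*£?([\d,]+)"),
    },
    "lastminute.example": {
        "price": re.compile(r"total stay from\s*£([\d.]+)"),
    },
}

def extract(site, field, page_text):
    """Apply a site's rule for one field to a spidered page."""
    match = RULES[site][field].search(page_text)
    return match.group(1) if match else None

page = "Contract role. Rate: £450 per day. Apply now."
print(extract("jobserve.example", "salary", page))
```

A real spider would of course need rules per template rather than per field, but the principle, anchoring extraction to the repeating patterns in a site's templates, is the same.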
Step 4: Once the top sites have been parsed in this way, the parsed information can be used to drive subsequent searches. Supposing the price info had been parsed, the price keyword would show as bold on the search results, indicating that data was available; the user could then refine further on that option. So in this example we have built a search engine that allows the user to search for hotels by price across all the top relevant accommodation search engines. Exactly the same pattern could be used to write a search engine for jobs, real estate or electronic goods for sale, ultimately arriving at a search engine that is like eBay in terms of refining listings down to the level the user wants, e.g. mobile phones with wifi, a 5-megapixel camera, etc. The difference, however, is that eBay charges for a listing, whereas this is a general search engine that points off to the original site, allowing the product to be listed there for nothing. You might call this the professional consumer who knows what they want to search for (I have prosume.com for this purpose).
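Once a field has been parsed, refinement is just a filter over the aggregated listings; a minimal sketch with invented data:

```python
# listings aggregated from several parsed sites; data is invented
listings = [
    {"hotel": "A", "source": "site1.example", "price": 79.0},
    {"hotel": "B", "source": "site2.example", "price": 140.0},
    {"hotel": "C", "source": "site1.example", "price": 95.0},
]

def refine(listings, field, max_value):
    """Keep only listings where the field was parsed and is in range."""
    matching = (l for l in listings
                if l.get(field) is not None and l[field] <= max_value)
    return sorted(matching, key=lambda l: l[field])

print([l["hotel"] for l in refine(listings, "price", 100)])
```

Listings where the field could not be parsed simply don't qualify for that refinement, which also gives sites an incentive to publish the data in a parseable form.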
Step 5: Complete world domination! (only kidding)
Having targeted the big sites for useful listings and built a really useful product and service search engine, categorised by product type and location and searchable by top keywords, we now get to the bit where the Internet as we've known it can really change massively. VCs interested in buzzwords might call this disruption. I suppose it is, because you no longer need to pay to advertise.
Until now, if you had a specialised product or service then in order to get it really noticed you had to submit it (usually at cost) to a specialised search site. If you have a property to sell in Edinburgh, you put it on ESPC.com. If you have a job, you add it to S1Jobs.com, etc. However, just as Google can index individual sites and list them, the same should be true for products and services. It shouldn’t be necessary to list them on some other site, you should be able to list the products and services effectively on your own site and have them searchable for free, just as Google indexes simple web pages for free. Why not? 
How is this achieved? Having followed Steps 1 to 4 above, let us assume that we want to allow people to list a job without having to pay a job search site to do it. Jobs come from agencies and employers, so in the search category listing for Jobs in UK derived at Step 1, you publish a "get listed here" guide. The guide would refer to the top parsed search terms (derived at Step 4) and the format these need to appear in on the webpage to be successfully parsed. So for a job listing you could require bold fields "Min salary:" and "Max salary:" with the salary information next to them (alternatively this info could come through in the site's RSS feed). Thus any site can be added, provided it can be easily parsed. What is especially exciting is that the search terms are of course driven by users, so there is scope here to go well beyond the searchable terms on existing sites. For instance, users might want to search for jobs that are accessible by public transport, yet no job search site offers this. Disabled people might want to search for jobs they can access from a level entrance (an option already available for tourist accommodation searches). Part-time mums might want to search for jobs by specific working hours, and so on. Asking users how to improve search is a unique feature of this site. By specifying the enhanced template for listing against new criteria, sites would have an incentive to provide this information to make their listings more relevant and searchable.
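The "get listed here" guide amounts to an open, agreed schema that any site can publish against and the spider can check. A hypothetical sketch of validating a listing against such a schema (the field names are illustrative, not a real standard):

```python
# illustrative open schema for a job listing; user-driven search terms
# like public transport access become optional fields over time
REQUIRED = {"title", "min_salary", "max_salary", "location"}
OPTIONAL = {"public_transport_accessible", "level_entrance", "hours"}

def validate_listing(listing):
    """True if the listing has all required fields and no unknown ones."""
    missing = REQUIRED - listing.keys()
    unknown = listing.keys() - REQUIRED - OPTIONAL
    return not missing and not unknown

listing = {
    "title": "Developer",
    "min_salary": 30000,
    "max_salary": 40000,
    "location": "Edinburgh",
    "public_transport_accessible": True,
}
print(validate_listing(listing))  # True
```

A listing that validates can be indexed straight from the employer's own site, with no intermediary charging for the privilege.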
How do users generate this data? Since the data format is open source, the tools will be freely available and could take the form of web apps, WordPress plug-ins, CMS extensions and so on. They would be updated in real time to deal with updates to the agreed schema.
Where is the value? With open data, there is opportunity for competition - sites can bring the data together in new and interesting ways as we've seen where this has happened with government data. There would be competition for the data in terms of who built the best sites around it. There would be entrepreneurship in taking the data forward, rather than the world of jobseeking where the schema hasn't moved in over 20 years. There would be integration of the data with existing apps to make them more useful. There are lots of opportunities.
The net result is a search engine with the power of eBay's searches, the breadth of Google, the profitability of PlentyOfFish or Jobserve (85% profit) scaled up, and the usefulness of Amazon, driving and expanding search according to user preference. With more openness there is also more decentralisation, and less need for the high-end centralised expensive data centres which the consumer usually ends up paying for in some form.
Besides products and services, there is also my data. I want to control what I share and with whom. As with driving a search, the more I share the more relevant info I might get in terms of services, but ultimately it's my choice, rather than being dictated by however many mandatory fields a business has put up.
This isn't intended to be a polished article. It's open to inspection, adaptation and improvement. As the Internet's data should be.
Craig
Original article at https://www.linkedin.com/pulse/search-30-transformative-big-data-road-ahead-craig-cockburn please also feel free to comment there.

29 November 2013

Time to change the social business model

As part of the Scottish Government Digital Scotland rollout, I would like the government to require companies over a certain size to offer a proportion of jobs on their books as distance working. This technology has been available for decades and for even longer companies have effectively outsourced work to different continents and had company subsidiaries that "remote work", so why not individuals?

Having more people work from home would relieve pressure on city housing markets, encourage rural economies and rural employment, help disabled people who have transport difficulties, encourage the development of digital technologies, make Scotland a leader in a social revolution away from the 19th-century mentality of everyone going to a factory to work in one place, promote Scotland as a place of work in the global marketplace, reduce congestion in cities, assist parents and carers to balance childcare and work, decrease the need for transport (and hence be green), reduce the need for children to move when their parents change job and, above all, increase social stability.

All those advantages and more.

Yes we need a digital infrastructure although large parts of what we need have been in place for years, but we also need social change to go along with it. Can the government take a lead on enabling the social change to take advantage of the new digital infrastructure being created?

Craig.
West Lothian

Update 19/05/2014
I contributed to the Scottish Parliament Fathers and Parenting Inquiry; the report is now available. I am pleased to say my comments, ignored by The Scotsman, have been taken on board. Relevant extract below:


95. She highlighted the role of the parenting strategy in ensuring flexible working and other family-friendly policies were available to parents. That is a powerful tool that the Government has. We have the power to influence change in the national, top-level sense. Legislation is another powerful tool that the Government has to make further cultural change. We are also working with employers to support them in creating workplaces that encourage a better work-life balance for everyone. So that we can help dads to thrive at home and at work, we have formed a new partnership with Fathers Network Scotland, the parenting across Scotland group and Working Families to try to change the way that Scotland‘s parents live and work.

96. The Minister also brought attention to other approaches the Scottish Government was taking to encourage employers to offer flexible working. At the Institute of Directors awards tonight, we are sponsoring an award for companies that have shown excellence in providing family-friendly flexible working practices. This is the second year that we have sponsored the award, in order to work with a group of people who would not normally engage with this subject and to showcase the way in which businesses are doing their bit to allow families to have a better work-life balance.

Conclusion
97. We were not surprised to find that much of the evidence we heard on childcare and flexible working echoed what we heard during our Women and Work inquiry, but are concerned to find that not only do these issues keep women from actively participating in work, they keep fathers from actively participating in parenting. The imbalance in parental leave entitlements and access to flexible working arrangements are clearly a cause for concern. The Scottish Government has shown a drive towards improving the situation, and, as in our Women and Work inquiry we commend the Scottish Government on its approach and ask that such issues remain a priority in implementing the Children and Young People (Scotland) Act 2014 and the national parenting strategy. In responding, we ask that the Minister include an update on progress made against the recommendations made in our Women and Work inquiry report.

21 June 2011

The future of social media

Recently, French media banned the words "Twitter" and "Facebook" on TV. Well, that's what the press would have you believe. Here is the original decision in French. The real (not hyped-up) story is that TV networks are now only allowed to mention Twitter or Facebook if the story is about Twitter or Facebook. Anything else, such as "follow TV station XYZ news on Twitter", was deemed to be advertising on a public network and broke the rules. Fair enough, I thought; it seems to make sense to me. Why should a social network get free advertising if it isn't the subject of the news story? Pete Cashmore of Mashable thought this ban was "ridiculous". I disagreed and posted my thoughts supporting the ban on Mashable, on The Next Web and on Hermione Way's wall (5th June). One response on The Next Web read: "For a "noob" you certainly put forward a good point. I think what you proposed would be a great idea". Facebook and The Next Web, please note though: individual comments should have their own URL so that people can link to them, as Tim Berners-Lee also says.

So that's the background to this post.

As Tim says, Facebook is becoming a closed silo of content. It wants things that way so you have to go to Facebook. Advocates of an open web want the content fully opened out so that you can link to individual comments and work with all the data. Facebook doesn't want to go the way of Myspace, as little more than a hosting platform.

However, my point in relation to the French story is that the news networks, and indeed anyone else, shouldn't be asking anyone to follow them on Twitter or Facebook or any other third-party site. You don't need to go to Twitter to see my tweets; you can see them here courtesy of a widget. You don't need to visit my blog with a web browser; you can subscribe to the RSS feed and read it in an RSS reader. When I tried to define Web2.0 in 2009, I wrote that a large part of it was the sharing of data between different sites and applications. The same should be true of social media.

When I go to Facebook, I get the Facebook branding and advertising. When I go to Twitter, I get the Twitter branding and advertising. Do respectable major brands really want or need this? Sure, they need to engage with their prospective audiences, but a third-party URL with third-party branding and advertising (possibly from competitors) isn't really the ideal platform. Does anyone remember Geocities, Netscape, Digital or CompuServe? None of those links work anymore: third-party sites that have been wound up or bought over. Only today Club Penguin, valued at $700m, went offline because Disney forgot to renew the domain in time. That wouldn't be so great if it was your Twitter or Facebook campaign in the dust, would it? Where's your service level agreement with Facebook or Twitter? You don't have one? What happens if, like Google, you suddenly lose access to all your data, or your Flickr account gets deleted? Oops. Not such a good social strategy after all, giving out all those third-party URLs you had no control over, with no backup strategy and no compensation if they go down. The ideal approach would be to edit and store all my brand's social media content locally on-site and push it to the relevant social networks as needed, giving me some fallback should they foul up. A site-central social media dashboard shouldn't be too hard to put together - think of it as a super TweetDeck, which sold for £25m and was mostly Twitter-oriented rather than a general platform.

Big brands should be looking to protect and consolidate their identity rather than dilute it across the web. If I want to engage with a trusted brand, I should be able to do it on their site. Their social media landing page should show me what's going on in the world of social media for the brand in one place as far as possible. It should blend discussions, videos, updates, "likes" and conversations in a view that has the brand's look and feel, and their advertising rather than their competitors'. Social media platforms are just that - platforms. Thanks to Mark Zuckerberg and others we have great platforms that reach millions of users. But just as you can "like" my page at siliconglen.com without actually going to the Facebook site, a brand should be able to embed their Facebook fan page within the brand's page. They should be able to embed their Twitter feed within the brand's page too. Some sites already do this; see my profile on mywebcareer. It shouldn't be difficult for a brand, on its own domain, with its own branding and its own control, to connect up its different online presences in social media behind the scenes and present one consistent view in one place. I used to have a LinkedIn profile at https://www.linkedin.com/in/siliconglen ; for a while LinkedIn changed it to http and then all the links broke. I think it is easier now for me to just maintain the brand at CraigCockburn.com, so I am in control if LinkedIn change their policy again. Big brands should take note: if they manage to control their social media presence through URLs and branding they own rather than a third party's, then the French will have shown us the way and no one will need to advertise Twitter or Facebook anymore. Just say "non" to third-party branding, URLs and loss of data!

Is this the future of social media we really want? Feel free to share!

Craig

01 December 2010

London school closure status

Please visit opencheck for the latest info on school closures in London.

For a quicker way of getting the info just for your school, you might be interested in the following steps. Once you have done this once, you can bookmark the page for quick and easy access in future.

I'll use Abbey Primary school in Sutton (the first school in the Sutton area dropdown) as an example, however this will work for any London school in the scheme.

Go to Edubase and search for your school there. Enter the relevant details for town and establishment name. Click on the establishment name in the search results; in the example above this will take you to this page.

Note the LA number (in this case 319) and the 4 digit school reference (in this case 2012) and combine them together: 3192012.

Now put that number into this URL https://support.lgfl.org.uk/public/bits/opencheck.ashx?code=3192012 (replace the 7 digits at the end with the number for your school)

Bookmark this page. Now you have a quick one-line up to date status for your school that loads very quickly.
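The URL construction described above can be sketched in a few lines. The LGfL URL is the one from this post; the function name and zero-padding assumption (a "4 digit school reference") are mine:

```python
def opencheck_url(la_number, school_ref):
    """Combine the LA number and 4-digit school reference into the
    opencheck status URL, e.g. 319 + 2012 -> ...?code=3192012."""
    code = f"{la_number}{int(school_ref):04d}"
    return ("https://support.lgfl.org.uk/public/bits/opencheck.ashx"
            f"?code={code}")

print(opencheck_url(319, 2012))
```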

Hope this helps.

Craig



08 November 2009

End My Credit Crunch explained

People have been writing to me about End My Credit Crunch (http://www.endmycreditcrunch.com) and how it works, so here's a quick explanation.

There is a Twitter account, @uncrunch, with a supporting website at the URL above hosting banner adverts and more info.

The adverts are sold at 5c per follower per month. With 220 followers, this means an advert for a month in the Twitter account and on the website would cost $11, i.e. $11 to reach 220 followers, plus being seen on the site. Compare this to www.tweetvalue.com, which values Twitter accounts: at the time of posting, it values the uncrunch account at $125 per tweet. We're less than a tenth of the cost, and you get two tweets and a substantial standard-size banner advert too - surely excellent value, and you're helping others too.

We're assuming a worst-case scenario in which overheads including corporation tax, VAT, hosting etc. come to 50%, leaving 50% profit. Out of the profit I'm hoping to take 1/3 to help my family through the credit crunch (debts due to loss of job; 2 family members with terminal cancer; two children at sick kids as well, one due an operation next month; having to sell the house) and give away 2/3 to help others. The giveaway will take three forms: 1) a winnings pot for anyone following the Twitter account - win for free, and an incentive to follow; 2) a proportion for people who refer advertisers; and 3) a proportion for charities.

If we get 1000 followers and sell 150 adverts, we'd be giving away roughly 1000 * 0.05 * 150 * 50% * 2/3, or about $2,500 per month - as the number of followers goes up, so does the amount we can give away. The cost for advertisers to reach people stays the same at 5c per follower.

This could easily scale up to much bigger numbers once the momentum builds; there are Twitter accounts with millions of followers out there. Feedback from the press has been really positive - we just need more people to get the word out and to reach more people via blogs, newspapers etc. Even Max Clifford's PR company thought it was a good story but unfortunately couldn't take it on just now. The account is followed by lots of media companies, including The Irish Times, The Times (London) and the Los Angeles Times amongst others.

I'd like to give away $250,000 which would represent 10 months work if we get 10,000 followers and sell the advertising space. It would be disappointing all round if it was a lot less than this and a dream come true if it was more.

thanks

Craig

10 October 2009

Web2.0, a definition

People ask me what Web2.0 is. This is my explanation; I hope you find it useful, and hopefully a bit more readable than the definition on wikipedia. I also follow it with some information about Web3.0.

You may have heard the term Web2.0, first used in 2004. If you ask experts what it means you'll probably get differing answers depending on who you ask, because there is no single clear definition. So this is mine.

There are two main features of Web2.0 which distinguish it from sites that aren't Web2.0.
  1. Web2.0 is about people creating their own content for publishing online
  2. it is also about the supporting technology for this content

It is easier to explain Web2.0 if you set it in context of what there was previously.

In the early days of the web, despite it originally being conceived as a document sharing and editing environment, the editing part rarely happened. Early sites were generally about a company, organisation or individual producing content, publishing it on their website and then people reading that content or transacting with it, e.g. reading the news on-line or buying a book.

However, following the emergence of blogs it became easier for larger numbers of people to author their own content and have others comment on it, just as you can do here. Similarly, Amazon allowed others to post their own reviews. This activity, together with the very long-standing Internet tradition of newsgroups, forums, bulletin boards and so on going back to the 1970's, came together to form the early implementation of what we now call Web2.0.

Most people think of Web2.0 as twitter, facebook and other similar sites: a social platform which allows them to publish their own content easily and share it with their friends. However, this facility has been around on-line for almost 30 years. In 1979, with the invention of usenet groups, it became possible to easily share content online, and from my own personal experience I used to run a mailing list called Gaelic-L, founded in 1989, which allowed people with similar interests to share content with their online connections even way back then. In 1990 I also proposed an early browser with user generated content and personalised news, based on the fact that many people were by that time doing much of that anyway.

Web2.0 is therefore more than just being able to publish content and share it with your friends - that has been possible for decades. It's also about the types of technology that make it happen and how these combine together. In the early days, if I wrote an article in a newsgroup, people might reply to it. With Web2.0 you can not only reply to it, you might be able to vote on it and even edit the original; this is how wikipedia works - people collaborate using a wiki as a tool for sharing information, and the articles are often authored by several people rather than just one.

Similarly, it wasn't just that blogs made it easy for people to write their own content: the platforms they used held and published the content in a structured way, and this allowed the content to be easily reused in other contexts using a technology called RSS (Really Simple Syndication). This means you didn't have to go to the blog to read a post - you could pick up notifications of new posts via an RSS reader or another website entirely. Sites can also publish a programming interface called an API which can support the same functionality as RSS and more besides. RSS feeds are particularly useful for following new content - e.g. new news articles, new blog posts, or more specialised searches such as new jobs matching your requirements on a job board. API calls are better for more generalised queries, e.g. "how many twitter users are based in Edinburgh?", "who posted the first tweet about Michael Jackson's death?" or "give me the data to plot a graph of the number of times President Obama's Nobel prize was mentioned in the hours after the announcement was made".
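As a rough illustration of the RSS side of this: a feed is just structured XML that any program can read. Here is a minimal sketch in Python using only the standard library, with a tiny made-up feed inlined as a string so it runs without a network call:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed, inlined for the example.
rss = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(rss)
# Pull out (title, link) pairs - exactly what an RSS reader shows you.
posts = [(item.findtext("title"), item.findtext("link"))
         for item in root.iter("item")]
print(posts)
```

A real reader would fetch the feed over HTTP on a schedule and remember which items it had already seen, but the core idea is just this: structured content that any program, not just the original site, can consume.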

As an example of RSS in action, my posts here automatically feed out to twitter and friendfeed. My friendfeed is then published on my facebook pages. This sharing of data across many sites and applications, with each interpreting the content in different ways, is one of the key distinguishing features of Web2.0 over Web1.0. This is quite a long post, too long for the 140-character limit on twitter, but the connection between my blog and twitter takes care of that. Similarly, when I post something new to the photo sharing platform Flickr, it also appears via a link on Twitter even though twitter doesn't directly support photos - the sites all interact with the same content, but in different ways.

Taking this example of data sharing further, you can combine (mash) information from different sites to produce something new; this is called a mashup. An example might be pulling in data from Google maps, geotagged photos from Flickr, public rights of way information from the government or council, and accommodation information and reviews from a hotel booking site. Combining this publicly available data would allow you to show walks overlaid on a map, together with examples of the views you could expect to see along the way and recommended places to stay en route.
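A toy illustration of the mashup idea, assuming two hypothetical data sources already fetched as Python structures (a real mashup would pull these from the sites' public APIs):

```python
# Hypothetical geotagged photos, e.g. from a photo-sharing API.
photos = [
    {"title": "View from the ridge", "lat": 55.95, "lon": -3.19},
    {"title": "Old stone bridge",    "lat": 55.90, "lon": -3.30},
]

# Hypothetical waypoints along a published walking route.
route = [{"name": "Ridge summit", "lat": 55.95, "lon": -3.19}]

def photos_near(route, photos, tolerance=0.01):
    """Match photos to route waypoints by simple coordinate proximity."""
    matches = []
    for point in route:
        for photo in photos:
            if (abs(point["lat"] - photo["lat"]) <= tolerance and
                    abs(point["lon"] - photo["lon"]) <= tolerance):
                matches.append((point["name"], photo["title"]))
    return matches

print(photos_near(route, photos))
```

The value comes entirely from the combination: neither source on its own knows which views you'd see along the walk.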

So Web2.0 is about people creating content (blogs, photos, statuses) together with the supporting technology (facebook, wikis, twitter) that allows this content to be shared, connected and reused in many different ways. It isn't really about endless "beta", rounded graphics, pastel shades and large fonts, although these are incidental elements of the Web2.0 scene.

Just as there's no single definition of Web2.0, there is even less clarity about what might come next for Web3.0. The leading consensus is that this will be about the semantic web. This represents a bigger challenge than Web2.0 because it involves taking the largely unstructured and often ambiguous content on the web and tagging it in ways that allow it to be more clearly defined and reused. For instance, if I type London Bridge into Google, there is at present no way to distinguish whether I meant the bridge itself, the railway station, the underground station or the hospital of the same name, or the bridge that got shipped to Arizona. Another example is differentiating text with a particular meaning from the same text occurring by coincidence - e.g. a Digital Will is a type of Will (a legal document for when someone dies) that covers digital assets such as your emails, photos, MP3s, on-line contacts, etc. However, if you search for this term in Google you get references to the legal document but also to the same phrase in entirely different contexts, such as "Digital will overtake print" and "Western Digital will move to Irvine". The semantic web will not only help to classify how words are used from a linguistic point of view, it will also allow content to be queried as data - for instance, a restaurant website could mark up its opening hours, allowing people to use a semantic search engine to find restaurants open at a particular time of day. The biggest challenges faced by Web3.0 are agreeing the common vocabularies and then deploying them effectively across the billions of web pages that already exist.
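To see what the opening-hours idea buys you: once hours exist as data rather than prose, "restaurants open now" becomes a trivial query. A hypothetical sketch, with the structured records invented for illustration (a real semantic search engine would collect them from marked-up pages):

```python
from datetime import time

# Hypothetical structured records, as a semantic crawler might collect them.
restaurants = [
    {"name": "The Bridge Bistro", "open": time(12, 0), "close": time(22, 0)},
    {"name": "Early Bird Cafe",   "open": time(7, 0),  "close": time(15, 0)},
]

def open_at(records, t):
    """Return the names of restaurants whose opening hours span time t."""
    return [r["name"] for r in records if r["open"] <= t <= r["close"]]

print(open_at(restaurants, time(20, 30)))
```

No amount of keyword matching over free text gets you this reliably; the query only works because the hours are data.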

As you can see, although Google is quite good at finding pages containing certain terms, it is currently very poor at making sense of the data in a structured way. Without the data being marked up semantically (either through the use of markup directly or by attempting to deduce the context), providing this functionality is an exceptionally difficult task for a search engine. Web3.0 will make this job a lot easier, but the means by which Web3.0 will emerge is still unclear. What we do know, though, is that it should make searching for information a lot more powerful and specific. Google is also exceptionally poor at searching sites that already have structure - for instance, if I wanted to find a hotel room for tonight I would use an accommodation search engine, and Google would find me the site which listed the accommodation rather than the accommodation itself. Google can't tell me what rooms are available tonight, but it can point me towards sites that are likely to have this information. This will all change with Web3.0, and the use of intermediary sites will significantly decline as the information they hold begins to open up to more generalised search engines.

I hope this has been helpful. If anyone is looking for a Web2.0 or Web3.0 specialist, please get in touch via craig@siliconglen.com, twitter, facebook or linkedin.

Craig
I do Internet things, manage large websites, play around with language, campaign for good causes, try to explain things and have fun singing along the way (not all at the same time!).

19 July 2009

End My Credit Crunch

This post is to announce the holding page for End My Credit Crunch.

The site announcement is being made on 19th July at 19:48, exactly one month to the minute after having the idea. This moment was captured on twitter as I stood at Dublin airport.

The background is that a week or so previously I had been preparing an article for The Times, following a tweet from Times Money, about how the credit crunch had affected my family and the problems with jobs, income and working extensively away from home that we were having as a result.

That article was supposed to appear on 13th June but unfortunately got knocked out by other news, even though they thought the article would work really well and felt my situation was really tough. However, such is the way of newspaper articles. Hopefully this one will get a lot more coverage :-)

So after a few days' thought following that setback, I tried my hand at coming up with an innovative way of getting out of the credit crunch rather than having the story printed in The Times. A month later, here we are and the site behind the scenes is nearly ready.

We are now building the user base ready for a successful launch. Please visit endmycreditcrunch.com and follow us on twitter, or drop me a mail at info@endmycreditcrunch. We would be particularly keen to hear from anyone wishing to advertise on this novel Twitter-related site; we hope to successfully monetise twitter to generate income that can be effectively shared out to good causes and people affected by the credit crunch. Our aim is to give away $250,000.

If this gets to be as popular as the Million Dollar Homepage then you will be glad you got in early. Unlike that site, however, we are taking standard 468x60 banner adverts that make an impact. Also unlike that site, our aim is to give away a large portion of what we raise to help as many people as possible.

End My Credit Crunch - a British business helping charities and people world wide out of the crunch, let us uncrunch you!

Craig

01 May 2009

Gaelic mailing list celebrates 20 years

The world's first online group for a minority language, Gaelic-L, celebrates its 20th anniversary today (latha buidhe Bealltainn, 1st of May). All the archives, going back 20 years to the days before the web was invented, are available at the Gaelic-L archives. Enjoy!

Craig (former co-owner, Gaelic-L - 1992)

09 April 2009

Fulltiltpoker.com, stickiest site on the Internet

Figures according to Nielsen on-line, February 2009. See here for the full breakdown.

Craig, Web project manager, Full Tilt Poker.

Available from mid April for my next opportunity, LinkedIn.

25 November 2003

Some praise (shucks)

soc.culture.celtic

As was the case with many others, my first access to the internet was via a Freenet. I spent a good deal of time reading the soc.culture.scottish and the soc.culture.celtic Usenet groups.

I note that my submissions to the soc.culture.celtic faq are still up there, floating around the web for almost ten years now. Cool.

Craig Cockburn was then, and likely still is, one of the web's main Celtic and Scottish history/culture guru's, not to mention Scots Gaelic language.


Excerpt from The Campblog Archive
