Should PR support web scraping as online agency?

Ryanair recently won a case against a German tour company that it had accused of scraping its website, and The Irish Times reports that the airline is taking further action:

over the alleged “screen-scraping” activities of Bravofly breach provisions of the Trademarks Act and the Copyright and Related Rights Act, amount to “passing off” and also breach the conditions for accessing the Ryanair website.

Other companies, by contrast, welcome links that increase traffic, and may enter into marketing arrangements (eg with price comparison sites or as affiliates) that offer financial benefits to either or both parties.

David Phillips in “Online Public Relations” cites agency as one of three phenomena identified by Anne Gregory as important to online PR (alongside transparency and porosity):

‘Agency’ is the process of transformation of a message as it is passed from one person to another, and acts through the application to the original data of new context and understanding… Agency is of itself neither benign nor malicious.

PR practitioners are masters at using agency to secure third-party endorsement, traditionally through mainstream media coverage of our messages. We seek to ensure that influential people are advocates for our organisations, helping to build their reputations.

The idea has always been that the wider our message spreads – especially by positive word of mouth – the better.  So, the internet, email and social media add velocity to the power of PR to spread a message – and provided it is not transformed negatively, everyone is happy.

However, copyright and the protection of intellectual property are increasingly at issue, with commercial interests coming head to head with opportunities to maximise access and reach online.

When PR practitioners send out a press release, they want its content to be reproduced as widely as possible. 

Should online content be treated in the same way as a press release, or do bloggers retain copyright control over its use? There are questions about whether it is acceptable to link to or quote from others' content, although both are generally accepted practice (but check whether the site's authors state preferences or terms of use). I believe that, as with academic writing, it is good practice to acknowledge sources with links or hat tips and to avoid plagiarism.

Back in May, Zoe Margolis picked up on a growing trend for mainstream media to use blog content without permission or payment. One case that comes to mind is JonnyB's, where a 392-word post was used by the Mail on Sunday. Although the paper subsequently paid Jonny’s invoice, its apology contained the remarkable statement:

We generally take the view that blogs published on the internet have already been placed in the public domain by their authors and, in case of amateur writers, most people are happy to have their work recognised and displayed to a wider audience.

The Mail on Sunday's view that blog posts are free to use because they are in the public domain is naive at best, especially when such newspapers protect their own copyright (even when their content simply reproduces corporate press releases).

PRWeek has recently quoted Greenbanana twice, and I’m happy for the exposure; had I been asked, I would have approved the use. My posts are also picked up through spliced feeds, eg FeedBurner’s PR Network, and I use RSS to feed my blog to my educational website (alongside feeds from the Guardian and BBC).

It is nice to be read, but there is a new trend for other bloggers to lift an entire post. This has happened a couple of times to my posts at PRConversations, including my recent musings on Google juice and digital dirt, which was reproduced at Trimedia: Blog | Europe’s leading PR Agency & Communications Consultancy.

I was not asked whether this post could be used in this way, and its reproduction implies an involvement with (possibly even an endorsement of) Trimedia in Switzerland that does not exist. If I’d been contacted, I would probably have agreed to write a guest post or allow this work to be cited, but my words have simply been scraped.

Online usage of information is a grey area, being tested by case law as well as conventions and codes (such as the CIPR social media guidelines or the Blog Council toolkit). 

I believe there needs to be a balance between enforcing laws that protect originators and enabling their work to reach a wider audience (if that is what they want), as the agency of online communications allows.

But scraping by commercial organisations is normally done out of vested interest, and one would expect the companies whose information is used to be asked for their agreement. The same principle should apply when lifting the majority or entirety of a blog post (whether for online or offline publication), even when acknowledgement is given in the form of a credit or link.


  1. Good post, Heather. I’m usually pleased when others cite my work. I like to be acknowledged, though. I recently had to ask one blogger to take down some content that had been lifted from my website word for word without attribution. In fact, it was a description of our services being used to represent their own. Shocking!

  2. Heather, you hit the nail on the head. Agency is both a human and a technological phenomenon of the internet, and it can be a combined process as well. Scraping, much of which is automated, is agency, but it is evolving.

    The application of web widgets means that content from a number of sources can be mashed together. In an application I have developed, it is possible to scrape text from media sites and automatically analyse it in such a way that it will re-write a range of news subjects into new dominant themes: a sort of automated sub-editor. The result is that, with some level of confidence, one can predict news.

    And this is where we have problems. In using others' content, are we standing on the shoulders of giants or cutting them off at the knees?

    It's a strange new world.

  3. This is a grey area, Heather, and I have to admit I find myself a little conflicted. I was aware of the Trimedia scraping and was not personally offended, though I did not know the author of the blog had not contacted you.
    Because of my academic background, I cannot stress enough the importance of citing. For us as PR pros, intellectual property IS our product. Is it so much to ask for recognition where it’s due?
    Great post on a pertinent topic.

  4. I have had an email from Trimedia: they have agreed to contact authors rather than simply scraping copy onto their site, which is a step in the right direction. Shame they didn’t engage with my comments here themselves, but…

    David, I think we will need to think more about content in future, and about how agency will be automated. On the one hand, this is beneficial: I read today that insurance companies feel comparison sites have expanded the market. But when our work is scraped or mashed, I feel the originator ought still to be involved.

    It is a strange new world – which makes it so fascinating.

  5. Here is a post on techniques for ‘Scraping your way to RSS feeds’, albeit in a non-programmatic (layman's) way:

  6. Matt Wardman says:

    This is the Code of Practice I created two years ago:

    IMHO, excerpts are fair game if they are used to drive traffic to the source rather than take it away.


  7. Matt Wardman says:

    >I have had an email from Trimedia – they have agreed to contact authors rather than simply scraping copy onto their site, which is a step in the right direction. Shame they didn’t engage with my comments here themselves, but…

    That depends on being able to contact authors, which is sometimes difficult in blog-land.
