AI, and what to do about it
Category Webinar
Recorded March 03, 2026
Published March 04, 2026
Transcript
We're kicking off a series of these webinars.
The first one is on a really hot topic, and it's a topic that is very close to my heart from the industry that I come from, which is broadcasting.
Um, and, you know, I feel very conflicted, you know, I come from a place where I use AI regularly.
But I'm also coming from an industry that seems to be in the middle of a huge amount of panic and turmoil over the use of AI, and the use of our assets, essentially, to train AI and to generate new information from it.
I think we should sort of start off by setting out the parameters for this discussion, which is the kinds of AI that we're talking about today.
What are they?
Yeah, so you've got AIs that can be trained on, say, medical data and help create breakthroughs in medicine, for example, where they're using data that has been provided wilfully and with complete knowledge, you know, for that particular use case.
And then you've got AIs that are effectively creating new content from other people's content.
And that's really the kind of AI that we're concerned about at 51Degrees: making sure that people who own copyright-protected content can control how that content is made available, how it's used, and how it's used in derivative works.
That's what an open web needs to flourish. That's really what we're going to be talking about today.
So we're pro-AI.
We love innovation.
But we are anti-theft.
Yeah.
I mean, I suppose it's a very obvious point, but just to state this early on, you know, the genie is out of the bottle, isn't it?
You know, it's not going back in.
So you have to have a position on it and to kind of embrace it, but also be cautious of it is probably the best approach.
Yeah, I mean, I think as far as people using AI is concerned, if there's only one thing we get across through this webinar and the work we're doing, it's just be aware of how the product that you're using has been made.
You know, you do that with automobiles, you do that for food. Be aware of how the AI service you're using has been created.
That's the first step. We're a B2B business, so publishers and content owners tend to be our customers.
We want them to be aware, through this webinar, of the tools that we provide and what they can do to rebalance that equation and make it work for them, not just the AI companies.
Now, when we've talked about AI in the run-up to doing this webinar, you've obviously been schooling me quite well, because it's an area in which you are much more knowledgeable than I am.
But I really loved your description of there being kind of three kinds that we should be considering, which are the good, the bad, and the ugly.
So what do you actually mean by that?
Yeah, so you have the AIs that are prepared to identify themselves when they access internet content.
At least they make an attempt to identify themselves.
Those are the ones that we consider good.
Then you have the bad, the ones that pretend to be humans.
They're a lot trickier to deal with.
And then you have the ugly.
And the main ugly is Google, who will access sites in order to get information to help with discovery, which content owners often want, but at the same time will take information that they consider to be freely available and then use it to create their AI products.
Each of these different AIs needs a different treatment, effectively, in how you go about dealing with them as a publisher.
So if I understand this correctly, then if we start with the good, the good AIs are the ones that are very transparent about what they are and make themselves known to you.
How should publishers be handling or dealing with that kind of thing?
Well, there are two parts to my answer.
One is the technology, and the other is the policy.
So let's start with the technology. Unsurprisingly, the policy is going to be for each individual publisher to decide.
So when anything happens on the web, you have what's called a client and a server. Your phone would be a client, if you're a human; that's your access device, effectively, or your desktop or whatever it might be.
When a crawler is accessing a website, effectively it's a machine doing the same thing.
And we tend to call those crawlers or robots or spiders or, you know, terms like that.
But they can identify themselves through something called a user agent.
This is one of our superpowers as 51Degrees.
We really understand user agents. Crawlers can put what are called product tokens into that user agent that identify themselves and adhere to a common format, a common schema, and a common protocol.
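To illustrate the idea, product tokens follow a simple name/version shape that a regular expression can pick out of a User-Agent header. This is only a rough sketch: "ExampleBot" is a made-up token, and real catalogues like the one described here handle far more variation (comments, bare tokens, vendor quirks).

```python
import re

def product_tokens(user_agent: str) -> list[tuple[str, str]]:
    # Find "name/version" pairs anywhere in the User-Agent value.
    # A real parser would also handle bare tokens and comments.
    return re.findall(r'([A-Za-z0-9._+-]+)/([0-9][\w.]*)', user_agent)

# "ExampleBot" is a hypothetical crawler token, for illustration only.
ua = "Mozilla/5.0 (compatible; ExampleBot/2.1; +https://example.com/bot)"
print(product_tokens(ua))  # [('Mozilla', '5.0'), ('ExampleBot', '2.1')]
```

A catalogue such as the one the speaker describes would then map a token like `ExampleBot` onto a usage category (monitoring, search, training, and so on).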
So at 51Degrees we can catalogue all those different types of crawlers that we see out there and look at those little product tokens.
And then what we can also do is form a view based on the published information that's available about what those crawlers are doing.
Okay?
So you might have a crawler that identifies itself as Pingdom, for example.
All it's doing is checking for the health of a website.
Is it alive?
Yeah?
So it's a robot that's just saying: is this site alive?
Yeah?
So that could be called monitoring.
Yeah, you might have a robot that is accessing the website to perform some sort of analytics function.
Search is another popular one, then as we move into AI, you'd have training, so large language model training.
You then might have RAG, which is grounding, so accessing recent data in order to support an AI answer.
So there's lots of different crawler usages.
And at 51Degrees, our wonderful data team go through and catalogue all the different crawlers that are out there, and which of these common uses, that people might understand, each crawler can be used for.
And we also put in some information that helps our users find out more about that crawler so they can contact the operator.
So that's the first thing: having the technology that helps you understand the crawlers that we consider good crawlers, good meaning that they're prepared to identify themselves.
Then we move on to the policy.
So there's something called robots.txt.
So this is a very simple text file. It's been around for 30-plus years.
You can go to pretty much any domain on the internet and just put /robots.txt.
You know, try it now.
And you'll see there's this file.
And that is designed to tell these crawlers what they should do.
So it's like the preferences that the website operator has in relation to how they would like to be crawled.
So they might disallow pages.
They might allow pages.
They might restrict based on those product tokens I mentioned earlier, so they could say Bing can do this, Google can do that. You know, that sort of thing.
And these are effectively signposts.
There's nothing technically that requires anyone to follow these, but again, a good crawler will follow those instructions and only access the content that they're allowed to access.
So what you can do as a publisher for these good crawlers is rather than trying to deal with each crawler individually, those gazillions of them, is you can say which uses do you want and which uses do you not want, and then you can put that into your robots.txt file.
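As an illustration, a usage-oriented robots.txt might look something like the following. The token names here are examples only (GPTBot is OpenAI's documented training crawler token; always check each operator's own documentation for the tokens it actually uses):

```text
# Illustrative robots.txt: block LLM training, allow everything else.
User-agent: GPTBot
Disallow: /

# All other crawlers may access the site, except the private area.
User-agent: *
Disallow: /private/
Allow: /
```

Rules are grouped by User-agent, so a publisher can express different preferences per crawler without touching the site itself.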
What 51Degrees do is, if you go to the website, 51Degrees.com, then the developer's menu, you'll see robots.txt. You can go in there, select the usages that you're happy with (you can see all the different usages in the table on the right-hand side), put in your email address, and then we'll email you, whenever crawlers change, an updated robots.txt that reflects those rather high-level preferences associated with those good crawlers.
Of course, and I'm sure we're going to talk about this in a bit, you're going to ask me about them in a bit, there are certain combinations which the crawlers just don't support.
So with that example of Google earlier, you don't necessarily want to block Google, because all robots.txt can do is signpost.
It's like keep off the grass.
It doesn't say come on the grass for these reasons and not for these reasons.
It's just a signpost.
So if you were to say, I'm happy with search, but I don't want training, then certain crawlers just don't give you that option.
Unfortunately.
So have a look at that tool.
What you can also do is use that information in real time.
So as I said, robots.txt is just a signpost.
What you can do is take that data and use it in real time to redirect those crawlers to different content.
So we do this on our own site, 51Degrees.com.
If you go to the /ai page, or services/ai, you can see an example of this page.
So we actually send AIs to different content for certain types of content. Not all our content; some of it we're happy for AIs to see, but we don't really want them taking databases and reverse-engineering them, effectively.
We can send them to an AI notice page, which basically says: if you want a licence, we'll give you a licence; you need to contact us and go human to human.
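A minimal sketch of that kind of routing decision, assuming a hypothetical hard-coded list of crawler tokens and protected path prefixes (the real decision described here would be driven by a maintained crawler catalogue, not a fixed set):

```python
# Hypothetical crawler tokens and paths, for illustration only.
AI_CRAWLER_TOKENS = {"GPTBot", "ExampleAIBot"}
PROTECTED_PREFIXES = ("/databases/", "/downloads/")
AI_NOTICE_URL = "/services/ai"

def route_request(user_agent: str, path: str) -> str:
    """Return the path to serve: protected content is redirected to the
    AI notice page when the request identifies itself as an AI crawler."""
    is_ai = any(token in user_agent for token in AI_CRAWLER_TOKENS)
    if is_ai and path.startswith(PROTECTED_PREFIXES):
        return AI_NOTICE_URL
    return path

print(route_request("GPTBot/1.1", "/databases/devices.db"))   # /services/ai
print(route_request("Mozilla/5.0", "/databases/devices.db"))  # /databases/devices.db
```

In practice this logic would sit in the web server or CDN layer, so the crawler is redirected before the protected asset is ever served.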
Of course, that treatment is going to depend on each individual publisher's policies and what they want to achieve from AI.
Great.
So just to summarise then: what you're saying is that, with your technology, you can recognise the good AIs, and you can use this robots.txt trick, if you want to call it a trick, to point those good AIs in the direction that you want them to go.
Yeah, I mean, whether it's a trick, it's been around for a long time. It goes back to pre-dot-com-boom days, basically.
It's been around for a long time, but it is merely a signpost.
The key really is that everyone respects the signpost.
And I think what's happened is some companies have decided not to respect copyright and not to respect publishers' content and their wishes, and therefore it can be worked around, just like the keep-off-the-grass sign, and there's a sort of party going on on the grass. Which we find a little bit distressing, and something we need to do something about.
Great.
So we are going to put some links together and share those out tomorrow after this session, aren't we?
And also it's worth saying that if you have any questions during this session, you can use the chat function.
If the technology works as it should, they will come straight through to me and we'll run through some questions towards the end.
So that's the good.
I mean, it all sounds wonderful so far, but I feel like we're creeping now into slightly darker territory.
And perhaps the ones that we should now start to worry about a bit more.
Tell me about bad AI.
So these are the AIs that are pretending to be humans.
So they're not prepared to identify themselves when they're accessing content.
Now, this isn't a problem just for AIs.
We've seen, you know, fraudsters trying to do this: survey fraud, for example, advertising fraud, finance or banking fraud. These are all problems.
AI is just another type of fraud effectively when it's pretending to be a human when it shouldn't be.
So what can we do about that?
Well, the number one thing is to have some terms and conditions on your website that basically cover crawlers.
Um, then put it in your robots.txt.
So you can go to, say, bbc.co.uk slash robots.txt.
They've recently put a marker for their terms and conditions in the comments of robots.txt.
Go to facebook.com slash robots.txt.
At the top there they've got their data-mining terms and conditions.
So the very 1st thing to do is make it crystal clear what your wishes are.
So yes, use your robots.txt, but put your terms and conditions there.
Um, that's the first thing to do, so kind of non-technical.
And I think that's an overlooked part of the answer: just make it clear.
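The BBC and Facebook examples mentioned above follow the same broad shape: a comment at the top of robots.txt pointing at the terms. Something like this, where the URL is a placeholder:

```text
# Crawling and data-mining terms and conditions:
# https://example.com/terms  (placeholder URL)
# Access to this site constitutes acceptance of these terms.
User-agent: *
Allow: /
```

Comments carry no technical force, but they put the pointer where every crawler operator already looks.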
If you...
Sorry, just to interrupt that.
And is the idea with that that, by doing that, you're hopefully steering AIs to abide by those terms and conditions?
Or is this more about being very upfront from a legal perspective?
Absolutely the latter.
Right.
So there's nothing, unfortunately, that requires a computer to go and read those terms and conditions and then try and work out what they're allowed to do.
But the fact that you've signposted your terms and conditions really clearly just puts a few more chips on your side of the table if and when it comes to litigation.
And let's face it, um, it may be 10 years down the line.
But if you can prove that you were doing the right thing, then when all the class actions run through, you know, you might want to ask me about that in a bit, you'll have more evidence on your side, not just the law that protects you, which obviously varies by jurisdiction. You can say: no, these were the terms and conditions.
So that's the basic thing to do: get on with that, talk to your trade body, come and talk to me afterwards, if you like.
I think there's a great advantage in working together.
So having a common set of terms and conditions: use me for discovery, to point people to my content and help me.
Don't use me to create a derivative product that competes with me.
Yeah?
Lawyers will argue about the text of that for a long time, but you can work together with your trade body and get a single document together, a single terms document, that could be used across lots of different publishers.
How they want to be treated is hardly going to be unique to one particular publisher.
And it's worth restating: this is such a simple thing to do as well.
In what is quite a technical area, that very non-technical solution is a really simple thing to implement.
Yeah, absolutely right.
Yeah, absolutely.
So that's the point, I guess, for this webinar is to kind of give you some of the easy things to start with.
Then we move into how do you detect robots that are pretending to be humans?
And I think when this is discussed, there's a lot of talk about bot detection. What we really mean is there's a probability, yeah? And you can't discuss robots and crawlers without also talking about people.
There's just not a perfect solution where you can always identify people and always identify robots.
And I think that's the 1st thing is realise the limitations of technology.
So it is a bit like whack-a-mole.
Yeah?
You can, you know, if we take a comparable industry like retail, you know, no retailer wants shoplifting.
Yeah?
If you completely prohibit shoplifting, where you require people to have their bags checked with airport-style security, then it's going to create a very high level of friction for the majority of your non-shoplifting customers.
So it's always a balance about the level of friction that you are prepared to create for people, and therefore it's a policy decision how you implement the technology.
So we can do several things to help you.
So one is form a probabilistic ID.
So this is creating an identifier based on the attributes that you can get from IP intelligence and from device detection.
And then using that to do things like look at frequency.
Yeah, so you might say, okay, I'm happy for both humans and crawlers to access 2 articles.
And then after that, if there's a frequency of access that seems unreasonable for a human, then maybe I might start to put a barrier in the way.
So this could be a CAPTCHA.
You often see this as something that spins for a few seconds and then it says tap the box.
Um, sometimes they can be more complex, like select all the squares that got street lights in them or uh, zebra crossings or whatever, you know, whatever it might be.
So they're kind of like proof of humanness, things that you might put in the way.
And you might choose also to have IP reputation.
So this is the IP addresses like the network address where the traffic's coming from.
So if it's something that we identify as being low confidence, in a data centre, then you might go: eh, it's probably not humans coming from that.
I might put that friction, that bag check effectively, in a little bit earlier, whereas if it's a residential IP address and there have been, say, three requests over five minutes, you might say: okay, I won't quite go to that stage yet, but maybe when they get to five articles I might increase that.
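The tiered-friction idea above can be sketched as a small policy function. The thresholds and the category names are illustrative policy choices, not actual product behaviour:

```python
# Sketch of tiered friction: thresholds are illustrative policy choices.
def friction_level(ip_category: str, articles_in_window: int) -> str:
    """Decide when to challenge, given an IP reputation category and how
    many articles this probabilistic ID has fetched in the time window."""
    if ip_category == "datacentre":
        threshold = 2   # challenge early: unlikely to be a person
    else:               # e.g. "residential"
        threshold = 5   # give likely humans more headroom
    if articles_in_window > threshold:
        return "captcha"
    return "allow"

print(friction_level("datacentre", 3))   # captcha
print(friction_level("residential", 3))  # allow
print(friction_level("residential", 6))  # captcha
```

A production version would combine many more signals (device detection, request timing, reputation scores), but the shape of the decision, reputation plus frequency against a per-policy threshold, is the same.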
And of course we might vary by site.
So we're a B2B site, for example; we're actually quite happy for the majority of our content to be consumed by whoever we'd like to educate.
You know, this webinar is going to be available for AIs to ingest, and they might be learning from our words today.
But our databases, that we spend a lot of love and time curating and put a lot of money into looking after, those are our core product. We don't want to give those away.
So we actually treat different content differently, depending on its value and that value exchange.
Okay.
Um, so this isn't just about looking for robots then.
This is about creating a just an acceptable experience for people, really.
Well, it's creating the least rubbish experience whilst protecting your copy.
Okay, yeah.
But it does also sound a bit like a predator-prey thing, which is that they will be constantly outdoing each other, and the sands will be shifting a lot with this kind of issue.
Yeah.
So I think that's absolutely right.
And I think this is why the solutions we've talked about so far have been the things that you can do today.
There are some things that are sort of coming up in the future.
So I mentioned those terms and conditions earlier, for example.
At the moment, there isn't a technical standard, at least not one that's widely accepted and supported by regulation, as to where this should be signalled.
Yeah, so putting something in human-readable comments is a start, like we discussed earlier, but we really need that to be a clear label, a data label if you will, which contains the terms document associated with the use of that data.
So we have presented a proposal, you know, myself and other people within the industry from different disciplines, around how that data label could work.
So that's at GitHub.com slash jwrosewell, my handle on GitHub, slash data labels.
You can have a look at that.
We think this is a, you know, great way of making it even easier for the crawler to find the terms and conditions, removing that ambiguity and making it clear where those terms and conditions are.
What about cookies?
Where do they come into this?
Well, cookies are a state mechanism.
So they originally came along to give the web a memory.
So it could remember what you did a 2nd ago or a minute ago or whatever it might be.
They have a role in that you can write them to a client device.
But a crawler could just ignore them and forget them straight away.
So you can't sort of go through the process of establishing, you know, that a request is human, write the cookie that says it's human, and then find that the crawler's either taken that, you know, from a client device or, you know, messed around with it.
So it's not always reliable.
What we like is those signals that just sit there in the background, like the user agents, the IP addresses, the route, effectively, of how the network traffic got to you.
They're more stable.
So the cookie can be useful.
As I say, you can get that humanness, and the fact that a human client, effectively, will have an incentive to remember that token is useful.
But even then, it can still be problematic, and this is your sort of whack-a-mole: just because you passed the bag check once going into the shop doesn't necessarily mean you're going to pass it a second time.
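The "passed the bag check" token can be sketched as a signed cookie. This is a generic HMAC pattern, not any specific product's mechanism; the secret and session ID are placeholders. A human browser returns the cookie on the next request, while a crawler that discards cookies simply gets re-challenged:

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # placeholder; keep this server-side

def issue_human_cookie(session_id: str) -> str:
    # Sign the session ID so the value can't be forged client-side.
    sig = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{sig}"

def is_valid_human_cookie(cookie) -> bool:
    # Missing or malformed cookie: the client dropped it, re-challenge.
    if not cookie or "." not in cookie:
        return False
    session_id, sig = cookie.rsplit(".", 1)
    expected = hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

token = issue_human_cookie("abc123")
print(is_valid_human_cookie(token))  # True
print(is_valid_human_cookie(None))   # False: crawler discarded the cookie
```

Signing stops a crawler forging the "I'm human" value, but, as the speaker notes, it can't stop one discarding it, which is why the background signals matter more.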
Okay, so just sort of reverting back a bit to this idea of using IP addresses, this probabilistic data, essentially.
Are there privacy issues with what is quite specific information?
No, not in this instance, because the data is being used purely for technical measures to identify robots.
But if you were to use it and not tell people about it, then depending on where you are in the world, that might be a problem.
So it does make sense to ensure that your website privacy policy is clear about how you might be using this data to deal with fraud, et cetera.
You don't have to go into too much detail.
You're effectively saying we might use this data to help identify that our terms and conditions are being respected and just leave it at that.
But yes, it's probably prudent to make sure that you've got a notice about how you're using data.
The same data could be used, for example, to create an identifier that could be used to personalise advertising, but that's got nothing to do with the subject of this webinar and AI.
But I think it raises an important point, which is that data existing isn't the problem; it's how the data's used.
And in this instance, it's being used for fraud prevention, effectively to respect the terms and conditions and try to enforce them.
Okay.
So that's, we've talked about the bad AIs and some technical and non-technical approaches.
I mean, is that it for bad AIs?
Is there anything else to say on them?
Well, I think, ultimately, we need to tighten regulation. There's a lot of talk in regulatory circles at the moment about AI and the impact of AI, and I think there's a growing realisation that unfettered innovation and development is not always positive, if you take what's happened with social media, or you take, you know, the outcry over Grok.
Obviously, we're not necessarily talking about that specific example today.
We're more talking about the content that's ingested.
Grok was clearly trained on a very large body of data from Twitter.
So the value of Elon buying Twitter might have been the back catalogue of data as much as any other reason.
So yes, we need legislation.
And I think, you know, there are a couple of things that we can do around copyright and making it clear that copyright applies.
It's been around for hundreds of years.
It's worked quite well.
What happened when the dot com boom occurred is publishers gave a lot of their content away for free.
And I think that was the thin end of the wedge and they've been eroded since then.
So things like putting those terms and conditions in place like we discussed earlier is important, but legislation to make it clearer, and particularly those AIs that have very large market power.
Yeah, those that are operated by, you know, the Microsofts, the Facebooks, the Googles of this world, they might need to be held to a higher standard in terms of use of data and things like that than smaller AIs.
But ultimately, it's about establishing a really good business model for publishers, and I think that's the exciting bit.
When we can turn to a market for content, then suppose you write an article and, as a journalist, it's normally seen by, say, 10,000 people, and you get rewarded for 10,000 people. Now suppose your work can then be used to create derivative works, and you could earn 50% more from those derivative works because of AI.
That would be really good.
So this is where we're like pro AI, you know, anti-theft.
So, you know, what we really hate is when big companies come along and just say, well, we shouldn't have to pay for content that's available for free.
Well, what does free mean?
I think it was made free for humans.
It wasn't made free for you to go and create derivative works from it.
Yeah, and I think you've described what the issue is in my industry.
Yeah, yeah.
What everyone is debating and panicking about.
To move on now, and I am very intrigued.
If the good AIs are the ones that identify themselves and the bad AIs are the ones that pretend to be humans, what on earth is an ugly AI?
Well, the ugly AI is predominantly one company, which is Google.
So this is a crawler and a company that has been accessing content for decades, the massive back catalogue of the internet, effectively all the content that's ever been put out there, and is then using that data to create derivative works.
And they're going further than the other AIs because what they're doing is then inserting the AI overview at the top of search.
So you go to search and you search for something, and we've all been conditioned to go: number one in Google, that's clearly the link to go to.
Well, now number one in Google is the AI overview created by Google, which then tends to keep people on the Google property or direct them to YouTube or something like this.
So we have the twin problem of discovery being diminished, so that value exchange for providing information to be in search diminishes because of the AI placement at the top of the search engine results page.
And then we have the derivative product effectively that's taken that content.
And it often mislabels.
So often you'll see the publisher brand in a little link at the bottom, underneath the AI overview, to kind of say: here are some of my sources.
But suppose that publisher's content has been misrepresented, mixed with other information, and actually doesn't represent what the publisher intended. Even though there are links there, that's small consolation, and it could actually be damaging.
So I think Google in particular need to really change the way they operate.
And in my experience, they don't tend to do that voluntarily.
So we would like to see the unbundling of AI overviews, so that the AI products from Google, and Microsoft as well, but I think Google are the most significant in terms of the current harm being done to the industry, are dealt with as soon as possible.
And it would simply then put the AIs on a level playing field.
A lot of people say, certainly in my social circle: well, I'll ChatGPT that, you know, and you go to a separate page.
So all it would be is: if you want to use Google's AI product, you go to gemini.google.com.
You don't just go to the normal search home page.
Now, I don't want to sound pessimistic, but how does one take on Google on an issue like this, a company of that size and scale and scope?
We're surely beyond technical solutions here now.
We're into the world of regulation.
How do people get involved in that?
How do people make their voice heard?
Yeah.
Well, the answer is not on your own.
Yeah.
You need to form together.
And I think what we're seeing, well, those of you who are watching in the UK might remember last year there was the fairly trained campaign, where every single national newspaper had the same front page.
I've never seen that in my lifetime.
Yeah, imagine the Daily Mail and the Mirror having the same front page.
Well, that happened.
So I think that working together is absolutely the answer.
And with regulators in Europe, that's the UK, the European Commission, and other European countries, the door is open.
Yeah, they understand, unlike other issues in technology, that plurality of the media is essential to a functioning democracy.
So this is high up on the political agenda.
It's high up on the regulatory agenda, the door is open.
If you care about this subject, then you have to do the things we've talked about earlier.
Yes, continue to be involved in the conversation, but through your trade body, or trade bodies, make sure your voice is heard, or if you have the means go direct.
But my personal journey: I wrote a letter to the CMA in February 2020 about some previous abuses, something called Privacy Sandbox, the restriction of interoperability, which again further tips the table in Google's favour.
And I then met some really interesting people.
And together we formed a group called Movement for an Open Web.
We're a not-for-profit based in the UK, but we work internationally in order to raise awareness of these issues, both in the general narrative, but also with the regulators.
And we've been quite successful.
We've been lead complainant on quite a few cases that have led to fines.
So I think, you know, I understand the reason for the question.
But if you haven't engaged with a regulator before, then, you know, I think now is actually the time to do it.
Um, I mean, for some people listening or watching, you know, direct engagement with a regulator may be something that they're comfortable with and are experienced in doing, but that doesn't necessarily apply to everyone.
I mean, is Movement for an Open Web the sort of organisation that can act as a facilitator for that kind of approach?
Yes, exactly.
So we've come together.
One of the problems that I think a lot of trade bodies have is that they tend to have Google as a member.
So that might limit what they're able to do.
What a group like Movement for an Open Web can do is make the case independently and engage, you know, deeply across many subjects.
I mean, in order to make a comprehensive case to a regulator, you need sort of economics, legal, product, engineering, political, you know, it requires quite a lot of disciplines to come together.
Great.
So just to recap then.
Um, we've got the good AIs, for which we've talked about robots.txt, which is a nice little friendly signpost that we can place to point them in the right direction, if I've understood it correctly, which they don't have to follow.
Which they don't have to follow.
But also then we've got the bad AIs, where we're essentially doing pattern recognition, really, isn't it?
We're trying to work out, using the "behaviour", in inverted commas, of these crawlers, whether a request is coming from a crawler or a human.
Yes.
Yeah, and looking for subtle differences, whether it's the network or the device or, you know, how they're behaving on the site.
Yeah.
And obviously there's a significant technical angle to that, but there is also the very practical and very simple thing you could go and do tomorrow, which is to get your terms and conditions watertight, rock solid, and put them out there for the world to see, essentially.
Yeah, I mean, you've still got copyright, but it just makes it even clearer what you intend to do with your content and what the licence is, effectively, for different recipients.
Yeah.
And then, obviously, at the level of the ugly, the ugly AIs we've just talked about, really, that is where we're beyond the technical and we're into speaking to regulators; it's about sort of systemic change.
It's about change on a kind of a global scale almost to try to deal with those issues.
And that's really where shout loudly, I think, seems to be the advice.
Make your voice heard.
Yeah, I mean, I think it's difficult sometimes to shout loudly because businesses are so dependent on Google.
So I think there's a reticence, which is why I work with the trade body and have the trade body shout, or support MOW, and we can help shout for you.
Um, we have got some questions coming in, which we will come to shortly.
If anyone out there still has any questions, please get them in now, because we've not got long to run.
Before we get to that point, I just wanted to ask if there was anything else that we've missed, anything else that I may have misinterpreted or misunderstood as part of this conversation or anything you've not had a chance to talk about yet.
Yeah, I think data labels we touched on earlier.
It's very important that we don't consider solutions around providing clear terms and conditions in isolation, for particular use cases.
This is sort of a primitive thing that the internet needs to have.
It kind of was missing because everyone moved so fast in the 1990s.
They weren't sort of thinking about enforcing contracts and stuff like that.
So I think the data labels proposal is worth looking at because it differs from other proposals that are just dealing with very particular niche problems.
It's looking across the board and saying not just for AI and content, but also other types of data, how do we signal clearly the terms and conditions associated with how that can be used?
And I think that's why it's different and why I'm, you know, helping progress that piece of work as the lead editor of that proposal.
Okay.
So going to questions from our audience, the 1st one that popped up a little while ago actually was asking about the solutions that Cloudflare and TollBit offer.
And can they tackle some of the things that you've been talking about?
Yeah, so Cloudflare are a very large infrastructure hosting company.
They are providing alternatives to the solutions that I talked about earlier, but broadly the same thing.
So, you know, find patterns, proof of humanness. You often see Cloudflare, because they don't miss a trick in putting their branding on the screen.
It comes up, there's a little circle, there's Cloudflare, "verifying you are human", I think it is.
It comes up, black screen.
You've probably seen it more and more.
And they can do that because they are a very large infrastructure company, one of the largest in the world.
You know, you notice in the mainstream media when there's a problem on the internet and people go, oh, Cloudflare have had a problem, and then it goes to like 30, 40% of sites have gone offline, you know, which just shows you how critical they are.
So here's a company that's leveraging its dominance in infrastructure to provide these services. Listening to their narrative, they're trying to insert themselves into the payment for content, which again is fine.
But just be wary about being disintermediated.
Remember when Google did no evil and it was amazing because you just went to Google and they were all lovely and you suddenly found your site was found and it was all wonderful.
So I'd be wary about that.
I think ask questions, just don't accept things blindly.
And then there's this nascent marketplace for payment for content, and TollBit is sort of trying to get into that space.
And again, that's great.
You know, we're, you know, very happy about that.
What I have observed over my last 25 years working in this space is that a lot of publishers complain about middlemen.
Yeah?
So what I would like to see happening is a more decentralised marketplace for payment for content.
And I think, again, with the right narrative, with the support of regulators, we can move into that space rather than proprietary solutions, where ultimately I can see ourselves in 5 years time for those publishers that survive complaining about the margins that the payment intermediaries take.
Okay, interesting.
This question is an interesting one and perhaps something of a hot potato to discuss, and that is this idea of regulators taking on the might of Google, et cetera.
Does the sort of global political climate at the moment, without saying too much, make that more difficult than perhaps it might have been 2 or 3 years ago?
Are the regulators afraid to deal with these things?
Yeah, great question.
I think...
So, firstly, I don't know anything more than, I suppose, what I read and the work that I do through Movement for an Open Web; I don't have privileged access to any particular regulator.
Um, I would say there is very much a 2-tier situation.
There's Europe and the UK, and then North America.
I suppose 3rd would be the rest of the world.
So I think certainly what we're seeing in Europe is no backing down when it comes to robust regulation.
And I think the kind of Eurostack discussion that's taking place at the moment and some of the things that are being done there, you know, show that there's a willingness particularly in Europe to try and gain more independence from big tech.
I think the UK, like in many things, tends to sit between Europe and the US, and isn't moving as quickly.
And then obviously in the US, you know, things are different.
So a lot of these big tech companies relied on fair use, which doesn't really exist in Europe.
So we're seeing things moving at a different pace, but certainly if you're a publisher in Europe, then everything I've said makes sense. If you're a publisher in the US, then it's at state and federal level that you need to lean in, and there are plenty of trade bodies that can help with that; indeed, Movement for an Open Web is very active over there as well.
Okay?
Just a couple more things before we wrap up.
And this one sort of rang a little bell in my head when you mentioned it, and I didn't pick you up on it at the time, but just going back to robots.txt being this very old bit of tech, if you like, this old little signpost that's been around for a very long time.
You said it can be ignored.
And now the question then that follows from that is if it can be ignored, why bother with it at all?
But it's the same reason as having a signpost saying keep off the grass, I suppose.
You want to help good actors do the right thing.
I think, you know, it really is as simple as that.
So not helping those good actors just seems a bit churlish given how simple it is.
What it also does, of course, is mean you can say: there was a signpost.
So, anyone who's been subject to a parking charge might go, oh, you know, I didn't see the signpost.
Well, there was a signpost, and if you broke the rules when you parked your car, then that's how the parking charge company will enforce it.
So, you know, you often need to have signposts.
So I really don't think it does any harm.
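To make that signpost idea concrete, here is a minimal sketch of a robots.txt that welcomes ordinary crawlers but disallows a couple of well-known AI training crawlers, checked with Python's standard-library `urllib.robotparser`. The user-agent tokens shown (GPTBot, CCBot) are examples; check each crawler operator's documentation for the exact token, and remember that, as discussed, compliance is voluntary for the crawler.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt "signpost": disallow a couple of well-known
# AI training crawlers, allow everyone else. User-agent tokens are
# illustrative; verify them against each vendor's documentation.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant AI crawler should stay out...
print(parser.can_fetch("GPTBot", "https://example.com/article"))    # False
# ...while ordinary crawlers are still welcome.
print(parser.can_fetch("Googlebot", "https://example.com/article")) # True
```

A good actor checks these rules before fetching; the value of the file, as the discussion above notes, is that the signpost was demonstrably there.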
Okay.
I think that's probably, if I can just double check.
Um, that's probably it.
I suppose my final question to you then is just to to sort of wrap up everything that we've talked about this afternoon is how do people find out more about this if you've listened to this and you want to know more or understand more or kind of get on with dealing with it?
Where do you go?
What should you do?
Well, ultimately contact us.
Yeah, at 51Degrees we can help you with the technical side.
If you want to deal with the regulatory side, then again, you can contact me.
I also direct Movement for an Open Web, and we can help on that side as well.
Ultimately, you know, do the basics, right?
Make sure you've got terms and conditions that cover crawlers, um, and make sure you've got a robots.txt.
And if you want help with that, go to 51degrees.com, developers, robots.txt; we can help create one for your usage scenario. Then look at how you can handle that human, non-human side of things.
We have tools for that as well.
Um, and then uh, I imagine anyone who's watching this is going to be concerned about Google. Even if they don't vocalise it outside of their group, for the reasons I mentioned earlier, you need to have a policy on that.
Yeah, this unrestricted access really, you know, needs to come to an end, and there needs to be a balance, and now is the time to act.
Great.
I do have a final question, actually.
And that is, I mean, obviously I'm scientifically literate, but perhaps not particularly technically literate. One thing that really strikes me about AI, and obviously it's in the news every single day now and you can't fail to be aware of what's going on, the single biggest thing is the pace at which it's moving. It seems to be advancing at an ever increasing rate, and certainly, you know, I've talked briefly about the situation in broadcasting: no one knows where it's going to land.
No one knows; everyone feels very uncertain about what the future looks like.
And I guess my final question is.
What do you think the future holds?
Maybe, let's say, 5 years.
What do you think the next 5 years look like in terms of AI and what would your hopes be as to where we might be if we've dealt with this correctly in 5 years time?
Right, how long have I got?
So that's quite a big question.
I think what I would like to see happen in relation to people's relationship with AI is kind of like learning from social media, which is we just take a pause.
Yeah. And maybe not rush headlong into something without a little bit more thought.
So if we can't establish a way for humans to keep creating content, and instead a limited number of humans are kept on life support by some very large companies just to provide the bare minimum of human content to feed the AI, while people are consuming highly personalised AI results, that could be really damaging for society.
So I think we need to be very careful about that outcome.
I don't think, you know, most people would say, well, AI absolutely has to go at the very fastest pace. That narrative you talked about, about the pace, tends to come from the people who are selling AI and creating a narrative around it.
You know, yes, it can help improve efficiencies in, you know, certain jobs and things like that.
Um, you know, and each individual and each business has to take a decision on that.
But ultimately what's got to happen is that AI needs customers, needs information, needs power, and it needs chips, and all of those things need to be balanced.
So that consumers are not getting a rum deal, those that provide the information are getting a good deal, we're not strip-mining the planet for power, and chips are not heavily dominated by ultimately one monopolist who can then yank everyone else's chain.
Great.
Well, thank you, James, and thank you everyone for joining.
Thank you very much.
We'll be back on the 17th of March.
Yeah, and what are we talking about then?
We are going to be talking about device detection and IP intelligence combined.
So these products have been previously thought about as separate products, with separate vendors, very little integration.
We will be looking at what happens when you integrate them, and we touched on that very briefly in this webinar with things like how you can use probabilistic behaviour, looking at both device, operating system, crawler information, and network information from IP.
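As a purely hypothetical sketch of what combining those two signal families might look like, the snippet below folds device/crawler signals and IP/network signals into a single bot-likelihood score. All the field names, weights, and thresholds here are invented for illustration; they are not a real 51Degrees API.

```python
# Hypothetical sketch: combining device/crawler detection signals with
# IP intelligence signals into a naive bot-likelihood score.
# Field names and weights are invented for illustration only.

def bot_likelihood(signals: dict) -> float:
    score = 0.0
    if signals.get("is_known_crawler_ua"):   # device/crawler detection
        score += 0.5
    if signals.get("ip_is_datacenter"):      # IP intelligence
        score += 0.25
    if signals.get("ua_network_mismatch"):   # e.g. mobile UA on a hosting network
        score += 0.25
    return min(score, 1.0)

# A known crawler UA arriving from a datacenter IP scores high.
print(bot_likelihood({"is_known_crawler_ua": True, "ip_is_datacenter": True}))
# No suspicious signals scores zero.
print(bot_likelihood({}))
```

The point of the design, as described above, is that neither source alone is conclusive: it is the corroboration (or mismatch) between device and network evidence that sharpens the probabilistic judgement.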
So we're going to be going into that in a little bit more detail on the 17th of March.
Great.
Look forward to it.
I'd better go off and do some research.
See you then.