By Aleksandras Šulženko, Product Owner at Oxylabs.io
Web scraping can be a force for good in the world. While it has been used predominantly by large corporations, more opportunities for non-profit uses of web scraping are becoming apparent.
Most of these non-profit projects have focused on goals such as exposing corruption. However, web scraping can be beneficial wherever data is public. Luckily, the global trend seems to be moving us towards the democratization of data, making it available to everyone.
A perfect example is the recent push by the government of the United States of America to tighten hospital price transparency regulations. In short, hospitals in the USA are now required to publish total charges, insurance-specific negotiated rates, and discounted prices for people paying out-of-pocket.
Additionally, hospitals must also publish the prices for 300 commonly used services that patients could schedule and shop for in advance, such as surgeries and X-rays. All this data should become publicly available fairly quickly.
Enabling informed consumers
One of the key, and debatable, assumptions of classical economics is that everyone is a rational actor. Usually, that means that the person would rationally weigh each possibility before making a decision. Regardless of whether we agree that people are rational actors, the key to such decision making is information.
When information is hidden, no one can truly make an informed decision. In some cases the situation is even worse: one party holds near-total control over the information, creating a deep asymmetry. That has long been the case in US healthcare, as hospitals did not display any pricing information. With the recent changes in legislation, however, an opportunity arises to make consumers better informed.
While the large number of hospitals in the USA is certainly a boon, they produce an inordinate amount of information. Collecting and comparing all of that pricing information manually would be practically impossible, especially since prices can change quite frequently.
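Automating that comparison is straightforward once the pricing files are machine-readable. Below is a minimal sketch in Python; the JSON structure and hospital names are hypothetical (real published price files vary in format), but it illustrates how a scraper could aggregate per-service prices across hospitals and surface the cheapest option.

```python
import json

# Hypothetical machine-readable price files, one per hospital.
# Real hospital price-transparency files differ in schema; this
# structure is illustrative only.
hospital_files = {
    "General Hospital": '[{"service": "X-ray", "cash_price": 250},'
                        ' {"service": "MRI", "cash_price": 1800}]',
    "City Medical":     '[{"service": "X-ray", "cash_price": 180},'
                        ' {"service": "MRI", "cash_price": 2100}]',
}

def cheapest_provider(files, service):
    """Return (hospital, price) with the lowest cash price for a service."""
    best = None
    for hospital, raw in files.items():
        for entry in json.loads(raw):
            if entry["service"] == service and (
                best is None or entry["cash_price"] < best[1]
            ):
                best = (hospital, entry["cash_price"])
    return best

print(cheapest_provider(hospital_files, "X-ray"))  # ('City Medical', 180)
```

In practice the files would be fetched from each hospital's website on a schedule, which is exactly the kind of repetitive collection work web scraping automates.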
One can speculate about why pricing data has previously been hidden or buried. Cynically speaking, the primary benefit to hospitals is the ability to charge higher prices, as services (compared to products) can carry higher margins. Additionally, healthcare is in many cases non-negotiable.
Web scraping to the rescue
Price comparison services are already an established industry that utilizes web scraping. Ecommerce giants, airlines, and many other businesses use web scraping daily to inform their pricing decisions.
With public healthcare pricing information, web scraping can serve much the same purpose, though the end goal and results would be quite different. First, access to a public healthcare pricing database would enable consumers to make better choices. While most people can reasonably reach only a handful of hospitals, such a database would still give them a quick overview of the best option.
However, the real value of such a database lies elsewhere. Primarily, the greatest benefit is increased competition. As consumers become more educated and healthcare pricing data more accessible, they will gravitate towards the best price-to-quality ratio.
In turn, such a trend would force hospitals to compete harder for patients on price and service quality. If we know anything about free-market economics, it's that competition leads to better services for consumers. In fact, hospitals may begin scraping competitors and matching prices, as has already happened in other industries.
Open data is a public good
As we can see from the above example, the simple process of enabling open access to healthcare pricing data can produce significant results. Most importantly, open data creates competition and benefits consumers.
We have already seen a nearly identical effect with the proliferation of dynamic pricing. As ecommerce and retail pricing data is by necessity public, companies have been using web scraping and data analysis to create a more competitive environment. However, as they compete to provide the best pricing, the one truly benefitting in the long run is the consumer.
Nevertheless, we shouldn’t limit ourselves to just pricing data. All data can produce significant public good through the use of web scraping. For example, as previously mentioned, government data can provide journalists with the opportunity to uncover corruption.
Yet one of the biggest obstacles to bringing web scraping into household use is the technical maintenance it demands. Aspects of web scraping, such as parsing, are challenging enough to require dedicated teams of developers. However, with advancements in artificial intelligence and machine learning, solutions such as our Adaptive Parser can create opportunities for smaller players to engage in scraping.
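To see why parsing is the fragile part, consider a toy extractor built on Python's standard-library `HTMLParser`. The page markup here is invented for illustration: the parser pulls prices out of `<span class="price">` tags, and the moment a site changes that markup, the parser silently returns nothing, which is why parsers need constant maintenance.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Extract text from <span class="price"> tags in a page.

    Hard-coding the tag and class like this is exactly what makes
    hand-written parsers brittle: any layout change breaks extraction.
    """

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

# A hypothetical snippet of a hospital's pricing page.
page = '<div><span class="price">$250</span><span class="price">$180</span></div>'
parser = PriceParser()
parser.feed(page)
print(parser.prices)  # ['$250', '$180']
```

Machine-learning-based parsers aim to remove this brittleness by recognizing what a price *looks like* rather than where it sits in the markup.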
In the end, as data proliferates further, web scraping will become a way to avoid getting lost in the noise of information. It will allow us to dig through otherwise insurmountable amounts of information to find the hidden gems and achieve what would otherwise be impossible.
Hopefully, as much data as possible becomes accessible to everyone. While web scraping might require some resources to get started, small projects can usually run on a single system and still provide great benefit. As long as data remains open, and more of it becomes open, we will be able to reap the rewards of web scraping.