This is a proposal I am working on with some people, I have put it here because it might protect me a little to have it published online:
By Bob Hannent
The key components of a broadcast infrastructure have traditionally been built on the tenet of specialist applications running on dedicated hardware. However, to improve margins and drive sales in a competitive market, equipment manufacturers are increasingly using commodity hardware to deliver the same results. There are now quite a few software applications that can match the output of traditional dedicated hardware.
Furthermore, there has been a dramatic reduction in the cost of hosted computing power. This has been led by major computing users such as Google and Amazon reselling their data-centre capacity in order to achieve additional economies of scale. Combining these two points, it may be possible to deliver software-based broadcast network systems which exist entirely virtually and can scale on demand within seconds. Such a system could also offer high resilience due to the virtual nature of the environment.
The bandwidth available within a “cloud” network (as offered by Amazon or Google) is often relatively high (around 250 Mbit/s). However, one significant challenge with the traditional distributed-computing approach is that bandwidth to and from external systems is usually allocated on the basis of many small transactions rather than large continuous streams. This suits low-rate streaming but does not cater for the high sustained rates required for broadcasting.
The broadcast industry is a heavy user of IT equipment, but often these systems merely support traditional mechanisms. There has been a trend towards outsourcing tasks to third-party companies, but this is often more about headcount than about real financial savings. Unless the third-party company operates substantially differently from its customer, it is simply adding a profit margin to the same function that was previously in-house. It is often cited that savings come from removing duplicated functions, but even where headcount savings are made, the third-party supplier's profit margin eliminates any saving.
Outsourcing also increases the administrative overhead of the operation: where previously the broadcaster had direct choice, now, as a customer, it must raise a service request and have a purchase order approved for the work. By virtualising the function it is possible to put the power back in the hands of the customer, dramatically increasing their choice while reducing their overheads. They pay for the resources they use rather than for equipment standing idle, and can upgrade as the technology changes. They gain the security of aggregated supply, the power of heavy equipment and, above all, a lean operation.
The acceptable compromise is to design a system that uses commodity infrastructure common to large data-centre providers and contains everything in virtualised software components. The architecture would be based around time-leased infrastructure at commodity pricing, with vendors contributing their functions as “appliances”. These appliances are not hardware; they would be virtual appliances running on a standardised virtualisation engine.
Some applications, such as real-time multiplexing, must spend a great deal of processing time on real-time operations, but with careful engineering the necessary resources can be guaranteed within such a system.
The proposed system would be based on commodity hardware from high-end computer manufacturers and would effectively be a modestly priced supercomputing cluster. This would allow multiple intensive applications to run on hardware that is designed from the outset to be fault tolerant. The system should assume that hardware will fail at a predictable rate but at unpredictable times; it should therefore deliver constant service, and maintenance should be achievable with zero system downtime.
High availability is achieved by loosely coupling all of the components: each element communicates using simple, very high-speed networking, and its communication is managed to ensure that no single component can put the system at risk. Some systems treat management signalling and content data differently, but this proposal assumes each is of equal importance and, to aid simplicity, does not separate the two. Traffic would be identified only through careful management and traditional quality-management metrics.
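The uniform treatment of management signalling and content data could look something like the following minimal sketch: a single message bus carries both kinds of traffic, and a traffic-class label is recorded purely for quality metrics, never for routing or prioritisation. All names here are illustrative, not part of the proposal.

```python
import queue
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    source: str
    payload: bytes
    # "control" or "content" -- recorded for quality metrics only,
    # never used for routing or prioritisation
    traffic_class: str = "content"


class Bus:
    """A single loosely coupled channel shared by all components."""

    def __init__(self):
        self._q = queue.Queue()
        self.metrics = Counter()  # quality-management accounting

    def publish(self, msg: Message):
        self.metrics[msg.traffic_class] += 1
        self._q.put(msg)  # same path regardless of traffic class

    def deliver(self) -> Message:
        return self._q.get_nowait()


bus = Bus()
bus.publish(Message("mux-1", b"\x47" * 188))                      # content packet
bus.publish(Message("controller", b"restart mux-1", "control"))   # management
```

The point of the sketch is that a component failure cannot starve either class of traffic, because neither class is given a privileged path through the system.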
Most high-performance computing efforts are built around random access to large datasets (e.g. database operations in finance). In this case the design is focused on delivering streams of data between components; the only stored data would be held in containers for content management, from which content is streamed out for processing. Storage becomes an application in its own right, not a function of another application.
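The streaming relationship between storage and processing can be sketched as a generator pipeline, in which the storage application streams fixed-size chunks to a downstream stage rather than offering random access. The function names and the trivial "processing" stage are illustrative only.

```python
def storage_app(container: bytes, chunk_size: int = 64):
    """Storage as an application in its own right: it streams content
    out in fixed-size chunks rather than exposing random access."""
    for i in range(0, len(container), chunk_size):
        yield container[i:i + chunk_size]


def processing_stage(chunks):
    """Stand-in for a real component (encoder, multiplexer, etc.)."""
    for chunk in chunks:
        yield chunk.upper()  # trivial transform for illustration


content = b"example mezzanine essence " * 10
stream = processing_stage(storage_app(content))
out = b"".join(stream)
```

Because every stage is a generator, no component ever holds more than one chunk in memory, which matches the small-RAM, high-throughput profile described later in the proposal.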
The applications are the heart of the system and would be provided to customers in an “app store” model, as seen on Android and iTunes. The customer can pick components from a range of software vendors, assured that only tested and validated functions are available. Interoperability will be mediated by the service so that all blocks act in a predictable way and, despite being “black box” implementations, deliver standard outputs from standard inputs.
Traditional companies that already provide hardware for broadcast systems would be encouraged to submit their applications for the virtualised environment, opening a wider range of revenues from their IPR. There would be greater risk for them, because the customer can change vendor more easily, but it also means they can win custom from their competition without waiting for equipment renewal cycles.
Compared with many high-performance computing workloads, most of these applications would function with relatively small amounts of very fast random-access memory. Encoding video may require less than 500 MB of RAM, and stream multiplexing may need only a handful of megabytes. The biggest burden would be on CPU time and I/O, as content is taken in, processed and delivered out in real time.
Applications would run in a read-only container and fetch their configuration over the network, as they start, from a central resilient configuration database. This prevents an application from becoming corrupt and makes its behaviour more predictable.
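The start-up pattern might be sketched as follows: the application pulls its settings once from a central store at boot, keeps only an in-memory copy and never writes back, so the running image stays effectively read-only. The in-process dictionary standing in for the configuration service, and all identifiers, are assumptions for illustration; a real system would fetch over the network from a resilient database.

```python
import json

# Hypothetical central configuration service, modelled here as a dict;
# in practice this would be a network fetch from a replicated store.
CONFIG_STORE = {
    "encoder-7": json.dumps({"codec": "h264", "bitrate_kbps": 8000}),
}


class Application:
    def __init__(self, app_id: str, store=CONFIG_STORE):
        # Fetched once at start-up; never written back, so the
        # container image itself remains read-only.
        self._config = json.loads(store[app_id])

    @property
    def config(self) -> dict:
        return dict(self._config)  # hand out copies only


app = Application("encoder-7")
```

Handing out copies of the configuration means that even a buggy consumer cannot mutate the application's running state, which supports the predictability claim above.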
Some functions, such as video encoding, depend heavily on complex floating-point and vector mathematics. Traditional CPUs can handle these operations but are not optimised for them. The cluster could therefore also contain generic mathematical acceleration hardware, such as GP-GPUs, made available to applications through standardised programming interfaces.
Apart from processing and storage, no other hardware components would be customisable for an application. A file-serving application would make large quantities of fast mass storage available to other applications. Applications would exist purely in memory and store their configurations centrally.
Management of applications would be through standard programming interfaces over standard network protocols. Should a hardware node fail, the application image would already exist in storage, its settings would be available over the network and storage would remain active; a lost application could be restarted in seconds, even automatically. Of even greater interest, if a hardware node needs to be serviced, all its applications can simply be migrated to another node automatically with no downtime. Because all communication is over the network, the changeover between hardware nodes is near seamless.
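The restart-on-failure behaviour can be sketched with a toy cluster supervisor. Because settings live centrally and storage is shared, "recovering" an application is nothing more than starting it again on a surviving node. The scheduler here (pick the first node alphabetically) and all names are deliberately trivial illustrations, not a proposed algorithm.

```python
class Cluster:
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.placement = {}  # app_id -> node currently running it

    def start(self, app_id: str) -> str:
        # Trivial scheduler for illustration only
        node = sorted(self.nodes)[0]
        self.placement[app_id] = node
        return node

    def fail_node(self, node: str):
        """Node failure or planned maintenance: every application on
        the node is restarted elsewhere, automatically."""
        self.nodes.discard(node)
        for app_id, placed in list(self.placement.items()):
            if placed == node:
                self.start(app_id)  # seconds, not hours


cluster = Cluster(["node-a", "node-b"])
cluster.start("mux-1")       # lands on node-a
cluster.fail_node("node-a")  # mux-1 migrates to node-b
```

The same `fail_node` path serves planned maintenance: drain a node by declaring it failed, service it, then return it to the pool.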
The customer would be given a secure interface through which to control their infrastructure, selecting components and configuring their settings. Background migration and hardware allocation would be kept separate, so the customer would never see them happening.
The majority of monitoring and control would be automatic, but this will not satisfy all customers: some may want human monitoring for higher-level decision making. The customer can use their own broadcast systems to monitor and, to a certain extent, control the system (through mediating control applications), or they may use an online monitoring and control user interface. If the broadcaster is not staffed for 24-hour monitoring, they have the option to “hire” a freelance engineer who has been suitably trained on the system.
Multiple freelance engineers could work from home, monitoring the system, and would themselves be treated by the system as remote commodities. Any destructive action would not be allowed unless verified by another engineer. The function of an engineer would therefore be to monitor a range of customers and suggest actions; if a suggestion is agreed by consensus, it is actioned. This does not increase the speed of remedial action, but where speed is required the automation logic has already acted; the slower remedial path is used to fix standing issues that cannot be automated.
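The two-engineer rule for destructive actions can be sketched as a small approval queue: an action proposed by one engineer is only executed once a different engineer approves it. Class and method names are illustrative assumptions.

```python
class ActionQueue:
    """Destructive actions require agreement from a second engineer."""

    def __init__(self):
        self.pending = {}   # action_id -> (description, proposer)
        self.executed = []  # descriptions of actioned suggestions

    def propose(self, action_id: str, description: str, engineer: str):
        self.pending[action_id] = (description, engineer)

    def approve(self, action_id: str, engineer: str):
        description, proposer = self.pending[action_id]
        if engineer == proposer:
            raise PermissionError("proposer cannot approve their own action")
        self.executed.append(description)  # consensus reached: actioned
        del self.pending[action_id]


q = ActionQueue()
q.propose("a1", "restart encoder-7", engineer="alice")
q.approve("a1", engineer="bob")  # a second engineer confirms
```

Note that nothing time-critical goes through this queue; as the text says, the automation logic has already handled anything where speed matters.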
As mentioned, no vendor-specific hardware would be available within the system itself. Interface applications would allow the customer to connect their existing broadcast management systems, metadata sources and content sources. The only place vendor-specific hardware would be used is at the “end-points” of the system. For example, a customer wishing to take a live feed from a traditional SDI video circuit would purchase or lease an appliance consisting of a commodity computer, a network interface and an SDI I/O card. The SDI video would then be mezzanine-compressed (to preserve quality while reducing the bit rate) and delivered over conventional commodity telecommunications infrastructure to a suitable cluster for the next stage of processing. Vendors could also supply their own “end-points” to a specification, allowing a range of interfaces, a range of applications and improved integration.
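A quick calculation shows why the mezzanine step at the end-point matters. SD-SDI runs at 270 Mbit/s and HD-SDI at 1485 Mbit/s; the compression ratio below is an assumed round figure for illustration, as real mezzanine codecs vary.

```python
# Interface rates are standard figures; the mezzanine ratio is an
# illustrative assumption, not a specific codec's performance.
SDI_RATES_MBPS = {"SD": 270, "HD": 1485}


def mezzanine_rate_mbps(fmt: str, ratio: int = 10) -> float:
    """Approximate link rate after light mezzanine compression."""
    return SDI_RATES_MBPS[fmt] / ratio


# At an assumed 10:1, an HD feed needs roughly 148.5 Mbit/s -- inside
# the ~250 Mbit/s cloud bandwidth mentioned earlier, where raw HD-SDI
# (1485 Mbit/s) clearly is not.
```

This is why the end-point compresses before handing the stream to commodity telecommunications circuits: uncompressed SDI would not fit the per-tenant bandwidth a cloud provider typically offers.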
The system depends on purchasing commodity equipment and placing it in hosted data centres with the best commodity pricing. A rack cage in London's Docklands is ideal because of the competitive pricing, the plethora of connectivity options and the access to skilled support.
Relationships with key communications providers would be essential, as traffic outside the system would be delivered over non-service-specific telecommunications circuits. It would be practical to purchase fibre peering with providers such as Level3 and Global Crossing, and it may also be sensible to arrange video connectivity with BT Broadcast and Siemens to connect directly to their broadcast switching exchange fabric.
This design poses a number of challenges to those implementing it.
This design allows for scaled growth as customer demand increases: the availability of resources can be predicted and additional processing power added in advance of customer orders. The commodity pricing and non-specialised nature of the equipment mean that the system can be maintained by conventional, IT-literate support staff. The software infrastructure, while customised, is intended to be off-the-shelf in nature, which makes it easier to find qualified staff to maintain the system. The system can also scale beyond one location, because additional clusters can be built anywhere in the world with sufficient connectivity. A local cluster aids the customer's use of the service, but the ability to remotely manage a cluster outside your region is a bonus for many companies as well. A North American broadcaster could use the system to remotely manage its European operations without needing a local staff presence, or even any hardware in Europe.
Once post-production is in place, the further step is into live production, where it may be possible to provide real-time vision mixing and audio mixing in a virtual environment. The specialised nature of the control surfaces might create logistical issues here, but for live contribution and self-operated studios the challenges are smaller.