Interactive Multicultural Media System Concept

This article was originally written by Ismaila Ikani Sule in March 2010 (withdrawn UK patent application number GB1004067.3). Acknowledgement also goes to Professor Hamid Arabnia and Chul Woo Lim at the University of Georgia, USA, for their assistance in reviewing the article and sourcing similar research projects for further study (see videos 2 and 3 below).


This concept deals with the multicultural consumption of entertainment content, movies in particular. The movie's visual elements and audio content are translated to match the cultural tastes of the individual consumer. Movie-makers can produce one version of a movie which the invention then modifies to match cultural differences amongst viewers around the globe. Someone in Tanzania or Thailand watching a Hollywood movie would be able not just to add subtitles interpreting what the actors are saying, or play back audio in their native language, but actually 'translate' the entire movie: the actors' attire, the choice of words in English or another chosen language, and even an automatic shift of focus away from unwanted scenes based on preset options.

FIGURE 1, above, shows the film/movie medium with content and metadata comprising dotted coordinate points on an actor, along with 3D geometric blocks overlaying the actor's image to capture information such as structure and movements. Details of the actor's clothing, such as design patterns, are also captured in the metadata.
All this information is passed on to the multimedia system for processing, and additional updates are downloaded from the Internet onto the system. Selected themes, settings and/or designs are then used to modify the video, giving an output display with the actor's clothes digitally stretched to include a hood.

FIGURE 2, above, shows the components working together within the overall multicultural media system. The movie medium consists of a movie made up of characters (actors), their clothes, audio and scenes. Accompanying metadata further identify and hold unique specifications for each of these movie items. Everything is then passed on to the media system, which has a default setting and can either play back the movie in its original form or use the metadata to modify the displayed output as specified by the user. The themes and settings on the media system, or those provided by the system user, are preset and use information from the metadata to match themselves with the movie content.

Background of the Concept

When we watch movies on media like DVDs, Blu-ray discs and others, several included features make the experience more interactive and entertaining: we can view special features, select preset languages or add subtitles. Online video sites provide us with similar features. However, on their own these features do little to enrich the actual content we're viewing or listening to according to our own particular cultural tastes and needs. Someone in the UK may have different tastes in content from someone, say, in Egypt. In a world made up of a variety of cultures affecting the way users consume the content offered to them, providing more choice and freedom gives the product (i.e. the movie) a wider market audience.

The issue of matching video and audio content to suit the different tastes of audiences, particularly when it comes to different cultural backgrounds, has been quite tricky for the movie industry worldwide. Several approaches have been adopted:

  • making video and audio content suited only to targeted local audiences or to a section of a wider global audience (for instance, Bollywood movies aimed mainly at home fans, or popular movies targeting audiences around the world who have adapted to one dominant version of popular culture)
  • producing different versions of the same film content for different groups of audiences (this can be a costly venture as sometimes different actors may be needed who are able to convey content better to the targeted audience especially when it comes to languages other than English)
  • content editing, which is more common, in order to filter out what is acceptable and unacceptable to different sets of audiences (hence scenes from a film as released in one country might be missing in another country where they are considered sensitive)
  • use of DRM (Digital Rights Management) software, such as DVD region settings, to try to control content viewership in certain regions of the world (this method is not foolproof and is increasingly falling victim to customised software that disables DRM settings)
  • censorship by government or movie industry institutions to try and control what viewers in their localities can view and hear (this in many cases distorts the content and can make the affected film rather unpleasant to the audience due to, for example, missing scenes or clips, smudges, blurs or black bars over images, audio replaced by beeps or irregular patches of silence and so on)
  • online limitations and controls (certain content may not be viewable in countries identified by their IP addresses and server locations; content is also regulated by site administrators taking down restricted content, or by users reporting it, causing the offending content to be removed)
  • banning the content entirely (which in many cases only leads to it generating an underground appeal).

The challenge would be to have movie makers and actors do and say whatever they want in the movie then have movie viewers see and hear only the things they want to see and hear.
That would be the ideal win-win situation.
Freedom of expression, freedom of consumption.

Consumer Choice

Movie audiences would be able to select how they wish to enjoy a movie.

The viewer can alter the shape of an actor’s clothes into any other desired shape or fashion, for example a shirt into a kimono while maintaining the original texture and design patterns.

Audiences wanting action-packed content without all the kissing scenes do not hit the fast-forward button; they just keep watching while the media system automatically detects two heads about to kiss and zooms in on something else in another part of the screen.

Squeamish viewers avoid gory scenes by having the system automatically move the displayed images to other safe images within the same scenes. Remain in the same scene and view what you're comfortable with, even if it means using your imagination to guess what you're not seeing (no need to close your eyes, turn your head or run out of the room in fear).

Explicit viewers watch and listen to the language they're used to; non-explicit audiences have the system simply replace words, matching the tones and audio quality to make everything flow as naturally as possible. No irritating beeps: the 'F' word could become 'food' or 'disease', depending on whichever word the viewer opts to use.

How about even watching the latest movie but having all the characters dressed up in clothes from the 1920s, complete with the slang and monochrome effects?

Commercial Value

Developers could share their themes or settings (clothes/audio effects), which can be added to give movie viewing even more variety and taste: not just, say, clothes and items from anime or the gaming world, but actual professional works of fashion from designers and aspiring artistes.

These could be shared over the web or mobile devices. Movie-makers and actors can ask for royalties for such use of their material generating more revenue.

How It Would Work

First, the entire length of a movie would be mapped out frame by frame, identifying objects such as the characters and their body structures, postures, movements and clothing, as well as other items in their surroundings. This information is captured digitally using animated, basic 3D geometric shapes overlaid on the movie images. Structural points are dotted out on the images and the corresponding coordinates stored as metadata to be used by the media system. These points help determine the various shapes and angles to which clothes can be stretched and fashioned.
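The frame-by-frame mapping above can be sketched as a small data model. This is only an illustrative assumption of what the stored metadata might look like; the class names, the (x, y) point format and the simplification of 3D geometric blocks to labelled bounding boxes are all hypothetical, not a defined format from the concept.

```python
from dataclasses import dataclass, field

@dataclass
class ActorFrameData:
    actor_id: str
    points: list          # (x, y) structural coordinates dotted on the image
    blocks: dict = field(default_factory=dict)  # e.g. {"torso": (x, y, w, h)}

@dataclass
class FrameMetadata:
    frame_no: int
    actors: list = field(default_factory=list)

def map_frame(frame_no, detections):
    """Build metadata for one frame from (actor_id, points, blocks) tuples."""
    frame = FrameMetadata(frame_no)
    for actor_id, points, blocks in detections:
        frame.actors.append(ActorFrameData(actor_id, points, blocks))
    return frame

# One frame with a single actor: five structural points and a torso block.
frame = map_frame(0, [("actor_1",
                       [(120, 80), (110, 140), (130, 140), (100, 220), (140, 220)],
                       {"torso": (100, 80, 40, 140)})])
```

A full movie would then simply be a sequence of such per-frame records stored alongside the video content.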

Objects overlaying each other, exposed skin, and other coverings such as hair, sheets or leaves are also identified within the collected data to distinguish them from the clothing worn by actors.

Metadata identifying objects such as clothing, skin, blood, body postures, movement and so on are set and stored as files along with the movie's digital content, either on a storage medium such as a disc or within TV or Internet-based broadcasts.

The metadata files associated with the movie would provide the particular skeletal framework to be used by a programme or software on the multimedia system for adding user viewing/audio options and themes to that movie.

When the system receives movie and associated files from the medium, the metadata files provide the specifications and dimensions (shapes, colours, angles, tones, motions, moods, sounds, etc.) related to characters, clothing, objects and dialogue which can be modified.

The system has a default setting featuring options common to all users. Preset options, themes and choices are available to system users, along with the means to develop and add their own desired themes, features and options. More can be shared and downloaded over the Internet.
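The layering of a default setting, presets and user choices could work along these lines. The keys and values below are illustrative assumptions, not part of the original concept's specification.

```python
# Hypothetical system-wide defaults common to all users.
DEFAULT_SETTINGS = {
    "clothing_theme": "original",
    "language": "en",
    "skip_kissing_scenes": False,
    "profanity_filter": False,
}

def resolve_settings(*layers):
    """Later layers (preset themes, then user choices) override earlier ones."""
    settings = dict(DEFAULT_SETTINGS)
    for layer in layers:
        settings.update(layer)
    return settings

# A downloadable preset theme and an individual viewer's own choices.
preset_1920s = {"clothing_theme": "1920s", "monochrome": True}
user_choices = {"profanity_filter": True}

active = resolve_settings(preset_1920s, user_choices)
```

Anything the user does not override falls through to the preset, and then to the default, so a movie always has a complete set of playback options.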

To alter clothing in the movie, the system consults the metadata file to identify each actor or character's clothing, shape and motions; using this information, it digitally 'stretches' and alters the actors' clothes and their textile patterns, pixel by pixel, into preset shape themes or new user-customised ones; the finished output is displayed to the viewer.
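A toy sketch of that stretching step: given a frame (here reduced to a 2D grid of pixel labels) and the metadata bounding box for one garment, the region is resampled to a new height pixel by pixel with nearest-neighbour sampling. The grid, the labels and the box format are deliberate simplifications for illustration; a real system would warp textured image regions guided by the structural points.

```python
def stretch_patch(image, box, new_h):
    """Resample the box=(row, col, height, width) region of `image`
    to new_h rows using nearest-neighbour sampling."""
    r, c, h, w = box
    return [[image[r + int(i * h / new_h)][c + j] for j in range(w)]
            for i in range(new_h)]

# 4x3 frame: 'S' marks shirt pixels (rows 1-2), '.' is background.
image = [
    list("..."),
    list(".S."),
    list(".S."),
    list("..."),
]
shirt_box = (1, 0, 2, 3)                 # row, col, height, width from the metadata
hood = stretch_patch(image, shirt_box, 4)  # stretch the shirt to double height
```

The stretched patch preserves the original pixel pattern while changing the garment's extent, which is the essence of reshaping a shirt into, say, a hooded top while keeping its texture.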

To alter audio, or to move the user's view away from sections of selected scenes, the system again matches the metadata information with the available themes and choices.
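The dialogue-substitution side of this can be sketched on a subtitle-style text track. Real audio substitution would need matched tones and timing, as described earlier; here the replacement map stands in for the viewer's chosen words, and the function name and map contents are illustrative assumptions.

```python
import re

def soften_dialogue(line, replacements):
    """Swap flagged words for the viewer's chosen substitutes,
    matching whole words case-insensitively."""
    for flagged, substitute in replacements.items():
        line = re.sub(r"\b%s\b" % re.escape(flagged), substitute,
                      line, flags=re.IGNORECASE)
    return line

# The viewer's own substitution choices, as per their preset options.
viewer_map = {"damn": "darn"}
clean = soften_dialogue("Damn, that was close!", viewer_map)
```

No beeps or silences are introduced; the line simply plays back with the substituted word.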

Similar Research Projects

[1] Engadget – Kinect lightsaber, and other inevitable milestones for the open-source robot eye (video, 2010)

[2] MovieReshape: Tracking and Reshaping of Humans in Videos (2010)

[3] Xiaolin Wei – VideoMocap: Modeling Physically Realistic Human Motion from Monocular Video Sequences (2010)