The core of the Alembik server implements the majority of the server-side functionality and processes client requests. The following schema gives an overview of its internal architecture.

The picture outlines the last two of four fundamental steps, which are executed in the given order for every transcoding request.

The execution chain for all these steps is controlled by the implementation of the org.alembik.TranscodingService interface, TranscodingServiceImpl, which is the actual receiver of every request on the server side. First the component tries to obtain the source media file in the local environment using the Storage.loadSourceFile() method. Once the file is ready, control is passed to the ProfileResolver.evaluateProfile() method, which analyzes the request and prepares all transcoding parameters (e.g. by resolving the User-Agent string). With all the data in place, the controller selects an appropriate media processor (MediaProcessorFactory.getProcessor()) and starts the transcoding operation with its MediaProcessor.process() method.
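This chain can be sketched as follows. The inline interfaces and String-based signatures below are simplifications for illustration only; the real methods work with TranscodingJob, JobResult and related types.

```java
// Hedged sketch of the TranscodingServiceImpl request flow.
public class TranscodingFlowSketch {

    interface Storage   { String loadSourceFile(String uri); }            // step 1
    interface Resolver  { String evaluateProfile(String userAgent); }     // step 2
    interface Processor { String process(String file, String profile); }  // step 4

    // step 3 (processor selection via MediaProcessorFactory) is folded
    // into the 'processor' argument in this sketch
    public static String handleRequest(Storage storage, Resolver resolver,
                                       Processor processor, String uri, String ua) {
        String file    = storage.loadSourceFile(uri);   // obtain source locally
        String profile = resolver.evaluateProfile(ua);  // prepare parameters
        return processor.process(file, profile);        // run the transcoding
    }
}
```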

Some media processors, like ImageMagick and FFmpeg, are based on native libraries and execute transcoding requests externally (outside the Java VM). Others, including WebRenderingProcessor and GAIAMediaProcessor, do their job using Java libraries. For example, the GAIA processor delegates the majority of its transcoding operations to a chain of org.alembik.processing.gaia.TransformationProcessor components, each responsible for a single transformation requested by the client within the transcoding job.

 

As mentioned in the engine overview, it is the org.alembik.resolver.ProfileResolver component that pre-processes transcoding job data. Pre-processing of each request involves analysis of the transcoding parameters sent by the client, who may choose among the following options:

  • Create and pass its own TranscodingParams instance with a custom set of properties, such as desired image dimensions, audio/video bit rate values or a maximum number of colors.

  • Pick a desired profileID from the list of predefined profiles declared in the XML profile definitions file.

  • Place a User-Agent identifier in the profileID field and let Alembik use the appointed UserAgentResolver component to generate a set of transcoding parameters corresponding to the given device's capabilities (see the section below).

  • Apply the best-matching strategy for a passed User-Agent, which matches the transcoding parameters resolved by UserAgentResolver against the (sub)set of predefined profiles in order to choose the best fit (see the TranscodingUtils.setBestMatchingQuality() method).

The profile to be used for each job in the transcoding process is determined in the following way. If the examined transcoding job has the profileID field assigned (either in its TranscodingJob.Target or at the TranscodingRequest level), Alembik first tries to match it against the set of predefined profiles stored in the profile definitions file.

Each predefined profile is distinguished by its unique id and should contain a set of transcoding parameters related to a single media type only. The matching process is carried out in the org.alembik.TranscodingProfileRepository component. If a match is found there, the retrieved profile is combined with any transformations sent by the client.

The profile definitions file is supposed to be placed in the definition files directory under the name profile-defs.xml. The XML schema of the repository file can be found here; an exemplary profile definition might look as shown below:

<mts:transcodingProfile>
    <mts:profileID>VIDEO_PROFILE_EXAMPLE</mts:profileID>
    <!-- optional quality setting (0-ANY, 1-LOW, 2-MEDIUM and 3-HIGH)
    <mts:quality>2</mts:quality>
    -->

    <!-- optional hinting enabler (false by default)
    <mts:hintable>true</mts:hintable>
    -->

    <mts:transcodingParams>
        <video>
            <contentType>video/3gpp</contentType>
            <videoVisual>
                <codec>video/h263_0</codec>
                <bitRate>100000</bitRate>
                <frameRate>10</frameRate>
                <width>176</width>
                <height>144</height>
            </videoVisual>
            <videoAudio>
                <codec>audio/amr</codec>
                <bitRate>7950</bitRate>
                <samplingRate>8000</samplingRate>
                <channels>Mono</channels>
            </videoAudio>
        </video>
    </mts:transcodingParams>
</mts:transcodingProfile>

Otherwise, if no matching predefined profile is found, the profileID is passed through the User-Agent resolution procedure. If the value turns out to be a valid UA string, the TranscodingParams properties are resolved by combining the capabilities of the relevant mobile device with the values passed by the client (the latter having higher priority).

The most complex algorithm is involved in the "best matching" strategy. This option is activated by placing the transcodingQuality property with a relevant quality number in TranscodingJob.Target's extension data.

In this case the profileID is treated as a UA string and selected properties of the predefined profiles are compared with the capabilities of the relevant device. The analyzed parameters are: media type (currently only video and audio are supported), codec names and (optionally) screen dimensions. In case of video profiles, streaming support is considered only if the Streaming transformation is present (both on the client side and within the predefined profiles).

If the client chooses HIGH, MEDIUM or LOW quality matching, screen dimensions are not taken into account and the first found profile of the given quality is returned. If none fits, Alembik picks the first defined profile of ANY quality and the relevant media type (with streaming declared if required). When ANY quality matching is applied, all the levels are analyzed and the first encountered profile of the highest possible quality is returned.
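The selection rules above can be sketched roughly like this. The Profile class and quality encoding (0-ANY, 1-LOW, 2-MEDIUM, 3-HIGH) follow the profile definitions file; the matching code itself is an illustrative assumption, not the actual implementation behind TranscodingUtils.setBestMatchingQuality().

```java
// Rough sketch of the quality-matching fallback rules described above.
import java.util.List;

public class QualityMatchSketch {
    public static class Profile {
        public final String id;
        public final int quality;
        public Profile(String id, int quality) { this.id = id; this.quality = quality; }
    }

    public static Profile match(List<Profile> profiles, int requested) {
        if (requested != 0) {
            // HIGH/MEDIUM/LOW: first profile of the requested quality...
            for (Profile p : profiles)
                if (p.quality == requested) return p;
            // ...otherwise fall back to the first ANY-quality profile
            for (Profile p : profiles)
                if (p.quality == 0) return p;
            return null;
        }
        // ANY: first profile of the highest quality present
        for (int q = 3; q >= 0; q--)
            for (Profile p : profiles)
                if (p.quality == q) return p;
        return null;
    }
}
```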

Another aspect comes with the transcodingHintable property, supposed to be placed in TranscodingJob.Target's extension data. If present and set to true, the property makes the matching algorithm aware that certain devices (e.g. those backed by the Android OS) require hinted video files to perform progressive download. Hence, in case of such a device, only the "hintable" predefined profiles are taken into account for matching and the resulting set of transcoding parameters will contain the hinting transformation.

 

Alembik delegates User-Agent resolution to an org.alembik.resolver.UserAgentResolver implementation obtained via the UserAgentResolverFactory class. The default implementation provided in each Alembik distribution is org.alembik.wurfl.WurflUAResolver, which evaluates parameters for a given device making use of the WURFL repository.

Basically the WURFL resolver provides the mapping between OMA transcoding parameters and WURFL capabilities. In particular it fills TranscodingParams' numeric values (e.g. picture dimensions, size limits, audio bit and sampling rates), performs features support checking (e.g. markup language, stereo channels, colors) and translates WURFL codec and type names into mime format.

In case of image, audio and text capabilities the resolution is pretty straightforward; the treatment of video requests is slightly more complicated. First of all, the resolver tries to evaluate the supported video architecture (a.k.a. container) along with the appropriate video and audio codec pair. Since the video media processor (and the streaming server) may not be able to handle all possible formats, Alembik defines a list of supported architectures and codecs (with download and streaming modes separated) in a property file. The order within each category is significant and gives priority to an earlier entry (when more than one is supported).

video.properties

 architecture.download.3gp
 architecture.download.3g2
 architecture.download.mp4
 architecture.download.wmv
 architecture.download.mov

 architecture.streaming.3gp
 #architecture.streaming.3g2
 architecture.streaming.mp4
 #architecture.streaming.wmv
 #architecture.streaming.mov

 videocodec.mpeg4
 videocodec.h263
 videocodec.h264

 audiocodec.aac
 audiocodec.amr

The default settings may be overridden by placing a custom video.properties file in the definition files directory. Note that a similar (albeit much simpler) support hierarchy is defined for audio codecs too. Here the default configuration may be changed by placing a custom audio.properties file into the aforementioned folder.
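The priority ordering can be illustrated with a small parser sketch. The key names follow the listing above; the parsing logic itself (presence of an uncommented key means "supported", '#' disables an entry) is an assumption about how Alembik reads the file.

```java
// Sketch: deriving codec/architecture priority from the ordered lines
// of a video.properties-style file.
import java.util.ArrayList;
import java.util.List;

public class CodecPrioritySketch {
    // returns the supported entries for one category, highest priority first
    public static List<String> enabled(List<String> lines, String prefix) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String key = line.trim();
            // '#' comments out an entry (e.g. streaming-disabled architectures)
            if (!key.startsWith("#") && key.startsWith(prefix))
                out.add(key.substring(prefix.length()));
        }
        return out;
    }
}
```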

Once a particular video architecture or audio codec has been selected, Alembik evaluates the relevant transcoding parameters through a separate WURFL extension library, which assembles predefined sets of values in its XML configuration file. Here the video transcoding parameters (bit rate, frame rate and picture size) are evaluated by matching the existing video configurations (of a given codec) against the device screen dimensions and codec level (both retrieved from WURFL). For the audio part (bit rate, sampling rate and channels), the highest possible configuration is retrieved (with a 96 kbit/s upper limit for the bit rate).

Note that if WURFL does not provide sufficient information on the codecs of a particular device, Alembik makes an assumption based on codec popularity.

Video architecture    Default video codec    Default audio codec
3GPP                  h263_0                 amr
3GPP2                 mpeg4_sp               amr
MP4                   mpeg4_sp               aac_lc
MOV                   h263_0                 amr
Windows Media 7       wmv1                   wmav1
Windows Media 8       wmv2                   wmav2
Windows Media 9       wmv3                   wmav3
Real Media 8          rv30                   aac_lc
Real Media 9          rv40                   aac_lc
Real Media 10         rv50                   aac_lc

 

While it is the org.alembik.TranscodingServiceImpl class that is responsible for routing incoming requests to appropriate MediaProcessor instances, the real job is started by org.alembik.processing.MediaProcessorFactory.

The sequence schema of this interaction is shown in the figure below.

 

Thus the TranscodingServiceImpl class first receives a MediaProcessor instance from MediaProcessorFactory and then invokes the process(TranscodingJob, JobResult) method on it (see below).

The architecture is open to the introduction of other MediaProcessor implementations; it is fairly easy to substitute an existing processor or the processor selection policy without changing a single line of code.

 

The media processor is a standard component whose principal role is to perform the transcoding operation. Each one has to implement the org.alembik.processing.MediaProcessor interface.

org.alembik.processing.MediaProcessor
public interface MediaProcessor
{
   public void init (Properties initProps);

   public void process (TranscodingJob job, JobResult result) throws TranscodingException;
}

At the moment there are five media processors implemented for the project:

  • org.alembik.processing.DefaultMediaProcessor

  • org.alembik.processing.ImageMagickMediaProcessor
  • org.alembik.processing.gaia.GAIAMediaProcessor
  • org.alembik.processing.FFMpegMediaProcessor
  • org.alembik.processing.WebRenderingProcessor

DefaultMediaProcessor is actually a processor controller. It receives all requests from TranscodingServiceImpl and dispatches them to the pertinent MediaProcessors. The actual media processor is selected according to the media type of the target result. DefaultMediaProcessor assumes only one MediaProcessor implementation per media type (image, audio, video and text); the actual class name is assigned by the Configurator module.
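The dispatch policy might be sketched as follows. The map contents and the String-based processor contract are illustrative assumptions; in Alembik the concrete class names come from the Configurator module.

```java
// Sketch of DefaultMediaProcessor's dispatch-by-media-type policy.
import java.util.Map;

public class DispatchSketch {
    interface MediaProcessor { String process(String job); }

    // exactly one processor per media type, as DefaultMediaProcessor assumes
    static final Map<String, MediaProcessor> BY_TYPE = Map.<String, MediaProcessor>of(
            "image", job -> "image-processed:" + job,
            "video", job -> "video-processed:" + job);

    // the media type of the target content type selects the processor
    public static String process(String targetContentType, String job) {
        String mediaType = targetContentType.substring(0, targetContentType.indexOf('/'));
        MediaProcessor p = BY_TYPE.get(mediaType);
        if (p == null)
            throw new IllegalArgumentException("no processor for media type " + mediaType);
        return p.process(job);
    }
}
```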

Among the actual processors there are two performing image processing: ImageMagickMediaProcessor and GAIAMediaProcessor. They are based on the C++ ImageMagick project and on the pure Java GAIA-GIT project, respectively. The FFMpegMediaProcessor is used for audio and video transcoding and is based on the C/assembler FFmpeg libraries. The last one, WebRenderingProcessor, is again a pure Java module, whose only purpose is text transcoding, mainly of HTML web content.

The processors based on external libraries, namely ImageMagick and FFmpeg, use the command execution pattern (java.lang.Runtime.getRuntime().exec("...")) to transcode files. Although there are several Java wrapper projects (based on JNI or JNA) available for both transcoders, none has been incorporated so far due to their various limitations. Fortunately both transcoders are open source and have releases for all popular operating systems. Moreover, despite their vast range of supported codecs and broad scope of functionality, additional components had to be introduced: the ImageMagick processor was enhanced with Batik library support for processing SVG files (org.alembik.processing.SVGUtility), while the FFmpeg processor received backing from the MP4Box utility for streaming preprocessing (org.alembik.processing.MP4BoxUtility).

As mentioned above, the GAIA processor is a pure Java image transcoder implementation, largely based on the JAI library. The module provides a set of Java encoders and decoders for the most popular image formats: JPG, GIF, PNG, BMP and WBMP. Once a media file is decoded, control is passed to the chain of TransformationProcessor implementations supplied by org.alembik.processing.gaia.TransformationProcessorFactory. The factory offers a simple contract for matching any possible transformation type with its implementation class name: the transformation type (name) is simply appended to the package path, which consists of the org.alembik.processing.gaia prefix and the suffix of the media type it serves (image, audio, video, etc.).
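The naming contract can be sketched in a few lines; the exact package layout is inferred from the description above and should be treated as an assumption.

```java
// Sketch of the TransformationProcessorFactory naming contract.
public class FactoryNameSketch {
    public static String processorClassName(String mediaType, String transformationType) {
        // prefix + media-type suffix + transformation type (name)
        return "org.alembik.processing.gaia." + mediaType + "." + transformationType;
    }
}
```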

The following table gives a summary of the Transformation types currently supported by each media processor.

Transformation    Source       ImageMagick     GAIA            FFMpeg          WebRendering
                  media type   MediaProcessor  MediaProcessor  MediaProcessor  Processor

AnimatedToStatic  Image        V               V               n/a             n/a
Brightness                     V               X
Color                          V               X
Contrast                       V               X
Cropping                       V               V
FrameFill                      V               X
FrameRateOutput                X               X
FrameRateSample                X               X
LevelCorrection                V               X
Mirror                         V               V
NoiseReduction                 V               X
NumberOfFrames                 X               X
OverlayLogo                    V               V
Rotation                       V               X
Sharpen                        V               X
TextOverlay                    X               V

AGC               Audio        n/a             n/a             X               n/a
DurationLimit                                                  V
Offset                                                         V
Streaming                                                      X

Advertisement     Video        ---             n/a             V               n/a
DurationLimit                                                  V
Hinting                                                        V
Offset                                                         V
Streaming                                                      V
AGC               audio                                        X
Brightness        visual                                       X
Color                                                          X
Contrast                                                       X
Cropping                                                       X
ExtractFrame                                                   V
FrameFill                                                      X
LevelCorrection                                                X
Mirror                                                         X
NoiseReduction                                                 X
OverlayLogo                                                    V
Rotation                                                       X
Sharpen                                                        X
TextOverlay                                                    V

OnlyText          Text         n/a             n/a             n/a             V
NoLinks                                                                        V
OrganizeText                                                                   V
NavigBar                                                                       V
The description of each transformation can be found in the tag library section or, in case of Text-related transformations, in the web rendering section. Moreover, the TranscodingUtils class offers a handful of utility methods for creating transformations with their respective attributes set.

 

When handling asynchronous jobs, Alembik stores the current state of each task in the org.alembik.monitoring.MonitoringCache component. Its main purpose is to deliver the status of a job processed asynchronously to clients upon request. Each particular TranscodingJob is distinguished by a hash key value generated and stored in its extension properties (by the org.alembik.util.Utils.generateCacheKey() method). The actual object stored in the cache is a JobResult, whose output MediaProperties are filled with TranscodingState and TranscodingUtils.FileInfo-related values as the transcoding operation progresses.

The MonitoringCache's filling and retrieval operations are thread-safe. Moreover, the cache is aware that several transcoding tasks with the same hash key can be processed in parallel; if that is the case, it maintains the state according to the processing sequence algorithm. The implementation is based on the internal cache framework, which is described below.
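A hedged sketch of such a cache follows: thread-safe updates keyed by the generated hash, with a per-key state sequence standing in for the processing sequence algorithm. The class shape is an assumed simplification, not the actual MonitoringCache code.

```java
// Sketch of a MonitoringCache-style component.
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

public class MonitoringCacheSketch {
    // per-key state sequence: jobs sharing a hash key append in processing order
    private static final ConcurrentMap<String, Queue<String>> CACHE = new ConcurrentHashMap<>();

    // record a state change and return the latest state for the key
    public static String update(String key, String state) {
        CACHE.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(state);
        return latest(key);
    }

    // the status reported to a polling client; null for unknown keys
    public static String latest(String key) {
        Queue<String> states = CACHE.get(key);
        if (states == null) return null;
        String last = null;
        for (String s : states) last = s;
        return last;
    }
}
```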

 

The Alembik server supports streaming of videos. When a client requests the streaming transformation within a transcoding job, FFmpegMediaProcessor has to perform an additional task: straight after finishing the video transcoding it executes an external program called MP4Box (based on the GPAC library), which hints the video (i.e. inserts special marks into the file body). Finally, instead of returning the standard storage URL, the server produces an RTSP link to the streaming server, which is supposed to stream the video to the client.

Alembik does not implement streaming internally; it uses the Darwin Streaming Server instead. Basically Darwin supports video streaming for two formats: 3gp and mp4. Its only requirement is the aforementioned hinting of a video file to be streamed.
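The final URL switch could look roughly like this; the streaming host and path mapping here are made-up assumptions, not Alembik's actual RTSP URL scheme.

```java
// Sketch: after MP4Box hinting, the hinted file is served over RTSP
// from the streaming server instead of the storage URL.
public class StreamingUrlSketch {
    public static String toRtspLink(String storageUrl, String streamingHost) {
        String file = storageUrl.substring(storageUrl.lastIndexOf('/') + 1);
        return "rtsp://" + streamingHost + "/" + file;
    }
}
```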

Certain devices (e.g. the gPhone) require hinting for so-called progressive download; in this case Alembik offers a distinct hinting transformation, which only marks the target file (it changes neither the protocol nor the URL).

 

In certain places of the framework, in-memory cache support is required. For all such cases Alembik provides its own cache framework. Its central point is the org.alembik.cache.Cache interface, whose instances can be retrieved via a CacheLocator implementation. In standard server mode it is the LocalCacheLocator that provides the desired cache instance. The actual implementation class is chosen on the basis of the cache.class property passed to the locator. At the moment Alembik offers three different options:

 

 
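Whichever implementation is configured, the lookup contract stays the same. Below is a sketch of the cache.class-driven lookup; the Cache interface shape and the reflective instantiation are assumptions based on the description above.

```java
// Sketch of a CacheLocator-style, property-driven cache lookup.
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CacheLocatorSketch {
    public interface Cache {
        void put(String key, Object value);
        Object get(String key);
    }

    // a trivial default implementation used when cache.class is not set
    public static class MapCache implements Cache {
        private final Map<String, Object> map = new HashMap<>();
        public void put(String key, Object value) { map.put(key, value); }
        public Object get(String key) { return map.get(key); }
    }

    // LocalCacheLocator-style lookup: instantiate the configured class
    public static Cache locate(Properties props) {
        String className = props.getProperty("cache.class", MapCache.class.getName());
        try {
            return (Cache) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```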

Apart from transcoding image, audio and video files, Alembik also offers text transcoding support, i.e. transformation of HTML documents (including embedded media content) into various mobile-supported formats (XHTML, WML). The heart of the web rendering module is org.alembik.processing.WebRenderingProcessor, which controls the flow of the transcoding process.

The full transcoding operation consists of three phases: parsing, organization and adaptation. The first one is supervised by org.alembik.webrendering.parser.Parser, which reads the original HTML file and converts it into a pure XHTML document (with the NekoHTML utility) with all CSS styles inlined (through the adapted CSS4J library).

Then control is passed to the org.alembik.webrendering.organizer.Organizer component, which extracts a set of applicable CSS styles and detects the main logical blocks of the document (menu, logo, forms and main text sections). By the end of this phase the analyzed document, with its CSS styles extracted, is ready to be adapted according to a particular set of transcoding parameters. At this point WebRenderingProcessor puts the prepared data into the cache, which improves transcoding performance for subsequent transcoding requests for the same HTML file.

Finally, org.alembik.webrendering.adapter.Adapter comes into play; it transforms the document into the requested format, reorganizes its logical parts to best fit the device's screen, inlines a relevant subset of CSS styles and, whenever necessary, divides the transcoded result into smaller pages. The component also handles all media content (like images or videos) embedded in the document, as well as internal links to other HTML pages. The original URLs are replaced with Alembik-based links, which redirect the client to the HTTP transcoding servlet (with an appropriate set of transcoding parameters) to handle the referenced content.

When a size limit is imposed and the transcoded document exceeds it, the processor divides the result into two or more pages. An additional navigation bar is then added in the footer of each page, with the current page number and links to the other pages. Moreover, all menus and forms identified by the Organizer (if any) are also extracted from the main page(s) and made accessible via appropriate links located in the page bodies. In any case, the transcoded result location returned to the client always refers to the first page.
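The splitting step can be illustrated with a simplistic paginator; the real processor splits the parsed document at markup boundaries, not at raw character offsets.

```java
// Simplistic sketch of size-limit pagination.
import java.util.ArrayList;
import java.util.List;

public class PaginationSketch {
    public static List<String> paginate(String text, int sizeLimit) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < text.length(); i += sizeLimit)
            pages.add(text.substring(i, Math.min(text.length(), i + sizeLimit)));
        return pages;
    }
}
```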

Thus the web rendering module allows clients to browse the Internet (via a device browser) starting from the first transcoded page. When it comes to error handling in this mode, Alembik takes a very simplistic approach: whenever a page cannot be transcoded, the server redirects the browser to a predefined error page.

Advanced HTML support

Alembik is capable of handling more complex HTML documents too. Its web rendering engine handles HTML frames and iframes, joining their content into the result file. Additionally it offers basic support for JavaScript, mainly in the parts responsible for site redirections and dynamic HTML generation.

In the case of embedded videos, a 1.5 MB preview is asynchronously scheduled and made available through an additional link. At the moment only the YouTube and DailyMotion players are handled properly.

Text-based parameters

The typical client for HTML document transcoding is a mobile browser, hence transcoding parameters are most often created on the basis of User-Agent resolution. However, it is also possible to specify those parameters manually. Since the OMA specification does not cover the web rendering domain well, certain extension properties have been introduced into the API.

The first of OMA's Text extension data parameters is mrkupLang, which specifies one of the following mobile-based markup language formats:

  • WML_1_1,
  • WML_1_2,
  • WML_1_3,
  • XHTML_BASIC_1_0,
  • XHTML_BASIC_1_1,
  • XHTML_MP_1_0,
  • XHTML_MP_1_1,
  • XHTML_MP_1_2,

where MP stands for Mobile Profile. The TranscodingUtils.setMarkupLanguage() method provides a convenient interface for handling that setting.

Another property is cssVers, whose presence controls the treatment of CSS files. Currently two modes are supported: NO_CSS (which disables any CSS processing) and CSS_MP (which provides support for the CSS Mobile Profile standard). As above, a relevant utility method is available: TranscodingUtils.setCSSVersion.

Last but not least, there is the pair of Text's screenWidth and screenHeight extension properties, which specify the screen dimensions of the target device (in pixels). One may adjust them via the TranscodingUtils.setDimensions method.

While OMA's Text size limit parameter controls the weight of the transcoded pages (which affects how many of them are produced from a particular HTML document), there is an additional parameter by which clients may adjust the weight measurement algorithm. The Text extension data's PageSizeEval property can have two values: TEXT_ONLY (the default), which tells the processor to count only pure text, and TEXT_WITH_TAGS, which makes HTML tags be taken into account as well.
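The two weighing modes can be sketched as follows; the tag-stripping regex is an illustrative assumption, since the real processor operates on the parsed document.

```java
// Sketch of the two PageSizeEval weighing modes.
public class PageWeightSketch {
    public static int weight(String html, boolean textWithTags) {
        if (textWithTags)
            return html.length();                       // TEXT_WITH_TAGS
        return html.replaceAll("<[^>]*>", "").length(); // TEXT_ONLY (default)
    }
}
```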

Another page weight-related feature comes with ImageRendering property. It may be set to one of the following values:

  • NO_IMAGES - excludes embedded images from the processed document (useful when transcoding documents for devices of limited memory),
  • CLIENT_PROCESSING - images are downloaded and transcoded on a separate request from a client browser (default value),
  • SERVER_PROCESSING - images are downloaded and transcoded during the document processing (slower, but with full control over the document content).

Text transformations

Alembik offers a couple of custom transformations. The first three come with no parameters; if present, they adjust the way the web rendering processor handles the original markup.

The NoLinks transformation prevents the engine from producing hyperlinks in the result page. OnlyText goes a step further and retains only the text of a rendered page. The last one, OrganizeText, makes the Adapter component rearrange the content of a source page by pushing its principal block of text to the fore.

A more complex transformation is NavigBar, which lets clients control the labels and URLs of the page navigation elements. Below is a list of the possible attributes:

  • back - the label of the link to a main page from its menu/form (defaults to 'Go back...'),
  • goTo - the label of the link from a main page to its menu/form (defaults to 'Go to'),
  • page - the current page label in the navigation bar (defaults to 'Page'),
  • home - the label of a declared home page in the navigation bar (by default not present),
  • url - the URL of a declared home page in the navigation bar (by default not present),
  • bgCol - the background RGB color of the navigation panel in HEX form (defaults to '#F7B223'),
  • fCol - the RGB color value of the navigation labels in HEX form (defaults to '#000000'),
  • lCol - the RGB color value of the navigation links in HEX form (defaults to '#FFFFFF'),
  • close - the label of the link closing opened menus, forms, etc. (defaults to 'Close').

For clients' convenience, the TranscodingUtils.setNavigationBarAttrs() method is made available.
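The defaults listed above can be sketched as a simple lookup; the map-based resolution is illustrative, not TranscodingUtils' actual code.

```java
// Sketch of NavigBar attribute resolution with the documented defaults.
import java.util.Map;

public class NavigBarSketch {
    static final Map<String, String> DEFAULTS = Map.of(
            "back", "Go back...", "goTo", "Go to", "page", "Page",
            "bgCol", "#F7B223", "fCol", "#000000", "lCol", "#FFFFFF",
            "close", "Close");

    // client-supplied attributes win; 'home' and 'url' have no defaults
    public static String attr(Map<String, String> clientAttrs, String name) {
        return clientAttrs.getOrDefault(name, DEFAULTS.get(name));
    }
}
```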

HTML-enabled devices detection

A client may choose to turn off web rendering for top-end devices which support HTML. The web rendering engine will then check whether the requesting device supports HTML and JavaScript and, if so, return the original page instead. This mode is disabled by default, although there is a special Text extension property called HTMLSupport which enforces it. Use the TranscodingUtils.setHTMLDetection utility method to switch it on/off.

Mobile-enabled sites detection

It is quite obvious that in browser mode (when the transcoding parameters are evaluated on a User-Agent basis) an HTML site that is already "mobilized" should not be transcoded. Following the rules for responsible reformatting, Alembik introduces three levels of mobile-friendly site detection: the default ACTIVE (completely transparent: all client headers remain intact), PASSIVE (which analyzes only URL syntax and metadata information in the returned content) and NONE.

All detection algorithms are implemented in the org.alembik.webrendering.utils.MobileWebDetection component, while a client may adjust the desired level by setting the MWDetect extension property in the Text object. As usual, a TranscodingUtils.setMobileWebDetectionLevel() utility method is supplied too.