New Vulkan Extensions for Mobile: Maintenance Extensions

The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.

Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. In particular, Android R adds a whole set of new Vulkan extensions. These extensions will be available across various Android smartphones, including the Samsung Galaxy S21, which launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, can also upgrade to Android R.

One group of these new Vulkan extensions for mobile is the 'maintenance extensions'. These plug various holes in the Vulkan specification. A missing maintenance extension can usually be worked around, but doing so is tedious for application developers. Having these extensions means less friction overall, which is a very good thing.

VK_KHR_uniform_buffer_standard_layout

This extension is a quiet one, but I still feel it has a lot of impact since it removes a fundamental restriction for applications. Getting to data efficiently is the lifeblood of GPU programming.

One thing I have seen trip up developers again and again is the antiquated rules for how uniform buffers (UBOs) are laid out in memory. For whatever reason, UBOs have been stuck with awkward alignment rules that go back to ancient times, while SSBOs have sensible alignment rules. Why?

As an example, let us assume we want to send an array of floats to a shader:

#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
    float values[1024];
};

layout(location = 0) out vec4 FragColor;
layout(location = 0) flat in int vIndex;

void main()
{
    FragColor = vec4(values[vIndex]);
}

If you are not used to graphics API idiosyncrasies, this looks fine, but danger lurks around the corner. Any array in a UBO will be padded out to 16-byte elements, meaning the only way to get a tightly packed UBO is to use arrays of vec4. Legacy hardware was simply hardwired for this assumption; SSBOs never had this problem.
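Before this extension existed, the common workaround was to declare the array as vec4 and unpack components by hand. A sketch of that pattern:

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
    // 1024 floats packed as 256 vec4s so the data stays tight under std140.
    vec4 values[256];
};

layout(location = 0) out vec4 FragColor;
layout(location = 0) flat in int vIndex;

void main()
{
    // Element i lives in component (i & 3) of vec4 number (i >> 2).
    FragColor = vec4(values[vIndex >> 2][vIndex & 3]);
}
```

This works, but every array access now needs extra index arithmetic in the shader, which is exactly the kind of friction the extension removes.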

std140 vs std430

You might have run into these weird layout qualifiers in GLSL. They reference rather old GLSL versions: std140 refers to GLSL 1.40, which was introduced in OpenGL 3.1, the version in which uniform buffers came to OpenGL.

The std140 packing rules define how variables are packed into buffers. The main quirks of std140 are:

  • Vectors are aligned to their size. Notoriously, a vec3 is aligned to 16 bytes, which has tripped up countless programmers over the years, but this is just the nature of vectors in general: hardware tends to like aligned access to vectors.
  • Array element sizes are aligned to 16 bytes. This makes it very wasteful to use arrays of float and vec2.
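To make the waste concrete, here is a small sketch (plain C; the helper names are made up for illustration) that computes the total size of a float array under each scheme:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helpers (not part of any API): total size of a
 * float[n] array under the two packing schemes. */
static size_t std140_array_size(size_t n)
{
    /* std140: each array element is padded to a 16-byte stride. */
    return n * 16;
}

static size_t std430_array_size(size_t n)
{
    /* std430: elements are tightly packed at their natural 4-byte stride. */
    return n * sizeof(float);
}
```

For the float values[1024] array used in this article, std140 consumes 16 KiB of the UBO while std430 needs only 4 KiB, a 4x difference.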

The array quirk mirrors HLSL's cbuffer. After all, both OpenGL and D3D mapped to the same hardware. My assumption here is that the hardware of the time could only load 16 bytes at a time, with 16-byte alignment; extracting scalars could always be done after the load.

std430 was introduced in GLSL 4.30 in OpenGL 4.3 and was designed to be used with SSBOs. std430 removed the array element alignment rule, which means that with std430, we can express this efficiently:

#version 450

layout(set = 0, binding = 0, std430) readonly buffer SSBO
{
    float values[1024];
};

layout(location = 0) out vec4 FragColor;
layout(location = 0) flat in int vIndex;

void main()
{
    FragColor = vec4(values[vIndex]);
}

Basically, the new extension enables std430 layout for use with UBOs as well.

#version 450
#extension GL_EXT_scalar_block_layout : require

layout(set = 0, binding = 0, std430) uniform UBO
{
    float values[1024];
};

layout(location = 0) out vec4 FragColor;
layout(location = 0) flat in int vIndex;

void main()
{
    FragColor = vec4(values[vIndex]);
}

Why not just use SSBOs then?

On some architectures, yes, that is a valid workaround. However, some architectures also have special caches designed specifically for UBOs, so improving the memory layout of UBOs is still valuable.
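As with most optional functionality, the feature has to be queried and enabled at device creation. A sketch, assuming the physical device handle and the rest of device setup exist elsewhere:

```c
/* Sketch: querying support for VK_KHR_uniform_buffer_standard_layout. */
VkPhysicalDeviceUniformBufferStandardLayoutFeatures uboLayoutFeatures = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_UNIFORM_BUFFER_STANDARD_LAYOUT_FEATURES,
};
VkPhysicalDeviceFeatures2 features2 = {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2,
    .pNext = &uboLayoutFeatures,
};
vkGetPhysicalDeviceFeatures2(physicalDevice, &features2);

if (uboLayoutFeatures.uniformBufferStandardLayout)
{
    /* Chain the same feature struct into VkDeviceCreateInfo::pNext and
     * enable the VK_KHR_uniform_buffer_standard_layout extension. */
}
```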

GL_EXT_scalar_block_layout?

The Vulkan GLSL extension which supports std430 UBOs goes a little further and also supports the scalar layout. This is a fully relaxed layout scheme in which alignment requirements are essentially gone; however, it requires a different Vulkan extension (VK_EXT_scalar_block_layout) to work.

VK_KHR_separate_depth_stencil_layouts

Depth-stencil images are weird in general. It is natural to think of these two aspects as separate images. However, the reality is that some GPU architectures like to pack depth and stencil together into one image, especially with D24S8 formats.

Expressing image layouts with depth and stencil formats has therefore been somewhat awkward in Vulkan, especially if you want to make one aspect read-only while keeping the other read/write.

In Vulkan 1.0, both depth and stencil needed to be in the same image layout, meaning you are either doing read-only depth-stencil or read/write depth-stencil. This was quickly identified as not being good enough for certain use cases. There are valid use cases where depth is read-only while stencil is read/write, in deferred rendering for example.

Eventually, VK_KHR_maintenance2 added support for some mixed image layouts which lets us express read-only depth, read/write stencil, and vice versa:

VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_STENCIL_READ_ONLY_OPTIMAL_KHR

VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR

Usually, this is good enough, but there is a significant caveat to this approach: depth and stencil layouts must be specified and transitioned together. This means it is not possible to render to the depth aspect while concurrently transitioning the stencil aspect, since changing an image layout is a write operation. If an engine is not designed to couple depth and stencil together, this causes a lot of implementation friction.

What this extension does is completely decouple the image layouts of the depth and stencil aspects, making it possible to modify either layout in complete isolation. For example:

    VkImageMemoryBarrier barrier = {…};

Normally, we would have to specify both the DEPTH and STENCIL aspects for depth-stencil images. Now, we can ignore what stencil is doing entirely and modify only the depth image layout:

    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL_KHR;
    barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL_KHR;

Similarly, in VK_KHR_create_renderpass2, there are extension structures where you can specify stencil layouts separately from the depth layout if you wish.

typedef struct VkAttachmentDescriptionStencilLayout {
    VkStructureType    sType;
    void*              pNext;
    VkImageLayout      stencilInitialLayout;
    VkImageLayout      stencilFinalLayout;
} VkAttachmentDescriptionStencilLayout;

typedef struct VkAttachmentReferenceStencilLayout {
    VkStructureType    sType;
    void*              pNext;
    VkImageLayout      stencilLayout;
} VkAttachmentReferenceStencilLayout;

As with image memory barriers, it is possible to express layout transitions that occur only in the depth or only in the stencil aspect of an attachment.
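For example, a depth-stencil attachment where depth is read-only but stencil remains writable could be described like this (a sketch using the VK_KHR_create_renderpass2 structures; the attachment index is illustrative):

```c
/* Stencil stays writable... */
VkAttachmentReferenceStencilLayout stencilRef = {
    .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_STENCIL_LAYOUT,
    .stencilLayout = VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR,
};

/* ...while depth is read-only. */
VkAttachmentReference2 depthStencilRef = {
    .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
    .pNext = &stencilRef,
    .attachment = 0,
    .layout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL_KHR,
    .aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT,
};
```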

VK_KHR_spirv_1_4

Each core Vulkan version has targeted a specific SPIR-V version. For Vulkan 1.0, we have SPIR-V 1.0. For Vulkan 1.1, we have SPIR-V 1.3, and for Vulkan 1.2 we have SPIR-V 1.5.

SPIR-V 1.4 was an interim version between Vulkan 1.1 and 1.2 which added some nice features. However, the usefulness of this extension is mainly for developers who target SPIR-V directly; developers using GLSL or HLSL might not find much use for it. Some highlights of SPIR-V 1.4 that I think are worth mentioning are listed here.

OpSelect between composite objects

Before SPIR-V 1.4, OpSelect only supported selecting between scalars and vectors. SPIR-V 1.4 allows OpSelect to work on composite objects as well, so this kind of code can be expressed with a single OpSelect:

    MyStruct s = cond ? MyStruct(1, 2, 3) : MyStruct(4, 5, 6);
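In SPIR-V assembly, the ternary above can then map to a single instruction (the names here are illustrative):

```
; %MyStruct is a struct type, %cond a boolean, %a and %b struct constants.
; Before SPIR-V 1.4, this required a branch or per-member OpSelects.
%result = OpSelect %MyStruct %cond %a %b
```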

OpCopyLogical

There are scenarios in high-level languages where you load a struct from a buffer and then place it in a function variable. If you have ever looked at SPIR-V code for this kind of scenario, glslang copies each element of the struct one by one, which generates bloated SPIR-V. This is because the struct type that lives in a buffer and the struct type of a function variable are not necessarily the same; Offset decorations are the main culprit, and copying objects in SPIR-V only works when the types are exactly the same, not "almost the same". OpCopyLogical fixes this by allowing copies between objects whose types are identical except for decorations.

Advanced loop control hints

SPIR-V 1.4 adds ways to express partial unrolling, expected iteration counts and similar advanced hints, which can help a driver optimize better using knowledge it otherwise would not have. There is no way to express these in the common shading languages yet, but adding support should not be difficult.
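As a sketch of what such a hint looks like at the SPIR-V level, a loop header could request partial unrolling by a factor of 4 (block names are illustrative):

```
; SPIR-V 1.4 loop control: ask the compiler to partially unroll by 4.
OpLoopMerge %merge_block %continue_block PartialCount 4
OpBranch %loop_body
```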

Explicit look-up tables

Describing look-up tables in SPIR-V used to be a bit awkward. The natural way to do this in SPIR-V 1.3 is to declare an array with private storage scope and an initializer, access chain into it, and load from it. However, there was never a way to express that such a global variable is const, which relies on compilers being a little smart. As a case study, let us see what glslang emits when targeting the Vulkan 1.1 environment:

#version 450

layout(location = 0) out float FragColor;
layout(location = 0) flat in int vIndex;

const float LUT[4] = float[](1.0, 2.0, 3.0, 4.0);

void main()
{
    FragColor = LUT[vIndex];
}

%float_1 = OpConstant %float 1
%float_2 = OpConstant %float 2
%float_3 = OpConstant %float 3
%float_4 = OpConstant %float 4
%16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4

The code below is rather strange, but it is easy for compilers to promote to a LUT: if the compiler can prove there are no readers before the OpStore, and that only one OpStore can statically execute, it can promote the variable to a constant LUT.

%indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function
OpStore %indexable %16
%24 = OpAccessChain %_ptr_Function_float %indexable %index
%25 = OpLoad %float %24

In SPIR-V 1.4, the NonWritable decoration can also be used with Private and Function storage variables. Add an initializer, and we get something that looks far more reasonable and obvious:

OpDecorate %indexable NonWritable
%16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4

// Initialize an array with a constant expression and mark it as NonWritable.
// This is trivially a LUT.
%indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function %16
%24 = OpAccessChain %_ptr_Function_float %indexable %index
%25 = OpLoad %float %24

VK_KHR_shader_subgroup_extended_types

This extension fixes a hole in Vulkan subgroup support. When subgroups were introduced, it was only possible to use subgroup operations on 32-bit values. However, with 16-bit arithmetic becoming more popular, especially float16, there are use cases where you want subgroup operations on smaller arithmetic types, making this kind of shader possible:

#version 450

// For subgroupAdd
#extension GL_KHR_shader_subgroup_arithmetic : require
// For FP16 arithmetic
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
// For subgroup operations on FP16
#extension GL_EXT_shader_subgroup_extended_types_float16 : require

layout(location = 0) out f16vec4 FragColor;
layout(location = 0) in f16vec4 vColor;

void main()
{
    FragColor = subgroupAdd(vColor);
}

VK_KHR_imageless_framebuffer

In most engines, using VkFramebuffer objects can feel a bit awkward, since most engine abstractions are based around some idea of:

MyRenderAPI::BindRenderTargets(colorAttachments, depthStencilAttachment)

In this model, VkFramebuffer objects introduce a lot of friction, since engines almost certainly end up with one of two strategies:

  • Create a VkFramebuffer for every render pass, free later.
  • Maintain a hashmap of all observed attachment and render-pass combinations.

Unfortunately, there are some … reasons why VkFramebuffer exists in the first place, but VK_KHR_imageless_framebuffer at least removes the largest pain point: needing to know the exact VkImageViews we are going to use before we actually start rendering.

With imageless framebuffers, we can defer choosing the exact VkImageViews we are going to render into until vkCmdBeginRenderPass. However, the framebuffer itself still needs certain metadata ahead of time, since some drivers unfortunately need this information.

First, we set the VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT flag in vkCreateFramebuffer. This removes the need to set pAttachments. Instead, we describe each attachment by passing this structure in the pNext chain:

typedef struct VkFramebufferAttachmentsCreateInfo {
    VkStructureType                            sType;
    const void*                                pNext;
    uint32_t                                   attachmentImageInfoCount;
    const VkFramebufferAttachmentImageInfo*    pAttachmentImageInfos;
} VkFramebufferAttachmentsCreateInfo;

typedef struct VkFramebufferAttachmentImageInfo {
    VkStructureType       sType;
    const void*           pNext;
    VkImageCreateFlags    flags;
    VkImageUsageFlags     usage;
    uint32_t              width;
    uint32_t              height;
    uint32_t              layerCount;
    uint32_t              viewFormatCount;
    const VkFormat*       pViewFormats;
} VkFramebufferAttachmentImageInfo;

Essentially, we need to specify almost everything that vkCreateImage would specify. The only thing we avoid is having to know the exact image views we need to use.

To begin a render pass which uses an imageless framebuffer, we instead pass this struct to vkCmdBeginRenderPass:

typedef struct VkRenderPassAttachmentBeginInfo {
    VkStructureType   sType;
    const void*       pNext;
    uint32_t          attachmentCount;
    const VkImageView* pAttachments;
} VkRenderPassAttachmentBeginInfo;
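Putting the pieces together, here is a sketch of the whole flow for a single color attachment. The device, render pass, dimensions, format, image view, and command buffer are assumed to exist already:

```c
/* Describe the attachment's metadata without naming an image view. */
VkFramebufferAttachmentImageInfo imageInfo = {
    .sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_ATTACHMENT_IMAGE_INFO,
    .usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
    .width = width, .height = height, .layerCount = 1,
    .viewFormatCount = 1, .pViewFormats = &format,
};

VkFramebufferAttachmentsCreateInfo attachmentsInfo = {
    .sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_ATTACHMENTS_CREATE_INFO,
    .attachmentImageInfoCount = 1,
    .pAttachmentImageInfos = &imageInfo,
};

VkFramebufferCreateInfo fbInfo = {
    .sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
    .pNext = &attachmentsInfo,
    .flags = VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT,
    .renderPass = renderPass,
    .attachmentCount = 1, /* pAttachments stays NULL */
    .width = width, .height = height, .layers = 1,
};
vkCreateFramebuffer(device, &fbInfo, NULL, &framebuffer);

/* Later, once the actual image view is known: */
VkRenderPassAttachmentBeginInfo attachmentBegin = {
    .sType = VK_STRUCTURE_TYPE_RENDER_PASS_ATTACHMENT_BEGIN_INFO,
    .attachmentCount = 1,
    .pAttachments = &imageView,
};

VkRenderPassBeginInfo beginInfo = {
    .sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
    .pNext = &attachmentBegin,
    .renderPass = renderPass,
    .framebuffer = framebuffer,
    .renderArea = { {0, 0}, {width, height} },
};
vkCmdBeginRenderPass(cmd, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);
```

Note that attachmentCount in VkFramebufferCreateInfo must still match the render pass, even though pAttachments itself is ignored.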

Conclusions

Overall, I feel this extension does not really solve the problem of having to know images up front. Knowing the resolution and usage flags of all attachments up front is almost the same as knowing the image views up front. If your engine knows all this information up front, just not the exact image views, then this extension can be useful, and the number of unique VkFramebuffer objects will likely go down as well. Otherwise, in my personal view, there is still room to greatly improve things.

In the next blog on the new Vulkan extensions, I explore 'legacy support extensions.'

Follow Up

Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games.

The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
