Add DXGI_COLOR_SPACE_TYPE and SDR white level when available for HDR
characteristics.
GetPathInfo/IsInternalVideoOutput functions were copied from MS docs.
Despite DuplicateOutput deing documented to always return BGRA8, it is
seen fluctuating between BGRA8 and RGBA16F when HDR is enabled.
DuplicateOutput1 seems to provide a stable format whether SDR or HDR.
Also update the texture recreation logic to check for format changes.
Adds support for using shared textures that were made with the
D3D11_RESOURCE_MISC_SHARED_NTHANDLE flag.
This increases the minimum required Windows version to Windows 8 or the
Platform Update for Windows 7. As official support is only for Win 8+
this does not change official support.
Windows 11 support for DISCARD swap effect seems to be buggy and slow.
Use FLIP_DISCARD instead. Staying with DISCARD on Windows 10 because of
reports of flickering that hopefully won't happen on Windows 11.
We're also not using ALLOW_TEARING because it seems to break after an
OBS cycle of minimize/restore on both Windows 10 and 11. The swap chain
displays a stale image. Not sure if ALLOW_TEARING would be beneficial
even if it was working.
On systems that have both Intel iGPU and Intel dGPU at the same time,
default/prioritize running OBS the iGPU instead to improve performance.
The user can still choose the dGPU if they change the adapter index, but
the adapter index will now be the second value instead of the first
value. (-Jim)
GS_RGBA, GS_BGRX, and GS_BGRA now use TYPELESS DXGI formats, so we can
alias them between UNORM and UNORM_SRGB as necessary. GS_RGBA_UNORM,
GS_BGRX_UNORM, and GS_BGRA_UNORM have been added to support straight
UNORM types, which Windows requires for sharing textures from D3D9 and
OpenGL. The D3D path aliases via views, and GL aliases via
GL_EXT_texture_sRGB_decode/GL_FRAMEBUFFER_SRGB.
A significant amount of code has changed in the D3D/GL backends, but the
concepts are simple. On the D3D side, we need separate SRVs and RTVs to
support nonlinear/linear reads and writes. On the GL side, we need to
set the proper GL parameters to emulate the same.
Add gs_enable_framebuffer_srgb/gs_framebuffer_srgb_enabled to set/get
the framebuffer as SRGB or not.
Add gs_linear_srgb_active/gs_set_linear_srgb to instruct sources that
they should render as SRGB. Legacy sources can ignore this setting
without regression.
Update obs_source_draw to use linear SRGB as needed.
Update render_filter_tex to use linear SRGB as needed.
Add gs_effect_set_texture_srgb next to gs_effect_set_texture to set
texture with SRGB view instead.
Add SRGB helpers for vec4 struct.
Create GDI-compatible textures without SRGB support. Doesn't seem to
work with SRGB formats.
This reverts commit 4da73445c3.
This is being reverted because apparently it causes flickering displays
for some people. Bad drivers or something? Not sure. Very annoying.
Complex external systems using the D3D11 device may need to perform
their own device loss handling, the upcoming Windows Graphics Capture
support for example.
Because this did not have WINAPI (stdcall) specified as the calling
convention on the gdi32 export, caused a crash due to stack corruption
on the 32bit version of OBS.
We really shouldn't be resetting duplicator state as part of gs_flush.
gs_begin_scene is not ideal because it is called twice per frame, and
only after duplicators have been ticked. Even though it makes no
user-facing difference, it makes more logical sense to reset at the top
of the frame than the bottom.
Not in love with STL, but lets at least use the semantically-correct
collection. It's also a shame this is a global variable with gross
pre-main allocations, but attaching it to the device instance would
break the interface.
(This commit also modifies the UI)
This solves the issue where OBS would be deprioritized by Windows over
fullscreen games, causing OBS to lag out whereas the games would still
run fine.
Feature Level 9.3 appears to never have actually worked because shaders
are compiled as straight 4_0 instead of 4_0_level_9_3. That being the
case, baseline against 10_0 instead.
NV12 GPU copies to staging textures for CPU read take a ridiculously
long time on my integrated Intel GPU. Using R8/R8G8 instead seems to be
a huge speed-up.
Intel HD Graphics 530, D3D11 query timings, SetStablePowerState
NV12: ~3268 us (minimum of wild timings)
R8/R8G8: ~781 us (most frequently occurring timing)
This change only wraps the functionality. I have rough code to exercise
the the query functionality, but that part is not really clean enough to
submit.
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed. Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.
To keep this change simple, The filter chain will continue to use
straight alpha. Otherwise, every source would need to modified to output
premultiplied, and every filter modified for premultiplied input.
"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.
For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.
This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.
Fixes https://obsproject.com/mantis/view.php?id=1108
Add support for debug markers via D3DPERF API and KHR_debug. This makes
it easier to understand RenderDoc captures.
D3DPERF is preferred to ID3DUserDefinedAnnotation because it supports
colors. d3d9.lib is now linked in to support this.
This feature is disabled by default, and is controlled by
GS_USE_DEBUG_MARKERS.
From: obsproject/obs-studio#1799
Currently SrcBlendAlpha and DestBlendAlpha are both ONE, and can
combine together to form two. This is not a noticeable problem for
UNORM targets because the channels are clamped, but it will likely
become a problem if FLOAT targets are more widely used.
This change switches DestBlendAlpha to INVSRCALPHA, and starts
backgrounds as opaque black instead of transparent black. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
On some AMD systems, the test for bad NV12 output would come up negative
(output was good), but when recording/streaming, it would have bad
output. While the test worked correctly with NVIDIA systems, it did not
work in these cases. This was because we did not correctly reproduce
the exact conditions of green output because the textures used for the
test were not also keyed mutex shared textures. This fixes that issue
and ensures the test works correctly in those other niche cases.
Performs a test on a texture to determine if NV12 textures are
functioning correctly. Older NVIDIA drivers appear to have a bug where
they will output their UV channel data on to the Y channel when copying
from the GPU. This test on startup determines whether that bug is
occurring, and if so, disables NV12 texture support.
If a texture has to be rebuilt due to a driver reset and is a keyed
mutex shared texture, make sure to reacquire the shared handle and
acquire the lock.
When NV12 textures were added, driver crashes and their rebuild process
was not taken in to consideration. This fixes that by adding explicit
NV12 rebuild functions.
Gives the ability to retrieve param annotations. Blocks wrapped in <>
following a parameter.
For example:
float slider < float max_value = 10.0; float min_value = 0.0; >;
These blocks are not for shading purposes but to help describe the
shader's gui as in the example above.
Adds graphics api functions for retrieving annotations:
size_t gs_param_get_num_annotations(const gs_eparam_t *param);
gs_eparam_t *gs_param_get_annotation_by_idx(const gs_eparam_t *param,
size_t annotation);
gs_eparam_t *gs_param_get_annotation_by_name(const gs_eparam_t *param,
const char *name);
This commit fixes a bug that occurs on Windows 8+ when two or more
"Display Capture" sources are active that are configured to capture the
same monitor. Only one display capture would show, while all subsequent
display captures would display nothing.
Closesjp9000/obs-studio#1142
(Note: This commit also modifies libobs-d3d11 and libobs-opengl)
Allows the ability to flush data directly without having to use the
buffer's internal data.
Allows the caller to manage his/her own vertex/index buffer data if
desired, working around the design flaw of having to rely on a
vertex/index buffer's internal data.