- Enhance documentation;
- Fix the regular expression fallback on HTML embed watch page;
- Use HTML scripts tag search first instead of the regular expression approach,
now used as a last resort;
- Compile regular expressions only once, in order to improve the performance of
subsequent extraction calls when clearing the cache;
- Provide original exceptions when fetching or parsing pages on which the base
JavaScript's player could be found failed, allowing clients to detect network
errors when they are the cause of the failures for instance;
- Remove delegate method which was not taking a video ID and hardcoding one, as
we can provide the video ID in all cases or do not provide a video ID at worse;
- Rename and make extraction methods package-private, as they are not intended
to be used publicly.
These breaking internal changes have been applied where needed, in
YoutubeJavaScriptExtractorTest and YoutubeStreamExtractor (in which an unneeded
initStsFromPlayerJsIfNeeded call have been removed).
- Fix testCheckAudioStreams test of
YoutubeStreamExtractorDefaultTest.AudioTrackLanguage test class, by updating
the excepted audio track name test to use the updated English audio track name
(audio track type info has been added on the video tested);
- Fix YoutubeStreamExtractorDefaultTest.PublicBroadcasterTest test class by
using a different video from a French and German public broadcast channel, as
the channel Dinge Erklärt – Kurzgesagt is not affiliated with a public
broadcast channel anymore;
- Fix YoutubeStreamExtractorLivestreamTest test class, by updating the excepted
name of the livestream to the current one.
These parameters are the only ones currently known to bypass 403 HTTP issues
related to failure of passing Android client integrity checks, as the ones of
stories (and the base of the shorts ones) do not work anymore, which may be
related to end of this format on the service.
- Remove useless concatenation on the downloader path;
- Remove unneeded public test modifier;
- Update license header;
- Specify the service class tested instead of the generic class.
- Switch to the new domain used by YouTube for search suggestions,
suggestqueries-clients6.youtube.com, and add the xhr query parameter with the
t value, to allow getting responses without requiring trim;
- Use the Java 8 Stream API to collect search suggestions and improve invalid
response detection by checking whether the content type of the response
returned is JSON;
- Move the licence header at the top of the file.
When a description is missing, no description should be returned, even the ones
indicating there is no description. This behavior is represented by a null
return instead.
Also update PeertubeAccountExtractorTest to reflect these changes.
The tested comment has been removed, so it couldn't be found in the comments
list.
This comment has been replaced by a new one from the current comments of the
video.
Also, in the parent class PeertubeCommentsExtractorTest, final has been used as
much as possible and for-each loops of lists have been replaced by their
forEach method or the Stream API, in order to simplify code.
As the "No views" string is returned in the case there is no view on a video, a
number cannot be parsed in this case, so -1 was returned.
This string is now detected in all methods to get the view count of a stream.
Getting audio tracks locales by parsing their ID or their label, should not be
done by clients, but by the extractor.
This commit adds the ability to store the Locale of an AudioStream, which is
used to compare similar AudioStreams (in the equalStats method).
webCommandMetadata object is contained inside a commandMetadata one, so it is
not accessible from the root of the navigationEndpoint object.
The corresponding statement has been moved at the bottom of the specific
endpoints parsing, as the webCommandMetadata object is present almost
everywhere, otherwise URLs of some endpoints would have be changed, such as
uploader URLs (from channel IDs to handles).
As no ParsingException is now thrown by getUrlFromNavigationEndpoint, and so by
getTextFromObject, getUrlFromObject and getTextAtKey, the methods which were
catching ParsingExceptions thrown by these methods had to be updated.
URLs got in the HTML version of getTextFromObject are now escaped properly to
provide valid HTML to clients. This has been also done for attribute
descriptions, with the description text for this type of descriptions.
As YouTube descriptions are in HTML format (except for the fallback on the JSON
player response, which is plain text and only happens when there is no visual
metadata or a breaking change), all URLs returned are escaped, so tests which
are testing presence of URLs with escaped characters had to be updated (it was
only the case for YoutubeStreamExtractorDefaultTest.DescriptionTestUnboxing).
As like count is now returned by the extractor, we need to assert a positive
minimum like count, which is close to the actual value, in order to avoid test
failures due to lower like counts than the ones excepted.
- Remove unused imports;
- Replace wildcard imports by single class imports;
- Suppress "HTTP links are not secured" warnings from IDEA IDEs;
- Replace removed video jZViOEv90dI by an existing video, 9Dpqou5cI08 (the
corresponding test has been of course renamed).
- Move license header at the top;
- Use an unmodifiable set for the subpaths instead of a modifiable list;
- Add missing Nonnull and Nullable annotations;
- Improve exception messages.
The yt:channelId element doesn't provide the channel ID anymore and is empty,
like the id element, so we need now to extract it from the channel URL provided
in two elements: author -> uri and feed -> link.
Also avoid a NullPointerException in getUrl and getName methods.
- Return duration of video premieres;
- Add another non-localized method to determine whether a stream is a running
livestream;
- Return view count and upload date of videos in playlists;
- Store isPremiere result;
- Remove shorts workaround code, as it was only useful on channels and shorts
have been moved into a separated channel tab;
- Improve some other code.
This header was not sent partially before and was added and guessed by OkHttp. This can create issues when using other HTTP clients than OkHttp, such as Cronet.
Some code in the modified classes has been improved and / or deduplicated, and usages of the UTF_8 constant of the Utils class has been replaced by StandardCharsets.UTF_8 where possible.
Note that this header has been not added in except in YoutubeDashManifestCreatorsUtils, as an empty body is sent in the POST requests made by this class.
This header was not sent before and was added and guessed by OkHttp. This can create issues when using other HTTP clients than OkHttp, such as Cronet.
Also make use of StandardCharsets.UTF_8 when getting bytes of bodies instead of the platform default's charset, to make sure to prevent some encoding issues on some JVMs.