Some code in these classes has also been refactored, improved and optimized.
Also fix the extraction of PeerTube audio streams as video streams, which are now returned as audio streams.
These fields can now be replaced by a getter and a setter.
New fields have been added, which will allow the creation of DASH manifests for OTF and ended livestreams. They are:
- contentLength;
- approxDurationMs;
- targetDurationSec;
- sampleRate;
- audioChannels.
DashMpdParser only works with YouTube streams, as it uses the ItagItem class.
Also improve the code and comments of StreamInfo (especially the use of final where possible).
Stream constructors are now private and streams are constructed with a new Builder class per stream class. This change avoids having to create and use several constructors in stream classes.
Some defaults have also been added to these Builder classes, so not everything has to be set, depending on the service and the content.
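A minimal sketch of the private-constructor-plus-Builder approach described above. The class name, fields and the UNKNOWN_BITRATE default are hypothetical and only illustrate the pattern; they are not the extractor's actual stream classes.

```java
import java.util.Objects;

/** Hypothetical stream class; names are illustrative, not the extractor's API. */
final class SketchAudioStream {
    /** Assumed sentinel used as the default when no bitrate is set. */
    static final int UNKNOWN_BITRATE = -1;

    private final String id;
    private final String content;
    private final int averageBitrate;

    // Private: instances can only be created through the Builder.
    private SketchAudioStream(final String id, final String content,
                              final int averageBitrate) {
        this.id = id;
        this.content = content;
        this.averageBitrate = averageBitrate;
    }

    String getId() {
        return id;
    }

    String getContent() {
        return content;
    }

    int getAverageBitrate() {
        return averageBitrate;
    }

    static final class Builder {
        private String id;
        private String content;
        // Default value, so services without this information can skip it.
        private int averageBitrate = UNKNOWN_BITRATE;

        Builder setId(final String id) {
            this.id = id;
            return this;
        }

        Builder setContent(final String content) {
            this.content = content;
            return this;
        }

        Builder setAverageBitrate(final int averageBitrate) {
            this.averageBitrate = averageBitrate;
            return this;
        }

        SketchAudioStream build() {
            // Mandatory properties are checked once, here.
            Objects.requireNonNull(id, "id must be set");
            Objects.requireNonNull(content, "content must be set");
            return new SketchAudioStream(id, content, averageBitrate);
        }
    }
}
```

A caller would write something like new SketchAudioStream.Builder().setId("a1").setContent(url).build() and leave the bitrate at its default.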
Google now returns the YouTube consent page for YouTube Music in the EU, which can also be avoided by adding the ucbcb parameter with the value 1 to the URL ("?ucbcb=1").
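As an illustration only, appending the parameter could look like the sketch below. The helper name withUcbcb is hypothetical, and URL fragments are deliberately not handled.

```java
/** Hypothetical helper; not part of the extractor. */
final class ConsentParamSketch {
    private static final String UCBCB = "ucbcb=1";

    /** Appends ucbcb=1, using '?' or '&' depending on an existing query string. */
    static String withUcbcb(final String url) {
        if (url.contains("?" + UCBCB) || url.contains("&" + UCBCB)) {
            return url; // already present, nothing to do
        }
        return url + (url.contains("?") ? "&" : "?") + UCBCB;
    }
}
```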
* Fixed obvious sonar(lint) warnings
* Abstracted some code (get*Streams)
* Added some line breaks to make the code more readable
* Chopped down brace-jungle in some methods
* Use StandardCharsets (Java 8 4tw)
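For reference, a small sketch of what the StandardCharsets change buys: the charset constant removes the checked UnsupportedEncodingException that the String-named variant forces callers to handle. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing charset constants instead of charset names. */
final class CharsetSketch {
    static byte[] encode(final String text) {
        // Unlike text.getBytes("UTF-8"), no checked exception has to be caught.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(final byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```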
Mixes no longer seem to be returned by YouTube when a PENDING consent cookie value is used.
As the mocks need to be updated, the test always fails because of this change.
Use the TV embedded client technique to get streams of embeddable age-restricted videos.
This client doesn't provide the playerMicroFormatRenderer object in the player response, but it is still returned in the WEB player response, even for unavailable (but non-private) content, so we now need to store it, as we are replacing the player response from the WEB client with the TV embedded one.
Otherwise, some metadata would be missing, such as the unlisted property, the category, and the uploadDate and publishDate properties.
The outdated code for these contents has been removed.
Add the racyCheckOk and contentCheckOk parameters to player and next requests to the InnerTube API.
The first doesn't seem to make any difference when used anonymously, but the second one is needed to get streams of contents with a warning before they can be played.
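A rough sketch of a request body carrying both flags. It is modelled as a plain Map rather than with the extractor's JSON builder; the videoId key and the omission of the client context object are simplifying assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a player/next request body; the context object is omitted. */
final class PlayerBodySketch {
    static Map<String, Object> playerBody(final String videoId) {
        final Map<String, Object> body = new LinkedHashMap<>();
        body.put("videoId", videoId); // assumed key name
        // Needed to get streams of content shown behind a warning.
        body.put("contentCheckOk", true);
        // Seems to make no difference for anonymous requests, sent anyway.
        body.put("racyCheckOk", true);
        return body;
    }
}
```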
Also apply some requested changes, fixes and improvements in YoutubeParsingHelper and YoutubeStreamExtractor.
Also move the iPhone device machine id to a constant, explain how it is used, move the licence to the header of the file, and fix missing imports in YoutubeStreamExtractor (due to a rebase issue).
InnerTube returns pretty-printed responses, which increases their size for nothing.
By using the prettyPrint parameter on requests and setting its value to false, responses are not pretty printed anymore, which reduces response size, and so data transfer and processing times.
This usage has recently been deployed by YouTube on their websites.
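A sketch of how the parameter could be attached to an endpoint URL. The base URL, the key query parameter and the helper name are assumptions made for illustration.

```java
/** Hypothetical URL helper; base URL and key parameter are assumptions. */
final class InnerTubeUrlSketch {
    private static final String BASE_URL = "https://www.youtube.com/youtubei/v1/";
    static final String DISABLE_PRETTY_PRINT_PARAMETER = "prettyPrint=false";

    /** Builds an endpoint URL that asks for a compact (not pretty-printed) response. */
    static String endpointUrl(final String endpoint, final String key) {
        return BASE_URL + endpoint + "?key=" + key + "&" + DISABLE_PRETTY_PRINT_PARAMETER;
    }
}
```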
In order to still use mocks with the generation of random strings in player requests, we need to use the YoutubeParsingHelper.setSeedForVideoTests() method in every stream test.
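Why a seed helps can be shown with a small sketch: with a fixed seed, the "random" strings are reproducible, so recorded mock requests keep matching. The alphabet, class and method names below are assumptions, not the helper's real implementation.

```java
import java.util.Random;

/** Hypothetical seedable random string generator for reproducible test requests. */
final class RandomStringSketch {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    private static Random random = new Random();

    /** Tests call this first so that every generated string is deterministic. */
    static void setSeed(final long seed) {
        random = new Random(seed);
    }

    static String generate(final int length) {
        final StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return builder.toString();
    }
}
```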
The iOS client is only enabled for livestreams and the Android client is now only enabled for videos, both by default.
A way to force, or not, the fetch of both clients has been added with two new static methods in YoutubeStreamExtractor.
This is done by fetching https://www.youtube.com/sw.js for YouTube and https://music.youtube.com/sw.js for YouTube Music.
Two new methods have been added to the Utils class: they try to match an array of regular expressions, given as strings or as Patterns, against some content and return the group at a specific index, or group 0.
Some code in this class has also been refactored.
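A minimal sketch of such a multi-pattern lookup, assuming the first matching pattern wins and that a failed lookup throws. The method name firstMatch and the exception type are hypothetical, not the signatures added to Utils.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical multi-pattern matcher; names do not mirror the Utils class. */
final class RegexArraySketch {
    /** Returns the requested group of the first pattern that matches the input. */
    static String firstMatch(final Pattern[] patterns, final String input, final int group) {
        for (final Pattern pattern : patterns) {
            final Matcher matcher = pattern.matcher(input);
            if (matcher.find()) {
                return matcher.group(group);
            }
        }
        throw new IllegalArgumentException("No pattern matched the input");
    }

    /** String variant: compiles each expression and returns group 0 (the whole match). */
    static String firstMatch(final String[] regexes, final String input) {
        final Pattern[] patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            patterns[i] = Pattern.compile(regexes[i]);
        }
        return firstMatch(patterns, input, 0);
    }
}
```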
The cpn param, aka the content playback nonce param, is a parameter sent by the YouTube web client in videoplayback requests, and for some of them, in the player request body. This PR adds it everywhere.
For the desktop/WEB client, some params were missing from the playbackContext object, which seemed (or not) to make YouTube throttle streams extracted from the WEB client. This PR adds them.
Fingerprinting the WEB client based on the client version used is no longer possible, because the latest client version is now extracted on the first YouTube request of a session, which requires the extractor to fetch the website again (this may unfortunately bring back the reCaptcha issues, but there seems to be no other way to get it).
For the Android client, the video id is now also sent as a query parameter, as is a 12-character string in the t query parameter, in order to better spoof this client. Research is needed on this parameter, which is unique to each request, and on how clients generate it.
This commit also fixes a small bug with the Android User-Agent string.
Some code improvements have also been made.
The boolean keyAndVersionExtracted in YoutubeParsingHelper was not set to false when resetting the client version and the key, which made the extractor use null the next time the client version or the key was requested, if they had been extracted before.
Also update client versions.
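The reset bug can be reproduced in a small sketch: if reset() clears the cached values but leaves the flag at true, the next getter call skips extraction and returns null. Only the field names come from the text above; the class, the Supplier-based extraction and the method names are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical cache mirroring the flag described above. */
final class ClientVersionCacheSketch {
    private static String clientVersion;
    private static String key;
    private static boolean keyAndVersionExtracted;

    /** The supplier stands in for the real extraction and returns {version, key}. */
    static String getClientVersion(final Supplier<String[]> extractor) {
        if (!keyAndVersionExtracted) {
            final String[] versionAndKey = extractor.get();
            clientVersion = versionAndKey[0];
            key = versionAndKey[1];
            keyAndVersionExtracted = true;
        }
        return clientVersion;
    }

    static void reset() {
        clientVersion = null;
        key = null;
        // Without this line, the next getClientVersion() call would return null.
        keyAndVersionExtracted = false;
    }
}
```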
That is, basically where the overriding function was missing an annotation from the base method.
Also apply renaming of emptyDescription to EMPTY_DESCRIPTION
``contentFilters`` and ``sortfilter`` are now obtained from the ``ListLinkHandler`` and not the ``ListLinkHandlerFactory``
``ListLinkHandlerFactory`` only passes these values through when ``fromQuery`` is called
With respect to NewPipe's checkstyle.xml, checkstyle is disabled for javadoc comments. There is no need for strict rules over comments here in the extractor, as sometimes javadocs are just needed to clarify a small thing and having empty/meaningless @param or @throws is useless.
Replaces mix tests based on a strange mix type RDQM{videoId} (only reference I could find is https://github.com/ytdl-org/youtube-dl/issues/26228) and with an invalid video id of 13 characters (the first two characters were QM, but even after removing QM there still wasn't a video available at that id).
Also updates mocks.
Note: genre mixes already worked; they are now just recognized as such in the various video id extraction code paths and in related items
Note 2: extracting a video id from a *normal* YouTube mix id now fails if the video id is not exactly 11 characters long
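The length check from Note 2 could be sketched as follows, assuming a normal mix id is the prefix RD followed by the video id. That prefix rule, the class and the method name are assumptions for illustration; the other mix types are not handled here.

```java
/** Hypothetical helper; assumes a normal mix id is "RD" + an 11-character video id. */
final class MixIdSketch {
    private static final String MIX_PREFIX = "RD"; // assumed prefix
    private static final int VIDEO_ID_LENGTH = 11;

    static String extractVideoId(final String mixId) {
        if (mixId == null || !mixId.startsWith(MIX_PREFIX)) {
            throw new IllegalArgumentException("Not a mix id: " + mixId);
        }
        final String videoId = mixId.substring(MIX_PREFIX.length());
        if (videoId.length() != VIDEO_ID_LENGTH) {
            // E.g. RDQM + 11 characters leaves 13 characters and is rejected.
            throw new IllegalArgumentException("Video id is not 11 characters long: " + videoId);
        }
        return videoId;
    }
}
```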
It is a collector that can handle many extractor types, to be used when a list contains items of different types (e.g. search). It was renamed from InfoItemsSearchCollector so that it can now be used not just for search but for any extractor needing it. It supports streams, channels, playlists and *mixes*.
ITEM_COUNT_UNKNOWN is returned when the JSON array which usually contains the number of videos has fewer than 3 items.
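A sketch of that guard, with the JSON array modelled as a List of strings. The value -1 for ITEM_COUNT_UNKNOWN and the position of the count (first element) are assumptions made for this example.

```java
import java.util.List;

/** Hypothetical sketch; the array layout and the constant value are assumptions. */
final class PlaylistCountSketch {
    static final long ITEM_COUNT_UNKNOWN = -1;

    static long parseItemCount(final List<String> stats) {
        // Fewer than 3 entries: the video count is not where we expect it.
        if (stats.size() < 3) {
            return ITEM_COUNT_UNKNOWN;
        }
        // Keep only the digits of e.g. "1,234 videos".
        final String digits = stats.get(0).replaceAll("\\D+", "");
        return digits.isEmpty() ? ITEM_COUNT_UNKNOWN : Long.parseLong(digits);
    }
}
```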
Also apply the same type of optimizations done in other PlaylistExtractors in YoutubePlaylistExtractor.
Also fix some issues in the extractors, remove unneeded overrides, use the Java 8 Stream API where possible and replace usages of Utils.UTF_8 with StandardCharsets.UTF_8 in these classes.
* Deprecated Utils#UTF_8; see StandardCharsets
* Added more helpful methods to ``ExtractorAsserts``
* Use parameterized (cool new) tests
* Restore functionality of some tests + updated mockdata
* Other code cleanups + Sonarlint improvements
csv:
Improved error messages
Exits early if it hasn't found any items in the first few lines
zip:
Now checks all CSV files instead of hard-coded paths
final qualifiers for immutable locals and parameters
Co-authored-by: litetex <40789489+litetex@users.noreply.github.com>
* Faster iframe api based player extraction.
Uses the IFrame API to reduce the required download to less than 1/50 of the size.
* Remove debug code.
* Extract to two methods.
* Add tests for player URL extraction.
* Add assertThat for tests.
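A sketch of the IFrame-API-based lookup, under loud assumptions: that the IFrame API script contains a path segment of the form player/HASH/ with an 8-character hash (slashes possibly escaped with backslashes), and that the player URL follows the template used below. Neither detail is stated in this commit message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; the hash pattern and URL template are assumptions. */
final class IframePlayerUrlSketch {
    // Matches "player/xxxxxxxx/" and the escaped form "player\/xxxxxxxx\/".
    private static final Pattern PLAYER_HASH =
            Pattern.compile("player\\\\?/([a-z0-9]{8})\\\\?/");

    static String extractPlayerUrl(final String iframeApiContent) {
        final Matcher matcher = PLAYER_HASH.matcher(iframeApiContent);
        if (!matcher.find()) {
            throw new IllegalStateException("Player hash not found in IFrame API content");
        }
        return "https://www.youtube.com/s/player/" + matcher.group(1)
                + "/player_ias.vflset/en_GB/base.js";
    }
}
```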
Without removing RunWith and SuiteClasses annotations (and the corresponding imports) in YoutubePlaylistExtractorTest and YoutubeMixPlaylistExtractorTest, some mocks cannot be generated, so the CI fails because of the missing mocks. Mocks of working tests have also been updated.
Migrate YouTube comments to the desktop version by using the `next` endpoint of the InnerTube internal API.
With the desktop version, we are able to get the exact like count of YouTube comments by parsing the accessibility data (the current extraction is used as a fallback). We are also now able to tell whether the uploader of the comment is verified or not.
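The accessibility-based parsing with a fallback could be sketched like this. The label wording ("... 1,234 other people") is an assumed example, and the helper is hypothetical; only the idea of preferring the exact count and falling back otherwise comes from the text above.

```java
/** Hypothetical sketch; the accessibility label format is an assumption. */
final class LikeCountSketch {
    /**
     * Returns the exact count found in the accessibility label, or the
     * fallback (e.g. the previously extracted value) when no digits are present.
     */
    static long parseLikeCount(final String accessibilityLabel, final long fallback) {
        if (accessibilityLabel == null) {
            return fallback;
        }
        // Drop everything that is not a digit, including thousands separators.
        final String digits = accessibilityLabel.replaceAll("\\D+", "");
        return digits.isEmpty() ? fallback : Long.parseLong(digits);
    }
}
```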
Co-authored-by: TiA4f8R <74829229+TiA4f8R@users.noreply.github.com>
Here are the requests that will now be made by the `onFetchPage` method of `YoutubeStreamExtractor`:
- the desktop API is fetched.
If there is no streaming data, the desktop player API with the embed client screen will be fetched (and also the player code), then the Android mobile API.
- if there is no streaming data, a `ContentNotAvailableException` will be thrown by using the message provided in playability status
If the video is age restricted, a request to the next endpoint of the desktop player with the embed client screen will be sent.
Otherwise, the next endpoint will be fetched normally, if the content is available.
If the video is not age-restricted, a request to the player endpoint of the Android mobile API will be made.
We can get more streams by using the Android mobile API but some streams may not be available on this API, so the streaming data of the Android mobile API will be used first to get itags and then the streaming data of the desktop internal API will be used.
If the parsing of the Android mobile API went wrong, only the streams of the desktop API will be used.
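The itag precedence described above can be sketched as a merge of two maps keyed by itag: Android entries are taken first, desktop entries only fill the gaps, and a null Android map stands for a failed Android parse. The Map model and all names are hypothetical simplifications of the real streaming data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: streaming data reduced to itag -> URL maps. */
final class StreamMergeSketch {
    static Map<Integer, String> mergeByItag(final Map<Integer, String> androidStreams,
                                            final Map<Integer, String> desktopStreams) {
        final Map<Integer, String> merged = new LinkedHashMap<>();
        // null models "parsing of the Android mobile API went wrong".
        if (androidStreams != null) {
            merged.putAll(androidStreams);
        }
        // Desktop streams are only added for itags not already provided.
        for (final Map.Entry<Integer, String> entry : desktopStreams.entrySet()) {
            merged.putIfAbsent(entry.getKey(), entry.getValue());
        }
        return merged;
    }
}
```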
Other code changes:
- `prepareJsonBuilder` in `YoutubeParsingHelper` was renamed to `prepareDesktopJsonBuilder`
- `prepareMobileJsonBuilder` in `YoutubeParsingHelper` was renamed to `prepareAndroidMobileJsonBuilder`
- two new methods in `YoutubeParsingHelper` were added: `prepareDesktopEmbedVideoJsonBuilder` and `prepareAndroidMobileEmbedVideoJsonBuilder`
- `createPlayerBodyWithSts` is now public and was moved to `YoutubeParsingHelper`
- a new method in `YoutubeJavaScriptExtractor` was added: `resetJavaScriptCode`, which was needed for the method `resetDebofuscationCode` of `YoutubeStreamExtractor`
- `areHardcodedClientVersionAndKeyValid` in `YoutubeParsingHelper` now returns a `boolean` instead of an `Optional<Boolean>`
- the `fetchVideoInfoPage` method of `YoutubeStreamExtractor` was removed because YouTube now returns 404 for every client with the `get_video_info` page
- some unused objects in `YoutubeStreamExtractor` were removed and some warnings were fixed
Co-authored-by: TiA4f8R <74829229+TiA4f8R@users.noreply.github.com>
This method is needed for YouTube stream tests, because when all YouTube tests are run, the signatureTimestamp is known (the sts string), so a different body than the one present in the mocks is sent by the extractor instance.
As a result, running all YouTube stream tests with the MockDownloader (like the CI does) will fail if this method is not called before fetching the page of a test.
The strings playerJsUrl, sts and playerCode are now static in order not to fetch the JavaScript player again each time the signatureTimestamp is needed.
Catch every exception instead of only IOException and ExtractionException.
Add JavaDoc for the fetchAndroidMobileJsonPlayer method of YoutubeStreamExtractor.